
Failed to exchange Messages between two docker containers when run on host network option.(--network=host) [14317] #2624

Closed
sreemtech opened this issue Apr 4, 2022 · 16 comments

Comments

@sreemtech

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

Two Docker containers running on the host network should be able to exchange messages.

Current behavior

Two processes run in two different containers on the host network.

Container A publishes messages, but Container B does not receive them.

Container A and Container B were both created with --network=host.

Steps to reproduce

  1. Create two containers as below

a. docker run -it --privileged --network=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix ubuntu-fastdds-suite:

b. docker run -it --privileged --network=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix ubuntu-fastdds-suite:

  2. Publish a message from A to B. B does not receive the message.

Fast DDS version/commit

v.2.3.0-1.-249

Platform/Architecture

Ubuntu Focal 20.04 arm64

Transport layer

Default configuration, UDPv4 & SHM

Additional context

No response

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

@sreemtech sreemtech added the triage Issue pending classification label Apr 4, 2022
@EduPonz EduPonz changed the title Failed to exchange Messages between two docker containers when run on host network option.(--network=host) Failed to exchange Messages between two docker containers when run on host network option.(--network=host) [14317] Apr 5, 2022
@EduPonz

EduPonz commented Apr 5, 2022

Hi @sreemtech ,

This has in fact been reported before in #1698, #1750, and #1755, and has a partial fix (not really ready to be merged into master) in #1801. Although I'm closing this ticket as a duplicate, I'll briefly explain the problem and possible workarounds here for traceability.

The source of the problem

  1. Fast DDS attaches, unless specified otherwise, 2 transports to each DomainParticipant: UDPv4 and SHM (shared memory).
  2. When one DomainParticipant discovers another, it checks whether the SHM transport can be used for communication by checking whether the remote participant runs on the same host as itself. This is done by comparing the third and fourth bytes of each DomainParticipant's GUID, which Fast DDS sets as a Host ID of sorts.
  3. These two bytes are calculated by hashing the available network interfaces.
  4. When running two different containers without network isolation, that is, using --network=host, both containers see the same network interfaces. As a result, all DomainParticipants created in either container share the same Host ID component of their GUID, which in turn makes Fast DDS think they can communicate over SHM. As you've seen, this is not always the case.
  5. The last bit is that Fast DDS avoids using any other transport between two participants if SHM can be used, as it is the most performant and avoids network traffic.
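The Host ID logic in points 2 to 4 can be illustrated with a toy sketch (hypothetical hash and byte layout, purely for illustration, not Fast DDS's actual algorithm): two containers that see identical interface lists produce identical 2-byte IDs, so their participants look co-located.

```python
import hashlib

def host_id(interfaces):
    # Hash the sorted interface list down to 2 bytes (toy stand-in for
    # the Host ID embedded in the third and fourth bytes of the GUID).
    digest = hashlib.md5(",".join(sorted(interfaces)).encode()).digest()
    return digest[:2]

# With --network=host, both containers see the host's interfaces...
container_a = host_id(["192.168.1.42", "172.17.0.1"])
container_b = host_id(["192.168.1.42", "172.17.0.1"])
# ...so their Host IDs match and SHM is (wrongly) assumed usable.
assert container_a == container_b
```

A container on an isolated bridge network would see a different interface list, hash to a different Host ID, and therefore fall back to UDPv4 as expected.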

Possible workarounds

Until Fast DDS establishes a more reliable way of discerning whether two DomainParticipants can use SHM between them, there are 3 possible workarounds:

  1. The most obvious one, albeit not the most ideal, is to disable the default SHM transport. This can be done in different ways:
    1. When compiling Fast DDS, by setting the CMake option -DSHM_TRANSPORT_DEFAULT=OFF (see CMake options).
    2. Using XML:
      <?xml version="1.0" encoding="UTF-8" ?>
      <dds>
          <profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
              <transport_descriptors>
                  <transport_descriptor>
                      <transport_id>udp_transport</transport_id>
                      <type>UDPv4</type>
                  </transport_descriptor>
              </transport_descriptors>
              <participant profile_name="participant_profile" is_default_profile="true">
                  <rtps>
                      <userTransports>
                          <transport_id>udp_transport</transport_id>
                      </userTransports>
                      <useBuiltinTransports>false</useBuiltinTransports>
                  </rtps>
              </participant>
          </profiles>
      </dds>
    3. Via the C++ API when creating the DomainParticipant:
      DomainParticipantQos participant_qos;
      // Create a descriptor for the new transport.
      auto udp_transport = std::make_shared<UDPv4TransportDescriptor>();
      // Link the Transport Layer to the Participant.
      participant_qos.transport().user_transports.push_back(udp_transport);
      // Avoid using the default transport
      participant_qos.transport().use_builtin_transports = false;
      // Create the DomainParticipant (in this case in Domain 0)
      DomainParticipant* participant = 
          DomainParticipantFactory::get_instance()->create_participant(0, participant_qos);
  2. It is also possible to simply share the host's shared memory directory (/dev/shm on Ubuntu) as a volume with each container.
  3. Lastly, what I'd recommend is to use Docker's built-in IPC flag, running the containers with --ipc="host".
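For reference, workarounds 2 and 3 would translate into invocations along these lines (a sketch using the ubuntu-fastdds-suite:v2.6.0 image mentioned in this thread; adjust the image name and other flags to your setup):

```shell
# Workaround 2: mount the host's shared memory directory into the container.
docker run -it --network=host -v /dev/shm:/dev/shm ubuntu-fastdds-suite:v2.6.0

# Workaround 3 (recommended): share the host's IPC namespace instead.
docker run -it --network=host --ipc=host ubuntu-fastdds-suite:v2.6.0
```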

Hope that helps!

@EduPonz EduPonz closed this as completed Apr 5, 2022
@JLBuenoLopez JLBuenoLopez added duplicate and removed triage Issue pending classification labels Apr 5, 2022
@sreemtech
Author

@EduPonz Thanks, I will try the workaround. Thanks for the very speedy support.

@sreemtech
Author

@EduPonz I have the same issue on TCP, with no message exchange. I am working on the same subnet (the server and client are running in two containers using the host network).

I have two containers using the host IP address and want to exchange messages via TCP, e.g.:
HelloWorldExampleTCP subscriber -a 172.19.168.186 -p 5050
HelloWorldExampleTCP publisher -a 172.19.168.186 -p 5050

Any advice here, please?

@EduPonz

EduPonz commented Apr 6, 2022

Hi @sreemtech ,

I have not been able to reproduce your issue using the Fast DDS Suite Docker image for v2.6.0.

Terminal 1

docker run -it --rm --network=host ubuntu-fastdds-suite:v2.6.0

Within the container

goToExamples
cd DDS/HelloWorldExampleTCP/bin/
./DDSHelloWorldExampleTCP publisher -a 192.168.1.42 -p 5050

Terminal 2

docker run -it --rm --network=host ubuntu-fastdds-suite:v2.6.0

Within the container

goToExamples
cd DDS/HelloWorldExampleTCP/bin/
./DDSHelloWorldExampleTCP subscriber -a 192.168.1.42 -p 5050

Discovery may take a bit (it's slower when using the TCP transport due to handshakes and such), but after a while I can receive samples with no problems. Can you please verify? If that works (it does for me), then we should take a look at your Dockerfile to try to spot differences.

@EduPonz EduPonz reopened this Apr 6, 2022
@sreemtech
Author

hi

@EduPonz

EduPonz commented Apr 6, 2022

hi

I hit enter too soon, I've updated my comment.

@sreemtech
Author

Hi @EduPonz

Spot on the problem... handshaking is taking time. It's taking more than 10 seconds.

But interestingly, I found something else with the setup below:

Container A: Ip: 172.19.168.186
Publisher - ./HelloWorldExampleTCP publisher -a 172.19.168.186 -p 5050
Subscriber - ./HelloWorldExampleTCP subscriber -a 172.19.168.186 -p 5050

Container B : Ip: 172.19.168.157
Subscriber - ./HelloWorldExampleTCP subscriber -a 172.19.168.186 -p 5050

Problem:
1. Both subscribers, on Container A and Container B, receive it - good and expected behavior.
2. Sometimes only the subscriber on Container B receives it; "A" never gets the message.
3. Sometimes only the subscriber on Container A receives it; "B" never gets the message.

@sreemtech
Author

Hi @EduPonz Another issue I have found: after the handshake between Container A and Container B, the messages are not received from the beginning. For example:

Container A :
./HelloWorldExampleTCP publisher -a 172.19.168.186 -p 5050
Starting
172.19.168.186:5050
Publisher running. Please press enter to stop_ the Publisher at any time.
[RTCP] Message: HelloWorld with index: 1 SENT
[RTCP] Message: HelloWorld with index: 2 SENT
[RTCP] Message: HelloWorld with index: 3 SENT
[RTCP] Message: HelloWorld with index: 4 SENT
[RTCP] Message: HelloWorld with index: 5 SENT
[RTCP] Message: HelloWorld with index: 6 SENT
[RTCP] Message: HelloWorld with index: 7 SENT

Container B :
./build/examples/C++/HelloWorldExampleTCP/HelloWorldExampleTCP subscriber -a 172.19.168.186 -p 5050
Starting
172.19.168.186:5050
[RTCP] Subscriber running. Please press enter to stop the Subscriber
[RTCP] Subscriber matched
[RTCP] Message HelloWorld 78 RECEIVED
[RTCP] Message HelloWorld 79 RECEIVED
[RTCP] Message HelloWorld 80 RECEIVED
[RTCP] Message HelloWorld 81 RECEIVED
[RTCP] Message HelloWorld 82 RECEIVED
[RTCP] Message HelloWorld 83 RECEIVED
[RTCP] Message HelloWorld 84 RECEIVED

If you look at the above, we lost messages: Container B is receiving only from index "78".

@EduPonz

EduPonz commented Apr 8, 2022

Hi @sreemtech

Problem:
1. Both subscribers, on Container A and Container B, receive it - good and expected behavior.
2. Sometimes only the subscriber on Container B receives it; "A" never gets the message.
3. Sometimes only the subscriber on Container A receives it; "B" never gets the message.

This I have not been able to reproduce on v2.6.0. I have three questions:

  1. How are you installing Fast DDS? v.2.3.0-1.-249 is not a tag on the repository, and we do not distribute packages ourselves.
  2. Are you tied to v2.3.0 for any reason? There were some fixes for TCP in 2.5.1 that may be of help (if TCP is indeed what you require).
  3. I do not know whether you are aware, but it seems that you are using the example for the old (and to-be-deprecated) Fast RTPS API; the corresponding example with the Fast DDS API is here (it is the one I'm using).

the messages are not received from the beginning

This is to be expected given the example's QoS: the history is configured to keep the last 30 samples, both for the publisher and the subscriber, so late joiners may not receive all the samples.
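If receiving past samples matters, the example's QoS could presumably be adjusted with an XML profile along these lines: a larger KEEP_LAST depth plus TRANSIENT_LOCAL durability on the writer lets late-joining reliable readers request earlier samples. This is a sketch only; profile and element names follow the Fast DDS XML profiles format, so verify them against your version's schema.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<dds>
    <profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
        <publisher profile_name="history_publisher" is_default_profile="true">
            <topic>
                <historyQos>
                    <kind>KEEP_LAST</kind>
                    <depth>100</depth> <!-- keep more than the example's 30 -->
                </historyQos>
            </topic>
            <qos>
                <durability>
                    <kind>TRANSIENT_LOCAL</kind> <!-- deliver history to late joiners -->
                </durability>
                <reliability>
                    <kind>RELIABLE</kind>
                </reliability>
            </qos>
        </publisher>
        <subscriber profile_name="history_subscriber" is_default_profile="true">
            <topic>
                <historyQos>
                    <kind>KEEP_LAST</kind>
                    <depth>100</depth>
                </historyQos>
            </topic>
            <qos>
                <durability>
                    <kind>TRANSIENT_LOCAL</kind>
                </durability>
                <reliability>
                    <kind>RELIABLE</kind>
                </reliability>
            </qos>
        </subscriber>
    </profiles>
</dds>
```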

@sreemtech
Author

Hi @EduPonz

Cloned Latest - git clone https://github.com/eProsima/Fast-DDS.git

7124ff8 refs/tags/v2.6.0

Last commit
git show --name-only
commit 01550cf (HEAD -> master, tag: log, origin/master, origin/feature/content-filter/writer/main, origin/HEAD, origin/2.6.x)
Author: Miguel Company miguelcompany@eprosima.com
Date: Fri Apr 1 07:25:59 2022 +0200

@EduPonz

EduPonz commented Apr 8, 2022

Then I guess we'll need to spot differences between your setup and mine. Have you tried reproducing as I did here?

@sreemtech
Author

@EduPonz Let me try another test with your approach and with the history settings. Thanks again for the speedy support.

@sreemtech
Author

sreemtech commented Apr 11, 2022

@EduPonz With the history settings I see an improvement in packets. But I still see some missing packets when both start simultaneously; it could be our network issue, but I am still not sure.

Thanks once again. You can close the ticket now, please.

@tunahanertekin

Hey there!

How can I configure a Fast RTPS profile so that:

  • clients will only consume UDPv4
  • it will be a super client
  • it will use a discovery server

Here is the profile I use:

<?xml version='1.0' encoding='UTF-8' ?>
<dds>
	<profiles
		xmlns='http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles'>
		<participant profile_name='super_client_profile' is_default_profile='true'>
			<rtps>
				<builtin>
					<discovery_config>
						<discoveryProtocol>SUPER_CLIENT</discoveryProtocol>
						<discoveryServersList>
							<RemoteServer prefix='44.53.00.5f.45.50.52.4f.53.49.4d.41'>
								<metatrafficUnicastLocatorList>
									<locator>
										<udpv4>
											<address>10.244.0.221</address>
											<port>11811</port>
										</udpv4>
									</locator>
								</metatrafficUnicastLocatorList>
							</RemoteServer>
						</discoveryServersList>
					</discovery_config>
				</builtin>
			</rtps>
		</participant>
	</profiles>
</dds>

This profile is already working, connecting to a discovery server. I want this profile to support messages only over UDPv4. I would appreciate it if anyone could help me with this.
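I suppose the UDPv4-only part could be achieved by combining this profile with a user-declared UDPv4 transport and useBuiltinTransports set to false, as in the workaround earlier in this thread. Something like the sketch below, but I have not verified it (element order may need adjusting to the XML schema):

```xml
<?xml version='1.0' encoding='UTF-8' ?>
<dds>
	<profiles xmlns='http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles'>
		<transport_descriptors>
			<transport_descriptor>
				<transport_id>udp_transport</transport_id>
				<type>UDPv4</type>
			</transport_descriptor>
		</transport_descriptors>
		<participant profile_name='super_client_profile' is_default_profile='true'>
			<rtps>
				<builtin>
					<discovery_config>
						<discoveryProtocol>SUPER_CLIENT</discoveryProtocol>
						<discoveryServersList>
							<RemoteServer prefix='44.53.00.5f.45.50.52.4f.53.49.4d.41'>
								<metatrafficUnicastLocatorList>
									<locator>
										<udpv4>
											<address>10.244.0.221</address>
											<port>11811</port>
										</udpv4>
									</locator>
								</metatrafficUnicastLocatorList>
							</RemoteServer>
						</discoveryServersList>
					</discovery_config>
				</builtin>
				<userTransports>
					<transport_id>udp_transport</transport_id>
				</userTransports>
				<useBuiltinTransports>false</useBuiltinTransports>
			</rtps>
		</participant>
	</profiles>
</dds>
```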

@MiguelCompany
Member

@tunahanertekin For questions like yours, we have the Discussions forum. I posted a copy of your question in #3262 and will answer there.

@maxpolzin

maxpolzin commented Aug 16, 2023

I have created this repository with my setup to build (as the root user) and run (with the host's user id) ROS 2 packages/nodes inside containers with the --net=host option. This allows using Shared Memory Transport (SHM) between ROS 2 nodes running inside the container and on the host.

https://github.com/rosblox/ros-template
