Fast-DDS fails to transmit messages between containers on same Kubernetes pod [10123] #1633
Comments
ah, i see the reason now. ros2/rmw_fastrtps#349 is not really about containers but about Kubernetes. i was confirming that with plain containers only, and that works okay since the MD5 of the host's external network interfaces is different in each container. but once it comes to k8s with ...
we've confirmed this problem, @stevewolter thanks for the heads-up!
@stevewolter @fujitatomoya Please check #1637 for a possible solution
@MiguelCompany appreciate the quick response 👍 we will try that out, and get back to you.
we confirmed this PR works with ros:foxy on Kubernetes (talker and listener in the same pod can communicate). Note: this PR is based on the master branch, there is a merge conflict against ...
Thanks for the super-quick work, looks great!
@fujitatomoya Great! Thanks for checking!
Yeah, we would have to backport the changes to 2.0.x, but I cannot give you an ETA now.
I will keep this open till we have the backport in place.
thanks for the effort @MiguelCompany 👍
Setup: We run two processes in two different containers in the same pod in Kubernetes. A Kubernetes pod is a set of Docker containers with the same external IP. Each Kubernetes container has its own PID namespace. Process A in container A is a DDS publisher, process B in container B is a subscriber.
Expected Behavior
Subscriber and publisher should be able to exchange messages.
Current Behavior
Participant matching fails because both participants end up with the same GUID. This can be tracked down to both processes ending up with the same PID, because each container in Kubernetes has its own PID namespace. The GUID is created in FastRTPS from:
The problem is even more insidious when participant IDs don't happen to match: In this case, both FastDDS instances switch to intra-process delivery (because host MD5 and process ID match), and no messages are transmitted even though PDP succeeds.
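The two failure modes can be illustrated with a minimal sketch. It assumes an identity built from a host hash, a PID, and a participant ID; `ParticipantIdentity`, `same_guid`, and `treated_as_same_process` are hypothetical names for illustration and do not reflect the actual Fast-DDS GUID layout or API. The PID value 1 reflects that the first process in a fresh PID namespace typically gets PID 1.

```cpp
#include <cstdint>

// Hypothetical stand-in for a participant identity (NOT the Fast-DDS GUID layout).
struct ParticipantIdentity {
    std::uint32_t host_hash;       // assumed: derived from the host's network interfaces
    std::uint32_t pid;             // PID as seen inside the container's own PID namespace
    std::uint32_t participant_id;  // per-process participant counter
};

// Full identity comparison: a match means the two participants are indistinguishable.
constexpr bool same_guid(const ParticipantIdentity& a, const ParticipantIdentity& b) {
    return a.host_hash == b.host_hash && a.pid == b.pid &&
           a.participant_id == b.participant_id;
}

// Assumed intra-process check: same host hash and same PID.
constexpr bool treated_as_same_process(const ParticipantIdentity& a,
                                       const ParticipantIdentity& b) {
    return a.host_hash == b.host_hash && a.pid == b.pid;
}

// Two containers in one pod: same network interfaces, and each process is
// PID 1 in its own PID namespace. The host hash value is made up.
constexpr ParticipantIdentity publisher{0xABCD1234u, 1u, 0u};
constexpr ParticipantIdentity subscriber{0xABCD1234u, 1u, 0u};
static_assert(same_guid(publisher, subscriber),
              "case 1: identical GUIDs, so participant matching fails");

// Same situation, but the participant IDs happen to differ.
constexpr ParticipantIdentity subscriber_other_id{0xABCD1234u, 1u, 1u};
static_assert(!same_guid(publisher, subscriber_other_id),
              "case 2: GUIDs differ, so discovery can succeed");
static_assert(treated_as_same_process(publisher, subscriber_other_id),
              "case 2: but the peers still look like one process");
```

In the second case the GUIDs differ, yet the same-process check still passes, which corresponds to the silent switch to intra-process delivery described above.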
The problem goes away when the containers in the pod share a single PID namespace (shareProcessNamespace: true in the pod YAML).
Why this is a problem
Kubernetes and other Docker-based orchestrators are getting more and more common. With per-container PID namespaces, the PID is no longer unique among the processes running on one kernel.
We fixed the problem internally by replacing GetPID() in RTPSDomain.cpp with a once-per-process call to rand(). I quickly wanted to give you a heads-up and make this issue visible to others. ROS also ran into the same issue (ros2/rmw_fastrtps#349) without understanding it fully.
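A minimal sketch of the once-per-process idea, for readers who want to try something similar. This is not the internal patch mentioned above and not necessarily what #1637 does. `process_unique_id` is a hypothetical helper name, and the sketch uses `std::random_device` instead of `rand()`, since `rand()` only differs between processes if each process seeds it differently.

```cpp
#include <cstdint>
#include <random>

// Hypothetical replacement for a PID lookup: returns a value drawn once per
// process and reused for every later call. The function-local static is
// initialized exactly once, and that initialization is thread-safe since C++11.
inline std::uint32_t process_unique_id() {
    static const std::uint32_t id = [] {
        std::random_device rd;  // non-deterministic source where the platform provides one
        return static_cast<std::uint32_t>(rd());
    }();
    return id;
}
```

Two processes can still draw the same 32-bit value by chance, so this lowers the collision probability rather than eliminating it.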