dds lost packets when it transferred large data locally, and sometimes the subscriber would not receive the data anymore #1993

Open
Gummum opened this issue May 6, 2024 · 3 comments

Comments

@Gummum

Gummum commented May 6, 2024

I am on an embedded device, using the DDS_RELIABILITY_BEST_EFFORT policy. The data being transferred is similar to raw image data. When the subscriber is more, there is packet loss, and sometimes the subscriber never receives data again. I checked with gdb: the receiving thread is stuck in the select call.

@Gummum
Author

Gummum commented May 6, 2024

I may not be able to track down the problem myself, so I came to ask for some advice.

@eboasson
Contributor

Hi @Gummum, yes, packet loss is the big issue with "best-effort" [1].

I am not sure what you mean by "when the subscriber is more": packet loss can always happen and is more likely for bigger samples. Perhaps you meant when you create more subscribers? That could be it: having more subscribers usually causes Cyclone to switch to multicast, and many network switches are more likely to drop multicast than unicast.

So what can you do? Not so much ...

What you can do is try to find out where the packet loss occurs. Is it in the switch, the physical network (multicast on WiFi is notorious!) or in the socket receive buffer? Wireshark can help, and netstat -s (look for UDP errors) on Linux is great for spotting socket receive buffer overruns. If it is the use of multicast, you can disable that in the Cyclone configuration. (Even for specific topics.)
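
As a rough illustration of the socket-buffer check: on Linux, the UDP counters that netstat -s reports can also be read from /proc/net/snmp. The sketch below parses the "Udp:" lines and pulls out InErrors and RcvbufErrors. The helper name udp_counters is made up for this example, and the exact set of columns differs between kernel versions, so treat it as a starting point rather than a finished tool.

```python
import os


def udp_counters(snmp_text):
    """Return the UDP counters from /proc/net/snmp-style text as a dict."""
    # The file holds a pair of lines per protocol: a header line with the
    # column names, followed by a line with the matching values.
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp: header/value pair found")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


if __name__ == "__main__":
    path = "/proc/net/snmp"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            counters = udp_counters(f.read())
        # RcvbufErrors growing while samples go missing points at the socket
        # receive buffer rather than the switch or the physical network.
        for key in ("InDatagrams", "InErrors", "RcvbufErrors"):
            print(key, counters.get(key))
```

Running it before and after a test run and comparing the two RcvbufErrors values shows whether datagrams were dropped at the receive buffer during that run.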

If you see socket buffer overruns, then increasing the size of them will probably help. It is a known problem when the data size is similar to or larger than the socket receive buffer. The many packets that make up the data of a large sample are sent in a quick burst, and so if the receiving thread can't keep up (or is a bit late in starting) you can overrun a small socket buffer. (Linux has a default maximum of about 400kB, Cyclone by default asks for 1MB, but by default accepts whatever it gets.)
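
To see the "asks for one size, accepts whatever it gets" behaviour in isolation, here is a small sketch that uses a plain UDP socket rather than Cyclone itself. It requests a receive buffer size and reads back what the operating system reports. The function name granted_rcvbuf is hypothetical. On Linux the request is capped by net.core.rmem_max, and the value read back is double the (capped) request because the kernel accounts for its own bookkeeping overhead; other platforms behave differently and may reject an oversized request outright.

```python
import socket


def granted_rcvbuf(requested_bytes):
    """Ask for a UDP receive buffer and return the size the OS reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    for requested in (64 * 1024, 1024 * 1024, 2 * 1024 * 1024):
        granted = granted_rcvbuf(requested)
        print(f"requested {requested:>8} bytes, OS reports {granted}")
```

If the number reported for the larger requests stops growing, the system-wide limit is what needs raising, not only the application's own setting.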

Hopefully this helps a bit. If the loss can't be fixed, then there are still interesting options but they involve distributing the image data over many small samples, and building the processing in such a way that it treats the missing pixels as something like noise. That works nicely if you make sure it is always some other bunch of pixels that are missing ... But that's a very different subject.
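
A minimal sketch of that last idea, assuming a frame is just a flat list of pixel values: each sample carries every n-th pixel rather than a contiguous block, and the slot assignment rotates with the frame number so that a sample that is dropped consistently does not always hit the same pixels. The names split_frame and rebuild_frame are hypothetical, and filling gaps from the previous frame is only one possible way of treating the missing pixels as noise.

```python
def split_frame(pixels, n_samples, frame_no):
    """Split a flat pixel list into n_samples interleaved chunks.

    Returns a list of (slot, values) pairs, where slot says which stride of
    pixels the chunk holds. The slot order rotates with frame_no, so losing
    "the k-th sample" affects different pixels in each frame.
    """
    chunks = []
    for k in range(n_samples):
        slot = (k + frame_no) % n_samples
        chunks.append((slot, pixels[slot::n_samples]))
    return chunks


def rebuild_frame(received, n_samples, previous):
    """Rebuild a frame from the chunks that arrived.

    Pixels belonging to missing chunks keep their value from `previous`,
    which must have the same length as the full frame.
    """
    frame = list(previous)
    for slot, values in received:
        frame[slot::n_samples] = values
    return frame


if __name__ == "__main__":
    frame = list(range(12))
    chunks = split_frame(frame, n_samples=4, frame_no=1)
    arrived = chunks[:2] + chunks[3:]  # pretend the third sample was lost
    print(rebuild_frame(arrived, 4, previous=[0] * 12))
```

In a DDS setting each (slot, values) pair would be published as its own small sample, with the frame number and slot included so the subscriber can place the pixels.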

Footnotes

  1. I don't understand why it is called "best-effort", unless a marketing department got involved at some point. It is "unreliable" or "send-and-forget", definitely not what I consider "best-effort" 🙂

@Gummum
Copy link
Author

Gummum commented May 17, 2024

Thank you very much for your answer. By "when the subscriber is more" I mean that I start multiple subscriber processes that receive the same topic. I use DDS for local inter-process communication, so I don't think it should lose packets or stop receiving data altogether. I set a 2M buffer for cyclonedds, which I think should be large enough. I guess it's partly due to CPU scheduling, because after I killed the main process on the device, I could start many processes subscribing to the same topic without any problems.
