Many sub and one pub #25

Closed
jwcesign opened this issue Oct 17, 2018 · 5 comments
@jwcesign

Hi,
I tested the HelloworldPublisher example. With more than 70 subscribers but only one publisher, many of the subscribers do not receive the message. What is the reason for this, and is there a place to configure it?

@eboasson
Contributor

Hi,

I had a look and it is to be expected: you can get lucky and have all 70 subscribers get it (I did, so I know it can happen), but there is no guarantee whatsoever. The reason is firstly the way the two programs work, and secondly the chosen quality-of-service settings:

The first part, the way they work is the following: the publisher waits until the discovery data indicates there is a subscriber (really just until there is a change in the number of discovered subscribers), then it writes one sample and terminates. Because all the processes independently discover each other, it may be that it publishes that sample when it has discovered just one subscriber, some of them, or all of them.

While that is entirely timing dependent, in practice it seems likely that starting the publisher after the subscribers were started gives you a good chance that it will discover many of them in time, simply because starting the publisher triggers the discovery. Another factor could be that the first call to dds_get_status_changes is likely to happen before anything has been discovered (discovery requires multiple roundtrips) and in consequence it'll probably sleep for 20ms between the call to dds_create_writer and the call to dds_write. Those 20ms might well be enough to complete discovery in many cases.
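For reference, here is roughly the sequence described above, condensed from the HelloworldPublisher example (header and type names are taken from the example's generated code; the exact loop may differ slightly between Cyclone versions):

```c
#include "dds/dds.h"
#include "HelloWorldData.h"   /* IDL-generated type support from the example */

int main (void)
{
  dds_entity_t participant = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
  dds_entity_t topic = dds_create_topic (participant, &HelloWorldData_Msg_desc,
                                         "HelloWorldData_Msg", NULL, NULL);
  dds_entity_t writer = dds_create_writer (participant, topic, NULL, NULL);

  /* Poll until the publication-matched status changes, i.e. until at least one
     reader has been discovered; this is why the sample may be written after
     discovering only the first of many readers. */
  uint32_t status = 0;
  while (!(status & DDS_PUBLICATION_MATCHED_STATUS))
  {
    (void) dds_get_status_changes (writer, &status);
    dds_sleepfor (DDS_MSECS (20));   /* polling sleep */
  }

  /* Write a single sample and terminate immediately. */
  HelloWorldData_Msg msg = { .userID = 1, .message = "Hello World" };
  (void) dds_write (writer, &msg);

  dds_delete (participant);
  return 0;
}
```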

The second part has to do with the QoS: this is a volatile (durability QoS kind) topic/writer/reader, and so the one sample published only goes to the readers discovered at the time of writing [1]. Addressing this type of problem is what the "durability" QoS setting is for. If, instead of DDS_DURABILITY_VOLATILE (the default), you use DDS_DURABILITY_TRANSIENT_LOCAL (on both sides!), then the writer will keep the sample for any reader discovered later and the readers will request it when they discover the writer.

There is one problem with this approach though: the writer must remain in existence, so you can't terminate the process immediately. If you change the QoS this way and add a sleep — e.g., of 1s — after the call to dds_write, I would expect it to reliably deliver the data to all 70 readers.
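As a sketch of that change (these are fragments, reusing the participant, topic, and msg variables from the condensed publisher above; in older Cyclone versions the QoS functions may be spelled dds_qos_create/dds_qos_delete):

```c
/* Create a QoS with transient-local durability; the SAME durability setting
   must be applied on the reader side as well. */
dds_qos_t *qos = dds_create_qos ();
dds_qset_durability (qos, DDS_DURABILITY_TRANSIENT_LOCAL);

/* Publisher side: */
dds_entity_t writer = dds_create_writer (participant, topic, qos, NULL);
(void) dds_write (writer, &msg);
dds_sleepfor (DDS_SECS (1));   /* keep the writer alive so late-discovering readers can still request the sample */

/* Subscriber side (in the reader process, with its own participant and topic): */
dds_entity_t reader = dds_create_reader (participant, topic, qos, NULL);

dds_delete_qos (qos);
```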

This is nice, of course, but you should not have to keep the process and the writer in existence (it would mean you could never stop it as long as another reader might show up ...), and that is why DDS has a DDS_DURABILITY_TRANSIENT setting as well [2]. The idea behind "transient" is that the writer need not be kept around, that the DDS middleware stores that data independently of the application processes. For the subscribers not much changes [3], they still get the historical data. [4]

The real strength of DDS lies in this particular mode: "transient" data is the concept that really helps in building fault-tolerant, extensible systems where processes can come and go. "Transient-local" is nice, but it can't really help when components can fail or crash. However, it is also vastly simpler to implement than "transient" data. While full support for transient data is very much in sight for Cyclone, it is not yet supported, and so transient-local, though a poor alternative, is the only option at the moment [5].

Does this clear up things or did I only make it more confusing? 🤔

[1] It may even be dropped by the reader if it hasn't yet discovered the writer, that's a grey area.
[2] It is specified and available in several implementations, but it is not required by the "minimal" profile in the specification, which is where Cyclone currently is.
[3] Nothing, really, for normal uses, but you can design experiments in which you can tell the difference in correct implementations.
[4] There is an obscure QoS "writer data lifecycle" that contains a setting called "autodispose_unregistered_instances". It defaults to true, but that means that the data written by a transient writer would be deleted from the system when the writer disappears, kinda defeating the purpose of setting "transient" durability ... It is much better to set it to false (see the sketch after these notes)!
[5] It is partially supported, in that it can work with the transient data support in OpenSplice.
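For note [4], a minimal sketch of setting this with the Cyclone C API; since transient durability itself is not natively supported yet (see [5]), this is illustrative only:

```c
/* Writer QoS for transient data with autodispose_unregistered_instances
   disabled, so the data outlives the writer. */
dds_qos_t *wqos = dds_create_qos ();
dds_qset_durability (wqos, DDS_DURABILITY_TRANSIENT);
dds_qset_writer_data_lifecycle (wqos, false);   /* autodispose_unregistered_instances = false */
dds_entity_t transient_writer = dds_create_writer (participant, topic, wqos, NULL);
dds_delete_qos (wqos);
```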

@jwcesign
Author

Thank you so much for the explanation!
Is there any way to make all the subscribers receive the message without setting the "durability" QoS to DDS_DURABILITY_TRANSIENT_LOCAL?

@jwcesign
Author

And one more question: if I launch 70 subscribers and then one publisher, sometimes all of them receive the message, but if I launch 100 subscribers and then one publisher, only a few subscribers receive it. What is the reason?

@eboasson
Contributor

eboasson commented Oct 17, 2018

I guess the more subscribers, the more time discovery takes. It all runs in parallel, so it is conceivable that with 70 it can still make it in time, but that with 100 most of them are still only halfway through discovery when the sample is written. 'Tis but a guess ... there is a hard-to-read tracing format that would tell you more, but I'm quite certain the overhead of tracing to a text file will significantly affect this timing.

As to making sure all subs receive the message while using "volatile" data, the only option is to wait longer between dds_create_writer and dds_write. If you know there will be n subs, you could wait until dds_get_publication_matched_status returns a current_count equal to n. Or, easier, if you know that you start the subscribers first, you could just wait for a little while.
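A minimal sketch of that polling approach (n_expected is a hypothetical variable holding the known number of subscribers; writer and msg as in the example):

```c
/* Block (by polling) until n_expected readers have been matched, then write. */
uint32_t n_expected = 70;   /* hypothetical: the known number of subscribers */
for (;;)
{
  dds_publication_matched_status_t pm;
  if (dds_get_publication_matched_status (writer, &pm) != DDS_RETCODE_OK)
    break;                        /* give up on error */
  if (pm.current_count >= n_expected)
    break;                        /* all expected readers discovered */
  dds_sleepfor (DDS_MSECS (20));  /* polling interval */
}
(void) dds_write (writer, &msg);
```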

P.S. All that is perfectly fine, but it does make for a much tighter coupling between the subscribers and the publisher than I personally would be happy with.

@jwcesign
Author

I think you are right, DDS should not be coupled so tightly. Thank you for your careful explanation!!!
