Many sub and one pub #25

Closed
jwcesign opened this issue Oct 17, 2018 · 5 comments
@jwcesign

Hi,
I tested the HelloworldPublisher example. With more than 70 subscribers but only one publisher, many of the subscribers do not receive the message. What is the reason for this, and is there a place to configure it?

@eboasson
Contributor

Hi,

I had a look and it is to be expected: you can get lucky and have all 70 subscribers get it (I did, so I know it can happen), but there is no guarantee whatsoever. The reason is firstly the way the two programs work, and secondly the chosen quality-of-service settings:

The first part, the way they work is the following: the publisher waits until the discovery data indicates there is a subscriber (really just until there is a change in the number of discovered subscribers), then it writes one sample and terminates. Because all the processes independently discover each other, it may be that it publishes that sample when it has discovered just one subscriber, some of them, or all of them.

While that is entirely timing dependent, in practice it seems likely that starting the publisher after the subscribers were started gives you a good chance that it will discover many of them in time, simply because starting the publisher triggers the discovery. Another factor could be that the first call to dds_get_status_changes is likely to happen before anything has been discovered (discovery requires multiple roundtrips) and in consequence it'll probably sleep for 20ms between the call to dds_create_writer and the call to dds_write. Those 20ms might well be enough to complete discovery in many cases.
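For reference, here is roughly the sequence described above, condensed from the HelloworldPublisher example (header and type names are taken from the example's generated code; the exact loop may differ slightly between Cyclone versions):

```c
#include "dds/dds.h"
#include "HelloWorldData.h"   /* IDL-generated type support from the example */

int main (void)
{
  dds_entity_t participant = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
  dds_entity_t topic = dds_create_topic (participant, &HelloWorldData_Msg_desc,
                                         "HelloWorldData_Msg", NULL, NULL);
  dds_entity_t writer = dds_create_writer (participant, topic, NULL, NULL);

  /* Poll until the publication-matched status changes, i.e. until at least one
     reader has been discovered; this is why the sample may be written after
     discovering only the first of many readers. */
  uint32_t status = 0;
  while (!(status & DDS_PUBLICATION_MATCHED_STATUS))
  {
    (void) dds_get_status_changes (writer, &status);
    dds_sleepfor (DDS_MSECS (20));   /* polling sleep */
  }

  /* Write a single sample and terminate immediately. */
  HelloWorldData_Msg msg = { .userID = 1, .message = "Hello World" };
  (void) dds_write (writer, &msg);

  dds_delete (participant);
  return 0;
}
```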

The second part has to do with the QoS: this is a volatile (durability QoS kind) topic/writer/reader, and so the one sample published only goes to the readers discovered at the time of writing [1]. Addressing this type of problem is what the "durability" QoS setting is for. If, instead of DDS_DURABILITY_VOLATILE (the default), you use DDS_DURABILITY_TRANSIENT_LOCAL (on both sides!), then the writer will keep the sample for any reader discovered later and the readers will request it when they discover the writer.

There is one problem with this approach though: the writer must remain in existence, so you can't terminate the process immediately. If you change the QoS this way and add a sleep — e.g., of 1s — after the call to dds_write, I would expect it to reliably deliver the data to all 70 readers.
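As a sketch of that change (these are fragments, reusing the participant, topic, and msg variables from the condensed publisher above; in older Cyclone versions the QoS functions may be spelled dds_qos_create/dds_qos_delete):

```c
/* Create a QoS with transient-local durability; the SAME durability setting
   must be applied on the reader side as well. */
dds_qos_t *qos = dds_create_qos ();
dds_qset_durability (qos, DDS_DURABILITY_TRANSIENT_LOCAL);

/* Publisher side: */
dds_entity_t writer = dds_create_writer (participant, topic, qos, NULL);
(void) dds_write (writer, &msg);
dds_sleepfor (DDS_SECS (1));   /* keep the writer alive so late-discovering readers can still request the sample */

/* Subscriber side (in the reader process, with its own participant and topic): */
dds_entity_t reader = dds_create_reader (participant, topic, qos, NULL);

dds_delete_qos (qos);
```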

This is nice, of course, but you should not have to keep the process and the writer in existence (it would mean you could never stop it as long as another reader might show up ...), and that is why DDS has a DDS_DURABILITY_TRANSIENT setting as well [2]. The idea behind "transient" is that the writer need not be kept around, that the DDS middleware stores that data independently of the application processes. For the subscribers not much changes [3], they still get the historical data. [4]

The real strength of DDS lies in this particular mode: "transient" data is the concept that really helps in building fault-tolerant, extensible systems where processes can come and go. "Transient-local" is nice, but it can't really help when components can fail or crash. However, it is also vastly simpler to implement than "transient" data. While full support for transient data is very much in sight for Cyclone, it is not yet supported, and so transient-local, though a poor alternative, is the only option at the moment [5].

Does this clear up things or did I only make it more confusing? 🤔

[1] It may even be dropped by the reader if it hasn't yet discovered the writer, that's a grey area.
[2] It is specified and available in several implementations, but it is not required by the "minimal" profile in the specification, which is where Cyclone currently is.
[3] Nothing, really, for normal uses, but you can design experiments in which you can tell the difference in correct implementations.
[4] There is an obscure QoS "writer data lifecycle" that contains a setting called "autodispose_unregistered_instances". It defaults to true, but that means that the data written by a transient writer would be deleted from the system when the writer disappears, kinda defeating the purpose of setting "transient" durability ... It is much better to set it to false (see the sketch after these notes)!
[5] It is partially supported, in that it can work with the transient data support in OpenSplice.
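For note [4], a minimal sketch of setting this with the Cyclone C API; since transient durability itself is not natively supported yet (see [5]), this is illustrative only:

```c
/* Writer QoS for transient data with autodispose_unregistered_instances
   disabled, so the data outlives the writer. */
dds_qos_t *wqos = dds_create_qos ();
dds_qset_durability (wqos, DDS_DURABILITY_TRANSIENT);
dds_qset_writer_data_lifecycle (wqos, false);   /* autodispose_unregistered_instances = false */
dds_entity_t transient_writer = dds_create_writer (participant, topic, wqos, NULL);
dds_delete_qos (wqos);
```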

@jwcesign
Author

Thank you so much for the explanation!
Is there any way to make all the subscribers receive the message without setting the "durability" QoS to DDS_DURABILITY_TRANSIENT_LOCAL?

@jwcesign
Author

And one more question: if I launch 70 subscribers and then one publisher, sometimes all of them receive the message, but if I launch 100 subscribers and then one publisher, only a few subscribers receive it. What is the reason?

@eboasson
Contributor

eboasson commented Oct 17, 2018

I guess the more subscribers, the more time discovery takes. It all runs in parallel, so it is conceivable that with 70 it can still make it in time, but that with 100 most of them are still only halfway through discovery when the sample is written. 'Tis but a guess ... there is a hard-to-read tracing format that would tell you more, but I'm quite certain the overhead of tracing to a text file will significantly affect this timing.

As to making sure all subs receive the message while using "volatile" data, the only option is to wait longer between dds_create_writer and dds_write. If you know there will be n subs, you could wait until dds_get_publication_matched_status returns a current_count equal to n. Or, easier, if you know that you start the subscribers first, you could just wait for a little while.
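A minimal sketch of that polling approach (n_expected is a hypothetical variable holding the known number of subscribers; writer and msg as in the example):

```c
/* Block (by polling) until n_expected readers have been matched, then write. */
uint32_t n_expected = 70;   /* hypothetical: the known number of subscribers */
for (;;)
{
  dds_publication_matched_status_t pm;
  if (dds_get_publication_matched_status (writer, &pm) != DDS_RETCODE_OK)
    break;                        /* give up on error */
  if (pm.current_count >= n_expected)
    break;                        /* all expected readers discovered */
  dds_sleepfor (DDS_MSECS (20));  /* polling interval */
}
(void) dds_write (writer, &msg);
```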

P.S. All that is perfectly fine, but it does make for a much tighter coupling between the subscribers and the publisher than I personally would be happy with.

@jwcesign
Author

I think you are right, DDS should not be coupled so tightly. Thank you for your careful explanation!!!
