New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure delivery of first sample written following "publication matched" event #188
Merged
eboasson
merged 5 commits into
eclipse-cyclonedds:master
from
eboasson:init-sample-deliv
May 29, 2019
Merged
Ensure delivery of first sample written following "publication matched" event #188
eboasson
merged 5 commits into
eclipse-cyclonedds:master
from
eboasson:init-sample-deliv
May 29, 2019
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sometimes it can be useful to force all data transmissions to go out via multicasts, instead of using a unicast when a single unicast suffices. This commit adds a General/PreferMulticast setting that, when set to true, implements this behaviour. It is "PreferMulticast" because readers that only advertise unicast addresses will still receive the data via unicast. The default behaviour is unchanged. Signed-off-by: Erik Boasson <eb@ilities.com>
With the introduction of FreeRTOS support, the requirement that the underlying platform supports multiple processes became untenable, and therefore the code for dealing with multiple processes is now automatically detected via some CMake code and then effectuated in the sources via a macro (DDSRT_HAVE_MULTI_PROCESS). Unfortunately, a typo resulted in all platforms being treated as if without support for multiple processes. Fortunately, at this stage the only consequence was the disabling of the first few multi-process tests. Signed-off-by: Erik Boasson <eb@ilities.com>
The multi-process test driver program waits until all its children have stopped (or timeout) by calling ddsrt_proc_waitpids in a loop. The way the loop is written, it can try once more when all children have already been waited for, in which case it reports a spurious error. This fixes that problem by checking whether the error code was to be expected. Signed-off-by: Erik Boasson <eb@ilities.com>
If a firewall blocks traffic over one network but not another, and the one happens to be the default pick of Cyclone, then the MPT basic tests won't work properly. With the recent changes to the configuration handling for supporting multiple sources of configuration, it makes far more sense to remove the hardcoded network interface selection in the test configurations and to append the test configuration file to the environment list rather than to replace it. Signed-off-by: Erik Boasson <eb@ilities.com>
…pse-cyclonedds#165) A long-standing bug of Cyclone is that a sample written immediately after a publication-matched event may never arrive at the reader that was just matched. This happened because the reader need not have completed discovery of the writer by the time the writer discovers the reader, at which point the reader ignores the sample because it either doesn't know the writer at all, or it hasn't yet seen a Heartbeat from it. That Heartbeat arrives shortly after, but by then it is too late: the reader slaves decides to accept the next sample to be written by the writer. (It has no choice, really: either you risk losing some data, or you will be requesting all historical data, which is empathically not what a volatile reader is about ...) A related issue is the handling of historical data for transient-local readers: it used to deliver this out-of-order, but that is firstly against the specification, and secondly, against reasonable expectations of those who use DDS as a mere publish-subscribe messaging system. To add insult to injury, it didn't completely handle some reordering issues with disposes ... This commit changes the way writers respond to a request for retransmission from volatile proxy readers and the way the in-sync/out-of-sync setting of a reader with respect to a proxy-writer is used. The first makes it safe for a Cyclone reader to ask a Cyclone writer for all data (all these details not being covered in the specs it errs on the reasonable side for other vendors, but that may cause the data loss mentioned above): the writer simply send a Gap message to the reader for all the sequence numbers prior to the matching. The second changes the rule for switching from out-of-sync to in-sync: that transition is now simply once the next sequence number to be delivered to the reader equals the next sequence number that will be delivered directly from the proxy writer object to all readers. (I.e., a much more intuitive notion than reaching some seemingly arbitrary sequence number.) To avoid duplicates the rule for delivery straight from a proxy writer has changed: where samples were delivered from the proxy writer to all matching readers, they are now delivered only to the matching readers that are in-sync. To avoid ordering problems, the idea that historical data can be delivered through the asynchronous delivery path even when the regular data goes through the synchronous delivery path has been abandoned. All data now always follows the same path. As these same mechanisms are used for getting historical data into transient-local readers, the ordering problem for the historical data also disappeared. The test stuff in src/core/xtests/initsampledeliv covers a lot of the interesting cases: data published before the existene of a reader, after it, mixes of volatile and transient-local. Running them takes quite a bit of time, and they are not yet integrated in the CI builds (if ever, because of that time). Note: the "conservative built-in startup" option has been removed, because it really makes no sense to keep a vague compatibility option added a decade ago "just in case" that has never been used ... Note: the workaround in the src/mpt/tests/basic/procs/hello.c (use transient-local to ensure delivery of data) has been removed, as has been its workaround for the already-fixed eclipse-cyclonedds#146. Signed-off-by: Erik Boasson <eb@ilities.com>
eboasson
force-pushed
the
init-sample-deliv
branch
from
May 28, 2019 10:02
10ad12c
to
83b1f27
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR addresses a long-standing problem of which #165 is simply the most recent manifestation: if a writer W received a "publication matched" event for a (remote) volatile reader R, a sample written by W immediately afterward might not make it to R.
This is highly counter-intuitive and takes a large effort to work around it in the application.
I do actually believe the old behaviour to be within the rules of the specification, and certainly the only reasonable one I can see without assuming anything beyond what the specification guarantees while implementing the retransmit behaviour that the specification indicates as expected of a good implementation (a writer "should" retransmit samples for which a retransmit is requested even if the requestor has already acknowledged it previously).
Fortunately, the new behaviour is also within the rules of the specification, because it relies on the vendor id to distinguish between a Cyclone/Cyclone pair and all other cases, and alters the retransmit behaviour to never retransmit old stuff to a Cyclone reader. From an interoperability point, there is no behavioural change. That also means the old problems still exist when interoperating with other implementations.
If anyone reading this can point me to a different and better solution, I'd love to hear it.
The change coincidentally also ensures that all historical transient-local data is received in-order — bringing Cyclone's behaviour in line with the specification.