Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure delivery of first sample written following "publication matched" event #188

Merged
merged 5 commits into from May 29, 2019

Conversation

eboasson
Copy link
Contributor

This PR addresses a long-standing problem of which #165 is simply the most recent manifestation: if a writer W received a "publication matched" event for a (remote) volatile reader R, a sample written by W immediately afterward might not make it to R.

This is highly counter-intuitive and takes a large effort to work around it in the application.

I do actually believe the old behaviour to be within the rules of the specification, and certainly the only reasonable one I can see without assuming anything beyond what the specification guarantees while implementing the retransmit behaviour that the specification indicates as expected of a good implementation (a writer "should" retransmit samples for which a retransmit is requested even if the requestor has already acknowledged it previously).

Fortunately, the new behaviour is also within the rules of the specification, because it relies on the vendor id to distinguish between a Cyclone/Cyclone pair and all other cases, and alters the retransmit behaviour to never retransmit old stuff to a Cyclone reader. From an interoperability point, there is no behavioural change. That also means the old problems still exist when interoperating with other implementations.

If anyone reading this can point me to a different and better solution, I'd love to hear it.

The change coincidentally also ensures that all historical transient-local data is received in-order — bringing Cyclone's behaviour in line with the specification.

Sometimes it can be useful to force all data transmissions to go out via
multicasts, instead of using a unicast when a single unicast suffices.
This commit adds a General/PreferMulticast setting that, when set to
true, implements this behaviour.

It is "PreferMulticast" because readers that only advertise unicast
addresses will still receive the data via unicast.

The default behaviour is unchanged.

Signed-off-by: Erik Boasson <eb@ilities.com>
With the introduction of FreeRTOS support, the requirement that the
underlying platform supports multiple processes became untenable, and
therefore the code for dealing with multiple processes is now
automatically detected via some CMake code and then effectuated in the
sources via a macro (DDSRT_HAVE_MULTI_PROCESS).  Unfortunately, a typo
resulted in all platforms being treated as if without support for
multiple processes.  Fortunately, at this stage the only consequence was
the disabling of the first few multi-process tests.

Signed-off-by: Erik Boasson <eb@ilities.com>
The multi-process test driver program waits until all its children have
stopped (or timeout) by calling ddsrt_proc_waitpids in a loop.  The way
the loop is written, it can try once more when all children have already
been waited for, in which case it reports a spurious error.

This fixes that problem by checking whether the error code was to be
expected.

Signed-off-by: Erik Boasson <eb@ilities.com>
If a firewall blocks traffic over one network but not another, and the
one happens to be the default pick of Cyclone, then the MPT basic tests
won't work properly.

With the recent changes to the configuration handling for supporting
multiple sources of configuration, it makes far more sense to remove the
hardcoded network interface selection in the test configurations and to
append the test configuration file to the environment list rather than
to replace it.

Signed-off-by: Erik Boasson <eb@ilities.com>
…pse-cyclonedds#165)

A long-standing bug of Cyclone is that a sample written immediately
after a publication-matched event may never arrive at the reader that
was just matched.  This happened because the reader need not have
completed discovery of the writer by the time the writer discovers the
reader, at which point the reader ignores the sample because it either
doesn't know the writer at all, or it hasn't yet seen a Heartbeat from
it.

That Heartbeat arrives shortly after, but by then it is too late: the
reader slaves decides to accept the next sample to be written by the
writer.  (It has no choice, really: either you risk losing some data, or
you will be requesting all historical data, which is empathically not
what a volatile reader is about ...)

A related issue is the handling of historical data for transient-local
readers: it used to deliver this out-of-order, but that is firstly
against the specification, and secondly, against reasonable expectations
of those who use DDS as a mere publish-subscribe messaging system.  To
add insult to injury, it didn't completely handle some reordering issues
with disposes ...

This commit changes the way writers respond to a request for
retransmission from volatile proxy readers and the way the
in-sync/out-of-sync setting of a reader with respect to a proxy-writer
is used.  The first makes it safe for a Cyclone reader to ask a Cyclone
writer for all data (all these details not being covered in the specs it
errs on the reasonable side for other vendors, but that may cause the
data loss mentioned above): the writer simply send a Gap message to the
reader for all the sequence numbers prior to the matching.

The second changes the rule for switching from out-of-sync to in-sync:
that transition is now simply once the next sequence number to be
delivered to the reader equals the next sequence number that will be
delivered directly from the proxy writer object to all readers.  (I.e.,
a much more intuitive notion than reaching some seemingly arbitrary
sequence number.)

To avoid duplicates the rule for delivery straight from a proxy writer
has changed: where samples were delivered from the proxy writer to all
matching readers, they are now delivered only to the matching readers
that are in-sync.  To avoid ordering problems, the idea that historical
data can be delivered through the asynchronous delivery path even when
the regular data goes through the synchronous delivery path has been
abandoned.  All data now always follows the same path.

As these same mechanisms are used for getting historical data into
transient-local readers, the ordering problem for the historical data
also disappeared.

The test stuff in src/core/xtests/initsampledeliv covers a lot of the
interesting cases: data published before the existene of a reader, after
it, mixes of volatile and transient-local.  Running them takes quite a
bit of time, and they are not yet integrated in the CI builds (if ever,
because of that time).

Note: the "conservative built-in startup" option has been removed,
because it really makes no sense to keep a vague compatibility option
added a decade ago "just in case" that has never been used ...

Note: the workaround in the src/mpt/tests/basic/procs/hello.c (use
transient-local to ensure delivery of data) has been removed, as has
been its workaround for the already-fixed eclipse-cyclonedds#146.

Signed-off-by: Erik Boasson <eb@ilities.com>
@eboasson eboasson merged commit a652ecb into eclipse-cyclonedds:master May 29, 2019
@eboasson eboasson deleted the init-sample-deliv branch June 3, 2019 07:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant