
#5510 - Pull request fastrtps feature/discovery-server #562

Merged
94 commits merged on Jul 2, 2019

Conversation

@MiguelBarro (Contributor) commented Jun 14, 2019

New concepts:

  • PDP is now stateful, so:

  • StatefulReader must be modified to allow incoming traffic from trustedWriterEntityId.

  • Clients must know the server's guidPrefix beforehand.

  • Clients must send periodic DATA(p) announcements directly to their servers, outside the StatefulWriter
    implementation, until the connection is acknowledged. This is because the StatefulWriter would only relay Heartbeats to the server (not enough information provided).

  • PDPServer has a Writer cache with many DATA(p) stored (the actual discovery information), so its announcement mechanism must be modified. Basically it performs direct announcement. Announcement goes like this:

  • PDP superclass. Merely updates a single cache entry on creation/destruction. Ordinarily does nothing.

  • PDPSimple subclass. PDP behavior, but forces a resend of the cache entry on ordinary announcement.

  • PDPClient subclass. PDP behavior on creation, but on disposal and ordinary announcement it directly sends the DATA(p) to its servers.

  • PDPServer subclass. PDP behavior on creation, but on disposal and ordinary announcement it directly sends the DATA(p) to its clients and unmatched servers.

Note that all PDP subclasses have a ResendParticipantProxyDataPeriod timed event for lease duration announcement.

  • DSClientEvent and DSServerEvent synchronize EDP matching.

#4522 Splitting PDPSimple functionality between a PDP superclass and PDPSimple:

  • deciding which methods and members belong in each class.

  • deciding which methods should be virtual.

  • EDP should stay within PDP.

  • PDPSimpleListener becomes PDPListener and its references to PDP are updated to the superclass.

  • Update BuiltinProtocols to reference the PDP superclass and a server list.

#4522 PDPServer implementation:

  • Stateful entities introduced for PDP discovery (CreatePDPEndpoints override).

  • EDP classes updated to reference the PDP superclass.

  • Implement the client-server and server-server PDP handshake:

  • Mimic the StatelessReader capability to accept incoming messages from the trusted trustedWriterEntityId. Thus servers can accept incoming client traffic and automatically match the client's PDP entities.

  • Clients must know the servers' guidPrefix beforehand to allow matching, because as things stand Writer or ReaderProxies cannot be created without a proper guidPrefix.

  • A specific announcement from clients to servers must be added because plain stateful writer behavior only sends an initial DATA(p) and afterwards only Heartbeats. These Heartbeats don't provide enough information (they lack locators) for the server to properly identify a newly arrived client.

  • Virtual PDP::initializeParticipantProxyData and PDP::initPDP() overrides take over PDP and EDP settings.

  • Virtual PDP::announceParticipantState(bool new_change, bool dispose): this method and its overrides deal with quite complex behavior. Note that new_change = false on ordinary announcement and true on disposal and PDP startup.

  • A WriteParams argument is now added (TODO: explain why?).

  • PDP defaults to the usual PDPSimple behavior. PDPSimple only has one entry in the Writer's cache ... its own PDP info:

    • for the first discovery announcement this entry is removed and restored.

    • for further discovery announcements nothing is done.

    • for the disposal announcement this entry is removed and replaced by a special NOT_ALIVE_DISPOSED_UNREGISTERED one.

  • PDPSimple relies on the PDP implementation for startup and disposal, but for ordinary announcement it calls the PDP Writer::unsent_changes_reset() to force the Writer's cache to be resent. Note the PDP Writer is stateless, so an actual DATA(p) will be sent.

  • PDPClient doesn't rely on the PDP implementation. It directly resends the DATA(p) stored in the Writer's cache to its servers using an RTPSMessageGroup. Note that PDPSimple has a stateless Writer but PDPClient has a stateful one (updating the writer's cache would only lead to Heartbeats being sent).

On disposal the PDP implementation is called to update the Writer's cache, but once again an RTPSMessageGroup is used to notify the servers. This must be done because we cannot rely on the Stateful Writer sending the info to the servers before the participant is actually destroyed ... thus we send it ourselves.

  • PDPServer only relies on the PDP implementation for startup. On disposal it creates and sends its own demise cache change. As in PDPClient, the PDP Writer is stateful, so direct sending through RTPSMessageGroup ensures the change will actually be delivered before participant destruction.

Ordinary announcements are sent directly (via RTPSMessageGroup, because we want actual DATA(p)s and not merely Heartbeats) to all clients and non-engaged servers (engaged servers are actually clients :-). Why should servers make ordinary announcements if their addresses and guidPrefix are well known? Because of the liveliness mechanism (liveliness subtleties to be discussed; TODO: point to where they are discussed).
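The per-subclass announcement behavior described above can be sketched as a minimal model. These are hypothetical simplified classes (not the actual Fast-RTPS implementation); the recorded action strings are invented for illustration only:

```cpp
#include <string>
#include <vector>

// Simplified model: each PDP subclass overrides announceParticipantState;
// actions records what each call did instead of touching a real writer.
struct PDPModel
{
    std::vector<std::string> actions;

    virtual ~PDPModel() = default;

    // new_change = true on startup and disposal, false on ordinary announcement
    virtual void announceParticipantState(bool new_change, bool dispose)
    {
        if (new_change && !dispose)
        {
            actions.push_back("update DATA(p) entry in writer cache");
        }
        else if (dispose)
        {
            actions.push_back("replace entry with NOT_ALIVE_DISPOSED_UNREGISTERED");
        }
        // ordinary announcement: base PDP does nothing
    }
};

struct PDPSimpleModel : PDPModel
{
    void announceParticipantState(bool new_change, bool dispose) override
    {
        PDPModel::announceParticipantState(new_change, dispose);
        if (!new_change && !dispose)
        {
            // stateless writer: resetting unsent changes resends actual DATA(p)s
            actions.push_back("unsent_changes_reset on stateless writer");
        }
    }
};

struct PDPClientModel : PDPModel
{
    void announceParticipantState(bool new_change, bool dispose) override
    {
        if (new_change && !dispose)
        {
            // startup: rely on plain PDP behavior
            PDPModel::announceParticipantState(new_change, dispose);
            return;
        }
        if (dispose)
        {
            // disposal: update the cache first, then notify directly
            PDPModel::announceParticipantState(new_change, dispose);
        }
        // the stateful writer would only send Heartbeats, so send the DATA(p)
        // ourselves through an RTPSMessageGroup
        actions.push_back("RTPSMessageGroup: send DATA(p) to servers");
    }
};
```

A PDPServer would follow the same pattern as the client, targeting its clients and unmatched servers instead.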

#4955 Server logic implementation on PDPServer.

  • In order to work around stateful entity discovery issues (basically that Heartbeats don't provide enough information for a server to match a client) a temporary periodic announcement is introduced. The client pings the server with DATA(p)s until it matches its PDP endpoints.

  • DSClientEvent and DSServerEvent are introduced to provide the PDPClient and PDPServer matching logic. Basically, announcements are made to those servers that haven't acknowledged the client's DATA(p). Once all matched servers (not the whole list, only those matched) have acknowledged the client's DATA(p), and the client has received all the servers' discovery data, the EDP matching is done (to those with PDP matched). This way no EDP data can ever be lost (EDP data can only be ignored if the client receives EDP datagrams from an unknown participant).
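The client-side gating just described can be sketched as follows. `ServerState` and `may_match_servers_EDP_endpoints` are invented names for this model; the two predicate names mirror the PDPClient methods used in this PR:

```cpp
#include <vector>

// Model of the DSClientEvent gating: EDP matching is deferred until every
// PDP-matched server acknowledged the client's DATA(p) and the client holds
// that server's discovery data.
struct ServerState
{
    bool pdp_matched;     // the client's PDP endpoints matched this server
    bool acked_our_data;  // server acknowledged our DATA(p)
    bool data_received;   // we received this server's discovery data
};

bool all_servers_acknowledge_PDP(const std::vector<ServerState>& servers)
{
    for (const auto& s : servers)
    {
        // only servers that are already PDP-matched are required to ACK
        if (s.pdp_matched && !s.acked_our_data)
        {
            return false;
        }
    }
    return true;
}

bool is_all_servers_PDPdata_updated(const std::vector<ServerState>& servers)
{
    for (const auto& s : servers)
    {
        if (s.pdp_matched && !s.data_received)
        {
            return false;
        }
    }
    return true;
}

// EDP endpoints are matched only when both conditions hold, so no EDP data
// can arrive from a participant the client does not yet know.
bool may_match_servers_EDP_endpoints(const std::vector<ServerState>& servers)
{
    return all_servers_acknowledge_PDP(servers) &&
            is_all_servers_PDPdata_updated(servers);
}
```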

Note that PDP endpoint matching is:

  • done on creation for clients (see PDPClient::createPDPEndpoints; it matches its servers).

  • PDPListener and PDPServerListener call the PDP::assignRemoteEndpoints overrides when a DATA(p) is received:

    • PDPSimple::assignRemoteEndpoints and PDPServer::assignRemoteEndpoints do the match.

    • PDPClient::assignRemoteEndpoints does nothing.

The PDP::removeRemoteEndpoints overrides in PDPClient and PDPServer only unmatch the client's PDP endpoints ... the server's PDP endpoints are automatically rematched.

Note that EDP endpoints are created on initialization (see initEDP) but matching depends on the PDP subclass (through a PDP call to getEDP()->assignRemoteEndpoints() or mp_EDP):

  • PDPSimple: whenever an unknown DATA(p) is received by PDPListener.

  • PDPClient: by the DSClientEvent, when PDPClient::all_servers_acknowledge_PDP() and PDPClient::is_all_servers_PDPdata_updated(), by calling PDPClient::match_servers_EDP_endpoints().

  • PDPServer: by the DServerEvent, but the mechanism depends on the remote participant's nature:

    • remote PDPServer. PDPServer in this case behaves almost as a client (clients don't defer the matching): it checks whether all remote servers (with PDP matched) received its data and whether all matched servers' data is up to date. Then it queues for future EDP endpoint matching the current PDP matches without EDP ones.

    • remote PDPClient. When the client's DATA(p) is received, the client is queued for EDP matching (TODO: explain the delayed mechanism). DServerEvent checks (see PDPServer::pendingEDPMatches()) whether there are EDP matchings to do and whether all clients acknowledged its data. Then and only then is the actual matching done (and the container _p2match is cleared).

Basically, the participants that are candidates for matching (already PDP matched) are stored in a _p2match container, waiting until all PDP information is shared, in order to prevent EDP info dismissal (note that EDP processing gets rid of EDP messages from unknown participants). In DSClientEvent we don't need this device because the server list acts as a container and all servers are supposed to be immortal.

Because clients are not supposed to be immortal, we need a trimming mechanism for _p2match, since some clients may disappear before the actual EDP matching is done. Whenever a DATA(Up) is received and PDPServer::removeRemoteParticipant is called, _p2match is trimmed by PDPServer::removeParticipantForEDPMatch.

This mechanism guarantees PDP and EDP info awareness in all network nodes.
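The _p2match bookkeeping above can be sketched with a minimal model (GUIDs simplified to integers; the class and method names are invented for illustration, though they mirror the PDPServer methods named in this PR):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// PDP-matched participants wait here until every client has acknowledged the
// server's data; participants that vanish first are trimmed so they never
// reach EDP matching.
struct EDPMatchQueue
{
    std::vector<std::uint64_t> p2match;  // PDP matched, awaiting EDP match

    void queue_participant(std::uint64_t guid)
    {
        if (std::find(p2match.begin(), p2match.end(), guid) == p2match.end())
        {
            p2match.push_back(guid);
        }
    }

    // called when a DATA(Up) triggers removeRemoteParticipant
    void remove_participant_for_EDP_match(std::uint64_t guid)
    {
        p2match.erase(std::remove(p2match.begin(), p2match.end(), guid),
                p2match.end());
    }

    bool pending_EDP_matches() const
    {
        return !p2match.empty();
    }

    // once all clients acknowledged the server's data, match and clear
    std::vector<std::uint64_t> do_EDP_matching(bool all_clients_acked)
    {
        std::vector<std::uint64_t> matched;
        if (all_clients_acked)
        {
            matched.swap(p2match);  // _p2match is cleared after matching
        }
        return matched;
    }
};
```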

#4522 PDPServer implementation:

  • A RemoteServerAttributes auxiliary class is created.

  • If the server must provide persistence (discovery data stored in a SQLite file database) we must specify its PDPWriter durability as TRANSIENT instead of TRANSIENT_LOCAL, but the PDPReader must always be TRANSIENT. So PDPWriter durability depends on server settings.

#4952 Updating XMLParser to support client-server functionality. Profile XML schema modifications and validation.

#4953 - StatefulReader modifications to allow PDP client-server traffic

  • Modified StatefulReader::acceptMsgFrom to accept messages from unknown participants matching m_trustedWriterEntityId (set with RTPSReader::setTrustedWriter). RTPSParticipantImpl::createReader will call it automatically if the isBuiltin flag is passed.

  • StatefulReader::processDataFragMsg, StatefulReader::processHeartbeatMsg, StatefulReader::processGapMsg: in all these cases I just extended the condition by adding pWP != nullptr. This keeps everything as it was before, discarding any message from an unknown writer. StatefulReader::processDataMsg and StatefulReader::change_received were modified to allow stateless-fashion handling of framework messages.
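The trusted-writer acceptance rule can be sketched as a small model (not the actual StatefulReader code; entity ids are simplified to integers and the WriterProxy lookup is replaced by a flag). 0x000100c2 is the standard RTPS SPDP builtin participant writer entity id:

```cpp
#include <cstdint>

constexpr std::uint32_t c_EntityId_Unknown = 0x0;
// RTPS ENTITYID_SPDP_BUILTIN_PARTICIPANT_WRITER
constexpr std::uint32_t c_EntityId_SPDPWriter = 0x000100c2;

struct ReaderModel
{
    std::uint32_t trusted_writer_entity_id = c_EntityId_Unknown;
    bool has_matched_writer = false;  // stands in for the WriterProxy lookup

    // done automatically for builtin readers (isBuiltin flag)
    void set_trusted_writer(std::uint32_t entity_id)
    {
        trusted_writer_entity_id = entity_id;
    }

    bool accept_msg_from(std::uint32_t writer_entity_id) const
    {
        // stateful rule: accept messages from matched writers...
        if (has_matched_writer)
        {
            return true;
        }
        // ...extended: or from the trusted builtin writer, so that servers
        // can process a yet-unmatched client's DATA(p) statelessly
        return trusted_writer_entity_id != c_EntityId_Unknown &&
                writer_entity_id == trusted_writer_entity_id;
    }
};
```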

#4955 - Server logic implementation on PDPServer
#4977 - Create the PDPServer methods for Writer manipulation
#4978 - Create the EDPServer methods for Writer manipulation

  • Note that PDPServerListener must add the received discovery info to the server's PDP Writer cache to make it available to all clients and other servers. Note also that this Writer cache must be trimmed of all DATA(p) associated with demised participants, which are replaced by DATA(Up) signaling their death. The methods PDPServer::addRelayedChangeToHistory and PDPServer::removeParticipantFromHistory perform this function.

  • Server EDP endpoints should also keep all relayed client discovery data and trim it whenever a participant is killed. New EDP listener classes were developed, EDPServerPUBListener and EDPServerSUBListener, and a common EDPListener superclass was provided for them and for EDPSimplePUBListener and EDPSimpleSubListener. PDPClient and PDPSimple manage EDPSimple objects, but PDPServer manages EDPServer objects with EDPServerXXXListeners. As in PDPServer, a container of demised pubs and subs is needed, together with a trim mechanism, provided by EDPServer::addXXXFromHistory and EDPServer::removeXXXFromHistory.

  • PDP/EDP Writer History strategy for picking up specific participant data and avoiding duplications. The main issue is that the CacheChange_t data is serialized (see the SerializedPayload_t serializedPayload member), thus not readily available. Two strategies were developed:

  • In order to trim (remove) writer changes without deserialization, we resorted to the InstanceHandle_t instanceHandle member, which actually keeps the participant GUID (analogous to the ParticipantProxyData::m_key member).

  • In order to easily find participant-related changes in the writer history, we resorted to the WriteParams write_params member. Whenever the PDP or EDP server listeners receive a CacheChange_t, they first check whether it is already stored by comparing the sample identities. This WriteParams structure isn't currently being used in the library (maybe it was there for some external tool ... Julian mentioned some RPC use).

Because we are using them only in metadata, and only if the discovery server is used, I believe it is harmless to profit from them, but in the future another mechanism should be devised (the problem remains the deserialization one).

Note that whenever a client or server makes an announcement, it fills the CacheChange_t WriteParams with the PDPWriter GUID and sequence number. All meta DATA can then be univocally identified by any participant in the network.
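The two strategies can be sketched together in a minimal model (GUIDs and sequence numbers simplified to integers; the struct and method names are invented for illustration). The point is that both operations use only metadata, never the serialized payload:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct ChangeModel
{
    std::uint64_t instance_handle;  // keeps the participant GUID (the key)
    // writer GUID + sequence number, as filled in WriteParams on announcement
    std::pair<std::uint64_t, std::uint64_t> sample_identity;
};

struct HistoryModel
{
    std::vector<ChangeModel> changes;

    // strategy 2: detect already-relayed changes by comparing sample
    // identities, without deserializing the payload
    bool add_if_new(const ChangeModel& c)
    {
        for (const auto& stored : changes)
        {
            if (stored.sample_identity == c.sample_identity)
            {
                return false;  // duplicate, already stored
            }
        }
        changes.push_back(c);
        return true;
    }

    // strategy 1: trim a demised participant's data using only the
    // instance handle (participant GUID)
    std::size_t remove_participant(std::uint64_t participant_guid)
    {
        std::size_t removed = 0;
        for (auto it = changes.begin(); it != changes.end();)
        {
            if (it->instance_handle == participant_guid)
            {
                it = changes.erase(it);
                ++removed;
            }
            else
            {
                ++it;
            }
        }
        return removed;
    }
};
```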

Servers may also work as clients of another server, thus almost all of the client behavior must be mimicked. This means the DSClientEvent announcement and matching strategy, and ancillary methods like all_servers_acknowledge_PDP, is_all_servers_PDPdata_updated and match_servers_EDP_endpoints, translate to DServerEvent. PDPServer now has methods such as all_servers_acknowledge_PDP, is_all_servers_PDPdata_updated and match_servers_EDP_endpoints.

Note that, as in clients, the announcement (not initialization, but announcement) must be sent directly using RTPSMessageGroup, because the StatefulWriter would not send DATA but HEARTBEATs.

#5275 - Solving liveliness issues

A new participant liveliness mechanism for the client-server scenario was devised, staying as faithful as possible to the RTPS standard philosophy:

  • Clients should not track the liveliness of other clients (those who are not in their server list). This can be done by not using a RemoteParticipantLeaseDuration TimedEvent in their ParticipantProxyData (mp_leaseDurationTimer = nullptr). PDPSimple and PDPClient use PDPListener but must generate different participant proxy data (with or without mp_leaseDurationTimer), so a virtual PDP::createParticipantProxyData was introduced.

  • Servers should track the liveliness of the clients directly connected to them (not the clients of other servers). This validation is done within the PDPServer::createParticipantProxyData override. If the serialized data guidPrefix matches the cache writer guidPrefix, then it's not a relayed message and we must track the lease duration of this client.

  • Clients and servers should take care of their servers. If the guidPrefix matches one of the related servers, we must keep the lease duration too.

Periodic announcement DATA(p)s should be sent to assert liveliness. The ResendParticipantProxyDataPeriod event PDP::mp_resendParticipantTimer takes care of it. Note that this event just calls the PDP override announceParticipantState(false). Thus:

  • PDPClient::announceParticipantState must resend the DATA(p) to all its servers. Note that this function is called during:

    • the client's initial pinging of the servers to make itself discovered. All yet-to-be-matched servers must be posted.
    • the client's lease duration activity. All servers must be posted.
      Instead of adding a new parameter to the function, I decided to add a flag (_serverPing) to the PDPClient class. Basically, DSClientEvent sets _serverPing during a ping operation and it gets reset after the announcement.
  • PDPServer::announceParticipantState must resend the DATA(p) to all its clients and servers. Because the StatefulWriter matches are now inaccessible, PDPServer must keep an account of the matched clients.

When a server kills a client or server by lease duration, a new DATA(p[UD]) is relayed to all clients and servers to report this demise. Note that the overrides of PDP::removeRemoteParticipant may reject this DATA(p[UD]) if it announces their own death.
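The three tracking rules above can be condensed into one decision sketch (guid prefixes simplified to integers; `should_track_lease` and `Role` are invented names for this model). It decides whether a participant proxy gets a lease duration timer attached:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum class Role { Client, Server };

bool should_track_lease(
        Role local_role,
        std::uint64_t data_guid_prefix,    // prefix serialized in the DATA(p)
        std::uint64_t writer_guid_prefix,  // prefix of the writer that sent it
        const std::vector<std::uint64_t>& my_servers)
{
    // rule 3: clients and servers always track their own servers
    if (std::find(my_servers.begin(), my_servers.end(), data_guid_prefix)
            != my_servers.end())
    {
        return true;
    }
    // rule 2: a server tracks clients directly connected to it; the DATA(p)
    // is not relayed when the serialized prefix matches the sender's prefix
    if (local_role == Role::Server && data_guid_prefix == writer_guid_prefix)
    {
        return true;
    }
    // rule 1: clients ignore other clients' liveliness; relayed data from
    // other servers' clients is not tracked either
    return false;
}
```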

Solving TCP issues (#5510 - Pull request fastrtps feature/discovery-server)

TCP listening ports must be provided to the clients of a discovery server. Basically, as the current TCP transport runs:

  • all clients would be given the same physical port (if not user-provided), thus only one would actually manage to match the server endpoints.

  • if we launch the clients from different processes in order to avoid the physical port issue, then the locators in the client meta DATA would only make sense for the server. Other clients wouldn't be able to bind to these made-up listening ports.

In order to use the discovery server over TCP, it is mandatory that all participants are provided with listening ports.

  • Solved a deadlock on discovery server shutdown.

As fastrtps is currently implemented, one cannot unmatch a stateful reader from its own transport callback. That's because the transport callback locks the reader mutex, and the HeartbeatResponseDelay events associated with its WriterProxies lock this mutex as well (from the event thread). Thus the transport callback waits for the HeartbeatResponseDelay to finish, but the latter is trapped at the mutex barrier. This failure was never detected before because the only transport callbacks that unmatch their own reader are the PDP ones, and those were stateless readers.

raquelalvarezbanos and others added 30 commits March 5, 2019 08:49
* at least one multicast or unicast locator list MUST appear.
* the locator list may appear in any order

<xs:complexType name="RemoteServerAttributes">
	<xs:choice>
		<xs:sequence>
			<xs:element name="metatrafficUnicastLocatorList" type="locatorListType" minOccurs="1" />
			<xs:element name="metatrafficMulticastLocatorList" type="locatorListType" minOccurs="0" />
		</xs:sequence>
		<xs:sequence>
			<xs:element name="metatrafficMulticastLocatorList" type="locatorListType" minOccurs="1" />
			<xs:element name="metatrafficUnicastLocatorList" type="locatorListType" minOccurs="0" />
		</xs:sequence>
	</xs:choice>
	<xs:attribute name="Prefix" type="guid" use="required"/>
</xs:complexType>
…scovered. Now it stops when all are discovered and starts again when any of them vanishes.
…a listening port. It was decided by quorum that dynamic allocation of TCP listening ports for clients was error prone in some scenarios.
@richiware (Member)

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status


// //locators.push_back(ep.multicastLocatorList);
//}
// temporary workaround
for (auto & svr : mp_builtin->m_DiscoveryServers)
Member:

Not protected with the PDP mutex.

Contributor Author:

Agreed.
Before the merge with develop it wasn't needed because I got the client's locators from the Writer now I must resort to participant info.

Member:

To avoid a deadlock, remove this lock and take the mutex in the DClientEvent.

Contributor Author:

Answer included in above comment.

…t the change sequence number and atomic access to server and client lists.
richiware previously approved these changes Jun 27, 2019
@richiware richiware closed this Jul 1, 2019
@richiware richiware reopened this Jul 1, 2019
@richiware richiware changed the base branch from develop to master July 1, 2019 09:56
@richiware (Member):

Tested manually with new Jenkins jobs. Failed tests not related.

@richiware richiware merged commit 39fa601 into master Jul 2, 2019
@richiware richiware deleted the feature/discovery-server branch July 2, 2019 13:50
@MiguelCompany MiguelCompany added this to the v1.9.0 milestone Jul 10, 2019