Skip to content

Synchronization Barrier Protocol

Renato Costa edited this page Jun 7, 2018 · 2 revisions

Abstract

This describes the protocol used by a distributed PlusCal algorithm compiled by PGo. Before processes can start executing their algorithms, they need to agree whether global state has been properly initialized and to establish connections with one another. In this protocol, a process is arbitrarily chosen to play the role of coordinator and the other processes get information about node locations from it.

This protocol is currently used at least twice in an application compiled by PGo:

  • before each process starts their algorithm defined in the PlusCal spec
  • upon termination, to make sure processes holding on to global state will not terminate before other processes (that might depend on it) also terminate.

The Protocol

initialization_protocol

DIAL

When processes are created, they attempt to send a DIAL message to the coordinator (determined at compile-time). In case that fails, processes keep trying to contact the process coordinator until it finally succeeds (in the example above, P1 is started before P2). The coordinator acknowledges DIAL messages, and processes then wait for a message from the coordinator.

{ P1, ..., Pn }

As soon as the coordinator receives DIAL messages from every process, it sends each process a list of IP:port addresses where each other process in listening on. These addresses can be used to communicate across processes.

CONNECTED

When a process receives the list of other processes in the system, it establishes a connection with them. When that is completed, a CONNECTED message is sent to the coordinator to indicate that that process is ready to start any time.

START

When the coordinator received CONNECTED messages from every other process, it is then responsible for starting the actual execution of the algorithms specified in PlusCal. The coordinator sends a START message to every process, indicating that it can now start algorithm execution.

Outcomes and Assumptions

  • By the end of the protocol outlined above, all processes will be connected to each other forming a fully connected mesh topology.
  • Failures are not handled
  • Processes do not disconnect

Implementation Considerations

  • Communication happens over Go RPC, which runs on top of TCP.
  • The choice of the coordinator process is currently arbitrary and set to the first element in the list of peers declared in the configuration file.