Overview

The jgroups-raft project is an implementation of the Raft consensus algorithm in JGroups.

It provides a consensus-based system in which a leader is elected and changes are committed by majority agreement. A fixed number of nodes form a cluster, and each node hosts a state machine. A leader is elected by consensus, and all changes happen through the leader, which replicates them to all nodes; each node appends them to its persistent log.

Because Raft guarantees that there’s only ever one leader at any time, and changes are identified uniquely, all state machines receive the same ordered stream of updates and thus have the exact same state.

Raft favors consistency over availability: in terms of the CAP theorem, jgroups-raft is a CP system. This means jgroups-raft is highly consistent, and the data replicated to nodes will never diverge, even in the face of network partitions (split brain) or restarts. In terms of the extended PACELC theorem, jgroups-raft provides the means to build PC/EC systems: consistency is chosen over availability during partitions, and over latency otherwise.

In case of a network partition, in a cluster of N nodes at least N/2+1 nodes (a majority, using integer division) have to be running and able to reach each other for the system to be available.

If, for example, 2 nodes go down in a 5-node cluster, the system can still commit changes and elect leaders, as 3 is still a majority. However, if another node goes down, the system becomes unavailable and client requests will be rejected. (Depending on configuration, there may still be some limited form of read-only availability.)
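To make the arithmetic concrete, here is a small illustrative sketch (not part of the jgroups-raft API) that computes the majority threshold for a few cluster sizes:

// Illustrative only: the Raft majority threshold is N/2+1 (integer division).
public class QuorumMath {

    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            // e.g. n=5 -> majority=3, so the cluster tolerates 2 failed nodes
            System.out.printf("cluster=%d majority=%d tolerated failures=%d%n",
                              n, majority(n), n - majority(n));
        }
    }
}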

Building jgroups-raft on top of JGroups provides the following benefits:

  • Transports already available: UDP, TCP

    • These include thread pools, priority (out-of-band, OOB) message delivery, batching, etc.

  • Variety of discovery protocols

  • Encryption, authentication, compression

  • Fragmentation, reliability over UDP

  • Multicasting for larger clusters

  • Failure detection

  • Sync/async cluster RPCs

The code required for a full Raft implementation is therefore smaller than it would have been had it been written outside of JGroups.
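As a rough sketch of what this buys in practice: a Raft-enabled node is just a regular JGroups channel whose XML configuration includes the jgroups-raft protocols. The config file name raft.xml and the cluster name below are assumptions; consult the configurations shipped with jgroups-raft.

import org.jgroups.JChannel;

// Sketch: transport, discovery, failure detection etc. all come from the
// JGroups stack defined in the XML file; jgroups-raft only adds its own
// protocols (NO_DUPES, ELECTION, RAFT, REDIRECT, CLIENT) on top.
public class RaftNode {
    public static void main(String[] args) throws Exception {
        try (JChannel ch = new JChannel("raft.xml")) {   // assumed config name
            ch.connect("raft-cluster");                  // assumed cluster name
            System.out.println("joined as " + ch.getAddress());
        }
    }
}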

The feature set of jgroups-raft includes:

  • Leader election and log replication (AppendEntries) by consensus

  • Persistent log (using LevelDB)

  • Dynamic addition and removal of cluster nodes

  • Cluster wide atomic counters

  • Replicated hash maps (replicated state machines)
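As a hedged usage sketch of the counters building block (package locations and method names may differ slightly between versions; the config file and cluster names are assumptions), a similar pattern applies to ReplicatedStateMachine:

import org.jgroups.JChannel;
import org.jgroups.raft.blocks.CounterService;

// Sketch: a cluster-wide atomic counter whose updates are committed
// through the Raft log, so all nodes agree on its value.
public class CounterDemo {
    public static void main(String[] args) throws Exception {
        try (JChannel ch = new JChannel("raft.xml")) {     // assumed config name
            CounterService counters = new CounterService(ch);
            ch.connect("counter-cluster");                 // assumed cluster name
            long v = counters.getOrCreateCounter("orders", 0).incrementAndGet();
            System.out.println("orders = " + v);
        }
    }
}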

Architecture

The architecture of jgroups-raft is shown below.

     +----------------+
     |                |
   +-+  StateMachine  |<-+
   | |                |  |
   | +----------------+  |
   |                     |
set/get                apply
   |                     |
   |  +--------------+   |
   +->|  RaftHandle  +---+
      +--------------+

       +------------+
       |  Channel   |
       +------------+

       +------------+
       |   CLIENT   |
       +------------+
       |  REDIRECT  |
       +------------+        +-------+
       |    RAFT    +--------+  Log  |
       +------------+        +-------+
       |  ELECTION  |
       +------------+
             .
             .
       +------------+
       |  NO_DUPES  |
       +------------+
             .
             .

The components that make up jgroups-raft are:

  • A JGroups protocol stack with jgroups-raft specific protocols added:

    • NO_DUPES: makes sure that a jgroups-raft node does not appear in a view more than once

    • ELECTION: handles leader election

    • RAFT: implements the Raft algorithm, i.e. appending entries to the persistent log, committing them, syncing new members, etc.

    • REDIRECT: redirects requests to the leader

    • CLIENT: accepts client requests over a socket, executes them and sends the results back to the clients

  • Channel: this is a regular JGroups JChannel or ForkChannel

  • RaftHandle: the main class for users of jgroups-raft to interact with

  • StateMachine: an implementation of the StateMachine interface, provided by the user. This is typically a replicated state machine. jgroups-raft ships with a number of building blocks implementing StateMachine, such as CounterService or ReplicatedStateMachine.

The figure above shows one node in a cluster; the other nodes have the same setup, except that every node is required to have a different raft_id (defined in RAFT). The raft_id is a string that identifies one cluster member; all members need to have different raft_ids (more on this later).
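To tie the pieces together, here is a hedged sketch of a trivial StateMachine and its wiring through RaftHandle. The method signatures follow older jgroups-raft releases and may differ in your version; apply() is invoked on every node, in commit order, for each committed log entry, and writeContentTo()/readContentFrom() transfer snapshots of the state to new or lagging members. The config file name, cluster name and the fluent raftId() call are assumptions; the raft_id can equally be set in the XML configuration.

import java.io.DataInput;
import java.io.DataOutput;
import org.jgroups.JChannel;
import org.jgroups.raft.RaftHandle;
import org.jgroups.raft.StateMachine;

// Sketch with assumed signatures: a state machine that merely counts
// committed entries; a real one would decode and apply each change.
public class CountingStateMachine implements StateMachine {
    protected long count;

    // Called on every node, in the same order, for each committed entry
    public byte[] apply(byte[] data, int offset, int length) throws Exception {
        count++;
        return null;   // no return value in this example
    }

    // Snapshot support: serialize the current state ...
    public void writeContentTo(DataOutput out) throws Exception {
        out.writeLong(count);
    }

    // ... and install a snapshot received from another member
    public void readContentFrom(DataInput in) throws Exception {
        count = in.readLong();
    }

    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel("raft.xml");                  // assumed config name
        RaftHandle handle = new RaftHandle(ch, new CountingStateMachine())
              .raftId(args[0]);                                  // unique per member
        ch.connect("raft-cluster");                              // assumed cluster name
        handle.set(new byte[0], 0, 0);  // replicate one (empty) change via the leader
    }
}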