update notes for raft (#156)
Signed-off-by: Xiaoming Guo <danniel1205@gmail.com>
danniel1205 authored Feb 13, 2024
1 parent 600ec41 commit f2b449a
Showing 4 changed files with 26 additions and 17 deletions.
## Safety

![config-change](resources/config-change.png)

From the above diagram, it is possible that S1 and S2 form a majority of C-old while S3, S4, and S5 form a majority of
C-new. We need to prevent two leaders, one from C-old and one from C-new, from being elected within the same term.

### Safety of adding or removing one server at a time

![change-one-member-at-a-time](resources/change-one-member-at-a-time.png)

Adding or removing only one server at a time prevents the cluster from splitting into two independent majorities, which
means it is not possible to have two leaders within the same term.
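
A minimal Go sketch (illustrative, not from the original notes) of why this works: when the two configs differ by a
single server, any majority of C-old plus any majority of C-new together exceed the size of their union, so they must
share at least one server, and that server only votes for one leader per term.

```go
package main

import "fmt"

// majority returns the minimum quorum size for a cluster of n servers.
func majority(n int) int { return n/2 + 1 }

func main() {
	// Adding one server: C-old has n servers, C-new has n+1, and their union
	// has n+1 servers. Because the two quorum sizes together exceed the union,
	// any majority of C-old and any majority of C-new share at least one
	// server, and that server only votes once per term.
	for n := 3; n <= 7; n++ {
		union := n + 1
		overlap := majority(n)+majority(n+1) > union
		fmt.Printf("add one server to %d: %d + %d > %d ? %v\n",
			n, majority(n), majority(n+1), union, overlap)
	}
}
```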

#### Workflow

- leader receives the request to change membership
- leader appends C-new to its log
- leader replicates C-new to all followers
- configuration completes once the C-new log entries are committed

---
It is possible that the leader crashes before C-new gets committed. In this case, a new leader will be elected, and the
client can retry the configuration change since it never received a response from the previous leader.
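
A hedged sketch of the leader-side workflow; the types and helpers (`node`, `Config`, `appendEntry`, `waitCommitted`)
are hypothetical placeholders, not from any particular Raft implementation.

```go
package raft

// Config describes a cluster membership; with the one-at-a-time rule,
// cNew differs from the current config by exactly one server.
type Config struct {
	Servers []string
}

type node struct {
	// log, commitIndex, peer handles, ... omitted in this sketch
}

// appendEntry appends cfg to the leader's own log, starts replicating it to
// the followers, and returns the entry's log index. (stub)
func (n *node) appendEntry(cfg Config) uint64 { return 0 }

// waitCommitted blocks until the leader's commitIndex reaches idx. (stub)
func (n *node) waitCommitted(idx uint64) error { return nil }

// ChangeMembership follows the workflow above: append C-new, replicate it,
// and answer the client only once the entry is committed. If the leader
// crashes before that, the client gets no response and retries the same
// change against the next leader.
func (n *node) ChangeMembership(cNew Config) error {
	idx := n.appendEntry(cNew)
	return n.waitCommitted(idx)
}
```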

### Safety of adding or removing arbitrary servers at a time

The solution is described in section 4.3 of the [paper](https://github.com/ongardie/dissertation/blob/master/stanford.pdf), which uses two phases.

![joint-consensus](resources/joint-consensus.png)

- Client sends a config change request to leader
- Leader enters the joint consensus phase by adding a configuration change entry to its log describing the joint
consensus phase (the cluster consists of both the new and the old configurations)
- Store C[old,new] as a log entry and replicate it to all 5 servers
- The config change log entry is applied immediately on receipt
- Joint consensus from both the C[old] servers and the C[new] servers is needed in order to commit a log entry or elect
a new leader. (If we had 3 servers and the new configuration has 9 servers, joint consensus needs 2/3 + 5/9 to reach
majority; see the sketch after this list)
- Once the C[old,new] log entries are committed, the leader creates a C[new] log entry and replicates it to all servers
- Once the C[new] log entries are committed, the old config becomes irrelevant and the cluster is under the new config
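
A small sketch (hypothetical names, not from the notes) of the commit rule while C[old,new] is in effect: an entry only
counts as committed when it is stored on a majority of C-old and a majority of C-new, and the same dual-majority check
applies when counting election votes.

```go
package raft

// committedUnderJointConsensus reports whether an entry stored on the servers
// in replicatedOn can be committed while C[old,new] is in effect: it needs a
// majority of C-old AND a majority of C-new. The same dual check is used when
// counting votes for a new leader during this phase.
func committedUnderJointConsensus(replicatedOn map[string]bool, cOld, cNew []string) bool {
	return hasMajority(replicatedOn, cOld) && hasMajority(replicatedOn, cNew)
}

func hasMajority(replicatedOn map[string]bool, servers []string) bool {
	count := 0
	for _, s := range servers {
		if replicatedOn[s] {
			count++
		}
	}
	return count > len(servers)/2 // e.g. 2 of 3 old servers and 5 of 9 new servers
}
```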

## Availability

### Availability of adding or removing one server at a time

#### Catching up new servers

---

Term: Election + Normal operation under a single leader

- Term ID is increment-only
- 0 or 1 leader per term
- Each server maintains the current term on disk (see the sketch below)
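
A sketch of the persistent state this implies, assuming a simple JSON file on disk (the file format and the `votedFor`
field are illustrative choices): currentTerm only ever increases and must be written durably before the server answers
RPCs, so it survives restarts.

```go
package raft

import (
	"encoding/json"
	"os"
)

// persistentState is the per-server state that must survive restarts.
// currentTerm only ever grows; votedFor records the single vote cast in it.
type persistentState struct {
	CurrentTerm uint64 `json:"currentTerm"`
	VotedFor    string `json:"votedFor"` // empty if no vote cast this term
}

// save writes the state durably; a server would call this before replying to
// any RPC that changed currentTerm or votedFor.
func save(path string, st persistentState) error {
	b, err := json.Marshal(st)
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o600)
}
```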

Heartbeats and Timeouts

- Servers start up as followers
- Leaders must send `heartbeats` (empty AppendEntries RPCs) to maintain authority
- Followers expect to receive RPCs from the leader
- If no RPCs arrive from the leader within `electionTimeout` (100-500ms), the follower assumes the leader has crashed
and starts a new election (see the timer sketch below).
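
A sketch of the follower's timer loop implied by the bullets above (the `heartbeat` channel and `startElection`
callback are hypothetical): every RPC from the leader resets the timer, and if it fires the follower starts an election.

```go
package raft

import (
	"math/rand"
	"time"
)

// runFollowerTimer models the follower's view: every RPC from the leader
// (delivered here on the heartbeat channel) resets the timer; if no RPC
// arrives within the randomized electionTimeout, the follower assumes the
// leader crashed and starts an election.
func runFollowerTimer(heartbeat <-chan struct{}, startElection func()) {
	for {
		// Roughly 100-500ms, re-randomized on every reset.
		timeout := 100*time.Millisecond + time.Duration(rand.Intn(400))*time.Millisecond
		select {
		case <-heartbeat:
			// Heard from the leader (possibly an empty AppendEntries): stay follower.
		case <-time.After(timeout):
			startElection()
			return
		}
	}
}
```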

Election basics when a server starts an election:

- Increment current term
- Change to `candidate` state
- Vote for self
- Send `RequestVote` RPCs in parallel to all other servers, retries until
  - Receive votes from a majority (`n/2 + 1`) of servers
    - Become leader
    - Send `heartbeats` to all other servers

Election safety: At most one leader per term

- Each server gives one vote per term
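
A sketch of how "one vote per term" is enforced in the `RequestVote` handler (hypothetical types; the additional check
that the candidate's log is sufficiently up to date is omitted here): the vote is granted only if the server has not
already voted for a different candidate in this term, and `currentTerm`/`votedFor` are persisted before replying.

```go
package raft

// voteRequest carries the fields of RequestVote that matter for this sketch.
type voteRequest struct {
	Term      uint64
	Candidate string
}

type server struct {
	currentTerm uint64
	votedFor    string // "" means no vote cast in currentTerm
}

// handleRequestVote grants at most one vote per term: once votedFor is set for
// the current term, every other candidate in that term is refused, so two
// candidates can never both reach a majority in the same term.
func (s *server) handleRequestVote(req voteRequest) (granted bool) {
	if req.Term < s.currentTerm {
		return false // stale candidate
	}
	if req.Term > s.currentTerm {
		s.currentTerm = req.Term // newer term: adopt it, forget any old vote
		s.votedFor = ""
	}
	if s.votedFor == "" || s.votedFor == req.Candidate {
		s.votedFor = req.Candidate // persist currentTerm and votedFor before replying
		return true
	}
	return false
}
```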

Election liveness: Some candidate must eventually become a leader

- Each server chooses `electionTimeout` randomly between [t, 2t]

Because of the above properties, there is an `AppendEntries` consistency check:

![append-entries-consistency-check](resources/appendentries-consistency-check.png)
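
A sketch of that check from the follower's point of view (hypothetical `entry` type, 1-based log indexing): the RPC is
accepted only if the follower has an entry at `prevLogIndex` whose term matches `prevLogTerm`; otherwise it rejects and
the leader retries with an earlier index.

```go
package raft

// entry is a single log entry; only the term matters for the check.
type entry struct {
	Term uint64
	// Command, index, ... omitted
}

// consistencyCheckOK is the follower side of the check: accept AppendEntries
// only if the log contains an entry at prevLogIndex whose term matches
// prevLogTerm. Logs are treated as 1-indexed; prevLogIndex 0 means "nothing
// precedes these entries".
func consistencyCheckOK(log []entry, prevLogIndex, prevLogTerm uint64) bool {
	if prevLogIndex == 0 {
		return true
	}
	if prevLogIndex > uint64(len(log)) {
		return false // follower's log is too short
	}
	return log[prevLogIndex-1].Term == prevLogTerm
}
```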

## Leader changes

When the leader changes, logs among servers might not be identical. The leader's log is the only truth, and eventually
the leader makes the followers' logs identical to its own.

This guarantees S4 and S5 will NOT be elected as the new leader from the following

However, the following case will still mess things up. The leader in Term2 only replicated entries on S1 and S2 before
its term ended. S5 was elected as leader in Term3, appended entries to its own log, and then crashed. S1 is the current
leader and is trying to finish committing the entry from Term2. Now entry 2 is replicated on [S1, S2, S3], but it is
not safely committed, since S5 could still be elected as leader in Term5 and would broadcast its entry 3 to
[S1, S2, S3]; in that case we would lose entry 2 even though it had already been treated as committed.

![pick-best-leader](resources/pick-best-leader-2.png)

For a leader to decide an entry is committed:

- Must be stored on a majority of servers
- At least one new entry (`4` in purple) from the leader's current term must also be stored on a majority of servers

![new-commitment-rule](resources/new-commitment-rules.png)
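
A sketch of how a leader could apply this rule when advancing its commit index (names are illustrative): take the
highest index replicated on a majority, but only commit it if that entry carries the leader's current term; earlier
entries then become committed indirectly.

```go
package raft

import "sort"

// newCommitIndex advances the leader's commit index only to an entry that is
// replicated on a majority of servers AND belongs to the leader's current
// term; entries from earlier terms (like entry 2 above) become committed
// indirectly once such an entry commits. matchIndex holds, for every server
// including the leader itself, the highest log index known to be stored there.
func newCommitIndex(matchIndex []uint64, logTerm func(idx uint64) uint64,
	currentTerm, commitIndex uint64) uint64 {

	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	// With the slice sorted descending, index len/2 is the highest index that
	// a majority of servers have reached.
	candidate := sorted[len(sorted)/2]
	if candidate > commitIndex && logTerm(candidate) == currentTerm {
		return candidate
	}
	return commitIndex
}
```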

If entry 4 is committed, then S5 cannot be elected as leader at term 5.

## How to make log entries identical after leader changes

![](resources/leader-change-log-inconsistency.png)

- Leader deletes extraneous entries of followers
- Leader fills in missing entries of followers

![](resources/repair-follower-logs.png)
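
As a complement to the leader-side pseudocode below, a hedged sketch of what this repair looks like on the follower
once the consistency check passes (reusing the hypothetical `entry` type from the earlier sketch): conflicting entries
are deleted together with everything after them, and the leader's entries fill in the rest.

```go
package raft

// appendFromLeader applies the leader's entries once the consistency check at
// prevLogIndex has passed: an existing entry that conflicts with the leader's
// entry at the same index (different term) is deleted together with everything
// after it, and the missing entries are appended. Returns the repaired log.
func appendFromLeader(log []entry, prevLogIndex uint64, leaderEntries []entry) []entry {
	for i, e := range leaderEntries {
		idx := prevLogIndex + uint64(i) + 1 // 1-based index of this entry
		if idx <= uint64(len(log)) {
			if log[idx-1].Term == e.Term {
				continue // already present, keep it
			}
			log = log[:idx-1] // conflict: delete this entry and all that follow it
		}
		log = append(log, e)
	}
	return log
}
```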

``` text
- leader keeps nextIdx for each follower, nextIdx initialized to leader's last index + 1
- leader sends the preceding log index and term with the appendEntries RPC for consistency check
```

uses two phases.

---
The above solution works, but Raft now uses a simpler solution described in section 4.2 of the [paper](https://github.com/ongardie/dissertation/blob/master/stanford.pdf).

See [deep-dive-config-change](./deep-dive-config-change.md) for more details.

## Reading materials
