
A Summary of Raft

daviszhen edited this page Feb 16, 2020 · 2 revisions

After many days' work, labs 2 and 3 of the MIT 6.824 course, which cover Raft, are done. Raft is a consensus algorithm published in 2014. It provides fault tolerance through replicated state machines in a distributed environment, while also aiming for understandability and ease of implementation.

The Go implementation of Raft has been uploaded here. At first, I was not familiar with Go, but with the help of Google the code was finally finished. It is worth refining in the future.

Now it is time to share some experiences. I hope they are useful to you.

Table of Contents

  • Very Useful Materials
  • Short Recap

Very Useful Materials

You'd better read these texts before coding. Take my word for it: they will save you a lot of time.

The original paper, "In Search of an Understandable Consensus Algorithm (Extended Version)", written by Diego Ongaro and John Ousterhout. You can find it on the Raft homepage. In particular, the essential parts of Raft are covered in Figure 2 of the paper. It is helpful to follow its description exactly. Of course, the following texts will also help you a lot.

Students' Guide to Raft, written by Jon Gjengset. If you skip it, you will regret it with tears.

MIT 6.824 lectures, FAQs, and lab introductions about Raft. These materials provide more details about Raft from a broader view.

My reviews of Raft:

Short Recap

With 2N+1 servers, Raft can tolerate N failures among them. Each server in Raft has exactly one of three roles: Follower, Candidate, or Leader. At any time there is at most one leader, which is the Election Safety Property. The leader must receive votes from a majority, and it serves clients during its term.
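As a quick sanity check on the 2N+1 arithmetic, the quorum size can be computed like this (a trivial sketch; the function name is my own):

```go
package main

import "fmt"

// majority returns the minimum number of servers that form a
// quorum in a cluster of size n (e.g. 3 of 5, 4 of 7).
func majority(n int) int {
	return n/2 + 1
}

func main() {
	// With 2N+1 = 5 servers, N = 2 failures are tolerated:
	// a quorum of 3 still exists among the remaining servers.
	fmt.Println(majority(5)) // prints 3
	fmt.Println(majority(7)) // prints 4
}
```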

A server's role transitions among Follower, Candidate, and Leader, and the server behaves differently in each role. Only the leader serves clients and replicates their requests as log entries to the majority. Followers respond to candidates and the leader. Candidates request votes from the majority.

Server states
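The three roles and the main transitions between them can be sketched as follows. This is only an illustration under my own naming; a real implementation would drive these transitions from timers and RPC handlers.

```go
package main

import "fmt"

// Role is one of the three Raft server roles.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

func (r Role) String() string {
	return [...]string{"Follower", "Candidate", "Leader"}[r]
}

// nextRole sketches the main transitions: seeing a higher term
// demotes any server to follower; a follower whose election timer
// fires becomes a candidate; a candidate winning a majority of
// votes becomes the leader.
func nextRole(r Role, electionTimeout, wonElection, sawHigherTerm bool) Role {
	switch {
	case sawHigherTerm:
		return Follower
	case r == Follower && electionTimeout:
		return Candidate
	case r == Candidate && wonElection:
		return Leader
	default:
		return r
	}
}

func main() {
	r := Follower
	r = nextRole(r, true, false, false) // timeout: start an election
	r = nextRole(r, false, true, false) // majority of votes: become leader
	fmt.Println(r) // prints Leader
}
```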

As mentioned above, operations from clients are put into the log of the leader. The log consists of many entries. Each entry holds an operation from a client, the index of the entry in the log, and the term of the leader that created it.
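An entry as described above can be modeled as a small struct (the field names here are my own sketch, not from any particular implementation):

```go
package main

import "fmt"

// LogEntry holds one operation from a client together with its
// position in the log and the term of the leader that created it.
type LogEntry struct {
	Command interface{} // the client's operation
	Index   int         // position of the entry in the log
	Term    int         // term of the leader that appended it
}

func main() {
	e := LogEntry{Command: "x=1", Index: 1, Term: 1}
	fmt.Printf("%+v\n", e)
}
```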

Then the leader tries to replicate entries to the majority. Once the majority reaches consensus on the entries, they are committed. Raft guarantees that committed entries will be present in the logs of the leaders of all higher-numbered terms. This is the Leader Completeness Property.

There are two other properties about the entries in the log. One is the Leader Append-Only Property: the leader never overwrites or deletes entries in its log; it only appends new entries.

Another is the Log Matching Property: if the logs of two different servers contain an entry with the same index and term, then the logs are identical in all entries up through that entry.

Note that the replicated state machine is different from the replicated log. The replicated log and the state machine together form the replicated state machine. If a server has applied the entries in its log to its state machine in order, every other server will eventually apply the same entries in the same order. This is what the State Machine Safety Property claims.

These five properties together ensure that the replicated logs and the state machines on all servers eventually become consistent. This is what the replicated state machine technique really means.

Let’s see what a real log looks like in a moment.

Logs

Now, how logs are replicated should also be explained clearly.

Raft defines two kinds of RPCs (Remote Procedure Calls) for servers to communicate with each other and synchronize logs.

The AppendEntries RPC is sent from the leader to followers both as a heartbeat and to replicate log entries. In a heartbeat, the RPC carries an empty array of entries; when replicating log entries, it carries at least one entry. That is the only difference between the two cases, and followers handle them in the same way. When a follower acknowledges an AppendEntries RPC, it means the follower's entries match the leader's entries up through prevLogIndex, an important argument of the RPC.

AppendEntries RPC
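The arguments and results of AppendEntries, following Figure 2 of the paper, can be written as Go structs like this (the Go layout is my own sketch):

```go
package main

import "fmt"

// LogEntry is one entry of the replicated log.
type LogEntry struct {
	Command interface{}
	Index   int
	Term    int
}

// AppendEntriesArgs carries the arguments named in Figure 2.
type AppendEntriesArgs struct {
	Term         int        // leader's term
	LeaderId     int        // so followers can redirect clients
	PrevLogIndex int        // index of the entry immediately preceding the new ones
	PrevLogTerm  int        // term of the entry at PrevLogIndex
	Entries      []LogEntry // empty for a heartbeat
	LeaderCommit int        // leader's commitIndex
}

// AppendEntriesReply carries the results named in Figure 2.
type AppendEntriesReply struct {
	Term    int  // currentTerm, for the leader to update itself
	Success bool // true if the follower matched PrevLogIndex and PrevLogTerm
}

// isHeartbeat captures the only difference between the two uses of
// the RPC: a heartbeat simply carries no entries.
func isHeartbeat(a AppendEntriesArgs) bool {
	return len(a.Entries) == 0
}

func main() {
	hb := AppendEntriesArgs{Term: 3, LeaderId: 0}
	fmt.Println(isHeartbeat(hb)) // prints true
}
```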

The RequestVote RPC is invoked by candidates to gather votes from other followers and candidates. Whether a follower or candidate grants its vote to a candidate requires careful design so as not to violate the Election Safety and Leader Completeness Properties; that is, at most one leader can be elected in a term.

RequestVote RPC
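The arguments and results of RequestVote also follow Figure 2. The voting-side comparison mentioned above is the "at least as up-to-date" check from Section 5.4.1 of the paper: compare the terms of the last log entries first, then the log lengths. A sketch (the Go layout and function name are my own):

```go
package main

import "fmt"

// RequestVoteArgs carries the arguments named in Figure 2.
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateId  int // candidate requesting the vote
	LastLogIndex int // index of the candidate's last log entry
	LastLogTerm  int // term of the candidate's last log entry
}

// RequestVoteReply carries the results named in Figure 2.
type RequestVoteReply struct {
	Term        int  // currentTerm, for the candidate to update itself
	VoteGranted bool // true means the candidate received the vote
}

// upToDate reports whether the candidate's log is at least as
// up-to-date as the voter's: higher last term wins; equal last terms
// fall back to comparing log lengths. This check protects the
// Leader Completeness Property.
func upToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex int) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

func main() {
	// A candidate whose last entry has a higher term wins the comparison,
	// even against a longer log.
	fmt.Println(upToDate(3, 5, 2, 9)) // prints true
}
```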

Now a question pops up: how do we put the properties, roles, logs, and RPCs above together into a working, dynamic system? More state and algorithms are required to bridge these pieces.

The State part of Figure 2 in the paper gives the essential information for connecting the pieces together.

State of the server

The currentTerm is the latest term the server has seen.

The votedFor is the candidate the server has voted for in the current term.

The log[] is the container of the server's entries.

These three pieces of state must be persisted on all servers. They let servers recover correctly even after crashing and rebooting.
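A minimal sketch of serializing exactly these three fields, using the standard encoding/gob package (the 6.824 labs provide their own persister; this is only an illustration of what gets written to stable storage):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type LogEntry struct {
	Command string
	Index   int
	Term    int
}

// persistentState bundles the three fields that must survive a crash.
type persistentState struct {
	CurrentTerm int
	VotedFor    int
	Log         []LogEntry
}

// encodeState serializes the persistent state to bytes that could be
// handed to stable storage.
func encodeState(s persistentState) []byte {
	var buf bytes.Buffer
	gob.NewEncoder(&buf).Encode(s)
	return buf.Bytes()
}

// decodeState restores the state after a reboot.
func decodeState(data []byte) persistentState {
	var s persistentState
	gob.NewDecoder(bytes.NewBuffer(data)).Decode(&s)
	return s
}

func main() {
	s := persistentState{CurrentTerm: 2, VotedFor: 1,
		Log: []LogEntry{{Command: "x=1", Index: 1, Term: 1}}}
	got := decodeState(encodeState(s))
	fmt.Println(got.CurrentTerm, got.VotedFor, len(got.Log)) // prints 2 1 1
}
```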

The commitIndex denotes the index of the highest log entry known to be committed. All entries with indexes less than or equal to commitIndex have been replicated to the majority.

The lastApplied is the index of the highest log entry that has been applied to the state machine. All entries with indexes less than or equal to lastApplied have been executed on the state machine.

Entries in [ lastApplied + 1, commitIndex ] in the log are waiting to be applied.
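The waiting interval can be made concrete with a small helper (my own sketch; a real implementation would feed the corresponding entries to the state machine one by one and advance lastApplied):

```go
package main

import "fmt"

// applyReady returns the indexes of entries waiting to be applied,
// i.e. the interval (lastApplied, commitIndex].
func applyReady(lastApplied, commitIndex int) []int {
	var idx []int
	for i := lastApplied + 1; i <= commitIndex; i++ {
		idx = append(idx, i)
	}
	return idx
}

func main() {
	// commitIndex = 5, lastApplied = 3: entries 4 and 5 are pending.
	fmt.Println(applyReady(3, 5)) // prints [4 5]
}
```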

The nextIndex[S] indicates the index of the next log entry to send to server S. Entries with indexes less than nextIndex[S] have already been sent.

The matchIndex[S] is the index of the highest log entry known to be replicated on server S. In other words, server S has confirmed replication of the entries up through matchIndex[S].

So server S gives no clue about whether the entries in [ matchIndex[S] + 1, nextIndex[S] - 1 ] have been replicated: they have been sent but not yet confirmed.
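This per-follower bookkeeping is volatile leader state that is reinitialized after each election, as Figure 2 prescribes: nextIndex starts just past the leader's last log entry, and matchIndex starts at 0. A sketch (field names follow the paper; the layout is my own):

```go
package main

import "fmt"

// leaderState holds the volatile per-follower bookkeeping a leader keeps.
type leaderState struct {
	nextIndex  []int // index of the next entry to send to each server
	matchIndex []int // highest entry known to be replicated on each server
}

// newLeaderState reinitializes the bookkeeping after winning an
// election: the leader optimistically assumes every follower's log
// matches its own, then backs nextIndex down on rejection.
func newLeaderState(peers, lastLogIndex int) leaderState {
	ls := leaderState{
		nextIndex:  make([]int, peers),
		matchIndex: make([]int, peers),
	}
	for i := range ls.nextIndex {
		ls.nextIndex[i] = lastLogIndex + 1
	}
	return ls
}

func main() {
	ls := newLeaderState(3, 7)
	fmt.Println(ls.nextIndex, ls.matchIndex) // prints [8 8 8] [0 0 0]
}
```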

Up to this point, the most significant components have been introduced statically. What they look like in a real, dynamic system is also outlined in Figure 2 of the paper. But implementation requires many details that are missing from the paper. The purpose of this text is to supplement those details.

Rules of servers

A server behaves differently in each role (Follower, Candidate, or Leader). Instead of putting all the logic for the three roles together, it is better to describe the details of each role independently.