RATIS-1864. Support lease based read-only requests #928
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
see https://issues.apache.org/jira/browse/RATIS-1864
What is a Leader Lease
In Raft, the leader is responsible for processing and coordinating client requests, replicating data among other followers, and maintaining the distributed state machine.
Vanilla Raft requires the leader to obtain majority acknowledgements before serving every read requests. During normal operations, this prerequisite leads to unnecessary rpcs exchanged among the cluster, diminished read throughput and increased latency.
The leader lease is a concept that allows the leader to maintain its leadership without obtaining majority acknowledgements for a certain period of time (lease duration), during which it can directly serve client read requests.
The requirement of Lease is first brought up by the Alluxio Community @codings-dan.
How to extend lease during normal operations
Prerequisite
Initialize
Once a leader is elected and its authority being comfirmed by majorities through successfully replicating its first no-op log, the leader gains the lease. The lease validity starts from T(0).
Renewal
As long as the leader continues to send heartbeats and receives acknowledgments from a majority of other nodes, it can renew its lease. Theoretically, if the most recent acknowledged heartbeat was sent at time T(n), the validity of the new lease commences at T(n).
In practice, rather than updating the lease with every heartbeat, we opt for a more efficient approach by lazily updating the leader's lease upon each query. Here's how it works:
At time T(n), when the leader is questioned about its authority, it first collects the send times of the last replied AppendEntries from each of its followers, denoted as TR(1), TR(2), ..., TR(2n), sorted in descending order.
Next, it selects the maximum timestamp at when the majority of followers are known to be active, that is, TR(n).
If TR(n) falls within the time range [T(n), T(n) + LeaseTimeoutDuration], then the lease can be successfully renewed.
Revoke
If the lease is expired and the leader cannot renew it, it loses the lease and stops serving read-only requests directly.
How to handle lease during configuration changes
During the configuration changes, the lease can only be renewed if acknowledgments be received by both the old group and the new group. It is the same to leader election restrictions during reconfiguration.
What to do when forced step down
When a leader is forced down, its lease should be effectively revoked.
How to handle CPU drifts
We can lower the ratio allowed for lease timeouts. If the CPU drifts are unbound, better not to use lease read :)