New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use phi accrual failure detectors for Raft elections and session timeouts #294
Conversation
@@ -29,93 +25,69 @@ | |||
* <p> | |||
* Based on a paper titled: "The φ Accrual Failure Detector" by Hayashibara, et al. | |||
*/ | |||
public class PhiAccrualFailureDetector<T extends Identifier> { | |||
private final Map<T, History> states = Maps.newConcurrentMap(); | |||
public class PhiAccrualFailureDetector { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a builder for this class
public void report(T nodeId, long arrivalTime) { | ||
checkNotNull(nodeId, "NodeId must not be null"); | ||
public void report(long arrivalTime) { | ||
checkNotNull("NodeId must not be null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is checking a string literal
log.trace("Starting heartbeat timer"); | ||
AtomicLong lastHeartbeat = new AtomicLong(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value needs to be updated, otherwise the last heartbeat time is always reported.
|
||
Scheduled oldTimer; | ||
if (minTimeout.isPresent()) { | ||
Scheduled newTimer = raft.getThreadContext().schedule( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should be using a recurring timer or setting a timer recursively when responses are received.
.filter(m -> m != null) | ||
.collect(Collectors.toList())) | ||
.build(); | ||
log.trace("Sending {}", request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the MemberId
to this log statement
@@ -52,19 +53,23 @@ | |||
* Raft session. | |||
*/ | |||
public class RaftSessionContext implements RaftSession { | |||
private static final int PHI_FAILURE_THRESHOLD = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to remove this threshold constant
a2f4e5d
to
dbbf32a
Compare
This implementation seems to have a problem with doing several rounds of election at startup. I haven't looked at the logs closely enough to figure out why yet. |
nvm... I think it was because nodes weren't resetting election timers after voting in an election. Should be fixed now. |
dbbf32a
to
1e3b6aa
Compare
1e3b6aa
to
95af42b
Compare
Rebased with #295. This should pass consistently now. |
This PR refactors how leadership elections and session expirations are handled in Raft.
It adds a phi accrual failure detector used to determine when to start a new election. In order to avoid multiple servers starting an election at the same time, randomized timers are used to check the current phi value. The Raft election timeout is used as a fallback to ensure the timeout doesn't surpass that point.
Sessions are also expired using phi accrual failure detectors. This is done by the leader sending heartbeats to clients. New sessions are opened with a minimum timeout, and the leader sends heartbeats to the clients at the rate of the minimum session timeout. Sending heartbeats from the leader to clients also ensures clients resolve new leaders as soon as possible. In order to account for the time period during which an old leader crash was being detected and a new leader was being elected, nodes track the last heartbeat time and subtract that time from session timeouts. This means sessions can be expired via the failure detector immediately after a leader change if the client can't be reached by the leader.