-
Notifications
You must be signed in to change notification settings - Fork 637
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write leader instance ID in epoch record and pass on epoch's leader's ID and gossip info to leader of elections #2454
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
3d1b8e7
to
1e49bd7
Compare
54fdcd7
to
317d693
Compare
jageall
previously requested changes
May 20, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to update the UI submodule in this PR?
317d693
to
71a4860
Compare
It was a rebasing mistake. Fixed, thanks!
Tests currently failing due to #2505 |
…andidate node and it's still alive, propose it as the best candidate for Leader.
…e PrepareOk message GetBestLeaderCandidate(): Use the cluster gossip information to determine if the previous leader may still be alive GetBestLeaderCandidate(): Clean up code and add debug log messages
Remove tests that are no longer relevant since we no longer rely on the value of the in-memory last elected master to determine the new leader
…rCandidate() Add TestElectedLeaderId property to ElectionsService for test visibility
71a4860
to
2c3dc58
Compare
jageall
previously approved these changes
May 21, 2020
…y - only change leader if the current leader is resigning
hayley-jean
previously requested changes
May 26, 2020
jageall
approved these changes
May 26, 2020
hayley-jean
approved these changes
May 28, 2020
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changed: Write leader's instance ID in epoch record. Pass on the epoch record's leader's instance id and each node's gossip information during elections to the leader of elections to determine more accurately if the previous leader is still alive when choosing the best leader candidate.
Fixes: #2213
Problem statement
Current algorithm
The current election algorithm works like this when choosing the best candidate:
There are a few problems with this approach:
Problem 1
It is not necessary that the in-memory previously elected leader is accurate. For instance, consider this scenario in a 3 node cluster:
Problem 2
The leader of the elections also uses only it's own gossip information to determine if the previous leader is alive or not. It may be possible that the gossip information is outdated and the node thinks that the leader is dead but it is not.
Problem 3
Restarting a node loses the in memory previously elected leader. If this node becomes leader of the elections, it may not propose the previous leader.
Proposed solution
New algorithm
The new algorithm works as follows:
PrepareOk
message.PrepareOk
message.BestCandidate
PreviousLeader
) from the epoch record in thePrepareOk
message sent byBestCandidate
.PreviousLeader
is alive by either seeing if we have directly received aPrepareOk
message from it or if at least one node among the quorum thinks it is alive (using the gossip info sent in thePrepareOk
).PreviousLeader
as best candidateBestCandidate
as the best candidate.Guarantees
BestCandidate
will have it independently of which subset quorum of nodes participate in the elections.See this comment for more details: https://github.com/EventStore/EventStore/pull/2454/files#diff-65f89e62d3f64576ec45b0860d37ffc8R437
BestCandidate
node has the latest replicated data.BestCandidate
instead, we're still better than nothing.Stress tests
Stress tests have been carried out with https://github.com/EventStore/XstreamTester with random cluster sizes between 3 and 7 nodes. On this PR, out of around 5640 elections, 7 resulted in truncation which is around 0.1%. The reason for truncation is due to one of the two reasons stated above. There have also been no truncations of committed records.
Tests have also been carried out on V6
master
(ca67b9a) and in this case, the number of truncations is higher: 14/4996 elections ~= (0.3%). This has probably improved a bit compared to V5 due to #2386. There was also a case where epoch with committed records was truncated.Tests on 5.0.8 also show a higher number of truncations and a higher number of truncation of epoch with committed records.