Multi-Master scenario due to node using outdated information about last elected master #2213

Closed
pgermishuys opened this issue Jan 9, 2020 · 0 comments · Fixed by #2386 or #2454
Labels
kind/bug Issues which are a software defect

pgermishuys commented Jan 9, 2020

A multi-master scenario can arise when nodes use out-of-date information (from gossip) during elections to propose or accept a node as a master candidate.

Gossip carries the Epoch Number, which is used to determine whether a master candidate is legitimate. As the example below shows, a master proposal with a higher Epoch Number can cause a node to accept a new master candidate even if its current master is still alive.

This means that if, after an election, a node receives a master proposal before it has received a gossip from its master, it accepts the proposal.
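The acceptance rule described above can be sketched roughly as follows. This is a simplified illustration, not the actual election code: the names `MasterView`, `Proposal` and `accepts`, and the epoch values, are hypothetical, and the comparison is reduced to the Epoch Number alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterView:
    """What a node believes about the current master, based on the last gossip it saw."""
    master_id: int
    epoch_number: int


@dataclass
class Proposal:
    """A master proposal sent by the leader of the elections."""
    candidate_id: int
    epoch_number: int


def accepts(view: Optional[MasterView], proposal: Proposal) -> bool:
    """Assumed rule: accept if no master is known, or the proposal's epoch is higher."""
    if view is None:
        return True
    return proposal.epoch_number > view.epoch_number


# Hypothetical values: the master wrote epoch 5 after being elected.
proposal = Proposal(candidate_id=3, epoch_number=5)
stale = MasterView(master_id=1, epoch_number=4)  # gossip with epoch 5 not yet received
fresh = MasterView(master_id=1, epoch_number=5)  # gossip with epoch 5 already received

print(accepts(stale, proposal))  # True: the stale node accepts a second master
print(accepts(fresh, proposal))  # False: the up-to-date node rejects it
```

Under this assumed rule, the only difference between the accepting and the rejecting node is whether the master's post-election gossip has arrived yet.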

Example:

  1. 5 node cluster (Node 1 through to Node 5)
  2. Node 1 is elected master
  3. An election is started because Node 2 is restarted
  4. Node 2 is the leader of the elections
  5. Node 2 proposes Node 3 as the master because, according to its information, Node 3 is the best master candidate (it might not have received a gossip from the current master node)
  6. Node 2 accepts its own proposal
  7. Node 1 rejects the proposal as it's the master and it has the most up to date information about itself (Epoch Number included) and it's still alive
  8. Node 3 accepts the proposal: although it has a previously elected master, it hasn't yet received a gossip from that master (Node 1) with the updated Epoch Number, so at this point the proposal has a higher Epoch Number than the one it knows for the current master (Node 1)
  9. Node 4 rejects the proposal as it has received a gossip from the current master (Node 1)
  10. Node 5 accepts the proposal because it also hasn't received a gossip from the current master (Node 1)

The above results in multiple masters existing in the same cluster without a network partition having occurred.
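The vote in the example can be tallied with a small simulation. This is a sketch under stated assumptions: the epoch values are made up, a restarted node is modelled as knowing no master epoch, and a proposal is assumed to succeed once a majority of nodes accept it. None of the names below come from the actual codebase.

```python
from typing import Dict, Optional

CLUSTER_SIZE = 5
QUORUM = CLUSTER_SIZE // 2 + 1  # assumed majority rule: 3 of 5
CURRENT_MASTER = 1
PROPOSED_MASTER = 3
PROPOSAL_EPOCH = 5              # hypothetical epoch carried by Node 2's proposal

# Epoch Number each node believes the current master (Node 1) has.
known_master_epoch: Dict[int, Optional[int]] = {
    1: 5,     # the master itself, fully up to date
    2: None,  # restarted, has no gossip from the master yet
    3: 4,     # stale: has not received the post-election gossip
    4: 5,     # received the gossip from Node 1
    5: 4,     # stale: has not received the post-election gossip
}


def accepts(known_epoch: Optional[int]) -> bool:
    """Assumed rule: accept when the proposal's epoch beats the known master epoch."""
    return known_epoch is None or PROPOSAL_EPOCH > known_epoch


accepted_by = sorted(n for n, e in known_master_epoch.items() if accepts(e))

masters = {CURRENT_MASTER}  # Node 1 is still alive and still master
if len(accepted_by) >= QUORUM:
    masters.add(PROPOSED_MASTER)

print(accepted_by)      # [2, 3, 5]
print(sorted(masters))  # [1, 3]
```

Nodes 2, 3 and 5 form a majority, so Node 3 is elected while Node 1 still considers itself master, with no network partition involved.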

Once this happens, a series of elections will take place.
