Fix case where master node may crash after 2 consecutive elections #1510

Merged
hayley-jean merged 1 commit into release-v4.0.4 from fix-epoch-write-premaster on Dec 11, 2017

shaan1337 (Member) commented Dec 7, 2017

Bug description
When two elections occur quickly resulting in the same master being elected, the master node may crash.

Sample case (data has been anonymized):

1. INFO  ElectionsService    ] ELECTIONS: (V=1) DONE. ELECTED MASTER = [192.168.1.5:2112...
2. INFO  ClusterVNodeControll] ========== [192.168.1.5:2112] PRE-MASTER STATE, WAITING FOR CHASER TO CATCH UP...
...
3. INFO  ElectionsService    ] ELECTIONS: (V=2) DONE. ELECTED MASTER = [192.168.1.5:2112...
4. INFO  ClusterVNodeControll] ========== [192.168.1.5:2112] IS MASTER... SPARTA!
...
5. INFO  ElectionsService    ] ELECTIONS: (V=3) DONE. ELECTED MASTER = [192.168.1.5:2112...
6. FATAL StorageWriterService] Unexpected error in StorageWriterService. Terminating the process...
System.Exception: New Epoch request not in master state. State: PreMaster.
   at EventStore.Core.Services.Storage.StorageWriterService.EventStore.Core.Bus.IHandle<EventStore.Core.Messages.SystemMessage.WriteEpoch>.Handle(WriteEpoch message)...

It looks like the following sequence of events is occurring:

ClusterVNodeController                  StorageWriterService
======================                  ====================
ElectionMessage.ElectionsDone (#1)
SystemMessage.BecomePreMaster (#2)
                                        SystemMessage.BecomePreMaster (#2, _vnodeState = PreMaster)
ElectionMessage.ElectionsDone (#3)
Outputs SystemMessage.WriteEpoch here
SystemMessage.BecomeMaster (#4)
ElectionMessage.ElectionsDone (#5)
                                        Receives SystemMessage.WriteEpoch (_vnodeState = PreMaster) and crashes
                                        SystemMessage.BecomeMaster (#4) has not yet been received here

Receiving a second ElectionMessage.ElectionsDone message before SystemMessage.BecomeMaster is published doesn't give the StorageWriterService a chance to become Master, so it is still in the PreMaster state when the WriteEpoch message arrives.
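To make the ordering problem concrete, here is a minimal sketch in Python (not EventStore's C# code) of the StorageWriterService side only. It assumes the writer consumes messages from a FIFO queue in the order the controller published them; the class `StorageWriterModel`, the helper `replay`, and the plain-string message names are hypothetical stand-ins for the real message types. Replaying the order from the sequence above reproduces the PreMaster exception.

```python
from collections import deque


class StorageWriterModel:
    """Hypothetical, simplified model of StorageWriterService state handling.
    Not EventStore code; messages are plain strings."""

    def __init__(self):
        self.state = "Unknown"

    def handle(self, msg):
        if msg == "BecomePreMaster":
            self.state = "PreMaster"
        elif msg == "BecomeMaster":
            self.state = "Master"
        elif msg == "WriteEpoch" and self.state != "Master":
            # Mirrors the exception in the FATAL log line above.
            raise RuntimeError(
                f"New Epoch request not in master state. State: {self.state}."
            )


def replay(published):
    """Deliver messages to the writer in publication order (FIFO queue)."""
    queue = deque(published)
    writer = StorageWriterModel()
    while queue:
        writer.handle(queue.popleft())
    return writer


# Order from the sequence above: the second ElectionsDone (#3) makes the
# controller publish WriteEpoch before it publishes BecomeMaster (#4).
try:
    replay(["BecomePreMaster", "WriteEpoch", "BecomeMaster"])
except RuntimeError as e:
    print(e)  # New Epoch request not in master state. State: PreMaster.
```

Because the queue preserves publication order, the writer cannot "catch up" to Master on its own: the fault is that WriteEpoch was published while the controller itself was still PreMaster.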

Resolution

  1. Ensure we're in the Master state before publishing SystemMessage.WriteEpoch. If we aren't, we must still be in the PreMaster state (the master info matches the node info, so no other state is possible) and will soon transition to Master, at which point an epoch will be written.
  2. Drop the WriteEpoch message in the StorageWriterService if it is still in the PreMaster state. Its state changes to Master soon afterwards, and the epoch is written then anyway.
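The two changes can be sketched as a toy model, again in Python rather than the project's C#, assuming simplified behavior taken from the resolution above: the controller publishes `WriteEpoch` only when re-elected while already Master (fix 1), and the writer drops `WriteEpoch` while PreMaster and writes an epoch when it becomes Master (fix 2). `ControllerModel`, `StorageWriterModel`, `on_chaser_caught_up`, and the `epochs_written` counter are hypothetical names, not EventStore APIs.

```python
from collections import deque

NODE = "192.168.1.5:2112"


class StorageWriterModel:
    """Hypothetical writer model with fix 2: drop WriteEpoch while PreMaster."""

    def __init__(self):
        self.state = "Unknown"
        self.epochs_written = 0

    def handle(self, msg):
        if msg == "BecomePreMaster":
            self.state = "PreMaster"
        elif msg == "BecomeMaster":
            self.state = "Master"
            # Assumption from the PR text: an epoch is written on becoming Master.
            self.epochs_written += 1
        elif msg == "WriteEpoch":
            if self.state == "PreMaster":
                return  # fix 2: drop it; the move to Master writes the epoch
            if self.state != "Master":
                raise RuntimeError(
                    f"New Epoch request not in master state. State: {self.state}."
                )
            self.epochs_written += 1


class ControllerModel:
    """Hypothetical controller model with fix 1: publish WriteEpoch only as Master."""

    def __init__(self, node, out):
        self.node = node
        self.state = "Unknown"
        self.out = out  # the writer's FIFO queue

    def on_elections_done(self, master):
        if master != self.node:
            return
        if self.state == "Unknown":
            self.state = "PreMaster"
            self.out.append("BecomePreMaster")
        elif self.state == "Master":
            # fix 1: same master re-elected while already Master -> new epoch
            self.out.append("WriteEpoch")
        # PreMaster: publish nothing; the transition to Master writes the epoch

    def on_chaser_caught_up(self):
        if self.state == "PreMaster":
            self.state = "Master"
            self.out.append("BecomeMaster")


queue = deque()
controller = ControllerModel(NODE, queue)
writer = StorageWriterModel()

controller.on_elections_done(NODE)  # V=1 -> BecomePreMaster
controller.on_elections_done(NODE)  # V=2 while PreMaster -> nothing (fix 1)
controller.on_chaser_caught_up()    # -> BecomeMaster
controller.on_elections_done(NODE)  # V=3 while Master -> WriteEpoch

while queue:
    writer.handle(queue.popleft())

print(writer.state, writer.epochs_written)  # Master 2
```

Either guard alone avoids the crash in this model: with fix 1 the controller never publishes the premature `WriteEpoch`, and with fix 2 the writer discards it if it does arrive early, since the epoch is written on the transition to Master anyway.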
Fix case where master node may still be in PreMaster state when there are 2 quick consecutive elections and the same master is elected

hayley-jean merged commit f825926 into release-v4.0.4 on Dec 11, 2017

2 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
wercker/build-mono4: Wercker pipeline passed

hayley-jean deleted the fix-epoch-write-premaster branch on Dec 11, 2017