Expose resignation of Master over HTTP. (Previously known as Maintenance Mode) #1982
@muzaffar1331 Will this need docs? If so add the label…
Testing Notes

Replica status blinks on the UI page
Steps to reproduce the issue:
What's the expected result? What's the actual result?

Nodes shut down when maintenance mode is enabled
Steps to reproduce the issue:
What's the expected result? What's the actual result?

Nodes shut down when you launch them in a 3 node cluster
Steps to reproduce the issue:
What's the expected result? What's the actual result?
Stack trace:

Test Environment
Operating System: macOS
How to run the unit tests on Ubuntu 18.04:
Signed-off-by: Muzaffar Auhammud <muzaffar1331@gmail.com>
…hen endpoint is triggered Signed-off-by: Muzaffar Auhammud <muzaffar1331@gmail.com>
… ElectionService Signed-off-by: Muzaffar Auhammud <muzaffar1331@gmail.com>
…priate Signed-off-by: Muzaffar Auhammud <muzaffar1331@gmail.com>
…onMaster Signed-off-by: Muzaffar Auhammud <muzaffar1331@gmail.com>
… only when GossipUpdate is received and master's priority has changed Signed-off-by: Muzaffar Auhammud <muzaffar1331@gmail.com>
Deposing the master will kick off elections. This has a number of benefits over kicking off elections manually:
- You won't potentially end up in a case where too many masters are alive.
- Clients that prefer master will automatically reconnect to the new master node, as we handle this case in the state machine (ClusterVNodeController) now that the node state is `Unknown`.
- The other nodes in the cluster will realize, via their TCP connections being dropped, that the cluster has changed and will act accordingly.
This provides nodes with a hint to not re-elect the last master if it has signalled that it is resigning.
There's a lack of tests around most of the components we've touched.
Test cases:
All tests done on a 3 node cluster with node A as master, unless specified otherwise.
Happy case:
- Constantly write data to node A.
- Set priority on node A to -1000.
- Resign node A.
- Node B or C is elected master; A becomes unknown and eventually becomes a slave.
- No nodes taken offline for truncation.
Resigning node without changing priority:
- Don't change priority on node A.
- Resign node A.
- Node A is not necessarily re-elected, but may be, if it is found to be the best candidate.
Changing priority and triggering elections without resigning a node:
- Take down node C.
- Set priority on node A to -1000.
- Bring node C back up (triggering elections).
- Node A remains master.
This test case is here because previously it would cause instability.
Resigning node when other nodes are not caught up:
- Write a lot of data.
- Take down nodes B and C, delete their db folders.
- Bring nodes B and C back up, wait for them to go into a CatchingUp state.
- Set priority on node A to -1000.
- Resign node A.
- Node A is elected master again.
We are merging this PR for now, but there are improvements that need to be made to the election and gossip services in order to improve the stability of the cluster.
…nce Mode) (#1982)
Add ability to issue a resignation of the master node over HTTP.
Allow setting the priority for a node during runtime.
Add NodePriority to ElectionMessage.Proposal.
Add new ResigningMaster state. The node will enter this state when it is told to resign. While resigning, the master will ignore any new write requests and wait until the request queue is drained. Once the request queue is empty, the master will enter the Unknown state.
Broadcast the resigning master message to other nodes in the cluster. This provides nodes with a hint to not re-elect the last master if it has signaled that it is resigning.
closed #1170
Ref: #1170
Adds the ability to issue a resignation of the master node over HTTP. This command will generally be issued after the master's priority has been set lower than that of the other nodes in the cluster.
This makes it possible to perform maintenance tasks such as scavenging on the master without restarting it: issuing a resignation causes another member of the cluster to be elected master.
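As a rough illustration of the order of operations, the sketch below builds (but does not send) the two HTTP requests: first lower the priority, then resign. The URL paths, the port, and the helper name are assumptions made for this example and are not confirmed anywhere in this PR text; check the project's HTTP API documentation for the real endpoints and any authentication they require.

```python
from urllib.request import Request

# Hypothetical paths, used only for illustration; substitute the real ones.
PRIORITY_PATH = "/admin/node/priority/{priority}"
RESIGN_PATH = "/admin/node/resign"


def build_resignation_requests(base_url, priority):
    """Return the two POST requests in the order they would be sent."""
    base = base_url.rstrip("/")
    set_priority = Request(
        base + PRIORITY_PATH.format(priority=priority), data=b"", method="POST"
    )
    resign = Request(base + RESIGN_PATH, data=b"", method="POST")
    return [set_priority, resign]


# Placeholder address of the current master; -1000 mirrors the test cases.
reqs = build_resignation_requests("http://127.0.0.1:2113", -1000)
for r in reqs:
    print(r.get_method(), r.full_url)
```

Actually sending the requests (for example with `urllib.request.urlopen`) is left out on purpose, since the credentials and headers needed are not described here.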
Important Note
There are no UI bits for this; it is currently only exposed via HTTP.
Resigning a master is a two-step process.
Set node priority
We need to set the node priority to a lower value than that of the other nodes in the cluster. The priority is a hint for elections: candidates are selected based on a set of criteria and ordered by node priority, amongst other things.
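To make the "hint" nature of the priority concrete, here is a minimal sketch of a candidate ordering. It is not the actual election algorithm: the `Candidate` type, the `log_position` field, and the choice to rank by data first and priority second are assumptions chosen to match the observed behaviour in the test cases (a low-priority node can still win when it is the only one that is caught up).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    log_position: int  # hypothetical stand-in for "how up to date is this node"
    priority: int      # node priority, settable at runtime


def best_candidate(candidates):
    """Pick the node with the most data, breaking ties by higher priority."""
    return max(candidates, key=lambda c: (c.log_position, c.priority))


# A has been given a low priority; B is equally up to date, C lags behind.
caught_up = [
    Candidate("A", log_position=100, priority=-1000),
    Candidate("B", log_position=100, priority=0),
    Candidate("C", log_position=90, priority=0),
]
winner = best_candidate(caught_up)

# B and C are still catching up, so A wins despite its low priority.
lagging = [
    Candidate("A", log_position=100, priority=-1000),
    Candidate("B", log_position=10, priority=0),
    Candidate("C", log_position=5, priority=0),
]
fallback_winner = best_candidate(lagging)
```

Under these assumptions, lowering the priority only changes the outcome between otherwise equal candidates, which is why it is described as a hint rather than a guarantee.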
Issue a resignation of the current running master node
When a master node is given the instruction to resign, it will ensure that all the writes in flight are handled before it changes its state from `Master` to `Unknown`. The intermediate state is `ResigningMaster`. During this state it will no longer accept any incoming writes.
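The transition described above can be sketched as a small state machine. This is a simplified model, not the code from this PR: the `Node` class, its method names, and the in-memory queue are hypothetical; only the three state names come from the description.

```python
from collections import deque
from enum import Enum


class NodeState(Enum):
    MASTER = "Master"
    RESIGNING_MASTER = "ResigningMaster"
    UNKNOWN = "Unknown"


class Node:
    def __init__(self):
        self.state = NodeState.MASTER
        self.in_flight = deque()  # writes accepted but not yet completed
        self.completed = []

    def write(self, event):
        """Accept a write only while the node is Master."""
        if self.state is not NodeState.MASTER:
            return False
        self.in_flight.append(event)
        return True

    def resign(self):
        """Start resigning; step down at once if nothing is in flight."""
        if self.state is NodeState.MASTER:
            self.state = NodeState.RESIGNING_MASTER
            self._step_down_if_drained()

    def complete_next_write(self):
        """Finish the oldest in-flight write, then check whether to step down."""
        if self.in_flight:
            self.completed.append(self.in_flight.popleft())
        self._step_down_if_drained()

    def _step_down_if_drained(self):
        if self.state is NodeState.RESIGNING_MASTER and not self.in_flight:
            self.state = NodeState.UNKNOWN


node = Node()
node.write("e1")
node.write("e2")
node.resign()
rejected = not node.write("e3")   # new writes are refused while resigning
node.complete_next_write()        # one write still in flight
state_while_draining = node.state
node.complete_next_write()        # queue drained, node steps down
```

In this model a write arriving during `ResigningMaster` is simply refused; how the real node reports that to the client is not covered here.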