Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node #9632

Merged

Conversation

martijnvg
Copy link
Member

If an elected master node goes into a long gc then other nodes' fault detection will notice this and a new master election is started and eventually a new master node is elected. If the previous master nodes goes out of the long gc it can still have pending tasks which can result in new cluster state updates. Nodes that are still in the nodes list of this previous elected master node can get these cluster state updates. This commit makes sure that this dated cluster states are not accepted by these nodes.

This issue can temporary lead to the fact that non elected master nodes switch to the previous elected master node. The new elected master node also gets the same dated cluster state, but rejects it and tells the previous elected master node to step down and rejoin. Because the new elected master is the only master node the previous elected master node will follow the new elected master node. Any nodes that follow the previous elected master node (by accident), will also rejoin and follow the new elected master node because their master fault detection will fail. So all in all this isn't a severe problem, because the problem fixes itself eventually.

return currentState;
if (currentState.nodes().masterNodeId() != null) {
if (!currentState.nodes().masterNodeId().equals(updatedState.nodes().masterNodeId())) {
logger.info("received a cluster state that has [{}] master node, while current cluster state has [{}] as master node, ignoring...", updatedState.nodes().masterNode(), currentState.nodes().masterNode());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should be a warn log statement...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on warn. Can also do this check outside of the cluster state update task? it's a shame to go into an update task. We will still need this check in the cluster state as we may have two masters publishing while we are in the join process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

final String oldMasterNode = internalCluster().getMasterName();

// Simulating a painful gc by suspending all threads for a long time on the current elected master node.
LongGCDisruption masterNodeDisruption = new LongGCDisruption(oldMasterNode, getRandom(), 0, 1, 60000, 60001);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a LongGCDisruption variant that allows using the startDisruption and stopDisrupting to control the GC? These extra params feel clunky (and yeah, I probably did it before too :))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh :) I was thinking the same thing. Maybe we should have a subclass called SuspendDistruption that does this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call it suspend if you want. To me LongGC is easier to remember as it is what we talk about all them time.

@bleskes
Copy link
Contributor

bleskes commented Feb 10, 2015

left some comments. I wonder if with this change, we can also enable master checks in the nodes fault detection.

@martijnvg
Copy link
Member Author

@bleskes I updated the PR and applied the feedback.

+1 for investigating how the master check (master node in ping request is equal to local master) in the nodes fault detection behaves. This can make a cluster deal better when two elected master nodes are active.

* are true then the cluster state is dated and we should ignore it.
*/
private boolean newClusterStateDated(ClusterState currentState, ClusterState newClusterState) {
if (currentState.nodes().masterNodeId() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do

if (currentState.nodes().masterNodeId() == null) {
// we welcome our new overlords
  return false;
}

One less nesting to put on the mental stack

@bleskes
Copy link
Contributor

bleskes commented Feb 16, 2015

@martijnvg left some comments. Thx!

@martijnvg
Copy link
Member Author

@bleskes Thanks! I updated the PR based on your comments.

@@ -596,6 +598,102 @@ public boolean apply(Object input) {
}

/**
* Tests that emulates a frozen elected master node that unfreezes and pushes his cluster state to other nodes
* that already are following another elected master node. These nodes should reject this cluster state and prevent
* that they are going to follow the stale master.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment - should be and prevent them from following the stale master

@martijnvg
Copy link
Member Author

@bleskes I updated the PR and applied your comments.

@@ -866,6 +841,63 @@ public void clusterStateProcessed(String source, ClusterState oldState, ClusterS
}
}

/**
* Picks the cluster state with highest version from the queue. All cluster states with lower versions are ignored
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to say something about the fact that we only skip cluster states if we have a higher cluster state version from the same master


currentNodes = DiscoveryNodes.builder();
currentNodes.masterNodeId("b");
currentState.nodes(currentNodes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add some randomization here around the version - check that new has a higher version then old and vice versa

@bleskes
Copy link
Contributor

bleskes commented Feb 17, 2015

@martijnvg looking good! I left some more comments, mostly around the testing. Thanks for taking it the extra mile!

@martijnvg
Copy link
Member Author

@bleskes thanks for the thorough review. I updated the PR.

majoritySide.remove(oldMasterNode);

// Keep track of the masters that appear in the cluster state on both nodes on the majority side,
// only new_master may appear in here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is a bit out dated, no? we keep track of changes now..

@bleskes
Copy link
Contributor

bleskes commented Feb 22, 2015

thx @martijnvg . I left minor comments and some questions about testStaleMasterNotHijackingMajority - maybe I miss something. Other than that - looking good!

@martijnvg
Copy link
Member Author

@bleskes I updated the PR and added one question on one of your comments.

@martijnvg
Copy link
Member Author

@bleskes Changed the DiscoveryWithServiceDisruptions#testStaleMasterNotHijackingMajority() test to record all the master transitions per node and assert that only two happen.

@bleskes
Copy link
Contributor

bleskes commented Feb 23, 2015

Left one minor comment. LGTM!!

@martijnvg martijnvg force-pushed the zen/prevent-master-node-hijacking branch from 71c4320 to d074de0 Compare February 23, 2015 11:18
…tes to nodes that have moved to a different master node.

If an elected master node goes into a long gc then other nodes' fault detection will notice this and a new master election is started and eventually a new master node is elected. If the previous master nodes goes out of the long gc it can still have pending tasks which can result in new cluster state updates. Nodes that are still in the nodes list of this previous elected master node can get these cluster state updates. This commit makes sure that this dated cluster states are not accepted by these nodes.

This issue can temporary lead to the fact that non elected master nodes switch to the previous elected master node. The new elected master node also gets the same dated cluster state, but rejects it and tells the previous elected master node to step down and rejoin. Because the new elected master is the only master node the previous elected master node will follow the new elected master node. Any nodes that follow the previous elected master node (by accident), will also rejoin and follow the new elected master node because their master fault detection will fail. So all in all this isn't a severe problem, because the problem fixes itself eventually.

Closes elastic#9632
@martijnvg martijnvg force-pushed the zen/prevent-master-node-hijacking branch from d074de0 to 4fddda3 Compare February 23, 2015 11:42
@martijnvg martijnvg merged commit 4fddda3 into elastic:master Feb 23, 2015
martijnvg added a commit that referenced this pull request Feb 23, 2015
…tes to nodes that have moved to a different master node.

If an elected master node goes into a long gc then other nodes' fault detection will notice this and a new master election is started and eventually a new master node is elected. If the previous master nodes goes out of the long gc it can still have pending tasks which can result in new cluster state updates. Nodes that are still in the nodes list of this previous elected master node can get these cluster state updates. This commit makes sure that this dated cluster states are not accepted by these nodes.

This issue can temporary lead to the fact that non elected master nodes switch to the previous elected master node. The new elected master node also gets the same dated cluster state, but rejects it and tells the previous elected master node to step down and rejoin. Because the new elected master is the only master node the previous elected master node will follow the new elected master node. Any nodes that follow the previous elected master node (by accident), will also rejoin and follow the new elected master node because their master fault detection will fail. So all in all this isn't a severe problem, because the problem fixes itself eventually.

Closes #9632
@clintongormley clintongormley added :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure and removed review labels Mar 19, 2015
@martijnvg martijnvg deleted the zen/prevent-master-node-hijacking branch May 18, 2015 23:26
@clintongormley clintongormley changed the title [Discovery] Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node Jun 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >enhancement resiliency v1.5.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants