Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node #9632

martijnvg · 2015-02-10T15:20:21Z

If an elected master node goes into a long GC, the other nodes' fault detection will notice this, a new master election is started, and eventually a new master node is elected. When the previous master node comes out of the long GC, it can still have pending tasks that result in new cluster state updates. Nodes that are still in the nodes list of this previously elected master node can receive these cluster state updates. This commit makes sure that these dated cluster states are not accepted by those nodes.

This issue can temporarily cause nodes that are not the elected master to switch back to the previously elected master node. The newly elected master node also receives the same dated cluster state, but rejects it and tells the previously elected master node to step down and rejoin. Because the newly elected master is the only master node, the previously elected master node will follow it. Any nodes that (by accident) follow the previously elected master node will also rejoin and follow the newly elected master node, because their master fault detection will fail. So all in all this isn't a severe problem, because it fixes itself eventually.

martijnvg · 2015-02-10T15:21:52Z

src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java

-                            return currentState;
+                        if (currentState.nodes().masterNodeId() != null) {
+                            if (!currentState.nodes().masterNodeId().equals(updatedState.nodes().masterNodeId())) {
+                                logger.info("received a cluster state that has [{}] master node, while current cluster state has [{}] as master node, ignoring...", updatedState.nodes().masterNode(), currentState.nodes().masterNode());


maybe this should be a warn log statement...

+1 on warn. Can we also do this check outside of the cluster state update task? It's a shame to go into an update task. We will still need this check in the cluster state as we may have two masters publishing while we are in the join process.

bleskes · 2015-02-10T20:47:47Z

src/test/java/org/elasticsearch/discovery/DiscoveryWithServiceDisruptions.java

+        final String oldMasterNode = internalCluster().getMasterName();
+
+        // Simulating a painful gc by suspending all threads for a long time on the current elected master node.
+        LongGCDisruption masterNodeDisruption = new LongGCDisruption(oldMasterNode, getRandom(), 0, 1, 60000, 60001);


Can we add a LongGCDisruption variant that allows using the startDisruption and stopDisrupting to control the GC? These extra params feel clunky (and yeah, I probably did it before too :))

heh :) I was thinking the same thing. Maybe we should have a subclass called SuspendDistruption that does this?

We can call it suspend if you want. To me LongGC is easier to remember as it is what we talk about all the time.
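For illustration only, a disruption controlled by explicit start and stop calls could look roughly like the sketch below. `ManualFreeze` and all of its members are hypothetical names, not the actual `LongGCDisruption` API. Only the idea of start/stop control comes from the comments above, and the sketch tracks state with a flag instead of really suspending threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical disruption that a test switches on and off explicitly. */
final class ManualFreeze {
    private final String nodeName;
    private final AtomicBoolean frozen = new AtomicBoolean(false);

    ManualFreeze(String nodeName) {
        this.nodeName = nodeName;
    }

    /** Starts the simulated pause; returns false if it was already active. */
    boolean startDisrupting() {
        // A real implementation would suspend the node's threads here.
        return frozen.compareAndSet(false, true);
    }

    /** Ends the simulated pause; returns false if none was active. */
    boolean stopDisrupting() {
        // A real implementation would resume the suspended threads here.
        return frozen.compareAndSet(true, false);
    }

    boolean isFrozen() {
        return frozen.get();
    }

    String nodeName() {
        return nodeName;
    }
}
```

A test would then call `startDisrupting()`, wait for the new election, and call `stopDisrupting()` in a `finally` block, with no timing ranges to tune.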

bleskes · 2015-02-10T20:53:51Z

left some comments. I wonder if with this change, we can also enable master checks in the nodes fault detection.

martijnvg · 2015-02-11T12:20:35Z

@bleskes I updated the PR and applied the feedback.

+1 for investigating how the master check (master node in ping request is equal to local master) in the nodes fault detection behaves. This can make a cluster deal better when two elected master nodes are active.

bleskes · 2015-02-16T08:52:46Z

src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java

+     * are true then the cluster state is dated and we should ignore it.
+     */
+    private boolean newClusterStateDated(ClusterState currentState, ClusterState newClusterState) {
+        if (currentState.nodes().masterNodeId() != null) {


can we do

    if (currentState.nodes().masterNodeId() == null) {
        // we welcome our new overlords
        return false;
    }

One less nesting to put on the mental stack
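To make the guard-clause shape concrete, here is a minimal, self-contained sketch of such a check. `StaleStateCheck` and its nested `State` type are hypothetical stand-ins for `ClusterState`. The version comparison for states from the same master is an assumption suggested by the later review comments, not a copy of the code in this PR.

```java
/** Hypothetical, simplified version of a "is this cluster state dated?" check. */
final class StaleStateCheck {

    /** Stand-in for a cluster state: the master that published it and its version. */
    static final class State {
        final String masterNodeId; // null when no master is known
        final long version;

        State(String masterNodeId, long version) {
            this.masterNodeId = masterNodeId;
            this.version = version;
        }
    }

    static boolean isDated(State current, State incoming) {
        if (current.masterNodeId == null) {
            // No master yet: accept whichever master reaches us.
            return false;
        }
        if (!current.masterNodeId.equals(incoming.masterNodeId)) {
            // Published by a master other than the one we currently follow.
            return true;
        }
        // Same master (assumed rule): an older version carries nothing new.
        return incoming.version < current.version;
    }
}
```

Each early return handles one case, so the reader never has to hold more than one condition in mind at a time.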

bleskes · 2015-02-16T09:06:30Z

@martijnvg left some comments. Thx!

martijnvg · 2015-02-16T12:35:34Z

@bleskes Thanks! I updated the PR based on your comments.

bleskes · 2015-02-16T12:44:37Z

src/test/java/org/elasticsearch/discovery/DiscoveryWithServiceDisruptions.java

@@ -596,6 +598,102 @@ public boolean apply(Object input) {
    }

    /**
+     * Tests that emulates a frozen elected master node that unfreezes and pushes his cluster state to other nodes
+     * that already are following another elected master node. These nodes should reject this cluster state and prevent
+     * that they are going to follow the stale master.


minor comment - should be "and prevent them from following the stale master"

martijnvg · 2015-02-16T21:24:41Z

@bleskes I updated the PR and applied your comments.

bleskes · 2015-02-17T08:24:59Z

src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java

@@ -866,6 +841,63 @@ public void clusterStateProcessed(String source, ClusterState oldState, ClusterS
        }
    }

+    /**
+     * Picks the cluster state with highest version from the queue. All cluster states with lower versions are ignored


we need to say something about the fact that we only skip cluster states if we have a higher cluster state version from the same master
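The rule being asked for in the Javadoc can be sketched as follows. This is a simplified illustration under assumptions: `PendingStates`, `State`, and `selectNext` are hypothetical names, and the queue is assumed to hold states in arrival order. A queued state is skipped only when a later state from the same master has a higher version. Selection stops at the first state from a different master.

```java
import java.util.Objects;
import java.util.Queue;

/** Hypothetical selection of the next queued cluster state to process. */
final class PendingStates {

    /** Stand-in for a queued cluster state. */
    static final class State {
        final String masterNodeId;
        final long version;

        State(String masterNodeId, long version) {
            this.masterNodeId = masterNodeId;
            this.version = version;
        }
    }

    /** Returns the highest-version state among the leading run published by one master. */
    static State selectNext(Queue<State> pending) {
        State chosen = pending.poll();
        if (chosen == null) {
            return null; // nothing queued
        }
        while (true) {
            State next = pending.peek();
            if (next == null || !Objects.equals(chosen.masterNodeId, next.masterNodeId)) {
                // Queue drained, or the next state comes from another master: stop here.
                break;
            }
            pending.poll(); // same master, so this entry is consumed either way
            if (next.version > chosen.version) {
                chosen = next;
            }
        }
        return chosen;
    }
}
```

With queued states `a:1, a:3, a:2, b:1`, the sketch returns `a:3` and leaves `b:1` in the queue for a later call.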

bleskes · 2015-02-17T08:38:09Z

src/test/java/org/elasticsearch/discovery/zen/ZenDiscoveryUnitTest.java

+
+        currentNodes = DiscoveryNodes.builder();
+        currentNodes.masterNodeId("b");
+        currentState.nodes(currentNodes);


can we add some randomization here around the version? Check that new has a higher version than old and vice versa
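A randomized version check along these lines might look like the sketch below. It uses plain `java.util.Random` rather than the project's test helpers, and `datedForSameMaster` is a hypothetical stand-in that assumes a lower version from the same master counts as dated.

```java
import java.util.Random;

/** Hypothetical randomized check of the version rule for states from one master. */
final class RandomizedVersionCheck {

    /** Assumed rule: for the same master, a lower incoming version is dated. */
    static boolean datedForSameMaster(long currentVersion, long incomingVersion) {
        return incomingVersion < currentVersion;
    }

    /** Draws random version pairs and checks the rule in both directions. */
    static void run(long seed, int iterations) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            long low = random.nextInt(1000);
            long high = low + 1 + random.nextInt(1000); // strictly greater than low
            if (datedForSameMaster(low, high)) {
                throw new AssertionError("newer state rejected: " + low + " -> " + high);
            }
            if (!datedForSameMaster(high, low)) {
                throw new AssertionError("older state accepted: " + high + " -> " + low);
            }
        }
    }
}
```

Passing the seed in explicitly keeps a failing run reproducible.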

bleskes · 2015-02-17T08:44:35Z

@martijnvg looking good! I left some more comments, mostly around the testing. Thanks for taking it the extra mile!

martijnvg · 2015-02-17T13:26:56Z

@bleskes thanks for the thorough review. I updated the PR.

bleskes · 2015-02-22T17:05:01Z

src/test/java/org/elasticsearch/discovery/DiscoveryWithServiceDisruptions.java

+        majoritySide.remove(oldMasterNode);
+
+        // Keep track of the masters that appear in the cluster state on both nodes on the majority side,
+        // only new_master may appear in here.


The comment is a bit outdated, no? We keep track of changes now...

bleskes · 2015-02-22T17:13:39Z

thx @martijnvg. I left minor comments and some questions about testStaleMasterNotHijackingMajority - maybe I'm missing something. Other than that - looking good!

martijnvg · 2015-02-22T21:37:16Z

@bleskes I updated the PR and added one question on one of your comments.

martijnvg · 2015-02-23T11:02:27Z

@bleskes Changed the DiscoveryWithServiceDisruptions#testStaleMasterNotHijackingMajority() test to record all the master transitions per node and assert that only two happen.
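Recording transitions per node can be sketched as follows. `MasterTransitionRecorder` and `Transition` are hypothetical names, and the callback signature is an assumption about what a cluster state listener would pass in. The actual test in the PR may be structured differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical recorder of master changes as seen by each node. */
final class MasterTransitionRecorder {

    /** One observed change of master; either side may be null (no master). */
    static final class Transition {
        final String from;
        final String to;

        Transition(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    private final Map<String, List<Transition>> byNode = new HashMap<>();

    /** Intended to be called from a per-node listener; records only real changes. */
    synchronized void onClusterState(String node, String previousMaster, String currentMaster) {
        if (!Objects.equals(previousMaster, currentMaster)) {
            byNode.computeIfAbsent(node, k -> new ArrayList<>())
                  .add(new Transition(previousMaster, currentMaster));
        }
    }

    /** Returns a copy of the transitions recorded for the given node. */
    synchronized List<Transition> transitions(String node) {
        return new ArrayList<>(byNode.getOrDefault(node, new ArrayList<>()));
    }
}
```

Under the assumption that a majority-side node first loses the old master and then picks up the new one, the test would expect exactly two entries per node. A node that briefly followed the stale master would show additional transitions.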

bleskes · 2015-02-23T11:11:13Z

Left one minor comment. LGTM!!

martijnvg reviewed Feb 10, 2015
martijnvg added review v2.0.0-beta1 v1.5.0 >enhancement labels Feb 10, 2015

bleskes reviewed Feb 10, 2015
bleskes added the resiliency label Feb 11, 2015

bleskes reviewed Feb 16, 2015
bleskes reviewed Feb 17, 2015
drewr force-pushed the master branch from dcc3da0 to 7c20a8a Compare February 20, 2015 16:48

bleskes reviewed Feb 22, 2015
martijnvg force-pushed the zen/prevent-master-node-hijacking branch from 71c4320 to d074de0 Compare February 23, 2015 11:18

martijnvg force-pushed the zen/prevent-master-node-hijacking branch from d074de0 to 4fddda3 Compare February 23, 2015 11:42

martijnvg merged commit 4fddda3 into elastic:master Feb 23, 2015

clintongormley added the :Distributed/Discovery-Plugins label and removed the review label Mar 19, 2015

martijnvg deleted the zen/prevent-master-node-hijacking branch May 18, 2015 23:26

clintongormley changed the title ~~[Discovery] Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node~~ Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node Jun 6, 2015
