New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node #9632
Prevent stale master nodes from sharing dated cluster states to nodes that have moved to a different master node #9632
Conversation
return currentState; | ||
if (currentState.nodes().masterNodeId() != null) { | ||
if (!currentState.nodes().masterNodeId().equals(updatedState.nodes().masterNodeId())) { | ||
logger.info("received a cluster state that has [{}] master node, while current cluster state has [{}] as master node, ignoring...", updatedState.nodes().masterNode(), currentState.nodes().masterNode()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this should be a warn log statement...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on warn. Can also do this check outside of the cluster state update task? it's a shame to go into an update task. We will still need this check in the cluster state as we may have two masters publishing while we are in the join process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
final String oldMasterNode = internalCluster().getMasterName(); | ||
|
||
// Simulating a painful gc by suspending all threads for a long time on the current elected master node. | ||
LongGCDisruption masterNodeDisruption = new LongGCDisruption(oldMasterNode, getRandom(), 0, 1, 60000, 60001); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a LongGCDisruption variant that allows using the startDisruption and stopDisrupting to control the GC? These extra params feel clunky (and yeah, I probably did it before too :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh :) I was thinking the same thing. Maybe we should have a subclass called SuspendDistruption that does this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can call it suspend if you want. To me LongGC is easier to remember as it is what we talk about all them time.
left some comments. I wonder if with this change, we can also enable master checks in the nodes fault detection. |
@bleskes I updated the PR and applied the feedback. +1 for investigating how the master check (master node in ping request is equal to local master) in the nodes fault detection behaves. This can make a cluster deal better when two elected master nodes are active. |
* are true then the cluster state is dated and we should ignore it. | ||
*/ | ||
private boolean newClusterStateDated(ClusterState currentState, ClusterState newClusterState) { | ||
if (currentState.nodes().masterNodeId() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do
if (currentState.nodes().masterNodeId() == null) {
// we welcome our new overlords
return false;
}
One less nesting to put on the mental stack
@martijnvg left some comments. Thx! |
@bleskes Thanks! I updated the PR based on your comments. |
@@ -596,6 +598,102 @@ public boolean apply(Object input) { | |||
} | |||
|
|||
/** | |||
* Tests that emulates a frozen elected master node that unfreezes and pushes his cluster state to other nodes | |||
* that already are following another elected master node. These nodes should reject this cluster state and prevent | |||
* that they are going to follow the stale master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment - should be and prevent them from following the stale master
@bleskes I updated the PR and applied your comments. |
@@ -866,6 +841,63 @@ public void clusterStateProcessed(String source, ClusterState oldState, ClusterS | |||
} | |||
} | |||
|
|||
/** | |||
* Picks the cluster state with highest version from the queue. All cluster states with lower versions are ignored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to say something about the fact that we only skip cluster states if we have a higher cluster state version from the same master
|
||
currentNodes = DiscoveryNodes.builder(); | ||
currentNodes.masterNodeId("b"); | ||
currentState.nodes(currentNodes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add some randomization here around the version - check that new has a higher version then old and vice versa
@martijnvg looking good! I left some more comments, mostly around the testing. Thanks for taking it the extra mile! |
@bleskes thanks for the thorough review. I updated the PR. |
majoritySide.remove(oldMasterNode); | ||
|
||
// Keep track of the masters that appear in the cluster state on both nodes on the majority side, | ||
// only new_master may appear in here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is a bit out dated, no? we keep track of changes now..
thx @martijnvg . I left minor comments and some questions about |
@bleskes I updated the PR and added one question on one of your comments. |
@bleskes Changed the DiscoveryWithServiceDisruptions#testStaleMasterNotHijackingMajority() test to record all the master transitions per node and assert that only two happen. |
Left one minor comment. LGTM!! |
71c4320
to
d074de0
Compare
…tes to nodes that have moved to a different master node. If an elected master node goes into a long gc then other nodes' fault detection will notice this and a new master election is started and eventually a new master node is elected. If the previous master nodes goes out of the long gc it can still have pending tasks which can result in new cluster state updates. Nodes that are still in the nodes list of this previous elected master node can get these cluster state updates. This commit makes sure that this dated cluster states are not accepted by these nodes. This issue can temporary lead to the fact that non elected master nodes switch to the previous elected master node. The new elected master node also gets the same dated cluster state, but rejects it and tells the previous elected master node to step down and rejoin. Because the new elected master is the only master node the previous elected master node will follow the new elected master node. Any nodes that follow the previous elected master node (by accident), will also rejoin and follow the new elected master node because their master fault detection will fail. So all in all this isn't a severe problem, because the problem fixes itself eventually. Closes elastic#9632
d074de0
to
4fddda3
Compare
…tes to nodes that have moved to a different master node. If an elected master node goes into a long gc then other nodes' fault detection will notice this and a new master election is started and eventually a new master node is elected. If the previous master nodes goes out of the long gc it can still have pending tasks which can result in new cluster state updates. Nodes that are still in the nodes list of this previous elected master node can get these cluster state updates. This commit makes sure that this dated cluster states are not accepted by these nodes. This issue can temporary lead to the fact that non elected master nodes switch to the previous elected master node. The new elected master node also gets the same dated cluster state, but rejects it and tells the previous elected master node to step down and rejoin. Because the new elected master is the only master node the previous elected master node will follow the new elected master node. Any nodes that follow the previous elected master node (by accident), will also rejoin and follow the new elected master node because their master fault detection will fail. So all in all this isn't a severe problem, because the problem fixes itself eventually. Closes #9632
If an elected master node goes into a long gc then other nodes' fault detection will notice this and a new master election is started and eventually a new master node is elected. If the previous master nodes goes out of the long gc it can still have pending tasks which can result in new cluster state updates. Nodes that are still in the nodes list of this previous elected master node can get these cluster state updates. This commit makes sure that this dated cluster states are not accepted by these nodes.
This issue can temporary lead to the fact that non elected master nodes switch to the previous elected master node. The new elected master node also gets the same dated cluster state, but rejects it and tells the previous elected master node to step down and rejoin. Because the new elected master is the only master node the previous elected master node will follow the new elected master node. Any nodes that follow the previous elected master node (by accident), will also rejoin and follow the new elected master node because their master fault detection will fail. So all in all this isn't a severe problem, because the problem fixes itself eventually.