Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent node failures can cause unneeded cluster state publishing #8933

Closed

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Dec 12, 2014

When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters.

Closes #8804

…publishing

When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters.

Closes elastic#8804
@s1monw
Copy link
Contributor

s1monw commented Dec 15, 2014

LGTM

@bleskes bleskes closed this in d62bf5f Dec 15, 2014
bleskes added a commit that referenced this pull request Dec 15, 2014
…publishing

When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters.

Closes #8804
Closes #8933
bleskes added a commit that referenced this pull request Dec 15, 2014
…publishing

When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters.

Closes #8804
Closes #8933
@bleskes bleskes changed the title Discovery: concurrent node failures can cause unneeded cluster state pubishing Discovery: concurrent node failures can cause unneeded cluster state publishing Dec 15, 2014
@clintongormley clintongormley changed the title Discovery: concurrent node failures can cause unneeded cluster state publishing Discovery: Concurrent node failures can cause unneeded cluster state publishing Dec 16, 2014
@clintongormley clintongormley added :Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure and removed review labels Mar 19, 2015
@clintongormley clintongormley changed the title Discovery: Concurrent node failures can cause unneeded cluster state publishing Concurrent node failures can cause unneeded cluster state publishing Jun 6, 2015
mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015
…publishing

When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters.

Closes elastic#8804
Closes elastic#8933
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >enhancement resiliency v1.4.2 v1.5.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nodes can't join anymore after they were killed (1.4.1)
3 participants