Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not use a background thread to disconnect node which are removed from the ClusterState #7543

Closed

Conversation

bleskes
Copy link
Contributor

@bleskes bleskes commented Sep 2, 2014

After a node fails to respond to a ping correctly (master or node fault detection), they are removed from the cluster state through an UpdateTask. When a node is removed, a background task is scheduled using the generic threadpool to actually disconnect the node. However, in the case of temporary node failures (for example) it may be that the node was re-added by the time the task get executed, causing an untimely disconnect call. Disconnect is cheep and should be done during the UpdateTask.

…ing it (after a ping failure)

After a node fails to respond to a ping correctly (master or node fault detection), they are removed from the cluster state through an UpdateTask. When a node is removed, a background task is scheduled using the generic threadpool to actually disconnect the node. However, in the case of temporary node failures (for example) it may be that the node was re-added by the time the task get executed. We should check for that.
@kimchy
Copy link
Member

kimchy commented Sep 2, 2014

I think simpler solution is just to remove the execution on a different thread, disconnect should be super cheap

@bleskes
Copy link
Contributor Author

bleskes commented Sep 2, 2014

@kimchy updated with another commit. I'll change the description before pushing (assuming no more feedback)

@kimchy
Copy link
Member

kimchy commented Sep 2, 2014

just to be safe, I would wrap each disconnect in a try ... catch block, similar to the connect code before. Other than that, LGTM.

@bleskes bleskes closed this in 1f8db67 Sep 3, 2014
bleskes added a commit that referenced this pull request Sep 3, 2014
…e remove from the ClusterState

After a node fails to respond to a ping correctly (master or node fault detection), they are removed from the cluster state through an UpdateTask. When a node is removed, a background task is scheduled using the generic threadpool to actually disconnect the node. However, in the case of temporary node failures (for example) it may be that the node was re-added by the time the task get executed, causing an untimely disconnect call. Disconnect is cheep and should be done during the UpdateTask.

Closes #7543
@bleskes bleskes changed the title [Internal] verify node is no longer in ClusterState before disconnecting it (after a ping failure) [Internal] Do not use a background thread to disconnect node which are removed from the ClusterState Sep 3, 2014
@bleskes bleskes deleted the verify_before_disconnect_on_update_task branch September 3, 2014 06:53
bleskes added a commit that referenced this pull request Sep 8, 2014
…e remove from the ClusterState

After a node fails to respond to a ping correctly (master or node fault detection), they are removed from the cluster state through an UpdateTask. When a node is removed, a background task is scheduled using the generic threadpool to actually disconnect the node. However, in the case of temporary node failures (for example) it may be that the node was re-added by the time the task get executed, causing an untimely disconnect call. Disconnect is cheep and should be done during the UpdateTask.

Closes #7543
@clintongormley clintongormley changed the title [Internal] Do not use a background thread to disconnect node which are removed from the ClusterState Internal: Do not use a background thread to disconnect node which are removed from the ClusterState Sep 8, 2014
@clintongormley clintongormley changed the title Internal: Do not use a background thread to disconnect node which are removed from the ClusterState Resiliency: Do not use a background thread to disconnect node which are removed from the ClusterState Sep 8, 2014
@jpountz jpountz removed the review label Oct 21, 2014
@clintongormley clintongormley changed the title Resiliency: Do not use a background thread to disconnect node which are removed from the ClusterState Do not use a background thread to disconnect node which are removed from the ClusterState Jun 7, 2015
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Cluster labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. >enhancement resiliency v1.4.0.Beta1 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants