Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait till node is part of cluster state for join process #6480

Closed
wants to merge 2 commits into from

Conversation

kimchy
Copy link
Member

@kimchy kimchy commented Jun 12, 2014

When a node sends a join request to the master, only send back the response after it has been added to the master cluster state and published.
This will fix the rare cases where today, a join request can return, and the master, since its under load, have not yet added the node to its cluster state, and the node that joined will start a fault detect against the master, failing since its not part of the cluster state.
Since now the join request is longer, also increase the join request timeout default.

When a node sends a join request to the master, only send back the response after it has been added to the master cluster state and published.
This will fix the rare cases where today, a join request can return, and the master, since its under load, have not yet added the node to its cluster state, and the node that joined will start a fault detect against the master, failing since its not part of the cluster state.
Since now the join request is longer, also increase the join request timeout default.
closes elastic#6480
@@ -74,7 +74,7 @@ elected or joined to. This is done automatically. The
configure the election to handle cases of slow or congested networks
(higher values assure less chance of failure). Once a node joins, it
will send a join request to the master (`discovery.zen.join_timeout`)
with a timeout defaulting at 10 times the ping timeout.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this have some note when this was changed?

@s1monw s1monw removed the review label Jun 12, 2014
@bleskes
Copy link
Contributor

bleskes commented Jun 12, 2014

LGTM

@kimchy kimchy closed this in 2330421 Jun 12, 2014
kimchy added a commit that referenced this pull request Jun 12, 2014
When a node sends a join request to the master, only send back the response after it has been added to the master cluster state and published.
This will fix the rare cases where today, a join request can return, and the master, since its under load, have not yet added the node to its cluster state, and the node that joined will start a fault detect against the master, failing since its not part of the cluster state.
Since now the join request is longer, also increase the join request timeout default.
closes #6480
@kimchy kimchy deleted the join_wait_till_cluster_state branch June 12, 2014 16:16
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jul 3, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  elastic#6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.
bleskes added a commit that referenced this pull request Jul 3, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Jul 3, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Jul 8, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Jul 9, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Jul 11, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Jul 12, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Jul 14, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  elastic#6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes elastic#6706
bleskes added a commit that referenced this pull request Jul 16, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
@clintongormley clintongormley changed the title Wait till node is part of cluster state for join process Resiliency: Wait till node is part of cluster state for join process Jul 16, 2014
bleskes added a commit that referenced this pull request Jul 17, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Jul 21, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Jul 22, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Aug 4, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Aug 5, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Aug 14, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Aug 22, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Aug 23, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit that referenced this pull request Aug 27, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Sep 1, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  elastic#6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes elastic#6706
bleskes added a commit that referenced this pull request Sep 8, 2014
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With  #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.

Closes #6706
@clintongormley clintongormley changed the title Resiliency: Wait till node is part of cluster state for join process Wait till node is part of cluster state for join process Jun 7, 2015
@clintongormley clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Cluster labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. >enhancement resiliency v1.3.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants