Wait till node is part of cluster state for join process #6480

kimchy · 2014-06-12T10:54:08Z

When a node sends a join request to the master, the master now sends back the response only after the node has been added to the master's cluster state and that state has been published.
This fixes the rare case where a join request returns before the master, being under load, has added the node to its cluster state. The joining node then starts fault detection against the master, which fails because the node is not yet part of the cluster state.
Since the join request now takes longer, this change also increases the default join request timeout.
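
The ordering described above can be illustrated with a minimal, hypothetical sketch. None of the names below (`JoinSketch`, `Master`, `handleJoin`, `joinAndCheck`) come from the Elasticsearch code base, and the publish step is a placeholder. It only shows the idea of completing the join response after the cluster state update has run, so the joining node can rely on being part of the cluster state once the request returns.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; these classes are illustrative and not Elasticsearch APIs.
class JoinSketch {

    /** Toy master that tracks the node ids present in its cluster state. */
    static class Master {
        private final Set<String> clusterStateNodes = new LinkedHashSet<>();

        // A single daemon thread stands in for the master's cluster state update queue.
        private final ExecutorService updateQueue = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "cluster-state-updates");
            t.setDaemon(true);
            return t;
        });

        /** The returned future completes only after the node was added and the state "published". */
        CompletableFuture<Void> handleJoin(String nodeId) {
            return CompletableFuture.runAsync(() -> {
                synchronized (clusterStateNodes) {
                    clusterStateNodes.add(nodeId);
                }
                publish();
            }, updateQueue);
        }

        /** Placeholder for sending the new cluster state to the other nodes. */
        private void publish() {
        }

        boolean contains(String nodeId) {
            synchronized (clusterStateNodes) {
                return clusterStateNodes.contains(nodeId);
            }
        }

        void shutdown() {
            updateQueue.shutdown();
        }
    }

    /**
     * Joining node side: wait for the join response (bounded by a join timeout),
     * then report whether the master's cluster state contains this node.
     * Returns false if the join fails or times out.
     */
    static boolean joinAndCheck(Master master, String nodeId, long joinTimeoutMillis) {
        try {
            master.handleJoin(nodeId).get(joinTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            return false;
        }
        // Only now would it be safe to start fault detection against the master.
        return master.contains(nodeId);
    }

    public static void main(String[] args) {
        Master master = new Master();
        try {
            System.out.println("joined: " + joinAndCheck(master, "node-1", 5000));
        } finally {
            master.shutdown();
        }
    }
}
```

Because the response is tied to the state update, the joining node waits longer than with a fire-and-forget acknowledgement, which is why the description also raises the default join timeout.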

s1monw · 2014-06-12T11:02:04Z

docs/reference/modules/discovery/zen.asciidoc

@@ -74,7 +74,7 @@ elected or joined to. This is done automatically. The
 configure the election to handle cases of slow or congested networks
 (higher values assure less chance of failure). Once a node joins, it
 will send a join request to the master (`discovery.zen.join_timeout`)
- with a timeout defaulting at 10 times the ping timeout.


Can this include a note on when this was changed?

bleskes · 2014-06-12T11:07:02Z

LGTM

After a node joins the cluster, it starts pinging the master to verify its health. Previously, the cluster join request was processed asynchronously and we had to give it some time to complete. With #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. A similar change can be made to the node fault detection from the master side. Closes #6706
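
As a rough illustration of "start pinging immediately", the hypothetical sketch below schedules a periodic ping with an initial delay of zero, so the first ping does not wait a full interval. `FaultDetectionSketch` and `MasterPinger` are invented names for this example, and the ping itself is only a counter, not a real transport call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; these classes are illustrative and not Elasticsearch APIs.
class FaultDetectionSketch {

    /** Toy pinger that records how many pings it has sent to the master. */
    static class MasterPinger {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "master-fault-detection");
                    t.setDaemon(true);
                    return t;
                });
        private final AtomicInteger pings = new AtomicInteger();
        private final CountDownLatch firstPing = new CountDownLatch(1);

        /** Initial delay of 0: the first ping is sent right away, later ones every interval. */
        void start(long pingIntervalMillis) {
            scheduler.scheduleWithFixedDelay(() -> {
                pings.incrementAndGet(); // stand-in for sending a ping to the master
                firstPing.countDown();
            }, 0, pingIntervalMillis, TimeUnit.MILLISECONDS);
        }

        /** Returns true if the first ping was sent within the given wait time. */
        boolean awaitFirstPing(long waitMillis) {
            try {
                return firstPing.await(waitMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        int pingCount() {
            return pings.get();
        }

        void stop() {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        MasterPinger pinger = new MasterPinger();
        pinger.start(60_000); // long interval, yet the first ping should not wait for it
        System.out.println("first ping sent promptly: " + pinger.awaitFirstPing(2_000));
        pinger.stop();
    }
}
```

Starting without a grace period is only reasonable because the join response already guarantees that the master knows about the node.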

kimchy added review labels Jun 12, 2014

s1monw reviewed Jun 12, 2014

s1monw removed the review label Jun 12, 2014

fix docs

297c9e7

kimchy closed this in 2330421 Jun 12, 2014

kimchy deleted the join_wait_till_cluster_state branch June 12, 2014 16:16

bleskes mentioned this pull request Jul 3, 2014

Start Master|Node fault detection pinging immediately during discovery #6706

Closed

clintongormley changed the title ~~Wait till node is part of cluster state for join process~~ Resiliency: Wait till node is part of cluster state for join process Jul 16, 2014

clintongormley added the resiliency label Jul 16, 2014

clintongormley added the :Cluster label Jun 7, 2015

clintongormley changed the title ~~Resiliency: Wait till node is part of cluster state for join process~~ Wait till node is part of cluster state for join process Jun 7, 2015

clintongormley added the :Distributed/Distributed label and removed the :Cluster label Feb 13, 2018
