Nodes can't join anymore after they were killed (1.4.1) #8804
@bluelu the first set of logs shows the master kicking the river nodes out. This has to be done on the cluster state update thread, and my guess is that it took a long time for the master to get there because it was stuck in reroute (ref: #6372 (comment)). The second shows the join timing out because of the same issue (the master can't get to it in time and the request times out after the default of 60s). I suggest you disable the disk threshold allocator (as suggested in #6372) and see if that helps. Otherwise we can increase the timeout.
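For reference, a minimal sketch of the cluster settings call that disables the disk threshold decider (host/port and the transient scope are assumptions; adjust to your setup):

# Hedged example: turn off the disk threshold allocation decider as a transient
# setting; setting it back to true re-enables it.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}'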
@bleskes, I can confirm that identical repeating tasks are removed (#8860), but not the failed entries for nodes that had been killed. We can kill one or two nodes in our cluster without any issue. If we kill more than 10 nodes (non-data nodes at the moment), the cluster never recovers and it spawns more and more zen-disco-node_failed entries in the pending tasks. The pending tasks of that type grow and grow. Unfortunately I overwrote the log file, so I no longer have the output of the pending tasks from when this occurred.
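For anyone watching this happen, a hedged way to observe the queue growing (assuming a node reachable on localhost) is the cluster pending tasks API:

# Hedged example: list the master's queued cluster-state update tasks; each entry
# shows its insert order, priority, source (e.g. zen-disco-node_failed) and time in queue.
curl -XGET 'http://localhost:9200/_cluster/pending_tasks?pretty'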
Here is one example (the cluster state has 212 of those entries for about 10 failed nodes).
Thanks @bluelu, I'll chase it down.
Also, an API to delete individual pending tasks from the master's task list by their id would be great. As a workaround we could then just delete the offending ones.
While handling a node failure (zen-disco-node_failed / zen-disco-node_left) in ZenDiscovery.java, wouldn't it be possible to skip the rerouting if the node is no longer in the cluster state? That way the first update removes the node and handles the rerouting, and the following updates can just take the shortcut, since the node is not part of the updated cluster state anyway.
…publishing When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters. Closes elastic#8804
@bluelu @miccon I can confirm that concurrent shutdown of nodes will cause an O(n^2) number of failure events plus reroutes. I just made a PR to reduce the overhead.
…publishing When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters. Closes #8804 Closes #8933
We are running 1.4.1 (large cluster). Please note that the computation of shard allocation takes about 40-50 seconds on our cluster, so we suspect this issue could indeed be related to #6372.
We shut down some processing river nodes with:
http://localhost:9200/_cluster/nodes/service:searchriver/_shutdown
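For completeness, the call was issued roughly as follows (a sketch; the service:searchriver node filter matches a custom node attribute from our configuration, and host/port are placeholders):

# Hedged example: shut down every node whose "service" attribute equals "searchriver"
# via the 1.x nodes shutdown API.
curl -XPOST 'http://localhost:9200/_cluster/nodes/service:searchriver/_shutdown'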
The nodes disappeared from the cluster health information status page.
Still, the master node somehow keeps them in its list and can no longer dispatch any new cluster updates, since it still tries to send updates to the missing nodes (10 nodes). (The issue was not resolved after 3 hours, with the same messages reappearing, so we restarted the cluster.)
Master log during that time:
[2014-12-06 21:45:20,849][DEBUG][cluster.service ] [master] cluster state updated, version [1568], source [zen-disco-node_failed([I56NODE][pWbBegdLTOm45Si7s46wTQ][i56NODE][inet[/x.x.18.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR56, master=false}), reason transport disconnected] {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:45:21,029][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I56NODE][pWbBegdLTOm45Si7s46wTQ][i56NODE][inet[/x.x.18.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR56, master=false}), reason transport disconnected]: done applying updated cluster_state (version: 1568) {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:45:21,029][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I54NODE][8dL9CH0ITuKs7SlGjXcClQ][i54NODE][inet[/x.x.16.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR54, master=false}), reason transport disconnected]: execute {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:02,672][DEBUG][cluster.service ] [master] cluster state updated, version [1569], source [zen-disco-node_failed([I54NODE][8dL9CH0ITuKs7SlGjXcClQ][i54NODE][inet[/x.x.16.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR54, master=false}), reason transport disconnected] {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:03,077][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I54NODE][8dL9CH0ITuKs7SlGjXcClQ][i54NODE][inet[/x.x.16.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR54, master=false}), reason transport disconnected]: done applying updated cluster_state (version: 1569) {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:03,078][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I61NODE][V_zq6_bWSy-QiODn7kOMZw][i61NODE][inet[/x.x.39.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR61, master=false}), reason transport disconnected]: execute {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:45,565][DEBUG][cluster.service ] [master] cluster state updated, version [1570], source [zen-disco-node_failed([I61NODE][V_zq6_bWSy-QiODn7kOMZw][i61NODE][inet[/x.x.39.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR61, master=false}), reason transport disconnected] {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:45,902][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I61NODE][V_zq6_bWSy-QiODn7kOMZw][i61NODE][inet[/x.x.39.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR61, master=false}), reason transport disconnected]: done applying updated cluster_state (version: 1570) {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:45,902][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I58NODE][7opie5gmS4uJ7frkv1bbCg][i58NODE][inet[/x.x.32.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR58, master=false}), reason transport disconnected]: execute {elasticsearch[master][clusterService#updateTask][T#1]}
I could be wrong here, but as far as I remember, during that time we also didn't see any other node receive a new cluster state. We also couldn't execute any commands anymore (such as closing an index); they timed out.
When we try to restart the river nodes during the faulty state described above, they can't join anymore:
Node log:
[2014-12-06 23:25:01,660][DEBUG][discovery.zen ] [I61node] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], id[9762], master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], hasJoinedOnce [true], cluster_name[talkwalker]} {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:26:41,680][INFO ][discovery.zen ] [I61node] failed to send join request to master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]] {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:26:41,680][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: execute {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:26:41,681][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: no change in cluster_state {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:26:46,697][DEBUG][discovery.zen ] [I61node] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], id[9792], master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], hasJoinedOnce [true], cluster_name[talkwalker]} {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:28:26,716][INFO ][discovery.zen ] [I61node] failed to send join request to master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]] {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:28:26,717][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: execute {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:28:26,737][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: no change in cluster_state {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:28:31,745][DEBUG][discovery.zen ] [I61node] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], id[9822], master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], hasJoinedOnce [true], cluster_name[talkwalker]} {elasticsearch[I61node][generic][T#1]}
From the code,
https://github.com/elasticsearch/elasticsearch/blob/1.4/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java#L523-523
Since these nodes are non-data nodes (the flag is set in the configuration file), is a complete reroute of the shards necessary? In our case it seems those reroute calls were just piling up.