Nodes can't join anymore after they were killed (1.4.1) #8804
@bluelu the first set of logs shows the master kicking the river nodes out. This has to be done on the cluster state update thread, and my guess is that it took a long time for the master to get there because it was stuck in reroute (ref: #6372 (comment)). The second shows the join timing out because of the same issue (the master can't get to it in time and the request times out after the default of 60s). I suggest you disable the disk threshold allocator (as suggested in #6372) and see if that helps. Otherwise we can increase the timeout.
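For reference, a minimal sketch of the cluster settings call that disables the disk threshold decider (host/port and the transient scope are assumptions; adjust to your setup):

# Hedged example: turn off the disk threshold allocation decider as a transient
# setting; setting it back to true re-enables it.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}'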
@bleskes, I can confirm that identical repeating tasks are removed (#8860), but not the failed entries for nodes that had been killed. We can kill one or two nodes in our cluster without any issue. If we kill more than 10 nodes (non-data nodes at the moment), the cluster never recovers and it spawns more and more zen-disco-node_failed entries in the pending tasks. The pending tasks of that type grow and grow. Unfortunately I overwrote the log file, so I no longer have the output of the pending tasks from when this occurred.
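For anyone watching this happen, a hedged way to observe the queue growing (assuming a node reachable on localhost) is the cluster pending tasks API:

# Hedged example: list the master's queued cluster-state update tasks; each entry
# shows its insert order, priority, source (e.g. zen-disco-node_failed) and time in queue.
curl -XGET 'http://localhost:9200/_cluster/pending_tasks?pretty'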
Here is one example (the cluster state has 212 of those entries for about 10 failed nodes).
Thanks @bluelu, I'll chase it down.
Also, an API to delete individual pending tasks from the master's task list by their id would be great. As a workaround we could then just delete the offending ones.
While handling a node failure (zen-disco-node_failed / zen-disco-node_left) in ZenDiscovery.java, wouldn't it be possible to skip the rerouting if the node is no longer in the cluster state? That way the first update removes the node and handles the rerouting, and the following updates can just take the shortcut, since the node is not part of the updated cluster state anyway.
…publishing When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters. Closes elastic#8804
@bluelu @miccon I can confirm that concurrent shutdown of nodes will cause an O(n^2) number of failure events plus reroutes. I just made a PR to reduce the overhead.
…publishing When a node fails (or closes), the master processes the network disconnect event and removes the node from the cluster state. If multiple nodes fail (or shut down) in rapid succession, we process the events and remove the nodes one by one. During this process, the intermediate cluster states may cause the node fault detection to signal the failure of nodes that are not yet removed from the cluster state. While this is fine, it currently causes unneeded reroutes and cluster state publishing, which can be cumbersome in big clusters. Closes #8804 Closes #8933
We are running 1.4.1 (large cluster). Please note that the computation of shard allocation takes about 40-50 seconds on our cluster, so we suspect this issue could indeed be related to #6372.
We shut down some processing river nodes with:
http://localhost:9200/_cluster/nodes/service:searchriver/_shutdown
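For completeness, the call was issued roughly as follows (a sketch; the service:searchriver node filter matches a custom node attribute from our configuration, and host/port are placeholders):

# Hedged example: shut down every node whose "service" attribute equals "searchriver"
# via the 1.x nodes shutdown API.
curl -XPOST 'http://localhost:9200/_cluster/nodes/service:searchriver/_shutdown'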
The nodes disappeared from the cluster health information status page.
Still, the master node somehow keeps them in its list and can no longer dispatch any new cluster updates, since it still tries to send updates to the missing nodes (10 nodes). (The issue was not resolved after 3 hours, with the same messages reappearing, so we restarted the cluster.)
Master log during that time:
[2014-12-06 21:45:20,849][DEBUG][cluster.service ] [master] cluster state updated, version [1568], source [zen-disco-node_failed([I56NODE][pWbBegdLTOm45Si7s46wTQ][i56NODE][inet[/x.x.18.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR56, master=false}), reason transport disconnected] {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:45:21,029][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I56NODE][pWbBegdLTOm45Si7s46wTQ][i56NODE][inet[/x.x.18.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR56, master=false}), reason transport disconnected]: done applying updated cluster_state (version: 1568) {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:45:21,029][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I54NODE][8dL9CH0ITuKs7SlGjXcClQ][i54NODE][inet[/x.x.16.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR54, master=false}), reason transport disconnected]: execute {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:02,672][DEBUG][cluster.service ] [master] cluster state updated, version [1569], source [zen-disco-node_failed([I54NODE][8dL9CH0ITuKs7SlGjXcClQ][i54NODE][inet[/x.x.16.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR54, master=false}), reason transport disconnected] {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:03,077][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I54NODE][8dL9CH0ITuKs7SlGjXcClQ][i54NODE][inet[/x.x.16.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR54, master=false}), reason transport disconnected]: done applying updated cluster_state (version: 1569) {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:03,078][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I61NODE][V_zq6_bWSy-QiODn7kOMZw][i61NODE][inet[/x.x.39.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR61, master=false}), reason transport disconnected]: execute {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:45,565][DEBUG][cluster.service ] [master] cluster state updated, version [1570], source [zen-disco-node_failed([I61NODE][V_zq6_bWSy-QiODn7kOMZw][i61NODE][inet[/x.x.39.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR61, master=false}), reason transport disconnected] {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:45,902][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I61NODE][V_zq6_bWSy-QiODn7kOMZw][i61NODE][inet[/x.x.39.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR61, master=false}), reason transport disconnected]: done applying updated cluster_state (version: 1570) {elasticsearch[master][clusterService#updateTask][T#1]}
[2014-12-06 21:46:45,902][DEBUG][cluster.service ] [master] processing [zen-disco-node_failed([I58NODE][7opie5gmS4uJ7frkv1bbCg][i58NODE][inet[/x.x.32.20:9301]]{trendiction_scluster=SEARCH1, data=false, service=searchriver, max_local_storage_nodes=1, trendiction_cluster=HR58, master=false}), reason transport disconnected]: execute {elasticsearch[master][clusterService#updateTask][T#1]}
I could be wrong here, but as far as I remember, during that time we also didn't see any other node receive a new cluster state. We also couldn't execute any commands anymore (such as closing an index); they timed out.
When we try to restart the river nodes during the faulty state described above, they can't join anymore:
Node log:
[2014-12-06 23:25:01,660][DEBUG][discovery.zen ] [I61node] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], id[9762], master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], hasJoinedOnce [true], cluster_name[talkwalker]} {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:26:41,680][INFO ][discovery.zen ] [I61node] failed to send join request to master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]] {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:26:41,680][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: execute {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:26:41,681][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: no change in cluster_state {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:26:46,697][DEBUG][discovery.zen ] [I61node] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], id[9792], master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], hasJoinedOnce [true], cluster_name[talkwalker]} {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:28:26,716][INFO ][discovery.zen ] [I61node] failed to send join request to master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]] {elasticsearch[I61node][generic][T#1]}
[2014-12-06 23:28:26,717][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: execute {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:28:26,737][DEBUG][cluster.service ] [I61node] processing [finalize_join ([master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true})]: no change in cluster_state {elasticsearch[I61node][clusterService#updateTask][T#1]}
[2014-12-06 23:28:31,745][DEBUG][discovery.zen ] [I61node] filtered ping responses: (filter_client[true], filter_data[false])
--> ping_response{node [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], id[9822], master [[master][TH779p0eShWRaeyyU2Qqmg][master][inet[/x.x.12.16:9300]]{trendiction_scluster=NO_ROLE, data=false, service=cluster, max_local_storage_nodes=1, trendiction_cluster=HR51, river=none, master=true}], hasJoinedOnce [true], cluster_name[talkwalker]} {elasticsearch[I61node][generic][T#1]}
From the code,
https://github.com/elasticsearch/elasticsearch/blob/1.4/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java#L523-523
Since these nodes are non-data nodes (the flag is set in the configuration file), is a complete reroute of the shards necessary? In our case it seems those reroute calls were just piling up.