
NoSuchNodeException during startup #11923

Closed
clintongormley opened this issue Jun 29, 2015 · 3 comments
Assignees
Labels
:Distributed/Distributed  >enhancement  help wanted  adoptme  v1.7.0  v2.0.0-beta1

Comments

@clintongormley

When adding a new node to the cluster, the master throws a series of NoSuchNodeException exceptions until the new node is ready:

[2015-06-29 20:14:27,958][WARN ][gateway                  ] [foo] [t][3]: failed to list shard for shard_store on node [g51KTc9wQZWlLiwiYFkcgg]
FailedNodeException[Failed node [g51KTc9wQZWlLiwiYFkcgg]]; nested: NoSuchNodeException[No such node [g51KTc9wQZWlLiwiYFkcgg]];
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:179)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:131)
    at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$100(TransportNodesAction.java:91)
    at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:65)
    at org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:42)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.list(TransportNodesListShardStoreMetaData.java:82)
    at org.elasticsearch.gateway.AsyncShardFetch.asyncFetch(AsyncShardFetch.java:267)
    at org.elasticsearch.gateway.AsyncShardFetch.fetchData(AsyncShardFetch.java:117)
    at org.elasticsearch.gateway.GatewayAllocator.allocateUnassigned(GatewayAllocator.java:406)
    at org.elasticsearch.cluster.routing.allocation.allocator.ShardsAllocators.allocateUnassigned(ShardsAllocators.java:72)
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:179)
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:159)
    at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:145)
    at org.elasticsearch.discovery.zen.ZenDiscovery$11.execute(ZenDiscovery.java:937)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:378)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:209)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:179)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: NoSuchNodeException[No such node [g51KTc9wQZWlLiwiYFkcgg]]
    ... 20 more
s1monw (Contributor) commented Jun 29, 2015

@kimchy can you take a look at this?

kimchy (Member) commented Jun 29, 2015

I think I know what happens: now that we reroute within the same cluster state update that adds nodes, the new nodes are part of the cluster state still being built. When we then go and list the started shards, we use the existing cluster state, which hasn't yet been updated, to find the relevant nodes, and the new nodes will not be there since they are just being added.
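The race described above can be sketched as follows. This is a minimal illustration with hypothetical names (it does not use real Elasticsearch APIs): the allocator iterates the nodes of the state being built, while the node-listing action, standing in for TransportNodesAction, resolves node ids against the current, not-yet-updated state.

```java
import java.util.Set;

// Hypothetical sketch of the race: the joining node is in the state being
// built, but node-id resolution happens against the published current state.
public class NoSuchNodeSketch {
    // Published cluster state, before the join has been applied.
    static final Set<String> currentStateNodes = Set.of("masterNode");
    // Cluster state being built, which already includes the joining node.
    static final Set<String> stateBeingBuiltNodes =
            Set.of("masterNode", "g51KTc9wQZWlLiwiYFkcgg");

    // Stands in for the shard-listing transport action: node ids are resolved
    // against the current state before any request is sent.
    static String listShardStores(String nodeId) {
        if (!currentStateNodes.contains(nodeId)) {
            return "FailedNodeException: No such node [" + nodeId + "]";
        }
        return "listed shards on [" + nodeId + "]";
    }

    public static void main(String[] args) {
        // The allocator picks nodes out of the state being built, so the
        // joining node is attempted and fails resolution.
        for (String nodeId : stateBeingBuiltNodes) {
            System.out.println(listShardStores(nodeId));
        }
    }
}
```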

bleskes added a commit to bleskes/elasticsearch that referenced this issue Jun 30, 2015
elastic#11776 simplified our rerouting logic by removing a scheduled background reroute in favor of an explicit reroute during the cluster state processing of a node join (the only place where we didn't do it explicitly). While that change is conceptually good, it changed the semantics in two ways:

 - shard listing actions underpinning shard allocation do not have access to the new node yet, causing errors during shard allocation (see elastic#11923)
 - the very first cluster state published to a node already contains shard assignments to it; this surfaced other issues that we are working to fix separately

 This commit changes the reroute to be done after processing the initial join cluster state, to side-step these issues while we work on a longer-term solution.
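The fix described in the commit message can be sketched roughly as follows, again with hypothetical names rather than real Elasticsearch APIs: instead of rerouting inside the same cluster-state update that processes the join, a separate reroute task is queued to run after the join state has been applied, so by the time listing happens the node is visible.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of the fix: defer the reroute to a follow-up
// cluster-state task that runs once the join state is current.
public class PostJoinRerouteSketch {
    static final Set<String> currentNodes = new HashSet<>(Set.of("masterNode"));
    static final Queue<Runnable> clusterStateTaskQueue = new ArrayDeque<>();

    static String listShardStores(String nodeId) {
        return currentNodes.contains(nodeId)
                ? "listed shards on [" + nodeId + "]"
                : "FailedNodeException: No such node [" + nodeId + "]";
    }

    public static void main(String[] args) {
        String joiningNode = "g51KTc9wQZWlLiwiYFkcgg";
        // Task 1: process the node join; no inline reroute any more.
        clusterStateTaskQueue.add(() -> currentNodes.add(joiningNode));
        // Task 2: the explicit reroute, queued to run after join processing.
        clusterStateTaskQueue.add(() -> System.out.println(listShardStores(joiningNode)));
        // Cluster-state tasks run in order on a single update thread, so the
        // reroute only ever sees a state that already contains the new node.
        while (!clusterStateTaskQueue.isEmpty()) {
            clusterStateTaskQueue.poll().run();
        }
    }
}
```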
bleskes added a commit that referenced this issue Jun 30, 2015
Closes #11960
bleskes (Contributor) commented Jun 30, 2015

closed with #11960

@bleskes bleskes closed this as completed Jun 30, 2015
@bleskes bleskes added the v1.7.0 label Jun 30, 2015
bleskes added a commit that referenced this issue Jun 30, 2015
szroland pushed a commit to szroland/elasticsearch that referenced this issue Jun 30, 2015
@clintongormley clintongormley added the :Distributed/Distributed label and removed the :Cluster label Feb 13, 2018