Remove scheduled routing #11776

kimchy · 2015-06-19T12:36:29Z

Today, we have a scheduled reroute that kicks in every 10 seconds and checks if a
reroute is needed. We use it when adding nodes: we don't reroute right away once a
node is added, so that additional nodes have a time window to join.

We do have the recover-after-nodes settings and such in order to wait for enough
nodes to be added. Also, the effect really depends on which part of the 10s window
you end up in; sometimes it might not be effective at all. In general, it's historic,
from the times before we had recover after nodes and such.

This change removes the 10s scheduling, simplifies RoutingService, and adds
explicit reroute when a node is added to the system. It also adds unit tests
to RoutingService.
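To make the behavioral change concrete, here is a toy Java sketch under a drastically simplified model. `ExplicitRerouteSketch`, its round-robin allocation, and the `node_join [...]` reason string are all hypothetical; this is not the actual `RoutingService` API. It only illustrates that the node-join path calls `reroute` directly instead of leaving the work for a periodic 10s check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model; not the Elasticsearch RoutingService.
class ExplicitRerouteSketch {
    private final List<String> nodes = new ArrayList<>();
    private final List<String> unassigned = new ArrayList<>();
    private final Map<String, String> assignments = new LinkedHashMap<>();
    private final List<String> rerouteLog = new ArrayList<>();

    ExplicitRerouteSketch(List<String> shards) {
        unassigned.addAll(shards);
    }

    // With the 10s schedule, a join only left work for the next timer tick.
    // Here the join path calls reroute() directly, so there is no timing window.
    void nodeAdded(String nodeId) {
        nodes.add(nodeId);
        reroute("node_join [" + nodeId + "]");
    }

    // Toy allocation: spread unassigned shards round-robin over known nodes.
    void reroute(String reason) {
        rerouteLog.add(reason);
        if (nodes.isEmpty()) {
            return; // nothing to allocate to yet
        }
        int i = assignments.size();
        for (String shard : unassigned) {
            assignments.put(shard, nodes.get(i % nodes.size()));
            i++;
        }
        unassigned.clear();
    }

    Map<String, String> assignments() {
        return assignments;
    }

    List<String> rerouteLog() {
        return rerouteLog;
    }
}
```

Because the reroute runs as part of the join, the outcome no longer depends on where in a 10s window the node happened to arrive; waiting for enough nodes stays the job of the recover-after-nodes settings.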

bleskes · 2015-06-19T15:02:19Z

core/src/test/java/org/elasticsearch/test/InternalTestCluster.java

@@ -365,7 +365,6 @@ private static Settings getRandomNodeSettings(long seed) {
        Random random = new Random(seed);
        Builder builder = Settings.settingsBuilder()
                // decrease the routing schedule so new nodes will be added quickly - some random value between 30 and 80 ms


This comment can go away.

Right, missed it.

bleskes · 2015-06-19T16:14:29Z

I think it's awesome how easy this turned out to be. I looked at all the places where we submit a cluster state update task and didn't see any place where we miss a reroute (and so rely on the 10s schedule) apart from the node join we already knew about.

Made some small suggestions.

s1monw · 2015-06-19T21:13:06Z

core/src/main/java/org/elasticsearch/cluster/routing/RoutingService.java

-        this.schedule = settings.getAsTime("cluster.routing.schedule", timeValueSeconds(10));
-        clusterService.addFirst(this);
+        if (clusterService != null) {
+            clusterService.addFirst(this);


At some point we should really register these outside of the classes themselves, IMO. But that is totally unrelated.
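The `clusterService != null` guard in the diff presumably lets the new unit tests construct `RoutingService` without a full cluster service. A minimal sketch of that pattern, assuming hypothetical `StateListener`, `ListenerRegistry`, and `GuardedRoutingService` types (none of them are Elasticsearch classes):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical listener interface, standing in for a cluster state listener.
interface StateListener {
    void clusterChanged(String event);
}

// Hypothetical registry, standing in for the cluster service's listener list.
class ListenerRegistry {
    private final Deque<StateListener> listeners = new ArrayDeque<>();

    // Mirrors the idea of addFirst: this listener is notified before all others.
    void addFirst(StateListener listener) {
        listeners.addFirst(listener);
    }

    void publish(String event) {
        for (StateListener listener : listeners) {
            listener.clusterChanged(event);
        }
    }

    int size() {
        return listeners.size();
    }
}

class GuardedRoutingService implements StateListener {
    int eventsSeen = 0;

    // Registration is skipped when no registry is supplied, so a unit test
    // can build the service in isolation and drive it directly.
    GuardedRoutingService(ListenerRegistry registry) {
        if (registry != null) {
            registry.addFirst(this);
        }
    }

    @Override
    public void clusterChanged(String event) {
        eventsSeen++;
    }
}
```

Registering from outside the class, as suggested in the comment, would remove the need for the null check: the service would no longer hand out `this` from its own constructor, and whatever wires the node together would call `addFirst` instead.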

kimchy · 2015-06-19T21:50:39Z

@bleskes I pushed a first review round. I don't like the suggestion on the atomic reference; I don't think it adds a lot of value. On the other hand, I have an idea of what would, but it's a bigger change. I'll see if I can work on it and if it stays small.

bleskes · 2015-06-22T09:25:38Z

LGTM. I'm +1 on 1.7 - it makes things easier to trace, reason about and debug.

s1monw · 2015-06-23T15:13:32Z

I am ok with this too... this entire circular dependency stuff is just really bad, but let's move on.


Today, we have a scheduled reroute that kicks in every 10 seconds and checks if a reroute is needed. We use it when adding nodes: we don't reroute right away once a node is added, so that additional nodes have a time window to join. We do have the recover-after-nodes settings and such in order to wait for enough nodes to be added, and the effect really depends on which part of the 10s window you end up in; sometimes it might not be effective at all. In general, it's historic, from the times before we had recover after nodes and such. This change removes the 10s scheduling, simplifies RoutingService, and adds an explicit reroute when a node is added to the system. It also adds unit tests to RoutingService. closes #11776

elastic#11776 has simplified our rerouting logic by removing a scheduled background reroute in favor of an explicit reroute during the cluster state processing of a node join (the only place where we didn't do it explicitly). While that change is conceptually good, it changes semantics a bit in two ways:

- shard listing actions underpinning shard allocation do not have access to that new node yet (causing errors during shard allocation, see elastic#11923)
- the very first cluster state published to a node already has shard assignments to it. This surfaced other issues we are working to fix separately.

This commit changes the reroute to be done after processing the initial join cluster state, to sidestep these issues while we work on a longer-term solution.
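The ordering change in this follow-up commit can be sketched with a single-threaded task queue. This assumes a hypothetical `PostJoinRerouteSketch` class standing in for the cluster-state update thread; it is not the Elasticsearch implementation. The join task publishes a state that contains the new node but no shard assignments to it, and only then queues the reroute as a separate task.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of post-join rerouting; not the Elasticsearch API.
class PostJoinRerouteSketch {
    private final Deque<Runnable> pendingTasks = new ArrayDeque<>();
    private final List<String> publishedStates = new ArrayList<>();
    private final List<String> nodes = new ArrayList<>();

    void submit(Runnable task) {
        pendingTasks.addLast(task);
    }

    // Drain the queue one task at a time, like a single update thread would.
    // Tasks submitted while draining are picked up in the same loop.
    void runPending() {
        Runnable task;
        while ((task = pendingTasks.pollFirst()) != null) {
            task.run();
        }
    }

    void nodeJoined(String nodeId) {
        submit(() -> {
            nodes.add(nodeId);
            // Published join state: the node is present, nothing is assigned to it yet.
            publishedStates.add("nodes=" + nodes + " assigned_to_" + nodeId + "=0");
            // Only after the join state is processed, queue the reroute separately.
            submit(() -> publishedStates.add("reroute after join of " + nodeId));
        });
    }

    List<String> publishedStates() {
        return publishedStates;
    }
}
```

Since the reroute runs as its own task after the join state is applied, the first state the new node receives carries no shard assignments to it, which is the intent described in the commit message.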

kimchy added >enhancement v2.0.0-beta1 review labels Jun 19, 2015

bleskes reviewed Jun 19, 2015

s1monw reviewed Jun 19, 2015

kimchy force-pushed the remove_reroute_schedule branch from 560116d to 3a85874 June 23, 2015 11:13

kimchy force-pushed the remove_reroute_schedule branch from 3a85874 to 435ce7f June 23, 2015 15:21

kimchy added the v1.7.0 label Jun 23, 2015

kimchy merged commit 435ce7f into elastic:master Jun 23, 2015

kevinkluge removed the review label Jun 23, 2015

clintongormley added the :Cluster label Jun 23, 2015

bleskes mentioned this pull request Jun 30, 2015

Reroute after node join is processed #11960

Closed

clintongormley added :Distributed/Distributed and removed :Cluster labels Feb 13, 2018
