Fix messaging about delayed allocation #12515

Closed
dakrone wants to merge 3 commits

Conversation

@dakrone commented Jul 28, 2015

Previously, when RoutingService checked for delayed shards, it could see
shards that were delayed but already past their delay time, so the logged
output looked like:

```
delaying allocation for [0] unassigned shards, next check in [0s]
```

This change allows shards that have passed their delay to be counted
correctly for the logging. Additionally, it places a 5-second minimum
delay between scheduled reroutes to try to minimize the number of
reroutes run.

This also adds a test that creates a large number of unassigned delayed
shards and ensures that they are rerouted even if a single reroute does
not allocate all shards (due to a low concurrent_recoveries setting).

Resolves #12456

(This PR is against 1.7 and will be forward-ported)

```diff
-        TimeValue nextDelay = TimeValue.timeValueMillis(UnassignedInfo.findNextDelayedAllocationIn(settings, event.state()));
+        long nextDelayMillis = UnassignedInfo.findNextDelayedAllocationIn(settings, event.state());
+        // Schedule the delay at least 5 seconds in the future
+        nextDelayMillis = Math.max(5000, nextDelayMillis);
```
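
For context, a minimal sketch of how the clamped delay could feed the scheduled reroute and the log message, using only the two `UnassignedInfo` helpers visible in this PR's diffs; the logger call and the scheduling step are assumptions, not code taken from the PR:

```java
// Sketch only: the counts and delay come from helpers visible in this PR's diffs;
// the logging and scheduling wiring below is assumed, not the PR's exact code.
int delayedShards = UnassignedInfo.getNumberOfDelayedUnassigned(settings, event.state());
long nextDelayMillis = UnassignedInfo.findNextDelayedAllocationIn(settings, event.state());

// Clamp to a 5 second minimum so reroutes are not scheduled back-to-back.
nextDelayMillis = Math.max(5000, nextDelayMillis);

if (delayedShards > 0) {
    // With shards whose delay already expired counted as well, this no longer
    // prints "delaying allocation for [0] unassigned shards, next check in [0s]".
    logger.info("delaying allocation for [{}] unassigned shards, next check in [{}]",
            delayedShards, TimeValue.timeValueMillis(nextDelayMillis));
    // A reroute would then be scheduled roughly nextDelayMillis from now
    // (RoutingService's actual scheduling mechanism is omitted in this sketch).
}
```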
Member: perhaps the minimum delay should be a setting?

dakrone (Member Author): Sure, I will make it a non-dynamic configurable setting.
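
A minimal sketch of what such a non-dynamic setting could look like, assuming the purely illustrative key `cluster.routing.allocation.schedule_reroute.min_delay` (not a name from this PR) and the standard `Settings#getAsTime` accessor:

```java
// Hypothetical setting key for illustration only; not taken from this PR.
private static final String MIN_REROUTE_DELAY_SETTING =
        "cluster.routing.allocation.schedule_reroute.min_delay";

// Read once at construction time, which is what makes the setting non-dynamic.
private final TimeValue minRerouteDelay =
        settings.getAsTime(MIN_REROUTE_DELAY_SETTING, TimeValue.timeValueSeconds(5));

// When scheduling, clamp against the configured minimum instead of a literal 5000.
long nextDelayMillis = Math.max(minRerouteDelay.millis(),
        UnassignedInfo.findNextDelayedAllocationIn(settings, event.state()));
```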

@martijnvg (Member): LGTM. Left a non-blocking question.

```diff
@@ -190,7 +190,7 @@ public static int getNumberOfDelayedUnassigned(Settings settings, ClusterState s
             if (shard.primary() == false) {
                 IndexMetaData indexMetaData = state.metaData().index(shard.getIndex());
                 long delay = shard.unassignedInfo().getDelayAllocationExpirationIn(settings, indexMetaData.getSettings());
-                if (delay > 0) {
+                if (delay != 0) {
```
Member: So negative delays count delays? I figured they should count as not-delayed.

Member: Count as delays I mean.

Member: Ah! I get it! 0 means they are not delayed at all but a negative number means they've already expired. They aren't delayed any more. OK - I retract my question. Maybe a comment?

dakrone (Member Author): Sure I'll add a comment
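
For illustration, a sketch of the kind of comment being asked for, layered onto a counting method shaped like the diff above; the loop structure and variable names are assumptions, and only the delay semantics and the `delay != 0` check come from the PR:

```java
// Sketch: the loop shape and names are illustrative, not the PR's exact code.
public static int getNumberOfDelayedUnassigned(Settings settings, ClusterState state) {
    int count = 0;
    for (ShardRouting shard : state.routingTable().shardsWithState(ShardRoutingState.UNASSIGNED)) {
        if (shard.primary() == false) {
            IndexMetaData indexMetaData = state.metaData().index(shard.getIndex());
            long delay = shard.unassignedInfo().getDelayAllocationExpirationIn(settings, indexMetaData.getSettings());
            // delay == 0: the shard is not delayed at all and is not counted.
            // delay > 0:  the shard is still within its allocation delay.
            // delay < 0:  the delay has already expired but the shard is still
            //             unassigned, so count it too; otherwise the log message
            //             reports [0] shards even though a reroute is still needed.
            if (delay != 0) {
                count++;
            }
        }
    }
    return count;
}
```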

@nik9000 (Member) commented Jul 28, 2015: LGTM

@dakrone (Member Author) commented Jul 29, 2015: Closing this, handled a different way in #12532

dakrone closed this Jul 29, 2015
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 5, 2015
In order to avoid extra reroutes, `RoutingService` should avoid
scheduling a reroute of any shards where the delay is negative. To make
sure that we don't encounter a race condition between the
GatewayAllocator thinking a shard is delayed and RoutingService thinking
it is not, the GatewayAllocator will update the RoutingService with the
last time it checked in order to use a consistent "view" of the delay.

Resolves elastic#12456
Relates to elastic#12515 and elastic#12456
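
A rough sketch of the "consistent view" handshake described in that commit message; the method and field names here are hypothetical, see elastic#12532 for the actual change:

```java
// Hypothetical names; only the idea (a shared timestamp for delay decisions)
// comes from the commit message above.
class RoutingService {
    // Time (epoch millis) at which the GatewayAllocator last evaluated delayed shards.
    private volatile long unassignedShardsAllocatedTimestamp = 0;

    void setUnassignedShardsAllocatedTimestamp(long timestampMillis) {
        this.unassignedShardsAllocatedTimestamp = timestampMillis;
    }

    void maybeScheduleDelayedReroute(long delayMillis) {
        // Measure the remaining delay against the allocator's timestamp rather than
        // "now", so both components agree on whether the delay has already expired.
        long remaining = delayMillis - (System.currentTimeMillis() - unassignedShardsAllocatedTimestamp);
        if (remaining > 0) {
            // schedule a reroute in `remaining` millis (scheduling itself omitted here)
        }
        // A non-positive remaining delay means the delay already expired,
        // so no extra reroute is scheduled for it.
    }
}

// In GatewayAllocator, before computing per-shard delays:
// routingService.setUnassignedShardsAllocatedTimestamp(System.currentTimeMillis());
```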
dakrone deleted the delay-reroute-enhancement branch May 13, 2016
@lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
@clintongormley added the :Distributed/Allocation label and removed the :Distributed/Distributed label on Feb 14, 2018
Labels: >bug, :Distributed/Allocation, v1.7.2, v2.0.0-beta1