Fix messaging about delayed allocation #12515

Closed
dakrone wants to merge 3 commits

Conversation

@dakrone commented Jul 28, 2015

Previously, when RoutingService checked for delayed shards, it could see
shards that were delayed but already past their delay time, so the logged
output looked like:

```
delaying allocation for [0] unassigned shards, next check in [0s]
```

This change allows shards that have passed their delay to be counted
correctly for the logging. Additionally, it places a 5-second minimum
delay between scheduled reroutes to try to minimize the number of
reroutes run.

This also adds a test that creates a large number of unassigned delayed
shards and ensures that they are rerouted even if a single reroute does
not allocate all shards (due to a low concurrent_recoveries setting).

Resolves #12456

(This PR is against 1.7 and will be forward-ported)

```diff
-        TimeValue nextDelay = TimeValue.timeValueMillis(UnassignedInfo.findNextDelayedAllocationIn(settings, event.state()));
+        long nextDelayMillis = UnassignedInfo.findNextDelayedAllocationIn(settings, event.state());
+        // Schedule the delay at least 5 seconds in the future
+        nextDelayMillis = Math.max(5000, nextDelayMillis);
```
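
For context, a minimal sketch of how the clamped delay could feed the scheduled reroute and the log message, using only the two `UnassignedInfo` helpers visible in this PR's diffs; the logger call and the scheduling step are assumptions, not code taken from the PR:

```java
// Sketch only: the counts and delay come from helpers visible in this PR's diffs;
// the logging and scheduling wiring below is assumed, not the PR's exact code.
int delayedShards = UnassignedInfo.getNumberOfDelayedUnassigned(settings, event.state());
long nextDelayMillis = UnassignedInfo.findNextDelayedAllocationIn(settings, event.state());

// Clamp to a 5 second minimum so reroutes are not scheduled back-to-back.
nextDelayMillis = Math.max(5000, nextDelayMillis);

if (delayedShards > 0) {
    // With shards whose delay already expired counted as well, this no longer
    // prints "delaying allocation for [0] unassigned shards, next check in [0s]".
    logger.info("delaying allocation for [{}] unassigned shards, next check in [{}]",
            delayedShards, TimeValue.timeValueMillis(nextDelayMillis));
    // A reroute would then be scheduled roughly nextDelayMillis from now
    // (RoutingService's actual scheduling mechanism is omitted in this sketch).
}
```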
Member: perhaps the minimum delay should be a setting?

dakrone (Member Author): Sure, I will make it a non-dynamic configurable setting.
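
A minimal sketch of what such a non-dynamic setting could look like, assuming the purely illustrative key `cluster.routing.allocation.schedule_reroute.min_delay` (not a name from this PR) and the standard `Settings#getAsTime` accessor:

```java
// Hypothetical setting key for illustration only; not taken from this PR.
private static final String MIN_REROUTE_DELAY_SETTING =
        "cluster.routing.allocation.schedule_reroute.min_delay";

// Read once at construction time, which is what makes the setting non-dynamic.
private final TimeValue minRerouteDelay =
        settings.getAsTime(MIN_REROUTE_DELAY_SETTING, TimeValue.timeValueSeconds(5));

// When scheduling, clamp against the configured minimum instead of a literal 5000.
long nextDelayMillis = Math.max(minRerouteDelay.millis(),
        UnassignedInfo.findNextDelayedAllocationIn(settings, event.state()));
```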

@martijnvg (Member): LGTM. Left a non-blocking question.

```diff
@@ -190,7 +190,7 @@ public static int getNumberOfDelayedUnassigned(Settings settings, ClusterState s
             if (shard.primary() == false) {
                 IndexMetaData indexMetaData = state.metaData().index(shard.getIndex());
                 long delay = shard.unassignedInfo().getDelayAllocationExpirationIn(settings, indexMetaData.getSettings());
-                if (delay > 0) {
+                if (delay != 0) {
```
Member: So negative delays count delays? I figured they should count as not-delayed.

Member: Count as delays I mean.

Member: Ah! I get it! 0 means they are not delayed at all but a negative number means they've already expired. They aren't delayed any more. OK - I retract my question. Maybe a comment?

dakrone (Member Author): Sure I'll add a comment
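
For illustration, a sketch of the kind of comment being asked for, layered onto a counting method shaped like the diff above; the loop structure and variable names are assumptions, and only the delay semantics and the `delay != 0` check come from the PR:

```java
// Sketch: the loop shape and names are illustrative, not the PR's exact code.
public static int getNumberOfDelayedUnassigned(Settings settings, ClusterState state) {
    int count = 0;
    for (ShardRouting shard : state.routingTable().shardsWithState(ShardRoutingState.UNASSIGNED)) {
        if (shard.primary() == false) {
            IndexMetaData indexMetaData = state.metaData().index(shard.getIndex());
            long delay = shard.unassignedInfo().getDelayAllocationExpirationIn(settings, indexMetaData.getSettings());
            // delay == 0: the shard is not delayed at all and is not counted.
            // delay > 0:  the shard is still within its allocation delay.
            // delay < 0:  the delay has already expired but the shard is still
            //             unassigned, so count it too; otherwise the log message
            //             reports [0] shards even though a reroute is still needed.
            if (delay != 0) {
                count++;
            }
        }
    }
    return count;
}
```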

@nik9000 (Member) commented Jul 28, 2015: LGTM

@dakrone (Member Author) commented Jul 29, 2015: Closing this, handled a different way in #12532

dakrone closed this Jul 29, 2015
dakrone added a commit to dakrone/elasticsearch that referenced this pull request Aug 5, 2015
In order to avoid extra reroutes, `RoutingService` should avoid
scheduling a reroute of any shards where the delay is negative. To make
sure that we don't encounter a race condition between the
GatewayAllocator thinking a shard is delayed and RoutingService thinking
it is not, the GatewayAllocator will update the RoutingService with the
last time it checked in order to use a consistent "view" of the delay.

Resolves elastic#12456
Relates to elastic#12515 and elastic#12456
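
A rough sketch of the "consistent view" handshake described in that commit message; the method and field names here are hypothetical, see elastic#12532 for the actual change:

```java
// Hypothetical names; only the idea (a shared timestamp for delay decisions)
// comes from the commit message above.
class RoutingService {
    // Time (epoch millis) at which the GatewayAllocator last evaluated delayed shards.
    private volatile long unassignedShardsAllocatedTimestamp = 0;

    void setUnassignedShardsAllocatedTimestamp(long timestampMillis) {
        this.unassignedShardsAllocatedTimestamp = timestampMillis;
    }

    void maybeScheduleDelayedReroute(long delayMillis) {
        // Measure the remaining delay against the allocator's timestamp rather than
        // "now", so both components agree on whether the delay has already expired.
        long remaining = delayMillis - (System.currentTimeMillis() - unassignedShardsAllocatedTimestamp);
        if (remaining > 0) {
            // schedule a reroute in `remaining` millis (scheduling itself omitted here)
        }
        // A non-positive remaining delay means the delay already expired,
        // so no extra reroute is scheduled for it.
    }
}

// In GatewayAllocator, before computing per-shard delays:
// routingService.setUnassignedShardsAllocatedTimestamp(System.currentTimeMillis());
```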
dakrone deleted the delay-reroute-enhancement branch May 13, 2016
@lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
@clintongormley added the :Distributed/Allocation label and removed the :Distributed/Distributed label on Feb 14, 2018
Labels: >bug, :Distributed/Allocation, v1.7.2, v2.0.0-beta1