Fix messaging about delayed allocation #12515
Previously, when `RoutingService` checked for delayed shards, it could see shards that were delayed but already past their delay time, so the logged output looked like:

```
delaying allocation for [0] unassigned shards, next check in [0s]
```

This change allows shards that have passed their delay to be counted correctly for the logging. Additionally, it places a 5-second minimum delay between scheduled reroutes to try to minimize the number of reroutes run.

This also adds a test that creates a large number of unassigned delayed shards and ensures that they are rerouted even if a single reroute does not allocate all shards (due to a low `concurrent_recoveries` setting).

Resolves elastic#12456
```diff
-TimeValue nextDelay = TimeValue.timeValueMillis(UnassignedInfo.findNextDelayedAllocationIn(settings, event.state()));
+long nextDelayMillis = UnassignedInfo.findNextDelayedAllocationIn(settings, event.state());
+// Schedule the delay at least 5 seconds in the future
+nextDelayMillis = Math.max(5000, nextDelayMillis);
```
perhaps the minimum delay should be a setting?
Sure, I will make it a non-dynamic configurable setting.
LGTM. Left a non-blocking question.
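A minimal sketch of the minimum-delay clamp from the first diff, written as standalone Java so it compiles on its own. The names `RerouteDelaySketch` and `scheduleDelayMillis` are hypothetical and the real `RoutingService` code is not reproduced; the 5000 ms floor is the hard-coded value from the diff, exposed as a parameter here only to illustrate the reviewer's suggestion of making it configurable.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating Math.max(5000, nextDelayMillis) from the diff.
final class RerouteDelaySketch {
    // Default floor between scheduled reroutes: 5 seconds, as hard-coded in the diff.
    static final long DEFAULT_MIN_DELAY_MILLIS = TimeUnit.SECONDS.toMillis(5);

    /** Delay before the next scheduled reroute, never shorter than minDelayMillis. */
    static long scheduleDelayMillis(long nextDelayMillis, long minDelayMillis) {
        // A delay that is 0 or negative (already expired) is pushed out to the floor,
        // so expired shards do not trigger an immediate reroute on every check.
        return Math.max(minDelayMillis, nextDelayMillis);
    }

    /** Same as above, using the 5-second default floor. */
    static long scheduleDelayMillis(long nextDelayMillis) {
        return scheduleDelayMillis(nextDelayMillis, DEFAULT_MIN_DELAY_MILLIS);
    }
}
```

With the default floor, a computed delay of 0 ms or -1200 ms becomes 5000 ms, while 60000 ms is returned unchanged. The trade-off is that a shard whose delay has already expired may wait up to the floor before the next reroute picks it up.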
```diff
@@ -190,7 +190,7 @@ public static int getNumberOfDelayedUnassigned(Settings settings, ClusterState s
 if (shard.primary() == false) {
     IndexMetaData indexMetaData = state.metaData().index(shard.getIndex());
     long delay = shard.unassignedInfo().getDelayAllocationExpirationIn(settings, indexMetaData.getSettings());
-    if (delay > 0) {
+    if (delay != 0) {
```
So negative delays count delays? I figured they should count as not-delayed.
Count as delays I mean.
Ah! I get it! 0 means they are not delayed at all but a negative number means they've already expired. They aren't delayed any more. OK - I retract my question. Maybe a comment?
Sure I'll add a comment
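The sign convention discussed in this thread (0 means no delay configured, a positive value means still delayed, a negative value means the delay has expired) and the effect of changing `delay > 0` to `delay != 0` can be sketched as follows. This is a hypothetical standalone helper: each list element stands in for a value returned by `getDelayAllocationExpirationIn`, rather than calling the real Elasticsearch API.

```java
import java.util.List;

// Hypothetical helper; inputs stand in for getDelayAllocationExpirationIn results.
final class DelayedShardCountSketch {
    /**
     * Counting rule after the change ("delay != 0"):
     *   0   -> no delay configured, not counted
     *   > 0 -> still delayed, counted
     *   < 0 -> delay expired but shard still unassigned, counted
     */
    static int countDelayed(List<Long> remainingDelaysMillis) {
        int count = 0;
        for (long delay : remainingDelaysMillis) {
            if (delay != 0) {
                count++;
            }
        }
        return count;
    }

    /** Counting rule before the change ("delay > 0"): expired shards are missed. */
    static int countStillPendingOnly(List<Long> remainingDelaysMillis) {
        int count = 0;
        for (long delay : remainingDelaysMillis) {
            if (delay > 0) {
                count++;
            }
        }
        return count;
    }
}
```

For the remaining delays `[0, 3000, -500, -10]`, the old rule counts only 1 shard while the new rule counts 3. When every delayed shard has already expired, the old rule yields 0, which matches the misleading `[0] unassigned shards` log line from the description.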
LGTM
Closing this; handled a different way in #12532.
In order to avoid extra reroutes, `RoutingService` should avoid scheduling a reroute of any shards where the delay is negative. To make sure that we don't encounter a race condition between the `GatewayAllocator` thinking a shard is delayed and `RoutingService` thinking it is not, the `GatewayAllocator` will update the `RoutingService` with the last time it checked, so that both use a consistent "view" of the delay.

Resolves elastic#12456
Relates to elastic#12515 and elastic#12456
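The idea of a shared "view" of time can be sketched as below, assuming the allocator records the clock value it used and the routing side computes remaining delays against that same value instead of reading the clock again. Every name here (`SharedDelayViewSketch`, `PendingShard`, `recordAllocatorCheck`, `nextDelayMillis`) is hypothetical; this is not the code from #12532, only an illustration of the technique it describes.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of sharing one timestamp between allocator and routing logic.
final class SharedDelayViewSketch {
    /** Hypothetical per-shard data: when it became unassigned and its configured delay. */
    static final class PendingShard {
        final long unassignedAtMillis;
        final long delayTimeoutMillis;

        PendingShard(long unassignedAtMillis, long delayTimeoutMillis) {
            this.unassignedAtMillis = unassignedAtMillis;
            this.delayTimeoutMillis = delayTimeoutMillis;
        }
    }

    // Clock value from the allocator's most recent delay check (written by allocator, read by routing).
    private final AtomicLong lastAllocatorCheckMillis = new AtomicLong();

    /** The allocator records the "now" it used when deciding which shards are delayed. */
    void recordAllocatorCheck(long nowMillis) {
        lastAllocatorCheckMillis.set(nowMillis);
    }

    /** Remaining delay at nowMillis: 0 = no delay configured, > 0 = still delayed, < 0 = expired. */
    static long remainingDelayMillis(PendingShard shard, long nowMillis) {
        if (shard.delayTimeoutMillis == 0) {
            return 0;
        }
        return shard.delayTimeoutMillis - (nowMillis - shard.unassignedAtMillis);
    }

    /**
     * Smallest positive remaining delay as of the allocator's last check, or -1 if no shard
     * is still delayed. Zero and negative delays are skipped, so no reroute is scheduled for them.
     */
    long nextDelayMillis(List<PendingShard> shards) {
        long now = lastAllocatorCheckMillis.get(); // same clock value the allocator used
        long next = -1;
        for (PendingShard shard : shards) {
            long remaining = remainingDelayMillis(shard, now);
            if (remaining > 0 && (next < 0 || remaining < next)) {
                next = remaining;
            }
        }
        return next;
    }
}
```

Because both sides evaluate delays at the same recorded instant, a shard cannot be "delayed" for the allocator yet "expired" for the routing side merely because the two read the clock a few milliseconds apart.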
(This PR is against 1.7 and will be forward-ported)