New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional Delayed Allocation on Node leave #11712
Conversation
95ad53a
to
da7e279
Compare
} | ||
if (scheduled) { | ||
logger.info("delaying unassigned shard allocation, shards: {}", sb); | ||
} | ||
} else { | ||
FutureUtils.cancel(scheduledRoutingTableFuture); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we need to cancel all the futures in delayedShardsToReroute ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will refactor it where its not relevant
Did a review cycle. I like how things come together. One concern I had was the implementation in RoutingService where we maintain a queue of pending reroutes per unassigned (delayed) shard. I think it will be simpler to just use the single future we already have and set it every time to the next expected change moment (i.e., the minimum delay of all unassigned shards). On a setting change we can do a reroute all the time (which we might do already). Am I missing something? |
one more little thing - we need do some docs work as well, it's an important change. I can help if need be. |
I pushed another round, mainly simplifying the code, adding more unit tests, and addressing comments. @bleskes once we agree on this as the way forward, I will add docs |
registeredNextDelaySetting = nextDelaySetting; | ||
TimeValue nextDelay = TimeValue.timeValueMillis(UnassignedInfo.findNextDelayedAllocationIn(settings, event.state())); | ||
logger.info("delaying allocation for [{}] unassigned shards, next check in [{}]", UnassignedInfo.getNumberOfDelayedUnassigned(settings, event.state()), nextDelay); | ||
registeredNextDelayFuture = threadPool.schedule(nextDelay, ThreadPool.Names.SAME, new Runnable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use AbstractRunnable
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
b816349
to
93214d7
Compare
@s1monw applied another round of changes |
} | ||
|
||
/** | ||
* The delay in millis when delaying assigning the shard need to expire in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got confused by this and had to go to the code :) - I think this will be clearer "returns the time in millisecond until this unassigned shard can be reassigned."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will change
LGTM +1 |
LGTM makes sense @kimchy |
Allow to set delayed allocation timeout on unassigned shards when a node leaves the cluster. This allows to wait for the node to come back for a specific period in order to try and assign the shards back to it to reduce shards movements and unnecessary relocations. The setting is an index level setting under `index.unassigned.node_left.delayed_timeout` and defaults to 0 (== no delayed allocation). We might want to change the default, but lets do it in a different change to come up with the best value for it. The setting can be updated dynamically. When shards are delayed, a log message with "info" level will notify how many shards are being delayed. An implementation note, we really only need to care about delaying allocation on unassigned replica shards. If the primary shard is unassigned, anyhow we are going to wait for a copy of it, so really the only case to delay allocation is for replicas. close elastic#11712
0851073
to
792a545
Compare
Allow to set delayed allocation timeout on unassigned shards when a node leaves the cluster. This allows to wait for the node to come back for a specific period in order to try and assign the shards back to it to reduce shards movements and unnecessary relocations. The setting is an index level setting under `index.unassigned.node_left.delayed_timeout` and defaults to 0 (== no delayed allocation). We might want to change the default, but lets do it in a different change to come up with the best value for it. The setting can be updated dynamically. When shards are delayed, a log message with "info" level will notify how many shards are being delayed. An implementation note, we really only need to care about delaying allocation on unassigned replica shards. If the primary shard is unassigned, anyhow we are going to wait for a copy of it, so really the only case to delay allocation is for replicas. close #11712
pushed to master and 1.x, @clintongormley I forgot to add the docs, where do you think it makes sense to document this? |
@kimchy i'd say in the Index Shard Allocation page: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-allocation.html plus a note on the cluster health page |
I think it’s also good to mention on the (rolling) upgrade docs: docs/reference/setup/upgrade.asciidoc
|
Allow to set delayed allocation timeout on unassigned shards when a node leaves the cluster. This allows to wait for the node to come back for a specific period in order to try and assign the shards back to it to reduce shards movements and unnecessary relocations.
The setting is an index level setting under
index.unassigned.node_left.delayed_timeout
and defaults to 0 (== no delayed allocation). We might want to change the default, but lets do it in a different change to come up with the best value for it. The setting can be updated dynamically.When shards are delayed, a log message with "info" level will notify which shards are being delayed and for how long.
An implementation note, we really only need to care about delaying allocation on unassigned replica shards. If the primary shard is uniassigned, anyhow we are going to wait for a copy of it, so really the only case to delay allocation is for replicas.