
Optional Delayed Allocation on Node leave #11712

Merged
merged 1 commit from delayed_allocation_2 into elastic:master on Jun 18, 2015

Conversation

@kimchy (Member) commented Jun 17, 2015

Allow setting a delayed allocation timeout on unassigned shards when a node leaves the cluster. This makes it possible to wait for the node to come back for a specific period, in order to try to assign the shards back to it and reduce shard movements and unnecessary relocations.

The setting is an index-level setting under `index.unassigned.node_left.delayed_timeout` and defaults to 0 (i.e. no delayed allocation). We might want to change the default, but let's do that in a separate change so we can come up with the best value for it. The setting can be updated dynamically.
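Since the setting is dynamic, it can be changed on a live index. A minimal sketch of doing so through the Java client of this era (the index name `my_index` and the `5m` value are placeholders, and the settings-builder class name differs slightly between the 1.x and 2.x branches):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.ImmutableSettings;

public class DelayedAllocationSettingExample {
    // Update the delayed allocation timeout on an existing index at runtime.
    public static void setDelayedTimeout(Client client) {
        client.admin().indices().prepareUpdateSettings("my_index")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.unassigned.node_left.delayed_timeout", "5m"))
                .get();
    }
}
```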

When shards are delayed, a log message at the "info" level notes which shards are being delayed and for how long.

An implementation note: we really only need to care about delaying allocation of unassigned replica shards. If the primary shard is unassigned, we are going to wait for a copy of it anyway, so the only case where delaying allocation helps is replicas.

}
if (scheduled) {
logger.info("delaying unassigned shard allocation, shards: {}", sb);
}
} else {
FutureUtils.cancel(scheduledRoutingTableFuture);
Contributor:

don't we need to cancel all the futures in `delayedShardsToReroute`?

Member Author:

I will refactor it so it's not relevant.
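For reference, if the per-shard queue were kept, cancelling the pending futures mentioned above would look roughly like the sketch below (the `delayedShardsToReroute` name comes from the comment; its value type is an assumption):

```java
// Illustrative sketch only: cancel every pending per-shard reroute future and clear the
// queue. Assumes delayedShardsToReroute is a Map<ShardId, ScheduledFuture<?>>.
void cancelDelayedReroutes() {
    for (ScheduledFuture<?> future : delayedShardsToReroute.values()) {
        FutureUtils.cancel(future);   // same helper used for scheduledRoutingTableFuture above
    }
    delayedShardsToReroute.clear();
}
```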

@bleskes (Contributor) commented Jun 17, 2015

Did a review cycle. I like how things come together. One concern I had was the implementation in RoutingService, where we maintain a queue of pending reroutes per unassigned (delayed) shard. I think it would be simpler to just use the single future we already have and set it every time to the next expected change moment (i.e., the minimum delay of all unassigned shards). On a settings change we can always do a reroute (which we might do already). Am I missing something?
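A sketch of the single-future approach being suggested here; the field and method names (`registeredNextDelayFuture`, `minDelayMillis`, `rerouteAll`) are illustrative rather than the PR's actual code:

```java
// Keep one scheduled task and always point it at the earliest moment any delayed shard
// becomes eligible for allocation again; a full reroute then re-evaluates all of them.
private volatile ScheduledFuture<?> registeredNextDelayFuture;

void scheduleNextDelayedReroute(long minDelayMillis) {
    FutureUtils.cancel(registeredNextDelayFuture);          // replace any previously scheduled check
    TimeValue nextDelay = TimeValue.timeValueMillis(minDelayMillis);
    registeredNextDelayFuture = threadPool.schedule(nextDelay, ThreadPool.Names.SAME, new Runnable() {
        @Override
        public void run() {
            rerouteAll();                                    // placeholder for triggering a cluster reroute
        }
    });
}
```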

@bleskes (Contributor) commented Jun 17, 2015

One more little thing: we need to do some docs work as well, since it's an important change. I can help if need be.

@kimchy (Member, Author) commented Jun 17, 2015

I pushed another round, mainly simplifying the code, adding more unit tests, and addressing comments. @bleskes, once we agree on this as the way forward, I will add docs.

registeredNextDelaySetting = nextDelaySetting;
TimeValue nextDelay = TimeValue.timeValueMillis(UnassignedInfo.findNextDelayedAllocationIn(settings, event.state()));
logger.info("delaying allocation for [{}] unassigned shards, next check in [{}]", UnassignedInfo.getNumberOfDelayedUnassigned(settings, event.state()), nextDelay);
registeredNextDelayFuture = threadPool.schedule(nextDelay, ThreadPool.Names.SAME, new Runnable() {
Contributor:

can we use `AbstractRunnable` here?

Member Author:

sure
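For illustration, the same scheduling call written against `AbstractRunnable` might look like the sketch below; the reroute call and log message are assumptions, not the exact code that was pushed:

```java
registeredNextDelayFuture = threadPool.schedule(nextDelay, ThreadPool.Names.SAME, new AbstractRunnable() {
    @Override
    protected void doRun() throws Exception {
        reroute("assign delayed unassigned shards");           // placeholder for the actual reroute trigger
    }

    @Override
    public void onFailure(Throwable t) {
        logger.warn("failed to execute delayed reroute", t);   // failures are surfaced instead of swallowed
    }
});
```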

@kimchy (Member, Author) commented Jun 18, 2015

@s1monw applied another round of changes

}

/**
* The delay in millis when delaying assigning the shard need to expire in.
Contributor:

Got confused by this and had to go to the code :) I think this would be clearer: "returns the time in milliseconds until this unassigned shard can be reassigned."

Member Author:

will change
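For illustration only, a sketch of the contract the clarified comment describes (the method and field names, including `unassignedTimeInMillis`, are assumptions rather than the PR's exact code):

```java
/**
 * Returns the time in milliseconds until this unassigned shard can be reassigned,
 * i.e. how much of the configured delay is still left.
 */
public long getRemainingDelayInMillis(long nowInMillis, long delayTimeoutInMillis) {
    long elapsed = nowInMillis - unassignedTimeInMillis;   // time since the shard became unassigned
    return Math.max(0, delayTimeoutInMillis - elapsed);
}
```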

@kimchy (Member, Author) commented Jun 18, 2015

@s1monw @bleskes pushed another set of changes

@bleskes (Contributor) commented Jun 18, 2015

LGTM +1

@s1monw (Contributor) commented Jun 18, 2015

LGTM, makes sense @kimchy

Allow setting a delayed allocation timeout on unassigned shards when a node leaves the cluster. This makes it possible to wait for the node to come back for a specific period, in order to try to assign the shards back to it and reduce shard movements and unnecessary relocations.

The setting is an index-level setting under `index.unassigned.node_left.delayed_timeout` and defaults to 0 (i.e. no delayed allocation). We might want to change the default, but let's do that in a separate change so we can come up with the best value for it. The setting can be updated dynamically.

When shards are delayed, a log message at the "info" level notes how many shards are being delayed.

An implementation note: we really only need to care about delaying allocation of unassigned replica shards. If the primary shard is unassigned, we are going to wait for a copy of it anyway, so the only case where delaying allocation helps is replicas.

close elastic#11712
@kimchy merged commit 792a545 into elastic:master on Jun 18, 2015
@kevinkluge removed the review label on Jun 18, 2015
kimchy added a commit that referenced this pull request Jun 18, 2015
Allow to set delayed allocation timeout on unassigned shards when a node leaves the cluster (close #11712)
@kimchy (Member, Author) commented Jun 18, 2015

Pushed to master and 1.x. @clintongormley, I forgot to add the docs; where do you think it makes sense to document this?

@kimchy deleted the delayed_allocation_2 branch on June 18, 2015 15:34
@clintongormley commented Jun 18, 2015
@kimchy I'd say in the Index Shard Allocation page: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-allocation.html

plus a note on the cluster health page

@bleskes (Contributor) commented Jun 18, 2015

I think it's also good to mention it in the (rolling) upgrade docs: docs/reference/setup/upgrade.asciidoc


szroland pushed a commit to szroland/elasticsearch that referenced this pull request Jun 30, 2015
@lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
Labels: :Distributed/Distributed, >feature, release highlight, v1.7.0, v2.0.0-beta1