
Allow for 'grace period expiration' before shard reallocation? #3569

Closed
diranged opened this issue Aug 25, 2013 · 8 comments

@diranged

It would be really useful to allow for a 'grace period' between when ES notices that a particular node has gone down and when shard reallocation begins. There are times when we might want to do a quick restart of an ES node, or take one down for a full reboot, and we don't want a shard reallocation to kick in because that's a very IO-intensive operation. In our case we also use the ZooKeeper plugin, and a shard reallocation can be triggered by a short communication break between the ES nodes and ZooKeeper.

@s1monw
Contributor

s1monw commented Aug 26, 2013

I think you can simply disable allocation via the cluster settings API (cluster.routing.allocation.disable_allocation). If you have scheduled downtimes, that is the way to go. Does that resolve your issue? If so, can you please close it?
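For reference, a minimal sketch of what that looks like against the cluster settings API, assuming a node reachable on localhost:9200 (the setting name is the one mentioned above; newer ES versions use a different setting, so check the docs for your release):

```
# Disable shard allocation before the planned restart
# (transient settings are not persisted across a full cluster restart)
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true
  }
}'

# Re-enable allocation once the node is back in the cluster
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": false
  }
}'
```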

@diranged
Author

Not really -- there are cases where we may see a little blip in the network that triggers shard reallocation. I think it's a pretty worthwhile feature to be able to say "wait 5m for things to settle before re-allocating shards". This is also useful when you are bringing up more than one additional node of capacity.

@ghost ghost assigned dakrone Aug 27, 2013
@Plasma

Plasma commented Mar 28, 2014

I was curious about this too; I'd like to see ES give me a grace period of something like 5 minutes before deciding to rebalance shards after a node goes offline.

Most likely that node is coming back up (e.g. due to a maintenance reboot of the service or host), and it would be preferable for the cluster to remain in a degraded state in the hope that the node returns, but then consider it dead after a certain timeout and rebalance as needed.

@clintongormley

If a shard disappears briefly then returns, it needs to recover from the primary shard in case anything has changed. Currently, that is likely to mean copying lots of segments over from the primary (as the primary and replica will probably have diverged). Making this fast will not be possible until #6069 is implemented.

We can revisit this issue once #6069 is in.

@clintongormley clintongormley added test and removed test labels Sep 5, 2014
@jlintz

jlintz commented Sep 27, 2014

Would love to see a configurable timeout for this as well. While we can use disable_allocation for planned events, we've had cases where a network blip caused indices to begin reshuffling.

@cfeio

cfeio commented Feb 17, 2015

+1, we would really love to see a configurable timeout as well, or a way to disable this behaviour completely.

In an unplanned outage we would prefer to be able to configure the cluster not to reassign and shuffle replicas, since that leads to a lot of data moving around. As far as I know there is no such setting.

@shyem

shyem commented Feb 19, 2015

+1, this would be really helpful for us: we have a massive amount of data in our ELK cluster, and since we already keep one replica available there is no point in duplicating it yet again during a short outage.

@jpountz
Contributor

jpountz commented Aug 26, 2015

Fixed via #11712
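For anyone landing here later, #11712 added delayed allocation. A rough sketch of how it can be configured, assuming the setting introduced there is index.unassigned.node_left.delayed_timeout (verify the exact name and default for your ES version):

```
# Hypothetical example: delay reallocation of shards from a departed node by 5 minutes,
# applied to all existing indices via the index settings API.
# Assumes index.unassigned.node_left.delayed_timeout is available in your ES version.
curl -XPUT 'http://localhost:9200/_all/_settings' -d '{
  "index.unassigned.node_left.delayed_timeout": "5m"
}'
```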

@jpountz jpountz closed this as completed Aug 26, 2015