disk.watermark.high relocates all shards creating a relocation storm #8538
Comments
Okay, this looks like it's because we currently take relocations into the node into account; now we need to take relocations out of the node into account as well, and only in the …
@dakrone I think we only need to change the default for …
@s1monw I don't follow how that will help. I think it will still see all shards as failing the …
Oh well, I misunderstood the code: it only takes the incoming relocations into account :/
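The fix being discussed, counting shards relocating out of a node when deciding whether its remaining shards must move, can be sketched roughly as follows. This is an illustrative model only, not the actual Elasticsearch decider code; all names and numbers are invented.

```python
# Hypothetical sketch of a disk-threshold "can this shard remain?" check.
# All names and values are illustrative, not actual Elasticsearch code.

HIGH_WATERMARK = 0.90  # relocate shards off a node above 90% disk used


def projected_usage(used, total, incoming_bytes, outgoing_bytes):
    """Disk usage the node will have once in-flight relocations finish."""
    return (used + incoming_bytes - outgoing_bytes) / total


def can_remain(used, total, incoming_bytes=0, outgoing_bytes=0):
    """A shard may stay if projected usage is below the high watermark."""
    return projected_usage(used, total, incoming_bytes, outgoing_bytes) < HIGH_WATERMARK


# Node at 92% used, but 50 units of shards are already moving away,
# so projected usage is 87% and further evacuation stops:
print(can_remain(used=920, total=1000, outgoing_bytes=50))  # True
# Ignoring outgoing relocations, every shard on the node looks like it
# must move, which is the relocation storm reported in this issue:
print(can_remain(used=920, total=1000))  # False
```

Without the `outgoing_bytes` term, a node over the watermark keeps evacuating shards even after enough departures are already in flight to bring it back under the limit.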
…away from a node in order to determine if a shard can remain. By taking this into account we can avoid moving more shards away than necessary. Closes elastic#8538 Closes elastic#8659
I have a relatively large cluster running ES 1.3.5 and we recently started to get low on disk space. The cluster has roughly 110 TB of usable space, of which about 86 TB is used. We have 93 indices, of which 90 are rotating daily indices and the other 3 are permanent. With this setup we end up with shards ranging in size from a few MB to over 160 GB, and we're finding that the shard-count-based allocation strategy causes some nodes to run out of space earlier than expected, which trips the cluster.routing.allocation.disk.watermark.high setting.
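For reference, these are the disk-threshold settings involved. The values below are only an example (percentage watermarks; ES also accepts absolute byte values), not a recommendation:

```yaml
# elasticsearch.yml (example values only)
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%    # stop allocating new shards to the node
cluster.routing.allocation.disk.watermark.high: 90%   # start relocating shards off the node
```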
It appears that when this happens, "all" of the shards on that node are relocated. This puts more pressure on the other nodes, and at some point another node trips cluster.routing.allocation.disk.watermark.high and relocates all of its shards too. This repeats indefinitely until we intervene and manually cancel the relocations. At times we've seen more than 200 relocations in progress simultaneously.
Is relocating "all" shards really the intended behavior? Looking at the source code, it appears to trigger a reroute in this scenario, but that seems like the last thing you would want. You're almost guaranteed that if one node is tripping that setting, others are close to it as well. In our cluster there is room on many of the other nodes; if it selectively moved a shard or two, everything would be fine, but moving all of them is really problematic.
It also seems that when all these relocations occur, cluster.routing.allocation.disk.watermark.low doesn't prevent over-allocating the other nodes. I believe this happens because so many relocations start at once: a node is asked to accept shards multiple times within a short period, and every request succeeds because pending relocations aren't counted against disk utilization. This pretty much guarantees that the node will eventually trip cluster.routing.allocation.disk.watermark.high and continue the cycle, even though other nodes in the cluster have hundreds of GB free.
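The overcommit described above can be shown with a toy simulation. This is not Elasticsearch code; all numbers, names, and the acceptance rule are invented for illustration:

```python
# Toy simulation of a target node receiving several relocation requests in
# quick succession. If pending (in-flight) incoming relocations are ignored,
# every request passes the low-watermark check and the node overcommits.

LOW_WATERMARK = 0.85
TOTAL = 1000  # node capacity, arbitrary units


def accepts(used, pending, count_pending):
    """Low-watermark check: optionally count pending incoming shard bytes."""
    committed = used + (pending if count_pending else 0)
    return committed / TOTAL < LOW_WATERMARK


def simulate(count_pending, shard_size=50, used=700, requests=5):
    """Return (requests accepted, final disk usage once relocations land)."""
    pending = 0
    accepted = 0
    for _ in range(requests):
        if accepts(used, pending, count_pending):
            pending += shard_size
            accepted += 1
    return accepted, (used + pending) / TOTAL


# Ignoring pending relocations, the node accepts all five requests and ends
# at 95% used, past a 90% high watermark, so the cycle continues:
print(simulate(count_pending=False))  # (5, 0.95)
# Counting them, it stops accepting once committed space reaches the limit:
print(simulate(count_pending=True))   # (3, 0.85)
```

The difference between the two runs is exactly the behavior reported: with no accounting for in-flight relocations, each individual decision looks safe, but the sum of them pushes the target node over the high watermark.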
We're in the process of adding additional capacity to our cluster, so we're not at immediate risk of hitting these limits, but otherwise this behavior seems quite problematic.