disk.watermark.high relocates all shards creating a relocation storm #8538

Closed
kstaken opened this issue Nov 18, 2014 · 4 comments

Comments

kstaken commented Nov 18, 2014

I have a relatively large cluster running ES 1.3.5 and we recently started to get low on disk space. The cluster has roughly 110 TB of usable space, of which about 86 TB is used. We have 93 indices, of which 90 are rotating daily indices and the other 3 are permanent. With this setup we end up with shards that range in size from a few MB to over 160 GB, and we're finding that the shard-count-based allocation strategy results in some nodes starting to run out of space earlier than expected, which trips the cluster.routing.allocation.disk.watermark.high setting.

It appears that when this happens "all" of the shards from that node are relocated. This then puts more pressure on the other nodes, and at some point another node will trip cluster.routing.allocation.disk.watermark.high and relocate all of its shards too. This goes on and on and never stops until we intervene and manually cancel the allocations. At times we've seen more than 200 relocations in progress at the same time.

Is the relocation of "all" shards really the intended behavior? Looking at the source code, it appears to do a reroute in this scenario, but this seems like the last thing you would want to do. You're almost guaranteed that if one node is tripping that setting then others are going to be close as well. In our cluster we have room on many of the other nodes, and if it selectively moved a shard or two everything would be fine, but moving all of them is really problematic.

It also seems that when all these relocations occur, cluster.routing.allocation.disk.watermark.low doesn't prevent the overallocation of other nodes. I believe this happens because of the number of relocations started at once: a node is asked multiple times to accept shards within a short period, and all of the requests succeed because pending relocations are not considered when evaluating disk utilization. This pretty much guarantees that the node will eventually trip cluster.routing.allocation.disk.watermark.high and continue the cycle, even though there are other nodes in the cluster with hundreds of GB free.
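To make that failure mode concrete, here is a minimal, self-contained sketch (plain Java; all class and field names are illustrative, none of this is Elasticsearch source) of an allocator that evaluates every shard against a static disk-usage snapshot that ignores relocations already scheduled, which is the dynamic the report describes:

```java
// Minimal sketch of the dynamic described above. All names are illustrative;
// none of this is Elasticsearch source. It assumes each decision is made
// against a static disk-usage snapshot that ignores relocations already
// scheduled -- the reporter's hypothesis for the storm.
import java.util.ArrayList;
import java.util.List;

class RelocationStormSketch {

    static final double LOW = 0.85;   // a node above this should not accept new shards
    static final double HIGH = 0.90;  // a node above this tries to shed shards

    static class Node {
        final String name;
        final long totalBytes;
        final long usedBytes;          // snapshot taken before any moves start
        Node(String name, long totalGb, long usedGb) {
            this.name = name;
            this.totalBytes = totalGb << 30;
            this.usedBytes = usedGb << 30;
        }
        double usage() { return (double) usedBytes / totalBytes; }
    }

    public static void main(String[] args) {
        Node source = new Node("node-1", 1000, 920);  // 92% used -> over HIGH
        Node target = new Node("node-2", 1000, 800);  // 80% used -> under LOW
        long shardBytes = 40L << 30;                  // ten 40 GB shards on node-1
        List<String> scheduled = new ArrayList<>();
        long bytesHeadingToTarget = 0;

        for (int shard = 0; shard < 10; shard++) {
            // canRemain: the snapshot never shrinks as moves are scheduled, so
            // every shard on the over-watermark node is told to leave.
            boolean canRemain = source.usage() < HIGH;

            // canAllocate on the target: the snapshot ignores shards already
            // relocating in, so it keeps saying yes.
            boolean targetAccepts = target.usage() < LOW;

            if (!canRemain && targetAccepts) {
                bytesHeadingToTarget += shardBytes;
                scheduled.add("shard-" + shard + " -> " + target.name);
            }
        }
        // Prints 10 relocations; 400 GB is now in flight to a node with only
        // 200 GB free, so it will trip the high watermark and repeat the cycle.
        System.out.println(scheduled.size() + " relocations scheduled at once: " + scheduled);
        System.out.println((bytesHeadingToTarget >> 30) + " GB in flight to " + target.name);
    }
}
```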

We're in the process of adding additional capacity to our cluster, so we're not at risk of bumping into these limits, but otherwise this behavior seems quite problematic.

dakrone (Member) commented Nov 19, 2014

Okay, it looks like this is happening because the DiskThresholdDecider's canRemain method needs to take currently relocating shards into account, subtracting them from the limit when deciding whether to relocate more shards.

We currently take relocations into the node into account; now we also need to take relocations out of the node into account, and only in the canRemain method.
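A rough sketch of that adjustment, assuming hypothetical NodeView/ShardView accessors for the node's usage and its shards (these are not the actual DiskThresholdDecider APIs):

```java
// Illustrative only -- hypothetical names, not the real DiskThresholdDecider.
// The idea from the comment above: when deciding whether a shard can remain,
// subtract the bytes of shards already relocating away, so the node stops
// scheduling further moves once enough space has been reclaimed.
class CanRemainSketch {

    static final double HIGH_WATERMARK = 0.90;

    /** Bytes of shards on this node that are currently relocating away. */
    static long bytesRelocatingAway(NodeView node) {
        return node.shards().stream()
                .filter(ShardView::isRelocatingAway)
                .mapToLong(ShardView::sizeInBytes)
                .sum();
    }

    /** canRemain with outgoing relocations taken into account. */
    static boolean canRemain(NodeView node) {
        long effectiveUsed = node.usedBytes() - bytesRelocatingAway(node);
        return (double) effectiveUsed / node.totalBytes() < HIGH_WATERMARK;
    }

    // Minimal hypothetical views so the sketch is self-contained.
    interface NodeView {
        long usedBytes();
        long totalBytes();
        java.util.List<ShardView> shards();
    }
    interface ShardView {
        boolean isRelocatingAway();
        long sizeInBytes();
    }
}
```

This matches the direction of the commits referenced later in the thread, which take shards relocating away from a node into account when determining whether a shard can remain.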

s1monw (Contributor) commented Nov 23, 2014

@dakrone I think we only need to change the default for cluster.routing.allocation.disk.include_relocations here, which is currently true. We can also think about adding an enum where you can specify where you want it taken into account, like
cluster.routing.allocation.disk.include_relocations = never | allocation | moves; the naming is hard here, I guess. :)
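Sketched as code, that suggestion would roughly turn the boolean into an enum like the one below. This is purely hypothetical: the value names come from the comment above, and the reading of allocation vs. moves is an assumption, not something settled in this thread.

```java
// Hypothetical sketch of the suggested enum -- not an actual Elasticsearch
// setting. Assumed reading: ALLOCATION = count relocations only when deciding
// where new shards may go; MOVES = count them only when deciding whether
// existing shards must move; NEVER = ignore them entirely.
enum IncludeRelocations {
    NEVER,
    ALLOCATION,
    MOVES;

    /** Parse a configured value such as "moves". */
    static IncludeRelocations parse(String value) {
        return valueOf(value.trim().toUpperCase(java.util.Locale.ROOT));
    }
}
```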

dakrone (Member) commented Nov 24, 2014

@s1monw I don't follow how that will help. I think it will still see all shards as failing the canRemain() decision and relocate them away from the node that goes over the high watermark.

s1monw (Contributor) commented Nov 24, 2014

Oh well, I misunderstood the code; it only takes incoming relocations into account. :/

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Nov 26, 2014
…away from a node in order to determine if a shard can remain.

By taking this into account we can avoid moving more shards away than is necessary.

Closes elastic#8538
Closes elastic#8659
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Nov 26, 2014
…away from a node in order to determine if a shard can remain.

By taking this into account we can avoid moving more shards away than is necessary.

Closes elastic#8538
Closes elastic#8659
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…away from a node in order to determine if a shard can remain.

By taking this into account we can avoid moving more shards away than is necessary.

Closes elastic#8538
Closes elastic#8659