disk.watermark.high relocates all shards creating a relocation storm #8538
Comments
Okay, this looks like it's because we currently take relocations into the node into account; now we need to take relocations out of the node into account as well, and only in the …
@dakrone I think we only need to change the default for …
@s1monw I don't follow how that will help. I think it will still see all shards as failing the …
Oh well, I misunderstood the code: it only takes the incoming relocations into account :/
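The fix being discussed, counting shards relocating out of a node when deciding whether its remaining shards must move, can be sketched roughly as follows. This is an illustrative model only, not the actual Elasticsearch decider code; all names and numbers are invented.

```python
# Hypothetical sketch of a disk-threshold "can this shard remain?" check.
# All names and values are illustrative, not actual Elasticsearch code.

HIGH_WATERMARK = 0.90  # relocate shards off a node above 90% disk used


def projected_usage(used, total, incoming_bytes, outgoing_bytes):
    """Disk usage the node will have once in-flight relocations finish."""
    return (used + incoming_bytes - outgoing_bytes) / total


def can_remain(used, total, incoming_bytes=0, outgoing_bytes=0):
    """A shard may stay if projected usage is below the high watermark."""
    return projected_usage(used, total, incoming_bytes, outgoing_bytes) < HIGH_WATERMARK


# Node at 92% used, but 50 units of shards are already moving away,
# so projected usage is 87% and further evacuation stops:
print(can_remain(used=920, total=1000, outgoing_bytes=50))  # True
# Ignoring outgoing relocations, every shard on the node looks like it
# must move, which is the relocation storm reported in this issue:
print(can_remain(used=920, total=1000))  # False
```

Without the `outgoing_bytes` term, a node over the watermark keeps evacuating shards even after enough departures are already in flight to bring it back under the limit.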
…away from a node in order to determine if a shard can remain. By taking this into account we can avoid moving more shards away than necessary. Closes elastic#8538 Closes elastic#8659
I have a relatively large cluster running ES 1.3.5 and we recently started to get low on disk space. The cluster has roughly 110 TB of usable space, of which about 86 TB is used. We have 93 indices, of which 90 are rotating daily indices and the other 3 are permanent. With this setup we end up with shards ranging in size from a few MB to over 160 GB, and we're finding that the shard-count-based allocation strategy causes some nodes to run out of space earlier than expected, which trips the cluster.routing.allocation.disk.watermark.high setting.
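For reference, these are the disk-threshold settings involved. The values below are only an example (percentage watermarks; ES also accepts absolute byte values), not a recommendation:

```yaml
# elasticsearch.yml (example values only)
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%    # stop allocating new shards to the node
cluster.routing.allocation.disk.watermark.high: 90%   # start relocating shards off the node
```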
It appears that when this happens, "all" of the shards on that node are relocated. This puts more pressure on the other nodes, and at some point another node trips cluster.routing.allocation.disk.watermark.high and relocates all of its shards too. This repeats indefinitely until we intervene and manually cancel the relocations. At times we've seen more than 200 relocations in progress simultaneously.
Is relocating "all" shards really the intended behavior? Looking at the source code, it appears to trigger a reroute in this scenario, but that seems like the last thing you would want. You're almost guaranteed that if one node is tripping that setting, others are close to it as well. In our cluster there is room on many of the other nodes; if it selectively moved a shard or two, everything would be fine, but moving all of them is really problematic.
It also seems that when all these relocations occur, cluster.routing.allocation.disk.watermark.low doesn't prevent over-allocating the other nodes. I believe this happens because so many relocations start at once: a node is asked to accept shards multiple times within a short period, and every request succeeds because pending relocations aren't counted against disk utilization. This pretty much guarantees that the node will eventually trip cluster.routing.allocation.disk.watermark.high and continue the cycle, even though other nodes in the cluster have hundreds of GB free.
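The overcommit described above can be shown with a toy simulation. This is not Elasticsearch code; all numbers, names, and the acceptance rule are invented for illustration:

```python
# Toy simulation of a target node receiving several relocation requests in
# quick succession. If pending (in-flight) incoming relocations are ignored,
# every request passes the low-watermark check and the node overcommits.

LOW_WATERMARK = 0.85
TOTAL = 1000  # node capacity, arbitrary units


def accepts(used, pending, count_pending):
    """Low-watermark check: optionally count pending incoming shard bytes."""
    committed = used + (pending if count_pending else 0)
    return committed / TOTAL < LOW_WATERMARK


def simulate(count_pending, shard_size=50, used=700, requests=5):
    """Return (requests accepted, final disk usage once relocations land)."""
    pending = 0
    accepted = 0
    for _ in range(requests):
        if accepts(used, pending, count_pending):
            pending += shard_size
            accepted += 1
    return accepted, (used + pending) / TOTAL


# Ignoring pending relocations, the node accepts all five requests and ends
# at 95% used, past a 90% high watermark, so the cycle continues:
print(simulate(count_pending=False))  # (5, 0.95)
# Counting them, it stops accepting once committed space reaches the limit:
print(simulate(count_pending=True))   # (3, 0.85)
```

The difference between the two runs is exactly the behavior reported: with no accounting for in-flight relocations, each individual decision looks safe, but the sum of them pushes the target node over the high watermark.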
We're in the process of adding additional capacity to our cluster, so we're not at immediate risk of hitting these limits, but otherwise this behavior seems quite problematic.