
Disk allocation not periodically checking for whether thresholds/watermarks are exceeded #8146

Closed
ppf2 opened this issue Oct 17, 2014 · 3 comments · Fixed by #8270

ppf2 (Member) commented Oct 17, 2014

Repro video below.

In short:

There is an index with shards allocated to 2 nodes. The cluster settings have cluster.routing.allocation.disk.watermark.low set to 80% and watermark.high set to 90%. Disk usage on node_local is at 76%; node_vm is at 83%.

You can skip the video from 0:30 to 2:38; that is when I incrementally add large files to the disk on node_vm so that it uses more than 90% of the disk. After 2:38, you will see that disk usage has climbed past 90%, but no shards are relocated from node_vm to node_local. However, if I wait 30+ seconds after the disk hits 90%+ usage and then set the same cluster.routing.allocation.disk.* settings again, the relocation happens immediately. This suggests that we are not periodically checking whether the disk has filled up beyond the high watermark; relocation only happens after manual intervention, either by:

  • Setting the same cluster.routing.allocation.disk.* settings again, or
  • Using the reroute command with no actions, e.g. POST /_cluster/reroute

https://drive.google.com/file/d/0BzqaicoBfqMfdXdSWnI2OFp0eXM/view?usp=sharing

The workaround is to call the reroute command periodically (or re-apply the same settings).
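Until a fix lands, that periodic reroute can be scripted against the cluster. A minimal sketch using the 1.x Java client is below; the class name and the 60-second schedule are illustrative choices, not part of Elasticsearch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.client.Client;

// Hypothetical helper that issues an empty reroute (the same as
// POST /_cluster/reroute with no commands) on a fixed schedule so the
// disk-threshold decider re-evaluates current disk usage.
public class PeriodicReroute {
    public static ScheduledExecutorService schedule(final Client client) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                client.admin().cluster().prepareReroute().execute().actionGet();
            }
        }, 0, 60, TimeUnit.SECONDS);
        return scheduler;
    }
}
```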

s1monw (Contributor) commented Oct 21, 2014

I agree this is not how it should be... I think it would make sense to issue reroute commands from the ClusterInfoService if needed. I guess we can take the actual settings that we are using into account, i.e. we should issue reroutes at least once we have crossed the high watermark on any node... @dakrone WDYT?

dakrone (Member) commented Oct 21, 2014

@s1monw I think it should be implemented differently. The ClusterInfoService should be entirely separated from the internals of shard allocation; its job should only be to gather information and make it available as needed. That makes it a bit easier to use for other purposes in the future where we may need to gather information (and not operate on it at all) at specific intervals.

I think a better way to implement it would be to allow the ClusterInfoService to register listeners that are called whenever new information is gathered; then we could register a standalone handler that is responsible for knowing about the high watermark and kicking off the reroute when necessary.
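Roughly what such an API could look like (just a sketch; the `Listener`/`onNewInfo`/`addListener` names here are illustrative, not a final design):

```java
import org.elasticsearch.cluster.ClusterInfo;

// Sketch of a listener hook on the ClusterInfoService: the service only
// gathers information, and interested parties register for updates.
public interface ClusterInfoService {

    /** Called every time fresh disk-usage / shard-size information has been gathered. */
    interface Listener {
        void onNewInfo(ClusterInfo info);
    }

    /** The most recently gathered cluster information. */
    ClusterInfo getClusterInfo();

    /** Register a listener that is notified whenever new information arrives. */
    void addListener(Listener listener);
}
```

A standalone handler owned by the disk-threshold logic could then implement `Listener`, compare each node's usage against the high watermark, and trigger an empty reroute when it is exceeded.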

What do you think about this?

s1monw (Contributor) commented Oct 21, 2014

Yeah, I think that is the right way to do it, abstraction-wise...

dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 31, 2014
This adds a Listener interface to the ClusterInfoService; it is used
by the DiskThresholdDecider, which adds a listener to check for nodes
passing the high watermark. If a node is past the high watermark, an
empty reroute is issued so shards can be reallocated if desired.

A reroute will only be issued once every
`cluster.routing.allocation.disk.reroute_interval`, which is "60s" by
default.

Refactors InternalClusterInfoService to delegate the nodes stats and
indices stats gathering into separate methods so they can be overridden
by extending classes. Each stat-gathering method returns a
CountDownLatch that can be used to wait until processing for that part
has succeeded before calling the listeners.

Fixes elastic#8146
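The throttling described in the commit message amounts to remembering when the last listener-triggered reroute was issued and skipping new ones until `cluster.routing.allocation.disk.reroute_interval` has elapsed. A rough sketch of that idea (field and method names are illustrative, not taken from the actual change):

```java
// Illustrative throttle: allow at most one empty reroute per configured
// interval (60s by default), no matter how often new cluster info arrives.
class RerouteThrottle {
    private final long rerouteIntervalMillis;
    private long lastRerouteMillis = 0L;

    RerouteThrottle(long rerouteIntervalMillis) {
        this.rerouteIntervalMillis = rerouteIntervalMillis; // e.g. 60_000 for "60s"
    }

    /** Returns true (and records the time) if a reroute may be issued now. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - lastRerouteMillis >= rerouteIntervalMillis) {
            lastRerouteMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

The listener would call `tryAcquire(System.currentTimeMillis())` each time it sees a node over the high watermark and only issue the reroute when it returns true.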
dakrone added a commit that referenced this issue Oct 31, 2014
dakrone added a commit that referenced this issue Nov 13, 2014
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015