
Disk allocation not periodically checking for whether thresholds/watermarks are exceeded #8146

Closed
ppf2 opened this issue Oct 17, 2014 · 3 comments · Fixed by #8270

ppf2 (Member) commented Oct 17, 2014

Repro video below.

In short:

There is an index with shards allocated to 2 nodes. The cluster settings have cluster.routing.allocation.disk.watermark.low set to 80% and watermark.high set to 90%. Disk usage on node_local is at 76%; node_vm is at 83%.

You can skip the video from 0:30 to 2:38; that is when I incrementally add large files to the disk on node_vm so that it uses more than 90% of the disk. After 2:38, you will see that disk usage has climbed past 90%, but no shards are relocated from node_vm to node_local. However, if I wait 30+ seconds after the disk hits 90%+ usage and then set the same cluster.routing.allocation.disk.* settings again, the relocation happens immediately. This suggests that we are not periodically checking whether the disk has filled up beyond the high watermark; relocation only happens after manual intervention, either by:

  • Setting the same cluster.routing.allocation.disk.* settings again, or
  • Using the reroute command with no actions, e.g. POST /_cluster/reroute

https://drive.google.com/file/d/0BzqaicoBfqMfdXdSWnI2OFp0eXM/view?usp=sharing

The workaround is to call the reroute command periodically (or re-apply the same settings).
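Until a fix lands, that periodic reroute can be scripted against the cluster. A minimal sketch using the 1.x Java client is below; the class name and the 60-second schedule are illustrative choices, not part of Elasticsearch:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.client.Client;

// Hypothetical helper that issues an empty reroute (the same as
// POST /_cluster/reroute with no commands) on a fixed schedule so the
// disk-threshold decider re-evaluates current disk usage.
public class PeriodicReroute {
    public static ScheduledExecutorService schedule(final Client client) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                client.admin().cluster().prepareReroute().execute().actionGet();
            }
        }, 0, 60, TimeUnit.SECONDS);
        return scheduler;
    }
}
```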

s1monw (Contributor) commented Oct 21, 2014

I agree this is not how it should be... I think it would make sense to issue reroute commands from the ClusterInfoService if needed. I guess we can take the actual settings that we are using into account, i.e. we should issue reroutes at least once we have crossed the high watermark on any node... @dakrone WDYT?

dakrone (Member) commented Oct 21, 2014

@s1monw I think it should be implemented differently. The ClusterInfoService should be entirely separated from the internals of shard allocation; its job should only be to gather information and make it available as needed. That makes it a bit easier to use for other purposes in the future where we may need to gather information (and not operate on it at all) at specific intervals.

I think a better way to implement it would be to allow the ClusterInfoService to register listeners that are called whenever new information is gathered; then we could register a standalone handler that is responsible for knowing about the high watermark and kicking off the reroute when necessary.
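Roughly what such an API could look like (just a sketch; the `Listener`/`onNewInfo`/`addListener` names here are illustrative, not a final design):

```java
import org.elasticsearch.cluster.ClusterInfo;

// Sketch of a listener hook on the ClusterInfoService: the service only
// gathers information, and interested parties register for updates.
public interface ClusterInfoService {

    /** Called every time fresh disk-usage / shard-size information has been gathered. */
    interface Listener {
        void onNewInfo(ClusterInfo info);
    }

    /** The most recently gathered cluster information. */
    ClusterInfo getClusterInfo();

    /** Register a listener that is notified whenever new information arrives. */
    void addListener(Listener listener);
}
```

A standalone handler owned by the disk-threshold logic could then implement `Listener`, compare each node's usage against the high watermark, and trigger an empty reroute when it is exceeded.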

What do you think about this?

s1monw (Contributor) commented Oct 21, 2014

Yeah, I think that is the right way to do it, abstraction-wise...

dakrone added a commit to dakrone/elasticsearch that referenced this issue Oct 31, 2014
This adds a Listener interface to the ClusterInfoService; it is used
by the DiskThresholdDecider, which adds a listener to check for nodes
passing the high watermark. If a node is past the high watermark, an
empty reroute is issued so shards can be reallocated if desired.

A reroute will only be issued once every
`cluster.routing.allocation.disk.reroute_interval`, which is "60s" by
default.

Refactors InternalClusterInfoService to delegate the nodes stats and
indices stats gathering into separate methods so they can be overridden
by extending classes. Each stat-gathering method returns a
CountDownLatch that can be used to wait until processing for that part
has succeeded before calling the listeners.

Fixes elastic#8146
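The throttling described in the commit message amounts to remembering when the last listener-triggered reroute was issued and skipping new ones until `cluster.routing.allocation.disk.reroute_interval` has elapsed. A rough sketch of that idea (field and method names are illustrative, not taken from the actual change):

```java
// Illustrative throttle: allow at most one empty reroute per configured
// interval (60s by default), no matter how often new cluster info arrives.
class RerouteThrottle {
    private final long rerouteIntervalMillis;
    private long lastRerouteMillis = 0L;

    RerouteThrottle(long rerouteIntervalMillis) {
        this.rerouteIntervalMillis = rerouteIntervalMillis; // e.g. 60_000 for "60s"
    }

    /** Returns true (and records the time) if a reroute may be issued now. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - lastRerouteMillis >= rerouteIntervalMillis) {
            lastRerouteMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

The listener would call `tryAcquire(System.currentTimeMillis())` each time it sees a node over the high watermark and only issue the reroute when it returns true.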
dakrone added a commit that referenced this issue Oct 31, 2014
dakrone added a commit that referenced this issue Nov 13, 2014
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015