Shard allocation to take into account free disk space #3480

Closed

synhershko opened this issue Aug 10, 2013 · 18 comments · Fixed by #3637

Comments

@synhershko (Contributor):

Simon says:

I can imagine some sort of disk space allocation decider that can restrict a node from allocating any further shards given the used / free disk space, and / or move shards away past a certain limit, etc. We could also make allocation decisions based on the size of the shards, or move shards around once they fill up and we see that certain shards are much bigger than others.

More at https://groups.google.com/forum/#!topic/elasticsearch/p-et4UxvcyU

@ghost assigned s1monw on Aug 11, 2013
@s1monw (Contributor) commented Aug 11, 2013

Thanks for opening this issue! I think we will get to this pretty soon, i.e. next week or so.

@ghost assigned dakrone on Aug 12, 2013
@synhershko (Contributor, Author):

eta?

@s1monw (Contributor) commented Aug 28, 2013

@dakrone what's the status of this...

@dakrone (Member) commented Aug 28, 2013

@synhershko I'm currently working on this. It's trickier than a usual AllocationDecider (which just prevents allocation), because fetching the disk usages and shard sizes is overhead that we don't want to incur on every operation, so the data needs to be cached for a time and refreshed at certain intervals.
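
A minimal sketch of that caching idea in Python, not the actual Java implementation; the class name, the fetch callback, and the 30-second default are illustrative assumptions, not Elasticsearch internals:

```python
import time

class CachedDiskUsage:
    """Cache an expensive per-node disk-usage lookup and refresh it lazily."""

    def __init__(self, fetch_usages, interval_seconds=30.0):
        self._fetch = fetch_usages          # expensive call, e.g. polling node stats
        self._interval = interval_seconds   # how long a cached snapshot stays valid
        self._cached = {}
        self._last_refresh = 0.0

    def get(self):
        # Refresh only when the snapshot is stale, so allocation decisions can
        # consult disk usage without paying the fetch cost every time.
        now = time.time()
        if now - self._last_refresh >= self._interval:
            self._cached = self._fetch()
            self._last_refresh = now
        return self._cached

# usage (fake data): CachedDiskUsage(lambda: {"node1": {"free_bytes": 42_000_000}}).get()
```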

@dakrone closed this as completed in 7d52d58 on Sep 9, 2013
dakrone added a commit that referenced this issue Sep 9, 2013
This commit adds two main pieces, the first is a ClusterInfoService
that provides a service running on the master nodes that fetches the
total/free bytes for each data node in the cluster as well as the
sizes of all shards in the cluster. This information is gathered by
default every 30 seconds, and can be changed dynamically by setting
the `cluster.info.update.interval` setting. This ClusterInfoService
can hopefully be used in the future to weight nodes for allocation
based on their disk usage, if desired.

The second main piece is the DiskThresholdDecider, which can disallow
a shard from being allocated to a node, or from remaining on the node
depending on configuration parameters. There are three main
configuration parameters for the DiskThresholdDecider:

`cluster.routing.allocation.disk.threshold_enabled` controls whether
the decider is enabled. It defaults to false (disabled). Note that the
decider is also disabled for clusters with only a single data node.

`cluster.routing.allocation.disk.watermark.low` controls the low
watermark for disk usage. It defaults to 0.70, meaning ES will not
allocate new shards to nodes once they have more than 70% disk
used. It can also be set to an absolute byte value (like 500mb) to
prevent ES from allocating shards if less than the configured amount
of space is available.

`cluster.routing.allocation.disk.watermark.high` controls the high
watermark. It defaults to 0.85, meaning ES will attempt to relocate
shards to another node if the node disk usage rises above 85%. It can
also be set to an absolute byte value (similar to the low watermark)
to relocate shards once less than the configured amount of space is
available on the node.

Closes #3480
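
For reference, a minimal sketch of how these settings might be applied at runtime through the cluster settings API (the same keys can also go in elasticsearch.yml); the values simply restate the defaults from the commit message above, and the host/port and the use of the Python `requests` library are assumptions for illustration:

```python
import requests

# Enable the disk threshold decider and set both watermarks transiently.
# "http://localhost:9200" is a placeholder for a real cluster address.
settings = {
    "transient": {
        "cluster.routing.allocation.disk.threshold_enabled": True,
        "cluster.routing.allocation.disk.watermark.low": 0.70,   # stop allocating new shards above 70% used
        "cluster.routing.allocation.disk.watermark.high": 0.85,  # relocate shards away above 85% used
    }
}
response = requests.put("http://localhost:9200/_cluster/settings", json=settings)
response.raise_for_status()
print(response.json())
```
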
@synhershko (Contributor, Author):

hey guys - any chance this decider will also participate in other operations that work with the filesystem, like optimize for example? What we are seeing is large indices being optimized and servers occasionally running very low on disk space because of that.

Maybe a rebalance should kick in if an index doesn't have enough room to optimize?

@s1monw (Contributor) commented Sep 30, 2013

Hmm, that is a pretty rare condition though. I wonder if we really should have something like this in the core system, or if we should just ask for a custom allocation decider, since deciders can trigger a rebalance on such a condition via canRemain. Maybe this should go in a different issue - can you open one?

@synhershko (Contributor, Author):

#3807

It isn't that rare if you run a large data shop with replicas and all, with data constantly coming in. It doesn't happen every day, but it did happen to us.

@s1monw (Contributor) commented Sep 30, 2013

IMO optimize should be rare in most cases unless you have time-based indices etc. ;)

@synhershko (Contributor, Author):

We do use rolling indexes...

@dakrone (Member) commented Sep 30, 2013

Are you seeing a consistent amount of disk used for the optimize? If you know in advance about how much room you'll need for the optimize, you could set the high watermark for the disk threshold and ES should relocate shards if the disk usage passes that watermark.
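
A hedged sketch of that suggestion: raise the high watermark to an absolute free-space value sized for the expected optimize overhead, so ES starts relocating shards away before a node runs out of room. The `50gb` figure, host, and use of `requests` are illustrative assumptions, not a recommendation:

```python
import requests

# If an optimize is expected to need roughly 50 GB of scratch space (made-up
# number), require at least that much free space per node: with an absolute
# value, the high watermark triggers relocation once free space drops below it.
payload = {
    "transient": {
        "cluster.routing.allocation.disk.watermark.high": "50gb"
    }
}
requests.put("http://localhost:9200/_cluster/settings", json=payload).raise_for_status()
```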

@synhershko (Contributor, Author):

Since ES can have this kind of info and do the maths for me, I don't see why I need to plan for it in advance. Plus, I don't think ES can relocate a shard that is in the middle of being optimized, and setting the high watermark too high is something we wouldn't want to do either.

@synhershko (Contributor, Author):

@dakrone something that just occurred to us - how would the free space decider play along with ES's default of trying to keep the same number of shards on each node?

In our scenario (and I'm assuming this is quite common) we have many data servers, each with a different HD capacity, ranging from ~120GB to ~1000GB. I'm pretty sure that if ES tries to balance based on both criteria, something will go very wrong.

Did you take that into account? or should we try breaking this with some nasty tests?

@dakrone (Member) commented Nov 15, 2013

@synhershko since the decider is part of the balancing process, the allocator will attempt to find the "best" weights that still satisfy all of the deciders, so it will try to balance the shards evenly, but will still allow uneven allocation in the event that the disk limit has been reached on a particular machine or set of machines.

Did you take that into account? or should we try breaking this with some nasty tests?

It should already be taken into account, but nasty tests are always appreciated! :)
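
To make that interplay concrete, here is a deliberately over-simplified sketch (plain Python, not the real Java allocator): deciders act as hard filters, and among the nodes they allow, the shard goes to the least-loaded one. All names and numbers below are invented for illustration:

```python
def pick_node(shard, nodes, deciders):
    # Deciders veto nodes outright (e.g. not enough free disk)...
    allowed = [n for n in nodes if all(decide(shard, n) for decide in deciders)]
    if not allowed:
        return None  # nothing satisfies the deciders; the shard stays unassigned
    # ...and among the remaining nodes, balancing prefers the fewest shards,
    # so shard counts stay even unless a decider has ruled a node out.
    return min(allowed, key=lambda n: n["shard_count"])

nodes = [
    {"name": "small-disk", "shard_count": 3, "free_ratio": 0.10},
    {"name": "big-disk",   "shard_count": 5, "free_ratio": 0.60},
]
low_watermark = lambda shard, node: node["free_ratio"] > 0.30  # stand-in for the disk decider
print(pick_node("shard-0", nodes, [low_watermark]))  # -> the "big-disk" node despite more shards
```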

@synhershko (Contributor, Author):

Will try to get to it soon, then

@synhershko (Contributor, Author):

@dakrone just to let you know, we now use this feature in our highly uneven cluster and so far all looks good. It seems like ES still tries to even out the number of shards on each node, but the free-space decider seems to do a good job. Thanks!

@dakrone (Member) commented Dec 19, 2013

Awesome! I'm glad to hear it's working well for you! :D

@s1monw (Contributor) commented Dec 19, 2013

@synhershko it's great that you came back to us. Are you having any trouble with the balancing, since you mention that it still tries to balance?

@synhershko (Contributor, Author):

@s1monw no, it all seems to be fine. What I meant is that when balancing is in action, which hardly ever happens now as far as I can tell, it will try to get to an end result where disk allocation limits are respected AND there is more or less the same number of shards on each node. Which I think makes sense.

Before upgrading I set out to write some integration tests that would exercise our cluster configuration with the internal moving pieces. I was able to recreate a similar scenario in the test (different node sizes, different index sizes) and everything worked (shards were allocated, no node was over-allocated, etc.). I just couldn't find edge cases to test there - all of it seems to have already been covered by @dakrone's tests, so we dropped this effort and decided to take the plunge.

We upgraded from a variant of 0.90.0 (custom compiled with some mods) to vanilla 0.90.7. The upgrade took a while (different Lucene versions) but went smoothly, and once the disk-aware decider was enabled it took the cluster a while to stabilize, but ever since then everything seems to work fine.

I'm leaving the company this week so I will probably stop monitoring that cluster, but as I said, so far this looks very good and stable.

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015