Shard allocation to take into account free disk space #3480
Comments
Thanks for opening this issue! I think we will get to this pretty soon, i.e. next week or so
ETA?
@dakrone what's the status of this...
@synhershko I'm currently working on developing this. It's trickier than a usual AllocationDecider (which prevents allocation), because fetching the disk usages and shard sizes is overhead that we don't want to incur for every operation, so the information needs to be cached for a time and refreshed at certain intervals.
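The caching idea described in the comment above can be sketched roughly like this. This is an illustrative Python sketch, not the actual Java `ClusterInfoService`; the class and method names here are hypothetical:

```python
import time


class ClusterInfoCache:
    """Illustrative sketch of interval-based caching (not the actual Java
    ClusterInfoService): hold on to the last fetched disk/shard information
    and only re-fetch once the configured interval has elapsed, so every
    allocation decision does not trigger a fresh (expensive) fetch."""

    def __init__(self, fetch, interval_seconds=30.0):
        self._fetch = fetch              # callable returning current disk/shard info
        self._interval = interval_seconds
        self._cached = None
        self._fetched_at = None          # monotonic timestamp of the last fetch

    def get(self, now=None):
        """Return cached info, refreshing it if the interval has elapsed."""
        now = time.monotonic() if now is None else now
        if self._fetched_at is None or now - self._fetched_at >= self._interval:
            self._cached = self._fetch()
            self._fetched_at = now
        return self._cached
```

The `now` parameter is only there to make the sketch easy to exercise deterministically; the real service would be driven by the master node's scheduler.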
This commit adds two main pieces. The first is a ClusterInfoService, a service running on the master nodes that fetches the total/free bytes for each data node in the cluster as well as the sizes of all shards in the cluster. This information is gathered every 30 seconds by default, and the interval can be changed dynamically via the `cluster.info.update.interval` setting. This ClusterInfoService can hopefully be used in the future to weight nodes for allocation based on their disk usage, if desired.

The second main piece is the DiskThresholdDecider, which can disallow a shard from being allocated to a node, or from remaining on the node, depending on configuration parameters. There are three main configuration parameters for the DiskThresholdDecider:

- `cluster.routing.allocation.disk.threshold_enabled` controls whether the decider is enabled. It defaults to false (disabled). Note that the decider is also disabled for clusters with only a single data node.
- `cluster.routing.allocation.disk.watermark.low` controls the low watermark for disk usage. It defaults to 0.70, meaning ES will not allocate new shards to nodes once they have more than 70% disk used. It can also be set to an absolute byte value (like 500mb) to prevent ES from allocating shards if less than the configured amount of space is available.
- `cluster.routing.allocation.disk.watermark.high` controls the high watermark. It defaults to 0.85, meaning ES will attempt to relocate shards to another node if the node disk usage rises above 85%. It can also be set to an absolute byte value (similar to the low watermark) to relocate shards once less than the configured amount of space is available on the node.

Closes #3480
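Put together, the settings described in the commit message could look something like this in `elasticsearch.yml`. The setting names and defaults are the ones listed above; the values shown are only an illustration:

```yaml
# Enable the DiskThresholdDecider (it defaults to false/disabled)
cluster.routing.allocation.disk.threshold_enabled: true

# Stop allocating new shards to a node once more than 70% of its disk is used
cluster.routing.allocation.disk.watermark.low: 0.70

# Try to relocate shards away from a node once more than 85% of its disk is used
cluster.routing.allocation.disk.watermark.high: 0.85

# Refresh the cached disk usage / shard size information every 30 seconds
cluster.info.update.interval: 30s
```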
hey guys - any chance this decider will participate in other operations that work with the FS, like optimize for example? What we are seeing is large indices being optimized and servers occasionally running very low on disk space because of that. Maybe if an index doesn't have enough room to optimize, a rebalance should kick in?
Hmm, that is a pretty rare condition though. I wonder if we really should have something like this in the core system, or if we should just ask for a custom allocation decider, since deciders can trigger a rebalance on such a condition.
It isn't that rare if you run a large data shop with replicas and all, with data constantly going in. It doesn't happen every day, but it did happen to us.
IMO optimize should be rare in most cases unless you have time-based indices etc. ;)
We do use rolling indexes...
Are you seeing a consistent amount of disk used for the optimize? If you know in advance about how much room you'll need for the optimize, you could set the high watermark for the disk threshold accordingly, and ES should relocate shards if the disk usage passes that watermark.
Since ES can have this kind of info and do the maths for me, I don't see why I need to plan for it in advance. Plus, I don't think ES can relocate a shard which is in the middle of being optimized, and setting the high watermark too high is something we wouldn't want to do either.
@dakrone something that just occurred to us - how would the free-space decider play along with ES's default of trying to have the same number of shards on each node? In our scenario (and I'm assuming this is quite common) we have many data servers, each with different HD capacities, ranging from ~120GB to ~1000GB. I'm pretty sure that if ES tries to balance based on both criteria, something will go very wrong. Did you take that into account? Or should we try breaking this with some nasty tests?
@synhershko since the decider is part of the balancing process, the allocator will attempt to find the "best" weights that still satisfy all of the deciders, so it will try to balance the shards evenly, but will still allow uneven allocation in the event that the disk limit has been reached on a particular machine or set of machines.
It should already be taken into account, but nasty tests are always appreciated! :)
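For illustration, the two watermark decisions discussed above can be sketched like this. This is a hypothetical Python sketch of the behavior described in the commit message, not the actual Java DiskThresholdDecider:

```python
def allocation_decision(disk_used_fraction, low=0.70, high=0.85):
    """Illustrative sketch of the disk-threshold decisions (not the actual
    Java DiskThresholdDecider). Below or at the low watermark, new shards
    may be allocated to the node; above the high watermark, shards should
    be relocated away from the node. Defaults match the documented ones."""
    can_allocate_new = disk_used_fraction <= low
    should_relocate = disk_used_fraction > high
    return can_allocate_new, should_relocate
```

Between the two watermarks a node keeps its existing shards but receives no new ones, which is what allows the balancer to end up with an uneven shard count when a node's disk is nearly full.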
Will try to get to it soon, then
@dakrone just to let you know, we now use this feature in our highly uneven cluster and so far all looks good. It seems like ES still tries to even out the number of shards on each node, but the free-space decider seems to do a good job. Thanks!
Awesome! I'm glad to hear it's working well for you! :D
@synhershko it's great that you came back to us. Are you having any trouble with the balancing, since you mention that it still tries to balance?
@s1monw no, it all seems to be fine. What I meant is that when there's balancing in action, which hardly ever happens now as far as I can tell, it will try to get to an end result where the disk allocation limits are respected AND there is more or less the same number of shards on each node. Which I think makes sense.

Before upgrading, I set out to write some integration tests which would exercise our cluster configuration with the internal moving pieces. I was able to recreate a similar scenario in the tests (different node sizes, different index sizes) and everything worked (shards were allocated, no node was over-allocated etc). I just couldn't find edge cases to test there - all seems to have already been covered in @dakrone's tests, so we dropped this effort and decided to take the pill.

We upgraded from a variant of 0.90.0 (custom compiled with some mods) to vanilla 0.90.7. The upgrade took a while (different Lucene versions) but went smoothly, and once the disk-aware decider was enabled it took the cluster some time to stabilize, but ever since we did that all seems to work fine. I'm leaving the company this week so will probably stop monitoring that cluster, but as I said, so far this looks very good and stable.
Simon says:
More at https://groups.google.com/forum/#!topic/elasticsearch/p-et4UxvcyU