
Updates causing hotspots in cluster when multiple primary shards for an index exist on a single node #8149

Closed
ppf2 opened this issue Oct 17, 2014 · 7 comments


@ppf2
Member

ppf2 commented Oct 17, 2014

Some of our end users are seeing "hotspots" in the cluster when multiple primary shards for an index are allocated to a single node (unbalanced primary shards). This becomes an issue when there are a lot of updates, since each update is routed to the primary shard first. The http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-update-settings.html#_balanced_shards weight settings do not always help and require re-tuning when new nodes or new shards are added. It would be nice to provide a setting similar to http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-allocation.html#_total_shards_per_node, but one that applies only to primary shards, e.g. total_primary_shards_per_node, which would allow the end user to define the maximum number of primary shards an index can have on a single node. This would ensure that multiple primary shards of the same index are not allocated to the same node.
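For context, a minimal sketch contrasting the existing per-index limit with the proposed primary-only variant; `total_primary_shards_per_node` is hypothetical and does not exist today:

```
# Existing hard limit: at most 2 shards (primary or replica) of this index per node
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'

# Proposed (hypothetical) primary-only variant requested in this issue
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index.routing.allocation.total_primary_shards_per_node": 1
}'
```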

@clintongormley

@ppf2 The total_shards_per_node options scare me a bit because they are a hard limit, and can make it impossible to assign shards.

It seems like the cluster.routing.allocation.balance.primary setting should do what you are after, but perhaps the value needs some tuning. Could you give more detail about why this particular option doesn't work well enough?
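For example, the primary weight can be raised dynamically relative to the shard and index weights (a sketch; 0.3 is only an illustrative value, not a recommendation):

```
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.balance.primary": 0.3
  }
}'
```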

I'd prefer to keep the number of options small, as the more you add, the more difficult it is to reason about the outcome.

@clintongormley

@ppf2 So it looks like balance.primary tries to balance ALL primary shards, rather than just the primary shards within each index. I've also seen problems where I add a new node and it receives mostly replicas.

So there is no doubt that this heuristic needs improvement. But the total* setting is a blunt weapon, and, like most weapons, can harm the person who wields it.

Part of the problem is that the rebalancing algorithm is a bit of a black box. I've opened #8170 to see if we can improve that situation.

@ppf2
Member Author

ppf2 commented Oct 20, 2014

Thanks @clintongormley. Agreed that the total* setting, if not used properly, can leave primary shards unallocated. It would be nice to provide some way (not necessarily the total* setting) in the future to balance primary shards per index per node, something more automatic that doesn't require manually rerouting/moving the shards around.
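For reference, the manual workaround today is the cluster reroute API; a minimal sketch, with index, shard and node names as placeholders:

```
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "move": {
        "index": "my_index", "shard": 3,
        "from_node": "node_hot", "to_node": "node_cold"
    }}
  ]
}'
```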

@s1monw
Contributor

s1monw commented Oct 21, 2014

There are several reasons why I think we need to solve this problem differently. The problem here is really not the balance function or anything in that code; it's the fact that updates put extra pressure on the primary shard and therefore don't scale very well. I am against a setting like this because it can bring your cluster into a very bad state where it can potentially not allocate a primary at all, or where we need to move a shard from one node to another to make space for a primary, which is the wrong thing to do.

Being a primary is really only a boolean flag, and flipping it in order to make another healthy replica the primary should be simple and, IMO, is the way to go. It might be part of the balancing algorithm, but not via an allocation decider or anything along those lines. The problem here is really the update feature, not the balancing algorithm.
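A sketch of how such a flip can be forced by hand today, assuming the reroute API's cancel command with allow_primary (cancelling the primary's allocation promotes an in-sync replica); index, shard and node names are placeholders:

```
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "cancel": {
        "index": "my_index", "shard": 0,
        "node": "node_with_too_many_primaries",
        "allow_primary": true
    }}
  ]
}'
```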

@ppf2 ppf2 changed the title Add a total_primary_shards_per_node setting Updates causing hotspots in cluster when multiple primary shards for an index exist on a single node Oct 25, 2014
@clintongormley

See #8369

@s1monw
Contributor

s1monw commented Feb 20, 2015

Removing the discuss tag since we are all on the same page here and we have #8369.

@s1monw s1monw removed the discuss label Feb 20, 2015
@clintongormley

Closing in favour of #8369
