Improve primary shards balancing/reduce primary shard write overhead #15919

Open
hlcianfagna opened this issue Apr 25, 2024 · 1 comment

Problem Statement

With the current shard allocation and balancing mechanisms, primary shards can end up unevenly distributed. For instance, in a 2-node cluster with a table of 4 shards and 1 replica, one node can receive 3 primaries and 1 replica while the other receives 1 primary and 3 replicas, instead of 2 primaries and 2 replicas on each.
In the large majority of cases this is not a problem, but on very busy systems ingestion can degrade by up to 25%, because nodes holding more primary shards become fully utilized on the CPU while nodes holding fewer primaries and mostly replica shards do not.
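
To make the example concrete, here is a minimal sketch that models the allocation described above and counts primaries per node. The node names and the tuple layout are assumptions made for illustration only; they are not taken from CrateDB.

```python
from collections import Counter

# Allocation from the example: (shard_id, is_primary, node).
# "node1"/"node2" are hypothetical node names.
ALLOCATION = [
    (0, True, "node1"), (1, True, "node1"), (2, True, "node1"), (3, False, "node1"),
    (0, False, "node2"), (1, False, "node2"), (2, False, "node2"), (3, True, "node2"),
]


def primaries_per_node(allocation):
    """Count the primary shards held by each node."""
    return Counter(node for _, is_primary, node in allocation if is_primary)


def shards_per_node(allocation):
    """Count all shards (primaries and replicas) held by each node."""
    return Counter(node for _, _, node in allocation)


print(dict(shards_per_node(ALLOCATION)))     # {'node1': 4, 'node2': 4}
print(dict(primaries_per_node(ALLOCATION)))  # {'node1': 3, 'node2': 1}
```

Both nodes hold 4 shards, so the cluster looks balanced by shard count even though the primaries are split 3 to 1. On a live cluster, grouping `sys.shards` by node and by the `primary` column should give the same picture.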

The main reason primary shards aren't evenly balanced is that none of the current balancing logic (nor related settings such as cluster.routing.allocation.balance.index/shard) distinguishes between primary and replica shards. The settings that control cluster/index.total_shards_per_node do not distinguish between primaries and replicas either. Using those settings for shard balancing would be a workaround anyway: they are intended as a protection rather than a control mechanism, and they can lead to a situation where no shards can be allocated at all.
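
The effect can be illustrated with a simplified node weight in the style of an Elasticsearch-like balancer. This is a sketch under assumptions, not CrateDB's actual code: the function name, the parameter names, and the 0.45/0.55 factors are illustrative stand-ins for the balance.shard and balance.index settings.

```python
def node_weight(shards_on_node, table_shards_on_node,
                avg_shards, avg_table_shards,
                shard_balance=0.45, index_balance=0.55):
    """Simplified, primary-agnostic node weight.

    Positive means the node holds more than its share, negative means less.
    Note that nothing here says whether a shard is a primary or a replica.
    """
    total = shard_balance + index_balance
    return (
        shard_balance / total * (shards_on_node - avg_shards)
        + index_balance / total * (table_shards_on_node - avg_table_shards)
    )


# 2-node example: each node holds 4 of the table's 8 shards.
# node1 has 3 primaries + 1 replica, node2 has 1 primary + 3 replicas,
# but the inputs to the weight are identical for both.
weight_node1 = node_weight(4, 4, avg_shards=4.0, avg_table_shards=4.0)
weight_node2 = node_weight(4, 4, avg_shards=4.0, avg_table_shards=4.0)
print(weight_node1, weight_node2)  # 0.0 0.0
```

Because both weights are equal, a balancer of this shape sees nothing to fix, regardless of how the primaries are distributed.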

Related to elastic/elasticsearch#41543, elastic/elasticsearch#17213, #14594.

Possible Solutions

  1. Reduce the primary write load so that it is almost the same as that of a replica write.
  2. Introduce primary-only balancing, e.g. by backporting the related changes from OpenSearch (since they do segment-based replication, their primaries carry more load in general).
  3. Improve the balancing logic to take write load into account, similar to what Elasticsearch did in "Improve shard balancing" (elastic/elasticsearch#91603), although that change seems to target a somewhat different problem: hot nodes in general rather than primary vs. replica shard distribution.
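
As a rough illustration of option 2, the sketch below adds a primary term to a simplified node weight. Everything in it is hypothetical: `primary_balance`, its value of 0.5, and the function itself are invented for this example and do not reflect the OpenSearch implementation or any existing CrateDB setting.

```python
def primary_aware_weight(shards, table_shards, primaries,
                         avg_shards, avg_table_shards, avg_primaries,
                         shard_balance=0.45, index_balance=0.55,
                         primary_balance=0.5):
    """Simplified node weight with an extra, hypothetical primary term."""
    total = shard_balance + index_balance + primary_balance
    return (
        shard_balance / total * (shards - avg_shards)
        + index_balance / total * (table_shards - avg_table_shards)
        + primary_balance / total * (primaries - avg_primaries)
    )


# 2-node example: 4 shards per node, 4 primaries in total (average 2 per node).
node1 = primary_aware_weight(4, 4, 3, 4.0, 4.0, 2.0)  # 3 primaries
node2 = primary_aware_weight(4, 4, 1, 4.0, 4.0, 2.0)  # 1 primary
print(node1 > 0 > node2)  # True
```

With the extra term, the node holding 3 primaries gets a positive weight and the node holding 1 primary a negative one, so a balancer would have a signal to act on even though the total shard counts are equal.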

Considered Alternatives

  • Disable automatic balancing of primary shards by setting cluster.routing.rebalance.enable to replicas.
  • Use ALTER TABLE ... REROUTE commands to redistribute primary shards.
  • Re-enable automatic balancing by resetting cluster.routing.rebalance.enable.
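
The three steps above can be sketched as an ordered list of SQL statements. The helper name, the table `doc.metrics`, and the node names are hypothetical. Which REROUTE sub-command fits depends on the cluster: the MOVE SHARD statement in the usage example assumes the target node does not already hold a copy of that shard.

```python
def rebalance_workaround(reroute_statements):
    """Return the SQL statements for the manual workaround, in order.

    reroute_statements: ALTER TABLE ... REROUTE statements chosen by the
    operator for their own cluster (this helper does not pick them).
    """
    return [
        # Step 1: only replicas may be rebalanced automatically.
        "SET GLOBAL \"cluster.routing.rebalance.enable\" = 'replicas';",
        # Step 2: manual redistribution of primary shards.
        *reroute_statements,
        # Step 3: restore the default rebalancing behaviour.
        'RESET GLOBAL "cluster.routing.rebalance.enable";',
    ]


# Hypothetical table and node names, for illustration only.
plan = rebalance_workaround([
    "ALTER TABLE doc.metrics REROUTE MOVE SHARD 0 FROM 'node1' TO 'node3';",
])
for statement in plan:
    print(statement)
```

Keeping the reset as the last element makes it harder to forget re-enabling automatic balancing once the manual moves are done.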