KAFKA-19048: Minimal movement replica balancing algorithm for reassignment #19858
Motivation
Kafka clusters typically require rebalancing of topic replicas after horizontal scaling to evenly distribute the load across new and existing brokers. The current rebalancing approach does not consider the existing replica distribution, often resulting in excessive and unnecessary replica movements. These unnecessary movements increase rebalance duration, consume significant bandwidth and CPU resources, and potentially disrupt ongoing production and consumption operations. Thus, a replica rebalancing strategy that minimizes movements while achieving an even distribution of replicas is necessary.
Goals
The proposed approach prioritizes the following objectives:
Minimal Movement: Minimize the number of replica relocations during rebalancing.
Replica Balancing: Ensure that replicas are evenly distributed across brokers.
Anti-Affinity Support: Support rack-aware allocation when enabled.
Leader Balancing: Distribute leader replicas evenly across brokers.
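ISR Order Optimization: Optimize adjacency relationships between brokers in each partition's replica order so that, when a broker fails, failover traffic is spread across the remaining brokers rather than concentrated on a few.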
Leader Stability: Where possible, retain existing leader assignments to reduce leadership churn, provided this does not compromise the first five objectives.
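To make the interplay between the Minimal Movement and Replica Balancing goals concrete, here is a minimal sketch that computes a lower bound on the number of replica moves needed to even out per-broker replica counts. It is an illustration only, not the algorithm implemented in this PR: the class `MinimalMovementSketch` and the method `minMoves` are hypothetical names, and the sketch assumes at least one broker and ignores rack awareness, leader placement, replica order, and the constraint that two replicas of one partition cannot share a broker.

```java
import java.util.Arrays;

public class MinimalMovementSketch {

    /**
     * Hypothetical helper: lower bound on the number of replica moves needed
     * to bring every broker to floor(total / n) or ceil(total / n) replicas.
     * Input is the current replica count per broker (new brokers count as 0).
     */
    public static int minMoves(int[] replicasPerBroker) {
        int n = replicasPerBroker.length;
        if (n == 0) {
            return 0; // nothing to balance
        }
        int total = Arrays.stream(replicasPerBroker).sum();
        int base = total / n;   // every broker holds at least this many
        int extra = total % n;  // this many brokers hold one more

        int[] sorted = replicasPerBroker.clone();
        Arrays.sort(sorted);    // ascending order

        int moves = 0;
        for (int i = 0; i < n; i++) {
            // Give the larger target to the most loaded brokers, so that
            // as few replicas as possible have to leave them.
            int target = (i >= n - extra) ? base + 1 : base;
            if (sorted[i] > target) {
                moves += sorted[i] - target;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        // Three existing brokers with 4 replicas each, plus one new empty broker.
        System.out.println(minMoves(new int[] {4, 4, 4, 0}));
    }
}
```

In the example in `main`, the balanced target is 3 replicas per broker, so each existing broker gives up one replica and only 3 of the 12 replicas move. An assignment computed without regard to the existing distribution could relocate many more.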