
KAFKA-19048: Minimal movement replica balancing algorithm for reassignment #19858

Open: wants to merge 4 commits into trunk

pjl1070048431

Motivation
Kafka clusters typically require rebalancing of topic replicas after horizontal scaling to evenly distribute the load across new and existing brokers. The current rebalancing approach does not consider the existing replica distribution, often resulting in excessive and unnecessary replica movements. These unnecessary movements increase rebalance duration, consume significant bandwidth and CPU resources, and potentially disrupt ongoing production and consumption operations. Thus, a replica rebalancing strategy that minimizes movements while achieving an even distribution of replicas is necessary.

Goals
The proposed approach prioritizes the following objectives:

Minimal Movement: Minimize the number of replica relocations during rebalancing.
Replica Balancing: Ensure that replicas are evenly distributed across brokers.
Anti-Affinity Support: Support rack-aware allocation when enabled.
Leader Balancing: Distribute leader replicas evenly across brokers.
ISR Order Optimization: Optimize adjacency relationships to prevent failover traffic concentration in case of broker failures.
Leader Stability: Where possible, retain existing leader assignments to reduce leadership churn, provided this does not compromise the first five objectives.
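
The first two goals (minimal movement and even replica distribution) can be illustrated with a small greedy sketch. This is not the algorithm implemented in this PR; it is a simplified illustration under stated assumptions: every broker that currently hosts a replica is included in `brokers`, rack awareness and leader placement are ignored, and the names `rebalance_min_moves` and `count_moves` are hypothetical. The idea is to compute a per-broker replica quota and relocate only replicas sitting on brokers above their quota.

```python
from collections import Counter


def rebalance_min_moves(assignment, brokers):
    """Greedy sketch: move only surplus replicas to under-loaded brokers.

    assignment: dict mapping partition id -> list of broker ids (replicas).
    brokers: list of all broker ids, including newly added ones.
    Assumption: every broker that appears in `assignment` is in `brokers`.
    """
    total = sum(len(replicas) for replicas in assignment.values())
    base, extra = divmod(total, len(brokers))

    # Current number of replicas hosted by each broker.
    load = Counter({b: 0 for b in brokers})
    for replicas in assignment.values():
        for b in replicas:
            load[b] += 1

    # Brokers that already hold the most replicas get the larger quota
    # (base + 1), so fewer replicas have to leave them.
    ranked = sorted(brokers, key=lambda b: -load[b])
    quota = {b: base + (1 if i < extra else 0) for i, b in enumerate(ranked)}

    result = {p: list(replicas) for p, replicas in assignment.items()}
    for p in sorted(result):
        replicas = result[p]
        for i, src in enumerate(replicas):
            if load[src] <= quota[src]:
                continue  # broker is within quota: the replica stays put
            # Destination must be under quota and not already host this partition.
            candidates = [b for b in brokers
                          if load[b] < quota[b] and b not in replicas]
            if not candidates:
                continue  # greedy corner case: leave this replica in place
            dst = min(candidates, key=lambda b: load[b])
            replicas[i] = dst
            load[src] -= 1
            load[dst] += 1
    return result


def count_moves(before, after):
    """Number of replicas that ended up on a broker they were not on before."""
    return sum(len(set(after[p]) - set(before[p])) for p in before)


if __name__ == "__main__":
    # Four partitions with replication factor 2 on brokers 0 and 1,
    # then brokers 2 and 3 are added to the cluster.
    before = {0: [0, 1], 1: [1, 0], 2: [0, 1], 3: [1, 0]}
    after = rebalance_min_moves(before, [0, 1, 2, 3])
    print(after, "moves:", count_moves(before, after))
```

In the example, each of the two new brokers needs two replicas to reach an even share, so four relocations is the lower bound, and the sketch moves exactly four. Being greedy, it may leave a residual imbalance in corner cases where every under-quota broker already hosts the partition.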

The new replica rebalancing strategy aims to achieve the following objectives:

1. Minimal Movement: Minimize the number of replica relocations during rebalancing.
2. Replica Balancing: Ensure that replicas are evenly distributed across brokers.
3. Anti-Affinity Support: Support rack-aware allocation when enabled.
4. Leader Balancing: Distribute leader replicas evenly across brokers.
5. ISR Order Optimization: Optimize adjacency relationships to prevent failover traffic concentration in case of broker failures.
6. Leader Stability: Keep the original partition leader unchanged as much as possible to minimize leader transitions. This objective has a lower priority than the first five.
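
As a rough illustration of how objectives 4 and 6 interact, the sketch below spreads preferred leaders across brokers while keeping the existing leader wherever that broker is still under its leader quota. It is an assumption-laden toy, not the code in this PR: the first broker in each replica list is treated as the preferred leader, the quota is simply the ceiling of partitions per broker, and the function name `balance_leaders` is hypothetical.

```python
import math
from collections import Counter


def balance_leaders(assignment, brokers):
    """Reorder replica lists so leaders are spread out, keeping leaders when possible.

    assignment: dict mapping partition id -> list of broker ids; the first
    entry is treated as the preferred leader (an assumption of this sketch).
    brokers: list of all broker ids in the cluster.
    """
    quota = math.ceil(len(assignment) / len(brokers))
    leaders = Counter()
    result = {}
    pending = []

    # Pass 1 (leader stability): keep the current leader while its broker
    # still has leader quota left.
    for p in sorted(assignment):
        replicas = list(assignment[p])
        if leaders[replicas[0]] < quota:
            leaders[replicas[0]] += 1
            result[p] = replicas
        else:
            pending.append(p)

    # Pass 2 (leader balancing): for the remaining partitions, promote the
    # replica whose broker currently leads the fewest partitions.
    for p in pending:
        replicas = list(assignment[p])
        new_leader = min(replicas, key=lambda b: leaders[b])
        replicas.remove(new_leader)
        result[p] = [new_leader] + replicas
        leaders[new_leader] += 1
    return result


if __name__ == "__main__":
    # All four partitions are initially led by broker 0.
    before = {0: [0, 1], 1: [0, 1], 2: [0, 1], 3: [0, 1]}
    print(balance_leaders(before, [0, 1]))
```

Only the replica order changes here, so no data moves; partitions whose leader is already within quota are left untouched, which is the sense in which leader stability ranks below leader balancing. A ceiling-based quota does not guarantee a perfectly even spread in every case.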
…plica_balancer

# Conflicts:
#	tools/src/main/java/org/apache/kafka/tools/reassign/ReassignPartitionsCommand.java
github-actions bot added the "triage" (PRs from the community) label on May 30, 2025
pjl1070048431 changed the title from "KAFKA-19048: KIP-1151: Minimal movement replica balancing algorithm for reassignment" to "KAFKA-19048: Minimal movement replica balancing algorithm for reassignment" on May 30, 2025
github-actions bot commented Jun 7, 2025

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.
