Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-16277: AbstractStickyAssignor - Sort owned TopicPartitions by partition when reassigning #15416

Merged
merged 4 commits into from Feb 26, 2024

Conversation

credpath-seek
Copy link
Contributor

@credpath-seek credpath-seek commented Feb 22, 2024

Context

Treats KAFKA-16277 - CooperativeStickyAssignor does not spread topics evenly among consumer group
https://issues.apache.org/jira/browse/KAFKA-16277

@ableegoldman :

I suspect the assignor could be making a better effort. Presumably what is happening is that during the phase where it attempts to re-assign previously-owned partitions back to their former owner, we make a pass over a sorted list of previously-owned partitions that is grouped by topic. The assignor will then assign partitions from this list one-by-one to its previous owner until it hits the expected total number of partitions. So in the scenario you describe, it's basically looping over (t1p0, t1p1, t1p2, t1p3...t1pN, t2p0, t2p1, t2p2...t2pN) and assigning the first N partitions to the first consumer, which would be everything from topic 1, then just dumping the remaining partitions – all of which belong to topic 2 – onto the new consumer.
The fix should be fairly simple – we just need to group this sorted list by partition, rather than by topic (ie t1p0, t2p0, t1p1, t2p1...t1pN, t2pN).

Test strategy:

  • Update tests with new assignments. Looking at the new assignments, the topics are now balanced between consumers, ie when there are multiple topics, one consumer no longer holds all partitions for topic1, they are spread across multiple consumers.
  • Create new test that verifies consumers have an equal number of partitions from each topic as more consumers are added and removed

Changes

  • AbstractStickyAssignor - Sort owned TopicPartitions by partition when reassigning
  • Update tests

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Contributor

@ableegoldman ableegoldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! One more small suggestion: it would be great if you could write a short test case that covers the exact scenario you described in the ticket. That way we can be completely sure the fix works as intended, and hopefully prevent any future issues that break this case

credpath-seek and others added 2 commits February 23, 2024 20:12
…als/AbstractStickyAssignor.java

Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
@credpath-seek
Copy link
Contributor Author

Thanks for the review @ableegoldman - I have incorporated your suggested changes

@credpath-seek
Copy link
Contributor Author

I was going to try to fix the pipeline issue:

[Checks API] No suitable checks publisher found.

But then I saw it had been attempted and reverted 😭 #15292

@credpath-seek credpath-seek changed the title KAFKA-16277: AbstractStickyAssignor - Sort owned TopicPartitions by p… KAFKA-16277: AbstractStickyAssignor - Sort owned TopicPartitions by partition when reassigning Feb 26, 2024
@ableegoldman ableegoldman merged commit 027fad4 into apache:trunk Feb 26, 2024
1 check failed
@ableegoldman
Copy link
Contributor

Test failures are unrelated, merged to trunk.

Thanks for the fix!

@credpath-seek credpath-seek deleted the patch-1 branch February 27, 2024 22:10
wcarlson5 pushed a commit to wcarlson5/kafka that referenced this pull request Mar 28, 2024
…artition when reassigning (apache#15416)

Treats KAFKA-16277 - CooperativeStickyAssignor does not spread topics evenly among consumer group

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
clolov pushed a commit to clolov/kafka that referenced this pull request Apr 5, 2024
…artition when reassigning (apache#15416)

Treats KAFKA-16277 - CooperativeStickyAssignor does not spread topics evenly among consumer group

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
ableegoldman pushed a commit that referenced this pull request May 15, 2024
…artition when reassigning (#15416)

Treats KAFKA-16277 - CooperativeStickyAssignor does not spread topics evenly among consumer group

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
@ableegoldman
Copy link
Contributor

FYI I cherrypicked this back to 3.7 while cherrypicking back another sticky assignor fix. Should be in the 3.7.1 release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants