KAFKA-16565: IncrementalAssignmentConsumerEventHandler throws error when attempting to remove a partition that isn't assigned #15737

kirktrue
Contributor

Checking that the TopicPartition is in the assignment before attempting to remove it.

Also added some logging and refactoring.
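The error in the title is what Python's `list.remove` does when the element is absent: it raises `ValueError`. The following is a minimal sketch, assuming the harness tracks `self.assignment` as a plain list of partitions. `TopicPartition` here is a hypothetical namedtuple stand-in, and `revoke_unguarded` / `revoke_guarded` are illustrative functions, not the harness's actual methods. The guarded version mirrors the check-before-remove approach described above.

```python
import logging
from collections import namedtuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("consumer-event-handler")

# Hypothetical stand-in for the harness's TopicPartition type.
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def revoke_unguarded(assignment, revoked_partitions):
    """Sketch of the failing behavior: list.remove raises ValueError for unknown partitions."""
    revoked = []
    for tp in revoked_partitions:
        assignment.remove(tp)  # raises ValueError if tp is not in assignment
        revoked.append(tp)
    return revoked


def revoke_guarded(assignment, revoked_partitions, hostname):
    """Sketch of the guarded fix: skip and log partitions that were never assigned."""
    revoked = []
    for tp in revoked_partitions:
        if tp in assignment:
            assignment.remove(tp)
            revoked.append(tp)
        else:
            # Lazy %-style arguments let logging skip formatting if the record is filtered.
            logger.warning(
                "Could not remove topic partition %s from assignment as it was "
                "not previously assigned to %s", tp, hostname)
    return revoked


if __name__ == "__main__":
    t0, t1 = TopicPartition("test", 0), TopicPartition("test", 1)

    try:
        revoke_unguarded([t0], [t1])
    except ValueError as e:
        print("unguarded removal failed:", e)

    assignment = [t0]
    print(revoke_guarded(assignment, [t0, t1], "worker1"))  # "worker1" is a made-up hostname
    print(assignment)
```

If `self.assignment` were a set rather than a list, the unguarded call would raise `KeyError` instead, but the membership check fixes both cases the same way.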

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

if tp in self.assignment:
    self.assignment.remove(tp)
    revoked.append(tp)
else:
    logger.warn("Could not remove topic partition %s from assignment as it was not previously assigned to %s" % (tp, node.account.hostname))
Contributor

Do we understand why this situation is happening? Is it maybe related to the assignment mismatch failure we've seen elsewhere in the tests? My point is just to make sure we're not hiding the real failure with this change. I wouldn't expect the consumer to ever receive a partition to revoke if it was not previously assigned, right?

Contributor Author

You’re right @lianetm, this fix could result in sweeping the problem under the rug, so to speak. I'll change the logic so that this case still results in an error, but with more information so we can debug.

Contributor Author

@lianetm—I changed the logging to an assert that provides useful information for troubleshooting:

tp = _create_partition_from_dict(topic_partition)
assert tp in self.assignment, \
    "Topic partition %s cannot be revoked from %s as it was not previously assigned to that consumer" % \
    (tp, node.account.hostname)
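The assert-based check can be exercised outside the ducktape harness with a small stand-in. In this sketch `AssignmentTracker`, its `assign` and `revoke` methods, and the `worker1` hostname are all hypothetical; only the assertion and its message are taken from the change above.

```python
from collections import namedtuple

# Hypothetical stand-in for the harness's TopicPartition type.
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


class AssignmentTracker:
    """Hypothetical minimal tracker mirroring the assert-based revocation check."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.assignment = []

    def assign(self, partitions):
        self.assignment.extend(partitions)

    def revoke(self, partitions):
        for tp in partitions:
            # Fail loudly, naming the partition and host, instead of skipping silently.
            assert tp in self.assignment, \
                "Topic partition %s cannot be revoked from %s as it was not previously assigned to that consumer" % \
                (tp, self.hostname)
            self.assignment.remove(tp)


if __name__ == "__main__":
    tracker = AssignmentTracker("worker1")
    tracker.assign([TopicPartition("test", 0)])
    tracker.revoke([TopicPartition("test", 0)])  # succeeds: partition was assigned
    try:
        tracker.revoke([TopicPartition("test", 1)])  # never assigned
    except AssertionError as e:
        print(e)
```

One design consequence: Python strips `assert` statements when run with `-O`, so this check relies on the tests running without optimization; an explicit `raise` would not have that dependency.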
self.assignment.remove(tp)

Contributor

Thanks! Better, I believe. Do we have a Jira to investigate the failure leading to this? It's concerning (even more so if it turns out to happen only with the new protocol?)

Contributor Author

@lianetm, I will file a Jira on this in the next day or two. Thanks!

Contributor Author

Filed KAFKA-16623, FYI.

@kirktrue
Contributor Author

@lucasbru—Can you review this change to the consumer system test harness? Thanks!

@lucasbru
Member

Do I understand it correctly that there is no functional change here, just logging?

Comment on lines +161 to +163
assert tp in self.assignment, \
    "Topic partition %s cannot be revoked from %s as it was not previously assigned to that consumer" % \
    (tp, node.account.hostname)
Contributor Author

@lucasbru—this is the main functional change: ensure that an attempt to remove a partition from the local state verifies that it was previously assigned.

Member

lucasbru left a comment

LGTM, thanks

lucasbru merged commit 21faf87 into apache:trunk on Apr 26, 2024
1 check failed
kirktrue deleted the KAFKA-16565-check-topic-partition-before-removing branch on April 30, 2024 17:11