KAFKA-16566: Fix consumer static membership system test with new protocol #15738
Merged: lucasbru merged 3 commits into apache:trunk from lianetm:fix-consumer-sys-test-static-member on Apr 19, 2024
@@ -348,26 +348,45 @@ def test_fencing_static_consumer(self, num_conflict_consumers, fencing_stage, me
             consumer.start()
             self.await_members(consumer, len(consumer.nodes))
 
+            num_rebalances = consumer.num_rebalances()
             conflict_consumer.start()
-            self.await_members(conflict_consumer, num_conflict_consumers)
-            self.await_members(consumer, len(consumer.nodes) - num_conflict_consumers)
+            if group_protocol == consumer_group.classic_group_protocol:
+                # Classic protocol: conflicting members should join, and the intial ones with conflicting instance id should fail.
+                self.await_members(conflict_consumer, num_conflict_consumers)
+                self.await_members(consumer, len(consumer.nodes) - num_conflict_consumers)
 
-            wait_until(lambda: len(consumer.dead_nodes()) == num_conflict_consumers,
+                wait_until(lambda: len(consumer.dead_nodes()) == num_conflict_consumers,
                        timeout_sec=10,
                        err_msg="Timed out waiting for the fenced consumers to stop")
+            else:
+                # Consumer protocol: Existing members should remain active and new conflicting ones should not be able to join.
+                self.await_consumed_messages(consumer)
+                assert num_rebalances == consumer.num_rebalances(), "Static consumers attempt to join with instance id in use should not cause a rebalance"
+                assert len(consumer.joined_nodes()) == len(consumer.nodes)
+                assert len(conflict_consumer.joined_nodes()) == 0
+
+                # Stop existing nodes, so conflicting ones should be able to join.
+                consumer.stop_all()
+                wait_until(lambda: len(consumer.dead_nodes()) == len(consumer.nodes),
+                           timeout_sec=self.session_timeout_sec+5,
+                           err_msg="Timed out waiting for the consumer to shutdown")
+                conflict_consumer.start()
+                self.await_members(conflict_consumer, num_conflict_consumers)
+
+
         else:
             consumer.start()
             conflict_consumer.start()
 
             wait_until(lambda: len(consumer.joined_nodes()) + len(conflict_consumer.joined_nodes()) == len(consumer.nodes),
-                       timeout_sec=self.session_timeout_sec,
-                       err_msg="Timed out waiting for consumers to join, expected total %d joined, but only see %d joined from"
+                       timeout_sec=self.session_timeout_sec*2,
+                       err_msg="Timed out waiting for consumers to join, expected total %d joined, but only see %d joined from "
                        "normal consumer group and %d from conflict consumer group" % \
                        (len(consumer.nodes), len(consumer.joined_nodes()), len(conflict_consumer.joined_nodes()))
                        )
             wait_until(lambda: len(consumer.dead_nodes()) + len(conflict_consumer.dead_nodes()) == len(conflict_consumer.nodes),
-                       timeout_sec=self.session_timeout_sec,
-                       err_msg="Timed out waiting for fenced consumers to die, expected total %d dead, but only see %d dead in"
+                       timeout_sec=self.session_timeout_sec*2,
+                       err_msg="Timed out waiting for fenced consumers to die, expected total %d dead, but only see %d dead in "
                        "normal consumer group and %d dead in conflict consumer group" % \
                        (len(conflict_consumer.nodes), len(consumer.dead_nodes()), len(conflict_consumer.dead_nodes()))
                        )

Review comment on the first added `timeout_sec=self.session_timeout_sec*2,` line: I added this to help a bit with the flaky behaviour, making it also consistent to how we wait for members in all other tests that rely on the await_members.
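To make the two branches of the "stable" stage easier to follow, here is a minimal toy model of the behavior the code comments describe. It is a sketch only: `ToyGroup`, `InstanceIdInUse`, and the `join` method are hypothetical names invented for illustration, not Kafka or kafkatest APIs, and the model ignores sessions, heartbeats, and partition assignment.

```python
from dataclasses import dataclass, field


class InstanceIdInUse(Exception):
    """Hypothetical error: a join was rejected because the instance id is taken."""


@dataclass
class ToyGroup:
    """Toy stand-in for a consumer group with static members (illustration only)."""
    protocol: str                                # "classic" or "consumer"
    owners: dict = field(default_factory=dict)   # instance id -> member name
    rebalances: int = 0

    def join(self, instance_id, member):
        """Join with a static instance id; return the members fenced as a result."""
        current = self.owners.get(instance_id)
        if current is None:
            # Unused instance id: the member joins and the toy group rebalances.
            self.owners[instance_id] = member
            self.rebalances += 1
            return []
        if self.protocol == "classic":
            # Classic protocol (per the test comment): the conflicting member
            # joins and the initial owner of the instance id fails.
            # Rebalance behavior on takeover is deliberately not modeled here.
            self.owners[instance_id] = member
            return [current]
        # Consumer protocol (per the test comment): the existing member stays
        # active and the conflicting member cannot join; no rebalance occurs.
        raise InstanceIdInUse(f"{instance_id} is already owned by {current}")

    def leave(self, instance_id):
        """Release an instance id so that another member may claim it."""
        self.owners.pop(instance_id, None)
```

In this model, a conflicting member under the consumer protocol can only join after the original owner calls `leave`, which mirrors the `consumer.stop_all()` step in the test before `conflict_consumer.start()` is retried.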
Review comment: Do we anticipate any timing issues here? That is, will `num_rebalances()` and `joined_nodes()` be "guaranteed" to return the correct values immediately after the call to `await_consumed_messages()` is finished? Or do we want to wrap those assertions as `wait_until()`s to give them a few seconds to coalesce to the correct value?

Reply: There should be no timing issues as I see it. For `consumer.joined_nodes` there is a previous `self.await_members` that ensures we wait as long as needed for all the nodes to join. As for `conflict_consumer.joined_nodes()`, it's for nodes that never joined: we're just asserting that, after the non-conflicting members remained without a rebalance and kept consuming (ensuring activity), the conflicting ones did not join. Makes sense?
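
The trade-off raised in this thread, asserting a value immediately versus polling until it settles, can be illustrated with a small sketch. `poll_until` below is a simplified, hypothetical stand-in for the `wait_until` helper used in the diff, not its actual implementation, and `FakeConsumer` is an invented class whose joined-node list only becomes correct after a delay.

```python
import time


def poll_until(condition, timeout_sec, backoff_sec=0.05, err_msg=""):
    """Re-evaluate `condition` until it is truthy or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeConsumer:
    """Hypothetical consumer whose nodes report as joined only after a delay."""

    def __init__(self, nodes, ready_after_sec):
        self.nodes = nodes
        self._ready_at = time.monotonic() + ready_after_sec

    def joined_nodes(self):
        # Before the delay has passed, no node is reported as joined.
        return self.nodes if time.monotonic() >= self._ready_at else []


if __name__ == "__main__":
    consumer = FakeConsumer(nodes=["n1", "n2"], ready_after_sec=0.2)

    # An immediate assertion on joined_nodes() would fail at this point,
    # because the value has not settled yet.
    print("joined right away:", len(consumer.joined_nodes()))

    # Polling tolerates the delay, at the cost of a longer wait on failure.
    poll_until(lambda: len(consumer.joined_nodes()) == len(consumer.nodes),
               timeout_sec=2,
               err_msg="Timed out waiting for consumers to join")
    print("joined after polling:", len(consumer.joined_nodes()))
```

A plain `assert` is sufficient when an earlier wait has already established the condition, which is the argument made in the reply for `consumer.joined_nodes`. Polling only helps when the value may still be changing at the time of the check.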