Skip to content

KAFKA-20423: Fix flakiness of testWakeupWithFetchDataAvailable#22364

Merged
lianetm merged 2 commits into
apache:trunkfrom
chickenchickenlove:KAFKA-20423
Jun 2, 2026
Merged

KAFKA-20423: Fix flakiness of testWakeupWithFetchDataAvailable#22364
lianetm merged 2 commits into
apache:trunkfrom
chickenchickenlove:KAFKA-20423

Conversation

@chickenchickenlove
Copy link
Copy Markdown
Contributor

@chickenchickenlove chickenchickenlove commented May 24, 2026

Description

I couldn't reproduce it in my local. This PR is based on the specific
assumptions and fix non-deterministic path caused by it.

  1. The foreground test thread puts the fetch response into
    MockClient.responses via respondFrom(...).
  2. The heartbeat thread enters MockClient.poll() first.
  3. The heartbeat thread removes the fetch response from the queue by
    calling responses.poll().
  4. However, it gets descheduled before calling response.onComplete().
  5. The foreground thread calls client.poll(0, ...), but since the
    response queue is empty, it completes nothing and returns.
  6. consumer.wakeup() is called.
  7. The first consumer.poll(Duration.ZERO) immediately throws
    WakeupException from maybeTriggerWakeup().
  8. consumer.position(tp0) already has a valid position of 0, so it
    returns 0 directly without performing a network poll.
  9. The next consumer.poll(Duration.ZERO) also returns empty because
    the response is not yet in pendingCompletion and not in the
    FetchBuffer either.
  10. After that, the heartbeat thread wakes up and calls
    response.onComplete(), but by then the assertion has already
    failed.

Although I couldn't reproduce it in my local, this scenario make sense
to me. Given this, this PR applies the change, and I plan to monitor
the CI trend afterward to confirm whether the flakiness is resolved.

Reviewers: Lianet Magrans lmagrans@confluent.io

@github-actions github-actions Bot added triage PRs from the community consumer tests Test fixes (including flaky tests) clients small Small PRs labels May 24, 2026
@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

@lianetm Hi!
When you get a chance, could you take a look?

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

@lianetm lianetm removed triage PRs from the community needs-attention labels Jun 1, 2026
@lianetm
Copy link
Copy Markdown
Member

lianetm commented Jun 1, 2026

Thanks @chickenchickenlove ! Makes sense to me. Could you rebase pls? (build failure, seems it's just that we are missing this fa097fa ?)

@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

@lianetm
Thanks for your comments!
I rebased it and CI worked well!
When you get a chance, please take a look 🙇‍♂️

Copy link
Copy Markdown
Member

@lianetm lianetm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@lianetm lianetm merged commit 16e976a into apache:trunk Jun 2, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-approved clients consumer small Small PRs tests Test fixes (including flaky tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants