Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-8800: Increase poll timeout in poll[Records]UntilTrue #7211

Merged
merged 1 commit into from Aug 17, 2019

Conversation

@dongjinleekr
Copy link
Contributor

commented Aug 14, 2019

I dug out this problem and found the following.

The failing method, TestUtils#pollRecordsUntilTrue, was added with commit 1e92b70. However, it calls Consumer#poll(Duration) which throws timeout if the given period is expired while fetching both of the metadata and records. (i.e., dislike to deprecated Consumer#poll(long) method.) For this reason, it is easy to fail and making the related tests flaky.

It seems like there are two approaches to fix this problem:

  1. Increase the timeout duration (current PR)
  2. Use a dedicated method to fetch the metadata.

How do you think? cc/ @mjsax

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@dongjinleekr dongjinleekr changed the title Increase timeout duration in TestUtils#poll[Records]UntilTrue KAFKA-8800: Flaky Test SaslScramSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl Aug 14, 2019

@mjsax

This comment has been minimized.

Copy link
Member

commented Aug 14, 2019

#ould be great to get some test stabilized, but I don't know this part of the code very well. It's unclear if bumping timeouts is the right approach to me.

However, it seems that poll() is wrapped within waitUntilTrue that also take a timeout -- if poll() throws it should catch it and retry until it's own timeout waitTimeMs passed. Hence, I am not sure this PR does help.

\cc @hachikuji @ijuma

@ijuma

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

However, it calls Consumer#poll(Duration) which throws timeout if the given period is expired while fetching both of the metadata and records. (i.e., dislike to deprecated Consumer#poll(long) method.)

What exception is thrown?

@mjsax

This comment has been minimized.

Copy link
Member

commented Aug 15, 2019

@ijuma TimeoutException -- cf the PR description

@ijuma

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

@mjsax I read the PR, but it didn't specify exactly what the exception was. I am a bit confused because poll should not throw that exception, right?

@dongjinleekr

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

@mjsax @ijuma FYI: It seems like the timeout issue is potentially related to this issue.

@ijuma

This comment has been minimized.

Copy link
Contributor

commented Aug 17, 2019

@dongjinleekr I think what you're saying is that if the poll timeout is not long enough to retrieve the metadata, we will fail on each call to poll. And the time needed to fetch metadata increased somewhat and 50 ms is no longer enough. It seems fine to have this at 100 ms to reduce flakiness even after KAFKA-8806 is fixed.

@ijuma
ijuma approved these changes Aug 17, 2019
Copy link
Contributor

left a comment

LGTM

@ijuma ijuma changed the title KAFKA-8800: Flaky Test SaslScramSslEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl KAFKA-8800: Increase poll timeout in `pollUntilTrue` and `pollRecordsUntilTrue` Aug 17, 2019

@ijuma ijuma changed the title KAFKA-8800: Increase poll timeout in `pollUntilTrue` and `pollRecordsUntilTrue` KAFKA-8800: Increase poll timeout in `poll[Records]UntilTrue` Aug 17, 2019

@ijuma ijuma changed the title KAFKA-8800: Increase poll timeout in `poll[Records]UntilTrue` KAFKA-8800: Increase poll timeout in poll[Records]UntilTrue Aug 17, 2019

@ijuma ijuma merged commit 784f7e8 into apache:trunk Aug 17, 2019

2 of 3 checks passed

JDK 11 and Scala 2.12 FAILURE 11811 tests run, 77 skipped, 1 failed.
Details
JDK 11 and Scala 2.13 SUCCESS 11811 tests run, 77 skipped, 0 failed.
Details
JDK 8 and Scala 2.11 SUCCESS 11811 tests run, 77 skipped, 0 failed.
Details
@ijuma

This comment has been minimized.

Copy link
Contributor

commented Aug 17, 2019

Merged to master and 2.3 branches.

@ijuma

This comment has been minimized.

Copy link
Contributor

commented Aug 17, 2019

@dongjinleekr I summarized my understanding of the issue in the commit message 784f7e8. Feel free to update the PR description if you agree.

ijuma added a commit that referenced this pull request Aug 17, 2019
KAFKA-8800: Increase poll timeout in poll[Records]UntilTrue (#7211)
If retrieving metadata during `poll(Duration)` repeatedly takes longer than the
poll timeout, we don't make progress and `waitUntilTrue` eventually times out
causing the test to fail. This behaviour differs from the older `poll(long)` that
would block until metadata retrieval had completed. The flakiness was likely
introduced when we switched from the latter to the former.

Reviewers: Ismael Juma <ismael@juma.me.uk>
xiowu0 added a commit to linkedin/kafka that referenced this pull request Aug 22, 2019
[LI-CHERRY-PICK] [71e2019] KAFKA-8800: Increase poll timeout in poll[…
…Records]UntilTrue (apache#7211)

TICKET = KAFKA-8800
LI_DESCRIPTION =
EXIT_CRITERIA = HASH [71e2019]
ORIGINAL_DESCRIPTION =

If retrieving metadata during `poll(Duration)` repeatedly takes longer than the
poll timeout, we don't make progress and `waitUntilTrue` eventually times out
causing the test to fail. This behaviour differs from the older `poll(long)` that
would block until metadata retrieval had completed. The flakiness was likely
introduced when we switched from the latter to the former.

Reviewers: Ismael Juma <ismael@juma.me.uk>
(cherry picked from commit 71e2019)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.