Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-13128: extract retry checker #11275

Merged
merged 1 commit into from Aug 28, 2021
Merged

KAFKA-13128: extract retry checker #11275

merged 1 commit into from Aug 28, 2021

Conversation

wcarlson5
Copy link
Contributor

add a new case for the flaky tests to take care of threads starting up

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@wcarlson5
Copy link
Contributor Author

@lct45 @ableegoldman Can I get a quick look?

Copy link
Contributor

@lct45 lct45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wcarlson5 LGTM. The failure we saw to prompt this was:
java.lang.AssertionError: Unexpected exception thrown while getting the value from store. Expected: is (a string containing "Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING" or a string containing "The state store, source-table, may have migrated to another instance") but: was "Cannot get state store source-table because the stream thread is STARTING, not RUNNING"

Copy link
Contributor

@ableegoldman ableegoldman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for patching up this test, it's been failing pretty frequently lately. I had a suggestion for a cleaner approach to this specific case of a thread still in STARTING, but I'm also happy to just merge this fix as-is to stop the flakiness and loop back on it in a followup PR.

I'll leave it up to you. If you'd prefer to merge as-is, just lmk, but can you also file a quick ticket to revisit this and (a) make sure Streams transitions to REBALANCING when a new thread is added, and (b) use that to wait after adding a thread until Streams is back to RUNNING before trying to query it in this test

anyOf(
containsString("Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING"),
containsString("The state store, source-table, may have migrated to another instance"),
containsString("Cannot get state store source-table because the stream thread is STARTING, not RUNNING")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular exception has a different vibe than the above two: those are just unavoidable possibilities with IQ since Streams can start rebalancing at any time and there's no way to ensure that won't happen before/during a query. But failing due to a thread that's STARTING is in a different ballpark -- it's not like this can happen for no reason, and it should be possible to wait until all threads have finished startup before issuing a query. For example, we can just wait for Streams to reach RUNNING after adding a new thread, like we do when the client itself is first starting up.

(I'm not sure if the Streams state will transition to REBALANCING when a new thread is added, but it should.)

@wcarlson5
Copy link
Contributor Author

@ableegoldman lets get this merged. I will file a follow up ticket to make sure that this test has proper gating on the streams state

@ableegoldman ableegoldman merged commit 49aed78 into apache:trunk Aug 28, 2021
@wcarlson5 wcarlson5 deleted the new_case_for_starting_threads branch August 30, 2021 17:32
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
…n causing flaky StoreQueryIntegrationTest (apache#11275)

Add a new case to the list of possible retriable exceptions for the flaky tests to take care of threads starting up

Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants