New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-13128: extract retry checker #11275
KAFKA-13128: extract retry checker #11275
Conversation
@lct45 @ableegoldman Can I get a quick look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wcarlson5 LGTM. The failure we saw to prompt this was:
java.lang.AssertionError: Unexpected exception thrown while getting the value from store. Expected: is (a string containing "Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING" or a string containing "The state store, source-table, may have migrated to another instance") but: was "Cannot get state store source-table because the stream thread is STARTING, not RUNNING"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for patching up this test, it's been failing pretty frequently lately. I had a suggestion for a cleaner approach to this specific case of a thread still in STARTING
, but I'm also happy to just merge this fix as-is to stop the flakiness and loop back on it in a followup PR.
I'll leave it up to you. If you'd prefer to merge as-is, just lmk, but can you also file a quick ticket to revisit this and (a) make sure Streams transitions to REBALANCING when a new thread is added, and (b) use that to wait after adding a thread until Streams is back to RUNNING before trying to query it in this test
anyOf( | ||
containsString("Cannot get state store source-table because the stream thread is PARTITIONS_ASSIGNED, not RUNNING"), | ||
containsString("The state store, source-table, may have migrated to another instance"), | ||
containsString("Cannot get state store source-table because the stream thread is STARTING, not RUNNING") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular exception has a different vibe than the above two: those are just unavoidable possibilities with IQ since Streams can start rebalancing at any time and there's no way to ensure that won't happen before/during a query. But failing due to a thread that's STARTING
is in a different ballpark -- it's not like this can happen for no reason, and it should be possible to wait until all threads have finished startup before issuing a query. For example, we can just wait for Streams to reach RUNNING after adding a new thread, like we do when the client itself is first starting up.
(I'm not sure if the Streams state will transition to REBALANCING when a new thread is added, but it should.)
@ableegoldman lets get this merged. I will file a follow up ticket to make sure that this test has proper gating on the streams state |
…n causing flaky StoreQueryIntegrationTest (apache#11275) Add a new case to the list of possible retriable exceptions for the flaky tests to take care of threads starting up Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman
add a new case for the flaky tests to take care of threads starting up
Committer Checklist (excluded from commit message)