KAFKA-10017: fix flaky EOS-beta upgrade test #9690
LGTM. Left some comments on the PR for trunk. |
The test failed... pushed a commit for better debugging. Will try to reproduce locally. Seems there is still something going on. |
@mjsax, I investigated the failed tests for a few days and finally found out why the test sometimes fails here:
It's because the key list from the stream store is sometimes empty, so the following computation based on the variable is wrong.
And the variable
So, in summary, there's no logic error in the code, just a bad assumption:
Thanks. |
@mjsax, I further investigated the issue I found last week:
I finally found the root cause: the stream has not completed the stable-assignment rebalance during
Coordinator finishes stable assignment of tasks -> notifies tasks -> task handles the new assignment -> stream thread changes state from RUNNING to PARTITIONS_ASSIGNED -> stream client changes state from RUNNING to REBALANCING -> stream thread changes state from PARTITIONS_ASSIGNED to RUNNING -> stream client changes state from REBALANCING to RUNNING
And what we can make sure via
So, that's why we got the empty key list from the stream store. As I mentioned in #9733, after
Anyway, that's my finding, and I wanted to share it with you. I'll update my PR #9733 (maybe next week, since I'm a little busy these days). Thanks. |
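To illustrate the state sequence described above, here is a minimal sketch of why observing the client in RUNNING is not the same as observing a completed rebalance. It is an assumption-laden model, not Kafka Streams code: `ClientState`, `RebalanceWaitSketch`, and its methods are hypothetical names, and only the two client states mentioned in the comment are modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two client states named in the comment above.
enum ClientState { RUNNING, REBALANCING }

final class RebalanceWaitSketch {
    // Every client state observed so far, oldest first.
    private final List<ClientState> history = new ArrayList<>();

    RebalanceWaitSketch(ClientState initial) {
        history.add(initial);
    }

    // Record a state change, as a state listener would.
    void onStateChange(ClientState newState) {
        history.add(newState);
    }

    // Naive check: is the client in RUNNING right now?
    boolean isRunning() {
        return history.get(history.size() - 1) == ClientState.RUNNING;
    }

    // Stricter check: has the client gone through REBALANCING and come back to RUNNING?
    boolean completedRebalance() {
        boolean sawRebalancing = false;
        for (ClientState state : history) {
            if (state == ClientState.REBALANCING) {
                sawRebalancing = true;
            } else if (sawRebalancing) {
                return true; // RUNNING observed after REBALANCING
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RebalanceWaitSketch client = new RebalanceWaitSketch(ClientState.RUNNING);
        // Still in the old RUNNING state: the naive check passes, the strict one does not.
        System.out.println(client.isRunning() + " " + client.completedRebalance()); // true false

        client.onStateChange(ClientState.REBALANCING);
        client.onStateChange(ClientState.RUNNING);
        System.out.println(client.isRunning() + " " + client.completedRebalance()); // true true
    }
}
```

Under this reading, a test that only checks for RUNNING can pass before the new assignment has been handled, which would be consistent with the empty key list reported above.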
As we have a |
PR for 2.6 branch. "Main" PR for trunk and 2.7 is #9688.
The difference is that in 2.6 and eos-alpha, we commit tasks individually, while in 2.7 and eos-alpha, if one task needs a commit we commit all tasks.
Call for review @abbccdda @ableegoldman @guozhangwang
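
The difference between the two commit behaviors can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Kafka Streams implementation: `CommitSketch`, `Task`, and both method names are hypothetical, and "commit individually" is assumed to mean that only tasks flagged as needing a commit are committed.

```java
import java.util.ArrayList;
import java.util.List;

final class CommitSketch {
    // Hypothetical task model: an id plus a flag saying whether a commit is needed.
    static final class Task {
        final String id;
        final boolean commitNeeded;

        Task(String id, boolean commitNeeded) {
            this.id = id;
            this.commitNeeded = commitNeeded;
        }
    }

    // Assumed 2.6-style behavior: each task is committed on its own, only if it needs it.
    static List<String> commitIndividually(List<Task> tasks) {
        List<String> committed = new ArrayList<>();
        for (Task task : tasks) {
            if (task.commitNeeded) {
                committed.add(task.id);
            }
        }
        return committed;
    }

    // Assumed 2.7-style behavior: if any task needs a commit, all tasks are committed.
    static List<String> commitAllIfAnyNeeded(List<Task> tasks) {
        boolean anyNeeded = false;
        for (Task task : tasks) {
            anyNeeded |= task.commitNeeded;
        }
        List<String> committed = new ArrayList<>();
        if (anyNeeded) {
            for (Task task : tasks) {
                committed.add(task.id);
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task("0_0", true));
        tasks.add(new Task("0_1", false));
        System.out.println(commitIndividually(tasks));   // [0_0]
        System.out.println(commitAllIfAnyNeeded(tasks)); // [0_0, 0_1]
    }
}
```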