[FLINK-19864][tests] Fix unpredictable Thread.getState in StreamTaskTestHarness due to concurrent class loading #14140
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit b3576e2 (Thu Nov 19 14:35:22 UTC 2020)
Warnings:
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process.
Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
|
| try { | ||
| final CountDownLatch latch = new CountDownLatch(1); | ||
| mailboxExecutor.execute(() -> { | ||
| allInputProcessed.set(mailboxProcessor.isDefaultActionUnavailable()); |
I think isDefaultActionUnavailable() is not the best choice here because suspension is a temporary state; some input may come after this check.
What about using mailboxProcessor.isMailboxLoopRunning() instead?
It is updated on InputStatus.END_OF_INPUT which seems exactly what is needed here.
The name allInputProcessed may be misleading: it should express that all currently available input has been processed, not that the end of input has been reached. What StreamTaskTestHarness.waitForInputProcessing does is wait until the currently available input has been processed, so that the test code that follows can run its post-processing assertions.
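To illustrate the distinction discussed in this thread, here is a minimal sketch of a toy, single-threaded loop. It is not Flink's MailboxProcessor: the class ToyMailboxLoop and its methods offer, finish, runDefaultActionOnce and isLoopRunning are hypothetical names, and only the idea behind isDefaultActionUnavailable is borrowed from the discussion. It assumes that "default action unavailable" means "no input is available right now" while "loop not running" means "end of input was reached".

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy, single-threaded model of a mailbox loop; hypothetical, not Flink's MailboxProcessor. */
class ToyMailboxLoop {
    private final Deque<String> input = new ArrayDeque<>();
    private boolean endOfInput;
    private boolean suspended;
    private boolean running = true;

    /** New input arrives and resumes a suspended default action. */
    void offer(String record) {
        input.add(record);
        suspended = false;
    }

    /** Signals that no more input will ever arrive. */
    void finish() {
        endOfInput = true;
        suspended = false;
    }

    /** One iteration of the default action: process one record if possible. */
    void runDefaultActionOnce() {
        if (!running || suspended) {
            return;
        }
        if (!input.isEmpty()) {
            input.poll();          // consume one record
        } else if (endOfInput) {
            running = false;       // permanent: the loop ends
        } else {
            suspended = true;      // temporary: wait for more input
        }
    }

    boolean isDefaultActionUnavailable() {
        return suspended;
    }

    boolean isLoopRunning() {
        return running;
    }

    public static void main(String[] args) {
        ToyMailboxLoop loop = new ToyMailboxLoop();
        loop.offer("a");
        loop.runDefaultActionOnce();
        loop.runDefaultActionOnce();
        System.out.println("suspended=" + loop.isDefaultActionUnavailable()
                + ", running=" + loop.isLoopRunning());
        loop.offer("b"); // input arriving after the check resumes processing
        System.out.println("suspended=" + loop.isDefaultActionUnavailable());
    }
}
```

In this toy model the suspended flag flips back as soon as new input is offered, which is the "temporary state" concern raised above, while the running flag only changes once at end of input.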
| latch.countDown(); | ||
| }, "query-whether-processInput-has-suspend-itself"); | ||
| // Mail could be dropped due to task exception, so we do timed-await here. | ||
| latch.await(1, TimeUnit.SECONDS); |
With this await, is the sleep below still necessary?
This await has two purposes here:
- Wait until the posted mail has been processed, so we can query allInputProcessed safely.
- If the posted mail has been dropped due to a task exception, break out of an otherwise indefinite wait.
It does not serve as a sleep that yields control to the mailbox thread. Without the sleep, the testing thread and the mailbox thread may play ping-pong between process-one-element and execute-one-mail.
I tend to keep it; at the very least it does not affect correctness.
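The pattern described in this thread can be sketched outside Flink roughly as follows. This is only an illustration under assumptions: a single-threaded ExecutorService stands in for the mailbox thread, and the class MailboxQuerySketch with its methods queryOnMailboxThread and waitUntilIdle is hypothetical, not part of the harness.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the harness logic; a plain executor plays the mailbox thread. */
class MailboxQuerySketch {

    /**
     * Posts a "mail" that evaluates the flag on the mailbox thread and waits for it with a
     * timeout. Returns false if the mail was rejected or did not run in time.
     */
    static boolean queryOnMailboxThread(ExecutorService mailbox, BooleanSupplier flag) {
        AtomicBoolean result = new AtomicBoolean(false);
        CountDownLatch latch = new CountDownLatch(1);
        try {
            mailbox.execute(() -> {
                result.set(flag.getAsBoolean());
                latch.countDown();
            });
            // The mail could be dropped, so the wait is bounded rather than indefinite.
            latch.await(1, TimeUnit.SECONDS);
        } catch (RejectedExecutionException e) {
            return false; // the "mailbox" no longer accepts mails
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return result.get();
    }

    /** Repeats the query, sleeping briefly between attempts to let the mailbox thread work. */
    static boolean waitUntilIdle(ExecutorService mailbox, BooleanSupplier flag, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (queryOnMailboxThread(mailbox, flag)) {
                return true;
            }
            try {
                Thread.sleep(1); // yield to the mailbox thread instead of flooding it with mails
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ExecutorService mailbox = Executors.newSingleThreadExecutor();
        AtomicInteger pending = new AtomicInteger(3);
        for (int i = 0; i < 3; i++) {
            mailbox.execute(() -> pending.decrementAndGet());
        }
        System.out.println("idle: " + waitUntilIdle(mailbox, () -> pending.get() == 0, 100));
        mailbox.shutdown();
    }
}
```

The bounded await covers both purposes listed above: the result is read only after the mail has run, and a dropped mail ends the wait after the timeout. The short sleep between attempts is a separate concern, as the comment above points out.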
|
Hi @AHeise, I will try to list the existing alternatives below:
First, comparing to Second, migration may require a lot of work since there are almost 53 dependent tests as you counted. Personally, I think it would be nice if we can solve unstable Third, I think 3(a) or similar may be what you suggest in jira. Together with all-queues-empty while-looping, 3(a) and 3(b) should have the same effect. I notice that there are some optimizations in |
Thank you very much for your deep investigation. I asked Roman to assess the solution and the alternatives as he is much more adept at threading issues than I am. In theory, I'd go with the first approach, but I understand that this is hardly feasible. So I like your current fix in most regards (details may or may not be improved). One more idea, couldn't we also inject |
|
@AHeise I am getting confused. We probably have essential divergences on what @rkhachatryan gave similar suggestion in previous review cycle, I think we probably should align on what |
You are completely right. It's just very difficult to realize the original implementation without some kind of hacks and assumptions - any hotfix is prone to fail with a slight change again. I'd probably go your way but also start migrating the tests - the harness has been deprecated for a reason. |
rkhachatryan
left a comment
From my side, all the concerns were resolved.
Thanks for the fix!
|
Thank you very much for the contribution. Merging. |
|
I think In my opinion, the key difference between That is all I have got now. @AHeise Thank you for pointing out the inappropriateness of @AHeise @rkhachatryan Thank you for reviewing. |
What is the purpose of the change
Fix unpredictable Thread.getState in StreamTaskTestHarness.waitForInputProcessing due to concurrent class loading.
Brief change log
Query whether all input has been processed using MailboxProcessor.isDefaultActionUnavailable through MailboxExecutor.execute.
Verifying this change
This change is already covered by existing tests: TwoInputStreamTaskTest.testWatermarkMetrics and other tests depending on StreamTaskTestHarness.waitForInputProcessing.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (no)
Documentation
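As a rough illustration of why Thread.getState is a fragile signal for "all input has been processed", the sketch below shows a worker thread that reports WAITING while its work is not finished. It is an assumption-laden toy, not the harness code: the class ThreadStateSketch and the method workDoneWhenObservedWaiting are hypothetical, and a CountDownLatch stands in for any unrelated blocking such as waiting on a lock held by another thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo: a thread can be WAITING for reasons unrelated to "no more input". */
class ThreadStateSketch {

    /**
     * Starts a worker that blocks on an unrelated latch before finishing its work, and
     * returns whether the work was done at the moment the worker was observed as WAITING.
     */
    static boolean workDoneWhenObservedWaiting() {
        AtomicBoolean workDone = new AtomicBoolean(false);
        CountDownLatch unrelatedBlock = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            try {
                unrelatedBlock.await(); // stand-in for blocking that has nothing to do with input
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            workDone.set(true); // the actual "input processing" finishes only here
        });
        worker.start();

        // Poll the thread state, as a state-based check would.
        while (worker.getState() != Thread.State.WAITING) {
            Thread.yield();
        }
        boolean doneAtObservation = workDone.get();

        // Release the worker and clean up.
        unrelatedBlock.countDown();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return doneAtObservation;
    }

    public static void main(String[] args) {
        System.out.println("work done when WAITING was observed: "
                + workDoneWhenObservedWaiting());
    }
}
```

A check that treats WAITING as "idle" would report completion too early in this toy scenario, which is the kind of ambiguity that asking the mailbox thread directly avoids.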