[FLINK-12015] Fix TaskManagerRunnerTest instability #8053
Before, there was a race condition between the termination future in TaskManagerRunner completing and the asynchronous shutdown part here: https://github.com/apache/flink/blob/70107c4647ecac3df9b2b8c7920e7cb99ad550f1/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L258

The test would leave the block that was waiting on the future, but the shutdown code that runs after the future completes is executed asynchronously, so it is not guaranteed to have run at that point.

This also refactors the code a bit to make it more obvious what is happening and removes the SecurityManagerContext because it was obscuring the problem.

This was analyzed by Igal and me, and mostly fixed by Igal.
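For readers unfamiliar with this kind of race, here is a minimal sketch of the pattern, assuming a plain `CompletableFuture` with an asynchronous callback. The class `AsyncShutdownRaceSketch`, its `Handles` holder, and the `shutdownRan` flag are hypothetical names for illustration and are not taken from TaskManagerRunner.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the pattern described above; not Flink's actual code.
final class AsyncShutdownRaceSketch {

    /** Bundles the futures and the flag a test could observe. */
    static final class Handles {
        final CompletableFuture<Void> terminationFuture;
        final CompletableFuture<Void> shutdownDone;
        final AtomicBoolean shutdownRan;

        Handles(CompletableFuture<Void> terminationFuture,
                CompletableFuture<Void> shutdownDone,
                AtomicBoolean shutdownRan) {
            this.terminationFuture = terminationFuture;
            this.shutdownDone = shutdownDone;
            this.shutdownRan = shutdownRan;
        }
    }

    static Handles start(ExecutorService executor) {
        CompletableFuture<Void> terminationFuture = new CompletableFuture<>();
        AtomicBoolean shutdownRan = new AtomicBoolean(false);
        // The shutdown action is scheduled on another thread once
        // terminationFuture completes; it does not run inline.
        CompletableFuture<Void> shutdownDone = terminationFuture.whenCompleteAsync(
                (ignored, error) -> shutdownRan.set(true), executor);
        return new Handles(terminationFuture, shutdownDone, shutdownRan);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Handles h = start(executor);
            h.terminationFuture.complete(null);

            // Racy: this returns as soon as terminationFuture is done, but the
            // asynchronous shutdown action may not have run yet.
            h.terminationFuture.get(5, TimeUnit.SECONDS);

            // Safe: the dependent future completes only after the action ran.
            h.shutdownDone.get(5, TimeUnit.SECONDS);
            System.out.println("shutdown ran: " + h.shutdownRan.get());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A test that only waits on `terminationFuture` and then asserts on the effect of the shutdown action can fail intermittently; waiting on something that the shutdown action itself completes removes the race.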
Thanks for hardening the test @aljoscha. This is a good fix. I would try to avoid busy looping by letting the SystemExitTrackingSecurityManager return a future which is completed once System.exit is called.
eventually(() -> {
Instead of busy looping, I would prefer to change the SystemExitTrackingSecurityManager to return a CompletableFuture<Integer> which is completed with the exit code with which System.exit(exitCode) is called.
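A rough sketch of what this suggestion could look like, assuming a `SecurityManager` subclass that vetoes the exit and records the code. The class name `ExitTrackingSecurityManagerSketch` and the accessor `getSystemExitFuture()` are hypothetical and only illustrate the idea; they are not claimed to match the Flink test utilities. To stay runnable on JDKs that restrict `System.setSecurityManager`, the demo calls `checkExit` directly instead of installing the manager.

```java
import java.security.Permission;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the reviewer's suggestion; names are illustrative.
@SuppressWarnings({"deprecation", "removal"})
final class ExitTrackingSecurityManagerSketch extends SecurityManager {

    private final CompletableFuture<Integer> systemExitFuture = new CompletableFuture<>();

    @Override
    public void checkExit(int status) {
        // Record the first exit code; later calls leave the future unchanged.
        systemExitFuture.complete(status);
        // Veto the exit so the JVM running the test stays alive.
        throw new SecurityException("System.exit(" + status + ") intercepted");
    }

    @Override
    public void checkPermission(Permission perm) {
        // Permit everything else; only System.exit is of interest here.
    }

    CompletableFuture<Integer> getSystemExitFuture() {
        return systemExitFuture;
    }

    public static void main(String[] args) throws Exception {
        ExitTrackingSecurityManagerSketch manager = new ExitTrackingSecurityManagerSketch();
        // A real test would install the manager and let the code under test
        // call System.exit; here the hook is invoked directly.
        try {
            manager.checkExit(1);
        } catch (SecurityException expected) {
            // The veto is the intended behavior.
        }
        // Blocks until the exit code is available instead of polling for it.
        int exitCode = manager.getSystemExitFuture().get(5, TimeUnit.SECONDS);
        System.out.println("exit code: " + exitCode);
    }
}
```

With a future, the test blocks exactly until `System.exit` has been attempted (or a timeout elapses), so no retry loop such as `eventually(...)` is needed.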
@tillrohrmann I updated the PR. Btw, this test only ensures that we have at least one call to
Changes look good to me. Thanks for the hardening of this test @aljoscha. +1 for merging.
Thanks! I'll merge
Verifying this change
This change is already covered by existing tests.