-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-19069] execute finalizeOnMaster with io executor to avoid hea… #13412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…vy io operations on main executor
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 85d8066 (Thu Sep 17 13:41:29 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
|
@flinkbot run azure |
|
@rmetzger Could you help me confirm the failed test? I found the |
|
It is a common phenomenon that tests are failing more frequently, since it is a slower environment (only 2 cores, 8gb memory). On the first run, the test is passing, but if I configure IntelliJ to run until failure, the test gets stuck on my MacBook Pro locally as well. I'm pretty sure this failure is caused by the proposed changes. |
| if (transitionState(JobStatus.RUNNING, JobStatus.FINISHED)) { | ||
| onTerminalState(JobStatus.FINISHED); | ||
| } | ||
| FutureUtils.combineAll(futures).whenCompleteAsync((ignored, t) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| FutureUtils.combineAll(futures).whenCompleteAsync((ignored, t) -> { | |
| FutureUtils.assertNoException(FutureUtils.combineAll(futures).whenCompleteAsync((ignored, t) -> { |
Otherwise, the ExceptionUtils.rethrowIfFatalError won't do much.
| ClusterEntryPointExceptionUtils.tryEnrichClusterEntryPointError(t); | ||
| failGlobal(new Exception("Failed to finalize execution on master", t)); | ||
| } else { | ||
| if (transitionState(JobStatus.RUNNING, JobStatus.FINISHED)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we do if the job has transitioned into a terminal state in the meantime (say CANCELLED?)
If I'm reading the code correctly, will throw a IllegalStateException if the current state is terminal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I believe we should not confirm any terminal state until the finalizeOnMaster method has finished.
|
I looked a bit into your change. I can not guarantee that my comments are pointing you in the right direction. I have no prior experience with the |
|
@rmetzger The test fails because we don't use a real |
|
That is a very good remark @StephanEwen! I would actually propose to not address this issue anymore. |
|
I'm closing this PR for now. Please re-open or comment if you have a different opinion. |
#12956 What is the purpose of the change
Users may execute heavy I/O operations in
finalizeOnMasterfunction, which may block the operations in main executor such as client's request and heartbeats. This change is to move the job vertex'sfinalizeOnMastercall into dispatcher's IO executor, and transition the job status in main executor after all thefinalizeOnMasteroperations finish.Brief change log
Use
CompletableFuture.runAsyncto execute thefinalizeOnMasterinioExecutor.Verifying this change
The current unit testing in
FinalizeOnMasterTestandMiniClusterITCaseshould cover this change.Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation