[FLINK-18685] [runtime] Make MiniClusterJobClient#getAccumulators non-blocking in Streaming mode #14558
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot.
Automated Checks
Last check on commit aa00e2a (Fri May 28 08:14:28 UTC 2021)
R:@rmetzger. There are some test failures that I missed during my local tests. I'll ping you when this PR is ready for review.
4480110 to 5543d0d
@rmetzger I fixed one of the failing tests. For the UnalignedCheckpointCompatibilityITCase savepoint tests, however, I know the accumulators must come from the JobExecutionResult for the tests to pass, but I am struggling to find the right condition for reading them from the JobExecutionResult rather than from the ExecutionGraph as usual. I don't know Flink well enough yet; can you give me a clue about this condition?
// TODO: this is not the only case when we need to get the accumulators from JobExecutionResult.
// It is needed also for UnalignedCheckpointCompatibilityITCase savepoints tests to pass.
// What is the complete condition?
if (!miniCluster.isRunning()) {
I'm wondering whether miniCluster.isRunning() is the proper condition. Since a mini-cluster can be viewed as a cluster that can run multiple jobs, it may still be running after the job is logically done. Perhaps we could use jobResultFuture.isDone() instead?
Agreed: miniCluster.isRunning() is more a side effect of PerJobMiniClusterFactoryTest.testJobExecution stopping the mini-cluster when the job is done. jobResultFuture.isDone() is definitely a better condition.
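To illustrate the condition discussed in this thread, here is a minimal sketch using only JDK classes. It is not the code from this PR: the class AccumulatorSourceSketch and the liveSnapshot supplier are hypothetical stand-ins, and only the name jobResultFuture comes from the review comments. It shows why keying on the job's own result future, rather than on the cluster's lifecycle, gives a non-blocking lookup.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical stand-in for a job client; this is NOT Flink's MiniClusterJobClient. */
public class AccumulatorSourceSketch {

    // Completes with the final accumulators once the job has finished.
    private final CompletableFuture<Map<String, Object>> jobResultFuture;

    // Assumption: a cheap, non-blocking view of the accumulators of a running job.
    private final Supplier<Map<String, Object>> liveSnapshot;

    public AccumulatorSourceSketch(
            CompletableFuture<Map<String, Object>> jobResultFuture,
            Supplier<Map<String, Object>> liveSnapshot) {
        this.jobResultFuture = jobResultFuture;
        this.liveSnapshot = liveSnapshot;
    }

    /** Never blocks: reads the final result only if it is already available. */
    public Map<String, Object> getAccumulators() {
        // The condition depends on the job, not on whether the cluster is still up:
        // a cluster may keep running after one of its jobs has finished.
        if (jobResultFuture.isDone() && !jobResultFuture.isCompletedExceptionally()) {
            // join() returns immediately because the future is already complete.
            return jobResultFuture.join();
        }
        return liveSnapshot.get();
    }
}
```

With this shape, a caller sees the live values while the job runs and the final values once the result future completes, regardless of whether the cluster that hosted the job has been shut down.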
Hi @echauchot, from my point of view, there would be two possible semantics when users want to acquire accumulators:
Previously |
5543d0d to 792763a
@gaoyunhaii thanks for your comments. I totally agree with them. I had also concluded that the only way was to change UnalignedCheckpointCompatibilityITCase to wait for the job to end. Thanks for the confirmation!
792763a to c875ca9
@flinkbot run azure
@flinkbot run travis
@rmetzger this PR is ready for review, PTAL.
…ocking and returns correct accumulators.
…t requests the accumulators from the JobExecutionResult as before
c875ca9 to aa00e2a
Friendly ping: @rmetzger do you have time to review/merge this PR? If not, can you ping someone else to take a look? Thanks.
@tillrohrmann can you please take a look at this PR? I saw you in the git history of this part of the code. Thanks.
@echauchot I'm really sorry for the delay. I'll take a look at this PR today!
@rmetzger don't worry, I know what it is :)
Thanks a lot for this fix!
I also manually verified the fix according to the Jira description.
I'll merge this now!
What is the purpose of the change
Make MiniClusterJobClient#getAccumulators() non-blocking in Streaming mode
Brief change log
Get the serialized accumulators from the execution graph rather than getting the accumulators from the JobExecutionResult.
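The change described above can be pictured with the following JDK-only sketch. It is an illustration under stated assumptions, not the PR's implementation: SerializedAccumulatorsSketch, its helper methods, and the use of plain Java serialization are all hypothetical, and the graphSnapshot future merely stands in for an asynchronous lookup of the execution graph's serialized accumulators. The point is that the caller gets a future back immediately and deserialization happens when the snapshot arrives, so nothing waits for the job to finish.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical illustration; plain Java serialization stands in for Flink's own format. */
public class SerializedAccumulatorsSketch {

    /** Serializes one accumulator value (assumption: values are java.io.Serializable). */
    public static byte[] serialize(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes one accumulator value. */
    public static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not deserialize accumulator", e);
        }
    }

    /**
     * Returns immediately with a future. The deserialization step is chained onto
     * the snapshot future, so the caller is never blocked until the job ends.
     */
    public static CompletableFuture<Map<String, Object>> getAccumulators(
            CompletableFuture<Map<String, byte[]>> graphSnapshot) {
        return graphSnapshot.thenApply(serialized -> {
            Map<String, Object> result = new HashMap<>();
            for (Map.Entry<String, byte[]> entry : serialized.entrySet()) {
                result.put(entry.getKey(), deserialize(entry.getValue()));
            }
            return result;
        });
    }
}
```

A caller would typically chain further work with thenAccept or thenApply on the returned future rather than calling join(), which keeps the whole path non-blocking.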
Verifying this change
Changed AccumulatorLiveITCase so that the accumulators are verified by calling getAccumulators() with both ClusterClient (as in the previous version of the test) and MiniClusterJobClient.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation