-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-25427] Fix SavepointITCase.testTriggerSavepointAndResumeWithNo… #18332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 899e78f (Tue Jan 11 17:59:14 UTC 2022) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. DetailsThe Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
flink-runtime/src/test/java/org/apache/flink/runtime/minicluster/MiniClusterITCase.java
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java
Outdated
Show resolved
Hide resolved
flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java
Outdated
Show resolved
Hide resolved
| final String firstCheckpoint = | ||
| cluster.getMiniCluster().triggerCheckpoint(firstJobId).get(); | ||
| cluster.getClusterClient().cancel(firstJobId).get(); | ||
| CommonTestUtils.waitForJobStatus( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand why we need it here. if we clone JobGraph then everything works perfectly fine even without this waiting. It seems for me that the awaiting on cancel's future is enough to be sure that the next submit with the new JobGraph will be successful. Did I miss some rare race condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would fix the test as well, even without cloning the JobGraph, because after this point the mutations wouldn't affect the dispatcher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you cloned the object inside of submit. Do I understand correctly, that we can avoid waitForJobStatus? (I understand that it also makes sense, but I just don't want to overcomplicate the code if it is not necessary)
|
@dawidwys Thanks for the review. I've addressed your comments, PTAL |
| cluster.getClusterClient().submitJob(jobGraph).get(); | ||
| CommonTestUtils.waitForAllTaskRunning(cluster.getMiniCluster(), jobID1, false); | ||
| final JobID firstJobId = new JobID(); | ||
| final JobGraph firstJobGraph = InstantiationUtil.clone(jobGraph); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, you don't need clone here anymore
| final String firstCheckpoint = | ||
| cluster.getMiniCluster().triggerCheckpoint(firstJobId).get(); | ||
| cluster.getClusterClient().cancel(firstJobId).get(); | ||
| CommonTestUtils.waitForJobStatus( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you cloned the object inside of submit. Do I understand correctly, that we can avoid waitForJobStatus? (I understand that it also makes sense, but I just don't want to overcomplicate the code if it is not necessary)
dawidwys
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 % I agree with @akalash we should remove the clone from SavepointITCase
…eWithNoClaim by implicitly cloning the JobGraphs submitted into the MiniCluster.
|
Needs another rebase due to conflict in MiniCluster |
akalash
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Shall I merge it @dmvk ? |
|
@dawidwys yes please, I was waiting for CI to get green to ping you :) |
https://issues.apache.org/jira/projects/FLINK/issues/FLINK-25427