-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-16443][checkpointing] Make sure that CheckpointException are also serialized in DeclineCheckpoint. #14177
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 8118763 (Mon Nov 23 14:09:53 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
Converted to draft, so that we merge only after 1.12 branch is forked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this PR @AHeise. I had a couple of comments. Please take a look.
@@ -34,21 +35,18 @@ | |||
private static final long serialVersionUID = 2094094662279578953L; | |||
|
|||
/** The reason why the checkpoint was declined. */ | |||
@Nullable | |||
private final Throwable reason; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we change the type if we know that it is always serialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same actually applies to the return type of getReason()
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that we can now also adapt CheckpointCoordinator.getCheckpointException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the first two things, but didn't know how to proceed with the third: the method is also called from onTriggerFailure
where we just get a Throwable
from the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked the other usages of this method. I guess it should be fine then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this PR @AHeise. LGTM. +1 for merging.
…lso serialized in DeclineCheckpoint.
…on JobManager" This reverts commit 2bcf509
What is the purpose of the change
The problem of having exceptions that are only in the user code classloader was fixed by proactively serializing them inside the CheckpointException. That means all consumers of CheckpointException now need to be aware of that and unwrap the serializable exception.
I believe the right way to fix this would have been to use a SerializedException in the DeclineCheckpoint message instead, which would have localized the change to the actual problem: RPC transport.
Brief change log
Verifying this change
It's a rather minor change, where proper testing would require a rather complicated setup.
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (yes / no)Documentation