[FLINK-13124] Don't forward exceptions when finishing SourceStreamTask #9090
cc @knaufk
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The bot tracks the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
Thanks for the work, @aljoscha! I left some comments on the PR.
 * want to ignore exceptions thrown after finishing, to ensure shutdown works smoothly.
 */
private volatile boolean isFinished = false;
I would suggest moving this flag to the StreamTask. The reason is that, although for now it is used only in the SourceStreamTask, knowing whether the job terminated normally or was cancelled is relevant to all StreamTasks.
@@ -147,6 +155,7 @@ protected void cancelTask() {

@Override
With the above comments in mind, I would suggest adding a method like the following to the StreamTask:
@VisibleForTesting
void finish() throws Exception {
    finished = true;
    finishTask(); // this is implemented by subclasses
}
(needed in the finishingIgnoresExceptions() test).
With this, the end of performCheckpoint should become:
if (isRunning && syncSavepointLatch.isSet()) {
    final boolean checkpointWasAcked =
        syncSavepointLatch.blockUntilCheckpointIsAcknowledged();
    if (checkpointWasAcked) {
        finishTask();
    }
}
@@ -118,7 +124,9 @@ protected void performDefaultAction(ActionContext context) throws Exception {
}

sourceThread.join();
sourceThread.checkThrowSourceExecutionException();
if (!isFinished) {
Given the above comment, this can be a protected getter method for the finished flag in the StreamTask, available to all StreamTasks.
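To make the review suggestions concrete, here is a minimal sketch of how the flag, the finish() method, and the protected getter could fit together. It is not Flink's actual code: SketchStreamTask, SketchSourceTask, sourceCancelled, and afterSourceThreadJoined are hypothetical names used only for illustration, and the real StreamTask has many more responsibilities.

```java
// Hypothetical stand-ins for illustration; these are NOT Flink's StreamTask classes.
abstract class SketchStreamTask {
    // volatile because the flag may be set by one thread and read by another
    private volatile boolean finished = false;

    /** Marks the task as finished, then runs the subclass-specific finishing logic. */
    final void finish() throws Exception {
        finished = true;
        finishTask();
    }

    /** Getter for the flag, available to all subclasses. */
    protected final boolean isFinished() {
        return finished;
    }

    /** Implemented by subclasses. */
    protected abstract void finishTask() throws Exception;
}

class SketchSourceTask extends SketchStreamTask {
    // Assumed field, only here so the effect of finishTask() is observable.
    boolean sourceCancelled = false;

    @Override
    protected void finishTask() {
        sourceCancelled = true;
    }

    /** Rethrows a source failure only if the task did not finish normally. */
    void afterSourceThreadJoined(Exception sourceFailure) throws Exception {
        if (sourceFailure != null && !isFinished()) {
            throw sourceFailure;
        }
    }
}
```

Keeping the flag private in the base class and exposing it only through the getter means subclasses can read the state but only finish() can change it.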
Master has moved since I opened this PR. The fix still works with a change, but I want to investigate a bit further, because that change on master removes other new behaviour that my PR was testing in one of the test cases.
Force-pushed from 6c22974 to b1330cb.
PTAL (please take another look). I have now changed this to work with the latest master. (There was a change in how the mailbox works.)
Before, exceptions that occurred after cancelling a source (as the KafkaConsumer did, for example) would make a job fail when attempting a "stop-with-savepoint". Now we ignore those exceptions.
Force-pushed from b1330cb to 6b32e06.
@aljoscha Thanks for the work!
The changes look good for master, but the mailbox changes that you refer to are not present in the release-1.9 branch. I can merge this PR to master, but for release-1.9 you should open a new PR, potentially with the solution this PR contained before your last rebase.
WDYT?
OK, thanks! Could you please merge? (Just arrived in China 😅)
Sure! I will merge now.
Merged on master.
// We tell the mailbox to finish, to prevent any exceptions that might occur during
// finishing from leading to a FAILED state. This could happen, for example, when cancelling
// sources as part of a "stop-with-savepoint".
mailboxProcessor.allActionsCompleted();
Sorry for being late to this PR.
Is there a reason why the master branch version calls mailboxProcessor.allActionsCompleted(); in finishTask(), but the release-1.9 branch version doesn't (#9115)?
On the release-1.9 branch, StreamTask.cancel() also doesn't use this method; in fact, the MailboxProcessor doesn't yet exist there and isn't used like this. However, this solution leads to a race condition; see also #9125.
What is the purpose of the change
Before, exceptions that occurred after cancelling a source (as the
KafkaConsumer did, for example) would make a job fail when attempting a
"stop-with-savepoint". Now we ignore those exceptions.
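The behaviour described here can be illustrated with a small, self-contained sketch. It does not use Flink or Kafka classes: CancellableSourceSketch and all of its members are hypothetical, and the simulated "consumer closed" exception merely stands in for whatever a real source might throw while it is being cancelled.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; not Flink's SourceStreamTask.
class CancellableSourceSketch {
    private final CountDownLatch cancelled = new CountDownLatch(1);
    private volatile boolean finished = false;
    private volatile Throwable sourceError = null;

    // The "source" blocks until it is cancelled, then fails during shutdown.
    private final Thread sourceThread = new Thread(() -> {
        try {
            cancelled.await();
            throw new IllegalStateException("consumer closed");
        } catch (Throwable t) {
            sourceError = t; // remember the failure for the task thread
        }
    });

    void start() {
        sourceThread.start();
    }

    /** Normal termination: set the flag first, then cancel the source. */
    void finish() {
        finished = true;
        cancelled.countDown();
    }

    /** Cancels the source without marking the task as finished. */
    void cancelWithoutFinishing() {
        cancelled.countDown();
    }

    /** Waits for the source thread and forwards its failure unless the task finished. */
    void runToCompletion() throws Exception {
        sourceThread.join();
        if (!finished && sourceError != null) {
            throw new Exception("source failed", sourceError);
        }
    }
}
```

Setting the flag before cancelling the source matters in this sketch: if the order were reversed, the source thread could fail and be joined before the flag is visible, and the exception would still be forwarded.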
Brief change log
isFinished flag in SourceStreamTask
Verifying this change
This change added tests and can be verified as follows:
SourceStreamTaskTest
Does this pull request potentially affect one of the following parts:
Documentation