[FLINK-6833] [task] Fail StreamTask only due to async exception if it is running #4058
tillrohrmann wants to merge 2 commits into apache:master
… is running In order to resolve a race condition between a properly terminated StreamTask which cleans up its resources (stopping asynchronous operations, etc.) and a cancelled asynchronous operation (e.g. asynchronous checkpointing operation), we check whether the StreamTask is still running before failing it externally.
The test will not pass checkstyle. You can run
@Override
public void handleAsyncException(String message, Throwable exception) {
    getEnvironment().failExternally(exception);
    if (isRunning) {
I doubt that this is a complete fix; it will only make the problem less likely to occur. This method can be called asynchronously to the threads that manipulate isRunning, which means that the stream task can leave the running state after the condition was checked as true but before failExternally(...) went through.
If isRunning == true when entering the if branch, then, depending on what happens before failExternally, we can treat handleAsyncException as having happened atomically before isRunning was set to false, or not. What we do not want is to still be able to fail the task when isRunning == false. Thus, I think it solves a valid problem.
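To make the two positions above concrete, here is a minimal sketch of the check-then-act pattern under discussion. GuardedTask, stop(), handleAsyncExceptionLocked() and externalFailureCount() are hypothetical names for illustration only; this is not Flink's StreamTask, and the locked variant is an assumption about how the remaining window could be closed, not something this PR does.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a stream task; NOT Flink's StreamTask. */
class GuardedTask {
    // volatile so the thread reporting the async exception sees the
    // write made by the thread that stops the task
    private volatile boolean isRunning = true;

    // counts how often the task was failed externally (illustration only)
    private final AtomicInteger externalFailures = new AtomicInteger();

    private final Object lock = new Object();

    /** Guarded variant in the spirit of the diff: check, then act, no lock. */
    void handleAsyncException(Throwable exception) {
        if (isRunning) {
            // The task may still stop between the check above and the call
            // below; this is the window pointed out in the review comment.
            failExternally(exception);
        }
    }

    /** Stricter variant (assumption, not part of the PR): check and act under one lock. */
    void handleAsyncExceptionLocked(Throwable exception) {
        synchronized (lock) {
            if (isRunning) {
                failExternally(exception);
            }
        }
    }

    /** Marks the task as no longer running; takes the same lock as the stricter variant. */
    void stop() {
        synchronized (lock) {
            isRunning = false;
        }
    }

    // Stand-in for getEnvironment().failExternally(exception).
    private void failExternally(Throwable exception) {
        externalFailures.incrementAndGet();
    }

    int externalFailureCount() {
        return externalFailures.get();
    }
}
```

Either variant guarantees that a call which starts after stop() has returned no longer fails the task, which is the case the reply above is concerned with. Only the locked variant also rules out a stop() slipping in between the check and the failure call, at the cost of holding a lock while failing the task.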
I would like to merge this PR if I have clarified all concerns and there are no further objections.
LGTM +1
Thanks for the review @zentol and @StefanRRichter. Merging this PR.
… is running In order to resolve a race condition between a properly terminated StreamTask which cleans up its resources (stopping asynchronous operations, etc.) and a cancelled asynchronous operation (e.g. asynchronous checkpointing operation), we check whether the StreamTask is still running before failing it externally. This closes #4058.