[FLINK-21189][runtime] Added support for concurrently caught exceptions to exception history #15311
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Automated Checks
Last check on commit 14a4a8a (Sat Aug 28 11:11:01 UTC 2021)
Warnings:
Mention the bot in a comment to re-run the automated checks.
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
...c/main/java/org/apache/flink/runtime/scheduler/exceptionhistory/ExceptionHistoryService.java
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/DefaultExecutionGraph.java
...k-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionVertexProvider.java
...n/java/org/apache/flink/runtime/scheduler/exceptionhistory/ExceptionHistoryEntryFactory.java
0923b25 to 1d0123a
.../org/apache/flink/runtime/scheduler/exceptionhistory/ExceptionHistoryEntryExtractorTest.java
flink-runtime/src/test/java/org/apache/flink/runtime/scheduler/ExceptionHistoryEntryTest.java
...java/org/apache/flink/runtime/scheduler/exceptionhistory/ExceptionHistoryEntryExtractor.java
I addressed all comments and marked the PR as reviewable. The AzureCI failure seems to be unrelated (FLINK-21929).
I added a separate commit to cover adding the timestamp to the […]. I verified manually that the UI is not affected by this change. @zentol please give it another round.
zentol
left a comment
There are some CI failures that should be looked at.
...src/main/java/org/apache/flink/runtime/scheduler/exceptionhistory/ExceptionHistoryEntry.java
...java/org/apache/flink/runtime/scheduler/exceptionhistory/ExceptionHistoryEntryExtractor.java
...main/java/org/apache/flink/runtime/scheduler/exceptionhistory/RootExceptionHistoryEntry.java
97997bc to bac508e
|
The |
|
Writing down the previous comment got me thinking again: ideally, we would want to have this null check, since a failure should always have a cause. We haven't introduced a null check so far because of FLINK-21376. There is ErrorInfo.createErrorInfoWithNullableCause for handling this, but it feels like it is handled in the wrong place. Instead, we should substitute the […]
But that's all part of FLINK-21376, correct? FYI, I don't think the TaskExecutionState should set an exception on its own; instead, it should enforce that, if the exception variants are used, the exception is not null.
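To make the suggestion above concrete, here is a minimal sketch of what "enforce that the exception is not null" could look like. TaskStateSketch and its factory methods are hypothetical stand-ins invented for illustration; they are not Flink's TaskExecutionState API, and the actual fix tracked in FLINK-21376 may look different.

```java
import java.util.Objects;
import java.util.Optional;

/** Hypothetical stand-in for a task state message; NOT Flink's TaskExecutionState. */
final class TaskStateSketch {
    private final String state;
    private final Throwable cause; // null only when the cause-less variant was used

    private TaskStateSketch(String state, Throwable cause) {
        this.state = state;
        this.cause = cause;
    }

    /** Variant without an exception, e.g. for non-failure state transitions. */
    static TaskStateSketch of(String state) {
        return new TaskStateSketch(Objects.requireNonNull(state, "state"), null);
    }

    /** Exception variant: rejects a null cause instead of silently substituting one. */
    static TaskStateSketch failed(Throwable cause) {
        return new TaskStateSketch(
                "FAILED", Objects.requireNonNull(cause, "cause must not be null"));
    }

    String getState() {
        return state;
    }

    Optional<Throwable> getCause() {
        return Optional.ofNullable(cause);
    }
}
```

With this shape, a caller that reports a failure without a cause fails fast at construction time, so downstream code that builds the exception history would not need its own null handling.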
zentol
left a comment
Would you consider that blocking the PR, or can we go ahead with the merge?
It would be a preparation/cleanup task for FLINK-21376. I created FLINK-22060 to cover this. Hence, it's not blocking this PR.
What is the purpose of the change
The goal is to provide not only the root cause of a task or global failure but also the exceptions that were caught concurrently in other Executions which would be affected by the partial or full restart of the job.
Brief change log
- RootExceptionHistoryEntry
- ExceptionHistoryEntryExtractor that deals with creating the history entries
Verifying this change
This change added tests and can be verified as follows:
- ExceptionHistoryEntryExtractorTest was added dealing with the actual entry extraction
Does this pull request potentially affect one of the following parts:
- @Public(Evolving): yes
Documentation
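As an illustration of the purpose stated in this description, the following sketch shows one way a root failure could carry the failures caught concurrently in other affected executions. FailureSketch and RootFailureSketch are hypothetical names chosen for this example; the fields and the filtering rule are assumptions and do not reproduce the classes added in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical single failure record (not Flink's ExceptionHistoryEntry). */
final class FailureSketch {
    final Throwable exception;
    final long timestamp;
    final String taskName; // assumption: null models a global failure without a task

    FailureSketch(Throwable exception, long timestamp, String taskName) {
        this.exception = exception;
        this.timestamp = timestamp;
        this.taskName = taskName;
    }
}

/** Hypothetical root entry that also carries concurrently caught failures. */
final class RootFailureSketch {
    final FailureSketch root;
    final List<FailureSketch> concurrentFailures;

    private RootFailureSketch(FailureSketch root, List<FailureSketch> concurrentFailures) {
        this.root = root;
        this.concurrentFailures = Collections.unmodifiableList(concurrentFailures);
    }

    /**
     * Builds the root entry from the failure that triggered the restart and the
     * candidates collected from the other affected executions. Candidates that
     * are null (the execution did not fail) or identical to the root are skipped.
     */
    static RootFailureSketch of(FailureSketch root, List<FailureSketch> candidates) {
        List<FailureSketch> concurrent = new ArrayList<>();
        for (FailureSketch candidate : candidates) {
            if (candidate != null && candidate != root) {
                concurrent.add(candidate);
            }
        }
        return new RootFailureSketch(root, concurrent);
    }
}
```

Keeping the concurrent failures inside the root entry, rather than as separate top-level entries, lets a consumer show one restart as one item with its related exceptions attached.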