[BEAM-8567] Do not swallow execution errors during checkpointing #10008

Merged
merged 1 commit into from Nov 8, 2019

@mxm mxm commented Nov 6, 2019

If a bundle fails to finalize before a checkpoint is created, the failure may be
swallowed and treated as a mere checkpointing error. This breaks the execution
flow and exactly-once guarantees.
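
The failure mode described above can be illustrated with a minimal, self-contained Java sketch. It does not use Flink or Beam APIs: `Operator`, `triggerCheckpoint`, `finishBundle`, `SWALLOWED`, and `FAIL_HARD` are hypothetical names, and the only assumption is that the runtime catches `Exception` around the snapshot call and treats it as a declined checkpoint. Under that assumption, rethrowing the bundle failure as an `Error` is one way to keep it from being swallowed.

```java
public class CheckpointErrorSketch {

  /** Hypothetical stand-in for an operator that must finish its bundle before snapshotting. */
  public interface Operator {
    void snapshotState() throws Exception;
  }

  /**
   * Hypothetical stand-in for a runtime that tolerates checkpoint failures:
   * it catches Exception, declines the checkpoint, and keeps the job running.
   */
  public static String triggerCheckpoint(Operator op) {
    try {
      op.snapshotState();
      return "checkpoint completed";
    } catch (Exception e) {
      return "checkpoint declined: " + e.getMessage();
    }
  }

  /** Bundle finalization that always fails, standing in for an application error. */
  public static void finishBundle() throws Exception {
    throw new IllegalStateException("bundle failed");
  }

  /** Lets the bundle failure propagate as a regular exception, so the runtime swallows it. */
  public static final Operator SWALLOWED = () -> finishBundle();

  /** Rethrows the bundle failure as an Error, which the runtime's catch block does not handle. */
  public static final Operator FAIL_HARD = () -> {
    try {
      finishBundle();
    } catch (Exception e) {
      // Illustrative message only; not taken from the change under review.
      throw new Error("bundle failed to finalize during checkpoint", e);
    }
  };

  public static void main(String[] args) {
    // The application error is reported as a declined checkpoint and execution continues.
    System.out.println(triggerCheckpoint(SWALLOWED));
    try {
      triggerCheckpoint(FAIL_HARD);
    } catch (Error e) {
      // The Error escapes the runtime's handler, so the failure is visible to the caller.
      System.out.println("job failed: " + e.getMessage());
    }
  }
}
```

In the first call the bundle failure surfaces only as a declined checkpoint; in the second it escapes `triggerCheckpoint` and reaches the caller.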


@mxm mxm requested review from aljoscha, tweise and dmvk November 6, 2019 13:31
mxm commented Nov 6, 2019

Unrelated Dataflow failures:

    org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testBasic[0: [streamingEngine=false]]
    org.apache.beam.runners.dataflow.worker.StreamingDataflowWorkerTest.testBasicHarness[0: [streamingEngine=false]]

        invokeFinishBundle();
      }
      outputManager.closeBuffer();
    } catch (Exception e) {
Contributor:

Maybe also create a JIRA in Flink to have a cleaner way to signal an application error?

Contributor Author:

Would make sense. Normally, there should be no application logic in here, though there are probably other people who have similar logic in snapshotState.

mxm commented Nov 7, 2019

Please also see #10007.

      }
      outputManager.closeBuffer();
    } catch (Exception e) {
      // Any regular exception during checkpointing will be tolerated by Flink because those
Contributor:

Suggested change

    - // Any regular exception during checkpointing will be tolerated by Flink because those
    + // https://jira.apache.org/jira/browse/FLINK-14653
    + // Any regular exception during checkpointing will be tolerated by Flink because those

@mxm mxm changed the title from "[BEAM-8566] Do not swallow execution errors during checkpointing" to "[BEAM-8567] Do not swallow execution errors during checkpointing" Nov 8, 2019
mxm commented Nov 8, 2019

Run Portable_Python PreCommit

3 participants