[FLINK-3390] [runtime, tests] Restore savepoint on ExecutionGraph restart by uce · Pull Request #1720 · apache/flink

uce · 2016-02-26T11:50:38Z

Temporary workaround to restore the initial state on a failure during recovery,
as requested by a user. Will be superseded by FLINK-3397, which brings better
handling of checkpoint and savepoint restoring.

Previously, a failure during recovery resulted in the job being restarted without
its savepoint state. This temporary workaround makes sure that if the savepoint
coordinator ever restored a savepoint and no checkpoint was taken after the
savepoint, the savepoint state is restored again on restart.
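
For readers skimming the thread, the rule described above can be sketched roughly as follows. This is only an illustration of the decision, not the code in this PR: `RestartStateSelector`, `Source`, and `choose` are hypothetical names, and the assumption that a checkpoint taken after the savepoint takes precedence is inferred from the description.

```java
/** Hypothetical sketch; none of these names are taken from the Flink code base. */
final class RestartStateSelector {

    /** Where the state for a restarted job would come from. */
    enum Source { LATEST_CHECKPOINT, SAVEPOINT, NONE }

    /**
     * @param savepointWasRestored true if the savepoint coordinator ever restored a savepoint for this job
     * @param checkpointSinceSavepoint true if a checkpoint was taken after that savepoint
     */
    static Source choose(boolean savepointWasRestored, boolean checkpointSinceSavepoint) {
        if (checkpointSinceSavepoint) {
            // Assumption: a newer checkpoint supersedes the savepoint state.
            return Source.LATEST_CHECKPOINT;
        }
        if (savepointWasRestored) {
            // The case this workaround targets: fall back to the savepoint again.
            return Source.SAVEPOINT;
        }
        // Neither a savepoint nor a checkpoint exists: restart without state.
        return Source.NONE;
    }
}
```

Under these assumptions, a job started from a savepoint that fails before its first checkpoint would get `Source.SAVEPOINT` rather than restarting with empty state.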

StephanEwen · 2016-02-26T12:37:28Z

Looks good to me, pretty good test.

Is this crucial for the next 1.0 RC?

uce · 2016-02-26T14:22:42Z

I would say yes, because a user ran into this issue and asked for a fix.

tillrohrmann · 2016-02-26T17:36:14Z

Changes look good to me. Good work @uce :-) Will merge it to the master and the release branch.

…h restart Temporary work around to restore initial state on failure during recovery as required by a user. Will be superseded by FLINK-3397 with better handling of checkpoint and savepoint restoring. A failure during recovery resulted in restarting a job without its savepoint state. This temporary work around makes sure that if the savepoint coordinator ever restored a savepoint and there was no checkpoint after the savepoint, the savepoint state will be restored again. This closes #1720.

asfgit closed this in c2a43c9 Feb 26, 2016

uce deleted the 3390-savepoint_retry branch April 20, 2016 13:16

rmetzger added the component=<none> label Mar 14, 2019

uce wants to merge 1 commit into apache:master from uce:3390-savepoint_retry
