[FLINK-3390] [runtime, tests] Restore savepoint on ExecutionGraph restart #1720

Closed
uce wants to merge 1 commit into apache:master from uce:3390-savepoint_retry

@uce
Contributor

@uce uce commented Feb 26, 2016

Temporary workaround to restore the initial state on failure during recovery, as
required by a user. Will be superseded by FLINK-3397, which brings better handling
of checkpoint and savepoint restoring.

Previously, a failure during recovery resulted in the job restarting without its
savepoint state. This temporary workaround makes sure that if the savepoint
coordinator ever restored a savepoint and there was no checkpoint after the
savepoint, the savepoint state is restored again.
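
To illustrate the rule described above, here is a minimal sketch of the restart decision. It is not the code from this PR: the class `RestartStateSketch` and its methods are hypothetical names chosen for illustration, and state handles are simplified to plain strings.

```java
// Hypothetical sketch of the restart decision; not Flink's actual SavepointCoordinator.
final class RestartStateSketch {

    // Path of the savepoint the job was started from; null if there was none (assumed field).
    private String restoredSavepoint;

    // ID of the latest checkpoint taken after the savepoint; null if there is none yet.
    private Long latestCheckpointId;

    // Called when the job is initially restored from a savepoint.
    public void savepointRestored(String savepointPath) {
        this.restoredSavepoint = savepointPath;
        this.latestCheckpointId = null;
    }

    // Called whenever a checkpoint completes.
    public void checkpointCompleted(long checkpointId) {
        this.latestCheckpointId = checkpointId;
    }

    // Decides which state to restore when the execution graph restarts.
    public String stateForRestart() {
        if (latestCheckpointId != null) {
            // A checkpoint exists after the savepoint, so the regular recovery path applies.
            return "checkpoint:" + latestCheckpointId;
        }
        if (restoredSavepoint != null) {
            // No checkpoint since the savepoint: restore the savepoint state again.
            return "savepoint:" + restoredSavepoint;
        }
        // The job never had any state to restore.
        return "none";
    }
}
```

The case this workaround targets is the middle branch: without it, a failure before the first checkpoint would restart the job with no state at all.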

@StephanEwen
Contributor

Looks good to me, pretty good test.

Is this crucial for the next 1.0 RC?

@uce
Contributor Author

uce commented Feb 26, 2016

I would say yes, because a user ran into this issue and asked for a fix.

@tillrohrmann
Contributor

Changes look good to me. Good work @uce :-) Will merge it to the master and the release branch.

@asfgit asfgit closed this in c2a43c9 Feb 26, 2016
asfgit pushed a commit that referenced this pull request Feb 26, 2016
This closes #1720.
subhankarb pushed a commit to subhankarb/flink that referenced this pull request Mar 17, 2016
This closes apache#1720.
@uce uce deleted the 3390-savepoint_retry branch April 20, 2016 13:16