[FLINK-4445] Add option to ignore unmapped checkpoint state #2712

uce · 2016-10-28T11:26:55Z

When restoring from a checkpoint/savepoint, state for each operator has to be restored. For savepoints, this means that the user cannot remove an operator from her topology and still use the savepoint.

With this change, state that cannot be mapped back to the job being restored can optionally be ignored. The default behaviour does not change.

Changes

- I've removed the allOrNothingState flag, as it only affected non-partitioned operator state and was never set to true anyway (except in tests). The flag controlled whether each non-partitioned operator state was restored.
- Moved the savepoint path from the JobSnapshottingSettings to the JobGraph.
- Added the --ignoreUnmappedState (short -i) flag to the run command: bin/flink run -s <savepointPath> -i ...

I've tested this manually by triggering a savepoint for a job, adjusting the job (removing an operator), and then trying to resume from the savepoint. By default, restoring fails, but with the flag everything works.

StefanRRichter

+1, the proposed changes look good to me.

uce · 2016-10-31T14:51:11Z

Thanks for the review. Going to merge this.

StephanEwen · 2016-10-31T16:31:36Z

Looks good to me, except the name ignoreUnmappedState ;-)
As usual, the three most difficult things in computer science are (1) finding good names, and (2) off-by-one errors.

What do you think about calling it something like allowUnresumedState or allowNonRestoredState? The "allow" to me implies that this is a valid scenario.

uce · 2016-10-31T16:39:50Z

Very much agree Stephan! I don't know what sounds better to native speakers and more intuitive to users though... unresumed state or non restored state? ;-) @greghogan @jgrier do you have any input on this?

The internal behaviour is the following: The checkpoint/savepoint stores state for each operator of the original job graph (from which the checkpoint/savepoint was triggered), keyed by the operator ID. When a user resumes from this checkpoint/savepoint, the checkpoint coordinator tries to map each state (keyed by operator ID) to the operators of the new job. This PR allows ;-) some of this state to not be restored. Any ideas on what to call this?
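To make the described mapping concrete, here is a minimal, self-contained sketch in plain Java. It is not the checkpoint coordinator's actual code: the class RestoreSketch, the method restore, and the use of strings for operator IDs and state are all hypothetical stand-ins. The boolean parameter borrows the name allowNonRestoredState proposed in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: hypothetical names, not Flink's classes or signatures.
final class RestoreSketch {

    /**
     * Maps savepoint state (keyed by operator ID) onto the operators of the new job.
     * State without a matching operator is skipped when allowNonRestoredState is true;
     * otherwise the restore fails, which mirrors the unchanged default behaviour.
     */
    static Map<String, String> restore(
            Map<String, String> savepointState,
            Set<String> operatorIdsInNewJob,
            boolean allowNonRestoredState) {

        Map<String, String> assigned = new LinkedHashMap<>();
        List<String> unmapped = new ArrayList<>();

        for (Map.Entry<String, String> entry : savepointState.entrySet()) {
            if (operatorIdsInNewJob.contains(entry.getKey())) {
                // The operator still exists in the new job: hand its state back.
                assigned.put(entry.getKey(), entry.getValue());
            } else {
                // The operator was removed from the topology: remember the orphaned state.
                unmapped.add(entry.getKey());
            }
        }

        if (!unmapped.isEmpty() && !allowNonRestoredState) {
            throw new IllegalStateException(
                    "Cannot map savepoint state to the new job for operators " + unmapped);
        }
        return assigned;
    }
}
```

With a savepoint holding state for a source and a since-removed map operator, calling restore with the flag set to false throws, while setting it to true returns only the source's state.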

Allow to specify whether a checkpoint restore should ignore checkpoint state that it cannot map to the program. This is exposed via the CLI in the run command: bin/flink run -s <savepointPath> -i ... Furthermore, the savepoint restore settings are moved out of the snapshotting settings.

… state Allows to ignore checkpoint state that cannot be mapped to a job vertex when restoring.

…int state Allows to skip checkpoint state that cannot be mapped to a job vertex when restoring. This closes apache#2712.

This was referenced Oct 28, 2016

[backport] [FLINK-4445] Add option to ignore unmapped checkpoint state #2713

Closed

[FLINK-4935] [webfrontend] Submit job with savepoint #2714

Closed

StefanRRichter approved these changes Oct 31, 2016

uce added 2 commits November 1, 2016 17:51

[FLINK-4445] [checkpointing] Add option to ignore unmapped checkpoint…

3b2b8c3

… state Allows to ignore checkpoint state that cannot be mapped to a job vertex when restoring.

uce force-pushed the 4445-unmatched_state branch from 57621d3 to 5c44d6a Compare November 1, 2016 17:20

Rename to allowNonRestoredState

98a1362

uce force-pushed the 4445-unmatched_state branch from 5c44d6a to 98a1362 Compare November 1, 2016 17:45

Rename short opt to -n

1da6da6

asfgit closed this in c0e620f Nov 2, 2016

uce deleted the 4445-unmatched_state branch November 2, 2016 09:15

rmetzger added the component=Runtime/StateBackends label Mar 14, 2019
