uce commented Nov 2, 2018

What is the purpose of the change

Retain checkpoints when the job reaches the terminal status SUSPENDED. Note that this does not currently affect the actual retention behavior, because we special-case this terminal state in ZooKeeperCompletedCheckpointStore and don't suspend jobs when running with StandaloneCompletedCheckpointStore.

The proposed change is more of a proactive guard to avoid confusion in the future (e.g. if we stop special-casing this state or accidentally use StandaloneCompletedCheckpointStore in HA mode).

I'm also OK with closing this PR without merging since it is not clear how the SUSPENDED state will evolve in the future. Currently SUSPENDED is an "internal" terminal state to which we transition on lost leadership. If we plan to change this in the future (e.g. let users trigger this transition), it might be worthwhile to keep the current behavior.

Brief change log

  • Update CheckpointProperties to retain on suspension
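
To illustrate the intent, here is a minimal sketch of a per-status retention decision. It is a simplified, hypothetical model and not the actual CheckpointProperties class: the names RetentionSketch, RetentionPolicy, shouldDiscard, discardOnAnyTerminalStatus and retainOnSuspension are made up for this example, and the "before" policy is only an assumption about a configuration that discards on every terminal status.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of a checkpoint retention policy.
// This is NOT Flink's CheckpointProperties; all names are illustrative.
class RetentionSketch {

    // Terminal job statuses discussed in this PR.
    enum JobStatus { FINISHED, CANCELED, FAILED, SUSPENDED }

    static final class RetentionPolicy {
        private final Set<JobStatus> discardOn = EnumSet.noneOf(JobStatus.class);

        RetentionPolicy(Set<JobStatus> statuses) {
            discardOn.addAll(statuses);
        }

        // True if a completed checkpoint should be discarded when the job
        // reaches the given terminal status.
        boolean shouldDiscard(JobStatus status) {
            return discardOn.contains(status);
        }
    }

    // Assumed "before" policy: discard on every terminal status.
    static RetentionPolicy discardOnAnyTerminalStatus() {
        return new RetentionPolicy(EnumSet.allOf(JobStatus.class));
    }

    // "After" policy in the spirit of this PR: same as above,
    // but checkpoints are retained on SUSPENDED.
    static RetentionPolicy retainOnSuspension() {
        Set<JobStatus> statuses = EnumSet.allOf(JobStatus.class);
        statuses.remove(JobStatus.SUSPENDED);
        return new RetentionPolicy(statuses);
    }

    public static void main(String[] args) {
        RetentionPolicy before = discardOnAnyTerminalStatus();
        RetentionPolicy after = retainOnSuspension();
        System.out.println("before, SUSPENDED -> discard? "
                + before.shouldDiscard(JobStatus.SUSPENDED));
        System.out.println("after,  SUSPENDED -> discard? "
                + after.shouldDiscard(JobStatus.SUSPENDED));
    }
}
```

The point of the guard is that a store which does not special-case SUSPENDED itself would still keep the checkpoints, because the policy no longer asks for them to be discarded on suspension.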

Verifying this change

  • This change is already covered by existing HA tests
  • The modified StandaloneCompletedCheckpointStoreTest was essentially testing the behavior of an illegal state

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no (see comments above)
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

uce commented Dec 11, 2018

Closing this as it should be properly addressed at some other point in time.

uce closed this Dec 11, 2018
uce deleted the FLINK-10751-retain_checkpoints_on_suspension branch May 19, 2019 09:40