[FLINK-5820] [state backends] Split shared/exclusive state and properly handle disposal #5396

StephanEwen · 2018-01-31T16:32:57Z

What is the purpose of the change

This PR contains the final changes needed for [FLINK-5820]. Checkpoint directories are now disposed of properly on all file system types (previously this did not work for some S3 connectors), with fewer calls to the file systems. Shared and exclusive state are split into different directories, to help implement cleanup safety nets.

Brief change log

TaskManagers use the CheckpointStorage to create CheckpointStreamFactories. Previously, these stream factories were created by the StateBackend. This completes separating the "storage" aspect of the StateBackend out into the CheckpointStorage.
The location where to store state is communicated between the CheckpointCoordinator (instantiating the original CheckpointStorageLocation for a checkpoint/savepoint) and the Tasks in a unified manner. Tasks transparently obtain their CheckpointStreamFactories always in the same way, regardless of whether writing state for checkpoints or savepoints.
Checkpoint state now has the scope EXCLUSIVE or SHARED, which may be stored differently. The current file system based backends put shared state into a /shared directory, while exclusive state goes into the /chk-1234 directory.
Tasks can directly write task-owned state to a checkpoint storage. That state neither belongs specifically to one checkpoint, nor is it shared and eventually released by the Checkpoint Coordinator. Only the tasks themselves may release the state. An example of that type of state is the write-ahead logs created by some sinks.
When a checkpoint is finalized, its storage is described by a CompletedCheckpointStorageLocation. That object gives access to addressing and metadata, and handles location disposal. This allows us to drop the "delete parent if empty" logic in File State Handles and fixes the issue that checkpoint directories are currently left over on S3.
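The EXCLUSIVE/SHARED directory layout from the change log can be illustrated with a small sketch. This is not the Flink implementation: the class `ScopedLayoutSketch`, the method `directoryFor`, and the base path are hypothetical names chosen for illustration, and only the `shared` and `chk-<id>` directory names come from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; these are not Flink classes.
class ScopedLayoutSketch {

    /** Mirrors the two scope names used in the PR description. */
    enum Scope { EXCLUSIVE, SHARED }

    /**
     * Picks the directory for a piece of checkpoint state.
     * Shared state goes to "<base>/shared"; exclusive state goes to
     * the directory of the individual checkpoint, "<base>/chk-<id>".
     */
    static Path directoryFor(Path checkpointBase, long checkpointId, Scope scope) {
        if (scope == Scope.SHARED) {
            return checkpointBase.resolve("shared");
        }
        return checkpointBase.resolve("chk-" + checkpointId);
    }

    public static void main(String[] args) {
        // The base directory is an assumed example value.
        Path base = Paths.get("checkpoints", "job-a");
        System.out.println(directoryFor(base, 1234L, Scope.EXCLUSIVE));
        System.out.println(directoryFor(base, 1234L, Scope.SHARED));
    }
}
```

Keeping shared state outside the per-checkpoint directory means that everything under a `chk-<id>` directory belongs to exactly one checkpoint, which is what makes directory-level cleanup possible.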

Future Work

In the future, the CompletedCheckpointStorageLocation should also be used as a way to handle relative addressing of checkpoints, to allow users to move them to different directories without breaking the internal paths.
We can now implement disposal fast paths, like dropping a directory as a whole, rather than dropping each state object separately. However, one would still need to release shared state objects individually. Finishing these fast paths is currently blocked on some rework of the shared state handles, to make their selective release easier and more robust.
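A rough sketch of what such a disposal fast path could look like, using only `java.nio.file` on a local temp directory. The names `DisposalFastPathSketch`, `dropDirectory`, and `disposeCheckpoint` are hypothetical and not part of this PR. Note also that on a local file system a recursive delete still touches every file; the benefit would come from storage systems that can drop a directory or prefix more cheaply.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; these are not Flink classes.
class DisposalFastPathSketch {

    /** Drops a directory and everything below it. */
    static void dropDirectory(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /**
     * Exclusive state is dropped with its directory as a whole.
     * Shared state objects are still released one by one, and only
     * those that the caller knows are no longer referenced.
     */
    static void disposeCheckpoint(Path exclusiveDir, Collection<Path> releasedSharedFiles)
            throws IOException {
        dropDirectory(exclusiveDir);
        for (Path f : releasedSharedFiles) {
            Files.deleteIfExists(f);
        }
    }

    /** Builds a tiny layout in a temp directory, disposes one checkpoint, reports what is left. */
    static String runDemo() {
        try {
            Path base = Files.createTempDirectory("checkpoints-sketch");
            Path chk = Files.createDirectory(base.resolve("chk-1"));
            Path shared = Files.createDirectory(base.resolve("shared"));
            Files.createFile(chk.resolve("exclusive-a"));
            Files.createFile(chk.resolve("exclusive-b"));
            Path unused = Files.createFile(shared.resolve("shared-unused"));
            Path inUse = Files.createFile(shared.resolve("shared-in-use"));

            disposeCheckpoint(chk, Arrays.asList(unused));

            String result = "chk=" + Files.exists(chk)
                    + " unused=" + Files.exists(unused)
                    + " inUse=" + Files.exists(inUse);
            dropDirectory(base); // clean up the temp layout
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```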

Verifying this change

This change can be verified by running a Flink cluster with a checkpointed program and

This PR also adds and adjusts various unit tests to guard the new behavior.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? Somewhat (it changes the state backend directory layouts)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

…orageLocation to properly handle disposal of checkpoints. That concept allows us to properly handle deletion of a checkpoint storage, for example deleting checkpoint directories, or the dropping of a checkpoint specific table. This replaces the current workaround for file systems, where every file disposal checks if the parent directory is now empty, and deletes it if that is the case. That is not only inefficient, but prohibitively expensive on some systems, like Amazon S3.

…ean up their parent directory. Performing directory contents checks and cleaning up the parent directory in the state handle disposal has previously led to excessive file system metadata requests, which especially on systems like Amazon S3 is prohibitively expensive.
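To make the cost argument concrete, here is a small counting sketch. It uses a hypothetical in-memory stand-in for a file system (`CountingFs`), not a real Flink or S3 API, and compares the old "check and delete parent on every file disposal" pattern with deleting the directory once at the location level.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these are not Flink classes.
class ParentCheckCostSketch {

    /** In-memory stand-in for a file system that counts the requests it receives. */
    static class CountingFs {
        final Map<String, Set<String>> dirs = new HashMap<>();
        int listCalls;
        int deleteCalls;

        void create(String dir, String file) {
            dirs.computeIfAbsent(dir, d -> new HashSet<>()).add(file);
        }

        void deleteFile(String dir, String file) {
            deleteCalls++;
            Set<String> files = dirs.get(dir);
            if (files != null) {
                files.remove(file);
            }
        }

        /** Models a directory listing, i.e. a metadata request. */
        boolean isEmpty(String dir) {
            listCalls++;
            Set<String> files = dirs.get(dir);
            return files == null || files.isEmpty();
        }

        void deleteDir(String dir) {
            deleteCalls++;
            dirs.remove(dir);
        }
    }

    /** Old pattern: every file disposal also checks whether the parent is now empty. */
    static void disposeWithParentCheck(CountingFs fs, String dir, List<String> files) {
        for (String f : files) {
            fs.deleteFile(dir, f);
            if (fs.isEmpty(dir)) {
                fs.deleteDir(dir);
            }
        }
    }

    /** New pattern: handles delete only their file; the location deletes the directory once. */
    static void disposeViaLocation(CountingFs fs, String dir, List<String> files) {
        for (String f : files) {
            fs.deleteFile(dir, f);
        }
        fs.deleteDir(dir);
    }

    /** Returns {old list calls, old delete calls, new list calls, new delete calls}. */
    static int[] demo() {
        List<String> files = Arrays.asList("a", "b", "c");
        CountingFs oldFs = new CountingFs();
        CountingFs newFs = new CountingFs();
        for (String f : files) {
            oldFs.create("chk-1", f);
            newFs.create("chk-1", f);
        }
        disposeWithParentCheck(oldFs, "chk-1", files);
        disposeViaLocation(newFs, "chk-1", files);
        return new int[] {oldFs.listCalls, oldFs.deleteCalls, newFs.listCalls, newFs.deleteCalls};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In this model the old pattern issues one listing per state file, while the location-level disposal issues none; the number of delete requests is the same in both cases.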

aljoscha

These are excellent changes! 👍

I had some comments and questions inline.

aljoscha · 2018-02-01T09:18:54Z

flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java

 /**
- * The configuration of a checkpoint, such as whether
+ * The configuration of a checkpoint. This described whether


aljoscha · 2018-02-01T09:19:22Z

flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java


-	private final boolean forced;
+	/** Type - checkpoit / savepoint. */


aljoscha · 2018-02-01T10:48:41Z

flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CompletedCheckpointTest.java

-		File file = tmpFolder.newFile();
-		assertEquals(true, file.exists());
-
+	public void registerStatesAtRegistry() {


What's the reason for this change?

The test whether state handles are correctly registered at the SharedStateRegistry was originally just sneakily added to a pre-existing metadata file cleanup test. That did not seem right ;-)

This factors the test out into a separate method. The test method should be called testRegisterStatesAtRegistry instead of registerStatesAtRegistry. Will change that...

aljoscha · 2018-02-01T10:51:22Z

...c/test/java/org/apache/flink/runtime/state/filesystem/FsCheckpointStateOutputStreamTest.java

+
+		stream2.close();
+
+		verify(fs, times(0)).delete(any(Path.class), anyBoolean());


nit: This seems somewhat brittle because there could be another "delete" method that the handle uses to delete the parent dir. For "future proof-ness"...

Will add an additional check that the directory still exists.

StephanEwen · 2018-02-01T16:01:02Z

Merged in 31e97e5

StephanEwen added 19 commits January 30, 2018 19:15

[hotfix] [checkpointing] Cleanup: Fix Nullability and names in checkpoint stats.

ec8e552

[hotfix] [tests] Drop obsolete CheckpointExternalResumeTest.

cf18831

Because all checkpoints are now externalized (write their metadata) this is an obsolete test.

[hotfix] [checkpoints] Clean up CompletedCheckpoint, grouping related methods together

a46acdd

[hotfix] [checkpoints] Drop ill-defined hashCode() and equals() from CompletedCheckpoint.

f3eb951

[hotfix] [tests] Clean up HeapKeyedStateBackendAsyncByDefaultTest

8d198c7

[FLINK-8531] [checkpoints] (part 1) Pull CheckpointType into its own class.

59c917e

[FLINK-8531] [checkpoints] (part 2) Add CheckpointType to CheckpointProperties

e9ed622

[FLINK-8531] [checkpoints] (part 3) Rework ExternalizedCheckpointITCase

f38eb54

[FLINK-8531] [checkpoints] (part 4) rename forCheckpoint() to forCheckpointWithDefaultLocation()

2551f0d

[FLINK-8531] [checkpoints] (part 5) Introduce CheckpointStorageLocationReference instead of String to communicate the location

09dab44

[FLINK-8531] [checkpoints] (part 6) Tasks resolve CheckpointStreamFactory from CheckpointStorage and Checkpoint Location Reference to persist checkpoint data.

8288738

[FLINK-8531] [checkpoints] (part 7) Move tests specific to Checkpoint Storage and Checkpoint Stream to separate tests suites

950e7fd

[FLINK-8531] [checkpoints] (part 8) Add tests for the FsCheckpointStorage and MemoryBackendCheckpointStorage.

19d6d16

[FLINK-8531] [checkpoints] (part 9) Introduce EXCLUSIVE and SHARED scope for states

62f635d

[hotfix] [runtime] Fix checkstyle for 'runtime/io/network/api'.

3c5440d

[FLINK-8539] [checkpointing] (part 2) Modify all tests to use CompletedCheckpointStorageLocation.

9ec65ee

[FLINK-8539] [checkpointing] (part 3) Rename FixFileFsStateOutputStream to FsCheckpointMetadataOutputStream

c30b08c

The new name captures the proper use and meaning of the class in a better way.

StephanEwen force-pushed the locations branch from 51691e0 to 1ca6ec4 Compare January 31, 2018 19:03

aljoscha approved these changes Feb 1, 2018

StephanEwen closed this Feb 1, 2018

rmetzger added the component=Runtime/StateBackends label Mar 18, 2019
