[FLINK-7268] [checkpoints] Scope SharedStateRegistry objects per (re)start by StefanRRichter · Pull Request #4410 · apache/flink

StefanRRichter · 2017-07-27T15:40:52Z

What is the purpose of the change

This PR fixes FLINK-7268. The problem was that ZookeeperCompletedCheckpointStore deletes checkpoints asynchronously. When this happens parallel to a restart, it could happen that the async delete performed shared state de-registration.

Before this PR, the old SharedStateRegistry was kept after restart and the counts where updated from the completed checkpoint store. In the described race, a checkpoint that has a pending delete will not contribute to the new count, but it can still decrement the count once the shared state is unregistered in the async deletion thread. This can accidentally drop counts below 1 and lead to data loss.

The core idea behind the PR is to scope the SharedStateRegistry per (re-)start, so that old pending deletes cannot influence the current count.

SharedStateRegistry is now created via a factory that is passed into the CheckpointCoordinator to simplify testing.

The PR also introduces additional tests and improves the debug/trace logging of incremental checkpointing.

Verifying this change

This change added tests and can be verified as follows:

Run a job with keyed state, using incremental checkpoints and HA mode. Kill TMs to trigger recovery. After a couple of attempts, the problematic condition should be triggered, leading to an infinite recovery loop due to FileNotFoundException. This should be fixed in the PR.

Additional tests:

HAIncrementalRocksDbBackendEventTimeWindowCheckpointingITCase
CheckpointCoordinatortest::testSharedStateRegistrationOnRestore
IncrementalKeyedStateHandleTest::testSharedStateReRegistration

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (YES)

Documentation

Does this pull request introduce a new feature? (no)

StefanRRichter · 2017-07-27T15:41:34Z

CC @aljoscha

aljoscha · 2017-07-28T13:39:04Z

I also reviewed CheckpointCoordinatorTest.testSharedStateRegistrationOnRestore(). Looks good 👍 The rest is the same as in #4410 so let's wait until #4353 is merged and then we can merge this one as well.

…start

…Case This helps tease out races, for example the recently discovered one in cleanup of incremental state handles at the SharedStateRegistry. (cherry picked from commit d7683cc)

tedyu · 2017-08-23T02:55:04Z

flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java

+			sharedStateRegistry.close();
+			sharedStateRegistry = sharedStateRegistryFactory.create(executor);
+
+			// Recover the checkpoints, TODO this could be done only when there is a new leader, not on each recovery


If we use highAvailabilityServices.getJobManagerLeaderRetriever(), Job Id is required.
Can Job Id be obtained from JobVertexID ?

This was referenced Jul 27, 2017

[BACKPORT-1.3] [FLINK-7268] [checkpoints] Scope SharedStateRegistry objects per (re)start #4411

Closed

[BACKPORT-1.3][FLINK-7268] [checkpoints] Scope SharedStateRegistry objects #4413

Closed

StefanRRichter force-pushed the ImprovedIncreemntalCPDebug branch 7 times, most recently from fdc84c0 to 10adb9d Compare August 15, 2017 12:54

StefanRRichter and others added 3 commits August 15, 2017 14:56

[FLINK-7213] Introduce state management by OperatorID in TaskManager

b71154a

[FLINK-7268] [checkpoints] Scope SharedStateRegistry objects per (re)…

91a4b27

…start

[FLINK-7268] Add delaying executor in *EventTimeWindowCheckpointingIT…

d29bed3

…Case This helps tease out races, for example the recently discovered one in cleanup of incremental state handles at the SharedStateRegistry. (cherry picked from commit d7683cc)

StefanRRichter force-pushed the ImprovedIncreemntalCPDebug branch from 10adb9d to d29bed3 Compare August 15, 2017 12:57

asfgit merged commit d29bed3 into apache:master Aug 15, 2017

tedyu reviewed Aug 23, 2017

View reviewed changes

rmetzger added the component=Runtime/StateBackends label Mar 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FLINK-7268] [checkpoints] Scope SharedStateRegistry objects per (re)start#4410

[FLINK-7268] [checkpoints] Scope SharedStateRegistry objects per (re)start#4410
asfgit merged 3 commits intoapache:masterfrom
StefanRRichter:ImprovedIncreemntalCPDebug

StefanRRichter commented Jul 27, 2017 •

edited

Loading

Uh oh!

StefanRRichter commented Jul 27, 2017

Uh oh!

aljoscha commented Jul 28, 2017

Uh oh!

tedyu Aug 23, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

StefanRRichter commented Jul 27, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What is the purpose of the change

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

Uh oh!

StefanRRichter commented Jul 27, 2017

Uh oh!

aljoscha commented Jul 28, 2017

Uh oh!

tedyu Aug 23, 2017

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

StefanRRichter commented Jul 27, 2017 •

edited

Loading