[FLINK-8777][state]Improve resource release for local recovery #5578

sihuazhou · 2018-02-26T10:33:49Z

What is the purpose of the change

This PR fixes FLINK-8777. When recovering from a failure, TaskLocalStateStoreImpl.retrieveLocalState() is invoked. At that point we can release every entry in storedTaskStateByCheckpointID that does not satisfy entry.checkpointID == checkpointID. This prevents a resource leak when the job loops through local checkpoint completed => failed => local checkpoint completed => failed => ...
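A minimal sketch of the idea, assuming a simplified store: `SnapshotStub` is a hypothetical stand-in for TaskStateSnapshot that only records whether `discardState()` was called, and `LocalStoreSketch` is a hypothetical class, not the real TaskLocalStateStoreImpl. On retrieval it keeps only the requested checkpoint and discards everything else. For simplicity it discards synchronously while holding the lock.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for TaskStateSnapshot; it only records whether it was discarded. */
class SnapshotStub {
    boolean discarded;

    void discardState() {
        discarded = true;
    }
}

/** Hypothetical, simplified local state store that releases unneeded snapshots on retrieval. */
class LocalStoreSketch {
    private final Object lock = new Object();
    // Assumed to be ordered by checkpoint ID, like a TreeMap.
    private final SortedMap<Long, SnapshotStub> storedTaskStateByCheckpointID = new TreeMap<>();

    void storeLocalState(long checkpointID, SnapshotStub snapshot) {
        synchronized (lock) {
            storedTaskStateByCheckpointID.put(checkpointID, snapshot);
        }
    }

    SnapshotStub retrieveLocalState(long checkpointID) {
        synchronized (lock) {
            SnapshotStub snapshot = storedTaskStateByCheckpointID.get(checkpointID);
            // Recovery restores exactly this checkpoint, so every other entry is useless.
            Iterator<Map.Entry<Long, SnapshotStub>> it =
                storedTaskStateByCheckpointID.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, SnapshotStub> entry = it.next();
                long entryCheckpointId = entry.getKey();
                if (entryCheckpointId != checkpointID) {
                    entry.getValue().discardState(); // released synchronously for simplicity
                    it.remove();
                }
            }
            return snapshot;
        }
    }

    int size() {
        synchronized (lock) {
            return storedTaskStateByCheckpointID.size();
        }
    }
}
```

With this, every recovery shrinks the map to at most one entry, so the completed => failed loop no longer accumulates snapshots.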

Brief change log

Release resources in retrieveLocalState.
Change the type of toDiscard from a Map to a single entry.

Verifying this change

This change is covered by the existing tests.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (no)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

Documentation

Does this pull request introduce a new feature? (no)

sihuazhou · 2018-02-26T10:35:28Z

@StefanRRichter Could you please have a look at this?

StefanRRichter

Thanks for re-introducing the cleanup, I had some comments in-line.

StefanRRichter · 2018-02-27T14:40:47Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

@@ -90,6 +92,10 @@
 	@GuardedBy("lock")
 	private boolean disposed;

+	/** Whether to discard the useless state when retrieve local checkpoint state. */
+	@Nonnull


This annotation does not fit here.

StefanRRichter · 2018-02-27T14:49:53Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

@@ -90,6 +92,10 @@
 	@GuardedBy("lock")
 	private boolean disposed;

+	/** Whether to discard the useless state when retrieve local checkpoint state. */
+	@Nonnull
+	private boolean retrieveWithDiscard;


Why not make this a general default?

StefanRRichter · 2018-02-27T14:52:37Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

 	}

 	@Override
 	@Nullable
 	public TaskStateSnapshot retrieveLocalState(long checkpointID) {
 		synchronized (lock) {
 			TaskStateSnapshot snapshot = storedTaskStateByCheckpointID.get(checkpointID);
+
+			Iterator<Map.Entry<Long, TaskStateSnapshot>> entryIterator =


I would move all the cleanup logic into a separate method that is just invoked here, to separate the concerns.

StefanRRichter · 2018-02-27T14:56:03Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

@@ -285,4 +316,9 @@ public String toString() {
 			", localRecoveryConfig=" + localRecoveryConfig +
 			'}';
 	}
+
+	@VisibleForTesting
+	void setRetrieveWithDiscard(@Nonnull boolean retrieveWithDiscard) {


remove @Nonnull

StefanRRichter · 2018-02-27T15:10:06Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

+
+			if (retrieveWithDiscard) {
+				// Only the TaskStateSnapshot.checkpointID == checkpointID is useful, we remove the others
+				final List<Map.Entry<Long, TaskStateSnapshot>> toRemove = new ArrayList<>();


I think we can de-duplicate code by extracting a method from the code in confirmCheckpoint(...), maybe called pruneCheckpoints(...). We can do the comparison for both use cases as entryCheckpointId != checkpointID and have a boolean parameter that determines whether we break the iteration in the else case.

👍 addressed.
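The suggested de-duplication can be sketched as follows, assuming the map iterates in ascending checkpoint-ID order. All class names are hypothetical; only `pruneCheckpoints(long checkpointID, boolean breakTheIteration)` mirrors the signature shown later in the review. The confirm path breaks once it reaches the confirmed checkpoint; the retrieval path never breaks.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for TaskStateSnapshot. */
class SnapshotStub {
    boolean discarded;

    void discardState() {
        discarded = true;
    }
}

/** Hypothetical store sharing one pruning helper between confirm and retrieve. */
class PruningStoreSketch {
    final Object lock = new Object();
    // Sorted by checkpoint ID, so iteration visits older checkpoints first.
    final SortedMap<Long, SnapshotStub> storedTaskStateByCheckpointID = new TreeMap<>();

    void confirmCheckpoint(long confirmedCheckpointId) {
        synchronized (lock) {
            // Prune older entries and stop at the confirmed checkpoint.
            pruneCheckpoints(confirmedCheckpointId, true);
        }
    }

    SnapshotStub retrieveLocalState(long checkpointID) {
        synchronized (lock) {
            SnapshotStub snapshot = storedTaskStateByCheckpointID.get(checkpointID);
            // Prune every entry except the requested one.
            pruneCheckpoints(checkpointID, false);
            return snapshot;
        }
    }

    /** Must be called while holding {@link #lock}. */
    private void pruneCheckpoints(long checkpointID, boolean breakTheIteration) {
        Iterator<Map.Entry<Long, SnapshotStub>> it =
            storedTaskStateByCheckpointID.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, SnapshotStub> entry = it.next();
            long entryCheckpointId = entry.getKey();
            if (entryCheckpointId != checkpointID) {
                entry.getValue().discardState();
                it.remove();
            } else if (breakTheIteration) {
                break;
            }
        }
    }
}
```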

StefanRRichter

Overall looks good to me, had a few suggestions.

StefanRRichter · 2018-02-27T16:10:20Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

+	 * Pruning the useless checkpoints.
+	 */
+	private void pruneCheckpoints(long checkpointID, boolean breakTheIteration) {
+


I suggest adding an assert that the thread holds the lock, and documenting that this method should only be called while holding the lock.
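One way to express that precondition is `Thread.holdsLock(Object)`, which returns true only if the current thread holds the monitor of the given object. The class below is a hypothetical illustration: the pruning logic is elided and the helper only counts calls.

```java
/** Hypothetical class illustrating a "must hold the lock" precondition on a private helper. */
class LockCheckedStore {
    private final Object lock = new Object();
    private int pruneCalls;

    void confirmCheckpoint(long checkpointId) {
        synchronized (lock) {
            pruneCheckpoints(checkpointId);
        }
    }

    /**
     * Pruning helper (logic elided here; it only counts calls).
     * Must only be called while holding {@link #lock}.
     */
    private void pruneCheckpoints(long checkpointId) {
        // Checked only when assertions are enabled (java -ea); skipped otherwise.
        assert Thread.holdsLock(lock) : "pruneCheckpoints must be called while holding the lock";
        pruneCalls++;
    }

    int pruneCalls() {
        synchronized (lock) {
            return pruneCalls;
        }
    }

    boolean currentThreadHoldsLock() {
        return Thread.holdsLock(lock);
    }
}
```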

StefanRRichter · 2018-02-27T16:10:36Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

@@ -159,6 +166,11 @@ public TaskStateSnapshot retrieveLocalState(long checkpointID) {
 		TaskStateSnapshot snapshot;
 		synchronized (lock) {
 			snapshot = storedTaskStateByCheckpointID.get(checkpointID);
+
+			if (retrieveWithDiscard) {
+				// Only the TaskStateSnapshot.checkpointID == checkpointID is useful, we remove the others


Comment is no longer required.

StefanRRichter · 2018-02-27T16:11:35Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

@@ -90,6 +92,9 @@
 	@GuardedBy("lock")
 	private boolean disposed;

+	/** Whether to discard the useless state when retrieve local checkpoint state. */
+	private boolean retrieveWithDiscard = true;


Why do we need this? Is there any case for not doing the cleanup?

Aha, this is just to make the existing test case in TaskLocalStateStoreImplTest pass ...

private void checkStoredAsExpected(List<TaskStateSnapshot> history, int off, int len) throws Exception {
	for (int i = off; i < len; ++i) {
		TaskStateSnapshot expected = history.get(i);
		Assert.assertTrue(expected == taskLocalStateStore.retrieveLocalState(i));
		Mockito.verify(expected, Mockito.never()).discardState();
	}
}

Then there are two better options in my opinion, because the flag is pure boilerplate:

Change the test to check what we are doing now, because that is what happens in the real use-case.

Maybe even better: split the method retrieveLocalState further: one method for pruning, one package-private method that does all the pure retrieval, logging, and null transformation. In the old retrieveLocalState, do the cleanup first, then the pure retrieval/logging. Call the package private method in the test to avoid the cleanup.

Maybe the test should then also just do both?

Agreed and I prefer the second option.
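The second option could look roughly like this. `SplitStoreSketch`, `retrieveLocalStateWithoutPruning` and `pruneAllExcept` are hypothetical names chosen for illustration. The public method cleans up first and then delegates to the package-private, side-effect-free lookup, which a test can call directly. Java monitors are reentrant, so the nested `synchronized` is safe.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for TaskStateSnapshot. */
class SnapshotStub {
    boolean discarded;

    void discardState() {
        discarded = true;
    }
}

/** Hypothetical store with pruning and pure retrieval split into separate methods. */
class SplitStoreSketch {
    private final Object lock = new Object();
    private final SortedMap<Long, SnapshotStub> stored = new TreeMap<>();

    void storeLocalState(long checkpointID, SnapshotStub snapshot) {
        synchronized (lock) {
            stored.put(checkpointID, snapshot);
        }
    }

    /** Entry point used on recovery: clean up first, then do the pure retrieval. */
    SnapshotStub retrieveLocalState(long checkpointID) {
        synchronized (lock) {
            pruneAllExcept(checkpointID);
            return retrieveLocalStateWithoutPruning(checkpointID);
        }
    }

    /** Package-private lookup without side effects; tests can call it to skip the cleanup. */
    SnapshotStub retrieveLocalStateWithoutPruning(long checkpointID) {
        synchronized (lock) {
            // Logging and null handling would live here.
            return stored.get(checkpointID);
        }
    }

    /** Must be called while holding {@link #lock}. */
    private void pruneAllExcept(long checkpointID) {
        Iterator<Map.Entry<Long, SnapshotStub>> it = stored.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, SnapshotStub> entry = it.next();
            long entryCheckpointId = entry.getKey();
            if (entryCheckpointId != checkpointID) {
                entry.getValue().discardState();
                it.remove();
            }
        }
    }
}
```

A test can then "do both": first check every snapshot through the pure lookup, then call `retrieveLocalState` and verify that the other entries were discarded.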

StefanRRichter · 2018-02-27T16:30:19Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

+			Map.Entry<Long, TaskStateSnapshot> snapshotEntry = entryIterator.next();
+			long entryCheckpointId = snapshotEntry.getKey();
+
+			if (entryCheckpointId != checkpointID) {


On second thought, while I think this code is currently correct, the breaking case looks a bit dangerous. If the checkpoint ID is not present, the iteration would never stop and would also prune ongoing checkpoints. I wonder if we should make the if a bit more complex but safer (checking that the breaking case never goes beyond the checkpoint ID). What do you think?

I agree with you that the breaking case looks a bit dangerous ... Maybe we could pass a Predicate for the if condition and let the caller pass it into this function. This would make the caller side cleaner, and we wouldn't need to cram all the logic into the if and make it complex.

That is fine, from my point of view that is just one way of making the if more complex.
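The risk discussed above can be shown with two hypothetical variants, assuming ascending key order. `pruneUntilEqual` has the original "prune while !=, break on ==" shape; `pruneOlderThan` only ever removes entries strictly below the given ID. If the confirmed ID is missing from the map, the first variant removes everything, including newer (ongoing) checkpoints.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for TaskStateSnapshot. */
class SnapshotStub {
    boolean discarded;

    void discardState() {
        discarded = true;
    }
}

/** Hypothetical comparison of the two break conditions. */
class BreakConditionSketch {
    final SortedMap<Long, SnapshotStub> stored = new TreeMap<>();

    /** Original shape: prune while the ID differs, break only on an exact match. */
    void pruneUntilEqual(long checkpointID) {
        Iterator<Map.Entry<Long, SnapshotStub>> it = stored.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, SnapshotStub> entry = it.next();
            long entryCheckpointId = entry.getKey();
            if (entryCheckpointId != checkpointID) {
                entry.getValue().discardState();
                it.remove();
            } else {
                break; // never reached if checkpointID is absent
            }
        }
    }

    /** Safer shape: only strictly older entries are pruned, present or not. */
    void pruneOlderThan(long checkpointID) {
        Iterator<Map.Entry<Long, SnapshotStub>> it = stored.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, SnapshotStub> entry = it.next();
            long entryCheckpointId = entry.getKey();
            if (entryCheckpointId < checkpointID) {
                entry.getValue().discardState();
                it.remove();
            } else {
                // Keys are ascending, so no later entry can be older.
                break;
            }
        }
    }
}
```

For a `SortedMap`, `stored.headMap(checkpointID)` is another option: it is a live view of the strictly smaller keys, so clearing it removes those entries from the backing map.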

sihuazhou · 2018-02-28T05:28:56Z

@StefanRRichter I have addressed your suggestions, except the one about making the if a bit more complex; instead, I introduced a Predicate for pruneCheckpoints(). I'm not sure whether that is OK with you. If you are still against it, I'll change the code to make the if a bit more complex.
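A sketch of the Predicate-based helper, following the signature `pruneCheckpoints(Predicate<Long> pruningChecker, boolean breakOnceCheckerFalse)` quoted in the next review comment. The surrounding class and the two caller-side predicates are hypothetical; they show how each caller states its own condition instead of encoding both in one if.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical stand-in for TaskStateSnapshot. */
class SnapshotStub {
    boolean discarded;

    void discardState() {
        discarded = true;
    }
}

/** Hypothetical store where callers pass the pruning condition. */
class PredicatePruningSketch {
    final SortedMap<Long, SnapshotStub> stored = new TreeMap<>();

    /**
     * Discards entries whose ID satisfies pruningChecker. If breakOnceCheckerFalse is set,
     * iteration stops at the first non-matching entry (relies on ascending ID order).
     */
    void pruneCheckpoints(Predicate<Long> pruningChecker, boolean breakOnceCheckerFalse) {
        Iterator<Map.Entry<Long, SnapshotStub>> it = stored.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, SnapshotStub> entry = it.next();
            // The boxed key is passed straight through: no unbox/rebox round trip.
            if (pruningChecker.test(entry.getKey())) {
                entry.getValue().discardState();
                it.remove();
            } else if (breakOnceCheckerFalse) {
                break;
            }
        }
    }

    void confirmCheckpoint(long confirmedId) {
        // Only strictly older checkpoints match; stop at the first newer-or-equal one.
        pruneCheckpoints(id -> id < confirmedId, true);
    }

    SnapshotStub retrieveLocalState(long checkpointID) {
        SnapshotStub snapshot = stored.get(checkpointID);
        // Everything except the restored checkpoint matches.
        pruneCheckpoints(id -> id != checkpointID, false);
        return snapshot;
    }
}
```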

StefanRRichter · 2018-02-28T09:00:15Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java

+	/**
+	 * Pruning the useless checkpoints, it should be called only when holding the {@link #lock}.
+	 */
+	private void pruneCheckpoints(Predicate<Long> pruningChecker, boolean breakOnceCheckerFalse) {


We could either use LongPredicate instead of Predicate<Long> or not convert snapshotEntry.getKey() to a primitive long.

👍 I prefer to use LongPredicate, addressing ...
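With `java.util.function.LongPredicate`, `test(long)` takes a primitive, so the key can be unboxed once into a local `long` and passed without reboxing. The static helper below is a hypothetical, stripped-down illustration (it removes entries but does not discard any state); `negate()` on LongPredicate lets a caller turn "is the restore ID" into "prune everything else".

```java
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.LongPredicate;

/** Hypothetical helper showing pruning with a primitive LongPredicate. */
final class LongPredicatePruning {
    private LongPredicatePruning() {
    }

    /** Removes entries whose key matches and returns how many were removed. */
    static <V> int pruneMatching(SortedMap<Long, V> stored, LongPredicate matcher) {
        int removed = 0;
        Iterator<Long> it = stored.keySet().iterator();
        while (it.hasNext()) {
            // Unbox once; LongPredicate.test(long) takes the primitive directly.
            long checkpointId = it.next();
            if (matcher.test(checkpointId)) {
                it.remove(); // removes the mapping from the backing map
                removed++;
            }
        }
        return removed;
    }
}
```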

StefanRRichter · 2018-02-28T09:12:39Z

Thanks, I will merge this once the final points are addressed :-)

StefanRRichter · 2018-02-28T11:04:26Z

Ok, I took another look at the complete picture, and the test gave me the feeling that retrieval and pruning should be two separate concerns: not only should we have two internal methods, but maybe we should also expose them as different methods. To keep this short, I made a proposal in this branch (it is the last commit):

https://github.com/StefanRRichter/flink/tree/improve_resource_release_for_local_recovery

If you like the change, I would squash it and commit under your name, because you did all of the important parts. What do you think?
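To illustrate "retrieval and pruning as separately exposed methods", here is a sketch with hypothetical names (`LocalStateStoreSketch`, `pruneMatchingCheckpoints`, `RecoverySketch`); it is not necessarily what the linked branch contains. Retrieval becomes a pure lookup, and the recovery caller composes the two steps.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.LongPredicate;

/** Hypothetical stand-in for TaskStateSnapshot. */
class SnapshotStub {
    boolean discarded;

    void discardState() {
        discarded = true;
    }
}

/** Hypothetical store interface with retrieval and pruning as separate operations. */
interface LocalStateStoreSketch {
    SnapshotStub retrieveLocalState(long checkpointID); // pure lookup

    void pruneMatchingCheckpoints(LongPredicate matcher); // discards every matching entry
}

class InMemoryLocalStateStore implements LocalStateStoreSketch {
    private final Object lock = new Object();
    private final SortedMap<Long, SnapshotStub> stored = new TreeMap<>();

    void store(long checkpointID, SnapshotStub snapshot) {
        synchronized (lock) {
            stored.put(checkpointID, snapshot);
        }
    }

    @Override
    public SnapshotStub retrieveLocalState(long checkpointID) {
        synchronized (lock) {
            return stored.get(checkpointID);
        }
    }

    @Override
    public void pruneMatchingCheckpoints(LongPredicate matcher) {
        synchronized (lock) {
            Iterator<Map.Entry<Long, SnapshotStub>> it = stored.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Long, SnapshotStub> entry = it.next();
                long checkpointId = entry.getKey();
                if (matcher.test(checkpointId)) {
                    entry.getValue().discardState();
                    it.remove();
                }
            }
        }
    }
}

/** Hypothetical recovery-path caller that composes both steps. */
final class RecoverySketch {
    private RecoverySketch() {
    }

    static SnapshotStub restore(LocalStateStoreSketch store, long restoreCheckpointId) {
        SnapshotStub snapshot = store.retrieveLocalState(restoreCheckpointId);
        store.pruneMatchingCheckpoints(id -> id != restoreCheckpointId);
        return snapshot;
    }
}
```

Keeping the lookup free of side effects means tests such as checkStoredAsExpected can keep calling it repeatedly, while the recovery path still releases everything it does not need.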

sihuazhou · 2018-02-28T11:46:07Z

@StefanRRichter Thanks, I like it! It looks very good.

sihuazhou changed the title ~~Improve resource release for local recovery~~ [FLINK-8777][state]Improve resource release for local recovery Feb 26, 2018

StefanRRichter reviewed Feb 27, 2018


sihuazhou added 4 commits February 27, 2018 23:48

change the type of toDiscard from Map to a single entry.

a2514be

Fix build.

717bcd1

Stefan comments.

e79b994

Rebase on master.

c4c6987

sihuazhou force-pushed the improve_resource_release_for_local_recovery branch from 1207165 to c4c6987 Compare February 27, 2018 15:55

StefanRRichter approved these changes Feb 27, 2018


Addressing stefan suggestions.

77e8646

StefanRRichter reviewed Feb 28, 2018


Stefan comments.

ad0d72c

Stefan comments.

8b30ceb

asfgit closed this in 296f9ff Feb 28, 2018

rmetzger added the component=Runtime/StateBackends label Mar 18, 2019
