Use a fresh recovery id when retrying recoveries #22325
Conversation
Thanks Yannick. I left some initial feedback.
this(copyFrom.indexShard, copyFrom.sourceNode, copyFrom.listener, copyFrom.cancellableThreads, copyFrom.recoveryId,
    copyFrom.ensureClusterStateVersionCallback);
}
private final Map<String, String> tempFileNames = ConcurrentCollections.newConcurrentMap();

public RecoveryTarget(IndexShard indexShard, DiscoveryNode sourceNode, PeerRecoveryTargetService.RecoveryListener listener,
I think we can remove the private constructor?
throw new TimeoutException("timed out while waiting on previous recovery attempt to release resources");
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
This one always gets me: we already deal with the interrupt here, i.e., we throw an exception, so why restore the thread's interrupt flag?
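A minimal sketch (illustrative names, not the Elasticsearch code) of why the flag is restored even though we throw: catching `InterruptedException` clears the thread's interrupt status, so re-setting it preserves the interruption for callers further up the stack that only check `Thread.currentThread().isInterrupted()` rather than catching our wrapped exception.

```java
import java.util.concurrent.CountDownLatch;

public class InterruptExample {
    static void awaitOrFail(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interruption for callers
            throw new RuntimeException("interrupted while waiting", e);
        }
    }

    // returns true if the interrupt status is still visible after the catch block
    static boolean interruptSurvives() throws InterruptedException {
        final boolean[] survived = new boolean[1];
        Thread t = new Thread(() -> {
            try {
                awaitOrFail(new CountDownLatch(1)); // never counted down; waits until interrupted
            } catch (RuntimeException e) {
                survived[0] = Thread.currentThread().isInterrupted();
            }
        });
        t.start();
        t.interrupt();
        t.join();
        return survived[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupt flag restored: " + interruptSurvives());
    }
}
```

Without the `interrupt()` call in the catch block, `survived[0]` would be false: the interruption would be silently swallowed by the exception translation.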
@@ -310,6 +337,7 @@ protected void closeInternal() {
        store.decRef();
        indexShard.recoveryStats().decCurrentAsTarget();
    }
    closedLatch.countDown();
hmm.. I think we want this in the finally clause?
I was undecided about that one. I wondered if we should only consider successful closing as proper closing for the latch. Looking at it again: as we catch all the exceptions anyhow (IndexOutput.close() is wrapped by catch (Exception) and the store is deleted with deleteQuiet), there might not be a difference between the two. I'll move it to the finally clause.
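A hypothetical sketch of the pattern being agreed on here (class and method names are illustrative): counting the latch down in a finally block guarantees that a waiter in resetRecovery() is released even if freeing resources throws.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CloseLatchExample {
    private final CountDownLatch closedLatch = new CountDownLatch(1);

    void closeInternal() {
        try {
            releaseResources(); // may fail
        } finally {
            closedLatch.countDown(); // always signal waiters, even on failure
        }
    }

    void releaseResources() {
        throw new RuntimeException("simulated failure while freeing resources");
    }

    boolean waitForClose() throws InterruptedException {
        return closedLatch.await(1, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CloseLatchExample target = new CloseLatchExample();
        try {
            target.closeInternal();
        } catch (RuntimeException expected) {
            // cleanup failed, but the latch was still released
        }
        if (!target.waitForClose()) throw new AssertionError("latch not released");
        System.out.println("latch released despite cleanup failure");
    }
}
```

If countDown() stayed outside the finally block, a cleanup failure would leave the waiter blocked until its timeout.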
logger.trace(
    (Supplier<?>) () -> new ParameterizedMessage(
        "will retry recovery with id [{}] in [{}]", recoveryTarget.recoveryId(), retryAfter), reason);
retryRecovery(recoveryTarget, retryAfter, currentRequest);
    "will retry recovery with id [{}] in [{}]", recoveryRef.status().recoveryId(), retryAfter), reason);
Should we change the status() field to target()?
}

recoveryRef.close(); // close the reference held by RecoveryRunner
We discussed an alternative that avoids having to close this reference here: make doRecovery take a recoveryId instead.
    "shard id mismatch, expected " + shardId + " but was " + oldRecoveryTarget.shardId();

newRecoveryTarget = oldRecoveryTarget.retryCopy();
RecoveryTarget existingTarget = onGoingRecoveries.putIfAbsent(newRecoveryTarget.recoveryId(), newRecoveryTarget);
we need to have a new monitor here..
}

recoveryRef.close(); // close the reference held by RecoveryRunner
oldRecoveryTarget.resetRecovery(retryCleanupTimeout); // Closes the current recovery target
We talked about an option to run this under the cancellable threads of the new target, so that we rely on its monitor to cancel this if it takes too long. That would save us the retryCleanupTimeout setting.
Thx @ywelsch. I left some very minor comments; I think we're almost there. Btw, I think we mentioned something about having a test, but I don't see it? Should we also link to the issue that started all this?
indexShard.performRecoveryRestart();
try {
    indexShard.performRecoveryRestart();
} catch (IOException ioe) {
I believe you can run this via CancellableThreads.executeIO, which means we won't need this exception juggling.
@@ -188,20 +185,13 @@ public void renameAllTempFiles() throws IOException {
/**
 * Closes the current recovery target and waits up to a certain timeout for resources to be freed
 */
void resetRecovery(TimeValue timeout) throws IOException, TimeoutException {
void resetRecovery() throws InterruptedException {
    ensureRefCount();
    if (finished.compareAndSet(false, true)) {
I think we should return a boolean to indicate whether this actually worked. I'm afraid that someone called the fail method concurrently, in which case the listener is notified and we can't reset the recovery any more. This means that if the caller of this method gets a false back, it should cancel the new recovery.
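A hypothetical sketch of the race being described (not the actual RecoveryTarget code): both reset and fail race on the same compareAndSet of the `finished` flag, so only one of them can win. Returning the compareAndSet result tells the caller whether a concurrent fail() got there first.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ResetRaceExample {
    private final AtomicBoolean finished = new AtomicBoolean(false);

    // returns false when a concurrent fail() already completed this target
    boolean resetRecovery() {
        return finished.compareAndSet(false, true);
    }

    void fail() {
        if (finished.compareAndSet(false, true)) {
            // notify the listener exactly once; the target can no longer be reset
        }
    }

    public static void main(String[] args) {
        ResetRaceExample target = new ResetRaceExample();
        target.fail();                            // concurrent failure wins the race
        boolean reset = target.resetRecovery();
        System.out.println("reset succeeded: " + reset); // false: caller must cancel the retry
    }
}
```

The compareAndSet guarantees exactly one state transition, so the listener is never notified twice and the caller can react to a lost race by cancelling the new recovery.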
onGoingRecoveries.failRecovery(recoveryTarget.recoveryId(),
    new RecoveryFailedException(recoveryTarget.state(), "failed to list local files", e), true);
private void doRecovery(final long recoveryId) {
    RecoveryRef recoveryRef = onGoingRecoveries.getRecovery(recoveryId);
We should release the reference. Can we do all the prep work under a try-with-resources, up to and including the creation of the start recovery request? I believe we only need the recovery id and the cancellable threads after that, so we can capture them within the block. It will be easier to follow.
The RecoveryTarget is needed for one more thing: recoveryTarget.indexShard().prepareForIndexRecovery(), which moves the recovery stage from INIT to INDEX. I've moved this call further up and made it fail the recovery/shard if this step fails. Previously it would retry in this situation and reset the recovery stage back to INIT, but now that double recoveries are forbidden, I think we should fail early on this.
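A minimal sketch (illustrative names, not the real RecoveryRef class) of the try-with-resources shape suggested above: the reference is released automatically once the prep work is done, and only plain values such as the recovery id escape the block.

```java
public class RecoveryRefExample {
    // stand-in for a ref-counted handle on an ongoing recovery
    static class RecoveryRef implements AutoCloseable {
        final long recoveryId;
        boolean closed;
        RecoveryRef(long recoveryId) { this.recoveryId = recoveryId; }
        @Override public void close() { closed = true; } // release the reference
    }

    public static void main(String[] args) {
        RecoveryRef ref = new RecoveryRef(42L);
        final long recoveryId;
        try (RecoveryRef r = ref) {
            // ... prepare the shard, list local files, build the start-recovery request ...
            recoveryId = r.recoveryId; // capture what is still needed after release
        }
        // the reference is released here; later steps use only the captured id
        if (!ref.closed) throw new AssertionError("reference not released");
        System.out.println("released=" + ref.closed + ", id=" + recoveryId);
    }
}
```

This keeps the lifetime of the reference visually scoped to the prep work, instead of relying on a manual close() call on every exit path.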
// release the initial reference. recovery files will be cleaned as soon as ref count goes to zero, potentially now
decRef();
closedLatch.await();
RecoveryState.Stage stage = indexShard.recoveryState().getStage();
if (indexShard.recoveryState().getPrimary() && (stage == RecoveryState.Stage.FINALIZE || stage == RecoveryState.Stage.DONE)) {
Can we add something to the PR description about this? It's subtle but important imo.
sure
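A hypothetical sketch of the guard discussed in this thread (enum and method names are illustrative): once a primary relocation has moved past finalization, the relocation source may already be indexing into this shard as primary, so a retry must fail the recovery rather than reset the target.

```java
public class StageGuardExample {
    // mirrors the shape of RecoveryState.Stage for illustration
    enum Stage { INIT, INDEX, TRANSLOG, FINALIZE, DONE }

    // resetting is forbidden for a primary that reached FINALIZE or DONE,
    // because the source may have moved to RELOCATED and started indexing here
    static boolean canResetRecovery(boolean primary, Stage stage) {
        return !(primary && (stage == Stage.FINALIZE || stage == Stage.DONE));
    }

    public static void main(String[] args) {
        if (canResetRecovery(true, Stage.FINALIZE)) throw new AssertionError();
        if (canResetRecovery(true, Stage.DONE)) throw new AssertionError();
        if (!canResetRecovery(true, Stage.INDEX)) throw new AssertionError();
        if (!canResetRecovery(false, Stage.DONE)) throw new AssertionError();
        System.out.println("stage guard behaves as expected");
    }
}
```

Resetting in that late stage could halt acknowledged indexing and destroy documents already written to the target, which is why the guard fails the retry instead.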
LGTM. Thx @ywelsch
Thanks @bleskes
Recoveries are tracked on the target node using RecoveryTarget objects that are kept in a RecoveriesCollection. Each recovery has a unique id that is communicated from the recovery target to the source so that it can call back to the target and execute actions using the right recovery context. In case of a network disconnect, recoveries are retried. At the moment, the same recovery id is reused for the restarted recovery. This can lead to confusion though if the disconnect is unilateral and the recovery source continues with the recovery process. If the target reuses the same recovery id while doing a second attempt, there might be two concurrent recoveries running on the source for the same target. This commit changes the recovery retry process to use a fresh recovery id. It also waits for the first recovery attempt to be fully finished (all resources locally freed) to further prevent concurrent access to the shard. Finally, in case of primary relocation, it also fails a second recovery attempt if the first attempt moved past the finalization step, as the relocation source can then be moved to RELOCATED state and start indexing as primary into the target shard (see TransportReplicationAction). Resetting the target shard in this state could mean that indexing is halted until the recovery retry attempt is completed and could also destroy existing documents indexed and acknowledged before the reset. Relates to #22043
Resetting a recovery consists of resetting the old recovery target and replacing it by a new recovery target object. This is done on the Cancellable threads of the new recovery target. If the new recovery target is already cancelled before or while this happens, for example due to shard closing or recovery source changing, we have to make sure that the old recovery target object frees all shard resources. Relates to #22325
I've pushed 816e1c6 which fixes an issue uncovered by a CI run:
#22325 changed the recovery retry logic to use unique recovery ids. The change also introduced an issue, however, which made it possible for the shard store to be closed under CancellableThreads, triggering assertions in the node locking logic. This commit limits the use of CancellableThreads only to the part where we wait on the old recovery target to be closed.
Recoveries are tracked on the target node using RecoveryTarget objects that are kept in a RecoveriesCollection. Each recovery has a unique id that is communicated from the recovery target to the source so that it can call back to the target and execute actions using the right recovery context. In case of a network disconnect, recoveries are retried. At the moment, the same recovery id is reused for the restarted recovery. This can lead to confusion though if the disconnect is unilateral and the recovery source continues with the recovery process. If the target reuses the same recovery id while doing a second attempt, there might be two concurrent recoveries running on the source for the same target.
This PR changes the recovery retry process to use a fresh recovery id. It also waits for the first recovery attempt to be fully finished (all resources locally freed) to further prevent concurrent access to the shard. Finally, in case of primary relocation, it also fails a second recovery attempt if the first attempt moved past the finalization step, as the relocation source can then be moved to RELOCATED state and start indexing as primary into the target shard (see TransportReplicationAction). Resetting the target shard in this state could mean that indexing is halted until the recovery retry attempt is completed and could also destroy existing documents indexed and acknowledged before the reset.
Relates to #22043
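The core idea of the PR can be sketched in a few lines (class and method names here are illustrative, not the actual RecoveriesCollection API): every retry registers a brand-new id in the collection, so a stale callback from the recovery source using the old id can no longer find a live target.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class FreshIdExample {
    static final AtomicLong idGenerator = new AtomicLong();
    static final ConcurrentMap<Long, String> onGoingRecoveries = new ConcurrentHashMap<>();

    static long startRecovery(String target) {
        long id = idGenerator.incrementAndGet(); // fresh id for every attempt
        onGoingRecoveries.put(id, target);
        return id;
    }

    static long retryRecovery(long oldId) {
        String target = onGoingRecoveries.remove(oldId); // old id is retired first
        return startRecovery(target);                    // retry runs under a new id
    }

    public static void main(String[] args) {
        long first = startRecovery("shard[0]");
        long second = retryRecovery(first);
        if (first == second) throw new AssertionError("retry must use a fresh id");
        if (onGoingRecoveries.containsKey(first)) throw new AssertionError("old id still live");
        System.out.println("old id " + first + " retired, new id " + second);
    }
}
```

Because the old id is removed before the retry starts, a source that kept running after a unilateral disconnect cannot address the new attempt, which is what prevents two concurrent recoveries for the same target.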