Ensures cleanup of temporary index-* generational blobs during snapshotting #21469

abeyad · 2016-11-10T15:36:37Z

This PR ensures pending index-* blobs are deleted when snapshotting. The
index-* blobs are generational files that maintain the snapshots
in the repository. To write these atomically, we first write a
pending-index-* blob, then move it to index-*, which also deletes
pending-index-* in case it's not a file-system level move (e.g.
S3 repositories). For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
pending-index-5 and then move pending-index-5 to index-5. It is
possible that we fail after writing pending-index-5, but before
moving it to index-5 or deleting pending-index-5. In this case,
we will have a dangling pending-index-5 blob lying around. Since
snapshot number 5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to index-5, which first tries to
write to pending-index-5 before moving the blob to index-5. Since
pending-index-5 is left over from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by, first, adding a UUID to the
pending-index-* blobs and, second, strengthening the logic around
failure to write the index-* generational blob to ensure pending
files are deleted on cleanup.

Closes #21462
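
The write-then-move sequence described above, with both parts of the fix applied, can be sketched roughly as follows. This is a minimal illustration on the local file system and not the actual `BlobStoreRepository` code: the class name `AtomicBlobWriteSketch`, the use of `java.nio.file` in place of a `BlobContainer`, and `java.util.UUID` in place of `UUIDs.randomBase64UUID()` are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical stand-in for the repository's write path; not the Elasticsearch implementation.
final class AtomicBlobWriteSketch {

    // Writes the bytes to a uniquely named pending blob, then moves it to its final name.
    static void writeAtomic(Path dir, String blobName, byte[] bytes) throws IOException {
        // The random component means a pending blob left behind by an earlier
        // failed attempt can never collide with the one written here.
        final Path temp = dir.resolve("pending-" + blobName + "-" + UUID.randomUUID());
        final Path target = dir.resolve(blobName);
        try {
            // CREATE_NEW refuses to overwrite, mirroring a blob store that rejects overwrites.
            Files.write(temp, bytes, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            // With no options, the move fails if the target already exists.
            Files.move(temp, target);
        } catch (IOException ex) {
            // Pending blob creation or the move failed: clean up, but report the original failure.
            try {
                Files.deleteIfExists(temp);
            } catch (IOException cleanupFailure) {
                ex.addSuppressed(cleanupFailure);
            }
            throw ex;
        }
    }
}
```

With this shape, a stale `pending-index-5` in the directory no longer blocks a later attempt to write `index-5`, and a failed attempt does not leave its own pending blob behind.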


ywelsch

Is it possible to add tests that show that the issue is fixed?

ywelsch · 2016-11-10T15:48:54Z

core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

@@ -867,16 +867,23 @@ private long listBlobsToGetLatestIndexId() throws IOException {
    }

    private void writeAtomic(final String blobName, final BytesReference bytesRef) throws IOException {
-        final String tempBlobName = "pending-" + blobName;
+        final String tempBlobName = "pending-" + UUIDs.randomBase64UUID() + "-" + blobName;


I wonder if it's nicer to append the random uuid.

ywelsch · 2016-11-10T15:50:50Z

core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

+                // exceptionToThrow will capture this and be thrown at the end
+            } catch (IOException e) {
+                if (exceptionToThrow == null) {
+                    exceptionToThrow = e;


exceptionToThrow can never be null?
Maybe it's easier to just do the ex.addSuppressed(...)?

ywelsch · 2016-11-10T15:57:37Z

core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

+            IOException exceptionToThrow = ex;
+            try {
+                snapshotsBlobContainer.deleteBlob(tempBlobName);
+            } catch (NoSuchFileException e) {


I don't think we should special-case this. Depending on underlying blobcontainer, a different exception might be thrown.

It's part of the contract of the BlobContainer to throw a NoSuchFileException if the blob doesn't exist when deleteBlob is called, and we changed all implementations to adhere to this contract. That said, you are right: I went through two iterations of this, and now the only way we get to this block is if an exception was thrown, in which case we don't care about the nature of the exception here. The same applies below.

abeyad · 2016-11-10T16:00:14Z

retest this please

abeyad · 2016-11-10T22:18:25Z

@ywelsch I pushed some commits that add a test and address your feedback

ywelsch

left minor comments (easy to fix). LGTM

ywelsch · 2016-11-10T22:48:10Z

core/src/main/java/org/elasticsearch/common/blobstore/BlobContainer.java

-     * target blob already exists, an exception is thrown.
+     * Renames the source blob into the target blob.  If the source blob does not exist or the
+     * target blob already exists, an exception is thrown.  Atomicity of the move operation is
+     * can only be guaranteed on an implementation-by-implementation basis.  The only current


ywelsch · 2016-11-10T22:49:47Z

core/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

            snapshotsBlobContainer.move(tempBlobName, blobName);
        } catch (IOException ex) {
-            // Move failed - try cleaning up
-            snapshotsBlobContainer.deleteBlob(tempBlobName);
+            try {


maybe readd comment here (temp file creation or move failed - clean up)

ywelsch · 2016-11-10T22:52:00Z

core/src/test/java/org/elasticsearch/snapshots/mockstore/MockRepository.java

                maybeIOExceptionOrBlock(targetBlob);
-                super.move(sourceBlob, targetBlob);
+                super.writeBlob(targetBlob, readBlob(sourceBlob), 0L);


super.readBlob instead of readBlob to prevent double maybeIOExceptionOrBlock.

ywelsch · 2016-11-10T22:53:13Z

core/src/test/java/org/elasticsearch/snapshots/SharedClusterSnapshotRestoreIT.java

+        logger.info("--> creating repository");
+        final Path repoPath = randomRepoPath();
+        assertAcked(client().admin().cluster().preparePutRepository("test-repo").setType("mock").setVerify(false).setSettings(
+                Settings.builder().put("location", repoPath).put("random_control_io_exception_rate", 0.2)));


maybe add some randomization to the random_control_io_exception_rate?

ywelsch · 2016-11-10T22:55:27Z

core/src/test/java/org/elasticsearch/snapshots/SharedClusterSnapshotRestoreIT.java

+                    assertThat(shardFailure.reason(), containsString("Random IOException"));
+                }
+            }
+        } catch (Exception ex) {


can we catch a more specific exception type? IOException with the Random IOException string?

The issue here is that either SnapshotCreationException or RepositoryException can be thrown, but what we can do here is ensure the stack trace has the Random IOException in the nested stack trace.

can be specified as catch (SnapshotCreationException | RepositoryException ex) in Java ;-)
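
A rough sketch of how the two suggestions in this thread could combine: a multi-catch for the two exception types, plus a walk over the cause chain looking for the injected failure. The nested exception classes and the helper names here are hypothetical stand-ins (the real test would use `SnapshotCreationException` and `RepositoryException`), so treat this as an illustration rather than the code that was committed.

```java
import java.io.IOException;

// Hypothetical illustration; exception types and helper names are stand-ins.
final class NestedCauseSketch {

    // Stand-in for SnapshotCreationException.
    static class SnapshotFailure extends RuntimeException {
        SnapshotFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for RepositoryException.
    static class RepositoryFailure extends RuntimeException {
        RepositoryFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Returns true if any throwable in the cause chain has the given type
    // and a message containing the given fragment.
    static boolean hasCause(Throwable top, Class<? extends Throwable> type, String fragment) {
        for (Throwable cur = top; cur != null; cur = cur.getCause()) {
            if (type.isInstance(cur) && cur.getMessage() != null && cur.getMessage().contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    // Runs the action and reports whether it failed with one of the two expected
    // exception types wrapping the injected IOException.
    static boolean failedWithRandomIOException(Runnable action) {
        try {
            action.run();
            return false;
        } catch (SnapshotFailure | RepositoryFailure ex) {
            return hasCause(ex, IOException.class, "Random IOException");
        }
    }
}
```

Any other exception type propagates out of `failedWithRandomIOException`, which is the point of narrowing the catch clause.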

abeyad · 2016-11-11T02:44:13Z

@ywelsch I pushed a commit to address your latest feedback. Thanks for reviewing!


abeyad · 2016-11-11T04:14:59Z

5.x commit: 6966fa5
5.0 commit: a00b588

ywelsch · 2016-11-11T08:48:29Z

core/src/test/java/org/elasticsearch/snapshots/mockstore/MockRepository.java

                maybeIOExceptionOrBlock(targetBlob);
-                super.move(sourceBlob, targetBlob);
+                super.writeBlob(targetBlob, super.readBlob(sourceBlob), 0L);
+                super.deleteBlob(sourceBlob);


sorry, forgot to add here that we could randomize between atomic and non-atomic move.

I pushed adb7aad to randomize between atomic and non-atomic
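
As a sketch of what choosing between the two move strategies might look like, here is a local file-system version. The class name `RandomizedMoveSketch` and the use of `java.nio.file` are assumptions for illustration; the actual change lives in `MockRepository` and works on blobs, not paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of atomic versus non-atomic moves; not the MockRepository code.
final class RandomizedMoveSketch {

    // Moves source to target either in one step or as copy-then-delete.
    static void move(Path source, Path target, boolean atomic) throws IOException {
        if (atomic) {
            // Single-step rename. Whether an existing target is replaced or the
            // call fails is implementation specific with ATOMIC_MOVE.
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
        } else {
            // Two steps, as on blob stores without a rename primitive. A failure
            // between the steps leaves both the source and the target behind.
            Files.copy(source, target);
            Files.delete(source);
        }
    }
}
```

A test would call it as `RandomizedMoveSketch.move(src, dst, random.nextBoolean())` so that both code paths get exercised across runs.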


abeyad added the :Distributed/Snapshot/Restore, review, v5.0.1, v5.1.1, and v6.0.0-alpha1 labels Nov 10, 2016

ywelsch suggested changes Nov 10, 2016

Ali Beyad added 3 commits November 10, 2016 11:07

improved exception handling

b110e77

update javadocs to reflect move is not always atomic

7e1b01e

add test

a0c06a2

ywelsch approved these changes Nov 10, 2016

address comments

448e08c

abeyad merged commit 3001b63 into elastic:master Nov 11, 2016

abeyad deleted the fix/snapshot_blob_overwrites branch November 11, 2016 02:45

ywelsch reviewed Nov 11, 2016

abeyad mentioned this pull request Nov 11, 2016

SharedClusterSnapshotRestoreIT fails on CI #21496

Closed

clintongormley added the >bug label Nov 13, 2016
