Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensures cleanup of temporary index-* generational blobs during snapshotting #21469

Merged
merged 5 commits into from Nov 11, 2016

Conversation

abeyad
Copy link

@abeyad abeyad commented Nov 10, 2016

This PR ensures pending index-* blobs are deleted when snapshotting. The
index-* blobs are generational files that maintain the snapshots
in the repository. To write these atomically, we first write a
pending-index-* blob, then move it to index-*, which also deletes
pending-index-* in case its not a file-system level move (e.g.
S3 repositories) . For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
pending-index-5 and then move pending-index-5 to index-5. It is
possible that we fail after writing pending-index-5, but before
moving it to index-5 or deleting pending-index-5. In this case,
we will have a dangling pending-index-5 blob laying around. Since
snapshot number 5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to index-5, which first tries to
write to pending-index-5 before moving the blob to index-5. Since
pending-index-5 is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
pending-index-* blobs, and secondly, strengthen the logic around
failure to write the index-* generational blob to ensure pending
files are deleted on cleanup.

Closes #21462

index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case its not a file-system level move (e.g.
S3 repositories) .  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob laying around.  Since
snapshot elastic#5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
`pending-index-*` blobs, and secondly, strengthen the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.

Closes elastic#21462
@abeyad abeyad added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs review v5.0.1 v5.1.1 v6.0.0-alpha1 labels Nov 10, 2016
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add tests that show that the issue is fixed?

@@ -867,16 +867,23 @@ private long listBlobsToGetLatestIndexId() throws IOException {
}

private void writeAtomic(final String blobName, final BytesReference bytesRef) throws IOException {
final String tempBlobName = "pending-" + blobName;
final String tempBlobName = "pending-" + UUIDs.randomBase64UUID() + "-" + blobName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's nicer to append the random uuid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done

// exceptionToThrow will capture this and be thrown at the end
} catch (IOException e) {
if (exceptionToThrow == null) {
exceptionToThrow = e;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exceptionToThrow can never be null?
Maybe it's easier to just do the ex.addSuppressed(...)?

IOException exceptionToThrow = ex;
try {
snapshotsBlobContainer.deleteBlob(tempBlobName);
} catch (NoSuchFileException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should special-case this. Depending on underlying blobcontainer, a different exception might be thrown.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its part of the contract of the BlobContainer to throw a NoSuchFileException if the blob doesn't exist on calling deleteBlob, and we changed all implementations to adhere to this contract. That said, you are right, I went through two iterations of this, and now the only way we get to this block is if an exception was thrown, in which case we don't care about the nature of the exception here. Same with below

@abeyad
Copy link
Author

abeyad commented Nov 10, 2016

retest this please

@abeyad
Copy link
Author

abeyad commented Nov 10, 2016

@ywelsch I pushed some commits that add a test and addresses your feedback

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left minor comments (easy to fix). LGTM

* target blob already exists, an exception is thrown.
* Renames the source blob into the target blob. If the source blob does not exist or the
* target blob already exists, an exception is thrown. Atomicity of the move operation is
* can only be guaranteed on an implementation-by-implementation basis. The only current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is can

snapshotsBlobContainer.move(tempBlobName, blobName);
} catch (IOException ex) {
// Move failed - try cleaning up
snapshotsBlobContainer.deleteBlob(tempBlobName);
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe readd comment here (temp file creation or move failed - clean up)

maybeIOExceptionOrBlock(targetBlob);
super.move(sourceBlob, targetBlob);
super.writeBlob(targetBlob, readBlob(sourceBlob), 0L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super.readBlock instead of readBlock to prevent double maybeIOExceptionOrBlock.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

logger.info("--> creating repository");
final Path repoPath = randomRepoPath();
assertAcked(client().admin().cluster().preparePutRepository("test-repo").setType("mock").setVerify(false).setSettings(
Settings.builder().put("location", repoPath).put("random_control_io_exception_rate", 0.2)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add some randomization to the random_control_io_exception_rate?

assertThat(shardFailure.reason(), containsString("Random IOException"));
}
}
} catch (Exception ex) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we catch a more specific exception type? IOException with the Random IOException string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is that either SnapshotCreationException or RepositoryException can be thrown, but what we can do here is ensure the stack trace has the Random IOException in the nested stack trace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be specified as catch (SnapshotCreationException | RepositoryException ex) in Java ;-)

@abeyad
Copy link
Author

abeyad commented Nov 11, 2016

@ywelsch I pushed a commit to address your latest feedback. Thanks for reviewing!

@abeyad abeyad merged commit 3001b63 into elastic:master Nov 11, 2016
@abeyad abeyad deleted the fix/snapshot_blob_overwrites branch November 11, 2016 02:45
abeyad pushed a commit that referenced this pull request Nov 11, 2016
…otting (#21469)

Ensures pending index-* blobs are deleted when snapshotting.  The
index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case its not a file-system level move (e.g.
S3 repositories) .  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob laying around.  Since
snapshot #5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
`pending-index-*` blobs, and secondly, strengthen the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.

Closes #21462
abeyad pushed a commit that referenced this pull request Nov 11, 2016
…otting (#21469)

Ensures pending index-* blobs are deleted when snapshotting.  The
index-* blobs are generational files that maintain the snapshots
in the repository.  To write these atomically, we first write a
`pending-index-*` blob, then move it to `index-*`, which also deletes
`pending-index-*` in case its not a file-system level move (e.g.
S3 repositories) .  For example, to write the 5th generation of the
index blob for the repository, we would first write the bytes to
`pending-index-5` and then move `pending-index-5` to `index-5`.  It is
possible that we fail after writing `pending-index-5`, but before
moving it to `index-5` or deleting `pending-index-5`.  In this case,
we will have a dangling `pending-index-5` blob laying around.  Since
snapshot #5 would have failed, the next snapshot assumes a generation
number of 5, so it tries to write to `index-5`, which first tries to
write to `pending-index-5` before moving the blob to `index-5`.  Since
`pending-index-5` is leftover from the previous failure, the snapshot
fails as it cannot overwrite this blob.

This commit solves the problem by first, adding a UUID to the
`pending-index-*` blobs, and secondly, strengthen the logic around
failure to write the `index-*` generational blob to ensure pending
files are deleted on cleanup.

Closes #21462
@abeyad
Copy link
Author

abeyad commented Nov 11, 2016

5.x commit: 6966fa5
5.0 commit: a00b588

maybeIOExceptionOrBlock(targetBlob);
super.move(sourceBlob, targetBlob);
super.writeBlob(targetBlob, super.readBlob(sourceBlob), 0L);
super.deleteBlob(sourceBlob);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, forgot to add here that we could randomize between atomic and non-atomic move.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed adb7aad to randomize between atomic and non-atomic

jasontedor added a commit that referenced this pull request Nov 11, 2016
* master:
  ShardActiveResponseHandler shouldn't hold to an entire cluster state
  Ensures cleanup of temporary index-* generational blobs during snapshotting (#21469)
  Remove (again) test uses of onModule (#21414)
  [TEST] Add assertBusy when checking for pending operation counter after tests
  Revert "Add trace logging when aquiring and releasing operation locks for replication requests"
  Allows multiple patterns to be specified for index templates (#21009)
  [TEST] fixes rebalance single shard check as it isn't guaranteed that a rebalance makes sense and the method only tests if rebalance is allowed
  Document _reindex with random_score
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v5.0.1 v5.1.1 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants