Improve recovery / snapshot restoring file identity handling #7351

s1monw · 2014-08-20T14:38:48Z

This commit changes how files are selected for retransmission
on recovery / restore. Today this happens on a per-file basis: the
rather weak checksum and the file length in bytes are compared to check whether
a file is identical. This is prone to fail in the case of a checksum collision,
which can happen under certain circumstances.
The changes in this commit move the identity comparison to a per-commit / per-segment
level, where a file is treated as identical only if all the other files in the
commit / segment are identical as well. This all-or-nothing strategy reduces the chance of
a collision dramatically, since we also use a strong hash to identify commits / segments
based on the content of the .si / segments_N file.
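
The all-or-nothing comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SegmentIdentitySketch`, `FileMeta`, `groupOf` and `identicalFiles` are hypothetical names, the checksum is modelled as a plain string, and the grouping rule is a simplification that assumes Lucene-style names such as `_0.si`, `_0.cfs` and `segments_2`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SegmentIdentitySketch {

    /** Hypothetical per-file metadata: name, length, weak checksum, optional strong hash. */
    static final class FileMeta {
        final String name;
        final long length;
        final String checksum;
        final byte[] hash; // only expected for .si / segments_N files, may be null

        FileMeta(String name, long length, String checksum, byte[] hash) {
            this.name = name;
            this.length = length;
            this.checksum = checksum;
            this.hash = hash;
        }

        /** Per-file check: length and checksum, plus the strong hash where one is expected. */
        boolean sameAs(FileMeta other) {
            if (other == null || length != other.length || !checksum.equals(other.checksum)) {
                return false;
            }
            if (carriesHash(name)) {
                return hash != null && hash.length > 0 && Arrays.equals(hash, other.hash);
            }
            return true;
        }
    }

    static boolean carriesHash(String name) {
        return name.endsWith(".si") || name.startsWith("segments");
    }

    /** Simplified grouping: "_0.si" and "_0_x.doc" map to "_0"; commit files map to "segments". */
    static String groupOf(String name) {
        if (name.startsWith("segments")) {
            return "segments";
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '.' || c == '_') {
                return name.substring(0, i);
            }
        }
        return name;
    }

    static Map<String, Map<String, FileMeta>> group(List<FileMeta> files) {
        Map<String, Map<String, FileMeta>> groups = new TreeMap<>();
        for (FileMeta f : files) {
            groups.computeIfAbsent(groupOf(f.name), k -> new TreeMap<>()).put(f.name, f);
        }
        return groups;
    }

    /** Returns the source file names that can be reused; a group is reused entirely or not at all. */
    static Set<String> identicalFiles(List<FileMeta> source, List<FileMeta> target) {
        Map<String, Map<String, FileMeta>> targetGroups = group(target);
        Set<String> identical = new TreeSet<>();
        for (Map.Entry<String, Map<String, FileMeta>> entry : group(source).entrySet()) {
            Map<String, FileMeta> ours = entry.getValue();
            Map<String, FileMeta> theirs = targetGroups.get(entry.getKey());
            boolean same = theirs != null && theirs.keySet().equals(ours.keySet());
            if (same) {
                for (FileMeta f : ours.values()) {
                    if (!f.sameAs(theirs.get(f.name))) {
                        same = false; // one mismatch invalidates the whole segment / commit
                        break;
                    }
                }
            }
            if (same) {
                identical.addAll(ours.keySet());
            }
        }
        return identical;
    }
}
```

In this sketch a `.cfs` file whose length and checksum happen to match is still retransmitted when the `.si` file of the same segment carries a different strong hash, which is the collision case the description is concerned with.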

s1monw · 2014-08-20T14:46:28Z

@imotov @rmuir can you guys do a review here? I am not sure about the XContent changes in Backup/Restore; it would be good to get some ideas here...

rmuir · 2014-08-20T15:15:30Z

The diffing logic here etc looks great to me.

imotov · 2014-08-20T17:24:06Z

src/main/java/org/elasticsearch/index/snapshots/blobstore/BlobStoreIndexShardSnapshot.java

+            if (file.metadata.hash() != null && file.metadata().hash().length > 0) {
+                BytesRef hash = file.metadata.hash();
+                // TODO - not sure if this is the right way to do this - can't we pass the bytes ref directly?
+                builder.field(Fields.META_HASH, Base64.encodeBytes(hash.bytes, hash.offset, hash.length));


I think we should use field(String name, byte[] value, int offset, int length) or field(String name, BytesReference value) here and then use XContentParser#binaryValue to read this value back?

imotov · 2014-08-20T17:34:09Z

I left a couple of minor comments. Otherwise, looks good to me.

s1monw · 2014-08-21T07:47:54Z

@imotov I pushed a new commit including a test for the FileInfo serialization

imotov · 2014-08-21T14:58:17Z

LGTM

s1monw · 2014-08-21T15:25:19Z

I think we have a small regression here for snapshot and restore, since we don't have the hash for the segments in already existing snapshots. For the files we compute hashes for, I think we can read them from the snapshot on the fly if necessary. I will open a follow-up for this, as I already discussed it with @imotov.

This commit changes how files are selected for retransmission on recovery / restore. Today this happens on a per-file basis: the rather weak checksum and the file length in bytes are compared to check whether a file is identical. This is prone to fail in the case of a checksum collision, which can happen under certain circumstances. The changes in this commit move the identity comparison to a per-commit / per-segment level, where a file is treated as identical only if all the other files in the commit / segment are identical as well. This "all or nothing" strategy reduces the chance of a collision dramatically, since we also use a strong hash to identify commits / segments based on the content of the ".si" / "segments_N" file. Closes elastic#7351

Due to the additional safety added in elastic#7351, we now compute a strong hash for .si and segments_N files, which is compared during snapshot / restore. Old snapshots don't have this hash, which can cause unnecessary copying of large amounts of data. This commit adds the ability to fetch this hash from the blob store if needed. Closes elastic#7434
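
The fallback for old snapshots could look roughly like the sketch below. Everything here is assumed for illustration: `LegacyHashFallbackSketch`, `resolveHash` and the `blobReader` function are hypothetical names, and SHA-256 is only a stand-in for whatever strong hash the real code derives from the file content.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

public class LegacyHashFallbackSketch {

    /** Only .si and segments_N files are expected to carry a strong hash. */
    static boolean carriesHash(String name) {
        return name.endsWith(".si") || name.startsWith("segments");
    }

    /** Stand-in strong hash over the file content (assumption: SHA-256). */
    static byte[] strongHash(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the stored hash if the snapshot metadata has one. Otherwise, for
     * .si / segments_N files, reads the blob and computes the hash on demand.
     * Other files get an empty hash and never trigger a blob read.
     */
    static byte[] resolveHash(String fileName, byte[] storedHash, Function<String, byte[]> blobReader) {
        if (storedHash != null && storedHash.length > 0) {
            return storedHash;
        }
        if (!carriesHash(fileName)) {
            return new byte[0];
        }
        return strongHash(blobReader.apply(fileName));
    }
}
```

Because .si and segments_N files are small, reading them back once per legacy snapshot should be far cheaper than re-copying the segment data they describe.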

kimchy added the resiliency label Aug 20, 2014

s1monw added review labels Aug 20, 2014

imotov reviewed Aug 20, 2014

s1monw force-pushed the fix_checksums branch 2 times, most recently from 5d48020 to 60b280c Compare August 21, 2014 14:32

s1monw force-pushed the fix_checksums branch from 60b280c to 058a02b Compare August 21, 2014 16:01

s1monw merged commit 058a02b into elastic:master Aug 21, 2014

s1monw deleted the fix_checksums branch August 25, 2014 14:25

s1monw mentioned this pull request Aug 25, 2014

Snapshot/Restore: Add BWC layer to .si / segments_N hashing to identify segments accurately #7434

Closed

s1monw mentioned this pull request Aug 25, 2014

Add BWC layer to .si / segments_N hashing to identify segments accurately #7436

Merged

jpountz removed the review label Aug 26, 2014

clintongormley changed the title ~~Improve recovery / snapshot restoring file identity handling~~ Resiliency: Improve recovery / snapshot restoring file identity handling Sep 8, 2014

clintongormley added the >enhancement label Sep 11, 2014

s1monw mentioned this pull request Sep 24, 2014

Resiliency: Backport Recovery / Snapshot file identity improvements to 1.3 #7857

Merged

s1monw added v1.3.3 >bug labels Sep 24, 2014

clintongormley removed the >enhancement label Sep 26, 2014

clintongormley added the :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label Jun 7, 2015

clintongormley changed the title ~~Resiliency: Improve recovery / snapshot restoring file identity handling~~ Improve recovery / snapshot restoring file identity handling Jun 7, 2015
