Add checksum validation for SST files #2641
Merged
merlimat merged 1 commit into apache:master on Mar 18, 2021
dlg99 (Contributor) requested changes on Mar 10, 2021
My major concern is backwards compatibility; see the comments.
...lib/src/main/java/org/apache/bookkeeper/statelib/impl/rocksdb/checkpoint/CheckpointFile.java
Outdated
.../main/java/org/apache/bookkeeper/statelib/impl/rocksdb/checkpoint/RocksdbCheckpointTask.java
Outdated
.../test/java/org/apache/bookkeeper/statelib/impl/rocksdb/checkpoint/RocksCheckpointerTest.java
...lib/src/main/java/org/apache/bookkeeper/statelib/impl/rocksdb/checkpoint/CheckpointFile.java
Outdated
...lib/src/main/java/org/apache/bookkeeper/statelib/impl/rocksdb/checkpoint/CheckpointFile.java
Outdated
...src/main/java/org/apache/bookkeeper/statelib/impl/rocksdb/checkpoint/RocksdbRestoreTask.java
Author (Contributor)
@dlg99: Added support for downgrading to an older version. A config knob, enabled by default, creates files with regular names so that old versions can restore from the checkpoint.
dlg99 (Contributor) approved these changes on Mar 12, 2021
Looks awesome!
Thank you for going the extra mile with the downgrade compatibility and tests.
Author (Contributor)
rerun failed tests
Motivation
Normally, SST files are immutable. An SST file from a previous checkpoint can
be reused in subsequent checkpoints. This fact is used to avoid unnecessary
uploads of SST files.
However, there are scenarios in which comparing names alone doesn't work.
It is possible that the checkpoint process doesn't complete (due to a
crash or restart). In such cases, stale SST files are left behind. When the
storage container is restarted, it is correctly restored from the previous
checkpoint. When we checkpoint this new state, new SST files are created.
Since we only compare SST file names, we assume that a new file whose name
matches a stale one is already available in the checkpoint store.
At best, the sizes of the new files will not match and restore will fail. But
if the sizes match, restore will succeed and we will have invalid data in the
state store.
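The failure mode described above can be illustrated with a minimal sketch. It is not the BookKeeper code: the class `NameOnlyDedup`, the method `filesToUpload`, and the file name `000012.sst` are hypothetical names chosen for illustration, and file contents are modelled as in-memory byte arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NameOnlyDedup {
    private NameOnlyDedup() {}

    /**
     * Returns the local files that would be uploaded when the only check is
     * "does a file with this name already exist in the checkpoint store?".
     * File contents are deliberately ignored, which is the problem.
     */
    static List<String> filesToUpload(Map<String, byte[]> localFiles, Set<String> remoteNames) {
        List<String> toUpload = new ArrayList<>();
        for (String name : localFiles.keySet()) {
            if (!remoteNames.contains(name)) {
                toUpload.add(name);
            }
        }
        return toUpload;
    }

    public static void main(String[] args) {
        // A stale file with this name was left behind by an interrupted checkpoint.
        Set<String> remoteNames = Set.of("000012.sst");
        // After restore, a new file with the same name but different content is created.
        Map<String, byte[]> localFiles = Map.of("000012.sst", new byte[] {4, 5, 6});
        // The new file is skipped because only its name is compared.
        System.out.println(filesToUpload(localFiles, remoteNames)); // prints []
    }
}
```

Because the new file is never uploaded, a later restore reads the stale file under the same name.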
Changes
This change adds a checksum for SST files. The checksum is appended to the
file name when the SST file is uploaded. This ensures that the correct files
are always uploaded.
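As a rough sketch of the idea, the remote name can be derived from both the local name and the file content. The class `SstChecksumNaming`, its methods, the `_` separator, and the choice of SHA-256 are all assumptions made for illustration; the PR text does not say which checksum algorithm or naming format the actual change uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

final class SstChecksumNaming {
    private SstChecksumNaming() {}

    /** Streams the file through SHA-256 (an assumed algorithm) and returns lowercase hex. */
    static String checksumHex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                digest.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Hypothetical remote name format: "<local name>_<checksum>". */
    static String remoteName(Path file) throws IOException {
        return file.getFileName().toString() + "_" + checksumHex(file);
    }

    /** A file is reused only if a remote file with the same name AND checksum exists. */
    static boolean needsUpload(Path file, Set<String> remoteNames) throws IOException {
        return !remoteNames.contains(remoteName(file));
    }

    public static void main(String[] args) throws IOException {
        Path sst = Files.createTempDirectory("sst-demo").resolve("000012.sst");
        Files.write(sst, "stale contents".getBytes(StandardCharsets.UTF_8));
        String staleRemoteName = remoteName(sst);
        Files.write(sst, "fresh contents".getBytes(StandardCharsets.UTF_8));
        // Same local name, different bytes: the remote names differ, so the file is uploaded.
        System.out.println(needsUpload(sst, Set.of(staleRemoteName))); // prints true
    }
}
```

With the checksum in the name, a stale file left by an interrupted checkpoint no longer matches a new file that happens to share its local name.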