Skip to content

Add checksum validation for SST files#2641

Merged
merlimat merged 1 commit intoapache:masterfrom
sursingh:sst-checksum
Mar 18, 2021
Merged

Add checksum validation for SST files#2641
merlimat merged 1 commit intoapache:masterfrom
sursingh:sst-checksum

Conversation

@sursingh
Copy link
Contributor

Motivation

Normally the SST file are immutable. A SST file from previous checkpoint can
be reused in subsequent checkpoints. This fact is used to avoid unnecessary
upload of SST files.

However there are scenarios in which just the name comparison doesn't work.
It is possible that the checkpoint process doesn't complete (due to
crash/restart). In such cases the stale SST files are left behind. When the
storage container is restarted, it will be correctly restored from previous
checkpoint. When we do a checkpoint on this new state, a new SST files
are created. Since we only compare the SST file names, we assume that files is
already available in the checkpoint store.

At best the size of the new files will mismatch, and restore will fail. But if
the size of the files match, restore will succeed and we will have invalid
data in the state store.

Changes

With this change we are adding the checksum for the SST files. The checksum
will be appended to the name when the SST file is uploaded. This will ensure
that the correct files are always uploaded

Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My major concern is the backwards compatibility, see the comments

@sursingh
Copy link
Contributor Author

@dlg99 : Added support to allow downgrade to older version. There is a config knob that is enabled by default. This will create files with regular names to allow old versions to restore from the checkpoint.

@sursingh sursingh requested review from dlg99 and eolivelli March 12, 2021 00:21
Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!
Thank you for going the extra mile with the downgrade compatibility and tests.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

great work

@sursingh
Copy link
Contributor Author

rerun failed tests

Normally the SST file are immutable. A SST file from previous checkpoint can
be reused in subsequent checkpoints. This fact is used to avoid unnecessary
upload of SST files.

However there are scenarios in which just the name comparison doesn't work.
It is possible that the checkpoint process doesn't complete (due to
crash/restart). In such cases the stale SST files are left behind. When the
storage container is restarted, it will be correctly restored from previous
checkpoint. When we do a checkpoint on this new state, a new SST files
are created. Since we only compare the SST file names, we assume that files is
already available in the checkpoint store.

At best the size of the new files will mismatch, and restore will fail. But if
the size of the files match, restore will succeed and we will have invalid
data in the state store.

With this change we are adding the checksum for the SST files. The checksum
will be appended to the name when the SST file is uploaded. This will ensure
that the correct files are always uploaded
@merlimat merlimat added this to the 4.14.0 milestone Mar 18, 2021
@merlimat merlimat merged commit de09829 into apache:master Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants