Rocksdb DLCheckpoint SST file corruption in statestore #2563

Closed
sursingh opened this issue Jan 29, 2021 · 0 comments
Labels
area/tableservice (changes related to table service), release/4.13.0, type/bug


@sursingh
Contributor

BUG REPORT

Describe the bug

During checkpointing, large SST files can get corrupted. Because SST files are shared among checkpoints, the corruption is not repaired by subsequent checkpoints.

To Reproduce

Add values to the state store so that SST files grow larger than 128 KB. Wait for checkpoints to run. Reproducing the corruption may take multiple attempts.

Expected behavior

Checkpoints should not get corrupted.


@dlg99 added this to the 4.13.0 milestone on Jan 29, 2021
@dlg99 added the area/tableservice (changes related to table service) and release/4.13.0 labels on Jan 29, 2021
@dlg99 closed this as completed in efaa993 on Jan 29, 2021
dlg99 pushed a commit that referenced this issue Feb 2, 2021
Fix SST File corruption during checkpointing

### Motivation

Because SST files are shared among checkpoints, a corrupted SST file is not fixed by future checkpoints: restoring any later checkpoint that depends on that file will fail.
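To make the dependency concrete, here is a minimal Java sketch, assuming a simplified model in which each incremental checkpoint is just a list of SST file names and each file is uploaded only the first time it appears. The class `SharedSstModel` and its methods are hypothetical illustrations, not project code. Because an existing file is never re-uploaded, one corrupted upload makes every checkpoint that references it unrestorable.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not project code: checkpoints reference SST files by
// name, and each file is stored once and shared by every checkpoint using it.
public class SharedSstModel {
    // SST file name -> whether the stored copy is intact.
    private final Map<String, Boolean> intact = new HashMap<>();
    // Checkpoint name -> SST files it references.
    private final Map<String, List<String>> checkpoints = new HashMap<>();

    // Stores a file only the first time it is seen; later checkpoints reuse it.
    public void uploadFile(String sst, boolean ok) {
        intact.putIfAbsent(sst, ok);
    }

    public void checkpoint(String name, List<String> ssts) {
        checkpoints.put(name, ssts);
    }

    // A checkpoint is restorable only if every SST file it references is intact.
    public boolean canRestore(String name) {
        List<String> ssts = checkpoints.get(name);
        if (ssts == null) {
            return false;
        }
        for (String sst : ssts) {
            if (!intact.getOrDefault(sst, false)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SharedSstModel store = new SharedSstModel();
        store.uploadFile("000010.sst", false); // corrupted during the first upload
        store.uploadFile("000011.sst", true);
        store.checkpoint("cp-1", List.of("000010.sst", "000011.sst"));

        store.uploadFile("000010.sst", true); // ignored: the file already exists
        store.uploadFile("000012.sst", true);
        store.checkpoint("cp-2", List.of("000010.sst", "000012.sst"));

        System.out.println(store.canRestore("cp-1")); // false
        System.out.println(store.canRestore("cp-2")); // false
    }
}
```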

### Changes

The record is sent asynchronously, so it must hold a copy of the passed buffer
rather than the buffer itself. The caller retains ownership of the buffer and
may modify it after the call returns. In the corrupted case, later blocks
overwrote earlier blocks that had not yet been sent, corrupting the SST file.
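The failure mode can be reproduced in miniature. The following Java sketch is an illustration under stated assumptions, not the actual patch: `AsyncRecordWriter` is a hypothetical stand-in that queues records and writes them on `flush()` to imitate an asynchronous send, and the caller reuses one 4-byte buffer for every block, as a file-reading loop would. Queuing the caller's array (the bug) lets the last block overwrite the earlier ones; queuing a copy (the fix) preserves each block.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;

public class AsyncCopyDemo {
    // Hypothetical writer: write() only queues the record; bytes reach the
    // sink later in flush(), imitating an asynchronous send.
    static final class AsyncRecordWriter {
        private final boolean copyBuffer;
        private final Queue<byte[]> pending = new ArrayDeque<>();
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

        AsyncRecordWriter(boolean copyBuffer) {
            this.copyBuffer = copyBuffer;
        }

        void write(byte[] block) {
            // Fix: the record owns a private copy. Bug: it aliases the caller's buffer.
            pending.add(copyBuffer ? block.clone() : block);
        }

        void flush() {
            while (!pending.isEmpty()) {
                byte[] b = pending.poll();
                sink.write(b, 0, b.length);
            }
        }

        String contents() {
            return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // Writes three blocks through one reused buffer, then flushes.
    static String copyFile(boolean copyBuffer) {
        AsyncRecordWriter writer = new AsyncRecordWriter(copyBuffer);
        byte[] block = new byte[4]; // reused for every block, like a read buffer
        for (String chunk : new String[] {"AAAA", "BBBB", "CCCC"}) {
            byte[] src = chunk.getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(src, 0, block, 0, block.length);
            writer.write(block);
        }
        writer.flush();
        return writer.contents();
    }

    public static void main(String[] args) {
        System.out.println(copyFile(false)); // CCCCCCCCCCCC
        System.out.println(copyFile(true));  // AAAABBBBCCCC
    }
}
```

Copying costs one allocation per record, but it is the simplest way to decouple a record's lifetime from a buffer the caller keeps reusing.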


Master Issue: #2563 



Reviewers: Andrey Yegorov <None>, Enrico Olivelli <eolivelli@gmail.com>, Matteo Merli <mmerli@apache.org>

This closes #2564 from sursingh/fix-sst-corruption, closes #2563