Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: empty full followed by incremental backup can result in use of the wrong codec #115773

Closed
adityamaru opened this issue Dec 7, 2023 · 1 comment · Fixed by #116316
Closed
Assignees
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery

Comments

@adityamaru
Copy link
Contributor

adityamaru commented Dec 7, 2023

An empty full backup captures no Spans in the backup manifest. If this full backup is then followed by an incremental that does capture some data, the codec we use in the key rewriter is incorrect. This is because we resolve the codec based on the Spans in the full backup manifest, of which there are none, causing us to default to a SystemCodec. Using a system codec when running inside an application tenant generates an error when decoding the key and causes the restore to fail. We should be inferring the codec from the first non-empty backup manifest.

Jira issue: CRDB-34204

@adityamaru adityamaru added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery labels Dec 7, 2023
Copy link

blathers-crl bot commented Dec 7, 2023

cc @cockroachdb/disaster-recovery

@blathers-crl blathers-crl bot added this to Triage in Disaster Recovery Backlog Dec 7, 2023
@adityamaru adityamaru self-assigned this Dec 7, 2023
craig bot pushed a commit that referenced this issue Dec 15, 2023
116316: backupccl: fix codec selection during restore r=msbutler a=adityamaru

Previously, the codec selection logic only looked at a single manifest to determine whether the backup data requires a system or tenant codec to be read. In older release branches if the full backup was empty (but a subsequent incremental was not) then we could incorrectly use a system codec for a backup that actually required a tenant codec.

To fix this we now process all the manifests in the backup chain and use the first non-empty manifest.

Fixes: #115773
Release note(bug fix): an empty full backup, followed by non-empty incrementals taken inside an application tenant may be unrestoreable because of the use of an incorrect SQL codec

Co-authored-by: adityamaru <adityamaru@gmail.com>
@craig craig bot closed this as completed in a75b261 Dec 15, 2023
Disaster Recovery Backlog automation moved this from Triage to Done Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery
Development

Successfully merging a pull request may close this issue.

1 participant