release-20.2: backupccl: fix restore aost bug with dropped desc revisions #69639
@pbardea how do you feel about this?
Hmm, now that I think about it, the assertion might be too aggressive. Thinking it through. Edit: The assertion was incorrect, removed.
I like consolidating the dropped desc filtering here to keep it in one place. Perhaps a comment on loadSQLDescsFromBackupsAtTime would help readability.
This fixes a discrepancy between the descriptor resolution logic used during restore planning and the logic used during restore execution, for a full cluster restore. The resolution logic in planning filtered out descriptor revisions in the dropped state, but the logic in execution did not. As a result, the restore job would process additional descriptors (the dropped revisions). In a full cluster restore, the planning phase picks an id for the tempSystemDB that is higher than all restored desc ids. The additional dropped descriptor revisions seen during execution could have the same id as the tempSystemDB. This id clash would cause issues when processing descriptor rewrites, which are keyed on the descriptor id.

Table and database restores are not affected by this bug, since we filter the descriptors during execution based on the descriptor rewrites we allocated in planning. No rewrites are allocated for dropped revisions in the first place, and no additional entries for system tables are added to the rewrites, so we expect all dropped revisions to be filtered out.

Release note (bug fix): Fixes a bug in full cluster restores where dropped descriptor revisions would cause the restore to fail.

Release justification: Fixes a bug in full cluster restore where dropped descriptor revisions were causing restore jobs to fail.
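The id clash described above can be illustrated with a small, self-contained sketch. This is not the backupccl code: the `descriptor` type and the `filterDropped`, `nextFreeID`, `clashes`, and `filterByRewrites` helpers are hypothetical names invented for this example, and the ids are made up. It only shows why filtering dropped revisions in planning but not in execution can let a dropped revision collide with the id chosen for the tempSystemDB, and why filtering by the planned rewrites avoids the problem for table and database restores.

```go
package main

import "fmt"

// descriptor is a hypothetical, simplified stand-in for a backed-up
// descriptor revision. It is NOT the real CockroachDB descriptor type.
type descriptor struct {
	ID      uint32
	Name    string
	Dropped bool
}

// filterDropped returns only the revisions that are not in the dropped state.
func filterDropped(descs []descriptor) []descriptor {
	var out []descriptor
	for _, d := range descs {
		if !d.Dropped {
			out = append(out, d)
		}
	}
	return out
}

// nextFreeID returns an id one higher than every id in descs, modeling how
// planning picks an id for the temporary system database.
func nextFreeID(descs []descriptor) uint32 {
	var highest uint32
	for _, d := range descs {
		if d.ID > highest {
			highest = d.ID
		}
	}
	return highest + 1
}

// clashes reports whether any descriptor in descs already uses id.
func clashes(descs []descriptor, id uint32) bool {
	for _, d := range descs {
		if d.ID == id {
			return true
		}
	}
	return false
}

// filterByRewrites keeps only descriptors that have a rewrite entry, modeling
// how table and database restores filter descriptors during execution.
// The map is keyed on the original descriptor id (assumed shape).
func filterByRewrites(descs []descriptor, rewrites map[uint32]uint32) []descriptor {
	var out []descriptor
	for _, d := range descs {
		if _, ok := rewrites[d.ID]; ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	// Made-up backup contents: id 54 is a dropped revision.
	backup := []descriptor{
		{ID: 52, Name: "db"},
		{ID: 53, Name: "t1"},
		{ID: 54, Name: "t2", Dropped: true},
	}

	// Planning filters dropped revisions, then picks the next free id.
	planned := filterDropped(backup)
	tempSystemDBID := nextFreeID(planned)
	fmt.Println("tempSystemDB id:", tempSystemDBID)

	// Execution without the filter still sees the dropped revision.
	fmt.Println("clash without filtering in execution:", clashes(backup, tempSystemDBID))

	// Execution with the same filter as planning no longer clashes.
	fmt.Println("clash with filtering in execution:", clashes(filterDropped(backup), tempSystemDBID))

	// Table/database restores: rewrites exist only for planned descriptors,
	// so the dropped revision is filtered out anyway.
	rewrites := map[uint32]uint32{52: 60, 53: 61}
	fmt.Println("descriptors kept by rewrites:", len(filterByRewrites(backup, rewrites)))
}
```

In this toy setup, planning sees ids 52 and 53 and picks 54 for the tempSystemDB, which is exactly the id of the dropped revision that unfiltered execution would still process.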
Merging this without bake time on master because it is a critical bug that is preventing a customer from successfully running restore. We would like the next dot release to have this fix. The change is small and we have a regression test to exhibit the targeted bug fix.