New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: check mismatched cluster regions on backup and restore #64758
backupccl: check mismatched cluster regions on backup and restore #64758
Conversation
5d62620
to
ed53f1f
Compare
ca5bb27
to
f585f92
Compare
f585f92
to
1e5e3c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Elliebababa it's great that we were able to get to this so quickly in 21.2 🎉
I had a look at the multi-region related files only, and left the restore-heavy files for @pbardea. Generally it looks good, there are a few minor tweaks I suggested below.
Reviewed 2 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa and @pbardea)
pkg/ccl/backupccl/restore_planning.go, line 66 at r1 (raw file):
type DescRewriteMap map[descpb.ID]*jobspb.RestoreDetails_DescriptorRewrite const multiTenantZoneConfigNo = 49854
Nit: Does it make sense to change this to multiTenantZoneConfigIssueNumber
? At first glance I was confused that this is referring to an issue.
pkg/ccl/backupccl/restore_planning.go, line 1673 at r1 (raw file):
} if skipLocalitiesCheck || len(regionDesc) == 0 {
Nit: Does it make sense to split this check into two so that we check for skipLocalitiesCheck
before we pull all the descriptors? Might make things perform slightly better in that case.
pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):
} for _, desc := range regionDesc {
Before we do this, does it make sense to scan all of the descriptors and build a map of the region names? That way in cases where we have a large number of databases to restore (with mostly overlapping regions), the calls to findNodeOfRegion
will be proportional to the number of regions, and not the number of databases*regions.
pkg/ccl/backupccl/restore_planning.go, line 1700 at r1 (raw file):
for _, region := range regionNames { if nodeFound := findNodeOfRegion(nodesResponse.Nodes, region); !nodeFound { mismatchErr := errors.Newf("detected mismatch regions between restore cluster and backup cluster, "+
Nit: Can we make this a bit more verbose "detected a mismatch in regions between the restore cluster and the backup cluster"?
pkg/ccl/backupccl/restore_planning.go, line 1701 at r1 (raw file):
if nodeFound := findNodeOfRegion(nodesResponse.Nodes, region); !nodeFound { mismatchErr := errors.Newf("detected mismatch regions between restore cluster and backup cluster, "+ "missing region %s.", region)
Nit: Is this the first missing region detected?
pkg/ccl/backupccl/restore_planning.go, line 1702 at r1 (raw file):
mismatchErr := errors.Newf("detected mismatch regions between restore cluster and backup cluster, "+ "missing region %s.", region) hintsMsg := fmt.Sprintf("please update cluster localities or restore with option %q.", restoreOptSkipLocalitiesCheck)
Nit: Can we be a bit more verbose here too? Something like "there are two ways you can resolve this issue: 1) update the cluster to which you're restoring to ensure that the regions present on the node's --locality flags match those present in the backup image, or 2) restore with the %q option"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @pbardea)
pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Before we do this, does it make sense to scan all of the descriptors and build a map of the region names? That way in cases where we have a large number of databases to restore (with mostly overlapping regions), the calls to
findNodeOfRegion
will be proportional to the number of regions, and not the number of databases*regions.
ah I see! Thanks for catching! Done.
1e5e3c0
to
afb56fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)
pkg/ccl/backupccl/restore_planning.go, line 66 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Nit: Does it make sense to change this to
multiTenantZoneConfigIssueNumber
? At first glance I was confused that this is referring to an issue.
Or perhaps even export multitenancyZoneCfgIssueNo
in pkg/sql/set_zone_config.go and use that here so that references to that issue number are easily greppable.
pkg/sql/parser/sql.y, line 2782 at r2 (raw file):
$$.val = &tree.RestoreOptions{Detached: true} } | SKIP_LOCALITIES_CHECK
Let's add this flag to the restore's parser tests in pkg/sql/parser/parse_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)
pkg/ccl/backupccl/restore_planning.go, line 1707 at r3 (raw file):
for region := range regionSet { if nodeFound := findNodeOfRegion(nodesResponse.Nodes, region); !nodeFound { mismatchErr := errors.Newf("detected a mismatch in regions between the restore cluster and the backup cluster, "+
Can we actually update this to list out all of the regions that are missing? I think as a user I would want to know all of the regions that I need to add to this cluster rather than needing to run the restore again and again to find all of the regions I need to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @pbardea)
pkg/ccl/backupccl/restore_planning.go, line 1707 at r3 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Can we actually update this to list out all of the regions that are missing? I think as a user I would want to know all of the regions that I need to add to this cluster rather than needing to run the restore again and again to find all of the regions I need to add.
Aha, I see. Done.
pkg/sql/parser/sql.y, line 2782 at r2 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Let's add this flag to the restore's parser tests in pkg/sql/parser/parse_test.go
Done.
b60b4f5
to
41a08b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restore changes LGTM with nits -- nice!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)
pkg/ccl/backupccl/restore_planning.go, line 1914 at r4 (raw file):
} } if err := checkClusterRegions(ctx, typesByID, p.ExecCfg().NodesStatusServer, restoreStmt.Options.SkipLocalitiesCheck, p.ExecCfg().Codec); err != nil {
nit: I have a preference of checking for the skip localities check flag here and then only running this check if that flag isn't present to clean up the arguments to the method and so this method actually just checks the cluster regions.
pkg/ccl/backupccl/testdata/backup-restore/multiregion, line 43 at r4 (raw file):
# A new cluster with different localities settings. new-server name=s3 share-io-dir=s1 allow-implicit-access localities=us-east-1,eu-central-1,eu-north-1
can we remove either us-east or eu-central here so that we can test that the error message that reports more than 1 missing region works as expected
pkg/sql/set_zone_config.go, line 943 at r4 (raw file):
} // MultitenancyZoneCfgIssueNo points to the multitenancy zone config issue number
super-nit: .
at the end of this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good on my side too. Thanks again for making this change!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)
pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):
Previously, Elliebababa wrote…
ah I see! Thanks for catching! Done.
Looks great!
Previously, if a restore cluster mismatched the regions in backup cluster, the data would be restored as if the zone configuration does not exist because we did not do the region check for users. This patch checks the regions used by backup at restore planning phase, and hence users are aware of mismatched regions between backup and restore cluster. If there's a mismatched region, users can either update cluster localities or restore with option `--skip-localities-check` to continue. For serverless users, they are allowed to run restore with `--skip-localities-check` specified. Release note (enterprise change): Previously, if a restore cluster mismatched the regions in backup cluster, the data would be restored as if the zone configuration does not exist. This patch checks the regions before restore and hence users are aware of mismatched regions between backup and restore cluster. If there's a mismatched region, users can either update cluster localities or restore with option `--skip-localities-check` to continue.
41a08b1
to
fb76aa8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajstorm and @pbardea)
pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):
Previously, ajstorm (Adam Storm) wrote…
Looks great!
:)
pkg/ccl/backupccl/testdata/backup-restore/multiregion, line 43 at r4 (raw file):
Previously, pbardea (Paul Bardea) wrote…
can we remove either us-east or eu-central here so that we can test that the error message that reports more than 1 missing region works as expected
Done.
pkg/sql/set_zone_config.go, line 943 at r4 (raw file):
Previously, pbardea (Paul Bardea) wrote…
super-nit:
.
at the end of this line
haha done.
TFTR! bors r+ |
Build succeeded: |
Previously, if a restore cluster mismatched the regions in backup
cluster, the data would be restored as if the zone configuration
does not exist because we did not do the region check for users.
This patch checks the regions used by backup at restore planning
phase, and hence users are aware of mismatched regions between
backup and restore cluster. If there's a mismatched region, users
can either update cluster localities or restore with option
--skip-localities-check
to continue.For serverless users, they are allowed to run restore with
--skip-localities-check
specified.Release note (enterprise change): Previously, if a restore
cluster mismatched the regions in backup cluster, the data would
be restored as if the zone configuration does not exist.
This patch checks the regions before restore and hence users
are aware of mismatched regions between backup and restore cluster.
If there's a mismatched region, users can either update cluster
localities or restore with option
--skip-localities-check
tocontinue.
Epic CRDB-2599