Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: check mismatched cluster regions on backup and restore #64758

Conversation

Elliebababa
Copy link
Contributor

@Elliebababa Elliebababa commented May 6, 2021

Previously, if a restore cluster mismatched the regions in backup
cluster, the data would be restored as if the zone configuration
does not exist because we did not do the region check for users.

This patch checks the regions used by backup at restore planning
phase, and hence users are aware of mismatched regions between
backup and restore cluster. If there's a mismatched region, users
can either update cluster localities or restore with option
--skip-localities-check to continue.
For serverless users, they are allowed to run restore with
--skip-localities-check specified.

Release note (enterprise change): Previously, if a restore
cluster mismatched the regions in backup cluster, the data would
be restored as if the zone configuration does not exist.
This patch checks the regions before restore and hence users
are aware of mismatched regions between backup and restore cluster.
If there's a mismatched region, users can either update cluster
localities or restore with option --skip-localities-check to
continue.

Epic CRDB-2599

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Elliebababa Elliebababa force-pushed the mismatch_cluster_regions_backup_and_restore branch 2 times, most recently from 5d62620 to ed53f1f Compare May 6, 2021 03:38
@Elliebababa Elliebababa requested review from a team and pbardea and removed request for a team May 6, 2021 13:57
@Elliebababa Elliebababa force-pushed the mismatch_cluster_regions_backup_and_restore branch 2 times, most recently from ca5bb27 to f585f92 Compare May 6, 2021 21:43
@Elliebababa Elliebababa marked this pull request as ready for review May 6, 2021 21:47
@Elliebababa Elliebababa requested a review from a team May 7, 2021 04:52
@Elliebababa Elliebababa force-pushed the mismatch_cluster_regions_backup_and_restore branch from f585f92 to 1e5e3c0 Compare May 7, 2021 14:42
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Elliebababa it's great that we were able to get to this so quickly in 21.2 🎉

I had a look at the multi-region related files only, and left the restore-heavy files for @pbardea. Generally it looks good, there are a few minor tweaks I suggested below.

Reviewed 2 of 9 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Elliebababa and @pbardea)


pkg/ccl/backupccl/restore_planning.go, line 66 at r1 (raw file):

type DescRewriteMap map[descpb.ID]*jobspb.RestoreDetails_DescriptorRewrite

const multiTenantZoneConfigNo = 49854

Nit: Does it make sense to change this to multiTenantZoneConfigIssueNumber? At first glance I was confused that this is referring to an issue.


pkg/ccl/backupccl/restore_planning.go, line 1673 at r1 (raw file):

	}

	if skipLocalitiesCheck || len(regionDesc) == 0 {

Nit: Does it make sense to split this check into two so that we check for skipLocalitiesCheck before we pull all the descriptors? Might make things perform slightly better in that case.


pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):

	}

	for _, desc := range regionDesc {

Before we do this, does it make sense to scan all of the descriptors and build a map of the region names? That way in cases where we have a large number of databases to restore (with mostly overlapping regions), the calls to findNodeOfRegion will be proportional to the number of regions, and not the number of databases*regions.


pkg/ccl/backupccl/restore_planning.go, line 1700 at r1 (raw file):

		for _, region := range regionNames {
			if nodeFound := findNodeOfRegion(nodesResponse.Nodes, region); !nodeFound {
				mismatchErr := errors.Newf("detected mismatch regions between restore cluster and backup cluster, "+

Nit: Can we make this a bit more verbose "detected a mismatch in regions between the restore cluster and the backup cluster"?


pkg/ccl/backupccl/restore_planning.go, line 1701 at r1 (raw file):

			if nodeFound := findNodeOfRegion(nodesResponse.Nodes, region); !nodeFound {
				mismatchErr := errors.Newf("detected mismatch regions between restore cluster and backup cluster, "+
					"missing region %s.", region)

Nit: Is this the first missing region detected?


pkg/ccl/backupccl/restore_planning.go, line 1702 at r1 (raw file):

				mismatchErr := errors.Newf("detected mismatch regions between restore cluster and backup cluster, "+
					"missing region %s.", region)
				hintsMsg := fmt.Sprintf("please update cluster localities or restore with option %q.", restoreOptSkipLocalitiesCheck)

Nit: Can we be a bit more verbose here too? Something like "there are two ways you can resolve this issue: 1) update the cluster to which you're restoring to ensure that the regions present on the node's --locality flags match those present in the backup image, or 2) restore with the %q option"?

Copy link
Contributor Author

@Elliebababa Elliebababa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @pbardea)


pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Before we do this, does it make sense to scan all of the descriptors and build a map of the region names? That way in cases where we have a large number of databases to restore (with mostly overlapping regions), the calls to findNodeOfRegion will be proportional to the number of regions, and not the number of databases*regions.

ah I see! Thanks for catching! Done.

@Elliebababa Elliebababa force-pushed the mismatch_cluster_regions_backup_and_restore branch from 1e5e3c0 to afb56fd Compare May 7, 2021 16:21
Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 9 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)


pkg/ccl/backupccl/restore_planning.go, line 66 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Does it make sense to change this to multiTenantZoneConfigIssueNumber? At first glance I was confused that this is referring to an issue.

Or perhaps even export multitenancyZoneCfgIssueNo in pkg/sql/set_zone_config.go and use that here so that references to that issue number are easily greppable.


pkg/sql/parser/sql.y, line 2782 at r2 (raw file):

    $$.val = &tree.RestoreOptions{Detached: true}
  }
| SKIP_LOCALITIES_CHECK

Let's add this flag to the restore's parser tests in pkg/sql/parser/parse_test.go

Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)


pkg/ccl/backupccl/restore_planning.go, line 1707 at r3 (raw file):

	for region := range regionSet {
		if nodeFound := findNodeOfRegion(nodesResponse.Nodes, region); !nodeFound {
			mismatchErr := errors.Newf("detected a mismatch in regions between the restore cluster and the backup cluster, "+

Can we actually update this to list out all of the regions that are missing? I think as a user I would want to know all of the regions that I need to add to this cluster rather than needing to run the restore again and again to find all of the regions I need to add.

Copy link
Contributor Author

@Elliebababa Elliebababa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @pbardea)


pkg/ccl/backupccl/restore_planning.go, line 1707 at r3 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Can we actually update this to list out all of the regions that are missing? I think as a user I would want to know all of the regions that I need to add to this cluster rather than needing to run the restore again and again to find all of the regions I need to add.

Aha, I see. Done.


pkg/sql/parser/sql.y, line 2782 at r2 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Let's add this flag to the restore's parser tests in pkg/sql/parser/parse_test.go

Done.

@Elliebababa Elliebababa force-pushed the mismatch_cluster_regions_backup_and_restore branch 2 times, most recently from b60b4f5 to 41a08b1 Compare May 7, 2021 17:41
@pbardea pbardea self-requested a review May 7, 2021 18:15
Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore changes LGTM with nits -- nice!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)


pkg/ccl/backupccl/restore_planning.go, line 1914 at r4 (raw file):

		}
	}
	if err := checkClusterRegions(ctx, typesByID, p.ExecCfg().NodesStatusServer, restoreStmt.Options.SkipLocalitiesCheck, p.ExecCfg().Codec); err != nil {

nit: I have a preference of checking for the skip localities check flag here and then only running this check if that flag isn't present to clean up the arguments to the method and so this method actually just checks the cluster regions.


pkg/ccl/backupccl/testdata/backup-restore/multiregion, line 43 at r4 (raw file):


# A new cluster with different localities settings.
new-server name=s3 share-io-dir=s1 allow-implicit-access localities=us-east-1,eu-central-1,eu-north-1

can we remove either us-east or eu-central here so that we can test that the error message that reports more than 1 missing region works as expected


pkg/sql/set_zone_config.go, line 943 at r4 (raw file):

}

// MultitenancyZoneCfgIssueNo points to the multitenancy zone config issue number

super-nit: . at the end of this line

Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good on my side too. Thanks again for making this change!

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajstorm and @Elliebababa)


pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):

Previously, Elliebababa wrote…

ah I see! Thanks for catching! Done.

Looks great!

@ajstorm ajstorm self-requested a review May 7, 2021 18:41
Previously, if a restore cluster mismatched the regions in backup
cluster, the data would be restored as if the zone configuration
does not exist because we did not do the region check for users.

This patch checks the regions used by backup at restore planning
phase, and hence users are aware of mismatched regions between
backup and restore cluster. If there's a mismatched region, users
can either update cluster localities or restore with option
`--skip-localities-check` to continue.
For serverless users, they are allowed to run restore with
`--skip-localities-check` specified.

Release note (enterprise change): Previously, if a restore
cluster mismatched the regions in backup cluster, the data would
be restored as if the zone configuration does not exist.
This patch checks the regions before restore and hence users
are aware of mismatched regions between backup and restore cluster.
If there's a mismatched region, users can either update cluster
localities or restore with option `--skip-localities-check` to
continue.
@Elliebababa Elliebababa force-pushed the mismatch_cluster_regions_backup_and_restore branch from 41a08b1 to fb76aa8 Compare May 7, 2021 19:30
Copy link
Contributor Author

@Elliebababa Elliebababa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajstorm and @pbardea)


pkg/ccl/backupccl/restore_planning.go, line 1693 at r1 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Looks great!

:)


pkg/ccl/backupccl/testdata/backup-restore/multiregion, line 43 at r4 (raw file):

Previously, pbardea (Paul Bardea) wrote…

can we remove either us-east or eu-central here so that we can test that the error message that reports more than 1 missing region works as expected

Done.


pkg/sql/set_zone_config.go, line 943 at r4 (raw file):

Previously, pbardea (Paul Bardea) wrote…

super-nit: . at the end of this line

haha done.

@Elliebababa
Copy link
Contributor Author

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented May 7, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants