Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: pause scheduled backup if resumed on different cluster #111578

Merged
merged 1 commit into from Oct 5, 2023

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Oct 2, 2023

Previously, after cluster restore or c2c cutover, a backed up/ replicated
backup schedule would begin executing on the new cluster. If the backup /
source cluster was still available and executing the schedule, then the two
schedules would compete to run, as outlined in #108028.

This patch prevents this poor UX by pausing the backup schedule if the schedule
realizes it is running on a new cluster. It is then up to the user to resume
backups on the new cluster and prevent the backup collision problem.

Fixes #108028

Release note (sql change): if a scheduled backup resumes on a new cluster (e.g.
after C2C cutover or a tenant restore), the backup schedule will pause. The
user may resume the schedule without changing it, but should take special care
to ensure no other schedule is backing up to the same collection. The user may
also want to cancel the paused schedule and start a new one.

@msbutler msbutler self-assigned this Oct 2, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler changed the title Butler c2c cutover schedule c2c/restore: pause backup schedule post c2c cutover and cluster restore Oct 2, 2023
@msbutler msbutler force-pushed the butler-c2c-cutover-schedule branch 3 times, most recently from c273cba to e215128 Compare October 2, 2023 19:26
@msbutler msbutler marked this pull request as ready for review October 2, 2023 19:27
@msbutler msbutler requested review from a team as code owners October 2, 2023 19:27
@msbutler msbutler requested review from dt, jayshrivastava, srosenberg, smg260 and adityamaru and removed request for a team, jayshrivastava, srosenberg and smg260 October 2, 2023 19:27
@msbutler
Copy link
Collaborator Author

msbutler commented Oct 2, 2023

@dt @adityamaru This is ready for a look. The first two commits are getting reviewed in other prs: Test server change is here, and the clusterID/Version scheduleDetails change here.

I have one high level design q: should I check for the clusterID in the scheduledBackupExecutor implementation of ExecuteJob (current approach), or should I add a new public method to the ScheduledJobExecutor interface along the lines of CheckForNewCluster()? I could go either way.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2¢: firing up a whole c2c end to end test with two tenants and cutover and schedules all involved feels like a kinda expensive way to cover this one behavior of schedules to me, and far away from the code in question. I might vote to just test this by manually creating a schedule with a different (and empty!) creator in create_scheduled_backup_test.go, removing all streamingccl/ diffs?

pkg/ccl/backupccl/schedule_exec.go Outdated Show resolved Hide resolved
pkg/ccl/backupccl/schedule_exec.go Outdated Show resolved Hide resolved
@msbutler msbutler force-pushed the butler-c2c-cutover-schedule branch 2 times, most recently from b2c50ae to 5840e76 Compare October 3, 2023 14:09
@msbutler
Copy link
Collaborator Author

msbutler commented Oct 3, 2023

@dt ready for another look! Added a better test in create_scheduled_backup_test.go

@msbutler msbutler changed the title c2c/restore: pause backup schedule post c2c cutover and cluster restore backupccl: pause scheduled backup if resumed on different cluster Oct 3, 2023
@msbutler msbutler force-pushed the butler-c2c-cutover-schedule branch 2 times, most recently from 050bc49 to 800de37 Compare October 4, 2023 20:10
Previously, after cluster restore or c2c cutover, a backed up/ replicated
backup schedule would begin executing on the new cluster. If the backup /
source cluster was still available and executing the schedule, then the two
schedules would compete to run, as outlined in cockroachdb#108028.

This patch prevents this poor UX by pausing the backup schedule if the schedule
realizes it is running on a new cluster. It is then up to the user to resume
backups on the new cluster and prevent the backup collision problem.

Fixes cockroachdb#108028

Release note (sql change): if a scheduled backup resumes on a new cluster (e.g.
after C2C cutover or a tenant restore), the backup schedule will pause. The
user may resume the schedule without changing it, but should take special care
to ensure no other schedule is backing up to the same collection. The user may
also want to cancel the paused schedule and start a new one.
msbutler added a commit to msbutler/cockroach that referenced this pull request Oct 5, 2023
This patch adds an e2e c2c test to verify that replicated backup schedules
pause on the destination tenant after cutover. This functionality was added
in cockroachdb#111578.

Informs cockroachdb#108028

Release note: none
@msbutler
Copy link
Collaborator Author

msbutler commented Oct 5, 2023

bors r=dt

@craig
Copy link
Contributor

craig bot commented Oct 5, 2023

Build succeeded:

@craig craig bot merged commit 9a1bb22 into cockroachdb:master Oct 5, 2023
8 checks passed
msbutler added a commit to msbutler/cockroach that referenced this pull request Oct 5, 2023
This patch adds an e2e c2c test to verify that replicated backup schedules
pause on the destination tenant after cutover. This functionality was added
in cockroachdb#111578.

Informs cockroachdb#108028

Release note: none
craig bot pushed a commit that referenced this pull request Oct 5, 2023
111804: streamingccl: verify backup schedule pauses after cutover r=stevendanna a=msbutler

This patch adds an e2e c2c test to verify that replicated backup schedules
pause on the destination tenant after cutover. This functionality was added
in #111578.

Informs #108028

Release note: none

Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

backupccl: ensure that only one cluster can write to a backup collection at a given time.
3 participants