Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: add deprecation notice for backup cmd with explicit subdir #79447

Merged
merged 1 commit into from Apr 8, 2022

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Apr 5, 2022

backupccl: add deprecation notice for backup cmd with explicit subdir

Release note (sql change): Previously, Backup commands allowed the user to
specify a custom subdirectory name for their backups via BACKUP .. INTO <subdir> IN <collectionURI>. After this change, this will no longer be
supported. Users can only create a full backup via Backup ... INTO <collectionURI> or an incremental backup on the latest full backup in their
collection via BACKUP ... INTO LATEST IN <collectionURI>. This deprecation
also removes the need to address a bug in SHOW BACKUPS IN which cannot display
user defined subdirectories.

@msbutler msbutler added A-disaster-recovery T-disaster-recovery branch-release-22.1 Used to mark release blockers and technical advisories for 22.1 labels Apr 5, 2022
@msbutler msbutler requested review from dt and benbardin April 5, 2022 19:00
@msbutler msbutler self-assigned this Apr 5, 2022
@msbutler msbutler requested a review from a team April 5, 2022 19:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler
Copy link
Collaborator Author

msbutler commented Apr 6, 2022

One nuance to note: if the latest backup in a collection is one with a user defined subdir, an incremental backup with LATEST will not issue a warning. I think this is fine.

# Enter \? for a brief introduction.
#
demo@127.0.0.1:26257/movr> BACKUP INTO 'sub' IN 'userfile:///foo';
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+---------
  750985278723489793 | succeeded |                  1 | 2602 |          1052 | 474395
(1 row)

NOTICE: BACKUP commands with an explicitly specified subdirectory will be removed in a future release. Users can create a full backup via `BACKUP ... INTO <collectionURI>`, or an incremental backup on the latest full backup in their collection via `BACKUP ... INTO LATEST IN <collectionURI>`

Time: 252ms total (execution 251ms / network 0ms)

demo@127.0.0.1:26257/movr> BACKUP INTO LATEST IN 'userfile:///foo';
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+---------
  750985434582843393 | succeeded |                  1 |   41 |            47 | 251706
(1 row)


Time: 290ms total (execution 290ms / network 0ms)

@@ -792,6 +792,14 @@ func backupPlanHook(
initialDetails.Destination.Subdir = latestFileName
} else if subdir != "" {
initialDetails.Destination.Subdir = "/" + strings.TrimPrefix(subdir, "/")
// Deprecation notice for `BACKUP INTO` syntax with an explicit subdir.
// Remove this once the syntax is deleted in 22.2.
p.BufferClientNotice(ctx,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand correctly - we want to deprecate this in BACKUP, but not RESTORE, right? So they can restore from arbitrary points in a series of incremental layers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct! Users of SHOW BACKUP and RESTORE should still be able to pick the specific the full backup in a collection.

Copy link
Collaborator

@benbardin benbardin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you!

@msbutler msbutler added backport-22.1.x 22.1 is EOL and removed branch-release-22.1 Used to mark release blockers and technical advisories for 22.1 labels Apr 6, 2022
Release note (sql change): Previously, Backup commmands allowed the user to
specify a custom subdirectory name for their backups via `BACKUP .. INTO
<subdir> IN <collectionURI>`. After this change, this will no longer be
supported. Users can only create a full backup via `Backup ... INTO
<collectionURI>` or an incremental backup on the latest full backup in their
collection via `BACKUP ... INTO LATEST IN <collectionURI>`. This deprecation
also removes the need to address a bug in `SHOW BACKUPS IN` which cannot display
user defined subdirectories.
@msbutler msbutler force-pushed the butler-backup-deprecate-subdir branch from b228a8c to 7d2d1d2 Compare April 7, 2022 21:37
@msbutler
Copy link
Collaborator Author

msbutler commented Apr 8, 2022

bors r=benbardin

@craig
Copy link
Contributor

craig bot commented Apr 8, 2022

Build succeeded:

@dt
Copy link
Member

dt commented Apr 8, 2022

Huh, I thought we only wanted to deprecate this only if there is not an existing backup there?

I believe it is perfectly fine to say BACKUP TO x/y/z IN abc if x/y/z is an existing full backup in abc -- that is how you indicate you want to add an incremental backup to x/y/z.

The deprecation here was motivated by our desire to limit how new possible values of x/y/z are added to abc, specifically to have them only be the date-based values we choose during backup into abc. But once there is a value of x/y/z in abc, I don't think we care how you say you want to backup into it, i.e. whether you say so explicitly or via reading latest .

@msbutler
Copy link
Collaborator Author

msbutler commented Apr 8, 2022

hm, i guess you're right. At first I didn't see a use case for creating an incremental backup for the non latest full backup in a collection; but if something goes wrong with the latest full backup chain in a collection, you may want to increment on an earlier backup. I'll fix this.

@dt
Copy link
Member

dt commented Apr 8, 2022

use case for creating an incremental backup for the non latest full backup

You could also be backing up AS OF a particular time that is before the end time of the chain in latest, e.g. if latest has already captured some fat-finger mistake so you're going back and backing up AS OF right before you e.g. ran the UPDATE without the WHERE, into the previous chain.

@msbutler
Copy link
Collaborator Author

msbutler commented Apr 8, 2022

hmmmmm, now that is an intriguing use case I hadn't thought about, which implies we should clarify some things in the code:

  1. Suppose the user runs: BACKUP INTO LATEST IN dest AS OF SYSTEM TIME earlyTime. This will always increments to the latest full backup in the collection. The last branch in this conditional never gets reached if LATEST is specified. Is that our desired behavior? LATEST trumps everything?
			if backupStmt.AppendToLatest {
				initialDetails.Destination.Subdir = latestFileName
			} else if subdir != "" {
				initialDetails.Destination.Subdir = "/" + strings.TrimPrefix(subdir, "/")
				// Deprecation notice for `BACKUP INTO` syntax with an explicit subdir.
				// Remove this once the syntax is deleted in 22.2.
				p.BufferClientNotice(ctx,
					pgnotice.Newf("BACKUP commands with an explicitly specified"+
						" subdirectory will be removed in a future release. Users can create a full backup via `BACKUP ... "+
						"INTO <collectionURI>`, or an incremental backup on the latest full backup in their "+
						"collection via `BACKUP ... INTO LATEST IN <collectionURI>`"))

			} else {
				initialDetails.Destination.Subdir = endTime.GoTime().Format(DateBasedIntoFolderName)
			}
  1. Or, say the user runs BACKUP INTO someTimeSubdir IN dest AS OF SYSTEM TIME earlyTime. again, because of the code above, the explicit subdir overrides the subdir AS OF SYSTEM TIME wants to create.

  2. In fact, with the collection based syntax, the only way to create a full backup chain with an AS OF SYTEM TIME subdir is to specify the time twice BACKUP INTO myTime IN dest AS OF SYSTEM TIME myTime.

I think we should discuss next week and create another issue with a ga blocker? I think we never caught this bc all of the as of system time tests use the old backup syntax :(

Another complication: this ambiguity is in 21.2.

@benbardin
Copy link
Collaborator

use case for creating an incremental backup for the non latest full backup

You could also be backing up AS OF a particular time that is before the end time of the chain in latest, e.g. if latest has already captured some fat-finger mistake so you're going back and backing up AS OF right before you e.g. ran the UPDATE without the WHERE, into the previous chain.

Wouldn't we still want that to be done on top of LATEST?

@dt
Copy link
Member

dt commented Apr 8, 2022

Wouldn't we still want that to be done on top of LATEST?

Eh, not necessarily. Say it is now 22-04-08 09:15, and you have backup /22/4/7 with inc backups at 06, 12, 18 and then /22/4/8 with one inc backup at 06. You look at the logs and go "oh no, we ran this bad UPDATE last night at 21:30". You might then do BACKUP INTO /22/4/7 IN abc AS OF SYSTEM TIME 22-04-07 21:29:29 to extend the 22/4/7 chain from 18 to that last known-good time before the bad thing happened (21:29). You don't want to extend the latest chain, /22/4/8, since it is starts after the time in question (22/4/7-21:29).

@benbardin
Copy link
Collaborator

Sorry, I was ambiguous with "we" :) Users for sure might want to do what you describe, but don't we as CRDB developers think that's too complicated to be a good idea? I.e. the simplicity of "all incremental backups go on LATEST" seems really powerful to me - kinda feels like asking for trouble to help users make trees of backups that we can't really help them manage.

@dt
Copy link
Member

dt commented Apr 8, 2022

Eh, I dunno. I think these should be separate concerns. "LATEST" as just a shorthand/helper for picking a value of x/y/z to which you want to (try to) append. If you say "LATEST", we read go read the value that that points to for you and substitute it into your command. But everything from then on should not care how you arrived at "x/y/z", just what is or isn't in the collection at x/y/z. I think the real distinction is whether you said a subdirectory at all, not whether it was implicit latest or explicit.

@msbutler
Copy link
Collaborator Author

msbutler commented Apr 8, 2022

What if we add the following guardrail: if the user passes AS OF SYSTEM TIME, fail fast if the LATEST subdir (or the explicitly passed time based subdir) is a timestamp after the provided AOST?

msbutler added a commit to msbutler/cockroach that referenced this pull request Apr 8, 2022
Release note (sql change): cockroachdb#79447 mistakingly warned users that
explicit time based backup subdirectories were deprecated. CRDB still welcomes
these! Non time based subdirectories will be deprecated.
@dt
Copy link
Member

dt commented Apr 8, 2022

I'd step back:

  1. If the user passes a subdirectory at all -- either explicitly or as LATEST (so subdirectory!="" || appendToLatest) -- then if that subdirectory does not have a backup in it when we go to resolve it i.e. we elect to make a full backup there, that is deprecated in 22.1; we warn that falling back to a creating a full backup when attempting to append an incremental backup to a missing backup is deprecated. 22.2 makes it a hard failure.

  2. if a user attempts to use an AOST when appending -- again, via latest or explicit -- and that time is less than start time of the prior backup they are appending to -- again, no matter how we found it -- hard fail. end cannot be before start (this might already be this way, but we should check/test using collections).

@dt
Copy link
Member

dt commented Apr 8, 2022

More concretely, that'd probably mean:
a) remove the third case in the code you quoted that is run during planning, where we populate initialDetails.Destination.Subdir with the timestamp-based name. Instead, planning will leave Destination.Subdir empty to indicate "did not indicate into existing backup (aka subdir) in the collection".
b) Then, during the real resolution in the execution phase, where we're actually ready to fully resolve the actual incremental against real cloud storage, we take one of two paths:

  1. if subdirectory is empty then, we are making a new full backup; we do the endTime.GoTime().Format(DateBasedIntoFolderName) and verify that that path does indeed not already contain a backup (prevent overwrite), or
  2. else if subdirectory is not empty, we have been told to backup into an existing backup (append). if it is "latest" we do the lookup and replacement. then we go try to do an inc backup to the backup chain rooted there. If there isn't a backup there, this should fail.

@msbutler
Copy link
Collaborator Author

msbutler commented Apr 8, 2022

sounds good. I can file a ga blocked issue.

msbutler added a commit to msbutler/cockroach that referenced this pull request Apr 12, 2022
Informs cockroachdb#79674, cockroachdb#79672

The changes in this PR sprang out of discussions in cockroachdb#79447

Release note (sql change): This patch introduces a few UX guardrails:

command. After further discussion, we realized explicit subdirectories were
useful for running incremental backups, but not for full backups. To that end,
this pr throws a deprecation warning if: a) a user passes a subdirectory; b)
there does not already exist a full backup in that subdirectory.

Discussion in cockroachdb#79447 also lead to a discovery of two
bugs for backups with AS OF SYSTEM TIME:

Previously, a user could run an AS OF SYSTEM TIME
incremental backup with an end time earlier than the previous backup's end time
, which could lead to an out of order incremental backup chain. This PR causes
the incremental backup to fail if the AS OF SYSTEM TIME is less than the
previous backup's end time.

Previously, if a user ran `BACKUP INTO dest AS OF SYSTEM TIME t` and a full
backup subdirectory already existed at t, the backup would mistakingly increment
on that full backup instead of failing. Without the IN keyword, the user expects
a full backup, not an incremental backup. In this patch, the full backup fails
when it detects a full backup already exists at the resolved subdirectory.
msbutler added a commit to msbutler/cockroach that referenced this pull request Apr 18, 2022
Informs cockroachdb#79674, cockroachdb#79672

The changes in this PR sprang out of discussions in cockroachdb#79447 which deprecated the
ability to pass an explicit subdirectory in any backup command. This PR tweaks
that deprecation warning and adds further UX guardrails.

Release note (sql change): This patch introduces a few UX guardrails, including
one breaking change:

[breaking]: After further discussion, we realized explicit subdirectories were
useful for running incremental backups, but not for full backups. To that end,
this pr throws an error if: a) a user passes a subdirectory; b)
there does not already exist a full backup in that subdirectory. The user can
enable this deprecated syntax by switching the new
bulkio.backup.deprecated_full_backup_with_subdir cluster setting to true.

Discussion in cockroachdb#79447 also lead to a discovery of two
bugs for backups with AS OF SYSTEM TIME:

Previously, a user could run an AS OF SYSTEM TIME
incremental backup with an end time earlier than the previous backup's end time
, which could lead to an out of order incremental backup chain. This PR causes
the incremental backup to fail if the AS OF SYSTEM TIME is less than the
previous backup's end time.

Previously, if a user ran `BACKUP INTO dest AS OF SYSTEM TIME t` and a full
backup subdirectory already existed at t, the backup would mistakingly increment
on that full backup instead of failing. Without the IN keyword, the user expects
a full backup, not an incremental backup. In this patch, the full backup fails
when it detects a full backup already exists at the resolved subdirectory.
msbutler added a commit to msbutler/cockroach that referenced this pull request Apr 18, 2022
Informs cockroachdb#79674, cockroachdb#79672

The changes in this PR sprang out of discussions in cockroachdb#79447 which deprecated the
ability to pass an explicit subdirectory in any backup command. This PR tweaks
that deprecation warning and adds further UX guardrails.

Release note (backward-incompatible change): This patch introduces a few UX
guardrails, including one breaking change:

[breaking]: After further discussion, we realized explicit subdirectories were
useful for running incremental backups, but not for full backups. To that end,
this pr throws an error if: a) a user passes a subdirectory; b)
there does not already exist a full backup in that subdirectory. The user can
enable this deprecated syntax by switching the new
bulkio.backup.deprecated_full_backup_with_subdir cluster setting to true.

Discussion in cockroachdb#79447 also lead to a discovery of two
bugs for backups with AS OF SYSTEM TIME:

Previously, a user could run an AS OF SYSTEM TIME
incremental backup with an end time earlier than the previous backup's end time
, which could lead to an out of order incremental backup chain. This PR causes
the incremental backup to fail if the AS OF SYSTEM TIME is less than the
previous backup's end time.

Previously, if a user ran `BACKUP INTO dest AS OF SYSTEM TIME t` and a full
backup subdirectory already existed at t, the backup would mistakingly increment
on that full backup instead of failing. Without the IN keyword, the user expects
a full backup, not an incremental backup. In this patch, the full backup fails
when it detects a full backup already exists at the resolved subdirectory.
craig bot pushed a commit that referenced this pull request Apr 19, 2022
79799: backupccl: add UX guardrails during backup subdirectory resolution r=dt a=msbutler

Informs #79674, #79672

The changes in this PR sprang out of discussions in #79447 which deprecated the
ability to pass an explicit subdirectory in any backup command. This PR tweaks
that deprecation warning and adds further UX guardrails.

Release note (backward-incompatible change): This patch introduces a few UX guardrails, including
one breaking change:

[breaking]: After further discussion, we realized explicit subdirectories were
useful for running incremental backups, but not for full backups. To that end,
this pr throws an error if: a) a user passes a subdirectory; b)
there does not already exist a full backup in that subdirectory. The user can
enable this deprecated syntax by switching the new
bulkio.backup.deprecated_full_backup_with_subdir cluster setting to true.

Discussion in #79447 also lead to a discovery of two
bugs for backups with AS OF SYSTEM TIME:

Previously, a user could run an AS OF SYSTEM TIME
incremental backup with an end time earlier than the previous backup's end time
, which could lead to an out of order incremental backup chain. This PR causes
the incremental backup to fail if the AS OF SYSTEM TIME is less than the
previous backup's end time.

Previously, if a user ran `BACKUP INTO dest AS OF SYSTEM TIME t` and a full
backup subdirectory already existed at t, the backup would mistakingly increment
on that full backup instead of failing. Without the IN keyword, the user expects
a full backup, not an incremental backup. In this patch, the full backup fails
when it detects a full backup already exists at the resolved subdirectory.

Co-authored-by: Michael Butler <butler@cockroachlabs.com>
msbutler added a commit to msbutler/cockroach that referenced this pull request Apr 19, 2022
Informs cockroachdb#79674, cockroachdb#79672

The changes in this PR sprang out of discussions in cockroachdb#79447 which deprecated the
ability to pass an explicit subdirectory in any backup command. This PR tweaks
that deprecation warning and adds further UX guardrails.

Release note (backward-incompatible change): This patch introduces a few UX
guardrails, including one breaking change:

[breaking]: After further discussion, we realized explicit subdirectories were
useful for running incremental backups, but not for full backups. To that end,
this pr throws an error if: a) a user passes a subdirectory; b)
there does not already exist a full backup in that subdirectory. The user can
enable this deprecated syntax by switching the new
bulkio.backup.deprecated_full_backup_with_subdir cluster setting to true.

Discussion in cockroachdb#79447 also lead to a discovery of two
bugs for backups with AS OF SYSTEM TIME:

Previously, a user could run an AS OF SYSTEM TIME
incremental backup with an end time earlier than the previous backup's end time
, which could lead to an out of order incremental backup chain. This PR causes
the incremental backup to fail if the AS OF SYSTEM TIME is less than the
previous backup's end time.

Previously, if a user ran `BACKUP INTO dest AS OF SYSTEM TIME t` and a full
backup subdirectory already existed at t, the backup would mistakingly increment
on that full backup instead of failing. Without the IN keyword, the user expects
a full backup, not an incremental backup. In this patch, the full backup fails
when it detects a full backup already exists at the resolved subdirectory.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants