backup/restore: all tables in an incremental backup must be present in the full backup for restore to work #18633
Comments
You need to specify both the full and incremental backups to restore from an incremental backup: https://www.cockroachlabs.com/docs/stable/restore.html#restore-from-incremental-backups |
You might have to re-explain this to me. Also, I'm sorry - I pasted in the wrong query, since I tried the two commands in both orders to see if the documentation was wrong. Don't I have both the full (crdbcsvtest/database) and the incremental (crdbcsvtest/database_inc) specified above? |
Yeah, talked to diana offline and it looks like there's a real issue here (which was hidden by the initial copy-paste error). At a high level, if a table is added after a full backup and then an incremental backup is run, we can't restore that table from the incremental backup. In the 1.1 timeframe, this will have to be a known limitation. I'm going to replace the issue text with a technical description of what's going on and think some more about how we'd fix this |
@cuongdo unless there's an easier fix I'm missing, this is going to require that we have more than one start time associated with a backup, which means changes to the BackupDescriptor proto that we serialize next to a backup as well as the BackupDetails in the jobs table. Which likely qualifies this as a 1.1 known limitation and a 1.2 fix. Thoughts? |
Could this be a potential 1.1 workaround? Haven't actually tried it yet.
BACKUP DATABASE foo TO 'nodelocal://a';
CREATE TABLE foo.new (k INT);
BACKUP DATABASE foo TO 'nodelocal://b' INCREMENTAL FROM 'nodelocal://a';
RESTORE foo.* FROM 'nodelocal://a', 'nodelocal://b';
-- Oh te noes!
BACKUP TABLE foo.new TO 'nodelocal://c';
RESTORE foo.* FROM 'nodelocal://a', 'nodelocal://c', 'nodelocal://b'; |
Nice idea, though I think it will reject that since foo.new is present in overlapping times in b and c |
The really unfortunate thing here is that trying to restore |
Documented this 1.1 limitation in cockroachdb/docs#1990. |
The worst of this is already fixed, even in 1.1: #19286 |
ditto what @benesch said, with the added point that we might want to "fix" the issue (not just error) in 1.2. Doing so "just" requires adding more granular time bounds information to the backup metadata, then using that to determine whether the previous backups indeed cover the right tables over all of time. However, there's a UX question of if/when we actually want to do that -- automatically include essentially a full backup of one or more of the tables when doing an "incremental" backup.

One use case (A):

Another use case (B):

And finally, an almost silly case (C):

In all three cases, the set of tables in the first backup does not match the set of tables being backed up. (A) seems like it should probably Just Work. (C) seems like it is more likely the operator has just mistakenly pointed at the wrong previous backup. A full backup might be much bigger or more expensive, so they may be unpleasantly surprised when their "incremental" backup contains the entire orders table. (B) isn't quite as clear cut -- under the hood it is the same as (A) except the new table might be old/huge, so it has some of the same potential for unpleasant surprises as (C).

One possible rule that catches (C) would be that there must be some overlap in the previous and current backup. Another possible option might be to set the start of the time range that must be covered to the creation time of the table. Then a previous backup that doesn't include a new table is OK. |
@dt I think you are right in that cases B and C seem odd. The customer who ran into this was backing up the entire database, not adding new tables to an existing backup -> that definitely seems like a weird edge case that I don't particularly think we need to support. If they want to add a table that existed prior to the full backup's timestamp to an existing full backup, I think it's reasonable to force them to run a full backup. |
backing up a database with nightly incrementals is, I expect, the default use case, so I think it would be ideal if that Just Works, even if you add/drop/truncate tables in that DB. Under the hood, supporting that is about the same as supporting B and C, but IMO, B and C look like usage errors that I'd expect to fail rather than silently de-incrementalize themselves. |
makes sense to me! what do you think the work would be to support this in terms of time? I'm worried about adding more things to our full plate. |
@danhhz @benesch off the top of my head, the obvious but complicated approach we already discussed seems to be to start keeping per-span time bounds and then de-incrementalize new keyspace. To reject B and C, we'd want to check something like table creation time. Alternatively, I think we could also relax the coverage requirement to start at table creation time rather than time 0? |
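A hypothetical sketch of that relaxed per-table check (Python for illustration only; `creation_time` is assumed metadata, since TableDescriptor has no such field):

```python
# Hypothetical sketch of relaxing the coverage floor from time 0 to the
# table's creation time. `intervals` holds one (start, end] pair per backup
# in which the table appears; `creation_time` is an assumed input, not an
# existing TableDescriptor field.

def table_fully_covered(intervals, creation_time, as_of):
    covered = creation_time  # the strict rule would start from 0
    for start, end in sorted(intervals):
        if start > covered:
            return False  # gap in this table's history
        covered = max(covered, end)
    return covered >= as_of
```

Under the strict floor of 0, a table created at t1 that only appears in backups covering (t1, ...] is rejected even though its full history is present; with the relaxed floor it passes, and old tables (creation time 0) behave exactly as today.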
What are you thinking of using as "table creation time"? The MVCC timestamp doesn't work (think about backing up a table that was created via RESTORE; all the restored data will have MVCC timestamps that are less than the descriptor's), and I don't see a creation time on TableDescriptor. My personal opinion is that the easiest thing to do correctly is handle all of A, B, and C by making start time per-file instead of per-backup. Then you have to figure out the UX issues of a hybrid full/incremental backup, but that seems tractable. |
I was thinking we'd put an HLC timestamp in the table descriptor -- old tables would have 0, which is fine since that matches current behavior, but new tables would have it set -- which is fine, since it is only new tables where it matters. |
That could work. I started to think through some of the edge cases (txn writing the desc gets pushed, etc), but as long as it's not a tight lower bound, I don't think they're so bad. I still think making start time per-file is the way to go, but your call. |
I don't think (C) should work -- if you said "incremental" but just pointed at the wrong backup, silently switching to a full backup and ignoring the unrelated base backup seems likely to do more harm than good -- it reduces operational predictability, suddenly running a longer/bigger/more expensive operation than the administrator expected. That's why I was thinking it might be nice if, instead of expanding the window of changes we capture (and thus potentially capturing more than was intended), we narrow the required range. |
That said, I can go either way on (B) working, which, if we do want to support, I think implies we want per-file time bounds / per-range start-times, in which case maybe we just reject (c) with an explicit intersection check if we want that. Hmm. |
Yeah. Disallowing (c) and perhaps warning or something on (b) is what I meant by "Then you have to figure out the UX issues of a hybrid full/incremental backup" |
in lunch discussion with mjibson, it seems like |
@dt this is only a problem in case B right? |
When a restore is run, it validates that all time ranges are accounted for to prevent the user from footgun-ing. Consider the following:
A full backup is run for (0, t1]
An incremental backup is run for (t1, t2]
An incremental backup is run for (t2, t3]
If the user tries to restore but only specifies the (0, t1] and (t2, t3] backups, we error instead of restoring incorrect data. Unfortunately, this check breaks if a new table was added in either of the incremental backups: the check falsely thinks there is missing history for that table.
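A rough sketch of that validation (Python for illustration, not the actual CockroachDB code), treating each backup as a half-open (start, end] interval of MVCC time:

```python
# Sketch of the restore-time footgun check described above: the supplied
# backups must cover a contiguous range of MVCC time starting at 0.

def validate_chain(intervals):
    """intervals: one (start, end] pair per supplied backup."""
    covered = 0
    for start, end in sorted(intervals):
        if start > covered:
            raise ValueError(f"no backup covers ({covered}, {start}]")
        covered = max(covered, end)
    return covered  # latest restorable timestamp
```

validate_chain([(0, 1), (1, 2), (2, 3)]) succeeds, while dropping the middle backup raises. Applied per table with the floor pinned at 0, the same check wrongly rejects a table that did not exist until t1.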
One potential fix: when generating the export requests for an incremental backup, use 0 as the start time for any table that was not present in the full backup (so, in essence, it's a full backup for that table).
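A minimal sketch of that fix, with illustrative names (not the real export-planning code):

```python
# Sketch of the proposed fix: when planning an incremental backup, any table
# missing from the base backup gets start time 0, so its portion of the
# "incremental" backup is effectively a full backup. Names are illustrative.

def plan_export_requests(tables, base_backup_tables, incr_start, end_time):
    requests = []
    for table in tables:
        start = incr_start if table in base_backup_tables else 0
        requests.append((table, start, end_time))
    return requests
```

For example, incrementally backing up tables "users" and "new_tbl" over (5, 9] against a base backup containing only "users" would export "users" from time 5 but "new_tbl" from time 0.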