
backup/restore: all tables in an incremental backup must be present in the full backup for restore to work #18633

Closed
dianasaur323 opened this issue Sep 20, 2017 · 23 comments


@dianasaur323 commented Sep 20, 2017

When a restore is run, it validates that all time ranges are accounted for to prevent the user from footgun-ing. Consider the following:

A full backup is run for (0, t1]
An incremental backup is run for (t1, t2]
An incremental backup is run for (t2, t3]

If the user tries to restore but only specifies the (0,t1] and (t2,t3] backups, we error instead of restoring incorrect data. Unfortunately, this check breaks if a new table was added in either of the incremental backups: it falsely concludes there is missing history for that table.

One potential fix is, when generating the export requests for an incremental backup, to use 0 as the start time for any table that was not present in the full backup (so in essence it becomes a full backup for that table).
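
For concreteness, a minimal SQL sketch of the failing sequence (the destinations and table names here are placeholders, not taken from the original report):

BACKUP DATABASE foo TO 'nodelocal://full';                                     -- full backup, covers (0, t1]
CREATE TABLE foo.new_table (id INT PRIMARY KEY);                               -- table added after the full backup
BACKUP DATABASE foo TO 'nodelocal://inc1' INCREMENTAL FROM 'nodelocal://full'; -- incremental, covers (t1, t2]
-- The restore below fails the coverage check: the full backup contains no
-- history for foo.new_table, so it looks like a time range is missing.
RESTORE foo.* FROM 'nodelocal://full', 'nodelocal://inc1';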

@dianasaur323 dianasaur323 added this to the 1.1 milestone Sep 20, 2017
@danhhz commented Sep 21, 2017

You need to specify both the full and incremental backups to restore from an incremental backup: https://www.cockroachlabs.com/docs/stable/restore.html#restore-from-incremental-backups
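
For reference, a rough sketch of that pattern with placeholder destinations (not from this issue):

BACKUP DATABASE foo TO 'nodelocal://full';
BACKUP DATABASE foo TO 'nodelocal://inc1' INCREMENTAL FROM 'nodelocal://full';
BACKUP DATABASE foo TO 'nodelocal://inc2' INCREMENTAL FROM 'nodelocal://full', 'nodelocal://inc1';
-- The restore must list the full backup and every incremental, in order:
RESTORE DATABASE foo FROM 'nodelocal://full', 'nodelocal://inc1', 'nodelocal://inc2';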

@danhhz danhhz closed this as completed Sep 21, 2017
@dianasaur323

You might have to re-explain this to me. Also, I'm sorry - I pasted in the wrong query, since I tried the two commands in both orders to see if the documentation was wrong.

Don't I have both the full (crdbcsvtest/database) and the incremental (crdbcsvtest/database_inc) specified above?

@danhhz commented Sep 21, 2017

Yeah, talked to Diana offline and it looks like there's a real issue here (which was hidden by the initial copy-paste error). At a high level, if a table is added after a full backup and then an incremental backup is run, we can't restore that table from the incremental backup. In the 1.1 timeframe, this will have to be a known limitation. I'm going to replace the issue text with a technical description of what's going on and think some more about how we'd fix this.

@danhhz danhhz reopened this Sep 21, 2017
@danhhz danhhz changed the title backup/restore: incremental restore fails backup/restore: all tables in an incremental backup must be present in the full backup for restore to work Sep 21, 2017
@danhhz commented Sep 21, 2017

@cuongdo unless there's an easier fix I'm missing, this is going to require that we have more than one start time associated with a backup, which means changes to the BackupDescriptor proto that we serialize next to a backup as well as the BackupDetails in the jobs table. Which likely qualifies this as a 1.1 known limitation and a 1.2 fix. Thoughts?

@danhhz danhhz assigned cuongdo and unassigned danhhz Sep 21, 2017
@benesch commented Sep 21, 2017

Could this be a potential 1.1 workaround? Haven't actually tried it yet.

BACKUP DATABASE foo TO 'nodelocal://a';
CREATE TABLE foo.new (id INT PRIMARY KEY);
BACKUP DATABASE foo TO 'nodelocal://b' INCREMENTAL FROM 'nodelocal://a';
RESTORE foo.* FROM 'nodelocal://a', 'nodelocal://b';
-- Oh te noes!
BACKUP TABLE foo.new TO 'nodelocal://c';
RESTORE foo.* FROM 'nodelocal://a', 'nodelocal://c', 'nodelocal://b';

@danhhz commented Sep 21, 2017

Nice idea, though I think it will reject that, since foo.new is present at overlapping times in b and c.

@danhhz commented Sep 21, 2017

The really unfortunate thing here is that trying to restore CREATE TABLE a; BACKUP; CREATE TABLE b; BACKUP INCREMENTAL FROM will not work for either a or b. The more I think about this, the more I'm convinced that even if we can't get b working, maybe we can get a working in something that's cherry-pickable into 1.1. Otherwise adding tables and using incremental backups are more or less incompatible, which is a pretty big problem.
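
Spelled out with placeholder names and destinations, that sequence is roughly:

CREATE TABLE db.a (id INT PRIMARY KEY);
BACKUP DATABASE db TO 'nodelocal://full';
CREATE TABLE db.b (id INT PRIMARY KEY);
BACKUP DATABASE db TO 'nodelocal://inc' INCREMENTAL FROM 'nodelocal://full';
-- Restoring from this pair fails the coverage check because db.b has no
-- history in the full backup, so neither db.a nor db.b can be restored here.
RESTORE db.* FROM 'nodelocal://full', 'nodelocal://inc';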

@cuongdo cuongdo assigned dt and unassigned cuongdo Sep 28, 2017
@jseldess

Documented this 1.1 limitation in cockroachdb/docs#1990.

@dianasaur323

@dt @mjibson I just noticed that we didn't talk about this issue during our bug squashing meeting. I can't remember whether or not you two are already working on this, but if not, I'm assuming we want to get to this in 1.2?

@benesch commented Oct 19, 2017

The worst of this is already fixed, even in 1.1: #19286
Basically, if your incremental backup would not be restorable due to this bug, we no longer let you create it.
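
In other words (placeholder paths; the exact error text isn't quoted here):

BACKUP DATABASE foo TO 'nodelocal://full';
CREATE TABLE foo.new_table (id INT PRIMARY KEY);
-- With #19286, the following incremental backup is rejected up front
-- instead of silently producing a backup that could never be restored.
BACKUP DATABASE foo TO 'nodelocal://inc' INCREMENTAL FROM 'nodelocal://full';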

@dt commented Oct 19, 2017

Ditto what @benesch said, with the added point that we might want to "fix" the issue (not just error) in 1.2. Doing so "just" requires adding more granular time-bounds information to the backup metadata, then using that to determine whether the previous backups indeed cover the right tables over all of time.

However, there's a UX question of whether and when we actually want to do that -- automatically including what is essentially a full backup of one or more of the tables when doing an "incremental" backup.

One use case (A):
nightly BACKUP DATABASE widgetco TO <today> INCREMENTAL FROM <yesterday>.
You add a new table to, or drop a table from, widgetco.

Another use case (B):
BACKUP users TO <today> followed by BACKUP users, products INCREMENTAL FROM <yesterday> the next day.

And finally, an almost silly case (C):
BACKUP users TO foo followed by BACKUP orders TO bar INCREMENTAL FROM foo.

In all three cases, the set of tables in the first backup does not match the set of tables being backed up.

(A) seems like it should probably Just Work.
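
A concrete rendering of (A), with placeholder destinations and a hypothetical new table:

BACKUP DATABASE widgetco TO 'nodelocal://monday';
CREATE TABLE widgetco.gadgets (id INT PRIMARY KEY);   -- added between the nightly backups
BACKUP DATABASE widgetco TO 'nodelocal://tuesday' INCREMENTAL FROM 'nodelocal://monday';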

(C) seems more likely to mean the operator has just mistakenly pointed at the wrong previous backup. A full backup might be much bigger or more expensive, so they may be unpleasantly surprised when their "incremental" backup contains the entire orders table.

(B) isn't quite as clear cut -- under the hood it is the same as (A) except the new table might be old/huge, so it has some of the same potential for unpleasant surprises as (C).

One possible rule that catches (C) would be to require some overlap between the previous and current backups.

Another option might be to set the start of the time range that must be covered to the creation time of the table. Then a previous backup that doesn't include a new table is OK.

@dianasaur323

@dt I think you're right that cases B and C seem odd. The customer who ran into this was backing up the entire database, not adding new tables to an existing backup -- that definitely seems like a weird edge case I don't particularly think we need to support. If they want to add a table that existed prior to the full backup's timestamp to an existing full backup, I think it's reasonable to force them to run a new full backup.

@dt commented Oct 19, 2017

Backing up a database with nightly incrementals is, I expect, the default use case, so I think it would be ideal if that Just Works, even if you add/drop/truncate tables in that DB. Under the hood, supporting that is about the same as supporting B and C, but IMO B and C look like usage errors that I'd expect to fail rather than silently de-incrementalize themselves.

@dianasaur323

Makes sense to me! How much work do you think it would take to support this? I'm worried about adding more things to our full plate.

@dt commented Oct 19, 2017

@danhhz @benesch off the top of my head, the obvious but complicated approach we already discussed seems to be to start keeping per-span time bounds and then de-incrementalize new keyspace. To reject B and C, we'd want to check something like table creation time.

Alternatively, I think we could also relax the coverage requirement to start at table creation time rather than time 0?

@danhhz commented Oct 19, 2017

What are you thinking of using as "table creation time"? The MVCC timestamp doesn't work (consider backing up a table that was created via RESTORE: all the restored data will have MVCC timestamps that are less than the descriptor's), and I don't see a creation time on TableDescriptor.

My personal opinion is that the easiest thing to do correctly is handle all of A, B, and C by making start time per-file instead of per-backup. Then you have to figure out the UX issues of a hybrid full/incremental backup but that seems tractable.

@dt commented Oct 19, 2017

I was thinking we'd put an HLC timestamp in the table descriptor. Old tables would have 0, which is fine since that matches current behavior; new tables would have it set, and it's only new tables where it matters.

@danhhz commented Oct 19, 2017

That could work. I started to think through some of the edge cases (txn writing the desc gets pushed, etc), but as long as it's not a tight lower bound, I don't think they're so bad.

I still think making start time per-file is the way to go, but your call.

@dt commented Oct 19, 2017

I don't think (C) should work -- if you said "incremental" but just pointed at the wrong backup, silently switching to a full backup and ignoring the unrelated base backup seems likely to do more harm than good. It reduces operational predictability, suddenly running a longer/bigger/more expensive operation than the administrator expected.

That's why I was thinking it might be nice if, instead of expanding the window of changes we capture (and thus potentially capturing more than was intended), we narrow the required range.

@dt commented Oct 19, 2017

That said, I could go either way on (B) working. If we do want to support it, I think that implies per-file time bounds / per-range start times, in which case maybe we just reject (C) with an explicit intersection check if we want that. Hmm.

@danhhz commented Oct 19, 2017

Yeah. Disallowing (C) and perhaps warning or something on (B) is what I meant by "Then you have to figure out the UX issues of a hybrid full/incremental backup".

@dt commented Oct 19, 2017

In a lunch discussion with mjibson, it seemed like revision_history potentially gets a little confusing with "automatically upgrade to full backup" semantics, since we can't actually capture revision history for the newly included tables from time 0.

@dianasaur323

@dt this is only a problem in case B, right?

@dianasaur323 dianasaur323 added this to Features/UX in Bulk IO Oct 31, 2017
@dt dt closed this as completed Jan 18, 2018
@dt dt removed this from Features/UX in Bulk IO Jan 18, 2018