rfc: backup/restore/ingest (#8966)
Conversation
I was hitting the limit of my knowledge of the system, so there are a few fuzzy details left. Please poke holes in what's here.
Review comments on docs/RFCS/backup_restore.md [r1]:

- Line 60: WriteBatches are faster to apply than higher-level formats, or at least they should be. This difference may be irrelevant since we have to rekey the backup anyway to apply it. (A toy rekeying sketch follows this list.)
- Line 76 ("Must be no greater than"): s/skew/offset/g
- Line 77: If the DistSender supported parallel RPCs, could we use that instead of building a new custom scheduler?
- Line 85: What kind of RPC?
- Line 86: "In the past" needs to be defined precisely. Does this operation require the range lease? What exactly does it mean to flush RocksDB?
- Line 89: Alternately, the backup RPC could continue and do a (potentially) remote Scan to get the data from the right side of the split.
- Line 113: Is it a requirement to be able to do an incremental backup on top of other incremental backups that don't cover the whole key range? It seems simpler to require that there be one "last backup" timestamp for the whole backup job.
- Line 117: Could we add this to RocksDB? I haven't looked closely enough at how their filter interface works to see if this is feasible, but Bigtable had this kind of metadata associated with each sstable. (A second sketch after this list illustrates the idea.)
- Line 182: Why would in-place restore be desirable (as opposed to restoring to a new ID and swapping with a rename)?
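For illustration of the rekeying mentioned in the line 60 and line 182 comments, here is a minimal Go sketch of rewriting a backed-up key's table-ID prefix so the data can be restored under a fresh table ID and then swapped in by rename. The key layout and function names are hypothetical simplifications; CockroachDB's real keys use varint-encoded prefixes from its keys/encoding packages.

```go
// rekey_sketch.go
//
// Illustrative only: the fixed-width table-ID prefix below stands in for
// CockroachDB's real /Table/<id>/... key encoding.
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// makeKey prepends a fixed-width table-ID prefix to a row suffix.
func makeKey(tableID uint64, suffix []byte) []byte {
	key := make([]byte, 8, 8+len(suffix))
	binary.BigEndian.PutUint64(key, tableID)
	return append(key, suffix...)
}

// rekey rewrites a backed-up key from the table ID it was captured under to
// the table ID it is being restored as. Restoring into a fresh ID (and then
// renaming) means the restore never has to overwrite live data in place.
func rekey(key []byte, oldID, newID uint64) ([]byte, error) {
	oldPrefix := makeKey(oldID, nil)
	if !bytes.HasPrefix(key, oldPrefix) {
		return nil, fmt.Errorf("key %x is not part of table %d", key, oldID)
	}
	return makeKey(newID, key[len(oldPrefix):]), nil
}

func main() {
	backedUp := makeKey(51, []byte("pk-row-1"))
	restored, err := rekey(backedUp, 51, 73)
	if err != nil {
		panic(err)
	}
	fmt.Printf("backup key:  %x\nrestore key: %x\n", backedUp, restored)
}
```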
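And a second sketch, for the per-sstable metadata idea in the line 117 comment: tracking min/max MVCC timestamps per file would let an incremental backup skip files with no writes after the last backup. The types here are invented for the sketch; in RocksDB this information could plausibly be recorded via a table-properties collector, but that is an assumption, not something the RFC specifies.

```go
// tsbounds_sketch.go
//
// Illustrative only: hypothetical per-sstable timestamp bounds.
package main

import "fmt"

// tsBounds tracks the min and max MVCC timestamps seen in one sstable.
type tsBounds struct {
	min, max int64
	seen     bool
}

func (b *tsBounds) add(ts int64) {
	if !b.seen || ts < b.min {
		b.min = ts
	}
	if !b.seen || ts > b.max {
		b.max = ts
	}
	b.seen = true
}

// mayContainWritesAfter reports whether the sstable could hold any version
// newer than the last backup's timestamp. An incremental backup can skip
// files for which this returns false without opening them.
func (b *tsBounds) mayContainWritesAfter(lastBackup int64) bool {
	return b.seen && b.max > lastBackup
}

func main() {
	var bounds tsBounds
	for _, wallTime := range []int64{100, 140, 90} {
		bounds.add(wallTime)
	}
	fmt.Println("skip for incremental since t=150:", !bounds.mayContainWritesAfter(150))
	fmt.Println("skip for incremental since t=120:", !bounds.mayContainWritesAfter(120))
}
```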
Thanks for the comments! I responded to a couple and am going to need to think through some of the others.
Further review comments on docs/RFCS/backup_restore.md [r1]:

- Line 91: Why do we need all MVCC entries instead of just the most recent one per key? Assuming the default GC of 24h, if a backup is restored more than 24h after it was taken, everything but the most recent MVCC entry will immediately be collected during the next GC cycle. This seems like an easy way to reduce backup/restore time without losing any data that a user would have access to anyway. (Near the end of this RFC you say restore would only use the most recent entry, so I think something is off.) A sketch of this filtering follows the list.
- Line 110: What does consistency mean here? That the two table descriptors are identical? What if a column/index has been added?
- Line 146: We discussed punting on this issue and making the client the coordinator for now, since this depends on yet another feature that lacks an RFC. Does it make sense to do the simpler approach for now?
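As a minimal sketch of the filtering suggested in the line 91 comment (keep only the newest version of each key at or below the backup timestamp), assuming simplified stand-in types rather than CockroachDB's actual MVCC structures:

```go
// mvcc_latest_sketch.go
//
// Illustrative only: hypothetical filter for "most recent MVCC entry per key".
package main

import (
	"fmt"
	"sort"
)

type versionedKV struct {
	key      string
	wallTime int64 // MVCC timestamp of this version
	value    string
}

// latestAsOf keeps, for each key, only the newest version at or below the
// backup timestamp. Older versions would be garbage-collected shortly after
// restore anyway (given the default 24h GC TTL), so dropping them shrinks
// the backup without losing user-visible data.
func latestAsOf(entries []versionedKV, backupTS int64) []versionedKV {
	latest := make(map[string]versionedKV)
	for _, e := range entries {
		if e.wallTime > backupTS {
			continue // written after the backup's read timestamp
		}
		if cur, ok := latest[e.key]; !ok || e.wallTime > cur.wallTime {
			latest[e.key] = e
		}
	}
	out := make([]versionedKV, 0, len(latest))
	for _, e := range latest {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].key < out[j].key })
	return out
}

func main() {
	entries := []versionedKV{
		{"a", 10, "a@10"}, {"a", 20, "a@20"}, {"a", 99, "a@99"},
		{"b", 15, "b@15"},
	}
	fmt.Println(latestAsOf(entries, 50)) // [{a 20 a@20} {b 15 b@15}]
}
```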
(force-pushed from 910ba6e to 35d3526)
Sorry for the delay, RFAL.
(force-pushed from 6f5838b to c9d75d5)
@bdarnell any outstanding concerns?
LGTM
(force-pushed from c9d75d5 to fc9f2b8)
Any durable datastore is expected to have the ability to save a snapshot of
data and later restore from that snapshot. Even in a system that can
gracefully handle a configurable number of node failures, there are other
motivations: a general sense of security, "Oops I dropped a table", legally
required data archiving, and others.

Additionally, in an era where it's easy and useful to produce datasets in the
hundreds of gigabytes or terabytes, CockroachDB has an opportunity for a
competitive advantage. First-class support for using consistent snapshots as
an input to a bulk data pipeline (without any possibility of affecting
production traffic), as well as the ability to serve their output very
quickly, could be a deciding factor for potential customers.

Closes #8191.