backup: support encrypted backups in online restore#165454
backup: support encrypted backups in online restore#165454andrew-r-thomas wants to merge 3 commits intocockroachdb:masterfrom
Conversation
|
Merging to
|
bd58b13 to
c34f1f1
Compare
c34f1f1 to
d7ca0c0
Compare
| package cockroach.storage.enginepb; | ||
| option go_package = "github.com/cockroachdb/cockroach/pkg/storage/enginepb"; | ||
|
|
||
| // RemoteDecodingInfo carries metadata needed to read data from remote storage. |
There was a problem hiding this comment.
could we doc and enforce the invariant that that RemoteDecodingInfo should have the same fields as kvpb.FileEncryptionOptions? The only reason we need to define a new struct is to avoid a dependency cycle. FileEncryptionOptions appears to be how we encrypt/decrypt data for backup and conventional restore, so we should probably have the same fields in RemoteDecodingInfo
There was a problem hiding this comment.
Eh, do we want to enforce that? I'm not even sure kvpb.FileEncryptionOptions should continue to exist, at least not in kvpb, since AFAIK, we don't even use it in any kvrequests these days.
My impression is these are similar but distinct structs: This is "what does the storage.objectReader need to know to read the bytes of this file?". Currently that is just any possible encryption key, but who knows, that could change.
kvpb.FileEncryptionOptions is specifically "is this file encrypted and if so, with what options". That's a narrower set, e.g. maybe we have a storage option like buffering in the future, or options for caching on some sort of CRDB-specific blob storage service in the future. Or maybe later there are some backup encryption options that don't apply to linkable ssts so they don't belong in storage.objectReader.
I'd lean towards keeping this stuct standalone -- it is what a storage.objectReader needs to read bytes, no more, no less.
pkg/storage/encryption.go
Outdated
| pos int64 | ||
| } | ||
|
|
||
| func (a *objectReaderAdapter) ReadAt(p []byte, off int64) (int, error) { |
There was a problem hiding this comment.
i'm a bit confused: could you explain why we need all this new reading infra, if we already have a decrypting reader that knows how to deal with backup file encryption keys?
There was a problem hiding this comment.
oh wait. you are using the encryption reader. let me read a little more closely.
There was a problem hiding this comment.
Yeah I tried a few things here, none of which I loved. The main tension is that we use this logic in two places now, one of which needs an io.ReaderAt interface, and one of which needs a (pebble) remote.ObjectReader interface, both of which have a ReadAt with different signatures and different underlying semantics (namely the ObjectReader one takes a context, and is expected to be called concurrently). This approach is purely additive and is essentially just a wrapper around the existing logic which implements the ObjectReader interface we need, and at least has the mild benefit of leaving the existing logic completely untouched.
There was a problem hiding this comment.
ooof, your current approach makes sense. figured it out after after staring at the code for 30 minutes. I suppose we could hang the mutex off the decryptReader. though i'm not sure we can do this, since we're also locking the adapter.ctx. what a mess.
I'm not sure how to clean this up, though I suspect storage folks or @dt have thought about this problem before.
There was a problem hiding this comment.
i wonder if we'd get out of this context plumbing hell if we refactored the ciphertext field off of decryptReader to be a ioReaderAdapter instead.
There was a problem hiding this comment.
but maybe each object storage read does need a fresh context, idk.
There was a problem hiding this comment.
how hard would it be to just change decryptReader itself to fully implement the remote.ObjectReader natively instead of needing an adaptor? We'd need to change a few CRDB callers, but I'd expect those changes to be not to bad?
There was a problem hiding this comment.
I don't think the backup side call sites would be all that bad, but the main hiccup here is ExternalSSTReader wants objstorage.ReadableFiles (which is itself an io.ReaderAt). This makes it all the way down to a pebble function, so we'd still either need an adapter somewhere, or a change to the pebble function signature.
There was a problem hiding this comment.
Currently have a third option here instead of a pebble change or the wrapper that was here previously. Pebble has a pretty ubiquitous pattern of using io.ReaderAt etc for things that might be interacting with local disk. A pebble side refactor would work for fixing our interface woes here, but it seems like a fairly big bite for implementing one feature.
On another note, this decryption logic is purely logic, and really doesn't want or need to care about how exactly the io is being done. The current approach extracts the core decryption logic into a type that accepts a closure for io purposes, with two wrappers (one for each interface we need) on top of it. Which let's us do what we need to wrt concurrency/context within the wrappers, and leaves the core logic interface-agnostic.
b25e221 to
854d692
Compare
pkg/storage/encryption.go
Outdated
| // newDecryptor reads and validates an encryption header, then returns a | ||
| // decryptor ready for use. readHeader fills the provided buffer with the | ||
| // header bytes from the underlying I/O source. | ||
| func newDecryptor(key []byte, readHeader func([]byte) error) (*decryptor, error) { |
There was a problem hiding this comment.
nit: define readerHeader as a type so you can document its semantics.
854d692 to
6d1f6f3
Compare
…redactable api Bumps pebble to a267c0a2e8991a4f509e0dfcd756bcefeae2b1e4. * [`a267c0a2`](cockroachdb/pebble@a267c0a2) objstorage: adjust remote.Locator to support redactability * [`35632a01`](cockroachdb/pebble@35632a01) compaction: skip pre-valsep files in policy comparison, add annotations This patch adapts the current `remote.Locator` usage to use the redactable api. Informs: cockroachdb#161169 Release note: None
This patch adds redaction markers to URIs passed to pebble in online restore. Specifically, all query params in the URIs are marked as unsafe and as such will be redacted in logs and error messages. Informs: cockroachdb#161169 Release note: None
6d1f6f3 to
458d85b
Compare
Previously, online restore did not support encrypted backups, as linking files to pebble means the decryption can't happen within the restore job. This patch teaches online restore to restore encrypted backups by plumbing the requisite information to pebble via the file URI, and implementing a remote.ObjectReader which performs decryption. Fixes: cockroachdb#161169 Release note: None
458d85b to
b4321da
Compare
Previously, online restore did not support encrypted backups, as linking files to pebble means the decryption can't happen within the restore job. This patch teaches online restore to restore encrypted backups by plumbing the requisite information to pebble via the file URI, and implementing a remote.ObjectReader which performs decryption.
Fixes: #161169
Release note: None