New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
librbd: add encryption format support for clones (part 2/2) #48618
Conversation
8474d36
to
8401e26
Compare
|
@orozery I haven't been able to run this through the rbd suite yet due to the lab suffering a major outage. Did you get a chance to take a look and playtest? |
|
|
||
| uint64_t size; | ||
| ASSERT_EQ(0, clone.size(&size)); | ||
| EXPECT_EQ(data_size + luks1_meta_size - luks2_meta_size, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be the expected behaviour for users, that after changing the cloned image format the data area may be not the same as the parent data area? Shouldn't we resize the clone image accordingly when formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be the expected behaviour for users, that after changing the cloned image format the data area may be not the same as the parent data area?
As much as I dislike that, that is the expected behavior. If the crypto header in the parent is larger that the crypto header in the clone and the user didn't take steps to reserve the space in the clone prior to creating it, there is nothing we can do -- at least not with librbd API as is.
Shouldn't we resize the clone image accordingly when formatting?
Two things:
- we can't resize the clone on the user's behalf because
rbd_cloneAPI is blissfully unaware of encryption - even if we could resize the clone on the user's behalf, that wouldn't expose the missing part of the parent DATA area because parent overlap would remain at the original clone size
I see three ways out of this:
(A) Introduce some sort of "reset parent overlap" API. Cloning for this case would look like:
- snap create/protect
- clone
- format the clone
- grow the clone to compensate for the difference in crypto header sizes
- reset overlap to the new size
(B) Require the user to reserve the space in the clone prior to creating it. Cloning for this case would look like:
- grow the parent to reserve space for the difference in crypto header sizes
- snap create/protect
- clone
- format the clone
- shrink the parent back to the original size
(C) Rethink our approach to how encryption is exposed by librbd: introduce new higher-level encryption-aware APIs or at least make rbd_encryption_format optionally handle resizing and resetting parent overlap. Formatting the clone for this case should become a single-step operation -- all details and image metadata manipulation should be encapsulated. Another example is formatting a regular image: it could be optionally resized to compensate for the added crypto header -- this doesn't happen today.
I went with (B) in this PR because (A) requires adding a new class method at the very least and (C) is an even longer shot. My sense is that people would generally use either LUKS1 or LUKS2 and wouldn't mix them too often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't think (A) and (B) makes sense, because when cloning we don't change the format (AFAIR it). It is only when we run rbd_encryption_format on a clone. So I was thinking about making rbd_encryption_format to do this automatically. I don't see a need to make as an option though.
Another example is formatting a regular image: it could be optionally resized to compensate for the added crypto header -- this doesn't happen today.
Actually I think I like this idea, but again, not optionally but always. Making it optional does not differ much than just asking the user to do resize as additional (pre-format) step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I don't think (A) and (B) makes sense, because when cloning we don't change the format (AFAIR it). It is only when we run
rbd_encryption_formaton a clone. So I was thinking about makingrbd_encryption_formatto do this automatically. I don't see a need to make as an option though.
It would have been clearer if I had split (C) into two options:
(C-1) Fully encryption-aware API, where rbd_create-like and rbd_clone-like methods handle everything from creating an image/clone to formatting it and adjusting as necessary. This could enable a rich set of "on-the-fly re-encryption" use cases, such as importing a plaintext image and turning it into an encrypted image at the same time or deep copying an image encrypted with one format into an image encrypted with another format. rbd_encryption_format method would be removed.
(C-2) Make rbd_encryption_format method smarter to reduce the number steps it takes to create an encrypted image/clone.
It sounds like you prefer (C-2). But note that (C-2) is would still need (A) on the backend -- since creating an image/clone and formatting it remain to be separate steps we would still need a way to reset the parent overlap as part of formatting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually (C-1) looks appealing to me but (I my be wrong) I think we had a discussion with @orozery long time ago why we didn't go that way and there was some reason. So just for this reason I was thinking (C-2) would be more realistic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and since (B) doesn't bring any API modifications and just requires the user to temporarily grow the parent image, (C-2) can always be implemented on top to lift that requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with (B) in this PR because (A) requires adding a new class method at the very least and (C) is an even longer shot. My sense is that people would generally use either LUKS1 or LUKS2 and wouldn't mix them too often.
I agree. Users should choose one format and stick to it. I don't think we should complicate ourselves too much to ease on users that will mix.
BTW, from a cloud provider perspective, the end-user is not aware of this whole crypto header sizes thing. It is the cloud provider's responsibility to handle all of the resizing. So even if the current approach is cumbersome, it is not user-facing (in the case of a cloud provider).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, how I understand it. With the current code, if a user have a luks1 base image, and expect that one may want to clone it and format to luks2, to avoid data loss, before creating a snapashot she have to expand the base image by luks2 - luks1 difference, and after creating the snapshot the image may be shrinked back.
I expect a lot of confusion here, i.e. (how a clone caller may be sure the parent image has already been expanded). On the other hand I agree that it is most likely very rare case and we may not need to spend a lot of effort on it. Personally, I would then just forbid formatting a clone if the new crypto header is bigger. But if you still think it should be allowed then at least clearly documenting this case would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, how I understand it. With the current code, if a user have a luks1 base image, and expect that one may want to clone it and format to luks2, to avoid data loss, before creating a snapashot she have to expand the base image by luks2 - luks1 difference, and after creating the snapshot the image may be shrinked back.
Correct, and this also goes for creating an encrypted clone of a plaintext parent. However I wouldn't call it data loss since the data stays there in the parent. It's not lost, it's just that the last 4M to 16M of the parent just aren't available to the clone.
I expect a lot of confusion here, i.e. (how a clone caller may be sure the parent image has already been expanded).
Expanding the parent multiple times (or by more than the required difference, e.g. using some fixed amount like 1G) is fine. But you are right in that cloning an existing snapshot can be a hit or miss -- to be completely sure they would need to create their own snapshot.
On the other hand I agree that it is most likely very rare case and we may not need to spend a lot of effort on it. Personally, I would then just forbid formatting a clone if the new crypto header is bigger. But if you still think it should be allowed then at least clearly documenting this case would be nice.
I think it should be allowed because the whole thing would be too constrained otherwise. Like Or said, in cases where there is some management layer sitting between the user and the actual images, these details are trivial to automate.
I tried to document this in the Examples section (https://ceph--48618.org.readthedocs.build/en/48618/rbd/rbd-encryption/#examples). Feel free to suggest stroger wording!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc (examples) looks clear to me. I just missed it somehow when reviewing. We might want to add a warning note in the man's encryption format command description about the effect when formatting a clone with a new header lager than parent. But I don't insist.
Went over the code - no comments I could think of :) |
Build it and play with it -- try various test cases you can think of on the CLI or with python bindings, etc. Things like encrypted resize, encrypted flatten, some convoluted parent/clone chains -- anything that comes to mind or what you would have done if you were submitting this PR yourself. In a nutshell, try to break it ;) |
5781737
to
3c85a8e
Compare
3c85a8e
to
da3b94a
Compare
da3b94a
to
1d076d5
Compare
|
jenkins test make check |
|
Pushed an update to switch to the "always preserve" behavior for format identifiers and some rbd CLI UX improvements. @trociny Some context for you: The original discussion where I raised the issue of decaying everything into LUKSEncryptionFormat is here: #40363 (comment). Back then, we settled on the "always discard" (always decay) approach because Or wanted to make the old QEMU (or, more generally, something using the old librbd without layered encryption changes) work for a case where images with different formats are encrypted with the same passphrase. It continued to seem fishy to me though and, when this behavior was indirectly brought up again on the qemu-block mailing list a couple of weeks ago, I decided to rip it out and instead do exactly what cryptsetup does in this area. @orozery The other thing I want to get your ack on is the "rbd, rbd-nbd: don't strip trailing newline in passphrase files" change. What was the rationale for stripping the trailing newline in the first place? |
|
Repeated make check fails aren't related, @cbodley PTAL: |
|
jenkins test make check |
1 similar comment
|
jenkins test make check |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
+1 for "always preserve" behavior. |
Yeah, I think Or's goal was to enable something resembling that functionality with the old librbd. With "always discard", one could open a clone with a mix of LUKS1 and LUKS2 parents using the old |
image_extents is already taken by rvalue reference. CopyupRequest::create() callers are prepared for the move. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When encryption is loaded, rbd_get_size() and Image::size() return "effective" size, but rbd_resize() and Image::resize() continue to take raw size. The user has to constantly keep these domains in mind. Saying that resize must be done without encryption loaded is not an answer because shrinking a clone that has snapshots involves copying up objects in the affected range (identical to flattening). In addition, even if a clone doesn't have snapshots, shrinking it to a size that isn't an object boundary is going to involve a copyup for the victim object as well. To avoid subtle data corruption on shrink, treat resize operation the same as flatten operation (including on the CLI). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Make it clear that get_parent_overlap() returns the raw parent overlap value (i.e. physical offset into the parent image). Also drop redundant ceph_mutex_is_locked assert -- get_parent_info() already has one. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When encryption is loaded, rbd_get_overlap() and Image::overlap() now return "effective" overlap, similar to rbd_get_size() and Image::size(). Previously, returned overlap could have been bigger than "effective" size. Note that get_effective_image_size() successor doesn't take snap_id because passing anything but ictx->snap_id was broken. Since the size of the crypto header is stored in the crypto header itself, image areas are defined only for the "opened at" snap_id. Getting "effective" size for an arbitrary snapshot requires actually opening it and loading encryption on it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This fixes cases where CRYPTO HEADER area is larger than DATA area. In particular, it was effectively impossible to flatten unformatted clones of such images. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
DATA area in the parent may be smaller than the part of DATA area in
the clone that is still within the overlap. This would occur e.g. in
LUKS2-formatted parent + LUKS1-formatted clone case, due to LUKS2
header usually being bigger than LUKS1 header:
parent: raw size = 64M
LUKS2 header area = 16M
data area = 48M
clone: raw size = 64M (raw overlap 64M)
LUKS1 header area = 4M
data area = 60M
Currently, because parent extents are pruned only according to raw
overlap (64M), the clone ends up attempting to reach the parent for all
of its data area (60M < 64M) even though the parent only has 48M worth
of data. All kinds of bugs ensue for 48M..60M offsets and this range
basically becomes inaccessible to the user.
A related issue is that prune_parent_extents() ignores area.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Apply similar "reduce overlap and respect area" logic to places that don't use prune_parent_extents(). Changes to FlattenRequest and TrimRequest here should complete the long tail of encrypted I/O path and flatten fixes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
One of the stated goals is compatibility with standard LUKS tools, in particular being able to load encryption on images formatted with cryptsetup. cryptsetup doesn't do this and this really interferes with randomly generated (binary) passphrases. While at it, open passphrase files as binary -- it communicates the intent if nothing else on POSIX. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
"rbd encryption format" handler sets up a scope guard to zero out the passphrase string on return but also makes a copy of same which isn't zeroed out. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Commit 9892ead ("librbd/crypto: allow loading luks format without specifying version") introduced RBD_ENCRYPTION_FORMAT_LUKS format identifier, matching cryptsetup's CRYPT_LUKS ("load any LUKS version happens to be there"). However, in an effort to enable an obscure "layered encryption with the same passphrase + old QEMU" use case, it also introduced decaying of RBD_ENCRYPTION_FORMAT_LUKS1 and RBD_ENCRYPTION_FORMAT_LUKS2 format identifiers, making it impossible to assert on the format that is being loaded. This new behavior was then extended to standalone images. Treating RBD_ENCRYPTION_FORMAT_LUKS1, RBD_ENCRYPTION_FORMAT_LUKS2 and RBD_ENCRYPTION_FORMAT_LUKS the same when loading encryption can be construed as an opening for a format downgrade attack. Let's resurrect the previous standalone images behavior and extend it to layered encryption instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Since RBD_ENCRYPTION_FORMAT_LUKS1, RBD_ENCRYPTION_FORMAT_LUKS2 and RBD_ENCRYPTION_FORMAT_LUKS aren't treated the same when loading encryption anymore, "luks1" and "luks2" formats need to be accepted in addition to "luks" format. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If no --encryption-format specified at all, default to "luks" for each specified --encryption-passphrase-file. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Note that we are hitting https://tracker.ceph.com/issues/58160 here because by the time we get to "rbd resize" RAW_DEV mapping owns the lock (due to a write to /dev/mapper/cryptsetupdev being last). While at it, resurrect the ability to easily run this script on vstart clusters -- see commit f737c28 ("qa/workunits/rbd: make luks-encryption test work on vstart cluster"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
1d076d5
to
8d5d478
Compare
|
Rebased to resolve PendingReleaseNotes conflict and added encryption-aware resize CLI test. |
|
jenkins test submodules |
|
jenkins test make check |
https://tracker.ceph.com/issues/58160
This would entail sending the passphrase over the wire to ceph-mgr daemon, relying on msgr2 secure mode (encryption in transit). Not sure how desirable that is or how useful would this feature be, so punting it for now.
|
https://pulpito.ceph.com/dis-2022-12-04_20:59:49-rbd-wip-dis-testing-distro-default-smithi/ |
hmmm I guess I thought that I thought this will be more useful to users who use "user-passphrases" (i.e. printable ASCII), and in that case a trailing newline is just something that is probably added by the creation process (e.g. "echo") which the (non-advance) user will not notice. I'm ok with changing it though. |
|
also, I'm good with the change to "always preserve". |
Hopefully they would just use the same passpharse file when invoking |
An immediate follow-up for #40363 containing encrypted I/O path + encrypted flatten + encrypted resize fixes.
A few TODOs that I have on my list but may just convert to tracker tickets:
rbd task add flatten --encryption-format ... --encryption-passphrase-file ...)?