Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: add encryption format support for clones (part 2/2) #48618

Merged
merged 28 commits into from Dec 5, 2022

Conversation

idryomov
Copy link
Contributor

@idryomov idryomov commented Oct 25, 2022

An immediate follow-up for #40363 containing encrypted I/O path + encrypted flatten + encrypted resize fixes.

A few TODOs that I have on my list but may just convert to tracker tickets:

@idryomov
Copy link
Contributor Author

cc @orozery @dannyharnik

@idryomov
Copy link
Contributor Author

idryomov commented Nov 1, 2022

@orozery I haven't been able to run this through the rbd suite yet due to the lab suffering a major outage. Did you get a chance to take a look and playtest?

src/librbd/io/Utils.h Outdated Show resolved Hide resolved
src/librbd/crypto/CryptoObjectDispatch.cc Show resolved Hide resolved

uint64_t size;
ASSERT_EQ(0, clone.size(&size));
EXPECT_EQ(data_size + luks1_meta_size - luks2_meta_size, size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be the expected behaviour for users, that after changing the cloned image format the data area may be not the same as the parent data area? Shouldn't we resize the clone image accordingly when formatting?

Copy link
Contributor Author

@idryomov idryomov Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be the expected behaviour for users, that after changing the cloned image format the data area may be not the same as the parent data area?

As much as I dislike that, that is the expected behavior. If the crypto header in the parent is larger that the crypto header in the clone and the user didn't take steps to reserve the space in the clone prior to creating it, there is nothing we can do -- at least not with librbd API as is.

Shouldn't we resize the clone image accordingly when formatting?

Two things:

  • we can't resize the clone on the user's behalf because rbd_clone API is blissfully unaware of encryption
  • even if we could resize the clone on the user's behalf, that wouldn't expose the missing part of the parent DATA area because parent overlap would remain at the original clone size

I see three ways out of this:

(A) Introduce some sort of "reset parent overlap" API. Cloning for this case would look like:

  • snap create/protect
  • clone
  • format the clone
  • grow the clone to compensate for the difference in crypto header sizes
  • reset overlap to the new size

(B) Require the user to reserve the space in the clone prior to creating it. Cloning for this case would look like:

  • grow the parent to reserve space for the difference in crypto header sizes
  • snap create/protect
  • clone
  • format the clone
  • shrink the parent back to the original size

(C) Rethink our approach to how encryption is exposed by librbd: introduce new higher-level encryption-aware APIs or at least make rbd_encryption_format optionally handle resizing and resetting parent overlap. Formatting the clone for this case should become a single-step operation -- all details and image metadata manipulation should be encapsulated. Another example is formatting a regular image: it could be optionally resized to compensate for the added crypto header -- this doesn't happen today.

I went with (B) in this PR because (A) requires adding a new class method at the very least and (C) is an even longer shot. My sense is that people would generally use either LUKS1 or LUKS2 and wouldn't mix them too often.

Copy link
Contributor

@trociny trociny Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think (A) and (B) makes sense, because when cloning we don't change the format (AFAIR it). It is only when we run rbd_encryption_format on a clone. So I was thinking about making rbd_encryption_format to do this automatically. I don't see a need to make as an option though.

Another example is formatting a regular image: it could be optionally resized to compensate for the added crypto header -- this doesn't happen today.

Actually I think I like this idea, but again, not optionally but always. Making it optional does not differ much than just asking the user to do resize as additional (pre-format) step.

Copy link
Contributor Author

@idryomov idryomov Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't think (A) and (B) makes sense, because when cloning we don't change the format (AFAIR it). It is only when we run rbd_encryption_format on a clone. So I was thinking about making rbd_encryption_format to do this automatically. I don't see a need to make as an option though.

It would have been clearer if I had split (C) into two options:

(C-1) Fully encryption-aware API, where rbd_create-like and rbd_clone-like methods handle everything from creating an image/clone to formatting it and adjusting as necessary. This could enable a rich set of "on-the-fly re-encryption" use cases, such as importing a plaintext image and turning it into an encrypted image at the same time or deep copying an image encrypted with one format into an image encrypted with another format. rbd_encryption_format method would be removed.

(C-2) Make rbd_encryption_format method smarter to reduce the number steps it takes to create an encrypted image/clone.

It sounds like you prefer (C-2). But note that (C-2) is would still need (A) on the backend -- since creating an image/clone and formatting it remain to be separate steps we would still need a way to reset the parent overlap as part of formatting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually (C-1) looks appealing to me but (I my be wrong) I think we had a discussion with @orozery long time ago why we didn't go that way and there was some reason. So just for this reason I was thinking (C-2) would be more realistic.

Copy link
Contributor Author

@idryomov idryomov Nov 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and since (B) doesn't bring any API modifications and just requires the user to temporarily grow the parent image, (C-2) can always be implemented on top to lift that requirement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with (B) in this PR because (A) requires adding a new class method at the very least and (C) is an even longer shot. My sense is that people would generally use either LUKS1 or LUKS2 and wouldn't mix them too often.

I agree. Users should choose one format and stick to it. I don't think we should complicate ourselves too much to ease on users that will mix.
BTW, from a cloud provider perspective, the end-user is not aware of this whole crypto header sizes thing. It is the cloud provider's responsibility to handle all of the resizing. So even if the current approach is cumbersome, it is not user-facing (in the case of a cloud provider).

Copy link
Contributor

@trociny trociny Nov 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, how I understand it. With the current code, if a user have a luks1 base image, and expect that one may want to clone it and format to luks2, to avoid data loss, before creating a snapashot she have to expand the base image by luks2 - luks1 difference, and after creating the snapshot the image may be shrinked back.

I expect a lot of confusion here, i.e. (how a clone caller may be sure the parent image has already been expanded). On the other hand I agree that it is most likely very rare case and we may not need to spend a lot of effort on it. Personally, I would then just forbid formatting a clone if the new crypto header is bigger. But if you still think it should be allowed then at least clearly documenting this case would be nice.

Copy link
Contributor Author

@idryomov idryomov Nov 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, how I understand it. With the current code, if a user have a luks1 base image, and expect that one may want to clone it and format to luks2, to avoid data loss, before creating a snapashot she have to expand the base image by luks2 - luks1 difference, and after creating the snapshot the image may be shrinked back.

Correct, and this also goes for creating an encrypted clone of a plaintext parent. However I wouldn't call it data loss since the data stays there in the parent. It's not lost, it's just that the last 4M to 16M of the parent just aren't available to the clone.

I expect a lot of confusion here, i.e. (how a clone caller may be sure the parent image has already been expanded).

Expanding the parent multiple times (or by more than the required difference, e.g. using some fixed amount like 1G) is fine. But you are right in that cloning an existing snapshot can be a hit or miss -- to be completely sure they would need to create their own snapshot.

On the other hand I agree that it is most likely very rare case and we may not need to spend a lot of effort on it. Personally, I would then just forbid formatting a clone if the new crypto header is bigger. But if you still think it should be allowed then at least clearly documenting this case would be nice.

I think it should be allowed because the whole thing would be too constrained otherwise. Like Or said, in cases where there is some management layer sitting between the user and the actual images, these details are trivial to automate.

I tried to document this in the Examples section (https://ceph--48618.org.readthedocs.build/en/48618/rbd/rbd-encryption/#examples). Feel free to suggest stroger wording!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc (examples) looks clear to me. I just missed it somehow when reviewing. We might want to add a warning note in the man's encryption format command description about the effect when formatting a clone with a new header lager than parent. But I don't insist.

@orozery
Copy link
Contributor

orozery commented Nov 3, 2022

@orozery I haven't been able to run this through the rbd suite yet due to the lab suffering a major outage. Did you get a chance to take a look and playtest?

Went over the code - no comments I could think of :)
What do you mean by play-test?

@idryomov
Copy link
Contributor Author

idryomov commented Nov 3, 2022

Went over the code - no comments I could think of :) What do you mean by play-test?

Build it and play with it -- try various test cases you can think of on the CLI or with python bindings, etc. Things like encrypted resize, encrypted flatten, some convoluted parent/clone chains -- anything that comes to mind or what you would have done if you were submitting this PR yourself. In a nutshell, try to break it ;)

@idryomov
Copy link
Contributor Author

jenkins test make check

@idryomov
Copy link
Contributor Author

Pushed an update to switch to the "always preserve" behavior for format identifiers and some rbd CLI UX improvements.

@trociny Some context for you: The original discussion where I raised the issue of decaying everything into LUKSEncryptionFormat is here: #40363 (comment). Back then, we settled on the "always discard" (always decay) approach because Or wanted to make the old QEMU (or, more generally, something using the old librbd without layered encryption changes) work for a case where images with different formats are encrypted with the same passphrase. It continued to seem fishy to me though and, when this behavior was indirectly brought up again on the qemu-block mailing list a couple of weeks ago, I decided to rip it out and instead do exactly what cryptsetup does in this area.

@orozery The other thing I want to get your ack on is the "rbd, rbd-nbd: don't strip trailing newline in passphrase files" change. What was the rationale for stripping the trailing newline in the first place?

@idryomov
Copy link
Contributor Author

Repeated make check fails aren't related, @cbodley PTAL:

2 rgw_data_sync_marker
1 rgw_data_sync_status
2 rgw_log_entry
/tmp/typ-bqFTsS5zl /tmp/typ-39E6pxNnf differ: byte 447, line 21
**** rgw_log_entry test 2 dump_json check failed ****
   ceph-dencoder type rgw_log_entry select_test 2 dump_json > /tmp/typ-bqFTsS5zl
   ceph-dencoder type rgw_log_entry select_test 2 encode decode dump_json > /tmp/typ-39E6pxNnf
21c21
<     "identity_type": 4028302560
---
>     "identity_type": 1950033120


2 rgw_data_sync_marker
1 rgw_data_sync_status
2 rgw_log_entry
/tmp/typ-AW7AEODB3 /tmp/typ-0NX4cmLHR differ: byte 447, line 21
**** rgw_log_entry test 2 dump_json check failed ****
   ceph-dencoder type rgw_log_entry select_test 2 dump_json > /tmp/typ-AW7AEODB3
   ceph-dencoder type rgw_log_entry select_test 2 encode decode dump_json > /tmp/typ-0NX4cmLHR
21c21
<     "identity_type": 3599295712
---
>     "identity_type": 2072560864

@cbodley
Copy link
Contributor

cbodley commented Nov 29, 2022

Repeated make check fails aren't related, @cbodley PTAL:

thanks @idryomov, i opened #49131 to initialize that variable. i'm surprised this hasn't been failing all along

@cbodley
Copy link
Contributor

cbodley commented Nov 29, 2022

jenkins test make check

1 similar comment
@idryomov
Copy link
Contributor Author

idryomov commented Dec 1, 2022

jenkins test make check

@github-actions
Copy link

github-actions bot commented Dec 1, 2022

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@trociny
Copy link
Contributor

trociny commented Dec 2, 2022

+1 for "always preserve" behavior.
@orozery As I understand from #40363 (comment) discusion, you wanted "always discard" for migration from LUKS1 to LUKS2 with smallest downtime by using clone+flatten. But this looks like what rbd migration functionality was designed for. If it currently does not support this case then it would be nice to make it.

@idryomov
Copy link
Contributor Author

idryomov commented Dec 2, 2022

@orozery As I understand from #40363 (comment) discusion, you wanted "always discard" for migration from LUKS1 to LUKS2 with smallest downtime by using clone+flatten. But this looks like what rbd migration functionality was designed for. If it currently does not support this case then it would be nice to make it.

Yeah, I think Or's goal was to enable something resembling that functionality with the old librbd. With "always discard", one could open a clone with a mix of LUKS1 and LUKS2 parents using the old rbd_encryption_load API as long as the clone and all parents had the same passphrase. With "always preserve", using the old rbd_encryption_load API is possible only if the clone and all parents have the same format (in addition to the same passphrase) -- otherwise the new rbd_encryption_load2 API must be used.

image_extents is already taken by rvalue reference.
CopyupRequest::create() callers are prepared for the move.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When encryption is loaded, rbd_get_size() and Image::size() return
"effective" size, but rbd_resize() and Image::resize() continue to take
raw size.  The user has to constantly keep these domains in mind.

Saying that resize must be done without encryption loaded is not an
answer because shrinking a clone that has snapshots involves copying up
objects in the affected range (identical to flattening).  In addition,
even if a clone doesn't have snapshots, shrinking it to a size that
isn't an object boundary is going to involve a copyup for the victim
object as well.

To avoid subtle data corruption on shrink, treat resize operation the
same as flatten operation (including on the CLI).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Make it clear that get_parent_overlap() returns the raw parent overlap
value (i.e. physical offset into the parent image).  Also drop redundant
ceph_mutex_is_locked assert -- get_parent_info() already has one.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When encryption is loaded, rbd_get_overlap() and Image::overlap() now
return "effective" overlap, similar to rbd_get_size() and Image::size().
Previously, returned overlap could have been bigger than "effective"
size.

Note that get_effective_image_size() successor doesn't take snap_id
because passing anything but ictx->snap_id was broken.  Since the size
of the crypto header is stored in the crypto header itself, image areas
are defined only for the "opened at" snap_id.  Getting "effective" size
for an arbitrary snapshot requires actually opening it and loading
encryption on it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This fixes cases where CRYPTO HEADER area is larger than DATA area.
In particular, it was effectively impossible to flatten unformatted
clones of such images.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
DATA area in the parent may be smaller than the part of DATA area in
the clone that is still within the overlap.  This would occur e.g. in
LUKS2-formatted parent + LUKS1-formatted clone case, due to LUKS2
header usually being bigger than LUKS1 header:

parent: raw size = 64M
        LUKS2 header area = 16M
        data area = 48M

clone:  raw size = 64M (raw overlap 64M)
        LUKS1 header area = 4M
        data area = 60M

Currently, because parent extents are pruned only according to raw
overlap (64M), the clone ends up attempting to reach the parent for all
of its data area (60M < 64M) even though the parent only has 48M worth
of data.  All kinds of bugs ensue for 48M..60M offsets and this range
basically becomes inaccessible to the user.

A related issue is that prune_parent_extents() ignores area.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Apply similar "reduce overlap and respect area" logic to places
that don't use prune_parent_extents().  Changes to FlattenRequest
and TrimRequest here should complete the long tail of encrypted
I/O path and flatten fixes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
One of the stated goals is compatibility with standard LUKS tools,
in particular being able to load encryption on images formatted with
cryptsetup.  cryptsetup doesn't do this and this really interferes
with randomly generated (binary) passphrases.

While at it, open passphrase files as binary -- it communicates the
intent if nothing else on POSIX.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
"rbd encryption format" handler sets up a scope guard to zero out
the passphrase string on return but also makes a copy of same which
isn't zeroed out.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Commit 9892ead ("librbd/crypto: allow loading luks format
without specifying version") introduced RBD_ENCRYPTION_FORMAT_LUKS
format identifier, matching cryptsetup's CRYPT_LUKS ("load any LUKS
version happens to be there").  However, in an effort to enable an
obscure "layered encryption with the same passphrase + old QEMU" use
case, it also introduced decaying of RBD_ENCRYPTION_FORMAT_LUKS1 and
RBD_ENCRYPTION_FORMAT_LUKS2 format identifiers, making it impossible
to assert on the format that is being loaded.  This new behavior was
then extended to standalone images.

Treating RBD_ENCRYPTION_FORMAT_LUKS1, RBD_ENCRYPTION_FORMAT_LUKS2
and RBD_ENCRYPTION_FORMAT_LUKS the same when loading encryption can
be construed as an opening for a format downgrade attack.  Let's
resurrect the previous standalone images behavior and extend it to
layered encryption instead.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Since RBD_ENCRYPTION_FORMAT_LUKS1, RBD_ENCRYPTION_FORMAT_LUKS2
and RBD_ENCRYPTION_FORMAT_LUKS aren't treated the same when loading
encryption anymore, "luks1" and "luks2" formats need to be accepted
in addition to "luks" format.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If no --encryption-format specified at all, default to "luks" for each
specified --encryption-passphrase-file.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Note that we are hitting https://tracker.ceph.com/issues/58160 here
because by the time we get to "rbd resize" RAW_DEV mapping owns the
lock (due to a write to /dev/mapper/cryptsetupdev being last).

While at it, resurrect the ability to easily run this script on
vstart clusters -- see commit f737c28 ("qa/workunits/rbd: make
luks-encryption test work on vstart cluster").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

idryomov commented Dec 4, 2022

Rebased to resolve PendingReleaseNotes conflict and added encryption-aware resize CLI test.

@idryomov
Copy link
Contributor Author

idryomov commented Dec 4, 2022

jenkins test submodules

@idryomov
Copy link
Contributor Author

idryomov commented Dec 4, 2022

jenkins test make check

@idryomov
Copy link
Contributor Author

idryomov commented Dec 5, 2022

A few TODOs that I have on my list but may just convert to tracker tickets:

* proxied encrypted flatten/resize discussion ([librbd: add encryption format support for clones (part 1/2) #40363 (comment)](https://github.com/ceph/ceph/pull/40363#discussion_r976800950))

https://tracker.ceph.com/issues/58160

* task-based encrypted flatten (`rbd task add flatten --encryption-format ... --encryption-passphrase-file ...`)?

This would entail sending the passphrase over the wire to ceph-mgr daemon, relying on msgr2 secure mode (encryption in transit). Not sure how desirable that is or how useful would this feature be, so punting it for now.

* exercise layered encryption via fsx

https://tracker.ceph.com/issues/58161

@idryomov
Copy link
Contributor Author

idryomov commented Dec 5, 2022

... and added encryption-aware resize CLI test.

https://pulpito.ceph.com/dis-2022-12-04_20:59:49-rbd-wip-dis-testing-distro-default-smithi/

@orozery
Copy link
Contributor

orozery commented Dec 5, 2022

@orozery The other thing I want to get your ack on is the "rbd, rbd-nbd: don't strip trailing newline in passphrase files" change. What was the rationale for stripping the trailing newline in the first place?

hmmm I guess I thought that I thought this will be more useful to users who use "user-passphrases" (i.e. printable ASCII), and in that case a trailing newline is just something that is probably added by the creation process (e.g. "echo") which the (non-advance) user will not notice.

I'm ok with changing it though.

@orozery
Copy link
Contributor

orozery commented Dec 5, 2022

also, I'm good with the change to "always preserve".

@idryomov
Copy link
Contributor Author

idryomov commented Dec 5, 2022

hmmm I guess I thought that I thought this will be more useful to users who use "user-passphrases" (i.e. printable ASCII), and in that case a trailing newline is just something that is probably added by the creation process (e.g. "echo") which the (non-advance) user will not notice.

Hopefully they would just use the same passpharse file when invoking cryptsetup luksOpen (if it ever comes to that). Since we don't support interactive passphrase creation/input on the RBD side, the issue of "how do I enter this newline interactively" shouldn't come up.

@idryomov idryomov merged commit 8780f60 into ceph:main Dec 5, 2022
10 of 12 checks passed
@idryomov idryomov deleted the rbd-clone-encryption-part2 branch December 5, 2022 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants