Skip to content

librbd/crypto: fix issue when live-migrating from encrypted export#44366

Merged
idryomov merged 4 commits intoceph:mainfrom
orozery:rbd-crypto-migration
Aug 7, 2024
Merged

librbd/crypto: fix issue when live-migrating from encrypted export#44366
idryomov merged 4 commits intoceph:mainfrom
orozery:rbd-crypto-migration

Conversation

@orozery
Copy link
Copy Markdown
Contributor

@orozery orozery commented Dec 20, 2021

See bug report here:
https://tracker.ceph.com/issues/53674

This PR depends on #43804

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@orozery
Copy link
Copy Markdown
Contributor Author

orozery commented Dec 20, 2021

@trociny

@ljflores
Copy link
Copy Markdown
Member

jenkins retest this please

trociny
trociny previously approved these changes Dec 21, 2021
Copy link
Copy Markdown
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@trociny trociny requested a review from idryomov December 21, 2021 16:24
@djgalloway djgalloway changed the base branch from master to main June 1, 2022 16:52
@djgalloway djgalloway requested a review from a team as a code owner June 1, 2022 16:52
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jul 31, 2022
@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 1, 2022

Unstale -- this is waiting for #40363 as it conflicts with CryptoInterface removal.

@github-actions
Copy link
Copy Markdown

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Dec 29, 2022
@idryomov idryomov removed the stale label Dec 29, 2022
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Feb 27, 2023
@idryomov idryomov removed the stale label Feb 27, 2023
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Apr 28, 2023
@idryomov idryomov removed the stale label Apr 28, 2023
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 1, 2024

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@idryomov idryomov force-pushed the rbd-crypto-migration branch from 0870add to 621f9ef Compare August 2, 2024 18:12
@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 2, 2024

I rewrote this on top of #58882 which solved

[...] all this headache is caused by FormatInterface that tries to abstract two very different things: a native "format" that has a ton of logic behind it and actual formats that simply return raw data from an external source. It's a bogus abstraction [...]

and allowed for vast simplification here. The size of the diff went down from

 qa/workunits/rbd/luks-encryption.sh                |  64 +++++-
 src/librbd/CMakeLists.txt                          |   1 +
 src/librbd/crypto/CryptoImageDispatch.cc           | 205 +++++++++++++++++++
 src/librbd/crypto/CryptoImageDispatch.h            | 107 ++++++++++
 src/librbd/crypto/CryptoInterface.h                |   8 +
 src/librbd/crypto/ShutDownCryptoRequest.cc         |  29 ++-
 src/librbd/crypto/ShutDownCryptoRequest.h          |   2 +
 src/librbd/crypto/Utils.cc                         |  11 +-
 src/librbd/io/Types.h                              |   1 +
 src/librbd/migration/OpenSourceImageRequest.cc     |  10 +-
 src/librbd/migration/QCOWFormat.cc                 |   7 +
 src/librbd/migration/RawFormat.cc                  |   7 +
 src/test/librbd/CMakeLists.txt                     |   1 +
 .../librbd/crypto/test_mock_CryptoImageDispatch.cc | 218 +++++++++++++++++++++
 .../crypto/test_mock_ShutDownCryptoRequest.cc      |  66 ++++++-
 src/test/librbd/migration/test_mock_QCOWFormat.cc  |  68 +++++++
 src/test/librbd/migration/test_mock_RawFormat.cc   |  66 +++++++
 17 files changed, 861 insertions(+), 10 deletions(-)

to

 qa/workunits/rbd/luks-encryption.sh            |  64 +++++++++++++-
 src/librbd/io/ReadResult.cc                    |   2 +-
 src/librbd/io/ReadResult.h                     |   2 +-
 src/librbd/migration/ImageDispatch.cc          | 112 ++++++++++++++++++++++++-
 src/librbd/migration/OpenSourceImageRequest.cc |  10 +--
 5 files changed, 181 insertions(+), 9 deletions(-)

Apart from taking care of the outstanding

[...] this PR introduces a regression for the native format. The issue is that when a half-populated (or completely unpopulated) clone is being migrated, the parent data is decrypted in the appropriate parent (using that parent's key). Once the parent data reaches the "source" clone -- the one that has CryptoImageDispatch loaded -- it's run through decryption again using the clone's key despite already being plaintext.

regression in read(), I also fixed list_snaps() which was completely missed in the previous version (although coming up with a CLI test case to demonstrate the fix is challenging due to RawFormat not supporting sparseness and QCOWFormat having an unrelated bug in this area). list_snaps() now maps to raw extents as well, satisfying

[...] leave FormatInterface mostly as is, working with raw extents.

I haven't touched qa/workunits/rbd/luks-encryption.sh (just made sure that it still passes), but it could definitely use test cases that exercise the native format better and test cases that trigger some copyups before calling rbd migration execute for all formats.

@idryomov idryomov dismissed trociny’s stale review August 2, 2024 18:26

The previous version had major issues and got rewritten

@ajarr
Copy link
Copy Markdown
Contributor

ajarr commented Aug 2, 2024

I was able to follow the tests. I understand that you didn't want to modify the tests further, but does fixing this #44366 (comment) also cause issues with testing?

@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 2, 2024

I was able to follow the tests. I understand that you didn't want to modify the tests further, but does fixing this #44366 (comment) also cause issues with testing?

This seems to point to the comment on >/dev/null 2>&1 nit for rbd migration abort commands -- probably not what you meant to ask about?

@ajarr
Copy link
Copy Markdown
Contributor

ajarr commented Aug 2, 2024

I was able to follow the tests. I understand that you didn't want to modify the tests further, but does fixing this #44366 (comment) also cause issues with testing?

This seems to point to the comment on >/dev/null 2>&1 nit for rbd migration abort commands -- probably not what you meant to ask about?

That was my question. I was curious to know why this nit wasn't fixed even though it'd be helpful as you said in the comment.

idryomov and others added 4 commits August 7, 2024 12:35
Most workunits expect the user to be a member of "disk" group, so we
can pretty much rely on that being the case at this point.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
... so that RAW_DEV can be unmapped and future tests can reuse testimg
and other image names without bumping into watchers and older snapshots.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
…st ctor

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
… formats

With NativeFormat now being handled via dispatch, handling encryption
for non-native formats (i.e. mapping to raw image extents and performing
decryption/mapping back on completion) in the migration layer is really
straightforward.

Note that alignment doesn't need to be performed in the migration layer
because it happens on the destination image -- the "align and resubmit"
logic in C_UnalignedObjectReadRequest should kick in before the call to
read_parent().

Fixes: https://tracker.ceph.com/issues/53674
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov idryomov force-pushed the rbd-crypto-migration branch from 621f9ef to 0000c34 Compare August 7, 2024 10:50
@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 7, 2024

I haven't touched qa/workunits/rbd/luks-encryption.sh (just made sure that it still passes), but it could definitely use test cases that exercise the native format better and test cases that trigger some copyups before calling rbd migration execute for all formats.

With the run demonstrating that unmodified qa/workunits/rbd/luks-encryption.sh indeed passes, I improved the test coverage by adding test_migration_read_and_copyup() which is called for all formats and test_migration_clone(). This addresses the outstanding

I would leave test_clone_encryption() alone and group migration tests separately as we clearly need more than one clone + migration + encryption test.

comment and >/dev/null 2>&1 nit for rbd migration abort commands, that is #44366 (comment) and #44366 (comment).

@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 7, 2024

jenkins test api

@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 7, 2024

I improved the test coverage by adding test_migration_read_and_copyup() which is called for all formats and test_migration_clone().

https://pulpito.ceph.com/dis-2024-08-07_11:24:02-rbd-wip-dis-testing-distro-default-smithi/
https://pulpito.ceph.com/dis-2024-08-07_12:23:07-rbd-wip-dis-testing-distro-default-smithi/

@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 7, 2024

I also fixed list_snaps() which was completely missed in the previous version (although coming up with a CLI test case to demonstrate the fix is challenging due to [...] QCOWFormat having an unrelated bug in this area).

Filed https://tracker.ceph.com/issues/67416

@idryomov idryomov merged commit d5a061e into ceph:main Aug 7, 2024
@ajarr
Copy link
Copy Markdown
Contributor

ajarr commented Aug 7, 2024

I see that the PR has been merged. I wanted to note that I went through the updated tests as mentioned in #44366 (comment) and the tests looked good.

@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 8, 2024

I forgot to add a FIXME to qa/workunits/rbd/luks-encryption.sh for an issue that necessitated test_migration() to use a fully written image (16M worth of data from /tmp/testdata1), unlike test_migration_clone().

The problem is that read() employs a flat bufferlist on the encryption path, which forgoes any sparseness and leads to the same user-visible issue as https://tracker.ceph.com/issues/67402 (except that here rbd_sparse_read_threshold_bytes knob doesn't apply):

    auto ctx = new C_DecryptData(aio_comp, image_extents, crypto);
    aio_comp = io::AioCompletion::create_and_start<Context>(
        ctx, util::get_image_ctx(m_image_ctx), io::AIO_TYPE_READ);
    read_result = io::ReadResult(&ctx->bl);
...
  m_format->read(aio_comp, io_context->get_read_snap(),
                 std::move(image_extents), std::move(read_result),
                 op_flags, read_flags, parent_trace);

This was the behavior in the original submission (prior to a rewrite on top of #58882) as well:

struct C_ReadRequest : public Context {
...
    ceph::bufferlist bl;
...
      auto ctx = create_context_callback<C_ReadRequest<I>,
                                         &C_ReadRequest<I>::handle_read>(this);
      auto backing_aio_comp = io::AioCompletion::create_and_start(
              ctx, util::get_image_ctx(image_ctx), io::AIO_TYPE_READ);
...
      req = io::ImageDispatchSpec::create_read(
              *image_ctx, io::IMAGE_DISPATCH_LAYER_CRYPTO, backing_aio_comp,
              std::move(aligned_extents_copy), io::ImageArea::DATA, {&bl},
              io_context, op_flags, read_flags, parent_trace);
    }

({&bl} constructs io::ReadResult for a flat bufferlist in-place)

It's fine to do this for RawFormat where sparse extents aren't supported, but we could do much better for QCOWFormat and planned NBD format/stream (#44640).

@idryomov
Copy link
Copy Markdown
Contributor

idryomov commented Aug 8, 2024

It's fine to do this for RawFormat where sparse extents aren't supported, but we could do much better for QCOWFormat and planned NBD format/stream (#44640).

Filed https://tracker.ceph.com/issues/67439

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants