Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

librbd: data corruption on mirroring cloned images #53672

Closed
wants to merge 3 commits into from

Conversation

nbalacha
Copy link
Contributor

@nbalacha nbalacha commented Sep 26, 2023

Parent data is not synced to the non-primary after a copyup operation on the primary image with snapshot based rbd-mirroring. Writing to a non-existent object in an rbd clone image triggers a copyup operation.The new write data and the data from the parent are combined and written to the new clone object and any existing snapshots on the clone are updated with the parent data.
The rbd-mirror daemon uses deep_copy with subsets of snapshots. If a copyup has modified the already synced older snapshot with the parent data, a snapshot diff will only return the new data that was written to the clone. As the deep_copy writes data to the dst object using rados calls instead of rbd apis, a copyup is not performed and the parent data is lost. The fix involves triggering a copyup on the object if the image is a clone. This does not affect other deep_copy users as they copy the image and all the existing snapshots as a whole.

Fixes: https://tracker.ceph.com/issues/61891

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@nbalacha nbalacha requested a review from a team as a code owner September 26, 2023 11:34
@nbalacha nbalacha marked this pull request as draft September 26, 2023 11:34
@github-actions github-actions bot added the rbd label Sep 26, 2023
@nbalacha
Copy link
Contributor Author

nbalacha commented Sep 26, 2023

I'm working on the following:

  • This requires that the parent image has been synced before the cloned image sync begins. If both are enabled at the same time and the parent image object has not been created/synced before the trigger_copyup is called on the clone, the read on the parent fails with ENOENT leading to the original data corruption.
  • Additional unit tests

@trociny
Copy link
Contributor

trociny commented Sep 26, 2023

@nbalacha Do we have a tracker ticket for this? We will definitely want to create a ticket in tracker.ceph.com for this issue and link it here.

@satoru-takeuchi
Copy link
Contributor

@nbalacha I guess this PR is for the following issue. Is it correct?
https://tracker.ceph.com/issues/61891

In addition, I'm glad if the backport tag in this will be filled.

@nbalacha
Copy link
Contributor Author

@nbalacha Do we have a tracker ticket for this? We will definitely want to create a ticket in tracker.ceph.com for this issue and link it here.

Done.

@nbalacha
Copy link
Contributor Author

@nbalacha I guess this PR is for the following issue. Is it correct? https://tracker.ceph.com/issues/61891

In addition, I'm glad if the backport tag in this will be filled.

Thank you @satoru-takeuchi . I will check about the backport tag.

@github-actions github-actions bot added the tests label Oct 9, 2023
@nbalacha
Copy link
Contributor Author

nbalacha commented Oct 9, 2023

Still WIP. The new tests do not succeed atm.

if (io::util::trigger_copyup(
m_dst_image_ctx, m_dst_object_number, io_context, ctx)) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not a review, just something that came up in a conversation about mocking)

This would hang (and leak some memory) if io::util::trigger_copyup() returns false because you just bail from ObjectCopyRequest::trigger_copyup() in that case. If it's "not possible" for io::util::trigger_copyup() to return false here, an assert would be nice.

@nbalacha
Copy link
Contributor Author

nbalacha commented Nov 2, 2023

@trociny , can you please take a look and let me know if this is the right approach?

Copy link
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the approach seems right to me. There are comments though.

bool r = io::util::trigger_copyup(
m_dst_image_ctx, m_dst_object_number, io_context, ctx);
if (r == false) {
// No parent found
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to delete ctx manually here otherwise you will leak a memory when trigger_copyup is failed (as Ilya mentioned).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

m_dst_image_ctx, m_dst_object_number, io_context, ctx);
if (r == false) {
// No parent found
send_update_object_map();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would add return here for clarity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -135,6 +135,9 @@ class ObjectCopyRequest {
void send_update_object_map();
void handle_update_object_map(int r);

void trigger_copyup();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need to update the state diagram above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ldout(m_cct, 20) << "NITHYA: m_src_snap_id_start: " << m_src_snap_id_start << ", parent:" << m_src_image_ctx->parent << dendl;
if (m_src_snap_id_start != 0 && m_src_image_ctx->parent != nullptr) {
ldout(m_cct, 20) << dendl;
auto io_context = m_dst_image_ctx->duplicate_data_io_context();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to duplicate data io context instead of just using it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// For rbd-mirror scenarios, we trigger a copyup on the object and write the parent
// data to the object before writing the new clone image data.
ldout(m_cct, 20) << "NITHYA: m_src_snap_id_start: " << m_src_snap_id_start << ", parent:" << m_src_image_ctx->parent << dendl;
if (m_src_snap_id_start != 0 && m_src_image_ctx->parent != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to check m_dst_image_ctx->parent instead? Also, I think you should be stricter, i.e. you should try to trigger copyup only when the dst_snap_id_start is "attached" to the parent. Otherwise you will trigger copyup on every new mirrored snapshot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the first point. The trigger_copyup needs to be called only on cloned images hence the check for a non-null src parent. The dst image would be a clone as well as the mirror daemon requires the parent to be mirrored in this scenario.

trigger copyup only when the dst_snap_id_start is "attached" to the parent.
I'll look into this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why not to check the destination image for the parent (where we will need to trigger copyup)? Note, I am thinking here about a more generic scenario, not only the current rbd-mirror behavior. E.g. someone did rbd deep-copy with --flatten option (i.e. the destination image will not have a parent), and then later deep-copy only newer snapshots (like mirroring does). Currently we don't provide this feature but someone might want it for rbd deep-copy or some new application.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, rbd-mirror might want this flatten option too, e.g. when the parent image is not mirrored. My point is that you cannot assume that the destination image will always have a parent if the source image has it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC , the flatten option with mirroring and the incremental snapshot deep copy are possible future features. We are currently not targeting mirroring without the parent images being mirrored.
I would like to work on the current issue where the parent is also required to be mirrored as we have a requirement to get this fixed and address the other scenarios later.

// This causes the rbd-mirror to ignore the parent data and only sync the new write
// to the clone, causing data corruption.
// For rbd-mirror scenarios, we trigger a copyup on the object and write the parent
// data to the object before writing the new clone image data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would not use rbd-mirror/mirroring words here, it may be any other application or tool that is doing deep copy for a clone. BTW, I have not checked but I suppose the rbd deep-copy could be also extended to provide similar functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can clarify that further in the comment. The issue is seen only with rbd-mirroring as of now. The regular deep copy/migration etc work correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not about clarifying. It is about mentioning a specific application (rbd-mirror) in librbd code. Just describe the scenario but don't specify it is "rbd-mirror" scenario. Actually I am not sure we need this comment at all, but I would leave this to Ilya to decide.

Copy link
Contributor Author

@nbalacha nbalacha Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this comment is required to help understand what is going on. The lack of comments in the code makes it difficult to figure out why something is being done.
I have rewritten it to remove mention of rbd-mirror, however I personally think we need to have something written up as to which components use these functions and how.

Copy link

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jan 22, 2024
@idryomov idryomov removed the stale label Jan 22, 2024
@nbalacha nbalacha marked this pull request as ready for review January 24, 2024 10:24
@nbalacha
Copy link
Contributor Author

nbalacha commented Jan 24, 2024

@trociny , I have run into an issue where user snaps on the clone are not being mirrored correctly. I cannot find anything in the logs to indicate why. Do you have any idea as to what I should be looking at?

The following test fails:

bin/rbd --cluster site-a create -s 8M data/src1
bin/rbd --cluster site-a map -t nbd data/src1

xfs_io -d -c 'pwrite -S 0xadad -b 4M 0 5M' /dev/nbd0

bin/rbd --cluster site-a snap create data/src1@snap1
bin/rbd --cluster site-a snap protect data/src1@snap1
bin/rbd --cluster site-a unmap -t nbd data/src1

bin/rbd --cluster site-a clone data/src1@snap1 data/clone1
bin/rbd --cluster site-a map -t nbd data/clone1


bin/ceph --cluster site-b config set global debug_rbd 20
bin/ceph --cluster site-b config set global debug_rbd_mirror 20

bin/rbd --cluster site-a mirror image enable data/src1 snapshot
sleep 15
bin/rbd --cluster site-a snap create data/clone1@snap1
bin/rbd --cluster site-a mirror image enable data/clone1 snapshot
sleep 5
xfs_io -d -c 'pwrite -S 0x11 -b 4M 7M 1M' /dev/nbd0
xfs_io -d -c 'pread -v 4M 512' /dev/nbd0
xfs_io -d -c 'pread -v 7M 512' /dev/nbd0
bin/rbd --cluster site-a snap create data/clone1@snap2
bin/rbd --cluster site-a mirror image snapshot data/clone1
sleep 10
xfs_io -d -c 'pwrite -S 0x22 -b 4M 5M 2M' /dev/nbd0
bin/rbd --cluster site-a snap create data/clone1@snap3
bin/rbd --cluster site-a mirror image snapshot data/clone1

for i in {1..3}; do bin/rbd --cluster site-a map -t nbd data/clone1@snap$i; bin/rbd --cluster site-b map -t nbd data/clone1@snap$i;diff /dev/nbd0  /dev/nbd1; bin/rbd --cluster site-a unmap -t nbd data/clone1@snap$i; bin/rbd --cluster site-b unmap -t nbd data/clone1@snap$i;done

The actual final images match but snap2 and snap3 do not match on the primary and secondary.
I am working on this in parallel but would like it if this PR could be merged without waiting a fix.

Copy link
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the approach looks right.

Please split the commit into two: librbd and rbd_mirror parts.

@@ -120,12 +121,18 @@ class CreateImageRequest {
void open_remote_parent_image();
void handle_open_remote_parent_image(int r);

void open_local_parent_image();
void handle_open_local_parent_image(int r);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you also need to update the diagram above to include the new states.

@@ -275,6 +307,46 @@ void CreateImageRequest<I>::clone_image() {
close_remote_parent_image();
return;
}
if (m_mirror_image_mode == cls::rbd::MIRROR_IMAGE_MODE_SNAPSHOT) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if you did this check in handle_open_local_parent_image. This way it would be more clear why we needed to open the local parent image.

This would also mean that the checks above should be moved to handle_open_remote_parent_image and member variables m_snap_name and m_snap_namespace introduced to store the result.

Ideally, I think I would like we introduce image_replayer::ValidateSnapshot request (or something like this) that would call image_replayer::snapshot::ValidateSnapshot for the snapshot based mirroring, where this code could be moved. And for the journal based mirroring, it could be noop, or it could also have ``image_replayer::journal::ValidateSnapshot` that would just check the image snapshot exists. But I will not insist on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryomov and I did discuss having a separate snapshot/journal class implementations but thought this approach would be ok.

I'll look into moving the checks into the open functions.

// This causes the rbd-mirror to ignore the parent data and only sync the new write
// to the clone, causing data corruption.
// For rbd-mirror scenarios, we trigger a copyup on the object and write the parent
// data to the object before writing the new clone image data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not about clarifying. It is about mentioning a specific application (rbd-mirror) in librbd code. Just describe the scenario but don't specify it is "rbd-mirror" scenario. Actually I am not sure we need this comment at all, but I would leave this to Ilya to decide.

// object, the older snaps will be updated with the parent data.
// When called by snapshot based rbd-mirror, the diff in the older snap is ignored
// as it was processed earlier, causing the snapshot_delta to not include
// the parent data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we need this comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it helps explain what happens - it took me a long time to figure this out.

@trociny
Copy link
Contributor

trociny commented Jan 30, 2024

Please split the commit into two: librbd and rbd_mirror parts.

@nbalacha what about this?

@nbalacha
Copy link
Contributor Author

nbalacha commented Feb 9, 2024

Please split the commit into two: librbd and rbd_mirror parts.

@nbalacha what about this?

Sorry. Missed that earlier. Updated now.

@trociny
Copy link
Contributor

trociny commented Feb 13, 2024

@nbalacha jenkins build failed:

In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/deep_copy/test_mock_ObjectCopyRequest.cc:19:
/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/io/Utils.h:54:6: error: function 'librbd::io::util::trigger_copyup<librbd::(anonymous namespace)::MockTestImageCtx>' is used but not defined in this translation unit, and cannot be defined in any other translation unit because its type does not have linkage
bool trigger_copyup(ImageCtxT *image_ctx, uint64_t object_no,
     ^
/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/deep_copy/ObjectCopyRequest.cc:854:24: note: used here
    bool r = io::util::trigger_copyup(m_dst_image_ctx, m_dst_object_number,
                       ^
1 error generated.

Parent data is not synced to the non-primary with snapshot based rbd-mirroring
when the primary clone image is written to. The rbd-mirror daemon uses
deep_copy with subsets of snapshots. If a copyup operation on the clone image
modifies the already synced older snapshot with the parent data, a snapshot
diff will only return the new data that was written to the clone.
The fix checks that the parent image snap from which the clone was created
has been synced before creating the non-primary clone image.

Fixes: https://tracker.ceph.com/issues/61891

Signed-off-by: N Balachandran <nibalach@redhat.com>
Parent data is not synced to the non-primary after a copyup operation
on the primary image with snapshot based rbd-mirroring.
Writing to a non-existent object in an rbd clone image triggers a copyup
operation.The new  write data and the data from the parent are combined and
written to the new clone object and any existing snapshots on the clone are
updated with the parent data.
The rbd-mirror daemon uses deep_copy with subsets of snapshots. If a copyup
has modified the already synced older snapshot with the parent data, a
snapshot diff will only return the new data that was written to the clone. As
the deep_copy writes data to the dst object using rados calls instead of
rbd apis, a copyup is not performed and the parent data is lost.
The fix involves triggering a copyup on the object if the image is a clone.
This does not affect other deep_copy users as they copy the image and
all the existing snapshots as a whole.

Fixes: https://tracker.ceph.com/issues/61891

Signed-off-by: N Balachandran <nibalach@redhat.com>
These tests do not pass at the moment but are required in
order to get the build to succeed.

Signed-off-by: N Balachandran <nibalach@redhat.com>
@nbalacha
Copy link
Contributor Author

@nbalacha jenkins build failed:

In file included from /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/librbd/deep_copy/test_mock_ObjectCopyRequest.cc:19:
/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/io/Utils.h:54:6: error: function 'librbd::io::util::trigger_copyup<librbd::(anonymous namespace)::MockTestImageCtx>' is used but not defined in this translation unit, and cannot be defined in any other translation unit because its type does not have linkage
bool trigger_copyup(ImageCtxT *image_ctx, uint64_t object_no,
     ^
/home/jenkins-build/build/workspace/ceph-pull-requests/src/librbd/deep_copy/ObjectCopyRequest.cc:854:24: note: used here
    bool r = io::util::trigger_copyup(m_dst_image_ctx, m_dst_object_number,
                       ^
1 error generated.

Done. The tests still fail though. @trociny , any idea as to what might be causing #53672 (comment) ?

@nbalacha nbalacha closed this Feb 15, 2024
@nbalacha nbalacha deleted the tracker-61891-2 branch February 15, 2024 11:36
@trociny
Copy link
Contributor

trociny commented Feb 16, 2024

@nbalacha Why did you close the PR? Was it accidental?

@trociny
Copy link
Contributor

trociny commented Feb 16, 2024

The tests still fail though. @trociny , any idea as to what might be causing #53672 (comment) ?

Have you investigated what is exactly different in the snapshots that do not match?

@nbalacha
Copy link
Contributor Author

I renamed the branch yesterday - looks like it has assumed it was deleted.
I shall create a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants