librbd: data corruption on mirroring cloned images #53672
I'm working on the following:
@nbalacha Do we have a tracker ticket for this? We will definitely want to create a ticket in tracker.ceph.com for this issue and link it here.
@nbalacha I guess this PR is for the following issue. Is that correct? Also, I'd be glad if the backport tag could be filled in.
Force-pushed from 072df44 to 806284f.
Done.
Thank you @satoru-takeuchi. I will look into the backport tag.
Still WIP. The new tests do not pass at the moment.
if (io::util::trigger_copyup(
      m_dst_image_ctx, m_dst_object_number, io_context, ctx)) {
  return;
}
(Not a review, just something that came up in a conversation about mocking.)
This would hang (and leak some memory) if io::util::trigger_copyup() returns false, because you just bail from ObjectCopyRequest::trigger_copyup() in that case. If it's "not possible" for io::util::trigger_copyup() to return false here, an assert would be nice.
Force-pushed from e85f453 to 3dd9f10.
Force-pushed from c9eaf25 to f17f0a6.
@trociny, can you please take a look and let me know if this is the right approach?
In general the approach seems right to me. There are comments though.
bool r = io::util::trigger_copyup(
    m_dst_image_ctx, m_dst_object_number, io_context, ctx);
if (r == false) {
  // No parent found
You need to delete ctx manually here; otherwise you will leak memory when trigger_copyup fails (as Ilya mentioned).
Done
    m_dst_image_ctx, m_dst_object_number, io_context, ctx);
if (r == false) {
  // No parent found
  send_update_object_map();
nit: I would add return here for clarity.
Done.
@@ -135,6 +135,9 @@ class ObjectCopyRequest {
  void send_update_object_map();
  void handle_update_object_map(int r);

  void trigger_copyup();
You also need to update the state diagram above.
Done
ldout(m_cct, 20) << "NITHYA: m_src_snap_id_start: " << m_src_snap_id_start << ", parent:" << m_src_image_ctx->parent << dendl;
if (m_src_snap_id_start != 0 && m_src_image_ctx->parent != nullptr) {
  ldout(m_cct, 20) << dendl;
  auto io_context = m_dst_image_ctx->duplicate_data_io_context();
Why do you need to duplicate data io context instead of just using it?
Done
// For rbd-mirror scenarios, we trigger a copyup on the object and write the parent
// data to the object before writing the new clone image data.
ldout(m_cct, 20) << "NITHYA: m_src_snap_id_start: " << m_src_snap_id_start << ", parent:" << m_src_image_ctx->parent << dendl;
if (m_src_snap_id_start != 0 && m_src_image_ctx->parent != nullptr) {
I think you need to check m_dst_image_ctx->parent instead? Also, I think you should be stricter, i.e. you should try to trigger copyup only when the dst_snap_id_start is "attached" to the parent. Otherwise you will trigger copyup on every new mirrored snapshot.
I'm not sure I understand the first point. trigger_copyup needs to be called only on cloned images, hence the check for a non-null src parent. The dst image would also be a clone, as the mirror daemon requires the parent to be mirrored in this scenario.
trigger copyup only when the dst_snap_id_start is "attached" to the parent.
I'll look into this.
But why not check the destination image for the parent (where we will need to trigger copyup)? Note, I am thinking here about a more generic scenario, not only the current rbd-mirror behavior. E.g. someone did rbd deep-copy with the --flatten option (i.e. the destination image will not have a parent), and then later deep-copies only newer snapshots (like mirroring does). Currently we don't provide this feature, but someone might want it for rbd deep-copy or some new application.
BTW, rbd-mirror might want this flatten option too, e.g. when the parent image is not mirrored. My point is that you cannot assume that the destination image will always have a parent if the source image has it.
IIUC, the flatten option with mirroring and incremental snapshot deep copy are possible future features. We are currently not targeting mirroring without the parent images being mirrored.
I would like to fix the current issue, where the parent is also required to be mirrored, since we need this fixed, and address the other scenarios later.
// This causes the rbd-mirror to ignore the parent data and only sync the new write
// to the clone, causing data corruption.
// For rbd-mirror scenarios, we trigger a copyup on the object and write the parent
// data to the object before writing the new clone image data.
nit: I would not use the words rbd-mirror/mirroring here; it may be any other application or tool that is doing a deep copy for a clone. BTW, I have not checked, but I suppose rbd deep-copy could also be extended to provide similar functionality.
Maybe I can clarify that further in the comment. The issue is seen only with rbd-mirror as of now. Regular deep copy, migration, etc. work correctly.
It is not about clarifying. It is about mentioning a specific application (rbd-mirror) in librbd code. Just describe the scenario but don't specify it is "rbd-mirror" scenario. Actually I am not sure we need this comment at all, but I would leave this to Ilya to decide.
I believe this comment is required to help understand what is going on. The lack of comments in the code makes it difficult to figure out why something is being done.
I have rewritten it to remove the mention of rbd-mirror. However, I personally think we need something written up describing which components use these functions and how.
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
Force-pushed from f17f0a6 to ccd3e7d.
@trociny, I have run into an issue where user snapshots on the clone are not being mirrored correctly. I cannot find anything in the logs to indicate why. Do you have any idea what I should be looking at? The following test fails:
The final images match, but snap2 and snap3 differ between the primary and the secondary.
In general the approach looks right.
Please split the commit into two: librbd and rbd_mirror parts.
@@ -120,12 +121,18 @@ class CreateImageRequest {
  void open_remote_parent_image();
  void handle_open_remote_parent_image(int r);

  void open_local_parent_image();
  void handle_open_local_parent_image(int r);
nit: you also need to update the diagram above to include the new states.
@@ -275,6 +307,46 @@ void CreateImageRequest<I>::clone_image() {
    close_remote_parent_image();
    return;
  }
  if (m_mirror_image_mode == cls::rbd::MIRROR_IMAGE_MODE_SNAPSHOT) {
I would prefer if you did this check in handle_open_local_parent_image. This way it would be clearer why we needed to open the local parent image. This would also mean that the checks above should be moved to handle_open_remote_parent_image, with member variables m_snap_name and m_snap_namespace introduced to store the result.
Ideally, I would like us to introduce an image_replayer::ValidateSnapshot request (or something like this) that would call image_replayer::snapshot::ValidateSnapshot for snapshot-based mirroring, where this code could be moved. For journal-based mirroring, it could be a no-op, or there could also be an image_replayer::journal::ValidateSnapshot that would just check that the image snapshot exists. But I will not insist on this.
@idryomov and I did discuss having separate snapshot/journal class implementations but thought this approach would be OK.
I'll look into moving the checks into the open functions.
// object, the older snaps will be updated with the parent data.
// When called by snapshot based rbd-mirror, the diff in the older snap is ignored
// as it was processed earlier, causing the snapshot_delta to not include
// the parent data.
I am not sure we need this comment here.
I think it helps explain what happens - it took me a long time to figure this out.
Force-pushed from ccd3e7d to 6f72642.
@nbalacha what about this?
Force-pushed from 6f72642 to dc5389c.
Sorry, missed that earlier. Updated now.
@nbalacha Jenkins build failed:
Parent data is not synced to the non-primary with snapshot based rbd-mirroring when the primary clone image is written to. The rbd-mirror daemon uses deep_copy with subsets of snapshots. If a copyup operation on the clone image modifies the already synced older snapshot with the parent data, a snapshot diff will only return the new data that was written to the clone. The fix checks that the parent image snap from which the clone was created has been synced before creating the non-primary clone image. Fixes: https://tracker.ceph.com/issues/61891 Signed-off-by: N Balachandran <nibalach@redhat.com>
Parent data is not synced to the non-primary after a copyup operation on the primary image with snapshot based rbd-mirroring. Writing to a non-existent object in an rbd clone image triggers a copyup operation. The new write data and the data from the parent are combined and written to the new clone object, and any existing snapshots on the clone are updated with the parent data. The rbd-mirror daemon uses deep_copy with subsets of snapshots. If a copyup has modified the already synced older snapshot with the parent data, a snapshot diff will only return the new data that was written to the clone. As deep_copy writes data to the dst object using RADOS calls instead of RBD APIs, a copyup is not performed and the parent data is lost. The fix involves triggering a copyup on the object if the image is a clone. This does not affect other deep_copy users, as they copy the image and all the existing snapshots as a whole. Fixes: https://tracker.ceph.com/issues/61891 Signed-off-by: N Balachandran <nibalach@redhat.com>
These tests do not pass at the moment but are required in order to get the build to succeed. Signed-off-by: N Balachandran <nibalach@redhat.com>
Force-pushed from 42c0f9f to e4ca4bd.
Done. The tests still fail, though. @trociny, any idea what might be causing #53672 (comment)?
@nbalacha Why did you close the PR? Was it accidental?
Have you investigated what exactly is different in the snapshots that do not match?
I renamed the branch yesterday; it looks like that was treated as a deletion.