rgw: Fix multisite Synchronization failed when read and write delete … #20814

Merged
merged 1 commit into from Apr 12, 2018


niupengju

…at the same time

The case: objA is written once, then a second write and a delete of objA
are issued at the same time, and the write lands before the delete.
The delete uses the state obtained by stat'ing the first write of objA,
so the op is meant to delete the first write's data. However, by the time
the delete runs, the objA header belongs to the second write, so the OSD's
"do_xattr_cmp_str" detects the idtag change and returns -125 (canceled).
After the rgw client receives -125, it still calls "complete_del", which
calls cls_obj_complete_del to write the bilog. "complete_op" in the
cls_rgw module writes the bilog entry with the second write's mtime and
the second write's ".ver.epoch". As a result, the del op is appended after
the second write in the bilog. The slave rgw then merges the write op and
the del op into a del op and deletes the data, whereas the master rgw
completes the second write and cancels the del.
This logic is problematic, so the bilog entry recording the del op should
use the first write's mtime, which keeps the operations correctly ordered.

Fixes: http://tracker.ceph.com/issues/22804
Signed-off-by: Niu Pengju <pengju.niu@xtaotech.com>
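
The divergence described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `ToyPrimary`, `replay`, and `run_race` are hypothetical names, the bilog is reduced to `(op, key)` pairs, and the secondary is assumed to keep only the last op per key. None of this is the actual RGW or cls_rgw code.

```python
ECANCELED = 125  # errno value quoted in the description above


class ToyPrimary:
    """Hypothetical stand-in for the master zone; not the RGW implementation."""

    def __init__(self):
        self.objects = {}  # key -> idtag of the current write
        self.bilog = []    # ordered (op, key) entries

    def write(self, key, idtag):
        self.objects[key] = idtag
        self.bilog.append(("write", key))

    def delete(self, key, expected_idtag):
        # Guarded delete: remove only if the idtag still matches the stat'ed one.
        if self.objects.get(key) == expected_idtag:
            del self.objects[key]
            ret = 0
        else:
            ret = -ECANCELED
        # Behaviour described above: the del is logged even when it was canceled.
        self.bilog.append(("del", key))
        return ret


def replay(bilog):
    """Toy secondary: keep only the last op per key, then apply it."""
    last = {}
    for op, key in bilog:
        last[key] = op
    return {key for key, op in last.items() if op == "write"}


def run_race():
    primary = ToyPrimary()
    primary.write("objA", "tag1")
    stat_tag = primary.objects["objA"]  # the delete stats the first write
    primary.write("objA", "tag2")       # the second write lands first
    ret = primary.delete("objA", stat_tag)
    return ret, set(primary.objects), replay(primary.bilog)
```

In this model `run_race()` leaves objA present on the primary, while the replayed log leaves the secondary without it, which is the inconsistency reported in the tracker issue.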

@tchaikov tchaikov added the rgw label Mar 9, 2018
@yehudasa
Member

@niupengju there was a reason why we use the object's mtime in the delete, I think it was because it was needed for the tombstone cache.
I think the way forward here is in the delete operation to not complete successfully if we identify a racing write, but to cancel the index change.
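
A minimal sketch of that suggestion, assuming the delete path can branch on the return value of the guarded remove. The function name `index_followup` and the string labels are hypothetical and only illustrate the decision; they are not the RGW index-op API.

```python
ECANCELED = 125  # the "canceled" errno quoted in the PR description


def index_followup(ret):
    """Pick the bucket-index follow-up for a delete that returned `ret`.

    Returns a (action, need_invalidate) pair:
      - "complete_del" when the remove succeeded, so a del is logged;
      - "cancel" when it failed, so the pending index change is dropped.
    need_invalidate marks the raced case, where cached object state is stale.
    """
    need_invalidate = (ret == -ECANCELED)
    if ret >= 0:
        return "complete_del", need_invalidate
    return "cancel", need_invalidate
```

With this shape, a delete that lost the race to a write (`ret == -125`) yields `("cancel", True)` rather than being treated as a successful removal.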

@niupengju
Author

@yehudasa yeah, when I ran into this problem I had two solutions in mind.

  1. This PR: since the delete targets the first write, the del complete entry in the bilog should use the first write's mtime, not the latest mtime.
  2. Treat the delete as not completed successfully and write a cancel into the bilog instead. However, the code has the comment "/* raced with another operation, we can regard it as removed */".

Can we use the second solution?

@cbodley
Contributor

cbodley commented Apr 5, 2018

hi @niupengju, i've opened #21262 which follows that second approach - does that look reasonable to you?

…at the same time

The case: objA is written once, then a second write and a delete of objA
are issued at the same time, and the write lands before the delete.
The delete uses the state obtained by stat'ing the first write of objA,
so the op is meant to delete the first write's data. However, by the time
the delete runs, the objA header belongs to the second write, so the OSD's
"do_xattr_cmp_str" detects the idtag change and returns -125 (canceled).
After the rgw client receives -125, it still calls "complete_del", which
calls cls_obj_complete_del to write the bilog. "complete_op" in the
cls_rgw module writes the bilog entry with the second write's mtime and
the second write's ".ver.epoch". As a result, the del op is appended after
the second write in the bilog. The slave rgw then merges the write op and
the del op into a del op and deletes the data, whereas the master rgw
completes the second write and cancels the del.
This logic is problematic, so the bilog should record the raced del op as
a cancel op, and squash_map should skip the cancel op.

Fixes: http://tracker.ceph.com/issues/22804
Signed-off-by: Niu Pengju <pengju.niu@xtaotech.com>
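
The squashing rule in the last sentence can be sketched as follows. This assumes a simplified log of `(key, op)` pairs with ops `"write"`, `"del"`, and `"cancel"`; the function `squash` is a hypothetical illustration and does not reproduce how squash_map is built in the RGW sync code.

```python
def squash(entries):
    """Return the last effective op per key, ignoring canceled ops."""
    squashed = {}
    for key, op in entries:
        if op == "cancel":
            continue  # a canceled op changed nothing on the source zone
        squashed[key] = op
    return squashed
```

For the race in this commit message, the log `write, write, cancel` for objA squashes to a write, so the secondary keeps the object just as the master does.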
@niupengju
Author

hi @cbodley, yeah, that's the second approach, but it is incomplete. It should also take into account the return value of delete_obj and the op merging in squash_map.

@niupengju
Author

@yehudasa please help review, thank you!

Contributor

@cbodley cbodley left a comment


looks good 👍 let's get this through testing

4 participants