rgw: fix abort multipart in lc when enable index shard #26480
Signed-off-by: yuliyang <yuliyang@cmss.chinamobile.com>
Hi @cbodley, would you mind taking a look? Thanks.
@ivancich, could you have a look at this as well, i.e., as a potential reproducer downstream?
This is an interesting issue and leads to an important insight.
Looking at the code, the issue seems to be that the start parameter of cls_bucket_list_unordered is used in two different ways: as a marker that lets sequential calls move through the entire listing, and as a prefix that jumps past uninteresting entries to reach the interesting ones.
The code works fine in the former case. But because the listing is unordered, using start as a prefix fails. Note that the ordered variant, cls_bucket_list_ordered, works fine in both cases.
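To make the two uses of start concrete, here is a toy model of a sharded bucket index. It is not RGW code: shard placement is supplied by the caller (standing in for RGW's key hash), each shard is read in sorted order, and a non-empty start is assumed, as a simplification, to pick the starting shard from its own placement and to resume after it within that shard. All names are hypothetical.

```python
from typing import Callable, Iterator, List

# Toy model only: shard_fn stands in for RGW's key-to-shard hash.
ShardFn = Callable[[str], int]


def list_unordered(shards: List[List[str]], shard_fn: ShardFn,
                   start: str = "") -> Iterator[str]:
    """Walk shards in index order, each shard in sorted key order.

    A non-empty start is treated as a marker: the walk begins in the
    shard that start maps to and skips keys <= start in that shard.
    """
    first = shard_fn(start) if start else 0
    for i in range(first, len(shards)):
        for key in shards[i]:
            if start and i == first and key <= start:
                continue
            yield key


def collect_prefixed_buggy(shards: List[List[str]], shard_fn: ShardFn,
                           prefix: str) -> List[str]:
    """Misuse start as a prefix and stop at the first non-matching key."""
    found = []
    for key in list_unordered(shards, shard_fn, start=prefix):
        if not key.startswith(prefix):
            break  # wrong for unordered output: more matches may follow
        found.append(key)
    return found


# Each shard is kept sorted, as the model assumes.
shards = [["_multipart_a", "photo.jpg"], ["_multipart_b"]]
placement = {"_multipart_a": 0, "photo.jpg": 0, "_multipart_b": 1}


def with_prefix_on(shard: int) -> ShardFn:
    """Placement in which the prefix string itself maps to the given shard."""
    return {**placement, "_multipart_": shard}.__getitem__


# Case 1: the prefix maps to shard 0; the loop stops at "photo.jpg".
case1 = collect_prefixed_buggy(shards, with_prefix_on(0), "_multipart_")
# Case 2: the prefix maps to shard 1; shard 0 is never visited.
case2 = collect_prefixed_buggy(shards, with_prefix_on(1), "_multipart_")
print(case1, case2)  # ['_multipart_a'] ['_multipart_b']
```

Both cases lose one _multipart_ entry; which one is lost depends only on where the start string happens to land.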
The proposed solution seems to be a hack: it treats a start of _multipart_ in a special manner.
A simpler solution would be to call cls_bucket_list_ordered whenever we want to do a prefix search and expect all entries that begin with _multipart_ to emerge sequentially. That would allow the caller's loop to exit as soon as it encounters the first entry without the _multipart_ prefix.
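In a similar toy model (again not RGW code), an ordered listing can be approximated as a k-way merge of the sorted shards with heapq.merge. Because keys then come out globally sorted, every key with a given prefix is contiguous, so the caller can stop at the first key without the prefix. The function names are hypothetical.

```python
import heapq
from typing import Iterator, List


def list_ordered(shards: List[List[str]], start: str = "") -> Iterator[str]:
    """Merge the sorted shards into one globally sorted stream of keys > start."""
    for key in heapq.merge(*shards):
        if key > start:
            yield key


def collect_prefixed_ordered(shards: List[List[str]], prefix: str) -> List[str]:
    """Use start as a prefix: matching keys are contiguous in sorted order."""
    found = []
    for key in list_ordered(shards, start=prefix):
        if not key.startswith(prefix):
            break  # sorted order: no later key can start with prefix
        found.append(key)
    return found


shards = [["_multipart_a", "photo.jpg"], ["_multipart_b"]]
print(collect_prefixed_ordered(shards, "_multipart_"))  # ['_multipart_a', '_multipart_b']
```

The trade-off is that an ordered listing has to consult every shard to produce each batch, which is presumably why the unordered variant is preferred for large, heavily sharded buckets.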
If we really do want to use cls_bucket_list_unordered, then the calling code needs to be aware of its semantics. On the first call it needs to pass the empty string as start, and then use start purely as a marker as it moves through the listing. The calling code also needs to be aware that non-_multipart_ entries will be interspersed among the _multipart_ entries, so it will have to exhaust the listing.
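The unordered alternative can be sketched in the same kind of toy model: begin with an empty marker, pass back the marker returned with each page, filter every page by prefix, and stop only when the listing is exhausted. For self-containment the toy encodes the marker as a (shard index, last key) pair, whereas RGW uses an object key as the marker; the names and page size are hypothetical.

```python
from typing import List, Optional, Tuple

# Toy marker encoding: (shard index, last key returned from that shard).
Marker = Tuple[int, str]


def list_unordered_page(shards: List[List[str]], marker: Optional[Marker],
                        max_entries: int) -> Tuple[List[str], Optional[Marker]]:
    """Return up to max_entries keys and the marker to resume from (None when done)."""
    shard, after = marker if marker else (0, "")
    page: List[str] = []
    while shard < len(shards):
        for key in shards[shard]:
            if after and key <= after:
                continue  # already returned on an earlier page
            page.append(key)
            if len(page) == max_entries:
                return page, (shard, key)
        shard, after = shard + 1, ""  # later shards are read from the beginning
    return page, None


def collect_prefixed_unordered(shards: List[List[str]], prefix: str,
                               max_entries: int = 2) -> List[str]:
    """Page through the whole listing, keeping only keys with the prefix."""
    found: List[str] = []
    marker: Optional[Marker] = None  # first call: empty marker
    while True:
        page, marker = list_unordered_page(shards, marker, max_entries)
        found.extend(k for k in page if k.startswith(prefix))
        if marker is None:  # listing exhausted
            return found


shards = [["_multipart_a", "photo.jpg"], ["_multipart_b"]]
print(collect_prefixed_unordered(shards, "_multipart_"))  # ['_multipart_a', '_multipart_b']
```

This visits every entry, including the non-matching ones, which is the cost of keeping the unordered listing.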
I'd like to discuss this with Matt and Casey; at this moment I do not know which would be the better solution. I do not believe that the proposed solution is ideal, though.
@mattbenjamin, @cbodley, would you mind looking at my review comment and weighing in? As you'll see, I think we need to either change the call to the […] Either way, I think I need to better document […]
I've put together what I believe is a more general fix for the issue you identified (thank you!). I tried to add you as a reviewer but was unable to, and I'm not sure why.
For the reasons I outlined previously, I do not think this is appropriate for merging.
In my previous comments I misdescribed the issue. I thought the prefix was being passed in explicitly as the marker. What's actually happening is more complex: the namespace is being moved into the marker.
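For context on how a namespace can end up in the marker: as far as we understand RGW's index-key convention (an assumption here, not checked against this PR's code), a namespaced object's index key is its name prefixed with "_<ns>_", and a plain name that itself starts with "_" is escaped with one more "_". A minimal sketch of that mapping, with a hypothetical function name:

```python
def index_key(name: str, ns: str = "") -> str:
    """Approximate bucket-index key for an object name in namespace ns (assumed convention)."""
    if ns:
        return f"_{ns}_{name}"
    if name.startswith("_"):
        return "_" + name  # escape plain names so they cannot collide with the namespaced form
    return name


# An empty name in the "multipart" namespace yields the bare prefix, which is
# one way a namespaced listing's starting point can turn into "_multipart_".
print(index_key("", "multipart"))  # _multipart_
```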
@joke-lee, after you look at the alternate solution, if you agree, perhaps we can close this PR. Again, thank you for your help here; it was instrumental.
rgw_override_bucket_index_max_shards = 128
rgw_enable_lc_threads = false
rgw_lifecycle_work_time = "00:00-24:00"
rgw lc debug interval = 1
With the shard count set to 128, we checked which shard the object is stored in: shard 39.
We then set the lifecycle rule, and the multipart abort never executes.