New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw: move delimiter-based bucket listing/filtering logic to cls #30272
rgw: move delimiter-based bucket listing/filtering logic to cls #30272
Conversation
jenkins test signed |
@ivancich Out of curiosity, can we still return bufferlist encoded versions of the entries over the wire while filtering in this way ala #29980 ? That also appeared to provide a pretty significant general performance boost to avoid the re-encode step (though we still have to decode for existing filtering). I think we're going to have trade-offs doing intermediate decode/filter/encode at the OSD, but we can at least make it better if it's just decode/filter but not re-encode. |
@markhpc The filtering is only done on the identifier (and only when a delimiter is specified). The filtering itself does not need to decode (although I think other filtering in those functions might in some cases). So I think these changes are essentially orthogonal to yours. |
e2d068e
to
0d51e92
Compare
34b7e4e
to
b5ba9ed
Compare
@cbodley, I accepted all your suggestions, and rebased on top of today's master. Please re-review at your earliest convenience. Thanks! |
BTW, it produces the same result as master when running s3tests_boto3.functional. |
} | ||
|
||
if (!params.delim.empty()) { | ||
int marker_delim_pos = cur_marker.name.find(params.delim, cur_prefix.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i correct that we're removing this filtering from radosgw itself, on the assumption that the osd is doing it for us? i worry about cases where radosgw is upgraded to include this change, but the osds are not - which would leave these common entries unfiltered
to avoid that, i feel like we'd need to preserve this filtering logic in radosgw for another couple releases. :(
we'd still get the correct result if filtering happens in both locations, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbodley I assume we do not need to execute the logic in radosgw when the OSD -does- the filtering...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then do we need a way to determine if the filtering did occur.... I can bump up the struct version # of rgw_cls_list_ret. I'd also add a boolean, such as cls_decoded. That boolean would never be transmitted. However the version of the structure would determine if it was true or false. If it was false, then the filtering would happen on the rgw side, but not if it's true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, rgw wouldn't know unless we extend the cls protocol. but even if the osd does this filtering, duplicating it in rgw should essentially be a noop right? if that's the case, i'd prefer to leave it as is, rather than complicating it further by making it conditional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I already did the work in cleaning up the filtering, I'd rather have a conditional. There are comments in the code that once we reach version 16 of Ceph, we can remove the compatibility code.
I ran the s3-tests in 3 combinations and got the same result each time:
- new rgw, new osd : FAILED (SKIP=5, errors=22, failures=25)
- new rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)
- old rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)
So it all seems to work. The above includes 27 boto 3 tests with the prefixes "s3tests_boto3.functional.test_s3.test_bucket_list_delimiter*" and "s3tests_boto3.functional.test_s3.test_bucket_listv2_delimiter*".
I added a commit to make sure the new rgws still work with the older osds. There are comments in the code that once we reach version 16 of Ceph, we can remove the compatibility code. I ran the s3-tests in 3 combinations and got the same result each time:
So it all seems to work. The above includes 27 boto 3 tests with the prefixes "s3tests_boto3.functional.test_s3.test_bucket_list_delimiter*" and "s3tests_boto3.functional.test_s3.test_bucket_listv2_delimiter*". |
eb6a84c
to
04fa4e5
Compare
what are the failures from? is this the ceph-master branch of s3tests with tags filtered out? |
So I'm running s3-tests master (fab38b2). When I run it with your command-line option (and with uncommenting that lifecycle line in vstart.sh), I get: ERROR: s3tests_boto3.functional.test_s3.test_object_acl_full_control_verify_owner FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_default FAILED (SKIP=5, errors=12, failures=24) |
looking for trace of discussion re: making rgw behavior conditional |
I "unresolved" that portion of the conversation, so it shows up. |
04fa4e5
to
0ddabd8
Compare
This required a complicated rebase. I re-ran s3-tests in 3 combinations -- old osd and old rgw, old osd and new rgw, new osd and new rgw -- and the results were the same across all of them. |
c42a845
to
1340a92
Compare
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
unstale |
this passed qa in http://pulpito.ceph.com/cbodley-2020-01-22_18:24:28-rgw-wip-cbodley-testing2-distro-basic-smithi/ but needs a rebase |
@cbodley Thanks. I'll do the rebase and merge. |
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Rather than hard-coding literals (of 1000) throughout the code, use a constexpr given it's the standard chunk we're reading from a bucket index. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Since we need to be able to calculate a key a path subdirectory both in rgw and cls, move the code that calculates that to cls_rgw_types.h, which is shared by both rgw and cls/osd. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Some of the logic surrounding bucket indexes and the listing thereof can be subtle. Add some comments to help future developers working on this code. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Redefine global constants that define flags in bucket index entries to struct-scoped. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
cls_cxx_map_get_vals takes a parameter of where in the sequence of keys to start returning values. In code it's often referred to as the "start_key", but in fact it only returns values who have keys *after* it. To clarify the semantics, rename the variable to "start_after_key" in the bucket listing functions in both rgw and cls. Also, declare a type RGWRados::check_filter_t, so the type does not have to be fully literalized multiple times. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Currently the cls code that returns bucket index entries does no filtering when a delimiter is specified. This means that the cls code will return many entries that the rgw code does not need. This means unnecessary bucket index entries will be encoded, decoded, and consume network bandwidth. A change is made to send a delimiter, if it's specified, to the cls code, so the filtering can be done there. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Previous commits moved bucket list filtering when a delimiter was specified to the osd/cls layer. However, since rgw's are often upgraded before osd's are, until we reach verison ceph version 16, an rgw cannot assume that the osd/cls did the filtering. This is addressed in the following ways.... First rgw_cls_list_ret now indicates whether filtering was done on the osd/cls side. And second, the old filtering code in the rgw is maintained in RGWRados::Bucket::List::list_objects_ordered, so it can still be triggered when all osd's are not doing the filtering. Once we reach ceph version 16, and there is no chance that the rgw is working with a osd running "young" version 14 code, we can remove the backward compatibility code in RGWRados::Bucket::List::list_objects_ordered. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
1340a92
to
5a67af5
Compare
@cbodley I've rebased, but it was non-trivial. I'd like to do some testing before merging. A lot of PRs are hitting the same segments of code. The 0-length listing one will need to be folded in as well, so I've temporarily added the DNM label. |
After the rebase I re-did my triplet of s3-tests. OLD OSD + OLD RGW, OLD OSD + NEW RGW, NEW OSD + NEW RGW, and all got identical results with the s3-tests. So I'll now merge. Thanks, @cbodley! |
Presently, when listing the contents of a bucket using delimiters (i.e., for simulated subdirectories), the cls/osds return many more entries than will ultimately be listed. This uses CPU to encode and decode them along with network traffic to send them. This process would be much more efficient if the logic for this was pushed down to the cls/osd level.
This PR does other clean-ups as well.
Tracker: https://tracker.ceph.com/issues/41051