rgw: move delimiter-based bucket listing/filtering logic to cls #30272

ivancich · 2019-09-09T21:54:30Z

Presently, when listing the contents of a bucket using delimiters (i.e., for simulated subdirectories), the cls/osds return many more entries than will ultimately be listed. This uses CPU to encode and decode them along with network traffic to send them. This process would be much more efficient if the logic for this was pushed down to the cls/osd level.

This PR does other clean-ups as well.

Tracker: https://tracker.ceph.com/issues/41051

ivancich · 2019-09-10T19:59:12Z

jenkins test signed

markhpc · 2019-09-12T13:59:29Z

@ivancich Out of curiosity, can we still return bufferlist encoded versions of the entries over the wire while filtering in this way ala #29980 ? That also appeared to provide a pretty significant general performance boost to avoid the re-encode step (though we still have to decode for existing filtering).

I think we're going to have trade-offs doing intermediate decode/filter/encode at the OSD, but we can at least make it better if it's just decode/filter but not re-encode.

ivancich · 2019-09-12T15:03:38Z

@markhpc The filtering is only done on the identifier (and only when a delimiter is specified). The filtering itself does not need to decode (although I think other filtering in those functions might in some cases). So I think these changes are essentially orthogonal to yours.

src/cls/rgw/cls_rgw_types.h

src/cls/rgw/cls_rgw.cc

src/cls/rgw/cls_rgw_types.h

src/cls/rgw/cls_rgw.cc

ivancich · 2019-09-24T19:54:02Z

@cbodley, I accepted all your suggestions, and rebased on top of today's master. Please re-review at your earliest convenience. Thanks!

ivancich · 2019-09-24T19:56:32Z

BTW, it produces the same result as master when running s3tests_boto3.functional.

cbodley · 2019-09-26T19:50:55Z

src/rgw/rgw_rados.cc

-    }
-
-    if (!params.delim.empty()) {
-      int marker_delim_pos = cur_marker.name.find(params.delim, cur_prefix.size());


am i correct that we're removing this filtering from radosgw itself, on the assumption that the osd is doing it for us? i worry about cases where radosgw is upgraded to include this change, but the osds are not - which would leave these common entries unfiltered

to avoid that, i feel like we'd need to preserve this filtering logic in radosgw for another couple releases. :(

we'd still get the correct result if filtering happens in both locations, right?

@cbodley I assume we do not need to execute the logic in radosgw when the OSD -does- the filtering...

Then do we need a way to determine if the filtering did occur.... I can bump up the struct version # of rgw_cls_list_ret. I'd also add a boolean, such as cls_decoded. That boolean would never be transmitted. However the version of the structure would determine if it was true or false. If it was false, then the filtering would happen on the rgw side, but not if it's true.

right, rgw wouldn't know unless we extend the cls protocol. but even if the osd does this filtering, duplicating it in rgw should essentially be a noop right? if that's the case, i'd prefer to leave it as is, rather than complicating it further by making it conditional

Because I already did the work in cleaning up the filtering, I'd rather have a conditional. There are comments in the code that once we reach version 16 of Ceph, we can remove the compatibility code.

I ran the s3-tests in 3 combinations and got the same result each time:

new rgw, new osd : FAILED (SKIP=5, errors=22, failures=25)

new rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)

old rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)

So it all seems to work. The above includes 27 boto 3 tests with the prefixes "s3tests_boto3.functional.test_s3.test_bucket_list_delimiter*" and "s3tests_boto3.functional.test_s3.test_bucket_listv2_delimiter*".

ivancich · 2019-09-30T20:57:34Z

I added a commit to make sure the new rgws still work with the older osds. There are comments in the code that once we reach version 16 of Ceph, we can remove the compatibility code.

I ran the s3-tests in 3 combinations and got the same result each time:

new rgw, new osd : FAILED (SKIP=5, errors=22, failures=25)
new rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)
old rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)

So it all seems to work. The above includes 27 boto 3 tests with the prefixes "s3tests_boto3.functional.test_s3.test_bucket_list_delimiter*" and "s3tests_boto3.functional.test_s3.test_bucket_listv2_delimiter*".

cbodley · 2019-10-01T19:56:40Z

FAILED (SKIP=5, errors=22, failures=25)

what are the failures from? is this the ceph-master branch of s3tests with tags filtered out?
-a '!fails_on_rgw,!lifecycle_expiration,!fails_strict_rfc2616'

ivancich · 2019-10-02T00:01:44Z

FAILED (SKIP=5, errors=22, failures=25)

what are the failures from? is this the ceph-master branch of s3tests with tags filtered out?
-a '!fails_on_rgw,!lifecycle_expiration,!fails_strict_rfc2616'

So I'm running s3-tests master (fab38b2). When I run it with your command-line option (and with uncommenting that lifecycle line in vstart.sh), I get:

ERROR: s3tests_boto3.functional.test_s3.test_object_acl_full_control_verify_owner
ERROR: s3tests_boto3.functional.test_s3.test_object_acl_full_control_verify_attributes
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_fullcontrol
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_read
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_readacp
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_write
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_writeacp
ERROR: s3tests_boto3.functional.test_s3.test_object_header_acl_grants
ERROR: s3tests_boto3.functional.test_s3.test_bucket_header_acl_grants
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_email
ERROR: s3tests_boto3.functional.test_s3.test_object_copy_not_owned_object_bucket
ERROR: s3tests_boto3.functional.test_s3.test_bucket_policy_put_obj_grant

FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_default
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned_during_create
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned_publicreadwrite
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned_authenticatedread
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_default
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_during_create
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_publicreadwrite
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_authenticatedread
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_bucketownerread
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_bucketownerfullcontrol
FAIL: s3tests_boto3.functional.test_s3.test_object_acl
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_write
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_writeacp
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_read
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_readacp
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_nonexist_user
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_email_notexist
FAIL: s3tests_boto3.functional.test_s3.test_multipart_copy_improper_range
FAIL: s3tests_boto3.functional.test_s3.test_versioned_object_acl
FAIL: s3tests_boto3.functional.test_s3.test_versioned_object_acl_no_version_specified
FAIL: s3tests_boto3.functional.test_s3.test_copy_object_ifmatch_failed
FAIL: s3tests_boto3.functional.test_s3.test_copy_object_ifnonematch_good

FAILED (SKIP=5, errors=12, failures=24)

mattbenjamin · 2019-10-02T01:59:04Z

looking for trace of discussion re: making rgw behavior conditional

ivancich · 2019-10-02T03:54:07Z

looking for trace of discussion re: making rgw behavior conditional

I "unresolved" that portion of the conversation, so it shows up.

ivancich · 2019-11-11T22:55:09Z

This required a complicated rebase. I re-ran s3-tests in 3 combinations -- old osd and old rgw, old osd and new rgw, new osd and new rgw -- and the results were the same across all of them.

stale · 2020-01-14T22:30:22Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

ivancich · 2020-01-15T00:03:52Z

unstale

cbodley · 2020-01-22T20:50:35Z

this passed qa in http://pulpito.ceph.com/cbodley-2020-01-22_18:24:28-rgw-wip-cbodley-testing2-distro-basic-smithi/ but needs a rebase

ivancich · 2020-01-23T18:50:04Z

@cbodley Thanks. I'll do the rebase and merge.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Rather than hard-coding literals (of 1000) throughout the code, use a constexpr given it's the standard chunk we're reading from a bucket index. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Since we need to be able to calculate a key a path subdirectory both in rgw and cls, move the code that calculates that to cls_rgw_types.h, which is shared by both rgw and cls/osd. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Some of the logic surrounding bucket indexes and the listing thereof can be subtle. Add some comments to help future developers working on this code. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Redefine global constants that define flags in bucket index entries to struct-scoped. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

cls_cxx_map_get_vals takes a parameter of where in the sequence of keys to start returning values. In code it's often referred to as the "start_key", but in fact it only returns values who have keys *after* it. To clarify the semantics, rename the variable to "start_after_key" in the bucket listing functions in both rgw and cls. Also, declare a type RGWRados::check_filter_t, so the type does not have to be fully literalized multiple times. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Currently the cls code that returns bucket index entries does no filtering when a delimiter is specified. This means that the cls code will return many entries that the rgw code does not need. This means unnecessary bucket index entries will be encoded, decoded, and consume network bandwidth. A change is made to send a delimiter, if it's specified, to the cls code, so the filtering can be done there. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

Previous commits moved bucket list filtering when a delimiter was specified to the osd/cls layer. However, since rgw's are often upgraded before osd's are, until we reach verison ceph version 16, an rgw cannot assume that the osd/cls did the filtering. This is addressed in the following ways.... First rgw_cls_list_ret now indicates whether filtering was done on the osd/cls side. And second, the old filtering code in the rgw is maintained in RGWRados::Bucket::List::list_objects_ordered, so it can still be triggered when all osd's are not doing the filtering. Once we reach ceph version 16, and there is no chance that the rgw is working with a osd running "young" version 14 code, we can remove the backward compatibility code in RGWRados::Bucket::List::list_objects_ordered. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

ivancich · 2020-01-24T20:03:36Z

@cbodley I've rebased, but it was non-trivial. I'd like to do some testing before merging. A lot of PRs are hitting the same segments of code. The 0-length listing one will need to be folded in as well, so I've temporarily added the DNM label.

ivancich · 2020-01-25T19:11:13Z

After the rebase I re-did my triplet of s3-tests. OLD OSD + OLD RGW, OLD OSD + NEW RGW, NEW OSD + NEW RGW, and all got identical results with the s3-tests. So I'll now merge. Thanks, @cbodley!

ivancich added cleanup needs-review performance rgw labels Sep 9, 2019

ivancich requested review from mattbenjamin, theanalyst and cbodley September 9, 2019 21:55

cbodley reviewed Sep 12, 2019

View reviewed changes

ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from e2d068e to 0d51e92 Compare September 13, 2019 04:00

ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch 2 times, most recently from 34b7e4e to b5ba9ed Compare September 24, 2019 19:52

ivancich requested a review from cbodley September 24, 2019 19:54

cbodley reviewed Sep 26, 2019

View reviewed changes

ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from eb6a84c to 04fa4e5 Compare September 30, 2019 21:06

ivancich requested a review from cbodley October 1, 2019 14:12

ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from 04fa4e5 to 0ddabd8 Compare October 2, 2019 17:30

cbodley approved these changes Oct 2, 2019

View reviewed changes

cbodley added needs-qa and removed needs-review labels Oct 2, 2019

ivancich mentioned this pull request Oct 15, 2019

rgw: reduce per-shard entry count during ordered bucket listing #30853

Merged

ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from c42a845 to 1340a92 Compare November 15, 2019 22:12

ivancich added the wip-eric-testing-1 for ivancich testing label Dec 5, 2019

ivancich removed the wip-eric-testing-1 for ivancich testing label Jan 14, 2020

stale bot added the stale label Jan 14, 2020

stale bot removed the stale label Jan 15, 2020

cbodley added the wip-cbodley-testing label Jan 16, 2020

cbodley removed needs-qa wip-cbodley-testing labels Jan 22, 2020

ivancich added 10 commits January 24, 2020 13:44

rgw: add missing editor directive to file

d240407

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

rgw: use a constexpr to specify standard # of bucket entries to list

6df5311

Rather than hard-coding literals (of 1000) throughout the code, use a constexpr given it's the standard chunk we're reading from a bucket index. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

rgw: add documentation to bucket index code

94dcdb4

Some of the logic surrounding bucket indexes and the listing thereof can be subtle. Add some comments to help future developers working on this code. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

rgw: change scope of bucket index entry flags

3084a7f

Redefine global constants that define flags in bucket index entries to struct-scoped. Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

rgw: reformat some code to make it easier to read

ff09bb5

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

rgw: add unit testing for delimiter filtering in cls

dd6cdd0

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>

ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from 1340a92 to 5a67af5 Compare January 24, 2020 20:01

ivancich added the DNM label Jan 24, 2020

ivancich merged commit 55f131e into ceph:master Jan 25, 2020

ivancich deleted the wip-move-some-bucket-listing-logic-to-cls branch January 25, 2020 19:12

ivancich removed the DNM label Jan 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rgw: move delimiter-based bucket listing/filtering logic to cls #30272

rgw: move delimiter-based bucket listing/filtering logic to cls #30272

ivancich commented Sep 9, 2019 •

edited

ivancich commented Sep 10, 2019

markhpc commented Sep 12, 2019 •

edited

ivancich commented Sep 12, 2019

ivancich commented Sep 24, 2019

ivancich commented Sep 24, 2019

cbodley Sep 26, 2019

mattbenjamin Sep 26, 2019

ivancich Sep 27, 2019

cbodley Sep 27, 2019

ivancich Sep 30, 2019

ivancich commented Sep 30, 2019

cbodley commented Oct 1, 2019

ivancich commented Oct 2, 2019

mattbenjamin commented Oct 2, 2019

ivancich commented Oct 2, 2019

ivancich commented Nov 11, 2019

stale bot commented Jan 14, 2020

ivancich commented Jan 15, 2020

cbodley commented Jan 22, 2020

ivancich commented Jan 23, 2020

ivancich commented Jan 24, 2020 •

edited

ivancich commented Jan 25, 2020

rgw: move delimiter-based bucket listing/filtering logic to cls #30272

rgw: move delimiter-based bucket listing/filtering logic to cls #30272

Conversation

ivancich commented Sep 9, 2019 • edited

ivancich commented Sep 10, 2019

markhpc commented Sep 12, 2019 • edited

ivancich commented Sep 12, 2019

ivancich commented Sep 24, 2019

ivancich commented Sep 24, 2019

cbodley Sep 26, 2019

Choose a reason for hiding this comment

mattbenjamin Sep 26, 2019

Choose a reason for hiding this comment

ivancich Sep 27, 2019

Choose a reason for hiding this comment

cbodley Sep 27, 2019

Choose a reason for hiding this comment

ivancich Sep 30, 2019

Choose a reason for hiding this comment

ivancich commented Sep 30, 2019

cbodley commented Oct 1, 2019

ivancich commented Oct 2, 2019

mattbenjamin commented Oct 2, 2019

ivancich commented Oct 2, 2019

ivancich commented Nov 11, 2019

stale bot commented Jan 14, 2020

ivancich commented Jan 15, 2020

cbodley commented Jan 22, 2020

ivancich commented Jan 23, 2020

ivancich commented Jan 24, 2020 • edited

ivancich commented Jan 25, 2020

ivancich commented Sep 9, 2019 •

edited

markhpc commented Sep 12, 2019 •

edited

ivancich commented Jan 24, 2020 •

edited