Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: move delimiter-based bucket listing/filtering logic to cls #30272

Merged

Conversation

ivancich
Copy link
Member

@ivancich ivancich commented Sep 9, 2019

Presently, when listing the contents of a bucket using delimiters (i.e., for simulated subdirectories), the cls/osds return many more entries than will ultimately be listed. This uses CPU to encode and decode them along with network traffic to send them. This process would be much more efficient if the logic for this was pushed down to the cls/osd level.

This PR does other clean-ups as well.

Tracker: https://tracker.ceph.com/issues/41051

@ivancich
Copy link
Member Author

jenkins test signed

@markhpc
Copy link
Member

markhpc commented Sep 12, 2019

@ivancich Out of curiosity, can we still return bufferlist encoded versions of the entries over the wire while filtering in this way ala #29980 ? That also appeared to provide a pretty significant general performance boost to avoid the re-encode step (though we still have to decode for existing filtering).

I think we're going to have trade-offs doing intermediate decode/filter/encode at the OSD, but we can at least make it better if it's just decode/filter but not re-encode.

@ivancich
Copy link
Member Author

@markhpc The filtering is only done on the identifier (and only when a delimiter is specified). The filtering itself does not need to decode (although I think other filtering in those functions might in some cases). So I think these changes are essentially orthogonal to yours.

src/cls/rgw/cls_rgw_types.h Outdated Show resolved Hide resolved
src/cls/rgw/cls_rgw.cc Outdated Show resolved Hide resolved
src/cls/rgw/cls_rgw.cc Outdated Show resolved Hide resolved
src/cls/rgw/cls_rgw_types.h Outdated Show resolved Hide resolved
src/cls/rgw/cls_rgw.cc Show resolved Hide resolved
@ivancich ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from e2d068e to 0d51e92 Compare September 13, 2019 04:00
@ivancich ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch 2 times, most recently from 34b7e4e to b5ba9ed Compare September 24, 2019 19:52
@ivancich
Copy link
Member Author

@cbodley, I accepted all your suggestions, and rebased on top of today's master. Please re-review at your earliest convenience. Thanks!

@ivancich
Copy link
Member Author

BTW, it produces the same result as master when running s3tests_boto3.functional.

}

if (!params.delim.empty()) {
int marker_delim_pos = cur_marker.name.find(params.delim, cur_prefix.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am i correct that we're removing this filtering from radosgw itself, on the assumption that the osd is doing it for us? i worry about cases where radosgw is upgraded to include this change, but the osds are not - which would leave these common entries unfiltered

to avoid that, i feel like we'd need to preserve this filtering logic in radosgw for another couple releases. :(

we'd still get the correct result if filtering happens in both locations, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbodley I assume we do not need to execute the logic in radosgw when the OSD -does- the filtering...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then do we need a way to determine if the filtering did occur.... I can bump up the struct version # of rgw_cls_list_ret. I'd also add a boolean, such as cls_decoded. That boolean would never be transmitted. However the version of the structure would determine if it was true or false. If it was false, then the filtering would happen on the rgw side, but not if it's true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, rgw wouldn't know unless we extend the cls protocol. but even if the osd does this filtering, duplicating it in rgw should essentially be a noop right? if that's the case, i'd prefer to leave it as is, rather than complicating it further by making it conditional

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I already did the work in cleaning up the filtering, I'd rather have a conditional. There are comments in the code that once we reach version 16 of Ceph, we can remove the compatibility code.

I ran the s3-tests in 3 combinations and got the same result each time:

  • new rgw, new osd : FAILED (SKIP=5, errors=22, failures=25)
  • new rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)
  • old rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)

So it all seems to work. The above includes 27 boto 3 tests with the prefixes "s3tests_boto3.functional.test_s3.test_bucket_list_delimiter*" and "s3tests_boto3.functional.test_s3.test_bucket_listv2_delimiter*".

@ivancich
Copy link
Member Author

I added a commit to make sure the new rgws still work with the older osds. There are comments in the code that once we reach version 16 of Ceph, we can remove the compatibility code.

I ran the s3-tests in 3 combinations and got the same result each time:

  • new rgw, new osd : FAILED (SKIP=5, errors=22, failures=25)
  • new rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)
  • old rgw, old osd: FAILED (SKIP=5, errors=22, failures=25)

So it all seems to work. The above includes 27 boto 3 tests with the prefixes "s3tests_boto3.functional.test_s3.test_bucket_list_delimiter*" and "s3tests_boto3.functional.test_s3.test_bucket_listv2_delimiter*".

@ivancich ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from eb6a84c to 04fa4e5 Compare September 30, 2019 21:06
@ivancich ivancich requested a review from cbodley October 1, 2019 14:12
@cbodley
Copy link
Contributor

cbodley commented Oct 1, 2019

FAILED (SKIP=5, errors=22, failures=25)

what are the failures from? is this the ceph-master branch of s3tests with tags filtered out?
-a '!fails_on_rgw,!lifecycle_expiration,!fails_strict_rfc2616'

@ivancich
Copy link
Member Author

ivancich commented Oct 2, 2019

FAILED (SKIP=5, errors=22, failures=25)

what are the failures from? is this the ceph-master branch of s3tests with tags filtered out?
-a '!fails_on_rgw,!lifecycle_expiration,!fails_strict_rfc2616'

So I'm running s3-tests master (fab38b2). When I run it with your command-line option (and with uncommenting that lifecycle line in vstart.sh), I get:

ERROR: s3tests_boto3.functional.test_s3.test_object_acl_full_control_verify_owner
ERROR: s3tests_boto3.functional.test_s3.test_object_acl_full_control_verify_attributes
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_fullcontrol
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_read
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_readacp
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_write
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_userid_writeacp
ERROR: s3tests_boto3.functional.test_s3.test_object_header_acl_grants
ERROR: s3tests_boto3.functional.test_s3.test_bucket_header_acl_grants
ERROR: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_email
ERROR: s3tests_boto3.functional.test_s3.test_object_copy_not_owned_object_bucket
ERROR: s3tests_boto3.functional.test_s3.test_bucket_policy_put_obj_grant

FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_default
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned_during_create
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned_publicreadwrite
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_canned_authenticatedread
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_default
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_during_create
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_publicreadwrite
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_authenticatedread
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_bucketownerread
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_canned_bucketownerfullcontrol
FAIL: s3tests_boto3.functional.test_s3.test_object_acl
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_write
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_writeacp
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_read
FAIL: s3tests_boto3.functional.test_s3.test_object_acl_readacp
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_nonexist_user
FAIL: s3tests_boto3.functional.test_s3.test_bucket_acl_grant_email_notexist
FAIL: s3tests_boto3.functional.test_s3.test_multipart_copy_improper_range
FAIL: s3tests_boto3.functional.test_s3.test_versioned_object_acl
FAIL: s3tests_boto3.functional.test_s3.test_versioned_object_acl_no_version_specified
FAIL: s3tests_boto3.functional.test_s3.test_copy_object_ifmatch_failed
FAIL: s3tests_boto3.functional.test_s3.test_copy_object_ifnonematch_good

FAILED (SKIP=5, errors=12, failures=24)

@mattbenjamin
Copy link
Contributor

looking for trace of discussion re: making rgw behavior conditional

@ivancich
Copy link
Member Author

ivancich commented Oct 2, 2019

looking for trace of discussion re: making rgw behavior conditional

I "unresolved" that portion of the conversation, so it shows up.

@ivancich ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from 04fa4e5 to 0ddabd8 Compare October 2, 2019 17:30
@ivancich
Copy link
Member Author

This required a complicated rebase. I re-ran s3-tests in 3 combinations -- old osd and old rgw, old osd and new rgw, new osd and new rgw -- and the results were the same across all of them.

@ivancich ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from c42a845 to 1340a92 Compare November 15, 2019 22:12
@ivancich ivancich added the wip-eric-testing-1 for ivancich testing label Dec 5, 2019
@ivancich ivancich removed the wip-eric-testing-1 for ivancich testing label Jan 14, 2020
@stale
Copy link

stale bot commented Jan 14, 2020

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jan 14, 2020
@ivancich
Copy link
Member Author

unstale

@cbodley
Copy link
Contributor

cbodley commented Jan 22, 2020

@ivancich
Copy link
Member Author

@cbodley Thanks. I'll do the rebase and merge.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Rather than hard-coding literals (of 1000) throughout the code, use a
constexpr given it's the standard chunk we're reading from a bucket
index.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Since we need to be able to calculate a key a path subdirectory both
in rgw and cls, move the code that calculates that to cls_rgw_types.h,
which is shared by both rgw and cls/osd.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Some of the logic surrounding bucket indexes and the listing thereof
can be subtle. Add some comments to help future developers working on
this code.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Redefine global constants that define flags in bucket index entries to
struct-scoped.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
cls_cxx_map_get_vals takes a parameter of where in the sequence of
keys to start returning values. In code it's often referred to as the
"start_key", but in fact it only returns values who have keys *after*
it. To clarify the semantics, rename the variable to "start_after_key"
in the bucket listing functions in both rgw and cls.

Also, declare a type RGWRados::check_filter_t, so the type does not
have to be fully literalized multiple times.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Currently the cls code that returns bucket index entries does no
filtering when a delimiter is specified. This means that the cls code
will return many entries that the rgw code does not need. This means
unnecessary bucket index entries will be encoded, decoded, and consume
network bandwidth.

A change is made to send a delimiter, if it's specified, to the cls
code, so the filtering can be done there.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Previous commits moved bucket list filtering when a delimiter was
specified to the osd/cls layer. However, since rgw's are often
upgraded before osd's are, until we reach verison ceph version 16, an
rgw cannot assume that the osd/cls did the filtering. This is
addressed in the following ways....

First rgw_cls_list_ret now indicates whether filtering was done on the
osd/cls side.

And second, the old filtering code in the rgw is maintained in
RGWRados::Bucket::List::list_objects_ordered, so it can still be
triggered when all osd's are not doing the filtering.

Once we reach ceph version 16, and there is no chance that the rgw is
working with a osd running "young" version 14 code, we can remove the
backward compatibility code in
RGWRados::Bucket::List::list_objects_ordered.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
@ivancich ivancich force-pushed the wip-move-some-bucket-listing-logic-to-cls branch from 1340a92 to 5a67af5 Compare January 24, 2020 20:01
@ivancich ivancich added the DNM label Jan 24, 2020
@ivancich
Copy link
Member Author

ivancich commented Jan 24, 2020

@cbodley I've rebased, but it was non-trivial. I'd like to do some testing before merging. A lot of PRs are hitting the same segments of code. The 0-length listing one will need to be folded in as well, so I've temporarily added the DNM label.

@ivancich
Copy link
Member Author

After the rebase I re-did my triplet of s3-tests. OLD OSD + OLD RGW, OLD OSD + NEW RGW, NEW OSD + NEW RGW, and all got identical results with the s3-tests. So I'll now merge. Thanks, @cbodley!

@ivancich ivancich merged commit 55f131e into ceph:master Jan 25, 2020
@ivancich ivancich deleted the wip-move-some-bucket-listing-logic-to-cls branch January 25, 2020 19:12
@ivancich ivancich removed the DNM label Jan 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants