Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: Head/GetObject support partNumber #50148

Merged
merged 8 commits into from Nov 24, 2023
Merged

rgw: Head/GetObject support partNumber #50148

merged 8 commits into from Nov 24, 2023

Conversation

cbodley
Copy link
Contributor

@cbodley cbodley commented Feb 17, 2023

aws s3 supports HEAD/GET requests for individual parts of a completed multipart upload with the partNumber query parameter

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@cbodley
Copy link
Contributor Author

cbodley commented Feb 20, 2023

i rebased this on top of #50169 and #50172 so it can call get_obj_state() on the part's head object to get its attrs like etag etc

unfortunately, it turns out that multipart part uploads don't store their RGW_ATTR_MANIFEST in this head object. instead, they encode it in the RGWUploadPartInfo that we write to omap on the multipart meta object: https://github.com/ceph/ceph/blob/2a7301d/src/rgw/driver/rados/rgw_putobj_processor.cc#L457-L491

storing them in the multipart meta object makes it easier for RadosMultipartUpload::complete() to stitch the part manifests together for the final object without having to read the head object of each part

but without a manifest in part's head object, RGWRados::iterate_obj() would only read the first 4MB from the part's head object instead of the entire part. so get_part_obj_state() actually initializes a new RGWObjStateManifest::manifest, and uses RGWObjManifest::generator to copy over all of the requested part's extents from the original multipart manifest

i had to add a new get_obj_state() overload to expose RGWObjStateManifest directly. the original get_obj_state() function now wraps this new overload, returning RGWObjState and RGWObjManifest separately for the existing callers

@cbodley
Copy link
Contributor Author

cbodley commented Feb 20, 2023

in the sal layer, this feature is only implemented by sal::Object::ReadOp and not sal::Object as a whole. sal shouldn't be able to accidentally overwrite or delete individual parts

the only part information that 'leaks' from ReadOp into sal::Object is the part object's size and its attributes; both of which are necessary for GetObj to return the correct Content-Length and Etag headers

@github-actions
Copy link

github-actions bot commented Mar 2, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley cbodley changed the title rgw: start on Head/GetObject support for partNumber rgw: Head/GetObject support partNumber Aug 3, 2023
@cbodley cbodley marked this pull request as ready for review August 3, 2023 20:49
@cbodley cbodley requested a review from a team as a code owner August 3, 2023 20:49
@github-actions
Copy link

github-actions bot commented Aug 4, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley
Copy link
Contributor Author

cbodley commented Aug 4, 2023

thanks @dang. this is passing the test cases in ceph/s3-tests#491. i'll tag for qa after rebasing

@github-actions
Copy link

github-actions bot commented Oct 5, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley
Copy link
Contributor Author

cbodley commented Oct 11, 2023

last qa run had one consistent java test failure:

suite > Object tests > ObjectTest.testDownloadHLAPI FAILED
    com.amazonaws.services.s3.model.AmazonS3Exception at ObjectTest.java:1979

there isn't any additional context for debugging, and i'm having trouble running the tests locally

Comment on lines +6526 to +6544
// navigate to the requested part in the manifest
RGWObjManifest::obj_iterator end = manifest->obj_end(dpp);
if (end.get_cur_part_id() == 0) { // not multipart
ldpp_dout(dpp, 20) << "object does not have a multipart manifest" << dendl;
return -ERR_INVALID_PART;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, a java test case ObjectTest.testDownloadHLAPI is failing on this because it's requesting partNumber=1 for a non-multipart upload. i guess it expects to read the entire object in that case?

i suppose that could be used as an optimization to read partNumber=1 which, if multipart, will return the x-amz-mp-parts-count that you can then read as parts in parallel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the test_non_multipart_get_part test case for this and see it passing

@cbodley cbodley removed the needs-qa label Nov 16, 2023
Signed-off-by: Casey Bodley <cbodley@redhat.com>
and just add the follow_olh=true argument to callers

Signed-off-by: Casey Bodley <cbodley@redhat.com>
add an overload to expose the manifest storage to callers of
get_obj_state(). the existing RGWObjState+RGWObjManifest overload
just calls the RGWObjStateManifest one

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
when called on a versioned object, prepare() may follow olh and look up
a different object instance

but when called on a multipart part, we should not overwrite the
original object name with the part's object name (of the form
mymultipart.2~_XLFNqOW0NuiALg7q4-Hi_7hdtAkZUH.1)

Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
and omit the x-amz-mp-parts-count response header

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley
Copy link
Contributor Author

cbodley commented Nov 24, 2023

included test cases from ceph/s3-tests#491:

s3tests_boto3/functional/test_s3.py::test_multipart_get_part PASSED [ 67%]
s3tests_boto3/functional/test_s3.py::test_non_multipart_get_part PASSED [ 67%]

@cbodley cbodley merged commit e342927 into ceph:main Nov 24, 2023
10 of 11 checks passed
@cbodley cbodley deleted the wip-58750 branch November 24, 2023 00:13
@cbodley cbodley mentioned this pull request Dec 13, 2023
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants