rgw: add s3 checksum crc32 and sha1 #49986
Conversation
I would prefer that we merge my existing work #30606
@mattbenjamin your PR proposes a new
That's fine, though the crc and sha-1 mechanisms may have limited utility.
@mattbenjamin it looks like we could extend s3's API with additional algorithms like blake2b etc. @imtzw is there a reason that the crc32c and sha-256 algorithms were omitted here?
crc32c and sha-256 are not intentionally omitted, just not implemented yet.
@@ -4003,6 +4020,13 @@ void RGWPutObj::execute(optional_yield y)
  std::unique_ptr<rgw::sal::MultipartUpload> upload;
  upload = s->bucket->get_multipart_upload(s->object->get_name(),
                                           multipart_upload_id);
  rgw::sal::Attrs mpattrs;
  op_ret = upload->get_info(this, s->yield, nullptr, &mpattrs);
i'm concerned about this additional rados read for the multipart upload info; this is a cost we'll have to pay for all part uploads, whether they're checksummed or not

assuming that each part upload provides its own x-amz-checksum- header, we should checksum based on that instead of the attr from CreateMultipartUpload. then on CompleteMultipartUpload, when calculating the checksum-of-checksums (not implemented here?), we can verify that all of the parts used the same algorithm as CreateMultipartUpload

we use a similar strategy for compression types in https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_sal_rados.cc#L2612-L2617
The reason it is implemented here is that, as AWS does, each part upload is checked for an x-amz-checksum- header when an algorithm was set at CreateMultipartUpload (otherwise the part upload gets a 400 EINVAL response).
thanks @imtzw. i think it's probably worth breaking from AWS' behavior in this case. there's a clear performance cost, and i don't see much benefit
if the client sends a different algorithm for PutObject vs CreateMultipartUpload, isn't that just a bug in the client? does it really make a difference whether we reject the PutObject instead of the final CompleteMultipartUpload?
i think i was mistaken on this part. i now see that we're already calling this just below:
op_ret = upload->get_info(this, s->yield, &pdest_placement);
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
after digging through the aws docs, i've identified a couple of missing features:
the existing logic for ETag/MD5 is here: https://github.com/ceph/ceph/blob/d8dcffa/src/rgw/driver/rados/rgw_sal_rados.cc#L2700-L2707
we don't currently support GET/HEAD requests for individual parts, so i opened https://tracker.ceph.com/issues/58750 to track that. i believe that requires the multipart upload to have completed, so we could implement that by reading the head object's manifest to find the part's head object that stores the part checksum
@@ -292,6 +292,11 @@ int RGWGetObj_ObjStore_S3::get_params(optional_yield y)
  // all of the data from its parts. the parts will sync as separate objects
  skip_manifest = s->info.args.exists(RGW_SYS_PARAM_PREFIX "sync-manifest");

  string checksum_mode_arg = s->info.env->get("HTTP_CHECKSUM_MODE","");
this is from the x-amz-checksum-mode header, so "HTTP_X_AMZ_CHECKSUM_MODE"
class RGWChecksum {
  bool need_calc_crc32;
  char supplied_crc32_bin[RGW_CHECKSUM_CRC32_DIGESTSIZE + 1];
  char final_crc32_bin[RGW_CHECKSUM_CRC32_DIGESTSIZE + 1];
  char final_crc32_str[RGW_CHECKSUM_CRC32_DIGESTSIZE * 2 + 1];
  char resp_crc32_bin[RGW_CHECKSUM_CRC32_DIGESTSIZE * 2 + 16];
  char resp_crc32_b64[RGW_CHECKSUM_CRC32_DIGESTSIZE * 2 + 16];
  crc32_type hash_crc32;
  bool need_calc_sha1;
  char supplied_sha1_bin[RGW_CHECKSUM_SHA1_DIGESTSIZE + 1];
  char final_sha1_bin[RGW_CHECKSUM_SHA1_DIGESTSIZE + 1];
  char final_sha1_str[RGW_CHECKSUM_SHA1_DIGESTSIZE * 2 + 1];
  char resp_sha1_bin[RGW_CHECKSUM_SHA1_DIGESTSIZE * 2 + 16];
  char resp_sha1_b64[RGW_CHECKSUM_SHA1_DIGESTSIZE * 2 + 16];
  ceph::crypto::SHA1 hash_sha1;
as a point of design feedback, we should consider some kind of variant-like abstraction here. we'll only ever have one algorithm enabled, so we shouldn't need to have all of their buffers in memory. this will become more of an issue as we add more algorithms
I think c56fc3b did a good job of packaging things up compactly and minimizing overhead, using only modern apis
relevant resources
I think this will greatly improve
This will make the effective upload speed (5GB / 20.5s) ~ 250MB/s 😢. So working on more efficient hashing algorithms would greatly benefit single-stream performance.
@blackliner thanks for sharing this! it looks like minio did something similar in https://github.com/minio/md5-simd

we're very interested in optimizations for these ETag calculations. there was an earlier contribution in #42435 that tried to offload these md5 calculations to separate threads, but we decided not to pursue that approach

i opened https://tracker.ceph.com/issues/61646 to keep track of this vectorization feature. i've also added it as a discussion topic for this week's rgw meeting in https://pad.ceph.com/p/rgw-weekly. would you care to join us there? it's on wednesday at 11:30am EDT in https://meet.google.com/oky-gdaz-ror
these new checksum algorithms are in addition to the use of MD5 for ETags, correct? so i don't think this PR alone will help with that
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
@cbodley @blackliner could you maybe share an update? I suppose it's not stale at least?
@frittentheke that PR is specific to the md5 calculations for ETag. we might want to pursue similar optimizations for these checksum algorithms, but that wouldn't block progress on this x-amz-checksum feature
When trying out the AWSlabs mountpoint-s3 (FUSE mounting of S3) tool https://github.com/awslabs/mountpoint-s3 against Ceph RGW, I ran into the issue of

Supporting this kind of content checksumming in Ceph RGW no longer seems optional or nice-to-have when it comes to modern S3 libraries / clients. Is there any tracking ticket or roadmap feature anywhere?
@frittentheke i don't think that

from this table, the values rgw supports are:

rgw will only return that

it sounds like there's a real interop issue here, but checksum mismatches can be tricky to debug
Thanks @cbodley for your quick response. I will enable this to see if there are any more clues. But just having the log show me some other hash that does not match the header does not really help, right?

In that case, you might want to take a peek yourself? This is the command I used to fuse mount a bucket:
thanks @frittentheke, opened https://tracker.ceph.com/issues/63153 to track this

it turns out that https://tracker.ceph.com/issues/63153 does relate to this checksumming feature, though the mountpoint-s3 client (via the go sdk) depends on the trailing checksum variant that isn't covered here
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

unstale

@frittentheke fixes for signature mismatch are testable here: #54856

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
packaged in RGWChecksum class
Signed-off-by: tongzhiwei <tongzhiwei_yewu.cmss.chinamobile.com>