
rgw: add s3 checksum crc32 and sha1 #49986

Closed
wants to merge 1 commit

Conversation

@imtzw (Contributor) commented Feb 3, 2023

packaged in RGWChecksum class

Signed-off-by: tongzhiwei <tongzhiwei_yewu.cmss.chinamobile.com>

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@mattbenjamin (Contributor)

I would prefer that we merge my existing work #30606

@cbodley (Contributor) commented Feb 3, 2023

> I would prefer that we merge my existing work #30606

@mattbenjamin your PR proposes a new x-rgw-cksum header, but i think it's worth considering support for s3's x-amz-checksum- and x-amz-sdk-checksum-algorithm headers. i wasn't familiar with these, but found some documentation at https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html
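
For reference, the x-amz-checksum-* header values described in that AWS doc are base64-encoded binary digests of the payload (big-endian bytes for CRC32). A self-contained sketch for the CRC32 case, with a zlib-compatible bitwise CRC and illustrative helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC32 (IEEE, reflected polynomial) -- equivalent to zlib's crc32().
static uint32_t crc32_ieee(const unsigned char* p, size_t n) {
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; ++i) {
        c ^= p[i];
        for (int k = 0; k < 8; ++k)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return ~c;
}

// Minimal base64 encoder (standard alphabet, '=' padding).
static std::string b64(const unsigned char* p, size_t n) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (size_t i = 0; i < n; i += 3) {
        uint32_t v = static_cast<uint32_t>(p[i]) << 16;
        if (i + 1 < n) v |= static_cast<uint32_t>(p[i + 1]) << 8;
        if (i + 2 < n) v |= p[i + 2];
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += (i + 1 < n) ? tbl[(v >> 6) & 63] : '=';
        out += (i + 2 < n) ? tbl[v & 63] : '=';
    }
    return out;
}

// x-amz-checksum-crc32 value: base64 of the big-endian 4-byte CRC32.
std::string amz_checksum_crc32(const std::string& payload) {
    uint32_t c = crc32_ieee(
        reinterpret_cast<const unsigned char*>(payload.data()), payload.size());
    unsigned char be[4] = {
        static_cast<unsigned char>(c >> 24), static_cast<unsigned char>(c >> 16),
        static_cast<unsigned char>(c >> 8),  static_cast<unsigned char>(c) };
    return b64(be, 4);
}
```

The SHA-1 variant works the same way, base64-encoding the 20-byte binary digest rather than its hex form.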

@mattbenjamin (Contributor)

> I would prefer that we merge my existing work #30606
>
> @mattbenjamin your PR proposes a new x-rgw-cksum header, but i think it's worth considering support for s3's x-amz-checksum- and x-amz-sdk-checksum-algorithm headers. i wasn't familiar with these, but found some documentation at https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html

That's fine, though the crc and sha-1 mechanisms may have limited utility.

@cbodley (Contributor) commented Feb 3, 2023

@mattbenjamin it looks like we could extend s3's API with additional algorithms like blake2b etc

@imtzw is there a reason that the crc32c and sha-256 algorithms were omitted here?

@imtzw (Contributor, Author) commented Feb 9, 2023

The crc32c and sha-256 algorithms were not intentionally omitted; they just aren't implemented yet.

@imtzw changed the title from "add s3 checksum crc32 and sha1" to "rgw: add s3 checksum crc32 and sha1" Feb 13, 2023
@imtzw requested a review from a team as a code owner February 15, 2023 05:58
packaged in RGWChecksum class

Signed-off-by: tongzhiwei <tongzhiwei_yewu.cmss.chinamobile.com>
@@ -4003,6 +4020,13 @@ void RGWPutObj::execute(optional_yield y)
std::unique_ptr<rgw::sal::MultipartUpload> upload;
upload = s->bucket->get_multipart_upload(s->object->get_name(),
multipart_upload_id);
rgw::sal::Attrs mpattrs;
op_ret = upload->get_info(this, s->yield, nullptr, &mpattrs);
Review comment (Contributor):
i'm concerned about this additional rados read for the multipart upload info; this is a cost we'll have to pay for all part uploads, whether they're checksummed or not

assuming that each part upload is providing its own x-amz-checksum- header, we should checksum based on that instead of the attr from CreateMultipartUpload. then on CompleteMultipartUpload when calculating the checksum-of-checksums (not implemented here?), we can verify that all of its parts use the same algorithm as CreateMultipartUpload

we use a similar strategy for compression types in https://github.com/ceph/ceph/blob/main/src/rgw/driver/rados/rgw_sal_rados.cc#L2612-L2617

Reply (Contributor, Author):

It is implemented here because, as AWS does, each part upload must be checked for an x-amz-checksum- header when an algorithm was set at CreateMultipartUpload (otherwise the part upload gets a 400 EINVAL response).

Reply (Contributor):

thanks @imtzw. i think it's probably worth breaking from AWS' behavior in this case. there's a clear performance cost, and i don't see much benefit

if the client sends a different algorithm for PutObject vs CreateMultipartUpload, isn't that just a bug in the client? does it really make a difference whether we reject the PutObject instead of the final CompleteMultipartUpload?

Reply (Contributor):

i think i was mistaken on this part. i now see that we're already calling this just below:

op_ret = upload->get_info(this, s->yield, &pdest_placement);

@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@cbodley (Contributor) commented Feb 16, 2023

after digging through the aws docs, i've identified a couple of missing features:

  1. from https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html#large-object-checksums

> If you've enabled additional checksum values for your multipart object, Amazon S3 calculates the checksum for each individual part by using the specified checksum algorithm. The checksum for the completed object is calculated in the same way that Amazon S3 calculates the MD5 digest for the multipart upload. You can use this checksum to verify the integrity of the object.

the existing logic for ETag/MD5 is here: https://github.com/ceph/ceph/blob/d8dcffa/src/rgw/driver/rados/rgw_sal_rados.cc#L2700-L2707

  2. from https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#mpuchecksums

> You can use the API or SDK to retrieve the checksum value for individual parts by using GetObject or HeadObject.

we don't currently support GET/HEAD requests for individual parts, so i opened https://tracker.ceph.com/issues/58750 to track that. i believe that requires the multipart upload to have completed, so we could implement that by reading the head object's manifest to find the part's head object that stores the part checksum
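
A sketch of the "checksum of checksums" for a completed multipart upload described in the quoted doc, using CRC32 (function and type names are illustrative, not from the PR; the crc32_ieee helper is a zlib-compatible bitwise CRC). AWS base64-encodes the combined digest and appends the part count; hex is used here only to keep the sketch short:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Bitwise CRC32 (IEEE, reflected), equivalent to zlib's crc32().
static uint32_t crc32_ieee(const unsigned char* p, size_t n) {
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; ++i) {
        c ^= p[i];
        for (int k = 0; k < 8; ++k)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return ~c;
}

// Composite checksum: hash the concatenated big-endian binary part
// checksums, then append "-<number of parts>" (as in "d41d...-20").
std::string composite_crc32(const std::vector<uint32_t>& part_crcs) {
    std::vector<unsigned char> buf;
    for (uint32_t c : part_crcs) {
        buf.push_back(static_cast<unsigned char>(c >> 24));
        buf.push_back(static_cast<unsigned char>(c >> 16));
        buf.push_back(static_cast<unsigned char>(c >> 8));
        buf.push_back(static_cast<unsigned char>(c));
    }
    uint32_t whole = crc32_ieee(buf.data(), buf.size());
    char hex[9];
    std::snprintf(hex, sizeof(hex), "%08x", whole);
    return std::string(hex) + "-" + std::to_string(part_crcs.size());
}
```

This mirrors how the ETag-of-ETags is computed for multipart MD5, just over binary part digests instead of hex ETags.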

@@ -292,6 +292,11 @@ int RGWGetObj_ObjStore_S3::get_params(optional_yield y)
// all of the data from its parts. the parts will sync as separate objects
skip_manifest = s->info.args.exists(RGW_SYS_PARAM_PREFIX "sync-manifest");

string checksum_mode_arg = s->info.env->get("HTTP_CHECKSUM_MODE","");
Review comment (Contributor):
this is from the x-amz-checksum-mode header, so "HTTP_X_AMZ_CHECKSUM_MODE"
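
For context, the env table uses the CGI convention for request headers, which is why x-amz-checksum-mode is looked up as HTTP_X_AMZ_CHECKSUM_MODE. A sketch of that mapping (the helper name is illustrative):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// CGI-style header-to-env mapping: dashes become underscores,
// letters are upper-cased, and an HTTP_ prefix is added.
std::string header_to_env(std::string name) {
    for (char& c : name)
        c = (c == '-') ? '_'
                       : static_cast<char>(std::toupper(
                             static_cast<unsigned char>(c)));
    return "HTTP_" + name;
}
```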

Comment on lines +19 to +33
class RGWChecksum {
bool need_calc_crc32;
char supplied_crc32_bin[RGW_CHECKSUM_CRC32_DIGESTSIZE + 1];
char final_crc32_bin[RGW_CHECKSUM_CRC32_DIGESTSIZE + 1];
char final_crc32_str[RGW_CHECKSUM_CRC32_DIGESTSIZE * 2 + 1];
char resp_crc32_bin[RGW_CHECKSUM_CRC32_DIGESTSIZE * 2 + 16];
char resp_crc32_b64[RGW_CHECKSUM_CRC32_DIGESTSIZE * 2 + 16];
crc32_type hash_crc32;
bool need_calc_sha1;
char supplied_sha1_bin[RGW_CHECKSUM_SHA1_DIGESTSIZE + 1];
char final_sha1_bin[RGW_CHECKSUM_SHA1_DIGESTSIZE + 1];
char final_sha1_str[RGW_CHECKSUM_SHA1_DIGESTSIZE * 2 + 1];
char resp_sha1_bin[RGW_CHECKSUM_SHA1_DIGESTSIZE * 2 + 16];
char resp_sha1_b64[RGW_CHECKSUM_SHA1_DIGESTSIZE * 2 + 16];
ceph::crypto::SHA1 hash_sha1;
Reply (Contributor):
as a point of design feedback, we should consider some kind of variant-like abstraction here. we'll only ever have one algorithm enabled, so we shouldn't need to have all of their buffers in memory. this will become more of an issue as we add more algorithms
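
A minimal sketch of that variant-like idea (type names are illustrative, not from the PR): only the selected algorithm's state is held in memory, and std::monostate covers the no-checksum case, so adding an algorithm means adding one alternative rather than another full set of member buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Illustrative per-algorithm state; each alternative carries only
// the buffers its own algorithm needs.
struct Crc32State { uint32_t crc = 0; };
struct Sha1State  { unsigned char digest[20] = {}; };

// std::monostate represents "no checksum requested".
using ChecksumState = std::variant<std::monostate, Crc32State, Sha1State>;

// Pick the single live alternative from the request's algorithm name.
ChecksumState make_state(const std::string& algo) {
    if (algo == "CRC32") return Crc32State{};
    if (algo == "SHA1")  return Sha1State{};
    return std::monostate{};
}
```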

Reply (Contributor):
I think c56fc3b did a good job of packaging things up compactly and minimizing overhead, using only modern apis

@blackliner commented Jun 11, 2023

relevant resources

I think this will greatly improve s3 cp upload operations, since the current MD5 algorithm has a maximum throughput of roughly 500-1000 MB/s, depending on your CPU of course. So a single upload will always be much slower than the theoretical network and disk bandwidth would allow. Let's assume an MD5 throughput of 500 MB/s, a 100GB file split into 20x5GB multiparts, and a capable ceph setup (assume 100Gbit/s is the limit for both network and disk). Then:

  1. The MD5 hash is calculated on the client side: 5GB / 500MB/s = 10s
  2. The file is uploaded: 5GB / 10GB/s = 0.5s
  3. The MD5 hash is calculated on the server side: another 10s
  4. Repeat 20 times / in parallel, depending on your max_concurrent_requests settings

This will make the effective upload speed (5GB / 20.5s) ~ 250MB/s 😢 So working on more efficient hashing algorithms would greatly benefit single stream performance.
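
The arithmetic above, as a small helper (the numbers are the comment's assumptions, not measurements):

```cpp
#include <cassert>

// Effective single-stream throughput when a part is hashed on the
// client, transferred, then hashed again on the server.
double effective_mb_s(double part_mb, double hash_mb_s, double wire_mb_s) {
    double seconds = part_mb / hash_mb_s    // client-side MD5
                   + part_mb / wire_mb_s    // transfer
                   + part_mb / hash_mb_s;   // server-side MD5
    return part_mb / seconds;
}
// effective_mb_s(5000, 500, 10000) gives 5000 MB / 20.5 s, about 244 MB/s.
```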

@cbodley (Contributor) commented Jun 12, 2023

> Blog about performance bottlenecks with MD5: https://dzone.com/articles/parallelizing-md5-checksum-computation-to-speed-up

@blackliner thanks for sharing this! it looks like minio did something similar in https://github.com/minio/md5-simd

we're very interested in optimizations for these ETag calculations. there was an earlier contribution in #42435 that tried to offload these md5 calculations to separate threads, but we decided not to pursue that approach

i opened https://tracker.ceph.com/issues/61646 to keep track of this vectorization feature. i've also added it as a discussion topic for this week's rgw meeting in https://pad.ceph.com/p/rgw-weekly. would you care to join us there? it's on wednesday at 11:30am EDT in https://meet.google.com/oky-gdaz-ror

> So working on more efficient hashing algorithms would greatly benefit single stream performance.

these new checksum algorithms are in addition to the use of MD5 for ETags, correct? so i don't think this PR alone will help with that

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Aug 20, 2023
@frittentheke (Contributor)
@cbodley @blackliner could you maybe share an update? I suppose it's not stale at least?
Is this change here depending on #52385?

@github-actions github-actions bot removed the stale label Aug 30, 2023
@cbodley (Contributor) commented Sep 13, 2023

> Is this change here depending on #52385?

@frittentheke that PR is specific to the md5 calculations for ETag. we might want to pursue similar optimizations for these checksum algorithms, but that wouldn't block progress on this x-amz-checksum feature

@frittentheke (Contributor)

When trying out the AWS Labs mountpoint-s3 tool (FUSE-mounting S3, https://github.com/awslabs/mountpoint-s3) against Ceph RGW, I ran into an XAmzContentSHA256Mismatch error when uploading any files.

Supporting this kind of content checksumming in Ceph RGW no longer seems optional or nice-to-have when it comes to modern S3 libraries / clients. Is there any tracking ticket or roadmap feature anywhere?

@cbodley (Contributor) commented Oct 9, 2023

@frittentheke i don't think that XAmzContentSHA256Mismatch error is related to this checksumming feature. the x-amz-content-sha256 header (see https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-auth-using-authorization-header.html) is a required part of sigv4 for s3

> When you send a request, you must tell Amazon S3 which of the preceding options you have chosen in your signature calculation, by adding the x-amz-content-sha256 header with one of the following values:

from this table, the values rgw supports are:

  1. Actual payload checksum value
  2. UNSIGNED-PAYLOAD
  3. STREAMING-AWS4-HMAC-SHA256-PAYLOAD

rgw will only return that XAmzContentSHA256Mismatch error in cases 1 (ERROR: x-amz-content-sha256 does not match) and 3 (ERROR: signature of last chunk does not match). those error messages will show up at debug_rgw 10. at level 20, the rgw log will include the request header values like HTTP_X_AMZ_CONTENT_SHA256=...

it sounds like there's a real interop issue here, but checksum mismatches can be tricky to debug

@frittentheke (Contributor) commented Oct 9, 2023

> @frittentheke i don't think that XAmzContentSHA256Mismatch error is related to this checksumming feature. the x-amz-content-sha256 header [...] is a required part of sigv4 for s3
> [...]
> rgw will only return that XAmzContentSHA256Mismatch error in cases 1 (ERROR: x-amz-content-sha256 does not match) and 3 (ERROR: signature of last chunk does not match). those error messages will show up at debug_rgw 10. at level 20, the rgw log will include the request header values like HTTP_X_AMZ_CONTENT_SHA256=...

Thanks @cbodley for your quick response. I will enable this to see if there are any more clues. But just having the log show me some other hash that does not match the header does not really help, right?

> it sounds like there's a real interop issue here, but checksum mismatches can be tricky to debug

In that case, you might want to take a peek yourself? This is the command I used to fuse mount a bucket:

mount-s3 --foreground --debug --endpoint-url https://s3.example.com --force-path-style somebuckettotest S3MOUNT

@cbodley (Contributor) commented Oct 9, 2023

thanks @frittentheke, opened https://tracker.ceph.com/issues/63153 to track this

@cbodley (Contributor) commented Oct 31, 2023

it turns out that https://tracker.ceph.com/issues/63153 does relate to this checksumming feature, though the mountpoint-s3 client (via the go sdk) depends on the trailing checksum variant that isn't covered here


@github-actions github-actions bot added the stale label Dec 30, 2023
@frittentheke (Contributor)

unstale

@github-actions github-actions bot removed the stale label Jan 1, 2024
@mattbenjamin (Contributor)

@frittentheke fixes for signature mismatch are testable here: #54856


@github-actions github-actions bot added the stale label Apr 14, 2024

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this May 14, 2024
5 participants