fix(proxy): forward flexible-checksum headers on multipart ops#93
Merged
alukach merged 5 commits intoJun 26, 2026
Merged
Conversation
Modern AWS SDKs/CLI (botocore / CLI v2 >= ~2.23) enable CRC32 data-integrity
checksums by default. For a multipart upload that means:
- CreateMultipartUpload sends `x-amz-checksum-algorithm` (declares the MPU's
checksum algorithm), and
- CompleteMultipartUpload echoes the per-part / full-object checksums via
`x-amz-checksum-type` / `x-amz-checksum-crc32` plus the per-part values in
the XML body.
`execute_multipart` only forwarded `content-type`/`content-length`/`content-md5`
and dropped every `x-amz-checksum-*` header. So the MPU was initialized with no
checksum context while parts were stored *with* CRC32 checksums (the streaming
re-sign path forwards the `x-amz-trailer`), and S3 rejected
CompleteMultipartUpload with `InvalidPart`:
An error occurred (InvalidPart) when calling the CompleteMultipartUpload
operation: One or more of the specified parts could not be found...
Forward the client's `x-amz-checksum-*` and `x-amz-sdk-checksum-algorithm`
headers on this raw-signed path. It signs every header present
(`sign_s3_request`), so they are covered by the signature — unlike the
presigned PutObject path, where unsigned `x-amz-*` headers are rejected. This
also fixes a plain (non-chunked) UploadPart carrying a header checksum, which
routes through the same function.
Workaround for clients until deployed: set
`AWS_REQUEST_CHECKSUM_CALCULATION=when_required`.
Adds a functional regression test driving `handle_with_body` with a
CompleteMultipartUpload and a header-capturing backend, asserting the checksum
headers both reach the backend and appear in the re-signed SignedHeaders.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🚀 Latest commit deployed to https://multistore-proxy-pr-93.development-seed.workers.dev
|
The dropped-checksum bug (InvalidPart on CompleteMultipartUpload) only reproduces against real AWS S3. Verified empirically that MinIO RELEASE.2025-09 does NOT reproduce it: an MPU created without a checksum algorithm but with CRC32 parts + completion succeeds on MinIO and is rejected by S3. So the existing MinIO integration multipart tests can't guard this — a real-S3 smoke test is required. Add `TestMultipartUpload` to the smoke suite: a checksum-forced (`ChecksumAlgorithm=CRC32`) two-part upload via boto3's transfer manager (mirroring `aws s3 cp`), asserting the round-trip. It is gated on `SMOKE_WRITE_BUCKET`, so it self-skips until a writable bucket is configured. Wire a `smoke-writable` bucket + a github-actions write scope into wrangler.deploy.toml (OIDC-federated, no long-lived keys). Activation needs the backend bucket created, the OIDC role granted S3 write perms, and SMOKE_WRITE_BUCKET=smoke-writable set in the smoke job. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Point the smoke-writable bucket at the existing federated-test backend bucket (multistore-test-bucket) under a smoke-writable/ prefix instead of a separate bucket. The role already has PutObject/GetObject/DeleteObject/ListBucket (covering Create/Upload/Complete); only s3:AbortMultipartUpload need be added so failed uploads clean up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wires the multipart write smoke test (TestMultipartUpload) to run on every deploy that exercises the smoke suite — including per-PR preview deploys, not just main. Gated behind the SMOKE_WRITE_BUCKET repo/environment variable: unset (default) -> the test self-skips; set to smoke-writable -> it runs against the freshly-deployed worker and real S3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the opt-in repo-variable gate; instead pass smoke_write_bucket as a deploy.yml input and have preview.yml set it to smoke-writable unconditionally. Every PR preview now exercises the real-S3 multipart upload (no main deploy needed); staging/production leave the input empty, so the test self-skips there unless they opt in the same way. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1373ae3 to
354e7f8
Compare
alukach
added a commit
to source-cooperative/data.source.coop
that referenced
this pull request
Jun 26, 2026
…ad (#167) Refreshes the git-pinned `multistore*` crates from `71b1513` → `35f5aad` (the current HEAD of `fix/decode-aws-chunked-writes`), which now includes the merged [developmentseed/multistore#93](developmentseed/multistore#93) — *forward flexible-checksum headers on multipart ops*. ## Why Modern AWS CLI/SDK enable CRC32 integrity checksums by default. The proxy's raw-signed multipart path dropped the client's `x-amz-checksum-*` headers on Create/Complete, so large multipart uploads via `aws s3 cp` failed with: ``` InvalidPart ... CompleteMultipartUpload ... One or more of the specified parts could not be found. ``` #93 forwards (and signs) those headers, fixing it. This bump pulls that fix into the proxy. ## Scope Lockfile-only — `Cargo.toml` already tracks the `fix/decode-aws-chunked-writes` branch; only `Cargo.lock` moves to the new commit. ## Verification - `cargo check --target wasm32-unknown-unknown` ✅ - `cargo test` ✅ (routing / pagination / authz / backend_auth — all green) - pre-push hook (fmt + clippy-wasm + check-wasm + test) ✅ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alukach
added a commit
that referenced
this pull request
Jun 26, 2026
* fix(proxy): stream-resign aws-chunked uploads instead of decoding Modern AWS clients (aws-cli >= 2.23, recent SDKs) frame PutObject and UploadPart bodies as `Content-Encoding: aws-chunked` with a streaming sentinel (`x-amz-content-sha256: STREAMING-...`). The proxy forwarded them untouched — PutObject via a presigned URL (UNSIGNED-PAYLOAD), UploadPart via a re-signed raw request — and S3 only de-chunks a request signed with a matching streaming sentinel, so it stored the raw chunk envelope (`7\r\nhello!\n\r\n0\r\n<trailer>`) as the object. Every aws-chunked write was corrupted. For the common unsigned-payload variant (`STREAMING-UNSIGNED-PAYLOAD-TRAILER`, the client default), re-sign only the request seed with the backend credentials — reusing the client's streaming sentinel and the de-chunk headers (content-encoding, x-amz-decoded-content-length, x-amz-trailer) — and stream the chunk framing straight through for S3 to de-chunk. Zero-copy: the worker never buffers or hashes the payload, which matters on the Cloudflare Workers memory ceiling. PutObject and UploadPart share this path. Signed-chunk uploads (`STREAMING-AWS4-HMAC-SHA256-PAYLOAD`) carry per-chunk signatures bound to the client's key that can't be re-signed to the backend credentials, so they are rejected with NotImplemented (501) rather than silently corrupted. Content-Length is forwarded but left unsigned: the runtime owns the streaming transfer framing, and S3 sizes the payload from x-amz-decoded-content-length and the chunk framing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(proxy): reject aws-chunked streaming uploads on non-S3 backends Review follow-up to the stream-resign change. Three fixes from a 4-agent review: - Bug: the PutObject streaming arm reached `build_streaming_forward` (which hardcodes S3 seed signing) without a backend-type guard, so a default aws-chunked PUT to a non-S3 backend (azure) mis-routed into S3 signing instead of rejecting — unlike UploadPart, which guards earlier. Guard at the point the S3 assumption is made; reject with InvalidRequest. - Drop the unreachable `Internal` re-read of x-amz-content-sha256: the classifier already validated it, so it now returns the sentinel for the caller to thread through as the payload hash. - Strengthen the UploadPart test to assert partNumber/uploadId survive into the forwarded URL, and add a non-S3 streaming-rejection regression test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * refactor(proxy): trim aws-chunked over-engineering leftovers Ponytail over-engineering pass on the aws-chunked streaming work: - Delete the unused `STREAMING_PAYLOAD` const — zero references repo-wide; the new `aws_chunked` module classifies the sentinel by substring, so the literal is dead. - Consolidate the non-S3 backend gate into `try_streaming_forward` so `build_streaming_forward` becomes a pure constructor. Behavior unchanged: PutObject streaming to a non-S3 backend still rejects 400, covered by `streaming_put_on_non_s3_backend_is_rejected`. - Drop two redundant `aws_chunked` unit assertions — the signed `-TRAILER` variant and the hex-hash case each re-hit a branch already covered. No behavior change. cargo fmt / clippy -D warnings / test (111 passing) / wasm32 check all green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(proxy): forward flexible-checksum headers on multipart ops (#93) * fix(proxy): forward flexible-checksum headers on multipart ops Modern AWS SDKs/CLI (botocore / CLI v2 >= ~2.23) enable CRC32 data-integrity checksums by default. For a multipart upload that means: - CreateMultipartUpload sends `x-amz-checksum-algorithm` (declares the MPU's checksum algorithm), and - CompleteMultipartUpload echoes the per-part / full-object checksums via `x-amz-checksum-type` / `x-amz-checksum-crc32` plus the per-part values in the XML body. `execute_multipart` only forwarded `content-type`/`content-length`/`content-md5` and dropped every `x-amz-checksum-*` header. So the MPU was initialized with no checksum context while parts were stored *with* CRC32 checksums (the streaming re-sign path forwards the `x-amz-trailer`), and S3 rejected CompleteMultipartUpload with `InvalidPart`: An error occurred (InvalidPart) when calling the CompleteMultipartUpload operation: One or more of the specified parts could not be found... Forward the client's `x-amz-checksum-*` and `x-amz-sdk-checksum-algorithm` headers on this raw-signed path. It signs every header present (`sign_s3_request`), so they are covered by the signature — unlike the presigned PutObject path, where unsigned `x-amz-*` headers are rejected. This also fixes a plain (non-chunked) UploadPart carrying a header checksum, which routes through the same function. Workaround for clients until deployed: set `AWS_REQUEST_CHECKSUM_CALCULATION=when_required`. Adds a functional regression test driving `handle_with_body` with a CompleteMultipartUpload and a header-capturing backend, asserting the checksum headers both reach the backend and appear in the re-signed SignedHeaders. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(smoke): multipart upload regression test against real S3 The dropped-checksum bug (InvalidPart on CompleteMultipartUpload) only reproduces against real AWS S3. Verified empirically that MinIO RELEASE.2025-09 does NOT reproduce it: an MPU created without a checksum algorithm but with CRC32 parts + completion succeeds on MinIO and is rejected by S3. So the existing MinIO integration multipart tests can't guard this — a real-S3 smoke test is required. Add `TestMultipartUpload` to the smoke suite: a checksum-forced (`ChecksumAlgorithm=CRC32`) two-part upload via boto3's transfer manager (mirroring `aws s3 cp`), asserting the round-trip. It is gated on `SMOKE_WRITE_BUCKET`, so it self-skips until a writable bucket is configured. Wire a `smoke-writable` bucket + a github-actions write scope into wrangler.deploy.toml (OIDC-federated, no long-lived keys). Activation needs the backend bucket created, the OIDC role granted S3 write perms, and SMOKE_WRITE_BUCKET=smoke-writable set in the smoke job. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(smoke): reuse multistore-test-bucket for the write smoke test Point the smoke-writable bucket at the existing federated-test backend bucket (multistore-test-bucket) under a smoke-writable/ prefix instead of a separate bucket. The role already has PutObject/GetObject/DeleteObject/ListBucket (covering Create/Upload/Complete); only s3:AbortMultipartUpload need be added so failed uploads clean up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci: pass SMOKE_WRITE_BUCKET to the smoke-test job Wires the multipart write smoke test (TestMultipartUpload) to run on every deploy that exercises the smoke suite — including per-PR preview deploys, not just main. Gated behind the SMOKE_WRITE_BUCKET repo/environment variable: unset (default) -> the test self-skips; set to smoke-writable -> it runs against the freshly-deployed worker and real S3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci: run the multipart write smoke test on every preview Drop the opt-in repo-variable gate; instead pass smoke_write_bucket as a deploy.yml input and have preview.yml set it to smoke-writable unconditionally. Every PR preview now exercises the real-S3 multipart upload (no main deploy needed); staging/production leave the input empty, so the test self-skips there unless they opt in the same way. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
alukach
added a commit
to source-cooperative/data.source.coop
that referenced
this pull request
Jun 26, 2026
multistore 0.6.1 is released — it includes the aws-chunked write-decode fix and the flexible-checksum multipart fix (developmentseed/multistore#93), so modern `aws s3 cp` multipart uploads complete instead of failing with `InvalidPart`. Bump the multistore* deps 0.6.0 -> 0.6.1 and remove the temporary [patch.crates-io] block that tracked the fix/decode-aws-chunked-writes branch. All five crates now resolve from crates.io. Verified green: `cargo check --target wasm32-unknown-unknown` and `cargo test`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Large-file uploads via
aws s3 cp(AWS CLI v2.27.34) through the proxy fail atCompleteMultipartUpload:Parts all upload with
200; only the completion fails. SettingAWS_REQUEST_CHECKSUM_CALCULATION=when_required(disables the CLI's default CRC32 checksums) makes it succeed — pinpointing the trigger as AWS's default data-integrity checksums (CRC32), on by default since AWS CLI v2 ~2.23 / recent botocore.Root cause
execute_multipart(the raw-signed path for Create/Complete/Abort) forwarded only["content-type", "content-length", "content-md5"]and dropped everyx-amz-checksum-*header:CreateMultipartUpload'sx-amz-checksum-algorithmwas dropped → S3 initialized the MPU with no checksum context.build_streaming_forwardforwardsx-amz-trailer).CompleteMultipartUploadsends per-part<ChecksumCRC32>in the body plusx-amz-checksum-type— the headers were dropped → S3 saw an inconsistent MPU →InvalidPart.Fix
Forward the client's
x-amz-checksum-*andx-amz-sdk-checksum-algorithmheaders inexecute_multipart. This path signs every header present (sign_s3_request), so the forwarded headers are covered by the signature — unlike the presignedPutObjectpath, where unsignedx-amz-*headers are rejected. Because Create, Complete, and plain (non-chunked)UploadPartall flow through this one function, the single change fixes the whole lifecycle.Test
A functional regression test (
complete_multipart_forwards_and_signs_checksum_headers) drives the publichandle_with_bodywith aCompleteMultipartUploadand a header-capturing backend, asserting the checksum headers both reach the backend and appear in the re-signedSignedHeaders. Verified it fails on the pre-fix code (thex-amz-checksum-crc32header is absent) and passes after. Full core suite: 112 passing.Notes
fix/decode-aws-chunked-writes) — that branch adds the streaming part path this fix depends on;mainhas no streaming path, so the bug isn't reachable in this form there.aws s3api head-objectreports the expected checksum). TheAWS_REQUEST_CHECKSUM_CALCULATIONevidence is suggestive but also reroutes the part path, so it isn't fully isolating on its own.AbortMultipartUploadreturns403(backend role / token scope likely missings3:AbortMultipartUpload/abort_multipart_upload) — a separate authorization gap, not caused by the header drop.🤖 Generated with Claude Code