Skip to content

feat(server): add S3Storage + BufferedMultipartWriter#3227

Closed
jayakasadev wants to merge 2 commits into
apache:masterfrom
jayakasadev:pr3227
Closed

feat(server): add S3Storage + BufferedMultipartWriter#3227
jayakasadev wants to merge 2 commits into
apache:masterfrom
jayakasadev:pr3227

Conversation

@jayakasadev
Copy link
Copy Markdown

@jayakasadev jayakasadev commented May 8, 2026

Which issue does this PR close?

Part of #3228 — Phase 1b; stacks on #3226.

Rationale

#3226 added the `ObjectStorage` seam; this PR lands the actual S3 backend behind a default-off `object-storage` cargo feature. Built on `rusty-s3` (sans-IO SigV4) + `cyper` (compio HTTP, rustls TLS) — does not reintroduce tokio into the data path that #2020 removed.

What changed?

After #3226, `kind = "object"` fell back to `CompioFsStorage` with a warning.

With `--features object-storage`, `kind = "object"` resolves to a real `S3Storage` (PUT, conditional PUT, ranged GET, multipart upload, list, delete). A `BufferedMultipartWriter` coalesces sub-MiB iggy flushes into S3-legal parts (≥ 5 MiB except final); a small-segment path aborts the multipart and does a single PUT when total < 5 MiB to avoid `400 EntityTooSmall`. `main.rs` installs `rustls::crypto::ring::default_provider()` at boot when the feature is on.

Credentials follow a subset of the standard AWS provider chain (priority order):

  1. Inline `access_key_id` + `secret_access_key` in `[system.storage.object]`.
  2. Standard env vars (`AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`/`AWS_SESSION_TOKEN`).
  3. Container credentials — `AWS_CONTAINER_CREDENTIALS_FULL_URI` or `RELATIVE_URI`. Covers ECS task roles and EKS Pod Identity.
  4. IRSA — `AWS_ROLE_ARN` + `AWS_WEB_IDENTITY_TOKEN_FILE`, exchanged via STS `AssumeRoleWithWebIdentity`.

Credential refresh is not implemented in Phase 1; long-running pods using short-lived creds need to be restarted before expiry. Phase 9 hardening will add refresh. EC2 IMDSv2 is also out of scope for Phase 1 — bare-EC2 deploys can wrap iggy in a credentials helper that exports env vars.

Two correctness fixes baked in from a pre-merge AWS S3 spike:

  • AWS ETags arrive quoted; `rusty-s3`'s `complete_multipart_upload` re-wraps them, so callers must quote-strip before passing back. Otherwise: `400 InvalidPart`.
  • Multipart minimum part size is 5 MiB except final — `BufferedMultipartWriter` enforces this; the `multipart_part_size` config knob is bounded ≥ 5 MiB at load.

GitHub doesn't allow cross-repo PR bases, so the diff here visually includes both commits; the commit list has just the Phase 1b commit on top.

Local Execution

  • Passed (both feature variants): `cargo fmt --all -- --check`, `cargo clippy --workspace --features 'server/object-storage' --no-deps --tests -- -D warnings`, `cargo test --workspace --lib` (20 with feature on, 15 with off — diff is 4 credential-parsing unit tests + the `IGGY_TEST_MINIO`-gated wire test), `cargo build --workspace --features server/object-storage`, `cargo machete`, `cargo sort --workspace --check`.
  • Wire validation: spike against real AWS S3 (us-east-1, ephemeral bucket + 1-day lifecycle backstop). 5/5 scenarios passed: PUT 1 KiB (108 ms), Range-GET (33 ms), multipart 12 MiB (1555 ms, 3 parts), full GET + byte-compare (919 ms), conditional PUT race (62 ms — loser fenced with HTTP 412). Bucket torn down on exit.
  • Pre-commit hooks: `prek` not installed locally.

AI Usage

  1. Claude (Anthropic, Opus 4.7).
  2. Generated function — `S3Storage`, `S3Multipart`, `BufferedMultipartWriter`, the credential chain (env / container / IRSA), the `main.rs` provider-install hook, and 9 new unit/integration tests drafted by the assistant. Library choice (rusty-s3 + cyper over opendal) was decided in the design plan and validated in the spike before code.
  3. Verified via `cargo test --workspace --lib` and the spike's real-S3 round-trip; both correctness fixes (ETag stripping, 5 MiB minimum) are from spike failures.
  4. Yes.

…rage

Phase 1a of the S3-as-primary-storage milestone. Adds the abstraction seam
without bringing in rusty-s3/cyper deps. fs mode is unchanged; setting
[system.storage] kind = "object" warns and falls back to CompioFsStorage
until Phase 1b lands the real S3 backend.

What lands:
- core/server/src/streaming/storage/object_store.rs: ObjectStorage trait,
  MultipartHandle trait, CompioFsStorage, and a cfg(test) InMemoryStorage
  with 11 unit tests covering put/get/head/list/delete/multipart/CAS.
- [system.storage] config block (kind, object.{service,bucket,region,
  endpoint,prefix,multipart_part_size,ack_after_upload,credentials}).
- bootstrap::resolve_object_storage seam returning Rc<dyn ObjectStorage>.

The trait is `?Send` because compio's per-thread io_uring driver yields
non-Send futures; ObjectStorage instances are per-shard, not shared
across threads.
Phase 1b of the S3-as-primary-storage milestone. Adds the compio-native S3
backend behind cargo features = ["object-storage"] (default off). When the
feature is on and [system.storage] kind = "object", bootstrap returns an
S3Storage built on rusty-s3 (SigV4 + request shaping) and cyper (compio
HTTP client, rustls TLS) — no tokio, no opendal.

What lands:
- rusty-s3 + url added as optional deps under the `object-storage` feature.
  cyper and rustls were already in the server crate.
- S3Storage with all ObjectStorage methods (put/put_if_absent/put_multipart/
  get_range/head/list_prefix/delete) plus an S3Multipart handle.
- BufferedMultipartWriter that coalesces sub-5-MiB iggy flushes into S3-legal
  parts; the small-segment path aborts the multipart and falls back to a
  single PUT under threshold (a feasibility spike against real AWS S3
  confirmed S3 rejects parts < 5 MiB with EntityTooSmall otherwise).
- main.rs installs rustls::crypto::ring::default_provider() at boot when the
  feature is on (idempotent — `install_default` returns Err if already set).
- bootstrap::resolve_object_storage now returns S3Storage for kind = "object"
  + feature on; without the feature, logs a warning and returns fs.
- 5 new unit tests (4 on BufferedMultipartWriter, 1 IGGY_TEST_MINIO-gated
  S3 wire round-trip).
- core/server-ng/config.toml gets the matching [system.storage] block so its
  embedded-toml validation test still passes after the SystemConfig field add.

Two correctness findings from a pre-merge AWS S3 spike that this commit
bakes in:
- AWS ETags arrive wrapped in quotes (`"abc"`); rusty-s3's
  CompleteMultipartUpload re-wraps them, so callers must `.trim_matches('"')`
  before passing the etag back. S3Multipart::upload_part does this.
- Multipart minimum part size is 5 MiB except the final part. The
  BufferedMultipartWriter `part_size` argument is debug-asserted >= that.

Stacks on Phase 1a.
@hubcio
Copy link
Copy Markdown
Contributor

hubcio commented May 9, 2026

See the reason for closure in #3228.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants