Skip to content

Buffer OCI registry uploads to S3#752

Merged
epompeii merged 10 commits intodevelfrom
u/ep/registry-s3
Mar 30, 2026
Merged

Buffer OCI registry uploads to S3#752
epompeii merged 10 commits intodevelfrom
u/ep/registry-s3

Conversation

@epompeii
Copy link
Copy Markdown
Member

Buffer OCI upload network frames into configurable S3 chunks (registry.data_store.chunk_size, default 5 MB) to reduce S3 operations during Docker push.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 29, 2026

🤖 Claude Code Review

PR: #752
Base: devel
Head: u/ep/registry-s3
Commit: 87caf9f67664a8f42cd3ddf07215959ba47e6c25


PR Review

Summary

This PR adds chunked buffering for S3 OCI uploads, fixes #[cfg] feature gating for the otel feature, fixes a Dockerfile bench target filter, introduces a storage_error helper to reduce boilerplate, and adds i18n docs for the new chunk_size config option.


Issues

1. Potential memory issue in stream_to_s3 — buffer not drained when it exceeds chunk_size (plus/api_oci/src/uploads.rs:136-138)

When the buffer exceeds chunk_size, the entire buffer is flushed as one chunk via std::mem::replace. If chunk_size is 5 MB but data accumulates to, say, 5.5 MB before the check, a 5.5 MB chunk is sent. This is fine for the default 5 MB chunk size (well under S3's 5 GB part limit), but if a user configures chunk_size close to MAX_CHUNK_SIZE (5 GB), individual chunks could exceed 5 GB. In practice this is unlikely since network frames are small, but it's worth noting the invariant isn't strictly enforced.

2. chunk_size cast from u64 to usize without overflow check (plus/bencher_oci_storage/src/storage.rs:654)

let chunk_size = chunk_size
    .unwrap_or(DEFAULT_CHUNK_SIZE)
    .clamp(DEFAULT_CHUNK_SIZE, MAX_CHUNK_SIZE) as usize;

MAX_CHUNK_SIZE is 5 GB. On a 32-bit platform, usize is 4 bytes, so as usize would silently truncate. This is likely fine since the server targets 64-bit, but a try_into().expect(...) or a comment noting the 64-bit assumption would be safer.

3. Vec::with_capacity(chunk_size) pre-allocates up to 5 GB (plus/api_oci/src/uploads.rs:129, 137)

If a user configures chunk_size to 5 GB, Vec::with_capacity(chunk_size) will attempt to allocate 5 GB upfront — and does so again on each flush via std::mem::replace. This could OOM the server. Consider documenting a practical upper bound or adding a tighter maximum.

4. stream_to_s3 returns stale new_size when buffer is empty and no flush occurs (plus/api_oci/src/uploads.rs:148-156)

If the stream is completely empty (no data frames at all), new_size remains current_size, which is correct. But if all data fit in a single buffer and were flushed in the fold, the final new_size from the fold is returned correctly. This path looks fine on closer inspection.


Positives

  • storage_error helper — Good DRY cleanup, eliminates ~20 instances of |e| crate::error::into_http_error(OciError::from(e)).
  • Feature gating fix — The #[cfg(all(feature = "plus", feature = "otel"))] fix for priority is correct. Without it, plus without otel would fail to compile since priority was computed but Priority type might not be available.
  • Dockerfile fix — Filtering by target.kind == ["bench"] prevents picking up non-bench executables. Good catch.
  • Chunked buffering design — Splitting stream_to_storage into stream_to_s3 / stream_direct is clean and well-documented.
  • Constants are well-documented with rationale linking to S3 multipart constraints.

Minor Suggestions

  • The OciStorage enum variants (S3, Local) are matched in stream_to_storage in api_oci, which means api_oci now depends on the internal structure of OciStorage. Consider adding stream_upload as a method on OciStorage itself to preserve encapsulation — though this would require OciStorage to depend on StreamingBody/HttpError, so the current approach may be a pragmatic tradeoff.

  • The #[expect(clippy::too_many_arguments)] on OciS3Storage::new (storage.rs:607) — consider a builder or config struct if more parameters are added in the future.

  • The OpenAPI spec shows "minimum": 0 for chunk_size, but the actual minimum enforced is 5 MB (via clamp). The schema could use "minimum": 5242880 to document this at the API level.


Model: claude-opus-4-6

@epompeii epompeii merged commit 43a2050 into devel Mar 30, 2026
64 checks passed
@epompeii epompeii deleted the u/ep/registry-s3 branch March 30, 2026 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant