openshell sandbox create --from OOM-killed on large images due to in-memory image buffering #600

@minhdqdev


Agent Diagnostic

Investigated using the create-spike skill backed by principal-engineer-reviewer. Skills loaded: debug-openshell-cluster, openshell-cli, create-spike.

Investigation summary:

  • Traced sandbox create --from through run.rs → build.rs → push.rs
  • Confirmed three discrete full-image heap allocations in push_local_images()
  • Verified bollard upload_to_container API supports streaming via body_try_stream — no API constraint blocks the fix
  • Confirmed tar crate size-in-header constraint requires a seekable (disk) intermediate
  • Identified tempfile is already a dev-dep; needs promotion to prod dep
  • No test coverage exists for push.rs

Description

Running openshell sandbox create --from <dockerfile-dir> on a large sandbox image (~3.7 GB) causes the openshell process to be killed by the Linux OOM killer. The image export pipeline in crates/openshell-bootstrap/src/push.rs buffers the entire Docker image tar three times in memory before importing it into the gateway, producing ~11 GB peak allocation.

Expected: The export → tar wrap → upload pipeline streams data in O(chunk) memory.
Actual: ~3× image size held in memory simultaneously → OOM kill on images ≥ ~4 GB on machines with ≤ 16 GB free RAM.


Root Cause

Data flow

sandbox create --from ./my-image/
  └─ run.rs:build_from_dockerfile()
       └─ build.rs:build_and_push_image()
            ├─ build_image()          ← streaming, no issue
            └─ push_local_images()
                 ├─ collect_export()  ← BUFFER 1: 3.7 GB Vec<u8>
                 ├─ wrap_in_tar()     ← BUFFER 2: 3.7 GB Vec<u8> (both live simultaneously)
                 └─ upload_archive()  ← BUFFER 3: 3.7 GB Bytes::copy_from_slice

Buffer 1 — collect_export (push.rs:97–107)

async fn collect_export(docker: &Docker, images: &[&str]) -> Result<Vec<u8>> {
    let mut stream = docker.export_images(images);  // already a Stream<Item = Result<Bytes>>
    let mut buf = Vec::new();
    while let Some(chunk) = stream.next().await {
        buf.extend_from_slice(&chunk?);             // ← 3.7 GB accumulates here
    }
    Ok(buf)
}

Buffer 2 — wrap_in_tar (push.rs:114–131)

fn wrap_in_tar(file_path: &str, data: &[u8]) -> Result<Vec<u8>> {
    let mut builder = tar::Builder::new(Vec::new());
    let mut header = tar::Header::new_gnu();
    header.set_path(file_path)?;
    header.set_size(data.len() as u64);
    header.set_cksum();
    builder.append(&header, data)?;   // ← copies the full 3.7 GB into a new Vec<u8>
    Ok(builder.into_inner()?)         // ← returns the second 3.7 GB allocation
}

Both image_tar (Buffer 1) and outer_tar (Buffer 2) are live simultaneously at push.rs:54. Peak ~7.4 GB at this point.

Buffer 3 — upload_archive (push.rs:135–151)

docker.upload_to_container(
    container_name,
    Some(options),
    bollard::body_full(Bytes::copy_from_slice(archive)),  // ← third 3.7 GB copy
)

All three allocations overlap in scope. Peak ~11 GB.

| Step | Allocation | Size |
| --- | --- | --- |
| collect_export | Vec&lt;u8&gt; (image tar) | ~3.7 GB |
| wrap_in_tar | Vec&lt;u8&gt; (outer tar) | ~3.7 GB |
| upload_archive | Bytes (HTTP body) | ~3.7 GB |
| Total peak | | ~11 GB |

Streaming Fix: Feasibility Confirmed

Bollard supports streaming

upload_to_container accepts a BodyType which has three constructors:

| Constructor | Notes |
| --- | --- |
| bollard::body_full(Bytes) | Materialized — current, causes Buffer 3 |
| bollard::body_stream(impl Stream&lt;Item = Bytes&gt;) | Infallible stream |
| bollard::body_try_stream(impl Stream&lt;Item = Result&lt;Bytes, io::Error&gt;&gt;) | Best fit |

There is no API constraint blocking streaming.

Tar size-in-header constraint requires temp file

The POSIX tar format encodes the entry byte size in the 512-byte header, written before the data. The tar crate's `append_writer` escape hatch requires a seekable destination. This means:

  • A fully in-memory zero-copy pipeline is not feasible without knowing the export size in advance
  • Writing the raw export to a temp file first, then constructing the outer tar header from the known file size, then streaming the header + file chunks is the correct approach

Proposed streaming pipeline

export_images() stream
      │ (chunked Bytes)
      ▼
write to NamedTempFile   ← sync via spawn_blocking or async write
      │ (flush + seek to start)
      ▼
stat file → build 512-byte tar header in memory (set_size = file.len())
      │
      ▼
Stream: [header_bytes] ++ [file content chunks] ++ [tar padding]
      │ (impl Stream<Item=Result<Bytes, io::Error>>)
      ▼
bollard::body_try_stream(...)
      ▼
upload_to_container()    ← single pass, O(chunk) RAM

Peak memory drops from ~11 GB to O(stream chunk size) + one temp file on disk (~3.7 GB disk, ~few MB RAM).

Dependencies

| Primitive | Status |
| --- | --- |
| tempfile::NamedTempFile | Promote from dev-dep to prod dep in Cargo.toml |
| tokio::fs::File + AsyncReadExt | Available (tokio already a dep) |
| futures::stream::try_unfold | Available (futures already a dep) |
| tokio_util::io::ReaderStream | Optional (already in lockfile as a transitive dep; cleaner ergonomics) |

Design Questions

  1. Single vs. two temp files: Writing raw export to file A, then outer tar to file B keeps logic simple. Writing export to file A, constructing the 512-byte header in memory, then streaming header_bytes ++ file_chunks ++ padding avoids file B entirely. The latter is preferable — no second disk write, avoids a second ~3.7 GB allocation.

  2. Should OPENSHELL_SCRATCH_DIR be supported? Some CI environments mount /tmp as tmpfs (in-RAM), defeating the purpose. An env-var override is a nice-to-have but not blocking for v1 — users can redirect $TMPDIR.

  3. Disk space pre-check: A proactive check (`docker image inspect` VirtualSize vs. available disk) before starting the export would give a clear error instead of a cryptic OS write failure. Nice-to-have.

  4. build.rs build context: create_build_context_tar materializes the Dockerfile build context into a Vec<u8>. Same pattern, much smaller scale (MBs not GBs). Out of scope for this fix but worth tracking as a follow-up.


Reproduction Steps

  1. Build a large sandbox image (≥ 3 GB uncompressed), e.g.:
    openshell sandbox create --from sandboxes/gemini
    
  2. Observe:
    [progress] Exported 3745 MiB
    Killed
    
  3. Exit code: 137 (SIGKILL from OOM killer)

Environment

  • Image size: ~3.7 GB (base sandbox + @google/gemini-cli@0.34.0)
  • Host RAM: 23 GB total, ~13 GB available at time of failure
  • Swap: 8 GB total, ~12 MB free at time of failure
  • OOM killer log: Out of memory: Killed process (openshell) total-vm:19913320kB, anon-rss:13626992kB

Scope

| Dimension | Assessment |
| --- | --- |
| Type | perf / bug — correct behavior, resource-fatal on large inputs |
| Complexity | Medium — clear fix, async streaming around sync tar with size-in-header constraint |
| Confidence | High — no API blockers, temp-file path is proven |
| Primary files | push.rs (logic), Cargo.toml (promote tempfile to prod dep) |
| Tests needed | Unit tests for streaming helpers (tar header construction, stream reassembly); no Docker daemon required |

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (debug-openshell-cluster, openshell-cli, create-spike)
  • My agent could not resolve this — the diagnostic above explains why (code change required)
