
tar: stream file reads instead of buffering into Vec<u8> #72

Merged: jlebon merged 3 commits into main from issue-52-high-memory-usage on Feb 27, 2026

jlebon (Member) commented Feb 27, 2026

So this is a pretty embarrassing bug. When adding files to the tar
builder, we were loading the _whole_ file into memory. What's worse,
when writing the OCI archive itself, we were reading the _whole_ tar
layers into memory, which is crazy.

Anyway, fix this by doing the obvious thing and just passing fds, so file contents are streamed rather than buffered.
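A minimal std-only sketch of the difference between the two patterns. The helper names `append_buffered` and `append_streaming` are hypothetical and are not the functions in `src/tar.rs`; the point is only that the buffered version allocates a file-sized `Vec<u8>`, while the streaming version keeps the file open and lets `io::copy` move it in bounded chunks.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Buffered pattern (hypothetical helper): read the whole file into a
/// Vec<u8>, then write it out. Peak memory grows with the file size, so a
/// multi-GiB tar layer means a multi-GiB allocation.
pub fn append_buffered<W: Write>(out: &mut W, path: &Path) -> io::Result<u64> {
    let data: Vec<u8> = std::fs::read(path)?; // allocates file-sized buffer
    out.write_all(&data)?;
    Ok(data.len() as u64)
}

/// Streaming pattern (hypothetical helper): hand the open file to
/// `io::copy`, which moves the data without holding the whole file in
/// memory, so memory use stays roughly constant regardless of file size.
pub fn append_streaming<W: Write>(out: &mut W, path: &Path) -> io::Result<u64> {
    let mut file = File::open(path)?;
    io::copy(&mut file, out)
}
```

Assuming the project builds entries with the `tar` crate, the streaming form corresponds to passing the open `File` itself as the `data` argument of `Builder::append_data` (which accepts any `Read`) instead of a slice of a pre-read buffer. The same idea applies when copying finished tar layers into the OCI archive.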

This reduces allocations related to the OCI building phase from ~6.5 GB
to ~29 MB for a 3.2 GiB FCOS image (99.6% reduction). Peak RSS drops to
~75 MB.

There are more memory savings to be had in the RPM path handling area, but I'm not sure
the juice is worth the squeeze complexity-wise.

Fixes #52.

Assisted-by: Claude Opus 4.6

As part of debugging #52, I ended
up adding an `alloc_tracker` feature. I initially thought about making
it a permanent feature, but it seems unlikely that we'll need it often.

So for now at least, let's just capture that knowledge in a skill.

Assisted-by: Claude Opus 4.6
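The `alloc_tracker` feature itself isn't shown here, so the following is only a hypothetical sketch of the general technique: a `#[global_allocator]` that wraps `std::alloc::System` and tallies cumulative bytes allocated, currently live bytes, and the peak of live bytes. Names such as `CountingAlloc` and `total_allocated_bytes` are made up for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counters; not the project's actual alloc_tracker.
static TOTAL_ALLOCATED: AtomicUsize = AtomicUsize::new(0); // cumulative bytes requested
static LIVE: AtomicUsize = AtomicUsize::new(0); // bytes currently allocated
static PEAK_LIVE: AtomicUsize = AtomicUsize::new(0); // high-water mark of LIVE

/// Wraps the system allocator and records every allocation and deallocation.
pub struct CountingAlloc;

fn record_alloc(size: usize) {
    TOTAL_ALLOCATED.fetch_add(size, Ordering::Relaxed);
    let live_now = LIVE.fetch_add(size, Ordering::Relaxed) + size;
    PEAK_LIVE.fetch_max(live_now, Ordering::Relaxed);
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            record_alloc(layout.size());
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `realloc` and `alloc_zeroed` use the trait's default methods, which
    // are built on `alloc`/`dealloc` above, so they are counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Total bytes ever allocated since process start.
pub fn total_allocated_bytes() -> usize {
    TOTAL_ALLOCATED.load(Ordering::Relaxed)
}

/// Highest number of simultaneously live heap bytes observed.
pub fn peak_live_bytes() -> usize {
    PEAK_LIVE.load(Ordering::Relaxed)
}
```

Sampling `total_allocated_bytes()` before and after a phase such as writing the OCI archive gives a per-phase allocation figure; how the real feature reported its numbers isn't stated here.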

gemini-code-assist (bot) left a comment

Code Review

This pull request effectively addresses a significant memory consumption issue by switching from buffering entire files into memory to streaming them during the tar creation process. The changes in `src/tar.rs` are well-implemented and directly fix the problem described, leading to a massive reduction in memory allocations. The addition of memory profiling tools, including the alloc-tracker documentation and the `--write-peak-mem-to` flag, is a valuable enhancement for future performance analysis.

I've added a couple of suggestions to improve the robustness of the new memory usage parsing logic in `src/utils.rs` and the corresponding test script.

Review comment threads: `src/utils.rs` (outdated), `tests/e2e/test-fcos.sh`
Add a hidden `--write-peak-mem-to` flag to the build command which reads
`VmHWM` (peak resident set size) from `/proc/self/status` and writes it to the given path.

Then in the FCOS e2e test, use that knob and verify that it's under
200 MiB. This is just a soft guard against an egregious regression like
#52.

Assisted-by: Claude Opus 4.6
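The parsing step can be sketched in Rust as below. This is not the code in `src/utils.rs`, which isn't shown here; the function names and the `Option` return convention are assumptions, and the 200 MiB guard that the commit puts in the shell e2e test is re-expressed as `within_budget` purely for illustration. Returning `None` on a missing field or unexpected unit, rather than panicking, is one way to keep the parsing robust.

```rust
use std::fs;
use std::io;

/// Soft budget mirroring the e2e guard (200 MiB).
pub const PEAK_RSS_BUDGET_BYTES: u64 = 200 * 1024 * 1024;

/// Extracts `VmHWM` from the text of /proc/self/status and converts it to
/// bytes. Returns `None` if the field is missing or not "<number> kB".
pub fn parse_vmhwm_bytes(status: &str) -> Option<u64> {
    status.lines().find_map(|line| {
        let rest = line.strip_prefix("VmHWM:")?;
        let mut fields = rest.split_whitespace();
        let value: u64 = fields.next()?.parse().ok()?;
        // The kernel labels the unit "kB", but it means units of 1024 bytes.
        match fields.next() {
            Some("kB") => value.checked_mul(1024),
            _ => None,
        }
    })
}

/// Reads the current process's peak RSS (Linux only).
pub fn read_peak_rss_bytes() -> io::Result<Option<u64>> {
    let status = fs::read_to_string("/proc/self/status")?;
    Ok(parse_vmhwm_bytes(&status))
}

/// The soft regression check: true if peak RSS is under the budget.
pub fn within_budget(peak_bytes: u64) -> bool {
    peak_bytes < PEAK_RSS_BUDGET_BYTES
}
```

Because `VmHWM` is a high-water mark, reading it once at the end of the build is enough to capture the peak over the whole run.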
jlebon force-pushed the issue-52-high-memory-usage branch from ea236d7 to f3e8177 on February 27, 2026 22:08
jlebon enabled auto-merge (rebase) on February 27, 2026 22:09
jlebon merged commit 9890a78 into main on Feb 27, 2026
6 checks passed
jlebon deleted the issue-52-high-memory-usage branch on February 27, 2026 22:16

Development

Successfully merging this pull request may close these issues.

High memory usage when rechunking images