runtime: normalize tar headers for reproducible build contexts #86

Merged: bilby91 merged 1 commit into main from runtime/tar-deterministic-headers on Jun 2, 2026
bilby91 (Member) commented Jun 2, 2026

Summary

  • tarDirectory now normalizes every tar header (ModTime → Unix epoch, AccessTime/ChangeTime zeroed, Uid/Gid → 0, Uname/Gname emptied). Wall-clock mtimes from os.WriteFile in synthesized build contexts (e.g. useruid's uid-fix.sh + Dockerfile) were leaking into the tar stream and shifting BuildKit's COPY vertex digest across invocations of byte-identical content.
  • useruid.reconcileRemoteUserUID additionally os.Chtimes-pins the two files it writes to the epoch — defensive, so the synthesized context's reproducibility is a local property of the caller rather than only the tar layer.
  • Two unit tests in runtime/docker/build_test.go: TestTarDirectoryNormalizesMetadata asserts every header field is normalized; TestTarDirectoryDeterministic tars the same content twice with diverging on-disk mtimes and asserts byte-identical streams.

Why it matters

Downstream pipelines that run dc.Up against a blank PVC, snapshot the PVC, and restore the snapshot in session pods were getting full BuildKit cache misses on the second dc.Up. Observed impact on one workspace: COPY vertex digest changed across three runs over the same input (6270201163e7…, f5ed748968f2…, 8c82f115a53e…), containerd snapshotter dir grew 10.2G → 16.5G after one cache-missed Up, and first-session pod startup went from <1s to ~70s.

BuildKit hashes content for cache purposes, so erasing mtime/uid/gid from the tar header doesn't drop information it relies on — it just removes a host-specific perturbation from the digest. Behavior change worth noting: any external consumer that inspects raw layer metadata for original mtimes will no longer see them. None exists in this repo.

Test plan

  • go test ./... is green locally
  • TestTarDirectoryDeterministic fails without the tarDirectory change (verified during development)
  • Optional: integration test that runs BuildImage twice on the same context and asserts identical layer digests — skipped here since it needs a live daemon
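
For readers who want to reproduce the determinism check without the repository, the idea behind TestTarDirectoryDeterministic might look like the sketch below. It is an approximation under stated assumptions: tarDir is a hypothetical, non-recursive stand-in for tarDirectory, and sameTarAcrossMtimes is an illustrative name, not the actual test.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// tarDir is a hypothetical stand-in for tarDirectory. It archives the regular
// files directly inside dir (no recursion) and normalizes every header.
func tarDir(dir string) ([]byte, error) {
	entries, err := os.ReadDir(dir) // os.ReadDir returns entries sorted by name
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			return nil, err
		}
		if !info.Mode().IsRegular() {
			continue
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return nil, err
		}
		hdr.Name = e.Name()
		hdr.ModTime = time.Unix(0, 0)
		hdr.AccessTime, hdr.ChangeTime = time.Time{}, time.Time{}
		hdr.Uid, hdr.Gid, hdr.Uname, hdr.Gname = 0, 0, "", ""
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// sameTarAcrossMtimes tars one file, moves its on-disk mtime, tars it again,
// and reports whether the two streams are byte-identical.
func sameTarAcrossMtimes() (bool, error) {
	dir, err := os.MkdirTemp("", "ctx")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "Dockerfile")
	if err := os.WriteFile(path, []byte("FROM scratch\n"), 0o644); err != nil {
		return false, err
	}
	first, err := tarDir(dir)
	if err != nil {
		return false, err
	}
	later := time.Now().Add(48 * time.Hour)
	if err := os.Chtimes(path, later, later); err != nil {
		return false, err
	}
	second, err := tarDir(dir)
	if err != nil {
		return false, err
	}
	return bytes.Equal(first, second), nil
}

func main() {
	same, err := sameTarAcrossMtimes()
	if err != nil {
		panic(err)
	}
	fmt.Println("byte-identical:", same)
}
```

Removing the three normalization lines from tarDir should make the comparison fail, which mirrors the "fails without the tarDirectory change" item above.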

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Normalized tar header metadata in Docker build streams to ensure consistent file ownership and timestamps
    • Pinned file modification times to a baseline value in build contexts
  • Tests

    • Added tests validating tar metadata normalization behavior
    • Added tests verifying tar output consistency across runs

tarDirectory stamps mtime/uid/gid into headers from live FileInfo, so
synthesized contexts (uid-reconcile, etc.) carry wall-clock mtimes that
shift BuildKit's COPY vertex digest across invocations of byte-identical
content. Downstream consumers that snapshot a workspace after one Up
and restore it for a later Up hit full cache misses and re-extract
GBs of image layers into new snapshotter dirs.

Normalize ModTime to epoch, zero AccessTime/ChangeTime, and clear
uid/gid/uname/gname in every tar header. Also pin useruid's temp
context files to the epoch via os.Chtimes so determinism is a local
property of the synthesizer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai (Bot) commented Jun 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0fd7aeb4-b7f1-40fc-8fff-ff1a37e58dfe

📥 Commits

Reviewing files that changed from the base of the PR and between a9e2d70 and b62711f.

📒 Files selected for processing (3)
  • runtime/docker/build.go
  • runtime/docker/build_test.go
  • useruid.go

Walkthrough

This PR normalizes tar archive metadata to ensure deterministic, reproducible container builds. Tar header timestamps and ownership metadata are pinned to Unix epoch and zeroed, eliminating host-dependent variance in generated tar streams. The pattern is applied to both core tar generation and the UID reconciliation build context.

Changes

Deterministic tar archive generation

| Layer / File(s) | Summary |
|---|---|
| Tar header normalization implementation (runtime/docker/build.go) | tarEpoch constant set to Unix epoch and tarDirectory function updated to override tar header ModTime, AccessTime, ChangeTime, Uid, Gid, Uname, and Gname with normalized/zeroed values. |
| Tar metadata and determinism tests (runtime/docker/build_test.go) | TestTarDirectoryNormalizesMetadata verifies headers are normalized to epoch and zero values; TestTarDirectoryDeterministic confirms byte-identical tar output for identical content regardless of source file modification times. |
| UID reconciliation context determinism (useruid.go) | Generated uid-fix.sh and Dockerfile file modification times are pinned to Unix epoch via os.Chtimes in reconcileRemoteUserUID to ensure reproducible build context tar streams. |

Possibly related PRs

  • crunchloop/devcontainer#34: Modifies reconcileRemoteUserUID in useruid.go around UID reconciliation script generation and context setup.
  • crunchloop/devcontainer#47: Modifies runtime/docker/build.go tar handling and BuildKit integration for symlink and archive generation behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A rabbit hops through time so still,
Pinning epochs to Docker's will—
No host-bound clocks, no ownership's weight,
Just deterministic streams, bit-for-bit, great!
Reproducible builds, archive by archive,
Watch those tar bytes consistently thrive. 🎯

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'runtime: normalize tar headers for reproducible build contexts' directly and clearly summarizes the main change: normalizing tar headers to achieve reproducible build contexts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


bilby91 merged commit 7ded96c into main on Jun 2, 2026 (18 checks passed)
bilby91 mentioned this pull request on Jun 2, 2026 (2 tasks)