bench(v0.4): Phase 7.5 — live BRANCH pause-window data on a clean source#210
Merged
Replaces the "pause_ms TBD" disclaimer in v0.4 docs with measured
numbers from a clean Hub-pulled `python-numpy` source (1.5 GiB,
sha256-verified). The previous attempt at this measurement used
`coding-agent-fork-prewarm-v1`, which had 17 baked-in guest Oopses
contaminating the timing — fixed by switching source.
Methodology (`bench/live-fork-pause-window/bench-live-fork.py`,
based on `scripts/dev/e2e-live-branch.py` Phase 6 E2E harness):
- One memfd-backed source sandbox spawned with `live_fork: true`
- 10 iterations × 4 modes ({live-sync, live-async, diff, full}),
interleaved so cold-cache effects average across modes
- Each iteration: POST .../branch, record `pause_ms` and HTTP RT,
DELETE the result snapshot to bound disk usage
- Async iterations also record `poll_until_ready_ms`
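The interleaving described above can be sketched as follows. This is a minimal illustration, not the harness itself: `interleaved_schedule` is a hypothetical helper name, and the real `bench-live-fork.py` may order its requests differently.

```python
# Mode names taken from the methodology list; the helper itself is hypothetical.
MODES = ("live-sync", "live-async", "diff", "full")

def interleaved_schedule(iterations, modes=MODES):
    """Return (iteration, mode) pairs so every mode runs once per iteration.

    Running all four modes back to back inside each iteration, instead of
    ten runs of one mode followed by ten of the next, spreads cold-cache
    effects across modes.
    """
    return [(i, mode) for i in range(iterations) for mode in modes]

schedule = interleaved_schedule(10)
print(len(schedule))   # total BRANCH requests issued
print(schedule[:5])    # first iteration, then the start of the second
```

With 10 iterations and 4 modes this yields the 40 BRANCH requests that the raw CSV records.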
Results (Intel i7-12700, 30 GiB RAM, Linux 6.14, ext4 on **HDD**):
| mode | pause p50 | pause p90 | RT p50 |
|--------------|----------:|----------:|----------:|
| live-sync | **56 ms**| 64 ms | 13 730 ms |
| live-async | 54 ms | 241 ms | **69 ms** |
| diff | 202 ms | 418 ms | 13 461 ms |
| full | 13 550 ms | 14 268 ms | 13 559 ms |
Key ratios at p50:
- live vs diff: **3.6× faster pause** (202 / 56)
- live vs full: **242× faster pause** (13550 / 56)
- async RT vs sync RT: **199× faster return** (13730 / 69)
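The ratios can be re-derived from the p50 column of the table. The short check below uses only the numbers reported above; the dictionary keys are illustrative names, not fields from the harness.

```python
# p50 values in milliseconds, copied from the results table.
P50_MS = {
    "live_sync_pause": 56,
    "diff_pause": 202,
    "full_pause": 13550,
    "live_sync_rt": 13730,
    "live_async_rt": 69,
}

def speedup(slow_ms, fast_ms):
    """How many times faster the second measurement is than the first."""
    return slow_ms / fast_ms

print(round(speedup(P50_MS["diff_pause"], P50_MS["live_sync_pause"]), 1))    # 3.6
print(round(speedup(P50_MS["full_pause"], P50_MS["live_sync_pause"]), 1))    # 242.0
print(round(speedup(P50_MS["live_sync_rt"], P50_MS["live_async_rt"]), 1))    # 199.0
```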
The "on HDD" point is a feature of the writeup, not a bug:
Live's pause is disk-independent (the memory copy runs after resume,
not during the pause), so the Live / Diff gap *widens* on slow storage
rather than shrinking. NVMe would speed up Diff but not Live, which
narrows the ratio; even so, Live stays bounded by CPU work (vmstate
dump + UFFD_WP arming), never by disk throughput.
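A toy model makes the storage argument concrete. Everything in it is an assumption for illustration: the idea that Diff's pause is roughly a fixed CPU cost plus the time to write its dirty pages, and all of the constants, which are made up rather than measured.

```python
def live_pause_ms(cpu_ms):
    # Assumed: Live pauses only for CPU-bound work; the memory copy
    # happens after resume, so disk throughput does not appear here.
    return cpu_ms

def diff_pause_ms(cpu_ms, dirty_mib, disk_mib_per_s):
    # Assumed: Diff also writes its dirty pages to disk while paused.
    return cpu_ms + dirty_mib / disk_mib_per_s * 1000.0

CPU_MS = 50.0      # illustrative, not measured
DIRTY_MIB = 20.0   # illustrative, not measured

for label, throughput in (("slow disk", 100.0), ("fast disk", 2000.0)):
    live = live_pause_ms(CPU_MS)
    diff = diff_pause_ms(CPU_MS, DIRTY_MIB, throughput)
    print(f"{label}: live={live:.0f} ms diff={diff:.0f} ms ratio={diff / live:.1f}x")
```

Under these assumptions the Live figure is identical on both disks, while the Diff / Live ratio shrinks as throughput rises, which is the qualitative behavior described above.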
Files:
- `bench/live-fork-pause-window/bench-live-fork.py` — runnable
harness, parameterized on source-tag and iterations
- `bench/live-fork-pause-window/bench-live-fork.csv` — 40-row raw
data (one per BRANCH iteration)
- `bench/live-fork-pause-window/RESULTS-v0.4.md` — writeup with
methodology, host config, per-mode interpretation of what
pause_ms / RT measure, and honest caveats (single host, one
source size, p90 outlier on async iter #8)
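For readers who want to recompute the table from the raw CSV, a minimal sketch follows. It assumes columns named `mode` and `pause_ms` (the actual header row is not shown here) and uses the nearest-rank percentile definition, which may differ from whatever the harness uses. The sample rows are synthetic.

```python
import csv
import io
import math
from collections import defaultdict

def nearest_rank(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(csv_file):
    """Group pause_ms by mode and report p50 / p90 for each mode."""
    by_mode = defaultdict(list)
    for row in csv.DictReader(csv_file):
        by_mode[row["mode"]].append(float(row["pause_ms"]))
    return {
        mode: {"p50": nearest_rank(vals, 50), "p90": nearest_rank(vals, 90)}
        for mode, vals in by_mode.items()
    }

# Synthetic rows standing in for bench-live-fork.csv (column names assumed).
sample = io.StringIO(
    "mode,pause_ms\n"
    "diff,200\ndiff,210\ndiff,400\n"
    "live-sync,55\nlive-sync,60\n"
)
print(summarize(sample))
```

To run it against the real file, pass `open("bench/live-fork-pause-window/bench-live-fork.csv", newline="")` instead of `sample`, after adjusting the column names to match the header.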
Docs updated:
- `README.md` headline: "BRANCH a live VM in 150 ms" → "in 56 ms
(v0.4 live mode)". v0.4 preview block now leads with the
measured 3.6× / 200× ratios and links to RESULTS-v0.4.md.
- `README-zh.md`: same headline + intro update.
- `CHANGELOG.md`: the "Bench in progress" disclaimer in the Unreleased
v0.4 section is replaced with the actual numbers table.
Phase 7 (user surface for v0.4 live BRANCH) is complete with this
PR: REST (#204), CLI (#205), SDKs (#206), doctor (#207), docs
(#208), bench (this).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces the "bench in progress" disclaimer in v0.4 docs with measured numbers from a clean Hub-pulled
python-numpy source. The previous attempt used coding-agent-fork-prewarm-v1, which had 17 baked-in guest Oopses contaminating the timing; switching source fixed this.
Results (Intel i7-12700, 30 GiB RAM, Linux 6.14, ext4 on HDD, source = python-numpy 1.5 GiB)
Key ratios at p50:
Why HDD makes the numbers more impressive, not less
Live's pause is disk-independent — the memory copy runs after resume, not during. So:
This is the structural advantage of moving the memory copy out of the critical section.
Files
- bench/live-fork-pause-window/bench-live-fork.py: runnable harness, parameterized
- bench/live-fork-pause-window/bench-live-fork.csv: 40-row raw data (one per BRANCH)
- bench/live-fork-pause-window/RESULTS-v0.4.md: writeup with methodology + honest caveats
Docs
- README.md headline: "BRANCH a live VM in 56 ms (v0.4 live mode)"
- README-zh.md headline: same in Chinese
- CHANGELOG.md: Unreleased v0.4 section's "bench in progress" placeholder replaced with the table
Honest caveats called out in the writeup
- unprivileged_userfaultfd=1 or running as root (forkd doctor probes both)
Phase 7 complete with this PR
REST (#204) · CLI (#205) · SDKs (#206) · doctor (#207) · docs (#208) · bench (this).
Test plan
- bench-live-fork.py
🤖 Generated with Claude Code