diff --git a/CHANGELOG.md b/CHANGELOG.md
index b28ff78..1f6957b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -50,11 +50,21 @@ Firecracker fork from
— upstream FC doesn't yet ship `mem_backend.shared = true`. See
[`docs/VENDORED-FIRECRACKER.md`](./docs/VENDORED-FIRECRACKER.md).
-**Stability**: the user surface is stable; numbers across realistic
-workloads (`bench/live-fork-pause-window.md`) are in progress against
-a freshly-rebuilt clean snapshot — the previous
-`coding-agent-fork-prewarm-v1` parent had 17 pre-baked guest Oopses
-that contaminate Live BRANCH timing.
+**Bench numbers** ([`bench/live-fork-pause-window/RESULTS-v0.4.md`](./bench/live-fork-pause-window/RESULTS-v0.4.md))
+on a clean Hub-pulled `python-numpy` source (1.5 GiB, Intel i7-12700,
+ext4 on HDD):
+
+| mode | pause p50 | pause p90 | RT p50 |
+|------------|----------:|----------:|----------:|
+| live-sync | 56 ms | 64 ms | 13 730 ms |
+| live-async | 54 ms | 241 ms | **69 ms** |
+| diff | 202 ms | 418 ms | 13 461 ms |
+| full | 13 550 ms | 14 268 ms | 13 559 ms |
+
+Headline: **3.6× faster pause** vs v0.3 Diff at p50, and the gap
+widens on slower storage because Live's pause is disk-independent.
+`wait: false` gives callers a ~70 ms HTTP return (vs 13.7 s for sync),
+a **~200× RT improvement** for fire-and-forget BRANCH.
### Security — bearer-token comparison was a length oracle (closes #162)
diff --git a/README-zh.md b/README-zh.md
index 112f0ae..99463c8 100644
--- a/README-zh.md
+++ b/README-zh.md
@@ -21,7 +21,7 @@
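-## Fork 100 microVMs in 101 ms. BRANCH a running VM in 150 ms.
+## Fork 100 microVMs in 101 ms. BRANCH a running VM in 56 ms (v0.4 live mode).
A microVM sandbox runtime for **AI agent fan-out**. Child VMs
fork from a "warm-started" parent snapshot and, via copy-on-write (CoW), inherit
@@ -44,12 +44,15 @@ pause time would grow from 150 ms to 2.7 s
([#146](https://github.com/deeplethe/forkd/issues/146)); after the fix,
consecutive BRANCHes stay flat (the 6th BRANCH is 17.6× faster).
-**v0.4 live BRANCH** cuts the source VM's stall window from ~150 ms (Diff) to
-sub-50 ms: the source VM resumes as soon as the vCPU state is dumped, and dirty
-pages are captured asynchronously via UFFD_WP. The end-to-end path is fully wired up: use `--live` on the CLI,
-`mode: "live"` on REST, and the same names in the Python / TypeScript / MCP SDKs. Add `--no-wait`
-(CLI) or `wait: false` (REST/SDK) to return immediately (~10 ms) without waiting
-for the background copy to finish.
+**v0.4 live BRANCH** squeezes the source VM's stall window from ~200 ms (Diff) down to
+**56 ms p50 / 64 ms p90** (1.5 GiB source VM, measured; see
+[`bench/live-fork-pause-window/RESULTS-v0.4.md`](./bench/live-fork-pause-window/RESULTS-v0.4.md)).
+At p50 that is **3.6×** faster than v0.3 Diff, and on slow disks the ratio **grows**,
+because Live's pause is disk-independent (the memory copy runs after resume,
+outside the critical section). Adding `wait: false` lets the caller return in ~70 ms while the background
+copy completes asynchronously; for fire-and-forget BRANCH from agent code that is a **200×**
+RT improvement. Use `--live` / `--no-wait` on the CLI and `mode: "live"` /
+`wait: false` on REST; the Python / TypeScript / MCP SDKs use the same names.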
```python
from forkd import Controller
diff --git a/README.md b/README.md
index 89e078f..93bce7d 100644
--- a/README.md
+++ b/README.md
@@ -21,7 +21,7 @@
-## Fork 100 microVMs in 101 ms. BRANCH a live VM in 150 ms.
+## Fork 100 microVMs in 101 ms. BRANCH a live VM in 56 ms (v0.4 live mode).
A microVM sandbox runtime for **AI agent fan-out**. Children fork
from a warmed parent snapshot, inheriting its address space
@@ -45,15 +45,17 @@ where repeated BRANCHes on the same parent ballooned from 150 ms to
2.7 s ([#146](https://github.com/deeplethe/forkd/issues/146)); the
chain now stays flat (17.6× faster on the 6th consecutive BRANCH).
-**v0.4 live BRANCH** drops the source-pause window from ~150 ms
-(Diff) to sub-50 ms by moving the memory copy out of the critical
-section: the source resumes as soon as Firecracker dumps vCPU state,
-and dirty pages get captured asynchronously via UFFD_WP. The full
-end-to-end path is wired up — pass `--live` on the CLI, `mode:
-"live"` on REST, or `mode="live"` / `mode: "live"` on the Python /
-TypeScript / MCP SDKs. Add `--no-wait` (CLI) or `wait: false` (REST /
-SDKs) to return as soon as the source resumes (~10 ms) rather than
-waiting on the background copy.
+**v0.4 live BRANCH** collapses the source-pause window from ~200 ms
+(Diff) to **56 ms p50 / 64 ms p90** on an idle 1.5 GiB source (measured;
+see [`bench/live-fork-pause-window/RESULTS-v0.4.md`](./bench/live-fork-pause-window/RESULTS-v0.4.md)).
+That is a **3.6× faster pause** vs v0.3 Diff at p50, and the gap *widens* on
+slower storage because Live's pause is disk-independent (the memory
+copy runs after resume, not during the pause). With `wait: false` the call
+returns in ~70 ms while the background copy completes asynchronously,
+a **~200×** RT improvement for fire-and-forget BRANCH from agent
+code. Pass `--live` / `--no-wait` on the CLI, `mode: "live"` /
+`wait: false` on REST, or the equivalent options on the Python /
+TypeScript / MCP SDKs.
```python
from forkd import Controller
diff --git a/bench/live-fork-pause-window/RESULTS-v0.4.md b/bench/live-fork-pause-window/RESULTS-v0.4.md
new file mode 100644
index 0000000..27eb934
--- /dev/null
+++ b/bench/live-fork-pause-window/RESULTS-v0.4.md
@@ -0,0 +1,160 @@
+# v0.4 live BRANCH pause-window results
+
+Headline: `mode="live"` collapses the source-VM pause window from
+**202 ms p50 (Diff)** to **56 ms p50 (Live)** on a 1.5 GiB source —
+**3.6× faster** at the median, and the gap widens on slow storage
+because Live's pause is disk-independent while Diff's is not.
+`wait=false` lets the caller return after ~69 ms while the background
+memory copy runs to completion asynchronously.
+
+Methodology, raw numbers, and honest caveats below.
+
+## TL;DR
+
+| mode | pause p50 | pause p90 | pause max | RT p50 |
+|--------------|----------:|----------:|----------:|----------:|
+| live-sync | **56 ms**| 64 ms | 64 ms | 13 730 ms |
+| live-async | 54 ms | 241 ms | 258 ms | **69 ms** |
+| diff | 202 ms | 418 ms | 434 ms | 13 461 ms |
+| full | 13 550 ms | 14 268 ms | 14 314 ms | 13 559 ms |
+
+Key ratios at p50:
+
+- **live vs diff**: 202 / 56 = **3.6× faster pause window**
+- **live vs full**: 13 550 / 56 = **242× faster pause window**
+- **async RT vs sync RT**: 13 730 / 69 ≈ **199× faster return** for
+  callers that don't need the snapshot bytes immediately
+
+> "Pause" is the source VM's downtime (the user-visible gap in TCP
+> connections, kvmclock, etc.). "RT" is the full HTTP round-trip on
+> `POST /v1/sandboxes/{id}/branch`, which is what your code waits on.
+
+## Setup
+
+| Item | Value |
+|-----------------|--------------------------------------------------------------------|
+| Host CPU | 12th Gen Intel Core i7-12700 (8P+4E) |
+| Host RAM | 30 GiB |
+| Host kernel | Linux 6.14.0-36-generic (Ubuntu) |
+| Snapshot disk | `/dev/sda2` — **WDC WD10EZEX-75WN4A1, ROTA=1 (spinning HDD)**, ext4 |
+| Firecracker | Vendored `forkd-v0.4-mem-backend-shared-v1.12` (musl release) |
+| Controller | `feat(doctor,uffd): Phase 7.4` (commit `a372e2a`) |
+| Source snapshot | `python-numpy` (from Hub, sha256-verified) |
+| Source RAM size | 1 610 612 736 bytes = **1 536 MiB** |
+| Iterations | 10 per mode, modes interleaved (live-sync, live-async, diff, full) |
+| Source sandbox | Spawned once with `live_fork: true`; all BRANCHes hit it |
+
+Modes interleave so disk warm-up, page-cache fill, and any
+process-wide drift contaminate all four modes equally instead of
+biasing the last batch.
+
+## Raw data
+
+[`bench-live-fork.csv`](./bench-live-fork.csv) — one row per BRANCH
+iteration; columns: `mode, iteration, http_round_trip_ms, pause_ms,
+memory_bin_bytes, poll_until_ready_ms`.
+
+To reproduce, run [`bench-live-fork.py`](./bench-live-fork.py); it writes
+the CSV to `--out-csv` (default `/tmp/forkd-bench-live/bench-live-fork.csv`):
+
+```bash
+sudo python3 bench-live-fork.py \
+ --source-tag python-numpy \
+ --iterations 10 \
+ --modes live-sync,live-async,diff,full
+```
+
+## What pause_ms measures
+
+`pause_ms` is the source VM's vCPU-pause window:
+
+- **`mode: "full"`**: Pause → write full `memory.bin` to disk → resume.
+ Wall-bound by sequential disk write. On this HDD: ~120 MB/s, so
+ 1.5 GiB ≈ 13 s. SSD would cut this to ~3 s; NVMe ~1.5 s. Not
+ acceptable for a running agent.
+- **`mode: "diff"`**: Pause → snapshot vmstate + dirty pages → resume.
+ Still wall-bound on disk write because the diff is *inside* the
+ pause window. Tail goes wide as the snapshot's dirty page count
+ grows (p90 = 418 ms is the cost of any one BRANCH hitting more
+ dirty pages than the others).
+- **`mode: "live"`**: Pause → snapshot vmstate, arm UFFD_WP, resume.
+ The memory copy happens *after* resume, in a controller-side
+ background thread. pause_ms is bounded by the vmstate dump
+ (~30-50 ms for 1.5 GiB at our vmstate sizes) plus UFFD_WP arming
+ on the resident regions (~0.4-0.6 ms in Phase 6 E2E).
+
+This is why **the live pause window is disk-independent**: an NVMe
+host wouldn't see Live get any faster (it's CPU-bound on vmstate +
+WP arming), but Diff would still scale with disk speed. On slower
+storage, the Live/Diff ratio gets *wider*, not narrower.
+
+## What the round-trip column measures
+
+`http_round_trip_ms` is how long your code's `await ctrl.branchSandbox(...)`
+or `c.branch_sandbox(...)` call takes to return:
+
+- **live-sync (`wait=true`)**: blocks for source pause AND the
+ background memory copy. p50 = 13 730 ms ≈ HDD throughput limit
+ (same as Diff and Full).
+- **live-async (`wait=false`)**: returns as soon as the source
+ resumes. p50 = **69 ms**. The background copy still runs (and is
+ visible via the `status` field flipping from `"writing"` to
+ `"ready"`), but the caller doesn't wait on it.
+- **diff / full**: synchronous by definition; same RT as live-sync.
+
+The `wait=false` path is the headline UX win for agents: a ~56 ms
+source pause *and* a ~70 ms HTTP return. The bench records
+`poll_until_ready_ms` separately so you can see when the async
+snapshot is actually consumable: it takes the same 13-14 s of wall
+time as a sync BRANCH, just off the caller's critical path.
+
+## Caveats
+
+1. **Single host, single source size.** 1.5 GiB Python+numpy on i7-12700
+ + HDD. Numbers will move with source RAM size (Live's pause is
+ ~CPU + vmstate-size bound; Diff/Full are ~disk-bound) and with
+ disk medium. We'd expect Live's headline gap to narrow on NVMe
+ (because Diff gets faster) but never invert — Live is always
+ bounded by the synchronous parts of FC's pause/dump path.
+
+2. **`live-async` p90 outlier.** Iteration #8 saw pause_ms=258 ms
+   (vs p50=54 ms), which alone pulls the interpolated p90 up to 241 ms.
+   Root cause not yet investigated; suspects: ext4 writeback pressure
+   from the in-flight previous async BRANCH, or FC's vmstate
+   serialization hitting an irregularity. Reproducing on a clean disk
+   and a longer run is the right follow-up. Excluding this point, the
+   other nine iterations span 38-90 ms.
+
+3. **`unprivileged_userfaultfd=0` requires root for the bench.** The
+   bench script runs the controller under `sudo` because
+   `vm.unprivileged_userfaultfd=0` is the default on this dev box.
+   Production deployments should either set
+   `vm.unprivileged_userfaultfd=1` or give the controller
+   `CAP_SYS_PTRACE`. `forkd doctor` (Phase 7.4) probes both.
+
+4. **Source guest must be quiet during the BRANCH.** We ran
+ python-numpy in its default warmed state with no in-guest
+ workload. A guest under heavy write pressure during a Live BRANCH
+ will see UFFD_WP capture more dirty pages, growing the bg-copy
+ wall time (but NOT pause_ms — the pause stays disk-independent).
+
+5. **`mode: "live"` requires the vendored Firecracker fork.**
+ `mem_backend.shared = true` is the one upstream gap; tracked as
+ [`FIRECRACKER-UPSTREAM-PROPOSAL.md`](../../FIRECRACKER-UPSTREAM-PROPOSAL.md).
+ Once it lands upstream, the vendor requirement goes away.
+
+## Comparison vs v0.3.4 Diff
+
+v0.3.4 closed the multi-BRANCH compounding anomaly via
+`posix_fallocate`, putting Diff at a steady ~150-300 ms on this same
+hardware (see [`bench/pause-window/RESULTS-v0.3.md`](../pause-window/RESULTS-v0.3.md)).
+This bench's Diff p50 of 202 ms lines up cleanly with that. The
+v0.4 Live win is **on top of** v0.3.4 Diff, not against the original
+v0.3.0 baseline.
+
+For comparison:
+
+| Version | Mode | p50 pause on this hardware |
+|---------|------|---------------------------:|
+| v0.2.x | Full | ~13 500 ms |
+| v0.3.0 | Diff | ~1 500-2 700 ms (anomaly) |
+| v0.3.4 | Diff | ~200 ms |
+| v0.4 | Live | **~56 ms** |
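The disk-bound estimates quoted for `mode: "full"` in the writeup above (about 13 s on this HDD, roughly 3 s on SSD, roughly 1.5 s on NVMe) come from dividing the source size by sequential write throughput. The sketch below reproduces that arithmetic. The 120 MB/s HDD figure and the source size are taken from the writeup; the SSD and NVMe throughputs are assumed round numbers chosen only to illustrate the scaling, not measurements.

```python
# Source RAM size from the Setup table: 1 536 MiB.
SOURCE_BYTES = 1_610_612_736

# Sequential-write throughput in bytes per second.
# "hdd" is the ~120 MB/s quoted in the writeup; the other two are
# ASSUMED round figures for illustration only.
THROUGHPUT_BPS = {
    "hdd": 120e6,
    "sata-ssd (assumed)": 500e6,
    "nvme (assumed)": 1e9,
}


def full_pause_estimate_s(num_bytes, bytes_per_s):
    """Lower bound on a Full-mode pause: time to stream memory.bin to disk."""
    return num_bytes / bytes_per_s


estimates = {
    medium: full_pause_estimate_s(SOURCE_BYTES, bps)
    for medium, bps in THROUGHPUT_BPS.items()
}
for medium, seconds in estimates.items():
    print(f"{medium:<20} ~{seconds:.1f} s")
```

Live's pause does not appear in this model at all, which is the sense in which it is disk-independent: only the background copy scales with these throughputs.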
diff --git a/bench/live-fork-pause-window/bench-live-fork.csv b/bench/live-fork-pause-window/bench-live-fork.csv
new file mode 100644
index 0000000..4c77906
--- /dev/null
+++ b/bench/live-fork-pause-window/bench-live-fork.csv
@@ -0,0 +1,41 @@
+mode,iteration,http_round_trip_ms,pause_ms,memory_bin_bytes,poll_until_ready_ms
+live-sync,0,13669.03,40,1610612736,
+live-async,0,58.27,48,1610612736,13284.92
+diff,0,13510.88,434,1610612736,
+full,0,13182.87,13163,1610612736,
+live-sync,1,13239.02,64,1610612736,
+live-async,1,125.31,90,1610612736,13723.48
+diff,1,14428.33,238,1610612736,
+full,1,13837.76,13828,1610612736,
+live-sync,2,13437.02,58,1610612736,
+live-async,2,85.39,57,1610612736,13668.73
+diff,2,13288.21,227,1610612736,
+full,2,13539.74,13524,1610612736,
+live-sync,3,14129.82,54,1610612736,
+live-async,3,77.93,59,1610612736,14048.62
+diff,3,13154.87,207,1610612736,
+full,3,14384.82,14314,1610612736,
+live-sync,4,13791.01,58,1610612736,
+live-async,4,59.17,51,1610612736,13428.47
+diff,4,13411.8,164,1610612736,
+full,4,13071.25,13068,1610612736,
+live-sync,5,14237.89,61,1610612736,
+live-async,5,48.43,41,1610612736,14047.83
+diff,5,13359.63,271,1610612736,
+full,5,13578.29,13576,1610612736,
+live-sync,6,14196.55,64,1610612736,
+live-async,6,95.57,68,1610612736,14225.81
+diff,6,13542.8,196,1610612736,
+full,6,13323.87,13298,1610612736,
+live-sync,7,16440.8,43,1610612736,
+live-async,7,57.03,38,1610612736,13986.13
+diff,7,14454.35,190,1610612736,
+full,7,13523.34,13510,1610612736,
+live-sync,8,13405.48,39,1610612736,
+live-async,8,266.74,258,1610612736,13481.96
+diff,8,13521.46,179,1610612736,
+full,8,13641.49,13638,1610612736,
+live-sync,9,13463.18,32,1610612736,
+live-async,9,57.64,50,1610612736,14398.6
+diff,9,13189.66,170,1610612736,
+full,9,13853.29,13850,1610612736,
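The p50/p90/max figures in the results tables can be re-derived from the `pause_ms` column of the rows above. The following sketch copies those values inline and applies the same calls the bench script uses (`statistics.median` and `statistics.quantiles(..., n=10)[-1]`); the `PAUSE_MS` and `pause_summary` names are introduced here for illustration only.

```python
import statistics

# pause_ms per mode, copied from bench-live-fork.csv (iterations 0-9).
PAUSE_MS = {
    "live-sync": [40, 64, 58, 54, 58, 61, 64, 43, 39, 32],
    "live-async": [48, 90, 57, 59, 51, 41, 68, 38, 258, 50],
    "diff": [434, 238, 227, 207, 164, 271, 196, 190, 179, 170],
    "full": [13163, 13828, 13524, 14314, 13068,
             13576, 13298, 13510, 13638, 13850],
}


def pause_summary(samples):
    """Return (p50, p90, max) computed the way the bench script does."""
    p50 = statistics.median(samples)
    # quantiles(n=10) returns 9 cut points; the last one is the p90.
    p90 = statistics.quantiles(samples, n=10)[-1]
    return p50, p90, max(samples)


summary = {mode: pause_summary(v) for mode, v in PAUSE_MS.items()}
for mode, (p50, p90, worst) in summary.items():
    print(f"{mode:<11} p50={p50:8.1f}  p90={p90:8.1f}  max={worst}")

ratio = summary["diff"][0] / summary["live-sync"][0]
print(f"diff p50 / live-sync p50 = {ratio:.1f}x")
```

With ten samples, the default (exclusive) quantile method interpolates the p90 between the two largest values, so a single outlier such as the 258 ms live-async iteration dominates that column.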
diff --git a/bench/live-fork-pause-window/bench-live-fork.py b/bench/live-fork-pause-window/bench-live-fork.py
new file mode 100644
index 0000000..30afb2e
--- /dev/null
+++ b/bench/live-fork-pause-window/bench-live-fork.py
@@ -0,0 +1,439 @@
+#!/usr/bin/env python3
+"""v0.4 live BRANCH pause-window bench.
+
+Drives N iterations of four BRANCH modes off the same live-fork source
+sandbox and emits per-iteration CSV plus a p50/p90/max summary. The
+point is to get an honest pause_ms number for `mode="live"` against a
+known-clean source: Phase 6's E2E used `coding-agent-fork-prewarm-v1`,
+which has 17 baked guest Oopses contaminating the measurement.
+
+Source selection
+----------------
+
+The script symlinks an existing snapshot directory under the script's
+work-dir as the source tag. Override `--source-tag` and `--snap-root`
+if your snapshots live elsewhere. `python-numpy` is the default
+because it's the canonical Hub recipe (`forkd pull
+deeplethe/python-numpy`), so anyone with a fresh forkd install can
+reproduce against the same bytes.
+
+Setup pattern matches `scripts/dev/e2e-live-branch.py` (Phase 6 E2E):
+
+  1. Stand up an isolated forkd-controller on a free port with a
+     `firecracker` wrapper that adds --no-seccomp (the vendored FC's
+     vmm seccomp filter blocks userfaultfd; following Phase 6's
+     pattern).
+  2. POST /v1/sandboxes with `live_fork: true` to spawn a memfd-backed
+     source sandbox.
+  3. Loop N times for each of {live wait=true, live wait=false, diff,
+     full}: POST .../branch, record `pause_ms`, delete the result
+     snapshot to keep disk usage bounded.
+  4. Emit CSV per iteration + p50/p90/max table to stdout.
+
+Run as root: the FC API socket and snapshot dir are root-owned, and
+the system FC swap needs sudo too.
+
+Output
+------
+
+- `bench-live-fork.csv`, one row per BRANCH iteration:
+    mode, iteration, http_round_trip_ms, pause_ms, memory_bin_bytes,
+    poll_until_ready_ms (live wait=false only)
+- Stdout summary table with p50/p90/max per mode.
+
+Usage:
+    sudo python3 bench-live-fork.py \\
+        --source-tag python-numpy \\
+        --iterations 10 \\
+        --modes live-sync,live-async,diff,full
+"""
+import argparse
+import json
+import os
+import shutil
+import socket
+import statistics
+import subprocess
+import sys
+import time
+import urllib.error
+import urllib.request
+
+# Paths the dev box uses; override via CLI when porting.
+DEFAULT_BIN = "/home/yangdongxu/forkd/target/release/forkd-controller"
+DEFAULT_FC = (
+ "/home/yangdongxu/firecracker-fork/build/cargo_target"
+ "/x86_64-unknown-linux-musl/release/firecracker"
+)
+DEFAULT_SNAP_ROOT = "/home/yangdongxu/.local/share/forkd/snapshots"
+SYSTEM_FC = "/usr/local/bin/firecracker"
+SYSTEM_FC_BACKUP = "/usr/local/bin/firecracker.bench-live-backup"
+
+WORK = "/tmp/forkd-bench-live"
+
+
+def http(base_url, method, path, body=None, timeout=120):
+    data = json.dumps(body).encode() if body is not None else None
+    headers = {"Content-Type": "application/json"} if body is not None else {}
+    req = urllib.request.Request(
+        f"{base_url}{path}", data=data, method=method, headers=headers
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=timeout) as resp:
+            raw = resp.read().decode("utf-8", errors="replace")
+            return resp.status, json.loads(raw) if raw else None
+    except urllib.error.HTTPError as e:
+        raw = e.read().decode("utf-8", errors="replace")
+        try:
+            return e.code, json.loads(raw)
+        except json.JSONDecodeError:
+            return e.code, raw
+
+
+def wait_for_healthy(base_url, port, deadline_s=20):
+    end = time.time() + deadline_s
+    while time.time() < end:
+        try:
+            s = socket.create_connection(("127.0.0.1", port), timeout=1)
+            s.close()
+            status, _ = http(base_url, "GET", "/healthz", timeout=2)
+            if status == 200:
+                return
+        except (ConnectionRefusedError, socket.timeout, OSError):
+            pass
+        time.sleep(0.3)
+    raise RuntimeError(f"daemon not healthy after {deadline_s}s")
+
+
+def setup_workdir(source_tag, source_dir, patched_fc):
+    shutil.rmtree(WORK, ignore_errors=True)
+    os.makedirs(f"{WORK}/snapshots", exist_ok=True)
+    os.makedirs(f"{WORK}/audit", exist_ok=True)
+
+    # FC wrapper. Same pattern as the Phase 6 E2E: vendored FC binary
+    # + --no-seccomp because the upstream vmm-thread filter still
+    # blocks userfaultfd(2).
+    wrapper = f"{WORK}/firecracker.wrapper"
+    with open(wrapper, "w") as f:
+        f.write(
+            "#!/bin/bash\n"
+            f"exec {patched_fc} --no-seccomp \"$@\"\n"
+        )
+    os.chmod(wrapper, 0o755)
+    # Back up the system FC only once, then always install the wrapper.
+    if not os.path.exists(SYSTEM_FC_BACKUP):
+        subprocess.run(["sudo", "mv", SYSTEM_FC, SYSTEM_FC_BACKUP], check=True)
+    subprocess.run(["sudo", "cp", wrapper, SYSTEM_FC], check=True)
+    subprocess.run(["sudo", "chmod", "755", SYSTEM_FC], check=True)
+
+    # Symlink the source snapshot dir into our snap-root. Avoids
+    # copying the large memory.bin (1.5 GiB for python-numpy).
+    target = f"{WORK}/snapshots/{source_tag}"
+    if os.path.lexists(target):
+        os.unlink(target)
+    os.symlink(source_dir, target)
+
+    state = {
+        "snapshots": {
+            source_tag: {
+                "tag": source_tag,
+                "dir": target,
+                "created_at_unix": int(time.time()),
+                "status": "ready",
+            }
+        }
+    }
+    with open(f"{WORK}/state.json", "w") as f:
+        json.dump(state, f, indent=2)
+
+
+def restore_firecracker():
+    if os.path.exists(SYSTEM_FC_BACKUP):
+        subprocess.run(
+            ["sudo", "mv", "-f", SYSTEM_FC_BACKUP, SYSTEM_FC], check=False
+        )
+
+
+def start_daemon(bin_path, bind):
+    log = open(f"{WORK}/controller.log", "wb")
+    return subprocess.Popen(
+        [
+            "sudo",
+            bin_path,
+            "serve",
+            "--bind",
+            bind,
+            "--state",
+            f"{WORK}/state.json",
+            "--snapshot-root",
+            f"{WORK}/snapshots",
+            "--audit-log",
+            f"{WORK}/audit/audit.log",
+        ],
+        stdout=log,
+        stderr=log,
+    )
+
+
+def kill_leftovers(bind):
+    subprocess.run(
+        ["sudo", "pkill", "-f", f"forkd-controller serve --bind {bind}"],
+        stderr=subprocess.DEVNULL,
+    )
+    subprocess.run(
+        ["sudo", "pkill", "-f", f"{WORK}/"], stderr=subprocess.DEVNULL
+    )
+    time.sleep(0.5)
+
+
+def branch_once(base_url, sandbox_id, mode, wait, iteration):
+    """Run a single BRANCH; return a per-iteration row dict.
+
+    `wait` is currently unused: the mode string ("live-sync" vs
+    "live-async") decides the `wait` flag sent to the controller.
+    """
+    tag = f"bench-{mode}-{iteration:03d}-{int(time.time() * 1000)}"
+    body = {"tag": tag}
+    if mode == "live-sync":
+        body["mode"] = "live"
+        body["wait"] = True
+    elif mode == "live-async":
+        body["mode"] = "live"
+        body["wait"] = False
+    elif mode == "diff":
+        body["mode"] = "diff"
+    elif mode == "full":
+        body["mode"] = "full"
+    else:
+        raise ValueError(f"unknown mode {mode}")
+
+    # perf_counter is monotonic, so the round-trip is immune to
+    # wall-clock adjustments during the request.
+    t0 = time.perf_counter()
+    status, resp = http(
+        base_url, "POST", f"/v1/sandboxes/{sandbox_id}/branch", body
+    )
+    rt_ms = (time.perf_counter() - t0) * 1000
+    if status not in (201, 202):
+        raise RuntimeError(f"BRANCH {mode} #{iteration} HTTP {status}: {resp!r}")
+
+    pause_ms = resp.get("pause_ms")
+    mem_bytes = None
+
+    ready_ms = None
+    if mode == "live-async":
+        # Poll until the snapshot flips to status=ready.
+        assert status == 202 and resp.get("status") == "writing"
+        poll_start = time.monotonic()
+        deadline = poll_start + 60
+        while time.monotonic() < deadline:
+            ls_status, ls = http(base_url, "GET", "/v1/snapshots")
+            assert ls_status == 200, f"list_snapshots HTTP {ls_status}"
+            entry = next((e for e in ls if e["tag"] == tag), None)
+            if entry is None:
+                raise RuntimeError(f"{tag} vanished")
+            if entry["status"] == "ready":
+                ready_ms = (time.monotonic() - poll_start) * 1000
+                break
+            if entry["status"] == "failed":
+                raise RuntimeError(f"{tag} failed: {entry.get('warning')}")
+            time.sleep(0.05)
+        if ready_ms is None:
+            raise RuntimeError(f"{tag} did not reach ready in 60s")
+
+    mem_path = f"{WORK}/snapshots/{tag}/memory.bin"
+    if os.path.exists(mem_path):
+        mem_bytes = os.path.getsize(mem_path)
+
+    # Delete the snapshot to keep disk usage bounded. The source
+    # sandbox isn't affected; only this branch's tag goes away.
+    del_status, _ = http(base_url, "DELETE", f"/v1/snapshots/{tag}")
+    if del_status not in (200, 204):
+        # Non-fatal; bench can keep going, log it.
+        print(f" warn: DELETE {tag} -> HTTP {del_status}", file=sys.stderr)
+
+    return {
+        "mode": mode,
+        "iteration": iteration,
+        "http_round_trip_ms": round(rt_ms, 2),
+        "pause_ms": pause_ms,
+        "memory_bin_bytes": mem_bytes,
+        "poll_until_ready_ms": round(ready_ms, 2) if ready_ms is not None else None,
+    }
+
+
+def summarize(rows, csv_path):
+    # Write CSV
+    cols = [
+        "mode",
+        "iteration",
+        "http_round_trip_ms",
+        "pause_ms",
+        "memory_bin_bytes",
+        "poll_until_ready_ms",
+    ]
+    with open(csv_path, "w") as f:
+        f.write(",".join(cols) + "\n")
+        for r in rows:
+            f.write(
+                ",".join("" if r[c] is None else str(r[c]) for c in cols) + "\n"
+            )
+
+    # Per-mode p50 / p90 / max for pause_ms and round-trip.
+    by_mode = {}
+    for r in rows:
+        by_mode.setdefault(r["mode"], []).append(r)
+
+    print("\n=== SUMMARY ===")
+    print(
+        f" {'mode':<14} {'N':>3} "
+        f"{'pause_ms (p50)':>15} {'p90':>6} {'max':>6} "
+        f"{'RT_ms (p50)':>12} {'p90':>6} {'max':>6}"
+    )
+    for mode in ("live-sync", "live-async", "diff", "full"):
+        if mode not in by_mode:
+            continue
+        rs = by_mode[mode]
+        pauses = [r["pause_ms"] for r in rs if r["pause_ms"] is not None]
+        rts = [r["http_round_trip_ms"] for r in rs]
+        if pauses:
+            p_p50 = statistics.median(pauses)
+            p_p90 = statistics.quantiles(pauses, n=10)[-1] if len(pauses) >= 2 else pauses[0]
+            p_max = max(pauses)
+        else:
+            p_p50 = p_p90 = p_max = float("nan")
+        rt_p50 = statistics.median(rts)
+        rt_p90 = statistics.quantiles(rts, n=10)[-1] if len(rts) >= 2 else rts[0]
+        rt_max = max(rts)
+        print(
+            f" {mode:<14} {len(rs):>3} "
+            f"{p_p50:>15.1f} {p_p90:>6.1f} {p_max:>6.1f} "
+            f"{rt_p50:>12.1f} {rt_p90:>6.1f} {rt_max:>6.1f}"
+        )
+
+    # Headline ratio: live-sync p50 vs diff p50.
+    if "live-sync" in by_mode and "diff" in by_mode:
+        live_pauses = [
+            r["pause_ms"] for r in by_mode["live-sync"] if r["pause_ms"] is not None
+        ]
+        diff_pauses = [
+            r["pause_ms"] for r in by_mode["diff"] if r["pause_ms"] is not None
+        ]
+        if live_pauses and diff_pauses:
+            live_p50 = statistics.median(live_pauses)
+            diff_p50 = statistics.median(diff_pauses)
+            ratio = diff_p50 / live_p50 if live_p50 > 0 else float("inf")
+            print(
+                f"\n diff_p50 / live_p50 = {diff_p50:.0f}/{live_p50:.1f} "
+                f"= {ratio:.1f}×"
+            )
+    print(f"\n CSV: {csv_path}")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument("--source-tag", default="python-numpy")
+    parser.add_argument("--snap-root", default=DEFAULT_SNAP_ROOT)
+    parser.add_argument("--controller-bin", default=DEFAULT_BIN)
+    parser.add_argument("--patched-fc", default=DEFAULT_FC)
+    parser.add_argument(
+        "--port", type=int, default=8891, help="port for the isolated controller"
+    )
+    parser.add_argument(
+        "--iterations", type=int, default=10, help="branches per mode"
+    )
+    parser.add_argument(
+        "--modes",
+        default="live-sync,live-async,diff,full",
+        help="comma-separated subset of {live-sync,live-async,diff,full}",
+    )
+    parser.add_argument(
+        "--out-csv",
+        default="/tmp/forkd-bench-live/bench-live-fork.csv",
+    )
+    args = parser.parse_args()
+
+    bind = f"127.0.0.1:{args.port}"
+    base_url = f"http://{bind}"
+
+    source_dir = os.path.join(args.snap_root, args.source_tag)
+    if not os.path.isdir(source_dir):
+        sys.exit(f"source snapshot not found: {source_dir}")
+
+    # Probe source size; useful for the writeup.
+    src_mem = os.path.join(source_dir, "memory.bin")
+    src_bytes = os.path.getsize(src_mem) if os.path.exists(src_mem) else None
+
+    modes = args.modes.split(",")
+    for m in modes:
+        if m not in {"live-sync", "live-async", "diff", "full"}:
+            sys.exit(f"unknown mode {m}")
+
+    print(f"[*] source: {source_dir}")
+    if src_bytes:
+        print(f" memory.bin: {src_bytes} bytes ({src_bytes // (1024 * 1024)} MiB)")
+    print(f"[*] modes: {modes}, iterations per mode: {args.iterations}")
+    print(f"[*] controller on {bind}")
+
+    print("[*] kill leftovers")
+    kill_leftovers(bind)
+
+    print(f"[*] setup work dir {WORK}")
+    setup_workdir(args.source_tag, source_dir, args.patched_fc)
+
+    print("[*] start daemon")
+    daemon = start_daemon(args.controller_bin, bind)
+    rows = []
+    try:
+        wait_for_healthy(base_url, args.port)
+        print("[+] daemon healthy")
+
+        # Spawn one live-fork source sandbox; all BRANCHes hit it.
+        print(f"\n[*] POST /v1/sandboxes live_fork=true tag={args.source_tag}")
+        status, body = http(
+            base_url,
+            "POST",
+            "/v1/sandboxes",
+            {"snapshot_tag": args.source_tag, "n": 1, "live_fork": True},
+        )
+        if status != 201:
+            raise RuntimeError(f"spawn HTTP {status}: {body!r}")
+        sandbox_id = body[0]["id"]
+        print(f"[+] sandbox {sandbox_id}")
+
+        # Give the guest a moment to settle (some recipes do post-boot
+        # work). Keep it small so the bench's "agent state" isn't
+        # dominated by warmup work.
+        time.sleep(1.5)
+
+        # Interleave modes so any one-shot effects (cold cache,
+        # warm-up, file-system state) average out instead of stacking
+        # on the last mode.
+        for i in range(args.iterations):
+            for m in modes:
+                print(f" [{m} #{i}] ...", end=" ", flush=True)
+                row = branch_once(base_url, sandbox_id, m, None, i)
+                rows.append(row)
+                extra = ""
+                if row["poll_until_ready_ms"] is not None:
+                    extra = f" ready+{row['poll_until_ready_ms']:.0f}ms"
+                print(
+                    f"pause={row['pause_ms']}ms "
+                    f"rt={row['http_round_trip_ms']:.0f}ms{extra}"
+                )
+
+        summarize(rows, args.out_csv)
+
+    finally:
+        print("\n[*] tearing down")
+        subprocess.run(["sudo", "kill", str(daemon.pid)], stderr=subprocess.DEVNULL)
+        subprocess.run(
+            ["sudo", "pkill", "-9", "-f", "/usr/local/bin/firecracker"],
+            stderr=subprocess.DEVNULL,
+        )
+        time.sleep(0.5)
+        restore_firecracker()
+
+
+if __name__ == "__main__":
+    try:
+        main()
+    except Exception as e:
+        print(f"\n[!] FAIL: {e}", file=sys.stderr)
+        sys.exit(1)
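The `live-async` branch of the script above is also the pattern a caller would follow after a `wait: false` BRANCH: return immediately, then poll the snapshot list until the tag's `status` flips from `"writing"` to `"ready"`. The sketch below isolates that loop. It is not part of the bench; `wait_until_ready` and the injected `list_snapshots` callable are hypothetical names, and the entry shape (`tag`, `status`) is assumed to match what the script reads from `GET /v1/snapshots`.

```python
import time


def wait_until_ready(list_snapshots, tag, timeout_s=60.0, interval_s=0.05):
    """Poll until snapshot `tag` reports status "ready"; return elapsed seconds.

    `list_snapshots` is a hypothetical zero-argument callable returning a
    list of dicts with at least "tag" and "status" keys.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        entry = next((e for e in list_snapshots() if e["tag"] == tag), None)
        if entry is None:
            raise LookupError(f"snapshot {tag!r} is not listed")
        if entry["status"] == "ready":
            return time.monotonic() - start
        if entry["status"] == "failed":
            raise RuntimeError(f"snapshot {tag!r} failed")
        time.sleep(interval_s)
    raise TimeoutError(f"snapshot {tag!r} not ready after {timeout_s}s")


# Stand-in lister that reports "writing" twice, then "ready".
_statuses = iter(["writing", "writing", "ready"])


def fake_list_snapshots():
    return [{"tag": "demo", "status": next(_statuses)}]


elapsed = wait_until_ready(fake_list_snapshots, "demo", interval_s=0.0)
print(f"ready after {elapsed:.3f} s")
```

Injecting the lister keeps the loop testable without a running controller; in real use it would wrap an HTTP call such as the script's `http(base_url, "GET", "/v1/snapshots")`.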