feat(cli): Phase 7.2 — --live and --no-wait flags on forkd snapshot (#205)

Merged: WaylandYang merged 1 commit into main from feat/v0.4-phase7.2-cli-live-flag on May 30, 2026
@WaylandYang

Summary

Wires the canonical Phase 7.1 mode field through to the CLI so users can opt into v0.4 Live BRANCH from the command line.

New flags on forkd snapshot --from-sandbox <id>:

  • --live — v0.4 UFFD_WP-based BRANCH mode. Source must have been booted with --live-fork. Mutually exclusive with --diff (clap-enforced via conflicts_with).
  • --no-wait — return as soon as the source resumes (~10 ms) instead of blocking on the background memory copy. Requires --live (clap-enforced via requires); doesn't apply to v0.3 Diff because Diff is synchronous.

Body construction in branch_snapshot_via_daemon:

  • --diff still sends the legacy diff: true boolean so this CLI can drive both v0.3.x and v0.4+ daemons.
  • --live sends the canonical Phase 7.1 mode: "live" — v0.3 daemons don't support live BRANCH anyway, so there's no compat path to preserve.
  • --no-wait adds wait: false.

Standalone (no --from-sandbox) snapshot path early-errors on either flag, mirroring the existing --diff guard.
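
The flag rules and body construction described above can be sketched in a few lines. This is an illustrative Python model only: the real logic lives in the Rust function branch_snapshot_via_daemon and in clap attributes, and the helper name `build_branch_body`, its parameters, and the error messages are hypothetical. It shows only the fields mentioned in this PR; the real request body may carry others.

```python
def build_branch_body(*, from_sandbox, diff=False, live=False, no_wait=False):
    """Model the flag checks and return the BRANCH request body as a dict.

    Hypothetical helper: names and messages are illustrative, not the CLI's.
    """
    # Standalone snapshots (no --from-sandbox) reject all three flags early.
    if from_sandbox is None and (diff or live or no_wait):
        raise ValueError("--diff, --live and --no-wait need --from-sandbox")
    # Mirrors clap's conflicts_with between --live and --diff.
    if live and diff:
        raise ValueError("--live cannot be combined with --diff")
    # Mirrors clap's requires: --no-wait only makes sense for live BRANCH.
    if no_wait and not live:
        raise ValueError("--no-wait requires --live")

    body = {}
    if diff:
        body["diff"] = True    # legacy boolean, accepted by v0.3.x and v0.4+
    if live:
        body["mode"] = "live"  # canonical Phase 7.1 field
    if no_wait:
        body["wait"] = False   # return once the source resumes
    return body


live_async = build_branch_body(from_sandbox="sb-1", live=True, no_wait=True)
legacy_diff = build_branch_body(from_sandbox="sb-1", diff=True)
print(live_async, legacy_diff)
```

Keeping `diff: true` on the wire for --diff is what lets one CLI binary talk to both daemon generations; only the new mode uses the new field.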

Drive-by fix

memfd::tests::create_and_populate_succeeds_for_small_file was silently failing on main. PR #202 fixed backend_path() to embed the explicit controller PID (resolving the cross-process /proc/self/fd/N ENOENT) but didn't update the test's expected prefix. Test now derives the prefix from std::process::id().
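
To illustrate why the test's expected prefix had to change, here is a small Python sketch. It assumes the path has the shape /proc/<pid>/fd/<fd>, which is inferred from the description above rather than copied from the Rust code, and `backend_path` here is a stand-in, not the real function.

```python
import os


def backend_path(controller_pid: int, fd: int) -> str:
    """Stand-in for the Rust backend_path(): embed an explicit PID.

    /proc/self/fd/N is resolved relative to whichever process opens it,
    so another process following that path finds its own descriptor
    table (or ENOENT) instead of the controller's memfd.
    """
    return f"/proc/{controller_pid}/fd/{fd}"


# The test derives the expected prefix from the current process id,
# the Python analogue of std::process::id().
expected_prefix = f"/proc/{os.getpid()}/fd/"
path = backend_path(os.getpid(), 7)  # fd 7 is an arbitrary example value
print(path.startswith(expected_prefix))
```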

Test plan

  • cargo fmt --check --all — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo test --workspace — 96 / 96 pass
  • RUSTDOCFLAGS=-D warnings cargo doc --no-deps --workspace — clean
  • Smoke test of forkd snapshot --from-sandbox <id> --live against a --live-fork source (the REST surface is covered by scripts/dev/e2e-live-branch.py; the CLI smoke test is deferred to the Phase 7.5 bench)

🤖 Generated with Claude Code

…hot`

Wire the canonical Phase 7.1 `mode` field through to the CLI so users
can opt into v0.4 Live BRANCH (sub-50 ms source pause) and the
fire-and-forget `wait=false` path from the command line.

CLI surface (only valid with `--from-sandbox`):

- `--live`              v0.4 UFFD_WP-based BRANCH mode. Source must have
                        been booted with `--live-fork`. Mutually
                        exclusive with `--diff` (clap `conflicts_with`).
- `--no-wait`           Return after source resumes (~10 ms) instead of
                        blocking on the background memory copy. Requires
                        `--live` (clap `requires`); doesn't apply to
                        v0.3 Diff because Diff is synchronous.

Body construction in `branch_snapshot_via_daemon`:

- `--diff` still sends the legacy `diff: true` boolean so this CLI can
  drive both v0.3.x and v0.4+ daemons.
- `--live` sends the canonical `mode: "live"` — v0.3 daemons can't do
  live BRANCH anyway, so there's no compat path worth preserving.
- `--no-wait` adds `wait: false`.

Standalone snapshot (no `--from-sandbox`) early-errors on either flag,
mirroring the existing `--diff` guard.

The two internal `snapshot_cmd` callers (`from-image`, `run`) pass
`false, false` for the new bools; both are local-boot paths where the
daemon BRANCH layer doesn't apply.

Drive-by: fix the `memfd::tests::create_and_populate_succeeds_for_small_file`
test that was silently failing on main. PR #202 changed `backend_path()`
to embed the explicit controller PID (the FC-side fix for the
cross-process `/proc/self/fd/N` resolution bug) but didn't update the
test's `/proc/self/fd/` prefix expectation. Test now uses
`std::process::id()` to compute the expected prefix.

Gates (dev box, Linux):

- cargo fmt --check --all                       clean
- cargo clippy --workspace --all-targets        clean
- cargo test --workspace                        96 / 96 pass
- RUSTDOCFLAGS=-D warnings cargo doc --no-deps  clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WaylandYang merged commit 431989c into main on May 30, 2026
2 checks passed
WaylandYang deleted the feat/v0.4-phase7.2-cli-live-flag branch on May 30, 2026 11:03
WaylandYang added a commit that referenced this pull request May 31, 2026
…rce (#210)

Replaces the "pause_ms TBD" disclaimer in v0.4 docs with measured
numbers from a clean Hub-pulled `python-numpy` source (1.5 GiB,
sha256-verified). The previous attempt at this measurement used
`coding-agent-fork-prewarm-v1`, which had 17 baked-in guest Oopses
contaminating the timing — fixed by switching source.

Methodology (`bench/live-fork-pause-window/bench-live-fork.py`,
based on `scripts/dev/e2e-live-branch.py` Phase 6 E2E harness):

- One memfd-backed source sandbox spawned with `live_fork: true`
- 10 iterations × 4 modes ({live-sync, live-async, diff, full}),
  interleaved so cold-cache effects average across modes
- Each iteration: POST .../branch, record `pause_ms` and HTTP RT,
  DELETE the result snapshot to bound disk usage
- Async iterations also record `poll_until_ready_ms`
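
The interleaving described above can be sketched as follows. This is not the contents of bench-live-fork.py: the function names, the injected `branch` and `delete` callables, and the result keys are assumptions made for illustration, and the stubs stand in for the real HTTP calls.

```python
MODES = ("live-sync", "live-async", "diff", "full")


def interleaved_schedule(iterations=10, modes=MODES):
    """Visit every mode once per iteration, so a cold-cache penalty in any
    one iteration is shared by all modes instead of landing on one."""
    return [(i, mode) for i in range(iterations) for mode in modes]


def run_bench(branch, delete, iterations=10, modes=MODES):
    """Run the schedule with injected callables (hypothetical signatures).

    branch(mode) is assumed to return a dict with snapshot_id, pause_ms
    and rt_ms; delete(snapshot_id) removes the result snapshot.
    """
    rows = []
    for i, mode in interleaved_schedule(iterations, modes):
        result = branch(mode)
        rows.append({"iter": i, "mode": mode,
                     "pause_ms": result["pause_ms"],
                     "rt_ms": result["rt_ms"]})
        delete(result["snapshot_id"])  # bound disk usage between runs
    return rows


# Stub callables so the sketch runs without a daemon; values are dummies.
deleted = []
rows = run_bench(
    branch=lambda mode: {"snapshot_id": f"snap-{mode}",
                         "pause_ms": 0.0, "rt_ms": 0.0},
    delete=deleted.append,
)
print(len(rows), "rows")
```

With 10 iterations and 4 modes the schedule has 40 entries, matching the 40-row CSV listed under Files.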

Results (Intel i7-12700, 30 GiB RAM, Linux 6.14, ext4 on **HDD**):

| mode       | pause p50 | pause p90 | RT p50    |
|------------|----------:|----------:|----------:|
| live-sync  | **56 ms** |     64 ms | 13 730 ms |
| live-async |     54 ms |    241 ms | **69 ms** |
| diff       |    202 ms |    418 ms | 13 461 ms |
| full       | 13 550 ms | 14 268 ms | 13 559 ms |

Key ratios at p50:

- live vs diff: **3.6× faster pause** (202 / 56)
- live vs full: **242× faster pause** (13550 / 56)
- async RT vs sync RT: **198× faster return** (13730 / 69)
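
The ratios above can be re-derived from the rounded p50 values in the table; the snippet below is a quick arithmetic check, not part of the bench harness, and its variable names are illustrative. From these rounded inputs the third quotient comes out just under 199, so the quoted 198× matches the truncated value (or may have been computed from unrounded data).

```python
import math

# p50 values in milliseconds, copied from the results table.
p50 = {
    "live_pause": 56,
    "diff_pause": 202,
    "full_pause": 13550,
    "sync_rt": 13730,
    "async_rt": 69,
}

ratios = {
    "live_vs_diff_pause": p50["diff_pause"] / p50["live_pause"],
    "live_vs_full_pause": p50["full_pause"] / p50["live_pause"],
    "async_vs_sync_rt": p50["sync_rt"] / p50["async_rt"],
}

for name, value in ratios.items():
    # Show both the unrounded quotient and its truncated integer part.
    print(f"{name}: {value:.2f}x (truncated: {math.floor(value)}x)")
```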

The "on HDD" point is a feature, not a bug for the writeup:
Live's pause is disk-independent (memory copy runs after resume,
not during), so the Live / Diff gap *widens* on slow storage rather
than shrinking. NVMe would speed up Diff but not Live, making the
ratio narrower — but Live is always bounded by CPU work (vmstate
dump + UFFD_WP arming), never by disk throughput.

Files:

- `bench/live-fork-pause-window/bench-live-fork.py` — runnable
  harness, parameterized on source-tag and iterations
- `bench/live-fork-pause-window/bench-live-fork.csv` — 40-row raw
  data (one per BRANCH iteration)
- `bench/live-fork-pause-window/RESULTS-v0.4.md` — writeup with
  methodology, host config, per-mode interpretation of what
  pause_ms / RT measure, and honest caveats (single host, one
  source size, p90 outlier on async iter #8)

Docs updated:

- `README.md` headline: "BRANCH a live VM in 150 ms" → "in 56 ms
  (v0.4 live mode)". v0.4 preview block now leads with the
  measured 3.6× / 200× ratios and links to RESULTS-v0.4.md.
- `README-zh.md`: same headline + intro update.
- `CHANGELOG.md`: Unreleased's v0.4 section's "Bench in progress"
  disclaimer replaced with the actual numbers table.

Phase 7 (user surface for v0.4 live BRANCH) is complete with this
PR: REST (#204), CLI (#205), SDKs (#206), doctor (#207), docs
(#208), bench (this).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>