feat(doctor,uffd): Phase 7.4 — uffd_wp + memfd_create capability checks#207
Merged
`forkd doctor` now probes the two kernel features the v0.4 live-fork
path needs. Saves users from hitting them as opaque errors mid-BRANCH.
New checks (15 + 16 in the doctor list):
- **uffd_wp (v0.4 live BRANCH)** — opens a `userfaultfd(2)`, negotiates
`UFFDIO_API` with `UFFD_FEATURE_PAGEFAULT_FLAG_WP`, drops the fd.
Maps EPERM to a `sysctl vm.unprivileged_userfaultfd=1` hint; maps
ENOSYS to a "kernel < 5.7 — live BRANCH unavailable but Diff still
works" hint.
- **memfd_create (v0.4 live BRANCH)** — opens an anonymous memfd and
drops it. ENOSYS hints at a restrictive seccomp profile (containers).
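The errno-to-hint mapping described in the two bullets above could look roughly like the sketch below. It is an illustration only: `Probe`, `hint_for`, and the exact hint strings are hypothetical names and wording, not the actual `forkd` code, and the errno constants are the Linux x86_64/aarch64 values.

```rust
use std::io;

// Errno values as defined on Linux x86_64 and aarch64 (assumed target).
const EPERM: i32 = 1;
const ENOSYS: i32 = 38;

/// Which capability probe produced the error (hypothetical type).
#[derive(Clone, Copy, Debug)]
enum Probe {
    UffdWp,
    MemfdCreate,
}

/// Turn the OS error returned by a probe into a user-facing hint.
fn hint_for(probe: Probe, err: &io::Error) -> String {
    match (probe, err.raw_os_error()) {
        // userfaultfd(2) refused for an unprivileged caller.
        (Probe::UffdWp, Some(EPERM)) => {
            "userfaultfd not permitted; try `sysctl vm.unprivileged_userfaultfd=1`".to_string()
        }
        // Syscall missing: the source maps this to the "kernel < 5.7" hint.
        (Probe::UffdWp, Some(ENOSYS)) => {
            "kernel < 5.7: live BRANCH unavailable but Diff still works".to_string()
        }
        // memfd_create(2) blocked, typically by a container seccomp profile.
        (Probe::MemfdCreate, Some(ENOSYS)) => {
            "memfd_create unavailable; check for a restrictive seccomp profile".to_string()
        }
        // Anything else: fall back to the raw OS error text.
        _ => format!("{:?} probe failed: {}", probe, err),
    }
}
```

Keeping the mapping in one pure function makes it easy to unit-test without needing a host that actually lacks the feature.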
Both checks WARN (not FAIL) on failure: v0.3 Diff BRANCH and Full
BRANCH still work fine without uffd_wp/memfd, so a red FAIL in doctor
would overstate the actual user impact. The WARN message tells the
user exactly which path they lose.
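A minimal sketch of the WARN-not-FAIL policy follows. The `Status` enum, `optional_check`, and `exit_code` are hypothetical names, and the assumption that only a FAIL produces a non-zero exit code is mine rather than something this PR states.

```rust
/// Outcome of a single doctor check (hypothetical type).
#[derive(Debug, PartialEq)]
enum Status {
    Pass,
    Warn(String),
    Fail(String),
}

/// An optional-feature probe never escalates past WARN, and the message
/// names the path the user loses.
fn optional_check(result: Result<(), String>, lost_path: &str) -> Status {
    match result {
        Ok(()) => Status::Pass,
        Err(reason) => Status::Warn(format!(
            "{reason}; {lost_path} unavailable, Diff and Full BRANCH still work"
        )),
    }
}

/// Assumed policy: doctor exits non-zero only if some check is a hard FAIL.
fn exit_code(checks: &[Status]) -> i32 {
    if checks.iter().any(|s| matches!(s, Status::Fail(_))) {
        1
    } else {
        0
    }
}
```

Under this policy a host without `uffd_wp` still gets a zero exit code, which matches the claim that v0.3 paths keep working.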
Shared probe module — `forkd_uffd::probe`:
- `probe_uffd_wp()` / `probe_memfd_create()` are pub helpers that
perform the minimum syscall needed and drop the fd, with
human-readable error contexts. Kept in `forkd-uffd` because that's
where the rest of the UFFD_WP machinery lives, but trimmed to its
own module so doctor doesn't pull in the snapshot-side machinery.
- 2 unit tests: PASS on a supported host, error-message contains
actionable keywords otherwise.
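The two unit tests accept either outcome, so they pass on any host. One way to express that acceptance rule is sketched below; the keyword list and both helper names are assumptions for illustration, not the real test code.

```rust
/// Keywords that make a probe error actionable (assumed list, not from forkd).
const ACTIONABLE: &[&str] = &["sysctl", "kernel", "seccomp"];

/// True if the message points the user at something they can change.
fn is_actionable(msg: &str) -> bool {
    let lower = msg.to_lowercase();
    ACTIONABLE.iter().any(|k| lower.contains(k))
}

/// Accept Ok on a supported host, or an error whose message is actionable.
fn check_probe_outcome(result: Result<(), String>) -> bool {
    match result {
        Ok(()) => true,
        Err(msg) => is_actionable(&msg),
    }
}
```

A bare "Operation not permitted" would be rejected by this rule, which is the point: the probe's error context has to carry the hint.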
Manual verification on dev box (Linux 6.14, `unprivileged_userfaultfd=0`):
- Unprivileged: `uffd_wp` WARN ("Operation not permitted") with the
exact `sysctl` hint. `memfd_create` PASS.
- Root (CAP_SYS_PTRACE granted): both PASS.
Gates: fmt ✓ · clippy -D warnings ✓ · `cargo test --workspace` 98/98
(was 96; +2 new probe tests) · `RUSTDOCFLAGS=-D warnings cargo doc` ✓.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WaylandYang added a commit that referenced this pull request on May 31, 2026
…urface (#208)

Phases 6 and 7 shipped the full v0.4 live BRANCH path (sub-50 ms source pause via UFFD_WP + memfd) across REST, CLI, and SDKs, but the README still pitched it as "experimental, try with `forkd wp-bench`" and the status section explicitly claimed "we chose not to fork Firecracker." That contradicts reality on main today.

Updates:

- **README.md / README-zh.md**: v0.4 preview block rewritten as "v0.4 live BRANCH" with the actual user-facing surface (REST `mode: "live"`, CLI `--live` / `--no-wait`, SDK `mode=`). Doctor check count 14 → 16 (uffd_wp + memfd_create). Python and TypeScript SDK examples show `live_fork=True` / `liveFork: true` + `mode="live"` + `wait=False`. Status section: "we chose not to fork Firecracker" paragraph replaced with the honest version: we did fork, here's the branch, here's the upstream proposal, vendor requirement goes away if upstream takes it.
- **docs/API.md**: `POST /v1/sandboxes/:id/branch` documents `mode`, `wait`, the `mode`/`diff` mutex (HTTP 400), and per-mode pause semantics. `POST /v1/sandboxes` documents `live_fork`. `SnapshotInfo` gains the `status` field for the `wait=false` lifecycle.
- **DESIGN-v0.4.md**: status banner flipped from DRAFT to IMPLEMENTED with links to PRs #194–#207; DRAFT body preserved verbatim as the architecture record (the implementation tracks it closely).
- **CHANGELOG.md**: Unreleased gets a "v0.4 live-fork: user-facing surface complete" section. Calls out the prereqs (Linux ≥ 5.7, `unprivileged_userfaultfd=1`, vendored FC fork) and the one known CLI gap (`forkd fork --live-fork` for spawn-time opt-in isn't surfaced; use SDK / REST for now; tracking as a follow-up).
- ROADMAP.md left as-is: it's milestone-shaped (M1/M2/M3) and v0.4 live-fork wasn't on the original critical path; the CHANGELOG + Status section already cover the shipped state.

No code change; pure docs.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
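The `mode`/`diff` mutex documented for `POST /v1/sandboxes/:id/branch` can be pictured as a small validation step. This is a sketch under assumptions: the `BranchRequest` struct, its field types, and the error text are hypothetical, and only the field names `mode`, `diff`, `wait` and the HTTP 400 status come from the commit message.

```rust
/// Hypothetical shape of the branch request body; field types are assumed.
#[derive(Debug, Default)]
struct BranchRequest {
    mode: Option<String>, // e.g. "live"
    diff: Option<bool>,
    wait: Option<bool>,
}

/// Reject requests that set both `mode` and `diff`, returning (status, message).
fn validate(req: &BranchRequest) -> Result<(), (u16, String)> {
    if req.mode.is_some() && req.diff.is_some() {
        return Err((400, "`mode` and `diff` are mutually exclusive".to_string()));
    }
    Ok(())
}

/// `wait=false` asks the server to return before the snapshot is ready.
fn is_async(req: &BranchRequest) -> bool {
    req.wait == Some(false)
}
```

With `wait=false`, a caller would then poll the snapshot's `status` field until it reports ready.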
WaylandYang added a commit that referenced this pull request on May 31, 2026
…rce (#210)

Replaces the "pause_ms TBD" disclaimer in v0.4 docs with measured numbers from a clean Hub-pulled `python-numpy` source (1.5 GiB, sha256-verified). The previous attempt at this measurement used `coding-agent-fork-prewarm-v1`, which had 17 baked-in guest Oopses contaminating the timing; fixed by switching source.

Methodology (`bench/live-fork-pause-window/bench-live-fork.py`, based on `scripts/dev/e2e-live-branch.py` Phase 6 E2E harness):

- One memfd-backed source sandbox spawned with `live_fork: true`
- 10 iterations × 4 modes ({live-sync, live-async, diff, full}), interleaved so cold-cache effects average across modes
- Each iteration: POST .../branch, record `pause_ms` and HTTP RT, DELETE the result snapshot to bound disk usage
- Async iterations also record `poll_until_ready_ms`

Results (Intel i7-12700, 30 GiB RAM, Linux 6.14, ext4 on **HDD**):

| mode       | pause p50 | pause p90 |    RT p50 |
|------------|----------:|----------:|----------:|
| live-sync  | **56 ms** |     64 ms | 13 730 ms |
| live-async |     54 ms |    241 ms | **69 ms** |
| diff       |    202 ms |    418 ms | 13 461 ms |
| full       | 13 550 ms | 14 268 ms | 13 559 ms |

Key ratios at p50:

- live vs diff: **3.6× faster pause** (202 / 56)
- live vs full: **242× faster pause** (13550 / 56)
- async RT vs sync RT: **198× faster return** (13730 / 69)

The "on HDD" point is a feature, not a bug for the writeup: Live's pause is disk-independent (memory copy runs after resume, not during), so the Live / Diff gap *widens* on slow storage rather than shrinking. NVMe would speed up Diff but not Live, making the ratio narrower, but Live is always bounded by CPU work (vmstate dump + UFFD_WP arming), never by disk throughput.

Files:

- `bench/live-fork-pause-window/bench-live-fork.py`: runnable harness, parameterized on source-tag and iterations
- `bench/live-fork-pause-window/bench-live-fork.csv`: 40-row raw data (one per BRANCH iteration)
- `bench/live-fork-pause-window/RESULTS-v0.4.md`: writeup with methodology, host config, per-mode interpretation of what pause_ms / RT measure, and honest caveats (single host, one source size, p90 outlier on async iter #8)

Docs updated:

- `README.md` headline: "BRANCH a live VM in 150 ms" → "in 56 ms (v0.4 live mode)". v0.4 preview block now leads with the measured 3.6× / 200× ratios and links to RESULTS-v0.4.md.
- `README-zh.md`: same headline + intro update.
- `CHANGELOG.md`: Unreleased's v0.4 section's "Bench in progress" disclaimer replaced with the actual numbers table.

Phase 7 (user surface for v0.4 live BRANCH) is complete with this PR: REST (#204), CLI (#205), SDKs (#206), doctor (#207), docs (#208), bench (this).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
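The key ratios quoted in the bench commit above follow directly from the p50 columns. The sketch below reproduces the arithmetic; the constant and function names are illustrative and not part of the bench harness.

```rust
// p50 values in milliseconds, copied from the results table.
const LIVE_SYNC_PAUSE: f64 = 56.0;
const DIFF_PAUSE: f64 = 202.0;
const FULL_PAUSE: f64 = 13_550.0;
const LIVE_SYNC_RT: f64 = 13_730.0;
const LIVE_ASYNC_RT: f64 = 69.0;

/// Returns (label, slower / faster) for the three headline comparisons.
fn key_ratios() -> [(&'static str, f64); 3] {
    [
        // 202 / 56 ≈ 3.61, reported as 3.6×
        ("live vs diff pause", DIFF_PAUSE / LIVE_SYNC_PAUSE),
        // 13550 / 56 ≈ 241.96, reported as 242×
        ("live vs full pause", FULL_PAUSE / LIVE_SYNC_PAUSE),
        // 13730 / 69 ≈ 198.99, reported as 198× (truncated rather than rounded)
        ("async vs sync RT", LIVE_SYNC_RT / LIVE_ASYNC_RT),
    ]
}
```

Note that the third ratio sits just under 199, so the quoted 198× is the truncated value.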
Summary
`forkd doctor` now probes the two kernel features v0.4 live-fork needs, so users see capability problems before they hit them mid-BRANCH.

Two new checks (15 + 16):

- **uffd_wp (v0.4 live BRANCH)**: `userfaultfd(2)` + `UFFDIO_API` with `UFFD_FEATURE_PAGEFAULT_FLAG_WP`. EPERM → `sysctl vm.unprivileged_userfaultfd=1` hint. ENOSYS → "kernel < 5.7, Diff still works".
- **memfd_create (v0.4 live BRANCH)**: `memfd_create(2)` with `MFD_CLOEXEC`.

WARN (not FAIL) because v0.3 Diff and Full BRANCH still work without uffd_wp/memfd. The WARN explicitly tells the user which path they lose, so they can decide whether to fix it.

Shared probe module, `forkd_uffd::probe`: `probe_uffd_wp()` / `probe_memfd_create()` are pub helpers that make the minimum syscalls and drop the fd before returning. Kept in `forkd-uffd` (where the rest of the UFFD_WP machinery already lives) but isolated in their own module so `forkd-cli` doesn't pull in the snapshot-side machinery just to probe.

Manual verification (dev box, Linux 6.14)

Test plan

- `cargo fmt --check --all`: clean
- `cargo clippy --workspace --all-targets -- -D warnings`: clean
- `cargo test --workspace`: 98 / 98 pass (was 96; +2 new probe tests)
- `RUSTDOCFLAGS=-D warnings cargo doc --no-deps`: clean
- `forkd doctor` end-to-end on Linux 6.14: both PASS and WARN paths verified

🤖 Generated with Claude Code