docs(v0.4): Phase 6.1.5 — FC POST /uffd/wp endpoint shipped#194

Merged: WaylandYang merged 1 commit into main from feat/v0.4-phase6.1.5-wp-uffd-endpoint on May 30, 2026

@WaylandYang
Contributor

Summary

The vendored FC fork now carries the snapshot-side WP-uffd endpoint at deeplethe/firecracker@7d80afade. This PR brings forkd-side docs and a smoke test in line.

  • `docs/VENDORED-FIRECRACKER.md`: bumps the commit list to five — `cc3632b72` / `f3b299ff7` / `fe2b39026` / `c5dff4bb1` / `7d80afade` — with one-line descriptions of what each commit does.
  • `DESIGN-v0.4-PHASE6.md` PR 6.1.5 row: flipped from "in-progress, EINVAL unsolved" to done, with the post-mortem inline:
    1. FC's vmm-thread seccomp filter doesn't allow `userfaultfd(2)` (syscall 323 on x86_64) or the `UFFDIO_API` / `UFFDIO_REGISTER` / `UFFDIO_WRITEPROTECT` ioctls. Development uses `--no-seccomp`; productionizing the endpoint needs a filter update (separate follow-up).
    2. `UFFDIO_REGISTER (WP)` returns `EINVAL` on file-backed VMAs — the kernel correctly refuses to WP an ext4 mmap (`vma_can_userfault` allows only anon, tmpfs, shmem, memfd). Phase 5b's `MemoryBackend::MemfdShared` already routes through memfd, so the real caller's VMA is shmem-backed and registration succeeds. The earlier hypotheses (THP, KVM page pinning) turned out to be wrong; the EINVAL was a smoke-test setup issue.
  • `scripts/dev/test-wp-uffd-memfd.py`: end-to-end smoke test. It calls `memfd_create`, populates the memfd from the snapshot's `memory.bin`, boots FC, restores with `backend_path=/proc/$$/fd/<N>`, fires `PUT /uffd/wp`, receives the uffd fd via `recvmsg` + `SCM_RIGHTS`, and verifies that it is a real `anon_inode:[userfaultfd]`. Runs as root (the FC API socket is root-owned).

Unblocks Phase 6.2 (controller-side `Vm::request_wp_uffd` + `Vm::memfd_handle` getters).

Test plan

  • FC vendor commit `7d80afade` builds clean (`tools/devtool build --release` exit 0).
  • `sudo HOME=... python3 scripts/dev/test-wp-uffd-memfd.py` end-to-end:
    • `/snapshot/load` -> 204, FC mmap shows `/memfd:forkd-wp-test (deleted) rw-s` (NOT ext4 file-backed)
    • `/uffd/wp` -> 204
    • Receiver gets payload `[{base_host_virt_addr:..., size:536870912, offset:0, page_size:4096, page_size_kib:4096}]`
    • Received fd resolves to `/proc/self/fd/ -> anon_inode:[userfaultfd]` ✓
  • `cargo fmt --check`, `cargo clippy --workspace --all-targets -- -D warnings`, `cargo test --workspace` all still pass on a fresh sync from `main`.
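
The receive-and-verify steps in the test plan can be sketched as follows. This is an illustration under stated assumptions, not the smoke test itself: `recv_handshake` and `fd_target` are hypothetical names, and the payload is assumed to be a JSON array of region descriptors, as the output above suggests.

```python
import array
import json
import os
import socket


def recv_handshake(conn: socket.socket, bufsize: int = 4096):
    """Receive one message carrying a JSON payload plus one fd via SCM_RIGHTS."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = conn.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if len(fds) != 1:
        raise RuntimeError(f"expected exactly one fd, got {len(fds)}")
    return json.loads(data), fds[0]


def fd_target(fd: int) -> str:
    """Resolve what an fd points at, e.g. 'anon_inode:[userfaultfd]'."""
    return os.readlink(f"/proc/self/fd/{fd}")
```

A caller would then compare `fd_target(fd)` against `anon_inode:[userfaultfd]` to confirm the received descriptor is a real userfaultfd.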

(Docs-only on the forkd side. Real implementation work in 6.2 next.)

🤖 Generated with Claude Code

The vendored FC fork now carries the snapshot-side WP-uffd
endpoint at deeplethe/firecracker@7d80afade. forkd-side docs:

  - VENDORED-FIRECRACKER.md: bump the commit list to five
    (cc3632b72 / f3b299ff7 / fe2b39026 / c5dff4bb1 / 7d80afade),
    describe what each one does.

  - DESIGN-v0.4-PHASE6.md PR 6.1.5 row: flip from
    "in-progress, EINVAL unsolved" to DONE, with the post-mortem:
      1. seccomp filter doesn't allow userfaultfd/UFFDIO_* post-boot —
         development uses --no-seccomp; productionizing the endpoint
         needs a filter update (separate follow-up).
      2. UFFDIO_REGISTER (WP) returns EINVAL on file-backed VMAs;
         the kernel correctly refuses to WP an ext4 mmap. The real
         caller (Phase 5b MemoryBackend::MemfdShared) routes through
         memfd so the VMA is shmem-backed and registration succeeds.
    The original blueprint warned about both — the seccomp one was
    accurate, the EINVAL one turned out to be a smoke-test setup
    issue (not the THP / KVM-pinning explanation we'd hypothesized).

  - scripts/dev/test-wp-uffd-memfd.py: end-to-end smoke test.
    memfd_create + populate from snapshot's memory.bin, boot FC,
    restore with backend_path=/proc/$$/fd/<N>, fire PUT /uffd/wp,
    receive the uffd fd via recvmsg+SCM_RIGHTS, verify it is a
    real userfaultfd. Runs as root (FC API socket is root-owned).

Unblocks Phase 6.2 (controller-side request_wp_uffd + memfd_handle
getter).
@WaylandYang WaylandYang merged commit 413f46d into main May 30, 2026
2 checks passed
@WaylandYang WaylandYang deleted the feat/v0.4-phase6.1.5-wp-uffd-endpoint branch May 30, 2026 03:40
WaylandYang added a commit that referenced this pull request May 30, 2026
* docs(v0.4): Phase 6.1.5 — FC POST /uffd/wp endpoint shipped

* feat(vmm): Phase 6.2 — Vm::memfd_handle + Vm::request_wp_uffd

Phase 6.2 — controller-side wiring for the v0.4 live-fork path.

Two new methods on `Vm`:

  - `memfd_handle() -> Option<&memfd::MemfdRegion>`. Phase 5b already
    stored the MemfdRegion on Vm; this getter exposes it so Phase 6.3
    can mmap the same backing shmem in the controller's process and
    stream clean pages out for the bulk copier. Returns None for
    file-backed VMs (which can't be WP'd — see DESIGN-v0.4-PHASE6.md).

  - `request_wp_uffd(socket_path) -> Result<Handshake>` (Linux-only).
    Bind a UDS, fire PUT /uffd/wp at the vendored FC, accept the
    incoming connection FC opens, recvmsg the uffd fd + region
    descriptors via SCM_RIGHTS. Reuses the SCM_RIGHTS handshake
    receiver from `forkd_uffd::handshake` (the same one the
    restore-side handler has been using). On non-Linux, a stub that
    errors clearly so callers don't silently degrade.

The listener is bound synchronously *before* the API PUT, so FC's
connect() can never race ahead of listen() and fail with ECONNREFUSED.
The accept thread is joined unconditionally, even on API failure, so a
dangling accept can't hang forever (the listener drops at end of
scope and accept unblocks).
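
The ordering described above (bind and listen first, fire the request second, always join the accept thread) can be sketched in Python. This is not the Rust implementation: `request_with_listener` and `fire_request` are hypothetical names, and because closing a Python listener from another thread does not reliably unblock `accept`, the sketch substitutes a timeout for the drop-at-end-of-scope behaviour.

```python
import os
import socket
import threading


def request_with_listener(socket_path, fire_request, timeout=5.0):
    """Bind a UDS, then call fire_request(socket_path); return the accepted conn."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(socket_path)
    listener.listen(1)  # listening before the request closes the race window
    listener.settimeout(timeout)
    outcome = {}

    def accept_once():
        try:
            outcome["conn"], _ = listener.accept()
        except OSError as exc:  # includes the accept timeout
            outcome["error"] = exc

    worker = threading.Thread(target=accept_once)
    worker.start()
    try:
        fire_request(socket_path)
    except Exception:
        worker.join()
        if "conn" in outcome:
            outcome["conn"].close()
        raise
    finally:
        worker.join()  # joined on every path, success or failure
        listener.close()
        os.unlink(socket_path)
    if "conn" not in outcome:
        raise RuntimeError(f"peer never connected: {outcome.get('error')}")
    return outcome["conn"]
```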

The unit test asserts the body shape: `{"socket": "..."}` with exactly
one field, because FC's SetupWpUffdParams has deny_unknown_fields. The
full end-to-end path is already proven by scripts/dev/test-wp-uffd-memfd.py,
which landed in #194; the Python smoke test is the integration test for the
SCM_RIGHTS protocol, and this Rust wrapper just gives forkd-controller a
typed interface to it.
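
As an illustration of the request the wrapper sends, the sketch below builds the one-field body and issues `PUT /uffd/wp` over a Unix-socket API endpoint using only the standard library. The helper names (`wp_uffd_body`, `UnixHTTPConnection`, `put_uffd_wp`) and the `Content-Type` header are assumptions for this example, not part of forkd or Firecracker.

```python
import http.client
import json
import socket


def wp_uffd_body(handshake_socket: str) -> bytes:
    # Exactly one field: the receiving side rejects unknown fields.
    return json.dumps({"socket": handshake_socket}).encode()


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def put_uffd_wp(api_socket: str, handshake_socket: str) -> int:
    """Send PUT /uffd/wp and return the HTTP status code."""
    conn = UnixHTTPConnection(api_socket)
    try:
        conn.request(
            "PUT",
            "/uffd/wp",
            body=wp_uffd_body(handshake_socket),
            headers={"Content-Type": "application/json"},
        )
        response = conn.getresponse()
        response.read()
        return response.status
    finally:
        conn.close()
```

Per the test plan in #194, a successful call is expected to return 204.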

Unblocks Phase 6.3 (mode="live" path in branch_sandbox).
WaylandYang added a commit that referenced this pull request May 31, 2026
…urface (#208)

Phases 6 and 7 shipped the full v0.4 live BRANCH path (sub-50 ms source
pause via UFFD_WP + memfd) across REST, CLI, and SDKs — but the README
still pitched it as "experimental, try with `forkd wp-bench`" and the
status section explicitly claimed "we chose not to fork Firecracker."
That contradicts reality on main today.

Updates:

- **README.md / README-zh.md** — v0.4 preview block rewritten as "v0.4
  live BRANCH" with the actual user-facing surface (REST `mode: "live"`,
  CLI `--live` / `--no-wait`, SDK `mode=`). Doctor check count
  14 → 16 (uffd_wp + memfd_create). Python and TypeScript SDK examples
  show `live_fork=True` / `liveFork: true` + `mode="live"` + `wait=False`.
  Status section: "we chose not to fork Firecracker" paragraph replaced
  with the honest version — we did fork, here's the branch, here's the
  upstream proposal, vendor requirement goes away if upstream takes it.
- **docs/API.md** — `POST /v1/sandboxes/:id/branch` documents `mode`,
  `wait`, the `mode`/`diff` mutex (HTTP 400), and per-mode pause
  semantics. `POST /v1/sandboxes` documents `live_fork`. `SnapshotInfo`
  gains the `status` field for the `wait=false` lifecycle.
- **DESIGN-v0.4.md** — status banner flipped from DRAFT to IMPLEMENTED
  with links to PRs #194–#207; DRAFT body preserved verbatim as the
  architecture record (the implementation tracks it closely).
- **CHANGELOG.md** — Unreleased gets a "v0.4 live-fork: user-facing
  surface complete" section. Calls out the prereqs (Linux ≥ 5.7,
  `unprivileged_userfaultfd=1`, vendored FC fork) and the one
  known CLI gap (`forkd fork --live-fork` for spawn-time opt-in
  isn't surfaced — use SDK / REST for now; tracking as a follow-up).
- ROADMAP.md left as-is — it's milestone-shaped (M1/M2/M3) and v0.4
  live-fork wasn't on the original critical path; the CHANGELOG + Status
  section already cover the shipped state.
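
The prerequisites named in the CHANGELOG entry (Linux ≥ 5.7, `unprivileged_userfaultfd=1`) can be probed with a few lines of Python. This is a rough sketch, not the `forkd` doctor implementation; all function names here are hypothetical, and the sysctl is read from its standard procfs location.

```python
import os
import platform
import re
from pathlib import Path


def kernel_at_least(release: str, minimum=(5, 7)) -> bool:
    """Compare the major.minor prefix of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def unprivileged_uffd_enabled(
    sysctl=Path("/proc/sys/vm/unprivileged_userfaultfd"),
):
    """Return True/False for the sysctl value, or None if it cannot be read."""
    try:
        return sysctl.read_text().strip() == "1"
    except OSError:
        return None


def check_prereqs() -> dict:
    return {
        "kernel": kernel_at_least(platform.release()),
        "unprivileged_userfaultfd": unprivileged_uffd_enabled(),
        "memfd_create": hasattr(os, "memfd_create"),
    }
```

The vendored FC fork requirement is not checked here, since it cannot be detected from the host alone.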

No code change; pure docs.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>