Skip to content

Migrate all subprocess spawns to running-process; eliminate direct std::process / tokio::process usage #141

@zackees

Description

@zackees

Goal

Every subprocess fbuild starts should go through running-process (the same crate that already powers fbuild_core::containment). Today there are still 31 direct Command::new(...) sites scattered across the workspace — each one is a place where containment, env handling, pipe draining, and logging have to be re-implemented and can drift.

The recent CI outage on ESP32-C6 was a textbook example: the containment rewrite (#108) hand-rolled a stdout-then-stderr drain in fbuild-core::subprocess::run_command, which deadlocked the moment a compiler filled its stderr pipe. A single `running-process`-backed helper would have made that class of bug impossible.

Migration plan

Migrate to running-process-core::NativeProcess (or, ideally, the convenience helper requested in zackees/running-process#100) for every spawn site. After migration, no fbuild crate should reference `std::process::Command::new` or `tokio::process::Command::new` outside of the wrapper module(s) and a small set of intentional hold-outs (the daemon-spawning paths in CLI / Python, and zccache).

Detection script

`ci/find_direct_subprocess.py` enumerates every remaining direct spawn site in `crates/`. Workflow:

  1. Run `uv run python ci/find_direct_subprocess.py` to see the punch list.
  2. Pick a site, reroute it through `NativeProcess` (or annotate with `// allow-direct-spawn: ` if it must remain raw — e.g. "daemon must outlive CLI").
  3. Re-run the script; the count drops.
  4. When the count reaches zero (or only the documented hold-outs remain), delete `ci/find_direct_subprocess.py` along with the allow-marker comments. The migration is complete.

The script supports `--fail` (CI-friendly nonzero exit when any unmigrated sites remain) and `--json` (machine-readable output).

Current baseline

```
Direct Command::new sites: 31
to migrate: 31
allowlisted: 0
```

Categories the 31 hits fall into:

  • Wrapper internals (will go away after migration): `subprocess.rs`, `containment.rs`
  • Intentional hold-outs (annotate and keep): daemon spawn from CLI/Python, zccache bootstrap
  • Real call sites needing migration: emulator handlers (QEMU, node), build script_runtime, CLI external-tool helpers (browser open, where/which, taskkill/kill/tasklist/pgrep)
  • Tests + harnesses: containment integration tests, port_recovery test

Done criteria

  • `uv run python ci/find_direct_subprocess.py --fail` exits 0
  • All migrated sites use `running-process`
  • Hold-outs are annotated with `// allow-direct-spawn: ` and reviewed
  • `ci/find_direct_subprocess.py` is deleted in the same PR that closes this issue
  • Allow-marker comments are removed (no longer needed once the script is gone)

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions