Skip to content

audit: sync code that could be async in fbuild-cli + fbuild-python (sub-issue of #813) #817

Description

@zackees

Sub-issue of #813. Audit of crates/fbuild-cli/ and crates/fbuild-python/ for synchronous code paths that could be migrated to async so they share the tokio runtime (CLI's per-invocation runtime, or pyo3-async-runtimes' tokio for Python).

Intro

These two crates have very different audiences but share the same goal: lean on tokio so they can do more than one I/O thing at a time.

  • fbuild-cli — transient CLI process; spins a Builder::new_multi_thread() runtime in main.rs and block_on(cli::async_main()). Most of the CLI is already async (daemon_client.rs's DaemonClient is fully reqwest::Client + await, daemon spawn uses tokio::process::Command, daemon polling uses tokio::time::sleep). A few diagnostic and edge-case paths still use blocking primitives even though they execute inside the tokio context. The biggest motivator to fix them is concurrent progress reporting (e.g. spinner while a build streams) — today the CLI is already async-clean enough that a sync HTTP call in a hot path is the exception, not the rule.

  • fbuild-python — PyO3 extension shipped to FastLED. The crate is structured around the "additive async surface" pattern (issue feat(python): native async API via pyo3-async-runtimes (AsyncSerialMonitor, AsyncDaemon) #65): sync types (SerialMonitor, Daemon, DaemonConnection) stay untouched for backwards compat and the native async counterparts (AsyncSerialMonitor, AsyncDaemon, AsyncDaemonConnection) wrap the same logic in pyo3_async_runtimes::tokio::future_into_py. The sync surface is intentional — but a few HTTP calls inside the sync surface (notably SerialMonitor.reset_device) still use reqwest::blocking::* even though the type owns a tokio::runtime::Runtime already and could reuse it.

PyO3-async-runtimes adoption status

I counted 16 pyo3_async_runtimes::tokio::future_into_py callsites across the crate:

  • async_serial_monitor.rs7 (__aenter__, __aexit__, read_lines, write, write_json_rpc, plus 2 more). Full async surface.
  • async_daemon_connection.rs8 (__aenter__, __aexit__, build, deploy, monitor, build_result, deploy_result, monitor_result). Full async surface.
  • daemon.rs3 on AsyncDaemon (status, ensure_running, stop). Full async surface.

Sync surface (SerialMonitor, DaemonConnection, Daemon) is entirely synchronous and uses either rt.block_on(...) (when the type owns a tokio::runtime::Runtime) or reqwest::blocking::* directly.

Migration order recommendation: the async surface is already complete and FastLED should consume it. The remaining work is purely internal hygiene on the sync surface — finding/replacing the few reqwest::blocking::* calls that could reuse the owned runtime via rt.block_on(reqwest::Client::...). That gives a single HTTP connection pool per SerialMonitor instance instead of a fresh blocking client per call (zero behavioral change visible to Python, but avoids re-running TLS / connection setup on every reset_device etc.).

For Daemon (the sync sibling of AsyncDaemon), the duplication is more substantial — there are two parallel _blocking / _async helpers (verify_broker_daemon_cache_identity_blocking vs _async, ensure_running_via_broker_blocking vs _async). If we're willing to make the sync Daemon methods spin a one-shot tokio::runtime::Builder::new_current_thread() and block_on(...) the async version, we delete ~80 lines of duplicated broker-adoption logic. Out of scope for this issue, but worth noting in #813.

Findings table

Severity Crate File:line Current pattern Proposed async replacement
HIGH fbuild-python serial_monitor.rs:461 (reset_device) reqwest::blocking::Client::new().post(&url)...send() inside a #[pymethod] that already owns self.runtime: Option<Runtime> Use rt.block_on(reqwest::Client::new().post(&url)...send().await); eliminates a second runtime spinup per reset and reuses the existing connection pool.
HIGH fbuild-python serial_monitor.rs:493, serial_monitor.rs:516 (reset_device) std::thread::sleep(...) for USB re-enumeration and post-reset settle These run after py.allow_threads is implicitly NOT held; they're fine for sync SerialMonitor but the equivalent path in AsyncSerialMonitor should be tokio::time::sleep(...).await. Verify AsyncSerialMonitor::reset_device already does. (It does — async_serial_monitor.rs:287 uses reqwest::Client::new(); this finding is to align the sync path's choice of HTTP client.)
HIGH fbuild-python daemon.rs:142 (verify_broker_daemon_cache_identity_blocking) reqwest::blocking::Client::builder()...send()...json() Already has an _async counterpart at daemon.rs:158. Migrate sync Daemon::ensure_running (line 283) to one-shot block_on(_async) and delete the blocking helper.
HIGH fbuild-python daemon.rs:181-220 (ensure_running_via_broker_blocking) Sync clone of ensure_running_via_broker_async (lines 222-267); uses reqwest::blocking::get and std::thread::sleep in the health-poll loop Delete the sync clone; have sync Daemon::ensure_running create a one-shot current_thread runtime and block_on(ensure_running_via_broker_async(...)). ~80 LoC of duplication goes away.
HIGH fbuild-python daemon.rs:191-201 (broker health poll) for _ in 0..100 { std::thread::sleep(100ms); reqwest::blocking::get(url) } Replaced by the change above — the async version already polls with tokio::time::sleep + reqwest::Client.
HIGH fbuild-python daemon.rs:291,327 (Daemon::ensure_running direct path) reqwest::blocking::get(&url) for /health checks + std::thread::sleep(100ms) polling Same as above — fold into a single block_on(async path).
HIGH fbuild-python daemon.rs:339-345 (Daemon::stop) reqwest::blocking::Client::new().post(&url).headers(...).send() AsyncDaemon::stop (line 505) already exists as async. Sync version should block_on(async_version) instead of cloning the body.
HIGH fbuild-python daemon.rs:350 (Daemon::status) reqwest::blocking::get(&url) + resp.text() blocking Fold into one-shot block_on(AsyncDaemon::status_impl(...)).
HIGH fbuild-python outcome.rs:121-164 (send_op) Whole function is reqwest::blocking::Client POST + JSON. Used by DaemonConnection::{build, deploy, monitor} send_op_async already exists in the same file (line 172). Sync send_op could literally be tokio::runtime::Builder::new_current_thread()...block_on(send_op_async(...)) — 35 lines deleted.
MEDIUM fbuild-python serial_monitor.rs:6,33-35 (SerialMonitor struct holds Mutex<WsSink> + Mutex<WsSource> from std::sync) std::sync::Mutex cannot be held across .await, so each WS op must lock() outside the future and pass into block_on. Works today, but forces every WS op to be sync. (Already done in AsyncSerialMonitor — it uses Arc<tokio::sync::Mutex<Option<_>>> deliberately, per the module docstring at line 22-42.) This finding is informational; sync SerialMonitor cannot adopt tokio::sync::Mutex without becoming async.
MEDIUM fbuild-cli cli/port_scan.rs:117-160 (fetch_overlay_to) reqwest::blocking::Client for USB VID overlay fetch, wrapped in std::thread::spawn(...).join() to escape the outer tokio runtime — comment explicitly says "reqwest::blocking spins its own internal runtime and rejects being called from inside an outer tokio runtime" Migrate to reqwest::Client::builder().timeout(15s).build() then .get(url).send().await — the surrounding port_scan is invoked from the async dispatcher so no thread hop is needed. Removes the workaround thread and lets cancellation propagate.
MEDIUM fbuild-cli cli/show.rs:38-58 (show_daemon_logs follow loop) std::thread::sleep(100ms) polling loop + std::fs::File::open + seek + read_to_string Use tokio::time::interval(Duration::from_millis(100)) and tokio::fs::*. Lets Ctrl+C / signal handling work cleanly under the tokio runtime. Low-throughput so the gain is small but the consistency win is real.
MEDIUM fbuild-cli daemon_client.rs:876-887 (spawn_daemon_process) std::fs::create_dir_all + std::fs::OpenOptions::new().create(true).append(true).open(...) for the daemon log file inside an async fn tokio::fs::create_dir_all(...).await + tokio::fs::OpenOptions::new().append(true)...open().await. Doesn't matter for a one-shot 1-byte create, but keeps the function fully non-blocking.
MEDIUM fbuild-cli cli/clang_tools.rs:274,299,369 (IWYU cache read/write) std::fs::read_to_string, std::fs::write, std::fs::read inside tokio::spawn tasks These tasks already run tokio::process::Command::output().await for IWYU itself, so the fs ops are tiny compared to the subprocess. Migrating to tokio::fs::* is purely a consistency win, not a perf one.
MEDIUM fbuild-cli daemon_client.rs:166-179 (LAST_DAEMON_ACQUISITION) OnceLock<Mutex<Option<DaemonAcquisition>>> using std::sync::Mutex The acquisition struct is read/written from async fns but the lock is never held across an .await, so std::sync::Mutex is fine. Leaving as-is is correct. (Listed for completeness — this is NOT a finding.)
LOW fbuild-cli main.rs:6-27 Trampoline-thread + tokio::runtime::Builder::new_multi_thread().build()...block_on(cli::async_main()) This is the runtime entry point; not a finding. Listed only so #813 readers don't flag it.
LOW fbuild-cli cli/lnk.rs:112,168,197 std::fs::read, std::fs::File::create + write_all for .lnk archive read/write inside async fns Consistency-only; the surrounding download_file().await already does the heavy lifting.
LOW fbuild-cli cli/clangd_config.rs:63-87,148,224 std::fs::write, std::fs::read_to_string for .clangd / .vscode/settings.json generation One-shot config file emission; the function is itself sync (fn run_clangd_config(...)), no async runtime to migrate to. Out of scope unless the surrounding command itself is made async.
LOW fbuild-cli cli/purge.rs:104-174, cli/symbols_cmd.rs:107-180, cli/graph_cmd.rs:85, cli/build.rs:160 Various std::fs::* calls in sync command bodies These commands are inherently sync (file system enumeration, ELF parsing, etc.); migrating to tokio::fs::* would force them to be async without measurable benefit. Skip.
LOW fbuild-cli mcp/server.rs:24-33 std::io::stdin().lock(); for line in reader.lines() { ... } blocking stdin read loop inside an async fn run_mcp_server() -> i32 MCP protocol is request/response over stdin/stdout; one outstanding request at a time. Migrating to tokio::io::stdin() adds complexity for zero behavioral gain. Skip unless we want to add concurrent notification streaming.

What was searched

Grep patterns (across both crates/fbuild-cli/src/ and crates/fbuild-python/src/):

  • reqwest:: — classified reqwest::Client (async, good) vs reqwest::blocking::* (sync, candidate)
  • block_on — distinguished the explicit sync-Python bridge calls (intentional, in PyO3 entry points) from block_on inside HTTP/WS plumbing (candidate for sharing the host runtime)
  • pyo3_async_runtimes — counted the existing async surface (16 callsites, all in Async* types)
  • #[pyfunction] / #[pymethods] / #[staticmethod] — enumerated PyO3 entry points, sorted by sync vs async return type
  • std::thread::sleep / std::thread::spawn — flagged blocking sleeps inside async contexts
  • std::process::Command vs tokio::process::Command — CLI uses tokio variant everywhere except the diagnostic clang tools (which use tokio); fbuild-python uses std::process::Command for daemon spawn, which is correct since the child is detached
  • tokio_tungstenite:: / tungstenite:: — only in fbuild-python (sync SerialMonitor uses block_on(read.next()); AsyncSerialMonitor uses native .await)
  • std::sync::Mutex / RwLock / mpsc — all uses are either short-lived non-.await critical sections or test infrastructure
  • tokio::runtime::Runtime::new() — sync SerialMonitor constructs a per-instance multi-thread runtime; tests construct one-shot runtimes. PyO3 itself uses the shared pyo3-async-runtimes runtime for async surface, as expected
  • Client::new / Client::builder / Client::default — daemon_client.rs's single reqwest::Client per DaemonClient instance is correct (one connection pool per CLI invocation)
  • spawn_blocking / block_in_place — no occurrences in fbuild-cli or fbuild-python (correctly — fbuild-cli has no blocking work; fbuild-python sync surface uses Runtime::block_on directly)
  • WebSocket reads from /ws/serial-monitor, /ws/logs, /ws/status — only fbuild-python reads /ws/serial-monitor (SerialMonitor sync via block_on(read.next()), AsyncSerialMonitor async via .await)

Modules / files audited:

fbuild-cli (12 files): main.rs, daemon_client.rs (+ daemon_client/tests.rs, daemon_client/types.rs), lib_select.rs, mcp/server.rs, mcp/jsonrpc.rs, cli/dispatch.rs, cli/daemon_cmd.rs, cli/port_scan.rs, cli/show.rs, cli/clang_tools.rs, cli/clangd_config.rs, cli/lnk.rs, cli/serial_probe.rs, cli/symbols_cmd.rs, cli/purge.rs, cli/graph_cmd.rs, cli/build.rs, cli/compile_many.rs, cli/bringup.rs.

fbuild-python (9 files): lib.rs, daemon.rs, daemon_connection.rs, async_daemon_connection.rs, serial_monitor.rs, async_serial_monitor.rs, outcome.rs, json_rpc.rs, messages.rs.

Out-of-scope notes

  1. block_on at the PyO3 boundary in sync types is intentional, per feat(python): native async API via pyo3-async-runtimes (AsyncSerialMonitor, AsyncDaemon) #65's additive-async pattern. The crate explicitly preserves sync SerialMonitor, DaemonConnection, Daemon for backwards-compatible Python consumers. We are not proposing to delete those types — only to thin out the sync internal helpers that duplicate logic already present in their Async* siblings.

  2. std::process::Command::spawn for daemon spawn in both fbuild-cli/src/daemon_client.rs::spawn_daemon_process (uses tokio) and fbuild-python/src/daemon.rs::Daemon::ensure_running (uses std) is intentional. The daemon is detached and outlives its parent (#32); tokio vs std Command doesn't matter for a Stdio::null() + spawn() + no .await path.

  3. The LAST_DAEMON_ACQUISITION OnceLock<Mutex<Option<_>>> in daemon_client.rs:166 is correctly std::sync::Mutex — the lock is never held across .await, only for set/get of a small struct.

  4. CLI WebSocket consumption does not exist. The CLI does not connect to /ws/serial-monitor, /ws/logs, or /ws/status directly — the only WebSocket consumer in this audit's scope is fbuild-python's SerialMonitor/AsyncSerialMonitor, which talk to /ws/serial-monitor. The streaming build output uses NDJSON over HTTP (daemon_client.rs:304), not a WebSocket.

  5. Test-only tokio::runtime::Runtime::new() constructions in lib.rs:312,330,350 are integration-test helpers and not part of the runtime story.

  6. fbuild-cli/src/cli/show.rs::show_daemon_logs is invoked as a sync fn from the dispatcher (it has no async fn wrapping). Migrating it to tokio::fs requires also making the caller async, which is straightforward but touches dispatch.rs. Listed as MEDIUM rather than HIGH because the throughput is human-paced log tailing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status
    Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions