perf(daemon): session-trusted verify-skip for ESP32 warm redeploys#116
perf(daemon): session-trusted verify-skip for ESP32 warm redeploys#116
Conversation
Adds an in-memory firmware-hash cache on `DeviceManager`, indexed by serial port. When the daemon has just deployed a known image to a device AND the port has stayed continuously enumerated since then, the next deploy with the same image short-circuits the entire verify-flash round-trip (serial open + espflash connect + stub upload + 3× MD5). The fallback path is still the existing opt-in native verify from issue #66 — we never trust a hash we can't justify. ## Trust model `TrustedFirmwareHash { hash: [u8;32], set_at: Instant }` is stored on `DeviceState`. `DeviceManager::trusted_firmware_hash(port)` returns `Some(h)` only if: 1. The port is currently connected, AND 2. No `true → false` enumeration edge has been observed since `set_at`. The enumeration edge is captured in `DeviceState::last_disconnect_at`, which `refresh_devices` now stamps on the transition (not every tick). Any physical disconnect — unplug/replug, USB-CDC re-enumeration after an external reset — invalidates the hash. Daemon restart also invalidates (in-memory only). ## Deploy-handler wiring Before the existing `verify-flash` call we compute `compute_esp32_image_hash(firmware, bootloader_off, partitions_off, firmware_off)` — SHA-256 over `(offset_le, len_le, bytes)` per region, so two builds of the same source with identical output hash to the same value. If `ctx.device_manager.trusted_firmware_hash(port)` returns that hash, we return a `VerifySkip` result immediately. After a successful write-flash (full or selective), the hash is recorded; on failure the entry is cleared. ## Opt-in Gated by `FBUILD_TRUST_DEVICE_HASH=1`, mirroring the convention used for `FBUILD_USE_ESPFLASH_VERIFY` / `FBUILD_USE_ESPFLASH_WRITE` in #66. Default off until benched on every ESP32 family member. ## Side benefit: benchmark-friendly daemon lifetime `SELF_EVICTION_TIMEOUT` is now overridable via `FBUILD_SELF_EVICTION_SECS`, so the warm-deploy loop (LOOP.md) can keep the same daemon alive across T1→T2→T3 without the 30 s idle timer racing against spawn latency. ## Tests Five new `device_manager` unit tests cover the round-trip, clear, unknown-port → None, disconnect-after-set invalidation, and the set-after-disconnect survival case. All 116 daemon-lib tests pass; clippy `-D warnings` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the session-trusted verify-skip returns immediately without ever touching the serial port, the existing 3 s USB re-enum polling loop still ran and — because a monitor may already be attached to the port — its `open()` probes could collide with the monitor and burn the full timeout. Skip the poll when the deploy didn't actually flash anything (`DeployOutcome::VerifySkip`), which applies to both the pre-existing MD5-verify-skip path and the new trust-skip path. Load-bearing for the < 4 s warm-trust-skip budget in LOOP.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The trust-hash invalidation relies on `DeviceState::last_disconnect_at` being up-to-date. A user who swapped boards at the same COM port between deploys (no intervening `/api/devices/list` call) could otherwise trip the trust-check into a false match against the replaced board. Refresh the enumeration once at the top of the deploy handler — one OS-level port enumeration per deploy is the right cost to keep the invalidation sound. Gated on `trusted_hash_enabled`, so the cost is only paid when the optimization is in effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR introduces a session-trusted firmware verification feature for ESP32 devices. It adds SHA-256 hashing of firmware images, device connection state tracking, and a fast-path deployment that skips serial verification when cached hashes match. The daemon's self-eviction timeout is now configurable via environment variable. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Handler
participant DeviceManager
participant Deployer
participant SerialPort
Client->>Handler: deploy(esp32, port)
Handler->>DeviceManager: refresh_devices()
DeviceManager-->>Handler: device state updated
Handler->>Handler: compute_esp32_image_hash()
Handler->>DeviceManager: trusted_firmware_hash(port)
DeviceManager-->>Handler: cached hash (if available)
alt Hash Matches (Fast-Path)
Handler-->>Client: DeploymentResult(VerifySkip)
else Hash Mismatch or No Cache
Handler->>Deployer: deploy_regions(...)
Deployer->>SerialPort: write firmware
SerialPort-->>Deployer: verify-flash/MD5
Deployer-->>Handler: success/failure
alt Deploy Success
Handler->>DeviceManager: set_trusted_firmware_hash(port, hash)
else Deploy Failure
Handler->>DeviceManager: clear_trusted_firmware_hash(port)
end
Handler-->>Client: DeploymentResult(normal outcome)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Two orthogonal micro-optimizations that shave the remaining server-side cost off the session-trusted verify-skip path (#116). Targets the best-case warm-deploy budget of < 1 s end-to-end (see #114 comment): ## `ImageHashMemo` on `DaemonContext` Warm redeploys re-read bootloader + partitions + firmware (~2–4 MB) and re-hash them on every call, costing ~5–15 ms. The memo keys on firmware path + the three files' `mtime` tuple — when all three match, reuse the stored hash instead of touching disk. Self-invalidates when any file's `mtime` advances (i.e. the next build produced new output). ## `DeviceManager::refresh_devices_if_stale` `refresh_devices()` costs ~20–30 ms on Windows (OS port enumeration). The trust-hash path called it on every deploy to keep `last_disconnect_at` up to date, but a 2 s freshness window is plenty to catch a physical unplug/replug between two warm deploys. The new `refresh_devices_if_stale(Duration)` short-circuits inside the window. ## Impact on the best-case arithmetic Server-side cost on a warm trust-skip deploy drops from ~50 ms (refresh + SHA-256 + lookup + early return) to ~1–2 ms (DashMap probe + metadata stat for the 3 files). Combined with the #116 trust-skip early return and #111 /ws/logs stream, that puts the <1 s best-case target (#114 comment) squarely in reach once the in-memory build fingerprint (#91 follow-up) lands. ## Tests 5 new unit tests: - `device_manager::refresh_devices_if_stale_skips_inside_window` - `device_manager::refresh_devices_if_stale_reruns_when_expired` - `image_hash_memo_tests::memo_hit_reuses_hash` - `image_hash_memo_tests::memo_miss_on_firmware_mtime_change` - `image_hash_memo_tests::memo_skipped_when_inputs_missing` All 121 `fbuild-daemon` lib tests pass; clippy `-D warnings` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dow (#118) Two orthogonal micro-optimizations that shave the remaining server-side cost off the session-trusted verify-skip path (#116). Targets the best-case warm-deploy budget of < 1 s end-to-end (see #114 comment): ## `ImageHashMemo` on `DaemonContext` Warm redeploys re-read bootloader + partitions + firmware (~2–4 MB) and re-hash them on every call, costing ~5–15 ms. The memo keys on firmware path + the three files' `mtime` tuple — when all three match, reuse the stored hash instead of touching disk. Self-invalidates when any file's `mtime` advances (i.e. the next build produced new output). ## `DeviceManager::refresh_devices_if_stale` `refresh_devices()` costs ~20–30 ms on Windows (OS port enumeration). The trust-hash path called it on every deploy to keep `last_disconnect_at` up to date, but a 2 s freshness window is plenty to catch a physical unplug/replug between two warm deploys. The new `refresh_devices_if_stale(Duration)` short-circuits inside the window. ## Impact on the best-case arithmetic Server-side cost on a warm trust-skip deploy drops from ~50 ms (refresh + SHA-256 + lookup + early return) to ~1–2 ms (DashMap probe + metadata stat for the 3 files). Combined with the #116 trust-skip early return and #111 /ws/logs stream, that puts the <1 s best-case target (#114 comment) squarely in reach once the in-memory build fingerprint (#91 follow-up) lands. ## Tests 5 new unit tests: - `device_manager::refresh_devices_if_stale_skips_inside_window` - `device_manager::refresh_devices_if_stale_reruns_when_expired` - `image_hash_memo_tests::memo_hit_reuses_hash` - `image_hash_memo_tests::memo_miss_on_firmware_mtime_change` - `image_hash_memo_tests::memo_skipped_when_inputs_missing` All 121 `fbuild-daemon` lib tests pass; clippy `-D warnings` clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caches the result of `hash_watch_set_stamps` inside the daemon so back-to-back warm builds skip the walk over thousands of watched files — the dominant non-trivial cost on warm rebuilds per `docs/PERF_WARM_BUILD.md`. Closes the remaining ~100–300 ms slice of the sub-1 s warm-deploy budget (#114) on top of the session-trusted verify-skip shipped in #116 / #118. ## What's in the PR 1. **`WatchSetStampCache` trait** in `fbuild-build::build_fingerprint` plus a thin `hash_watch_set_stamps_cached(watches, cache)` wrapper. Falls through to the existing walk when `cache` is `None`. 2. **`BuildParams::watch_set_cache: Option<Arc<dyn WatchSetStampCache>>`** so the daemon can thread its cache into every orchestrator call without coupling the CLI / tests to the daemon crate. 3. **ESP32 orchestrator** — swaps the three `hash_watch_set_stamps` call sites (two file-set compare points in the fast-path check and the file-set hash write on the save side) for the cached variant. The save side is the critical link: the freshly-computed hash is cached for the *next* build's compare. 4. **`DaemonWatchSetCache`** — `DashMap`-backed implementation with a configurable freshness window (default 2 s, long enough for the warm-loop case and short enough to auto-invalidate on any multi-second pause). 5. **Daemon wiring** — `DaemonContext::watch_set_cache: Arc<_>` field, threaded into every `BuildParams` construction site in the build, deploy, install-deps, and test-emu handlers. ## Freshness + safety The cache key is a stable hash over the sorted watch-root paths; two distinct projects never collide. Entries older than `max_age` are lazily evicted on read — the next call falls through to the real walk. Daemon restart clears the cache (in-memory only). If the user edits a file faster than the 2 s window, the file-system mtime still advances, but the cache will serve the pre-edit hash for up to 2 s; this is the right trade-off for the sub-1 s warm deploy target, and worst case a user re-runs the build. ## Tests 4 new unit tests on `DaemonWatchSetCache`: - `put_then_get_returns_same_hash` - `key_is_order_insensitive` - `stale_entry_is_evicted` - `miss_returns_none` Full workspace: 125 `fbuild-daemon` lib tests, 1269 total — all pass. `cargo clippy --workspace --all-targets -- -D warnings` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(packages): add .lnk resource pointers — fetch + verify + cache + materialize
`.lnk` files are tiny JSON manifests checked into source control that
point at remote binary blobs. At build time fbuild fetches them,
verifies the sha256, caches them in the existing two-phase disk cache,
and materializes them into the build tree so downstream steps consume
them as if they had always been in the source.
The intent: keep the source repo small, keep large/binary assets out of
git history, but have them appear as if they were always there during
builds. Sha256 is mandatory — reproducible builds and content-addressable
caching both depend on it.
## Schema (v1)
```json
{
"v": 1,
"url": "https://example.com/asset.bin",
"sha256": "abcdef0123...64-hex...",
"size": 1234567,
"extract": "file"
}
```
`extract` defaults to "file"; "zip" and "tar.gz" extract into a
directory at the materialized path.
## New module: `fbuild-packages/src/lnk/`
| File | Purpose |
|------|---------|
| `format.rs` | LnkFile struct + JSON parser + validation |
| `scanner.rs` | walk a tree, collect every parsed `.lnk` |
| `resolver.rs` | cache lookup; on miss fetch + verify + record |
| `materialize.rs` | hardlink/copy or extract into build tree |
| `embed.rs` | glue for embed_files-style entry lists |
| `README.md` | format spec, design rationale, CLI usage, FAQ |
Cache layer: extends DiskCache with `Kind::LnkBlobs`. Cache key triple
is `(LnkBlobs, url, sha256)` — sha256 in the "version" slot ensures a
.lnk content change forces a refetch. Reuses the existing LRU + lease +
GC infrastructure.
## Pipeline integration
esp32 orchestrator pre-resolves any `.lnk` entries in
`board_build.embed_files` / `embed_txtfiles` before passing them to
`process_embed_files`. Materialized paths reach `objcopy`; the original
`.lnk` is invisible downstream. Cache leases are held in scope so the
GC can't reap a blob mid-build.
## CLI: `fbuild lnk`
- `pull [<dir>]` — scan + fetch every `.lnk` blob into the cache
- `check [<dir>]` — verify cached blobs against their sha256 (no network)
- `add <url> [-o <path>]` — download once, hash, write a new `.lnk`
## Composition with zccache
Zero changes needed. The compile step that consumes a materialized blob
already hashes its inputs as part of the cache key. Because the blob's
on-disk content equals its sha256, the cache key changes whenever the
.lnk's sha256 changes.
## Test coverage
- 36 unit tests in the new module (format/scanner/resolver/
materialize/embed)
- 4 end-to-end integration tests against an in-process axum HTTP server
(full fetch+verify+materialize, sha mismatch rejection, 404 handling,
cache-hit-skips-network)
- Total 960+ tests still green across fbuild-packages, fbuild-config,
fbuild-build, fbuild-cli
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf(build): in-memory watch-set fingerprint cache on the daemon
Caches the result of `hash_watch_set_stamps` inside the daemon so
back-to-back warm builds skip the walk over thousands of watched
files — the dominant non-trivial cost on warm rebuilds per
`docs/PERF_WARM_BUILD.md`. Closes the remaining ~100–300 ms slice of
the sub-1 s warm-deploy budget (#114) on top of the session-trusted
verify-skip shipped in #116 / #118.
## What's in the PR
1. **`WatchSetStampCache` trait** in `fbuild-build::build_fingerprint`
plus a thin `hash_watch_set_stamps_cached(watches, cache)` wrapper.
Falls through to the existing walk when `cache` is `None`.
2. **`BuildParams::watch_set_cache: Option<Arc<dyn WatchSetStampCache>>`**
so the daemon can thread its cache into every orchestrator call
without coupling the CLI / tests to the daemon crate.
3. **ESP32 orchestrator** — swaps the three `hash_watch_set_stamps`
call sites (two file-set compare points in the fast-path check and
the file-set hash write on the save side) for the cached variant.
The save side is the critical link: the freshly-computed hash is
cached for the *next* build's compare.
4. **`DaemonWatchSetCache`** — `DashMap`-backed implementation with
a configurable freshness window (default 2 s, long enough for the
warm-loop case and short enough to auto-invalidate on any
multi-second pause).
5. **Daemon wiring** — `DaemonContext::watch_set_cache: Arc<_>` field,
threaded into every `BuildParams` construction site in the build,
deploy, install-deps, and test-emu handlers.
## Freshness + safety
The cache key is a stable hash over the sorted watch-root paths;
two distinct projects never collide. Entries older than `max_age`
are lazily evicted on read — the next call falls through to the
real walk. Daemon restart clears the cache (in-memory only). If
the user edits a file faster than the 2 s window, the file-system
mtime still advances, but the cache will serve the pre-edit hash
for up to 2 s; this is the right trade-off for the sub-1 s warm
deploy target, and worst case a user re-runs the build.
## Tests
4 new unit tests on `DaemonWatchSetCache`:
- `put_then_get_returns_same_hash`
- `key_is_order_insensitive`
- `stale_entry_is_evicted`
- `miss_returns_none`
Full workspace: 125 `fbuild-daemon` lib tests, 1269 total — all
pass. `cargo clippy --workspace --all-targets -- -D warnings` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets an operator tune or disable the watch-set fingerprint cache (landed in #120) at daemon startup without a rebuild. - Unset / unparseable → 2 s default. - Positive integer → that many seconds. - `0` → `Duration::ZERO`: every `put` is stale the instant it's stored, so `get` always returns `None`. Effectively disables the cache — useful for A/B-ing a suspected regression without changing code or toggling a feature flag. Implemented as a small helper `watch_set_cache_window_from_env()` next to the existing `self_eviction_timeout()` env reader (mirrors #116's `FBUILD_SELF_EVICTION_SECS` pattern). No changes to `DaemonWatchSetCache`'s own surface — keeps the cache module focused on caching, the env-coupling where it belongs. Exhaustive table-driven unit test covers all five parse cases (unset, zero, positive, garbage, whitespace). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…126) Lets an operator tune or disable the watch-set fingerprint cache (landed in #120) at daemon startup without a rebuild. - Unset / unparseable → 2 s default. - Positive integer → that many seconds. - `0` → `Duration::ZERO`: every `put` is stale the instant it's stored, so `get` always returns `None`. Effectively disables the cache — useful for A/B-ing a suspected regression without changing code or toggling a feature flag. Implemented as a small helper `watch_set_cache_window_from_env()` next to the existing `self_eviction_timeout()` env reader (mirrors #116's `FBUILD_SELF_EVICTION_SECS` pattern). No changes to `DaemonWatchSetCache`'s own surface — keeps the cache module focused on caching, the env-coupling where it belongs. Exhaustive table-driven unit test covers all five parse cases (unset, zero, positive, garbage, whitespace). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Implements the session-trusted verify-skip optimization flagged in #114 / LOOP.md: when the daemon has just deployed a known image to a port and the port has stayed continuously enumerated since then, no external agent could have re-flashed the chip without breaking our serial enumeration. The next deploy of the same image can therefore return
VerifySkipimmediately — skipping the full verify-flash path (serial open + espflash connect + stub upload + 3×FLASH_MD5SUM), which measured ~1.5 s on the native path in #66.Implements the design the user called out:
What's in the PR
TrustedFirmwareHashonDeviceState(crates/fbuild-daemon/src/device_manager.rs) — SHA-256 +set_atInstant, stored in the existing in-memory map. Invalidated bylast_disconnect_atstamped on thetrue → falseenumeration edge inrefresh_devices.DeviceManager::{trusted_firmware_hash, set_trusted_firmware_hash, clear_trusted_firmware_hash}API methods.compute_esp32_image_hashinhandlers::operations— deterministic SHA-256 over(offset_le, len_le, bytes)for bootloader + partitions + firmware, so identical builds hash to identical values.try_verify_deploymentcall; short-circuit withVerifySkipon match; refresh enumeration once per deploy so thelast_disconnect_atinvalidation is sound; record/invalidate the hash after the deploy returns.VerifySkipoutcome — on the trust-skip path the port was never closed, so the 3 s poll loop (which collides with an attached monitor) is pure overhead. Load-bearing for the < 4 s warm-trust-skip budget.FBUILD_SELF_EVICTION_SECSenv override onSELF_EVICTION_TIMEOUT— lets the warm-deploy benchmark keep one daemon alive across T1→T2→T3 without the 30 s idle timer racing against spawn latency.Opt-in
Gated by
FBUILD_TRUST_DEVICE_HASH=1, mirroring the opt-in convention from #66'sFBUILD_USE_ESPFLASH_{VERIFY,WRITE}. Default off until benched on every ESP32 family member.Tests
device_managerunit tests: round-trip, clear, unknown-port, invalidation after disconnect, survival across older disconnect stamp.fbuild-daemonlib tests pass.uv run cargo clippy -p fbuild-daemon --all-targets -- -D warningsclean.Benchmarking
Hardware bench on the real ESP32-S3 on
COM13was attempted but blocked by a pre-existing ESP32 cold-build stall (orchestrator hangs mid-Arduino-framework-library compile with no active compiler processes, stays "operation in progress" for 15+ minutes). That stall is orthogonal to this PR's code path and will be investigated separately.Verified warm-case fast paths that don't depend on ESP32:
tests/platform/unocold build: 4.3 s (healthy)tests/platform/unowarm rebuild: 211 ms (fingerprint hit, well inside the 500 ms LOOP.md budget)Theoretical warm-deploy savings under the new path, once end-to-end bench returns clean:
Test plan
FBUILD_TRUST_DEVICE_HASH=1on a daemon + real ESP32 board, run back-to-back deploys of the same image, confirm the second deploy emits"trusted-hash: session-trusted match"instead of theverify-flashline.Related: #66 (native espflash migration), #91 (warm-build instrumentation), #114 (warm-deploy loop tracker).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Configuration
FBUILD_TRUST_DEVICE_HASHenables hardware-backed verification optimization (disabled by default for safety).FBUILD_SELF_EVICTION_SECSconfigures the daemon's idle timeout duration in seconds (defaults to 30 seconds).