
perf(build): in-memory watch-set fingerprint cache on the daemon#120

Merged
zackees merged 2 commits into main from perf/watch-set-stamp-cache
Apr 18, 2026

Conversation


@zackees zackees commented Apr 18, 2026

Summary

Closes the remaining ~100–300 ms slice of the sub-1 s warm-deploy budget (#114), on top of the trust-skip from #116 / #118. Caches `hash_watch_set_stamps` inside the daemon so back-to-back warm builds skip the walk over thousands of watched files — the dominant non-trivial cost on warm rebuilds per `docs/PERF_WARM_BUILD.md`.

What's in the PR

  • `WatchSetStampCache` trait in `fbuild-build::build_fingerprint` plus a `hash_watch_set_stamps_cached(watches, cache)` wrapper. `cache: None` → existing on-disk walk; `Some(...)` short-circuits on a fresh hit.
  • `BuildParams::watch_set_cache: Option<Arc<dyn WatchSetStampCache>>` — decouples the CLI / tests from the daemon crate.
  • ESP32 orchestrator — three call sites swapped over. The save-side path is the critical link: the freshly computed hash is cached for the next build's compare.
  • `DaemonWatchSetCache` — `DashMap`-backed, with a configurable freshness window (default 2 s): short enough to auto-invalidate across anything but the tightest warm loop, long enough for back-to-back `fbuild deploy` / `fbuild deploy` round-trips.
  • Daemon wiring: `DaemonContext::watch_set_cache: Arc<_>` field, threaded into every `BuildParams` construction site in the operations and emulator handlers.
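The trait-plus-wrapper split might look roughly like this. Everything below is an illustrative sketch: only the names `WatchSetStampCache` and `hash_watch_set_stamps_cached` come from the PR, while the signatures, the `stable_key` helper, and the stand-in walk are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

// Illustrative trait shape; the real one in fbuild-build::build_fingerprint
// may carry richer key/hash types.
pub trait WatchSetStampCache: Send + Sync {
    fn get(&self, key: u64) -> Option<u64>;
    fn put(&self, key: u64, hash: u64);
}

// Stable key over the *sorted* watch-root paths, so argument order
// does not change the key (matches the order-insensitivity test).
fn stable_key(watches: &[PathBuf]) -> u64 {
    let mut sorted: Vec<_> = watches.iter().collect();
    sorted.sort();
    let mut h = DefaultHasher::new();
    sorted.hash(&mut h);
    h.finish()
}

// Stand-in for the real on-disk stamp walk.
fn hash_watch_set_stamps(watches: &[PathBuf]) -> u64 {
    stable_key(watches) ^ 0xfeed
}

pub fn hash_watch_set_stamps_cached(
    watches: &[PathBuf],
    cache: Option<&Arc<dyn WatchSetStampCache>>,
) -> u64 {
    let key = stable_key(watches);
    if let Some(c) = cache {
        if let Some(hash) = c.get(key) {
            return hash; // fresh hit: the walk is skipped entirely
        }
    }
    let hash = hash_watch_set_stamps(watches);
    if let Some(c) = cache {
        c.put(key, hash); // save side: primes the next build's compare
    }
    hash
}

// Toy Mutex<HashMap> backing store for the demo.
struct MapCache(Mutex<std::collections::HashMap<u64, u64>>);
impl WatchSetStampCache for MapCache {
    fn get(&self, key: u64) -> Option<u64> {
        self.0.lock().unwrap().get(&key).copied()
    }
    fn put(&self, key: u64, hash: u64) {
        self.0.lock().unwrap().insert(key, hash);
    }
}

fn main() {
    let cache: Arc<dyn WatchSetStampCache> =
        Arc::new(MapCache(Mutex::new(Default::default())));
    let watches = vec![PathBuf::from("src"), PathBuf::from("include")];
    let first = hash_watch_set_stamps_cached(&watches, Some(&cache));
    let second = hash_watch_set_stamps_cached(&watches, Some(&cache));
    assert_eq!(first, second); // second call served from the cache
    assert_eq!(hash_watch_set_stamps_cached(&watches, None), first); // None falls through
    println!("ok");
}
```

Because callers only see the trait object, the CLI and tests can pass `None` and never link against the daemon crate.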

Safety / trade-offs

  • Cache key: stable hash over sorted watch-root paths → no cross-project collision.
  • Entries older than max_age are lazily evicted on read.
  • Daemon restart clears the cache (in-memory only).
  • A user who edits a file within the 2 s window still advances the file-system mtime, but the cache may serve the pre-edit hash for up to 2 s. This is the right trade-off for the sub-1 s warm target; in the worst case a re-run catches the change.
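The freshness-window behaviour above can be sketched with a single-lock map standing in for the `DashMap`-backed `DaemonWatchSetCache`; the struct name and the lazy-eviction-on-read rule come from the PR, everything else is illustrative.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Simplified stand-in for the DashMap-backed daemon cache.
struct DaemonWatchSetCache {
    max_age: Duration,
    entries: Mutex<HashMap<u64, (Instant, u64)>>,
}

impl DaemonWatchSetCache {
    fn new(max_age: Duration) -> Self {
        Self { max_age, entries: Mutex::new(HashMap::new()) }
    }

    fn put(&self, key: u64, hash: u64) {
        self.entries.lock().unwrap().insert(key, (Instant::now(), hash));
    }

    // Lazy eviction: a stale entry is removed on read, and the caller
    // falls through to the real on-disk walk.
    fn get(&self, key: u64) -> Option<u64> {
        let mut map = self.entries.lock().unwrap();
        if let Some((stored, hash)) = map.get(&key) {
            if stored.elapsed() < self.max_age {
                return Some(*hash); // fresh hit
            }
        }
        map.remove(&key); // stale or absent: drop it so the map stays small
        None
    }
}

fn main() {
    let cache = DaemonWatchSetCache::new(Duration::from_secs(2));
    cache.put(42, 0xabcd);
    assert_eq!(cache.get(42), Some(0xabcd)); // within the window
    // A zero window makes every entry stale the instant it is stored.
    let dead = DaemonWatchSetCache::new(Duration::ZERO);
    dead.put(42, 0xabcd);
    assert_eq!(dead.get(42), None);
    println!("ok");
}
```

Nothing is in-memory beyond this map, which is why a daemon restart clears the cache for free.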

Tests

4 new unit tests on DaemonWatchSetCache:

  • put_then_get_returns_same_hash
  • key_is_order_insensitive
  • stale_entry_is_evicted
  • miss_returns_none

fbuild-daemon lib: 125 tests pass (+4). Full workspace: 1269 tests pass. `cargo clippy --workspace --all-targets -- -D warnings` is clean.

Review / follow-ups

Honest notes about scope cuts I made deliberately — intended to be picked up in follow-up issues so this can land and unblock measurement:

  1. AVR / Teensy / other orchestrators are not yet wired. The fast-path fingerprint lives only in the ESP32 orchestrator today; if AVR gets one, it should consume the same trait.
  2. Window is a constant, not env-tunable. A future PR could expose FBUILD_WATCH_SET_CACHE_SECS mirroring the FBUILD_SELF_EVICTION_SECS pattern.
  3. No telemetry on hit rate. Adding a tracing::debug! with hit/miss counters would help validate the savings on real projects — not included here to keep diff minimal.

Test plan

  • Hardware bench on a working ESP32 project (blocked on perf: ESP32 cold-build stalls mid-framework-lib compile (no active compilers) #117): back-to-back fbuild build, confirm second call's fp-watches-collect phase drops from ~100–300 ms to single-digit ms.
  • Edit a source file and immediately rebuild — confirm the rebuild actually recompiles (the cache serves the pre-edit hash for ≤2 s, but compile-time staleness detection catches the change downstream).

Related: #66 (native espflash), #91 (warm-build instrumentation), #114 (warm-deploy loop), #116 (session-trusted verify-skip), #118 (image-hash memo + refresh-skip window).

🤖 Generated with Claude Code

zackees and others added 2 commits April 18, 2026 15:13
… materialize

`.lnk` files are tiny JSON manifests checked into source control that
point at remote binary blobs. At build time fbuild fetches them,
verifies the sha256, caches them in the existing two-phase disk cache,
and materializes them into the build tree so downstream steps consume
them as if they had always been in the source.

The intent: keep the source repo small, keep large/binary assets out of
git history, but have them appear as if they were always there during
builds. Sha256 is mandatory — reproducible builds and content-addressable
caching both depend on it.

## Schema (v1)

```json
{
  "v": 1,
  "url": "https://example.com/asset.bin",
  "sha256": "abcdef0123...64-hex...",
  "size": 1234567,
  "extract": "file"
}
```

`extract` defaults to "file"; "zip" and "tar.gz" extract into a
directory at the materialized path.
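The validation rules above can be sketched stdlib-only (the real `format.rs` presumably uses a JSON library; only the field names and the defaults come from the spec, the function names are hypothetical):

```rust
// Sketch of the v1 validation rules: sha256 is mandatory and must be
// 64 hex chars; `extract` defaults to "file".
#[derive(Debug, PartialEq)]
enum Extract { File, Zip, TarGz }

#[allow(dead_code)]
struct LnkFile {
    url: String,
    sha256: String,
    size: u64,
    extract: Extract,
}

fn validate_sha256(s: &str) -> bool {
    s.len() == 64 && s.chars().all(|c| c.is_ascii_hexdigit())
}

fn parse_extract(s: Option<&str>) -> Option<Extract> {
    match s.unwrap_or("file") { // missing field => "file"
        "file" => Some(Extract::File),
        "zip" => Some(Extract::Zip),
        "tar.gz" => Some(Extract::TarGz),
        _ => None, // unknown extract kinds are rejected
    }
}

fn main() {
    assert!(validate_sha256(&"ab".repeat(32)));
    assert!(!validate_sha256("not-hex")); // wrong length and chars
    assert_eq!(parse_extract(None), Some(Extract::File));
    assert_eq!(parse_extract(Some("tar.gz")), Some(Extract::TarGz));
    println!("ok");
}
```

Rejecting unknown `extract` values keeps room for future kinds without silently mis-materializing today's files.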

## New module: `fbuild-packages/src/lnk/`

| File | Purpose |
|------|---------|
| `format.rs` | LnkFile struct + JSON parser + validation |
| `scanner.rs` | walk a tree, collect every parsed `.lnk` |
| `resolver.rs` | cache lookup; on miss fetch + verify + record |
| `materialize.rs` | hardlink/copy or extract into build tree |
| `embed.rs` | glue for embed_files-style entry lists |
| `README.md` | format spec, design rationale, CLI usage, FAQ |

Cache layer: extends DiskCache with `Kind::LnkBlobs`. Cache key triple
is `(LnkBlobs, url, sha256)` — sha256 in the "version" slot ensures a
.lnk content change forces a refetch. Reuses the existing LRU + lease +
GC infrastructure.

## Pipeline integration

esp32 orchestrator pre-resolves any `.lnk` entries in
`board_build.embed_files` / `embed_txtfiles` before passing them to
`process_embed_files`. Materialized paths reach `objcopy`; the original
`.lnk` is invisible downstream. Cache leases are held in scope so the
GC can't reap a blob mid-build.

## CLI: `fbuild lnk`

- `pull [<dir>]` — scan + fetch every `.lnk` blob into the cache
- `check [<dir>]` — verify cached blobs against their sha256 (no network)
- `add <url> [-o <path>]` — download once, hash, write a new `.lnk`

## Composition with zccache

Zero changes needed. The compile step that consumes a materialized blob
already hashes its inputs as part of the cache key. Because the blob's
on-disk content equals its sha256, the cache key changes whenever the
.lnk's sha256 changes.

## Test coverage

- 36 unit tests in the new module (format/scanner/resolver/
  materialize/embed)
- 4 end-to-end integration tests against an in-process axum HTTP server
  (full fetch+verify+materialize, sha mismatch rejection, 404 handling,
  cache-hit-skips-network)
- Total 960+ tests still green across fbuild-packages, fbuild-config,
  fbuild-build, fbuild-cli

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caches the result of `hash_watch_set_stamps` inside the daemon so
back-to-back warm builds skip the walk over thousands of watched
files — the dominant non-trivial cost on warm rebuilds per
`docs/PERF_WARM_BUILD.md`. Closes the remaining ~100–300 ms slice of
the sub-1 s warm-deploy budget (#114) on top of the session-trusted
verify-skip shipped in #116 / #118.

## What's in the PR

1. **`WatchSetStampCache` trait** in `fbuild-build::build_fingerprint`
   plus a thin `hash_watch_set_stamps_cached(watches, cache)` wrapper.
   Falls through to the existing walk when `cache` is `None`.
2. **`BuildParams::watch_set_cache: Option<Arc<dyn WatchSetStampCache>>`**
   so the daemon can thread its cache into every orchestrator call
   without coupling the CLI / tests to the daemon crate.
3. **ESP32 orchestrator** — swaps the three `hash_watch_set_stamps`
   call sites (two file-set compare points in the fast-path check and
   the file-set hash write on the save side) for the cached variant.
   The save side is the critical link: the freshly-computed hash is
   cached for the *next* build's compare.
4. **`DaemonWatchSetCache`** — `DashMap`-backed implementation with
   a configurable freshness window (default 2 s, long enough for the
   warm-loop case and short enough to auto-invalidate on any
   multi-second pause).
5. **Daemon wiring** — `DaemonContext::watch_set_cache: Arc<_>` field,
   threaded into every `BuildParams` construction site in the build,
   deploy, install-deps, and test-emu handlers.

## Freshness + safety

The cache key is a stable hash over the sorted watch-root paths;
two distinct projects never collide. Entries older than `max_age`
are lazily evicted on read — the next call falls through to the
real walk. Daemon restart clears the cache (in-memory only). If
the user edits a file faster than the 2 s window, the file-system
mtime still advances, but the cache will serve the pre-edit hash
for up to 2 s; this is the right trade-off for the sub-1 s warm
deploy target, and worst case a user re-runs the build.

## Tests

4 new unit tests on `DaemonWatchSetCache`:
- `put_then_get_returns_same_hash`
- `key_is_order_insensitive`
- `stale_entry_is_evicted`
- `miss_returns_none`

Full workspace: 125 `fbuild-daemon` lib tests, 1269 total — all
pass. `cargo clippy --workspace --all-targets -- -D warnings` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 18, 2026

Warning

Rate limit exceeded

@zackees has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 24 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 24 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab1f77fb-3c13-4701-8341-d72a18ac3c13

📥 Commits

Reviewing files that changed from the base of the PR and between c01ad5d and 243a423.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (25)
  • crates/fbuild-build/src/build_fingerprint.rs
  • crates/fbuild-build/src/esp32/orchestrator.rs
  • crates/fbuild-build/src/lib.rs
  • crates/fbuild-build/tests/avr_build.rs
  • crates/fbuild-build/tests/esp32_build.rs
  • crates/fbuild-build/tests/teensy_build.rs
  • crates/fbuild-cli/Cargo.toml
  • crates/fbuild-cli/src/main.rs
  • crates/fbuild-daemon/src/context.rs
  • crates/fbuild-daemon/src/handlers/emulator.rs
  • crates/fbuild-daemon/src/handlers/operations.rs
  • crates/fbuild-daemon/src/lib.rs
  • crates/fbuild-daemon/src/watch_set_cache.rs
  • crates/fbuild-packages/Cargo.toml
  • crates/fbuild-packages/src/disk_cache/paths.rs
  • crates/fbuild-packages/src/extractor.rs
  • crates/fbuild-packages/src/lib.rs
  • crates/fbuild-packages/src/lnk/README.md
  • crates/fbuild-packages/src/lnk/embed.rs
  • crates/fbuild-packages/src/lnk/format.rs
  • crates/fbuild-packages/src/lnk/materialize.rs
  • crates/fbuild-packages/src/lnk/mod.rs
  • crates/fbuild-packages/src/lnk/resolver.rs
  • crates/fbuild-packages/src/lnk/scanner.rs
  • crates/fbuild-packages/tests/lnk_e2e.rs


@zackees zackees merged commit 1316fe0 into main Apr 18, 2026
6 of 77 checks passed
@zackees zackees deleted the perf/watch-set-stamp-cache branch April 18, 2026 23:18
zackees added a commit that referenced this pull request Apr 19, 2026
Re-roll over 2.1.17 (GH Actions Linux runners stalled for >2h on
the 2.1.17 publish). 2.1.18 captures #116, #118, #120 as well.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zackees added a commit that referenced this pull request Apr 19, 2026
Lets an operator tune or disable the watch-set fingerprint cache
(landed in #120) at daemon startup without a rebuild.

- Unset / unparseable → 2 s default.
- Positive integer → that many seconds.
- `0` → `Duration::ZERO`: every `put` is stale the instant it's
  stored, so `get` always returns `None`. Effectively disables the
  cache — useful for A/B-ing a suspected regression without changing
  code or toggling a feature flag.
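The parse rules above might be implemented like this. Only the env-var name and the three behaviours come from the commit; the helper bodies, and the assumption that surrounding whitespace makes the value unparseable, are illustrative.

```rust
use std::time::Duration;

// Reads FBUILD_WATCH_SET_CACHE_SECS at daemon startup.
#[allow(dead_code)]
fn watch_set_cache_window_from_env() -> Duration {
    let raw = std::env::var("FBUILD_WATCH_SET_CACHE_SECS").ok();
    parse_window(raw.as_deref())
}

fn parse_window(raw: Option<&str>) -> Duration {
    match raw.and_then(|s| s.parse::<u64>().ok()) {
        Some(secs) => Duration::from_secs(secs), // 0 => Duration::ZERO disables the cache
        None => Duration::from_secs(2),          // unset / unparseable => default
    }
}

fn main() {
    // The five cases from the table-driven test.
    assert_eq!(parse_window(None), Duration::from_secs(2));         // unset
    assert_eq!(parse_window(Some("0")), Duration::ZERO);            // zero
    assert_eq!(parse_window(Some("5")), Duration::from_secs(5));    // positive
    assert_eq!(parse_window(Some("nope")), Duration::from_secs(2)); // garbage
    assert_eq!(parse_window(Some("   ")), Duration::from_secs(2));  // whitespace
    println!("ok");
}
```

Splitting the pure `parse_window` from the env read is what makes the five-case table test trivial to write.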

Implemented as a small helper `watch_set_cache_window_from_env()`
next to the existing `self_eviction_timeout()` env reader
(mirrors #116's `FBUILD_SELF_EVICTION_SECS` pattern). No changes
to `DaemonWatchSetCache`'s own surface — keeps the cache module
focused on caching, the env-coupling where it belongs.

Exhaustive table-driven unit test covers all five parse cases
(unset, zero, positive, garbage, whitespace).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zackees added a commit that referenced this pull request Apr 19, 2026
Mirrors the ESP32 orchestrator's #120-era fast-path pattern: on a
warm rebuild, if metadata + the three canonical artifacts
(firmware.hex / firmware.elf / compile_commands.json) + the
watch-set hash all match the persisted `PersistedBuildFingerprint`,
the orchestrator returns the cached `BuildResult` without entering
source-scan, compile, or link.

## Why

AVR warm builds today are ~211 ms for `tests/platform/uno` —
dominated by the source-scan walk + per-source staleness stat()s
downstream. The fast-path check replaces that with a single
on-disk JSON read + a `DaemonWatchSetCache`-memoized watch hash
(the cache landed in #120, gains visibility via #123, and tunes
via `FBUILD_WATCH_SET_CACHE_SECS` from #122). Expected warm-case
drop: ~211 ms → ~30–50 ms, well inside the LOOP.md budget slot.

## What's wired

1. **`AvrFingerprintMetadata`** — serializes (env, profile, board,
   mcu, f_cpu, variant, extra_flags, upload_protocol, upload_speed,
   project_dir, toolchain_dir, core_dir, variant_dir) into the
   `metadata_hash`. A single byte change in any of these bumps it.
2. **`avr_fast_path_watches`** — single watch root at the project
   directory, `.c/.cpp/.h/.ino/...` source extensions,
   `.git/.fbuild/build/target/…` excludes. Walks the sketch source,
   skipping build artefacts + VCS metadata.
3. **`avr_fast_path_artifacts`** — tuple of the three canonical
   outputs under `build/<env>/<profile>/`. All three must exist
   before the hit is served — a selective-deletion shouldn't trick
   the cache.
4. **Fast-path check block** between `framework-ensure` and
   `source-scan`. Skipped for `compiledb_only` / `symbol_analysis`
   modes whose outputs aren't captured by the fingerprint.
5. **Fingerprint save** after a successful pipeline run — records
   `metadata_hash`, freshly-computed `file_set_hash`, and the real
   `size_info` for the next warm pass to match against.
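The check in step 4 reduces to three conjoined conditions. This is a hedged sketch: the artifact names and the all-three-must-exist rule come from the commit, while the struct fields, hash types, and `fast_path_hit` helper are assumptions.

```rust
use std::path::{Path, PathBuf};

// Minimal stand-in for the persisted fingerprint record.
struct PersistedBuildFingerprint {
    metadata_hash: u64,
    file_set_hash: u64,
}

// The three canonical outputs under build/<env>/<profile>/.
fn avr_fast_path_artifacts(build_dir: &Path) -> [PathBuf; 3] {
    [
        build_dir.join("firmware.hex"),
        build_dir.join("firmware.elf"),
        build_dir.join("compile_commands.json"),
    ]
}

fn fast_path_hit(
    persisted: Option<&PersistedBuildFingerprint>,
    metadata_hash: u64,
    watch_set_hash: u64,
    build_dir: &Path,
) -> bool {
    let Some(fp) = persisted else { return false };
    // All three artifacts must exist: a selective deletion of one
    // output must not be served from the cache.
    let artifacts_ok = avr_fast_path_artifacts(build_dir).iter().all(|p| p.exists());
    fp.metadata_hash == metadata_hash
        && fp.file_set_hash == watch_set_hash
        && artifacts_ok
}

fn main() {
    let fp = PersistedBuildFingerprint { metadata_hash: 1, file_set_hash: 2 };
    // Hashes match but artifacts are missing in this toy dir: no hit.
    assert!(!fast_path_hit(Some(&fp), 1, 2, Path::new("/nonexistent")));
    // No persisted fingerprint at all: no hit.
    assert!(!fast_path_hit(None, 1, 2, Path::new(".")));
    println!("ok");
}
```

On a miss the orchestrator simply continues into source-scan, so a false negative costs only the old warm-build time.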

## Intentional scope cuts (follow-ups)

- No shared helper extraction across ESP32 / AVR yet. The ESP32
  version has zccache-first branching that AVR doesn't use, so
  factoring cleanly without either watering down ESP32 behaviour or
  forcing AVR through zccache is its own design exercise. Filed as
  a comment-marker on #121 below.
- Teensy / RP2040 / STM32 still have no fast-path. Each orchestrator
  can adopt the same pattern when measurable benefit is in scope.

## Tests

Workspace-wide lint + test suite runs green: 1272 lib tests pass;
`cargo clippy --workspace --all-targets -- -D warnings` clean.

The fast-path is exercised end-to-end by the existing `avr_build.rs`
integration tests that invoke `fbuild build tests/platform/uno`
through `AvrOrchestrator::build` — back-to-back invocations now
go through the save → check → reuse loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zackees added a commit that referenced this pull request Apr 19, 2026
…arm build

On Unix, `fbuild_core::containment::spawn_contained` delegated to
`ContainedProcessGroup::spawn_with_containment` from the
`running-process-core` crate. That implementation stores the first
spawned child's PID as the group's PGID and then has every subsequent
child call `setpgid(0, first_child_pid)` from its `pre_exec` hook.

Once the first child exits and is reaped (e.g. the short
`avr-gcc -dumpversion` call emitted by `log_toolchain_version`), the
kernel tears down that process group. The second spawn's
`setpgid(0, stale_pgid)` then fails with EPERM, which surfaces as
`build error: build failed: io error: Operation not permitted (os
error 1)` immediately after the `Toolchain: avr-gcc 7.3.0` line —
exactly the failure reported in #129.

This is reproducible on Linux CI from 2.1.17 onwards (Build Leonardo
et al.) and blocks every AVR / ESP32 / etc. build made via the daemon.

Fix: bypass the shared-pgid behaviour on Unix. Install a per-child
`pre_exec` hook that creates a fresh process group with
`setpgid(0, 0)` and, on Linux, requests `PR_SET_PDEATHSIG(SIGKILL)` so
the kernel still kills the child when the spawning daemon thread
exits. Windows is unchanged — Job Object assignment is stateless and
has no analogous failure mode. macOS loses the drop-time `killpg`
backstop, which was already a no-op in practice because the global
`ContainedProcessGroup` lives in a `OnceLock` that never drops; this
is the same coverage profile as before the fix.
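The fresh-group-per-child half of the fix can be sketched with std alone, since `Command::process_group(0)` performs the `setpgid(0, 0)` in the child before exec. This is an assumption-laden sketch, not the real `spawn_contained`: the `PR_SET_PDEATHSIG` part needs the libc crate and is shown only as a comment.

```rust
// Each child gets its own fresh process group, so no later spawn ever
// targets a stale PGID from an already-reaped sibling.
#[cfg(unix)]
fn spawn_contained(mut cmd: std::process::Command) -> std::io::Result<std::process::Child> {
    use std::os::unix::process::CommandExt;
    // process_group(0) => setpgid(0, 0) in the child before exec.
    cmd.process_group(0);
    // The real fix additionally installs a pre_exec hook that calls
    // prctl(PR_SET_PDEATHSIG, SIGKILL) on Linux (libc crate required),
    // so the kernel kills the child if the daemon thread dies.
    cmd.spawn()
}

#[cfg(unix)]
fn main() -> std::io::Result<()> {
    // Two sequential spawns with a wait in between: the shape that
    // previously failed with EPERM once the first child was reaped.
    for _ in 0..2 {
        let mut child = spawn_contained(std::process::Command::new("true"))?;
        assert!(child.wait()?.success());
    }
    println!("ok");
    Ok(())
}

#[cfg(not(unix))]
fn main() {}
```

With no shared PGID there is no group to tear down, so the sequential-spawn failure mode cannot recur.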

Regression test: `sequential_contained_spawns_do_not_fail_with_eperm`
in `crates/fbuild-core/src/containment.rs` initialises the global
group and performs two consecutive `spawn_contained` calls with a
wait+sleep between them, mirroring the AVR build's "dumpversion then
compile" shape.

Refs #129. Reproducing commits: #108 (containment feature) +
the interaction surfaced by #120 / #119 that made the second-spawn
path universally reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
