perf(library-select): #205 follow-up — parallelize BFS walker and memoize Pass 1 scans across Pass 2

## Summary

The LDF implementation shipped under #205 / #214 is correct and the warm-cache path is healthy, but the **cold scan** leaves real performance on the table:

1. The transitive include walker (`fbuild-header-scan::walker::walk`) is single-threaded BFS — no rayon, no concurrent I/O.
2. The fixed-point resolver (`fbuild-library-select::resolve`) calls `walk()` twice; Pass 2 re-reads and re-scans every file Pass 1 already touched, because `visited` is local to each `walk()` invocation.
3. No `tracing` spans wrap the scan/walk, so regressions are invisible without external profiling.
4. Search-path lookup is `O(search_paths)` per `#include` per library.

None of this breaks the #205 acceptance criteria today (cold scan still hits the ≤ 200 ms gate for teensy41), but on bigger FastLED examples / future ESP32 wiring the cold path is the one that bites users.

## Observations

### 1. Walker is fully sequential

`crates/fbuild-header-scan/src/walker.rs:29-67`

```rust
while let Some(file) = queue.pop_front() {
    let Ok(text) = std::fs::read_to_string(&file) else { continue; };
    for inc in scan(&text) { ... }
}
```

`VecDeque::pop_front` + blocking `read_to_string` per file. For a Teensy 4.1 sketch the BFS frontier easily reaches dozens of headers per level. `fbuild-header-scan/Cargo.toml` has no `rayon` / `crossbeam` / async runtime dep — scanner throughput (≥ 50 MB/s/thread per `benches/scan_throughput.rs`) goes unused.

### 2. Pass 2 throws away Pass 1's work

`crates/fbuild-library-select/src/lib.rs:85-129`

Pass 1 (`let res = walk(seeds, ...)`) and Pass 2 (`walk(&recon_seeds, ...)` inside `loop`) each construct their own `visited` set. Pass 2's `recon_seeds` are `seeds ∪ selected_libs' source_files`, so every header reached in Pass 1 is re-read + re-scanned in Pass 2. On a converged resolution this is roughly **2× the I/O and 2× the scanner work** for the same answer.

### 3. No tracing instrumentation

No `tracing::info_span!(\"ldf_walk\")` or `#[tracing::instrument]` in `walker.rs` or `lib.rs`. Adding spans around `walk()` and the Pass 2 loop would expose per-iteration timing in the daemon's existing log stream without external profilers.

### 4. Linear search-path scan per include

`crates/fbuild-header-scan/src/walker.rs:69-85`

```rust
for sp in search_paths {
    let candidate = sp.join(&inc.path);
    if candidate.is_file() { return Some(candidate); }
}
```

Each `#include` does up to `search_paths.len()` filesystem stat calls. With ~30 Teensy libraries each contributing 1–2 include dirs, that's 30–60 stats × (every #include in every file). The hot misses (e.g. unresolved system headers) are the worst because they walk the full list.

## Suggested scope (no design lock-in — pick what fits)

### A. Rayon-parallelize the walker frontier (biggest win)

- Process the BFS frontier in waves: each iteration pulls everything currently in the queue, fans out across rayon to read+scan in parallel, then merges the new edges back into a shared visited set guarded by a single `Mutex` (only locked at merge boundaries, not during scan).
- Keeps the BFS-order semantics; only the within-wave work runs concurrently. Scanner stays pure (no `Send`/`Sync` plumbing needed beyond what it already has).
- Target: ≥ 4× cold-scan speedup on an 8-core CI runner for the Teensy 4.1 corpus (~500 sources).

### B. Memoize scan results across Pass 1 and Pass 2

- Lift `visited` and a `HashMap<PathBuf, Vec<IncludeRef>>` cache out of `walk()` into `resolve()`. Pass `&mut WalkState` (visited + scan_cache) into a new `walk_with_state()` so Pass 2's recon walk reuses everything Pass 1 already learned.
- No behavior change — the merged include set is the union either way.
- Expected: ~halves cold-scan time on multi-pass resolutions (which is the common case once a library is selected).

### C. Add tracing spans

- `#[tracing::instrument(skip_all, fields(seeds = seeds.len(), search_paths = search_paths.len()))]` on `walk()` and `resolve()`.
- A `tracing::info_span!(\"ldf_pass\", pass = i)` around each Pass 2 iteration.
- Visible in the existing daemon log without new wiring.

### D. Precompute a header-name index per library (lower priority)

- Build a `HashMap<&str /* basename */, Vec<&FrameworkLibrary>>` once per resolve; `resolve(inc, …)` consults it before the linear search-path scan.
- Mostly speeds up unresolved-include misses (the dominant case on framework header walks). Skippable if A+B land.

### E. CI perf gate using existing benches

- `benches/resolve_cold.rs` and `benches/scan_throughput.rs` exist but aren't gated. Wire criterion-baseline into the workflow so regressions on cold scan / scan throughput fail PRs.

## Acceptance criteria

1. Cold `resolve()` on `teensy41` + Blink ≤ **100 ms** on CI runner (currently ≤ 200 ms gate per #205 AC#6).
2. Cold `resolve()` on `teensy41` Audio-shield example (larger transitive set) shows ≥ 2× speedup over current master.
3. Single-pass walk reads each file exactly once across Pass 1 + Pass 2 (assert in a test via a counting `Read` shim, or check via tracing-test).
4. Warm `resolve_cached()` is unchanged (still a single bincode deserialize from `zccache_artifact::KvStore`).
5. `tracing::Subscriber` capture shows `ldf_walk` and `ldf_pass` spans with durations.
6. New criterion baseline check in CI for `resolve_cold` and `scan_throughput`.

## Out of scope

- mmap-based reads (`memmap2`). Files are small (< 64 KB headers); the win is in parallelism + memoization, not read syscall path.
- Replacing `BTreeSet`/`HashSet` with `DashSet`. The wave-merge boundary is fine with a `Mutex` and avoids `DashSet`'s lookup overhead.
- ESP32 / AVR orchestrator wiring — separate task (out of scope per #214).
- Eviction policy for the library-selection KvStore. The existing zccache eviction handles it.

## Related

- #205 — original LDF design + acceptance gates this builds on
- #214 — production wiring of `resolve_cached` (teensy, stm32 orchestrators)
- `crates/fbuild-header-scan/benches/scan_throughput.rs`, `crates/fbuild-library-select/benches/resolve_{cold,warm}.rs` — existing perf harness
- FastLED/FastLED#2406 — consumer-side perf is one of the user-visible benefits

## Files touched (likely)

- `crates/fbuild-header-scan/src/walker.rs` — add `walk_with_state`, rayon wave-fan-out
- `crates/fbuild-header-scan/Cargo.toml` — add `rayon` (workspace dep)
- `crates/fbuild-library-select/src/lib.rs:57-157` — thread `WalkState` through resolve, add tracing spans
- `.github/workflows/ci.yml` (or equivalent) — gate `resolve_cold` / `scan_throughput` benches

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(library-select): #205 follow-up — parallelize BFS walker and memoize Pass 1 scans across Pass 2 #236

Summary

Observations

1. Walker is fully sequential

2. Pass 2 throws away Pass 1's work

3. No tracing instrumentation

4. Linear search-path scan per include

Suggested scope (no design lock-in — pick what fits)

A. Rayon-parallelize the walker frontier (biggest win)

B. Memoize scan results across Pass 1 and Pass 2

C. Add tracing spans

D. Precompute a header-name index per library (lower priority)

E. CI perf gate using existing benches

Acceptance criteria

Out of scope

Related

Files touched (likely)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

perf(library-select): #205 follow-up — parallelize BFS walker and memoize Pass 1 scans across Pass 2 #236

Description

Summary

Observations

1. Walker is fully sequential

2. Pass 2 throws away Pass 1's work

3. No tracing instrumentation

4. Linear search-path scan per include

Suggested scope (no design lock-in — pick what fits)

A. Rayon-parallelize the walker frontier (biggest win)

B. Memoize scan results across Pass 1 and Pass 2

C. Add tracing spans

D. Precompute a header-name index per library (lower priority)

E. CI perf gate using existing benches

Acceptance criteria

Out of scope

Related

Files touched (likely)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions