perf(library-select): #236 parallel BFS walker + scan memoization + tracing spans #237

Merged
zackees merged 1 commit into main from perf/issue-236-parallel-ldf-walker
May 12, 2026

zackees commented May 12, 2026

Summary

Closes #236 (proposals A, B, C).

The LDF resolver shipped under #205 was correct, but the cold scan left real performance on the table:

  • The walker was single-threaded BFS — one std::fs::read_to_string at a time.
  • Pass 2's reconciliation walk re-read every file Pass 1 already touched, because the visited set was local to each walk() invocation.
  • No tracing spans, so per-pass regressions were invisible without external profiling.

This PR closes all three.

TDD process (RED → GREEN)

Two failing tests went in first (crates/fbuild-library-select/tests/perf_tdd.rs):

  • pass2_reuses_pass1_scan_results_no_re_reads: builds a scenario where Wire is only reachable through SPI.cpp (forces a 2-pass reconciliation), then asserts files_read == included_files.len(). Without memoization, files_read overshoots included_files.len() by roughly 2× and the assertion fails.
  • resolve_emits_ldf_pass_and_ldf_walk_spans — uses tracing-test::traced_test to confirm both span names appear in the captured log.

Both tests failed to compile against main (no resolve_with_stats, no WalkState) — that's the RED gate. The implementation that follows makes them pass.

Implementation

fbuild-header-scan

  • New WalkState { visited, scan_cache, files_read } and walk_with_state(seeds, search_paths, &mut state).
  • BFS proceeds in waves. Each wave (1) reads not-yet-cached files in parallel via rayon (par_iter().filter_map(...).collect()), (2) merges the results serially into the scan cache and bumps files_read, then (3) resolves every include and queues the next wave.
  • walk() stays a thin wrapper around a fresh state for one-shot callers — all 8 existing walker tests pass unchanged.
  • walk_with_state carries a #[tracing::instrument(name = "ldf_walk", ...)] attribute.
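
The wave-based walk described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: it assumes an in-memory `Files` map in place of the filesystem, uses `std::thread::scope` where the PR uses rayon, and the helper names `scan_includes` and `resolve_include` plus the extra `files` parameter are hypothetical. Only `WalkState`, its three fields, and `walk_with_state` are named in the PR.

```rust
use std::collections::{HashMap, HashSet};
use std::thread;

/// Hypothetical stand-in for the filesystem: path -> file contents.
type Files = HashMap<String, String>;

/// State that survives across walks (field names follow the PR description).
#[derive(Default)]
struct WalkState {
    visited: HashSet<String>,
    scan_cache: HashMap<String, Vec<String>>, // path -> include names found in it
    files_read: usize,
}

/// Simplified scanner: pull the name out of each `#include "x"` / `#include <x>` line.
fn scan_includes(src: &str) -> Vec<String> {
    src.lines()
        .filter_map(|l| l.trim().strip_prefix("#include"))
        .map(|rest| rest.trim().trim_matches(|c: char| matches!(c, '"' | '<' | '>')).to_string())
        .collect()
}

/// Resolve an include name to the first `dir/name` present in `files`.
fn resolve_include(name: &str, search_paths: &[&str], files: &Files) -> Option<String> {
    search_paths.iter().map(|d| format!("{d}/{name}")).find(|p| files.contains_key(p))
}

/// Walk from `seeds`; returns only the files first reached during this call.
fn walk_with_state(seeds: &[String], search_paths: &[&str], files: &Files, state: &mut WalkState) -> Vec<String> {
    let mut reached = Vec::new();
    let mut wave: Vec<String> = seeds.to_vec();
    while !wave.is_empty() {
        // Drop paths seen in any earlier wave or walk; `insert` returns false for repeats.
        wave.retain(|p| state.visited.insert(p.clone()));
        let uncached: Vec<&String> = wave.iter().filter(|p| !state.scan_cache.contains_key(*p)).collect();
        // Fan out: one scoped thread per uncached file (the PR uses rayon here instead).
        let scanned: Vec<(String, Vec<String>)> = thread::scope(|s| {
            let handles: Vec<_> = uncached
                .iter()
                .map(|p| s.spawn(move || files.get(*p).map(|src| (p.to_string(), scan_includes(src)))))
                .collect();
            handles.into_iter().filter_map(|h| h.join().unwrap()).collect()
        });
        // Merge serially, so the cache and the counter need no locking.
        for (path, includes) in scanned {
            state.scan_cache.insert(path, includes);
            state.files_read += 1;
        }
        // Resolve every include from the cache and queue the next wave.
        let mut next = Vec::new();
        for path in &wave {
            for inc in state.scan_cache.get(path).into_iter().flatten() {
                next.extend(resolve_include(inc, search_paths, files));
            }
        }
        reached.extend(wave);
        wave = next;
    }
    reached
}

fn main() {
    let files: Files = [
        ("src/main.cpp", "#include <SPI.h>\n"),
        ("libs/SPI/SPI.h", "#include \"Wire.h\"\n"),
        ("libs/Wire/Wire.h", ""),
    ]
    .into_iter()
    .map(|(p, s)| (p.to_string(), s.to_string()))
    .collect();
    let search = ["libs/SPI", "libs/Wire"];
    let seeds = ["src/main.cpp".to_string()];
    let mut state = WalkState::default();
    let first = walk_with_state(&seeds, &search, &files, &mut state);
    let again = walk_with_state(&seeds, &search, &files, &mut state);
    println!("first: {} files, second: {} files, reads: {}", first.len(), again.len(), state.files_read);
}
```

Keeping the merge step serial is what lets the parallel phase stay free of shared mutable state: worker threads only return `(path, includes)` pairs, and a second walk over the same seeds performs no reads at all.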

fbuild-library-select

  • New ResolveStats { files_read, passes } and resolve_with_stats(). resolve() now delegates to resolve_with_stats(...).0.
  • A single WalkState is threaded through Pass 1 and the reconciliation loop. Each pass runs inside an info_span!("ldf_pass", pass = N).
  • Library-attribution against the per-pass delta is equivalent to the old full-set check, because a library can only become newly-selected via a path reached for the first time in this pass — paths reached in earlier passes already had their lib-attribution chance.
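
To make the per-pass delta argument concrete, here is a small sketch of a multi-pass resolver that threads one `WalkState` through every pass and attributes libraries only against the files newly reached in that pass. It is an assumption-laden illustration: the `Library` and `Selection` shapes, the `libraries` field, the in-memory `Files` map, the serial worklist walker, and the way passes are counted are all hypothetical. Only `WalkState`, `ResolveStats { files_read, passes }`, `resolve_with_stats`, and `included_files` come from the PR.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical in-memory source tree: path -> contents.
type Files = HashMap<&'static str, &'static str>;

/// Trimmed-down walk state (the real one also carries a scan cache).
#[derive(Default)]
struct WalkState { visited: HashSet<String>, files_read: usize }

#[derive(Debug, Default)]
struct ResolveStats { files_read: usize, passes: usize }

#[derive(Debug, Default)]
struct Selection { libraries: BTreeSet<&'static str>, included_files: Vec<String> }

struct Library { name: &'static str, dir: &'static str, sources: Vec<&'static str> }

/// Serial memoized walk; returns only the files first reached in this call.
fn walk_with_state(seeds: &[String], dirs: &[&str], files: &Files, st: &mut WalkState) -> Vec<String> {
    let mut queue: Vec<String> = seeds.to_vec();
    let mut delta = Vec::new();
    while let Some(path) = queue.pop() {
        if !st.visited.insert(path.clone()) { continue; } // already read in an earlier pass
        let Some(src) = files.get(path.as_str()) else { continue };
        st.files_read += 1;
        for line in src.lines() {
            if let Some(rest) = line.trim().strip_prefix("#include") {
                let name = rest.trim().trim_matches(|c: char| matches!(c, '"' | '<' | '>'));
                queue.extend(dirs.iter().map(|d| format!("{d}/{name}")).find(|p| files.contains_key(p.as_str())));
            }
        }
        delta.push(path);
    }
    delta
}

fn resolve_with_stats(seeds: &[&str], libs: &[Library], files: &Files) -> (Selection, ResolveStats) {
    let dirs: Vec<&str> = libs.iter().map(|l| l.dir).collect();
    let (mut sel, mut stats, mut state) = (Selection::default(), ResolveStats::default(), WalkState::default());
    let mut frontier: Vec<String> = seeds.iter().map(|s| s.to_string()).collect();
    while !frontier.is_empty() {
        stats.passes += 1;
        let delta = walk_with_state(&frontier, &dirs, files, &mut state);
        frontier.clear();
        // Attribute only the delta: files reached in earlier passes were already checked.
        for lib in libs {
            let hit = delta.iter().any(|p| p.starts_with(lib.dir));
            if hit && sel.libraries.insert(lib.name) {
                // A newly selected library contributes its sources as next-pass seeds.
                frontier.extend(lib.sources.iter().map(|s| s.to_string()));
            }
        }
        sel.included_files.extend(delta);
    }
    stats.files_read = state.files_read;
    (sel, stats)
}

/// Demo tree in which Wire is only reachable through SPI.cpp.
fn demo() -> (Files, Vec<Library>) {
    let files = HashMap::from([
        ("src/main.cpp", "#include <SPI.h>"),
        ("libs/SPI/SPI.h", ""),
        ("libs/SPI/SPI.cpp", "#include \"SPI.h\"\n#include <Wire.h>"),
        ("libs/Wire/Wire.h", ""),
        ("libs/Wire/Wire.cpp", "#include \"Wire.h\""),
    ]);
    let libs = vec![
        Library { name: "SPI", dir: "libs/SPI", sources: vec!["libs/SPI/SPI.cpp"] },
        Library { name: "Wire", dir: "libs/Wire", sources: vec!["libs/Wire/Wire.cpp"] },
    ];
    (files, libs)
}

fn main() {
    let (files, libs) = demo();
    let (sel, stats) = resolve_with_stats(&["src/main.cpp"], &libs, &files);
    // The memoization contract from the TDD gate: every included file is read exactly once.
    assert_eq!(stats.files_read, sel.included_files.len());
    println!("{:?} {:?}", sel.libraries, stats);
}
```

Because the shared state rejects any path it has already visited, the later pass that walks SPI.cpp resolves SPI.h without reading it again, which is the property the `files_read == included_files.len()` assertion checks.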

Measured performance

uv run soldr cargo bench -p fbuild-library-select -- --quick:

| Bench | Δ time (new time) | Δ throughput |
| --- | --- | --- |
| resolve/cold_30_libs_chain_5 | -35.4 % (3.27 ms) | +54.8 % |
| resolve/warm_30_libs_chain_5 | -26.7 % (1.17 ms) | +36.5 % |

Cold-resolve gain is dominated by parallel file reads + the deduplicated Pass 1/Pass 2 scans; warm-resolve gain comes from the faster cache-key construction path that benefits from the same memoization.
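
The two delta columns are two views of the same measurement. Assuming throughput is proportional to 1 / time, a relative time change δ implies a throughput change of 1 / (1 + δ) − 1. The sketch below (function names are illustrative only) recomputes the throughput deltas and the implied baselines from the rounded figures in the table.

```rust
/// Relative throughput change implied by a relative time change,
/// assuming throughput is proportional to 1 / time.
fn throughput_delta(time_delta: f64) -> f64 {
    1.0 / (1.0 + time_delta) - 1.0
}

/// Baseline time implied by the new time and its relative change.
fn baseline_ms(new_ms: f64, time_delta: f64) -> f64 {
    new_ms / (1.0 + time_delta)
}

fn main() {
    // (bench, new time in ms, relative time change) taken from the table above.
    for (bench, new_ms, dt) in [("cold", 3.27, -0.354), ("warm", 1.17, -0.267)] {
        println!(
            "{bench}: throughput {:+.1} %, implied baseline ~{:.2} ms",
            100.0 * throughput_delta(dt),
            baseline_ms(new_ms, dt)
        );
    }
}
```

With the rounded inputs this reproduces the +54.8 % cold figure and gives about +36.4 % for warm, within rounding of the reported +36.5 %. The implied baselines of roughly 5.06 ms and 1.60 ms agree with the ~5.1 ms and ~1.6 ms quoted in the commit message.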

Test plan

  • uv run soldr cargo test -p fbuild-library-select --test perf_tdd (2 passed — the new TDD gates)
  • uv run soldr cargo test -p fbuild-header-scan -p fbuild-library-select (51 + 19 passed — full existing coverage)
  • uv run soldr cargo test -p fbuild-build --lib (499 passed — orchestrator consumers)
  • uv run soldr cargo clippy --workspace --all-targets -- -D warnings (clean)
  • uv run soldr cargo fmt --all (clean)
  • uv run soldr cargo bench -p fbuild-library-select -- --quick (35% / 27% improvements documented above)
  • Production cold-scan measurement on a real teensy41 FastLED project (best done by hand or in a follow-up after merge; it would also let us tune the AC#1 ≤ 100 ms target from #236).

Out of scope (deferred to follow-up issues)

  • Proposal D — header-name precompute index for the search-path scan.
  • Proposal E — CI gates that fail PRs on resolve_cold / scan_throughput regressions (the benches exist; only the workflow wiring is missing).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added stateful header scanning API with memoized file caching for improved performance across multiple passes.
    • Introduced resolver statistics reporting to track file reads and pass counts.
  • Documentation

    • Added test documentation for integration test scope and performance gates.
  • Tests

    • Added performance and contract verification tests for multi-pass resolution and tracing observability.

…oization + tracing spans

The LDF resolver shipped under #205 was correct but left real perf on the
table: the walker was single-threaded and Pass 2 re-read every file Pass 1
had already touched. This PR closes both gaps.

Changes:
- fbuild-header-scan: add `WalkState` (visited + scan cache + files_read
  counter) and `walk_with_state()`. BFS now reads each wave's files in
  parallel via rayon, and the scan cache persists across calls so callers
  that walk multiple seed sets only pay for each file once. `walk()` stays
  a thin wrapper for one-shot callers. `walk_with_state` is wrapped in an
  `ldf_walk` tracing span.
- fbuild-library-select: add `ResolveStats { files_read, passes }` and
  `resolve_with_stats()`. `resolve()` now delegates. A single `WalkState`
  is threaded through Pass 1 and the reconciliation loop, and each pass
  runs inside an `ldf_pass` span. Library-attribution against the
  per-pass delta is equivalent to the old full-set check because a lib
  can only become newly-selected via a path reached for the first time
  in this pass.

TDD gates (crates/fbuild-library-select/tests/perf_tdd.rs):
- `pass2_reuses_pass1_scan_results_no_re_reads` -- asserts
  `files_read == included_files.len()` over a 2-pass scenario where
  Wire is only reachable through SPI.cpp.
- `resolve_emits_ldf_pass_and_ldf_walk_spans` -- asserts both spans are
  visible via tracing-test.

Measured perf (crates/fbuild-library-select/benches/, --quick):
- resolve_cold:  -35% time (3.27 ms vs ~5.1 ms baseline).
- resolve_warm:  -27% time (1.17 ms vs ~1.6 ms baseline).

Behavior unchanged: all 8 walker tests, all 10 resolver tests, all 7
cache tests, and the full fbuild-build 499-test suite stay green.

Closes #236 (proposals A, B, C). Proposals D (header-name precompute) and
E (CI bench gates) tracked as separate follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a54f84e6-7db5-42e7-9384-39b3ac4b1b83

📥 Commits

Reviewing files that changed from the base of the PR and between dd134d2 and 408e43a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • Cargo.toml
  • crates/fbuild-header-scan/Cargo.toml
  • crates/fbuild-header-scan/src/lib.rs
  • crates/fbuild-header-scan/src/walker.rs
  • crates/fbuild-library-select/Cargo.toml
  • crates/fbuild-library-select/src/lib.rs
  • crates/fbuild-library-select/tests/README.md
  • crates/fbuild-library-select/tests/perf_tdd.rs

Walkthrough

This PR implements issue #236: a performance optimization that parallelizes the include-graph walker with rayon, introduces cross-pass file-scan memoization via WalkState, and adds tracing instrumentation to the LDF resolver. The changes eliminate redundant file reads when Pass 2's reconciliation re-walks the same frontier as Pass 1.

Changes

Parallel and Memoized Header Scanner with Multi-Pass Optimization

| Layer / File(s) | Summary |
| --- | --- |
| Dependencies and public API surface: Cargo.toml, crates/fbuild-header-scan/Cargo.toml, crates/fbuild-library-select/Cargo.toml, crates/fbuild-header-scan/src/lib.rs | Workspace adds rayon and tracing-test dependencies. fbuild-header-scan links rayon and tracing; fbuild-library-select adds tracing-test for dev. Public exports of walk_with_state and WalkState from scanner crate root. |
| Stateful parallel walker with wave-based BFS: crates/fbuild-header-scan/src/walker.rs | WalkState holds shared visited set, per-file scan cache, and read counter. walk_with_state() processes BFS frontiers in parallel waves: each wave fans out via rayon to read+scan uncached files, resolves includes from cache, and returns only newly discovered files. walk() is a convenience wrapper allocating fresh state. Include-resolution helper renamed to resolve_include. |
| Multi-pass resolver with stateful walking and tracing: crates/fbuild-library-select/src/lib.rs | ResolveStats struct reports aggregated files_read and pass count. resolve_with_stats() threads a shared WalkState through Pass 1 and all reconciliation passes, reusing cached scans across iterations. Wraps each pass in ldf_pass and ldf_walk tracing spans. resolve() delegates to resolve_with_stats() for backward compatibility. |
| Integration tests and documentation: crates/fbuild-library-select/tests/README.md, crates/fbuild-library-select/tests/perf_tdd.rs | Test README documents integration test scope and perf/TDD expectations. perf_tdd.rs adds two tests: pass2_reuses_pass1_scan_results_no_re_reads asserts that stats.files_read equals selection.included_files.len() across ≥2 passes; resolve_emits_ldf_pass_and_ldf_walk_spans verifies tracing spans are emitted. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit hops through files with glee,
No re-reads with rayon's decree.
Pass 1 scans, Pass 2 reuses—
The walker wins where once it loses.
Tracing spans light the way,
✨ Performance saved the day! 🐰

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly matches the PR's main objective: implementing parallel BFS walker, scan memoization, and tracing spans for performance improvements on library-select, addressing issue #236. |
| Linked Issues check | ✅ Passed | The PR fully implements all accepted proposals from #236: parallel BFS walker (Proposal A), memoization across passes (Proposal B), tracing spans (Proposal C), with TDD tests verifying single-read behavior and span emission. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to #236 acceptance criteria: walker parallelization, scan memoization, tracing instrumentation, and corresponding test coverage. No unrelated modifications were introduced. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@zackees zackees merged commit 6d4c0f7 into main May 12, 2026
89 checks passed
zackees added a commit that referenced this pull request May 23, 2026
… release v2.2.5 (#265)

Closes #263.

## The regression

fbuild 2.2.4 broke ALL teensy41 examples on FastLED's CI: every example
link-fails with multiple-definition errors on every FastLED symbol.

Root cause: the LDF resolver (introduced as `perf(library-select)` in
PR #237) selects the framework's bundled FastLED library at
`cores/teensy4/.../libraries/FastLED/` even when the user's project
ships its own FastLED. The bundled library's source files get appended
to `core_sources` (teensy orchestrator.rs:207), get compiled into the
build, and produce duplicate symbols at link time.

The path-prefix attribution in `fbuild-library-select::resolve` can
mis-attribute a `#include <FastLED.h>` when the user's transitive
includes resolve into the bundled library — even though the user's
project owns `FastLED.h` directly.

## The fix

New `filter_framework_libs_shadowed_by_project(libraries, roots)` in
`framework_libs.rs` drops any framework library whose primary header
(`<lib_name>.h`) is shadowed by a same-basename header anywhere under
the project's include roots. Applied at the start of both the cached
and non-cached resolver paths.

Conservative: only drops a library when the project itself ships a
header matching the library's canonical name. Other framework
libraries (SPI, Wire, etc.) are unaffected.
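
A rough sketch of the shadowing rule described above, under stated assumptions: the
`FrameworkLib` struct and the `collect_header_names` helper are hypothetical, and
the real function in `framework_libs.rs` may take different types. Only the function
name and its `(libraries, roots)` parameters come from the commit message.

```rust
use std::collections::HashSet;
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical minimal view of a framework library.
#[derive(Debug, Clone, PartialEq)]
struct FrameworkLib { name: String }

/// Collect the basenames of every `.h` file under `root`, recursively.
fn collect_header_names(root: &Path, out: &mut HashSet<String>) {
    let Ok(entries) = fs::read_dir(root) else { return };
    for entry in entries.flatten() {
        let path = entry.path();
        if path.is_dir() {
            collect_header_names(&path, out);
        } else if path.extension().is_some_and(|e| e == "h") {
            if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
                out.insert(name.to_string());
            }
        }
    }
}

/// Drop framework libraries whose primary header `<name>.h` also exists
/// anywhere under the project's include roots.
fn filter_framework_libs_shadowed_by_project(libraries: Vec<FrameworkLib>, roots: &[PathBuf]) -> Vec<FrameworkLib> {
    let mut project_headers = HashSet::new();
    for root in roots {
        collect_header_names(root, &mut project_headers);
    }
    libraries
        .into_iter()
        .filter(|lib| !project_headers.contains(&format!("{}.h", lib.name)))
        .collect()
}

fn main() -> std::io::Result<()> {
    // Throwaway project tree that ships its own FastLED.h.
    let root = std::env::temp_dir().join(format!("shadow_demo_{}", std::process::id()));
    fs::create_dir_all(root.join("src"))?;
    fs::write(root.join("src/FastLED.h"), "// project-owned header\n")?;
    let libs = ["FastLED", "SPI", "Wire"].map(|n| FrameworkLib { name: n.into() });
    let kept = filter_framework_libs_shadowed_by_project(libs.to_vec(), &[root.clone()]);
    println!("{:?}", kept.iter().map(|l| &l.name).collect::<Vec<_>>());
    fs::remove_dir_all(&root)
}
```

Matching on the header basename alone keeps the rule conservative in the sense
described above: a library is dropped only when the project ships a header with the
library's canonical name.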

## Tests

- `project_is_the_library_does_not_pull_in_bundled_copy` — the
  simpler case (project src/FastLED.h, framework libraries/FastLED/);
  passed before the fix too (the resolver handled this case via
  path-prefix attribution) but stays as a regression gate.
- `example_only_root_does_not_pull_in_bundled_fastled_when_user_owns_fastled`
  — the failing case (per-example walker root doesn't see the repo's
  src/, but the user owns FastLED at a higher level). Demonstrates the
  filter dropping the bundled library before the resolver runs.

Full workspace cargo check / clippy / fmt / test all green.

## Release v2.2.5

Patch release rolling up:
- THIS fix (#263 regression repair)
- The LTO-tmpdir fix from #261 / PR #262 (Windows MSYS `mv` path collapse)

Cargo.toml + pyproject.toml bumped to 2.2.5.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

perf(library-select): #205 follow-up — parallelize BFS walker and memoize Pass 1 scans across Pass 2
