Summary
#489 moved include/exclude filtering to the CLI walk seam:
expand_seed_paths (big-code-analysis-cli/src/lib.rs) now walks each
directory seed with ignore::WalkBuilder, filters each candidate at the
walk root via walk_seed::match_path_for + WalkFilters::passes, and
passes the resulting flat file list to the library with
empty include/exclude globsets (FilesData { include: GlobSet::empty(), exclude: GlobSet::empty() }). This fixed the
path-form bugs (#488/#489) but left two rough edges worth a
follow-up.
1. Dead library globsets + double per-file stat
The library explore() (src/concurrent_files.rs) still runs
WalkDir::new(path) on every entry it is handed and still applies its
own globset filter (Filters::matches). With a pre-expanded file list
and empty globsets, that filter is a guaranteed no-op and each file is
re-stated a second time (once by the ignore walk in
expand_seed_paths, once by the library WalkDir). Not a full second
tree traversal (single-file WalkDir yields one entry), but O(N)
redundant stats and two filtering mechanisms for one job.
2. Latent re-introduction of the #488 path-form bug
The empty-globset hand-off is load-bearing and undocumented at the
FilesData construction site beyond a comment: the library globset
matcher matches against the walker's emitted (possibly absolute) path
form — exactly the path-form dependence #488 removed. Any future code
path that injects files into FilesData.paths after expand_seed_paths
(or sets non-empty library globsets expecting them to apply) silently
re-inherits the flood-the-offender-set bug. [check.exclude] (#493) is
the existing instance of a filter matching the emitted path instead of
the anchored form.
Suggested direction
Consolidate to a single anchored walk seam: have the library walk accept
an already-resolved file set as terminal (no re-walk, no globset params),
or push the anchored match_path_for filtering into the one library
walk so the emitted path is canonical root-relative once and every
consumer (include/exclude, [check.exclude] #493, baseline keys) matches
that single invariant — removing the per-filter/per-mechanism anchoring
(reanchor_seed for emission + match_path_for for matching + the dead
library globsets).
Acceptance
- One walk + one filtering mechanism; the library globset params are
either removed or genuinely used.
- No O(N) redundant per-file
stat.
- A new path-consuming filter cannot silently re-inherit the
#488
path-form dependence.
Found during the final /code-review of the gate-consolidation epic
(#482). Introduced by the #488/#489 rework; not a correctness bug
today (the CLI pre-filters correctly), but an architecture/perf
follow-up. Related: #493.
Status (updated)
Scope: 2.0-deferred (API reshape). Investigation showed the
structural acceptance criteria need a SemVer-breaking FilesData reshape
and/or moving gitignore-aware walking into the library, and that
canonical emission conflicts with the #488 PyO3-parity contract — see the
analysis comment below. The 1.x-safe part (documenting the path-form
invariant at the public API surface, item 2) landed in 997720dc. The
structural single-seam consolidation remains open as a 2.0 item.
Summary
#489moved include/exclude filtering to the CLI walk seam:expand_seed_paths(big-code-analysis-cli/src/lib.rs) now walks eachdirectory seed with
ignore::WalkBuilder, filters each candidate at thewalk root via
walk_seed::match_path_for+WalkFilters::passes, andpasses the resulting flat file list to the library with
empty include/exclude globsets (
FilesData { include: GlobSet::empty(), exclude: GlobSet::empty() }). This fixed thepath-form bugs (
#488/#489) but left two rough edges worth afollow-up.
1. Dead library globsets + double per-file stat
The library
explore()(src/concurrent_files.rs) still runsWalkDir::new(path)on every entry it is handed and still applies itsown globset filter (
Filters::matches). With a pre-expanded file listand empty globsets, that filter is a guaranteed no-op and each file is
re-
stated a second time (once by theignorewalk inexpand_seed_paths, once by the libraryWalkDir). Not a full secondtree traversal (single-file
WalkDiryields one entry), but O(N)redundant
stats and two filtering mechanisms for one job.2. Latent re-introduction of the
#488path-form bugThe empty-globset hand-off is load-bearing and undocumented at the
FilesDataconstruction site beyond a comment: the library globsetmatcher matches against the walker's emitted (possibly absolute) path
form — exactly the path-form dependence
#488removed. Any future codepath that injects files into
FilesData.pathsafterexpand_seed_paths(or sets non-empty library globsets expecting them to apply) silently
re-inherits the flood-the-offender-set bug.
[check.exclude](#493) isthe existing instance of a filter matching the emitted path instead of
the anchored form.
Suggested direction
Consolidate to a single anchored walk seam: have the library walk accept
an already-resolved file set as terminal (no re-walk, no globset params),
or push the anchored
match_path_forfiltering into the one librarywalk so the emitted path is canonical root-relative once and every
consumer (include/exclude,
[check.exclude]#493, baseline keys) matchesthat single invariant — removing the per-filter/per-mechanism anchoring
(
reanchor_seedfor emission +match_path_forfor matching + the deadlibrary globsets).
Acceptance
either removed or genuinely used.
stat.#488path-form dependence.
Found during the final
/code-reviewof the gate-consolidation epic(
#482). Introduced by the#488/#489rework; not a correctness bugtoday (the CLI pre-filters correctly), but an architecture/perf
follow-up. Related:
#493.Status (updated)
Scope: 2.0-deferred (API reshape). Investigation showed the
structural acceptance criteria need a SemVer-breaking
FilesDatareshapeand/or moving gitignore-aware walking into the library, and that
canonical emission conflicts with the #488 PyO3-parity contract — see the
analysis comment below. The 1.x-safe part (documenting the path-form
invariant at the public API surface, item 2) landed in
997720dc. Thestructural single-seam consolidation remains open as a 2.0 item.