B5 layered ignore patterns (PR-C) #10
Co-authored-by: Cursor <cursoragent@cursor.com>
Review: PR-C — B5 layered ignore patterns +
| Sentinel | Status |
|---|---|
| `ONTOLOGY_VERSION` value bump | ✅ unchanged; the one match is just a re-imported symbol |
| `CREATE NODE/REL TABLE` / `DROP TABLE` | ✅ zero |
| `Route(`, `HTTP_CALLS`, `ASYNC_CALLS`, `find_route_callers` | ✅ zero |
| `BrownfieldOverrides`, `CodebaseRoute` | ✅ zero |
| `analyze_pr` impl | ✅ zero; the 6 hits are README context lines + B4 test names |
| `EXPOSES` impl | ✅ zero; the 4 hits are README context lines for existing tools |
| New `@mcp.tool` registrations | ✅ exactly 1: `diagnose_ignore` (as planned) |
| Schema column declarations changed | ✅ zero |
Plan compliance — § PR-C
| # | Plan step | Verified |
|---|---|---|
| 1 | New `path_filtering.py` with `IgnoreLayer` dataclass + `LayeredIgnore` class | ✅ `path_filtering.py:62, 223` |
| 2 | 4-layer order `builtin_default` → `project_root` → `nested` → `gitignore` (later wins; negation honoured) | ✅ via `_mega_build_for_rel` (`path_filtering.py:138`): replays patterns in order, prefix-translating nested anchors |
| 3 | `is_ignored(path) -> tuple[bool, IgnoreLayer \| None]` | |
| 4 | `diagnose(path)` multi-line explanation | ✅ cites file path, layer, line number, and pattern (verified manually) |
| 5 | Replace `COMMON_EXCLUDED_PATH_PATTERNS` call sites in `graph_enrich.py` + `java_index_flow_lancedb.py` | ✅ all callers migrated; legacy module loses 72 LOC |
| 6 | `iter_java_source_files` signature change with `@overload` deprecation shim | ✅ `path_filtering.py:354-377`: `DeprecationWarning` fires correctly (verified live) |
| 7 | `diagnose_ignore` MCP tool | ✅ `server.py:772` |
| 8 | Tests 39-46 + e2e ext + MCP smoke + deprecation/legacy | ✅ all present in `tests/test_path_filtering.py` (178 LOC) |
| 9 | `pathspec` in deps | ✅ added to `pyproject.toml` (already pinned in `requirements.txt`) |
| 10 | README "Ignore patterns" section | ✅ |
Tests
210 passed, 4 skipped
+11 passed / +1 skipped vs PR-B baseline (199/3). The +11 covers the planned 8 unit tests (39-46) + 1 e2e extension + 1 MCP smoke + 1 deprecation/legacy parity test. The +1 skip is a new heavy LANCEDB e2e gated by LANCEDB_MCP_RUN_HEAVY=1. ✅
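For readers unfamiliar with the gating pattern, here is a minimal sketch of how a heavy test can be skipped unless `LANCEDB_MCP_RUN_HEAVY=1` is set. The helper name `heavy_enabled`, the test class, and the test name are hypothetical, and the sketch uses `unittest.skipUnless` (which pytest also honours) rather than whatever marker the real suite uses.

```python
import os
import unittest


def heavy_enabled(env=os.environ) -> bool:
    """Return True only when the gate variable is exactly "1" (assumed convention)."""
    return env.get("LANCEDB_MCP_RUN_HEAVY") == "1"


class HeavyLanceE2E(unittest.TestCase):
    # Hypothetical test: reported as skipped (not failed) when the gate is unset.
    @unittest.skipUnless(heavy_enabled(), "set LANCEDB_MCP_RUN_HEAVY=1 to run")
    def test_ignore_reduces_indexed_files(self):
        # The real test would index the fixture twice and compare filename sets.
        pass
```

A skip of this kind shows up in the summary line as one extra skipped test, which is consistent with the +1 skip noted above.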
Manual evidence reproduced
Smoke fixture (tests/fixtures/lancedb_ignore_smoke/):
With ignore (.lancedb-mcp/ignore = "**/generated/**"):
['src/main/java/com/example/Real.java']
Without ignore:
['src/main/java/com/example/Real.java',
'src/main/java/com/example/generated/SkipMe.java']
✅ Matches PR description: 1 vs 2 files.
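The 1-vs-2 result can be reproduced in miniature with the standard library alone. This is a sketch under assumptions: `indexed_java_files` and `build_fixture` are hypothetical helpers, and `fnmatchcase` stands in for `pathspec.GitIgnoreSpec`, so it only approximates gitignore matching (its `*` crosses `/` boundaries, and `**/generated/**` would not match a top-level `generated/` directory here).

```python
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path


def indexed_java_files(root: Path, ignore_patterns: list[str]) -> list[str]:
    """Walk every .java file, then filter per file (no directory pruning)."""
    kept = []
    for path in root.rglob("*.java"):
        rel = path.relative_to(root).as_posix()
        # Simplified matcher; the PR uses pathspec's gitignore semantics instead.
        if not any(fnmatchcase(rel, pat) for pat in ignore_patterns):
            kept.append(rel)
    return sorted(kept)


def build_fixture(root: Path) -> None:
    """Recreate the two-file layout described for the smoke fixture."""
    for rel in (
        "src/main/java/com/example/Real.java",
        "src/main/java/com/example/generated/SkipMe.java",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("class Placeholder {}\n")


with tempfile.TemporaryDirectory() as tmp:
    fixture_root = Path(tmp)
    build_fixture(fixture_root)
    with_ignore = indexed_java_files(fixture_root, ["**/generated/**"])
    without_ignore = indexed_java_files(fixture_root, [])
```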
Bank-chat parity (use_gitignore=False, default for build_ast_graph.py):
[pass4] Route extraction: emitted=11, exposes=11, routes_resolved_pct=81.8,
routes_from_brownfield_pct=0.0
✅ Identical to PR-A2/A3 baseline — zero regression.
diagnose() live output:
Excluded by /…/lancedb_ignore_smoke/.lancedb-mcp/ignore (project_root) at line 1: '**/generated/**'
Cites the layer, file path, line number, and pattern verbatim — exactly the multi-line format the plan called out. Outside-project paths get a distinct message. ✅
Deprecation shim — iter_java_source_files(root, ['**/Generated/**']) fires:
DeprecationWarning: iter_java_source_files(root, exclude_globs) is deprecated;
use iter_java_source_files(root, ignore=LayeredIgnore(root, …))
Clear migration path. ✅
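A shim of this shape can be sketched as follows. Everything here is an assumption for illustration: the `LayeredIgnore` class is a hypothetical stand-in with an invented constructor and a boolean `is_ignored`, and the function body is not the code at `path_filtering.py:354-377`. Only the warning text is taken from the output above.

```python
import tempfile
import warnings
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterator, Optional, Sequence, overload


class LayeredIgnore:
    """Hypothetical stand-in for the real class in path_filtering.py."""

    def __init__(self, root: Path, patterns: Sequence[str] = ()) -> None:
        self.root = root
        self.patterns = list(patterns)

    def is_ignored(self, path: Path) -> bool:
        rel = path.relative_to(self.root).as_posix()
        # fnmatchcase is a simplification, not gitignore matching.
        return any(fnmatchcase(rel, pat) for pat in self.patterns)


@overload
def iter_java_source_files(root: Path, *, ignore: Optional[LayeredIgnore] = None) -> Iterator[Path]: ...
@overload
def iter_java_source_files(root: Path, exclude_globs: Sequence[str]) -> Iterator[Path]: ...


def iter_java_source_files(root, exclude_globs=None, *, ignore=None):
    if exclude_globs is not None:
        # Legacy positional form: warn at call time, then adapt to the new object.
        warnings.warn(
            "iter_java_source_files(root, exclude_globs) is deprecated; "
            "use iter_java_source_files(root, ignore=LayeredIgnore(root, …))",
            DeprecationWarning,
            stacklevel=2,
        )
        ignore = LayeredIgnore(root, exclude_globs)
    if ignore is None:
        ignore = LayeredIgnore(root)
    # Return a generator expression (not `yield`) so the warning is not deferred.
    return (p for p in sorted(root.rglob("*.java")) if not ignore.is_ignored(p))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Generated").mkdir()
    (root / "Generated" / "A.java").write_text("class A {}\n")
    (root / "B.java").write_text("class B {}\n")

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy = [p.name for p in iter_java_source_files(root, ["Generated/**"])]
        modern = [
            p.name
            for p in iter_java_source_files(root, ignore=LayeredIgnore(root, ["Generated/**"]))
        ]
```

Keeping the public function a plain function that returns an iterator, rather than a generator function, means the `DeprecationWarning` fires at the call site instead of on first iteration.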
Notes that earned my trust
- Real gitignore semantics: `_winning_row` returns "the last rule line that changes the cumulative match result", not just "highest-rank layer that matched". This is what `git check-ignore` actually does and is critical for negation correctness. A naive "later layer wins" would fail the §4.3 case where a project-root `*.java` ignore is locally re-included by a nested `!**/Real.java`. Worth keeping in your head when answering user questions about `diagnose_ignore` output.
- Nested-ignore anchor translation: `_prefix_line_to_project` re-anchors patterns from a nested ignore to project-relative form before joining the mega-spec. Without this, `**/build/**` in a nested `svc-a/.lancedb-mcp/ignore` would also match `svc-b/build/`, which is wrong. Implementation handles `!`-prefixed lines correctly when prepending the anchor.
- Permissive walk + per-file `LayeredIgnore` for CocoIndex: the PR description correctly notes that pruning directories early would be unsound when negation can un-ignore under a pruned dir. This implementation avoids that bug class entirely (at the cost of walking more files; acceptable for this codebase's size).
- Cheap-path early exit for "no negation anywhere": `_scan_negation_any_*` lets the common case (no `!` lines) skip the per-file mega-build. Nice optimisation that doesn't compromise correctness.
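The first two notes can be illustrated with a small model. This is a sketch, not the code in `path_filtering.py`: `Rule`, `prefix_line_to_project`, and `winning_rule` are hypothetical names, `fnmatchcase` replaces `pathspec` matching, and comments, `\!` escapes, and slash-less nested patterns are not handled.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Rule:
    layer: str    # e.g. "project_root" or "nested"
    source: str   # ignore file the line came from
    line_no: int
    pattern: str  # project-relative line, possibly starting with "!"


def prefix_line_to_project(line: str, nested_dir: str) -> str:
    """Re-anchor a nested ignore line under its directory, keeping a leading '!'."""
    negated = line.startswith("!")
    body = line[1:] if negated else line
    anchored = f"{nested_dir.rstrip('/')}/{body.lstrip('/')}"
    return "!" + anchored if negated else anchored


def winning_rule(rel_path: str, rules: list[Rule]) -> tuple[bool, Optional[Rule]]:
    """Replay rules in order; the winner is the last line that flips the result."""
    ignored, winner = False, None
    for rule in rules:
        negated = rule.pattern.startswith("!")
        glob = rule.pattern[1:] if negated else rule.pattern
        if not fnmatchcase(rel_path, glob):
            continue
        new_state = not negated
        if new_state != ignored:
            ignored, winner = new_state, rule
    return ignored, winner


# The §4.3 shape: root ignores *.java, a nested file re-includes Real.java.
rules = [
    Rule("project_root", ".lancedb-mcp/ignore", 1, "*.java"),
    Rule("nested", "svc-a/.lancedb-mcp/ignore", 1,
         prefix_line_to_project("!**/Real.java", "svc-a")),
]
kept = winning_rule("svc-a/src/Real.java", rules)
dropped = winning_rule("svc-b/src/Real.java", rules)
```

In this model `kept` reports the nested negation as the deciding line, while `dropped` still points at the project-root rule, because the re-anchored negation no longer reaches `svc-b/`.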
Observations (non-blocking)
- `requirements.txt` pins `pathspec==1.0.4` but `pyproject.toml` adds `pathspec>=0.12,<2`. Both versions support `GitIgnoreSpec` (verified on 1.0.4: negation works), so this is fine, but the `requirements.txt` pin pre-dated this PR (was already there before B5 needed it). Consider tightening `pyproject.toml` to match the pinned version, or loosening `requirements.txt` so both files are clearly in sync. Minor doc drift.
- `_scan_negation_any_*` walks `rglob` at construction time. For very large monorepos (10k+ dirs), the `rglob(".lancedb-mcp/ignore")` and `rglob(".gitignore")` walks in `__init__` could be measurable. Cache hit on `LayeredIgnore` reuse mitigates this; worth knowing if you ever build it inside a tight loop.
- `diagnose()` shows the resolved file path as absolute, not relative. The plan example showed `/repo/svc-a`; the implementation prints the full host path. Slight verbosity in CLI output but accurate. Could relativise to project root for cleaner output if the path lies under it. Trivial cosmetic.
- `is_relative_path_excluded` and `compile_excluded_glob_patterns` are re-exported as legacy shims in `path_filtering.py:44, 51`; they're kept for backward-compat with anything that still imports them from `java_index_v1_common`. Good defensive move; consider a `# pragma: no cover` or scheduled removal note in a follow-up.
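The relativising suggestion for `diagnose()` could look roughly like this. `display_path` is a hypothetical helper, not a function in the PR, and the example paths are illustrative only.

```python
from pathlib import PurePath, PurePosixPath


def display_path(ignore_file: PurePath, project_root: PurePath) -> str:
    """Show ignore_file relative to project_root when it lies under it, else as-is."""
    try:
        return ignore_file.relative_to(project_root).as_posix()
    except ValueError:
        # Outside the project: fall back to the path exactly as given.
        return str(ignore_file)


inside = display_path(PurePosixPath("/repo/svc-a/.lancedb-mcp/ignore"), PurePosixPath("/repo"))
outside = display_path(PurePosixPath("/etc/ignore"), PurePosixPath("/repo"))
```

Falling back to the unmodified path keeps the message accurate for ignore files that live outside the project root.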
Plan deltas needed
None. Implementation matches plans/PLAN-TIER1-COMPLETION.md::PR-C line by line.
Ready to merge. This closes Tier 1 (PR-A1 → A2 → A3 → B → C). Next milestones per the plan are B2b (HTTP_CALLS edges + find_route_callers) and B6 (ASYNC_CALLS) under propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md.
Sync pathspec constraints, make ignore diagnostics project-relative, document negation-scan cost, and harden pr diff heuristics and ambiguity reporting while removing duplicate Symbol fetches in risk scoring. Co-authored-by: Cursor <cursoragent@cursor.com>
…lls/ - Rewrite architecture: agent-skills/ + compile pipeline → skills/ at project root. All hosts read from skills/ directly. No compile.py, no compile-skills CLI subcommand, no AUTOGENERATED banner. - Developer workflow skills stay in .agents/skills/ (not skills/). - Add hints_structured awareness (new principle #10, decision #17). - Collapse 5-PR → 4-PR migration (no compile step PR). - Move propose to propose/active/ per new folder structure. - Delete docs/skills/ (java-codebase-explore.md, .zip) and scripts/build-explore-skill.sh. - Update README, AGENTS.md, test.yml, tests/README, automation README to reference skills/ instead of docs/skills/ and reports/. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… docs/skills/ Rewrites the propose doc (revision 5) to use skills/ at project root instead of agent-skills/ + compile pipeline. All hosts read from skills/ directly. No compile step. - Move propose to propose/active/ per new folder structure - Delete docs/skills/ (java-codebase-explore.md, .zip) and scripts/build-explore-skill.sh - Add hints_structured awareness (principle #10, decision #17) - Collapse 5-PR → 4-PR migration (no compile step PR) - Update README, AGENTS.md, test.yml, tests/README, automation README Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… docs/skills/ (#224) - Rewrite propose doc (revision 5): skills/ at project root instead of agent-skills/ + compile pipeline. No compile step, no multi-host copy. - Move propose from propose/ root to propose/active/ per new folder structure. - Delete docs/skills/ (java-codebase-explore.md, .zip) and scripts/build-explore-skill.sh. - Add hints_structured awareness (principle #10, decision #17). - Collapse 5-PR → 4-PR migration. - Update README, AGENTS.md, test.yml, tests/README, automation README. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Implements PR-C from `plans/PLAN-TIER1-COMPLETION.md`: layered ignore rules with `pathspec.GitIgnoreSpec`, project and nested `.lancedb-mcp/ignore` files, optional `.gitignore` integration, and MCP tool `diagnose_ignore`.
Behaviour
- Layer order: `builtin_default` → `project_root` → `nested` → `gitignore` (later wins; negation supported).
- Projects without `.lancedb-mcp/ignore` keep the same builtin-only behaviour as before when git rules do not apply; bank-chat parity is tested with `use_gitignore=False`.
- Permissive walk plus per-file `LayeredIgnore` when negation could un-ignore under pruned dirs.
Dependencies
- `pathspec` added to `pyproject.toml` (already in `requirements.txt`).
Tests
- `tests/test_path_filtering.py` (PR-C §4 cases 39–46 + deprecation + legacy count).
- `tests/test_mcp_tools.py`: `diagnose_ignore` smoke.
- `tests/test_lancedb_e2e.py`: heavy test for Lance unique Java filenames with vs without ignore (gate: `LANCEDB_MCP_RUN_HEAVY=1`).
Re-index
No Lance/Kuzu schema change. Re-running indexing picks up new ignore files automatically.
Fixture note
Example project with ignore: `tests/fixtures/lancedb_ignore_smoke/`. With ignore: 1 indexed `.java` file; without `.lancedb-mcp/`: 2.