
B5 layered ignore patterns (PR-C)#10

Merged
HumanBean17 merged 2 commits into master from feat/b5-layered-ignores
May 5, 2026

@HumanBean17
Owner

Summary

Implements PR-C from plans/PLAN-TIER1-COMPLETION.md: layered ignore rules with pathspec.GitIgnoreSpec, project and nested .lancedb-mcp/ignore files, optional .gitignore integration, and MCP tool diagnose_ignore.

Behaviour

  • Resolution order: builtin_default → project_root → nested → gitignore (later wins; negation supported).
  • Projects with no .lancedb-mcp/ignore keep the same builtin-only behaviour as before when git rules do not apply; bank-chat parity is tested with use_gitignore=False.
  • CocoIndex uses permissive walk + per-file LayeredIgnore when negation could un-ignore under pruned dirs.
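
The resolution order above can be sketched as a single ordered replay of rules. This is a minimal illustration rather than the PR's code: `Rule`, `winning_rule`, and the sample patterns are hypothetical, and the stdlib `fnmatch` stands in for `pathspec.GitIgnoreSpec`, so glob details differ from real gitignore matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Layer order from the PR description; later layers are replayed last.
LAYER_ORDER = ("builtin_default", "project_root", "nested", "gitignore")


@dataclass(frozen=True)
class Rule:
    layer: str    # one of LAYER_ORDER
    pattern: str  # glob; a leading "!" re-includes a path
    line: int     # 1-based line number in the source file


def winning_rule(rules: List[Rule], rel_path: str) -> Tuple[bool, Optional[Rule]]:
    """Replay rules in layer order; return (ignored, last rule that flipped the state)."""
    ordered = sorted(rules, key=lambda r: LAYER_ORDER.index(r.layer))  # stable sort
    ignored, winner = False, None
    for rule in ordered:
        negated = rule.pattern.startswith("!")
        glob = rule.pattern[1:] if negated else rule.pattern
        # Stand-in matcher: unlike gitignore, fnmatch's "*" also crosses "/".
        if fnmatchcase(rel_path, glob):
            new_state = not negated
            if new_state != ignored:
                ignored, winner = new_state, rule
    return ignored, winner


rules = [
    Rule("project_root", "*.java", 1),
    Rule("nested", "!*/Real.java", 1),
]
print(winning_rule(rules, "svc/src/Real.java"))   # re-included by the nested rule
print(winning_rule(rules, "svc/src/Other.java"))  # ignored by the project_root rule
```

Tracking the last rule that flips the state, rather than the last layer that matched, is what lets a diagnostic cite the one line responsible for the final verdict.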

Dependencies

  • pathspec added to pyproject.toml (already in requirements.txt).

Tests

  • tests/test_path_filtering.py (PR-C §4 cases 39–46 + deprecation + legacy count).
  • tests/test_mcp_tools.py: diagnose_ignore smoke.
  • tests/test_lancedb_e2e.py: heavy test for Lance unique Java filenames with vs without ignore (gate: LANCEDB_MCP_RUN_HEAVY=1).
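
The environment gate on the heavy test can be sketched as follows. The project's actual test runner and gate logic are not shown in this PR, so the stdlib `unittest` decorator, the helper `heavy_enabled`, and the exact `== "1"` comparison are assumptions made for illustration.

```python
import os
import unittest

HEAVY_ENV = "LANCEDB_MCP_RUN_HEAVY"


def heavy_enabled(environ=os.environ) -> bool:
    """True only when the gate variable is set to exactly "1" (assumed rule)."""
    return environ.get(HEAVY_ENV) == "1"


class HeavyIgnoreE2E(unittest.TestCase):
    @unittest.skipUnless(heavy_enabled(), f"set {HEAVY_ENV}=1 to run heavy tests")
    def test_unique_java_filenames(self):
        # Placeholder body: the real test indexes a fixture with and without ignore.
        self.assertTrue(True)
```

With a gate of this shape, a default run reports the test as skipped instead of silently omitting it, which matches the extra skip noted in the review's test counts.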

Re-index

No Lance/Kuzu schema change. Re-running indexing picks up new ignore files automatically.

Fixture note

Example project with ignore: tests/fixtures/lancedb_ignore_smoke/ — with ignore: 1 indexed .java file; without .lancedb-mcp/: 2.

Made with Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>
@HumanBean17
Owner Author

Review: PR-C — B5 layered ignore patterns + diagnose_ignore

Verdict: Approved ✅

Tight, on-spec implementation of the layered ignore system. Uses real gitignore semantics ("last rule that changes cumulative match wins") via a per-file mega-spec rather than naive layer overrides — meaningfully more correct for negation patterns. Scope discipline is excellent; no schema changes; bank-chat parity preserved with use_gitignore=False.

Scope discipline (out-of-scope checks)

| Sentinel | Status |
| --- | --- |
| ONTOLOGY_VERSION value bump | ✅ unchanged; the one match is just a re-imported symbol |
| CREATE NODE/REL TABLE / DROP TABLE | ✅ zero |
| Route(, HTTP_CALLS, ASYNC_CALLS, find_route_callers | ✅ zero |
| BrownfieldOverrides, CodebaseRoute | ✅ zero |
| analyze_pr impl | ✅ zero; the 6 hits are README context lines + B4 test names |
| EXPOSES impl | ✅ zero; the 4 hits are README context lines for existing tools |
| New @mcp.tool registrations | ✅ exactly 1: diagnose_ignore (as planned) |
| Schema column declarations changed | ✅ zero |
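
How these sentinel checks were run is not stated. One plausible way, sketched here under that assumption, is to count sentinel strings on the added lines of a unified diff. The function name, the sentinel subset, and the sample diff are all hypothetical.

```python
from typing import Dict, Iterable

# A subset of the sentinels from the table above; the counting approach is assumed.
SENTINELS = ("CREATE NODE TABLE", "CREATE REL TABLE", "DROP TABLE",
             "HTTP_CALLS", "ASYNC_CALLS", "find_route_callers", "@mcp.tool")


def count_added_sentinels(diff_lines: Iterable[str]) -> Dict[str, int]:
    """Count sentinel occurrences on added lines ('+' prefix, excluding '+++' headers)."""
    counts = {s: 0 for s in SENTINELS}
    for line in diff_lines:
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for sentinel in SENTINELS:
            if sentinel in line:
                counts[sentinel] += 1
    return counts


# Made-up diff fragment, for illustration only.
sample_diff = [
    "+++ b/server.py",
    "+@mcp.tool()",
    "+def diagnose_ignore(path):",
    " # unchanged context mentioning HTTP_CALLS",
    "-DROP TABLE old",
]
print(count_added_sentinels(sample_diff))
```

Restricting the count to added lines is what separates a real change from README context lines or removed code that merely mention a sentinel.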

Plan compliance — § PR-C

| # | Plan step | Verified |
| --- | --- | --- |
| 1 | New path_filtering.py with IgnoreLayer dataclass + LayeredIgnore class | path_filtering.py:62, 223 |
| 2 | 4-layer order builtin_default → project_root → nested → gitignore (later wins; negation honoured) | ✅ via _mega_build_for_rel (path_filtering.py:138): replays patterns in order, prefix-translating nested anchors |
| 3 | is_ignored(path) -> tuple[bool, IgnoreLayer \| None] | |
| 4 | diagnose(path) multi-line explanation | ✅ cites file path, layer, line number, and pattern (verified manually) |
| 5 | Replace COMMON_EXCLUDED_PATH_PATTERNS call sites in graph_enrich.py + java_index_flow_lancedb.py | ✅ all callers migrated; legacy module loses 72 LOC |
| 6 | iter_java_source_files signature change with @overload deprecation shim | path_filtering.py:354-377; DeprecationWarning fires correctly (verified live) |
| 7 | diagnose_ignore MCP tool | server.py:772 |
| 8 | Tests 39-46 + e2e ext + MCP smoke + deprecation/legacy | ✅ all present in tests/test_path_filtering.py (178 LOC) |
| 9 | pathspec in deps | ✅ added to pyproject.toml (already pinned in requirements.txt) |
| 10 | README "Ignore patterns" section | |

Tests

210 passed, 4 skipped

+11 passed / +1 skipped vs PR-B baseline (199/3). The +11 covers the planned 8 unit tests (39-46) + 1 e2e extension + 1 MCP smoke + 1 deprecation/legacy parity test. The +1 skip is a new heavy LANCEDB e2e gated by LANCEDB_MCP_RUN_HEAVY=1. ✅

Manual evidence reproduced

Smoke fixture (tests/fixtures/lancedb_ignore_smoke/):

With ignore (.lancedb-mcp/ignore = "**/generated/**"):
  ['src/main/java/com/example/Real.java']
Without ignore:
  ['src/main/java/com/example/Real.java',
   'src/main/java/com/example/generated/SkipMe.java']

✅ Matches PR description: 1 vs 2 files.
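
The 1 vs 2 result can be reproduced in miniature with a permissive walk that filters each file individually. This is a sketch, not the project's code: `list_java` is a hypothetical helper, the globs are passed in directly instead of being read from `.lancedb-mcp/ignore`, and `fnmatch` with `*/generated/*` only approximates the gitignore pattern `**/generated/**`.

```python
import tempfile
from fnmatch import fnmatchcase
from pathlib import Path
from typing import List, Sequence


def list_java(root: Path, ignore_globs: Sequence[str] = ()) -> List[str]:
    """Walk every .java file (no directory pruning) and filter each path on its own."""
    kept = []
    for path in root.rglob("*.java"):
        rel = path.relative_to(root).as_posix()
        # Stand-in matcher; the real implementation uses gitignore semantics.
        if any(fnmatchcase(rel, glob) for glob in ignore_globs):
            continue
        kept.append(rel)
    return sorted(kept)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = root / "src/main/java/com/example"
    (pkg / "generated").mkdir(parents=True)
    (pkg / "Real.java").write_text("class Real {}\n")
    (pkg / "generated" / "SkipMe.java").write_text("class SkipMe {}\n")

    with_ignore = list_java(root, ["*/generated/*"])
    without_ignore = list_java(root)

print(with_ignore)     # only Real.java
print(without_ignore)  # Real.java and generated/SkipMe.java
```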

Bank-chat parity (use_gitignore=False, default for build_ast_graph.py):

[pass4] Route extraction: emitted=11, exposes=11, routes_resolved_pct=81.8,
        routes_from_brownfield_pct=0.0

✅ Identical to PR-A2/A3 baseline — zero regression.

diagnose() live output:

Excluded by /…/lancedb_ignore_smoke/.lancedb-mcp/ignore (project_root) at line 1: '**/generated/**'

Cites the layer, file path, line number, and pattern verbatim — exactly the multi-line format the plan called out. Outside-project paths get a distinct message. ✅

Deprecation shim: iter_java_source_files(root, ['**/Generated/**']) fires:

DeprecationWarning: iter_java_source_files(root, exclude_globs) is deprecated;
use iter_java_source_files(root, ignore=LayeredIgnore(root, …))

Clear migration path. ✅
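
A shim of this kind might look roughly like the sketch below. Only the function name and the warning text come from the output above. The `LayeredIgnore` stand-in, its `extra_globs` parameter, and the function body are hypothetical.

```python
import warnings
from pathlib import Path
from typing import Iterator, List, Optional, overload


class LayeredIgnore:
    """Hypothetical stand-in; the real class lives in path_filtering.py."""

    def __init__(self, root: Path, extra_globs: Optional[List[str]] = None) -> None:
        self.root = root
        self.extra_globs = list(extra_globs or [])


@overload
def iter_java_source_files(root: Path, *, ignore: LayeredIgnore) -> Iterator[Path]: ...
@overload
def iter_java_source_files(root: Path, exclude_globs: List[str]) -> Iterator[Path]: ...


def iter_java_source_files(root, exclude_globs=None, *, ignore=None):
    if exclude_globs is not None:
        # Legacy call style: warn, then adapt to the new argument.
        warnings.warn(
            "iter_java_source_files(root, exclude_globs) is deprecated; "
            "use iter_java_source_files(root, ignore=LayeredIgnore(root, …))",
            DeprecationWarning,
            stacklevel=2,
        )
        ignore = LayeredIgnore(root, extra_globs=exclude_globs)
    # Filtering by `ignore` is omitted; this sketch only shows the shim itself.
    return iter(sorted(root.rglob("*.java")))
```

Because the function returns an iterator instead of being a generator, the warning fires at call time, and `stacklevel=2` attributes it to the caller's line.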

Notes that earned my trust

  1. Real gitignore semantics: _winning_row returns "the last rule line that changes the cumulative match result", not just "highest-rank layer that matched". This is what git check-ignore actually does and is critical for negation correctness. A naive "later layer wins" would fail the §4.3 case where a project-root *.java ignore is locally re-included by a nested !**/Real.java. Worth keeping in your head when answering user questions about diagnose_ignore output.
  2. Nested-ignore anchor translation: _prefix_line_to_project re-anchors patterns from a nested ignore to project-relative form before joining the mega-spec. Without this, **/build/** in a nested svc-a/.lancedb-mcp/ignore would also match svc-b/build/, which is wrong. Implementation handles !-prefixed lines correctly when prepending the anchor.
  3. Permissive walk + per-file LayeredIgnore for CocoIndex — the PR description correctly notes that pruning directories early would be unsound when negation can un-ignore under a pruned dir. This implementation avoids that bug class entirely (at the cost of walking more files; acceptable for this codebase's size).
  4. Cheap-path early exit for "no negation anywhere": _scan_negation_any_* lets the common case (no ! lines) skip the per-file mega-build. Nice optimisation that doesn't compromise correctness.
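
Notes 2 and 4 can be illustrated with two small helpers. These are hypothetical re-creations, not the PR's `_prefix_line_to_project` or `_scan_negation_any_*`. They assume posix-style patterns and skip gitignore edge cases such as escaped `\!` and trailing-space handling.

```python
from typing import Iterable


def prefix_line_to_project(line: str, nested_dir: str) -> str:
    """Re-anchor one nested ignore line under nested_dir (project-relative, posix)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return stripped  # blank lines and comments carry no pattern
    negated = stripped.startswith("!")
    body = stripped[1:] if negated else stripped
    if "/" not in body.rstrip("/"):
        # Slash-free patterns match at any depth below the nested directory.
        body = "**/" + body
    else:
        body = body.lstrip("/")
    anchored = f"{nested_dir.strip('/')}/{body}"
    # Keep the "!" outside the anchor so the line still parses as a negation.
    return f"!{anchored}" if negated else anchored


def has_negation(lines: Iterable[str]) -> bool:
    """Cheap pre-check: does any line re-include something via a leading '!'?"""
    return any(line.lstrip().startswith("!") for line in lines)


print(prefix_line_to_project("**/build/**", "svc-a"))    # svc-a/**/build/**
print(prefix_line_to_project("!**/Real.java", "svc-a"))  # !svc-a/**/Real.java
print(has_negation(["*.java"]))                          # False
```

After translation, a pattern from svc-a can no longer match anything under svc-b, and a `False` from the pre-check is the condition under which the per-file replay can be skipped.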

Observations (non-blocking)

  1. requirements.txt pins pathspec==1.0.4 but pyproject.toml adds pathspec>=0.12,<2. Both support GitIgnoreSpec (verified on 1.0.4: negation works), so this is fine. The requirements.txt pin pre-dates this PR. Consider tightening pyproject.toml to match the pinned version, or loosening requirements.txt, so the two files are clearly in sync. Minor doc drift.
  2. _scan_negation_any_* walks rglob at construction time. For very large monorepos (10k+ dirs), the rglob(".lancedb-mcp/ignore") and rglob(".gitignore") walks in __init__ could be measurable. Cache hit on LayeredIgnore reuse mitigates this; worth knowing if you ever build it inside a tight loop.
  3. diagnose() prints the resolved file path as absolute, not relative. The plan example showed /repo/svc-a; the implementation prints the full host path. Slightly verbose in CLI output but accurate. Could relativise to the project root for cleaner output when the path lies under it. Trivial cosmetic.
  4. is_relative_path_excluded and compile_excluded_glob_patterns re-exported as legacy shims in path_filtering.py:44, 51 — they're kept for backward-compat with anything that still imports them from java_index_v1_common. Good defensive move; consider a # pragma: no cover or scheduled removal note in a follow-up.
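
For observation 3, relativising the path could be as small as the sketch below. `display_path` and `format_exclusion` are hypothetical names. The message shape follows the diagnose() output quoted earlier in this review.

```python
from pathlib import Path


def display_path(ignore_file: Path, project_root: Path) -> str:
    """Project-relative posix path when ignore_file is under project_root, else unchanged."""
    try:
        return ignore_file.relative_to(project_root).as_posix()
    except ValueError:  # raised when ignore_file is not under project_root
        return str(ignore_file)


def format_exclusion(ignore_file: Path, project_root: Path,
                     layer: str, line: int, pattern: str) -> str:
    """Build the one-line exclusion message with a shortened path."""
    shown = display_path(ignore_file, project_root)
    return f"Excluded by {shown} ({layer}) at line {line}: {pattern!r}"


root = Path("/repo")
print(format_exclusion(root / ".lancedb-mcp/ignore", root,
                       "project_root", 1, "**/generated/**"))
```

Falling back to the unchanged path keeps the message accurate for ignore files that sit outside the project root.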

Plan deltas needed

None. Implementation matches plans/PLAN-TIER1-COMPLETION.md::PR-C line by line.


Ready to merge. This closes Tier 1 (PR-A1 → A2 → A3 → B → C). Next milestones per the plan are B2b (HTTP_CALLS edges + find_route_callers) and B6 (ASYNC_CALLS) under propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md.

Sync pathspec constraints, make ignore diagnostics project-relative, document negation-scan cost, and harden pr diff heuristics and ambiguity reporting while removing duplicate Symbol fetches in risk scoring.

Co-authored-by: Cursor <cursoragent@cursor.com>
@HumanBean17 HumanBean17 merged commit ea5baeb into master May 5, 2026
@HumanBean17 HumanBean17 deleted the feat/b5-layered-ignores branch May 5, 2026 07:31
HumanBean17 added a commit that referenced this pull request May 24, 2026
…lls/

- Rewrite architecture: agent-skills/ + compile pipeline → skills/ at
  project root. All hosts read from skills/ directly. No compile.py,
  no compile-skills CLI subcommand, no AUTOGENERATED banner.
- Developer workflow skills stay in .agents/skills/ (not skills/).
- Add hints_structured awareness (new principle #10, decision #17).
- Collapse 5-PR → 4-PR migration (no compile step PR).
- Move propose to propose/active/ per new folder structure.
- Delete docs/skills/ (java-codebase-explore.md, .zip) and
  scripts/build-explore-skill.sh.
- Update README, AGENTS.md, test.yml, tests/README, automation README
  to reference skills/ instead of docs/skills/ and reports/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HumanBean17 added a commit that referenced this pull request May 24, 2026
… docs/skills/

Rewrites the propose doc (revision 5) to use skills/ at project root
instead of agent-skills/ + compile pipeline. All hosts read from skills/
directly. No compile step.

- Move propose to propose/active/ per new folder structure
- Delete docs/skills/ (java-codebase-explore.md, .zip) and
  scripts/build-explore-skill.sh
- Add hints_structured awareness (principle #10, decision #17)
- Collapse 5-PR → 4-PR migration (no compile step PR)
- Update README, AGENTS.md, test.yml, tests/README, automation README

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HumanBean17 added a commit that referenced this pull request May 24, 2026
… docs/skills/

- Rewrite propose doc (revision 5): skills/ at project root instead of
  agent-skills/ + compile pipeline. No compile step, no multi-host copy.
- Move propose from propose/ root to propose/active/ per new folder structure.
- Delete docs/skills/ (java-codebase-explore.md, .zip) and
  scripts/build-explore-skill.sh.
- Add hints_structured awareness (principle #10, decision #17).
- Collapse 5-PR → 4-PR migration.
- Update README, AGENTS.md, test.yml, tests/README, automation README.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HumanBean17 added a commit that referenced this pull request May 24, 2026
… docs/skills/ (#224)

- Rewrite propose doc (revision 5): skills/ at project root instead of
  agent-skills/ + compile pipeline. No compile step, no multi-host copy.
- Move propose from propose/ root to propose/active/ per new folder structure.
- Delete docs/skills/ (java-codebase-explore.md, .zip) and
  scripts/build-explore-skill.sh.
- Add hints_structured awareness (principle #10, decision #17).
- Collapse 5-PR → 4-PR migration.
- Update README, AGENTS.md, test.yml, tests/README, automation README.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>