From 345d8182dfd07f78653956d5015b403a0f1c38b7 Mon Sep 17 00:00:00 2001 From: Dmitry Date: Mon, 4 May 2026 16:26:17 +0000 Subject: [PATCH] Add Tier 1 completion plan + proposals (B2a + B4 + B5) - propose/TIER1-COMPLETION-PROPOSE.md: active proposal for B2a, B4, B5 with brownfield route_overrides integration. - propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md: skeleton for the follow-on B2b + B6 PR; locks the join-key contract with B2a so the two cannot drift. - plans/PLAN-TIER1-COMPLETION.md: implementation plan split into five independent PRs (A1 schema, A2 SpEL/MCP, A3 brownfield, B analyze_pr, C layered ignores), each with file-by-file changes, test buckets, and acceptance criteria. --- plans/PLAN-TIER1-COMPLETION.md | 990 +++++++++++++++++++++ propose/TIER1-COMPLETION-PROPOSE.md | 960 ++++++++++++++++++++ propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md | 438 +++++++++ 3 files changed, 2388 insertions(+) create mode 100644 plans/PLAN-TIER1-COMPLETION.md create mode 100644 propose/TIER1-COMPLETION-PROPOSE.md create mode 100644 propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md diff --git a/plans/PLAN-TIER1-COMPLETION.md b/plans/PLAN-TIER1-COMPLETION.md new file mode 100644 index 0000000..5d8e797 --- /dev/null +++ b/plans/PLAN-TIER1-COMPLETION.md @@ -0,0 +1,990 @@ +# Plan: Tier 1 completion (B2a + B4 + B5) + +Status: **ready to implement**. Self-contained: an agent picking this up +should be able to land it without re-deriving the design. Pairs with +[`propose/TIER1-COMPLETION-PROPOSE.md`](../propose/TIER1-COMPLETION-PROPOSE.md) +(scope, rationale, schema). The follow-on proposal +[`propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md`](../propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md) +(B2b + B6) depends on this plan landing first — **do not pre-implement +its hooks here**. + +## Goal + +Close out Tier 1 within the static-analysis remit: + +- **B2a** — `Route` node + `EXPOSES` rel for Spring MVC / WebFlux / + Feign / Kafka / RabbitMQ / JMS / Spring Cloud Stream **declarations + only**. 
Brownfield-overridable via the same surface that exists for
+  roles/capabilities.
+- **B4** — `analyze_pr(diff)` MCP tool: maps a unified diff to changed
+  symbols, computes blast-radius via `impact_analysis`, returns a risk
+  score.
+- **B5** — Layered ignore patterns: `pathspec` over project-root
+  `.lancedb-mcp/ignore` + nested `.lancedb-mcp/ignore` files +
+  `.gitignore` integration.
+
+The three sub-features ship in **five PRs** (A1, A2, A3, B, C; see §Rollout).
+
+## Principles (do not relitigate in review)
+
+- **Mostly additive.** No table dropped, no MCP tool removed. New nodes
+  / edges / tools only.
+- **Brownfield surface extends `BrownfieldOverrides` — does not parallel
+  it.** The route resolver mirrors `resolve_role_and_capabilities`
+  shape-for-shape. See
+  [`plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md`](completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md)
+  — **mandatory reading** before touching §PR-A3.
+- **Confidence-scored edges.** Same three-strategy ladder as
+  `pass3_calls`: literal=1.0, SpEL=0.85, constant_ref=0.7.
+- **LanceDB untouched.** No reindex; only the Kuzu graph rebuilds.
+- **Ontology bump 4 → 5** (only in PR-A1, the first B2a PR; not in B4 or B5).
+- **Microservice-aware identity.** `Route.id` includes `microservice`
+  so the same path in two services is two routes — required by
+  the deferred B2b/B6 join.
+
+## PR breakdown — overview
+
+| PR        | Scope                                                         | Ontology bump | Files touched (approx) | Test buckets           | Depends on         |
+| --------- | ------------------------------------------------------------- | ------------- | ---------------------- | ---------------------- | ------------------ |
+| **PR-A1** | B2a schema + extractor (literal-only, no brownfield, no MCP)  | 4 → 5         | 3                      | unit + integration     | —                  |
+| **PR-A2** | B2a SpEL/constant-ref resolution + MCP tools                  | none          | 3                      | unit + MCP             | PR-A1              |
+| **PR-A3** | B2a brownfield (route_overrides + @CodebaseRoute + 5-layer)   | none          | 4                      | 12 brownfield fixtures | PR-A2              |
+| **PR-B**  | B4 `analyze_pr` MCP tool                                      | none          | 3                      | unit + MCP             | PR-A1 (only edges) |
+| **PR-C**  | B5 layered ignores                                            | none          | 4                      | unit + integration     | none               |
+
+PRs land in order **A1 → A2 → A3 → B → C**. PR-B (once A1 has landed)
+and PR-C (at any point) can also merge out of order if priorities shift.
+Each PR keeps the test suite green at every commit.
+
+---
+
+# PR-A1 — B2a schema + literal extractor
+
+**Goal:** Land the `Route` node, `EXPOSES` rel, and a `pass4_routes`
+extractor that handles **literal-string** annotation arguments only.
+SpEL and constant-ref handling is deferred to PR-A2; brownfield is
+deferred to PR-A3. After this PR, `Route` nodes appear in the graph for
+the bank-chat-system corpus.
+
+## File-by-file changes
+
+### 1. `ast_java.py` — Route declaration model
+
+Additions (~40 lines, no removals):
+
+1.
New dataclass `RouteDecl`: + ```python + @dataclass + class RouteDecl: + method_fqn: str # owning method's Symbol id + method_sig: str # method signature for stable Symbol lookup + kind: str # 'http_endpoint' | 'http_consumer' | 'kafka_topic' | 'rabbit_queue' | 'jms_destination' | 'stream_binding' + framework: str # 'spring_mvc' | 'webflux' | 'feign' | 'kafka' | 'rabbitmq' | 'jms' | 'stream' + http_method: str # 'GET' | 'POST' | … | '' for async + path: str # raw path as it appeared in source (literal only in PR-A1) + topic: str # async only + broker: str # async only — '' for default broker + feign_name: str # @FeignClient(name=…) — '' for non-Feign + feign_url: str # @FeignClient(url=…) — '' when name-based + resolution_strategy: str # PR-A1: always 'annotation' (literal). PR-A2 adds 'spel' / 'constant_ref'. + confidence: float # PR-A1: always 1.0. PR-A2 adds 0.85 / 0.7. + resolved: bool # PR-A1: always True for emitted routes. PR-A2 adds False for unresolved. + filename: str + start_line: int + end_line: int + ``` + Exported in `__all__`. + +2. New `MethodDecl.routes: list[RouteDecl] = field(default_factory=list)`. + +3. **Bump `ONTOLOGY_VERSION` from 4 to 5.** Update the comment in + `ast_java.py` to mention "Phase 4: Route + EXPOSES (B2a)". + +4. New helper `_collect_routes(method_node, type_node, src, *, …)` + called from `_parse_method`. Reads: + - **Type-level base path / class config:** + - `@RequestMapping("/api/v1")` on `@Controller` / `@RestController`. + - `@FeignClient(name=…, url=…, path=…)` on the interface. + - `@KafkaListener(topics=…)` at class level (rare). + - `@RabbitListener(queues=…)` at class level. + - **Method-level mapping:** + - `@RequestMapping`, `@GetMapping`, `@PostMapping`, + `@PutMapping`, `@DeleteMapping`, `@PatchMapping`. + - WebFlux equivalents (same annotations; framework differs only by + enclosing class — see #5 below). 
+ - `@KafkaListener`, `@RabbitListener`, `@JmsListener`, + `@StreamListener`, Spring Cloud Stream `@Bean Function/Consumer/Supplier`. + - **Path composition:** `class_base + method_path`, normalized via + `posixpath.normpath`. `value` / `path` arrays produce one + `RouteDecl` per element. + - **PR-A1 scope:** literal-string arguments only. If an argument is a + SpEL expression (`${…}`) or a constant reference (e.g. + `Endpoints.USERS`), **skip the route** and increment a counter + `routes_skipped_unresolved`. PR-A2 will pick these up. + +5. Framework detection rule for WebFlux: same annotations as Spring MVC + but the controller method's return type is `Mono<…>` / `Flux<…>` or + the class is annotated with `@RestController` and uses reactive + types in any signature. Use `framework='webflux'` in that case; + otherwise `'spring_mvc'`. Document this rule next to `_collect_routes`. + +6. Feign nuance: `@FeignClient` interfaces have no body, but each + abstract method is an exposer. `_collect_routes` emits one + `RouteDecl` per method with `kind='http_endpoint'`, + `framework='feign'`, plus `feign_name` / `feign_url` populated from + the interface annotation. The "exposer" semantically is the Feign + declaration; the imperative caller side is B2b's job, not this PR's. + +### 2. `java_ontology.py` — route taxonomy + +Additions (~15 lines): + +1. New frozensets: + ```python + VALID_ROUTE_FRAMEWORKS: frozenset[str] = frozenset(( + "spring_mvc", "webflux", "feign", "kafka", "rabbitmq", "jms", "stream", + )) + VALID_ROUTE_KINDS: frozenset[str] = frozenset(( + "http_endpoint", "http_consumer", "kafka_topic", + "rabbit_queue", "jms_destination", "stream_binding", + )) + ``` +2. Add both to `__all__`. + +### 3. 
`build_ast_graph.py` — schema, extractor pass, writers + +#### 3.1 Schema additions + +Add after the existing `_SCHEMA_*` constants (around line 1127): + +```python +_SCHEMA_ROUTE = ( + "CREATE NODE TABLE Route(" + "id STRING, kind STRING, framework STRING, " + "method STRING, path STRING, path_template STRING, path_regex STRING, " + "topic STRING, broker STRING, " + "feign_name STRING, feign_url STRING, " + "microservice STRING, module STRING, " + "filename STRING, start_line INT64, end_line INT64, " + "resolved BOOLEAN, " + "PRIMARY KEY(id))" +) +_SCHEMA_EXPOSES = ( + "CREATE REL TABLE EXPOSES(FROM Symbol TO Route, " + "confidence DOUBLE, strategy STRING)" +) +``` + +Add both to the create-tables list and the drop-on-rebuild list. +Edge direction `(Symbol)-[:EXPOSES]->(Route)` is **locked** — do not +reverse it; it is required for the deferred B2b/B6 traversal +`(caller)-[:HTTP_CALLS]->(Route)<-[:EXPOSES]-(handler)`. + +#### 3.2 New helpers + +Add module-level functions in `build_ast_graph.py`: + +1. **`_normalize_path(raw_path: str) -> tuple[str, str]`** — returns + `(path_template, path_regex)`. + - `/api/users/{id}` → `("/api/users/{}", "^/api/users/[^/]+/?$")`. + - `/api/users/{id:\d+}` → strip the regex constraint to `{}` for the + template; preserve the constraint in the regex + (`^/api/users/\d+/?$`). + - Trailing slash variants collapsed (template normalized without + trailing slash; regex allows both via `/?$`). + - Multi-`{}`: handle left-to-right. + - Unit-tested in PR-A1's tests; **shared by B2a/B2b** so its output + is the source of truth. + +2. 
**`_route_id(framework: str, kind: str, http_method: str, path_template: str, topic: str, broker: str, microservice: str) -> str`** — + stable hash: + ```python + import hashlib + key = f"{framework}|{kind}|{http_method}|{path_template}|{topic}|{broker}|{microservice}" + return f"r:{hashlib.sha1(key.encode()).hexdigest()[:16]}" + ``` + Including `microservice` makes "`/api/users` in svc A" and "`/api/users` + in svc B" two distinct routes. + +3. **`_emit_route(tables: GraphTables, decl: RouteDecl, *, microservice: str, module: str)`** — + appends `RouteRow` and `ExposesRow` to `tables`. Dedupes by `Route.id`. + +#### 3.3 Dataclasses + +Add to `GraphTables`: +```python +routes_rows: list[RouteRow] = field(default_factory=list) +exposes_rows: list[ExposesRow] = field(default_factory=list) +route_stats: RouteExtractionStats = field(default_factory=RouteExtractionStats) +``` + +`RouteRow` mirrors the Route node columns. `ExposesRow` carries +`(symbol_id, route_id, confidence, strategy)`. `RouteExtractionStats`: +counters per `framework`, per `kind`, plus `routes_skipped_unresolved` +(PR-A1 increments this for SpEL/const-ref; PR-A2 turns them into +unresolved Routes). + +#### 3.4 New `pass4_routes` function + +Runs after `pass3_calls`. Signature mirrors the existing pass: + +```python +def pass4_routes(tables: GraphTables, asts: dict[str, JavaFileAst], *, verbose: bool) -> None: + ... +``` + +Loop: +1. For each AST and each `MethodDecl` with `method.routes`: + - Determine `microservice` from the file's owning module (re-use the + existing `_microservice_for_file` / equivalent — search for how + `pass3_calls` derives it; do **not** reinvent). + - For each `RouteDecl`: + - `path_template, path_regex = _normalize_path(decl.path)` (or + `("", "")` if `kind != 'http_endpoint'`). + - `route_id = _route_id(...)`. + - Append a `RouteRow` (dedup by `id`). + - Append an `ExposesRow(method_symbol_id, route_id, decl.confidence, decl.resolution_strategy)`. 
+ - Update `tables.route_stats`. + +The pass does **not** read or mutate `tables.calls_rows`. + +#### 3.5 Writers + +Add a writer block after the existing CALLS writer: +- Insert `Route` rows; idempotent on `id`. +- Insert `EXPOSES` rows; dedup by `(from, to)` since one method emits + one logical edge per route (Feign array-method case is already one + per `RouteDecl`). + +#### 3.6 graph_meta extension + +Add to the `graph_meta` MERGE call (around line 1343): +```python +"routes_total INT64, " +"exposes_total INT64, " +"routes_by_framework MAP(STRING, INT64), " +"routes_resolved_pct DOUBLE, " +``` +Populate from `tables.route_stats`. + +#### 3.7 CLI wire-up + +In `main`, add `pass4_routes(tables, asts, verbose=args.verbose)` +right after `pass3_calls(...)` (line 1421). + +### 4. Tests for PR-A1 + +#### 4.1 New test file: `tests/test_route_extraction.py` + +Inline-source unit tests for `_collect_routes` and `_normalize_path`. +Required cases: + +1. `@GetMapping("/users")` on a `@RestController` → one Route, + `framework=spring_mvc`, `http_method=GET`, `path=/users`. +2. `@RequestMapping(value="/api", method=RequestMethod.POST)` → + `http_method=POST`. +3. Class-level `@RequestMapping("/api/v1")` + method-level + `@GetMapping("/users")` → `path=/api/v1/users`. +4. `@RequestMapping(path={"/a", "/b"})` → two Routes, same method. +5. `Mono getUser()` return type with `@GetMapping` → + `framework=webflux`. +6. `@FeignClient(name="user-svc", url="", path="/users")` interface + with one `@GetMapping("/{id}")` method → one Route, + `framework=feign`, `feign_name=user-svc`, `path=/users/{id}`. +7. `@KafkaListener(topics="orders")` → Route with + `kind=kafka_topic`, `framework=kafka`, `topic=orders`, + `http_method=''`. +8. `@KafkaListener(topics="${app.topic}")` → **no Route emitted in + PR-A1**; `routes_skipped_unresolved` counter incremented. +9. `@GetMapping(Endpoints.USERS)` (constant ref) → no Route emitted; + counter incremented. +10. 
Path normalization: `_normalize_path("/api/users/{id}/orders/{oid}")` + → `("/api/users/{}/orders/{}", "^/api/users/[^/]+/orders/[^/]+/?$")`. +11. Path normalization with regex constraint: + `/api/users/{id:\\d+}` → template `{}`, regex `\\d+`. + +#### 4.2 New fixture: `tests/fixtures/route_extraction_smoke/` + +Maven-shaped mini-project, ~6 files: +- `UserController.java` — Spring MVC, class + method mappings. +- `OrderController.java` — `@RequestMapping` array form. +- `UserClient.java` — `@FeignClient` interface, 3 methods. +- `OrderListener.java` — `@KafkaListener` class form. +- `WebFluxController.java` — `Mono`/`Flux` return types. +- `pom.xml` — multi-module shape with two services to exercise + microservice scoping. + +#### 4.3 Extensions to existing tests + +`tests/test_ast_graph_build.py`: +- New: `test_routes_and_exposes_populated` on bank-chat-system → + `count(Route) > 0` and `count(EXPOSES) > 0`. +- Update: `test_schema_has_all_expected_tables` adds `"Route"`, + `"EXPOSES"`. +- Update: `test_graph_meta_present_and_versioned` expects + `ontology_version == 5`, plus `routes_total >= 0`, + `routes_by_framework` non-empty. +- New: `test_route_id_includes_microservice` — fixture has same + `/api/users` path in two services; assert two distinct `Route` ids. +- New: `test_exposes_edge_direction` — `(Symbol)-[:EXPOSES]->(Route)` + succeeds; `(Route)-[:EXPOSES]->(Symbol)` returns 0 rows. + +### 5. PR-A1 Definition of done + +- [ ] `pytest` green; new + regression. +- [ ] `build_ast_graph.py --source-root tests/bank-chat-system` produces + a graph with `ontology_version=5` and non-empty `Route` / + `EXPOSES` tables. +- [ ] `graph_meta.routes_by_framework` shows at least `spring_mvc` for + bank-chat-system. +- [ ] No SpEL/const-ref routes in the graph yet (those land in PR-A2). +- [ ] PR description quotes the `RouteExtractionStats` from a manual + run on bank-chat-system. 
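The path normalization specified in §3.2 can be sketched as follows. This is a minimal illustration, assuming a regex-based scan over `{name}` / `{name:constraint}` segments is acceptable; the function name `normalize_path`, the `_PATH_VAR` pattern, and the root-path handling are assumptions made for this sketch, not the planned `_normalize_path` implementation. It does not handle constraints that themselves contain braces (e.g. `{id:\d{3}}`).

```python
import re

# Hypothetical pattern: matches `{name}` and `{name:constraint}`.
_PATH_VAR = re.compile(r"\{([^{}:]+)(?::([^{}]+))?\}")


def normalize_path(raw_path: str) -> tuple[str, str]:
    """Return (path_template, path_regex) for a literal route path."""
    # Collapse trailing-slash variants; the regex re-allows one via `/?$`.
    path = raw_path.rstrip("/")
    template_parts: list[str] = []
    regex_parts: list[str] = []
    pos = 0
    # Walk the variables left to right, copying the literal text between them.
    for match in _PATH_VAR.finditer(path):
        literal = path[pos:match.start()]
        template_parts.append(literal)
        regex_parts.append(re.escape(literal))
        # The template drops the variable name and any constraint.
        template_parts.append("{}")
        # The regex keeps an explicit constraint, else matches one segment.
        regex_parts.append(match.group(2) or "[^/]+")
        pos = match.end()
    tail = path[pos:]
    template_parts.append(tail)
    regex_parts.append(re.escape(tail))
    template = "".join(template_parts) or "/"
    regex = "^" + "".join(regex_parts) + "/?$"
    return template, regex


template, regex = normalize_path("/api/users/{id:\\d+}")
# The generated regex accepts numeric ids, with or without a trailing slash.
assert re.match(regex, "/api/users/42")
assert re.match(regex, "/api/users/42/")
assert not re.match(regex, "/api/users/abc")
```

Because the template discards variable names, `/api/users/{id}` and `/api/users/{userId}` normalize to the same `path_template`, which is what lets a later caller-side pass join on the same key.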
+ +## PR-A1 implementation step list + +| # | Step | File(s) | Done when | +| -- | ---------------------------------------------------------------- | ------------------------ | -------------------------------------------------- | +| 1 | Add `VALID_ROUTE_FRAMEWORKS` / `VALID_ROUTE_KINDS` | `java_ontology.py` | imported successfully | +| 2 | Add `RouteDecl` dataclass + bump `ONTOLOGY_VERSION` 4→5 | `ast_java.py` | `ONTOLOGY_VERSION == 5` | +| 3 | Implement `_collect_routes` (literal-only) | `ast_java.py` | unit cases 1–7 pass | +| 4 | Implement skip-and-count for SpEL / constant_ref | `ast_java.py` | unit cases 8, 9 pass | +| 5 | Implement `_normalize_path` + `_route_id` helpers | `build_ast_graph.py` | unit cases 10, 11 pass | +| 6 | Add `RouteRow` / `ExposesRow` / `RouteExtractionStats` | `build_ast_graph.py` | imports clean | +| 7 | Implement `pass4_routes` and wire after `pass3_calls` | `build_ast_graph.py` | rebuild populates `routes_rows` | +| 8 | Add `_SCHEMA_ROUTE` / `_SCHEMA_EXPOSES`; create + drop wired | `build_ast_graph.py` | rebuild succeeds | +| 9 | Writers + `graph_meta` extension | `build_ast_graph.py` | `graph_meta.routes_total > 0` on smoke corpus | +| 10 | New fixture project | `tests/fixtures/route_extraction_smoke/` | files exist | +| 11 | New + extended tests | `tests/` | `pytest` green | + +--- + +# PR-A2 — B2a SpEL/constant-ref + MCP tools + +**Goal:** Turn the `routes_skipped_unresolved` counter into actual +unresolved `Route` nodes with proper confidence scoring; add the +read-only MCP tools that consume the route graph. + +## File-by-file changes + +### 1. 
`ast_java.py` — three-strategy resolution
+
+Replace the "skip" branch in `_collect_routes` with the full
+three-strategy ladder:
+
+| Strategy        | When                                            | `path` field                        | `path_template` / `path_regex` | `confidence` | `resolved` |
+| --------------- | ----------------------------------------------- | ----------------------------------- | ------------------------------ | ------------ | ---------- |
+| `annotation`    | literal string                                  | as-written                          | normalized                     | `1.0`        | `True`     |
+| `spel`          | `${…}` placeholder anywhere in the path         | as-written (with `${…}` retained)   | `""` / `""`                    | `0.85`       | `False`    |
+| `constant_ref`  | bare identifier or qualified ident (no `${}`)   | as-written (`Endpoints.USERS`)      | `""` / `""`                    | `0.7`        | `False`    |
+
+Detection: `constant_ref` applies when the annotation argument's
+tree-sitter node is **not** a `string_literal`. `spel` applies when the
+argument is a `string_literal` whose decoded text contains `${` anywhere
+(Spring resolves `${…}` placeholders inside string literals at runtime,
+e.g. `@GetMapping("${app.api.base}/users")`). Any other `string_literal`
+is `annotation`.
+
+For SpEL inside a literal: `decode_string_literal` then check `re.search(r"\$\{", text)`.
+
+### 2. `kuzu_queries.py` — read-only helpers
+
+Add (~80 lines):
+
+```python
+def list_routes(graph, *, microservice: str | None = None,
+                framework: str | None = None,
+                path_prefix: str | None = None,
+                method: str | None = None,
+                limit: int = 100) -> list[dict]: ...
+
+def find_route_handlers(graph, *, route_id: str) -> list[dict]:
+    """
+    All `Symbol`s that expose this route via an EXPOSES edge. (Plural
+    because Feign `feign_inherit` and class-level @RequestMapping arrays
+    can produce multiple exposers.)
+    """
+
+def get_route_by_path(graph, *, microservice: str, path_template: str,
+                      method: str = '') -> dict | None: ...
+``` + +Cypher patterns (Kuzu dialect): +```cypher +MATCH (s:Symbol)-[:EXPOSES]->(r:Route) +WHERE r.framework = $fw AND r.path STARTS WITH $prefix +RETURN r, s +ORDER BY r.framework, r.path +LIMIT $limit +``` + +**Do not** add `find_route_callers` — that's B2b's tool. PR-A2 only +ships the read-only handler-lookup side. + +### 3. `server.py` — MCP tools + +Three new MCP tools: + +| Tool | Inputs | Output | +| --------------------- | -------------------------------------------------------- | --------------------------------------- | +| `list_routes` | `microservice?`, `framework?`, `path_prefix?`, `method?`, `limit?` | List of route dicts | +| `find_route_handlers` | `route_id` | List of `{symbol, confidence, strategy}` | +| `get_route_by_path` | `microservice`, `path_template`, `method?` | Single route dict or `null` | + +Update `_INSTRUCTIONS` to mention the new tools and the deferred +`find_route_callers` (note: "available after B2b ships"). + +### 4. Tests for PR-A2 + +#### 4.1 New tests in `tests/test_route_extraction.py` + +12. `@GetMapping("${app.api.base}/users")` → Route with + `strategy=spel`, `confidence=0.85`, `resolved=False`, + `path_template==""`, `path_regex==""`. +13. `@GetMapping(Endpoints.USERS)` → `strategy=constant_ref`, + `confidence=0.7`, `resolved=False`. +14. Mixed: `@RequestMapping("${prefix}" + Endpoints.USERS)` (string + concat) → out of scope; document that it falls through to + `constant_ref` since it isn't a string literal. + +#### 4.2 New tests in `tests/test_kuzu_queries.py` + +15. `test_list_routes_filter_by_framework` — fixture with mixed + `spring_mvc` / `feign` returns only requested. +16. `test_find_route_handlers_feign_array` — Feign interface with 3 + methods returns 3 handlers when the type-level Route is queried. +17. `test_get_route_by_path_microservice_isolated` — fixture has + `/api/users` in two services; lookup with svc A returns A's route, + not B's. + +#### 4.3 New tests in `tests/test_mcp_tools.py` + +18. 
Smoke for each of the three new MCP tools (same pattern as + existing `test_find_injectors_tool`). + +### 5. PR-A2 Definition of done + +- [ ] `pytest` green; new + regression. +- [ ] After rebuild on bank-chat-system, `graph_meta.routes_resolved_pct` + reported and quoted in the PR description. +- [ ] All three new MCP tools callable through the server. +- [ ] `_INSTRUCTIONS` updated with the new tools. + +## PR-A2 implementation step list + +| # | Step | File(s) | Done when | +| - | ---------------------------------------------------------- | --------------------- | ---------------------------------------- | +| 1 | Replace skip-branch with three-strategy resolution | `ast_java.py` | unit 12, 13 pass | +| 2 | Implement `list_routes` / `find_route_handlers` / `get_route_by_path` | `kuzu_queries.py` | unit 15, 16, 17 pass | +| 3 | Wire MCP tools | `server.py` | unit 18 passes | +| 4 | Update `_INSTRUCTIONS` | `server.py` | grep finds new tool names | +| 5 | Update `README.md` route section | `README.md` | manual review | + +--- + +# PR-A3 — B2a brownfield (route_overrides + @CodebaseRoute) + +**Goal:** Make the route detector work on legacy codebases that use +custom (non-Spring) annotations. Mirrors the existing role/capability +brownfield system **exactly**. **Mandatory reading before this PR:** +[`plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md`](completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md). + +## File-by-file changes + +### 1. 
`graph_enrich.py` — extend BrownfieldOverrides + new resolver + +Extend `BrownfieldOverrides` (around line 162): + +```python +@dataclass(frozen=True) +class BrownfieldOverrides: + annotation_to_role: dict[str, str] + annotation_to_capabilities: dict[str, tuple[str, ...]] + fqn_role: dict[str, str] + fqn_capabilities: dict[str, tuple[str, ...]] + # NEW for B2a: + annotation_to_route_hint: dict[str, "RouteHint"] # by annotation FQN + fqn_to_route_hint: dict[str, "RouteHint"] # by class FQN +``` + +Where `RouteHint` is a small frozen dataclass: +```python +@dataclass(frozen=True) +class RouteHint: + framework: str + kind: str + path: str = "" + method: str = "" + topic: str = "" + broker: str = "" +``` + +Extend `_load_brownfield_overrides` to read the new YAML keys +`route_overrides.annotations` and `route_overrides.fqn` from +`.lancedb-mcp.yml`. **Do not duplicate the file-loading code** — add +parsing branches inside the existing function. + +YAML shape (mirrors `role_overrides`): +```yaml +route_overrides: + annotations: + "com.acme.AcmeRoute": + framework: spring_mvc + kind: http_endpoint + method: GET + fqn: + "com.legacy.UserApi": + framework: spring_mvc + kind: http_endpoint + path: "/legacy/users" +``` + +New resolver, shape-identical to `resolve_role_and_capabilities` (line 466): + +```python +def resolve_routes_for_method( + *, + method_decl: MethodDecl, + enclosing_type: TypeDecl, + overrides: BrownfieldOverrides, + meta_chain: dict[str, frozenset[str]] | None, + builtin_routes: list[RouteDecl], +) -> list[RouteDecl]: + """ + Apply 5-layer composition (last writer wins) to produce the final + route list for a method: + + 1. `builtin_routes` — what `_collect_routes` produced (Spring, + Feign, Kafka built-ins). + 2. Layer B annotations: any annotation on the method or type whose + FQN is in `overrides.annotation_to_route_hint`. + 3. 
Layer A meta-chain: any annotation that transitively meta-points + to a built-in framework annotation (re-use + `collect_annotation_meta_chain`). + 4. Layer C in-source: `@CodebaseRoute` on the method (or + `@CodebaseRoutes` repeatable container). + 5. Layer B fqn: if `enclosing_type.fqn` is in + `overrides.fqn_to_route_hint`, apply the hint to every method + whose route list is still empty. + """ +``` + +The composition rule is **last writer wins per key** (`framework` / +`kind` / `path` / etc.), not "replace whole list". This mirrors +`resolve_role_and_capabilities`. + +### 2. `ast_java.py` — `@CodebaseRoute` source stub support + +Add detection in `_collect_routes`. After collecting built-in routes, +also collect: + +- `@CodebaseRoute(framework=…, kind=…, path=…, method=…, topic=…)` +- `@CodebaseRoutes({…})` — `@Repeatable` container. + +These are emitted as `RouteDecl` with `resolution_strategy='codebase_route'` +and the same confidence as their underlying form (`1.0` for literal, +`0.85` for SpEL, `0.7` for constant_ref). **They are visible to the +five-layer resolver in `graph_enrich`** — `_collect_routes` does not +itself merge layers; that's `resolve_routes_for_method`. + +### 3. `build_ast_graph.py` — wire resolver into pass4_routes + +Replace direct use of `method.routes` in `pass4_routes` with: + +```python +final_routes = resolve_routes_for_method( + method_decl=method, + enclosing_type=type_decl, + overrides=overrides, # already loaded in pass2 / pass3 + meta_chain=meta_chain, # already loaded in pass2 + builtin_routes=method.routes, +) +for route in final_routes: + _emit_route(tables, route, microservice=ms, module=mod) +``` + +Update `RouteExtractionStats` with: +- `routes_from_brownfield_pct: float` +- `routes_by_layer: dict[str, int]` (counts of `'builtin' | 'layer_b_ann' | 'layer_a_meta' | 'layer_c_source' | 'layer_b_fqn'`) + +Surface both via `graph_meta`. + +### 4. 
Source stubs in `tests/fixtures/` + +Add `@CodebaseRoute` and `@CodebaseRoutes` to the existing brownfield +fixture annotation directory (same place where `@CodebaseRole` / +`@CodebaseCapability` live). + +### 5. Tests for PR-A3 (12 mandatory brownfield fixtures) + +In `tests/test_brownfield_routes.py`: + +19. **Layer B annotation override:** custom `@AcmeRoute` mapped via + YAML → produces a `Route` with the configured framework/kind. +20. **Layer B fqn override:** legacy class `com.legacy.UserApi` listed + in `route_overrides.fqn` → all its methods get routes. +21. **Layer A meta-chain:** `@AcmeRestController` is meta-annotated + with `@RestController` → its `@GetMapping` methods produce routes + even though the class is not directly `@RestController`. +22. **Layer C source stub:** method with `@CodebaseRoute(framework=spring_mvc, kind=http_endpoint, path="/x")` + → Route emitted. +23. **Layer C wins over auto-detect:** method has both `@GetMapping("/a")` + *and* `@CodebaseRoute(path="/b")` → final Route's path is `/b`, + `resolution_strategy='codebase_route'`. +24. **`@CodebaseRoutes` repeatable:** method with two `@CodebaseRoute` + entries via `@CodebaseRoutes({…})` → two Routes emitted. +25. **Layer B fqn wins over Layer C:** as designed in + `resolve_role_and_capabilities` — fqn override is the *outermost* + layer (last writer wins). +26. **Empty override file:** missing `.lancedb-mcp.yml` → no error, + no overrides applied. +27. **Malformed override:** YAML with unknown framework value → + rejected at load time with a clear error message that mentions the + bad key. +28. **Brownfield doesn't affect built-ins:** vanilla `@GetMapping` + fixture still yields the same Routes whether or not + `route_overrides` is present. +29. **Determinism:** running twice over the same fixture produces + byte-identical Route ids. +30. **`graph_meta.routes_from_brownfield_pct`** matches the + fixture-counted percentage of brownfield-sourced routes. + +### 6. 
PR-A3 Definition of done + +- [ ] All 12 brownfield fixtures pass. +- [ ] PR description cites line numbers from + `PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md` (Fix 1 — meta + chain, Fix 2 — iterative closure, Fix 6 — sorted iteration) to + prove the implementation followed the existing pattern. +- [ ] No new file-loading code in `_load_brownfield_overrides` — + only new parsing branches. +- [ ] `graph_meta.routes_from_brownfield_pct` reported on + bank-chat-system in PR description. +- [ ] `README.md` brownfield section extended to document + `route_overrides` and `@CodebaseRoute`. + +## PR-A3 implementation step list + +| # | Step | File(s) | Done when | +| - | ------------------------------------------------------------- | --------------------- | ---------------------------------------- | +| 1 | Read `PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md` end-to-end | (no code) | implementer notes Fix 1, 2, 6 in PR desc | +| 2 | Add `RouteHint` + extend `BrownfieldOverrides` | `graph_enrich.py` | dataclass imports clean | +| 3 | Extend `_load_brownfield_overrides` for `route_overrides:` | `graph_enrich.py` | YAML parses without error | +| 4 | Implement `resolve_routes_for_method` (5-layer) | `graph_enrich.py` | fixtures 19–27 pass | +| 5 | Add `@CodebaseRoute` / `@CodebaseRoutes` detection | `ast_java.py` | fixtures 22, 23, 24 pass | +| 6 | Wire resolver into `pass4_routes` | `build_ast_graph.py` | brownfield routes appear in graph | +| 7 | Extend stats + `graph_meta` | `build_ast_graph.py` | `routes_from_brownfield_pct` populated | +| 8 | Add fixture annotations | `tests/fixtures/` | files exist | +| 9 | All 12 fixture tests | `tests/test_brownfield_routes.py` | pytest green | +| 10 | Update `README.md` | `README.md` | manual review | + +--- + +# PR-B — B4 `analyze_pr` MCP tool + +**Goal:** Add `analyze_pr(diff_unified: str)` MCP tool that maps a unified +diff to changed `Symbol` ids, runs `impact_analysis` on them, returns a +risk score and human-readable 
summary. No graph schema changes.
+
+## File-by-file changes
+
+### 1. New module: `pr_analysis.py`
+
+```python
+from dataclasses import dataclass
+
+@dataclass
+class ChangedSymbol:
+    symbol_id: str
+    fqn: str
+    kind: str               # 'method' | 'type' | 'field'
+    change_type: str        # 'added' | 'removed' | 'modified'
+    file: str
+    hunk_lines: list[int]   # affected line numbers in the new file
+
+@dataclass
+class PrRiskReport:
+    changed_symbols: list[ChangedSymbol]
+    blast_radius_total: int                 # sum of `impact_analysis` callers
+    blast_radius_by_symbol: dict[str, int]
+    cross_service_callers: int              # sum of callers in a different microservice (uses CALLS only — HTTP_CALLS arrives with B2b)
+    routes_touched: list[str]               # Route.id values for any EXPOSES the changed symbols carry
+    risk_score: float                       # 0.0–1.0; see formula below
+    risk_band: str                          # 'low' | 'medium' | 'high'
+    notes: list[str]                        # human-readable bullets
+
+def parse_unified_diff(diff_text: str) -> list["DiffHunk"]: ...
+def map_hunks_to_symbols(graph, hunks: list["DiffHunk"]) -> list[ChangedSymbol]: ...
+def compute_risk(graph, changed: list[ChangedSymbol]) -> PrRiskReport: ...
+```
+
+Use the [`unidiff`](https://pypi.org/project/unidiff/) PyPI library
+for parsing. Add it to `pyproject.toml` / `requirements.txt`.
+
+#### 1.1 Hunk → symbol mapping
+
+For each `(file, line_range)` hunk:
+1. Find all `Symbol` rows in the graph where
+   `filename = hunk.file AND start_line <= hunk_max AND end_line >= hunk_min`.
+2. Symbols whose entire body is inside the hunk → `change_type='modified'`.
+3. Symbols where only a few lines overlap (e.g. signature change) →
+   still `'modified'`; flag in `notes`.
+4. Added symbols (the old side of the file diff is `--- /dev/null`, or
+   the symbols appear in the new content but not in the graph) →
+   `'added'`. **PR-B does not parse added Java content** — only
+   graph-resident symbols are mapped. New symbols are reported as a
+   count in `notes` ("3 new methods not yet indexed; risk
+   underestimated").
+5. Removed symbols → `'removed'`.
Look up by old line numbers from the + `---` side of the diff. + +#### 1.2 Risk score formula + +``` +risk_score = clip( + 0.4 * normalize(blast_radius_total, 100) + + 0.3 * normalize(cross_service_callers, 20) + + 0.2 * (1.0 if any change is in a public interface method else 0.0) + + 0.1 * normalize(len(routes_touched), 5), + 0, 1) + +risk_band: + < 0.3 → 'low' + < 0.7 → 'medium' + else → 'high' +``` + +`normalize(x, ceiling) = min(x, ceiling) / ceiling`. Constants are +v1 baselines; document in code that they are intentionally simple and +expected to be tuned after real-world use. + +### 2. `kuzu_queries.py` — supporting query + +Add `find_symbols_in_file_range(graph, *, filename, start_line, end_line)` +that returns symbols overlapping the given range. Used by +`map_hunks_to_symbols`. + +### 3. `server.py` — MCP tool wiring + +```python +@mcp.tool() +def analyze_pr(diff_unified: str) -> dict: + """ + Map a unified diff to changed symbols and report blast radius. + Inputs: unified-diff text (e.g. `git diff master`). + Output: PrRiskReport as a JSON-serializable dict. + """ +``` + +Add to `_INSTRUCTIONS`. + +### 4. Tests for PR-B + +#### 4.1 New file: `tests/test_pr_analysis.py` + +31. `parse_unified_diff` on a single-file diff → list with one + `DiffHunk`. +32. `parse_unified_diff` on multi-file diff → list with N hunks. +33. `map_hunks_to_symbols` on bank-chat-system: hand-crafted diff over + `ChatManagementService.enqueue` → returns that symbol with + `change_type='modified'`. +34. `compute_risk` on a leaf private method → `risk_band='low'`, + `blast_radius_total <= 2`. +35. `compute_risk` on a controller method (many callers, on a route) + → `risk_band='high'`, `routes_touched` non-empty. +36. Removed symbol: diff with a `-public void foo()` block → reported + with `change_type='removed'`. +37. Added symbol: diff adds a new method → reported in `notes` with + "not yet indexed". + +#### 4.2 New tests in `tests/test_mcp_tools.py` + +38. 
Smoke: `analyze_pr` over a tiny diff returns a dict with + `risk_score`, `risk_band`, `changed_symbols`. + +### 5. PR-B Definition of done + +- [ ] `pytest` green. +- [ ] `analyze_pr` callable via MCP server. +- [ ] PR description includes a sample run on a real diff against + bank-chat-system, with the resulting JSON report quoted. +- [ ] `unidiff` added to dependencies. +- [ ] `README.md` documents the tool with one example. + +## PR-B implementation step list + +| # | Step | File(s) | Done when | +| - | ------------------------------------------------- | -------------------- | ------------------------------------ | +| 1 | Add `unidiff` to dependencies | `pyproject.toml` | install succeeds | +| 2 | Implement `parse_unified_diff` | `pr_analysis.py` | tests 31, 32 pass | +| 3 | Implement `find_symbols_in_file_range` | `kuzu_queries.py` | unit query works | +| 4 | Implement `map_hunks_to_symbols` | `pr_analysis.py` | tests 33, 36, 37 pass | +| 5 | Implement `compute_risk` | `pr_analysis.py` | tests 34, 35 pass | +| 6 | Wire `analyze_pr` MCP tool | `server.py` | test 38 passes | +| 7 | Update `_INSTRUCTIONS` + `README.md` | `server.py`, `README.md` | manual review | + +--- + +# PR-C — B5 layered ignore patterns + +**Goal:** Replace the single `COMMON_EXCLUDED_PATH_PATTERNS` list with a +layered ignore system: project-root `.lancedb-mcp/ignore` → +nested `.lancedb-mcp/ignore` files (innermost wins) → `.gitignore` +integration. Uses [`pathspec`](https://pypi.org/project/pathspec/). + +**Behavioural compatibility:** the existing +`COMMON_EXCLUDED_PATH_PATTERNS` set becomes the **default top layer** +when no project-root ignore file exists. So projects without any +`.lancedb-mcp/ignore` see no behaviour change. + +## File-by-file changes + +### 1. 
New module: `path_filtering.py` + +```python +from dataclasses import dataclass +from pathlib import Path + +import pathspec + +@dataclass +class IgnoreLayer: + root: Path # the directory this layer applies to + spec: pathspec.PathSpec + source: str # 'builtin_default' | 'project_root' | 'nested' | 'gitignore' + +class LayeredIgnore: + def __init__(self, project_root: Path, *, use_gitignore: bool = True): ... + def is_ignored(self, path: Path) -> tuple[bool, IgnoreLayer | None]: ... + def diagnose(self, path: Path) -> str: ... # multi-line explanation; see §3 +``` + +Resolution order (innermost wins; later overrides earlier): +1. `builtin_default` — current `COMMON_EXCLUDED_PATH_PATTERNS`. +2. `project_root` — the `.lancedb-mcp/ignore` file at the project root. +3. `nested` — every `.lancedb-mcp/ignore` file in a subdirectory discovered while + walking; for a given file, the *closest* nested ignore wins. +4. `gitignore` — if `use_gitignore` and a sibling `.gitignore` exists, + merge it as an additional layer. **Negation patterns (`!foo`) are + honoured.** + +`is_ignored(path)` returns the verdict together with the **last layer that produced a match**, or +`(False, None)` if no layer matched. + +### 2. `graph_enrich.py` + `java_index_flow_lancedb.py` — replace direct use + +Find every call site that consumes `COMMON_EXCLUDED_PATH_PATTERNS`: + +```bash +grep -n COMMON_EXCLUDED_PATH_PATTERNS *.py +``` + +Replace with `LayeredIgnore(project_root)` calls. Keep the legacy +constant in `path_filtering.py` for the `builtin_default` layer. + +`iter_java_source_files(root, excludes)` becomes +`iter_java_source_files(root, *, ignore: LayeredIgnore)`. The old signature +is deprecated; provide a compatibility shim that builds a `LayeredIgnore` +from the legacy `excludes` list for one release, with a +`DeprecationWarning`. + +### 3. Diagnostics — `diagnose_ignore` MCP tool + +```python +@mcp.tool() +def diagnose_ignore(path: str) -> dict: + """ + Explain whether `path` is ignored and which layer made the decision. + Returns: + { + "ignored": bool, + "layer": "builtin_default" | "project_root" | ... 
| None, + "matching_pattern": "**/*.class" | None, + "explanation": "Excluded by .lancedb-mcp/ignore at /repo/svc-a (line 4): **/build/**" + } + """ +``` + +Useful for users debugging "why is this file missing from the graph". + +### 4. Tests for PR-C + +#### 4.1 New file: `tests/test_path_filtering.py` + +39. Builtin default: `Foo.class` is ignored when no other layer matches. +40. Project-root override: `.lancedb-mcp/ignore` containing + `!**/Foo.class` un-ignores it. +41. Nested ignore: nested `.lancedb-mcp/ignore` further down adds + `**/Generated*.java`; only files under that nested root are + ignored, files in siblings are not. +42. Innermost wins: nested ignore re-includes (`!**/Generated*.java`) + something the project-root ignored. +43. `.gitignore` integration: `.gitignore` at repo root with `**/build/` + ignores `build/` even without `.lancedb-mcp/ignore`. +44. `.gitignore` with `use_gitignore=False` → no effect. +45. `diagnose` output for a file matched by nested layer: explanation + cites the nested ignore file path and the pattern line number. +46. `is_ignored` on a path *outside* the project root → `False`, + `None`. + +#### 4.2 Extension to existing tests + +47. `tests/test_lancedb_e2e.py` — add a fixture project with a custom + `.lancedb-mcp/ignore` and assert the indexed file count differs + accordingly. + +#### 4.3 New MCP tool test + +48. `tests/test_mcp_tools.py` — smoke for `diagnose_ignore`. + +### 5. PR-C Definition of done + +- [ ] `pytest` green; legacy `COMMON_EXCLUDED_PATH_PATTERNS` projects + see zero behaviour change. +- [ ] `diagnose_ignore` MCP tool works. +- [ ] PR description includes before/after file count on a project + that uses a `.lancedb-mcp/ignore` to exclude generated code. +- [ ] `pathspec` added to dependencies. +- [ ] `README.md` has a new "Ignore patterns" section. 
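
The layering rules for `LayeredIgnore` (ordered layers, last match wins, `!` re-includes, nested layers only cover their own subtree) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the planned implementation: `SketchLayer` and `decide` are hypothetical names, and `fnmatchcase` stands in for `pathspec`, so the glob semantics here are looser than real gitignore matching (for example, `*` also matches `/`).

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SketchLayer:
    """Hypothetical stand-in for IgnoreLayer, holding raw glob strings."""
    source: str                # 'builtin_default' | 'project_root' | 'nested' | ...
    root: PurePosixPath        # directory this layer applies to
    patterns: tuple[str, ...]  # a leading '!' re-includes a path


def decide(
    path: PurePosixPath, layers: list[SketchLayer]
) -> tuple[bool, SketchLayer | None, str | None]:
    """Return (ignored, deciding_layer, deciding_pattern); the last match wins."""
    ignored, winner, winning_pattern = False, None, None
    for layer in layers:  # ordered outermost first, innermost last
        try:
            rel = path.relative_to(layer.root).as_posix()
        except ValueError:
            continue  # this layer's root does not contain the path
        for pattern in layer.patterns:
            negated = pattern.startswith("!")
            glob = pattern[1:] if negated else pattern
            if fnmatchcase(rel, glob):
                ignored, winner, winning_pattern = (not negated), layer, pattern
    return ignored, winner, winning_pattern


# Example layer stack (assumed paths and patterns, for illustration only).
repo = PurePosixPath("/repo")
builtin = SketchLayer("builtin_default", repo, ("*.class",))
project = SketchLayer("project_root", repo, ("!*Foo.class",))
nested = SketchLayer("nested", repo / "svc-a", ("*Generated*.java",))
layers = [builtin, project, nested]
```

Returning the deciding layer and pattern alongside the verdict is what would let a `diagnose`-style helper explain a decision without re-running the match.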
+ +## PR-C implementation step list + +| # | Step | File(s) | Done when | +| - | ----------------------------------------------------- | ------------------------ | -------------------------------------- | +| 1 | Add `pathspec` dependency | `pyproject.toml` | install succeeds | +| 2 | Implement `LayeredIgnore` | `path_filtering.py` | tests 39–46 pass | +| 3 | Implement `diagnose_ignore` helper | `path_filtering.py` | test 45 passes | +| 4 | Replace `COMMON_EXCLUDED_PATH_PATTERNS` call sites | `graph_enrich.py`, `java_index_flow_lancedb.py` | grep returns only the canonical definition | +| 5 | Compatibility shim for old `iter_java_source_files` | `path_filtering.py` | deprecation warning emitted, tests still pass | +| 6 | Wire `diagnose_ignore` MCP tool | `server.py` | test 48 passes | +| 7 | Extend `tests/test_lancedb_e2e.py` | `tests/test_lancedb_e2e.py` | test 47 passes | +| 8 | Update `README.md` | `README.md` | manual review | + +--- + +# Cross-PR risks (re-stated from the proposal) + +| # | Risk | Severity | Mitigation | +| -- | ------------------------------------------------------------------------------------------ | -------- | ------------------------------------------------------------------------------------------- | +| 1 | `_normalize_path` regression breaks future B6 matcher | High | PR-A1 must include round-trip tests on `path_template ↔ path_regex` so B2b inherits a stable contract. | +| 2 | Brownfield divergence from role/capability resolver | High | PR-A3 implementer must cite line numbers from `PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md` in PR description. | +| 3 | `unidiff` library quirks with binary diffs / renames | Medium | PR-B explicitly skips binary diffs and reports renames as `notes`, not `changed_symbols`. | +| 4 | `pathspec` deviation from gitignore semantics on edge cases | Medium | PR-C uses `pathspec.GitIgnoreSpec` (not a generic `PathSpec` built with the `gitwildmatch` factory) for the `gitignore` layer specifically. 
| +| 5 | `routes_from_brownfield_pct` ambiguous when both built-in and brownfield contribute | Low | Define as: % of final routes whose `resolution_strategy ∈ {layer_b_ann, layer_a_meta, layer_c_source, layer_b_fqn}`. Document in code. | + +# Out of scope (tracked elsewhere) + +- **B2b + B6** — imperative HTTP/async edges + cross-service matcher. + See `propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md`. Do not pre-implement + any caller-side detection here. +- **Microservice-scoped CALLS resolution** — the correctness gap in + `_lookup_method_candidates`. Orthogonal small PR; tracked in + `propose/TIER1-COMPLETION-PROPOSE.md` §10. +- **B7 Louvain communities**, **B8 dead code**, **B3 runtime traces** — + separate proposals. + +# Done-definition (whole plan) + +1. PRs A1, A2, A3, B, C all merged in order. +2. Ontology version `5` on bank-chat-system after rebuild. +3. `pytest` green at every commit. +4. `graph_meta` includes `routes_total`, `exposes_total`, + `routes_by_framework`, `routes_resolved_pct`, + `routes_from_brownfield_pct`. +5. New MCP tools live: `list_routes`, `find_route_handlers`, + `get_route_by_path`, `analyze_pr`, `diagnose_ignore`. +6. `README.md` updated for routes, brownfield routes, `analyze_pr`, + layered ignores. +7. Each PR's description quotes the relevant stats from a manual run + on bank-chat-system as evidence. diff --git a/propose/TIER1-COMPLETION-PROPOSE.md b/propose/TIER1-COMPLETION-PROPOSE.md new file mode 100644 index 0000000..131238e --- /dev/null +++ b/propose/TIER1-COMPLETION-PROPOSE.md @@ -0,0 +1,960 @@ +# Tier 1 completion — active proposal + +Status: **active — ready for planning**. Pairs with the borrow guide +[`reports/what-to-borrow-from-cmm.md`](../reports/what-to-borrow-from-cmm.md) +and follows on from the completed +[`propose/completed/CALL-GRAPH-PROPOSE.md`](completed/CALL-GRAPH-PROPOSE.md). + +This proposal closes out **Tier 1 of the borrow guide** within the +**static-analysis scope**. 
It explicitly defers items that require +runtime data (B3) and items already shipped (B1). + +--- + +## 1. Why now + +The call-graph layer (intra-JVM `CALLS`) is in. With +`confidence`-scored static edges plus `EXTENDS` / `IMPLEMENTS` / +`INJECTS`, the graph captures **every wire that lives inside one JVM**. + +What it still cannot answer: + +- *"List all controller routes under `/api/v1/`"* / *"What endpoints + does `user-service` expose?"* — endpoints are only present today as + `role=CONTROLLER` symbols with no path/method metadata. The Spring + annotation tree carries the answer; we just don't surface it. +- *"What's the blast radius of this PR?"* — `impact_analysis` exists + for symbols, but not for a `git diff`. The reverse-closure machinery + is already there; only the diff-to-symbol mapper is missing. +- *"Why does indexing miss / over-include files in this monorepo?"* — + the current single-list of glob excludes + (`COMMON_EXCLUDED_PATH_PATTERNS`) ignores `.gitignore` content and has + no per-project override. + +These are the three remaining static-analysis items in Tier 1: **B2a** +(`Route` declarations), **B4** (`analyze_pr` with risk score), **B5** +(layered ignores). + +--- + +## 2. Scope of this proposal + +### Static-analysis remit + +This proposal stays **inside the static-analysis layer**: source code, +AST, and FQN-keyed graph state. No runtime data, no cross-process +matching. 
+ +### Tier 1 status snapshot + +| Borrow item | Status | Where | +|---|---|---| +| **B1** — confidence-scored `CALLS` cascade | ✅ done | `build_ast_graph.py`: `_resolve_receiver_type`, `_resolve_and_emit_call`; `kuzu_queries.py` `CALLS` schema with `confidence` / `strategy` / `source` | +| **B2** — `Route` as a first-class node | ❌ **not started** | Spring annotations are detected for *role classification* only (`ast_java.py` `_ROLE_BY_ANNOTATION`); no `Route` node, no `EXPOSES` / `HTTP_CALLS` / `ASYNC_CALLS` rel, no path/method extraction | +| **B3** — runtime trace ingestion | ⛔ **out of scope** (runtime data, not static) | — | +| **B4** — `analyze_pr` with risk score | ❌ **not started** | `impact_analysis` exists for a single symbol; no diff parser, no risk formula, no MCP tool | +| **B5** — layered ignore patterns | ❌ **not started** | `java_index_v1_common.py` `COMMON_EXCLUDED_PATH_PATTERNS` is a single hardcoded list; no `.gitignore` walk, no project-level override file | + +Verified state — Cursor can re-confirm by `grep`-ing the files cited above. + +### B2 split — declarations now, edges later (with B6) + +Originally B2 was a single feature: `Route` node + `EXPOSES` (server +side) + `HTTP_CALLS` / `ASYNC_CALLS` (client side). Designing the +client-side edges in isolation forces decisions that B6 (cross-service +matching) would need to revisit: + +- The **dominant case** for `HTTP_CALLS` is *cross-JVM* (Feign clients, + external base-URL `RestTemplate`, Kafka producer→consumer). B6 is the + matcher that joins those calls to peer services' `Route` nodes. +- Feign's `name="user-service"` argument is a service-registry join key + for B6, not just a string property. +- `confidence` semantics for `HTTP_CALLS` to a *phantom* `Route` + flip from "low — unknown target" (no B6) to "high — resolved + cross-service" (with B6). Choosing one without the other locks in a + schema that needs to change. 
+- `path_template` canonicalization needs a join-friendly normal form + shared by *exposers* and *callers*; B6 may discover style mismatches + between services that drive that decision. + +So B2 is split: + +- **B2a — `Route` + `EXPOSES` only** (this proposal). Pure server-side + declarations: parse `@GetMapping` / `@RequestMapping` / + `@KafkaListener` / etc., create `Route` nodes, link them to declaring + methods via `EXPOSES`. **No `HTTP_CALLS` / `ASYNC_CALLS` in this + proposal.** +- **B2b + B6 together** — the imperative side (`HTTP_CALLS` / + `ASYNC_CALLS`) and the cross-service matcher land in a follow-on + proposal. They share design decisions (path canonicalization, join + keys, edge direction, cross-service `confidence` semantics) that are + cleanest to make once. + +What B2a unlocks immediately: + +- *"List all controller routes under `/api/v1/`"* — answerable. +- *"What endpoints does `user-service` expose?"* — answerable. +- *"Show me the route for this method"* — answerable. + +What B2a sets up cleanly: the join target for B2b/B6 is *already in the +graph* before the matcher is written. + +### What this proposal delivers + +Three independent sub-features, **landable as three separate PRs in any +order** (no shared code paths): + +1. **B2a — `Route` nodes + `EXPOSES`** (§4) +2. **B4 — `analyze_pr` MCP tool with risk score** (§5) +3. **B5 — Layered ignore patterns** (§6) + +### Explicit non-goals + +- **Imperative-side HTTP / async edges (`HTTP_CALLS` / `ASYNC_CALLS`).** + Deferred to the B2b + B6 proposal. See §10. +- **Cross-service matcher (B6 itself).** Same proposal as B2b — depends + on the `Route` node B2a delivers. +- **Path-template canonicalization beyond simple `{var}` capture → + template + regex.** Good enough for declarations alone; the full + normal form discussion belongs in B2b/B6. +- **AOP / proxy-aware resolution** of route handlers. 
Confidence-flagged + but unresolved remains the right behaviour for `@Async`, + `@Transactional` self-invocations, etc. Runtime traces (B3) are the + right fix. +- **Microservice classpath isolation in `CALLS` resolution.** Tracked + separately ([noted in this session]) — does not block Tier 1. + +--- + +## 3. Principle: additive evolution + +Same posture as `CALL-GRAPH-PROPOSE.md`. Nothing existing is removed. + +### What stays exactly as-is + +- LanceDB tables, `JavaLanceChunk` schema, CocoIndex flow. **No re-index + required for B4 / B5.** B2a requires a Kuzu rebuild only. +- All existing MCP tool signatures. New tools (`find_routes`, + `analyze_pr`) are additive. +- `CALLS` schema and its three resolution passes. B2a emits its rels in + a new pass that runs after `pass3_calls`. +- **Brownfield resolver execution order in `graph_enrich.py`** stays + exactly as documented (built-in → Layer B annotations → Layer A meta + chain → Layer C in-source → Layer B fqn). B2a hooks into this layered + composition rather than running its own parallel resolution. See §4.6. + +### What gets added on top + +| Sub-feature | AST / parsing | Graph builder | Kuzu schema | Queries / MCP | Ontology | +|---|---|---|---|---|---| +| **B2a Routes** | Annotation-arg extraction (path / method / topic / queue) in `ast_java.py` | New `pass4_routes` | `Route` node + `EXPOSES` rel | `find_routes` MCP tool; `trace_flow` `follow_routes` flag | 4 → 5 | +| **B4 `analyze_pr`** | none | none | none | New `analyze_pr` tool; new `_diff_to_symbols.py` helper | none | +| **B5 Ignores** | none | New `path_filter.py` (gitignore-spec via `pathspec`) | none | `graph_meta` exposes the resolved ignore stack for diagnostics | none | + +--- + +## 4. B2a — `Route` + `EXPOSES` (declarations only) + +### 4.1 Goal + +Turn endpoint declarations into graph-traversable metadata so that +listing, filtering, and per-method route lookup are first-class. This +is **server-side only** in this phase. 
+ +### 4.2 Annotation surface (Spring 6.x focus) + +Three families. **Each populates the same `Route` node label**, with +`framework` distinguishing them. + +| Family | Annotations | `framework` | `kind` | +|---|---|---|---| +| **HTTP server** (Spring MVC + WebFlux) | `@RequestMapping`, `@GetMapping`, `@PostMapping`, `@PutMapping`, `@DeleteMapping`, `@PatchMapping` | `spring_mvc` | `http_endpoint` | +| **HTTP client (declarative)** | `@FeignClient` (class-level base path / `name` / `url`) + same mappings on its methods | `feign` | `http_consumer` | +| **Async listener** | `@KafkaListener`, `@RabbitListener`, `@JmsListener`, `@StreamListener` | `kafka` / `rabbitmq` / `jms` / `stream` | `kafka_topic` / `rabbit_queue` / `jms_destination` / `stream_binding` | + +Why Feign declarations are in scope here even though Feign callers +aren't: a `@FeignClient` interface is a **declarative endpoint +description** — it tells us "this microservice expects to consume +`GET /users/{id}` on `user-service`". It's structurally a route +declaration. The *imperative* `userClient.findById(123)` call is +B2b/B6's job; B2a stops at the interface. 
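
One way to read the table above is as a lookup from annotation simple name to `(framework, kind)`, with the Feign row acting as a class-level modifier on the HTTP mappings. The sketch below is illustrative only: `RouteSemantics`, `BUILTIN_ROUTE_ANNOTATIONS`, and `classify` are hypothetical names that do not come from the proposal, and the values simply mirror the table rows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RouteSemantics:
    """Hypothetical record for one row of the annotation table."""
    framework: str
    kind: str
    implicit_method: str | None = None  # HTTP verb implied by the annotation, if any


# Hypothetical lookup keyed by annotation simple name; values follow the table.
BUILTIN_ROUTE_ANNOTATIONS: dict[str, RouteSemantics] = {
    # @RequestMapping carries no implicit verb; it comes from its `method` attribute.
    "RequestMapping": RouteSemantics("spring_mvc", "http_endpoint"),
    "GetMapping": RouteSemantics("spring_mvc", "http_endpoint", "GET"),
    "PostMapping": RouteSemantics("spring_mvc", "http_endpoint", "POST"),
    "PutMapping": RouteSemantics("spring_mvc", "http_endpoint", "PUT"),
    "DeleteMapping": RouteSemantics("spring_mvc", "http_endpoint", "DELETE"),
    "PatchMapping": RouteSemantics("spring_mvc", "http_endpoint", "PATCH"),
    "KafkaListener": RouteSemantics("kafka", "kafka_topic"),
    "RabbitListener": RouteSemantics("rabbitmq", "rabbit_queue"),
    "JmsListener": RouteSemantics("jms", "jms_destination"),
    "StreamListener": RouteSemantics("stream", "stream_binding"),
}


def classify(
    annotation_simple_name: str, *, inside_feign_client: bool = False
) -> RouteSemantics | None:
    """Map an annotation to route semantics, or None if it declares no route."""
    base = BUILTIN_ROUTE_ANNOTATIONS.get(annotation_simple_name)
    if base is None:
        return None
    if inside_feign_client and base.kind == "http_endpoint":
        # Same mapping annotations on a @FeignClient interface describe a consumer.
        return RouteSemantics("feign", "http_consumer", base.implicit_method)
    return base
```

Keeping the Feign case as a modifier rather than as separate dictionary entries reflects the table's wording ("same mappings on its methods"); a real implementation may well choose a different shape.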
+ +### 4.3 Schema additions + +```sql +-- New node label +CREATE NODE TABLE Route( + id STRING PRIMARY KEY, -- stable: hash(framework, normalized_path|topic, method, microservice) + kind STRING, -- 'http_endpoint' | 'http_consumer' | 'kafka_topic' | 'rabbit_queue' | 'jms_destination' | 'stream_binding' + framework STRING, -- 'spring_mvc' | 'webflux' | 'feign' | 'kafka' | 'rabbitmq' | 'jms' | 'stream' + method STRING, -- 'GET' | 'POST' | … | '' for async + path STRING, -- '/api/users/{id}' | '' for async + path_template STRING, -- normalized: '/api/users/{}' (curly captures collapsed) + path_regex STRING, -- '^/api/users/[^/]+/?$' — provided here so B2b/B6 can reuse it without re-deriving + topic STRING, -- async only + broker STRING, -- async only + feign_name STRING, -- @FeignClient(name=…) — empty for non-Feign; B2b/B6 will use it as the join key + feign_url STRING, -- @FeignClient(url=…) — empty when name-based + microservice STRING, + module STRING, + filename STRING, + start_line INT64, + end_line INT64, + resolved BOOLEAN -- false if path/topic was unparseable (SpEL, constant ref, etc.) +); + +CREATE REL TABLE EXPOSES( + FROM Symbol TO Route, + confidence DOUBLE, -- 1.0 annotation-derived literal | 0.85 SpEL ${prop} | 0.7 constant ref + strategy STRING -- 'annotation' | 'spel' | 'constant_ref' | 'feign_inherit' +); +``` + +`Route.id` is a stable hash so re-runs produce the same id and +rels are not duplicated. Including `microservice` in the hash means +*"`/api/users` exposed by service A"* and *"`/api/users` exposed by +service B"* are two different `Route` nodes — exactly the behaviour +B2b/B6 will need. + +`path_regex` is precomputed at extraction time and stored on the node +so the eventual B2b/B6 matcher does not need to re-derive it; this +keeps regex generation in **one** code path. + +**Edge direction is `(Symbol)-[:EXPOSES]->(Route)`.** The rationale: a +method exposes a route; the route is a destination. 
This makes the +B2b/B6 traversal `(caller)-[:HTTP_CALLS]->(Route)<-[:EXPOSES]-(handler)` +work without reversing direction at any hop. Locking it now. + +### 4.4 Extraction algorithm + +New `pass4_routes`, runs after `pass3_calls`. **Single phase** in this +proposal (B2a declarations only): + +For every `MethodDecl` whose enclosing `TypeDecl` carries one of the +trigger annotations (see §4.6 for how those are *resolved* — not just +the literal Spring set): + +1. Collect class-level base path. For Spring MVC controllers: + `@RequestMapping("/api/v1")`. For Feign: `@FeignClient(name=…, + url=…, path=…)`. For Kafka class listeners: + `@KafkaListener(topics=…)` at class level. +2. Collect method-level mapping: + - `value` / `path` — string or string array; arrays produce one + `Route` per element. + - `method` — for `@RequestMapping` only. `@GetMapping` etc. carry an + implicit method. + - `topics` / `queues` / `destination` for async. +3. Compose final path = `class_base + method_path` (handle leading / + trailing slashes; `class_base ?? "" + method_path ?? "/"` collapsed + via `posixpath.normpath`). +4. **Normalize path** (deterministic, source-of-truth for B2b/B6): + - `/api/users/{id}` → `path_template = "/api/users/{}"`, + `path_regex = "^/api/users/[^/]+/?$"`. + - Trailing slash variants collapsed (`/foo` and `/foo/` produce the + same template; regex allows both). + - Multiple `{}` segments handled left-to-right. +5. **Resolve annotation argument values** through three strategies, in + order: + - **Literal string** — `confidence=1.0`, `strategy='annotation'`. + - **SpEL `${app.api.base}`** — emit a `Route` with the literal SpEL + placeholder kept in `path` (e.g. `/${app.api.base}/users`), + `path_template` and `path_regex` left empty, + `strategy='spel'`, `confidence=0.85`, `resolved=false`. Future + enhancement: read property files. 
+ - **Constant reference** (`Endpoints.USERS`) — emit a `Route` with + the unresolved expression in `path`, `strategy='constant_ref'`, + `confidence=0.7`, `resolved=false`. Future enhancement: walk the + constant. +6. Emit `Route` node + `EXPOSES` edge from the method's `Symbol.id` to + `Route.id`. For Feign interfaces, emit one extra `EXPOSES` edge per + method using `strategy='feign_inherit'` so per-method route lookup + works without traversing the type-method-class triangle. + +The phase is purely additive — it does not consult or modify +`tables.calls_rows`. + +### 4.5 No imperative side here (deliberately) + +This proposal does **not**: + +- Visit `RestTemplate` / `WebClient` / `KafkaTemplate.send` / + `StreamBridge.send` call sites. +- Emit `HTTP_CALLS` or `ASYNC_CALLS` edges. +- Match call-site URL literals against `Route.path_regex`. +- Walk Feign-interface method invocations to their `EXPOSES` route. +- Handle WebClient builder-chain extraction. + +All of the above belong in B2b + B6, where the cross-service join is +the primary design constraint. + +### 4.6 Brownfield integration (load-bearing — read carefully) + +**Why this matters for B2a:** legacy and vendor codebases routinely +wrap Spring stereotypes (`@AcmeRestController` meta-annotated with `@RestController`) +or use proprietary annotations (`@HttpEndpoint`, `@MessageHandler`) +that this proposal's hardcoded annotation list does not know about. The +existing brownfield system already solves this for *roles* / +*capabilities*; B2a must extend the same machinery, **not** introduce a +parallel one. + +The user has explicitly called this out as critical for their use case +("legacy projects, if auto resolve did not work properly"). + +#### 4.6.1 What already exists (for context) + +`graph_enrich.py::resolve_role_and_capabilities` runs five layers in +this order; **last to apply wins**: + +1. Built-in inference (hardcoded annotation → role / capability map). +2. 
**Layer B annotations** — `role_overrides.annotations` and + `role_overrides.capabilities` from `.lancedb-mcp.yml`. +3. **Layer A meta-chain** — automatic walk over project `@interface` + declarations; resolves `@AcmeService → @Service → SERVICE` + transitively. +4. **Layer C in-source** — `@CodebaseRole` / `@CodebaseCapability` + stub annotations on a class. +5. **Layer B fqn** — `role_overrides.fqn` per-type config (highest + priority). + +#### 4.6.2 What B2a adds — `route_overrides` + +A new top-level key in `.lancedb-mcp.yml`, shaped to match the existing +`role_overrides` style so the brownfield surface stays one mental model: + +```yaml +microservice_roots: [] + +role_overrides: + annotations: + AcmeService: SERVICE + fqn: + com.legacy.OrderProcessor: + role: SERVICE + +# NEW — B2a +route_overrides: + # Layer B annotations: simple-name → route declaration semantics + annotations: + AcmeRestController: + framework: spring_mvc + kind: http_endpoint + # implies "this is a class-level controller; methods inside use + # path_attribute and method_attribute (or @GetMapping etc.) 
below" + class_path_attribute: basePath # @AcmeRestController(basePath="/api/x") + AcmeRoute: + framework: spring_mvc + kind: http_endpoint + path_attribute: value # @AcmeRoute("/users") + method_attribute: httpMethod # @AcmeRoute(value="/users", httpMethod="GET") + method_default: GET # used when method_attribute is absent + AcmeKafkaTopic: + framework: kafka + kind: kafka_topic + topic_attribute: name + CompanyHttpEndpoint: + framework: spring_mvc + kind: http_endpoint + path_attribute: url + method_attribute: verb + + # Layer C-equivalent for routes — direct per-FQN declaration + fqn: + com.legacy.SoapBridge#process(Request): + framework: spring_mvc + kind: http_endpoint + path: /legacy/soap/process + method: POST + com.legacy.JmsHandler: + framework: jms + kind: jms_destination + topic: legacy.events.in +``` + +Two layers, mirroring the role override design: + +- **`route_overrides.annotations`** — annotation simple name → "treat + this annotation as a route declaration with these argument + conventions". Resolved before the meta-chain walk. +- **`route_overrides.fqn`** — per-method or per-type FQN → fully + specified `Route` declaration. Highest priority. The user can pin a + route exactly when no annotation pattern matches at all. + +#### 4.6.3 New in-source stub — `@CodebaseRoute` + +Mirrors `@CodebaseRole` / `@CodebaseCapability` so the "last-resort +source stub" pattern carries over. The existing brownfield doc already +introduces `@CodebaseRole` as a way to fix things without YAML edits; +`@CodebaseRoute` is its route equivalent. 
+ +```java +package com.example.rag; // any package; matched by simple name only + +import java.lang.annotation.*; + +public enum CodebaseRouteFramework { + SPRING_MVC, WEBFLUX, FEIGN, KAFKA, RABBITMQ, JMS, STREAM +} + +public enum CodebaseRouteKind { + HTTP_ENDPOINT, HTTP_CONSUMER, KAFKA_TOPIC, RABBIT_QUEUE, + JMS_DESTINATION, STREAM_BINDING +} + +@Target(ElementType.METHOD) +@Retention(RetentionPolicy.SOURCE) +@Repeatable(CodebaseRoutes.class) +public @interface CodebaseRoute { + CodebaseRouteFramework framework(); + CodebaseRouteKind kind(); + String path() default ""; + String method() default ""; // GET/POST/... ; empty for async + String topic() default ""; // async only +} + +@Target(ElementType.METHOD) +@Retention(RetentionPolicy.SOURCE) +public @interface CodebaseRoutes { + CodebaseRoute[] value(); +} +``` + +Method-level only (routes are method-anchored), `@Repeatable` because +one method can legitimately serve multiple paths. + +#### 4.6.4 Resolution order — full table for B2a + +For every method, the route extractor runs **these layers in order**; +last to apply wins (consistent with `resolve_role_and_capabilities`). +Multiple paths produce multiple `Route` nodes; layers don't *replace* +each other's emissions, they *add* and the latest layer's +`(framework, kind, path|topic, method)` overrides any prior identical +tuple. + +| # | Layer | Source | What it produces | Confidence | +|---|---|---|---|---| +| 1 | **Built-in annotation map** | hardcoded list in `pass4_routes` (Spring MVC + WebFlux + Feign + Kafka + Rabbit + JMS + Stream) | one `Route` per resolved literal | 1.0 (literal), 0.85 (SpEL), 0.7 (constant) | +| 2 | **Layer B route_overrides.annotations** | `.lancedb-mcp.yml` | one `Route` per matching annotation, using the configured `path_attribute` / `method_attribute` etc. 
| 1.0 (literal), 0.85 (SpEL), 0.7 (constant) | +| 3 | **Layer A meta-chain** | `graph_enrich.collect_annotation_meta_chain` (existing function, reused) | resolves `@AcmeMapping` → `@GetMapping` transitively, then runs Layer 1 logic | 1.0 (literal) etc. | +| 4 | **Layer C in-source `@CodebaseRoute`** | source code | one `Route` per `@CodebaseRoute` instance, fully specified | 1.0 | +| 5 | **Layer B route_overrides.fqn** | `.lancedb-mcp.yml` | one `Route` per FQN entry, fully specified; **and** suppresses any conflicting earlier emissions for the same FQN | 1.0 | + +Crucial design points: + +- **Layer A reuses the existing `collect_annotation_meta_chain` + function.** No second filesystem walk, no parallel index. The + function already returns `simple_name → frozenset[built-in + simple names reachable]`. B2a passes its own annotation set + (`{"GetMapping", "PostMapping", "RequestMapping", "FeignClient", + "KafkaListener", …}`) and asks "for annotation X, does its meta-chain + reach any of these?". Single source of truth, exactly as the existing + brownfield architecture mandates. + +- **`route_overrides.annotations` is checked BEFORE the meta-chain + walk** — same precedence rule as `role_overrides`. Explicit user + config wins over automatic resolution. + +- **`route_overrides.fqn` is the strongest layer** and is the user's + escape hatch when meta-chain + Layer C + literal annotations all fail + (or, more importantly, produce wrong results that the user wants to + override). + +- **Conflict resolution within a layer**: if the same (framework, kind, + path|topic, method) tuple is produced twice in the same layer, dedup + silently. If two layers produce conflicting tuples for the same + method, the later layer wins and the earlier layer's `Route` is not + emitted (logged at INFO). + +- **Validation on YAML load**: same pattern as `role_overrides` — + unknown `framework` / `kind` strings are dropped with a stderr + warning. 
Schema validated in `graph_enrich._load_brownfield_overrides` + (extend the existing function rather than write a parallel one). + +#### 4.6.5 Plumbing changes + +- `graph_enrich.BrownfieldOverrides` dataclass gains two fields: + ```python + route_annotation_specs: dict[str, RouteAnnotationSpec] + route_fqn_specs: dict[str, RouteFqnSpec] + ``` +- `graph_enrich._load_brownfield_overrides` parses `route_overrides:` + alongside `role_overrides:`. Same YAML file, same load function, same + cache. +- New `graph_enrich.resolve_routes_for_method(method, type, *, + overrides, meta_chain) -> list[ResolvedRoute]` — the route analogue + of `resolve_role_and_capabilities`, runs the five layers above and + returns all emitted `Route` declarations for one method. +- `pass4_routes` in `build_ast_graph.py` calls + `resolve_routes_for_method` for every member; emits `Route` + `EXPOSES` + rows from the result. +- `java_ontology.py` gains `VALID_ROUTE_FRAMEWORKS` / `VALID_ROUTE_KINDS` + for the validator (mirrors `VALID_ROLES`). + +### 4.7 MCP surface + +One new tool, read-only, filters by microservice: + +- `find_routes(framework: str | None, method: str | None, + path_pattern: str | None, microservice: str | None, + kind: str | None) -> list[RouteHit]` + + Each `RouteHit` contains the `Route` row plus the methods that + `EXPOSES` it. + +Plus one optional flag on `trace_flow`: + +- `follow_routes: bool = False`. When true, `EXPOSES` edges count as a + stage transition (so `CONTROLLER -> Route` is visible in one walk). + Default off — preserves existing behaviour. 
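
The `find_routes` filter semantics above can be sketched in memory as follows. This is a minimal illustration, not the planned Kuzu-backed implementation: the trimmed `RouteHit` fields, the example values, and the choice of `re.search` for `path_pattern` are assumptions made for the sketch.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteHit:
    """Hypothetical, trimmed-down stand-in for the RouteHit row."""
    framework: str
    kind: str
    method: str        # '' for async routes
    path: str          # path template, or the topic for async routes
    microservice: str
    exposed_by: tuple[str, ...] = ()  # FQNs of methods with an EXPOSES edge


def find_routes(routes, *, framework=None, method=None, path_pattern=None,
                microservice=None, kind=None):
    """Return the hits that satisfy every filter that is not None."""
    # Assumption: path_pattern is a regex applied with re.search.
    pattern = re.compile(path_pattern) if path_pattern is not None else None
    hits = []
    for route in routes:
        if framework is not None and route.framework != framework:
            continue
        if method is not None and route.method != method:
            continue
        if microservice is not None and route.microservice != microservice:
            continue
        if kind is not None and route.kind != kind:
            continue
        if pattern is not None and not pattern.search(route.path):
            continue
        hits.append(route)
    return hits
```

Treating `None` as "no filter" keeps every argument optional, which matches the all-optional signature listed above.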
+ +`graph_meta` gains: + +- `routes_total` +- `routes_by_framework` +- `routes_by_kind` +- `routes_unresolved_pct` (fraction with `resolved=false`) +- `routes_from_brownfield_pct` (fraction emitted by Layers 2–5; lets + the user verify their overrides are actually being applied) + +### 4.8 Tests (mandatory before merge) + +A new fixture `tests/fixtures/routes_smoke/` with the following files: + +``` +src/main/java/smoke/ + GreetingController.java // @RestController + @GetMapping("/hello/{name}") + OrderController.java // class-level @RequestMapping("/api/v1/orders") + @PostMapping("") + MultiPathController.java // @GetMapping({"/a", "/b"}) — one method, two routes + PathVarController.java // @GetMapping("/users/{id}/posts/{postId}") — two captures + UserClient.java // @FeignClient(name="user-svc", path="/users") + @GetMapping("/{id}") + KafkaConsumer.java // class-level @KafkaListener(topics="orders.created") + RabbitConsumer.java // @RabbitListener(queues="orders.q") + SpelPathController.java // @GetMapping("${app.api.base}/foo") — SpEL case + ConstantRefController.java // @GetMapping(Endpoints.USERS) — constant case +``` + +Plus a brownfield fixture `tests/fixtures/routes_brownfield/`: + +``` +.lancedb-mcp.yml // route_overrides config (see §4.6.2 example) +src/main/java/smoke/ + AcmeController.java // @AcmeRestController(basePath="/legacy") + @AcmeRoute(value="/x", httpMethod="POST") + AcmeMapping.java // @AcmeRestController @interface meta-annotated with @RestController + AcmeMappingUser.java // class using @AcmeMapping (Layer A meta-chain target) + CodebaseRouteUser.java // method with @CodebaseRoute(framework=SPRING_MVC, kind=HTTP_ENDPOINT, path="/legacy/soap", method="POST") + FqnOverrideTarget.java // plain class with no annotations; route_overrides.fqn declares its route +``` + +Test cases (a list, not exhaustive code — implementor fills in): + +**Built-in annotations:** +- `test_get_mapping_emits_route_and_exposes` +- 
`test_class_level_request_mapping_concatenates_with_method_path` +- `test_multi_value_mapping_emits_one_route_per_path` +- `test_path_variable_collapses_to_template_and_regex` +- `test_two_path_variables_handled_left_to_right` +- `test_feign_client_emits_route_with_framework_feign_and_feign_name` +- `test_feign_class_path_concatenates_with_method_path` +- `test_kafka_listener_class_emits_route_with_framework_kafka` +- `test_rabbit_listener_method_emits_route` + +**Resolved values:** +- `test_spel_path_emits_route_with_resolved_false_and_strategy_spel` +- `test_constant_ref_path_emits_route_with_resolved_false_and_strategy_constant_ref` +- `test_route_id_is_stable_across_runs` +- `test_route_id_includes_microservice_so_same_path_in_two_services_is_two_routes` + +**Brownfield (mandatory — not optional):** +- `test_route_overrides_annotations_layer_b_resolves_acme_rest_controller` +- `test_route_overrides_annotations_layer_b_uses_configured_path_attribute` +- `test_route_overrides_annotations_layer_b_uses_method_default_when_attribute_absent` +- `test_meta_chain_layer_a_resolves_user_defined_annotation_to_get_mapping` +- `test_codebase_route_layer_c_emits_fully_specified_route` +- `test_codebase_route_layer_c_repeatable_emits_multiple_routes` +- `test_route_overrides_fqn_layer_b_overrides_annotation_emission` +- `test_route_overrides_fqn_emits_route_for_method_with_no_annotations` +- `test_unknown_framework_in_yaml_dropped_with_warning` +- `test_unknown_kind_in_yaml_dropped_with_warning` +- `test_brownfield_layers_compose_in_documented_order` +- `test_graph_meta_routes_from_brownfield_pct_nonzero_when_overrides_apply` + +**Query surface:** +- `test_find_routes_filters_by_microservice` +- `test_find_routes_filters_by_framework` +- `test_find_routes_filters_by_path_pattern_regex` +- `test_trace_flow_follow_routes_walks_exposes` + +### 4.9 Ontology bump + +`ONTOLOGY_VERSION` 4 → 5. 
Stale graphs must fail loudly on open +(existing N4 guard from the call-graph review applies automatically). + +--- + +## 5. B4 — `analyze_pr` MCP tool + +### 5.1 Goal + +Take a `git diff` (text or `git_ref`), map its line ranges to graph +nodes, run the existing reverse closure, and return a structured +impact + risk report. Single call from a code review or CI gate. + +### 5.2 Inputs and outputs + +```python +@dataclass +class AnalyzePrInput: + diff: str | None = None # raw unified-diff text + base_ref: str | None = None # e.g. 'origin/main' + head_ref: str | None = None # default 'HEAD' + # exactly one of (diff,) or (base_ref/head_ref) required + risk_thresholds: tuple[float, float] = (1.0, 2.5) # (low->medium, medium->high) + max_blast_depth: int = 3 + microservice: str | None = None + min_confidence: float = 0.0 + +@dataclass +class AnalyzePrOutput: + changed_nodes: list[ChangedSymbol] + blast_radius: list[Symbol] # reverse closure unioned across all changed nodes + risk_score: float + risk_level: str # 'low' | 'medium' | 'high' + per_file: list[FileImpact] # for UX rendering + unmapped_hunks: list[UnmappedHunk] # diff lines that didn't map to any node +``` + +### 5.3 Algorithm + +1. **Resolve diff source.** + - If `diff` is set, use as-is. + - Else run `git diff --unified=0 --no-color --no-renames {base_ref}..{head_ref}` + via `subprocess.run` from `LANCEDB_MCP_PROJECT_ROOT`. + - Parse with `unidiff` (well-tested PyPI library; pin in + `requirements.txt`). +2. **Map hunks → symbols.** + For each `(file, line_range)` from each hunk: + - Look up Kuzu `Symbol` rows where `filename = file` + AND `start_line <= range.end` AND `end_line >= range.start`. + - Prefer the smallest enclosing symbol (method > type > file). Ties + broken by `start_line` proximity. + - Hunks that don't map to any symbol go to `unmapped_hunks` (e.g., + `pom.xml`, `README.md`; comment-only changes inside a method are + still mapped to the method). +3. 
**Reverse closure.** + For each changed symbol, run `find_callers(depth=max_blast_depth)` + and union the results into `blast_radius`. Existing `find_callers` + already supports `microservice` and `min_confidence` filters; expose + them on `AnalyzePrInput` as optional pass-throughs. +4. **Risk score.** + ``` + per_node_risk = log10(1 + len(downstream_consumers)) + * role_weight[node.role] + * cross_service_factor + + role_weight: + CONFIG 1.8 + CONTROLLER 1.5 + ENTITY 1.3 + SERVICE 1.2 + FEIGN_CLIENT 1.2 + COMPONENT 1.1 + REPOSITORY 1.0 + MAPPER 0.9 + DTO 0.6 + OTHER 0.7 + + cross_service_factor = 2.0 if changed_nodes span >1 microservice else 1.0 + + risk_score = max(per_node_risk for node in changed_nodes) + ``` + Threshold: `low < 1.0 ≤ medium ≤ 2.5 < high` (overrideable via input). + + The `max` (not sum) keeps a single-controller change rated as high + risk regardless of how many trivial DTO renames sit in the same PR. + +### 5.4 What this tool deliberately doesn't do + +- Run tests. It maps changes; review tooling decides. +- Understand semantic equivalence (renames look like add+delete in the + graph). A future enhancement plugs into git-rename detection + (`git diff -M`). +- Consult LanceDB (vector). Pure graph closure — same semantics as + `impact_analysis`, just diff-driven instead of symbol-driven. + +### 5.5 MCP surface + +``` +analyze_pr( + diff: str | None, + base_ref: str | None = None, + head_ref: str | None = "HEAD", + risk_thresholds: tuple[float, float] = (1.0, 2.5), + max_blast_depth: int = 3, + microservice: str | None = None, + min_confidence: float = 0.0, +) -> AnalyzePrOutput +``` + +Reuses the existing `Symbol` / `CallEdge` projections (no new schema). 
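
The risk formula from §5.3 can be sketched as below. The weights and thresholds are copied from the table above; the `(role, downstream_consumers, microservice)` input shape, the fallback to `OTHER` for unknown roles, and the `0.0` score for an empty change set are assumptions for illustration only.

```python
import math

# Role weights as listed in section 5.3.
ROLE_WEIGHT = {
    "CONFIG": 1.8, "CONTROLLER": 1.5, "ENTITY": 1.3, "SERVICE": 1.2,
    "FEIGN_CLIENT": 1.2, "COMPONENT": 1.1, "REPOSITORY": 1.0,
    "MAPPER": 0.9, "DTO": 0.6, "OTHER": 0.7,
}


def per_node_risk(role, downstream_consumers, cross_service):
    """log10(1 + consumers) * role_weight * cross_service_factor."""
    # Assumption: roles missing from the table fall back to OTHER.
    weight = ROLE_WEIGHT.get(role, ROLE_WEIGHT["OTHER"])
    factor = 2.0 if cross_service else 1.0
    return math.log10(1 + downstream_consumers) * weight * factor


def risk_score(changed_nodes):
    """changed_nodes: iterable of (role, downstream_consumers, microservice)."""
    nodes = list(changed_nodes)
    if not nodes:
        return 0.0  # assumption: an empty change set carries no risk
    # The factor is PR-wide: 2.0 when the changes span more than one service.
    cross_service = len({ms for _, _, ms in nodes}) > 1
    # max, not sum: trivial nodes never dilute or inflate the worst one.
    return max(per_node_risk(role, n, cross_service) for role, n, _ in nodes)


def risk_level(score, thresholds=(1.0, 2.5)):
    """low < t0 <= medium <= t1 < high."""
    low_to_medium, medium_to_high = thresholds
    if score < low_to_medium:
        return "low"
    if score <= medium_to_high:
        return "medium"
    return "high"
```

For example, a controller with 9 downstream consumers scores `log10(10) * 1.5 = 1.5` (medium); the same change in a PR that spans two microservices doubles to `3.0` (high).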
+ +### 5.6 Tests + +Unit tests (no Kuzu round-trip): + +- `test_unidiff_parser_handles_unified_zero_context_diffs` +- `test_hunk_to_symbol_picks_smallest_enclosing_method` +- `test_hunk_to_symbol_falls_back_to_type_when_outside_methods` +- `test_unmapped_hunks_collected_for_pom_xml` +- `test_risk_role_weight_table_keys_match_known_roles` +- `test_cross_service_factor_two_when_changes_span_microservices` +- `test_risk_uses_max_not_sum_across_changed_nodes` +- `test_risk_thresholds_overridable_via_input` + +Round-trip tests against the existing fixture: + +- `test_analyze_pr_with_synthetic_diff_returns_blast_radius` +- `test_analyze_pr_blast_radius_respects_min_confidence` +- `test_analyze_pr_handles_pure_test_only_diff_as_low_risk` (paths + under `src/test/` are excluded from blast radius — they consume but + aren't consumed) + +### 5.7 Performance note + +`analyze_pr` against a 5-microservice repo with a 100-line diff should +finish well under a second — diff parsing + ~10–50 reverse closures, +each already a fast indexed query. + +--- + +## 6. B5 — Layered ignore patterns + +### 6.1 Goal + +Replace the single hardcoded `COMMON_EXCLUDED_PATH_PATTERNS` list with +a layered resolver that respects existing `.gitignore` files **and** +allows project-level overrides. Existing behaviour stays the default; +the new layers are additive. + +### 6.2 Layer order (innermost wins) + +1. **Hardcoded must-skip** (cycle protection, security): + `.git/`, `node_modules/`, `target/`, `build/`, `out/`, `.idea/`, + `.gradle/`, `bin/`, `*.class`, symlinks. Cannot be overridden. +2. **Walk up `.gitignore` files** from each indexed directory toward + `LANCEDB_MCP_PROJECT_ROOT`. Standard gitignore semantics (negation, + directory-vs-file, anchored paths). +3. **Project-level `.lancedb-mcp.yml` `ignore:` list** at project root. + Treated as gitignore-spec patterns. Already validated YAML — extend + the schema. +4. 
**Project-level `.lancedb-mcp-ignore` file** with full gitignore + syntax (one pattern per line, `#` comments). For users who don't + want to commit project-internal index settings to YAML. + +### 6.3 Implementation + +New module `path_filter.py`: + +```python +class IgnoreResolver: + def __init__(self, project_root: Path): ... + def is_ignored(self, path: Path) -> bool: ... + def explain(self, path: Path) -> IgnoreDecision: + """Return which layer matched (for diagnostics / graph_meta).""" +``` + +Use [`pathspec`](https://pypi.org/project/pathspec/) (gitignore-spec +compliant; used by Black / Ruff / pre-commit). Explicit-pin it. + +Replace `compile_excluded_glob_patterns(COMMON_EXCLUDED_PATH_PATTERNS)` +call sites with `resolver.is_ignored(p)`. Three call sites: +`build_ast_graph.py:228`, `graph_enrich.py:216`, +`java_index_flow_lancedb.py:335-354`. + +Keep `COMMON_EXCLUDED_PATH_PATTERNS` as the **layer 1** seed inside +`IgnoreResolver` so the constant remains the single source of truth +for "must-skip". + +### 6.4 Diagnostics + +Extend `graph_meta` with: + +```python +ignore_layers: list[IgnoreLayerSummary] +# [ +# {layer: "hardcoded", patterns: 8, files_excluded: 1247}, +# {layer: ".gitignore", sources: ["chat-app/.gitignore", ...], files_excluded: 89}, +# {layer: "lancedb-mcp.yml", patterns: 3, files_excluded: 12}, +# {layer: ".lancedb-mcp-ignore", patterns: 0, files_excluded: 0}, +# ] +``` + +Lets the user diagnose "why isn't `Foo.java` in the index?" without +turning on debug logging. 
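
The layer precedence from §6.2 and the `explain` diagnostic can be sketched as below. Note the swap: this sketch uses the standard library's `fnmatchcase` as a stand-in for `pathspec`, so it does not implement real gitignore semantics (no anchoring, no directory-only patterns, no nested matching). The `LayeredIgnore` name, the abbreviated hardcoded list, and the fields of `IgnoreDecision` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class IgnoreDecision:
    ignored: bool
    layer: str | None    # layer that decided; None if nothing matched
    pattern: str | None  # pattern that decided; None if nothing matched


# Abbreviated layer-1 seed; the real list lives in COMMON_EXCLUDED_PATH_PATTERNS.
HARDCODED = (".git/*", "node_modules/*", "target/*", "*.class")


class LayeredIgnore:
    def __init__(self, layers):
        # layers: ordered (name, patterns) pairs, lowest precedence first.
        self._layers = list(layers)

    def explain(self, rel_path: str) -> IgnoreDecision:
        # Layer 1 is final: no later layer can re-include these paths.
        for pat in HARDCODED:
            if fnmatchcase(rel_path, pat):
                return IgnoreDecision(True, "hardcoded", pat)
        decision = IgnoreDecision(False, None, None)
        # Later layers and later patterns overwrite earlier decisions.
        for name, patterns in self._layers:
            for pat in patterns:
                negated = pat.startswith("!")
                body = pat[1:] if negated else pat
                if fnmatchcase(rel_path, body):
                    decision = IgnoreDecision(not negated, name, pat)
        return decision

    def is_ignored(self, rel_path: str) -> bool:
        return self.explain(rel_path).ignored
```

Returning the deciding layer and pattern from `explain` is what would let `graph_meta` attribute each excluded file to a layer.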
+ +### 6.5 Tests + +Pure unit tests, no Kuzu / Lance: + +- `test_hardcoded_must_skip_cannot_be_negated_by_gitignore` +- `test_gitignore_in_subdirectory_overrides_parent` +- `test_gitignore_negation_pattern_re_includes_file` +- `test_lancedb_yml_ignore_list_applied_after_gitignore` +- `test_lancedb_mcp_ignore_file_takes_precedence_over_yml` +- `test_symlinks_always_skipped` +- `test_explain_returns_innermost_winning_layer` +- `test_compatibility_default_excludes_match_old_behaviour` — + important regression: with no `.gitignore` and no project files, the + new resolver must skip exactly the same paths as the old constant + list. + +Integration test: + +- `test_pass1_parse_skips_files_per_resolved_ignores` — tiny tmp_path + fixture with one `.gitignore` and one `.lancedb-mcp-ignore`. + +### 6.6 Migration + +No migration. Layer 1 alone matches the current behaviour; layers 2–4 +only kick in when the project has `.gitignore` / config files. +Existing users see zero behaviour change unless they opt in. + +--- + +## 7. Cross-cutting concerns + +### 7.1 Ontology version + +| Sub-feature | Bump? | +|---|---| +| B2a Routes | **Yes**, 4 → 5 (new node label + new rel table) | +| B4 `analyze_pr` | No (read-only over existing schema) | +| B5 Ignores | No (no schema change) | + +If B2a and the others land in the same release, one bump (4 → 5) +covers everything. + +### 7.2 PR ordering recommendation + +Independent PRs, but a sensible review order: + +1. **B5 first** — pure code hygiene, no schema, smallest diff. Makes + subsequent indexer test runs more deterministic on dirty workspaces. +2. **B4 second** — pure additive MCP tool. Risk is in the diff parser + only, easily unit-testable. +3. **B2a last** — biggest scope, schema bump, requires Kuzu rebuild. + Land it when the other two are stable. + +### 7.3 Rollback strategy + +- B2a — drop the rel table and the `Route` label, revert + `ONTOLOGY_VERSION`. No data dependency from existing tools. 
+- B4 — remove the MCP tool registration. No persisted state. +- B5 — revert call-site replacements; the resolver module is dead + code, remove or keep on the shelf. + +### 7.4 Documentation updates + +- `README.md`: add `Route` section under "What goes in the graph", + document `find_routes` under MCP tools, document `analyze_pr`, + document layered ignores under configuration. **Extend the brownfield + section with `route_overrides` examples and `@CodebaseRoute` + source stub** — same shape as the existing `role_overrides` / + `@CodebaseRole` material. +- `CODEBASE_REQUIREMENTS.md`: update the schema diagram and the env-var + table (`.lancedb-mcp-ignore` mention). Document the route resolver + five-layer composition table from §4.6.4. +- `propose/PRODUCT-VISION.md`: tick B2a / B4 / B5 off the roadmap; note + B2b + B6 as the next proposal. + +--- + +## 8. Risks and open questions + +| Risk | Likelihood | Mitigation | +|---|---|---| +| Spring annotation arg parsing edge cases (SpEL `${}`, constants, array form `value={"a","b"}`) | medium | Treat unparseable values as `Route` with `resolved=false`, low confidence; add unit fixtures as discovered. **When in doubt, emit phantom + low confidence rather than guess.** | +| `unidiff` not handling `git diff` extensions (e.g. `--stat`, binary) | low | Use `--unified=0 --no-color --no-renames` explicitly when shelling to git; document the input format. | +| `pathspec` performance on huge monorepos | low | Cache compiled `PathSpec` per directory; invalidate on `.gitignore` mtime change. Same caching pattern as `lru_cache` already used. | +| Path-template false positives (`/api/users` matches `/api/users/{id}` regex if anchors are wrong) | medium | Lock down regex generation in unit tests with the cases from §4.4; trailing slash and `$` anchors must be explicit. Note: B2b/B6 will be the primary consumer of these regexes — getting them right here matters for the next phase too. 
| +| Brownfield route resolver diverges from role resolver in subtle ways (ordering, caching, validation) | **high** | Extend the *same* `BrownfieldOverrides` dataclass and the *same* `_load_brownfield_overrides` function; reuse `collect_annotation_meta_chain`; mirror the `resolve_role_and_capabilities` execution-order docstring word-for-word in `resolve_routes_for_method`. Cursor implementor: read `graph_enrich.py` §"brownfield role / capability overrides" and `plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md` before writing route resolution. | +| Feign + class-level `@RequestMapping` interactions | low | Resolved by §4.4's class-base + method-path concatenation rule, same as for controllers. | +| Conflicting `route_overrides.fqn` and `@CodebaseRoute` on the same method | low | Documented: Layer 5 (`fqn`) wins over Layer 4 (`@CodebaseRoute`). Same precedence rule as roles. Test `test_brownfield_layers_compose_in_documented_order` covers it. | + +Open questions to settle during implementation, not now: + +- Should `@MessageMapping` (WebSocket / STOMP) join the `Route` family? + Defer until a real corpus uses it — it would slot in as another + built-in `framework`. +- Should we extract `produces=` / `consumes=` content types onto the + `Route` node? Probably yes when B2b/B6 lands (helpful for matching); + not needed for B2a's listing use cases. + +--- + +## 9. Definition of done (per sub-feature) + +**B2a — `Route` + `EXPOSES`:** +- [ ] All §4.8 tests pass — including the **brownfield** group (mandatory, not optional). +- [ ] `graph_meta` reports non-zero `routes_total` and + `routes_from_brownfield_pct` against the brownfield fixture. +- [ ] `find_routes` registered as an MCP tool with `microservice`, + `framework`, `kind`, `path_pattern`, `method` filters. +- [ ] `trace_flow` `follow_routes` flag wired through. +- [ ] `ONTOLOGY_VERSION` bumped 4 → 5; stale-graph guard test added. 
+- [ ] README brownfield section extended with `route_overrides` and + `@CodebaseRoute` examples. +- [ ] `CODEBASE_REQUIREMENTS.md` documents the §4.6.4 five-layer + composition table. +- [ ] No regressions in existing role / capability resolution + (run the existing brownfield test suite). + +**B4 — `analyze_pr`:** +- [ ] All §5.6 tests pass. +- [ ] `analyze_pr` registered as an MCP tool with full input/output + schemas. +- [ ] README documents the tool with a worked example. + +**B5 — Ignores:** +- [ ] All §6.5 tests pass. +- [ ] Old `compile_excluded_glob_patterns` call sites replaced (3 of + them). +- [ ] `graph_meta` exposes `ignore_layers`. +- [ ] `CODEBASE_REQUIREMENTS.md` documents the layer order. + +--- + +## 10. What comes after Tier 1 + +This proposal closes Tier 1 within static-analysis scope. The natural +follow-ups, in order of leverage: + +1. **B2b + B6 — imperative HTTP/async edges + cross-service matcher.** + Single proposal because they share design constraints (path + canonicalization, join keys, cross-service `confidence` semantics, + edge direction). Depends on B2a's `Route` node landing first. + Unlocks: *"what breaks if I rename `POST /api/orders`?"*, + *"who calls this endpoint?"* across the whole system. +2. **Microservice-scoped resolution in `CALLS`** — the correctness gap + noted during this session: `_lookup_method_candidates` should filter + by caller's microservice. Small PR; orthogonal to anything here. +3. **B7 Louvain communities** and **B8 dead code** — both unlock from + the existing `CALLS` graph. +4. **B3 runtime traces** — leaves static analysis. Lifts confidence on + Spring AOP / polymorphic / reflective edges that no amount of static + work can reach. + +--- + +## 11. References + +- [`reports/what-to-borrow-from-cmm.md`](../reports/what-to-borrow-from-cmm.md) — original borrow guide (Tier 1 §B1–B5). 
+- [`propose/completed/CALL-GRAPH-PROPOSE.md`](completed/CALL-GRAPH-PROPOSE.md) — completed call-graph proposal; same shape & style. +- [`reports/call-graph-review.md`](../reports/call-graph-review.md) — review that surfaced the resolver / extractor invariants. +- [`plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md`](../plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md) — **mandatory reading** for the implementer of §4.6 (brownfield route resolver mirrors this design). +- `graph_enrich.py` §"brownfield role / capability overrides" — the + existing implementation B2a extends. +- CMM source for pattern reference (read, don't fork): + - [`pass_route_nodes.c`](https://github.com/DeusData/codebase-memory-mcp/blob/master/src/pipeline/pass_route_nodes.c) — Route extraction shape. + - [`pass_gitdiff.c`](https://github.com/DeusData/codebase-memory-mcp/blob/master/src/pipeline/pass_gitdiff.c) — `analyze_pr` shape. + - [`discover/`](https://github.com/DeusData/codebase-memory-mcp/tree/master/src/discover) — layered ignore shape. +- [`pathspec`](https://pypi.org/project/pathspec/) — gitignore-spec library for B5. +- [`unidiff`](https://pypi.org/project/unidiff/) — diff parser for B4. diff --git a/propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md b/propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md new file mode 100644 index 0000000..b117235 --- /dev/null +++ b/propose/TIER1B-HTTP-ASYNC-EDGES-PROPOSE.md @@ -0,0 +1,438 @@ +# B2b + B6 — `HTTP_CALLS` / `ASYNC_CALLS` + cross-service matcher + +Status: **skeleton — not ready for planning**. Pairs with the active +proposal [`TIER1-COMPLETION-PROPOSE.md`](TIER1-COMPLETION-PROPOSE.md) +(B2a + B4 + B5). **Do not start implementation until B2a is merged +and the `Route` schema below is verified against what actually +shipped.** + +This document fixes the **join-key contract** between B2a +(declarations) and B2b/B6 (edges) so the two PRs cannot drift. 
Most +sections are deliberately stubs marked **`[TBD — design pass needed]`** +— the goal here is to lock the *interface*, not the algorithm. + +--- + +## 0. Reading order + +Before working on this proposal, read in order: + +1. [`TIER1-COMPLETION-PROPOSE.md`](TIER1-COMPLETION-PROPOSE.md) §4 + (B2a `Route` + `EXPOSES`) — defines every join key used here. +2. [`reports/what-to-borrow-from-cmm.md`](../reports/what-to-borrow-from-cmm.md) + §B2 (Route shape) and §B6 (cross-service edges). +3. [`reports/call-graph-review.md`](../reports/call-graph-review.md) + — same correctness invariants apply (microservice scoping, + confidence semantics, phantom-id collisions). +4. [`plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md`](../plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md) + — brownfield surface for the **caller** side mirrors the same + pattern as B2a (see §6). +5. CMM source (pattern reference, do not port): + - [`pass_http_edges.c`](https://github.com/DeusData/codebase-memory-mcp/tree/master/src/pipeline) (or equivalent) — shape only. + +--- + +## 1. Why one proposal, not two + +B2b (imperative HTTP/async edges) and B6 (cross-service matcher) were +split out of the original Tier 1 plan together for one reason: **they +share state.** + +- The same canonical path/topic representation is read by *exposers* + (B2a writes it) and *callers* (B2b emits, B6 matches). +- `confidence` for a `HTTP_CALLS` edge to a phantom `Route` flips + meaning depending on whether B6 has matched it cross-service — + designing one without the other locks in the wrong scale. +- Feign's `name="user-service"` is a *service-registry join key* for + B6, not just a string property on a Feign-client method. +- The edge-direction decision in B2a + (`(Symbol)-[:EXPOSES]->(Route)`) only pays off when the matching + query is `(caller)-[:HTTP_CALLS]->(Route)<-[:EXPOSES]-(handler)` — + testing that traversal end-to-end requires both sides. 
+ +**Decision:** ship B2b and B6 together. B7 (Louvain) and B8 (dead +code) are separate proposals because they consume the resulting +graph but don't change its shape. + +--- + +## 2. Scope + +### In scope + +- New `HTTP_CALLS` rel: `(Symbol caller)-[:HTTP_CALLS]->(Route target)`. +- New `ASYNC_CALLS` rel: `(Symbol producer)-[:ASYNC_CALLS]->(Route topic)`. +- New `pass5_imperative_edges` (runs after `pass4_routes` — see B2a §4.4). +- Cross-service matching of caller-side edges to exposer-side `Route` + nodes via the join keys defined in §3. +- Brownfield override surface for **caller-side** declarations + (`@CodebaseClient`, `@CodebaseProducer` — mirrors `@CodebaseRoute` + on the exposer side; see §6). +- New MCP tools: `find_route_callers`, `trace_request_flow`. + +### Out of scope (explicit non-goals) + +- **Path matching of intra-service controller-to-controller HTTP + calls.** These are rare in well-modeled microservice codebases and + add 4-way matching combinatorics. If the user has a `RestTemplate` + hitting `localhost`, B2b emits a phantom-`Route` edge with + `confidence ≤ 0.5` and stops. Re-evaluate after B7. +- **Spring Cloud Gateway route definitions** (`RouteLocator` DSL). + Treat as a follow-on once B2b stabilizes. +- **Runtime trace ingestion.** That's B3, separate proposal. +- **OpenAPI/AsyncAPI doc parsing** as a fallback resolver. Maybe + later; not needed to ship B2b/B6. + +--- + +## 3. The join-key contract + +This is the **only** part of this skeleton that is fully specified. +B2a writes these keys; B2b reads them; B6 matches on them. Any change +breaks both sides. + +### 3.1 Keys produced by B2a (read-only for B2b/B6) + +These come from the `Route` node defined in +[`TIER1-COMPLETION-PROPOSE.md`](TIER1-COMPLETION-PROPOSE.md) §4.3. +Reproduced here for the implementer's convenience — **if these +diverge from B2a as shipped, B2a is the source of truth, fix this +doc**. 
+ +| Field | Used by | Purpose | +| --------------- | -------------------- | -------------------------------------------------------- | +| `Route.id` | B2b edge target | Stable hash incl. `microservice` — same path in svc A vs svc B = two routes | +| `path_template` | B6 HTTP matcher | `/api/users/{}` — already curly-collapsed in B2a | +| `path_regex` | B6 HTTP matcher | `^/api/users/[^/]+/?$` — pre-derived in B2a, do not re-derive | +| `method` | B6 HTTP matcher | Must match caller's HTTP method (or `''` allows any) | +| `topic` | B6 async matcher | Producer→consumer join | +| `broker` | B6 async matcher | Disambiguates same-topic across brokers | +| `feign_name` | B6 Feign matcher | Service-registry join key — primary cross-service link | +| `feign_url` | B6 Feign fallback | Used only when `feign_name` is empty (URL-mode clients) | +| `microservice` | B6 scoping | Skip self-edges; flag intra-service matches as low-conf | +| `kind` | B2b edge-type select | `http_endpoint` → `HTTP_CALLS`; `kafka_topic` etc. → `ASYNC_CALLS` | + +### 3.2 Keys produced by B2b (caller side) + +For each imperative call site B2b discovers, it computes a tuple +that is **structurally identical** to the exposer side, then asks B6 +to match. The fields are: + +| Field | Source | Notes | +| -------------------- | ------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | +| `client_kind` | `feign_method` / `rest_template` / `web_client` / `kafka_send` / `stream_bridge_send` | Picks the matcher branch. | +| `feign_target_name` | `@FeignClient(name=…)` on the interface the caller's method belongs to | Resolution: literal → SpEL → constant. Same three-strategy ladder as B2a §4.4.5. | +| `path_template_call` | URI argument of `RestTemplate.exchange` etc., curly-collapsed via B2a's normalizer | Re-use B2a's normalizer — do not re-implement. 
| +| `method_call` | `HttpMethod.GET` etc., or extracted from the called function (`getForObject` → `GET`) | `''` means "couldn't determine". | +| `topic_call` | First arg of `KafkaTemplate.send` / `StreamBridge.send` | Same three-strategy resolution. | +| `broker_call` | The bean name of the template, when multi-broker | `''` for the default broker. Heuristic; see §5. | +| `caller_microservice` | The caller `Symbol`'s microservice | Required for cross-service detection. | + +### 3.3 Match outcome enum + +B6 returns one of these for every B2b call site: + +| Outcome | Meaning | Effect on edge | +| --------------- | -------------------------------------------------------- | ---------------------------------------------------- | +| `cross_service` | B6 found exactly one `Route` in a *different* svc | Emit edge to that `Route`, `confidence` per §5.3 | +| `intra_service` | B6 matched a `Route` in the *same* svc as caller | Emit edge with `confidence ≤ 0.5`, flag in stats | +| `ambiguous` | More than one `Route` matched | Emit phantom-`Route` edge, `confidence=0.4`, log all candidates | +| `phantom` | No `Route` matched at all (external API, missing svc) | Emit phantom-`Route` edge, `confidence=0.3` | +| `unresolved` | Caller-side fields couldn't be extracted (SpEL, dynamic) | Emit phantom-`Route` edge, `confidence=0.2`, `resolved=false` | + +`phantom` `Route` nodes follow the same shape as resolved ones but +with empty `path_template` / `path_regex` and a synthetic id — same +trick B2a uses for `strategy='spel'` routes. + +--- + +## 4. Schema additions + +```sql +-- Two new edge tables. Edge direction matches B2a §4.3 traversal. 
+CREATE REL TABLE HTTP_CALLS( + FROM Symbol TO Route, + confidence DOUBLE, + strategy STRING, -- 'feign_inherit' | 'feign_method' | 'rest_template' | 'web_client' + method_call STRING, -- duplicated for query convenience (same as Route.method on a perfect match) + raw_uri STRING, -- the unresolved URI string when match='unresolved'; for debugging + match STRING -- 'cross_service' | 'intra_service' | 'ambiguous' | 'phantom' | 'unresolved' +); + +CREATE REL TABLE ASYNC_CALLS( + FROM Symbol TO Route, + confidence DOUBLE, + strategy STRING, -- 'kafka_template' | 'stream_bridge' | 'rabbit_template' | 'jms_template' + direction STRING, -- 'producer' (always, in B2b — consumers are EXPOSES on B2a) + raw_topic STRING, + match STRING +); +``` + +No new node tables. **Do not** introduce a separate `HttpCallSite` +node — the `Symbol` (the caller method) is the source of truth, and +`Route` is the destination. This keeps the graph queryable as a +pure `Symbol → Route ← Symbol` triangle. + +`ONTOLOGY_VERSION` 5 → 6. + +--- + +## 5. Caller-side extraction — `pass5_imperative_edges` + +**`[Skeleton — full design pass needed before planning.]`** + +Runs after `pass4_routes` (defined in B2a §4.4). Purely additive; +does not consult or modify `tables.routes_rows` (already written) or +`tables.calls_rows`. + +### 5.1 Detection patterns (per `client_kind`) + +Stub list — to be expanded with concrete AST patterns and tests. + +| `client_kind` | Pattern | Notes | +| -------------------- | ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------- | +| `feign_method` | Method on a `@FeignClient` interface | The exposer side already wrote a `Route` per method via `feign_inherit` — this is just a join on `Symbol.id`, no new resolution work. 
**Cleanest case.** | +| `rest_template` | `RestTemplate.{exchange,getForObject,postForEntity,…}` invocation | URI is first or second arg; method is in the method name or the second arg. **[TBD]** | +| `web_client` | `WebClient.{get,post,…}().uri(…).retrieve()` chain | Fluent API → walk the chain back to the URI/method. **[TBD]** | +| `kafka_send` | `KafkaTemplate.send(topic, …)` | Topic is first arg. | +| `stream_bridge_send` | `StreamBridge.send(binding, …)` | `binding` resolves to a topic via Spring Cloud Stream config — **[TBD]: deferred to v2 of this proposal, emit `unresolved` for now.** | + +### 5.2 Resolution ladder + +Mirror B2a §4.4.5 exactly. Three strategies in order: + +1. **Literal string** — `confidence_base = 1.0`, `strategy='annotation'`/`feign_method`/etc. +2. **SpEL `${prop}`** — keep literal, `confidence_base = 0.85`, + `resolved=false`. +3. **Constant reference** — keep expression, `confidence_base = 0.7`, + `resolved=false`. + +Re-use B2a's resolver — do not re-implement. **[TBD: extract it from +`pass4_routes` into a shared helper as part of this PR.]** + +### 5.3 Final confidence + +``` +confidence = confidence_base × match_factor × micro_factor +``` + +Where: + +- `match_factor`: `cross_service=1.0`, `intra_service=0.6`, + `ambiguous=0.5`, `phantom=0.4`, `unresolved=0.3`. +- `micro_factor`: `1.0` if caller microservice is known, `0.85` + otherwise. + +**[TBD: validate this on the real 5-service codebase. Baseline only.]** + +### 5.4 Where to plug in + +`build_ast_graph.py` has `pass3_calls` at line 1067 and the call +site is at line 1421. B2a adds `pass4_routes` after `pass3_calls`. +B2b adds `pass5_imperative_edges` after `pass4_routes`. **Each pass +is purely additive on `tables.*` — no shared mutable state across +passes.** + +--- + +## 6. Brownfield surface — caller side + +Mirrors B2a §4.6 exactly — same dataclass, same YAML config file, +same in-source stubs, same 5-layer resolution table. 
**Do not invent +a parallel system; extend `BrownfieldOverrides` again.** + +### 6.1 New YAML keys + +```yaml +# .lancedb-mcp.yml +http_client_overrides: + annotations: + "com.acme.LegacyHttpClient": + client_kind: rest_template + target_service: "user-service" # forces the cross-service join key + fqn: + "com.legacy.OldUserApi": + client_kind: feign_method + target_service: "user-service" + +async_producer_overrides: + annotations: + "com.acme.LegacyEvent": + client_kind: kafka_send + topic: "user-events" + fqn: {} +``` + +### 6.2 New in-source stubs + +```java +@Target(METHOD) +@Repeatable(CodebaseClients.class) +public @interface CodebaseClient { + String clientKind(); // 'feign_method' | 'rest_template' | 'web_client' + String targetService() default ""; + String path() default ""; + String method() default ""; +} + +@Target(METHOD) +@Repeatable(CodebaseProducers.class) +public @interface CodebaseProducer { + String clientKind(); // 'kafka_send' | 'stream_bridge_send' | … + String topic(); + String broker() default ""; +} +``` + +### 6.3 5-layer resolution table + +Identical structure to B2a §4.6.4, applied to caller-side fields +instead of route-side. Composition order: + +1. Built-in client/producer detection +2. Layer B: `http_client_overrides.annotations` / + `async_producer_overrides.annotations` +3. Layer A: meta-annotation chain walk (re-use + `collect_annotation_meta_chain`) +4. Layer C: `@CodebaseClient` / `@CodebaseProducer` in source +5. Layer B: `http_client_overrides.fqn` / `async_producer_overrides.fqn` + +Last writer wins, exactly like B2a. 
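
The composition order in §6.3 can be sketched as a simple fold over the five layers. This is a minimal illustration only, not the resolver from B2a: the layer names, the `compose_client_layers` function, and the choice to merge per field (rather than replace the whole record) and to treat `None` / `""` as "unset" are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical layer names, listed in the composition order from section 6.3.
LAYER_ORDER = (
    "builtin",            # 1. built-in client/producer detection
    "config_annotation",  # 2. Layer B: *_overrides.annotations
    "meta_chain",         # 3. Layer A: meta-annotation chain walk
    "in_source_stub",     # 4. Layer C: @CodebaseClient / @CodebaseProducer
    "config_fqn",         # 5. Layer B: *_overrides.fqn
)


def compose_client_layers(layers: dict[str, Optional[dict]]) -> dict:
    """Fold the layers in order; a later layer overwrites earlier fields.

    Assumption: empty strings and None mean "not set by this layer",
    matching the `default ""` members on the in-source stubs.
    """
    resolved: dict = {}
    for name in LAYER_ORDER:
        fields = layers.get(name) or {}
        for key, value in fields.items():
            if value is None or value == "":
                continue  # this layer did not set the field
            resolved[key] = value  # last writer wins
    return resolved


example = compose_client_layers(
    {
        "builtin": {"client_kind": "rest_template"},
        "in_source_stub": {"client_kind": "feign_method", "target_service": ""},
        "config_fqn": {"target_service": "user-service"},
    }
)
```

Here the in-source stub overrides the auto-detected `client_kind`, and the `fqn` override supplies `target_service` because the stub left it empty. Whether the real resolver merges per field or replaces whole records should be taken from B2a §4.6.4, not from this sketch.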
+
+### 6.4 Plumbing
+
+Add to `BrownfieldOverrides`:
+
+- `http_client_overrides_by_annotation: dict[str, dict]`
+- `http_client_overrides_by_fqn: dict[str, dict]`
+- `async_producer_overrides_by_annotation: dict[str, dict]`
+- `async_producer_overrides_by_fqn: dict[str, dict]`
+
+New `graph_enrich.resolve_http_client_for_method` and
+`resolve_async_producer_for_method`, shape-identical to
+`resolve_role_and_capabilities` and B2a's
+`resolve_routes_for_method`.
+
+`graph_meta` exposes
+`http_clients_from_brownfield_pct` / `async_producers_from_brownfield_pct`.
+
+---
+
+## 7. MCP surface
+
+### 7.1 New tools
+
+| Tool | Purpose | Inputs | Output |
+| --------------------- | ---------------------------------------------------------------------------------------- | -------------------------------------------- | -------------------------------------------- |
+| `find_route_callers` | All `Symbol`s that call a given `Route` (cross- and intra-service) | `route_id` *or* (`microservice`, `path_template`, `method`) | List of caller `Symbol`s with `confidence`, `microservice`, `match` |
+| `trace_request_flow` | Walk `(entry)-[:HTTP_CALLS\|ASYNC_CALLS]->(Route)<-[:EXPOSES]-(handler)-[:CALLS*]->(…)` for N hops | `entry_route_id`, `max_hops` | Ordered chain across services, with confidence per hop |
+
+### 7.2 Existing tool changes
+
+- `impact_analysis`: extend reverse closure to follow `HTTP_CALLS`
+  and `ASYNC_CALLS` edges *back from the changed `Route` to its
+  callers*, so "what breaks if I rename `POST /api/orders`" works.
+- `trace_flow`: add `HTTP_CALLS` and `ASYNC_CALLS` to its budgeted
+  walk; preserve the structural-first ordering from the call-graph
+  D5 fix.
+- `analyze_pr` (B4): if the PR touches a method with `EXPOSES` edges,
+  surface "N callers across M services" in the risk score.
+
+---
+
+## 8. Tests
+
+**[TBD: full plan after extraction patterns settle.]** Mandatory
+buckets:
+
+- **Per-pattern detection** (one fixture per `client_kind`).
+- **Three-strategy resolution** (literal / SpEL / constant): same
+  cases as B2a but on the caller side.
+- **Cross-service matching**: Feign name match, HTTP path-template
+  match, Kafka topic+broker match.
+- **Match-outcome enum**: at least one fixture per outcome
+  (`cross_service`, `intra_service`, `ambiguous`, `phantom`,
+  `unresolved`).
+- **Brownfield**: 12 fixtures mirroring B2a §4.8 (custom annotation,
+  fqn override, meta-chain, `@CodebaseClient` wins over auto-detect,
+  repeatable, etc.).
+- **Confidence semantics**: assert `match_factor` × `confidence_base`
+  matches §5.3 for each outcome.
+- **Microservice scoping**: feed a fixture with two services that
+  expose the same path; assert callers from each service match
+  *only* their counterpart, not their own service.
+- **End-to-end traversal**: assert the
+  `(caller)-[:HTTP_CALLS]->(Route)<-[:EXPOSES]-(handler)` query
+  works without direction reversal (validates B2a's edge-direction
+  decision).
+
+---
+
+## 9. Risks and open questions
+
+| # | Risk | Severity | Mitigation |
+| -- | ------------------------------------------------------------------------------------------ | -------- | ------------------------------------------------------------------------------------------- |
+| 1 | B2a's `path_regex` regression breaks B6 | High | B2a §4.8 must include round-trip tests on `path_template ↔ path_regex` so B6 inherits a stable contract. |
+| 2 | `feign_name` resolution rules diverge between B2a (interface decl) and B2b (caller side) | High | One resolver, used by both passes (§5.2). PR description must cite shared helper location. |
+| 3 | SpEL routes can't be matched cross-service | Medium | Accepted: `unresolved` outcome with `confidence=0.2`. Ingest property files in a follow-on PR. |
+| 4 | Multi-broker Kafka: same topic on different brokers wrongly merged | Medium | Include `broker` in the join key. Default broker = `''` so single-broker codebases are unaffected. |
+| 5 | Brownfield divergence from B2a's role/route resolver | High | Same mitigation as B2a §8 risk #5: implementer cites `PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md` line numbers. |
+| 6 | Spring Cloud Gateway routes never appear, leaving phantom edges to gateway-routed services | Medium | Out of scope (§2). Document as known gap in `README` so users add brownfield overrides. |
+| 7 | `RestTemplate` URIs built via `UriComponentsBuilder` chains | Medium | Best-effort: walk linear builder chains, fall back to `unresolved`. **[TBD: scope decision needed.]** |
+| 8 | Performance: `pass5` adds another full AST walk | Low | Re-use the visitor from `pass3_calls`; only the *handlers* differ. Measure on the 5-service codebase before merge. |
+
+---
+
+## 10. Definition of done
+
+- [ ] `Route` schema as it shipped in B2a verified against §3.1.
+- [ ] `HTTP_CALLS` and `ASYNC_CALLS` tables created; ontology bumped to 6.
+- [ ] `pass5_imperative_edges` runs after `pass4_routes`; stats
+      counter exposes per-`match`-outcome counts.
+- [ ] Three-strategy resolver shared between B2a and B2b (no
+      duplication).
+- [ ] Brownfield: `http_client_overrides`, `async_producer_overrides`,
+      `@CodebaseClient`, `@CodebaseProducer` all wired into
+      `BrownfieldOverrides` (extending, not paralleling).
+- [ ] `graph_meta` reports `http_clients_from_brownfield_pct` and
+      `async_producers_from_brownfield_pct`.
+- [ ] All test buckets in §8 covered.
+- [ ] `find_route_callers` and `trace_request_flow` MCP tools live;
+      `impact_analysis` and `trace_flow` extended.
+- [ ] Microservice-scoped CALLS gap (Tier 1 §10 follow-up #2)
+      either fixed in a sibling PR *before* this lands, or risk #2
+      elevated and explicitly accepted.
+- [ ] README / PRODUCT-VISION sections marked *planned* for
+      `HTTP_CALLS` / `ASYNC_CALLS` flipped to *shipped*.
+
+---
+
+## 11. What this proposal does **not** decide
+
+These are deliberately left for the design pass that will turn this
+skeleton into an active proposal:
+
+- Exact AST patterns for `WebClient` fluent chains and
+  `UriComponentsBuilder` URI construction.
+- `StreamBridge` binding → topic resolution (read Spring Cloud
+  Stream config? defer entirely?).
+- Whether `confidence` weights in §5.3 are correct on the real
+  5-service codebase; this needs measurement.
+- Which (if any) OpenAPI / AsyncAPI doc sources to ingest as a
+  fallback resolver.
+- Whether `find_route_callers` should accept regex over
+  `path_template` or only exact-match.
+
+When promoting this skeleton to "active", each `[TBD]` must be
+resolved or explicitly deferred to a v2 proposal.
+
+---
+
+## 12. References
+
+- [`TIER1-COMPLETION-PROPOSE.md`](TIER1-COMPLETION-PROPOSE.md): B2a, B4, B5 (active).
+- [`reports/what-to-borrow-from-cmm.md`](../reports/what-to-borrow-from-cmm.md) §B2, §B6.
+- [`reports/call-graph-review.md`](../reports/call-graph-review.md): invariants this proposal must not regress.
+- [`plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md`](../plans/completed/PLAN-BROWNFIELD-ROLE-OVERRIDES-design-fixes.md): mandatory reading for §6.
+- [`propose/PRODUCT-VISION.md`](PRODUCT-VISION.md) §3: `HTTP_CALLS` / `ASYNC_CALLS` are listed as *planned*; this proposal flips them to *shipped*.