From 0e1bbf141f61f4e04fecdb4d7a0e8e4b86f70278 Mon Sep 17 00:00:00 2001
From: Dmitry
Date: Wed, 6 May 2026 12:27:27 +0000
Subject: [PATCH] =?UTF-8?q?propose:=20list=5Fclients=20MCP=20tool=20?=
 =?UTF-8?q?=E2=80=94=20outbound-side=20counterpart=20to=20list=5Froutes?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Companion propose to the v2 brownfield annotations PR (#36). After v2
lands, Feign declarations leave the Route table — they're correctly
modeled as outbound @CodebaseClient annotations rather than inbound
routes. This leaves a real workflow without an entry point: "show me
every outbound HTTP call this service makes". Pre-v2, agents reach for
list_routes(framework=feign), which is wrong on three counts (only
Feign, conflates direction, returns nothing post-v2).

The propose adds:

1. A new Client graph node table storing outbound-client declarations
   (one row per @CodebaseClient annotation; Feign methods synthesised
   from source).
2. A new DECLARES_CLIENT(Symbol → Client) rel table mirroring the
   existing EXPOSES(Symbol → Route) edge.
3. A list_clients MCP tool with filters symmetric to list_routes
   (microservice, client_kind, target_service, path_prefix, method).

The HTTP_CALLS(Symbol → Route) edge stays unchanged — Client is
additional caller-side metadata, not a replacement. Pass6's hint
recovery walk retargets from the caller's http_consumer route to the
caller's DECLARES_CLIENT → Client path. Same data, new home.

Three follow-up tools sketched but punted:
- get_client_by_path (symmetric with get_route_by_path)
- find_client_callers
- find_client_target_route

A parallel Producer node + list_async_producers tool is also flagged
as future work for the async outbound side.

ONTOLOGY_VERSION bump: 9 → 10. No code change in this propose;
doc-only.
---
 propose/LIST-CLIENTS-MCP-TOOL-PROPOSE.md | 305 +++++++++++++++++++++++
 1 file changed, 305 insertions(+)
 create mode 100644 propose/LIST-CLIENTS-MCP-TOOL-PROPOSE.md

diff --git a/propose/LIST-CLIENTS-MCP-TOOL-PROPOSE.md b/propose/LIST-CLIENTS-MCP-TOOL-PROPOSE.md
new file mode 100644
index 0000000..e7ac5ab
--- /dev/null
+++ b/propose/LIST-CLIENTS-MCP-TOOL-PROPOSE.md
@@ -0,0 +1,305 @@
+# `list_clients` MCP Tool — Outbound-Side Counterpart to `list_routes`
+
+## Status
+
+Proposal — depends on the brownfield annotations v2 propose
+(`propose/BROWNFIELD-ANNOTATIONS-V2-PROPOSE.md`). This propose
+defines a new MCP tool plus the persistence shape it queries; the
+v2 annotations propose creates the data the tool consumes.
+
+## Problem Statement
+
+After the v2 annotations refactor, Feign declarations and other
+outbound HTTP clients no longer live in the `Route` table. The
+`list_routes` tool, post-v2, returns only inbound things this
+service exposes (HTTP handlers + async listeners). A real workflow
+is left without an entry point:
+
+> "Show me every outbound HTTP call this service makes, what
+> service it targets, and what kind of client it uses."
+
+Today (pre-v2), an AMA agent answers this with
+`list_routes(framework=feign)` — which is wrong on three counts:
+
+1. It only returns Feign declarations; imperative `RestTemplate` /
+   `WebClient` call sites are invisible because they don't carry a
+   compile-time path (they synthesize phantom routes that don't
+   surface in `list_routes`).
+2. It conflates "this service exposes a Feign-fronted endpoint"
+   with "this service calls a Feign-fronted endpoint" — two
+   directions, same query.
+3. After v2, the query returns nothing — Feign rows leave the
+   `Route` table.
+
+The agent needs a first-class outbound-client query tool. This
+propose adds one and defines the persistence shape it reads from.
+
+## Proposed Solution
+
+### Two parts
+
+1. **A new graph node `Client`** that stores outbound-client
+   declarations (Feign methods, RestTemplate/WebClient call sites
+   when `@CodebaseClient` is present). One row per
+   `@CodebaseClient` annotation.
+
+2. **A new MCP tool `list_clients`** that queries `Client` with
+   filters symmetric to `list_routes`.
+
+### `Client` graph node — schema
+
+```sql
+CREATE NODE TABLE Client (
+    id             STRING,   -- deterministic: hash(microservice + member_fqn + client_kind + path + method)
+    client_kind    STRING,   -- enum: feign_method | rest_template | web_client
+    target_service STRING,   -- e.g. "user-service" (optional; primarily Feign)
+    path           STRING,   -- remote URL template (raw)
+    path_template  STRING,   -- normalised ({} segments)
+    path_regex     STRING,   -- compiled match regex
+    method         STRING,   -- HTTP verb
+    member_fqn     STRING,   -- declaring caller method FQN+sig
+    member_id      STRING,   -- corresponding Symbol.id (for joins)
+    microservice   STRING,   -- caller's microservice (where this client lives)
+    module         STRING,   -- Maven/Gradle module of the caller
+    filename       STRING,
+    start_line     INT64,
+    end_line       INT64,
+    resolved       BOOLEAN,  -- did extraction succeed (vs SpEL placeholder)
+    source_layer   STRING,   -- layer_a_meta | layer_b_ann | layer_b_fqn | layer_c_source | builtin
+    PRIMARY KEY (id)
+);
+
+CREATE REL TABLE DECLARES_CLIENT (
+    FROM Symbol TO Client,
+    confidence DOUBLE,
+    strategy STRING
+);
+```
+
+The `Symbol → Client` `DECLARES_CLIENT` edge mirrors the existing
+`Symbol → Route` `EXPOSES` edge: each links a member to its
+declaration, outbound for `Client` and inbound for `Route`.
+
+The existing `HTTP_CALLS` edge stays `Symbol → Route` (the call
+edge resolves to the *callee* `http_endpoint` Route). The `Client`
+node is the *caller-side* metadata holder: it's where the resolver
+finds path+target hints to do the matching.
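The schema above describes `path_template` only as "normalised ({} segments)" and `path_regex` only as a "compiled match regex". The following is a minimal Python sketch of one way such a normalisation could work. The helper names `to_path_template` and `to_path_regex`, the `{name}` placeholder syntax, and the rule that each placeholder matches exactly one path segment are assumptions made for illustration; they are not taken from the extractor.

```python
import re

# Assumption: placeholders look like {userId} and never span a "/".
_PLACEHOLDER = re.compile(r"\{[^{}/]*\}")


def to_path_template(raw_path: str) -> str:
    """Replace every named placeholder with an anonymous {} segment."""
    return _PLACEHOLDER.sub("{}", raw_path)


def to_path_regex(path_template: str) -> str:
    """Build an anchored regex where each {} matches one path segment."""
    literal_parts = path_template.split("{}")
    return "^" + "[^/]+".join(re.escape(part) for part in literal_parts) + "$"


# Hypothetical raw path, used only to exercise the helpers.
template = to_path_template("/api/users/{userId}/orders/{orderId}")
pattern = to_path_regex(template)
```

Under these assumptions, two declarations that differ only in placeholder names share one `path_template`, which is the property a path-based lookup would rely on.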
+
+### `list_clients` MCP tool — surface
+
+```python
+@mcp.tool(
+    name="list_clients",
+    description=(
+        "List outbound HTTP client declarations from the Kuzu graph "
+        "(Feign methods, RestTemplate/WebClient call sites annotated "
+        "with @CodebaseClient). Optional filters: microservice, "
+        "client_kind, target_service, path_prefix, HTTP method."
+    ),
+)
+async def list_clients(
+    microservice: str | None = Field(default=None, description="Filter to one microservice key (the caller's microservice)."),
+    client_kind: str | None = Field(default=None, description="Exact Client.client_kind: feign_method | rest_template | web_client."),
+    target_service: str | None = Field(default=None, description="Exact Client.target_service match (e.g. 'user-service')."),
+    path_prefix: str | None = Field(default=None, description="Client.path STARTS WITH this string."),
+    method: str | None = Field(default=None, description="HTTP verb on Client.method (GET, POST, …); omit for any."),
+    limit: int = Field(default=100, ge=1, le=500),
+) -> ClientsListOutput:
+    ...
+```
+
+DTO mirrors `RouteRowDto`:
+
+```python
+class ClientRowDto(BaseModel):
+    id: str = ""
+    client_kind: str = ""
+    target_service: str = ""
+    method: str = ""
+    path: str = ""
+    path_template: str = ""
+    path_regex: str = ""
+    member_fqn: str = ""
+    member_id: str = ""
+    microservice: str = ""
+    module: str = ""
+    filename: str = ""
+    start_line: int = 0
+    end_line: int = 0
+    resolved: bool = True
+
+
+class ClientsListOutput(BaseModel):
+    success: bool
+    clients: list[ClientRowDto] = Field(default_factory=list)
+    message: str | None = None
+```
+
+### Companion tools (out of scope but worth flagging)
+
+The `list_clients` shape implies three more tools that round out
+the outbound side. These are **not in this propose** but are
+listed here so the surface is coherent when they land:
+
+- **`get_client_by_path(microservice, path_template, method)`** —
+  symmetric with `get_route_by_path`. Resolves a single client.
+- **`find_client_callers(client_id)`** — given a client
+  declaration, list the call sites that invoke it (e.g. who
+  calls a particular Feign method). Practically the same as
+  walking `DECLARES_CLIENT` reversed plus following the
+  declaring member's `CALLS` callers.
+- **`find_client_target_route(client_id)`** — given a client,
+  resolve to the most likely target `Route` on a remote service.
+  This is essentially the matcher's output, exposed as a tool.
+
+These three are sketched here to show the surface but punted to a
+follow-up. The minimum viable `list_clients` propose is the node +
+the listing tool.
+
+### Companion: `list_async_producers` (parallel, also out of scope)
+
+The same gap exists for `@CodebaseProducer` rows — currently they
+don't appear in `list_routes` either. By symmetry, a future propose
+should add a `Producer` node + `list_async_producers` tool. Punted
+for now.
+
+## Resolver Integration
+
+### Extraction
+
+`graph_enrich.py` and `ast_java.py` extract `@CodebaseClient`
+annotations alongside the existing extraction passes. Each
+annotation emits one `Client` row with deterministic `id`
+(hash of `microservice + member_fqn + client_kind + path + method`)
+and one `DECLARES_CLIENT` edge from the declaring `Symbol`.
+
+For Feign interface methods (post-v2): the extractor synthesises
+a `@CodebaseClient(clientKind=feign_method, targetService=,
+path=, method=)` from the source
+even when no explicit `@CodebaseClient` annotation is present.
+Brownfield overrides remain available for cases where the source
+is missing or wrong.
+
+For imperative `RestTemplate` / `WebClient` call sites: extraction
+happens **only** when the user adds `@CodebaseClient` explicitly.
+The existing call-site heuristic for inferring path/method
+(`build_ast_graph.py:1444`) remains in place for HTTP_CALLS
+matching but does not synthesise `Client` rows on its own.
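The extraction rule above (one `Client` row with a deterministic `id`, plus one `DECLARES_CLIENT` edge from the declaring `Symbol`) can be sketched in Python as follows. This is an illustration under stated assumptions, not the extractor's code: the propose says only "hash", so SHA-256 and the unit-separator join are choices made here, and `ClientAnnotation`, `client_id`, `emit_client` and the sample values are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientAnnotation:
    """Hypothetical parsed form of one @CodebaseClient annotation."""
    client_kind: str
    target_service: str
    path: str
    method: str


def client_id(microservice: str, member_fqn: str, ann: ClientAnnotation) -> str:
    """Hash the five documented fields; SHA-256 is an assumption."""
    parts = (microservice, member_fqn, ann.client_kind, ann.path, ann.method)
    # A separator keeps ("ab", "c") and ("a", "bc") from colliding.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def emit_client(microservice: str, member_fqn: str, member_id: str,
                ann: ClientAnnotation) -> tuple[dict, dict]:
    """Return one Client row and one DECLARES_CLIENT edge pointing at it."""
    cid = client_id(microservice, member_fqn, ann)
    row = {
        "id": cid,
        "client_kind": ann.client_kind,
        "target_service": ann.target_service,
        "path": ann.path,
        "method": ann.method,
        "member_fqn": member_fqn,
        "member_id": member_id,
        "microservice": microservice,
    }
    edge = {"from": member_id, "to": cid}
    return row, edge


# Hypothetical sample values, used only to exercise the sketch.
ann = ClientAnnotation("feign_method", "chat-assign", "/assign/{chatId}", "POST")
row, edge = emit_client("chat-core", "com.example.AssignClient#assign(String)",
                        "sym-1", ann)
```

Because the id depends only on the five listed fields, rebuilding the graph from unchanged sources yields the same ids.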
+
+### Pass6 hint recovery (post-v2 change)
+
+The current pass6 hint-recovery walk (`build_ast_graph.py:1741–1770`)
+looks up the caller's `http_consumer` route to find path+target
+hints. Post-v2, that lookup retargets to: "find the
+`DECLARES_CLIENT` edge from the caller member to its `Client`
+node and read `path` / `target_service` / `method` from there."
+
+Same data, new home. The matcher's downstream logic (path-regex
+match against `http_endpoint` routes on the target service)
+doesn't change.
+
+### `HTTP_CALLS` edge — unchanged
+
+`HTTP_CALLS(Symbol → Route)` continues to point from the calling
+member to the resolved `http_endpoint` Route on the target
+service. The new `Client` node is *additional* metadata, not a
+replacement. A typical Feign call has both:
+
+- One `DECLARES_CLIENT(Symbol → Client)` — caller-side declaration
+- One `HTTP_CALLS(Symbol → Route)` — resolved cross-service edge
+
+Symbol-graph queries that walk only `HTTP_CALLS` (e.g.
+`find_route_callers`) keep working with no changes.
+
+## Acceptance Criteria
+
+1. `Client` node table + `DECLARES_CLIENT` rel table created at
+   graph build time. Schema matches the shape above. `ONTOLOGY_VERSION`
+   bumps by 1 (next available — currently 9, this would be 10).
+2. `graph_enrich` / `ast_java` emit one `Client` row per
+   `@CodebaseClient` annotation and one row per Feign interface
+   method (synthesised from source).
+3. `list_clients` MCP tool registered, returns rows matching the
+   declared filters. Empty results return `success=True` with
+   `clients=[]`, not an error.
+4. On `tests/bank-chat-system`, after v2 lands and with a hand-applied
+   `@CodebaseClient` on at least one Feign interface:
+   - `list_clients()` returns ≥1 row.
+   - `list_clients(client_kind="feign_method")` returns only
+     Feign rows.
+   - `list_clients(microservice="chat-assign")` filters to that
+     service's outbound calls.
+   - The Feign row's `id` matches the `DECLARES_CLIENT` edge's
+     target node id.
+5. Pass6 hint recovery uses `Client` rows after v2 lands.
+   Regression test: a Feign call from chat-core to chat-assign
+   resolves to the right `http_endpoint` Route via the new
+   `Client`-based hint path. Match outcome (`cross_service`)
+   unchanged.
+6. `find_route_callers` for an `http_endpoint` Route on
+   chat-assign still returns the chat-core Feign caller, i.e.
+   removing Feign from `Route` doesn't lose the caller-side
+   resolution path.
+7. New tests in `tests/test_list_clients.py` covering:
+   - Each filter parameter independently.
+   - Empty-result case.
+   - Limit parameter clamping.
+   - Deterministic `id` (rebuild produces stable ids).
+8. Test baseline holds: full pytest suite green. Test count
+   grows by ~6–8 new test cases in `test_list_clients.py`.
+
+## Out of Scope
+
+- **`get_client_by_path`, `find_client_callers`,
+  `find_client_target_route`.** Sketched above; separate proposals
+  if needed.
+- **`Producer` node + `list_async_producers`.** Parallel work for
+  the async outbound side; separate proposal.
+- **YAML override schema for `Client` rows.** The
+  `http_client_overrides` YAML key already exists in the
+  override system (`graph_enrich.py:519` neighbourhood). It can
+  remain a YAML feature and feed `Client` rows directly. No
+  schema change in this propose.
+- **Migrating existing `find_route_callers` to surface
+  Client-side info.** Out of scope; the tool stays Route-centric.
+
+## Open Questions
+
+1. **Should `target_service` be a foreign-key reference to a
+   `Microservice` node, or stay a plain string?** Today
+   `Route.microservice` is a string; `Client.target_service`
+   stays symmetric with that. Recommendation: keep as string for
+   v1 of `list_clients`; revisit if/when a `Microservice` node
+   table lands.
+
+2. **Should the `Client` row also carry call-edge resolution
+   outcome (matched / phantom / ambiguous), or is that purely a
+   property of `HTTP_CALLS` edges?** Recommendation: leave on
+   `HTTP_CALLS` only. `Client` is a static declaration; the
+   call-edge match is a separate (sometimes multi-edge) result.
+
+3. **Naming: `list_clients` vs `list_http_clients`.** The latter
+   is more precise (an async producer is also a "client" in some
+   sense). The shorter name reads better and matches the
+   `@CodebaseClient` annotation it surfaces. Recommendation:
+   `list_clients`. Async producers will get their own
+   `list_async_producers` tool; no collision.
+
+4. **Should imperative call sites (RestTemplate / WebClient) emit
+   `Client` rows automatically, without `@CodebaseClient`?** This
+   would mirror how `Route` is populated greenfield from
+   `@RestController`. Recommendation: **no** for v1 — too noisy
+   (every `restTemplate.exchange(...)` creates a row). Keep
+   imperative-call-site Client rows opt-in via `@CodebaseClient`.
+   Revisit after we see how brownfield projects use the tool.
+
+## Notes
+
+- This propose is structured to land **after** the v2 brownfield
+  annotations PR (#36). The data the tool reads from doesn't exist
+  until v2 reshapes how Feign declarations are stored.
+- The `Client` node + `list_clients` tool are the minimum surface
+  to recover the workflow that `list_routes(framework=feign)`
+  served pre-v2 — and they do it more honestly (no direction
+  conflation) and more completely (brownfield-annotated
+  RestTemplate call sites become queryable too).
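The filter semantics that the acceptance criteria ask `tests/test_list_clients.py` to cover (each filter independent, an empty result is not an error, the limit is bounded) can be sketched in memory without a graph. The sketch below assumes exact matching for every filter except `path_prefix`, and assumes out-of-range limits are clamped into the 1 to 500 range taken from the tool signature. `ClientRow`, `filter_clients` and the sample rows are hypothetical; the real tool queries the `Client` node table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRow:
    """Hypothetical subset of the Client columns used by the filters."""
    microservice: str
    client_kind: str
    target_service: str
    path: str
    method: str


def filter_clients(rows, *, microservice=None, client_kind=None,
                   target_service=None, path_prefix=None, method=None,
                   limit=100):
    """Apply each supplied filter; None means 'do not filter on this'."""
    limit = max(1, min(limit, 500))  # assumption: clamp rather than reject
    matched = []
    for row in rows:
        if microservice is not None and row.microservice != microservice:
            continue
        if client_kind is not None and row.client_kind != client_kind:
            continue
        if target_service is not None and row.target_service != target_service:
            continue
        if path_prefix is not None and not row.path.startswith(path_prefix):
            continue
        if method is not None and row.method != method:
            continue
        matched.append(row)
        if len(matched) >= limit:
            break
    return matched


# Hypothetical sample data; service names follow the test project above.
ROWS = [
    ClientRow("chat-core", "feign_method", "chat-assign", "/assign/{chatId}", "POST"),
    ClientRow("chat-core", "rest_template", "", "/audit/events", "POST"),
    ClientRow("chat-assign", "feign_method", "user-service", "/users/{id}", "GET"),
]

feign_only = filter_clients(ROWS, client_kind="feign_method")
no_match = filter_clients(ROWS, microservice="unknown-service")
```

An empty list here corresponds to `success=True` with `clients=[]` in the tool's output, not to an error.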