From 93dc4ee968f228e03069ee930a704659500c4379 Mon Sep 17 00:00:00 2001 From: Dmitry Teryaev Date: Sat, 23 May 2026 19:16:59 +0300 Subject: [PATCH] decline HINTS-MCP-CALL-SHAPE propose Remove the in-flight propose for MCP-copyable hint strings and 250-char cap; implementation and planning for issue #195 are not proceeding on this path. Co-authored-by: Cursor --- propose/HINTS-MCP-CALL-SHAPE-PROPOSE.md | 217 ------------------------ 1 file changed, 217 deletions(-) delete mode 100644 propose/HINTS-MCP-CALL-SHAPE-PROPOSE.md diff --git a/propose/HINTS-MCP-CALL-SHAPE-PROPOSE.md b/propose/HINTS-MCP-CALL-SHAPE-PROPOSE.md deleted file mode 100644 index 58341e4..0000000 --- a/propose/HINTS-MCP-CALL-SHAPE-PROPOSE.md +++ /dev/null @@ -1,217 +0,0 @@ -# HINTS-MCP-CALL-SHAPE — copy-safe hint strings and honest node-id docs - -## Status - -**Proposal** — not yet implemented. - -**Tracks:** [issue #195](https://github.com/HumanBean17/java-codebase-rag/issues/195) (battle-test: agents copy `neighbors([''],…)` from hints → `Unknown id prefix for \`['']\``). - -**Implements fix combo:** **1** (hint templates) + **2** (agent docs + `server.py` field descriptions) + **6** (align id-shape documentation with live graph ids). Does **not** implement runtime coercion (options 3–5), structured hints (7), or host-only rules (10). - -**Amends (catalog lock):** [`propose/completed/HINTS-ROAD-SIGNS-PROPOSE.md`](./completed/HINTS-ROAD-SIGNS-PROPOSE.md) Appendix A, **§7.6 rendered-length cap (120 → 250)**, and follow-on hint proposes (v2/v3/v4, CALLS high-fanout strings). In-flight [`propose/DESCRIBE-HINTS-STRUCTURAL-PROPOSE.md`](./DESCRIBE-HINTS-STRUCTURAL-PROPOSE.md) must adopt the same call-shape convention and **250-char** cap for any **new** `TPL_DESCRIBE_*` rows landed after this work (or land structural templates in the same PR as this propose). - -## Problem statement - -Road-sign hints intentionally embed a **next-call sketch** so agents can chain tools without re-reading the schema. The locked v1 catalog uses **pseudo-Python** call syntax: - -```text -routes via members: neighbors(['{id}'],'out',['DECLARES.EXPOSES']) -``` - -Agents treat that sketch as the literal MCP argument shape. Two failures follow: - -| Agent copies | FastMCP `pre_parse_json` | `neighbors_v2` sees | Result | -|--------------|---------------------------|---------------------|--------| -| `"ids": "['']"` | `json.loads` fails (single quotes) → string passed through | one origin id = the literal `['']` | `Unknown id prefix for \`['']\`` | -| `"ids": "[\"\"]"` | parses to `list[str]` | correct | works | -| `"ids": ""` or `"ids": [""]` | string or list | correct | works | - -Root cause is **not** bad graph data or wrong ids from `resolve` / `describe` — `record.id` is already correct. The failure mode is **hint ergonomics** plus **misleading docs** that teach `sym:`-prefixed examples while the indexer emits **bare 40-character SHA1 hex** for `Symbol.id` (and short prefixed ids for route/client/producer — see §6). - -Secondary confusion: README and [`docs/AGENT-GUIDE.md`](../docs/AGENT-GUIDE.md) show `sym:com.bank…#method(…)` in tool examples while battle tests and brownfield corpora return unprefixed symbol ids. Agents prefix ids that do not need prefixing, or distrust bare hex as “wrong.” - -## Proposed solution - -### 1 — Hint template dialect (option 1) - -Replace pseudo-Python positional/`['{id}']` forms with an **MCP-copyable named-parameter sketch** aligned with the slash-alias JSON style already in AGENT-GUIDE § “Slash-style aliases”: - -```text -routes via members: neighbors(ids="{id}", direction="out", edge_types=["DECLARES.EXPOSES"]) -``` - -**Normative rules for every hint emission that references `neighbors` or `resolve`:** - -| Rule | Rationale | -|------|-----------| -| **Single origin** | `ids="{id}"` (string), never `ids=['{id}']` or `ids="['{id}']"`. | -| **Batch / placeholder lists** | `ids=member_ids` or `ids=client_ids` (bare placeholder token), not bracket-wrapped pseudo-lists. | -| **`direction` / `edge_types`** | Always named; `edge_types` uses JSON double-quoted strings inside the sketch. | -| **Two-hop chains** | Keep ` then ` between two `neighbors(…)` sketches when the catalog already uses a chain; each hop obeys the same rules. Prefer **dot-key** shortcuts where they are clearer; with the 250-char cap, re-expanding a two-hop chain is allowed when it improves copy clarity (see §1b). | -| **`edge_filter` / `filter` in meta hints** | Use JSON object literals with double quotes, e.g. `edge_filter={"callee_declaring_role":"SERVICE"}`, not Python `{{…}}` / single-quoted dicts. | -| **`resolve` / `find` / `search` sketches** | Same convention: `resolve(identifier="{identifier}", hint_kind="{kind}")`, `search(query="{q}")` — no Python-only quoting. | -| **Rendered length** | See §1b (250 chars, not 120). | - -Update `MCP_HINTS_FIELD_DESCRIPTION` in `mcp_hints.py` to state explicitly: *hints use MCP-copyable named-parameter sketches; copy `record.id` (or `results[i].id`) into `ids`, not Python list literals; each hint is at most 250 characters after placeholder substitution.* - -### 1b — Hint length cap: 120 → 250 - -Road-sign discipline keeps **≤ 5 hints per output** and **no prose tutorials** (HINTS-ROAD-SIGNS §2–§3). The **per-hint rendered length** limit moves from **120** to **250** characters (measured on the string **after** `{id}`, `{identifier}`, and other placeholders are substituted with realistic values — same rule as v1 §7.6, new numeric cap). - -**Why 250 (not 120 or 500)** - -- **120** is too tight once hints use named MCP parameters and JSON `edge_filter` literals. -- **500** was rejected as excessive for road-sign discipline (~1.25k chars if all five hints fire). -- **250** fits one copyable `neighbors(…)` call plus a short filter example; HINTS-V4’s ~194-char combined N1 row still fits. - -**Why raise from 120** - -- MCP-copyable named-parameter sketches are longer than pseudo-Python `neighbors(['{id}'],…)`. -- CALLS meta hints (`TPL_NEIGHBORS_CALLS_HIGH_FANOUT`, role-collision, unresolved) were written against the 120 cap with compressed grammar; 250 allows full `edge_filter={…}` examples without drop-on-overflow. -- HINTS-V4 **rejected** a combined N1 dot-key line at ~194 chars (v4 propose); that row fits under 250 if we choose to emit it (optional, not required for #195). - -**Implementation** - -| Item | Change | -|------|--------| -| `mcp_hints.py` | Replace `_FIND_SUCCESS_MAX_CHARS`, `_RESOLVE_HINT_MAX_CHARS`, `_NEIGHBORS_SUCCESS_MAX_CHARS` (all `120`) with one module constant **`HINT_MAX_RENDERED_CHARS = 250`** used by every drop-on-overflow check. | -| `MCP_HINTS_FIELD_DESCRIPTION` | Mention **250** chars per hint (still max **5** hints per output). | -| `tests/test_mcp_hints.py` | Rename/update `test_hints_*_under_120_chars` → **`test_hints_*_under_250_chars`**; assert `len(rendered) <= 250`. Remove or rewrite tests that **require** overflow past the cap (e.g. N1a rendered length `> 120` used only to justify split hints). | -| Completed proposes | Not edited on disk; this file is the amendment for §7.6. | - -**Drop-on-overflow unchanged:** if substitution still exceeds 250 chars, the hint is **not** emitted (no truncation, no ellipsis) — same policy as v2 §7.18. - -**Files — template source of truth** - -| File | Change | -|------|--------| -| `mcp_hints.py` | All `TPL_*` constants with `neighbors(['{id}']` or positional `neighbors(…)`; v4 success templates (`client_ids`, `route_ids`, …); CALLS meta hints with `edge_filter=`; **`HINT_MAX_RENDERED_CHARS = 250`**. | -| `java_ontology.py` | `EDGE_SCHEMA[*].typical_traversals` strings (feed empty-`neighbors` structural hints via `canonical_traversal`). | -| `scripts/generate_edge_navigation.py` + regenerate `docs/EDGE-NAVIGATION.md` | Keep generated doc aligned with `EDGE_SCHEMA` (same PR). | - -**Inventory (grep-driven, implementation checklist)** - -- `mcp_hints.py`: 16+ template lines with `neighbors(['{id}']` (describe/find/v4 success) plus v4 placeholder-id rows and CALLS filter hints. -- `java_ontology.py`: `_SYMBOL_TYPE_TRAVERSAL`, `_COMPOSED_MEMBER_TYPE_TRAVERSAL`, and per-edge `member_subject` / `type_subject` strings (19 occurrences of `['{id}']` pattern). -- Tests asserting verbatim template text: `tests/test_mcp_hints.py` (primary), any integration tests that golden-match hint substrings. - -### 2 — Documentation and schema descriptions (option 2) - -Add a short, normative **“Copying hints into tool calls”** subsection to [`docs/AGENT-GUIDE.md`](../docs/AGENT-GUIDE.md) (after “JSON, not stringified JSON” or inside the `neighbors` section): - -- Hints are **advisory**; authoritative id is always `describe.record.id`, `find.results[i].id`, `neighbors.results[i].other.id`, or `resolve.candidates[i].id`. -- For `neighbors`, pass that id as **`ids` string** for one origin, or **`ids` JSON array** for batch — never a string that looks like a Python list (`"['…']"`). -- FastMCP may accept a **JSON-encoded array string** (`"[\"id\"]"`); that is host behavior, not something hints should require agents to manufacture. -- Point to the new hint dialect (named parameters in hint text). - -Tighten tool-facing prose: - -| Location | Change | -|----------|--------| -| `server.py` → `neighbors` → `ids` `Field(description=…)` | State copy-safe shapes; warn against Python-list string for `ids`. | -| `server.py` → `describe` → `id` (if present) | One line: ids are graph-native (hex or prefixed route/client/producer). | -| `MCP_HINTS_FIELD_DESCRIPTION` | As in §1. | - -No change to MCP tool **signatures** or Pydantic models beyond `description=` strings. - -### 6 — Honest node-id documentation (option 6) - -Correct the public docs so examples match what the graph and tools return. - -| Kind | Live shape (indexer / Kuzu) | Docs today (misleading) | -|------|----------------------------|-------------------------| -| **Symbol** | 40-char lowercase SHA1 hex (`graph_enrich.symbol_id`) | `sym:` + FQN in README / AGENT-GUIDE tables | -| **Route** | `r:` + 16 hex | `route:` or `r:` (partially OK) | -| **Client** | `c:` + 16 hex | `client:` or `c:` (partially OK) | -| **Producer** | `p:` + 16 hex | `producer:` or `p:` (partially OK) | -| **UnresolvedCallSite** | `ucs:` + … | Documented; unchanged | - -**Doc edits** - -- [`docs/AGENT-GUIDE.md`](../docs/AGENT-GUIDE.md) § “Node ids”: Symbol row = **bare SHA1 hex**; note `sym:` is accepted on input via prefix convention but **not** what `describe` / `find` / `neighbors` emit for symbols. -- [`README.md`](../README.md) MCP tool examples: replace `sym:…ChatController#…` symbol examples with a **placeholder** bare hex (or “40-char hex from `describe.record.id`”) while keeping prefixed examples for route/client/producer where accurate. -- Optional one-line in `server.py` `describe` / `neighbors` descriptions: “Symbol ids are typically 40-char hex; route/client/producer use `r:`/`c:`/`p:` prefixes.” - -**Runtime:** No change to `_resolve_node_kind` — it already accepts bare hex and optional prefixes. This slice is documentation + hints only. - -## Scope - -| In scope | Out of scope (issue #195 options) | -|----------|-----------------------------------| -| Rewrite hint + `EDGE_SCHEMA` traversal strings to MCP-copyable form | `_coerce_ids()` / Python-bracket unwrap (3, 5) | -| **250-char** rendered hint cap (replaces 120) + unified `HINT_MAX_RENDERED_CHARS` | Raising **5 hints/output** cap | -| AGENT-GUIDE + README id-shape + hint-copy section | Fail-loud-only teaching error (4) without coercion | -| `server.py` `Field(description=…)` updates | `hints_structured` (7) | -| Regenerate `docs/EDGE-NAVIGATION.md` | Shorter hints-only labels (8) | -| `tests/test_mcp_hints.py` + sentinel regression test | FastMCP `ast.literal_eval` fallback (9) | -| Cross-reference in-flight DESCRIBE-HINTS-STRUCTURAL | Cursor host rules only (10) | - -## Schema / Ontology / Re-index impact - -- **Ontology bump:** not required. -- **Re-index required:** no (string-only hint and doc changes). -- **Config / tool surface:** no new tools or env vars; optional `description=` text on existing fields only. - -## Tests / validation - -1. **`.venv/bin/ruff check .`** -2. **`.venv/bin/python -m pytest tests/test_mcp_hints.py -v`** — existing template equality tests follow `mcp_hints.TPL_*` constants; update expectations when constants change. -3. **`test_hint_templates_never_use_python_list_ids`** — assert no rendered catalog template contains `(['` or `"['"` (after format with a realistic 40-char hex id). -4. **`test_hint_source_no_python_bracket_ids`** — source-level sentinel: no `neighbors(['` in `mcp_hints.py` or `java_ontology.py` (decision §5). -5. **Char-cap sweep:** `test_hints_template_rendered_length_leq_250` (and v4 parametrized sibling) — every catalog template renders to **≤ 250** chars with a realistic 40-char hex `{id}` (and realistic long FQN for resolve/search templates). -6. **Manual smoke (PR evidence):** - - `describe` on a controller from `tests/bank-chat-system` → copy `record.id` into `neighbors(ids="", direction="out", edge_types=["DECLARES.EXPOSES"])` via MCP host or `mcp_v2.neighbors_v2` with `list[str]` / bare string — success. - - Confirm copying the **old** hint shape `"ids": "['']"` still fails (documents why we changed hints; optional post-implementation note in PR). - -Heavy / graph rebuild tests not required. - -## Decisions (resolved) - -All former open questions — **chosen** 2026-05-21 (align with issue #195 recommendations). - -| # | Question | **Chosen** | -|---|----------|------------| -| 1 | Hint sketch vs strict JSON object | **Named-parameter sketch** in hint text (`neighbors(ids="…", direction="out", edge_types=[…])`). AGENT-GUIDE slash-alias table keeps JSON objects for hosts that prefer a single blob. | -| 2 | Single PR vs docs-first | **One implementation PR** — templates, docs, and tests land together (no docs-only lead). | -| 3 | DESCRIBE-HINTS-STRUCTURAL sequencing | Whichever lands first wins; the other rebases within the same week onto §1 rules + 250-char cap. **Never** merge new `TPL_DESCRIBE_*` using `['{id}']` syntax. | -| 4 | Mechanical CI lint | **Yes** — `test_hint_source_no_python_bracket_ids` (no `neighbors(['` in `mcp_hints.py` / `java_ontology.py`) plus rendered-output sentinels. | -| 5 | Re-merge HINTS-V4 N1a+N1b | **Out of #195 MVP** — keep N1a + N1b separate; combined dot-key line remains optional follow-up only. | - -## Out of scope - -- Runtime acceptance of Python-literal `ids` strings in `mcp_v2.neighbors_v2` (options 3–5) — follow-up propose if battle tests still show copy errors after 1+2+6. -- Changing FastMCP or MCP host JSON pre-parse behavior. -- `hints_structured: [{tool, args}]` (issue option 7). -- Bumping `ontology_version` or graph builder id assignment. -- Editing completed propose files in `propose/completed/` (this file is the amendment record). - -## Sequencing / Follow-ups - -| Step | Deliverable | -|------|-------------| -| **Implementation PR (one)** | Templates + `EDGE_SCHEMA` traversals + EDGE-NAVIGATION regen + AGENT-GUIDE/README/server descriptions + tests (decision §2) | -| **Follow-up (optional)** | Coercion / fail-loud hybrid (issue #195 options 4–5) if telemetry still shows `['…` id strings | - -**Suggested branch:** `cursor/hints-mcp-call-shape-195` or `feat/hints-mcp-call-shape-195`. - -## PR body template (implementation PR) - -```markdown -## Summary -- Fixes #195 (combo 1+2+6): MCP-copyable hint sketches; docs state bare SHA1 symbol ids. -- Implements `propose/HINTS-MCP-CALL-SHAPE-PROPOSE.md`. - -## Highlights -- Hint templates use `ids="{id}"`, named `direction` / `edge_types` (no `['{id}']`). -- Per-hint rendered cap **250** chars (`HINT_MAX_RENDERED_CHARS`; was 120). -- AGENT-GUIDE + README + server Field descriptions aligned with live id shapes. -- EDGE_SCHEMA / EDGE-NAVIGATION traversal strings updated to match. - -## Test plan -- [ ] pytest tests/test_mcp_hints.py -- [ ] new sentinel: no `(['` in hint sources / rendered templates -- [ ] manual neighbors call with id from describe (bank-chat fixture) - -## Re-index -Not required. -```