Propose: MCP API v2 — 4-tool graph navigator + ops CLI#48

Merged: HumanBean17 merged 5 commits into master from propose/mcp-api-v2-redesign on May 7, 2026

@HumanBean17
Owner

Discussion PR — not for merge yet

Adds propose/MCP-API-V2-REDESIGN-PROPOSE.md. Goal: redesign the MCP API so the AMA agent (and any future agents) have an obvious, flexible, and small surface to navigate the Java codebase graph.

This PR is for review and comment. Implementation will land as PR-V2-1..4 once the design is locked.

Why now

  • Current API: 23 verb-first tools with overlapping responsibilities. Two separate trace_* tools confused the author of last week's PR reviews — and we hit this same class of confusion when debugging the empty list_clients() call this week.
  • The graph schema is settling (brownfield migration v3, ontology 11). Right moment to design the API on top of stable data.
  • "Nobody uses this MCP bundle yet" — we can hard-cutover without deprecation pain.

Core design call

The MCP is a GPS for the codebase, not a reasoning engine. Three phases per agent question — Locate → Inspect → Walk — and that's the entire abstraction. Multi-hop traversal is the agent's job; the MCP exposes structure.

This rules out trace_*, impact_*, and any NL "ask" tool. The agent walks via neighbors in a loop and decides its own stop conditions.

v2 surface

MCP — 4 tools (the agent's surface):

| Tool | Role |
| --- | --- |
| search(query, filter?) | Locate nodes by NL/code text |
| find(kind, filter) | Locate nodes by structured filter |
| describe(id) | Full record + edge-count summary for one node |
| neighbors(ids, direction, edge_types) | One-hop walk; direction + edge_types REQUIRED |
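
To make the "direction + edge_types REQUIRED" rule concrete, here is a minimal sketch of how a neighbors handler might reject under-specified calls. The direction vocabulary (`in`, `out`, `both`), the `CALLS` edge type, and the helper name `validate_neighbors_args` are assumptions for illustration; the proposal only fixes the parameter names.

```python
from typing import Sequence

# Assumed direction vocabulary; the proposal only says `direction` is required.
_VALID_DIRECTIONS = ("in", "out", "both")


def validate_neighbors_args(
    ids: Sequence[str],
    direction: str,
    edge_types: Sequence[str],
) -> None:
    """Raise ValueError for neighbors() calls that could fan out by accident."""
    if not ids:
        raise ValueError("ids must contain at least one node id")
    if direction not in _VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {_VALID_DIRECTIONS}")
    if not edge_types:
        # No implicit "all edge types" default: the caller must choose.
        raise ValueError("edge_types must name at least one edge type")


# A fully specified call passes silently (hypothetical id and edge type).
validate_neighbors_args(["node-1"], "out", ["CALLS"])
```

Failing loudly on an empty `edge_types` list, rather than treating it as "all edges", is one way to keep the walk explicit at every hop.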

CLI — 5 subcommands (the operator's toolbelt — user-rag command):

| Subcommand | Replaces |
| --- | --- |
| user-rag refresh | refresh_code_index |
| user-rag meta | graph_meta |
| user-rag tables | list_code_index_tables |
| user-rag diagnose-ignore <path> | diagnose_ignore |
| user-rag analyze-pr [--diff-file F] | analyze_pr |
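
As a rough illustration of the operator surface, the five subcommands could be wired up as below. This is a sketch that assumes `argparse`; the proposal does not say which CLI library user_rag/cli.py will use, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the five proposed user-rag subcommands."""
    parser = argparse.ArgumentParser(prog="user-rag")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("refresh", help="replaces refresh_code_index")
    sub.add_parser("meta", help="replaces graph_meta")
    sub.add_parser("tables", help="replaces list_code_index_tables")

    diagnose = sub.add_parser("diagnose-ignore", help="replaces diagnose_ignore")
    diagnose.add_argument("path")

    analyze = sub.add_parser("analyze-pr", help="replaces analyze_pr")
    analyze.add_argument("--diff-file", default=None)
    return parser


# Example: parse the diagnose-ignore form with a made-up path.
args = build_parser().parse_args(["diagnose-ignore", "src/Example.java"])
```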

23 → 4 MCP tools. The 5 ops tools move out of the MCP entirely because the agent never calls them — operators do.

What's dropped

  • trace_request_flow, trace_flow, impact_analysis: agent reasoning, not MCP responsibility.
  • find_callers / find_route_callers / find_route_handlers / list_by_*: collapsed into find + neighbors with explicit edge types.
  • codebase_search parameter explosion: split between search (vector) and find (filter).

Walked through 20 use cases

Distribution under v2 (see §7 of the proposal):

  • 12 of 20: 1–2 calls
  • 5 of 20: agent-driven neighbors loop — exactly the questions that should require thinking
  • 3 of 20: not navigation (handled by CLI / pure diff-grep)
  • 0 of 20: blocked by missing primitive
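
The agent-driven neighbors loop from the list above can be sketched as a bounded breadth-first walk. Everything here is illustrative: the `walk` helper, the toy `CALLS` graph, and `fake_neighbors` (which stands in for the real MCP tool and is assumed to return adjacent node ids) are not part of the proposal.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical shape of the tool call: (ids, direction, edge_types) -> adjacent ids.
NeighborsFn = Callable[[List[str], str, List[str]], Iterable[str]]


def walk(
    start: str,
    neighbors: NeighborsFn,
    direction: str,
    edge_types: List[str],
    max_hops: int = 3,
) -> Set[str]:
    """Multi-hop traversal built from repeated one-hop neighbors() calls."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):  # stop condition chosen by the agent
        if not frontier:
            break
        next_frontier = []
        for node_id in neighbors(frontier, direction, edge_types):
            if node_id not in seen:
                seen.add(node_id)
                next_frontier.append(node_id)
        frontier = next_frontier
    return seen


# Toy in-memory graph standing in for the MCP server (ids are made up).
CALLS = {"A": ["B"], "B": ["C"], "C": []}


def fake_neighbors(ids: List[str], direction: str, edge_types: List[str]) -> List[str]:
    # Ignores direction/edge_types; a real server would filter on both.
    return [dst for src in ids for dst in CALLS.get(src, [])]


reachable = walk("A", fake_neighbors, "out", ["CALLS"], max_hops=3)
```

The hop limit and the visited set live in the caller, which matches the split in the proposal: the MCP exposes adjacency, and the agent decides when to stop.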

Migration — 4 PRs

  1. PR-V2-1: implement search/find/describe/neighbors alongside v1; equivalence tests.
  2. PR-V2-2: meta per-edge-type counts, search populates symbol_id, describe returns edge_summary.
  3. PR-V2-3: delete 18 v1 navigation tools.
  4. PR-V2-4: extract operational tools into user_rag/cli.py, remove from MCP, update README.

~700 LoC of test changes, ~500 LoC of handler+CLI code. No graph schema changes.

What I want from this discussion

  1. Sign-off on the GPS framing (tool reasoning lives in the agent, not the MCP).
  2. Sign-off on the MCP/CLI split (4 navigation tools + 5 CLI subcommands).
  3. Anything missing in the 20-use-case validation? Any real question Dmitry asks his codebase that I didn't list?
  4. Bikeshed: tool names. find vs list? describe vs inspect? The proposal locks names but they're cheap to change pre-implementation.

Once aligned, I'll split into 4 Cursor task prompts via the cursor-task-prompt skill and the implementation can start with PR-V2-1.


Companion to today's **/out/** fix (PR #47). That fix addressed a symptom of the API's other failure mode (silent path filtering); this proposal addresses the agent-facing failure mode (overlapping tools, no multi-hop primitive).

@HumanBean17
Owner Author

Great direction overall — I like the GPS framing and the 4-tool surface. I have a few consistency questions to de-risk implementation planning:

  1. PR count mismatch
  • Section header says "Migration plan — 3 PRs" but the body defines PR-V2-1 through PR-V2-4.
  • Should this be explicitly "4 PRs" everywhere?
  2. Ambiguous mapping for list_code_index_tables()
  • In the v1→v2 table it appears twice with different mappings:
    • list_code_index_tables() -> index_info()
    • list_code_index_tables() -> CLI user-rag tables
  • Which one is the intended final mapping? (I assume CLI, but want to confirm.)
  3. Use-case section still models CLI ops as MCP calls
  • UC12 uses meta() and UC16 uses meta() -> diagnose_ignore, both counted as MCP calls.
  • But the design says these move to CLI and MCP end-state is 4 tools.
  • Should these use cases be rewritten as CLI flows so the validation section matches the target architecture?

If you want, I can also propose exact wording edits to sections 7/10/11 so the implementation prompts have zero ambiguity.

Reframes the MCP as a graph navigator (GPS), not a reasoning engine.
The agent's job is to walk; the MCP's job is to expose what's adjacent.

MCP becomes 4 tools: search, find, describe, neighbors.
Drop trace/impact/ask — agent reasoning, not MCP responsibility.
Move 5 ops tools (refresh, meta, tables, diagnose-ignore, analyze-pr)
into a user-rag CLI. The agent never calls them; operators do.

Shared NodeFilter schema across search/find/neighbors.
Required direction + edge_types on neighbors (no fan-out by accident).
4-PR migration plan with hard cutover, no aliases.

Walked 20 use cases through the design: 12 in 1-2 calls, 5 are
agent-driven neighbors loops (correct), 3 are CLI/diff-grep, 0 missing
primitives.
@HumanBean17 force-pushed the propose/mcp-api-v2-redesign branch from 12b9163 to d6c47ce on May 7, 2026 07:02
@HumanBean17
Owner Author

Good catches, all three were real. Pushed as d6c47ce.

  1. PR count → fixed. §10 header now reads "4 PRs, hard cutover" — matches the TL;DR and the PR-V2-1..4 body.
  2. list_code_index_tables mapping → CLI is the intended final mapping. Removed the duplicate → index_info() row from the §11 mapping table; the only entry now is → user-rag tables.
  3. UC12 / UC16 in §7 → rewritten as CLI flows. The "Calls" column for both now reads "CLI" instead of a number, and the cells use user-rag meta / user-rag diagnose-ignore <path> so the validation matches the target architecture.

Also tightened the distribution summary: "12 of 20 single-shot MCP, 5 of 20 agent-driven MCP loop, 3 of 20 not navigation — UC11 pure diff-grep, UC12/UC16 CLI workflows." No more conflation.

No need for the wording-edit offer — these were the only three drift spots. Proposal is consistent end-to-end now.

@HumanBean17
Owner Author

Follow-up review after the update (proposal + plan + prompts): this is much tighter now, but I still see a few execution-risk nits worth fixing before delegation.

  1. Test-count contract is too rigid and likely stale (high risk)
  • Plan/prompts require exact totals like 359/365/351/359 passed.
  • Those appear tied to an older baseline and can make DoD impossible even when the implementation is correct.
  • Suggestion: switch to "baseline + N new tests" or "full suite passes; new tests added as listed" instead of absolute totals.
  2. Small inconsistency in proposal NodeFilter key count
  • The schema lists 14 optional fields total (3 universal + 11 kind-specific), but prose says "11 optional keys, 3 universal".
  • Suggestion: reword to "14 optional keys total (3 universal, 11 kind-specific)".
  3. PR-V2-4 pretty-output test is ambiguous
  • Spec says pretty mode is auto-detected by isatty(), but test guidance suggests "PTY shim or --pretty flag" while --pretty is not in the declared CLI surface.
  • Suggestion: pick one explicitly:
    • either require PTY-based test only, or
    • add a real --pretty test-only override to CLI spec + tests.
  4. Manual evidence commands have brittle fixture assumptions (low risk)
  • Examples use hardcoded fixture paths/ids (tests/bank-chat-system/.kuzu, .git/HEAD) that may fail for setup reasons unrelated to the PR.
  • Suggestion: keep examples but label them "sample output may vary" or use setup-agnostic checks.

Everything else (the 3 earlier drift issues) looks resolved and the overall direction is strong.

@HumanBean17
Owner Author

All four nits addressed in da59cfc. Summary:

1) Test-count contract (rigid totals → baseline+delta)

Replaced every absolute "N passed, 4 skipped" assertion across the plan and the cursor prompts with baseline + N new (or − deletions) phrasing. DoD checkboxes now read "Full suite green; the N new tests in <file> all pass." Plan rationale: baselines drift between branches and across machines, and gating implementation on a stale absolute count makes correct PRs un-mergeable.
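
One way to read the baseline + delta rule is as a relative check on pytest's summary line. The sketch below is only an illustration: `passed_count` and `meets_baseline_plus_delta` are hypothetical helpers, the summary strings are made up, and the plan itself phrases the gate as a DoD checkbox rather than a script.

```python
import re

# Matches the "N passed" figure in a pytest summary line.
_PASSED = re.compile(r"(\d+) passed")


def passed_count(pytest_summary: str) -> int:
    """Extract the passed-test count from a pytest summary line."""
    match = _PASSED.search(pytest_summary)
    if match is None:
        raise ValueError("no 'N passed' figure in summary")
    return int(match.group(1))


def meets_baseline_plus_delta(
    baseline_summary: str, branch_summary: str, delta: int
) -> bool:
    """True when the branch passes exactly baseline + delta tests.

    `delta` is negative when a PR deletes tests.
    """
    return passed_count(branch_summary) == passed_count(baseline_summary) + delta


# Illustrative summaries: the branch adds 6 tests on top of whatever the
# local baseline happens to be.
ok = meets_baseline_plus_delta(
    "359 passed, 4 skipped in 12.30s",
    "365 passed, 4 skipped in 12.90s",
    delta=6,
)
```

Because both counts are measured on the same machine, the check stays valid when the baseline differs between branches.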

2) NodeFilter key-count inconsistency

Re-counted the YAML schema in propose §4. The actual total is 15 keys (not 14): 3 universal + 5 symbol-only + 3 route-only + 4 client-only. The "4 symbol-only" line was wrong on my side — role, exclude_roles, annotation, capability, fqn_prefix is 5. Updated:

  • propose §4 prose and Appendix B → "15 optional keys total: 3 universal + 12 kind-specific (5 symbol + 3 route + 4 client)"
  • plan PR-V2-1 §1 → "3 universal + 5 symbol-only + 3 route-only + 4 client-only optional keys; total 15 optional fields"

3) PR-V2-4 pretty-output ambiguity → PTY-only

Picked PTY-only and explicitly forbade the side-door flag. Plan PR-V2-4 §1 now reads:

Output mode auto-detected; no user-facing flag controls it. Do not add --pretty, --json, or any equivalent override. The single isatty() switch is the contract; tests force it via PTY.

test_cli_meta_pretty_when_tty is rewritten to invoke under a real PTY using os.openpty() / pty.spawn. Escape hatch for flaky CI: @pytest.mark.skipif(...) with a clear reason — never a --pretty flag.
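
A minimal sketch of the PTY technique, assuming a POSIX system: the child's stdout is attached to the slave end of a pseudo-terminal from `os.openpty()`, so `isatty()` is true inside the child without any flag. The child command here is a stand-in that just prints `sys.stdout.isatty()`; the real test would invoke `user-rag meta` instead, and `stdout_is_tty_under_pty` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def stdout_is_tty_under_pty() -> str:
    """Run a child whose stdout is a PTY and return what it printed."""
    master_fd, slave_fd = os.openpty()
    try:
        subprocess.run(
            [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"],
            stdin=subprocess.DEVNULL,
            stdout=slave_fd,  # child sees a terminal on stdout
            stderr=subprocess.DEVNULL,
            check=True,
        )
        # The child has exited; its small output is buffered in the PTY.
        # Read it while the slave end is still open in this process.
        output = os.read(master_fd, 1024).decode()
    finally:
        os.close(slave_fd)
        os.close(master_fd)
    # The terminal layer may translate "\n" to "\r\n"; strip handles both.
    return output.strip()
```

On platforms without pseudo-terminals this would need the `@pytest.mark.skipif(...)` escape hatch described above.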

4) Manual evidence brittleness

Added a single shared disclaimer to the "Common rules" preamble of CURSOR-PROMPTS-MCP-API-V2.md instead of duplicating it across all four PRs:

Manual-evidence commands are samples. They reference the tests/bank-chat-system fixture and concrete ids/paths for illustration. Sample output may vary depending on the local fixture state. Adapt ids to whatever the local fixture actually contains; the shape of the output is the contract, not the exact strings.

Consistency pass run: no 359 passed / 365 passed / 351 passed strings remain; no stale 14 keys / 4 symbol-only references; --pretty appears only in the negative ("do not add"). Diff stat: propose +2/-2, plan +34/-14, prompts +29/-15.

Ready to delegate when you are.

Loosen brittle file-list and grep/count gate wording while keeping strict scope boundaries.
Validation now prefers surface assertions plus narrowly-related test harness allowances.

Co-authored-by: Cursor <cursoragent@cursor.com>
@HumanBean17 marked this pull request as ready for review on May 7, 2026 07:41
@HumanBean17 merged commit 6d9571e into master on May 7, 2026
@HumanBean17 deleted the propose/mcp-api-v2-redesign branch on May 10, 2026 21:17