feat(mcp): steer agents to explore-first; fix Kotlin/Swift test detection #191

Merged
colbymchenry merged 2 commits into main from feat/explore-first-guidance
May 20, 2026

Conversation

@colbymchenry
Owner

Came out of refreshing the README benchmark, which surfaced that Claude Code's Explore agent wasn't using codegraph_explore at all.

Diagnosis

Same VS Code question ("how does the extension host communicate with the main process?"), two agents:

| Agent | explore calls | Tool uses | Tokens |
|---|---|---|---|
| Explore subagent | 0 | 37 | ~90k |
| general-purpose | 1 | 13 | ~55k |

The Explore agent used codegraph_search + Read + grep: it treated codegraph as a search index and never reached explore, ending up at roughly the token cost of running without CodeGraph. Root causes: (1) the "search first to find names, then explore" guidance short-circuited: agents searched, got file:line locations, and Read those files instead of feeding the names to explore; (2) explore was framed as a heavy last resort for "unfamiliar surveys."

Changes

  1. Guidance reframe (kept in sync across 4 spots: server-instructions.ts, instructions-template.ts, .cursor/rules/codegraph.mdc, and the explore/search tool descriptions): codegraph_explore is the workhorse for understanding, "how does X work", and architecture questions. Seed it with the key symbol names (run a quick codegraph_search/codegraph_context first only if the question names nothing concrete), read its output, and fill gaps with node/Read. Don't search and then Read each file.

  2. isTestFile fix: now recognizes Kotlin (*Test.kt files and the jvmTest/, commonTest/, androidTest/ source sets), Swift (*Tests.swift), and other camelCase test conventions, so tests get deprioritized in explore/context ranking. Previously only Java/JS/Python conventions were known.

    | OkHttp cold "trace a request" | Before | After |
    |---|---|---|
    | Real files | 1/9 | Call.kt, OkHttpClient.kt, Request.kt, Response.kt |
    | Test files | 8/9 | 3/9 |

    Capital-led matching keeps latest.kt / manifest.kt / RealCall.kt unflagged.
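A minimal TypeScript sketch of the path heuristics described in item 2, assuming detection works purely on the file path. The function name mirrors `isTestFile`, but the regexes, the directory set (`test`, `tests`, `__tests__`, `Tests`), and the `Spec` suffix are illustrative assumptions, not the code merged in this PR.

```typescript
// Hypothetical sketch of Kotlin/Swift-aware test-file detection.
// Rules below are illustrative assumptions, not codegraph's actual isTestFile.

// Directory segments that mark test code: Java/JS/Python layouts plus
// "Tests", assumed here as a SwiftPM-style test root.
const TEST_DIRS = new Set(["test", "tests", "__tests__", "Tests"]);

// Gradle/Kotlin source sets such as jvmTest/, commonTest/, androidTest/:
// a lowercase-led name ending in a capital-T "Test".
const SOURCE_SET_TEST = /^[a-z][A-Za-z0-9]*Test$/;

// Capital-led camelCase suffix on the file name: CallTest.kt, ParserTests.swift.
// Requiring the capital "T" keeps latest.kt and manifest.kt unflagged.
const CAMEL_TEST_FILE = /[A-Za-z0-9_](Tests?|Spec)\.(kt|kts|swift|java|scala)$/;

// JS/TS and Python conventions (foo.test.ts, test_foo.py, foo_test.py).
const JS_TEST_FILE = /\.(test|spec)\.[cm]?[jt]sx?$/;
const PY_TEST_FILE = /^test_.*\.py$|_test\.py$/;

export function isTestFile(path: string): boolean {
  // Accept both POSIX and Windows separators.
  const parts = path.split(/[\\/]/).filter((p) => p.length > 0);
  const fileName = parts[parts.length - 1] ?? "";
  const dirs = parts.slice(0, -1);

  // Any test directory or test source set anywhere in the path flags the file.
  if (dirs.some((d) => TEST_DIRS.has(d) || SOURCE_SET_TEST.test(d))) return true;

  return (
    CAMEL_TEST_FILE.test(fileName) ||
    JS_TEST_FILE.test(fileName) ||
    PY_TEST_FILE.test(fileName)
  );
}
```

Requiring the capital `T` is what separates `CallTest.kt` from `latest.kt`; the trade-off is that an all-lowercase test file such as a hypothetical `calltest.kt` outside a test directory would be missed.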

Dropped

An IDF common-term down-weighting was prototyped for the cold-query case but dropped: it measured as a no-op, because the supposedly "common" terms weren't actually common in the test indexes ("process" 0.3%, "main" 0.8% in VS Code). The test-detection gap was the real source of cold-query noise.
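The PR doesn't describe the prototype's formula, so the following is a hypothetical TypeScript sketch of one common shape for such a rule: standard IDF, `ln(1 / df)` with `df` as a fraction, plus damping that only applies to terms above an assumed 5% document-frequency threshold. It shows why terms at 0.3% and 0.8% would pass through untouched.

```typescript
// Hypothetical sketch: IDF-based down-weighting of "common" query terms.
// The 5% threshold and ratio-based damping are assumptions for illustration.

// Inverse document frequency from the fraction of indexed documents containing the term.
function idf(docFreqFraction: number): number {
  return Math.log(1 / docFreqFraction);
}

// Weight in [0, 1]: terms at or below the threshold keep full weight;
// more frequent terms are scaled down in proportion to their lower IDF.
function termWeight(docFreqFraction: number, commonThreshold = 0.05): number {
  if (docFreqFraction <= commonThreshold) return 1;
  return idf(docFreqFraction) / idf(commonThreshold);
}

// The VS Code frequencies quoted above: both terms are far below the threshold.
for (const [term, f] of [["process", 0.003], ["main", 0.008]] as const) {
  console.log(term, idf(f).toFixed(2), termWeight(f));
}
// process 5.81 1
// main 4.83 1
```

Both terms keep weight 1, which matches the measured no-op: a term has to appear in a large share of the index before this kind of damping changes the ranking at all.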

Test plan

  • 5 new tests in __tests__/is-test-file.test.ts (Kotlin/Swift/camelCase detection + false-positive guards)
  • Full suite: 621/621 passing
  • Cold-query measurement on VS Code / Flask / OkHttp before/after
  • Reviewer: the deeper cold-query gap (a plain natural-language question can't name the core symbol, so VS Code's rpcProtocol.ts is still missed) is intentionally left to the seed-then-explore guidance rather than a retrieval change. Flag it if you'd rather chase it in retrieval.

colbymchenry and others added 2 commits May 19, 2026 19:34
…tion

Two changes from diagnosing why Claude Code's Explore agent wasn't using
codegraph_explore on a benchmark run (37 calls / ~90k tokens via
search+Read+grep, vs a general-purpose agent that led with explore: 13
calls / ~55k tokens for the same question).

1. Tool guidance reframed across server-instructions.ts,
   instructions-template.ts, and .cursor/rules/codegraph.mdc (+ the
   explore/search tool descriptions): codegraph_explore is the workhorse
   for understanding/architecture/"how does X work" questions. Seed it with
   the key symbol names (a quick search/context first if the question names
   nothing concrete), read its output, and fill gaps with node/Read —
   instead of searching then Reading each file. The old "search first to
   find names, then explore" wording was short-circuiting: agents searched,
   got file:line locations, and Read them, never reaching explore.

2. isTestFile now recognizes Kotlin (*Test.kt, jvmTest/commonTest/
   androidTest source sets), Swift (*Tests.swift), and other camelCase test
   conventions, so test code is deprioritized in explore/context ranking.
   Previously only Java/JS/Python were known, letting tests dominate
   Kotlin/Swift exploration (OkHttp "trace a request" went from 8/9 test
   files to surfacing Call.kt/OkHttpClient.kt/Request.kt/Response.kt).
   Capital-led matching keeps latest.kt/manifest.kt unflagged.

An IDF common-term down-weighting was prototyped for the cold-query case
but dropped — it was a measured no-op (the "common" terms weren't actually
common in the test indexes); the test-detection gap was the real cause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…usage

Tooling to measure how a Claude Code agent actually uses the codegraph
MCP tools on a real repo — does it lead with codegraph_explore, how many
Read/Grep follow-ups, token cost — for validating tool-guidance changes
(server-instructions, tool descriptions) against real agent behavior.

- itrun.sh drives the real interactive TUI via tmux (the faithful
  Explore path). Hardened for unattended runs: type-and-verify prompt
  delivery (the ❯ glyph is drawn ~6s before the input accepts keys),
  auto-accepts the "trust this folder" dialog, busy-detection keys on
  the universal "(Ns · …)" spinner so the pre-stream thinking phase
  counts as busy, and fails loudly instead of capturing an empty pane.
- parse-session.mjs reports the tool breakdown + token accounting
  (gen / fresh-in / cached-in / billable) from the session and subagent
  logs, consistent across main-thread and subagent runs; counts
  main-thread Bash in the grep verdict.
- run-agent.sh / parse-run.mjs are the headless stream-json complement
  (exact per-tool tokens/cost via claude -p).
- run-interactive-test.md documents how to run it and how completion is
  detected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
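The parse-session.mjs accounting described in the commit above can be sketched as follows. This is a hypothetical TypeScript version, not the script itself. It assumes Claude Code session logs are JSONL, with `type: "assistant"` records carrying `message.usage` (the Anthropic API fields `input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) and `message.content` arrays of `tool_use` blocks. The `parseSession` name, the fresh-in definition, and `billable = gen + freshIn` are assumptions, since the commit doesn't define them.

```typescript
// Hypothetical sketch of tool breakdown + token accounting over a session log.
// Record shapes and the "billable" definition are assumptions.
interface Usage {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}
interface ContentBlock {
  type: string;
  name?: string;
  input?: { command?: string };
}
interface LogRecord {
  type?: string;
  message?: { id?: string; usage?: Usage; content?: ContentBlock[] | string };
}

export interface SessionReport {
  toolCounts: Map<string, number>;
  grepLike: number; // Grep tool calls plus Bash calls that run grep/rg
  gen: number;      // output tokens
  freshIn: number;  // uncached input + cache-creation input
  cachedIn: number; // cache-read input
  billable: number; // ASSUMPTION: gen + freshIn (cache reads excluded)
}

export function parseSession(jsonl: string): SessionReport {
  const toolCounts = new Map<string, number>();
  const seenIds = new Set<string>();
  let grepLike = 0, gen = 0, freshIn = 0, cachedIn = 0;

  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const rec = JSON.parse(line) as LogRecord;
    if (rec.type !== "assistant" || !rec.message) continue;
    const msg = rec.message;

    // Count usage once per message id, in case one message spans several log lines.
    if (msg.usage && !(msg.id && seenIds.has(msg.id))) {
      if (msg.id) seenIds.add(msg.id);
      gen += msg.usage.output_tokens ?? 0;
      freshIn += (msg.usage.input_tokens ?? 0) + (msg.usage.cache_creation_input_tokens ?? 0);
      cachedIn += msg.usage.cache_read_input_tokens ?? 0;
    }

    // Tally every tool_use block by tool name.
    if (!Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type !== "tool_use" || !block.name) continue;
      toolCounts.set(block.name, (toolCounts.get(block.name) ?? 0) + 1);
      const shellGrep = block.name === "Bash" && /\b(grep|rg)\b/.test(block.input?.command ?? "");
      if (block.name === "Grep" || shellGrep) grepLike++;
    }
  }
  return { toolCounts, grepLike, gen, freshIn, cachedIn, billable: gen + freshIn };
}
```

Counting Bash commands that shell out to grep alongside the Grep tool mirrors the commit's note that main-thread Bash feeds the grep verdict; MCP tools are assumed to show up under prefixed names such as `mcp__codegraph__codegraph_explore`.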
colbymchenry merged commit 37cf566 into main on May 20, 2026
colbymchenry added a commit that referenced this pull request May 20, 2026
Folds all changes since 0.7.10 into 0.7.12 (0.7.11 was unpublished from
npm): size-adaptive codegraph_explore output budget (#185/#187), line
numbers in explore source sections (#188), explore-first tool guidance
(#191), language-neutral source-omission markers, and Kotlin/Swift
test-file detection (#191).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>