feat(mcp): steer agents to explore-first; fix Kotlin/Swift test detection by colbymchenry · Pull Request #191 · colbymchenry/codegraph

colbymchenry · 2026-05-20T00:34:32Z

Came out of refreshing the README benchmark, which surfaced that Claude Code's Explore agent wasn't using codegraph_explore at all.

Diagnosis

Same VS Code question ("how does the extension host communicate with the main process?"), two agents:

Agent	`explore` calls	Tool uses	Tokens
Explore subagent	0	37	~90k
general-purpose	1	13	~55k

The Explore agent used codegraph_search + Read + grep: it treated codegraph as a search index and never reached explore, so it landed at roughly the token cost of running without CodeGraph. Root causes: (1) the "search first to find names, then explore" guidance short-circuited, because agents searched, got file:line locations, and Read them instead of feeding the names to explore; (2) explore was framed as a heavy last resort for "unfamiliar surveys."

Changes

Guidance reframe (4 spots in sync: server-instructions.ts, instructions-template.ts, .cursor/rules/codegraph.mdc, + the explore/search tool descriptions): codegraph_explore is the workhorse for understanding/"how does X work"/architecture questions. Seed it with the key symbol names (a quick codegraph_search/codegraph_context first only if the question names nothing concrete), read its output, fill gaps with node/Read — don't search-then-Read each file.
isTestFile fix: now recognizes Kotlin (*Test.kt, plus the jvmTest/, commonTest/, and androidTest/ source sets), Swift (*Tests.swift), and other camelCase test conventions, so tests get deprioritized in explore/context ranking. Previously only Java/JS/Python were known.
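The reframed ordering can be sketched as a tiny decision function. This is illustrative only: the actual guidance is prose in the instruction files, and `plannedSteps` and its boolean parameter are hypothetical names that exist nowhere in the PR.

```typescript
// Illustrative only: the step labels reuse tool names from the guidance above;
// the function and its boolean parameter are hypothetical.
type Step = "codegraph_search" | "codegraph_explore" | "node/Read for gaps";

function plannedSteps(questionNamesConcreteSymbol: boolean): Step[] {
  const steps: Step[] = [];
  // Search first only when the question names nothing concrete to seed explore with.
  if (!questionNamesConcreteSymbol) {
    steps.push("codegraph_search");
  }
  // explore does the bulk of the work; node/Read only fill what its output leaves open.
  steps.push("codegraph_explore", "node/Read for gaps");
  return steps;
}
```

The point of the sketch is that a search result never leads straight to Read: any names it finds are fed to explore first.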

OkHttp cold "trace a request"	Before	After
Real files	1/9	`Call.kt`, `OkHttpClient.kt`, `Request.kt`, `Response.kt`
Test files	8/9	3/9

Capital-led matching keeps latest.kt / manifest.kt / RealCall.kt unflagged.
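A minimal sketch of what capital-led detection could look like, assuming the rules are exactly the ones listed above. It is not the actual `isTestFile` implementation: the name `isTestFileSketch`, the regular expression, and the example paths in the usage note are assumptions made for illustration.

```typescript
// Hypothetical sketch; not the codegraph implementation.
// Source-set directories named in the PR description as holding tests.
const TEST_SOURCE_SETS = new Set(["jvmTest", "commonTest", "androidTest"]);

// Capital-led suffix: the basename must end in "Test.kt" or "Tests.swift"
// with an upper-case T, so "latest.kt" and "manifest.kt" do not match.
const TEST_BASENAME = /(?:Test\.kt|Tests\.swift)$/;

function isTestFileSketch(path: string): boolean {
  const segments = path.split("/");
  const basename = segments[segments.length - 1] ?? "";
  // Any parent directory that is a known test source set flags the file.
  const inTestSourceSet = segments
    .slice(0, -1)
    .some((segment) => TEST_SOURCE_SETS.has(segment));
  return inTestSourceSet || TEST_BASENAME.test(basename);
}
```

With these rules, a made-up path such as `src/jvmTest/kotlin/CallTest.kt` is flagged, while `latest.kt`, `manifest.kt`, and `RealCall.kt` are not, because none of them ends in a capitalized `Test` before the extension.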

Dropped

An IDF common-term down-weighting was prototyped for the cold-query case but dropped — measured a no-op (the supposedly-"common" terms weren't actually common in the test indexes: "process" 0.3%, "main" 0.8% in VS Code). The test-detection gap was the real cold-query noise source.
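To see why rare terms make IDF down-weighting a no-op, here is a small worked sketch. It assumes the textbook form idf = ln(1 / f), where f is the fraction of indexed items containing the term. The PR does not state which formula the prototype used or exactly what the percentages measure, so both are assumptions.

```typescript
// Assumption: textbook IDF, idf = ln(1 / f), with f = fraction of indexed
// items that contain the term. The prototype's real formula is not given.
function idf(fractionContainingTerm: number): number {
  if (fractionContainingTerm <= 0 || fractionContainingTerm > 1) {
    throw new RangeError("fraction must be in (0, 1]");
  }
  return Math.log(1 / fractionContainingTerm);
}

const processIdf = idf(0.003); // "process" at 0.3%: about 5.81
const mainIdf = idf(0.008);    // "main" at 0.8%: about 4.83
const halfIdf = idf(0.5);      // hypothetical term in half the index: about 0.69
```

Under this formula only a term present in a large share of the index (like the hypothetical 50% case) gets a low weight. At 0.3% and 0.8% both measured terms keep a high weight, so there is nothing for the down-weighting to suppress.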

Test plan

5 new tests in __tests__/is-test-file.test.ts (Kotlin/Swift/camelCase detection + false-positive guards)
Full suite 621/621
Cold-query measurement on VS Code / Flask / OkHttp before/after
Reviewer: the deeper cold-query gap (a plain NL question can't name the core symbol, so VS Code's rpcProtocol.ts is still missed) is intentionally left to the seed-then-explore guidance rather than a retrieval change — flag if you'd rather chase it in retrieval.

…tion

Two changes from diagnosing why Claude Code's Explore agent wasn't using codegraph_explore on a benchmark run (37 calls / ~90k tokens via search+Read+grep, vs a general-purpose agent that led with explore: 13 calls / ~55k tokens for the same question).

1. Tool guidance reframed across server-instructions.ts, instructions-template.ts, and .cursor/rules/codegraph.mdc (+ the explore/search tool descriptions): codegraph_explore is the workhorse for understanding/architecture/"how does X work" questions. Seed it with the key symbol names (a quick search/context first if the question names nothing concrete), read its output, and fill gaps with node/Read, instead of searching then Reading each file. The old "search first to find names, then explore" wording was short-circuiting: agents searched, got file:line locations, and Read them, never reaching explore.

2. isTestFile now recognizes Kotlin (*Test.kt, jvmTest/commonTest/androidTest source sets), Swift (*Tests.swift), and other camelCase test conventions, so test code is deprioritized in explore/context ranking. Previously only Java/JS/Python were known, letting tests dominate Kotlin/Swift exploration (OkHttp "trace a request" went from 8/9 test files to surfacing Call.kt/OkHttpClient.kt/Request.kt/Response.kt). Capital-led matching keeps latest.kt/manifest.kt unflagged.

An IDF common-term down-weighting was prototyped for the cold-query case but dropped: it was a measured no-op (the "common" terms weren't actually common in the test indexes); the test-detection gap was the real cause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…usage

Tooling to measure how a Claude Code agent actually uses the codegraph MCP tools on a real repo (does it lead with codegraph_explore, how many Read/Grep follow-ups, token cost), for validating tool-guidance changes (server-instructions, tool descriptions) against real agent behavior.

- itrun.sh drives the real interactive TUI via tmux (the faithful Explore path). Hardened for unattended runs: type-and-verify prompt delivery (the ❯ glyph is drawn ~6s before the input accepts keys), auto-accepts the "trust this folder" dialog, busy-detection keys on the universal "(Ns · …)" spinner so the pre-stream thinking phase counts as busy, and fails loudly instead of capturing an empty pane.
- parse-session.mjs reports the tool breakdown + token accounting (gen / fresh-in / cached-in / billable) from the session and subagent logs, consistent across main-thread and subagent runs; counts main-thread Bash in the grep verdict.
- run-agent.sh / parse-run.mjs are the headless stream-json complement (exact per-tool tokens/cost via claude -p).
- run-interactive-test.md documents how to run it and how completion is detected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
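The kind of question this tooling answers (did the agent lead with explore, and how many Read/Grep calls followed) can be sketched over a plain list of tool names. This is a rough illustration and not how parse-session.mjs works: the input shape, the `summarizeToolUses` name, and the definition of "led with explore" as "explore is the very first tool call" are all assumptions.

```typescript
// Hypothetical sketch; the real parser reads session and subagent logs,
// whose format is not shown in this PR.
interface SessionSummary {
  counts: Record<string, number>; // tool name -> number of uses
  ledWithExplore: boolean;        // assumed: explore is the first tool call
  readGrepFollowUps: number;      // Read/Grep calls after the first explore
}

function summarizeToolUses(toolNames: string[]): SessionSummary {
  const counts: Record<string, number> = {};
  for (const name of toolNames) {
    counts[name] = (counts[name] ?? 0) + 1;
  }
  const firstExplore = toolNames.indexOf("codegraph_explore");
  // If explore never ran, there are no follow-ups to count.
  const afterExplore = firstExplore === -1 ? [] : toolNames.slice(firstExplore + 1);
  const readGrepFollowUps = afterExplore.filter(
    (name) => name === "Read" || name === "Grep",
  ).length;
  return { counts, ledWithExplore: firstExplore === 0, readGrepFollowUps };
}
```

A run like the Explore subagent's in the Diagnosis table would report `ledWithExplore: false` and no `codegraph_explore` entry in `counts`.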

Folds all changes since 0.7.10 into 0.7.12 (0.7.11 was unpublished from npm): size-adaptive codegraph_explore output budget (#185/#187), line numbers in explore source sections (#188), explore-first tool guidance (#191), language-neutral source-omission markers, and Kotlin/Swift test-file detection (#191).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

colbymchenry and others added 2 commits May 19, 2026 19:34

colbymchenry merged commit 37cf566 into main May 20, 2026

colbymchenry merged 2 commits into main from feat/explore-first-guidance


