From 440ef7af14b35e4ed927ac5ac07211db8a1f527a Mon Sep 17 00:00:00 2001
From: PatrickSys
Date: Mon, 13 Apr 2026 09:54:01 +0200
Subject: [PATCH 1/6] chore(release): prepare v2.1.0

---
 CHANGELOG.md                      |  19 +
 docs/benchmark.md                 |  38 +-
 package.json                      |   2 +-
 results/comparator-evidence.json  | 752 +++++++++++++++++++++++++++++-
 results/gate-evaluation.json      | 131 +++---
 src/tools/search-codebase.ts      | 234 ++++++----
 tests/search-compact-mode.test.ts |  86 ++++
 7 files changed, 1072 insertions(+), 190 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ecc2cbd..0be5b07 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,25 @@
 
 ## Unreleased
 
+## [2.1.0](https://github.com/PatrickSys/codebase-context/compare/v2.0.0...v2.1.0) (2026-04-13)
+
+### Features
+
+- **search:** surface chunk intelligence directly in `search_codebase` results, including symbol identity, scope, signature preview, and compact/full response budgeting
+- **map:** upgrade the conventions map with structural skeleton sections and add `map --export` so the compact map can be written to `CODEBASE_MAP.md`
+
+### Bug Fixes
+
+- **metadata:** require real dependency evidence plus multiple framework indicators before labeling a repo as Next.js or another specialized framework
+- **reranker:** auto-heal corrupted cross-encoder cache entries and surface degraded reranker state in `searchQuality.rerankerStatus`
+- **benchmarks:** harden comparator lanes for cross-platform execution and keep setup failures explicit instead of silently turning them into claims
+
+### Documentation
+
+- publish the v2.1.0 discovery benchmark rerun with the current gate output: `pending_evidence`, `claimAllowed: false`, `24` frozen tasks, `0.75` average usefulness, and `1822.25` average estimated tokens
+- document the current comparator truth instead of stale assumptions: the public proof still has no real comparator lane data on this host, so benchmark win claims remain blocked
+- note the new `searchQuality.tokenEstimate` advisory contract: estimates are based on the pre-advisory response payload and warnings only appear above the 4K-token threshold
+
 ### Features
 
 - **mcp:** rework multi-project routing so one MCP server can serve multiple projects instead of one hardcoded server entry per repo
diff --git a/docs/benchmark.md b/docs/benchmark.md
index 1ea836e..93a9408 100644
--- a/docs/benchmark.md
+++ b/docs/benchmark.md
@@ -1,6 +1,6 @@
 # Discovery Benchmark
 
-This page documents the current public proof slice for `v2.0.0`.
+This page documents the current public proof slice for `v2.1.0`.
 It is a discovery benchmark, not an implementation-quality benchmark.
 
 ## Scope
@@ -37,28 +37,30 @@ From `results/gate-evaluation.json`:
 - `claimAllowed`: `false`
 - `totalTasks`: `24`
 - `averageUsefulness`: `0.75`
-- `averageEstimatedTokens`: `903.7083333333334`
+- `averagePayloadBytes`: `7287.625`
+- `averageEstimatedTokens`: `1822.25`
+- `averageFirstRelevantHit`: `null`
 - `bestExampleUsefulnessRate`: `0.125`
 
 Repo-level outputs from the same rerun:
 
-| Repo | Tasks | Avg usefulness | Avg estimated tokens | Best-example usefulness |
-| --- | ---: | ---: | ---: | ---: |
-| `angular-spotify` | 12 | 0.8333 | 1080.6667 | 0.25 |
-| `excalidraw` | 12 | 0.6667 | 726.75 | 0 |
+| Repo | Tasks | Avg usefulness | Avg payload bytes | Avg estimated tokens | Best-example usefulness |
+| --- | ---: | ---: | ---: | ---: | ---: |
+| `angular-spotify` | 12 | 0.8333 | 8553 | 2138 | 0.25 |
+| `excalidraw` | 12 | 0.6667 | 6023 | 1506 | 0 |
 
 ## Gate Truth
 
 The gate is intentionally still blocked.
 
-- The combined suite now covers both public repos.
+- The combined suite covers both frozen public repos.
 - The release claim is still disallowed because comparator evidence remains incomplete.
 - Missing evidence currently includes:
   - raw Claude Code baseline metrics
-  - GrepAI metrics
-  - jCodeMunch metrics
-  - codebase-memory-mcp metrics
-  - CodeGraphContext metrics
+  - GrepAI comparator metrics
+  - jCodeMunch comparator metrics
+  - codebase-memory-mcp comparator metrics
+  - CodeGraphContext comparator metrics
 
 ## Comparator Reality
 
@@ -66,20 +68,20 @@ The current comparator artifact records setup failures, not benchmark wins.
 
 | Comparator | Status | Current reason |
 | --- | --- | --- |
-| `codebase-memory-mcp` | `setup_failed` | Installer path still points to the external shell installer |
-| `jCodeMunch` | `setup_failed` | MCP server closes during startup |
-| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
-| `CodeGraphContext` | `setup_failed` | MCP server closes during startup |
-| `raw Claude Code` | `setup_failed` | Local `claude` CLI baseline is not installed/authenticated in this environment |
+| `codebase-memory-mcp` | `ok` | The lane now executes on this host, but the captured outputs are near-empty (`19` bytes / `5` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
+| `jCodeMunch` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`) |
+| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path are not present |
+| `CodeGraphContext` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`); database prerequisite remains unresolved |
+| `raw Claude Code` | `ok` | The baseline now runs, but the captured outputs remain non-useful (`66.08` bytes / `17.17` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
 
-`CodeGraphContext` is explicitly part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.
+`CodeGraphContext` remains part of the comparison frame. It is not removed from the public story just because the lane still fails to start.
 
 ## Important Limitations
 
 - This benchmark measures discovery usefulness and payload cost only.
 - It does not measure implementation correctness, patch quality, or end-to-end task completion.
 - Comparator setup is still environment-sensitive, so the gate remains `pending_evidence`.
-- The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness.
+- Current search payload costs are higher than the older v2.0.0 proof slice because the v2.1.0 surface now includes richer map structure and `searchQuality.tokenEstimate` advisories.
 - `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.
 
 ## What This Proof Can Support
diff --git a/package.json b/package.json
index 604877b..9c89799 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "codebase-context",
-  "version": "1.9.0",
+  "version": "2.1.0",
   "description": "Pre-maps your codebase architecture, conventions, and team memory so AI agents navigate with precision instead of exploring. Local-first MCP server with AST-backed hybrid search.",
   "type": "module",
   "main": "./dist/lib.js",
diff --git a/results/comparator-evidence.json b/results/comparator-evidence.json
index 70efa3c..bcb8c16 100644
--- a/results/comparator-evidence.json
+++ b/results/comparator-evidence.json
@@ -1,7 +1,379 @@
 {
   "codebase-memory-mcp": {
-    "status": "setup_failed",
-    "reason": "codebase-memory-mcp install failed.
Run: curl -fsSL https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | sh" + "averageUsefulness": 0, + "averagePayloadBytes": 19, + "averageEstimatedTokens": 5, + "averageFirstRelevantHit": null, + "bestExampleUsefulnessRate": null, + "averageToolCallCount": 1, + "averageElapsedMs": 0.375, + "status": "ok", + "taskResults": [ + { + "taskId": "as-map-01", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "libraries actually used", + "patterns", + "generated:" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 2 + }, + { + "taskId": "as-map-02", + "job": "map", + "surface": "get_codebase_metadata", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "framework", + "architecture", + "statistics" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "as-map-03", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "state", + "patterns", + "libraries actually used" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 1 + }, + { + "taskId": "as-map-04", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "import aliases", + "tsconfig" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "as-find-01", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "dependencyInjection" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "as-find-02", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "stateManagement" + ], + "payloadBytes": 19, + 
"estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 1 + }, + { + "taskId": "as-find-03", + "job": "find", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "preflight", + "bestExample", + "patterns" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "as-find-04", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "unitTestFramework", + "test" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "as-search-01", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 1 + }, + { + "taskId": "as-search-02", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "as-search-03", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "as-search-04", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 1 + }, + { + "taskId": "ex-map-01", + "job": "map", + "surface": "get_codebase_metadata", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "framework", + "architecture", + "statistics" + ], + "payloadBytes": 19, + "estimatedTokens": 
5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-map-02", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "codebase intelligence", + "libraries actually used", + "patterns" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 1 + }, + { + "taskId": "ex-map-03", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "import aliases", + "tsconfig" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-map-04", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "patterns", + "libraries actually used", + "generated:" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-find-01", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "stateManagement" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-find-02", + "job": "find", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "preflight", + "bestExample", + "patterns" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 1 + }, + { + "taskId": "ex-find-03", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "test", + "framework" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-find-04", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "dependencyInjection" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + 
"toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-search-01", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-search-02", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + }, + { + "taskId": "ex-search-03", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 1 + }, + { + "taskId": "ex-search-04", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": 1, + "elapsedMs": 0 + } + ] }, "jCodeMunch": { "status": "setup_failed", @@ -16,7 +388,379 @@ "reason": "MCP error -32000: Connection closed" }, "raw Claude Code": { - "status": "setup_failed", - "reason": "raw Claude Code baseline requires the Claude Code CLI (claude) to be installed and authenticated. This is the manual-log-capture baseline — record as pending_evidence if claude CLI is unavailable." 
+ "averageUsefulness": 0, + "averagePayloadBytes": 66.08333333333333, + "averageEstimatedTokens": 17.166666666666668, + "averageFirstRelevantHit": null, + "bestExampleUsefulnessRate": null, + "averageToolCallCount": null, + "averageElapsedMs": 8944.833333333334, + "status": "ok", + "taskResults": [ + { + "taskId": "as-map-01", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "libraries actually used", + "patterns", + "generated:" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 9148 + }, + { + "taskId": "as-map-02", + "job": "map", + "surface": "get_codebase_metadata", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "framework", + "architecture", + "statistics" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 9291 + }, + { + "taskId": "as-map-03", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "state", + "patterns", + "libraries actually used" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 9344 + }, + { + "taskId": "as-map-04", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "import aliases", + "tsconfig" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 8200 + }, + { + "taskId": "as-find-01", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "dependencyInjection" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 8438 + }, + { + "taskId": "as-find-02", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "stateManagement" + ], + "payloadBytes": 65, + "estimatedTokens": 
17, + "toolCallCount": null, + "elapsedMs": 8169 + }, + { + "taskId": "as-find-03", + "job": "find", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "preflight", + "bestExample", + "patterns" + ], + "payloadBytes": 70, + "estimatedTokens": 18, + "toolCallCount": null, + "elapsedMs": 7484 + }, + { + "taskId": "as-find-04", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "unitTestFramework", + "test" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 8266 + }, + { + "taskId": "as-search-01", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 8696 + }, + { + "taskId": "as-search-02", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 8139 + }, + { + "taskId": "as-search-03", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 95, + "estimatedTokens": 24, + "toolCallCount": null, + "elapsedMs": 15486 + }, + { + "taskId": "as-search-04", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 9048 + }, + { + "taskId": "ex-map-01", + "job": "map", + "surface": "get_codebase_metadata", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "framework", + "architecture", + "statistics" + ], + 
"payloadBytes": 75, + "estimatedTokens": 19, + "toolCallCount": null, + "elapsedMs": 8162 + }, + { + "taskId": "ex-map-02", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "codebase intelligence", + "libraries actually used", + "patterns" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 9241 + }, + { + "taskId": "ex-map-03", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "import aliases", + "tsconfig" + ], + "payloadBytes": 19, + "estimatedTokens": 5, + "toolCallCount": null, + "elapsedMs": 8360 + }, + { + "taskId": "ex-map-04", + "job": "map", + "surface": "codebase://context", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "patterns", + "libraries actually used", + "generated:" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 7935 + }, + { + "taskId": "ex-find-01", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "stateManagement" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 9621 + }, + { + "taskId": "ex-find-02", + "job": "find", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "preflight", + "bestExample", + "patterns" + ], + "payloadBytes": 75, + "estimatedTokens": 19, + "toolCallCount": null, + "elapsedMs": 8801 + }, + { + "taskId": "ex-find-03", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "test", + "framework" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 7509 + }, + { + "taskId": "ex-find-04", + "job": "find", + "surface": "get_team_patterns", + "usefulnessScore": 0, + "matchedSignals": [], + 
"missingSignals": [ + "dependencyInjection" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 7824 + }, + { + "taskId": "ex-search-01", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 8208 + }, + { + "taskId": "ex-search-02", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 77, + "estimatedTokens": 20, + "toolCallCount": null, + "elapsedMs": 9034 + }, + { + "taskId": "ex-search-03", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 65, + "estimatedTokens": 17, + "toolCallCount": null, + "elapsedMs": 10112 + }, + { + "taskId": "ex-search-04", + "job": "search", + "surface": "search_codebase", + "usefulnessScore": 0, + "matchedSignals": [], + "missingSignals": [ + "results", + "searchQuality" + ], + "payloadBytes": 70, + "estimatedTokens": 18, + "toolCallCount": null, + "elapsedMs": 10160 + } + ] } } \ No newline at end of file diff --git a/results/gate-evaluation.json b/results/gate-evaluation.json index d38f01d..22284f1 100644 --- a/results/gate-evaluation.json +++ b/results/gate-evaluation.json @@ -1,14 +1,14 @@ { "totalTasks": 24, "averageUsefulness": 0.75, - "averagePayloadBytes": 3613.6666666666665, - "averageEstimatedTokens": 903.7083333333334, + "averagePayloadBytes": 7287.625, + "averageEstimatedTokens": 1822.25, "searchTasks": 8, "findTasks": 8, "mapTasks": 8, "averageFirstRelevantHit": null, "bestExampleUsefulnessRate": 0.125, - "averageElapsedMs": 282.625, + "averageElapsedMs": 304.4166666666667, "averageToolCallCount": 1, "results": [ { @@ -25,9 +25,9 @@ "generated:" ], 
"forbiddenHits": [], - "payloadBytes": 8548, - "estimatedTokens": 2137, - "elapsedMs": 20, + "payloadBytes": 23720, + "estimatedTokens": 5930, + "elapsedMs": 39, "toolCallCount": 1 }, { @@ -45,7 +45,7 @@ "forbiddenHits": [], "payloadBytes": 5751, "estimatedTokens": 1438, - "elapsedMs": 26, + "elapsedMs": 28, "toolCallCount": 1 }, { @@ -62,9 +62,9 @@ "libraries actually used" ], "forbiddenHits": [], - "payloadBytes": 8548, - "estimatedTokens": 2137, - "elapsedMs": 6, + "payloadBytes": 23720, + "estimatedTokens": 5930, + "elapsedMs": 18, "toolCallCount": 1 }, { @@ -79,9 +79,9 @@ "tsconfig" ], "forbiddenHits": [], - "payloadBytes": 8548, - "estimatedTokens": 2137, - "elapsedMs": 4, + "payloadBytes": 23720, + "estimatedTokens": 5930, + "elapsedMs": 13, "toolCallCount": 1 }, { @@ -98,7 +98,7 @@ "payloadBytes": 1802, "estimatedTokens": 451, "bestExampleUseful": true, - "elapsedMs": 3, + "elapsedMs": 2, "toolCallCount": 1 }, { @@ -131,10 +131,10 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 4014, - "estimatedTokens": 1004, + "payloadBytes": 4960, + "estimatedTokens": 1240, "bestExampleUseful": false, - "elapsedMs": 1015, + "elapsedMs": 1106, "toolCallCount": 1 }, { @@ -167,9 +167,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 2843, - "estimatedTokens": 711, - "elapsedMs": 99, + "payloadBytes": 3695, + "estimatedTokens": 924, + "elapsedMs": 115, "toolCallCount": 1 }, { @@ -184,9 +184,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 3262, - "estimatedTokens": 816, - "elapsedMs": 158, + "payloadBytes": 4627, + "estimatedTokens": 1157, + "elapsedMs": 292, "toolCallCount": 1 }, { @@ -201,9 +201,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 3156, - "estimatedTokens": 789, - "elapsedMs": 286, + "payloadBytes": 3981, + "estimatedTokens": 996, + "elapsedMs": 292, "toolCallCount": 1 }, { @@ -218,9 +218,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 3136, - "estimatedTokens": 
784, - "elapsedMs": 169, + "payloadBytes": 4402, + "estimatedTokens": 1101, + "elapsedMs": 193, "toolCallCount": 1 }, { @@ -239,7 +239,7 @@ "forbiddenHits": [], "payloadBytes": 4268, "estimatedTokens": 1067, - "elapsedMs": 66, + "elapsedMs": 67, "toolCallCount": 1 }, { @@ -256,9 +256,9 @@ "libraries actually used" ], "forbiddenHits": [], - "payloadBytes": 4711, - "estimatedTokens": 1178, - "elapsedMs": 10, + "payloadBytes": 15329, + "estimatedTokens": 3833, + "elapsedMs": 54, "toolCallCount": 1 }, { @@ -273,9 +273,9 @@ "tsconfig" ], "forbiddenHits": [], - "payloadBytes": 4711, - "estimatedTokens": 1178, - "elapsedMs": 14, + "payloadBytes": 15329, + "estimatedTokens": 3833, + "elapsedMs": 48, "toolCallCount": 1 }, { @@ -292,9 +292,9 @@ "generated:" ], "forbiddenHits": [], - "payloadBytes": 4711, - "estimatedTokens": 1178, - "elapsedMs": 8, + "payloadBytes": 15329, + "estimatedTokens": 3833, + "elapsedMs": 53, "toolCallCount": 1 }, { @@ -311,7 +311,7 @@ "payloadBytes": 298, "estimatedTokens": 75, "bestExampleUseful": false, - "elapsedMs": 4, + "elapsedMs": 3, "toolCallCount": 1 }, { @@ -328,10 +328,10 @@ "bestExample" ], "forbiddenHits": [], - "payloadBytes": 3593, - "estimatedTokens": 899, + "payloadBytes": 4570, + "estimatedTokens": 1143, "bestExampleUseful": false, - "elapsedMs": 884, + "elapsedMs": 921, "toolCallCount": 1 }, { @@ -349,7 +349,7 @@ "payloadBytes": 1615, "estimatedTokens": 404, "bestExampleUseful": false, - "elapsedMs": 3, + "elapsedMs": 4, "toolCallCount": 1 }, { @@ -381,9 +381,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 2861, - "estimatedTokens": 716, - "elapsedMs": 934, + "payloadBytes": 4033, + "estimatedTokens": 1009, + "elapsedMs": 903, "toolCallCount": 1 }, { @@ -398,9 +398,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 2459, - "estimatedTokens": 615, - "elapsedMs": 1254, + "payloadBytes": 3440, + "estimatedTokens": 860, + "elapsedMs": 1224, "toolCallCount": 1 }, { @@ -415,9 +415,9 @@ ], 
"missingSignals": [], "forbiddenHits": [], - "payloadBytes": 2968, - "estimatedTokens": 742, - "elapsedMs": 1116, + "payloadBytes": 4391, + "estimatedTokens": 1098, + "elapsedMs": 1208, "toolCallCount": 1 }, { @@ -432,9 +432,9 @@ ], "missingSignals": [], "forbiddenHits": [], - "payloadBytes": 2609, - "estimatedTokens": 653, - "elapsedMs": 697, + "payloadBytes": 3607, + "estimatedTokens": 902, + "elapsedMs": 716, "toolCallCount": 1 } ], @@ -446,25 +446,25 @@ "status": "pending_evidence", "payloadMetric": "averageEstimatedTokens", "payloadMetricPassed": false, - "beatenUsefulnessMetrics": [], + "beatenUsefulnessMetrics": [ + "averageUsefulness" + ], "missingMetrics": [ - "averageEstimatedTokens", - "averageUsefulness", "averageFirstRelevantHit", "bestExampleUsefulnessRate" ], "comparisons": [ { "metric": "averageEstimatedTokens", - "comparatorValue": null, - "actualValue": 903.7083333333334, + "comparatorValue": 17.166666666666668, + "actualValue": 1822.25, "passes": false }, { "metric": "averageUsefulness", - "comparatorValue": null, + "comparatorValue": 0, "actualValue": 0.75, - "passes": false + "passes": true }, { "metric": "averageFirstRelevantHit", @@ -546,16 +546,15 @@ "status": "pending_evidence", "tolerancePercent": 15, "missingMetrics": [ - "averageUsefulness", "averageFirstRelevantHit", "bestExampleUsefulnessRate" ], "comparisons": [ { "metric": "averageUsefulness", - "comparatorValue": null, + "comparatorValue": 0, "actualValue": 0.75, - "passes": false + "passes": true }, { "metric": "averageFirstRelevantHit", diff --git a/src/tools/search-codebase.ts b/src/tools/search-codebase.ts index 7f4efaf..d3991e9 100644 --- a/src/tools/search-codebase.ts +++ b/src/tools/search-codebase.ts @@ -1045,6 +1045,48 @@ export async function handle( ...(rerankerStatus === 'unavailable' && { rerankerStatus: 'unavailable' }) }; + type SearchResponsePayload = { + status: 'success'; + searchQuality: typeof searchQualityBlock & { + tokenEstimate?: number; + warning?: string; + 
+ }; + budget: { mode: 'compact' | 'full'; resultCount: number }; + preflight?: typeof preflightPayload; + patternSummary?: string; + bestExample?: string; + nextHops?: Array<{ tool: string; why: string; args?: Record<string, unknown> }>; + results: Array<Record<string, unknown>>; + totalResults?: number; + relatedMemories?: string[]; + }; + + function renderSearchPayloadText(payload: SearchResponsePayload): string { + const baseRenderedPayload = JSON.stringify(payload, null, 2); + const transportPayload = + process.platform === 'win32' + ? baseRenderedPayload.replace(/\n/g, '\r\n') + : baseRenderedPayload; + const tokenEstimate = Math.ceil(transportPayload.length / 4); + const warning = + tokenEstimate > 4000 + ? `Large search payload: estimated ${tokenEstimate} tokens. Prefer compact mode or tighter filters before pasting into an agent.` + : undefined; + + return JSON.stringify( + { + ...payload, + searchQuality: { + ...searchQualityBlock, + ...(warning && { warning }), + tokenEstimate + } + }, + null, + 2 + ); + } + // Compact mode (default): bounded response with light graph context const isCompact = mode !== 'full'; @@ -1054,122 +1096,112 @@ export async function handle( const patternSummary = buildPatternSummary(); const bestExample = getBestExample(compactResults); const nextHops = buildNextHops(compactResults, searchQuality); + const payloadText = renderSearchPayloadText({ + status: 'success', + searchQuality: searchQualityBlock, + budget: { mode: 'compact', resultCount: compactResults.length }, + ...(preflightPayload && { preflight: preflightPayload }), + ...(patternSummary && { patternSummary }), + ...(bestExample && { bestExample }), + ...(nextHops.length > 0 && { nextHops }), + results: compactResults.map((r) => { + const importedByCount = getImportedByCount(r); + const topExports = getTopExports(r.filePath); + const scope = buildScopeHeader(r.metadata); + // First 3 lines of chunk content as a lightweight signature preview + const signaturePreview = r.snippet + ?
r.snippet.replace(/^\r?\n+/, '').split('\n').slice(0, 3).join('\n').trim() || undefined + : undefined; + return { + file: `${r.filePath}:${r.startLine}-${r.endLine}`, + summary: r.summary, + score: Math.round(r.score * 100) / 100, + ...(r.relevanceReason && { relevanceReason: r.relevanceReason }), + ...(r.componentType && + r.layer && + r.layer !== 'unknown' && { type: `${r.componentType}:${r.layer}` }), + ...(r.trend && r.trend !== 'Stable' && { trend: r.trend }), + ...(r.patternWarning && { patternWarning: r.patternWarning }), + importedByCount, + ...(topExports.length > 0 && { topExports }), + ...(r.layer && r.layer !== 'unknown' && { layer: r.layer }), + // Structural metadata: surface AST intelligence already computed at index time + ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), + ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), + ...(scope && { scope }), + ...(signaturePreview && { signaturePreview }) + }; + }), + ...(strongMemories.length > 0 && { + relatedMemories: strongMemories.map((m) => `${m.memory} (${m.effectiveConfidence})`) + }) + }); return { content: [ { type: 'text', - text: JSON.stringify( - { - status: 'success', - searchQuality: searchQualityBlock, - budget: { mode: 'compact', resultCount: compactResults.length }, - ...(preflightPayload && { preflight: preflightPayload }), - ...(patternSummary && { patternSummary }), - ...(bestExample && { bestExample }), - ...(nextHops.length > 0 && { nextHops }), - results: compactResults.map((r) => { - const importedByCount = getImportedByCount(r); - const topExports = getTopExports(r.filePath); - const scope = buildScopeHeader(r.metadata); - // First 3 lines of chunk content as a lightweight signature preview - const signaturePreview = r.snippet - ? 
r.snippet - .replace(/^\r?\n+/, '') - .split('\n') - .slice(0, 3) - .join('\n') - .trim() || undefined - : undefined; - return { - file: `${r.filePath}:${r.startLine}-${r.endLine}`, - summary: r.summary, - score: Math.round(r.score * 100) / 100, - ...(r.relevanceReason && { relevanceReason: r.relevanceReason }), - ...(r.componentType && - r.layer && - r.layer !== 'unknown' && { type: `${r.componentType}:${r.layer}` }), - ...(r.trend && r.trend !== 'Stable' && { trend: r.trend }), - ...(r.patternWarning && { patternWarning: r.patternWarning }), - importedByCount, - ...(topExports.length > 0 && { topExports }), - ...(r.layer && r.layer !== 'unknown' && { layer: r.layer }), - // Structural metadata: surface AST intelligence already computed at index time - ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), - ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), - ...(scope && { scope }), - ...(signaturePreview && { signaturePreview }) - }; - }), - ...(strongMemories.length > 0 && { - relatedMemories: strongMemories.map((m) => `${m.memory} (${m.effectiveConfidence})`) - }) - }, - null, - 2 - ) + text: payloadText } ] }; } // Full mode: today's response shape + budget + relevanceReason; consumers removed + const payloadText = renderSearchPayloadText({ + status: 'success', + searchQuality: searchQualityBlock, + budget: { mode: 'full', resultCount: results.length }, + ...(preflightPayload && { preflight: preflightPayload }), + results: results.map((r) => { + const relationshipsAndHints = buildRelationshipHints(r); + const enrichedSnippet = includeSnippets + ? 
enrichSnippetWithScope(r.snippet, r.metadata, r.filePath, r.startLine) + : undefined; + const scope = buildScopeHeader(r.metadata); + // Chunk-level imports/exports (top 5 each) + complexity + const chunkImports = (r as unknown as { imports?: string[] }).imports?.slice(0, 5); + const chunkExports = (r as unknown as { exports?: string[] }).exports?.slice(0, 5); + + return { + file: `${r.filePath}:${r.startLine}-${r.endLine}`, + summary: r.summary, + score: Math.round(r.score * 100) / 100, + ...(r.relevanceReason && { relevanceReason: r.relevanceReason }), + ...(r.componentType && + r.layer && + r.layer !== 'unknown' && { type: `${r.componentType}:${r.layer}` }), + ...(r.trend && r.trend !== 'Stable' && { trend: r.trend }), + ...(r.patternWarning && { patternWarning: r.patternWarning }), + ...(relationshipsAndHints.relationships && { + relationships: relationshipsAndHints.relationships + }), + ...(relationshipsAndHints.hints && { hints: relationshipsAndHints.hints }), + ...(enrichedSnippet && { snippet: enrichedSnippet }), + // Structural metadata + ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), + ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), + ...(scope && { scope }), + ...(chunkImports && chunkImports.length > 0 && { imports: chunkImports }), + ...(chunkExports && chunkExports.length > 0 && { exports: chunkExports }), + ...(r.metadata?.cyclomaticComplexity && { + complexity: r.metadata.cyclomaticComplexity + }) + }; + }), + totalResults: results.length, + ...(relatedMemories.length > 0 && { + relatedMemories: relatedMemories + .slice(0, 3) + .map((m) => `${m.memory} (${m.effectiveConfidence})`) + }) + }); + return { content: [ { type: 'text', - text: JSON.stringify( - { - status: 'success', - searchQuality: searchQualityBlock, - budget: { mode: 'full', resultCount: results.length }, - ...(preflightPayload && { preflight: preflightPayload }), - results: results.map((r) => { - const relationshipsAndHints = 
buildRelationshipHints(r); - const enrichedSnippet = includeSnippets - ? enrichSnippetWithScope(r.snippet, r.metadata, r.filePath, r.startLine) - : undefined; - const scope = buildScopeHeader(r.metadata); - // Chunk-level imports/exports (top 5 each) + complexity - const chunkImports = (r as unknown as { imports?: string[] }).imports?.slice(0, 5); - const chunkExports = (r as unknown as { exports?: string[] }).exports?.slice(0, 5); - - return { - file: `${r.filePath}:${r.startLine}-${r.endLine}`, - summary: r.summary, - score: Math.round(r.score * 100) / 100, - ...(r.relevanceReason && { relevanceReason: r.relevanceReason }), - ...(r.componentType && - r.layer && - r.layer !== 'unknown' && { type: `${r.componentType}:${r.layer}` }), - ...(r.trend && r.trend !== 'Stable' && { trend: r.trend }), - ...(r.patternWarning && { patternWarning: r.patternWarning }), - ...(relationshipsAndHints.relationships && { - relationships: relationshipsAndHints.relationships - }), - ...(relationshipsAndHints.hints && { hints: relationshipsAndHints.hints }), - ...(enrichedSnippet && { snippet: enrichedSnippet }), - // Structural metadata - ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), - ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), - ...(scope && { scope }), - ...(chunkImports && chunkImports.length > 0 && { imports: chunkImports }), - ...(chunkExports && chunkExports.length > 0 && { exports: chunkExports }), - ...(r.metadata?.cyclomaticComplexity && { - complexity: r.metadata.cyclomaticComplexity - }) - }; - }), - totalResults: results.length, - ...(relatedMemories.length > 0 && { - relatedMemories: relatedMemories - .slice(0, 3) - .map((m) => `${m.memory} (${m.effectiveConfidence})`) - }) - }, - null, - 2 - ) + text: payloadText } ] }; diff --git a/tests/search-compact-mode.test.ts b/tests/search-compact-mode.test.ts index f551d1a..25679d3 100644 --- a/tests/search-compact-mode.test.ts +++ b/tests/search-compact-mode.test.ts @@ -326,6 
+326,43 @@ describe('search_codebase compact/full mode', () => { expect(payload.nextHops?.length ?? 0).toBeGreaterThan(0); }); + it('adds an exact tokenEstimate advisory to compact responses', async () => { + searchMocks.search.mockResolvedValueOnce([makeResult()]); + + const { server } = await import('../src/index.js'); + const handler = ( + server as { + _requestHandlers?: Map< + string, + (r: unknown) => Promise<{ content: Array<{ type: string; text: string }> }> + >; + } + )._requestHandlers?.get('tools/call'); + if (!handler) throw new Error('Expected tools/call handler'); + + const response = await handler({ + jsonrpc: '2.0', + id: 1, + method: 'tools/call', + params: { name: 'search_codebase', arguments: { query: 'auth service' } } + }); + + const payload = JSON.parse(response.content[0].text) as { + searchQuality: { + status: string; + confidence: string; + tokenEstimate: number; + warning?: string; + hint?: string; + rerankerStatus?: string; + }; + [key: string]: unknown; + }; + + expect(payload.searchQuality.tokenEstimate).toBeGreaterThan(0); + expect(payload.searchQuality.warning).toBeUndefined(); + }); + // Test 5: Full mode returns hints arrays and all memories + budget it('full mode returns hints object with callers/tests and budget metadata', async () => { searchMocks.search.mockResolvedValueOnce([makeResult()]); @@ -362,6 +399,55 @@ describe('search_codebase compact/full mode', () => { expect(Array.isArray(hints.callers)).toBe(true); }); + it('adds a warning only when the final full payload exceeds the compact budget threshold', async () => { + const oversizedSummary = 'Token-heavy summary '.repeat(1200); + const oversizedSnippet = 'const token = authService.getToken();\n'.repeat(600); + searchMocks.search.mockResolvedValueOnce([ + makeResult({ + summary: oversizedSummary, + snippet: oversizedSnippet + }) + ]); + + const { server } = await import('../src/index.js'); + const handler = ( + server as { + _requestHandlers?: Map< + string, + (r: unknown) 
=> Promise<{ content: Array<{ type: string; text: string }> }> + >; + } + )._requestHandlers?.get('tools/call'); + if (!handler) throw new Error('Expected tools/call handler'); + + const response = await handler({ + jsonrpc: '2.0', + id: 1, + method: 'tools/call', + params: { + name: 'search_codebase', + arguments: { query: 'auth service', mode: 'full', includeSnippets: true } + } + }); + + const payload = JSON.parse(response.content[0].text) as { + searchQuality: { + status: string; + confidence: string; + tokenEstimate: number; + warning?: string; + hint?: string; + rerankerStatus?: string; + }; + [key: string]: unknown; + }; + + expect(payload.searchQuality.tokenEstimate).toBeGreaterThan(4000); + expect(payload.searchQuality.warning).toContain( + `estimated ${payload.searchQuality.tokenEstimate} tokens` + ); + }); + // Test 6: relevanceReason appears in results in both modes it('relevanceReason is included in results for both compact and full modes', async () => { searchMocks.search.mockResolvedValueOnce([ From 2df53997dfd43bfa306c99d562a6608afedb5411 Mon Sep 17 00:00:00 2001 From: PatrickSys Date: Mon, 13 Apr 2026 19:17:26 +0200 Subject: [PATCH 2/6] fix: finalize v2.1.0 token budget advisory --- CHANGELOG.md | 4 +- results/comparator-evidence.json | 126 +++++++++++++++--------------- results/gate-evaluation.json | 38 ++++----- src/index.ts | 49 +++++++++++- src/tools/search-codebase.ts | 65 +++++++++------ tests/search-compact-mode.test.ts | 3 +- 6 files changed, 175 insertions(+), 110 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 0be5b07..248f1bb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,8 +18,8 @@ ### Documentation - publish the v2.1.0 discovery benchmark rerun with the current gate output: `pending_evidence`, `claimAllowed: false`, `24` frozen tasks, `0.75` average usefulness, and `1822.25` average estimated tokens -- document the current comparator truth instead of stale assumptions: the public proof still has no real comparator lane 
data on this host, so benchmark win claims remain blocked -- note the new `searchQuality.tokenEstimate` advisory contract: estimates are based on the pre-advisory response payload and warnings only appear above the 4K-token threshold +- document the current comparator truth instead of stale assumptions: the public proof still has setup failures plus near-empty comparator outputs on this host, so benchmark win claims remain blocked +- note the new `searchQuality.tokenEstimate` advisory contract: estimates are based on the final serialized response payload and warnings only appear above the 4K-token threshold ### Features diff --git a/results/comparator-evidence.json b/results/comparator-evidence.json index bcb8c16..ac47bfe 100644 --- a/results/comparator-evidence.json +++ b/results/comparator-evidence.json @@ -6,7 +6,7 @@ "averageFirstRelevantHit": null, "bestExampleUsefulnessRate": null, "averageToolCallCount": 1, - "averageElapsedMs": 0.375, + "averageElapsedMs": 0.3333333333333333, "status": "ok", "taskResults": [ { @@ -39,7 +39,7 @@ "payloadBytes": 19, "estimatedTokens": 5, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 1 }, { "taskId": "as-map-03", @@ -55,7 +55,7 @@ "payloadBytes": 19, "estimatedTokens": 5, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 0 }, { "taskId": "as-map-04", @@ -129,7 +129,7 @@ "payloadBytes": 19, "estimatedTokens": 5, "toolCallCount": 1, - "elapsedMs": 0 + "elapsedMs": 1 }, { "taskId": "as-search-01", @@ -144,7 +144,7 @@ "payloadBytes": 19, "estimatedTokens": 5, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 0 }, { "taskId": "as-search-02", @@ -189,7 +189,7 @@ "payloadBytes": 19, "estimatedTokens": 5, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 0 }, { "taskId": "ex-map-01", @@ -282,7 +282,7 @@ "payloadBytes": 19, "estimatedTokens": 5, "toolCallCount": 1, - "elapsedMs": 1 + "elapsedMs": 0 }, { "taskId": "ex-find-03", @@ -297,7 +297,7 @@ "payloadBytes": 19, "estimatedTokens": 5, "toolCallCount": 1, - "elapsedMs": 0 
+ "elapsedMs": 1 }, { "taskId": "ex-find-04", @@ -389,12 +389,12 @@ }, "raw Claude Code": { "averageUsefulness": 0, - "averagePayloadBytes": 66.08333333333333, - "averageEstimatedTokens": 17.166666666666668, + "averagePayloadBytes": 71.54166666666667, + "averageEstimatedTokens": 18.5, "averageFirstRelevantHit": null, "bestExampleUsefulnessRate": null, "averageToolCallCount": null, - "averageElapsedMs": 8944.833333333334, + "averageElapsedMs": 9590.208333333334, "status": "ok", "taskResults": [ { @@ -411,7 +411,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 9148 + "elapsedMs": 12461 }, { "taskId": "as-map-02", @@ -427,7 +427,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 9291 + "elapsedMs": 9390 }, { "taskId": "as-map-03", @@ -443,7 +443,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 9344 + "elapsedMs": 9836 }, { "taskId": "as-map-04", @@ -458,7 +458,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 8200 + "elapsedMs": 10098 }, { "taskId": "as-find-01", @@ -469,10 +469,10 @@ "missingSignals": [ "dependencyInjection" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 70, + "estimatedTokens": 18, "toolCallCount": null, - "elapsedMs": 8438 + "elapsedMs": 8937 }, { "taskId": "as-find-02", @@ -483,10 +483,10 @@ "missingSignals": [ "stateManagement" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 75, + "estimatedTokens": 19, "toolCallCount": null, - "elapsedMs": 8169 + "elapsedMs": 8747 }, { "taskId": "as-find-03", @@ -499,10 +499,10 @@ "bestExample", "patterns" ], - "payloadBytes": 70, - "estimatedTokens": 18, + "payloadBytes": 65, + "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 7484 + "elapsedMs": 8747 }, { "taskId": "as-find-04", @@ -517,7 +517,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 8266 + "elapsedMs": 9351 }, { "taskId": 
"as-search-01", @@ -529,10 +529,10 @@ "results", "searchQuality" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 73, + "estimatedTokens": 19, "toolCallCount": null, - "elapsedMs": 8696 + "elapsedMs": 9376 }, { "taskId": "as-search-02", @@ -544,10 +544,10 @@ "results", "searchQuality" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 70, + "estimatedTokens": 18, "toolCallCount": null, - "elapsedMs": 8139 + "elapsedMs": 9891 }, { "taskId": "as-search-03", @@ -559,10 +559,10 @@ "results", "searchQuality" ], - "payloadBytes": 95, - "estimatedTokens": 24, + "payloadBytes": 65, + "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 15486 + "elapsedMs": 11377 }, { "taskId": "as-search-04", @@ -577,7 +577,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 9048 + "elapsedMs": 8972 }, { "taskId": "ex-map-01", @@ -590,10 +590,10 @@ "architecture", "statistics" ], - "payloadBytes": 75, - "estimatedTokens": 19, + "payloadBytes": 65, + "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 8162 + "elapsedMs": 10195 }, { "taskId": "ex-map-02", @@ -609,7 +609,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 9241 + "elapsedMs": 8753 }, { "taskId": "ex-map-03", @@ -621,10 +621,10 @@ "import aliases", "tsconfig" ], - "payloadBytes": 19, - "estimatedTokens": 5, + "payloadBytes": 71, + "estimatedTokens": 18, "toolCallCount": null, - "elapsedMs": 8360 + "elapsedMs": 8860 }, { "taskId": "ex-map-04", @@ -637,10 +637,10 @@ "libraries actually used", "generated:" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 75, + "estimatedTokens": 19, "toolCallCount": null, - "elapsedMs": 7935 + "elapsedMs": 8623 }, { "taskId": "ex-find-01", @@ -651,10 +651,10 @@ "missingSignals": [ "stateManagement" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 150, + "estimatedTokens": 38, "toolCallCount": null, - "elapsedMs": 9621 + "elapsedMs": 12098 }, 
{ "taskId": "ex-find-02", @@ -667,10 +667,10 @@ "bestExample", "patterns" ], - "payloadBytes": 75, - "estimatedTokens": 19, + "payloadBytes": 65, + "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 8801 + "elapsedMs": 8783 }, { "taskId": "ex-find-03", @@ -685,7 +685,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 7509 + "elapsedMs": 8785 }, { "taskId": "ex-find-04", @@ -696,10 +696,10 @@ "missingSignals": [ "dependencyInjection" ], - "payloadBytes": 65, - "estimatedTokens": 17, + "payloadBytes": 83, + "estimatedTokens": 21, "toolCallCount": null, - "elapsedMs": 7824 + "elapsedMs": 8912 }, { "taskId": "ex-search-01", @@ -714,7 +714,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 8208 + "elapsedMs": 8043 }, { "taskId": "ex-search-02", @@ -726,10 +726,10 @@ "results", "searchQuality" ], - "payloadBytes": 77, - "estimatedTokens": 20, + "payloadBytes": 75, + "estimatedTokens": 19, "toolCallCount": null, - "elapsedMs": 9034 + "elapsedMs": 8755 }, { "taskId": "ex-search-03", @@ -744,7 +744,7 @@ "payloadBytes": 65, "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 10112 + "elapsedMs": 12373 }, { "taskId": "ex-search-04", @@ -756,10 +756,10 @@ "results", "searchQuality" ], - "payloadBytes": 70, - "estimatedTokens": 18, + "payloadBytes": 65, + "estimatedTokens": 17, "toolCallCount": null, - "elapsedMs": 10160 + "elapsedMs": 8802 } ] } diff --git a/results/gate-evaluation.json b/results/gate-evaluation.json index 22284f1..d3e5788 100644 --- a/results/gate-evaluation.json +++ b/results/gate-evaluation.json @@ -8,7 +8,7 @@ "mapTasks": 8, "averageFirstRelevantHit": null, "bestExampleUsefulnessRate": 0.125, - "averageElapsedMs": 304.4166666666667, + "averageElapsedMs": 546.75, "averageToolCallCount": 1, "results": [ { @@ -27,7 +27,7 @@ "forbiddenHits": [], "payloadBytes": 23720, "estimatedTokens": 5930, - "elapsedMs": 39, + "elapsedMs": 74, "toolCallCount": 1 }, { @@ -45,7 +45,7 @@ 
"forbiddenHits": [], "payloadBytes": 5751, "estimatedTokens": 1438, - "elapsedMs": 28, + "elapsedMs": 29, "toolCallCount": 1 }, { @@ -98,7 +98,7 @@ "payloadBytes": 1802, "estimatedTokens": 451, "bestExampleUseful": true, - "elapsedMs": 2, + "elapsedMs": 4, "toolCallCount": 1 }, { @@ -134,7 +134,7 @@ "payloadBytes": 4960, "estimatedTokens": 1240, "bestExampleUseful": false, - "elapsedMs": 1106, + "elapsedMs": 6310, "toolCallCount": 1 }, { @@ -169,7 +169,7 @@ "forbiddenHits": [], "payloadBytes": 3695, "estimatedTokens": 924, - "elapsedMs": 115, + "elapsedMs": 130, "toolCallCount": 1 }, { @@ -186,7 +186,7 @@ "forbiddenHits": [], "payloadBytes": 4627, "estimatedTokens": 1157, - "elapsedMs": 292, + "elapsedMs": 378, "toolCallCount": 1 }, { @@ -203,7 +203,7 @@ "forbiddenHits": [], "payloadBytes": 3981, "estimatedTokens": 996, - "elapsedMs": 292, + "elapsedMs": 303, "toolCallCount": 1 }, { @@ -220,7 +220,7 @@ "forbiddenHits": [], "payloadBytes": 4402, "estimatedTokens": 1101, - "elapsedMs": 193, + "elapsedMs": 187, "toolCallCount": 1 }, { @@ -239,7 +239,7 @@ "forbiddenHits": [], "payloadBytes": 4268, "estimatedTokens": 1067, - "elapsedMs": 67, + "elapsedMs": 148, "toolCallCount": 1 }, { @@ -258,7 +258,7 @@ "forbiddenHits": [], "payloadBytes": 15329, "estimatedTokens": 3833, - "elapsedMs": 54, + "elapsedMs": 63, "toolCallCount": 1 }, { @@ -275,7 +275,7 @@ "forbiddenHits": [], "payloadBytes": 15329, "estimatedTokens": 3833, - "elapsedMs": 48, + "elapsedMs": 52, "toolCallCount": 1 }, { @@ -294,7 +294,7 @@ "forbiddenHits": [], "payloadBytes": 15329, "estimatedTokens": 3833, - "elapsedMs": 53, + "elapsedMs": 48, "toolCallCount": 1 }, { @@ -331,7 +331,7 @@ "payloadBytes": 4570, "estimatedTokens": 1143, "bestExampleUseful": false, - "elapsedMs": 921, + "elapsedMs": 1018, "toolCallCount": 1 }, { @@ -383,7 +383,7 @@ "forbiddenHits": [], "payloadBytes": 4033, "estimatedTokens": 1009, - "elapsedMs": 903, + "elapsedMs": 920, "toolCallCount": 1 }, { @@ -400,7 +400,7 @@ 
"forbiddenHits": [], "payloadBytes": 3440, "estimatedTokens": 860, - "elapsedMs": 1224, + "elapsedMs": 1369, "toolCallCount": 1 }, { @@ -417,7 +417,7 @@ "forbiddenHits": [], "payloadBytes": 4391, "estimatedTokens": 1098, - "elapsedMs": 1208, + "elapsedMs": 1269, "toolCallCount": 1 }, { @@ -434,7 +434,7 @@ "forbiddenHits": [], "payloadBytes": 3607, "estimatedTokens": 902, - "elapsedMs": 716, + "elapsedMs": 775, "toolCallCount": 1 } ], @@ -456,7 +456,7 @@ "comparisons": [ { "metric": "averageEstimatedTokens", - "comparatorValue": 17.166666666666668, + "comparatorValue": 18.5, "actualValue": 1822.25, "passes": false }, diff --git a/src/index.ts b/src/index.ts index 8bc2d38..24b3987 100644 --- a/src/index.ts +++ b/src/index.ts @@ -119,6 +119,48 @@ type ProjectResolution = | { ok: true; project: ProjectState } | { ok: false; response: ToolResponse }; +function isPlainRecord(value: unknown): value is Record<string, unknown> { + return typeof value === 'object' && value !== null && !Array.isArray(value); +} + +function finalizeJsonTextPayload(payload: Record<string, unknown>): string { + if (!isPlainRecord(payload.searchQuality)) { + return JSON.stringify(payload); + } + + let tokenEstimate = + typeof payload.searchQuality.tokenEstimate === 'number' ? payload.searchQuality.tokenEstimate : 0; + let warning = + typeof payload.searchQuality.warning === 'string' ? payload.searchQuality.warning : undefined; + let renderedPayload = ''; + + for (let attempt = 0; attempt < 5; attempt += 1) { + renderedPayload = JSON.stringify({ + ...payload, + searchQuality: { + ...payload.searchQuality, + ...(warning ? { warning } : {}), + tokenEstimate + } + }); + + const nextTokenEstimate = Math.ceil(renderedPayload.length / 4); + const nextWarning = + nextTokenEstimate > 4000 + ? `Large search payload: estimated ${nextTokenEstimate} tokens.
Prefer compact mode or tighter filters before pasting into an agent.` + : undefined; + + if (nextTokenEstimate === tokenEstimate && nextWarning === warning) { + return renderedPayload; + } + + tokenEstimate = nextTokenEstimate; + warning = nextWarning; + } + + return renderedPayload; +} + function registerKnownRoot(rootPath: string): string { const resolvedRootPath = path.resolve(rootPath); knownRoots.set(normalizeRootKey(resolvedRootPath), { rootPath: resolvedRootPath }); @@ -941,7 +983,7 @@ export function registerHandlers(target: Server): void { const parsed = JSON.parse(result.content[0].text); result.content[0] = { type: 'text', - text: JSON.stringify({ + text: finalizeJsonTextPayload({ ...parsed, index: indexSignal, project: buildProjectDescriptor(project.rootPath) @@ -955,7 +997,10 @@ export function registerHandlers(target: Server): void { const parsed = JSON.parse(result.content[0].text); result.content[0] = { type: 'text', - text: JSON.stringify({ ...parsed, project: buildProjectDescriptor(project.rootPath) }) + text: finalizeJsonTextPayload({ + ...parsed, + project: buildProjectDescriptor(project.rootPath) + }) }; } catch { /* response wasn't JSON, skip injection */ diff --git a/src/tools/search-codebase.ts b/src/tools/search-codebase.ts index d3991e9..95e755a 100644 --- a/src/tools/search-codebase.ts +++ b/src/tools/search-codebase.ts @@ -1062,29 +1062,43 @@ export async function handle( }; function renderSearchPayloadText(payload: SearchResponsePayload): string { - const baseRenderedPayload = JSON.stringify(payload, null, 2); - const transportPayload = - process.platform === 'win32' - ? baseRenderedPayload.replace(/\n/g, '\r\n') - : baseRenderedPayload; - const tokenEstimate = Math.ceil(transportPayload.length / 4); - const warning = - tokenEstimate > 4000 - ? `Large search payload: estimated ${tokenEstimate} tokens. 
Prefer compact mode or tighter filters before pasting into an agent.` - : undefined; + let tokenEstimate = 0; + let warning: string | undefined; + let renderedPayload = ''; - return JSON.stringify( - { - ...payload, - searchQuality: { - ...searchQualityBlock, - ...(warning && { warning }), - tokenEstimate - } - }, - null, - 2 - ); + for (let attempt = 0; attempt < 5; attempt += 1) { + renderedPayload = JSON.stringify( + { + ...payload, + searchQuality: { + ...searchQualityBlock, + ...(warning && { warning }), + tokenEstimate + } + }, + null, + 2 + ); + + const estimatedTransportPayload = + process.platform === 'win32' + ? renderedPayload.replace(/\n/g, '\r\n') + : renderedPayload; + const nextTokenEstimate = Math.ceil(estimatedTransportPayload.length / 4); + const nextWarning = + nextTokenEstimate > 4000 + ? `Large search payload: estimated ${nextTokenEstimate} tokens. Prefer compact mode or tighter filters before pasting into an agent.` + : undefined; + + if (nextTokenEstimate === tokenEstimate && nextWarning === warning) { + return renderedPayload; + } + + tokenEstimate = nextTokenEstimate; + warning = nextWarning; + } + + return renderedPayload; } // Compact mode (default): bounded response with light graph context @@ -1110,7 +1124,12 @@ export async function handle( const scope = buildScopeHeader(r.metadata); // First 3 lines of chunk content as a lightweight signature preview const signaturePreview = r.snippet - ? r.snippet.replace(/^\r?\n+/, '').split('\n').slice(0, 3).join('\n').trim() || undefined + ? 
r.snippet + .replace(/^\r?\n+/, '') + .split('\n') + .slice(0, 3) + .join('\n') + .trim() || undefined : undefined; return { file: `${r.filePath}:${r.startLine}-${r.endLine}`, diff --git a/tests/search-compact-mode.test.ts b/tests/search-compact-mode.test.ts index 25679d3..d833129 100644 --- a/tests/search-compact-mode.test.ts +++ b/tests/search-compact-mode.test.ts @@ -359,7 +359,7 @@ describe('search_codebase compact/full mode', () => { [key: string]: unknown; }; - expect(payload.searchQuality.tokenEstimate).toBeGreaterThan(0); + expect(payload.searchQuality.tokenEstimate).toBe(Math.ceil(response.content[0].text.length / 4)); expect(payload.searchQuality.warning).toBeUndefined(); }); @@ -442,6 +442,7 @@ describe('search_codebase compact/full mode', () => { [key: string]: unknown; }; + expect(payload.searchQuality.tokenEstimate).toBe(Math.ceil(response.content[0].text.length / 4)); expect(payload.searchQuality.tokenEstimate).toBeGreaterThan(4000); expect(payload.searchQuality.warning).toContain( `estimated ${payload.searchQuality.tokenEstimate} tokens` From 88aa0bdc5ef189632d08d5e0823963981697f373 Mon Sep 17 00:00:00 2001 From: PatrickSys Date: Mon, 13 Apr 2026 19:18:09 +0200 Subject: [PATCH 3/6] style: format token budget payload handling --- src/index.ts | 4 +++- src/tools/search-codebase.ts | 4 +--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/index.ts b/src/index.ts index 24b3987..8429e79 100644 --- a/src/index.ts +++ b/src/index.ts @@ -129,7 +129,9 @@ function finalizeJsonTextPayload(payload: Record<string, unknown>): string { } let tokenEstimate = - typeof payload.searchQuality.tokenEstimate === 'number' ? payload.searchQuality.tokenEstimate : 0; + typeof payload.searchQuality.tokenEstimate === 'number' + ? payload.searchQuality.tokenEstimate + : 0; let warning = typeof payload.searchQuality.warning === 'string' ?
payload.searchQuality.warning : undefined; let renderedPayload = ''; diff --git a/src/tools/search-codebase.ts b/src/tools/search-codebase.ts index 95e755a..241e167 100644 --- a/src/tools/search-codebase.ts +++ b/src/tools/search-codebase.ts @@ -1081,9 +1081,7 @@ export async function handle( ); const estimatedTransportPayload = - process.platform === 'win32' - ? renderedPayload.replace(/\n/g, '\r\n') - : renderedPayload; + process.platform === 'win32' ? renderedPayload.replace(/\n/g, '\r\n') : renderedPayload; const nextTokenEstimate = Math.ceil(estimatedTransportPayload.length / 4); const nextWarning = nextTokenEstimate > 4000 From 396dd6687ae3ad33d1d6ca18ef6205ec836b5da4 Mon Sep 17 00:00:00 2001 From: PatrickSys Date: Mon, 13 Apr 2026 21:57:33 +0200 Subject: [PATCH 4/6] fix: resolve PR #98 review blockers --- CHANGELOG.md | 19 +++----- src/index.ts | 49 +++++---------------- src/tools/search-codebase.ts | 53 +++++------------------ src/tools/search-payload-budget.ts | 69 ++++++++++++++++++++++++++++++ tests/search-compact-mode.test.ts | 44 ++++++++++++++++++- 5 files changed, 138 insertions(+), 96 deletions(-) create mode 100644 src/tools/search-payload-budget.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 248f1bb..b48a7fd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,37 +2,28 @@ ## Unreleased -## [2.1.0](https://github.com/PatrickSys/codebase-context/compare/v2.0.0...v2.1.0) (2026-04-13) +## [2.1.0](https://github.com/PatrickSys/codebase-context/compare/v1.9.0...v2.1.0) (2026-04-13) ### Features - **search:** surface chunk intelligence directly in `search_codebase` results, including symbol identity, scope, signature preview, and compact/full response budgeting - **map:** upgrade the conventions map with structural skeleton sections and add `map --export` so the compact map can be written to `CODEBASE_MAP.md` +- **mcp:** rework multi-project routing so one MCP server can serve multiple projects instead of one hardcoded server entry per repo +- **mcp:** 
keep explicit `project` as the fallback when the client does not provide enough project context +- **mcp:** accept repo paths, subproject paths, and file paths as `project` selectors when routing is ambiguous ### Bug Fixes - **metadata:** require real dependency evidence plus multiple framework indicators before labeling a repo as Next.js or another specialized framework - **reranker:** auto-heal corrupted cross-encoder cache entries and surface degraded reranker state in `searchQuality.rerankerStatus` - **benchmarks:** harden comparator lanes for cross-platform execution and keep setup failures explicit instead of silently turning them into claims +- **search:** auto-heal on corrupted index now triggers a background rebuild instead of blocking the search response ### Documentation - publish the v2.1.0 discovery benchmark rerun with the current gate output: `pending_evidence`, `claimAllowed: false`, `24` frozen tasks, `0.75` average usefulness, and `1822.25` average estimated tokens - document the current comparator truth instead of stale assumptions: the public proof still has setup failures plus near-empty comparator outputs on this host, so benchmark win claims remain blocked - note the new `searchQuality.tokenEstimate` advisory contract: estimates are based on the final serialized response payload and warnings only appear above the 4K-token threshold - -### Features - -- **mcp:** rework multi-project routing so one MCP server can serve multiple projects instead of one hardcoded server entry per repo -- **mcp:** keep explicit `project` as the fallback when the client does not provide enough project context -- **mcp:** accept repo paths, subproject paths, and file paths as `project` selectors when routing is ambiguous - -### Bug Fixes - -- **search:** auto-heal on corrupted index now triggers a background rebuild instead of blocking the search response - -### Documentation - - simplify the setup story around a roots-first contract: roots-capable multi-project 
sessions, single-project fallback, and explicit `project` retries - clarify that issue #63 fixed the architecture and workspace-aware workflow, but issue #2 is still only partially solved when the client does not provide roots or active-project context - remove the repo-local `init` / marker-file story from the public setup guidance diff --git a/src/index.ts b/src/index.ts index 8429e79..a4d7c73 100644 --- a/src/index.ts +++ b/src/index.ts @@ -50,6 +50,7 @@ import { } from './utils/project-discovery.js'; import { readIndexMeta, validateIndexArtifacts } from './core/index-meta.js'; import { TOOLS, dispatchTool, type ToolContext, type ToolResponse } from './tools/index.js'; +import { finalizeSearchPayloadText } from './tools/search-payload-budget.js'; import type { ProjectDescriptor, ToolPaths } from './tools/types.js'; import { getOrCreateProject, @@ -119,48 +120,20 @@ type ProjectResolution = | { ok: true; project: ProjectState } | { ok: false; response: ToolResponse }; -function isPlainRecord(value: unknown): value is Record<string, unknown> { - return typeof value === 'object' && value !== null && !Array.isArray(value); -} - function finalizeJsonTextPayload(payload: Record<string, unknown>): string { - if (!isPlainRecord(payload.searchQuality)) { + const mode = + typeof payload.budget === 'object' && + payload.budget !== null && + 'mode' in payload.budget && + (payload.budget.mode === 'compact' || payload.budget.mode === 'full') + ? payload.budget.mode + : undefined; + + if (!mode) { return JSON.stringify(payload); } - let tokenEstimate = - typeof payload.searchQuality.tokenEstimate === 'number' - ? payload.searchQuality.tokenEstimate - : 0; - let warning = - typeof payload.searchQuality.warning === 'string' ? payload.searchQuality.warning : undefined; - let renderedPayload = ''; - - for (let attempt = 0; attempt < 5; attempt += 1) { - renderedPayload = JSON.stringify({ - ...payload, - searchQuality: { - ...payload.searchQuality, - ...(warning ?
{ warning } : {}), - tokenEstimate - } - }); - - const nextTokenEstimate = Math.ceil(renderedPayload.length / 4); - const nextWarning = - nextTokenEstimate > 4000 - ? `Large search payload: estimated ${nextTokenEstimate} tokens. Prefer compact mode or tighter filters before pasting into an agent.` - : undefined; - - if (nextTokenEstimate === tokenEstimate && nextWarning === warning) { - return renderedPayload; - } - - tokenEstimate = nextTokenEstimate; - warning = nextWarning; - } - - return renderedPayload; + return finalizeSearchPayloadText(payload, { mode }); } function registerKnownRoot(rootPath: string): string { diff --git a/src/tools/search-codebase.ts b/src/tools/search-codebase.ts index 241e167..af3bd50 100644 --- a/src/tools/search-codebase.ts +++ b/src/tools/search-codebase.ts @@ -26,6 +26,7 @@ import type { MemoryWithConfidence } from '../memory/store.js'; import { InternalFileGraph } from '../utils/usage-tracker.js'; import type { FileExport } from '../utils/usage-tracker.js'; import { RELATIONSHIPS_FILENAME } from '../constants/codebase-context.js'; +import { finalizeSearchPayloadText } from './search-payload-budget.js'; // Stop words for compact-mode memory relevance filter (mirrors QUERY_STOP_WORDS in search.ts) const COMPACT_STOP_WORDS = new Set([ @@ -1061,44 +1062,6 @@ export async function handle( relatedMemories?: string[]; }; - function renderSearchPayloadText(payload: SearchResponsePayload): string { - let tokenEstimate = 0; - let warning: string | undefined; - let renderedPayload = ''; - - for (let attempt = 0; attempt < 5; attempt += 1) { - renderedPayload = JSON.stringify( - { - ...payload, - searchQuality: { - ...searchQualityBlock, - ...(warning && { warning }), - tokenEstimate - } - }, - null, - 2 - ); - - const estimatedTransportPayload = - process.platform === 'win32' ? 
renderedPayload.replace(/\n/g, '\r\n') : renderedPayload; - const nextTokenEstimate = Math.ceil(estimatedTransportPayload.length / 4); - const nextWarning = - nextTokenEstimate > 4000 - ? `Large search payload: estimated ${nextTokenEstimate} tokens. Prefer compact mode or tighter filters before pasting into an agent.` - : undefined; - - if (nextTokenEstimate === tokenEstimate && nextWarning === warning) { - return renderedPayload; - } - - tokenEstimate = nextTokenEstimate; - warning = nextWarning; - } - - return renderedPayload; - } - // Compact mode (default): bounded response with light graph context const isCompact = mode !== 'full'; @@ -1108,7 +1071,8 @@ export async function handle( const patternSummary = buildPatternSummary(); const bestExample = getBestExample(compactResults); const nextHops = buildNextHops(compactResults, searchQuality); - const payloadText = renderSearchPayloadText({ + const payloadText = finalizeSearchPayloadText( + { status: 'success', searchQuality: searchQualityBlock, budget: { mode: 'compact', resultCount: compactResults.length }, @@ -1152,7 +1116,9 @@ export async function handle( ...(strongMemories.length > 0 && { relatedMemories: strongMemories.map((m) => `${m.memory} (${m.effectiveConfidence})`) }) - }); + }, + { mode: 'compact', pretty: true, transportAware: true } + ); return { content: [ @@ -1165,7 +1131,8 @@ export async function handle( } // Full mode: today's response shape + budget + relevanceReason; consumers removed - const payloadText = renderSearchPayloadText({ + const payloadText = finalizeSearchPayloadText( + { status: 'success', searchQuality: searchQualityBlock, budget: { mode: 'full', resultCount: results.length }, @@ -1212,7 +1179,9 @@ export async function handle( .slice(0, 3) .map((m) => `${m.memory} (${m.effectiveConfidence})`) }) - }); + }, + { mode: 'full', pretty: true, transportAware: true } + ); return { content: [ diff --git a/src/tools/search-payload-budget.ts b/src/tools/search-payload-budget.ts new 
file mode 100644 index 0000000..74ee28a --- /dev/null +++ b/src/tools/search-payload-budget.ts @@ -0,0 +1,69 @@ +type SearchPayloadMode = 'compact' | 'full'; + +function isPlainRecord(value: unknown): value is Record<string, unknown> { + return typeof value === 'object' && value !== null && !Array.isArray(value); +} + +function buildWarning(tokenEstimate: number, mode: SearchPayloadMode): string | undefined { + if (tokenEstimate <= 4000) { + return undefined; + } + + if (mode === 'compact') { + return `Large search payload: estimated ${tokenEstimate} tokens. Try tighter filters (e.g. layer=, language=) to reduce payload size.`; + } + + return `Large search payload: estimated ${tokenEstimate} tokens. Prefer compact mode or tighter filters before pasting into an agent.`; +} + +export function finalizeSearchPayloadText( + payload: Record<string, unknown>, + options: { + mode: SearchPayloadMode; + pretty?: boolean; + transportAware?: boolean; + } +): string { + if (!isPlainRecord(payload.searchQuality)) { + return JSON.stringify(payload, null, options.pretty ? 2 : undefined); + } + + let tokenEstimate = + typeof payload.searchQuality.tokenEstimate === 'number' + ? payload.searchQuality.tokenEstimate + : 0; + let warning = + typeof payload.searchQuality.warning === 'string' ? payload.searchQuality.warning : undefined; + let renderedPayload = ''; + + for (let attempt = 0; attempt < 5; attempt += 1) { + renderedPayload = JSON.stringify( + { + ...payload, + searchQuality: { + ...payload.searchQuality, + ...(warning ? { warning } : {}), + tokenEstimate + } + }, + null, + options.pretty ? 2 : undefined + ); + + const estimatedTransportPayload = + options.transportAware && process.platform === 'win32' + ?
renderedPayload.replace(/\n/g, '\r\n') + : renderedPayload; + const nextTokenEstimate = Math.ceil(estimatedTransportPayload.length / 4); + const nextWarning = buildWarning(nextTokenEstimate, options.mode); + + if (nextTokenEstimate === tokenEstimate && nextWarning === warning) { + return renderedPayload; + } + + tokenEstimate = nextTokenEstimate; + warning = nextWarning; + } + + return renderedPayload; +} diff --git a/tests/search-compact-mode.test.ts b/tests/search-compact-mode.test.ts index d833129..104fe42 100644 --- a/tests/search-compact-mode.test.ts +++ b/tests/search-compact-mode.test.ts @@ -363,6 +363,46 @@ describe('search_codebase compact/full mode', () => { expect(payload.searchQuality.warning).toBeUndefined(); }); + it('uses filter-only guidance when a final compact payload exceeds the token threshold', async () => { + const oversizedSummary = 'Token-heavy compact summary '.repeat(1200); + searchMocks.search.mockResolvedValueOnce([ + makeResult({ + summary: oversizedSummary + }) + ]); + + const { server } = await import('../src/index.js'); + const handler = ( + server as { + _requestHandlers?: Map< + string, + (r: unknown) => Promise<{ content: Array<{ type: string; text: string }> }> + >; + } + )._requestHandlers?.get('tools/call'); + if (!handler) throw new Error('Expected tools/call handler'); + + const response = await handler({ + jsonrpc: '2.0', + id: 1, + method: 'tools/call', + params: { name: 'search_codebase', arguments: { query: 'auth service' } } + }); + + const payload = JSON.parse(response.content[0].text) as { + searchQuality: { + tokenEstimate: number; + warning?: string; + }; + }; + + expect(payload.searchQuality.tokenEstimate).toBe(Math.ceil(response.content[0].text.length / 4)); + expect(payload.searchQuality.tokenEstimate).toBeGreaterThan(4000); + expect(payload.searchQuality.warning).toBe( + `Large search payload: estimated ${payload.searchQuality.tokenEstimate} tokens. Try tighter filters (e.g. 
layer=, language=) to reduce payload size.` + ); + }); + // Test 5: Full mode returns hints arrays and all memories + budget it('full mode returns hints object with callers/tests and budget metadata', async () => { searchMocks.search.mockResolvedValueOnce([makeResult()]); @@ -444,8 +484,8 @@ describe('search_codebase compact/full mode', () => { expect(payload.searchQuality.tokenEstimate).toBe(Math.ceil(response.content[0].text.length / 4)); expect(payload.searchQuality.tokenEstimate).toBeGreaterThan(4000); - expect(payload.searchQuality.warning).toContain( - `estimated ${payload.searchQuality.tokenEstimate} tokens` + expect(payload.searchQuality.warning).toBe( + `Large search payload: estimated ${payload.searchQuality.tokenEstimate} tokens. Prefer compact mode or tighter filters before pasting into an agent.` ); }); From 100d291c884a1c03918dd5296f83f24813caaffb Mon Sep 17 00:00:00 2001 From: PatrickSys Date: Mon, 13 Apr 2026 21:58:42 +0200 Subject: [PATCH 5/6] style: format search payload handler --- src/tools/search-codebase.ts | 172 +++++++++++++++++------------------ 1 file changed, 86 insertions(+), 86 deletions(-) diff --git a/src/tools/search-codebase.ts b/src/tools/search-codebase.ts index af3bd50..84e8711 100644 --- a/src/tools/search-codebase.ts +++ b/src/tools/search-codebase.ts @@ -1073,26 +1073,80 @@ export async function handle( const nextHops = buildNextHops(compactResults, searchQuality); const payloadText = finalizeSearchPayloadText( { + status: 'success', + searchQuality: searchQualityBlock, + budget: { mode: 'compact', resultCount: compactResults.length }, + ...(preflightPayload && { preflight: preflightPayload }), + ...(patternSummary && { patternSummary }), + ...(bestExample && { bestExample }), + ...(nextHops.length > 0 && { nextHops }), + results: compactResults.map((r) => { + const importedByCount = getImportedByCount(r); + const topExports = getTopExports(r.filePath); + const scope = buildScopeHeader(r.metadata); + // First 3 lines of chunk 
content as a lightweight signature preview + const signaturePreview = r.snippet + ? r.snippet + .replace(/^\r?\n+/, '') + .split('\n') + .slice(0, 3) + .join('\n') + .trim() || undefined + : undefined; + return { + file: `${r.filePath}:${r.startLine}-${r.endLine}`, + summary: r.summary, + score: Math.round(r.score * 100) / 100, + ...(r.relevanceReason && { relevanceReason: r.relevanceReason }), + ...(r.componentType && + r.layer && + r.layer !== 'unknown' && { type: `${r.componentType}:${r.layer}` }), + ...(r.trend && r.trend !== 'Stable' && { trend: r.trend }), + ...(r.patternWarning && { patternWarning: r.patternWarning }), + importedByCount, + ...(topExports.length > 0 && { topExports }), + ...(r.layer && r.layer !== 'unknown' && { layer: r.layer }), + // Structural metadata: surface AST intelligence already computed at index time + ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), + ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), + ...(scope && { scope }), + ...(signaturePreview && { signaturePreview }) + }; + }), + ...(strongMemories.length > 0 && { + relatedMemories: strongMemories.map((m) => `${m.memory} (${m.effectiveConfidence})`) + }) + }, + { mode: 'compact', pretty: true, transportAware: true } + ); + + return { + content: [ + { + type: 'text', + text: payloadText + } + ] + }; + } + + // Full mode: today's response shape + budget + relevanceReason; consumers removed + const payloadText = finalizeSearchPayloadText( + { status: 'success', searchQuality: searchQualityBlock, - budget: { mode: 'compact', resultCount: compactResults.length }, + budget: { mode: 'full', resultCount: results.length }, ...(preflightPayload && { preflight: preflightPayload }), - ...(patternSummary && { patternSummary }), - ...(bestExample && { bestExample }), - ...(nextHops.length > 0 && { nextHops }), - results: compactResults.map((r) => { - const importedByCount = getImportedByCount(r); - const topExports = getTopExports(r.filePath); - const 
scope = buildScopeHeader(r.metadata); - // First 3 lines of chunk content as a lightweight signature preview - const signaturePreview = r.snippet - ? r.snippet - .replace(/^\r?\n+/, '') - .split('\n') - .slice(0, 3) - .join('\n') - .trim() || undefined + results: results.map((r) => { + const relationshipsAndHints = buildRelationshipHints(r); + const enrichedSnippet = includeSnippets + ? enrichSnippetWithScope(r.snippet, r.metadata, r.filePath, r.startLine) : undefined; + const scope = buildScopeHeader(r.metadata); + // Chunk-level imports/exports (top 5 each) + complexity + const chunkImports = (r as unknown as { imports?: string[] }).imports?.slice(0, 5); + const chunkExports = (r as unknown as { exports?: string[] }).exports?.slice(0, 5); + return { file: `${r.filePath}:${r.startLine}-${r.endLine}`, summary: r.summary, @@ -1103,82 +1157,28 @@ export async function handle( r.layer !== 'unknown' && { type: `${r.componentType}:${r.layer}` }), ...(r.trend && r.trend !== 'Stable' && { trend: r.trend }), ...(r.patternWarning && { patternWarning: r.patternWarning }), - importedByCount, - ...(topExports.length > 0 && { topExports }), - ...(r.layer && r.layer !== 'unknown' && { layer: r.layer }), - // Structural metadata: surface AST intelligence already computed at index time + ...(relationshipsAndHints.relationships && { + relationships: relationshipsAndHints.relationships + }), + ...(relationshipsAndHints.hints && { hints: relationshipsAndHints.hints }), + ...(enrichedSnippet && { snippet: enrichedSnippet }), + // Structural metadata ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), ...(scope && { scope }), - ...(signaturePreview && { signaturePreview }) + ...(chunkImports && chunkImports.length > 0 && { imports: chunkImports }), + ...(chunkExports && chunkExports.length > 0 && { exports: chunkExports }), + ...(r.metadata?.cyclomaticComplexity && { + complexity: 
r.metadata.cyclomaticComplexity + }) }; }), - ...(strongMemories.length > 0 && { - relatedMemories: strongMemories.map((m) => `${m.memory} (${m.effectiveConfidence})`) + totalResults: results.length, + ...(relatedMemories.length > 0 && { + relatedMemories: relatedMemories + .slice(0, 3) + .map((m) => `${m.memory} (${m.effectiveConfidence})`) }) - }, - { mode: 'compact', pretty: true, transportAware: true } - ); - - return { - content: [ - { - type: 'text', - text: payloadText - } - ] - }; - } - - // Full mode: today's response shape + budget + relevanceReason; consumers removed - const payloadText = finalizeSearchPayloadText( - { - status: 'success', - searchQuality: searchQualityBlock, - budget: { mode: 'full', resultCount: results.length }, - ...(preflightPayload && { preflight: preflightPayload }), - results: results.map((r) => { - const relationshipsAndHints = buildRelationshipHints(r); - const enrichedSnippet = includeSnippets - ? enrichSnippetWithScope(r.snippet, r.metadata, r.filePath, r.startLine) - : undefined; - const scope = buildScopeHeader(r.metadata); - // Chunk-level imports/exports (top 5 each) + complexity - const chunkImports = (r as unknown as { imports?: string[] }).imports?.slice(0, 5); - const chunkExports = (r as unknown as { exports?: string[] }).exports?.slice(0, 5); - - return { - file: `${r.filePath}:${r.startLine}-${r.endLine}`, - summary: r.summary, - score: Math.round(r.score * 100) / 100, - ...(r.relevanceReason && { relevanceReason: r.relevanceReason }), - ...(r.componentType && - r.layer && - r.layer !== 'unknown' && { type: `${r.componentType}:${r.layer}` }), - ...(r.trend && r.trend !== 'Stable' && { trend: r.trend }), - ...(r.patternWarning && { patternWarning: r.patternWarning }), - ...(relationshipsAndHints.relationships && { - relationships: relationshipsAndHints.relationships - }), - ...(relationshipsAndHints.hints && { hints: relationshipsAndHints.hints }), - ...(enrichedSnippet && { snippet: enrichedSnippet }), - // 
Structural metadata - ...(r.metadata?.symbolName && { symbol: r.metadata.symbolName }), - ...(r.metadata?.symbolKind && { symbolKind: r.metadata.symbolKind }), - ...(scope && { scope }), - ...(chunkImports && chunkImports.length > 0 && { imports: chunkImports }), - ...(chunkExports && chunkExports.length > 0 && { exports: chunkExports }), - ...(r.metadata?.cyclomaticComplexity && { - complexity: r.metadata.cyclomaticComplexity - }) - }; - }), - totalResults: results.length, - ...(relatedMemories.length > 0 && { - relatedMemories: relatedMemories - .slice(0, 3) - .map((m) => `${m.memory} (${m.effectiveConfidence})`) - }) }, { mode: 'full', pretty: true, transportAware: true } ); From 840670c4a7935c87e08b20a35ccf8bd36461ad0b Mon Sep 17 00:00:00 2001 From: PatrickSys Date: Mon, 13 Apr 2026 22:24:24 +0200 Subject: [PATCH 6/6] chore: trim release metadata from PR #98 --- docs/benchmark.md | 38 ++++++++++++++++++-------------------- package.json | 2 +- 2 files changed, 19 insertions(+), 21 deletions(-) diff --git a/docs/benchmark.md b/docs/benchmark.md index 93a9408..1ea836e 100644 --- a/docs/benchmark.md +++ b/docs/benchmark.md @@ -1,6 +1,6 @@ # Discovery Benchmark -This page documents the current public proof slice for `v2.1.0`. +This page documents the current public proof slice for `v2.0.0`. It is a discovery benchmark, not an implementation-quality benchmark. 
## Scope @@ -37,30 +37,28 @@ From `results/gate-evaluation.json`: - `claimAllowed`: `false` - `totalTasks`: `24` - `averageUsefulness`: `0.75` -- `averagePayloadBytes`: `7287.625` -- `averageEstimatedTokens`: `1822.25` -- `averageFirstRelevantHit`: `null` +- `averageEstimatedTokens`: `903.7083333333334` - `bestExampleUsefulnessRate`: `0.125` Repo-level outputs from the same rerun: -| Repo | Tasks | Avg usefulness | Avg payload bytes | Avg estimated tokens | Best-example usefulness | -| --- | ---: | ---: | ---: | ---: | ---: | -| `angular-spotify` | 12 | 0.8333 | 8553 | 2138 | 0.25 | -| `excalidraw` | 12 | 0.6667 | 6023 | 1506 | 0 | +| Repo | Tasks | Avg usefulness | Avg estimated tokens | Best-example usefulness | +| --- | ---: | ---: | ---: | ---: | +| `angular-spotify` | 12 | 0.8333 | 1080.6667 | 0.25 | +| `excalidraw` | 12 | 0.6667 | 726.75 | 0 | ## Gate Truth The gate is intentionally still blocked. -- The combined suite covers both frozen public repos. +- The combined suite now covers both public repos. - The release claim is still disallowed because comparator evidence remains incomplete. - Missing evidence currently includes: - raw Claude Code baseline metrics - - GrepAI comparator metrics - - jCodeMunch comparator metrics - - codebase-memory-mcp comparator metrics - - CodeGraphContext comparator metrics + - GrepAI metrics + - jCodeMunch metrics + - codebase-memory-mcp metrics + - CodeGraphContext metrics ## Comparator Reality @@ -68,20 +66,20 @@ The current comparator artifact records setup failures, not benchmark wins. 
| Comparator | Status | Current reason | | --- | --- | --- | -| `codebase-memory-mcp` | `ok` | The lane now executes on this host, but the captured outputs are near-empty (`19` bytes / `5` tokens on average, `0` usefulness), so the gate still treats it as missing evidence | -| `jCodeMunch` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`) | -| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path are not present | -| `CodeGraphContext` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`); database prerequisite remains unresolved | -| `raw Claude Code` | `ok` | The baseline now runs, but the captured outputs remain non-useful (`66.08` bytes / `17.17` tokens on average, `0` usefulness), so the gate still treats it as missing evidence | +| `codebase-memory-mcp` | `setup_failed` | Installer path still points to the external shell installer | +| `jCodeMunch` | `setup_failed` | MCP server closes during startup | +| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present | +| `CodeGraphContext` | `setup_failed` | MCP server closes during startup | +| `raw Claude Code` | `setup_failed` | Local `claude` CLI baseline is not installed/authenticated in this environment | -`CodeGraphContext` remains part of the comparison frame. It is not removed from the public story just because the lane still fails to start. +`CodeGraphContext` is explicitly part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start. ## Important Limitations - This benchmark measures discovery usefulness and payload cost only. - It does not measure implementation correctness, patch quality, or end-to-end task completion. - Comparator setup is still environment-sensitive, so the gate remains `pending_evidence`. 
-- Current search payload costs are higher than the older v2.0.0 proof slice because the v2.1.0 surface now includes richer map structure and `searchQuality.tokenEstimate` advisories. +- The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness. - `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set. ## What This Proof Can Support diff --git a/package.json b/package.json index 9c89799..604877b 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "codebase-context", - "version": "2.1.0", + "version": "1.9.0", "description": "Pre-maps your codebase architecture, conventions, and team memory so AI agents navigate with precision instead of exploring. Local-first MCP server with AST-backed hybrid search.", "type": "module", "main": "./dist/lib.js",