
perf(llm): batch classifier + dead-code, bump concurrency 2->8 [WIP 1/5]#3

Merged
andreinknv merged 1 commit into wip-stress-base from pr1-perf-classifier-batching on May 1, 2026
Conversation

@andreinknv
Owner

Stress-test fixes — PR 1 of 5 (stacked).
Base: wip-stress-base (= df96c6c, the verified round-3 tip).
Stack: PR1 → PR2 → PR3 → PR4 → PR5. Each subsequent PR targets the previous one as base.

Commits

  • 9927549 perf: batch LLM classifier + dead-code, bump concurrency, N1 attribution hint

Summary

  • Concurrency 2→8 for non-bridge providers (openai-compat, anthropic-api). The hardcoded 2 was tuned for local proxies that serialise internally; cloud Haiku is dominated by network roundtrip and benefits from parallelism. claude-bridge stays at 4 (subprocess spawn cost).
  • Classifier batching — classifyAllRoles rewritten from one chat call per symbol to N-per-call (50 default) with bracket-aware JSON-array parsing. Same pattern applied to dead-code judging.
  • N1 attribution hint in formatImpact — when codegraph_impact resolves to >=2 sources, the merged per-file detail now points readers up to the per-source breakdown so they don't mistake the merged list for being source-attributed.

Test plan

  • npm test — full suite at 1113 / 13 skip / 0 fail across the entire 5-PR stack.

WIP — opened as draft for review.

…ion hint

Layered on top of the round-1+2+3+4 stress-test fixes. Four changes:

## N1 follow-up — attribution hint in formatImpact

When codegraph_impact resolves the symbol name to >=2 sources, the
existing per-source rollup (round-4 Gap 2) tells you which definition
contributes which file counts. The merged per-file *symbol detail* below
it still shows symbols cross-source — middleware/openai.go: Write,
EmbeddingsMiddleware, ... does not say which Encode definition each
symbol traces back to. Adds a one-line cross-reference under the merged
detail pointing readers up to the per-source breakdown so they do not
mistake the merged list for being source-attributed.

formatImpact now takes a multiSource: boolean flag, threaded through
from handleImpact based on perSource.length > 1.

## Concurrency bump — 2->8 for non-bridge providers

src/index.ts:1002 hardcoded chatConcurrency = 2 for openai-compat /
anthropic-api. The original comment claimed openai-compat servers
serialise internally so 2 is plenty — true for some local proxies but
not for cloud Anthropic Haiku, where per-call latency is dominated by
network roundtrip. Bumped to 8 (claude-bridge stays at 4 — subprocess
spawn cost is the bottleneck and this is what already worked).

DEFAULT_CONCURRENCY in classifier.ts, summarizer.ts, dead-code.ts
similarly bumped 2->8 in case anyone calls them directly without
overriding. dir-summarizer.ts left at 1 (large per-call output, serial
is fine).

## Classifier batching — 50 symbols per call, JSON list response

classifyAllRoles in src/llm/classifier.ts rewritten from one chat call
per symbol to N-per-call:

- New buildBatchPrompt sends the role list once + a numbered list of
  symbols, asks for a JSON array of {i, role} entries in the same order.
- New parseBatchResponse extracts the first balanced JSON array
  (string-aware bracket counting so brackets inside string literals do
  not fool the depth tracker), requires >=80% coverage, returns null
  on parse failure.
- Worker claims batchSize candidates at a time. On batch parse failure
  or LlmEndpointError, falls back to classifyOne (per-item) for that
  batch only — an unrelated bad symbol cannot lose the rest.
- buildSinglePrompt factored out for fallback; ROLE_LIST_TEXT shared
  between both prompt builders.
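The string-aware bracket counting can be sketched as below. This is an illustrative reconstruction, not the real parseBatchResponse (which additionally enforces the >=80% coverage check); the function name is invented:

```typescript
// Hypothetical sketch: extract the first balanced JSON array from LLM prose.
// Brackets inside string literals must not fool the depth tracker.
export function extractFirstJsonArray(text: string): unknown[] | null {
  const start = text.indexOf("[");
  if (start < 0) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false; // char after a backslash is literal
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true; // brackets inside strings are ignored from here
    } else if (ch === "[") {
      depth++;
    } else if (ch === "]") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1)) as unknown[];
        } catch {
          return null; // balanced but not valid JSON
        }
      }
    }
  }
  return null; // array never closed
}
```

A caller would treat a null return as batch parse failure and fall back to per-item classification.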

Net effect on a --force re-index of 722 symbols: ~48x fewer chat calls
(15 batches vs 722 calls), ~3x fewer input tokens (role list sent 15x
instead of 722x), and ~30x lower wall-clock latency at the new
concurrency.

## Dead-code batching — 8 candidates per call

Same pattern as classifier in src/llm/dead-code.ts, smaller batch (8)
because verdict output is bigger (~80 tokens per entry). New
parseBatchJudge mirrors parseBatchResponse string-aware bracket
counting (matters here because reason fields can contain brackets).
Synthetic uncertain entries are created for items missing from a parsed
batch (when the batch passes 80% coverage but a row is dropped) — they
also increment judged, keeping the judged === results.length invariant
that the test asserts.

useAskModel: true preserved — dead-code judge still goes to Sonnet,
not Haiku, since the verdict is high-stakes.

## Tests

- __tests__/llm-tiers.test.ts fake-server stub now detects batched
  prompts via the structural marker "Symbols (zero-indexed):"
  (anchored on structure, not on reply-instruction phrasing which
  could drift independently in either file). Counts numbered symbol
  lines and emits a same-sized JSON array of stubbed verdicts/roles.
- 2 new unit tests:
  - parseBatchResponse extracts roles, tolerates surrounding prose,
    rejects under-coverage — covers happy path, prose-wrapped JSON,
    title-case role coercion via parseRole, sparse response rejection
    (under 80% coverage), malformed JSON, and bracket-inside-string-
    literal preservation.
  - parseBatchJudge handles dead-code verdict arrays + recovery edges
    — covers happy path, confidence clamp to [0,1], unknown-verdict
    coercion to uncertain, bracket-inside-string-literal via inString
    tracker, sparse rejection.

Suite: 1081 / 13 skip / 0 fail (was 1079).

## Reviewer trail

Two passes. Pass 1: APPROVE + 3 info findings (parseBatchResponse
missing string-state tracking, dead-code judged undercount on
synthetic entries, fragile test stub anchored on reply phrase). All
addressed. Pass 2: APPROVE + 2 info findings (parseBatchResponse test
missing bracket-in-string case, doc-comment overclaiming what
string-aware tracking handles). Both addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv andreinknv merged commit 9927549 into wip-stress-base May 1, 2026
@andreinknv andreinknv deleted the pr1-perf-classifier-batching branch May 1, 2026 11:37
andreinknv added a commit that referenced this pull request May 1, 2026
… docs

Three reviewer-flagged improvements on 6d0e7a2 (the WASM-visibility
commit). All informational items from that review — no functional
regression, no scope drift, suite stays at 1117/13/0.

## #1 — backend is now per-DatabaseConnection, not a process global

`createDatabase` previously set a module-level `activeBackend` and
exposed it via `getActiveBackend()`. In the MCP cross-project path
(handleStatus called against `projectPath` opens a SECOND DB via
openSync) the global reflected whichever DB was opened most recently,
not necessarily the one whose stats were just rendered. Benign in
practice (every DB in a process resolves the same backend) but
structurally imprecise.

Refactor:
- `createDatabase(dbPath)` now returns `{db, backend}` instead of a
  bare `SqliteDatabase`. Caller stores both.
- `DatabaseConnection` carries a `private backend: SqliteBackend` and
  exposes `getBackend()`.
- `CodeGraph.getBackend()` delegates — that's the public surface.
- CLI `codegraph status` and MCP `handleStatus` both call
  `cg.getBackend()` instead of the global. The global is removed.

Two pre-existing tests (`migrations-015-016`, `migrations-022`) that
called `createDatabase` directly now destructure `{db: adapter}`.
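The shape of the refactor can be sketched as below; the types are stand-ins for the real sqlite-adapter.ts ones, and the backend probe is a placeholder:

```typescript
// Hypothetical sketch of the per-connection backend refactor.
type SqliteBackend = "native" | "wasm";
interface SqliteDatabase { close(): void; }

function resolveBackend(): SqliteBackend {
  return "native"; // real code probes for a loadable native binding first
}

export function createDatabase(dbPath: string): { db: SqliteDatabase; backend: SqliteBackend } {
  const backend = resolveBackend();
  const db: SqliteDatabase = { close() {} }; // real code opens dbPath here
  return { db, backend }; // caller stores both; no module-level global
}

export class DatabaseConnection {
  constructor(private db: SqliteDatabase, private backend: SqliteBackend) {}
  // Each connection reports its own backend, so a second DB opened for a
  // cross-project status call can no longer shadow the first one's answer.
  getBackend(): SqliteBackend { return this.backend; }
}
```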

## #2 — fix recipe deduplicated across the two code surfaces

The `xcode-select` / `npm rebuild` / `npm install --save` recipe
appeared inline in both `buildWasmFallbackBanner` (sqlite-adapter.ts)
and the MCP `handleStatus` formatter (mcp/tools.ts). New
`WASM_FALLBACK_FIX_RECIPE` constant in sqlite-adapter.ts is the
single source for the one-line summary; the MCP formatter
interpolates it. The banner formats the same content multi-line for
the stderr surface. README is intentionally separate (different
audience, different rendering).

## #3 — README troubleshooting now covers Linux

Section title renamed "Indexing is slow on macOS / WASM fallback" ->
"Indexing is slow / WASM fallback active". New code block lists fix
steps for macOS, Debian/Ubuntu, RHEL/Fedora, and the cross-platform
`npm install --save` escape hatch. The banner stderr block also
gained the Linux equivalent for symmetry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 1, 2026
… server-config flags

Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed:
  #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale
  #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema
  #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe)
  #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters
  #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string
  #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*`
  #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)`
  colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains
       `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed`

CLI/MCP alignment (5/32 → 33+/35):
  - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar,
    biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs,
    module-summary, role, coverage-query, pending-summaries, save-summaries, review-context
  - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync,
    codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit,
    codegraph_unlock, codegraph_affected

MCP server-level operator config (`codegraph serve --mcp`):
  - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing)
  - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model /
    --llm-api-key (operator LLM config; per-project config wins on conflict)
  - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders

Architectural cleanups:
  - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces
    growing string-comparison chain in execute())
  - `withAllowStale` registry injection only on tools that DO see the gate
  - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts`
  - `LlmClient.isEmbeddingReachable` for split-provider correctness
  - SyncResult `lockContention` flag → handleSync emits distinct retryable message
  - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force)
  - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs)

Migrations:
  023-files-is-test.ts — add `files.is_test`
  024-build-context-refs.ts — add `build_context_refs` table

Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes:
  - JSONC URL strip via state machine (was eating `https://` tails)
  - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent)
  - Dynamic imports also match template_string nodes
  - process.exit deferred until after finally cleanup in runViaMCP
  - --same-language / --different-language mutual exclusion guard
  - help-tools CLI bypasses isInitialized (works without a project)
  - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks)
  - handleAffected errors instead of silently dropping unsupported glob filters
  - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block
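The classifyFreshness fix noted above (very_stale now requires isStale, so an in-sync-but-old index lands in recent) can be sketched as below; the day thresholds are invented for illustration:

```typescript
// Hypothetical sketch of the freshness bucketing; thresholds are assumptions.
type Freshness = "fresh" | "recent" | "stale" | "very_stale";

export function classifyFreshness(ageDays: number, isStale: boolean): Freshness {
  // In-sync indexes never escalate past "recent", however old they are.
  if (!isStale) return ageDays <= 1 ? "fresh" : "recent";
  return ageDays > 30 ? "very_stale" : "stale";
}
```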

Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 2, 2026
Move 3 graph-derived biomarker scanners to a new
src/db/queries-biomarkers-graph.ts module as free functions taking
(qb: QueryBuilder, ...args). Same proven pattern as the prior ten
extractions.

Functions moved:
- findGodClasses (god_class biomarker — class-like nodes with too
  many `contains` children)
- findFeatureEnvy (feature_envy biomarker — methods calling out
  more than in)
- findUnusedExports (unused_export biomarker — exported symbols
  with no cross-file `calls` / `references` / `instantiates` /
  `extends` / `implements` edge)

This module covers the cross-file biomarker SCANNERS only; finding
STORAGE (appendFindings, etc.) lives in queries-findings.ts (cluster
#3).

Touchpoints (only one consumer):
- src/biomarkers/index.ts: 3 call sites (one per scanner)

Verification:
- tsc clean, suite 1352/13/0 (no regressions)
- Reindex: QueryBuilder god_class **46 -> 43** (delta = 3, exact)

QueryBuilder is now within 3 of the error-threshold (40). Next
cluster will be unresolved-refs (~10 methods), which clears the
error entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026

getFindingsRanked gains `excludeFile` — drop findings whose file_path
starts with a literal prefix. The triage workflow it enables:
"give me ranked warnings excluding src/legacy/" without having to
rebaseline the index or post-filter in JS.

Implementation
--------------
- src/db/queries-findings.ts: WHERE n.file_path NOT LIKE @Prefix
  ESCAPE '\\'. The prefix is escaped (\\, _, %) before being bound
  so any of those characters in real path names match themselves;
  the first draft used GLOB but that was discarded after the
  reviewer flagged that GLOB has no escape syntax — `excludeFile: '*'`
  would have collapsed to "match every path" and silently zeroed
  the result set.
- src/mcp/tools/biomarkers.ts: new `excludeFile` arg on the
  inputSchema, passed through to `getFindingsRanked`. The
  description notes that the filter only applies to mode='ranked'
  (silently ignored on 'symbol' / 'stats').
- src/bin/codegraph.ts: `--exclude-file <prefix>` on the
  `codegraph biomarkers` CLI command.
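The escape step can be sketched as follows — an illustrative reconstruction of the queries-findings.ts behavior, with invented helper names:

```typescript
// Hypothetical sketch: escape \, _ and % so a literal path prefix matches
// itself under LIKE ... ESCAPE '\'. GLOB was rejected because it has no
// escape syntax, so '*' would have matched every path.
export function escapeLikePrefix(prefix: string): string {
  // Escape the escape character first, then the LIKE wildcards.
  return prefix.replace(/\\/g, "\\\\").replace(/_/g, "\\_").replace(/%/g, "\\%");
}

// Bound as: WHERE n.file_path NOT LIKE @prefix ESCAPE '\'
export function buildExcludePattern(prefix: string): string {
  return escapeLikePrefix(prefix) + "%"; // trailing % = "starts with prefix"
}
```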

Tests
-----
__tests__/biomarkers.test.ts:
- baseline (no filter) sees both functions
- directory exclusion via trailing-slash prefix drops one
- single-file exclusion drops the named file
- `excludeFile: '*'` is treated literally (no path starts with '*')
  so the result equals baseline — pinning the reviewer-caught fix
- `excludeFile: 'src/k_ep/'` does NOT match 'src/keep/' because the
  underscore is escaped, not a wildcard

Also: struck the duplicate B colbymchenry#33 marker in BACKLOG.md (it was the
same item as A #3).

Suite: 1366/13/0 (+1 new test). tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 4, 2026
ToolHandler was a 491-line god-class doing five things commingled.
This extracts the project-cache + watcher coordination into its own
class so the dispatcher (ToolHandler) can focus on what's left:
disabled-tool filtering, freshness gating, dispatch, and lifecycle.

Before:
- ToolHandler held projectCache (Map), cachedRoots (Map),
  watchedRoots (Set), MAX_CACHED_PROJECTS, MAX_WATCHED_PROJECTS
- ...plus getCodeGraph, evictOldestIfFull, tryStartWatcher,
  closeAll, closeProjectsMatching, shouldEvictCachedProject,
  closeDefaultCgIfMatching as private methods.

After:
- New `src/mcp/tools/_project-cache.ts` (175 LOC) owns the cache
  Map, the cachedRoots map, the watchedRoots set, the FIFO eviction,
  and tryStartWatcher. Public surface: getOrOpen / closeAll /
  closeProjectsMatching / readonlyView (for ToolCtx exposure).
- ToolHandler shrinks ~150 lines; getCodeGraph becomes a 12-line
  method that delegates to cache.getOrOpen for explicit-project
  paths. closeAll delegates to cache.closeAll + closes the default
  cg if any. closeProjectsMatching delegates + sweeps the default cg.

Defense-by-design angle:
- Each class now has a narrower public surface — fewer reachable
  invariants per file → smaller blast radius for the next change.
- ProjectCache's own state (the three collections) can no longer be
  perturbed by unrelated dispatch logic; encapsulated behind 4 methods.

Self-doc angle:
- ToolHandler.execute() now reads as a clear pipeline: validate → gate
  → validate args (D1) → dispatch. The "what's a watcher doing here?"
  noise is gone.

What's NOT changed (intentional, smaller follow-up scope):
- The `runFreshnessGate` + auto-sync logic stays inside ToolHandler.
  Splitting that further is a separate exercise — it's tightly coupled
  to ToolResult shaping (banner injection, freshness metadata).
- WatcherCoordinator is folded INTO ProjectCache rather than a third
  class because the eviction pathway shares state with the watcher
  (closing a cached cg must stop its watcher). Keeping them in one
  class avoids cross-class state coordination.
- The CodeGraph class itself was triaged and DECLINED for split: it's
  already a facade exposing sub-managers (db/queries/orchestrator/etc).
  The remaining lifecycle code is genuinely cohesive.

Tests:
- Watcher tests update their internal-state probes to drill through
  the new `cache` field (handler.cache.watchedRoots instead of
  handler.watchedRoots). 4 fixture tests adjusted, 0 new tests needed.
- Full suite 1634/34/0 — same as pre-split.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…arch arc)

Three additions inspired by the 2026-05-08 code-KG paper sweep
(CodeRAG, GraphCodeAgent, GraphGen4Code, Maarleveld GNN survey,
CodexGraph, FalkorDB). Each verified absent before implementation;
all opt-in to keep default structural traversals unchanged.

#1 — In-graph similar_to edges (similarity as graph hops)
  - New EdgeKind similar_to (confidence=INFERRED, metadata.score)
  - buildSimilarToEdges reuses findSimilarViaVec; delete+insert are
    wrapped in db.transaction so the replacement is atomic
  - Migration 036 partial-index on edges(source,kind) WHERE
    kind=similar_to — guarded by sqlite_master existence check for
    the pre-016 hand-rolled migration tests
  - Surfaced as codegraph_admin({action: build-similarity-edges})
    and CLI codegraph admin build-similarity-edges --k --min-score
  - EXCLUDED_EDGE_KINDS keeps it out of default traversals; explicit
    edgeKinds bypass the filter

#2 — mode=intent search over symbol_summaries
  - FTS5 virtual table summary_fts (porter unicode61, mirrors nodes_fts)
    + INSERT/UPDATE/DELETE triggers on symbol_summaries
  - Migration 037 + parallel schema.sql entry for fresh-init path
  - bm25-ranked, optional kind/language/pathFilter filters
  - pathFilter LIKE uses canonical backslash-escape pattern matching
    queries-findings.ts:227-232 (no _/% injection)
  - Refuse-when-empty error points at codegraph summarize
  - FTS5 query parse errors caught and re-surfaced as a clear
    syntax-error message

#3 — Intra-procedural def_use edges (TS/JS/TSX/JSX)
  - New EdgeKind def_use as self-loops on function/method nodes;
    metadata carries name, defLine, useLines
  - Scope-bounded extractor in src/extraction/def-use.ts, called
    from both tsExtractFunction and tsExtractMethod
  - Skips parameters (function inputs), fields (covered by
    field_access), nested-scope vars (belong to inner function set)
  - EXCLUDED_EDGE_KINDS opt-in; no traversal helper assumes
    source != target

Schema-version assertions bumped 35 -> 37 in foundation.test.ts and
pr19-improvements.test.ts.

Suite 1742/0/34 (was 1729 baseline, +13 new tests). Three reviewer
rounds: round 1 caught LIKE-escape, atomicity, field rename, FTS5
try/catch; round 2 caught a JSDoc rot from the rename and a
contradictory test assertion; round 3 APPROVE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…ge-level diff (eval arc)

Three additions with pre-set hypotheses + post-impl measurement, per
the user's "evaluation and measurement how much useful it will
actually be" brief. Pre-impl thresholds were committed before code so
post numbers couldn't be motivated. One feature deferred with honest
documented reasoning.

#1 Selective parse-cache invalidation
  - clearParseCache(qb, language?) returns deleted-row count
  - CLI: --clear-parse-cache [language] (boolean OR optional value)
  - MCP: clearParseCache: boolean + clearParseCacheLanguage: string
    (typed schema doesn't allow oneOf — split into two args; language
     wins when both set)

  Hypothesis (pre-set): >=3x wall-clock speedup, <30s absolute
  Measured (codegraph repo, 463/498 = 93% TS in parse cache):
    - TS-only clear: 3.70s wall / 3.77s user-CPU
    - Full clear:    3.87s wall / 8.13s user-CPU
    - Wall ratio: 1.04x (parallelism masks the work delta)
    - User-CPU ratio: 2.16x more work for full clear
  Verdict: speed threshold NOT MET on monoglot testbed. Real value
  here is correctness (targeted invalidation when an extractor
  changes). On polyglot repos at 50% target-lang ratio, expected ~2x
  wall-clock speedup.

#1.5 Docstring source for mode='intent' (user follow-up: "make intent richer")
  - Migration 038: docstring_fts FTS5 over nodes.docstring + INS/UPD/DEL
    triggers (with WHEN docstring IS NOT NULL AND != ''); schema.sql parity
    for fresh-init path; pragma_table_info guard for pre-016 hand-rolled
    migration test setups
  - _search-intent.ts queries BOTH summary_fts AND docstring_fts,
    UNIONs by node_id keeping best rank, surfaces 'via summary' /
    'via docstring' provenance label per result
  - Empty-corpus check fixed: FTS5 external-content COUNT(*) reads
    from the source table, not the actual indexed rows — switched to
    direct content count

  Hypothesis (pre-set): >=30% recall increase
  Measured (20 hand-picked intent queries, codegraph corpus):
    - Summary-only hits: 22
    - Docstring-only hits: 34
    - Combined unique-node hits: 56 (2.55x = 155% improvement)
  Verdict: well above threshold. Best ROI of this arc — docstrings
  cover 26% of nodes vs summaries' 18%, AND describe intent verbatim
  in JSDoc / Python docstrings / Go comments.

#2 Edge-level diff in compare_to_ref
  - EdgeDelta / EdgesDelta types
  - diffEdgeLists keyed by stable
    (srcQualName::srcKind=>tgtQualName::tgtKind::edgeKind) so line
    shifts don't surface as spurious changes
  - Cross-file edges out of scope (compareToRef is per-file)
  - Opt-in via includeEdges: true (CLI: --include-edges)
  - Renderer surfaces source -> target node IDs (round-1 reviewer
    finding: discarded data; fixed)

  Hypothesis (pre-set): >=30% additional info (>=20% loose)
  Measured (HEAD vs HEAD~3 on this branch):
    - Node changes: 21+11+308 = 340
    - Edge changes (NEW signal): 83+11 = 94
    - Files surfaced: 22 -> 27 (+5 visible only via edge changes)
    - Information gain: 94/340 = 27.6%
  Verdict: >=20% threshold MET; just below >=30% strict. The 5
  newly-surfaced files (pure-edge changes) are the qualitative win.
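The stable-key diff can be sketched as below; the key shape follows the commit text, while the field names are illustrative:

```typescript
// Hypothetical sketch: diff two edge lists keyed by qualified names + kinds,
// so pure line shifts never surface as spurious edge changes.
export interface Edge {
  srcQualName: string;
  srcKind: string;
  tgtQualName: string;
  tgtKind: string;
  edgeKind: string;
}

const keyOf = (e: Edge) =>
  `${e.srcQualName}::${e.srcKind}=>${e.tgtQualName}::${e.tgtKind}::${e.edgeKind}`;

export function diffEdgeLists(before: Edge[], after: Edge[]): { added: Edge[]; removed: Edge[] } {
  const beforeKeys = new Set(before.map(keyOf));
  const afterKeys = new Set(after.map(keyOf));
  return {
    added: after.filter((e) => !beforeKeys.has(keyOf(e))),
    removed: before.filter((e) => !afterKeys.has(keyOf(e))),
  };
}
```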

#3 Stack-graphs cross-language resolver — DEFERRED
  Survey of the codegraph corpus: monoglot TypeScript. child_process
  invocations are ~30 git execFileSync calls; no Python/Ruby/Go
  spawn targets. Dynamic imports are NPM packages. string_imports
  table is dominated by test fixtures. Conclusion: this corpus
  lacks the ground-truth cross-language references needed to
  measure a scope-graph rule meaningfully. Building infrastructure
  without testable signal would be speculative abstraction (CLAUDE.md
  anti-pattern). Stays on the borrow-ideas backlog as the
  long-horizon item; not blocked, just not session-feasible without
  a polyglot testbed.

Schema-version assertions bumped 37 -> 38 in foundation.test.ts and
pr19-improvements.test.ts.

Suite 1746/0/34 (was 1742; +4 new tests). Two reviewer rounds: round
1 caught the edge-delta formatter discarding source/target IDs;
round 2 APPROVE. Info-level note tracked for later: summary_fts_au
trigger could mirror the docstring_fts_au SELECT-WHERE guard pattern
for consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…enrichment

User asked: "is there other hot data we can apply intent search on, and
is there something missing we can apply on hot data?" Answer: test
descriptions mined from it/test/describe(...) calls in test files.
"Hot" = extracted at index time, no LLM pass needed. Plus enrich
codegraph_tests_for and codegraph_node with the same data while we're
in the area.

What shipped
============

#1.6a Schema + mining hook
  - Migration 039: test_names (id / file_path / line / description, FK
    on files.path, ON DELETE CASCADE) + test_names_fts FTS5 with
    porter unicode61 tokenizer + INS/UPD/DEL triggers; schema.sql
    parity for fresh-init path; sqlite_master guard for pre-016 hand-
    rolled migration tests
  - src/index-hooks/test-names.ts: regex-mining hook (NOT tree-sitter
    — patterns are simple enough; ~5% missed cases acceptable per
    docstring justification). Matches it/test/describe/context/
    fit/fdescribe/fcontext/xit/xtest/xdescribe with optional
    .skip/.only/.each/.todo/.concurrent/.sequential/.failing modifiers.
    Idempotent: full rescan on indexAll, per-file rescan on sync.
    Runs after TESTS_EDGES_HOOK so test->subject edges are fresh.
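The mining regex can be sketched as below — a simplified stand-in for the real hook (which handles more quoting and edge cases than this pattern):

```typescript
// Hypothetical sketch: mine test descriptions from it/test/describe(...)
// call sites with optional .skip/.only/... modifiers, one match per line.
const TEST_CALL =
  /^\s*(?:it|test|describe|context|fit|fdescribe|fcontext|xit|xtest|xdescribe)(?:\.(?:skip|only|each|todo|concurrent|sequential|failing))?\(\s*(['"`])(.*?)\1/;

export function mineTestNames(source: string): Array<{ line: number; description: string }> {
  const out: Array<{ line: number; description: string }> = [];
  source.split("\n").forEach((text, idx) => {
    const m = TEST_CALL.exec(text);
    if (m) out.push({ line: idx + 1, description: m[2] }); // 1-based lines
  });
  return out;
}
```

Regex mining like this trades a few missed cases (multi-line or computed descriptions) for not needing a tree-sitter pass, matching the ~5% tolerance stated above.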

#1.6b mode='intent' third source
  - _search-intent.ts queries summary_fts + docstring_fts +
    test_names_fts. Symbol-anchored sources merge by node_id keeping
    best rank; test_names render in a separate "Test-description
    matches" section because they anchor to file:line not a node id.
  - kind/languageFilter skip the test_names branch (test_names don't
    have a node kind). pathFilter applies via the canonical
    LIKE-escape pattern from queries-findings.ts (memo item #3).
  - Empty-corpus refusal + no-match messages updated to mention all
    three sources.
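The best-rank merge for the symbol-anchored sources can be sketched as below; bm25 ranks are "lower is better", and the field names are illustrative:

```typescript
// Hypothetical sketch: merge hits from multiple FTS sources by node_id,
// keeping the best (lowest) rank and its provenance label.
export interface IntentHit {
  nodeId: number;
  rank: number; // bm25: more negative = better match
  via: "summary" | "docstring";
}

export function mergeByNodeId(hits: IntentHit[]): IntentHit[] {
  const best = new Map<number, IntentHit>();
  for (const hit of hits) {
    const prev = best.get(hit.nodeId);
    if (!prev || hit.rank < prev.rank) best.set(hit.nodeId, hit);
  }
  return [...best.values()].sort((a, b) => a.rank - b.rank);
}
```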

#1.6c codegraph_tests_for enrichment
  - TestRow.testDescriptions: Array<{line, description}>, fetched via
    fetchTestDescriptionsForFile (50-row cap per file).
  - appendBucket renders up to 8 descriptions per file with overflow
    note ("...and N more assertions").
  - handleFilesMode (the files=[...] path) gets the same enrichment.

#1.6d codegraph_node enrichment
  - When fetching a single symbol's details, surface up to 5 mined
    test assertions from any test file linked to the symbol's file
    via the `tests` edge. Anchored by file (not by precise symbol —
    label says "may include broader-scope tests").

Schema-version assertions bumped 38 -> 39 in foundation.test.ts and
pr19-improvements.test.ts.

Measurement
===========

Coverage on this codegraph repo: 2169 test descriptions across 133
test files mined.

Per-source hits across 10 hand-picked queries:
  - summary-only:  22 hits
  - docstring:     80 hits
  - test_name:    101 hits
  - Combined:     203 hits
  - Gain from test_name: 99% additional signal vs summary+docstring

Vocabulary specialization: test_names dominate on assertion-style
queries ("rejects", "returns empty", "refusal error"); docstrings
dominate on implementation-style queries ("parses tree sitter",
"writes embeddings"). Sources are complementary, not redundant.

Suite 1751/0/34 (was 1750; +1 for the new node-enrichment test;
total +5 across the test-names test file). Typecheck clean.

Reviewer: APPROVE with 3 info-level findings. Addressed the missing
codegraph_node enrichment test in this commit; the two style notes
(canonical FK-safe insert pattern via `WHERE EXISTS`, transaction
wrap on clearAllTestNames for very large repos) are tracked as
follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…ulk codegraph_at_range

Three additive surface extensions to cut investigation round-trips:

#1 codegraph_node accepts `symbols: string[]` (up to 20) alongside
   the existing single-symbol `symbol`. Duplicate inputs that resolve
   to the same node are merged. Saves N round-trips when checking a
   list of suspect symbols.

#2 codegraph_node accepts four new inline-expansion flags
   (`includeCallers` / `includeCallees` / `includeBiomarkers` /
   `includeTests`). When set, the response folds in the corresponding
   tool's answer under each card, capped per-section to keep token
   pressure low (10 callers, 10 callees, 5 findings, 5 test files).
   Collapses 3-5 round-trips into one for "tell me everything about X"
   patterns; the dedicated tools remain available for the full lists.

#3 codegraph_at_range accepts `ranges: Array<{file, startLine, endLine}>`
   (up to 100) alongside the single-range form. Output renders one
   subsection per range so the agent can map results back to specific
   diff hunks. PR review with N hunks goes from N+1 calls to 1.
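The duplicate-merge from #1 can be sketched as below; resolve() is a hypothetical stand-in for the real symbol resolution:

```typescript
// Hypothetical sketch: cap inputs at 20 and merge duplicates that resolve
// to the same node object (Set dedupes by identity).
export function dedupeByNode<T>(
  symbols: string[],
  resolve: (s: string) => T | null,
  cap = 20
): T[] {
  const seen = new Set<T>();
  for (const sym of symbols.slice(0, cap)) {
    const node = resolve(sym);
    if (node !== null) seen.add(node); // two spellings, one node -> one card
  }
  return [...seen];
}
```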

All three paths are additive — the legacy single-input shapes are
preserved verbatim. Backward-compat is locked in by the existing tests
plus 19 new ones (8 for node multi/expansions, 5 for bulk at-range,
+6 from refactoring). Docs updated in CLAUDE.md, README.md, and the
server-instructions playbook.

Suite 1772/0/34. No schema migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
…polish items

Eight friction-tracker items addressed in parallel by sub-agents (2 Haiku,
1 Sonnet); reviewer caught one real correctness edge case (bucket overlap
on degenerate fresh-index shapes) plus two info items, all addressed in
this commit.

## colbymchenry#21 — at_range cost-benefit JSDoc

Doc-only update to src/mcp/tools/at-range.ts. Tool description and
JSDoc now state "pays off most on dense files (100+ symbols) and
multi-range bulk lookups; for tiny preview fetches on small files,
raw `head -N` is comparable." No code change.

## colbymchenry#25 — blame surfaces rename detection inline

src/git-utils.ts gains a new helper `getFileFollowEarliestTs` that runs
`git log --follow --format=%aI -- <path>` (5 s timeout, ISO timestamp).
src/mcp/tools/blame.ts compares the rename-aware oldest commit against
the line-range-only timeline's oldest. When `--follow` reaches further
back, appends a warning that the timeline truncated at the file's
rename and points at `git log --follow <file>` for the full history.
Edge cases handled: not-a-git-repo, timeout, empty timeline.

Test approach uses `vi.spyOn` to mock pre-rename history because
real fixtures are unreliable: modern git's `git log -L` follows
renames via content-similarity tracking, making a deterministic
black-box rename-fixture impossible.

## colbymchenry#26 — hotspots split into 3 mutually-exclusive categories

src/db/queries-history.ts gains `getCategorizedHotspots` and
src/mcp/tools/hotspots.ts gains a `category: 'risk' | 'maintenance'
| 'brittle' | 'all'` arg (default 'risk' for backward compat).
Thresholds use 75/25 percentile rather than hardcoded magic
numbers — they adapt as the project grows.

Buckets:
- risk        : high centrality AND high churn — where bugs hide
- maintenance : high churn AND not-high centrality — refactor target
- brittle     : high centrality AND not-high churn — stable critical

Reviewer-caught correctness bug: original filters used `<= low` for
the secondary axis, which collapsed buckets when high == low (fresh
index where centrality is uniformly zero, or repos where every file
has identical churn). A file at the threshold could appear in both
risk AND maintenance simultaneously. Fixed by switching maintenance
and brittle to `< highThreshold`, making them strictly disjoint
even on degenerate inputs. Also added a more-hint when any section
hit the per-category cap (the existing `category='risk'` path
already had this; `category='all'` now mirrors).
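The disjoint bucketing can be sketched as below. This is an illustrative reduction, not the real `getCategorizedHotspots` query: thresholds are passed in (in the real code they come from the 75th/25th percentiles), and `bucketFor` only shows the membership logic and the strict `<` fix:

```typescript
type Bucket = "risk" | "maintenance" | "brittle" | null;

// centHigh / churnHigh play the role of the 75th-percentile thresholds.
function bucketFor(
  centrality: number,
  churn: number,
  centHigh: number,
  churnHigh: number,
): Bucket {
  const highCent = centrality >= centHigh;
  const highChurn = churn >= churnHigh;
  if (highCent && highChurn) return "risk"; // high on BOTH axes
  // Strict `<` on the secondary axis keeps the buckets disjoint even when
  // the high and low thresholds coincide (uniform centrality or churn).
  if (highChurn && centrality < centHigh) return "maintenance";
  if (highCent && churn < churnHigh) return "brittle";
  return null; // neither axis is high: not a hotspot
}
```

On a degenerate index where both thresholds collapse to 0, every file satisfies `>= 0`, so everything lands in `risk` and nothing double-counts.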

New `__tests__/hotspots.test.ts` (4 cases) covers all-section
rendering, single-category dispatch, and the backward-compat
default path.

## colbymchenry#27 — search centrality:high differentiates "hook hasn't run"
       vs "no node met the threshold"

src/mcp/tools/search.ts. `probeCentralityFilterCulprit` now runs a
sub-millisecond probe `SELECT 1 FROM nodes WHERE centrality IS NOT
NULL LIMIT 1` (uses the existing `idx_nodes_centrality` index). When
ALL nodes have NULL centrality the agent gets the existing "centrality
hook hasn't run — run codegraph index" hint. When SOME nodes have
centrality but none cleared the filter, a different hint suggests
relaxing the threshold. Two-case hint instead of one.
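The two-case dispatch reduces to a boolean fed by the one-row probe. A sketch, with the probe shown as a comment and the second hint's wording illustrative rather than quoted from the source:

```typescript
// Probe (sub-millisecond, hits idx_nodes_centrality):
//   SELECT 1 FROM nodes WHERE centrality IS NOT NULL LIMIT 1
function centralityFilterHint(anyNodeHasCentrality: boolean): string {
  if (!anyNodeHasCentrality) {
    // Every node is NULL: the hook never ran at all.
    return "centrality hook hasn't run — run codegraph index";
  }
  // Some nodes carry centrality, but none cleared the filter.
  return "no node met the centrality threshold; try relaxing the filter";
}
```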

## colbymchenry#28 — search exact promotes multi-token-query warning to pre-result

src/mcp/tools/search.ts. `buildConceptHintIfNeeded` now returns
`{ preResult, postResult }` instead of a single string. When the
query splits into 2+ space-separated non-qualified tokens (likely
"multiple symbol names"), the agent gets a leading hint to call
search per name OR use codegraph_explore — BEFORE the result list
rather than buried after.

Field-qualified tokens (`kind:function lang:typescript`) and
single-free-token queries are unchanged.
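The trigger condition can be sketched as follows; the interface shape matches the `{ preResult, postResult }` return described above, while the hint text and helper name are assumptions for illustration:

```typescript
interface ConceptHints {
  preResult?: string; // shown BEFORE the result list
  postResult?: string; // shown after, as before
}

function buildConceptHint(query: string): ConceptHints {
  const tokens = query.trim().split(/\s+/).filter((t) => t.length > 0);
  // Field-qualified tokens like `kind:function` are not symbol names.
  const freeTokens = tokens.filter((t) => !t.includes(":"));
  if (freeTokens.length >= 2) {
    return {
      preResult:
        "query looks like multiple symbol names; call search once per name, or use codegraph_explore",
    };
  }
  return {};
}
```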

## colbymchenry#33 — callers on "constructor" with no callers explains
       the instantiates-edge model

src/mcp/tools/callers.ts. When the resolved symbol is
`kind=method && name=constructor` AND the callers list is empty,
appends a one-line note: "constructors are invoked via
`new ClassName(...)`, which graph-edges as `instantiates` on the
parent class. To find construction sites, run codegraph_callers on
the enclosing class instead of 'constructor'." Both the multi-match
and single-match paths got the note (guarded by the same
kind+name+empty check). Constructors WITH callers (e.g. via super())
render normally — no false positive.
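The guard shared by both paths is small enough to sketch whole (function name assumed):

```typescript
// Only a constructor METHOD with ZERO callers gets the note; constructors
// reached via super() have callers and render normally.
function shouldAppendConstructorNote(
  kind: string,
  name: string,
  callersCount: number,
): boolean {
  return kind === "method" && name === "constructor" && callersCount === 0;
}
```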

## colbymchenry#35 — node.symbol tie-break prefers non-fixture, then centrality

src/mcp/tools/symbol-resolver.ts. `pickFromMultipleExactMatches`
now filters out fixture paths first (falls back to all-fixture when
that's all that matches), then sorts by centrality DESC (NULL → 0).
A `helper` symbol that exists in both `src/core.ts` and
`docs/test-beds/fixture.ts` resolves to `src/core.ts` as the displayed
primary. Tier #3 (last_touched_ts) deferred — data not in the
resolver's existing query.
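The two-tier pick can be sketched like this. The filter-then-sort shape follows the description above; the `isFixturePath` regex here is a toy stand-in, not the shared predicate's actual pattern set:

```typescript
interface Match {
  path: string;
  centrality: number | null;
}

// Illustrative only; the real predicate lives in src/mcp/tools/shared.ts.
function isFixturePath(p: string): boolean {
  return /(^|\/)(fixtures?|test-beds)(\/|$)/.test(p);
}

function pickPrimary(matches: Match[]): Match {
  const nonFixture = matches.filter((m) => !isFixturePath(m.path));
  // Tier 1: prefer non-fixture; fall back when everything is a fixture.
  const pool = nonFixture.length > 0 ? nonFixture : matches;
  // Tier 2: centrality DESC, treating NULL as 0.
  return [...pool].sort((a, b) => (b.centrality ?? 0) - (a.centrality ?? 0))[0];
}
```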

Reviewer-caught DRY issue: the fixture-path regex set was duplicated
between symbol-resolver.ts and dead-code.ts (introduced by parallel
sub-agents on the same brief). Extracted to `isFixturePath` in
src/mcp/tools/shared.ts; both consumers now import the single source.

## colbymchenry#49 — getSummaryCoverage denominator threading (3 call sites)

src/bin/codegraph.ts (lines 348, 1461) + src/mcp/tools/status.ts
(line 440). All three pass `SUMMARIZABLE_KINDS` to getSummaryCoverage
to match the canonical pattern from the previously-fixed
_search-intent.ts:218. Without this, the helper falls back to
COUNT(*) which inflates the denominator with parameters / imports /
file nodes — its own JSDoc explicitly warns against this.

## Test re-additions

Sub-agent #1 deleted its own test files for colbymchenry#33 and colbymchenry#35 (a brief
misread — "DO NOT commit" was interpreted as "DO NOT leave tests in
repo"). Re-added as
`__tests__/mcp-callers-constructor-and-fixture-tiebreak.test.ts`
covering: constructor-with-no-callers note appears,
non-constructor-method note absent, name-collision picks non-fixture
primary.

## Verification

- 15 modified files + 2 new test files, +619/-55
- npm run typecheck — clean
- 74/74 tests pass across 9 LLM/search/hotspots-related test files
- New exports: `isFixturePath` (shared.ts), `getCategorizedHotspots`
  (queries-history.ts), `getFileFollowEarliestTs` (git-utils.ts) —
  all have concrete in-tree callers in the same diff per
  reviewer-memo item #7

Reviewer pass with .claude/reviewer-memo.md prepended caught:
- (request_changes) bucket-exclusivity edge case → fixed
- (info) isFixturePath duplication → deduped
- (info) category='all' missing more-hint → added

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
New module src/llm/local-role-classifier.ts. Wraps
@huggingface/transformers `zero-shot-classification` pipeline to
predict the codegraph 7-class role taxonomy
(business_logic / api_endpoint / util / data_model / framework_glue
/ test_helper / unknown) without a chat round-trip.

Default model: Xenova/distilbert-base-uncased-mnli (~80MB ONNX,
zero-shot via NLI). Same patterns as the other Stage-3+ in-process
clients (RerankerClient, QaClient): module-level pipeline cache
keyed by `<model>::<dtype>`, eviction on rejection, LlmEndpointError
on missing optional dep, empty-input short-circuit.

Public API:
- LocalRoleClassifier(cfg)
- classify(text) → { label, score, scores[] }
- classifyBatch(texts) → RoleClassification[]
- isReachable / listModels
- _clearRoleClassifierCache (test export)

Output normalization handles flat {labels, scores} AND nested
[{labels, scores}] shapes; out-of-taxonomy labels mapped to 'unknown'.
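The normalization step can be sketched in isolation. The taxonomy constant mirrors the seven classes listed above; the function name and output shape are assumptions for illustration:

```typescript
const ROLE_LABELS = [
  "business_logic",
  "api_endpoint",
  "util",
  "data_model",
  "framework_glue",
  "test_helper",
  "unknown",
] as const;
type Role = (typeof ROLE_LABELS)[number];

interface ZeroShotOut {
  labels: string[]; // sorted by score DESC, as the pipeline returns them
  scores: number[];
}

// Accept both the flat {labels, scores} and nested [{labels, scores}]
// shapes; coerce anything outside the taxonomy to 'unknown'.
function normalizeZeroShot(raw: ZeroShotOut | ZeroShotOut[]): { label: Role; score: number } {
  const out = Array.isArray(raw) ? raw[0] : raw;
  const top = out.labels[0];
  const label = (ROLE_LABELS as readonly string[]).includes(top) ? (top as Role) : "unknown";
  return { label, score: out.scores[0] };
}
```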

26 vitest cases across 9 describe blocks.

Replaces the chat-mediated codegraph_role taxonomy path when
configured — token cost ~200 tokens/call → 0. Integration into the
classify pipeline lands separately (tracked as Stage 7 #3 follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
Wires the in-process zero-shot role classifier into the classifier
phase as replace-mode. When localRoleLlm is configured:
1. Per batch, call LocalRoleClassifier.classifyBatch on the texts.
2. Stamp results with localModelLabel (cache-attribution discipline
   from the Stage 5 #B reviewer fix).
3. On any local failure (transient model issue, abort, length
   mismatch), fall through to the existing chat-batch path.

- Added localRoleLlm to LlmEndpointConfig + ResolvedLlm + user-facing
  CodeGraphConfig.llm.
- New resolveLocalRole helper in provider.ts.
- classifyAllRoles accepts localRoleClassifier + localModelLabel.
- New classifierTryLocalBatch helper in classifier.ts; threads
  through ClassifierRunCtx.
- runClassifyPhase instantiates LocalRoleClassifier lazily when
  resolved.localRoleLlm is set.

Confidence floor: LOCAL_ROLE_CONFIDENCE_FLOOR = 0.25; below that
the label is stamped as 'unknown' rather than the model's top
guess. Tunable later if the empirical taxonomy distribution warrants.
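The replace-mode flow (try local, apply the floor, fall through on failure or length mismatch) can be sketched with the classifier and chat path injected as functions. Names and signatures here are assumptions, not the real classifier.ts API:

```typescript
const LOCAL_ROLE_CONFIDENCE_FLOOR = 0.25;

type RoleResult = { label: string; score: number };

async function classifyTryLocalBatch(
  texts: string[],
  local: (texts: string[]) => Promise<RoleResult[]>,
  chatFallback: (texts: string[]) => Promise<string[]>,
): Promise<string[]> {
  try {
    const results = await local(texts);
    if (results.length !== texts.length) throw new Error("length mismatch");
    // Below the floor, stamp 'unknown' rather than the model's top guess.
    return results.map((r) =>
      r.score >= LOCAL_ROLE_CONFIDENCE_FLOOR ? r.label : "unknown",
    );
  } catch {
    // Transient model issue, abort, or mismatch: existing chat-batch path.
    return chatFallback(texts);
  }
}
```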

Off by default. When configured, classification token cost drops
~200/symbol → 0/symbol (entire chat path is bypassed for the common
case). Suite remains 2374/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
…Stage 7 #2/#3 follow-ups)

Two new local-NLI surfaces. Both use the existing bart-large-mnli
zero-shot classifier; both keep the heuristic / rule-based path
on the happy case and only consult the model when the deterministic
rules can't reach a confident answer. Token cost vs the chat
backend: ~0 per call.

# C — commit-intent NLI fallback

`classifyCommitMessageWithFallback(message, classifier?)` runs the
existing heuristic first; when it returns `'unknown'` (~30% of
commits in messy histories: "wip", "stuff", "more"), feeds the
subject into a 7-hypothesis NLI classifier:

  feat     → "a new feature or capability"
  fix      → "a bug fix or error correction"
  refactor → "code restructuring or cleanup without behaviour change"
  perf     → "a performance improvement or optimisation"
  test     → "tests, specs, or test infrastructure"
  docs     → "documentation, comments, or readme"
  chore    → "dependency bumps, build config, or routine chores"

Confidence floor 0.45 — below that we keep `'unknown'` rather than
ascribe a low-confidence label (avoids polluting commit_intents
with junk on truly-opaque commits). Wired into the cochange index
hook: classifier is constructed from `localRoleLlm` config when
present, otherwise the hook stays heuristic-only and
behaviour-identical to before.

The original sync `classifyCommitMessage` is unchanged — existing
callers + tests continue to use the heuristic path.
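The heuristic-then-NLI flow can be sketched as below. The hypothesis strings mirror the table above; the function signature and the `nli` callback shape are assumptions made so the sketch stays self-contained:

```typescript
const INTENT_HYPOTHESES: Record<string, string> = {
  feat: "a new feature or capability",
  fix: "a bug fix or error correction",
  refactor: "code restructuring or cleanup without behaviour change",
  perf: "a performance improvement or optimisation",
  test: "tests, specs, or test infrastructure",
  docs: "documentation, comments, or readme",
  chore: "dependency bumps, build config, or routine chores",
};

async function classifyCommitMessageWithFallback(
  message: string,
  heuristic: (m: string) => string,
  nli?: (text: string, labels: string[]) => Promise<{ label: string; score: number }>,
): Promise<string> {
  const fromRules = heuristic(message);
  // Happy case stays rule-based; no classifier means heuristic-only.
  if (fromRules !== "unknown" || !nli) return fromRules;
  const { label, score } = await nli(message, Object.values(INTENT_HYPOTHESES));
  if (score < 0.45) return "unknown"; // confidence floor: keep opaque commits opaque
  // Map the winning hypothesis back to its intent key.
  return Object.keys(INTENT_HYPOTHESES).find((k) => INTENT_HYPOTHESES[k] === label) ?? "unknown";
}
```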

# D — new structured change-kind classifier

`classifyChangeKind({classifier, beforeBody, afterBody, name, kind})`
in `src/llm/change-kind.ts`. Distinct from the existing
generative `summarizeChange` (which produces prose via the chat
backend): this produces a STRUCTURED label suitable for grouping /
filtering / metrics:

  addition | removal | modification | refactor
  | signature_change | behavioral_change | doc_only | unknown

Rule-based dispatch handles the trivial cases (empty before →
addition, empty after → removal, identical → unknown) without an
NLI call. The remaining cases consult bart-large-mnli with 4
prose hypotheses against the diff. Same 0.45 confidence floor;
sub-threshold tops fall through to `'modification'`.
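The rule-based front end can be sketched on its own, with `null` meaning "fall through to the NLI call". The helper name is assumed; the real dispatch lives inside `classifyChangeKind`:

```typescript
type ChangeKind =
  | "addition"
  | "removal"
  | "modification"
  | "refactor"
  | "signature_change"
  | "behavioral_change"
  | "doc_only"
  | "unknown";

// Trivial cases never cost an NLI call.
function ruleBasedChangeKind(beforeBody: string, afterBody: string): ChangeKind | null {
  if (beforeBody.trim() === "" && afterBody.trim() !== "") return "addition";
  if (afterBody.trim() === "" && beforeBody.trim() !== "") return "removal";
  if (beforeBody === afterBody) return "unknown"; // identical bodies
  return null; // non-trivial diff: consult the zero-shot classifier
}
```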

# Supporting refactors

- `LocalRoleClassifier.classifyLabels(text, labels)` — new method
  that runs zero-shot against an arbitrary caller-supplied label
  set (vs the existing `classify()` which is hardcoded to the
  7-class role taxonomy). The classifier wraps the same pipeline,
  so both surfaces share one model load.

- `IndexHook` afterIndexAll/afterSync are already async-aware in
  the registry; the cochange hook had been synchronous. Made
  `applyResults`/`applyFullRescan`/`applyIncremental`/`refresh`
  async so the NLI fallback can await per-commit. Behaviour
  unchanged when `localRoleLlm` is unset.

# Tests

+ `__tests__/commit-intent-fallback.test.ts` — 13 cases covering
  short-circuit, NLI dispatch, low-confidence floor, error
  degradation, all 7 intent labels.
+ `__tests__/change-kind.test.ts` — 12 cases covering rule-based
  dispatch, NLI dispatch, low-confidence floor, error paths.
+ `__tests__/local-role-classifier.test.ts` — 7 new cases for
  `classifyLabels` (custom label set, short-circuit, abort,
  unknown labels NOT coerced to ROLE_LABELS).

# Why now

Web research (HF API + WebSearch agent) confirmed that five small
specialized models is the realistic ceiling for the in-process
transformers.js surface: no code-tuned summarizer or embedder ships
in a transformers.js-loadable form today.
remaining win for the existing surface was unifying everything that
used to chat-call onto the one bart-large-mnli load — both C and D
fit that shape.