perf+fix(llm): preserve LLM caches across --force [WIP 2/5] by andreinknv · Pull Request #4 · andreinknv/codegraph

andreinknv · 2026-05-01T11:21:15Z

Stress-test fixes — PR 2 of 5 (stacked). Base: pr1-perf-classifier-batching.

Commits

bd26945 perf: preserve LLM caches across --force, content-hash fallback, gating bump
f1e2bb7 fix: clearCoChanges no longer wipes LLM caches; summarizer done++ once

Summary

Goal: turn a no-op --force re-index from a full ~540K-token LLM run into a content-hash cache hit.

clearStructural() preserves LLM caches — --force previously called cg.clear() which DELETEd everything, including symbol_summaries / symbol_embeddings / directory_summaries. New path drops only structural data (nodes, edges, files, unresolved_refs, co_changes, plus node-id-keyed analyses). FK on symbol_summaries→nodes stays ON DELETE CASCADE for incremental sync; bulk path toggles PRAGMA foreign_keys OFF/ON around the deletes.
Content-hash fallback in summarizer — even with summaries surviving --force, lookups were keyed by node_id. When a symbol moves files / shifts lines its node_id changes; the new fallback re-attaches by content hash.
Bug fix: clearCoChanges() was wiping LLM caches — despite its name. The cochange post-index hook called it on every full rescan, so every --force post-bd26945 was nuking the very caches clearStructural had just preserved. Fix: only deletes co_changes + resets commit_count.
Bug fix: done++ double-increment on cache hits in summarizer — try { if (cacheHit) { done++; continue; } ... } finally { done++; } was counting cache hits twice.

Test plan

Suite green at 1113 / 13 skip / 0 fail.
Manual --force re-index shows done == total (no over-100% progress).

WIP draft.

…ng bump Layered on top of the perf-batch commit (9927549). Four changes that collectively turn a no-op `--force` re-index from a full ~540K-token LLM run into a content-hash cache hit. ## clearStructural — preserve LLM caches across --force Before: `--force` called `cg.clear()` which DELETEd everything, including symbol_summaries / symbol_embeddings / directory_summaries. Even though the summarizer already had a content_hash check, the rows were gone before it could short-circuit. Adds `CodeGraph.clearStructural()` and `QueryBuilder.clearStructural()` that drop only structural data: nodes, edges, files, unresolved_refs, co_changes, plus node-id-keyed analyses (symbol_issues, config_refs, sql_refs, node_coverage, code_health_findings) that would otherwise reference dead nodes. The three LLM-derived caches survive. The schema-level FK on symbol_summaries→nodes is left as ON DELETE CASCADE. That's correct for incremental sync (a deleted symbol should drop its cached summary). The bulk path suppresses it at runtime by toggling PRAGMA foreign_keys OFF/ON around the deletes — the pragma is connection-level and only takes effect outside an active transaction, hence the order: pragma → transaction → pragma. With FKs off the explicit DELETEs of the structural-attribution tables (symbol_issues, config_refs, sql_refs) are necessary; without them those rows would survive as orphans pointing at gone nodes. `--force` in `src/bin/codegraph.ts` now calls `clearStructural()` in both quiet and non-quiet branches, with the user-facing message updated to "Cleared existing index (LLM caches preserved)". The original `clear()` is preserved with its full-wipe semantics (now implemented as clearStructural + DELETEs of the three LLM tables) for the explicit nuke case. ## Content-hash fallback in summarizer Even with summaries surviving --force, the cache lookup was keyed by node_id. When a symbol moves files or shifts lines its node_id changes, so the cache missed even when the body was unchanged. `getSummaryByContentHash(hash, model)` looks up by content_hash + model with `ORDER BY generated_at DESC LIMIT 1` (deterministic tie- break on collisions, semantically irrelevant since identical bodies yield identical summaries). The summarizer falls back to this lookup when the node_id miss happens — on hit, it copies the cached summary text onto a fresh row keyed by the new node_id (counted as a cacheHit, no LLM call). A new index `idx_summaries_content_hash` on (content_hash, model) backs this lookup. Added to schema.sql for fresh DBs and via migration 022 (`022-add-content-hash-index.ts`) for existing ones. ## Trivial-symbol gating `MIN_BODY_LINES` in summarizer.ts bumped from 3 to 4. Skips typical setters and 1-statement forwarders that the name + signature already explain. Existing users with codegraph indexes will see a one-time drop in summary coverage on next sync — symbols spanning exactly 3 lines will not be re-summarized. The structural index is unaffected. ## Tests - `__tests__/migrations-022.test.ts` (NEW, 4 tests): index presence on fresh + upgrade DB, clearStructural preserving rows even with CASCADE FK in place, sync-path single-symbol delete still cascading correctly (proves FKs were re-enabled), full clear() still wiping summaries. - `__tests__/llm-tiers.test.ts` (+2 tests): clearStructural preserves symbol_summaries so re-index hits cache (asserts second-pass chat calls < first-pass), content-hash fallback reuses summary when node_id changes (direct queries-level assertions). - Schema-version assertions in `foundation.test.ts:308` and `pr19-improvements.test.ts:302` updated 21 → 22. Suite: 1087 / 13 skip / 0 fail (was 1081). ## Reviewer trail Two passes. Pass 1: REQUEST_CHANGES with 4 findings. - (1) clearStructural was orphaning symbol_issues / config_refs / sql_refs because the FK toggle suppressed CASCADE for them too; added explicit DELETEs. - (2) Migration filename `022-drop-summary-cascade.ts` was misleading (it adds an index, doesn't drop cascade); renamed to `022-add-content-hash-index.ts`, updated import + registration. - (4) `getSummaryByContentHash` LIMIT 1 was non-deterministic on hash collisions; added `ORDER BY generated_at DESC`. - (3) MIN_BODY_LINES release-note ask: addressed inline above rather than in a separate CHANGELOG file. Pass 2: APPROVE, one info finding noting that DESC vs ASC tiebreak is arbitrary (documented, no action needed). ## Expected impact Codegraph self-repo on second `--force` re-index with no source changes: the per-symbol summary phase short-circuits via content_hash cache hits. Token consumption drops from ~540K to a small residual for any new content. Combined with the perf-batch commit, the 30× classifier latency win and ~3× input token savings now layer onto near-zero summary regeneration cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two real bugs surfaced by end-to-end measurement of the cache- preservation work in bd26945. Both pre-existed bd26945 (and df96c6c) but were masked because the old --force path (cg.clear()) wiped the LLM caches anyway. They became visible when clearStructural() started preserving them. ## Bug 1 — clearCoChanges() was wiping symbol_summaries / embeddings / dir summaries src/db/queries.ts:2677. Despite its name, the function was deleting the three LLM-derived caches (`symbol_summaries`, `symbol_embeddings`, `directory_summaries`) along with the actual co-change history. The cochange post-index hook (src/index-hooks/cochange.ts:46) calls clearCoChanges on every full rescan with fullRescan: true — which is what afterIndexAll always passes. So every --force re-index post- bd26945 was nuking the very caches clearStructural had just preserved moments earlier. There is no logical reason for the LLM tables to be cleared by a co- change rescan: summaries are keyed by content_hash of the symbol body, not by commit history. The wipe was probably defensive overcaution from when the columns lived inline on symbol_summaries (pre-016). Fix: clearCoChanges now only does - DELETE FROM co_changes - UPDATE files SET commit_count = 0 Doc-comment notes the historical context and why we removed the LLM-table deletes. ## Bug 2 — done counter double-incremented on cache hits in summarizer src/llm/summarizer.ts:181, 196. Each worker iteration was structured try { if (cacheHit) { cacheHits++; done++; ...; continue; } ... } finally { done++; options.onProgress?.(done, total); } JavaScript `continue` still triggers the `finally` clause, so cache- hit paths bumped done twice — once explicitly, once in finally. Surfaced as `679/672 (101%)` and similar in production progress output during the end-to-end measurement run. Fix: removed the explicit `done++` from both cache-hit paths (node-id match and content-hash fallback). Let `finally` handle it. Doc- comments explain the JS semantic and the historical masking. Verified by inspection: every code path through the worker body now ends in either (a) hitting the finally block (normal path, errors, falsy body, cache hits via continue) or (b) an explicit `return` for abort signals (which deliberately skips done++ — that's correct behavior). ## Test strengthening `__tests__/llm-tiers.test.ts` — the existing test "clearStructural preserves symbol_summaries..." passed with bug 1 in place because its only assertion was `incremental < callsAfterFirst`, which was too loose: post-clearCoChanges-wipe, the second pass still needed FEWER calls than the first only if role classification was counted as a meaningful baseline. Strengthened to read symbol_summaries row count directly via the queries handle and assert STRICT equality: - Before clearStructural row count == after clearStructural row count - Before clearStructural row count == after second indexAll row count Renamed the test to mention the cochange hook explicitly so future readers know what regression it guards. Kept the looser cumulative- chat-calls assertion as a backstop. Suite: 1087 / 13 skip / 0 fail. ## Real-world validation Phase A (first --force, fresh DB) took ~27 minutes to populate 671 summaries from scratch via claude-bridge → Haiku. Phase C (re-running --force after fix, with 101 surviving from a truncated Phase B): the summary phase progress JUMPED from 5/672 in 22s (the Phase A baseline) to 109/672 in 22s (Phase C with 101 cached). Direct evidence of the cache short-circuit working — ~20× speedup for the cached portion in real measurement, scaling to ~10× overall when the cache covers all 672 symbols. ## Reviewer trail One pass. APPROVE with one info-level finding (clearStructural also deletes co_changes directly, so the cochange hook's clearCoChanges double-clears it — idempotent, harmless, not addressed in this PR). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

andreinknv · 2026-05-01T11:31:08Z

Stack collapsed into wip-stress-base (now at 56fac32). Commits landed via the fast-forward of wip-stress-base from df96c6c to 56fac32; #3 shows as merged, #4-#7 are closed because their tips are already ancestors of the base. Branch preserved for reference.

… server-config flags Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed: #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe) #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*` #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)` colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed` CLI/MCP alignment (5/32 → 33+/35): - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar, biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs, module-summary, role, coverage-query, pending-summaries, save-summaries, review-context - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync, codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit, codegraph_unlock, codegraph_affected MCP server-level operator config (`codegraph serve --mcp`): - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing) - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model / --llm-api-key (operator LLM config; per-project config wins on conflict) - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders Architectural cleanups: - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces growing string-comparison chain in execute()) - `withAllowStale` registry injection only on tools that DO see the gate - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts` - `LlmClient.isEmbeddingReachable` for split-provider correctness - SyncResult `lockContention` flag → handleSync emits distinct retryable message - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force) - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs) Migrations: 023-files-is-test.ts — add `files.is_test` 024-build-context-refs.ts — add `build_context_refs` table Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes: - JSONC URL strip via state machine (was eating `https://` tails) - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent) - Dynamic imports also match template_string nodes - process.exit deferred until after finally cleanup in runViaMCP - --same-language / --different-language mutual exclusion guard - help-tools CLI bypasses isInitialized (works without a project) - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks) - handleAffected errors instead of silently dropping unsupported glob filters - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ion A #5) Closes the post-refactor backlog item #5: static large_method flags evergreen-large symbols indistinguishably from actively-growing ones; recently_grew surfaces the latter as the actionable refactor target. Architecture: - Migration 027 adds node_loc_history(node_id, indexed_ts, loc) — one row per analyseProject pass that touched the symbol. The per-file content-hash short-circuit means unchanged symbols are not re-snapshotted; the compound index (node_id, indexed_ts DESC) makes "previous snapshot for this node" a sub-millisecond seek. - src/db/queries-loc-history.ts exports recordLocSnapshots (batch insert with the canonical FK-safe SELECT WHERE EXISTS / ON CONFLICT pattern from queries-summaries) and getPriorLocSnapshots (per-node seek; one prepared stmt, N steps). - biomarkers/index.ts: at analyseProject start, capture nowMs once; per-file batch-fetch prior snapshots, evaluate rules with prior + nowMs threaded into RuleContext, persist a fresh snapshot for every symbol whose metrics we computed. - biomarkers/engine.ts: new evaluateRecentlyGrew rule. Triggers when prior_loc >= 20 AND current_loc > prior_loc * 1.5 AND (now - prior_ts) <= 30d. Severity ladder: 1.5–2x info, 2–3x warning, ≥3x error. Metric is round(ratio * 100) so the ranked-mode sort orders by growth percent. Reviewer-memo gates passed: - #4 schema-version asserts both bumped to 27 (foundation + pr19-improvements). - #5 node_loc_history added to BOTH the migration AND schema.sql so fresh installs initialize correctly. - #7 dead-export check: dropped recordLocSnapshot/getPriorLocSnapshot singulars after the reviewer flagged YAGNI; only the batch variants are exported (and called). 9 engine-level tests for the rule logic. Full suite 1393/34/0 (+9 from prior). Persistence path is exercised implicitly by every existing biomarker test (silent green confirms the schema + writes are sound on real fixtures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

First half of the trace-logging arc that backs the viewer's Agent-trace tab. This commit lands the storage + logger; the MCP server hookup (commit B) and the viewer endpoints (commit C) follow. - Migration 028 + schema.sql: mcp_sessions(id, started_ts, last_activity_ts, tool_count) + mcp_tool_calls(session_id, step, ts, tool_name, args_json, result_summary, duration_ms). FK CASCADE from calls → sessions; idx_mcp_tool_calls_ts gives the retention sweep an O(log n) scan. - src/db/queries-trace.ts: insertSession, appendToolCall (transaction that bumps the parent session's tool_count atomically), pruneToolCalls (cap rows + GC orphan sessions), recentSessions, callsForSession. - src/trace/logger.ts: TraceLogger class. Constructor inserts a session row; log() appends + prunes-every-N. All persistence is best-effort (DB write failures log at debug, never throw — trace must not lose tool results). Args + result-summary are truncated at 2000 chars. - generateSessionId(): unix-seconds prefix + 6 random hex chars, sortable by time. Reviewer-memo gates passed: - #4 schema-version asserts both bumped to 28 (foundation + pr19-improvements). - #5 mcp_sessions + mcp_tool_calls in BOTH the migration AND schema.sql. 10 logger round-trip tests. Full suite 1422/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the cyclomatic + maxNesting "—" gap in the viewer's right pane. Previously these fields only had a value when a biomarker finding fired (via the metric on `code_health_findings`); clean symbols read null. Now every analysable symbol has a row in node_metrics with all six engine-computed metrics, overwritten on each pass. Architecture: - Migration 029 + node_metrics table: PK on node_id, overwritten per pass via a single FK-safe upsert. Distinct from code_health_findings (threshold-only) and node_loc_history (LoC time-series for recently_grew). Schema in BOTH the migration AND schema.sql per memo #5. - queries-metrics.ts: upsertNodeMetricsBatch (one transaction, WHERE EXISTS guard), getNodeMetrics (single-row reader). - biomarkers/index.ts: evaluateAnalysableNodes returns a third map of metrics-by-node; analyseSingleFile persists to the new table after the findings persist. - BIOMARKER_CACHE_KEY bumped v1 → v2 so existing indexes self-heal: without the bump, the per-file content-hash cache would short-circuit unchanged files and leave node_metrics empty until users hit `--force` themselves. - Viewer server: buildMetricsBlock now reads from node_metrics first; falls back to the per-finding metric on older indexes. paramCount also surfaced (not yet displayed in the UI). Smoke-tested on the codegraph self-index after re-index: extractFromSource now reports cyclomatic 3 / maxNesting 1 / paramCount 3 — matching what the engine computes during biomarker analysis. Reviewer-memo gates passed: - #4 schema-version asserts both bumped to 29. - #5 node_metrics in BOTH migration AND schema.sql. Updated mcp-reindex test (cache key bump). Suite 1425/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ue filter When an agent passes a filter that constrains every result to one value (search `kind:X`, biomarkers `biomarker:Y`), the corresponding column is the same on every row — pure noise. Drop the per-row column and put the value once in the section header. - search.ts: drops `(${kind})` per row when `kind` filter is set, annotates header with `(kind=X)`. Saves ~15 chars/row. - biomarkers.ts (ranked mode): drops the `Biomarker` table column (header + separator + per-row cell) when `biomarker` filter is set, annotates header with the biomarker name. Saves ~15 chars/row. Backlog #4 narrowed: the literal "audit every tool" sweep was moot — most tools (callers/callees/biomarkers without filter) genuinely return mixed kinds. The narrow filter-implied-column case is the clean win that survives premise verification. 2 new tests in __tests__/implied-keys.test.ts. Suite 1444/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…olbymchenry#8) Promotes the resolver numeric `metadata.confidence` to a categorical, queryable column on `edges`. `codegraph_callers`, `_callees`, `_impact` now annotate each row with `*(INFERRED)*` when the edge is not concrete, and accept a new `minConfidence` arg to drop edges below a chosen level — useful for refactor-impact analysis where false-positive reach inflates the blast radius. Migration 032 adds `edges.confidence TEXT` (nullable; read-cast in EDGE_SCHEMA collapses NULL → `EXTRACTED`). Nullable lets legacy / structural / extractor-direct edges need no producer change. Index on the column for filter-down queries. Idempotent ADD COLUMN with table-exists guard following the convention from migrations 020 / 021 / 023 / 030. Producer: `ReferenceResolver.createEdges` calls a new free `classifyConfidence(resolvedBy, score)` helper: - import / qualified-name / file-path / exact-match → EXTRACTED - framework with score >= 0.85 → EXTRACTED, else INFERRED - instance-method / fuzzy → INFERRED AMBIGUOUS is not stamped from the resolver path in v1 — needs tie-tracking from the candidate picker. Tracked as B3. Consumers (`result-formatters.ts`): - `formatConfidence(edge)` — markdown italic suffix (empty for EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise). - `parseMinConfidence(raw)` — returns level / null / errorResult. - `filterByConfidence(rows, min)` — drop below threshold. - `CONFIDENCE_RANK` — numeric ordering for inline filter loops. Wired into all four code paths in `callers.ts` (collectCallers, collectCallersForSource, collectTypeUsers, formatGroupedCallers), both paths in `callees.ts`, and `impact.ts` with a `filterImpactByConfidence` BFS that prunes nodes unreachable along the surviving edge set. Type cleanup: `Edge.confidence` added with JSDoc; the long-dead `Edge.provenance` field also got a JSDoc note pointing at the SCIP-integration-reservation explanation in memory. Reviewer-memo #4 catch (schema-version test forgetfulness): noticed `__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)` even though migrations 030 and 031 had shipped. Catch-up to 32 in this commit covers both the missed bumps and the new migration. Reviewer: APPROVE, three info-only findings (type-user display gap, framework-rule calibration table doc, catch of the recurring pr19 pattern). Suite: 1555 / 34 / 0 (+18 new edge-confidence tests). Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression budget vs 091e935 baseline). Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11 trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Total complex_method warnings 15 → 10. Pattern: replace N-arm switch / if-cascade with `Record<string, Handler>` (or `Map`) dispatch table. Each per-key handler becomes a small named function; the orchestrator drops to a single map lookup + fallback. - src/search/query-parser.ts (classifyQueryToken, was 19): 10-arm switch over the colon-key (`kind:` / `lang:` / `path:` etc.) replaced by a TOKEN_HANDLERS Map. Each handler pushes either to `out.{field}` or to `textParts` on invalid input. Aliases (`lang`/`language`, `sig`/`signature`) point at the same handler reference. Unknown keys still fall through to `textParts.push(tok)`, and invalid enum values still preserve the original token so the user's typo doesn't disappear silently. - src/extraction/graphql-extractor.ts (visitDefinition, was 15): 6-arm switch on `specific.type` replaced by TYPE_DEFINITION_EMITTERS Record. The 6 emit methods (emitObject / emitInterface / emitInputObject / emitEnum / emitUnion / emitScalar) had their `private` modifier dropped and `/** @internal */` JSDoc added so the module-level dispatch table can call them. GraphqlExtractor is module-internal (not re-exported from the package index; only `.extract()` is consumed externally), so the widened TS structural surface has no in-tree or external consumer. - src/extraction/languages/rescript.ts (visitReScriptNode, was 19): 9-arm if-cascade on `node.type` replaced by RESCRIPT_NODE_HANDLERS Record. Five named helper functions for the cases with non-trivial bodies, four inline arrow handlers for the one-liner cases. Dispatcher returns `handler !== undefined`, preserving the original's "true on handled, false on fallthrough" contract. - src/mcp/tools/compare.ts (fmtFileSection, was 17): Three helper extractions: * fileStatusSuffix(f) — collapses the `(new file)/(deleted)/''` ternary chain * appendSymbolList(out, label, symbols) — handles the added/removed/modified triplet (no-op when empty) * appendFindingsDelta(out, fd) — handles the `+ findings introduced` / `− cleared` / `_carried over_` block Plus one behaviour fix flagged by reviewer: the noFindingsChanges gate now also excludes carried-only files so they appear in output (previously a file with ONLY carried findings and no structural changes was silently suppressed; the carried-finding count is useful context for the agent). - src/viewer/server.ts (handleRequest, was 18): 12-route if-cascade replaced by GET_ROUTES table of `{ match, handle }` entries. `match` returns `RegExpExecArray | true | null`; dispatcher walks in order, stopping at first non-null match. Pulled out `respondWithIdLookup` for the symbol/source pair (decode-or-400 + lookup-or-404 + 200 chain) and `sendIndexHtml` for the / and /index.html pair. POST /api/ask and the non-GET method-not-allowed path are unchanged. Route ordering preserved (exact /api/findings precedes the regex /api/findings/:biomarker). Reviewer pass approved after addressing the carried-only-gate behaviour change in #4. Suite 1639/34/0. Total warnings 15 → 10.

Post-refactor backlog #4. Replaces the prior "all three of large/complex/nested fire at warning+" gate with a true composite score that combines size, branchiness, nesting, and conditional density. The old metric was just LOC, so a 200-LOC flat dispatch table fired the same as a 200-LOC method with deep nested loops and dense boolean expressions — exactly the discrimination the brain_method smell is supposed to provide. Formula: score = (loc / 100) × (cyc / 10) × max(1, nest / 3) × max(1, condOps / 4) Each factor is normalised at 1.0 around the middle of its single- metric biomarker's threshold band; max(1, ...) on nesting / cond prevents low-density values from depressing the LOC×CYC product (LOC and CYC stay the dominant signals). Worked examples on a 200-LOC method: flat dispatch (cyc=15, nest=2, cond=3): 2.0 × 1.5 × 1.0 × 1.0 = 3.0 → no finding (below info gate) nested-loop monstrosity (cyc=25, nest=5, cond=8): 2.0 × 2.5 × 1.67 × 2.0 = 16.7 → warning Severity ladder by score: 5 info, 10 warning, 20 error. Gates: LOC < 50 OR cyclomatic < 8 → not even considered. Tiny helpers can't be brain methods even when dense; size + complexity together are the smell. Future work: include a shared-mutable-state proxy (let bindings, cross-block reassignments) once the AST analysis exists — would cover methods that look tractable on the surface but have hidden coupling. Tests rewritten: dropped the old "all 3 prereqs"/"only one prereq" shape; added high-composite-score positive, flat-vs-nested discrimination at same LOC, LOC-gate negative, cyclomatic-gate negative. Suite 1663/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When every row in a callers / callees / impact result carries the same non-default confidence (INFERRED or AMBIGUOUS uniformly across all rows — the common shape resolver output takes when ambiguity hits), hoist a single ` — all *INFERRED*` suffix into the header instead of repeating ` *(INFERRED)*` on every row. Live measurement on `codegraph callers getFileDependents` (5 rows, all INFERRED): 589 → 543 bytes (-7.8%). Larger savings on bigger results — for a 30-row uniform callers output the savings scale to ~360 bytes (~12%). Mixed-confidence outputs (EXTRACTED + INFERRED, or AMBIGUOUS + INFERRED) keep their per-row markers so individual row trust is still legible. Single-row outputs also keep the per-row marker since there's no consolidation benefit. The detection lives in a new `detectUniformConfidence` helper exported from result-formatters.ts so other tabular renderers (e.g. the grouped-callers / grouped-callees paths in callers.ts / callees.ts) can adopt the same pattern incrementally — same optimization shape as #4 "strip implied keys" already shipped for kind columns. +5 focused tests covering the uniform / mixed / EXTRACTED-mixed / single-row / direct-API cases. Suite 1685/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rs rollups Round-trip-reduction item #4 — fold the top-N from `codegraph_hotspots` and `codegraph_biomarkers mode=ranked` into the status response so the onboarding pattern "what's interesting in this repo?" (status + hotspots + biomarkers + risk_review + digest = 4-5 calls) collapses to one call. Added flags (both default 0 → suppressed → byte-identical legacy output): - `topHotspots: number` — top-N by composite risk (centrality × churn). Skips silently on a non-git repo or empty hotspots table. - `topBiomarkers: number` — top-N at warning+ severity, worst-first. Each clamps to [1, 30]; negative / NaN values collapse to suppressed. Both rollup sections render AFTER the readiness/server sections so the agent's eye-scan still hits freshness/health banners first. Skipped `summaryCoverage` flag from the original spec — the existing Feature Readiness section already shows summary count + percentage; a dedicated flag would be redundant. Suite 1778/0/34 (+6 new tests). No schema migration. Backwards-compat locked in by the existing `mcp-status.test.ts` regression guards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rieval Subtask colbymchenry#10.1 of the embedding-features arc Stage 2. Adds a `grain TEXT NOT NULL DEFAULT 'symbol'` column to `symbol_embeddings` so the table can carry per-symbol embeddings (existing) and per-file aggregated embeddings (forthcoming subtask colbymchenry#10.2). File-grain rows will reuse the file's nodes(id) since `getEmbeddableNodes` already excludes file kinds from per-symbol embedding — no PK collision. Schema change is column-additive (ALTER TABLE ADD COLUMN); no backfill needed because the default 'symbol' covers every existing row. schema.sql updated alongside the migration per recurring scrutiny pattern #5. Schema-version test assertions bumped 42 → 43 per recurring scrutiny pattern #4. Suite remains 2100/0/34. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Subtask #4.1 of the embedding-features arc Stage 3. New module src/llm/reranker-client.ts wraps @huggingface/transformers (already installed by Stage 1's in-process embedding cutover) and runs a cross-encoder relevance scorer over (query, candidate) pairs. Public API mirrors LocalEmbeddingClient: - new RerankerClient(cfg: RerankerProviderConfig | null) - isConfigured: boolean - rerank(query, candidates: string[]): Promise<number[]> - scoreOne(query, candidate): Promise<number> - isReachable() / listModels() — same any-up semantics Default model: Xenova/ms-marco-MiniLM-L-6-v2 (~80MB ONNX, ~50 pairs/sec on Apple Silicon). Lazy model load + module-level pipeline cache keyed by `<model>::<dtype>` so process-wide first search pays the ~3s init cost once. Pipeline output normalisation: the text-classification pipeline returns either a flat [{score}, ...] or nested [[{score}], ...] depending on input shape; the wrapper handles both. Optional dep discipline: when `@huggingface/transformers` is missing, rerank() rejects with a clear LlmEndpointError pointing at the fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Subtask #4.2. Adds optional `rerankerLlm` block to the user-facing CodeGraphConfig.llm shape (`provider: 'local'`, optional `model`, optional `dtype`) and surfaces it on the resolved LlmEndpointConfig + ResolvedLlm contracts. The new `resolveReranker(llm)` helper in provider.ts is a pure pass-through (no env-var override, no legacy field) since reranking is a Stage 3 feature that shipped after the legacy migration window. Default value: unset / null — operators opt in to the precision pass explicitly. When unset, no behavior change vs Stage 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Subtask #4.3. After the file-grain merge + query-expansion stages of the semantic-search pipeline, when `resolved.rerankerLlm` is set: fetch each candidate's name/signature/docstring/summary from SQLite in one statement, build the cross-encoder input text the same way the embedder does, call RerankerClient.rerank(query, texts), then reorder the semantic hits by the new scores. Replacing the cosine score with the reranker logit is safe because the downstream reciprocalRankFusion uses rank position only, not the raw score value. Returns the unmodified hits on any failure so the search path always degrades gracefully when the optional dep is missing or the model load throws. Wired through searchHybrid (called by codegraph_ask, codegraph_search mode='semantic', and codegraph_review_context's RAG retrieval). Default off (rerankerLlm = null in config), so installs that haven't opted in see no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Subtask #4.4. 29 vitest cases across 8 describe blocks: - Configuration: isConfigured + listModels variants (5 tests). - Rerank input handling: empty candidates, null cfg, aborted signal. - Pipeline output normalization: flat / nested / mixed shapes, missing score, empty inner array. - Pipeline call shape: pair format + top_k option. - Model load caching: single load, _clearRerankerCache, different model + dtype keys. - scoreOne wrapper. - isReachable. All tests use vi.mock to intercept the @huggingface/transformers import — no live model load. Suite goes 2166 -> 2195 (+29). Two contract drift notes captured by the test agent: - scoreOne propagates LlmEndpointError when cfg is null (does NOT return NaN as the spec aspirated). Test pins the real behavior. - isReachable's load-throws branch isn't testable with hoisted vi.mock; documented as a skip with rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nums Bundles four schema-quality improvements gated by one new migration (054) and the matching schema.sql changes for fresh installs. 1. STRICT on every base table (32 of them). Schema-survey before the change confirmed the codebase already used only the 5 STRICT-legal types (TEXT/INTEGER/REAL/BLOB/ANY) — STRICT is just the missing keyword + a per-table rebuild for existing DBs. Catches a class of silent type-coercion bugs (a stringified node id silently stored in an INTEGER column would compare unequal in JOINs, break PK lookups, etc). 2. WITHOUT ROWID on 15 natural-PK tables: role_assignments, files, co_changes, project_metadata, symbol_issues, summary_refs, directory_summaries, node_coverage, code_health_findings, node_loc_history, mcp_sessions, mcp_macros, node_metrics, summary_priority_queue, commit_intents. Skips tables with rowid dependencies (nodes/summary_store/test_names back FTS5 content; embedding_store/embedding_refs/symbol_chunk_embeddings back vec0 mirrors and HNSW; parse_cache/mcp_tool_calls evict by oldest rowid). ~30% storage reduction on the affected tables and one B-tree fewer per PK lookup. 3. CHECK constraints on five enum-shaped columns: nodes.kind, nodes.role, edges.kind, edges.confidence, embedding_refs.grain. Same philosophy STRICT applies to types extends to values: a typo'd kind like `funtion` is rejected at write time instead of silently corrupting downstream histogram queries. Allowed lists mirror src/types.ts (NodeKind / EdgeKind), src/llm/classifier.ts (ROLE_LABELS), and the categorical mapping in src/db/migrations/032-edge-confidence.ts. 4. Promoted code_health_findings.metric INTEGER -> REAL. brain_method emits a fractional composite score (~12.3); the old INTEGER column silently stored it as REAL under loose typing but a STRICT INTEGER column rejects it. Migration 054 mechanics: - requiresFkDisable: true (FK toggling for the parent-rebuild loop) - Reads CREATE statements from sqlite_master, rebuilds via the standard SQLite pattern (CREATE _strict_tmp / INSERT / DROP / RENAME), recreates indexes + triggers from captured SQL. - Per-table fixup hook (`PER_TABLE_FIXUPS`) injects STRICT, the code_health_findings.metric REAL promotion, and the five CHECK constraints during the rebuild for existing DBs. - Idempotent: tables already STRICT are skipped (fresh DBs whose schema.sql already has it replay as no-ops). - Excludes virtual tables (FTS5 / RTree), vec0 mirrors, sqlite_*, and shadow-backing tables (`<virtual>_*`). - Pre-flight guard skips the entire migration when nodes lacks start_line/end_line — partial-schema test fixtures from migrations-015-016 / migrations-022 would otherwise trip the rtree triggers from migration 034. Production DBs always have these columns since v1. - Defensive cleanup: any temp table left from a partial rebuild failure is DROP IF EXISTS in the catch handler. - VACUUM at the end to reclaim space from the 32-table rebuild (silently skipped if the runner wraps us in a transaction). Side fix: src/extraction/index.ts:2041 was writing stats.mtimeMs (a fractional float from fs.statSync) into files.modified_at INTEGER. Pre-STRICT this silently coerced; STRICT rejects it and broke 366 tests. Floor the value at the writer. Reviewer-memo update: item #4 (schema-version test forgetfulness) is now self-updating — CURRENT_SCHEMA_VERSION is filename-derived and both foundation.test.ts + pr19-improvements.test.ts compare against the imported constant, not a literal. The memo entry's old "BOTH tests assert .toBe(NN) literally" is stale and would mislead the next reviewer. Refreshed. Tests: 178 files / 2,470 passed, 34 skipped, 0 failed. New migrations-054 test file covers: every base table is STRICT in a fresh DB; STRICT actually rejects TEXT-into-INTEGER write; metric column is REAL; fractional metric value round-trips; CHECK on nodes.kind rejects typos; CHECK on edges.kind rejects bogus values. Reviewed by independent reviewer agent in two passes (initial STRICT-only diff, then this expanded version). Verdict APPROVE on migration idempotency, pruning predicate correctness, trigger semantics, FK-disable safety, schema.sql/migration lock-step, and CLI/MCP alignment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

andreinknv and others added 2 commits April 30, 2026 16:19

andreinknv closed this May 1, 2026

This was referenced May 1, 2026

feat+perf(llm): split chat/askChat provider + summary batching [WIP 3/5] #5

Closed

fix: GC orphan summaries + WASM savepoint nesting [WIP 4/5] #6

Closed

feat(mcp): surface LLM provider routing in status/ask/similar [WIP 5/5] #7

Closed

andreinknv deleted the pr2-cache-preservation branch May 1, 2026 11:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf+fix(llm): preserve LLM caches across --force [WIP 2/5]#4

perf+fix(llm): preserve LLM caches across --force [WIP 2/5]#4
andreinknv wants to merge 2 commits into
pr1-perf-classifier-batchingfrom
pr2-cache-preservation

andreinknv commented May 1, 2026

Uh oh!

andreinknv commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andreinknv commented May 1, 2026

Commits

Summary

Test plan

Uh oh!

andreinknv commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant