perf+fix(llm): preserve LLM caches across --force [WIP 2/5]#4
Closed
andreinknv wants to merge 2 commits into
Closed
perf+fix(llm): preserve LLM caches across --force [WIP 2/5]#4andreinknv wants to merge 2 commits into
andreinknv wants to merge 2 commits into
Conversation
…ng bump Layered on top of the perf-batch commit (9927549). Four changes that collectively turn a no-op `--force` re-index from a full ~540K-token LLM run into a content-hash cache hit. ## clearStructural — preserve LLM caches across --force Before: `--force` called `cg.clear()` which DELETEd everything, including symbol_summaries / symbol_embeddings / directory_summaries. Even though the summarizer already had a content_hash check, the rows were gone before it could short-circuit. Adds `CodeGraph.clearStructural()` and `QueryBuilder.clearStructural()` that drop only structural data: nodes, edges, files, unresolved_refs, co_changes, plus node-id-keyed analyses (symbol_issues, config_refs, sql_refs, node_coverage, code_health_findings) that would otherwise reference dead nodes. The three LLM-derived caches survive. The schema-level FK on symbol_summaries→nodes is left as ON DELETE CASCADE. That's correct for incremental sync (a deleted symbol should drop its cached summary). The bulk path suppresses it at runtime by toggling PRAGMA foreign_keys OFF/ON around the deletes — the pragma is connection-level and only takes effect outside an active transaction, hence the order: pragma → transaction → pragma. With FKs off the explicit DELETEs of the structural-attribution tables (symbol_issues, config_refs, sql_refs) are necessary; without them those rows would survive as orphans pointing at gone nodes. `--force` in `src/bin/codegraph.ts` now calls `clearStructural()` in both quiet and non-quiet branches, with the user-facing message updated to "Cleared existing index (LLM caches preserved)". The original `clear()` is preserved with its full-wipe semantics (now implemented as clearStructural + DELETEs of the three LLM tables) for the explicit nuke case. ## Content-hash fallback in summarizer Even with summaries surviving --force, the cache lookup was keyed by node_id. When a symbol moves files or shifts lines its node_id changes, so the cache missed even when the body was unchanged. `getSummaryByContentHash(hash, model)` looks up by content_hash + model with `ORDER BY generated_at DESC LIMIT 1` (deterministic tie- break on collisions, semantically irrelevant since identical bodies yield identical summaries). The summarizer falls back to this lookup when the node_id miss happens — on hit, it copies the cached summary text onto a fresh row keyed by the new node_id (counted as a cacheHit, no LLM call). A new index `idx_summaries_content_hash` on (content_hash, model) backs this lookup. Added to schema.sql for fresh DBs and via migration 022 (`022-add-content-hash-index.ts`) for existing ones. ## Trivial-symbol gating `MIN_BODY_LINES` in summarizer.ts bumped from 3 to 4. Skips typical setters and 1-statement forwarders that the name + signature already explain. Existing users with codegraph indexes will see a one-time drop in summary coverage on next sync — symbols spanning exactly 3 lines will not be re-summarized. The structural index is unaffected. ## Tests - `__tests__/migrations-022.test.ts` (NEW, 4 tests): index presence on fresh + upgrade DB, clearStructural preserving rows even with CASCADE FK in place, sync-path single-symbol delete still cascading correctly (proves FKs were re-enabled), full clear() still wiping summaries. - `__tests__/llm-tiers.test.ts` (+2 tests): clearStructural preserves symbol_summaries so re-index hits cache (asserts second-pass chat calls < first-pass), content-hash fallback reuses summary when node_id changes (direct queries-level assertions). - Schema-version assertions in `foundation.test.ts:308` and `pr19-improvements.test.ts:302` updated 21 → 22. Suite: 1087 / 13 skip / 0 fail (was 1081). ## Reviewer trail Two passes. Pass 1: REQUEST_CHANGES with 4 findings. - (1) clearStructural was orphaning symbol_issues / config_refs / sql_refs because the FK toggle suppressed CASCADE for them too; added explicit DELETEs. - (2) Migration filename `022-drop-summary-cascade.ts` was misleading (it adds an index, doesn't drop cascade); renamed to `022-add-content-hash-index.ts`, updated import + registration. - (4) `getSummaryByContentHash` LIMIT 1 was non-deterministic on hash collisions; added `ORDER BY generated_at DESC`. - (3) MIN_BODY_LINES release-note ask: addressed inline above rather than in a separate CHANGELOG file. Pass 2: APPROVE, one info finding noting that DESC vs ASC tiebreak is arbitrary (documented, no action needed). ## Expected impact Codegraph self-repo on second `--force` re-index with no source changes: the per-symbol summary phase short-circuits via content_hash cache hits. Token consumption drops from ~540K to a small residual for any new content. Combined with the perf-batch commit, the 30× classifier latency win and ~3× input token savings now layer onto near-zero summary regeneration cost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two real bugs surfaced by end-to-end measurement of the cache- preservation work in bd26945. Both pre-existed bd26945 (and df96c6c) but were masked because the old --force path (cg.clear()) wiped the LLM caches anyway. They became visible when clearStructural() started preserving them. ## Bug 1 — clearCoChanges() was wiping symbol_summaries / embeddings / dir summaries src/db/queries.ts:2677. Despite its name, the function was deleting the three LLM-derived caches (`symbol_summaries`, `symbol_embeddings`, `directory_summaries`) along with the actual co-change history. The cochange post-index hook (src/index-hooks/cochange.ts:46) calls clearCoChanges on every full rescan with fullRescan: true — which is what afterIndexAll always passes. So every --force re-index post- bd26945 was nuking the very caches clearStructural had just preserved moments earlier. There is no logical reason for the LLM tables to be cleared by a co- change rescan: summaries are keyed by content_hash of the symbol body, not by commit history. The wipe was probably defensive overcaution from when the columns lived inline on symbol_summaries (pre-016). Fix: clearCoChanges now only does - DELETE FROM co_changes - UPDATE files SET commit_count = 0 Doc-comment notes the historical context and why we removed the LLM-table deletes. ## Bug 2 — done counter double-incremented on cache hits in summarizer src/llm/summarizer.ts:181, 196. Each worker iteration was structured try { if (cacheHit) { cacheHits++; done++; ...; continue; } ... } finally { done++; options.onProgress?.(done, total); } JavaScript `continue` still triggers the `finally` clause, so cache- hit paths bumped done twice — once explicitly, once in finally. Surfaced as `679/672 (101%)` and similar in production progress output during the end-to-end measurement run. Fix: removed the explicit `done++` from both cache-hit paths (node-id match and content-hash fallback). Let `finally` handle it. Doc- comments explain the JS semantic and the historical masking. Verified by inspection: every code path through the worker body now ends in either (a) hitting the finally block (normal path, errors, falsy body, cache hits via continue) or (b) an explicit `return` for abort signals (which deliberately skips done++ — that's correct behavior). ## Test strengthening `__tests__/llm-tiers.test.ts` — the existing test "clearStructural preserves symbol_summaries..." passed with bug 1 in place because its only assertion was `incremental < callsAfterFirst`, which was too loose: post-clearCoChanges-wipe, the second pass still needed FEWER calls than the first only if role classification was counted as a meaningful baseline. Strengthened to read symbol_summaries row count directly via the queries handle and assert STRICT equality: - Before clearStructural row count == after clearStructural row count - Before clearStructural row count == after second indexAll row count Renamed the test to mention the cochange hook explicitly so future readers know what regression it guards. Kept the looser cumulative- chat-calls assertion as a backstop. Suite: 1087 / 13 skip / 0 fail. ## Real-world validation Phase A (first --force, fresh DB) took ~27 minutes to populate 671 summaries from scratch via claude-bridge → Haiku. Phase C (re-running --force after fix, with 101 surviving from a truncated Phase B): the summary phase progress JUMPED from 5/672 in 22s (the Phase A baseline) to 109/672 in 22s (Phase C with 101 cached). Direct evidence of the cache short-circuit working — ~20× speedup for the cached portion in real measurement, scaling to ~10× overall when the cache covers all 672 symbols. ## Reviewer trail One pass. APPROVE with one info-level finding (clearStructural also deletes co_changes directly, so the cochange hook's clearCoChanges double-clears it — idempotent, harmless, not addressed in this PR). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
This was referenced May 1, 2026
andreinknv
added a commit
that referenced
this pull request
May 1, 2026
… server-config flags Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed: #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe) #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*` #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)` colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed` CLI/MCP alignment (5/32 → 33+/35): - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar, biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs, module-summary, role, coverage-query, pending-summaries, save-summaries, review-context - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync, codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit, codegraph_unlock, codegraph_affected MCP server-level operator config (`codegraph serve --mcp`): - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing) - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model / --llm-api-key (operator LLM config; per-project config wins on conflict) - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders Architectural cleanups: - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces growing string-comparison chain in execute()) - `withAllowStale` registry injection only on tools that DO see the gate - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts` - `LlmClient.isEmbeddingReachable` for split-provider correctness - SyncResult `lockContention` flag → handleSync emits distinct retryable message - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force) - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs) Migrations: 023-files-is-test.ts — add `files.is_test` 024-build-context-refs.ts — add `build_context_refs` table Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes: - JSONC URL strip via state machine (was eating `https://` tails) - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent) - Dynamic imports also match template_string nodes - process.exit deferred until after finally cleanup in runViaMCP - --same-language / --different-language mutual exclusion guard - help-tools CLI bypasses isInitialized (works without a project) - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks) - handleAffected errors instead of silently dropping unsupported glob filters - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 3, 2026
…ion A #5) Closes the post-refactor backlog item #5: static large_method flags evergreen-large symbols indistinguishably from actively-growing ones; recently_grew surfaces the latter as the actionable refactor target. Architecture: - Migration 027 adds node_loc_history(node_id, indexed_ts, loc) — one row per analyseProject pass that touched the symbol. The per-file content-hash short-circuit means unchanged symbols are not re-snapshotted; the compound index (node_id, indexed_ts DESC) makes "previous snapshot for this node" a sub-millisecond seek. - src/db/queries-loc-history.ts exports recordLocSnapshots (batch insert with the canonical FK-safe SELECT WHERE EXISTS / ON CONFLICT pattern from queries-summaries) and getPriorLocSnapshots (per-node seek; one prepared stmt, N steps). - biomarkers/index.ts: at analyseProject start, capture nowMs once; per-file batch-fetch prior snapshots, evaluate rules with prior + nowMs threaded into RuleContext, persist a fresh snapshot for every symbol whose metrics we computed. - biomarkers/engine.ts: new evaluateRecentlyGrew rule. Triggers when prior_loc >= 20 AND current_loc > prior_loc * 1.5 AND (now - prior_ts) <= 30d. Severity ladder: 1.5–2x info, 2–3x warning, ≥3x error. Metric is round(ratio * 100) so the ranked-mode sort orders by growth percent. Reviewer-memo gates passed: - #4 schema-version asserts both bumped to 27 (foundation + pr19-improvements). - #5 node_loc_history added to BOTH the migration AND schema.sql so fresh installs initialize correctly. - #7 dead-export check: dropped recordLocSnapshot/getPriorLocSnapshot singulars after the reviewer flagged YAGNI; only the batch variants are exported (and called). 9 engine-level tests for the rule logic. Full suite 1393/34/0 (+9 from prior). Persistence path is exercised implicitly by every existing biomarker test (silent green confirms the schema + writes are sound on real fixtures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 3, 2026
First half of the trace-logging arc that backs the viewer's Agent-trace tab. This commit lands the storage + logger; the MCP server hookup (commit B) and the viewer endpoints (commit C) follow. - Migration 028 + schema.sql: mcp_sessions(id, started_ts, last_activity_ts, tool_count) + mcp_tool_calls(session_id, step, ts, tool_name, args_json, result_summary, duration_ms). FK CASCADE from calls → sessions; idx_mcp_tool_calls_ts gives the retention sweep an O(log n) scan. - src/db/queries-trace.ts: insertSession, appendToolCall (transaction that bumps the parent session's tool_count atomically), pruneToolCalls (cap rows + GC orphan sessions), recentSessions, callsForSession. - src/trace/logger.ts: TraceLogger class. Constructor inserts a session row; log() appends + prunes-every-N. All persistence is best-effort (DB write failures log at debug, never throw — trace must not lose tool results). Args + result-summary are truncated at 2000 chars. - generateSessionId(): unix-seconds prefix + 6 random hex chars, sortable by time. Reviewer-memo gates passed: - #4 schema-version asserts both bumped to 28 (foundation + pr19-improvements). - #5 mcp_sessions + mcp_tool_calls in BOTH the migration AND schema.sql. 10 logger round-trip tests. Full suite 1422/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 3, 2026
Closes the cyclomatic + maxNesting "—" gap in the viewer's right pane. Previously these fields only had a value when a biomarker finding fired (via the metric on `code_health_findings`); clean symbols read null. Now every analysable symbol has a row in node_metrics with all six engine-computed metrics, overwritten on each pass. Architecture: - Migration 029 + node_metrics table: PK on node_id, overwritten per pass via a single FK-safe upsert. Distinct from code_health_findings (threshold-only) and node_loc_history (LoC time-series for recently_grew). Schema in BOTH the migration AND schema.sql per memo #5. - queries-metrics.ts: upsertNodeMetricsBatch (one transaction, WHERE EXISTS guard), getNodeMetrics (single-row reader). - biomarkers/index.ts: evaluateAnalysableNodes returns a third map of metrics-by-node; analyseSingleFile persists to the new table after the findings persist. - BIOMARKER_CACHE_KEY bumped v1 → v2 so existing indexes self-heal: without the bump, the per-file content-hash cache would short-circuit unchanged files and leave node_metrics empty until users hit `--force` themselves. - Viewer server: buildMetricsBlock now reads from node_metrics first; falls back to the per-finding metric on older indexes. paramCount also surfaced (not yet displayed in the UI). Smoke-tested on the codegraph self-index after re-index: extractFromSource now reports cyclomatic 3 / maxNesting 1 / paramCount 3 — matching what the engine computes during biomarker analysis. Reviewer-memo gates passed: - #4 schema-version asserts both bumped to 29. - #5 node_metrics in BOTH migration AND schema.sql. Updated mcp-reindex test (cache key bump). Suite 1425/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 3, 2026
…ue filter
When an agent passes a filter that constrains every result to one value
(search `kind:X`, biomarkers `biomarker:Y`), the corresponding column
is the same on every row — pure noise. Drop the per-row column and put
the value once in the section header.
- search.ts: drops `(${kind})` per row when `kind` filter is set,
annotates header with `(kind=X)`. Saves ~15 chars/row.
- biomarkers.ts (ranked mode): drops the `Biomarker` table column
(header + separator + per-row cell) when `biomarker` filter is set,
annotates header with the biomarker name. Saves ~15 chars/row.
Backlog #4 narrowed: the literal "audit every tool" sweep was moot —
most tools (callers/callees/biomarkers without filter) genuinely
return mixed kinds. The narrow filter-implied-column case is the
clean win that survives premise verification.
2 new tests in __tests__/implied-keys.test.ts. Suite 1444/34/0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 3, 2026
…olbymchenry#8) Promotes the resolver numeric `metadata.confidence` to a categorical, queryable column on `edges`. `codegraph_callers`, `_callees`, `_impact` now annotate each row with `*(INFERRED)*` when the edge is not concrete, and accept a new `minConfidence` arg to drop edges below a chosen level — useful for refactor-impact analysis where false-positive reach inflates the blast radius. Migration 032 adds `edges.confidence TEXT` (nullable; read-cast in EDGE_SCHEMA collapses NULL → `EXTRACTED`). Nullable lets legacy / structural / extractor-direct edges need no producer change. Index on the column for filter-down queries. Idempotent ADD COLUMN with table-exists guard following the convention from migrations 020 / 021 / 023 / 030. Producer: `ReferenceResolver.createEdges` calls a new free `classifyConfidence(resolvedBy, score)` helper: - import / qualified-name / file-path / exact-match → EXTRACTED - framework with score >= 0.85 → EXTRACTED, else INFERRED - instance-method / fuzzy → INFERRED AMBIGUOUS is not stamped from the resolver path in v1 — needs tie-tracking from the candidate picker. Tracked as B3. Consumers (`result-formatters.ts`): - `formatConfidence(edge)` — markdown italic suffix (empty for EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise). - `parseMinConfidence(raw)` — returns level / null / errorResult. - `filterByConfidence(rows, min)` — drop below threshold. - `CONFIDENCE_RANK` — numeric ordering for inline filter loops. Wired into all four code paths in `callers.ts` (collectCallers, collectCallersForSource, collectTypeUsers, formatGroupedCallers), both paths in `callees.ts`, and `impact.ts` with a `filterImpactByConfidence` BFS that prunes nodes unreachable along the surviving edge set. Type cleanup: `Edge.confidence` added with JSDoc; the long-dead `Edge.provenance` field also got a JSDoc note pointing at the SCIP-integration-reservation explanation in memory. Reviewer-memo #4 catch (schema-version test forgetfulness): noticed `__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)` even though migrations 030 and 031 had shipped. Catch-up to 32 in this commit covers both the missed bumps and the new migration. Reviewer: APPROVE, three info-only findings (type-user display gap, framework-rule calibration table doc, catch of the recurring pr19 pattern). Suite: 1555 / 34 / 0 (+18 new edge-confidence tests). Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression budget vs 091e935 baseline). Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11 trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 4, 2026
Total complex_method warnings 15 → 10. Pattern: replace N-arm
switch / if-cascade with `Record<string, Handler>` (or `Map`)
dispatch table. Each per-key handler becomes a small named
function; the orchestrator drops to a single map lookup +
fallback.
- src/search/query-parser.ts (classifyQueryToken, was 19):
10-arm switch over the colon-key (`kind:` / `lang:` /
`path:` etc.) replaced by a TOKEN_HANDLERS Map. Each
handler pushes either to `out.{field}` or to `textParts`
on invalid input. Aliases (`lang`/`language`,
`sig`/`signature`) point at the same handler reference.
Unknown keys still fall through to `textParts.push(tok)`,
and invalid enum values still preserve the original token
so the user's typo doesn't disappear silently.
- src/extraction/graphql-extractor.ts (visitDefinition, was 15):
6-arm switch on `specific.type` replaced by
TYPE_DEFINITION_EMITTERS Record. The 6 emit methods
(emitObject / emitInterface / emitInputObject / emitEnum
/ emitUnion / emitScalar) had their `private` modifier
dropped and `/** @internal */` JSDoc added so the
module-level dispatch table can call them.
GraphqlExtractor is module-internal (not re-exported from
the package index; only `.extract()` is consumed
externally), so the widened TS structural surface has no
in-tree or external consumer.
- src/extraction/languages/rescript.ts (visitReScriptNode, was 19):
9-arm if-cascade on `node.type` replaced by
RESCRIPT_NODE_HANDLERS Record. Five named helper functions
for the cases with non-trivial bodies, four inline arrow
handlers for the one-liner cases. Dispatcher returns
`handler !== undefined`, preserving the original's
"true on handled, false on fallthrough" contract.
- src/mcp/tools/compare.ts (fmtFileSection, was 17):
Three helper extractions:
* fileStatusSuffix(f) — collapses the
`(new file)/(deleted)/''` ternary chain
* appendSymbolList(out, label, symbols) — handles
the added/removed/modified triplet (no-op when empty)
* appendFindingsDelta(out, fd) — handles the
`+ findings introduced` / `− cleared` /
`_carried over_` block
Plus one behaviour fix flagged by reviewer: the
noFindingsChanges gate now also excludes carried-only
files so they appear in output (previously a file with
ONLY carried findings and no structural changes was
silently suppressed; the carried-finding count is useful
context for the agent).
- src/viewer/server.ts (handleRequest, was 18):
12-route if-cascade replaced by GET_ROUTES table of
`{ match, handle }` entries. `match` returns
`RegExpExecArray | true | null`; dispatcher walks in
order, stopping at first non-null match. Pulled out
`respondWithIdLookup` for the symbol/source pair
(decode-or-400 + lookup-or-404 + 200 chain) and
`sendIndexHtml` for the / and /index.html pair. POST
/api/ask and the non-GET method-not-allowed path are
unchanged. Route ordering preserved (exact /api/findings
precedes the regex /api/findings/:biomarker).
Reviewer pass approved after addressing the carried-only-gate
behaviour change in #4. Suite 1639/34/0. Total warnings 15 → 10.
andreinknv
added a commit
that referenced
this pull request
May 7, 2026
Post-refactor backlog #4. Replaces the prior "all three of large/complex/nested fire at warning+" gate with a true composite score that combines size, branchiness, nesting, and conditional density. The old metric was just LOC, so a 200-LOC flat dispatch table fired the same as a 200-LOC method with deep nested loops and dense boolean expressions — exactly the discrimination the brain_method smell is supposed to provide. Formula: score = (loc / 100) × (cyc / 10) × max(1, nest / 3) × max(1, condOps / 4) Each factor is normalised at 1.0 around the middle of its single- metric biomarker's threshold band; max(1, ...) on nesting / cond prevents low-density values from depressing the LOC×CYC product (LOC and CYC stay the dominant signals). Worked examples on a 200-LOC method: flat dispatch (cyc=15, nest=2, cond=3): 2.0 × 1.5 × 1.0 × 1.0 = 3.0 → no finding (below info gate) nested-loop monstrosity (cyc=25, nest=5, cond=8): 2.0 × 2.5 × 1.67 × 2.0 = 16.7 → warning Severity ladder by score: 5 info, 10 warning, 20 error. Gates: LOC < 50 OR cyclomatic < 8 → not even considered. Tiny helpers can't be brain methods even when dense; size + complexity together are the smell. Future work: include a shared-mutable-state proxy (let bindings, cross-block reassignments) once the AST analysis exists — would cover methods that look tractable on the surface but have hidden coupling. Tests rewritten: dropped the old "all 3 prereqs"/"only one prereq" shape; added high-composite-score positive, flat-vs-nested discrimination at same LOC, LOC-gate negative, cyclomatic-gate negative. Suite 1663/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 7, 2026
When every row in a callers / callees / impact result carries the same non-default confidence (INFERRED or AMBIGUOUS uniformly across all rows — the common shape resolver output takes when ambiguity hits), hoist a single ` — all *INFERRED*` suffix into the header instead of repeating ` *(INFERRED)*` on every row. Live measurement on `codegraph callers getFileDependents` (5 rows, all INFERRED): 589 → 543 bytes (-7.8%). Larger savings on bigger results — for a 30-row uniform callers output the savings scale to ~360 bytes (~12%). Mixed-confidence outputs (EXTRACTED + INFERRED, or AMBIGUOUS + INFERRED) keep their per-row markers so individual row trust is still legible. Single-row outputs also keep the per-row marker since there's no consolidation benefit. The detection lives in a new `detectUniformConfidence` helper exported from result-formatters.ts so other tabular renderers (e.g. the grouped-callers / grouped-callees paths in callers.ts / callees.ts) can adopt the same pattern incrementally — same optimization shape as #4 "strip implied keys" already shipped for kind columns. +5 focused tests covering the uniform / mixed / EXTRACTED-mixed / single-row / direct-API cases. Suite 1685/34/0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 8, 2026
…rs rollups Round-trip-reduction item #4 — fold the top-N from `codegraph_hotspots` and `codegraph_biomarkers mode=ranked` into the status response so the onboarding pattern "what's interesting in this repo?" (status + hotspots + biomarkers + risk_review + digest = 4-5 calls) collapses to one call. Added flags (both default 0 → suppressed → byte-identical legacy output): - `topHotspots: number` — top-N by composite risk (centrality × churn). Skips silently on a non-git repo or empty hotspots table. - `topBiomarkers: number` — top-N at warning+ severity, worst-first. Each clamps to [1, 30]; negative / NaN values collapse to suppressed. Both rollup sections render AFTER the readiness/server sections so the agent's eye-scan still hits freshness/health banners first. Skipped `summaryCoverage` flag from the original spec — the existing Feature Readiness section already shows summary count + percentage; a dedicated flag would be redundant. Suite 1778/0/34 (+6 new tests). No schema migration. Backwards-compat locked in by the existing `mcp-status.test.ts` regression guards. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 9, 2026
…rieval Subtask colbymchenry#10.1 of the embedding-features arc Stage 2. Adds a `grain TEXT NOT NULL DEFAULT 'symbol'` column to `symbol_embeddings` so the table can carry per-symbol embeddings (existing) and per-file aggregated embeddings (forthcoming subtask colbymchenry#10.2). File-grain rows will reuse the file's nodes(id) since `getEmbeddableNodes` already excludes file kinds from per-symbol embedding — no PK collision. Schema change is column-additive (ALTER TABLE ADD COLUMN); no backfill needed because the default 'symbol' covers every existing row. schema.sql updated alongside the migration per recurring scrutiny pattern #5. Schema-version test assertions bumped 42 → 43 per recurring scrutiny pattern #4. Suite remains 2100/0/34. Typecheck clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 9, 2026
Subtask #4.1 of the embedding-features arc Stage 3. New module src/llm/reranker-client.ts wraps @huggingface/transformers (already installed by Stage 1's in-process embedding cutover) and runs a cross-encoder relevance scorer over (query, candidate) pairs. Public API mirrors LocalEmbeddingClient: - new RerankerClient(cfg: RerankerProviderConfig | null) - isConfigured: boolean - rerank(query, candidates: string[]): Promise<number[]> - scoreOne(query, candidate): Promise<number> - isReachable() / listModels() — same any-up semantics Default model: Xenova/ms-marco-MiniLM-L-6-v2 (~80MB ONNX, ~50 pairs/sec on Apple Silicon). Lazy model load + module-level pipeline cache keyed by `<model>::<dtype>` so process-wide first search pays the ~3s init cost once. Pipeline output normalisation: the text-classification pipeline returns either a flat [{score}, ...] or nested [[{score}], ...] depending on input shape; the wrapper handles both. Optional dep discipline: when `@huggingface/transformers` is missing, rerank() rejects with a clear LlmEndpointError pointing at the fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 9, 2026
Subtask #4.2. Adds optional `rerankerLlm` block to the user-facing CodeGraphConfig.llm shape (`provider: 'local'`, optional `model`, optional `dtype`) and surfaces it on the resolved LlmEndpointConfig + ResolvedLlm contracts. The new `resolveReranker(llm)` helper in provider.ts is a pure pass-through (no env-var override, no legacy field) since reranking is a Stage 3 feature that shipped after the legacy migration window. Default value: unset / null — operators opt in to the precision pass explicitly. When unset, no behavior change vs Stage 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 9, 2026
Subtask #4.3. After the file-grain merge + query-expansion stages of the semantic-search pipeline, when `resolved.rerankerLlm` is set: fetch each candidate's name/signature/docstring/summary from SQLite in one statement, build the cross-encoder input text the same way the embedder does, call RerankerClient.rerank(query, texts), then reorder the semantic hits by the new scores. Replacing the cosine score with the reranker logit is safe because the downstream reciprocalRankFusion uses rank position only, not the raw score value. Returns the unmodified hits on any failure so the search path always degrades gracefully when the optional dep is missing or the model load throws. Wired through searchHybrid (called by codegraph_ask, codegraph_search mode='semantic', and codegraph_review_context's RAG retrieval). Default off (rerankerLlm = null in config), so installs that haven't opted in see no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 9, 2026
Subtask #4.4. 29 vitest cases across 8 describe blocks: - Configuration: isConfigured + listModels variants (5 tests). - Rerank input handling: empty candidates, null cfg, aborted signal. - Pipeline output normalization: flat / nested / mixed shapes, missing score, empty inner array. - Pipeline call shape: pair format + top_k option. - Model load caching: single load, _clearRerankerCache, different model + dtype keys. - scoreOne wrapper. - isReachable. All tests use vi.mock to intercept the @huggingface/transformers import — no live model load. Suite goes 2166 -> 2195 (+29). Two contract drift notes captured by the test agent: - scoreOne propagates LlmEndpointError when cfg is null (does NOT return NaN as the spec aspirated). Test pins the real behavior. - isReachable's load-throws branch isn't testable with hoisted vi.mock; documented as a skip with rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
that referenced
this pull request
May 11, 2026
…nums
Bundles four schema-quality improvements gated by one new migration
(054) and the matching schema.sql changes for fresh installs.
1. STRICT on every base table (32 of them). Schema-survey before the
change confirmed the codebase already used only the 5 STRICT-legal
types (TEXT/INTEGER/REAL/BLOB/ANY) — STRICT is just the missing
keyword + a per-table rebuild for existing DBs. Catches a class of
silent type-coercion bugs (a stringified node id silently stored
in an INTEGER column would compare unequal in JOINs, break PK
lookups, etc).
2. WITHOUT ROWID on 15 natural-PK tables: role_assignments, files,
co_changes, project_metadata, symbol_issues, summary_refs,
directory_summaries, node_coverage, code_health_findings,
node_loc_history, mcp_sessions, mcp_macros, node_metrics,
summary_priority_queue, commit_intents. Skips tables with rowid
dependencies (nodes/summary_store/test_names back FTS5 content;
embedding_store/embedding_refs/symbol_chunk_embeddings back vec0
mirrors and HNSW; parse_cache/mcp_tool_calls evict by oldest
rowid). ~30% storage reduction on the affected tables and one
B-tree fewer per PK lookup.
3. CHECK constraints on five enum-shaped columns: nodes.kind,
nodes.role, edges.kind, edges.confidence, embedding_refs.grain.
Same philosophy STRICT applies to types extends to values: a
typo'd kind like `funtion` is rejected at write time instead of
silently corrupting downstream histogram queries. Allowed lists
mirror src/types.ts (NodeKind / EdgeKind), src/llm/classifier.ts
(ROLE_LABELS), and the categorical mapping in
src/db/migrations/032-edge-confidence.ts.
4. Promoted code_health_findings.metric INTEGER -> REAL. brain_method
emits a fractional composite score (~12.3); the old INTEGER column
silently stored it as REAL under loose typing but a STRICT INTEGER
column rejects it.
Migration 054 mechanics:
- requiresFkDisable: true (FK toggling for the parent-rebuild loop)
- Reads CREATE statements from sqlite_master, rebuilds via the
standard SQLite pattern (CREATE _strict_tmp / INSERT / DROP /
RENAME), recreates indexes + triggers from captured SQL.
- Per-table fixup hook (`PER_TABLE_FIXUPS`) injects STRICT, the
code_health_findings.metric REAL promotion, and the five CHECK
constraints during the rebuild for existing DBs.
- Idempotent: tables already STRICT are skipped (fresh DBs whose
schema.sql already has it replay as no-ops).
- Excludes virtual tables (FTS5 / RTree), vec0 mirrors, sqlite_*,
and shadow-backing tables (`<virtual>_*`).
- Pre-flight guard skips the entire migration when nodes lacks
start_line/end_line — partial-schema test fixtures from
migrations-015-016 / migrations-022 would otherwise trip the
rtree triggers from migration 034. Production DBs always have
these columns since v1.
- Defensive cleanup: any temp table left from a partial rebuild
failure is DROP IF EXISTS in the catch handler.
- VACUUM at the end to reclaim space from the 32-table rebuild
(silently skipped if the runner wraps us in a transaction).
Side fix: src/extraction/index.ts:2041 was writing stats.mtimeMs (a
fractional float from fs.statSync) into files.modified_at INTEGER.
Pre-STRICT this silently coerced; STRICT rejects it and broke 366
tests. Floor the value at the writer.
Reviewer-memo update: item #4 (schema-version test forgetfulness)
is now self-updating — CURRENT_SCHEMA_VERSION is filename-derived
and both foundation.test.ts + pr19-improvements.test.ts compare
against the imported constant, not a literal. The memo entry's
old "BOTH tests assert .toBe(NN) literally" is stale and would
mislead the next reviewer. Refreshed.
Tests: 178 files / 2,470 passed, 34 skipped, 0 failed. New
migrations-054 test file covers: every base table is STRICT in a
fresh DB; STRICT actually rejects TEXT-into-INTEGER write; metric
column is REAL; fractional metric value round-trips; CHECK on
nodes.kind rejects typos; CHECK on edges.kind rejects bogus values.
Reviewed by independent reviewer agent in two passes (initial
STRICT-only diff, then this expanded version). Verdict APPROVE on
migration idempotency, pruning predicate correctness, trigger
semantics, FK-disable safety, schema.sql/migration lock-step, and
CLI/MCP alignment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stress-test fixes — PR 2 of 5 (stacked). Base:
pr1-perf-classifier-batching.Commits
bd26945perf: preserve LLM caches across--force, content-hash fallback, gating bumpf1e2bb7fix: clearCoChanges no longer wipes LLM caches; summarizer done++ onceSummary
Goal: turn a no-op
--forcere-index from a full ~540K-token LLM run into a content-hash cache hit.clearStructural()preserves LLM caches —--forcepreviously calledcg.clear()which DELETEd everything, including symbol_summaries / symbol_embeddings / directory_summaries. New path drops only structural data (nodes, edges, files, unresolved_refs, co_changes, plus node-id-keyed analyses). FK onsymbol_summaries→nodesstaysON DELETE CASCADEfor incremental sync; bulk path togglesPRAGMA foreign_keysOFF/ON around the deletes.--force, lookups were keyed by node_id. When a symbol moves files / shifts lines its node_id changes; the new fallback re-attaches by content hash.clearCoChanges()was wiping LLM caches — despite its name. The cochange post-index hook called it on every full rescan, so every--forcepost-bd26945was nuking the very cachesclearStructuralhad just preserved. Fix: only deletes co_changes + resets commit_count.done++double-increment on cache hits in summarizer —try { if (cacheHit) { done++; continue; } ... } finally { done++; }was counting cache hits twice.Test plan
--forcere-index showsdone == total(no over-100% progress).WIP draft.