Skip to content

perf+fix(llm): preserve LLM caches across --force [WIP 2/5]#4

Closed
andreinknv wants to merge 2 commits into
pr1-perf-classifier-batchingfrom
pr2-cache-preservation
Closed

perf+fix(llm): preserve LLM caches across --force [WIP 2/5]#4
andreinknv wants to merge 2 commits into
pr1-perf-classifier-batchingfrom
pr2-cache-preservation

Conversation

@andreinknv
Copy link
Copy Markdown
Owner

Stress-test fixes — PR 2 of 5 (stacked). Base: pr1-perf-classifier-batching.

Commits

  • bd26945 perf: preserve LLM caches across --force, content-hash fallback, gating bump
  • f1e2bb7 fix: clearCoChanges no longer wipes LLM caches; summarizer done++ once

Summary

Goal: turn a no-op --force re-index from a full ~540K-token LLM run into a content-hash cache hit.

  • clearStructural() preserves LLM caches--force previously called cg.clear() which DELETEd everything, including symbol_summaries / symbol_embeddings / directory_summaries. New path drops only structural data (nodes, edges, files, unresolved_refs, co_changes, plus node-id-keyed analyses). FK on symbol_summaries→nodes stays ON DELETE CASCADE for incremental sync; bulk path toggles PRAGMA foreign_keys OFF/ON around the deletes.
  • Content-hash fallback in summarizer — even with summaries surviving --force, lookups were keyed by node_id. When a symbol moves files / shifts lines its node_id changes; the new fallback re-attaches by content hash.
  • Bug fix: clearCoChanges() was wiping LLM caches — despite its name. The cochange post-index hook called it on every full rescan, so every --force post-bd26945 was nuking the very caches clearStructural had just preserved. Fix: only deletes co_changes + resets commit_count.
  • Bug fix: done++ double-increment on cache hits in summarizer — try { if (cacheHit) { done++; continue; } ... } finally { done++; } was counting cache hits twice.

Test plan

  • Suite green at 1113 / 13 skip / 0 fail.
  • Manual --force re-index shows done == total (no over-100% progress).

WIP draft.

andreinknv and others added 2 commits April 30, 2026 16:19
…ng bump

Layered on top of the perf-batch commit (9927549). Four changes that
collectively turn a no-op `--force` re-index from a full ~540K-token
LLM run into a content-hash cache hit.

## clearStructural — preserve LLM caches across --force

Before: `--force` called `cg.clear()` which DELETEd everything,
including symbol_summaries / symbol_embeddings / directory_summaries.
Even though the summarizer already had a content_hash check, the
rows were gone before it could short-circuit.

Adds `CodeGraph.clearStructural()` and `QueryBuilder.clearStructural()`
that drop only structural data: nodes, edges, files, unresolved_refs,
co_changes, plus node-id-keyed analyses (symbol_issues, config_refs,
sql_refs, node_coverage, code_health_findings) that would otherwise
reference dead nodes. The three LLM-derived caches survive.

The schema-level FK on symbol_summaries→nodes is left as ON DELETE
CASCADE. That's correct for incremental sync (a deleted symbol should
drop its cached summary). The bulk path suppresses it at runtime by
toggling PRAGMA foreign_keys OFF/ON around the deletes — the pragma
is connection-level and only takes effect outside an active
transaction, hence the order: pragma → transaction → pragma. With FKs
off the explicit DELETEs of the structural-attribution tables
(symbol_issues, config_refs, sql_refs) are necessary; without them
those rows would survive as orphans pointing at gone nodes.

`--force` in `src/bin/codegraph.ts` now calls `clearStructural()`
in both quiet and non-quiet branches, with the user-facing message
updated to "Cleared existing index (LLM caches preserved)".

The original `clear()` is preserved with its full-wipe semantics
(now implemented as clearStructural + DELETEs of the three LLM
tables) for the explicit nuke case.

## Content-hash fallback in summarizer

Even with summaries surviving --force, the cache lookup was keyed by
node_id. When a symbol moves files or shifts lines its node_id
changes, so the cache missed even when the body was unchanged.

`getSummaryByContentHash(hash, model)` looks up by content_hash +
model with `ORDER BY generated_at DESC LIMIT 1` (deterministic tie-
break on collisions, semantically irrelevant since identical bodies
yield identical summaries). The summarizer falls back to this lookup
when the node_id miss happens — on hit, it copies the cached summary
text onto a fresh row keyed by the new node_id (counted as a
cacheHit, no LLM call).

A new index `idx_summaries_content_hash` on (content_hash, model)
backs this lookup. Added to schema.sql for fresh DBs and via
migration 022 (`022-add-content-hash-index.ts`) for existing ones.

## Trivial-symbol gating

`MIN_BODY_LINES` in summarizer.ts bumped from 3 to 4. Skips typical
setters and 1-statement forwarders that the name + signature already
explain. Existing users with codegraph indexes will see a one-time
drop in summary coverage on next sync — symbols spanning exactly
3 lines will not be re-summarized. The structural index is
unaffected.

## Tests

- `__tests__/migrations-022.test.ts` (NEW, 4 tests): index presence
  on fresh + upgrade DB, clearStructural preserving rows even with
  CASCADE FK in place, sync-path single-symbol delete still
  cascading correctly (proves FKs were re-enabled), full clear()
  still wiping summaries.
- `__tests__/llm-tiers.test.ts` (+2 tests): clearStructural preserves
  symbol_summaries so re-index hits cache (asserts second-pass chat
  calls < first-pass), content-hash fallback reuses summary when
  node_id changes (direct queries-level assertions).
- Schema-version assertions in `foundation.test.ts:308` and
  `pr19-improvements.test.ts:302` updated 21 → 22.

Suite: 1087 / 13 skip / 0 fail (was 1081).

## Reviewer trail

Two passes. Pass 1: REQUEST_CHANGES with 4 findings.
- (1) clearStructural was orphaning symbol_issues / config_refs /
  sql_refs because the FK toggle suppressed CASCADE for them too;
  added explicit DELETEs.
- (2) Migration filename `022-drop-summary-cascade.ts` was
  misleading (it adds an index, doesn't drop cascade); renamed to
  `022-add-content-hash-index.ts`, updated import + registration.
- (4) `getSummaryByContentHash` LIMIT 1 was non-deterministic on
  hash collisions; added `ORDER BY generated_at DESC`.
- (3) MIN_BODY_LINES release-note ask: addressed inline above
  rather than in a separate CHANGELOG file.
Pass 2: APPROVE, one info finding noting that DESC vs ASC tiebreak
is arbitrary (documented, no action needed).

## Expected impact

Codegraph self-repo on second `--force` re-index with no source
changes: the per-symbol summary phase short-circuits via content_hash
cache hits. Token consumption drops from ~540K to a small residual
for any new content. Combined with the perf-batch commit, the 30×
classifier latency win and ~3× input token savings now layer onto
near-zero summary regeneration cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two real bugs surfaced by end-to-end measurement of the cache-
preservation work in bd26945. Both pre-existed bd26945 (and df96c6c)
but were masked because the old --force path (cg.clear()) wiped the
LLM caches anyway. They became visible when clearStructural() started
preserving them.

## Bug 1 — clearCoChanges() was wiping symbol_summaries / embeddings / dir summaries

src/db/queries.ts:2677. Despite its name, the function was deleting
the three LLM-derived caches (`symbol_summaries`, `symbol_embeddings`,
`directory_summaries`) along with the actual co-change history.

The cochange post-index hook (src/index-hooks/cochange.ts:46) calls
clearCoChanges on every full rescan with fullRescan: true — which is
what afterIndexAll always passes. So every --force re-index post-
bd26945 was nuking the very caches clearStructural had just preserved
moments earlier.

There is no logical reason for the LLM tables to be cleared by a co-
change rescan: summaries are keyed by content_hash of the symbol body,
not by commit history. The wipe was probably defensive overcaution
from when the columns lived inline on symbol_summaries (pre-016).

Fix: clearCoChanges now only does
- DELETE FROM co_changes
- UPDATE files SET commit_count = 0
Doc-comment notes the historical context and why we removed the
LLM-table deletes.

## Bug 2 — done counter double-incremented on cache hits in summarizer

src/llm/summarizer.ts:181, 196. Each worker iteration was structured

  try {
    if (cacheHit) { cacheHits++; done++; ...; continue; }
    ...
  } finally {
    done++;
    options.onProgress?.(done, total);
  }

JavaScript `continue` still triggers the `finally` clause, so cache-
hit paths bumped done twice — once explicitly, once in finally.
Surfaced as `679/672 (101%)` and similar in production progress output
during the end-to-end measurement run.

Fix: removed the explicit `done++` from both cache-hit paths (node-id
match and content-hash fallback). Let `finally` handle it. Doc-
comments explain the JS semantic and the historical masking.

Verified by inspection: every code path through the worker body now
ends in either (a) hitting the finally block (normal path, errors,
falsy body, cache hits via continue) or (b) an explicit `return` for
abort signals (which deliberately skips done++ — that's correct
behavior).

## Test strengthening

`__tests__/llm-tiers.test.ts` — the existing test
"clearStructural preserves symbol_summaries..." passed with bug 1 in
place because its only assertion was `incremental < callsAfterFirst`,
which was too loose: post-clearCoChanges-wipe, the second pass still
needed FEWER calls than the first only if role classification was
counted as a meaningful baseline.

Strengthened to read symbol_summaries row count directly via the
queries handle and assert STRICT equality:
- Before clearStructural row count == after clearStructural row count
- Before clearStructural row count == after second indexAll row count

Renamed the test to mention the cochange hook explicitly so future
readers know what regression it guards. Kept the looser cumulative-
chat-calls assertion as a backstop.

Suite: 1087 / 13 skip / 0 fail.

## Real-world validation

Phase A (first --force, fresh DB) took ~27 minutes to populate 671
summaries from scratch via claude-bridge → Haiku.
Phase C (re-running --force after fix, with 101 surviving from a
truncated Phase B): the summary phase progress JUMPED from 5/672 in
22s (the Phase A baseline) to 109/672 in 22s (Phase C with 101
cached). Direct evidence of the cache short-circuit working — ~20×
speedup for the cached portion in real measurement, scaling to ~10×
overall when the cache covers all 672 symbols.

## Reviewer trail

One pass. APPROVE with one info-level finding (clearStructural also
deletes co_changes directly, so the cochange hook's clearCoChanges
double-clears it — idempotent, harmless, not addressed in this PR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv
Copy link
Copy Markdown
Owner Author

Stack collapsed into wip-stress-base (now at 56fac32). Commits landed via the fast-forward of wip-stress-base from df96c6c to 56fac32; #3 shows as merged, #4-#7 are closed because their tips are already ancestors of the base. Branch preserved for reference.

@andreinknv andreinknv closed this May 1, 2026
@andreinknv andreinknv deleted the pr2-cache-preservation branch May 1, 2026 11:37
andreinknv added a commit that referenced this pull request May 1, 2026
… server-config flags

Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed:
  #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale
  #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema
  #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe)
  #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters
  #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string
  #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*`
  #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)`
  colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains
       `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed`

CLI/MCP alignment (5/32 → 33+/35):
  - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar,
    biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs,
    module-summary, role, coverage-query, pending-summaries, save-summaries, review-context
  - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync,
    codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit,
    codegraph_unlock, codegraph_affected

MCP server-level operator config (`codegraph serve --mcp`):
  - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing)
  - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model /
    --llm-api-key (operator LLM config; per-project config wins on conflict)
  - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders

Architectural cleanups:
  - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces
    growing string-comparison chain in execute())
  - `withAllowStale` registry injection only on tools that DO see the gate
  - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts`
  - `LlmClient.isEmbeddingReachable` for split-provider correctness
  - SyncResult `lockContention` flag → handleSync emits distinct retryable message
  - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force)
  - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs)

Migrations:
  023-files-is-test.ts — add `files.is_test`
  024-build-context-refs.ts — add `build_context_refs` table

Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes:
  - JSONC URL strip via state machine (was eating `https://` tails)
  - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent)
  - Dynamic imports also match template_string nodes
  - process.exit deferred until after finally cleanup in runViaMCP
  - --same-language / --different-language mutual exclusion guard
  - help-tools CLI bypasses isInitialized (works without a project)
  - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks)
  - handleAffected errors instead of silently dropping unsupported glob filters
  - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block

Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
…ion A #5)

Closes the post-refactor backlog item #5: static large_method flags
evergreen-large symbols indistinguishably from actively-growing ones;
recently_grew surfaces the latter as the actionable refactor target.

Architecture:
- Migration 027 adds node_loc_history(node_id, indexed_ts, loc) — one
  row per analyseProject pass that touched the symbol. The per-file
  content-hash short-circuit means unchanged symbols are not
  re-snapshotted; the compound index (node_id, indexed_ts DESC) makes
  "previous snapshot for this node" a sub-millisecond seek.
- src/db/queries-loc-history.ts exports recordLocSnapshots (batch
  insert with the canonical FK-safe SELECT WHERE EXISTS / ON CONFLICT
  pattern from queries-summaries) and getPriorLocSnapshots (per-node
  seek; one prepared stmt, N steps).
- biomarkers/index.ts: at analyseProject start, capture nowMs once;
  per-file batch-fetch prior snapshots, evaluate rules with prior +
  nowMs threaded into RuleContext, persist a fresh snapshot for every
  symbol whose metrics we computed.
- biomarkers/engine.ts: new evaluateRecentlyGrew rule. Triggers when
  prior_loc >= 20 AND current_loc > prior_loc * 1.5 AND
  (now - prior_ts) <= 30d. Severity ladder: 1.5–2x info,
  2–3x warning, ≥3x error. Metric is round(ratio * 100) so the
  ranked-mode sort orders by growth percent.

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 27 (foundation +
  pr19-improvements).
- #5 node_loc_history added to BOTH the migration AND schema.sql so
  fresh installs initialize correctly.
- #7 dead-export check: dropped recordLocSnapshot/getPriorLocSnapshot
  singulars after the reviewer flagged YAGNI; only the batch
  variants are exported (and called).

9 engine-level tests for the rule logic. Full suite 1393/34/0
(+9 from prior). Persistence path is exercised implicitly by every
existing biomarker test (silent green confirms the schema + writes
are sound on real fixtures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
First half of the trace-logging arc that backs the viewer's
Agent-trace tab. This commit lands the storage + logger; the
MCP server hookup (commit B) and the viewer endpoints (commit C)
follow.

- Migration 028 + schema.sql: mcp_sessions(id, started_ts,
  last_activity_ts, tool_count) + mcp_tool_calls(session_id, step,
  ts, tool_name, args_json, result_summary, duration_ms). FK
  CASCADE from calls → sessions; idx_mcp_tool_calls_ts gives the
  retention sweep an O(log n) scan.
- src/db/queries-trace.ts: insertSession, appendToolCall (transaction
  that bumps the parent session's tool_count atomically),
  pruneToolCalls (cap rows + GC orphan sessions), recentSessions,
  callsForSession.
- src/trace/logger.ts: TraceLogger class. Constructor inserts a
  session row; log() appends + prunes-every-N. All persistence is
  best-effort (DB write failures log at debug, never throw —
  trace must not lose tool results). Args + result-summary are
  truncated at 2000 chars.
- generateSessionId(): unix-seconds prefix + 6 random hex chars,
  sortable by time.

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 28 (foundation +
  pr19-improvements).
- #5 mcp_sessions + mcp_tool_calls in BOTH the migration AND
  schema.sql.

10 logger round-trip tests. Full suite 1422/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
Closes the cyclomatic + maxNesting "—" gap in the viewer's right
pane. Previously these fields only had a value when a biomarker
finding fired (via the metric on `code_health_findings`); clean
symbols read null. Now every analysable symbol has a row in
node_metrics with all six engine-computed metrics, overwritten
on each pass.

Architecture:
- Migration 029 + node_metrics table: PK on node_id, overwritten
  per pass via a single FK-safe upsert. Distinct from
  code_health_findings (threshold-only) and node_loc_history
  (LoC time-series for recently_grew). Schema in BOTH the
  migration AND schema.sql per memo #5.
- queries-metrics.ts: upsertNodeMetricsBatch (one transaction,
  WHERE EXISTS guard), getNodeMetrics (single-row reader).
- biomarkers/index.ts: evaluateAnalysableNodes returns a third
  map of metrics-by-node; analyseSingleFile persists to the new
  table after the findings persist.
- BIOMARKER_CACHE_KEY bumped v1 → v2 so existing indexes
  self-heal: without the bump, the per-file content-hash cache
  would short-circuit unchanged files and leave node_metrics
  empty until users hit `--force` themselves.
- Viewer server: buildMetricsBlock now reads from node_metrics
  first; falls back to the per-finding metric on older indexes.
  paramCount also surfaced (not yet displayed in the UI).

Smoke-tested on the codegraph self-index after re-index:
extractFromSource now reports cyclomatic 3 / maxNesting 1 /
paramCount 3 — matching what the engine computes during
biomarker analysis.

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 29.
- #5 node_metrics in BOTH migration AND schema.sql.

Updated mcp-reindex test (cache key bump). Suite 1425/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
…ue filter

When an agent passes a filter that constrains every result to one value
(search `kind:X`, biomarkers `biomarker:Y`), the corresponding column
is the same on every row — pure noise. Drop the per-row column and put
the value once in the section header.

- search.ts: drops `(${kind})` per row when `kind` filter is set,
  annotates header with `(kind=X)`. Saves ~15 chars/row.
- biomarkers.ts (ranked mode): drops the `Biomarker` table column
  (header + separator + per-row cell) when `biomarker` filter is set,
  annotates header with the biomarker name. Saves ~15 chars/row.

Backlog #4 narrowed: the literal "audit every tool" sweep was moot —
most tools (callers/callees/biomarkers without filter) genuinely
return mixed kinds. The narrow filter-implied-column case is the
clean win that survives premise verification.

2 new tests in __tests__/implied-keys.test.ts. Suite 1444/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
…olbymchenry#8)

Promotes the resolver numeric `metadata.confidence` to a categorical,
queryable column on `edges`. `codegraph_callers`, `_callees`,
`_impact` now annotate each row with `*(INFERRED)*` when the edge is
not concrete, and accept a new `minConfidence` arg to drop edges
below a chosen level — useful for refactor-impact analysis where
false-positive reach inflates the blast radius.

Migration 032 adds `edges.confidence TEXT` (nullable; read-cast in
EDGE_SCHEMA collapses NULL → `EXTRACTED`). Nullable lets legacy /
structural / extractor-direct edges need no producer change. Index
on the column for filter-down queries. Idempotent ADD COLUMN with
table-exists guard following the convention from migrations 020 /
021 / 023 / 030.

Producer: `ReferenceResolver.createEdges` calls a new free
`classifyConfidence(resolvedBy, score)` helper:
  - import / qualified-name / file-path / exact-match → EXTRACTED
  - framework with score >= 0.85 → EXTRACTED, else INFERRED
  - instance-method / fuzzy → INFERRED
AMBIGUOUS is not stamped from the resolver path in v1 — needs
tie-tracking from the candidate picker. Tracked as B3.

Consumers (`result-formatters.ts`):
  - `formatConfidence(edge)` — markdown italic suffix (empty for
    EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise).
  - `parseMinConfidence(raw)` — returns level / null / errorResult.
  - `filterByConfidence(rows, min)` — drop below threshold.
  - `CONFIDENCE_RANK` — numeric ordering for inline filter loops.
Wired into all four code paths in `callers.ts` (collectCallers,
collectCallersForSource, collectTypeUsers, formatGroupedCallers),
both paths in `callees.ts`, and `impact.ts` with a
`filterImpactByConfidence` BFS that prunes nodes unreachable along
the surviving edge set.

Type cleanup: `Edge.confidence` added with JSDoc; the long-dead
`Edge.provenance` field also got a JSDoc note pointing at the
SCIP-integration-reservation explanation in memory.

Reviewer-memo #4 catch (schema-version test forgetfulness): noticed
`__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)`
even though migrations 030 and 031 had shipped. Catch-up to 32 in
this commit covers both the missed bumps and the new migration.

Reviewer: APPROVE, three info-only findings (type-user display gap,
framework-rule calibration table doc, catch of the recurring pr19
pattern).

Suite: 1555 / 34 / 0 (+18 new edge-confidence tests).
Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression
budget vs 091e935 baseline).

Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11
trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 4, 2026
Total complex_method warnings 15 → 10. Pattern: replace N-arm
switch / if-cascade with `Record<string, Handler>` (or `Map`)
dispatch table. Each per-key handler becomes a small named
function; the orchestrator drops to a single map lookup +
fallback.

  - src/search/query-parser.ts (classifyQueryToken, was 19):
      10-arm switch over the colon-key (`kind:` / `lang:` /
      `path:` etc.) replaced by a TOKEN_HANDLERS Map. Each
      handler pushes either to `out.{field}` or to `textParts`
      on invalid input. Aliases (`lang`/`language`,
      `sig`/`signature`) point at the same handler reference.
      Unknown keys still fall through to `textParts.push(tok)`,
      and invalid enum values still preserve the original token
      so the user's typo doesn't disappear silently.

  - src/extraction/graphql-extractor.ts (visitDefinition, was 15):
      6-arm switch on `specific.type` replaced by
      TYPE_DEFINITION_EMITTERS Record. The 6 emit methods
      (emitObject / emitInterface / emitInputObject / emitEnum
      / emitUnion / emitScalar) had their `private` modifier
      dropped and `/** @internal */` JSDoc added so the
      module-level dispatch table can call them.
      GraphqlExtractor is module-internal (not re-exported from
      the package index; only `.extract()` is consumed
      externally), so the widened TS structural surface has no
      in-tree or external consumer.

  - src/extraction/languages/rescript.ts (visitReScriptNode, was 19):
      9-arm if-cascade on `node.type` replaced by
      RESCRIPT_NODE_HANDLERS Record. Five named helper functions
      for the cases with non-trivial bodies, four inline arrow
      handlers for the one-liner cases. Dispatcher returns
      `handler !== undefined`, preserving the original's
      "true on handled, false on fallthrough" contract.

  - src/mcp/tools/compare.ts (fmtFileSection, was 17):
      Three helper extractions:
        * fileStatusSuffix(f) — collapses the
          `(new file)/(deleted)/''` ternary chain
        * appendSymbolList(out, label, symbols) — handles
          the added/removed/modified triplet (no-op when empty)
        * appendFindingsDelta(out, fd) — handles the
          `+ findings introduced` / `− cleared` /
          `_carried over_` block
      Plus one behaviour fix flagged by reviewer: the
      noFindingsChanges gate now also excludes carried-only
      files so they appear in output (previously a file with
      ONLY carried findings and no structural changes was
      silently suppressed; the carried-finding count is useful
      context for the agent).

  - src/viewer/server.ts (handleRequest, was 18):
      12-route if-cascade replaced by GET_ROUTES table of
      `{ match, handle }` entries. `match` returns
      `RegExpExecArray | true | null`; dispatcher walks in
      order, stopping at first non-null match. Pulled out
      `respondWithIdLookup` for the symbol/source pair
      (decode-or-400 + lookup-or-404 + 200 chain) and
      `sendIndexHtml` for the / and /index.html pair. POST
      /api/ask and the non-GET method-not-allowed path are
      unchanged. Route ordering preserved (exact /api/findings
      precedes the regex /api/findings/:biomarker).

Reviewer pass approved after addressing the carried-only-gate
behaviour change in #4. Suite 1639/34/0. Total warnings 15 → 10.
andreinknv added a commit that referenced this pull request May 7, 2026
Post-refactor backlog #4. Replaces the prior "all three of
large/complex/nested fire at warning+" gate with a true composite
score that combines size, branchiness, nesting, and conditional
density. The old metric was just LOC, so a 200-LOC flat dispatch
table fired the same as a 200-LOC method with deep nested loops
and dense boolean expressions — exactly the discrimination the
brain_method smell is supposed to provide.

Formula:
  score = (loc / 100) × (cyc / 10) ×
          max(1, nest / 3) × max(1, condOps / 4)

Each factor is normalised at 1.0 around the middle of its single-
metric biomarker's threshold band; max(1, ...) on nesting / cond
prevents low-density values from depressing the LOC×CYC product
(LOC and CYC stay the dominant signals).

Worked examples on a 200-LOC method:
  flat dispatch (cyc=15, nest=2, cond=3):
    2.0 × 1.5 × 1.0 × 1.0 = 3.0   → no finding (below info gate)
  nested-loop monstrosity (cyc=25, nest=5, cond=8):
    2.0 × 2.5 × 1.67 × 2.0 = 16.7 → warning

Severity ladder by score: 5 info, 10 warning, 20 error.

Gates: LOC < 50 OR cyclomatic < 8 → not even considered. Tiny
helpers can't be brain methods even when dense; size + complexity
together are the smell.

Future work: include a shared-mutable-state proxy (let bindings,
cross-block reassignments) once the AST analysis exists — would
cover methods that look tractable on the surface but have hidden
coupling.

Tests rewritten: dropped the old "all 3 prereqs"/"only one prereq"
shape; added high-composite-score positive, flat-vs-nested
discrimination at same LOC, LOC-gate negative, cyclomatic-gate
negative. Suite 1663/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 7, 2026
When every row in a callers / callees / impact result carries the
same non-default confidence (INFERRED or AMBIGUOUS uniformly across
all rows — the common shape resolver output takes when ambiguity
hits), hoist a single ` — all *INFERRED*` suffix into the header
instead of repeating ` *(INFERRED)*` on every row.

Live measurement on `codegraph callers getFileDependents` (5
rows, all INFERRED): 589 → 543 bytes (-7.8%). Larger savings on
bigger results — for a 30-row uniform callers output the savings
scale to ~360 bytes (~12%).

Mixed-confidence outputs (EXTRACTED + INFERRED, or AMBIGUOUS +
INFERRED) keep their per-row markers so individual row trust is
still legible. Single-row outputs also keep the per-row marker
since there's no consolidation benefit.

The detection lives in a new `detectUniformConfidence` helper
exported from result-formatters.ts so other tabular renderers
(e.g. the grouped-callers / grouped-callees paths in callers.ts /
callees.ts) can adopt the same pattern incrementally — same
optimization shape as #4 "strip implied keys" already shipped
for kind columns.

+5 focused tests covering the uniform / mixed / EXTRACTED-mixed /
single-row / direct-API cases. Suite 1685/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
…rs rollups

Round-trip-reduction item #4 — fold the top-N from `codegraph_hotspots`
and `codegraph_biomarkers mode=ranked` into the status response so the
onboarding pattern "what's interesting in this repo?" (status + hotspots
+ biomarkers + risk_review + digest = 4-5 calls) collapses to one call.

Added flags (both default 0 → suppressed → byte-identical legacy output):
- `topHotspots: number` — top-N by composite risk (centrality × churn).
  Skips silently on a non-git repo or empty hotspots table.
- `topBiomarkers: number` — top-N at warning+ severity, worst-first.

Each clamps to [1, 30]; negative / NaN values collapse to suppressed.
Both rollup sections render AFTER the readiness/server sections so the
agent's eye-scan still hits freshness/health banners first.

Skipped `summaryCoverage` flag from the original spec — the existing
Feature Readiness section already shows summary count + percentage; a
dedicated flag would be redundant.

Suite 1778/0/34 (+6 new tests). No schema migration. Backwards-compat
locked in by the existing `mcp-status.test.ts` regression guards.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
…rieval

Subtask colbymchenry#10.1 of the embedding-features arc Stage 2. Adds a `grain
TEXT NOT NULL DEFAULT 'symbol'` column to `symbol_embeddings` so the
table can carry per-symbol embeddings (existing) and per-file
aggregated embeddings (forthcoming subtask colbymchenry#10.2). File-grain rows
will reuse the file's nodes(id) since `getEmbeddableNodes` already
excludes file kinds from per-symbol embedding — no PK collision.

Schema change is column-additive (ALTER TABLE ADD COLUMN); no
backfill needed because the default 'symbol' covers every existing
row. schema.sql updated alongside the migration per recurring
scrutiny pattern #5. Schema-version test assertions bumped 42 → 43
per recurring scrutiny pattern #4.

Suite remains 2100/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
Subtask #4.1 of the embedding-features arc Stage 3. New module
src/llm/reranker-client.ts wraps @huggingface/transformers (already
installed by Stage 1's in-process embedding cutover) and runs a
cross-encoder relevance scorer over (query, candidate) pairs.

Public API mirrors LocalEmbeddingClient:
- new RerankerClient(cfg: RerankerProviderConfig | null)
- isConfigured: boolean
- rerank(query, candidates: string[]): Promise<number[]>
- scoreOne(query, candidate): Promise<number>
- isReachable() / listModels() — same any-up semantics

Default model: Xenova/ms-marco-MiniLM-L-6-v2 (~80MB ONNX,
~50 pairs/sec on Apple Silicon). Lazy model load + module-level
pipeline cache keyed by `<model>::<dtype>` so process-wide first
search pays the ~3s init cost once.

Pipeline output normalisation: the text-classification pipeline
returns either a flat [{score}, ...] or nested [[{score}], ...]
depending on input shape; the wrapper handles both.

Optional dep discipline: when `@huggingface/transformers` is missing,
rerank() rejects with a clear LlmEndpointError pointing at the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
Subtask #4.2. Adds optional `rerankerLlm` block to the user-facing
CodeGraphConfig.llm shape (`provider: 'local'`, optional `model`,
optional `dtype`) and surfaces it on the resolved LlmEndpointConfig
+ ResolvedLlm contracts.

The new `resolveReranker(llm)` helper in provider.ts is a pure
pass-through (no env-var override, no legacy field) since reranking
is a Stage 3 feature that shipped after the legacy migration window.

Default value: unset / null — operators opt in to the precision pass
explicitly. When unset, no behavior change vs Stage 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
Subtask #4.3. After the file-grain merge + query-expansion stages of
the semantic-search pipeline, when `resolved.rerankerLlm` is set:
fetch each candidate's name/signature/docstring/summary from SQLite
in one statement, build the cross-encoder input text the same way
the embedder does, call RerankerClient.rerank(query, texts), then
reorder the semantic hits by the new scores.

Replacing the cosine score with the reranker logit is safe because
the downstream reciprocalRankFusion uses rank position only, not
the raw score value. Returns the unmodified hits on any failure so
the search path always degrades gracefully when the optional dep is
missing or the model load throws.

Wired through searchHybrid (called by codegraph_ask, codegraph_search
mode='semantic', and codegraph_review_context's RAG retrieval).
Default off (rerankerLlm = null in config), so installs that haven't
opted in see no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
Subtask #4.4. 29 vitest cases across 8 describe blocks:
- Configuration: isConfigured + listModels variants (5 tests).
- Rerank input handling: empty candidates, null cfg, aborted signal.
- Pipeline output normalization: flat / nested / mixed shapes,
  missing score, empty inner array.
- Pipeline call shape: pair format + top_k option.
- Model load caching: single load, _clearRerankerCache, different
  model + dtype keys.
- scoreOne wrapper.
- isReachable.

All tests use vi.mock to intercept the @huggingface/transformers
import — no live model load. Suite goes 2166 -> 2195 (+29).

Two contract drift notes captured by the test agent:
- scoreOne propagates LlmEndpointError when cfg is null (does NOT
  return NaN as the spec aspirated). Test pins the real behavior.
- isReachable's load-throws branch isn't testable with hoisted
  vi.mock; documented as a skip with rationale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 11, 2026
…nums

Bundles four schema-quality improvements gated by one new migration
(054) and the matching schema.sql changes for fresh installs.

1. STRICT on every base table (32 of them). Schema-survey before the
   change confirmed the codebase already used only the 5 STRICT-legal
   types (TEXT/INTEGER/REAL/BLOB/ANY) — STRICT is just the missing
   keyword + a per-table rebuild for existing DBs. Catches a class of
   silent type-coercion bugs (a stringified node id silently stored
   in an INTEGER column would compare unequal in JOINs, break PK
   lookups, etc).

2. WITHOUT ROWID on 15 natural-PK tables: role_assignments, files,
   co_changes, project_metadata, symbol_issues, summary_refs,
   directory_summaries, node_coverage, code_health_findings,
   node_loc_history, mcp_sessions, mcp_macros, node_metrics,
   summary_priority_queue, commit_intents. Skips tables with rowid
   dependencies (nodes/summary_store/test_names back FTS5 content;
   embedding_store/embedding_refs/symbol_chunk_embeddings back vec0
   mirrors and HNSW; parse_cache/mcp_tool_calls evict by oldest
   rowid). ~30% storage reduction on the affected tables and one
   B-tree fewer per PK lookup.

3. CHECK constraints on five enum-shaped columns: nodes.kind,
   nodes.role, edges.kind, edges.confidence, embedding_refs.grain.
   Same philosophy STRICT applies to types extends to values: a
   typo'd kind like `funtion` is rejected at write time instead of
   silently corrupting downstream histogram queries. Allowed lists
   mirror src/types.ts (NodeKind / EdgeKind), src/llm/classifier.ts
   (ROLE_LABELS), and the categorical mapping in
   src/db/migrations/032-edge-confidence.ts.

4. Promoted code_health_findings.metric INTEGER -> REAL. brain_method
   emits a fractional composite score (~12.3); the old INTEGER column
   silently stored it as REAL under loose typing but a STRICT INTEGER
   column rejects it.

Migration 054 mechanics:
  - requiresFkDisable: true (FK toggling for the parent-rebuild loop)
  - Reads CREATE statements from sqlite_master, rebuilds via the
    standard SQLite pattern (CREATE _strict_tmp / INSERT / DROP /
    RENAME), recreates indexes + triggers from captured SQL.
  - Per-table fixup hook (`PER_TABLE_FIXUPS`) injects STRICT, the
    code_health_findings.metric REAL promotion, and the five CHECK
    constraints during the rebuild for existing DBs.
  - Idempotent: tables already STRICT are skipped (fresh DBs whose
    schema.sql already has it replay as no-ops).
  - Excludes virtual tables (FTS5 / RTree), vec0 mirrors, sqlite_*,
    and shadow-backing tables (`<virtual>_*`).
  - Pre-flight guard skips the entire migration when nodes lacks
    start_line/end_line — partial-schema test fixtures from
    migrations-015-016 / migrations-022 would otherwise trip the
    rtree triggers from migration 034. Production DBs always have
    these columns since v1.
  - Defensive cleanup: any temp table left from a partial rebuild
    failure is DROP IF EXISTS in the catch handler.
  - VACUUM at the end to reclaim space from the 32-table rebuild
    (silently skipped if the runner wraps us in a transaction).

Side fix: src/extraction/index.ts:2041 was writing stats.mtimeMs (a
fractional float from fs.statSync) into files.modified_at INTEGER.
Pre-STRICT this silently coerced; STRICT rejects it and broke 366
tests. Floor the value at the writer.

Reviewer-memo update: item #4 (schema-version test forgetfulness)
is now self-updating — CURRENT_SCHEMA_VERSION is filename-derived
and both foundation.test.ts + pr19-improvements.test.ts compare
against the imported constant, not a literal. The memo entry's
old "BOTH tests assert .toBe(NN) literally" is stale and would
mislead the next reviewer. Refreshed.

Tests: 178 files / 2,470 passed, 34 skipped, 0 failed. New
migrations-054 test file covers: every base table is STRICT in a
fresh DB; STRICT actually rejects TEXT-into-INTEGER write; metric
column is REAL; fractional metric value round-trips; CHECK on
nodes.kind rejects typos; CHECK on edges.kind rejects bogus values.

Reviewed by independent reviewer agent in two passes (initial
STRICT-only diff, then this expanded version). Verdict APPROVE on
migration idempotency, pruning predicate correctness, trigger
semantics, FK-disable safety, schema.sql/migration lock-step, and
CLI/MCP alignment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant