feat+perf(llm): split chat/askChat provider + summary batching [WIP 3/5]#5

Closed
andreinknv wants to merge 2 commits into
pr2-cache-preservation from
pr3-split-provider-and-summary-batching

Conversation

@andreinknv
Owner

Stress-test fixes — PR 3 of 5 (stacked). Base: pr2-cache-preservation.

Commits

  • 2620304 feat(llm): split provider per role — chat vs askChat
  • 877930d perf(llm): summary batching, cache-aware, provider-defaulted to 3 for bridge/api

Summary

Why these are paired: the summary-batching commit's provider gating reads the new askChat slot from 2620304, so they must ship together.

Split provider (2620304)

End-to-end measurement uncovered a real cost/quality tradeoff: bulk summary work runs ~2.6× faster on a local MLX-served Qwen3-Coder-30B than on claude-bridge → Haiku, with comparable summary quality. But the same Qwen3-Coder confidently confabulates wrong answers on synthesis-heavy ask queries. Sonnet via claude-bridge gives precisely correct answers (12-40s).

The pre-existing schema (chat.askModel: string) only allowed swapping the model id within the same provider/endpoint. A new top-level askChat block lets bulk and ask traffic route through entirely different providers:

"chat":    { "provider": "openai-compat", "endpoint": "...", "model": "qwen3-coder" },
"askChat": { "provider": "claude-bridge",                      "model": "claude-sonnet-4-6" }

getAskModel() cascade: askChat.model → chat.askModel → chat.model → chatModel (legacy).

Summary batching (877930d)

Benchmarked against MLX with 12 real symbols at batch sizes 1/3/5/10:

  • Local single-stream backends (MLX, llama.cpp): wall-clock identical regardless of batch size — no per-call overhead to amortise.
  • Quality: BATCH=3 indistinguishable from 1, BATCH=5 starts losing specifics, BATCH=10 noticeably terser.
  • Implication: batch only when there's per-call overhead (claude-bridge subprocess spawn, anthropic-api network roundtrip).

Worker accumulates a per-worker pending list; cache hits never enter it. Parse failures fall back to per-item generateSingle. Provider gating defaults batch=3 for bridge/anthropic-api, batch=1 for openai-compat.

Test plan

  • Suite green at 1113 / 13 skip / 0 fail.
  • Manual: split-provider config produces summaries via chat model and ask responses via askChat model.

WIP draft.

andreinknv and others added 2 commits April 30, 2026 18:53
End-to-end measurement of the cache-preservation work uncovered a
real cost/quality tradeoff: bulk summary work runs 2.6x faster on a
local MLX-served model (Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ via
mlx-openai-server) than on claude-bridge -> Haiku, with comparable
quality. But the same Qwen3-Coder model confidently confabulates
WRONG answers on synthesis-heavy ask queries — got the PRAGMA
foreign_keys direction reversed and cited the wrong migration
filename when asked about the cache-preservation mechanism. Sonnet
via claude-bridge gave precisely correct answers, but took 12-40s.

The pre-existing schema (chat.askModel: string) only allowed swapping
the model id within the SAME provider/endpoint. To route bulk and
ask through entirely DIFFERENT providers, this PR adds a top-level
`askChat` block.

## Changes

### Schema (src/types.ts, src/llm/client.ts)

New optional `llm.askChat` block mirroring `chat`:

    "chat":    { "provider": "openai-compat",
                 "endpoint": "http://localhost:8081/v1",
                 "model": "qwen3-coder" },
    "askChat": { "provider": "claude-bridge",
                 "model": "claude-sonnet-4-6" }

`LlmEndpointConfig.askChat?: ChatProviderConfig | null` parallels the
existing `chat` slot. `normalizeEndpointConfig` returns the new field
alongside chat + embeddings.

### Resolver (src/llm/provider.ts)

New `resolveAskChat()` mirrors `resolveChat()` but never reads legacy
flat fields — askChat is opt-in only. Wired into `resolveLlmProviders`
so the resulting `ResolvedLlm` carries both `chat` and `askChat` slots.

New `getAskModel()` helper with cascade:
- askChat.model -> chat.askModel -> chat.model -> chatModel (legacy)
This is the right primitive for any caller that wants to know which
model id will actually answer an ask/dead-code call.
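
A minimal sketch of that cascade (field names per the bullets above;
the coalescing chain itself is illustrative, not the in-tree body):

    function getAskModel(llm: ResolvedLlm): string | undefined {
      return llm.askChat?.model       // split-provider slot (new)
          ?? llm.chat?.askModel       // same-provider model swap
          ?? llm.chat?.model          // plain chat model
          ?? llm.chatModel;           // legacy flat field
    }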

### Client routing (src/llm/client.ts)

`LlmClient` now lazy-loads TWO backends instead of one:
- `chatBackend` — for bulk work (summaries, classifier, dir summaries).
- `askChatBackend` — for ask + dead-code judge, only when askChat is
  configured.

`LlmClient.chat(messages, { useAskModel: true })` checks `askChatCfg`:
- If set → dispatch to ask backend with `useAskModel: false` (the ask
  backend's own `model` field is already the right model; no re-swap
  needed within that backend).
- If unset → dispatch to chat backend with `useAskModel` passed
  through (legacy single-provider behaviour: same backend, swap model
  id via `chat.askModel`).
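
As a sketch (types and helper names beyond chat/useAskModel are
illustrative):

    async chat(messages: ChatMessage[], opts: { useAskModel?: boolean } = {}) {
      if (opts.useAskModel && this.askChatCfg) {
        // Ask backend already carries the right `model`; no re-swap inside it.
        return (await this.getAskChatBackend()).chat(messages, { useAskModel: false });
      }
      // Legacy path: same backend, model id swapped via chat.askModel.
      return (await this.getChatBackend()).chat(messages, { useAskModel: opts.useAskModel });
    }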

`instantiateBackend` factored out from `getChatBackend` so both paths
share construction logic.

`isReachable()` now also probes the askChat backend when configured —
returns false if EITHER chat OR askChat fails, so status output never
claims reachable when ask calls would throw at runtime (e.g.
claude-bridge binary missing on the ask side).
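
Sketch of that AND-semantics (the probe helpers are hypothetical
names):

    async isReachable(): Promise<boolean> {
      if (!(await this.probeBackend(await this.getChatBackend()))) return false;
      if (!this.askChatCfg) return true;   // single-provider setup
      return this.probeBackend(await this.getAskChatBackend());
    }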

### Display + gating (src/bin/codegraph.ts, src/mcp/tools.ts)

- `codegraph status` shows a separate `Ask model:` line when ask
  routes differently from chat. Provider hint in parentheses
  (`Ask model: claude-sonnet-4-6 (claude-bridge)`) appears only in
  true split-provider setups, not for single-provider askModel
  overrides.
- `codegraph ask` CLI trailer now uses getAskModel for the displayed
  model id so the printed `model X` matches what actually generated
  the answer.
- handleAsk and handleDeadCode in the MCP tools now gate on
  getAskModel rather than getChatModel, with error messages updated
  to mention the askChat block as an alternative configuration path.
  Pre-fix, a config with chat=null but askChat configured (rare but
  legitimate) would have failed the gate even though ask was
  perfectly configured.

### Tests (__tests__/llm.test.ts)

Two new tests:

1. `split provider: useAskModel routes to askChat backend when
   configured` — uses TWO fake servers, asserts bulk hits chat server
   only and ask hits askChat server only. Captures and asserts the
   `model` field in the request body of each, guarding against a
   regression where routing is right but the model id sent is wrong.

2. `legacy single-provider: useAskModel stays on chat backend, swaps
   model id` — asserts no-askChat config preserves prior behaviour
   (single backend handles both calls, model id swaps).

`FakeServer` extended with `lastChatBody` capture so tests can assert
which model id reached which server.

## Live validation

Tested on the codegraph self-repo with chat -> MLX/Qwen3-Coder and
askChat -> claude-bridge/Sonnet. Same ask question that Qwen3-Coder
confabulated about now returns Sonnet's precisely correct answer:
- Pre-split: 6.9s, wrong on PRAGMA foreign_keys direction, cited
  migration 014 (unrelated).
- Post-split: 39.5s, precisely correct including line numbers
  (`summarizer.ts:189-197`, `queries.ts:2125`) and the right
  migration filename (022-add-content-hash-index).

## Backwards compatibility

Existing configs with just `chat: { ..., askModel: "..." }` continue
to work unchanged — askChat is optional. Test #2 covers this path.

## Reviewer trail

Two passes. Pass 1: REQUEST_CHANGES + 4 findings.
- (1) handleAsk/handleDeadCode used getChatModel — addressed,
  switched to getAskModel with updated error messages.
- (2) status display omitted ask model — addressed, added
  conditional Ask model line in split-provider setup.
- (3) isReachable didn't probe askChat — addressed.
- (4) split-provider test didn't assert request-body model id —
  addressed.

Pass 2: APPROVE + 2 info findings (stale test comment, redundant
provider hint for single-provider askModel override) — both
addressed.

Suite: 1089 / 13 skip / 0 fail (was 1087).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… bridge/api

Benching against the live MLX server with 12 real codegraph symbols at
batch sizes 1/3/5/10 surfaced an asymmetry the original "batch all
LLM calls uniformly" instinct would have got wrong:

- Wall-clock on local single-stream backends (MLX, llama.cpp): IDENTICAL
  across batch sizes (~9.3-9.9s for 12 symbols regardless). No per-call
  overhead to amortise; total generation work dominates.
- Quality: monotonically degraded — BATCH=3 indistinguishable from
  BATCH=1 on Qwen3-Coder; BATCH=5 starts losing specifics; BATCH=10
  noticeably terser.
- Implication: batching helps ONLY when there's per-call overhead
  (claude-bridge subprocess spawn ~1-2s, anthropic-api network
  roundtrip). For local servers, batching is pure quality loss.

For codegraph self-repo on claude-bridge, the math: 672 summaries at
~30 sym/min (Phase A baseline) = 22 min. Batched 3-per-call:
~224 calls × ~5s / 4 concurrency ≈ 5 min. ~4-5x speedup with
quality holding.

## Implementation (cache-aware)

Worker rewritten in src/llm/summarizer.ts:

- New `summaryBatchSize?: number` SummarizerOption, default 1.
- Per-worker `pending` accumulator. Cache hits NEVER batch — they
  short-circuit before pending push, so they don't burn batch slots.
  Only cache MISSES enter the batch.
- `flushBatch` calls client.chat with `buildBatchPrompt`, parses via
  `parseBatchSummaries` (mirrors classifier's bracket-aware string-
  state JSON extractor + ≥80% coverage gate).
- On parse failure or LlmEndpointError → per-item fallback via
  `generateSingle` so a bad batch can't lose the whole group.
- Single-item batch (e.g. trailing partial) routes straight to
  `generateSingle` to avoid batch-prompt overhead.
- Pre-fix `try/finally`-with-`continue` double-count hazard removed
  by the rewrite — every exit path explicitly increments `done`.
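
Control-flow sketch of the worker loop (pending, flushBatch and
generateSingle are the names above; types and helpers around them are
illustrative):

    const pending: SummaryJob[] = [];
    for (const job of jobs) {
      if (cacheHit(job)) { done++; continue; }   // hits never enter the batch
      pending.push(job);
      if (pending.length >= batchSize) {
        await flushBatch(pending.splice(0));     // parse failure / endpoint error
      }                                          //   -> per-item generateSingle inside
    }
    // Trailing partial: a single leftover skips the batch prompt entirely.
    if (pending.length === 1) await generateSingle(pending[0]);
    else if (pending.length) await flushBatch(pending.splice(0));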

## Provider gating

src/index.ts startBackgroundSummarization picks the default based on
chat.provider:

- `claude-bridge` / `anthropic-api` → 3 (per-call overhead amortises)
- `openai-compat` → 1 (most local servers don't continuous-batch)

Users can override via `chat.summaryBatchSize` in config — useful for
high-latency openai-compat endpoints (cloud-hosted proxies that pay
network roundtrip cost).

The public CodeGraph.summarizeAll() wrapper applies the same cascade
(caller option > config > provider default).
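
The cascade as a sketch (resolveBatchSize is a hypothetical name; the
precedence is the documented one):

    function resolveBatchSize(callerOpt: number | undefined,
                              cfg: number | undefined,
                              provider: string): number {
      if (callerOpt !== undefined) return callerOpt;   // caller option
      if (cfg !== undefined) return cfg;               // chat.summaryBatchSize
      return provider === 'claude-bridge' || provider === 'anthropic-api'
        ? 3    // per-call overhead amortises
        : 1;   // openai-compat local servers
    }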

## Token budget

batchMaxTokens = max(160, batch.length * 120). Per-item budget breaks
down to ~60 tokens summary text (200 chars / 3.5 chars per token) +
~40 tokens JSON framing + ~20 tokens headroom for verbose models.
batch=5 gets 600 tokens, batch=10 gets 1200 — comfortable margin.
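
The same budget as code (arithmetic verbatim from the paragraph above):

    // 60 (summary text) + 40 (JSON framing) + 20 (headroom) = 120/item;
    // the 160 floor keeps a one-item batch from being squeezed.
    const batchMaxTokens = Math.max(160, batch.length * 120);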

## Tests (4 new)

- `parseBatchSummaries extracts summaries, tolerates prose, rejects
  under-coverage` — happy path, prose-wrapped JSON, bracket-in-string,
  sparse rejection, malformed.
- `parseBatchSummaries: ≥80% coverage gate works at batch=5` — locks
  down the boundary (4 of 5 passes, 3 of 5 fails).
- `summary batching reduces chat-call count when batchSize > 1` —
  asserts `incremental ≤ ceil(generated/3) + 2` slop with batchSize=3.
- `summary batching: cache hits never burn batch slots` — second-pass
  summarizeAll on identical bodies asserts ZERO incremental chat calls
  AND `secondPass.cacheHits === firstPass.cacheHits + firstPass.generated`
  (resilient to first-pass body-read errors).

Plus: fake server stub updated to detect batched-summary mode (anchored
on the unique `Write a SINGLE LINE summary` instruction marker) and emit
`{i, summary}` shape responses. Fixed a subtle bug in the existing
batched-prompt detection: it counted bare `N.` line prefixes which
would over-count when a symbol body happened to contain `1. step`
text. Summary-mode now uses `### N.` headers (the summarizer's
distinctive per-item marker).

Suite: 1093 / 13 skip / 0 fail (was 1089).

## Reviewer trail

Two passes. Pass 1: APPROVE + 3 info findings (token budget tight,
test assertion fragile to first-pass errors, missing user config knob)
— all addressed. Pass 2: APPROVE + 1 info finding (public
summarizeAll wrapper didn't thread summaryBatchSize) — addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andreinknv
Owner Author

Stack collapsed into wip-stress-base (now at 56fac32). Commits landed via the fast-forward of wip-stress-base from df96c6c to 56fac32; #3 shows as merged, #4-#7 are closed because their tips are already ancestors of the base. Branch preserved for reference.

@andreinknv andreinknv closed this May 1, 2026
@andreinknv andreinknv deleted the pr3-split-provider-and-summary-batching branch May 1, 2026 11:37
andreinknv added a commit that referenced this pull request May 1, 2026
… server-config flags

Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed:
  #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale
  #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema
  #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe)
  #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters
  #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string
  #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*`
  #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)`
  colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains
       `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed`

CLI/MCP alignment (5/32 → 33+/35):
  - 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar,
    biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs,
    module-summary, role, coverage-query, pending-summaries, save-summaries, review-context
  - 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync,
    codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit,
    codegraph_unlock, codegraph_affected

MCP server-level operator config (`codegraph serve --mcp`):
  - --no-write-tools / --allow-stale-default / --disable-tool (sandboxing)
  - --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model /
    --llm-api-key (operator LLM config; per-project config wins on conflict)
  - New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders

Architectural cleanups:
  - `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces
    growing string-comparison chain in execute())
  - `withAllowStale` registry injection only on tools that DO see the gate
  - DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts`
  - `LlmClient.isEmbeddingReachable` for split-provider correctness
  - SyncResult `lockContention` flag → handleSync emits distinct retryable message
  - `clearStructural` deletes from build_context_refs (was orphan-leaking on --force)
  - cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs)

Migrations:
  023-files-is-test.ts — add `files.is_test`
  024-build-context-refs.ts — add `build_context_refs` table

Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes:
  - JSONC URL strip via state machine (was eating `https://` tails)
  - classifyFreshness very_stale now requires isStale (in-sync-but-old → recent)
  - Dynamic imports also match template_string nodes
  - process.exit deferred until after finally cleanup in runViaMCP
  - --same-language / --different-language mutual exclusion guard
  - help-tools CLI bypasses isInitialized (works without a project)
  - handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks)
  - handleAffected errors instead of silently dropping unsupported glob filters
  - mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block

Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
…ion A #5)

Closes the post-refactor backlog item #5: static large_method flags
evergreen-large symbols indistinguishably from actively-growing ones;
recently_grew surfaces the latter as the actionable refactor target.

Architecture:
- Migration 027 adds node_loc_history(node_id, indexed_ts, loc) — one
  row per analyseProject pass that touched the symbol. The per-file
  content-hash short-circuit means unchanged symbols are not
  re-snapshotted; the compound index (node_id, indexed_ts DESC) makes
  "previous snapshot for this node" a sub-millisecond seek.
- src/db/queries-loc-history.ts exports recordLocSnapshots (batch
  insert with the canonical FK-safe SELECT WHERE EXISTS / ON CONFLICT
  pattern from queries-summaries) and getPriorLocSnapshots (per-node
  seek; one prepared stmt, N steps).
- biomarkers/index.ts: at analyseProject start, capture nowMs once;
  per-file batch-fetch prior snapshots, evaluate rules with prior +
  nowMs threaded into RuleContext, persist a fresh snapshot for every
  symbol whose metrics we computed.
- biomarkers/engine.ts: new evaluateRecentlyGrew rule. Triggers when
  prior_loc >= 20 AND current_loc > prior_loc * 1.5 AND
  (now - prior_ts) <= 30d. Severity ladder: 1.5–2x info,
  2–3x warning, ≥3x error. Metric is round(ratio * 100) so the
  ranked-mode sort orders by growth percent.
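
Condensed sketch of the trigger + ladder (the in-tree rule reads its
inputs from RuleContext; names here are illustrative):

    const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

    function evaluateRecentlyGrew(priorLoc: number, priorTs: number,
                                  loc: number, nowMs: number) {
      if (priorLoc < 20 || loc <= priorLoc * 1.5) return null;
      if (nowMs - priorTs > THIRTY_DAYS_MS) return null;
      const ratio = loc / priorLoc;
      const severity = ratio >= 3 ? 'error' : ratio >= 2 ? 'warning' : 'info';
      return { rule: 'recently_grew', severity, metric: Math.round(ratio * 100) };
    }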

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 27 (foundation +
  pr19-improvements).
- #5 node_loc_history added to BOTH the migration AND schema.sql so
  fresh installs initialize correctly.
- #7 dead-export check: dropped recordLocSnapshot/getPriorLocSnapshot
  singulars after the reviewer flagged YAGNI; only the batch
  variants are exported (and called).

9 engine-level tests for the rule logic. Full suite 1393/34/0
(+9 from prior). Persistence path is exercised implicitly by every
existing biomarker test (silent green confirms the schema + writes
are sound on real fixtures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
First half of the trace-logging arc that backs the viewer's
Agent-trace tab. This commit lands the storage + logger; the
MCP server hookup (commit B) and the viewer endpoints (commit C)
follow.

- Migration 028 + schema.sql: mcp_sessions(id, started_ts,
  last_activity_ts, tool_count) + mcp_tool_calls(session_id, step,
  ts, tool_name, args_json, result_summary, duration_ms). FK
  CASCADE from calls → sessions; idx_mcp_tool_calls_ts gives the
  retention sweep an O(log n) scan.
- src/db/queries-trace.ts: insertSession, appendToolCall (transaction
  that bumps the parent session's tool_count atomically),
  pruneToolCalls (cap rows + GC orphan sessions), recentSessions,
  callsForSession.
- src/trace/logger.ts: TraceLogger class. Constructor inserts a
  session row; log() appends + prunes-every-N. All persistence is
  best-effort (DB write failures log at debug, never throw —
  trace must not lose tool results). Args + result-summary are
  truncated at 2000 chars.
- generateSessionId(): unix-seconds prefix + 6 random hex chars,
  sortable by time.
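
A sketch of that ID shape (the separator is a guess; the sortable
unix-seconds prefix is the contract):

    import { randomBytes } from 'node:crypto';

    function generateSessionId(): string {
      const secs = Math.floor(Date.now() / 1000);           // sortable prefix
      return `${secs}-${randomBytes(3).toString('hex')}`;   // 6 hex chars
    }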

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 28 (foundation +
  pr19-improvements).
- #5 mcp_sessions + mcp_tool_calls in BOTH the migration AND
  schema.sql.

10 logger round-trip tests. Full suite 1422/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
Second commit of the trace-logging arc. Wraps every
toolHandler.execute() call in a TraceLogger.log() so the call
flows into the mcp_tool_calls table that landed in 027139b.
The viewer's Agent-trace tab (commit C) reads it back.

Lifecycle:
- TraceLogger is created lazily once `cg` is open (via
  ensureTraceLogger() called on each dispatch). Skipped entirely
  when --no-write-tools is passed: the spirit of that flag is "no
  DB writes", and trace logging is a write path.
- log() is contractually best-effort (DB failures swallowed at
  debug). The dispatch site additionally wraps it in try/finally
  so even a contract violation can never strand the tool result —
  sendResult always fires.

Reviewer-memo gates passed:
- #1 docstring rot: ensureTraceLogger and the dispatch comments
  match the implementation.
- #2 best-effort claim is upheld at both layers (logger try/catch
  + caller try/finally).
- #5 N/A — this commit doesn't add tables.

The 3-line wiring isn't separately tested; TraceLogger has 10
round-trip tests covering every behavior the wiring composes,
and the wiring has no branches. Suite 1422/34/0 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
Closes the cyclomatic + maxNesting "—" gap in the viewer's right
pane. Previously these fields only had a value when a biomarker
finding fired (via the metric on `code_health_findings`); clean
symbols read null. Now every analysable symbol has a row in
node_metrics with all six engine-computed metrics, overwritten
on each pass.

Architecture:
- Migration 029 + node_metrics table: PK on node_id, overwritten
  per pass via a single FK-safe upsert. Distinct from
  code_health_findings (threshold-only) and node_loc_history
  (LoC time-series for recently_grew). Schema in BOTH the
  migration AND schema.sql per memo #5.
- queries-metrics.ts: upsertNodeMetricsBatch (one transaction,
  WHERE EXISTS guard), getNodeMetrics (single-row reader).
- biomarkers/index.ts: evaluateAnalysableNodes returns a third
  map of metrics-by-node; analyseSingleFile persists to the new
  table after the findings persist.
- BIOMARKER_CACHE_KEY bumped v1 → v2 so existing indexes
  self-heal: without the bump, the per-file content-hash cache
  would short-circuit unchanged files and leave node_metrics
  empty until users hit `--force` themselves.
- Viewer server: buildMetricsBlock now reads from node_metrics
  first; falls back to the per-finding metric on older indexes.
  paramCount also surfaced (not yet displayed in the UI).

Smoke-tested on the codegraph self-index after re-index:
extractFromSource now reports cyclomatic 3 / maxNesting 1 /
paramCount 3 — matching what the engine computes during
biomarker analysis.

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 29.
- #5 node_metrics in BOTH migration AND schema.sql.

Updated mcp-reindex test (cache key bump). Suite 1425/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 3, 2026
Adds appendMoreHint() helper in shared.ts and wires it into the six
highest-traffic capped tools: callers, callees, search, biomarkers,
hotspots, history. When the tool detects its result was at-or-above
the requested limit, the response gets a tail line:
"> Result capped — pass a higher `limit` to see more."

Lets agents skip a redundant re-fetch when the absence of the hint
confirms they got the complete set, AND signals when re-running with
a higher limit could surface more — backlog #5, scoped narrower than
the literal spec after a verification pass found that almost every
tool already caps results.

Each tool computes hasMore from its own ground truth — exact
rawCount > shownCount for callers/callees/search, heuristic
rows.length >= limit for the SQL-LIMIT tools (biomarkers/hotspots/
history). Hint goes after truncateOutput in every tool so the 15K
budget can't chop it. Multi-match grouped paths in callers/callees
also wired (formatGroupedCallers/Callees now return { text, hasMore }).
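
Sketch of the per-tool call site (formatResults and the count names are
placeholders; the ordering after truncateOutput is the documented one):

    let text = truncateOutput(formatResults(rows));
    const hasMore = rawCount > shownCount;   // or rows.length >= limit (heuristic)
    text = appendMoreHint(text, hasMore);
    // hasMore=false returns text unchanged; hasMore=true appends:
    //   "> Result capped — pass a higher `limit` to see more."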

11 new tests in __tests__/more-hint.test.ts cover the helper and
single-match integration paths; suite 1442/34/0 (was 1431).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 8, 2026
… 040)

Ships the role-classifier cascade after the empirical N=48 ablation
established that docstring-input retains comparable quality to
summary-input (B<->baseline 66.7% vs A<->baseline 68.8% — within
local-LLM noise; eval memory at
project_role_classifier_input_eval_2026_05_08).

Migration 040
  - Adds nodes.role + nodes.role_model + idx_nodes_role
  - Backfills from symbol_summaries.role; idempotent guard
    (nodes.role IS NULL) so a direct re-invoke can't clobber
    classifier-assigned values
  - Pre-016 hand-rolled-test guard via sqlite_master + pragma_table_info
  - schema.sql parity (memo item #5)
  - Old symbol_summaries.role + idx_summaries_role left in place for
    one cycle; future migration drops them

Classifier cascade input
  - Candidate.summary now nullable; new Candidate.docstring
  - descriptionFor(c) picks summary when present, falls back to
    docstring (both .trim()); empty strings count as absent (see the
    sketch after this list)
  - buildSinglePrompt and buildBatchPrompt use the cascade output
    and label the source ("Description (from summary)" vs "Description
    (from docstring)") so the LLM knows the input provenance
  - getClassifiableSummaries -> getClassifiableNodes (LEFT JOIN
    summary, returns nodes where (summary OR docstring) AND no role
    for this model)
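
A sketch of the cascade helper (return shape illustrative; the
trim-and-fallback rule is the documented one):

    function descriptionFor(c: Candidate): { text: string; from: 'summary' | 'docstring' } | null {
      const s = c.summary?.trim();
      if (s) return { text: s, from: 'summary' };
      const d = c.docstring?.trim();
      if (d) return { text: d, from: 'docstring' };
      return null;   // neither present: not classifiable
    }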

Consumer read paths
  - getSymbolRoles / findNodesByRole / getRoleCounts now read from
    nodes.role (not symbol_summaries.role)
  - upsertSymbolRole writes to nodes.role / nodes.role_model
  - All in-tree consumers (codegraph_role tool, status, module tool)
    use the helpers; transparent to them

Test fixtures + 1 new test
  - foundation.test.ts + pr19-improvements.test.ts: schema 39 -> 40
  - llm-tiers.test.ts wipe target: nodes.role (not symbol_summaries.role)
  - New focused test: classifier input cascade — docstring-only node
    gets classified when summary is absent

Docs
  - CLAUDE.md classify line + server-instructions.ts codegraph_role
    description updated to mention the cascade input
  - JSDoc rot purged across 8 sites in classifier.ts /
    codegraph-llm-service.ts / bin/codegraph.ts / mcp/tools/classify.ts
    (all "already-summarised symbols" / "every summarised symbol"
    replaced with cascade-aware wording)

Live measurement on this codegraph corpus (1932 newly-classified
symbols via the cascade, end-to-end run)
  - Roles before: 1414
  - Roles after:  3346  (+1932 = +136% coverage)
  - Via summary path:    1416
  - Via docstring path:  1930  (the cascade-only contribution)
  - Unknown rate: 78/3346 = 2.3%

Suite 1752/0/34 (was 1751; +1 cascade test). Typecheck clean. Two
reviewer rounds: round 1 caught JSDoc rot across 8 sites + flagged
backfill idempotency; round 2 APPROVE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
Four request_changes issues from the independent reviewer pass on
87109da^..ef21afd:

1. schema.sql: add hnsw_meta DDL alongside migration 042 — fresh
   installs that initialize from schema.sql directly were silently
   falling back to vec0 forever (recurring scrutiny pattern #5).

2. codegraph-llm-pass.ts: clear svc.hnswByDim alongside the existing
   embeddingCache.invalidate() in runEmbedPhase. The background embed
   path was leaving stale HNSW indexes in the in-memory query cache
   until the next foreground summarizeAll/embedAll fired the clear.

3. hnsw-build.ts: use Math.abs on the rowCount delta so pure-deletion
   churn (no new inserts, maxRowid unchanged) still triggers a
   rebuild. The growth-only check would have left the index stale
   indefinitely after a large delete.

4. codegraph-llm-service.ts: narrow tryLoadHnswForDim catch to the
   "no such table" shape so genuine DB failures (SQLITE_CORRUPT /
   SQLITE_IOERR) surface instead of being silently swallowed as
   vec0-fallback signals.
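
Fix (4) in sketch form (the load call is a placeholder; the narrowed
predicate is the point):

    try {
      return loadHnswFromTable(db, dim);
    } catch (err) {
      // Missing table = pre-042 index: legitimate vec0 fallback.
      if (err instanceof Error && /no such table/i.test(err.message)) return null;
      throw err;   // SQLITE_CORRUPT / SQLITE_IOERR must surface
    }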

Suite remains 2100/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
…rieval

Subtask colbymchenry#10.1 of the embedding-features arc Stage 2. Adds a `grain
TEXT NOT NULL DEFAULT 'symbol'` column to `symbol_embeddings` so the
table can carry per-symbol embeddings (existing) and per-file
aggregated embeddings (forthcoming subtask colbymchenry#10.2). File-grain rows
will reuse the file's nodes(id) since `getEmbeddableNodes` already
excludes file kinds from per-symbol embedding — no PK collision.

Schema change is column-additive (ALTER TABLE ADD COLUMN); no
backfill needed because the default 'symbol' covers every existing
row. schema.sql updated alongside the migration per recurring
scrutiny pattern #5. Schema-version test assertions bumped 42 → 43
per recurring scrutiny pattern #4.

Suite remains 2100/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 9, 2026
…#2)

Wires the heuristic commit-intent classifier (shipped in d12ee01) into
a SHA-keyed persistence layer:

- Migration 045: new `commit_intents (sha PK, intent, score, seen_at)`
  table with idx_commit_intents_intent for per-intent queries.
- schema.sql parity (recurring scrutiny pattern #5).
- Schema-version test assertions bumped 44 → 45.
- src/db/queries-commit-intents.ts (new): recordCommitIntent /
  recordCommitIntents (batch upsert with txn) / getCommitIntent /
  getCommitIntents (multi-SHA Map fetch) / aggregateIntentBreakdown
  (returns Record<intent, count> for codegraph_history-style folds) /
  clearCommitIntents (cochange full-rescan path).

Mining-side integration deferred — `mineCoChanges` doesn't currently
emit subjects (only SHAs + file pairs). Adding a subject-capture pass
is tracked as a Stage 7 #2 follow-up.

Suite remains 2374/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request May 10, 2026
Sibling-table approach (a design deferred after earlier-session friction
with the compound-PK rebuild on symbol_embeddings):

- Migration 046: new `symbol_chunk_embeddings (node_id, chunk_idx)` PK
  table, FK CASCADE to nodes(id). chunk_idx >= 1 for chunks; the
  canonical chunk_idx=0 stays in symbol_embeddings.
- schema.sql parity (recurring scrutiny pattern #5).
- Schema-version test bumps 45 → 46.
- src/embeddings/multi-vec.ts (NEW): `chunkSymbol` line-window split
  with overlap (sketch below). Constants:
    LARGE_SYMBOL_LOC_THRESHOLD = 500
    CHUNK_LOC = 200
    CHUNK_OVERLAP = 50
- src/db/queries-chunk-embeddings.ts (NEW): upsertChunkEmbedding /
  getChunkEmbeddingsByNode / clearChunkEmbeddings / countChunkEmbeddings.
- 33 tests in __tests__/multi-vec.test.ts covering chunking boundary
  cases, overlap correctness, persistence, cascade-on-delete.
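
Window-arithmetic sketch of chunkSymbol (the real signature may differ;
constants are the ones listed above):

    const LARGE_SYMBOL_LOC_THRESHOLD = 500;
    const CHUNK_LOC = 200;
    const CHUNK_OVERLAP = 50;

    function chunkSymbol(lines: string[]): string[][] {
      if (lines.length <= LARGE_SYMBOL_LOC_THRESHOLD) return [];  // canonical row only
      const chunks: string[][] = [];
      const stride = CHUNK_LOC - CHUNK_OVERLAP;                   // 150-line step
      for (let start = 0; start < lines.length; start += stride) {
        chunks.push(lines.slice(start, start + CHUNK_LOC));
        if (start + CHUNK_LOC >= lines.length) break;             // final window
      }
      return chunks;
    }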

What's NOT done (deferred to a follow-up PR): runEmbedPhase integration
to actually generate chunk embeddings, findSimilarViaVec multi-vec
retrieval merge, vec0 mirror table for chunks. Foundation is sound;
the wiring is the next step.

Suite: 2374 → 2433 (+33 from new tests + agent-already-bumped vec
tests). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>