feat+perf(llm): split chat/askChat provider + summary batching [WIP 3/5] #5
Closed
andreinknv wants to merge 2 commits into pr2-cache-preservation
End-to-end measurement of the cache-preservation work uncovered a
real cost/quality tradeoff: bulk summary work runs 2.6x faster on a
local MLX-served model (Qwen3-Coder-30B-A3B-Instruct-4bit-DWQ via
mlx-openai-server) than on claude-bridge -> Haiku, with comparable
quality. But the same Qwen3-Coder model confidently confabulates
WRONG answers on synthesis-heavy ask queries — it got the PRAGMA
foreign_keys direction reversed and cited the wrong migration
filename when asked about the cache-preservation mechanism. Sonnet
via claude-bridge gave precisely correct answers, but took 12-40s.
The pre-existing schema (`chat.askModel: string`) only allowed swapping
the model id within the SAME provider/endpoint. To route bulk and
ask through entirely DIFFERENT providers, this PR adds a top-level
`askChat` block.
## Changes
### Schema (src/types.ts, src/llm/client.ts)
New optional `llm.askChat` block mirroring `chat`:
"chat": { "provider": "openai-compat",
"endpoint": "http://localhost:8081/v1",
"model": "qwen3-coder" },
"askChat": { "provider": "claude-bridge",
"model": "claude-sonnet-4-6" }
`LlmEndpointConfig.askChat?: ChatProviderConfig | null` parallels the
existing `chat` slot. `normalizeEndpointConfig` returns the new field
alongside chat + embeddings.
### Resolver (src/llm/provider.ts)
New `resolveAskChat()` mirrors `resolveChat()` but never reads legacy
flat fields — askChat is opt-in only. Wired into `resolveLlmProviders`
so the resulting `ResolvedLlm` carries both `chat` and `askChat` slots.
New `getAskModel()` helper with cascade:
- askChat.model -> chat.askModel -> chat.model -> chatModel (legacy)
This is the right primitive for any caller that wants to know which
model id will actually answer an ask/dead-code call.
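A minimal sketch of that cascade in TypeScript (shapes condensed from the config examples above; the real types live in src/types.ts and src/llm/provider.ts):

```ts
// Sketch only — field names taken from this PR's description.
interface ChatProviderConfig {
  provider: string;
  endpoint?: string;
  model: string;
  askModel?: string; // legacy same-provider override
}

interface ResolvedLlm {
  chat?: ChatProviderConfig | null;
  askChat?: ChatProviderConfig | null;
  chatModel?: string; // legacy flat field
}

// First defined value wins:
// askChat.model -> chat.askModel -> chat.model -> chatModel (legacy).
function getAskModel(llm: ResolvedLlm): string | undefined {
  return (
    llm.askChat?.model ??
    llm.chat?.askModel ??
    llm.chat?.model ??
    llm.chatModel
  );
}
```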
### Client routing (src/llm/client.ts)
`LlmClient` now lazy-loads TWO backends instead of one:
- `chatBackend` — for bulk work (summaries, classifier, dir summaries).
- `askChatBackend` — for ask + dead-code judge, only when askChat is
configured.
`LlmClient.chat(messages, { useAskModel: true })` checks `askChatCfg`:
- If set → dispatch to ask backend with `useAskModel: false` (the ask
backend's own `model` field is already the right model; no re-swap
needed within that backend).
- If unset → dispatch to chat backend with `useAskModel` passed
through (legacy single-provider behaviour: same backend, swap model
id via `chat.askModel`).
`instantiateBackend` factored out from `getChatBackend` so both paths
share construction logic.
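The routing rule reduces to one branch. A sketch, reusing `ChatProviderConfig` from the sketch above (the backend accessors stand in for the lazy-loading described; construction details elided):

```ts
interface Message { role: 'system' | 'user' | 'assistant'; content: string }
interface Backend { chat(m: Message[], o: { useAskModel: boolean }): Promise<string> }

// Sketch of the dispatch decision only.
class LlmClientSketch {
  constructor(
    private askChatCfg: ChatProviderConfig | null,
    private getChatBackend: () => Promise<Backend>,
    private getAskChatBackend: () => Promise<Backend>,
  ) {}

  async chat(messages: Message[], opts: { useAskModel?: boolean } = {}): Promise<string> {
    if (opts.useAskModel && this.askChatCfg) {
      // Split-provider path: the ask backend's own `model` field is already
      // the ask model, so useAskModel is cleared before dispatch (no re-swap).
      const backend = await this.getAskChatBackend();
      return backend.chat(messages, { useAskModel: false });
    }
    // Legacy single-provider path: same backend; the model-id swap via
    // chat.askModel happens inside the backend when useAskModel is true.
    const backend = await this.getChatBackend();
    return backend.chat(messages, { useAskModel: opts.useAskModel ?? false });
  }
}
```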
`isReachable()` now also probes the askChat backend when configured —
returns false if EITHER chat OR askChat fails, so status output never
claims reachable when ask calls would throw at runtime (e.g.
claude-bridge binary missing on the ask side).
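That reachability contract is an AND over the configured backends. A sketch, where `probe` is a hypothetical helper that resolves false instead of throwing:

```ts
// Sketch: reachable only when every configured backend answers its probe.
async function isReachable(
  chatCfg: object,
  askChatCfg: object | null,
  probe: (cfg: object) => Promise<boolean>,
): Promise<boolean> {
  const checks = [probe(chatCfg)];
  if (askChatCfg) checks.push(probe(askChatCfg));
  return (await Promise.all(checks)).every(Boolean);
}
```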
### Display + gating (src/bin/codegraph.ts, src/mcp/tools.ts)
- `codegraph status` shows a separate `Ask model:` line when ask
routes differently from chat. Provider hint in parentheses
(`Ask model: claude-sonnet-4-6 (claude-bridge)`) appears only in
true split-provider setups, not for single-provider askModel
overrides.
- `codegraph ask` CLI trailer now uses getAskModel for the displayed
model id so the printed `model X` matches what actually generated
the answer.
- handleAsk and handleDeadCode in the MCP tools now gate on
getAskModel rather than getChatModel, with error messages updated
to mention the askChat block as an alternative configuration path.
Pre-fix, a config with chat=null but askChat configured (rare but
legitimate) would have failed the gate even though ask was
perfectly configured.
### Tests (__tests__/llm.test.ts)
Two new tests:
1. `split provider: useAskModel routes to askChat backend when
configured` — uses TWO fake servers, asserts bulk hits chat server
only and ask hits askChat server only. Captures and asserts the
`model` field in the request body of each, guarding against a
regression where routing is right but the model id sent is wrong.
2. `legacy single-provider: useAskModel stays on chat backend, swaps
model id` — asserts no-askChat config preserves prior behaviour
(single backend handles both calls, model id swaps).
`FakeServer` extended with `lastChatBody` capture so tests can assert
which model id reached which server.
## Live validation
Tested on the codegraph self-repo with chat -> MLX/Qwen3-Coder and
askChat -> claude-bridge/Sonnet. Same ask question that Qwen3-Coder
confabulated about now returns Sonnet's precisely correct answer:
- Pre-split: 6.9s, wrong on PRAGMA foreign_keys direction, cited
migration 014 (unrelated).
- Post-split: 39.5s, precisely correct including line numbers
(`summarizer.ts:189-197`, `queries.ts:2125`) and the right
migration filename (022-add-content-hash-index).
## Backwards compatibility
Existing configs with just `chat: { ..., askModel: "..." }` continue
to work unchanged — askChat is optional. Test #2 covers this path.
## Reviewer trail
Two passes. Pass 1: REQUEST_CHANGES + 4 findings.
- (1) handleAsk/handleDeadCode used getChatModel — addressed,
switched to getAskModel with updated error messages.
- (2) status display omitted ask model — addressed, added
conditional Ask model line in split-provider setup.
- (3) isReachable didn't probe askChat — addressed.
- (4) split-provider test didn't assert request-body model id —
addressed.
Pass 2: APPROVE + 2 info findings (stale test comment, redundant
provider hint for single-provider askModel override) — both
addressed.
Suite: 1089 / 13 skip / 0 fail (was 1087).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
perf(llm): summary batching, cache-aware, provider-defaulted to 3 for bridge/api
Bench against the live MLX server with 12 real codegraph symbols at
batch sizes 1/3/5/10 surfaced an asymmetry the original "batch all
LLM calls uniformly" instinct would have got wrong:
- Wall-clock on local single-stream backends (MLX, llama.cpp): IDENTICAL
across batch sizes (~9.3-9.9s for 12 symbols regardless). No per-call
overhead to amortise; total generation work dominates.
- Quality: monotonically degraded — BATCH=3 indistinguishable from
BATCH=1 on Qwen3-Coder; BATCH=5 starts losing specifics; BATCH=10
noticeably terser.
- Implication: batching helps ONLY when there's per-call overhead
(claude-bridge subprocess spawn ~1-2s, anthropic-api network
roundtrip). For local servers, batching is pure quality loss.
For codegraph self-repo on claude-bridge, the math: 672 summaries at
~30 sym/min (Phase A baseline) = 22 min. Batched 3-per-call:
~224 calls × ~5s / 4 concurrency ≈ 5 min. ~4-5x speedup with
quality holding.
## Implementation (cache-aware)
Worker rewritten in src/llm/summarizer.ts:
- New `summaryBatchSize?: number` SummarizerOption, default 1.
- Per-worker `pending` accumulator. Cache hits NEVER batch — they
short-circuit before pending push, so they don't burn batch slots.
Only cache MISSES enter the batch.
- `flushBatch` calls client.chat with `buildBatchPrompt`, parses via
`parseBatchSummaries` (mirrors classifier's bracket-aware string-
state JSON extractor + ≥80% coverage gate).
- On parse failure or LlmEndpointError → per-item fallback via
`generateSingle` so a bad batch can't lose the whole group.
- Single-item batch (e.g. trailing partial) routes straight to
`generateSingle` to avoid batch-prompt overhead.
- Pre-fix `try/finally`-with-`continue` double-count hazard removed
by the rewrite — every exit path explicitly increments `done`.
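A condensed sketch of that worker loop (helper names from this description, stubbed with `declare`; concurrency plumbing and the exact fallback semantics elided):

```ts
type CodeSymbol = { id: string; body: string };
declare function cacheGet(s: CodeSymbol): string | undefined;
declare function cachePut(s: CodeSymbol, summary: string): void;
declare function generateSingle(s: CodeSymbol): Promise<void>;
declare function buildBatchPrompt(b: CodeSymbol[]): string;
declare function parseBatchSummaries(raw: string, n: number): Map<number, string>; // throws under the 80% gate
declare const client: { chat(prompt: string): Promise<string> };

async function flushBatch(batch: CodeSymbol[], progress: { done: number }) {
  try {
    const parsed = parseBatchSummaries(await client.chat(buildBatchPrompt(batch)), batch.length);
    for (const [i, summary] of parsed) cachePut(batch[i], summary);
  } catch {
    // A bad batch never loses the whole group: fall back per item.
    for (const sym of batch) await generateSingle(sym);
  }
  progress.done += batch.length; // one explicit increment per exit path
}

async function workerLoop(queue: CodeSymbol[], batchSize: number, progress: { done: number }) {
  const pending: CodeSymbol[] = [];
  for (const sym of queue) {
    if (cacheGet(sym)) {
      progress.done++; // cache hits short-circuit: they never burn a batch slot
      continue;
    }
    pending.push(sym);
    if (pending.length >= batchSize) await flushBatch(pending.splice(0), progress);
  }
  if (pending.length === 1) {
    await generateSingle(pending[0]); // trailing single: no batch-prompt overhead
    progress.done++;
  } else if (pending.length > 1) {
    await flushBatch(pending.splice(0), progress);
  }
}
```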
## Provider gating
src/index.ts startBackgroundSummarization picks the default based on
chat.provider:
- `claude-bridge` / `anthropic-api` → 3 (per-call overhead amortises)
- `openai-compat` → 1 (most local servers don't continuous-batch)
Users can override via `chat.summaryBatchSize` in config — useful for
high-latency openai-compat endpoints (cloud-hosted proxies that pay
network roundtrip cost).
The public CodeGraph.summarizeAll() wrapper applies the same cascade
(caller option > config > provider default).
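The cascade itself is small. A sketch:

```ts
// Sketch: caller option > config value > provider default.
type Provider = 'claude-bridge' | 'anthropic-api' | 'openai-compat';

function resolveSummaryBatchSize(
  callerOption: number | undefined,
  configValue: number | undefined, // chat.summaryBatchSize
  provider: Provider,
): number {
  if (callerOption !== undefined) return callerOption;
  if (configValue !== undefined) return configValue;
  // Per-call overhead (subprocess spawn / network roundtrip) amortises for
  // bridge/api; local openai-compat servers gain nothing and lose quality.
  return provider === 'openai-compat' ? 1 : 3;
}
```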
## Token budget
batchMaxTokens = max(160, batch.length * 120). Per-item budget breaks
down to ~60 tokens summary text (200 chars / 3.5 chars per token) +
~40 tokens JSON framing + ~20 tokens headroom for verbose models.
batch=5 gets 600 tokens, batch=10 gets 1200 — comfortable margin.
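In code, the budget is one expression:

```ts
// 120 tokens per item (~60 summary text + ~40 JSON framing + ~20 headroom),
// floored at 160 so tiny batches keep breathing room.
const batchMaxTokens = (batchLength: number) => Math.max(160, batchLength * 120);

batchMaxTokens(5);  // 600
batchMaxTokens(10); // 1200
```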
## Tests (4 new)
- `parseBatchSummaries extracts summaries, tolerates prose, rejects
under-coverage` — happy path, prose-wrapped JSON, bracket-in-string,
sparse rejection, malformed.
- `parseBatchSummaries: ≥80% coverage gate works at batch=5` — locks
down the boundary (4 of 5 passes, 3 of 5 fails).
- `summary batching reduces chat-call count when batchSize > 1` —
asserts `incremental ≤ ceil(generated/3) + 2` slop with batchSize=3.
- `summary batching: cache hits never burn batch slots` — second-pass
summarizeAll on identical bodies asserts ZERO incremental chat calls
AND `secondPass.cacheHits === firstPass.cacheHits + firstPass.generated`
(resilient to first-pass body-read errors).
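For reference, a minimal sketch of the bracket-aware extraction the first two tests exercise: scan for a JSON array while tracking string/escape state so a `]` inside a summary string cannot end the match. The real parser mirrors the classifier's and handles more edge cases; the gate boundary matches the tested 4-of-5 / 3-of-5 split.

```ts
// Sketch: extract the first top-level JSON array from prose-wrapped output.
function extractJsonArray(raw: string): unknown[] | null {
  const start = raw.indexOf('[');
  if (start === -1) return null;
  let depth = 0, inString = false, escaped = false;
  for (let i = start; i < raw.length; i++) {
    const ch = raw[i];
    if (escaped) { escaped = false; continue; }
    if (inString && ch === '\\') { escaped = true; continue; }
    if (ch === '"') { inString = !inString; continue; }
    if (inString) continue;
    if (ch === '[') depth++;
    else if (ch === ']' && --depth === 0) {
      try { return JSON.parse(raw.slice(start, i + 1)) as unknown[]; }
      catch { return null; }
    }
  }
  return null;
}

// Coverage gate at the tested boundary: 4/5 passes, 3/5 fails.
const passesCoverage = (parsed: number, batchSize: number) => parsed / batchSize >= 0.8;
```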
Plus: fake server stub updated to detect batched-summary mode (anchored
on the unique `Write a SINGLE LINE summary` instruction marker) and emit
`{i, summary}` shape responses. Fixed a subtle bug in the existing
batched-prompt detection: it counted bare `N.` line prefixes which
would over-count when a symbol body happened to contain `1. step`
text. Summary-mode now uses `### N.` headers (the summarizer's
distinctive per-item marker).
Suite: 1093 / 13 skip / 0 fail (was 1089).
## Reviewer trail
Two passes. Pass 1: APPROVE + 3 info findings (token budget tight,
test assertion fragile to first-pass errors, missing user config knob)
— all addressed. Pass 2: APPROVE + 1 info finding (public
summarizeAll wrapper didn't thread summaryBatchSize) — addressed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request on May 1, 2026
… server-config flags

Tooling-gap backlog (codegraph/docs/codegraph-tooling-gaps.md) closed:
- #1 freshness severity bucket — `classifyFreshness` with fresh|recent|stale|very_stale
- #2 allowStale flag — opt-in bypass for the heavy-drift gate, registry-injected schema
- #3 module format in status — `module-format.ts` parses package.json + tsconfig (JSONC-safe)
- #4 codegraph_imports tool + import-classifier — file/directory/bare/unresolvable filters
- #5 dynamic imports — extractor catches `import('…')` + `require('…')`, incl. template_string
- #6 build-context refs — new `build_context_refs` table for `__dirname` / `import.meta.*`
- #7 files.is_test flag — column populated by glob; surfaced in status as `(N test)`
- colbymchenry#11 summarize-also-embeds (discovered while dogfooding) — `cg.summarizeAll()` chains `embedAllSummaries`; new `cg.embedAll()` for embed-only path; CLI `codegraph embed`

CLI/MCP alignment (5/32 → 33+/35):
- 13 new CLI commands via `runViaMCP` shim: callers, callees, impact, node, similar, biomarkers, imports, help-tools, explore, hotspots, dead-code, config-refs, sql-refs, module-summary, role, coverage-query, pending-summaries, save-summaries, review-context
- 7 new MCP tools: codegraph_imports, codegraph_embed, codegraph_summarize, codegraph_sync, codegraph_reindex, codegraph_coverage_ingest, codegraph_init, codegraph_uninit, codegraph_unlock, codegraph_affected

MCP server-level operator config (`codegraph serve --mcp`):
- --no-write-tools / --allow-stale-default / --disable-tool (sandboxing)
- --llm-endpoint / --llm-chat-model / --llm-ask-model / --llm-embedding-model / --llm-api-key (operator LLM config; per-project config wins on conflict)
- New CODEGRAPH_LLM_* env vars wired through `mergeLlmEnv` in resolveLlmProviders

Architectural cleanups:
- `bypassFreshnessGate` and `isWriteTool` declarative flags on ToolModule (replaces growing string-comparison chain in execute())
- `withAllowStale` registry injection only on tools that DO see the gate
- DRY of inline copy-paste in 3 hooks → `src/index-hooks/enclosing.ts`
- `LlmClient.isEmbeddingReachable` for split-provider correctness
- SyncResult `lockContention` flag → handleSync emits distinct retryable message
- `clearStructural` deletes from build_context_refs (was orphan-leaking on --force)
- cli:dev npm script + tsx CLI fixed (web-tree-sitter `import type` for type-only refs)

Migrations:
- 023-files-is-test.ts — add `files.is_test`
- 024-build-context-refs.ts — add `build_context_refs` table

Reviewer rounds: 11 total, all REQUEST_CHANGES addressed inline. Notable fixes:
- JSONC URL strip via state machine (was eating `https://` tails)
- classifyFreshness very_stale now requires isStale (in-sync-but-old → recent)
- Dynamic imports also match template_string nodes
- process.exit deferred until after finally cleanup in runViaMCP
- --same-language / --different-language mutual exclusion guard
- help-tools CLI bypasses isInitialized (works without a project)
- handleUninit sweeps projectCache by getProjectRoot (no dangling alias leaks)
- handleAffected errors instead of silently dropping unsupported glob filters
- mergeLlmEnv preserves precedence: legacy flat config wins over env-synthesised block

Suite: 1268 passing, 1 expected red (colbymchenry#8 — undecided), 13 skipped, 1 todo, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request on May 3, 2026
…ion A #5)

Closes the post-refactor backlog item #5: static large_method flags evergreen-large symbols indistinguishably from actively-growing ones; recently_grew surfaces the latter as the actionable refactor target.

Architecture:
- Migration 027 adds node_loc_history(node_id, indexed_ts, loc) — one row per analyseProject pass that touched the symbol. The per-file content-hash short-circuit means unchanged symbols are not re-snapshotted; the compound index (node_id, indexed_ts DESC) makes "previous snapshot for this node" a sub-millisecond seek.
- src/db/queries-loc-history.ts exports recordLocSnapshots (batch insert with the canonical FK-safe SELECT WHERE EXISTS / ON CONFLICT pattern from queries-summaries) and getPriorLocSnapshots (per-node seek; one prepared stmt, N steps).
- biomarkers/index.ts: at analyseProject start, capture nowMs once; per-file batch-fetch prior snapshots, evaluate rules with prior + nowMs threaded into RuleContext, persist a fresh snapshot for every symbol whose metrics we computed.
- biomarkers/engine.ts: new evaluateRecentlyGrew rule. Triggers when prior_loc >= 20 AND current_loc > prior_loc * 1.5 AND (now - prior_ts) <= 30d. Severity ladder: 1.5–2x info, 2–3x warning, ≥3x error. Metric is round(ratio * 100) so the ranked-mode sort orders by growth percent.

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 27 (foundation + pr19-improvements).
- #5 node_loc_history added to BOTH the migration AND schema.sql so fresh installs initialize correctly.
- #7 dead-export check: dropped recordLocSnapshot/getPriorLocSnapshot singulars after the reviewer flagged YAGNI; only the batch variants are exported (and called).

9 engine-level tests for the rule logic. Full suite 1393/34/0 (+9 from prior). Persistence path is exercised implicitly by every existing biomarker test (silent green confirms the schema + writes are sound on real fixtures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
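The trigger and severity ladder, as a sketch (context field names are assumptions; the real RuleContext shape differs):

```ts
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

interface GrowthCtx {
  priorLoc: number | null; // from node_loc_history
  priorTs: number | null;  // ms epoch of the prior snapshot
  currentLoc: number;
  nowMs: number;           // captured once at analyseProject start
}

function evaluateRecentlyGrew(ctx: GrowthCtx) {
  const { priorLoc, priorTs, currentLoc, nowMs } = ctx;
  if (priorLoc === null || priorTs === null) return null; // no history yet
  if (priorLoc < 20) return null;                         // below size floor
  if (nowMs - priorTs > THIRTY_DAYS_MS) return null;      // growth not recent
  const ratio = currentLoc / priorLoc;
  if (ratio <= 1.5) return null;                          // not enough growth
  const severity = ratio >= 3 ? 'error' : ratio >= 2 ? 'warning' : 'info';
  // Metric is the growth percent, so ranked mode sorts by it.
  return { rule: 'recently_grew', severity, metric: Math.round(ratio * 100) };
}
```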
andreinknv added a commit that referenced this pull request on May 3, 2026
First half of the trace-logging arc that backs the viewer's Agent-trace tab. This commit lands the storage + logger; the MCP server hookup (commit B) and the viewer endpoints (commit C) follow.

- Migration 028 + schema.sql: mcp_sessions(id, started_ts, last_activity_ts, tool_count) + mcp_tool_calls(session_id, step, ts, tool_name, args_json, result_summary, duration_ms). FK CASCADE from calls → sessions; idx_mcp_tool_calls_ts gives the retention sweep an O(log n) scan.
- src/db/queries-trace.ts: insertSession, appendToolCall (transaction that bumps the parent session's tool_count atomically), pruneToolCalls (cap rows + GC orphan sessions), recentSessions, callsForSession.
- src/trace/logger.ts: TraceLogger class. Constructor inserts a session row; log() appends + prunes-every-N. All persistence is best-effort (DB write failures log at debug, never throw — trace must not lose tool results). Args + result-summary are truncated at 2000 chars.
- generateSessionId(): unix-seconds prefix + 6 random hex chars, sortable by time.

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 28 (foundation + pr19-improvements).
- #5 mcp_sessions + mcp_tool_calls in BOTH the migration AND schema.sql.

10 logger round-trip tests. Full suite 1422/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
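The id scheme in full, as a sketch (the `-` separator is an assumption):

```ts
import { randomBytes } from 'node:crypto';

// Unix-seconds prefix keeps ids time-sortable; 3 random bytes
// (6 hex chars) avoid same-second collisions.
function generateSessionId(): string {
  const seconds = Math.floor(Date.now() / 1000);
  return `${seconds}-${randomBytes(3).toString('hex')}`;
}
```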
andreinknv added a commit that referenced this pull request on May 3, 2026
Second commit of the trace-logging arc. Wraps every toolHandler.execute() call in a TraceLogger.log() so the call flows into the mcp_tool_calls table that landed in 027139b. The viewer's Agent-trace tab (commit C) reads it back.

Lifecycle:
- TraceLogger is created lazily once `cg` is open (via ensureTraceLogger() called on each dispatch). Skipped entirely when --no-write-tools is passed: the spirit of that flag is "no DB writes", and trace logging is a write path.
- log() is contractually best-effort (DB failures swallowed at debug). The dispatch site additionally wraps it in try/finally so even a contract violation can never strand the tool result — sendResult always fires.

Reviewer-memo gates passed:
- #1 docstring rot: ensureTraceLogger and the dispatch comments match the implementation.
- #2 best-effort claim is upheld at both layers (logger try/catch + caller try/finally).
- #5 N/A — this commit doesn't add tables.

The 3-line wiring isn't separately tested; TraceLogger has 10 round-trip tests covering every behavior the wiring composes, and the wiring has no branches. Suite 1422/34/0 unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
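The two layers compose like this (sketch; names condensed from the description above):

```ts
// Layer 1 lives inside the logger: log() swallows DB failures at debug.
// Layer 2 is the dispatch site: even a contract-violating throw from log()
// cannot strand the tool result, because sendResult sits in finally.
async function dispatchSketch(
  execute: () => Promise<string>,
  logger: { log(result: string): void } | null,
  sendResult: (result: string) => void,
): Promise<void> {
  const result = await execute();
  try {
    logger?.log(result); // best-effort by contract
  } finally {
    sendResult(result);  // always fires
  }
}
```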
andreinknv added a commit that referenced this pull request on May 3, 2026
Closes the cyclomatic + maxNesting "—" gap in the viewer's right pane. Previously these fields only had a value when a biomarker finding fired (via the metric on `code_health_findings`); clean symbols read null. Now every analysable symbol has a row in node_metrics with all six engine-computed metrics, overwritten on each pass.

Architecture:
- Migration 029 + node_metrics table: PK on node_id, overwritten per pass via a single FK-safe upsert. Distinct from code_health_findings (threshold-only) and node_loc_history (LoC time-series for recently_grew). Schema in BOTH the migration AND schema.sql per memo #5.
- queries-metrics.ts: upsertNodeMetricsBatch (one transaction, WHERE EXISTS guard), getNodeMetrics (single-row reader).
- biomarkers/index.ts: evaluateAnalysableNodes returns a third map of metrics-by-node; analyseSingleFile persists to the new table after the findings persist.
- BIOMARKER_CACHE_KEY bumped v1 → v2 so existing indexes self-heal: without the bump, the per-file content-hash cache would short-circuit unchanged files and leave node_metrics empty until users hit `--force` themselves.
- Viewer server: buildMetricsBlock now reads from node_metrics first; falls back to the per-finding metric on older indexes. paramCount also surfaced (not yet displayed in the UI).

Smoke-tested on the codegraph self-index after re-index: extractFromSource now reports cyclomatic 3 / maxNesting 1 / paramCount 3 — matching what the engine computes during biomarker analysis.

Reviewer-memo gates passed:
- #4 schema-version asserts both bumped to 29.
- #5 node_metrics in BOTH migration AND schema.sql.

Updated mcp-reindex test (cache key bump). Suite 1425/34/0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request on May 3, 2026
Adds appendMoreHint() helper in shared.ts and wires it into the six highest-traffic capped tools: callers, callees, search, biomarkers, hotspots, history. When the tool detects its result was at-or-above the requested limit, the response gets a tail line: "> Result capped — pass a higher `limit` to see more."

Lets agents skip a redundant re-fetch when the absence of the hint confirms they got the complete set, AND signals when re-running with a higher limit could surface more — backlog #5, scoped narrower than the literal spec after a verification pass found that almost every tool already caps results.

Each tool computes hasMore from its own ground truth — exact rawCount > shownCount for callers/callees/search, heuristic rows.length >= limit for the SQL-LIMIT tools (biomarkers/hotspots/history). Hint goes after truncateOutput in every tool so the 15K budget can't chop it. Multi-match grouped paths in callers/callees also wired (formatGroupedCallers/Callees now return { text, hasMore }).

11 new tests in __tests__/more-hint.test.ts cover the helper and single-match integration paths; suite 1442/34/0 (was 1431).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
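A sketch of the helper and the two hasMore computations (signature assumed from this message):

```ts
// Append the capped-result hint only when the tool's ground truth says
// more rows exist; the absence of the hint means the set is complete.
function appendMoreHint(text: string, hasMore: boolean): string {
  if (!hasMore) return text;
  return `${text}\n> Result capped — pass a higher \`limit\` to see more.`;
}

// Exact ground truth (callers/callees/search):      rawCount > shownCount
// Heuristic for SQL-LIMIT tools (biomarkers etc.):  rows.length >= limit
```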
andreinknv added a commit that referenced this pull request on May 8, 2026
… 040)
Ships the role-classifier cascade after the empirical N=48 ablation
established that docstring-input retains comparable quality to
summary-input (B<->baseline 66.7% vs A<->baseline 68.8% — within
local-LLM noise; eval memory at
project_role_classifier_input_eval_2026_05_08).
Migration 040
- Adds nodes.role + nodes.role_model + idx_nodes_role
- Backfills from symbol_summaries.role; idempotent guard
(nodes.role IS NULL) so a direct re-invoke can't clobber
classifier-assigned values
- Pre-016 hand-rolled-test guard via sqlite_master + pragma_table_info
- schema.sql parity (memo item #5)
- Old symbol_summaries.role + idx_summaries_role left in place for
one cycle; future migration drops them
Classifier cascade input
- Candidate.summary now nullable; new Candidate.docstring
- descriptionFor(c) picks summary when present, falls back to
  docstring (both .trim()); empty strings count as absent (sketch
  after this list)
- buildSinglePrompt and buildBatchPrompt use the cascade output
and label the source ("Description (from summary)" vs "Description
(from docstring)") so the LLM knows the input provenance
- getClassifiableSummaries -> getClassifiableNodes (LEFT JOIN
summary, returns nodes where (summary OR docstring) AND no role
for this model)
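A sketch of descriptionFor under the nullable Candidate shape (return shape assumed; the real code feeds the source into the prompts' provenance label):

```ts
interface Candidate {
  summary: string | null;
  docstring: string | null;
}

// Summary wins; docstring is the fallback; whitespace-only counts as absent.
function descriptionFor(
  c: Candidate,
): { text: string; source: 'summary' | 'docstring' } | null {
  const summary = c.summary?.trim();
  if (summary) return { text: summary, source: 'summary' };
  const docstring = c.docstring?.trim();
  if (docstring) return { text: docstring, source: 'docstring' };
  return null; // getClassifiableNodes should have filtered these out
}
```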
Consumer read paths
- getSymbolRoles / findNodesByRole / getRoleCounts now read from
nodes.role (not symbol_summaries.role)
- upsertSymbolRole writes to nodes.role / nodes.role_model
- All in-tree consumers (codegraph_role tool, status, module tool)
use the helpers; transparent to them
Test fixtures + 1 new test
- foundation.test.ts + pr19-improvements.test.ts: schema 39 -> 40
- llm-tiers.test.ts wipe target: nodes.role (not symbol_summaries.role)
- New focused test: classifier input cascade — docstring-only node
gets classified when summary is absent
Docs
- CLAUDE.md classify line + server-instructions.ts codegraph_role
description updated to mention the cascade input
- JSDoc rot purged across 8 sites in classifier.ts /
codegraph-llm-service.ts / bin/codegraph.ts / mcp/tools/classify.ts
(all "already-summarised symbols" / "every summarised symbol"
replaced with cascade-aware wording)
Live measurement on this codegraph corpus (1932 newly-classified
symbols via the cascade, end-to-end run)
- Roles before: 1414
- Roles after: 3346 (+1932 = +136% coverage)
- Via summary path: 1416
- Via docstring path: 1930 (the cascade-only contribution)
- Unknown rate: 78/3346 = 2.3%
Suite 1752/0/34 (was 1751; +1 cascade test). Typecheck clean. Two
reviewer rounds: round 1 caught JSDoc rot across 8 sites + flagged
backfill idempotency; round 2 APPROVE.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request on May 9, 2026
Four request_changes issues from the independent reviewer pass on 87109da^..ef21afd:

1. schema.sql: add hnsw_meta DDL alongside migration 042 — fresh installs that initialize from schema.sql directly were silently falling back to vec0 forever (recurring scrutiny pattern #5).
2. codegraph-llm-pass.ts: clear svc.hnswByDim alongside the existing embeddingCache.invalidate() in runEmbedPhase. The background embed path was leaving stale HNSW indexes in the in-memory query cache until the next foreground summarizeAll/embedAll fired the clear.
3. hnsw-build.ts: use Math.abs on the rowCount delta so pure-deletion churn (no new inserts, maxRowid unchanged) still triggers a rebuild. The growth-only check would have left the index stale indefinitely after a large delete.
4. codegraph-llm-service.ts: narrow tryLoadHnswForDim catch to the "no such table" shape so genuine DB failures (SQLITE_CORRUPT / SQLITE_IOERR) surface instead of being silently swallowed as vec0-fallback signals.

Suite remains 2100/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request on May 9, 2026
…rieval

Subtask colbymchenry#10.1 of the embedding-features arc Stage 2. Adds a `grain TEXT NOT NULL DEFAULT 'symbol'` column to `symbol_embeddings` so the table can carry per-symbol embeddings (existing) and per-file aggregated embeddings (forthcoming subtask colbymchenry#10.2). File-grain rows will reuse the file's nodes(id) since `getEmbeddableNodes` already excludes file kinds from per-symbol embedding — no PK collision.

Schema change is column-additive (ALTER TABLE ADD COLUMN); no backfill needed because the default 'symbol' covers every existing row. schema.sql updated alongside the migration per recurring scrutiny pattern #5. Schema-version test assertions bumped 42 → 43 per recurring scrutiny pattern #4.

Suite remains 2100/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request on May 9, 2026
…#2)

Wires the heuristic commit-intent classifier (shipped in d12ee01) into a SHA-keyed persistence layer:

- Migration 045: new `commit_intents (sha PK, intent, score, seen_at)` table with idx_commit_intents_intent for per-intent queries.
- schema.sql parity (recurring scrutiny pattern #5).
- Schema-version test assertions bumped 44 → 45.
- src/db/queries-commit-intents.ts (new): recordCommitIntent / recordCommitIntents (batch upsert with txn) / getCommitIntent / getCommitIntents (multi-SHA Map fetch) / aggregateIntentBreakdown (returns Record<intent, count> for codegraph_history-style folds) / clearCommitIntents (cochange full-rescan path).

Mining-side integration deferred — `mineCoChanges` doesn't currently emit subjects (only SHAs + file pairs). Adding a subject-capture pass is tracked as a Stage 7 #2 follow-up.

Suite remains 2374/0/34. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit that referenced this pull request on May 10, 2026
Sibling-table approach (deferred-design from earlier-session friction with the compound-PK rebuild on symbol_embeddings):

- Migration 046: new `symbol_chunk_embeddings (node_id, chunk_idx)` PK table, FK CASCADE to nodes(id). chunk_idx >= 1 for chunks; the canonical chunk_idx=0 stays in symbol_embeddings.
- schema.sql parity (recurring scrutiny pattern #5).
- Schema-version test bumps 45 → 46.
- src/embeddings/multi-vec.ts (NEW): `chunkSymbol` line-window split with overlap. Constants: LARGE_SYMBOL_LOC_THRESHOLD = 500, CHUNK_LOC = 200, CHUNK_OVERLAP = 50.
- src/db/queries-chunk-embeddings.ts (NEW): upsertChunkEmbedding / getChunkEmbeddingsByNode / clearChunkEmbeddings / countChunkEmbeddings.
- 33 tests in __tests__/multi-vec.test.ts covering chunking boundary cases, overlap correctness, persistence, cascade-on-delete.

What's NOT done (deferred to a follow-up PR): runEmbedPhase integration to actually generate chunk embeddings, findSimilarViaVec multi-vec retrieval merge, vec0 mirror table for chunks. Foundation is sound; the wiring is the next step.

Suite: 2374 → 2433 (+33 from new tests + agent-already-bumped vec tests). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
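A sketch of the window split under those constants (exact boundary handling in the real multi-vec.ts may differ):

```ts
const LARGE_SYMBOL_LOC_THRESHOLD = 500;
const CHUNK_LOC = 200;
const CHUNK_OVERLAP = 50;

// Slide a 200-line window with 50 lines of overlap; stride = 150.
// Symbols at or under the threshold stay whole (the canonical chunk_idx=0).
function chunkSymbol(body: string): string[] {
  const lines = body.split('\n');
  if (lines.length <= LARGE_SYMBOL_LOC_THRESHOLD) return [body];
  const chunks: string[] = [];
  const stride = CHUNK_LOC - CHUNK_OVERLAP;
  for (let start = 0; start < lines.length; start += stride) {
    chunks.push(lines.slice(start, start + CHUNK_LOC).join('\n'));
    if (start + CHUNK_LOC >= lines.length) break; // final window reached the end
  }
  return chunks;
}
```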
Stress-test fixes — PR 3 of 5 (stacked). Base: pr2-cache-preservation.

Commits:
- 2620304 feat(llm): split provider per role — chat vs askChat
- 877930d perf(llm): summary batching, cache-aware, provider-defaulted to 3 for bridge/api

## Summary

Why these are paired: the summary-batching commit's provider gating reads the new `askChat` slot from 2620304, so they must ship together.

### Split provider (2620304)

End-to-end measurement uncovered a real cost/quality tradeoff: bulk summary work runs ~2.6× faster on a local MLX-served Qwen3-Coder-30B than on claude-bridge → Haiku, with comparable summary quality. But the same Qwen3-Coder confidently confabulates wrong answers on synthesis-heavy ask queries. Sonnet via claude-bridge gives precisely correct answers (12-40s).

The pre-existing schema (`chat.askModel: string`) only allowed swapping the model id within the same provider/endpoint. The new top-level `askChat` block lets bulk and ask route through entirely different providers. `getAskModel()` cascade: askChat.model → chat.askModel → chat.model → chatModel (legacy).

### Summary batching (877930d)

Bench against MLX with 12 real symbols at batch sizes 1/3/5/10 (measurements above). Worker accumulates a per-worker `pending` list; cache hits never enter it. Parse failures fall back to per-item `generateSingle`. Provider gating defaults batch=3 for bridge/anthropic-api, batch=1 for openai-compat.

## Test plan

WIP draft.