Skip to content

Security hardening: path validation, input clamping, safe JSON, file locking#19

Merged
colbymchenry merged 2 commits into
mainfrom
security-hardening
Feb 10, 2026
Merged

Security hardening: path validation, input clamping, safe JSON, file locking#19
colbymchenry merged 2 commits into
mainfrom
security-hardening

Conversation

@colbymchenry
Copy link
Copy Markdown
Owner

Summary

Implements security improvements inspired by PR #16 by @MO2k4. Rather than merging the original PR (which had extensive conflicts against the current codebase), the key ideas were reimplemented fresh.

  • Path traversal prevention: validatePathWithinRoot() utility used in extraction and context building to reject paths that escape the project root
  • MCP input validation: All numeric MCP tool inputs (limit, depth, maxDepth) are clamped to sane ranges
  • Atomic config writes: Config saves use temp file + rename pattern to prevent corrupted configs on crash
  • Symlink cycle detection: Directory scanning tracks visited real paths to prevent infinite loops
  • Safe JSON parsing: All JSON.parse calls in db/queries.ts replaced with safeJsonParse fallbacks for corrupted database metadata
  • Cross-process file locking: FileLock class prevents concurrent DB writes from CLI, MCP server, and git hooks

Test plan

  • Build compiles with no type errors
  • 28 new security tests all passing (__tests__/security.test.ts)
  • Manual test: attempt path traversal via MCP tool inputs
  • Manual test: concurrent indexing from two processes

Credit: @MO2k4 for the security audit and ideas in PR #16

🤖 Generated with Claude Code

colbymchenry and others added 2 commits February 9, 2026 23:18
…locking

Implements security improvements inspired by PR #16 (credit: MO2k4):

- Add validatePathWithinRoot() to prevent path traversal attacks in
  extraction and context building
- Clamp MCP tool inputs (limit, depth, maxDepth) to sane ranges
- Use atomic writes (temp file + rename) for config saves
- Add symlink cycle detection in directory scanning to prevent infinite loops
- Replace all JSON.parse calls in db/queries.ts with safeJsonParse fallbacks
  to handle corrupted database metadata gracefully
- Add cross-process FileLock for DB write operations (indexAll, indexFiles,
  sync) to prevent concurrent writes from CLI, MCP server, and git hooks
- Remove unused path import from context/index.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests cover all security utilities introduced in the previous commit:
- validatePathWithinRoot: path traversal prevention (7 tests)
- safeJsonParse: corrupted JSON fallback handling (6 tests)
- clamp: input range clamping (5 tests)
- FileLock: cross-process file locking (7 tests)
- Atomic config writes: temp file + rename pattern (3 tests)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@colbymchenry colbymchenry merged commit 082513b into main Feb 10, 2026
colbymchenry added a commit that referenced this pull request Feb 10, 2026
Covers arrow function extraction, best-candidate resolution, graph
traversal direction fix, MCP symbol disambiguation, output truncation,
CLI uninit command, and more. Tests requiring better-sqlite3 native
bindings are conditionally skipped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@colbymchenry colbymchenry deleted the security-hardening branch February 10, 2026 22:48
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
…colbymchenry#29 won't-do)

Builds out the eval harness so the future ranking arc (B colbymchenry#19,
Aider-style TF-IDF + PageRank) can be measured. Pre-PR the harness
ran 12 Elasticsearch cases and produced JSON reports — but had no
comparison mode and no self-codebase cases, so a developer had to
manually diff JSONs and check out a separate big codebase to
validate any ranking change.

What this adds
--------------
- __tests__/evaluation/compare.ts: pure module + CLI. compareReports
  returns per-case + summary delta; formatComparison renders the
  human table; standalone CLI exits non-zero on regression. Budget:
  >0.10 per-case recall drop OR >0.05 mean recall drop = fail.
- __tests__/evaluation/cases-self.ts: 11 cases targeting THIS repo's
  own indexed symbols (CodeGraph, searchNodes, ToolModule,
  compareToRef, ExtractionOrchestrator, etc.). Lets developers
  iterate on ranking without an external codebase.
- runner.ts: argv parser with --cases self|elasticsearch and
  --compare <baseline.json>; env-var fallbacks (EVAL_CODEBASE,
  EVAL_CASES, EVAL_COMPARE). On --compare, the runner re-loads the
  baseline + invokes compareReports + prints formatComparison +
  exits non-zero on regression beyond budget.
- npm run eval:self script.
- .gitignore: __tests__/evaluation/results/.

What I had to fix mid-implementation
- runner.ts referenced `cg.findRelevantContext` (not on CodeGraph;
  it's on cg.contextBuilder). Fixed.
- runner.ts used `__dirname` (ESM-incompatible). Switched to
  `import.meta.dirname`.

Reviewer pass — caught + fixed
- scoring.ts MRR was buggy for multi-symbol cases: iterated over
  expectedSymbols in expected-array order and recorded "the rank of
  the first one found in that iteration", not the BEST rank across
  all found symbols. Standard MRR is reciprocal of the highest-ranked
  relevant result. Renamed `firstRank` → `bestRank` and changed the
  update to `Math.min`-style semantics.
- runner.ts argv parser silently dropped --compare's value when the
  flag was the last token. Now errors with exit code 2.
- self-explore-compare-to-ref case had `getLineRangeHistory` as an
  expected symbol — that function lives in the blame tool, not
  compare-to-ref. Replaced with FileDelta (real compare-to-ref type).
- Runner's meanMRR filter used `startsWith('search-')` but self
  cases use `self-search-*` prefix; the filter dropped them
  silently. Now matches `'-search-'` segment.

Ran on this repo: 9/11 cases pass (was 8/11 before the case fix);
mean recall 0.79; the 2 remaining failures (extraction-pipeline,
search-cascade) are real recall gaps the future ranking arc is
intended to close — confirmed by reviewer that the expected
symbols exist at the cited file:line.

Also: B colbymchenry#29 (upstream PRs to colbymchenry/codegraph) struck as
won't-do per user — keeping changes in this fork.

Verification
- npm run typecheck (tsgo) — clean.
- npx vitest run — 1392/13/0 (eval is run-on-demand, not a vitest
  test).
- npm run eval:self — ran end-to-end, saved JSON, --compare
  against itself shows zero deltas + within-budget verdict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
…y#19)

Two ranking changes in findRelevantContext, gated by the eval
harness shipped in B colbymchenry#26:

1. Common-term dampener (one-sided IDF) in runMultiTermTextSearch.
   Each per-term search-result score gets multiplied by
   `1 - DAMPEN_RATE * (numHits / fetchLimit)`. With
   DAMPEN_RATE=0.5: a term that saturates the fetch cap (16/16)
   gets 0.5×; mid-saturation (8/16) gets 0.75×; rare hits (1/16)
   get effectively 1.0×. Counters the cross-call score-summing
   problem where common tokens ("nodes", "parse" in this repo)
   drowned out rare ones ("extraction") because each independent
   per-term search only knows BM25-IDF within ITS own results,
   not across calls.

2. PageRank centrality boost in collectAndScoreCandidates.
   `score *= 1 + γ * sqrt(centrality)` with γ=5. Pulls
   high-centrality hubs (ExtractionOrchestrator, etc.) into
   entry-point selection. Sqrt-smooths the long tail so leaves
   barely move while hubs at the top centrality (~0.12 in this
   repo) get +173%.

Eval results
------------
self-eval suite (11 cases): mean recall 0.79 → 0.91 (+0.121),
pass 9/11 → 10/11. Zero per-case regressions. Two cases gained:
- self-explore-extraction-pipeline: 0.00 → 1.00
- self-explore-search-cascade: 0.67 → 1.00

The gate from B colbymchenry#26 (compare.ts) was load-bearing — multiple
intermediate iterations were rejected:
- bidirectional IDF (`log(1 + cap/hits)`): regressed
  self-explore-compare-to-ref by 0.33. Rolled back.
- cliff dampener (saturated→0.5×, else→1.0×, no centrality
  change): flipped extraction-pipeline +1.00 but regressed
  biomarker-engine -0.33. Rejected.
- smooth dampener at DAMPEN_RATE=0.4: zero movement either
  direction.
- DAMPEN_RATE=0.5 with γ=2 centrality (existing): same
  trade as cliff. Rejected.
- DAMPEN_RATE=0.5 with γ=5 centrality: biomarker-engine
  recovered (centrality boost on its hub symbols compensated
  for the dampened common-term scores). Two wins held. Gate
  passed.

Reviewer pass — docstring rot fixed
- CENTRALITY_BOOST_WEIGHT JSDoc still showed γ=2 arithmetic
  examples after I bumped the constant to 5. Updated to the
  correct +173% / +50% / +16% percentages.
- DAMPEN_RATE inline comment showed numbers from a superseded
  DAMPEN_RATE=0.4 iteration. Updated to the shipped 0.5 values.
- Both flagged via memo scrutiny-area #1 (docstring rotting
  after a behavior change). Memo workflow paying off — fifth
  consecutive review where memo content was load-bearing.

Verification
- npm run typecheck (tsgo) — clean.
- npx vitest run — 1392/13/0 (unchanged; ranking changes
  measured via eval, not vitest).
- npx tsx __tests__/evaluation/runner.ts . --cases self
  --compare __tests__/evaluation/baseline-self.json — within
  budget, +0.121 mean recall, +1 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
…henry#23)

Flip the SQLite backend default. Pre-PR, createDatabase tried
better-sqlite3 first and fell back to node:sqlite; the package
shipped better-sqlite3 in optionalDependencies so npm install
attempted to compile native bindings on every fresh install.
Post-PR, node:sqlite is the built-in default (Node 22.5+,
zero native compile, zero platform binaries); better-sqlite3
remains supported as a perf opt-in (~1.5× faster on indexing)
when the user installs it explicitly.

The Node 22 floor is already in place (engines.node = ">=22.5.0"),
so this isn't a forced version bump — just exercising what the
existing floor enabled.

Why this is a win
- Fresh installs no longer compile native bindings (the #1 cause
  of "npm install fails on Apple Silicon" issues per the dep audit).
- 22 MB net install savings versus the previous default path
  on platforms that pulled prebuilt better-sqlite3 binaries.
- Users who want the perf opt-in still get it via a single
  explicit `npm install better-sqlite3 --save`.

Adapter shims required
- node:sqlite throws on unused named parameters; better-sqlite3
  silently ignored them. Several call sites pass
  `bindingsFromObject(record, FULL_SCHEMA)` where the SQL only
  binds a subset (notably upsertFile excludes churn-managed
  columns). Added `extractNamedPlaceholders` to filter the
  bindings object on each prepare()'d statement's run/get/all.
- node:sqlite throws on double-close; better-sqlite3 was
  silently a no-op. Made `NodeSqliteAdapter.close()` idempotent
  to match the existing contract.

Surfaces flipped
- src/db/sqlite-adapter.ts createDatabase priority order +
  banner text (now flags failed perf opt-in, not active
  fallback).
- src/mcp/tools/status.ts reports `node:sqlite (built-in default)`
  vs `better-sqlite3 (perf opt-in, native)`.
- src/bin/codegraph.ts CLI status display same.
- package.json dropped better-sqlite3 from optionalDependencies.
- @types/better-sqlite3 kept in devDependencies for compile-time
  SqliteDatabase interface.

Test fallout addressed
- wasm-savepoint banner-text assertions: updated for the new
  framing.
- mcp-llm-visibility + sqlite-vec backend assertions: now match
  either flavor so the test stays portable across both dev
  environments.
- cochange + search-quality migration tests that directly
  `require('better-sqlite3')`: now `it.skipIf(!hasBetterSqlite3)`.
  21 tests skip when the perf opt-in isn't installed (they all
  poke the migration path against a raw better-sqlite3 handle;
  production uses the adapter regardless).

Reviewer pass — fixed before commit
- buildFallbackBanner said "Or remove the perf opt-in" then
  printed `npm install better-sqlite3 --save` — that's INSTALL,
  not remove. Bug. Fixed to `npm uninstall better-sqlite3`. Test
  assertion flipped to match.
- extractNamedPlaceholders JSDoc now notes that node:sqlite's
  "Unknown named parameter" diagnostic for typo'd keys is
  silenced as a deliberate trade.
- Sixth consecutive review where memo content (scrutiny-area #1
  docstring rot) drove the catch.

Verification
- npm run typecheck (tsgo) — clean.
- npx vitest run — 1371 passed / 34 skipped / 0 failed
  (vs 1392/13/0 baseline; +21 skipped reflect direct-handle
  migration tests that need better-sqlite3).
- npm run eval:self --compare baseline-self.json — within
  budget, +0.121 mean recall (B colbymchenry#19 ranking arc holds on
  the new default backend).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
User-driven backlog cleanup before next session:

- **colbymchenry#27 GraphQL `extend type` — verified NOT implemented.** I had
  briefly thought this was done; double-check found
  `src/extraction/graphql-extractor.ts:131` explicitly skips
  `type_system_extension` in v1 with a "needs a second resolution
  pass we don't do yet" comment. No fixture coverage. Backlog
  entry annotated to flag the verification result so a future
  session doesn't re-make the same mis-recall.

- **colbymchenry#25 build-snapshot:** annotated with the 2026-05-03 triage
  numbers (~50ms upper-bound win on a ~261ms cold start;
  ESM + native-module fragility) but kept on the backlog per
  user — worth picking up later when TS 7 + ESM build-snapshot
  tooling matures.

Also checks in `__tests__/evaluation/baseline-self.json` — the
self-eval baseline captured before B colbymchenry#19 (ranking arc) shipped,
referenced by the runner's `--compare` flag. Without it in-repo,
every fresh checkout would have to regenerate the baseline before
gating ranking changes against it.

The baseline is the **pre-improvement** snapshot so its mean
recall (0.79) is the floor every future ranking change must clear
or stay flat against. Bump the file deliberately when a verified
improvement should be the new floor — never silently overwrite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
…olbymchenry#8)

Promotes the resolver numeric `metadata.confidence` to a categorical,
queryable column on `edges`. `codegraph_callers`, `_callees`,
`_impact` now annotate each row with `*(INFERRED)*` when the edge is
not concrete, and accept a new `minConfidence` arg to drop edges
below a chosen level — useful for refactor-impact analysis where
false-positive reach inflates the blast radius.

Migration 032 adds `edges.confidence TEXT` (nullable; read-cast in
EDGE_SCHEMA collapses NULL → `EXTRACTED`). Nullable lets legacy /
structural / extractor-direct edges need no producer change. Index
on the column for filter-down queries. Idempotent ADD COLUMN with
table-exists guard following the convention from migrations 020 /
021 / 023 / 030.

Producer: `ReferenceResolver.createEdges` calls a new free
`classifyConfidence(resolvedBy, score)` helper:
  - import / qualified-name / file-path / exact-match → EXTRACTED
  - framework with score >= 0.85 → EXTRACTED, else INFERRED
  - instance-method / fuzzy → INFERRED
AMBIGUOUS is not stamped from the resolver path in v1 — needs
tie-tracking from the candidate picker. Tracked as B3.

Consumers (`result-formatters.ts`):
  - `formatConfidence(edge)` — markdown italic suffix (empty for
    EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise).
  - `parseMinConfidence(raw)` — returns level / null / errorResult.
  - `filterByConfidence(rows, min)` — drop below threshold.
  - `CONFIDENCE_RANK` — numeric ordering for inline filter loops.
Wired into all four code paths in `callers.ts` (collectCallers,
collectCallersForSource, collectTypeUsers, formatGroupedCallers),
both paths in `callees.ts`, and `impact.ts` with a
`filterImpactByConfidence` BFS that prunes nodes unreachable along
the surviving edge set.

Type cleanup: `Edge.confidence` added with JSDoc; the long-dead
`Edge.provenance` field also got a JSDoc note pointing at the
SCIP-integration-reservation explanation in memory.

Reviewer-memo #4 catch (schema-version test forgetfulness): noticed
`__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)`
even though migrations 030 and 031 had shipped. Catch-up to 32 in
this commit covers both the missed bumps and the new migration.

Reviewer: APPROVE, three info-only findings (type-user display gap,
framework-rule calibration table doc, catch of the recurring pr19
pattern).

Suite: 1555 / 34 / 0 (+18 new edge-confidence tests).
Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression
budget vs 091e935 baseline).

Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11
trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 4, 2026
…sites (colbymchenry#11)

Last Phase 1-6 backlog item. Replaces the "agent stitches
codegraph_callers + _history + _biomarkers manually for every stack
frame" loop with one tool: paste a stack trace, get a ranked list of
likely fix-site candidates with per-row "why suspected" reasons.

## Pipeline

1. `parseTrace(text)` — multi-format extractor: V8/Node, Python,
   Java/Kotlin/Scala, Go, Rust, plus a generic `path/to/file.ext:line`
   fallback. Caps at 200 frames; dedupes by `(file, line)`; normalises
   Windows backslashes; first-pattern-per-line wins.
2. `resolveTraceFile` — longest-suffix match between absolute trace
   paths and project-relative indexed paths, with a path-boundary
   check so `foo/bar.ts` doesn't match `sub-foo/bar.ts`.
3. `findEnclosingNode` (existing helper from index-hooks/enclosing.ts,
   shared with codegraph_grep) attributes each frame to its enclosing
   function/method/class via line number.
4. `scoreCulprits` composite: `1 / (topRank + 1)` for frame position
   (top of stack wins decisively), `+0.3` for risk-biomarker overlap
   (god_class / complex_method / large_method / nested_complexity),
   `+0.2` for recent file churn (touched within 30 days, with commit
   count surfaced in the reason).
5. Output: ranked candidates with per-row reasons + an "unmapped
   frames" footer (capped at 5 samples) so the agent sees what
   couldn't be attributed (`node_modules/`, generated files, etc.).

## CLI mirror

`codegraph trace-to-culprits` reads from stdin (`cat error.log |
codegraph trace-to-culprits`) or from `--trace "..."`. Routes via
runViaMCP per the repo convention.

## Reviewer cycle

Round 1 returned BLOCK + REQUEST_CHANGES + info — all addressed:
  - **R1 (BLOCK, correctness)**: `recentCutoff` was computed in ms
    but `last_touched_ts` is unix seconds — the `>=` comparison
    always evaluated false, killing the churn signal silently on
    every project. Fixed with `Math.floor(Date.now() / 1000)` and a
    new R1-regression-guard test that stamps a fresh `last_touched_ts`
    via the same `applyChurnDeltas` helper the churn miner uses, then
    asserts the rendered output mentions "recent churn".
  - **R2 (REQUEST_CHANGES, correctness)**: inline `cg.queries.db
    .prepare(...)` violated the per-domain query-file convention and
    re-prepared the same statement per culprit. Switched to
    `getFileByPath` from queries-files.ts (cached statement, returns
    FileRecord with `lastTouchedTs` and `commitCount`).
  - **R3 (info, perf)**: `resolveTraceFile` ran `getAllFiles` once
    per frame — N full-table scans for an N-frame trace. Hoisted to
    once-per-call at the top of `buildCulprits`.

## Tests

13 trace tests in `__tests__/mcp-trace-to-culprits.test.ts`:
  - parseTrace: V8 / Python / Java / Go / dedup / 200-frame cap /
    no-frames / Windows backslash normalisation
  - End-to-end: V8 trace → enclosing symbols ranked top-of-stack
    first, unmapped-frames footer, no-refs error message, limit cap,
    R1-regression guard for the churn signal

CLI dogfood: `cat trace | npm run cli:dev -- trace-to-culprits`
parses a real production-shaped V8 trace and maps frames to
`src/mcp/tools/trace-to-culprits.ts` symbols.

Suite: **1595 / 34 / 0** (+13 trace tests).
Eval: 18/18 | recall=1.00 | mrr=0.91 | within budget (re-baselined
after colbymchenry#11 added new files to the corpus, which shifted the
explore-pipeline case rank — case still PASSes its 0.5 threshold,
the relative regression was corpus drift, not behaviour change).

## Backlog after this

Phase 1-6 complete. Remaining: Phase 7 (colbymchenry#17 propose_extract /
propose_rename, colbymchenry#18 plan-and-execute CLI — both need user check),
Phase 8 (colbymchenry#19 streaming MCP), and #6 TOON (deferred pending
measurement).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 4, 2026
Backlog #6 proposed adopting TOON (Token-Oriented Object Notation,
header-once / rows-as-tuples) as a smaller alternative to the
current markdown formatters for tabular MCP responses. The backlog
explicitly gated the decision: "Verify actual savings on captured
queries before flipping default — the 30-60% claim is for ideal
tabular data."

This commit ships the measurement and the answer.

## Result

Eight representative queries against this project's own .codegraph,
covering search (long signatures), suggest (short rows), callers
(20-row mixed-confidence list):

  sample                                  rows  md(B)  toon(B)  saving
  ──────────────────────────────────────  ────  ─────  ───────  ──────
  search "CodeGraph"                        10    683      661  +3.2%
  search "extractFromSource"                10   1483     1444  +2.6%
  search "compareToRef"                      3    461      468  -1.5%
  search "handleSearch"                      6    760      757  +0.4%
  search "parseTrace"                        2    291      298  -2.4%
  suggest "CodGrap"                         10    408      409  -0.2%
  suggest "extracFromSorce"                 10    386      387  -0.3%
  callers of extractFromSource              14    936     1037  -10.8%

  TOTAL: md=5408B  toon=5461B  aggregate saving -1.0%

## Why the 30-60% claim doesn't apply

TOON's win comes from compressing a verbose JSON baseline like
`[{"name":"foo","kind":"function","file":"a.ts","line":42},…]` —
the "headers repeated per row" + JSON quoting waste is what its
header-once shape removes.

But our markdown is ALREADY row-shaped:
  - `### foo (function)` ≈ `foo,function` (no quoting).
  - `a.ts:42` ≈ `a.ts,42` (single-char delimiter, both compact).
  - `- name (kind) - file:line` is shorter than the comma-tuple form
    when names are short.

There's no fat to trim. On callers (the densest row shape) TOON is
10% LARGER because the per-row `- ` bullet syntax is a one-byte
overhead while TOON's commas waste two chars between every field.

## Decision

Skip TOON. Empirical answer matches the backlog's gate — the
savings aren't there for our markdown baseline. Adopting it would:
  - Add per-tool format selector code + tests.
  - Risk LLM-client misrender (TOON is new; some clients haven't
    trained on it).
  - Net zero to net negative on payload bytes.

The measurement script (`__tests__/evaluation/toon-measure.ts`)
stays in the repo as a permanent record + a re-runnable harness if
the markdown formatters ever get verbose enough to flip the math.
Run with `npx tsx __tests__/evaluation/toon-measure.ts`.

## Backlog

#6 closed. Phase 1-6 of the agentic backlog is now fully resolved
(every item shipped or empirically dismissed). Remaining: Phase 7
(colbymchenry#17 propose_extract / propose_rename, colbymchenry#18 plan-and-execute CLI),
Phase 8 (colbymchenry#19 streaming MCP) — all need user check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 4, 2026
…olbymchenry#19 successor)

Closes the visibility gap that the original colbymchenry#19 (streaming MCP
responses) was trying to fix — but via the additive standard MCP
mechanism instead of reshaping `tools/call` into a streaming
response. User noted: `codegraph_summarize` takes ~10 minutes on
this repo and the agent currently sits blind for the whole run.

## Wiring

`MCPServer.handleToolsCall` now reads `params._meta?.progressToken`.
When present (per the standard MCP spec), it builds an `onProgress`
callback that emits `notifications/progress` events on the same
stdio channel and forwards it to `ToolHandler.execute(toolName,
args, opts)`. The handler threads `opts.onProgress` into a new
optional `ToolCtx.reportProgress` field; tool handlers null-check
before calling so the no-token path stays cost-free.

Wired into the four tools that legitimately take minutes:

  - `codegraph_admin {action: sync}`  → forwards IndexProgress
  - `codegraph_admin {action: index}` → forwards IndexProgress
  - `codegraph_summarize`             → forwards (done, total) per symbol
  - `codegraph_embed`                 → forwards (done, total) per batch
                                       (extended cg.llm.embedAll with
                                       an onProgress option that
                                       embedAllSummaries already had
                                       internally — just plumbed through)

The IndexProgress shape `{ phase, current, total, currentFile? }`
gets surfaced as: `progress = current`, `total`, `message =
"<phase>: <currentFile>"`.

## Why this beats stream-as-result (colbymchenry#19)

  - Additive — clients without a progressToken see no behavior
    change. Stream-as-result would have required reshaping every
    response.
  - Spec-conformant — standard MCP `notifications/progress` works
    with any compliant client (including Claude Code).
  - Targets the actual long-running tools, not the bounded ones.
    `codegraph_explore` runs in 15ms (per B7); streaming partial
    results would have added complexity for zero benefit.
  - 80 LOC instead of the projected 200 LOC + transport-layer risk.

## Reviewer

APPROVE; two info-only polish items addressed in this commit:
  - Use canonical `IndexProgress` import in admin.ts instead of
    inline structural type (so future field additions surface as
    compile errors at the callback sites).
  - Comment on the `lockContention` early-return path explaining
    that no progress events fire in that branch (acceptable: the
    contention check is immediate, no work done).

## Tests

4 integration tests in `__tests__/mcp-progress-notifications.test.ts`:
  - admin sync forwards IndexProgress events
  - admin index forwards IndexProgress events
  - no-callback path is zero-cost (no events when opts.onProgress
    isn't supplied)
  - monotonic-within-phase property check

Suite: **1599 / 34 / 0** (+4 B13 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 8, 2026
…y#18, colbymchenry#19)

Two friction items uncovered in the round-2 stress test, both
about the boundary between long-running MCP servers and ad-hoc
fresh-code CLI processes acting on the same per-project DB.

## colbymchenry#19 — cli:dev silently migrated queried projects (severe)

Reproduced live: a single `cli:dev at-range ... -p ollama-source`
ran forward migrations 33→35 on ollama silently. The running MCP
server (started at v32 binary) then couldn't access ollama either
— the schema-guard returned "stale code, restart" on every tool
call. **A read-style CLI invocation effectively bricked MCP
access to the queried project.**

Fix: `OpenOptions.autoMigrate` opt-in (default false). When the
on-disk schema is older than the binary's CURRENT_SCHEMA_VERSION
and autoMigrate is unset, `DatabaseConnection.open` throws with
explicit recovery commands (`admin migrate` / `admin sync` /
`admin index --force`) AND the post-migrate "restart your MCP
server" caveat. Newer-than-binary DBs always fail (B4 silent-
corruption path).

Write entry points opt in to autoMigrate=true — admin
sync/index/migrate, summarize, embed, classify, the MCP server's
default-project open, the cross-project cache, and `runViaMCP`'s
shim (which derives autoMigrate from the tool's `isWriteTool`
flag, automatically gating per-tool without per-command edits).

New `codegraph_admin({action: 'migrate'})` MCP action +
`codegraph admin migrate` CLI command: cheapest recovery path.
The CLI does a two-phase open (default → autoMigrate=true on
throw) so it can distinguish "already current" from "migrated
this run".

## colbymchenry#18 — MCP server's tool registry frozen at startup

Repro: a `codegraph_at_range` call (added in commit 50a9ebf,
after the running server's start) returned `Error: No such tool
available`. ES modules cache; can't hot-reload without restart.

Fix: cannot make new tools live without restart. Added VISIBILITY
in `codegraph_status` — a new "Tool registry drift" section
fires when the on-disk count of `_TOOL: ToolModule`-exporting
files exceeds the loaded count. Content-sniff filter (regex on
file body) keeps the count accurate as new helper modules land
without touching the status code. Suppressed when in sync.
Status keeps `bypassSchemaGuard: true` so this signal stays
reachable when other tools are blocked — exactly when the agent
needs it most.

## Reviewer findings (3 of 3 addressed before commit)

- `isToolFile` was missing 5 helper files (env-refs, explore-
  budget, result-formatters, sql-refs, symbol-resolver) that
  passed the filename filter but didn't have `_TOOL` exports —
  permanent false-positive drift warning. Switched to content
  sniff via the regex `/export const X_TOOL: ToolModule/`.
- `retryInitIfNeeded` at src/mcp/index.ts:480 was missing
  `autoMigrate: true`, so a stale-schema retry would silently
  fail and leave the server permanently without a default project.
  Now matches `tryInitializeDefault`'s policy.
- `handleMigrate`'s "Migrations applied" message was misleading
  for already-current DBs (the project-cache opens with
  autoMigrate=true, so by handler time migrations are already
  done). Reworked to state the resulting version neutrally with
  an "already current / behind by N" qualifier.

Suite 1729/34/0 (+1 gating test in foundation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 9, 2026
… intent coverage hint + compare_to_ref skipped/suppress flags

Five friction-tracker items addressed. Three sub-agents in parallel
on disjoint files; reviewer pass with .claude/reviewer-memo.md
prepended caught two real correctness bugs and two info-level items
which are all addressed in this commit.

## colbymchenry#17 — dead_code static excludes test-bed fixtures

src/mcp/tools/dead-code.ts gains an excludeFixtures: boolean arg
(default true). When true, formatStaticDeadCode requests 4× overscan
from findGraphCandidates, runs filterFixtureNodes (regex match
against docs/test-beds/, __tests__/fixtures/, test/fixtures/,
spec/fixtures/), then slices to the user's maxCandidates. New
helper filterFixtureNodes is module-private (no speculative export).

Reviewer-caught correctness gap: the schema described excludeFixtures
as a general filter but llmFindDeadCode in mode='judge' was called
with only { maxCandidates }, so fixture filtering silently no-op'd
in judge mode. Fixed by extending the schema description to caveat
"applies to mode='static' only — judge mode tracked as follow-up";
the wider judge-mode wiring is non-trivial (llmFindDeadCode's args
interface needs extending) and is left as a tracked task.

Reviewer-caught info gap: 4× overscan doesn't suffice when fixture
density exceeds 75% of the candidate set. Now surfaces a warning
when overscan was exhausted AND we still under-filled — the agent
sees "raise maxCandidates if you need more" instead of silently
truncating.

New test file __tests__/mcp-dead-code-fixtures.test.ts (3 cases):
filter on, filter off, spec/fixtures pattern.

## colbymchenry#19 — search mode='intent' attaches coverage hint on 0 hits

src/mcp/tools/_search-intent.ts. When handleSearchIntent returns
empty, computes summary coverage and appends one of two hints:
- coverage < 50%: "intent-search depends on LLM summaries (current
  coverage X%) — run codegraph summarize to expand the corpus, or
  fall back to mode='exact' / codegraph_grep"
- coverage >= 50%: "0 hits at X% coverage — concept may not be
  summarised yet, or may not exist in the codebase. Try
  codegraph_explore"

Non-empty results unchanged. Coverage-stat errors are caught and
logged so they don't fail the search.

Reviewer-caught correctness bug: getSummaryCoverage(cg.queries) was
called WITHOUT a kinds filter; the helper's own JSDoc warns that
"counting parameters/imports/files in the denominator would
understate coverage and confuse the user". Fixed by passing
SUMMARIZABLE_KINDS (already exported from src/llm/summarizer.ts and
used by status.ts the same way). Three OTHER call sites in the
codebase also pass no kinds — filed as colbymchenry#49 follow-up since they're
out of scope for this commit.

New tests added to __tests__/search-intent.test.ts (2 cases:
low-coverage hint, high-coverage hint).

## colbymchenry#23 — compare_to_ref surfaces skipped (non-TS / non-indexed) files

src/compare/index.ts adds filesSkipped: number to CompareResult.
The compare logic counts files git reports as changed but cg can't
structurally diff (non-indexed languages, .md/.json, binary, etc.).
Formatter src/mcp/tools/compare.ts surfaces "> N file(s) skipped
(non-indexed or non-TS)" only when count > 0 (no noise on clean
diffs). Manual eyeball on HEAD~5 confirmed correct count.

## colbymchenry#24 — compare_to_ref suppressLineRangeOnly flag

src/compare/index.ts adds suppressLineRangeOnly: boolean (default
false for backward compat — existing callers asserting on
result.totals.modified would break). Adds lineRangeOnlyCount to
FileDelta. New helpers isPureLineRangeOnly / applyLineRangeOnlySuppression
(module-private). When true, FileDeltas where every modified symbol's
reasons are exactly ['line range changed'] (no signature change, no
modifier flip, no body diff) collapse into a single per-file
roll-up: "src/foo.ts: 14 symbols renumbered (no content change)".
Mixed files (real change + renumber) keep real changes individually
shown. MCP schema in compare.ts exposes the flag.

Manual eyeball on HEAD~5 of this repo: 96/174 modified symbols
(55%) were pure-renumber noise that suppression collapsed cleanly
into 7 roll-up lines.

New tests added to __tests__/compare.test.ts (4 cases): filesSkipped
counts non-TS files, filesSkipped: 0 doesn't add the line, suppress
collapses pure-renumber, mixed files keep real changes individually.

## colbymchenry#34 — at_range rejects paths outside project root

src/mcp/tools/at-range.ts. New validateFileWithinRoot helper:
path.resolve canonicalization on both sides, checks "equals root OR
starts with root + sep". Single-range and bulk-range forms both
validate; bulk form fails the whole call if ANY range is out-of-root
(no silent filtering). Reviewer-caught info: documented symlink
limitation in JSDoc — within-root symlinks pointing OUTSIDE the root
will pass; if untrusted symlinks become an issue, swap to
fs.realpathSync.

New tests added to __tests__/at-range.test.ts (2 cases: traversal
'../../etc/passwd' rejected, bulk form with one out-of-root range
fails the whole call).

## Reviewer-caught items beyond the original 5

- New friction colbymchenry#48 filed: typescript-lsp plugin reverts sub-agent
  Edit calls on first application (not a codegraph defect; harness
  interaction). Sonnet caught it during colbymchenry#23/colbymchenry#24 work and re-applied
  successfully on second pass.
- New friction colbymchenry#49 filed: 3 other getSummaryCoverage call sites in
  status.ts and bin/codegraph.ts have the same denominator-inflation
  bug — out of scope for this commit, tracked separately.

## Verification

- 9 files (8 modified + 1 new test), +398/-13
- npm run typecheck — clean
- 50/50 tests pass in focused slice (compare / at-range /
  mcp-dead-code-fixtures / search-intent)
- New module-private helpers (filterFixtureNodes, isPureLineRangeOnly,
  applyLineRangeOnlySuppression, validateFileWithinRoot) all have
  concrete in-tree callers in the same diff — no speculative
  exports per reviewer-memo item #7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 9, 2026
…henry#19)

Closes friction colbymchenry#19 — sub-agents (Haiku especially) commit + push
their own work when the brief is silent on commits, producing split
history with awkward attribution. Caught 2026-05-09 in `a928395`
(Set B Haiku auto-commit) followed by `6ef55f6` (parent commit
double-attributing the same work).

This is a process-level fix, not code:

1. CLAUDE.md gains a "Sub-agent briefing — checklist for parallel
   work" section. Three required lines for every Edit/Write-bearing
   sub-agent brief:
     - DO NOT commit (leave changes in working tree).
     - Use RELATIVE paths only — don't cd to absolute paths.
     - Disjoint file scope.
   Plus two optional but useful ones (state baseline, scope cap).

2. .claude/reviewer-memo.md gains a recurring scrutiny area colbymchenry#10:
   Sub-agent autonomous-commit attribution. The reviewer will flag
   commit-message attribution mismatches when work is split across
   multiple commits in the immediate ancestry. Info-severity, not
   merge-blocking.

3. Memory note `feedback_subagent_no_autonomous_commit.md` added
   alongside the existing `feedback_subagent_worktree_absolute_paths.md`,
   indexed in MEMORY.md. Read at session start for permanent
   recall.

No code change. Suite 2014 / 0 / 34 (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 9, 2026
…colbymchenry#48/colbymchenry#57)

Concrete recurrence counts and failure-mode descriptions for three
recurring frictions that briefs keep accidentally omitting:

- **colbymchenry#19 (DO NOT commit):** added "Place this near the TOP of the brief"
  + "Recurred 3× across the 2026-05-09 mega session when omitted."
- **colbymchenry#57 (RELATIVE paths):** spelled out the actual failure mode
  (`cd /Users/.../codegraph/...` → wrong tree → isolation defeated),
  added the 2026-05-09 recurrence count (3×) alongside the existing
  2026-05-08 4/4 datapoint.
- **colbymchenry#48 (typescript-lsp Edit revert):** new paragraph at section end
  with concrete repro paths (`src/installer/shimmer-progress.ts`,
  `src/installer/pool-recommendation.ts`), the 2× recurrence count,
  and the mandatory `git diff --stat` + grep + re-apply protocol.

30 LOC total (+23 / -7).

Doc-only — no code, no tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant