Security hardening: path validation, input clamping, safe JSON, file locking by colbymchenry · Pull Request #19 · colbymchenry/codegraph

colbymchenry · 2026-02-10T05:23:11Z

Summary

Implements security improvements inspired by PR #16 by @MO2k4. Rather than merging the original PR (which had extensive conflicts against the current codebase), the key ideas were reimplemented fresh.

Path traversal prevention: validatePathWithinRoot() utility used in extraction and context building to reject paths that escape the project root
MCP input validation: All numeric MCP tool inputs (limit, depth, maxDepth) are clamped to sane ranges
Atomic config writes: Config saves use temp file + rename pattern to prevent corrupted configs on crash
Symlink cycle detection: Directory scanning tracks visited real paths to prevent infinite loops
Safe JSON parsing: All JSON.parse calls in db/queries.ts replaced with safeJsonParse fallbacks for corrupted database metadata
Cross-process file locking: FileLock class prevents concurrent DB writes from CLI, MCP server, and git hooks

Test plan

Build compiles with no type errors
28 new security tests all passing (__tests__/security.test.ts)
Manual test: attempt path traversal via MCP tool inputs
Manual test: concurrent indexing from two processes

Credit: @MO2k4 for the security audit and ideas in PR #16

🤖 Generated with Claude Code

…locking Implements security improvements inspired by PR #16 (credit: MO2k4): - Add validatePathWithinRoot() to prevent path traversal attacks in extraction and context building - Clamp MCP tool inputs (limit, depth, maxDepth) to sane ranges - Use atomic writes (temp file + rename) for config saves - Add symlink cycle detection in directory scanning to prevent infinite loops - Replace all JSON.parse calls in db/queries.ts with safeJsonParse fallbacks to handle corrupted database metadata gracefully - Add cross-process FileLock for DB write operations (indexAll, indexFiles, sync) to prevent concurrent writes from CLI, MCP server, and git hooks - Remove unused path import from context/index.ts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tests cover all security utilities introduced in the previous commit: - validatePathWithinRoot: path traversal prevention (7 tests) - safeJsonParse: corrupted JSON fallback handling (6 tests) - clamp: input range clamping (5 tests) - FileLock: cross-process file locking (7 tests) - Atomic config writes: temp file + rename pattern (3 tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Covers arrow function extraction, best-candidate resolution, graph traversal direction fix, MCP symbol disambiguation, output truncation, CLI uninit command, and more. Tests requiring better-sqlite3 native bindings are conditionally skipped. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…colbymchenry#29 won't-do) Builds out the eval harness so the future ranking arc (B colbymchenry#19, Aider-style TF-IDF + PageRank) can be measured. Pre-PR the harness ran 12 Elasticsearch cases and produced JSON reports — but had no comparison mode and no self-codebase cases, so a developer had to manually diff JSONs and check out a separate big codebase to validate any ranking change. What this adds -------------- - __tests__/evaluation/compare.ts: pure module + CLI. compareReports returns per-case + summary delta; formatComparison renders the human table; standalone CLI exits non-zero on regression. Budget: >0.10 per-case recall drop OR >0.05 mean recall drop = fail. - __tests__/evaluation/cases-self.ts: 11 cases targeting THIS repo's own indexed symbols (CodeGraph, searchNodes, ToolModule, compareToRef, ExtractionOrchestrator, etc.). Lets developers iterate on ranking without an external codebase. - runner.ts: argv parser with --cases self|elasticsearch and --compare <baseline.json>; env-var fallbacks (EVAL_CODEBASE, EVAL_CASES, EVAL_COMPARE). On --compare, the runner re-loads the baseline + invokes compareReports + prints formatComparison + exits non-zero on regression beyond budget. - npm run eval:self script. - .gitignore: __tests__/evaluation/results/. What I had to fix mid-implementation - runner.ts referenced `cg.findRelevantContext` (not on CodeGraph; it's on cg.contextBuilder). Fixed. - runner.ts used `__dirname` (ESM-incompatible). Switched to `import.meta.dirname`. Reviewer pass — caught + fixed - scoring.ts MRR was buggy for multi-symbol cases: iterated over expectedSymbols in expected-array order and recorded "the rank of the first one found in that iteration", not the BEST rank across all found symbols. Standard MRR is reciprocal of the highest-ranked relevant result. Renamed `firstRank` → `bestRank` and changed the update to `Math.min`-style semantics. - runner.ts argv parser silently dropped --compare's value when the flag was the last token. Now errors with exit code 2. - self-explore-compare-to-ref case had `getLineRangeHistory` as an expected symbol — that function lives in the blame tool, not compare-to-ref. Replaced with FileDelta (real compare-to-ref type). - Runner's meanMRR filter used `startsWith('search-')` but self cases use `self-search-*` prefix; the filter dropped them silently. Now matches `'-search-'` segment. Ran on this repo: 9/11 cases pass (was 8/11 before the case fix); mean recall 0.79; the 2 remaining failures (extraction-pipeline, search-cascade) are real recall gaps the future ranking arc is intended to close — confirmed by reviewer that the expected symbols exist at the cited file:line. Also: B colbymchenry#29 (upstream PRs to colbymchenry/codegraph) struck as won't-do per user — keeping changes in this fork. Verification - npm run typecheck (tsgo) — clean. - npx vitest run — 1392/13/0 (eval is run-on-demand, not a vitest test). - npm run eval:self — ran end-to-end, saved JSON, --compare against itself shows zero deltas + within-budget verdict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…y#19) Two ranking changes in findRelevantContext, gated by the eval harness shipped in B colbymchenry#26: 1. Common-term dampener (one-sided IDF) in runMultiTermTextSearch. Each per-term search-result score gets multiplied by `1 - DAMPEN_RATE * (numHits / fetchLimit)`. With DAMPEN_RATE=0.5: a term that saturates the fetch cap (16/16) gets 0.5×; mid-saturation (8/16) gets 0.75×; rare hits (1/16) get effectively 1.0×. Counters the cross-call score-summing problem where common tokens ("nodes", "parse" in this repo) drowned out rare ones ("extraction") because each independent per-term search only knows BM25-IDF within ITS own results, not across calls. 2. PageRank centrality boost in collectAndScoreCandidates. `score *= 1 + γ * sqrt(centrality)` with γ=5. Pulls high-centrality hubs (ExtractionOrchestrator, etc.) into entry-point selection. Sqrt-smooths the long tail so leaves barely move while hubs at the top centrality (~0.12 in this repo) get +173%. Eval results ------------ self-eval suite (11 cases): mean recall 0.79 → 0.91 (+0.121), pass 9/11 → 10/11. Zero per-case regressions. Two cases gained: - self-explore-extraction-pipeline: 0.00 → 1.00 - self-explore-search-cascade: 0.67 → 1.00 The gate from B colbymchenry#26 (compare.ts) was load-bearing — multiple intermediate iterations were rejected: - bidirectional IDF (`log(1 + cap/hits)`): regressed self-explore-compare-to-ref by 0.33. Rolled back. - cliff dampener (saturated→0.5×, else→1.0×, no centrality change): flipped extraction-pipeline +1.00 but regressed biomarker-engine -0.33. Rejected. - smooth dampener at DAMPEN_RATE=0.4: zero movement either direction. - DAMPEN_RATE=0.5 with γ=2 centrality (existing): same trade as cliff. Rejected. - DAMPEN_RATE=0.5 with γ=5 centrality: biomarker-engine recovered (centrality boost on its hub symbols compensated for the dampened common-term scores). Two wins held. Gate passed. Reviewer pass — docstring rot fixed - CENTRALITY_BOOST_WEIGHT JSDoc still showed γ=2 arithmetic examples after I bumped the constant to 5. Updated to the correct +173% / +50% / +16% percentages. - DAMPEN_RATE inline comment showed numbers from a superseded DAMPEN_RATE=0.4 iteration. Updated to the shipped 0.5 values. - Both flagged via memo scrutiny-area #1 (docstring rotting after a behavior change). Memo workflow paying off — fifth consecutive review where memo content was load-bearing. Verification - npm run typecheck (tsgo) — clean. - npx vitest run — 1392/13/0 (unchanged; ranking changes measured via eval, not vitest). - npx tsx __tests__/evaluation/runner.ts . --cases self --compare __tests__/evaluation/baseline-self.json — within budget, +0.121 mean recall, +1 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…henry#23) Flip the SQLite backend default. Pre-PR, createDatabase tried better-sqlite3 first and fell back to node:sqlite; the package shipped better-sqlite3 in optionalDependencies so npm install attempted to compile native bindings on every fresh install. Post-PR, node:sqlite is the built-in default (Node 22.5+, zero native compile, zero platform binaries); better-sqlite3 remains supported as a perf opt-in (~1.5× faster on indexing) when the user installs it explicitly. The Node 22 floor is already in place (engines.node = ">=22.5.0"), so this isn't a forced version bump — just exercising what the existing floor enabled. Why this is a win - Fresh installs no longer compile native bindings (the #1 cause of "npm install fails on Apple Silicon" issues per the dep audit). - 22 MB net install savings versus the previous default path on platforms that pulled prebuilt better-sqlite3 binaries. - Users who want the perf opt-in still get it via a single explicit `npm install better-sqlite3 --save`. Adapter shims required - node:sqlite throws on unused named parameters; better-sqlite3 silently ignored them. Several call sites pass `bindingsFromObject(record, FULL_SCHEMA)` where the SQL only binds a subset (notably upsertFile excludes churn-managed columns). Added `extractNamedPlaceholders` to filter the bindings object on each prepare()'d statement's run/get/all. - node:sqlite throws on double-close; better-sqlite3 was silently a no-op. Made `NodeSqliteAdapter.close()` idempotent to match the existing contract. Surfaces flipped - src/db/sqlite-adapter.ts createDatabase priority order + banner text (now flags failed perf opt-in, not active fallback). - src/mcp/tools/status.ts reports `node:sqlite (built-in default)` vs `better-sqlite3 (perf opt-in, native)`. - src/bin/codegraph.ts CLI status display same. - package.json dropped better-sqlite3 from optionalDependencies. - @types/better-sqlite3 kept in devDependencies for compile-time SqliteDatabase interface. Test fallout addressed - wasm-savepoint banner-text assertions: updated for the new framing. - mcp-llm-visibility + sqlite-vec backend assertions: now match either flavor so the test stays portable across both dev environments. - cochange + search-quality migration tests that directly `require('better-sqlite3')`: now `it.skipIf(!hasBetterSqlite3)`. 21 tests skip when the perf opt-in isn't installed (they all poke the migration path against a raw better-sqlite3 handle; production uses the adapter regardless). Reviewer pass — fixed before commit - buildFallbackBanner said "Or remove the perf opt-in" then printed `npm install better-sqlite3 --save` — that's INSTALL, not remove. Bug. Fixed to `npm uninstall better-sqlite3`. Test assertion flipped to match. - extractNamedPlaceholders JSDoc now notes that node:sqlite's "Unknown named parameter" diagnostic for typo'd keys is silenced as a deliberate trade. - Sixth consecutive review where memo content (scrutiny-area #1 docstring rot) drove the catch. Verification - npm run typecheck (tsgo) — clean. - npx vitest run — 1371 passed / 34 skipped / 0 failed (vs 1392/13/0 baseline; +21 skipped reflect direct-handle migration tests that need better-sqlite3). - npm run eval:self --compare baseline-self.json — within budget, +0.121 mean recall (B colbymchenry#19 ranking arc holds on the new default backend). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User-driven backlog cleanup before next session: - **colbymchenry#27 GraphQL `extend type` — verified NOT implemented.** I had briefly thought this was done; double-check found `src/extraction/graphql-extractor.ts:131` explicitly skips `type_system_extension` in v1 with a "needs a second resolution pass we don't do yet" comment. No fixture coverage. Backlog entry annotated to flag the verification result so a future session doesn't re-make the same mis-recall. - **colbymchenry#25 build-snapshot:** annotated with the 2026-05-03 triage numbers (~50ms upper-bound win on a ~261ms cold start; ESM + native-module fragility) but kept on the backlog per user — worth picking up later when TS 7 + ESM build-snapshot tooling matures. Also checks in `__tests__/evaluation/baseline-self.json` — the self-eval baseline captured before B colbymchenry#19 (ranking arc) shipped, referenced by the runner's `--compare` flag. Without it in-repo, every fresh checkout would have to regenerate the baseline before gating ranking changes against it. The baseline is the **pre-improvement** snapshot so its mean recall (0.79) is the floor every future ranking change must clear or stay flat against. Bump the file deliberately when a verified improvement should be the new floor — never silently overwrite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…olbymchenry#8) Promotes the resolver numeric `metadata.confidence` to a categorical, queryable column on `edges`. `codegraph_callers`, `_callees`, `_impact` now annotate each row with `*(INFERRED)*` when the edge is not concrete, and accept a new `minConfidence` arg to drop edges below a chosen level — useful for refactor-impact analysis where false-positive reach inflates the blast radius. Migration 032 adds `edges.confidence TEXT` (nullable; read-cast in EDGE_SCHEMA collapses NULL → `EXTRACTED`). Nullable lets legacy / structural / extractor-direct edges need no producer change. Index on the column for filter-down queries. Idempotent ADD COLUMN with table-exists guard following the convention from migrations 020 / 021 / 023 / 030. Producer: `ReferenceResolver.createEdges` calls a new free `classifyConfidence(resolvedBy, score)` helper: - import / qualified-name / file-path / exact-match → EXTRACTED - framework with score >= 0.85 → EXTRACTED, else INFERRED - instance-method / fuzzy → INFERRED AMBIGUOUS is not stamped from the resolver path in v1 — needs tie-tracking from the candidate picker. Tracked as B3. Consumers (`result-formatters.ts`): - `formatConfidence(edge)` — markdown italic suffix (empty for EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise). - `parseMinConfidence(raw)` — returns level / null / errorResult. - `filterByConfidence(rows, min)` — drop below threshold. - `CONFIDENCE_RANK` — numeric ordering for inline filter loops. Wired into all four code paths in `callers.ts` (collectCallers, collectCallersForSource, collectTypeUsers, formatGroupedCallers), both paths in `callees.ts`, and `impact.ts` with a `filterImpactByConfidence` BFS that prunes nodes unreachable along the surviving edge set. Type cleanup: `Edge.confidence` added with JSDoc; the long-dead `Edge.provenance` field also got a JSDoc note pointing at the SCIP-integration-reservation explanation in memory. Reviewer-memo #4 catch (schema-version test forgetfulness): noticed `__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)` even though migrations 030 and 031 had shipped. Catch-up to 32 in this commit covers both the missed bumps and the new migration. Reviewer: APPROVE, three info-only findings (type-user display gap, framework-rule calibration table doc, catch of the recurring pr19 pattern). Suite: 1555 / 34 / 0 (+18 new edge-confidence tests). Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression budget vs 091e935 baseline). Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11 trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sites (colbymchenry#11) Last Phase 1-6 backlog item. Replaces the "agent stitches codegraph_callers + _history + _biomarkers manually for every stack frame" loop with one tool: paste a stack trace, get a ranked list of likely fix-site candidates with per-row "why suspected" reasons. ## Pipeline 1. `parseTrace(text)` — multi-format extractor: V8/Node, Python, Java/Kotlin/Scala, Go, Rust, plus a generic `path/to/file.ext:line` fallback. Caps at 200 frames; dedupes by `(file, line)`; normalises Windows backslashes; first-pattern-per-line wins. 2. `resolveTraceFile` — longest-suffix match between absolute trace paths and project-relative indexed paths, with a path-boundary check so `foo/bar.ts` doesn't match `sub-foo/bar.ts`. 3. `findEnclosingNode` (existing helper from index-hooks/enclosing.ts, shared with codegraph_grep) attributes each frame to its enclosing function/method/class via line number. 4. `scoreCulprits` composite: `1 / (topRank + 1)` for frame position (top of stack wins decisively), `+0.3` for risk-biomarker overlap (god_class / complex_method / large_method / nested_complexity), `+0.2` for recent file churn (touched within 30 days, with commit count surfaced in the reason). 5. Output: ranked candidates with per-row reasons + an "unmapped frames" footer (capped at 5 samples) so the agent sees what couldn't be attributed (`node_modules/`, generated files, etc.). ## CLI mirror `codegraph trace-to-culprits` reads from stdin (`cat error.log | codegraph trace-to-culprits`) or from `--trace "..."`. Routes via runViaMCP per the repo convention. ## Reviewer cycle Round 1 returned BLOCK + REQUEST_CHANGES + info — all addressed: - **R1 (BLOCK, correctness)**: `recentCutoff` was computed in ms but `last_touched_ts` is unix seconds — the `>=` comparison always evaluated false, killing the churn signal silently on every project. Fixed with `Math.floor(Date.now() / 1000)` and a new R1-regression-guard test that stamps a fresh `last_touched_ts` via the same `applyChurnDeltas` helper the churn miner uses, then asserts the rendered output mentions "recent churn". - **R2 (REQUEST_CHANGES, correctness)**: inline `cg.queries.db .prepare(...)` violated the per-domain query-file convention and re-prepared the same statement per culprit. Switched to `getFileByPath` from queries-files.ts (cached statement, returns FileRecord with `lastTouchedTs` and `commitCount`). - **R3 (info, perf)**: `resolveTraceFile` ran `getAllFiles` once per frame — N full-table scans for an N-frame trace. Hoisted to once-per-call at the top of `buildCulprits`. ## Tests 13 trace tests in `__tests__/mcp-trace-to-culprits.test.ts`: - parseTrace: V8 / Python / Java / Go / dedup / 200-frame cap / no-frames / Windows backslash normalisation - End-to-end: V8 trace → enclosing symbols ranked top-of-stack first, unmapped-frames footer, no-refs error message, limit cap, R1-regression guard for the churn signal CLI dogfood: `cat trace | npm run cli:dev -- trace-to-culprits` parses a real production-shaped V8 trace and maps frames to `src/mcp/tools/trace-to-culprits.ts` symbols. Suite: **1595 / 34 / 0** (+13 trace tests). Eval: 18/18 | recall=1.00 | mrr=0.91 | within budget (re-baselined after colbymchenry#11 added new files to the corpus, which shifted the explore-pipeline case rank — case still PASSes its 0.5 threshold, the relative regression was corpus drift, not behaviour change). ## Backlog after this Phase 1-6 complete. Remaining: Phase 7 (colbymchenry#17 propose_extract / propose_rename, colbymchenry#18 plan-and-execute CLI — both need user check), Phase 8 (colbymchenry#19 streaming MCP), and #6 TOON (deferred pending measurement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Backlog #6 proposed adopting TOON (Token-Oriented Object Notation, header-once / rows-as-tuples) as a smaller alternative to the current markdown formatters for tabular MCP responses. The backlog explicitly gated the decision: "Verify actual savings on captured queries before flipping default — the 30-60% claim is for ideal tabular data." This commit ships the measurement and the answer. ## Result Eight representative queries against this project's own .codegraph, covering search (long signatures), suggest (short rows), callers (20-row mixed-confidence list): sample rows md(B) toon(B) saving ────────────────────────────────────── ──── ───── ─────── ────── search "CodeGraph" 10 683 661 +3.2% search "extractFromSource" 10 1483 1444 +2.6% search "compareToRef" 3 461 468 -1.5% search "handleSearch" 6 760 757 +0.4% search "parseTrace" 2 291 298 -2.4% suggest "CodGrap" 10 408 409 -0.2% suggest "extracFromSorce" 10 386 387 -0.3% callers of extractFromSource 14 936 1037 -10.8% TOTAL: md=5408B toon=5461B aggregate saving -1.0% ## Why the 30-60% claim doesn't apply TOON's win comes from compressing a verbose JSON baseline like `[{"name":"foo","kind":"function","file":"a.ts","line":42},…]` — the "headers repeated per row" + JSON quoting waste is what its header-once shape removes. But our markdown is ALREADY row-shaped: - `### foo (function)` ≈ `foo,function` (no quoting). - `a.ts:42` ≈ `a.ts,42` (single-char delimiter, both compact). - `- name (kind) - file:line` is shorter than the comma-tuple form when names are short. There's no fat to trim. On callers (the densest row shape) TOON is 10% LARGER because the per-row `- ` bullet syntax is a one-byte overhead while TOON's commas waste two chars between every field. ## Decision Skip TOON. Empirical answer matches the backlog's gate — the savings aren't there for our markdown baseline. Adopting it would: - Add per-tool format selector code + tests. - Risk LLM-client misrender (TOON is new; some clients haven't trained on it). - Net zero to net negative on payload bytes. The measurement script (`__tests__/evaluation/toon-measure.ts`) stays in the repo as a permanent record + a re-runnable harness if the markdown formatters ever get verbose enough to flip the math. Run with `npx tsx __tests__/evaluation/toon-measure.ts`. ## Backlog #6 closed. Phase 1-6 of the agentic backlog is now fully resolved (every item shipped or empirically dismissed). Remaining: Phase 7 (colbymchenry#17 propose_extract / propose_rename, colbymchenry#18 plan-and-execute CLI), Phase 8 (colbymchenry#19 streaming MCP) — all need user check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…olbymchenry#19 successor) Closes the visibility gap that the original colbymchenry#19 (streaming MCP responses) was trying to fix — but via the additive standard MCP mechanism instead of reshaping `tools/call` into a streaming response. User noted: `codegraph_summarize` takes ~10 minutes on this repo and the agent currently sits blind for the whole run. ## Wiring `MCPServer.handleToolsCall` now reads `params._meta?.progressToken`. When present (per the standard MCP spec), it builds an `onProgress` callback that emits `notifications/progress` events on the same stdio channel and forwards it to `ToolHandler.execute(toolName, args, opts)`. The handler threads `opts.onProgress` into a new optional `ToolCtx.reportProgress` field; tool handlers null-check before calling so the no-token path stays cost-free. Wired into the four tools that legitimately take minutes: - `codegraph_admin {action: sync}` → forwards IndexProgress - `codegraph_admin {action: index}` → forwards IndexProgress - `codegraph_summarize` → forwards (done, total) per symbol - `codegraph_embed` → forwards (done, total) per batch (extended cg.llm.embedAll with an onProgress option that embedAllSummaries already had internally — just plumbed through) The IndexProgress shape `{ phase, current, total, currentFile? }` gets surfaced as: `progress = current`, `total`, `message = "<phase>: <currentFile>"`. ## Why this beats stream-as-result (colbymchenry#19) - Additive — clients without a progressToken see no behavior change. Stream-as-result would have required reshaping every response. - Spec-conformant — standard MCP `notifications/progress` works with any compliant client (including Claude Code). - Targets the actual long-running tools, not the bounded ones. `codegraph_explore` runs in 15ms (per B7); streaming partial results would have added complexity for zero benefit. - 80 LOC instead of the projected 200 LOC + transport-layer risk. ## Reviewer APPROVE; two info-only polish items addressed in this commit: - Use canonical `IndexProgress` import in admin.ts instead of inline structural type (so future field additions surface as compile errors at the callback sites). - Comment on the `lockContention` early-return path explaining that no progress events fire in that branch (acceptable: the contention check is immediate, no work done). ## Tests 4 integration tests in `__tests__/mcp-progress-notifications.test.ts`: - admin sync forwards IndexProgress events - admin index forwards IndexProgress events - no-callback path is zero-cost (no events when opts.onProgress isn't supplied) - monotonic-within-phase property check Suite: **1599 / 34 / 0** (+4 B13 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…y#18, colbymchenry#19) Two friction items uncovered in the round-2 stress test, both about the boundary between long-running MCP servers and ad-hoc fresh-code CLI processes acting on the same per-project DB. ## colbymchenry#19 — cli:dev silently migrated queried projects (severe) Reproduced live: a single `cli:dev at-range ... -p ollama-source` ran forward migrations 33→35 on ollama silently. The running MCP server (started at v32 binary) then couldn't access ollama either — the schema-guard returned "stale code, restart" on every tool call. **A read-style CLI invocation effectively bricked MCP access to the queried project.** Fix: `OpenOptions.autoMigrate` opt-in (default false). When the on-disk schema is older than the binary's CURRENT_SCHEMA_VERSION and autoMigrate is unset, `DatabaseConnection.open` throws with explicit recovery commands (`admin migrate` / `admin sync` / `admin index --force`) AND the post-migrate "restart your MCP server" caveat. Newer-than-binary DBs always fail (B4 silent- corruption path). Write entry points opt in to autoMigrate=true — admin sync/index/migrate, summarize, embed, classify, the MCP server's default-project open, the cross-project cache, and `runViaMCP`'s shim (which derives autoMigrate from the tool's `isWriteTool` flag, automatically gating per-tool without per-command edits). New `codegraph_admin({action: 'migrate'})` MCP action + `codegraph admin migrate` CLI command: cheapest recovery path. The CLI does a two-phase open (default → autoMigrate=true on throw) so it can distinguish "already current" from "migrated this run". ## colbymchenry#18 — MCP server's tool registry frozen at startup Repro: a `codegraph_at_range` call (added in commit 50a9ebf, after the running server's start) returned `Error: No such tool available`. ES modules cache; can't hot-reload without restart. Fix: cannot make new tools live without restart. Added VISIBILITY in `codegraph_status` — a new "Tool registry drift" section fires when the on-disk count of `_TOOL: ToolModule`-exporting files exceeds the loaded count. Content-sniff filter (regex on file body) keeps the count accurate as new helper modules land without touching the status code. Suppressed when in sync. Status keeps `bypassSchemaGuard: true` so this signal stays reachable when other tools are blocked — exactly when the agent needs it most. ## Reviewer findings (3 of 3 addressed before commit) - `isToolFile` was missing 5 helper files (env-refs, explore- budget, result-formatters, sql-refs, symbol-resolver) that passed the filename filter but didn't have `_TOOL` exports — permanent false-positive drift warning. Switched to content sniff via the regex `/export const X_TOOL: ToolModule/`. - `retryInitIfNeeded` at src/mcp/index.ts:480 was missing `autoMigrate: true`, so a stale-schema retry would silently fail and leave the server permanently without a default project. Now matches `tryInitializeDefault`'s policy. - `handleMigrate`'s "Migrations applied" message was misleading for already-current DBs (the project-cache opens with autoMigrate=true, so by handler time migrations are already done). Reworked to state the resulting version neutrally with an "already current / behind by N" qualifier. Suite 1729/34/0 (+1 gating test in foundation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… intent coverage hint + compare_to_ref skipped/suppress flags Five friction-tracker items addressed. Three sub-agents in parallel on disjoint files; reviewer pass with .claude/reviewer-memo.md prepended caught two real correctness bugs and two info-level items which are all addressed in this commit. ## colbymchenry#17 — dead_code static excludes test-bed fixtures src/mcp/tools/dead-code.ts gains an excludeFixtures: boolean arg (default true). When true, formatStaticDeadCode requests 4× overscan from findGraphCandidates, runs filterFixtureNodes (regex match against docs/test-beds/, __tests__/fixtures/, test/fixtures/, spec/fixtures/), then slices to the user's maxCandidates. New helper filterFixtureNodes is module-private (no speculative export). Reviewer-caught correctness gap: the schema described excludeFixtures as a general filter but llmFindDeadCode in mode='judge' was called with only { maxCandidates }, so fixture filtering silently no-op'd in judge mode. Fixed by extending the schema description to caveat "applies to mode='static' only — judge mode tracked as follow-up"; the wider judge-mode wiring is non-trivial (llmFindDeadCode's args interface needs extending) and is left as a tracked task. Reviewer-caught info gap: 4× overscan doesn't suffice when fixture density exceeds 75% of the candidate set. Now surfaces a warning when overscan was exhausted AND we still under-filled — the agent sees "raise maxCandidates if you need more" instead of silently truncating. New test file __tests__/mcp-dead-code-fixtures.test.ts (3 cases): filter on, filter off, spec/fixtures pattern. ## colbymchenry#19 — search mode='intent' attaches coverage hint on 0 hits src/mcp/tools/_search-intent.ts. When handleSearchIntent returns empty, computes summary coverage and appends one of two hints: - coverage < 50%: "intent-search depends on LLM summaries (current coverage X%) — run codegraph summarize to expand the corpus, or fall back to mode='exact' / codegraph_grep" - coverage >= 50%: "0 hits at X% coverage — concept may not be summarised yet, or may not exist in the codebase. Try codegraph_explore" Non-empty results unchanged. Coverage-stat errors are caught and logged so they don't fail the search. Reviewer-caught correctness bug: getSummaryCoverage(cg.queries) was called WITHOUT a kinds filter; the helper's own JSDoc warns that "counting parameters/imports/files in the denominator would understate coverage and confuse the user". Fixed by passing SUMMARIZABLE_KINDS (already exported from src/llm/summarizer.ts and used by status.ts the same way). Three OTHER call sites in the codebase also pass no kinds — filed as colbymchenry#49 follow-up since they're out of scope for this commit. New tests added to __tests__/search-intent.test.ts (2 cases: low-coverage hint, high-coverage hint). ## colbymchenry#23 — compare_to_ref surfaces skipped (non-TS / non-indexed) files src/compare/index.ts adds filesSkipped: number to CompareResult. The compare logic counts files git reports as changed but cg can't structurally diff (non-indexed languages, .md/.json, binary, etc.). Formatter src/mcp/tools/compare.ts surfaces "> N file(s) skipped (non-indexed or non-TS)" only when count > 0 (no noise on clean diffs). Manual eyeball on HEAD~5 confirmed correct count. ## colbymchenry#24 — compare_to_ref suppressLineRangeOnly flag src/compare/index.ts adds suppressLineRangeOnly: boolean (default false for backward compat — existing callers asserting on result.totals.modified would break). Adds lineRangeOnlyCount to FileDelta. New helpers isPureLineRangeOnly / applyLineRangeOnlySuppression (module-private). When true, FileDeltas where every modified symbol's reasons are exactly ['line range changed'] (no signature change, no modifier flip, no body diff) collapse into a single per-file roll-up: "src/foo.ts: 14 symbols renumbered (no content change)". Mixed files (real change + renumber) keep real changes individually shown. MCP schema in compare.ts exposes the flag. Manual eyeball on HEAD~5 of this repo: 96/174 modified symbols (55%) were pure-renumber noise that suppression collapsed cleanly into 7 roll-up lines. New tests added to __tests__/compare.test.ts (4 cases): filesSkipped counts non-TS files, filesSkipped: 0 doesn't add the line, suppress collapses pure-renumber, mixed files keep real changes individually. ## colbymchenry#34 — at_range rejects paths outside project root src/mcp/tools/at-range.ts. New validateFileWithinRoot helper: path.resolve canonicalization on both sides, checks "equals root OR starts with root + sep". Single-range and bulk-range forms both validate; bulk form fails the whole call if ANY range is out-of-root (no silent filtering). Reviewer-caught info: documented symlink limitation in JSDoc — within-root symlinks pointing OUTSIDE the root will pass; if untrusted symlinks become an issue, swap to fs.realpathSync. New tests added to __tests__/at-range.test.ts (2 cases: traversal '../../etc/passwd' rejected, bulk form with one out-of-root range fails the whole call). ## Reviewer-caught items beyond the original 5 - New friction colbymchenry#48 filed: typescript-lsp plugin reverts sub-agent Edit calls on first application (not a codegraph defect; harness interaction). Sonnet caught it during colbymchenry#23/colbymchenry#24 work and re-applied successfully on second pass. - New friction colbymchenry#49 filed: 3 other getSummaryCoverage call sites in status.ts and bin/codegraph.ts have the same denominator-inflation bug — out of scope for this commit, tracked separately. ## Verification - 9 files (8 modified + 1 new test), +398/-13 - npm run typecheck — clean - 50/50 tests pass in focused slice (compare / at-range / mcp-dead-code-fixtures / search-intent) - New module-private helpers (filterFixtureNodes, isPureLineRangeOnly, applyLineRangeOnlySuppression, validateFileWithinRoot) all have concrete in-tree callers in the same diff — no speculative exports per reviewer-memo item #7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…henry#19) Closes friction colbymchenry#19 — sub-agents (Haiku especially) commit + push their own work when the brief is silent on commits, producing split history with awkward attribution. Caught 2026-05-09 in `a928395` (Set B Haiku auto-commit) followed by `6ef55f6` (parent commit double-attributing the same work). This is a process-level fix, not code: 1. CLAUDE.md gains a "Sub-agent briefing — checklist for parallel work" section. Three required lines for every Edit/Write-bearing sub-agent brief: - DO NOT commit (leave changes in working tree). - Use RELATIVE paths only — don't cd to absolute paths. - Disjoint file scope. Plus two optional but useful ones (state baseline, scope cap). 2. .claude/reviewer-memo.md gains a recurring scrutiny area colbymchenry#10: Sub-agent autonomous-commit attribution. The reviewer will flag commit-message attribution mismatches when work is split across multiple commits in the immediate ancestry. Info-severity, not merge-blocking. 3. Memory note `feedback_subagent_no_autonomous_commit.md` added alongside the existing `feedback_subagent_worktree_absolute_paths.md`, indexed in MEMORY.md. Read at session start for permanent recall. No code change. Suite 2014 / 0 / 34 (unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…colbymchenry#48/colbymchenry#57) Concrete recurrence counts and failure-mode descriptions for three recurring frictions that briefs keep accidentally omitting: - **colbymchenry#19 (DO NOT commit):** added "Place this near the TOP of the brief" + "Recurred 3× across the 2026-05-09 mega session when omitted." - **colbymchenry#57 (RELATIVE paths):** spelled out the actual failure mode (`cd /Users/.../codegraph/...` → wrong tree → isolation defeated), added the 2026-05-09 recurrence count (3×) alongside the existing 2026-05-08 4/4 datapoint. - **colbymchenry#48 (typescript-lsp Edit revert):** new paragraph at section end with concrete repro paths (`src/installer/shimmer-progress.ts`, `src/installer/pool-recommendation.ts`), the 2× recurrence count, and the mandatory `git diff --stat` + grep + re-apply protocol. 30 LOC total (+23 / -7). Doc-only — no code, no tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

colbymchenry and others added 2 commits February 9, 2026 23:18

colbymchenry mentioned this pull request Feb 10, 2026

Security hardening, correctness fixes, and documentation improvements #16

Closed

6 tasks

colbymchenry merged commit 082513b into main Feb 10, 2026

colbymchenry mentioned this pull request Feb 10, 2026

Add root management and autonomous indexing tools to MCP server #15

Closed

colbymchenry deleted the security-hardening branch February 10, 2026 22:48

This was referenced Apr 26, 2026

feat(search): subword tokens + Porter stemmer + stopword filter for FTS #104

Open

vector search #87

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security hardening: path validation, input clamping, safe JSON, file locking#19

Security hardening: path validation, input clamping, safe JSON, file locking#19
colbymchenry merged 2 commits into
mainfrom
security-hardening

colbymchenry commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

colbymchenry commented Feb 10, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant