Security hardening: path validation, input clamping, safe JSON, file locking#19
Merged
Conversation
…locking Implements security improvements inspired by PR #16 (credit: MO2k4): - Add validatePathWithinRoot() to prevent path traversal attacks in extraction and context building - Clamp MCP tool inputs (limit, depth, maxDepth) to sane ranges - Use atomic writes (temp file + rename) for config saves - Add symlink cycle detection in directory scanning to prevent infinite loops - Replace all JSON.parse calls in db/queries.ts with safeJsonParse fallbacks to handle corrupted database metadata gracefully - Add cross-process FileLock for DB write operations (indexAll, indexFiles, sync) to prevent concurrent writes from CLI, MCP server, and git hooks - Remove unused path import from context/index.ts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests cover all security utilities introduced in the previous commit: - validatePathWithinRoot: path traversal prevention (7 tests) - safeJsonParse: corrupted JSON fallback handling (6 tests) - clamp: input range clamping (5 tests) - FileLock: cross-process file locking (7 tests) - Atomic config writes: temp file + rename pattern (3 tests) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 tasks
colbymchenry
added a commit
that referenced
this pull request
Feb 10, 2026
Covers arrow function extraction, best-candidate resolution, graph traversal direction fix, MCP symbol disambiguation, output truncation, CLI uninit command, and more. Tests requiring better-sqlite3 native bindings are conditionally skipped. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 26, 2026
Closed
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 3, 2026
…colbymchenry#29 won't-do) Builds out the eval harness so the future ranking arc (B colbymchenry#19, Aider-style TF-IDF + PageRank) can be measured. Pre-PR the harness ran 12 Elasticsearch cases and produced JSON reports — but had no comparison mode and no self-codebase cases, so a developer had to manually diff JSONs and check out a separate big codebase to validate any ranking change. What this adds -------------- - __tests__/evaluation/compare.ts: pure module + CLI. compareReports returns per-case + summary delta; formatComparison renders the human table; standalone CLI exits non-zero on regression. Budget: >0.10 per-case recall drop OR >0.05 mean recall drop = fail. - __tests__/evaluation/cases-self.ts: 11 cases targeting THIS repo's own indexed symbols (CodeGraph, searchNodes, ToolModule, compareToRef, ExtractionOrchestrator, etc.). Lets developers iterate on ranking without an external codebase. - runner.ts: argv parser with --cases self|elasticsearch and --compare <baseline.json>; env-var fallbacks (EVAL_CODEBASE, EVAL_CASES, EVAL_COMPARE). On --compare, the runner re-loads the baseline + invokes compareReports + prints formatComparison + exits non-zero on regression beyond budget. - npm run eval:self script. - .gitignore: __tests__/evaluation/results/. What I had to fix mid-implementation - runner.ts referenced `cg.findRelevantContext` (not on CodeGraph; it's on cg.contextBuilder). Fixed. - runner.ts used `__dirname` (ESM-incompatible). Switched to `import.meta.dirname`. Reviewer pass — caught + fixed - scoring.ts MRR was buggy for multi-symbol cases: iterated over expectedSymbols in expected-array order and recorded "the rank of the first one found in that iteration", not the BEST rank across all found symbols. Standard MRR is reciprocal of the highest-ranked relevant result. Renamed `firstRank` → `bestRank` and changed the update to `Math.min`-style semantics. - runner.ts argv parser silently dropped --compare's value when the flag was the last token. Now errors with exit code 2. - self-explore-compare-to-ref case had `getLineRangeHistory` as an expected symbol — that function lives in the blame tool, not compare-to-ref. Replaced with FileDelta (real compare-to-ref type). - Runner's meanMRR filter used `startsWith('search-')` but self cases use `self-search-*` prefix; the filter dropped them silently. Now matches `'-search-'` segment. Ran on this repo: 9/11 cases pass (was 8/11 before the case fix); mean recall 0.79; the 2 remaining failures (extraction-pipeline, search-cascade) are real recall gaps the future ranking arc is intended to close — confirmed by reviewer that the expected symbols exist at the cited file:line. Also: B colbymchenry#29 (upstream PRs to colbymchenry/codegraph) struck as won't-do per user — keeping changes in this fork. Verification - npm run typecheck (tsgo) — clean. - npx vitest run — 1392/13/0 (eval is run-on-demand, not a vitest test). - npm run eval:self — ran end-to-end, saved JSON, --compare against itself shows zero deltas + within-budget verdict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 3, 2026
…y#19) Two ranking changes in findRelevantContext, gated by the eval harness shipped in B colbymchenry#26: 1. Common-term dampener (one-sided IDF) in runMultiTermTextSearch. Each per-term search-result score gets multiplied by `1 - DAMPEN_RATE * (numHits / fetchLimit)`. With DAMPEN_RATE=0.5: a term that saturates the fetch cap (16/16) gets 0.5×; mid-saturation (8/16) gets 0.75×; rare hits (1/16) get effectively 1.0×. Counters the cross-call score-summing problem where common tokens ("nodes", "parse" in this repo) drowned out rare ones ("extraction") because each independent per-term search only knows BM25-IDF within ITS own results, not across calls. 2. PageRank centrality boost in collectAndScoreCandidates. `score *= 1 + γ * sqrt(centrality)` with γ=5. Pulls high-centrality hubs (ExtractionOrchestrator, etc.) into entry-point selection. Sqrt-smooths the long tail so leaves barely move while hubs at the top centrality (~0.12 in this repo) get +173%. Eval results ------------ self-eval suite (11 cases): mean recall 0.79 → 0.91 (+0.121), pass 9/11 → 10/11. Zero per-case regressions. Two cases gained: - self-explore-extraction-pipeline: 0.00 → 1.00 - self-explore-search-cascade: 0.67 → 1.00 The gate from B colbymchenry#26 (compare.ts) was load-bearing — multiple intermediate iterations were rejected: - bidirectional IDF (`log(1 + cap/hits)`): regressed self-explore-compare-to-ref by 0.33. Rolled back. - cliff dampener (saturated→0.5×, else→1.0×, no centrality change): flipped extraction-pipeline +1.00 but regressed biomarker-engine -0.33. Rejected. - smooth dampener at DAMPEN_RATE=0.4: zero movement either direction. - DAMPEN_RATE=0.5 with γ=2 centrality (existing): same trade as cliff. Rejected. - DAMPEN_RATE=0.5 with γ=5 centrality: biomarker-engine recovered (centrality boost on its hub symbols compensated for the dampened common-term scores). Two wins held. Gate passed. Reviewer pass — docstring rot fixed - CENTRALITY_BOOST_WEIGHT JSDoc still showed γ=2 arithmetic examples after I bumped the constant to 5. Updated to the correct +173% / +50% / +16% percentages. - DAMPEN_RATE inline comment showed numbers from a superseded DAMPEN_RATE=0.4 iteration. Updated to the shipped 0.5 values. - Both flagged via memo scrutiny-area #1 (docstring rotting after a behavior change). Memo workflow paying off — fifth consecutive review where memo content was load-bearing. Verification - npm run typecheck (tsgo) — clean. - npx vitest run — 1392/13/0 (unchanged; ranking changes measured via eval, not vitest). - npx tsx __tests__/evaluation/runner.ts . --cases self --compare __tests__/evaluation/baseline-self.json — within budget, +0.121 mean recall, +1 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 3, 2026
…henry#23) Flip the SQLite backend default. Pre-PR, createDatabase tried better-sqlite3 first and fell back to node:sqlite; the package shipped better-sqlite3 in optionalDependencies so npm install attempted to compile native bindings on every fresh install. Post-PR, node:sqlite is the built-in default (Node 22.5+, zero native compile, zero platform binaries); better-sqlite3 remains supported as a perf opt-in (~1.5× faster on indexing) when the user installs it explicitly. The Node 22 floor is already in place (engines.node = ">=22.5.0"), so this isn't a forced version bump — just exercising what the existing floor enabled. Why this is a win - Fresh installs no longer compile native bindings (the #1 cause of "npm install fails on Apple Silicon" issues per the dep audit). - 22 MB net install savings versus the previous default path on platforms that pulled prebuilt better-sqlite3 binaries. - Users who want the perf opt-in still get it via a single explicit `npm install better-sqlite3 --save`. Adapter shims required - node:sqlite throws on unused named parameters; better-sqlite3 silently ignored them. Several call sites pass `bindingsFromObject(record, FULL_SCHEMA)` where the SQL only binds a subset (notably upsertFile excludes churn-managed columns). Added `extractNamedPlaceholders` to filter the bindings object on each prepare()'d statement's run/get/all. - node:sqlite throws on double-close; better-sqlite3 was silently a no-op. Made `NodeSqliteAdapter.close()` idempotent to match the existing contract. Surfaces flipped - src/db/sqlite-adapter.ts createDatabase priority order + banner text (now flags failed perf opt-in, not active fallback). - src/mcp/tools/status.ts reports `node:sqlite (built-in default)` vs `better-sqlite3 (perf opt-in, native)`. - src/bin/codegraph.ts CLI status display same. - package.json dropped better-sqlite3 from optionalDependencies. - @types/better-sqlite3 kept in devDependencies for compile-time SqliteDatabase interface. Test fallout addressed - wasm-savepoint banner-text assertions: updated for the new framing. - mcp-llm-visibility + sqlite-vec backend assertions: now match either flavor so the test stays portable across both dev environments. - cochange + search-quality migration tests that directly `require('better-sqlite3')`: now `it.skipIf(!hasBetterSqlite3)`. 21 tests skip when the perf opt-in isn't installed (they all poke the migration path against a raw better-sqlite3 handle; production uses the adapter regardless). Reviewer pass — fixed before commit - buildFallbackBanner said "Or remove the perf opt-in" then printed `npm install better-sqlite3 --save` — that's INSTALL, not remove. Bug. Fixed to `npm uninstall better-sqlite3`. Test assertion flipped to match. - extractNamedPlaceholders JSDoc now notes that node:sqlite's "Unknown named parameter" diagnostic for typo'd keys is silenced as a deliberate trade. - Sixth consecutive review where memo content (scrutiny-area #1 docstring rot) drove the catch. Verification - npm run typecheck (tsgo) — clean. - npx vitest run — 1371 passed / 34 skipped / 0 failed (vs 1392/13/0 baseline; +21 skipped reflect direct-handle migration tests that need better-sqlite3). - npm run eval:self --compare baseline-self.json — within budget, +0.121 mean recall (B colbymchenry#19 ranking arc holds on the new default backend). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 3, 2026
User-driven backlog cleanup before next session: - **colbymchenry#27 GraphQL `extend type` — verified NOT implemented.** I had briefly thought this was done; double-check found `src/extraction/graphql-extractor.ts:131` explicitly skips `type_system_extension` in v1 with a "needs a second resolution pass we don't do yet" comment. No fixture coverage. Backlog entry annotated to flag the verification result so a future session doesn't re-make the same mis-recall. - **colbymchenry#25 build-snapshot:** annotated with the 2026-05-03 triage numbers (~50ms upper-bound win on a ~261ms cold start; ESM + native-module fragility) but kept on the backlog per user — worth picking up later when TS 7 + ESM build-snapshot tooling matures. Also checks in `__tests__/evaluation/baseline-self.json` — the self-eval baseline captured before B colbymchenry#19 (ranking arc) shipped, referenced by the runner's `--compare` flag. Without it in-repo, every fresh checkout would have to regenerate the baseline before gating ranking changes against it. The baseline is the **pre-improvement** snapshot so its mean recall (0.79) is the floor every future ranking change must clear or stay flat against. Bump the file deliberately when a verified improvement should be the new floor — never silently overwrite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 3, 2026
…olbymchenry#8) Promotes the resolver numeric `metadata.confidence` to a categorical, queryable column on `edges`. `codegraph_callers`, `_callees`, `_impact` now annotate each row with `*(INFERRED)*` when the edge is not concrete, and accept a new `minConfidence` arg to drop edges below a chosen level — useful for refactor-impact analysis where false-positive reach inflates the blast radius. Migration 032 adds `edges.confidence TEXT` (nullable; read-cast in EDGE_SCHEMA collapses NULL → `EXTRACTED`). Nullable lets legacy / structural / extractor-direct edges need no producer change. Index on the column for filter-down queries. Idempotent ADD COLUMN with table-exists guard following the convention from migrations 020 / 021 / 023 / 030. Producer: `ReferenceResolver.createEdges` calls a new free `classifyConfidence(resolvedBy, score)` helper: - import / qualified-name / file-path / exact-match → EXTRACTED - framework with score >= 0.85 → EXTRACTED, else INFERRED - instance-method / fuzzy → INFERRED AMBIGUOUS is not stamped from the resolver path in v1 — needs tie-tracking from the candidate picker. Tracked as B3. Consumers (`result-formatters.ts`): - `formatConfidence(edge)` — markdown italic suffix (empty for EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise). - `parseMinConfidence(raw)` — returns level / null / errorResult. - `filterByConfidence(rows, min)` — drop below threshold. - `CONFIDENCE_RANK` — numeric ordering for inline filter loops. Wired into all four code paths in `callers.ts` (collectCallers, collectCallersForSource, collectTypeUsers, formatGroupedCallers), both paths in `callees.ts`, and `impact.ts` with a `filterImpactByConfidence` BFS that prunes nodes unreachable along the surviving edge set. Type cleanup: `Edge.confidence` added with JSDoc; the long-dead `Edge.provenance` field also got a JSDoc note pointing at the SCIP-integration-reservation explanation in memory. Reviewer-memo #4 catch (schema-version test forgetfulness): noticed `__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)` even though migrations 030 and 031 had shipped. Catch-up to 32 in this commit covers both the missed bumps and the new migration. Reviewer: APPROVE, three info-only findings (type-user display gap, framework-rule calibration table doc, catch of the recurring pr19 pattern). Suite: 1555 / 34 / 0 (+18 new edge-confidence tests). Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression budget vs 091e935 baseline). Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11 trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 4, 2026
…sites (colbymchenry#11) Last Phase 1-6 backlog item. Replaces the "agent stitches codegraph_callers + _history + _biomarkers manually for every stack frame" loop with one tool: paste a stack trace, get a ranked list of likely fix-site candidates with per-row "why suspected" reasons. ## Pipeline 1. `parseTrace(text)` — multi-format extractor: V8/Node, Python, Java/Kotlin/Scala, Go, Rust, plus a generic `path/to/file.ext:line` fallback. Caps at 200 frames; dedupes by `(file, line)`; normalises Windows backslashes; first-pattern-per-line wins. 2. `resolveTraceFile` — longest-suffix match between absolute trace paths and project-relative indexed paths, with a path-boundary check so `foo/bar.ts` doesn't match `sub-foo/bar.ts`. 3. `findEnclosingNode` (existing helper from index-hooks/enclosing.ts, shared with codegraph_grep) attributes each frame to its enclosing function/method/class via line number. 4. `scoreCulprits` composite: `1 / (topRank + 1)` for frame position (top of stack wins decisively), `+0.3` for risk-biomarker overlap (god_class / complex_method / large_method / nested_complexity), `+0.2` for recent file churn (touched within 30 days, with commit count surfaced in the reason). 5. Output: ranked candidates with per-row reasons + an "unmapped frames" footer (capped at 5 samples) so the agent sees what couldn't be attributed (`node_modules/`, generated files, etc.). ## CLI mirror `codegraph trace-to-culprits` reads from stdin (`cat error.log | codegraph trace-to-culprits`) or from `--trace "..."`. Routes via runViaMCP per the repo convention. ## Reviewer cycle Round 1 returned BLOCK + REQUEST_CHANGES + info — all addressed: - **R1 (BLOCK, correctness)**: `recentCutoff` was computed in ms but `last_touched_ts` is unix seconds — the `>=` comparison always evaluated false, killing the churn signal silently on every project. Fixed with `Math.floor(Date.now() / 1000)` and a new R1-regression-guard test that stamps a fresh `last_touched_ts` via the same `applyChurnDeltas` helper the churn miner uses, then asserts the rendered output mentions "recent churn". - **R2 (REQUEST_CHANGES, correctness)**: inline `cg.queries.db .prepare(...)` violated the per-domain query-file convention and re-prepared the same statement per culprit. Switched to `getFileByPath` from queries-files.ts (cached statement, returns FileRecord with `lastTouchedTs` and `commitCount`). - **R3 (info, perf)**: `resolveTraceFile` ran `getAllFiles` once per frame — N full-table scans for an N-frame trace. Hoisted to once-per-call at the top of `buildCulprits`. ## Tests 13 trace tests in `__tests__/mcp-trace-to-culprits.test.ts`: - parseTrace: V8 / Python / Java / Go / dedup / 200-frame cap / no-frames / Windows backslash normalisation - End-to-end: V8 trace → enclosing symbols ranked top-of-stack first, unmapped-frames footer, no-refs error message, limit cap, R1-regression guard for the churn signal CLI dogfood: `cat trace | npm run cli:dev -- trace-to-culprits` parses a real production-shaped V8 trace and maps frames to `src/mcp/tools/trace-to-culprits.ts` symbols. Suite: **1595 / 34 / 0** (+13 trace tests). Eval: 18/18 | recall=1.00 | mrr=0.91 | within budget (re-baselined after colbymchenry#11 added new files to the corpus, which shifted the explore-pipeline case rank — case still PASSes its 0.5 threshold, the relative regression was corpus drift, not behaviour change). ## Backlog after this Phase 1-6 complete. Remaining: Phase 7 (colbymchenry#17 propose_extract / propose_rename, colbymchenry#18 plan-and-execute CLI — both need user check), Phase 8 (colbymchenry#19 streaming MCP), and #6 TOON (deferred pending measurement). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 4, 2026
Backlog #6 proposed adopting TOON (Token-Oriented Object Notation, header-once / rows-as-tuples) as a smaller alternative to the current markdown formatters for tabular MCP responses. The backlog explicitly gated the decision: "Verify actual savings on captured queries before flipping default — the 30-60% claim is for ideal tabular data." This commit ships the measurement and the answer. ## Result Eight representative queries against this project's own .codegraph, covering search (long signatures), suggest (short rows), callers (20-row mixed-confidence list): sample rows md(B) toon(B) saving ────────────────────────────────────── ──── ───── ─────── ────── search "CodeGraph" 10 683 661 +3.2% search "extractFromSource" 10 1483 1444 +2.6% search "compareToRef" 3 461 468 -1.5% search "handleSearch" 6 760 757 +0.4% search "parseTrace" 2 291 298 -2.4% suggest "CodGrap" 10 408 409 -0.2% suggest "extracFromSorce" 10 386 387 -0.3% callers of extractFromSource 14 936 1037 -10.8% TOTAL: md=5408B toon=5461B aggregate saving -1.0% ## Why the 30-60% claim doesn't apply TOON's win comes from compressing a verbose JSON baseline like `[{"name":"foo","kind":"function","file":"a.ts","line":42},…]` — the "headers repeated per row" + JSON quoting waste is what its header-once shape removes. But our markdown is ALREADY row-shaped: - `### foo (function)` ≈ `foo,function` (no quoting). - `a.ts:42` ≈ `a.ts,42` (single-char delimiter, both compact). - `- name (kind) - file:line` is shorter than the comma-tuple form when names are short. There's no fat to trim. On callers (the densest row shape) TOON is 10% LARGER because the per-row `- ` bullet syntax is a one-byte overhead while TOON's commas waste two chars between every field. ## Decision Skip TOON. Empirical answer matches the backlog's gate — the savings aren't there for our markdown baseline. Adopting it would: - Add per-tool format selector code + tests. - Risk LLM-client misrender (TOON is new; some clients haven't trained on it). - Net zero to net negative on payload bytes. The measurement script (`__tests__/evaluation/toon-measure.ts`) stays in the repo as a permanent record + a re-runnable harness if the markdown formatters ever get verbose enough to flip the math. Run with `npx tsx __tests__/evaluation/toon-measure.ts`. ## Backlog #6 closed. Phase 1-6 of the agentic backlog is now fully resolved (every item shipped or empirically dismissed). Remaining: Phase 7 (colbymchenry#17 propose_extract / propose_rename, colbymchenry#18 plan-and-execute CLI), Phase 8 (colbymchenry#19 streaming MCP) — all need user check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 4, 2026
…olbymchenry#19 successor) Closes the visibility gap that the original colbymchenry#19 (streaming MCP responses) was trying to fix — but via the additive standard MCP mechanism instead of reshaping `tools/call` into a streaming response. User noted: `codegraph_summarize` takes ~10 minutes on this repo and the agent currently sits blind for the whole run. ## Wiring `MCPServer.handleToolsCall` now reads `params._meta?.progressToken`. When present (per the standard MCP spec), it builds an `onProgress` callback that emits `notifications/progress` events on the same stdio channel and forwards it to `ToolHandler.execute(toolName, args, opts)`. The handler threads `opts.onProgress` into a new optional `ToolCtx.reportProgress` field; tool handlers null-check before calling so the no-token path stays cost-free. Wired into the four tools that legitimately take minutes: - `codegraph_admin {action: sync}` → forwards IndexProgress - `codegraph_admin {action: index}` → forwards IndexProgress - `codegraph_summarize` → forwards (done, total) per symbol - `codegraph_embed` → forwards (done, total) per batch (extended cg.llm.embedAll with an onProgress option that embedAllSummaries already had internally — just plumbed through) The IndexProgress shape `{ phase, current, total, currentFile? }` gets surfaced as: `progress = current`, `total`, `message = "<phase>: <currentFile>"`. ## Why this beats stream-as-result (colbymchenry#19) - Additive — clients without a progressToken see no behavior change. Stream-as-result would have required reshaping every response. - Spec-conformant — standard MCP `notifications/progress` works with any compliant client (including Claude Code). - Targets the actual long-running tools, not the bounded ones. `codegraph_explore` runs in 15ms (per B7); streaming partial results would have added complexity for zero benefit. - 80 LOC instead of the projected 200 LOC + transport-layer risk. ## Reviewer APPROVE; two info-only polish items addressed in this commit: - Use canonical `IndexProgress` import in admin.ts instead of inline structural type (so future field additions surface as compile errors at the callback sites). - Comment on the `lockContention` early-return path explaining that no progress events fire in that branch (acceptable: the contention check is immediate, no work done). ## Tests 4 integration tests in `__tests__/mcp-progress-notifications.test.ts`: - admin sync forwards IndexProgress events - admin index forwards IndexProgress events - no-callback path is zero-cost (no events when opts.onProgress isn't supplied) - monotonic-within-phase property check Suite: **1599 / 34 / 0** (+4 B13 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 8, 2026
…y#18, colbymchenry#19) Two friction items uncovered in the round-2 stress test, both about the boundary between long-running MCP servers and ad-hoc fresh-code CLI processes acting on the same per-project DB. ## colbymchenry#19 — cli:dev silently migrated queried projects (severe) Reproduced live: a single `cli:dev at-range ... -p ollama-source` ran forward migrations 33→35 on ollama silently. The running MCP server (started at v32 binary) then couldn't access ollama either — the schema-guard returned "stale code, restart" on every tool call. **A read-style CLI invocation effectively bricked MCP access to the queried project.** Fix: `OpenOptions.autoMigrate` opt-in (default false). When the on-disk schema is older than the binary's CURRENT_SCHEMA_VERSION and autoMigrate is unset, `DatabaseConnection.open` throws with explicit recovery commands (`admin migrate` / `admin sync` / `admin index --force`) AND the post-migrate "restart your MCP server" caveat. Newer-than-binary DBs always fail (B4 silent- corruption path). Write entry points opt in to autoMigrate=true — admin sync/index/migrate, summarize, embed, classify, the MCP server's default-project open, the cross-project cache, and `runViaMCP`'s shim (which derives autoMigrate from the tool's `isWriteTool` flag, automatically gating per-tool without per-command edits). New `codegraph_admin({action: 'migrate'})` MCP action + `codegraph admin migrate` CLI command: cheapest recovery path. The CLI does a two-phase open (default → autoMigrate=true on throw) so it can distinguish "already current" from "migrated this run". ## colbymchenry#18 — MCP server's tool registry frozen at startup Repro: a `codegraph_at_range` call (added in commit 50a9ebf, after the running server's start) returned `Error: No such tool available`. ES modules cache; can't hot-reload without restart. Fix: cannot make new tools live without restart. Added VISIBILITY in `codegraph_status` — a new "Tool registry drift" section fires when the on-disk count of `_TOOL: ToolModule`-exporting files exceeds the loaded count. Content-sniff filter (regex on file body) keeps the count accurate as new helper modules land without touching the status code. Suppressed when in sync. Status keeps `bypassSchemaGuard: true` so this signal stays reachable when other tools are blocked — exactly when the agent needs it most. ## Reviewer findings (3 of 3 addressed before commit) - `isToolFile` was missing 5 helper files (env-refs, explore- budget, result-formatters, sql-refs, symbol-resolver) that passed the filename filter but didn't have `_TOOL` exports — permanent false-positive drift warning. Switched to content sniff via the regex `/export const X_TOOL: ToolModule/`. - `retryInitIfNeeded` at src/mcp/index.ts:480 was missing `autoMigrate: true`, so a stale-schema retry would silently fail and leave the server permanently without a default project. Now matches `tryInitializeDefault`'s policy. - `handleMigrate`'s "Migrations applied" message was misleading for already-current DBs (the project-cache opens with autoMigrate=true, so by handler time migrations are already done). Reworked to state the resulting version neutrally with an "already current / behind by N" qualifier. Suite 1729/34/0 (+1 gating test in foundation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 9, 2026
… intent coverage hint + compare_to_ref skipped/suppress flags Five friction-tracker items addressed. Three sub-agents in parallel on disjoint files; reviewer pass with .claude/reviewer-memo.md prepended caught two real correctness bugs and two info-level items which are all addressed in this commit. ## colbymchenry#17 — dead_code static excludes test-bed fixtures src/mcp/tools/dead-code.ts gains an excludeFixtures: boolean arg (default true). When true, formatStaticDeadCode requests 4× overscan from findGraphCandidates, runs filterFixtureNodes (regex match against docs/test-beds/, __tests__/fixtures/, test/fixtures/, spec/fixtures/), then slices to the user's maxCandidates. New helper filterFixtureNodes is module-private (no speculative export). Reviewer-caught correctness gap: the schema described excludeFixtures as a general filter but llmFindDeadCode in mode='judge' was called with only { maxCandidates }, so fixture filtering silently no-op'd in judge mode. Fixed by extending the schema description to caveat "applies to mode='static' only — judge mode tracked as follow-up"; the wider judge-mode wiring is non-trivial (llmFindDeadCode's args interface needs extending) and is left as a tracked task. Reviewer-caught info gap: 4× overscan doesn't suffice when fixture density exceeds 75% of the candidate set. Now surfaces a warning when overscan was exhausted AND we still under-filled — the agent sees "raise maxCandidates if you need more" instead of silently truncating. New test file __tests__/mcp-dead-code-fixtures.test.ts (3 cases): filter on, filter off, spec/fixtures pattern. ## colbymchenry#19 — search mode='intent' attaches coverage hint on 0 hits src/mcp/tools/_search-intent.ts. When handleSearchIntent returns empty, computes summary coverage and appends one of two hints: - coverage < 50%: "intent-search depends on LLM summaries (current coverage X%) — run codegraph summarize to expand the corpus, or fall back to mode='exact' / codegraph_grep" - coverage >= 50%: "0 hits at X% coverage — concept may not be summarised yet, or may not exist in the codebase. Try codegraph_explore" Non-empty results unchanged. Coverage-stat errors are caught and logged so they don't fail the search. Reviewer-caught correctness bug: getSummaryCoverage(cg.queries) was called WITHOUT a kinds filter; the helper's own JSDoc warns that "counting parameters/imports/files in the denominator would understate coverage and confuse the user". Fixed by passing SUMMARIZABLE_KINDS (already exported from src/llm/summarizer.ts and used by status.ts the same way). Three OTHER call sites in the codebase also pass no kinds — filed as colbymchenry#49 follow-up since they're out of scope for this commit. New tests added to __tests__/search-intent.test.ts (2 cases: low-coverage hint, high-coverage hint). ## colbymchenry#23 — compare_to_ref surfaces skipped (non-TS / non-indexed) files src/compare/index.ts adds filesSkipped: number to CompareResult. The compare logic counts files git reports as changed but cg can't structurally diff (non-indexed languages, .md/.json, binary, etc.). Formatter src/mcp/tools/compare.ts surfaces "> N file(s) skipped (non-indexed or non-TS)" only when count > 0 (no noise on clean diffs). Manual eyeball on HEAD~5 confirmed correct count. ## colbymchenry#24 — compare_to_ref suppressLineRangeOnly flag src/compare/index.ts adds suppressLineRangeOnly: boolean (default false for backward compat — existing callers asserting on result.totals.modified would break). Adds lineRangeOnlyCount to FileDelta. New helpers isPureLineRangeOnly / applyLineRangeOnlySuppression (module-private). When true, FileDeltas where every modified symbol's reasons are exactly ['line range changed'] (no signature change, no modifier flip, no body diff) collapse into a single per-file roll-up: "src/foo.ts: 14 symbols renumbered (no content change)". Mixed files (real change + renumber) keep real changes individually shown. MCP schema in compare.ts exposes the flag. Manual eyeball on HEAD~5 of this repo: 96/174 modified symbols (55%) were pure-renumber noise that suppression collapsed cleanly into 7 roll-up lines. New tests added to __tests__/compare.test.ts (4 cases): filesSkipped counts non-TS files, filesSkipped: 0 doesn't add the line, suppress collapses pure-renumber, mixed files keep real changes individually. ## colbymchenry#34 — at_range rejects paths outside project root src/mcp/tools/at-range.ts. New validateFileWithinRoot helper: path.resolve canonicalization on both sides, checks "equals root OR starts with root + sep". Single-range and bulk-range forms both validate; bulk form fails the whole call if ANY range is out-of-root (no silent filtering). Reviewer-caught info: documented symlink limitation in JSDoc — within-root symlinks pointing OUTSIDE the root will pass; if untrusted symlinks become an issue, swap to fs.realpathSync. New tests added to __tests__/at-range.test.ts (2 cases: traversal '../../etc/passwd' rejected, bulk form with one out-of-root range fails the whole call). ## Reviewer-caught items beyond the original 5 - New friction colbymchenry#48 filed: typescript-lsp plugin reverts sub-agent Edit calls on first application (not a codegraph defect; harness interaction). Sonnet caught it during colbymchenry#23/colbymchenry#24 work and re-applied successfully on second pass. - New friction colbymchenry#49 filed: 3 other getSummaryCoverage call sites in status.ts and bin/codegraph.ts have the same denominator-inflation bug — out of scope for this commit, tracked separately. ## Verification - 9 files (8 modified + 1 new test), +398/-13 - npm run typecheck — clean - 50/50 tests pass in focused slice (compare / at-range / mcp-dead-code-fixtures / search-intent) - New module-private helpers (filterFixtureNodes, isPureLineRangeOnly, applyLineRangeOnlySuppression, validateFileWithinRoot) all have concrete in-tree callers in the same diff — no speculative exports per reviewer-memo item #7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 9, 2026
…henry#19) Closes friction colbymchenry#19 — sub-agents (Haiku especially) commit + push their own work when the brief is silent on commits, producing split history with awkward attribution. Caught 2026-05-09 in `a928395` (Set B Haiku auto-commit) followed by `6ef55f6` (parent commit double-attributing the same work). This is a process-level fix, not code: 1. CLAUDE.md gains a "Sub-agent briefing — checklist for parallel work" section. Three required lines for every Edit/Write-bearing sub-agent brief: - DO NOT commit (leave changes in working tree). - Use RELATIVE paths only — don't cd to absolute paths. - Disjoint file scope. Plus two optional but useful ones (state baseline, scope cap). 2. .claude/reviewer-memo.md gains a recurring scrutiny area colbymchenry#10: Sub-agent autonomous-commit attribution. The reviewer will flag commit-message attribution mismatches when work is split across multiple commits in the immediate ancestry. Info-severity, not merge-blocking. 3. Memory note `feedback_subagent_no_autonomous_commit.md` added alongside the existing `feedback_subagent_worktree_absolute_paths.md`, indexed in MEMORY.md. Read at session start for permanent recall. No code change. Suite 2014 / 0 / 34 (unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv
added a commit
to andreinknv/codegraph
that referenced
this pull request
May 9, 2026
…colbymchenry#48/colbymchenry#57) Concrete recurrence counts and failure-mode descriptions for three recurring frictions that briefs keep accidentally omitting: - **colbymchenry#19 (DO NOT commit):** added "Place this near the TOP of the brief" + "Recurred 3× across the 2026-05-09 mega session when omitted." - **colbymchenry#57 (RELATIVE paths):** spelled out the actual failure mode (`cd /Users/.../codegraph/...` → wrong tree → isolation defeated), added the 2026-05-09 recurrence count (3×) alongside the existing 2026-05-08 4/4 datapoint. - **colbymchenry#48 (typescript-lsp Edit revert):** new paragraph at section end with concrete repro paths (`src/installer/shimmer-progress.ts`, `src/installer/pool-recommendation.ts`), the 2× recurrence count, and the mandatory `git diff --stat` + grep + re-apply protocol. 30 LOC total (+23 / -7). Doc-only — no code, no tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements security improvements inspired by PR #16 by @MO2k4. Rather than merging the original PR (which had extensive conflicts against the current codebase), the key ideas were reimplemented fresh.
validatePathWithinRoot()utility used in extraction and context building to reject paths that escape the project rootlimit,depth,maxDepth) are clamped to sane rangesJSON.parsecalls indb/queries.tsreplaced withsafeJsonParsefallbacks for corrupted database metadataFileLockclass prevents concurrent DB writes from CLI, MCP server, and git hooksTest plan
__tests__/security.test.ts)Credit: @MO2k4 for the security audit and ideas in PR #16
🤖 Generated with Claude Code