Fix arrow function and function expression export indexing #18

Merged
colbymchenry merged 2 commits into colbymchenry:main from GeneralClaw:fix/arrow-function-export-indexing
Feb 10, 2026

@GeneralClaw GeneralClaw commented Feb 9, 2026

Problem

CodeGraph v0.3.1 fails to index three common TypeScript/JavaScript patterns:

  1. Arrow function exports: export const useAuth = () => { ... } — the arrow_function AST node has no name field, so extractName() returns '<anonymous>' and bails.
  2. Type aliases: export type AuthContextValue = { ... }. The type_alias_declaration node type isn't in any extraction list.
  3. Non-function variable exports: export const store = create(...), export const schema = z.object(...), export const config = { ... } — these are call_expression, object, or primitive values, not functions, so they're never visited.

Additionally, isExported() for TS/JS uses a 10-char substring lookback that misses export for deeply nested nodes.

Real-World Impact

Tested on a production monorepo (Expo + FastAPI, 238 files):

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Nodes | 779 | 1,172 | +50% |
| Edges | 2,599 | 3,489 | +34% |
| packages/ nodes | 0 | 166 | Was completely invisible |
| variable nodes | 0 | 109 | New kind |
| type_alias nodes | 0 | 105 | New kind |

7 shared packages (auth, db, cv, i18n, ui, analytics, config) had zero nodes. After this fix, all have full coverage. Only 4 files remain at 0 nodes — all are re-export barrels or ambient declarations with no extractable symbols.

Changes (2 commits)

Commit 1: Arrow function + isExported fix

extractFunction(): When arrow_function or function_expression resolves to '<anonymous>', check the parent variable_declarator for the name:

let name = extractName(node, this.source, this.extractor);
if (name === '<anonymous>' &&
    (node.type === 'arrow_function' || node.type === 'function_expression')) {
  const parent = node.parent;
  if (parent?.type === 'variable_declarator') {
    const varName = getChildByField(parent, 'name');
    if (varName) name = getNodeText(varName, this.source);
  }
}

isExported() (TS + JS): Walk the parent chain instead of 10-char lookback:

isExported: (node, _source) => {
  let current = node.parent;
  while (current) {
    if (current.type === 'export_statement') return true;
    current = current.parent;
  }
  return false;
},
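Both Commit 1 changes can be tried out without tree-sitter on a hand-built tree. The sketch below is illustrative only: `MiniNode`, `makeNode`, and `nameOf` are hypothetical stand-ins for the tree-sitter node API and the project's helpers, not the actual CodeGraph code.

```typescript
// Minimal stand-in for a tree-sitter node. Hypothetical: the real API differs.
interface MiniNode {
  type: string;
  text: string;
  parent: MiniNode | null;
  fields: Record<string, MiniNode>;
}

// Builds a node and wires up parent pointers for its field children.
function makeNode(type: string, text: string, fields: Record<string, MiniNode> = {}): MiniNode {
  const node: MiniNode = { type, text, parent: null, fields };
  for (const key of Object.keys(fields)) {
    fields[key].parent = node;
  }
  return node;
}

// Parent-chain walk, mirroring the updated isExported().
function isExported(node: MiniNode): boolean {
  let current = node.parent;
  while (current) {
    if (current.type === "export_statement") return true;
    current = current.parent;
  }
  return false;
}

// Name lookup with the variable_declarator fallback from extractFunction().
function nameOf(node: MiniNode): string {
  const own: MiniNode | undefined = node.fields["name"];
  if (own) return own.text;
  const isFunctionValue = node.type === "arrow_function" || node.type === "function_expression";
  if (isFunctionValue && node.parent?.type === "variable_declarator") {
    const varName: MiniNode | undefined = node.parent.fields["name"];
    if (varName) return varName.text;
  }
  return "<anonymous>";
}

// Tree for: export const useAuth = () => {}
const arrow = makeNode("arrow_function", "() => {}");
const declarator = makeNode("variable_declarator", "", {
  name: makeNode("identifier", "useAuth"),
  value: arrow,
});
const lexical = makeNode("lexical_declaration", "", { declarator });
const exportStatement = makeNode("export_statement", "", { declaration: lexical });

// A bare arrow function with no parent, such as a detached callback.
const bare = makeNode("arrow_function", "() => {}");
```

On this tree, `isExported(arrow)` walks declarator, lexical declaration, then export statement and returns true, and `nameOf(arrow)` picks up `useAuth` from the declarator. The bare node is neither exported nor named.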

Commit 2: Type alias + exported variable extraction

typeAliasTypes: New field on LanguageExtractor interface. Populated for TypeScript (type_alias_declaration), Go (type_spec), Rust (type_item), C (type_definition), C++ (type_definition, alias_declaration), Swift (typealias_declaration), Kotlin (type_alias). Empty for languages without type aliases.

extractTypeAlias(): New method that creates type_alias kind nodes.

extractExportedVariables(): New method called when visiting export_statement nodes. Finds lexical_declaration > variable_declarator children whose values are NOT already handled by functionTypes (avoids duplicating arrow functions). Creates variable kind nodes for:

  • Zustand stores: export const useX = create(...)
  • XState machines: export const xMachine = createMachine(...)
  • Zod schemas: export const schema = z.object(...)
  • Config objects: export const config = { ... }
  • Constants: export const MAX = 3
  • Arrays: export const NAMES = [...] as const
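The de-duplication rule can be sketched as a filter over declarators. This is a simplified illustration, not the PR's implementation: `Declarator` is a hypothetical flattened view of `export_statement > lexical_declaration > variable_declarator`, and the value type strings other than `arrow_function` and `function_expression` are only examples.

```typescript
// Hypothetical flattened view of one exported variable_declarator.
interface Declarator {
  name: string;
  valueType: string; // AST node type of the initializer
}

// Value types already handled by function extraction; skipping them avoids duplicates.
const FUNCTION_TYPES = new Set(["arrow_function", "function_expression"]);

// Returns the names that would become `variable` kind nodes.
function exportedVariableNames(declarators: Declarator[]): string[] {
  return declarators
    .filter((d) => !FUNCTION_TYPES.has(d.valueType))
    .map((d) => d.name);
}

const sample: Declarator[] = [
  { name: "useAuth", valueType: "arrow_function" }, // handled as a function
  { name: "useStore", valueType: "call_expression" }, // create(...)
  { name: "config", valueType: "object" },
  { name: "MAX", valueType: "number" },
];

const variableNames = exportedVariableNames(sample);
```

Only `useStore`, `config`, and `MAX` survive the filter; `useAuth` is left to the function path.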

Tests

Added 17 new test cases in __tests__/extraction.test.ts:

Arrow function tests (6):

  • Exported arrow functions, function expressions, non-exported, anonymous, multiple exports, JavaScript files

Type alias tests (3):

  • Exported type aliases, non-exported, multiple in same file

Exported variable tests (8):

  • Zustand stores, object literals, arrays, primitives, Zod schemas, XState machines
  • No duplication with arrow functions
  • Non-exported const not treated as exported

All 215 tests pass (59 extraction + 156 others). Zero regressions in evaluation benchmarks.

AST Context

For export const useAuth = () => { ... }, tree-sitter produces:

export_statement
  lexical_declaration
    variable_declarator
      name: identifier "useAuth"
      value: arrow_function

For export const config = { ... }:

export_statement
  lexical_declaration
    variable_declarator
      name: identifier "config"
      value: object

The arrow_function/object nodes are 3 levels deep under export_statement, which is why the 10-char lookback failed and why parent-chain walking is needed.

Arrow functions and function expressions assigned to variables
(e.g. `export const useAuth = () => { ... }`) were not being indexed
because the arrow_function AST node has no `name` field — the name
lives on the parent variable_declarator node.

Additionally, `isExported()` for TypeScript and JavaScript extractors
only checked 10 characters back from the node's start position, which
missed `export` for deeply nested nodes like arrow functions inside
variable declarations inside export statements.

Changes:
- extractFunction(): When an arrow_function or function_expression
  resolves to '<anonymous>', look up the parent variable_declarator
  for the name before skipping.
- isExported() (TS + JS): Walk the parent chain to find an
  export_statement ancestor instead of substring matching.
- Add 6 test cases covering arrow function exports, function
  expression exports, non-exported arrow functions, anonymous
  arrow functions, multiple exports, and JavaScript files.

Tested on a real monorepo (238 files): node count increased from
779 to 958 (+23%), with 94 new nodes in packages/ that previously
had 0 coverage.

Extend extraction to index two additional categories of symbols
that were previously invisible:

1. Type aliases (e.g. `export type X = ...` in TypeScript,
   `type X` in Go, `type X = ...` in Rust, `typealias X` in Swift,
   `type_alias` in Kotlin). Adds `typeAliasTypes` to the
   LanguageExtractor interface with values for all 13 languages.

2. Exported variable declarations that aren't functions, including:
   - Zustand stores: `export const useX = create(...)`
   - XState machines: `export const xMachine = createMachine(...)`
   - Zod schemas: `export const schema = z.object(...)`
   - Config objects: `export const config = { ... }`
   - Constants: `export const MAX = 3`
   - Arrays: `export const NAMES = [...] as const`

   The extractExportedVariables() method is called when visiting
   export_statement nodes. It skips variable_declarator values that
   are already handled by functionTypes (arrow_function,
   function_expression) to avoid duplicate extraction.

Adds 11 new test cases (59 total extraction tests, 215 total).

Tested on production monorepo: nodes increased from 958 to 1,172
(+22%), with 109 new variable nodes and 105 new type_alias nodes.
Only 4 files remain at 0 nodes — all are re-export barrels or
ambient declaration files with no extractable symbols.
@colbymchenry colbymchenry merged commit 38fac1f into colbymchenry:main Feb 10, 2026
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
…olbymchenry#8)

Promotes the resolver numeric `metadata.confidence` to a categorical,
queryable column on `edges`. `codegraph_callers`, `_callees`,
`_impact` now annotate each row with `*(INFERRED)*` when the edge is
not concrete, and accept a new `minConfidence` arg to drop edges
below a chosen level — useful for refactor-impact analysis where
false-positive reach inflates the blast radius.

Migration 032 adds `edges.confidence TEXT` (nullable; read-cast in
EDGE_SCHEMA collapses NULL → `EXTRACTED`). Keeping it nullable means
legacy / structural / extractor-direct edges need no producer change.
The column is indexed for filter-down queries. Idempotent ADD COLUMN with
table-exists guard following the convention from migrations 020 /
021 / 023 / 030.

Producer: `ReferenceResolver.createEdges` calls a new free
`classifyConfidence(resolvedBy, score)` helper:
  - import / qualified-name / file-path / exact-match → EXTRACTED
  - framework with score >= 0.85 → EXTRACTED, else INFERRED
  - instance-method / fuzzy → INFERRED
AMBIGUOUS is not stamped from the resolver path in v1 — needs
tie-tracking from the candidate picker. Tracked as B3.
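The mapping above can be sketched as a small pure function. The rules and the 0.85 threshold come from this message; the exact string spellings of the `ResolvedBy` values and the function signature are assumptions.

```typescript
type Confidence = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

// Resolver strategy names as listed above; the exact spellings are assumed.
type ResolvedBy =
  | "import"
  | "qualified-name"
  | "file-path"
  | "exact-match"
  | "framework"
  | "instance-method"
  | "fuzzy";

const FRAMEWORK_THRESHOLD = 0.85;

// AMBIGUOUS is never returned here, matching the v1 note above.
function classifyConfidence(resolvedBy: ResolvedBy, score: number): Confidence {
  switch (resolvedBy) {
    case "import":
    case "qualified-name":
    case "file-path":
    case "exact-match":
      return "EXTRACTED";
    case "framework":
      return score >= FRAMEWORK_THRESHOLD ? "EXTRACTED" : "INFERRED";
    case "instance-method":
    case "fuzzy":
      return "INFERRED";
  }
}
```

Note that the score only matters for framework-resolved edges; a fuzzy match stays INFERRED even with a perfect score.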

Consumers (`result-formatters.ts`):
  - `formatConfidence(edge)` — markdown italic suffix (empty for
    EXTRACTED; `*(INFERRED)*` / `*(AMBIGUOUS)*` otherwise).
  - `parseMinConfidence(raw)` — returns level / null / errorResult.
  - `filterByConfidence(rows, min)` — drop below threshold.
  - `CONFIDENCE_RANK` — numeric ordering for inline filter loops.
Wired into all four code paths in `callers.ts` (collectCallers,
collectCallersForSource, collectTypeUsers, formatGroupedCallers),
both paths in `callees.ts`, and `impact.ts` with a
`filterImpactByConfidence` BFS that prunes nodes unreachable along
the surviving edge set.
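A rough sketch of how the consumer helpers might fit together. The helper names match the list above, but the signatures, the `EdgeRow` shape, and the rank ordering (AMBIGUOUS lowest, EXTRACTED highest) are assumptions made for illustration.

```typescript
type Confidence = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

// Assumed ordering: a higher rank means a more concrete edge.
const CONFIDENCE_RANK: Record<Confidence, number> = {
  AMBIGUOUS: 0,
  INFERRED: 1,
  EXTRACTED: 2,
};

// Hypothetical row shape; NULL in the column is read as EXTRACTED.
interface EdgeRow {
  source: string;
  confidence: Confidence | null;
}

function effectiveConfidence(row: EdgeRow): Confidence {
  return row.confidence ?? "EXTRACTED";
}

// Drops rows whose confidence ranks below the requested minimum.
function filterByConfidence(rows: EdgeRow[], min: Confidence): EdgeRow[] {
  return rows.filter((r) => CONFIDENCE_RANK[effectiveConfidence(r)] >= CONFIDENCE_RANK[min]);
}

// Markdown italic suffix; empty for EXTRACTED.
function formatConfidence(row: EdgeRow): string {
  const level = effectiveConfidence(row);
  return level === "EXTRACTED" ? "" : `*(${level})*`;
}

const rows: EdgeRow[] = [
  { source: "legacyCaller", confidence: null },
  { source: "fuzzyCaller", confidence: "INFERRED" },
  { source: "tiedCaller", confidence: "AMBIGUOUS" },
];

const kept = filterByConfidence(rows, "INFERRED").map((r) => r.source);
```

With `minConfidence` set to INFERRED, the legacy row (NULL, read as EXTRACTED) and the inferred row are kept and the ambiguous row is dropped.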

Type cleanup: `Edge.confidence` added with JSDoc; the long-dead
`Edge.provenance` field also got a JSDoc note pointing at the
SCIP-integration-reservation explanation in memory.

Reviewer-memo #4 catch (schema-version test forgetfulness): noticed
`__tests__/pr19-improvements.test.ts` was still asserting `toBe(29)`
even though migrations 030 and 031 had shipped. Catch-up to 32 in
this commit covers both the missed bumps and the new migration.

Reviewer: APPROVE, three info-only findings (type-user display gap,
framework-rule calibration table doc, catch of the recurring pr19
pattern).

Suite: 1555 / 34 / 0 (+18 new edge-confidence tests).
Eval: 11/11 passed | recall=1.00 | mrr=0.86 (within regression
budget vs 091e935 baseline).

Phase 3 done. Backlog now down to: #6 TOON (deferred), colbymchenry#11
trace_to_culprits, colbymchenry#17/colbymchenry#18 Phase 7, colbymchenry#19 streaming.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 4, 2026
…sites (colbymchenry#11)

Last Phase 1-6 backlog item. Replaces the "agent stitches
codegraph_callers + _history + _biomarkers manually for every stack
frame" loop with one tool: paste a stack trace, get a ranked list of
likely fix-site candidates with per-row "why suspected" reasons.

## Pipeline

1. `parseTrace(text)` — multi-format extractor: V8/Node, Python,
   Java/Kotlin/Scala, Go, Rust, plus a generic `path/to/file.ext:line`
   fallback. Caps at 200 frames; dedupes by `(file, line)`; normalises
   Windows backslashes; first-pattern-per-line wins.
2. `resolveTraceFile` — longest-suffix match between absolute trace
   paths and project-relative indexed paths, with a path-boundary
   check so `foo/bar.ts` doesn't match `sub-foo/bar.ts`.
3. `findEnclosingNode` (existing helper from index-hooks/enclosing.ts,
   shared with codegraph_grep) attributes each frame to its enclosing
   function/method/class via line number.
4. `scoreCulprits` composite: `1 / (topRank + 1)` for frame position
   (top of stack wins decisively), `+0.3` for risk-biomarker overlap
   (god_class / complex_method / large_method / nested_complexity),
   `+0.2` for recent file churn (touched within 30 days, with commit
   count surfaced in the reason).
5. Output: ranked candidates with per-row reasons + an "unmapped
   frames" footer (capped at 5 samples) so the agent sees what
   couldn't be attributed (`node_modules/`, generated files, etc.).
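Step 4's composite score can be sketched as follows. The weights, biomarker names, and 30-day window are taken from the list above; the `Candidate` shape, the function signature, and the reason strings are assumptions for illustration only.

```typescript
// Hypothetical per-symbol input after frames have been attributed.
interface Candidate {
  symbol: string;
  topRank: number; // 0 = top-most stack frame attributed to this symbol
  biomarkers: string[];
  lastTouchedTs: number | null; // unix seconds
  commitCount: number;
}

const RISK_BIOMARKERS = new Set(["god_class", "complex_method", "large_method", "nested_complexity"]);
const RECENT_WINDOW_SECONDS = 30 * 24 * 60 * 60;

function scoreCulprit(c: Candidate, nowSeconds: number): { score: number; reasons: string[] } {
  // Frame position: 1 for the top frame, 1/2 for the next, and so on.
  let score = 1 / (c.topRank + 1);
  const reasons = [`stack position ${c.topRank}`];

  const risky = c.biomarkers.filter((b) => RISK_BIOMARKERS.has(b));
  if (risky.length > 0) {
    score += 0.3;
    reasons.push(`risk biomarkers: ${risky.join(", ")}`);
  }

  if (c.lastTouchedTs !== null && c.lastTouchedTs >= nowSeconds - RECENT_WINDOW_SECONDS) {
    score += 0.2;
    reasons.push(`recent churn (${c.commitCount} commits)`);
  }
  return { score, reasons };
}

const now = 1800000000; // fixed "current time" in unix seconds for the example
const topFrame = scoreCulprit(
  { symbol: "handleRequest", topRank: 0, biomarkers: [], lastTouchedTs: null, commitCount: 0 },
  now,
);
const deepFrame = scoreCulprit(
  { symbol: "parseBody", topRank: 2, biomarkers: ["complex_method"], lastTouchedTs: now - 3600, commitCount: 4 },
  now,
);
```

In this example the top frame scores 1 with no extra signals, while the third frame with both boosts scores 1/3 + 0.3 + 0.2 and still ranks below it.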

## CLI mirror

`codegraph trace-to-culprits` reads from stdin (`cat error.log |
codegraph trace-to-culprits`) or from `--trace "..."`. Routes via
runViaMCP per the repo convention.

## Reviewer cycle

Round 1 returned BLOCK + REQUEST_CHANGES + info — all addressed:
  - **R1 (BLOCK, correctness)**: `recentCutoff` was computed in ms
    but `last_touched_ts` is unix seconds — the `>=` comparison
    always evaluated false, killing the churn signal silently on
    every project. Fixed with `Math.floor(Date.now() / 1000)` and a
    new R1-regression-guard test that stamps a fresh `last_touched_ts`
    via the same `applyChurnDeltas` helper the churn miner uses, then
    asserts the rendered output mentions "recent churn".
  - **R2 (REQUEST_CHANGES, correctness)**: inline `cg.queries.db
    .prepare(...)` violated the per-domain query-file convention and
    re-prepared the same statement per culprit. Switched to
    `getFileByPath` from queries-files.ts (cached statement, returns
    FileRecord with `lastTouchedTs` and `commitCount`).
  - **R3 (info, perf)**: `resolveTraceFile` ran `getAllFiles` once
    per frame — N full-table scans for an N-frame trace. Hoisted to
    once-per-call at the top of `buildCulprits`.
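The R1 unit mismatch is easy to reproduce in isolation. The sketch below reconstructs the shape of the bug from the description above; it is not the original code, and both function names are hypothetical.

```typescript
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// Reconstructed buggy shape: cutoff in milliseconds, timestamp in seconds.
function isRecentBuggy(lastTouchedTs: number, nowMs: number): boolean {
  const cutoffMs = nowMs - THIRTY_DAYS_SECONDS * 1000;
  return lastTouchedTs >= cutoffMs; // seconds vs milliseconds: effectively never true
}

// Fixed shape: convert "now" to unix seconds before comparing.
function isRecentFixed(lastTouchedTs: number, nowMs: number): boolean {
  const nowSeconds = Math.floor(nowMs / 1000);
  return lastTouchedTs >= nowSeconds - THIRTY_DAYS_SECONDS;
}

const nowMs = Date.UTC(2026, 4, 4); // fixed clock for the example
const touchedOneHourAgo = Math.floor(nowMs / 1000) - 3600;

const buggyResult = isRecentBuggy(touchedOneHourAgo, nowMs);
const fixedResult = isRecentFixed(touchedOneHourAgo, nowMs);
```

A file touched one hour ago is reported as stale by the buggy comparison and as recent by the fixed one, which is the behaviour the regression-guard test pins down.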

## Tests

13 trace tests in `__tests__/mcp-trace-to-culprits.test.ts`:
  - parseTrace: V8 / Python / Java / Go / dedup / 200-frame cap /
    no-frames / Windows backslash normalisation
  - End-to-end: V8 trace → enclosing symbols ranked top-of-stack
    first, unmapped-frames footer, no-refs error message, limit cap,
    R1-regression guard for the churn signal

CLI dogfood: `cat trace | npm run cli:dev -- trace-to-culprits`
parses a real production-shaped V8 trace and maps frames to
`src/mcp/tools/trace-to-culprits.ts` symbols.

Suite: **1595 / 34 / 0** (+13 trace tests).
Eval: 18/18 | recall=1.00 | mrr=0.91 | within budget (re-baselined
after colbymchenry#11 added new files to the corpus, which shifted the
explore-pipeline case rank — case still PASSes its 0.5 threshold,
the relative regression was corpus drift, not behaviour change).

## Backlog after this

Phase 1-6 complete. Remaining: Phase 7 (colbymchenry#17 propose_extract /
propose_rename, colbymchenry#18 plan-and-execute CLI — both need user check),
Phase 8 (colbymchenry#19 streaming MCP), and #6 TOON (deferred pending
measurement).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 4, 2026
Backlog #6 proposed adopting TOON (Token-Oriented Object Notation,
header-once / rows-as-tuples) as a smaller alternative to the
current markdown formatters for tabular MCP responses. The backlog
explicitly gated the decision: "Verify actual savings on captured
queries before flipping default — the 30-60% claim is for ideal
tabular data."

This commit ships the measurement and the answer.

## Result

Eight representative queries against this project's own .codegraph,
covering search (long signatures), suggest (short rows), callers
(20-row mixed-confidence list):

  sample                                  rows  md(B)  toon(B)  saving
  ──────────────────────────────────────  ────  ─────  ───────  ──────
  search "CodeGraph"                        10    683      661  +3.2%
  search "extractFromSource"                10   1483     1444  +2.6%
  search "compareToRef"                      3    461      468  -1.5%
  search "handleSearch"                      6    760      757  +0.4%
  search "parseTrace"                        2    291      298  -2.4%
  suggest "CodGrap"                         10    408      409  -0.2%
  suggest "extracFromSorce"                 10    386      387  -0.3%
  callers of extractFromSource              14    936     1037  -10.8%

  TOTAL: md=5408B  toon=5461B  aggregate saving -1.0%

## Why the 30-60% claim doesn't apply

TOON's win comes from compressing a verbose JSON baseline like
`[{"name":"foo","kind":"function","file":"a.ts","line":42},…]` —
the "headers repeated per row" + JSON quoting waste is what its
header-once shape removes.

But our markdown is ALREADY row-shaped:
  - `### foo (function)` ≈ `foo,function` (no quoting).
  - `a.ts:42` ≈ `a.ts,42` (single-char delimiter, both compact).
  - `- name (kind) - file:line` is shorter than the comma-tuple form
    when names are short.

There's no fat to trim. On callers (the densest row shape) TOON is
10% LARGER because the per-row `- ` bullet syntax is a one-byte
overhead while TOON's commas waste two chars between every field.
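A rough way to see the effect on a toy payload is sketched below. It is not the repo's `toon-measure.ts`: the TOON-like encoding is a simplified header-once, comma-separated form rather than a spec-conformant one, the markdown line shape is the flat `- name (kind) - file:line` form quoted above, and the rows are invented.

```typescript
interface Row {
  name: string;
  kind: string;
  file: string;
  line: number;
}

// Invented sample rows; all ASCII, so string length equals UTF-8 byte count.
const rows: Row[] = [
  { name: "foo", kind: "function", file: "a.ts", line: 42 },
  { name: "bar", kind: "method", file: "b.ts", line: 7 },
  { name: "Baz", kind: "class", file: "c.ts", line: 120 },
];

// Verbose JSON baseline: keys and quotes repeated for every row.
const asJson = JSON.stringify(rows);

// Simplified header-once encoding (TOON-like, not the actual TOON spec).
const asToonLike = [
  "name,kind,file,line",
  ...rows.map((r) => `${r.name},${r.kind},${r.file},${r.line}`),
].join("\n");

// Flat markdown rows in the shape quoted above.
const asMarkdown = rows.map((r) => `- ${r.name} (${r.kind}) - ${r.file}:${r.line}`).join("\n");

const sizes = {
  json: asJson.length,
  toonLike: asToonLike.length,
  markdown: asMarkdown.length,
};
```

Both row-shaped encodings come out far smaller than the JSON baseline. How they compare with each other depends on row count and field lengths, which is why the decision rests on measurements of real captured queries rather than a toy sample.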

## Decision

Skip TOON. Empirical answer matches the backlog's gate — the
savings aren't there for our markdown baseline. Adopting it would:
  - Add per-tool format selector code + tests.
  - Risk LLM-client misrender (TOON is new; some clients haven't
    trained on it).
  - Net zero to net negative on payload bytes.

The measurement script (`__tests__/evaluation/toon-measure.ts`)
stays in the repo as a permanent record + a re-runnable harness if
the markdown formatters ever get verbose enough to flip the math.
Run with `npx tsx __tests__/evaluation/toon-measure.ts`.

## Backlog

#6 closed. Phase 1-6 of the agentic backlog is now fully resolved
(every item shipped or empirically dismissed). Remaining: Phase 7
(colbymchenry#17 propose_extract / propose_rename, colbymchenry#18 plan-and-execute CLI),
Phase 8 (colbymchenry#19 streaming MCP) — all need user check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 4, 2026
Half of colbymchenry#17 (the rename half; extract was descoped per user check —
needs per-language scope analysis that doesn't pay back enough on a
multi-language index). Pure query wrapper that uses the
edge-confidence column from colbymchenry#8 + B8 to give the agent a
ready-to-execute rename plan.

## What it returns

Three sections, each with a count in its header:

1. **Definitions** — every node matching `symbol` (handles overloads
   / re-exports). The site(s) where the rename starts.

2. **Call sites (graph edges)** — every incoming edge from callers /
   type-users / instantiators. Grouped by `edge.confidence`:
     - **EXTRACTED** *concrete; safe to rename mechanically* —
       imports / qualified-name / file-path resolution. Apply with
       sed / Edit replace_all.
     - **INFERRED** *heuristic; glance before applying* —
       same-name match, fuzzy dispatch, framework rule below the
       0.85 confidence threshold.
     - **AMBIGUOUS** *ambiguous; verify the target before renaming*
       — picker had a near-tied runner-up (B8 tieMargin < 0.05).
   Sites are sorted EXTRACTED-first, so `limit` truncation drops
   the lowest-confidence rows last.

3. **Textual mentions** — `\bname\b` regex over the indexed file
   set, attributed to enclosing function / method / class /
   interface. Word-boundary so `id` doesn't match `userId`. Covers
   doc comments, JSDoc, and string literals: places where the symbol
   is mentioned outside the graph (no edge), so the agent can review
   them case by case (not every textual match is the same symbol).

De-dups: textual hits that ALSO appear as graph call sites are
filtered out — graph edges are the higher-quality signal, no
reason to surface the same line twice.
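The word-boundary scan and the de-dup against graph call sites can be sketched like this. It is a minimal illustration under assumptions: the `file:line` key format, the in-memory `files` map, and the function names are hypothetical, and attribution to the enclosing symbol is left out.

```typescript
interface Hit {
  file: string;
  line: number; // 1-based
  text: string;
}

// Escapes regex metacharacters so the symbol is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Scans file contents for whole-word mentions, skipping lines already known as graph call sites.
function findTextualMentions(
  symbol: string,
  files: Record<string, string>,
  graphSites: Set<string>, // assumed key format: "file:line"
): Hit[] {
  const pattern = new RegExp(`\\b${escapeRegExp(symbol)}\\b`);
  const hits: Hit[] = [];
  for (const file of Object.keys(files)) {
    files[file].split("\n").forEach((text, index) => {
      const line = index + 1;
      if (pattern.test(text) && !graphSites.has(`${file}:${line}`)) {
        hits.push({ file, line, text: text.trim() });
      }
    });
  }
  return hits;
}

const files: Record<string, string> = {
  "src/a.ts": "import { parse } from './p';\nparseTrace(input);\n",
  "docs/x.md": "Call parse before rendering.\n",
};
const graphSites = new Set(["src/a.ts:1"]); // the import line already has a graph edge

const mentions = findTextualMentions("parse", files, graphSites);
```

Here the import line is dropped because it is already a graph call site, `parseTrace` is not matched because of the word boundary, and only the documentation line is reported.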

## Validation (warnings, not blocks)

  - newName checked for valid-identifier shape (Unicode-letter +
    digit + underscore + $; rejects whitespace / hyphens).
  - newName checked for collision with existing indexed symbols
    (would shadow / break).

Both surface as `### Warnings` at the top of the report. The agent
decides whether to proceed.
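A minimal sketch of the two checks, assuming a regex-based shape test. The exact rule the tool applies is not shown in this message; rejecting a leading digit and the warning wording are assumptions.

```typescript
// Unicode letters, digits, underscore, and $; a leading digit is rejected (assumed).
const IDENTIFIER_SHAPE = /^[\p{L}_$][\p{L}\p{Nd}_$]*$/u;

// Returns warnings rather than throwing, so the caller can decide whether to proceed.
function validateNewName(newName: string, existingSymbols: Set<string>): string[] {
  const warnings: string[] = [];
  if (!IDENTIFIER_SHAPE.test(newName)) {
    warnings.push(`"${newName}" does not look like a valid identifier`);
  }
  if (existingSymbols.has(newName)) {
    warnings.push(`"${newName}" collides with an existing indexed symbol`);
  }
  return warnings;
}

const indexed = new Set(["extractFromSource", "extractFromCode"]);

const collision = validateNewName("extractFromCode", indexed);
const hyphenated = validateNewName("extract-from-code", indexed);
const clean = validateNewName("extractFromText", indexed);
```

The first call yields a collision warning, the second an identifier-shape warning, and the third no warnings at all.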

## Why a separate tool when codegraph_callers + codegraph_grep exist

Three reasons:

1. **Confidence-stamped grouping** — `codegraph_callers` returns a
   flat list with the confidence suffix per row. Rename planning
   needs the EXTRACTED rows isolated for mechanical-safe edits;
   ad-hoc partitioning by the agent is error-prone.

2. **Doc-mention attribution to enclosing symbol + dedup against
   call sites** — `codegraph_grep` doesn't dedup against the call
   graph. The agent doing this manually would re-render the same
   import line in both surfaces.

3. **Validation up front** — collision + identifier checks happen
   once, not per-call-site. Saves the agent a round trip.

## CLI mirror

`codegraph propose-rename <symbol> <newName>` routes via
runViaMCP, supports `--limit` and `--doc-limit`. Dogfooded against
the live repo: `extractFromSource → extractFromCode` surfaces
14 call sites (all EXTRACTED) + 29 textual mentions, plus a
collision warning on the existing `extractFromCode` if any.

## Tests

9 cases in `__tests__/mcp-propose-rename.test.ts`:
  - Three-section render with header counts
  - EXTRACTED-first grouping
  - Collision warning fires
  - Invalid-identifier warning fires
  - Same-name rename → errorResult
  - Unknown-symbol → notFound
  - docLimit=0 skips textual scan + section
  - Textual mentions don't double-count graph call sites
  - limit caps the call-sites section

Suite: **1608 / 34 / 0** (+9 propose-rename tests).

## Backlog after this

#17a (this) shipped. #17b (propose_extract) deferred — needs per-
language scope analysis (free-variable detection, parameter
inference). User-checked decision: not worth the cross-language
risk. colbymchenry#18 (plan-and-execute CLI) deferred — the macro infra from
colbymchenry#13 already covers the structural part; the LLM-planning piece
is arguably the calling agent's job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 8, 2026
…y#18, colbymchenry#19)

Two friction items uncovered in the round-2 stress test, both
about the boundary between long-running MCP servers and ad-hoc
fresh-code CLI processes acting on the same per-project DB.

## colbymchenry#19 — cli:dev silently migrated queried projects (severe)

Reproduced live: a single `cli:dev at-range ... -p ollama-source`
ran forward migrations 33→35 on ollama silently. The running MCP
server (started at v32 binary) then couldn't access ollama either
— the schema-guard returned "stale code, restart" on every tool
call. **A read-style CLI invocation effectively bricked MCP
access to the queried project.**

Fix: `OpenOptions.autoMigrate` opt-in (default false). When the
on-disk schema is older than the binary's CURRENT_SCHEMA_VERSION
and autoMigrate is unset, `DatabaseConnection.open` throws with
explicit recovery commands (`admin migrate` / `admin sync` /
`admin index --force`) AND the post-migrate "restart your MCP
server" caveat. Newer-than-binary DBs always fail (B4 silent-
corruption path).

Write entry points opt in to autoMigrate=true — admin
sync/index/migrate, summarize, embed, classify, the MCP server's
default-project open, the cross-project cache, and `runViaMCP`'s
shim (which derives autoMigrate from the tool's `isWriteTool`
flag, automatically gating per-tool without per-command edits).

New `codegraph_admin({action: 'migrate'})` MCP action +
`codegraph admin migrate` CLI command: cheapest recovery path.
The CLI does a two-phase open (default → autoMigrate=true on
throw) so it can distinguish "already current" from "migrated
this run".
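The gating policy and the two-phase open can be sketched without a real database. Everything below is a simplified model under assumptions: `openDatabase` and `adminMigrate` are hypothetical stand-ins for `DatabaseConnection.open` and the CLI command, and the version constant and messages are illustrative.

```typescript
const CURRENT_SCHEMA_VERSION = 35; // illustrative value

interface OpenOptions {
  autoMigrate?: boolean;
}

interface OpenResult {
  version: number;
  migrated: boolean;
}

// Models the open-time schema check; no real migrations are run here.
function openDatabase(onDiskVersion: number, options: OpenOptions = {}): OpenResult {
  if (onDiskVersion > CURRENT_SCHEMA_VERSION) {
    // Newer-than-binary databases always fail, regardless of autoMigrate.
    throw new Error(`schema v${onDiskVersion} is newer than this binary (v${CURRENT_SCHEMA_VERSION})`);
  }
  if (onDiskVersion < CURRENT_SCHEMA_VERSION) {
    if (!options.autoMigrate) {
      throw new Error(
        `schema v${onDiskVersion} is older than v${CURRENT_SCHEMA_VERSION}; ` +
          `run "codegraph admin migrate", then restart your MCP server`,
      );
    }
    return { version: CURRENT_SCHEMA_VERSION, migrated: true };
  }
  return { version: onDiskVersion, migrated: false };
}

// Two-phase open: try read-only semantics first, opt in to migration only on failure.
function adminMigrate(onDiskVersion: number): string {
  try {
    openDatabase(onDiskVersion);
    return "already current";
  } catch {
    const result = openDatabase(onDiskVersion, { autoMigrate: true });
    return `migrated to v${result.version}`;
  }
}
```

A read-style caller that passes no options gets an error with recovery steps instead of a silent migration, while the migrate command can still tell "already current" apart from "migrated this run".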

## colbymchenry#18 — MCP server's tool registry frozen at startup

Repro: a `codegraph_at_range` call (added in commit 50a9ebf,
after the running server's start) returned `Error: No such tool
available`. ES modules cache; can't hot-reload without restart.

Fix: cannot make new tools live without restart. Added VISIBILITY
in `codegraph_status` — a new "Tool registry drift" section
fires when the on-disk count of `_TOOL: ToolModule`-exporting
files exceeds the loaded count. Content-sniff filter (regex on
file body) keeps the count accurate as new helper modules land
without touching the status code. Suppressed when in sync.
Status keeps `bypassSchemaGuard: true` so this signal stays
reachable when other tools are blocked — exactly when the agent
needs it most.
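The drift check can be sketched as a content sniff over file bodies. The regex below is an assumed concrete form of the `export const X_TOOL: ToolModule` pattern; the in-memory `files` map, function names, and message wording are hypothetical.

```typescript
// Assumed concrete form of the pattern written as `export const X_TOOL: ToolModule`.
const TOOL_EXPORT = /export const \w+_TOOL: ToolModule/;

// Counts files whose body declares a tool module, ignoring helper modules.
function countToolFiles(files: Record<string, string>): number {
  return Object.keys(files).filter((path) => TOOL_EXPORT.test(files[path])).length;
}

// Returns a drift message only when more tool files exist on disk than were loaded.
function registryDrift(files: Record<string, string>, loadedCount: number): string | null {
  const onDisk = countToolFiles(files);
  return onDisk > loadedCount
    ? `Tool registry drift: ${onDisk} tool files on disk, ${loadedCount} loaded; restart the MCP server`
    : null;
}

const toolDir: Record<string, string> = {
  "callers.ts": "export const CALLERS_TOOL: ToolModule = { name: 'codegraph_callers' };",
  "at-range.ts": "export const AT_RANGE_TOOL: ToolModule = { name: 'codegraph_at_range' };",
  "result-formatters.ts": "export function formatConfidence(): string { return ''; }",
};

const staleServer = registryDrift(toolDir, 1); // server started before at-range.ts existed
const freshServer = registryDrift(toolDir, 2);
```

The helper file is ignored because it has no `_TOOL` export, so the stale server sees a drift message while the in-sync server gets `null`.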

## Reviewer findings (3 of 3 addressed before commit)

- `isToolFile` was missing 5 helper files (env-refs, explore-
  budget, result-formatters, sql-refs, symbol-resolver) that
  passed the filename filter but didn't have `_TOOL` exports —
  permanent false-positive drift warning. Switched to content
  sniff via the regex `/export const X_TOOL: ToolModule/`.
- `retryInitIfNeeded` at src/mcp/index.ts:480 was missing
  `autoMigrate: true`, so a stale-schema retry would silently
  fail and leave the server permanently without a default project.
  Now matches `tryInitializeDefault`'s policy.
- `handleMigrate`'s "Migrations applied" message was misleading
  for already-current DBs (the project-cache opens with
  autoMigrate=true, so by handler time migrations are already
  done). Reworked to state the resulting version neutrally with
  an "already current / behind by N" qualifier.

Suite 1729/34/0 (+1 gating test in foundation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 9, 2026
…ry#18)

Closes friction colbymchenry#18 — unused_export FPs caused by the resolved-then-
pruned-from-unresolved_refs lifecycle in incremental sync.

ROOT CAUSE (verified live via codegraph_sql against this repo's
.codegraph/index.db on `2b057d9`):

When file F is re-extracted on incremental sync:
  1. F's nodes are deleted; cascade DELETE wipes edges where
     target was in F.
  2. F's new nodes are inserted with fresh IDs (generateNodeId
     hashes start_line, so a line shift rotates the ID).
  3. Pass A re-resolves unresolved_refs FROM F.
  4. Pass B re-resolves refs IN OTHER FILES naming a symbol
     DEFINED in F.

  GAP: Pass B finds nothing because the original unresolved_refs
  from those other files were already deleted on first-pass
  resolveAndPersist (resolution/index.ts:718). The cascaded edges
  have no record left to re-resolve.

  Result on this repo: src/mcp/tools/registry.ts (last edited
  yesterday) had 35/44 tool-import references edges intact, but 9
  missing — exactly the files I edited TODAY in the colbymchenry#63 sweep.
  unused_export biomarker fired on those 9 (HOTSPOTS_TOOL /
  NODE_TOOL / AT_RANGE_TOOL / ASK_TOOL / BLAME_TOOL / DEPS_TOOL /
  HISTORY_TOOL / SQL_TOOL / TESTS_FOR_TOOL).

THE FIX:

`reconstructCrossFileRefsToFile(qb, filePath)` reverse-engineers
unresolved_refs from existing cross-file edges that target nodes
in `filePath`, BEFORE the cascade-delete wipes them. Filters on
RESOLVABLE_EDGE_KINDS = [calls, references, type_of, returns,
instantiates, extends, implements, overrides, field_access,
decorates] — the kinds that come from unresolved_refs. INNER JOIN
on n_source guarantees the FK constraint on
unresolved_refs.from_node_id holds.
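The reconstruction step can be modelled over in-memory arrays. This is a sketch under assumptions, not the SQL the commit runs: the record shapes and the array-based signature are hypothetical, and `contains` is used only as an example of an edge kind outside the resolvable set.

```typescript
interface GraphNode { id: string; name: string; filePath: string; }
interface GraphEdge { sourceId: string; targetId: string; kind: string; line: number; }
interface UnresolvedRef { fromNodeId: string; name: string; kind: string; line: number; }

const RESOLVABLE_EDGE_KINDS = new Set([
  "calls", "references", "type_of", "returns", "instantiates",
  "extends", "implements", "overrides", "field_access", "decorates",
]);

// Rebuilds unresolved refs from cross-file edges that target nodes in `filePath`.
// Intended to run before the file's nodes (and their edges) are deleted.
function reconstructCrossFileRefsToFile(
  nodes: GraphNode[],
  edges: GraphEdge[],
  filePath: string,
): UnresolvedRef[] {
  const byId = new Map<string, GraphNode>();
  nodes.forEach((n) => byId.set(n.id, n));

  const refs: UnresolvedRef[] = [];
  edges.forEach((e) => {
    const source = byId.get(e.sourceId);
    const target = byId.get(e.targetId);
    if (!source || !target) return; // like an INNER JOIN: both endpoints must exist
    if (target.filePath !== filePath) return; // only edges pointing into the file
    if (source.filePath === filePath) return; // same-file edges are re-extracted anyway
    if (!RESOLVABLE_EDGE_KINDS.has(e.kind)) return;
    refs.push({ fromNodeId: source.id, name: target.name, kind: e.kind, line: e.line });
  });
  return refs;
}

const nodes: GraphNode[] = [
  { id: "t1", name: "TARGET_SYMBOL", filePath: "target.ts" },
  { id: "t2", name: "helper", filePath: "target.ts" },
  { id: "c1", name: "consumer", filePath: "consumer.ts" },
];
const edges: GraphEdge[] = [
  { sourceId: "c1", targetId: "t1", kind: "references", line: 1 }, // kept
  { sourceId: "t2", targetId: "t1", kind: "calls", line: 9 }, // same file: skipped
  { sourceId: "c1", targetId: "t1", kind: "contains", line: 1 }, // not resolvable: skipped
];

const rebuilt = reconstructCrossFileRefsToFile(nodes, edges, "target.ts");
```

Only the cross-file `references` edge is turned back into an unresolved ref, keyed by the consumer's node ID and the target's name, so it can be re-bound to whatever new ID `TARGET_SYMBOL` receives after re-extraction.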

Hooked at 3 sites in extraction/index.ts:
  - eoCommitFileResult (modified-file path, already in qbTransaction)
  - eoApplySyncedGitChanges (git-deleted path, now wrapped)
  - eoApplySyncedFullScan (non-git fallback deleted path, now
    wrapped)

LIMITATIONS:
  - Reconstructed refs always have siteCount=1 and no extraLines
    (edges only carry the primary line/col). Resolution succeeds
    (only fromNodeId+name+kind needed for rebinding); site-count
    biomarker fidelity is silently reduced after a re-extraction
    round-trip. Acceptable trade — fixes 17+ FPs at the cost of
    minor metadata.

REGRESSION TEST:
  - __tests__/sync.test.ts: target.ts exports TARGET_SYMBOL,
    consumer.ts imports it. Modify target.ts so the symbol shifts
    to a new line (changes its node ID via generateNodeId hash).
    Verify post-sync the references edge points at the NEW node.
    Test asserts targetBefore.id !== targetAfter.id to guard the
    premise.

VERIFICATION:
  - npm run typecheck: clean
  - npm test: 2014 / 0 / 34 (was 2013; +1 regression test)
  - EXPLAIN QUERY PLAN: idx_nodes_file_line + idx_edges_target_kind
    (O(log N) per file)

Reviewer round 1 REQUEST_CHANGES on 4 issues (test premise guard,
JSDoc limitations, transaction atomicity, FK safety); round 2
APPROVE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>