feat: File nodes, arrow functions, parallel I/O #27
Merged
- Create file-kind nodes for each parsed source file
- Add isInsideClassLikeNode() for method vs function detection
- Extract arrow functions and function expressions from variable declarators
- Batch file I/O with FILE_IO_BATCH_SIZE=10 using Promise.all
- Add symlink cycle detection with visitedDirs Set in scanDirectory
- Add lazy grammar loading with exported getGrammar() function
- Add indexFileWithContent() for pre-read content processing
- Add tests for file nodes and arrow function extraction
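The batched I/O item can be illustrated with a small helper that reads files in fixed-size groups, so that at most `FILE_IO_BATCH_SIZE` reads are in flight at once. This is a minimal sketch, not the PR's code: `readInBatches` and the injected `read` function are hypothetical names, chosen so the batching logic stays independent of any particular filesystem call.

```typescript
// Batch size named in the commit message.
const FILE_IO_BATCH_SIZE = 10;

// Hypothetical reader type; in practice this could wrap fs.promises.readFile.
type Reader = (path: string) => Promise<string>;

// Reads `paths` in consecutive batches. Promise.all starts every read in a
// batch concurrently and waits for all of them before the next batch begins,
// so concurrency never exceeds `batchSize`.
async function readInBatches(
  paths: string[],
  read: Reader,
  batchSize: number = FILE_IO_BATCH_SIZE,
): Promise<Map<string, string>> {
  const contents = new Map<string, string>();
  for (let i = 0; i < paths.length; i += batchSize) {
    const batch = paths.slice(i, i + batchSize);
    const results = await Promise.all(batch.map((p) => read(p)));
    batch.forEach((p, j) => contents.set(p, results[j]));
  }
  return contents;
}
```

One design consequence: each batch waits for its slowest file, and a single rejected read rejects the whole batch. A worker pool would keep ten reads in flight continuously, at the cost of more bookkeeping.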
Keeps the PR's visitedDirs rename and main's gitIgnoredDirs addition.
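The visitedDirs cycle check can be sketched with Node's synchronous `fs` API: resolve each directory to its real path with `fs.realpathSync` and skip any real path already seen, so a symlink pointing back up the tree cannot recurse forever. This is a minimal sketch, not the PR's scanDirectory; its real signature, the gitIgnoredDirs handling, and any source-extension filtering are left out.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// stat that follows symlinks; returns null for dangling links or
// permission errors instead of throwing.
function statOrNull(p: string): fs.Stats | null {
  try {
    return fs.statSync(p);
  } catch {
    return null;
  }
}

// Recursively collects regular files under `dir`. Each directory is keyed by
// its resolved real path, so reaching the same directory twice (for example
// through a symlink cycle) returns immediately.
function scanDirectory(
  dir: string,
  visitedDirs: Set<string> = new Set(),
  files: string[] = [],
): string[] {
  const realDir = fs.realpathSync(dir);
  if (visitedDirs.has(realDir)) return files;
  visitedDirs.add(realDir);

  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    const stat = statOrNull(full);
    if (stat === null) continue;
    if (stat.isDirectory()) {
      scanDirectory(full, visitedDirs, files);
    } else if (stat.isFile()) {
      files.push(full);
    }
  }
  return files;
}
```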
- Remove extractFunctionVariable() and its dispatch (already handled by extractVariable)
- Remove dead getGrammar() export (zero callers)
- Deduplicate indexFile by delegating to indexFileWithContent
- Remove redundant arrow function variable extraction tests (covered by existing suite)
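The method-vs-function detection from the first commit (isInsideClassLikeNode()) can be sketched as a walk up the parent chain. This is an illustration under stated assumptions: `SyntaxNodeLike` is a minimal stand-in for a tree-sitter node (real nodes also expose `type` and `parent`), and both type sets are hypothetical, using tree-sitter-typescript-style names; the PR's actual list is not shown. Stopping at the nearest function boundary keeps a function declared inside a method body classified as a function rather than a method.

```typescript
// Minimal stand-in for a tree-sitter syntax node.
interface SyntaxNodeLike {
  type: string;
  parent: SyntaxNodeLike | null;
}

// Hypothetical class-like container types.
const CLASS_LIKE_TYPES = new Set([
  "class_declaration",
  "abstract_class_declaration",
  "class",
  "class_body",
  "interface_declaration",
]);

// Hypothetical function boundaries: reaching one of these first means the
// node is nested inside a function body, not directly in a class.
const FUNCTION_BOUNDARY_TYPES = new Set([
  "method_definition",
  "function_declaration",
  "function_expression",
  "arrow_function",
]);

function isInsideClassLikeNode(node: SyntaxNodeLike): boolean {
  for (let cur = node.parent; cur !== null; cur = cur.parent) {
    if (CLASS_LIKE_TYPES.has(cur.type)) return true;
    if (FUNCTION_BOUNDARY_TYPES.has(cur.type)) return false;
  }
  return false;
}
```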
Owner
@MO2k4 Thanks for the contribution! Great additions with file nodes. I pushed a cleanup commit (
Everything else from your PR is kept as-is: file node creation,
andreinknv added a commit to andreinknv/codegraph that referenced this pull request on May 3, 2026
User-driven backlog cleanup before next session:

- **colbymchenry#27 GraphQL `extend type`: verified NOT implemented.** I had briefly thought this was done; a double-check found that `src/extraction/graphql-extractor.ts:131` explicitly skips `type_system_extension` in v1 with a "needs a second resolution pass we don't do yet" comment. No fixture coverage. The backlog entry is annotated with the verification result so a future session doesn't repeat the same mis-recall.
- **colbymchenry#25 build-snapshot:** annotated with the 2026-05-03 triage numbers (~50ms upper-bound win on a ~261ms cold start; ESM + native-module fragility) but kept on the backlog per user. Worth picking up later, once TS 7 + ESM build-snapshot tooling matures.

Also checks in `__tests__/evaluation/baseline-self.json`, the self-eval baseline captured before B colbymchenry#19 (ranking arc) shipped and referenced by the runner's `--compare` flag. Without it in-repo, every fresh checkout would have to regenerate the baseline before gating ranking changes against it. The baseline is the **pre-improvement** snapshot, so its mean recall (0.79) is the floor every future ranking change must clear or stay flat against. Bump the file deliberately when a verified improvement should become the new floor; never overwrite it silently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
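The floor rule for the baseline reduces to comparing mean recall. A minimal sketch, assuming a hypothetical per-query `{ query, recall }` record; the actual `baseline-self.json` schema and the runner's `--compare` logic are not described here. The small epsilon lets a change that keeps recall flat pass despite floating-point noise.

```typescript
// Hypothetical per-query evaluation record.
interface QueryResult {
  query: string;
  recall: number;
}

function meanRecall(results: QueryResult[]): number {
  if (results.length === 0) return 0;
  return results.reduce((sum, r) => sum + r.recall, 0) / results.length;
}

// A ranking change passes if its mean recall clears or matches the baseline.
function passesBaselineGate(
  baseline: QueryResult[],
  candidate: QueryResult[],
  epsilon = 1e-9,
): boolean {
  return meanRecall(candidate) >= meanRecall(baseline) - epsilon;
}
```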
andreinknv added a commit to andreinknv/codegraph that referenced this pull request on May 3, 2026
Pre-PR, `graphql-extractor.ts` explicitly skipped `type_system_extension`
AST nodes ("intentionally skipped in v1 — merging extensions
across files needs a second resolution pass we don't do yet"),
so federation-style `extend type User { posts: [Post] }` produced
zero nodes. Post-PR, each extension emits a separate node carrying
the new fields/values plus an `extends` UnresolvedReference
targeting the base type — cross-file merging reconstructible by
walking the resolver-promoted edges.
Mapping
- `extend type X { … }` → class + extends ref
- `extend interface X { … }` → interface + extends ref
- `extend input X { … }` → class + extends ref
- `extend enum X { … }` → enum + extends ref + new enum_members
- `extend union X = …` → type_alias + extends ref + new union refs
- `extend scalar X` → unsupported by tree-sitter-graphql 0.1.0 (parses as ERROR); defensive scaffold kept for a future grammar bump
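The mapping above can be read as a lookup table from the extension keyword to the emitted node kind, with every emitted node carrying an `extends` reference to its base type. A minimal sketch: `emitExtension`, `ExtensionNodeLike`, and the reference shape are hypothetical stand-ins (the extractor actually has six emit*Extension methods), and emission of new fields, enum members, and union refs is omitted.

```typescript
// Node kinds named in the mapping.
type NodeKind = "class" | "interface" | "enum" | "type_alias";

// Keyword following `extend` in each extension form.
type ExtensionKeyword = "type" | "interface" | "input" | "enum" | "union" | "scalar";

// null marks a form the current grammar cannot parse (`extend scalar`).
const EXTENSION_NODE_KIND: Record<ExtensionKeyword, NodeKind | null> = {
  type: "class",
  interface: "interface",
  input: "class",
  enum: "enum",
  union: "type_alias",
  scalar: null,
};

interface ExtendsRefLike {
  kind: "extends";
  targetName: string;
}

interface ExtensionNodeLike {
  kind: NodeKind;
  name: string;
  line: number;
  refs: ExtendsRefLike[];
}

// Emits one node per extension; its `extends` ref names the base type and is
// left unresolved for the resolver to turn into an edge later.
function emitExtension(keyword: ExtensionKeyword, name: string, line: number): ExtensionNodeLike | null {
  const kind = EXTENSION_NODE_KIND[keyword];
  if (kind === null) return null;
  return { kind, name, line, refs: [{ kind: "extends", targetName: name }] };
}
```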
Per-line node-id derivation makes multi-extension cases distinct
(`extend type User` at L5 and L20 both produce nodes named `User`
of kind `class` with separate ids). Cross-file: filePath in the
id-hash makes them unique by source location. Fields / enum
values / union members go under the extension node, preserving
"this field came from this extender" provenance.
Known same-file edge case
If a base definition and its extension live in the SAME file, the
existing `findBestMatch` line-proximity may pick the extension's
own node (distance 0) over the base definition (distance > 0),
producing a self-referential extends edge. Federation patterns
put base + extension in different files, which is what this
targets. Documented in `pushExtendsRef` JSDoc as a future
resolver-pass filter target.
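The per-line id derivation and the proposed resolver filter fit together: because each extension gets its own id, a resolver can exclude the referencing node itself by id. A minimal sketch under stated assumptions: `nodeId`, `resolveExtendsRef`, and both record shapes are hypothetical, the hash inputs (file path, kind, name, line) are inferred from the description, and line proximity is approximated as same-file line distance with other files ranked last. This illustrates the future filter target, not current behavior.

```typescript
import { createHash } from "node:crypto";

// Hypothetical id scheme: hashing the file path and start line keeps two
// `extend type User` blocks (or the same name in two files) distinct.
function nodeId(filePath: string, kind: string, name: string, line: number): string {
  return createHash("sha256")
    .update([filePath, kind, name, String(line)].join("\u0000"))
    .digest("hex")
    .slice(0, 16);
}

interface CandidateNode {
  id: string;
  name: string;
  filePath: string;
  line: number;
}

interface ExtendsRef {
  sourceNodeId: string; // id of the extension node that emitted the ref
  targetName: string;
  filePath: string;
  line: number;
}

// Line-proximity match with the proposed self-filter: the node that emitted
// the reference is never eligible, so an extension cannot resolve to itself.
function resolveExtendsRef(ref: ExtendsRef, nodes: CandidateNode[]): CandidateNode | undefined {
  const distance = (c: CandidateNode) =>
    c.filePath === ref.filePath ? Math.abs(c.line - ref.line) : Number.POSITIVE_INFINITY;
  let best: CandidateNode | undefined;
  for (const c of nodes) {
    if (c.name !== ref.targetName || c.id === ref.sourceNodeId) continue;
    if (best === undefined || distance(c) < distance(best)) best = c;
  }
  return best;
}
```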
Files
- src/extraction/graphql-extractor.ts: visitDefinition routes to
the new `visitTypeSystemExtension` dispatcher; 6 emit*Extension
methods reuse `emitFieldsOf` and the new `pushExtendsRef`
helper. Class-level docstring mapping table updated to cover
the extension forms (caught by the reviewer under memo scrutiny-area #1).
- __tests__/graphql-extend-type.test.ts: 3 new cases (5 kinds
end-to-end, signature distinction, type_of refs).
- __tests__/extraction.test.ts: one existing test flipped from
"extend type silently produces zero nodes (v1 out-of-scope)" to
"extension node + extends ref emitted".
- docs/test-beds/graphql/fixture.graphql: full schema fixture
covering definitions and all 5 supported extension forms;
auto-discovered by the language-coverage harness.
Verification
- npm run typecheck (tsgo) — clean.
- npx vitest run — 1374 / 34 / 0 (was 1371; +3 new + 1 flipped).
- E2E probe on a multi-kind extend fixture: 5 extension nodes,
5 extends refs, fields under the right parent, 0 errors.
Reviewer pass — eighth memo-load-bearing review this session:
- Class-level mapping table missing the extension rows (memo
scrutiny-area #1 docstring rotting). Added.
- Same-file self-resolve edge case noted as future resolver
filter target.
- emitScalarExtension's unreachable status confirmed adequate
per its existing JSDoc (memo scrutiny-area #7 doesn't apply
to private methods).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request on May 9, 2026
…polish items

Eight friction-tracker items addressed in parallel by sub-agents (2 Haiku, 1 Sonnet). The reviewer caught one real correctness edge case (bucket overlap on degenerate fresh-index shapes) plus two info items, all addressed in this commit.

## colbymchenry#21: at_range cost-benefit JSDoc

Doc-only update to src/mcp/tools/at-range.ts. The tool description and JSDoc now state "pays off most on dense files (100+ symbols) and multi-range bulk lookups; for tiny preview fetches on small files, raw `head -N` is comparable." No code change.

## colbymchenry#25: blame surfaces rename detection inline

src/git-utils.ts gains a new helper, `getFileFollowEarliestTs`, that runs `git log --follow --format=%aI -- <path>` (5 s timeout, ISO timestamp). src/mcp/tools/blame.ts compares the rename-aware oldest commit against the oldest entry in the line-range-only timeline. When `--follow` reaches further back, it appends a warning that the timeline was truncated at the file's rename and points at `git log --follow <file>` for the full history. Edge cases handled: not-a-git-repo, timeout, empty timeline.

The test approach uses `vi.spyOn` to mock pre-rename history because real fixtures are unreliable: modern git's `git log -L` follows renames via content-similarity tracking, making a deterministic black-box rename fixture impossible.

## colbymchenry#26: hotspots split into 3 mutually exclusive categories

src/db/queries-history.ts gains `getCategorizedHotspots`, and src/mcp/tools/hotspots.ts gains a `category: 'risk' | 'maintenance' | 'brittle' | 'all'` arg (default 'risk' for backward compat). Thresholds use the 75th/25th percentiles rather than hardcoded magic numbers, so they adapt as the project grows.
Buckets:

- risk: high centrality AND high churn (where bugs hide)
- maintenance: high churn AND not-high centrality (refactor target)
- brittle: high centrality AND not-high churn (stable critical)

Reviewer-caught correctness bug: the original filters used `<= low` for the secondary axis, which collapsed buckets when high == low (a fresh index where centrality is uniformly zero, or repos where every file has identical churn). A file at the threshold could appear in both risk AND maintenance simultaneously. Fixed by switching maintenance and brittle to `< highThreshold`, making them strictly disjoint even on degenerate inputs.

Also added a more-hint when any section hits the per-category cap (the existing `category='risk'` path already had this; `category='all'` now mirrors it). The new `__tests__/hotspots.test.ts` (4 cases) covers all-section rendering, single-category dispatch, and the backward-compat default path.

## colbymchenry#27: search centrality:high differentiates "hook hasn't run" vs "no node met the threshold"

src/mcp/tools/search.ts. `probeCentralityFilterCulprit` now runs a sub-millisecond probe, `SELECT 1 FROM nodes WHERE centrality IS NOT NULL LIMIT 1`, which uses the existing `idx_nodes_centrality` index. When ALL nodes have NULL centrality, the agent gets the existing "centrality hook hasn't run — run codegraph index" hint. When SOME nodes have centrality but none cleared the filter, a different hint suggests relaxing the threshold. Two-case hint instead of one.

## colbymchenry#28: search exact promotes multi-token-query warning to pre-result

src/mcp/tools/search.ts. `buildConceptHintIfNeeded` now returns `{ preResult, postResult }` instead of a single string. When the query splits into 2+ space-separated non-qualified tokens (likely "multiple symbol names"), the agent gets a leading hint to call search per name OR use codegraph_explore, placed BEFORE the result list rather than buried after it.
Field-qualified tokens (`kind:function lang:typescript`) and single-free-token queries are unchanged.

## colbymchenry#33: callers on "constructor" with no callers explains the instantiates-edge model

src/mcp/tools/callers.ts. When the resolved symbol is `kind=method && name=constructor` AND the callers list is empty, it appends a one-line note: "constructors are invoked via `new ClassName(...)`, which graph-edges as `instantiates` on the parent class. To find construction sites, run codegraph_callers on the enclosing class instead of 'constructor'." Both the multi-match and single-match paths got the note (guarded by the same kind+name+empty check). Constructors WITH callers (e.g. via super()) render normally, so there is no false positive.

## colbymchenry#35: node.symbol tie-break prefers non-fixture, then centrality

src/mcp/tools/symbol-resolver.ts. `pickFromMultipleExactMatches` now filters out fixture paths first (falling back to the fixture matches when those are all that match), then sorts by centrality DESC (NULL → 0). A `helper` symbol that exists in both `src/core.ts` and `docs/test-beds/fixture.ts` resolves to `src/core.ts` as the displayed primary. Tier #3 (last_touched_ts) is deferred because that data isn't in the resolver's existing query.

Reviewer-caught DRY issue: the fixture-path regex set was duplicated between symbol-resolver.ts and dead-code.ts (introduced by parallel sub-agents working from the same brief). Extracted to `isFixturePath` in src/mcp/tools/shared.ts; both consumers now import the single source.

## colbymchenry#49: getSummaryCoverage denominator threading (3 call sites)

src/bin/codegraph.ts (lines 348, 1461) + src/mcp/tools/status.ts (line 440). All three pass `SUMMARIZABLE_KINDS` to getSummaryCoverage to match the canonical pattern from the previously fixed _search-intent.ts:218. Without this, the helper falls back to COUNT(*), which inflates the denominator with parameters / imports / file nodes; its own JSDoc explicitly warns against this.
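The #35 tie-break order (drop fixtures unless nothing else matches, then highest centrality first with NULL treated as 0) can be sketched as a filter followed by a sort. `rankExactMatches`, the `SymbolMatch` shape, and the fixture regexes are hypothetical; the actual `isFixturePath` patterns in src/mcp/tools/shared.ts are not listed in the description.

```typescript
// Hypothetical fixture-path patterns.
const FIXTURE_PATTERNS: RegExp[] = [
  /(^|\/)docs\/test-beds\//,
  /(^|\/)__fixtures__\//,
  /(^|\/)fixtures?\//,
];

function isFixturePath(filePath: string): boolean {
  return FIXTURE_PATTERNS.some((re) => re.test(filePath));
}

interface SymbolMatch {
  name: string;
  filePath: string;
  centrality: number | null;
}

// Returns matches in display order; element 0 is the primary.
function rankExactMatches(matches: SymbolMatch[]): SymbolMatch[] {
  const nonFixture = matches.filter((m) => !isFixturePath(m.filePath));
  // Fall back to the full list when every match is a fixture.
  const pool = nonFixture.length > 0 ? nonFixture : matches;
  // Highest centrality first; NULL centrality counts as 0.
  return [...pool].sort((a, b) => (b.centrality ?? 0) - (a.centrality ?? 0));
}
```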
## Test re-additions

Sub-agent #1 deleted its own test files for colbymchenry#33 and colbymchenry#35 (a brief misread: "DO NOT commit" was interpreted as "DO NOT leave tests in repo"). Re-added as `__tests__/mcp-callers-constructor-and-fixture-tiebreak.test.ts`, covering: constructor-with-no-callers note appears, non-constructor-method note absent, name-collision picks non-fixture primary.

## Verification

- 15 modified files + 2 new test files, +619/-55
- npm run typecheck: clean
- 74/74 tests pass across 9 LLM/search/hotspots-related test files
- New exports: `isFixturePath` (shared.ts), `getCategorizedHotspots` (queries-history.ts), `getFileFollowEarliestTs` (git-utils.ts); all have concrete in-tree callers in the same diff per reviewer-memo item #7

Reviewer pass with .claude/reviewer-memo.md prepended caught:

- (request_changes) bucket-exclusivity edge case → fixed
- (info) isFixturePath duplication → deduped
- (info) category='all' missing more-hint → added

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
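The #26 bucket rules and the bucket-exclusivity fix from the reviewer pass can be illustrated in a few lines. A minimal sketch under stated assumptions: in-memory `FileStats` records, a nearest-rank 75th-percentile "high" threshold, and hypothetical names throughout. The real `getCategorizedHotspots` lives in src/db/queries-history.ts, and where its 25th-percentile threshold still applies is not spelled out. Each predicate is written independently, as separate filters would be, and the secondary axis uses `<` the high threshold, so no file can satisfy two predicates.

```typescript
interface FileStats {
  path: string;
  centrality: number;
  churn: number;
}

// Nearest-rank percentile (p in 0..100) over a non-empty list.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(Math.max(rank - 1, 0), sorted.length - 1)];
}

function categorize(files: FileStats[]) {
  if (files.length === 0) return { risk: [], maintenance: [], brittle: [] };
  const highC = percentile(files.map((f) => f.centrality), 75);
  const highChurn = percentile(files.map((f) => f.churn), 75);
  const isRisk = (f: FileStats) => f.centrality >= highC && f.churn >= highChurn;
  // Secondary axis uses `< high`, not `<= low`, so buckets stay disjoint
  // even when high == low (e.g. uniformly zero centrality).
  const isMaintenance = (f: FileStats) => f.churn >= highChurn && f.centrality < highC;
  const isBrittle = (f: FileStats) => f.centrality >= highC && f.churn < highChurn;
  return {
    risk: files.filter(isRisk),
    maintenance: files.filter(isMaintenance),
    brittle: files.filter(isBrittle),
  };
}
```

On a fresh index with uniformly zero centrality, the centrality threshold is 0 and every file counts as high-centrality. The maintenance bucket is then empty instead of duplicating risk.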
Summary
- `file` kind nodes for each parsed source file
- `isInsideClassLikeNode()` helper for method vs function detection
- `FILE_IO_BATCH_SIZE=10` using `Promise.all`

Files changed
- `src/extraction/tree-sitter.ts`: File nodes, arrow functions, isInsideClassLikeNode
- `src/extraction/index.ts`: Parallel I/O batching, symlink cycle detection
- `src/extraction/grammars.ts`: Lazy grammar loading
- `__tests__/extraction.test.ts`: Tests for file nodes and arrow functions

Test plan