Optimize reference resolution with in-memory caches #29
Merged
The resolving refs phase stalled on large projects (3400+ files, 38k+ nodes) because matchFuzzy loaded ALL functions/methods/classes per ref, import mappings were re-extracted per ref, and fileExists hit disk every call. Add kindCache, lowerNameCache, importMappingCache, and knownFiles set to warmCaches(). Rewrite matchFuzzy to use O(1) lowercase index lookup instead of 3x getNodesByKind scans. Cache import mappings per file. Pre-build file existence set from the index for O(1) fileExists checks.
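The single-pass warmup and the O(1) fuzzy lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `GraphNode` shape, the `NameIndex` class, the `pushTo` helper, and the choice to restrict fuzzy matches to functions, methods, and classes are all assumptions made for the example.

```typescript
// Hypothetical node shape; the real CodeGraph node type is not shown in this PR.
type NodeKind = "function" | "method" | "class" | "variable";

interface GraphNode {
  id: string;
  name: string;
  kind: NodeKind;
  filePath: string;
}

// Append a value to the bucket stored under `key`, creating the bucket if needed.
function pushTo<K, V>(map: Map<K, V[]>, key: K, value: V): void {
  const bucket = map.get(key);
  if (bucket) bucket.push(value);
  else map.set(key, [value]);
}

// Assumption: fuzzy matching only considers these kinds, since the old code
// scanned functions, methods, and classes.
const FUZZY_KINDS: ReadonlySet<NodeKind> = new Set<NodeKind>(["function", "method", "class"]);

class NameIndex {
  readonly kindCache = new Map<NodeKind, GraphNode[]>();    // kind -> nodes
  readonly lowerNameCache = new Map<string, GraphNode[]>(); // lowercased name -> nodes
  readonly knownFiles = new Set<string>();                  // every indexed file path

  // One pass over all nodes fills every cache.
  warm(nodes: Iterable<GraphNode>): void {
    this.kindCache.clear();
    this.lowerNameCache.clear();
    this.knownFiles.clear();
    for (const node of nodes) {
      pushTo(this.kindCache, node.kind, node);
      pushTo(this.lowerNameCache, node.name.toLowerCase(), node);
      this.knownFiles.add(node.filePath);
    }
  }

  // Case-insensitive lookup: one Map.get instead of three full kind scans per ref.
  matchFuzzy(refName: string): GraphNode[] {
    const candidates = this.lowerNameCache.get(refName.toLowerCase()) ?? [];
    return candidates.filter((node) => FUZZY_KINDS.has(node.kind));
  }
}

const index = new NameIndex();
index.warm([
  { id: "1", name: "parseFile", kind: "function", filePath: "src/parse.ts" },
  { id: "2", name: "ParseFile", kind: "class", filePath: "src/legacy.ts" },
  { id: "3", name: "parsefile", kind: "variable", filePath: "src/vars.ts" },
]);
const fuzzyMatches = index.matchFuzzy("PARSEFILE").map((node) => node.id);
```

The cost per ref becomes proportional to the number of nodes sharing one lowercased name rather than to the total number of functions, methods, and classes in the project.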
andreinknv added a commit to andreinknv/codegraph that referenced this pull request on May 3, 2026
…colbymchenry#29 won't-do)

Builds out the eval harness so the future ranking arc (B colbymchenry#19, Aider-style TF-IDF + PageRank) can be measured. Pre-PR the harness ran 12 Elasticsearch cases and produced JSON reports, but had no comparison mode and no self-codebase cases, so a developer had to manually diff JSONs and check out a separate big codebase to validate any ranking change.

What this adds
--------------
- __tests__/evaluation/compare.ts: pure module + CLI. compareReports returns per-case + summary delta; formatComparison renders the human table; standalone CLI exits non-zero on regression. Budget: >0.10 per-case recall drop OR >0.05 mean recall drop = fail.
- __tests__/evaluation/cases-self.ts: 11 cases targeting THIS repo's own indexed symbols (CodeGraph, searchNodes, ToolModule, compareToRef, ExtractionOrchestrator, etc.). Lets developers iterate on ranking without an external codebase.
- runner.ts: argv parser with --cases self|elasticsearch and --compare <baseline.json>; env-var fallbacks (EVAL_CODEBASE, EVAL_CASES, EVAL_COMPARE). On --compare, the runner re-loads the baseline, invokes compareReports, prints formatComparison, and exits non-zero on regression beyond budget.
- npm run eval:self script.
- .gitignore: __tests__/evaluation/results/.

What I had to fix mid-implementation
- runner.ts referenced `cg.findRelevantContext` (not on CodeGraph; it's on cg.contextBuilder). Fixed.
- runner.ts used `__dirname` (ESM-incompatible). Switched to `import.meta.dirname`.

Reviewer pass: caught + fixed
- scoring.ts MRR was buggy for multi-symbol cases: it iterated over expectedSymbols in expected-array order and recorded "the rank of the first one found in that iteration", not the BEST rank across all found symbols. Standard MRR is the reciprocal of the rank of the highest-ranked relevant result. Renamed `firstRank` → `bestRank` and changed the update to `Math.min`-style semantics.
- runner.ts argv parser silently dropped --compare's value when the flag was the last token. Now errors with exit code 2.
- self-explore-compare-to-ref case had `getLineRangeHistory` as an expected symbol, but that function lives in the blame tool, not compare-to-ref. Replaced with FileDelta (real compare-to-ref type).
- Runner's meanMRR filter used `startsWith('search-')` but self cases use the `self-search-*` prefix; the filter dropped them silently. Now matches the `'-search-'` segment.

Ran on this repo: 9/11 cases pass (was 8/11 before the case fix); mean recall 0.79; the 2 remaining failures (extraction-pipeline, search-cascade) are real recall gaps the future ranking arc is intended to close. The reviewer confirmed that the expected symbols exist at the cited file:line.

Also: B colbymchenry#29 (upstream PRs to colbymchenry/codegraph) struck as won't-do per user; keeping changes in this fork.

Verification
- npm run typecheck (tsgo): clean.
- npx vitest run: 1392/13/0 (eval is run-on-demand, not a vitest test).
- npm run eval:self: ran end-to-end, saved JSON, --compare against itself shows zero deltas + within-budget verdict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
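The MRR fix described in the commit message above (take the best rank across all expected symbols, not the rank of whichever expected symbol is checked first) could look roughly like this. It is a sketch under assumptions: the function names, the `EvalCase` shape, and the use of plain string symbol names are illustrative and are not taken from scoring.ts.

```typescript
// Hypothetical case shape: ranked result names plus the symbols a case expects.
interface EvalCase {
  results: string[];         // ranked search results, best first
  expectedSymbols: string[]; // any of these counts as relevant
}

// Reciprocal rank for one case: 1 / (best 1-based rank of any expected symbol),
// or 0 when none of the expected symbols appears in the results.
function reciprocalRank(results: string[], expectedSymbols: string[]): number {
  let bestRank = Infinity;
  for (const symbol of expectedSymbols) {
    const position = results.indexOf(symbol);
    if (position === -1) continue;
    // Math.min keeps the best rank regardless of the order of expectedSymbols.
    bestRank = Math.min(bestRank, position + 1);
  }
  return Number.isFinite(bestRank) ? 1 / bestRank : 0;
}

// Mean of the per-case reciprocal ranks; 0 for an empty case list.
function meanReciprocalRank(cases: EvalCase[]): number {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const evalCase of cases) {
    total += reciprocalRank(evalCase.results, evalCase.expectedSymbols);
  }
  return total / cases.length;
}

// "c" is listed first in expectedSymbols but ranks third; "a" ranks first.
const multiSymbolRR = reciprocalRank(["a", "b", "c"], ["c", "a"]);
const sampleMRR = meanReciprocalRank([
  { results: ["a", "b", "c"], expectedSymbols: ["c", "a"] },
  { results: ["x", "y"], expectedSymbols: ["y"] },
]);
```

With first-found semantics the multi-symbol case would score 1/3, because "c" is checked first; with best-rank semantics it scores 1.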
andreinknv added a commit to andreinknv/codegraph that referenced this pull request on May 9, 2026
ReferenceResolver caches knownNames / knownFiles at first warm and reuses them across sync calls. When a sync adds or modifies files, the cached sets are stale: refs to brand-new exports were rejected by the membership pre-filter at src/resolution/index.ts:489 and sat in unresolved_refs until a full reindex.

The local-import-alias path (src/resolution/index.ts:502) bypassed the membership check, so the existing pass-B test was rescued by the caller-side import statement. This change covers the bare-name case where the caller has no import line referring to the new symbol.

Closes colbymchenry#16, colbymchenry#29, colbymchenry#47, colbymchenry#51 in the friction tracker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
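The stale-cache problem described in this commit message can be reproduced in a few lines. The commit does not say how the fix is implemented, so the sketch below shows only one possible approach (dropping the cached set after a sync so the next lookup rebuilds it). `NameMembershipCache`, `loadNames`, and `invalidate` are hypothetical names, not the ReferenceResolver API.

```typescript
// Hypothetical membership pre-filter backed by a lazily built name set.
class NameMembershipCache {
  private knownNames: Set<string> | null = null;

  // `loadNames` stands in for whatever query lists the currently indexed names.
  constructor(private readonly loadNames: () => Iterable<string>) {}

  // Builds the set on first use, then answers from memory.
  has(name: string): boolean {
    if (this.knownNames === null) {
      this.knownNames = new Set(this.loadNames());
    }
    return this.knownNames.has(name);
  }

  // Assumed hook: call after a sync that added or modified files.
  invalidate(): void {
    this.knownNames = null;
  }
}

// Simulated index contents.
const indexedNames: string[] = ["parseFile"];
const membership = new NameMembershipCache(() => indexedNames);

const beforeSync = membership.has("newExport");      // cache warmed here
indexedNames.push("newExport");                      // a sync adds a new export
const staleAfterSync = membership.has("newExport");  // still answered from the old set
membership.invalidate();
const freshAfterInvalidate = membership.has("newExport");
```

An incremental alternative would add only the names from the changed files to the existing set; which of the two the fork actually uses is not stated in the commit message.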
Summary
- Add in-memory caches to `ReferenceResolver.warmCaches()`: `kindCache`, `lowerNameCache`, `importMappingCache`, and a `knownFiles` set, all populated in a single pass over nodes during warmup
- Rewrite `matchFuzzy` to use an O(1) lowercase name index lookup instead of 3x `getNodesByKind` calls that loaded ALL functions+methods+classes per ref (the primary bottleneck)
- Cache import mappings per file so `resolveViaImport` doesn't re-read and re-parse the same file for every ref originating from it
- Pre-build a file existence set from the index so `fileExists()` is O(1) instead of hitting disk via `fs.existsSync` for every extension variant

Test plan
- `npm run build` compiles cleanly
- `npm test`: all previously-passing tests still pass (12/14 resolution tests; 2 pre-existing Windows path separator failures unchanged)
- `codegraph index` on openclaw-main (3409 files, 38,675 nodes) completes in ~10s; the resolution phase no longer stalls

🤖 Generated with Claude Code
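For reviewers, the per-file import mapping cache and the set-backed file existence check from the summary can be illustrated with the sketch below. It assumes a particular list of extension variants and a simple local-name-to-specifier mapping; `ImportMappingCache`, `resolveExistingFile`, and `EXTENSION_VARIANTS` are hypothetical names and do not come from this PR's diff.

```typescript
// Assumed extension variants; the resolver's real list is not shown in this PR.
const EXTENSION_VARIANTS = ["", ".ts", ".tsx", ".js", ".jsx", "/index.ts", "/index.js"];

// Set lookups replace one disk probe per extension variant.
function resolveExistingFile(
  basePath: string,
  knownFiles: ReadonlySet<string>,
): string | undefined {
  for (const suffix of EXTENSION_VARIANTS) {
    const candidate = basePath + suffix;
    if (knownFiles.has(candidate)) return candidate;
  }
  return undefined;
}

// Hypothetical mapping: local identifier -> module specifier it was imported from.
type ImportMapping = Map<string, string>;

// Memoizes extraction so each file is read and parsed at most once.
class ImportMappingCache {
  private readonly cache = new Map<string, ImportMapping>();
  extractCalls = 0; // exposed only to make the caching visible in the example

  constructor(private readonly extract: (filePath: string) => ImportMapping) {}

  get(filePath: string): ImportMapping {
    let mapping = this.cache.get(filePath);
    if (mapping === undefined) {
      this.extractCalls += 1;
      mapping = this.extract(filePath);
      this.cache.set(filePath, mapping);
    }
    return mapping;
  }
}

const knownFileSet = new Set(["src/util.ts", "src/lib/index.ts"]);
const resolvedUtil = resolveExistingFile("src/util", knownFileSet);
const resolvedLib = resolveExistingFile("src/lib", knownFileSet);

// Stand-in extractor; a real one would read and parse the file.
const importCache = new ImportMappingCache(() => new Map([["helper", "./util"]]));
importCache.get("src/app.ts");
importCache.get("src/app.ts"); // second ref from the same file hits the cache
```

A set built from the index only knows about indexed files, so a check like this treats unindexed files on disk as missing; whether that matters depends on what the resolver is expected to resolve against.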