feat: Search query utilities + multi-signal scoring by MO2k4 · Pull Request #26 · colbymchenry/codegraph

MO2k4 · 2026-02-10T10:41:28Z

Summary

New src/search/query-utils.ts with search term extraction, path relevance scoring, kind bonuses, API intent detection
Multi-signal scoring in searchNodes() combining FTS/LIKE score with kind bonus and path relevance
Improved FTS query sanitization to strip special chars and boolean operators
Comprehensive test suite for all query utility functions
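
As a rough illustration of the multi-signal idea in the summary above, the sketch below adds a kind bonus and a path-relevance score to a text-match score. The names `kindBonus` and `scorePathRelevance` come from this PR, but their bodies, the weights, the `Candidate` shape, and `rankCandidates` are assumptions made for the example and are not the code in src/search/query-utils.ts.

```typescript
// Hypothetical sketch: only the names kindBonus and scorePathRelevance come
// from the PR. Weights, the Candidate shape, and rankCandidates are assumed.
interface Candidate {
  name: string;
  kind: string; // e.g. "function", "class", "variable"
  filePath: string;
  textScore: number; // score from the FTS or LIKE match
}

// Assumed bonuses: favor definitions over incidental symbols.
const KIND_BONUSES = new Map<string, number>([
  ["class", 3],
  ["function", 2],
  ["method", 1],
]);

function kindBonus(kind: string): number {
  return KIND_BONUSES.get(kind) ?? 0;
}

// Assumed heuristic: +1 for each query term that appears in the file path.
function scorePathRelevance(filePath: string, terms: string[]): number {
  const lower = filePath.toLowerCase();
  return terms.filter((t) => lower.includes(t.toLowerCase())).length;
}

// Sum the three signals and sort best-first without mutating the input.
function rankCandidates(candidates: Candidate[], terms: string[]): Candidate[] {
  const total = (c: Candidate): number =>
    c.textScore + kindBonus(c.kind) + scorePathRelevance(c.filePath, terms);
  return [...candidates].sort((a, b) => total(b) - total(a));
}
```

With these assumed weights, a class under a path that contains a query term can outrank a variable with a higher raw text score, which is the kind of reordering multi-signal scoring is meant to allow.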

Files changed

src/search/query-utils.ts — NEW: search utilities (extractSearchTerms, scorePathRelevance, kindBonus, detectApiIntent, inferRouteDirectories)
src/db/queries.ts — Add scoring import, multi-signal scoring in searchNodes, FTS sanitization
__tests__/search.test.ts — NEW: 29 search tests

Test plan

npm run build compiles without errors
npm test - no new failures
New search tests all pass (29/29)

- Add src/search/query-utils.ts with extractSearchTerms, scorePathRelevance, kindBonus, detectApiIntent, inferRouteDirectories
- Add multi-signal scoring to searchNodes (kind bonus + path relevance)
- Improve FTS query sanitization (strip :^ chars, filter boolean operators)
- Add comprehensive search tests

detectApiIntent and inferRouteDirectories both have zero callers, so they are dead code on arrival. Remove them and their tests (9 tests) to keep the module focused on what's actually used: search term extraction, path relevance scoring, and kind bonuses.

colbymchenry · 2026-02-10T22:13:54Z

@MO2k4 Thanks for the contribution! The multi-signal scoring and FTS sanitization improvements are solid additions.

I pushed a cleanup commit (a15ad69) before merging that removes unused code:

Removed detectApiIntent() — zero callers anywhere in the codebase
Removed inferRouteDirectories() — zero callers anywhere in the codebase
Removed 9 associated tests for the above two functions

Everything else from your PR is kept as-is: extractSearchTerms, scorePathRelevance, kindBonus, STOP_WORDS, the multi-signal scoring in searchNodes(), and the FTS5 sanitization fix (stripping :^ and boolean operators). All 20 remaining tests pass.
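
For readers unfamiliar with why the FTS5 sanitization matters, here is a minimal sketch of the general technique: strip characters that FTS5 would parse as query syntax (including `:` and `^`) and drop boolean operator tokens. The function name `sanitizeFtsQuery`, the choice to keep only letters, digits, and underscores, the operator list, and the case-insensitive operator match are all assumptions for illustration. The actual code in src/db/queries.ts may differ.

```typescript
// Hypothetical sketch; not the implementation merged in this PR.
// Assumed operator list. Matching is case-insensitive here by choice.
const FTS_OPERATORS = new Set(["AND", "OR", "NOT", "NEAR"]);

function sanitizeFtsQuery(raw: string): string {
  return raw
    // Replace anything that is not a letter, digit, or underscore with a
    // space. This removes ':' and '^' along with quotes, parens, and '*'.
    .replace(/[^\p{L}\p{N}_]+/gu, " ")
    .split(/\s+/)
    // Drop empty tokens and tokens that FTS5 could read as operators.
    .filter((t) => t.length > 0 && !FTS_OPERATORS.has(t.toUpperCase()))
    .join(" ");
}

// Example: a query typed the way a user might write a field filter.
const cleaned = sanitizeFtsQuery("name:foo^2 AND bar");
```

Here `cleaned` holds `name foo 2 bar`, which can be passed to a MATCH clause as plain terms instead of being parsed as a column filter plus a boolean expression.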

…colbymchenry#29 won't-do)

Builds out the eval harness so the future ranking arc (B colbymchenry#19, Aider-style TF-IDF + PageRank) can be measured. Pre-PR the harness ran 12 Elasticsearch cases and produced JSON reports, but had no comparison mode and no self-codebase cases, so a developer had to manually diff JSONs and check out a separate big codebase to validate any ranking change.

What this adds
--------------

- __tests__/evaluation/compare.ts: pure module + CLI. compareReports returns per-case + summary delta; formatComparison renders the human table; standalone CLI exits non-zero on regression. Budget: >0.10 per-case recall drop OR >0.05 mean recall drop = fail.
- __tests__/evaluation/cases-self.ts: 11 cases targeting THIS repo's own indexed symbols (CodeGraph, searchNodes, ToolModule, compareToRef, ExtractionOrchestrator, etc.). Lets developers iterate on ranking without an external codebase.
- runner.ts: argv parser with --cases self|elasticsearch and --compare <baseline.json>; env-var fallbacks (EVAL_CODEBASE, EVAL_CASES, EVAL_COMPARE). On --compare, the runner re-loads the baseline + invokes compareReports + prints formatComparison + exits non-zero on regression beyond budget.
- npm run eval:self script.
- .gitignore: __tests__/evaluation/results/.

What I had to fix mid-implementation

- runner.ts referenced `cg.findRelevantContext` (not on CodeGraph; it's on cg.contextBuilder). Fixed.
- runner.ts used `__dirname` (ESM-incompatible). Switched to `import.meta.dirname`.

Reviewer pass: caught + fixed

- scoring.ts MRR was buggy for multi-symbol cases: iterated over expectedSymbols in expected-array order and recorded "the rank of the first one found in that iteration", not the BEST rank across all found symbols. Standard MRR is reciprocal of the highest-ranked relevant result. Renamed `firstRank` → `bestRank` and changed the update to `Math.min`-style semantics.
- runner.ts argv parser silently dropped --compare's value when the flag was the last token. Now errors with exit code 2.
- self-explore-compare-to-ref case had `getLineRangeHistory` as an expected symbol, but that function lives in the blame tool, not compare-to-ref. Replaced with FileDelta (real compare-to-ref type).
- Runner's meanMRR filter used `startsWith('search-')` but self cases use `self-search-*` prefix; the filter dropped them silently. Now matches `'-search-'` segment.

Ran on this repo: 9/11 cases pass (was 8/11 before the case fix); mean recall 0.79; the 2 remaining failures (extraction-pipeline, search-cascade) are real recall gaps the future ranking arc is intended to close, confirmed by reviewer that the expected symbols exist at the cited file:line.

Also: B colbymchenry#29 (upstream PRs to colbymchenry/codegraph) struck as won't-do per user; keeping changes in this fork.

Verification

- npm run typecheck (tsgo): clean.
- npx vitest run: 1392/13/0 (eval is run-on-demand, not a vitest test).
- npm run eval:self: ran end-to-end, saved JSON, --compare against itself shows zero deltas + within-budget verdict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
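
Two pieces of the commit message above lend themselves to a short sketch: the best-rank MRR fix and the regression budget. The code below is an illustration under assumptions. The function names `reciprocalRank` and `exceedsBudget`, and the flat `caseId → recall` report shape, are hypothetical and are not taken from scoring.ts or compare.ts. Only the two budget numbers and the `bestRank` / `Math.min` idea come from the text.

```typescript
// Hypothetical sketch; names and report shape are assumptions.

// Reciprocal rank for one case: 1 / (best 1-based rank of any expected symbol).
// Taking the minimum rank makes the result independent of the order in which
// expectedSymbols are listed.
function reciprocalRank(results: string[], expectedSymbols: string[]): number {
  let bestRank = Infinity;
  for (const symbol of expectedSymbols) {
    const index = results.indexOf(symbol);
    if (index !== -1) bestRank = Math.min(bestRank, index + 1);
  }
  return bestRank === Infinity ? 0 : 1 / bestRank;
}

// Budget quoted in the commit message: fail on a per-case recall drop > 0.10
// or a mean recall drop > 0.05.
const PER_CASE_BUDGET = 0.1;
const MEAN_BUDGET = 0.05;

// Compare recall per case id; only ids present in both reports are considered.
function exceedsBudget(
  baseline: Record<string, number>,
  current: Record<string, number>,
): boolean {
  const ids = Object.keys(baseline).filter((id) => id in current);
  if (ids.length === 0) return false;
  const mean = (report: Record<string, number>): number =>
    ids.reduce((sum, id) => sum + report[id], 0) / ids.length;
  const worstDrop = Math.max(...ids.map((id) => baseline[id] - current[id]));
  return worstDrop > PER_CASE_BUDGET || mean(baseline) - mean(current) > MEAN_BUDGET;
}
```

For results `["a", "b", "c"]` and expected symbols `["c", "a"]`, a first-found-in-iteration implementation would report 1/3, while the best-rank version reports 1 because `a` sits at rank 1.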

…y#19)

Two ranking changes in findRelevantContext, gated by the eval harness shipped in B colbymchenry#26:

1. Common-term dampener (one-sided IDF) in runMultiTermTextSearch. Each per-term search-result score gets multiplied by `1 - DAMPEN_RATE * (numHits / fetchLimit)`. With DAMPEN_RATE=0.5: a term that saturates the fetch cap (16/16) gets 0.5×; mid-saturation (8/16) gets 0.75×; rare hits (1/16) get effectively 1.0×. Counters the cross-call score-summing problem where common tokens ("nodes", "parse" in this repo) drowned out rare ones ("extraction") because each independent per-term search only knows BM25-IDF within ITS own results, not across calls.
2. PageRank centrality boost in collectAndScoreCandidates. `score *= 1 + γ * sqrt(centrality)` with γ=5. Pulls high-centrality hubs (ExtractionOrchestrator, etc.) into entry-point selection. Sqrt-smooths the long tail so leaves barely move while hubs at the top centrality (~0.12 in this repo) get +173%.

Eval results
------------

self-eval suite (11 cases): mean recall 0.79 → 0.91 (+0.121), pass 9/11 → 10/11. Zero per-case regressions. Two cases gained:

- self-explore-extraction-pipeline: 0.00 → 1.00
- self-explore-search-cascade: 0.67 → 1.00

The gate from B colbymchenry#26 (compare.ts) was load-bearing; multiple intermediate iterations were rejected:

- bidirectional IDF (`log(1 + cap/hits)`): regressed self-explore-compare-to-ref by 0.33. Rolled back.
- cliff dampener (saturated→0.5×, else→1.0×, no centrality change): flipped extraction-pipeline +1.00 but regressed biomarker-engine -0.33. Rejected.
- smooth dampener at DAMPEN_RATE=0.4: zero movement either direction.
- DAMPEN_RATE=0.5 with γ=2 centrality (existing): same trade as cliff. Rejected.
- DAMPEN_RATE=0.5 with γ=5 centrality: biomarker-engine recovered (centrality boost on its hub symbols compensated for the dampened common-term scores). Two wins held. Gate passed.

Reviewer pass: docstring rot fixed

- CENTRALITY_BOOST_WEIGHT JSDoc still showed γ=2 arithmetic examples after I bumped the constant to 5. Updated to the correct +173% / +50% / +16% percentages.
- DAMPEN_RATE inline comment showed numbers from a superseded DAMPEN_RATE=0.4 iteration. Updated to the shipped 0.5 values.
- Both flagged via memo scrutiny-area #1 (docstring rotting after a behavior change). Memo workflow paying off: fifth consecutive review where memo content was load-bearing.

Verification

- npm run typecheck (tsgo): clean.
- npx vitest run: 1392/13/0 (unchanged; ranking changes measured via eval, not vitest).
- npx tsx __tests__/evaluation/runner.ts . --cases self --compare __tests__/evaluation/baseline-self.json: within budget, +0.121 mean recall, +1 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
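
The two formulas quoted in the commit message above can be checked with a few lines of arithmetic. This is a sketch only: the constant names `DAMPEN_RATE` and `CENTRALITY_BOOST_WEIGHT` and their values appear in the message, while the function names `dampenFactor` and `centralityMultiplier`, and the clamp on `numHits`, are assumptions added for the example.

```typescript
// Constants named in the commit message; values are the shipped ones it cites.
const DAMPEN_RATE = 0.5;
const CENTRALITY_BOOST_WEIGHT = 5; // the γ in the message

// Hypothetical helper: multiplier applied to one term's result scores.
// numHits is clamped to fetchLimit (an assumption) so the factor stays
// within [1 - DAMPEN_RATE, 1].
function dampenFactor(numHits: number, fetchLimit: number): number {
  const saturation = Math.min(numHits, fetchLimit) / fetchLimit;
  return 1 - DAMPEN_RATE * saturation;
}

// Hypothetical helper: multiplier from `score *= 1 + γ * sqrt(centrality)`.
function centralityMultiplier(centrality: number): number {
  return 1 + CENTRALITY_BOOST_WEIGHT * Math.sqrt(centrality);
}

// Values discussed in the message, computed rather than hard-coded.
const saturated = dampenFactor(16, 16); // 0.5
const midway = dampenFactor(8, 16); // 0.75
const rare = dampenFactor(1, 16); // 0.96875
const hubBoost = centralityMultiplier(0.12); // about 2.73, i.e. roughly +173%
```

The rare-term factor works out to 0.96875, which is the "effectively 1.0×" case, and a centrality of 0.01 gives a multiplier of 1.5, matching the +50% figure cited for the updated JSDoc.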

…polish items

Eight friction-tracker items addressed in parallel by sub-agents (2 Haiku, 1 Sonnet); reviewer caught one real correctness edge case (bucket overlap on degenerate fresh-index shapes) plus two info items, all addressed in this commit.

## colbymchenry#21: at_range cost-benefit JSDoc

Doc-only update to src/mcp/tools/at-range.ts. Tool description and JSDoc now state "pays off most on dense files (100+ symbols) and multi-range bulk lookups; for tiny preview fetches on small files, raw `head -N` is comparable." No code change.

## colbymchenry#25: blame surfaces rename detection inline

src/git-utils.ts gains a new helper `getFileFollowEarliestTs` that runs `git log --follow --format=%aI -- <path>` (5 s timeout, ISO timestamp). src/mcp/tools/blame.ts compares the rename-aware oldest commit against the line-range-only timeline's oldest. When `--follow` reaches further back, appends a warning that the timeline truncated at the file's rename and points at `git log --follow <file>` for the full history. Edge cases handled: not-a-git-repo, timeout, empty timeline.

Test approach uses `vi.spyOn` to mock pre-rename history because real fixtures are unreliable: modern git's `git log -L` follows renames via content-similarity tracking, making a deterministic black-box rename-fixture impossible.

## colbymchenry#26: hotspots split into 3 mutually-exclusive categories

src/db/queries-history.ts gains `getCategorizedHotspots` and src/mcp/tools/hotspots.ts gains a `category: 'risk' | 'maintenance' | 'brittle' | 'all'` arg (default 'risk' for backward compat). Thresholds use 75/25 percentile rather than hardcoded magic numbers, so they adapt as the project grows.

Buckets:

- risk: high centrality AND high churn (where bugs hide)
- maintenance: high churn AND not-high centrality (refactor target)
- brittle: high centrality AND not-high churn (stable critical)

Reviewer-caught correctness bug: original filters used `<= low` for the secondary axis, which collapsed buckets when high == low (fresh index where centrality is uniformly zero, or repos where every file has identical churn). A file at the threshold could appear in both risk AND maintenance simultaneously. Fixed by switching maintenance and brittle to `< highThreshold`, making them strictly disjoint even on degenerate inputs.

Also added a more-hint when any section hit the per-category cap (the existing `category='risk'` path already had this; `category='all'` now mirrors).

New `__tests__/hotspots.test.ts` (4 cases) covers all-section rendering, single-category dispatch, and the backward-compat default path.

## colbymchenry#27: search centrality:high differentiates "hook hasn't run" vs "no node met the threshold"

src/mcp/tools/search.ts. `probeCentralityFilterCulprit` now runs a sub-millisecond probe `SELECT 1 FROM nodes WHERE centrality IS NOT NULL LIMIT 1` (uses the existing `idx_nodes_centrality` index). When ALL nodes have NULL centrality the agent gets the existing "centrality hook hasn't run — run codegraph index" hint. When SOME nodes have centrality but none cleared the filter, a different hint suggests relaxing the threshold. Two-case hint instead of one.

## colbymchenry#28: search exact promotes multi-token-query warning to pre-result

src/mcp/tools/search.ts. `buildConceptHintIfNeeded` now returns `{ preResult, postResult }` instead of a single string. When the query splits into 2+ space-separated non-qualified tokens (likely "multiple symbol names"), the agent gets a leading hint to call search per name OR use codegraph_explore, BEFORE the result list rather than buried after. Field-qualified tokens (`kind:function lang:typescript`) and single-free-token queries are unchanged.

## colbymchenry#33: callers on "constructor" with no callers explains the instantiates-edge model

src/mcp/tools/callers.ts. When the resolved symbol is `kind=method && name=constructor` AND the callers list is empty, appends a one-line note: "constructors are invoked via `new ClassName(...)`, which graph-edges as `instantiates` on the parent class. To find construction sites, run codegraph_callers on the enclosing class instead of 'constructor'." Both the multi-match and single-match paths got the note (guarded by the same kind+name+empty check). Constructors WITH callers (e.g. via super()) render normally, with no false positive.

## colbymchenry#35: node.symbol tie-break prefers non-fixture, then centrality

src/mcp/tools/symbol-resolver.ts. `pickFromMultipleExactMatches` now filters out fixture paths first (falls back to all-fixture when that's all that matches), then sorts by centrality DESC (NULL → 0). A `helper` symbol that exists in both `src/core.ts` and `docs/test-beds/fixture.ts` resolves to `src/core.ts` as the displayed primary. Tier #3 (last_touched_ts) deferred: data not in the resolver's existing query.

Reviewer-caught DRY issue: the fixture-path regex set was duplicated between symbol-resolver.ts and dead-code.ts (introduced by parallel sub-agents on the same brief). Extracted to `isFixturePath` in src/mcp/tools/shared.ts; both consumers now import the single source.

## colbymchenry#49: getSummaryCoverage denominator threading (3 call sites)

src/bin/codegraph.ts (lines 348, 1461) + src/mcp/tools/status.ts (line 440). All three pass `SUMMARIZABLE_KINDS` to getSummaryCoverage to match the canonical pattern from the previously-fixed _search-intent.ts:218. Without this, the helper falls back to COUNT(*) which inflates the denominator with parameters / imports / file nodes; its own JSDoc explicitly warns against this.

## Test re-additions

Sub-agent #1 deleted its own test files for colbymchenry#33 and colbymchenry#35 (a brief misread: "DO NOT commit" was interpreted as "DO NOT leave tests in repo"). Re-added as `__tests__/mcp-callers-constructor-and-fixture-tiebreak.test.ts` covering: constructor-with-no-callers note appears, non-constructor-method note absent, name-collision picks non-fixture primary.

## Verification

- 15 modified files + 2 new test files, +619/-55
- npm run typecheck: clean
- 74/74 tests pass across 9 LLM/search/hotspots-related test files
- New exports: `isFixturePath` (shared.ts), `getCategorizedHotspots` (queries-history.ts), `getFileFollowEarliestTs` (git-utils.ts); all have concrete in-tree callers in the same diff per reviewer-memo item #7

Reviewer pass with .claude/reviewer-memo.md prepended caught:

- (request_changes) bucket-exclusivity edge case → fixed
- (info) isFixturePath duplication → deduped
- (info) category='all' missing more-hint → added

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
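
The bucket-exclusivity fix described in the hotspots item above is easier to see in code. The sketch below is an assumption-laden illustration, not `getCategorizedHotspots`: the `FileStats` shape, the nearest-rank `percentile` helper, the `categorize` function, and the choice of `>=` for "high" are all hypothetical. The one property it aims to show is the one the commit describes: deriving maintenance and brittle from "below the high threshold" keeps the three buckets disjoint even when every file has the same values.

```typescript
// Hypothetical sketch; shapes and helper names are assumptions.
type Category = "risk" | "maintenance" | "brittle";

interface FileStats {
  path: string;
  centrality: number;
  churn: number;
}

// Nearest-rank percentile over a non-empty list (assumed method).
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Each file lands in at most one bucket because the secondary axis is tested
// against the same high threshold, never a separate low threshold.
function categorize(
  file: FileStats,
  highCentrality: number,
  highChurn: number,
): Category | null {
  const central = file.centrality >= highCentrality;
  const churny = file.churn >= highChurn;
  if (central && churny) return "risk";
  if (churny) return "maintenance"; // centrality < highCentrality
  if (central) return "brittle"; // churn < highChurn
  return null;
}

// Degenerate input: a fresh index with uniform centrality and churn.
const fresh: FileStats[] = [
  { path: "a.ts", centrality: 0, churn: 3 },
  { path: "b.ts", centrality: 0, churn: 3 },
];
const highC = percentile(fresh.map((f) => f.centrality), 75);
const highCh = percentile(fresh.map((f) => f.churn), 75);
const freshBuckets = fresh.map((f) => categorize(f, highC, highCh));
```

Because `categorize` returns a single value, a file at the threshold cannot be reported under two categories, which is the overlap the reviewer flagged.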

MO2k4 and others added 2 commits February 10, 2026 11:40

Remove unused detectApiIntent and inferRouteDirectories

a15ad69


colbymchenry merged commit e6531c5 into colbymchenry:main Feb 10, 2026

colbymchenry merged 2 commits into colbymchenry:main from MO2k4:feat/search-query-utils
