feat: Search query utilities + multi-signal scoring #26

Merged: colbymchenry merged 2 commits into colbymchenry:main from MO2k4:feat/search-query-utils on Feb 10, 2026

@MO2k4 (Contributor) commented Feb 10, 2026

Summary

  • New src/search/query-utils.ts with search term extraction, path relevance scoring, kind bonuses, API intent detection
  • Multi-signal scoring in searchNodes() combining FTS/LIKE score with kind bonus and path relevance
  • Improved FTS query sanitization to strip special chars and boolean operators
  • Comprehensive test suite for all query utility functions
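
The multi-signal idea in the summary can be illustrated with a minimal sketch. The function names `extractSearchTerms`, `scorePathRelevance`, and `kindBonus` come from this PR, but their bodies, the stop-word list, the bonus weights, and the `scoreCandidates` wrapper below are assumptions made for illustration; the actual implementation in `src/search/query-utils.ts` may differ.

```typescript
// Hypothetical sketch: names are from the PR, bodies and weights are assumed.
type NodeKind = "function" | "class" | "method" | "variable" | "file";

interface Candidate {
  name: string;
  kind: NodeKind;
  filePath: string;
  textScore: number; // score from the FTS or LIKE query, higher is better here
}

// Assumed stop-word list; the PR's STOP_WORDS contents are not shown.
const STOP_WORDS = new Set(["the", "a", "an", "of", "in", "for", "to", "how", "does"]);

function extractSearchTerms(query: string): string[] {
  return query
    .toLowerCase()
    .split(/[^a-z0-9_]+/)
    .filter((t) => t.length > 1 && !STOP_WORDS.has(t));
}

// Fraction of query terms that appear in some path segment (0..1).
function scorePathRelevance(filePath: string, terms: string[]): number {
  if (terms.length === 0) return 0;
  const segments = filePath.toLowerCase().split(/[\/.\-_]+/);
  const hits = terms.filter((term) => segments.some((s) => s.includes(term))).length;
  return hits / terms.length;
}

// Assumed weights: callable and type-like symbols get a small boost.
function kindBonus(kind: NodeKind): number {
  switch (kind) {
    case "function":
    case "method":
    case "class":
      return 0.2;
    case "variable":
      return 0.05;
    default:
      return 0;
  }
}

// Combine the three signals additively and sort best-first.
function scoreCandidates(candidates: Candidate[], query: string): (Candidate & { score: number })[] {
  const terms = extractSearchTerms(query);
  return candidates
    .map((c) => ({
      ...c,
      score: c.textScore + kindBonus(c.kind) + scorePathRelevance(c.filePath, terms),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With equal text scores, a function in `src/search/query-utils.ts` would outrank a variable in an unrelated path for the query "search query utils", because both the kind bonus and the path signal favor it.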

Files changed

  • src/search/query-utils.ts — NEW: search utilities (extractSearchTerms, scorePathRelevance, kindBonus, detectApiIntent, inferRouteDirectories)
  • src/db/queries.ts — Add scoring import, multi-signal scoring in searchNodes, FTS sanitization
  • __tests__/search.test.ts — NEW: 29 search tests

Test plan

  • npm run build compiles without errors
  • npm test - no new failures
  • New search tests all pass (29/29)

MO2k4 and others added 2 commits February 10, 2026 11:40
- Add src/search/query-utils.ts with extractSearchTerms, scorePathRelevance,
  kindBonus, detectApiIntent, inferRouteDirectories
- Add multi-signal scoring to searchNodes (kind bonus + path relevance)
- Improve FTS query sanitization (strip :^ chars, filter boolean operators)
- Add comprehensive search tests

detectApiIntent and inferRouteDirectories have zero callers, so they
were dead code on arrival. Remove them and their tests (9 tests) to
keep the module focused on what's actually used: search term
extraction, path relevance scoring, and kind bonuses.

@colbymchenry (Owner) commented:

@MO2k4 Thanks for the contribution! The multi-signal scoring and FTS sanitization improvements are solid additions.

I pushed a cleanup commit (a15ad69) before merging that removes unused code:

  1. Removed detectApiIntent() — zero callers anywhere in the codebase
  2. Removed inferRouteDirectories() — zero callers anywhere in the codebase
  3. Removed 9 associated tests for the above two functions

Everything else from your PR is kept as-is: extractSearchTerms, scorePathRelevance, kindBonus, STOP_WORDS, the multi-signal scoring in searchNodes(), and the FTS5 sanitization fix (stripping :^ and boolean operators). All 20 remaining tests pass.
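
As a rough illustration of the FTS5 sanitization described above, the sketch below strips `:` and `^` and drops boolean operator tokens before the query reaches the full-text index. The function name `sanitizeFtsQuery`, the extra stripped characters (`"`, `*`, parentheses), the inclusion of `NEAR`, and the uppercase-only operator match are assumptions; the PR's actual code in `src/db/queries.ts` is not shown here.

```typescript
// Hypothetical sketch of FTS query sanitization; not the PR's actual code.
// Operators are matched in uppercase only (an assumption of this sketch).
const FTS_OPERATORS = new Set(["AND", "OR", "NOT", "NEAR"]);

function sanitizeFtsQuery(raw: string): string {
  return raw
    // ':' and '^' are named in the PR; quotes, '*', and parentheses are assumed extras.
    .replace(/[:^"*()]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0 && !FTS_OPERATORS.has(token))
    .join(" ");
}

console.log(sanitizeFtsQuery("name:foo AND ^bar")); // name foo bar
```

Replacing special characters with a space rather than deleting them keeps `name:foo` as two searchable tokens instead of fusing them into `namefoo`.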

@colbymchenry colbymchenry merged commit e6531c5 into colbymchenry:main Feb 10, 2026
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
…colbymchenry#29 won't-do)

Builds out the eval harness so the future ranking arc (B colbymchenry#19,
Aider-style TF-IDF + PageRank) can be measured. Pre-PR the harness
ran 12 Elasticsearch cases and produced JSON reports — but had no
comparison mode and no self-codebase cases, so a developer had to
manually diff JSONs and check out a separate big codebase to
validate any ranking change.

What this adds
--------------
- __tests__/evaluation/compare.ts: pure module + CLI. compareReports
  returns per-case + summary delta; formatComparison renders the
  human table; standalone CLI exits non-zero on regression. Budget:
  >0.10 per-case recall drop OR >0.05 mean recall drop = fail.
- __tests__/evaluation/cases-self.ts: 11 cases targeting THIS repo's
  own indexed symbols (CodeGraph, searchNodes, ToolModule,
  compareToRef, ExtractionOrchestrator, etc.). Lets developers
  iterate on ranking without an external codebase.
- runner.ts: argv parser with --cases self|elasticsearch and
  --compare <baseline.json>; env-var fallbacks (EVAL_CODEBASE,
  EVAL_CASES, EVAL_COMPARE). On --compare, the runner re-loads the
  baseline + invokes compareReports + prints formatComparison +
  exits non-zero on regression beyond budget.
- npm run eval:self script.
- .gitignore: __tests__/evaluation/results/.
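
A minimal sketch of the regression budget described for compare.ts, assuming a simplified report shape. The name `compareReports` and the two thresholds (0.10 per case, 0.05 on the mean) come from the commit message; the `Report` and `Comparison` types and the handling of cases missing from one report are hypothetical.

```typescript
// Hypothetical report shape; the real JSON reports likely carry more fields.
interface CaseResult { id: string; recall: number; }
interface Report { cases: CaseResult[]; }

interface CaseDelta { id: string; baseline: number; current: number; delta: number; }
interface Comparison { perCase: CaseDelta[]; meanDelta: number; regressed: boolean; }

const PER_CASE_BUDGET = 0.10; // fail when any single case drops by more than this
const MEAN_BUDGET = 0.05;     // fail when mean recall drops by more than this

function mean(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((a, b) => a + b, 0) / values.length;
}

function compareReports(baseline: Report, current: Report): Comparison {
  const currentById = new Map<string, number>();
  for (const c of current.cases) currentById.set(c.id, c.recall);

  // Assumption: only cases present in both reports are compared.
  const perCase: CaseDelta[] = [];
  for (const b of baseline.cases) {
    const cur = currentById.get(b.id);
    if (cur === undefined) continue;
    perCase.push({ id: b.id, baseline: b.recall, current: cur, delta: cur - b.recall });
  }

  const meanDelta = mean(perCase.map((p) => p.current)) - mean(perCase.map((p) => p.baseline));
  const regressed =
    perCase.some((p) => p.delta < -PER_CASE_BUDGET) || meanDelta < -MEAN_BUDGET;
  return { perCase, meanDelta, regressed };
}
```

A CLI wrapper would then set a non-zero exit code when `regressed` is true, which is what lets the comparison act as a gate for ranking changes.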

What I had to fix mid-implementation
- runner.ts referenced `cg.findRelevantContext` (not on CodeGraph;
  it's on cg.contextBuilder). Fixed.
- runner.ts used `__dirname` (ESM-incompatible). Switched to
  `import.meta.dirname`.

Reviewer pass — caught + fixed
- scoring.ts MRR was buggy for multi-symbol cases: iterated over
  expectedSymbols in expected-array order and recorded "the rank of
  the first one found in that iteration", not the BEST rank across
  all found symbols. Standard MRR is reciprocal of the highest-ranked
  relevant result. Renamed `firstRank` → `bestRank` and changed the
  update to `Math.min`-style semantics.
- runner.ts argv parser silently dropped --compare's value when the
  flag was the last token. Now errors with exit code 2.
- self-explore-compare-to-ref case had `getLineRangeHistory` as an
  expected symbol — that function lives in the blame tool, not
  compare-to-ref. Replaced with FileDelta (real compare-to-ref type).
- Runner's meanMRR filter used `startsWith('search-')` but self
  cases use `self-search-*` prefix; the filter dropped them
  silently. Now matches `'-search-'` segment.
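
The MRR fix above can be sketched as follows. This is an illustrative reimplementation, not the code in scoring.ts: the function names and signatures are assumptions, and only the `bestRank` plus `Math.min` idea is taken from the commit message.

```typescript
// Reciprocal rank for one case: 1 / (best 1-based rank of any expected symbol).
// Returns 0 when none of the expected symbols appear in the results.
function reciprocalRank(rankedSymbols: string[], expectedSymbols: string[]): number {
  let bestRank = Infinity;
  for (const expected of expectedSymbols) {
    const index = rankedSymbols.indexOf(expected);
    if (index !== -1) {
      // Keep the best (smallest) rank, regardless of the order of expectedSymbols.
      bestRank = Math.min(bestRank, index + 1);
    }
  }
  return Number.isFinite(bestRank) ? 1 / bestRank : 0;
}

function meanReciprocalRank(cases: { ranked: string[]; expected: string[] }[]): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce((sum, c) => sum + reciprocalRank(c.ranked, c.expected), 0);
  return total / cases.length;
}
```

With results `["A", "B", "C"]` and expected symbols `["C", "A"]`, the "first found in expected-array order" approach would record rank 3, while the best-rank version records rank 1.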

Ran on this repo: 9/11 cases pass (was 8/11 before the case fix);
mean recall 0.79; the 2 remaining failures (extraction-pipeline,
search-cascade) are real recall gaps the future ranking arc is
intended to close — confirmed by reviewer that the expected
symbols exist at the cited file:line.

Also: B colbymchenry#29 (upstream PRs to colbymchenry/codegraph) struck as
won't-do per user — keeping changes in this fork.

Verification
- npm run typecheck (tsgo) — clean.
- npx vitest run — 1392/13/0 (eval is run-on-demand, not a vitest
  test).
- npm run eval:self — ran end-to-end, saved JSON, --compare
  against itself shows zero deltas + within-budget verdict.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 3, 2026
…y#19)

Two ranking changes in findRelevantContext, gated by the eval
harness shipped in B colbymchenry#26:

1. Common-term dampener (one-sided IDF) in runMultiTermTextSearch.
   Each per-term search-result score gets multiplied by
   `1 - DAMPEN_RATE * (numHits / fetchLimit)`. With
   DAMPEN_RATE=0.5: a term that saturates the fetch cap (16/16)
   gets 0.5×; mid-saturation (8/16) gets 0.75×; rare hits (1/16)
   get effectively 1.0×. Counters the cross-call score-summing
   problem where common tokens ("nodes", "parse" in this repo)
   drowned out rare ones ("extraction") because each independent
   per-term search only knows BM25-IDF within ITS own results,
   not across calls.

2. PageRank centrality boost in collectAndScoreCandidates.
   `score *= 1 + γ * sqrt(centrality)` with γ=5. Pulls
   high-centrality hubs (ExtractionOrchestrator, etc.) into
   entry-point selection. Sqrt-smooths the long tail so leaves
   barely move while hubs at the top centrality (~0.12 in this
   repo) get +173%.
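
The arithmetic of the two changes can be checked with a small sketch. `DAMPEN_RATE` and `CENTRALITY_BOOST_WEIGHT` are names from the commit message; the helper functions `dampenFactor` and `centralityBoost`, the clamping of hits to the fetch limit, and the treatment of a missing centrality as 0 are assumptions for illustration.

```typescript
const DAMPEN_RATE = 0.5;
const CENTRALITY_BOOST_WEIGHT = 5; // the γ in the formula above

// Multiplier applied to each per-term search score: 1 - DAMPEN_RATE * (numHits / fetchLimit).
function dampenFactor(numHits: number, fetchLimit: number): number {
  const saturation = Math.min(numHits, fetchLimit) / fetchLimit; // clamp is an assumption
  return 1 - DAMPEN_RATE * saturation;
}

// Multiplier applied to a candidate score: 1 + γ * sqrt(centrality).
function centralityBoost(centrality: number | null): number {
  return 1 + CENTRALITY_BOOST_WEIGHT * Math.sqrt(centrality ?? 0);
}

console.log(dampenFactor(16, 16)); // 0.5
console.log(dampenFactor(8, 16));  // 0.75
console.log(dampenFactor(1, 16));  // 0.96875
console.log(Math.round((centralityBoost(0.12) - 1) * 100)); // 173
```

The square root compresses the range: a node with centrality 0.01 gets a 1.5× multiplier, so the boost falls off far more slowly than centrality itself while still leaving zero-centrality leaves untouched.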

Eval results
------------
self-eval suite (11 cases): mean recall 0.79 → 0.91 (+0.121),
pass 9/11 → 10/11. Zero per-case regressions. Two cases gained:
- self-explore-extraction-pipeline: 0.00 → 1.00
- self-explore-search-cascade: 0.67 → 1.00

The gate from B colbymchenry#26 (compare.ts) was load-bearing — multiple
intermediate iterations were rejected:
- bidirectional IDF (`log(1 + cap/hits)`): regressed
  self-explore-compare-to-ref by 0.33. Rolled back.
- cliff dampener (saturated→0.5×, else→1.0×, no centrality
  change): flipped extraction-pipeline +1.00 but regressed
  biomarker-engine -0.33. Rejected.
- smooth dampener at DAMPEN_RATE=0.4: zero movement in either
  direction.
- DAMPEN_RATE=0.5 with γ=2 centrality (existing): same
  trade-off as the cliff dampener. Rejected.
- DAMPEN_RATE=0.5 with γ=5 centrality: biomarker-engine
  recovered (centrality boost on its hub symbols compensated
  for the dampened common-term scores). Two wins held. Gate
  passed.

Reviewer pass — docstring rot fixed
- CENTRALITY_BOOST_WEIGHT JSDoc still showed γ=2 arithmetic
  examples after I bumped the constant to 5. Updated to the
  correct +173% / +50% / +16% percentages.
- DAMPEN_RATE inline comment showed numbers from a superseded
  DAMPEN_RATE=0.4 iteration. Updated to the shipped 0.5 values.
- Both flagged via memo scrutiny-area #1 (docstring rotting
  after a behavior change). Memo workflow paying off — fifth
  consecutive review where memo content was load-bearing.

Verification
- npm run typecheck (tsgo) — clean.
- npx vitest run — 1392/13/0 (unchanged; ranking changes
  measured via eval, not vitest).
- npx tsx __tests__/evaluation/runner.ts . --cases self
  --compare __tests__/evaluation/baseline-self.json — within
  budget, +0.121 mean recall, +1 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
andreinknv added a commit to andreinknv/codegraph that referenced this pull request May 9, 2026
…polish items

Eight friction-tracker items addressed in parallel by sub-agents (2 Haiku,
1 Sonnet); reviewer caught one real correctness edge case (bucket overlap
on degenerate fresh-index shapes) plus two info items, all addressed in
this commit.

## colbymchenry#21 — at_range cost-benefit JSDoc

Doc-only update to src/mcp/tools/at-range.ts. Tool description and
JSDoc now state "pays off most on dense files (100+ symbols) and
multi-range bulk lookups; for tiny preview fetches on small files,
raw `head -N` is comparable." No code change.

## colbymchenry#25 — blame surfaces rename detection inline

src/git-utils.ts gains a new helper `getFileFollowEarliestTs` that runs
`git log --follow --format=%aI -- <path>` (5 s timeout, ISO timestamp).
src/mcp/tools/blame.ts compares the rename-aware oldest commit against
the line-range-only timeline's oldest. When `--follow` reaches further
back, appends a warning that the timeline truncated at the file's
rename and points at `git log --follow <file>` for the full history.
Edge cases handled: not-a-git-repo, timeout, empty timeline.
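
A minimal sketch of what a helper like `getFileFollowEarliestTs` might look like, using Node's `execFileSync`. The git command, the 5 s timeout, and the ISO timestamp format are from the description above; the signature, the choice to return `null` on any failure, and taking the last output line as the oldest commit are assumptions, not the code in src/git-utils.ts.

```typescript
import { execFileSync } from "node:child_process";

// Returns the author timestamp (ISO 8601) of the oldest commit reachable via
// rename-following history, or null when git fails, times out, or finds nothing.
function getFileFollowEarliestTs(repoDir: string, filePath: string): string | null {
  try {
    const output = execFileSync(
      "git",
      ["log", "--follow", "--format=%aI", "--", filePath],
      { cwd: repoDir, timeout: 5000, encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] },
    );
    const lines = output
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
    // git log prints newest first by default, so the last line is the oldest commit.
    return lines.length > 0 ? lines[lines.length - 1] : null;
  } catch {
    // Covers not-a-git-repo, missing git binary, missing directory, and timeout.
    return null;
  }
}
```

A caller could compare this value against the oldest entry of the line-range timeline and emit the rename warning only when the `--follow` timestamp is strictly earlier.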

Test approach uses `vi.spyOn` to mock pre-rename history because
real fixtures are unreliable: modern git's `git log -L` follows
renames via content-similarity tracking, making a deterministic
black-box rename-fixture impossible.

## colbymchenry#26 — hotspots split into 3 mutually-exclusive categories

src/db/queries-history.ts gains `getCategorizedHotspots` and
src/mcp/tools/hotspots.ts gains a `category: 'risk' | 'maintenance'
| 'brittle' | 'all'` arg (default 'risk' for backward compat).
Thresholds use 75/25 percentile rather than hardcoded magic
numbers — they adapt as the project grows.

Buckets:
- risk        : high centrality AND high churn — where bugs hide
- maintenance : high churn AND not-high centrality — refactor target
- brittle     : high centrality AND not-high churn — stable critical

Reviewer-caught correctness bug: original filters used `<= low` for
the secondary axis, which collapsed buckets when high == low (fresh
index where centrality is uniformly zero, or repos where every file
has identical churn). A file at the threshold could appear in both
risk AND maintenance simultaneously. Fixed by switching maintenance
and brittle to `< highThreshold`, making them strictly disjoint
even on degenerate inputs. Also added a more-hint when any section
hit the per-category cap (the existing `category='risk'` path
already had this; `category='all'` now mirrors).
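
The disjointness fix can be illustrated with a small in-memory sketch. The real `getCategorizedHotspots` runs against the database; the `FileStats` shape, the `percentile` helper, and the `categorize` function below are hypothetical, and only the 75th-percentile threshold and the `< highThreshold` rule come from the description above.

```typescript
interface FileStats { path: string; centrality: number; churn: number; }
type Category = "risk" | "maintenance" | "brittle";

// Nearest-rank style percentile over a copy of the values (assumed method).
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[index];
}

function categorize(files: FileStats[]): Record<Category, FileStats[]> {
  const highCentrality = percentile(files.map((f) => f.centrality), 75);
  const highChurn = percentile(files.map((f) => f.churn), 75);
  const buckets: Record<Category, FileStats[]> = { risk: [], maintenance: [], brittle: [] };

  for (const file of files) {
    const hotCentrality = file.centrality >= highCentrality;
    const hotChurn = file.churn >= highChurn;
    if (hotCentrality && hotChurn) {
      buckets.risk.push(file);
    } else if (hotChurn) {
      // churn >= high AND centrality < high: cannot also be in risk
      buckets.maintenance.push(file);
    } else if (hotCentrality) {
      // centrality >= high AND churn < high: cannot also be in risk
      buckets.brittle.push(file);
    }
  }
  return buckets;
}
```

Because the secondary axis is tested against the same high threshold that defines `risk`, a file can land in at most one bucket even when every file has identical values, for example on a fresh index where centrality is uniformly zero.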

New `__tests__/hotspots.test.ts` (4 cases) covers all-section
rendering, single-category dispatch, and the backward-compat
default path.

## colbymchenry#27 — search centrality:high differentiates "hook hasn't run"
       vs "no node met the threshold"

src/mcp/tools/search.ts. `probeCentralityFilterCulprit` now runs a
sub-millisecond probe `SELECT 1 FROM nodes WHERE centrality IS NOT
NULL LIMIT 1` (uses the existing `idx_nodes_centrality` index). When
ALL nodes have NULL centrality the agent gets the existing "centrality
hook hasn't run — run codegraph index" hint. When SOME nodes have
centrality but none cleared the filter, a different hint suggests
relaxing the threshold. Two-case hint instead of one.

## colbymchenry#28 — search exact promotes multi-token-query warning to pre-result

src/mcp/tools/search.ts. `buildConceptHintIfNeeded` now returns
`{ preResult, postResult }` instead of a single string. When the
query splits into 2+ space-separated non-qualified tokens (likely
"multiple symbol names"), the agent gets a leading hint to call
search per name OR use codegraph_explore — BEFORE the result list
rather than buried after.

Field-qualified tokens (`kind:function lang:typescript`) and
single-free-token queries are unchanged.
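
A sketch of the token check described above. The function name `buildConceptHintIfNeeded` and the `{ preResult, postResult }` return shape are from the commit message; the regular expression used to recognize field-qualified tokens and the hint wording are assumptions.

```typescript
interface ConceptHint { preResult: string | null; postResult: string | null; }

// Assumed pattern for field-qualified tokens such as kind:function or lang:typescript.
const FIELD_QUALIFIED = /^[a-z_]+:[^:\s]\S*$/i;

function buildConceptHintIfNeeded(query: string): ConceptHint {
  const tokens = query.trim().split(/\s+/).filter((t) => t.length > 0);
  const freeTokens = tokens.filter((t) => !FIELD_QUALIFIED.test(t));

  if (freeTokens.length >= 2) {
    return {
      preResult:
        `Query contains ${freeTokens.length} free tokens; if these are separate symbol ` +
        `names, search for each one individually or use codegraph_explore.`,
      postResult: null,
    };
  }
  // Field-qualified-only and single-free-token queries get no leading hint.
  return { preResult: null, postResult: null };
}
```

Returning two slots instead of one string lets the caller decide where each hint is rendered relative to the result list.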

## colbymchenry#33 — callers on "constructor" with no callers explains
       the instantiates-edge model

src/mcp/tools/callers.ts. When the resolved symbol is
`kind=method && name=constructor` AND the callers list is empty,
appends a one-line note: "constructors are invoked via
`new ClassName(...)`, which graph-edges as `instantiates` on the
parent class. To find construction sites, run codegraph_callers on
the enclosing class instead of 'constructor'." Both the multi-match
and single-match paths got the note (guarded by the same
kind+name+empty check). Constructors WITH callers (e.g. via super())
render normally — no false positive.
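
The guard described above reduces to a three-part check. In this sketch the `ResolvedSymbol` shape and the helper name `constructorNoteIfNeeded` are hypothetical; the condition and the note text follow the commit message.

```typescript
interface ResolvedSymbol { kind: string; name: string; }

const CONSTRUCTOR_NOTE =
  "constructors are invoked via `new ClassName(...)`, which graph-edges as " +
  "`instantiates` on the parent class. To find construction sites, run " +
  "codegraph_callers on the enclosing class instead of 'constructor'.";

// Returns the note only for a constructor method with an empty callers list.
function constructorNoteIfNeeded(symbol: ResolvedSymbol, callerCount: number): string | null {
  const isConstructor = symbol.kind === "method" && symbol.name === "constructor";
  return isConstructor && callerCount === 0 ? CONSTRUCTOR_NOTE : null;
}
```

Keeping the check in one helper is one way to make the single-match and multi-match paths share the same guard.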

## colbymchenry#35 — node.symbol tie-break prefers non-fixture, then centrality

src/mcp/tools/symbol-resolver.ts. `pickFromMultipleExactMatches`
now filters out fixture paths first (falls back to all-fixture when
that's all that matches), then sorts by centrality DESC (NULL → 0).
A `helper` symbol that exists in both `src/core.ts` and
`docs/test-beds/fixture.ts` resolves to `src/core.ts` as the displayed
primary. Tier #3 (last_touched_ts) deferred — data not in the
resolver's existing query.

Reviewer-caught DRY issue: the fixture-path regex set was duplicated
between symbol-resolver.ts and dead-code.ts (introduced by parallel
sub-agents on the same brief). Extracted to `isFixturePath` in
src/mcp/tools/shared.ts; both consumers now import the single source.
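
The two-tier tie-break can be sketched as below. The names `pickFromMultipleExactMatches` and `isFixturePath` are from the commit message, but the fixture patterns and the `Match` shape are assumptions; the real regex set in src/mcp/tools/shared.ts is not shown in this conversation.

```typescript
interface Match { name: string; filePath: string; centrality: number | null; }

// Hypothetical patterns; the real fixture-path regex set may differ.
const FIXTURE_PATTERNS: RegExp[] = [
  /(^|\/)__tests__\//,
  /(^|\/)fixtures?\//,
  /(^|\/)test-beds\//,
];

function isFixturePath(filePath: string): boolean {
  return FIXTURE_PATTERNS.some((pattern) => pattern.test(filePath));
}

function pickFromMultipleExactMatches(matches: Match[]): Match | undefined {
  // Tier 1: prefer non-fixture paths, falling back to all matches if none remain.
  const nonFixture = matches.filter((m) => !isFixturePath(m.filePath));
  const pool = nonFixture.length > 0 ? nonFixture : matches;
  // Tier 2: highest centrality first, treating NULL as 0.
  return [...pool].sort((a, b) => (b.centrality ?? 0) - (a.centrality ?? 0))[0];
}
```

Filtering before sorting means a fixture copy never wins over a source file, even when the fixture happens to have higher centrality.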

## colbymchenry#49 — getSummaryCoverage denominator threading (3 call sites)

src/bin/codegraph.ts (lines 348, 1461) + src/mcp/tools/status.ts
(line 440). All three pass `SUMMARIZABLE_KINDS` to getSummaryCoverage
to match the canonical pattern from the previously-fixed
_search-intent.ts:218. Without this, the helper falls back to
COUNT(*) which inflates the denominator with parameters / imports /
file nodes — its own JSDoc explicitly warns against this.
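
To show why the denominator matters, here is an in-memory sketch. The real `getSummaryCoverage` queries the database; the `GraphNode` shape, the contents of `SUMMARIZABLE_KINDS`, and the optional-parameter signature below are assumptions for illustration.

```typescript
interface GraphNode { kind: string; hasSummary: boolean; }

// Assumed contents; the real SUMMARIZABLE_KINDS set is not shown here.
const SUMMARIZABLE_KINDS: ReadonlySet<string> = new Set(["function", "method", "class"]);

// Coverage = summarized nodes / eligible nodes. Without `kinds`, every node
// counts toward the denominator, mirroring the COUNT(*) fallback.
function getSummaryCoverage(nodes: GraphNode[], kinds?: ReadonlySet<string>): number {
  const eligible = kinds ? nodes.filter((n) => kinds.has(n.kind)) : nodes;
  if (eligible.length === 0) return 0;
  return eligible.filter((n) => n.hasSummary).length / eligible.length;
}

const nodes: GraphNode[] = [
  { kind: "function", hasSummary: true },
  { kind: "class", hasSummary: true },
  { kind: "parameter", hasSummary: false },
  { kind: "import", hasSummary: false },
];

console.log(getSummaryCoverage(nodes, SUMMARIZABLE_KINDS)); // 1
console.log(getSummaryCoverage(nodes)); // 0.5
```

In this example every summarizable node has a summary, yet the fallback reports 50% because parameter and import nodes inflate the denominator.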

## Test re-additions

Sub-agent #1 deleted its own test files for colbymchenry#33 and colbymchenry#35 (a brief
misread — "DO NOT commit" was interpreted as "DO NOT leave tests in
repo"). Re-added as
`__tests__/mcp-callers-constructor-and-fixture-tiebreak.test.ts`
covering: constructor-with-no-callers note appears,
non-constructor-method note absent, name-collision picks non-fixture
primary.

## Verification

- 15 modified files + 2 new test files, +619/-55
- npm run typecheck — clean
- 74/74 tests pass across 9 LLM/search/hotspots-related test files
- New exports: `isFixturePath` (shared.ts), `getCategorizedHotspots`
  (queries-history.ts), `getFileFollowEarliestTs` (git-utils.ts) —
  all have concrete in-tree callers in the same diff per
  reviewer-memo item #7

Reviewer pass with .claude/reviewer-memo.md prepended caught:
- (request_changes) bucket-exclusivity edge case → fixed
- (info) isFixturePath duplication → deduped
- (info) category='all' missing more-hint → added

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 participants