feat(search): field-qualified queries (kind:/lang:/path:/name:) + fuzzy typo fallback #131
Merged
colbymchenry merged 3 commits on May 8, 2026
…zy typo fallback
Two UX improvements that turn a free-text search into something a
real user can drive precisely.
1) Field-qualified queries.
A new query parser (src/search/query-parser.ts) splits the raw query
into structured filters and a free-text remainder:
kind:function name:auth path:src/api authenticate
becomes
{ kinds: ['function'], nameFilters: ['auth'],
pathFilters: ['src/api'], text: 'authenticate' }
Filters compose with the SearchOptions arg (intersection). Unknown
prefixes pass through as plain text so `query "TODO:"` keeps working.
Quoted values (`path:"my dir"`) handle whitespace. When the user
specifies only filters with no text, the search uses a filter-only
candidate scan instead of bailing out.
Recognised today:
kind: any NodeKind value
lang: any Language value (alias: language:)
path: case-insensitive substring of file_path
name: case-insensitive substring of node.name
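The parsing rules above can be sketched roughly as follows. This is a minimal illustration, not the code in src/search/query-parser.ts: `KNOWN_KINDS`, `KNOWN_LANGS`, `tokenize`, `stripQuotes`, and the `languages` field name are assumptions made for the example, and the kind/language lists are a small hypothetical subset.

```typescript
// Hypothetical stand-ins for the project's NodeKind / Language value lists.
const KNOWN_KINDS = new Set(['function', 'method', 'class']);
const KNOWN_LANGS = new Set(['go', 'typescript', 'python']);

interface ParsedQuery {
  kinds: string[];
  languages: string[]; // field name assumed for this sketch
  pathFilters: string[];
  nameFilters: string[];
  text: string;
}

// Split on whitespace, but keep a "..." span inside the current token
// wherever the quote starts, so path:"my dir" stays one token.
function tokenize(raw: string): string[] {
  const tokens: string[] = [];
  let current = '';
  let inQuote = false;
  for (let i = 0; i < raw.length; i++) {
    const ch = raw.charAt(i);
    if (ch === '"') {
      inQuote = !inQuote;
      current += ch;
    } else if (!inQuote && /\s/.test(ch)) {
      if (current) tokens.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current) tokens.push(current);
  return tokens;
}

// Remove one pair of surrounding double quotes, if present.
function stripQuotes(value: string): string {
  return value.length >= 2 && value.startsWith('"') && value.endsWith('"')
    ? value.slice(1, -1)
    : value;
}

function parseQuery(raw: string): ParsedQuery {
  const out: ParsedQuery = {
    kinds: [], languages: [], pathFilters: [], nameFilters: [], text: '',
  };
  const textParts: string[] = [];
  for (const token of tokenize(raw)) {
    const colon = token.indexOf(':');
    const field = colon > 0 ? token.slice(0, colon).toLowerCase() : '';
    const value = colon > 0 ? stripQuotes(token.slice(colon + 1)) : '';
    if (field === 'kind' && KNOWN_KINDS.has(value)) out.kinds.push(value);
    else if ((field === 'lang' || field === 'language') && KNOWN_LANGS.has(value)) out.languages.push(value);
    else if (field === 'path' && value) out.pathFilters.push(value);
    else if (field === 'name' && value) out.nameFilters.push(value);
    else textParts.push(token); // unknown prefix, unknown value, or plain word
  }
  out.text = textParts.join(' ');
  return out;
}
```

In this sketch anything that is not a recognised field with a usable value falls through to the free-text remainder, which is how a query such as `TODO:` or a URL keeps behaving like plain text.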
2) Fuzzy fallback.
When BOTH FTS and LIKE return nothing AND the text is at least 3
chars, the resolver scans the distinct-name set with a bounded
Damerau-Levenshtein-style edit distance (≤2 for ≥5 chars, ≤1 for
4-char queries, off for shorter). Bounded edit-distance early-exits
once the row min exceeds maxDist, so this stays O(distinct-names *
avg-name-length) with a very low constant.
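As a rough illustration of the bounded distance, here is a sketch using the optimal-string-alignment variant (insert, delete, substitute, adjacent transposition) with a row-minimum early exit. The function names and the choice of variant are assumptions for this example, not necessarily what the resolver implements; the thresholds follow the list above, with queries shorter than 4 characters treated as fuzzy-off.

```typescript
// Thresholds from the description: <=2 for >=5 chars, <=1 for 4 chars, off otherwise.
function maxDistFor(query: string): number {
  if (query.length >= 5) return 2;
  if (query.length === 4) return 1;
  return 0; // fuzzy matching off
}

// Returns the edit distance between a and b, or maxDist + 1 as soon as the
// distance is known to exceed maxDist. Comparison is case-sensitive.
function boundedEditDistance(a: string, b: string, maxDist: number): number {
  // Length-difference shortcut: at least that many insertions/deletions are needed.
  if (Math.abs(a.length - b.length) > maxDist) return maxDist + 1;

  let prev2: number[] = []; // row i - 2, needed for transpositions
  let prev: number[] = [];  // row i - 1
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      let d = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution or match
      );
      if (
        i > 1 && j > 1 &&
        a.charAt(i - 1) === b.charAt(j - 2) &&
        a.charAt(i - 2) === b.charAt(j - 1)
      ) {
        d = Math.min(d, prev2[j - 2] + 1); // adjacent transposition
      }
      curr.push(d);
      if (d < rowMin) rowMin = d;
    }
    // Row minima never decrease, so no later row can get back under maxDist.
    if (rowMin > maxDist) return maxDist + 1;
    prev2 = prev;
    prev = curr;
  }
  const dist = prev[b.length];
  return dist > maxDist ? maxDist + 1 : dist;
}
```

With these definitions `getUssr` is one substitution away from `getUser`, and `confg` is two edits away from `Config` (a case change plus an insertion), so both fall inside the 2-edit budget for queries of 5 or more characters.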
Verified live against ollama/ollama@v0.22.0:
query "kind:function auth" → only function-kind hits
query "lang:go path:server route" → Go files under server/
query "getUssr" (typo) → finds getUser, SetUser
query "confg" (typo) → finds Config
Full test suite: 380 passed.
…fuzzy fan-out cap, larger filter-only over-fetch, unit tests

Five fixes from independent review:

- parseQuery tokenizer: quotes that appear MID-token (path:"my dir/ file") were not being recognised; only quotes at the start of a token were treated as quoted spans. The fixture path:"my dir" parsed as ['path:"my', 'dir"'] instead of ['path:"my dir"']. Tokeniser is now a single state machine that scans into a token until whitespace OR a quote, and recognises quotes anywhere within the token (skips to the matching close quote).
- searchNodesFuzzy: cap the per-name follow-up SQL queries at Math.max(limit*2, 50) AFTER edit-distance filtering. Without this, a project with many similar names (getUser1, getUser2...) could fan out far beyond limit queries before the inner-loop break kicks in.
- searchAllByFilters (filter-only no-text path): bumped over-fetch multiplier from 2× to 5× so a selective post-filter (e.g. path:src/very/specific/file.ts) doesn't return fewer than limit results despite the DB having matches.
- 23 new unit tests in __tests__/search-query-parser.test.ts:
  parseQuery covers known-field filter, lang/language alias, multiple kind: ORs, quoted spans (incl. mid-token), URL passthrough, empty-value passthrough, unknown prefix passthrough, unknown value passthrough, all-filters-no-text, empty input, 20k-char input.
  boundedEditDistance covers identity, single insertion/deletion/substitution, length-difference shortcut, empty inputs, case-sensitivity, early-exit correctness.

Full test suite: 853 passed (up from 830).
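The fan-out cap and the over-fetch multiplier described above might look something like the sketch below. The helper names (`capFuzzyCandidates`, `filterOnlyFetchSize`, `applyPostFilter`) and the sort-by-distance step are hypothetical, introduced only for illustration; only the `Math.max(limit*2, 50)` cap and the 5× multiplier come from the commit message.

```typescript
interface FuzzyCandidate {
  name: string;
  distance: number; // edit distance to the query, already within maxDist
}

// Keep at most Math.max(limit * 2, 50) names AFTER edit-distance filtering,
// so the number of per-name follow-up queries is bounded.
// Sorting closest-first is an assumption of this sketch.
function capFuzzyCandidates(candidates: FuzzyCandidate[], limit: number): FuzzyCandidate[] {
  const cap = Math.max(limit * 2, 50);
  return [...candidates]
    .sort((x, y) => x.distance - y.distance || x.name.localeCompare(y.name))
    .slice(0, cap);
}

// Filter-only path: fetch more rows than requested so a selective
// post-filter still has enough rows left to fill the limit.
function filterOnlyFetchSize(limit: number, multiplier = 5): number {
  return limit * multiplier;
}

// Apply the post-filter to the over-fetched rows, then trim to the limit.
function applyPostFilter<T>(rows: T[], keep: (row: T) => boolean, limit: number): T[] {
  return rows.filter(keep).slice(0, limit);
}
```

For a limit of 10, the cap allows 50 follow-up lookups and the filter-only path requests 50 rows, which leaves headroom when a filter such as path:src/very/specific/file.ts discards most of them.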
andreinknv added a commit to andreinknv/codegraph that referenced this pull request on Apr 28, 2026
# Conflicts:
#	src/db/queries.ts
andreinknv added a commit to andreinknv/codegraph that referenced this pull request on Apr 29, 2026
Adds Steps K-O to walk the new PRs in dependency order:

K: bug-fix wave (clean): colbymchenry#128, colbymchenry#129
L: resolution + search: colbymchenry#130 (resolve), colbymchenry#131 (resolve)
M: extraction edges: colbymchenry#134 (resolve)
N: biomarker stack: colbymchenry#132, colbymchenry#133 (both resolve, on top of colbymchenry#125)
O: search advanced: colbymchenry#135 (resolve, on top of colbymchenry#131)

Also flips colbymchenry#125 from merge_clean to merge_resolve - it now hits a queries.ts conflict after the Phase-4 stack lands (colbymchenry#111/colbymchenry#112/colbymchenry#123/colbymchenry#124 all extend the same QueryBuilder surface, so colbymchenry#125's biomarker columns no longer apply cleanly without a resolution).

Validated end-to-end against colbymchenry/main HEAD: script ran clean through all 43 PRs, npm run build succeeded, full test suite reports 877/877 passing (was 829 before this wave: +48 from new tests added by the new PRs plus the reviewer-driven follow-ups).
Convert NodeKind and Language to runtime-iterable as const arrays (NODE_KINDS, LANGUAGES) so the query parser imports the canonical list instead of duplicating it. Also fix the path: JSDoc to say substring (matches the .includes() impl). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
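The `as const` pattern this commit refers to can be sketched as below. The array contents are an illustrative subset (the `typescript` and `python` entries are placeholders), and the `isNodeKind` / `isLanguage` guards are hypothetical helpers rather than functions confirmed to exist in the project.

```typescript
// Illustrative subset only; the canonical lists live in the project.
const NODE_KINDS = ['function', 'method', 'class'] as const;
type NodeKind = (typeof NODE_KINDS)[number]; // 'function' | 'method' | 'class'

const LANGUAGES = ['go', 'typescript', 'python'] as const;
type Language = (typeof LANGUAGES)[number];

// Runtime checks derived from the same arrays the types come from,
// so a parser never needs its own copy of the list.
function isNodeKind(value: string): value is NodeKind {
  return (NODE_KINDS as readonly string[]).includes(value);
}

function isLanguage(value: string): value is Language {
  return (LANGUAGES as readonly string[]).includes(value);
}
```

Deriving the union type from the array keeps the compile-time type and the runtime list in one place, so adding a kind or language updates both at once.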
Owner
Reviewed and merging. Pushed a small polish commit:
The field-qualified syntax and the bounded-edit-distance fuzzy fallback are both clean, well-tested wins. Thanks for the contribution.
Summary
Two UX improvements that turn free-text search into something a user can drive precisely.
1. Field-qualified queries
A new query parser splits the raw query into structured filters and a free-text remainder: `kind:function name:auth path:src/api authenticate` becomes `{ kinds: ['function'], nameFilters: ['auth'], pathFilters: ['src/api'], text: 'authenticate' }`.

Filters compose with the `SearchOptions` arg (intersection). Unknown prefixes pass through as plain text so `query "TODO:"` keeps working. Quoted values (`path:"my dir"`) handle whitespace. When the user supplies only filters with no text, the search uses a filter-only candidate scan instead of bailing out.

Recognised fields:
- `kind:` → `NodeKind` value (`function`, `method`, `class`, ...)
- `lang:` (alias `language:`) → `Language` value
- `path:` → `file_path`
- `name:` → `node.name`

2. Fuzzy typo fallback
When both FTS and LIKE return nothing AND the text is at least 3 chars, scan the distinct-name set with a bounded edit distance (≤2 for ≥5-char queries, ≤1 for 4-char). Bounded edit distance early-exits once the row min exceeds `maxDist`, so the per-query cost stays O(distinct-names × avg-name-length) with a very low constant.

Test plan
Verified live against ollama/ollama@v0.22.0:
- `kind:function auth` → only function-kind hits
- `lang:go path:server route` → Go files under `server/`
- `getUssr` (typo) → finds `getUser`, `SetUser`
- `confg` (typo) → finds `Config`
- `npx vitest run`: 380 passed
- `npx tsc --noEmit` clean
- `npm run build` succeeds

🤖 Generated with Claude Code