Fix burst search dropping valid results due to early cutoff #253

AhmmedSamier merged 5 commits into master from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Adds lexical/name-based scoring alongside fuzzy scoring, introduces query bitflag pruning in hot paths, moves result emission and slicing out of burst hot paths into a cancellation-aware post-processing pipeline, and adds tests ensuring direct/lexical matches beat weaker fuzzy matches across scopes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant SearchEngine as SearchEngine
    participant BurstLayer as BurstLayer
    participant ProviderIndex as ProviderIndex
    participant URLMatcher as URLMatcher
    participant PostProcess as PostProcess
    Client->>SearchEngine: search(query, maxResults, token)
    SearchEngine->>BurstLayer: findBurstMatches(queryLower, token, queryBits)
    BurstLayer->>ProviderIndex: probe providers/indexes (bitflag-pruned)
    ProviderIndex-->>BurstLayer: rawMatches (lexical + fuzzy scores)
    BurstLayer-->>SearchEngine: burstMatches
    SearchEngine->>URLMatcher: addUrlMatches(burstMatches, preparedQuery)
    URLMatcher-->>SearchEngine: mergedMatches
    SearchEngine->>PostProcess: rankAndBoost(mergedMatches, personalization, activity)
    PostProcess-->>SearchEngine: finalRankedResults
    SearchEngine->>Client: emit results (sliced to maxResults, cancellation-aware)
```
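The "bitflag-pruned" probe in the diagram refers to the query bitflag pruning the walkthrough mentions. A minimal sketch of that idea, assuming a simple 26-bit letter encoding (the names `computeBitflags` and `mayMatch` are illustrative, not the PR's actual API):

```typescript
// Hypothetical sketch of character-bitflag pruning: each ASCII letter maps
// to one of 26 bits. An item can only possibly match a query if it contains
// at least every letter the query contains (a superset check), so items
// failing the check are skipped before any expensive fuzzy scoring runs.
function computeBitflags(text: string): number {
  let flags = 0;
  for (let i = 0; i < text.length; i++) {
    const c = text.charCodeAt(i) | 0x20; // cheap ASCII lowercase
    if (c >= 97 && c <= 122) flags |= 1 << (c - 97);
  }
  return flags;
}

function mayMatch(itemFlags: number, queryFlags: number): boolean {
  // Survives pruning only if the item's flags are a superset of the query's.
  return (itemFlags & queryFlags) === queryFlags;
}

console.log(mayMatch(computeBitflags("users/{id}"), computeBitflags("usr"))); // true
console.log(mayMatch(computeBitflags("orders"), computeBitflags("usr")));     // false
```

The check is conservative: it never rejects a true match, only filters out items that cannot match, which is what makes it safe to run in the hot path.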
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed checks
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
language-server/src/core/search-engine.ts (1)
2100-2105: ⚠️ Potential issue | 🟠 Major — URL augmentation still short-circuits candidate collection.
At line 2104, `addUrlMatches(..., maxResults)` can stop scanning URL candidates early, which can still drop valid high-scoring endpoint matches before final ranking.

Suggested fix:

```diff
-    this.addUrlMatches(results, indices, preparedQuery, maxResults);
+    this.addUrlMatches(results, indices, preparedQuery, cancellationToken);

-  private addUrlMatches(
-    results: SearchResult[],
-    indices: number[] | undefined,
-    queryOrPrepared: string | PreparedPath,
-    maxResults?: number,
-  ): void {
+  private addUrlMatches(
+    results: SearchResult[],
+    indices: number[] | undefined,
+    queryOrPrepared: string | PreparedPath,
+    token?: CancellationToken,
+  ): void {
     const existingIds = new Set(results.map((r) => r.item.id));
     const checkItem = (i: number) => {
-      if (maxResults && results.length >= maxResults) return;
+      if (token?.isCancellationRequested) return;
       const item = this.items[i];
       if (item.type === SearchItemType.ENDPOINT && !existingIds.has(item.id)) {
         const pattern = this.preparedPatterns[i];
         const score = pattern
           ? RouteMatcher.scoreMatchPattern(pattern, queryOrPrepared)
           : RouteMatcher.scoreMatch(item.name, queryOrPrepared);
         if (score > 0) {
           results.push({ item, score, scope: SearchScope.ENDPOINTS });
           existingIds.add(item.id);
         }
       }
     };
     if (indices) {
       for (const i of indices) {
-        if (maxResults && results.length >= maxResults) break;
+        if (token?.isCancellationRequested) break;
         checkItem(i);
       }
     } else {
       for (let i = 0; i < this.items.length; i++) {
-        if (maxResults && results.length >= maxResults) break;
+        if (token?.isCancellationRequested) break;
         checkItem(i);
       }
     }
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@language-server/src/core/search-engine.ts` around lines 2100 - 2105, The URL augmentation path calls this.addUrlMatches(results, indices, preparedQuery, maxResults) which allows addUrlMatches to short-circuit candidate collection and drop endpoint candidates prematurely; change the call to collect all URL candidates (e.g., remove the maxResults limit or pass a very large limit such as Number.MAX_SAFE_INTEGER) so addUrlMatches (referenced by name) scans fully, letting the final ranking step decide which matches to keep; keep the same condition around RouteMatcher.prepare/queryLower and ensure downstream ranking logic still enforces maxResults.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2100-2105: The URL augmentation path calls
this.addUrlMatches(results, indices, preparedQuery, maxResults) which allows
addUrlMatches to short-circuit candidate collection and drop endpoint candidates
prematurely; change the call to collect all URL candidates (e.g., remove the
maxResults limit or pass a very large limit such as Number.MAX_SAFE_INTEGER) so
addUrlMatches (referenced by name) scans fully, letting the final ranking step
decide which matches to keep; keep the same condition around
RouteMatcher.prepare/queryLower and ensure downstream ranking logic still
enforces maxResults.
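The pattern the review pushes toward — scanning the full candidate set and bounding work only by cancellation, never by a result-count cutoff — can be sketched standalone like this (the minimal token interface below mirrors, but is not, the real `CancellationToken` type, and `collectCandidates` is a hypothetical helper):

```typescript
// Minimal stand-in for a cancellation token such as the one in
// vscode-languageserver; only the flag read in the loop is modeled.
interface CancellationToken {
  isCancellationRequested: boolean;
}

// Collect every candidate above the score threshold. The loop stops only
// on cancellation, never on a result count, so the final ranking step sees
// the full candidate set and a strong late match cannot be dropped.
function collectCandidates(
  items: string[],
  score: (item: string) => number,
  token?: CancellationToken,
): { item: string; score: number }[] {
  const results: { item: string; score: number }[] = [];
  for (const item of items) {
    if (token?.isCancellationRequested) break; // honor cancellation mid-loop
    const s = score(item);
    if (s > 0) results.push({ item, score: s });
  }
  return results;
}

// "a" scores 0 and is excluded; "bb" and "ccc" both survive.
const all = collectCandidates(["a", "bb", "ccc"], (s) => s.length - 1);
console.log(all.length); // 2
```

Enforcing `maxResults` then becomes the job of the post-ranking slice, exactly as the prompt's "letting the final ranking step decide" wording suggests.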
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2099-2105: Before calling RouteMatcher.prepare and
this.addUrlMatches, check the search cancellation token and return early if
cancellation was requested; i.e., in the block that tests (scope ===
SearchScope.EVERYTHING || scope === SearchScope.ENDPOINTS) &&
RouteMatcher.isPotentialUrl(queryLower) add a guard (check the existing
cancellation variable or method used across the class, e.g.,
cancellation.isCancellationRequested or this.isCancelled()) and skip creating
preparedQuery and calling this.addUrlMatches(results, indices, preparedQuery)
when cancelled.
♻️ Duplicate comments (1)
language-server/src/core/search-engine.ts (1)
2125-2131: ⚠️ Potential issue | 🟠 Major — Honor cancellation before URL augmentation in burstSearch.
Line 2125 still performs URL expansion even after cancellation. This can continue scanning endpoints after cancellation is requested.
Suggested fix:

```diff
-    if (
+    if (
+      !cancellationToken?.isCancellationRequested &&
       (scope === SearchScope.EVERYTHING || scope === SearchScope.ENDPOINTS) &&
       RouteMatcher.isPotentialUrl(queryLower)
     ) {
       const preparedQuery = RouteMatcher.prepare(queryLower);
-      this.addUrlMatches(results, indices, preparedQuery);
+      this.addUrlMatches(results, indices, preparedQuery, cancellationToken);
     }

   private addUrlMatches(
     results: SearchResult[],
     indices: number[] | undefined,
     queryOrPrepared: string | PreparedPath,
-    maxResults?: number,
+    token?: CancellationToken,
   ): void {
     const existingIds = new Set(results.map((r) => r.item.id));
     const checkItem = (i: number) => {
-      if (maxResults && results.length >= maxResults) return;
+      if (token?.isCancellationRequested) return;
       const item = this.items[i];
       if (item.type === SearchItemType.ENDPOINT && !existingIds.has(item.id)) {
         const pattern = this.preparedPatterns[i];
@@
     if (indices) {
       for (const i of indices) {
-        if (maxResults && results.length >= maxResults) break;
+        if (token?.isCancellationRequested) break;
         checkItem(i);
       }
     } else {
       for (let i = 0; i < this.items.length; i++) {
-        if (maxResults && results.length >= maxResults) break;
+        if (token?.isCancellationRequested) break;
         checkItem(i);
       }
     }
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@language-server/src/core/search-engine.ts` around lines 2125 - 2131, In burstSearch, avoid doing URL expansion after cancellation by checking the search cancellation token/abort signal immediately before the URL-augmentation block: when (scope === SearchScope.EVERYTHING || scope === SearchScope.ENDPOINTS) && RouteMatcher.isPotentialUrl(queryLower) is true, first test the cancellation token (e.g., cancelToken.isCancelled or abortSignal.aborted) and return/skip if cancelled, then call RouteMatcher.prepare and this.addUrlMatches; update the burstSearch implementation to perform this early-cancel check so RouteMatcher.prepare and addUrlMatches are not invoked after cancellation.
🧹 Nitpick comments (1)
language-server/src/core/search-engine.ts (1)
1753-1805: Avoid per-item `toLowerCase()` allocations in the lexical hot path.
`calculateLexicalScore` lowercases names on every candidate evaluation. On large indices, this can become a measurable regression. Consider using precomputed lowercase fields (parallel arrays) to keep this path allocation-free.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@language-server/src/core/search-engine.ts` around lines 1753 - 1805, calculateLexicalScore currently calls item.name.toLowerCase() per candidate; avoid that allocation by precomputing lowercase strings in prepareSearchContext and reading them from the context. Update prepareSearchContext to build parallel arrays (e.g., context.namesLower and context.fullNamesLower) or attach lowercase fields to each item when the search context is prepared, ensure they are populated when the index is created/updated, then change calculateLexicalScore to use context.namesLower[i] and context.fullNamesLower[i] (and still use context.queryLower) instead of calling toLowerCase on item.name/fullName so the lexical hot path is allocation-free; keep the same fallback/score logic and preserve use of ID_TO_BOOST.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2125-2131: In burstSearch, avoid doing URL expansion after
cancellation by checking the search cancellation token/abort signal immediately
before the URL-augmentation block: when (scope === SearchScope.EVERYTHING ||
scope === SearchScope.ENDPOINTS) && RouteMatcher.isPotentialUrl(queryLower) is
true, first test the cancellation token (e.g., cancelToken.isCancelled or
abortSignal.aborted) and return/skip if cancelled, then call
RouteMatcher.prepare and this.addUrlMatches; update the burstSearch
implementation to perform this early-cancel check so RouteMatcher.prepare and
addUrlMatches are not invoked after cancellation.
---
Nitpick comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 1753-1805: calculateLexicalScore currently calls
item.name.toLowerCase() per candidate; avoid that allocation by precomputing
lowercase strings in prepareSearchContext and reading them from the context.
Update prepareSearchContext to build parallel arrays (e.g., context.namesLower
and context.fullNamesLower) or attach lowercase fields to each item when the
search context is prepared, ensure they are populated when the index is
created/updated, then change calculateLexicalScore to use context.namesLower[i]
and context.fullNamesLower[i] (and still use context.queryLower) instead of
calling toLowerCase on item.name/fullName so the lexical hot path is
allocation-free; keep the same fallback/score logic and preserve use of
ID_TO_BOOST.
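The remedy the nitpick above proposes — lowercase strings cached once at index time, read allocation-free at query time — might look like this sketch (the `SearchIndex` shape and score constants here are illustrative, not the PR's actual class):

```typescript
// Illustrative index that caches lowercased names when items are added,
// so the lexical scoring hot path reads from a parallel array instead of
// calling toLowerCase() once per candidate per query.
class SearchIndex {
  private names: string[] = [];
  private namesLower: string[] = []; // kept in sync with `names`

  add(name: string): void {
    this.names.push(name);
    this.namesLower.push(name.toLowerCase()); // pay the allocation once
  }

  // Allocation-free lexical check against the cached lowercase name.
  // The caller is expected to pass an already-lowercased query.
  lexicalScore(i: number, queryLower: string): number {
    const lower = this.namesLower[i];
    if (lower === queryLower) return 100;        // exact match
    if (lower.startsWith(queryLower)) return 50; // prefix match
    return 0;
  }
}

const idx = new SearchIndex();
idx.add("GetUsers");
console.log(idx.lexicalScore(0, "getu")); // 50
```

The trade-off is roughly double the string storage for names, which is usually cheap next to the per-query garbage a hot scoring loop would otherwise produce.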
♻️ Duplicate comments (1)
language-server/src/core/search-engine.ts (1)
2129-2135: ⚠️ Potential issue | 🟠 Major — Honor cancellation before URL augmentation work.
When cancellation is already requested, `RouteMatcher.prepare(...)` and `addUrlMatches(...)` still run. This can continue scanning after cancel and delay responsiveness.

Suggested fix:

```diff
-    if (
+    if (
+      !cancellationToken?.isCancellationRequested &&
       (scope === SearchScope.EVERYTHING || scope === SearchScope.ENDPOINTS) &&
       RouteMatcher.isPotentialUrl(queryLower)
     ) {
       const preparedQuery = RouteMatcher.prepare(queryLower);
-      this.addUrlMatches(results, indices, preparedQuery);
+      this.addUrlMatches(results, indices, preparedQuery, cancellationToken);
     }

   private addUrlMatches(
     results: SearchResult[],
     indices: number[] | undefined,
     queryOrPrepared: string | PreparedPath,
-    maxResults?: number,
+    token?: CancellationToken,
   ): void {
@@
-      if (maxResults && results.length >= maxResults) return;
+      if (token?.isCancellationRequested) return;
@@
     if (indices) {
       for (const i of indices) {
-        if (maxResults && results.length >= maxResults) break;
+        if (token?.isCancellationRequested) break;
         checkItem(i);
       }
     } else {
       for (let i = 0; i < this.items.length; i++) {
-        if (maxResults && results.length >= maxResults) break;
+        if (token?.isCancellationRequested) break;
         checkItem(i);
       }
     }
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@language-server/src/core/search-engine.ts` around lines 2129 - 2135, Before performing URL augmentation, check the search cancellation flag/token and bail out early if cancellation is requested; specifically, inside the block that checks SearchScope.EVERYTHING/ENDPOINTS and RouteMatcher.isPotentialUrl(queryLower), verify the cancellation state (e.g., this.cancellationRequested or the provided cancellationToken/isCancellationRequested) and return without calling RouteMatcher.prepare(...) or this.addUrlMatches(results, indices, ... ) when cancelled so the expensive prepare/addUrlMatches work is skipped.
🧹 Nitpick comments (1)
language-server/src/core/search-engine.ts (1)
1758-1762: Avoid per-item lowercase allocations in the lexical scoring hot path.
`calculateLexicalScore` currently lowercases `item.name` (and may lowercase `fullName` downstream) for every scanned item per query. Caching lowercased name/fullName in parallel arrays would reduce GC pressure on large indexes.

Also applies to: 1789-1809
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@language-server/src/core/search-engine.ts` around lines 1758 - 1762, calculateLexicalScore is allocating lowercased strings per-item on every query which pressures the GC; precompute and reuse lowercased versions instead. Add parallel arrays (e.g., lowerNames and lowerFullNames) populated once when the item index is built/updated and replace all per-call item.name.toLowerCase()/fullName.toLowerCase() usages inside calculateLexicalScore (and related code paths referenced around calculateLexicalScore and lines 1789-1809) to read from those arrays; ensure the arrays are kept in sync with the main items array when items are added/removed/updated so calculateFuzzyScore and other callers continue to use the original item data while lexical scoring uses cached lowercase strings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2129-2135: Before performing URL augmentation, check the search
cancellation flag/token and bail out early if cancellation is requested;
specifically, inside the block that checks SearchScope.EVERYTHING/ENDPOINTS and
RouteMatcher.isPotentialUrl(queryLower), verify the cancellation state (e.g.,
this.cancellationRequested or the provided
cancellationToken/isCancellationRequested) and return without calling
RouteMatcher.prepare(...) or this.addUrlMatches(results, indices, ... ) when
cancelled so the expensive prepare/addUrlMatches work is skipped.
---
Nitpick comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 1758-1762: calculateLexicalScore is allocating lowercased strings
per-item on every query which pressures the GC; precompute and reuse lowercased
versions instead. Add parallel arrays (e.g., lowerNames and lowerFullNames)
populated once when the item index is built/updated and replace all per-call
item.name.toLowerCase()/fullName.toLowerCase() usages inside
calculateLexicalScore (and related code paths referenced around
calculateLexicalScore and lines 1789-1809) to read from those arrays; ensure the
arrays are kept in sync with the main items array when items are
added/removed/updated so calculateFuzzyScore and other callers continue to use
the original item data while lexical scoring uses cached lowercase strings.
Force-pushed 8f350ed to 8d1a82b
Actionable comments posted: 2
♻️ Duplicate comments (1)
language-server/src/core/search-engine.ts (1)
2175-2180: ⚠️ Potential issue | 🟠 Major — Cancellation is pre-checked before URL enrichment, but not honored during URL scanning.
After the one-time check in `enrichResults`, `addUrlMatches` can still scan all items even if cancellation flips mid-loop.

💡 Proposed fix:

```diff
   private async enrichResults(
     results: SearchResult[],
     scope: SearchScope,
     queryLower: string,
     token?: CancellationToken,
   ): Promise<SearchResult[]> {
@@
       if (token?.isCancellationRequested) {
         return results;
       }
       const preparedQuery = RouteMatcher.prepare(queryLower);
-      this.addUrlMatches(results, undefined, preparedQuery);
+      this.addUrlMatches(results, undefined, preparedQuery, undefined, token);
     }
     return results;
   }

   private addUrlMatches(
     results: SearchResult[],
     indices: number[] | undefined,
     queryOrPrepared: string | PreparedPath,
     maxResults?: number,
+    token?: CancellationToken,
   ): void {
@@
     const checkItem = (i: number) => {
+      if (token?.isCancellationRequested) return;
       if (maxResults && results.length >= maxResults) return;
@@
     if (indices) {
       for (const i of indices) {
+        if (token?.isCancellationRequested) break;
         if (maxResults && results.length >= maxResults) break;
         checkItem(i);
       }
     } else {
       for (let i = 0; i < this.items.length; i++) {
+        if (token?.isCancellationRequested) break;
         if (maxResults && results.length >= maxResults) break;
         checkItem(i);
       }
     }
   }
```

Also applies to: 2365-2404
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@language-server/src/core/search-engine.ts` around lines 2175 - 2180, enrichResults does a one-time cancellation check but then calls addUrlMatches which can still fully scan items if cancellation occurs mid-loop; modify addUrlMatches (and any other scanning helpers invoked from enrichResults, e.g., the scanning block referenced around lines 2365-2404) to accept the cancellation token (token) and check token.isCancellationRequested inside its iteration(s), returning early when canceled; update enrichResults to pass the token through when calling addUrlMatches so cancellation is honored during URL scanning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2236-2243: The current calculateMatchScore early-returns whenever
calculateNameMatchScore(nameLower, queryLower) > 0, which can under-rank
stronger fullName matches; change calculateMatchScore to always compute both
nameScore and fullNameScore (call calculateFullNameMatchScore(fullName,
nameLower, queryLower)) and return the higher of the two; also fix
calculateFullNameMatchScore to compare actual string equality instead of using
length equality (use fullName === nameLower or fullName === queryLower as
appropriate) so exact fullName matches aren't suppressed by length checks;
update references to calculateNameMatchScore and calculateFullNameMatchScore
accordingly.
- Around line 1729-1732: The current bitflag pruning uses context.queryBitflags
vs context.itemBitflags[i] and is too strict for pattern-based endpoints (e.g.,
/users/42 vs /users/{id}); change the comparison so that before early-return you
mask out dynamic-character flags (digits and punctuation) from the query
bitflags when evaluating pattern endpoints — compute maskedQuery =
context.queryBitflags & ~DYNAMIC_CHAR_FLAGS (using existing flag constants like
FLAG_DIGIT/FLAG_PUNCTUATION) and then require (context.itemBitflags[i] &
maskedQuery) === maskedQuery; apply the same masked comparison in the other
occurrence around lines 2205-2208 so pattern endpoints aren’t incorrectly
pruned.
---
Duplicate comments:
In `@language-server/src/core/search-engine.ts`:
- Around line 2175-2180: enrichResults does a one-time cancellation check but
then calls addUrlMatches which can still fully scan items if cancellation occurs
mid-loop; modify addUrlMatches (and any other scanning helpers invoked from
enrichResults, e.g., the scanning block referenced around lines 2365-2404) to
accept the cancellation token (token) and check token.isCancellationRequested
inside its iteration(s), returning early when canceled; update enrichResults to
pass the token through when calling addUrlMatches so cancellation is honored
during URL scanning.
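The masked-bitflag idea from the inline comment above — ignoring the query's digit/punctuation bits when pruning pattern endpoints so `/users/42` can still reach `/users/{id}` — can be sketched as follows (the flag layout and `survivesPruning` helper are hypothetical; the PR's actual encoding may differ):

```typescript
// Hypothetical flag layout: bits 0-25 for lowercase letters, bit 26 for
// any digit, bit 27 for any punctuation. Input is assumed pre-lowercased.
const FLAG_DIGIT = 1 << 26;
const FLAG_PUNCTUATION = 1 << 27;
const DYNAMIC_CHAR_FLAGS = FLAG_DIGIT | FLAG_PUNCTUATION;

function flagsOf(text: string): number {
  let flags = 0;
  for (const ch of text) {
    if (ch >= "a" && ch <= "z") flags |= 1 << (ch.charCodeAt(0) - 97);
    else if (ch >= "0" && ch <= "9") flags |= FLAG_DIGIT;
    else if (!/\s/.test(ch)) flags |= FLAG_PUNCTUATION;
  }
  return flags;
}

// For pattern endpoints, mask out the dynamic-character bits of the query
// before the superset check, since concrete digits in the query correspond
// to placeholders like {id} in the pattern, not literal characters.
function survivesPruning(itemFlags: number, queryFlags: number, isPattern: boolean): boolean {
  const required = isPattern ? queryFlags & ~DYNAMIC_CHAR_FLAGS : queryFlags;
  return (itemFlags & required) === required;
}

const item = flagsOf("/users/{id}");
const query = flagsOf("/users/42");
console.log(survivesPruning(item, query, false)); // false — strict check prunes the pattern
console.log(survivesPruning(item, query, true));  // true  — masked check lets it through
```

Masking only relaxes the filter for pattern endpoints, so literal items keep the full pruning benefit.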
Motivation
Description
- Reworked `language-server/src/core/search-engine.ts` so `burstSearch` first collects all matching candidates (via `findBurstMatches`) and then ranks and truncates using `results.sort(...).slice(0, maxResults)` instead of stopping during traversal.
- Moved result emission (`onResult`) to occur after ranking/truncation so streamed notifications match the final returned results.
- Removed `maxResults`/results short-circuiting and updated signatures for `findBurstMatches`, `searchAllScopesInPriorityOrder`, `searchPriorityScopes`, and `searchRemainingItems` to respect only cancellation tokens.
- Added a regression test `should prefer best-scoring result across all scopes in burstSearch` in `language-server/src/core/search-engine.test.ts` to ensure best-scoring matches from non-priority scopes are returned.

Testing

- Ran `cd language-server && bun run test` and `cd language-server && bun test` and observed all tests pass (including the new regression test).
- Ran `cd vscode-extension && bun run compile` to verify packaging and artifacts copy completed successfully.
- Ran `cd language-server && bun test src/core/search-engine.test.ts` to validate the specific changes; the new test passed.

Codex Task
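The collect-then-rank change the description outlines reduces to a small generic pattern, sketched here with illustrative names (`rankAndSlice` is not the PR's actual function):

```typescript
// Generic collect-then-rank: gather every scored candidate first, sort
// once, and slice to maxResults only at the very end, so a weak early
// match can never crowd a stronger late match out of the result window.
function rankAndSlice<T>(
  candidates: { item: T; score: number }[],
  maxResults: number,
): { item: T; score: number }[] {
  return candidates
    .slice() // avoid mutating the caller's array
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}

const top = rankAndSlice(
  [
    { item: "fuzzy-weak", score: 10 },
    { item: "lexical-exact", score: 100 },
    { item: "fuzzy-mid", score: 40 },
  ],
  2,
);
console.log(top.map((r) => r.item)); // ["lexical-exact", "fuzzy-mid"]
```

With a truncate-during-traversal approach, `lexical-exact` would have been dropped whenever two weaker candidates happened to be scanned first — the exact bug class this PR fixes.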
Summary by CodeRabbit
New Features
Bug Fixes
Tests