⚡ Bolt: O(N) to O(K) subset filtering in SearchEngine hot paths#401
Conversation
Replaced O(N) full-index iteration with O(F) file iteration in `rebuildPrioritizedFileItems` and `addUrlMatches`. This noticeably speeds up cache invalidations and phases by iterating only over the targeted subsets. Co-authored-by: AhmmedSamier <17784876+AhmmedSamier@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
📝 WalkthroughWalkthroughSearchEngine hot paths now iterate precomputed subsets instead of scanning all indexed items: file-item iteration uses ChangesSearchEngine Optimization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.jules/bolt.md:
- Around line 13-14: The docs have an off-by-one ambiguity: when IDs are "from 0
to N" inclusive, `new Uint8Array(maxIndex)` is too small because valid indices
are 0..maxIndex-1; update the guidance to either require `new Uint8Array(N+1)`
when IDs range 0..N inclusive or reword to state IDs are 0..N-1; reference the
current symbols `Set<number>`, `new Uint8Array(maxIndex)`, and `array[id] = 1`
so the change clarifies sizing for `maxIndex` (use N+1) or adjusts the described
ID bounds accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 780e21d1-9e83-4097-82fc-a7bf95ac1dcc
📒 Files selected for processing (2)
.jules/bolt.mdlanguage-server/src/core/search-engine.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- language-server/src/core/search-engine.ts
| **Learning:** When keeping track of seen integer IDs that are dense and bounded (e.g. from 0 to N), using `new Set<number>()` incurs heavy allocation and insertion overhead compared to a fixed-size byte array. | ||
| **Action:** Replace `Set<number>` with `new Uint8Array(maxIndex)` and use `array[id] = 1` to track presence, which is ~15x faster and avoids garbage collection pauses in hot paths. (Benchmark context: `N=100,000` IDs, `bun` version 1.2.14, Linux x86_64, Intel Xeon 2.30GHz, 4 cores, 8GB RAM, averaged over 100 iterations comparing `Set<number>` addition vs `new Uint8Array(maxIndex)` indexed assignment `array[id] = 1`). |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In JavaScript, for new Uint8Array(length), what are the valid index bounds, and if IDs are in the inclusive range 0..N, should the length be N or N+1?
💡 Result:
Valid index bounds for new Uint8Array(length): indices are 0 through length−1 inclusive. Accessing an index ≥ length is out of bounds and yields undefined (for property access) / is not within the element range; e.g., TypedArray.prototype.at(index) is specified to return undefined when index >= array.length [1] and MDN notes that indexed properties consult the ArrayBuffer and can be undefined when indices are out of bound [2]. If your IDs are in the inclusive range 0..N, you need an array length of N+1 so that the last valid index N exists (since the last index equals length−1). (More generally, the typed array’s length is the number of elements.) [3][4] Practical mapping: 0..N (inclusive) → allocate new Uint8Array(N+1); then IDs 0..N correspond exactly to indices 0..(N+1−1).
Citations:
- 1: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/at
- 2: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray
- 3: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/length
- 4: https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/length
Clarify Uint8Array sizing to avoid off-by-one bug
The documentation states IDs are "from 0 to N" (inclusive), but recommends new Uint8Array(maxIndex), which has valid indices 0..maxIndex-1. For IDs 0..N inclusive, use new Uint8Array(N+1) or clarify that bounds are 0..N-1.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.jules/bolt.md around lines 13 - 14, The docs have an off-by-one ambiguity:
when IDs are "from 0 to N" inclusive, `new Uint8Array(maxIndex)` is too small
because valid indices are 0..maxIndex-1; update the guidance to either require
`new Uint8Array(N+1)` when IDs range 0..N inclusive or reword to state IDs are
0..N-1; reference the current symbols `Set<number>`, `new Uint8Array(maxIndex)`,
and `array[id] = 1` so the change clarifies sizing for `maxIndex` (use N+1) or
adjusts the described ID bounds accordingly.
💡 What: Replaced O(N) iteration over all indexed items in
rebuildPrioritizedFileItemsandaddUrlMatcheswith subset iterations using pre-populated caches (fileItemByNormalizedPathandscopedIndices).🎯 Why: Iterating over hundreds of thousands of items just to process the subset of files or endpoints adds unnecessary overhead and slows down caching operations.
📊 Impact: Eliminates ~100k loop iterations on large repos, noticeably speeding up stream search startup and cache invalidations.
🔬 Measurement: Verified that tests pass via
cd language-server && bun run lint && bun test.PR created automatically by Jules for task 1895633848562851465 started by @AhmmedSamier
Summary by CodeRabbit
Documentation
Refactor