Refactor vector_search to bounded top-k heap #56

Merged
Meru143 merged 2 commits into main from codex/refactor-vector_search-to-use-bounded-heap on Feb 25, 2026

@Meru143 Meru143 commented Feb 25, 2026

Motivation

  • Reduce memory growth and avoid collecting/sorting all rows in CodeIndex::vector_search by keeping only the top limit candidates while scanning embeddings.
  • Make ordering deterministic for equal scores and add tests that validate behavior on large synthetic inputs.

Description

  • Replace the unbounded Vec<(f64, CodeChunk)> + full sort() with a bounded min-heap (BinaryHeap<Reverse<ScoredChunk>>) maintained by a push_top_k helper to keep at most limit items while iterating rows.
  • Introduce ScoredChunk with Ord/PartialOrd implemented via total_cmp and an ordinal insertion index to ensure deterministic ties.
  • Add an early return for limit == 0, and finalize results by draining the heap into a Vec and sorting the small top-k list before returning.
  • Add two deterministic tests: vector_search_large_input_matches_full_sort_expectation (20k synthetic rows), which verifies functional equivalence with a full-sort baseline, and push_top_k_keeps_heap_bounded, which asserts that the heap size stays bounded.
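
As a rough illustration of the approach described above, here is a minimal sketch of a bounded top-k heap. It is not the code from this PR: the field names, the `push_top_k` signature, the `top_k` driver, and the tie direction (the earlier row wins on equal scores) are all assumptions, and the `CodeChunk` payload is left out to keep it short.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// A scored candidate. The real struct presumably also carries the CodeChunk;
/// the payload is omitted here (assumption made for brevity).
#[derive(Debug, Clone, Copy)]
struct ScoredChunk {
    score: f64,
    ordinal: usize, // insertion index assigned while scanning rows
}

impl Ord for ScoredChunk {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp gives f64 a total order, so NaN cannot break the heap.
        // On equal scores the earlier row ranks higher (assumed tie direction).
        self.score
            .total_cmp(&other.score)
            .then_with(|| other.ordinal.cmp(&self.ordinal))
    }
}

impl PartialOrd for ScoredChunk {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for ScoredChunk {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for ScoredChunk {}

/// Keep at most `limit` candidates. Wrapping items in `Reverse` turns the
/// max-heap into a min-heap, so `peek()` returns the weakest kept candidate.
fn push_top_k(heap: &mut BinaryHeap<Reverse<ScoredChunk>>, candidate: ScoredChunk, limit: usize) {
    if heap.len() < limit {
        heap.push(Reverse(candidate));
    } else if heap.peek().map_or(false, |Reverse(weakest)| candidate > *weakest) {
        // The candidate beats the weakest kept item: evict it and insert.
        heap.pop();
        heap.push(Reverse(candidate));
    }
}

/// Hypothetical driver: scan every score, then sort only the surviving
/// top-k list, best first.
fn top_k(scores: &[f64], limit: usize) -> Vec<ScoredChunk> {
    if limit == 0 {
        return Vec::new(); // early return, as in the PR description
    }
    let mut heap = BinaryHeap::new();
    for (ordinal, &score) in scores.iter().enumerate() {
        push_top_k(&mut heap, ScoredChunk { score, ordinal }, limit);
    }
    let mut top: Vec<ScoredChunk> = heap.into_iter().map(|Reverse(c)| c).collect();
    top.sort_by(|a, b| b.cmp(a));
    top
}
```

With this shape the heap never holds more than `limit` entries, so the scan costs O(n log k) time and O(k) candidate memory instead of sorting all n rows.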

Testing

  • Ran cargo fmt --all successfully.
  • Ran the store-focused tests with cargo test -p argus-codelens store::tests:: and all 13 passed (0 failed), including vector_search_large_input_matches_full_sort_expectation and push_top_k_keeps_heap_bounded.
  • Also ran the tests for the new helper and the vector search behavior on their own and observed no failures.

Copilot AI review requested due to automatic review settings February 25, 2026 09:30

Copilot AI left a comment

Pull request overview

This PR refactors the vector_search method to use a bounded min-heap instead of collecting and sorting all search results, so the number of candidates held in memory is capped at limit rather than growing with the number of rows scanned. The implementation introduces a ScoredChunk struct with deterministic ordering via total_cmp and an ordinal index for tie-breaking, along with a push_top_k helper function to maintain heap bounds while scanning embeddings.

Changes:

  • Introduced ScoredChunk struct with Ord/PartialOrd traits using total_cmp and ordinal-based tie-breaking for deterministic ordering
  • Replaced unbounded Vec<(f64, CodeChunk)> collection with bounded BinaryHeap<Reverse<ScoredChunk>> maintained via new push_top_k helper
  • Added comprehensive tests validating bounded heap behavior and functional equivalence with full-sort baseline on 20k synthetic rows

@Meru143 Meru143 merged commit 6a6f0fb into main Feb 25, 2026
1 check passed
@Meru143 Meru143 deleted the codex/refactor-vector_search-to-use-bounded-heap branch February 25, 2026 09:41
@Meru143 Meru143 mentioned this pull request Mar 26, 2026