Merged
Test Results: 233 tests, 233 ✅, 9m 25s ⏱️. Results for commit 6336663. ♻️ This comment has been updated with latest results.
deven96
reviewed
Dec 22, 2025
deven96
reviewed
Dec 22, 2025
1ef1e7e to 9d4b5d0
4258979 to 34441dc
We achieve very strong recall on the SIFT10k dataset across multiple configurations. See the recall validation test here: (link to test case). This confirms that the current graph construction and search logic work correctly, and it gives us a solid baseline for future performance optimizations.
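For readers unfamiliar with the metric, here is a minimal sketch of how recall@k against ground-truth neighbours is usually computed. The function name `recall_at_k` and the use of `u32` ids are assumptions for illustration; the linked test may compute it differently.

```rust
use std::collections::HashSet;

/// recall@k: the fraction of the true k nearest neighbours that also appear
/// in the top-k results returned by the index.
/// Hypothetical helper, not taken from the PR.
fn recall_at_k(found: &[u32], ground_truth: &[u32], k: usize) -> f64 {
    // Only the first k entries on each side count.
    let truth: HashSet<u32> = ground_truth.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    // Collecting into a set means duplicate ids in `found` are not double-counted.
    let found_set: HashSet<u32> = found.iter().take(k).copied().collect();
    let hits = found_set.intersection(&truth).count();
    hits as f64 / truth.len() as f64
}
```

Averaging this value over all query vectors gives the dataset-level recall that a validation test would compare against a threshold.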
092a3c2 to 50b941d
deven96
approved these changes
Feb 26, 2026
Implement a correct and deterministic HNSW index with hierarchical search, stable level assignment, and performance-oriented optimizations.

Core implementation:
- Implement insert, search_layer, knn-search, and delete
- Implement neighbor selection heuristic with diversity filtering
- Ensure proper backlink removal on delete
- Handle empty-neighbour edge cases safely
- Deterministic level assignment via NodeId hash
- Add determinism and recall tests (100% recall on 1K dataset)

Performance improvements:
- Eliminate Node cloning in search (use references)
- Introduce BoundedMinHeap in search_layer
- Remove manual heap size checks
- Move SIMD distance functions and bounded heaps to similarity crate
- Introduce EmbeddingKey(Arc<Vec<f32>>) across the non-linear index pipeline
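To illustrate what "deterministic level assignment via NodeId hash" can look like, here is a minimal sketch. It assumes a `u64` node id, a SplitMix64-style mixer as the hash, and the standard HNSW level formula `floor(-ln(u) * mL)` with `mL = 1 / ln(M)`. The names `mix64` and `level_for` are hypothetical, and the PR's actual hash and parameters may differ.

```rust
/// SplitMix64 step used as a fixed, seed-free bit mixer.
/// Assumption: the PR may use a different hash function.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Hypothetical helper: derive a node's top layer from its id alone, so the
/// same id always lands on the same level regardless of insertion order.
/// `m` is the HNSW connectivity parameter and should be at least 2.
fn level_for(node_id: u64, m: usize, max_level: usize) -> usize {
    let h = mix64(node_id);
    // Map the top 53 bits to a uniform value in (0, 1]; the +1 avoids ln(0).
    let u = ((h >> 11) + 1) as f64 / (1u64 << 53) as f64;
    // Standard HNSW normalisation factor mL = 1 / ln(M).
    let ml = 1.0 / (m as f64).ln();
    let level = (-u.ln() * ml).floor() as usize;
    level.min(max_level)
}
```

Because the level depends only on the id, rebuilding the index from the same data yields the same layer layout, which is what makes determinism tests possible.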
Add empirical validation of HNSW correctness using the SIFT dataset.

Validation:
- Add recall tests against SIFT ground-truth neighbors
- Add helpers to load and parse the SIFT dataset
- Add reusable HNSW initialization helper for testing
- Remove unnecessary setup in SIFT tests

Benchmarking:
- Add simple HNSW benchmark
- Move SIFT data and related utilities into the similarity crate
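As a rough sketch of what a SIFT parsing helper involves: the publicly distributed SIFT sets are commonly shipped as `.fvecs` files, where each record is a little-endian `i32` dimension followed by that many little-endian `f32` values. The function below assumes that layout and works on an in-memory byte slice; `parse_fvecs` is a hypothetical name, not the helper added in this PR.

```rust
/// Parse `.fvecs`-style bytes into vectors.
/// Assumed record layout: [dim: i32 LE][dim x f32 LE], repeated.
/// Hypothetical helper for illustration only.
fn parse_fvecs(bytes: &[u8]) -> Result<Vec<Vec<f32>>, String> {
    let mut out: Vec<Vec<f32>> = Vec::new();
    let mut pos = 0usize;
    while pos < bytes.len() {
        // Read the 4-byte dimension header.
        let header = bytes
            .get(pos..pos + 4)
            .ok_or("truncated dimension header")?;
        let dim = i32::from_le_bytes([header[0], header[1], header[2], header[3]]);
        if dim <= 0 {
            return Err(format!("invalid dimension {dim} at byte {pos}"));
        }
        let dim = dim as usize;
        pos += 4;
        // Read `dim` floats, 4 bytes each.
        let body = bytes
            .get(pos..pos + dim * 4)
            .ok_or("truncated vector body")?;
        let vector: Vec<f32> = body
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        out.push(vector);
        pos += dim * 4;
    }
    Ok(out)
}
```

Ground-truth files in the companion `.ivecs` layout follow the same structure with `i32` values in place of `f32`, so the same loop applies with a different element decoder.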
…rformance optimizations

Generalize HNSW over a distance trait and apply final structural, concurrency, and performance improvements across the index and CI pipeline.

Generics & API:
- Make HNSW generic over any linear distance implementation
- Align insert and delete with the NonLinearIndex trait
- Simplify HNSW initialization
- Improve error logging in the DB non-linear index

Concurrency & data structures:
- Replace std collections with papaya for thread-safe access. Update benchmarks to follow suit.
- Use SmallVec in LayerIndex to reduce heap allocations
- Introduce a lightweight fast hasher for HNSW internals

Performance improvements:
- Remove redundant magnitude setup in cosine SIMD calculations

CI & workflow:
- Run Rust tests only for changed crates in GitHub Actions
- Split full workspace tests vs non-AI tests to reduce CI time
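To show the general idea of making an index generic over a distance implementation, here is a minimal sketch. The trait `Distance`, the marker types `Euclidean` and `Cosine`, and the brute-force `nearest` function are all hypothetical stand-ins; the PR's actual trait name, signature, and SIMD-backed implementations are not shown in this conversation.

```rust
/// Hypothetical distance trait; smaller values mean "closer".
trait Distance {
    fn distance(a: &[f32], b: &[f32]) -> f32;
}

struct Euclidean;
struct Cosine;

impl Distance for Euclidean {
    fn distance(a: &[f32], b: &[f32]) -> f32 {
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt()
    }
}

impl Distance for Cosine {
    fn distance(a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 {
            return 1.0; // treat zero vectors as maximally dissimilar
        }
        1.0 - dot / (na * nb)
    }
}

/// Exhaustive nearest-neighbour scan, generic over the metric.
/// A stand-in for an index type parameterised the same way.
fn nearest<D: Distance>(query: &[f32], points: &[Vec<f32>]) -> Option<usize> {
    points
        .iter()
        .enumerate()
        .min_by(|(_, a), (_, b)| {
            D::distance(query, a).total_cmp(&D::distance(query, b))
        })
        .map(|(i, _)| i)
}
```

With a static type parameter the metric is resolved at compile time, so the distance call can be inlined into the search loop instead of going through dynamic dispatch.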
50b941d to 6336663
Part of #184. Introduces the HNSW implementation with little to no performance tuning (correctness over optimization).