HNSW Implementation #283

Merged
Iamdavidonuh merged 3 commits into main from david/impl-hnsw
Feb 26, 2026
@Iamdavidonuh
Collaborator

Part of #184. Introduces the HNSW implementation with little to no optimization; the priority here is correctness over performance.

@github-actions

github-actions bot commented Dec 19, 2025

Test Results

233 tests   233 ✅  9m 25s ⏱️
 34 suites    0 💤
  4 files      0 ❌

Results for commit 6336663.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Dec 19, 2025

Benchmark Results

group                                                        main                                   pr
-----                                                        ----                                   --
predicate_query_with_index/size_100                          1.10      3.4±0.00µs        ? ?/sec    1.00      3.1±0.00µs        ? ?/sec
predicate_query_with_index/size_1000                         1.06     34.0±0.02µs        ? ?/sec    1.00     32.0±0.01µs        ? ?/sec
predicate_query_with_index/size_10000                        1.00    383.7±0.16µs        ? ?/sec    1.03    396.7±0.24µs        ? ?/sec
predicate_query_with_index/size_100000                       1.10      6.3±0.24ms        ? ?/sec    1.00      5.7±0.44ms        ? ?/sec
predicate_query_without_index/size_100                       1.07      7.5±0.01µs        ? ?/sec    1.00      7.1±0.01µs        ? ?/sec
predicate_query_without_index/size_1000                      1.00     98.1±0.36µs        ? ?/sec    1.07    104.7±0.05µs        ? ?/sec
predicate_query_without_index/size_10000                     1.00    834.0±2.91µs        ? ?/sec    1.01    841.2±2.58µs        ? ?/sec
predicate_query_without_index/size_100000                    1.01     16.0±0.25ms        ? ?/sec    1.00     15.9±0.46ms        ? ?/sec
store_batch_insertion_without_predicates/size_100            1.00    199.6±2.23µs        ? ?/sec    1.01    201.4±1.74µs        ? ?/sec
store_batch_insertion_without_predicates/size_1000           1.09  1444.0±50.81µs        ? ?/sec    1.00  1325.1±35.61µs        ? ?/sec
store_batch_insertion_without_predicates/size_10000          1.00     14.0±0.11ms        ? ?/sec    1.01     14.2±0.11ms        ? ?/sec
store_batch_insertion_without_predicates/size_100000         1.00    137.6±0.75ms        ? ?/sec    1.00    137.8±0.70ms        ? ?/sec
store_retrieval_no_condition/size_100                        1.03     93.1±0.70µs        ? ?/sec    1.00     90.5±0.45µs        ? ?/sec
store_retrieval_no_condition/size_1000                       1.05   810.6±10.18µs        ? ?/sec    1.00   768.4±11.87µs        ? ?/sec
store_retrieval_no_condition/size_10000                      1.04      7.5±0.05ms        ? ?/sec    1.00      7.2±0.02ms        ? ?/sec
store_retrieval_no_condition/size_100000                     1.03     78.4±0.22ms        ? ?/sec    1.00     75.9±0.61ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100                   1.07    196.7±0.31µs        ? ?/sec    1.00    183.0±0.67µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_1000                  1.00   1159.1±2.35µs        ? ?/sec    1.00   1157.7±2.23µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_10000                 1.00     12.3±0.07ms        ? ?/sec    1.01     12.5±0.12ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100000                1.00    139.8±0.38ms        ? ?/sec    1.06    147.6±0.64ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100       1.01    275.2±0.70µs        ? ?/sec    1.00    273.0±0.19µs        ? ?/sec
store_sequential_insertion_without_predicates/size_1000      1.02      2.7±0.00ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
store_sequential_insertion_without_predicates/size_10000     1.02     27.2±0.06ms        ? ?/sec    1.00     26.8±0.03ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100000    1.01    271.7±1.13ms        ? ?/sec    1.00    268.6±0.45ms        ? ?/sec

@Iamdavidonuh force-pushed the david/impl-hnsw branch 3 times, most recently from 1ef1e7e to 9d4b5d0, on January 12, 2026 12:19
@deven96 force-pushed the david/impl-hnsw branch 5 times, most recently from 4258979 to 34441dc, on February 21, 2026 16:51
@Iamdavidonuh
Collaborator Author

We achieve very strong recall on the SIFT10k dataset across multiple configurations.
Recall varies depending on the chosen HNSW parameters (e.g., M, ef_construction, and ef_search), but the current implementation consistently reaches high recall values.

See the recall validation test here: (link to test case).

This confirms that the current graph construction and search logic are functioning correctly, and provides a solid baseline for future performance optimizations.
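
For readers unfamiliar with the metric, here is a minimal sketch of how recall@k is usually computed against ground-truth neighbors. The function name `recall_at_k` and the use of `u32` ids are assumptions for illustration; this is not the test code referenced above.

```rust
use std::collections::HashSet;

/// Fraction of the true top-k neighbors that also appear in the returned top-k.
/// `found` and `truth` are neighbor ids ordered from nearest to farthest.
/// (Hypothetical helper, not taken from the PR.)
fn recall_at_k(found: &[u32], truth: &[u32], k: usize) -> f64 {
    let truth_set: HashSet<u32> = truth.iter().take(k).copied().collect();
    if truth_set.is_empty() {
        return 0.0;
    }
    let found_set: HashSet<u32> = found.iter().take(k).copied().collect();
    let hits = found_set.intersection(&truth_set).count();
    hits as f64 / truth_set.len() as f64
}
```

Averaging this value over all queries gives the dataset-level recall for one (M, ef_construction, ef_search) configuration.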

@Iamdavidonuh marked this pull request as ready for review on February 21, 2026 23:20
@Iamdavidonuh requested a review from deven96 on February 21, 2026 23:22
@Iamdavidonuh changed the title from "HNSW Impl" to "HNSW Implementation:" on Feb 22, 2026
@Iamdavidonuh changed the title from "HNSW Implementation:" to "HNSW Implementation" on Feb 22, 2026
Implement a correct and deterministic HNSW index with
hierarchical search, stable level assignment, and
performance-oriented optimizations.

Core implementation:
- Implement insert, search_layer, knn-search, and delete
- Implement neighbor selection heuristic with diversity filtering
- Ensure proper backlink removal on delete
- Handle empty-neighbor edge cases safely
- Deterministic level assignment via NodeId hash
- Add determinism and recall tests (100% recall on 1K dataset)
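
As an illustration of the "deterministic level assignment via NodeId hash" item, the sketch below replaces the random draw of the standard HNSW level formula, level = floor(-ln(u) * mL) with mL = 1/ln(M), by a value derived from a fixed hash of the node id. The names `mix64` and `level_for`, the `u64` id type, and the choice of a SplitMix64-style mixer are all assumptions; the PR's actual hash and types may differ.

```rust
/// SplitMix64-style bit mixer: seedless, so the same id always maps to the same value.
/// (Assumed hash for illustration only.)
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E3779B97F4A7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D049BB133111EB);
    x ^ (x >> 31)
}

/// Derive a layer for `node_id` without any random number generator.
/// `m` is the HNSW M parameter; `max_level` caps the result.
fn level_for(node_id: u64, m: usize, max_level: usize) -> usize {
    assert!(m >= 2, "M must be at least 2 so that ln(M) > 0");
    // Map the top 53 bits of the hash to a value in (0, 1].
    let u = ((mix64(node_id) >> 11) as f64 + 1.0) / (1u64 << 53) as f64;
    let ml = 1.0 / (m as f64).ln();
    let level = (-u.ln() * ml).floor() as usize;
    level.min(max_level)
}
```

Because the level depends only on the id, rebuilding the index from the same inserts produces the same layer structure, which is what makes determinism tests possible.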

Performance improvements:
- Eliminate Node cloning in search (use references)
- Introduce BoundedMinHeap in search_layer
- Remove manual heap size checks
- Move SIMD distance functions and bounded heaps to similarity crate
- Introduce EmbeddingKey(Arc<Vec<f32>>) across the non-linear index pipeline
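
To show why a bounded heap removes the need for manual size checks, here is a minimal sketch built on `std::collections::BinaryHeap`. It keeps only the `cap` closest candidates by evicting the current farthest one. The names `Candidate` and `BoundedNearest` are hypothetical, and this is not the `BoundedMinHeap` from the similarity crate, whose API is not shown in this PR page.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// A search candidate ordered by distance (ties broken by id).
#[derive(Debug, Clone, Copy)]
struct Candidate {
    dist: f32,
    id: u32,
}

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp gives f32 a total order, so NaN cannot break the heap.
        self.dist.total_cmp(&other.dist).then(self.id.cmp(&other.id))
    }
}
impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Candidate {}

/// Keeps at most `cap` candidates with the smallest distances.
struct BoundedNearest {
    cap: usize,
    heap: BinaryHeap<Candidate>, // max-heap: the farthest kept candidate is on top
}

impl BoundedNearest {
    fn new(cap: usize) -> Self {
        Self { cap, heap: BinaryHeap::new() }
    }

    /// Insert `c`, evicting the farthest candidate if the heap is full.
    fn push(&mut self, c: Candidate) {
        if self.heap.len() < self.cap {
            self.heap.push(c);
        } else if let Some(mut worst) = self.heap.peek_mut() {
            if c < *worst {
                *worst = c; // the heap re-sifts when `worst` is dropped
            }
        }
    }

    /// Consume the heap and return candidates from nearest to farthest.
    fn into_sorted(self) -> Vec<Candidate> {
        self.heap.into_sorted_vec()
    }
}
```

With a structure like this, callers push every visited neighbor and the capacity bound (for example `ef`) is enforced in one place.
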

Add empirical validation of HNSW correctness using the SIFT dataset.

Validation:
- Add recall tests against SIFT ground-truth neighbors
- Add helpers to load and parse the SIFT dataset
- Add reusable HNSW initialization helper for testing
- Remove unnecessary setup in SIFT tests
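
For context on the dataset helpers, SIFT base and query vectors are distributed in the `.fvecs` format, where each record is a little-endian `i32` dimension followed by that many little-endian `f32` values. A minimal parser sketch follows; `parse_fvecs` is a hypothetical name and the PR's own loader may be structured differently.

```rust
/// Parse the contents of an `.fvecs` file that has already been read into memory.
/// Each record: 4-byte little-endian i32 dimension, then `dim` little-endian f32s.
/// (Hypothetical helper, not taken from the PR.)
fn parse_fvecs(bytes: &[u8]) -> Result<Vec<Vec<f32>>, String> {
    let mut vectors = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let header = bytes
            .get(pos..pos + 4)
            .ok_or("truncated dimension header")?;
        let dim = i32::from_le_bytes([header[0], header[1], header[2], header[3]]);
        if dim <= 0 {
            return Err(format!("invalid dimension {}", dim));
        }
        pos += 4;
        let len = dim as usize * 4;
        let body = bytes.get(pos..pos + len).ok_or("truncated vector body")?;
        let vector = body
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        vectors.push(vector);
        pos += len;
    }
    Ok(vectors)
}
```

The ground-truth `.ivecs` files use the same layout with `i32` values in place of `f32`, so the same loop applies with a different element decoder.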

Benchmarking:
- Add simple HNSW benchmark
- Move SIFT data and related utilities into the similarity crate
…rformance optimizations

Generalize HNSW over a distance trait and apply final structural,
concurrency, and performance improvements across the index and CI pipeline.

Generics & API:
- Make HNSW generic over any linear distance implementation
- Align insert and delete with the NonLinearIndex trait
- Simplify HNSW initialization
- Improve error logging in the DB non-linear index
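
The sketch below illustrates what "generic over a distance implementation" can look like in Rust. The trait `Distance`, the metric types, and `FlatIndex` are hypothetical names, and the index here is a brute-force scan standing in for the HNSW graph so the example stays short.

```rust
use std::marker::PhantomData;

/// A distance function chosen at compile time (assumed trait shape).
trait Distance {
    fn distance(a: &[f32], b: &[f32]) -> f32;
}

struct Euclidean;
impl Distance for Euclidean {
    fn distance(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
    }
}

struct CosineDistance;
impl Distance for CosineDistance {
    fn distance(a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 { 1.0 } else { 1.0 - dot / (na * nb) }
    }
}

/// Brute-force index parameterized by the metric `D`.
struct FlatIndex<D: Distance> {
    points: Vec<Vec<f32>>,
    _metric: PhantomData<D>,
}

impl<D: Distance> FlatIndex<D> {
    fn new() -> Self {
        Self { points: Vec::new(), _metric: PhantomData }
    }

    /// Store a vector and return its position as an id.
    fn insert(&mut self, v: Vec<f32>) -> usize {
        self.points.push(v);
        self.points.len() - 1
    }

    /// Id of the stored vector closest to `q` under `D`, if any.
    fn nearest(&self, q: &[f32]) -> Option<usize> {
        self.points
            .iter()
            .enumerate()
            .map(|(i, p)| (i, D::distance(q, p)))
            .min_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(i, _)| i)
    }
}
```

Swapping `FlatIndex::<Euclidean>` for `FlatIndex::<CosineDistance>` changes the metric without touching the index code, and the compiler can inline the distance call because it is resolved statically.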

Concurrency & data structures:
- Replace std collections with papaya for thread-safe access,
  and update benchmarks to follow suit
- Use SmallVec in LayerIndex to reduce heap allocations
- Introduce a lightweight fast hasher for HNSW internals
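
As a rough idea of what a "lightweight fast hasher" for integer node ids can look like, the sketch below implements `std::hash::Hasher` with a single multiply for `u64` keys. `FastIdHasher` and `FastMap` are hypothetical names, the constants are illustrative, and the hasher used in the PR may be entirely different. Unlike the default SipHash, a hasher like this is not resistant to adversarial keys, which is usually acceptable for internally generated ids.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal multiplicative hasher intended for integer keys (illustrative only).
#[derive(Default)]
struct FastIdHasher(u64);

impl Hasher for FastIdHasher {
    /// Fallback for arbitrary bytes: an FNV-1a-style xor-then-multiply loop.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }

    /// Fast path used by `u64` keys: one xor and one multiply.
    fn write_u64(&mut self, n: u64) {
        self.0 = (self.0 ^ n).wrapping_mul(0x9E3779B97F4A7C15);
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A std HashMap keyed by u64 ids that uses the hasher above.
type FastMap<V> = HashMap<u64, V, BuildHasherDefault<FastIdHasher>>;
```

A map is then created with `FastMap::<V>::default()` and used like any other `HashMap`.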

Performance improvements:
- Remove redundant magnitude setup in cosine SIMD calculations

CI & workflow:
- Run Rust tests only for changed crates in GitHub Actions
- Split full workspace tests vs non-AI tests to reduce CI time
@Iamdavidonuh merged commit 6294432 into main on Feb 26, 2026
11 of 12 checks passed
@Iamdavidonuh deleted the david/impl-hnsw branch on February 26, 2026 23:13

2 participants