HNSW Implementation by Iamdavidonuh · Pull Request #283 · deven96/ahnlich

Iamdavidonuh · 2025-12-19T12:31:53Z

Part of #184. Introduces the HNSW implementation with little to no optimization (correctness over optimization).

github-actions · 2025-12-19T12:40:58Z

Test Results

233 tests 233 ✅ 9m 25s ⏱️
34 suites 0 💤
4 files 0 ❌

Results for commit 6336663.

♻️ This comment has been updated with latest results.

github-actions · 2025-12-19T12:56:16Z

Benchmark Results

group                                                        main                                   pr
-----                                                        ----                                   --
predicate_query_with_index/size_100                          1.10      3.4±0.00µs        ? ?/sec    1.00      3.1±0.00µs        ? ?/sec
predicate_query_with_index/size_1000                         1.06     34.0±0.02µs        ? ?/sec    1.00     32.0±0.01µs        ? ?/sec
predicate_query_with_index/size_10000                        1.00    383.7±0.16µs        ? ?/sec    1.03    396.7±0.24µs        ? ?/sec
predicate_query_with_index/size_100000                       1.10      6.3±0.24ms        ? ?/sec    1.00      5.7±0.44ms        ? ?/sec
predicate_query_without_index/size_100                       1.07      7.5±0.01µs        ? ?/sec    1.00      7.1±0.01µs        ? ?/sec
predicate_query_without_index/size_1000                      1.00     98.1±0.36µs        ? ?/sec    1.07    104.7±0.05µs        ? ?/sec
predicate_query_without_index/size_10000                     1.00    834.0±2.91µs        ? ?/sec    1.01    841.2±2.58µs        ? ?/sec
predicate_query_without_index/size_100000                    1.01     16.0±0.25ms        ? ?/sec    1.00     15.9±0.46ms        ? ?/sec
store_batch_insertion_without_predicates/size_100            1.00    199.6±2.23µs        ? ?/sec    1.01    201.4±1.74µs        ? ?/sec
store_batch_insertion_without_predicates/size_1000           1.09  1444.0±50.81µs        ? ?/sec    1.00  1325.1±35.61µs        ? ?/sec
store_batch_insertion_without_predicates/size_10000          1.00     14.0±0.11ms        ? ?/sec    1.01     14.2±0.11ms        ? ?/sec
store_batch_insertion_without_predicates/size_100000         1.00    137.6±0.75ms        ? ?/sec    1.00    137.8±0.70ms        ? ?/sec
store_retrieval_no_condition/size_100                        1.03     93.1±0.70µs        ? ?/sec    1.00     90.5±0.45µs        ? ?/sec
store_retrieval_no_condition/size_1000                       1.05   810.6±10.18µs        ? ?/sec    1.00   768.4±11.87µs        ? ?/sec
store_retrieval_no_condition/size_10000                      1.04      7.5±0.05ms        ? ?/sec    1.00      7.2±0.02ms        ? ?/sec
store_retrieval_no_condition/size_100000                     1.03     78.4±0.22ms        ? ?/sec    1.00     75.9±0.61ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100                   1.07    196.7±0.31µs        ? ?/sec    1.00    183.0±0.67µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_1000                  1.00   1159.1±2.35µs        ? ?/sec    1.00   1157.7±2.23µs        ? ?/sec
store_retrieval_non_linear_kdtree/size_10000                 1.00     12.3±0.07ms        ? ?/sec    1.01     12.5±0.12ms        ? ?/sec
store_retrieval_non_linear_kdtree/size_100000                1.00    139.8±0.38ms        ? ?/sec    1.06    147.6±0.64ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100       1.01    275.2±0.70µs        ? ?/sec    1.00    273.0±0.19µs        ? ?/sec
store_sequential_insertion_without_predicates/size_1000      1.02      2.7±0.00ms        ? ?/sec    1.00      2.7±0.00ms        ? ?/sec
store_sequential_insertion_without_predicates/size_10000     1.02     27.2±0.06ms        ? ?/sec    1.00     26.8±0.03ms        ? ?/sec
store_sequential_insertion_without_predicates/size_100000    1.01    271.7±1.13ms        ? ?/sec    1.00    268.6±0.45ms        ? ?/sec

ahnlich/similarity/src/hnsw.rs

Iamdavidonuh · 2026-02-21T23:20:41Z

We achieve very strong recall on the SIFT10k dataset across multiple configurations.
Recall varies depending on the chosen HNSW parameters (e.g., M, ef_construction, and ef_search), but the current implementation consistently reaches high recall values.

See the recall validation test here: (link to test case).

This confirms that the current graph construction and search logic are functioning correctly, and provides a solid baseline for future performance optimizations.
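For reference, recall@k in this context is the fraction of the ground-truth nearest neighbours that the index actually returns. A minimal sketch of how such a check could be computed is below; `recall_at_k` and `mean_recall` are hypothetical helpers written for illustration and are not taken from the PR's test code.

```rust
use std::collections::HashSet;

/// Recall@k for one query: the share of the true top-k ids that also appear
/// in the top-k ids returned by the index.
/// Hypothetical helper for illustration, not the PR's implementation.
fn recall_at_k(returned: &[usize], ground_truth: &[usize], k: usize) -> f64 {
    let truth: HashSet<usize> = ground_truth.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    // Deduplicate the returned ids so a repeated id is only counted once.
    let found: HashSet<usize> = returned.iter().take(k).copied().collect();
    found.intersection(&truth).count() as f64 / truth.len() as f64
}

/// Mean recall@k over a batch of queries (one result list per query).
fn mean_recall(returned: &[Vec<usize>], ground_truth: &[Vec<usize>], k: usize) -> f64 {
    assert_eq!(returned.len(), ground_truth.len());
    if returned.is_empty() {
        return 0.0;
    }
    let total: f64 = returned
        .iter()
        .zip(ground_truth)
        .map(|(r, t)| recall_at_k(r, t, k))
        .sum();
    total / returned.len() as f64
}
```

For example, `recall_at_k(&[3, 7, 1, 9], &[3, 1, 4, 9], 4)` evaluates to 0.75, since three of the four true neighbours are returned.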

Implement a correct and deterministic HNSW index with hierarchical search, stable level assignment, and performance-oriented optimizations.

Core implementation:
- Implement insert, search_layer, knn-search, and delete
- Implement neighbor selection heuristic with diversity filtering
- Ensure proper backlink removal on delete
- Handle empty-neighbour edge cases safely
- Deterministic level assignment via NodeId hash
- Add determinism and recall tests (100% recall on 1K dataset)

Performance improvements:
- Eliminate Node cloning in search (use references)
- Introduce BoundedMinHeap in search_layer
- Remove manual heap size checks
- Move SIMD distance functions and bounded heaps to similarity crate
- Introduce EmbeddingKey(Arc<Vec<f32>>) across the non-linear index pipeline
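To illustrate what "deterministic level assignment via NodeId hash" can look like, here is a rough sketch. It assumes the usual HNSW level formula, floor(-ln(u) / ln(M)), and swaps the random draw u for a value derived from a fixed hash of the node id. The SplitMix64 mixer and the names `mix64` and `level_for` are assumptions made for this example; the commit message does not say which hash or signature the PR uses.

```rust
/// SplitMix64 finalizer: a fixed, seed-free bit mixer.
/// Assumption for this sketch; the PR does not name its hash function.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Derive a node's top layer from its id alone, so rebuilding the index
/// from the same ids yields the same layer layout.
/// `m` is the HNSW `M` parameter; `max_level` caps the result.
fn level_for(node_id: u64, m: usize, max_level: usize) -> usize {
    assert!(m >= 2, "M must be at least 2 so that ln(M) > 0");
    // Keep the top 53 bits and add 1 to get a value in (0, 1],
    // which keeps ln(u) finite.
    let u = ((mix64(node_id) >> 11) + 1) as f64 / (1u64 << 53) as f64;
    // Standard HNSW normalisation factor mL = 1 / ln(M).
    let ml = 1.0 / (m as f64).ln();
    let level = (-u.ln() * ml).floor() as usize;
    level.min(max_level)
}
```

With M = 16, roughly 15 out of 16 ids should land on layer 0, and calling `level_for` twice with the same id always gives the same layer, which is the property the determinism tests rely on.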

Add empirical validation of HNSW correctness using the SIFT dataset.

Validation:
- Add recall tests against SIFT ground-truth neighbors
- Add helpers to load and parse the SIFT dataset
- Add reusable HNSW initialization helper for testing
- Remove unnecessary setup in SIFT tests

Benchmarking:
- Add simple HNSW benchmark
- Move SIFT data and related utilities into the similarity crate
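As context for the SIFT helpers mentioned above: the SIFT files are distributed in the `.fvecs` layout, where each vector is stored as a little-endian 32-bit dimension followed by that many little-endian `f32` values. A minimal parsing sketch under that assumption is below; `parse_fvecs` is a hypothetical name and is not the helper added in this PR.

```rust
use std::convert::TryInto;

/// Parse the raw bytes of an `.fvecs` file into a list of vectors.
/// Each record is: i32 dimension (little-endian), then `dim` f32 values.
/// Hypothetical helper for illustration, not the PR's loader.
fn parse_fvecs(bytes: &[u8]) -> Result<Vec<Vec<f32>>, String> {
    let mut vectors = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        // Read the 4-byte dimension header.
        let header = bytes
            .get(pos..pos + 4)
            .ok_or("truncated dimension header")?;
        let dim = i32::from_le_bytes(header.try_into().unwrap());
        if dim <= 0 {
            return Err(format!("invalid dimension {}", dim));
        }
        pos += 4;
        // Read `dim` f32 components.
        let end = pos + dim as usize * 4;
        let payload = bytes.get(pos..end).ok_or("truncated vector payload")?;
        let vector: Vec<f32> = payload
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes(c.try_into().unwrap()))
            .collect();
        vectors.push(vector);
        pos = end;
    }
    Ok(vectors)
}
```

The ground-truth neighbour files use the same record layout with `i32` ids in place of `f32` components (`.ivecs`), so the same loop applies with `i32::from_le_bytes` for the payload.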

…rformance optimizations

Generalize HNSW over a distance trait and apply final structural, concurrency, and performance improvements across the index and CI pipeline.

Generics & API:
- Make HNSW generic over any linear distance implementation
- Align insert and delete with the NonLinearIndex trait
- Simplify HNSW initialization
- Improve error logging in the DB non-linear index

Concurrency & data structures:
- Replace std collections with papaya for thread-safe access; update benchmarks to follow suit
- Use SmallVec in LayerIndex to reduce heap allocations
- Introduce a lightweight fast hasher for HNSW internals

Performance improvements:
- Remove redundant magnitude setup in cosine SIMD calculations

CI & workflow:
- Run Rust tests only for changed crates in GitHub Actions
- Split full workspace tests vs non-AI tests to reduce CI time
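The "generic over a distance trait" idea can be sketched roughly as follows. The trait and type names here (`Distance`, `Euclidean`, `Cosine`, `nearest`) are hypothetical and chosen only for this example, and a brute-force scan stands in for the HNSW graph search; the actual trait in the similarity crate may look different.

```rust
/// A distance function selected at compile time.
/// Hypothetical trait for illustration, not the crate's actual API.
trait Distance {
    fn distance(a: &[f32], b: &[f32]) -> f32;
}

struct Euclidean;
struct Cosine;

impl Distance for Euclidean {
    fn distance(a: &[f32], b: &[f32]) -> f32 {
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt()
    }
}

impl Distance for Cosine {
    fn distance(a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm_a == 0.0 || norm_b == 0.0 {
            // Treat a zero vector as maximally dissimilar in this sketch.
            return 1.0;
        }
        1.0 - dot / (norm_a * norm_b)
    }
}

/// Index code written once against the trait. A linear scan stands in
/// for the graph search so the example stays short.
fn nearest<D: Distance>(query: &[f32], points: &[Vec<f32>]) -> Option<usize> {
    points
        .iter()
        .enumerate()
        .map(|(i, p)| (i, D::distance(query, p)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

Because the metric is a type parameter, the same search code serves both `nearest::<Euclidean>(..)` and `nearest::<Cosine>(..)` with static dispatch, so no per-call branching on the metric is needed in the hot loop.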

deven96 reviewed Dec 22, 2025

ahnlich/similarity/src/hnsw.rs (outdated, resolved)

deven96 reviewed Dec 22, 2025

ahnlich/similarity/src/hnsw.rs (outdated, resolved)

Iamdavidonuh force-pushed the david/impl-hnsw branch 3 times, most recently from 1ef1e7e to 9d4b5d0 Compare January 12, 2026 12:19

deven96 force-pushed the david/impl-hnsw branch 5 times, most recently from 4258979 to 34441dc Compare February 21, 2026 16:51

Iamdavidonuh marked this pull request as ready for review February 21, 2026 23:20

Iamdavidonuh requested a review from deven96 February 21, 2026 23:22

Iamdavidonuh changed the title ~~HNSW Impl~~ HNSW Implementation: Feb 22, 2026

Iamdavidonuh changed the title ~~HNSW Implementation:~~ HNSW Implementation Feb 22, 2026

deven96 force-pushed the david/impl-hnsw branch from 092a3c2 to 50b941d Compare February 26, 2026 12:56

deven96 approved these changes Feb 26, 2026

Iamdavidonuh added 3 commits February 26, 2026 22:55

Iamdavidonuh force-pushed the david/impl-hnsw branch from 50b941d to 6336663 Compare February 26, 2026 22:12

Iamdavidonuh merged commit 6294432 into main Feb 26, 2026
11 of 12 checks passed

Iamdavidonuh deleted the david/impl-hnsw branch February 26, 2026 23:13
