What's Changed
🚨 Breaking Changes
- Default to static linking of libcudart by @bdice in #1627
- Remove JIT+LTO fragment database by @KyleFromNVIDIA in #1927
- Use static cudart by @KyleFromNVIDIA in #1931
- Always build with JIT+LTO by @KyleFromNVIDIA in #1923
- Migrate RMM usage to CCCL MR design by @bdice in #1990
- Exposition of KMeans param object for PQ in C++ by @lowener in #2005
- [Cleanup] Combine Batched and Regular KMeans Impl by @tarang-jain in #2015
- Preserve input memory location for NN Descent by @jinsolp in #1928
🐛 Bug Fixes
- Fix CCCL compilation error by @viclafargue in #1963
- Forward-merge release/26.04 into main by @KyleFromNVIDIA in #1971
- Forward-merge release/26.04 into main by @KyleFromNVIDIA in #1980
- Remove dangling pointers in JIT Fragments by @divyegala in #1988
- Add `head_rev` to cuvs recipe by @KyleFromNVIDIA in #1993
- Fix potential OOB access in CAGRA search when graph size < dataset size by @irina-resh-nvda in #1780
- Fix MG kmeans intertia_check n_iters by @aamijar in #2020
- Fix cuvs_bench pytest pareto assert by @aamijar in #2027
- Fix nightly build matrix by @KyleFromNVIDIA in #2054
- Fix vulnerable index deserialization by @lowener in #2068
- Fix symbol export kmeans by @aamijar in #2070
- Fix argmin/argmax based on the distance type by @achirkin in #2016
- Remove unneeded request for CUDA device link phase by @robertmaynard in #2077
- Update Faiss and DiskANN Patch to Use C++20 by @tarang-jain in #1796
- Fix brute force Rust index dataset lifetime by @yan-zaretskiy in #2083
- Fix segfault cuvs bench by @aamijar in #2088
- Fix cagra::optimize modifying the state of raft::resources by @achirkin in #2103
- Add direct target dependency when embedding fatbins by @KyleFromNVIDIA in #2106
- Fix check for PQ vectorized load by @lowener in #2107
- Fix workspace usage by @mfoerste4 in #2135
- Add missing visibility controls in IVF SQ by @divyegala in #2141
📖 Documentation
- Elaborate on fragment architecture in JIT+LTO documentation by @KyleFromNVIDIA in #1991
- Add Cluster and Distance sections to C documentation by @lowener in #1955
- Adding CAGRA merge to the documentation by @viclafargue in #1942
- [Doc Update] CAGRA Memory Footprint by @singhmanas1 in #1300
- Align docs with pluggable benchmark API by @jnke2016 in #1891
- Add docs for cagra mem usage with NN Descent build algo by @jinsolp in #2000
- Fix minor typos in `cuvs-bench` source build docs by @jrbourbeau in #2006
- Fix `cuvs-bench` docker images in docs by @jrbourbeau in #2003
- Update JIT+LTO guide to reflect new automatic embedding system by @KyleFromNVIDIA in #2045
- [Docs] Convert Sphinx docs to Fern by @cjnolet in #2067
- Add UDF Usage and Developer docs by @divyegala in #2030
- [DOC] Adding API guides for core cuVS types by @cjnolet in #2117
🚀 New Features
- [REVIEW] Add L1 support to NN-Descent by @yan-zaretskiy in #1898
- PCA C and Python API by @aamijar in #1987
- Introduce UDF Architecture by @divyegala in #1804
- JIT LTO Cagra Search by @divyegala in #1807
- Expose supported brute force metrics in `all_neighbors` by @jinsolp in #1827
- [REVIEW] Generalize and improve cagra::optimize by @mfoerste4 in #1830
- IVF-SQ C++ API by @viclafargue in #1865
🛠️ Improvements
- Use PQ API in CAGRA-Q + SCANN by @lowener in #1746
- Speed up recall calculation in cuVS Bench for large top-K by @jamxia155 in #1816
- Update codespell Version in pre-commit-config by @tarang-jain in #1920
- Forward-merge release/26.04 into main by @gforsyth in #1936
- Refactor `StaticFatbinFragmentEntry` to use tags by @KyleFromNVIDIA in #1970
- Replace cudaMemcpy2DAsync Calls with raft::copy_matrix by @tarang-jain in #1976
- update pip devcontainers' base image tags by @trxcllnt in #1985
- Refactor instantiation matrices to generate at build time by @KyleFromNVIDIA in #1984
- Add option to enable "sve" optimization level on armv9 by @LizYou in #1121
- Improve cuvs-bench doc and add executable dir option by @tfeher in #681
- Enforce type safety in JIT+LTO launcher by @KyleFromNVIDIA in #1997
- Add KDE kernel by @Intron7 in #1915
- Coderabbit integration by @benfred in #1908
- Refactor fatbin registration to use common input file by @KyleFromNVIDIA in #2008
- Update to clang 20.1.8 by @bdice in #2009
- JIT+LTO IVF-PQ compute similarity by @KyleFromNVIDIA in #1957
- Refactor JIT+LTO kernels by @KyleFromNVIDIA in #2021
- feat(rust): add serialize/deserialize support for CAGRA index by @zbennett10 in #1840
- Use new compute-matrix workflow by @KyleFromNVIDIA in #2034
- Reuse minClusterAndDistance Helper for Balanced KMeans by @tarang-jain in #2001
- feat(rust): add search_with_filter to CAGRA Index by @jamie8johnson in #2019
- [REVIEW] Drop extra copy in `get_last_error_text` by @jakirkham in #2044
- FIX: disable warpspeed scan by @mfoerste4 in #2062
- Use `token.rapids.nvidia.com` when issuing S3 bucket creds in devcontainers by @trxcllnt in #2047
- Remove `NO_CUDART_DEP` property by @KyleFromNVIDIA in #2065
- Switch the remaining C++17 components to C++20 by @achirkin in #2063
- fix(ci): resolve all zizmor findings and add zizmor pre-commit checks by @gforsyth in #2053
- fix(ci): declare explicit secrets in `publish-rust.yaml` by @gforsyth in #2069
- [REVIEW] Rewrite cuvs-sys build to discover pre-installed cuVS via cmake-package by @yan-zaretskiy in #2022
- Fix symbol export by @vyasr in #2052
- fix(ci): add explicit `actions: write` permission for `telemetry-summarize` by @gforsyth in #2075
- [REVIEW] Improve 1-NN performance with split GEMM/reduction kernels on Blackwell by @vinaydes in #1768
- Build and test with CUDA 13.2.0 by @bdice in #2072
- Centralize shared utilities across benchmark backends by @jnke2016 in #2040
- Persistent CAGRA: benchmark group and bad config warnings by @achirkin in #2091
- Multi-GPU Batched KMeans by @viclafargue in #2017
- IVF-SQ C API by @viclafargue in #1910
- skip CuPy 14.1.0 by @jameslamb in #2142
New Contributors
- @singhmanas1 made their first contribution in #1300
- @LizYou made their first contribution in #1121
- @jamie8johnson made their first contribution in #2019
Full Changelog: v26.06.00a...v26.06.00