MiniHyperVec

MiniHyperVec is the open-source proof-of-concept implementation of Helmsman, the system described in "The Clustering Strikes Back: Building Cost-Effective and High-Performance ANNS at Scale with Helmsman" (OSDI 2026).

Prerequisites

Prepare a dedicated workspace under /mnt/service for project data, indexes, and third-party dependencies.

Recommended directory layout:

/mnt/service/
├── 3rd_party/
├── minihyper-vec/
└── ...

Build Third-Party Dependencies

1. Build SPDK

cd /mnt/service/3rd_party
git clone --recursive https://github.com/spdk/spdk.git
cd spdk
git checkout v22.09
git submodule update --init --recursive

./scripts/pkgdep.sh
./configure --prefix=/mnt/service/3rd_party/SPDK_Path --with-shared
make -j"$(nproc)"
make install

This installs SPDK to:

/mnt/service/3rd_party/SPDK_Path

2. Build oneTBB

cd /mnt/service/3rd_party
git clone https://github.com/oneapi-src/oneTBB.git
cd oneTBB
git checkout v2021.12.0

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/mnt/service/3rd_party/TBB_Path
make -j"$(nproc)"
make install

This installs oneTBB to:

/mnt/service/3rd_party/TBB_Path

Project Setup

1. `setup/.envrc`

This file initializes the environment variables required by MiniHyperVec.

Example:

# add SPDK
export C_INCLUDE_PATH=/mnt/service/3rd_party/SPDK_Path/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=/mnt/service/3rd_party/SPDK_Path/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/service/3rd_party/SPDK_Path/lib
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/mnt/service/3rd_party/SPDK_Path/lib/pkgconfig

# add DPDK
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/service/3rd_party/spdk/dpdk/build/lib

Before building or running the project, load the environment:

source ./setup/.envrc
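
To confirm that the environment was actually loaded in the current shell, one option is to check that the expected directories appear in the relevant variables. The sketch below is a hypothetical helper, not part of MiniHyperVec; the `REQUIRED` mapping mirrors the example `.envrc` above and is an assumption to adjust if your install prefixes differ.

```python
import os

# Directories the example .envrc adds; adjust to your own install prefixes.
REQUIRED = {
    "LD_LIBRARY_PATH": [
        "/mnt/service/3rd_party/SPDK_Path/lib",
        "/mnt/service/3rd_party/spdk/dpdk/build/lib",
    ],
    "PKG_CONFIG_PATH": ["/mnt/service/3rd_party/SPDK_Path/lib/pkgconfig"],
}


def missing_entries(env, required=REQUIRED):
    """Return {variable: [directories not present]} for colon-separated variables."""
    missing = {}
    for var, dirs in required.items():
        # Split on ':' and ignore empty segments and trailing slashes.
        present = {p.rstrip("/") for p in env.get(var, "").split(":") if p}
        absent = [d for d in dirs if d.rstrip("/") not in present]
        if absent:
            missing[var] = absent
    return missing


if __name__ == "__main__":
    problems = missing_entries(os.environ)
    for var, dirs in problems.items():
        print(f"{var} is missing: {', '.join(dirs)}")
    if not problems:
        print("environment looks loaded")
```

An empty result means every listed directory was found; it does not verify that the libraries inside those directories are usable.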

2. `setup/path_config.json`

This file stores the paths for NVMe metadata and released indexes.

Example:

{
    "nvme_meta_path": "/mnt/service/minihyper-vec/tmp_file/nvme_meta/",
    "release_index_path": "/mnt/service/minihyper-vec/tmp_file/release_index/"
}

Field description:

nvme_meta_path: directory used to store generated NVMe metadata
release_index_path: directory used to store released indexes
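
Because both entries must point to existing, writable directories, it can help to check the file before running any tool. The following is a minimal sketch of such a check; `check_path_config` is a hypothetical helper and not part of MiniHyperVec. It relies only on the two keys shown in the example above.

```python
import json
import os


def check_path_config(config_path):
    """Return a list of problems found in a path_config.json-style file."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    problems = []
    for key in ("nvme_meta_path", "release_index_path"):
        path = config.get(key)
        if not path:
            problems.append(f"{key}: missing or empty")
        elif not os.path.isdir(path):
            problems.append(f"{key}: {path} is not an existing directory")
        elif not os.access(path, os.W_OK):
            problems.append(f"{key}: {path} is not writable")
    return problems
```

For example, `check_path_config("setup/path_config.json")` returns an empty list when both directories exist and are writable by the current user.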

Build MiniHyperVec

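Configure the project from a build directory under the project root (the deployment and search binaries referenced later are expected under ./build):

mkdir -p build && cd build
cmake .. \
  -DCMAKE_CXX_STANDARD=20 \
  -DPRUNE_ON=OFF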

Build:

make -j"$(nproc)"

Notes:

CMAKE_CXX_STANDARD=20 is required.
PRUNE_ON=OFF disables optional pruning-related dependencies such as LightGBM and ONNX Runtime.

NVMe Initialization

Before using SPDK-managed NVMe devices, bind the target NVMe SSD to the proper userspace driver.

1. Check the NVMe device path

readlink -f /sys/class/nvme/nvme0

2. Load VFIO modules

sudo modprobe vfio
sudo modprobe vfio_pci

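3. Check the current driver binding

Replace 0000:43:00.0 with the PCI address of your device, which appears in the path printed in step 1.

lspci -k -s 0000:43:00.0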

4. Run SPDK setup

cd /mnt/service/3rd_party/spdk
sudo HUGEMEM=16384 ./scripts/setup.sh
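
SPDK's setup script interprets HUGEMEM as a size in megabytes, so the command above requests 16384 MB (16 GiB) of hugepage memory. As a rough sanity check, the sketch below computes the reserved hugepage memory from the text of /proc/meminfo. `hugepage_mb` is a hypothetical helper; it only considers the default hugepage size that /proc/meminfo reports, which is an assumption that may not hold on systems using several hugepage sizes.

```python
def hugepage_mb(meminfo_text):
    """Return reserved hugepage memory in MB, given the text of /proc/meminfo."""
    wanted = {"HugePages_Total": None, "Hugepagesize": None}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if name in wanted and parts:
            wanted[name] = int(parts[0])
    if None in wanted.values():
        raise ValueError("hugepage fields not found in meminfo text")
    # HugePages_Total is a page count; Hugepagesize is reported in kB.
    return wanted["HugePages_Total"] * wanted["Hugepagesize"] // 1024
```

On a Linux host this can be called as `hugepage_mb(open("/proc/meminfo").read())`; with 2048 kB pages, 8192 pages correspond to 16384 MB.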

5. Verify the driver binding again

lspci -k -s 0000:43:00.0
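
The binding can also be checked programmatically by reading the "Kernel driver in use" line that `lspci -k` prints. After the SPDK setup script has run, that line normally names a userspace driver such as vfio-pci or uio_pci_generic rather than nvme. The sketch below uses hypothetical helper names and is not part of MiniHyperVec.

```python
import re
import subprocess


def kernel_driver_in_use(lspci_output):
    """Return the driver named on the 'Kernel driver in use' line, or None."""
    match = re.search(
        r"^\s*Kernel driver in use:\s*(\S+)", lspci_output, re.MULTILINE
    )
    return match.group(1) if match else None


def query_driver(pci_address):
    """Run `lspci -k -s <address>` and return the bound driver, or None."""
    result = subprocess.run(
        ["lspci", "-k", "-s", pci_address],
        capture_output=True, text=True, check=True,
    )
    return kernel_driver_in_use(result.stdout)
```

For example, `query_driver("0000:43:00.0")` returns the driver name for that address, or `None` if no driver is bound.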

Generate NVMe Metadata

Whenever a new NVMe device is bound to SPDK, run config_nvme_meta again so that the project can rescan the device and regenerate the NVMe metadata files.

sudo ./app/config_nvme_meta

This tool probes the available NVMe devices and generates the metadata used by the project.

The output metadata files are stored under the directory specified by nvme_meta_path in setup/path_config.json.

Deployment

Use minihypervec_deploy to deploy a target collection.

Usage

./build/test/minihypervec_deploy <collection_name>

Parameters

<collection_name>: collection name to deploy

Example:

./build/test/minihypervec_deploy sift10m_int8_spann

Search

Use multi_thread_search to run multi-threaded search and evaluate retrieval quality against ground truth.

Usage

./build/test/multi_thread_search \
  --collection_name <collection_name> \
  --query_path <query_path> \
  --groundtruth_path <groundtruth_path> \
  --index_type <index_type> \
  --vec_type <vec_type> \
  --nprobe <nprobe> \
  --topk <topk> \
  --T <num_threads> \
  --memory_index_type <memory_index_type> \
  --memory_search_max_visits <max_visits>

Parameters

--collection_name: Collection name.
--query_path: Path to the query file.
--groundtruth_path: Path to the ground-truth file used for evaluation.
--index_type: Index type.
--vec_type: Vector data type.
--nprobe: Number of probed candidates/partitions during search.
--topk: Number of nearest neighbors returned for each query.
--T: Number of threads used during search.
--memory_index_type: In-memory index type. Currently, only HNSW is supported.
--memory_search_max_visits: Maximum number of visited nodes during the in-memory index search stage.
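
When running several searches with different settings, the command line can be assembled from a dictionary of options. The sketch below is a hypothetical wrapper, not part of MiniHyperVec: it only builds and prints command lines using the flag names listed above. The collection name and file paths are placeholders, and the nprobe values are arbitrary illustrations.

```python
import shlex


def build_search_command(binary, **options):
    """Turn keyword options into an argv list of `--name value` pairs."""
    argv = [binary]
    for name, value in options.items():
        argv.extend([f"--{name}", str(value)])
    return argv


if __name__ == "__main__":
    # Placeholder values; substitute your own collection and file paths.
    base = {
        "collection_name": "my_collection",
        "query_path": "/path/to/query_file",
        "groundtruth_path": "/path/to/groundtruth_file",
        "index_type": "HV_CONST",
        "vec_type": "INT8",
        "topk": 10,
        "T": 10,
        "memory_index_type": "HNSW",
        "memory_search_max_visits": 1800,
    }
    for nprobe in (12, 24, 36):
        argv = build_search_command(
            "./build/test/multi_thread_search", nprobe=nprobe, **base
        )
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run` directly.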

Example

./build/test/multi_thread_search \
  --collection_name <collection_name> \
  --query_path <query_path> \
  --groundtruth_path <groundtruth_path> \
  --index_type HV_CONST \
  --vec_type INT8 \
  --nprobe 36 \
  --topk 10 \
  --T 10 \
  --memory_index_type HNSW \
  --memory_search_max_visits 1800
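
This README does not state which quality metric multi_thread_search reports. A common choice for approximate nearest-neighbor search is recall@k: the fraction of each query's true top-k neighbors that appear among the k returned results, averaged over all queries. The sketch below illustrates that definition on in-memory ID lists. `recall_at_k` is a hypothetical helper, and it makes no assumption about the on-disk format of the query or ground-truth files.

```python
def recall_at_k(results, ground_truth, k):
    """Mean fraction of each query's true top-k IDs found in its returned top-k."""
    if len(results) != len(ground_truth):
        raise ValueError("results and ground_truth must cover the same queries")
    if not results:
        raise ValueError("at least one query is required")
    total = 0.0
    for returned, truth in zip(results, ground_truth):
        # Compare only the first k IDs on each side.
        total += len(set(returned[:k]) & set(truth[:k])) / k
    return total / len(results)
```

With `results = [[1, 2, 3], [4, 5, 6]]`, `ground_truth = [[1, 2, 9], [4, 5, 6]]`, and `k = 3`, the first query recovers 2 of 3 neighbors and the second recovers all 3, so the mean is 5/6.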

Notes

Always load the project environment before building or running:

source ./setup/.envrc

If you add or rebind NVMe devices to SPDK, rerun:

sudo ./app/config_nvme_meta

Make sure path_config.json points to valid writable directories.
Ensure that the collection name, query file, and ground-truth file are consistent with the deployed dataset and index configuration.

Citation

If you use Helmsman in your research, please cite our paper:

@inproceedings{Osdi2026Helmsman,
    author = {Huang, Yuchen and Ma, Baiteng and Sun, Yiping and Shi, Yang and Chen, Xiao and Zhong, Xiaocheng and Wang, Zhiyong and Hu, Yao and Xu, Erci and Weng, Chuliang},
    title = {{The Clustering Strikes Back: Building Cost-Effective and High-Performance ANNS at Scale with Helmsman}},
    year = {2026},
    booktitle = {Proceedings of the 20th USENIX Symposium on Operating Systems Design and Implementation},
    series = {OSDI '26},
    publisher = {USENIX Association},
    address = {USA},
    url = {https://www.usenix.org/conference/osdi26/presentation/huang-yuchen},
}
