RetriEval is a benchmarking suite designed for billion-scale vector search workloads. It is primarily used to benchmark in-process search engines on CPUs and GPUs, like USearch, FAISS, and cuVS, but it reuses the same profiling logic for standalone databases like Qdrant, Weaviate, and Redis. It works with the plain input format standardized by the BigANN benchmark and aims for reproducible measurements: shuffled parallel construction, incremental recall curves, normalized metrics, and machine-readable reports capturing everything from machine topology to indexing hyper-parameters.
```sh
cargo install --path .
```

This installs all backend binaries to `~/.cargo/bin/`.
Run USearch against a dataset:
```sh
retri-eval-usearch \
    --vectors datasets/wiki_1M/base.1M.fbin \
    --queries datasets/wiki_1M/query.public.100K.fbin \
    --neighbors datasets/wiki_1M/groundtruth.public.100K.ibin \
    --dtype f32,f16,i8 \
    --metric ip \
    --output results/
```

Generate plots from the results:

```sh
uv run scripts/plot.py results/ --output-dir plots/
```

| Backend | Parallelism | Quantization | Metrics |
|---|---|---|---|
| USearch | ForkUnion | f64, f32, bf16, f16, e5m2, e4m3, e3m2, e2m3, i8, u8, b1 | ip, l2, cos, hamming, ... |
| FAISS | OpenMP | f32, f16, bf16, u8, i8, b1 | ip, l2 |
| cuVS | CUDA | f32, f16, i8, u8 | l2, ip, cos |
- USearch: input is passed directly in the specified type; `--dtype` selects both the input interpretation and the internal quantization.
- FAISS: input is always f32; `--dtype` selects the internal scalar quantizer (SQfp16, SQbf16, SQ8bit_direct, etc.).
- cuVS: currently benchmarks with f32. CAGRA natively supports f32, f16, i8, and u8 for build.
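To build intuition for what these scalar quantization modes do to a vector, here is a small NumPy sketch. The i8 path uses simple symmetric min-max scaling; the engines above may use different calibration schemes:

```python
import numpy as np

v = np.array([0.12, -0.83, 0.47, 0.05], dtype=np.float32)

# f16: a direct downcast, keeping roughly 3 decimal digits of precision
v_f16 = v.astype(np.float16)

# i8: symmetric linear quantization into [-127, 127]
scale = np.abs(v).max() / 127.0
v_i8 = np.round(v / scale).astype(np.int8)

# Dequantized reconstruction; per-component error is bounded by ~scale/2
v_back = v_i8.astype(np.float32) * scale
```

The reconstruction error is what ultimately shows up as a recall drop in the tables below, which is why the benchmark sweeps several `--dtype` values at once.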
```sh
retri-eval-usearch --dtype bf16 --metric l2 ...
retri-eval-faiss --dtype f16 --metric l2 ...
retri-eval-cuvs --metric l2 ...
```

Input vectors are converted to f32 before being sent to the database.
Server-side quantization is managed by the database engine, not the benchmark.
| Backend | Client | Docker Image | Metrics |
|---|---|---|---|
| Qdrant | `qdrant-client`, gRPC | `qdrant/qdrant` | ip, l2, cos, manhattan |
| Redis | `redis`, RESP | `redis/redis-stack` | ip, l2, cos |
| Weaviate | `weaviate-community`, REST | `semitechnologies/weaviate` | ip, l2, cos, hamming, manhattan |
| LanceDB | `lancedb`, in-process, Arrow | — | ip, l2, cos, hamming |
Each backend is behind its own feature flag. Build only what you need:
```sh
cargo build --release --features usearch-backend   # USearch
cargo build --release --features faiss-backend     # FAISS
cargo build --release --features qdrant-backend    # Qdrant
cargo build --release --features redis-backend     # Redis
cargo build --release --features lancedb-backend   # LanceDB
cargo build --release --features weaviate-backend  # Weaviate
cargo build --release --features cuvs-backend      # cuVS
```

Or combine multiple:

```sh
cargo build --release --features usearch-backend,faiss-backend,qdrant-backend
```

Each backend is a separate binary. Common flags shared by all:
```sh
--vectors <PATH>    # Base vectors (.fbin, .u8bin, .i8bin)
--queries <PATH>    # Query vectors
--neighbors <PATH>  # Ground-truth neighbors (.ibin)
--keys <PATH>       # Optional keys file (.i32bin)
--epochs <N>        # Measurement steps (dataset split into N parts, default: 10)
--no-shuffle        # Disable random insertion order (shuffle is on by default)
--output <DIR>      # Output directory for JSON result files (omit for progress-only)
```
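For orientation, the measurement loop behind `--epochs` can be sketched as follows. This is a hypothetical, simplified Python rendition with a brute-force stand-in index, not the actual Rust implementation:

```python
import time
import numpy as np

class BruteForceIndex:
    """Stand-in for a real backend: exact L2 search over accumulated vectors."""
    def __init__(self):
        self.data = None

    def add(self, chunk):
        self.data = chunk if self.data is None else np.vstack([self.data, chunk])

    def search(self, queries, k):
        d = ((queries[:, None, :] - self.data[None, :, :]) ** 2).sum(-1)
        return np.argsort(d, axis=1)[:, :k]

def benchmark(index, vectors, queries, epochs=10, k=10):
    # Split the base set into `epochs` slices; after each insertion batch,
    # run the full query set and record one measurement step.
    steps, step_size = [], len(vectors) // epochs
    for e in range(epochs):
        chunk = vectors[e * step_size:(e + 1) * step_size]
        t0 = time.perf_counter()
        index.add(chunk)
        add_elapsed = time.perf_counter() - t0
        t0 = time.perf_counter()
        index.search(queries, k)
        search_elapsed = time.perf_counter() - t0
        steps.append({"vectors_indexed": (e + 1) * step_size,
                      "add_elapsed": add_elapsed,
                      "search_elapsed": search_elapsed})
    return steps
```

Measuring after every slice is what produces the incremental throughput and recall curves, rather than a single number for the fully built index.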
retri-eval-usearch additionally supports comma-separated sweeps:
```sh
--dtype <LIST>             # f32,f16,bf16,e5m2,e4m3,e3m2,e2m3,i8,u8,b1
--metric <LIST>            # ip, l2, cos, hamming, jaccard, sorensen, pearson, haversine, divergence
--connectivity <LIST>      # HNSW M parameter (default: 0 = auto)
--expansion-add <LIST>     # expansion factor during indexing (default: 0 = auto)
--expansion-search <LIST>  # expansion factor during search (default: 0 = auto)
--shards <LIST>            # Index shards (default: 2)
--threads <LIST>           # Thread count (default: available cores)
```
retri-eval-cuvs (requires --features cuvs-backend and an NVIDIA GPU):
```sh
--metric <LIST>                     # l2, ip, cos (default: l2)
--graph-degree <LIST>               # CAGRA output graph degree (default: 32)
--intermediate-graph-degree <LIST>  # CAGRA intermediate graph degree (default: 64)
--itopk-size <LIST>                 # Search-time intermediate results (default: 64)
```
One JSON file per backend configuration, written to --output <dir>.
Files are auto-named <backend>-<hash>.json.
```json
{
  "machine": { "cpu_model": "Intel Xeon 6776P", "physical_cores": 96, ... },
  "dataset": { "vectors_path": "...", "vectors_count": 10000000, "dimensions": 100, ... },
  "config": { "backend": "usearch", "dtype": "f32", "metric": "l2", "connectivity": 16, ... },
  "steps": [
    {
      "vectors_indexed": 1000000,
      "add_elapsed": 12.3,
      "add_throughput": 81300,
      "memory_bytes": 412000000,
      "search_elapsed": 0.45,
      "search_throughput": 222000,
      "recall_at_1": 0.0942,
      "recall_at_10": 0.2815,
      "ndcg_at_10": 0.1847,
      "recall_at_1_normalized": 0.9420,
      "recall_at_10_normalized": 0.9512,
      "ndcg_at_10_normalized": 0.8470
    }
  ]
}
```

```
Cargo.toml
src/
    bench.rs     # Library root: Backend trait, types, BenchState, benchmark loop
    dataset.rs   # Memory-mapped .fbin/.ibin loading (zero-copy)
    eval.rs      # Recall@K, NDCG@K
    output.rs    # Report types, JSON writer, machine info
    docker.rs    # Docker container lifecycle (Tier 2 backends)
    usearch.rs   # retri-eval-usearch binary
    faiss.rs     # retri-eval-faiss binary
    cuvs.rs      # retri-eval-cuvs binary
    qdrant.rs    # retri-eval-qdrant binary
    redis.rs     # retri-eval-redis binary
    lancedb.rs   # retri-eval-lancedb binary
    weaviate.rs  # retri-eval-weaviate binary
docker/
    qdrant.yml   # Docker compose for Qdrant
    redis.yml    # Docker compose for Redis Stack
    weaviate.yml # Docker compose for Weaviate
scripts/
    plot.py      # JSON results → PNG plots (Plotly, runnable via uv)
```
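The per-step quality metrics (`recall_at_k`, `ndcg_at_k`) can be reproduced offline. Here is a NumPy sketch assuming binary relevance against the top-k ground-truth neighbors; the exact tie-breaking in `eval.rs` may differ:

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k):
    # Fraction of the k true nearest neighbors present in the top-k results
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

def ndcg_at_k(retrieved, ground_truth, k):
    # Binary relevance: a result counts if it is among the true top-k;
    # each hit is discounted by log2(rank + 1) with 1-based ranks
    relevant = set(ground_truth[:k])
    dcg = sum(1.0 / np.log2(i + 2)
              for i, r in enumerate(retrieved[:k]) if r in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0
```

Unlike plain recall, NDCG rewards returning a true neighbor near the top of the result list rather than anywhere within the top-k.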
The BigANN benchmark is a good starting point if you are searching large collections of high-dimensional vectors. Its datasets often come with precomputed ground-truth neighbors, which is handy for recall evaluation. The datasets below are grouped by scale; only configurations with matching ground truth support recall evaluation.
| Dataset | Scalar Type | Dimensions | Metric | Base Size | Ground Truth |
|---|---|---|---|---|---|
| Unum UForm Wiki | f32 | 256 | IP | 1 GB | 100K queries, yes |
| Unum UForm Creative Captions | f32 | 256 | IP | 3 GB | cross-modal pairing |
| Arxiv with E5 | f32 | 768 | IP | 6 GB | cross-modal pairing |
| Dataset | Scalar Type | Dimensions | Metric | Base Size | Ground Truth |
|---|---|---|---|---|---|
| Meta BIGANN (SIFT) | u8 | 128 | L2 | 1.2 GB | 10K queries, yes |
| Microsoft Turing-ANNS | f32 | 100 | L2 | 3.7 GB | 100K queries, yes |
| Yandex Deep | f32 | 96 | L2 | 3.6 GB | ¹ no subset GT |
¹ Yandex only publishes ground truth computed against the full 1B dataset. A `base.10M.fbin` exists for download, but using 1B ground truth with a subset will produce misleadingly low recall. Use it only for throughput/latency testing, not recall evaluation.
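If you do need recall numbers on such a subset, one option is to recompute exact ground truth yourself. A minimal NumPy sketch for L2, batched over queries to bound memory (for 10M+ vectors prefer a GPU or a chunked, parallel variant):

```python
import numpy as np

def exact_ground_truth(base, queries, k=10, batch=1024):
    # Brute-force squared-L2 nearest neighbors of each query in `base`
    b = base.astype(np.float32)
    out = np.empty((len(queries), k), dtype=np.int32)
    b_sq = (b ** 2).sum(axis=1)
    for start in range(0, len(queries), batch):
        q = queries[start:start + batch].astype(np.float32)
        # ||b - q||^2 = ||b||^2 - 2 q.b + ||q||^2; the query norm is constant
        # per row, so it does not change the ranking and is dropped
        d = b_sq[None, :] - 2.0 * (q @ b.T)
        out[start:start + batch] = np.argsort(d, axis=1)[:, :k]
    return out
```

The resulting array has the same shape as the published `.ibin` ground-truth payloads (one row of neighbor indices per query), so it can be written out in the same format.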
| Dataset | Scalar Type | Dimensions | Metric | Base Size | Ground Truth |
|---|---|---|---|---|---|
| Meta BIGANN (SIFT) | u8 | 128 | L2 | 12 GB | 10K queries, yes |
| Microsoft Turing-ANNS | f32 | 100 | L2 | 37 GB | 100K queries, yes |
| Microsoft SpaceV | i8 | 100 | L2 | 9.3 GB | 30K queries, yes |
| Dataset | Scalar Type | Dimensions | Metric | Base Size | Ground Truth |
|---|---|---|---|---|---|
| Meta BIGANN (SIFT) | u8 | 128 | L2 | 119 GB | 10K queries, yes |
| Microsoft Turing-ANNS | f32 | 100 | L2 | 373 GB | 100K queries, yes |
| Microsoft SpaceV | i8 | 100 | L2 | 93 GB | 30K queries, yes |
| Yandex Text-to-Image | f32 | 200 | Cos | 750 GB | 100K queries, yes |
| Yandex Deep | f32 | 96 | L2 | 358 GB | 10K queries, yes |
```sh
mkdir -p datasets/wiki_1M/ && \
wget -nc https://huggingface.co/datasets/unum-cloud/ann-wiki-1m/resolve/main/base.1M.fbin -P datasets/wiki_1M/ && \
wget -nc https://huggingface.co/datasets/unum-cloud/ann-wiki-1m/resolve/main/query.public.100K.fbin -P datasets/wiki_1M/ && \
wget -nc https://huggingface.co/datasets/unum-cloud/ann-wiki-1m/resolve/main/groundtruth.public.100K.ibin -P datasets/wiki_1M/
```

```sh
retri-eval-usearch \
    --vectors datasets/wiki_1M/base.1M.fbin \
    --queries datasets/wiki_1M/query.public.100K.fbin \
    --neighbors datasets/wiki_1M/groundtruth.public.100K.ibin \
    --dtype f32,f16,i8 --metric ip --threads 16 \
    --output results/wiki_1M
```

The full 1B dataset is available from Meta. No pre-sliced subset base files exist, so range requests are used to download only the first N vectors, followed by a header patch to update the vector count. Pre-computed ground truth is available for the 10M and 100M subsets.
```sh
mkdir -p datasets/sift_10M/ && \
wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/query.public.10K.u8bin -P datasets/sift_10M/ && \
wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/GT_10M/bigann-10M -O datasets/sift_10M/groundtruth.public.10K.ibin && \
wget --header="Range: bytes=0-1280000007" \
    https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/base.1B.u8bin \
    -O datasets/sift_10M/base.10M.u8bin && \
python3 -c "
import struct
with open('datasets/sift_10M/base.10M.u8bin', 'r+b') as f:
    f.write(struct.pack('I', 10_000_000))
"
```

```sh
retri-eval-usearch \
    --vectors datasets/sift_10M/base.10M.u8bin \
    --queries datasets/sift_10M/query.public.10K.u8bin \
    --neighbors datasets/sift_10M/groundtruth.public.10K.ibin \
    --dtype f32,f16,i8 --metric l2 --threads 16 \
    --output results/sift_10M
```

```sh
mkdir -p datasets/sift_100M/ && \
wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/query.public.10K.u8bin -P datasets/sift_100M/ && \
wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/GT_100M/bigann-100M -O datasets/sift_100M/groundtruth.public.10K.ibin && \
wget --header="Range: bytes=0-12800000007" \
    https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/base.1B.u8bin \
    -O datasets/sift_100M/base.100M.u8bin && \
python3 -c "
import struct
with open('datasets/sift_100M/base.100M.u8bin', 'r+b') as f:
    f.write(struct.pack('I', 100_000_000))
"
```

```sh
retri-eval-usearch \
    --vectors datasets/sift_100M/base.100M.u8bin \
    --queries datasets/sift_100M/query.public.10K.u8bin \
    --neighbors datasets/sift_100M/groundtruth.public.10K.ibin \
    --dtype f32,f16,i8 --metric l2 --threads 96 \
    --epochs 20 --output results/sift_100M
```

The full 1B dataset is ~373 GB of f32 vectors with 100 dimensions. Subsets can be obtained via range requests, followed by a header patch to update the vector count. Pre-computed ground truth is available for the 1M, 10M, and 100M subsets.
```sh
mkdir -p datasets/turing_1M/ && \
wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/query100K.fbin \
    -O datasets/turing_1M/query.public.100K.fbin && \
wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/msturing-gt-1M \
    -O datasets/turing_1M/groundtruth.public.100K.ibin && \
wget --header="Range: bytes=0-400000007" \
    https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/base1b.fbin \
    -O datasets/turing_1M/base.1M.fbin && \
python3 -c "
import struct
with open('datasets/turing_1M/base.1M.fbin', 'r+b') as f:
    f.write(struct.pack('I', 1_000_000))
"
```

```sh
retri-eval-usearch \
    --vectors datasets/turing_1M/base.1M.fbin \
    --queries datasets/turing_1M/query.public.100K.fbin \
    --neighbors datasets/turing_1M/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 16 \
    --output results/turing_1M
```

```sh
mkdir -p datasets/turing_10M/ && \
wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/query100K.fbin \
    -O datasets/turing_10M/query.public.100K.fbin && \
wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/msturing-gt-10M \
    -O datasets/turing_10M/groundtruth.public.100K.ibin && \
wget --header="Range: bytes=0-4000000007" \
    https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/base1b.fbin \
    -O datasets/turing_10M/base.10M.fbin && \
python3 -c "
import struct
with open('datasets/turing_10M/base.10M.fbin', 'r+b') as f:
    f.write(struct.pack('I', 10_000_000))
"
```

```sh
retri-eval-usearch \
    --vectors datasets/turing_10M/base.10M.fbin \
    --queries datasets/turing_10M/query.public.100K.fbin \
    --neighbors datasets/turing_10M/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 16 \
    --output results/turing_10M
```

```sh
mkdir -p datasets/turing_100M/ && \
wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/query100K.fbin \
    -O datasets/turing_100M/query.public.100K.fbin && \
wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/msturing-gt-100M \
    -O datasets/turing_100M/groundtruth.public.100K.ibin && \
wget --header="Range: bytes=0-40000000007" \
    https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/base1b.fbin \
    -O datasets/turing_100M/base.100M.fbin && \
python3 -c "
import struct
with open('datasets/turing_100M/base.100M.fbin', 'r+b') as f:
    f.write(struct.pack('I', 100_000_000))
"
```

```sh
retri-eval-usearch \
    --vectors datasets/turing_100M/base.100M.fbin \
    --queries datasets/turing_100M/query.public.100K.fbin \
    --neighbors datasets/turing_100M/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 96 \
    --epochs 20 --output results/turing_100M
```

A 100M subset is available from Hugging Face. The original 1B dataset can be pulled from AWS S3.
```sh
mkdir -p datasets/spacev_100M/ && \
wget -nc https://huggingface.co/datasets/unum-cloud/ann-spacev-100m/resolve/main/base.100M.i8bin -P datasets/spacev_100M/ && \
wget -nc https://huggingface.co/datasets/unum-cloud/ann-spacev-100m/resolve/main/query.30K.i8bin -P datasets/spacev_100M/ && \
wget -nc https://huggingface.co/datasets/unum-cloud/ann-spacev-100m/resolve/main/groundtruth.30K.i32bin -P datasets/spacev_100M/
```

```sh
retri-eval-usearch \
    --vectors datasets/spacev_100M/base.100M.i8bin \
    --queries datasets/spacev_100M/query.30K.i8bin \
    --neighbors datasets/spacev_100M/groundtruth.30K.i32bin \
    --dtype f32,f16,i8 --metric l2 --threads 96 \
    --epochs 20 --output results/spacev_100M
```

A pre-built 10M subset and the full 1B dataset are available from Yandex.
```sh
mkdir -p datasets/deep_10M/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P datasets/deep_10M/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/query.public.10K.fbin -P datasets/deep_10M/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/groundtruth.public.10K.ibin -P datasets/deep_10M/
```

```sh
retri-eval-usearch \
    --vectors datasets/deep_10M/base.10M.fbin \
    --queries datasets/deep_10M/query.public.10K.fbin \
    --neighbors datasets/deep_10M/groundtruth.public.10K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 16 \
    --output results/deep_10M
```

```sh
mkdir -p datasets/deep_1B/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P datasets/deep_1B/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/query.public.10K.fbin -P datasets/deep_1B/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/groundtruth.public.10K.ibin -P datasets/deep_1B/
```

A 1M subset and the full 1B dataset are available from Yandex.
```sh
mkdir -p datasets/t2i/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/base.1M.fbin -P datasets/t2i/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/query.public.100K.fbin -P datasets/t2i/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/groundtruth.public.100K.ibin -P datasets/t2i/
```

```sh
retri-eval-usearch \
    --vectors datasets/t2i/base.1M.fbin \
    --queries datasets/t2i/query.public.100K.fbin \
    --neighbors datasets/t2i/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric cos --threads 16 \
    --output results/t2i_1M
```

```sh
mkdir -p datasets/t2i_1B/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/base.1B.fbin -P datasets/t2i_1B/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/query.public.100K.fbin -P datasets/t2i_1B/ && \
wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/groundtruth.public.100K.ibin -P datasets/t2i_1B/
```