ashvardanian/RetriEval
RetriEval is a benchmarking suite designed for billion-scale Vector Search workloads. It's primarily used to benchmark in-process Search Engines on CPUs and GPUs, like USearch, FAISS, and cuVS, but it reuses the same profiling logic for standalone databases like Qdrant, Weaviate, and Redis. It works with the plain input format standardized by the BigANN benchmark, aiming for reproducible measurements: shuffled parallel construction, incremental recall curves, normalized metrics, and machine-readable reports capturing everything from machine topology to indexing hyper-parameters.

Quick Start

cargo install --path .

This installs all backend binaries to ~/.cargo/bin/. Run USearch against a dataset:

retri-eval-usearch \
    --vectors datasets/wiki_1M/base.1M.fbin \
    --queries datasets/wiki_1M/query.public.100K.fbin \
    --neighbors datasets/wiki_1M/groundtruth.public.100K.ibin \
    --dtype f32,f16,i8 \
    --metric ip \
    --output results/

Generate plots from the results:

uv run scripts/plot.py results/ --output-dir plots/

Backends

Search Engines

| Backend | Parallelism | Quantization                                            | Metrics                   |
|---------|-------------|---------------------------------------------------------|---------------------------|
| USearch | ForkUnion   | f64, f32, bf16, f16, e5m2, e4m3, e3m2, e2m3, i8, u8, b1 | ip, l2, cos, hamming, ... |
| FAISS   | OpenMP      | f32, f16, bf16, u8, i8, b1                              | ip, l2                    |
| cuVS    | CUDA        | f32, f16, i8, u8                                        | l2, ip, cos               |
  • USearch: Input is passed directly in the specified type. --dtype selects both the input interpretation and the internal quantization.
  • FAISS: Input is always f32. --dtype selects the internal scalar quantizer (SQfp16, SQbf16, SQ8bit_direct, etc.).
  • cuVS: Currently benchmarks with f32. CAGRA natively supports f32, f16, i8, u8 for build.
retri-eval-usearch --dtype bf16 --metric l2 ...
retri-eval-faiss --dtype f16 --metric l2 ...
retri-eval-cuvs --metric l2 ...

Vector Databases

Input vectors are converted to f32 before sending to the database. Server-side quantization is managed by the database engine, not the benchmark.

| Backend  | Client                     | Docker Image              | Metrics                         |
|----------|----------------------------|---------------------------|---------------------------------|
| Qdrant   | qdrant-client, gRPC        | qdrant/qdrant             | ip, l2, cos, manhattan          |
| Redis    | redis, RESP                | redis/redis-stack         | ip, l2, cos                     |
| Weaviate | weaviate-community, REST   | semitechnologies/weaviate | ip, l2, cos, hamming, manhattan |
| LanceDB  | lancedb, in-process, Arrow | —                         | ip, l2, cos, hamming            |

Each backend is behind its own feature flag. Build only what you need:

cargo build --release --features usearch-backend    # USearch
cargo build --release --features faiss-backend      # FAISS
cargo build --release --features qdrant-backend     # Qdrant
cargo build --release --features redis-backend      # Redis
cargo build --release --features lancedb-backend    # LanceDB
cargo build --release --features weaviate-backend   # Weaviate
cargo build --release --features cuvs-backend       # cuVS

Or combine multiple:

cargo build --release --features usearch-backend,faiss-backend,qdrant-backend

CLI Reference

Each backend is a separate binary. Common flags shared by all:

--vectors <PATH>           # Base vectors (.fbin, .u8bin, .i8bin)
--queries <PATH>           # Query vectors
--neighbors <PATH>         # Ground-truth neighbors (.ibin)
--keys <PATH>              # Optional keys file (.i32bin)
--epochs <N>               # Measurement steps (dataset split into N parts, default: 10)
--no-shuffle               # Disable random insertion order (shuffle is on by default)
--output <DIR>             # Output directory for JSON result files (omit for progress-only)

retri-eval-usearch additionally supports comma-separated sweeps:

--dtype <LIST>             # f32,f16,bf16,e5m2,e4m3,e3m2,e2m3,i8,u8,b1
--metric <LIST>            # ip, l2, cos, hamming, jaccard, sorensen, pearson, haversine, divergence
--connectivity <LIST>      # HNSW M parameter (default: 0 = auto)
--expansion-add <LIST>     # expansion factor during indexing (default: 0 = auto)
--expansion-search <LIST>  # expansion factor during search (default: 0 = auto)
--shards <LIST>            # Index shards (default: 2)
--threads <LIST>           # Thread count (default: available cores)

retri-eval-cuvs (requires --features cuvs-backend and an NVIDIA GPU):

--metric <LIST>                    # l2, ip, cos (default: l2)
--graph-degree <LIST>              # CAGRA output graph degree (default: 32)
--intermediate-graph-degree <LIST> # CAGRA intermediate graph degree (default: 64)
--itopk-size <LIST>                # Search-time intermediate results (default: 64)

Output Format

One JSON file per backend configuration, written to --output <dir>. Files are auto-named <backend>-<hash>.json.

{
  "machine": { "cpu_model": "Intel Xeon 6776P", "physical_cores": 96, ... },
  "dataset": { "vectors_path": "...", "vectors_count": 10000000, "dimensions": 100, ... },
  "config": { "backend": "usearch", "dtype": "f32", "metric": "l2", "connectivity": 16, ... },
  "steps": [
    {
      "vectors_indexed": 1000000,
      "add_elapsed": 12.3,
      "add_throughput": 81300,
      "memory_bytes": 412000000,
      "search_elapsed": 0.45,
      "search_throughput": 222000,
      "recall_at_1": 0.0942,
      "recall_at_10": 0.2815,
      "ndcg_at_10": 0.1847,
      "recall_at_1_normalized": 0.9420,
      "recall_at_10_normalized": 0.9512,
      "ndcg_at_10_normalized": 0.8470
    }
  ]
}
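Beyond scripts/plot.py, the reports are easy to consume programmatically. A minimal Python sketch (the helper names are illustrative, not part of the project) that loads every report from a results directory and extracts an incremental recall curve, assuming the field names shown above:

```python
import json
from pathlib import Path


def load_results(results_dir: str) -> list[dict]:
    """Load every <backend>-<hash>.json report from a results directory."""
    return [json.loads(p.read_text()) for p in Path(results_dir).glob("*.json")]


def recall_curve(report: dict, key: str = "recall_at_10") -> list[tuple[int, float]]:
    """Extract (vectors_indexed, recall) pairs across measurement epochs."""
    return [(step["vectors_indexed"], step[key]) for step in report["steps"]]
```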

Project Structure

Cargo.toml
src/
    bench.rs            # Library root: Backend trait, types, BenchState, benchmark loop
    dataset.rs          # Memory-mapped .fbin/.ibin loading (zero-copy)
    eval.rs             # Recall@K, NDCG@K
    output.rs           # Report types, JSON writer, machine info
    docker.rs           # Docker container lifecycle (Tier 2 backends)
    usearch.rs          # retri-eval-usearch binary
    faiss.rs            # retri-eval-faiss binary
    cuvs.rs             # retri-eval-cuvs binary
    qdrant.rs           # retri-eval-qdrant binary
    redis.rs            # retri-eval-redis binary
    lancedb.rs          # retri-eval-lancedb binary
    weaviate.rs         # retri-eval-weaviate binary
docker/
    qdrant.yml          # Docker compose for Qdrant
    redis.yml           # Docker compose for Redis Stack
    weaviate.yml        # Docker compose for Weaviate
scripts/
    plot.py             # JSON results → PNG plots (Plotly, runnable via uv)
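The metrics in eval.rs follow the standard definitions; for reference, here is a minimal Python sketch of Recall@K and binary-relevance NDCG@K (an illustrative re-implementation, not the crate's code):

```python
import math


def recall_at_k(retrieved: list[int], ground_truth: list[int], k: int) -> float:
    """Fraction of the top-k ground-truth neighbors present in the top-k results."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k


def ndcg_at_k(retrieved: list[int], ground_truth: list[int], k: int) -> float:
    """NDCG@K with binary relevance: a hit at rank i contributes 1/log2(i+2)."""
    relevant = set(ground_truth[:k])
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / idcg if idcg else 0.0
```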

Datasets

The BigANN benchmark is a good starting point if you are looking for large collections of high-dimensional vectors. These often come with precomputed ground-truth neighbors, which is handy for recall evaluation. The datasets below are grouped by scale; only configurations with matching ground truth support recall evaluation.

~1M Scale — Development & Testing

| Dataset                      | Scalar Type | Dimensions | Metric | Base Size | Ground Truth        |
|------------------------------|-------------|------------|--------|-----------|---------------------|
| Unum UForm Wiki              | f32         | 256        | IP     | 1 GB      | 100K queries, yes   |
| Unum UForm Creative Captions | f32         | 256        | IP     | 3 GB      | cross-modal pairing |
| Arxiv with E5                | f32         | 768        | IP     | 6 GB      | cross-modal pairing |

~10M Scale

| Dataset               | Scalar Type | Dimensions | Metric | Base Size | Ground Truth      |
|-----------------------|-------------|------------|--------|-----------|-------------------|
| Meta BIGANN (SIFT)    | u8          | 128        | L2     | 1.2 GB    | 10K queries, yes  |
| Microsoft Turing-ANNS | f32         | 100        | L2     | 3.7 GB    | 100K queries, yes |
| Yandex Deep           | f32         | 96         | L2     | 3.6 GB    | no subset GT ¹    |

¹ Yandex only publishes ground truth computed against the full 1B dataset. A base.10M.fbin exists for download, but using the 1B ground truth against a subset produces misleadingly low recall. Use it only for throughput/latency testing, not recall evaluation.

~100M Scale

| Dataset               | Scalar Type | Dimensions | Metric | Base Size | Ground Truth      |
|-----------------------|-------------|------------|--------|-----------|-------------------|
| Meta BIGANN (SIFT)    | u8          | 128        | L2     | 12 GB     | 10K queries, yes  |
| Microsoft Turing-ANNS | f32         | 100        | L2     | 37 GB     | 100K queries, yes |
| Microsoft SpaceV      | i8          | 100        | L2     | 9.3 GB    | 30K queries, yes  |

~1B Scale

| Dataset               | Scalar Type | Dimensions | Metric | Base Size | Ground Truth      |
|-----------------------|-------------|------------|--------|-----------|-------------------|
| Meta BIGANN (SIFT)    | u8          | 128        | L2     | 119 GB    | 10K queries, yes  |
| Microsoft Turing-ANNS | f32         | 100        | L2     | 373 GB    | 100K queries, yes |
| Microsoft SpaceV      | i8          | 100        | L2     | 93 GB     | 30K queries, yes  |
| Yandex Text-to-Image  | f32         | 200        | Cos    | 750 GB    | 100K queries, yes |
| Yandex Deep           | f32         | 96         | L2     | 358 GB    | 10K queries, yes  |

Download Instructions

Unum UForm Wiki — 1M, f32, 256d, IP

mkdir -p datasets/wiki_1M/ && \
    wget -nc https://huggingface.co/datasets/unum-cloud/ann-wiki-1m/resolve/main/base.1M.fbin -P datasets/wiki_1M/ && \
    wget -nc https://huggingface.co/datasets/unum-cloud/ann-wiki-1m/resolve/main/query.public.100K.fbin -P datasets/wiki_1M/ && \
    wget -nc https://huggingface.co/datasets/unum-cloud/ann-wiki-1m/resolve/main/groundtruth.public.100K.ibin -P datasets/wiki_1M/
retri-eval-usearch \
    --vectors datasets/wiki_1M/base.1M.fbin \
    --queries datasets/wiki_1M/query.public.100K.fbin \
    --neighbors datasets/wiki_1M/groundtruth.public.100K.ibin \
    --dtype f32,f16,i8 --metric ip --threads 16 \
    --output results/wiki_1M

Meta BIGANN — SIFT

The full 1B dataset is available from Meta. No pre-sliced subset base files exist, so range requests are used to download only the first N vectors, followed by a header patch to update the vector count. Pre-computed ground truth is available for 10M and 100M subsets.
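The byte ranges in the commands below follow from the BigANN binary layout: an 8-byte header (vector count and dimensions, each a little-endian u32), followed by count × dims scalars. A small Python sketch (hypothetical helper names) that derives the inclusive range end for an N-vector prefix and patches the header, mirroring the inline `python3 -c` snippets:

```python
import struct

HEADER_BYTES = 8  # two little-endian u32 values: vector count, then dimensions


def range_end(num_vectors: int, dims: int, scalar_bytes: int) -> int:
    """Last byte index (inclusive) covering the header plus the first N vectors."""
    return HEADER_BYTES + num_vectors * dims * scalar_bytes - 1


def patch_count(path: str, num_vectors: int) -> None:
    """Rewrite the vector-count field so readers see only the downloaded prefix."""
    with open(path, "r+b") as f:
        f.write(struct.pack("<I", num_vectors))
```

For example, `range_end(10_000_000, 128, 1)` yields 1280000007 for the 10M u8 subset, and `range_end(1_000_000, 100, 4)` yields 400000007 for the 1M Turing-ANNS f32 subset — matching the `Range:` headers used below.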

10M subset, u8, 128d, L2, ~1.2 GB

mkdir -p datasets/sift_10M/ && \
    wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/query.public.10K.u8bin -P datasets/sift_10M/ && \
    wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/GT_10M/bigann-10M -O datasets/sift_10M/groundtruth.public.10K.ibin && \
    wget --header="Range: bytes=0-1280000007" \
        https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/base.1B.u8bin \
        -O datasets/sift_10M/base.10M.u8bin && \
    python3 -c "
import struct
with open('datasets/sift_10M/base.10M.u8bin', 'r+b') as f:
    f.write(struct.pack('I', 10_000_000))
"
retri-eval-usearch \
    --vectors datasets/sift_10M/base.10M.u8bin \
    --queries datasets/sift_10M/query.public.10K.u8bin \
    --neighbors datasets/sift_10M/groundtruth.public.10K.ibin \
    --dtype f32,f16,i8 --metric l2 --threads 16 \
    --output results/sift_10M

100M subset, u8, 128d, L2, ~12 GB

mkdir -p datasets/sift_100M/ && \
    wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/query.public.10K.u8bin -P datasets/sift_100M/ && \
    wget -nc https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/GT_100M/bigann-100M -O datasets/sift_100M/groundtruth.public.10K.ibin && \
    wget --header="Range: bytes=0-12800000007" \
        https://dl.fbaipublicfiles.com/billion-scale-ann-benchmarks/bigann/base.1B.u8bin \
        -O datasets/sift_100M/base.100M.u8bin && \
    python3 -c "
import struct
with open('datasets/sift_100M/base.100M.u8bin', 'r+b') as f:
    f.write(struct.pack('I', 100_000_000))
"
retri-eval-usearch \
    --vectors datasets/sift_100M/base.100M.u8bin \
    --queries datasets/sift_100M/query.public.10K.u8bin \
    --neighbors datasets/sift_100M/groundtruth.public.10K.ibin \
    --dtype f32,f16,i8 --metric l2 --threads 96 \
    --epochs 20 --output results/sift_100M

Microsoft Turing-ANNS

The full 1B dataset is ~373 GB of f32 vectors with 100 dimensions. Subsets can be obtained via range requests, followed by a header patch to update the vector count. Pre-computed ground truth is available for 1M, 10M, and 100M subsets.

1M subset, f32, 100d, L2, ~400 MB

mkdir -p datasets/turing_1M/ && \
    wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/query100K.fbin \
        -O datasets/turing_1M/query.public.100K.fbin && \
    wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/msturing-gt-1M \
        -O datasets/turing_1M/groundtruth.public.100K.ibin && \
    wget --header="Range: bytes=0-400000007" \
        https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/base1b.fbin \
        -O datasets/turing_1M/base.1M.fbin && \
    python3 -c "
import struct
with open('datasets/turing_1M/base.1M.fbin', 'r+b') as f:
    f.write(struct.pack('I', 1_000_000))
"
retri-eval-usearch \
    --vectors datasets/turing_1M/base.1M.fbin \
    --queries datasets/turing_1M/query.public.100K.fbin \
    --neighbors datasets/turing_1M/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 16 \
    --output results/turing_1M

10M subset, f32, 100d, L2, ~3.7 GB

mkdir -p datasets/turing_10M/ && \
    wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/query100K.fbin \
        -O datasets/turing_10M/query.public.100K.fbin && \
    wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/msturing-gt-10M \
        -O datasets/turing_10M/groundtruth.public.100K.ibin && \
    wget --header="Range: bytes=0-4000000007" \
        https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/base1b.fbin \
        -O datasets/turing_10M/base.10M.fbin && \
    python3 -c "
import struct
with open('datasets/turing_10M/base.10M.fbin', 'r+b') as f:
    f.write(struct.pack('I', 10_000_000))
"
retri-eval-usearch \
    --vectors datasets/turing_10M/base.10M.fbin \
    --queries datasets/turing_10M/query.public.100K.fbin \
    --neighbors datasets/turing_10M/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 16 \
    --output results/turing_10M

100M subset, f32, 100d, L2, ~37 GB

mkdir -p datasets/turing_100M/ && \
    wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/query100K.fbin \
        -O datasets/turing_100M/query.public.100K.fbin && \
    wget -nc https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/msturing-gt-100M \
        -O datasets/turing_100M/groundtruth.public.100K.ibin && \
    wget --header="Range: bytes=0-40000000007" \
        https://comp21storage.z5.web.core.windows.net/comp21/MSFT-TURING-ANNS/base1b.fbin \
        -O datasets/turing_100M/base.100M.fbin && \
    python3 -c "
import struct
with open('datasets/turing_100M/base.100M.fbin', 'r+b') as f:
    f.write(struct.pack('I', 100_000_000))
"
retri-eval-usearch \
    --vectors datasets/turing_100M/base.100M.fbin \
    --queries datasets/turing_100M/query.public.100K.fbin \
    --neighbors datasets/turing_100M/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 96 \
    --epochs 20 --output results/turing_100M

Microsoft SpaceV

A 100M subset is available from Hugging Face. The original 1B dataset can be pulled from AWS S3.

100M subset, i8, 100d, L2, ~9.3 GB

mkdir -p datasets/spacev_100M/ && \
    wget -nc https://huggingface.co/datasets/unum-cloud/ann-spacev-100m/resolve/main/base.100M.i8bin -P datasets/spacev_100M/ && \
    wget -nc https://huggingface.co/datasets/unum-cloud/ann-spacev-100m/resolve/main/query.30K.i8bin -P datasets/spacev_100M/ && \
    wget -nc https://huggingface.co/datasets/unum-cloud/ann-spacev-100m/resolve/main/groundtruth.30K.i32bin -P datasets/spacev_100M/
retri-eval-usearch \
    --vectors datasets/spacev_100M/base.100M.i8bin \
    --queries datasets/spacev_100M/query.30K.i8bin \
    --neighbors datasets/spacev_100M/groundtruth.30K.i32bin \
    --dtype f32,f16,i8 --metric l2 --threads 96 \
    --epochs 20 --output results/spacev_100M

Yandex Deep

A pre-built 10M subset and the full 1B dataset are available from Yandex.

10M subset, f32, 96d, L2, ~3.6 GB

mkdir -p datasets/deep_10M/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.10M.fbin -P datasets/deep_10M/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/query.public.10K.fbin -P datasets/deep_10M/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/groundtruth.public.10K.ibin -P datasets/deep_10M/
retri-eval-usearch \
    --vectors datasets/deep_10M/base.10M.fbin \
    --queries datasets/deep_10M/query.public.10K.fbin \
    --neighbors datasets/deep_10M/groundtruth.public.10K.ibin \
    --dtype f32,bf16,f16,i8 --metric l2 --threads 16 \
    --output results/deep_10M

1B, f32, 96d, L2, ~358 GB

mkdir -p datasets/deep_1B/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/base.1B.fbin -P datasets/deep_1B/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/query.public.10K.fbin -P datasets/deep_1B/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/DEEP/groundtruth.public.10K.ibin -P datasets/deep_1B/

Yandex Text-to-Image

A 1M subset and the full 1B dataset are available from Yandex.

1M subset, f32, 200d, Cos, ~750 MB

mkdir -p datasets/t2i/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/base.1M.fbin -P datasets/t2i/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/query.public.100K.fbin -P datasets/t2i/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/groundtruth.public.100K.ibin -P datasets/t2i/
retri-eval-usearch \
    --vectors datasets/t2i/base.1M.fbin \
    --queries datasets/t2i/query.public.100K.fbin \
    --neighbors datasets/t2i/groundtruth.public.100K.ibin \
    --dtype f32,bf16,f16,i8 --metric cos --threads 16 \
    --output results/t2i_1M

1B, f32, 200d, Cos, ~750 GB

mkdir -p datasets/t2i_1B/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/base.1B.fbin -P datasets/t2i_1B/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/query.public.100K.fbin -P datasets/t2i_1B/ && \
    wget -nc https://storage.yandexcloud.net/yandex-research/ann-datasets/T2I/groundtruth.public.100K.ibin -P datasets/t2i_1B/
