# VecOps-Bench: Measuring the Hidden Cost of Data Churn in Production Vector Databases
**Key Finding:** All tested HNSW indexes lose 3.4–9.7% of their Recall@10 after just 10% data churn, and churn-operation speed varies by roughly 38× between the fastest and slowest systems.
## Motivation

Current benchmarks (ANN-Benchmarks, VectorDBBench) evaluate Day 1 performance:
- Fresh index, static corpus, synthetic queries
Real production systems face Day 2 challenges:
- Continuous updates (documents added/removed)
- Index degradation from data churn
- Compaction overhead and timing
- Memory pressure during operations
VecOps-Bench measures what happens when production reality meets HNSW theory.
## Headline Results

### Recall@10 over 10 churn cycles

Degradation is relative: (initial - final) / initial.

| Database | Initial Recall@10 | After 10 Cycles | Relative Degradation |
|---|---|---|---|
| pgvector | 83.13% | 75.06% | 9.71% |
| Milvus | 97.84% | 89.28% | 8.75% |
| Chroma | 88.99% | 81.61% | 8.29% |
| Weaviate | 81.80% | 79.06% | 3.35% |
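
Recall@10 here is the overlap between each query's top-10 results from the index and the exact top-10 computed by brute force. A minimal sketch of the metric (names are illustrative; the repo's implementation lives in `src/metrics/`):

```python
import numpy as np

def recall_at_k(truth: np.ndarray, returned: np.ndarray, k: int = 10) -> float:
    """Mean fraction of the true top-k neighbor IDs found per query."""
    hits = sum(len(set(t[:k]) & set(r[:k])) for t, r in zip(truth, returned))
    return hits / (k * len(truth))

# One query, 8 of its 10 true neighbors retrieved -> recall 0.8
truth = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
returned = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 99, 100]])
assert abs(recall_at_k(truth, returned) - 0.8) < 1e-9
```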
### Churn operation speed

| Database | Avg Delete (100K vectors) | Avg Insert (100K vectors) | Full Cycle |
|---|---|---|---|
| Milvus | 22.9s | 14.3s | 37s |
| Weaviate | 94.9s | 94.9s | 190s |
| pgvector | 87.5s | 1307.8s | 1395s |
| Chroma | 671.5s | 781.8s | 1453s |
A full delete+insert cycle is ~38× slower on pgvector than on Milvus (1395 s / 37 s ≈ 38); Chroma is slower still at 1453 s (~39×).
## Quick Start

Prerequisites:

- Docker & Docker Compose
- Python 3.9+
- 64GB+ RAM recommended for 10M scale
```bash
# Clone repository
git clone https://github.com/debu-sinha/vecops-bench.git
cd vecops-bench
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Start databases
docker-compose up -d
# Run baseline benchmark
python scripts/run_benchmark_v2.py --scale 1000000
# Run temporal drift analysis (THE KEY EXPERIMENT)
python scripts/run_churn_test_v2.py --database milvus --cycles 10
```

## Datasets

- Primary: Cohere Wikipedia embeddings (9.99M × 768 dimensions)
- Validation: SIFT1M (1M × 128 dimensions)
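
A hypothetical loader for these inputs, assuming you have exported both datasets to local float32 `.npy` files (the paths and layout are assumptions about your setup, not a documented repo convention):

```python
import numpy as np

# Memory-map the 9.99M x 768 corpus so it is not read into RAM eagerly.
corpus = np.load("data/cohere_wiki_768d.npy", mmap_mode="r")   # hypothetical path
queries = np.load("data/heldout_queries.npy")                  # 10K held-out queries

assert corpus.shape[1] == 768 and queries.shape[1] == 768
print(f"{corpus.shape[0]:,} corpus vectors, {queries.shape[0]:,} queries")
```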
## HNSW Configuration

| Parameter | Value | Rationale |
|---|---|---|
| M | 16 | Industry standard |
| ef_construction | 128 | Balanced build/quality |
| ef_search | 100 | Balanced recall/latency |
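
As an example of how these knobs are applied, a sketch for Milvus via `pymilvus` (collection and field names are illustrative; the other databases take the same three parameters through their own index-config APIs):

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
coll = Collection("vecops_wiki")  # assumes the collection already exists

# M and ef_construction are fixed at index build time.
coll.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "L2",
        "params": {"M": 16, "efConstruction": 128},
    },
)

# ef_search is supplied per query:
# coll.search(vecs, "embedding", param={"params": {"ef": 100}}, limit=10)
```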
## Churn Protocol

1. Load 9.99M vectors + 10K held-out queries.
2. Each cycle: DELETE the 100K oldest vectors, then INSERT 100K new ones.
3. Measure Recall@10 after each cycle (see the sketch below).
4. 10 cycles = 10% total corpus turnover.
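
A compressed sketch of that loop, against a hypothetical adapter exposing `oldest_ids`/`delete_ids`/`insert`/`search` (the real drivers live in `src/databases/` and `scripts/run_churn_test_v2.py`):

```python
import time

def recall_at_10(truth, returned):
    return sum(len(set(t[:10]) & set(r[:10])) for t, r in zip(truth, returned)) / (10 * len(truth))

def run_churn(db, stream, queries, exact_search, cycles=10, batch=100_000):
    log = []
    for cycle in range(1, cycles + 1):
        t0 = time.perf_counter()
        db.delete_ids(db.oldest_ids(batch))   # DELETE the 100K oldest vectors
        t1 = time.perf_counter()
        db.insert(stream.next_batch(batch))   # INSERT 100K new vectors
        t2 = time.perf_counter()
        # Ground truth must track the churned corpus, e.g. via brute-force search.
        truth = exact_search(queries, k=10)
        returned = db.search(queries, k=10)
        log.append({"cycle": cycle, "delete_s": t1 - t0,
                    "insert_s": t2 - t1, "recall@10": recall_at_10(truth, returned)})
    return log
```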
## Database Coverage

| Database | Version | Status |
|---|---|---|
| Milvus | 2.3.4 | ✅ Complete |
| pgvector | 0.8.0 | ✅ Complete |
| Weaviate | 1.27.0 | ✅ Complete |
| Chroma | 0.6.3 | ✅ Complete |
| Qdrant | 1.7.4 | |
## Repository Structure

```
vecops-bench/
├── src/
│ ├── databases/ # Database adapters
│ ├── metrics/ # Recall computation
│ └── resilience/ # Cold start, crash recovery
├── scripts/
│ ├── run_benchmark_v2.py # Baseline benchmark
│ ├── run_churn_test_v2.py # Temporal drift analysis
│ ├── run_sift1m_*.py # SIFT1M validation
│ └── generate_vldb_figures.py # Publication figures
├── paper/
│ ├── vecops_bench_vldb2026.tex # VLDB 2026 submission
│ ├── figures/ # Publication figures
│ └── CLAIMS_VALIDATION.md # Data verification
├── results_v2/ # Experimental results (JSON)
└── docker-compose.yaml # Database containers
```
## Reproducing the Figures

```bash
python scripts/generate_vldb_figures.py
# Output: paper/figures/fig1_recall_degradation.pdf (etc.)
```

All claims in the paper are validated against the experimental JSON files in `results_v2/`; see `paper/CLAIMS_VALIDATION.md` for the full verification.
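
For example, a hypothetical spot-check of one headline number against the raw results (the file name and JSON fields below are assumptions, not a documented schema):

```python
import json

# Assumed layout: one JSON per churn run with per-cycle Recall@10 values.
with open("results_v2/churn_milvus.json") as f:
    run = json.load(f)

initial, final = run["recall_by_cycle"][0], run["recall_by_cycle"][-1]
print(f"Milvus relative degradation: {(initial - final) / initial:.2%}")  # table reports 8.75%
```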
## Hardware

AWS EC2 instance:
- Type: Memory-optimized (i4i.4xlarge or similar)
- RAM: 128GB
- vCPUs: 16
- Storage: NVMe SSD
## Citation

```bibtex
@article{sinha2025vecops,
title={{VecOps-Bench}: Measuring the Hidden Cost of Data Churn
in Production Vector Databases},
author={Sinha, Debu},
journal={Proceedings of the VLDB Endowment},
year={2026},
note={Under review}
}
```

## License

MIT License - see LICENSE for details.
## Contact

- Author: Debu Sinha
- GitHub: [@debu-sinha](https://github.com/debu-sinha)
- Issues: [GitHub Issues](https://github.com/debu-sinha/vecops-bench/issues)