This project benchmarks the ingestion performance of three popular vector databases (Qdrant, Weaviate, and ChromaDB) using synthetic business and product data.
The benchmark measures:
- Insertion throughput (records per second)
- Total ingestion time
- Index build time
- Final storage size on disk
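This README does not say how the benchmark measures storage size on disk. One plausible approach, sketched here purely as an assumption, is to recursively sum the sizes of the files in the directory where a database keeps its data (for example a bind-mounted Docker volume). The path in the usage note is hypothetical.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Recursively sum the sizes of regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def format_mb(num_bytes: int) -> str:
    """Format a byte count in megabytes (1 MB = 1024 * 1024 bytes here)."""
    return f"{num_bytes / (1024 * 1024):.2f} MB"
```

For example, `format_mb(directory_size_bytes("./qdrant_storage"))` would report the size of a hypothetical `./qdrant_storage` data directory.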
All three databases ingest the same vector embeddings, generated with the `sentence-transformers/all-MiniLM-L6-v2` model.
Project structure:

```
vector-db-benchmark/
├── benchmark_vector_db_ingestion.py  # Main benchmarking script
├── docker-compose.yml                # Docker configuration for vector DBs
├── requirements.txt                  # Python dependencies
├── businesses.csv                    # Synthetic business data (CSV)
├── businesses.json                   # Synthetic business data (JSON)
├── products.csv                      # Synthetic product data (CSV)
├── products.json                     # Synthetic product data (JSON)
└── benchmark_results.txt             # Benchmark results output
```
Prerequisites:
- Docker Desktop installed and running
- Python 3.8+ (ideally in a virtual environment)
- At least 4 GB of free RAM
Start the vector databases:

```bash
docker-compose up -d
```

This starts:
- Qdrant on ports 6333 (HTTP) and 6334 (gRPC)
- Weaviate on ports 8081 (HTTP) and 50051 (gRPC)
- ChromaDB on port 8000
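Install the Python dependencies (the activation command below is for Windows PowerShell):

```powershell
# Activate the virtual environment (if using one)
.\vector_db_env\Scripts\Activate.ps1
# On Linux/macOS use instead: source vector_db_env/bin/activate
# Install required packages
pip install -r requirements.txt
```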
Run the benchmark:

```bash
python benchmark_vector_db_ingestion.py
```
Results from a benchmark run:

| Database | Records | Ingestion Time (s) | Index Time (s) | Throughput (rec/s) |
|---|---:|---:|---:|---:|
| Qdrant | 12,173 | 12.39 | 0.25 | 982.43 |
| Weaviate | 12,173 | 7.14 | 0.22 | 1,704.00 |
| ChromaDB | 12,173 | 15.78 | 0.01 | 771.44 |
Key findings:
- Weaviate showed the highest throughput at 1,704 records/second
- ChromaDB had the fastest index build time (0.01 s) but the lowest throughput (771.44 records/second)
- Qdrant was second in throughput at 982.43 records/second and had the slowest index build time (0.25 s)
- All databases successfully ingested 12,173 records (100 businesses + 12,073 products)
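The throughput column is consistent with records divided by ingestion time alone; including the index time would give lower figures (about 963 rec/s for Qdrant). The sketch below recomputes throughput from the table values. Because the table's times are rounded to two decimals, the recomputed values differ from the reported column by less than 1 rec/s.

```python
# (records, ingestion time in seconds), copied from the results table above.
RESULTS = {
    "Qdrant": (12_173, 12.39),
    "Weaviate": (12_173, 7.14),
    "ChromaDB": (12_173, 15.78),
}


def throughput(records: int, ingestion_seconds: float) -> float:
    """Records per second over the ingestion phase only (index time excluded)."""
    if ingestion_seconds <= 0:
        raise ValueError("ingestion time must be positive")
    return records / ingestion_seconds


for db, (records, seconds) in RESULTS.items():
    print(f"{db:<9} {throughput(records, seconds):>9.2f} rec/s")
```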
Business records (`businesses.csv`, `businesses.json`):
- `business_id`: Unique identifier
- `business_name`: Company name
- `email`: Contact email
- `business_type`: Category (transport, online retail, hotel)
- `branches`: Pipe-separated list of branch addresses

Product records (`products.csv`, `products.json`):
- `product_id`: Unique identifier
- `product_name`: Product name
- `quantity`: Stock quantity
- `price`: Product price
- `business_id`: Foreign key to business
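A minimal sketch of loading the CSV files with Python's standard `csv` module, assuming the column headers match the field names above, `quantity` holds integers, and `price` holds decimal numbers. It splits the pipe-separated `branches` field and groups products under their business via the `business_id` foreign key. The function names are hypothetical, not taken from the benchmark script.

```python
import csv
from collections import defaultdict
from typing import Dict, List, TextIO


def load_businesses(stream: TextIO) -> Dict[str, dict]:
    """Index business rows by business_id and split the pipe-separated branches."""
    businesses = {}
    for row in csv.DictReader(stream):
        row["branches"] = [b.strip() for b in row["branches"].split("|") if b.strip()]
        businesses[row["business_id"]] = row
    return businesses


def group_products(stream: TextIO, businesses: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Group product rows by business_id, failing on a dangling foreign key."""
    grouped = defaultdict(list)
    for row in csv.DictReader(stream):
        if row["business_id"] not in businesses:
            raise ValueError(f"unknown business_id: {row['business_id']}")
        row["quantity"] = int(row["quantity"])
        row["price"] = float(row["price"])
        grouped[row["business_id"]].append(row)
    return dict(grouped)
```

Open the files with `newline=""`, as the `csv` module recommends, for example `open("businesses.csv", newline="", encoding="utf-8")` (the UTF-8 encoding is an assumption).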
Embedding configuration:
- Model: `sentence-transformers/all-MiniLM-L6-v2`
- Dimension: 384
- Text representation:
  - Businesses: `{business_name} {email} {business_type}`
  - Products: `{product_name} quantity: {quantity} price: {price}`
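The text templates above can be expressed as small formatting helpers, sketched here with hypothetical function names. The `embed` helper assumes the `sentence-transformers` package is installed; it imports it lazily so the template helpers work without it, and it reloads the model on every call, so a real run would load the model once and reuse it.

```python
from typing import List, Mapping


def business_text(business: Mapping) -> str:
    """Text embedded for a business: {business_name} {email} {business_type}."""
    return f"{business['business_name']} {business['email']} {business['business_type']}"


def product_text(product: Mapping) -> str:
    """Text embedded for a product: {product_name} quantity: {quantity} price: {price}."""
    return f"{product['product_name']} quantity: {product['quantity']} price: {product['price']}"


def embed(texts: List[str]):
    """Encode texts into 384-dimensional vectors with all-MiniLM-L6-v2."""
    # Lazy import: only needed when actually computing embeddings.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    # encode() on a list returns a NumPy array of shape (len(texts), 384).
    return model.encode(texts)
```

Note that the `branches` field is not part of the business text, matching the template listed above.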
To stop and remove all Docker containers:

```bash
docker-compose down
```

To stop and remove containers along with their volumes (this deletes all data):

```bash
docker-compose down -v
```
If you encounter port conflicts:
- Edit `docker-compose.yml` to use different ports
- Update the port numbers in `benchmark_vector_db_ingestion.py` accordingly
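To spot conflicts before running `docker-compose up -d`, a quick check of whether anything is already listening on the host ports listed in this README can help. This is a minimal sketch using only the standard `socket` module; a port reported as "in use" is a likely conflict unless the benchmark containers are already running.

```python
import socket

# Host ports listed in this README for the three databases.
DEFAULT_PORTS = {
    "Qdrant HTTP": 6333,
    "Qdrant gRPC": 6334,
    "Weaviate HTTP": 8081,
    "Weaviate gRPC": 50051,
    "ChromaDB": 8000,
}


def is_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "in use" if is_port_in_use(port) else "free"
        print(f"{name:<14} {port:>5}: {state}")
```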
Check container logs:

```bash
docker-compose logs [service-name]
# Example: docker-compose logs weaviate
```
Minor resource warnings from the ChromaDB client are expected and do not affect the benchmark results.
Batch sizes used during ingestion:
- Qdrant: 100 records per batch
- Weaviate: Dynamic batching
- ChromaDB: 5,000 records per batch
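The fixed batch sizes above amount to slicing the record list into consecutive chunks, with each chunk then passed to the database client's insert call (for example Qdrant's `upsert` or a ChromaDB collection's `add`). A minimal sketch with a hypothetical `batched` helper; Weaviate's dynamic batching is handled by its own client and is not modeled here.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Fixed batch sizes listed above.
BATCH_SIZES = {"Qdrant": 100, "ChromaDB": 5_000}


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items; the last may be shorter."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With 12,173 records, this yields 122 Qdrant batches (the last holding 73 records) and 3 ChromaDB batches of 5,000, 5,000, and 2,173. On Python 3.12+, `itertools.batched` does the same job, yielding tuples instead of lists.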
All databases use cosine similarity for vector search.
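Cosine similarity compares the direction of two vectors: cos(a, b) = (a · b) / (|a| |b|), ranging from -1 to 1. The pure-Python sketch below is for illustration only; the databases compute this internally. Some engines report cosine distance (1 - similarity) rather than similarity, so raw scores may not be directly comparable across databases.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, 2 for opposite directions."""
    return 1.0 - cosine_similarity(a, b)
```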
This is a benchmarking project for educational purposes.
Feel free to extend this benchmark with:
- Additional vector databases
- Different embedding models
- Query performance tests
- Larger datasets
- Memory usage metrics
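As a possible starting point for the query performance tests suggested above, the sketch below times repeated searches and reports mean, median, and 95th-percentile latency. Here `search` is a hypothetical callable that would wrap one database's similarity-search call for a single query vector; `measure_query_latency` is likewise a hypothetical name, not part of the existing script.

```python
import math
import statistics
import time
from typing import Any, Callable, Dict, List, Sequence

Vector = Sequence[float]


def measure_query_latency(
    search: Callable[[Vector], Any],
    queries: List[Vector],
    warmup: int = 5,
) -> Dict[str, float]:
    """Time each search call sequentially and summarize latency in milliseconds."""
    if not queries:
        raise ValueError("need at least one query")
    for query in queries[:warmup]:
        search(query)  # warm-up calls (caches, connections) are not recorded
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Nearest-rank 95th percentile.
    p95 = latencies_ms[math.ceil(0.95 * len(latencies_ms)) - 1]
    total_s = sum(latencies_ms) / 1000.0
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": p95,
        # Single-client, sequential queries per second.
        "qps": len(latencies_ms) / total_s if total_s > 0 else float("inf"),
    }
```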