This benchmark is designed to measure the experiential impact of database performance on LLM chat interactions. Unlike traditional benchmarks, this focuses on showing user experience differences rather than raw metrics.
- Compare OLAP (ClickHouse) vs OLTP (PostgreSQL) using LLM-style query patterns with
npm run bulk-test - Demonstrate how query latency affects AI-powered data conversations with
npm run latency-sim
📊 Results → ClickHouse is faster on datasets >50K records (up to 16.8x faster at 10M records).
The above chat shows the resulting chat performance, at 10m rows, of Postgres (unindexed) and ClickHouse.
This repo ships with results for tests run from 10k to 10m rows (see Benchmarking Methodology for details).
# Get the code
git clone https://github.com/514-labs/llm-test.git
cd llm-test
# Setup and build
npm install
npm run build
# Configure (optional - defaults work fine)
cp .env.example .env
Since the repo ships with results, you can run the chat sim with: npm run latency-sim.
If you want to run the benchmark yourself:
# Run Docker Desktop
# Optionally, edit .env to set the test sizes you are interested in.
# Default: BULK_TEST_SIZES=10000,50000,100000,500000,1000000,5000000,10000000
# Run the bulk test tool
npm run bulk-testnpm start- Run Benchmark (data generation + queries)npm run query-test- Query-only test with statistical analysis (requires databases running/populated)npm run bulk-test- Comprehensive testing (configurable databases and dataset sizes)npm run generate-graphs- Generate performance visualizationsnpm run latency-sim- Interactive LLM conversation delay simulatornpm run clean- Reset databases and clear results
CLI options: see npm run help. Use -- --help for details for a particular command.
46-column aircraft tracking records (position, altitude, transponder codes). See Schema Comparison
LLM simulation: "How many aircraft are in the air on average every minute for the past hour?"
- Q1: Table discovery
- Q2: Schema exploration
- Q3: Hourly counts
- Q4: Average calculation
Key .env settings: BULK_TEST_SIZES, BULK_TEST_DATABASES, BULK_TEST_CLEAR_RESULTS, DATASET_SIZE, BATCH_SIZE, PARALLEL_INSERT, container resources.
output/test-results.{json,csv} with timing data and 95% confidence intervals.
- Deterministic data (seeded)
- 3x warmup queries
- 95% CI across 100+ iterations
- Database-specific optimizations
- Connection errors:
docker ps - Memory issues: reduce
DATASET_SIZE - Port conflicts: check 8123, 5432, 5433
- Reset:
npm run clean && npm run kill-dbs && npm run start-dbs
Built by the team behind the open source Moose Stack.
