Skip to content

Performance comparison of ClickHouse vs PostgreSQL using LLM-style query patterns on realistic aircraft tracking data

License

Notifications You must be signed in to change notification settings

514-labs/LLM-query-test

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

LLM Query Performance Testing

This benchmark is designed to measure the experiential impact of database performance on LLM chat interactions. Unlike traditional benchmarks, this focuses on showing user experience differences rather than raw metrics.

  • Compare OLAP (ClickHouse) vs OLTP (PostgreSQL) using LLM-style query patterns with npm run bulk-test
  • Demonstrate how query latency affects AI-powered data conversations with npm run latency-sim

📊 Results → ClickHouse is faster on datasets >50K records (up to 16.8x faster at 10M records).

Latency Simulation

The above chat shows the resulting chat performance, at 10m rows, of Postgres (unindexed) and ClickHouse.

This repo ships with results for tests run from 10k to 10m rows (see Benchmarking Methodology for details).

Prerequisites

Setup

# Get the code
git clone https://github.com/514-labs/llm-test.git
cd llm-test

# Setup and build
npm install
npm run build

# Configure (optional - defaults work fine)
cp .env.example .env

Run chat sim

Since the repo ships with results, you can run the chat sim with: npm run latency-sim.

Run your own tests

If you want to run the benchmark yourself:

# Run Docker Desktop

# Optionally, edit .env to set the test sizes you are interested in. 
# Default: BULK_TEST_SIZES=10000,50000,100000,500000,1000000,5000000,10000000

# Run the bulk test tool
npm run bulk-test

Commands

  • npm start - Run Benchmark (data generation + queries)
  • npm run query-test - Query-only test with statistical analysis (requires databases running/populated)
  • npm run bulk-test - Comprehensive testing (configurable databases and dataset sizes)
  • npm run generate-graphs - Generate performance visualizations
  • npm run latency-sim - Interactive LLM conversation delay simulator
  • npm run clean - Reset databases and clear results

CLI options: see npm run help. Use -- --help for details for a particular command.

Data

46-column aircraft tracking records (position, altitude, transponder codes). See Schema Comparison

Query Pattern

LLM simulation: "How many aircraft are in the air on average every minute for the past hour?"

  1. Q1: Table discovery
  2. Q2: Schema exploration
  3. Q3: Hourly counts
  4. Q4: Average calculation

Configuration

Key .env settings: BULK_TEST_SIZES, BULK_TEST_DATABASES, BULK_TEST_CLEAR_RESULTS, DATASET_SIZE, BATCH_SIZE, PARALLEL_INSERT, container resources.

🔧 Full config →

Output

output/test-results.{json,csv} with timing data and 95% confidence intervals.

Methodology

  • Deterministic data (seeded)
  • 3x warmup queries
  • 95% CI across 100+ iterations
  • Database-specific optimizations

🔬 Details → | Schema →

Troubleshooting

  • Connection errors: docker ps
  • Memory issues: reduce DATASET_SIZE
  • Port conflicts: check 8123, 5432, 5433
  • Reset: npm run clean && npm run kill-dbs && npm run start-dbs

Built by the team behind the open source Moose Stack.

About

Performance comparison of ClickHouse vs PostgreSQL using LLM-style query patterns on realistic aircraft tracking data

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •