Compare the performance and scalability of FastAPI (Python) vs Express.js (Node.js) backed by Redis (single-threaded) vs Dragonfly (multi-threaded) under extreme load.
┌─────────────────────────────────────────────────────────────────┐
│ Host Machine │
│ │
│ ┌──────────┐ HTTP ┌──────────────────────────────────┐ │
│ │ oha │ ──────────> │ Docker │ │
│ │ (Rust) │ localhost │ │ │
│ └──────────┘ │ ┌────────┐ ┌─────────────┐ │ │
│ │ │FastAPI │────>│ Redis │ │ │
│ │ │:8000 │ │ :6379 │ │ │
│ │ └────────┘ └─────────────┘ │ │
│ │ or │ │
│ │ ┌────────┐ ┌─────────────┐ │ │
│ │ │Express │────>│ Dragonfly │ │ │
│ │ │:3000 │ │ :6379 │ │ │
│ │ └────────┘ └─────────────┘ │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
| Dimension | Values |
|---|---|
| Frameworks | FastAPI, Express.js |
| Data Stores | Redis 7, Dragonfly |
| Workers | 1, 2, 4 |
| Concurrency | 100, 500, 1,000, 2,000 |
| Target QPS | 1k, 5k, 10k, 20k (rate-limited) |
| Repetitions | 5 per configuration |
| Duration | 20 seconds per test |
Total: 4 configs (2 frameworks × 2 stores) × 3 worker counts × 16 load combos (4 concurrencies × 4 QPS targets) × 5 reps = 960 tests (~6 hours)
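The matrix above can be enumerated to sanity-check the run count and wall-clock budget (variable names mirror those in scripts/benchmark.sh; the time estimate ignores container restart overhead):

```python
from itertools import product

FRAMEWORKS = ["fastapi", "express"]
DATASTORES = ["redis", "dragonfly"]
WORKER_COUNTS = [1, 2, 4]
CONCURRENCIES = [100, 500, 1000, 2000]
QPS_TARGETS = [1000, 5000, 10000, 20000]
REPETITIONS = 5
DURATION_S = 20

# One run per (framework, store, workers, concurrency, qps, repetition).
runs = list(product(FRAMEWORKS, DATASTORES, WORKER_COUNTS,
                    CONCURRENCIES, QPS_TARGETS, range(REPETITIONS)))
total = len(runs)
busy_hours = total * DURATION_S / 3600  # pure load time, no setup/teardown
print(total, round(busy_hours, 1))      # 960 5.3
```

The gap between 5.3 hours of pure load and the quoted ~6 hours is container startup, seeding, and teardown between configurations.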
- Docker Desktop — 8GB RAM allocated, 8+ CPUs
- oha — Rust-based load generator
# Windows
winget install hatoo.oha
# macOS
brew install oha
# Linux
cargo install oha
- Python 3.10+ — For result parsing and report generation
pip install plotly pandas
git clone https://github.com/LCKYN/perf-api-framework.git
cd perf-api-framework
chmod +x scripts/benchmark.sh
# Quick smoke test
./scripts/benchmark.sh --quick
# Full benchmark (~6 hours)
./scripts/benchmark.sh
Open results/report.html in any browser — it's a self-contained interactive dashboard.
perf-api-framework/
├── fastapi/
│ ├── app.py # FastAPI async endpoint
│ ├── requirements.txt # Python dependencies
│ └── Dockerfile
├── express/
│ ├── app.js # Express.js endpoint
│ ├── cluster.js # Node.js cluster wrapper
│ ├── package.json # NPM dependencies
│ └── Dockerfile
├── scripts/
│ ├── benchmark.sh # Main orchestration script
│ ├── seed.sh # Pre-populate test data
│ └── parse_results.py # JSON → CSV parser
├── report/
│ ├── generate_report.py # CSV → HTML dashboard (Plotly)
│ └── requirements.txt # Python report dependencies
├── results/ # Generated after running
│ ├── raw/ # Raw oha JSON output
│ ├── results.csv # Per-run results
│ ├── results_avg.csv # Averaged results
│ └── report.html # Interactive dashboard
├── docker-compose.yml # Parameterized compose file
├── .env.example # Environment template
├── .gitignore
└── README.md
GET /cached-value — Reads a pre-seeded 1KB key (benchmark:data) from Redis/Dragonfly.
This isolates the framework + data-store path without filesystem or computation overhead.
Instead of max-throughput blasting, oha runs at controlled QPS targets (1k → 5k → 10k → 20k)
with --latency-correction to avoid the Coordinated Omission Problem. This finds the
exact breaking point where latency spikes (the "hockey stick") rather than just the ceiling.
| Workers | API CPUs | DB CPUs | API Memory | DB Memory |
|---|---|---|---|---|
| 1 | 1.0 | 2.0 | 512M | 512M |
| 2 | 2.0 | 2.0 | 512M | 512M |
| 4 | 4.0 | 2.0 | 512M | 512M |
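A docker-compose.yml fragment implementing this pinning might look like the sketch below (service names and variable defaults are illustrative, not the repo's actual file; docker compose v2 honors deploy.resources.limits without Swarm):

```yaml
services:
  fastapi:
    build: ./fastapi
    environment:
      - WORKERS=${WORKERS:-1}
      - DB_HOST=${DB_HOST:-redis}
    deploy:
      resources:
        limits:
          cpus: "${API_CPUS:-1.0}"   # 1.0 / 2.0 / 4.0 per the table above
          memory: 512M
  redis:
    image: redis:7
    deploy:
      resources:
        limits:
          cpus: "2.0"                # DB pinned to 2 CPUs in every run
          memory: 512M
```

Holding the DB at 2 CPUs while scaling API CPUs is what lets the 4-worker runs expose a single-threaded Redis bottleneck.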
- Actual RPS vs Target QPS — Where does each config plateau below the target?
- p95 Latency vs Target QPS — Where does latency spike (hockey stick)?
- RPS vs Workers — Is scaling linear? Where does it flatten?
- Redis vs Dragonfly — At 4 workers, where does single-threaded Redis bottleneck?
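These charts start from collapsing the five repetitions into one averaged row per configuration, the results.csv → results_avg.csv step. A sketch of that aggregation with pandas (synthetic rows, illustrative column names rather than the scripts' actual schema):

```python
import pandas as pd

# Hypothetical per-run rows, as parse_results.py might emit them.
runs = pd.DataFrame({
    "framework":  ["fastapi"] * 4,
    "datastore":  ["redis"] * 4,
    "workers":    [1, 1, 1, 1],
    "target_qps": [5000, 5000, 10000, 10000],
    "actual_rps": [4980, 4975, 8200, 8150],
    "p95_ms":     [4.1, 4.3, 60.0, 62.0],
})

# One averaged row per (framework, datastore, workers, target_qps) config.
avg = (runs
       .groupby(["framework", "datastore", "workers", "target_qps"],
                as_index=False)
       .mean(numeric_only=True))
print(avg)
```

Averaging over repetitions before plotting smooths run-to-run noise, which matters most near the saturation point where latency is unstable.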
Run individual components for debugging:
# Start FastAPI + Redis with 2 workers
WORKERS=2 DB_HOST=redis docker compose --profile fastapi --profile redis up -d
# Seed data
./scripts/seed.sh redis
# Quick oha test
oha -z 10s -c 100 -q 5000 --latency-correction --disable-keepalive \
--no-tui --output-format json http://localhost:8000/cached-value
# Tear down
docker compose --profile fastapi --profile redis down -v

Edit the variables at the top of scripts/benchmark.sh:
FRAMEWORKS=("fastapi" "express") # Add/remove frameworks
DATASTORES=("redis" "dragonfly") # Add/remove stores
WORKER_COUNTS=(1 2 4) # Worker permutations
CONCURRENCIES=(100 500 1000 2000) # Concurrent connections
QPS_TARGETS=(1000 5000 10000 20000) # Rate targets
DURATION="20s" # Test duration
REPETITIONS=5 # Runs per config

MIT