This repository contains the source code, methodology, experiment transcripts, correctness audit, and benchmark harnesses for the article:
"AI Does Not Replace Developers. It Amplifies What They Lack."
kqsqlite is a fork of a 487K LOC LLM-generated Rust rewrite of SQLite, forked under its MIT license for this experiment. The original rewrite was built entirely by Claude Opus 4 via Claude Code. We used the same model to diagnose its performance bugs, with three different human personas providing the prompts.
Three personas prompted the same LLM (Claude Opus 4) to diagnose performance bugs in kqsqlite. Same codebase, same model, same tools. Only the human's framing changed.
| Persona | Prompts | Result |
|---|---|---|
| L0 - Vibecoder ("make it faster") | 7 | 4.5x behind SQLite |
| L2 - Mid-level Dev (ratio analysis) | 32 | ~270x improvement, 5 bugs found |
| L5 - Domain Expert (tier framework) | 73 | ~900x improvement, 5/6 bugs found |
├── METHODOLOGY.md                       Experiment design, benchmarking protocol, scoring
├── CORRECTNESS_AUDIT.md                 Known correctness gaps and testing status
├── experiment/
│   ├── l0-tldr.md                       L0 (Vibecoder) session transcript
│   ├── l2-tldr.md                       L2 (Mid-level Dev) session transcript
│   ├── l5-tldr.md                       L5 (Domain Expert) session transcript
│   └── l5-snapshot-a-benchmarks.md      Pre-experiment baseline + scale validation
└── src/                                 Full Rust source + C benchmark harnesses
    ├── Cargo.toml / Cargo.lock
    ├── crates/                          27 Rust crates (kqsqlite-*)
    ├── bench/                           C benchmark harnesses + sqlite3.h
    ├── e2e/                             End-to-end tests
    ├── fuzz/                            Fuzz targets (parser, record codec)
    └── tests/                           Integration tests
Warning: a full cargo build of all 27 crates will use 8-12 GB of RAM and
significant disk space. On memory-constrained machines, build incrementally:
cd src
# Build individual crates one at a time
cargo build -p kqsqlite-types
cargo build -p kqsqlite-error
cargo build -p kqsqlite-parser
cargo build -p kqsqlite-btree
# ... etc
# Or check without producing binaries (less disk pressure)
cargo check -p kqsqlite-core
# To build the C API shared library (needed for benchmarks)
cargo build -p kqsqlite-c-api --release
# Use a separate target dir if /tmp has more space than your working dir
cargo build -p kqsqlite-c-api --release --target-dir /tmp/kqsqlite-build

The C benchmark harnesses in bench/ link against libkqsqlite_c_api.so.
See bench/README.md and bench/run.sh for compilation and execution.
cd src
# 1. Build the C API shared library
cargo build -p kqsqlite-c-api --release
# 2. Compile a benchmark harness (e.g., the micro benchmark)
cd bench
gcc -O2 -o benchmark_kqsqlite benchmark.c -I include \
-L ../target/release -lkqsqlite_c_api -lpthread -ldl -lm
# 3. Run (with methodology controls)
# (the variable assignment must precede taskset, which would otherwise try to execute it as the command)
LD_LIBRARY_PATH=../target/release taskset -c 0 ./benchmark_kqsqlite :memory:

All benchmarks use taskset -c 0, 5-run medians, a load-average gate (<0.5),
:memory: mode, and sequential runs only (never parallel, because CPU contention
skews results). See METHODOLOGY.md for the full protocol.
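
The controls listed above can be sketched as a few shell helpers. This is a minimal illustration, not the contents of bench/run.sh: the function names (`median`, `load_is_quiet`, `timed_runs`) are hypothetical, the gate is assumed to apply to the 1-minute load average from /proc/loadavg, and whole-process wall-clock time (via GNU `date +%s%N`) stands in for whatever timing the harnesses report themselves.

```shell
# Hypothetical helpers sketching the protocol; not taken from bench/run.sh.

# Print the median of the numeric arguments: the middle value after sorting
# (for an even count, the lower of the two middle values).
median() {
    printf '%s\n' "$@" | sort -n |
        awk '{ v[NR] = $1 } END { print v[int((NR + 1) / 2)] }'
}

# Succeed only if the first field of the load-average file is below the
# threshold. Assumption: the gate uses the 1-minute average.
#   $1 = threshold (default 0.5), $2 = file to read (default /proc/loadavg)
load_is_quiet() {
    awk -v t="${1:-0.5}" '{ exit !($1 < t) }' "${2:-/proc/loadavg}"
}

# Run a command N times, strictly one after another, pinned to CPU 0, waiting
# for a quiet machine before each run. Prints the median wall-clock time in ms.
#   $1 = number of runs, remaining arguments = command to time
timed_runs() {
    runs="$1"; shift
    times=""
    i=0
    while [ "$i" -lt "$runs" ]; do
        until load_is_quiet; do sleep 5; done
        start=$(date +%s%N)            # nanoseconds; requires GNU date
        taskset -c 0 "$@" > /dev/null
        end=$(date +%s%N)
        times="$times $(( (end - start) / 1000000 ))"
        i=$(( i + 1 ))
    done
    # Intentional word splitting: each recorded time becomes one argument.
    median $times
}
```

With LD_LIBRARY_PATH exported as in step 3, a call such as `timed_runs 5 ./benchmark_kqsqlite :memory:` would print one median in milliseconds. METHODOLOGY.md remains the authoritative description of the protocol.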
- Methodology - Experiment design, constraint (vanilla prompting only), personas, ground truth bugs, benchmarking protocol, scoring rubric
- Correctness Audit - Eleven known gaps, seven unsafe optimization sites, seven plausible-but-fragile items, and what production would require
kqsqlite is an experiment, not production software. It has known correctness gaps
(see CORRECTNESS_AUDIT.md), missing SQL features, and untested
code paths. The performance numbers in this experiment measure computational efficiency
in :memory: mode only - file-backed I/O, concurrent access, and crash recovery are
not benchmarked.
The experiment demonstrates how human framing quality affects LLM-assisted diagnosis. It does not claim kqsqlite is suitable for any production use case.
MIT - see src/LICENSE for the original copyright notice.