toydb

A from-scratch SQL database engine in Rust, built as a teaching project. ~13 700 lines of Rust, 288 tests, no production dependencies beyond thiserror and rustyline.

toydb> CREATE TABLE movies (id INT PRIMARY KEY, title TEXT, year INT, rating FLOAT);
toydb> INSERT INTO movies VALUES (1,'Sicario',2015,7.6),(2,'Arrival',2016,7.9),(3,'Dune',2021,8.1);
toydb> SELECT title, year FROM movies WHERE year >= 2016 ORDER BY rating DESC LIMIT 2;
+---------+------+
| title   | year |
+---------+------+
| Dune    | 2021 |
| Arrival | 2016 |
+---------+------+
(2 rows)

30-Second Summary

Signal	Details
Positioning	Rust SQL database engine from scratch, useful for showing storage, query execution, WAL, and index/planner fundamentals.
Stack	Rust, hand-written lexer/parser, SQL executor, in-memory + page-based disk engine, WAL, runtime `BTreeMap` secondary indexes.
Hard parts	SQL three-valued logic, joins/aggregates/subqueries, slotted pages, WAL replay, snapshot-style transactions, `CREATE INDEX` + `IndexScan` planning.
Quick start	`cargo run --release -- examples/index_demo.sql` to see `SeqScan` switch to `IndexScan`.
Validation	`cargo fmt --check`, `cargo clippy --all-targets -- -D warnings`, `cargo test --all`.
Benchmark / result	288 tests cover parser/executor/storage/WAL/index behavior; index demo shows update/delete maintenance, and index tests cover disk reopen/rebuild behavior.

What it does

Layer	Capabilities
SQL frontend	Hand-written lexer + recursive-descent parser, full positional error reporting
AST	DDL (CREATE / DROP / ALTER TABLE, CREATE / DROP INDEX), DML (INSERT incl. INSERT...SELECT, UPDATE, DELETE), SELECT (WHERE / GROUP BY / HAVING / ORDER BY / LIMIT / OFFSET / JOIN / DISTINCT / CASE WHEN), BEGIN / COMMIT / ROLLBACK, EXPLAIN
Type system	NULL, BOOLEAN, INTEGER, FLOAT, STRING — with SQL three-valued logic
Expressions	Arithmetic, comparison, logic, string concat, IS NULL, IN, BETWEEN, LIKE, `CASE WHEN`, scalar functions (`abs`, `lower`, `upper`, `length`, `substring`, `coalesce`, `nullif`, `iff`, `concat`, ...)
Aggregates	`COUNT`, `SUM`, `AVG`, `MIN`, `MAX` with `GROUP BY` / `HAVING` / `DISTINCT` (e.g. `COUNT(DISTINCT col)`)
Query shaping	`SELECT DISTINCT`, alias-aware `ORDER BY` (`ORDER BY <SELECT alias>`), `LIMIT` / `OFFSET`, `UNION` / `UNION ALL`, scalar subqueries `(SELECT ...)`
Joins	`INNER`, `LEFT`, `RIGHT` (nested-loop)
Planning + indexes	`SeqScan` by default; single-column equality predicates can use `IndexScan(index=..., key=...)` via `EXPLAIN SELECT`
Storage engines	In-memory `MemoryEngine` and disk-backed `DiskEngine` (page-based, slotted pages, free list)
Durability	Write-ahead log, recovery on open, torn-write tolerance
Transactions	Snapshot isolation: `BEGIN` clones state, `ROLLBACK` restores, `COMMIT` accepts
REPL	Interactive `toydb` CLI with table-formatted output, `.tables`, `.help`, multi-line input

What it does NOT

Network protocol (no Postgres / MySQL wire — single process REPL only)
Correlated subqueries, CTEs, window functions (only uncorrelated scalar subqueries supported)
ORDER BY / LIMIT on UNION'd results (we accept the unions but reject the trailing clauses; wrap with a query that's more capable than toydb if you need it)
Cost-based optimisation, statistics, or join reordering
Range, composite, covering, unique, or expression indexes; secondary indexes are single-column equality only
Distributed anything (no replication, no sharding, no consensus)
Concurrent transactions (toydb is single-threaded)
Persisted index pages: disk indexes persist as catalog metadata and are rebuilt from table pages after open/WAL replay
VACUUM / type-length enforcement (VARCHAR(N) is parsed but N ignored)
ALTER TABLE on the disk engine (memory only — would require row-level rewrite of every page)

Build & run

cargo build --release
cargo test --all                    # 288 tests across unit + integration
cargo clippy --all-targets -- -D warnings

# REPL — in memory
cargo run --release

# REPL — durable
cargo run --release -- --db ./mydb.toydb

# Run a SQL script
cargo run --release -- examples/movies.sql
cargo run --release -- --db ./mydb.toydb examples/orders.sql

Tour by example

Basics — `examples/movies.sql`

$ cargo run --release -- examples/movies.sql

Demonstrates CREATE TABLE, multi-row INSERT, WHERE / ORDER BY / LIMIT, expression aliases, UPDATE.

Joins, group-by, aggregates — `examples/orders.sql`

$ cargo run --release -- examples/orders.sql

Builds two tables, runs INNER JOIN and LEFT JOIN, groups by customer with COUNT(*), SUM, AVG, applies HAVING, refers to SELECT aliases in ORDER BY.

Transactions — `examples/txn.sql`

$ cargo run --release -- examples/txn.sql

Shows a successful transfer (BEGIN ... COMMIT) and an aborted one (BEGIN ... ROLLBACK), proving the rollback fully restored balance state.

Indexes and EXPLAIN — `examples/index_demo.sql`

$ cargo run --release -- examples/index_demo.sql

Shows CREATE INDEX idx_users_age ON users(age), EXPLAIN SELECT ... WHERE age = 20, equality lookup through IndexScan, and index maintenance after UPDATE / DELETE. Dropping the index returns the same query to SeqScan.

Everything together — `examples/library.sql`

$ cargo run --release -- examples/library.sql

Mixes joins, group-by, having, LIKE / IN / BETWEEN, CASE WHEN ... END, computed columns, transactions — a small SF/F bibliography.

Persistence

$ rm -f /tmp/mydb.toydb /tmp/mydb.toydb-wal
$ cargo run --release -- --db /tmp/mydb.toydb <<EOF
CREATE TABLE k (id INT PRIMARY KEY, v TEXT);
INSERT INTO k VALUES (1,'one'),(2,'two'),(3,'three');
EOF
$ cargo run --release -- --db /tmp/mydb.toydb <<EOF
SELECT * FROM k ORDER BY id;
EOF

The second invocation reads the rows the first one inserted.

REPL meta commands

toydb> .tables
toydb> .schema users
toydb> .help
toydb> .exit

Layout

src/
├── lib.rs             public API
├── error.rs           crate-wide Error
├── format.rs          ASCII-grid result rendering
├── sql/               lexer, parser, AST
├── types/             Value, DataType, Row primitives
├── catalog/           Table / Column metadata
├── engine/            storage backends
│   ├── memory.rs        in-memory + transactions via clone-on-begin
│   ├── index.rs         runtime BTreeMap index store
│   └── disk.rs          disk-backed via pager + WAL
├── executor/          query planner + execution
│   ├── plan.rs          statement dispatcher, FROM/WHERE/SELECT
│   ├── expr.rs          scalar expression evaluator
│   └── aggregate.rs     COUNT / SUM / AVG / MIN / MAX, group folding
└── storage/           low-level on-disk primitives
    ├── encoding.rs      hand-rolled binary serialiser
    ├── page.rs          fixed 8 KiB slotted page
    ├── pager.rs         file-backed paged storage with cache
    └── wal.rs           append-only write-ahead log
bin/toydb.rs           REPL entry
tests/                 end-to-end SQL tests (basic, CLI, comprehensive, index, persistence, stress, transactions)
examples/              sample SQL scripts

Test inventory

File	What it covers
`tests/sql_basic.rs`	end-to-end CRUD, NULL semantics, constraints, idempotent DDL, mutation error paths
`tests/sql_comprehensive.rs`	broad feature pipelines and NULL three-valued logic
`tests/sql_persistence.rs`	open/close/reopen, WAL replay, multi-page tables, drop-table durability
`tests/sql_index.rs`	CREATE/DROP INDEX, IndexScan planning, update/delete maintenance, disk reopen/rebuild
`tests/sql_stress.rs`	larger in-memory and disk smoke workloads
`tests/sql_transaction.rs`	BEGIN/COMMIT/ROLLBACK, snapshot semantics, nested-begin rejection
each `src/<mod>/*.rs`	tight per-module unit tests (lexer, parser, expr, aggregate, page, pager, wal, ...)

Design notes

AST carries strings, not source slices. Frees the planner from source-tied lifetimes and makes errors stand-alone.
Expression evaluation is uniform. WHERE, ORDER BY, projection, and HAVING all walk the same eval_with path; only the Resolver changes (single-table, wide for joins, group-aware for aggregates).
Aggregates dedupe by structural AST equality. SUM(price) reused in SELECT and HAVING shares one accumulator per group.
Snapshot isolation = clone-on-begin. Cheap and obvious for a teaching engine; the disk engine does not yet support transactions. Statement errors inside a memory-engine transaction do not automatically roll the transaction back, so clients should issue ROLLBACK explicitly if they want to abandon earlier successful statements in the same BEGIN.
Secondary indexes are metadata-persistent, tree-rebuilt. The catalog stores Index { name, table, column }. Engines maintain a runtime BTreeMap<Value, BTreeSet<RowId>>; the disk engine rebuilds it from recovered table pages on open. That keeps the pager format simple and makes WAL replay authoritative, at the cost of O(n) open time per indexed table.
Planner rule is deliberately small. For a single-table SELECT, WHERE indexed_col = <constant> or an AND containing that equality chooses IndexScan; the executor still evaluates the full WHERE predicate after fetching candidate rows, so extra filters remain correct.
Pager is the only thing that touches disk. Layers above never open files directly, which keeps the I/O surface tiny.
WAL is intentionally minimal. No LSN, no checkpoint, no group commit — recovery just replays records onto the catalog and pages.

Index design and limits

Syntax:

CREATE INDEX idx_users_age ON users(age);
DROP INDEX idx_users_age;
EXPLAIN SELECT id FROM users WHERE age = 20;

Plan shape:

SeqScan `users`
  Filter (WHERE)
  Project (2 items)

IndexScan `users` (index=idx_users_age, key=20)
  Filter (WHERE)
  Project (2 items)

Complexity:

Operation	Without index	With equality index
`WHERE col = v` candidate lookup	O(n) row scan	O(log m + k) in the runtime B-tree
`INSERT`	O(1) append/page insert plus constraints	plus O(log m) per index
`UPDATE indexed_col`	O(n) to find target today	plus remove/insert O(log m)
`DELETE`	O(n) to find target today	plus remove O(log m)
Disk open	O(catalog)	O(rows in indexed tables) rebuild

Small benchmark / smoke workload:

cargo run --release -- examples/index_demo.sql

The demo prints the same SELECT ... WHERE age = 20 results before and after index creation. The visible benchmark signal is the plan change: the first EXPLAIN emits SeqScan, the second emits IndexScan (index=idx_users_age, key=20). For larger tables this changes candidate discovery from scanning every row to probing the B-tree and then evaluating the remaining WHERE predicate over only matching row ids.

See PLAN.md for the milestone-by-milestone walkthrough and CLAUDE.md for the in-repo coding conventions.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
examples		examples
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
PLAN.md		PLAN.md
README.md		README.md
REVIEW_AND_FIX_REPORT.md		REVIEW_AND_FIX_REPORT.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

toydb

30-Second Summary

What it does

What it does NOT

Build & run

Tour by example

Basics — `examples/movies.sql`

Joins, group-by, aggregates — `examples/orders.sql`

Transactions — `examples/txn.sql`

Indexes and EXPLAIN — `examples/index_demo.sql`

Everything together — `examples/library.sql`

Persistence

REPL meta commands

Layout

Test inventory

Design notes

Index design and limits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

toydb

30-Second Summary

What it does

What it does NOT

Build & run

Tour by example

Basics — examples/movies.sql

Joins, group-by, aggregates — examples/orders.sql

Transactions — examples/txn.sql

Indexes and EXPLAIN — examples/index_demo.sql

Everything together — examples/library.sql

Persistence

REPL meta commands

Layout

Test inventory

Design notes

Index design and limits

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Basics — `examples/movies.sql`

Joins, group-by, aggregates — `examples/orders.sql`

Transactions — `examples/txn.sql`

Indexes and EXPLAIN — `examples/index_demo.sql`

Everything together — `examples/library.sql`

Packages