StoreLite — Log-Structured Key-Value Storage Engine

A production-pattern storage engine in C++17 on Linux, implementing the core internals of systems like LevelDB, Bitcask, and NetApp ONTAP's WAFL: append-only writes, write-ahead logging, crash recovery, LRU caching, and compaction.

StoreLite is a log-structured key-value storage engine inspired by LSM-tree based systems such as LevelDB and RocksDB.

Benchmark Results (Ubuntu, VirtualBox)

Mode	Write	Read	Compaction Savings
No-sync / No-WAL (max throughput)	407K ops/s / 58.7 MB/s	308K ops/s	76.3%
WAL disabled, fsync enabled	255 ops/s	445K ops/s	2.2%
Full durable (sync + WAL)	130 ops/s	397K ops/s	55.1%

The 3,100x write throughput gap between durable and no-sync mode is not a bug. It is the fundamental storage engineering tradeoff — fdatasync waits for disk confirmation on every write. This is why real systems use WAL group commit, write coalescing, and battery-backed NVRAM.

Architecture

  Put("k", "v")
       │
       ▼
  ┌─────────┐  O_SYNC   ┌───────────┐
  │   WAL   │ ────────► │  wal.log  │  ← durable before segment write
  └────┬────┘           └───────────┘
       │
       ▼
  ┌──────────────┐  pwrite  ┌──────────────────┐
  │  Active Seg  │ ───────► │   N.seg  (disk)  │  ← append-only
  │ (write-only) │          └──────────────────┘
  └──────┬───────┘
         │  rotate when full (default 64MB)
         ▼
  ┌──────────────┐
  │  Immutable   │  ──►  Compactor  ──►  merged segment
  │  Segments    │        (background)     stale data reclaimed
  └──────────────┘
         │
         ▼
  ┌──────────────┐
  │  Hash Index  │  key → { seg_id, offset, val_len, timestamp_us }
  │ (in-memory)  │
  └──────┬───────┘
         │  cache-first on reads
         ▼
  ┌──────────────┐
  │  LRU Cache   │  O(1) get/put — doubly-linked list + hashmap
  └──────────────┘

On-Disk Record Format

┌──────────┬────────────────┬──────────┬──────────┬──────────────────────┐
│ crc32: 4 │ timestamp_us:8 │ key_len:4│ val_len:4│  key  │    value     │
└──────────┴────────────────┴──────────┴──────────┴───────┴──────────────┘
  ◄──────────────── 20-byte header ─────────────────►  variable length

val_len == 0xFFFFFFFF  →  tombstone (deletion marker)

CRC32 covers all bytes after the CRC field — detects bit rot and partial writes
Timestamp enables last-write-wins conflict resolution during compaction
All multi-byte fields are little-endian

Components

File	Lines	Responsibility
`record.cpp`	~130	CRC32 (table-driven), encode/decode, tombstone encoding
`segment.cpp`	~160	Append-only file; `pwrite`/`pread`/`fdatasync`; full scan for recovery
`wal.cpp`	~90	`O_SYNC` write-ahead log; replay on crash; truncate on clean shutdown
`lru_cache.cpp`	~70	Fixed-capacity LRU; O(1) via doubly-linked list + `unordered_map`
`compactor.cpp`	~100	Scan N segments → emit 1; keep latest timestamp per key; unlink stale files
`engine.cpp`	~300	Orchestrator; `std::shared_mutex` for concurrent reads; segment rotation
`main.cpp`	~180	CLI benchmark (write/read/mixed/delete) + interactive REPL

Key Systems Concepts

Concept	Where used
Log-structured writes	Every write is a sequential `pwrite` — no random I/O on write path
WAL + crash recovery	`O_SYNC` guarantees durability; WAL replayed on startup; partial records at end ignored
`fdatasync` vs `fsync`	Data-only flush per write — skips inode metadata update
`pwrite` / `pread`	Offset-explicit I/O — safe for concurrent readers without seeking
Tombstone deletions	Deletions write a sentinel record; compaction purges both tombstone and shadowed data
Segment rotation	Active segment sealed at `segment_max_bytes`; becomes immutable candidate for compaction
LRU O(1) eviction	Doubly-linked list (access order) + hashmap (O(1) lookup) — classic interview data structure
`std::shared_mutex`	Multiple concurrent reads; exclusive lock only on writes and segment rotation

Screenshots

Build — zero warnings

102 Tests Passing — CRC32, Record, LRU, Segment, WAL, Engine

Benchmark — Max Throughput (no-sync, no-WAL)

407K writes/sec · 308K reads/sec · 76.3% compaction savings

Benchmark — Durable Writes (fsync per write, no WAL)

Demonstrates the fdatasync cost: 255 writes/sec vs 407K — 1,600x slowdown for durability

Benchmark — Full Durability (sync + WAL)

Production-safe mode: WAL flushed before segment write, fdatasync on every record

Build & Run

Requirements: g++ >= 9, make, Linux (kernel ≥ 3.0)

git clone https://github.com/<your-username>/storeLite
cd StoreLite
make

Run all 102 tests

make test

Benchmark

# Max throughput (OS-buffered writes)
./storelite --bench --no-sync --no-wal

# fsync per write, no WAL
./storelite --bench --no-wal -n 1000

# Full durability: WAL + fsync (slow — intended)
./storelite --bench

# Custom: 500k ops, 512B values
./storelite --bench --no-sync --no-wal -n 500000 -v 512

Interactive REPL

./storelite --interactive --data-dir /tmp/mydb

> put name amit
OK
> put project StoreLite
OK
> get name
amit
> del name
OK
> get name
(nil)
> scan
project = StoreLite
(1 keys)
> stats
keys=1 segs=1 bytes=1024 chits=2 cmiss=1
> compact
Compacted: kept=1 dropped=1
> quit

Compaction Deep Dive

Workload:  100k unique keys written twice (100k stale entries)
           50k keys then deleted (50k tombstones)

Before:    2 segments, ~34.6 MB
After:     1 segment,   ~8.2 MB
Savings:   76.3%

Algorithm:
  1. Rotate active segment → immutable
  2. Scan all immutable segments oldest-first
  3. Build map: key → latest {value, timestamp_us}
  4. Emit only live (non-tombstone) entries to new segment
  5. Atomically update in-memory index
  6. unlink() old segment files

Test Coverage (102 tests)

Suite	Count	What is tested
CRC32	4	Known vector `0xCBF43926`, determinism, collision detection
Record encode/decode	6	Round-trip, tombstone, CRC corruption rejection, short buffer
LRU Cache	8	Basic get/put, eviction order, invalidation, hit/miss counters
Segment	4	Append, scan, tombstone records, persist across reopen
WAL	5	Log/replay, tombstone replay, clear, empty check
Engine (integration)	75	put/get/del, overwrite, persistence across reopen, scan, post-compaction reads

Why Log-Structured?

Traditional B-tree storage engines (like the ones used in older databases) require random I/O for updates — finding the right page, modifying it, writing it back. Log-structured engines convert all writes to sequential appends, which is dramatically faster on both spinning disks (no seek time) and SSDs (avoids write amplification).

The tradeoff: read amplification and space amplification grow over time as stale data accumulates — which is exactly what compaction solves.

This is the same architecture used in LevelDB, RocksDB, Apache Cassandra, and the core write path of NetApp ONTAP's WAFL (Write Anywhere File Layout).

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
include		include
screenshots		screenshots
src		src
tests		tests
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StoreLite — Log-Structured Key-Value Storage Engine

Benchmark Results (Ubuntu, VirtualBox)

Architecture

On-Disk Record Format

Components

Key Systems Concepts

Screenshots

Build — zero warnings

102 Tests Passing — CRC32, Record, LRU, Segment, WAL, Engine

Benchmark — Max Throughput (no-sync, no-WAL)

Benchmark — Durable Writes (fsync per write, no WAL)

Benchmark — Full Durability (sync + WAL)

Build & Run

Run all 102 tests

Benchmark

Interactive REPL

Compaction Deep Dive

Test Coverage (102 tests)

Why Log-Structured?

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

StoreLite — Log-Structured Key-Value Storage Engine

Benchmark Results (Ubuntu, VirtualBox)

Architecture

On-Disk Record Format

Components

Key Systems Concepts

Screenshots

Build — zero warnings

102 Tests Passing — CRC32, Record, LRU, Segment, WAL, Engine

Benchmark — Max Throughput (no-sync, no-WAL)

Benchmark — Durable Writes (fsync per write, no WAL)

Benchmark — Full Durability (sync + WAL)

Build & Run

Run all 102 tests

Benchmark

Interactive REPL

Compaction Deep Dive

Test Coverage (102 tests)

Why Log-Structured?

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages