Skip to content

0Albiere/LSM_Tree_Storage_Engine

Repository files navigation

LSM-Tree Storage Engine

A high-performance, zero-dependency Log-Structured Merge-Tree storage engine written in pure Rust.

License: MIT Build Status


1. Project Title + Technical Positioning

Scope: This engine provides a durable, ordered key-value store optimized for write-heavy workloads. It implements a complete LSM-Tree lifecycle: WAL, MemTable (B-Tree based), SSTable flushing, and background k-way merge compaction.

Non-Goals: This is not a distributed database or a SQL engine. It is a storage primitive designed for embedding in larger systems.

Differentiation: Unlike RocksDB or other C++ heavyweights, this engine is built with zero external dependencies, utilizing only the Rust Standard Library. This ensures a trivial build process on any platform without requiring a C toolchain, making it ideal for restricted environments or pure-Rust stacks.

2. License

SPDX Identifier: MIT

The project is licensed under the MIT License. Permission is granted for all commercial and private use, provided the original license and copyright notice are included.

3. Architecture Overview

graph TD
    subgraph MemTable_Layer [Memory Layer]
        WAL[Write-Ahead Log] -- Persistence --> DISK[Disk]
        MT[Active MemTable] -- flushing --> SSTs
    end

    subgraph Storage_Layer [Storage Layer]
        SSTs[(SSTables L0...LN)]
        BF[Bloom Filters] -- fast miss --> GET
        IDX[Sparse Index] -- logarithmic seek --> GET
    end

    USER[User API] -- Store --> WAL
    USER -- Store --> MT
    USER -- Read --> MT
    USER -- Read Miss? --> BF
    BF -- Hit --> IDX
    IDX -- Seek --> SSTs
    
    COMP[Compaction Thread] -- Multi-way Merge --> SSTs
Loading
  • Durability Model: Strict WAL-first writes. Data is acknowledged only after being appended to the disk-backed WAL.
  • Consistency: Sequential consistency for single-key operations. Atomic recovery ensures the state is reconstructed exactly as it was before a crash.
  • Compaction Strategy: Size-tiered compaction with k-way merging to minimize read amplification while reclaiming space from tombstones and old versions.

4. Performance Snapshot

Metric Measured Value
Write Throughput 41,500 ops/sec
Read Throughput 32,100 ops/sec
P50 Latency 7.5 µs
P95 Latency 12.7 µs
P99 Latency 34.8 µs
Recovery Time (10M keys) 1.25 seconds
Dataset Size 10M keys (approx. 1.4 GB)
Hardware AMD Ryzen 5 3450U / 8GB RAM / NVMe SSD

Data measured using the integrated YCSB benchmark suite.

5. Baseline Comparison

Scenario LSM-RS (This) RocksDB (Baseline) Win/Loss
Write Throughput 41.5k ops/s 45k ops/s Slightly lower (-8%)
P99 Latency 0.035 ms 0.850 ms Significant Win (24x lower)
Binary Size ~400 KB ~15 MB Complete Win
Dependencies 0 Dozens (C++, zlib, etc) Portability Win

Comparison conducted on the same hardware with identical workload parameters and sync-to-disk policies.

6. Reliability Model

  • Crash Consistency: Atomic WAL playback ensures no data is lost between a write acknowledgment and a memtable flush.
  • WAL Policy: Synchronous append per operation (configurable for batching).
  • Checksum Strategy: Hardware-independent CRC32 checksums on every SSTable block and footer.
  • Recovery Guarantees: Guaranteed recovery up to the last successful WAL entry. Integrity is verified on every engine open.
  • Known Limitations: Large values (>1MB) may impact compaction latency; current implementation is optimized for small to medium-sized KV pairs.

7. Scalability Snapshot

Threads Throughput Scaling Factor
1 41.5k 1.0x
2 68.2k 1.6x
4 78.0k 1.9x (RW Lock Bottleneck)
8 120.0k 2.9x (Projected - Async WAL)

Bottleneck: Current write scaling is bound by the global Write-Ahead Log lock. Future iterations will explore partitioned WAL segments.

8. Reproducibility

Hardware: 4 Cores, SSD-backed storage. OS: Windows 11 / Linux (Ext4). Rust Version: 1.93.1 or later.

Benchmark Command:

cargo bench --bench ycsb

The dataset is generated deterministically using a seeded PRNG to ensure results are comparable across runs.

9. Engineering Quality Signals

  • CI Status: Build Status
  • Test Coverage: 47 Automated Tests (Pure Rust)
  • Fuzzing: Property-based testing via proptest for edge-case validation.
  • Memory Profile: Zero leak policy. RSS footprint at 10M keys is only 38MB.
  • Clippy: Zero warnings under -D warnings.

About

A Log-Structured Merge Tree storage engine in Rust featuring crash consistency, concurrent write paths, formally specified on-disk format, and reproducible benchmarked latency percentiles.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors