A high-performance, persistent key-value store written in Rust with log-structured storage and automatic compaction.
- Persistent Storage: All data is durably written to disk using log-structured storage
- Multi-Generation Logs: Uses multiple generation-numbered log files for efficient management
- Fast Lookups: In-memory BTreeMap index provides O(log n) key lookups
- Threshold-Based Compaction: Automatic compaction triggers when uncompacted data exceeds 5MB
- CLI Interface: Simple command-line tool for interacting with the store
- Crash Recovery: Rebuilds in-memory state from all log files on startup
cargo build --releaseSet a key-value pair:
kvs set <KEY> <VALUE>Get a value:
kvs get <KEY>Remove a key:
kvs rm <KEY>Examples:
# Store a value
kvs set mykey "Hello, World!"
# Retrieve the value
kvs get mykey
# Output: Hello, World!
# Remove the key
kvs rm mykeyuse kvs::{KvStore, Result};
fn main() -> Result<()> {
// Open or create a store at the specified path
let mut store = KvStore::open("./data")?;
// Set a key-value pair
store.set("key1".to_owned(), "value1".to_owned())?;
// Get a value
if let Some(value) = store.get("key1".to_owned())? {
println!("Found: {}", value);
}
// Remove a key
store.remove("key1".to_owned())?;
Ok(())
}KVS uses a log-structured storage approach with multiple generation files:
- Multi-Generation Logs: Operations are written to generation-numbered log files (e.g.,
1.log,2.log,3.log) - In-Memory Index: A BTreeMap maintains
key → (generation, position, length)mappings for fast lookups - Read Path: Reads use the index to seek to the correct position in the appropriate generation file
- Write Path: New operations are appended to the current generation file with buffered I/O
Commands are stored as JSON objects:
{"Set":{"key":"mykey","value":"myvalue"}}
{"Remove":{"key":"mykey"}}The compaction algorithm runs automatically when uncompacted data exceeds 5MB threshold:
Compaction Process:
- Create a new compaction generation file (current_gen + 1)
- Copy all active key-value pairs to the compaction file
- Update the in-memory index to point to new positions
- Remove all stale generation files (< compaction_gen)
- Reset uncompacted counter to 0
- Increment current generation by 2 (one for compaction, one for new writes)
Uncompacted Data Tracking:
- Old values from
Setoperations count as uncompacted - Removed keys and their
Removecommands count as uncompacted - When threshold is exceeded, compaction runs automatically
This approach prevents unbounded growth from key updates and deletions while maintaining multiple active log files.
Benchmarks from the test suite:
- Write Throughput: ~20-50K ops/sec for unique keys
- Read Throughput: ~100-500K ops/sec (depends on access pattern)
- Mixed Workload: ~50-100K ops/sec (9:1 read:write ratio)
- Compaction Overhead: 30-70% file size reduction per compaction
Performance characteristics:
- Sequential reads are faster than random reads (seeks are expensive)
- Compaction triggers are adaptive based on file size
- Large values (>10KB) reduce throughput proportionally
Run the full test suite:
cargo testThe test suite includes:
- CLI integration tests
- Unit tests for core operations
- Performance benchmarks for various workload patterns
- Compaction efficiency tests
- Stress tests with mixed operations
The library uses custom error types via thiserror:
pub enum KvError {
Io(std::io::Error), // I/O errors
KeyNotFound, // Key doesn't exist (remove only)
Serde(serde_json::Error), // JSON serialization errors
UnexpectedCommandType, // Corrupted log or program bug
}- clap: Command-line argument parsing (v4.x)
- serde/serde_json: JSON serialization/deserialization
- thiserror: Error handling
- tempfile: Temporary directories for testing
- walkdir: Directory traversal for performance tests
- rand: Random number generation for benchmarks
kvs/
├── src/
│ ├── lib.rs # Core KvStore implementation
│ │ # - BTreeMap index with CommandPos
│ │ # - Multi-generation log management
│ │ # - BufReaderWithPos & BufWriterWithPos
│ │ # - Compaction logic
│ └── bin/
│ └── kvs.rs # CLI binary
├── tests/
│ └── tests.rs # Integration and performance tests
├── kvs_logs/ # Default log directory (CLI)
│ ├── 1.log # Generation files
│ ├── 2.log
│ └── ...
├── Cargo.toml
└── README.md
- Background compaction thread to avoid blocking writes
- Configurable compaction thresholds
- Bloom filters for faster negative lookups
- Binary encoding (Protocol Buffers/bincode) for better space efficiency
- LRU cache for frequently accessed values
- Concurrent reads with read-write locks