Skip to content

Adityabhaskar685/kvs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

KVS - Key-Value Store

A high-performance, persistent key-value store written in Rust with log-structured storage and automatic compaction.

Features

  • Persistent Storage: All data is durably written to disk using log-structured storage
  • Multi-Generation Logs: Uses multiple generation-numbered log files for efficient management
  • Fast Lookups: In-memory BTreeMap index provides O(log n) key lookups
  • Threshold-Based Compaction: Automatic compaction triggers when uncompacted data exceeds 5MB
  • CLI Interface: Simple command-line tool for interacting with the store
  • Crash Recovery: Rebuilds in-memory state from all log files on startup

Installation

cargo build --release

Usage

Command Line Interface

Set a key-value pair:

kvs set <KEY> <VALUE>

Get a value:

kvs get <KEY>

Remove a key:

kvs rm <KEY>

Examples:

# Store a value
kvs set mykey "Hello, World!"

# Retrieve the value
kvs get mykey
# Output: Hello, World!

# Remove the key
kvs rm mykey

Library Usage

use kvs::{KvStore, Result};

fn main() -> Result<()> {
    // Open or create a store at the specified path
    let mut store = KvStore::open("./data")?;

    // Set a key-value pair
    store.set("key1".to_owned(), "value1".to_owned())?;

    // Get a value
    if let Some(value) = store.get("key1".to_owned())? {
        println!("Found: {}", value);
    }

    // Remove a key
    store.remove("key1".to_owned())?;

    Ok(())
}

Architecture

Storage Model

KVS uses a log-structured storage approach with multiple generation files:

  1. Multi-Generation Logs: Operations are written to generation-numbered log files (e.g., 1.log, 2.log, 3.log)
  2. In-Memory Index: A BTreeMap maintains key → (generation, position, length) mappings for fast lookups
  3. Read Path: Reads use the index to seek to the correct position in the appropriate generation file
  4. Write Path: New operations are appended to the current generation file with buffered I/O

Command Format

Commands are stored as JSON objects:

{"Set":{"key":"mykey","value":"myvalue"}}
{"Remove":{"key":"mykey"}}

Automatic Compaction

The compaction algorithm runs automatically when uncompacted data exceeds 5MB threshold:

Compaction Process:

  1. Create a new compaction generation file (current_gen + 1)
  2. Copy all active key-value pairs to the compaction file
  3. Update the in-memory index to point to new positions
  4. Remove all stale generation files (< compaction_gen)
  5. Reset uncompacted counter to 0
  6. Increment current generation by 2 (one for compaction, one for new writes)

Uncompacted Data Tracking:

  • Old values from Set operations count as uncompacted
  • Removed keys and their Remove commands count as uncompacted
  • When threshold is exceeded, compaction runs automatically

This approach prevents unbounded growth from key updates and deletions while maintaining multiple active log files.

Performance

Benchmarks from the test suite:

  • Write Throughput: ~20-50K ops/sec for unique keys
  • Read Throughput: ~100-500K ops/sec (depends on access pattern)
  • Mixed Workload: ~50-100K ops/sec (9:1 read:write ratio)
  • Compaction Overhead: 30-70% file size reduction per compaction

Performance characteristics:

  • Sequential reads are faster than random reads (seeks are expensive)
  • Compaction triggers are adaptive based on file size
  • Large values (>10KB) reduce throughput proportionally

Testing

Run the full test suite:

cargo test

The test suite includes:

  • CLI integration tests
  • Unit tests for core operations
  • Performance benchmarks for various workload patterns
  • Compaction efficiency tests
  • Stress tests with mixed operations

Error Handling

The library uses custom error types via thiserror:

pub enum KvError {
    Io(std::io::Error),                // I/O errors
    KeyNotFound,                        // Key doesn't exist (remove only)
    Serde(serde_json::Error),          // JSON serialization errors
    UnexpectedCommandType,              // Corrupted log or program bug
}

Dependencies

  • clap: Command-line argument parsing (v4.x)
  • serde/serde_json: JSON serialization/deserialization
  • thiserror: Error handling
  • tempfile: Temporary directories for testing
  • walkdir: Directory traversal for performance tests
  • rand: Random number generation for benchmarks

Project Structure

kvs/
├── src/
│   ├── lib.rs              # Core KvStore implementation
│   │                       # - BTreeMap index with CommandPos
│   │                       # - Multi-generation log management
│   │                       # - BufReaderWithPos & BufWriterWithPos
│   │                       # - Compaction logic
│   └── bin/
│       └── kvs.rs          # CLI binary
├── tests/
│   └── tests.rs            # Integration and performance tests
├── kvs_logs/               # Default log directory (CLI)
│   ├── 1.log              # Generation files
│   ├── 2.log
│   └── ...
├── Cargo.toml
└── README.md

Future Enhancements

  • Background compaction thread to avoid blocking writes
  • Configurable compaction thresholds
  • Bloom filters for faster negative lookups
  • Binary encoding (Protocol Buffers/bincode) for better space efficiency
  • LRU cache for frequently accessed values
  • Concurrent reads with read-write locks

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages