A lightweight, persistent key-value store with LRU caching and log-structured storage
Inspired by Bitcask โข Built with Pure Python โข Zero Dependencies
Features โข Installation โข Quick Start โข Documentation โข Architecture
BitPyStore is a Python-based key-value database implementing log-structured storage with crash recovery, TTL support, and LRU caching. Inspired by Riak's Bitcask storage engine, it provides a simple yet powerful storage solution perfect for learning database internals or embedding in small applications.
- โ Zero Dependencies - Pure Python implementation, no external libraries required
- โ Crash Recovery - Automatic index rebuilding from append-only log
- โ Fast Reads - LRU cache with O(1) lookups for frequently accessed keys
- โ TTL Support - Automatic key expiration with time-to-live
- โ Data Integrity - CRC32 checksums prevent data corruption
- โ Network Ready - Built-in TCP server for remote access
- โ Educational - Clean, readable code demonstrating core database concepts
| Feature | Description |
|---|---|
| Persistent Storage | Append-only log file ensures data durability across restarts |
| In-Memory Index | Hash table for O(1) key lookups without scanning files |
| LRU Cache | Configurable cache (default: 1000 items) for hot data |
| TTL Expiration | Time-based automatic key deletion |
| Log Compaction | Garbage collection to reclaim space from deleted/old records |
| Checksum Verification | CRC32 integrity checks on every read |
| TCP Server | Network access via simple text-based protocol |
| Context Manager | Automatic resource cleanup with with statement |
| Crash Recovery | Rebuilds index by replaying log file on startup |
- Language: Python 3.6+
- Storage: Log-structured file storage (Bitcask-inspired)
- Caching: Custom LRU implementation using OrderedDict
- Networking: TCP socket server (stdlib)
- Serialization: JSON for data records
- Integrity: CRC32 checksums (zlib)
- Dependencies: None (pure Python stdlib)
- Python 3.6 or higher
- No external dependencies required!
git clone https://github.com/SurajAkhuli/BitPyStore.git
cd BitPyStoremkdir -p dataThat's it! No pip install needed. ๐
1. Start the Server
python tcp_server.pyYou should see:
Server running on 127.0.0.1:5000
2. Connect with Client
Open a new terminal:
python examples/client.py3. Example Session
Server: Welcome to KVStore Server
> PUT name suraj
Server: OK
> GET name
Server: VALUE suraj
> PUT session:abc token123 TTL 60
Server: OK
> TTL name 3600
Server: OK
> STATS
Server: keys_in_index: 2
keys_in_cache: 2
put_count: 3
delete_count: 0
file_size_bytes: 512
last_compaction_time: None
> DEL name
Server: DELETED
> COMPACT
Server: OK
> EXIT
Server: OK
from engine import KVStore
# Use context manager for automatic cleanup
with KVStore("data/mydb.db") as db:
# Store a key-value pair
db.put("user:1", "Alice")
# Store with TTL (expires in 1 hour)
db.put("session:abc", "token123", ttl=3600)
# Retrieve value
user = db.get("user:1")
print(user) # Output: Alice
# Check if key exists
session = db.get("session:abc")
if session:
print(f"Active session: {session}")
# Delete a key
db.delete("user:1")
# Compact the log to reclaim space
db.compact()
# Get database statistics
stats = db.stats()
print(f"Total keys: {stats['keys_in_index']}")
print(f"Cache size: {stats['keys_in_cache']}")BitPyStore/
โโโ README.md # Project documentation
โโโ engine.py # Core KVStore implementation
โโโ lru_cache.py # LRU cache implementation
โโโ tcp_server.py # TCP server for network access
โโโ examples/
โ โโโ client.py # Example TCP client
โโโ test/
โ โโโ test_engine.py # Unit tests and examples
โโโ data/ # Data directory (created at runtime)
โโโ *.log # Log files
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Client Layer โ
โ (examples/client.py) โ
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ TCP Socket
โโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Network Layer โ
โ (tcp_server.py) โ
โ Protocol: PUT, GET, DEL, TTL, STATS, etc. โ
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Storage Engine โ
โ (engine.py) โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โ
โ โ Index โ โ LRU Cache โ โ
โ โ keyโoffset โ โ 1000 items โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Persistent Storage โ
โ bitpystore.db (Log File) โ
โ Format: [length checksum]\n[JSON]\n โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
flowchart TD
A[Client: PUT key value] --> B[TCP Server receives command]
B --> C[Parse command & arguments]
C --> D{TTL specified?}
D -->|Yes| E[Calculate expiry timestamp]
D -->|No| F[Set expiry = 0]
E --> G[Create JSON record]
F --> G
G --> H[Calculate CRC32 checksum]
H --> I[Get current file offset]
I --> J[Write header: length checksum]
J --> K[Write JSON data]
K --> L[Flush to disk]
L --> M[Update in-memory index]
M --> N[Update LRU cache]
N --> O[Send OK response to client]
style A fill:#e1f5ff
style O fill:#c8e6c9
style L fill:#fff9c4
flowchart TD
A[Client: GET key] --> B[TCP Server receives command]
B --> C{Key exists in index?}
C -->|No| D[Send NOT_FOUND]
C -->|Yes| E{Check TTL expiry}
E -->|Expired| F[Delete from index & cache]
F --> D
E -->|Valid| G{Key in LRU cache?}
G -->|Yes - Cache Hit| H[Return cached value]
G -->|No - Cache Miss| I[Read header offset from index]
I --> J[Seek to file position]
J --> K[Read JSON data]
K --> L[Verify CRC32 checksum]
L --> M{Checksum valid?}
M -->|No| N[Data corruption error]
M -->|Yes| O[Parse JSON]
O --> P[Extract value]
P --> Q[Update LRU cache]
Q --> H
H --> R[Send VALUE response to client]
style A fill:#e1f5ff
style R fill:#c8e6c9
style G fill:#fff9c4
style D fill:#ffcdd2
flowchart TD
A[Client: DEL key] --> B[TCP Server receives command]
B --> C[Create tombstone record]
C --> D[Set op: delete in JSON]
D --> E[Calculate CRC32 checksum]
E --> F[Write header to log]
F --> G[Write tombstone JSON]
G --> H[Flush to disk]
H --> I[Remove key from index]
I --> J[Remove key from cache]
J --> K[Increment delete_count]
K --> L[Send DELETED response]
style A fill:#e1f5ff
style L fill:#c8e6c9
style H fill:#fff9c4
flowchart TD
A[Client: COMPACT] --> B[Create temporary file]
B --> C[Iterate through all keys in index]
C --> D{More keys?}
D -->|No| M[Close both files]
D -->|Yes| E{Key expired?}
E -->|Yes| C
E -->|No| F[Read value from original log]
F --> G[Calculate new checksum]
G --> H[Write to temp file]
H --> I[Update new offsets]
I --> C
M --> N[Atomic replace: temp โ original]
N --> O[Reopen log files]
O --> P[Clear old index]
P --> Q[Rebuild index from compacted log]
Q --> R[Update last_compaction_time]
R --> S[Send OK response]
style A fill:#e1f5ff
style S fill:#c8e6c9
style N fill:#fff9c4
flowchart TD
A[BitPyStore starts] --> B[Open log file]
B --> C[Initialize empty index]
C --> D[Seek to beginning]
D --> E[Read header line]
E --> F{EOF?}
F -->|Yes| G[Recovery complete]
F -->|No| H[Parse header]
H --> I[Read JSON data]
I --> J[Verify checksum]
J --> K{Valid?}
K -->|No| L[Skip record]
K -->|Yes| M[Parse JSON]
M --> N{Op type?}
N -->|PUT| O[Update index]
N -->|DELETE| P[Remove from index]
O --> E
P --> E
L --> E
G --> Q[Ready for connections]
style A fill:#e1f5ff
style G fill:#c8e6c9
style J fill:#fff9c4
KVStore(filename="data/bitpystore.db")Parameters:
filename(str): Path to the log file
Example:
db = KVStore("data/mydb.db")Store a key-value pair with optional TTL.
Parameters:
key(str): The key to storevalue(any): The value (must be JSON serializable)ttl(int, optional): Time-to-live in seconds
Example:
db.put("config:timeout", 30)
db.put("session:xyz", {"user_id": 42}, ttl=1800) # Expires in 30 minutesRetrieve a value by key.
Parameters:
key(str): The key to retrieve
Returns:
- The stored value, or
Noneif not found/expired
Example:
value = db.get("config:timeout")
if value is None:
print("Key not found or expired")
else:
print(f"Value: {value}")Delete a key from the store.
Parameters:
key(str): The key to delete
Example:
db.delete("session:xyz")Compact the log file to remove deleted and expired records.
Example:
db.compact() # Reclaim disk spaceGet database statistics.
Returns:
- Dictionary with the following metrics:
keys_in_index: Number of keys in the indexkeys_in_cache: Number of keys cachedput_count: Total PUT operationsdelete_count: Total DELETE operationsfile_size_bytes: Log file sizelast_compaction_time: Timestamp of last compaction
Example:
stats = db.stats()
print(f"Keys: {stats['keys_in_index']}")
print(f"File size: {stats['file_size_bytes']} bytes")| Command | Syntax | Description | Response |
|---|---|---|---|
| PUT | PUT key value [TTL seconds] |
Store key-value pair | OK |
| GET | GET key |
Retrieve value | VALUE data or NOT_FOUND |
| DEL | DEL key |
Delete key | DELETED |
| TTL | TTL key seconds |
Update TTL on existing key | OK or NOT_FOUND |
| STATS | STATS |
Get database statistics | Multi-line stats output |
| COMPACT | COMPACT |
Trigger log compaction | OK |
| SHUTDOWN | SHUTDOWN |
Stop the server | OK |
| EXIT | EXIT |
Disconnect client | OK |
| Operation | Time Complexity | I/O Operations | Notes |
|---|---|---|---|
| PUT | O(1) | 1 sequential write | Append to log file |
| GET (cache hit) | O(1) | 0 | Pure memory lookup |
| GET (cache miss) | O(1) | 1 random read | Index provides offset |
| DELETE | O(1) | 1 sequential write | Tombstone marker |
| COMPACT | O(N) | Read all + Write all | Blocks other operations |
Log Record Structure:
[length checksum]\n
[{"op": "put", "key": "name", "value": "suraj", "expiry": 0}]\n
Index Structure (In-Memory):
{
"name": (125, 175, 0), # (header_offset, json_offset, expiry)
"session:abc": (250, 300, 1700000000) # expiry timestamp for TTL
}# Store user sessions with automatic expiration
db.put(f"session:{session_id}", session_data, ttl=1800) # 30 minutes# Persistent application configuration
db.put("config:db_host", "localhost")
db.put("config:db_port", "5432")
db.put("config:max_connections", "100")# Cache expensive API responses
api_key = f"cache:api:{url}"
cached = db.get(api_key)
if not cached:
cached = expensive_api_call(url)
db.put(api_key, cached, ttl=300) # 5 minutes# Toggle features dynamically without deployment
db.put("feature:new_ui", "enabled")
db.put("feature:beta_access", "disabled")
if db.get("feature:new_ui") == "enabled":
show_new_ui()# Simple persistent job queue
db.put(f"job:{job_id}", job_data)
# Worker processes jobs
job = db.get(f"job:{job_id}")
if job:
process_job(job)
db.delete(f"job:{job_id}")- Single-threaded - No concurrent write support (use locks for multi-threading)
- Index in memory - All keys must fit in RAM
- No transactions - Single-key operations only, no ACID guarantees
- Blocking compaction - Compaction stops all operations temporarily
- No replication - Single-node only, no high availability
โ High-concurrency write workloads
โ Large datasets (> millions of keys with limited RAM)
โ Production systems requiring high availability
โ Applications needing multi-key ACID transactions
โ Distributed or replicated storage requirements
- Redis - For in-memory caching with persistence
- RocksDB - For embedded LSM-tree storage
- PostgreSQL - For transactional relational data
- MongoDB - For document-oriented data
Run the test suite:
python test/test_engine.py- โ Basic PUT/GET/DELETE operations
- โ Index persistence across restarts
- โ TTL expiration behavior
- โ Log compaction correctness
- โ Cache hit/miss scenarios
- โ Crash recovery simulation
- Thread-safe operations with read-write locks
- Background compaction (non-blocking)
- Automatic compaction triggers based on log size
- Batch PUT/DELETE operations
- Bloom filters for negative lookups
- Key iteration support with cursors
- Range queries (prefix matching)
- Snapshot isolation for consistent reads
- Log segmentation for better concurrency
- Compression support (snappy/zstd)
- Master-slave replication
- Distributed consensus (Raft)
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
git clone https://github.com/SurajAkhuli/BitPyStore.git
cd BitPyStore
python test/test_engine.py- Follow PEP 8 guidelines
- Add docstrings to new functions
- Include tests for new features
- Update README for API changes
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License - Copyright (c) 2024 Suraj Akhuli
- Inspired by: Bitcask - Riak's storage engine design
- Influenced by: LevelDB - Log-structured merge-tree patterns
- Educational Resource: Designing Data-Intensive Applications by Martin Kleppmann
Suraj Akhuli
- GitHub: @SurajAkhuli
- Project: BitPyStore
- Open an issue
- Start a discussion
โญ Star this repository if you find it helpful!
Made with โค๏ธ by Suraj Akhuli
Learning database internals, one commit at a time ๐