A distributed file system with quorum-based replication, implemented using Apache Thrift for communication. This system ensures data consistency and fault tolerance across multiple replicas via quorum consensus mechanisms.
This project implements a fault-tolerant, distributed file system where files are replicated across multiple nodes (replicas). Using quorum-based replication, read and write operations require consensus from a subset of replicas to ensure consistency and availability. The system includes:
- A coordinator node to manage versioning and request queuing
- A client interface for interacting with the system
- Replica servers that store and serve files
- Distributed Storage with High Availability: Cloud storage, edge computing, or multi-data center setups
- Strong Consistency Guarantees: Collaborative tools, distributed databases, and CDN backends
- Fault-Tolerant Access: Reliable access despite node/network failures
- Educational Use: Study quorum consensus and distributed systems
- 🔒 Consistency & Fault Tolerance: Quorum-based replication ensures correct reads and writes even in the presence of node failures, offering strong consistency guarantees.
- 📈 Scalability: Add replicas to distribute read and write load horizontally—ideal for scaling across data centers, edge nodes, or cloud instances.
- 🔁 Client-Side Simplicity: Clients interact with any replica; intelligent routing and quorum enforcement happen under the hood, simplifying integration and improving developer experience.
- ⚙️ Performance Tuning via Quorum Size Selection: Fine-tune read (
R) and write (W) quorum sizes based on workload characteristics—lower latencies for read-heavy or write-heavy environments while still ensuringR + W > Nfor safety. - 🌐 High Availability in Distributed Storage: Designed for cloud-native environments, edge computing platforms, or geographically distributed systems where availability despite node churn is critical.
- 📂 Resilient File Access: Maintains reliable read/write access across unreliable or high-latency networks, such as multi-data-center deployments or mobile backhauls.
- 📚 Research & Learning Friendly: Serves as a reference implementation for exploring quorum-based consensus, distributed systems behavior, and fault-tolerant architectures.
- 🧠 Use Case Ready: Supports applications like collaborative tools, distributed key-value stores, content delivery networks (CDNs), and other consistency-sensitive systems.
- Command-line interface to interact with replicas
- Supports
write,read, andlistoperations - Can communicate with any replica
- Local file storage backend
- Two types:
- Coordinator replica: Orchestrates read/write quorums and versioning
- Regular replica: Participates in quorum consensus
- Key classes:
ReplicaConnectionReplicaHandler
- Designated leader (
replica0) responsible for:- Request ordering
- Version control
- Managing quorum operations
- Defined in
service.thrift - Includes:
FileChunk,FileMetadata,StatusCode,ResponseReplicaServiceandCoordinatorService
thrift -r --gen py service.thriftUse the Makefile for convenience:
# Start all replica servers (replica0 is the coordinator)
make start_replicas
# Stop all replica servers
make stop_replicas
# Clean all replica directories
make clean_all# Write a test file
make write_file
# Read a test file
make read_file
# List all files in the system
make list_filesRun concurrent operations:
make concurrent_ops OPS="write,read,list"Run concurrent operations on a specific file:
make concurrent FILE=myfile.txt OPS="write,read,list"make benchmark ITERATIONS=5 FILE=myfile.txt OPS="write,read,list"| Command | Description |
|---|---|
make start_replicas |
Starts all 8 replica servers (in parallel) |
make stop_replicas |
Stops all replica servers |
make write_file |
Writes testfile.txt to the system |
make read_file |
Reads testfile.txt from the system |
make list_files |
Lists all files in the system |
make clean_all |
Stops replicas and deletes replica directories |
make concurrent_ops OPS="write,read" |
Runs multiple operations concurrently |
make concurrent FILE=myfile.txt OPS="read,list" |
Runs file operations on a specific file |
make benchmark ITERATIONS=3 FILE=myfile.txt OPS="write,read" |
Benchmarks system performance |
- Ensures overlap between read (R) and write (W) quorums:
R + W > N, whereNis the number of total replicas - Prevents stale reads and write conflicts
- Tolerates up to
N - min(R, W)replica failures - Continues processing as long as quorum requirements are met
- Coordinator queues requests to maintain consistency
- Writes are versioned and replicated across quorums
