TCKnickerbocker/Quorum_based_File_System

Quorum-Based File System

Jun-Ting Hsu, Thomas Knickerbocker

A distributed file system with quorum-based replication, implemented using Apache Thrift for communication. This system ensures data consistency and fault tolerance across multiple replicas via quorum consensus mechanisms.


📘 Overview

This project implements a fault-tolerant, distributed file system where files are replicated across multiple nodes (replicas). Using quorum-based replication, read and write operations require consensus from a subset of replicas to ensure consistency and availability. The system includes:

  • A coordinator node to manage versioning and request queuing
  • A client interface for interacting with the system
  • Replica servers that store and serve files

✅ Use Cases

  • Distributed Storage with High Availability: Cloud storage, edge computing, or multi-data center setups
  • Strong Consistency Guarantees: Collaborative tools, distributed databases, and CDN backends
  • Fault-Tolerant Access: Reliable access despite node/network failures
  • Educational Use: Study quorum consensus and distributed systems

🚀 Value Proposition

  • 🔒 Consistency & Fault Tolerance: Quorum-based replication ensures correct reads and writes even in the presence of node failures, offering strong consistency guarantees.
  • 📈 Scalability: Add replicas to distribute read and write load horizontally—ideal for scaling across data centers, edge nodes, or cloud instances.
  • 🔁 Client-Side Simplicity: Clients interact with any replica; intelligent routing and quorum enforcement happen under the hood, simplifying integration and improving developer experience.
  • ⚙️ Performance Tuning via Quorum Size Selection: Fine-tune read (R) and write (W) quorum sizes based on workload characteristics—lower latencies for read-heavy or write-heavy environments while still ensuring R + W > N for safety.
  • 🌐 High Availability in Distributed Storage: Designed for cloud-native environments, edge computing platforms, or geographically distributed systems where availability despite node churn is critical.
  • 📂 Resilient File Access: Maintains reliable read/write access across unreliable or high-latency networks, such as multi-data-center deployments or mobile backhauls.
  • 📚 Research & Learning Friendly: Serves as a reference implementation for exploring quorum-based consensus, distributed systems behavior, and fault-tolerant architectures.
  • 🧠 Use Case Ready: Supports applications like collaborative tools, distributed key-value stores, content delivery networks (CDNs), and other consistency-sensitive systems.

🧩 Components

1. Client

  • Command-line interface to interact with replicas
  • Supports write, read, and list operations
  • Can communicate with any replica

2. Replica Server

  • Local file storage backend
  • Two types:
    • Coordinator replica: Orchestrates read/write quorums and versioning
    • Regular replica: Participates in quorum consensus
  • Key classes:
    • ReplicaConnection
    • ReplicaHandler

3. Coordinator

  • Designated leader (replica0) responsible for:
    • Request ordering
    • Version control
    • Managing quorum operations

4. Thrift Interfaces

  • Defined in service.thrift
  • Includes:
    • FileChunk, FileMetadata, StatusCode, Response
    • ReplicaService and CoordinatorService

🛠️ Setup and Usage

1. Generate Thrift Code

thrift -r --gen py service.thrift

2. Start the System

Use the Makefile for convenience:

# Start all replica servers (replica0 is the coordinator)
make start_replicas

# Stop all replica servers
make stop_replicas

# Clean all replica directories
make clean_all

3. Client Operations

# Write a test file
make write_file

# Read a test file
make read_file

# List all files in the system
make list_files

Run concurrent operations:

make concurrent_ops OPS="write,read,list"

Run concurrent operations on a specific file:

make concurrent FILE=myfile.txt OPS="write,read,list"

4. Benchmarking

make benchmark ITERATIONS=5 FILE=myfile.txt OPS="write,read,list"

📁 Makefile Command Summary

| Command | Description |
| --- | --- |
| `make start_replicas` | Starts all 8 replica servers (in parallel) |
| `make stop_replicas` | Stops all replica servers |
| `make write_file` | Writes `testfile.txt` to the system |
| `make read_file` | Reads `testfile.txt` from the system |
| `make list_files` | Lists all files in the system |
| `make clean_all` | Stops replicas and deletes replica directories |
| `make concurrent_ops OPS="write,read"` | Runs multiple operations concurrently |
| `make concurrent FILE=myfile.txt OPS="read,list"` | Runs file operations on a specific file |
| `make benchmark ITERATIONS=3 FILE=myfile.txt OPS="write,read"` | Benchmarks system performance |

🔍 System Design Overview

Quorum Consensus

  • Ensures overlap between read (R) and write (W) quorums:
    R + W > N, where N is the total number of replicas
  • Prevents stale reads and write conflicts
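
The overlap condition is easy to check directly. A minimal sketch (the function name is illustrative, not taken from the repo): any read quorum of size R and write quorum of size W drawn from N replicas must share at least one replica, which holds exactly when R + W > N.

```python
def quorum_config_is_safe(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum of size r intersects every
    write quorum of size w among n replicas.

    Two subsets of sizes r and w drawn from n elements must share a
    member iff r + w > n, so a read always sees the latest write.
    """
    return r + w > n

# With the system's 8 replicas: R = 4, W = 5 is safe; R = 4, W = 4 is not.
assert quorum_config_is_safe(8, 4, 5)
assert not quorum_config_is_safe(8, 4, 4)
```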

Fault Tolerance

  • Tolerates up to N - R replica failures for reads and up to N - W for writes; both operations remain available through N - max(R, W) failures
  • Continues processing as long as quorum requirements are met
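
The failure budget follows directly from the quorum sizes. A small sketch (illustrative helper, not from the repo): reads need R live replicas and writes need W, so each survives the corresponding number of failures.

```python
def tolerated_failures(n: int, r: int, w: int) -> dict:
    """How many replica failures each operation type can survive.

    Reads need r live replicas and writes need w, so reads survive
    n - r failures, writes survive n - w, and both stay available
    only up to n - max(r, w) failures.
    """
    return {
        "reads": n - r,
        "writes": n - w,
        "both": n - max(r, w),
    }

# Example: 8 replicas with R = 4, W = 5.
assert tolerated_failures(8, 4, 5) == {"reads": 4, "writes": 3, "both": 3}
```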

Coordination

  • Coordinator queues requests to maintain consistency
  • Writes are versioned and replicated across quorums
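
As a rough illustration of coordinator-style versioning (a toy sketch under assumptions, not the repo's actual `ReplicaHandler`/`ReplicaConnection` classes): each write receives a monotonically increasing version number and is replicated to a write quorum, and a read returns the highest-versioned copy found in its read quorum.

```python
class CoordinatorSketch:
    """Toy model of version assignment with quorum reads and writes.

    Illustrative only: real replicas are Thrift servers; here they
    are plain dicts mapping filename -> (version, contents).
    """

    def __init__(self, replicas, r, w):
        self.replicas = replicas           # list of dicts, one per replica
        self.r, self.w = r, w
        self.next_version = {}             # filename -> last version issued

    def write(self, filename, contents):
        # Coordinator assigns a monotonically increasing version per file.
        version = self.next_version.get(filename, 0) + 1
        self.next_version[filename] = version
        # Replicate to a write quorum of W replicas.
        for replica in self.replicas[: self.w]:
            replica[filename] = (version, contents)
        return version

    def read(self, filename):
        # Query a read quorum of R replicas; keep the newest copy.
        copies = [rep[filename] for rep in self.replicas[-self.r :]
                  if filename in rep]
        return max(copies)[1] if copies else None


# Because R + W > N, the read quorum always overlaps the write quorum.
replicas = [{} for _ in range(8)]
coord = CoordinatorSketch(replicas, r=4, w=5)
coord.write("testfile.txt", "hello")
coord.write("testfile.txt", "hello v2")
assert coord.read("testfile.txt") == "hello v2"
```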

System Architecture Diagram

[System diagram image: see the repository]
