A distributed file storage system built in Java using TCP sockets, implementing file replication across multiple data stores with a central controller for coordination.
Personal project exploring distributed file storage, replication, and coordination in Java.
```
        ┌──────────┐
   ┌────┤Controller├────┐
   │    └────┬─────┘    │
   │         │          │
┌──┴───┐  ┌──┴───┐  ┌───┴──┐
│Dstore│  │Dstore│  │Dstore│
│  1   │  │  2   │  │  3   │
└──────┘  └──────┘  └──────┘
```
The system consists of three component types:
- Controller - Central orchestrator that manages file metadata (the Index), handles client requests, and coordinates file replication across Dstores.
- Dstore (Data Store) - Storage nodes that hold actual file data on disk. Multiple Dstores run concurrently, each managing its own file folder.
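- Client - Connects to the Controller to perform file operations; for stores and loads, the Controller replies with the Dstore ports (`STORE_TO`, `LOAD_FROM`) that the client transfers file data to or from.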
Communication uses a custom text-based protocol over TCP:
| Operation | Client → Controller | Controller → Client | Controller → Dstore |
|---|---|---|---|
| Store | `STORE filename size` | `STORE_TO port1 port2 ...` | - |
| Load | `LOAD filename` | `LOAD_FROM port size` | - |
| Remove | `REMOVE filename` | `REMOVE_COMPLETE` | `REMOVE filename` |
| List | `LIST` | `LIST file1 file2 ...` | - |
Features:
- File Replication - Files are replicated across R Dstores (configurable replication factor)
- Concurrent Operations - Thread-safe operations using `ConcurrentHashMap`, `CountDownLatch`, and `ReentrantReadWriteLock`
- Load Balancing - Store operations select the Dstores holding the fewest files
- Rebalancing - Periodic redistribution of files across Dstores to maintain even distribution
- Fault Tolerance - Handles Dstore disconnections, timeouts, and reload attempts across replicas
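A minimal sketch of the Dstore selection and replica fallback described above, assuming the Controller can obtain a file count per Dstore port. `DstoreSelection`, `selectForStore`, and `nextReplica` are hypothetical names, and breaking ties by port number is an assumption rather than documented behaviour.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper; not a class from this repository. */
final class DstoreSelection {

    /**
     * Returns the r Dstore ports that currently hold the fewest files.
     * Ties are broken by port number so the result is deterministic (an assumption).
     */
    static List<Integer> selectForStore(Map<Integer, Integer> fileCountByPort, int r) {
        if (fileCountByPort.size() < r) {
            throw new IllegalStateException(
                    "need " + r + " Dstores but only " + fileCountByPort.size() + " are connected");
        }
        Comparator<Map.Entry<Integer, Integer>> byLoadThenPort =
                Comparator.comparingInt((Map.Entry<Integer, Integer> e) -> e.getValue())
                          .thenComparingInt(e -> e.getKey());
        return fileCountByPort.entrySet().stream()
                .sorted(byLoadThenPort)
                .limit(r)
                .map(Map.Entry::getKey)
                .toList();
    }

    /** Picks the next replica to try when reloading, skipping Dstores that already failed. */
    static Optional<Integer> nextReplica(List<Integer> replicaPorts, Set<Integer> alreadyTried) {
        return replicaPorts.stream()
                .filter(port -> !alreadyTried.contains(port))
                .findFirst();
    }
}
```

Sorting all Dstores on each store costs O(N log N) in the number of Dstores, which is negligible for a handful of nodes; an empty `Optional` from `nextReplica` means every replica has been tried and the load should fail.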
Concurrency design:
- The Controller uses a fair `ReentrantReadWriteLock` to protect the set of Dstore connections
- Store and Remove operations use `CountDownLatch` to await acknowledgements from Dstores, with configurable timeouts
- All connection handling runs on dedicated threads
- The `Index` class uses `ConcurrentHashMap` for thread-safe file metadata tracking, with state transitions (e.g., "store in progress" → "store complete")
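The sketch below shows one way these pieces can fit together for a store, assuming a simplified two-state model: the filename is claimed atomically with `ConcurrentHashMap.putIfAbsent`, each Dstore acknowledgement counts down a `CountDownLatch` sized to the replication factor, and the entry either moves to "store complete" or is dropped on timeout. `IndexSketch`, `DstoreRegistry`, and their methods are illustrative names, not the project's `Index` API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a controller-side file index; not the project's Index class. */
final class IndexSketch {
    enum State { STORE_IN_PROGRESS, STORE_COMPLETE }

    private final Map<String, State> states = new ConcurrentHashMap<>();
    private final Map<String, CountDownLatch> pendingAcks = new ConcurrentHashMap<>();

    /** Atomically claims a filename; returns false if it is already stored or being stored. */
    boolean beginStore(String filename, int replicationFactor) {
        if (states.putIfAbsent(filename, State.STORE_IN_PROGRESS) != null) {
            return false;
        }
        // Registered before the client is sent STORE_TO, so no acknowledgement can arrive earlier.
        pendingAcks.put(filename, new CountDownLatch(replicationFactor));
        return true;
    }

    /** Called from a Dstore's connection thread when that Dstore confirms it stored the file. */
    void ack(String filename) {
        CountDownLatch latch = pendingAcks.get(filename);
        if (latch != null) {
            latch.countDown();
        }
    }

    /** Waits for all acknowledgements, then marks the store complete or rolls it back. */
    boolean awaitStore(String filename, long timeoutMs) {
        CountDownLatch latch = pendingAcks.get(filename);
        if (latch == null) {
            throw new IllegalStateException("no store in progress for " + filename);
        }
        boolean allAcked;
        try {
            allAcked = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            allAcked = false;
        }
        pendingAcks.remove(filename);
        if (allAcked) {
            states.replace(filename, State.STORE_IN_PROGRESS, State.STORE_COMPLETE);
        } else {
            states.remove(filename, State.STORE_IN_PROGRESS); // frees the name so a client can retry
        }
        return allAcked;
    }

    State stateOf(String filename) {
        return states.get(filename);
    }
}

/** Hypothetical Dstore registry guarded by a fair read/write lock. */
final class DstoreRegistry {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair ordering
    private final Set<Integer> ports = new HashSet<>();

    void add(int port) {
        lock.writeLock().lock();
        try {
            ports.add(port);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Many threads may read at once; the returned copy stays valid after the lock is released. */
    Set<Integer> snapshot() {
        lock.readLock().lock();
        try {
            return Set.copyOf(ports);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Using the conditional `replace` and `remove` overloads means a concurrent change to the entry (for example, a removal racing with a timeout) is not silently overwritten.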
Requirements:
- Java 17 or later

Build:

```sh
mkdir -p out && javac -cp lib/client.jar -d out $(find src -name '*.java' | sort) examples/*.java tests/*.java
```

Start the Controller:

```sh
java -cp lib/client.jar:out dfs.controller.Controller <cport> <replication_factor> <timeout_ms> <rebalance_period_s>
```

Example:
```sh
java -cp lib/client.jar:out dfs.controller.Controller 12345 3 2000 10
```

Start each Dstore:

```sh
java -cp lib/client.jar:out dfs.dstore.Dstore <port> <controller_port> <timeout_ms> <file_folder>
```

Example (start 3 Dstores):
```sh
java -cp lib/client.jar:out dfs.dstore.Dstore 12346 12345 2000 var/dstores/dstore1
java -cp lib/client.jar:out dfs.dstore.Dstore 12347 12345 2000 var/dstores/dstore2
java -cp lib/client.jar:out dfs.dstore.Dstore 12348 12345 2000 var/dstores/dstore3
```

Run the example client:

```sh
java -cp lib/client.jar:out ClientMain <controller_port> <timeout_ms>
```

Example:
```sh
java -cp lib/client.jar:out ClientMain 12345 2000
```

To run the concurrency smoke test, start the Controller and at least 3 Dstores first, then run:
```sh
java -cp lib/client.jar:out ConcurrencySmokeTest clients-only 12345
```

Project layout:
- `src/dfs/controller/` - Controller entry point and controller-side connection handling
- `src/dfs/dstore/` - Dstore entry point and Dstore-side connection handling
- `src/dfs/core/` - Shared protocol, indexing, rebalancing, and base connection classes
- `src/dfs/logging/` - Internal logging infrastructure
- `examples/` - Example client programs for manual testing
- `tests/` - End-to-end verification utilities, including the concurrency smoke test
- `lib/` - External client library dependency
- `config/` - Auxiliary configuration files such as `my_policy.policy`
- `var/` - Recommended runtime location for Dstore folders, downloads, and upload fixtures
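For a quick check that does not go through `lib/client.jar`, a raw socket can issue the `LIST` command from the protocol table against a running Controller. This is a hypothetical helper (`RawListClient` is not part of the repository), and it assumes that each protocol message is a newline-terminated line, which the protocol table does not specify.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical dependency-free LIST client; not part of this repository. */
final class RawListClient {

    /** Connects to the Controller, sends LIST, and returns the file names in the reply. */
    static List<String> list(String host, int controllerPort, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, controllerPort), timeoutMs);
            socket.setSoTimeout(timeoutMs); // readLine() throws SocketTimeoutException after this
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true, StandardCharsets.UTF_8);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            out.println("LIST"); // assumes newline-terminated messages
            return parseListReply(in.readLine());
        } // closing the socket also closes its streams
    }

    /** Parses "LIST file1 file2 ..." into file names; a bare "LIST" means no files are stored. */
    static List<String> parseListReply(String reply) {
        if (reply == null || !(reply.equals("LIST") || reply.startsWith("LIST "))) {
            throw new IllegalArgumentException("unexpected reply: " + reply);
        }
        String rest = reply.substring("LIST".length()).trim();
        return rest.isEmpty() ? List.of() : List.of(rest.split("\\s+"));
    }
}
```

With the example setup, `RawListClient.list("localhost", 12345, 2000)` would query the Controller started on port 12345.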
MIT License - see LICENSE