
syspro-hw2

An NFS-style distributed file-synchronisation system in C: an nfs_manager orchestrates copies across multiple machines, talking to per-host nfs_client agents over a custom PUSH/PULL TCP protocol. Multi-threaded throughout — thread pool on the manager, per-connection threads on the clients.

Second homework of the System Programming course at the University of Athens (Department of Informatics & Telecommunications). Solo project. Tested by SSH'ing into actual lab machines and running real cross-host sync jobs.

What it does

The manager reads source→destination pairs from a config file (each side specified as host:port:/path), then keeps the destination side in sync with the source by issuing PULL requests to the source client and PUSH requests to the destination client.
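
A config file per that spec might look like this (hosts and paths are invented, and the real parser may delimit the pair differently):

    # one sync pair per line: <source-spec> <dest-spec>
    linux07.di.uoa.gr:9000:/home/alex/photos linux08.di.uoa.gr:9000:/backup/photos
    linux07.di.uoa.gr:9000:/home/alex/notes  linux09.di.uoa.gr:9000:/backup/notes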

                   ┌─────────────────────────────────────┐
                   │           nfs_manager               │
                   │      (thread pool, central)         │
                   │                                     │
                   │   ┌──────────────────┐              │
                   │   │   TaskQueue      │ mutex+cond   │
                   │   │  (file copies)   │              │
                   │   └────────┬─────────┘              │
                   │            │                        │
                   │   ┌────────┴─────────┐              │
                   │   │  worker threads  │              │
                   │   │  fetch_task →    │              │
                   │   │   complete_task  │              │
                   │   └──────────────────┘              │
                   │       │            │                │
                   │       ▼            ▼                │
                   │   PULL req      PUSH req            │
                   └───────┼────────────┼────────────────┘
                           │            │
              ┌────────────┘            └────────────┐
              ▼                                      ▼
     ┌────────────────┐                   ┌────────────────┐
     │  nfs_client    │                   │  nfs_client    │
     │  on host A     │                   │  on host B     │
     │  (multi-thread,│                   │  (multi-thread,│
     │   MAX_CLIENTS) │                   │   MAX_CLIENTS) │
     └────────────────┘                   └────────────────┘

Console commands: add <source-spec> <dest-spec>, cancel <source-spec>, shutdown.
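
For example (specs invented, same host:port:/path shape as the config file):

    > add linux07.di.uoa.gr:9000:/home/alex/photos linux08.di.uoa.gr:9000:/backup/photos
    > cancel linux07.di.uoa.gr:9000:/home/alex/photos
    > shutdown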

Architecture choices

Manager: thread pool inside the TaskQueue

The biggest design call versus HW1's process pool: threads, not forks, because the work is now I/O-bound on remote sockets rather than filesystem-bound. The thread-pool implementation lives inside TaskQueue itself; the same module owns:

  • The buffer of pending tasks
  • The mutex protecting the buffer
  • Two condition variables (one for "queue not full", one for "queue not empty") so worker threads sleep cleanly instead of spinning
  • The worker-thread set itself

When you create a TaskQueue you pass it a complete_task function pointer. That function is the work each worker executes per task. For this project, complete_task lives in nfs_manager.c (it needs access to manager-internal helpers like read_line), so the queue stays generic but the work is project-specific.
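
A minimal sketch of that shape (illustrative names, not the repo's exact API):

    #include <pthread.h>

    typedef struct Task Task;               /* one pending file copy */

    typedef struct {
        Task           **buffer;            /* bounded ring of pending tasks */
        int              capacity, count, head, tail;
        pthread_mutex_t  lock;              /* protects every field above    */
        pthread_cond_t   not_full;          /* producers sleep here          */
        pthread_cond_t   not_empty;         /* workers sleep here            */
        pthread_t       *workers;           /* the worker-thread set         */
        int              n_workers;
        void           (*complete_task)(Task *);  /* injected at creation    */
        int              shutting_down;
    } TaskQueue;

    /* each worker: sleep until a task exists, then run the injected work */
    static void *worker_loop(void *arg) {
        TaskQueue *q = arg;
        for (;;) {
            pthread_mutex_lock(&q->lock);
            while (q->count == 0 && !q->shutting_down)
                pthread_cond_wait(&q->not_empty, &q->lock);
            if (q->count == 0 && q->shutting_down) {
                pthread_mutex_unlock(&q->lock);
                return NULL;                /* graceful drain finished */
            }
            Task *t = q->buffer[q->head];
            q->head = (q->head + 1) % q->capacity;
            q->count--;
            pthread_cond_signal(&q->not_full);   /* a slot just opened */
            pthread_mutex_unlock(&q->lock);
            q->complete_task(t);            /* project-specific work (nfs_manager.c) */
        }
    }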

Network protocol: keep PUSH connections open

PUSH (manager → destination client) sends file chunks. Two design options for connection handling:

  • (a) Open a fresh TCP connection per PUSH
  • (b) Open one connection per (source, destination) pair, hold it open across many PUSHes

I went with (b). The chunked-protocol design only makes sense if you reuse the connection — otherwise the chunk size is essentially irrelevant. Side benefit: logs show one PUSH log line per file with total bytes, instead of N noisy entries.
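
A hedged sketch of a chunk send over that held-open connection, assuming a simple length-prefixed framing (the real wire format may differ):

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <unistd.h>

    /* write exactly len bytes or report failure */
    static int send_all(int fd, const void *buf, size_t len) {
        const char *p = buf;
        while (len > 0) {
            ssize_t n = write(fd, p, len);
            if (n <= 0) return -1;          /* connection died: this PUSH fails */
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* send one chunk on an already-open PUSH connection */
    static int push_chunk(int sock, const char *data, int32_t len) {
        int32_t netlen = htonl((uint32_t)len);   /* length prefix, network order */
        if (send_all(sock, &netlen, sizeof netlen) < 0) return -1;
        return send_all(sock, data, (size_t)len);
    }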

PULL (manager → source client) is treated as failed if the corresponding PUSH fails — once the destination side has rejected the data, there's no point continuing to fetch.

Clients: multi-threaded, capped by MAX_CLIENTS

Each nfs_client accepts multiple inbound connections concurrently — one thread per connection, up to MAX_CLIENTS (defined in Configuration.h). Past that, new connections wait. This was important for the cross-machine tests: a single client can be both a source for one pair and a destination for another, simultaneously.
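
A minimal sketch of that cap, assuming a counter guarded by a mutex and condition variable (names are illustrative):

    #include <pthread.h>

    #define MAX_CLIENTS 8          /* really defined in Configuration.h */

    static int             active = 0;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  slot_free = PTHREAD_COND_INITIALIZER;

    /* the accept loop calls this before spawning a connection thread */
    static void acquire_slot(void) {
        pthread_mutex_lock(&m);
        while (active == MAX_CLIENTS)      /* past the cap: the connection waits */
            pthread_cond_wait(&slot_free, &m);
        active++;
        pthread_mutex_unlock(&m);
    }

    /* each connection thread calls this just before exiting */
    static void release_slot(void) {
        pthread_mutex_lock(&m);
        active--;
        pthread_cond_signal(&slot_free);
        pthread_mutex_unlock(&m);
    }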

Initialization and main loop

The manager starts processing tasks during initialization (not after), because the TaskQueue has a max size. If we waited until all initial tasks were enqueued before kicking off workers, the queue could fill and we'd deadlock. Instead the main thread enqueues work while worker threads pull from the same queue — the standard producer-consumer pattern.
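
Concretely, the producer side blocks on the "queue not full" condition, so with no worker draining yet, the initial enqueue loop would stall forever once the bounded buffer filled. Continuing the TaskQueue sketch's names from above:

    void taskqueue_enqueue(TaskQueue *q, Task *t) {
        pthread_mutex_lock(&q->lock);
        while (q->count == q->capacity)    /* full: wait for a worker to free a slot */
            pthread_cond_wait(&q->not_full, &q->lock);
        q->buffer[q->tail] = t;
        q->tail = (q->tail + 1) % q->capacity;
        q->count++;
        pthread_cond_signal(&q->not_empty);  /* wake a sleeping worker */
        pthread_mutex_unlock(&q->lock);
    }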

Console commands during the main loop:

  • add — same path as initialization: open LIST against the source, enqueue file copies
  • cancel — drain queued tasks for that source, but don't interrupt an in-flight task (a worker that's mid-copy completes; the queue just stops feeding it more; see the sketch after this list)
  • shutdown — graceful drain: stop accepting new commands, finish queued tasks, exit
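
A hedged sketch of the cancel drain, again using the TaskQueue sketch's names; an in-flight task was already dequeued, so it is untouched (task_matches_source is a hypothetical predicate standing in for the real matching logic):

    extern int task_matches_source(Task *t, const char *source_spec);  /* hypothetical */

    void taskqueue_cancel(TaskQueue *q, const char *source_spec) {
        pthread_mutex_lock(&q->lock);
        int kept = 0;
        for (int i = 0; i < q->count; i++) {
            Task *t = q->buffer[(q->head + i) % q->capacity];
            if (task_matches_source(t, source_spec))
                continue;          /* dropped before any worker sees it
                                      (real code would also free it)   */
            q->buffer[(q->head + kept++) % q->capacity] = t;
        }
        q->count = kept;
        q->tail  = (q->head + kept) % q->capacity;
        pthread_cond_broadcast(&q->not_full);   /* slots may have opened */
        pthread_mutex_unlock(&q->lock);
    }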

DNS resolution

The initial submission required raw IP addresses (e.g., 195.134.65.78). The current code does proper hostname resolution, so linux07.di.uoa.gr works too.
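
A minimal sketch of that resolution step with getaddrinfo, the standard POSIX call for it (function name is mine; error handling trimmed):

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* resolve host (name or dotted quad) and connect; returns fd or -1 */
    int connect_to(const char *host, const char *port) {
        struct addrinfo hints, *res, *p;
        memset(&hints, 0, sizeof hints);
        hints.ai_family   = AF_UNSPEC;     /* IPv4 or IPv6 */
        hints.ai_socktype = SOCK_STREAM;   /* TCP */
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;
        int fd = -1;
        for (p = res; p != NULL; p = p->ai_next) {
            fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
            if (fd < 0) continue;
            if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) break;   /* connected */
            close(fd);
            fd = -1;
        }
        freeaddrinfo(res);
        return fd;
    }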

Modules

File                        Purpose
src/nfs_manager.c           Event loop, console command handling, complete_task logic
src/nfs_console.c           CLI frontend (add, cancel, shutdown)
src/nfs_client.c            Per-host file server: handles LIST, PULL, PUSH requests
src/modules/TaskQueue.c     Generic thread pool + bounded task queue (mutex + condvars)
src/modules/DirList.c       List of watched directory pairs (carried over from HW1)
includes/Configuration.h    MAX_CLIENTS, chunk size, port defaults

Build & run

make                                  # produces nfs_manager, nfs_console, nfs_client
# on each host:
./nfs_client -p 9000 -l client_log.txt &
# on the orchestrator:
./nfs_manager -c config_file -l manager_log.txt -n 4 -b 100 &
./nfs_console -l console_log.txt

A real run from the UoA DI lab machines (console_example.png):


The first two commands in that screenshot demonstrate error handling: an invalid command, then an add with the wrong port that fails the sync with a client timeout. The directory entry still gets added to DirList, but no copies are scheduled.

Sequence

Part of a two-piece System Programming arc:

  1. syspro-hw1 — single-machine, process-pool, inotify-driven
  2. syspro-hw2 (you are here) — distributed, thread-pool, custom TCP protocol. Reuses HW1's TaskQueue and DirList modules.

License

MIT — applies to my own code in this repo. Assignment-distributed materials retain their original course copyright.
