68 changes: 50 additions & 18 deletions README.md
@@ -19,8 +19,9 @@ NetCopy is **not**:
bearer token,
- a backup tool — there is no scheduling, snapshotting, or retention.

Tested on Linux. Runs on macOS and Windows too; see [Known issues](#known-issues)
for platform caveats.
Linux is the only platform under CI and the only one we ship release images
for. The pure-Java parts run on macOS and Windows too, but a few platform
quirks aren't tested on every commit — see [Known issues](#known-issues).

## Quick start

@@ -145,40 +146,71 @@ NetCopy splits cleanly into a **control plane** and a **data plane**.
| | | | data | | | |
| HttpPuller TcpPuller (port 7778 server) |<------>| HttpPuller TcpPuller (port 7778 server) |
| | | | | | | |
| SidecarStore (data.partial + chunks.bitmap + meta.json) |
| SidecarStore (data.partial + chunks.bitmap + chunks.hashes + meta.json) |
| JsonJobStore (<state-dir>/jobs/<id>.json) |
+---------------------------------------------+ +---------------------------------------------+
```

**Control plane (HTTP + WebSocket via Javalin, port 7777):**

- `GET /api/health` — liveness probe (no auth).
- `GET /api/browse` — list a directory under one of the peer's `--shared-root`s.
- `POST /api/manifest` — ask the peer to plan a transfer; returns a
`manifestId` plus a flat list of files, sizes, mtimes, and chunk plans.
- `POST /api/transfers` — start a job locally that will pull a manifest from
a remote peer. Persists a `JobState` in `<state-dir>/jobs/<id>.json`.
- `GET /api/transfers/{id}` — poll job state.
- `WS /ws/progress` — live `ProgressEvent`s (subscribed per `transferId`).
| Endpoint | Auth | Purpose |
|---|---|---|
| `GET /api/health` | no | Liveness probe (open). |
| `GET /api/peer/info` | yes | Peer self-description: hostname, version, TCP blob port, root counts. |
| `GET /api/browse?root=&path=` | yes | List a directory under a `--shared-root`. |
| `GET /api/browse-local?root=&path=` | yes | Same shape, rooted under a `--receive-root` (UI uses it for the target panel). |
| `POST /api/browse/stats` | yes | Recursive file count + byte total per path; powers the selection-stats footer. |
| `POST /api/manifest` | yes | Plan a transfer. Returns the full manifest (entries, sizes, mtimes, chunk plans, `manifestId`). |
| `POST /api/manifest/register` | yes | Re-register a previously-issued manifest (used by the puller after a source-side restart). |
| `GET /api/blob/{manifestId}/{fileId}` | yes | HTTP data-plane: file bytes (with `Range` support, `X-Chunk-Hash` response header). |
| `GET /api/hash/{manifestId}/{fileId}` | yes | Lazy XXH3-128 of a manifest entry; returns `202` while computing. |
| `POST /api/transfers` | yes | Start a job (target host pulls from a remote source). |
| `GET /api/transfers` | yes | List status snapshots (newest first). |
| `GET /api/transfers/{id}` | yes | Single status snapshot, including per-file table and per-chunk metrics. |
| `POST /api/transfers/{id}/{pause,resume,cancel}` | yes | Lifecycle controls. |
| `DELETE /api/transfers/{id}` | yes | Dismiss a terminal-state job from the persistent store. |
| `POST /api/relay/push` | yes | "Push from here to peer" — proxies `POST /api/transfers` to the peer using its token. |
| `GET /api/metrics` | yes | Host metrics (CPU/RAM/disk/GC, top threads) + per-server serve metrics. |
| `WS /ws/progress` | yes | Live `ProgressEvent` stream (subscribe per transfer or wildcard). |
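
As a concrete example, here is a minimal client-side call to `POST /api/transfers`
using `java.net.http`. Only the `protocol` field is documented above; the
`manifestId` request field, the bearer-token header form, and the `NETCOPY_TOKEN`
variable are illustrative assumptions, not NetCopy's actual request schema.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StartTransferExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical request body: only "protocol" appears in the table above;
        // "manifestId" as a request field name is an assumption, not the real schema.
        String manifestId = args[0];            // id returned earlier by POST /api/manifest
        String body = """
                { "manifestId": "%s", "protocol": "tcp" }
                """.formatted(manifestId);

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:7777/api/transfers"))
                // NETCOPY_TOKEN and the Bearer scheme are assumptions for illustration.
                .header("Authorization", "Bearer " + System.getenv("NETCOPY_TOKEN"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The returned job can then be polled with `GET /api/transfers/{id}` or watched
live on `WS /ws/progress`.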

**Data plane (two interchangeable protocols):**

- `GET /api/blob/{manifestId}/{fileId}` with HTTP `Range` headers, served by
Javalin via `FileChannel.transferTo`.
- A custom binary TCP protocol on port 7778: framed `[len:u32][type:u8][payload]`
with `HELLO`/`REQUEST`/`DATA_HEAD`/`DATA`/`DATA_END`/`ERR`/`BYE`. Designed to
reuse one connection across many `pullChunk` calls and avoid HTTP parsing
overhead at the price of a more interesting wire format.
with `HELLO` / `REQUEST` / `DATA_HEAD` / `DATA` / `DATA_END` / `DATA_END_V2`
(xxh3 trailer, single-pass; v0.3.0+) / `ERR` / `BYE`. Designed to reuse one
connection across many `pullChunk` calls and avoid HTTP parsing overhead at
the price of a more interesting wire format.
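
As a rough orientation, a sketch of the `[len:u32][type:u8][payload]` framing
described above. The concrete type-byte values, and whether `len` includes the
type byte, are not specified here; both are assumptions in this sketch.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the [len:u32][type:u8][payload] framing. The concrete type-byte
// values, and whether `len` counts the type byte, are assumptions here.
final class FrameWriter {
    // Hypothetical codes for HELLO / REQUEST / DATA_HEAD / DATA / DATA_END /
    // DATA_END_V2 / ERR / BYE.
    static final int HELLO = 0, REQUEST = 1, DATA_HEAD = 2, DATA = 3,
            DATA_END = 4, DATA_END_V2 = 5, ERR = 6, BYE = 7;

    private final DataOutputStream out;

    FrameWriter(OutputStream raw) {
        this.out = new DataOutputStream(raw);
    }

    /** Writes one frame; here `len` is taken to be the payload length only. */
    void write(int type, byte[] payload) throws IOException {
        out.writeInt(payload.length); // len: u32, big-endian
        out.writeByte(type);          // type: u8
        out.write(payload);           // payload bytes
        out.flush();
    }
}
```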

The protocol is selected per job at start time. See
[docs/protocol-comparison.md](docs/protocol-comparison.md) for benchmarks.
[docs/protocol-comparison.md](docs/protocol-comparison.md) for design notes.

**State and resume:**

- Each in-progress target file owns a sidecar directory `<file>.netcopy/`
containing `data.partial` (sparse, written at offsets), `meta.json`
(size, mtime, chunk plan), and `chunks.bitmap` (1 bit per chunk, set after
the chunk is downloaded **and** its xxh3-128 hash verified).
containing four files:
- `data.partial` — sparse, pre-allocated to the final size, written via
positional FileChannel writes;
- `meta.json` — immutable per-file descriptor (relPath, size, sourceMtime,
chunk plan, `schemaVersion`);
- `chunks.bitmap` — one bit per chunk, set after the chunk's bytes are
fsynced **and** its **xxh3-128** chunk-level hash matches what the source
advertised on the wire;
- `chunks.hashes` — fixed-size array of XXH3-128 digests (16 bytes per
chunk), positionally written as each chunk completes. Used by the
selective re-verify path on full-file hash mismatch so resume re-pulls
only the corrupted chunks instead of the whole file.
- Hashing has two layers:
- **Per-chunk** verification (and the on-the-wire `X-Chunk-Hash` /
    `DATA_END_V2`) is **XXH3-128** — fast (~10 GB/s on x86) and only needs a
    small per-chunk buffer.
- **Full-file finalize** is **SHA-256** in 256 KiB strides. Streaming
XXH3-128 in this codebase buffers all bytes into a `ByteArrayOutputStream`
that overflows the array-size limit on multi-GiB files — SHA-256 streams
cleanly via `MessageDigest.update`. The resulting digest lives in the
JSON's `hashHex` field for v0.x wire-format stability (the field name
will change in a future major bump).
- After all chunks are verified, `FileFinalizer` rehashes the whole file and
  atomic-renames `data.partial` to the final target path (a minimal sketch of
  this step follows the list).
- A job's overall state lives in `<state-dir>/jobs/<id>.json` (one JSON per
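
For illustration, a minimal sketch of the finalize step described above (not
the actual `FileFinalizer`): SHA-256 streamed in 256 KiB strides via
`MessageDigest.update`, followed by an atomic rename of `data.partial` onto
the final target path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class FinalizeSketch {

    /** Streams SHA-256 over the completed data.partial in 256 KiB strides. */
    static String sha256Hex(Path dataPartial) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] stride = new byte[256 * 1024];               // 256 KiB per update
        try (InputStream in = Files.newInputStream(dataPartial)) {
            int n;
            while ((n = in.read(stride)) != -1) {
                sha256.update(stride, 0, n);                // no whole-file buffering
            }
        }
        return HexFormat.of().formatHex(sha256.digest());   // value stored as `hashHex`
    }

    /** Once the digest checks out, promote data.partial to the final path. */
    static void promote(Path dataPartial, Path finalTarget) throws IOException {
        Files.move(dataPartial, finalTarget, StandardCopyOption.ATOMIC_MOVE);
    }
}
```
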
160 changes: 83 additions & 77 deletions docs/protocol-comparison.md
@@ -1,82 +1,88 @@
# HTTP vs TCP — protocol comparison

NetCopy ships two interchangeable data-plane protocols. This document is
the home of the quantitative comparison between them. The numbers below
are produced by task **V5 — protocol comparison** and are placeholders
until that pass runs.

## What we are measuring

The same workload runs back-to-back over both protocols, on the same two
hosts, with the same chunk plan. Each row in the table below should report
median and p95 of three runs.

- **Throughput**: useful payload bytes per wall-clock second, averaged over
the whole transfer.
- **Time to first byte (TTFB)**: from `POST /api/transfers` accepting to the
first `ChunkCompleted` ProgressEvent.
- **CPU time**: server-side and client-side `getrusage` deltas, normalised
per GB transferred.
- **Connection count**: peak concurrent sockets the data plane opened.
- **Behaviour under loss**: same transfer with `tc qdisc add ... netem
loss 1%` applied to the receive interface — does the protocol recover
  cleanly, and what is the throughput delta?

## Test workloads
# HTTP vs TCP — protocol design notes

NetCopy ships two interchangeable data-plane protocols. The user picks one
per transfer. This document explains the trade-offs and points to a manual
reproduction for benchmark numbers.

## What's different

Both protocols carry the same byte payload (file contents, in chunks, with
XXH3-128 chunk-level verification). They differ in framing and how the
hash gets onto the wire:

- **HTTP** — `GET /api/blob/{manifestId}/{fileId}` with a `Range:
bytes=START-END` header per chunk. Connection reuse via keep-alive. Server
pre-computes the chunk's XXH3-128, sets it as `X-Chunk-Hash` response
header, then streams the body via `FileChannel.transferTo` (which on Linux
decays to `sendfile(2)`). Pro: trivial to debug with `curl`, plays well with
any HTTP-aware proxy. Con: HTTP parsing overhead per chunk, and HTTP/1.1
connection-per-concurrent-chunk.
- **TCP** — one long-lived connection per peer, multiplexed by `reqId`.
Custom binary framing (see `tasks/contracts/data-formats.md`). Versioned
protocol: v1 is two-pass (hash → DataHead → stream → DataEnd, identical to
the HTTP path conceptually); **v2 (default since v0.3.0)** streams and
hashes in a single pass, putting the digest in a trailing `DataEndV2`
frame. Pro: fewer TCP connections (one per peer), no HTTP overhead, single
read pass on the source-side disk. Con: needs its own port (`--tcp-port`),
not curl-debuggable.
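
For illustration, a minimal sketch of pulling one chunk over the HTTP path with
`java.net.http`: a `Range` request against `/api/blob/{manifestId}/{fileId}`,
reading back the `X-Chunk-Hash` response header. The host, chunk size,
bearer-token header form, and `NETCOPY_TOKEN` variable are placeholders;
XXH3-128 verification is left to whichever implementation the codebase uses.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PullChunkOverHttp {
    public static void main(String[] args) throws Exception {
        String manifestId = args[0], fileId = args[1];      // placeholders
        long start = 0, end = 4 * 1024 * 1024 - 1;          // one 4 MiB chunk, as an example

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://source-host:7777/api/blob/" + manifestId + "/" + fileId))
                // Token variable and Bearer scheme are assumptions for illustration.
                .header("Authorization", "Bearer " + System.getenv("NETCOPY_TOKEN"))
                .header("Range", "bytes=" + start + "-" + end)
                .GET()
                .build();

        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        String advertised = response.headers().firstValue("X-Chunk-Hash").orElse("");
        byte[] chunk = response.body();
        // Verify `chunk` against `advertised` with the XXH3-128 implementation the
        // codebase uses before recording the chunk as done.
        System.out.printf("status=%d bytes=%d x-chunk-hash=%s%n",
                response.statusCode(), chunk.length, advertised);
    }
}
```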

## Where the difference matters

- **Many small files (≤ 1 MB each).** TCP wins clearly. HTTP pays a full
request line + headers per chunk; with thousands of files this dominates.
- **One big file (multi-GB) on a fast disk.** Mostly identical. Both
protocols are CPU-bound on the hash and IO-bound on the disk; framing
overhead is in the noise.
- **One big file on a cold-cache HDD.** TCP v2 is meaningfully faster
because it does one disk read per chunk on the source instead of two.
v1's two-pass design was tractable on SSDs (the second pass came from the
page cache) but on HDD the source ended up reading the file twice with
cold seeks. v0.3.0 fixed that.
- **Lossy network.** Both rely on the kernel's TCP retransmit; the
application layers don't differ. NetCopy retries failed chunks with
exponential backoff identically.

In practice the user-visible bottleneck on a LAN is almost always **the
slower of the two disks** (source HDD seek + receiver fsync), not the
protocol. We've measured ~50–60 MB/s sustained from a single HDD source
with 8 parallel chunks regardless of which protocol we pick.

## Reproducing a comparison by hand

1. Start two daemons with identical flags except `--port`, `--tcp-port`, and
roots. Pin the JVM with `-XX:ActiveProcessorCount=N` if you want to
compare across CPU budgets.
2. Pre-generate the workload under one daemon's `--shared-root` (a sketch for
   generating W2 follows these steps).
3. From the UI on the other daemon, plan a transfer, then start it twice in
a row — once with `protocol: "http"`, once with `"tcp"`. Record the
`TransferCompleted` event's `totalDurationMs` and `avgThroughputBps`,
and screenshot the Performance modal's "This transfer (chunks)" tile for
per-chunk timings.
4. For loss runs:

```bash
sudo tc qdisc add dev <iface> root netem loss 1%
# ... run the transfer ...
sudo tc qdisc del dev <iface> root
```

5. Repeat with the TCP server disabled (`--tcp-port 0`) on the source side
to confirm the HTTP fallback works.
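
A hypothetical helper for step 2: it lays out workload W2 from the table below
(1000 files of ~64 KB of incompressible bytes) under a shared root. The
directory and file names are arbitrary.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical helper for step 2: lays out workload W2 (1000 files of ~64 KB)
// under a shared root. Directory and file names are arbitrary.
public class GenerateW2 {
    public static void main(String[] args) throws Exception {
        Path root = Path.of(args[0]);                 // pass the --shared-root path
        Path dir = Files.createDirectories(root.resolve("w2-small-files"));
        Random rng = new Random(42);                  // fixed seed for reproducible bytes
        byte[] buf = new byte[64 * 1024];
        for (int i = 0; i < 1000; i++) {
            rng.nextBytes(buf);                       // incompressible payload
            Files.write(dir.resolve(String.format("file-%04d.bin", i)), buf);
        }
    }
}
```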

We deliberately don't ship a canned benchmark table here: numbers from a
single hardware setup mislead readers comparing to their own. The
Performance modal already exposes the per-chunk timings (source latency,
wire time, persist time, pool acquire wait) you need to identify your own
bottleneck.

## Suggested workloads

| ID | Description |
|---|---|
| W1 | One 32 GB file (large-chunk path) |
| W2 | 1000 small files of ~64 KB each (small-chunk path, file-parallelism dominates) |
| W1 | One 32 GB file (large-chunk path; tests sustained throughput) |
| W2 | 1000 small files of ~64 KB each (request count dominates) |
| W3 | Mixed: 4 GB ISO + 50 MB of small docs (typical real-world mix) |
| W4 | W1 again, but with `--file-parallelism=1 --chunks-per-file=1` (single-stream baseline) |

Each workload runs once over HTTP (`--tcp-port 0` on the server side) and
once over TCP (`protocol: "tcp"` in the transfer request).

## Results — placeholder
| W4 | W1 again with `--file-parallelism=1 --chunks-per-file=1` (single-stream baseline) |

Filled in by V5.

| Workload | Protocol | Throughput (MB/s) | TTFB (ms) | Server CPU (s/GB) | Peak conns | Loss 1% throughput |
|---|---|---|---|---|---|---|
| W1 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
| W1 | TCP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
| W2 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
| W2 | TCP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
| W3 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
| W3 | TCP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
| W4 | HTTP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |
| W4 | TCP | _TBD_ | _TBD_ | _TBD_ | _TBD_ | _TBD_ |

## Provisional reasoning

Until V5 produces real numbers, the design intuition is:

- **W1 (one big file)**: the two protocols should be within a few percent.
Both are dominated by `FileChannel.transferTo` on the server and direct
`pwrite` on the client; the framing overhead is amortised across multi-MB
chunks.
- **W2 (many small files)**: TCP should win materially. HTTP pays a full
request/response round-trip per chunk, plus header parsing; TCP reuses
one connection and sends only an 8-byte `REQUEST` frame per chunk.
- **W3 (mixed)**: closer to W1 by byte count; closer to W2 by request count.
Expect TCP to be modestly ahead.
- **W4 (single stream)**: both protocols saturate one TCP flow; the
bottleneck is the kernel and the NIC, not the framing.

## Reproducing the benchmark

V5 will publish a script under `verify/V5/` that drives both daemons in
the same JVM (or two JVMs on the same host) using a tmpfs receive root
to factor out disk speed. Until then, reproduce by hand:

1. Start two daemons with identical flags except `--port`, `--tcp-port`,
and roots. Pin the JVM with `-XX:ActiveProcessorCount=N` if you want
to compare across CPU budgets.
2. Pre-generate the workload under one daemon's `--shared-root`.
3. From the UI on the other daemon, plan a transfer, then start it twice
in a row — once with protocol HTTP, once with TCP. Record the
`TransferCompleted` event's `totalDurationMs` and `avgThroughputBps`.
4. For loss runs: `sudo tc qdisc add dev <iface> root netem loss 1%`.
Don't forget to `tc qdisc del` afterwards.
W2 is the workload where TCP shows its largest advantage; W4 is where the
two protocols converge.