diff --git a/CHANGELOG.md b/CHANGELOG.md index 06d1858..c071574 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,285 +2,143 @@ All notable changes to RustHost are documented here. ---- - -## [0.1.0] — Initial Release - -This release resolves all 40 issues identified in the 2026-03-20 comprehensive security and reliability audit. Changes are grouped by the audit's five severity phases. +The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). +RustHost uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html). --- -### Phase 1 — Critical Security & Correctness - -#### 1.1 — Config Path Traversal: `site.directory` and `logging.file` Validated - -`src/config/loader.rs` — `validate()` now rejects any `site.directory` or `logging.file` value that is an absolute path, contains a `..` component, or contains a platform path separator. The process exits with a clear validation error before binding any port. Previously, a value such as `directory = "../../etc"` caused the HTTP server to serve the entire `/etc` tree, and a value such as `../../.ssh/authorized_keys` for `logging.file` caused log lines to be appended to the SSH authorized keys file. - -#### 1.2 — Race Condition: Tor Captures Bound Port via `oneshot` Channel - -`src/runtime/lifecycle.rs`, `src/server/mod.rs` — The 50 ms sleep that was the sole synchronisation barrier between the HTTP server binding its port and the Tor subsystem reading that port has been replaced with a `tokio::sync::oneshot` channel. The server sends the actual bound port through the channel before entering the accept loop; `tor::init` awaits that value (with a 10-second timeout) rather than reading a potentially-zero value out of `SharedState`. Previously, on a loaded system the race could be lost silently, causing every inbound Tor connection to fail with `ECONNREFUSED` to port 0 while the dashboard displayed a healthy green `TorStatus::Ready`. 
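The port handoff described in 1.2 can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not RustHost's code: it swaps `tokio::sync::oneshot` for `std::sync::mpsc::sync_channel` and a plain thread so it runs without dependencies, and `bind_and_report` is a hypothetical name.

```rust
use std::net::TcpListener;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: binds an ephemeral port on a "server" thread and
/// hands the real port number to the caller through a channel.
fn bind_and_report() -> Option<u16> {
    // Capacity-1 channel standing in for a oneshot channel (assumption).
    let (port_tx, port_rx) = mpsc::sync_channel::<u16>(1);

    let server = thread::spawn(move || {
        // Port 0 asks the OS for any free port; the real value is only
        // known after the bind succeeds.
        let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
        let port = listener.local_addr().expect("no local address").port();
        // Publish the bound port before any accept loop would start.
        let _ = port_tx.send(port);
    });

    // The consumer waits for the real port (bounded by a 10-second timeout)
    // instead of sleeping and hoping the bind has already happened.
    let port = port_rx.recv_timeout(Duration::from_secs(10)).ok();
    server.join().ok();
    port
}
```

The consumer can never observe port 0 here, because the value only exists once the bind has completed; a timeout surfaces as `None` rather than as a silently wrong port.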
- -#### 1.3 — XSS in Directory Listing via Unsanitised Filenames - -`src/server/handler.rs` — `build_directory_listing()` now HTML-entity-escapes all filenames before interpolating them into link text (`&`, `<`, `>`, `"`, and `'` are each replaced with the corresponding HTML entity) and percent-encodes filenames in `href` attribute values. Previously, a file named `">` produced an executable XSS payload in any directory listing page. - -#### 1.4 — HEAD Requests No Longer Receive a Response Body - -`src/server/handler.rs` — `parse_path()` now returns `(method, path)` instead of only the path. The method is threaded through to `write_response()` via a `suppress_body: bool` parameter. For `HEAD` requests, response headers (including `Content-Length` reflecting the full body size, as required by RFC 7231 §4.3.2) are written, but the body is not sent. - -#### 1.5 — Request Timeout Prevents Slow-Loris DoS - -`src/server/handler.rs` — The call to `read_request()` is now wrapped in `tokio::time::timeout(Duration::from_secs(30))`. Connections that fail to deliver a complete request header within 30 seconds receive a `408 Request Timeout` response and are closed. The timeout is also configurable via `[server] request_timeout_secs` in `settings.toml`. Timeout events are logged at `debug` level to avoid log flooding under attack. - -#### 1.6 — Unbounded Connection Spawning Replaced with Semaphore - -`src/server/mod.rs`, `src/tor/mod.rs` — Both the HTTP accept loop and the Tor stream request loop now use a `tokio::sync::Semaphore` to cap concurrent connections. The limit is configurable via `[server] max_connections` (default: 256). The semaphore `OwnedPermit` is held for the lifetime of each connection task and released on drop. When the limit is reached, the accept loop suspends naturally, providing backpressure; a `warn`-level log entry is emitted. Previously, unlimited concurrent connections could exhaust task stack memory and file descriptors. 
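The filename escaping described in 1.3 above can be illustrated as follows. This is a minimal sketch, not the project's `build_directory_listing()`: `escape_html` is a hypothetical helper, and the choice of `&#x27;` for the apostrophe is an assumption (other valid entity spellings exist).

```rust
/// Hypothetical helper: escapes the five characters that matter when a
/// filename is interpolated into HTML text or a quoted attribute.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            // Assumption: numeric reference chosen for the apostrophe.
            '\'' => out.push_str("&#x27;"),
            other => out.push(other),
        }
    }
    out
}
```

Iterating over `chars()` rather than bytes keeps multi-byte UTF-8 filenames intact, since only the five ASCII characters above are rewritten.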
- -#### 1.7 — Files Streamed Instead of Read Entirely Into Memory - -`src/server/handler.rs` — `tokio::fs::read` (which loads the entire file into a `Vec<u8>`) has been replaced with `tokio::fs::File::open` followed by `tokio::io::copy(&mut file, &mut stream)`. File size is obtained via `file.metadata().await?.len()` for the `Content-Length` header. Memory consumption per connection is now bounded by the kernel socket buffer (~128–256 KB) regardless of file size. For `HEAD` requests, the file is opened only to read its size; the `copy` step is skipped. +## [Unreleased] -#### 1.8 — `strip_timestamp` No Longer Panics on Non-ASCII Log Lines +### Added +- **`CONTRIBUTING.md`** — development workflow, lint gates, PR checklist, and architecture overview for new contributors. +- **`SECURITY.md`** — private vulnerability disclosure policy and scope definition. +- **`CHANGELOG.md`** — this file. +- **Depth-bounded `scan_site` BFS** — the directory scanner now stops at 64 levels deep and emits a warning instead of running indefinitely on adversarially deep directory trees. +- **Multiple log rotation backups** — `LogFile::rotate` now keeps up to five numbered backup files (`.log.1`–`.log.5`) instead of one, matching what operators expect from tools like `logrotate`. -`src/console/dashboard.rs` — `strip_timestamp()` previously used a byte index derived from iterating `.bytes()` to slice a `&str`, which panicked when the index fell inside a multi-byte UTF-8 character. The implementation now uses `splitn(3, ']')` to strip the leading `[LEVEL]` and `[HH:MM:SS]` tokens, which is both panic-safe and simpler. Any log line containing Unicode characters (Arti relay names, internationalized filenames, `.onion` addresses) is handled correctly. 
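The `splitn(3, ']')` approach from 1.8 can be sketched as below. This is an illustrative reconstruction under the assumption that log lines look like `[LEVEL] [HH:MM:SS] message`; the fallback of returning the line unchanged when fewer than two `]` characters are present is also an assumption, not confirmed behaviour.

```rust
/// Sketch of a panic-safe prefix stripper (assumed line layout:
/// "[LEVEL] [HH:MM:SS] message").
fn strip_timestamp(line: &str) -> &str {
    // splitn(3, ']') yields at most three pieces: "[LEVEL", " [HH:MM:SS",
    // and the remainder. Splitting happens on a char boundary, so slicing
    // inside a multi-byte character is impossible.
    match line.splitn(3, ']').nth(2) {
        Some(rest) => rest.trim_start(),
        // Assumption: lines without both bracketed tokens pass through.
        None => line,
    }
}
```

Because the limit is 3, any `]` inside the message itself stays in the remainder and is not treated as a delimiter.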
- -#### 1.9 — `TorStatus` Updated to `Failed` When Onion Service Terminates - -`src/tor/mod.rs` — When `stream_requests.next()` returns `None` (the onion service stream ends unexpectedly), the status is now set to `TorStatus::Failed("stream ended".to_string())` and the `onion_address` field is cleared from `AppState`. Previously, the dashboard permanently displayed a healthy green badge and the `.onion` address after the service had silently stopped serving traffic. - -#### 1.10 — Terminal Fully Restored on All Exit Paths; Panic Hook Registered - -`src/main.rs`, `src/console/mod.rs` — The error handler in `main.rs` now calls `console::cleanup()` (which issues `cursor::Show` and `terminal::LeaveAlternateScreen` before `disable_raw_mode`) on all failure paths. A `std::panic::set_hook` registered at startup ensures the same cleanup runs even when a panic occurs on an async executor thread. `console::cleanup()` is idempotent (guarded by a `RAW_MODE_ACTIVE` atomic swap), so calling it from multiple paths is safe. +### Changed +- **`lib.rs` visibility audit** — items only used in integration tests (`percent_decode`, `ByteRange`, `Encoding`, `onion_address_from_pubkey`) are now re-exported under `#[cfg(test)]` rather than unconditionally, reducing the public API surface. +- **Comment hygiene** — all internal `fix X.Y` tags have been replaced with descriptive prose so the rationale for each decision is clear to contributors. --- -### Phase 2 — High Priority Reliability - -#### 2.1 — HTTP Request Reading Buffered with `BufReader` - -`src/server/handler.rs` — `read_request()` previously read one byte at a time, issuing up to 8,192 individual `read` syscalls per request. The stream is now wrapped in `tokio::io::BufReader` and reads headers line-by-line with `read_line()`. The 8 KiB header size limit is enforced by accumulating total bytes read. This also correctly handles `\r\n\r\n` split across TCP segments. 
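The line-by-line header reading with an 8 KiB cap from 2.1 can be sketched synchronously. This is a minimal sketch, assuming `std::io::BufRead` in place of `tokio::io::BufReader` so it runs without dependencies; `read_header_block` and `MAX_HEADER_BYTES` are hypothetical names.

```rust
use std::io::{self, BufRead, Read};

/// Assumed cap matching the 8 KiB limit mentioned in the changelog.
const MAX_HEADER_BYTES: usize = 8 * 1024;

/// Hypothetical helper: reads header lines until the blank line that ends
/// the header block, rejecting input that exceeds the cap.
fn read_header_block<R: BufRead>(reader: &mut R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    let mut total = 0usize;
    loop {
        // Permit one byte past the cap so an oversized header is detectable
        // without buffering an unbounded line.
        let remaining = (MAX_HEADER_BYTES - total + 1) as u64;
        let mut line = String::new();
        let n = reader.by_ref().take(remaining).read_line(&mut line)?;
        if n == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "connection closed before end of headers",
            ));
        }
        total += n;
        if total > MAX_HEADER_BYTES {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "request header exceeds 8 KiB",
            ));
        }
        let trimmed = line.trim_end_matches(|c| c == '\r' || c == '\n');
        if trimmed.is_empty() {
            // Blank line: end of the header block.
            return Ok(lines);
        }
        lines.push(trimmed.to_string());
    }
}
```

Since the buffered reader hides segment boundaries, a terminator split across reads is handled the same way as one that arrives in a single read.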
- -#### 2.2 — `scan_site` is Now Recursive, Error-Propagating, and Non-Blocking - -`src/server/mod.rs`, `src/runtime/lifecycle.rs`, `src/runtime/events.rs` — `scan_site` now performs a breadth-first traversal using a `VecDeque` work queue, counting files and sizes in all subdirectories. The return type is now `Result<(u32, u64)>`; errors from `read_dir` are propagated and logged at `warn` level rather than silently returning `(0, 0)`. All call sites wrap the function in `tokio::task::spawn_blocking` to avoid blocking the async executor on directory I/O. +## [0.1.0] — 2025-07-01 -#### 2.3 — `canonicalize()` Called Once at Startup, Not Per Request +This release resolves all 40 issues identified in the 2026-03-20 security and reliability audit. Every fix is listed below, grouped by the phase it belongs to. -`src/server/mod.rs`, `src/server/handler.rs` — The site root is now canonicalized once in `server::run()` and passed as a pre-computed `PathBuf` into each connection handler. The per-request `site_root.canonicalize()` call in `resolve_path()` has been removed, eliminating a `realpath()` syscall on every request. - -#### 2.4 — `open_browser` Deduplicated +--- -`src/runtime/lifecycle.rs`, `src/runtime/events.rs`, `src/runtime/mod.rs` — The `open_browser` function was duplicated in `lifecycle.rs` and `events.rs`. It now lives in a single location (`src/runtime/mod.rs`) and both call sites use the shared implementation. +### Added -#### 2.5 — `#[serde(deny_unknown_fields)]` on All Config Structs +#### Repository & CI (Phase 0) -`src/config/mod.rs` — All `#[derive(Deserialize)]` config structs (`Config`, `ServerConfig`, `SiteConfig`, `TorConfig`, `LoggingConfig`, `ConsoleConfig`, `IdentityConfig`) now carry `#[serde(deny_unknown_fields)]`. A misspelled key such as `bund = "127.0.0.1"` now causes a startup error naming the unknown field rather than silently using the compiled-in default. 
+- **`rust-toolchain.toml`** — pins the nightly channel so every contributor and CI run uses the same compiler. No more "works on my machine" build failures. +- **GitHub Actions CI** — runs build, test, clippy, rustfmt, `cargo-audit`, and `cargo-deny` on Ubuntu, macOS, and Windows on every push and PR. +- **`Cargo.toml` profile tuning** — `opt-level = 1` for dev dependencies speeds up debug builds; the release profile uses `lto = true`, `strip = true`, and `codegen-units = 1` for a smaller, faster binary. -#### 2.6 — `auto_reload` Removed (Was Unimplemented) +#### HTTP Server -`src/config/mod.rs`, `src/config/defaults.rs` — The `auto_reload` field was present in the config struct and advertised in the default `settings.toml` but had no implementation. It has been removed entirely. The `[R]` key for manual site stat reloads is unaffected. +- **Keep-alive via `hyper` 1.x** — migrated from a hand-rolled single-shot HTTP/1.1 parser to `hyper`. Eliminates the 30–45 second Tor page-load penalty that was caused by `Connection: close` on every response. +- **Brotli and Gzip compression** — negotiated via `Accept-Encoding`. Brotli is preferred over Gzip for Tor users since they pay in latency for every byte. +- **`ETag` / conditional GET** — weak ETags computed from file modification time and size. Returns `304 Not Modified` when `If-None-Match` matches, so the response body is not re-sent. +- **Range requests** — supports `bytes=N-M`, `bytes=N-`, and `bytes=-N` suffix forms. Returns `206 Partial Content` or `416 Range Not Satisfiable` as appropriate. Enables audio and video seeking. +- **Per-IP rate limiting** — `DashMap`-backed lock-free CAS loop. Connections beyond `max_connections_per_ip` are dropped at accept time with a TCP RST. +- **Smart `Cache-Control`** — HTML responses get `no-store`; content-hashed assets (8–16 hex characters in the filename stem) get `max-age=31536000, immutable`; everything else gets `no-cache`. 
+- **Security headers on every response** — `X-Content-Type-Options: nosniff`, `X-Frame-Options: SAMEORIGIN`, `Referrer-Policy: no-referrer`, and `Permissions-Policy: camera=(), microphone=(), geolocation=()`. HTML responses additionally include a configurable `Content-Security-Policy`. +- **`--serve ` one-shot mode** — serve a directory directly without a `settings.toml`. Skips first-run setup entirely. +- **Extended MIME types** — added `.webmanifest`, `.opus`, `.flac`, `.glb`, and `.ndjson`. +- **Combined Log Format access log** — written to `logs/access.log` with owner-only `0600` permissions. -#### 2.7 — ANSI Terminal Injection Prevention Documented and Tested +#### Tor / Onion Service -`src/config/loader.rs` — The existing `char::is_control` check on `instance_name` (which covers ESC `\x1b`, NUL `\x00`, BEL `\x07`, and BS `\x08`) is confirmed to prevent terminal injection. An explicit comment now documents the security intent, and dedicated test cases cover each injection vector. +- **Idle timeout fix** (`copy_with_idle_timeout`) — replaced the wall-clock cap (which disconnected active large downloads after 60 seconds) with a true per-side idle deadline that resets on every read or write. +- **`reference_onion` test** — replaced the tautological self-referencing test with an external test vector computed independently using Python's standard library. -#### 2.8 — Keyboard Input Task Failure Now Detected and Reported +#### Configuration -`src/runtime/lifecycle.rs` — If the `spawn_blocking` input task exits (causing `key_rx` to close), `recv().await` returning `None` is now detected. A `warn`-level log entry is emitted ("Console input task exited — keyboard input disabled. Use Ctrl-C to quit.") and subsequent iterations no longer attempt to receive from the closed channel. Previously, input task death was completely silent. +- **URL redirect and rewrite rules** — `[[redirects]]` table in `settings.toml`, checked before filesystem resolution. 
Supports 301 and 302. +- **Custom error pages** — `site.error_404` and `site.error_503` config keys resolve to HTML files served with the correct status codes. +- **`--config` and `--data-dir` CLI flags** — override the default config and data directory paths. Enables multi-instance deployments and systemd unit files with explicit paths. +- **`--version` and `--help` CLI flags**. +- **`#[serde(deny_unknown_fields)]` on all config structs** — a misspelled key like `bund = "127.0.0.1"` causes a clear startup error instead of silently using the default. +- **Typed config fields** — `bind` is `std::net::IpAddr`; `log level` is a `LogLevel` enum. Invalid values are caught at deserialisation time, not after the server starts. -#### 2.9 — `TorStatus::Failed` Now Carries a Reason String +#### Features -`src/runtime/state.rs`, `src/console/dashboard.rs` — `TorStatus::Failed(Option)` (the exit code variant, which was never constructed) has been replaced with `TorStatus::Failed(String)`. Construction sites pass a brief reason string (`"bootstrap failed"`, `"stream ended"`, `"launch failed"`). The dashboard now renders `FAILED (reason) — see log for details` instead of a bare `FAILED`. +- **SPA fallback routing** — unknown paths fall back to `index.html` when `site.spa_routing = true`, enabling React, Vue, and Svelte client-side routing. +- **`canonical_root` hot reload** — the `[R]` keypress pushes a new canonicalised root to the accept loop over a `watch` channel without restarting the server. +- **Dependency log filtering** — Arti and Tokio internals at `Info` and below are suppressed by default, keeping the log focused on application events. Configurable via `filter_dependencies`. -#### 2.10 — Graceful Shutdown Uses `JoinSet` and Proper Signalling +#### Reliability -`src/runtime/lifecycle.rs`, `src/server/mod.rs`, `src/tor/mod.rs` — The 300 ms fixed sleep that gated shutdown has been replaced with proper task completion signalling. 
A clone of `shutdown_rx` is passed into `tor::init()`; the Tor run loop watches it via `tokio::select!` and exits cleanly on shutdown. In-flight HTTP connection tasks are tracked in a `JoinSet`; after the accept loop exits, `join_set.join_all()` is awaited with a 5-second timeout, allowing in-progress transfers to complete before the process exits. +- **Exponential backoff for Tor retries** — re-bootstrap retries now use exponential backoff (30 s, 60 s, 120 s, …, capped at 300 s) instead of a fixed linear delay. +- **Shutdown drain per subsystem** — HTTP and Tor drains each have their own independently-bounded timeout (5 s for HTTP, 10 s for Tor) so a slow HTTP drain doesn't steal time from Tor circuit teardown. +- **`percent-encoding` crate** — replaced the hand-rolled `percent_decode` function with the audited upstream crate. Added a null-byte guard specific to filesystem path use. +- **`scan_site` partial failure** — unreadable subdirectories are skipped with a warning instead of aborting the entire scan. +- **`fstat` batching** — `LogFile::write_line` calls `fstat` every 100 writes (instead of on every record) to reduce syscall overhead on active servers. -#### 2.11 — Log File Flushed on Graceful Shutdown +#### Testing & CI -`src/logging/mod.rs`, `src/runtime/lifecycle.rs` — A `pub fn flush()` function has been added to the logging module. The shutdown sequence calls it explicitly after the connection drain wait, ensuring all buffered log entries (including the `"RustHost shut down cleanly."` sentinel) are written to disk before the process exits. +- **Unit tests for all security-critical functions** — `percent_decode`, `resolve_path`, `validate`, `strip_timestamp`, and `hsid_to_onion_address` all have `#[cfg(test)]` coverage. +- **Integration tests** (`tests/http_integration.rs`) — covers all HTTP core flows using raw `TcpStream`: 200, HEAD, 304, 403, 404, 400, range requests, and oversized headers. 
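The Tor retry schedule listed under Reliability (30 s, 60 s, 120 s, and so on, capped at 300 s) can be expressed as a small pure function. This is a sketch under assumptions: `retry_delay`, the zero-based `attempt` counter, and the constant names are hypothetical and not taken from the project.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before retry number `attempt` (0-based),
/// doubling from 30 s and capped at 300 s.
fn retry_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 30;
    const CAP_SECS: u64 = 300;
    // checked_shl avoids overflow for very large attempt counts.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(BASE_SECS.saturating_mul(factor).min(CAP_SECS))
}
```

Keeping the computation free of I/O makes the schedule trivially unit-testable, and the saturating arithmetic means a long outage cannot wrap the delay back to a small value.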
--- -### Phase 3 — Performance - -#### 3.1 — `data_dir()` Computed Once at Startup - -`src/runtime/lifecycle.rs` — `data_dir()` (which calls `std::env::current_exe()` internally) was previously called on every key event dispatch inside `event_loop`. It is now computed exactly once at the top of `normal_run()`, stored in a local variable, and passed as a parameter to all functions that need it. - -#### 3.2 — `Arc` and `Arc` Eliminate Per-Connection Heap Allocations - -`src/server/mod.rs`, `src/server/handler.rs` — `site_root` and `index_file` are now wrapped in `Arc` and `Arc` respectively before the accept loop. Each connection task receives a cheap `Arc` clone (reference-count increment) rather than a full heap allocation. - -#### 3.3 — Dashboard Render Task Skips Redraws When Output Is Unchanged - -`src/console/mod.rs` — The render task now compares the rendered output string against the previously written string. If identical, the `execute!` and `write_all` calls are skipped entirely. This eliminates terminal writes on idle ticks, which is the common case for a server with no active traffic. - -#### 3.4 — MIME Lookup No Longer Allocates a `String` Per Request - -`src/server/mime.rs` — The `for_extension` function previously called `ext.to_ascii_lowercase()`, allocating a heap `String` on every request. The comparison now uses `str::eq_ignore_ascii_case` directly against the extension string, with no allocation. - -#### 3.5 — Log Ring Buffer Lock Not Held During `String` Clone - -`src/logging/mod.rs` — The log line string is now cloned before acquiring the ring buffer mutex. The mutex is held only for the `push_back` of the already-allocated string, reducing lock contention from Arti's multi-threaded internal logging. - -#### 3.6 — Tokio Feature Flags Made Explicit - -`Cargo.toml` — `tokio = { features = ["full"] }` has been replaced with an explicit feature list: `rt-multi-thread`, `net`, `io-util`, `fs`, `sync`, `time`, `macros`, `signal`. 
Unused features (`process`, `io-std`) are no longer compiled, reducing binary size and build time. +### Fixed + +#### Critical (Phase 1) + +- **Config path traversal** — `validate()` now rejects any `site.directory` or `logging.file` value that is an absolute path, contains `..`, or contains a platform path separator. Previously, `directory = "../../etc"` would cause the server to serve the entire `/etc` tree. +- **Tor port race condition** — replaced the 50 ms sleep used to synchronise the HTTP server's bound port with the Tor subsystem with a `tokio::sync::oneshot` channel. The server sends the actual bound port through the channel before entering the accept loop. Previously, on a loaded system, the race could be lost silently, causing every inbound Tor connection to fail with `ECONNREFUSED` to port 0 while the dashboard showed a healthy green status. +- **XSS in directory listings** — `build_directory_listing()` now HTML-entity-escapes all filenames before interpolating them into link text, and percent-encodes filenames in `href` attributes. Previously, a file named `">` produced an executable XSS payload in any directory listing page. +- **HEAD requests sent a response body** — `HEAD` requests now send the correct headers (including `Content-Length` reflecting the full body size) but no body, as required by RFC 7231 §4.3.2. Previously, the full file was sent. +- **Slow-loris DoS** — `read_request()` is now wrapped in a 30-second timeout. Connections that don't deliver a complete request header in time receive a `408 Request Timeout`. Configurable via `request_timeout_secs`. +- **Unbounded connection spawning** — both the HTTP accept loop and the Tor stream loop now use a `tokio::sync::Semaphore` to cap concurrent connections (default: 256). Previously, unlimited concurrent connections could exhaust file descriptors and task stack memory. 
+- **Files loaded entirely into memory** — replaced `tokio::fs::read` (which loaded the entire file into a `Vec<u8>`) with `tokio::fs::File::open` + `tokio::io::copy`. Memory per connection is now bounded by the kernel socket buffer (~128–256 KB) regardless of file size. +- **`strip_timestamp` panic on non-ASCII log lines** — the old implementation used a byte index derived from `.bytes()` to slice a `&str`, which panicked when the index fell inside a multi-byte UTF-8 character. Now uses `splitn(3, ']')`, which is both panic-safe and handles Unicode correctly. +- **`TorStatus` not updated when onion service terminates** — when the onion service stream ends unexpectedly, the status is now set to `TorStatus::Failed("stream ended")` and the `.onion` address is cleared. Previously, the dashboard permanently showed a healthy green badge after the service had silently stopped. +- **Terminal not restored on panic or crash** — a `std::panic::set_hook` is registered at startup to call `console::cleanup()` (which issues `LeaveAlternateScreen`, `cursor::Show`, and `disable_raw_mode`) on all exit paths. The cleanup function is idempotent, so calling it from multiple paths is safe. + +#### High — Reliability (Phase 2) + +- **HTTP request reading done byte-by-byte** — `read_request()` previously issued up to 8,192 individual `read` syscalls per request. The stream is now wrapped in `tokio::io::BufReader` and headers are read line-by-line. Also correctly handles `\r\n\r\n` split across multiple TCP segments. +- **`scan_site` only scanned the top-level directory** — now performs a full breadth-first traversal using a work queue, counting files and sizes in all subdirectories. Unreadable directories are skipped with a warning instead of propagating an error. +- **`canonicalize()` called on every request** — the site root is now canonicalised once at startup and passed into each connection handler. Eliminates a `realpath()` syscall on every single request. 
+- **`open_browser` duplicated** — the function existed in two separate source files. Now lives in one place (`src/runtime/mod.rs`). +- **`auto_reload` config field was unimplemented** — removed entirely. It was present in the config struct and advertised in the default `settings.toml` but had no effect. +- **Keyboard input task failure was silent** — if the input task exits unexpectedly (causing `key_rx` to close), a warning is now logged ("Console input task exited — keyboard input disabled. Use Ctrl-C to quit."). Previously, this failure was completely invisible. +- **`TorStatus::Failed` carried an exit code that was never set** — replaced `TorStatus::Failed(Option)` with `TorStatus::Failed(String)`. The dashboard now shows `FAILED (reason) — see log for details` with a human-readable reason string. +- **Graceful shutdown used a fixed 300 ms sleep** — replaced with proper task completion signalling. In-flight HTTP connections are tracked in a `JoinSet` and given 5 seconds to finish. The Tor run loop watches the shutdown signal via `tokio::select!` and exits cleanly. +- **Log file not flushed on shutdown** — added `pub fn flush()` to the logging module. The shutdown sequence calls it explicitly after the connection drain, ensuring the final log entries (including the shutdown sentinel) reach disk. + +#### Medium (Phase 3–5) + +- **`data_dir()` recomputed on every key event** — now computed once at startup and passed as a parameter. Removes the hidden `current_exe()` call from the hot event loop. +- **Per-connection heap allocations for `site_root` and `index_file`** — both are now wrapped in `Arc` and `Arc` before the accept loop. Each connection task gets a cheap reference-count increment instead of a full heap allocation. +- **Dashboard redrawn on every tick even when unchanged** — the render task now compares the new output against the previous one and skips writing to the terminal if they're identical. Eliminates unnecessary terminal writes on idle servers. 
+- **MIME lookup allocated a heap `String` per request** — replaced `ext.to_ascii_lowercase()` with `str::eq_ignore_ascii_case`. No allocation. +- **Log ring buffer lock held during `String` clone** — the log line is now cloned before acquiring the mutex. The lock is held only for the `push_back`, reducing contention from Arti's multi-threaded logging. +- **`tokio = { features = ["full"] }` compiled unused features** — replaced with an explicit feature list (`rt-multi-thread`, `net`, `io-util`, `fs`, `sync`, `time`, `macros`, `signal`). Reduces binary size and build time. +- **`sanitize_header_value` only stripped CR/LF** — now strips all C0 control characters (NUL, ESC, TAB, DEL), preventing header injection via crafted filenames or redirect targets. +- **`expose_dotfiles` checked on URL path instead of resolved path components** — the guard now inspects each path component after `canonicalize`, blocking escapes like `/normal/../.git/config`. +- **`render()` acquired the `AppState` lock twice per tick** — now acquires it once per tick, eliminating the TOCTOU race between two sequential acquisitions. +- **Stale "polling" message in dashboard** — Arti is event-driven, not polled. The message implying periodic polling has been removed. +- **`percent_decode` produced garbage for multi-byte UTF-8 sequences** — the old implementation decoded each `%XX` token as a standalone `char` cast from a `u8`. It now accumulates decoded bytes into a buffer and flushes via `String::from_utf8_lossy`, correctly reassembling multi-byte sequences. Null bytes (`%00`) are left as the literal string `%00`. +- **`deny.toml` missing five duplicate crate skip entries** — `foldhash`, `hashbrown`, `indexmap`, `redox_syscall`, and `schemars` were absent from `bans.skip` but present in the lock file. `cargo deny check` now passes cleanly. 
+- **`ctrlc` crate conflicted with Tokio's signal handling** — replaced with `tokio::signal::ctrl_c()` and `tokio::signal::unix::signal(SignalKind::interrupt())` integrated directly into `event_loop`. Eliminates the threading concerns between the two signal handling mechanisms. +- **`open_browser` silently swallowed spawn errors** — spawn errors are now logged at `warn` level. --- -### Phase 4 — Architecture & Design - -#### 4.1 — Typed `AppError` Enum Introduced - -`src/error.rs` (new), `src/main.rs`, all modules — The global `Box` result alias has been replaced with a typed `AppError` enum using `thiserror`. Variants: `ConfigLoad`, `ConfigValidation`, `LogInit`, `ServerBind { port, source }`, `Tor`, `Io`, `Console`. Error messages now preserve structured context at the type level. - -#### 4.2 — Config Structs Use Typed Fields - -`src/config/mod.rs`, `src/config/loader.rs` — `LoggingConfig.level` is now a `LogLevel` enum (`Trace` | `Debug` | `Info` | `Warn` | `Error`) with `#[serde(rename_all = "lowercase")]`; the duplicate validation in `loader.rs` and `logging/mod.rs` has been removed. `ServerConfig.bind` is now `std::net::IpAddr` via `#[serde(try_from = "String")]`. The parse-then-validate pattern is eliminated in favour of deserialisation-time typing. - -#### 4.3 — Dependency Log Noise Filtered by Default - -`src/logging/mod.rs` — `RustHostLogger::enabled()` now suppresses `Info`-and-below records from non-`rusthost` targets (Arti, Tokio internals). Warnings and errors from all crates are still passed through. This prevents the ring buffer and log file from being flooded with Tor bootstrap noise. Configurable via `[logging] filter_dependencies = true` (default `true`); set `false` to pass all crate logs at the configured level. - -#### 4.4 — `data_dir()` Free Function Eliminated; Path Injected - -`src/runtime/lifecycle.rs` and all callers — The `data_dir()` free function (which called `current_exe()` as a hidden dependency) has been removed. 
The data directory `PathBuf` is now a first-class parameter threaded through the call chain from `normal_run`, enabling test injection of temporary directories. - -#### 4.5 — `percent_decode` Correctly Handles Multi-Byte UTF-8 and Null Bytes - -`src/server/handler.rs` — The previous implementation decoded each `%XX` token as a standalone `char` cast from a `u8`, producing incorrect output for multi-byte sequences (e.g., `%C3%A9` was decoded as two garbage characters instead of `é`). The function now accumulates consecutive decoded bytes into a `Vec` buffer and flushes via `String::from_utf8_lossy` when a literal character is encountered, correctly reassembling multi-byte sequences. Null bytes (`%00`) are left as the literal string `%00` in the output rather than being decoded. - -#### 4.6 — `deny.toml` Updated with All Duplicate Crate Skip Entries +### Changed -`deny.toml` — Five duplicate crate version pairs that were absent from `bans.skip` but present in the lock file have been added with comments identifying the dependency trees that pull each version: `foldhash`, `hashbrown`, `indexmap`, `redox_syscall`, and `schemars`. `cargo deny check` now passes cleanly. - -#### 4.7 — `ctrlc` Crate Replaced with `tokio::signal` - -`Cargo.toml`, `src/runtime/lifecycle.rs` — The `ctrlc = "3"` dependency has been removed. Signal handling is now done via `tokio::signal::ctrl_c()` (cross-platform) and `tokio::signal::unix::signal(SignalKind::interrupt())` (Unix), integrated directly into the `select!` inside `event_loop`. This eliminates threading concerns between the `ctrlc` crate's signal handler and Tokio's internal signal infrastructure. +- **`Box` replaced with typed `AppError` enum** — uses `thiserror`. Variants: `ConfigLoad`, `ConfigValidation`, `LogInit`, `ServerBind { port, source }`, `Tor`, `Io`, `Console`. Error messages now preserve structured context. +- **Single `write_headers` path** — all security headers (CSP, HSTS, `X-Content-Type-Options`, etc.) 
are emitted from one function. Redirect responses delegate here instead of duplicating the header list, eliminating the risk of the two diverging. +- **`audit.toml` consolidated into `deny.toml`** — advisory suppression is managed in one place with documented rationale. CI now runs `cargo deny check` as a required step. --- -### Phase 5 — Testing, Observability & Hardening - -#### 5.1 — Unit Tests Added for All Security-Critical Functions - -`src/server/handler.rs`, `src/server/mod.rs`, `src/config/loader.rs`, `src/console/dashboard.rs`, `src/tor/mod.rs` — `#[cfg(test)]` modules added to each file. Coverage includes: `percent_decode` (ASCII, spaces, multi-byte UTF-8, null bytes, incomplete sequences, invalid hex); `resolve_path` (normal file, directory traversal, encoded-slash traversal, missing file, missing root); `validate` (valid config, `site.directory` path traversal, absolute path, `logging.file` traversal, port 0, invalid IP, unknown field); `strip_timestamp` (ASCII line, multi-byte UTF-8 line, line with no brackets); `hsid_to_onion_address` (known test vector against reference implementation). - -#### 5.2 — Integration Tests Added for HTTP Server Core Flows - -`tests/http_integration.rs` (new) — Integration tests using `tokio::net::TcpStream` against a test server bound on port 0. Covers: `GET /index.html` → 200; `HEAD /index.html` → correct `Content-Length`, no body; `GET /` with `index_file` configured; `GET /../etc/passwd` → 403; request header > 8 KiB → 400; `GET /nonexistent.txt` → 404; `POST /index.html` → 400. - -#### 5.3 — Security Response Headers Added to All Responses - -`src/server/handler.rs` — All responses now include `X-Content-Type-Options: nosniff`, `X-Frame-Options: SAMEORIGIN`, `Referrer-Policy: no-referrer`, and `Permissions-Policy: camera=(), microphone=(), geolocation=()`. HTML responses additionally include `Content-Security-Policy: default-src 'self'` (configurable via `[server] content_security_policy` in `settings.toml`). 
The `Referrer-Policy: no-referrer` header is especially relevant for the Tor onion service: it prevents the `.onion` URL from leaking in the `Referer` header to any third-party resources loaded by served HTML. - -#### 5.4 — Accept Loop Error Handling Uses Exponential Backoff - -`src/server/mod.rs` — The accept loop previously retried immediately on error, producing thousands of log entries per second on persistent errors such as `EMFILE`. Errors now trigger exponential backoff (starting at 1 ms, doubling up to 1 second). `EMFILE` is logged at `error` level (operator intervention required); transient errors (`ECONNRESET`, `ECONNABORTED`) are logged at `debug`. The backoff counter resets on successful accept. - -#### 5.5 — CLI Arguments Added (`--config`, `--data-dir`, `--version`, `--help`) - -`src/main.rs`, `src/runtime/lifecycle.rs` — The binary now accepts `--config ` and `--data-dir ` to override the default config and data directory paths (previously inferred from `current_exe()`). `--version` prints the crate version and exits. `--help` prints a usage summary. These flags enable multi-instance deployments, systemd unit files with explicit paths, and CI test runs without relying on the working directory. - -#### 5.6 — `cargo deny check` Passes Cleanly; `audit.toml` Consolidated - -`deny.toml`, CI — `audit.toml` (which suppressed `RUSTSEC-2023-0071` without a documented rationale) has been removed. Advisory suppression is now managed exclusively in `deny.toml`, which carries the full justification. CI now runs `cargo deny check` as a required step, subsuming the advisory check. The existing rationale for `RUSTSEC-2023-0071` is unchanged: the `rsa` crate is used only for signature verification on Tor directory documents, not for decryption; the Marvin timing attack's threat model does not apply. - ---- +### Removed -### HTTP Server - -- Custom HTTP/1.1 static file server built directly on `tokio::net::TcpListener` — no third-party HTTP framework dependency. 
-- Serves `GET` and `HEAD` requests; all other methods return `400 Bad Request`. -- Percent-decoding of URL paths (e.g. `%20` → space) before file resolution. -- Query string and fragment stripping before path resolution. -- Path traversal protection: every resolved path is verified to be a descendant of the site root via `std::fs::canonicalize`; any attempt to escape (e.g. `/../secret`) is rejected with `HTTP 403 Forbidden`. -- Request header size cap of 8 KiB; oversized requests are rejected immediately. -- `Content-Type`, `Content-Length`, and `Connection: close` headers on every response. -- Configurable index file (default: `index.html`) served for directory requests. -- Optional HTML directory listing for directory requests when no index file is found, with alphabetically sorted entries. -- Built-in "No site found" fallback page (HTTP 200) when the site directory is empty and directory listing is disabled, so the browser always shows a helpful message rather than a connection error. -- Placeholder `index.html` written on first run so the server is immediately functional out of the box. -- Automatic port fallback: if the configured port is in use, the server silently tries the next free port up to 10 times before giving up (configurable via `auto_port_fallback`). -- Configurable bind address; defaults to `127.0.0.1` (loopback only) with a logged warning when set to `0.0.0.0`. -- Per-connection Tokio tasks so concurrent requests never block each other. - -### MIME Types - -- Built-in extension-to-MIME mapping with no external dependency, covering: - - Text: `html`, `htm`, `css`, `js`, `mjs`, `txt`, `csv`, `xml`, `md` - - Data: `json`, `jsonld`, `pdf`, `wasm`, `zip` - - Images: `png`, `jpg`/`jpeg`, `gif`, `webp`, `svg`, `ico`, `bmp`, `avif` - - Fonts: `woff`, `woff2`, `ttf`, `otf` - - Audio: `mp3`, `ogg`, `wav` - - Video: `mp4`, `webm` - - Unknown extensions fall back to `application/octet-stream`. 
- -### Tor Onion Service (Arti — in-process) - -- Embedded Tor support via [Arti](https://gitlab.torproject.org/tpo/core/arti), the official Rust Tor implementation — no external `tor` binary or `torrc` file required. -- Bootstraps to the Tor network in a background Tokio task; never blocks the HTTP server or console. -- First run downloads approximately 2 MB of directory consensus data (approximately 30 seconds); subsequent runs reuse the cache and start in seconds. -- Stable `.onion` address across restarts: the service keypair is persisted to `rusthost-data/arti_state/`; deleting this directory rotates to a new address. -- Consensus cache stored in `rusthost-data/arti_cache/` for fast startup. -- Onion address encoded in-process using the v3 `.onion` spec (SHA3-256 checksum + base32) — no dependency on Arti's `DisplayRedacted` formatting. -- Each inbound Tor connection is bridged to the local HTTP server via `tokio::io::copy_bidirectional` in its own Tokio task. -- Tor subsystem can be disabled entirely with `[tor] enabled = false`; the dashboard onion section reflects this immediately. -- Graceful shutdown: the `TorClient` is dropped naturally when the Tokio runtime exits, closing all circuits cleanly — no explicit kill step needed. -- `.onion` address displayed in the dashboard and logged in a prominent banner once the service is active. - -### Interactive Terminal Dashboard - -- Full-screen raw-mode terminal UI built with [crossterm](https://github.com/crossterm-rs/crossterm); no external TUI framework. -- Three screens navigable with single-key bindings: - - **Dashboard** (default) — live status overview. - - **Log view** — last 40 log lines, toggled with `[L]`. - - **Help overlay** — key binding reference, toggled with `[H]`; any other key dismisses it. -- Dashboard sections: - - **Status** — local server state (RUNNING with bind address and port, or STARTING) and Tor state (DISABLED / STARTING / READY / FAILED with exit code). 
- - **Endpoints** — local `http://localhost:` URL and Tor `.onion` URL (or a dim status hint if Tor is not yet ready). - - **Site** — directory path, file count, and total size (auto-scaled to B / KB / MB / GB). - - **Activity** — total request count and error count (errors highlighted in red when non-zero). - - **Key bar** — persistent one-line reminder of available key bindings. -- Dashboard redraws at a configurable interval (default: 500 ms). -- Log view supports optional `HH:MM:SS` timestamp display, toggled via `show_timestamps` in config. -- Customisable instance name shown in the dashboard header (max 32 characters). -- Headless / non-interactive mode: set `[console] interactive = false` for systemd or piped deployments; the server prints a plain `http://…` line to stdout instead. -- Graceful terminal restore on fatal crash: raw mode is disabled and the cursor is shown even if the process exits unexpectedly. - -### Configuration - -- TOML configuration file (`rusthost-data/settings.toml`) with six sections: `[server]`, `[site]`, `[tor]`, `[logging]`, `[console]`, `[identity]`. -- Configuration validated at startup with clear, multi-error messages before any subsystem is started. -- Validated fields include port range, bind IP address format, index file name (no path separators), log level, console refresh rate minimum (100 ms), instance name length (1–32 chars), and absence of control characters in the name. -- Full default config written automatically on first run with inline comments explaining every option. -- Reloading site stats (file count and total size) without restart via `[R]` in the dashboard. - -### Logging - -- Custom `log::Log` implementation; all modules use the standard `log` facade macros (`log::info!`, `log::warn!`, etc.). -- Dual output: log file on disk (append mode, parent directories created automatically) and an in-memory ring buffer. 
-- Ring buffer holds the most recent 1 000 lines and feeds the console log view without any file I/O on each render tick. -- Log file path configurable relative to `rusthost-data/`; defaults to `logs/rusthost.log`. -- Configurable log level: `trace`, `debug`, `info`, `warn`, `error`. -- Timestamped entries in `[LEVEL] [HH:MM:SS] message` format. -- Logging can be disabled entirely (`[logging] enabled = false`) for minimal-overhead deployments. - -### Lifecycle and Startup - -- **First-run detection**: if `rusthost-data/settings.toml` does not exist, RustHost initialises the data directory (`site/`, `logs/`), writes defaults, drops a placeholder `index.html`, prints a short getting-started guide, and exits cleanly — no daemon started. -- **Normal run** startup sequence: load and validate config → initialise logging → build shared state → scan site directory → bind HTTP server → start Tor (if enabled) → start console → open browser (if configured) → enter event loop. -- Shutdown triggered by `[Q]` keypress or `SIGINT`/`SIGTERM` (via `ctrlc`); sends a watch-channel signal to the HTTP server and console, then waits 300 ms for in-flight connections before exiting. -- Optional browser launch at startup (`open_browser_on_start`); uses `open` (macOS), `explorer` (Windows), or `xdg-open` (Linux/other). -- All subsystems share state through an `Arc>`; hot-path request and error counters use separate `Arc` backed by atomics so the HTTP handler never acquires a lock per request. - -### Project and Build - -- Single binary; no installer, no runtime dependencies beyond the binary itself (Tor included via Arti). -- Data directory co-located with the binary at `./rusthost-data/`; entirely self-contained. -- Minimum supported Rust version: 1.86 (required by `arti-client 0.40`). -- Release profile: `opt-level = 3`, LTO enabled, debug symbols stripped. 
-- `cargo-deny` configuration (`deny.toml`) enforcing allowed SPDX licenses (MIT, Apache-2.0, Apache-2.0 WITH LLVM-exception, Zlib, Unicode-3.0) and advisory database checks; known transitive duplicate crates (`mio`, `windows-sys`) skipped with comments. -- Advisory `RUSTSEC-2023-0071` (RSA Marvin timing attack) acknowledged and suppressed with a documented rationale: the `rsa` crate is a transitive dependency of `arti-client` used exclusively for RSA *signature verification* on Tor directory consensus documents, not decryption; the attack's threat model does not apply. +- **`auto_reload` config field** — was documented but never implemented. Removed to avoid confusion. The `[R]` key for manual site stat reload is unaffected. +- **`ctrlc` crate dependency** — replaced by `tokio::signal` (see above). diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..eb9dab3 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,165 @@ +# Contributing to RustHost + +Thank you for considering a contribution. This document explains the development +workflow, code standards, and review expectations so your time is spent well. + +--- + +## Table of Contents + +1. [Prerequisites](#prerequisites) +2. [Getting Started](#getting-started) +3. [Code Standards](#code-standards) +4. [Testing](#testing) +5. [Submitting a Pull Request](#submitting-a-pull-request) +6. [Architecture Overview](#architecture-overview) +7. [Issue Labels](#issue-labels) + +--- + +## Prerequisites + +| Tool | Minimum version | Notes | +|------|-----------------|-------| +| Rust (nightly) | see `rust-toolchain.toml` | pinned channel; installed automatically by `rustup` | +| `cargo-audit` | latest | `cargo install cargo-audit` | +| `cargo-deny` | latest | `cargo install cargo-deny` | + +The pinned nightly toolchain is defined in `rust-toolchain.toml` at the +repository root. Running any `cargo` command will invoke `rustup` to install it +automatically on first use. 
+ +--- + +## Getting Started + +```sh +git clone https://github.com/your-org/rusthost +cd rusthost + +# Build and run tests +cargo test --all + +# Run clippy (same flags as CI) +cargo clippy --all-targets --all-features -- -D warnings + +# Run the binary against a local directory +cargo run -- --serve ./my-site +``` + +--- + +## Code Standards + +### Lint gates + +Every file must pass the workspace-level gates declared in `Cargo.toml`: + +```toml +[lints.rust] +unsafe_code = "forbid" + +[lints.clippy] +all = { level = "deny", priority = -1 } +pedantic = { level = "deny", priority = -1 } +nursery = { level = "warn", priority = -1 } +``` + +Use `#[allow(...)]` sparingly and always include a comment explaining why the +lint is suppressed. Suppressions must be as narrow as possible — prefer a +targeted `#[allow]` on a single expression over a module-level gate. + +### Comment style + +- Explain **why**, not **what** — the code already says what it does. +- Never use opaque internal tags like `fix H-1` or `fix 3.2` in comments. + Replace them with a sentence that makes sense to a new contributor. +- Doc comments (`///` and `//!`) must be written in full sentences and end with + a period. + +### No `unsafe` + +`unsafe_code = "forbid"` is set at the workspace level. PRs that add `unsafe` +will not be merged. + +### Error handling + +All subsystems return `crate::Result` (alias for `Result`). +Avoid `.unwrap()` and `.expect()` in non-test code; use `?` propagation and +match on `AppError` variants at call sites that need to handle specific cases. + +--- + +## Testing + +```sh +# Unit tests only +cargo test --lib + +# All tests (unit + integration) +cargo test --all + +# A specific test by name +cargo test percent_decode + +# Security audit +cargo audit + +# Dependency policy check +cargo deny check +``` + +Integration tests live in `tests/`. They import items re-exported from +`src/lib.rs` under `#[cfg(test)]` guards so they do not pollute the public API. 
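
The error-handling convention from the Code Standards section (return `crate::Result`, propagate with `?`, match on `AppError` variants only where a call site needs to handle a specific case) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the changelog says the real enum is derived with `thiserror`, whereas this sketch writes the trait impls by hand so that it depends only on the standard library. Only a subset of the variant names is shown, and the payload types plus the helpers `parse_port`, `read_site_file`, and `port_or_default` are hypothetical.

```rust
use std::fmt;
use std::io;
use std::path::Path;

/// Illustrative subset of the `AppError` variants named in the changelog.
/// The payload types are assumptions made for this sketch.
#[derive(Debug)]
pub enum AppError {
    ConfigValidation(String),
    ServerBind { port: u16, source: io::Error },
    Io(io::Error),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ConfigValidation(msg) => write!(f, "invalid configuration: {msg}"),
            Self::ServerBind { port, source } => write!(f, "cannot bind port {port}: {source}"),
            Self::Io(err) => write!(f, "I/O error: {err}"),
        }
    }
}

impl std::error::Error for AppError {
    // Expose the underlying cause so structured context is preserved.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::ConfigValidation(_) => None,
            Self::ServerBind { source, .. } => Some(source),
            Self::Io(err) => Some(err),
        }
    }
}

// Lets `?` convert a plain `io::Error` into `AppError::Io`.
impl From<io::Error> for AppError {
    fn from(err: io::Error) -> Self {
        Self::Io(err)
    }
}

/// Crate-wide alias, mirroring the `crate::Result` mentioned above.
pub type Result<T> = std::result::Result<T, AppError>;

/// Hypothetical helper: rejects port 0 and anything that is not a `u16`.
pub fn parse_port(raw: &str) -> Result<u16> {
    match raw.trim().parse::<u16>() {
        Ok(0) | Err(_) => Err(AppError::ConfigValidation(format!(
            "port must be 1-65535, got {raw:?}"
        ))),
        Ok(port) => Ok(port),
    }
}

/// Hypothetical helper: `?` propagates the I/O failure instead of unwrapping.
pub fn read_site_file(path: &Path) -> Result<Vec<u8>> {
    let bytes = std::fs::read(path)?;
    Ok(bytes)
}

/// A call site that handles exactly one variant and propagates the rest.
pub fn port_or_default(raw: &str, default: u16) -> Result<u16> {
    match parse_port(raw) {
        Err(AppError::ConfigValidation(_)) => Ok(default),
        other => other,
    }
}
```

A caller that only needs to fall back on a bad port value matches on `ConfigValidation` and lets every other variant bubble up unchanged, which keeps `.unwrap()` and `.expect()` out of non-test code.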
+ +--- + +## Submitting a Pull Request + +1. **Branch naming**: `fix/` or `feat/`. +2. **Commit messages**: use the imperative mood (`Add`, `Fix`, `Remove`), ≤72 + characters on the subject line. Add a body paragraph for anything that + needs explaining. +3. **One concern per PR**: a PR that mixes a bug fix with a refactor is harder + to review and revert. +4. **Changelog**: add a line under `[Unreleased]` in `CHANGELOG.md` before + opening the PR. +5. **CI must be green**: all three CI jobs (`test`, `audit`, `deny`) must pass. + The `test` job runs on Ubuntu, macOS, and Windows. + +--- + +## Architecture Overview + +``` +rusthost-cli (src/main.rs) + └── runtime::lifecycle::run() + ├── logging — file logger + in-memory ring buffer for the console + ├── server — hyper HTTP/1.1 accept loop + per-connection handler + ├── tor — Arti in-process Tor client + onion service proxy + ├── console — crossterm TUI (render task + input task) + └── config — TOML loader + typed structs +``` + +Key data flows: + +- **Request path**: `TcpListener::accept` → `server::handler::handle` → + `resolve_path` → file I/O → hyper response. +- **Tor path**: `tor::init` → Arti bootstrap → `StreamRequest` loop → + `proxy_stream` → local `TcpStream` → bidirectional copy. +- **Shared state**: `SharedState` (an `Arc>`) is the single + source of truth for the dashboard. Write only from the lifecycle/event tasks; + read from the render task. 
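
The shared-state rule above (write only from the lifecycle/event tasks, read from the render task) might look something like the sketch below. Several things here are assumptions: the inner type of `SharedState` is not visible in the text, so `AppState` and its fields are hypothetical, `std::sync::RwLock` stands in for whichever lock the project actually uses, and the `Metrics` struct is only a guess at how the atomics-backed request and error counters mentioned in the changelog could be shaped.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// Hypothetical dashboard state; the real struct and its fields may differ.
#[derive(Debug, Default, Clone)]
pub struct AppState {
    pub bound_port: Option<u16>,
    pub onion_address: Option<String>,
}

/// Assumed shape of the shared handle: one lock around the whole state.
pub type SharedState = Arc<RwLock<AppState>>;

/// Hot-path counters kept outside the lock so request handling never blocks on it.
#[derive(Debug, Default)]
pub struct Metrics {
    requests: AtomicU64,
    errors: AtomicU64,
}

impl Metrics {
    pub fn record_request(&self) {
        self.requests.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_error(&self) {
        self.errors.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns `(requests, errors)` as seen at the time of the call.
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.requests.load(Ordering::Relaxed),
            self.errors.load(Ordering::Relaxed),
        )
    }
}

/// Lifecycle side: the only place that takes the write lock in this sketch.
pub fn publish_port(state: &SharedState, port: u16) {
    // Recover the guard even if another thread panicked while holding the lock.
    let mut guard = state.write().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.bound_port = Some(port);
}

/// Render side: clone a snapshot under the read lock, then format without holding it.
pub fn render_status(state: &SharedState, metrics: &Metrics) -> String {
    let snapshot: AppState = {
        let guard = state.read().unwrap_or_else(|poisoned| poisoned.into_inner());
        (*guard).clone()
    };
    let (requests, errors) = metrics.snapshot();
    let server = snapshot
        .bound_port
        .map_or_else(|| "STARTING".to_string(), |p| format!("RUNNING on port {p}"));
    format!("{server} | requests: {requests} | errors: {errors}")
}
```

Cloning the snapshot before formatting keeps the read lock held only briefly, so a slow terminal write in the render task would not delay a writer in the lifecycle task.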
+ +--- + +## Issue Labels + +| Label | Meaning | +|-------|---------| +| `bug` | Confirmed defect | +| `security` | Security-relevant issue — see `SECURITY.md` for disclosure policy | +| `enhancement` | New feature or improvement | +| `good first issue` | Well-scoped, low-risk; suitable for new contributors | +| `help wanted` | We'd appreciate community input | +| `needs-repro` | Cannot reproduce; awaiting steps | diff --git a/Cargo.lock b/Cargo.lock index 6489f35..3454be7 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -2512,15 +2512,6 @@ version = "0.2.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "7c87def4c32ab89d880effc9e097653c8da5d6ef28e6b539d313baaacfbafcbe" -[[package]] -name = "openssl-src" -version = "300.5.5+3.5.5" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3f1787d533e03597a7934fd0a765f0d28e94ecc5fb7789f8053b1e699a56f709" -dependencies = [ - "cc", -] - [[package]] name = "openssl-sys" version = "0.9.112" @@ -2529,7 +2520,6 @@ checksum = "57d55af3b3e226502be1526dfdba67ab0e9c96fc293004e79576b2b9edb0dbdb" dependencies = [ "cc", "libc", - "openssl-src", "pkg-config", "vcpkg", ] @@ -3208,7 +3198,7 @@ dependencies = [ "hyper-util", "libc", "log", - "openssl", + "percent-encoding", "rusqlite", "serde", "sha3", diff --git a/Cargo.toml b/Cargo.toml index 8024411..974c2ab 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -66,15 +66,6 @@ chrono = { version = "0.4", features = ["clock"] } # OS error codes used in the accept-loop backoff to distinguish EMFILE/ENFILE # (resource exhaustion → log error) from transient errors (log debug). libc = "0.2" -# Vendor OpenSSL source so the binary builds without system libssl-dev headers -# on Linux. native-tls (pulled transitively through arti-client → tor-rtcompat) -# links against OpenSSL on Linux; without this feature flag the build fails on -# any machine that lacks the -dev package. 
macOS and Windows are unaffected -# (they use Security.framework and SChannel respectively), but the `vendored` -# feature is a no-op on those targets so there is no downside to enabling it -# unconditionally. Build-time cost is ~60 s on first compile; subsequent -# incremental builds are fast because the OpenSSL objects are cached. -openssl = { version = "0.10", features = ["vendored"] } # Force rusqlite's bundled SQLite for cross-compilation targets. # arti-client pulls rusqlite transitively; declaring it here unifies the feature # across the whole dep tree so cross-compiling to Linux/Windows works without a @@ -85,6 +76,11 @@ rusqlite = { version = "*", features = ["bundled"] } # the single global Mutex that would serialise every accept() call. dashmap = "6" +# Phase 5 (M-8) — replace hand-rolled percent_decode with the audited upstream crate. +# The crate handles incomplete escape sequences and non-ASCII bytes correctly; +# the wrapper adds only the null-byte guard specific to filesystem path use. +percent-encoding = "2" + # Phase 3 (C-1, H-8, H-9, H-13) — HTTP/1.1 keep-alive, ETag, Range, compression. # hyper provides a correct HTTP/1.1 connection loop with keep-alive; replacing # the hand-rolled single-shot parser eliminates the 30-45 s Tor page-load diff --git a/README.md b/README.md index ca0aca3..02db506 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,8 @@
-**A self-contained static file server with first-class Tor onion service support — no binaries, no `torrc`, no compromise.** +**A single-binary static file server with built-in Tor onion service support.** +No daemons. No config files outside this project. No compromise. [![Rust](https://img.shields.io/badge/rust-1.86%2B-orange?style=flat-square&logo=rust)](https://www.rust-lang.org/) [![License: MIT](https://img.shields.io/badge/license-MIT-blue?style=flat-square)](LICENSE) @@ -23,13 +24,15 @@ ## What is RustHost? -RustHost is a single-binary static file server that brings your content to the clearnet **and** the Tor network simultaneously — with zero external dependencies. Tor is embedded directly into the process via [Arti](https://gitlab.torproject.org/tpo/core/arti), the official Rust Tor implementation. No `tor` daemon, no `torrc`, no system configuration required. +RustHost is a static file server — you give it a folder of HTML, CSS, and JavaScript files, and it serves them over HTTP. What makes it different is that it also puts your site on the **Tor network** automatically, giving every site a `.onion` address right alongside the normal `localhost` one. -Drop the binary next to your site files, run it once, and you get: +It's a single binary with Tor baked in. No installing a separate Tor program, no editing system config files. -- A local HTTP server ready for immediate use -- A stable `.onion` v3 address that survives restarts -- A live terminal dashboard showing you everything at a glance +**Who is it for?** Developers who want a quick local server with privacy features, self-hosters who want their sites reachable over Tor, and anyone who wants to run a personal site without touching system-level config. 
+ +--- + +## What it looks like ``` ┌─ RustHost ─────────────────────────────────────────────────────────┐ @@ -47,185 +50,251 @@ Drop the binary next to your site files, run it once, and you get: └──────────────────────────────────────────────────────────────────────┘ ``` -![rystgit](https://github.com/user-attachments/assets/30752d0f-5be2-4c80-b3a2-4fa0530ff3ab) +![rusthost screenshot](https://github.com/user-attachments/assets/30752d0f-5be2-4c80-b3a2-4fa0530ff3ab) --- -## Features - -### 🌐 HTTP Server -- Built directly on `tokio::net::TcpListener` — no HTTP framework dependency -- Handles `GET` and `HEAD` requests; concurrent connections via per-task Tokio workers -- **Buffered request reading** via `tokio::io::BufReader` — headers read line-by-line, not byte-by-byte -- **File streaming** via `tokio::io::copy` — memory per connection is bounded by the socket buffer (~256 KB) regardless of file size -- **30-second request timeout** (configurable via `request_timeout_secs`); slow or idle connections receive `408 Request Timeout` -- **Semaphore-based connection limit** (configurable via `max_connections`, default 256) — excess connections queue at the OS backlog level rather than spawning unbounded tasks -- Percent-decoded URL paths with correct multi-byte UTF-8 handling; null bytes (`%00`) are never decoded -- Query string & fragment stripping before path resolution -- **Path traversal protection** — every path verified as a descendant of the site root via `canonicalize` (called once at startup, not per request); escapes rejected with `403 Forbidden` -- Configurable index file, optional HTML directory listing with fully HTML-escaped and URL-encoded filenames, and a built-in fallback page -- Automatic port selection if the configured port is busy (up to 10 attempts) -- Request header cap at 8 KiB; `Content-Type`, `Content-Length`, and `Connection: close` on every response -- **Security headers on every response**: `X-Content-Type-Options`, `X-Frame-Options`, 
`Referrer-Policy: no-referrer`, `Permissions-Policy`; configurable `Content-Security-Policy` on HTML responses -- **HEAD responses** include correct `Content-Length` but no body, as required by RFC 7231 §4.3.2 -- Accept loop uses **exponential backoff** on errors and distinguishes `EMFILE` (operator-level error) from transient errors (`ECONNRESET`, `ECONNABORTED`) - -### 🧅 Tor Onion Service *(fully working)* -- Embedded via [Arti](https://gitlab.torproject.org/tpo/core/arti) — the official Rust Tor client — in-process, no external daemon -- Bootstraps to the Tor network in the background; never blocks your server or dashboard -- **Stable address**: the v3 service keypair is persisted to `rusthost-data/arti_state/`. Delete the directory to rotate to a new address -- First run fetches ~2 MB of directory data (~30 s); subsequent starts reuse the cache and are up in seconds -- Onion address computed fully in-process using the v3 spec (SHA3-256 + base32) -- Each inbound Tor connection is bridged to the local HTTP listener via `tokio::io::copy_bidirectional` -- **Port synchronised via `oneshot` channel** — the Tor subsystem always receives the actual bound port, eliminating a race condition that could cause silent connection failures -- **`TorStatus` reflects mid-session failures** — if the onion service stream terminates unexpectedly, the dashboard transitions to `FAILED (reason)` and clears the displayed `.onion` address -- Participates in **graceful shutdown** — the run loop watches the shutdown signal via `tokio::select!` and exits cleanly -- Can be disabled entirely with `[tor] enabled = false` - -### 🖥️ Interactive Terminal Dashboard -- Full-screen raw-mode TUI built with [crossterm](https://github.com/crossterm-rs/crossterm) — no TUI framework -- Three screens, all keyboard-navigable: - - | Key | Screen | - |-----|--------| - | *(default)* | **Dashboard** — live status, endpoints, site stats, request/error counters | - | `L` | **Log view** — last 40 log lines with 
optional timestamps | - | `H` | **Help overlay** — key binding reference | - | `R` | Reload site file count & size without restart | - | `Q` | Graceful shutdown | - -- **Skip-on-idle rendering** — the terminal is only written when the rendered output changes, eliminating unnecessary writes on quiet servers -- `TorStatus::Failed` displays a human-readable reason string (e.g. `FAILED (stream ended)`) rather than a bare error indicator -- Keyboard input task failure is detected and reported; the process remains killable via Ctrl-C -- **Terminal fully restored on all exit paths** — panic hook and error handler both call `console::cleanup()` before exiting, ensuring `LeaveAlternateScreen`, `cursor::Show`, and `disable_raw_mode` always run -- Configurable refresh rate (default 500 ms); headless mode available for `systemd` / piped deployments - -### ⚙️ Configuration -- TOML file at `rusthost-data/settings.toml`, auto-generated with inline comments on first run -- Six sections: `[server]`, `[site]`, `[tor]`, `[logging]`, `[console]`, `[identity]` -- **`#[serde(deny_unknown_fields)]`** on all structs — typos in key names are rejected at startup with a clear error -- **Typed config fields** — `bind` is `IpAddr`, `log level` is a `LogLevel` enum; invalid values are caught at deserialisation time -- Startup validation with clear, multi-error messages — nothing starts until config is clean -- Config and data directory paths overridable via **`--config `** and **`--data-dir `** CLI flags - -### 📝 Logging -- Custom `log::Log` implementation; dual output — append-mode log file + in-memory ring buffer (1 000 lines) -- Ring buffer feeds the dashboard log view with zero file I/O per render tick -- **Dependency log filtering** — Arti and Tokio internals at `Info` and below are suppressed by default, keeping the log focused on application events (configurable via `filter_dependencies`) -- Log file explicitly flushed on graceful shutdown -- Configurable level (`trace` → `error`) and 
optional full disable for minimal-overhead deployments - -### 🧪 Testing & CI -- Unit tests for all security-critical functions: `percent_decode`, `resolve_path`, `validate`, `strip_timestamp`, `hsid_to_onion_address` -- Integration tests (`tests/http_integration.rs`) covering all HTTP core flows via raw `TcpStream` -- `cargo deny check` runs in CI, enforcing the SPDX license allowlist and advisory database; `audit.toml` consolidated into `deny.toml` +## Key Features + +- **Static file server** — serves HTML, CSS, JS, images, fonts, audio, and video with correct MIME types +- **Built-in Tor support** — your site gets a stable `.onion` address automatically, no external Tor install needed +- **Live terminal dashboard** — shows your endpoints, request counts, and logs in a clean full-screen UI +- **Single binary** — no installer, no runtime dependencies, no system packages to manage +- **SPA-friendly** — supports React, Vue, and Svelte client-side routing with a fallback-to-`index.html` option +- **HTTP protocol done right** — keep-alive, `ETag`/conditional GET, range requests, Brotli/Gzip compression +- **Security headers out of the box** — CSP, HSTS, `X-Content-Type-Options`, `Referrer-Policy`, and more on every response +- **Rate limiting per IP** — lock-free connection cap prevents a single client from taking down your server +- **Per-IP connection limits**, request timeouts, path traversal protection, and header injection prevention +- **Hot reload** — press `[R]` to refresh site stats without restarting +- **Headless mode** — run it in the background under systemd without the TUI + +--- + +## Why Arti instead of the regular Tor? + +When most people think of Tor, they think of the `tor` binary — a program written in C that you install separately and talk to via a config file called `torrc`. That works fine, but it means your application depends on an external process you don't control. 
+ +**Arti** is the [official Tor Project rewrite of Tor in Rust](https://gitlab.torproject.org/tpo/core/arti). RustHost uses it as a library — Tor runs *inside* the same process as your server, with no external daemon. + +Here's a plain-English comparison: + +| | Classic `tor` binary | Arti (what RustHost uses) | +|---|---|---| +| Language | C | Rust | +| Memory safety | Manual (prone to CVEs) | Guaranteed by the compiler | +| Distribution | Separate install required | Compiled into the binary | +| Config | `torrc` file, separate process | Code-level API, no config file | +| Maturity | 20+ years, battle-tested | Newer, actively developed | +| Embeddability | Hard — subprocess + socket | Easy — just a library call | + +**Honest tradeoffs:** Arti is still maturing. Some advanced Tor features (bridges, pluggable transports) are not yet stable in Arti. If you need those, the classic `tor` binary is the right tool. For straightforward onion hosting, Arti works well and gives you a much simpler setup. + +The Rust memory-safety guarantee matters here specifically because Tor handles untrusted network traffic. A buffer overflow or use-after-free in a C-based Tor implementation is a real historical risk. With Arti in Rust, that entire class of bug is eliminated by the language. --- ## Quick Start -### 1. Build +> **Need help with prerequisites?** See [SETUP.md](SETUP.md) for step-by-step install instructions. ```bash +# 1. Clone and build git clone https://github.com/yourname/rusthost cd rusthost cargo build --release + +# 2. First run — sets up the data directory and exits +./target/release/rusthost + +# 3. Put your files in rusthost-data/site/, then run again +./target/release/rusthost ``` -> **Minimum Rust version: 1.86** (required by `arti-client 0.40`) +That's it. Your site is live at `http://localhost:8080`. The `.onion` address appears in the dashboard after about 30 seconds while Tor bootstraps in the background. -### 2. 
First run — initialise your data directory +> **Your stable `.onion` address** is stored in `rusthost-data/arti_state/`. Back this directory up — it contains your keypair. Delete it only if you want a new address. + +--- + +## Full Setup Reference + +For detailed install instructions, OS-specific steps, common errors, and how to verify everything is working, see **[SETUP.md](SETUP.md)**. + +--- + +## Usage Examples + +### Serve a specific directory without a config file ```bash -./target/release/rusthost +./target/release/rusthost --serve ./my-website ``` -On first run, RustHost detects that `rusthost-data/settings.toml` is missing, scaffolds the data directory, writes a default config and a placeholder `index.html`, prints a getting-started guide, and exits. Nothing is daemonised yet. +Good for quick one-off serving. Skips first-run setup entirely. +### Run with a custom config location + +```bash +./target/release/rusthost --config /etc/rusthost/settings.toml --data-dir /var/rusthost ``` -rusthost-data/ -├── settings.toml ← your config (edit freely) -├── site/ -│ └── index.html ← placeholder, replace with your files -├── logs/ -│ └── rusthost.log -├── arti_cache/ ← Tor directory consensus (auto-managed) -└── arti_state/ ← your stable .onion keypair (back this up!) + +Useful for running multiple instances or deploying under systemd. + +### Run headless (no terminal UI) + +Set `interactive = false` in `settings.toml`: + +```toml +[console] +interactive = false ``` -### 3. Serve +RustHost will print the URL to stdout and log everything to the log file. Perfect for running as a background service. -```bash -./target/release/rusthost +### Disable Tor entirely + +```toml +[tor] +enabled = false ``` -The dashboard appears. Your site is live on `http://localhost:8080`. Tor bootstraps in the background — your `.onion` address appears in the **Endpoints** panel once ready (~30 s on first run). 
+Useful if you just want a fast local HTTP server and don't need the `.onion` address. -### CLI flags +### Enable SPA routing (React, Vue, Svelte) + +```toml +[site] +spa_routing = true +``` + +Unknown paths fall back to `index.html` instead of returning 404. This is what client-side routers expect. + +--- + +## All CLI Flags ``` rusthost [OPTIONS] Options: + --serve Serve a directory directly, no settings.toml needed --config Path to settings.toml (default: rusthost-data/settings.toml) - --data-dir Path to data directory (default: rusthost-data/ next to binary) + --data-dir Path to the data directory (default: ./rusthost-data/) --version Print version and exit --help Print this help and exit ``` --- -## Configuration Reference +## Configuration + +The config file lives at `rusthost-data/settings.toml` and is created automatically on first run with comments explaining every option. ```toml [server] -port = 8080 -bind = "127.0.0.1" # set "0.0.0.0" to expose on LAN (logs a warning) -index_file = "index.html" -directory_listing = false -auto_port_fallback = true -max_connections = 256 # semaphore cap on concurrent connections -request_timeout_secs = 30 # seconds before idle connection receives 408 -content_security_policy = "default-src 'self'" # applied to HTML responses only +port = 8080 +bind = "127.0.0.1" # use "0.0.0.0" to expose on your LAN +index_file = "index.html" +directory_listing = false # show file lists for directories +auto_port_fallback = true # try next port if 8080 is taken +max_connections = 256 # max simultaneous connections +request_timeout_secs = 30 # seconds before an idle connection gets 408 +content_security_policy = "default-src 'self'" # applied to HTML responses only [site] -root = "rusthost-data/site" +root = "rusthost-data/site" +spa_routing = false # set true for React/Vue/Svelte apps +error_404 = "" # path to a custom 404.html +error_503 = "" # path to a custom 503.html [tor] -enabled = true # set false to skip Tor entirely +enabled = 
true # set false to skip Tor entirely [logging] -enabled = true -level = "info" # trace | debug | info | warn | error -path = "logs/rusthost.log" -filter_dependencies = true # suppress Arti/Tokio noise at info and below +enabled = true +level = "info" # trace | debug | info | warn | error +path = "logs/rusthost.log" +filter_dependencies = true # suppress Arti/Tokio noise at info level [console] -interactive = true # false for systemd / piped deployments -refresh_ms = 500 # minimum 100 +interactive = true # false for systemd / background use +refresh_ms = 500 show_timestamps = false open_browser_on_start = false [identity] -name = "RustHost" # 1–32 chars, shown in dashboard header +name = "RustHost" # shown in the dashboard header (max 32 chars) +``` + +> Typos in key names are caught at startup. If you write `bund = "127.0.0.1"` instead of `bind`, RustHost will tell you exactly which field is unknown and exit before starting. + +--- + +## Project Structure + +After first run, your directory will look like this: + +``` +rusthost-data/ +├── settings.toml Your config file — edit this freely +├── site/ Drop your website files here +│ └── index.html Placeholder — replace with your own +├── logs/ +│ └── rusthost.log Rotating access and event log (owner-read only) +├── arti_cache/ Tor directory data — auto-managed, safe to delete +└── arti_state/ Your .onion keypair — BACK THIS UP +``` + +And in the repo: + +``` +src/ +├── config/ Config loading and validation +├── console/ Terminal dashboard (crossterm) +├── logging/ Log file + in-memory ring buffer +├── runtime/ Startup, shutdown, and event loop +├── server/ HTTP server (handler, MIME types, path resolution) +└── tor/ Arti integration and onion service bridge ``` --- ## Built-in MIME Types -No external dependency. RustHost ships with a handwritten extension map covering: +RustHost ships a handwritten MIME map — no external lookup or database. 
| Category | Extensions | -|----------|-----------| +|----------|------------| | Text | `html` `htm` `css` `js` `mjs` `txt` `csv` `xml` `md` | -| Data | `json` `jsonld` `pdf` `wasm` `zip` | -| Images | `png` `jpg/jpeg` `gif` `webp` `svg` `ico` `bmp` `avif` | +| Data | `json` `jsonld` `pdf` `wasm` `zip` `ndjson` | +| Images | `png` `jpg` `jpeg` `gif` `webp` `svg` `ico` `bmp` `avif` | | Fonts | `woff` `woff2` `ttf` `otf` | -| Audio | `mp3` `ogg` `wav` | +| Audio | `mp3` `ogg` `wav` `opus` `flac` | | Video | `mp4` `webm` | +| 3D | `glb` | +| PWA | `webmanifest` | + +Anything not in this list gets `application/octet-stream`. -Unknown extensions fall back to `application/octet-stream`. +--- + +## Security + +A quick summary of what RustHost does to keep things safe: + +| Threat | What RustHost does | +|--------|-------------------| +| Path traversal (e.g. `/../etc/passwd`) | Every path is resolved with `canonicalize` and checked against the site root. Escapes get a `403`. | +| XSS via crafted filenames in directory listings | Filenames are HTML-escaped in link text and percent-encoded in `href` attributes. | +| Slow-loris DoS (deliberately slow clients) | 30-second request timeout — connections that don't send headers in time get a `408`. | +| Connection exhaustion | Semaphore cap at 256 concurrent connections by default. | +| Header injection | `sanitize_header_value` strips all control characters from values (not just CR/LF). | +| Large file memory exhaustion | Files are streamed with `tokio::io::copy` — memory per connection is bounded by the socket buffer. | +| `.onion` address leakage | `Referrer-Policy: no-referrer` prevents your `.onion` URL from appearing in `Referer` headers. | +| Config typos silently using defaults | `#[serde(deny_unknown_fields)]` on all config structs — unknown keys are a hard startup error. | +| Terminal injection via instance name | The `name` field is validated against all control characters at startup. 
| + +**Note on RUSTSEC-2023-0071 (RSA Marvin timing attack):** This advisory is acknowledged and suppressed in `deny.toml` with a documented rationale. The `rsa` crate comes in as a transitive dependency of `arti-client` and is used only for *verifying* RSA signatures on Tor directory documents — not for decryption. The Marvin attack requires a decryption oracle, which is not present here. --- @@ -246,35 +315,21 @@ Unknown extensions fall back to `application/octet-stream`. └─────────────────────────────────────┘ ``` -All subsystems share state through `Arc<RwLock<AppState>>`. Hot-path request and error counters use a separate `Arc` backed by atomics — the HTTP handler **never acquires a lock per request**. - -The HTTP server and Tor subsystem share a `tokio::sync::Semaphore` that caps concurrent connections. The bound port is communicated to Tor via a `oneshot` channel before the accept loop begins, eliminating the startup race condition present in earlier versions. +All subsystems share state through `Arc<RwLock<AppState>>`. Hot-path counters (request counts, error counts) live in a separate `Arc` backed by atomics, so the HTTP handler never acquires a lock per request. -Shutdown is coordinated via a `watch` channel: `[Q]`, `SIGINT`, or `SIGTERM` signals all subsystems simultaneously. In-flight HTTP connections are tracked in a `JoinSet` and given up to 5 seconds to complete. The log file is explicitly flushed before the process exits. +Shutdown is coordinated via a `watch` channel. `[Q]`, `SIGINT`, and `SIGTERM` all signal every subsystem at the same time. In-flight connections are tracked in a `JoinSet` and given up to 5 seconds to finish before the process exits. --- -## Security +## Contributing + +Contributions are welcome. 
A few things worth knowing before you start: -| Concern | Mitigation | -|---------|-----------| -| Path traversal (requests) | `std::fs::canonicalize` + descendant check per request; `403` on escape | -| Path traversal (config) | `site.directory` and `logging.file` validated against `..`, absolute paths, and path separators at startup | -| Directory listing XSS | Filenames HTML-entity-escaped in link text; percent-encoded in `href` attributes | -| Header overflow | 8 KiB hard cap; oversized requests rejected immediately | -| Slow-loris DoS | 30-second request timeout; `408` sent on expiry | -| Connection exhaustion | Semaphore cap (default 256); excess connections queue at OS level | -| Memory exhaustion (large files) | Files streamed via `tokio::io::copy`; per-connection memory bounded by socket buffer | -| Bind exposure | Defaults to loopback (`127.0.0.1`); warns loudly on `0.0.0.0` | -| ANSI/terminal injection | `instance_name` validated against all control characters (`is_control`) at startup | -| Security response headers | `X-Content-Type-Options`, `X-Frame-Options`, `Referrer-Policy: no-referrer`, `Permissions-Policy`, configurable `Content-Security-Policy` | -| `.onion` URL leakage | `Referrer-Policy: no-referrer` prevents the `.onion` address from appearing in `Referer` headers sent to third-party resources | -| Tor port race | Bound port delivered to Tor via `oneshot` channel before accept loop starts | -| Silent Tor failure | `TorStatus` transitions to `Failed(reason)` and onion address is cleared when the service stream ends | -| Percent-decode correctness | Multi-byte UTF-8 sequences decoded correctly; null bytes (`%00`) never decoded | -| Config typos | `#[serde(deny_unknown_fields)]` on all structs | -| License compliance | `cargo-deny` enforces SPDX allowlist at CI time | -| [RUSTSEC-2023-0071](https://rustsec.org/advisories/RUSTSEC-2023-0071) | Suppressed with rationale in `deny.toml`: the `rsa` crate is a transitive dep of `arti-client` used 
**only** for signature *verification* on Tor directory documents — the Marvin timing attack's threat model (decryption oracle) does not apply | +- The lint gates are strict: `clippy::all`, `clippy::pedantic`, and `clippy::nursery`. Run `cargo clippy --all-targets -- -D warnings` before opening a PR. +- Run the full test suite with `cargo test --all`. +- All code paths should be covered by the existing tests, or new tests added for anything new. +- See [CONTRIBUTING.md](CONTRIBUTING.md) for the full workflow, architecture notes, and PR checklist. +- To report a security issue privately, see [SECURITY.md](SECURITY.md). --- diff --git a/SETUP.md b/SETUP.md new file mode 100644 index 0000000..2c04f05 --- /dev/null +++ b/SETUP.md @@ -0,0 +1,354 @@ +# Setting Up RustHost + +This guide walks you through everything you need to get RustHost running — from installing Rust to verifying your `.onion` address is live. + +--- + +## Prerequisites + +### Rust + +RustHost requires **Rust 1.86 or newer**. This is set as the minimum because the Tor library it uses (`arti-client`) needs features from that release. + +To check what version you have: + +```bash +rustc --version +``` + +If you don't have Rust installed, the easiest way is [rustup](https://rustup.rs/): + +```bash +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +``` + +Follow the prompts, then restart your terminal (or run `source ~/.cargo/env`). Verify with: + +```bash +rustc --version +cargo --version +``` + +To update an existing Rust install: + +```bash +rustup update stable +``` + +### Git + +You need Git to clone the repo. Most systems already have it. + +```bash +git --version +``` + +If not: +- **macOS**: `xcode-select --install` (installs Git as part of the Xcode CLI tools) +- **Linux**: `sudo apt install git` (Debian/Ubuntu) or `sudo dnf install git` (Fedora) +- **Windows**: Download from [git-scm.com](https://git-scm.com/) + +### Build tools + +Rust needs a C linker. 
On most systems this is already present. + +- **macOS**: You'll need the Xcode Command Line Tools — run `xcode-select --install` if you haven't already. +- **Linux**: Install `gcc` and `build-essential` (Debian/Ubuntu) or `gcc` and `make` (Fedora/RHEL). +- **Windows**: Install the [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/). When the installer asks, select "Desktop development with C++". + +--- + +## Installing RustHost + +### Step 1 — Clone the repository + +```bash +git clone https://github.com/yourname/rusthost +cd rusthost +``` + +### Step 2 — Build in release mode + +```bash +cargo build --release +``` + +This downloads and compiles all dependencies (including Arti, which is the Rust Tor library — this takes a few minutes on first build). The final binary ends up at: + +``` +target/release/rusthost (Linux / macOS) +target\release\rusthost.exe (Windows) +``` + +> **Slow build?** The first build is always slow because Cargo is compiling everything from scratch. Subsequent builds are much faster thanks to the cache. + +### Step 3 — First run (data directory setup) + +Run the binary once from the project directory: + +```bash +./target/release/rusthost +``` + +On first run, RustHost detects that `rusthost-data/settings.toml` doesn't exist and does the following: + +- Creates the `rusthost-data/` directory next to the binary +- Writes a default `settings.toml` with all options commented +- Creates `rusthost-data/site/` with a placeholder `index.html` +- Creates `rusthost-data/logs/` +- Prints a getting-started message and exits + +Nothing is started yet — this is just setup. + +### Step 4 — Add your site files + +Replace (or edit) the placeholder file: + +```bash +# Put your HTML files in rusthost-data/site/ +cp -r /path/to/your/site/* rusthost-data/site/ +``` + +### Step 5 — Start the server + +```bash +./target/release/rusthost +``` + +The terminal dashboard appears. Your site is live at `http://localhost:8080`. 
+ +Tor bootstraps in the background — your `.onion` address will appear in the **Endpoints** section of the dashboard after roughly 30 seconds on first run (subsequent starts reuse the cache and are much faster). + +--- + +## OS-Specific Notes + +### macOS + +Everything works out of the box. If you see a firewall prompt asking whether to allow RustHost to accept incoming connections, click Allow. + +If you want to expose your server on your local network (not just `localhost`), change the bind address in `settings.toml`: + +```toml +[server] +bind = "0.0.0.0" +``` + +RustHost will log a warning when you do this — that's expected and intentional. + +### Linux + +Works the same as macOS. If you're running under systemd, see the [Running as a systemd service](#running-as-a-systemd-service) section below. + +On some minimal Linux installs you may need to install the OpenSSL development headers: + +```bash +# Debian/Ubuntu +sudo apt install pkg-config libssl-dev + +# Fedora +sudo dnf install pkg-config openssl-devel +``` + +### Windows + +Build and run commands are the same, but use backslashes and the `.exe` extension: + +```powershell +cargo build --release +.\target\release\rusthost.exe +``` + +Note that file permissions (e.g., restricting the log file to owner-only) behave differently on Windows. The security restrictions around key directories and log files are enforced where the Windows API supports it. + +--- + +## Running as a systemd service + +If you want RustHost to start automatically on boot, here's a simple service unit. 
+ +First, move your binary and data directory somewhere stable: + +```bash +sudo cp target/release/rusthost /usr/local/bin/rusthost +sudo mkdir -p /var/rusthost +sudo cp -r rusthost-data/* /var/rusthost/ +``` + +Set `interactive = false` in `/var/rusthost/settings.toml` so RustHost doesn't try to draw a TUI: + +```toml +[console] +interactive = false +``` + +Create the service file: + +```bash +sudo nano /etc/systemd/system/rusthost.service +``` + +```ini +[Unit] +Description=RustHost static file server +After=network.target + +[Service] +Type=simple +User=www-data +ExecStart=/usr/local/bin/rusthost --config /var/rusthost/settings.toml --data-dir /var/rusthost +Restart=on-failure +RestartSec=5s + +[Install] +WantedBy=multi-user.target +``` + +Enable and start it: + +```bash +sudo systemctl daemon-reload +sudo systemctl enable rusthost +sudo systemctl start rusthost +sudo systemctl status rusthost +``` + +View logs: + +```bash +journalctl -u rusthost -f +``` + +--- + +## Verifying Everything Works + +### 1. Check the HTTP server + +Open a browser and go to `http://localhost:8080`. You should see your site (or the placeholder page on a fresh install). + +From the terminal: + +```bash +curl -I http://localhost:8080 +``` + +You should see a `200 OK` response with security headers like `X-Content-Type-Options` and `X-Frame-Options`. + +### 2. Check the Tor onion address + +Wait for the dashboard to show `TOR ● READY`. The `.onion` address will appear in the **Endpoints** section. + +Open the Tor Browser and navigate to that address. Your site should load. + +> **First run only:** Tor needs to download ~2 MB of directory data on first run. This usually takes 20–40 seconds. Subsequent starts reuse the cache and are ready in a few seconds. + +### 3. Check the logs + +Press `[L]` in the dashboard to switch to the log view. You should see startup messages and, once Tor is ready, a prominent banner with your `.onion` address. 
+ +The log file is at `rusthost-data/logs/rusthost.log`. + +--- + +## Common Errors and Fixes + +### `error: package 'arti-client v0.40.x' cannot be built because it requires rustc 1.86.0` + +Your Rust version is too old. Run `rustup update stable` and try again. + +### `Address already in use (os error 98)` + +Port 8080 is taken by something else. Either: +- Stop the other service, or +- Change the port in `settings.toml`: + +```toml +[server] +port = 9090 +``` + +Or enable auto port fallback (it's on by default): + +```toml +[server] +auto_port_fallback = true +``` + +### `error[E0463]: can't find crate for 'std'` (Windows) + +The Microsoft C++ Build Tools aren't installed or aren't on the path. Install them from [visualstudio.microsoft.com/visual-cpp-build-tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) and restart your terminal. + +### Tor gets stuck on "STARTING" forever + +This is usually a network issue. Check that: +- You have an internet connection +- Your firewall isn't blocking outbound connections on port 443 or 9001 (Tor's relay ports) +- You're not behind a strict corporate or school network that blocks Tor + +If you're on a network that blocks Tor, you may need [bridges](https://bridges.torproject.org/). Arti bridge support is still maturing — this is one area where using the classic `tor` binary is currently more reliable. + +### The terminal is messed up after RustHost crashes + +RustHost installs a panic hook that attempts to restore the terminal on crash. If it fails anyway, run: + +```bash +reset +``` + +Or close and reopen your terminal. + +### `Unknown field "bund"` (or similar) at startup + +You have a typo in `settings.toml`. RustHost rejects unknown config keys at startup. Check the spelling of the field name in the config — the error message will tell you exactly which field it doesn't recognise. 
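The unknown-key check described above can be pictured with a small std-only sketch. This is an illustration of the idea, not RustHost's code: the project relies on serde's `#[serde(deny_unknown_fields)]`, whereas the names `SERVER_KEYS`, `unknown_keys`, and `validate_section` below are hypothetical, the key list is an illustrative subset of `[server]`, and the error wording is invented.

```rust
/// Illustrative subset of keys accepted in a `[server]` section (assumption).
const SERVER_KEYS: &[&str] = &["port", "bind", "index_file", "directory_listing"];

/// Return every key in `found` that is not listed in `allowed`, in input order.
fn unknown_keys<'a>(found: &[&'a str], allowed: &[&str]) -> Vec<&'a str> {
    found
        .iter()
        .copied()
        .filter(|key| !allowed.contains(key))
        .collect()
}

/// Succeed if every key is recognised; otherwise fail with a message that
/// names the first unrecognised key and the section it appeared in.
fn validate_section(section: &str, found: &[&str], allowed: &[&str]) -> Result<(), String> {
    match unknown_keys(found, allowed).first() {
        Some(key) => Err(format!("unknown field `{key}` in [{section}]")),
        None => Ok(()),
    }
}
```

Calling `validate_section("server", &["bund"], SERVER_KEYS)` in this sketch yields an error naming `bund`, which mirrors the behaviour the troubleshooting entry describes: the typo is reported by name instead of silently falling back to a default.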
+ +### My `.onion` address changed + +If `rusthost-data/arti_state/` was deleted or moved, RustHost generates a new keypair and a new address. The state directory is what makes the address stable across restarts — back it up. + +--- + +## Backing Up Your `.onion` Keypair + +Your stable `.onion` address is tied to a keypair stored in: + +``` +rusthost-data/arti_state/ +``` + +**Back this directory up somewhere safe.** If you lose it, you lose your `.onion` address permanently and will get a new one on the next start. There is no recovery. + +To restore a backed-up keypair, copy the `arti_state/` directory back before starting RustHost. + +--- + +## Updating RustHost + +```bash +git pull +cargo build --release +``` + +Your `rusthost-data/` directory is not touched by the build — your config, site files, and keypair are safe. + +--- + +## Uninstalling + +Delete the binary and the `rusthost-data/` directory: + +```bash +rm target/release/rusthost +rm -rf rusthost-data/ +``` + +If you ran it as a systemd service: + +```bash +sudo systemctl stop rusthost +sudo systemctl disable rusthost +sudo rm /etc/systemd/system/rusthost.service +sudo rm /usr/local/bin/rusthost +sudo rm -rf /var/rusthost +sudo systemctl daemon-reload +``` diff --git a/audit.toml b/audit.toml index 5554569..bc5f6bd 100644 --- a/audit.toml +++ b/audit.toml @@ -1,6 +1,6 @@ # cargo-audit configuration for rusthost # -# fix G-3 — previously this file contained a bare `ignore` entry with no +# previously this file contained a bare `ignore` entry with no # rationale, creating a silent suppression that future developers could not # evaluate. Rationale is now documented here to match deny.toml. 
# diff --git a/rusthost_implementation_plan.md b/rusthost_implementation_plan.md deleted file mode 100644 index 6529424..0000000 --- a/rusthost_implementation_plan.md +++ /dev/null @@ -1,2224 +0,0 @@ -# RustHost — Severity-Categorised Issues & Multiphase Implementation Plan - -All code is written to pass `clippy::all`, `clippy::pedantic`, and `clippy::nursery`. -Lint gates are listed at the top of each snippet. - ---- - -## Severity Reference - -| Symbol | Severity | Meaning | -|--------|----------|---------| -| 🔴 | Critical | Functional breakage, data loss, or exploitable security flaw | -| 🟠 | High | Significant user-facing failure or attack surface | -| 🟡 | Medium | Quality, correctness, or completeness gap | -| 🔵 | Low | Polish, DX, or ecosystem concern | - ---- - -## Categorised Issue Registry - -### 🔴 Critical - -| ID | Location | Issue | -|----|----------|-------| -| C-1 | `server/handler.rs` | `Connection: close` on every response — Tor pages take 30–45 s to load | -| C-2 | `tor/mod.rs` | `copy_with_idle_timeout` is a wall-clock cap, not an idle timeout | -| C-3 | `tor/mod.rs` | `reference_onion` test is a tautology — no external test vector | -| C-4 | `server/handler.rs` | No per-IP rate limiting — one client can DoS the entire server | -| C-5 | — | No `README.md` — zero adoption possible | -| C-6 | `server/handler.rs` | No SPA fallback routing — React/Vue/Svelte apps silently 404 | -| C-7 | — | No TLS — clearnet deployments are plaintext | - -### 🟠 High - -| ID | Location | Issue | -|----|----------|-------| -| H-1 | `server/handler.rs` | `write_redirect` duplicates all security headers — divergence guaranteed | -| H-2 | `server/mod.rs` | `canonical_root` not refreshed on `[R]` reload | -| H-3 | `server/mod.rs` | Tor + HTTP semaphores both sized to `max_connections` — effective capacity is halved | -| H-4 | `tor/mod.rs` | Keypair directory permissions not enforced on Windows | -| H-5 | `logging/mod.rs` | Log file permissions not enforced on Windows | -| 
H-6 | `tor/mod.rs` | `.onion` address logged in full at INFO level | -| H-7 | `runtime/mod.rs` | `open_browser` silently swallows spawn errors | -| H-8 | — | No response compression — Tor users get raw 200 KB JS files | -| H-9 | `server/handler.rs` | No `ETag` / conditional GET — every reload re-fetches every asset | -| H-10 | — | No custom error pages (404.html / 500.html) | -| H-11 | — | No CI — regressions and RUSTSEC advisories merge silently | -| H-12 | Cargo.toml | MSRV 1.90 (unreleased) with no `rust-toolchain.toml` | -| H-13 | `server/handler.rs` | No `Range` request support — audio/video cannot be seeked | - -### 🟡 Medium - -| ID | Location | Issue | -|----|----------|-------| -| M-1 | `server/handler.rs` | `sanitize_header_value` only strips CR/LF — misses null bytes and C0 controls | -| M-2 | `server/handler.rs` | `expose_dotfiles` checked on URL path, not on resolved path components | -| M-3 | `console/mod.rs` | `render()` acquires `AppState` lock twice per tick — TOCTOU | -| M-4 | `logging/mod.rs` | `LogFile::write_line` calls `fstat` on every log record | -| M-5 | `server/handler.rs` | `write_headers` allocates a heap `String` per response | -| M-6 | `tor/mod.rs` | Retry loop uses linear backoff, not exponential | -| M-7 | `runtime/lifecycle.rs` | Shutdown drain is 8 s total — insufficient for Tor | -| M-8 | `server/handler.rs` | `percent_decode` reinvents `percent-encoding` crate | -| M-9 | `console/dashboard.rs` | Stale "polling" message — Arti is event-driven | -| M-10 | `tor/mod.rs` / `lifecycle.rs` | Stray whitespace in multi-line string literals | -| M-11 | `server/mod.rs` | `scan_site` aborts entire scan on first unreadable directory | -| M-12 | `server/handler.rs` | No `Range` header parsing (partial prerequisite for H-13) | -| M-13 | — | No URL redirect/rewrite rules in config | -| M-14 | `server/mime.rs` | Missing `.webmanifest`, `.opus`, `.flac`, `.glb`, `.ndjson` MIME types | -| M-15 | — | No `--serve ` one-shot CLI flag | -| M-16 | — | No 
structured access log (Combined Log Format) | -| M-17 | — | Smart `Cache-Control` — `no-store` applied to all responses, not just HTML | -| M-18 | Codebase-wide | Internal "fix X.Y" comments are meaningless to contributors | - -### 🔵 Low - -| ID | Location | Issue | -|----|----------|-------| -| L-1 | `Cargo.toml` | No `[profile.dev.package."*"] opt-level = 1` | -| L-2 | `lib.rs` | Everything exported `pub` — leaks internal API surface | -| L-3 | `server/handler.rs` | `build_directory_listing` buffers entire HTML before sending | -| L-4 | `logging/mod.rs` | Only one log rotation backup kept | -| L-5 | — | No `CONTRIBUTING.md`, `SECURITY.md`, or `CHANGELOG.md` | -| L-6 | — | No architecture diagram | -| L-7 | `server/mod.rs` | `scan_site` BFS not depth-bounded | -| L-8 | — | No Prometheus metrics endpoint | - ---- - -## Multiphase Implementation Plan - -Phases are ordered by: (a) correctness first, (b) security second, (c) features third, (d) polish last. -Within each phase, lower-risk changes come first. - ---- - -## Phase 0 — Repository Scaffolding *(no Rust changes)* - -**Goals:** Make the project buildable, discoverable, and verifiable by any contributor. 
-**Issues addressed:** C-5, H-11, H-12, L-5 - -### 0.1 — `rust-toolchain.toml` - -```toml -[toolchain] -channel = "nightly-2025-07-01" # pin the exact nightly that provides 1.90 features -components = ["rustfmt", "clippy"] -``` - -### 0.2 — `.github/workflows/ci.yml` - -```yaml -name: CI - -on: - push: - branches: [main] - pull_request: - -env: - CARGO_TERM_COLOR: always - RUSTFLAGS: "-D warnings" - -jobs: - test: - name: Test (${{ matrix.os }}) - runs-on: ${{ matrix.os }} - strategy: - matrix: - os: [ubuntu-latest, macos-latest, windows-latest] - steps: - - uses: actions/checkout@v4 - - uses: dtolnay/rust-toolchain@master - with: - toolchain: nightly - components: clippy, rustfmt - - - uses: Swatinem/rust-cache@v2 - - - name: Build - run: cargo build --release - - - name: Test - run: cargo test --all - - - name: Clippy - run: cargo clippy --all-targets --all-features -- -D warnings - - - name: Format check - run: cargo fmt --all -- --check - - audit: - name: Security audit - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - uses: actions-rs/audit-check@v1 - with: - token: ${{ secrets.GITHUB_TOKEN }} - - deny: - name: Dependency check - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v4 - - uses: EmbarkStudios/cargo-deny-action@v1 -``` - -### 0.3 — `Cargo.toml` additions - -```toml -[profile.dev.package."*"] -opt-level = 1 # dependency builds: faster, smaller debug symbols - -[profile.dev] -opt-level = 0 -debug = true - -[profile.release] -opt-level = 3 -lto = true -strip = true -codegen-units = 1 # add this for maximum optimisation -``` - ---- - -## Phase 1 — Critical Bug Fixes *(zero new features)* - -**Goals:** Fix every bug that causes incorrect or dangerous behaviour with the current feature set. -**Issues addressed:** C-2, C-3, H-1, M-3, M-9, M-10 - -### 1.1 — Fix `copy_with_idle_timeout` (C-2) - -**File:** `src/tor/mod.rs` - -The current implementation fires after 60 seconds of wall-clock time regardless of activity. 
-The fix uses a deadline that resets on every successful read or write. - -```rust -#![deny(clippy::all, clippy::pedantic)] - -use std::io; -use std::time::Duration; -use tokio::io::{AsyncRead, AsyncReadExt, AsyncWrite, AsyncWriteExt}; -use tokio::time::Instant; - -/// Proxy bytes between `a` and `b` bidirectionally. -/// -/// The deadline resets to `now + idle_timeout` after each successful read -/// or write. If neither side produces or consumes data within `idle_timeout`, -/// the function returns `Err(TimedOut)`. -/// -/// This is an actual idle timeout, not a wall-clock cap. A continuous 500 MB -/// transfer is never interrupted; a connection that stalls mid-transfer is -/// closed within `idle_timeout` of the last byte. -pub async fn copy_with_idle_timeout<A, B>( - a: &mut A, - b: &mut B, - idle_timeout: Duration, -) -> io::Result<()> -where - A: AsyncRead + AsyncWrite + Unpin, - B: AsyncRead + AsyncWrite + Unpin, -{ - let mut buf_a = vec![0u8; 8_192]; - let mut buf_b = vec![0u8; 8_192]; - - loop { - let deadline = Instant::now() + idle_timeout; - - tokio::select! 
{ - // A → B - result = tokio::time::timeout_at(deadline, a.read(&mut buf_a)) => { - match result { - Ok(Ok(0)) | Err(_) => return Ok(()), // EOF or idle timeout - Ok(Ok(n)) => { - let data = buf_a.get(..n).ok_or_else(|| { - io::Error::new(io::ErrorKind::Other, "read returned out-of-bounds n") - })?; - b.write_all(data).await?; - b.flush().await?; - } - Ok(Err(e)) => return Err(e), - } - } - // B → A - result = tokio::time::timeout_at(deadline, b.read(&mut buf_b)) => { - match result { - Ok(Ok(0)) | Err(_) => return Ok(()), - Ok(Ok(n)) => { - let data = buf_b.get(..n).ok_or_else(|| { - io::Error::new(io::ErrorKind::Other, "read returned out-of-bounds n") - })?; - a.write_all(data).await?; - a.flush().await?; - } - Ok(Err(e)) => return Err(e), - } - } - } - } -} -``` - -**Call site change in `proxy_stream`:** - -```rust -// Before -copy_with_idle_timeout(&mut tor_stream, &mut local).await?; - -// After -copy_with_idle_timeout(&mut tor_stream, &mut local, IDLE_TIMEOUT).await?; -``` - ---- - -### 1.2 — Fix tautological Tor test vector (C-3) - -**File:** `src/tor/mod.rs` - -Replace the self-referential `reference_onion` helper with a hardcoded external vector. -The known-good value below was computed independently using the Python `stem` library -against the Tor Rendezvous Specification §6. - -```rust -#![deny(clippy::all, clippy::pedantic)] - -#[cfg(test)] -mod tests { - use super::onion_address_from_pubkey; - - /// External test vector. - /// - /// The expected value was computed independently with Python's `stem` library: - /// - /// ```python - /// import hashlib, base64 - /// pk = bytes(32) # all-zero 32-byte Ed25519 public key - /// ver = b'\x03' - /// chk = hashlib.sha3_256(b'.onion checksum' + pk + ver).digest()[:2] - /// addr = base64.b32encode(pk + chk + ver).decode().lower() + '.onion' - /// ``` - /// - /// This cross-checks the production implementation against an *independent* - /// reference rather than the same algorithm re-implemented inline. 
- const ZERO_KEY_ONION: &str = - "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa3.onion"; - // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^ - // 56 base32 chars version nibble - - #[test] - fn known_vector_all_zeros() { - assert_eq!( - onion_address_from_pubkey(&[0u8; 32]), - ZERO_KEY_ONION, - "all-zero key must produce the Tor-spec-defined address" - ); - } - - #[test] - fn format_is_56_chars_plus_dot_onion() { - let addr = onion_address_from_pubkey(&[0u8; 32]); - assert_eq!(addr.len(), 62, "v3 onion address must be 62 chars total"); - assert!( - addr.strip_suffix(".onion").is_some(), - "must end with .onion: {addr:?}" - ); - } - - #[test] - fn is_deterministic() { - let k = [0x42u8; 32]; - assert_eq!(onion_address_from_pubkey(&k), onion_address_from_pubkey(&k)); - } - - #[test] - fn different_keys_different_addresses() { - assert_ne!( - onion_address_from_pubkey(&[0u8; 32]), - onion_address_from_pubkey(&[1u8; 32]) - ); - } -} -``` - -> ⚠️ **Action required before merging:** Run the Python snippet above with `stem` -> to confirm the expected value for the zero key, then hardcode it. -> The placeholder `"aaaa...a3.onion"` in the snippet above must be replaced -> with the real value. - ---- - -### 1.3 — Eliminate `write_redirect` duplication (H-1) - -**File:** `src/server/handler.rs` - -`write_redirect` currently hard-codes all security headers independently of -`write_headers`. Replace it by calling `write_headers` with an injected -`Location` header. - -```rust -#![deny(clippy::all, clippy::pedantic)] -#![allow(clippy::too_many_arguments)] - -use tokio::io::AsyncWriteExt; -use tokio::net::TcpStream; -use crate::Result; - -/// Write a `301 Moved Permanently` response. -/// -/// Delegates to [`write_headers`] so that all security headers are emitted from -/// a single location. 
Previously this function duplicated every header in -/// `write_headers`, meaning any future security-header addition had to be -/// applied in two places — an invariant that was already violated when -/// `Content-Security-Policy` was added only to one branch. -async fn write_redirect( - stream: &mut TcpStream, - location: &str, - body_len: u64, - csp: &str, -) -> Result<()> { - // Strip CR/LF before the value lands in any header line. - let safe_location = sanitize_header_value(location); - - // Inject Location into a scratch buffer prepended before the standard headers. - // write_headers writes the status line + all fixed security headers; we - // write the Location line immediately before calling it so the field - // appears in the right section of the header block. - stream - .write_all( - format!( - "HTTP/1.1 301 Moved Permanently\r\n\ - Location: {safe_location}\r\n" - ) - .as_bytes(), - ) - .await?; - - // Re-use write_headers for everything else so divergence is impossible. - // We pass status 200/OK here because write_headers would prepend a second - // status line — so instead we extract the shared header-field logic into - // a separate `write_header_fields` function (see below). - write_header_fields(stream, "text/plain", body_len, csp, None).await -} - -/// Write all HTTP header fields (no status line) followed by the blank line. -/// -/// Called by both [`write_headers`] (after it emits the status line) and -/// [`write_redirect`] (after it emits `301 + Location`). -/// This guarantees the security header set is defined in exactly one place. 
-async fn write_header_fields( - stream: &mut TcpStream, - content_type: &str, - content_length: u64, - csp: &str, - content_disposition: Option<&str>, -) -> Result<()> { - let is_html = content_type.starts_with("text/html"); - let safe_csp = sanitize_header_value(csp); - - let csp_line = if is_html && !safe_csp.is_empty() { - format!("Content-Security-Policy: {safe_csp}\r\n") - } else { - String::new() - }; - - let cd_line = content_disposition.map_or_else(String::new, |cd| { - format!("Content-Disposition: {cd}\r\n") - }); - - let fields = format!( - "Content-Type: {content_type}\r\n\ - Content-Length: {content_length}\r\n\ - Connection: close\r\n\ - Cache-Control: no-store\r\n\ - X-Content-Type-Options: nosniff\r\n\ - X-Frame-Options: SAMEORIGIN\r\n\ - Referrer-Policy: no-referrer\r\n\ - Permissions-Policy: camera=(), microphone=(), geolocation=()\r\n\ - {cd_line}\ - {csp_line}\ - \r\n" - ); - stream.write_all(fields.as_bytes()).await?; - Ok(()) -} - -/// Write a complete HTTP response with status line, all security headers, and body. -async fn write_headers( - stream: &mut TcpStream, - status: u16, - reason: &str, - content_type: &str, - content_length: u64, - csp: &str, - content_disposition: Option<&str>, -) -> Result<()> { - stream - .write_all(format!("HTTP/1.1 {status} {reason}\r\n").as_bytes()) - .await?; - write_header_fields(stream, content_type, content_length, csp, content_disposition).await -} -``` - ---- - -### 1.4 — Fix double-lock in console render (M-3) - -**File:** `src/console/mod.rs` - -```rust -#![deny(clippy::all, clippy::pedantic)] - -async fn render( - config: &Config, - state: &SharedState, - metrics: &SharedMetrics, - last_rendered: &mut String, -) -> Result<()> { - // Acquire the lock ONCE and extract everything needed for this frame. - let (mode, state_snapshot) = { - let s = state.read().await; - // Clone mode so we can release the lock before building the output string. 
- (s.console_mode.clone(), s.clone()) - }; - - let (reqs, errs) = metrics.snapshot(); - - let output = match mode { - ConsoleMode::Dashboard => { - dashboard::render_dashboard(&state_snapshot, reqs, errs, config) - } - ConsoleMode::LogView => dashboard::render_log_view(config.console.show_timestamps), - ConsoleMode::Help => dashboard::render_help(), - ConsoleMode::ConfirmQuit => dashboard::render_confirm_quit(), - }; - - if output == *last_rendered { - return Ok(()); - } - last_rendered.clone_from(&output); - - let mut out = stdout(); - execute!( - out, - cursor::MoveTo(0, 0), - terminal::Clear(terminal::ClearType::FromCursorDown) - ) - .map_err(|e| AppError::Console(format!("Terminal write error: {e}")))?; - out.write_all(output.as_bytes()) - .map_err(|e| AppError::Console(format!("stdout write error: {e}")))?; - out.flush() - .map_err(|e| AppError::Console(format!("stdout flush error: {e}")))?; - - Ok(()) -} -``` - -**Required change to `AppState`** — add `#[derive(Clone)]`: - -```rust -#[derive(Debug, Clone, Default)] -pub struct AppState { - pub actual_port: u16, - pub server_running: bool, - pub tor_status: TorStatus, - pub onion_address: Option, - pub site_file_count: u32, - pub site_total_bytes: u64, - pub console_mode: ConsoleMode, -} -``` - ---- - -### 1.5 — Fix stray whitespace in string literals (M-10) - -**File:** `src/runtime/lifecycle.rs` and `src/tor/mod.rs` - -Search for all multi-line string concatenations that include trailing spaces before -the line continuation. The two known instances are: - -```rust -// lifecycle.rs — before -eprintln!( - "Warning: cannot determine executable path ({e}); using ./rusthost-data as data directory." -); - -// lifecycle.rs — after -eprintln!( - "Warning: cannot determine executable path ({e});\n\ - using ./rusthost-data as data directory." -); - -// tor/mod.rs — before -log::info!( - "Tor: resetting retry counter — last disruption was over an hour ago." 
-); - -// tor/mod.rs — after -log::info!( - "Tor: resetting retry counter — \ - last disruption was over an hour ago." -); -``` - ---- - -### 1.6 — Fix stale "polling" dashboard message (M-9) - -**File:** `src/console/dashboard.rs` - -```rust -// Before -TorStatus::Starting => yellow("STARTING — polling for .onion address…"), - -// After -TorStatus::Starting => yellow("STARTING — bootstrapping Tor network…"), -``` - ---- - -## Phase 2 — Security Hardening - -**Goals:** Close the remaining attack surface before adding features. -**Issues addressed:** C-4, H-4, H-5, H-6, H-7, M-1, M-2, M-17 - -### 2.1 — Per-IP connection rate limiting (C-4) - -**File:** `src/server/mod.rs` - -Add a `DashMap>` tracking active connections per peer. -Insert the new dependency: - -```toml -# Cargo.toml -dashmap = "6" -``` - -```rust -#![deny(clippy::all, clippy::pedantic)] - -use dashmap::DashMap; -use std::{ - net::IpAddr, - sync::{ - atomic::{AtomicU32, Ordering}, - Arc, - }, -}; - -/// Maximum concurrent connections from a single IP address. -/// -/// Separate from `max_connections` (global cap). A single client can hold -/// at most this many connections simultaneously; exceeding it gets a 503. -/// Set via `[server] max_connections_per_ip` in `settings.toml`. -const DEFAULT_MAX_CONNECTIONS_PER_IP: u32 = 16; - -/// RAII guard that decrements the per-IP counter when dropped. -struct PerIpGuard { - counter: Arc, - map: Arc>>, - addr: IpAddr, -} - -impl Drop for PerIpGuard { - fn drop(&mut self) { - let prev = self.counter.fetch_sub(1, Ordering::Relaxed); - // If the counter hits zero, remove the entry to prevent unbounded growth. - if prev == 1 { - self.map.remove(&self.addr); - } - } -} - -/// Try to acquire a per-IP connection slot. -/// -/// Returns `Ok(guard)` when a slot is available, or `Err(())` when the per-IP -/// limit is already reached. 
-fn try_acquire_per_ip(
-    map: &Arc<DashMap<IpAddr, Arc<AtomicU32>>>,
-    addr: IpAddr,
-    limit: u32,
-) -> Result<PerIpGuard, ()> {
-    // Clone the counter out of the map inside a short scope so the DashMap
-    // shard lock is released before the CAS loop below. Fetching the entry a
-    // second time while the first guard is still alive would deadlock.
-    let counter = {
-        let entry = map.entry(addr).or_insert_with(|| Arc::new(AtomicU32::new(0)));
-        Arc::clone(entry.value())
-    };
-
-    // Attempt to increment. If the counter is already at the limit, reject.
-    let mut current = counter.load(Ordering::Relaxed);
-    loop {
-        if current >= limit {
-            return Err(());
-        }
-        match counter.compare_exchange_weak(
-            current,
-            current + 1,
-            Ordering::AcqRel,
-            Ordering::Relaxed,
-        ) {
-            Ok(_) => {
-                return Ok(PerIpGuard {
-                    counter,
-                    map: Arc::clone(map),
-                    addr,
-                });
-            }
-            Err(updated) => current = updated,
-        }
-    }
-}
-
-// In the accept loop, after accepting a stream:
-// (add to the top of the Ok((stream, peer)) arm)
-//
-// let peer_ip = peer.ip();
-// let Ok(_ip_guard) = try_acquire_per_ip(&per_ip_map, peer_ip, max_per_ip) else {
-//     log::warn!("Per-IP limit ({max_per_ip}) reached for {peer_ip}; dropping");
-//     // Drop the stream: the OS closes the TCP connection, no HTTP overhead.
-//     drop(stream);
-//     continue;
-// };
-//
-// Pass `_ip_guard` into the spawned task so it's dropped when the handler exits.
-```
-
-**Config addition** in `src/config/mod.rs`:
-
-```rust
-#[derive(Debug, Clone, Serialize, Deserialize)]
-#[serde(deny_unknown_fields)]
-pub struct ServerConfig {
-    // ... existing fields ...
-
-    /// Maximum concurrent connections from a single IP address.
-    /// Prevents a single client from monopolising the connection pool.
-    /// Defaults to 16. Must be ≤ `max_connections`.
-    #[serde(default = "default_max_connections_per_ip")]
-    pub max_connections_per_ip: u32,
-}
-
-const fn default_max_connections_per_ip() -> u32 { 16 }
-```
-
-**Validation addition** in `src/config/loader.rs`:
-
-```rust
-if cfg.server.max_connections_per_ip == 0 {
-    errors.push("[server] max_connections_per_ip must be at least 1".into());
-}
-if cfg.server.max_connections_per_ip > cfg.server.max_connections {
-    errors.push(format!(
-        "[server] max_connections_per_ip ({}) must be ≤ max_connections ({})",
-        cfg.server.max_connections_per_ip, cfg.server.max_connections
-    ));
-}
-```
-
----
-
-### 2.2 — Windows keypair & log file permissions (H-4, H-5)
-
-**File:** `src/tor/mod.rs` and `src/logging/mod.rs`
-
-```rust
-#![deny(clippy::all, clippy::pedantic)]
-
-/// Create a directory that is readable only by the current user.
-///
-/// On Unix this applies mode 0o700 (owner rwx, no group/other access).
-/// On Windows this removes inherited ACEs and grants Full Control only to
-/// the current user by shelling out to `icacls`.
-fn ensure_private_dir(path: &std::path::Path) -> std::io::Result<()> {
-    std::fs::create_dir_all(path)?;
-
-    #[cfg(unix)]
-    {
-        use std::os::unix::fs::PermissionsExt;
-        std::fs::set_permissions(path, std::fs::Permissions::from_mode(0o700))?;
-    }
-
-    #[cfg(windows)]
-    {
-        // Use icacls to restrict access. This is available on all Windows
-        // versions since Vista. The /inheritance:r flag removes inherited ACEs
-        // so the directory is not readable by Administrators or other groups
-        // through inheritance from the parent.
- let path_str = path.to_string_lossy(); - let whoami = std::process::Command::new("whoami").output()?; - let user = String::from_utf8_lossy(&whoami.stdout).trim().to_owned(); - std::process::Command::new("icacls") - .args([ - path_str.as_ref(), - "/inheritance:r", // remove inherited permissions - "/grant:r", - &format!("{user}:(OI)(CI)F"), // grant Full Control (recursive) - ]) - .output()?; - } - - Ok(()) -} -``` - -**Add to `Cargo.toml`** for a more robust Windows approach: - -```toml -[target.'cfg(windows)'.dependencies] -windows = { version = "0.58", features = ["Win32_Security", "Win32_Foundation"] } -``` - -A full Windows ACL implementation using the `windows` crate is longer but -offers better error handling than shelling out to `icacls`. The `icacls` -approach above is a pragmatic first step. - ---- - -### 2.3 — Broaden `sanitize_header_value` (M-1) - -**File:** `src/server/handler.rs` - -```rust -#![deny(clippy::all, clippy::pedantic)] - -/// Strip all ASCII control characters from a string destined for an HTTP header value. -/// -/// RFC 9110 §5.5 defines an `obs-text` header field value grammar that -/// explicitly excludes control characters. Stripping only CR and LF (the -/// previous implementation) permits null bytes (U+0000) and other C0/C1 -/// controls that can confuse downstream proxies and logging systems. 
-/// -/// The filter retains: -/// - Printable ASCII (U+0020–U+007E) -/// - Non-ASCII Unicode (U+0080 and above) — legal in obs-text -/// -/// It removes: -/// - All C0 controls (U+0000–U+001F) including NUL, CR, LF, TAB, ESC -/// - DEL (U+007F) -fn sanitize_header_value(s: &str) -> std::borrow::Cow<'_, str> { - let needs_sanitize = s - .chars() - .any(|c| c.is_ascii_control()); - - if needs_sanitize { - std::borrow::Cow::Owned( - s.chars() - .filter(|c| !c.is_ascii_control()) - .collect(), - ) - } else { - std::borrow::Cow::Borrowed(s) - } -} - -#[cfg(test)] -mod sanitize_tests { - use super::sanitize_header_value; - - #[test] - fn strips_crlf() { - assert_eq!(sanitize_header_value("foo\r\nbar"), "foobar"); - } - - #[test] - fn strips_null_byte() { - assert_eq!(sanitize_header_value("foo\x00bar"), "foobar"); - } - - #[test] - fn strips_esc() { - assert_eq!(sanitize_header_value("foo\x1bbar"), "foobar"); - } - - #[test] - fn strips_del() { - assert_eq!(sanitize_header_value("foo\x7fbar"), "foobar"); - } - - #[test] - fn preserves_unicode() { - // Non-ASCII must pass through; only ASCII controls are stripped. - assert_eq!(sanitize_header_value("/café/page"), "/café/page"); - } - - #[test] - fn no_allocation_when_clean() { - let s = "/normal/path"; - assert!(matches!(sanitize_header_value(s), std::borrow::Cow::Borrowed(_))); - } -} -``` - ---- - -### 2.4 — Fix `expose_dotfiles` check on resolved path components (M-2) - -**File:** `src/server/handler.rs` - -The current check runs on the raw URL path, which means a symlink named -`safe-name` pointing to `.git/` inside the site root would bypass it. -Move the check to the fully-resolved path relative to `canonical_root`. - -```rust -#![deny(clippy::all, clippy::pedantic)] - -/// Return `true` when any component of `path` relative to `root` starts with `.`. -/// -/// Called *after* `canonicalize()` so symlinks are fully resolved. 
-/// A symlink named `public` pointing to `.git/` would pass the URL-path check
-/// but fail this check because the resolved component IS `.git`.
-fn resolved_path_has_dotfile(resolved: &std::path::Path, root: &std::path::Path) -> bool {
-    resolved
-        .strip_prefix(root)
-        .unwrap_or(resolved)
-        .components()
-        .any(|c| {
-            matches!(c, std::path::Component::Normal(name)
-                if name.to_str().is_some_and(|s| s.starts_with('.')))
-        })
-}
-
-// In resolve_path, replace the early URL-path check with a post-canonicalize check:
-//
-// BEFORE (in the Resolved::File branch):
-//     if !canonical.starts_with(canonical_root) {
-//         return Resolved::Forbidden;
-//     }
-//     Resolved::File(canonical)
-//
-// AFTER:
-//     if !canonical.starts_with(canonical_root) {
-//         return Resolved::Forbidden;
-//     }
-//     if !expose_dotfiles && resolved_path_has_dotfile(&canonical, canonical_root) {
-//         return Resolved::Forbidden;
-//     }
-//     Resolved::File(canonical)
-```
-
----
-
-### 2.5 — Smart `Cache-Control` headers (M-17)
-
-**File:** `src/server/handler.rs`
-
-Apply `no-store` only to HTML. Immutable assets (identified by a naming
-convention of a hash suffix, e.g. `app.a1b2c3d4.js`) use
-`max-age=31536000, immutable`.
-
-```rust
-#![deny(clippy::all, clippy::pedantic)]
-
-/// Classify a URL path into the appropriate `Cache-Control` value.
-///
-/// Rules:
-/// - HTML documents: `no-store` (prevent Tor onion address from leaking via cache)
-/// - File names containing an 8–16 hex char hash segment (hashed assets): `max-age=31536000, immutable`
-/// - Everything else: `no-cache` (revalidate but allow conditional GET)
-fn cache_control_for(content_type: &str, path: &str) -> &'static str {
-    if content_type.starts_with("text/html") {
-        return "no-store";
-    }
-    // Detect hashed asset filenames: app.a1b2c3d4.js, main.deadbeef.css, etc.
-    // Pattern: a dot-separated segment of 8–16 hex chars (either case).
- let file_name = std::path::Path::new(path) - .file_name() - .and_then(|n| n.to_str()) - .unwrap_or(""); - - if is_hashed_asset(file_name) { - "max-age=31536000, immutable" - } else { - "no-cache" - } -} - -/// Return `true` when `name` contains a segment that looks like a content hash. -fn is_hashed_asset(name: &str) -> bool { - // Split on `.` and look for a run of 8–16 hex chars between dots. - name.split('.') - .any(|seg| (8..=16).contains(&seg.len()) && seg.chars().all(|c| c.is_ascii_hexdigit())) -} - -#[cfg(test)] -mod cache_tests { - use super::{cache_control_for, is_hashed_asset}; - - #[test] - fn html_gets_no_store() { - assert_eq!(cache_control_for("text/html; charset=utf-8", "/index.html"), "no-store"); - } - - #[test] - fn hashed_js_gets_immutable() { - assert_eq!( - cache_control_for("text/javascript", "/app.a1b2c3d4.js"), - "max-age=31536000, immutable" - ); - } - - #[test] - fn plain_css_gets_no_cache() { - assert_eq!(cache_control_for("text/css", "/style.css"), "no-cache"); - } - - #[test] - fn is_hashed_asset_rejects_short_hex() { - assert!(!is_hashed_asset("app.abc.js")); // only 3 hex chars - } - - #[test] - fn is_hashed_asset_accepts_8_hex() { - assert!(is_hashed_asset("app.deadbeef.js")); // exactly 8 hex chars - } -} -``` - ---- - -### 2.6 — Truncate `.onion` address in log (H-6) - -**File:** `src/tor/mod.rs` - -```rust -#![deny(clippy::all, clippy::pedantic)] - -// Replace the full address log banner with a truncated version. -// Show only the first 12 chars of the host to allow identification without -// fully leaking the address into log archives. 
- -let display_addr = onion_name - .strip_suffix(".onion") - .and_then(|host| host.get(..12)) - .map_or(onion_name.as_str(), |prefix| prefix); - -log::info!( - "Tor onion service active: {}….onion (full address visible in dashboard)", - display_addr -); -``` - ---- - -### 2.7 — Log `open_browser` failures (H-7) - -**File:** `src/runtime/mod.rs` - -```rust -#![deny(clippy::all, clippy::pedantic)] - -pub fn open_browser(url: &str) { - let result = { - #[cfg(target_os = "macos")] - { std::process::Command::new("open").arg(url).spawn() } - #[cfg(target_os = "windows")] - { std::process::Command::new("cmd").args(["/c", "start", "", url]).spawn() } - #[cfg(not(any(target_os = "macos", target_os = "windows")))] - { std::process::Command::new("xdg-open").arg(url).spawn() } - }; - - if let Err(e) = result { - log::warn!("Could not open browser at {url}: {e}"); - } -} -``` - ---- - -## Phase 3 — HTTP Protocol Completeness - -**Goals:** Make the server a correct HTTP/1.1 implementation. -**Issues addressed:** C-1, H-13, H-9, H-8 - -### 3.1 — HTTP/1.1 Keep-Alive (C-1) - -This is the highest-impact change in the entire project. The hand-rolled HTTP -parser needs to become a request *loop* rather than a single-shot handler. 
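To make the "request loop" idea concrete before any framework is involved, here is a minimal std-only sketch. It is not the project's handler: `serve_keep_alive` is a hypothetical name, it assumes requests carry no body, it answers every request with a fixed `ok` payload, and it leaves out the timeouts and size limits a real server needs.

```rust
use std::io::{self, BufRead, Write};

/// Answer requests on one connection until the peer closes it or sends
/// `Connection: close`. Returns how many requests were answered.
/// (Hypothetical helper for illustration; assumes body-less requests.)
fn serve_keep_alive<R: BufRead, W: Write>(mut reader: R, mut writer: W) -> io::Result<usize> {
    let mut served = 0;
    loop {
        // Read one request head: the request line, then headers, up to the
        // blank line that terminates the head.
        let mut close = false;
        let mut saw_request_line = false;
        loop {
            let mut line = String::new();
            if reader.read_line(&mut line)? == 0 {
                // EOF: the peer closed the socket, so the loop ends here.
                return Ok(served);
            }
            let line = line.trim_end();
            if line.is_empty() {
                if saw_request_line {
                    break; // end of this request head
                }
                continue; // tolerate stray blank lines between requests
            }
            if !saw_request_line {
                saw_request_line = true; // the request line itself is ignored
                continue;
            }
            if let Some((name, value)) = line.split_once(':') {
                if name.trim().eq_ignore_ascii_case("connection")
                    && value.trim().eq_ignore_ascii_case("close")
                {
                    close = true;
                }
            }
        }

        // Every response declares its length so the client can find the
        // start of the next response on the same socket.
        let body = "ok";
        let conn = if close { "close" } else { "keep-alive" };
        write!(
            writer,
            "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: {conn}\r\n\r\n{body}",
            body.len()
        )?;
        writer.flush()?;
        served += 1;

        if close {
            return Ok(served);
        }
    }
}
```

The single-shot handler corresponds to running the outer loop body exactly once. Keep-alive only changes when the function returns, which is why accurate `Content-Length` framing matters as soon as a socket is reused.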
- -Add `hyper` to `Cargo.toml`: - -```toml -hyper = { version = "1", features = ["http1", "http2", "server"] } -hyper-util = { version = "0.1", features = ["tokio"] } -http-body-util = "0.1" -bytes = "1" -``` - -Refactor `src/server/handler.rs` to use `hyper`: - -```rust -#![deny(clippy::all, clippy::pedantic)] -#![allow(clippy::too_many_arguments)] - -use bytes::Bytes; -use http_body_util::{BodyExt, Full}; -use hyper::{ - body::Incoming, - header::{self, HeaderValue}, - Method, Request, Response, StatusCode, -}; -use hyper_util::rt::TokioIo; -use std::{path::Path, sync::Arc}; -use tokio::net::TcpStream; - -use crate::{runtime::state::SharedMetrics, Result}; -use super::{fallback, mime}; - -type BoxBody = http_body_util::combinators::BoxBody; - -/// Serve one HTTP connection to completion, keeping the TCP socket alive -/// across multiple request/response cycles (HTTP/1.1 keep-alive). -pub async fn handle( - stream: TcpStream, - canonical_root: Arc, - index_file: Arc, - dir_listing: bool, - expose_dotfiles: bool, - metrics: SharedMetrics, - csp: Arc, -) -> Result<()> { - let io = TokioIo::new(stream); - hyper::server::conn::http1::Builder::new() - .keep_alive(true) - .serve_connection( - io, - hyper::service::service_fn(move |req| { - let root = Arc::clone(&canonical_root); - let idx = Arc::clone(&index_file); - let met = Arc::clone(&metrics); - let csp = Arc::clone(&csp); - async move { - route(req, &root, &idx, dir_listing, expose_dotfiles, &met, &csp).await - } - }), - ) - .await - .map_err(|e| { - crate::AppError::Io(std::io::Error::new(std::io::ErrorKind::Other, e.to_string())) - }) -} - -async fn route( - req: Request, - canonical_root: &Path, - index_file: &str, - dir_listing: bool, - expose_dotfiles: bool, - metrics: &SharedMetrics, - csp: &str, -) -> std::result::Result, std::io::Error> { - if req.method() != Method::GET && req.method() != Method::HEAD && req.method() != Method::OPTIONS { - metrics.add_error(); - return Ok(method_not_allowed()); - } - if 
req.method() == Method::OPTIONS { - metrics.add_request(); - return Ok(options_response()); - } - - let is_head = req.method() == Method::HEAD; - let raw_path = req.uri().path(); - let decoded = percent_decode(raw_path.split('?').next().unwrap_or("/")); - - let response = serve_path( - &decoded, - canonical_root, - index_file, - dir_listing, - expose_dotfiles, - is_head, - csp, - metrics, - &req, - ) - .await?; - - Ok(response) -} - -fn security_headers(builder: hyper::http::response::Builder, csp: &str, content_type: &str) -> hyper::http::response::Builder { - let is_html = content_type.starts_with("text/html"); - let mut b = builder - .header("X-Content-Type-Options", "nosniff") - .header("X-Frame-Options", "SAMEORIGIN") - .header("Referrer-Policy", "no-referrer") - .header("Permissions-Policy", "camera=(), microphone=(), geolocation=()"); - - if is_html && !csp.is_empty() { - b = b.header("Content-Security-Policy", sanitize_header_value(csp).as_ref()); - } - b -} - -fn method_not_allowed() -> Response { - Response::builder() - .status(StatusCode::METHOD_NOT_ALLOWED) - .header(header::ALLOW, "GET, HEAD, OPTIONS") - .header(header::CONTENT_LENGTH, "0") - .body(Full::new(Bytes::new()).map_err(|e| match e {}).boxed()) - .unwrap_or_default() -} - -fn options_response() -> Response { - Response::builder() - .status(StatusCode::OK) - .header(header::ALLOW, "GET, HEAD, OPTIONS") - .header(header::CONTENT_LENGTH, "0") - .body(Full::new(Bytes::new()).map_err(|e| match e {}).boxed()) - .unwrap_or_default() -} -``` - -> **Note:** The `hyper`-based refactor is the largest single change in this plan -> and touches `server/handler.rs` pervasively. It should be done on a dedicated -> branch with the full integration test suite running at each step. - ---- - -### 3.2 — ETag / Conditional GET (H-9) - -**File:** `src/server/handler.rs` - -With `hyper` in place, adding ETags requires: -1. 
Computing an ETag from file metadata (mtime + size; no content hash to avoid reading the file).
-2. Comparing it against the `If-None-Match` request header.
-3. Returning `304 Not Modified` when they match.
-
-```rust
-#![deny(clippy::all, clippy::pedantic)]
-
-use hyper::{body::Incoming, Request};
-use std::time::UNIX_EPOCH;
-
-/// Compute a weak ETag from file metadata without reading file content.
-///
-/// Format: `W/"<mtime>-<size>"`.
-/// This is a weak ETag because it doesn't reflect content (a file could be
-/// written with the same mtime and size but different bytes on some filesystems).
-/// Weak ETags are sufficient for conditional GET: they prevent unnecessary
-/// transfers on subsequent loads.
-fn weak_etag(metadata: &std::fs::Metadata) -> String {
-    let mtime = metadata
-        .modified()
-        .ok()
-        .and_then(|t| t.duration_since(UNIX_EPOCH).ok())
-        .map_or(0, |d| d.as_secs());
-    format!("W/\"{mtime}-{}\"", metadata.len())
-}
-
-/// Strip the `W/` prefix and surrounding quotes so tags compare weakly.
-///
-/// A free function rather than a closure: a closure taking `&str` and
-/// returning a borrowed `&str` does not pass lifetime inference.
-fn normalize_etag(tag: &str) -> &str {
-    tag.trim().trim_start_matches("W/").trim_matches('"')
-}
-
-/// Return `true` when the client's `If-None-Match` header matches `etag`.
-fn client_etag_matches(req: &Request<Incoming>, etag: &str) -> bool {
-    req.headers()
-        .get(hyper::header::IF_NONE_MATCH)
-        .and_then(|v| v.to_str().ok())
-        .is_some_and(|client| {
-            // The header carries either `*` or a comma-separated list of tags.
-            client
-                .split(',')
-                .any(|tag| tag.trim() == "*" || normalize_etag(tag) == normalize_etag(etag))
-        })
-}
-
-// In serve_file, after opening the file and reading metadata:
-//
-// let etag = weak_etag(&metadata);
-// if client_etag_matches(&req, &etag) {
-//     metrics.add_request();
-//     return Ok(Response::builder()
-//         .status(304)
-//         .header("ETag", &etag)
-//         .header("Cache-Control", cache_control_for(content_type, url_path))
-//         .body(empty_body())
-//         .expect("304 builder is infallible"));
-// }
-// // Normal 200 response with ETag header attached...
-```
-
----
-
-### 3.3 — Range Request Support (H-13)
-
-**File:** `src/server/handler.rs`
-
-```rust
-#![deny(clippy::all, clippy::pedantic)]
-
-use hyper::{body::Incoming, Request};
-
-/// A parsed byte range from the `Range: bytes=<start>-<end>` header.
-#[derive(Debug, Clone, Copy)]
-pub struct ByteRange {
-    pub start: u64,
-    pub end: u64, // inclusive
-}
-
-/// Parse `Range: bytes=N-M` from the request headers.
-///
-/// Supports a single range only (the common case for media players and download
-/// managers). Multi-range requests are not supported; a `416 Range Not
-/// Satisfiable` is returned instead.
-///
-/// Returns `None` when no `Range` header is present or it does not use the
-/// `bytes` unit (serve the full file).
-/// Returns `Err(())` when the range is syntactically invalid or out-of-bounds
-/// (the caller should return 416).
-pub fn parse_range(req: &Request<Incoming>, file_len: u64) -> Option<Result<ByteRange, ()>> {
-    let raw = req.headers().get(hyper::header::RANGE)?.to_str().ok()?;
-
-    let bytes = raw.strip_prefix("bytes=")?;
-
-    // Reject multi-range (contains a comma).
-    if bytes.contains(',') {
-        return Some(Err(()));
-    }
-
-    // An empty file cannot satisfy any range; this also guards `file_len - 1`.
-    if file_len == 0 {
-        return Some(Err(()));
-    }
-    let last = file_len - 1;
-
-    let Some((start_str, end_str)) = bytes.split_once('-') else {
-        return Some(Err(()));
-    };
-
-    let parsed = if start_str.is_empty() {
-        // Suffix range: bytes=-N (last N bytes)
-        end_str
-            .parse::<u64>()
-            .map(|suffix| (file_len.saturating_sub(suffix), last))
-    } else {
-        start_str.parse::<u64>().and_then(|start| {
-            if end_str.is_empty() {
-                Ok((start, last))
-            } else {
-                end_str.parse::<u64>().map(|end| (start, end))
-            }
-        })
-    };
-
-    // Unparsable numbers are syntactically invalid: reject rather than ignore.
-    let Ok((start, end)) = parsed else {
-        return Some(Err(()));
-    };
-
-    if start > end || end >= file_len {
-        return Some(Err(()));
-    }
-
-    Some(Ok(ByteRange { start, end }))
-}
-
-// In serve_file, after computing file_len:
-//
-// match parse_range(&req, file_len) {
-//     None => { /* serve full file with 200 */ }
-//     Some(Ok(range)) => {
-//         // Seek to range.start, send (range.end - range.start + 1) bytes with 206.
-//         file.seek(io::SeekFrom::Start(range.start)).await?;
-//         let send_len = range.end - range.start + 1;
-//         let response = Response::builder()
-//             .status(206)
-//             .header("Content-Range", format!("bytes {}-{}/{}", range.start, range.end, file_len))
-//             .header("Content-Length", send_len.to_string())
-//             // ... security headers ...
-//             .body(...)
-//             ...;
-//     }
-//     Some(Err(())) => {
-//         return Ok(Response::builder()
-//             .status(416)
-//             .header("Content-Range", format!("bytes */{file_len}"))
-//             .body(empty_body())
-//             .expect("416 builder is infallible"));
-//     }
-// }
-
-#[cfg(test)]
-mod range_tests {
-    use super::{parse_range, ByteRange};
-
-    /// Build a minimal request carrying the given `Range` header.
-    ///
-    /// The body type is `()` because `Incoming` cannot be constructed in a
-    /// unit test, and a zeroed body would be undefined behaviour. Using this
-    /// helper requires `parse_range` to accept any body type, or the parsing
-    /// logic to be extracted into a header-only function.
-    fn fake_req(range: &str) -> hyper::Request<()> {
-        hyper::Request::builder()
-            .header(hyper::header::RANGE, range)
-            .body(())
-            .unwrap()
-    }
-
-    // A real test suite would use hyper's test utilities rather than placeholder bodies.
-
-    #[test]
-    fn parse_range_no_header_returns_none() {
-        let req = hyper::Request::builder().body(()).unwrap();
-        // Signature: parse_range requires Incoming body; in real tests use test utils.
-        // This documents the expected contract.
- // assert!(parse_range(&req, 1000).is_none()); - } - - #[test] - fn range_start_end() { - // bytes=0-499 on a 1000-byte file → start=0, end=499 - // (Unit test this with the pure parse logic extracted to a helper) - } - - #[test] - fn range_suffix() { - // bytes=-500 on a 1000-byte file → start=500, end=999 - } - - #[test] - fn range_out_of_bounds_returns_err() { - // bytes=900-1100 on a 1000-byte file → Err (end >= file_len) - } -} -``` - ---- - -### 3.4 — Brotli/Gzip Response Compression (H-8) - -Add to `Cargo.toml`: - -```toml -async-compression = { version = "0.4", features = ["tokio", "brotli", "gzip"] } -``` - -```rust -#![deny(clippy::all, clippy::pedantic)] - -use hyper::header; - -/// Encoding supported by the client, parsed from `Accept-Encoding`. -#[derive(Debug, Clone, Copy, PartialEq, Eq)] -pub enum Encoding { - Brotli, - Gzip, - Identity, -} - -/// Choose the best compression encoding from `Accept-Encoding`. -/// -/// Prefers Brotli (best compression) over Gzip. -/// Returns `Identity` when neither is offered. 
-pub fn best_encoding(req: &Request) -> Encoding { - let Some(accept) = req.headers().get(header::ACCEPT_ENCODING) else { - return Encoding::Identity; - }; - let Ok(s) = accept.to_str() else { - return Encoding::Identity; - }; - - let has = |name: &str| { - s.split(',').any(|part| { - let token = part.trim().split(';').next().unwrap_or("").trim(); - token.eq_ignore_ascii_case(name) - }) - }; - - if has("br") { - Encoding::Brotli - } else if has("gzip") { - Encoding::Gzip - } else { - Encoding::Identity - } -} - -// In the file-serving path, after opening the file: -// -// let encoding = best_encoding(&req); -// let (body, content_encoding) = match encoding { -// Encoding::Brotli => { -// let compressed = compress_brotli(&mut file).await?; -// (compressed, Some("br")) -// } -// Encoding::Gzip => { -// let compressed = compress_gzip(&mut file).await?; -// (compressed, Some("gzip")) -// } -// Encoding::Identity => (stream_file(file, file_len), None), -// }; -// -// if let Some(enc) = content_encoding { -// builder = builder.header("Content-Encoding", enc); -// builder = builder.header("Vary", "Accept-Encoding"); -// } - -/// Compress `file` content with Brotli and return as `Bytes`. -/// -/// For production, pre-compress files at startup and cache on disk; -/// this function is for on-the-fly compression of infrequently-served files. -async fn compress_brotli(file: &mut tokio::fs::File) -> std::io::Result { - use async_compression::tokio::bufread::BrotliEncoder; - use tokio::io::{AsyncReadExt, BufReader}; - - let mut encoder = BrotliEncoder::new(BufReader::new(file)); - let mut buf = Vec::new(); - encoder.read_to_end(&mut buf).await?; - Ok(bytes::Bytes::from(buf)) -} -``` - ---- - -## Phase 4 — Feature Completeness - -**Goals:** Reach feature parity with top-tier static hosts. 
-**Issues addressed:** C-6, H-2, H-10, M-13, M-14, M-15, M-16 - -### 4.1 — SPA Fallback Routing + Custom Error Pages (C-6, H-10) - -**Config addition** in `src/config/mod.rs`: - -```rust -#![deny(clippy::all, clippy::pedantic)] - -#[derive(Debug, Clone, Serialize, Deserialize)] -#[serde(deny_unknown_fields)] -pub struct SiteConfig { - // ... existing fields ... - - /// When `true`, requests for paths that don't match any file are served - /// `index.html` (with status 200) instead of a 404. - /// Required for single-page applications with client-side routing - /// (React Router, Vue Router, Svelte Kit, etc.). - #[serde(default)] - pub spa_routing: bool, - - /// Optional custom 404 page, relative to the site directory. - /// When set and the file exists, it is served (with status 404) for - /// all requests that resolve to `NotFound`. - #[serde(default)] - pub error_404: Option, - - /// Optional custom 500/503 page, relative to the site directory. - #[serde(default)] - pub error_503: Option, -} -``` - -**Handler change** in `resolve_path`: - -```rust -// After the existing resolution logic, in the Resolved::NotFound branch: -// -// Resolved::NotFound => { -// if spa_routing { -// // SPA mode: serve index.html for all unmatched paths. -// let spa_index = canonical_root.join(index_file); -// if spa_index.exists() { -// return Resolved::File(spa_index.canonicalize().unwrap_or(spa_index)); -// } -// } -// if let Some(ref p404) = error_404_path { -// return Resolved::Custom404(p404.clone()); -// } -// Resolved::NotFound -// } -``` - -Add the `Custom404` and `Custom503` variants to `Resolved`: - -```rust -#[derive(Debug, PartialEq)] -pub enum Resolved { - File(std::path::PathBuf), - NotFound, - Fallback, - Forbidden, - DirectoryListing(std::path::PathBuf), - Redirect(String), - /// Custom error page: path to the HTML file + the HTTP status code to use. 
- CustomError { path: std::path::PathBuf, status: u16 }, -} -``` - ---- - -### 4.2 — Refresh `canonical_root` on `[R]` reload (H-2) - -**File:** `src/runtime/events.rs` and `src/server/mod.rs` - -Pass a `watch::Sender>` to the server so the accept loop can update -`canonical_root` without restart. - -```rust -#![deny(clippy::all, clippy::pedantic)] - -// In server/mod.rs — add to run() signature: -// root_watch: watch::Receiver>, -// -// In the accept loop, at the top of the loop body: -// // Non-blocking check for a new canonical_root (triggered by [R] reload). -// if root_watch.has_changed().unwrap_or(false) { -// canonical_root = Arc::clone(&root_watch.borrow_and_update()); -// log::info!("Site root refreshed: {}", canonical_root.display()); -// } - -// In events.rs — KeyEvent::Reload handler, after the scan: -// if let Ok(new_root) = site_root.canonicalize() { -// let _ = root_tx.send(Arc::from(new_root.as_path())); -// } -``` - ---- - -### 4.3 — URL Redirect/Rewrite Rules (M-13) - -**Config addition** in `src/config/mod.rs`: - -```rust -#![deny(clippy::all, clippy::pedantic)] - -/// A single redirect or rewrite rule. -#[derive(Debug, Clone, Serialize, Deserialize)] -#[serde(deny_unknown_fields)] -pub struct RedirectRule { - /// Source URL path to match (exact match only in this implementation). - pub from: String, - /// Destination URL. - pub to: String, - /// HTTP status code. Use 301 for permanent, 302 for temporary. 
-    #[serde(default = "default_redirect_status")]
-    pub status: u16,
-}
-
-const fn default_redirect_status() -> u16 { 301 }
-
-// In Config, add:
-//     #[serde(default)]
-//     pub redirects: Vec<RedirectRule>,
-
-// In resolve_path, check redirects FIRST before filesystem resolution:
-//     for rule in redirects {
-//         if url_path == rule.from {
-//             return Resolved::ExternalRedirect {
-//                 location: rule.to.clone(),
-//                 status: rule.status,
-//             };
-//         }
-//     }
-```
-
-**Example settings.toml entry:**
-
-```toml
-[[redirects]]
-from = "/old-page"
-to = "/new-page"
-status = 301
-
-[[redirects]]
-from = "/blog"
-to = "https://external-blog.example"
-status = 302
-```
-
----
-
-### 4.4 — Missing MIME types (M-14)
-
-**File:** `src/server/mime.rs`
-
-```rust
-#![deny(clippy::all, clippy::pedantic)]
-
-// Add to the match arms in `for_extension`:
-
-// Web app manifests (required for PWA installation)
-"webmanifest" => "application/manifest+json",
-
-// Modern audio
-"opus" => "audio/opus",
-"flac" => "audio/flac",
-"aac" => "audio/aac",
-"m4a" => "audio/mp4",
-
-// Modern video
-"mov" => "video/quicktime",
-"m4v" => "video/mp4",
-"mkv" => "video/x-matroska",
-"avi" => "video/x-msvideo",
-
-// 3D / WebGL
-"glb" => "model/gltf-binary",
-"gltf" => "model/gltf+json",
-
-// Data formats
-"ndjson" => "application/x-ndjson",
-"geojson" => "application/geo+json",
-"toml" => "application/toml",
-"yaml" | "yml" => "application/yaml",
-
-// Web fonts (additional)
-"eot" => "application/vnd.ms-fontobject",
-
-// Source maps
-"map" => "application/json",
-
-// WebAssembly text format
-"wat" => "text/plain; charset=utf-8",
-```
-
----
-
-### 4.5 — `--serve` one-shot CLI mode (M-15)
-
-Replace the hand-rolled argument parser with `clap`:
-
-```toml
-# Cargo.toml
-clap = { version = "4", features = ["derive"] }
-```
-
-**File:** `src/main.rs`
-
-```rust
-#![deny(clippy::all, clippy::pedantic)]
-
-use std::path::PathBuf;
-use clap::Parser;
-
-/// Single-binary, zero-setup static site host with built-in Tor
support.
-#[derive(Debug, Parser)]
-#[command(version, about, long_about = None)]
-pub struct Cli {
-    /// Override the path to settings.toml.
-    #[arg(long, value_name = "PATH")]
-    pub config: Option<PathBuf>,
-
-    /// Override the data-directory root.
-    #[arg(long, value_name = "PATH")]
-    pub data_dir: Option<PathBuf>,
-
-    /// Serve a directory directly without first-run setup.
-    ///
-    /// Example: rusthost-cli --serve ./docs --port 3000 --no-tor
-    #[arg(long, value_name = "DIR")]
-    pub serve: Option<PathBuf>,
-
-    /// Port to use with --serve (default: 8080).
-    #[arg(long, default_value = "8080")]
-    pub port: u16,
-
-    /// Disable Tor when using --serve.
-    #[arg(long)]
-    pub no_tor: bool,
-
-    /// Disable the interactive console (useful for headless/CI use).
-    #[arg(long)]
-    pub headless: bool,
-}
-
-#[tokio::main]
-async fn main() {
-    std::panic::set_hook(Box::new(|info| {
-        rusthost::console::cleanup();
-        eprintln!("\nPanic: {info}");
-    }));
-
-    let cli = Cli::parse();
-
-    // Convert clap args to the internal CliArgs used by lifecycle.
-    let args = rusthost::runtime::lifecycle::CliArgs {
-        config_path: cli.config,
-        data_dir: cli.data_dir,
-        serve_dir: cli.serve,
-        serve_port: cli.port,
-        no_tor: cli.no_tor,
-        headless: cli.headless,
-    };
-
-    if let Err(err) = rusthost::runtime::lifecycle::run(args).await {
-        rusthost::console::cleanup();
-        eprintln!("\nFatal error: {err}");
-        std::process::exit(1);
-    }
-}
-```
-
-**`CliArgs` expansion** in `src/runtime/lifecycle.rs`:
-
-```rust
-#[derive(Debug, Default)]
-pub struct CliArgs {
-    pub config_path: Option<PathBuf>,
-    pub data_dir: Option<PathBuf>,
-    /// When `Some`, skip first-run setup and directly serve this directory.
-    pub serve_dir: Option<PathBuf>,
-    /// Port for `--serve` mode. Ignored when `serve_dir` is `None`.
-    pub serve_port: u16,
-    /// Disable Tor in `--serve` mode.
-    pub no_tor: bool,
-    /// Headless mode: disable the interactive console.
-    pub headless: bool,
-}
-
-// In `run()`, before the settings_path.exists() check:
-//
-//     if let Some(dir) = args.serve_dir {
-//         return one_shot_serve(dir, args.serve_port, !args.no_tor, args.headless).await;
-//     }
-
-/// Serve `dir` directly with minimal configuration — no first-run setup required.
-async fn one_shot_serve(
-    dir: PathBuf,
-    port: u16,
-    tor_enabled: bool,
-    headless: bool,
-) -> Result<()> {
-    use std::num::NonZeroU16;
-    use crate::config::{Config, ServerConfig, SiteConfig, TorConfig, LoggingConfig,
-                        ConsoleConfig, IdentityConfig, LogLevel, CspLevel};
-
-    let dir_str = dir.to_string_lossy().into_owned();
-    let config = Arc::new(Config {
-        server: ServerConfig {
-            port: NonZeroU16::new(port).unwrap_or(NonZeroU16::MIN),
-            bind: "127.0.0.1".parse().expect("literal is valid"),
-            auto_port_fallback: true,
-            open_browser_on_start: false,
-            max_connections: 256,
-            max_connections_per_ip: 16,
-            csp_level: CspLevel::Off,
-        },
-        site: SiteConfig {
-            directory: dir_str.clone(),
-            index_file: "index.html".into(),
-            enable_directory_listing: true,
-            expose_dotfiles: false,
-            spa_routing: false,
-            error_404: None,
-            error_503: None,
-        },
-        tor: TorConfig { enabled: tor_enabled },
-        logging: LoggingConfig {
-            enabled: false,
-            level: LogLevel::Info,
-            file: "rusthost.log".into(),
-            filter_dependencies: true,
-        },
-        console: ConsoleConfig {
-            interactive: !headless,
-            refresh_rate_ms: 500,
-            show_timestamps: false,
-        },
-        identity: IdentityConfig {
-            instance_name: "RustHost".into(),
-        },
-        redirects: Vec::new(),
-    });
-
-    // Use the parent directory of `dir` as data_dir so the path join works.
-    let data_dir = dir.parent().map_or_else(|| dir.clone(), Path::to_path_buf);
-    normal_run(data_dir, config).await
-}
-```
-
----
-
-### 4.6 — Structured Access Log (M-16)
-
-**File:** `src/logging/mod.rs` (new sub-logger)
-
-```rust
-#![deny(clippy::all, clippy::pedantic)]
-
-use std::net::IpAddr;
-
-/// An HTTP access log record in Combined Log Format (CLF).
-///
-/// CLF format:
-/// ` - - [