Add in-memory database with SIGUSR1 snapshots and --in-memory standalone #280
daniel-thom merged 7 commits into main
Lets torc-server run entirely from a SQLite in-memory database with on-demand snapshots to disk via SIGUSR1. Targets HPC login/compute nodes where shared filesystems (Lustre, GPFS, NFS) are intermittently slow and stall request handlers.

Server side:
- bootstrap.rs spawns a snapshot task that listens for SIGUSR1 and runs `VACUUM INTO` to a `.tmp` sibling, rotates older snapshots (`<base>.1`, `.2`, …), then atomically renames into place. Configurable via `TORC_SERVER_SNAPSHOT_PATH` and `TORC_SERVER_SNAPSHOT_KEEP` (default 5, min 1).
- torc-server.rs auto-rewrites bare `:memory:` to `sqlite:file::memory:?cache=shared` and sets `min_connections(1)`, fixing a latent bug where pool connections were each getting private in-memory databases.
- After each snapshot the server prints `TORC_SNAPSHOT_DONE=<path>` on stdout (flushed) so a parent process can synchronize on snapshot completion.

Client side:
- New global flags `--in-memory` (Unix only, requires `--standalone`) and `--snapshot-interval-seconds <N>` (requires `--in-memory`).
- `start_standalone_server` swaps `--database <path>` for `--database :memory:`, sets `TORC_SERVER_SNAPSHOT_PATH` to the on-disk path, and defaults `TORC_SERVER_SNAPSHOT_KEEP=1` so users see one canonical file at the path they expect.
- Stdout drain thread routes `TORC_SNAPSHOT_DONE` lines to a channel the parent waits on. `Drop` sends a final SIGUSR1 and waits up to 5 s for confirmation before stdin-EOF shutdown, so when the command returns, the workflow is queryable.
- Periodic-snapshot thread parks on a stop-mpsc with timeout, sending SIGUSR1 each tick until Drop stops it (avoids racing the final snapshot).
- `--in-memory` is restricted to `exec` and `run`; read-only commands like `results list` would otherwise snapshot an empty DB over existing data.

Docs cover both the low-level server operation and the standalone client mode for HPC use cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
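The rotate-then-rename step described for the server can be sketched with the standard library alone. This is a minimal illustration, not the code in bootstrap.rs: the `rotated` helper, the function signature, and the exact rotation order are assumptions, and the `VACUUM INTO` step that produces the `.tmp` file is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: path of the n-th rotated snapshot, e.g. `torc.db.1`.
fn rotated(base: &Path, n: usize) -> PathBuf {
    let mut s = base.as_os_str().to_os_string();
    s.push(format!(".{n}"));
    PathBuf::from(s)
}

/// Promote `tmp` to `base`, keeping at most `keep` snapshots in total:
/// `base` itself plus `base.1` .. `base.{keep-1}`. Values below 1 are
/// treated as 1, mirroring the "min 1" rule in the description.
fn rotate_and_promote(tmp: &Path, base: &Path, keep: usize) -> io::Result<()> {
    let keep = keep.max(1);
    if keep > 1 {
        // Drop the oldest rotated snapshot so the shift below has room.
        let oldest = rotated(base, keep - 1);
        if oldest.exists() {
            fs::remove_file(&oldest)?;
        }
        // Shift base.{n} -> base.{n+1}, highest index first.
        for n in (1..keep - 1).rev() {
            let from = rotated(base, n);
            if from.exists() {
                fs::rename(&from, rotated(base, n + 1))?;
            }
        }
        // Demote the current canonical snapshot to base.1.
        if base.exists() {
            fs::rename(base, rotated(base, 1))?;
        }
    }
    // A rename within one directory replaces the target in a single step,
    // so readers see either the old snapshot or the new one.
    fs::rename(tmp, base)
}
```

With `keep = 1` (the client-side default) the function reduces to a single rename of the `.tmp` file over the canonical path.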
Pull request overview
Adds support for running torc-server against a shared-cache SQLite in-memory database with persistence via SIGUSR1-triggered snapshots, plus a torc --standalone --in-memory client flow intended for HPC environments with intermittently slow shared filesystems.
Changes:
- Server: install a SIGUSR1 listener that snapshots via `VACUUM INTO`, rotates snapshots, and emits `TORC_SNAPSHOT_DONE=<path>` for parent-process synchronization.
- Client: introduce `--in-memory` and optional `--snapshot-interval-seconds`, wiring standalone startup/shutdown to request snapshots and wait for completion.
- Docs: document both direct server usage and the standalone client workflow for HPC usage.
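The periodic-snapshot wiring on the client can be pictured as a thread that waits on a stop channel with a timeout. The sketch below is an assumption-laden illustration: `spawn_periodic` is a hypothetical name, and the tick is an arbitrary closure standing in for "send SIGUSR1 to the server child", so no signal crate is needed.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawn a ticker that calls `on_tick` once per `interval` until a stop
/// message arrives or the stop sender is dropped.
fn spawn_periodic<F>(interval: Duration, mut on_tick: F) -> (Sender<()>, JoinHandle<()>)
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        match stop_rx.recv_timeout(interval) {
            // Nothing received within the interval: time for a snapshot.
            Err(RecvTimeoutError::Timeout) => on_tick(),
            // Explicit stop, or the owner went away: exit without ticking.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    (stop_tx, handle)
}
```

Because the thread blocks in `recv_timeout` rather than sleeping, a stop request takes effect immediately. Joining the handle before the final snapshot is requested ensures no further periodic tick is issued after that point.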
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/server/http_server/bootstrap.rs` | Adds SIGUSR1 snapshot task, snapshot config, rotation logic, and stdout completion marker. |
| `src/main.rs` | Extends standalone subprocess management with SIGUSR1 snapshot triggering, periodic snapshot thread, and shutdown synchronization. |
| `src/cli.rs` | Adds `--in-memory` and `--snapshot-interval-seconds` CLI flags and help text. |
| `src/bin/torc-server.rs` | Rewrites `:memory:` to shared-cache URI and keeps at least one pool connection alive in memory mode. |
| `docs/src/specialized/admin/server-deployment.md` | Documents in-memory mode, snapshot signaling, rotation, and standalone usage. |
| `docs/src/core/how-to/run-inline-commands.md` | Adds HPC guidance for `torc exec` using `--in-memory` and periodic snapshots. |
- Drain pending TORC_SNAPSHOT_DONE notifications before requesting the final snapshot in StandaloneServer::Drop. Without this, a periodic-snapshot notification could satisfy the wait immediately and let shutdown proceed before the *final* snapshot lands, breaking the "command returns => DB is queryable" contract.
- Switch the snapshot-done channel to sync_channel(1) with try_send so periodic notifications can't accumulate unboundedly between the only reader (Drop).
- Use SqliteJournalMode::Memory for in-memory databases (WAL is silently ignored for `:memory:`) and update the startup log line to report the actual mode rather than always claiming WAL.
- Roll back canonical-snapshot demotion if the final tmp->base rename fails in rotate_and_promote, so the canonical path keeps pointing at a valid snapshot on transient FS errors instead of disappearing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
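The drain-then-wait pattern and the bounded channel from this commit can be sketched as follows. The function names `notify_done` and `final_snapshot` are hypothetical, the snapshot request is a closure rather than a real SIGUSR1, and the sketch assumes any periodic-snapshot thread has already been stopped before `final_snapshot` runs.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError, SyncSender, TrySendError};
use std::time::Duration;

/// Producer side (the stdout drain thread in the PR): record a completion
/// without ever blocking. With a capacity-1 channel, a notification that
/// arrives while one is already queued is simply dropped.
fn notify_done(tx: &SyncSender<String>, path: &str) -> bool {
    match tx.try_send(path.to_string()) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
    }
}

/// Consumer side (Drop in the PR): discard stale notifications first, then
/// request the final snapshot, then wait for its confirmation.
fn final_snapshot<F: FnOnce()>(
    rx: &Receiver<String>,
    request: F,
    timeout: Duration,
) -> Result<String, RecvTimeoutError> {
    // Anything already queued belongs to an earlier periodic snapshot.
    while rx.try_recv().is_ok() {}
    request();
    rx.recv_timeout(timeout)
}
```

Without the drain loop, a queued periodic notification would be returned by `recv_timeout` right away, which is the early-return bug the commit describes.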
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
- Detect in-memory mode from the pre-rewrite database URL via exact match instead of post-rewrite substring search, eliminating a false-positive on on-disk paths whose filename happens to contain `:memory:` (colons are legal in Unix paths). Also accept bare `:memory:` (no `sqlite:` prefix) since that is what users naturally type when setting `DATABASE_URL` directly.
- Distinguish RecvTimeoutError::Timeout from Disconnected when waiting for the final TORC_SNAPSHOT_DONE in StandaloneServer::Drop, so a server that exits early no longer produces a misleading "timed out" warning.
- Update the SnapshotConfig.base doc comment to match what the code actually guarantees: relative paths are resolved against the startup CWD when possible, but may remain relative if current_dir() fails.
- Add three integration tests covering --in-memory: a basic exec-then-query smoke test, a periodic-snapshot variant with --snapshot-interval-seconds, and a guard test verifying the flag is rejected for read-only commands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
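The Timeout/Disconnected distinction maps directly onto the two variants of `std::sync::mpsc::RecvTimeoutError`. A minimal sketch, where the `FinalSnapshot` enum and `wait_for_final` function are hypothetical names introduced only for illustration:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical outcome type for the final-snapshot wait.
#[derive(Debug, PartialEq)]
enum FinalSnapshot {
    /// The server reported the snapshot path.
    Confirmed(String),
    /// The server is presumably still alive but did not answer in time.
    TimedOut,
    /// Every sender is gone, e.g. the stdout drain thread ended because
    /// the server exited early.
    ServerExited,
}

fn wait_for_final(rx: &Receiver<String>, timeout: Duration) -> FinalSnapshot {
    match rx.recv_timeout(timeout) {
        Ok(path) => FinalSnapshot::Confirmed(path),
        Err(RecvTimeoutError::Timeout) => FinalSnapshot::TimedOut,
        Err(RecvTimeoutError::Disconnected) => FinalSnapshot::ServerExited,
    }
}
```

A caller can then log a "timed out" warning only for `TimedOut` and a different message for `ServerExited`.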
Only `sqlite::memory:` (with trailing colon, the sqlx canonical form
and what `format!("sqlite:{}", ":memory:")` produces) and bare
`:memory:` are real spellings; `sqlite::memory` was added
speculatively and is not a form users will ever type.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
- Detect explicit shared-cache in-memory URLs (anything containing `file::memory:`) as in-memory, so a user supplying e.g. `sqlite:file::memory:?cache=shared` directly via DATABASE_URL or --database gets the same `min_connections(1)` treatment and isn't silently dropped between requests. Split detection from rewriting so user-supplied URLs are preserved verbatim.
- Move the snapshot-completion stdout write into spawn_blocking so a slow/backpressured pipe can't park a Tokio worker.
- Shorten the periodic-snapshot integration test sleep from 3 s to 1.5 s. One periodic tick (interval 1 s) is still guaranteed and CI cost drops by half.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
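Splitting detection from rewriting can be sketched as two small functions. The names `is_in_memory` and `effective_url` are hypothetical, and treating `sqlite::memory:` as a spelling that is rewritten (not only detected) is an assumption; the commit messages confirm the rewrite only for bare `:memory:`.

```rust
/// True when `url` names an in-memory SQLite database. The two bare
/// spellings are matched exactly, so an on-disk path that merely contains
/// `:memory:` in its file name is not misclassified. Explicit shared-cache
/// URLs are recognized by the `file::memory:` substring, as the commit
/// message describes.
fn is_in_memory(url: &str) -> bool {
    url == ":memory:" || url == "sqlite::memory:" || url.contains("file::memory:")
}

/// Rewrite only the bare spellings to the shared-cache form; any other
/// URL, including a user-supplied shared-cache one, is returned verbatim.
fn effective_url(url: &str) -> String {
    match url {
        // Assumption: both bare spellings are rewritten.
        ":memory:" | "sqlite::memory:" => "sqlite:file::memory:?cache=shared".to_string(),
        other => other.to_string(),
    }
}
```

For example, `is_in_memory("sqlite:/data/run:memory:.db")` is false, while `effective_url(":memory:")` yields `sqlite:file::memory:?cache=shared`. A pool builder would consult `is_in_memory` to decide whether to apply `min_connections(1)`.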
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Exercises `torc -s --in-memory --snapshot-interval-seconds N exec` on a Slurm compute node (the example in run-inline-commands.md) and asserts the snapshotted on-disk torc.db has every subjob completed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Reject `--snapshot-interval-seconds 0` at the CLI layer with a `value_parser` range, so users can't silently disable periodic snapshots while believing they enabled them. Drop the now-redundant `secs > 0` guard in main.rs.
- `snapshot_once` now `create_dir_all`s the snapshot parent before `VACUUM INTO`, so a `TORC_SERVER_SNAPSHOT_PATH` pointed at a non-existent directory no longer fails the snapshot.
- Register the SIGUSR1 handler before printing `TORC_SERVER_PORT=`, so a parent that races between readiness and the snapshot-loop spawn can't accidentally terminate the server (default SIGUSR1 disposition is terminate).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
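The parent-directory step in `snapshot_once` might look roughly like the following. `ensure_snapshot_parent` is a hypothetical helper name, and the `VACUUM INTO` call that would follow it is omitted.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the directory that will hold the snapshot, if it has one.
/// Existing directories are left untouched.
fn ensure_snapshot_parent(snapshot_path: &Path) -> io::Result<()> {
    match snapshot_path.parent() {
        // A bare file name such as `torc.db` has an empty parent, which
        // means "current directory"; there is nothing to create.
        Some(parent) if !parent.as_os_str().is_empty() => fs::create_dir_all(parent),
        _ => Ok(()),
    }
}
```

Calling this before each snapshot also covers the case where the directory is removed while the server is running.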