Add in-memory database with SIGUSR1 snapshots and --in-memory standalone #280

Merged
daniel-thom merged 7 commits into main from feat/in-memory-db on Apr 25, 2026

@daniel-thom
Collaborator

Lets torc-server run entirely from a SQLite in-memory database with on-demand snapshots to disk via SIGUSR1. Targets HPC login/compute nodes where shared filesystems (Lustre, GPFS, NFS) are intermittently slow and stall request handlers.

Server side:

  • bootstrap.rs spawns a snapshot task that listens for SIGUSR1 and runs VACUUM INTO to a .tmp sibling, rotates older snapshots (<base>.1, .2, …), then atomically renames into place. Configurable via TORC_SERVER_SNAPSHOT_PATH and TORC_SERVER_SNAPSHOT_KEEP (default 5, min 1).
  • torc-server.rs auto-rewrites bare :memory: to sqlite:file::memory:?cache=shared and sets min_connections(1), fixing a latent bug where pool connections were each getting private in-memory databases.
  • After each snapshot the server prints TORC_SNAPSHOT_DONE=<path> on stdout (flushed) so a parent process can synchronize on snapshot completion.
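
The rotate-then-promote step described in the first server-side bullet can be sketched with `std::fs` alone. This is a minimal illustration rather than the code in bootstrap.rs: it assumes the `.tmp` sibling has already been written by `VACUUM INTO`, the helper `numbered` is hypothetical, and treating `keep` as the total number of files (canonical snapshot included) is an assumption about how `TORC_SERVER_SNAPSHOT_KEEP` is counted.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds `<base>.<n>` by appending a numeric suffix to the full file name.
/// (Hypothetical helper, named here for illustration only.)
fn numbered(base: &Path, n: usize) -> PathBuf {
    let mut name: OsString = base.as_os_str().to_owned();
    name.push(format!(".{n}"));
    PathBuf::from(name)
}

/// Promotes `tmp` to `base`, shifting older snapshots to `<base>.1`, `.2`, ...
/// Assumption: `keep` counts the canonical file, so `keep == 1` retains no
/// numbered generations at all.
fn rotate_and_promote(tmp: &Path, base: &Path, keep: usize) -> io::Result<()> {
    let older = keep.max(1) - 1; // numbered generations to retain
    if older == 0 {
        // A rename within one directory replaces the target atomically on POSIX.
        return fs::rename(tmp, base);
    }
    // Drop the oldest generation, then shift the remaining ones up by one.
    let oldest = numbered(base, older);
    if oldest.exists() {
        fs::remove_file(&oldest)?;
    }
    for n in (1..older).rev() {
        let from = numbered(base, n);
        if from.exists() {
            fs::rename(&from, numbered(base, n + 1))?;
        }
    }
    // Demote the current canonical snapshot, then move the new one into place.
    if base.exists() {
        fs::rename(base, numbered(base, 1))?;
    }
    fs::rename(tmp, base)
}
```

Writing to a sibling and renaming last means a reader of the canonical path sees either the previous snapshot or the new one, never a partially written file.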

Client side:

  • New global flags --in-memory (Unix only, requires --standalone) and --snapshot-interval-seconds <N> (requires --in-memory).
  • start_standalone_server swaps --database <path> for --database :memory:, sets TORC_SERVER_SNAPSHOT_PATH to the on-disk path, and defaults TORC_SERVER_SNAPSHOT_KEEP=1 so users see one canonical file at the path they expect.
  • Stdout drain thread routes TORC_SNAPSHOT_DONE lines to a channel the parent waits on. Drop sends a final SIGUSR1 and waits up to 5 s for confirmation before stdin-EOF shutdown, so when the command returns, the workflow is queryable.
  • Periodic-snapshot thread blocks on a stop channel (mpsc) with a timeout, sending SIGUSR1 on each timeout tick until Drop stops it (avoids racing the final snapshot).
  • --in-memory is restricted to exec and run — read-only commands like results list would otherwise snapshot an empty DB over existing data.
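
Two of the client-side pieces above lend themselves to a short sketch: recognizing the `TORC_SNAPSHOT_DONE=<path>` marker in the server's stdout, and a ticker thread that waits on a stop channel with a timeout. The function names are hypothetical, and the `request_snapshot` closure is a stand-in for sending SIGUSR1 to the child process, which is left out so the example stays within the standard library.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Returns the snapshot path if `line` is a `TORC_SNAPSHOT_DONE=<path>` marker.
fn parse_snapshot_done(line: &str) -> Option<&str> {
    line.trim_end().strip_prefix("TORC_SNAPSHOT_DONE=")
}

/// Spawns a ticker that calls `request_snapshot` once per `interval` until it
/// is told to stop. The thread returns how many requests it issued.
fn spawn_periodic<F>(interval: Duration, request_snapshot: F) -> (Sender<()>, JoinHandle<u32>)
where
    F: Fn() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        let mut ticks: u32 = 0;
        loop {
            match stop_rx.recv_timeout(interval) {
                // A timeout means a full interval elapsed with no stop request.
                Err(RecvTimeoutError::Timeout) => {
                    request_snapshot();
                    ticks += 1;
                }
                // Explicit stop, or the owner dropped the sender: exit promptly.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        ticks
    });
    (stop_tx, handle)
}
```

Because the wait doubles as the sleep, a stop request wakes the thread immediately instead of after the remainder of the interval, so the owner can join it before asking for the final snapshot.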

Docs cover both the low-level server operation and the standalone client mode for HPC use cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds support for running torc-server against a shared-cache SQLite in-memory database with persistence via SIGUSR1-triggered snapshots, plus a torc --standalone --in-memory client flow intended for HPC environments with intermittently slow shared filesystems.

Changes:

  • Server: install a SIGUSR1 listener that snapshots via VACUUM INTO, rotates snapshots, and emits TORC_SNAPSHOT_DONE=<path> for parent-process synchronization.
  • Client: introduce --in-memory and optional --snapshot-interval-seconds, wiring standalone startup/shutdown to request snapshots and wait for completion.
  • Docs: document both direct server usage and the standalone client workflow for HPC usage.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/server/http_server/bootstrap.rs | Adds SIGUSR1 snapshot task, snapshot config, rotation logic, and stdout completion marker. |
| src/main.rs | Extends standalone subprocess management with SIGUSR1 snapshot triggering, periodic snapshot thread, and shutdown synchronization. |
| src/cli.rs | Adds --in-memory and --snapshot-interval-seconds CLI flags and help text. |
| src/bin/torc-server.rs | Rewrites :memory: to shared-cache URI and keeps at least one pool connection alive in memory mode. |
| docs/src/specialized/admin/server-deployment.md | Documents in-memory mode, snapshot signaling, rotation, and standalone usage. |
| docs/src/core/how-to/run-inline-commands.md | Adds HPC guidance for torc exec using --in-memory and periodic snapshots. |

Comment thread src/main.rs
Comment thread src/bin/torc-server.rs Outdated
Comment thread src/server/http_server/bootstrap.rs Outdated
Comment thread src/main.rs
- Drain pending TORC_SNAPSHOT_DONE notifications before requesting the
  final snapshot in StandaloneServer::Drop. Without this, a periodic-
  snapshot notification could satisfy the wait immediately and let
  shutdown proceed before the *final* snapshot lands, breaking the
  "command returns => DB is queryable" contract.
- Switch the snapshot-done channel to sync_channel(1) with try_send so
  periodic notifications can't accumulate unboundedly while the only
  reader (Drop) has not yet run.
- Use SqliteJournalMode::Memory for in-memory databases (WAL is
  silently ignored for `:memory:`) and update the startup log line to
  report the actual mode rather than always claiming WAL.
- Roll back canonical-snapshot demotion if the final tmp->base rename
  fails in rotate_and_promote, so the canonical path keeps pointing at
  a valid snapshot on transient FS errors instead of disappearing.
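
The first two bullets describe a drain-then-wait pattern over a bounded channel. A minimal sketch follows, assuming the notification payload is the snapshot path as a `String`; both function names are hypothetical, and `send_signal` stands in for delivering SIGUSR1 to the server.

```rust
use std::sync::mpsc::{Receiver, SyncSender};
use std::time::Duration;

/// Producer side, called for each completion marker read from server stdout.
fn notify_done(tx: &SyncSender<String>, path: String) {
    // try_send never blocks: with capacity 1, a notification that arrives
    // while the slot is full is dropped, so the queue cannot grow.
    let _ = tx.try_send(path);
}

/// Consumer side: discard stale notifications, request one more snapshot,
/// and wait for the confirmation that belongs to that request.
fn request_final_snapshot<F: FnOnce()>(
    rx: &Receiver<String>,
    send_signal: F,
    timeout: Duration,
) -> Option<String> {
    // Anything already queued came from an earlier (periodic) snapshot.
    while rx.try_recv().is_ok() {}
    send_signal();
    rx.recv_timeout(timeout).ok()
}
```

Draining first is what keeps an old periodic notification from satisfying the wait; the bounded channel only limits how much there is to drain.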

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread src/server/http_server/bootstrap.rs Outdated
Comment thread src/bin/torc-server.rs Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs
daniel-thom and others added 2 commits April 25, 2026 08:00
- Detect in-memory mode from the pre-rewrite database URL via exact
  match instead of post-rewrite substring search, eliminating a
  false-positive on on-disk paths whose filename happens to contain
  `:memory:` (colons are legal in Unix paths). Also accept bare
  `:memory:` (no `sqlite:` prefix) since that is what users naturally
  type when setting `DATABASE_URL` directly.
- Distinguish RecvTimeoutError::Timeout from Disconnected when waiting
  for the final TORC_SNAPSHOT_DONE in StandaloneServer::Drop, so a
  server that exits early no longer produces a misleading "timed out"
  warning.
- Update the SnapshotConfig.base doc comment to match what the code
  actually guarantees: relative paths are resolved against the startup
  CWD when possible, but may remain relative if current_dir() fails.
- Add three integration tests covering --in-memory: a basic
  exec-then-query smoke test, a periodic-snapshot variant with
  --snapshot-interval-seconds, and a guard test verifying the flag is
  rejected for read-only commands.
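
The Timeout-versus-Disconnected distinction in the second bullet maps directly onto the two variants of `std::sync::mpsc::RecvTimeoutError`. The sketch below is illustrative only; the `FinalSnapshot` enum and `wait_for_final` are hypothetical names, not identifiers from src/main.rs.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Outcome of waiting for the last snapshot confirmation (hypothetical type).
#[derive(Debug, PartialEq)]
enum FinalSnapshot {
    Confirmed(String),
    TimedOut,
    ServerExited,
}

fn wait_for_final(rx: &Receiver<String>, timeout: Duration) -> FinalSnapshot {
    match rx.recv_timeout(timeout) {
        Ok(path) => FinalSnapshot::Confirmed(path),
        // The sender is still alive but nothing arrived in time.
        Err(RecvTimeoutError::Timeout) => FinalSnapshot::TimedOut,
        // Every sender is gone, e.g. the stdout drain ended because the
        // server process exited; reporting a timeout here would mislead.
        Err(RecvTimeoutError::Disconnected) => FinalSnapshot::ServerExited,
    }
}
```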

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Only `sqlite::memory:` (with trailing colon, the sqlx canonical form
and what `format!("sqlite:{}", ":memory:")` produces) and bare
`:memory:` are real spellings; `sqlite::memory` was added
speculatively and is not a form users will ever type.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comment thread src/bin/torc-server.rs Outdated
Comment thread src/server/http_server/bootstrap.rs Outdated
Comment thread tests/test_standalone.rs Outdated
- Detect explicit shared-cache in-memory URLs (anything containing
  `file::memory:`) as in-memory, so a user supplying e.g.
  `sqlite:file::memory:?cache=shared` directly via DATABASE_URL or
  --database gets the same `min_connections(1)` treatment and the
  database isn't silently dropped between requests. Split detection
  from rewriting so user-supplied URLs are preserved verbatim.
- Move the snapshot-completion stdout write into spawn_blocking so a
  slow/backpressured pipe can't park a Tokio worker.
- Shorten the periodic-snapshot integration test sleep from 3 s to
  1.5 s. One periodic tick (interval 1 s) is still guaranteed and CI
  cost drops by half.
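
Taken together with the earlier commits, the detection and rewriting rules can be sketched as two separate functions. This is an approximation assembled from the commit messages, not the code in torc-server.rs, and the names `is_in_memory` and `effective_url` are hypothetical.

```rust
/// Detection: exact match on the two bare spellings, plus any explicit
/// shared-cache style URL containing `file::memory:`.
fn is_in_memory(url: &str) -> bool {
    matches!(url, ":memory:" | "sqlite::memory:") || url.contains("file::memory:")
}

/// Rewriting: only the bare spellings are replaced with the shared-cache
/// form; anything else the user supplied is passed through verbatim.
fn effective_url(url: &str) -> String {
    match url {
        ":memory:" | "sqlite::memory:" => "sqlite:file::memory:?cache=shared".to_string(),
        other => other.to_string(),
    }
}
```

Keeping the exact match for the bare forms is what avoids misclassifying an on-disk path such as `sqlite:/scratch/run:memory:.db`, which contains `:memory:` but is an ordinary file.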

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comment thread src/server/http_server/bootstrap.rs
Comment thread src/server/http_server/bootstrap.rs
Comment thread src/cli.rs Outdated
daniel-thom and others added 2 commits April 25, 2026 08:31
Exercises `torc -s --in-memory --snapshot-interval-seconds N exec` on a
Slurm compute node (the example in run-inline-commands.md) and asserts
the snapshotted on-disk torc.db has every subjob completed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Reject `--snapshot-interval-seconds 0` at the CLI layer with a
  `value_parser` range, so users can't silently disable periodic
  snapshots while believing they enabled them. Drop the now-redundant
  `secs > 0` guard in main.rs.
- `snapshot_once` now `create_dir_all`s the snapshot parent before
  `VACUUM INTO`, so a `TORC_SERVER_SNAPSHOT_PATH` pointed at a
  non-existent directory no longer fails the snapshot.
- Register the SIGUSR1 handler before printing `TORC_SERVER_PORT=`,
  so a parent that races between readiness and the snapshot-loop spawn
  can't accidentally terminate the server (default SIGUSR1 disposition
  is terminate).
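
The first two bullets are simple guards, sketched here with the standard library only. `parse_interval` is a hand-written stand-in for the CLI-level range check (in clap 4 this is typically written as `value_parser!(u64).range(1..)`, though the exact spelling used in src/cli.rs is not shown here), and `ensure_snapshot_parent` is a hypothetical helper illustrating the `create_dir_all` step.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Rejects a zero interval instead of silently disabling periodic snapshots.
fn parse_interval(raw: &str) -> Result<u64, String> {
    let secs = raw
        .parse::<u64>()
        .map_err(|e| format!("invalid interval {raw:?}: {e}"))?;
    if secs == 0 {
        return Err("interval must be at least 1 second".to_string());
    }
    Ok(secs)
}

/// Creates the snapshot's parent directory if the path has one, so a
/// snapshot path inside a not-yet-existing directory does not fail.
fn ensure_snapshot_parent(snapshot: &Path) -> io::Result<()> {
    match snapshot.parent() {
        // A bare file name has an empty parent; there is nothing to create.
        Some(parent) if !parent.as_os_str().is_empty() => fs::create_dir_all(parent),
        _ => Ok(()),
    }
}
```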

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@daniel-thom daniel-thom merged commit 5260448 into main Apr 25, 2026
9 checks passed
@daniel-thom daniel-thom deleted the feat/in-memory-db branch April 25, 2026 15:07
2 participants