Support an ephemeral torc server and batched execution of jobs without a pre-existing workflow#271
daniel-thom merged 9 commits into main from
Conversation
Introduces `torc exec` for synthesizing workflows from ad-hoc shell commands (with parameter expansion and per-job CPU/memory monitoring) and `torc --standalone` (`-s`) for spawning an ephemeral torc-server so users can run workflows without an existing server. `torc -s exec` composes the two for the common case: monitor one command or run a parallel queue in one invocation, persisting the workflow in `./torc_output/torc.db` for later inspection. Parameter expansion gains an `@file.txt` form for reading values from disk.

Docs and the CLI cheat sheet now feature the inline-commands workflow, and integration tests cover the standalone lifecycle (startup, DB persistence, subprocess reaping, no-op for local commands) and exec end-to-end (single/multi-command, commands file, param product/zip/range, name/description, error paths, time series + plots).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
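The "param product" expansion mentioned above can be sketched as a cartesian product of parameter values substituted into a command template. This is an illustrative sketch, not torc's actual code; the function name `expand_product` and the `{name}` placeholder syntax are assumptions for the demo.

```rust
use std::collections::BTreeMap;

/// Expand a command template against every combination of parameter values
/// (product semantics). Placeholders look like `{name}` in this sketch.
fn expand_product(template: &str, params: &BTreeMap<String, Vec<String>>) -> Vec<String> {
    let mut commands = vec![template.to_string()];
    for (name, values) in params {
        let placeholder = format!("{{{name}}}");
        let mut next = Vec::new();
        for cmd in &commands {
            for v in values {
                // Each existing command fans out once per value of this parameter.
                next.push(cmd.replace(&placeholder, v));
            }
        }
        commands = next;
    }
    commands
}

fn main() {
    let mut params = BTreeMap::new();
    params.insert("i".to_string(), vec!["1".to_string(), "2".to_string()]);
    params.insert("f".to_string(), vec!["a.txt".to_string(), "b.txt".to_string()]);
    let cmds = expand_product("process {f} --index {i}", &params);
    // 2 values x 2 values -> 4 synthesized jobs.
    assert_eq!(cmds.len(), 4);
    println!("{cmds:?}");
}
```

Zip expansion would instead pair values index-by-index, and range expansion would generate a numeric sequence before feeding the same substitution step.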
Pull request overview
This PR adds a new “standalone” execution path for the torc CLI that can spawn an ephemeral local torc-server, and introduces a new torc exec subcommand to synthesize and run workflows from inline commands (including parameter expansion), with supporting tests and documentation.
Changes:
- Add `-s`/`--standalone` mode to spawn/tear down a local `torc-server` and route CLI calls to it.
- Implement `torc exec` to build a `WorkflowSpec` from `-c`/`-C`/`--param` inputs and run it locally via existing job-running infrastructure.
- Extend parameter parsing to support `@file`-backed parameter lists; add end-to-end integration tests and docs for the new UX.
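The `@file`-backed parameter list can be sketched as follows. This is a minimal illustration, not torc's `parameter_expansion` module: the function name `read_param_values`, the blank-line skipping, and the inline comma-separated fallback are assumptions for the demo.

```rust
use std::fs;

/// If a raw parameter value starts with `@`, treat the rest as a file path
/// and read one value per non-empty line; otherwise split inline on commas.
fn read_param_values(raw: &str) -> std::io::Result<Vec<String>> {
    if let Some(path) = raw.strip_prefix('@') {
        Ok(fs::read_to_string(path)?
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty()) // skip blank lines
            .map(String::from)
            .collect())
    } else {
        Ok(raw.split(',').map(|s| s.trim().to_string()).collect())
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("torc_param_demo.txt");
    fs::write(&path, "a.txt\nb.txt\n\n")?;
    let vals = read_param_values(&format!("@{}", path.display()))?;
    assert_eq!(vals, vec!["a.txt", "b.txt"]);
    fs::remove_file(&path)?;
    Ok(())
}
```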
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `src/main.rs` | Spawns and wires an ephemeral torc-server for `--standalone`; dispatches the new `Exec` command. |
| `src/cli.rs` | Adds CLI flags for standalone mode and defines the `exec` subcommand interface/help text. |
| `src/lib.rs` | Exposes the new `exec_cmd` module behind the `client` feature. |
| `src/exec_cmd.rs` | Implements `torc exec` workflow synthesis, parameter expansion, monitoring config, and execution. |
| `src/client/parameter_expansion.rs` | Adds `@file` parsing for parameter values plus unit tests. |
| `tests/common.rs` | Adds helpers to build binaries once and run `torc -s ...` subprocess-based tests. |
| `tests/test_standalone.rs` | New integration tests covering standalone server lifecycle and DB behaviors. |
| `tests/test_exec_cmd.rs` | New integration tests covering `torc exec` behaviors, error paths, and help output. |
| `docs/src/introduction.md` | Mentions inline execution as a key feature. |
| `docs/src/core/reference/cli-cheatsheet.md` | Adds a cheat-sheet section for `torc exec`. |
| `docs/src/core/how-to/run-inline-commands.md` | New how-to guide documenting `torc exec` usage. |
| `docs/src/core/how-to/index.md` | Adds a how-to index entry for inline commands. |
| `docs/src/SUMMARY.md` | Registers the new how-to page in the docs navigation. |
Fixes Copilot review comments on PR #271:

- Pass the raw DB path to `torc-server --database`; torc-server itself prefixes `sqlite:`, so the previous `sqlite:<path>` argument would become `sqlite:sqlite:<path>` (#5).
- Drain torc-server stderr in a dedicated thread rather than piping and ignoring it or inheriting the fd (#2). Inheriting shares the fd with the child, which prevented `Output::wait` from returning when `process::exit` orphaned the server.
- Keep draining torc-server stdout for the lifetime of the child. Previously the drainer exited after the port handshake, leaving the pipe buffer able to fill (#7).
- `wait()` after `kill()` on every early-return path in `start_standalone_server` so zombies aren't left behind on Unix (#8).
- Correct the misleading SAFETY comment on the `unsafe set_var` call in main (#3).

Tie the torc-server subprocess lifetime to the torc client via a parent-death pipe, so standalone cleanup no longer depends on `Drop`:

- torc-server gains a hidden `--shutdown-on-stdin-eof` flag; when set, a blocking thread reads stdin and signals graceful shutdown on EOF. The existing Ctrl+C handler and the new trigger are composed in a shared `build_shutdown_future` used by both the HTTP and HTTPS branches.
- `StandaloneServer` now owns the child's stdin write end; when the torc client terminates by any means (normal return, `process::exit`, signal, crash), the kernel closes the pipe, torc-server sees EOF, and it shuts down. `Drop` still kills the child for the fast path.
- Added an integration test for the `process::exit` path.

Test/portability fixes:

- Use `std::env::join_paths` for PATH construction in the test helper (#9).
- Gate the `ps`-based shutdown test with `#[cfg(unix)]` (#6).
- Fix a misleading comment in the commands-file fixture: indented `#` lines are stripped after `trim()` (#10).
- Drop the unused `torc_server_binary_path` import and placeholder (#11).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
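The parent-death-pipe mechanism described in the commit message can be demonstrated in miniature on Unix-like systems: the parent holds the write end of the child's stdin, and when that end is dropped (or the parent dies), the kernel closes the pipe and the child sees EOF. Here `cat` stands in for torc-server's hidden `--shutdown-on-stdin-eof` watcher; this is a sketch of the mechanism, not torc's code.

```rust
use std::process::{Command, Stdio};

/// Spawn a child whose stdin we own, drop the write end, and confirm the
/// child exits on its own once the pipe delivers EOF.
fn child_exits_on_stdin_eof() -> std::io::Result<bool> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()?;

    // Dropping the write end closes the pipe; `cat` reads EOF and exits.
    // If the parent crashed instead, the kernel would close it the same way.
    drop(child.stdin.take());

    Ok(child.wait()?.success())
}

fn main() {
    assert!(child_exits_on_stdin_eof().expect("failed to spawn child"));
}
```

The appeal of this design is exactly what the commit notes: cleanup works for every parent-exit path (normal return, `process::exit`, signal, crash) because the kernel, not user code, closes the pipe.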
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
- `detect_spec_file_in_trailing` now only matches on known spec extensions
(yaml/yml/json/json5/kdl). Previously any existing file path triggered a
`torc run` suggestion, so `torc exec commands.txt` (user meant `-C`) was
misdirected. Add a unit test for the non-spec-file case.
- `StandaloneServer::Drop` now performs a graceful shutdown: close the
child's stdin pipe (so the server sees EOF and drains connections), poll
`try_wait` for up to 2s, and only fall back to `kill()` if the server
hasn't exited in time. Protects against interrupting in-flight SQLite
writes on normal exit.
- Remove the `unsafe std::env::set_var("TORC_API_URL", ...)` call in main.
It was unsound (ran after start_standalone_server spawned threads) and
redundant: every subprocess torc spawns (job_runner, async_cli_command,
torc-dash, etc.) already passes TORC_API_URL explicitly at spawn time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
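The graceful-shutdown behavior and the narrowed spec-file detection described in this commit can each be sketched briefly. First, the `Drop`-path shutdown: close stdin, poll `try_wait` for up to 2 s, and only fall back to `kill()`. This is an illustrative standalone function, not the exact `StandaloneServer::Drop` implementation; `cat` stands in for the server.

```rust
use std::process::{Child, Command, Stdio};
use std::time::{Duration, Instant};

/// Close stdin so the child sees EOF, give it up to 2s to exit on its own,
/// then kill and reap as a last resort. Returns true on a graceful exit.
fn graceful_shutdown(child: &mut Child) -> bool {
    drop(child.stdin.take()); // server sees EOF and drains connections
    let deadline = Instant::now() + Duration::from_secs(2);
    while Instant::now() < deadline {
        match child.try_wait() {
            Ok(Some(_)) => return true, // exited gracefully
            Ok(None) => std::thread::sleep(Duration::from_millis(50)),
            Err(_) => break,
        }
    }
    let _ = child.kill(); // fallback: interrupting in-flight writes is now unavoidable
    let _ = child.wait(); // reap so no zombie is left behind
    false
}

fn main() {
    // `cat` exits as soon as its stdin reaches EOF, mimicking the server.
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()
        .expect("failed to spawn child");
    assert!(graceful_shutdown(&mut child));
}
```

Polling `try_wait` instead of a blocking `wait` is what lets the 2 s timeout and the `kill()` fallback coexist in `Drop`, where blocking indefinitely would hang the client on a wedged server.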
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
- Standalone client uses `127.0.0.1` (matches the bind address) instead of `localhost`, which can resolve to `::1` first on dual-stack systems and fail to connect.
- `torc exec --sample-interval-seconds` now takes `u32` with a `range(1..)` parser, so 0 and negative values are rejected at parse time instead of reaching the resource monitor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
```rust
    value_parser = clap::value_parser!(u32).range(1..)
)]
sample_interval_seconds: Option<u32>,
```
The CLI currently allows arbitrarily large --sample-interval-seconds values (u32 range 1..), but the value is stored in the workflow spec as an i32 (resource_monitor.sample_interval_seconds). This can overflow/wrap when converting and produce a negative interval. Please cap the parser to 1..=i32::MAX (or switch the arg type to i32 with an explicit range) to match the spec type.
```diff
-    value_parser = clap::value_parser!(u32).range(1..)
+    value_parser = clap::value_parser!(i32).range(1..=i32::MAX as i64)
 )]
-sample_interval_seconds: Option<u32>,
+sample_interval_seconds: Option<i32>,
```
```markdown
- **Inline Commands with `torc exec`** — Run one command (or a queue of them) with CPU/memory
  monitoring and parallelism control, no spec file or running server required
  ([how-to](./core/how-to/run-inline-commands.md))
```
This bullet implies torc exec alone requires no running server, but in the implementation that’s only true when paired with -s/--standalone (otherwise exec still talks to the configured API URL). Consider rewording to torc -s exec or explicitly mentioning the --standalone requirement to avoid confusing new users.
```diff
-- **Inline Commands with `torc exec`** — Run one command (or a queue of them) with CPU/memory
-  monitoring and parallelism control, no spec file or running server required
-  ([how-to](./core/how-to/run-inline-commands.md))
+- **Inline Commands with `torc -s exec`** — Run one command (or a queue of them) in standalone
+  mode with CPU/memory monitoring and parallelism control, no spec file or running server
+  required ([how-to](./core/how-to/run-inline-commands.md))
```
```rust
/// Run the exec command. Returns on workflow completion; exits the process on errors.
pub fn run(args: ExecArgs, config: &Configuration, user: &str) {
    // Detect the `torc exec <file>` mistake and redirect users to `torc run`.
    if let Some(hint) = detect_spec_file_in_trailing(&args.trailing) {
```
Nice UX guard here. One edge case, though: this runs before delimiter handling, so `torc exec -- cat workflow.yaml` gets treated like a mistaken spec-file call. Could we only apply this hint in the non-`--` path?
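The extension-based narrowing of this guard (described in the commit notes earlier in the thread: only yaml/yml/json/json5/kdl trigger the `torc run` hint) can be sketched like this. The helper name `looks_like_spec_file` is illustrative; the real `detect_spec_file_in_trailing` presumably also checks that the path exists and builds a hint message.

```rust
use std::path::Path;

/// Extensions that identify a workflow spec, per the PR's fix: anything else
/// (e.g. `commands.txt`) must not be misread as a `torc run` mistake.
const SPEC_EXTENSIONS: [&str; 5] = ["yaml", "yml", "json", "json5", "kdl"];

fn looks_like_spec_file(arg: &str) -> bool {
    Path::new(arg)
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |ext| {
            // Compare case-insensitively so `Workflow.YAML` still matches.
            SPEC_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str())
        })
}

fn main() {
    assert!(looks_like_spec_file("workflow.yaml"));
    assert!(looks_like_spec_file("spec.JSON5"));
    // Previously any existing file triggered the hint; now plain text files
    // fall through to normal exec handling (the user likely meant `-C`).
    assert!(!looks_like_spec_file("commands.txt"));
    assert!(!looks_like_spec_file("script"));
}
```

Combining this with the `--` fix discussed above would mean skipping the check entirely whenever the trailing args follow an explicit delimiter.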
```rust
    .as_ref()
    .map(|j| j.granularity.clone())
    .unwrap_or(MonitorGranularity::Summary),
sample_interval_seconds: sample_interval_seconds.map(|v| v as i32).unwrap_or(10),
```
Small type gotcha: sample_interval_seconds is u32, but this cast to i32 can overflow on large inputs and go negative. Might be safer to clamp/validate at parse time (or keep an unsigned type through the model).
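The wrap this comment warns about is easy to show concretely: `as i32` silently wraps values above `i32::MAX` negative, whereas `i32::try_from` surfaces the overflow so it can be clamped or rejected. Standalone demo, not torc code; `to_interval` is a hypothetical helper illustrating the clamp option.

```rust
/// Checked conversion that clamps instead of wrapping; rejecting the value
/// at parse time (as the PR ultimately does) is the other valid fix.
fn to_interval(seconds: u32) -> i32 {
    i32::try_from(seconds).unwrap_or(i32::MAX)
}

fn main() {
    // The raw cast wraps negative for values above i32::MAX...
    assert_eq!(u32::MAX as i32, -1);
    assert_eq!(3_000_000_000u32 as i32, -1_294_967_296);
    // ...while the checked conversion stays in range.
    assert_eq!(to_interval(10), 10);
    assert_eq!(to_interval(u32::MAX), i32::MAX);
}
```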
```shell
torc -s results list              # most recent run's jobs + metrics
torc -s workflows list            # every workflow in the standalone DB
torc -s jobs list <workflow_id>   # job statuses for a specific workflow
torc -s tui                       # interactive browser
```
Tiny docs mismatch: this shows torc -s tui, but top-level -s is currently ignored for tui. Should this be torc tui --standalone (or should CLI behavior be aligned)?
Adds a new HPC how-to covering the sbatch patterns enabled by the ephemeral torc-server in #271:

- `torc -s run workflow.yaml` for single-node spec-driven jobs
- `torc -s exec -C commands.txt` for single-node ad-hoc command queues
- How to inspect the persisted database from the login node afterwards

Also documents the multi-node alternative (a persistent torc-server bound to a routable hostname + `srun`) and explains the standalone vs. multi-node tradeoff up front so users pick the right pattern. Wired into SUMMARY.md and the HPC section index.
* Address pesap review feedback on #271
  - exec_cmd: skip `detect_spec_file_in_trailing` when `--` was used so `torc exec -- cat workflow.yaml` is run as a shell command instead of being flagged as a `torc run` mistake. Added a regression test.
  - cli: switch `--sample-interval-seconds` from `u32` (range `1..`) to `i32` (range `1..`) so the value can't wrap negative when assigned to `ResourceMonitorConfig::sample_interval_seconds`. Dropped the `as i32` cast in exec_cmd accordingly.
  - docs: fix the `torc -s tui` example (the top-level `-s` is ignored for `tui`; the working invocation is `torc tui --standalone`). Move the `torc exec` how-to out of the first slot in SUMMARY / how-to index / introduction Key Features since it's an alternative entry point, not the main workflow.
* Document self-contained Slurm jobs using standalone mode
A user can run `torc -s exec -c "bash work.sh 1" -c "bash work.sh 2"` and get resource monitoring (rmon).