Support an ephemeral torc server and batched execution of jobs without a pre-existing workflow#271
daniel-thom merged 9 commits into main from
Conversation
Introduces `torc exec` for synthesizing workflows from ad-hoc shell commands (with parameter expansion and per-job CPU/memory monitoring) and `torc --standalone` (`-s`) for spawning an ephemeral torc-server so users can run workflows without an existing server. `torc -s exec` composes the two for the common case: monitor one command or run a parallel queue in one invocation, persisting the workflow in `./torc_output/torc.db` for later inspection. Parameter expansion gains an `@file.txt` form for reading values from disk.

Docs and the CLI cheat sheet now feature the inline-commands workflow, and integration tests cover the standalone lifecycle (startup, DB persistence, subprocess reaping, no-op for local commands) and exec end-to-end (single/multi-command, commands file, param product/zip/range, name/description, error paths, time series + plots).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
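The "param product" expansion mentioned above can be sketched as a cartesian product of parameter values substituted into a command template. This is an illustrative sketch, not torc's actual code; the function name `expand_product` and the `{name}` placeholder syntax are assumptions for the demo.

```rust
use std::collections::BTreeMap;

/// Expand a command template against every combination of parameter values
/// (product semantics). Placeholders look like `{name}` in this sketch.
fn expand_product(template: &str, params: &BTreeMap<String, Vec<String>>) -> Vec<String> {
    let mut commands = vec![template.to_string()];
    for (name, values) in params {
        let placeholder = format!("{{{name}}}");
        let mut next = Vec::new();
        for cmd in &commands {
            for v in values {
                // Each existing command fans out once per value of this parameter.
                next.push(cmd.replace(&placeholder, v));
            }
        }
        commands = next;
    }
    commands
}

fn main() {
    let mut params = BTreeMap::new();
    params.insert("i".to_string(), vec!["1".to_string(), "2".to_string()]);
    params.insert("f".to_string(), vec!["a.txt".to_string(), "b.txt".to_string()]);
    let cmds = expand_product("process {f} --index {i}", &params);
    // 2 values x 2 values -> 4 synthesized jobs.
    assert_eq!(cmds.len(), 4);
    println!("{cmds:?}");
}
```

Zip expansion would instead pair values index-by-index, and range expansion would generate a numeric sequence before feeding the same substitution step.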
Pull request overview
This PR adds a new “standalone” execution path for the torc CLI that can spawn an ephemeral local torc-server, and introduces a new torc exec subcommand to synthesize and run workflows from inline commands (including parameter expansion), with supporting tests and documentation.
Changes:
- Add `-s`/`--standalone` mode to spawn/tear down a local `torc-server` and route CLI calls to it.
- Implement `torc exec` to build a `WorkflowSpec` from `-c`/`-C`/`--param` inputs and run it locally via existing job-running infrastructure.
- Extend parameter parsing to support `@file`-backed parameter lists; add end-to-end integration tests and docs for the new UX.
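The `@file`-backed parameter list can be sketched as follows. This is a minimal illustration, not torc's `parameter_expansion` module: the function name `read_param_values`, the blank-line skipping, and the inline comma-separated fallback are assumptions for the demo.

```rust
use std::fs;

/// If a raw parameter value starts with `@`, treat the rest as a file path
/// and read one value per non-empty line; otherwise split inline on commas.
fn read_param_values(raw: &str) -> std::io::Result<Vec<String>> {
    if let Some(path) = raw.strip_prefix('@') {
        Ok(fs::read_to_string(path)?
            .lines()
            .map(str::trim)
            .filter(|l| !l.is_empty()) // skip blank lines
            .map(String::from)
            .collect())
    } else {
        Ok(raw.split(',').map(|s| s.trim().to_string()).collect())
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("torc_param_demo.txt");
    fs::write(&path, "a.txt\nb.txt\n\n")?;
    let vals = read_param_values(&format!("@{}", path.display()))?;
    assert_eq!(vals, vec!["a.txt", "b.txt"]);
    fs::remove_file(&path)?;
    Ok(())
}
```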
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `src/main.rs` | Spawns and wires an ephemeral torc-server for `--standalone`; dispatches the new `Exec` command. |
| `src/cli.rs` | Adds CLI flags for standalone mode and defines the `exec` subcommand interface/help text. |
| `src/lib.rs` | Exposes the new `exec_cmd` module behind the `client` feature. |
| `src/exec_cmd.rs` | Implements `torc exec` workflow synthesis, parameter expansion, monitoring config, and execution. |
| `src/client/parameter_expansion.rs` | Adds `@file` parsing for parameter values plus unit tests. |
| `tests/common.rs` | Adds helpers to build binaries once and run `torc -s ...` subprocess-based tests. |
| `tests/test_standalone.rs` | New integration tests covering standalone server lifecycle and DB behaviors. |
| `tests/test_exec_cmd.rs` | New integration tests covering `torc exec` behaviors, error paths, and help output. |
| `docs/src/introduction.md` | Mentions inline execution as a key feature. |
| `docs/src/core/reference/cli-cheatsheet.md` | Adds a cheat-sheet section for `torc exec`. |
| `docs/src/core/how-to/run-inline-commands.md` | New how-to guide documenting `torc exec` usage. |
| `docs/src/core/how-to/index.md` | Adds a how-to index entry for inline commands. |
| `docs/src/SUMMARY.md` | Registers the new how-to page in the docs navigation. |
Fixes Copilot review comments on PR #271:

- Pass the raw DB path to `torc-server --database`; torc-server itself prefixes `sqlite:`, so the previous `sqlite:<path>` argument would become `sqlite:sqlite:<path>` (#5).
- Drain torc-server stderr in a dedicated thread rather than piping and ignoring it or inheriting the fd (#2). Inheriting shares the fd with the child, which prevented `Output::wait` from returning when `process::exit` orphaned the server.
- Keep draining torc-server stdout for the lifetime of the child. Previously the drainer exited after the port handshake, leaving the pipe buffer able to fill (#7).
- `wait()` after `kill()` on every early-return path in `start_standalone_server` so zombies aren't left behind on Unix (#8).
- Correct the misleading SAFETY comment on the `unsafe set_var` call in main (#3).

Tie the torc-server subprocess lifetime to the torc client via a parent-death pipe, so standalone cleanup no longer depends on `Drop`:

- torc-server gains a hidden `--shutdown-on-stdin-eof` flag; when set, a blocking thread reads stdin and signals graceful shutdown on EOF. The existing Ctrl+C handler and the new trigger are composed in a shared `build_shutdown_future` used by both the HTTP and HTTPS branches.
- `StandaloneServer` now owns the child's stdin write end; when the torc client terminates by any means (normal return, `process::exit`, signal, crash), the kernel closes the pipe, torc-server sees EOF, and it shuts down. `Drop` still kills the child for the fast path.
- Added an integration test for the `process::exit` path.

Test/portability fixes:

- Use `std::env::join_paths` for PATH construction in the test helper (#9).
- Gate the `ps`-based shutdown test with `#[cfg(unix)]` (#6).
- Fix a misleading comment in the commands-file fixture: indented `#` lines are stripped after `trim()` (#10).
- Drop the unused `torc_server_binary_path` import and placeholder (#11).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
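The parent-death-pipe mechanism described in the commit message can be demonstrated in miniature on Unix-like systems: the parent holds the write end of the child's stdin, and when that end is dropped (or the parent dies), the kernel closes the pipe and the child sees EOF. Here `cat` stands in for torc-server's hidden `--shutdown-on-stdin-eof` watcher; this is a sketch of the mechanism, not torc's code.

```rust
use std::process::{Command, Stdio};

/// Spawn a child whose stdin we own, drop the write end, and confirm the
/// child exits on its own once the pipe delivers EOF.
fn child_exits_on_stdin_eof() -> std::io::Result<bool> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()?;

    // Dropping the write end closes the pipe; `cat` reads EOF and exits.
    // If the parent crashed instead, the kernel would close it the same way.
    drop(child.stdin.take());

    Ok(child.wait()?.success())
}

fn main() {
    assert!(child_exits_on_stdin_eof().expect("failed to spawn child"));
}
```

The appeal of this design is exactly what the commit notes: cleanup works for every parent-exit path (normal return, `process::exit`, signal, crash) because the kernel, not user code, closes the pipe.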
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
- `detect_spec_file_in_trailing` now only matches on known spec extensions
(yaml/yml/json/json5/kdl). Previously any existing file path triggered a
`torc run` suggestion, so `torc exec commands.txt` (user meant `-C`) was
misdirected. Add a unit test for the non-spec-file case.
- `StandaloneServer::Drop` now performs a graceful shutdown: close the
child's stdin pipe (so the server sees EOF and drains connections), poll
`try_wait` for up to 2s, and only fall back to `kill()` if the server
hasn't exited in time. Protects against interrupting in-flight SQLite
writes on normal exit.
- Remove the `unsafe std::env::set_var("TORC_API_URL", ...)` call in main.
It was unsound (ran after start_standalone_server spawned threads) and
redundant: every subprocess torc spawns (job_runner, async_cli_command,
torc-dash, etc.) already passes TORC_API_URL explicitly at spawn time.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
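The graceful-shutdown behavior and the narrowed spec-file detection described in this commit can each be sketched briefly. First, the `Drop`-path shutdown: close stdin, poll `try_wait` for up to 2 s, and only fall back to `kill()`. This is an illustrative standalone function, not the exact `StandaloneServer::Drop` implementation; `cat` stands in for the server.

```rust
use std::process::{Child, Command, Stdio};
use std::time::{Duration, Instant};

/// Close stdin so the child sees EOF, give it up to 2s to exit on its own,
/// then kill and reap as a last resort. Returns true on a graceful exit.
fn graceful_shutdown(child: &mut Child) -> bool {
    drop(child.stdin.take()); // server sees EOF and drains connections
    let deadline = Instant::now() + Duration::from_secs(2);
    while Instant::now() < deadline {
        match child.try_wait() {
            Ok(Some(_)) => return true, // exited gracefully
            Ok(None) => std::thread::sleep(Duration::from_millis(50)),
            Err(_) => break,
        }
    }
    let _ = child.kill(); // fallback: interrupting in-flight writes is now unavoidable
    let _ = child.wait(); // reap so no zombie is left behind
    false
}

fn main() {
    // `cat` exits as soon as its stdin reaches EOF, mimicking the server.
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::null())
        .spawn()
        .expect("failed to spawn child");
    assert!(graceful_shutdown(&mut child));
}
```

Polling `try_wait` instead of a blocking `wait` is what lets the 2 s timeout and the `kill()` fallback coexist in `Drop`, where blocking indefinitely would hang the client on a wedged server.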
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
- Standalone client uses `127.0.0.1` (matches the bind address) instead of `localhost`, which can resolve to `::1` first on dual-stack systems and fail to connect.
- `torc exec --sample-interval-seconds` now takes `u32` with a `range(1..)` parser, so 0 and negative values are rejected at parse time instead of reaching the resource monitor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
```rust
    value_parser = clap::value_parser!(u32).range(1..)
)]
sample_interval_seconds: Option<u32>,
```
The CLI currently allows arbitrarily large --sample-interval-seconds values (u32 range 1..), but the value is stored in the workflow spec as an i32 (resource_monitor.sample_interval_seconds). This can overflow/wrap when converting and produce a negative interval. Please cap the parser to 1..=i32::MAX (or switch the arg type to i32 with an explicit range) to match the spec type.
```diff
-    value_parser = clap::value_parser!(u32).range(1..)
+    value_parser = clap::value_parser!(i32).range(1..=i32::MAX as i64)
 )]
-sample_interval_seconds: Option<u32>,
+sample_interval_seconds: Option<i32>,
```
```markdown
- **Inline Commands with `torc exec`** — Run one command (or a queue of them) with CPU/memory
  monitoring and parallelism control, no spec file or running server required
  ([how-to](./core/how-to/run-inline-commands.md))
```
This bullet implies torc exec alone requires no running server, but in the implementation that’s only true when paired with -s/--standalone (otherwise exec still talks to the configured API URL). Consider rewording to torc -s exec or explicitly mentioning the --standalone requirement to avoid confusing new users.
```diff
-- **Inline Commands with `torc exec`** — Run one command (or a queue of them) with CPU/memory
-  monitoring and parallelism control, no spec file or running server required
-  ([how-to](./core/how-to/run-inline-commands.md))
+- **Inline Commands with `torc -s exec`** — Run one command (or a queue of them) in standalone
+  mode with CPU/memory monitoring and parallelism control, no spec file or running server
+  required ([how-to](./core/how-to/run-inline-commands.md))
```
```rust
/// Run the exec command. Returns on workflow completion; exits the process on errors.
pub fn run(args: ExecArgs, config: &Configuration, user: &str) {
    // Detect the `torc exec <file>` mistake and redirect users to `torc run`.
    if let Some(hint) = detect_spec_file_in_trailing(&args.trailing) {
```
Nice UX guard here. One edge case, though: this runs before delimiter handling, so `torc exec -- cat workflow.yaml` gets treated like a mistaken spec-file call. Could we only apply this hint in the non-`--` path?
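The extension-based narrowing of this guard (described in the commit notes earlier in the thread: only yaml/yml/json/json5/kdl trigger the `torc run` hint) can be sketched like this. The helper name `looks_like_spec_file` is illustrative; the real `detect_spec_file_in_trailing` presumably also checks that the path exists and builds a hint message.

```rust
use std::path::Path;

/// Extensions that identify a workflow spec, per the PR's fix: anything else
/// (e.g. `commands.txt`) must not be misread as a `torc run` mistake.
const SPEC_EXTENSIONS: [&str; 5] = ["yaml", "yml", "json", "json5", "kdl"];

fn looks_like_spec_file(arg: &str) -> bool {
    Path::new(arg)
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |ext| {
            // Compare case-insensitively so `Workflow.YAML` still matches.
            SPEC_EXTENSIONS.contains(&ext.to_ascii_lowercase().as_str())
        })
}

fn main() {
    assert!(looks_like_spec_file("workflow.yaml"));
    assert!(looks_like_spec_file("spec.JSON5"));
    // Previously any existing file triggered the hint; now plain text files
    // fall through to normal exec handling (the user likely meant `-C`).
    assert!(!looks_like_spec_file("commands.txt"));
    assert!(!looks_like_spec_file("script"));
}
```

Combining this with the `--` fix discussed above would mean skipping the check entirely whenever the trailing args follow an explicit delimiter.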
```rust
    .as_ref()
    .map(|j| j.granularity.clone())
    .unwrap_or(MonitorGranularity::Summary),
sample_interval_seconds: sample_interval_seconds.map(|v| v as i32).unwrap_or(10),
```
Small type gotcha: sample_interval_seconds is u32, but this cast to i32 can overflow on large inputs and go negative. Might be safer to clamp/validate at parse time (or keep an unsigned type through the model).
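The wrap this comment warns about is easy to show concretely: `as i32` silently wraps values above `i32::MAX` negative, whereas `i32::try_from` surfaces the overflow so it can be clamped or rejected. Standalone demo, not torc code; `to_interval` is a hypothetical helper illustrating the clamp option.

```rust
/// Checked conversion that clamps instead of wrapping; rejecting the value
/// at parse time (as the PR ultimately does) is the other valid fix.
fn to_interval(seconds: u32) -> i32 {
    i32::try_from(seconds).unwrap_or(i32::MAX)
}

fn main() {
    // The raw cast wraps negative for values above i32::MAX...
    assert_eq!(u32::MAX as i32, -1);
    assert_eq!(3_000_000_000u32 as i32, -1_294_967_296);
    // ...while the checked conversion stays in range.
    assert_eq!(to_interval(10), 10);
    assert_eq!(to_interval(u32::MAX), i32::MAX);
}
```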
```shell
torc -s results list              # most recent run's jobs + metrics
torc -s workflows list            # every workflow in the standalone DB
torc -s jobs list <workflow_id>   # job statuses for a specific workflow
torc -s tui                       # interactive browser
```
Tiny docs mismatch: this shows torc -s tui, but top-level -s is currently ignored for tui. Should this be torc tui --standalone (or should CLI behavior be aligned)?
Adds a new HPC how-to covering the sbatch patterns enabled by the ephemeral torc-server in #271:

- `torc -s run workflow.yaml` for single-node spec-driven jobs
- `torc -s exec -C commands.txt` for single-node ad-hoc command queues
- How to inspect the persisted database from the login node afterwards

Also documents the multi-node alternative (a persistent torc-server bound to a routable hostname + `srun`) and explains the standalone vs. multi-node tradeoff up front so users pick the right pattern. Wired into SUMMARY.md and the HPC section index.
* Address pesap review feedback on #271
  - exec_cmd: skip `detect_spec_file_in_trailing` when `--` was used so `torc exec -- cat workflow.yaml` is run as a shell command instead of being flagged as a `torc run` mistake. Added a regression test.
  - cli: switch `--sample-interval-seconds` from `u32` (range `1..`) to `i32` (range `1..`) so the value can't wrap negative when assigned to `ResourceMonitorConfig::sample_interval_seconds`. Dropped the `as i32` cast in exec_cmd accordingly.
  - docs: fix the `torc -s tui` example (the top-level `-s` is ignored for `tui`; the working invocation is `torc tui --standalone`). Move the `torc exec` how-to out of the first slot in SUMMARY / how-to index / introduction Key Features since it's an alternative entry point, not the main workflow.
* Document self-contained Slurm jobs using standalone mode
A user can run `torc -s exec -c "bash work.sh 1" -c "bash work.sh 2"` and get resource monitoring (rmon).