Skip to content

Support an ephemeral torc server and batched execution of jobs without a pre-existing workflow#271

Merged
daniel-thom merged 9 commits intomainfrom
feat/standalone
Apr 20, 2026
Merged

Support an ephemeral torc server and batched execution of jobs without a pre-existing workflow#271
daniel-thom merged 9 commits intomainfrom
feat/standalone

Conversation

@daniel-thom
Copy link
Copy Markdown
Collaborator

@daniel-thom daniel-thom commented Apr 19, 2026

User can run `torc -s exec -c "bash work.sh 1" -c "bash work.sh 2" and get

  1. Parallel execution, similar to GNU parallel, but with results tracking.
  2. CPU/memory monitoring, both of the job processes and overall compute node. No more need from rmon.

daniel-thom and others added 4 commits April 18, 2026 22:03
Introduces `torc exec` for synthesizing workflows from ad-hoc shell commands
(with parameter expansion and per-job CPU/memory monitoring) and
`torc --standalone` (-s) for spawning an ephemeral torc-server so users can run
workflows without an existing server. `torc -s exec` composes the two for the
common case: monitor one command or run a parallel queue in one invocation,
persisting the workflow in ./torc_output/torc.db for later inspection.

Parameter expansion gains an `@file.txt` form for reading values from disk.
Docs and the CLI cheat sheet now feature the inline-commands workflow, and
integration tests cover the standalone lifecycle (startup, DB persistence,
subprocess reap, no-op for local commands) and exec end-to-end (single/
multi-command, commands-file, param product/zip/range, name/description,
error paths, time-series + plots).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new “standalone” execution path for the torc CLI that can spawn an ephemeral local torc-server, and introduces a new torc exec subcommand to synthesize and run workflows from inline commands (including parameter expansion), with supporting tests and documentation.

Changes:

  • Add -s/--standalone mode to spawn/tear down a local torc-server and route CLI calls to it.
  • Implement torc exec to build a WorkflowSpec from -c/-C/--param inputs and run it locally via existing job-running infrastructure.
  • Extend parameter parsing to support @file-backed parameter lists; add end-to-end integration tests and docs for the new UX.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/main.rs Spawns and wires an ephemeral torc-server for --standalone; dispatches new Exec command.
src/cli.rs Adds CLI flags for standalone mode and defines the exec subcommand interface/help text.
src/lib.rs Exposes the new exec_cmd module behind the client feature.
src/exec_cmd.rs Implements torc exec workflow synthesis, parameter expansion, monitoring config, and execution.
src/client/parameter_expansion.rs Adds @file parsing for parameter values plus unit tests.
tests/common.rs Adds helpers to build binaries once and run torc -s ... subprocess-based tests.
tests/test_standalone.rs New integration tests covering standalone server lifecycle and DB behaviors.
tests/test_exec_cmd.rs New integration tests covering torc exec behaviors, error paths, and help output.
docs/src/introduction.md Mentions inline execution as a key feature.
docs/src/core/reference/cli-cheatsheet.md Adds cheatsheet section for torc exec.
docs/src/core/how-to/run-inline-commands.md New how-to guide documenting torc exec usage.
docs/src/core/how-to/index.md Adds how-to index entry for inline commands.
docs/src/SUMMARY.md Registers the new how-to page in the docs navigation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/client/parameter_expansion.rs
Comment thread src/main.rs
Comment thread src/main.rs Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs
Comment thread tests/common.rs
Comment thread tests/test_exec_cmd.rs Outdated
Comment thread tests/test_exec_cmd.rs Outdated
Fixes Copilot review comments on PR #271:

- Pass raw DB path to `torc-server --database`; torc-server itself prefixes
  `sqlite:`, so the previous `sqlite:<path>` argument would become
  `sqlite:sqlite:<path>` (#5).
- Drain torc-server stderr in a dedicated thread rather than piping and
  ignoring it or inheriting the fd (#2). Inheriting shares the fd with the
  child, which prevented Output::wait from returning when process::exit
  orphaned the server.
- Keep draining torc-server stdout for the lifetime of the child. Previously
  the drainer exited after the port handshake, leaving the pipe buffer able
  to fill (#7).
- `wait()` after `kill()` on every early-return path in
  `start_standalone_server` so zombies aren't left behind on Unix (#8).
- Correct the misleading SAFETY comment on the `unsafe set_var` call in main
  (#3).

Tie the torc-server subprocess lifetime to the torc client via a
parent-death pipe, so standalone cleanup no longer depends on `Drop`:

- torc-server gains a hidden `--shutdown-on-stdin-eof` flag; when set, a
  blocking thread reads stdin and signals graceful shutdown on EOF. The
  existing Ctrl+C handler and the new trigger are composed in a shared
  `build_shutdown_future` used by both the HTTP and HTTPS branches.
- `StandaloneServer` now owns the child's stdin write end; when the torc
  client terminates by any means (normal return, `process::exit`, signal,
  crash), the kernel closes the pipe, torc-server sees EOF, and it shuts
  down. `Drop` still kills the child for the fast path.
- Added integration test for the process::exit path.

Test/portability fixes:

- Use `std::env::join_paths` for PATH construction in the test helper (#9).
- Gate the `ps`-based shutdown test with `#[cfg(unix)]` (#6).
- Fix misleading comment in the commands-file fixture: indented `#` lines
  are stripped after `trim()` (#10).
- Drop unused `torc_server_binary_path` import and placeholder (#11).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/exec_cmd.rs Outdated
Comment thread src/main.rs
Comment thread src/main.rs Outdated
- `detect_spec_file_in_trailing` now only matches on known spec extensions
  (yaml/yml/json/json5/kdl). Previously any existing file path triggered a
  `torc run` suggestion, so `torc exec commands.txt` (user meant `-C`) was
  misdirected. Add a unit test for the non-spec-file case.

- `StandaloneServer::Drop` now performs a graceful shutdown: close the
  child's stdin pipe (so the server sees EOF and drains connections), poll
  `try_wait` for up to 2s, and only fall back to `kill()` if the server
  hasn't exited in time. Protects against interrupting in-flight SQLite
  writes on normal exit.

- Remove the `unsafe std::env::set_var("TORC_API_URL", ...)` call in main.
  It was unsound (ran after start_standalone_server spawned threads) and
  redundant: every subprocess torc spawns (job_runner, async_cli_command,
  torc-dash, etc.) already passes TORC_API_URL explicitly at spawn time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main.rs
Comment thread src/exec_cmd.rs
- Standalone client uses 127.0.0.1 (matches bind) instead of `localhost`,
  which can resolve to ::1 first on dual-stack systems and fail to connect.
- `torc exec --sample-interval-seconds` now takes u32 with a range(1..)
  parser, so 0 and negative values are rejected at parse time instead of
  reaching the resource monitor.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/cli.rs
Comment on lines +345 to +347
value_parser = clap::value_parser!(u32).range(1..)
)]
sample_interval_seconds: Option<u32>,
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI currently allows arbitrarily large --sample-interval-seconds values (u32 range 1..), but the value is stored in the workflow spec as an i32 (resource_monitor.sample_interval_seconds). This can overflow/wrap when converting and produce a negative interval. Please cap the parser to 1..=i32::MAX (or switch the arg type to i32 with an explicit range) to match the spec type.

Suggested change
value_parser = clap::value_parser!(u32).range(1..)
)]
sample_interval_seconds: Option<u32>,
value_parser = clap::value_parser!(i32).range(1..=i32::MAX as i64)
)]
sample_interval_seconds: Option<i32>,

Copilot uses AI. Check for mistakes.
Comment thread docs/src/introduction.md Outdated
Comment on lines +42 to +44
- **Inline Commands with `torc exec`** — Run one command (or a queue of them) with CPU/memory
monitoring and parallelism control, no spec file or running server required
([how-to](./core/how-to/run-inline-commands.md))
Copy link

Copilot AI Apr 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bullet implies torc exec alone requires no running server, but in the implementation that’s only true when paired with -s/--standalone (otherwise exec still talks to the configured API URL). Consider rewording to torc -s exec or explicitly mentioning the --standalone requirement to avoid confusing new users.

Suggested change
- **Inline Commands with `torc exec`** — Run one command (or a queue of them) with CPU/memory
monitoring and parallelism control, no spec file or running server required
([how-to](./core/how-to/run-inline-commands.md))
- **Inline Commands with `torc -s exec`** — Run one command (or a queue of them) in standalone
mode with CPU/memory monitoring and parallelism control, no spec file or running server
required ([how-to](./core/how-to/run-inline-commands.md))

Copilot uses AI. Check for mistakes.
Comment thread src/exec_cmd.rs
@daniel-thom daniel-thom merged commit 6841eff into main Apr 20, 2026
9 checks passed
@daniel-thom daniel-thom deleted the feat/standalone branch April 20, 2026 19:27
Comment thread src/exec_cmd.rs
/// Run the exec command. Returns on workflow completion; exits the process on errors.
pub fn run(args: ExecArgs, config: &Configuration, user: &str) {
// Detect the `torc exec <file>` mistake and redirect users to `torc run`.
if let Some(hint) = detect_spec_file_in_trailing(&args.trailing) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice UX guard here. One edge case though: this runs before delimiter handling, so torc exec -- cat workflow.yaml gets treated like a mistaken spec-file call. Could we only apply this hint in the non--- path?

Comment thread src/exec_cmd.rs
.as_ref()
.map(|j| j.granularity.clone())
.unwrap_or(MonitorGranularity::Summary),
sample_interval_seconds: sample_interval_seconds.map(|v| v as i32).unwrap_or(10),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small type gotcha: sample_interval_seconds is u32, but this cast to i32 can overflow on large inputs and go negative. Might be safer to clamp/validate at parse time (or keep an unsigned type through the model).

torc -s results list # most recent run's jobs + metrics
torc -s workflows list # every workflow in the standalone DB
torc -s jobs list <workflow_id> # job statuses for a specific workflow
torc -s tui # interactive browser
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny docs mismatch: this shows torc -s tui, but top-level -s is currently ignored for tui. Should this be torc tui --standalone (or should CLI behavior be aligned)?

daniel-thom added a commit that referenced this pull request Apr 20, 2026
Adds a new HPC how-to covering the sbatch patterns enabled by the
ephemeral torc-server in #271:

- `torc -s run workflow.yaml` for single-node spec-driven jobs
- `torc -s exec -C commands.txt` for single-node ad-hoc command queues
- How to inspect the persisted database from the login node afterwards

Also documents the multi-node alternative (persistent torc-server bound
to a routable hostname + `srun`) and explains the standalone vs
multi-node tradeoff up front so users pick the right pattern.

Wired into SUMMARY.md and the HPC section index.
daniel-thom added a commit that referenced this pull request Apr 21, 2026
* Address pesap review feedback on #271

- exec_cmd: skip `detect_spec_file_in_trailing` when `--` was used so
  `torc exec -- cat workflow.yaml` is run as a shell command instead of
  being flagged as a `torc run` mistake. Added a regression test.
- cli: switch `--sample-interval-seconds` from `u32` (range `1..`) to
  `i32` (range `1..`) so the value can't wrap negative when assigned to
  `ResourceMonitorConfig::sample_interval_seconds`. Dropped the
  `as i32` cast in exec_cmd accordingly.
- docs: fix `torc -s tui` example (the top-level `-s` is ignored for
  `tui`; the working invocation is `torc tui --standalone`). Move the
  `torc exec` how-to out of the first slot in SUMMARY / how-to index /
  introduction Key Features since it's an alternative entry point, not
  the main workflow.

* Document self-contained Slurm jobs using standalone mode

Adds a new HPC how-to covering the sbatch patterns enabled by the
ephemeral torc-server in #271:

- `torc -s run workflow.yaml` for single-node spec-driven jobs
- `torc -s exec -C commands.txt` for single-node ad-hoc command queues
- How to inspect the persisted database from the login node afterwards

Also documents the multi-node alternative (persistent torc-server bound
to a routable hostname + `srun`) and explains the standalone vs
multi-node tradeoff up front so users pick the right pattern.

Wired into SUMMARY.md and the HPC section index.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants