Add start-one-worker-per-node to interactive recovery #245
Merged
daniel-thom merged 9 commits into main on Apr 2, 2026
Allow the user to specify start_one_worker_per_node for multi-node Slurm allocations in the torc recover command.
The generated Python and Julia clients were incorrect for the get_pending_actions API command.
Pull request overview
This PR updates Torc’s recovery workflow UX and OpenAPI artifacts, while also improving Slurm partition reporting and expanding the YAML examples suite.
Changes:
- Add `start_one_worker_per_node` support to the interactive `torc recover` flow and propagate it to `slurm schedule-nodes`.
- Fix Slurm partition reporting by introducing `resolved_partition` in scheduler planning and using it for analysis/state queries.
- Correct `get_pending_actions` OpenAPI parameter location/encoding, regenerate clients, and add CI parity checks for generated clients.
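To make the parameter change concrete, here is a minimal sketch of how a list-valued query parameter is serialized under the "multi" collection format (repeated keys), contrasted with a comma-joined form. It uses only Python's standard library rather than the generated Torc client, and the trigger type values are hypothetical placeholders, not values taken from the Torc API.

```python
from urllib.parse import parse_qs, urlencode


def encode_multi(name, values):
    """Encode a list as repeated query keys (OpenAPI 'multi' / exploded form)."""
    return urlencode([(name, value) for value in values])


def encode_csv(name, values):
    """Encode a list as one comma-separated value, shown only for contrast."""
    return urlencode({name: ",".join(values)})


# Hypothetical trigger type values, for illustration only.
triggers = ["on_start", "on_complete"]

multi = encode_multi("trigger_type", triggers)
csv = encode_csv("trigger_type", triggers)

print(multi)  # trigger_type=on_start&trigger_type=on_complete
print(csv)    # trigger_type=on_start%2Con_complete

# A server-side parser recovers the full list from the repeated keys.
print(parse_qs(multi)["trigger_type"])

# An optional parameter with no values contributes nothing to the query string.
print(repr(encode_multi("trigger_type", [])))
```

With repeated keys, omitting the parameter entirely is a natural way to express "no filter", which fits a parameter that is now optional.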
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/server/live_router.rs | Marks pending-actions params as query params for OpenAPI generation. |
| src/client/scheduler_plan.rs | Adds resolved_partition to planned schedulers and populates it during plan generation. |
| src/client/commands/slurm.rs | Uses resolved_partition for allocation analysis and partition state queries. |
| src/client/commands/recover.rs | Interactive recovery now prompts for and forwards --start-one-worker-per-node. |
| src/client/apis/workflow_actions_api.rs | Updates Rust client query serialization for trigger_type (multi). |
| python_client/src/torc/openapi_client/api/workflow_actions_api.py | Makes trigger_type optional and switches collection format to multi. |
| julia_client/Torc/src/api/apis/api_WorkflowActionsApi.jl | Moves trigger_type to an optional query parameter with explode semantics. |
| julia_client/julia_client/docs/WorkflowActionsApi.md | Updates Julia docs to reflect optional trigger_type query param. |
| examples/yaml/stdio_configuration.yaml | Adds a stdio capture/override example workflow. |
| examples/yaml/slurm_staged_pipeline.yaml | Tweaks CPU binding configuration in the Slurm staged pipeline example. |
| examples/yaml/simulation_sweep.yaml | Adds project and metadata fields to example. |
| examples/yaml/ro_crate_provenance.yaml | Adds project and metadata fields to example. |
| examples/yaml/multi_node_slurm.yaml | Adds a multi-node Slurm + start_one_worker_per_node example workflow. |
| examples/yaml/hyperparameter_sweep.yaml | Adds project and metadata fields to example. |
| examples/yaml/fan_in_with_regexes.yaml | Adds an input-file-regex fan-in example workflow. |
| examples/yaml/direct_mode_checkpointing.yaml | Adds a direct-mode checkpointing/resource enforcement example workflow. |
| api/sync_openapi.sh | Adjusts when OpenAPI parity checks run during sync flows. |
| api/regenerate_rust_client.sh | Switches Rust client cleanup to find -delete, preserving ro_crate_api.rs. |
| api/openapi.yaml | Moves trigger_type for pending actions from path → query and makes it optional. |
| api/openapi.codegen.yaml | Mirrors the same pending-actions parameter change for codegen spec. |
| api/check_client_codegen_parity.sh | Adds a script to diff generated Python/Julia clients against checked-in versions. |
| .github/workflows/lint.yml | Adds CI step to run client codegen parity checks. |
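The parity check described in the table amounts to comparing a freshly generated client tree against the checked-in one. The sketch below illustrates that idea in Python under stated assumptions: it is not the contents of api/check_client_codegen_parity.sh, and the directory names and file contents are made up for the demonstration.

```python
import filecmp
import tempfile
from pathlib import Path


def diff_trees(generated: Path, checked_in: Path) -> list[str]:
    """Return sorted relative paths that differ or exist on only one side."""
    gen = {p.relative_to(generated) for p in generated.rglob("*") if p.is_file()}
    ref = {p.relative_to(checked_in) for p in checked_in.rglob("*") if p.is_file()}
    # Files present in exactly one tree are always a parity failure.
    problems = [rel.as_posix() for rel in gen ^ ref]
    for rel in gen & ref:
        # shallow=False forces a byte-for-byte comparison instead of a stat check.
        if not filecmp.cmp(generated / rel, checked_in / rel, shallow=False):
            problems.append(rel.as_posix())
    return sorted(problems)


def demo() -> list[str]:
    """Build two small hypothetical trees and report where they diverge."""
    with tempfile.TemporaryDirectory() as tmp:
        generated = Path(tmp) / "generated"
        checked_in = Path(tmp) / "checked_in"
        for base in (generated, checked_in):
            (base / "api").mkdir(parents=True)
        (generated / "api" / "client.py").write_text("trigger_type: optional\n")
        (checked_in / "api" / "client.py").write_text("trigger_type: required\n")
        (generated / "api" / "same.py").write_text("x = 1\n")
        (checked_in / "api" / "same.py").write_text("x = 1\n")
        (generated / "new_model.py").write_text("")
        return diff_trees(generated, checked_in)


print(demo())  # ['api/client.py', 'new_model.py']
```

A CI step built on this idea would exit non-zero whenever the returned list is non-empty, so stale checked-in clients fail the build instead of drifting silently.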
build.rs watched .git/index, which changes on nearly every git operation, causing cargo to re-run the build script and invalidate all test targets.

Removed the GIT_DIRTY env var entirely and replaced .git/index watching with tracking the current branch ref file, so rebuilds only happen on actual commit changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
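The ref-tracking idea in this commit message can be sketched as follows. This is an illustration of the logic in Python, not the actual build.rs: the function names are hypothetical, and watching HEAD in addition to the branch ref file is an assumption. The `cargo:rerun-if-changed=` directive itself is the standard Cargo build-script mechanism.

```python
import tempfile
from pathlib import Path


def paths_to_watch(git_dir: Path) -> list[Path]:
    """Return files whose change signals a new commit or a branch switch."""
    head = git_dir / "HEAD"
    watch = [head]
    content = head.read_text().strip()
    if content.startswith("ref: "):
        # Symbolic ref such as "ref: refs/heads/main": the branch ref file
        # holds the commit hash and changes only when the branch moves.
        watch.append(git_dir / content[len("ref: "):])
    # A detached HEAD stores the hash in HEAD itself, so HEAD alone suffices.
    return watch


def rerun_directives(git_dir: Path) -> list[str]:
    """Format the paths as Cargo build-script rerun directives."""
    return [f"cargo:rerun-if-changed={path}" for path in paths_to_watch(git_dir)]


def demo() -> list[str]:
    """Resolve watch paths for a fake .git directory on branch 'main'."""
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp) / ".git"
        (git_dir / "refs" / "heads").mkdir(parents=True)
        (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
        (git_dir / "refs" / "heads" / "main").write_text("0123abc\n")
        return [p.relative_to(git_dir).as_posix() for p in paths_to_watch(git_dir)]


print(demo())  # ['HEAD', 'refs/heads/main']
```

One caveat with this approach: git may store refs in .git/packed-refs, in which case the loose branch ref file might not exist until the next commit on that branch.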
Pull request overview
Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.