Allow Slurm execution configs to set an optional srun MPI mode. When configured, job runners now pass the value through as srun --mpi=<value>; when omitted, no MPI flag is added. Add validation and serialization coverage for the new field and extend tests for unset and configured srun argument handling.
Apply srun_mpi only to the outer srun that launches one job runner per allocated node in start_one_worker_per_node mode. Remove the inner per-job srun wiring, update validation to allow the option for direct-mode worker-per-node workflows, and add coverage for the submission script plus the adjusted execution-config behavior.
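The pass-through behavior described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `build_outer_srun_args` and the `runner_cmd` parameter are hypothetical, and the real outer `srun` invocation likely carries additional arguments that are not shown here.

```rust
/// Build the argument list for the outer `srun` that starts one job runner
/// per allocated node. Function and parameter names are hypothetical.
fn build_outer_srun_args(srun_mpi: Option<&str>, runner_cmd: &[&str]) -> Vec<String> {
    let mut args = vec!["srun".to_string()];
    // Add `--mpi=<value>` only when a mode was configured; when the field is
    // unset, no MPI flag is emitted at all.
    if let Some(mode) = srun_mpi {
        args.push(format!("--mpi={mode}"));
    }
    // Append the (hypothetical) job runner command unchanged.
    args.extend(runner_cmd.iter().map(|s| s.to_string()));
    args
}
```

Modeling the field as an `Option` keeps the unset case distinct from any explicit value, so existing workflows that never set `srun_mpi` produce the same command line as before.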
Pull request overview
Adds an optional execution_config.srun_mpi setting to control the MPI launcher mode used by the outer srun when launching one job-runner per allocated node, and threads that value through spec parsing/validation into Slurm submission script generation.
Changes:
- Extend `ExecutionConfig` with `srun_mpi` and include it in YAML/JSON/KDL parsing + rendering.
- Pass `srun_mpi` into Slurm submission script generation and emit `srun --mpi=<value>` only when configured.
- Add/adjust integration tests for config parsing/validation and script content.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/client/workflow_spec.rs` | Adds `ExecutionConfig.srun_mpi`, wires serialization, and introduces validation rules for when it’s allowed. |
| `src/client/commands/slurm.rs` | Passes `execution_config.srun_mpi` through to Slurm script construction. |
| `src/client/hpc/hpc_interface.rs` | Extends the `HpcInterface::create_submission_script` API with an optional `srun_mpi` parameter. |
| `src/client/hpc/slurm_interface.rs` | Emits `--mpi=<mode>` on the outer `srun` when launching one worker per node. |
| `src/client/hpc/hpc_manager.rs` | Updates call site for the new `create_submission_script` signature (currently passes `None`). |
| `tests/test_execution_config.rs` | Adds parsing/roundtrip assertions and spec validation tests for `srun_mpi`. |
| `tests/test_slurm_commands.rs` | Updates existing tests for the new signature and adds a script-content test covering `--mpi`. |
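To illustrate the API change summarized in the table, here is a simplified sketch of a trait method that gains an optional `srun_mpi` parameter. The trait below is a hypothetical stand-in: the real `HpcInterface::create_submission_script` signature has other parameters that are not reproduced here, and the script body is reduced to the lines relevant to `--mpi`.

```rust
/// Hypothetical, heavily simplified stand-in for the HPC interface trait.
trait HpcInterface {
    fn create_submission_script(
        &self,
        runner_cmd: &str,
        start_one_worker_per_node: bool,
        srun_mpi: Option<&str>,
    ) -> String;
}

/// Hypothetical Slurm implementation used only for this sketch.
struct SlurmInterface;

impl HpcInterface for SlurmInterface {
    fn create_submission_script(
        &self,
        runner_cmd: &str,
        start_one_worker_per_node: bool,
        srun_mpi: Option<&str>,
    ) -> String {
        let mut script = String::from("#!/bin/bash\n");
        if start_one_worker_per_node {
            // The MPI flag belongs only on the outer srun that fans out
            // one job runner per node.
            let mpi_flag = match srun_mpi {
                Some(mode) => format!(" --mpi={mode}"),
                None => String::new(),
            };
            script.push_str(&format!("srun{mpi_flag} {runner_cmd}\n"));
        } else {
            // Without worker-per-node mode there is no outer srun to modify.
            script.push_str(&format!("{runner_cmd}\n"));
        }
        script
    }
}
```

A caller that has no MPI mode to forward, such as the `hpc_manager.rs` call site noted in the table, would pass `None` for the new argument and get a script without any `--mpi` flag. In this sketch a value supplied outside worker-per-node mode is silently ignored, which is a simplification.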
Can you add something in the docs about this?
Document the execution_config.srun_mpi field in the workflow-spec reference and add a worker-per-node direct mode example showing how it affects the outer srun job runner launch.
Added docs for this in two places:
That should cover both the field reference and the practical usage path for this PR.
Validate srun_mpi as a single safe token whenever it is provided, reject it unless a worker-per-node schedule_nodes action is present, and enforce the same check when writing Slurm submission scripts. Add regression tests for direct-mode whitespace rejection, slurm-mode no-op rejection, and invalid script-generation input.
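The two validation rules in this commit can be sketched roughly as below. Both helper names are hypothetical, and the exact set of characters treated as "safe" is an assumption; the source only states that the value must be a single safe token with no whitespace.

```rust
/// Hypothetical helper: accept `srun_mpi` only when it is a single
/// shell-safe token. The allowed character set is an assumption.
fn validate_srun_mpi_token(value: &str) -> Result<(), String> {
    if value.is_empty() {
        return Err("srun_mpi must not be empty".to_string());
    }
    // Reject whitespace and shell metacharacters by allowing only a
    // conservative set of characters.
    let safe = value
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.'));
    if !safe {
        return Err(format!("srun_mpi must be a single safe token, got {value:?}"));
    }
    Ok(())
}

/// Hypothetical spec-level check: the option is valid only when a
/// worker-per-node schedule_nodes action exists, because otherwise it
/// would have no effect.
fn validate_srun_mpi(
    srun_mpi: Option<&str>,
    has_worker_per_node_action: bool,
) -> Result<(), String> {
    // An unset field is always valid.
    let Some(value) = srun_mpi else {
        return Ok(());
    };
    validate_srun_mpi_token(value)?;
    if !has_worker_per_node_action {
        return Err(
            "srun_mpi requires a schedule_nodes action that starts one worker per node"
                .to_string(),
        );
    }
    Ok(())
}
```

Calling the token check again at script-generation time, as the commit describes, guards against values that reach the script writer without passing through spec validation.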
Addressed the remaining inline review comments in 0b6ac71:
Targeted checks run for this follow-up:
Follow-up on the four Copilot inline comments:
These changes are in commit 0b6ac71.
Summary
Testing