Summary
env_params are only resolved in the agent loop (CloudAIGymEnv.step), so their validity depends on one fact: will this run be agent-driven? Today that fact has no single source of truth — it's scattered across tr.is_dse_job (config), agent.samples_env_params (config), and args.single_sbatch (CLI), reconciled nowhere.
Symptom
A config with env_params and an env-aware (RL) agent passes validate_dse_env_params: tr.is_dse_job is true and the agent samples env_params. If that config is then run with --single-sbatch, dispatch routes to the grid-unroll path, which calls apply_params_set(combination) without env_params. The env_params are silently dropped, and the field is still an unresolved list when it reaches command generation.
# src/cloudai/cli/handlers.py (mode decision)
has_dse = any(tr.is_dse_job for tr in test_scenario.test_runs)
if args.single_sbatch or not has_dse:  # <-- single_sbatch forces grid unroll
    handle_non_dse_job(runner, args)
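The drop can be reproduced with a toy model. The sketch below is not CloudAI code: `ToyRun`, `toy_apply_params_set`, `toy_dispatch`, and the parameter names are hypothetical stand-ins. It only mirrors the shape of the mode decision quoted above, under the assumption that the grid path resolves command parameters and never touches env_params.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ToyRun:
    """Hypothetical stand-in for a test run; not the real CloudAI model."""

    cmd_params: dict  # values may be scalars or lists of candidates
    env_params: dict = field(default_factory=dict)

    @property
    def is_dse_job(self) -> bool:
        # Toy rule: any list-valued parameter makes this a DSE job.
        values = list(self.cmd_params.values()) + list(self.env_params.values())
        return any(isinstance(v, list) for v in values)


def as_list(value):
    return value if isinstance(value, list) else [value]


def toy_apply_params_set(run: ToyRun, combination: dict) -> ToyRun:
    # Mirrors the symptom: cmd_params are resolved, env_params pass through as-is.
    return ToyRun(cmd_params=dict(combination), env_params=run.env_params)


def toy_dispatch(run: ToyRun, single_sbatch: bool) -> list:
    # Same shape as the quoted condition: single_sbatch forces the grid path.
    if single_sbatch or not run.is_dse_job:
        keys = list(run.cmd_params)
        combos = product(*(as_list(run.cmd_params[k]) for k in keys))
        return [toy_apply_params_set(run, dict(zip(keys, combo))) for combo in combos]
    raise NotImplementedError("the agent loop is out of scope for this sketch")


run = ToyRun(
    cmd_params={"tp": [1, 2]},
    env_params={"example_env_param": ["a", "b"]},
)
cases = toy_dispatch(run, single_sbatch=True)
unresolved = [c for c in cases if isinstance(c.env_params["example_env_param"], list)]
print(len(cases), len(unresolved))  # prints "2 2"
```

Both unrolled cases still carry the env param as a list, which is the state described above as heading into command generation.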
Root cause (two coupled flaws)
- Missing abstraction: there is no single is_agent_driven concept. env_params validity (and dispatch) should gate on it, computed once.
- --single-sbatch overloads scheduling with search strategy. It is a scheduling/packaging concern (cram cases into one sbatch) but currently forces the grid strategy, overriding whatever agent the config declared. Scheduling should be orthogonal to env / action-space / search strategy. In future --single-sbatch should support agent-driven runs too (e.g. a genetic algorithm launching multiple evaluations in parallel), where env_params are perfectly valid.
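One way to picture the missing abstraction is a small value object that derives the fact once and that both dispatch and validation read. This is a sketch of shape only, not a proposal for the actual API: `RunMode`, `resolve_run_mode`, `validate_env_params`, and `dispatch` are hypothetical names, and the boolean inputs stand in for tr.is_dse_job, agent.samples_env_params, and args.single_sbatch. It models today's behaviour, in which --single-sbatch still forces grid unroll.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMode:
    """Hypothetical single source of truth, computed once per run."""

    is_agent_driven: bool
    reason: str


def resolve_run_mode(is_dse_job: bool, single_sbatch: bool) -> RunMode:
    # The only place where config and CLI inputs are reconciled.
    if not is_dse_job:
        return RunMode(False, "no parameter space to search")
    if single_sbatch:
        # Current behaviour only; this branch goes away once scheduling is decoupled.
        return RunMode(False, "--single-sbatch currently forces grid unroll")
    return RunMode(True, "agent loop resolves parameters step by step")


def validate_env_params(mode: RunMode, has_env_params: bool, agent_samples_env_params: bool) -> list:
    # Validation reads the resolved mode and never inspects CLI flags itself.
    if not has_env_params:
        return []
    if not mode.is_agent_driven:
        return [f"env_params would never be resolved: {mode.reason}"]
    if not agent_samples_env_params:
        return ["env_params are set but the agent does not sample them"]
    return []


def dispatch(mode: RunMode) -> str:
    # Dispatch gates on the same fact as validation.
    return "agent loop" if mode.is_agent_driven else "grid unroll"


mode = resolve_run_mode(is_dse_job=True, single_sbatch=True)
print(dispatch(mode))
print(validate_env_params(mode, has_env_params=True, agent_samples_env_params=True))
```

Because validation and dispatch share one resolved value, they cannot disagree, and a later change to what --single-sbatch implies would touch only the resolver.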
Why not a quick guard
Rejecting env_params when args.single_sbatch (the obvious patch) bakes in the exact coupling we want to remove: the day --single-sbatch supports agent-driven runs, env_params would be valid there, yet the guard would still reject them. The fix belongs in the model, not the run handler.
Direction
- Introduce a single source of truth for agent-driven execution (config/agent capability); gate both dispatch and env_params validation on it.
- Decouple --single-sbatch from search strategy so it only affects scheduling/packaging and composes with both grid and agent-driven runs.
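The target state described in these two points can be sketched as two independent axes. Everything here is illustrative and assumed: `Strategy`, `Packaging`, `Plan`, `make_plan`, and `env_params_allowed` are hypothetical names, and a single-sbatch agent-driven run is not something the current code supports.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Strategy(Enum):
    GRID = "grid"
    AGENT = "agent"


class Packaging(Enum):
    PER_CASE = "one sbatch per case"
    SINGLE_SBATCH = "one sbatch for all cases"


@dataclass(frozen=True)
class Plan:
    strategy: Strategy
    packaging: Packaging

    @property
    def is_agent_driven(self) -> bool:
        return self.strategy is Strategy.AGENT


def make_plan(config_declares_agent: bool, single_sbatch: bool) -> Plan:
    # Strategy comes only from the config; packaging only from the CLI flag.
    strategy = Strategy.AGENT if config_declares_agent else Strategy.GRID
    packaging = Packaging.SINGLE_SBATCH if single_sbatch else Packaging.PER_CASE
    return Plan(strategy, packaging)


def env_params_allowed(plan: Plan) -> bool:
    # Validation reads the strategy and never looks at packaging.
    return plan.is_agent_driven


# All four combinations are representable; neither axis overrides the other.
for declares_agent, single in product([False, True], repeat=2):
    plan = make_plan(declares_agent, single)
    print(plan.strategy.name, plan.packaging.name, env_params_allowed(plan))
```

In this shape, adding agent-driven support to the single-sbatch runner is a change to the runner alone; env_params validation is unaffected because it never consulted the flag.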
Pointers
src/cloudai/cli/handlers.py — handle_dry_run_and_run mode decision.
src/cloudai/configurator/env_params.py — validate_dse_env_params.
src/cloudai/systems/slurm/single_sbatch_runner.py — grid unroll calling apply_params_set without env_params.
Surfaced by the env_params work in #901; the underlying scheduling/strategy coupling predates it.