Description
PPO crashes during training because of an action space violation during evaluation. The root cause is a missing action clip in the evaluation loop.
The training loop clips actions (control_experiment.py:108), but the evaluation loop does not do so in either the sequential or vectorized paths (control_experiment.py:211 and 238).
PPOActorNetProbabilistic uses a plain Normal distribution with no squashing, so its outputs are unbounded and a sampled action can fall outside the action space.
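
A minimal sketch of the kind of clip the evaluation loop is missing, assuming a box-shaped action space. The helper name `clip_to_action_space` and the bounds `low` / `high` are hypothetical stand-ins (in the real code the bounds would presumably come from the environment's action space); this is not the actual code in control_experiment.py.

```python
import numpy as np


def clip_to_action_space(action, low, high):
    """Clamp a raw policy sample into the box [low, high], elementwise.

    Hypothetical helper for illustration; not part of control_experiment.py.
    """
    return np.clip(action, low, high)


# Assumed bounds, standing in for the environment's action space limits.
low = np.array([-1.0, -1.0])
high = np.array([1.0, 1.0])

# A sample from an unsquashed Normal can land outside the bounds.
raw_action = np.array([1.7, -0.3])

# Clip before stepping the environment in evaluation, as training already does.
safe_action = clip_to_action_space(raw_action, low, high)
```

Applying the same clip in both the sequential and vectorized evaluation paths would keep evaluation consistent with what the training loop already does.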