Adds ProximalPolicyOptimizationAgent.resume() classmethod to continue training from a saved checkpoint. Restores VecNormalize statistics and keeps normalization in training mode, mirroring SB3's recommended PPO.load() + .learn(reset_num_timesteps=False) pattern. Also adds reset_num_timesteps keyword argument to do_training() and a --resume-from CLI flag to train_ppo.py. Moves logging.basicConfig to the top of main() in both train_ppo.py and play_ppo.py.
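For reference, a minimal sketch of the load-and-continue pattern this classmethod wraps, assuming SB3's standard API; the function name, paths, and make_env factory below are illustrative, not the project's actual code:

```python
# Hedged sketch of the checkpoint-resume pattern (illustrative names, not the
# real resume() implementation). Assumes the VecNormalize statistics were
# saved alongside the model during the previous run.
from typing import Callable

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize


def resume_training(
    model_path: str, stats_path: str, make_env: Callable, total_timesteps: int
) -> PPO:
    venv = DummyVecEnv([make_env])
    # Restore the running observation/reward statistics from the checkpoint.
    venv = VecNormalize.load(stats_path, venv)
    venv.training = True    # keep updating the normalization statistics
    venv.norm_reward = True # resume reward normalization as well
    model = PPO.load(model_path, env=venv)
    # reset_num_timesteps=False keeps the timestep counter, so schedules
    # (e.g. a linear learning-rate schedule) continue where they left off.
    model.learn(total_timesteps=total_timesteps, reset_num_timesteps=False)
    return model
```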
- Add ProximalPolicyOptimizationAgent.resume() classmethod for checkpoint-based
continued training; restores VecNormalize stats and keeps normalization in
training mode (SB3 PPO.load() + learn(reset_num_timesteps=False) pattern)
- Add reset_num_timesteps kwarg to do_training() to preserve LR schedule on resume
- Add --resume-from PATH CLI flag to scripts/train_ppo.py
- Port _save_reward_plot() from the non-dnv folder: saves a scatter plot of episode
  rewards vs training step as a PNG alongside the model after each training run
  (sketched after this list)
- Wrap environments in TimeLimit(max_episode_steps=3000) so episodes always
  terminate even when a plateau agent never triggers the env's own reward condition;
  fixes missing PNGs after several rounds of resumed training (see the wrapper
  sketch below)
- Move logging.basicConfig to top of main() in train_ppo.py and play_ppo.py
- Extend matplotlib stubs: Figure.tight_layout/savefig, Axes.scatter, plt.close,
matplotlib.use()
- Add matplotlib.use("Agg") to tests/conftest.py for headless CI (conftest sketch
  below)
- Update CHANGELOG.md and README.rst
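A hedged sketch of what the ported plot helper might look like; the signature and the (step, reward) tuple layout are assumptions, while the matplotlib calls match the stubbed API listed above:

```python
# Illustrative version of the reward-plot helper; assumes (step, reward)
# pairs are collected during training. Not the project's exact code.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: safe without a display
import matplotlib.pyplot as plt


def _save_reward_plot(rewards: list[tuple[int, float]], out_path: str) -> None:
    """Scatter episode rewards against the training step and save as a PNG."""
    steps = [s for s, _ in rewards]
    values = [r for _, r in rewards]
    fig, ax = plt.subplots()
    ax.scatter(steps, values, s=8)
    ax.set_xlabel("training step")
    ax.set_ylabel("episode reward")
    fig.tight_layout()
    fig.savefig(out_path)  # e.g. a PNG saved next to the model file
    plt.close(fig)         # release the figure so repeated runs don't leak
```

And the episode-cap wrapping, assuming gymnasium's standard TimeLimit wrapper; CartPole-v1 stands in for the project's own environment, which isn't shown here:

```python
# Sketch of the TimeLimit wrapping (stand-in environment).
import gymnasium as gym
from gymnasium.wrappers import TimeLimit


def make_env() -> gym.Env:
    env = gym.make("CartPole-v1")  # placeholder for the project's env
    # Truncate after 3000 steps so a plateau agent (alive but never
    # succeeding) still produces episode ends, keeping the reward stats
    # and the PNG plot populated.
    return TimeLimit(env, max_episode_steps=3000)
```

Finally, a minimal sketch of the conftest.py change, assuming the backend only needs to be selected before any test imports pyplot:

```python
# tests/conftest.py (assumed contents): select the non-interactive Agg
# backend up front so Figure.savefig works on headless CI runners.
import matplotlib

matplotlib.use("Agg")
```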
Summary
- Add ProximalPolicyOptimizationAgent.resume() classmethod to continue training
  from a saved checkpoint, restoring VecNormalize statistics and preserving the
  learning rate schedule via SB3's PPO.load() + learn(reset_num_timesteps=False)
  pattern
- Add _save_reward_plot(): saves a scatter plot of episode rewards vs training
  step as a PNG alongside the model after each training run
- Wrap environments in TimeLimit(max_episode_steps=3000) so episodes always
  terminate; fixes a silent bug where a plateau agent (one that learned to stay
  alive but not succeed) would run the entire training budget without a single
  episode end, leaving reward stats empty and no PNG generated
- Add --resume-from PATH CLI flag to scripts/train_ppo.py and a
  reset_num_timesteps kwarg to do_training()
- Move logging.basicConfig to the top of main() in both scripts
- Extend matplotlib stubs (Figure.tight_layout, Figure.savefig, Axes.scatter,
  plt.close, matplotlib.use()) and add matplotlib.use("Agg") to
  tests/conftest.py for headless CI

Test plan

- uv run pytest tests/test_ppo.py -v: 6/6 pass
- uv run ruff check .: clean
- uv run mypy src: clean
- uv run pyright: pre-existing pygame errors only, no regressions