Adds ProximalPolicyOptimizationAgent.resume() classmethod to continue training from a saved checkpoint. Restores VecNormalize statistics and keeps normalization in training mode, mirroring SB3's recommended PPO.load() + .learn(reset_num_timesteps=False) pattern. Also adds reset_num_timesteps keyword argument to do_training() and a --resume-from CLI flag to train_ppo.py. Moves logging.basicConfig to the top of main() in both train_ppo.py and play_ppo.py.
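For reference, a minimal sketch of the load-and-continue pattern this classmethod wraps, assuming SB3's standard API; the function name, paths, and make_env factory below are illustrative, not the project's actual code:

```python
# Hedged sketch of the checkpoint-resume pattern (illustrative names, not the
# real resume() implementation). Assumes the VecNormalize statistics were
# saved alongside the model during the previous run.
from typing import Callable

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize


def resume_training(
    model_path: str, stats_path: str, make_env: Callable, total_timesteps: int
) -> PPO:
    venv = DummyVecEnv([make_env])
    # Restore the running observation/reward statistics from the checkpoint.
    venv = VecNormalize.load(stats_path, venv)
    venv.training = True    # keep updating the normalization statistics
    venv.norm_reward = True # resume reward normalization as well
    model = PPO.load(model_path, env=venv)
    # reset_num_timesteps=False keeps the timestep counter, so schedules
    # (e.g. a linear learning-rate schedule) continue where they left off.
    model.learn(total_timesteps=total_timesteps, reset_num_timesteps=False)
    return model
```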
- Add ProximalPolicyOptimizationAgent.resume() classmethod for checkpoint-based
continued training; restores VecNormalize stats and keeps normalization in
training mode (SB3 PPO.load() + learn(reset_num_timesteps=False) pattern)
- Add reset_num_timesteps kwarg to do_training() to preserve LR schedule on resume
- Add --resume-from PATH CLI flag to scripts/train_ppo.py
- Port _save_reward_plot() from the non-dnv folder: saves a scatter plot of episode
  rewards vs training step as a PNG alongside the model after each training run
  (sketched after this list)
- Wrap environments in TimeLimit(max_episode_steps=3000) so episodes always
  terminate even when a plateau agent never triggers the env's own reward condition;
  fixes missing PNGs after several rounds of resumed training (see the wrapper
  sketch below)
- Move logging.basicConfig to top of main() in train_ppo.py and play_ppo.py
- Extend matplotlib stubs: Figure.tight_layout/savefig, Axes.scatter, plt.close,
matplotlib.use()
- Add matplotlib.use("Agg") to tests/conftest.py for headless CI (conftest sketch
  below)
- Update CHANGELOG.md and README.rst
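A hedged sketch of what the ported plot helper might look like; the signature and the (step, reward) tuple layout are assumptions, while the matplotlib calls match the stubbed API listed above:

```python
# Illustrative version of the reward-plot helper; assumes (step, reward)
# pairs are collected during training. Not the project's exact code.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: safe without a display
import matplotlib.pyplot as plt


def _save_reward_plot(rewards: list[tuple[int, float]], out_path: str) -> None:
    """Scatter episode rewards against the training step and save as a PNG."""
    steps = [s for s, _ in rewards]
    values = [r for _, r in rewards]
    fig, ax = plt.subplots()
    ax.scatter(steps, values, s=8)
    ax.set_xlabel("training step")
    ax.set_ylabel("episode reward")
    fig.tight_layout()
    fig.savefig(out_path)  # e.g. a PNG saved next to the model file
    plt.close(fig)         # release the figure so repeated runs don't leak
```

And the episode-cap wrapping, assuming gymnasium's standard TimeLimit wrapper; CartPole-v1 stands in for the project's own environment, which isn't shown here:

```python
# Sketch of the TimeLimit wrapping (stand-in environment).
import gymnasium as gym
from gymnasium.wrappers import TimeLimit


def make_env() -> gym.Env:
    env = gym.make("CartPole-v1")  # placeholder for the project's env
    # Truncate after 3000 steps so a plateau agent (alive but never
    # succeeding) still produces episode ends, keeping the reward stats
    # and the PNG plot populated.
    return TimeLimit(env, max_episode_steps=3000)
```

Finally, a minimal sketch of the conftest.py change, assuming the backend only needs to be selected before any test imports pyplot:

```python
# tests/conftest.py (assumed contents): select the non-interactive Agg
# backend up front so Figure.savefig works on headless CI runners.
import matplotlib

matplotlib.use("Agg")
```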
Summary
- Add ProximalPolicyOptimizationAgent.resume() classmethod to continue training
  from a saved checkpoint, restoring VecNormalize statistics and preserving the
  learning rate schedule via SB3's PPO.load() + learn(reset_num_timesteps=False)
  pattern
- Add _save_reward_plot(): saves a scatter plot of episode rewards vs training
  step as a PNG alongside the model after each training run
- Wrap environments in TimeLimit(max_episode_steps=3000) so episodes always
  terminate; fixes a silent bug where a plateau agent (one that learned to stay
  alive but not succeed) would run the entire training budget without a single
  episode end, leaving reward stats empty and no PNG generated
- Add --resume-from PATH CLI flag to scripts/train_ppo.py and a
  reset_num_timesteps kwarg to do_training()
- Move logging.basicConfig to the top of main() in both scripts
- Extend matplotlib stubs (Figure.tight_layout, Figure.savefig, Axes.scatter,
  plt.close, matplotlib.use()) and add matplotlib.use("Agg") to
  tests/conftest.py for headless CI

Test plan

- uv run pytest tests/test_ppo.py -v: 6/6 pass
- uv run ruff check .: clean
- uv run mypy src: clean
- uv run pyright: pre-existing pygame errors only, no regressions