
Feature/ppo resume training #9

Merged
eisDNV merged 3 commits into main from feature/ppo-resume-training
Apr 28, 2026

Conversation

Collaborator

aleksandarbabicdnv commented Apr 27, 2026

Summary

  • Add ProximalPolicyOptimizationAgent.resume() classmethod to continue training from a saved
    checkpoint, restoring VecNormalize statistics and preserving the learning rate schedule via
    SB3's PPO.load() + learn(reset_num_timesteps=False) pattern
  • Add _save_reward_plot(): saves a scatter plot of episode rewards vs training step as a PNG
    alongside the model after each training run
  • Wrap environments in TimeLimit(max_episode_steps=3000) so episodes always terminate — fixes
    a silent bug where a plateau agent (one that learned to stay alive but not succeed) would run
    the entire training budget without a single episode end, leaving reward stats empty and no PNG
    generated
  • Add --resume-from PATH CLI flag to scripts/train_ppo.py and reset_num_timesteps kwarg
    to do_training()
  • Move logging.basicConfig to top of main() in both scripts
  • Extend matplotlib stubs (Figure.tight_layout, Figure.savefig, Axes.scatter, plt.close,
    matplotlib.use()) and add matplotlib.use("Agg") to tests/conftest.py for headless CI
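As a rough sketch, the resume flow described above could look like the following. The helper name `resume_training` and its parameters are illustrative, not the PR's actual `ProximalPolicyOptimizationAgent.resume()` signature; the SB3 calls themselves (`VecNormalize.load`, `PPO.load`, `learn(reset_num_timesteps=False)`) are the real API the summary refers to:

```python
def resume_training(checkpoint_path, vecnormalize_path, env_fns, total_timesteps):
    """Continue PPO training from a saved checkpoint (illustrative helper;
    the PR wraps this logic in a classmethod rather than a free function)."""
    # Imports kept local so the sketch can be read without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

    venv = DummyVecEnv(env_fns)
    # Restore the running observation/reward statistics saved alongside
    # the model ...
    venv = VecNormalize.load(vecnormalize_path, venv)
    # ... and keep them updating, since we are training, not evaluating.
    venv.training = True
    venv.norm_reward = True

    model = PPO.load(checkpoint_path, env=venv)
    # reset_num_timesteps=False preserves the internal timestep counter,
    # so schedules (e.g. a linearly decaying learning rate) pick up
    # where the previous run left off instead of restarting.
    model.learn(total_timesteps=total_timesteps, reset_num_timesteps=False)
    return model
```

Keeping `venv.training = True` is the easy part to forget: `VecNormalize.load` alone does not decide whether statistics should keep updating, and freezing them mid-training would subtly change the observation distribution the policy sees.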

Test plan

  • uv run pytest tests/test_ppo.py -v — 6/6 pass
  • uv run ruff check . — clean
  • uv run mypy src — clean
  • uv run pyright — pre-existing pygame errors only, no regressions
  • Train from scratch and resume 4+ times — PNG generated every round including all resumes
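The reward-plot step exercised in the last test-plan item can be sketched as below. The function body, output filename, and marker size are assumptions; the PR's `_save_reward_plot` is a private method on the agent, not this standalone function:

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend, as conftest.py does for CI
import matplotlib.pyplot as plt


def save_reward_plot(steps, rewards, out_dir):
    """Scatter episode rewards against the training step and save a PNG
    next to the model (a sketch of the PR's _save_reward_plot)."""
    fig, ax = plt.subplots()
    ax.scatter(steps, rewards, s=8)
    ax.set_xlabel("training step")
    ax.set_ylabel("episode reward")
    fig.tight_layout()
    out_path = Path(out_dir) / "episode_rewards.png"  # assumed filename
    fig.savefig(out_path)
    plt.close(fig)  # free the figure so repeated resumes don't leak memory
    return out_path
```

Note that this only produces a non-empty plot if episodes actually end, which is exactly why the `TimeLimit` fix in the summary matters.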

Adds ProximalPolicyOptimizationAgent.resume() classmethod to continue
training from a saved checkpoint. Restores VecNormalize statistics and
keeps normalization in training mode, mirroring SB3's recommended
PPO.load() + .learn(reset_num_timesteps=False) pattern.

Also adds reset_num_timesteps keyword argument to do_training() and a
--resume-from CLI flag to train_ppo.py. Moves logging.basicConfig to
the top of main() in both train_ppo.py and play_ppo.py.
- Add ProximalPolicyOptimizationAgent.resume() classmethod for checkpoint-based
  continued training; restores VecNormalize stats and keeps normalization in
  training mode (SB3 PPO.load() + learn(reset_num_timesteps=False) pattern)
- Add reset_num_timesteps kwarg to do_training() to preserve LR schedule on resume
- Add --resume-from PATH CLI flag to scripts/train_ppo.py
- Port _save_reward_plot() from non-dnv folder: saves scatter plot of episode
  rewards vs training step as PNG alongside the model after each training run
- Wrap environments in TimeLimit(max_episode_steps=3000) so episodes always
  terminate even when a plateau agent never triggers the env's own reward condition;
  fixes missing PNGs after several rounds of resumed training
- Move logging.basicConfig to top of main() in train_ppo.py and play_ppo.py
- Extend matplotlib stubs: Figure.tight_layout/savefig, Axes.scatter, plt.close,
  matplotlib.use()
- Add matplotlib.use("Agg") to tests/conftest.py for headless CI
- Update CHANGELOG.md and README.rst
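Why the `TimeLimit` wrapper matters can be shown without any RL machinery. This toy loop (pure Python; the assumption is a plateau agent that never triggers termination on its own) mirrors how Gymnasium's truncation signal chops a fixed training budget into finished episodes:

```python
def run_episodes(step_budget, max_episode_steps=None):
    """Count episode ends for a 'plateau' policy that never terminates
    by itself. Without a step cap, no episode ever finishes."""
    episode_ends = 0
    steps_in_episode = 0
    for _ in range(step_budget):
        steps_in_episode += 1
        terminated = False  # plateau agent: stays alive, never succeeds/fails
        truncated = (max_episode_steps is not None
                     and steps_in_episode >= max_episode_steps)
        if terminated or truncated:
            episode_ends += 1
            steps_in_episode = 0
    return episode_ends


# Without a cap the whole budget is one never-ending episode: zero
# episode ends, so reward stats stay empty and no PNG is written.
# With max_episode_steps=3000 the budget is chopped into episodes.
```

For example, a 12,000-step budget yields zero episode ends uncapped, but four with `max_episode_steps=3000` — which is exactly the observable symptom (missing reward PNGs) the wrapper fixes.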
eisDNV closed this Apr 28, 2026
eisDNV reopened this Apr 28, 2026
aleksandarbabicdnv self-assigned this Apr 28, 2026
aleksandarbabicdnv removed their assignment Apr 28, 2026
Collaborator

eisDNV left a comment


Good to go

eisDNV merged commit ee80120 into main Apr 28, 2026
20 checks passed
eisDNV deleted the feature/ppo-resume-training branch April 28, 2026 08:10