- Pong-PPO – Improved PPO with vectorized Atari preprocessing, GAE, and augmentation.
- BipedalWalker – PPO agent with normalized vector envs and deterministic evaluation.
- Lunar-Landar – Simple policy-gradient-style training for LunarLander; latest clip: `training_epoch_526.mp4` (epoch 526).
- VizDoom-RL – DQN/Double DQN with dueling heads and soft target updates.
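The bullets above name several standard techniques without showing them. The block below is a rough NumPy illustration, not code taken from any of these repositories. It covers GAE as used in PPO, the Double DQN target, dueling-head aggregation, and a soft (Polyak) target update. The function names, array shapes, the dict-of-arrays parameter format, and the default `gamma`, `lam`, and `tau` values are all assumptions chosen for the sketch.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout of length T.

    rewards, values, dones: sequences of length T; dones[t] = 1.0 if the
    episode ended after step t. last_value bootstraps the state after step T-1.
    Returns (advantages, returns), each an array of shape (T,).
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # One-step TD error; the bootstrap is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted (gamma * lam) sum of future TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float64)
    return advantages, returns


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: the online net picks the next action, the target net scores it."""
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    q_next_online = np.asarray(q_next_online, dtype=np.float64)
    q_next_target = np.asarray(q_next_target, dtype=np.float64)
    best_actions = np.argmax(q_next_online, axis=1)
    q_eval = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * q_eval


def dueling_q(value, advantages):
    """Dueling head aggregation: Q = V + (A - mean over actions of A)."""
    value = np.asarray(value, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    return value[:, None] + advantages - advantages.mean(axis=1, keepdims=True)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target.

    Both arguments are dicts mapping parameter names to arrays; the
    target dict is updated in place.
    """
    for name, w in online_params.items():
        target_params[name] = tau * w + (1.0 - tau) * target_params[name]
```

In `compute_gae`, `lam=0` reduces each advantage to the one-step TD error, while `lam=1` gives the full bootstrapped discounted return minus the value baseline; values in between trade bias against variance.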


