Simple PPO implementation in ~270 lines of code. Supports parallel sampling and multi-GPU training.

Simple PPO

An extremely simple PPO implementation in PyTorch, in ~270 lines of code. It supports:

  • Grid search over hyperparameters
  • Training curves in TensorBoard
  • Parallel sampling from multiple environments via the Gym interface. You can define your own parallel environment by modifying env.py
  • Discrete action spaces (one action per step) and MultiDiscrete action spaces (multiple actions per step)
  • Single-GPU or multi-GPU training via PyTorch's nn.DataParallel
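To illustrate the multi-GPU point: nn.DataParallel replicates the module across GPUs and splits the batch dimension between them. The Agent class below is a made-up stand-in for the network in ppo.py, used only to show the wrapping pattern:

```python
import torch
import torch.nn as nn

class Agent(nn.Module):
    # Hypothetical stand-in for the actor-critic network in ppo.py.
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.policy = nn.Linear(obs_dim, n_actions)

    def forward(self, obs):
        return self.policy(obs)

agent = Agent()
if torch.cuda.device_count() > 1:
    # Replicates the module on each GPU and scatters the batch dimension.
    # This is why the batch (number of parallel environments) must be
    # at least the number of GPUs.
    agent = nn.DataParallel(agent)

# A batch of 8 observations, e.g. from 8 parallel environments.
logits = agent(torch.zeros(8, 4))
```

The same forward call works unchanged on one GPU, many GPUs, or CPU, which is what makes the wrapper a one-line addition.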

This implementation is a simplification of the PPO algorithm in CleanRL.
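For reference, the core of PPO is the clipped surrogate objective. A minimal sketch, with illustrative names rather than the ones used in ppo.py:

```python
import torch

def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the policy
    # that collected the rollout data.
    ratio = torch.exp(new_logprobs - old_logprobs)
    unclipped = ratio * advantages
    # Clip the ratio to [1 - eps, 1 + eps] and take the pessimistic
    # (element-wise minimum) bound, then negate for gradient descent.
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

When the current and behavior policies coincide, the ratio is 1 everywhere and the loss reduces to the negative mean advantage; the clipping only bites once the policy has moved away from the data-collecting policy.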

How to use it?

Just run ppo.py in the corresponding folder. During training, the output is recorded in a runs folder. You can visualize it with:

tensorboard --logdir=runs
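Those curves are ordinary TensorBoard scalar logs. A minimal sketch of how such a runs folder gets written; the run name and tag here are made up, not taken from ppo.py:

```python
from torch.utils.tensorboard import SummaryWriter

# "runs/demo" is a hypothetical run name; ppo.py picks its own.
writer = SummaryWriter("runs/demo")
for step, episodic_return in enumerate([0.5, 1.0, 1.5]):
    # Each add_scalar call appends one point to a named curve.
    writer.add_scalar("charts/episodic_return", episodic_return, step)
writer.close()
```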

Note

  • env.py contains a "filling grid" environment with several grid cells. If the agent fills a blank cell, it receives a reward of +1; otherwise the reward is -1.
  • When using multiple GPUs, the number of GPUs must be less than or equal to the number of parallel environments.
  • I am using gym.vector.AsyncVectorEnv to create parallel environments with multiprocessing. However, debugging a multiprocessing program is complicated, so I advise switching to gym.vector.SyncVectorEnv during debugging, which steps the environments sequentially in a single process.
  • The current multi-GPU support works but is quite slow. I will improve it.
  • New: PPO+Transformer is based on the "MultiDiscrete Action Space - Sequential Sampling - Single GPU" configuration. It is a rough implementation, but it does work.
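The async-to-sync switch mentioned above is a one-line change, since both vector envs share the same interface. A sketch using a standard Gym environment as a stand-in for the one defined in env.py:

```python
import gym

def make_env():
    # Stand-in for the custom environment in env.py; any Gym env works.
    return gym.make("CartPole-v1")

num_envs = 4

# Training: multiprocessing-based parallel sampling.
# envs = gym.vector.AsyncVectorEnv([make_env for _ in range(num_envs)])

# Debugging: drop-in replacement that steps the envs sequentially
# in the current process, so breakpoints and stack traces work normally.
envs = gym.vector.SyncVectorEnv([make_env for _ in range(num_envs)])

obs = envs.reset()
envs.close()
```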

Citation

Please cite the following work:

Li, Z. (2022). Use Reinforcement Learning to Generate Testing Commands for Onboard Software of Small Satellites.

The RL algorithms from that work are in StarCycle/TestCommandGeneration.
