PPO Dash: Improving Generalization in Deep Reinforcement Learning


Code for reproducing the results found in PPO Dash: Improving Generalization in Deep Reinforcement Learning


PPO-Dash is a modified version of the PPO algorithm that utilizes the following optimizations and best practices (two of which are sketched below):

  • Action Space Reduction
  • Frame Stack Reduction
  • Large Scale Hyperparameters
  • Vector Observations
  • Normalized Observations
  • Reward Hacking
  • Recurrent Memory
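
The paper describes each of these in detail. As a rough illustration of how two of them, action space reduction and normalized observations, are commonly implemented as Gym-style wrappers, consider the sketch below. The action subset, class names, and constants here are hypothetical placeholders and are not taken from this codebase or the paper:

```python
import gym
import numpy as np


class ReducedActionWrapper(gym.ActionWrapper):
    """Expose a small discrete action set instead of the environment's
    full multi-discrete space. The subset below is illustrative only."""

    ACTIONS = [
        [1, 0, 0, 0],  # move forward
        [1, 0, 1, 0],  # move forward + jump
        [0, 1, 0, 0],  # rotate camera one way
        [0, 2, 0, 0],  # rotate camera the other way
    ]

    def __init__(self, env):
        super().__init__(env)
        self.action_space = gym.spaces.Discrete(len(self.ACTIONS))

    def action(self, act):
        # Translate the reduced discrete action back to the full space.
        return self.ACTIONS[act]


class NormalizedObsWrapper(gym.ObservationWrapper):
    """Normalize observations with a running mean and variance."""

    def __init__(self, env, eps=1e-8, clip=10.0):
        super().__init__(env)
        shape = env.observation_space.shape
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 0
        self.eps, self.clip = eps, clip

    def observation(self, obs):
        obs = np.asarray(obs, dtype=np.float64)
        # Welford-style incremental update of per-element statistics.
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.var += (delta * (obs - self.mean) - self.var) / self.count
        normed = (obs - self.mean) / np.sqrt(self.var + self.eps)
        return np.clip(normed, -self.clip, self.clip)
```

Wrapping an environment as `NormalizedObsWrapper(ReducedActionWrapper(base_env))` would then present the agent with a four-action discrete space and roughly zero-mean, unit-variance observations.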

PPO-Dash was able to solve the first 10 levels of the Obstacle Tower Environment without the need for demonstrations or curiosity-based algorithmic enhancements.

The version of PPO-Dash described in the technical paper placed 2nd in Round One of the Obstacle Tower Challenge with an average score of 10. We were able to reproduce this score in Round Two of the challenge with a minor modification (randomizing the themes during training). With the addition of demonstrations, we placed 4th overall with a score of 10.8.

Reproducing Results

To reproduce the results listed in the paper and for Round One of the competition, see ReproduceRound1.


This codebase derives from pytorch-a2c-ppo-acktr (commit 8258f95).


If you use PPO-Dash in your research, we ask that you cite the technical report as a reference.
