In this notebook we solve the Pendulum environment using PPO. We'll use a simple multi-layer perceptron as the function approximator for both the policy and the q-function.
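To make the architecture concrete, here is a minimal sketch of the kind of MLP approximators described above. This is an illustration only: the layer sizes, initialization, and function names are assumptions and are not taken from the actual `ppo.py` implementation.

```python
import numpy as np

# Hypothetical MLP sketch; layer widths (64, 64) are an assumption,
# not taken from ppo.py.
def init_mlp(sizes, seed=0):
    """Initialize weight/bias pairs for each layer of an MLP."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Forward pass with tanh nonlinearities on the hidden layers."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

obs_dim, act_dim = 3, 1  # Pendulum observation and action dimensions

# Policy network: maps an observation to an action mean.
policy = init_mlp([obs_dim, 64, 64, act_dim])
# Q-function network: maps an (observation, action) pair to a scalar value.
qfunc = init_mlp([obs_dim + act_dim, 64, 64, 1], seed=1)

s = np.zeros(obs_dim)
a = mlp_forward(policy, s)
q = mlp_forward(qfunc, np.concatenate([s, a]))
print(a.shape, q.shape)  # → (1,) (1,)
```

In practice these forward passes would be wrapped in a differentiable framework so the PPO updates can backpropagate through them.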
This notebook periodically generates GIFs so that we can inspect how training is progressing.
After a few hundred episodes, this is what you can expect:
ppo.py