In this notebook we solve the Pendulum environment using PPO. We'll use a simple multi-layer perceptron as the function approximator for both the policy and the q-function.
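To make the architecture concrete, here is a minimal sketch of the kind of MLP approximators described above. This is an illustration only: the layer sizes, initialization, and function names are assumptions and are not taken from the actual `ppo.py` implementation.

```python
import numpy as np

# Hypothetical MLP sketch; layer widths (64, 64) are an assumption,
# not taken from ppo.py.
def init_mlp(sizes, seed=0):
    """Initialize weight/bias pairs for each layer of an MLP."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Forward pass with tanh nonlinearities on the hidden layers."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

obs_dim, act_dim = 3, 1  # Pendulum observation and action dimensions

# Policy network: maps an observation to an action mean.
policy = init_mlp([obs_dim, 64, 64, act_dim])
# Q-function network: maps an (observation, action) pair to a scalar value.
qfunc = init_mlp([obs_dim + act_dim, 64, 64, 1], seed=1)

s = np.zeros(obs_dim)
a = mlp_forward(policy, s)
q = mlp_forward(qfunc, np.concatenate([s, a]))
print(a.shape, q.shape)  # → (1,) (1,)
```

In practice these forward passes would be wrapped in a differentiable framework so the PPO updates can backpropagate through them.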
This notebook periodically generates GIFs so that we can inspect how training is progressing.
After a few hundred episodes, this is what you can expect:
ppo.py