In this notebook we solve the PongDeterministic-v4 environment using a TD actor-critic algorithm with PPO policy updates.
We use convolutional neural nets (without pooling) as our function approximators for the state value function v(s) and the updateable policy π(a|s); see AtariFunctionApproximator (keras_gym.value_functions.AtariFunctionApproximator).
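To make the PPO update concrete, here is a minimal sketch of the clipped surrogate objective that PPO maximizes. This is a generic NumPy illustration of the standard formula, not the library's internal implementation; the function name and signature are hypothetical.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    new and old policies; advantages: estimated advantages A(s, a).
    """
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    advantages = np.asarray(advantages)

    # Clip the ratio to [1 - eps, 1 + eps] and take the pessimistic bound
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new policy equals the old one, the ratio is 1 everywhere and the objective reduces to the mean advantage; the clipping only kicks in once the policy moves too far from the one that collected the data.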
This notebook periodically generates GIFs, so that we can inspect how the training is progressing.
After a few hundred episodes, this is what you can expect:
To interact with the notebook in Google Colab, hit the "Open in Colab" button below.