In this notebook we solve the PongDeterministic-v4 environment using a TD actor-critic algorithm with PPO policy updates.
We use convolutional neural nets (without pooling) as our function approximators for the state value function v(s) and the updateable policy π(a|s); see AtariFunctionApproximator (keras_gym.value_functions.AtariFunctionApproximator).
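To make the PPO update concrete, here is a minimal sketch of the clipped surrogate objective that PPO maximizes. This is a generic NumPy illustration of the standard formula, not the library's internal implementation; the function name and signature are hypothetical.

```python
import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """Standard PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    new and old policies; advantages: estimated advantages A(s, a).
    """
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    advantages = np.asarray(advantages)

    # Clip the ratio to [1 - eps, 1 + eps] and take the pessimistic bound
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new policy equals the old one, the ratio is 1 everywhere and the objective reduces to the mean advantage; the clipping only kicks in once the policy moves too far from the one that collected the data.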
This notebook periodically generates GIFs, so that we can inspect how the training is progressing.
After a few hundred episodes, this is what you can expect:
To interact with the notebook in Google Colab, hit the "Open in Colab" button below.