
Atari 2600: Pong with PPO

In this notebook we solve the PongDeterministic-v4 environment using a TD actor-critic algorithm with PPO policy updates.
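Concretely, the PPO update maximizes the clipped surrogate objective

    L(θ) = E[min(ρ_t A_t, clip(ρ_t, 1 − ε, 1 + ε) A_t)],

where ρ_t = π_θ(a_t|s_t) / π_θold(a_t|s_t) is the probability ratio between the updated policy and the behavior policy, and A_t is a TD advantage estimate. Below is a minimal NumPy sketch of this objective; it is not keras-gym's implementation, and the function and argument names are illustrative.

.. code:: python

    import numpy as np

    def ppo_clip_objective(logp_new, logp_old, advantages, epsilon=0.2):
        """Clipped PPO surrogate objective (to be maximized).

        logp_new:   log π_θ(a_t|s_t) under the current policy
        logp_old:   log π_θold(a_t|s_t) under the behavior policy
        advantages: TD advantage estimates A_t
        """
        ratio = np.exp(logp_new - logp_old)  # ρ_t = π_new / π_old
        clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
        # Take the pessimistic (lower) bound so each update stays
        # close to the behavior policy.
        return np.mean(np.minimum(ratio * advantages, clipped * advantages))

Gradient ascent on this objective (equivalently, descent on its negative) keeps each policy update inside a soft trust region around the behavior policy.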

We use convolutional neural nets (without pooling) as our function approximators for the state value function v(s) and the updateable policy π(a|s); see AtariFunctionApproximator (keras_gym.value_functions.AtariFunctionApproximator).
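The exact architecture lives in keras-gym's AtariFunctionApproximator. As a rough sketch of the idea only (the layer sizes below follow the classic Atari conventions and are assumptions, not the library's exact configuration), a shared strided-convolution torso feeds two heads, one for v(s) and one for the policy logits.

.. code:: python

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_actor_critic(num_actions, frame_shape=(105, 80, 3)):
        """Shared conv torso (strided convs, no pooling) with two heads:
        a scalar state value v(s) and action logits for π(a|s)."""
        X = layers.Input(shape=frame_shape)        # stacked, preprocessed frames
        h = layers.Lambda(lambda x: x / 255.0)(X)  # scale pixels to [0, 1]
        h = layers.Conv2D(16, 8, strides=4, activation='relu')(h)
        h = layers.Conv2D(32, 4, strides=2, activation='relu')(h)
        h = layers.Flatten()(h)
        h = layers.Dense(256, activation='relu')(h)
        v = layers.Dense(1, name='value')(h)                  # v(s)
        logits = layers.Dense(num_actions, name='logits')(h)  # π(a|s) before softmax
        return tf.keras.Model(inputs=X, outputs=[logits, v])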

This notebook periodically generates GIFs, so that we can inspect how the training is progressing.
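The notebook's own GIF helper is part of keras-gym; the snippet below is only a generic sketch of the idea using imageio (the rollout loop, the policy call, and the old gym step/render API are assumptions of this sketch, not the notebook's exact code).

.. code:: python

    import imageio

    def record_episode_gif(env, policy, filepath, fps=30):
        """Roll out one episode with the current policy and save the
        rendered frames as an animated GIF to inspect training progress."""
        frames = []
        s = env.reset()
        done = False
        while not done:
            frames.append(env.render(mode='rgb_array'))  # raw RGB frame
            a = policy(s)                                # sample a ~ π(a|s)
            s, r, done, info = env.step(a)
        imageio.mimsave(filepath, frames, fps=fps)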

After a few hundred episodes, this is what you can expect:

[Animated GIF: beating Atari 2600 Pong after a few hundred episodes.]

The notebook can be viewed in a new tab, or run interactively in Google Colab via the "Open in Colab" button below.

Open in Colab