<a href="https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/breakout.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RL Training on Atari Games

GitHub repo: [https://github.com/hill-a/stable-baselines](https://github.com/hill-a/stable-baselines)

Medium article: [https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82](https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82)

[RL Baselines Zoo](https://github.com/araffin/rl-baselines-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines.

It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.

Documentation is available online: [https://stable-baselines.readthedocs.io/](https://stable-baselines.readthedocs.io/)

# Install Dependencies

The full list of dependencies can be found in the [README](https://github.com/hill-a/stable-baselines).

```
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev zlib1g-dev
```


```
pip install stable-baselines[mpi]
```

In [0]:
# Stable Baselines only supports TensorFlow 1.x for now
%tensorflow_version 1.x
!pip install stable-baselines[mpi]==2.10.2


## Import policy, RL agent, ...

In [0]:
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import VecFrameStack
from stable_baselines import A2C

## Training on Breakout

We will use the Atari wrapper, which downsamples each frame and converts it to grayscale.

About Atari preprocessing: [Frame Skipping and Pre-Processing for Deep Q-Networks on Atari 2600 Games](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/)
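
To make the two preprocessing steps concrete, here is a minimal pure-Python sketch on a toy 4x4 frame. It is an illustration only, not the wrapper's actual code: the helper names `to_grayscale` and `downsample` are hypothetical, the luminance weights are the common 0.299/0.587/0.114 convention, and block averaging stands in for whatever resizing method the real wrapper uses.

```python
def to_grayscale(frame):
    """Convert rows of (r, g, b) tuples into rows of luminance values."""
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in frame
    ]


def downsample(gray, factor):
    """Shrink an image by averaging non-overlapping factor x factor blocks."""
    height, width = len(gray), len(gray[0])
    out = []
    for i in range(0, height - height % factor, factor):
        row = []
        for j in range(0, width - width % factor, factor):
            block = [
                gray[i + di][j + dj]
                for di in range(factor)
                for dj in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Toy 4x4 RGB frame: left half white, right half black (made-up data).
white, black = (255, 255, 255), (0, 0, 0)
frame = [[white, white, black, black] for _ in range(4)]

small = downsample(to_grayscale(frame), factor=2)
print(len(small), "x", len(small[0]))  # the 4x4 frame becomes 2x2
```

The point is the shape of the pipeline: three color channels collapse into one, and the spatial resolution drops, so the CNN policy sees a much smaller input.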

In [0]:
# make_atari_env creates the Atari environments and applies the standard wrappers.
# We use 16 environments, stepped together as one vectorized environment
env = make_atari_env('BreakoutNoFrameskip-v4', num_env=16, seed=0)
# Stack 4 frames
env = VecFrameStack(env, n_stack=4)
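
Frame stacking gives the agent a short history, so it can infer motion (for example the ball's direction) from a single observation. The sketch below shows the idea with a `deque`; it is a simplified stand-in, not the `VecFrameStack` implementation, and the `FrameStack` class and its integer "frames" are hypothetical placeholders.

```python
from collections import deque


class FrameStack:
    """Toy frame stacker: keeps the n most recent frames, oldest first."""

    def __init__(self, n_stack, blank_frame):
        self.n_stack = n_stack
        self.blank_frame = blank_frame
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Pad the history with blank frames, then add the first real frame.
        self.frames.clear()
        for _ in range(self.n_stack - 1):
            self.frames.append(self.blank_frame)
        self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(new_frame)
        return list(self.frames)


# Integers stand in for image frames to keep the example readable.
stack = FrameStack(n_stack=4, blank_frame=0)
obs_after_reset = stack.reset(1)
obs_after_steps = [stack.step(f) for f in (2, 3, 4, 5)][-1]

print(obs_after_reset)   # [0, 0, 0, 1]
print(obs_after_steps)   # [2, 3, 4, 5]
```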

Define and train the agent. We will be using [A2C](https://stable-baselines.readthedocs.io/en/docs/modules/a2c.html), a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C).

It runs at ~500 FPS, which is roughly 3 hours of training for 5 million steps.
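
The estimate above is simple arithmetic, sketched here so you can plug in your own throughput. The function names are hypothetical, and the update count assumes A2C collects `n_steps=5` steps per environment per update, which is an assumption about the library default rather than something stated in this notebook.

```python
def training_hours(total_timesteps, fps):
    """Wall-clock training time in hours at a constant frames-per-second rate."""
    return total_timesteps / fps / 3600.0


def num_updates(total_timesteps, num_env, n_steps):
    """Number of gradient updates when each update uses num_env * n_steps samples."""
    return total_timesteps // (num_env * n_steps)


hours = training_hours(5_000_000, 500)
updates = num_updates(5_000_000, num_env=16, n_steps=5)  # n_steps=5 is assumed

print(f"{hours:.2f} hours")  # 2.78 hours
print(updates)               # 62500
```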

If you don't have the patience to wait, you can download a pretrained agent [here](https://drive.google.com/open?id=1x2IXBB4OIyxqY3-BW-PV0gijnJckprLq).

In [0]:
model = A2C(CnnPolicy, env, lr_schedule='constant', verbose=1)
model.learn(total_timesteps=int(5e6))

Save the trained agent, then download it and enjoy it on your local machine.

In [0]:
from google.colab import files

In [0]:
model.save("breakout_a2c")
files.download("breakout_a2c.zip")

Here, we demonstrate how to load a pretrained agent, attach the environment, and continue training.

In [0]:
model = A2C.load("breakout_a2c.zip", lr_schedule='constant', verbose=1)
model.set_env(env)
model.learn(total_timesteps=int(5e6))

You can also upload a trained agent from your local machine.

In [0]:
files.upload()
print("done")

Saving breakout_ppo2_5.zip to breakout_ppo2_5.zip
done


(Optional) Rename the uploaded file before loading it

In [0]:
!mv breakout_a2c_1.zip breakout_a2c.zip
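
If you prefer to stay in Python rather than shell out to `mv`, the same rename can be sketched with `pathlib`. The helper name `rename_upload` is hypothetical, and the demo runs in a temporary directory with a placeholder file so it does not touch your real uploads.

```python
import tempfile
from pathlib import Path


def rename_upload(src, dst):
    """Rename src to dst, failing loudly if the uploaded file is missing."""
    src, dst = Path(src), Path(dst)
    if not src.exists():
        raise FileNotFoundError(f"uploaded file not found: {src}")
    src.replace(dst)  # overwrites dst if it already exists
    return dst


# Demo on a placeholder file inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    uploaded = Path(tmp) / "breakout_a2c_1.zip"
    uploaded.write_bytes(b"placeholder")
    target = rename_upload(uploaded, Path(tmp) / "breakout_a2c.zip")
    renamed_ok = target.exists() and not uploaded.exists()

print(renamed_ok)
```

In the notebook itself you would call `rename_upload("breakout_a2c_1.zip", "breakout_a2c.zip")` directly in the working directory.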

Load the agent and continue training!

In [0]:
model = A2C.load("breakout_a2c", lr_schedule='constant', verbose=1)

Loading a model without an environment, this model cannot be trained until it has a valid environment.


In [0]:
model.set_env(env)

In [0]:
model.learn(total_timesteps=int(1e6))