# Stable Baselines, a Fork of OpenAI Baselines - Train on Atari Games

GitHub Repo: [https://github.com/hill-a/stable-baselines](https://github.com/hill-a/stable-baselines)

Medium article: [https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82](https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82)

# Install Dependencies and Stable Baselines Using Pip

The full list of dependencies can be found in the [README](https://github.com/hill-a/stable-baselines).

```
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev zlib1g-dev
```

```
pip install stable-baselines
```

In [0]:
!apt install cmake libopenmpi-dev zlib1g-dev
!pip install stable-baselines==2.1.1

## Import policy, RL agent, ...

In [0]:
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.policies import CnnPolicy
from stable_baselines.common.vec_env import VecFrameStack
from stable_baselines import ACER

## Training on Atari

We will use the Atari wrapper, which downsamples each frame and converts it to grayscale.

About Atari preprocessing: [Frame Skipping and Pre-Processing for Deep Q-Networks on Atari 2600 Games](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/)

![Pong](https://cdn-images-1.medium.com/max/800/1*UHYJE7lF8IDZS_U5SsAFUQ.gif)
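
To make the preprocessing concrete, here is a minimal NumPy sketch of the two steps mentioned above: grayscale conversion and downsampling. It is not the wrapper's actual implementation. The function name `preprocess_frame`, the luminance weights, the 84x84 target size, and the nearest-neighbour sampling are all assumptions chosen for illustration.

```python
import numpy as np


def preprocess_frame(frame, size=84):
    """Convert an RGB frame (H, W, 3) to a (size, size) uint8 grayscale image.

    Hypothetical helper for illustration only; not part of stable-baselines.
    """
    # Weighted sum of the colour channels (standard luminance weights).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights
    # Nearest-neighbour downsampling: pick evenly spaced rows and columns.
    height, width = gray.shape
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    small = gray[np.ix_(rows, cols)]
    return np.clip(np.rint(small), 0, 255).astype(np.uint8)


# A random array standing in for a raw 210x160 RGB Atari frame.
raw = np.random.default_rng(0).integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
obs = preprocess_frame(raw)
print(obs.shape, obs.dtype)
```

A real wrapper would typically use a proper image-resizing routine rather than index sampling; the point here is only that each observation shrinks from three colour channels at full resolution to a single small grayscale image.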

In [0]:
# An environment generator already exists that creates and wraps Atari environments correctly.
env = make_atari_env('PongNoFrameskip-v4', num_env=4, seed=0)
# Stack 4 frames
env = VecFrameStack(env, n_stack=4)
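
Stacking frames gives the policy a short history, so it can infer motion (for example the direction of the ball) from a single observation. The sketch below illustrates the idea for one environment with plain NumPy. The `FrameStacker` class is hypothetical and is not how `VecFrameStack` is implemented; in particular, the real wrapper may initialise the history differently.

```python
from collections import deque

import numpy as np


class FrameStacker:
    """Keep the last n_stack frames and expose them as one observation.

    Hypothetical helper for illustration; not part of stable-baselines.
    """

    def __init__(self, n_stack=4):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, frame):
        # At the start of an episode, fill the history with the first frame.
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(frame)
        return self.observation()

    def step(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Stack along the last axis: (H, W, n_stack).
        return np.stack(list(self.frames), axis=-1)


stacker = FrameStacker(n_stack=4)
obs = stacker.reset(np.zeros((84, 84), dtype=np.uint8))
obs = stacker.step(np.ones((84, 84), dtype=np.uint8))
print(obs.shape)   # (84, 84, 4)
print(obs[0, 0])   # [0 0 0 1]
```

The newest frame always sits in the last channel, and the oldest one is discarded on every step.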

In [0]:
model = ACER(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=10000)

Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Colocations handled automatically by placer.
----------------------------------
| avg_norm_adj        | 0        |
| avg_norm_g          | 0.0868   |
| avg_norm_grads_f    | 0.0868   |
| avg_norm_k          | 2.45     |
| avg_norm_k_dot_g    | 0.0868   |
| entropy             | 151      |
| explained_variance  | -0.0638  |
| fps                 | 0        |
| loss                | -1.49    |
| loss_bc             | -0       |
| loss_f              | 0.014    |
| loss_policy         | 0.014    |
| loss_q              | 0.000194 |
| mean_episode_length | 0        |
| mean_episode_reward | 0        |
| norm_grads          | 0.0287   |
| norm_grads_policy   | 0.0153   |
| norm_grads_q        | 0.0243   |
| total_timesteps     | 0        |
----------------------------------
----------------------------------
| avg_norm_adj        | 0.109    |
| avg_norm_g          | 1.0

<stable_baselines.acer.acer_simple.ACER at 0x7fc41cc4c588>

## Download / Upload Trained Agent and Continue Training

Save and download the trained model

In [0]:
from google.colab import files

In [0]:
model.save("acer_pong")
files.download("acer_pong.pkl")

Upload the trained agent from your local machine

In [0]:
files.upload()

{}

In [0]:
!du -h acer*

6.6M	acer_pong.pkl


Load the agent and attach an environment with `set_env`; you can then continue training

In [0]:
trained_model = ACER.load("acer_pong.pkl", verbose=1)
env = make_atari_env('PongNoFrameskip-v4', num_env=4, seed=0)
env = VecFrameStack(env, n_stack=4)
trained_model.set_env(env)

Loading a model without an environment, this model cannot be trained until it has a valid environment.


  result = entry_point.load(False)


In [0]:
trained_model.learn(int(0.5e6))

----------------------------------
| avg_norm_adj        | 0.0126   |
| avg_norm_g          | 0.391    |
| avg_norm_grads_f    | 0.382    |
| avg_norm_k          | 2.49     |
| avg_norm_k_dot_g    | 0.396    |
| entropy             | 150      |
| explained_variance  | -5.91    |
| fps                 | 0        |
| loss                | -1.47    |
| loss_bc             | -0       |
| loss_f              | 0.0321   |
| loss_policy         | 0.0321   |
| loss_q              | 0.0061   |
| mean_episode_length | 0        |
| mean_episode_reward | 0        |
| norm_grads          | 0.151    |
| norm_grads_policy   | 0.0705   |
| norm_grads_q        | 0.132    |
| total_timesteps     | 0        |
----------------------------------
----------------------------------
| avg_norm_adj        | 0.0988   |
| avg_norm_g          | 1.55     |
| avg_norm_grads_f    | 1.49     |
| avg_norm_k          | 2.53     |
| avg_norm_k_dot_g    | 1.6      |
| entropy             | 149      |
| explained_variance

In [0]:
trained_model.save("acer_pong_3")
files.download("acer_pong_3.pkl")