<a href="https://colab.research.google.com/github/ZacharyZekaiXu/ZekaiXu_CrossmodalRecognition/blob/main/origin_pong_games.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Baselines3 - Train on Atari Games

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.

It also provides basic scripts for training and evaluating agents, tuning hyperparameters, and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

In [None]:
!pip install stable-baselines3[extra] ale-py==0.7.4

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting stable-baselines3[extra]
  Downloading stable_baselines3-1.6.0-py3-none-any.whl (177 kB)
Collecting ale-py==0.7.4
  Downloading ale_py-0.7.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
Collecting gym==0.21
  Downloading gym-0.21.0.tar.gz (1.5 MB)
Collecting protobuf~=3.19.0
  Downloading protobuf-3.19.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Collecting autorom[accept-rom-license]~=0.4.2
  Downloading AutoROM-0.4.2-py3-none-any.whl (16 kB)
Collecting AutoROM.accept-rom-license
  Downloading AutoROM.accept-rom-license-0.4.2.tar.gz (9.8 kB)
  Installing build dependencies ...

## Import policy, RL agent, ...

In [None]:
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

## Training on Atari

We will use the Atari wrapper, which downsamples each frame and converts it to grayscale.

About Atari preprocessing: [Frame Skipping and Pre-Processing for Deep Q-Networks on Atari 2600 Games](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/)
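
To make the preprocessing concrete, here is a minimal NumPy sketch of the two operations mentioned above: grayscale conversion and downsampling. It is an illustration, not the wrapper's own code. The helper name `preprocess_frame`, the luminance weights, and the nearest-neighbour resizing are assumptions of this sketch (the real wrapper relies on OpenCV for both steps), and the input is a random stand-in for a raw 210x160 RGB frame.

```python
import numpy as np


def preprocess_frame(frame: np.ndarray, size: int = 84) -> np.ndarray:
    """Convert an RGB frame (H, W, 3) into a grayscale (size, size, 1) uint8 image.

    Hypothetical helper for illustration only.
    """
    # Luminance-weighted grayscale conversion (weights are an assumption here).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights
    # Nearest-neighbour downsampling: pick evenly spaced rows and columns.
    rows = np.arange(size) * frame.shape[0] // size
    cols = np.arange(size) * frame.shape[1] // size
    small = gray[rows][:, cols]
    # Clip before casting so rounding can never overflow uint8.
    small = np.clip(small.round(), 0, 255).astype(np.uint8)
    # Keep a trailing channel axis so frames can be stacked later.
    return small[..., None]


# Random stand-in for a raw Atari frame (210 rows, 160 columns, RGB).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
obs = preprocess_frame(raw)
print(raw.shape, "->", obs.shape)  # (210, 160, 3) -> (84, 84, 1)
```

The payoff is a much smaller input: each observation shrinks from 210 x 160 x 3 = 100,800 values to 84 x 84 = 7,056.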

![Pong](https://cdn-images-1.medium.com/max/800/1*UHYJE7lF8IDZS_U5SsAFUQ.gif)

In [None]:
# make_atari_env creates the environments and applies the Atari preprocessing wrappers.
env = make_atari_env('PongNoFrameskip-v4', n_envs=4, seed=0)
# Stack the 4 most recent frames so the agent can infer motion.
env = VecFrameStack(env, n_stack=4)
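
A single frame does not show which way the ball is moving, which is why the last four frames are stacked into one observation. The sketch below illustrates the idea for a single environment with a small sliding window. The `FrameStack` class is a hypothetical helper, not the `VecFrameStack` implementation, and padding the history with zeros at reset is an assumption of this sketch.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last n_stack frames and expose them as one array (illustrative)."""

    def __init__(self, n_stack: int, frame_shape: tuple):
        self.n_stack = n_stack
        self.frame_shape = frame_shape
        # A bounded deque drops the oldest frame automatically.
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        # Start from an all-zero history, then append the first real frame.
        self.frames.clear()
        for _ in range(self.n_stack - 1):
            self.frames.append(np.zeros(self.frame_shape, dtype=first_frame.dtype))
        self.frames.append(first_frame)
        return self.observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        return self.observation()

    def observation(self) -> np.ndarray:
        # Concatenate along the channel axis: (84, 84, 1) x 4 -> (84, 84, 4).
        return np.concatenate(list(self.frames), axis=-1)


stack = FrameStack(n_stack=4, frame_shape=(84, 84, 1))
first = stack.reset(np.full((84, 84, 1), 1, dtype=np.uint8))
print(first.shape, first[0, 0].tolist())  # (84, 84, 4) [0, 0, 0, 1]

# Feed frames filled with 2, 3, 4, 5; the window then holds exactly those four.
for value in (2, 3, 4, 5):
    obs = stack.step(np.full((84, 84, 1), value, dtype=np.uint8))
print(obs[0, 0].tolist())  # [2, 3, 4, 5]
```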

In [None]:
# Train an A2C agent with a CNN policy for 10,000 timesteps.
model = A2C('CnnPolicy', env, verbose=1)
model.learn(total_timesteps=10000)

Using cuda device
Wrapping the env in a VecTransposeImage.
------------------------------------
| time/                 |          |
|    fps                | 155      |
|    iterations         | 100      |
|    time_elapsed       | 12       |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -1.79    |
|    explained_variance | -0.00216 |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | -0.347   |
|    value_loss         | 0.193    |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.61e+03 |
|    ep_rew_mean        | -20.2    |
| time/                 |          |
|    fps                | 236      |
|    iterations         | 200      |
|    time_elapsed       | 16       |
|    total_timesteps    | 4000     |
| train/                |          |
|    entropy_loss       | -1.78    |
|    explained_v

<stable_baselines3.a2c.a2c.A2C at 0x7f7f0a1a7cd0>
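
The `iterations` and `total_timesteps` columns in the log above are related by the rollout size. As a rough check, the arithmetic below assumes each iteration collects `n_steps` transitions from each of the `n_envs` parallel environments, with `n_steps = 5` taken as A2C's default rollout length (an assumption here, since the notebook does not set it explicitly). The function name `timesteps_after` is hypothetical.

```python
def timesteps_after(iterations: int, n_envs: int, n_steps: int) -> int:
    """Total environment steps collected after a number of training iterations."""
    return iterations * n_envs * n_steps


n_envs = 4    # from make_atari_env(..., n_envs=4)
n_steps = 5   # assumption: default A2C rollout length per environment

# Compare with the logged values: 100 iterations -> 2000, 200 iterations -> 4000.
for iterations in (100, 200):
    print(iterations, timesteps_after(iterations, n_envs, n_steps))

# Number of iterations needed to reach total_timesteps=10000.
iterations_needed = 10_000 // (n_envs * n_steps)
print(iterations_needed)  # 500
```

Under these assumptions the numbers match the log, so the 10,000-timestep run corresponds to 500 policy updates.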

## Download / Upload Trained Agent and Continue Training

Save and download the trained model.

In [None]:
from google.colab import files

In [None]:
model.save("a2c_pong")
files.download("a2c_pong.zip")


Upload a trained agent from your local machine.

In [None]:
files.upload()

In [None]:
!du -h a2c*

7.0M	a2c_pong.zip
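
Because the saved model is a zip archive, it can be sanity-checked after an upload using only the standard library. The sketch below is one possible check, not part of Stable-Baselines3: `check_archive` is a hypothetical helper, and the demonstration runs on a small stand-in archive whose single member name is a placeholder rather than the real contents of `a2c_pong.zip`.

```python
import os
import tempfile
import zipfile


def check_archive(path: str) -> list:
    """Return the member names of a zip archive, raising ValueError if it is damaged."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid zip archive")
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupt member in {path}: {bad_member}")
        return archive.namelist()


# Demonstration on a stand-in archive created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_model.zip")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("placeholder.txt", "stand-in content")
    members = check_archive(demo_path)

print(members)  # ['placeholder.txt']
```

In the notebook, the same helper could be called as `check_archive("a2c_pong.zip")` before loading the agent.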


Load the agent and attach a fresh environment; you can then continue training.

In [None]:
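# Load the saved agent (the ".zip" extension is added automatically).
trained_model = A2C.load("a2c_pong", verbose=1)
# Recreate the same wrapped environment that was used for training.
env = make_atari_env('PongNoFrameskip-v4', n_envs=4, seed=0)
env = VecFrameStack(env, n_stack=4)
trained_model.set_env(env)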

Wrapping the env in a VecTransposeImage.


In [None]:
# Continue training for another 500,000 timesteps.
trained_model.learn(int(0.5e6))

In [None]:
trained_model.save("a2c_pong_2")
files.download("a2c_pong_2.zip")