<a href="https://colab.research.google.com/github/TBKHori/Music-Recon13/blob/main/atari_games.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Baselines3 - Train on Atari Games

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## Install Dependencies and Stable Baselines3 Using Pip


```
pip install "stable-baselines3[extra]"
```

In [1]:
!pip install "stable-baselines3[extra]>=2.0.0a4"

Collecting stable-baselines3[extra]>=2.0.0a4
  Downloading stable_baselines3-2.0.0-py3-none-any.whl (178 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.4/178.4 kB 5.3 MB/s eta 0:00:00
Collecting gymnasium==0.28.1 (from stable-baselines3[extra]>=2.0.0a4)
  Downloading gymnasium-0.28.1-py3-none-any.whl (925 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 925.5/925.5 kB 29.5 MB/s eta 0:00:00
Collecting shimmy[atari]~=0.2.1 (from stable-baselines3[extra]>=2.0.0a4)
  Downloading Shimmy-0.2.1-py3-none-any.whl (25 kB)
Collecting autorom[accept-rom-license]~=0.6.0 (from stable-baselines3[extra]>=2.0.0a4)
  Downloading AutoROM-0.6.1-py3-none-any.whl (9.4 kB)
Collecting jax-jumpy>=1.0.0 (from gymnasium==0.28.1->stable-baselines3[extra]>=2.0.0a4)
  Downloading jax_jumpy-1.0.0-py3-none-any.whl (20 kB)
Collecting farama-notifications>=0.0.1 (from gymnasium==0.28.1->stable-baselines3[extra]>=2.0.0a4)
  Downlo

## Import policy, RL agent, ...

In [2]:
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack



## Training on Atari

We will use the Atari wrapper, which (among other preprocessing steps) downsamples each frame to 84x84 pixels and converts it to grayscale.

About Atari preprocessing: [Frame Skipping and Pre-Processing for Deep Q-Networks on Atari 2600 Games](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/)

![Pong](https://cdn-images-1.medium.com/max/800/1*UHYJE7lF8IDZS_U5SsAFUQ.gif)
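To make the preprocessing concrete, here is a simplified NumPy sketch of the grayscale-and-downsample step. It is not SB3's code: `preprocess_frame` is a hypothetical helper, it uses nearest-neighbour sampling where SB3's `WarpFrame` wrapper relies on OpenCV's area interpolation, and it assumes the raw 210x160 RGB frames produced by the Atari 2600 emulator.

```python
import numpy as np


def preprocess_frame(frame: np.ndarray, size: int = 84) -> np.ndarray:
    """Turn an RGB frame of shape (H, W, 3) into a (size, size) uint8 grayscale image.

    Simplified sketch, not SB3's WarpFrame: nearest-neighbour sampling is used
    here instead of OpenCV's area interpolation.
    """
    # Luminance-weighted grayscale (ITU-R BT.601 weights, as in OpenCV's RGB2GRAY)
    gray = frame[..., 0] * 0.299 + frame[..., 1] * 0.587 + frame[..., 2] * 0.114
    h, w = gray.shape
    # Pick `size` evenly spaced rows and columns (nearest-neighbour resize)
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    small = gray[rows][:, cols]
    # Round back to 8-bit pixel values
    return np.clip(np.rint(small), 0, 255).astype(np.uint8)


# A random stand-in for one raw 210x160 RGB Atari frame
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
obs = preprocess_frame(frame)
print(obs.shape, obs.dtype)  # (84, 84) uint8
```

Shrinking 210 x 160 x 3 = 100,800 input values to 84 x 84 = 7,056 cuts the size of each observation by roughly 14x before it reaches the CNN.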

In [3]:
# make_atari_env creates n_envs copies of the game and applies the standard
# Atari preprocessing wrapper (AtariWrapper) to each of them.
env = make_atari_env("PongNoFrameskip-v4", n_envs=4, seed=0)
# Stack the 4 most recent frames so the agent can perceive motion
env = VecFrameStack(env, n_stack=4)
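A minimal sketch of what frame stacking does to the observation tensor, assuming channel-last frames of shape `(n_envs, 84, 84, 1)` as produced by the Atari wrapper. `FrameStacker` is a hypothetical class modelled loosely on `VecFrameStack`, not its actual implementation: the newest frame always occupies the last channel, and an environment's history is cleared when its episode ends.

```python
import numpy as np


class FrameStacker:
    """Hypothetical sketch of channel-last frame stacking for a batch of envs.

    Keeps the `n_stack` most recent frames along the last axis, newest last.
    """

    def __init__(self, n_envs: int, height: int, width: int, n_stack: int = 4):
        self.stack = np.zeros((n_envs, height, width, n_stack), dtype=np.uint8)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        # obs has shape (n_envs, H, W, 1); older slots start out as zeros
        self.stack[...] = 0
        self.stack[..., -1:] = obs
        return self.stack.copy()

    def step(self, obs: np.ndarray, dones: np.ndarray) -> np.ndarray:
        # Shift frames one slot towards the front, discarding the oldest one
        self.stack = np.roll(self.stack, shift=-1, axis=-1)
        # Envs whose episode just ended start again from an empty history
        self.stack[dones] = 0
        # Write the newest frame into the last slot
        self.stack[..., -1:] = obs
        return self.stack.copy()


stacker = FrameStacker(n_envs=4, height=84, width=84)
out = stacker.reset(np.full((4, 84, 84, 1), 1, dtype=np.uint8))
print(out.shape)  # (4, 84, 84, 4)
out = stacker.step(np.full((4, 84, 84, 1), 2, dtype=np.uint8),
                   dones=np.array([False, False, False, True]))
print(out[0, 0, 0])  # [0 0 1 2]
print(out[3, 0, 0])  # [0 0 0 2]
# PyTorch convolutions expect channels first: (n_envs, n_stack, H, W)
print(np.transpose(out, (0, 3, 1, 2)).shape)  # (4, 4, 84, 84)
```

The final `np.transpose` mirrors the `VecTransposeImage` wrapper that SB3 adds automatically (the "Wrapping the env in a VecTransposeImage." message in the training output), because PyTorch's convolution layers expect channel-first input.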

In [4]:
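# CnnPolicy uses a convolutional feature extractor suited to image observations
model = A2C("CnnPolicy", env, verbose=1)
# A short run; 10k timesteps is far too few to learn to play Pong
model.learn(total_timesteps=10_000)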

Using cuda device
Wrapping the env in a VecTransposeImage.
------------------------------------
| time/                 |          |
|    fps                | 148      |
|    iterations         | 100      |
|    time_elapsed       | 13       |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -1.78    |
|    explained_variance | -0.158   |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | 0.0382   |
|    value_loss         | 0.000538 |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.73e+03 |
|    ep_rew_mean        | -20.5    |
| time/                 |          |
|    fps                | 227      |
|    iterations         | 200      |
|    time_elapsed       | 17       |
|    total_timesteps    | 4000     |
| train/                |          |
|    entropy_loss       | -1.77    |
|    explained_v

<stable_baselines3.a2c.a2c.A2C at 0x7f61f50a4c40>
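The counters in the log follow from simple arithmetic. The sketch below assumes A2C's default `n_steps=5` (which this notebook does not override) and the default `log_interval=100`: each iteration collects `n_envs * n_steps = 20` transitions, and the log is written after a rollout is collected but before the gradient update on it, so `n_updates` trails the iteration count by one. The helper names are hypothetical.

```python
import math

N_ENVS = 4   # make_atari_env(..., n_envs=4) in this notebook
N_STEPS = 5  # A2C's default n_steps (assumed: not overridden here)
STEPS_PER_ITERATION = N_ENVS * N_STEPS  # transitions collected per rollout


def iterations_needed(total_timesteps: int) -> int:
    # learn() keeps collecting rollouts until num_timesteps >= total_timesteps
    return math.ceil(total_timesteps / STEPS_PER_ITERATION)


def logged_counters(iteration: int, prior_updates: int = 0) -> tuple[int, int]:
    # The log is dumped after rollout `iteration` is collected but before the
    # gradient update on it, so n_updates lags the iteration count by one.
    total_timesteps = iteration * STEPS_PER_ITERATION
    n_updates = prior_updates + iteration - 1
    return total_timesteps, n_updates


print(STEPS_PER_ITERATION)        # 20
print(iterations_needed(10_000))  # 500
print(logged_counters(100))       # (2000, 99)
print(logged_counters(200))       # (4000, 199)
```

With `prior_updates=500` (the 500 updates performed by `learn(total_timesteps=10_000)`), the same arithmetic gives the `total_timesteps 2000` / `n_updates 599` pair in the continued-training log further down: `learn()` resets the timestep counter by default (`reset_num_timesteps=True`), while the update counter is restored from the saved model.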

## Download / Upload Trained Agent and Continue Training

Save the trained model and download it to your local machine

In [5]:
from google.colab import files

In [6]:
model.save("a2c_pong")
files.download("a2c_pong.zip")


Upload a trained agent from your local machine

In [6]:
files.upload()

Saving a2c_pong.zip to a2c_pong.zip


In [7]:
!du -h a2c*

14M	a2c_pong.zip
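Because the saved agent is an ordinary zip archive, Python's standard `zipfile` module can show what it contains and where the 14M goes. A minimal sketch; `list_saved_model` is a hypothetical helper, and the exact entry names depend on the SB3 version (2.x archives typically hold files such as `data`, with hyperparameters and counters, `policy.pth`, with the network weights, and `policy.optimizer.pth`, with the optimizer state).

```python
import zipfile


def list_saved_model(path: str) -> list[tuple[str, int]]:
    """Return (entry name, uncompressed size in bytes) for each file in a .zip save."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# In the notebook, after a2c_pong.zip has been saved or uploaded:
# for name, size in list_saved_model("a2c_pong.zip"):
#     print(f"{name:30s} {size / 1e6:8.2f} MB")
```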


Load the agent, attach a freshly created environment, and continue training

In [8]:
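# Load the saved agent (the ".zip" extension is added automatically)
trained_model = A2C.load("a2c_pong", verbose=1)
# Recreate the same wrapped environment that was used for training
env = make_atari_env("PongNoFrameskip-v4", n_envs=4, seed=0)
env = VecFrameStack(env, n_stack=4)
# Attach the environment so that learn() can collect new rollouts
trained_model.set_env(env)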

Wrapping the env in a VecTransposeImage.


In [9]:
trained_model.learn(total_timesteps=500_000)

------------------------------------
| time/                 |          |
|    fps                | 422      |
|    iterations         | 100      |
|    time_elapsed       | 4        |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -1.71    |
|    explained_variance | -0.107   |
|    learning_rate      | 0.0007   |
|    n_updates          | 599      |
|    policy_loss        | 0.0896   |
|    value_loss         | 0.00308  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.38e+03 |
|    ep_rew_mean        | -20.8    |
| time/                 |          |
|    fps                | 437      |
|    iterations         | 200      |
|    time_elapsed       | 9        |
|    total_timesteps    | 4000     |
| train/                |          |
|    entropy_loss       | -1.76    |
|    explained_variance | 0.0196   |
|    learning_rate      | 0.0007   |
|

<stable_baselines3.a2c.a2c.A2C at 0x7f61f50a4b20>
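The `entropy_loss` values in both logs stay close to -1.8, and a short calculation shows why that is effectively the floor. SB3 logs `entropy_loss` as the negative mean entropy of the policy's action distribution, and for Pong's 6 discrete actions (assumed here: the default minimal action set of `PongNoFrameskip-v4`) the maximum entropy, reached by a uniform policy, is ln 6 ≈ 1.792. A minimal sketch in plain Python:

```python
import math

N_ACTIONS = 6  # Pong's discrete action space (assumed: default minimal action set)

# A uniform distribution over N actions has the maximum possible entropy, ln(N)
max_entropy = math.log(N_ACTIONS)
print(round(max_entropy, 3))  # 1.792


def categorical_entropy(probs):
    # H(p) = -sum_i p_i * ln(p_i), skipping zero-probability actions
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = [1 / N_ACTIONS] * N_ACTIONS
print(round(categorical_entropy(uniform), 3))  # 1.792
```

Readings such as -1.71 and -1.76 therefore mean the policy is still nearly uniform over the 6 actions; as the agent commits to better actions, the magnitude of `entropy_loss` typically shrinks.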

In [10]:
trained_model.save("a2c_pong_2")
files.download("a2c_pong_2.zip")
