# Atari Games

Training an RL agent on Atari games is straightforward thanks to the `make_atari_env` helper function, which handles [all the preprocessing](https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/) and multiprocessing for you. To install the Atari environments and ROMs, run `pip install "gymnasium[atari,accept-rom-license]"`, or install Stable Baselines3 with `pip install "stable-baselines3[extra]"` to get these and other optional dependencies.

A complete example is available in the Stable Baselines3 documentation: [https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#id2](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#id2)
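To make the preprocessing less of a black box, here is a minimal sketch of two steps commonly applied to Atari frames: frame skipping with a pixel-wise max over the last two frames (to cope with flickering sprites), and reward clipping to the sign of the reward. It is an illustration only, not the Stable Baselines3 implementation. `ToyEmulator`, `max_and_skip`, and `clip_reward` are hypothetical names introduced here, and the toy emulator stands in for a real Atari environment.

```python
import numpy as np


class ToyEmulator:
    """Hypothetical stand-in for an Atari emulator whose sprites flicker."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        # Odd frames draw one sprite, even frames draw the other (flicker).
        self.t += 1
        frame = np.zeros((2, 2), dtype=np.uint8)
        if self.t % 2 == 1:
            frame[0, 0] = 255
        else:
            frame[1, 1] = 255
        reward = 3.0   # constant reward, only for illustration
        done = False   # the toy episode never ends
        return frame, reward, done


def max_and_skip(step_fn, action, skip=4):
    """Repeat `action` for `skip` frames.

    Returns the pixel-wise max of the last two frames, the summed reward,
    and the final done flag.
    """
    recent = []
    total_reward = 0.0
    done = False
    for _ in range(skip):
        frame, reward, done = step_fn(action)
        recent = (recent + [frame])[-2:]  # keep only the two newest frames
        total_reward += reward
        if done:
            break
    return np.max(np.stack(recent), axis=0), total_reward, done


def clip_reward(reward):
    """Clip a reward to -1.0, 0.0, or +1.0 according to its sign."""
    return float(np.sign(reward))


emulator = ToyEmulator()
frame, total_reward, done = max_and_skip(emulator.step, action=0, skip=4)
print(frame)                      # both sprites are visible after the max
print(total_reward)               # 12.0
print(clip_reward(total_reward))  # 1.0
```

Taking the max over two consecutive frames keeps both flickering sprites visible in the single frame the agent sees, while clipping keeps reward magnitudes comparable across games.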

In [1]:
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3 import A2C

In [2]:
# An environment generator already exists that creates
# and wraps Atari environments correctly.
# Here we also use multi-worker training (n_envs=4 => 4 environments)
env = make_atari_env("ALE/Pong-v5", n_envs=4, seed=0, env_kwargs={"full_action_space": False, "frameskip": 1})

# Frame-stacking with 4 frames
env = VecFrameStack(env, n_stack=4)

model = A2C("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=25_000)

A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]


Using cuda device
Wrapping the env in a VecTransposeImage.
------------------------------------
| time/                 |          |
|    fps                | 108      |
|    iterations         | 100      |
|    time_elapsed       | 18       |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -1.66    |
|    explained_variance | 0.117    |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | 0.04     |
|    value_loss         | 0.00071  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.12e+03 |
|    ep_rew_mean        | -21      |
| time/                 |          |
|    fps                | 188      |
|    iterations         | 200      |
|    time_elapsed       | 21       |
|    total_timesteps    | 4000     |
| train/                |          |
|    entropy_loss       | -1.72    |
|    explained_v

<stable_baselines3.a2c.a2c.A2C at 0x7f72d249f430>

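Frame stacking, as used in the cell above with `n_stack=4`, can be pictured with a small NumPy sketch. It assumes channel-last observations of shape `(n_envs, height, width, channels)` and keeps the newest frame in the last channels. The class `FrameStackSketch` is a hypothetical name for illustration, and clearing the history of an environment that just finished is an assumption of this sketch rather than a statement about `VecFrameStack` internals.

```python
import numpy as np


class FrameStackSketch:
    """Toy channel-last frame stacker for a batch of environments."""

    def __init__(self, n_envs, frame_shape, n_stack):
        height, width, channels = frame_shape
        self.channels = channels
        self.stacked = np.zeros(
            (n_envs, height, width, channels * n_stack), dtype=np.uint8
        )

    def reset(self, frames):
        # Start from an all-zero history with the first frame in the newest slot.
        self.stacked[...] = 0
        self.stacked[..., -self.channels:] = frames
        return self.stacked.copy()

    def step(self, frames, dones):
        # Shift older frames toward the front of the channel axis.
        self.stacked = np.roll(self.stacked, shift=-self.channels, axis=-1)
        # Assumption of this sketch: finished environments restart with an empty history.
        self.stacked[np.asarray(dones, dtype=bool)] = 0
        # Write the newest frame into the last slot.
        self.stacked[..., -self.channels:] = frames
        return self.stacked.copy()


stacker = FrameStackSketch(n_envs=2, frame_shape=(2, 2, 1), n_stack=4)
stacked_obs = stacker.reset(np.full((2, 2, 2, 1), 1, dtype=np.uint8))
stacked_obs = stacker.step(
    np.full((2, 2, 2, 1), 2, dtype=np.uint8), dones=[False, True]
)
print(stacked_obs.shape)     # (2, 2, 2, 4)
print(stacked_obs[0, 0, 0])  # [0 0 1 2]
print(stacked_obs[1, 0, 0])  # [0 0 0 2]
```

Stacking several consecutive frames lets a feed-forward CNN policy infer motion, such as the direction of the ball in Pong, which a single frame cannot convey.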
In [7]:
# Watch the trained agent play; the loop runs until the kernel is interrupted
obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render(mode="human")

KeyboardInterrupt: 