# Atari Agent Training

In this notebook we explore another environment, train an agent on it, and then share the result using the Hugging Face ecosystem.


#### Running in Colab/Kaggle

If you are running this notebook on Colab or Kaggle, uncomment the cells below and run them to install the required dependencies.



In [1]:
## Uncomment and run this cell to install the system dependencies if you are running in Google Colab or Kaggle

# !apt-get update 
# !apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb

In [2]:
## Uncomment and run this cell to install the Python dependencies if you are running in Google Colab or Kaggle

# !pip install "box2d-py==2.3.8"
# !pip install "stable-baselines3[extra]==2.1.0"
# !pip install "huggingface_sb3>=3.0"
# !pip install "moviepy==1.0.3"

## Imports

In [1]:
import gymnasium as gym

from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3 import A2C

## Create the Gym env and instantiate the agent

For this example, we will use an environment from the Atari simulator. Atari environments are simulated by the [Arcade Learning Environment](https://github.com/mgbellemare/Arcade-Learning-Environment) (ALE), which is built on the [Stella](https://github.com/stella-emu/stella) emulator.

![Pong](https://gymnasium.farama.org/_images/pong.gif)


We will use [Pong](https://gymnasium.farama.org/environments/atari/pong/) from the Atari game simulator, specifically the variant `PongNoFrameskip-v4`. For now we will not delve into the details, except to say that in this case the state is the image of the game screen and the actions are the game controller actions. Because the state is an image, we will use a different kind of policy network, `CnnPolicy`, which is based on CNNs (Convolutional Neural Networks).

In the last notebook we used DQN as the algorithm to train the agent. In this notebook we will use the [A2C algorithm](https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html). A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic ([A3C](https://arxiv.org/abs/1602.01783)).
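
To make the "advantage" in A2C a little more concrete, here is a minimal pure-Python sketch of how an actor-critic method can compute bootstrapped n-step returns and advantages from one short rollout. It is a simplification for illustration, not Stable-Baselines3's implementation: the function name `n_step_advantages` and the toy numbers are made up, and episode boundaries are ignored.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one rollout segment.

    rewards[t] and values[t] are the reward received and the critic's
    estimate V(s_t) at step t; bootstrap_value is the critic's estimate
    for the state that follows the last step of the segment.
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return can reuse the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage = how much better the outcome was than the critic expected.
    advantages = [ret - value for ret, value in zip(returns, values)]
    return returns, advantages


# Toy rollout of 3 steps (hypothetical numbers, gamma chosen for easy arithmetic)
returns, advantages = n_step_advantages(
    rewards=[0.0, 0.0, 1.0], values=[0.5, 0.5, 0.5],
    bootstrap_value=0.0, gamma=0.5,
)
print(returns)     # [0.25, 0.5, 1.0]
print(advantages)  # [-0.25, 0.0, 0.5]
```

A positive advantage means the action turned out better than the critic expected, so the actor update makes it more likely; a negative advantage makes it less likely.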

In [2]:
# There already exists an environment generator
# that will make and wrap atari environments correctly.
# Here we also use multi-worker training (n_envs=4 => 4 environments)
env_id = "PongNoFrameskip-v4"

vec_env = make_atari_env(env_id, n_envs=4, seed=0)
# Frame-stacking with 4 frames
vec_env = VecFrameStack(vec_env, n_stack=4)

model = A2C("CnnPolicy", vec_env, verbose=1)


A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]


Using cpu device
Wrapping the env in a VecTransposeImage.
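
`VecFrameStack` gives the policy a short history of frames rather than a single still image, which lets it infer motion such as the direction the ball is travelling. The idea can be sketched in pure Python with a fixed-length queue. This is a conceptual illustration only: the class `FrameStacker` is hypothetical, the frames are stand-in strings rather than image arrays, and padding the first stack with copies of the first frame is a choice of this sketch, not necessarily what SB3 does.

```python
from collections import deque


class FrameStacker:
    """Keep the most recent `n_stack` frames as one stacked observation."""

    def __init__(self, n_stack=4):
        self.n_stack = n_stack
        # A deque with maxlen drops the oldest frame automatically.
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Sketch-only choice: fill the stack with copies of the first frame.
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending pushes the oldest frame out of the stack.
        self.frames.append(new_frame)
        return list(self.frames)


stacker = FrameStacker(n_stack=4)
print(stacker.reset("f0"))  # ['f0', 'f0', 'f0', 'f0']
print(stacker.step("f1"))   # ['f0', 'f0', 'f0', 'f1']
print(stacker.step("f2"))   # ['f0', 'f0', 'f1', 'f2']
```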


## Train the Agent

It is advisable to run this in an environment with a GPU; otherwise training may take a long time.

In [3]:
%%time
# If you have a GPU, you can increase the `total_timesteps` to something like 1_000_000 i.e. one million
# It would take about 45 mins to train
# The more you train the better the result will be

model.learn(total_timesteps=20_000, log_interval=500, progress_bar=True)


------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.22e+03 |
|    ep_rew_mean        | -20.8    |
| time/                 |          |
|    fps                | 406      |
|    iterations         | 500      |
|    time_elapsed       | 24       |
|    total_timesteps    | 10000    |
| train/                |          |
|    entropy_loss       | -1.41    |
|    explained_variance | 0.933    |
|    learning_rate      | 0.0007   |
|    n_updates          | 499      |
|    policy_loss        | 0.113    |
|    value_loss         | 0.0101   |
------------------------------------


------------------------------------
| rollout/              |          |
|    ep_len_mean        | 3.14e+03 |
|    ep_rew_mean        | -20.9    |
| time/                 |          |
|    fps                | 393      |
|    iterations         | 1000     |
|    time_elapsed       | 50       |
|    total_timesteps    | 20000    |
| train/                |          |
|    entropy_loss       | -1.76    |
|    explained_variance | 0.329    |
|    learning_rate      | 0.0007   |
|    n_updates          | 999      |
|    policy_loss        | -0.216   |
|    value_loss         | 0.102    |
------------------------------------


CPU times: user 2min 56s, sys: 5.28 s, total: 3min 1s
Wall time: 50.8 s


<stable_baselines3.a2c.a2c.A2C at 0x13173ceb0>
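
The log above is consistent with each A2C iteration consuming `n_envs × n_steps` transitions. The sketch below redoes that arithmetic, assuming SB3's default `n_steps=5` for A2C (the notebook does not set it explicitly), and uses the roughly 393 steps per second reported in this particular run to guess how long a longer run might take. The helper names are hypothetical, and throughput will differ on other hardware.

```python
def n_iterations(total_timesteps, n_envs, n_steps=5):
    """Each iteration collects n_steps transitions from each of n_envs environments."""
    return total_timesteps // (n_envs * n_steps)


def estimated_minutes(total_timesteps, fps):
    """Rough wall-clock estimate, assuming throughput stays constant."""
    return total_timesteps / fps / 60


print(n_iterations(10_000, n_envs=4))  # 500, matching the first log table
print(n_iterations(20_000, n_envs=4))  # 1000, matching the second log table
print(round(estimated_minutes(1_000_000, fps=393)))  # 42
```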

## Generate a Video

In [4]:
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

video_length = 1000
video_folder = "videos/"

vec_env = make_atari_env(env_id, n_envs=1, seed=0)
# Frame-stacking with 4 frames
vec_env = VecFrameStack(vec_env, n_stack=4)

# Record the video starting at the first step
vec_env = VecVideoRecorder(vec_env, video_folder,
                       record_video_trigger=lambda x: x == 0, video_length=video_length,
                       name_prefix=f"a2c-agent-{env_id}")

# Reset once, after wrapping, and keep the observation it returns
obs = vec_env.reset()
for _ in range(video_length + 1):
  action, _state = model.predict(obs)
  obs, _, _, _ = vec_env.step(action)
# Save the video
vec_env.close()




Saving video to /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4
Moviepy - Building video /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4.
Moviepy - Writing video /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4



                                                                 

Moviepy - Done !
Moviepy - video ready /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4


## Watch the Video

For better results, and if you have a GPU, increase `total_timesteps` to something like `1_000_000` in the `model.learn` call above.

In [5]:
from IPython.display import HTML
from base64 import b64encode
# Read the recorded video; the context manager closes the file afterwards
with open('videos/a2c-agent-PongNoFrameskip-v4-step-0-to-step-1000.mp4', 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)
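
The cell above embeds the video by base64-encoding the file into a `data:` URL, with the file name hard-coded. As a hedged variation, the sketch below picks the most recently modified `.mp4` in the folder and builds the same kind of tag. The helper names `latest_mp4` and `video_tag` are hypothetical and not part of any library.

```python
from base64 import b64encode
from pathlib import Path


def latest_mp4(folder="videos"):
    """Return the most recently modified .mp4 in `folder`, or None if there is none."""
    files = sorted(Path(folder).glob("*.mp4"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def video_tag(mp4_bytes, width=400):
    """Build an HTML <video> tag with the video inlined as a base64 data URL."""
    data_url = "data:video/mp4;base64," + b64encode(mp4_bytes).decode()
    return (f'<video width={width} controls>'
            f'<source src="{data_url}" type="video/mp4"></video>')


# In a notebook cell (assumes a video has already been recorded):
# from IPython.display import HTML
# path = latest_mp4("videos")
# HTML(video_tag(path.read_bytes()))
```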

## Share the Agent and Video on Hugging Face

### First, let us log in to our Hugging Face account

In [6]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import login # To log in to our Hugging Face account so that we can upload models to the Hub.

login()


**IMPORTANT**
Some users have reported the following error when running the `package_to_hub` upload function.

```
"Token is required (write-access action) but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens."
```

In that case, the following command will help you overcome the issue:

```
import huggingface_hub

# Replace the placeholder string with your own Hugging Face access token
huggingface_hub.login(token="<YOUR_HF_TOKEN>",
                      write_permission=True,
                      add_to_git_credential=True)
```

Alternatively, run the following command from a shell in which the `venv` or `conda` environment for this repository is activated, and then follow the prompts to set the Hugging Face token.

```
huggingface-cli login

```

### Push to HuggingFace

You can execute the code below to push the trained agent to the Hugging Face Hub. Towards the end of the output there will be a link to the hosted model that you can share with your friends and family. Share the URL without the trailing path "tree/main" so that it takes them to the Model Card tab, where they can see the animation. For example, in my case it will be:<br/>
`https://huggingface.co/ashiskb/a2c-Atari-Pong/`<br/>
instead of<br/>
`https://huggingface.co/ashiskb/a2c-Atari-Pong/tree/main/`
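
If you prefer not to trim the link by hand, the rule above can be sketched as a small string helper. The function name `model_card_url` is hypothetical and not part of `huggingface_sb3`.

```python
def model_card_url(url):
    """Strip a trailing 'tree/main' so the link opens the Model Card tab."""
    trimmed = url.rstrip("/")
    suffix = "/tree/main"
    if trimmed.endswith(suffix):
        trimmed = trimmed[: -len(suffix)]
    return trimmed + "/"


print(model_card_url("https://huggingface.co/ashiskb/a2c-Atari-Pong/tree/main/"))
# https://huggingface.co/ashiskb/a2c-Atari-Pong/
```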

NOTE: At the time of writing this book, there is a bug in `push_to_hub` that results in the sample video not being created. You can refer to the bug here: https://github.com/huggingface/huggingface_sb3/issues/33



In [7]:
eval_env = make_atari_env(env_id, n_envs=1, seed=0)
# Frame-stacking with 4 frames
eval_env = VecFrameStack(eval_env, n_stack=4)

# package_to_hub saves and evaluates the model, generates a model card and
# records a replay video of your agent before pushing the repo to the hub

# Please note repo_id is of the form <huggingface_id>/<name of repo>
# you will need to change this to "<your_huggingface_id>/a2c-Atari-Pong"

package_to_hub(model=model, # Our trained model
               model_name="A2C-Atari-Pong", # The name of our trained model
               model_architecture="A2C", # The model architecture we used: in our case A2C
               env_id=env_id, # Name of the environment
               eval_env=eval_env, # Evaluation Environment
               repo_id="ashiskb/a2c-Atari-Pong", # id of the model repository from the Hugging Face Hub (repo_id = {organization}/{repo_name})
               commit_message="Push to Hub")

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.




Saving video to /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmptbjwf8a8/-step-0-to-step-1000.mp4
Moviepy - Building video /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmptbjwf8a8/-step-0-to-step-1000.mp4.
Moviepy - Writing video /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmptbjwf8a8/-step-0-to-step-1000.mp4



                                                                 

Moviepy - Done !
Moviepy - video ready /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmptbjwf8a8/-step-0-to-step-1000.mp4


ℹ Pushing repo ashiskb/a2c-Atari-Pong to the Hugging Face Hub


pytorch_variables.pth:   0%|          | 0.00/864 [00:00<?, ?B/s]

policy.pth:   0%|          | 0.00/6.76M [00:00<?, ?B/s]

policy.optimizer.pth:   0%|          | 0.00/6.75M [00:00<?, ?B/s]

Upload 4 LFS files:   0%|          | 0/4 [00:00<?, ?it/s]

A2C-Atari-Pong.zip:   0%|          | 0.00/13.8M [00:00<?, ?B/s]

ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/ashiskb/a2c-Atari-Pong/tree/main/


CommitInfo(commit_url='https://huggingface.co/ashiskb/a2c-Atari-Pong/commit/f28e85b2963a32193b404a64daace8c53da48752', commit_message='Push to Hub', commit_description='', oid='f28e85b2963a32193b404a64daace8c53da48752', pr_url=None, repo_url=RepoUrl('https://huggingface.co/ashiskb/a2c-Atari-Pong', endpoint='https://huggingface.co', repo_type='model', repo_id='ashiskb/a2c-Atari-Pong'), pr_revision=None, pr_num=None)
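
Once the repo is on the Hub, the agent can be pulled back down with `load_from_hub`, which the login cell imports but never uses. A minimal sketch, assuming the uploaded archive is named `A2C-Atari-Pong.zip` (the `model_name` passed to `package_to_hub` plus `.zip`, as in the upload log). The helper name `load_trained_agent` is hypothetical, and the imports sit inside the function only so that defining it does not require the libraries to be installed.

```python
def load_trained_agent(repo_id="ashiskb/a2c-Atari-Pong",
                       filename="A2C-Atari-Pong.zip"):
    """Download a saved SB3 model from the Hugging Face Hub and load it as A2C."""
    # Imported lazily so this definition runs even without the packages installed.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import A2C

    # load_from_hub downloads the file and returns its local path
    checkpoint = load_from_hub(repo_id=repo_id, filename=filename)
    return A2C.load(checkpoint)


# Usage (needs network access and the packages from the install cell):
# model = load_trained_agent()
```

Change `repo_id` to your own `<your_huggingface_id>/a2c-Atari-Pong` to load the agent you pushed.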