# Unit 3: Deep Q-Learning with Atari Games 👾 using RL Baselines3 Zoo

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit4/thumbnail.jpg" alt="Unit 3 Thumbnail">

In this notebook, **you'll train a Deep Q-Learning agent** playing Space Invaders using [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), a training framework based on [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) that provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

We're using the [RL-Baselines-3 Zoo integration, a vanilla version of Deep Q-Learning](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html) with no extensions such as Double-DQN, Dueling-DQN, and Prioritized Experience Replay.
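Here is what "vanilla" means in practice. Plain DQN regresses towards the TD target `y = r + γ · max_a' Q_target(s', a')`: the target network both selects and evaluates the next action. Double-DQN, which this notebook does *not* use, lets the online network pick `a'` and the target network evaluate it. The sketch below is a minimal pure-Python illustration with Q-values stored as plain lists. The function names are hypothetical and this is not Stable-Baselines3's implementation. `gamma=0.99` is assumed as the default discount factor.

```python
from typing import Sequence


def vanilla_dqn_target(reward: float, done: bool,
                       next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """Vanilla DQN: the target network both selects and evaluates the next action."""
    if done:  # no bootstrapping from a terminal state
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, done: bool, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """Double DQN (not used here): online net selects a', target net evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for two actions in the next state s'
q_online, q_target = [1.0, 3.0], [2.0, 0.5]
print(vanilla_dqn_target(1.0, False, q_target, gamma=0.9))           # ≈ 2.8
print(double_dqn_target(1.0, False, q_online, q_target, gamma=0.9))  # ≈ 1.45
```

The two targets differ because the max over the target network's own estimates tends to pick overestimated actions. Double-DQN was designed to reduce this bias.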

⬇️ Here is an example of what **you will achieve** ⬇️

In [1]:
%%html
<video controls autoplay><source src="https://huggingface.co/ThomasSimonini/ppo-SpaceInvadersNoFrameskip-v4/resolve/main/replay.mp4" type="video/mp4"></video>

### 🎮 Environments:

- [SpaceInvadersNoFrameskip-v4](https://gymnasium.farama.org/environments/atari/space_invaders/)

You can see the difference between Space Invaders versions here 👉 https://gymnasium.farama.org/environments/atari/space_invaders/#variants

### 📚 RL-Library:

- [RL-Baselines3-Zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

## Objectives of this notebook 🏆
At the end of the notebook, you will:
- Understand in more depth **how RL Baselines3 Zoo works**.
- Be able to **push your trained agent and the code to the Hub** with a nice video replay and an evaluation score 🔥.




## This notebook is from the Deep Reinforcement Learning Course
<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/deep-rl-course-illustration.jpg" alt="Deep RL Course illustration"/>

In this free course, you will:

- 📖 Study Deep Reinforcement Learning in **theory and practice**.
- 🧑‍💻 Learn to **use famous Deep RL libraries** such as Stable Baselines3, RL Baselines3 Zoo, CleanRL and Sample Factory 2.0.
- 🤖 Train **agents in unique environments**.

And more! Check 📚 the syllabus 👉 https://simoninithomas.github.io/deep-rl-course

Don’t forget to **<a href="http://eepurl.com/ic5ZUD">sign up for the course</a>** (we are collecting your email to be able to **send you the links when each Unit is published and give you information about the challenges and updates**).


The best way to keep in touch is to join our Discord server to exchange with the community and with us 👉🏻 https://discord.gg/ydHrjt3WP5

## Prerequisites 🏗️
Before diving into the notebook, you need to:

🔲 📚 **[Study Deep Q-Learning by reading Unit 3](https://huggingface.co/deep-rl-course/unit3/introduction)**  🤗

We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues).

# Let's train a Deep Q-Learning agent playing Atari's Space Invaders 👾 and upload it to the Hub.

We strongly recommend that students **use Google Colab for the hands-on exercises instead of running them on their personal computers**.

By using Google Colab, **you can focus on learning and experimenting without worrying about the technical aspects of setting up your environments**.

To validate this hands-on for the certification process, you need to push your trained model to the Hub and **get a result of >= 200**.

To find your result, go to the leaderboard and find your model: **the result = mean_reward - std of reward**.
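To sanity-check a run against the threshold, you can compute the same quantity yourself from per-episode rewards. This minimal sketch assumes the leaderboard's "std of reward" is the population standard deviation, which is what `numpy.std` computes by default. The episode rewards below are made-up numbers, not real results.

```python
from statistics import mean, pstdev


def leaderboard_result(episode_rewards):
    """result = mean_reward - std_reward (population standard deviation assumed)."""
    return mean(episode_rewards) - pstdev(episode_rewards)


# Hypothetical rewards from 3 evaluation episodes
rewards = [250.0, 300.0, 350.0]
result = leaderboard_result(rewards)
print(f"result = {result:.1f}, passes: {result >= 200}")  # result = 259.2, passes: True
```

Subtracting the standard deviation penalizes noisy agents. Two agents with the same mean reward can land on different sides of the 200 threshold if one is much less consistent.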

For more information about the certification process, check this section 👉 https://huggingface.co/deep-rl-course/en/unit0/introduction#certification-process

## A piece of advice 💡
It's better to run this Colab in a copy on your Google Drive, so that **if it times out** you still have the saved notebook on your Google Drive and don't need to redo everything from scratch.

To do that you can either do `Ctrl + S` or `File > Save a copy in Google Drive.`

Also, we're going to **train it for 1M timesteps, which takes about 90 minutes**. Running `!nvidia-smi` tells you which GPU you're using.

If you want to train for longer, such as 10 million steps, it will take about 9 hours, which could cause Colab to time out. In that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`.

## Set the GPU 💪
- To **accelerate the agent's training, we'll use a GPU**. To do that, go to `Runtime > Change Runtime type`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step1.jpg" alt="GPU Step 1">

- `Hardware Accelerator > GPU`

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/gpu-step2.jpg" alt="GPU Step 2">
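If you'd rather check from Python than run a `!nvidia-smi` cell, a minimal sketch using only the standard library could look like this. `describe_gpu` is a hypothetical helper. It only detects NVIDIA GPUs through `nvidia-smi -L` (which lists the available GPUs) and says nothing about other accelerators.

```python
import shutil
import subprocess


def describe_gpu() -> str:
    """Return the GPUs listed by `nvidia-smi -L`, or a message if it isn't available."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found: no NVIDIA GPU visible, training would likely run on CPU."
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()


print(describe_gpu())
```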

# Install RL-Baselines3 Zoo and its dependencies 📚

If you see `ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.`, **this is normal and it's not a critical error**: there's a version conflict, but the packages we need are installed.

In [2]:
# For now we install this update of RL-Baselines3 Zoo
!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo@update/hf


**If and only if the version above no longer exists**, uncomment and install the one below.

In [None]:
#!pip install rl_zoo3==2.0.0a9
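Whichever install you use, it can help to check which versions actually ended up in your environment, for example when comparing against the pinned `rl_zoo3==2.0.0a9` above or when reporting an issue. Here is a minimal sketch using the standard library's `importlib.metadata`. The list of package names is an assumption about what this notebook relies on. Rerun it after the Atari install to see `ale-py` as well.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("rl_zoo3", "stable-baselines3", "gymnasium", "ale-py")):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```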

In [3]:
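# Build tools (swig, cmake) and ffmpeg (used to record replay videos), installed with Homebrew on macOS
!brew install swig cmake ffmpeg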


To use Atari games in Gymnasium, we need to install the `atari` extra, plus `accept-rom-license` to download the ROM files (the game files).

In [9]:
!pip3 install 'gymnasium[atari]'
!pip3 install 'gymnasium[accept-rom-license]'


## Create a virtual display 🔽

During the notebook, we'll need to generate a replay video. To do so, with Colab, **we need a virtual screen to be able to render the environment** (and thus record the frames).

Hence the following cells install the libraries and then create and run a virtual screen 🖥

In [22]:
#!brew install python-opengl
!pip3 install pyopengl
#!brew install xvfb
!brew install XQuartz
!pip3 install pyvirtualdisplay

[34m==>[0m [1mDownloading https://github.com/XQuartz/XQuartz/releases/download/XQuartz-2.8[0m
[34m==>[0m [1mDownloading from https://objects.githubusercontent.com/github-production-rel[0m
######################################################################### 100.0%
[32m==>[0m [1mInstalling Cask [32mxquartz[39m[0m
[34m==>[0m [1mRunning installer for xquartz with sudo; the password may be necessary.[0m
Password:



In [28]:
# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

FileNotFoundError: [Errno 2] No such file or directory: 'Xvfb'
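The `FileNotFoundError` above means `pyvirtualdisplay` could not find the `Xvfb` binary on the `PATH`. This run was on macOS (note the `brew` commands), where no `Xvfb` was available. On Linux it is typically installed from the `xvfb` system package. A minimal defensive sketch is to start the virtual display only when `Xvfb` is found and otherwise continue without it. `start_virtual_display` is a hypothetical helper.

```python
import shutil


def start_virtual_display(size=(1400, 900)):
    """Start a headless X display if the Xvfb binary is available, else return None."""
    if shutil.which("Xvfb") is None:
        print("Xvfb not found on PATH; continuing without a virtual display.")
        return None
    from pyvirtualdisplay import Display  # imported lazily: only needed when Xvfb exists

    display = Display(visible=False, size=size)
    display.start()
    return display


virtual_display = start_virtual_display()
```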

## Train our Deep Q-Learning Agent to Play Space Invaders 👾

To train an agent with RL-Baselines3-Zoo, we just need to do two things:

1. Create a hyperparameter config file called `dqn.yml` that contains our training hyperparameters.

This is a template example:

```
SpaceInvadersNoFrameskip-v4:
  env_wrapper:
    - stable_baselines3.common.atari_wrappers.AtariWrapper
  frame_stack: 4
  policy: 'CnnPolicy'
  n_timesteps: !!float 1e6
  buffer_size: 100000
  learning_rate: !!float 1e-4
  batch_size: 32
  learning_starts: 100000
  target_update_interval: 1000
  train_freq: 4
  gradient_steps: 1
  exploration_fraction: 0.1
  exploration_final_eps: 0.01
  # If True, you need to deactivate handle_timeout_termination
  # in the replay_buffer_kwargs
  optimize_memory_usage: False
```

Here we see that:
- We use the `AtariWrapper`, which preprocesses the input (frame skipping, downscaling to 84x84, grayscale), and `frame_stack: 4` stacks the last 4 frames
- We use `CnnPolicy`, since we use convolutional layers to process the frames
- We train it for 1 million timesteps (`n_timesteps: 1e6`)
- The memory (Experience Replay) size is 100000 (`buffer_size`), i.e. the number of experience steps stored and reused to train the agent.
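Two more lines of the config, `exploration_fraction: 0.1` and `exploration_final_eps: 0.01`, control the ε-greedy schedule. As we understand SB3's DQN, ε decays linearly from `exploration_initial_eps` (1.0 by default) to the final value over the first `exploration_fraction` of `n_timesteps`, then stays constant. Here is a minimal sketch of that schedule. `epsilon_at` is a hypothetical helper, not the library's code.

```python
def epsilon_at(step: int, n_timesteps: int = 1_000_000, initial_eps: float = 1.0,
               final_eps: float = 0.01, exploration_fraction: float = 0.1) -> float:
    """Linearly decayed epsilon for epsilon-greedy exploration at a given timestep."""
    progress = step / n_timesteps  # fraction of training already done
    if progress >= exploration_fraction:
        return final_eps  # schedule finished: keep exploring with the final epsilon
    return initial_eps + (final_eps - initial_eps) * progress / exploration_fraction


for step in (0, 50_000, 100_000, 500_000):
    print(step, round(epsilon_at(step), 3))
# prints: 0 1.0, 50000 0.505, 100000 0.01, 500000 0.01 (one pair per line)
```

With the template's values, the agent acts almost randomly at first and reaches ε = 0.01 after 100,000 steps (10% of 1M).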

💡 My advice is to **keep the training timesteps at 1M** (as in the config above), which takes about 90 minutes on a P100. `!nvidia-smi` will tell you which GPU you're using. At 10 million steps, training takes about 9 hours, which would likely make Colab time out; in that case, I recommend running this on your local computer (or somewhere else). Just click on: `File>Download`.

In terms of hyperparameter optimization, my advice is to focus on these 3 hyperparameters:
- `learning_rate`
- `buffer_size` (Experience Memory size)
- `batch_size`

As a good practice, you need to **check the documentation to understand what each hyperparameter does**: https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#parameters
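Here is a concrete view of what `buffer_size` and `batch_size` mean. The sketch below is a minimal experience-replay memory: a fixed-capacity FIFO that drops old transitions once `buffer_size` is reached, from which random minibatches of `batch_size` transitions are drawn. It is a simplified illustration, not SB3's `ReplayBuffer`, which stores NumPy arrays and handles multiple environments and timeouts. `ReplayMemory` and `Transition` are hypothetical names.

```python
import random
from collections import deque
from typing import Any, List, NamedTuple


class Transition(NamedTuple):
    obs: Any
    action: int
    reward: float
    next_obs: Any
    done: bool


class ReplayMemory:
    """Fixed-size FIFO experience replay with uniform random sampling."""

    def __init__(self, buffer_size: int = 100_000):
        self.memory = deque(maxlen=buffer_size)  # oldest transitions are evicted first

    def add(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int = 32) -> List[Transition]:
        # Uniform random minibatch (here without replacement) from the stored experience
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


memory = ReplayMemory(buffer_size=5)
for t in range(8):  # add 8 fake transitions into a memory that holds only 5
    memory.add(Transition(obs=t, action=0, reward=1.0, next_obs=t + 1, done=False))
print(len(memory), [tr.obs for tr in memory.memory])  # 5 [3, 4, 5, 6, 7]
batch = memory.sample(batch_size=2)
```

A larger `buffer_size` keeps older experience around for longer, at the cost of memory. A larger `batch_size` averages each gradient step over more transitions.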



2. We start the training and save the models in the `logs` folder 📁

- Pass the algorithm with `--algo`, the environment with `--env`, the folder where the model is saved with `-f`, and the hyperparameter config file with `-c`.
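The notebook doesn't show how `dqn.yml` gets created. One way is to write the template from step 1 into the current working directory using only the standard library. Because `-c dqn.yml` is a relative path, that directory is where the training command looks for the file. Edit the values (for example `n_timesteps`) in the string before running it.

```python
from pathlib import Path

# Same hyperparameters as the template in step 1 (YAML passed to rl_zoo3.train with -c)
DQN_CONFIG = """\
SpaceInvadersNoFrameskip-v4:
  env_wrapper:
    - stable_baselines3.common.atari_wrappers.AtariWrapper
  frame_stack: 4
  policy: 'CnnPolicy'
  n_timesteps: !!float 1e6
  buffer_size: 100000
  learning_rate: !!float 1e-4
  batch_size: 32
  learning_starts: 100000
  target_update_interval: 1000
  train_freq: 4
  gradient_steps: 1
  exploration_fraction: 0.1
  exploration_final_eps: 0.01
  # If True, you need to deactivate handle_timeout_termination
  # in the replay_buffer_kwargs
  optimize_memory_usage: False
"""

config_path = Path("dqn.yml")
config_path.write_text(DQN_CONFIG)
print(f"Wrote {config_path.resolve()}")
```

In Jupyter or Colab, putting the YAML in a cell that starts with the `%%writefile dqn.yml` magic achieves the same result.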

In [29]:
!python -m rl_zoo3.train --algo dqn --env SpaceInvadersNoFrameskip-v4  -f logs/  -c dqn.yml

#### Solution

In [None]:
!python -m rl_zoo3.train --algo dqn  --env SpaceInvadersNoFrameskip-v4 -f logs/ -c dqn.yml

## Let's evaluate our agent 👀
- RL-Baselines3-Zoo provides `enjoy.py`, a Python script to evaluate our agent. Many RL libraries call their evaluation script `enjoy.py`.
- Let's evaluate it for 5000 timesteps 🔥

In [27]:
!python -m rl_zoo3.enjoy  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --no-render  --n-timesteps 5000  --folder logs/

#### Solution

In [None]:
!python -m rl_zoo3.enjoy  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --no-render  --n-timesteps 5000  --folder logs/

## Publish our trained model on the Hub 🚀
Now that we've seen good results after training, we can publish our trained model on the Hub 🤗 with one line of code.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/space-invaders-model.gif" alt="Space Invaders model">

By using `rl_zoo3.push_to_hub`, **you evaluate, record a replay, generate a model card of your agent and push it to the Hub**.

This way:
- You can **showcase your work** 🔥
- You can **visualize your agent playing** 👀
- You can **share with the community an agent that others can use** 💾
- You can **access a leaderboard 🏆 to see how well your agent is performing compared to your classmates** 👉  https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard

To be able to share your model with the community there are three more steps to follow:

1️⃣ (If it's not already done) create an account on HF ➡ https://huggingface.co/join

2️⃣ Sign in, then store your authentication token from the Hugging Face website.
- Create a new token (https://huggingface.co/settings/tokens) **with write role**

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/create-token.jpg" alt="Create HF Token">

- Copy the token
- Run the cell below and paste the token

In [None]:
from huggingface_hub import notebook_login  # To log in to our Hugging Face account so we can upload models to the Hub.
notebook_login()
!git config --global credential.helper store

If you aren't using Google Colab or a Jupyter Notebook, use this command instead: `huggingface-cli login`

3️⃣ We're now ready to push our trained agent to the 🤗 Hub 🔥

Let's run the `push_to_hub.py` script to upload our trained agent to the Hub.

`--repo-name`: The name of the repo

`-orga`: Your Hugging Face username

`-f`: Where the trained model folder is (in our case `logs`)

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/notebooks/unit3/select-id.png" alt="Select Id">
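The `-orga` and `--repo-name` values combine into the model's Hub id, `<username>/<repo-name>`, which is also the URL path of the model page (compare the example link at the end of this section). The minimal sketch below checks that both placeholders are filled in and builds the same `rl_zoo3.push_to_hub` command as an argument list for `subprocess`. The helper names are hypothetical.

```python
import shlex
from typing import List


def push_to_hub_args(username: str, repo_name: str, folder: str = "logs/") -> List[str]:
    """Build the rl_zoo3.push_to_hub command used in this notebook."""
    if not username or not repo_name or "/" in repo_name:
        raise ValueError("Set a Hugging Face username and a repo name without '/'")
    return ["python", "-m", "rl_zoo3.push_to_hub", "--algo", "dqn",
            "--env", "SpaceInvadersNoFrameskip-v4", "--repo-name", repo_name,
            "-orga", username, "-f", folder]


def model_url(username: str, repo_name: str) -> str:
    """URL of the model page once the push succeeds."""
    return f"https://huggingface.co/{username}/{repo_name}"


args = push_to_hub_args("ThomasSimonini", "dqn-SpaceInvadersNoFrameskip-v4")
print(shlex.join(args))
print(model_url("ThomasSimonini", "dqn-SpaceInvadersNoFrameskip-v4"))
# To actually push: subprocess.run(args, check=True)
```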

In [None]:
!python -m rl_zoo3.push_to_hub  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --repo-name _____________________ -orga _____________________ -f logs/

#### Solution

In [None]:
!python -m rl_zoo3.push_to_hub  --algo dqn  --env SpaceInvadersNoFrameskip-v4  --repo-name dqn-SpaceInvadersNoFrameskip-v4  -orga ThomasSimonini  -f logs/


Congrats 🥳 you've just trained and uploaded your first Deep Q-Learning agent using RL-Baselines-3 Zoo. The script above should have displayed a link to a model repository such as https://huggingface.co/ThomasSimonini/dqn-SpaceInvadersNoFrameskip-v4. When you go to this link, you can:

- See a **video preview of your agent** on the right.
- Click "Files and versions" to see all the files in the repository.
- Click "Use in stable-baselines3" to get a code snippet that shows how to load the model.
- Read the model card (`README.md` file), which gives a description of the model and the hyperparameters you used.

Under the hood, the Hub uses git-based repositories (don't worry if you don't know what git is), which means you can update the model with new versions as you experiment and improve your agent.

**Compare the results of your agents with your classmates** using the [leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) 🏆

## Load a powerful trained model 🔥
- The Stable-Baselines3 team uploaded **more than 150 trained Deep Reinforcement Learning agents on the Hub**.

You can find them here: 👉 https://huggingface.co/sb3

Some examples:
- Asteroids: https://huggingface.co/sb3/dqn-AsteroidsNoFrameskip-v4
- Beam Rider: https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4
- Breakout: https://huggingface.co/sb3/dqn-BreakoutNoFrameskip-v4
- Road Runner: https://huggingface.co/sb3/dqn-RoadRunnerNoFrameskip-v4

Let's load an agent playing Beam Rider: https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4

In [None]:
%%html
<video controls autoplay><source src="https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4/resolve/main/replay.mp4" type="video/mp4"></video>

1. We download the model using `rl_zoo3.load_from_hub` and place it in a new folder that we can call `rl_trained`

In [None]:
# Download the model and save it into the rl_trained/ folder
!python -m rl_zoo3.load_from_hub --algo dqn --env BeamRiderNoFrameskip-v4 -orga sb3 -f rl_trained/

2. Let's evaluate it for 5000 timesteps

In [None]:
!python -m rl_zoo3.enjoy --algo dqn --env BeamRiderNoFrameskip-v4 -n 5000  -f rl_trained/ --no-render

Why not try to train your own **Deep Q-Learning agent playing BeamRiderNoFrameskip-v4? 🏆**

If you want to try, check the model card at https://huggingface.co/sb3/dqn-BeamRiderNoFrameskip-v4#hyperparameters: **it lists the hyperparameters of the trained agent.**

But finding hyperparameters can be a daunting task. Fortunately, in the next Unit we'll see how to **use Optuna to optimize the hyperparameters 🔥.**


## Some additional challenges 🏆
The best way to learn **is to try things on your own**!

In the [Leaderboard](https://huggingface.co/spaces/huggingface-projects/Deep-Reinforcement-Learning-Leaderboard) you will find your agents. Can you get to the top?

Here's a list of environments you can try to train your agent with:
- BeamRiderNoFrameskip-v4
- BreakoutNoFrameskip-v4
- EnduroNoFrameskip-v4
- PongNoFrameskip-v4

Also, **if you want to learn to implement Deep Q-Learning by yourself**, you definitely should look at CleanRL implementation: https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_atari.py

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit4/atari-envs.gif" alt="Environments"/>

________________________________________________________________________
Congrats on finishing this chapter!

If you still feel confused by all these elements... it's totally normal! **This was the same for me and for everyone who studied RL.**

Take time to really **grasp the material before continuing and try the additional challenges**. It’s important to master these elements and to have solid foundations.

In the next unit, **we’re going to learn about [Optuna](https://optuna.org/)**. One of the most critical tasks in Deep Reinforcement Learning is finding a good set of training hyperparameters, and Optuna is a library that helps you automate that search.






### This is a course built with you 👷🏿‍♀️

Finally, we want to improve and update the course iteratively with your feedback. If you have some, please fill this form 👉 https://forms.gle/3HgA7bEHwAmmLfwh9

We're constantly trying to improve our tutorials, so **if you find some issues in this notebook**, please [open an issue on the GitHub Repo](https://github.com/huggingface/deep-rl-class/issues).

See you in Bonus Unit 2! 🔥

### Keep Learning, Stay Awesome 🤗