# Using RL Baselines3 Zoo


[`RL Baselines3 Zoo`](https://rl-baselines3-zoo.readthedocs.io/en/master/) is a training framework for Reinforcement Learning (RL) built on Stable Baselines3 (SB3), a set of reliable implementations of reinforcement learning algorithms in PyTorch. It provides scripts for training and evaluating agents, tuning hyperparameters, plotting results and recording videos. In addition, it includes a collection of tuned hyperparameters for common environments and RL algorithms, and agents trained with those settings.

GitHub repository: https://github.com/DLR-RM/rl-baselines3-zoo

In this notebook, we will train an agent, record a demo video, and push the trained agent to Hugging Face, all using RL Baselines3 Zoo (the `rl_zoo3` package).

RL Zoo is meant to be run from the command line. However, in a Python notebook we can run shell commands by prefixing them with the `!` (bang) character. For example, to run the Unix `pwd` command, which prints the current working directory, we can execute `!pwd` in a code cell.

We have already been using this to install dependencies when running these notebooks in Google Colab, e.g.:
```
!pip install "stable-baselines3[extra]"
```
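Outside a notebook, the `!` shorthand has a rough plain-Python counterpart in the standard library's `subprocess` module. The sketch below is illustrative only: the helper name `run_command` is hypothetical and is not part of RL Zoo or IPython.

```python
import subprocess
import sys


def run_command(args):
    """Run a command (given as a list of arguments) and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Roughly what `!python -c "print('hello')"` does in a notebook cell,
# except that the output is returned to the caller instead of being shown directly.
output = run_command([sys.executable, "-c", "print('hello')"])
print(output.strip())
```

Using `check=True` makes a failing command raise an exception, whereas a failing `!` command in a notebook only prints its error output.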

In [4]:
#!pip install rl_zoo3

Collecting rl_zoo3
  Downloading rl_zoo3-2.3.0-py3-none-any.whl.metadata (1.8 kB)
Collecting sb3-contrib<3.0,>=2.3.0 (from rl_zoo3)
  Downloading sb3_contrib-2.3.0-py3-none-any.whl.metadata (3.6 kB)
Collecting optuna>=3.0 (from rl_zoo3)
  Downloading optuna-4.0.0-py3-none-any.whl.metadata (16 kB)
Collecting pytablewriter~=1.2 (from rl_zoo3)
  Downloading pytablewriter-1.2.0-py3-none-any.whl.metadata (37 kB)
Collecting alembic>=1.5.0 (from optuna>=3.0->rl_zoo3)
  Downloading alembic-1.13.3-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna>=3.0->rl_zoo3)
  Downloading colorlog-6.8.2-py3-none-any.whl.metadata (10 kB)
Collecting sqlalchemy>=1.3.0 (from optuna>=3.0->rl_zoo3)
  Downloading SQLAlchemy-2.0.35-cp310-cp310-macosx_10_9_x86_64.whl.metadata (9.6 kB)
Collecting DataProperty<2,>=1.0.1 (from pytablewriter~=1.2->rl_zoo3)
  Downloading DataProperty-1.0.1-py3-none-any.whl.metadata (11 kB)
Collecting mbstrdecoder<2,>=1.0.0 (from pytablewriter~=1.2->rl_zoo3)
  Downloading

#### Running in Colab/Kaggle

If you are running this on Colab or Kaggle, uncomment the cells below and run them to install the required dependencies.

In [None]:
## Uncomment and execute this cell to install the system dependencies if running in Google Colab or Kaggle
# !apt-get update
# !apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb

In [None]:
## Uncomment and execute the relevant block below to clone the repository and install the Python dependencies if running in Google Colab or Kaggle

## Uncomment and run for Colab
# !git clone https://github.com/nsanghi/drl-2ed
# %cd /content/drl-2ed 
# !pip install  -r requirements.txt
# %cd chapter2


## Uncomment and run for Kaggle
# !git clone https://github.com/nsanghi/drl-2ed
# %cd /kaggle/working/drl-2ed 
# !pip install  -r requirements.txt
# %cd chapter2

## Training LunarLander using DQN

This is the same task as the implementation we saw before, except that this time the training is done using RL Zoo.

Note that the default hyperparameters printed at the start of the command below can be changed; refer to the RL Zoo documentation for details. Note also that these defaults differ from the defaults of the `DQN` class in `stable_baselines3`, which apply when you construct the model yourself and call `model.learn`: https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html#stable_baselines3.dqn.DQN
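As a sketch of how these defaults can be changed, RL Zoo's training script accepts a `--hyperparams` option that takes `key:value` pairs. The helper below (a hypothetical name, `build_train_command`) only assembles such a command as a string so that it can be pasted after `!` in a cell. The override values are arbitrary examples, not tuned recommendations.

```python
import shlex


def build_train_command(algo, env_id, n_timesteps, overrides=None):
    """Assemble an rl_zoo3.train command line with optional hyperparameter overrides."""
    args = [
        "python", "-m", "rl_zoo3.train",
        "--algo", algo,
        "--env", env_id,
        "--n-timesteps", str(n_timesteps),
    ]
    if overrides:
        # Each override is passed as key:value after a single --hyperparams flag.
        args.append("--hyperparams")
        args.extend(f"{key}:{value}" for key, value in overrides.items())
    # Quote every argument so the string is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_train_command(
    "dqn", "LunarLander-v2", 100000,
    overrides={"learning_rate": 0.0001, "batch_size": 64},
)
print(cmd)
```

Building the command as a string keeps the example independent of whether `rl_zoo3` is installed; nothing is executed until the printed command is run.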

In [5]:
# Train a DQN agent on LunarLander-v2

!python -m rl_zoo3.train --algo dqn --env LunarLander-v2 --n-timesteps 100000 --log-interval 400 --progress

Seed: 3737628595
Loading hyperparameters from: /Users/ashis/venv-directory/venv-p310-RL-workspace/lib/python3.10/site-packages/rl_zoo3/hyperparams/dqn.yml
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 128),
             ('buffer_size', 50000),
             ('exploration_final_eps', 0.1),
             ('exploration_fraction', 0.12),
             ('gamma', 0.99),
             ('gradient_steps', -1),
             ('learning_rate', 0.00063),
             ('learning_starts', 0),
             ('n_timesteps', 100000.0),
             ('policy', 'MlpPolicy'),
             ('policy_kwargs', 'dict(net_arch=[256, 256])'),
             ('target_update_interval', 250),
             ('train_freq', 4)])
Using 1 environments
Overwriting n_timesteps with n=100000
Creating test environment
Using cpu device
Log path: logs/dqn/LunarLander-v2_1
Eval num_timesteps=25000, episode_reward=-55.24 +/- 49.76 ━━━━━━ 24,981/100,000
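To make the difference from the SB3 defaults concrete, the sketch below compares the hyperparameters printed above with the `DQN` constructor defaults. The `sb3_defaults` values are an assumption copied from the SB3 documentation at the time of writing, so check them against your installed version. The helper name `diff_params` is hypothetical.

```python
# Values printed by rl_zoo3.train in the run above.
zoo_params = {
    "batch_size": 128,
    "buffer_size": 50000,
    "exploration_final_eps": 0.1,
    "exploration_fraction": 0.12,
    "gamma": 0.99,
    "gradient_steps": -1,
    "learning_rate": 0.00063,
    "target_update_interval": 250,
    "train_freq": 4,
}

# ASSUMPTION: DQN constructor defaults as listed in the SB3 documentation;
# verify against the version you have installed.
sb3_defaults = {
    "batch_size": 32,
    "buffer_size": 1_000_000,
    "exploration_final_eps": 0.05,
    "exploration_fraction": 0.1,
    "gamma": 0.99,
    "gradient_steps": 1,
    "learning_rate": 1e-4,
    "target_update_interval": 10000,
    "train_freq": 4,
}


def diff_params(tuned, defaults):
    """Return {name: (tuned_value, default_value)} for every value that differs."""
    return {
        name: (value, defaults[name])
        for name, value in tuned.items()
        if name in defaults and value != defaults[name]
    }


differences = diff_params(zoo_params, sb3_defaults)
for name, (tuned, default) in differences.items():
    print(f"{name}: zoo={tuned} sb3_default={default}")
```

Only `gamma` and `train_freq` coincide in this comparison; the other listed values differ, which is why results from RL Zoo and from a hand-written SB3 script are not directly comparable unless the hyperparameters are aligned.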

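## Evaluating the agent

We will now evaluate the agent trained above by loading the model saved by the training command. As the output below shows, `rl_zoo3.enjoy` loads the final saved model by default; the `--load-best` flag can be added to load the best model found during evaluation instead.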



In [6]:
!python -m rl_zoo3.enjoy --algo dqn --env LunarLander-v2 --no-render --n-timesteps 5000 --folder logs/

Loading latest experiment, id=1
Loading logs/dqn/LunarLander-v2_1/LunarLander-v2.zip
Episode Reward: 46.67
Episode Length 96
Episode Reward: 48.81
Episode Length 98
Episode Reward: -3.00
Episode Length 107
Episode Reward: 10.32
Episode Length 100
Episode Reward: 238.38
Episode Length 155
Episode Reward: 237.41
Episode Length 327
Episode Reward: 4.81
Episode Length 97
Episode Reward: -65.38
Episode Length 93
Episode Reward: -431.50
Episode Length 328
Episode Reward: 220.13
Episode Length 281
Episode Reward: 230.98
Episode Length 433
Episode Reward: 53.84
Episode Length 92
Episode Reward: -297.33
Episode Length 453
Episode Reward: -475.07
Episode Length 726
Episode Reward: 28.07
Episode Length 94
Episode Reward: -154.49
Episode Length 159
Episode Reward: -341.21
Episode Length 134
Episode Reward: 28.65
Episode Length 97
Episode Reward: 3.44
Episode Length 122
Episode Reward: -475.18
Episode Length 779
Episode Reward: -2.28
Episode Length 99
21 Episodes
Mean reward: -52.09 +/- 222.31
Mean
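The summary line can be reproduced from the per-episode rewards listed above. The sketch below uses only the standard library; the reported spread appears to match the population standard deviation (`statistics.pstdev`), which is an inference from the numbers rather than something stated by RL Zoo.

```python
import statistics

# Episode rewards copied from the evaluation output above.
episode_rewards = [
    46.67, 48.81, -3.00, 10.32, 238.38, 237.41, 4.81,
    -65.38, -431.50, 220.13, 230.98, 53.84, -297.33, -475.07,
    28.07, -154.49, -341.21, 28.65, 3.44, -475.18, -2.28,
]

mean_reward = statistics.mean(episode_rewards)
std_reward = statistics.pstdev(episode_rewards)  # population standard deviation

print(f"{len(episode_rewards)} Episodes")
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```

The large spread relative to the mean reflects that the agent lands successfully in some episodes (rewards above 200) and crashes badly in others, which is expected after only 100,000 training steps.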

## Recording a video

Let us now record a video of the trained agent.

In [7]:
!python -m rl_zoo3.record_video --algo dqn --env LunarLander-v2 --exp-id 0 -f logs/ -n 1000

Loading latest experiment, id=1
Loading logs/dqn/LunarLander-v2_1/LunarLander-v2.zip
Loading logs/dqn/LunarLander-v2_1/LunarLander-v2.zip
Saving video to /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4
Moviepy - Building video /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4.
Moviepy - Writing video /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step-0-to-step-1000.mp4

Moviepy - Done !                                                                
Moviepy - video ready /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drl-ashiskb/logs/dqn/LunarLander-v2_1/videos/final-model-dqn-LunarLander-v2-step
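The output above shows that the latest experiment folder (`LunarLander-v2_1`) was picked. The sketch below shows one way to locate such a folder yourself, assuming the `logs/<algo>/<env>_<id>` layout seen in the paths above. The helper `latest_experiment_dir` is hypothetical and is not RL Zoo's own implementation.

```python
import re
import tempfile
from pathlib import Path


def latest_experiment_dir(log_root, algo, env_id):
    """Return the run folder <log_root>/<algo>/<env_id>_<N> with the largest N, or None."""
    algo_dir = Path(log_root) / algo
    if not algo_dir.is_dir():
        return None
    pattern = re.compile(re.escape(env_id) + r"_(\d+)$")
    best_id, best_dir = -1, None
    for child in algo_dir.iterdir():
        match = pattern.match(child.name)
        # Compare ids numerically so that _10 ranks above _2.
        if child.is_dir() and match and int(match.group(1)) > best_id:
            best_id, best_dir = int(match.group(1)), child
    return best_dir


# Small self-contained demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for run_id in (1, 2, 10):
        (Path(tmp) / "dqn" / f"LunarLander-v2_{run_id}").mkdir(parents=True)
    latest = latest_experiment_dir(tmp, "dqn", "LunarLander-v2")
    print(latest.name)  # LunarLander-v2_10
```

With real logs, `latest_experiment_dir("logs", "dqn", "LunarLander-v2") / "videos"` would give the folder that holds the recorded video.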

## Display the video

In [8]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay


def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [9]:

show_videos(video_path='logs/dqn/LunarLander-v2_1/videos/', prefix='')

## Pushing to Hugging Face

To share the trained model with others, you can push it to the Hugging Face Hub. First, we need to log in to Hugging Face using an access token.


In [13]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import login  # Log in to our Hugging Face account to be able to upload models to the Hub.

login()
#!git config --global credential.helper store



**IMPORTANT**
Some users have reported the following error while running `rl_zoo3.push_to_hub`:

```
"Token is required (write-access action) but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens."
```

In that case, the following code should help you overcome the issue:

```
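import huggingface_hub

# Replace the placeholder with your own token (it needs write access).
# Note: `write_permission` may be deprecated or ignored in newer
# huggingface_hub releases; drop it if it raises an error.
huggingface_hub.login(
    token="<YOUR_HF_TOKEN>",
    write_permission=True,
    add_to_git_credential=True,
)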
```

Alternatively, run the following command from a shell in which the `venv` or `conda` environment for this repository has been activated, and then follow the prompts to set the Hugging Face token.

```
huggingface-cli login
```
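A further option, sketched below, is to keep the token out of the notebook by reading it from an environment variable. The helper names `get_hf_token` and `login_from_env` are hypothetical, and using `HF_TOKEN` as the variable name is an assumption; adapt it to your own setup.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Read a Hugging Face token from an environment variable, failing loudly if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a token with write access."
        )
    return token


def login_from_env():
    """Log in with the token from the environment (requires huggingface_hub to be installed)."""
    import huggingface_hub  # imported lazily so get_hf_token works without the package

    huggingface_hub.login(token=get_hf_token(), add_to_git_credential=True)
```

Calling `login_from_env()` once before running `rl_zoo3.push_to_hub` avoids pasting the token into a cell that might later be shared.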

In [14]:
# Before you run this, change the value passed to -orga to your own Hugging Face id

!python -m rl_zoo3.push_to_hub --algo dqn --env LunarLander-v2 -f logs/ -orga ashiskb -m "Initial commit"

## See the model on the Hugging Face Hub

Click on the link below to see the stored trained agent and video on Hugging Face:

https://huggingface.co/ashiskb/dqn-LunarLander-v2

In your case, the link would look like:

`https://huggingface.co/<orga>/<algo>-<env>`
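The URL pattern above can be filled in programmatically. This is a minimal sketch: the helper names `hub_repo_id` and `hub_model_url` are hypothetical, and the pattern is taken from the example link in this notebook rather than from any official naming guarantee.

```python
def hub_repo_id(orga, algo, env_id):
    """Build the repository id in the <orga>/<algo>-<env> form used above."""
    return f"{orga}/{algo}-{env_id}"


def hub_model_url(orga, algo, env_id):
    """Build the Hugging Face Hub page URL for a pushed agent."""
    return f"https://huggingface.co/{hub_repo_id(orga, algo, env_id)}"


print(hub_model_url("ashiskb", "dqn", "LunarLander-v2"))
# https://huggingface.co/ashiskb/dqn-LunarLander-v2
```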
