# Train your First Agent

In this notebook we will train our second agent: we will take yet another environment, train an agent on it, and then share the result using the Hugging Face ecosystem. We will be using the following libraries:

1. `gymnasium`, which we saw in Listing 2.1. [Gymnasium](https://gymnasium.farama.org/) is a standard API for reinforcement learning and a diverse collection of reference environments.
2. To build understanding, we will write our own implementations of many of the algorithms taught in the book; for actual work, however, it makes sense to use standard libraries for such RL tasks. One such library is [Stable Baselines3 (SB3)](https://stable-baselines3.readthedocs.io/en/master/), a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is accompanied by [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo), which provides a collection of pre-trained agents as well as scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos.
3. We will also use Hugging Face to host the trained agents so that the results can be shared with others. The book walks you through what Hugging Face is and what is available in its ecosystem; in this book we will use only a subset of its capabilities. For now, we will use it to upload trained agents and demo videos. For this we will use [`huggingface_sb3`](https://github.com/huggingface/huggingface_sb3), a library to load Stable-Baselines3 models from the Hub and upload them to it. Before we can use it, we need to create an account on [Hugging Face](https://huggingface.co/join); you can follow the link to create one.

#### Running in Colab/Kaggle

If you are running this on Colab or Kaggle, please uncomment the two cells below and run them to install the required dependencies.

In [1]:
## uncomment and execute this cell to install all the dependencies if running in Google Colab or Kaggle

# !apt-get update 
# !apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb

In [2]:
## uncomment and execute this cell to install all the dependencies if running in Google Colab or Kaggle

# !pip install "box2d-py==2.3.8"
# !pip install "stable-baselines3[extra]==2.1.0"
# !pip install "huggingface_sb3>=3.0"
# !pip install "moviepy==1.0.3"
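After installing, it can help to confirm which versions actually ended up in your environment. The snippet below is a small sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is our own, and the package list simply mirrors the pip commands above plus `gymnasium`, which we assume is pulled in as a dependency of Stable Baselines3.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used with pip above (gymnasium assumed to come with stable-baselines3)
PACKAGES = ["gymnasium", "stable-baselines3", "huggingface-sb3", "box2d-py", "moviepy"]


def installed_versions(packages):
    """Return {package: installed version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:20s} {ver or 'NOT INSTALLED'}")
```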

## Import policy, RL agent

We will use DQN (Deep Q-Network) to train the agent. For now we will treat it as a black box and dive deeper into it in a subsequent chapter.

In [3]:
import gymnasium as gym

from stable_baselines3 import DQN


## Create the Gym env and instantiate the agent

For this example, we will use the Lunar Lander environment. From its documentation:

> "Landing outside landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt. Four discrete actions available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine."

[Lunar Lander environment](https://gymnasium.farama.org/environments/box2d/lunar_lander/)

![Lunar Lander](https://gymnasium.farama.org/_images/lunar_lander.gif)


We chose `MlpPolicy` because the observation in Lunar Lander is a feature vector (8 numbers describing the lander's state), not an image. The type of action (discrete or continuous) is deduced automatically from the environment's action space.
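To make the "feature vector" concrete: according to the Gymnasium documentation, the observation is an 8-dimensional vector (lander position x/y, linear velocities x/y, angle, angular velocity, and two flags for leg-ground contact), and the four discrete actions are numbered 0 to 3 in the order quoted above. In a live session you can confirm this with `gym.make("LunarLander-v2").observation_space` (a `Box` of shape `(8,)`) and `.action_space` (`Discrete(4)`). The sketch below only labels these values; `describe_observation`, `OBS_FIELDS`, and `ACTION_MEANINGS` are hypothetical names of our own, not part of Gymnasium.

```python
# Labels follow the Gymnasium LunarLander docs; the helper names are hypothetical.
ACTION_MEANINGS = (
    "do nothing",                     # action 0
    "fire left orientation engine",   # action 1
    "fire main engine",               # action 2
    "fire right orientation engine",  # action 3
)

OBS_FIELDS = (
    "x", "y",                          # lander position
    "vx", "vy",                        # linear velocity
    "angle", "angular_velocity",
    "leg_1_contact", "leg_2_contact",  # 1.0 if that leg touches the ground, else 0.0
)


def describe_observation(obs):
    """Map an 8-element Lunar Lander observation to {field name: value}."""
    if len(obs) != len(OBS_FIELDS):
        raise ValueError(f"expected {len(OBS_FIELDS)} values, got {len(obs)}")
    return dict(zip(OBS_FIELDS, (float(v) for v in obs)))


print(describe_observation([0.0, 1.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
print(ACTION_MEANINGS[2])
```

Because the policy only ever sees these eight numbers, a plain multi-layer perceptron is enough; a CNN policy would only be needed for image observations.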



In [4]:
model = DQN(
    "MlpPolicy",
    "LunarLander-v2",
    verbose=1,
    exploration_final_eps=0.1,
    target_update_interval=250,
)

Using cpu device
Creating environment from the given name 'LunarLander-v2'
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
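The `exploration_final_eps` argument above controls DQN's epsilon-greedy exploration. As we understand SB3's defaults, epsilon starts at `exploration_initial_eps=1.0` and decays linearly to `exploration_final_eps` over the first `exploration_fraction=0.1` of the total training steps, after which it stays constant; this is consistent with the training log further down reporting `exploration_rate` 0.1. (The other argument, `target_update_interval=250`, sets how many environment steps pass between updates of DQN's target network.) The function below is a plain-Python sketch of such a linear schedule, not SB3's own code; `linear_epsilon` is a hypothetical name.

```python
def linear_epsilon(step, total_timesteps, initial_eps=1.0, final_eps=0.1,
                   exploration_fraction=0.1):
    """Epsilon at `step` under a linear decay-then-constant schedule."""
    progress = step / total_timesteps  # fraction of training completed so far
    if progress >= exploration_fraction:
        return final_eps  # decay finished: keep exploring at the final rate
    # Linear interpolation between initial_eps and final_eps
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction


total_timesteps = 100_000  # same budget as the model.learn() call used for training
for step in (0, 5_000, 10_000, 37_904):
    print(f"step {step:>6}: epsilon = {linear_epsilon(step, total_timesteps):.2f}")
```

With this budget, epsilon would hit its floor of 0.1 after 10,000 steps, well before the 37,904-step log entry.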


We load a helper function to evaluate the agent:

In [5]:
from stable_baselines3.common.evaluation import evaluate_policy
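`evaluate_policy` runs `n_eval_episodes` complete episodes and reports the mean and standard deviation of the per-episode returns (SB3 uses `np.std`, i.e. the population standard deviation). The sketch below mirrors that idea in plain Python against a hypothetical toy environment, `CountdownEnv`, which follows Gymnasium's `reset()`/`step()` API where `step` returns `(observation, reward, terminated, truncated, info)`. It is not SB3's implementation, which additionally handles vectorized environments and `Monitor` wrappers.

```python
import statistics


class CountdownEnv:
    """Hypothetical toy env with the Gymnasium reset/step API.
    Each episode lasts `length` steps; action 1 earns +1 reward, any other action 0."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return self.t, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        terminated = self.t >= self.length  # episode reached its natural end
        truncated = False                   # no time limit in this toy env
        return self.t, reward, terminated, truncated, {}


def evaluate(policy, env, n_eval_episodes=10):
    """Play full episodes; return (mean, population std) of the episode returns."""
    episode_returns = []
    for _ in range(n_eval_episodes):
        obs, _info = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _info = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        episode_returns.append(total)
    return statistics.fmean(episode_returns), statistics.pstdev(episode_returns)


mean_r, std_r = evaluate(lambda obs: 1, CountdownEnv(length=5), n_eval_episodes=3)
print(f"mean_reward={mean_r:.2f} +/- {std_r:.2f}")  # mean_reward=5.00 +/- 0.00
```

Note that an episode ends on either `terminated` or `truncated`; treating only one of them as the end of an episode is a common source of evaluation bugs.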

Let's evaluate the untrained agent; at this point it should behave like a random agent.

In [6]:
# Separate env for evaluation
eval_env = gym.make("LunarLander-v2")

# Random Agent, before training
mean_reward, std_reward = evaluate_policy(
    model,
    eval_env,
    n_eval_episodes=10,
    deterministic=True,
)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")



mean_reward=-960.30 +/- 1079.1275705236621


## Train the agent and save it

Warning: this may take a while

In [7]:
# Train the agent
model.learn(total_timesteps=int(1e5), log_interval=400, progress_bar=True)
# Save the agent
model.save("dqn_lunar")
del model  # delete trained model to demonstrate loading


----------------------------------
| rollout/            |          |
|    ep_len_mean      | 93       |
|    ep_rew_mean      | -176     |
|    exploration_rate | 0.1      |
| time/               |          |
|    episodes         | 400      |
|    fps              | 818      |
|    time_elapsed     | 46       |
|    total_timesteps  | 37904    |
----------------------------------


## Load the trained agent

In [8]:
model = DQN.load("dqn_lunar")

In [9]:
# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=209.67 +/- 22.374592803476926


## Record a video of the trained agent

In [14]:
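import gymnasium as gym
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

env_id = "LunarLander-v2"
video_folder = "logs/videos/"
video_length = 1000

# render_mode="rgb_array" lets the recorder capture frames without opening a window
vec_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])

# Record the video starting at the first step.
# Note: despite the "random-agent" file prefix, this records the trained `model`.
vec_env = VecVideoRecorder(vec_env, video_folder,
                           record_video_trigger=lambda x: x == 0, video_length=video_length,
                           name_prefix=f"random-agent-{env_id}")

# Reset through the recorder so that recording starts and we get the initial observation
obs = vec_env.reset()
for _ in range(video_length + 1):
    action, _state = model.predict(obs, deterministic=True)  # greedy actions, as in evaluation
    obs, _, _, _ = vec_env.step(action)  # VecEnv step returns (obs, rewards, dones, infos)
# Save the video
vec_env.close()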

Saving video to /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4
Moviepy - Building video /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4.
Moviepy - Writing video /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4


Moviepy - Done !
Moviepy - video ready /home/nsanghi/sandbox/apress/drl-2ed/chapter2/logs/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4
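`VecVideoRecorder` calls `record_video_trigger` with the current step count and starts a recording of `video_length` steps whenever it returns `True`, so `lambda x: x == 0` records only from the very first step. If you wanted periodic recordings during a longer run, a trigger could be built as below. `every_n_steps` is a hypothetical helper of our own, sketched without SB3 so that it runs on its own.

```python
def every_n_steps(n):
    """Return a trigger that fires on steps 0, n, 2n, ... (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return lambda step: step % n == 0


first_step_only = lambda step: step == 0  # the trigger used in the cell above

trigger = every_n_steps(5000)
print([s for s in (0, 1, 4999, 5000, 10000) if trigger(s)])  # [0, 5000, 10000]
```

It would be passed as `record_video_trigger=every_n_steps(5000)` when constructing `VecVideoRecorder`.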


In [15]:
from IPython.display import HTML
from base64 import b64encode

# Read the recorded MP4 (closing the file afterwards) and embed it as a base64 data URL
with open('./logs/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4', 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

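The display cell above reads the MP4 and inlines it as a base64 `data:` URL, so the video is stored inside the notebook itself. If you display videos repeatedly, the same idea can be wrapped in a small function. `video_html_from_bytes` and `video_html` are hypothetical helpers that use only the standard library; the returned string would be passed to `IPython.display.HTML` just as in the cell above.

```python
from base64 import b64encode
from pathlib import Path


def video_html_from_bytes(mp4_bytes, width=400):
    """Build a <video> tag whose source is the MP4 data inlined as a base64 data URL."""
    data_url = "data:video/mp4;base64," + b64encode(mp4_bytes).decode("ascii")
    return (
        f"<video width={width} controls>\n"
        f'  <source src="{data_url}" type="video/mp4">\n'
        f"</video>"
    )


def video_html(path, width=400):
    """Read an MP4 file (closing it afterwards) and return the embeddable HTML."""
    return video_html_from_bytes(Path(path).read_bytes(), width=width)


# Usage in the notebook (requires IPython and the recorded file):
# from IPython.display import HTML
# HTML(video_html("./logs/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4"))
```

Keep in mind that inlining makes the notebook file grow by roughly a third more than the video size, because base64 encodes every 3 bytes as 4 characters.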
## Upload the Agent and Video to Hugging Face

We need to log in to Hugging Face with an access token, which allows this notebook to upload files to your Hugging Face account. You can create or find your token at https://huggingface.co/settings/tokens; since we will be uploading, it needs write access. You will be asked to enter this token when we run `notebook_login()` below.

In [16]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log in to our Hugging Face account so that we can upload models to the Hub.

notebook_login()
!git config --global credential.helper store


**IMPORTANT**
Some users have reported the following error when running the `package_to_hub` upload function:

```
"Token is required (write-access action) but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens."
```

In such a case, the following code should help you overcome the issue:

```python
import huggingface_hub

# Replace the placeholder string with your own access token (it needs write access)
huggingface_hub.login(token="<YOUR_HF_TOKEN>",
                      write_permission=True,
                      add_to_git_credential=True)
```
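If you would rather not paste the token into a notebook cell, one option is to keep it in an environment variable and read it at run time. The sketch below assumes the variable is called `HF_TOKEN` (recent versions of `huggingface_hub` also recognize this variable on their own); `get_hf_token` is a hypothetical helper, and the `login` call is left commented out so that the snippet runs without `huggingface_hub` installed.

```python
import os


def get_hf_token(var_name="HF_TOKEN"):
    """Read a Hugging Face access token from an environment variable (hypothetical helper)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your Hugging Face token")
    return token


# Usage (requires huggingface_hub):
# import huggingface_hub
# huggingface_hub.login(token=get_hf_token(), add_to_git_credential=True)
```

This keeps the secret out of the saved notebook, which matters if you later push the notebook itself to a public repository.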

Another alternative is to run the following command from a shell in which the `venv` or `conda` environment for this repository has been activated, and then follow the instructions to set the Hugging Face token.

```bash
huggingface-cli login
```

In [17]:
eval_env = gym.make("LunarLander-v2", render_mode='rgb_array')

# package_to_hub saves and evaluates the agent, generates a model card, and records
# a replay video of the agent before pushing the repo to the Hub

# Please note repo_id is of the form <huggingface_id>/<name of repo>;
# you will need to change it to "<your_huggingface_id>/dqn-LunarLander-v2"

package_to_hub(model=model, # Our trained model
               model_name="dqn-LunarLander-v2", # The name of our trained model
               model_architecture="DQN", # The model architecture we used: in our case DQN
               env_id="LunarLander-v2", # Name of the environment
               eval_env=eval_env, # Evaluation environment
               repo_id="nsanghi/dqn-LunarLander-v2", # Model repository on the Hugging Face Hub: {organization}/{repo_name}
               commit_message="Push to Hub")

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.




Saving video to /tmp/tmprxs4ameq/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmprxs4ameq/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmprxs4ameq/-step-0-to-step-1000.mp4




Moviepy - Done !
Moviepy - video ready /tmp/tmprxs4ameq/-step-0-to-step-1000.mp4



ℹ Pushing repo nsanghi/dqn-LunarLander-v2 to the Hugging Face Hub



ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/nsanghi/dqn-LunarLander-v2/tree/main/


'https://huggingface.co/nsanghi/dqn-LunarLander-v2/tree/main/'

## Checking the Results on Hugging Face

After a successful upload, the end of the output of the cell above shows a link where you can view the model. **It will have a pattern like `https://huggingface.co/<yourusername>/dqn-LunarLander-v2/`**

Please click on this link to access the trained agent. You can also share this link with others to show the result of training. Share the URL without the trailing `tree/main` path so that the link takes them to the Model Card tab, where they can see the animation. For example, in my case it will be:<br/>
`https://huggingface.co/nsanghi/dqn-LunarLander-v2/`<br/>
instead of<br/>
`https://huggingface.co/nsanghi/dqn-LunarLander-v2/tree/main/`
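The shareable link can be derived from the `repo_id` passed to `package_to_hub`, or from the `tree/main` URL it prints. The two hypothetical helpers below (our own names, standard library only) sketch that string handling: `make_repo_id` builds the `<huggingface_id>/<name of repo>` form, and `model_card_url` returns the model-card link without the trailing `tree/main/`.

```python
HF_BASE = "https://huggingface.co"


def make_repo_id(username, model_name):
    """Build a Hub repo_id of the form '<huggingface_id>/<name of repo>'."""
    if not username or not model_name or "/" in username or "/" in model_name:
        raise ValueError("username and model_name must be non-empty and contain no '/'")
    return f"{username}/{model_name}"


def model_card_url(url_or_repo_id):
    """Return the shareable model-card URL, dropping a trailing 'tree/main/' if present."""
    if url_or_repo_id.startswith(HF_BASE):
        repo_id = url_or_repo_id[len(HF_BASE):].strip("/")
    else:
        repo_id = url_or_repo_id.strip("/")
    if repo_id.endswith("/tree/main"):
        repo_id = repo_id[: -len("/tree/main")]
    return f"{HF_BASE}/{repo_id}/"


print(model_card_url("https://huggingface.co/nsanghi/dqn-LunarLander-v2/tree/main/"))
# https://huggingface.co/nsanghi/dqn-LunarLander-v2/
```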

NOTE: At the time of writing this book, there is a bug in `push_to_hub` that results in the sample video not being created. You can follow this bug at https://github.com/huggingface/huggingface_sb3/issues/33.
