# Train Lunar Lander Agent

In this notebook we will train an agent and then share the result using the Huggingface ecosystem. We will use the following libraries:

1. [gymnasium](https://gymnasium.farama.org/) provides a standard API for reinforcement learning together with a diverse collection of reference environments.
2. To build understanding, we write our own implementations of many of the algorithms taught in the book. For actual work, however, it makes sense to use standard libraries for such RL tasks. One such library is [`Stable Baselines3 SB3`](https://stable-baselines3.readthedocs.io/en/master/), a set of reliable implementations of reinforcement learning algorithms in PyTorch. Its companion library, [`RL Baseline Zoo`](https://github.com/DLR-RM/rl-baselines3-zoo), provides a collection of pre-trained agents along with scripts for training and evaluating agents, tuning hyperparameters, plotting results and recording videos.
3. We will also use Huggingface to host the trained agents and share the results with others. The book walks you through what Huggingface is and what is available in its ecosystem. Here, we use it to upload trained agents and demo videos so that they can be shared. We will use [`Huggingface sb3`](https://github.com/huggingface/huggingface_sb3), a library for loading Stable-Baselines3 models from the Hub and uploading them to it. Before using it, you need a Huggingface account; follow the [`Huggingface`](https://huggingface.co/join) link to create one.

#### Running in Colab/Kaggle

If you are running this notebook on Colab or Kaggle, uncomment the cells below and run them to install the required dependencies.

In [1]:
## uncomment and execute this cell to install all the dependencies if running in Google Colab or Kaggle

# !apt-get update 
# !apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb

In [16]:
## uncomment and execute this cell to install all the dependencies if running in Google Colab or Kaggle

#!pip install "box2d-py==2.3.8"
#!pip install "stable-baselines3[extra]==2.1.0"
#!pip install "huggingface_sb3>=3.0"
#!pip install "moviepy==1.0.3"
#!pip install ipywidgets

## Import policy, RL agent

We will use the DQN (Deep Q-Network) algorithm to train the agent. For now we treat it as a black box; a subsequent chapter dives deeper into it.

In [1]:
import gymnasium as gym

from stable_baselines3 import DQN

## Create the Gym env and instantiate the agent

For this example, we will use the Lunar Lander environment. Its documentation describes it as follows:

"Landing outside landing pad is possible. Fuel is infinite, so an agent can learn to fly and then land on its first attempt. Four discrete actions available: do nothing, fire left orientation engine, fire main engine, fire right orientation engine."

[Lunar Lander environment](https://gymnasium.farama.org/environments/box2d/lunar_lander/)

![Lunar Lander](figs/lunar_lander.gif)


We chose `MlpPolicy` because the input to Lunar Lander is a feature vector, not an image. The type of action (discrete or continuous) is deduced automatically from the environment's action space.
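To make the two spaces concrete, here is a minimal sketch in plain Python. The action ordering follows the environment description quoted earlier. The names `LanderAction`, `OBSERVATION_FIELDS` and the example vector are hypothetical, and the field list (position, velocity, angle, angular velocity, two leg-contact flags) is our reading of the Gymnasium documentation, not something queried from the environment.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """The four discrete actions, in the order listed in the description."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


# Assumed layout of the feature vector that the MLP receives as input.
OBSERVATION_FIELDS = (
    "x", "y", "vx", "vy",
    "angle", "angular_velocity",
    "left_leg_contact", "right_leg_contact",
)

# A made-up observation, only to show the shape of the data.
example_obs = [0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 0.0]
named_obs = dict(zip(OBSERVATION_FIELDS, example_obs))

print(len(OBSERVATION_FIELDS), "observation features,", len(LanderAction), "discrete actions")
print("vertical speed:", named_obs["vy"], "| action 2 is", LanderAction(2).name)
```

Because the observation is a short vector of numbers, a small fully connected network is enough; an image input would call for a convolutional policy instead.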



In [2]:
model = DQN(
    "MlpPolicy", #policy
    "LunarLander-v2", #env
    verbose=1, #verbosity level
    target_update_interval=250,
    train_freq=16,
    gradient_steps=8,
    gamma=0.99,
    exploration_fraction=0.2,
    exploration_final_eps=0.1,
    learning_starts=1000,
    buffer_size=10000,
    batch_size=128,
    learning_rate=4e-3,
    policy_kwargs=dict(net_arch=[256, 256]),
    seed=2,
)

Using cpu device
Creating environment from the given name 'LunarLander-v2'
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
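The two exploration settings above control how quickly the agent moves from random to greedy actions. As a rough sketch, assuming a linear decay that starts at 1.0 (our understanding of the default `exploration_initial_eps`) and a training budget of 100,000 steps, the exploration rate could be computed as follows. The function `linear_epsilon` is a hypothetical helper for illustration, not part of Stable Baselines3.

```python
def linear_epsilon(step, total_timesteps=100_000, initial_eps=1.0,
                   final_eps=0.1, fraction=0.2):
    """Exploration rate after `step` environment steps (illustrative sketch)."""
    progress = step / total_timesteps
    if progress >= fraction:
        # Decay phase is over: keep exploring at the final rate.
        return final_eps
    # Linear interpolation between the initial and the final rate.
    return initial_eps + progress * (final_eps - initial_eps) / fraction


for step in (0, 10_000, 20_000, 50_000):
    print(step, round(linear_epsilon(step), 3))
```

With `exploration_fraction=0.2`, the rate reaches `exploration_final_eps=0.1` after the first 20% of training and stays there for the remaining steps.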


We load a helper function to evaluate the agent:

In [3]:
from stable_baselines3.common.evaluation import evaluate_policy

Let's evaluate the untrained agent. Its network weights are still randomly initialized, so it should perform poorly.

In [7]:
# Separate env for evaluation
eval_env = gym.make("LunarLander-v2")

# Random Agent, before training
mean_reward, std_reward = evaluate_policy(
    model,
    eval_env, #or, model.get_env()
    n_eval_episodes=10,
    deterministic=True,
)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=-553.20 +/- 201.27844859413986
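The two numbers summarize the total reward collected in each evaluation episode. A minimal sketch of that summary, using made-up episode rewards and assuming the population standard deviation (NumPy's default, which we believe the helper relies on), looks like this:

```python
from statistics import fmean, pstdev

# Hypothetical total rewards from three evaluation episodes.
episode_rewards = [-100.0, -300.0, -200.0]

mean_reward = fmean(episode_rewards)   # average return per episode
std_reward = pstdev(episode_rewards)   # spread of returns across episodes

print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

A large standard deviation relative to the mean, as in the output above, indicates that performance varies a lot from episode to episode, so 10 episodes give only a rough estimate.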


## Train the agent and save it

Warning: this may take a while (about 7 minutes of wall-clock time in the run shown below).

In [8]:
%%time
# Train the agent
model.learn(total_timesteps=int(1e5), log_interval=400, progress_bar=True)
# Save the agent
model.save("dqn_lunar")
del model  # delete trained model to demonstrate loading

Output()

----------------------------------
| rollout/            |          |
|    ep_len_mean      | 381      |
|    ep_rew_mean      | 119      |
|    exploration_rate | 0.1      |
| time/               |          |
|    episodes         | 400      |
|    fps              | 227      |
|    time_elapsed     | 411      |
|    total_timesteps  | 93524    |
| train/              |          |
|    learning_rate    | 0.004    |
|    loss             | 0.707    |
|    n_updates        | 46264    |
----------------------------------


CPU times: user 21min 36s, sys: 1min 1s, total: 22min 37s
Wall time: 7min 23s
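The `n_updates` value in the log can be cross-checked against the hyperparameters. The sketch below assumes a single environment, that a training round of `gradient_steps` updates runs after every `train_freq` collected steps, and that rounds ending at or before `learning_starts` are skipped. The helper `expected_gradient_updates` is hypothetical and only does the arithmetic.

```python
def expected_gradient_updates(timesteps, train_freq=16, gradient_steps=8,
                              learning_starts=1000):
    """Back-of-the-envelope count of gradient updates after `timesteps` steps."""
    rollouts_finished = timesteps // train_freq
    # Rollouts that end during the warm-up phase do not trigger training.
    rollouts_in_warmup = learning_starts // train_freq
    training_rounds = max(0, rollouts_finished - rollouts_in_warmup)
    return training_rounds * gradient_steps


# total_timesteps reported in the log above
print(expected_gradient_updates(93_524))
```

Under these assumptions the estimate agrees with the logged `n_updates` of 46264, which shows how `train_freq` and `gradient_steps` together set the amount of learning per environment step.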


## Load the trained agent

In [9]:
model = DQN.load("dqn_lunar")

In [11]:
# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=229.20 +/- 92.29589424939502


## Record the video of trained agent

In [12]:
import gymnasium as gym
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

env_id = "LunarLander-v2"
video_folder = "videos/"
video_length = 1000

vec_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])

# Record the video starting at the first step
# (the file prefix says "random-agent", but the actions come from the trained model)
vec_env = VecVideoRecorder(vec_env, video_folder,
                       record_video_trigger=lambda x: x == 0, video_length=video_length,
                       name_prefix=f"random-agent-{env_id}")

# Reset the wrapped env so that `obs` belongs to the episode being recorded
obs = vec_env.reset()
for _ in range(video_length + 1):
    action, _state = model.predict(obs)
    obs, _, _, _ = vec_env.step(action)
# Save the video
vec_env.close()

Saving video to /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drb-fall24-rl/week-06/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4
Moviepy - Building video /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drb-fall24-rl/week-06/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4.
Moviepy - Writing video /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drb-fall24-rl/week-06/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4



                                                                

Moviepy - Done !
Moviepy - video ready /Users/ashis/Documents/Teaching/Reinforcement-Learning/Lectures/2024-lectures/drb-fall24-rl/week-06/videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4


In [13]:
from IPython.display import HTML
from base64 import b64encode

mp4 = open('videos/random-agent-LunarLander-v2-step-0-to-step-1000.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

## Upload the Video to Huggingface

We need to log in to Huggingface with an access token so that this notebook can upload files to your Huggingface account. You can create or find a token at https://huggingface.co/settings/tokens; uploading requires a token with write access. Enter the token when prompted by `login()` below.

In [1]:
from huggingface_hub import login

In [2]:
login()



In [12]:
from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
#from huggingface_hub import notebook_login # To log to our Hugging Face account to be able to upload models to the Hub.

#notebook_login()
#!git config --global credential.helper store

**IMPORTANT**
Some users have reported the following error when running the `package_to_hub` upload function.

```
"Token is required (write-access action) but no token found. You need to provide a token or be logged in to Hugging Face with `huggingface-cli login` or `huggingface_hub.login`. See https://huggingface.co/settings/tokens."
```

In that case, the following command should help you overcome the issue:

```
import huggingface_hub

# Replace the placeholder with your own token string
huggingface_hub.login(token="<YOUR_HF_TOKEN>",
                      write_permission=True,
                      add_to_git_credential=True)
```

Alternatively, run the following command from a shell in which the `venv` or `conda` environment for this repository is activated, then follow the prompts to set the Huggingface token.

```
huggingface-cli login

```
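If you would rather not paste the token into the notebook, one option is to read it from an environment variable. The sketch below is only an illustration: `read_hf_token` is a hypothetical helper, and the variable name `HF_TOKEN` is an assumption about how you export the token in your shell.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in `env_var`, or fail with a clear message."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Huggingface token"
        )
    return token


# Usage in the notebook (needs huggingface_hub installed and a real token):
# import huggingface_hub
# huggingface_hub.login(token=read_hf_token())
```

Keeping the token out of the notebook also avoids committing it by accident when the notebook is shared.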

In [13]:
eval_env = gym.make("LunarLander-v2", render_mode='rgb_array')

# package_to_hub saves and evaluates the model, generates a model card and records
# a replay video of your agent before pushing the repo to the Hub

# Note that repo_id has the form <huggingface_id>/<name of repo>;
# you will need to change it to "<your_huggingface_id>/dqn-LunarLander-v2"

package_to_hub(model=model, # Our trained model
               model_name="dqn-LunarLander-v2", # The name of our trained model
               model_architecture="DQN", # The model architecture we used: in our case DQN
               env_id="LunarLander-v2", # Name of the environment
               eval_env=eval_env, # Evaluation Environment
               repo_id="ashiskb/dqn-LunarLander-v2", # id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name}, for instance ashiskb/dqn-LunarLander-v2)
               commit_message="Push to Hub")

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.




Saving video to /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmpsq0dj9j3/-step-0-to-step-1000.mp4
Moviepy - Building video /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmpsq0dj9j3/-step-0-to-step-1000.mp4.
Moviepy - Writing video /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmpsq0dj9j3/-step-0-to-step-1000.mp4



                                                                

Moviepy - Done !
Moviepy - video ready /var/folders/12/sb6k1cdx1dn2gkn_2l4ty8dr0000gn/T/tmpsq0dj9j3/-step-0-to-step-1000.mp4



ℹ Pushing repo ashiskb/dqn-LunarLander-v2 to the Hugging Face Hub


policy.optimizer.pth:   0%|          | 0.00/45.2k [00:00<?, ?B/s]

dqn-LunarLander-v2.zip:   0%|          | 0.00/107k [00:00<?, ?B/s]

Upload 4 LFS files:   0%|          | 0/4 [00:00<?, ?it/s]

policy.pth:   0%|          | 0.00/44.3k [00:00<?, ?B/s]

pytorch_variables.pth:   0%|          | 0.00/864 [00:00<?, ?B/s]

ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/ashiskb/dqn-LunarLander-v2/tree/main/


CommitInfo(commit_url='https://huggingface.co/ashiskb/dqn-LunarLander-v2/commit/7a0d2ff861fa154bbcb8ec9cef0038dd47bf99d8', commit_message='Push to Hub', commit_description='', oid='7a0d2ff861fa154bbcb8ec9cef0038dd47bf99d8', pr_url=None, repo_url=RepoUrl('https://huggingface.co/ashiskb/dqn-LunarLander-v2', endpoint='https://huggingface.co', repo_type='model', repo_id='ashiskb/dqn-LunarLander-v2'), pr_revision=None, pr_num=None)

## Checking the Results on Huggingface

After a successful upload, the end of the cell output above shows a link where you can view the model. **It follows the pattern `https://huggingface.co/<yourusername>/dqn-LunarLander-v2/`**

Click this link to access the trained agent. You can also share it with others to show the result of training. Share the URL without the trailing `tree/main/` path so that it opens the Model Card tab, where the animation is shown. For example, in my case the link to share is:<br/>
`https://huggingface.co/ashiskb/dqn-LunarLander-v2/`<br/>
instead of<br/>
`https://huggingface.co/ashiskb/dqn-LunarLander-v2/tree/main/`
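If you want to derive the shareable link in code, a small sketch such as the following strips the trailing path. The function name `model_card_url` is hypothetical; it does plain string handling and does not contact the Hub.

```python
def model_card_url(repo_url):
    """Drop a trailing 'tree/main/' so the link opens the Model Card tab."""
    suffix = "tree/main/"
    # Normalize to a trailing slash so both URL variants are handled.
    base = repo_url if repo_url.endswith("/") else repo_url + "/"
    if base.endswith(suffix):
        base = base[: -len(suffix)]
    return base


print(model_card_url("https://huggingface.co/ashiskb/dqn-LunarLander-v2/tree/main/"))
```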


