<a href="https://colab.research.google.com/github/Ajay-user/Reinforcement-Learning-Repo/blob/main/LunarLander-v2/LunarLander_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Deep Reinforcement Learning: training a Lunar Lander agent to land correctly on the Moon 🌕 with Stable-Baselines3, a Deep Reinforcement Learning library

*To accelerate the agent's training, we'll use a GPU.*

### The environment 🎮
- [LunarLander-v2](https://www.gymlibrary.dev/environments/box2d/lunar_lander/)
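
For orientation, the observation and action layout described on the linked environment page can be written down as plain Python types. This is only an illustrative sketch: the names `LanderAction` and `LanderObservation` are hypothetical and not part of Gym, and the sample values are made up. The environment itself returns an 8-element array and accepts an integer action.

```python
from enum import IntEnum
from typing import NamedTuple


class LanderAction(IntEnum):
    """The four discrete actions (hypothetical names, for illustration only)."""
    DO_NOTHING = 0
    FIRE_LEFT_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ENGINE = 3


class LanderObservation(NamedTuple):
    """The eight observation components, in the order the environment uses."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: float   # 1.0 when the left leg touches the ground
    right_leg_contact: float  # 1.0 when the right leg touches the ground


# Made-up values: a lander above the pad, falling slowly, legs in the air.
obs = LanderObservation(0.0, 1.4, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0)
print(obs.vy, LanderAction.FIRE_MAIN_ENGINE.value)
```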

### The library used 📚
- [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/)

In [6]:
! pip install setuptools==65.5.0
! pip install 'stable-baselines3[extra]'
! pip install swig
! pip install pyglet==1.5.1
! pip install box2d-py
! pip install gym[box2D]  # we use gym==0.21

In [5]:
!sudo apt-get update
!apt install python-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

In [4]:
# Additional code for Stable-baselines3 to load and upload models from the Hugging Face 🤗 Hub.
! pip install huggingface_sb3 

In [1]:
import gym

from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
from huggingface_hub import notebook_login # To log in to our Hugging Face account so we can upload models to the Hub.

from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env

In [2]:
# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

<pyvirtualdisplay.display.Display at 0x7fcd4622e550>

#### Create env

In [3]:
env = make_vec_env("LunarLander-v2", n_envs=16)

#### Create Model : Proximal Policy Optimization

In [4]:
model = PPO(
    policy = 'MlpPolicy',
    env = env,
    n_steps = 1024,
    batch_size = 64,
    n_epochs = 4,
    gamma = 0.999,
    gae_lambda = 0.98,
    ent_coef = 0.01,
    verbose=1)

Using cuda device
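
The `gamma` and `gae_lambda` arguments above control how PPO turns rewards into advantage estimates through Generalized Advantage Estimation (GAE). The following is a minimal pure-Python sketch of that recursion, not Stable-Baselines3's own implementation: the function name `compute_gae` is hypothetical, and it assumes `dones[t]` is `True` when the episode ended at step `t`.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.999, gae_lambda=0.98):
    """Return (advantages, returns) for one trajectory of equal-length lists.

    rewards[t]  reward received at step t
    values[t]   value estimate of the state at step t
    last_value  value estimate of the state after the final step
    dones[t]    True if the episode ended at step t (assumed convention)
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        non_terminal = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        gae = delta + gamma * gae_lambda * non_terminal * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns


# Toy trajectory with made-up numbers.
advantages, returns = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.5, 0.4, 0.3],
    last_value=0.0,
    dones=[False, False, True],
)
print(advantages)
```

With `gamma` close to 1 the agent weighs distant rewards (the final landing) almost as much as immediate ones, and a `gae_lambda` close to 1 leans on observed returns rather than on the value network's estimates.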


#### Train the agent

In [7]:
model.learn(total_timesteps=1_000_000)
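
To get a feel for what this training budget means, the arithmetic implied by the settings in this notebook (`n_envs=16`, `n_steps=1024`, `batch_size=64`, `n_epochs=4`) can be worked out by hand. The sketch below is plain arithmetic under the assumption that training proceeds in whole rollouts until the requested number of timesteps is reached; the function name `training_budget` is hypothetical.

```python
import math


def training_budget(n_envs, n_steps, batch_size, n_epochs, total_timesteps):
    """Derive rollout and update counts from the PPO settings."""
    rollout_size = n_envs * n_steps                 # transitions per rollout
    minibatches_per_epoch = rollout_size // batch_size
    gradient_steps_per_update = minibatches_per_epoch * n_epochs
    # Whole rollouts are collected, so the last one may overshoot the target.
    n_updates = math.ceil(total_timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches_per_epoch,
        "gradient_steps_per_update": gradient_steps_per_update,
        "n_updates": n_updates,
        "timesteps_collected": n_updates * rollout_size,
    }


budget = training_budget(
    n_envs=16, n_steps=1024, batch_size=64, n_epochs=4, total_timesteps=1_000_000
)
for name, value in budget.items():
    print(f"{name}: {value}")
```

Each rollout therefore holds 16,384 transitions, and reaching one million timesteps takes 62 policy updates.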

#### Save the PPO model

In [6]:
model_name = "ppo-LunarLander-v2"
model.save(model_name)

### Evaluate the model

In [13]:
# create a new environment for evaluation
eval_env = gym.make("LunarLander-v2")
# evaluate the model with 10 evaluation episodes and deterministic=True
mean_reward, std_reward = evaluate_policy(model=model, env=eval_env, n_eval_episodes=10, deterministic=True)

print(f'Mean reward = {mean_reward :0.2f} +/- {std_reward :0.2f}')

Mean reward = 269.14 +/- 15.56
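
The two numbers reported above are the mean and the standard deviation of the per-episode returns. As a rough sketch of how such a summary can be read, the snippet below recomputes both from a list of episode returns and compares a conservative lower bound (mean minus one standard deviation) with the 200-point mark that the environment documentation uses for a solved episode. The episode returns are made-up values, the helper name `summarize_returns` is hypothetical, and using mean minus std as the pass criterion is an assumption, not something this notebook does.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns, solved_threshold=200.0):
    """Return (mean, std, lower_bound, passes) for a list of episode returns."""
    mean_reward = fmean(episode_returns)
    std_reward = pstdev(episode_returns)  # population std, like numpy's default
    lower_bound = mean_reward - std_reward
    return mean_reward, std_reward, lower_bound, lower_bound >= solved_threshold


# Made-up returns for illustration, not the notebook's actual episodes.
mean_reward, std_reward, lower_bound, passes = summarize_returns(
    [250.0, 270.0, 290.0]
)
print(f"Mean reward = {mean_reward:0.2f} +/- {std_reward:0.2f}")
print(f"Lower bound = {lower_bound:0.2f}, passes = {passes}")
```

Applied to the result above, 269.14 - 15.56 = 253.58, which stays above 200.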


### Publish our trained model on the Hub 🔥

Calling `package_to_hub` **evaluates the agent, records a replay, generates a model card, and pushes everything to the Hub**.


In [14]:
notebook_login()
!git config --global credential.helper store

Token is valid.
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [15]:
import gym
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env

from huggingface_sb3 import package_to_hub

# Define the name of the environment
env_id = "LunarLander-v2"

# Define the model architecture we used
model_architecture = "PPO"

## repo_id is the id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name})
repo_id = "Ajay-user/LunarLander-v2"

## Define the commit message
commit_message = "Upload PPO LunarLander-v2 trained agent"


# Create the evaluation env
eval_env = DummyVecEnv([lambda: gym.make(env_id)])



package_to_hub(model=model, # Our trained model
               model_name=model_name, # The name of our trained model 
               model_architecture=model_architecture, # The model architecture we used: in our case PPO
               env_id=env_id, # Name of the environment
               eval_env=eval_env, # Evaluation Environment
               repo_id=repo_id, # id of the model repository on the Hugging Face Hub (repo_id = {organization}/{repo_name})
               commit_message=commit_message)




ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.




Saving video to /tmp/tmpzglq9we4/-step-0-to-step-1000.mp4
ℹ Pushing repo Ajay-user/LunarLander-v2 to the Hugging Face Hub


Upload 4 LFS files:   0%|          | 0/4 [00:00<?, ?it/s]

pytorch_variables.pth:   0%|          | 0.00/431 [00:00<?, ?B/s]

policy.optimizer.pth:   0%|          | 0.00/87.9k [00:00<?, ?B/s]

ppo-LunarLander-v2.zip:   0%|          | 0.00/147k [00:00<?, ?B/s]

policy.pth:   0%|          | 0.00/43.3k [00:00<?, ?B/s]

ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/Ajay-user/LunarLander-v2/tree/main/


'https://huggingface.co/Ajay-user/LunarLander-v2/tree/main/'
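
Once the push has succeeded, the agent can be pulled back from the Hub with `load_from_hub`, which the notebook imports but never calls. The sketch below assumes the repository and archive names shown in the upload log above, and that the same library versions used for training are installed. The wrapper `load_trained_agent` is a hypothetical helper, not part of either library.

```python
REPO_ID = "Ajay-user/LunarLander-v2"   # repository pushed above
FILENAME = "ppo-LunarLander-v2.zip"    # archive name from the upload log


def load_trained_agent(repo_id: str = REPO_ID, filename: str = FILENAME):
    """Download the checkpoint from the Hub and rebuild the PPO model."""
    # Imported lazily so this sketch can be defined without the libraries
    # installed; the notebook already imports both near the top.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    # load_from_hub returns the local path of the downloaded checkpoint.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    return PPO.load(checkpoint_path)
```

Calling `model = load_trained_agent()` needs network access; the returned model can then be passed to `evaluate_policy` in the same way as the freshly trained one.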