<a href="https://colab.research.google.com/github/dogukartal/ML-RoadMap/blob/main/RL/Hugging%20Face/LunarLander_v2/PPO/LunarLander.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install swig
!pip install stable-baselines3 "gymnasium[box2d]" huggingface_sb3
!sudo apt-get update
!apt install python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

In [None]:
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

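- Action Space: *Discrete(4)* by default. The training code below passes `continuous=True`, which switches it to *Box(-1, +1, (2,), float32)*.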

- Observation Space: *Box([-1.5 -1.5 -5. -5. -3.1415927 -5. -0. -0. ], [1.5 1.5 5. 5. 3.1415927 5. 1. 1. ], (8,), float32)*

- Rewards: After every step a reward is granted. The total reward of an episode is the sum of the rewards for all the steps within that episode. For each step, the reward:

  - is increased/decreased the closer/further the lander is to the landing pad.

  - is increased/decreased the slower/faster the lander is moving.

  - is decreased the more the lander is tilted (angle not horizontal).

  - is increased by 10 points for each leg that is in contact with the ground.

  - is decreased by 0.03 points each frame a side engine is firing.

  - is decreased by 0.3 points each frame the main engine is firing.

  - The episode receives an additional reward of -100 points for crashing or +100 points for landing safely.

  - An episode is considered a solution if it scores at least 200 points.
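
The fixed-size terms in the list above can be tallied by hand. The sketch below only illustrates that bookkeeping and is not the environment's reward function: it leaves out the distance, speed, and tilt terms, it assumes every firing frame is charged the full listed amount, and the helper names `fuel_penalty`, `episode_return`, and `is_solved` are hypothetical.

```python
# Illustrative bookkeeping for the fixed reward terms listed above.
# NOTE: these helpers are hypothetical and not part of gymnasium; the
# shaping terms for distance, speed, and tilt are deliberately left out.

MAIN_ENGINE_COST = 0.3    # points lost per frame the main engine fires
SIDE_ENGINE_COST = 0.03   # points lost per frame a side engine fires
SOLVED_THRESHOLD = 200.0  # an episode at or above this score is a solution


def fuel_penalty(main_frames: int, side_frames: int) -> float:
    """Total points lost to engine use over one episode."""
    return MAIN_ENGINE_COST * main_frames + SIDE_ENGINE_COST * side_frames


def episode_return(step_rewards) -> float:
    """The total reward of an episode is the sum of its per-step rewards."""
    return float(sum(step_rewards))


def is_solved(step_rewards) -> bool:
    """True when the summed per-step rewards reach the 200-point threshold."""
    return episode_return(step_rewards) >= SOLVED_THRESHOLD


if __name__ == "__main__":
    print("Fuel penalty:", fuel_penalty(main_frames=100, side_frames=50))
    print("Solved:", is_solved([150.0, 60.0]))
```

Under these assumptions, an agent that fires the main engine for many frames pays far more than one that relies on the side engines, which is why fuel-efficient landings score higher.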







In [None]:
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.utils import set_random_seed
from stable_baselines3.common.monitor import Monitor

def make_env():
    env = gym.make(
        "LunarLander-v2",
        continuous=True,
        gravity=-10.0,
        enable_wind=False,
        wind_power=15.0,
        turbulence_power=1.5,
        render_mode="rgb_array"
    )
    return Monitor(env)

set_random_seed(0)

vec_env = make_vec_env(make_env, n_envs=16, seed=0)

model = PPO(
    "MlpPolicy",
    vec_env,
    n_steps=1024,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
    learning_rate=3e-4,
    verbose=1,
    tensorboard_log="./tensorboard_logs/"
)

# Callbacks
eval_env = make_vec_env(make_env, n_envs=5, seed=100)
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="./best_model/",
    log_path="./logs/",
    eval_freq=10000,
    deterministic=True,
    render=False
)

model.learn(
    total_timesteps=1000000,
    callback=eval_callback,
    progress_bar=True,
    tb_log_name="PPO_run",
    reset_num_timesteps=False
)

model.save("ppo_lunar_lander_v2_final")

mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
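
The hyperparameters above determine how much data each PPO update sees and how often evaluation runs. The following is a rough sketch of that arithmetic in plain Python, with no Stable-Baselines3 calls. It assumes the Stable-Baselines3 conventions that training collects whole rollouts until the requested budget is reached, and that `EvalCallback` counts `eval_freq` in calls to the vectorized `step()` rather than in individual environment steps. The derived variable names are only for illustration.

```python
import math

# Values copied from the training cell above.
n_envs = 16
n_steps = 1024
batch_size = 64
n_epochs = 10
total_timesteps = 1_000_000
eval_freq = 10_000

# Transitions collected across all environments before each policy update.
rollout_size = n_envs * n_steps

# Each epoch splits the rollout into minibatches of `batch_size` transitions.
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * n_epochs

# Assumption: rollouts are collected whole, so the budget is rounded up.
num_updates = math.ceil(total_timesteps / rollout_size)

# Assumption: one callback call corresponds to one step in every environment.
timesteps_between_evals = eval_freq * n_envs

print(f"Rollout size: {rollout_size}")
print(f"Gradient steps per update: {gradient_steps_per_update}")
print(f"Policy updates: {num_updates}")
print(f"Timesteps between evaluations: {timesteps_between_evals}")
```

If an evaluation roughly every 10,000 environment steps is what is wanted, one option is to pass `eval_freq=max(10_000 // n_envs, 1)` to `EvalCallback`.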


In [None]:
from huggingface_hub import notebook_login

notebook_login()
!git config --global credential.helper store

In [None]:
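# PPO.load is a classmethod that returns a new model, so the result must be assigned.
model = PPO.load("ppo_lunar_lander_v2_final", env=vec_env)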
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")

In [None]:
from huggingface_sb3 import package_to_hub
from stable_baselines3.common.vec_env import DummyVecEnv

model_name = "ppo_lunar_lander_v2"
env_id = "LunarLander-v2"
model_architecture = "PPO"
repo_id = "dogukankartal/ppo-LunarLander-v2"
commit_message = "Upload PPO LunarLander-v2 trained agent"
eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode="rgb_array", continuous=True))])

package_to_hub(
    model=model,  # Our trained model
    model_name=model_name,  # The name of our trained model
    model_architecture=model_architecture,  # The model architecture we used: in our case PPO
    env_id=env_id,  # Name of the environment
    eval_env=eval_env,  # Evaluation Environment
    repo_id=repo_id,  # Hub repository id, in the form {organization}/{repo_name}, e.g. ThomasSimonini/ppo-LunarLander-v2
    commit_message=commit_message,
)

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: if you encounter a bug, please open an issue.
Saving video to /tmp/tmpu7qhdqu3/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmpu7qhdqu3/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmpu7qhdqu3/-step-0-to-step-1000.mp4

Moviepy - Done !
Moviepy - video ready /tmp/tmpu7qhdqu3/-step-0-to-step-1000.mp4
ℹ Pushing repo dogukankartal/ppo-LunarLander-v2 to the Hugging Face Hub

ppo_lunar_lander_v2.zip:   0%|          | 0.00/152k [00:00<?, ?B/s]

ℹ Your model is pushed to the Hub. You can view your model here:
https://huggingface.co/dogukankartal/ppo-LunarLander-v2/tree/main/


CommitInfo(commit_url='https://huggingface.co/dogukankartal/ppo-LunarLander-v2/commit/e8ad8d6d53673d0ffb8c3058777c1e2fd5d301f0', commit_message='Upload PPO LunarLander-v2 trained agent', commit_description='', oid='e8ad8d6d53673d0ffb8c3058777c1e2fd5d301f0', pr_url=None, pr_revision=None, pr_num=None)