<a href="https://colab.research.google.com/github/CaptainAmu/Reinforcement-Learning-Tutorial/blob/main/notebooks/unit1/unit1_try.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install dependencies

In [None]:
%%html
<video controls autoplay><source src="https://huggingface.co/sb3/ppo-LunarLander-v2/resolve/main/replay.mp4" type="video/mp4"></video>

In [None]:
!apt install swig cmake

In [None]:
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit1/requirements-unit1.txt

In [None]:
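# Pin specific package versions (-q: quiet install)

!pip install pygame==2.5.2 -q
!pip install box2d-py==2.3.5 -q
# Left unpinned: an unquoted `gymnasium>=1.0.0` makes the shell treat `>` as an output redirect,
# so no version constraint is applied. Quote the specifier if a constraint is needed.
!pip install gymnasium -q
!pip install stable-baselines3==2.0.0a5 -q
!pip install huggingface_sb3 -q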

In [None]:
!sudo apt-get update
!sudo apt-get install -y python3-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay

```pygame==2.5.2```

A game development framework, commonly used in RL to render environment frames (for example, animating the LunarLander spacecraft). It is required by gymnasium[box2d].

```box2d-py==2.3.5```

Box2D is a 2D physics engine that simulates gravity, collisions, and similar effects. LunarLander-v2 is implemented on top of Box2D.

```gymnasium```

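The standard environment library for reinforcement learning (the successor to OpenAI Gym). It provides a range of environments: LunarLander-v2, CartPole-v1, Atari, and others.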

```stable-baselines3==2.0.0a5```

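A widely used library of deep reinforcement learning algorithms (PPO, DQN, A2C, and others). This is the core library used to train the agent.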

```huggingface_sb3```

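An extension package from Hugging Face that makes it easy to upload trained RL models to, and download them from, the Hugging Face Hub. Think of it as a model repository management tool.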

---
## Import Packages

In [None]:
# Virtual display
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

In [None]:
import gymnasium
print(f'Using Gymnasium version {gymnasium.__version__}')

from huggingface_sb3 import load_from_hub, package_to_hub
from huggingface_hub import notebook_login # Log in to a Hugging Face account so models can be uploaded to the Hub.

from stable_baselines3 import PPO, DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor


---
## Random policy

In [None]:
import gymnasium as gym

# Create environment and initial observation
env = gym.make("LunarLander-v2")
observation, info = env.reset()

# Sample actions and update environment
for _ in range(20):
  action = env.action_space.sample()
  print(f'Action taken: {action}')
  observation, reward, terminated, truncated, info = env.step(action)

  if terminated or truncated:
    print("Environment is reset!")
    observation, info = env.reset()

env.close()

## Investigate the environment

In [None]:
env = gym.make("LunarLander-v2")
print(f'Action space, {env.action_space}')
print(f'Action space sample: {env.action_space.sample()}')
print(f'Observation space, {env.observation_space}')
print(f'Observation space sample: {env.observation_space.sample()}')

Each observation is a vector of 8 entries:
* x-coordinate
* y-coordinate
* x-velocity
* y-velocity
* angle
* angular velocity
* left_leg_on_ground (whether the left leg touches the ground)
* right_leg_on_ground (whether the right leg touches the ground)
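To make the layout concrete, the sketch below pairs each of the 8 entries with a name. The tuple `OBS_FIELDS` and the helper `label_observation` are hypothetical names introduced here for illustration, not part of Gymnasium, and the sample vector is made up.

```python
# Hypothetical field names for the 8-entry LunarLander observation,
# in the order listed above (not a Gymnasium API).
OBS_FIELDS = (
    "x", "y", "x_velocity", "y_velocity",
    "angle", "angular_velocity",
    "left_leg_on_ground", "right_leg_on_ground",
)

def label_observation(observation):
    """Return a dict mapping each field name to its value."""
    values = [float(v) for v in observation]
    if len(values) != len(OBS_FIELDS):
        raise ValueError(f"expected {len(OBS_FIELDS)} entries, got {len(values)}")
    labeled = dict(zip(OBS_FIELDS, values))
    # The two leg entries are contact flags: treat any nonzero value as True.
    for key in ("left_leg_on_ground", "right_leg_on_ground"):
        labeled[key] = bool(labeled[key])
    return labeled

# Made-up observation, for illustration only.
sample = [0.01, 1.40, 0.25, -0.30, 0.02, -0.05, 0.0, 1.0]
for name, value in label_observation(sample).items():
    print(f"{name:>20}: {value}")
```

With a single (non-vectorized) environment, the same helper could be applied to the `observation` returned by `env.reset()` or `env.step()`.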

In [None]:
env = make_vec_env("LunarLander-v2", n_envs = 16) # 16 envs in parallel

## Create a Model

In [None]:
# PPO Model
model_PPO = PPO(
    policy = 'MlpPolicy',
    env = env,
    n_steps = 1024,
    batch_size = 64,
    n_epochs = 4,
    gamma = 0.999,
    gae_lambda = 0.98,
    ent_coef = 0.01,
    verbose = 1
)

# DQN Model
model_DQN = DQN(
    policy = 'MlpPolicy',
    env = env,
    batch_size = 64,
    gamma = 0.999,
    learning_rate = 0.00025
)

## Training an agent




In [None]:
model_PPO.learn(total_timesteps = 100000)
model_PPO.save("PPO-Lunarlander-v2")

In [None]:
model_DQN.learn(total_timesteps = 100000)
model_DQN.save("DQN-Lunarlander-v2")
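The PPO settings above determine how much data each update sees. The arithmetic below is a rough sketch, assuming (as I understand Stable-Baselines3's on-policy loop) that each rollout collects `n_steps` transitions from every parallel environment and that rollouts are always collected whole. The variable names are illustrative, and the `1 / (1 - gamma)` horizon is only a rule of thumb.

```python
import math

# Values copied from the PPO setup and training call in this notebook.
n_envs = 16
n_steps = 1024
batch_size = 64
n_epochs = 4
gamma = 0.999
total_timesteps = 100_000

# One rollout: n_steps transitions from each parallel environment.
rollout_size = n_steps * n_envs
# Each epoch splits the rollout into minibatches of batch_size.
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * n_epochs
# Assumption: rollouts are collected whole, so the budget rounds up.
n_updates = math.ceil(total_timesteps / rollout_size)
timesteps_collected = n_updates * rollout_size
# Rule of thumb: rewards roughly 1 / (1 - gamma) steps ahead still matter.
effective_horizon = 1 / (1 - gamma)

print(f"rollout size:              {rollout_size}")
print(f"minibatches per epoch:     {minibatches_per_epoch}")
print(f"gradient steps per update: {gradient_steps_per_update}")
print(f"policy updates:            {n_updates}")
print(f"timesteps collected:       {timesteps_collected}")
print(f"effective horizon (steps): {effective_horizon:.0f}")
```

Under these assumptions, a budget of 100,000 timesteps is not a multiple of the rollout size, so training would run slightly past the requested budget.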

## Evaluate an agent

In [None]:
def evaluate(agent):
  '''Evaluate a trained agent on LunarLander-v2 over 10 deterministic episodes.'''
  # Monitor records per-episode rewards and lengths for evaluate_policy
  eval_env = Monitor(gym.make("LunarLander-v2", render_mode = 'rgb_array'))
  mean_reward, std_reward = evaluate_policy(agent, eval_env, n_eval_episodes = 10, deterministic = True)
  print(f'Mean reward={mean_reward:.2f} +/- {std_reward:.2f}')
  eval_env.close()

evaluate(model_DQN)
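`evaluate_policy` summarizes per-episode returns as a mean and a standard deviation. The sketch below reproduces that summary on made-up returns and adds a conservative check, `mean - std >= 200`. The helper `summarize_returns`, the sample returns, and the `mean - std` criterion are assumptions for illustration; 200 points is the score at which a LunarLander episode is usually considered solved.

```python
from statistics import fmean, pstdev

def summarize_returns(episode_returns, solved_threshold=200.0):
    """Return (mean, std, passes) for a list of per-episode returns."""
    mean_reward = fmean(episode_returns)
    # Population standard deviation (divides by N, not N - 1).
    std_reward = pstdev(episode_returns)
    # Conservative score: penalize agents whose returns vary a lot.
    passes = (mean_reward - std_reward) >= solved_threshold
    return mean_reward, std_reward, passes

# Made-up returns, for illustration only.
returns = [250.0, 230.0, 270.0, 210.0, 240.0]
mean_reward, std_reward, passes = summarize_returns(returns)
print(f"Mean reward={mean_reward:.2f} +/- {std_reward:.2f}, passes={passes}")
```

Subtracting the standard deviation means an agent with a high but erratic average does not pass the check.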

## Pushing a model to the Hugging Face Hub
### 1. Create an access token and log in to the Hub

In [None]:
notebook_login()
!git config --global credential.helper store

In [None]:
from huggingface_hub import login
from google.colab import userdata

hf_token = userdata.get('HF_TOKEN')  # Read the token from Colab secrets
login(hf_token)

### 2. Push model to Hub

In [None]:
import gymnasium as gym
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.env_util import make_vec_env
from huggingface_sb3 import package_to_hub

In [None]:
model = model_DQN
repo_id = 'ShuchengLi/dqn-LunarLander-v2'
model_name = 'dqn-LunarLander-v2'
env_id = 'LunarLander-v2'
model_architecture = 'DQN'
commit_message = 'Upload DQN LunarLander-v2 trained agent'
eval_env = DummyVecEnv([lambda: Monitor(gym.make(env_id, render_mode = 'rgb_array'))])

package_to_hub(model=model,
               model_name=model_name,
               model_architecture=model_architecture,
               env_id=env_id,
               eval_env=eval_env,
               repo_id=repo_id,
               commit_message=commit_message)

## Loading a model from the Hugging Face Hub & evaluating it

In [None]:
!pip install shimmy  # shimmy: compatibility layer for using Gym-based environments with the Gymnasium API

In [None]:
from huggingface_sb3 import load_from_hub
from stable_baselines3 import DQN
repo_id = 'ShuchengLi/dqn-LunarLander-v2'
filename = 'dqn-LunarLander-v2.zip'

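# Replace saved objects that may not deserialize cleanly; they are not needed for evaluation
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,  # PPO-specific entry
}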

checkpoint = load_from_hub(repo_id, filename)
model_loaded = DQN.load(checkpoint, custom_objects=custom_objects, print_system_info=True)

In [None]:
eval_env = Monitor(gym.make("LunarLander-v2", render_mode = 'rgb_array'))
mean_reward, std_reward = evaluate_policy(model_loaded, eval_env, n_eval_episodes = 10, deterministic = True)
print(f"Mean reward={mean_reward:.2f} +/- {std_reward:.2f}")