<a href="https://colab.research.google.com/github/BA-Etchepareborda/Pytorch_Learning/blob/main/RL/pybullet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Baselines3 - PyBullet: Normalizing Features and Reward

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

PyBullet source code: https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/

Gymnasium-compatible envs: https://github.com/araffin/pybullet_envs_gymnasium

## Install Dependencies and Stable Baselines Using Pip


```
pip install "stable-baselines3[extra]"
```

In [1]:
# for autoformatting
# %load_ext jupyter_black

In [2]:
!pip install pybullet_envs_gymnasium
!pip install "stable-baselines3[extra]>=2.0.0a4"

Collecting pybullet_envs_gymnasium
  Downloading pybullet_envs_gymnasium-0.6.0-py3-none-any.whl.metadata (1.2 kB)
Collecting pybullet>=3.2.5 (from pybullet_envs_gymnasium)
  Downloading pybullet-3.2.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.8 kB)
Downloading pybullet_envs_gymnasium-0.6.0-py3-none-any.whl (22 kB)
Downloading pybullet-3.2.7-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (103.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.2/103.2 MB 9.4 MB/s eta 0:00:00
Installing collected packages: pybullet, pybullet_envs_gymnasium
Successfully installed pybullet-3.2.7 pybullet_envs_gymnasium-0.6.0
Collecting stable-baselines3>=2.0.0a4 (from stable-baselines3[extra]>=2.0.0a4)
  Downloading stable_baselines3-2.7.0a1-py3-none-any.whl.metadata (4.8 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch<3.0,>=2.3->stable-baselines3>=2.0.0a4->stable-baselines3[extra]>=2.0.0a4)
  Downloading n

## Import policy, RL agent, Wrappers

In [3]:
import os

# Register pybullet envs
import pybullet_envs_gymnasium

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize, DummyVecEnv

## Create and wrap the environment with `VecNormalize`

Normalizing input features can be essential to training an RL agent successfully, for instance on [PyBullet](https://github.com/bulletphysics/bullet3/) environments. By default, images are scaled but other types of input are not. The `VecNormalize` wrapper covers those other inputs: it computes a running average and standard deviation of the input features and uses them to normalize observations. It can do the same for rewards.

More information about `VecNormalize`:
- [Documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#stable_baselines3.common.vec_env.VecNormalize)
- [Discussion](https://github.com/hill-a/stable-baselines/issues/698)
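
The running statistics described above can be sketched in plain Python. This is a minimal illustration of the idea, not the Stable Baselines3 implementation: the class name `RunningNormalizer`, its methods, the `epsilon` value, and the sample observations are all assumptions made for this example.

```python
import math


class RunningNormalizer:
    """Hypothetical per-feature running mean/variance tracker (illustration only)."""

    def __init__(self, n_features, clip=10.0, epsilon=1e-8):
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations
        self.count = 0
        self.clip = clip
        self.epsilon = epsilon  # assumed small constant to avoid division by zero

    def update(self, obs):
        """Fold one observation into the statistics (Welford's algorithm)."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Standardize each feature, then clip it to [-clip, clip]."""
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.count if self.count > 0 else 1.0
            z = (x - self.mean[i]) / math.sqrt(var + self.epsilon)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


normalizer = RunningNormalizer(n_features=2)
# Two features on very different scales
for obs in [[0.1, 1000.0], [0.2, 3000.0], [0.3, 2000.0]]:
    normalizer.update(obs)

normalized = normalizer.normalize([0.2, 2000.0])
outlier = normalizer.normalize([100.0, 2000.0])
print(normalized)
print(outlier)
```

After normalization both features sit on a comparable scale, and the `clip` argument plays the role that `clip_obs` plays in the cell below: extreme values are bounded instead of being passed to the policy as-is.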

In [4]:
vec_env = make_vec_env("HalfCheetahBulletEnv-v0", n_envs=1)

vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True, clip_obs=10.0)

### Train the agent

In [10]:
model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=50000)

Using cuda device
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 1e+03     |
|    ep_rew_mean     | -1.17e+03 |
| time/              |           |
|    fps             | 419       |
|    iterations      | 1         |
|    time_elapsed    | 4         |
|    total_timesteps | 2048      |
----------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 1e+03       |
|    ep_rew_mean          | -1.14e+03   |
| time/                   |             |
|    fps                  | 390         |
|    iterations           | 2           |
|    time_elapsed         | 10          |
|    total_timesteps      | 4096        |
| train/                  |             |
|    approx_kl            | 0.011005487 |
|    clip_fraction        | 0.106       |
|    clip_range           | 0.2         |
|    entropy_loss         | -8.5        |
|    explained_variance   | -0.0314     |
| 

<stable_baselines3.ppo.ppo.PPO at 0x781870c30a50>

### Save the agent and the normalization

In [11]:
# Don't forget to save the VecNormalize statistics when saving the agent
log_dir = "/tmp/"
model.save(os.path.join(log_dir, "ppo_halfcheetah"))
stats_path = os.path.join(log_dir, "vec_normalize.pkl")
vec_env.save(stats_path)
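
Why the statistics need to be saved alongside the agent can be illustrated without any RL library. In the sketch below, the dictionary `stats`, the helper `normalize`, the file name `stats.pkl`, and all numeric values are hypothetical. Only the underlying idea is taken from this notebook: the policy should see inputs scaled the same way as during training.

```python
import math
import os
import pickle
import tempfile


def normalize(obs, stats, epsilon=1e-8):
    """Standardize each feature with the given mean/variance (hypothetical helper)."""
    return [
        (x - m) / math.sqrt(v + epsilon)
        for x, m, v in zip(obs, stats["mean"], stats["var"])
    ]


# Statistics as they might look after training (made-up values)
stats = {"mean": [0.2, 2000.0], "var": [0.01, 250000.0]}

# Round-trip the statistics through a pickle file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stats.pkl")
    with open(path, "wb") as f:
        pickle.dump(stats, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Statistics of a freshly created normalizer that has seen no data (assumed defaults)
fresh = {"mean": [0.0, 0.0], "var": [1.0, 1.0]}

obs = [0.3, 2500.0]
with_restored = normalize(obs, restored)
with_fresh = normalize(obs, fresh)
print("with restored stats:", with_restored)
print("with fresh stats:   ", with_fresh)
```

With the restored statistics the observation maps to the same values the policy would have seen during training. With fresh statistics the second feature stays in the thousands, which is far outside the range the policy was trained on.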

### Test model: load the saved agent and normalization

In [12]:
# Load the agent
model = PPO.load(os.path.join(log_dir, "ppo_halfcheetah"))

# Recreate the environment and load the saved normalization statistics
vec_env = make_vec_env("HalfCheetahBulletEnv-v0", n_envs=1)
vec_env = VecNormalize.load(stats_path, vec_env)
# Do not update the statistics at test time
vec_env.training = False
# Reward normalization is not needed at test time
vec_env.norm_reward = False
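
To see why reward normalization is switched off for evaluation, the sketch below scales each reward by a running standard deviation of the discounted return, then clips it. This is roughly the idea behind `norm_reward=True`, not the library's code: the function name `scale_rewards`, the discount `gamma=0.99`, the clip value, and the reward values are assumptions made for illustration.

```python
import math


def scale_rewards(rewards, gamma=0.99, clip=10.0, epsilon=1e-8):
    """Hypothetical helper: divide rewards by the running std of the discounted return."""
    ret = 0.0  # discounted return accumulated so far
    count, mean, m2 = 0, 0.0, 0.0  # Welford statistics of the return
    scaled = []
    for r in rewards:
        ret = ret * gamma + r
        count += 1
        delta = ret - mean
        mean += delta / count
        m2 += delta * (ret - mean)
        var = m2 / count
        value = r / math.sqrt(var + epsilon)
        scaled.append(max(-clip, min(clip, value)))
    return scaled


raw_rewards = [-1.2, -0.8, -1.5, -1.1, -0.9]
scaled_rewards = scale_rewards(raw_rewards)

# The scaled values depend on the running statistics, so their sum is not
# comparable across runs; the raw sum is the quantity worth reporting.
print("raw episode return:   ", sum(raw_rewards))
print("scaled episode return:", sum(scaled_rewards))
```

Scaled rewards are a training aid. An evaluation score computed from them would change with the state of the running statistics, so the unscaled environment reward is the one to report.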

In [13]:
from stable_baselines3.common.evaluation import evaluate_policy

In [14]:
mean_reward, std_reward = evaluate_policy(model, vec_env)

print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")

Mean reward = -1348.41 +/- 69.97
