# Stable Baselines3 - PyBullet: Normalizing Features and Reward

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

PyBullet source code: [https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/](https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/)

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

In [1]:
# for autoformatting
# %load_ext jupyter_black

In [2]:
!pip install pybullet
!pip install "stable-baselines3[extra]>=2.0.0a4"

Collecting pybullet
  Downloading pybullet-3.2.5.tar.gz (80.5 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pybullet
  Building wheel for pybullet (setup.py) ... done
  Created wheel for pybullet: filename=pybullet-3.2.5-cp310-cp310-linux_x86_64.whl size=101451407 sha256=2ec2b72f9f7e364727358b32459ad97fe57387e6ffcaa46b5e054d1da7248546
  Stored in directory: /root/.cache/pip/wheels/6b/fa/1a/c315a5133f0c9bf202a6daa5d70891120e7fe403e06e3407cc
Successfully built pybullet
Installing collected packages: pybullet
Successfully installed pybullet-3.2.5
Collecting stable-baselines3[extra]>=2.0.0a4
  Downloading stable_baselines3-2.0.0-py3-none-any.whl (178 kB)
Collecting gymnasium==0.2

In [3]:
# for pybullet patches
!pip install "rl_zoo3>=2.0.0a4"

Collecting rl_zoo3>=2.0.0a4
  Downloading rl_zoo3-2.0.0-py3-none-any.whl (76 kB)
Collecting sb3-contrib>=2.0.0 (from rl_zoo3>=2.0.0a4)
  Downloading sb3_contrib-2.0.0-py3-none-any.whl (80 kB)
Collecting gym==0.26.2 (from rl_zoo3>=2.0.0a4)
  Downloading gym-0.26.2.tar.gz (721 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting huggingface-sb3>=2.2.5 (from rl_zoo3>=2.0.0a4

## Import policy, RL agent, Wrappers

In [4]:
import os

# Patch and register pybullet envs
import rl_zoo3.gym_patches
import pybullet_envs

from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize, DummyVecEnv

## Create and wrap the environment with `VecNormalize`

Normalizing input features can be essential for training an RL agent successfully, for instance on [PyBullet](https://github.com/bulletphysics/bullet3/) environments. By default, image observations are scaled, but other types of input are left untouched. The `VecNormalize` wrapper fills this gap: it maintains a running mean and standard deviation of the input features and uses them to normalize each observation (it can also normalize rewards).

More information about `VecNormalize`:
- [Documentation](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#stable_baselines3.common.vec_env.VecNormalize)
- [Discussion](https://github.com/hill-a/stable-baselines/issues/698)
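To make the idea concrete, here is a minimal pure-Python sketch of running normalization for a single scalar feature. It is an illustration under stated assumptions, not SB3's implementation: `RunningStat` and `normalize` are hypothetical names, the update uses Welford's algorithm one sample at a time (SB3 updates its statistics per batch and per feature), and the `clip` and `epsilon` defaults are chosen for illustration.

```python
import math


class RunningStat:
    """Running mean and variance of one scalar feature (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; fall back to 1.0 before any sample is seen
        return self.m2 / self.count if self.count > 0 else 1.0


def normalize(x, stat, clip=10.0, epsilon=1e-8, training=True):
    """Standardize x with the running statistics, then clip the result.

    With training=False the statistics are frozen, which is what
    `env.training = False` is meant to achieve at test time.
    """
    if training:
        stat.update(x)
    z = (x - stat.mean) / math.sqrt(stat.var + epsilon)
    return max(-clip, min(clip, z))


# Usage: accumulate statistics during "training", then freeze them
stat = RunningStat()
for value in [1.0, 2.0, 3.0, 4.0]:
    normalize(value, stat)
frozen = normalize(2.5, stat, training=False)  # statistics are not updated
```

Freezing the statistics matters because a trained policy expects inputs on the scale it saw during training; this is also why the statistics have to be saved alongside the agent.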

In [5]:
# env = make_vec_env(
#     "HalfCheetahBulletEnv-v0",
#     env_kwargs=dict(apply_api_compatibility=True),
#     n_envs=1,
# )

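# PyBullet doesn't support Gymnasium yet, so create the env with Gym 0.26
import gym as gym26


def make_env():
    # A named factory avoids a lambda that closes over the reassigned `env`
    return gym26.make("HalfCheetahBulletEnv-v0", apply_api_compatibility=True)


env = DummyVecEnv([make_env])
# Normalize observations and rewards; clip normalized observations to [-10, 10]
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.0)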
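
Reward normalization works slightly differently from observation normalization. As far as the SB3 source shows, rewards are not mean-centered: they are divided by the running standard deviation of the discounted return and then clipped. The sketch below illustrates that scheme for a single environment. `ReturnScaler` is a hypothetical name, and the defaults (`gamma=0.99`, `clip=10.0`, `epsilon=1e-8`) and the way the variance is initialized are assumptions made for the example, so the first few outputs will not match SB3 exactly.

```python
import math


class ReturnScaler:
    """Scale rewards by the running std of the discounted return (sketch)."""

    def __init__(self, gamma=0.99, clip=10.0, epsilon=1e-8):
        self.gamma = gamma
        self.clip = clip
        self.epsilon = epsilon
        self.ret = 0.0  # discounted return accumulated in the current episode
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations of the return

    def __call__(self, reward, done):
        # Accumulate the discounted return and update its running variance
        self.ret = self.ret * self.gamma + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        var = self.m2 / self.count

        # Divide by the std only (no mean subtraction), then clip
        scaled = reward / math.sqrt(var + self.epsilon)
        scaled = max(-self.clip, min(self.clip, scaled))

        if done:
            self.ret = 0.0  # start a fresh return at the episode boundary
        return scaled


# Usage: scale the rewards of a short, made-up episode
scaler = ReturnScaler(gamma=0.5)
scaled_rewards = [scaler(r, d) for r, d in [(1.0, False), (1.0, True)]]
```

Because the reward is only rescaled, its sign is preserved. At test time the raw reward is usually what you want to report, which is why `norm_reward` is switched off before evaluation.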



### Train the agent

In [6]:
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=2000)

Using cpu device
-----------------------------
| time/              |      |
|    fps             | 464  |
|    iterations      | 1    |
|    time_elapsed    | 4    |
|    total_timesteps | 2048 |
-----------------------------


<stable_baselines3.ppo.ppo.PPO at 0x7fdbeca59ff0>

### Save the agent and the normalization

In [9]:
# Don't forget to save the VecNormalize statistics when saving the agent
log_dir = "/tmp/"
model.save(log_dir + "ppo_halfcheetah")
stats_path = os.path.join(log_dir, "vec_normalize.pkl")
env.save(stats_path)

In [15]:
!pip install "gymnasium[mujoco]"


Collecting mujoco>=2.3.2 (from gymnasium[mujoco])
  Downloading mujoco-2.3.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.5 MB)
Collecting glfw (from mujoco>=2.3.2->gymnasium[mujoco])
  Downloading glfw-2.6.2-py2.py27.py3.py30.py31.py32.py33.py34.py35.py36.py37.py38-none-manylinux2014_x86_64.whl (208 kB)
Installing collected packages: glfw, mujoco
Successfully installed glfw-2.6.2 mujoco-2.3.6


### Test model: load the saved agent and normalization

In [18]:
# Load the agent
model = PPO.load(log_dir + "ppo_halfcheetah")

# Load the saved statistics
env = make_vec_env(
    "HalfCheetah",
    env_kwargs=dict(apply_api_compatibility=True),
    n_envs=1,
)
env = VecNormalize.load(stats_path, env)
# Do not update them at test time
env.training = False
# Reward normalization is not needed at test time
env.norm_reward = False


DependencyNotInstalled: ignored

In [17]:
# Load the agent
model = PPO.load(log_dir + "ppo_halfcheetah")

# Recreate the environment with Gym 0.26, as during training.
# make_vec_env() builds environments through Gymnasium, where the PyBullet
# IDs are apparently not registered (the original call raised NameNotFound).
env = DummyVecEnv(
    [lambda: gym26.make("HalfCheetahBulletEnv-v0", apply_api_compatibility=True)]
)
# Load the saved statistics
env = VecNormalize.load(stats_path, env)
# Do not update them at test time
env.training = False
# Reward normalization is not needed at test time
env.norm_reward = False

In [11]:
from stable_baselines3.common.evaluation import evaluate_policy

In [12]:
mean_reward, std_reward = evaluate_policy(model, env)

print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")



Mean reward = -35.13 +/- 2.60
