# Stable Baselines3 - Training, Saving and Loading

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a training framework for Reinforcement Learning (RL), using Stable Baselines3.

It provides scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

In [None]:
!apt-get update && apt-get install -y swig cmake
!pip install box2d-py
!pip install "stable-baselines3[extra]"
!pip install sb3-contrib


## Import policy, RL agent, ...

In [None]:
import gymnasium as gym
import numpy as np

from stable_baselines3 import PPO
from sb3_contrib import RecurrentPPO

from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

In [None]:
env = gym.make("CarRacing-v2", render_mode="rgb_array")

In [None]:
model = PPO("CnnPolicy", env, verbose=1, tensorboard_log="log")

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.


In [None]:
model = PPO(
    "CnnPolicy",
    env,
    verbose=1,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
)

Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
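The hyperparameters above fix how much optimisation work each PPO iteration does. The following is a rough sketch of that arithmetic, assuming a single environment (as here) and that each rollout of `n_steps` transitions is split into minibatches of `batch_size` for `n_epochs` passes. The helper name `ppo_update_budget` is hypothetical and not part of Stable Baselines3.

```python
import math


def ppo_update_budget(total_timesteps, n_steps, batch_size, n_epochs, n_envs=1):
    """Back-of-the-envelope PPO bookkeeping (hypothetical helper, not an SB3 API)."""
    rollout_size = n_steps * n_envs            # transitions collected per iteration
    minibatches = rollout_size // batch_size   # minibatches per epoch
    grad_steps = minibatches * n_epochs        # gradient steps per iteration
    iterations = math.ceil(total_timesteps / rollout_size)
    return {
        "rollout_size": rollout_size,
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_iteration": grad_steps,
        "iterations": iterations,
    }


# Values taken from the PPO(...) call above and the 4e5-step training run.
budget = ppo_update_budget(400_000, n_steps=2048, batch_size=64, n_epochs=10)
print(budget)
```

Under these assumptions, each iteration collects 2048 transitions and performs 320 gradient steps, and the full run needs 196 iterations.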




Let's evaluate the untrained agent. At this point it should behave like a random agent.

In [None]:
eval_env = gym.make("CarRacing-v2", render_mode="rgb_array")

mean_reward, std_reward = evaluate_policy(
    model,
    eval_env,
    n_eval_episodes=5,
    deterministic=False,
)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=-50.42 +/- 3.003255385121216
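To make the two reported numbers concrete, here is a minimal sketch of the summary step, assuming (as the name suggests) that `evaluate_policy` reports the mean and the population standard deviation of the per-episode returns. The returns below are made-up illustrative values, not measurements, and `summarize_returns` is a hypothetical helper.

```python
import statistics


def summarize_returns(episode_returns):
    """Mean and population standard deviation of per-episode returns."""
    mean = statistics.fmean(episode_returns)
    # pstdev divides by N, which matches NumPy's default np.std behaviour.
    std = statistics.pstdev(episode_returns)
    return mean, std


# Hypothetical returns for five evaluation episodes (illustrative only).
returns = [-52.0, -48.0, -50.0, -55.0, -45.0]
mean_reward, std_reward = summarize_returns(returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

With only five episodes the standard deviation is a noisy estimate, so a larger `n_eval_episodes` gives a more trustworthy comparison between the untrained and trained agent.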


## Train the agent and save it

Warning: training for 400,000 timesteps may take several hours on a CPU.

In [None]:
model.learn(total_timesteps=int(4e5), log_interval=10, progress_bar=False)

----------------------------------------
| rollout/                |            |
|    ep_len_mean          | 1e+03      |
|    ep_rew_mean          | -38.5      |
| time/                   |            |
|    fps                  | 21         |
|    iterations           | 10         |
|    time_elapsed         | 938        |
|    total_timesteps      | 20480      |
| train/                  |            |
|    approx_kl            | 0.05401954 |
|    clip_fraction        | 0.32       |
|    clip_range           | 0.2        |
|    entropy_loss         | -2.65      |
|    explained_variance   | 0.523      |
|    learning_rate        | 0.0003     |
|    loss                 | -0.0557    |
|    n_updates            | 90         |
|    policy_gradient_loss | -0.0469    |
|    std                  | 0.582      |
|    value_loss           | 0.16       |
----------------------------------------
---------------------------------------
| rollout/                |           |
|    ep_len_mean  
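The `time/` block of the last complete log table above is enough to estimate how long the whole run will take. The sketch below assumes throughput stays roughly constant for the rest of training; `estimate_training_time` is a hypothetical helper, not a Stable Baselines3 function.

```python
def estimate_training_time(total_timesteps, steps_so_far, seconds_so_far):
    """Extrapolate total wall-clock time from the throughput observed so far."""
    steps_per_second = steps_so_far / seconds_so_far
    total_seconds = total_timesteps / steps_per_second
    return steps_per_second, total_seconds


# total_timesteps=20480 and time_elapsed=938 come from the log table above.
sps, total_s = estimate_training_time(400_000, 20_480, 938)
print(f"{sps:.1f} steps/s, about {total_s / 3600:.1f} hours in total")
```

The computed throughput of about 21.8 steps per second agrees with the logged `fps` of 21 once truncated to an integer, and the extrapolation comes out at roughly five hours.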

In [None]:
model.save("PPO5_CarRacing_400000")

In [None]:
del model

## Load the trained agent

In [None]:
model = PPO.load("PPO5_CarRacing_400000", env=eval_env)

Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
Wrapping the env in a VecTransposeImage.
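Saving and loading only round-trip if both calls use the same file stem. One way to avoid typos is to build the name once and reuse it. The sketch below uses a hypothetical helper, `checkpoint_name`, and assumes the archive is written to the working directory with a `.zip` suffix appended to the stem.

```python
from pathlib import Path


def checkpoint_name(algo, env_name, timesteps, run_id=1):
    """Build one checkpoint stem to share between save() and load() (hypothetical helper)."""
    return f"{algo}{run_id}_{env_name}_{int(timesteps)}"


name = checkpoint_name("PPO", "CarRacing", 4e5, run_id=5)
print(name)

# Assumption: the saved archive sits next to the notebook as "<stem>.zip".
archive = Path(name).with_suffix(".zip")
print(archive, "exists" if archive.exists() else "not found yet")
```

The same `name` can then be passed to both `model.save(name)` and `PPO.load(name, env=eval_env)`.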


In [None]:
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=5, deterministic=False)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=37.56 +/- 63.25893394245754


### Prepare video recording

In [None]:
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [None]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay


def show_videos(video_path="", prefix=""):
    """
    Taken from https://github.com/eleurent/highway-env

    :param video_path: (str) Path to the folder containing videos
    :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
    """
    html = []
    for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
        video_b64 = base64.b64encode(mp4.read_bytes())
        html.append(
            """<video alt="{}" autoplay
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>""".format(
                mp4, video_b64.decode("ascii")
            )
        )
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))
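The core trick in `show_videos` is embedding the video bytes directly in the page as a base64 data URI, so the notebook does not need to serve the file. A minimal standalone sketch of that step follows; `video_tag` is a hypothetical helper, and the three placeholder bytes stand in for a real MP4 file.

```python
import base64


def video_tag(video_bytes, alt="", height_px=400):
    """Return an HTML <video> element with the clip inlined as a base64 data URI."""
    payload = base64.b64encode(video_bytes).decode("ascii")
    return (
        f'<video alt="{alt}" autoplay loop controls style="height: {height_px}px;">'
        f'<source src="data:video/mp4;base64,{payload}" type="video/mp4" />'
        "</video>"
    )


# Placeholder bytes; in practice these would come from Path(...).read_bytes().
tag = video_tag(b"abc", alt="demo.mp4")
print(tag)
```

Base64 inflates the payload by about a third, so this approach suits short clips better than long recordings.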

We will record a video using the [VecVideoRecorder](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecvideorecorder) wrapper. You will learn about those wrappers in the next notebook.

In [None]:

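def record_video(env_id, model, video_length=1000, prefix="", video_folder="videos/"):
    """
    Record one video of the agent acting in the environment.

    :param env_id: (str) Gymnasium environment id, e.g. "CarRacing-v2"
    :param model: (RL model) Agent used to pick actions
    :param video_length: (int) Number of steps to record
    :param prefix: (str) Prefix for the video file name
    :param video_folder: (str) Folder where the video is saved
    """
    # Build the environment from env_id instead of a hard-coded name
    eval_env = DummyVecEnv([lambda: gym.make(env_id, render_mode="rgb_array")])

    # Start recording at step 0 and stop after video_length steps
    eval_env = VecVideoRecorder(
        eval_env,
        video_folder=video_folder,
        record_video_trigger=lambda step: step == 0,
        video_length=video_length,
        name_prefix=prefix,
    )

    obs = eval_env.reset()
    for _ in range(video_length):
        action, _ = model.predict(obs)
        obs, _, _, _ = eval_env.step(action)

    # Closing the wrapper finalizes and saves the video file
    eval_env.close()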
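The `record_video_trigger` callback decides at which step a recording starts, and `video_length` decides how many steps it covers. The pure-Python sketch below mimics that interplay in a simplified form. It is an assumption-laden model of the behaviour, not the wrapper's actual implementation, and `recorded_steps` is a hypothetical helper.

```python
def recorded_steps(trigger, video_length, total_steps):
    """Simplified model: list the step indices that would end up in a video."""
    recorded = []
    remaining = 0  # steps left in the recording currently in progress
    for step in range(total_steps):
        # A new recording can only start when none is in progress.
        if remaining == 0 and trigger(step):
            remaining = video_length
        if remaining > 0:
            recorded.append(step)
            remaining -= 1
    return recorded


# Same trigger as in record_video: record once, starting at step 0.
once = recorded_steps(lambda step: step == 0, video_length=3, total_steps=10)
# A periodic trigger would start a new clip every 5 steps instead.
periodic = recorded_steps(lambda step: step % 5 == 0, video_length=2, total_steps=10)
print(once, periodic)
```

With `lambda step: step == 0` and `video_length` equal to the number of loop iterations, as in `record_video`, the whole rollout lands in a single file.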

### Visualize trained agent



In [None]:
record_video("CarRacing-v2", model, video_length=1000, prefix="ppo2-carracing")

Saving video to /content/videos/ppo2-carracing-step-0-to-step-1000.mp4
Moviepy - Building video /content/videos/ppo2-carracing-step-0-to-step-1000.mp4.
Moviepy - Writing video /content/videos/ppo2-carracing-step-0-to-step-1000.mp4





Moviepy - Done !
Moviepy - video ready /content/videos/ppo2-carracing-step-0-to-step-1000.mp4


In [None]:
show_videos("videos", prefix="ppo2")