# Stable Baselines3 - Advanced Saving and Loading

GitHub repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.

It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.

Documentation is available online: [https://stable-baselines3.readthedocs.io/](https://stable-baselines3.readthedocs.io/)

## Introduction

In this notebook, you will learn how to use some advanced features of Stable-Baselines3 (SB3): how to easily create a test environment for periodic evaluation, and how to use a policy independently from a model, including how to save and load it.

## Install Dependencies and Stable Baselines Using Pip


```
pip install stable-baselines3[extra]
```

In [None]:
!pip install stable-baselines3[extra]

## Import policy, RL agent, ...

In [13]:
import gym
import numpy as np

from stable_baselines3 import SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

## Create the Gym env and instantiate the agent

For this example, we will use the Pendulum environment.

"The inverted pendulum swingup problem is a classic problem in the control literature. In this version of the problem, the pendulum starts in a random position, and the goal is to swing it up so it stays upright."

Pendulum-v0 environment: [https://gym.openai.com/envs/Pendulum-v0/](https://gym.openai.com/envs/Pendulum-v0/)

![Pendulum](https://gym.openai.com/videos/2019-10-21--mqt8Qj1mwo/Pendulum-v0/poster.jpg)


We chose the MlpPolicy because the input of Pendulum is a feature vector, not an image.

The type of action to use (discrete or continuous) is deduced automatically from the environment's action space.



### Create the environment and evaluation environment

Stable-Baselines3 can automatically create a separate environment for evaluation.
To do so, you only need to specify `create_eval_env=True` when passing the Gym ID of the environment.

In [None]:
model = SAC('MlpPolicy', 'Pendulum-v0', verbose=1, learning_rate=1e-3, create_eval_env=True)

Train the agent and evaluate it periodically on the test env.

Behind the scenes, SB3 uses an [EvalCallback](https://stable-baselines3.readthedocs.io/en/master/guide/callbacks.html#evalcallback).

In [None]:
# Evaluate the model every 1000 steps on 5 test episodes and save the evaluation to the logs folder
model.learn(6000, eval_freq=1000, n_eval_episodes=5, eval_log_path="./logs/")
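To make the schedule in the call above concrete, here is a minimal plain-Python sketch of when evaluations would run. It assumes one evaluation each time the step counter reaches a multiple of `eval_freq` and a single training environment; the helper `evaluation_steps` is hypothetical and not part of SB3.

```python
from typing import List


def evaluation_steps(total_timesteps: int, eval_freq: int) -> List[int]:
    """Return the training steps at which a periodic evaluation would run."""
    # Assumption: an evaluation is triggered whenever the step count
    # is an exact multiple of eval_freq.
    return [step for step in range(1, total_timesteps + 1) if step % eval_freq == 0]


steps = evaluation_steps(total_timesteps=6000, eval_freq=1000)
n_eval_episodes = 5

print(steps)                         # [1000, 2000, 3000, 4000, 5000, 6000]
print(len(steps) * n_eval_episodes)  # 30 evaluation episodes in total
```

Under these assumptions, the 6000-step run above is evaluated six times, for 30 test episodes in total.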

## Save the policy only

In SB3, you can save the policy independently from the model if needed.

Note: if you don't save the complete model, you cannot continue training afterward.

In [17]:
policy = model.policy
policy.save("sac_policy_pendulum.pkl")
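The note above can be illustrated with a toy example. The sketch below is not SB3 code: `ToyPolicy` is a hypothetical stand-in that saves only its constructor arguments and weights. That is enough to rebuild the policy and predict with it. The file holds nothing a training loop would also need (in SAC, for example, the replay buffer lives on the model rather than on the policy).

```python
import os
import pickle
import tempfile


class ToyPolicy:
    """Hypothetical stand-in for a policy: constructor arguments plus weights."""

    def __init__(self, obs_dim: int, act_dim: int):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        # One weight per (action, observation) pair, all starting at zero
        self.weights = [[0.0] * obs_dim for _ in range(act_dim)]

    def predict(self, obs):
        # Linear policy: each action is a weighted sum of the observation
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

    def save(self, path):
        # Store only what is needed to rebuild the policy for inference
        data = {
            "init_kwargs": {"obs_dim": self.obs_dim, "act_dim": self.act_dim},
            "weights": self.weights,
        }
        with open(path, "wb") as f:
            pickle.dump(data, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            data = pickle.load(f)
        loaded = cls(**data["init_kwargs"])
        loaded.weights = data["weights"]
        return loaded


toy_policy = ToyPolicy(obs_dim=3, act_dim=1)
toy_policy.weights = [[0.5, -1.0, 2.0]]

# Write to a temporary directory so the example leaves the working directory untouched
path = os.path.join(tempfile.mkdtemp(), "toy_policy.pkl")
toy_policy.save(path)
restored = ToyPolicy.load(path)

obs = [1.0, 2.0, 3.0]
print(restored.predict(obs))  # [4.5], identical to toy_policy.predict(obs)
```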

In [20]:
env = model.get_env()

In [21]:
# Evaluate the policy
mean_reward, std_reward = evaluate_policy(policy, env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=-147.77 +/- 106.82605743408203
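The two numbers printed above are the mean and standard deviation of the per-episode returns. As a small sketch of that aggregation, using only the standard library and made-up episode returns (hypothetical values, not results from this notebook):

```python
import statistics

# Hypothetical per-episode returns, invented for illustration only
episode_returns = [-120.0, -250.0, -130.0, -5.0]

mean_reward = statistics.fmean(episode_returns)
# Population standard deviation (divides by N), the same convention as NumPy's default np.std
std_reward = statistics.pstdev(episode_returns)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```

Because each Pendulum episode starts from a random position, two evaluations of the same policy over 10 episodes will generally report somewhat different means and standard deviations.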


## Load the policy only

In [14]:
from stable_baselines3.sac.policies import MlpPolicy

In [16]:
# Load from the same path used in policy.save() above, including the extension
saved_policy = MlpPolicy.load("sac_policy_pendulum.pkl")

In [22]:
# Evaluate the loaded policy
mean_reward, std_reward = evaluate_policy(saved_policy, env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")

mean_reward=-183.97 +/- 78.42694854736328
