# Advanced Saving and Loading
This example shows how to save and load a replay buffer. It also shows how to use a policy independently from its model, including saving and loading the policy on its own.

From: [https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#id3](https://stable-baselines3.readthedocs.io/en/master/guide/examples.html#id3)

By default, `model.save()` does not save the replay buffer, in order to save disk space: a replay buffer can take up several GB when observations are images. SB3 instead provides the `save_replay_buffer()` and `load_replay_buffer()` methods to save and load it separately.
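
To make the disk-space argument concrete, here is a back-of-the-envelope estimate in plain Python. It assumes a hypothetical image-based task with 84x84 RGB observations stored as `uint8`, a capacity of 1,000,000 transitions (SAC's default `buffer_size` in SB3), and that both `obs` and `next_obs` are stored for every transition, which is the default unless `optimize_memory_usage=True`. Actions, rewards and done flags are ignored because they are tiny by comparison. The helper `replay_buffer_bytes` is hypothetical.

```python
import math


def replay_buffer_bytes(capacity, obs_shape, bytes_per_value=1, store_next_obs=True):
    """Approximate bytes taken by the observation arrays of a replay buffer (hypothetical helper)."""
    copies = 2 if store_next_obs else 1  # obs and next_obs are both stored by default
    return capacity * math.prod(obs_shape) * bytes_per_value * copies


# Assumed image task: 1M transitions of 84x84 RGB images stored as uint8 (1 byte per value)
images = replay_buffer_bytes(1_000_000, (84, 84, 3))
images_optimized = replay_buffer_bytes(1_000_000, (84, 84, 3), store_next_obs=False)
# Pendulum-v1 for comparison: 3 float32 values (4 bytes each) per observation
pendulum = replay_buffer_bytes(1_000_000, (3,), bytes_per_value=4)

print(f"images: {images / 1e9:.1f} GB")  # images: 42.3 GB
print(f"images, next_obs not stored: {images_optimized / 1e9:.1f} GB")  # 21.2 GB
print(f"Pendulum-v1: {pendulum / 1e6:.1f} MB")  # Pendulum-v1: 24.0 MB
```

The same capacity costs tens of GB with images but only a few tens of MB with Pendulum's low-dimensional observations, which is why saving the buffer is opt-in.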

```python
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.sac.policies import MlpPolicy
```

```python
# Create the model and the training environment
model = SAC("MlpPolicy", "Pendulum-v1", verbose=1, learning_rate=1e-3)

# Train the model
model.learn(total_timesteps=6000)

# Save the model (writes sac_pendulum.zip; the replay buffer is not included)
model.save("sac_pendulum")
```

```
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 200      |
|    ep_rew_mean     | -1.4e+03 |
| time/              |          |
|    episodes        | 4        |
|    fps             | 139      |
|    time_elapsed    | 5        |
|    total_timesteps | 800      |
| train/             |          |
|    actor_loss      | 24       |
|    critic_loss     | 0.0879   |
|    ent_coef        | 0.507    |
|    ent_coef_loss   | -0.928   |
|    learning_rate   | 0.001    |
|    n_updates       | 699      |
---------------------------------
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 200       |
|    ep_rew_mean     | -1.47e+03 |
| time/              |           |
|    episodes        | 8         |
|    fps             | 135       |
|    time_elapsed    | 11        |
|    total_timesteps | 1600      |
| train/             |           |
|    actor_loss      | 52.2      |
|    critic_loss     | 0.0652    |
```


```
Using cuda device
Creating environment from the given name 'Pendulum-v1'
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 200       |
|    ep_rew_mean     | -1.57e+03 |
| time/              |           |
|    episodes        | 4         |
|    fps             | 149       |
|    time_elapsed    | 5         |
|    total_timesteps | 800       |
| train/             |           |
|    actor_loss      | 29.4      |
|    critic_loss     | 0.0431    |
|    ent_coef        | 0.5       |
|    ent_coef_loss   | -1        |
|    learning_rate   | 0.001     |
|    n_updates       | 699       |
----------------------------------
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 200       |
|    ep_rew_mean     | -1.53e+03 |
| time/              |           |
|    episodes        | 8         |
|    fps             | 135       |
```

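
`model.save("sac_pendulum")` writes a zip archive; SB3 appends the `.zip` suffix when the path has none. To check for yourself that the replay buffer is not inside, you can list the archive's members with the standard-library `zipfile` module. The helper `list_saved_members` is hypothetical, and because the member names (entries such as `data` and `policy.pth`) can vary between SB3 versions, none are hard-coded here.

```python
import zipfile
from pathlib import Path


def list_saved_members(path):
    """Return the file names stored in an SB3-style .zip archive (hypothetical helper)."""
    path = Path(path)
    if path.suffix == "":
        # Roughly mirrors SB3 adding ".zip" to a save path that has no suffix
        path = path.with_suffix(".zip")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Usage, after running the training cell above:
# for name in list_saved_members("sac_pendulum"):
#     print(name)
```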

```python
# The saved model does not contain the replay buffer
loaded_model = SAC.load("sac_pendulum")
print(f"The loaded_model has {loaded_model.replay_buffer.size()} transitions in its buffer")
```

```
The loaded_model has 0 transitions in its buffer
```
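
The count comes from the buffer's `size()` method. In SB3's buffers it returns the number of stored transitions, capped at the capacity once the circular buffer has wrapped around: internally it returns `buffer_size` when the buffer is full and the current write position `pos` otherwise. A freshly loaded model gets a new, empty buffer, hence 0. The `ToyRingBuffer` class below is a hypothetical, simplified stand-in that mimics only this counting logic, not SB3's array storage.

```python
class ToyRingBuffer:
    """Minimal circular buffer mimicking the size() semantics of SB3 buffers (toy example)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.storage = [None] * buffer_size
        self.pos = 0       # index where the next transition will be written
        self.full = False  # becomes True once the buffer wraps around

    def add(self, transition):
        self.storage[self.pos] = transition
        self.pos += 1
        if self.pos == self.buffer_size:
            self.full = True
            self.pos = 0  # from now on, the oldest transitions are overwritten

    def size(self):
        return self.buffer_size if self.full else self.pos


buf = ToyRingBuffer(buffer_size=5)
print(buf.size())  # 0: a new buffer, like the freshly loaded model's
for t in range(3):
    buf.add(t)
print(buf.size())  # 3
for t in range(4):
    buf.add(t)
print(buf.size())  # 5: capped at the capacity after wrapping around
```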


```python
# Now, save the replay buffer too (written to sac_replay_buffer.pkl)
model.save_replay_buffer("sac_replay_buffer")
```

```python
# Load it into the loaded_model
loaded_model.load_replay_buffer("sac_replay_buffer")

# Now the loaded replay buffer is no longer empty
print(f"The loaded_model has {loaded_model.replay_buffer.size()} transitions in its buffer")
```

```
The loaded_model has 6000 transitions in its buffer
```
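
In SB3, `save_replay_buffer()` pickles the whole buffer object into a `.pkl` file and `load_replay_buffer()` unpickles it into the model. That is why all 6000 transitions collected during training reappear: 6000 is well below SAC's default capacity of 1,000,000, so nothing was overwritten. The sketch below reproduces the round trip with the standard `pickle` module on a hypothetical `ToyReplayBuffer`; the helpers `save_buffer` and `load_buffer` are illustrative, not SB3's code.

```python
import pickle
import tempfile
from pathlib import Path


class ToyReplayBuffer:
    """Hypothetical minimal buffer: a list of (obs, action, reward, next_obs, done) tuples."""

    def __init__(self):
        self.transitions = []

    def add(self, obs, action, reward, next_obs, done):
        self.transitions.append((obs, action, reward, next_obs, done))

    def size(self):
        return len(self.transitions)


def save_buffer(buffer, path):
    """Pickle the whole buffer object, adding a .pkl suffix if the path has none."""
    path = Path(path)
    if path.suffix == "":
        path = path.with_suffix(".pkl")
    with path.open("wb") as f:
        pickle.dump(buffer, f)
    return path


def load_buffer(path):
    """Unpickle a buffer previously written by save_buffer()."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


buffer = ToyReplayBuffer()
for step in range(6000):
    buffer.add(obs=step, action=0.0, reward=-1.0, next_obs=step + 1, done=False)

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_buffer(buffer, Path(tmp) / "toy_replay_buffer")
    restored = load_buffer(saved_path)

print(f"The restored buffer has {restored.size()} transitions")
```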


```python
# Save the policy independently from the model
# Note: the policy file alone is not enough to continue training;
# use `model.save()` to save the complete model for that
policy = model.policy
policy.save("sac_policy_pendulum")
```
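
A policy can be reloaded without its model because SB3's `BasePolicy.save()` stores (via `torch.save`) two things: the network weights (`state_dict`) and the constructor arguments needed to rebuild the policy object. `load()` is a classmethod that rebuilds the object from those arguments and then restores the weights. The optimizer state, the replay buffer and the algorithm's hyperparameters are not part of this file, which is part of why training cannot resume from it. The pure-Python sketch below imitates that pattern with a hypothetical `ToyLinearPolicy`, using `pickle` instead of PyTorch; the names and storage format are illustrative only.

```python
import pickle
import tempfile
from pathlib import Path


class ToyLinearPolicy:
    """Hypothetical stand-in for an SB3 policy: action = weights @ obs + bias."""

    def __init__(self, obs_dim, action_dim):
        self.obs_dim = obs_dim
        self.action_dim = action_dim
        # One weight row per action dimension, all zeros until "trained"
        self.weights = [[0.0] * obs_dim for _ in range(action_dim)]
        self.bias = [0.0] * action_dim

    def _get_constructor_parameters(self):
        # Everything needed to rebuild an empty policy with the right shapes
        return {"obs_dim": self.obs_dim, "action_dim": self.action_dim}

    def state_dict(self):
        return {"weights": self.weights, "bias": self.bias}

    def load_state_dict(self, state):
        self.weights = [list(row) for row in state["weights"]]
        self.bias = list(state["bias"])

    def act(self, obs):
        return [sum(w * o for w, o in zip(row, obs)) + b for row, b in zip(self.weights, self.bias)]

    def save(self, path):
        # Weights + constructor arguments only: no optimizer, no replay buffer
        with open(path, "wb") as f:
            pickle.dump({"state_dict": self.state_dict(), "data": self._get_constructor_parameters()}, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            saved = pickle.load(f)
        policy = cls(**saved["data"])                # rebuild the object...
        policy.load_state_dict(saved["state_dict"])  # ...then restore its weights
        return policy


policy = ToyLinearPolicy(obs_dim=3, action_dim=1)
policy.load_state_dict({"weights": [[0.5, -1.0, 0.25]], "bias": [0.1]})  # pretend training result

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_policy.pkl"
    policy.save(path)
    restored = ToyLinearPolicy.load(path)

print(restored.act([1.0, 0.0, 2.0]))
```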

```python
# Retrieve the environment
env = model.get_env()

# Evaluate the policy
mean_reward, std_reward = evaluate_policy(policy, env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
```

```
mean_reward=-123.37 +/- 78.91112482520943
```
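
`evaluate_policy` plays `n_eval_episodes` episodes, sums the rewards within each episode, and returns the mean and the standard deviation of those per-episode returns, computed with `np.mean` and `np.std` (i.e. the population standard deviation). A standard-library equivalent, shown on hypothetical episode returns rather than data from the run above; `summarize_returns` is an illustrative name:

```python
import statistics


def summarize_returns(episode_returns):
    """Mean and population std of per-episode returns, as evaluate_policy reports them."""
    return statistics.fmean(episode_returns), statistics.pstdev(episode_returns)


# Hypothetical returns of 4 evaluation episodes (not taken from the run above)
returns = [-100.0, -120.0, -140.0, -160.0]
mean_reward, std_reward = summarize_returns(returns)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")  # mean_reward=-130.00 +/- 22.36
```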


```python
# Load the policy independently from the model
saved_policy = MlpPolicy.load("sac_policy_pendulum")

# Evaluate the loaded policy
mean_reward, std_reward = evaluate_policy(saved_policy, env, n_eval_episodes=10, deterministic=True)

print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
```

```
mean_reward=-147.48 +/- 76.16165036682195
```
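
The loaded policy has the same weights and is evaluated deterministically, yet it scores -147.48 instead of -123.37. A plausible explanation is evaluation noise: each call plays 10 new episodes from random Pendulum start states. The rough check below treats the two runs as independent samples and uses the standard error std/sqrt(n). It is an approximation, since `evaluate_policy` reports the population std rather than the sample std. With these numbers the gap comes out well within one standard error of the difference.

```python
import math

n_episodes = 10
# Results printed by the two evaluate_policy calls above
mean_a, std_a = -123.37, 78.91112482520943  # original policy
mean_b, std_b = -147.48, 76.16165036682195  # reloaded policy

# Standard error of each mean, then of their difference (independent samples)
se_a = std_a / math.sqrt(n_episodes)
se_b = std_b / math.sqrt(n_episodes)
se_diff = math.hypot(se_a, se_b)

z = (mean_a - mean_b) / se_diff
print(f"difference = {mean_a - mean_b:.2f}, standard error = {se_diff:.2f}, z = {z:.2f}")
```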
