<a href="https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/stable_baselines_her.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Baselines - Hindsight Experience Replay on Highway Env

GitHub repo: [https://github.com/hill-a/stable-baselines](https://github.com/hill-a/stable-baselines)

Medium article: [https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82](https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82)

Highway env: [https://github.com/eleurent/highway-env](https://github.com/eleurent/highway-env) 

[RL Baselines Zoo](https://github.com/araffin/rl-baselines-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines.

It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.

Documentation is available online: [https://stable-baselines.readthedocs.io/](https://stable-baselines.readthedocs.io/)

## Install Dependencies and Stable Baselines Using Pip

The full list of dependencies can be found in the [README](https://github.com/hill-a/stable-baselines).

```
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev zlib1g-dev
```


```
pip install stable-baselines[mpi]
```

In [0]:
# Stable Baselines only supports TensorFlow 1.x for now
%tensorflow_version 1.x
# Install a pinned release of stable-baselines (2.10.2)
!pip install stable-baselines[mpi]==2.10.2

In [0]:
# Install highway-env
!pip install git+https://github.com/eleurent/highway-env#egg=highway-env

## Import policy, RL agent, ...

In [0]:
import gym
import highway_env
import numpy as np

from stable_baselines import HER, SAC, DDPG
from stable_baselines.ddpg import NormalActionNoise


## Create the Gym env and instantiate the agent

For this example, we will be using the parking environment from the [highway-env](https://github.com/eleurent/highway-env) repo by @eleurent.

The parking env is a goal-conditioned continuous control task, in which the vehicle must park in a given space with the appropriate heading.


![parking-env](https://raw.githubusercontent.com/eleurent/highway-env/gh-media/docs/media/parking-env.gif)
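
To make "goal-conditioned" concrete: environments that follow gym's `GoalEnv` convention return a dictionary observation with `observation`, `achieved_goal` and `desired_goal` entries, and expose a `compute_reward(achieved_goal, desired_goal, info)` method that HER relies on when it relabels goals. The sketch below mimics that shape in plain Python. The goal layout, the weights and the exponent `p` are illustrative assumptions, not the values used by highway-env.

```python
def compute_reward(achieved_goal, desired_goal, weights, p=0.5):
    """Negative weighted distance to the goal (0 is the best possible value)."""
    distance = sum(
        w * abs(a - d) for a, d, w in zip(achieved_goal, desired_goal, weights)
    )
    return -(distance ** p)


# Hypothetical goal layout: [x, y, cos(heading), sin(heading)]
obs = {
    "observation": [0.10, 0.05, 1.0, 0.0],
    "achieved_goal": [0.10, 0.05, 1.0, 0.0],
    "desired_goal": [0.00, 0.00, 1.0, 0.0],
}
weights = [1.0, 1.0, 0.5, 0.5]  # illustrative weights, not taken from highway-env

# Reward for the current state, and for a state that sits exactly on the goal
reward = compute_reward(obs["achieved_goal"], obs["desired_goal"], weights)
at_goal = compute_reward(obs["desired_goal"], obs["desired_goal"], weights)
print(reward, at_goal)
```

Because the reward depends only on the achieved and desired goals, it can be recomputed for any substitute goal, which is what makes hindsight relabeling possible.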



### Train Soft Actor-Critic (SAC) agent

Here, we use the HER "future" goal sampling strategy, which creates 4 artificial transitions for each real transition.

Note: the hyperparameters (network architecture, discount factor, ...) were tuned for this task.
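
As a rough illustration of the "future" strategy, the sketch below relabels a toy 1-D episode: each real transition is stored once with its original goal, then `n_sampled_goal` more times with a goal replaced by a state actually achieved later in the same episode. The function names, the sparse reward and the episode format are hypothetical and only serve the example. Whether the current step itself counts as "future" varies between implementations, and here it is assumed to be eligible.

```python
import random


def sparse_reward(achieved_goal, desired_goal, tol=0.05):
    """0 when the goal is reached within `tol`, -1 otherwise."""
    return 0.0 if abs(achieved_goal - desired_goal) <= tol else -1.0


def relabel_future(episode, n_sampled_goal=4, rng=random):
    """Return the real transitions plus `n_sampled_goal` relabeled copies of each.

    `episode` is a list of dicts with keys "obs", "action",
    "achieved_goal" (reached after the action) and "desired_goal".
    """
    transitions = []
    last = len(episode) - 1
    for t, step in enumerate(episode):
        # Real transition, stored with the goal the agent was actually given
        transitions.append({
            "obs": step["obs"], "action": step["action"],
            "goal": step["desired_goal"],
            "reward": sparse_reward(step["achieved_goal"], step["desired_goal"]),
        })
        for _ in range(n_sampled_goal):
            # Pick a step from t to the end of the episode (both inclusive)
            future = rng.randint(t, last)
            new_goal = episode[future]["achieved_goal"]
            # Artificial transition: same obs/action, goal and reward recomputed
            transitions.append({
                "obs": step["obs"], "action": step["action"],
                "goal": new_goal,
                "reward": sparse_reward(step["achieved_goal"], new_goal),
            })
    return transitions


# Toy episode: the agent moves right by 1.0 per step and never reaches goal 10.0
episode = [
    {"obs": float(t), "action": 1.0,
     "achieved_goal": float(t + 1), "desired_goal": 10.0}
    for t in range(5)
]
replay = relabel_future(episode, n_sampled_goal=4, rng=random.Random(0))
print(len(replay), "transitions stored from", len(episode), "real steps")
```

Every real transition in this toy episode has reward -1, yet some relabeled copies have reward 0, so the agent gets a learning signal even from a failed episode.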

In [0]:
env = gym.make("parking-v0")

In [0]:
# SAC hyperparams:
model = HER('MlpPolicy', env, SAC, n_sampled_goal=4,
            goal_selection_strategy='future',
            verbose=1, buffer_size=int(1e6),
            learning_rate=1e-3,
            gamma=0.95, batch_size=256,
            policy_kwargs=dict(layers=[256, 256, 256]))

In [0]:
# Train for 1e5 steps
model.learn(int(1e5))
# Save the trained agent
model.save('her_sac_highway')

In [0]:
# Load saved model
model = HER.load('her_sac_highway', env=env)

#### Evaluate the agent

In [0]:
obs = env.reset()

# Evaluate the agent for 1000 steps, resetting whenever an episode ends
episode_reward = 0.0
for _ in range(1000):
    action, _ = model.predict(obs)
    obs, reward, done, info = env.step(action)
    episode_reward += reward
    # End the episode on termination or as soon as the goal is reached
    success = info.get('is_success', False)
    if done or success:
        print("Reward:", episode_reward, "Success?", success)
        episode_reward = 0.0
        obs = env.reset()

### Train DDPG agent
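
DDPG learns a deterministic policy, so exploration comes from noise added to its actions. The notebook uses `NormalActionNoise` for this. The sketch below shows the underlying idea in plain Python: add zero-mean Gaussian noise to each action dimension, then clip to the action bounds. The helper name `noisy_action` and the bounds of [-1, 1] are assumptions made for the example.

```python
import random


def noisy_action(action, sigma=0.2, low=-1.0, high=1.0, rng=random):
    """Add independent Gaussian noise to each action dimension, then clip."""
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    return [min(high, max(low, a)) for a in noisy]


rng = random.Random(0)
policy_action = [0.9, -0.3]  # hypothetical output of a deterministic policy
explored = noisy_action(policy_action, sigma=0.2, rng=rng)
print(explored)
```

A larger `sigma` explores more widely but makes the collected behavior less representative of the current policy.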

In [0]:
# Create the action noise object that will be used for exploration
n_actions = env.action_space.shape[0]
noise_std = 0.2
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=noise_std * np.ones(n_actions))

model = HER('MlpPolicy', env, DDPG, n_sampled_goal=4,
            goal_selection_strategy='future',
            verbose=1, buffer_size=int(1e6),
            actor_lr=1e-3, critic_lr=1e-3, action_noise=action_noise,
            gamma=0.95, batch_size=256,
            policy_kwargs=dict(layers=[256, 256, 256]))

In [0]:
# Train for 2e5 steps
model.learn(int(2e5))
# Save the trained agent
model.save('her_ddpg_highway')

#### Evaluate the agent

In [0]:
obs = env.reset()

# Evaluate the agent for 1000 steps, resetting whenever an episode ends
episode_reward = 0.0
for _ in range(1000):
    action, _ = model.predict(obs)
    obs, reward, done, info = env.step(action)
    episode_reward += reward
    # End the episode on termination or as soon as the goal is reached
    success = info.get('is_success', False)
    if done or success:
        print("Reward:", episode_reward, "Success?", success)
        episode_reward = 0.0
        obs = env.reset()
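
The evaluation loops above print one line per episode. To compare the SAC and DDPG agents, it can help to aggregate those results into a mean return and a success rate. The sketch below does that for any object with the same `reset`/`step`/`predict` shape. `ToyGoalEnv` and `ToyModel` are hypothetical stand-ins so that the example runs without gym or a trained model.

```python
class ToyGoalEnv:
    """Stand-in env with the same reset/step return shape as the notebook's env."""

    def __init__(self, goal=3, max_steps=5):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        self.position += action
        self.steps += 1
        success = self.position == self.goal
        done = success or self.steps >= self.max_steps
        reward = 0.0 if success else -1.0
        return self.position, reward, done, {"is_success": success}


class ToyModel:
    """Stand-in agent that always moves one unit to the right."""

    def predict(self, obs):
        return 1, None


def evaluate(model, env, n_episodes=10):
    """Return (mean episode return, success rate) over `n_episodes` episodes."""
    returns, successes = [], 0
    for _ in range(n_episodes):
        obs, done, episode_return, info = env.reset(), False, 0.0, {}
        while not done:
            action, _ = model.predict(obs)
            obs, reward, done, info = env.step(action)
            episode_return += reward
            # As in the loops above, reaching the goal also ends the episode
            done = done or info.get("is_success", False)
        returns.append(episode_return)
        successes += int(info.get("is_success", False))
    return sum(returns) / n_episodes, successes / n_episodes


mean_return, success_rate = evaluate(ToyModel(), ToyGoalEnv(), n_episodes=10)
print("Mean return:", mean_return, "Success rate:", success_rate)
```

With the real notebook objects, the call would look like `evaluate(model, env)`, assuming the environment eventually sets `done` through a time limit.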