
RL Baselines Zoo: a Collection of Pre-Trained Reinforcement Learning Agents

A collection of trained Reinforcement Learning (RL) agents, with tuned hyperparameters, using Stable Baselines.

We are looking for contributors to complete the collection!

Goals of this repository:

  1. Provide a simple interface to train and enjoy RL agents
  2. Benchmark the different Reinforcement Learning algorithms
  3. Provide tuned hyperparameters for each environment and RL algorithm
  4. Have fun with the trained agents!

Enjoy a Trained Agent

If the trained agent exists, then you can see it in action using:

python enjoy.py --algo algo_name --env env_id

For example, enjoy A2C on Breakout for 5000 timesteps:

python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder trained_agents/ -n 5000
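
Under the hood, enjoy.py essentially loads the saved model and steps the environment in a render loop. A minimal sketch using the stable-baselines API (the path and environment here are illustrative, and Atari agents additionally need the standard Atari wrappers):

  import gym
  from stable_baselines import PPO2

  # Load a saved agent (illustrative path) and watch it play.
  model = PPO2.load("trained_agents/ppo2/CartPole-v1.pkl")
  env = gym.make("CartPole-v1")
  obs = env.reset()
  for _ in range(1000):
      action, _ = model.predict(obs, deterministic=True)
      obs, reward, done, _ = env.step(action)
      env.render()
      if done:
          obs = env.reset()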

Train an Agent

The hyperparameters for each environment are defined in hyperparams/algo_name.yml.
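
Each entry maps an environment id to the settings used for training. An illustrative entry in hyperparams/ppo2.yml might look like this (the values below are placeholders, not the tuned ones):

  CartPole-v1:
    n_envs: 8
    n_timesteps: !!float 1e5
    policy: 'MlpPolicy'
    n_steps: 32
    ent_coef: 0.0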

If the environment exists in this file, then you can train an agent using:

python train.py --algo algo_name --env env_id

For example (with tensorboard support):

python train.py --algo ppo2 --env CartPole-v1 --tensorboard-log /tmp/stable-baselines/

Train for multiple environments (with one call) and with tensorboard logging:

python train.py --algo a2c --env MountainCar-v0 CartPole-v1 --tensorboard-log /tmp/stable-baselines/

Continue training (here, load a pretrained agent for Breakout and continue training for 5000 steps):

python train.py --algo a2c --env BreakoutNoFrameskip-v4 -i trained_agents/a2c/BreakoutNoFrameskip-v4.pkl -n 5000
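
Continuing training can also be done directly with the stable-baselines API; a minimal sketch (the path and environment are illustrative):

  import gym
  from stable_baselines import PPO2
  from stable_baselines.common.vec_env import DummyVecEnv

  # Load a pretrained agent, attach an environment, and train further.
  env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
  model = PPO2.load("trained_agents/ppo2/CartPole-v1.pkl", env=env)
  model.learn(total_timesteps=5000)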

Note: when training TRPO, you have to use mpirun to enable multiprocessing:

mpirun -n 16 python train.py --algo trpo --env BreakoutNoFrameskip-v4

Hyperparameter Tuning

We use Optuna for optimizing the hyperparameters.

Note: hyperparameter search is only implemented for PPO2/A2C/SAC/TRPO/DDPG for now. When using the SuccessiveHalvingPruner ("halving"), you must specify --n-jobs > 1.

Budget of 1000 trials with a maximum of 50000 steps:

python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 \
  --sampler random --pruner median
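
Conceptually, the search wraps training in an Optuna objective; a simplified sketch of what happens with the flags above (the hyperparameter names and ranges are illustrative, and the stand-in score replaces an actual training run):

  import optuna

  def objective(trial):
      # Sample candidate hyperparameters (illustrative names and ranges).
      gamma = trial.suggest_uniform("gamma", 0.9, 0.9999)
      learning_rate = trial.suggest_loguniform("learning_rate", 1e-5, 1e-2)
      # The zoo trains a model for a fixed step budget here and returns
      # the negative mean reward (Optuna minimizes by default).
      return -(gamma - learning_rate)  # stand-in score so the sketch runs

  study = optuna.create_study(
      sampler=optuna.samplers.RandomSampler(),  # --sampler random
      pruner=optuna.pruners.MedianPruner(),     # --pruner median
  )
  study.optimize(objective, n_trials=1000, n_jobs=2)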

Record a Video of a Trained Agent

Record 1000 steps:

python -m utils.record_video --algo ppo2 --env BipedalWalkerHardcore-v2 -n 1000

Current Collection: 100+ Trained Agents!

Scores can be found in benchmark.md. To compute them, simply run python -m utils.benchmark.

Atari Games

7 Atari games from the OpenAI benchmark (NoFrameskip-v4 versions).

RL Algo BeamRider Breakout Enduro Pong Qbert Seaquest SpaceInvaders
A2C ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
ACER ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
ACKTR ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️

Additional Atari Games (to be completed):

RL Algo MsPacman
A2C ✔️
PPO2 ✔️
DQN ✔️

Classic Control Environments

RL Algo CartPole-v1 MountainCar-v0 Acrobot-v1 Pendulum-v0 MountainCarContinuous-v0
A2C ✔️ ✔️ ✔️ ✔️ ✔️
ACER ✔️ ✔️ ✔️ N/A N/A
ACKTR ✔️ ✔️ ✔️ N/A N/A
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️
DQN ✔️ ✔️ ✔️ N/A N/A
DDPG N/A N/A N/A ✔️ ✔️
SAC N/A N/A N/A ✔️ ✔️
TRPO ✔️ ✔️ ✔️ ✔️

Box2D Environments

RL Algo BipedalWalker-v2 LunarLander-v2 LunarLanderContinuous-v2 BipedalWalkerHardcore-v2 CarRacing-v0
A2C ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️
DQN N/A ✔️ N/A N/A N/A
SAC ✔️ N/A ✔️ ✔️
TRPO ✔️ ✔️

PyBullet Environments

Similar to the MuJoCo environments, but with a free simulator: pybullet. We are using the BulletEnv-v0 versions.

Note: these environments are derived from Roboschool and are much harder than the MuJoCo versions (see the Pybullet issue).

RL Algo Walker2D HalfCheetah Ant Reacher Hopper Humanoid
A2C ✔️ ✔️ ✔️ ✔️
PPO2 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
DDPG ✔️ ✔️ ✔️
SAC ✔️ ✔️ ✔️ ✔️ ✔️ ✔️

PyBullet Envs (Continued)

RL Algo Minitaur MinitaurDuck InvertedDoublePendulum InvertedPendulumSwingup
PPO2 ✔️ ✔️ ✔️ ✔️
SAC ✔️ ✔️

MiniGrid Envs

gym-minigrid: a simple, lightweight and fast Gym implementation of the famous gridworld environments.

RL Algo Empty FourRooms DoorKey MultiRoom Fetch
PPO2 ✔️ ✔️

There are 19 environment groups (variations for each) in total.

Note that you need to specify --gym-packages gym_minigrid with enjoy.py and train.py, as it is not a standard Gym environment. You also need to install the custom Gym package module or put it in the Python path.

pip install gym-minigrid
python train.py --algo ppo2 --env MiniGrid-DoorKey-5x5-v0 --gym-packages gym_minigrid

This does the same thing as:

import gym_minigrid

Also, you may need to specify a Gym environment wrapper in the hyperparameters, as MiniGrid environments have a Dict observation space, which is not supported by Stable-Baselines for now:

  env_wrapper: gym_minigrid.wrappers.FlatObsWrapper
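
To see what the wrapper does, you can apply it manually (a sketch; the zoo applies it automatically through the config entry above):

  import gym
  import gym_minigrid  # registers the MiniGrid environments
  from gym_minigrid.wrappers import FlatObsWrapper

  # Flatten the Dict observation into a single Box so that
  # Stable-Baselines policies can consume it.
  env = FlatObsWrapper(gym.make("MiniGrid-DoorKey-5x5-v0"))
  print(env.observation_space)  # a flat Box instead of a Dict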

Colab Notebook: Try it Online!

You can train agents online using the Colab notebook.


Stable-Baselines PyPi Package

Min version: stable-baselines >= 2.5.1

apt-get install swig cmake libopenmpi-dev zlib1g-dev ffmpeg
pip install stable-baselines box2d box2d-kengz pyyaml pybullet optuna pytablewriter

Please see Stable Baselines README for alternatives.

Docker Images

Build docker image (CPU):

docker build . -f docker/Dockerfile.cpu -t rl-baselines-zoo-cpu


Build docker image (GPU):

docker build . -f docker/Dockerfile.gpu -t rl-baselines-zoo

Pull built docker image (CPU):

docker pull araffin/rl-baselines-zoo-cpu

GPU image:

docker pull araffin/rl-baselines-zoo

Run script in the docker image:

./run_docker_cpu.sh python train.py --algo ppo2 --env CartPole-v1


Tests

To run tests, first install pytest, then:

python -m pytest -v tests/


Contributing

If you trained an agent that is not present in the RL Zoo, please submit a Pull Request (containing the hyperparameters and the score too).


We would like to thank our contributors: @iandanforth, @tatsubori
