<a href="https://colab.research.google.com/github/robertmoni/modelbasedrl/blob/master/rl_baselines_zoo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RL Baselines Zoo: Training in Colab



GitHub Repo: [https://github.com/araffin/rl-baselines-zoo](https://github.com/araffin/rl-baselines-zoo)

Stable-Baselines Repo: [https://github.com/hill-a/stable-baselines](https://github.com/hill-a/stable-baselines)

Medium article: [https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82](https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82)

# Install Dependencies



In [0]:
!apt-get update
!apt-get install swig cmake libopenmpi-dev zlib1g-dev ffmpeg freeglut3-dev xvfb
!pip install stable-baselines --upgrade
!pip install pybullet
!pip install box2d box2d-kengz pyyaml pytablewriter optuna
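
After the installation cell has run, it can help to confirm that the packages are importable before training starts. The snippet below is a minimal sketch using only the standard library. The import names in `REQUIRED_MODULES` are assumptions about how the pip packages above expose themselves (for example `yaml` for `pyyaml` and `Box2D` for the Box2D bindings), and `missing_modules` is a helper introduced here for illustration.

```python
import importlib.util

# Assumed import names for the pip packages installed above.
REQUIRED_MODULES = [
    "stable_baselines",
    "pybullet",
    "Box2D",
    "yaml",
    "pytablewriter",
    "optuna",
]


def missing_modules(names):
    """Return the module names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty result means every module was found. Any name that is listed points at a package that failed to install or that uses a different import name than assumed here.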

## Clone RL Baselines Zoo Repo

In [0]:
!git clone https://github.com/araffin/rl-baselines-zoo

In [0]:
cd rl-baselines-zoo/

/content/rl-baselines-zoo


## Train an RL Agent


The trained agent is saved in the `logs/` folder.

Here we train A2C on the CartPole-v1 environment for 100,000 steps.


To train on Pong (Atari) instead, pass `--env PongNoFrameskip-v4`.

Note: To support a new environment, add an entry for it to the hyperparameter file of the algorithm you use, `hyperparams/algo.yml` (for example, `hyperparams/a2c.yml` for A2C). You can edit the file from the side panel of Google Colab (see https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory).
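
Before launching a run on a new environment, you may want to check whether the hyperparameter file already has an entry for it. The sketch below assumes that each environment appears as a top-level key in the YAML file, with its settings indented underneath. It scans the file with a regular expression instead of a YAML parser, so treat it as a heuristic. `top_level_keys` and `has_env_entry` are hypothetical helpers, not part of the zoo.

```python
import re
from pathlib import Path


def top_level_keys(yaml_path):
    """Return the unindented mapping keys of a simple YAML file (heuristic)."""
    keys = []
    for line in Path(yaml_path).read_text().splitlines():
        # Match lines such as "CartPole-v1:" that start in column 0 and end
        # with a colon; indented lines and comment lines are skipped.
        match = re.match(r"^([^\s#][^:]*):\s*(#.*)?$", line)
        if match:
            keys.append(match.group(1).strip())
    return keys


def has_env_entry(yaml_path, env_id):
    """Check whether env_id appears as a top-level key in the file."""
    return env_id in top_level_keys(yaml_path)
```

From the repository root, `has_env_entry("hyperparams/a2c.yml", "CartPole-v1")` would report whether A2C has settings for CartPole-v1, assuming the file follows the layout described above.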

In [0]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 100000
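
Once training has finished, the saved model can be located by searching the `logs/` folder. The exact sub-folder layout and file extension depend on the installed versions, so the sketch below searches recursively and accepts both `.pkl` and `.zip`, which is an assumption. `find_saved_models` is a helper introduced here for illustration.

```python
from pathlib import Path


def find_saved_models(log_dir, algo, extensions=(".pkl", ".zip")):
    """Return the sorted model files found below <log_dir>/<algo>."""
    root = Path(log_dir) / algo
    if not root.is_dir():
        return []  # nothing has been trained with this algorithm yet
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in extensions
    )


for model_path in find_saved_models("logs", "a2c"):
    print(model_path)
```

The printed paths are the candidates to pass to the `-i` argument when continuing training later on.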

#### Evaluate the Trained Agent


Remove the `--folder logs/` argument to evaluate a pretrained agent instead of the one you just trained.

In [0]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 5000 --folder logs/

#### Tune Hyperparameters

We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.

Tune the hyperparameters for PPO2 using a random sampler and a median pruner, with 2 parallel jobs,
a budget of 1000 trials and a maximum of 50000 steps per trial.

In [0]:
!python train.py --algo ppo2 --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler random --pruner median
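
To make the sampler and pruner choices more concrete, here is a simplified, standard-library sketch of the two ideas. It is not Optuna's implementation. Random sampling draws each hyperparameter independently of earlier trials, and the median stopping rule ends a trial early when its best intermediate score is worse than the median of the scores that earlier trials reported at the same step. The objective `toy_score` and the single `learning_rate` parameter are hypothetical stand-ins for a real training run.

```python
import math
import random
import statistics


def should_prune(best_so_far, earlier_values_at_step):
    """Median stopping rule for a score that is maximized."""
    if not earlier_values_at_step:
        return False  # the first trial has nothing to be compared against
    return best_so_far < statistics.median(earlier_values_at_step)


def toy_score(learning_rate, step):
    """Hypothetical objective: peaks at learning_rate = 1e-3, grows with step."""
    return -(math.log10(learning_rate) + 3.0) ** 2 + 0.01 * step


def random_search(n_trials, n_steps, seed=0):
    rng = random.Random(seed)
    history = {}  # step -> scores reported by earlier trials at that step
    results = []
    for _ in range(n_trials):
        # Random sampling: each draw ignores the outcome of earlier trials.
        learning_rate = 10 ** rng.uniform(-5, -1)
        best, pruned = float("-inf"), False
        for step in range(n_steps):
            score = toy_score(learning_rate, step)
            best = max(best, score)
            pruned = should_prune(best, history.get(step, []))
            history.setdefault(step, []).append(score)
            if pruned:
                break  # stop spending steps on an unpromising trial
        results.append((learning_rate, best, pruned))
    return results


results = random_search(n_trials=20, n_steps=10)
n_pruned = sum(pruned for _, _, pruned in results)
print(n_pruned, "of", len(results), "trials were pruned")
```

Pruning saves compute because unpromising trials stop after a few steps instead of using the full step budget, which matters when the budget is 1000 trials.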

#### Record a Video

In [0]:
# Set up display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [0]:
!python -m utils.record_video --algo a2c --env CartPole-v1 -f logs/ -n 1000

### Continue Training

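Here we continue training the previously trained model, which is loaded through the `-i` argument. If your training run saved the model at a different location inside `logs/`, adjust the path accordingly.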

In [0]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 50000 -i logs/a2c/CartPole-v1.pkl

In [0]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 1000 --folder logs/