<a href="https://colab.research.google.com/github/StevenJokess/rl-colab-notebooks/blob/sb3/rl-baselines-zoo(colab).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RL Baselines3 Zoo: Training in Colab



GitHub Repo: [https://github.com/DLR-RM/rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo)

Stable-Baselines3 Repo: [https://github.com/DLR-RM/stable-baselines3](https://github.com/DLR-RM/stable-baselines3)


# Install Dependencies



In [1]:
!apt-get install swig cmake ffmpeg freeglut3-dev xvfb

Reading package lists... Done
Building dependency tree       
Reading state information... Done
freeglut3-dev is already the newest version (2.8.1-3).
swig is already the newest version (3.0.12-1).
cmake is already the newest version (3.10.2-1ubuntu2.18.04.1).
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
xvfb is already the newest version (2:1.19.6-1ubuntu4.8).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.


## Clone RL Baselines3 Zoo Repo

In [2]:
!git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo

fatal: destination path 'rl-baselines3-zoo' already exists and is not an empty directory.


In [3]:
%cd /content/rl-baselines3-zoo/

/content/rl-baselines3-zoo


### Install pip dependencies

In [4]:
!pip install -r requirements.txt

Collecting sphinxcontrib.spelling; extra == "docs"
  Using cached https://files.pythonhosted.org/packages/f6/62/796d8ae02732c162f8d53406f520c9f3c886a9ab24de4ef6995404c2b1d8/sphinxcontrib_spelling-7.1.0-py3-none-any.whl
ERROR: sphinxcontrib-spelling 7.1.0 has requirement Sphinx>=3.0.0, but you'll have sphinx 1.8.5 which is incompatible.
Installing collected packages: sphinxcontrib.spelling
Successfully installed sphinxcontrib.spelling


## Train an RL Agent


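The trained agent is saved in the `logs/` folder.

Here we train A2C on the CartPole-v1 environment. A full run uses 100 000 steps; the cell below passes `--n-timesteps 10` as a quick smoke test and keeps the full value in the trailing `#100000` comment.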


To train on Pong (Atari) instead, pass `--env PongNoFrameskip-v4`.

Note: To support a new environment, add its hyperparameters to `hyperparams/algo.yml`, where `algo` is the algorithm name (for example `hyperparams/a2c.yml`). You can open the file from the side panel of Google Colab (see https://stackoverflow.com/questions/46986398/import-data-into-google-colaboratory).

In [5]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 10 #100000

Seed: 851778535
OrderedDict([('ent_coef', 0.0),
             ('n_envs', 8),
             ('n_timesteps', 500000.0),
             ('policy', 'MlpPolicy')])
Using 8 environments
Overwriting n_timesteps with n=10
Creating test environment
Using cuda device
Log path: logs/a2c/CartPole-v1_3
Saving to logs/a2c/CartPole-v1_3
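
The log above shows the default A2C hyperparameters for CartPole-v1 being loaded and `n_timesteps` then being overwritten by the command-line value. A minimal sketch of that behaviour in plain Python follows. `DEFAULTS` copies the values printed above; `apply_overrides` is a hypothetical helper written for illustration and is not a function from the zoo, whose own logic may differ.

```python
from collections import OrderedDict

# Values copied from the training log above (A2C on CartPole-v1).
DEFAULTS = OrderedDict(
    [
        ("ent_coef", 0.0),
        ("n_envs", 8),
        ("n_timesteps", 500000.0),
        ("policy", "MlpPolicy"),
    ]
)


def apply_overrides(defaults, n_timesteps=None):
    """Return a copy of `defaults`, replacing n_timesteps when one is given.

    Hypothetical helper for illustration only.
    """
    params = OrderedDict(defaults)
    if n_timesteps is not None and n_timesteps > 0:
        params["n_timesteps"] = n_timesteps
    return params


params = apply_overrides(DEFAULTS, n_timesteps=10)
print(params["n_timesteps"])    # 10
print(DEFAULTS["n_timesteps"])  # 500000.0 (the defaults are left untouched)
```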


#### Evaluate trained agent


Remove `--folder logs/` to evaluate a pretrained agent instead of the one you just trained.

In [6]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 5000 --folder logs/

Loading latest experiment, id=3
Episode Reward: 61.00
Episode Length 61
Episode Reward: 47.00
Episode Length 47
Episode Reward: 36.00
Episode Length 36
Episode Reward: 66.00
Episode Length 66
Episode Reward: 40.00
Episode Length 40
Episode Reward: 43.00
Episode Length 43
Episode Reward: 59.00
Episode Length 59
Episode Reward: 43.00
Episode Length 43
Episode Reward: 51.00
Episode Length 51
Episode Reward: 47.00
Episode Length 47
Episode Reward: 37.00
Episode Length 37
Episode Reward: 73.00
Episode Length 73
Episode Reward: 39.00
Episode Length 39
Episode Reward: 42.00
Episode Length 42
Episode Reward: 72.00
Episode Length 72
Episode Reward: 34.00
Episode Length 34
Episode Reward: 61.00
Episode Length 61
Episode Reward: 39.00
Episode Length 39
Episode Reward: 47.00
Episode Length 47
Episode Reward: 83.00
Episode Length 83
Episode Reward: 36.00
Episode Length 36
Episode Reward: 49.00
Episode Length 49
Episode Reward: 43.00
Episode Length 43
Episode Reward: 67.00
Episode Length 67
Episode 
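
The per-episode lines above are easier to judge as a single number. The sketch below parses output of that shape and averages the rewards. It assumes the exact `Episode Reward: <value>` line format shown above, and `episode_rewards` is a hypothetical helper, not part of the zoo. The sample text holds only the first five episodes of the output.

```python
import re
from statistics import mean

# First five episodes copied from the evaluation output above.
LOG = """\
Episode Reward: 61.00
Episode Length 61
Episode Reward: 47.00
Episode Length 47
Episode Reward: 36.00
Episode Length 36
Episode Reward: 66.00
Episode Length 66
Episode Reward: 40.00
Episode Length 40
"""

REWARD_LINE = re.compile(r"^Episode Reward: ([-+]?\d+(?:\.\d+)?)$", re.MULTILINE)


def episode_rewards(log_text):
    """Extract every 'Episode Reward: <value>' entry as a float."""
    return [float(value) for value in REWARD_LINE.findall(log_text)]


rewards = episode_rewards(LOG)
print(len(rewards), mean(rewards))  # 5 50.0
```

To use it on a real run, capture the cell output into a string (for example with the `%%capture` cell magic) and pass that string to `episode_rewards`.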

#### Tune Hyperparameters

We use [Optuna](https://optuna.org/) for optimizing the hyperparameters.

Tune the hyperparameters for PPO on MountainCar-v0 using a TPE sampler and a median pruner, with 2 parallel jobs,
a budget of 1000 trials, and a maximum of 50000 steps.
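
To give an intuition for the median pruner, here is a simplified sketch of the median rule for a maximisation problem: a running trial is stopped early when its intermediate score falls below the median score that earlier trials reported at the same step. This is an illustration under stated assumptions, not Optuna's implementation. `should_prune`, its `n_startup_trials` default, and the scores in `history` are all made up for the example.

```python
from statistics import median


def should_prune(current_value, previous_values_at_step, n_startup_trials=5):
    """Simplified median rule for a maximisation problem.

    current_value: intermediate score of the running trial at some step.
    previous_values_at_step: scores earlier trials reported at the same step.
    n_startup_trials: never prune until this many earlier trials exist.
    """
    if len(previous_values_at_step) < n_startup_trials:
        return False
    return current_value < median(previous_values_at_step)


# Illustrative scores only, not results from a real study.
history = [-200.0, -180.0, -150.0, -120.0, -110.0]

print(should_prune(-190.0, history))      # True: below the median of -150.0
print(should_prune(-130.0, history))      # False: above the median
print(should_prune(-190.0, history[:2]))  # False: too few earlier trials
```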

In [7]:
!python train.py --algo ppo --env MountainCar-v0 -n 50000 -optimize --n-trials 1000 --n-jobs 2 --sampler tpe --pruner median



### Record a Video

In [8]:
# Set up display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [9]:
!python -m utils.record_video --algo a2c --env CartPole-v1 --exp-id 0 -f logs/ -n 1000

Loading latest experiment, id=3
Saving video to  /content/rl-baselines3-zoo/logs/videos/a2c-CartPole-v1-step-0-to-step-1000.mp4


### Display the video

In [10]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay

def show_videos(video_path='', prefix=''):
  """
  Taken from https://github.com/eleurent/highway-env

  :param video_path: (str) Path to the folder containing videos
  :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
  """
  html = []
  for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
      video_b64 = base64.b64encode(mp4.read_bytes())
      html.append('''<video alt="{}" autoplay 
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>'''.format(mp4, video_b64.decode('ascii')))
  ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [11]:
show_videos(video_path='logs/videos/', prefix='a2c')

### Continue Training

Here, we continue training a previously saved model by passing its `.zip` file with `-i`. The path must point to an existing, valid model archive.

In [12]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 50000 -i logs/a2c/CartPole-v1_1/CartPole-v1.zip

Seed: 2971376131
OrderedDict([('ent_coef', 0.0),
             ('n_envs', 8),
             ('n_timesteps', 500000.0),
             ('policy', 'MlpPolicy')])
Using 8 environments
Overwriting n_timesteps with n=50000
Creating test environment
Loading pretrained agent
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/stable_baselines3/common/save_util.py", line 382, in load_from_zip_file
    with zipfile.ZipFile(load_path) as archive:
  File "/usr/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/usr/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 163, in <module>
    model = exp_manager.setup_experiment()
  File "/content/rl-baselines3-zoo/utils/exp_manager.py", line 158, in setup_experiment
    

In [13]:
!python enjoy.py --algo a2c --env CartPole-v1 --no-render --n-timesteps 1000 --folder logs/

Loading latest experiment, id=4
Traceback (most recent call last):
  File "enjoy.py", line 223, in <module>
    main()
  File "enjoy.py", line 94, in main
    raise ValueError(f"No model found for {algo} on {env_id}, path: {model_path}")
ValueError: No model found for a2c on CartPole-v1, path: logs/a2c/CartPole-v1_4/CartPole-v1.zip


In [14]:
!python train.py --algo a2c --env CartPole-v1 --n-timesteps 50000 -i logs/a2c/CartPole-v1_1/CartPole-v1

Traceback (most recent call last):
  File "train.py", line 126, in <module>
    ), "The trained_agent must be a valid path to a .zip file"
AssertionError: The trained_agent must be a valid path to a .zip file
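
The failures above all come down to the path passed with `-i` (or the experiment folder picked by `enjoy.py`) not containing a readable model archive. A small check like the one below can list the experiment folders that do hold a valid zip before you choose one. It is a sketch that assumes the layout visible in the logs above, `<log_dir>/<algo>/<env_id>_<n>/<env_id>.zip`; `find_valid_models` is a hypothetical helper, not part of the zoo.

```python
import zipfile
from pathlib import Path


def find_valid_models(log_dir, algo, env_id):
    """Return paths to saved models that are readable zip archives.

    Assumes the layout seen in the logs above:
    <log_dir>/<algo>/<env_id>_<n>/<env_id>.zip
    """
    valid = []
    for exp_dir in sorted(Path(log_dir, algo).glob(f"{env_id}_*")):
        model_path = exp_dir / f"{env_id}.zip"
        # Skip folders with no model file or with a file that is not a zip.
        if model_path.is_file() and zipfile.is_zipfile(model_path):
            valid.append(model_path)
    return valid


for path in find_valid_models("logs", "a2c", "CartPole-v1"):
    print(path)
```

Any path it prints can then be passed to `train.py` with `-i`. If nothing is printed, no experiment folder holds a usable model yet, which is expected after a run as short as 10 timesteps may not have saved one.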
