# Stable Baselines3 Tutorial - Getting Started

GitHub repo: https://github.com/araffin/rl-tutorial-jnrr19/tree/sb3/

Stable-Baselines3: https://github.com/DLR-RM/stable-baselines3

Documentation: https://stable-baselines3.readthedocs.io/en/master/

RL Baselines3 zoo: https://github.com/DLR-RM/rl-baselines3-zoo

[RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines3.

It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.


## Introduction

In this notebook, you will learn the basics of using the Stable-Baselines3 library: how to create an RL model, train it, and evaluate it. Because all algorithms share the same interface, we will see how simple it is to switch from one algorithm to another.


## Install Dependencies and Stable Baselines3 Using Pip

The full list of dependencies can be found in the [README](https://github.com/DLR-RM/stable-baselines3).


```
pip install stable-baselines3[extra]
```

In [2]:
!sudo apt-get install -y ffmpeg freeglut3-dev xvfb  # For visualization


In [131]:
!pip install stable-baselines3[extra]==0.10.0



In [3]:
!lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.3 LTS
Release:	18.04
Codename:	bionic


In [132]:
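# Determine CUDA version at /usr/local/cuda
!ls -ld /usr/local/cuda*
import os
import re
# /usr/local/cuda is usually a symlink to /usr/local/cuda-X.Y
m = re.search(r'^cuda-(?P<cuda_version>.*)', os.path.basename(os.path.realpath('/usr/local/cuda')))
cuda_version = m.group('cuda_version')         # e.g. "10.1"
cu_version = re.sub(r'\.', '', cuda_version)   # e.g. "101"
cu_suffix = "+cu{ver}".format(ver=cu_version)  # e.g. "+cu101"
print(cuda_version)
print(cu_version)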
cu_suffix = "+cu{ver}".format(ver=cu_version)

lrwxrwxrwx 1 root root    9 Nov 27  2019 /usr/local/cuda -> cuda-10.1
drwxr-xr-x 1 root root 4096 Jan 26 22:06 /usr/local/cuda-10.1
10.1
101
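The version parsing above can be wrapped in a small pure function so that it can be checked without a CUDA machine. `cuda_suffix_from_path` is a hypothetical helper (not part of any library); it assumes the resolved path ends in a directory named `cuda-<version>`, as in the `ls` output above.

```python
import os
import re


def cuda_suffix_from_path(resolved_cuda_path):
    """Map a resolved CUDA install path (e.g. '/usr/local/cuda-10.1')
    to a PyTorch wheel suffix (e.g. '+cu101').

    Hypothetical helper: assumes the directory is named 'cuda-<version>'.
    """
    basename = os.path.basename(resolved_cuda_path.rstrip('/'))
    m = re.match(r'^cuda-(?P<cuda_version>[0-9.]+)$', basename)
    if m is None:
        raise ValueError("Cannot parse CUDA version from {!r}".format(resolved_cuda_path))
    # Drop the dots: "10.1" -> "101"
    cu_version = m.group('cuda_version').replace('.', '')
    return "+cu{}".format(cu_version)


print(cuda_suffix_from_path('/usr/local/cuda-10.1'))  # +cu101
```

In the notebook it would be called as `cuda_suffix_from_path(os.path.realpath('/usr/local/cuda'))`.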


In [134]:
# 1. Get version of torch installed with stable-baselines3
# 2. Re-install torch with CUDA version that matches /usr/local/cuda
# (restart the runtime afterwards: the already-imported torch module is not reloaded)
import torch
import re
torch_version = re.sub(r"\+cu.*$", "", torch.__version__)
!pip uninstall -y torch
torch_pip_version = "torch=={ver}{cu}".format(
    ver=torch_version,
    cu=cu_suffix)
!pip install $torch_pip_version -f https://download.pytorch.org/whl/torch_stable.html

Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.7.1+cu101
  Downloading https://download.pytorch.org/whl/cu101/torch-1.7.1%2Bcu101-cp36-cp36m-linux_x86_64.whl (735.4 MB)
Installing collected packages: torch
Successfully installed torch-1.7.1+cu101
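The requirement string built in the cell above can also be produced by a small pure function. `torch_pip_spec` is a hypothetical name; unlike the `re.sub` call above, which only strips a `+cu...` label, this sketch strips any local version label (e.g. `+cpu`), which is an assumption about what you would want.

```python
import re


def torch_pip_spec(installed_version, cu_suffix):
    """Build a pip requirement such as 'torch==1.7.1+cu101'.

    Any existing local version label ('+cu110', '+cpu', ...) is stripped
    before the requested CUDA suffix is appended.
    """
    public_version = re.sub(r"\+.*$", "", installed_version)
    return "torch=={}{}".format(public_version, cu_suffix)


print(torch_pip_spec("1.7.1", "+cu101"))      # torch==1.7.1+cu101
print(torch_pip_spec("1.7.1+cpu", "+cu101"))  # torch==1.7.1+cu101
```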


In [74]:
!sudo apt install -y texlive-extra-utils



In [135]:
rlscope_version = '0.0.1'
rlscope_pip_version = "rlscope=={ver}{cu}".format(
    ver=rlscope_version, 
    cu=cu_suffix)
!pip install $rlscope_pip_version -f https://download.pytorch.org/whl/torch_stable.html

Looking in links: https://download.pytorch.org/whl/torch_stable.html




## Imports

Stable-Baselines3 works on environments that follow the [gym interface](https://stable-baselines3.readthedocs.io/en/master/guide/custom_env.html).
You can find a list of available environments [here](https://gym.openai.com/envs/#classic_control).

It is also recommended to check the [source code](https://github.com/openai/gym) to learn more about the observation and action space of each env, as gym does not have proper documentation.
Not all algorithms work with all action spaces; you can find more in this [recap table](https://stable-baselines3.readthedocs.io/en/master/guide/algos.html).
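To make the "gym interface" concrete: an environment exposes `reset()`, which returns the first observation, and `step(action)`, which returns `(obs, reward, done, info)` (the four-tuple API of the gym versions used in this notebook). The toy `CountdownEnv` below is hypothetical and, unlike a real `gym.Env`, omits `observation_space` and `action_space`, which Stable-Baselines3 needs to build its networks.

```python
import random


class CountdownEnv:
    """Toy environment following the classic gym reset/step protocol.

    Hypothetical example: the observation is the number of remaining steps,
    every step gives a reward of +1, and the episode ends after `horizon` steps.
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.remaining = horizon

    def reset(self):
        self.remaining = self.horizon
        return self.remaining

    def step(self, action):
        # The action is ignored in this toy example.
        self.remaining -= 1
        obs = self.remaining
        reward = 1.0
        done = self.remaining <= 0
        info = {}
        return obs, reward, done, info


def run_episode(env, policy):
    """Roll out one episode with `policy(obs) -> action` and return the sum of rewards."""
    obs = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


toy_env = CountdownEnv(horizon=5)
print(run_episode(toy_env, policy=lambda obs: random.choice([0, 1])))
```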

In [4]:
import gym
import numpy as np

The first thing you need to import is the RL model. Check the documentation to know which algorithm you can use on which problem.

In [5]:
from stable_baselines3 import PPO

The next thing you need to import is the policy class that will be used to create the networks (for the policy/value functions).
This step is optional, as you can directly use strings in the constructor:

`PPO('MlpPolicy', env)` instead of `PPO(MlpPolicy, env)`

Note that some algorithms like `SAC` have their own `MlpPolicy`; that's why using a string for the policy is the recommended option.
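Conceptually, passing a string works because each algorithm keeps its own mapping from policy names to policy classes, so `'MlpPolicy'` can resolve to a different class for PPO and for SAC. The toy registry below only illustrates that idea; the class names, `POLICY_REGISTRY` and `resolve_policy` are hypothetical and are not Stable-Baselines3's internals.

```python
class ActorCriticMlp:
    """Stand-in for an on-policy actor-critic policy (hypothetical)."""


class SacMlp:
    """Stand-in for SAC's policy (hypothetical)."""


# One registry per algorithm: the same alias maps to different classes.
POLICY_REGISTRY = {
    "PPO": {"MlpPolicy": ActorCriticMlp},
    "SAC": {"MlpPolicy": SacMlp},
}


def resolve_policy(algo_name, policy):
    """Return a policy class from either a class or a registered alias."""
    if isinstance(policy, str):
        try:
            return POLICY_REGISTRY[algo_name][policy]
        except KeyError:
            raise ValueError("Unknown policy {!r} for {}".format(policy, algo_name)) from None
    # Already a class: use it as-is
    return policy


print(resolve_policy("PPO", "MlpPolicy").__name__)  # ActorCriticMlp
print(resolve_policy("SAC", "MlpPolicy").__name__)  # SacMlp
```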

In [6]:
from stable_baselines3.ppo.policies import MlpPolicy

## Create the Gym env and instantiate the agent

For this example, we will use the CartPole environment, a classic control problem.

"A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. "

Cartpole environment: [https://gym.openai.com/envs/CartPole-v1/](https://gym.openai.com/envs/CartPole-v1/)

![Cartpole](https://cdn-images-1.medium.com/max/1143/1*h4WTQNVIsvMXJTCpXm_TAw.gif)
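Since CartPole gives +1 per timestep, the undiscounted return of an episode is simply its length. RL algorithms usually optimize a discounted return instead, controlled by a discount factor `gamma`. A minimal sketch of both quantities (the function names are illustrative):

```python
def undiscounted_return(rewards):
    """Sum of rewards: for CartPole this equals the episode length."""
    return float(sum(rewards))


def discounted_return(rewards, gamma=0.99):
    """G_0 = r_0 + gamma * r_1 + gamma**2 * r_2 + ..., computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0] * 500  # a full-length episode, assuming CartPole-v1's 500-step limit
print(undiscounted_return(rewards))  # 500.0
print(discounted_return(rewards))    # close to 1 / (1 - gamma) = 100
```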


We chose the MlpPolicy because the observation of the CartPole task is a feature vector, not an image.

The type of action to use (discrete/continuous) will be automatically deduced from the environment's action space.

Here we are using the [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) algorithm, which is an Actor-Critic method: it uses a learned value function to reduce the variance of the policy gradient estimate.
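A sketch of how the value function enters the update: instead of scaling the policy gradient by raw returns, PPO in Stable-Baselines3 uses advantages computed with Generalized Advantage Estimation (GAE, with `gamma=0.99` and `gae_lambda=0.95` by default). The NumPy function below is a simplified single-environment version for illustration, not the library's code; its `dones[t]` convention (the episode ended right after step `t`) is an assumption of this sketch.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation for a single environment (sketch).

    rewards, values, dones: sequences of length T; dones[t] == 1 means the
    episode ended right after step t.
    last_value: value estimate of the observation that follows step T - 1.
    Returns (advantages, returns); returns = advantages + values are the
    targets used to train the value function.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    n_steps = len(rewards)
    advantages = np.zeros(n_steps)
    gae = 0.0
    for t in reversed(range(n_steps)):
        next_value = last_value if t == n_steps - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]  # no bootstrapping across episode ends
        # TD error: how much better the step went than the critic predicted
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        gae = delta + gamma * gae_lambda * next_non_terminal * gae
        advantages[t] = gae
    return advantages, advantages + values


adv, ret = compute_gae([1, 1, 1], [0, 0, 0], dones=[0, 0, 1], last_value=0.0,
                       gamma=1.0, gae_lambda=1.0)
print(adv)  # [3. 2. 1.]
```

With `gae_lambda=1` the advantage is the Monte Carlo return minus the value baseline; with `gae_lambda=0` it is the one-step TD error, which has lower variance but more bias.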

It combines ideas from [A2C](https://stable-baselines3.readthedocs.io/en/master/modules/a2c.html) (having multiple workers and using an entropy bonus for exploration) and [TRPO](https://stable-baselines.readthedocs.io/en/master/modules/trpo.html) (it limits how far each update can move the policy, like a trust region, to improve stability and avoid catastrophic drops in performance; PPO implements this with a clipped objective instead of TRPO's constrained optimization).
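The clipping idea fits in a few lines. This NumPy sketch of the clipped surrogate objective uses `clip_range=0.2`, Stable-Baselines3's default for PPO; the full PPO loss also contains value-function and entropy terms, which are omitted here.

```python
import numpy as np


def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_range=0.2):
    """PPO clipped surrogate loss (to be minimized), sketch in NumPy.

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to
    [1 - clip_range, 1 + clip_range] removes the incentive to move the
    policy further than that in a single update.
    """
    ratio = np.exp(np.asarray(log_prob_new) - np.asarray(log_prob_old))
    advantages = np.asarray(advantages, dtype=np.float64)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    # Pessimistic bound: keep the smaller objective, then negate to get a loss
    return -np.mean(np.minimum(unclipped, clipped))


# The new policy doubled the probability of an action with positive advantage:
# the objective only credits a ratio of 1.2, not 2.0.
print(ppo_clip_loss(np.log([0.5]), np.log([0.25]), advantages=[1.0]))  # about -1.2
```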

PPO is an on-policy algorithm, which means that the trajectories used to update the networks must be collected using the latest policy.
It is usually less sample efficient than off-policy algorithms like [DQN](https://stable-baselines3.readthedocs.io/en/master/modules/dqn.html), [SAC](https://stable-baselines3.readthedocs.io/en/master/modules/sac.html) or [TD3](https://stable-baselines3.readthedocs.io/en/master/modules/td3.html), but it is often faster in wall-clock time.
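The practical difference between on-policy and off-policy learning shows up in how collected data is stored. Stable-Baselines3 uses a `RolloutBuffer` for on-policy algorithms and a `ReplayBuffer` for off-policy ones; the two toy classes below are hypothetical and much simpler than the library's buffers, and only illustrate why off-policy methods can reuse data and are therefore usually more sample efficient.

```python
import random
from collections import deque


class RolloutStorage:
    """On-policy storage (hypothetical): holds only data from the current policy."""

    def __init__(self):
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def consume(self):
        # Every transition is used for one update, then discarded: after the
        # update it was collected by an outdated policy.
        batch, self.transitions = self.transitions, []
        return batch


class ReplayStorage:
    """Off-policy storage (hypothetical): keeps old transitions for reuse."""

    def __init__(self, capacity):
        self.transitions = deque(maxlen=capacity)  # oldest items are dropped first

    def add(self, transition):
        self.transitions.append(transition)

    def sample(self, batch_size):
        # Transitions may come from many past policies and be sampled many times.
        return random.sample(list(self.transitions), batch_size)


rollout, replay = RolloutStorage(), ReplayStorage(capacity=1000)
for t in range(100):
    rollout.add(t)
    replay.add(t)

print(len(rollout.consume()), len(rollout.transitions))  # 100 0
print(len(replay.sample(32)), len(replay.transitions))   # 32 100
```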


In [7]:
env = gym.make('CartPole-v1')

model = PPO(MlpPolicy, env, verbose=0)

We create a helper function to evaluate the agent:

In [8]:
def evaluate(model, num_episodes=100):
    """
    Evaluate an RL agent
    :param model: (BaseAlgorithm object) the RL agent
    :param num_episodes: (int) number of episodes to evaluate it
    :return: (float) Mean reward over the num_episodes episodes
    """
    # This function will only work for a single Environment
    env = model.get_env()
    all_episode_rewards = []
    for _ in range(num_episodes):
        episode_rewards = []
        done = False
        obs = env.reset()
        while not done:
            # _states are only useful when using LSTM policies
            action, _states = model.predict(obs)
            # here, action, rewards and dones are arrays
            # because we are using vectorized env
            obs, reward, done, info = env.step(action)
            episode_rewards.append(reward)

        all_episode_rewards.append(sum(episode_rewards))

    mean_episode_reward = np.mean(all_episode_rewards)
    print("Mean reward:", mean_episode_reward, "Num episodes:", num_episodes)

    return mean_episode_reward

Let's evaluate the untrained agent; it should behave like a random agent.

In [9]:
# Random Agent, before training
mean_reward_before_train = evaluate(model, num_episodes=100)

Mean reward: 21.74 Num episodes: 100


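Stable-Baselines3 already provides this helper, `evaluate_policy`. Note that it uses deterministic actions by default (`deterministic=True`), whereas `model.predict()` samples actions stochastically by default, which is why its result can differ from the one of our `evaluate` helper: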

In [10]:
from stable_baselines3.common.evaluation import evaluate_policy

In [11]:
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100)

print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

mean_reward:9.51 +/- 0.77


## Train the agent and evaluate it

In [12]:
# Train the agent for 10000 steps
model.learn(total_timesteps=10000)

<stable_baselines3.ppo.ppo.PPO at 0x7fa9423a0908>
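It helps to know how much work `learn(total_timesteps=10000)` actually does. The arithmetic below assumes Stable-Baselines3's PPO defaults (`n_steps=2048`, `batch_size=64`, `n_epochs=10`) and a single environment; check the PPO documentation for your installed version, since defaults can change between releases.

```python
import math

# Assumed Stable-Baselines3 PPO defaults (check the docs for your version).
n_steps = 2048      # environment steps collected per env before each update
batch_size = 64     # minibatch size for each gradient step
n_epochs = 10       # passes over the collected rollout per update
n_envs = 1          # the single CartPole env is wrapped in a one-env DummyVecEnv
total_timesteps = 10000

rollout_size = n_steps * n_envs
# learn() keeps collecting full rollouts until total_timesteps is reached,
# so the last rollout can overshoot the requested budget.
n_updates = math.ceil(total_timesteps / rollout_size)
timesteps_collected = n_updates * rollout_size
gradient_steps = n_updates * n_epochs * (rollout_size // batch_size)

print(n_updates, timesteps_collected, gradient_steps)  # 5 10240 1600
```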

In [13]:
# Evaluate the trained agent
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100)

print(f"mean_reward:{mean_reward:.2f} +/- {std_reward:.2f}")

mean_reward:411.20 +/- 118.12


The training apparently went well: the mean reward increased a lot! (CartPole-v1 episodes are capped at 500 steps, so 500 is the maximum possible return.)

## RL-Scope

Let's annotate the evaluation (inference) loop with RL-Scope annotations to understand where time is spent.
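The idea behind such annotations can be illustrated with a toy profiler that accumulates wall-clock time per named operation. This is a hypothetical sketch, not RL-Scope's API: RL-Scope goes much further, for example by separating CPU from GPU time and by correcting for its own profiling overhead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical toy profiler: NOT RL-Scope's API, only the idea of
# annotating code regions with named operations.
operation_time = defaultdict(float)


@contextmanager
def operation(name):
    """Add the wall-clock time spent inside the `with` block to operation_time[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        operation_time[name] += time.perf_counter() - start


for _ in range(3):
    with operation('inference'):
        time.sleep(0.01)   # stand-in for model.predict(obs)
    with operation('step'):
        time.sleep(0.002)  # stand-in for env.step(action)

for name, seconds in operation_time.items():
    print("{}: {:.3f} s".format(name, seconds))
```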

In [14]:
# !rls-prof --help
!pip freeze | grep torch

torch==1.7.1


In [98]:
%%writefile test_writefile.py
import argparse

import gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.ppo.policies import MlpPolicy

import rlscope.api as rlscope


def main():
    parser = argparse.ArgumentParser(description="Evaluate an RL policy")
    # Add RL-Scope's command-line flags (e.g. --rlscope-directory)
    rlscope.add_rlscope_arguments(parser)
    args = parser.parse_args()

    # NOTE: don't pass directory=... here; it would override --rlscope-directory.
    rlscope.handle_rlscope_args(
        parser=None,
        args=args,
    )
    rlscope.prof.set_metadata({
        'algo': 'PPO',
        'env': 'CartPole-v1',
    })
    process_name = 'PPO_CartPole'
    phase_name = process_name

    env = gym.make('CartPole-v1')
    model = PPO(MlpPolicy, env, verbose=0)

    # Profile the evaluation of the random (untrained) agent
    with rlscope.prof.profile(process_name=process_name, phase_name=phase_name):
        evaluate_rlscope(model, num_episodes=100)


def evaluate_rlscope(model, num_episodes=100):
    """
    Evaluate an RL agent, annotating inference and environment steps with RL-Scope
    :param model: (BaseAlgorithm object) the RL agent
    :param num_episodes: (int) number of episodes to evaluate it
    :return: (float) Mean reward over the num_episodes episodes
    """
    # This function will only work for a single Environment
    env = model.get_env()
    all_episode_rewards = []

    with rlscope.prof.operation('training_loop'):
        for _ in range(num_episodes):
            episode_rewards = []
            done = False
            obs = env.reset()
            while not done:
                with rlscope.prof.operation('inference'):
                    # _states are only useful when using LSTM policies
                    action, _states = model.predict(obs)
                with rlscope.prof.operation('step'):
                    # here, action, rewards and dones are arrays
                    # because we are using a vectorized env
                    obs, reward, done, info = env.step(action)
                    episode_rewards.append(reward)

            all_episode_rewards.append(sum(episode_rewards))

    mean_episode_reward = np.mean(all_episode_rewards)
    print("Mean reward:", mean_episode_reward, "Num episodes:", num_episodes)

    return mean_episode_reward


if __name__ == '__main__':
    main()

Overwriting test_writefile.py


In [64]:
!rls-prof --calibrate --parallel-runs python test_writefile.py --rlscope-directory ./rlscope_tutorial

> CMD:
  $ rls-calibrate run --parallel-runs python test_writefile.py --rlscope-directory ./rlscope_tutorial
  PWD=/home/jgleeson/clone/rlscope/jupyter
INFO    | PID=139988/MainProcess @ init_configs, calibration.py:1046 2021-02-01 21:36:55,500 Run configurations:
  ./rlscope_tutorial/config_time_breakdown_repetition_*
  ./rlscope_tutorial/config_calibration_uninstrumented_repetition_*
  ./rlscope_tutorial/config_calibration_interception_repetition_*
  ./rlscope_tutorial/config_calibration_gpu_activities_repetition_*
  ./rlscope_tutorial/config_calibration_no_gpu_activities_repetition_*
  ./rlscope_tutorial/config_calibration_gpu_activities_api_time_repetition_*
  ./rlscope_tutorial/config_calibration_just_pyprof_annotations_repetition_*
  ./rlscope_tutorial/config_calibration_just_pyprof_interceptions_repetition_*
INFO    | PID=139988/MainProcess @ run_configs, calibration.py:821 2021-02-01 21:36:55,500 Writing run configuration shell commands to ./rlsc

In [65]:
!ls -l ./rlscope_tutorial/**/*.pdf

-rw-r--r-- 1 jgleeson jgleeson 108949 Feb  1 16:15 ./rlscope_tutorial/corrected_no/OverlapStackedBarPlot.overlap_type_CategoryOverlap.operation_training_time.pdf
-rw-r--r-- 1 jgleeson jgleeson 108954 Feb  1 16:15 ./rlscope_tutorial/corrected_no/OverlapStackedBarPlot.overlap_type_CategoryOverlap.operation_training_time.yerr.pdf
-rw-r--r-- 1 jgleeson jgleeson 110410 Feb  1 16:15 ./rlscope_tutorial/corrected_no/OverlapStackedBarPlot.overlap_type_CategoryOverlap.percent.pdf
-rw-r--r-- 1 jgleeson jgleeson 110415 Feb  1 16:15 ./rlscope_tutorial/corrected_no/OverlapStackedBarPlot.overlap_type_CategoryOverlap.percent.yerr.pdf


In [83]:
from IPython.display import display

In [97]:
from glob import glob

from IPython.display import IFrame, display

def display_pdfs(glob_expr):
    """
    Display all PDFs found in file glob expansion.
    e.g. display_pdfs('./rlscope_tutorial/**/*.pdf')
    """
    # recursive=True lets '**' match any number of nested directories
    paths = sorted(glob(glob_expr, recursive=True))
    pdfs = []
    for path in paths:
        pdfs.append(IFrame(path, width="100%", height="500"))
    for path, pdf in zip(paths, pdfs):
        display(path)
        display(pdf)
    return {'paths': paths, 'pdfs': pdfs}

display_pdfs('./rlscope_tutorial/**/*.operation_training_time.pdf')

'./rlscope_tutorial/corrected_no/OverlapStackedBarPlot.overlap_type_CategoryOverlap.operation_training_time.pdf'

{'paths': ['./rlscope_tutorial/corrected_no/OverlapStackedBarPlot.overlap_type_CategoryOverlap.operation_training_time.pdf'],
 'pdfs': [<IPython.lib.display.IFrame at 0x7faa8b237a58>]}

### Prepare video recording

In [11]:
# Set up fake display; otherwise rendering will fail
import os
os.system("Xvfb :1 -screen 0 1024x768x24 &")
os.environ['DISPLAY'] = ':1'

In [12]:
import base64
from pathlib import Path

from IPython import display as ipythondisplay

def show_videos(video_path='', prefix=''):
  """
  Taken from https://github.com/eleurent/highway-env

  :param video_path: (str) Path to the folder containing videos
  :param prefix: (str) Filter the videos, showing only the ones starting with this prefix
  """
  html = []
  for mp4 in Path(video_path).glob("{}*.mp4".format(prefix)):
      video_b64 = base64.b64encode(mp4.read_bytes())
      html.append('''<video alt="{}" autoplay 
                    loop controls style="height: 400px;">
                    <source src="data:video/mp4;base64,{}" type="video/mp4" />
                </video>'''.format(mp4, video_b64.decode('ascii')))
  ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

We will record a video using the [VecVideoRecorder](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecvideorecorder) wrapper; you will learn about these wrappers in the next notebook.

In [13]:
from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

def record_video(env_id, model, video_length=500, prefix='', video_folder='videos/'):
  """
  :param env_id: (str)
  :param model: (RL model)
  :param video_length: (int)
  :param prefix: (str)
  :param video_folder: (str)
  """
  eval_env = DummyVecEnv([lambda: gym.make(env_id)])
  # Start the video at step=0 and record 500 steps
  eval_env = VecVideoRecorder(eval_env, video_folder=video_folder,
                              record_video_trigger=lambda step: step == 0, video_length=video_length,
                              name_prefix=prefix)

  obs = eval_env.reset()
  for _ in range(video_length):
    action, _ = model.predict(obs)
    obs, _, _, _ = eval_env.step(action)

  # Close the video recorder
  eval_env.close()
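`record_video_trigger` is a callable that receives the current step count of the vectorized env and returns `True` when a new recording should start, so `lambda step: step == 0` records only once, at the beginning. Other schedules are plain functions; for example, to record periodically during a longer run (`every_n_steps` is a hypothetical helper name):

```python
def every_n_steps(n):
    """Return a record_video_trigger that starts a recording every n steps."""
    return lambda step: step % n == 0


trigger = every_n_steps(1000)
print([step for step in range(3500) if trigger(step)])  # [0, 1000, 2000, 3000]
```

It would be passed as `VecVideoRecorder(eval_env, video_folder='videos/', record_video_trigger=every_n_steps(1000), video_length=200)`.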

### Visualize trained agent



In [14]:
record_video('CartPole-v1', model, video_length=500, prefix='ppo-cartpole')

In [15]:
show_videos('videos', prefix='ppo')

## Bonus: Train an RL Model in One Line

The policy class to use will be inferred and the environment will be automatically created. This works because both are [registered](https://stable-baselines3.readthedocs.io/en/master/guide/quickstart.html).

In [None]:
model = PPO('MlpPolicy', "CartPole-v1", verbose=1).learn(1000)

## Conclusion

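In this notebook we have seen:
- how to define and train an RL model using Stable-Baselines3 (it takes only one line of code ;)
- how to evaluate an agent, either with a custom helper or with `evaluate_policy`
- how to annotate an evaluation loop with RL-Scope to see where time is spent
- how to record a video of a trained agent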