<a href="https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/stable_baselines_wandb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Stable Baselines - Track Experiments with Weights and Biases

GitHub repo: https://github.com/araffin/rl-tutorial-jnrr19

Stable-Baselines3: https://github.com/DLR-RM/stable-baselines3

Documentation: https://stable-baselines3.readthedocs.io/en/master/

RL Baselines3 zoo: https://github.com/DLR-RM/rl-baselines3-zoo

Weights & Biases: https://wandb.ai/site

Weights & Biases Docs: https://docs.wandb.ai/


## Introduction

[Weights & Biases (W&B)](https://wandb.ai/site) is a tool for machine learning experiment tracking, dataset versioning, and project collaboration.

<img src="https://i.imgur.com/uEtWSEb.png" width="650" alt="Weights & Biases" />

In this notebook, you will learn how to track reinforcement learning experiments using W&B. In particular, W&B helps track your experiment configs, metrics, and videos of the agents playing the game. At the end, you should see a run page like https://wandb.ai/wandb/cartpole_test/runs/37ppqzxc 

## Install Dependencies and Set up Virtual Displays for Video Recordings



In [None]:
!apt-get update && apt-get install -y python-opengl xvfb
!pip install pyvirtualdisplay "stable_baselines3[extra]" wandb

In [None]:
from pyvirtualdisplay import Display
virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()
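
As a quick sanity check after starting the virtual display, you can confirm that a `DISPLAY` environment variable is now set. This is a minimal sketch that assumes `pyvirtualdisplay` exports `DISPLAY` into the current process when it starts; the helper name `display_is_configured` is hypothetical and not part of any library.

```python
import os


def display_is_configured(environ=None):
    """Return True if DISPLAY is set to a non-empty value.

    `environ` defaults to the real process environment; passing a dict
    makes the check easy to try without touching global state.
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get("DISPLAY", "").strip())


print("DISPLAY configured:", display_is_configured())
```

If this prints `False`, the virtual display probably did not start, and video recording may fail later.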

## Track experiments with W&B

Here is a clean end-to-end example to run. It will prompt you to log in to W&B if you haven't already.
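
If you run this notebook non-interactively, the login prompt can get in the way. W&B reads an API key from the `WANDB_API_KEY` environment variable, so one option is to set that variable before the run starts. The sketch below only checks whether the variable is present; the helper name `has_wandb_api_key` is hypothetical and is not part of the `wandb` package.

```python
import os


def has_wandb_api_key(environ=None):
    """Return True if WANDB_API_KEY is set to a non-empty value."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("WANDB_API_KEY", "").strip())


if has_wandb_api_key():
    print("WANDB_API_KEY is set; no interactive login prompt is expected.")
else:
    print("WANDB_API_KEY is not set; expect a login prompt when the run starts.")
```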


In [None]:
import gym  # note: Stable-Baselines3 >= 2.0 is built on Gymnasium; there, use `import gymnasium as gym`
import wandb
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

def make_env():
    env = gym.make("CartPole-v1", render_mode="rgb_array")
    env = gym.wrappers.RecordVideo(env, "videos")  # record videos into the "videos" folder
    env = gym.wrappers.RecordEpisodeStatistics(env)  # record stats such as returns
    return env

config = {
    "policy": 'MlpPolicy',
    "total_timesteps": 25000
}

wandb.init(
    config=config,
    sync_tensorboard=True,  # automatically upload SB3's tensorboard metrics to W&B
    project="CartPole-v1",
    monitor_gym=True,       # automatically upload videos of gym environments
    save_code=True,
)

env = DummyVecEnv([make_env])
model = PPO(config['policy'], env, verbose=1, tensorboard_log="runs/ppo")
model.learn(total_timesteps=config['total_timesteps'])
wandb.finish()

After the cell above finishes, you should see a dashboard similar to the GIF below:

![](https://user-images.githubusercontent.com/5555347/122989248-97b5bd00-d370-11eb-95d6-52d56cfbce19.gif)
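
If no videos appear on the run page, it can help to check first whether any were recorded locally. The sketch below assumes the recordings are written as non-empty `.mp4` files into the `videos` folder passed to `RecordVideo` in the training cell; the helper name `list_recorded_videos` is hypothetical.

```python
from pathlib import Path


def list_recorded_videos(video_dir="videos"):
    """Return the non-empty .mp4 files in `video_dir`, sorted by name.

    Returns an empty list if the folder does not exist.
    """
    folder = Path(video_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.glob("*.mp4") if p.stat().st_size > 0)


for path in list_recorded_videos():
    print(path.name, path.stat().st_size, "bytes")
```

An empty result suggests the problem is in the recording step (for example, the display or the wrapper setup) rather than in the upload to W&B.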