# RL - W&B and Stable-Baselines3 (SB3)

Train a Proximal Policy Optimization (PPO) RL model to solve the [CartPole problem](https://gsurma.medium.com/cartpole-introduction-to-reinforcement-learning-ed0eb5b58288), a classic introductory RL problem, using OpenAI Gym.

We will use W&B to track the experiments. At the end, you should see a run page like https://wandb.ai/wandb/cartpole_test/runs/37ppqzxc



## Set up dependencies

In [None]:
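# System packages for headless rendering (needed to record videos without a physical display)
!apt install python-opengl xvfb
!pip install pyvirtualdisplay stable_baselines3[extra] wandb

# Start a virtual display so Gym can render frames off-screen
from pyvirtualdisplay import Display
virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()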

## Run SB3 with W&B

In [None]:
import gym
import wandb
import numpy as np

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

def make_env():
    env = gym.make("CartPole-v1")
    env = gym.wrappers.Monitor(env, "videos")       # record videos
    env = gym.wrappers.RecordEpisodeStatistics(env) # record stats such as returns
    return env

experiment_name = "PPO"
config = {
    "policy": 'MlpPolicy',
    "total_timesteps": 25000
}
wandb.init(
    config=config,
    sync_tensorboard=True,
    project="cartpole_test",
    name=experiment_name,
    monitor_gym=True,  # automatically upload the videos
    save_code=True,
)

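env = DummyVecEnv([make_env])
# Read hyperparameters from config so the values logged to W&B match what is actually run
model = PPO(config["policy"], env, verbose=1, tensorboard_log=f"runs/{experiment_name}")
model.learn(total_timesteps=config["total_timesteps"])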
wandb.finish()
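
To make the "stats such as returns" recorded by `RecordEpisodeStatistics` more concrete, here is a minimal pure-Python sketch of the bookkeeping involved: summing rewards and counting steps until an episode ends. The helper `episode_statistics` and the toy reward stream are hypothetical and for illustration only; this is not the wrapper's actual implementation, though the `"r"` (return) and `"l"` (length) keys mirror the ones the wrapper reports.

```python
def episode_statistics(rewards, dones):
    """Summarize a flat stream of steps into per-episode return and length.

    rewards: reward received at each step
    dones:   flag per step, True when that step ended an episode
    (Hypothetical helper for illustration; not part of Gym or SB3.)
    """
    stats = []
    episode_return, episode_length = 0.0, 0
    for reward, done in zip(rewards, dones):
        episode_return += reward
        episode_length += 1
        if done:
            # Episode finished: store its totals and reset the counters
            stats.append({"r": episode_return, "l": episode_length})
            episode_return, episode_length = 0.0, 0
    return stats


# CartPole gives a reward of +1 per step, so return equals episode length.
rewards = [1.0] * 5
dones = [False, False, True, False, True]
stats = episode_statistics(rewards, dones)
print(stats)
```

Because CartPole rewards every step the pole stays up, a rising episodic return in the W&B charts corresponds directly to longer episodes.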