## This notebook will guide you through logging your experiments

In [7]:
# Import libraries 

# The Training Libs
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env

from stable_baselines3.common.vec_env import VecVideoRecorder, DummyVecEnv

# Import wandb stuff
import wandb
from wandb.integration.sb3 import WandbCallback

## Weights and biases (wandb)

[Weights and Biases](https://wandb.ai/) is a logging platform, with a free tier and an open source client library, that lets you track and compare results across training runs. It collects various metrics and organizes them in a simple web dashboard.

In this notebook, we will see how simple it is to start tracking your experiments with Weights and Biases.

It can log all the training metrics that are output during the training phase, along with videos of our agent at different points in training. It also logs system stats such as CPU/GPU usage and temperature.

They have good documentation, which can be found here - https://docs.wandb.ai/

## Setup for wandb

1. Create an account on wandb - https://wandb.ai/

2. Install wandb: 

    Wandb is distributed as a Python package. To install it, run `pip install wandb`.

3. Log in to wandb:

    To log in, run `wandb login` and paste your API key when prompted. The key can be found here - https://wandb.ai/authorize
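For scripts or remote machines where an interactive prompt is inconvenient, wandb can also take the key from the `WANDB_API_KEY` environment variable. The sketch below is only an illustration and not part of the original notebook; the helper name `login_from_env` is hypothetical.

```python
import os


def login_from_env(environ=os.environ):
    """Log in to wandb non-interactively if WANDB_API_KEY is set.

    Returns False without importing wandb when the variable is missing or empty.
    """
    key = environ.get("WANDB_API_KEY")
    if not key:
        return False
    # Imported lazily so the check above works even before wandb is installed.
    import wandb

    # Pass the key explicitly so no interactive prompt is needed.
    return bool(wandb.login(key=key))
```

If the helper returns `False`, fall back to running `wandb login` in a terminal as described in step 3.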

In [2]:
# Initialize wandb
# https://docs.wandb.ai/guides/integrations/other/stable-baselines-3

config = {
    "policy_type": "MlpPolicy",
    "total_timesteps": 100000,
    "env_name": "LunarLander-v2",
    "learning_rate" : 0.0002,
}

run = wandb.init(
    project="LunarLander-v2",  # Creates the project on wandb if it doesn't exist; runs are logged there
    config=config,
    sync_tensorboard=True,  # auto-upload tensorboard metrics
    monitor_gym=True,  # auto-upload the videos of agents playing the game
)

# After running this cell, go to https://wandb.ai/home to see your newly created project

[34m[1mwandb[0m: Currently logged in as: [33msupersecurehuman[0m. Use [1m`wandb login --relogin`[0m to force relogin


## Now, we will create an environment and a model to log the training

This part is covered in the main notebook


In [3]:
env = make_vec_env('LunarLander-v2', n_envs=16)
# Use the following line with caution. The video recorder will try to render the agent on the screen so that ffmpeg can capture it. Here, we have 16 envs set. Trying to render 16 envs on screen will
# be pretty resource intensive.
# env = VecVideoRecorder(env, f"videos/{run.id}", record_video_trigger=lambda x: x % 2000 == 0, video_length=200) # Set the video recorder to record our agent during training

# I suggest adding all your hyperparameters to the config dictionary defined before the wandb init step. This helps you visualize the effect those hyperparameters
# have on your model via the wandb dashboard
model = PPO(
    policy=config["policy_type"],
    env=env,
    learning_rate=config["learning_rate"],
    tensorboard_log="logs",  # Needed so wandb can sync the training metrics
    verbose=1)

Using cuda device
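The commented-out recorder uses the trigger `lambda x: x % 2000 == 0` with `video_length=200`. Assuming the trigger is called with the wrapper's own step counter, which advances once per vectorized step (all 16 envs step together), the sketch below estimates where recordings would start for the requested 100000 timesteps. The helper `video_start_steps` is hypothetical and only illustrates the arithmetic.

```python
def video_start_steps(n_vec_steps, trigger=lambda x: x % 2000 == 0):
    """Return the vectorized step indices at which the trigger would fire."""
    return [step for step in range(n_vec_steps) if trigger(step)]


n_envs = 16
requested_timesteps = 100_000

# Each vectorized step advances all envs at once, so it counts as n_envs timesteps.
n_vec_steps = requested_timesteps // n_envs

starts = video_start_steps(n_vec_steps)
print(n_vec_steps, starts)  # 6250 [0, 2000, 4000, 6000]
```

Under that assumption, a trigger interval of 2000 produces only a handful of short clips per run, so lower the interval if you want more frequent recordings.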


In [4]:
# Now for the actual logging to wandb. All you have to do is pass the wandb callback to the model's learn() call like this

model.learn(total_timesteps=config["total_timesteps"], 
            callback=[WandbCallback(
                gradient_save_freq=100
            )])

Logging to logs/PPO_1
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 91.1     |
|    ep_rew_mean     | -170     |
| time/              |          |
|    fps             | 5432     |
|    iterations      | 1        |
|    time_elapsed    | 6        |
|    total_timesteps | 32768    |
---------------------------------
-----------------------------------------
| rollout/                |             |
|    ep_len_mean          | 98.9        |
|    ep_rew_mean          | -132        |
| time/                   |             |
|    fps                  | 1480        |
|    iterations           | 2           |
|    time_elapsed         | 44          |
|    total_timesteps      | 65536       |
| train/                  |             |
|    approx_kl            | 0.009069282 |
|    clip_fraction        | 0.0836      |
|    clip_range           | 0.2         |
|    entropy_loss         | -1.38       |
|    explained_variance   | 0.00211     |
|    lea

<stable_baselines3.ppo.ppo.PPO at 0x7fb0d70ccd50>
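The log reports `total_timesteps` in jumps of 32768, and the run summary ends at a `global_step` of 131072 even though `config` asks for 100000. A likely explanation, assuming PPO's default `n_steps=2048` (the notebook does not override it): each iteration collects `n_steps * n_envs` transitions, and training stops only after a full rollout reaches or passes the target.

```python
import math

n_envs = 16          # from make_vec_env(..., n_envs=16)
n_steps = 2048       # assumption: stable-baselines3 PPO default
requested = 100_000  # config["total_timesteps"]

# Transitions gathered per iteration across all parallel environments.
rollout_size = n_steps * n_envs

# Training ends after the first rollout that reaches the requested total.
iterations = math.ceil(requested / rollout_size)
actual = iterations * rollout_size

print(rollout_size, iterations, actual)  # 32768 4 131072
```

To stay closer to the requested budget, pick a `total_timesteps` that is a multiple of `n_steps * n_envs`, or reduce `n_steps` or `n_envs`.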

In [5]:
# Finish run
run.finish()

# This cell output will also give a global summary, along with giving you the link to view your run.


metric,history
global_step,▁▃▆█
rollout/ep_len_mean,▁▃▅█
rollout/ep_rew_mean,▁▄▇█
time/fps,█▂▁▁
train/approx_kl,▁██
train/clip_fraction,▁▄█
train/clip_range,▁▁▁
train/entropy_loss,▁▄█
train/explained_variance,▁▇█
train/learning_rate,▁▁▁

metric,final value
global_step,131072.0
rollout/ep_len_mean,113.48
rollout/ep_rew_mean,-87.24881
time/fps,1031.0
train/approx_kl,0.01242
train/clip_fraction,0.16663
train/clip_range,0.2
train/entropy_loss,-1.30755
train/explained_variance,0.51275
train/learning_rate,0.0002


### Note

You need to enable TensorBoard logging (`tensorboard_log` on the model, together with `sync_tensorboard=True` in `wandb.init`) to view your training metrics in the wandb dashboard.

## Package to 🤗 hub

In [10]:
# Disable wandb while packaging to the hub, because it seems to interfere with the package_to_hub function.
wandb.init(mode="disabled")



In [None]:
from huggingface_hub import notebook_login # To log in to our Hugging Face account so we can upload models to the Hub.

In [None]:
# Note: You only need to run notebook_login() once on each machine you log in from. The token is saved on your machine, making future access to your account easier
notebook_login()
!git config --global credential.helper store

In [11]:
from huggingface_sb3 import package_to_hub

env_id = config["env_name"]

model_architecture = "PPO"
model_name = "PPO-LunarLander-v2"

repo_id = "SuperSecureHuman/LunarLander_v2_PPO_wandb"

commit_message = "Initial Commit"

eval_env = DummyVecEnv([lambda: gym.make(env_id)])

package_to_hub(model=model, # Our trained model
               model_name=model_name, # The name of our trained model 
               model_architecture=model_architecture, # The model architecture we used: in our case PPO
               env_id=env_id, # Name of the environment
               eval_env=eval_env, # Evaluation Environment
               repo_id=repo_id, # id of the model repository from the Hugging Face Hub
               commit_message=commit_message)
eval_env.close()

ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to 1min.
This is a work in progress: If you encounter a bug, please open an issue and use
push_to_hub instead.


/home/venom/Desktop/deep-rl-class/unit1/unit1_bonus/hub/LunarLander_v2_PPO_wandb is already a clone of https://huggingface.co/SuperSecureHuman/LunarLander_v2_PPO_wandb. Make sure you pull the latest changes with `repo.git_pull()`.


Saving video to /home/venom/Desktop/deep-rl-class/unit1/unit1_bonus/-step-0-to-step-1000.mp4



ℹ Pushing repo LunarLander_v2_PPO_wandb to the Hugging Face Hub


Upload file PPO-LunarLander-v2.zip:  23%|##2       | 32.0k/141k [00:00<?, ?B/s]

Upload file PPO-LunarLander-v2/policy.optimizer.pth:  39%|###8      | 32.0k/82.8k [00:00<?, ?B/s]

Upload file PPO-LunarLander-v2/policy.pth:  76%|#######5  | 32.0k/42.2k [00:00<?, ?B/s]

Upload file replay.mp4:  34%|###3      | 32.0k/94.3k [00:00<?, ?B/s]

Upload file PPO-LunarLander-v2/pytorch_variables.pth: 100%|##########| 431/431 [00:00<?, ?B/s]

remote: Enforcing permissions...        
remote: Allowed refs: all        
To https://huggingface.co/SuperSecureHuman/LunarLander_v2_PPO_wandb
   b5f588e..67fd722  main -> main



ℹ Your model is pushed to the hub. You can view your model here:
https://huggingface.co/SuperSecureHuman/LunarLander_v2_PPO_wandb
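Once the model is on the hub, it can be pulled back down for evaluation. A minimal sketch, assuming the checkpoint is stored as `<model_name>.zip` (as the upload log above suggests) and that `huggingface_sb3.load_from_hub` is available; the helper names `checkpoint_filename` and `load_pushed_model` are hypothetical.

```python
def checkpoint_filename(model_name):
    """Name of the zipped checkpoint, assuming it is stored as '<model_name>.zip'."""
    return f"{model_name}.zip"


def load_pushed_model(repo_id, model_name):
    """Download the checkpoint from the Hugging Face Hub and load it as a PPO model."""
    # Imported lazily so defining these helpers does not require the libraries.
    from huggingface_sb3 import load_from_hub
    from stable_baselines3 import PPO

    local_path = load_from_hub(repo_id=repo_id, filename=checkpoint_filename(model_name))
    return PPO.load(local_path)


# Example (needs network access and the packages installed):
# model = load_pushed_model("SuperSecureHuman/LunarLander_v2_PPO_wandb", "PPO-LunarLander-v2")
```

The loaded model can then be passed to `evaluate_policy`, which is already imported at the top of this notebook.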


## Congrats!

You have now started using wandb in your project. Do check out the docs to see what other amazing things it is capable of!