## Visualizing Improvements of Reinforcement Learning Models

This notebook accompanies the GitHub repository https://github.com/SimonTommerup/02456-deep-learning-rl. It demonstrates an evaluation loop for one of the models and records the model playing with saliency maps overlaid on the frames.


To run this notebook, first enable a GPU runtime in Google Colab under Runtime > Change runtime type (the model is moved to the GPU with `policy.cuda()`).

The first code cell clones our GitHub repository and installs procgen. Its output is suppressed by `%%capture`.

In [1]:
%%capture
!git clone https://github.com/SimonTommerup/02456-deep-learning-rl.git
!pip install procgen

Next, we import a number of dependencies as well as some modules from the repository:

In [2]:
import os
import torch
import imageio
from tqdm import tqdm
os.chdir("02456-deep-learning-rl/src")
import models
import saliency
import utils
print(f"Current working directory: {os.getcwd()}")

Current working directory: /content/02456-deep-learning-rl/src


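The helper function `get_settings` collects the environment and model parameters in a single dictionary, so that the validation run and the recording run each get one explicit set of settings instead of two separate sets of parameter declarations that are easy to confuse: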

In [6]:
def get_settings(num_envs=1, env_name="starpilot", start_level=1, num_levels=1, num_features=256, use_backgrounds=False):
  settings = {}
  settings["env_name"] = env_name
  settings["num_envs"] = num_envs
  settings["start_level"] = start_level
  settings["num_levels"] = num_levels
  settings["num_features"] = num_features
  settings["use_backgrounds"] = use_backgrounds
  return settings
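If you prefer typed settings, a frozen dataclass can play the same role as `get_settings`. The sketch below is a hypothetical alternative (the class name `EnvSettings` is not part of the repository): it keeps the same keys and defaults, derives the validation settings from the recording defaults with `dataclasses.replace`, and converts back to a plain dictionary with `dataclasses.asdict`, so code that indexes `settings["num_envs"]` would keep working.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical typed alternative to the get_settings dictionary.

    Field names and defaults mirror the keyword arguments of get_settings.
    """
    num_envs: int = 1
    env_name: str = "starpilot"
    start_level: int = 1
    num_levels: int = 1
    num_features: int = 256
    use_backgrounds: bool = False


# One base configuration (the recording defaults) ...
record_settings = EnvSettings()

# ... and the validation configuration, overriding only the fields that differ.
validation_settings = replace(record_settings, num_envs=32, start_level=501, num_levels=10)

# asdict returns a plain dict with the same keys that get_settings produces.
settings_dict = asdict(validation_settings)
print(settings_dict)
```

Because the dataclass is frozen, a settings object cannot be modified by accident after it has been passed to one of the functions below.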

The function `get_env_model_hook` returns a list containing an environment created from the given settings, the trained policy loaded in evaluation mode, and a hook that captures the policy logits, which is used when recording saliency maps.

In [7]:
def get_env_model_hook(model_folder, encoder_name, settings):
  # create env
  env = utils.make_env(settings["num_envs"], 
                       env_name=settings["env_name"], 
                       start_level=settings["start_level"], 
                       num_levels=settings["num_levels"], 
                       use_backgrounds=settings["use_backgrounds"])

  # set correct encoder
  assert encoder_name in ["impala", "dqn"], "Encoder must be either impala or dqn"
  if encoder_name == "impala":
    encoder = models.ImpalaModel(env.observation_space.shape[0], settings["num_features"])
  elif encoder_name == "dqn":
    encoder = models.DQNEncoder(env.observation_space.shape[0], settings["num_features"])

  policy = models.Policy(encoder, settings["num_features"], env.action_space.n)

  # load trained parameters and switch to evaluation mode
  model_path = saliency.get_full_path(model_folder)
  policy.load_state_dict(torch.load(model_path))
  policy.cuda()
  policy.eval()

  # create hook
  hook = saliency.PolicyLogitsHook(policy)

  return [env, policy, hook]

The next code cell specifies which model to look at: experiment `5`, trained on `500_lvls` (i.e. 500 levels) with the `impala` encoder and using `valclip` (i.e. validation clipping).

To try out other models, see the explanation of the naming convention in the README at https://github.com/SimonTommerup/02456-deep-learning-rl under the subsection "Overview of experiments".

After choosing another model, set the variable `model_folder` to the name of one of the folders in the folder `experiments`.

For example, you could set `model_folder = "11_model_5_bigfish"`. A model trained on `bigfish` should then also be run with `env_name="bigfish"` in `get_settings`, and with the encoder that matches the model (see the note in the cell below).

In [8]:
# NOTE: 
# Choose the right encoder:
# MODEL 2 = DQN
# MODEL 5 = Impala

model_folder = "5_500_lvls_impala_valclip"
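Passing an encoder that does not match the saved weights typically makes `load_state_dict` fail, so it can help to derive the encoder name from `model_folder`. The helper `infer_encoder` below is hypothetical and rests on an assumption drawn only from the two folder names mentioned above and the note in this cell: a folder name contains either the encoder name as a token or the pattern `model_<n>`, where model 2 uses DQN and model 5 uses Impala. Check the naming convention in the README before relying on it.

```python
def infer_encoder(model_folder: str) -> str:
    """Guess the encoder name ("impala" or "dqn") from an experiment folder name.

    Heuristic assumption: the name either contains the encoder as a token
    (e.g. "5_500_lvls_impala_valclip") or the pattern "model_<n>"
    (e.g. "11_model_5_bigfish"), where model 2 is DQN and model 5 is Impala.
    """
    tokens = model_folder.lower().split("_")

    # Case 1: the encoder name appears directly as a token.
    for encoder in ("impala", "dqn"):
        if encoder in tokens:
            return encoder

    # Case 2: look for "model" followed by a known model number.
    model_to_encoder = {"2": "dqn", "5": "impala"}
    for i, token in enumerate(tokens[:-1]):
        if token == "model" and tokens[i + 1] in model_to_encoder:
            return model_to_encoder[tokens[i + 1]]

    raise ValueError(f"Cannot infer encoder from folder name {model_folder!r}")


print(infer_encoder("5_500_lvls_impala_valclip"))
print(infer_encoder("11_model_5_bigfish"))
```

With such a helper, the later call could read `get_env_model_hook(model_folder, infer_encoder(model_folder), validation_settings)`, and an unrecognized folder name raises an error instead of silently loading the wrong architecture.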


The function `evaluation` runs a specified model for `no_steps` steps and prints its validation reward. Our results were created with `num_envs = 32` and `num_levels = 10`. Since the model was trained on 500 levels, the start level for evaluation should be at least 501, so that the model is evaluated on levels it has not seen during training.
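Procgen draws levels from the seeds `start_level` through `start_level + num_levels - 1`, so checking that training and validation levels do not overlap is simple range arithmetic. The sketch below assumes training used `start_level=1` (the default in `get_settings`) with 500 levels; that training setting is an assumption, since this notebook does not show the training configuration. The helper name `level_range` is hypothetical.

```python
def level_range(start_level: int, num_levels: int) -> range:
    """Level seeds procgen can use: start_level, ..., start_level + num_levels - 1."""
    return range(start_level, start_level + num_levels)


# Assumption: training used start_level=1 (the get_settings default) and 500 levels.
train_levels = level_range(start_level=1, num_levels=500)

# Validation settings used in this notebook.
val_levels = level_range(start_level=501, num_levels=10)

overlap = set(train_levels) & set(val_levels)
print(f"train levels {train_levels[0]}-{train_levels[-1]}, "
      f"validation levels {val_levels[0]}-{val_levels[-1]}, overlap: {len(overlap)}")
```

If training had instead started at level 0, the training levels would be 0 to 499, and a start level of 501 would still only contain unseen levels.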

In [9]:
def evaluation(model_folder, env_model_hook, no_steps, settings):
  # Unpack the environment and policy; the hook is only needed for recording
  env, policy, _ = env_model_hook

  # Rollout storage for num_envs parallel environments over no_steps steps
  storage = utils.Storage(env.observation_space.shape, no_steps, settings["num_envs"], gamma=0.99)

  obs = env.reset()
  for _ in tqdm(range(no_steps)):
    # Let the policy choose an action for the current observation
    action, log_prob, value = policy.act(obs)

    # Take a step in the environment
    next_obs, reward, done, info = env.step(action)

    # Store the transition, including the reward
    storage.store(obs, action, reward, done, info, log_prob, value)

    # Update the current observation
    obs = next_obs

  validation_reward = storage.get_reward(normalized_reward=True)
  print(f"Validation reward: {validation_reward}")
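`utils.Storage.get_reward` is defined in the repository, and its exact computation is not shown here. As a hedged illustration of what summarizing a rollout from a vectorized environment involves, the sketch below works on reward and done arrays of shape `(no_steps, num_envs)`: `mean_total_reward` averages each environment's summed reward, and `episode_returns` splits the reward streams into completed episodes at the `done` flags. Both function names are hypothetical, and neither is claimed to reproduce the repository's normalized reward.

```python
import numpy as np


def mean_total_reward(rewards):
    """Average, over environments, of each environment's summed reward."""
    return float(np.asarray(rewards, dtype=float).sum(axis=0).mean())


def episode_returns(rewards, dones):
    """Undiscounted returns of the episodes that finished during the rollout.

    rewards, dones: arrays of shape (num_steps, num_envs), one row per call
    to env.step on a vectorized environment. The reward on a done step is
    counted towards the episode that just ended.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    num_steps, num_envs = rewards.shape
    running = np.zeros(num_envs)  # return accumulated so far in each env
    finished = []
    for t in range(num_steps):
        running += rewards[t]
        for env_idx in np.flatnonzero(dones[t]):
            finished.append(float(running[env_idx]))
            running[env_idx] = 0.0  # the vectorized env starts a new episode
    return finished


# Toy rollout: 3 steps, 2 environments.
rewards = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dones = np.array([[False, False], [True, False], [False, True]])
print(mean_total_reward(rewards))       # 2.0
print(episode_returns(rewards, dones))  # [1.0, 2.0]
```

The two summaries answer different questions: the first measures reward collected in a fixed number of steps, while the second only counts episodes that ended inside the rollout.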


The following code cell evaluates a model trained on 500 levels of `starpilot` using the `impala` encoder and validation clipping:

In [10]:
no_steps=256
validation_settings = get_settings(num_envs=32, start_level=501, num_levels=10)
val_env_model_hook = get_env_model_hook(model_folder, "impala", validation_settings)
evaluation(model_folder, val_env_model_hook, no_steps, validation_settings)

100%|██████████| 256/256 [00:15<00:00, 16.59it/s]


Validation reward: 17.34375


The function `recording` records a video of a specified model playing with the given environment settings, with a saliency map overlaid on each frame. To make the video, the number of environments must be set to 1. The parameter `no_steps` sets the number of steps, and thus frames, to record. Also keep in mind that a model trained on e.g. `starpilot` should also be recorded with `env_name` set to `starpilot`.
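The saliency maps come from `saliency.saliency_frame`, which is implemented in the repository. Its arguments (the policy, a hook returning logits, and a `pixel_step`) are consistent with perturbation-based saliency in the style of Greydanus et al. (2018), "Visualizing and Understanding Atari Agents", but that is an assumption. The numpy sketch below shows that technique on a toy problem: for every `pixel_step`-th pixel it blends the frame towards a blurred copy inside a Gaussian mask and scores the location by half the squared change in the logits. As a simplification, the "blurred" copy is the per-channel mean image instead of a true Gaussian blur, and the hypothetical `logits_fn` stands in for the policy network.

```python
import numpy as np


def gaussian_mask(height, width, center, sigma):
    """2-D Gaussian bump with peak 1.0 centred on pixel `center` = (row, col)."""
    rows, cols = np.mgrid[0:height, 0:width]
    r, c = center
    return np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))


def perturbation_saliency(logits_fn, frame, pixel_step=4, sigma=5.0):
    """Perturbation-based saliency map for a single H x W x C frame.

    logits_fn: callable mapping a frame to a 1-D array of policy logits.
    Returns an array of shape (ceil(H / pixel_step), ceil(W / pixel_step)).
    """
    height, width, _ = frame.shape
    base_logits = logits_fn(frame)
    # Simplified stand-in for a blurred frame: the per-channel mean colour.
    blurred = np.broadcast_to(frame.mean(axis=(0, 1)), frame.shape)
    rows = range(0, height, pixel_step)
    cols = range(0, width, pixel_step)
    saliency = np.zeros((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            mask = gaussian_mask(height, width, (r, c), sigma)[..., None]
            # Blend towards the blurred copy only around (r, c).
            perturbed = frame * (1.0 - mask) + blurred * mask
            diff = logits_fn(perturbed) - base_logits
            saliency[i, j] = 0.5 * np.sum(diff ** 2)
    return saliency


# Toy usage: a "policy" whose first logit only depends on the top-left 8x8 patch.
def toy_logits(x):
    return np.array([x[:8, :8].sum(), 0.0])


frame = np.zeros((64, 64, 3))
frame[:8, :8] = 1.0
smap = perturbation_saliency(toy_logits, frame, pixel_step=4, sigma=5.0)
print(smap.shape)  # (16, 16)
```

A method of this kind needs one extra forward pass per perturbed location: for a 64x64 procgen observation and `pixel_step=4` that is (64/4)^2 = 256 passes per frame, which would be consistent with the recording cell running much slower per step than the evaluation cell.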


In [11]:
def recording(model_folder, env_model_hook, no_steps, settings):
  # Unpack the environment, policy and logits hook
  env, policy, hook = env_model_hook

  # Saliency overlay settings (also used in the video file name)
  mode = "max"
  constant = 200
  sigma = 5
  channel = saliency.color_to_channel("red")

  frames = []
  obs = env.reset()
  for _ in tqdm(range(no_steps)):
    # Let the policy choose an action for the current observation
    action, _, _ = policy.act(obs)

    # Get the policy logits captured by the hook
    logits = hook.get_logits()

    # Compute the saliency map for the current observation
    sf = saliency.saliency_frame(net=policy, hook=hook, logits=logits, frame=obs, pixel_step=4)
    sf = saliency.saliency_mode(sf, mode=mode)

    # Render the frame and overlay the saliency map
    frame = env.render(mode="rgb_array")
    frame = saliency.saliency_on_procgen(frame, sf, channel=channel, constant=constant, sigma=sigma)

    # Add the frame to the stack of recorded frames
    frames.append(torch.Tensor(frame).byte())

    # Take a step in the environment
    obs, _, _, _ = env.step(action)

  frames = torch.stack(frames)

  start_level = settings["start_level"]
  env_name = settings["env_name"]
  video_path = (f"{env_name}_{model_folder}_level_played={start_level}"
                f"_c={constant}_sig={sigma}_mode={mode}.mp4")
  print(f"Saving video to {video_path}")
  imageio.mimsave(video_path, frames, fps=5)
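The overlay itself is done by `saliency.saliency_on_procgen`, also defined in the repository. The sketch below shows one simple way such an overlay can work and is not the repository's implementation: the hypothetical `overlay_saliency` upsamples a coarse saliency map to the rendered frame's size by repeating cells, normalizes it so its peak is 1, adds `constant` times the map to one colour channel, and clips to the valid 0 to 255 range. It assumes RGB channel order (so red is channel 0) and frame dimensions that are multiples of the map's, and it omits the smoothing that the repository's `sigma` parameter presumably controls.

```python
import numpy as np


def overlay_saliency(frame, saliency, channel=0, constant=200.0):
    """Add a saliency map to one colour channel of an RGB frame.

    frame:    H x W x 3 uint8 image (e.g. from env.render(mode="rgb_array")).
    saliency: h x w map; H and W are assumed to be multiples of h and w.
    channel:  0 = red, 1 = green, 2 = blue (assumes RGB order).
    constant: intensity added where the map reaches its maximum.
    """
    height, width, _ = frame.shape
    h, w = saliency.shape
    # Nearest-neighbour upsampling: repeat each saliency cell as a block.
    upsampled = np.kron(saliency, np.ones((height // h, width // w)))
    # Normalise to [0, 1] so `constant` sets the maximum added intensity.
    peak = upsampled.max()
    if peak > 0:
        upsampled = upsampled / peak
    out = frame.astype(np.float64)
    out[..., channel] += constant * upsampled
    # Clip back to the valid pixel range before converting to uint8.
    return np.clip(out, 0, 255).astype(np.uint8)


# Toy usage: highlight the top-left quarter of a black 8x8 frame in red.
frame = np.zeros((8, 8, 3), dtype=np.uint8)
saliency_map = np.array([[1.0, 0.0], [0.0, 0.0]])
highlighted = overlay_saliency(frame, saliency_map, channel=0, constant=200)
```

Normalizing per frame makes the brightest region equally visible in every frame, at the cost of hiding how strong the saliency is in absolute terms.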

The recorded video is saved in .mp4 format to the current working directory (which should be `/content/02456-deep-learning-rl/src`; otherwise see the output of code cell 2 above), from where you can download and watch it.
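The file name encodes the environment, the model folder, the level played and the saliency settings, and with one frame per step at 5 frames per second the length of the video follows directly from `no_steps`. The helpers below are hypothetical and only restate that logic: `video_filename` rebuilds the name the same way `recording` does, and for `no_steps = 256` the video lasts 256 / 5 = 51.2 seconds.

```python
def video_filename(env_name, model_folder, start_level, constant, sigma, mode):
    """Rebuild the .mp4 file name that `recording` writes."""
    return (f"{env_name}_{model_folder}_level_played={start_level}"
            f"_c={constant}_sig={sigma}_mode={mode}.mp4")


def video_duration_seconds(no_steps, fps=5):
    """Length of a video that holds one frame per environment step."""
    return no_steps / fps


name = video_filename("starpilot", "5_500_lvls_impala_valclip", 1, 200, 5, "max")
print(name)
print(f"{video_duration_seconds(256):.1f} s")
```

In Colab, `google.colab.files.download(name)` can also start the download directly from a code cell.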

The following code cell records a video of a model trained on 500 levels of `starpilot` using the `impala` encoder and validation clipping:

In [12]:
import warnings
warnings.filterwarnings("ignore") # warning on nn.Upsample acknowledged & suppressed

# Set the number of steps (one recorded frame per step).
# The video is saved at 5 frames per second.
no_steps = 256
record_settings = get_settings()
record_env_model_hook = get_env_model_hook(model_folder, "impala", record_settings)
recording(model_folder, record_env_model_hook, no_steps, record_settings)

100%|██████████| 256/256 [11:12<00:00,  2.63s/it]


Saving video to starpilot_5_500_lvls_impala_valclip_level_played=1_c=200_sig=5_mode=max.mp4
