# Interpreting models

This notebook uses interpretability tools from Captum to visualize and understand what a model has learned.
The tools include:
- Primary Attribution: evaluates the contribution of each input feature to the model's output.
- Layer Attribution: evaluates the contribution of each neuron in a given layer to the model's output.

In code blocks that use Layer Attribution methods, set `module` and `idx` to choose which layer to inspect.

In [None]:
import os
import torch
from datetime import datetime

# Make sure your cwd is the il-representations directory
if os.path.basename(os.getcwd()) == 'analysis':
    os.chdir("..")
print('Check cwd', os.getcwd())

In [None]:
from il_representations.scripts.interpret import (prepare_network, process_data, save_img, saliency_, integrated_gradient_, 
                                                  deep_lift_, layer_conductance_, layer_gradcam_, layer_act_, 
                                                  choose_layer, interp_ex, get_venv)
from il_representations.envs.config import benchmark_ingredient
import il_representations.envs.auto as auto_env

import sacred
from sacred.observers import FileStorageObserver
from sacred import Experiment
from stable_baselines3.common.utils import get_device
from captum.attr import LayerActivation, LayerGradientXActivation

## Adjust config

In [None]:
render_interp_ex = Experiment('render_interp', ingredients=[benchmark_ingredient, interp_ex], interactive=True)
interp_ex.observers.append(FileStorageObserver('runs/interpret_runs'))
now = datetime.now().strftime("%m-%d-%Y-%H-%M-%S")

@interp_ex.config
def config():
    ##### These should be the only things you need to modify in this code block #####
    encoder_path = os.path.join(os.getcwd(), 'runs/downloads/ActionConditionedTemporalCPC/249_epochs.ckpt')
    assert os.path.isfile(encoder_path), f'Please double check if {encoder_path} exists.'
    
    
    # Data settings
    # The benchmark is set by detecting il_representations/envs/config's bench_defaults.benchmark_name
    imgs = [30]  # indices of the images to inspect (list of ints)
    assert all(isinstance(im, int) for im in imgs), 'imgs list should contain integers only'

    verbose = False
    
# When log_dir = None, the images will not be saved
log_dir = os.path.join(os.getcwd(), f'runs/interpret_runs/interpret-{now}')
if log_dir is not None:
    os.makedirs(log_dir, exist_ok=True)
#################################################################################

print('log dir:', log_dir)

## Initial setup

In [None]:
render_interp_ex = Experiment('render_interp', ingredients=[benchmark_ingredient, interp_ex], interactive=True)

@render_interp_ex.main
def run():
    venv = get_venv()
    network = prepare_network(venv)
    images, labels = process_data(venv)
    return network, images, labels

r = render_interp_ex.run()
network = r.result[0]
images = r.result[1]
labels = r.result[2]
verbose = True

## Saliency

Saliency is a simple approach for computing input attribution, returning the gradient of the output with respect to the input. This approach can be understood as taking a first-order Taylor expansion of the network at the input, and the gradients are simply the coefficients of each feature in the linear representation of the model. The absolute value of these coefficients can be taken to represent feature importance.
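
To make the idea concrete, here is a minimal NumPy sketch on a toy model. The names `toy_model` and `toy_saliency` and the weights are hypothetical and chosen for illustration; this is not how `saliency_` or Captum computes attributions for a real network, where the gradient comes from autograd.

```python
import numpy as np

def toy_model(x, w):
    """Hypothetical scalar model: tanh of a weighted sum of the inputs."""
    return np.tanh(w @ x)

def toy_saliency(x, w):
    """Absolute gradient of toy_model with respect to each input feature."""
    grad = (1.0 - np.tanh(w @ x) ** 2) * w  # analytic derivative of tanh(w . x)
    return np.abs(grad)

w = np.array([2.0, -0.5, 0.0])
x = np.array([0.1, 0.4, 0.9])
sal = toy_saliency(x, w)
print(sal)  # here w . x = 0, so the gradient equals w and sal is [2.0, 0.5, 0.0]
```

The third feature has zero saliency because the toy model ignores it, while the first feature dominates because it has the largest weight in magnitude.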

In [None]:
def saliency():
    for img, label in zip(images, labels):
        original_img = img[0].permute(1, 2, 0).detach().numpy()
        saliency_(network, img, label, original_img, log_dir, False)

saliency()

## Integrated Gradients
Integrated Gradients attributes the output to each input feature by integrating the gradients of the output with respect to the input along a straight-line path from a given baseline to the input.
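
The following is a minimal NumPy sketch of that integral, approximated with a midpoint Riemann sum. The function `f`, its gradient `grad_f`, and the all-zeros baseline are assumptions made for illustration; the notebook's `integrated_gradient_` helper relies on Captum rather than this code.

```python
import numpy as np

def f(x):
    """Hypothetical differentiable model used only for illustration."""
    return x[0] * x[1] + x[2] ** 2

def grad_f(x):
    """Analytic gradient of f."""
    return np.array([x[1], x[0], 2.0 * x[2]])

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # points on the straight line
    avg_grad = np.mean([grad_fn(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(grad_f, x, baseline)
print(attr, attr.sum(), f(x) - f(baseline))
```

In this toy case the attributions sum to `f(x) - f(baseline)`, which is a quick sanity check for any Integrated Gradients result.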

In [None]:
def integrated_gradients():
    for img, label in zip(images, labels):
        original_img = img[0].permute(1, 2, 0).detach().numpy()
        integrated_gradient_(network, img.contiguous(), label, original_img, log_dir, False)

integrated_gradients()

## DeepLIFT
DeepLIFT is a back-propagation based approach that attributes a change to inputs based on the differences between the inputs and corresponding references (or baselines) for non-linear activations. As such, DeepLIFT seeks to explain the difference in the output from reference in terms of the difference in inputs from reference. DeepLIFT uses the concept of multipliers to "blame" specific neurons for the difference in output.
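
As a rough illustration of multipliers, the sketch below applies the Rescale rule to a single ReLU neuron, `y = relu(w . x + b)`. The function name `deeplift_rescale`, the weights, and the zero reference are hypothetical; a real network chains such multipliers through every layer, which is what Captum handles for `deep_lift_`.

```python
import numpy as np

def deeplift_rescale(w, b, x, ref):
    """Rescale-rule attributions for y = relu(w . x + b) relative to a reference input."""
    z, z_ref = w @ x + b, w @ ref + b
    y, y_ref = max(z, 0.0), max(z_ref, 0.0)
    delta_z, delta_y = z - z_ref, y - y_ref
    # Multiplier of the ReLU: ratio of output difference to input difference.
    # Fall back to the ordinary gradient when the pre-activation barely changes.
    multiplier = delta_y / delta_z if abs(delta_z) > 1e-8 else float(z > 0)
    # The linear layer contributes w_i * (x_i - ref_i) to the pre-activation difference.
    return w * (x - ref) * multiplier

w = np.array([1.0, -2.0])
b = -0.5
x = np.array([2.0, 0.5])
ref = np.zeros(2)
attr = deeplift_rescale(w, b, x, ref)
print(attr)        # attributions 1.0 and -0.5
print(attr.sum())  # 0.5, the change in output relative to the reference
```

Note that the plain gradient at `x` is `w`, so gradient times input would give 2.0 and -1.0, which sum to 1.0 rather than to the actual output change of 0.5.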

In [None]:
def deep_lift():
    for img, label in zip(images, labels):
        original_img = img[0].permute(1, 2, 0).detach().numpy()
        deep_lift_(network, img, label, original_img, log_dir, False)

deep_lift()

## Layer GradCAM
GradCAM is a layer attribution method designed for convolutional neural networks, and is usually applied to the last convolutional layer. GradCAM computes the gradients of the target output with respect to the given layer, averages for each output channel (dimension 2 of output), and multiplies the average gradient for each channel by the layer activations. The results are summed over all channels and a ReLU is applied to the output, returning only non-negative attributions.
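
The steps above can be sketched in NumPy for a single image, assuming the layer's activations and the gradients of the target output with respect to them are already available as arrays of shape `(C, H, W)`. The function `grad_cam` and the example values are hypothetical; in the notebook these quantities come from Captum via `layer_gradcam_`.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from layer activations and gradients, both shaped (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # one average gradient per channel
    weighted = weights[:, None, None] * activations  # scale each channel's activations
    return np.maximum(weighted.sum(axis=0), 0.0)     # sum over channels, then ReLU

activations = np.array([[[1.0, 2.0], [3.0, 4.0]],
                        [[1.0, 1.0], [1.0, 1.0]]])
gradients = np.array([[[0.5, 0.5], [0.5, 0.5]],
                      [[-1.0, -1.0], [-1.0, -1.0]]])
cam = grad_cam(activations, gradients)
print(cam)
```

Locations where the negatively weighted channel outweighs the positively weighted one are clipped to zero by the final ReLU.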

In [None]:
def layer_gradcam():
    for img, label in zip(images, labels):
        ##### These should be the only things you need to modify in this code block #####
        module = 'encoder'
        idx = 4
        #################################################################################
        chosen_layer = choose_layer(network, module, idx)
        original_img = img[0].permute(1, 2, 0).detach().numpy()
        assert isinstance(chosen_layer, torch.nn.Conv2d), 'GradCAM expects a convolutional layer; it is ' \
                                                          'usually applied to the last one in the network.'
        if verbose:
            print(f"You have chosen {chosen_layer}")
        layer_gradcam_(network, chosen_layer, img, label, original_img, log_dir, False)

layer_gradcam()

## Layer Conductance
Conductance combines the neuron activation with the partial derivatives of both the neuron with respect to the input and the output with respect to the neuron to build a more complete picture of neuron importance.
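
Here is a minimal sketch of layer conductance for a toy one-hidden-layer network, `F(x) = v . relu(W x)`, approximating the path integral with a midpoint Riemann sum. The network, its weights, the zero baseline, and the function names are assumptions for illustration; `layer_conductance_` delegates the real computation to Captum.

```python
import numpy as np

def forward(W, v, x):
    """Hypothetical network: linear layer, ReLU, then a linear readout."""
    return v @ np.maximum(W @ x, 0.0)

def layer_conductance(W, v, x, baseline, steps=100):
    """Conductance of each hidden neuron along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    delta = x - baseline
    total = np.zeros(W.shape[0])
    for a in alphas:
        pre = W @ (baseline + a * delta)
        dh_dx = (pre > 0)[:, None] * W   # Jacobian of relu(W x) w.r.t. the input
        dF_dh = v                        # readout is linear in the hidden layer
        total += dF_dh * (dh_dx @ delta)
    return total / steps

W = np.array([[1.0, -1.0], [2.0, 1.0]])
v = np.array([1.0, -0.5])
x = np.array([1.0, 2.0])
baseline = np.zeros(2)
cond = layer_conductance(W, v, x, baseline)
print(cond, cond.sum(), forward(W, v, x) - forward(W, v, baseline))
```

In this example the first hidden neuron never activates along the path, so its conductance is zero, and the conductances sum to the change in output between baseline and input.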

In [None]:
def layer_conductance():
    for img, label in zip(images, labels):
        ##### These should be the only things you need to modify in this code block #####
        module = 'encoder'
        idx = 2
        #################################################################################
        chosen_layer = choose_layer(network, module, idx)
        if verbose:
            print(f"You have chosen {chosen_layer}")
        layer_conductance_(network, chosen_layer, img, label, log_dir, show_imgs=True, columns=10)

layer_conductance()

## Layer GradxAct

Layer Gradient X Activation is the analog of the Input X Gradient method for hidden layers in a network. It element-wise multiplies the layer's activation with the gradients of the target output with respect to the given layer.
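
A small NumPy sketch of that product, assuming a toy network whose output is `v . relu(W x)` so that the gradient with respect to the hidden layer is simply `v`. The weights and input are hypothetical; for a real network Captum obtains both factors from the forward and backward passes.

```python
import numpy as np

W = np.array([[1.0, -1.0], [2.0, 1.0]])  # hypothetical hidden-layer weights
v = np.array([1.0, -0.5])                # hypothetical output weights
x = np.array([1.0, 2.0])

hidden = np.maximum(W @ x, 0.0)  # layer activation: relu(W x)
grad_wrt_hidden = v              # output is v . hidden, so d(output)/d(hidden) = v
grad_x_act = hidden * grad_wrt_hidden
print(grad_x_act)
```

A neuron with zero activation always receives zero attribution under this method, regardless of its gradient.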

In [None]:
def layer_gradxact():
    for img, label in zip(images, labels):
        ##### These should be the only things you need to modify in this code block #####
        module = 'encoder'
        idx = 2
        #################################################################################
        chosen_layer = choose_layer(network, module, idx)
        if verbose:
            print(f"You have chosen {chosen_layer}")
    
        layer_act_(network, chosen_layer, LayerGradientXActivation, 'layer_GradXActivation',
                   img, log_dir, show_imgs=True, attr_kwargs={'target': label})

layer_gradxact()

## Layer Activation

Layer Activation is a simple approach for computing layer attribution, returning the activation of each neuron in the identified layer.

In [None]:
def layer_activation():
    for img, label in zip(images, labels):
        ##### These should be the only things you need to modify in this code block #####
        module = 'encoder'
        idx = 2
        #################################################################################
        chosen_layer = choose_layer(network, module, idx)
        if verbose:
            print(f"You have chosen {chosen_layer}")
    
        layer_act_(network, chosen_layer, LayerActivation, 'layer_Activation',
                   img, log_dir, show_imgs=True, attr_kwargs={})

layer_activation()