# Week 5: Interpretability 2: Feature visualisation and t-SNE

Tutorial by Cher Bass and Emma Robinson 

In this second tutorial session on interpretability we will look at feature visualisation approaches, specifically layer visualisation through gradient ascent. Then we will look at using these for style transfer with Deep Dream, and finally at interpreting network latent space embeddings using t-SNE.

First, let's mount our Drive and import the libraries we will need. As in part 1, we build our examples on code from the [visualizations repository](https://github.com/utkuozbulak/pytorch-cnn-visualizations).

**Note:** All the visualisations will be saved to the `/generated` folder.

In [None]:

from google.colab import drive
drive.mount('/content/drive')

# STUDENTS UPLOAD the Notebooks folder to your drive and specify the path to where you have placed the visualisations package folder
%cd /content/drive/My\ Drive/Colab\ Notebooks/AdvancedML/2021/


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader

import torchvision
from torchvision import datasets, models, transforms

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # imported explicitly; Image.open is used in Exercise 2
import visualizations
from visualizations.src.misc_functions import *
from visualizations.src.deep_dream import DeepDream

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


## Visualisation

The web journal [Distill.pub](https://distill.pub/) is a particularly good resource for those interested in network visualisation. In particular, [this article](https://distill.pub/2017/feature-visualization/) by Olah et al. has informed much of this section.

The idea of feature visualisation is that we perform backpropagation, much as we do when training the network. However, for visualisation, we keep the weights ($\mathbf{w}$) constant and instead optimise an activation $f(\mathbf{w},\mathbf{x})$ with respect to the input image $\mathbf{x}$.

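$$ \mathbf{x}^* = \arg\max_{\mathbf{x} \;\text{s.t.}\; \|\mathbf{x}\| = \rho} f(\mathbf{w},\mathbf{x}) $$

This is solved iteratively by gradient ascent with step size $\gamma$: $\mathbf{x}_{t+1} = \mathbf{x}_{t} + \gamma \left. \frac{\partial f(\mathbf{w},\mathbf{x})}{\partial \mathbf{x}} \right|_{\mathbf{x}=\mathbf{x}_t}$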
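The update rule can be sketched on a toy problem. The example below is an illustration only, not part of the visualisations package: the "network" is a single linear unit $f(\mathbf{w},\mathbf{x}) = \mathbf{w}\cdot\mathbf{x}$, and the names `activation` and `gradient_ascent` are hypothetical. The weights stay fixed, the input takes ascent steps, and the norm constraint is enforced by rescaling after each step.

```python
import numpy as np

def activation(w, x):
    """Toy 'activation': a single linear unit f(w, x) = w . x."""
    return float(w @ x)

def gradient_ascent(w, x0, rho=1.0, gamma=0.1, iters=200):
    """Maximise f(w, x) over x with w fixed, keeping ||x|| = rho."""
    x = rho * x0 / np.linalg.norm(x0)
    for _ in range(iters):
        grad = w                              # df/dx for the linear toy unit
        x = x + gamma * grad                  # ascent step along the gradient
        x = rho * x / np.linalg.norm(x)       # rescale back to norm rho
    return x

rng = np.random.default_rng(0)
w = rng.normal(size=5)       # fixed "weights" (never updated)
x0 = rng.normal(size=5)      # random starting "image"
x_star = gradient_ascent(w, x0)

print("start activation:", activation(w, x0 / np.linalg.norm(x0)))
print("final activation:", activation(w, x_star))
```

For this linear unit the constrained maximiser is known in closed form ($\rho\,\mathbf{w}/\|\mathbf{w}\|$), which makes it easy to check that the loop behaves as expected. For a real network the gradient comes from backpropagation instead.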

### **Exercise 1. CNN Layer Visualization** 

Here we will use the layer visualisation approach of [Erhan et al. (2009)](https://www.researchgate.net/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network).

The first thing we need to do is load a pre-trained VGG16 model:
 


In [None]:
# load vgg model, and extract layers from the features modules only
pretrained_model = models.vgg16(pretrained=True).features
print(pretrained_model)

#### **Exercise 1.1 Implementing layer visualisation**

Implement the gradient update step for layer visualisation. Here you are updating the image for a specific channel of a specific layer. The function therefore makes a layer-by-layer forward pass through the network until it reaches the `selected_layer`. The output of this forward pass is the target activation block, which must then be sliced to return the `selected_channel` to optimise against.

**To do**

1. Slice the correct channel from the activation returned by `selected_layer` (the `conv_output` line). The layer's activations are obtained with a `for` loop that steps forward through the network one layer at a time until the selected layer is reached.
2. Suggest a suitable loss function (the `loss` line). Don't forget, the goal here is to maximise activation strength across the whole channel.

**Note** how, in the function below, the optimizer is optimising `[processed_image]` rather than the network parameters. After that, training proceeds as for standard network training (with `optimizer.zero_grad()`, `loss.backward()` and `optimizer.step()` implemented exactly as seen before).



In [None]:
from scipy.ndimage import gaussian_filter

def visualise_layer(selected_layer,selected_channel,model,processed_image,lr=0.1,iters=50,l2_reg=1e-4,regularise=False):
        
        if regularise:
          # STUDENT CODE ex 1.3 implement optimizer with weight decay
          optimizer = None
        else:
          # Define optimizer for the image
          optimizer = torch.optim.Adam([processed_image], lr=lr)

        x = processed_image
        for i in range(1, iters):
            optimizer.zero_grad()

            # Forward pass through the network one layer at a time
            for index, layer in enumerate(model):
                x = layer(x)  # forward pass through the current layer
                # stop once the target layer is reached
                if index == selected_layer:
                    break
            # STUDENT CODE 1.1 - slice a filter from the layer to get a 2D output
            conv_output = None
            # STUDENT CODE 1.1 - Implement Loss function - we need to maximise activations across the layer
            loss = None
            # Backward
            loss.backward()
                  
            # Update image
            optimizer.step()
            
            # Assign processed image to a variable to move forward in the model
            x = processed_image
            
        
            print('Iteration:', str(i), 'Loss:', "{0:.2f}".format(loss.item()))
            if i % 10 == 0:
                created_image = recreate_image(processed_image)  
                im_path = './generated/ddream_l' + str(selected_layer) + \
                    '_f' + str(selected_channel) + '_iter' + str(i) + '.jpg'
                plt.imshow(created_image)
                plt.show()
        return processed_image
            
                


#### **Exercise 1.2  Train layer visualisation**

**To Do** 
- train with different layers and filters
- consider changing the learning rate or the number of iterations

In this case our starting point is an `image` of random noise. Note that `visualizations.src.misc_functions.preprocess_image` expects the image as a NumPy integer array. It resizes the image (when its second argument is `True`) and normalises it to match the form expected by the network.
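To make the normalisation step concrete, here is a rough NumPy sketch of what such preprocessing typically does for an ImageNet-trained network, together with its inverse. This is not the package's code: `normalise` and `denormalise` are hypothetical stand-ins, the channel statistics are the standard ImageNet values (an assumption worth checking against the package source), and resizing is omitted.

```python
import numpy as np

# Standard ImageNet channel statistics (assumed, not read from the package).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalise(image_uint8):
    """uint8 HxWx3 image -> float32 1x3xHxW array, channel-normalised."""
    x = image_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    # channels first, plus a leading batch dimension
    return x.transpose(2, 0, 1)[None].astype(np.float32)

def denormalise(x):
    """Invert normalise(): 1x3xHxW float array -> uint8 HxWx3 image."""
    img = x[0].transpose(1, 2, 0) * IMAGENET_STD + IMAGENET_MEAN
    return np.uint8(np.round(np.clip(img, 0.0, 1.0) * 255))

random_image = np.uint8(np.random.uniform(150, 180, (224, 224, 3)))
network_input = normalise(random_image)
restored = denormalise(network_input)
print(network_input.shape, restored.shape)
```

The inverse mapping plays the role that `recreate_image` has in the exercises: it undoes the intensity normalisation so the optimised tensor can be displayed as an ordinary image.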

In [None]:
cnn_layer = 28
cnn_filter = 2

random_image = np.uint8(np.random.uniform(150, 180, (224, 224, 3)))
# Process image and return variable
processed_image = preprocess_image(random_image, False)
       
layer_vis = visualise_layer(cnn_layer,cnn_filter,pretrained_model,processed_image,iters=100, lr=0.5)

# plot image

# Recreate image - removes intensity normalisation
created_image = recreate_image(layer_vis)  
plt.imshow(created_image)
plt.show()


#### **Exercise 1.3 (optional) regularisation**

1. L2 regularisation can be implemented in PyTorch using the `weight_decay` argument of the optimiser. Try adding different levels of L2 regularisation this way.
2. Consider also implementing activation/gradient clipping or Gaussian blurring.
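Why regularise at all? The toy sketch below (plain gradient ascent in NumPy rather than Adam, with a linear objective and a hypothetical function name `regularised_ascent`) shows the effect of an L2 penalty: without it the optimised input grows without bound, while with it the input settles at a finite value. Optional gradient clipping is included for comparison.

```python
import numpy as np

def regularised_ascent(w, lam, lr=0.1, iters=500, clip=None):
    """Gradient ascent on f(x) = w . x - (lam / 2) * ||x||^2.

    The L2 term contributes -lam * x to the gradient, i.e. it shrinks x
    towards zero at every step.
    """
    x = np.zeros_like(w)
    for _ in range(iters):
        grad = w - lam * x
        if clip is not None:
            grad = np.clip(grad, -clip, clip)   # optional gradient clipping
        x = x + lr * grad
    return x

w = np.array([1.0, -2.0, 0.5])
x_unreg = regularised_ascent(w, lam=0.0)   # keeps growing with every step
x_reg = regularised_ascent(w, lam=0.5)     # converges towards w / lam

print("no penalty:", x_unreg)
print("L2 penalty:", x_reg)
```

In the image setting the same shrinkage discourages extreme pixel values, which is one reason regularised visualisations tend to look less noisy.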

In [None]:
# STUDENTS CODE - TRY EDITING THE LAYER VISUALISATION FN (ABOVE) TO IMPLEMENT REGULARISATION

# THEN IMPLEMENT FOR THE EXAMPLE AS ABOVE

## **Exercise 2: Deep Dream**


In Deep Dream, rather than starting the optimisation from a random image, we instead pass in a real image. Let's visualise the output from a later layer.

In [None]:
# THIS OPERATION IS MEMORY HUNGRY! #
# The selected image is very large.
# If you get an out-of-memory error, or the computer locks up,
# try it with a smaller image.
cnn_layer = 28
filter_pos = 94

im_path = './visualizations/input_images/dd_tree.jpg'

image=Image.open(im_path).convert('RGB')
plt.imshow(image)
plt.show()

processed_image = preprocess_image(image, True)


In [None]:
layer_vis = visualise_layer(cnn_layer,filter_pos,pretrained_model,processed_image,lr=0.1,iters=100, regularise=False)

**To do** Try changing the learning rate or levels of regularisation.

Now you can try uploading a photo of yourself and giving it the DeepDream treatment!

## **Exercise 3: t-SNE**

Experiment with t-SNE using the [scikit-learn implementation](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html). In its most basic form this can be done in one line.

<figure align="center">
<img src="https://drive.google.com/uc?id=1KqHdu9VY9O99QVpMF0JqKWrMLa4QoXiq" alt="Drawing" width="600px;"/>
</figure>
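As a minimal sketch of the one-line form mentioned above, the snippet below runs scikit-learn's `TSNE` on synthetic data (two Gaussian clusters invented for illustration, not the exercise dataset). The parameter values are arbitrary choices, not recommendations.

```python
import numpy as np
from sklearn import manifold

# Synthetic stand-in data: two clusters of 30 points in 20 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 20)),
               rng.normal(5.0, 1.0, size=(30, 20))])
labels = np.repeat([0, 1], 30)

# The embedding itself is a single call; perplexity must be below n_samples.
embedding = manifold.TSNE(n_components=2, perplexity=10,
                          random_state=0).fit_transform(X)

print(embedding.shape)  # one 2D point per input row
```

Fixing `random_state` makes repeated runs comparable, which helps when studying the effect of a single parameter such as the perplexity.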


For this exercise we will again use the data from notebook `1.1-fundamentals-solutions.ipynb` ("prem_vs_termwrois.pkl", available from the week 1 section of KEATS). This contains mean values of three different types of cortical imaging data (cortical thickness, cortical folding and cortical myelination), each averaged within 100 regions of interest (ROIs) on the surface, giving 300 features in total. There are 101 babies: 50 term and 51 preterm.

<figure align="center">
<img src="https://drive.google.com/uc?id=1ZbAn0R_ihQ4DCe1XyKaHIRZSvUQv3puh" alt="Drawing" width="900px;"/>
</figure>

**To do**

1. Implement t-SNE using scikit-learn. Set `n_components=2` and fit the embedding for the dHCP data.
2. Experiment with changing the perplexity in the range 5 to 50.
3. Experiment with changing the metric to other options available through [`scipy.spatial.distance.pdist`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html), e.g. correlation.
4. In each instance, plot the embedding with the points colour-coded by label.

In [None]:
import pandas as pd
from sklearn import manifold
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
import copy

# STUDENTS CODE HERE - UPDATE THE PATH TO CORRESPOND TO WHERE YOU HAVE UPLOADED prem_vs_termwrois.pkl TO YOUR DRIVE #
file_path='/content/drive/My Drive/Colab Notebooks/AdvancedML/2021/01_fundamentals/prem_vs_termwrois.pkl'
# Read the data
df = pd.read_pickle(file_path)
data = df.values[:,:-2]
y = df.values[:,-1]

# STUDENTS CODE HERE - IMPLEMENT T-SNE FOR THIS DATASET  #
# vary parameters (e.g. perplexity) and see the effect on the embedding

# plot the result with different colours for the premature and term baby labels


# Source references

1. [visualizing-convolution-neural-networks-using-pytorch](https://towardsdatascience.com/visualizing-convolution-neural-networks-using-pytorch-3dfa8443e74e)
2. [DeepLearning-PadhAI](https://colab.research.google.com/github/Niranjankumar-c/DeepLearning-PadhAI/blob/master/DeepLearning_Materials/6_VisualizationCNN_Pytorch/CNNVisualisation.ipynb#scrollTo=uQI9jHcP6xfP)