# Experiments with network visualizations

Agenda:

* First layer visualization
* Saliency maps
* Class visualization
* DeepDream

In [None]:
import torch
from torch.autograd import Variable
import torchvision

import matplotlib.pyplot as plt
import numpy as np

Today, we won't train any models. Instead, we'll work with a pre-trained model called SqueezeNet.
https://github.com/DeepScale/SqueezeNet


In [None]:
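model = torchvision.models.squeezenet1_1(pretrained=True)

# switch to inference mode so that the dropout in the classifier is disabled
model.eval()

# we won't train the model, so its weights need no gradients
for param in model.parameters():
    param.requires_grad = False

model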

## First layer visualization

The first method we'll discuss today involves visualizing the weights of the first convolutional layer of the network.

This is possible because the first layer interacts directly with the images, so each of its filters can itself be displayed as a tiny RGB image. The later layers interact with the more abstract (and complex) outputs of the previous layers, so their weights cannot be displayed this way.
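
To make this concrete, here is a minimal NumPy sketch of what "interacting directly with the image" means. The arrays `flt` and `patch` are random, hypothetical stand-ins (not actual SqueezeNet weights or ImageNet pixels), and the bias term of the convolution is ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one 3x3 RGB filter and one 3x3 RGB image patch,
# both stored as (channels, height, width), the layout PyTorch uses.
flt = rng.normal(size=(3, 3, 3))
patch = rng.uniform(size=(3, 3, 3))

# At a single location, the convolution output (without bias) is just
# the sum of elementwise products of the filter and the patch.
response = float(np.sum(flt * patch))

# Because the filter has the same shape as an RGB patch, it can be shown
# as an image once it is rescaled to [0, 1] and reordered to
# (height, width, channels), the layout matplotlib expects.
flt_img = (flt - flt.min()) / (flt.max() - flt.min())
flt_img = flt_img.transpose(1, 2, 0)
```

Filters in deeper layers have as many channels as the previous layer has outputs, so they no longer map onto the three color channels of an image.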

In [None]:
first_layer = list(model.parameters())[0]
first_layer.size()

# 64 filters, each acting on 3x3 patches of pixels (pixels hold RGB values, hence each filter also has a depth of 3)

In [None]:
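plt.figure(figsize=(10, 10))

for i, flt in enumerate(first_layer):
    # reorder (channels, height, width) -> (height, width, channels) for imshow
    w = flt.data.numpy().transpose(1, 2, 0)
    # rescale the weights to [0, 1]; imshow clips float RGB values outside that range
    w = (w - w.min()) / (w.max() - w.min())
    plt.subplot(8, 8, i + 1)
    plt.axis('off')
    plt.imshow(w)

plt.show()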

Not very informative, is it?

## Saliency maps

With the next technique, we'll try to figure out which parts of the image had the biggest impact on the model's classification decision.

As the model has been trained on the ImageNet dataset ( http://www.image-net.org/ ), we'll load some sample pictures from that dataset.

As you can see, the dataset is split into 1000 classes! Wow!

In [None]:
import subprocess

def load_imagenet(download=False):
    imagenet_file = 'imagenet_val_25.npz'
    if download:
        subprocess.call(['wget', 'http://cs231n.stanford.edu/' + imagenet_file])
    # label_map is stored as a pickled dict, so allow_pickle is required
    f = np.load(imagenet_file, allow_pickle=True)
    X = f['X']
    y = f['y']
    class_names = f['label_map'].item()
    return X, y, class_names

X, y, class_names = load_imagenet()
class_names

Let's take a look at the dataset!

In [None]:
def show_imagenet(i=0):
    pic = X[i]
    name = class_names[y[i]]
    print(name)
    plt.imshow(pic)
    plt.show()

show_imagenet(20)

Once we're done gazing at the dataset, let's normalize the data, wrap it into `torch.autograd.Variable`s and get to work!

In [None]:
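# per-channel ImageNet mean and standard deviation, defined for pixel values in [0, 1]
X_mean = np.array([0.485, 0.456, 0.406])
X_std = np.array([0.229, 0.224, 0.225])

# assumption: X holds pixel values in [0, 255], so we rescale to [0, 1] before normalizing
X_var = (X / 255.0 - X_mean) / X_std
# reorder (N, H, W, C) -> (N, C, H, W), the layout PyTorch expects
X_var = X_var.transpose(0, 3, 1, 2)
X_var = Variable(torch.FloatTensor(X_var), requires_grad=True)
y_var = Variable(torch.LongTensor(y))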

Saliency maps are computed with an algorithm quite similar to backpropagation.

In backpropagation, we computed the gradients of the loss with respect to the weight matrices. In other words, we asked: how much would a change in each weight affect the loss function?

To compute saliency maps, we'll also compute a gradient. This time it will be the gradient of the score of the correct class with respect to the input image.

In other words: how much does a change in each pixel affect the output classification?
Which is precisely what we want to know!
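
As a rough illustration of this gradient, the sketch below uses a hypothetical toy "network": a single linear scorer `W` over four input "pixels" (an assumption made for the example, not part of SqueezeNet). For a linear model, the gradient of one class score with respect to the input is simply the corresponding row of `W`, which makes the result easy to check.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model: 3 classes scored linearly from 4 input "pixels".
W = torch.randn(3, 4)
x = torch.randn(4, requires_grad=True)

# Forward pass: one score per class.
scores = W @ x

# Backpropagate from the score of a single class of interest.
target = 2
scores[target].backward()

# x.grad now holds d(scores[target]) / d(x); its absolute value tells us
# how strongly each input "pixel" influences that class score.
saliency = x.grad.abs()
```

Here `x.grad` equals `W[target]`. In a deep network the gradient is no longer a fixed row of weights, but it answers the same question for the particular input image.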

In [None]:
def saliency_maps(X_var, y_var, model):
    # compute classifications
    y_pred = model(X_var)

    # create one-hot vectors based on y_var (ground-truth labels);
    # they must have the same shape as y_pred to serve as its gradient
    y_onehot = torch.zeros_like(y_pred)
    y_onehot[torch.arange(y_var.size(0)), y_var] = 1

    # compute gradients
    # y_onehot serves as an initial gradient:
    # 0s for wrong classes, 1s for the right classes
    # this way, we effectively compute only the gradient
    # of the classification strength of the right class
    y_pred.backward(y_onehot)

    # extracting the gradients with respect to inputs
    saliency = X_var.grad.data
    # to see which gradients are big, we'll consider
    # their absolute values
    saliency = saliency.abs()
    # each pixel has actually three values of gradient computed
    # - with respect to each color channel
    # we'll consider only the biggest one
    saliency, _ = torch.max(saliency, dim=1)
    return saliency
    

Let's compute the saliencies!

In [None]:
saliencies = saliency_maps(X_var, y_var, model)
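
Passing a one-hot tensor to `backward` in `saliency_maps` may look like a trick, so here is a small sketch suggesting that it is equivalent to summing the correct-class scores and backpropagating from that scalar. The names `toy_model` and `labels` and the tensor sizes are hypothetical, chosen only to keep the example tiny.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model and data: 5 samples, 4 features, 3 classes.
toy_model = torch.nn.Linear(4, 3)
labels = torch.tensor([0, 2, 1, 2, 0])

# Variant 1: backward with a one-hot tensor as the initial gradient.
x1 = torch.randn(5, 4, requires_grad=True)
scores1 = toy_model(x1)
onehot = torch.zeros_like(scores1)
onehot[torch.arange(5), labels] = 1
scores1.backward(onehot)

# Variant 2: select the correct-class scores, sum them,
# and backpropagate from the resulting scalar.
x2 = x1.detach().clone().requires_grad_(True)
scores2 = toy_model(x2)
scores2.gather(1, labels.view(-1, 1)).sum().backward()

# Both variants yield the same gradient with respect to the inputs.
same = torch.allclose(x1.grad, x2.grad)
```

Since each sample's score depends only on that sample's input, summing over the batch does not mix gradients between images.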

And finally, let's see what tips the network off about the contents of the image!

In [None]:
for i, (x, s) in enumerate(zip(X, saliencies)):
    print(class_names[y[i]])

    plt.subplot(1, 2, 1)
    plt.imshow(x)
    plt.subplot(1, 2, 2)
    plt.imshow(s.numpy(), cmap=plt.cm.hot)
    plt.show()