# Deep Learning course - LAB 8

## Feature Visualization in ConvNets

### Recap from previous Lab

* we learned to build some popular Convolutional Neural Network (CNN) architectures
* we saw how to implement various techniques of Transfer Learning on CNNs

### Agenda for today

* we will learn about two main staples of Feature Visualization (FV) in CNNs:
    * saliency maps
    * Deep Dream

## Feature Visualization

FV is one of the main tools of Explainable AI (XAI) in CNNs. The aim is to answer the following question: «*What are the (visual) features that are being looked at/searched for by the model for a given image or class?*»

The research field is wide and much work has been done, yet there is still a lot to learn and the framework is far from unified. The main references in this field are the articles published in the online journal [distill.pub](https://distill.pub), notably those by Christopher Olah and his group at OpenAI. They analyze what single neurons or convolutional filters learn in specific CNN architectures and weight configurations. It is a daunting task: you have to scrutinize individual neurons or filters in a huge CNN and analyze their responses to more than a million images. The results, though, are striking, and they let us reason about the capabilities of the network, especially with respect to **generalization**.

Below is a representation of a neuron from the ResNet-50x4 image encoder of CLIP (OpenAI's model trained contrastively on image-text pairs), built as an «artificial, optimized image that maximizes activations of the given unit.»

![](img/trump_neurons.jpg)

Image from [OpenAI Microscope](https://microscope.openai.com/models/contrastive_4x/image_block_4_5_Add_6_0/89).

Note: in the same layer, there are 2559 more neurons to analyze...
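The «optimized image» above comes from *activation maximization*: the network weights are frozen and gradient ascent is run on the pixels of an input image to increase a chosen unit's activation. The NumPy sketch below is only a toy under stated assumptions: the «unit» `w_unit` is a hypothetical linear filter over an 8x8 grayscale patch, and an L2 penalty with weight `lam` (also an assumption) keeps the optimal image bounded. Visualizations like those in OpenAI Microscope rely on considerably stronger regularization and image parametrizations.

```python
import numpy as np

# Hypothetical "unit": a single linear filter over an 8x8 grayscale patch.
rng = np.random.default_rng(0)
w_unit = rng.normal(size=(8, 8))
LAM = 0.1  # assumed weight of the L2 penalty on the image

def objective(x, lam=LAM):
    """Unit activation minus an L2 penalty that keeps the optimal image bounded."""
    return np.sum(w_unit * x) - lam * np.sum(x ** 2)

def objective_grad(x, lam=LAM):
    """Gradient of the objective w.r.t. the pixels (the filter is never updated)."""
    return w_unit - 2 * lam * x

x = rng.normal(scale=0.01, size=(8, 8))  # start from faint noise
lr = 0.5
for _ in range(200):
    x += lr * objective_grad(x)           # gradient *ascent* on the image

print(objective(x))
```

For this toy objective the optimum is known in closed form, $x^\star = w_{\text{unit}}/(2\lambda)$, so the loop converges to a rescaled copy of the filter: the maximizing image «looks like» what the unit is searching for.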

### Saliency Maps

Saliency Maps (SMs) are one of the first examples of FV for CNNs. The motivation behind them is to find which pixels of a given image were most *salient* when the image was classified into a given class $c\in \{1,\dots,C\}$.

In other words, we wish to approximate which parts of an image the CNN deemed most important for the given class $c$.

The intuition behind it is simple: pick an image $I_0$, forward-propagate it through the CNN, then backpropagate the gradient of the class score **onto the image itself** (NB: we backpropagate onto the pixels, not onto the weights, which stay fixed).
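To make «backpropagating onto the pixels» concrete, here is a framework-free NumPy sketch on a hypothetical two-layer network (random weights `W1`, `W2`, ReLU hidden layer, toy 4x4 grayscale image). The gradient of the class score is pushed back through the layers with the chain rule until it reaches the input; the weights are only read, never updated. In practice a deep-learning framework's automatic differentiation would compute this gradient for you.

```python
import numpy as np

def forward(x, W1, W2):
    """Toy 2-layer net: x -> ReLU(W1 x) -> W2 h, one unnormalized score per class."""
    z = W1 @ x              # hidden pre-activations
    h = np.maximum(z, 0)    # ReLU
    scores = W2 @ h         # class scores
    return scores, z

def input_gradient(x, W1, W2, c):
    """Backpropagate d score_c / d x onto the input; W1 and W2 are treated as constants."""
    _, z = forward(x, W1, W2)
    g_h = W2[c]             # d score_c / d h
    g_z = g_h * (z > 0)     # through the ReLU (gradient is zero where z <= 0)
    g_x = W1.T @ g_z        # d score_c / d x: one value per pixel
    return g_x

rng = np.random.default_rng(0)
h, w, hidden, n_classes = 4, 4, 8, 3
W1 = rng.normal(size=(hidden, h * w))
W2 = rng.normal(size=(n_classes, hidden))
I0 = rng.random((h, w))                 # toy grayscale image

scores, _ = forward(I0.ravel(), W1, W2)
c = int(np.argmax(scores))              # explain the predicted class
grad = input_gradient(I0.ravel(), W1, W2, c).reshape(h, w)
print(grad.shape)
```

Note that the gradient depends on $I_0$ through the ReLU mask: for a nonlinear network, the resulting map explains this specific image, not the class in general.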

Let us call $M\in\mathbb{R}^{h\times w}$ the SM, and $M_{ij}$ its value at pixel $(i, j)$.

In the case of a grayscale image $I_0\in\mathbb{R}^{h\times w}$, the SM is:

$M_{ij} = \left\vert \dfrac{\partial\, \text{score}}{\partial (I_0)_{ij}} \right\vert$

where $\text{score}$ is the model's score for class $c$ (typically the unnormalized logit, before the softmax).
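As a sketch of the grayscale formula, the function below estimates the gradient with central finite differences instead of backpropagation (a deliberate swap: it is the same quantity, but costs two forward passes per pixel, so it is only viable for tiny images) and then takes its absolute value. `toy_score` is a hypothetical stand-in for the model's class score.

```python
import numpy as np

def numerical_input_gradient(score_fn, image, eps=1e-5):
    """Central-difference estimate of d score / d image, one pixel at a time."""
    grad = np.zeros_like(image, dtype=float)
    for idx in np.ndindex(image.shape):
        bump = np.zeros_like(image, dtype=float)
        bump[idx] = eps
        grad[idx] = (score_fn(image + bump) - score_fn(image - bump)) / (2 * eps)
    return grad

def saliency_map(score_fn, image):
    """Grayscale saliency map: M_ij = |d score / d I0_ij|."""
    return np.abs(numerical_input_gradient(score_fn, image))

# Hypothetical toy "class score" of a 3x3 grayscale image: it depends strongly
# on the centre pixel and weakly (and negatively) on the top-left corner.
def toy_score(img):
    return 3.0 * img[1, 1] ** 2 - 0.5 * img[0, 0]

I0 = np.full((3, 3), 0.5)
M = saliency_map(toy_score, I0)   # the centre pixel gets the largest value
print(M)
```

The absolute value discards the sign: the top-left pixel is salient even though increasing it lowers the score.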

If we have a multi-channel (e.g. color) image $I_0\in\mathbb{R}^{h\times w\times \text{ch}}$, then the SM takes the maximum over the channels:

$M_{ij} = \max_{k\in\{1,\dots,\text{ch}\}} \left\vert \dfrac{\partial\, \text{score}}{\partial (I_0)_{ijk}} \right\vert$
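For color images only the final reduction changes. A minimal sketch, assuming a hypothetical linear model $\text{score}(I) = \sum_{ijk} W_{ijk} I_{ijk} + b$, whose gradient with respect to the image is simply $W$, so no backpropagation is needed to obtain it:

```python
import numpy as np

def color_saliency_map(grad):
    """Saliency map of a color image from its input gradient of shape (h, w, ch):
    M_ij = max over channels k of |d score / d I0_ijk|."""
    return np.abs(grad).max(axis=-1)

# Hypothetical linear "model": score(I) = sum_ijk W[i, j, k] * I[i, j, k] + b.
rng = np.random.default_rng(42)
h, w, ch = 5, 5, 3
W = rng.normal(size=(h, w, ch))
b = 0.1

def score(image):
    return float(np.sum(W * image) + b)

I0 = rng.random((h, w, ch))      # toy color image
print("score:", score(I0))
grad = W                         # exact gradient of the linear score w.r.t. I0
M = color_saliency_map(grad)     # one value per pixel, channels collapsed
print(M.shape)
```

Taking the maximum (rather than, say, the sum) marks a pixel as salient as soon as any one of its channels strongly affects the score.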

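We may also define **positive saliency maps** and **negative saliency maps**. These must be computed from the signed gradient $G_{ij} = \partial\, \text{score} / \partial (I_0)_{ij}$ rather than from its absolute value, since $M_{ij}\geq 0$ would make the negative map identically zero:

$M_{ij}^{(+)} = \max\{G_{ij}, 0\};~~M_{ij}^{(-)} = - \min\{G_{ij}, 0\}$

For color images, apply the same formulas to each channel $G_{ijk}$ and then take the maximum over $k$.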
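A sketch of the positive/negative split, computed from a signed gradient `G` (here a hand-written toy array standing in for $\partial\,\text{score}/\partial I_0$). The positive map highlights pixels whose increase would raise the class score; the negative map highlights those whose increase would lower it.

```python
import numpy as np

def signed_saliency_maps(grad):
    """Positive and negative saliency maps from the signed input gradient.

    grad: d score / d I0, shape (h, w) for grayscale or (h, w, ch) for color.
    Returns (M_pos, M_neg), both of shape (h, w) and non-negative.
    """
    pos = np.where(grad > 0, grad, 0.0)    # max{G, 0}
    neg = np.where(grad < 0, -grad, 0.0)   # -min{G, 0}
    if grad.ndim == 3:                     # color: collapse channels with a max
        pos, neg = pos.max(axis=-1), neg.max(axis=-1)
    return pos, neg

G = np.array([[0.8, -0.2],
              [-1.5, 0.0]])                # toy signed gradient of a 2x2 grayscale image
M_pos, M_neg = signed_saliency_maps(G)
print(M_pos)
print(M_neg)
```

In the grayscale case $M^{(+)} - M^{(-)} = G$ and $M^{(+)} + M^{(-)} = \vert G\vert$. With color images the two maxima are taken separately, so the same pixel can appear in both maps if its channels pull in opposite directions.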