# <a id='toc1_'></a>[Explainable Machine Learning](#toc0_)

**Skills**
1. Implement local explainable techniques like LIME, SHAP, and ICE plots using Python.
2. Implement global explainable techniques such as [Partial Dependence Plots](https://scikit-learn.org/stable/modules/partial_dependence.html) (PDP) and [Accumulated Local Effects](https://en.wikipedia.org/wiki/Accumulated_local_effects) (ALE) plots in Python.
3. Apply example-based explanation techniques to explain machine learning models using Python.
4. Visualize and explain neural network models using SOTA techniques in Python.
5. Critically evaluate interpretable attention and [saliency](https://en.wikipedia.org/wiki/Saliency_map) methods for transformer model explanations.
6. Explore emerging approaches to explainability for large language models (LLMs) and generative computer vision models.

---

**Table of contents**<a id='toc0_'></a>    
- [Explainable Machine Learning](#toc1_)    
- [Module 2️⃣](#toc2_)    
  - [Visualizing NN Predictions](#toc2_1_)    
    - [Feature Visualization](#toc2_1_1_)    
      - [Pros & Cons](#toc2_1_1_1_)    
    - [Feature Attribution](#toc2_1_2_)    
      - [Vanilla Gradient](#toc2_1_2_1_)    
        - [Process](#toc2_1_2_1_1_)    
      - [Grad-CAM](#toc2_1_2_2_)    
        - [Process](#toc2_1_2_2_1_)    
      - [Pros & Cons](#toc2_1_2_3_)    
  - [Explaining NNs](#toc2_2_)    
    - [Network Dissection, 2017](#toc2_2_1_)    
      - [Implementation](#toc2_2_1_1_)    
      - [Pros & Cons](#toc2_2_1_2_)    
    - [Concept Activation Vectors, 2018](#toc2_2_2_)    
      - [Pros & Cons](#toc2_2_2_1_)    
  - [Explainable Attention](#toc2_3_)    
    - [A Review of Attention](#toc2_3_1_)    
      - [Review Embeddings](#toc2_3_1_1_)    
      - [Self Attention](#toc2_3_1_2_)    
      - [Self-Attention vs Cross Attention](#toc2_3_1_3_)    
    - [Visualizing Attention](#toc2_3_2_)    
    - [Saliency Methods, 2020, as Alternatives](#toc2_3_3_)    
      - [Process](#toc2_3_3_1_)    
    - [Layer-wise Relevance Propagation, 2019](#toc2_3_4_)    
      - [Process](#toc2_3_4_1_)    
    - [Occlusion-based Saliency, 2017](#toc2_3_5_)    
- [Resources](#toc3_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc2_'></a>[Module 2️⃣](#toc0_)

## <a id='toc2_1_'></a>[Visualizing NN Predictions](#toc0_)

### <a id='toc2_1_1_'></a>[Feature Visualization](#toc0_)
> Process of making learned features in a NN explicit.
>
> Answers the question: *what does this neuron, channel or layer see?*

With feature visualization we are trying to maximize the activation of a neuron $h$. 

$$
    img^{*} = \arg\max_{img} \underbrace{h_{n,x,y,z}}_{\text{activation of a neuron}} (\overbrace{img}^{\text{input}})
$$
- $x,y=$ spatial position of neuron
- $n=$ layer
- $z=$ channel index

For the activation of an entire channel $z$ in layer $n$, sum over all spatial positions (the sum and the mean have the same maximizer): 
$$
    img^{*} = \arg\max_{img} \sum_{x,y}h_{n,x,y,z} (img)
$$

Note that minimizing an activation is the same as maximizing its negative: `minimize(h) = maximize(-h)`. 
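The optimization above is usually carried out by gradient ascent on the input image. The following is a minimal sketch, not a real feature visualization pipeline: the "neuron" is a hypothetical toy function (a linear response `w @ img` with an assumed L2 penalty so the optimum stays bounded), and its gradient is written by hand instead of coming from an autodiff framework.

```python
import numpy as np

def neuron_activation(img, w):
    """Toy 'neuron' (assumption): linear response minus an L2 penalty on the image."""
    return float(w @ img - 0.5 * img @ img)

def activation_gradient(img, w):
    """Analytic gradient of neuron_activation with respect to img."""
    return w - img

def maximize_activation(w, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the input: img <- img + lr * d(activation)/d(img)."""
    rng = np.random.default_rng(seed)
    img = rng.normal(scale=0.01, size=w.shape)  # start from near-zero noise
    for _ in range(steps):
        img = img + lr * activation_gradient(img, w)
    return img

w = np.array([1.0, -2.0, 0.5])  # hypothetical neuron weights
best_img = maximize_activation(w)
print(best_img, neuron_activation(best_img, w))
```

To minimize the activation instead, the same loop can be run on the negated objective, i.e. with the sign of the gradient step flipped.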

#### <a id='toc2_1_1_1_'></a>[Pros & Cons](#toc0_)
| Pros | Cons |
|------|------|
| Unique insight into how NNs work | Many feature visualization images contain some features with no human interpretation | 
| Communicate in a non-technical way how NNs work | For large NNs, difficult to visualize complete network | 
| | Is not a complete picture of interactions | 

### <a id='toc2_1_2_'></a>[Feature Attribution](#toc0_)
> Indicate how much each feature in your model contributes to a prediction for an instance.

**Gradient-based Methods**
- Vanilla Gradient (Saliency Maps)
- DeconvNet
- Grad-CAM
- Guided Grad-CAM
- SmoothGrad

#### <a id='toc2_1_2_1_'></a>[Vanilla Gradient](#toc0_)
> $\textcolor{#ba4e00}{\textbf{Vanilla Gradient (Saliency Maps)}}$ provides pixel-level importance.
>
> Calculate the gradient of the loss function for the class of interest wrt the input pixels.

##### <a id='toc2_1_2_1_1_'></a>[Process](#toc0_)
1. Forward pass image of interest
2. Compute the gradient of the class score of interest wrt the input pixels. 
   - set all other classes to zero.
3. Visualize the gradients
   - show absolute values or highlight negative/positive contributions.
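The three steps can be sketched on a tiny model. This is only an illustration under stated assumptions: the network is a hypothetical two-layer ReLU MLP with random weights, the "image" is a flat vector of 6 pixels, and the backward pass is written by hand rather than with an autodiff library.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Forward pass; returns hidden pre-activations and raw class scores."""
    z = W1 @ x + b1
    h = np.maximum(z, 0.0)  # ReLU
    scores = W2 @ h + b2
    return z, scores

def vanilla_gradient(x, target, W1, b1, W2, b2):
    """Gradient of the target class score with respect to the input pixels."""
    z, _ = forward(x, W1, b1, W2, b2)
    one_hot = np.zeros(W2.shape[0])
    one_hot[target] = 1.0          # all other classes are set to zero
    grad_h = W2.T @ one_hot        # backprop through the output layer
    grad_z = grad_h * (z > 0)      # backprop through ReLU
    return W1.T @ grad_z           # backprop to the input

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 6)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=6)             # hypothetical flattened image
target = 1

grad = vanilla_gradient(x, target, W1, b1, W2, b2)
saliency = np.abs(grad)            # visualize absolute values
print(saliency)
```

Keeping the signed `grad` instead of `saliency` would highlight negative and positive contributions separately.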

#### <a id='toc2_1_2_2_'></a>[Grad-CAM](#toc0_)
> $\textcolor{#ba4e00}{\textbf{Grad-CAM}}$ provides region-level importance.
>
> = gradient weighted class activation map
>
> Analyzes which regions are activated in the feature maps of the last convolutional layers for a certain classification.

**Primary goal** of GradCAM is to understand at which parts of an image a convolutional layer looks for a certain classification.
- Does this by analyzing which regions are activated in the feature maps of the last convolutional layers.

##### <a id='toc2_1_2_2_1_'></a>[Process](#toc0_)
1. Forward propagate the input image through the CNN.
2. Obtain the raw score for the class of interest. 
   - This is the activation of the neuron before the softmax layer.
   - Set all other class activations to zero.
3. Backpropagate the gradient of the class of interest to the last convolutional layer before the fully connected layers. 
4. Weight each feature map "pixel" by the gradient for the class.
5. Calculate an average of the feature maps, weighted per pixel by the gradient, then apply ReLU to the averaged feature map.
6. To visualize, scale the values to the interval between 0 and 1, upscale the image, and overlay it on the original image.
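Steps 4 to 6 can be sketched with plain arrays. The sketch assumes the feature maps of the last convolutional layer and the gradients of the class score with respect to them have already been extracted (the arrays `feature_maps` and `gradients` below are hypothetical toy values). It uses the common formulation with one weight per feature map (the spatial mean of its gradient), divides by the maximum to scale to $[0, 1]$, and uses nearest-neighbour upscaling; real implementations typically use bilinear interpolation.

```python
import numpy as np

def grad_cam(feature_maps, gradients, out_size):
    """feature_maps, gradients: arrays of shape (K, H, W). Returns a heatmap of shape out_size."""
    weights = gradients.mean(axis=(1, 2))               # one weight per feature map
    cam = np.tensordot(weights, feature_maps, axes=1)   # weighted sum over maps -> (H, W)
    cam = np.maximum(cam, 0.0)                          # ReLU keeps positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                           # scale to [0, 1]
    # Nearest-neighbour upscaling to the input image resolution.
    rows = np.arange(out_size[0]) * cam.shape[0] // out_size[0]
    cols = np.arange(out_size[1]) * cam.shape[1] // out_size[1]
    return cam[np.ix_(rows, cols)]

# Hypothetical toy values: two 2x2 feature maps and their gradients.
feature_maps = np.array([[[1.0, 0.0], [0.0, 0.0]],
                         [[0.0, 0.0], [0.0, 1.0]]])
gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])

heatmap = grad_cam(feature_maps, gradients, out_size=(4, 4))
print(heatmap)
```

The resulting `heatmap` would then be blended over the original image, for example with a colormap and partial transparency.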

#### <a id='toc2_1_2_3_'></a>[Pros & Cons](#toc0_)

| Pros | Cons |
|------|------|
| Visuals provide easily understandable explanations | Difficult to evaluate explanation (how do we know it is correct?) |
| Faster computation than methods like LIME or SHAP | Results can be unreliable | 
| Many methods to choose from | | 

## <a id='toc2_2_'></a>[Explaining NNs](#toc0_)

### <a id='toc2_2_1_'></a>[Network Dissection, 2017](#toc0_)
> Links human concepts with individual NN units.

- **Key hypothesis**: do convolutional neural networks learn disentangled features?
- $\textcolor{#ba4e00}{\textbf{Disentangled features}}$ means that individual network units detect specific real-world concepts.

#### <a id='toc2_2_1_1_'></a>[Implementation](#toc0_)
1. Get images with human labeled visual concepts. 
   - These could be pixel-wise labeled images with concepts of different abstraction levels. 
2. Measure CNN channel activations for images.
3. Get the alignment of activations and labeled concepts.
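One way to quantify the alignment in step 3 is intersection over union (IoU) between the strongly activated region of a channel and the labeled concept mask. The sketch below assumes the activation maps have already been upsampled to the mask resolution; the function name, the toy arrays, and the quantile value used in the example are illustrative assumptions.

```python
import numpy as np

def concept_iou(activation_maps, concept_masks, quantile=0.995):
    """activation_maps: (N, H, W) floats for one channel; concept_masks: (N, H, W) booleans."""
    threshold = np.quantile(activation_maps, quantile)  # keep only the top activations
    active = activation_maps > threshold
    intersection = np.logical_and(active, concept_masks).sum()
    union = np.logical_or(active, concept_masks).sum()
    return intersection / union if union > 0 else 0.0

# Hypothetical toy data: one 4x4 activation map that fires in the top-left corner.
acts = np.zeros((1, 4, 4))
acts[0, :2, :2] = 10.0

mask_same = np.zeros((1, 4, 4), dtype=bool)
mask_same[0, :2, :2] = True          # concept covers exactly the active region

mask_row = np.zeros((1, 4, 4), dtype=bool)
mask_row[0, 0, :] = True             # concept covers the top row only

iou_same = concept_iou(acts, mask_same, quantile=0.5)
iou_row = concept_iou(acts, mask_row, quantile=0.5)
print(iou_same, iou_row)
```

A unit would be reported as a detector for the concept whose IoU is highest, provided it exceeds some chosen cutoff.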

#### <a id='toc2_2_1_2_'></a>[Pros & Cons](#toc0_)
| Pros | Cons |
|------|------|
| Expands upon insights from feature visualization | You need datasets that are labeled on the pixel level with the concepts (this takes a lot of effort to collect!) | 
| Communicate in a non-technical way how NNs work | Many units respond to the same concept and some to no concept at all | 
| Links units to concepts | Only aligns human concepts with positive activations (not with negative activations of channels) | 
| Detect concepts beyond the classes in the classification task | | 

### <a id='toc2_2_2_'></a>[Concept Activation Vectors, 2018](#toc0_)
> A numerical representation of a concept in the activation space of a NN layer.

$\textcolor{#ba4e00}{\textbf{TCAV}}$: For any given concept, TCAV measures the extent of that concept's influence on the model's prediction for a certain class.
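A rough numerical sketch of the idea follows. It makes two loud simplifications: the concept activation vector is taken as the normalized difference of mean activations between concept examples and random examples (TCAV itself trains a linear classifier and uses the normal of its decision boundary), and the gradients of the class logit with respect to the layer activations are assumed to be given. All arrays are hypothetical toy values.

```python
import numpy as np

def concept_activation_vector(concept_acts, random_acts):
    """Stand-in CAV (assumption): unit vector from the random-set mean to the concept-set mean."""
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def tcav_score(class_logit_grads, cav):
    """Fraction of class examples whose logit increases when moving along the CAV."""
    directional_derivs = class_logit_grads @ cav
    return float(np.mean(directional_derivs > 0))

# Hypothetical layer activations (2 units) for concept and random images.
concept_acts = np.array([[2.0, 0.0], [3.0, 0.0]])
random_acts = np.array([[0.0, 0.0], [1.0, 0.0]])
cav = concept_activation_vector(concept_acts, random_acts)

# Hypothetical gradients of the class logit wrt the layer, one row per class example.
class_logit_grads = np.array([[1.0, 1.0], [2.0, -1.0], [-1.0, 0.0], [0.5, 3.0]])
score = tcav_score(class_logit_grads, cav)
print(cav, score)
```

A score near 1 suggests the concept pushes predictions toward the class for most examples, while a score near 0.5 suggests no consistent influence.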

#### <a id='toc2_2_2_1_'></a>[Pros & Cons](#toc0_)
| Pros | Cons |
|------|------|
| Customizable via concept dataset curation | Performs poorly on shallow NNs (concepts in deeper layers are more separable) | 
| Provides global explanations | Requires additional annotations to the dataset (costly) | 
| | Mostly used in images only |

## <a id='toc2_3_'></a>[Explainable Attention](#toc0_)

### <a id='toc2_3_1_'></a>[A Review of Attention](#toc0_)
#### <a id='toc2_3_1_1_'></a>[Review Embeddings](#toc0_)
- We want to represent a word as a fixed length vector
  - word $\rightarrow [1,4,5,62,2,33]$

> $\textcolor{#ba4e00}{\textbf{Embeddings}}$ are a method of converting textual information into vectors of real numbers, capturing semantic and syntactic aspects of the data.
>
> - Acts as a compact representation of the original data, capturing its essential aspects. 

#### <a id='toc2_3_1_2_'></a>[Self Attention](#toc0_)

<img src="imgs/attention_1.png" alt="Attention-1" width="600" height="200">

> The goal of self attention is to improve the original embeddings (vector embeddings $v_1, v_2, v_3, v_4$) with context. 
- We would ideally like our output to be new representations that are better than the original representations

To get the new, better representations, we compute scores $s_{ij}$ by taking the dot product of each vector $v_i$ with every vector $v_j$ (including itself).
- We then normalize the scores $s_{ij}$ to obtain weights $w_{ij}$, such that for each $i$ the weights $w_{ij}$ sum to $1$.
  - weights $w_{ij}=\text{normalized scores}$

Then reweight all the vectors. 
<img src="imgs/attention_2.png" alt="Attention-2" width="600" height="200">

We do this for each word in our sequence. 
<img src="imgs/attention_3.png" alt="Attention-3" width="600" height="200">

<img src="imgs/attention_4.png" alt="Attention-4" width="600" height="200">
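The parameter-free version described so far fits in a few lines. This sketch assumes softmax as the normalization (the notes only require that the weights sum to 1), and the four 2-dimensional embeddings are made-up values.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def simple_self_attention(V):
    """V: (sequence_length, embedding_dim). No trainable parameters involved."""
    scores = V @ V.T             # s_ij = v_i . v_j
    weights = softmax(scores)    # w_ij, each row sums to 1
    context = weights @ V        # y_i = sum_j w_ij * v_j
    return context, weights

# Hypothetical embeddings v_1..v_4.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])
context, weights = simple_self_attention(V)
print(weights.round(2))
print(context.round(2))
```

Each row of `context` is a weighted mix of all input vectors, which is how the new representation picks up context from the rest of the sequence.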

Until now, no weights are being trained. We need to **introduce trainable parameters**:

<img src="imgs/attention_key_query_values.png" alt="Attention-Key-Query-Values" width="600" height="200">

Now we have Key, Query and Value matrices. These are our trainable weights. The matrices have dimension $k \times k$.
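With the three trainable matrices, the same computation might look like the sketch below. The matrices are randomly initialized here (in a real model they are learned), and the division by $\sqrt{k}$ is an assumption taken from the standard scaled dot-product formulation rather than from these notes.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (sequence_length, k); W_q, W_k, W_v: (k, k) trainable matrices."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot-product scores (scaling is an assumption)
    weights = softmax(scores)                 # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
k = 3
X = rng.normal(size=(4, k))                   # hypothetical embeddings for 4 tokens
W_q, W_k, W_v = (rng.normal(size=(k, k)) for _ in range(3))

output, attn_weights = self_attention(X, W_q, W_k, W_v)
print(output.shape, attn_weights.shape)
```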

<img src="imgs/attention_key_query_value_matrix.png" alt="Attention-Key-Query-Value-Matrix" width="600" height="200">

<img src="imgs/self_attention_block.png" alt="Self-Attention-Block" width="600" height="200">

> **TLDR**: $\textcolor{#ba4e00}{\textbf{Self Attention}}$ is the process of **adding more context**.

But do we have enough attention? 

> $\textcolor{#ba4e00}{\textbf{Multi-Head Attention}}$: we parallelize attention mechanisms by having multiple heads $h$.

<img src="imgs/multi_head_attention.png" alt="Multi-Head-Attention" width="600" height="200">

#### <a id='toc2_3_1_3_'></a>[Self-Attention vs Cross Attention](#toc0_)
- Self Attention operates within a single sequence. 
- $\textcolor{#ba4e00}{\textbf{Cross Attention}}$ is used between two different sequences. 
- How Cross Attention works: 
  - For each element in one sequence (query sequence), cross-attention computes attention scores based on its relationship with every element in the other sequence (key-value sequence).
  - This mechanism enables the model to selectively focus on relevant parts of the other sequence when generating an output. 
- Cross-attention is critical for tasks that involve understanding how elements from different sources relate to one another.
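The only structural difference from self-attention is where the queries and the keys/values come from. A minimal sketch, with random hypothetical inputs and projection matrices and the same assumed $\sqrt{k}$ scaling used in standard transformers:

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_attention(query_seq, kv_seq, W_q, W_k, W_v):
    """Queries come from query_seq; keys and values come from kv_seq."""
    Q = query_seq @ W_q
    K, V = kv_seq @ W_k, kv_seq @ W_v
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # shape: (len(query_seq), len(kv_seq))
    return weights @ V, weights

rng = np.random.default_rng(1)
k = 3
query_seq = rng.normal(size=(2, k))   # e.g. 2 target-side tokens (hypothetical)
kv_seq = rng.normal(size=(5, k))      # e.g. 5 source-side tokens (hypothetical)
W_q, W_k, W_v = (rng.normal(size=(k, k)) for _ in range(3))

cross_out, cross_weights = cross_attention(query_seq, kv_seq, W_q, W_k, W_v)
print(cross_out.shape, cross_weights.shape)
```

The attention matrix is rectangular here: one row per query element and one column per element of the other sequence.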

### <a id='toc2_3_2_'></a>[Visualizing Attention](#toc0_)

**BertViz**, 2019
- Visualizing attention weights illuminates one type of architecture within the model but **does not necessarily provide a direct explanation for predictions**

### <a id='toc2_3_3_'></a>[Saliency Methods, 2020, as Alternatives](#toc0_)
> Input $\textcolor{#ba4e00}{\textbf{Saliency Methods}}$ reveal why one particular model prediction was made in terms of how relevant each input word was to that prediction.
> 
> - to understand how the input text influences output predictions more directly.

$\textcolor{#ba4e00}{\textbf{Integrated Gradients}}=$ The path integral of the gradients along the straight-line path from the baseline $x'$ to the input $x$.
- We consider the straight-line path from the baseline $x'$ to the input $x$ and compute the gradients at all points along the path. 
- Integrated gradients are obtained by accumulating these gradients.

#### <a id='toc2_3_3_1_'></a>[Process](#toc0_)
1. Choose a baseline
   - Select a baseline input that represents an absence of features. 
   - E.g., if the input is an image, the baseline could be an image with all pixels set to zero. 
2. Generate interpolated points
   - Create a series of interpolated inputs between the baseline $x'$ and the actual input $x$. This can be done using a linear path.
3. Compute your gradients 
   - For each interpolated input $x_i$
   - Compute the gradient of the model's output wrt the input. This tells us how much each input feature at $x_i$ affects the output. 
4. Average the gradients. 
   - Average the gradients over all the interpolated inputs.
   - This average gradient represents the contribution of each feature along the path from the baseline to the actual input.
5. Scale by the input difference. 
   - Multiply each average gradient by the difference between the actual input and the baseline. This scales the gradient contributions to reflect the actual input features. 
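The five steps can be sketched as follows. The model is a hypothetical toy function $f(x) = \sum_i x_i^2$ with a hand-written gradient, the baseline is all zeros, and the interpolation points are taken at the midpoints of equal sub-intervals (one of several reasonable choices for approximating the integral).

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients along the straight line from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps              # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)     # step 2: interpolated inputs
    grads = np.stack([grad_fn(point) for point in path])   # step 3: gradient at each point
    avg_grad = grads.mean(axis=0)                          # step 4: average the gradients
    return (x - baseline) * avg_grad                       # step 5: scale by input difference

def model(x):
    """Hypothetical toy model: f(x) = sum of squares."""
    return float(np.sum(x ** 2))

def model_grad(x):
    """Analytic gradient of the toy model."""
    return 2.0 * x

x = np.array([1.0, -2.0, 3.0])
baseline = np.zeros_like(x)                                # step 1: 'absence of features'
attributions = integrated_gradients(model_grad, x, baseline)
print(attributions, attributions.sum(), model(x) - model(baseline))
```

A useful sanity check is that the attributions should sum (approximately) to the difference between the model output at the input and at the baseline.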

### <a id='toc2_3_4_'></a>[Layer-wise Relevance Propagation, 2019](#toc0_)
> Decompose the prediction of a NN computed over a sample down to relevance scores for the single input dimensions of the sample. 

#### <a id='toc2_3_4_1_'></a>[Process](#toc0_)
1. Start with forward pass to obtain output
2. Run a custom backward pass (at each layer, redistribute the incoming relevance among the inputs of that layer)
3. Keep redistributing relevance until the input layer is reached
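A small sketch of such a backward pass is shown below. It uses one particular redistribution rule (the epsilon-stabilized rule, an assumption since the notes do not name a rule) on a hypothetical bias-free two-layer ReLU network with hand-picked weights.

```python
import numpy as np

def lrp_dense(a, W, R_out, eps=1e-9):
    """Redistribute relevance R_out of a dense layer (z = W @ a) onto its inputs a."""
    z = W @ a
    z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer avoids division by zero
    s = R_out / z
    return a * (W.T @ s)                        # each input gets its share of every output

def lrp_two_layer(x, W1, W2, target):
    # 1. Forward pass to obtain the output.
    h = np.maximum(W1 @ x, 0.0)
    out = W2 @ h
    # 2. Start with the relevance of the target output only.
    R_out = np.zeros_like(out)
    R_out[target] = out[target]
    # 3. Redistribute layer by layer until the input is reached.
    R_hidden = lrp_dense(h, W2, R_out)
    R_input = lrp_dense(x, W1, R_hidden)
    return R_input, out

# Hypothetical weights and input.
W1 = np.array([[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]])
W2 = np.array([[1.0, 1.0, -1.0], [0.5, -1.0, 2.0]])
x = np.array([1.0, 2.0])

R_input, out = lrp_two_layer(x, W1, W2, target=0)
print(R_input, R_input.sum(), out[0])
```

Because the toy network has no biases, the input relevances sum to the target output score, which illustrates the conservation idea behind the method.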

### <a id='toc2_3_5_'></a>[Occlusion-based Saliency, 2017](#toc0_)
> Compute input saliency by occluding (erasing) input features and measuring effects on the model.
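For text, this could be sketched as replacing one token at a time with a mask token and recording how much the model score drops. The scorer below is a hypothetical stand-in for a real model, and the `[MASK]` string is an assumed placeholder.

```python
def occlusion_saliency(tokens, score_fn, mask_token="[MASK]"):
    """Saliency of token i = score on the full input minus score with token i occluded."""
    base_score = score_fn(tokens)
    saliency = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask_token] + tokens[i + 1:]
        saliency.append(base_score - score_fn(occluded))
    return saliency

# Hypothetical toy 'sentiment model': sums fixed word scores.
POSITIVE_WORDS = {"great": 2.0, "good": 1.0}

def toy_score(tokens):
    return sum(POSITIVE_WORDS.get(token, 0.0) for token in tokens)

tokens = ["the", "movie", "was", "great"]
token_saliency = occlusion_saliency(tokens, toy_score)
print(list(zip(tokens, token_saliency)))
```

The cost is one extra model call per occluded feature, which is the main practical drawback compared with gradient-based saliency.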

# <a id='toc3_'></a>[Resources](#toc0_)

- [CNN Visualization](https://adamharley.com/nn_vis/cnn/3d.html)
- Paper: [Google Feature Visualization Interactive](https://distill.pub/2017/feature-visualization/)
- Paper: [2020, The elephant in the interpretability room: Why use attention as explanation when we have saliency methods](https://arxiv.org/pdf/2010.05607)