# Lecture 17 - Model Analysis and Explanation

provided by [Stanford CS224N](https://www.youtube.com/watch?v=rmVRLeJRkl4)

---

<div class="alert alert-block alert-info">
Table of Contents: <br>
    
<ul>
    <li>1. <a href="#1.-Introduction">Introduction</a></li>
    <li>2. <a href="#2.-Motivation">Motivation</a></li>
    <li>3. <a href="#3.-Levels-of-Abstraction">Levels of Abstraction</a></li>
    <li>4. <a href="#4.-Out-of-Domain-Evaluation-Sets">Out-of-Domain Evaluation Sets</a></li>
    <li>5. <a href="#5.-Influence-Studies-and-Adversarial-Examples">Influence Studies and Adversarial Examples</a></li>
    <li>6. <a href="#6.-Analyzing-Representations">Analyzing Representations</a></li>
    <li>7. <a href="#7.-Revisiting-Model-Ablations-as-Analysis">Revisiting Model Ablations as Analysis</a></li>
    <li>8. <a href="#8.-Resource">Resource</a></li>
</ul>
</div>

# 1. Introduction

Today's lecture covers:
* motivation
* analyzing one model at different levels of abstraction
* out-of-domain evaluation
* adversarial examples and influence studies
* analyzing internal/hidden representations
* revisiting model ablations as analysis

# 2. Motivation

Models contain biases, and we have very little information about how or why they make certain decisions.

# 3. Levels of Abstraction

At the surface level, we can view our LMs as just probability distributions. We can then go deeper and analyze the architecture, and deeper still to analyze each individual layer and component.

# 4. Out-of-Domain Evaluation Sets

We can evaluate a model by looking at its behavior. Here we are not interested in which mechanisms it uses, only in how it behaves in certain situations. The following example considers this in the context of natural language inference (specifically, the entailment task).

![image.png](attachment:image.png) <br>
_Figure 1. HANS._

The __Heuristic Analysis for NLI Systems (HANS)__ evaluation set tests whether a model relies on shallow syntactic heuristics. Here the task is to classify whether one sentence entails another.

![image.png](attachment:image.png) <br>
_Figure 2. HANS evaluation._
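
To make the idea of a syntactic heuristic concrete, here is a minimal sketch of a lexical-overlap heuristic: it predicts entailment whenever every word of the hypothesis also appears in the premise. The function names and the two sentence pairs are illustrative assumptions written in the style of HANS, not items loaded from the dataset.

```python
import string


def tokens(sentence):
    """Lowercase a sentence and split it into words, dropping punctuation."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


def lexical_overlap_heuristic(premise, hypothesis):
    """Predict 'entailment' whenever every hypothesis word occurs in the premise."""
    premise_words = set(tokens(premise))
    if all(word in premise_words for word in tokens(hypothesis)):
        return "entailment"
    return "non-entailment"


# Hand-written, HANS-style pairs (hypothetical examples, not dataset rows).
examples = [
    ("The doctor and the actor danced.", "The doctor danced.", "entailment"),
    ("The doctor was paid by the actor.", "The doctor paid the actor.", "non-entailment"),
]

for premise, hypothesis, gold in examples:
    predicted = lexical_overlap_heuristic(premise, hypothesis)
    print(f"correct={predicted == gold} predicted={predicted} gold={gold}")
```

The heuristic is right on the first pair and wrong on the second, which is the kind of case an out-of-domain set like HANS is built to expose.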

HANS tests syntactic heuristics. Another syntactic phenomenon we can test is whether present-tense verbs agree in number with their subjects.

![image.png](attachment:image.png) <br>
_Figure 3. English present-tense verbs agree in number with their subjects._

This type of test is characterized by __attractors__: intervening words like "pizzas" that can lead the model to believe the following verb needs to be plural.

![image.png](attachment:image.png) <br>
_Figure 4. Attractors._

Testing for syntactic heuristics is done by engineering a specific test set and seeing how the model _behaves_ when given this test set.
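
A behavioral agreement test of this kind could be sketched as follows. The evaluation only needs a scoring function for "how much does the model like this verb after this prefix"; `recency_scorer` below is a hypothetical stand-in for a real language model that simply agrees with the most recent noun, and the word lists and test items are made up for illustration.

```python
def agreement_accuracy(next_word_score, items):
    """Fraction of items where the correct verb form outscores the incorrect one."""
    correct = 0
    for prefix, good_verb, bad_verb in items:
        if next_word_score(prefix, good_verb) > next_word_score(prefix, bad_verb):
            correct += 1
    return correct / len(items)


# Tiny hand-written lexicon (assumption, for the toy scorer only).
PLURAL_NOUNS = {"pizzas", "chefs", "keys"}
SINGULAR_NOUNS = {"pizza", "chef", "key", "cabinet"}


def recency_scorer(prefix, verb):
    """Toy stand-in for a language model: agree with the most recent noun."""
    for word in reversed(prefix.lower().split()):
        if word in PLURAL_NOUNS:
            return 1.0 if verb == "are" else 0.0
        if word in SINGULAR_NOUNS:
            return 1.0 if verb == "is" else 0.0
    return 0.5


no_attractor = [("The chef", "is", "are"), ("The chefs", "are", "is")]
with_attractor = [
    ("The chef who made the pizzas", "is", "are"),
    ("The keys to the cabinet", "are", "is"),
]

print(agreement_accuracy(recency_scorer, no_attractor))    # 1.0
print(agreement_accuracy(recency_scorer, with_attractor))  # 0.0
```

With a real model, `next_word_score` would return the log-probability of the verb given the prefix; the test set and the accuracy computation stay the same.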

![image.png](attachment:image.png) <br>
_Figure 5. Minimum functionality tests._

__Minimum functionality tests__ aim to reveal a specific behavior in the model and nothing else. In Figure 5, we see how a certain syntax may lead to failure.

![image.png](attachment:image.png) <br>
_Figure 6. Closed-book behavioral study method to analyze model's world knowledge._

# 5. Influence Studies and Adversarial Examples

Here we take a look at a few methods and studies that aimed to explore why models behaved the way they do (their mechanisms).

![image.png](attachment:image.png) <br>
_Figure 7. Memory influence study._

One study analyzed whether a model actually uses long-range context. The authors perturbed the words more than $k$ words away from the current word (shuffling them, reversing them, or replacing them with a random sequence) while varying $k$. They found that the effective memory isn't very large (from the chart, it appears to be roughly 20 words or fewer). So the context more than roughly 20 words away from the current word can be replaced with random words with little effect on the model's predictions.
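
The perturbation step of such a study might look like the sketch below. The function name, the `mode` strings, and the `vocab` argument are assumptions made for illustration; the study's own implementation may differ.

```python
import random


def perturb_distant_context(tokens, k, mode, vocab=None, seed=0):
    """Perturb every token more than k positions before the end; keep the last k intact."""
    rng = random.Random(seed)
    split = max(len(tokens) - k, 0)
    distant, recent = list(tokens[:split]), list(tokens[split:])
    if mode == "shuffle":
        rng.shuffle(distant)
    elif mode == "reverse":
        distant.reverse()
    elif mode == "random":
        # Replace each distant token with a token drawn uniformly from `vocab`.
        distant = [rng.choice(vocab) for _ in distant]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return distant + recent


context = "the cat sat on the mat and looked at the dog".split()
for mode in ("shuffle", "reverse", "random"):
    print(mode, perturb_distant_context(context, k=3, mode=mode, vocab=["x", "y", "z"]))
```

To run the influence study, one would feed each perturbed context to the model, measure the loss on the next word, and compare it to the loss with the unperturbed context as $k$ varies.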

![image.png](attachment:image.png) <br>
_Figure 8. Saliency maps._

Another way to analyze mechanisms is to ask which parts of the input the model relies on to make a prediction, for example with a __saliency map__, which assigns each input word an importance score.

![image.png](attachment:image.png) <br>
_Figure 9. Simple gradient method for saliency maps._
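
As a rough sketch of the simple gradient method, the saliency of a word can be taken as the norm of the gradient of the model's score with respect to that word's embedding. Everything below is a toy assumption: the 2-d embeddings, the weights, and the log-sum-exp scorer are invented, and the gradient is approximated with central finite differences only to keep the example dependency-free (a real implementation would use automatic differentiation).

```python
import math


def numerical_gradient(score_fn, embeddings, i, eps=1e-5):
    """Central-difference estimate of d score / d embeddings[i]."""
    grad = []
    for d in range(len(embeddings[i])):
        plus = [list(e) for e in embeddings]
        minus = [list(e) for e in embeddings]
        plus[i][d] += eps
        minus[i][d] -= eps
        grad.append((score_fn(plus) - score_fn(minus)) / (2 * eps))
    return grad


def simple_gradient_saliency(score_fn, embeddings):
    """Saliency of token i = L2 norm of the score's gradient w.r.t. its embedding."""
    return [
        math.sqrt(sum(g * g for g in numerical_gradient(score_fn, embeddings, i)))
        for i in range(len(embeddings))
    ]


# Toy setup (all values hypothetical).
EMBEDDINGS = {"the": [0.0, 0.1], "movie": [0.1, 0.0], "was": [0.0, 0.0], "great": [2.0, 0.5]}
WEIGHTS = [1.5, 0.5]


def toy_score(embeddings):
    """Soft maximum (log-sum-exp) over the per-token scores w . e_i."""
    per_token = [sum(w * x for w, x in zip(WEIGHTS, e)) for e in embeddings]
    return math.log(sum(math.exp(s) for s in per_token))


words = ["the", "movie", "was", "great"]
saliency = simple_gradient_saliency(toy_score, [EMBEDDINGS[w] for w in words])
for word, value in zip(words, saliency):
    print(f"{word:>6}: {value:.3f}")
```

In this toy model the word "great" receives the largest saliency, since it dominates the soft maximum.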

Another method for analyzing model mechanisms is __input reduction__. That is, continually reduce the input (by removing the least important words) until the answer the model generates changes.

![image.png](attachment:image.png) <br>
_Figure 10. Input reduction._
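
A greedy version of input reduction could be sketched as below. Two assumptions to flag: word importance is measured here by leave-one-out confidence (the word whose removal leaves the model most confident is treated as least important), which is a simple stand-in for whatever importance measure a given study uses, and `toy_predict` is a hypothetical keyword classifier rather than a real model.

```python
def input_reduction(words, predict):
    """Greedily drop the least important word while the predicted label is unchanged.

    `predict(words)` must return a (label, confidence) pair.
    """
    original_label, _ = predict(words)
    current = list(words)
    while len(current) > 1:
        best_candidate, best_confidence = None, -1.0
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            label, confidence = predict(candidate)
            # Only consider removals that keep the original prediction.
            if label == original_label and confidence > best_confidence:
                best_candidate, best_confidence = candidate, confidence
        if best_candidate is None:
            break  # every further removal would change the prediction
        current = best_candidate
    return current


POSITIVE = {"great", "good"}
NEGATIVE = {"bad", "boring"}


def toy_predict(words):
    """Hypothetical sentiment classifier based on keyword counts."""
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    label = "positive" if pos > neg else "negative"
    confidence = (max(pos, neg) + 1) / (pos + neg + 2)
    return label, confidence


print(input_reduction("the movie was great".split(), toy_predict))  # ['great']
```

The reduced input is the shortest one found that still yields the original prediction; with real models these reduced inputs often look nonsensical to a human, which is what makes the method informative.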

Another method is __innocuous model breaking__. This method is done by changing irrelevant text in the input and trying to break the model's prediction.

![image.png](attachment:image.png) <br>
_Figure 11. Innocuous model breaking._

Another study analyzed whether models are robust to __shuffled character input__, where the letters inside each word are shuffled while the first and last letters stay in place. They found that BLEU scores dropped considerably on such input.

![image.png](attachment:image.png) <br>
_Figure 12. Character-level noise._
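
The character-level noise described above can be generated with a few lines of code. This is a minimal sketch; the function names are assumptions, and punctuation attached to a word is treated as part of the word for simplicity.

```python
import random


def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle with fewer than two interior letters
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_sentence(sentence, seed=0):
    """Apply scramble_word to every whitespace-separated word."""
    rng = random.Random(seed)
    return " ".join(scramble_word(word, rng) for word in sentence.split())


print(scramble_sentence("according to research the order of letters does not matter"))
```

Feeding the scrambled sentences to a translation model and comparing BLEU against the clean inputs reproduces the shape of the experiment.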

# 6. Analyzing Representations

Some components within the network are naturally easier to inspect and analyze: __interpretable architecture components__. One, for instance, is attention.

![image.png](attachment:image.png) <br>
![image-2.png](attachment:image-2.png) <br>
![image-3.png](attachment:image-3.png) <br>
_Figure 13. Interpretable Attention._

Another idea is to look for __interpretable hidden units__ that track position: we can analyze the activations of a single hidden unit across the input.

![image.png](attachment:image.png) <br>
![image-2.png](attachment:image-2.png) <br>
_Figure 14. Analyzing activations of a single neuron._

Hidden units can also be interpreted in the context of subject-verb agreement: __interpretable hidden units for subject-verb agreement__.

![image.png](attachment:image.png) <br>
_Figure 15. Subject-verb agreement._

LAMA from Lecture 15 is part of a class of NLP approaches for probing and exploring model behaviors and mechanisms. 

![image.png](attachment:image.png) <br>
_Figure 16. Probing: supervised analysis of NNs._
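
A supervised probe can be sketched as a small classifier trained on frozen representations to predict some property of interest. In the sketch below the "hidden states" and labels are made-up numbers, and a perceptron stands in for whatever simple classifier a study might use; the point is only that the representations themselves are never updated.

```python
def train_linear_probe(features, labels, epochs=20, lr=0.1):
    """Perceptron probe: learns (w, b) on frozen features; the features are never changed."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            prediction = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = y - prediction
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


def probe_accuracy(w, b, features, labels):
    """Fraction of examples whose label the probe predicts correctly."""
    hits = 0
    for x, y in zip(features, labels):
        prediction = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        hits += prediction == y
    return hits / len(labels)


# Hypothetical frozen hidden states (2-d) and a binary property label per word.
hidden_states = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.3], [-0.8, -0.2]]
property_labels = [1, 1, 0, 0]

w, b = train_linear_probe(hidden_states, property_labels)
print(probe_accuracy(w, b, hidden_states, property_labels))  # 1.0
```

High probe accuracy suggests the property is easily decodable from the representation; in practice the probe would be evaluated on held-out data rather than on its training examples.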

Probing via supervised analysis revealed some interesting properties of language models. 

![image.png](attachment:image.png) <br>
_Figure 17. Increasingly abstract linguistic properties captured in deeper layers of LMs._

Another probing study by Hewitt and Manning 2019 showed that BERT models make dependency parse tree structure easily accessible. In other words, by applying a simple linear transformation to vectors for words in a sentence, we can recreate a dependency tree.

![image.png](attachment:image.png) <br>
_Figure 18. Recovering dependency tree from BERT models._
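
To spell out what "a simple linear transformation" means here, the structural probe learns a matrix $B$ such that squared distances between transformed word vectors approximate distances in the dependency tree. The notation below ($B$ for the probe matrix, $h_i$ for the vector of word $w_i$, $d_T$ for the number of edges between two words in the gold tree, $s$ for a sentence) is introduced for this summary and is not taken from the notes above:

$$
d_B(h_i, h_j)^2 = \big(B(h_i - h_j)\big)^\top \big(B(h_i - h_j)\big)
$$

$$
\min_B \sum_{s} \frac{1}{|s|^2} \sum_{i,j} \left| d_T(w_i, w_j) - d_B(h_i, h_j)^2 \right|
$$

Taking a minimum spanning tree over the predicted distances $d_B(h_i, h_j)^2$ then gives an undirected tree that can be compared with the gold dependency parse.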

Despite these interesting discoveries, the findings are correlational: a probe shows that the information can be recovered from the representations, not that the model actually uses it to make predictions.

# 7. Revisiting Model Ablations as Analysis 

Model ablations (testing what works and what doesn't) can also be seen as a way to analyze our models.

![image.png](attachment:image.png) <br>
_Figure 19. The order of self-attention and feedforward layers affects model performance._

# 8. Resource

If you missed the link right below the title, I'm providing the resource here again along with the course website.

- [Stanford CS224N](https://www.youtube.com/watch?v=rmVRLeJRkl4)
- [Course Website](http://web.stanford.edu/class/cs224n/)

This is a series of 23 lectures provided by Stanford.
