# 🧪 `09_lab_attention_visualization.ipynb`  
### 📁 `03_natural_language_processing`  
> Visualize **self-attention heads** in real transformer models like BERT or GPT-2.  
See which words attend to which — layer-by-layer, head-by-head.  
**Intuition meets interpretability** in this lab.

---

## 🎯 Learning Goals

- Understand **self-attention weights** as visual heatmaps  
- Use tools like **`bertviz`** to see where attention flows  
- Compare **different heads/layers**  
- Analyze attention patterns: positional, syntactic, semantic
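For orientation, each heatmap cell that `bertviz` draws is one entry of a head's attention matrix, softmax(QKᵀ/√d_k). Below is a minimal NumPy sketch of that computation. It uses random toy vectors in place of the learned query/key/value projections, and the sizes are illustrative only, not BERT's real dimensions.

```python
import numpy as np

def attention_weights(q, k):
    """Scaled dot-product attention weights: row-wise softmax(q @ k.T / sqrt(d_k))."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                        # (seq_len, seq_len) raw scores
    scores = scores - scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)   # normalize each row to sum to 1

# Toy sizes and random vectors, standing in for one head's learned projections
rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
q = rng.normal(size=(seq_len, d_k))   # queries
k = rng.normal(size=(seq_len, d_k))   # keys
v = rng.normal(size=(seq_len, d_k))   # values

A = attention_weights(q, k)   # A[i, j] = how strongly token i attends to token j (one heatmap cell)
context = A @ v               # each output row is a weighted average of the value vectors
print(A.round(2))
print(A.sum(axis=-1))         # every row sums to 1
```

Row `i` of `A` is what a single row of the heatmap shows for token `i`.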

---

## 💻 Runtime Design

| Feature          | Spec             |
|------------------|------------------|
| Platform         | ✅ Colab (recommended)  
| Model            | ✅ `bert-base-uncased` (or `gpt2`)  
| Tooling          | ✅ `bertviz` or `transformer-vis`  
| Hardware         | ✅ CPU/GPU (minimal VRAM)  

---

## 🔧 Section 1: Install & Import

```python
!pip install transformers bertviz

from transformers import BertTokenizer, BertModel
import torch
from bertviz import head_view, model_view
```

---

## 🔢 Section 2: Load Model & Tokenizer

```python
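model_name = "bert-base-uncased"

# output_attentions=True makes the forward pass return per-layer attention weights;
# attn_implementation="eager" keeps them available on newer transformers versions,
# where the default SDPA attention kernel may not return weights
model = BertModel.from_pretrained(model_name, output_attentions=True, attn_implementation="eager")
tokenizer = BertTokenizer.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)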
```

---

## 📄 Section 3: Prepare Input Sentence

```python
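sentence = "The cat sat on the mat and looked at the dog."

inputs = tokenizer(sentence, return_tensors='pt')
input_ids = inputs['input_ids']

# Inference only: no gradients needed
with torch.no_grad():
    outputs = model(**inputs)

# Tuple with one tensor per layer (12 for bert-base),
# each shaped (batch, num_heads, seq_len, seq_len)
attention = outputs.attentions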
```
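Before opening the widgets, it can help to read the raw numbers. The sketch below uses a hypothetical helper, `top_attended` (not part of `bertviz` or `transformers`). It assumes the Hugging Face layout described above: one entry per layer, each shaped (batch, num_heads, seq_len, seq_len). The demo feeds it synthetic softmax-normalized arrays so it runs without downloading a model.

```python
import numpy as np

def to_numpy(x):
    """Convert a torch tensor (detached, moved to CPU) or any array-like to a NumPy array."""
    return x.detach().cpu().numpy() if hasattr(x, "detach") else np.asarray(x)

def top_attended(attentions, tokens, layer, head, query_idx, k=3):
    """Return the k (token, weight) pairs that tokens[query_idx] attends to most in one head.

    Assumes one entry per layer, each shaped (batch, num_heads, seq_len, seq_len).
    Only batch item 0 is used.
    """
    weights = to_numpy(attentions[layer])[0, head]   # (seq_len, seq_len)
    row = weights[query_idx]                         # attention from the query token to every token
    order = np.argsort(row)[::-1][:k]                # indices of the k largest weights
    return [(tokens[j], float(row[j])) for j in order]

# Synthetic stand-in for `attention`: 2 layers, batch 1, 4 heads, 5 tokens
rng = np.random.default_rng(0)
toy_tokens = ["[CLS]", "the", "cat", "sat", "[SEP]"]
logits = rng.normal(size=(2, 1, 4, 5, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax over the last axis
toy_attentions = tuple(probs)                                        # tuple of (1, 4, 5, 5) arrays

print(top_attended(toy_attentions, toy_tokens, layer=1, head=2, query_idx=2))
```

With the real model, pass `attention` and `tokenizer.convert_ids_to_tokens(input_ids[0])`. Layer, head, and token indices are 0-based, and index 0 is `[CLS]`, so `query_idx=2` is `cat` in this sentence.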

---

## 🔬 Section 4: Visualize Attention — Model View

```python
# Full attention structure across layers and heads
model_view(attention, tokenizer.convert_ids_to_tokens(input_ids[0]))
```

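> 🧠 Shows attention for **every layer and head** as a grid of thumbnails (rows = layers, columns = heads)  
> Click a thumbnail to enlarge that head, then hover over a token to see which tokens it attends to  
> Great for spotting **which layers attend to what**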
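To put a number on "which layers attend to what", one option is the entropy of each head's attention rows. This is not a `bertviz` feature. The sketch below uses a hypothetical helper, `head_entropies`, and assumes the same per-layer (batch, num_heads, seq_len, seq_len) layout as `attention`. The demo uses two synthetic layers: one uniform and one purely self-attending.

```python
import numpy as np

def head_entropies(attentions):
    """Mean attention entropy (in nats) for every (layer, head), shape (num_layers, num_heads).

    Low entropy -> the head concentrates on a few tokens.
    High entropy (at most log(seq_len)) -> the head spreads attention broadly.
    """
    per_layer = []
    for layer_attn in attentions:
        w = layer_attn.detach().cpu().numpy() if hasattr(layer_attn, "detach") else np.asarray(layer_attn)
        w = w[0]                                       # batch item 0 -> (num_heads, seq_len, seq_len)
        ent = -(w * np.log(w + 1e-12)).sum(axis=-1)    # entropy of each query token's row
        per_layer.append(ent.mean(axis=-1))            # average over query tokens
    return np.stack(per_layer)

# Two synthetic layers with 2 heads each over 6 tokens
n = 6
broad = np.full((1, 2, n, n), 1.0 / n)              # every token attends to all tokens equally
focused = np.eye(n)[None, None].repeat(2, axis=1)   # every token attends only to itself
ent = head_entropies((broad, focused))
print(ent.round(3))   # first row close to log(6), second row close to 0
```

On the BERT run, `head_entropies(attention)` gives a 12 × 12 grid that you can hold up against the model view to see which heads are sharply focused and which are diffuse.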

---

## 🔍 Section 5: Visualize Specific Head — Head View

```python
head_view(attention, tokenizer.convert_ids_to_tokens(input_ids[0]))
```

> 🔍 Pick a **layer** from the dropdown and toggle individual **heads** by their colors  
> Check whether early layers attend **positionally** (to neighboring tokens) while later layers attend more **semantically**
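The positional hypothesis can also be checked numerically. A "previous-token" head puts most of each row's weight just below the diagonal of its attention matrix. The sketch below uses a hypothetical helper, `offset_attention` (not a `bertviz` function). It assumes the per-layer (batch, num_heads, seq_len, seq_len) layout and demonstrates on one synthetic layer with a previous-token head and a self-attending head.

```python
import numpy as np

def offset_attention(attentions, offset):
    """Average weight each head puts on the token `offset` positions away from the query.

    offset=-1 -> previous token, offset=+1 -> next token.
    Returns an array shaped (num_layers, num_heads).
    """
    per_layer = []
    for layer_attn in attentions:
        w = layer_attn.detach().cpu().numpy() if hasattr(layer_attn, "detach") else np.asarray(layer_attn)
        w = w[0]                                          # (num_heads, seq_len, seq_len)
        # Diagonal `offset` holds w[h, i, i + offset] for every valid query position i
        diag = np.diagonal(w, offset=offset, axis1=-2, axis2=-1)
        per_layer.append(diag.mean(axis=-1))
    return np.stack(per_layer)

# One synthetic layer with 2 heads over 5 tokens
n = 5
prev_head = np.zeros((n, n))
prev_head[0, 0] = 1.0                               # first token has no predecessor, so it attends to itself
prev_head[np.arange(1, n), np.arange(n - 1)] = 1.0  # token i attends to token i - 1
self_head = np.eye(n)                               # token i attends only to itself
layer = np.stack([prev_head, self_head])[None]      # (batch=1, heads=2, n, n)

print(offset_attention((layer,), offset=-1))   # previous-token head scores 1.0, self head scores 0.0
```

Running `offset_attention(attention, -1)` and `offset_attention(attention, 1)` on the BERT output gives 12 × 12 grids. Heads with values near 1 are behaving positionally in this sense. Note that `[CLS]` and `[SEP]` are included in the average.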

---

## 🧠 Optional: Use GPT-2 Instead

```python
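from transformers import GPT2Tokenizer, GPT2Model

# eager attention so the attention weights are returned (SDPA kernels do not return them)
gpt2 = GPT2Model.from_pretrained("gpt2", output_attentions=True, attn_implementation="eager")
gpt2_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

gpt2_sentence = "Transformers are changing the world."

gpt2_inputs = gpt2_tokenizer(gpt2_sentence, return_tensors='pt')
with torch.no_grad():
    attn = gpt2(**gpt2_inputs).attentions  # tuple: one (batch, heads, seq, seq) tensor per layer

head_view(attn, gpt2_tokenizer.convert_ids_to_tokens(gpt2_inputs['input_ids'][0]))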
```
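GPT-2 is a decoder-only model with a causal mask. In the head view, each token draws lines only to itself and earlier tokens, which means every attention matrix is lower-triangular. The NumPy sketch below illustrates the masking idea. It is not the library's exact implementation, and the random scores are placeholders.

```python
import numpy as np

def causal_attention_weights(scores):
    """Row-wise softmax with a causal mask: token i may only attend to tokens j <= i."""
    seq_len = scores.shape[-1]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True above the diagonal
    masked = np.where(future, -np.inf, scores)                       # block future positions
    masked = masked - masked.max(axis=-1, keepdims=True)             # diagonal is always finite
    w = np.exp(masked)                                                # exp(-inf) = 0
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
w = causal_attention_weights(rng.normal(size=(4, 4)))
print(w.round(2))   # zeros above the diagonal; the first token can only attend to itself
```

This is why the first GPT-2 token often soaks up a lot of attention: it is the only position every later token is allowed to see.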

---

## 🧪 Section 6: What to Look For

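| Layers (0-indexed) | Pattern             | What it tends to look like |
|--------------------|---------------------|----------------------------|
| 0–3                | Positional          | Looks left/right (previous/next token), much like positional encodings |
| 4–7                | Phrase / syntactic  | Links words within phrases and along syntactic relations |
| 8–11               | Semantic & summary  | Focuses on content words (nouns, verbs) and sentence ends (`.`, `[SEP]`) |

> These are rough tendencies for BERT-base (12 layers, 12 heads each), not hard rules; individual heads vary.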
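The "sentence ends" pattern can be quantified by measuring how much attention each layer sends to chosen tokens such as `.` and `[SEP]`. The sketch below uses a hypothetical helper, `mass_on_tokens`. It assumes the per-layer (batch, num_heads, seq_len, seq_len) layout and matches tokens by their exact text. The demo uses two synthetic layers: one uniform and one that attends only to `[SEP]`.

```python
import numpy as np

def mass_on_tokens(attentions, tokens, targets):
    """Per-layer share of attention that lands on tokens whose text is in `targets`.

    Averaged over heads and query tokens; each value lies between 0 and 1.
    """
    target_idx = [i for i, t in enumerate(tokens) if t in targets]
    per_layer = []
    for layer_attn in attentions:
        w = layer_attn.detach().cpu().numpy() if hasattr(layer_attn, "detach") else np.asarray(layer_attn)
        w = w[0]                                   # (num_heads, seq_len, seq_len)
        mass = w[..., target_idx].sum(axis=-1)     # weight each query sends to the target columns
        per_layer.append(float(mass.mean()))
    return per_layer

# Two synthetic layers with 2 heads each over 6 tokens
toy_tokens = ["[CLS]", "the", "cat", "sat", ".", "[SEP]"]
n = len(toy_tokens)
spread = np.full((1, 2, n, n), 1.0 / n)   # uniform attention
to_sep = np.zeros((1, 2, n, n))
to_sep[..., -1] = 1.0                     # every query attends only to [SEP]

print(mass_on_tokens((spread, to_sep), toy_tokens, {".", "[SEP]"}))  # about 1/3, then 1.0
```

On the real model, `mass_on_tokens(attention, tokenizer.convert_ids_to_tokens(input_ids[0]), {".", "[SEP]"})` returns 12 values, one per BERT layer, which you can compare against the table above.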

---

## ✅ Wrap-Up

| What You Did             | ✅ |
|--------------------------|----|
| Visualized self-attention| ✅ |
| Understood head structure| ✅ |
| Interpreted patterns     | ✅ |
| Used BERT and GPT-2      | ✅ |
| Colab safe               | ✅ |

---

## 🧠 What You Learned

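- **Attention = context mapping**: each token's output is a weighted mix of the tokens it attends to  
- Some heads appear to specialize in particular **linguistic roles** (neighbors, syntax, punctuation)  
- Transformers **don't read strictly left to right**: in a single layer, any token can attend directly to any other token (in GPT-2, any earlier token)  
- You can now **inspect models visually**, not just by numbers (attention weights are a clue to model behavior, not a full explanation)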

---

✅ That's the final NLP lab in this batch.

Next up: `04_advanced_architectures` → `07_lab_gnn_node_classification_with_cora.ipynb`, where things get graphy.