# Concept 3: Hands-on Attention Demo

This notebook provides a beginner-friendly introduction to how attention works in natural language processing models, with visualizations and simple demos.

## Attention Mechanism Explained

Imagine a spotlight at a concert focusing on different performers, highlighting important parts of the scene. Similarly, attention in models helps focus on relevant words in a sentence.

- 🎯 **Selective focus:** Emphasizes important information
- 📊 **Weighted combination:** Combines info based on importance
- 🔄 **Dynamic weights:** Changes focus depending on the context
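
The "weighted combination" idea above can be sketched in a few lines. This is a minimal illustration, not how any particular model is implemented: the function name `weighted_combination` and the tiny 2-dimensional word vectors are made up for the demo.

```python
import numpy as np

def weighted_combination(weights, values):
    """Return the attention-weighted sum of the value vectors."""
    weights = np.asarray(weights, dtype=float)
    values = np.asarray(values, dtype=float)
    # Normalize so the weights sum to 1, then mix the rows of `values`.
    weights = weights / weights.sum()
    return weights @ values

# Hypothetical 2-dimensional vectors for three words (illustrative only).
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])

# Dynamic weights: two contexts weight the same three words differently.
context_a = weighted_combination([0.8, 0.1, 0.1], values)
context_b = weighted_combination([0.1, 0.1, 0.8], values)
print(context_a)  # close to [0.9, 0.2]
print(context_b)  # close to [0.9, 0.9]
```

The same three vectors give different results depending on where the weights put the focus, which is the core of selective, dynamic attention.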

## How Attention Weights Work

Consider the sentence *"The quick brown fox jumps"*. When processing the word "fox", a model might assign scores like these (illustrative values; a real model normalizes them so that they sum to 1):

- "The": 0.1 (low attention)
- "quick": 0.3 (medium attention)
- "brown": 0.8 (high attention)
- "fox": 1.0 (self-attention)
- "jumps": 0.2 (low attention)
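
The five values listed add up to 2.4, so they read best as raw relevance scores. One common way to turn such scores into weights that sum to 1 is the softmax function. The sketch below assumes that normalization; the helper `softmax` is written here for illustration and the scores are the example values from the list.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

words = ["The", "quick", "brown", "fox", "jumps"]
raw_scores = [0.1, 0.3, 0.8, 1.0, 0.2]  # illustrative scores from the list

weights = softmax(raw_scores)
for word, weight in zip(words, weights):
    print(f"{word:>6}: {weight:.3f}")
print("sum:", round(float(weights.sum()), 6))
```

Softmax keeps the ranking of the scores ("fox" still gets the largest share) while making the weights comparable across words.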

## Attention Pattern Visualization

Below is a heatmap showing how attention weights are distributed among words in a sentence. Darker colors indicate stronger attention.

![Heatmap showing attention weights between words in a sentence, with darker colors indicating stronger connections, size 800x600](images/attention_heatmap.png)

**Dark areas = strong attention, Light areas = weak attention**

## Real-World Example: Sentiment Analysis

Review: "The movie was okay but the ending was absolutely terrible."

- 🎯 Processing "terrible" → High attention to "ending".
- 🎯 Processing "okay" → High attention to "movie".
- ✅ **Result:** By linking each opinion word to the thing it describes, the model can pick up the mixed sentiment.
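
To make the example concrete, here is a small sketch of how one might read the most-attended word out of a row of attention weights. The weights are hypothetical, hand-picked numbers chosen to match the bullets above, not output from a real model, and `top_attended` is a helper invented for this demo.

```python
tokens = "The movie was okay but the ending was absolutely terrible".split()

# Hypothetical, hand-picked attention rows (NOT produced by a real model).
# Each row holds one weight per token and sums to 1.
hypothetical_rows = {
    "okay":     [0.05, 0.50, 0.10, 0.15, 0.05, 0.03, 0.05, 0.03, 0.02, 0.02],
    "terrible": [0.02, 0.03, 0.02, 0.03, 0.05, 0.05, 0.50, 0.10, 0.10, 0.10],
}

def top_attended(word, row, tokens):
    """Return the token with the largest weight, skipping the query word itself."""
    query_index = tokens.index(word)
    candidates = [
        (weight, token)
        for i, (weight, token) in enumerate(zip(row, tokens))
        if i != query_index
    ]
    return max(candidates, key=lambda pair: pair[0])[1]

for word, row in hypothetical_rows.items():
    print(f"{word!r} attends most to {top_attended(word, row, tokens)!r}")
```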

## Interactive Demo: Attention Visualization

Let's build a simple attention visualizer together!

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_attention(sentence, attention_weights):
    """Visualize an attention matrix as a heatmap (rows = queries, columns = keys)."""
    words = sentence.split()

    # Draw the heatmap and annotate each cell with its weight
    plt.figure(figsize=(10, 8))
    sns.heatmap(attention_weights,
                xticklabels=words,
                yticklabels=words,
                annot=True,
                cmap='Blues',
                fmt='.2f')
    plt.title('Attention Weights Visualization')
    plt.ylabel('Query Words')
    plt.xlabel('Key Words')
    plt.show()

# Example sentence
sentence = "The cat sat on the mat"
words = sentence.split()
n_words = len(words)

# Mock attention weights (normally computed by the model)
rng = np.random.default_rng(0)  # fixed seed so the plot is reproducible
attention_matrix = rng.random((n_words, n_words))
# Make it more realistic: strong self-attention, and each row sums to 1
np.fill_diagonal(attention_matrix, 1.0)
attention_matrix /= attention_matrix.sum(axis=1, keepdims=True)

visualize_attention(sentence, attention_matrix)
```

Run the code above to see a heatmap of the mock weights. Because the values are random, the plot shows what an attention matrix looks like rather than what a trained model has actually learned.
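
To go one step beyond mock weights, the sketch below computes weights with scaled dot-product attention, the formulation commonly used in Transformer models. It makes two simplifying assumptions: the word vectors are random stand-ins for learned embeddings, and the same vectors serve as both queries and keys (a real model would first apply learned projections). The names `row_softmax`, `dot_product_weights`, and `toy_embeddings` are invented for this example.

```python
import numpy as np

def row_softmax(scores):
    """Apply softmax to each row so every row sums to 1."""
    shifted = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=-1, keepdims=True)

def dot_product_weights(queries, keys):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d_k))."""
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    return row_softmax(scores)

rng = np.random.default_rng(0)
toy_words = "The cat sat on the mat".split()
# Random stand-ins for learned word embeddings (assumption for the demo).
toy_embeddings = rng.normal(size=(len(toy_words), 8))

toy_attention = dot_product_weights(toy_embeddings, toy_embeddings)
print(toy_attention.shape)        # one row per query word, one column per key word
print(toy_attention.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Passing `toy_attention` to `visualize_attention("The cat sat on the mat", toy_attention)` should draw it with the same heatmap function defined earlier.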

## Concept 3 Made Simple

Think of attention like reading comprehension:
- 📖 Question: "Who sat on the mat?"
- 👁️ Your eyes automatically focus on "cat" and "sat"
- 🧠 You ignore less relevant words like "the"
- ✅ **Attention does exactly this automatically!**

## Interactive Attention Demo

<svg id="attention-svg" width="800" height="400" class="svg"></svg>
<p><em>Watch how attention weights change dynamically!</em></p>

## Concept 3 from a Different Angle

Attention can be pictured as magnetic forces between words: the stronger the pull between two words, the higher the attention weight.

I hope attention visualization is clear now! It's like having smart spotlights illuminate important connections.

## Discussion

Attention mechanisms help models focus on the most relevant parts of the input dynamically.

Challenge question: In document summarization, which words would you expect to receive the highest attention weights?