<a href="https://colab.research.google.com/github/YashRavv/ai-mastery-journey/blob/main/week-01/intro_to_attention.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧠 Intro to Attention Mechanism

Welcome to your first AI Mastery notebook! In this notebook, you'll:
- Understand the **intuition behind attention**
- Visualize **dot products** as similarity scores
- Simulate a basic **Query-Key-Value (QKV)** attention mechanism

> Let’s explore what makes transformers so powerful: the ability to *focus attention* where it matters.


## 📘 What is Attention, Really?

**Analogy:** Imagine reading a sentence:
> "The lion saw the zebra near the river."

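When predicting the word *river*, a model would likely get the most help from *near* (which signals that a place is coming) and *zebra* (the thing that is near it), and less from words like *The* or *saw*.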

**Attention** lets the model assign a **weight** to every other word based on how relevant that word is to the current one. The weights are non-negative and sum to 1.
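As a toy sketch of this idea, the snippet below gives made-up relevance scores to the words that come before *river* and normalizes them with a softmax, so the resulting weights are positive and sum to 1. The scores are hypothetical (a trained model computes them from learned vectors), and `softmax` here is a hand-written helper that matches `scipy.special.softmax` for a 1-D array.

```python
import numpy as np

# Words preceding "river" in "The lion saw the zebra near the river."
words = ["The", "lion", "saw", "the", "zebra", "near", "the"]
# Hypothetical relevance scores, invented for illustration only.
scores = np.array([0.1, 0.5, 0.3, 0.1, 1.5, 2.0, 0.1])

def softmax(x):
    """Numerically stable softmax: subtract the max before exponentiating."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()

weights = softmax(scores)
for word, w in zip(words, weights):
    print(f"{word:>6}: {w:.3f}")
print("Sum of weights:", weights.sum())
```

Softmax preserves the ordering of the scores, so the word with the highest score (*near*) also gets the largest weight.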

We'll simulate this with simple vectors next!

```python
# 🧮 Dot Product as Similarity
import numpy as np

def dot_product_similarity(vec1, vec2):
    """Return the dot product of two vectors as a similarity score."""
    return np.dot(vec1, vec2)

# Define a query and two keys
query = np.array([1, 0])
key1 = np.array([1, 1])   # Related: shares the query's first component
key2 = np.array([0, 1])   # Unrelated: orthogonal to the query

print("Similarity with key1:", dot_product_similarity(query, key1))
print("Similarity with key2:", dot_product_similarity(query, key2))
```

```
Similarity with key1: 1
Similarity with key2: 0
```


## 🎯 What You Just Did
- The query is compared against each key to find the "relevant" ones.
- The dot product gives a *similarity score*: higher means more aligned (when the vectors have similar lengths).
- This scoring step is the heart of attention!
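In practice a model scores many queries against many keys at once. If the queries are stacked as the rows of a matrix `Q` and the keys as the rows of `K`, a single matrix product `Q @ K.T` gives every query-key dot product. A minimal sketch with toy vectors; the second query is a hypothetical extra, not from the cell above.

```python
import numpy as np

# Each row of Q is one query; each row of K is one key (toy 2-D vectors).
Q = np.array([[1, 0],    # the query from the cell above
              [0, 1]])   # a second, hypothetical query
K = np.array([[1, 1],    # key1
              [0, 1]])   # key2

# scores[i, j] is the dot product of query i with key j.
scores = Q @ K.T
print(scores)
```

The first row reproduces the two similarities computed above, and each additional query simply adds another row.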

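Next, we'll turn these scores into **attention weights** with softmax, the normalization step at the core of **scaled dot-product attention**. To keep things simple, the cell below skips the scaling by √d_k and stops at the weights.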

```python
# 🔁 Attention Weights with Softmax
import numpy as np
from scipy.special import softmax

# Define a query and 3 keys
query = np.array([1, 0])
keys = np.array([
    [1, 1],   # Partially aligned with the query (score 1)
    [0, 1],   # Orthogonal to the query (score 0)
    [1, 0]    # Same direction as the query (score 1)
])

# Dot product of each key (row) with the query: one score per key
scores = np.dot(keys, query)
# Softmax turns the scores into positive weights that sum to 1
weights = softmax(scores)

print("Attention scores:", scores)
print("Attention weights:", weights)
```

```
Attention scores: [1 0 1]
Attention weights: [0.4223188 0.1553624 0.4223188]
```
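The cell above stops at the weights. Scaled dot-product attention as defined in the Transformer paper (Vaswani et al., 2017, "Attention Is All You Need") also divides the scores by √d_k, where d_k is the key dimension, and uses the weights to average a set of **value** vectors:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

Below is a minimal NumPy sketch of that formula. The value matrix `V` is hypothetical: its one-hot rows make it easy to see how much each key contributes to the output.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    exp = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return exp / np.sum(exp, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # shape (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # output shape (n_queries, d_v)

# Same query and keys as the cell above, as float arrays.
Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
# Hypothetical values: one-hot rows, one per key.
V = np.eye(3)

output, weights = scaled_dot_product_attention(Q, K, V)
print("Weights:", weights)
print("Output: ", output)
```

Because the values are one-hot, the output equals the weights here. Dividing by √2 shrinks the scores, so these weights (about 0.401, 0.198, 0.401) are slightly flatter than the unscaled ones above.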


## 📝 Reflect:
- What does attention *focus on* in this case?
- Try changing the query or keys: what happens?
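One way to explore the second question: the sketch below recomputes the weights for a few different queries against the same keys. The alternative queries `(0, 1)` and `(5, 0)` are hypothetical examples, and `softmax` is a hand-written helper equivalent to `scipy.special.softmax` for 1-D input.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax for a 1-D array."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()

keys = np.array([[1, 1], [0, 1], [1, 0]])  # same keys as the cell above

# Queries to try: the original, one along the other axis, and a longer copy of the original.
results = {}
for query in [(1, 0), (0, 1), (5, 0)]:
    results[query] = softmax(keys @ np.array(query))
    print(query, "->", np.round(results[query], 3))
```

With `(0, 1)` the focus shifts to the first two keys. With `(5, 0)` the direction is unchanged but every score is 5× larger, so softmax becomes much sharper and the orthogonal key's weight drops to roughly 0.003. Large raw scores sharpening softmax like this is the effect the √d_k scaling is meant to temper.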

Feel free to edit below and jot down your thoughts.

```python
# ✍️ Your reflections here:
# 1. What surprised you?
# 2. What confused you?
# 3. How does this relate to LLMs?
```