### **Attention is a technique that lets a model focus on the most relevant parts of the input when producing an output, instead of treating every part as equally important.**

- Think of how humans read: when you read a sentence, you don’t weigh every word equally; your brain pays more attention to the important ones.

Simple example (human-style):

Sentence: “The cat sat on the mat because it was tired.”

Question: What does “it” refer to?

Your brain instantly links “it” → “cat”, not “mat”.

### **Types of attention mechanisms**
1. Simplified Self-Attention (each word scores every word in the sentence by similarity to decide which ones are important)

2. Self-Attention (each word uses query, key, and value vectors to gather context from all words)

3. Causal Attention (each word can attend only to itself and earlier words, not future ones)

4. Multi-Head Attention (several attention heads look at the same sentence in parallel, each from a different perspective)
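To make items 2 and 3 of the list above more concrete, here is a minimal sketch of self-attention with query, key, and value projections and an optional causal mask. Everything in it is an assumption for illustration: the helper name `self_attention` is hypothetical, the input and the projection matrices are random and untrained, and the scaling by the square root of the key dimension is borrowed from the standard transformer formulation rather than from this notebook.

```python
import torch


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Illustrative sketch: scaled dot-product self-attention for one sequence."""
    # Project every token into query, key, and value vectors
    queries, keys, values = x @ w_q, x @ w_k, x @ w_v

    # Similarity of every query with every key, scaled by sqrt(key dimension)
    scores = queries @ keys.T / keys.shape[-1] ** 0.5

    if causal:
        # Mark positions above the diagonal (future tokens) and hide them
        future = torch.triu(torch.ones_like(scores), diagonal=1).bool()
        scores = scores.masked_fill(future, float("-inf"))

    # Each row becomes a probability distribution over the visible tokens
    weights = torch.softmax(scores, dim=-1)
    return weights @ values, weights


torch.manual_seed(0)
x = torch.rand(6, 3)  # 6 tokens with 3 features each (random stand-in data)
w_q, w_k, w_v = (torch.rand(3, 2) for _ in range(3))  # untrained weights

context, weights = self_attention(x, w_q, w_k, w_v, causal=True)
print(weights)
```

With `causal=True`, the printed weight matrix is lower triangular: the first token can only attend to itself, while the last token can attend to all six.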

### **Self-attention means:**

- Words attend to the other words in the same sentence

- Every word's representation is updated with information from its context

*Example:*

“I went to the bank to deposit money”

The word “bank” attends strongly to “deposit” and “money”, which points to the financial sense of the word rather than a river bank.

In [1]:
import torch

# One 3-dimensional vector per word of the sentence
inputs = torch.tensor([
    [0.43, 0.15, 0.89], # your
    [0.55, 0.87, 0.66], # journey
    [0.57, 0.85, 0.64], # starts
    [0.22, 0.58, 0.33], # with
    [0.77, 0.25, 0.10], # one
    [0.05, 0.80, 0.55] # step
])

In [2]:
import matplotlib.pyplot as plt

words = ["your", "journey", "starts", "with", "one", "step"]

# Split the vectors into x, y, and z coordinates for plotting
x_coord = inputs[:, 0].numpy()
y_coord = inputs[:, 1].numpy()
z_coord = inputs[:, 2].numpy()
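The cell above prepares coordinates but does not draw anything. One way they could be visualized is a labeled 3D scatter plot, sketched below. The helper name `plot_word_vectors` and the axis labels are assumptions for illustration, not part of the original notebook.

```python
import matplotlib.pyplot as plt
import torch

inputs = torch.tensor([
    [0.43, 0.15, 0.89],  # your
    [0.55, 0.87, 0.66],  # journey
    [0.57, 0.85, 0.64],  # starts
    [0.22, 0.58, 0.33],  # with
    [0.77, 0.25, 0.10],  # one
    [0.05, 0.80, 0.55],  # step
])
words = ["your", "journey", "starts", "with", "one", "step"]


def plot_word_vectors(vectors, labels):
    """Hypothetical helper: scatter 3-D vectors and label each point."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")

    xs = vectors[:, 0].numpy()
    ys = vectors[:, 1].numpy()
    zs = vectors[:, 2].numpy()
    ax.scatter(xs, ys, zs)

    # Write each word next to its point
    for x, y, z, label in zip(xs, ys, zs, labels):
        ax.text(x, y, z, label)

    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")
    return fig, ax


fig, ax = plot_word_vectors(inputs, words)
```

In a notebook the figure is typically rendered at the end of the cell; in a plain script you would call `plt.show()` or `fig.savefig(...)` afterwards. Vectors that point in similar directions, such as "journey" and "starts", should appear close together.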


In [7]:
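# Use the second word ("journey") as the query
query = inputs[1]

# Score every input vector against the query with a dot product
attention_score = torch.empty(inputs.shape[0])
for i, x_i in enumerate(inputs):
    attention_score[i] = torch.dot(x_i, query)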

print(attention_score)

tensor([0.9544, 1.4950, 1.4754, 0.8434, 0.7070, 1.0865])
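The loop above can also be written as a single matrix-vector product, and the scores for every possible query can be computed at once with a matrix product. The sketch below repeats the input tensor so that it runs on its own; the name `all_scores` is introduced here for illustration.

```python
import torch

inputs = torch.tensor([
    [0.43, 0.15, 0.89],  # your
    [0.55, 0.87, 0.66],  # journey
    [0.57, 0.85, 0.64],  # starts
    [0.22, 0.58, 0.33],  # with
    [0.77, 0.25, 0.10],  # one
    [0.05, 0.80, 0.55],  # step
])

query = inputs[1]  # "journey"

# Dot product of the query with every row of inputs, without a Python loop
attention_score = inputs @ query
print(attention_score)

# Scores for all queries at once: row i holds the scores for word i as query
all_scores = inputs @ inputs.T
print(all_scores.shape)
```

Row 1 of `all_scores` holds the same values as `attention_score`, because it uses "journey" as the query.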


In [10]:
# Normalize the scores so that the weights sum to 1
attention_weights = attention_score / attention_score.sum()
print("Attention weights : ", attention_weights)
print("Sum : ", attention_weights.sum())

Attention weights :  tensor([0.1455, 0.2278, 0.2249, 0.1285, 0.1077, 0.1656])
Sum :  tensor(1.0000)
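Dividing by the sum works here because all scores are positive. A common alternative is softmax, which also yields positive weights that sum to 1 when some scores are negative. The weights can then be used to build a context vector for the query as a weighted sum of all input vectors. The following is a minimal sketch of that next step, not necessarily how the rest of this notebook proceeds; the name `context_vector` is an assumption.

```python
import torch

inputs = torch.tensor([
    [0.43, 0.15, 0.89],  # your
    [0.55, 0.87, 0.66],  # journey
    [0.57, 0.85, 0.64],  # starts
    [0.22, 0.58, 0.33],  # with
    [0.77, 0.25, 0.10],  # one
    [0.05, 0.80, 0.55],  # step
])

query = inputs[1]  # "journey"
attention_score = inputs @ query

# Softmax normalization: exponentiate, then divide by the sum
attention_weights = torch.softmax(attention_score, dim=0)
print("Attention weights:", attention_weights)
print("Sum:", attention_weights.sum())

# Context vector: every input vector weighted by its attention weight
context_vector = attention_weights @ inputs
print("Context vector:", context_vector)
```

The context vector has the same dimensionality as a single input vector (3 here) and leans toward the words with the largest weights, "journey" and "starts".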
