# Positional Encoding in Transformers

## 📍 Advanced: Positional Encoding Basics

Transformers process all words (or tokens) in a sequence at the same time, which means they don't inherently know the order of the words. To help them understand the order, we add positional information to each word's representation.

Let's explore how this works!

## The Position Problem

**Challenge:** Transformers process all positions simultaneously — they don’t know the order of words!

- ❌ "Dog bites man" vs "Man bites dog" - same words, different meaning!
- ❌ Without position info, both sentences look identical to the model.
- ✅ **Solution:** Add positional information to word embeddings to give each word a sense of position.
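
To make the problem concrete, here is a minimal sketch using a toy three-word vocabulary. The random embedding table and the random per-position vectors are assumptions for illustration only; they are not the sinusoidal scheme itself. Compared as an unordered collection of vectors, the two sentences are identical until a position signal is added.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and random 4-dimensional embeddings.
vocab = {"dog": 0, "bites": 1, "man": 2}
embeddings = rng.normal(size=(len(vocab), 4))

def embed(sentence):
    """Look up one embedding row per word."""
    return np.stack([embeddings[vocab[w]] for w in sentence.lower().split()])

def as_unordered(vectors):
    """Order-blind view: the sorted collection of row vectors."""
    return sorted(map(tuple, vectors.tolist()))

a = embed("Dog bites man")
b = embed("Man bites dog")

# Without position info, both sentences contain exactly the same vectors.
same_without_position = as_unordered(a) == as_unordered(b)

# Assumed stand-in for a positional encoding: one distinct vector per position.
position_vectors = rng.normal(size=(3, 4))
same_with_position = as_unordered(a + position_vectors) == as_unordered(b + position_vectors)

print(same_without_position, same_with_position)
```

Once each word vector carries its position, "dog" in first place and "dog" in last place are no longer the same vector, so the two sentences become distinguishable.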

## Sinusoidal Positional Encoding

**A clever mathematical trick:** Use sine and cosine waves to encode position information.

The formulas are:

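$$
PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
$$

$$
PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)
$$

Here `pos` is the token's position in the sequence, `i` indexes the sine/cosine pairs (so `2i` and `2i+1` are the even and odd dimensions), and `d_model` is the embedding size.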

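- 🌊 A different frequency for each pair of dimensions, from fast-changing to slow-changing
- 🔄 The encoding at `pos + k` is a linear function of the encoding at `pos`, which makes relative positions easier for the model to learn
- 📏 Can be computed for any position, so it may extend to sequences longer than those seen in training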
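
One way to turn the two formulas into code is the vectorized NumPy sketch below. The function name `sinusoidal_encoding` is hypothetical, and the handling of an odd `d_model` (dropping the last cosine column) is an assumption, since the formulas only define complete sine/cosine pairs.

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) array of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]         # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[np.newaxis, :]   # the values 2i
    angles = positions / np.power(10000.0, even_dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                          # even dimensions
    pe[:, 1::2] = np.cos(angles[:, : d_model // 2])       # odd dimensions
    return pe

pe = sinusoidal_encoding(seq_len=3, d_model=4)
print(pe.shape)
```

Position 0 always encodes to alternating zeros and ones, because `sin(0) = 0` and `cos(0) = 1` at every frequency.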

## Positional Encoding Visualization

```python
import numpy as np
import matplotlib.pyplot as plt

def positional_encoding(seq_len, d_model):
    """Generate sinusoidal positional encodings of shape (seq_len, d_model)."""
    pos_enc = np.zeros((seq_len, d_model))
    for pos in range(seq_len):
        # i steps over the even dimensions, so i already equals 2i in the formula
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pos_enc[pos, i] = np.sin(angle)
            if i + 1 < d_model:
                pos_enc[pos, i + 1] = np.cos(angle)
    return pos_enc

# Visualize positional encodings
seq_len, d_model = 50, 128
pos_encodings = positional_encoding(seq_len, d_model)

# Each row is one position; each column is one encoding dimension
plt.figure(figsize=(12, 8))
plt.pcolormesh(pos_encodings, cmap='coolwarm')
plt.xlabel('Encoding Dimension')
plt.ylabel('Sequence Position')
plt.title('Positional Encoding Visualization')
plt.colorbar()
plt.show()
```

### 🚀 Open this in Colab
[Open Task in Colab](https://colab.research.google.com/github/Roopesht/codeexamples/blob/main/genai/nlp_basics/25/advanced.ipynb)

## Positional Encoding Made Simple

**Think of it like GPS coordinates:**

- 📍 Each word gets a unique set of "coordinates" based on its position
- 🗺️ Even if words are the same, their positions make them distinguishable
- 🧭 The model learns: "The" at position 1 ≠ "The" at position 5
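
The GPS analogy can be checked with a small sketch. The helper `encode_position`, the constant vector standing in for the embedding of "The", and the choice of an even `d_model` are all assumptions made for illustration.

```python
import numpy as np

def encode_position(pos, d_model):
    """Sinusoidal encoding of a single position (assumes an even d_model)."""
    even_dims = np.arange(0, d_model, 2)
    angles = pos / 10000 ** (even_dims / d_model)
    vec = np.empty(d_model)
    vec[0::2] = np.sin(angles)
    vec[1::2] = np.cos(angles)
    return vec

d_model = 8
the_embedding = np.full(d_model, 0.5)  # hypothetical embedding for "The"

# Same word, two positions: the model receives two different input vectors.
the_at_1 = the_embedding + encode_position(1, d_model)
the_at_5 = the_embedding + encode_position(5, d_model)
inputs_differ = not np.allclose(the_at_1, the_at_5)

# Every position in a short sequence gets its own distinct "coordinates".
table = np.stack([encode_position(p, d_model) for p in range(10)])
all_unique = len(np.unique(np.round(table, 8), axis=0)) == len(table)

print(inputs_differ, all_unique)
```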

## Why sinusoidal functions are preferred

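Sine and cosine encodings have several advantages over feeding the model raw position numbers (0, 1, 2, ...):

- The encoding of a position shifted by a fixed offset is a linear function of the original encoding, so the model can learn to reason about relative positions rather than only absolute ones.
- They are defined for any position, so they can be computed for sequences longer than those seen during training, although how well a model handles such lengths still depends on the model.
- Their values stay within [-1, 1] and change smoothly, so nearby positions get similar encodings, while raw position numbers grow without bound.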
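
The relative-position property follows from the angle-addition identities for sine and cosine, and can be verified numerically. In the sketch below, the function names are hypothetical and the encoding is stored as (sin, cos) pairs, one pair per frequency, rather than interleaved; that layout is chosen only to keep the rotation easy to read.

```python
import numpy as np

def frequencies(d_model):
    """One frequency per sine/cosine pair: 1 / 10000^(2i / d_model)."""
    return 1.0 / 10000 ** (np.arange(0, d_model, 2) / d_model)

def encode(pos, d_model):
    """Encoding of one position as an array of (sin, cos) pairs."""
    angles = pos * frequencies(d_model)
    return np.stack([np.sin(angles), np.cos(angles)], axis=-1)

def shift(pairs, k, d_model):
    """Move an encoding forward by k positions using only k, not pos."""
    wk = k * frequencies(d_model)
    sin_p, cos_p = pairs[:, 0], pairs[:, 1]
    return np.stack(
        [sin_p * np.cos(wk) + cos_p * np.sin(wk),   # sin(a + b)
         cos_p * np.cos(wk) - sin_p * np.sin(wk)],  # cos(a + b)
        axis=-1,
    )

d_model, k = 16, 7
shift_matches = all(
    np.allclose(shift(encode(p, d_model), k, d_model), encode(p + k, d_model))
    for p in (0, 3, 42)
)
print(shift_matches)
```

The same `shift` works for every starting position because it depends only on the offset `k`, which is what lets a model express "k tokens later" as a fixed linear map.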