# 📘 Scaled Dot-Product Attention

## 1. `scaled_dot_product_attention.ipynb`

This notebook implements the **Scaled Dot-Product Attention** mechanism in **NumPy**, following the formulation used in Transformers.

---

## 🔢 Formula

For a single attention head:

$$
\text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^\top}{\sqrt{d_k}} \right)V
$$

Where:

| Symbol | Meaning      | Shape                   |
| ------ | ------------ | ----------------------- |
| **Q**  | Query matrix | `(..., seq_len_q, d_k)` |
| **K**  | Key matrix   | `(..., seq_len_k, d_k)` |
| **V**  | Value matrix | `(..., seq_len_k, d_v)` |
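
To make the shapes in the table concrete, the sketch below evaluates the formula step by step on random inputs. The batch size, sequence lengths, and dimensions are arbitrary values chosen for illustration; they are assumptions, not values taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the notebook).
batch, seq_len_q, seq_len_k, d_k, d_v = 2, 3, 4, 8, 5

Q = rng.standard_normal((batch, seq_len_q, d_k))
K = rng.standard_normal((batch, seq_len_k, d_k))
V = rng.standard_normal((batch, seq_len_k, d_v))

# QK^T / sqrt(d_k): transpose only the last two axes of K.
scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)

# Row-wise softmax over the key axis.
exp_scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = exp_scores / exp_scores.sum(axis=-1, keepdims=True)

# Weighted sum of the value rows.
output = weights @ V

print(scores.shape)   # (2, 3, 4)
print(weights.shape)  # (2, 3, 4)
print(output.shape)   # (2, 3, 5)
```

The score and weight matrices have one row per query and one column per key, while the output keeps the query length and takes its last dimension from `V`.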

---

## ✨ Features

This implementation:

* Uses **NumPy** for all matrix operations
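* Computes a **numerically stable softmax** by subtracting the row-wise maximum before exponentiating
* Returns both the **attention weights** and the **context vector** (the attention-weighted sum of the value rows)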
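
As a rough illustration of why the max-subtraction matters, the sketch below compares a direct softmax with the shifted version on deliberately large scores. The function names `naive_softmax` and `stable_softmax` and the input values are chosen here for demonstration and do not appear in the notebook.

```python
import numpy as np

def naive_softmax(x):
    # Direct definition: exp() overflows to inf for large entries.
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def stable_softmax(x):
    # Shifting by the row maximum makes the largest exponent exactly 0.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.array([[1000.0, 1001.0, 1002.0]])

# Silence the overflow warnings so the comparison prints cleanly.
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(x)   # inf / inf yields nan in every position

stable = stable_softmax(x)

print(naive)
print(stable)
```

Softmax is unchanged by adding a constant to every entry of a row, so the shifted version gives the same result as the exact definition while keeping every exponent at or below zero.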



In [1]:
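import ast
from typing import Tuple

import numpy as np

In [None]:
def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    # Subtracting the row-wise max leaves the result unchanged
    # but keeps exp() from overflowing.
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)


def scaled_dot_product_attention(
    Q: np.ndarray, K: np.ndarray, V: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (attention_weights, context_vector) for a single attention head."""
    d_k = Q.shape[-1]
    # Transpose only the last two axes so batched inputs of shape
    # (..., seq_len, d_k) work; K.T would reverse every axis.
    scores = np.matmul(Q, np.swapaxes(K, -1, -2)) / np.sqrt(d_k)  # QK^T / sqrt(d_k)
    attention_weights = softmax(scores)               # each row sums to 1
    context_vector = np.matmul(attention_weights, V)  # weighted sum of value rows
    return attention_weights, context_vector


# ---- USER INPUT SECTION ----
# ast.literal_eval only parses Python literals, so typed input is never
# executed as code (unlike eval).

print("Enter Q matrix (example: [[1,0,1]] ):")
Q = np.array(ast.literal_eval(input("Q = ")), dtype=float)

print("Enter K matrix (example: [[1,0,1],[0,1,0]] ):")
K = np.array(ast.literal_eval(input("K = ")), dtype=float)

print("Enter V matrix (example: [[5,5],[1,1]] ):")
V = np.array(ast.literal_eval(input("V = ")), dtype=float)

attn, ctx = scaled_dot_product_attention(Q, K, V)

print("\nAttention Weights:\n", attn)
print("\nContext Vector:\n", ctx)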

Sample run, entering the example matrices shown in each prompt:

Enter Q matrix (example: [[1,0,1]] ):
Enter K matrix (example: [[1,0,1],[0,1,0]] ):
Enter V matrix (example: [[5,5],[1,1]] ):

Attention Weights:
 [[0.76036844 0.23963156]]

Context Vector:
 [[4.04147377 4.04147377]]
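
The sample output above can also be reproduced without the interactive prompts. The sketch below hard-codes the example matrices from the prompts and restates the attention computation compactly so that it runs on its own; the function name `attention` is a shorthand introduced here, not a name from the notebook.

```python
import numpy as np

def attention(Q, K, V):
    # Scaled scores, stable row-wise softmax, then weighted sum of V.
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(Q.shape[-1])
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return weights, weights @ V

# Example matrices taken from the prompts.
Q = np.array([[1.0, 0.0, 1.0]])
K = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
V = np.array([[5.0, 5.0], [1.0, 1.0]])

weights, context = attention(Q, K, V)

# The scaled scores are [2, 0] / sqrt(3), so the first key gets the larger weight.
print(weights)   # approximately [[0.7604 0.2396]]
print(context)   # approximately [[4.0415 4.0415]]
```

The context vector is the weighted combination 0.7604 · [5, 5] + 0.2396 · [1, 1], which is why both of its entries are equal.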
