
# Transformer Fundamentals – Guided Notebook 03: Scoring (Dot Product & Scaling)
**Date:** 2025-10-29  
**Style:** Guided, hands-on; from-scratch first, then frameworks; interactive visuals

## Learning Objectives

- Understand why dot-product similarity and scaling by sqrt(d_k) are used.
- Explore numerical stability with/without scaling.
- Visualize how score magnitudes change with dimensionality.


## TL;DR
Dividing scores by `sqrt(d_k)` keeps their variance near 1 as the dimension grows, so softmax stays out of its saturated regime: gradients remain usable and attention stays selective without collapsing to a one-hot distribution.
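
The variance argument behind this claim can be written out in a few lines. This is a sketch under an assumption the notes do not state explicitly: the components of `q` and `k` are mutually independent with zero mean and unit variance.

```latex
\begin{aligned}
q \cdot k &= \sum_{i=1}^{d_k} q_i k_i, \\
\mathbb{E}[q_i k_i] &= \mathbb{E}[q_i]\,\mathbb{E}[k_i] = 0, \\
\operatorname{Var}(q_i k_i) &= \mathbb{E}[q_i^2]\,\mathbb{E}[k_i^2] = 1, \\
\operatorname{Var}(q \cdot k) &= \sum_{i=1}^{d_k} \operatorname{Var}(q_i k_i) = d_k, \\
\operatorname{Var}\!\left(\frac{q \cdot k}{\sqrt{d_k}}\right) &= \frac{d_k}{d_k} = 1.
\end{aligned}
```

Trained queries and keys will not satisfy these assumptions exactly, so the result is best read as an order-of-magnitude guide rather than an exact statement.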


## Concept Overview
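- For queries and keys with independent, zero-mean, unit-variance components, the dot product `q · k` has variance `d_k`, so typical score magnitudes grow like `sqrt(d_k)`; dividing by `sqrt(d_k)` brings the variance back to about 1.
- Unscaled scores push softmax toward a one-hot output, where its Jacobian is close to zero → vanishing gradients.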
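
Both points can be checked numerically. The sketch below is illustrative only: it assumes standard-normal queries and keys, and the helper `softmax_jacobian_norm` is a hypothetical name introduced here. It measures the spread of `q · k` for several values of `d_k`, then compares the average size of the softmax Jacobian (`diag(p) - p pᵀ`) with and without scaling.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def softmax_jacobian_norm(scores):
    """Frobenius norm of d softmax / d scores, i.e. diag(p) - p p^T."""
    p = softmax(scores)
    return float(np.linalg.norm(np.diag(p) - np.outer(p, p)))

# 1) Spread of q . k for random unit-variance vectors, before and after scaling.
n_pairs = 5000
score_std = {}
for d_k in (4, 16, 64, 256):
    q = rng.standard_normal((n_pairs, d_k))
    k = rng.standard_normal((n_pairs, d_k))
    score_std[d_k] = float(np.sum(q * k, axis=1).std())
    print(f"d_k={d_k:3d}  std(q.k)={score_std[d_k]:6.2f}  "
          f"std(q.k / sqrt(d_k))={score_std[d_k] / np.sqrt(d_k):.2f}")

# 2) Average softmax Jacobian norm over many random queries at d_k = 256.
d_k, n_keys, n_trials = 256, 10, 500
jac_unscaled, jac_scaled = [], []
for _ in range(n_trials):
    q = rng.standard_normal(d_k)
    K = rng.standard_normal((n_keys, d_k))
    raw = K @ q  # one score per key
    jac_unscaled.append(softmax_jacobian_norm(raw))
    jac_scaled.append(softmax_jacobian_norm(raw / np.sqrt(d_k)))
mean_jac_unscaled = float(np.mean(jac_unscaled))
mean_jac_scaled = float(np.mean(jac_scaled))
print(f"mean Jacobian norm  unscaled={mean_jac_unscaled:.3f}  scaled={mean_jac_scaled:.3f}")
```

The first loop should show the raw standard deviation tracking `sqrt(d_k)` while the scaled one stays near 1. The second part should show a noticeably smaller Jacobian for the unscaled scores, which is the saturation effect described above.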


In [None]:

# %% [setup] Environment check & minimal installs (run once per kernel)
# Target: Python 3.12.12, PyTorch 2.5+, transformers 4.44+, datasets 3+, ipywidgets 8+, matplotlib 3.8+
import sys, platform

print("Python:", sys.version)
print("Platform:", platform.platform())

# Optional: uncomment to install/upgrade on this machine (internet required)
# !pip install --upgrade pip
# !pip install "torch>=2.5" "transformers>=4.44" "datasets>=3.0.0" "ipywidgets>=8.1.0" "matplotlib>=3.8" "umap-learn>=0.5.6"

try:
    import torch
    print("Torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA device name:", torch.cuda.get_device_name(0))
except Exception as e:
    print("PyTorch not available yet:", e)

%config InlineBackend.figure_format = 'retina'
from IPython.display import display, HTML
try:
    import ipywidgets as widgets
    from ipywidgets import interact, interactive
    print("ipywidgets:", widgets.__version__)
except Exception as e:
    print("ipywidgets not available yet:", e)

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)


In [None]:

# %% [utils] Small helpers used throughout
import numpy as np
import matplotlib.pyplot as plt  # used by show_heatmap; keeps this cell self-contained

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def cosine_sim(a, b, eps=1e-9):
    a_norm = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_norm = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return np.dot(a_norm, b_norm.T)

def show_heatmap(mat, xticklabels=None, yticklabels=None, title=""):
    plt.figure()
    plt.imshow(mat, aspect="auto")
    plt.colorbar()
    if xticklabels is not None: plt.xticks(range(len(xticklabels)), xticklabels, rotation=45, ha="right")
    if yticklabels is not None: plt.yticks(range(len(yticklabels)), yticklabels)
    plt.title(title)
    plt.tight_layout()
    plt.show()


In [None]:

# %% [experiment] Scaling effect demo (NumPy)
import ipywidgets as widgets

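def scaling_demo(d_k=16):
    T = 10  # number of tokens
    # Draw Q and K independently so each score is a dot product of unrelated
    # unit-variance vectors. With Q = K = X the diagonal holds ||x||^2 (about d_k),
    # which dominates the statistics and masks the sqrt(d_k) trend.
    rng = np.random.default_rng(42)  # fixed seed: same draw for a given d_k
    Q = rng.standard_normal((T, d_k))
    K = rng.standard_normal((T, d_k))
    scores_unscaled = Q @ K.T                       # std grows like sqrt(d_k)
    scores_scaled = scores_unscaled / np.sqrt(d_k)  # std stays near 1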
    print("d_k:", d_k)
    print("scores_unscaled std:", scores_unscaled.std())
    print("scores_scaled   std:", scores_scaled.std())
    show_heatmap(softmax(scores_unscaled, -1), title="Softmax(Unscaled)")
    show_heatmap(softmax(scores_scaled, -1),   title="Softmax(Scaled)")

widgets.interact(scaling_demo, d_k=widgets.IntSlider(min=4, max=256, step=4, value=16))
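
For a view that does not depend on the slider, the same experiment can be run as a sweep over `d_k`. The sketch below is a minimal, non-interactive variant. The function name `sweep_scaling`, the choice of "mean largest attention weight per row" as a sharpness measure, and the independently drawn `Q` and `K` are all assumptions of this example, not part of the demo cell above.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def sweep_scaling(dims=(4, 16, 64, 256), T=10, n_trials=200, seed=0):
    """For each d_k, return (mean max attention weight unscaled, scaled)."""
    rng = np.random.default_rng(seed)
    results = {}
    for d_k in dims:
        # Batches of independent standard-normal queries and keys.
        Q = rng.standard_normal((n_trials, T, d_k))
        K = rng.standard_normal((n_trials, T, d_k))
        scores = Q @ K.transpose(0, 2, 1)  # shape (n_trials, T, T)
        peak_unscaled = softmax(scores).max(axis=-1).mean()
        peak_scaled = softmax(scores / np.sqrt(d_k)).max(axis=-1).mean()
        results[d_k] = (float(peak_unscaled), float(peak_scaled))
    return results

results = sweep_scaling()
for d_k, (unscaled, scaled) in results.items():
    print(f"d_k={d_k:3d}  mean max weight: unscaled={unscaled:.2f}  scaled={scaled:.2f}")
```

A mean max weight near 1 means each row of the attention matrix is close to one-hot. Under these assumptions the unscaled column should climb toward 1 as `d_k` grows, while the scaled column should stay roughly flat.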


### Framework Tie-in
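- Inspect the source of PyTorch’s `nn.MultiheadAttention`, or confirm the scaling empirically: compare its returned attention weights with a manual softmax as you vary `embed_dim` (the per-head dimension is `embed_dim // num_heads`).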
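
One way to run that probe is sketched below. It assumes a single head and the default constructor arguments, so that the packed `in_proj_weight` and `in_proj_bias` parameters are present. It recomputes the attention weights by hand, with and without the `1 / sqrt(head_dim)` factor, and checks which version matches what the module returns.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
embed_dim, T = 32, 5  # try other embed_dim values to repeat the probe
mha = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
mha.eval()
x = torch.randn(1, T, embed_dim)  # (batch, tokens, embed_dim)

with torch.no_grad():
    # Attention weights as returned by the module, shape (1, T, T).
    _, attn_weights = mha(x, x, x, need_weights=True)

    # Recompute the query/key projections from the packed parameters.
    W_q, W_k, _ = mha.in_proj_weight.chunk(3, dim=0)
    b_q, b_k, _ = mha.in_proj_bias.chunk(3, dim=0)
    q = x @ W_q.T + b_q
    k = x @ W_k.T + b_k
    raw = q @ k.transpose(-2, -1)

    head_dim = embed_dim  # one head, so head_dim == embed_dim
    manual_scaled = torch.softmax(raw / math.sqrt(head_dim), dim=-1)
    manual_unscaled = torch.softmax(raw, dim=-1)

matches_scaled = torch.allclose(attn_weights, manual_scaled, atol=1e-5)
matches_unscaled = torch.allclose(attn_weights, manual_unscaled, atol=1e-5)
print("matches scaled softmax:  ", matches_scaled)
print("matches unscaled softmax:", matches_unscaled)
```

With more than one head the returned weights are averaged over heads by default, so the manual comparison would need to split `q` and `k` per head first.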



---
### Bonus: Multilingual Extension
- Swap the tokenizer/model for a multilingual variant (e.g., `bert-base-multilingual-cased` or `xlm-roberta-base`).
- Repeat a small slice of the notebook (tokenization, attention map) on non-English sentences and compare.



---
## Reflection & Next Steps
- What changed when you tweaked dimensions, temperatures, or prompts?
- Where did the attention concentrate, and did it match your intuition?
- Re-run the interactive widgets on your own text.
- Save a copy of the figures that best illustrate your understanding.
