# 🚀 Interactive Mode Available!

Typical static notebooks are boring. We have a dedicated interactive module for this week.

[👉 **Click here to open the Interactive Visualization**](../../interactive_platform/modules/week3_llm_variants/interactive.html)

*(Note: Open this link in a new tab to keep the notebook running)*

# Week 3: LLM Variants & Latent Spaces

Welcome to Week 3! We are moving from structure (Transformers) to actual models (GPT-2) and concepts (Latent Spaces).

## Goals:
1.  **Tokenization**: Understand how text becomes numbers.
2.  **GPT-2**: Verify the "Decoder-Only" architecture.
3.  **Latent Space**: Visualize word relationships.

In [None]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the smallest pretrained GPT-2 checkpoint and its matching tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()  # Inference mode: disables dropout

print("GPT-2 Loaded!")
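
Goal 2 refers to GPT-2 as "Decoder-Only". The defining feature of that design is causal self-attention: each token may attend only to itself and to earlier positions. The sketch below builds such a mask in plain Python to make the pattern visible. The helper name `causal_mask` is hypothetical and is not part of the `transformers` API; GPT-2 applies an equivalent mask inside its attention layers.

```python
def causal_mask(seq_len):
    """Row i marks which positions token i may attend to (1 = allowed)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

# Each row is one query position; the lower-triangular shape means
# no token can look at tokens that come after it.
for row in causal_mask(4):
    print(row)
```

The last row is all ones because the final token can see the whole prefix, which is what lets the model predict the next token from everything generated so far.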

## 1. Tokenization Deep Dive

GPT-2 uses byte-level Byte-Pair Encoding (BPE). Frequent words stay as single tokens, while rare or unfamiliar words are split into smaller subword pieces.

In [None]:
text = "The quick brown fox jumps over the lazy dog"
tokens = tokenizer.tokenize(text)  # Subword strings; 'Ġ' marks a leading space
ids = tokenizer.encode(text)       # The same tokens as vocabulary indices

print("Tokens:", tokens)
print("IDs:", ids)

# Try a rarer word: it should split into several subword tokens
complex_word = "antigravity"
print(f"\n'{complex_word}' tokens:", tokenizer.tokenize(complex_word))
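
To see where subword tokens come from, here is a minimal sketch of the BPE training loop on a toy corpus: repeatedly find the most frequent adjacent pair of symbols and merge it. This is a character-level simplification, not GPT-2's actual byte-level tokenizer, and the corpus and helper names (`toy_corpus`, `most_frequent_pair`, `merge_pair`) are made up for illustration.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Hypothetical toy corpus: word (as a tuple of characters) -> frequency
toy_corpus = {tuple("the"): 10, tuple("then"): 3, tuple("hen"): 1}

for step in range(2):
    pair = most_frequent_pair(toy_corpus)
    toy_corpus = merge_pair(toy_corpus, pair)
    print(f"merge {step + 1}: {pair} -> {list(toy_corpus)}")
```

After two merges the frequent word `the` has become a single symbol, while the rarer `then` is still split into `the` + `n`. The same effect explains why everyday words map to one GPT-2 token and unusual words map to several.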

## 2. Text Generation

Let's see the model in action.

In [None]:
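def generate(prompt, max_length=20):
    # Tokenize; this returns both input_ids and an attention_mask
    inputs = tokenizer(prompt, return_tensors='pt')

    # Sample a continuation; max_length counts the prompt tokens too
    out = model.generate(
        **inputs,
        max_length=max_length,
        do_sample=True,    # Sample from the distribution instead of greedy decoding
        temperature=0.7,   # Below 1.0 sharpens the distribution (less random output)
        pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token; reuse EOS
    )

    return tokenizer.decode(out[0], skip_special_tokens=True)

print(generate("Once upon a time in AI,"))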
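
The `temperature` argument divides the model's raw scores (logits) before they are turned into probabilities. The sketch below shows that effect with plain Python on a made-up set of three logits; `softmax_with_temperature` and `toy_logits` are illustrative names, not part of `transformers`.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # Subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores for three candidate tokens
toy_logits = [2.0, 1.0, 0.1]

for t in (1.0, 0.7):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

With a temperature below 1.0 the highest-scoring token receives a larger share of the probability mass, so sampled text becomes more predictable; values above 1.0 flatten the distribution and make output more varied.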

## 3. Investigating Latent Spaces

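We can inspect the input embedding matrix `wte` (Word Token Embeddings) to see which tokens the model places close together. These vectors are static: each token has one embedding regardless of context, so the similarities below compare tokens, not words in a sentence.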

In [None]:
word_pairs = [("King", "Queen"), ("Man", "Woman"), ("Paris", "France"), ("Apple", "Car")]

# Detach: we only read the weights, so no gradients are needed
embeddings = model.transformer.wte.weight.detach()  # [vocab_size, n_embd]

def embed(word):
    # GPT-2 tokenizes a word differently with a leading space; use the
    # mid-sentence form, and average if the word splits into several tokens
    ids = tokenizer.encode(" " + word)
    return embeddings[ids].mean(dim=0)

def get_sim(w1, w2):
    # Cosine similarity between the two word vectors
    return torch.nn.functional.cosine_similarity(embed(w1), embed(w2), dim=0).item()

for w1, w2 in word_pairs:
    print(f"Sim({w1}, {w2}) = {get_sim(w1, w2):.4f}")
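
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures direction and ignores magnitude. Here is a minimal plain-Python sketch on tiny made-up vectors; the standalone `cosine_similarity` function is written for illustration and is separate from the PyTorch call used in the cell.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # Same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # Orthogonal -> 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # Opposite direction -> -1.0
```

Scores near 1 mean two embeddings point the same way, scores near 0 mean they are unrelated in direction, and negative scores mean they point in opposing directions.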