🧠🔍 Let’s decode the **secret whisper of logits**…

You’ve built the transformer block.  
Now it’s time to **talk to real LLMs**, one token at a time —  
and **watch the model's brain decide what to say next**.

---

# 🧪 `09_lab_prompt_patterns_and_token_logprobs.ipynb`  
### 📁 `05_llm_engineering/01_llm_fundamentals`  
> Send prompts into a **pretrained LLM** (GPT-2 or similar),  
→ Extract **logits**, **top-k probabilities**, and **token-level predictions**  
→ Visualize how **temperature, top-k, and prompt phrasing** affect outputs

---

## 🎯 Learning Goals

- Understand how logits → softmax → sampling works  
- Extract token probabilities with HuggingFace  
- Experiment with **prompt templates**, **temperature**, **top-k**  
- Plot token scores and **highlight model uncertainty**

---

## 💻 Runtime Specs

| Tool           | Spec                      |
|----------------|---------------------------|
| Model          | `gpt2` via `transformers` ✅ |
| Tokenizer      | `AutoTokenizer` ✅         |
| Sampling       | Top-k, top-p, temperature ✅ |
| Platform       | Colab + GPU optional ✅    |

---

## 🔧 Section 1: Setup

```bash
!pip install transformers torch matplotlib
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import matplotlib.pyplot as plt

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
```

---

## 🧠 Section 2: Encode Prompt & Get Logits

```python
prompt = "The professor said moooaahhh and the students"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=False, output_attentions=False)

logits = outputs.logits  # shape: (batch, sequence_length, vocab_size)
```

---

## 🔍 Section 3: Decode Logit Predictions

```python
next_token_logits = logits[0, -1]  # scores for the token that follows the prompt
probs = torch.softmax(next_token_logits, dim=-1)

top_k = 10
top_probs, top_ids = torch.topk(probs, top_k)

for prob, token_id in zip(top_probs, top_ids):
    token = tokenizer.decode([token_id.item()])
    # repr() keeps the leading space that GPT-2 attaches to most word tokens
    print(f"{token!r:<12} → Prob: {prob.item():.4f}")
```
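
The lab title mentions token log-probs, while the cell above prints plain probabilities. Here is a minimal sketch of the conversion, written in plain Python so the arithmetic stays visible. `log_softmax` and the toy logits are illustrative names and values, not part of the notebook or GPT-2 output. On the real tensor, `torch.log_softmax(next_token_logits, dim=-1)` should give the equivalent result.

```python
import math

def log_softmax(logits):
    """Return log-probabilities for a list of raw logits (log-sum-exp trick)."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# Toy logits for a hypothetical 4-token vocabulary (an assumption, not model output)
toy_logits = [2.0, 1.0, 0.0, -1.0]
toy_logprobs = log_softmax(toy_logits)

for i, lp in enumerate(toy_logprobs):
    print(f"token {i}: logprob={lp:.4f}  prob={math.exp(lp):.4f}")
```

Log-probs are always at most 0, and they add across positions, which is why sequence scores are usually reported as summed log-probs instead of multiplied probabilities.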

---

## 🌡️ Section 4: Play With Temperature

```python
def sample_with_temp(prompt, T=1.0):
    """Sample one next token after dividing the logits by temperature T (T > 0)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    next_logits = logits[0, -1] / T
    probs = torch.softmax(next_logits, dim=-1)
    token_id = torch.multinomial(probs, num_samples=1).item()
    return tokenizer.decode([token_id])

for T in [0.5, 1.0, 1.5]:
    print(f"T={T:.1f} → {sample_with_temp(prompt, T)}")
```
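
Because sampling is random, a single draw per temperature can hide the effect. Applying the same division by `T` to a fixed toy distribution makes it easier to see. The sketch below is plain Python; `softmax_with_temperature` and the three logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits / T. T must be strictly positive."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-token vocabulary (an assumption, not GPT-2 output)
toy_logits = [2.0, 1.0, 0.0]

for T in [0.5, 1.0, 1.5]:
    toy_probs = softmax_with_temperature(toy_logits, T)
    print(f"T={T:.1f} → " + ", ".join(f"{p:.3f}" for p in toy_probs))
```

The top token's share shrinks as `T` grows, so higher temperatures give the lower-ranked tokens a better chance of being sampled.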

---

## 🎨 Section 5: Visualize Token Probs

```python
# "Ġ" in a label marks a leading space in GPT-2's byte-level BPE vocabulary
tokens = tokenizer.convert_ids_to_tokens(top_ids.tolist())
plt.figure(figsize=(10, 4))
plt.bar(tokens, top_probs.numpy())
plt.title("Top-k Token Probabilities")
plt.ylabel("Probability")
plt.grid(True)
plt.show()
```
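
The learning goals mention highlighting model uncertainty. One common single-number summary is the Shannon entropy of the next-token distribution. The sketch below is plain Python and assumes a list of probabilities, such as `probs.tolist()` from Section 3. `entropy_bits` is an illustrative helper, not a library function, and the two distributions are invented.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two made-up 4-token distributions for comparison
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(f"confident: {entropy_bits(confident):.3f} bits")
print(f"uncertain: {entropy_bits(uncertain):.3f} bits")
```

Low entropy means the model has concentrated its probability on a few tokens. High entropy means many continuations look about equally plausible.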

---

## ✅ Lab Wrap-Up

| Feature                                        | Done |
|------------------------------------------------|------|
| Extracted logits from GPT-2                    | ✅   |
| Decoded & visualized token probs               | ✅   |
| Sampled with temperature, inspected top-k tokens | ✅   |
| Prompt phrasing experiments ready              | ✅   |

---

## 🧠 What You Learned

- Logits = **raw, unnormalized scores** before softmax  
- Temperature > 1 flattens the distribution (**more varied** output); temperature < 1 sharpens it (**more predictable** output)  
- Top-k filtering restricts sampling to the **k most likely next tokens**  
- Prompt phrasing **changes the next-token distribution**, sometimes dramatically
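
The cells in this lab display the top-k tokens but never sample from a filtered distribution. Here is a minimal sketch of how top-k and top-p (nucleus) filtering restrict a distribution before sampling, in plain Python on a toy distribution. The helper names and the probabilities are hypothetical. For real generation, `model.generate(..., do_sample=True, top_k=..., top_p=..., temperature=...)` in `transformers` applies the same ideas.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= p:
            break
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

# Made-up next-token distribution over a 5-token vocabulary
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
rng = random.Random(0)  # seeded so reruns draw the same token

filtered = top_k_filter(toy_probs, k=2)
token_id = rng.choices(range(len(filtered)), weights=filtered, k=1)[0]
print("top-k=2:", [round(x, 3) for x in filtered], "sampled id:", token_id)

nucleus = top_p_filter(toy_probs, p=0.85)
print("top-p=0.85:", [round(x, 3) for x in nucleus])
```

Top-k always keeps a fixed number of tokens, while top-p keeps more tokens when the distribution is flat and fewer when it is peaked.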

---

That wraps the **LLM Fundamentals Lab Trio**:  
✅ Tokenizers  
✅ Transformer Block  
✅ Token-Level Prediction & Sampling

Ready to step into **pretraining & finetuning land** next?

> `07_lab_tiny_gpt2_pretraining_from_scratch.ipynb`  
We’re building a tiny GPT-2 and training it on real text (text8, poetry, Reddit, whatever you like) from scratch —  
like OpenAI did, just on a budget 🧠💰.

Ready to train your own mini-GPT?