# 00 — Map of the Journey
## Character-Level Language Modeling with LSTM

---


## 🎯 Concept Primer

### What is Character-Level Language Modeling?

A **character-level language model** predicts the next character in a sequence, given all previous characters. Unlike a word-level model, it works directly with individual letters, spaces, and punctuation marks.

**Example:**
- Input: `"You will rejoice to hea"`
- Model predicts: `"r"` (the next character)

### Why LSTM?

**LSTM (Long Short-Term Memory)** networks maintain memory through *gates* that control what information to keep, forget, or output:
- **Forget gate**: What to throw away from memory
- **Input gate**: What new information to store
- **Output gate**: What to output based on memory

This memory mechanism lets the model learn long-range patterns like "an opening quote needs a closing quote" or "an *if ... then* is often followed by an *else*."
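
For reference, the three gates can be written out in the commonly used textbook formulation below. The symbols are assumptions introduced here for illustration: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication. Library implementations may arrange the weights differently.

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$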

### What Breaks If We Skip This?

Without a clear picture of the pipeline, you'll struggle to debug shape mismatches, exploding losses, or nonsensical generated text.

---


## 📊 The Complete Pipeline

```
1. LOAD TEXT
   └─> Raw string: "You will rejoice..."

2. TOKENIZE (Character-Level)
   └─> List of chars: ['Y', 'o', 'u', ' ', 'w', 'i', 'l', 'l', ...]

3. BUILD VOCABULARY
   ├─> c2ix: {'a': 0, 'b': 1, ..., 'z': 25, ' ': 26, ...}
   └─> ix2c: {0: 'a', 1: 'b', ..., 26: ' ', ...}

4. CONVERT TO IDs
   └─> [34, 14, 20, 26, 22, 8, 11, 11, ...]

5. CREATE SLIDING WINDOWS
   ├─> Features: [34, 14, 20, 26, 22, 8, 11, 11]  (input sequence)
   └─> Labels:   [14, 20, 26, 22, 8, 11, 11, 26]  (shifted by 1)

6. DATALOADER (Batching)
   └─> Shape: [batch_size, seq_length]

7. LSTM MODEL
   ├─> Embedding: [B, T] → [B, T, embedding_dim]
   ├─> LSTM: [B, T, embedding_dim] → [B, T, hidden_size]
   └─> Linear: [B, T, hidden_size] → [B, T, vocab_size]

8. LOSS (CrossEntropyLoss)
   ├─> Logits: [B*T, vocab_size]  (flattened from [B, T, vocab_size])
   └─> Labels: [B*T]              (flattened from [B, T])

9. TRAINING LOOP
   └─> Forward → Loss → Backward → Optimizer Step

10. GENERATION (Sampling)
    ├─> Start with prompt: "You will rejoice to hear"
    ├─> Loop: feed last char (plus carried LSTM state) → get logits → sample → append
    └─> Generate 500 new characters
```
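
Steps 2 through 5 can be tried out in a few lines of plain Python. This is a toy sketch, not the notebooks' solution: `make_windows` and `decode` are hypothetical helper names, the vocabulary is built only from the short prompt, and the resulting IDs will therefore differ from the illustrative numbers in the diagram.

```python
# Toy sketch of pipeline steps 2-5 on a short string.
# Only c2ix / ix2c come from the pipeline above; other names are hypothetical.
text = "You will rejoice to hear"

# Step 2: character-level tokenization is just splitting into characters.
chars = list(text)

# Step 3: build the vocabulary from the sorted set of unique characters.
vocab = sorted(set(chars))
c2ix = {c: i for i, c in enumerate(vocab)}   # encoding: char -> ID
ix2c = {i: c for c, i in c2ix.items()}       # decoding: ID -> char

# Step 4: convert every character to its integer ID.
ids = [c2ix[c] for c in chars]


# Step 5: sliding windows; each label window is the feature window
# moved one position to the right.
def make_windows(ids, seq_length):
    features, labels = [], []
    for start in range(len(ids) - seq_length):
        features.append(ids[start : start + seq_length])
        labels.append(ids[start + 1 : start + seq_length + 1])
    return features, labels


def decode(seq):
    """Turn a list of IDs back into text using ix2c."""
    return "".join(ix2c[i] for i in seq)


features, labels = make_windows(ids, seq_length=8)

print(repr(decode(features[0])))  # 'You will'
print(repr(decode(labels[0])))    # 'ou will '
```

Decoding the windows back to text is a quick way to check the shift: every label window should read like its feature window with the first character dropped and the next character appended.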

---


## ✅ Objectives

By the end of this overview, you should be able to:

- [ ] Explain what "character-level" means and why it's simpler than word-level modeling
- [ ] Describe the three LSTM gates and their purposes
- [ ] Trace the flow from raw text → tokens → IDs → batches → model → loss
- [ ] Understand why we need both `c2ix` (encoding) and `ix2c` (decoding)
- [ ] Recognize tensor shapes at each pipeline stage

---


## 🎓 Acceptance Criteria

**You pass this notebook when:**

✅ You can narrate the full pipeline (steps 1-10 above) in under 90 seconds  
✅ You can explain why labels are "features shifted by 1"  
✅ You understand why we reshape to `[B*T, vocab_size]` for CrossEntropyLoss  

---


## 🧠 Key Concepts to Remember

### Why Character-Level?
- **Small vocabulary**: ~50-100 unique characters vs. 10,000+ words
- **No tokenizer complexity**: No need for BPE, WordPiece, etc.
- **Directly teaches sequence modeling**: Clear input/output alignment

### Shape Notation Used Throughout
- `B` = batch size (e.g., 36)
- `T` = sequence length / time steps (e.g., 48)
- `E` = embedding dimension (e.g., 48)
- `H` = hidden size (e.g., 96)
- `V` = vocab size (e.g., 50-80 depending on unique chars)
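
To see these letters in action, the sketch below pushes one random batch through an embedding, an LSTM, and a linear layer, then flattens for the loss. It assumes PyTorch, uses the example sizes listed above with an assumed `V = 60`, and feeds random IDs rather than real text. The layer names are hypothetical and the model in Notebook 04 may be organized differently.

```python
import torch
import torch.nn as nn

# Example sizes from the notation above; V = 60 is an assumed vocab size.
B, T, E, H, V = 36, 48, 48, 96, 60

embedding = nn.Embedding(V, E)
lstm = nn.LSTM(E, H, batch_first=True)  # batch_first=True keeps the [B, T, ...] layout
linear = nn.Linear(H, V)
loss_fn = nn.CrossEntropyLoss()

# Random stand-ins for a batch of character IDs and their shifted labels.
features = torch.randint(0, V, (B, T))
labels = torch.randint(0, V, (B, T))

embedded = embedding(features)     # [B, T, E]
outputs, (h, c) = lstm(embedded)   # outputs: [B, T, H]; h and c: [1, B, H]
logits = linear(outputs)           # [B, T, V]

# Flatten batch and time so each row is one prediction: [B*T, V] vs. [B*T].
loss = loss_fn(logits.reshape(B * T, V), labels.reshape(B * T))

print(embedded.shape, outputs.shape, logits.shape)
print(loss.item())
```

Flattening treats every position in every sequence as an independent classification over `V` characters, which is the simplest input layout `CrossEntropyLoss` accepts.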

### The Training Signal
**Predict the next character** — that's it! The model learns patterns like:
- After `"th"`, `"e"` is likely
- Open quotes need close quotes
- Sentence structure and punctuation patterns
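
The first pattern can be made concrete by simply counting which characters follow `"th"` in a piece of text. This is only a count-based illustration of the regularity in the data, not how the LSTM learns it. The sentence and the helper `next_char_counts` are hypothetical, chosen for the example.

```python
from collections import Counter

# Hypothetical toy corpus; the notebooks use the Frankenstein text instead.
text = "the other day they thought that the weather was rather fine"


def next_char_counts(text, context):
    """Count which characters follow each occurrence of `context` in `text`."""
    counts = Counter()
    start = text.find(context)
    while start != -1:
        nxt = start + len(context)
        if nxt < len(text):
            counts[text[nxt]] += 1
        start = text.find(context, start + 1)
    return counts


counts = next_char_counts(text, "th")
print(counts.most_common())
```

In this sentence `"e"` follows `"th"` far more often than any other character, which is the kind of signal next-character prediction rewards the model for picking up.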

---


## 💭 Reflection Prompt

**In your own words:**

1. What is a "hidden state" in an LSTM?  
   *(Write your one-sentence definition here before moving to the next notebook)*

2. Why do we need both `h` (hidden state) and `c` (cell state) in an LSTM?

3. What's the difference between *training* (using real next chars) and *generation* (sampling predicted chars)?

---


## 🚀 Next Steps

Proceed to:
- **Notebook 01**: Load and slice the Frankenstein text
- **Notebook 02**: Build character vocabulary and convert text to IDs
- **Notebook 03**: Create Dataset and DataLoader for batching
- **Notebook 04**: Define the LSTM model architecture
- **Notebook 05**: Train the model
- **Notebook 06**: Generate new text
- **Notebook 99**: Lab notes for reflections

---

*Remember: These notebooks contain TODOs and hints, not complete solutions. Learning happens when you write the code yourself!* 🎓
