# 5. Text Generation

**Estimated Time**: ~2 hours

**Prerequisites**: Notebooks 1-4 (understanding of tokenization, pipelines, and encoder-decoder architecture from summarization)

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** how autoregressive language models generate text one token at a time
2. **Apply** different decoding strategies (greedy, beam search, sampling)
3. **Control** creativity vs coherence using temperature, top-k, and top-p parameters
4. **Prevent** repetitive text using repetition penalty and no-repeat n-gram settings
5. **Build** a creative writing assistant with mood control

## Setup

Run this cell first. If you completed Notebooks 1-4, you already have the core packages ready.

In [None]:
# Core imports
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

print("Setup complete!")

---

# Part 1: Conceptual Foundation

## What is Text Generation?

**In plain English**: Text generation is when a model writes new text by predicting one word (more precisely, one token) at a time, using what it has already written to decide what comes next.

**Technical definition**: Autoregressive text generation models predict the probability of the next token given all previous tokens, then sample or select from that distribution to extend the sequence.
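
Stated as a formula, the definition above factorizes the probability of a whole sequence into next-token probabilities. The notation here ($x_t$ for the token at position $t$, $T$ for the sequence length) is introduced for illustration and is not used elsewhere in this notebook:

$$
P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})
$$

Generation evaluates one factor at a time: it computes $P(x_t \mid x_1, \dots, x_{t-1})$, picks a token from that distribution, appends it, and repeats.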

### Visual Example

```
PROMPT: "The robot walked into the kitchen and"

Step 1: "The robot walked into the kitchen and" → [opened] (most likely next word)
Step 2: "The robot walked into the kitchen and opened" → [the] 
Step 3: "The robot walked into the kitchen and opened the" → [refrigerator]
Step 4: "The robot walked into the kitchen and opened the refrigerator" → [.]

OUTPUT: "The robot walked into the kitchen and opened the refrigerator."
```

### Text Generation vs Summarization

| Aspect | Summarization (Notebook 4) | Text Generation (This Notebook) |
|--------|---------------------------|----------------------------------|
| **Goal** | Compress long text to short | Extend short text to longer |
| **Architecture** | Encoder-Decoder | Decoder-only (typically) |
| **Input** | Complete document | Prompt/starter text |
| **Output** | Shorter, faithful summary | Creative continuation |
| **Key Challenge** | Preserve key information | Maintain coherence & creativity |

```
SUMMARIZATION (Notebook 4):               TEXT GENERATION (This Notebook):
┌────────────────────────┐                ┌────────────────────────┐
│ Long input document    │                │ Short prompt           │
│ (500 words)            │                │ (10 words)             │
└───────────┬────────────┘                └───────────┬────────────┘
            │ COMPRESS                                │ EXPAND
            ▼                                         ▼
┌────────────────────────┐                ┌────────────────────────┐
│ Short summary          │                │ Long continuation      │
│ (50 words)             │                │ (200+ words)           │
└────────────────────────┘                └────────────────────────┘
```

### How Autoregressive Generation Works: Decoder-Only

Unlike summarization's encoder-decoder, most text generation uses **decoder-only** models:

```
              DECODER-ONLY MODEL (e.g., GPT-2)
    ┌──────────────────────────────────────────────────────┐
    │                                                      │
    │  "The robot walked" → [Predict next] → "into"       │
    │                                                      │
    │  "The robot walked into" → [Predict next] → "the"   │
    │                                                      │
    │  "The robot walked into the" → [Predict next] → ... │
    │                                                      │
    └──────────────────────────────────────────────────────┘

Each token prediction uses ALL previous tokens as context.
This is called "autoregressive" - each output depends on previous outputs.
```
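
The loop in the diagram can be sketched in plain Python. This is a toy illustration, not how GPT-2 works internally: `TOY_MODEL`, `next_token_probs`, and `generate_greedy` are hypothetical names, and the lookup table only looks at the last token, whereas a real decoder-only model conditions on every previous token.

```python
# Toy stand-in for a language model: maps the most recent token to
# next-token probabilities. Purely illustrative (an assumption); a real
# model such as GPT-2 conditions on the whole sequence.
TOY_MODEL = {
    "The": {"robot": 0.6, "cat": 0.4},
    "robot": {"walked": 0.7, "slept": 0.3},
    "walked": {"into": 0.8, "away": 0.2},
    "into": {"the": 0.9, "a": 0.1},
    "the": {"kitchen": 0.6, "garden": 0.4},
}


def next_token_probs(tokens):
    """Return the toy next-token distribution (empty dict means stop)."""
    return TOY_MODEL.get(tokens[-1], {})


def generate_greedy(prompt_tokens, max_new_tokens=10):
    """Autoregressive loop: predict, append, then feed the result back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        if not probs:  # no known continuation: stop early
            break
        best = max(probs, key=probs.get)  # greedy choice
        tokens.append(best)  # the output becomes part of the next input
    return tokens


print(" ".join(generate_greedy(["The", "robot"])))
# The robot walked into the kitchen
```

The key line is `tokens.append(best)`: each prediction is appended to the sequence that the next prediction is based on.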

Popular decoder-only models for text generation:
- **GPT-2** (OpenAI): The classic open-source text generator
- **GPT-Neo/GPT-J** (EleutherAI): Open-source GPT alternatives
- **Llama/Mistral** (Meta/Mistral AI): Modern open-weight models

### Connection to Previous Notebooks

| Notebook | Architecture | Direction |
|----------|--------------|----------|
| 1-3 (MLM, NER, QA) | Encoder-only | Understanding |
| 4 (Summarization) | Encoder-Decoder | Input → Shorter output |
| **5 (Text Generation)** | **Decoder-only** | **Input → Longer output** |

```
ENCODER-ONLY (Notebooks 1-3):    ENCODER-DECODER (Notebook 4):    DECODER-ONLY (This Notebook):
┌────────────────┐               ┌─────────┐ ┌─────────┐          ┌────────────────┐
│    ENCODER     │               │ ENCODER │─│ DECODER │          │    DECODER     │
│                │               │         │ │         │          │                │
│ Bidirectional  │               │ Encode  │ │ Decode  │          │ Left-to-right  │
│ context        │               │ input   │ │ output  │          │ generation     │
└────────────────┘               └─────────┘ └─────────┘          └────────────────┘
     Tasks:                           Tasks:                           Tasks:
     - Fill masks                     - Summarization                  - Story writing
     - NER                            - Translation                    - Code completion
     - QA extraction                                                   - Chatbots
```

### The Core Challenge: Decoding Strategies

At each step, the model outputs **probabilities for all possible next tokens**. How do we choose which one?

```
After "The cat sat on the":

Token Probabilities:
┌────────────┬───────────┐
│ Token      │ Prob      │
├────────────┼───────────┤
│ mat        │ 0.35      │  ← Highest (greedy would pick this)
│ floor      │ 0.25      │
│ couch      │ 0.15      │
│ bed        │ 0.10      │
│ chair      │ 0.08      │
│ ...        │ ...       │
└────────────┴───────────┘

Different strategies choose differently from this distribution!
```

| Strategy | Description | When to Use |
|----------|-------------|-------------|
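| **Greedy** | Always pick the highest-probability token | Predictable, deterministic output |
| **Beam Search** | Track several candidate sequences and keep the highest-scoring ones | Higher-likelihood output than greedy |
| **Top-k Sampling** | Sample randomly from the k most likely tokens | Creative, varied output |
| **Top-p (Nucleus)** | Sample randomly from the smallest set of tokens whose probabilities sum to at least p | Creative output with an adaptive pool size |
| **Temperature** | Rescale the distribution to be sharper or flatter | Tuning randomness when sampling |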
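
To make the table concrete, here is a minimal pure-Python sketch that applies greedy selection, top-k filtering, and top-p filtering to the example distribution for "The cat sat on the". The helper names are hypothetical, and the `<other>` entry is an assumption that lumps together the 0.07 of probability mass the example leaves unspecified. Real libraries may handle the exact top-p boundary slightly differently.

```python
import random

# Probabilities from the example above; "<other>" is a placeholder for
# the unspecified remainder of the distribution (an assumption).
probs = {"mat": 0.35, "floor": 0.25, "couch": 0.15,
         "bed": 0.10, "chair": 0.08, "<other>": 0.07}


def renormalize(dist):
    """Rescale the kept probabilities so they sum to 1 again."""
    total = sum(dist.values())
    return {tok: p / total for tok, p in dist.items()}


def greedy(dist):
    """Pick the single most likely token."""
    return max(dist, key=dist.get)


def top_k_filter(dist, k):
    """Keep only the k most likely tokens."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return renormalize(dict(ranked[:k]))


def top_p_filter(dist, p):
    """Keep the smallest set of top tokens whose total mass reaches p."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for tok, prob in ranked:
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return renormalize(kept)


def sample(dist, rng):
    """Draw one token at random, weighted by probability."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
print("greedy:", greedy(probs))
print("top-k (k=3) pool:", list(top_k_filter(probs, 3)))
print("top-p (p=0.9) pool:", list(top_p_filter(probs, 0.9)))
print("sampled from top-k pool:", sample(top_k_filter(probs, 3), rng))
```

Note that top-k always keeps exactly k tokens, while the top-p pool grows or shrinks with how spread out the distribution is.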

### Real-World Applications

Text generation powers many practical applications:

- **Creative Writing**: Story continuation, poetry, dialogue
- **Code Completion**: GitHub Copilot, code suggestions
- **Chatbots**: Conversational AI responses
- **Content Creation**: Marketing copy, product descriptions
- **Autocomplete**: Email suggestions, search queries
- **Data Augmentation**: Generate synthetic training data

### Key Terminology

| Term | Definition |
|------|------------|
| **Autoregressive** | Each output depends on all previous outputs |
| **Decoding** | The process of choosing tokens from the model's output probabilities to build the text |
| **Temperature** | Controls randomness (lower = focused, higher = random) |
| **Top-k** | Limits selection to the k most likely tokens |
| **Top-p (Nucleus)** | Limits selection to the smallest set of most likely tokens whose cumulative probability reaches p |
| **Beam Search** | Explores multiple candidate sequences simultaneously |
| **Repetition Penalty** | Discourages the model from repeating tokens |
| **Prompt** | The input text that starts/guides generation |
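
The Beam Search entry is the hardest one to picture, so here is a small sketch under simplifying assumptions: `BEAM_TOY_MODEL` is a made-up next-token table keyed by the previous token, and `beam_search` is a hypothetical helper, not the Transformers implementation. It shows a case where greedy decoding (a beam of width 1) commits to the locally best first token and ends up with a less likely sequence overall.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
BEAM_TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"bird": 0.9, "cat": 0.1},
}


def beam_search(start, steps, num_beams):
    """Keep the num_beams highest-scoring sequences at every step."""
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            options = BEAM_TOY_MODEL.get(tokens[-1], {})
            if not options:  # nothing to extend: carry the sequence over
                candidates.append((tokens, score))
                continue
            for tok, p in options.items():
                candidates.append((tokens + [tok], score + math.log(p)))
        # Rank all extensions and keep only the best num_beams of them
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


greedy_tokens, greedy_score = beam_search("<s>", steps=2, num_beams=1)[0]
beam_tokens, beam_score = beam_search("<s>", steps=2, num_beams=2)[0]

print(greedy_tokens, round(math.exp(greedy_score), 2))  # ['<s>', 'the', 'cat'] 0.33
print(beam_tokens, round(math.exp(beam_score), 2))      # ['<s>', 'a', 'bird'] 0.36
```

Scores are summed log-probabilities rather than multiplied probabilities, which avoids numerical underflow on long sequences.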

### Check Your Understanding

Before moving on, try to answer these questions (answers at the end):

1. What does "autoregressive" mean in the context of text generation?
   - A) The model generates all tokens simultaneously
   - B) Each token prediction depends on all previous tokens
   - C) The model only looks at the original prompt

2. What does higher temperature do to text generation?
   - A) Makes output more random and diverse
   - B) Makes output more focused and deterministic
   - C) Makes generation faster

3. Which architecture do most text generation models use?
   - A) Encoder-only (like BERT)
   - B) Encoder-Decoder (like BART)
   - C) Decoder-only (like GPT)

4. What is the purpose of top-k sampling?
   - A) Generate exactly k tokens
   - B) Limit selection to the k most likely next tokens
   - C) Run the model k times and average results

---

# Part 2: Basic Implementation

## Your First Text Generation Pipeline

Let's create a text generation pipeline and write some continuations:

In [None]:
# Create a text generation pipeline
# Using GPT-2, a classic and lightweight text generator
generator = pipeline("text-generation", model="gpt2")

# Simple prompt to continue
prompt = "In a distant galaxy, a young explorer discovered"

# Generate continuation
result = generator(
    prompt,
    max_new_tokens=50,  # Generate up to 50 new tokens
    num_return_sequences=1,  # Return 1 sequence
    do_sample=True,  # Use sampling for variety
    temperature=0.7,  # Moderate creativity
    pad_token_id=generator.tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

print("Prompt:")
print(f"  {prompt}")
print(f"\n{'='*60}\n")
print("Generated continuation:")
print(f"  {result[0]['generated_text']}")

### Understanding the Output

The text generation pipeline returns a list of dictionaries, each containing:
- `generated_text`: The complete text (prompt + generated continuation)
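
Because `generated_text` includes the prompt, the cells in this notebook slice it off with `full_text[len(prompt):]`. A slightly more defensive version of that slicing might look like the sketch below. `split_generation` is a hypothetical helper, and `fake_result` is a hand-written stand-in shaped like the pipeline's output, not real model output.

```python
def split_generation(full_text, prompt):
    """Split pipeline output into (prompt, continuation)."""
    if full_text.startswith(prompt):
        return prompt, full_text[len(prompt):].strip()
    # If the returned text does not start with the prompt exactly,
    # treat the whole string as the continuation instead of mis-slicing.
    return "", full_text.strip()


# Hand-written stand-in with the same shape as the pipeline's return value
fake_result = [{"generated_text": "In a distant galaxy, a young explorer discovered a map."}]
prompt_text = "In a distant galaxy, a young explorer discovered"

print(split_generation(fake_result[0]["generated_text"], prompt_text))
# ('In a distant galaxy, a young explorer discovered', 'a map.')
```

The text-generation pipeline also accepts `return_full_text=False`, which returns only the continuation and makes the slicing unnecessary.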

Let's examine what was generated:

In [None]:
# Analyze the generation
full_text = result[0]['generated_text']
generated_only = full_text[len(prompt):]

print("Generation Statistics:")
print("="*40)
print(f"  Prompt length:     {len(prompt.split())} words")
print(f"  Generated length:  {len(generated_only.split())} words")
print(f"  Total length:      {len(full_text.split())} words")
print(f"\nGenerated text only:")
print(f"  {generated_only.strip()}")

### Generating Multiple Sequences

One powerful feature is generating multiple different continuations from the same prompt:

In [None]:
# Generate multiple continuations
prompt = "The secret to happiness is"

results = generator(
    prompt,
    max_new_tokens=30,
    num_return_sequences=3,  # Generate 3 different continuations
    do_sample=True,
    temperature=0.8,
    pad_token_id=generator.tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

print(f"Prompt: \"{prompt}\"")
print("="*60)

for i, result in enumerate(results, 1):
    continuation = result['generated_text'][len(prompt):].strip()
    print(f"\nVersion {i}:")
    print(f"  ...{continuation}")

### Different Types of Prompts

Text generation works with many types of starting prompts:

In [None]:
# Different prompt styles
prompts = {
    "Story Opening": "Once upon a time, in a castle made of crystal,",
    "News Headline": "Breaking: Scientists announce that",
    "Technical": "To implement a binary search algorithm, first",
    "Dialogue": '"I can\'t believe you actually did it," she said.',
    "Question Start": "The most important question we must ask is:",
}

print("Different Prompt Styles:")
print("="*70)

for style, prompt in prompts.items():
    result = generator(
        prompt,
        max_new_tokens=25,
        do_sample=True,
        temperature=0.7,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    
    continuation = result[0]['generated_text'][len(prompt):].strip()
    print(f"\n[{style}]")
    print(f"  Prompt: {prompt}")
    print(f"  Generated: ...{continuation}")

---

## Exercise 1: Multiple Story Starters (Guided)

**Difficulty**: Basic | **Time**: 10-15 minutes

**Your task**: Create a function that generates multiple creative continuations for story prompts.

### Step 1: Create a story generator function

In [None]:
def generate_story_continuations(prompt, num_versions=3, length=50):
    """
    Generate multiple story continuations from a prompt.
    
    Args:
        prompt: The story starter
        num_versions: How many different continuations to generate
        length: Maximum tokens to generate
        
    Returns:
        List of continuation strings (without the prompt)
    """
    results = generator(
        prompt,
        max_new_tokens=length,
        num_return_sequences=num_versions,
        do_sample=True,
        temperature=0.8,
        top_k=50,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    
    continuations = []
    for result in results:
        continuation = result['generated_text'][len(prompt):].strip()
        continuations.append(continuation)
    
    return continuations


# Test the function
test_prompt = "The old lighthouse keeper had a secret that"

stories = generate_story_continuations(test_prompt, num_versions=3)

print(f"Story Prompt: \"{test_prompt}\"")
print("="*60)

for i, story in enumerate(stories, 1):
    print(f"\n--- Version {i} ---")
    print(f"...{story}")

### Step 2: Try different genres

In [None]:
# Genre-specific prompts
genre_prompts = {
    "Mystery": "Detective Morgan examined the bloody knife and realized",
    "Sci-Fi": "The spaceship's AI suddenly announced,",
    "Romance": "Their eyes met across the crowded room, and",
    "Horror": "The door creaked open by itself, revealing",
}

print("Genre-Based Story Generation:")
print("="*70)

for genre, prompt in genre_prompts.items():
    continuations = generate_story_continuations(prompt, num_versions=2, length=40)
    
    print(f"\n[{genre.upper()}]")
    print(f"Prompt: \"{prompt}\"")
    for i, cont in enumerate(continuations, 1):
        print(f"  Version {i}: ...{cont[:100]}...")

### Step 3: Try your own prompts

In [None]:
# YOUR CODE HERE
# Create your own story prompts and generate continuations

my_prompt = "Write your story starter here"

# Uncomment to run:
# my_stories = generate_story_continuations(my_prompt, num_versions=3)
# for i, story in enumerate(my_stories, 1):
#     print(f"\nVersion {i}: ...{story}")

---

# Part 3: Intermediate Exploration

## Decoding Strategies Deep Dive

How we select the next token dramatically affects output quality. Let's explore each strategy:

In [None]:
# Sample prompt for experiments
experiment_prompt = "The future of artificial intelligence"

# GREEDY DECODING: Always pick the highest probability token
print("DECODING STRATEGY COMPARISON")
print("="*70)
print(f"Prompt: \"{experiment_prompt}\"\n")

# Greedy (deterministic)
greedy_result = generator(
    experiment_prompt,
    max_new_tokens=40,
    do_sample=False,  # Greedy - no sampling
    pad_token_id=generator.tokenizer.eos_token_id,
)
print("[GREEDY DECODING]")
print("  Always picks the highest probability token.")
print("  Deterministic - same output every time.")
print(f"  Result: ...{greedy_result[0]['generated_text'][len(experiment_prompt):].strip()}")

In [None]:
# BEAM SEARCH: Explore multiple paths, keep best ones
beam_result = generator(
    experiment_prompt,
    max_new_tokens=40,
    num_beams=4,  # Keep the 4 best candidate sequences at each step
    do_sample=False,
    early_stopping=True,
    pad_token_id=generator.tokenizer.eos_token_id,
)
print("\n[BEAM SEARCH (num_beams=4)]")
print("  Explores multiple candidate sequences.")
print("  Usually finds a higher-probability sequence than greedy; still deterministic.")
print(f"  Result: ...{beam_result[0]['generated_text'][len(experiment_prompt):].strip()}")

In [None]:
# TOP-K SAMPLING: Random selection from top k tokens
print("\n[TOP-K SAMPLING (k=50)]")
print("  Randomly samples from the 50 most likely tokens.")
print("  Different output each time (stochastic).")

for i in range(2):
    topk_result = generator(
        experiment_prompt,
        max_new_tokens=40,
        do_sample=True,
        top_k=50,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    print(f"  Run {i+1}: ...{topk_result[0]['generated_text'][len(experiment_prompt):].strip()[:80]}...")

In [None]:
# TOP-P (NUCLEUS) SAMPLING: Sample from smallest set covering p probability
print("\n[TOP-P SAMPLING (p=0.9)]")
print("  Samples from smallest set of tokens covering 90% probability.")
print("  Adaptive - uses fewer tokens when model is confident.")

for i in range(2):
    topp_result = generator(
        experiment_prompt,
        max_new_tokens=40,
        do_sample=True,
        top_p=0.9,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    print(f"  Run {i+1}: ...{topp_result[0]['generated_text'][len(experiment_prompt):].strip()[:80]}...")

### Understanding Temperature

Temperature is the most intuitive creativity control. The model's logits are divided by the temperature before the softmax, which adjusts how "sharp" the resulting probability distribution is:

```
Original probabilities: [0.5, 0.3, 0.15, 0.05]

Low temperature (0.3):  [0.83, 0.15, 0.015, 0.0004]  ← More focused
High temperature (1.5): [0.42, 0.30, 0.19, 0.09]     ← More uniform
```
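
The rescaling can be reproduced in a few lines. This is a minimal sketch, assuming the listed probabilities come from a softmax, so that `log(p)` can stand in for the logits (the two differ only by a constant, which cancels during normalization). `apply_temperature` is a hypothetical helper name.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability list as softmax(log(p) / temperature)."""
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


original = [0.5, 0.3, 0.15, 0.05]
for t in (0.3, 1.0, 1.5):
    rescaled = apply_temperature(original, t)
    print(f"T={t}: {[round(p, 3) for p in rescaled]}")
```

A temperature of 1.0 leaves the distribution unchanged, values below 1.0 concentrate mass on the top token, and values above 1.0 move mass toward the tail.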

In [None]:
# Temperature comparison
temp_prompt = "The meaning of life is"

temperatures = [0.3, 0.7, 1.0, 1.5]

print("TEMPERATURE COMPARISON")
print("="*70)
print(f"Prompt: \"{temp_prompt}\"\n")

for temp in temperatures:
    result = generator(
        temp_prompt,
        max_new_tokens=35,
        do_sample=True,
        temperature=temp,
        top_k=50,
        pad_token_id=generator.tokenizer.eos_token_id,
    )
    
    creativity = "Very focused" if temp < 0.5 else "Balanced" if temp < 1.0 else "Creative" if temp < 1.3 else "Wild"
    continuation = result[0]['generated_text'][len(temp_prompt):].strip()
    
    print(f"[Temperature = {temp}] ({creativity})")
    print(f"  ...{continuation[:100]}...\n")

### Preventing Repetition

A common problem with text generation is repetitive output. Several parameters help prevent this:

In [None]:
# Repetition problem demonstration
rep_prompt = "I love pizza because pizza is"

print("REPETITION PREVENTION")
print("="*70)
print(f"Prompt: \"{rep_prompt}\" (designed to encourage repetition)\n")

# Without any repetition control
no_control = generator(
    rep_prompt,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    pad_token_id=generator.tokenizer.eos_token_id,
)
print("[No Repetition Control]")
print(f"  ...{no_control[0]['generated_text'][len(rep_prompt):].strip()}\n")

# With repetition penalty
with_penalty = generator(
    rep_prompt,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,  # Penalize repeated tokens
    pad_token_id=generator.tokenizer.eos_token_id,
)
print("[Repetition Penalty = 1.2]")
print(f"  ...{with_penalty[0]['generated_text'][len(rep_prompt):].strip()}\n")

# With no_repeat_ngram_size
with_ngram = generator(
    rep_prompt,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    no_repeat_ngram_size=3,  # No 3-gram can repeat
    pad_token_id=generator.tokenizer.eos_token_id,
)
print("[No Repeat N-gram = 3]")
print(f"  ...{with_ngram[0]['generated_text'][len(rep_prompt):].strip()}")

### Parameter Reference Guide

| Parameter | Range | Effect | Recommended |
|-----------|-------|--------|-------------|
| `temperature` | 0.1 - 2.0 | Creativity level | 0.7-0.9 for creative, 0.3-0.5 for focused |
| `top_k` | 1 - 100 | Token pool size | 40-60 |
| `top_p` | 0.1 - 1.0 | Dynamic token pool | 0.9-0.95 |
| `repetition_penalty` | 1.0 - 2.0 | Discourage repeats | 1.1-1.3 |
| `no_repeat_ngram_size` | 2 - 5 | Block n-gram repeats | 2-3 |
| `num_beams` | 1 - 10 | Beam search width | 4-6 for quality |
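
The two repetition controls in the table work differently: `repetition_penalty` makes already-used tokens less likely, while `no_repeat_ngram_size` forbids certain tokens outright. The sketch below illustrates both ideas on plain Python lists. The function names are hypothetical, and the penalty rule shown (divide positive logits, multiply negative ones) is the commonly used formulation, given here as an illustration and not as the library's exact code.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already-generated tokens less likely.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so both move toward "less likely" when penalty > 1.
    """
    adjusted = list(logits)
    for token_id in set(generated_ids):  # penalize each seen token once
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted


def banned_next_tokens(generated_ids, ngram_size):
    """Return tokens that would complete an n-gram already in the text."""
    if len(generated_ids) < ngram_size:
        return set()  # no complete n-gram exists yet
    # The last (n - 1) tokens form the start of the next n-gram
    prefix = tuple(generated_ids[-(ngram_size - 1):]) if ngram_size > 1 else ()
    banned = set()
    for i in range(len(generated_ids) - ngram_size + 1):
        ngram = tuple(generated_ids[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


# Token IDs 0 and 1 were already generated; token 2 was not
print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 0], penalty=1.2))

# The text so far ends in (1, 2); the 3-gram (1, 2, 3) already occurred
print(banned_next_tokens([1, 2, 3, 1, 2], ngram_size=3))  # {3}
```

The penalty is a soft nudge that sampling can still override, whereas a banned token cannot be chosen at all, which is why small n-gram sizes can make output sound unnatural.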

---

## Exercise 2: Temperature Exploration (Semi-guided)

**Difficulty**: Intermediate | **Time**: 15-20 minutes

**Your task**: Build a function that generates text at multiple temperatures and helps you understand the creativity-coherence tradeoff.

**Hints**:
1. Generate the same prompt at temperatures from 0.2 to 1.8
2. Analyze the outputs for coherence and creativity
3. Consider measuring vocabulary diversity

In [None]:
# YOUR CODE HERE

def analyze_temperature_effect(prompt, temperatures, samples_per_temp=2):
    """
    Generate text at multiple temperatures and analyze the outputs.
    
    Args:
        prompt: The starting text
        temperatures: List of temperature values to test
        samples_per_temp: Number of samples per temperature
        
    Returns:
        dict with temperature as key and analysis as value
    """
    results = {}
    
    for temp in temperatures:
        samples = []
        all_words = []
        
        for _ in range(samples_per_temp):
            result = generator(
                prompt,
                max_new_tokens=50,
                do_sample=True,
                temperature=temp,
                top_k=50,
                pad_token_id=generator.tokenizer.eos_token_id,
            )
            text = result[0]['generated_text'][len(prompt):].strip()
            samples.append(text)
            all_words.extend(text.lower().split())
        
        # Calculate vocabulary diversity
        unique_words = len(set(all_words))
        total_words = len(all_words)
        diversity = unique_words / total_words if total_words > 0 else 0
        
        results[temp] = {
            'samples': samples,
            'diversity': diversity,
            'unique_words': unique_words,
            'total_words': total_words,
        }
    
    return results


# Test the function
test_prompt = "The robot looked at the sunset and thought about"
temps_to_test = [0.3, 0.5, 0.7, 1.0, 1.3, 1.6]

analysis = analyze_temperature_effect(test_prompt, temps_to_test, samples_per_temp=2)

print("TEMPERATURE ANALYSIS")
print("="*70)
print(f"Prompt: \"{test_prompt}\"\n")

for temp, data in analysis.items():
    print(f"\n[Temperature = {temp}]")
    print(f"  Vocabulary diversity: {data['diversity']:.2%}")
    print(f"  Unique/Total words: {data['unique_words']}/{data['total_words']}")
    for i, sample in enumerate(data['samples'], 1):
        print(f"  Sample {i}: ...{sample[:60]}...")

In [None]:
# Visualize the creativity-coherence tradeoff
print("\nCREATIVITY-COHERENCE TRADEOFF")
print("="*70)
print("")
print("Temperature |  Diversity  |  Characteristic")
print("-" * 50)

for temp, data in sorted(analysis.items()):
    div = data['diversity']
    
    if temp < 0.5:
        char = "Very predictable, may repeat"
    elif temp < 0.8:
        char = "Balanced, good for most uses"
    elif temp < 1.1:
        char = "Creative, some surprises"
    elif temp < 1.4:
        char = "Highly creative, occasional nonsense"
    else:
        char = "Very random, often incoherent"
    
    bar = '*' * int(div * 30)
    print(f"    {temp:.1f}     |  {bar:30s} | {char}")

---

# Part 4: Advanced Topics

## Under the Hood: Token-by-Token Generation

Let's see exactly how the model generates text one token at a time:

In [None]:
# Load model and tokenizer separately for manual inspection
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

print(f"Model: {model_name}")
print(f"Vocabulary size: {tokenizer.vocab_size:,}")
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
# Step-by-step generation visualization
prompt = "The cat sat"

print("STEP-BY-STEP GENERATION")
print("="*70)
print(f"Starting prompt: \"{prompt}\"\n")

# Tokenize the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
print(f"Step 0 - Tokenized prompt:")
print(f"  Token IDs: {input_ids[0].tolist()}")
print(f"  Tokens: {[tokenizer.decode([t]) for t in input_ids[0]]}")

# Generate tokens one at a time
current_ids = input_ids.clone()
generated_tokens = []

print("\nGenerating tokens one at a time:")
print("-" * 50)

for step in range(8):  # Generate 8 tokens
    with torch.no_grad():
        outputs = model(current_ids)
        logits = outputs.logits
    
    # Get probabilities for the next token
    next_token_logits = logits[0, -1, :]
    probs = torch.softmax(next_token_logits, dim=-1)
    
    # Get top 5 candidates
    top_probs, top_indices = torch.topk(probs, 5)
    
    print(f"\nStep {step + 1}:")
    print(f"  Current text: \"{tokenizer.decode(current_ids[0])}\"")
    print(f"  Top 5 candidates:")
    for prob, idx in zip(top_probs, top_indices):
        token = tokenizer.decode([idx])
        marker = " ← SELECTED" if idx == top_indices[0] else ""
        print(f"    \"{token}\" ({prob:.1%}){marker}")
    
    # Select the highest probability token (greedy)
    next_token = top_indices[0].unsqueeze(0).unsqueeze(0)
    current_ids = torch.cat([current_ids, next_token], dim=-1)
    generated_tokens.append(tokenizer.decode([top_indices[0]]))

print("\n" + "="*50)
print(f"Final text: \"{tokenizer.decode(current_ids[0])}\"")
print(f"Generated tokens: {generated_tokens}")

### How Temperature Affects Probabilities

In [None]:
# Visualize temperature's effect on probability distribution
prompt = "The weather today is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids)
    logits = outputs.logits[0, -1, :]

print("TEMPERATURE EFFECT ON PROBABILITIES")
print("="*70)
print(f"Prompt: \"{prompt}\"\n")

temperatures = [0.3, 0.7, 1.0, 1.5]

for temp in temperatures:
    # Apply temperature
    scaled_logits = logits / temp
    probs = torch.softmax(scaled_logits, dim=-1)
    
    # Get top 5
    top_probs, top_indices = torch.topk(probs, 5)
    
    print(f"[Temperature = {temp}]")
    for prob, idx in zip(top_probs, top_indices):
        token = tokenizer.decode([idx]).strip()
        bar = '*' * int(prob * 40)
        print(f"  {token:12s} {prob:6.1%} {bar}")
    print()

### Constrained Generation

Sometimes you want to control what the model generates more precisely:

In [None]:
# Using stopping criteria
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnPunctuation(StoppingCriteria):
    """Stop generation when we hit certain punctuation."""
    def __init__(self, tokenizer, stop_tokens):
        self.tokenizer = tokenizer
        self.stop_tokens = stop_tokens
    
    def __call__(self, input_ids, scores, **kwargs):
        last_token = self.tokenizer.decode(input_ids[0, -1])
        return any(stop in last_token for stop in self.stop_tokens)


# Generate until end of sentence
stopper = StopOnPunctuation(tokenizer, ['.', '!', '?'])
stopping_criteria = StoppingCriteriaList([stopper])

prompt = "The scientist discovered that"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    stopping_criteria=stopping_criteria,
    pad_token_id=tokenizer.eos_token_id,
)

print("CONSTRAINED GENERATION (Stop at sentence end)")
print("="*60)
print(f"Prompt: \"{prompt}\"")
print(f"Generated: {tokenizer.decode(output[0], skip_special_tokens=True)}")

### Limitations of Text Generation

| Limitation | Description | Mitigation |
|------------|-------------|------------|
| **Factual errors** | Models can generate plausible but false information | Verify facts; prefer creative over factual tasks |
| **Repetition** | Long generations may loop | Use `repetition_penalty` or `no_repeat_ngram_size` |
| **Coherence drift** | Loses track of context over long generations | Use shorter generations, better prompts |
| **Bias** | Reflects biases in training data | Be aware, review outputs |
| **Context limit** | Limited by model's max context length | Truncate or summarize context |
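
For the context limit row, the simplest form of truncation keeps only the most recent tokens and reserves room for the tokens still to be generated. The sketch below assumes token IDs are held in a plain list; `truncate_for_context` is a hypothetical helper, and the list of integers stands in for real token IDs.

```python
def truncate_for_context(token_ids, max_context, max_new_tokens):
    """Keep the most recent tokens so prompt + new tokens fit the window."""
    budget = max_context - max_new_tokens  # tokens available for the prompt
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than max_context")
    return token_ids[-budget:]


# GPT-2's context window is 1,024 tokens
long_prompt_ids = list(range(1500))  # stand-in for real token IDs
kept = truncate_for_context(long_prompt_ids, max_context=1024, max_new_tokens=100)

print(len(kept), kept[0], kept[-1])  # 924 576 1499
```

Dropping the oldest tokens is a blunt choice: anything established early in the text is lost, which is one cause of the coherence drift listed in the table.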

---

## Exercise 3: Controlled Generation (Independent)

**Difficulty**: Advanced | **Time**: 15-20 minutes

**Your task**: Build a class that generates text with different "styles" by adjusting parameters.

**Requirements**:
1. Support at least 3 generation styles (e.g., "creative", "focused", "balanced")
2. Each style should have appropriate parameter settings
3. Allow for custom parameter overrides

In [None]:
# YOUR CODE HERE

class StyledTextGenerator:
    """Generate text with different style presets."""
    
    STYLES = {
        'focused': {
            'description': 'Predictable, coherent output',
            'params': {
                'temperature': 0.3,
                'top_k': 30,
                'top_p': 0.85,
                'repetition_penalty': 1.1,
            }
        },
        'balanced': {
            'description': 'Good balance of creativity and coherence',
            'params': {
                'temperature': 0.7,
                'top_k': 50,
                'top_p': 0.9,
                'repetition_penalty': 1.2,
            }
        },
        'creative': {
            'description': 'More surprising and varied output',
            'params': {
                'temperature': 1.0,
                'top_k': 80,
                'top_p': 0.95,
                'repetition_penalty': 1.3,
            }
        },
        'wild': {
            'description': 'Highly creative, may be incoherent',
            'params': {
                'temperature': 1.4,
                'top_k': 100,
                'top_p': 0.98,
                'repetition_penalty': 1.4,
            }
        },
    }
    
    def __init__(self):
        """Initialize with the text generation pipeline."""
        self.generator = pipeline("text-generation", model="gpt2")
    
    def list_styles(self):
        """List available styles."""
        print("Available Styles:")
        print("="*50)
        for name, config in self.STYLES.items():
            print(f"  {name:12s} - {config['description']}")
    
    def generate(self, prompt, style='balanced', max_tokens=50, **overrides):
        """
        Generate text with a specific style.
        
        Args:
            prompt: Starting text
            style: One of the predefined styles
            max_tokens: Maximum new tokens to generate
            **overrides: Override specific parameters
            
        Returns:
            dict with generated text and metadata
        """
        if style not in self.STYLES:
            raise ValueError(f"Unknown style: {style}. Use list_styles() to see options.")
        
        # Get style parameters and apply overrides
        params = self.STYLES[style]['params'].copy()
        params.update(overrides)
        
        # Generate
        result = self.generator(
            prompt,
            max_new_tokens=max_tokens,
            do_sample=True,
            pad_token_id=self.generator.tokenizer.eos_token_id,
            **params
        )
        
        full_text = result[0]['generated_text']
        continuation = full_text[len(prompt):].strip()
        
        return {
            'prompt': prompt,
            'style': style,
            'continuation': continuation,
            'full_text': full_text,
            'params': params,
        }
    
    def compare_styles(self, prompt, max_tokens=40):
        """Generate the same prompt with all styles for comparison."""
        results = {}
        for style in self.STYLES:
            results[style] = self.generate(prompt, style, max_tokens)
        return results


# Create the generator
styled_gen = StyledTextGenerator()

# List available styles
styled_gen.list_styles()

In [None]:
# Compare all styles on the same prompt
test_prompt = "In the year 2150, humanity had finally"

comparison = styled_gen.compare_styles(test_prompt)

print("STYLE COMPARISON")
print("="*70)
print(f"Prompt: \"{test_prompt}\"\n")

for style, result in comparison.items():
    desc = styled_gen.STYLES[style]['description']
    print(f"[{style.upper()}] - {desc}")
    print(f"  Temperature: {result['params']['temperature']}")
    print(f"  Output: ...{result['continuation'][:100]}...\n")

In [None]:
# Try with custom overrides
custom_result = styled_gen.generate(
    "The mysterious package contained",
    style='creative',
    max_tokens=60,
    temperature=0.9,  # Override the creative style's temperature
)

print("CUSTOM GENERATION (creative style with temperature override)")
print("="*60)
print(f"Prompt: \"{custom_result['prompt']}\"")
print(f"Style: {custom_result['style']}")
print(f"Parameters: {custom_result['params']}")
print(f"\nGenerated: ...{custom_result['continuation']}")

---

# Part 5: Mini-Project

## Project: Creative Writing Assistant

**Scenario**: You're building a creative writing tool that helps authors overcome writer's block by generating story continuations with different moods.

**Your goal**: Build a `CreativeWritingAssistant` class that:
1. Generates story continuations with specified moods (mysterious, cheerful, tense, etc.)
2. Provides multiple variations for each generation
3. Allows mood blending (e.g., 70% mysterious, 30% cheerful)
4. Estimates the "mood match" of generated text

In [None]:
# MINI-PROJECT: Creative Writing Assistant
# ========================================

class CreativeWritingAssistant:
    """
    Generates story continuations with mood control.
    """
    
    # Mood configurations: sampling parameters, seed words used for mood scoring,
    # and a prompt prefix (defined for reference; generate() does not apply it)
    MOODS = {
        'mysterious': {
            'description': 'Dark, enigmatic, suspenseful',
            'params': {'temperature': 0.8, 'top_p': 0.92},
            'seed_words': ['shadow', 'secret', 'hidden', 'whisper', 'unknown', 'strange', 'mysterious'],
            'prompt_prefix': 'In a mysterious tone: ',
        },
        'cheerful': {
            'description': 'Happy, optimistic, light-hearted',
            'params': {'temperature': 0.9, 'top_p': 0.95},
            'seed_words': ['bright', 'happy', 'joy', 'smile', 'laugh', 'wonderful', 'delightful'],
            'prompt_prefix': 'In a cheerful tone: ',
        },
        'tense': {
            'description': 'Suspenseful, urgent, dramatic',
            'params': {'temperature': 0.7, 'top_p': 0.9},
            'seed_words': ['sudden', 'heart', 'racing', 'danger', 'fear', 'quickly', 'desperate'],
            'prompt_prefix': 'In a tense, dramatic tone: ',
        },
        'romantic': {
            'description': 'Emotional, tender, passionate',
            'params': {'temperature': 0.85, 'top_p': 0.93},
            'seed_words': ['heart', 'love', 'gentle', 'tender', 'eyes', 'touch', 'beautiful'],
            'prompt_prefix': 'In a romantic tone: ',
        },
        'melancholic': {
            'description': 'Sad, reflective, nostalgic',
            'params': {'temperature': 0.75, 'top_p': 0.9},
            'seed_words': ['remember', 'lost', 'gone', 'faded', 'memory', 'sigh', 'alone'],
            'prompt_prefix': 'In a melancholic, reflective tone: ',
        },
    }
    
    def __init__(self):
        """Initialize the writing assistant."""
        self.generator = pipeline("text-generation", model="gpt2")
    
    def list_moods(self):
        """Display available moods."""
        print("Available Moods:")
        print("="*50)
        for mood, config in self.MOODS.items():
            print(f"  {mood:12s} - {config['description']}")
    
    def generate(self, story_start, mood='mysterious', num_variations=3, max_tokens=60):
        """
        Generate story continuations with a specific mood.
        
        Args:
            story_start: The beginning of the story
            mood: The desired mood for continuation
            num_variations: Number of different continuations to generate
            max_tokens: Maximum new tokens per continuation
            
        Returns:
            dict with variations and mood analysis
        """
        if mood not in self.MOODS:
            raise ValueError(f"Unknown mood: {mood}. Use list_moods() to see options.")
        
        mood_config = self.MOODS[mood]
        
        # Generate variations
        variations = []
        for _ in range(num_variations):
            result = self.generator(
                story_start,
                max_new_tokens=max_tokens,
                num_return_sequences=1,
                do_sample=True,
                repetition_penalty=1.2,
                no_repeat_ngram_size=3,
                pad_token_id=self.generator.tokenizer.eos_token_id,
                **mood_config['params']
            )
            
            continuation = result[0]['generated_text'][len(story_start):].strip()
            mood_score = self._calculate_mood_match(continuation, mood)
            
            variations.append({
                'text': continuation,
                'mood_score': mood_score,
                'word_count': len(continuation.split()),
            })
        
        # Sort by mood match
        variations.sort(key=lambda x: x['mood_score'], reverse=True)
        
        return {
            'story_start': story_start,
            'mood': mood,
            'mood_description': mood_config['description'],
            'variations': variations,
        }
    
    def _calculate_mood_match(self, text, mood):
        """
        Estimate how well the text matches the target mood.
        Simple approach: count mood-related words.
        """
        text_lower = text.lower()
        seed_words = self.MOODS[mood]['seed_words']
        
        # Count matches
        matches = sum(1 for word in seed_words if word in text_lower)
        
        # Normalize to 0-1 scale
        max_possible = min(len(seed_words), len(text.split()) // 5)  # Reasonable max
        score = matches / max(max_possible, 1)
        
        return min(score, 1.0)  # Cap at 1.0
    
    def blend_moods(self, story_start, primary_mood, secondary_mood, 
                    primary_weight=0.7, max_tokens=60):
        """
        Generate with blended mood parameters.
        
        Args:
            story_start: The beginning of the story
            primary_mood: Main mood
            secondary_mood: Secondary mood to blend in
            primary_weight: Weight for primary mood (0-1)
            max_tokens: Maximum new tokens
            
        Returns:
            Generated continuation with blended mood
        """
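        if primary_mood not in self.MOODS or secondary_mood not in self.MOODS:
            raise ValueError("Invalid mood specified. Use list_moods() to see options.")
        if not 0 <= primary_weight <= 1:
            raise ValueError("primary_weight must be between 0 and 1.")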
        
        secondary_weight = 1 - primary_weight
        
        # Blend parameters
        p1 = self.MOODS[primary_mood]['params']
        p2 = self.MOODS[secondary_mood]['params']
        
        blended_params = {
            'temperature': p1['temperature'] * primary_weight + p2['temperature'] * secondary_weight,
            'top_p': p1['top_p'] * primary_weight + p2['top_p'] * secondary_weight,
        }
        
        # Generate
        result = self.generator(
            story_start,
            max_new_tokens=max_tokens,
            do_sample=True,
            repetition_penalty=1.2,
            pad_token_id=self.generator.tokenizer.eos_token_id,
            **blended_params
        )
        
        continuation = result[0]['generated_text'][len(story_start):].strip()
        
        return {
            'story_start': story_start,
            'primary_mood': primary_mood,
            'secondary_mood': secondary_mood,
            'blend': f"{primary_weight:.0%}/{secondary_weight:.0%}",
            'blended_params': blended_params,
            'continuation': continuation,
            'primary_score': self._calculate_mood_match(continuation, primary_mood),
            'secondary_score': self._calculate_mood_match(continuation, secondary_mood),
        }
    
    def format_output(self, result):
        """Format generation results for display."""
        lines = []
        lines.append("="*70)
        lines.append(f"Story Start: \"{result['story_start']}\"")
        lines.append(f"Target Mood: {result['mood']} ({result['mood_description']})")
        lines.append("="*70)
        
        for i, var in enumerate(result['variations'], 1):
            lines.append(f"\n--- Variation {i} (mood match: {var['mood_score']:.0%}) ---")
            lines.append(f"...{var['text']}")
        
        return '\n'.join(lines)


# Create the assistant
writer = CreativeWritingAssistant()

# List available moods
writer.list_moods()

In [None]:
# Generate with a specific mood
story_start = "The old mansion stood at the end of the overgrown path, and"

result = writer.generate(
    story_start,
    mood='mysterious',
    num_variations=3,
    max_tokens=50
)

print(writer.format_output(result))

In [None]:
# Try different moods on the same story start
print("\nCOMPARING MOODS ON SAME STORY START")
print("="*70)

story = "She opened the letter and read the first line:"
print(f"Story: \"{story}\"\n")

for mood in ['mysterious', 'cheerful', 'tense', 'romantic']:
    result = writer.generate(story, mood=mood, num_variations=1, max_tokens=40)
    best = result['variations'][0]
    print(f"[{mood.upper()}] ({result['mood_description']})")
    print(f"  ...{best['text'][:100]}...\n")

In [None]:
# Try mood blending
print("\nMOOD BLENDING")
print("="*70)

story = "As the sun set behind the mountains, she thought about"

blended = writer.blend_moods(
    story,
    primary_mood='romantic',
    secondary_mood='melancholic',
    primary_weight=0.6,
)

print(f"Story: \"{blended['story_start']}\"")
print(f"\nMood Blend: {blended['primary_mood']} ({blended['blend'].split('/')[0]}) + {blended['secondary_mood']} ({blended['blend'].split('/')[1]})")
print(f"Blended params: {blended['blended_params']}")
print(f"\nContinuation:")
print(f"  ...{blended['continuation']}")
print(f"\nMood Scores:")
print(f"  {blended['primary_mood']}: {blended['primary_score']:.0%}")
print(f"  {blended['secondary_mood']}: {blended['secondary_score']:.0%}")

In [None]:
# Try your own story and mood
# Uncomment and modify:

# my_story = "Your story start here"
# my_result = writer.generate(my_story, mood='cheerful', num_variations=3)
# print(writer.format_output(my_result))

### Extension Ideas

If you want to extend this project further:

1. **Genre presets**: Add genre-specific settings (sci-fi, fantasy, thriller)
2. **Character voice**: Generate dialogue in character-specific voices
3. **Plot suggestions**: Generate plot twist ideas based on current story
4. **Style analysis**: Analyze writing style and suggest improvements
5. **Continuation chains**: Build longer stories by chaining generations
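
As a starting point for idea 5, here is a minimal sketch of a continuation chain. The names `chain_continuations`, `generate_fn`, and `context_chars` are hypothetical and not part of the notebook's classes. The stand-in generator lets the example run without loading a model.

```python
from typing import Callable


def chain_continuations(story_start: str,
                        generate_fn: Callable[[str], str],
                        rounds: int = 3,
                        context_chars: int = 400) -> str:
    """Grow a story by repeatedly feeding the tail of the text back in as the prompt."""
    story = story_start
    for _ in range(rounds):
        # Use only the most recent text as the prompt. A character budget is a
        # rough stand-in for the model's token-based context limit.
        prompt = story[-context_chars:]
        continuation = generate_fn(prompt).strip()
        if not continuation:
            break  # Nothing new was produced, so stop early
        story = f"{story} {continuation}"
    return story


# Stand-in generator (an assumption for illustration) so the sketch runs offline
def fake_generate(prompt: str) -> str:
    return f"[continuation after {len(prompt.split())} words]"


story = chain_continuations("The old mansion stood silent.", fake_generate, rounds=2)
print(story)
```

To try this with the assistant built above, `generate_fn` could be a small wrapper that calls `writer.generate(prompt, mood='mysterious', num_variations=1)` and returns `result['variations'][0]['text']`.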

---

# Part 6: Wrap-Up

## Key Takeaways

1. **Autoregressive generation** predicts one token at a time, using previous tokens as context

2. **Decoding strategies** dramatically affect output quality:
   - Greedy: predictable but potentially repetitive
   - Beam search: tracks several candidate sequences to find a higher-probability output, still deterministic
   - Sampling (top-k, top-p): creative and varied

3. **Temperature** is the key creativity control:
   - Lower (0.3-0.5): focused, predictable
   - Medium (0.7-0.9): balanced, recommended for most uses
   - Higher (1.0+): creative but may become incoherent

4. **Repetition prevention** is crucial for longer generations:
   - `repetition_penalty`: lowers the scores of tokens that have already appeared
   - `no_repeat_ngram_size`: forbids any n-gram of that size from appearing twice

5. **Prompt engineering** significantly affects output quality and direction
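
To make takeaway 3 concrete, the sketch below applies temperature to a toy set of logits using only the standard library. The function name `softmax_with_temperature` and the logit values are illustrative assumptions. Real models work on much larger vocabularies, but the scaling step is the same idea.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    highest = max(scaled)
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # Made-up scores for three candidate tokens
for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures spread it more evenly across the candidates.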

## Common Mistakes to Avoid

| Mistake | Why It's a Problem |
|---------|-------------------|
| Very high temperature (>1.5) | Output becomes incoherent and random |
| No repetition control | Long generations loop and repeat |
| Treating output as factual | Models confidently generate false information |
| Ignoring prompt design | Poor prompts lead to off-topic generations |
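
For the "no repetition control" row, the sketch below shows simplified versions of the two ideas behind `no_repeat_ngram_size` and `repetition_penalty`. It operates on plain Python lists, and the helper names `banned_next_tokens` and `apply_repetition_penalty` are hypothetical. The penalty rule shown is one common formulation, not necessarily the exact code any library runs.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return tokens that would complete an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Lower the scores of token ids that have already appeared."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both reduce it
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


history = "the cat sat on the cat".split()
print(banned_next_tokens(history, ngram_size=3))

print(apply_repetition_penalty([2.4, -1.0, 0.5], generated_ids=[0, 1]))
```

In the first call, the text ends with "the cat", and "the cat sat" has already occurred, so "sat" is the only token blocked as the next word.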

## What's Next?

In **Notebook 6: Zero-Shot Classification**, you'll learn:
- How models classify text into categories they weren't trained on
- The power of Natural Language Inference for classification
- How to design effective label sets for your tasks

Like text generation, zero-shot classification shows how flexibly pretrained language models can be applied beyond the task they were originally trained on.

---

## Solutions

### Check Your Understanding (Quiz Answers)

1. **B) Each token prediction depends on all previous tokens** - This is what "autoregressive" means
2. **A) Makes output more random and diverse** - Higher temperature flattens the probability distribution
3. **C) Decoder-only (like GPT)** - Most text generation models use a decoder-only architecture
4. **B) Limit selection to the k most likely next tokens** - Reduces randomness while allowing variety
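
Answer 4 can be checked by hand with a small sketch. It contrasts top-k with the top-p (nucleus) filtering covered in this notebook, using a made-up five-token distribution. The helpers `top_k_filter` and `top_p_filter` are hypothetical names, written in plain Python for illustration.

```python
def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize their probabilities."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = ranked[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of top-ranked ids whose cumulative probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in ranked:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


toy_probs = [0.5, 0.25, 0.15, 0.07, 0.03]  # Made-up next-token probabilities
print("top-k (k=2) keeps ids:", sorted(top_k_filter(toy_probs, k=2)))
print("top-p (p=0.8) keeps ids:", sorted(top_p_filter(toy_probs, p=0.8)))
```

Top-k always keeps a fixed number of candidates, whereas top-p keeps more candidates when the distribution is flat and fewer when one token dominates.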

### Exercise 2: Temperature Analysis (Key Insights)

In [None]:
# Key insights from temperature exploration:

# 1. Very low temperature (0.2-0.4):
#    - Highly predictable output
#    - Good for factual completions
#    - May be repetitive

# 2. Medium temperature (0.6-0.8):
#    - Best balance for most use cases
#    - Creative but coherent
#    - Recommended default

# 3. High temperature (1.0-1.3):
#    - Good for brainstorming
#    - More surprising word choices
#    - May occasionally produce odd phrases

# 4. Very high temperature (1.5+):
#    - Often incoherent
#    - Useful only for very experimental generation

recommended_settings = {
    'factual_completion': {'temperature': 0.3, 'top_p': 0.85},
    'general_writing': {'temperature': 0.7, 'top_p': 0.9},
    'creative_writing': {'temperature': 0.9, 'top_p': 0.95},
    'brainstorming': {'temperature': 1.1, 'top_p': 0.98},
}

print("Recommended settings by use case:")
print("="*50)
for use_case, params in recommended_settings.items():
    print(f"  {use_case:20s}: temp={params['temperature']}, top_p={params['top_p']}")

---

## Additional Resources

- [Hugging Face Text Generation Docs](https://huggingface.co/docs/transformers/main_classes/text_generation)
- [GPT-2 Paper](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) - Language Models are Unsupervised Multitask Learners
- [The Curious Case of Neural Text Degeneration](https://arxiv.org/abs/1904.09751) - Top-p (nucleus) sampling paper
- [How to Generate Text](https://huggingface.co/blog/how-to-generate) - Excellent Hugging Face blog post
- [CTRL Paper](https://arxiv.org/abs/1909.05858) - Conditional generation with control codes