# Tutorial 10-3: Generative Magic – "Text Generation with GPT-2"

**Course:** CSEN 342: Deep Learning  
**Topic:** Decoder Transformers, Autoregressive Generation, Sampling Strategies, and Prompting

## Objective
While BERT is a master of *understanding* (classification, NER), models like **GPT (Generative Pre-trained Transformer)** are masters of *creation*.

GPT is a **Decoder-only** architecture. It is trained to predict the next token in a sequence given all previous tokens. This simple objective allows it to generate coherent essays, code, and stories.

In this tutorial, we will:
1.  **Load GPT-2:** A smaller predecessor to the famous GPT-3/4, but conceptually very similar.
2.  **Implement the Generation Loop:** Manually write the code to generate text token-by-token.
3.  **Compare Decoding Strategies:** Implement Greedy Search and Temperature/Top-K Sampling to see how they affect repetition and creativity.
4.  **Prompt Engineering:** Use "Few-Shot" prompting to make the model classify sentiment without fine-tuning.

**NOTE**: Run this notebook under the `Transformers Bundle` kernel instead of the class kernel.

---

## Part 1: Robust Setup (Manual Download)

As always, we download the model files manually to bypass cluster restrictions. We will use `gpt2`, the smallest GPT-2 checkpoint (about 124M parameters; the original paper listed it as 117M).

In [1]:
import os
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import matplotlib.pyplot as plt
import numpy as np

# Define paths
model_root = '../data/gpt2_local'
os.makedirs(model_root, exist_ok=True)

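def download_file(url, save_path):
    """Fetch url into save_path with wget, skipping files that already exist."""
    if not os.path.exists(save_path):
        print(f"Downloading {os.path.basename(save_path)}...")
        exit_code = os.system(f"wget -nc -q -O {save_path} {url}")
        if exit_code != 0:
            print(f"Error downloading {url}")
            # wget -O creates the file even when the download fails;
            # remove it so that a re-run retries instead of loading a broken file
            if os.path.exists(save_path):
                os.remove(save_path)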

# Download GPT-2 Files
base_hf_url = "https://huggingface.co/gpt2/resolve/main/"
files_to_fetch = [
    "config.json",
    "pytorch_model.bin",
    "vocab.json",
    "merges.txt",
    "tokenizer.json",
    "tokenizer_config.json"
]

for filename in files_to_fetch:
    download_file(base_hf_url + filename, os.path.join(model_root, filename))

# Load Model and Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained(model_root)
model = GPT2LMHeadModel.from_pretrained(model_root)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
model.eval() # Important: Turn off dropout!

print("GPT-2 Loaded.")

Downloading config.json...
Downloading pytorch_model.bin...
Downloading vocab.json...
Downloading merges.txt...
Downloading tokenizer.json...
Downloading tokenizer_config.json...
GPT-2 Loaded.


---

## Part 2: The Autoregressive Loop

GPT generates text one token at a time. The output at step $t$ becomes the input at step $t+1$.

Let's implement the most basic version: **Greedy Search**. At each step, we simply pick the token with the highest probability.
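
For reference, the loop described above corresponds to the standard autoregressive factorization of a sequence's probability, with greedy search taking the most likely token at each step. The notation here ($x_t$ for the token at position $t$, $V$ for the vocabulary) is introduced for illustration and is not part of the original notebook.

$$
p(x_1, \dots, x_n) = \prod_{t=1}^{n} p(x_t \mid x_1, \dots, x_{t-1}),
\qquad
\hat{x}_{t+1} = \arg\max_{v \in V} \; p(v \mid x_1, \dots, x_t)
$$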

In [2]:
def generate_greedy(prompt, max_length=50):
    # 1. Encode Input
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
    
    # 2. Generation Loop
    generated_ids = input_ids
    
    with torch.no_grad():
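        for _ in range(max_length):  # max_length counts NEW tokens, not total length
            # Forward pass over the full sequence generated so far
            outputs = model(generated_ids)
            logits = outputs.logits  # shape: (Batch, Seq, Vocab)
            
            # Keep the logits at the LAST position only -> (Batch, Vocab)
            next_token_logits = logits[:, -1, :]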
            
            # Greedy: Pick highest probability
            next_token_id = torch.argmax(next_token_logits, dim=-1).unsqueeze(-1)
            
            # Append to sequence
            generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)
            
            # Stop if EOS token (optional, usually not needed for free generation)
            if next_token_id.item() == tokenizer.eos_token_id:
                break
                
    # 3. Decode
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)

print("--- Greedy Search ---")
print(generate_greedy("The secret to a happy life is"))

--- Greedy Search ---
The secret to a happy life is to be able to do something that you love.

I'm not saying that I'm a bad person, but I'm not saying that I'm a bad person. I'm just saying that I'm not a bad person.

I


### Discussion
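Greedy search often gets stuck in repetitive loops, as in the output above ("I'm not saying that I'm a bad person..."). Once a phrase has appeared, repeating it tends to become the most likely continuation, and `argmax` has no way to escape. Picking the *single* most likely word at every step is also not how humans write; we often pick words that are surprising but fitting.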
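
The looping behaviour can be reproduced without a neural network. The sketch below is a toy illustration only: `TOY_MODEL` and `greedy_decode` are hypothetical names, and the probabilities are made up rather than taken from GPT-2. Because the most likely path through the table forms a cycle, `argmax` decoding repeats it forever, even though each step has a reasonably likely alternative.

```python
# Hypothetical next-token probabilities for a tiny vocabulary (not from GPT-2).
TOY_MODEL = {
    "I":      {"am": 0.60, "think": 0.40},
    "am":     {"not": 0.55, "happy": 0.45},
    "not":    {"saying": 0.70, "sure": 0.30},
    "saying": {"I": 0.80, "that": 0.20},
}

def greedy_decode(start, steps):
    """Repeatedly append the single most likely next token."""
    tokens = [start]
    for _ in range(steps):
        dist = TOY_MODEL[tokens[-1]]
        tokens.append(max(dist, key=dist.get))  # argmax over the distribution
    return tokens

print(" ".join(greedy_decode("I", 8)))
# I am not saying I am not saying I
```

The alternatives ("think", "happy", "sure", "that") are never chosen, which is the gap that sampling is meant to fill.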

---

## Part 3: Sampling Strategies (Creativity)

To reduce repetition, we use **Sampling**. Instead of taking the `argmax`, we draw the next token at random according to the model's predicted probability distribution.

### 3.1 Temperature
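We divide the logits $z$ by a temperature $T > 0$ before the softmax, so the sampling distribution is $\mathrm{softmax}(z / T)$. With $T = 1$ the distribution is unchanged.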
* $T < 1$: Makes distribution sharper (more conservative).
* $T > 1$: Makes distribution flatter (more random/creative).
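
To make the effect of $T$ concrete, here is a minimal sketch in plain Python (rather than PyTorch, so the numbers are easy to inspect). The function name `softmax_with_temperature` and the logits in `toy_logits` are assumptions made for this illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens
toy_logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Running it should show the probability of the top token growing as $T$ decreases and the probability of the least likely token growing as $T$ increases.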

### 3.2 Top-K Sampling
We only sample from the top $K$ most likely tokens. This prevents the model from choosing complete nonsense words from the "tail" of the distribution.
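
As a small worked example, the sketch below applies Top-K filtering to a handful of made-up logits in plain Python. The helper `top_k_probs` is a hypothetical name used only here; it keeps tokens that tie with the K-th largest logit, which mirrors a strict "less than" cutoff.

```python
import math

def top_k_probs(logits, k):
    """Softmax restricted to the k largest logits; other tokens get probability 0."""
    threshold = sorted(logits, reverse=True)[k - 1]  # the k-th largest logit
    masked = [z if z >= threshold else -math.inf for z in logits]
    peak = max(masked)
    exps = [math.exp(z - peak) for z in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for five candidate tokens
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = top_k_probs(toy_logits, k=2)
print([round(p, 3) for p in probs])
# [0.731, 0.269, 0.0, 0.0, 0.0]
```

Only the two most likely tokens can now be sampled, and their probabilities are renormalized to sum to 1.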



In [3]:
def generate_sampling(prompt, max_length=50, temperature=1.0, top_k=0):
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
    generated_ids = input_ids
    
    with torch.no_grad():
        for _ in range(max_length):
            outputs = model(generated_ids)
            next_token_logits = outputs.logits[:, -1, :]
            
            # Apply Temperature
            next_token_logits = next_token_logits / temperature
            
            # Apply Top-K Filtering
            if top_k > 0:
                # Mask every token whose logit is lower than the K-th largest logit (the top-k cutoff)
                indices_to_remove = next_token_logits < torch.topk(next_token_logits, top_k)[0][..., -1, None]
                next_token_logits[indices_to_remove] = -float('Inf')
            
            # Convert to Probabilities
            probs = F.softmax(next_token_logits, dim=-1)
            
            # Sample
            next_token_id = torch.multinomial(probs, num_samples=1)
            
            generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)
            
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)

print("--- High Temperature (Creative/Chaotic) ---")
print(generate_sampling("The secret to a happy life is", temperature=1.2, top_k=50))

print("\n--- Low Temperature (Safe/Boring) ---")
print(generate_sampling("The secret to a happy life is", temperature=0.7, top_k=50))

--- High Temperature (Creative/Chaotic) ---
The secret to a happy life is to never be left out. By using the free will movement of people with very different ideologies, they are able to get what they want on a level that has no basis in fact; the more liberal ones tend to have the freedom of the freer

--- Low Temperature (Safe/Boring) ---
The secret to a happy life is to be able to be yourself.

Photo Credit: Shutterstock.com

The next time you're looking to get on a plane, make sure you're familiar with the rules of the game.

Advertisement

Advertisement

I


---

## Part 4: Few-Shot Prompting

As mentioned in the lecture (Slide 84), large models like GPT-3 are "Few-Shot Learners". Even with our small GPT-2, we can demonstrate this effect.

We will try to make the model perform **Sentiment Analysis** not by training it, but by giving it examples in the prompt.

In [4]:
# We format the prompt as a list of examples
few_shot_prompt = """
Review: This movie was fantastic! Sentiment: Positive
Review: I hated every minute of it. Sentiment: Negative
Review: The acting was okay but the plot was boring. Sentiment: Negative
Review: An absolute masterpiece of cinema. Sentiment: Positive
Review: I fell asleep halfway through. Sentiment:"""

print("--- Few-Shot Prompting ---")
# We generate just a few tokens to complete the pattern
output = generate_greedy(few_shot_prompt, max_length=3)

# Print the full text: the prompt followed by the generated completion
print(output)

--- Few-Shot Prompting ---

Review: This movie was fantastic! Sentiment: Positive
Review: I hated every minute of it. Sentiment: Negative
Review: The acting was okay but the plot was boring. Sentiment: Negative
Review: An absolute masterpiece of cinema. Sentiment: Positive
Review: I fell asleep halfway through. Sentiment: Negative
Review
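
The output above contains the whole prompt plus a few extra tokens. One possible way to build such prompts programmatically and keep only the predicted label is sketched below. The helpers `build_few_shot_prompt` and `extract_completion` are hypothetical names, and `simulated_output` is a stand-in string, so no model is called here.

```python
def build_few_shot_prompt(examples, query):
    """Format (review, label) pairs plus an unlabeled query in the pattern used above."""
    lines = [f"Review: {review} Sentiment: {label}" for review, label in examples]
    lines.append(f"Review: {query} Sentiment:")
    return "\n".join(lines)

def extract_completion(full_text, prompt):
    """Return only the text that follows the prompt in the decoded output."""
    if not full_text.startswith(prompt):
        raise ValueError("decoded text does not start with the prompt")
    return full_text[len(prompt):]

examples = [
    ("This movie was fantastic!", "Positive"),
    ("I hated every minute of it.", "Negative"),
]
prompt = build_few_shot_prompt(examples, "I fell asleep halfway through.")

# Stand-in for generate_greedy(prompt, max_length=3); no model is called here.
simulated_output = prompt + " Negative\nReview"

completion = extract_completion(simulated_output, prompt)
predicted_label = completion.split()[0]  # first whitespace-separated word
print(predicted_label)  # Negative
```

Taking the first word of the completion discards the trailing "Review" that the model emits when it starts the next example.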


### Conclusion
You've just built a generative text pipeline!

1.  **Generation Loop:** You saw how LMs generate text autoregressively.
2.  **Sampling:** You learned that `Temperature` and `Top-K` control the trade-off between coherence and creativity.
3.  **Prompting:** You saw that you can "program" a language model just by giving it text examples, without updating any weights (few-shot, in-context learning).