
# Week 3: NLP Foundations: Text Generation

This lab is designed to run on **Google Colab**. It contains **two coding-only exercises**:
- **Exercise 1 (heavily scaffolded)**: Use a **Transformer** (GPT-style causal language model) for **text completion**.
- **Exercise 2 (lightly scaffolded)**: Use the same model to **generate variations at different `temperature` values**.




## Before You Start: Use a GPU on Google Colab

1. Go to **Runtime ▶ Change runtime type**.
2. Set **Hardware accelerator** to **T4 GPU**, then click **Save**.
3. Run the cell below to confirm Colab sees your GPU.


In [None]:

# Check GPU availability in Colab
import subprocess
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA device name:", torch.cuda.get_device_name(0))
    # Optional: show GPU details
    try:
        subprocess.run(["nvidia-smi"], check=False)
    except Exception as e:
        print("nvidia-smi not available:", e)
else:
    print("Tip: In Colab, go to Runtime > Change runtime type > GPU, then rerun this cell.")


## Setup

This cell installs and imports the required libraries. Rerun it if the runtime restarts.


In [None]:

# If running on Colab, install dependencies quickly.
def in_colab():
    try:
        import google.colab  # type: ignore
        return True
    except Exception:
        return False

if in_colab():
    # Colab usually has a recent torch. Pin transformers so results are reproducible.
    !pip -q install transformers==4.44.2 accelerate
else:
    print("Running outside Colab. Ensure you have: transformers==4.44.2 (or newer), accelerate, torch.")

import torch, random
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using device:", device)
random.seed(42)
torch.manual_seed(42)



## What Are We Using? Quick Concepts

- **Transformer (GPT-style, causal language model)**: A neural network that predicts the next token given previous tokens. "Causal" means it only attends to the left (past context), which is ideal for **text generation and completion**.
- **Tokenizer**: Converts text to token IDs and back. Models operate on token IDs.
- **Generation**: We provide a **prompt** and ask the model to produce **`max_new_tokens`** tokens.
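
The three concepts above can be sketched with a toy example. This is an illustration only: the whitespace "tokenizer", the five-word vocabulary, and the hand-written `next_token` table are all hypothetical stand-ins for a real subword tokenizer and a trained Transformer.

```python
# Toy illustration only: a whitespace "tokenizer" and a lookup table
# stand in for a real tokenizer and a trained causal language model.
vocab = ["<eos>", "the", "cat", "sat", "down"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def encode(text):
    """Text -> list of token IDs (real tokenizers split into subwords)."""
    return [token_to_id[w] for w in text.split()]

def decode(ids):
    """List of token IDs -> text."""
    return " ".join(vocab[i] for i in ids)

# Hypothetical "model": maps the last token ID to the next token ID.
# A real Transformer conditions on the whole prefix, not just the last token.
next_token = {1: 2, 2: 3, 3: 4, 4: 0}  # the->cat, cat->sat, sat->down, down-><eos>

def generate(prompt, max_new_tokens, eos_token_id=0):
    ids = encode(prompt)
    for _ in range(max_new_tokens):
        nxt = next_token[ids[-1]]  # only past context is used (causal)
        if nxt == eos_token_id:    # stop early at end-of-sequence
            break
        ids.append(nxt)
    return decode(ids)

print(generate("the", max_new_tokens=2))   # the cat sat
print(generate("the", max_new_tokens=10))  # the cat sat down
```

The second call stops before using all ten tokens because the toy model emits the end-of-sequence token.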

### Key Generation Parameters
- **`max_new_tokens`**: How many tokens to generate beyond your prompt.
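- **`temperature`**: Controls randomness during sampling by rescaling the logits before the softmax.
  - Values below `1.0` sharpen the distribution (safer, more repetitive); values above `1.0` flatten it (more diverse, less coherent). It must be strictly positive: in `transformers`, `temperature=0.0` with `do_sample=True` raises an error, so use `do_sample=False` when you want deterministic output.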
- **`do_sample`**: Whether to sample from a distribution instead of picking the single most likely token.
  - `False` = **greedy decoding** (deterministic, may be repetitive or dull).
  - `True` = **sampling** (enables `temperature`, `top_k`, `top_p`).
- **`top_k`**: Sample only from the **top K** highest-probability tokens at each step (e.g., 50).
- **`top_p` (nucleus sampling)**: Sample from the **smallest set of tokens whose cumulative probability ≥ p** (e.g., 0.9 or 0.95).
- **`no_repeat_ngram_size`**: Prevents any n-gram of the given size from appearing more than once in the output (e.g., `3` blocks repeated trigrams).
- **`eos_token_id`**: Token ID that signals "end of sequence". If the model produces it, generation stops early.
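
To make `temperature`, `top_k`, and `top_p` concrete, here is a minimal pure-Python sketch on a made-up four-token distribution. The helper names and the logits are assumptions for illustration; this is not how `transformers` implements these filters internally, only the same idea in a few lines.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalise to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding instead")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most likely tokens, zero the rest, renormalise."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
for t in (0.3, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(q, 3) for q in probs])

base = softmax_with_temperature(logits, 1.0)
print("top_k=2  :", [round(q, 3) for q in top_k_filter(base, 2)])
print("top_p=0.9:", [round(q, 3) for q in top_p_filter(base, 0.9)])
```

Lower temperatures concentrate probability on the highest-scoring token, while the two filters remove the unlikely tail before a token is sampled.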

> For this lab we will use **`distilgpt2`**, a small GPT-2 style model that runs on CPU or GPU.



## Exercise 1 — Heavily Scaffolded: Text Completion with a Transformer

**Goal**: Load a GPT-style model and complete a prompt. Then tweak parameters to observe differences.

**What to do**: Just **run the cells**. Then **change the parameter values** and re-run to see effects.


In [None]:

# 1) Load a small, CPU/GPU-friendly model
MODEL_NAME = "distilgpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)

text_gen = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=0 if device == "cuda" else -1,
)

print("Model loaded:", MODEL_NAME, "| Device:", device)


In [None]:

# 2) Provide a prompt to complete
prompt = "In natural language processing, a tokenizer is"

# 3) Generation parameters (feel free to change and re-run)
params = {
    "max_new_tokens": 50,
    "temperature": 0.6,
    "top_k": 50,
    "top_p": 0.9,
    "do_sample": True,
    "no_repeat_ngram_size": 3,
    "eos_token_id": tokenizer.eos_token_id,
}

outputs = text_gen(prompt, **params)
print(outputs[0]["generated_text"])



### Try These Quick Variations (Run and Compare)
- Set **`do_sample=False`** (greedy decoding). Remove `temperature`, `top_k`, and `top_p` from `params`, since they only apply when sampling. How does the output change?
- Increase **`max_new_tokens`** to see when the model starts to ramble.
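- With `do_sample=True`, compare a low temperature such as **`temperature=0.1`** (near-deterministic sampling) with **`temperature=1.3`** for more surprising outputs. Note that `temperature=0.0` is not valid when sampling in `transformers`; use `do_sample=False` instead.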
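
For the greedy variation, the sampling-only keys can also be dropped programmatically instead of editing the dictionary by hand. The sketch below uses a hypothetical helper, `to_greedy`, and a copy of the Exercise 1 values (without `eos_token_id`, which needs the tokenizer) so that it runs on its own.

```python
# Keys that only have an effect when do_sample=True
SAMPLING_ONLY_KEYS = {"temperature", "top_k", "top_p"}

def to_greedy(params):
    """Return a copy of params configured for greedy decoding."""
    greedy = {k: v for k, v in params.items() if k not in SAMPLING_ONLY_KEYS}
    greedy["do_sample"] = False
    return greedy

# Same values as the Exercise 1 cell, minus eos_token_id
example_params = {
    "max_new_tokens": 50,
    "temperature": 0.6,
    "top_k": 50,
    "top_p": 0.9,
    "do_sample": True,
    "no_repeat_ngram_size": 3,
}

greedy_params = to_greedy(example_params)
print(greedy_params)
# {'max_new_tokens': 50, 'do_sample': False, 'no_repeat_ngram_size': 3}
```

In the notebook, this would be used as `text_gen(prompt, **to_greedy(params))`; the original `params` dictionary is left unchanged.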


In [None]:

# Optional: Batch generation for multiple prompts
prompts = [
    "Transformers revolutionised NLP because",
    "Attention mechanisms allow models to",
    "Fine-tuning a language model involves",
]

batch_outputs = text_gen(prompts, **params)
for i, out in enumerate(batch_outputs):
    print(f"--- Prompt {i+1} ---")
    print(out[0]["generated_text"], end="\n\n")



## Exercise 2 — Lightly Scaffolded: Explore `temperature`

**Goal**: Generate multiple continuations from the **same prompt** at different temperatures and observe the differences.

**What to do**: Just **run the cell**. Then **change the temperatures** and re-run to see how style and variability change.


In [None]:

from typing import List

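def generate_with_temperature(prompt: str, temps: List[float], max_new_tokens: int = 60):
    """Generate one continuation per temperature; t == 0.0 means greedy decoding."""
    results = {}
    for t in temps:
        gen_kwargs = dict(
            max_new_tokens=max_new_tokens,
            no_repeat_ngram_size=3,
            eos_token_id=tokenizer.eos_token_id,
        )
        if t > 0.0:
            # temperature, top_k, and top_p only apply when do_sample=True
            gen_kwargs.update(do_sample=True, temperature=t, top_k=50, top_p=0.95)
        else:
            # temperature=0.0 is not valid for sampling, so use greedy decoding instead
            gen_kwargs.update(do_sample=False)
        outputs = text_gen(prompt, **gen_kwargs)
        results[t] = outputs[0]["generated_text"]
    return results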

# Your experiment (feel free to change)
prompt = "Write a short, vivid description of a sunrise over a quiet city."
temperatures = [0.0, 0.3, 0.7, 1.0, 1.3]

results = generate_with_temperature(prompt, temperatures)

for t, text in results.items():
    print(f"===== Temperature = {t} =====")
    print(text, end="\n\n")
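
One way to put a number on "variability" is the fraction of unique word n-grams in each output. The sketch below is an optional extra, not part of the exercise: `distinct_n` is a hypothetical helper, and the two sample strings are made up so the block runs without the model.

```python
def distinct_n(text, n=2):
    """Fraction of word n-grams in text that are unique (1.0 = no repeats)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text is shorter than n words
    return len(set(ngrams)) / len(ngrams)

# Made-up strings standing in for model outputs
samples = {
    "repetitive": "the sun rises and the sun rises and the sun rises",
    "varied": "golden light spills slowly across the sleeping rooftops",
}
for name, text in samples.items():
    print(name, round(distinct_n(text, n=2), 2))
```

In the notebook, calling `distinct_n(results[t])` for each temperature gives a rough score to compare alongside reading the outputs.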



## (Optional, Instructor Demo) Using a Hosted API Model

If you have API access, you can compare with a larger chat model. This is **optional** and **commented out**. Do **not** share your API key.


In [None]:

# OPTIONAL: Requires `openai` >= 1.0.0
# %pip -q install openai

# from openai import OpenAI
# import os
# os.environ['OPENAI_API_KEY'] = "YOUR_KEY_HERE"  # or set securely in the environment
# client = OpenAI()

# def chat_generate(prompt: str, temperature: float = 0.7, model: str = "gpt-4o-mini"):
#     resp = client.chat.completions.create(
#         model=model,
#         temperature=temperature,
#         messages=[
#             {"role": "system", "content": "You are a helpful writing assistant."},
#             {"role": "user", "content": prompt},
#         ],
#         max_tokens=150,
#     )
#     return resp.choices[0].message.content

# for t in [0.0, 0.3, 0.7, 1.0, 1.3]:
#     print(f"=== Temperature {t} ===")
#     print(chat_generate("Write a 3-sentence micro-story about a lost key.", temperature=t))
#     print()
