# 🧠 Transformers, Tokenization, and T5 — Hands-on Notebook

**Prepared by:** Angelica Cassandra Loria  
**Prepared for:** Advanced Placement (AP-level) High School Guest Lecture  
**Last updated:** 2025-10-15 07:41

---

## What you'll learn
1. What a **Transformer** is (high-level).
2. How **tokenization** works (turn text → numbers → text).
3. How to use **T5** (Text-to-Text Transfer Transformer) with Hugging Face.
4. Try **example tasks**: summarization, translation, and simple Q&A.
5. (Optional) Explore decoding strategies (greedy vs beam search) & prompt design.

> 👉 This notebook is designed to run on **Google Colab** or any Jupyter environment with Python 3.10+.


## Prerequisites

- **Virtual environment**: an isolated folder that contains its own Python interpreter and installed packages. It keeps each project's libraries separate, so they don't interfere with one another. To create one, run `python -m venv myenv`, then activate it with `source myenv/bin/activate` (macOS/Linux) or `myenv\Scripts\activate` (Windows).
- **GPU (Graphics Processing Unit)**: a super-fast calculator for parallel math. While your CPU is good at many tasks, it handles only a few computations at a time. A GPU can handle **thousands of math operations at once.**
- **CUDA (Compute Unified Device Architecture)**: NVIDIA's platform for running general-purpose computations on its GPUs. Libraries such as PyTorch use CUDA behind the scenes, so your Python code can run on an NVIDIA GPU without you writing any CUDA yourself.
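After creating and activating an environment, you can confirm from Python that you are really inside it. For environments made with the standard-library `venv` module, `sys.prefix` points at the environment folder while `sys.base_prefix` still points at the original Python install, so comparing the two is enough. A minimal sketch; `in_virtual_env` is our own helper name, not a library function.

```python
import sys

def in_virtual_env() -> bool:
    # Inside a venv, sys.prefix is the venv folder,
    # while sys.base_prefix is still the base Python installation.
    return sys.prefix != sys.base_prefix

print("Running inside a virtual environment:", in_virtual_env())
print("Active Python prefix:", sys.prefix)
```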



## 0. Setup (Colab-friendly)

- Installs required libraries.
- Checks GPU.
- Imports everything we need.


```python
# If running on Colab, uncomment the next line to install the libraries
# !pip -q install transformers torch accelerate sentencepiece datasets --upgrade

import sys
import torch

# Report versions and whether a CUDA-capable GPU is visible to PyTorch
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU only")
```


```text
Python: 3.12.12
PyTorch: 2.8.0+cu126
CUDA available: False
GPU name: CPU only
```



## 1. What is a Transformer? (High-level)

- A **Transformer** reads a sequence (e.g., a sentence) and produces another sequence.
- It uses **self-attention** to decide, for each word, which other words in the sequence to focus on.
- **Encoder–Decoder** setup (used by T5):  
  - **Encoder**: reads the input and builds a rich representation.  
  - **Decoder**: writes the output one token at a time, attending to the encoder.

> In practice: we give the model text + a *task prefix*, and it generates the answer as text.
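Self-attention can be sketched in a few lines of plain Python. This is a simplified toy, not T5's implementation: it uses a single attention head, made-up 2-D word vectors, and skips the learned query/key/value projection matrices a real model applies. Each word scores every word (including itself) with a scaled dot product, softmax turns the scores into weights that sum to 1, and the word's new representation is the weighted average of all the vectors.

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Toy scaled dot-product self-attention (single head).

    Simplification: queries, keys and values are the input vectors themselves.
    A real Transformer first multiplies each vector by learned matrices.
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score every word against the current word, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # New representation = weighted average of all vectors
        out = [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-D "word vectors" (real models use hundreds of dimensions)
words = ["the", "cat", "sat"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = self_attention(vectors)
for word, row in zip(words, weights):
    print(word, "attends to", {w: round(x, 2) for w, x in zip(words, row)})
```

Each row of weights sums to 1. `the` puts the most weight on itself because its vector matches its own best, while `sat` sits halfway between the others and spreads its attention evenly.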



## 2. Tokenization 101

We can't feed raw text directly to a neural network. We must:
1. **Tokenize**: split text into subword tokens and map each token to an integer ID.
2. **Create attention masks**: tell the model which tokens are padding vs real text.
3. **Decode**: map token IDs back to human-readable text.

We'll use T5's **SentencePiece** tokenizer.


```python
from transformers import T5TokenizerFast

# Load the pretrained SentencePiece-based tokenizer that matches t5-small
tokenizer = T5TokenizerFast.from_pretrained("t5-small")

sample = "Transformers make NLP tasks easier and more unified."
enc = tokenizer(sample, return_tensors="pt")  # return PyTorch tensors
print("Input IDs:", enc["input_ids"][0][:20])            # first 20 token IDs
print("Attention mask:", enc["attention_mask"][0][:20])  # 1 = real token, 0 = padding
print("Decoded back:", tokenizer.decode(enc["input_ids"][0], skip_special_tokens=True))
print("Vocab size:", tokenizer.vocab_size)
print("Pad token / ID:", tokenizer.pad_token, tokenizer.pad_token_id)
print("EOS token / ID:", tokenizer.eos_token, tokenizer.eos_token_id)
```



```text
Input IDs: tensor([31220,     7,   143,   445,  6892,  4145,  1842,    11,    72,     3,
        22927,     5,     1])
Attention mask: tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
Decoded back: Transformers make NLP tasks easier and more unified.
Vocab size: 32100
Pad token / ID: <pad> 0
EOS token / ID: </s> 1
```



### Exercise A (5 minutes)
Try your **own sentence** below. Inspect the tokens and decode back. What do you notice about punctuation and capitalization?


```python
your_text = "Type your sentence here and experiment with tokenization!"
enc2 = tokenizer(your_text, return_tensors="pt")
print("IDs:", enc2["input_ids"][0][:40])
print("Decoded:", tokenizer.decode(enc2["input_ids"][0], skip_special_tokens=True))
```


```text
IDs: tensor([ 6632,    39,  7142,   270,    11,  5016,    28, 14145,  1707,    55,
            1])
Decoded: Type your sentence here and experiment with tokenization!
```



## 3. Meet T5 (Text-to-Text Transfer Transformer)

T5 treats **every** NLP task as *text → text*. Examples:
- `summarize: <article>`
- `translate English to French: <text>`
- `question: <question>  context: <passage>`

We'll start with the small variant `t5-small` to run on CPU or classroom GPUs.
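The prefixes are conventions the original T5 checkpoints were trained with, not free-form instructions. The `t5-small` config on the Hugging Face Hub records some of them, with suggested generation settings, in `task_specific_params`. A minimal sketch: it downloads only the small config file, and the `or {}` guards against checkpoints whose config lacks that field.

```python
from transformers import T5Config

# Downloads only config.json, not the model weights
t5_config = T5Config.from_pretrained("t5-small")

# task_specific_params maps a task name to its prefix and suggested generation settings
task_params = t5_config.task_specific_params or {}
for task, params in task_params.items():
    print(f"{task:22} prefix={params.get('prefix')!r}")
```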


```python
from transformers import T5ForConditionalGeneration

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)
model.eval()  # inference mode: disables dropout

def t5_generate(prompt: str, max_new_tokens: int = 60, **gen_kwargs) -> str:
    """Tokenize a prompt, run T5's generate(), and decode the result to text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():  # no gradients needed for inference
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, **gen_kwargs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Ready! Device:", device)
```



```text
Ready! Device: cpu
```



## 4. Example Task — Summarization

We add the prefix `summarize:` and give it a paragraph.


```python
paragraph = (
    "summarize: Artificial Intelligence (AI) helps computers learn patterns in data, "
    "so they can make predictions or generate human-like text. It powers chatbots, "
    "translation systems, and tools that summarize long articles for faster reading."
)

summary = t5_generate(paragraph, max_new_tokens=48)
print("Summary:", summary)
```


```text
Summary: AI helps computers learn patterns in data, so they can make predictions or generate human-like text. it powers chatbots, translation systems, and tools that summarize long articles for faster reading.
```



## 5. Example Task — Translation (English → French)

T5 supports translation via prompts like: `translate English to French:`


```python
translation_prompt = "translate English to French: Hello, how are you today? I hope your class enjoys this demo."
print(t5_generate(translation_prompt, max_new_tokens=64))
```


```text
Bonjour, comment êtes-vous aujourd'hui?
```


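> Now try the prefix `translate English to Chinese:` with `t5_generate`. What is the output, and why do you think the model produced it? The next cell repeats the experiment with `google/mt5-small`, a multilingual version of T5.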


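```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model_mt5_name = "google/mt5-small"  # multilingual T5
# Separate name, so we don't overwrite the t5-small `tokenizer` that t5_generate() uses
tokenizer_mt5 = T5Tokenizer.from_pretrained(model_mt5_name)
model_mt5 = MT5ForConditionalGeneration.from_pretrained(model_mt5_name)

text_mt5 = "translate English to Chinese: How are you today?"
inputs_mt5 = tokenizer_mt5(text_mt5, return_tensors="pt")

# google/mt5-small was only pretrained (no supervised task training), so it has not
# learned prefixes like "translate ...:" and must be fine-tuned before it can translate
outputs_mt5 = model_mt5.generate(**inputs_mt5, max_length=40)
print(tokenizer_mt5.decode(outputs_mt5[0], skip_special_tokens=True))
```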



```text
<extra_id_0>
```



## 6. Example Task — Simple Question Answering

T5 can do simple QA when given a **question** and an optional **context** passage. Without context, results are unreliable: in the first example below, t5-small just echoes words from the question.


```python
qa_prompt = "question: What does a Transformer use to focus on important words?"
print(t5_generate(qa_prompt, max_new_tokens=32))
```


```text
important words
```



### Bonus: QA with Short Context


```python
context = (
    "A Transformer uses attention mechanisms. Self-attention lets the model weigh different words "
    "to decide which parts of the input are most relevant."
)
prompt_with_ctx = f"question: What does a Transformer use to focus on important words?  context: {context}"
print(t5_generate(prompt_with_ctx, max_new_tokens=32))
```


```text
attention mechanisms
```
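Both QA prompts follow the `question: ... context: ...` pattern listed in Section 3. When you try your own questions, a small helper keeps that format consistent. A sketch; `build_qa_prompt` is our own hypothetical name, not part of Hugging Face, and it reuses the two-space separator from `prompt_with_ctx`.

```python
from typing import Optional

def build_qa_prompt(question: str, context: Optional[str] = None) -> str:
    """Format a question, plus an optional context passage, in T5's QA prompt style."""
    prompt = f"question: {question.strip()}"
    if context:
        # Same two-space separator as prompt_with_ctx
        prompt += f"  context: {context.strip()}"
    return prompt

q = "What does a Transformer use to focus on important words?"
ctx = "A Transformer uses attention mechanisms."
print(build_qa_prompt(q))
print(build_qa_prompt(q, ctx))
```

Pass the result to `t5_generate` exactly as in the cells above.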



## 7. Decoding Strategies

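**Greedy search** picks the single most probable next token at each step. It is fast and predictable, but an early choice can never be revised, so the output can be dull or miss a better overall sentence.  
**Beam search** keeps the `num_beams` most probable partial sequences at each step and returns the highest-scoring complete one, so it can recover from a choice that looked best locally but leads to a worse sentence.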

Try both and compare.


```python
long_text = (
    "summarize: Natural language processing enables computers to interpret and generate human language. "
    "It includes applications such as sentiment analysis, translation, and question answering. "
    "Transformers have advanced NLP by using attention mechanisms to model long-range dependencies effectively."
)

print("Greedy:")
# do_sample=False with the default num_beams=1 means greedy decoding
print(t5_generate(long_text, max_new_tokens=64, do_sample=False))

print("\nBeam search (num_beams=4):")
# Track 4 candidate sequences; early_stopping ends the search once 4 finished candidates exist
print(t5_generate(long_text, max_new_tokens=64, num_beams=4, early_stopping=True))
```


```text
Greedy:
natural language processing enables computers to interpret and generate human language. it includes applications such as sentiment analysis, translation, and question answering.

Beam search (num_beams=4):
natural language processing enables computers to interpret and generate human language. it includes applications such as sentiment analysis, translation, and question answering.
```
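In the run above, greedy and beam search returned the same summary, which can happen when one continuation clearly dominates. A toy example makes the difference visible. It is a sketch under loud assumptions: the "model" is a hypothetical hand-written table of next-word probabilities (`NEXT_WORD_PROBS`), each word depends only on the previous word, and sequences are scored by total probability with no length adjustment (Hugging Face's `generate` additionally applies `length_penalty` to beam scores).

```python
import math

# HYPOTHETICAL toy "language model": probability of the next word given the previous word
NEXT_WORD_PROBS = {
    "<s>": {"the": 0.5, "a": 0.4, "</s>": 0.1},
    "the": {"cat": 0.4, "dog": 0.35, "</s>": 0.25},
    "a":   {"cat": 0.9, "dog": 0.05, "</s>": 0.05},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}
END = "</s>"

def greedy_search(max_steps=5):
    seq, logp = ["<s>"], 0.0
    for _ in range(max_steps):
        probs = NEXT_WORD_PROBS[seq[-1]]
        word = max(probs, key=probs.get)  # single most probable next word
        seq.append(word)
        logp += math.log(probs[word])
        if word == END:
            break
    return seq, math.exp(logp)

def beam_search(num_beams=2, max_steps=5):
    beams = [(["<s>"], 0.0)]  # each beam: (words so far, total log-probability)
    for _ in range(max_steps):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == END:  # finished beams are carried over unchanged
                candidates.append((seq, logp))
                continue
            for word, p in NEXT_WORD_PROBS[seq[-1]].items():
                candidates.append((seq + [word], logp + math.log(p)))
        # Keep only the num_beams highest-scoring sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(seq[-1] == END for seq, _ in beams):
            break
    best_seq, best_logp = beams[0]
    return best_seq, math.exp(best_logp)

g_seq, g_p = greedy_search()
b_seq, b_p = beam_search(num_beams=2)
print("Greedy:", " ".join(g_seq), f"(probability {g_p:.3f})")
print("Beam:  ", " ".join(b_seq), f"(probability {b_p:.3f})")
```

Greedy commits to `the` because it is the likeliest first word, and ends with `<s> the cat </s>` at probability 0.5 × 0.4 = 0.2. Beam search with two beams keeps `a` alive as well and finds `<s> a cat </s>` at 0.4 × 0.9 = 0.36.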



## 8. Prompt Experiments (Mini Lab)

Try different **task prefixes** and instructions:
- `paraphrase: <text>` (works sometimes, but T5-small wasn't explicitly fine-tuned for paraphrase)
- `explain like I'm five: <concept>`
- `translate English to Chinese: <text>`
- `grammar: <sentence>` (may be inconsistent without fine-tuning)

> Note: Pretrained T5 can *approximate* many tasks but shines when **fine-tuned** on a specific dataset.


```python
your_prompt = "paraphrase: Transformers help models focus on important words using attention."
print(t5_generate(your_prompt, max_new_tokens=64))
```


```text
Paraphrase
```
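Pretrained T5 shines after fine-tuning, and mechanically fine-tuning is ordinary supervised training: feed an input, pass the target text as `labels`, and let the model's cross-entropy loss drive gradient updates. The sketch below shows that loop under heavy simplifications: the model is a tiny, randomly initialised T5 built from a hand-picked `T5Config` (not `t5-small`), the token IDs are made up, and it learns a single example. It only demonstrates that the loss goes down; a real fine-tune would start from the pretrained weights and use a tokenizer and a dataset of many input/target pairs.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

torch.manual_seed(0)  # make the random initialisation repeatable

# HYPOTHETICAL tiny T5: random weights, far smaller than t5-small, runs in seconds offline
tiny_config = T5Config(
    vocab_size=64, d_model=32, d_kv=8, d_ff=64,
    num_layers=2, num_heads=4, dropout_rate=0.0,
    decoder_start_token_id=0,  # T5 starts decoding from the pad token (ID 0)
)
tiny_model = T5ForConditionalGeneration(tiny_config)
optimizer = torch.optim.AdamW(tiny_model.parameters(), lr=5e-3)

# Made-up token IDs standing in for a tokenized prompt and its target answer.
# As in the real T5 vocabulary, ID 0 is <pad> and ID 1 is </s>.
input_ids = torch.tensor([[10, 11, 12, 13, 1]])
labels = torch.tensor([[20, 21, 22, 1]])

tiny_model.train()
losses = []
for step in range(30):
    # With labels, the model builds decoder inputs (labels shifted right)
    # and returns the cross-entropy loss between its predictions and the labels
    loss = tiny_model(input_ids=input_ids, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss at step 0: {losses[0]:.3f} -> loss at step {len(losses) - 1}: {losses[-1]:.3f}")
```

With the real model, the main changes are loading `t5-small` with `from_pretrained` and producing `input_ids` and `labels` by tokenizing prompt and target text.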



## 9. Performance Notes

- **Model size matters**: `t5-small` (~60M parameters) is fast; `t5-base` (~220M) and `t5-large` (~770M) are slower and need more memory.
- **GPU helps**, but CPU works for short inputs.
- **Max token length** and `max_new_tokens` affect speed and output quality.
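The ~60M figure can be checked without downloading the weights: build the `t5-small` architecture from its config alone (random weights) and count parameters. A minimal sketch; the float32 estimate assumes 4 bytes per parameter, which should land close to the size of the `model.safetensors` file downloaded earlier.

```python
from transformers import T5Config, T5ForConditionalGeneration

# Download only t5-small's config and build the architecture with random weights
small_config = T5Config.from_pretrained("t5-small")
shape_only_model = T5ForConditionalGeneration(small_config)

# parameters() yields shared (tied) weights once, so nothing is double-counted
n_params = sum(p.numel() for p in shape_only_model.parameters())
approx_mb = n_params * 4 / 1e6  # float32 stores each parameter in 4 bytes

print(f"t5-small: {n_params / 1e6:.1f}M parameters, about {approx_mb:.0f} MB in float32")
```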

> Tip: use `t5-small` for live demos; it loads and runs quickly even on a CPU.



## 10. Wrap-Up & Reflection

- Transformers read and write text using **attention**.
- Tokenization converts text ↔ numbers.
- T5 uses **task prefixes** to solve many problems with one model.

### Exit Ticket (2 minutes)
Write a one-sentence idea for an app you could build with T5 (e.g., homework explainer, language study buddy).  
What task prefix would you use? Why?



## Appendix — Troubleshooting

- **CUDA out of memory**: Switch to CPU, shorten input, or use `t5-small`.
- **Slow on CPU**: Reduce `max_new_tokens`, keep inputs short.
- **Tokenization bugs**: Ensure `sentencepiece` is installed (Colab command above).
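To confirm the libraries are installed without importing them, `importlib.util.find_spec` returns `None` for any package Python cannot find. A sketch; `check_packages` is a hypothetical helper name. `sentencepiece` is required by the slow `T5Tokenizer` used in the mT5 cell.

```python
import importlib.util

def check_packages(names=("torch", "transformers", "sentencepiece")):
    """Report which of the notebook's libraries Python can find in this environment."""
    status = {name: importlib.util.find_spec(name) is not None for name in names}
    for name, ok in status.items():
        print(f"{name:15} {'installed' if ok else 'MISSING: pip install ' + name}")
    return status

status = check_packages()
```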
