In [1]:
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

Mounted at /content/drive


# Text Summarization with T5 and Attention

## Introduction
Text summarization generates a concise summary from a longer text. Traditional Seq2Seq models without attention encode the whole input into a single fixed-length vector, which can lose important information, especially on long inputs. Transformer models like **T5** use the **attention mechanism** to focus on the relevant parts of the input for each output token, which improves accuracy and fluency.

We **fine-tune T5** on a subset of the **CNN/DailyMail** dataset to adapt the pre-trained model for summarization. This reuses the language understanding learned during pre-training while optimizing for our summarization task.

## Why Attention?
- Focuses on the most relevant input tokens at each decoding step.
- Improves sequence-to-sequence performance, because the decoder no longer depends on one fixed-length summary of the input.
- Enables interpretability through attention visualization.
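
To make the idea concrete, here is a minimal sketch of scaled dot-product attention for a single query, in plain Python. It follows the standard Transformer formulation; T5's own attention layers differ in details (for example, they add relative position biases), so treat this as an illustration rather than T5's exact computation. The function names and the toy vectors are assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns (context, weights), where weights[i] is how much input
    position i contributes to the context vector.
    """
    d_k = len(query)
    # Similarity of the query with every key, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    # Context is the attention-weighted average of the value vectors.
    context = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
    return context, weights

# Toy example: three input positions with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 1.0]

context, weights = attention(query, keys, values)
print(weights)   # the third position gets the largest weight
print(context)
```

The weights sum to 1, and plotting them per output token is what attention visualization amounts to.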

## Dataset
- **CNN/DailyMail (v3.0.0)**, using `train[:1%]` (2,871 of the 287,113 training examples) for fast experimentation on limited hardware such as a Google Colab T4 GPU.

## Benefits
- Aims to produce concise, fluent, and semantically accurate summaries.
- Handles long input sequences better than traditional Seq2Seq models, although inputs in this notebook are truncated to 512 tokens.


In [27]:
from datasets import load_dataset

dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")
print(dataset[0])

Generating train split:   0%|          | 0/287113 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/13368 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/11490 [00:00<?, ? examples/s]

{'article': 'LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won\'t cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don\'t plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar," he told an Australian interviewer earlier this month. "I don\'t think I\'ll be particularly extravagant. "The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box office char

In [28]:
print("Example article:")
print(dataset[0]['article'])
print("\nExample summary:")
print(dataset[0]['highlights'])


Example article:
LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar," he told an Australian interviewer earlier this month. "I don't think I'll be particularly extravagant. "The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box office char

In [29]:
!pip install -q transformers sentencepiece

## Load Pretrained Model and Tokenizer
We use **T5-small** (`t5-small`), which is pretrained in a text-to-text format: every task takes a text string as input and produces a text string as output.
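
In the text-to-text setup, the released T5 checkpoints were pretrained with a short task prefix on the input, and `summarize: ` is the one used for summarization. The cells in this notebook feed the raw article without a prefix, so the helper below is only a hedged sketch of the convention; `add_task_prefix` is a hypothetical name, not part of the notebook or of `transformers`.

```python
def add_task_prefix(article, prefix="summarize: "):
    """Prepend a T5-style task prefix to the input text."""
    return prefix + article.strip()

print(add_task_prefix("  LONDON, England (Reuters) -- Harry Potter star ...  "))
```

If a prefix is used, it has to be applied consistently at training and at inference time.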


In [30]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


## Preprocess Data
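Tokenize the articles and their reference summaries (`highlights`) to prepare model inputs and labels for T5.

- Max input length: 512 tokens (longer articles are truncated)
- Max target length: 128 tokens (longer summaries are truncated)
- Pad every sequence to its maximum length (`padding='max_length'`)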


In [31]:
max_input_length = 512
max_target_length = 128

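def preprocess(example):
    """Tokenize one article/summary pair into fixed-length model inputs."""
    # Encoder input: truncate or pad the article to max_input_length tokens.
    input_enc = tokenizer(
        example['article'],
        truncation=True,
        padding='max_length',
        max_length=max_input_length
    )
    # Decoder target: truncate or pad the reference summary to max_target_length tokens.
    target_enc = tokenizer(
        example['highlights'],
        truncation=True,
        padding='max_length',
        max_length=max_target_length
    )
    # Note: padded label positions keep the pad token id, so they count toward the loss.
    return {
        'input_ids': input_enc['input_ids'],
        'attention_mask': input_enc['attention_mask'],
        'labels': target_enc['input_ids']
    }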

# Apply preprocessing
tokenized_dataset = dataset.map(preprocess)


Map:   0%|          | 0/2871 [00:00<?, ? examples/s]
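
Because the targets are padded to 128 tokens, most label positions of a short summary are padding. A common variant, not used in the cell above, replaces those positions with `-100`, the label value that PyTorch's cross-entropy loss ignores by default, so that padding does not contribute to the loss. The sketch below assumes T5's pad token id of 0; `mask_pad_labels` is a hypothetical helper and the token ids are made up for illustration.

```python
PAD_TOKEN_ID = 0      # T5's pad token id
IGNORE_INDEX = -100   # label value ignored by PyTorch's cross-entropy loss by default

def mask_pad_labels(label_ids, pad_token_id=PAD_TOKEN_ID):
    """Replace padding positions in a label sequence with IGNORE_INDEX."""
    return [tok if tok != pad_token_id else IGNORE_INDEX for tok in label_ids]

# Illustrative ids only: three content tokens, the end-of-sequence id (1), then padding.
labels = [363, 19, 8, 1, 0, 0, 0]
print(mask_pad_labels(labels))  # [363, 19, 8, 1, -100, -100, -100]
```

In `preprocess`, this would be applied to `target_enc['input_ids']` before returning it as `'labels'`. Loss values from such a run would not be directly comparable with the ones logged in this notebook.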

## Prepare DataLoader
Convert dataset into PyTorch format and create batches.


In [47]:
import torch
from torch.utils.data import DataLoader

# Keep only necessary columns and convert to dict
tokenized_dataset = tokenized_dataset.remove_columns(
    [col for col in tokenized_dataset.column_names if col not in ['input_ids','attention_mask','labels']]
)
tokenized_dataset = {key: tokenized_dataset[key] for key in ['input_ids','attention_mask','labels']}

# Create PyTorch Dataset
class CNNDailyMailDataset(torch.utils.data.Dataset):
    def __init__(self, encodings):
        self.encodings = encodings
    def __len__(self):
        return len(self.encodings['input_ids'])
    def __getitem__(self, idx):
        return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}

train_dataset = CNNDailyMailDataset(tokenized_dataset)
train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)


## Training Setup
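- Optimizer: AdamW with a learning rate of 5e-5
- Loss is calculated internally by T5 when `labels` are passed to the model
- Train for **6 epochs** on the small subset (about 2 min 45 s per epoch on a Colab T4)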
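
The loss that T5 computes internally is a token-level cross-entropy between the decoder's output scores and the label ids. The following plain-Python sketch shows that computation on toy numbers, including the convention of skipping positions labelled `-100`. The function name, logits, and labels are assumptions for illustration; the real model does this over the full vocabulary with PyTorch tensors.

```python
import math

IGNORE_INDEX = -100

def token_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignored.

    logits: list of per-position score lists (one score per vocabulary entry)
    labels: list of target token ids, same length as logits
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # skipped positions do not affect the loss
        m = max(scores)
        # log of the softmax normalizer, computed stably
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[target]
        count += 1
    # Returns 0.0 when every position is ignored (a convention of this sketch).
    return total / count if count else 0.0

# Toy vocabulary of 3 tokens, 3 target positions, the last one ignored.
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.0, 0.0, 0.0]]
labels = [0, 1, -100]
print(round(token_cross_entropy(logits, labels), 4))
```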


In [48]:
import torch
from torch.optim import AdamW

# Device
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

# Optimizer
optimizer = AdamW(model.parameters(), lr=5e-5)


In [52]:
from tqdm import tqdm

model.train()
for epoch in range(6):
    # One progress bar per epoch, labelled up front.
    loop = tqdm(train_loader, desc=f'Epoch {epoch}', leave=True)
    for batch in loop:
        optimizer.zero_grad()
        # Move the batch to the same device as the model.
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        # Passing labels makes the model return the cross-entropy loss directly.
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        # Shows the current batch's loss, not an epoch average.
        loop.set_postfix(loss=loss.item())


Epoch 0: 100%|██████████| 1436/1436 [02:45<00:00,  8.70it/s, loss=1.67]
Epoch 1: 100%|██████████| 1436/1436 [02:45<00:00,  8.68it/s, loss=1.63]
Epoch 2: 100%|██████████| 1436/1436 [02:44<00:00,  8.70it/s, loss=1.04]
Epoch 3: 100%|██████████| 1436/1436 [02:45<00:00,  8.70it/s, loss=1.95]
Epoch 4: 100%|██████████| 1436/1436 [02:45<00:00,  8.69it/s, loss=1.42]
Epoch 5: 100%|██████████| 1436/1436 [02:45<00:00,  8.69it/s, loss=0.564]
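
The numbers in this log can be checked with a little arithmetic: 2,871 examples at a batch size of 2 give 1,436 steps per epoch (the last batch holds a single example), and 1,436 steps in 2 min 45 s is about 8.7 iterations per second. The values below are read from the outputs above.

```python
import math

num_examples = 2871   # size of train[:1%], from the Map progress line
batch_size = 2        # DataLoader batch size used above
epoch_seconds = 165   # 02:45 per epoch, from the training log

steps_per_epoch = math.ceil(num_examples / batch_size)
print(steps_per_epoch)                             # 1436
print(round(steps_per_epoch / epoch_seconds, 2))   # 8.7
```

The loss shown at the end of each progress bar is the loss of that epoch's final batch of at most two examples, which is why it jumps around (1.04, then 1.95, then 0.564) rather than decreasing steadily. An average over all batches in the epoch would give a smoother signal.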


In [53]:
model.eval()

examples = dataset[:5]['article']  # pick a few examples
inputs = tokenizer(examples, max_length=512, truncation=True, return_tensors="pt", padding="max_length")
inputs = {key: val.to(device) for key, val in inputs.items()}

summary_ids = model.generate(
    input_ids=inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_length=50,
    num_beams=4,
    early_stopping=True
)

summaries = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in summary_ids]

for i, summ in enumerate(summaries):
    print(f"Article:\n{examples[i]}\n")
    print(f"Predicted Summary:\n{summ}\n")
    print("-"*50)


Article:
LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar," he told an Australian interviewer earlier this month. "I don't think I'll be particularly extravagant. "The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box office chart. Detai