# Abstractive Summarization using PEGASUS on XSum

This notebook demonstrates abstractive summarization using the **PEGASUS** model fine-tuned on the **XSum** dataset. XSum pairs BBC news articles with single-sentence, highly abstractive summaries, which makes it a common benchmark for models that must produce very concise summaries.


In [1]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
!pip install transformers datasets torch accelerate rouge_score sentencepiece --quiet

In [6]:
from transformers import PegasusTokenizer, PegasusForConditionalGeneration
from datasets import load_dataset
import torch



### 🔹 Step 1: Load Pretrained PEGASUS Model and Tokenizer

We use the `google/pegasus-xsum` checkpoint, a PEGASUS model fine-tuned on the XSum dataset.


In [None]:
model_name = "google/pegasus-xsum"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

### 🔹 Step 2: Load XSum Dataset

We use the Hugging Face `datasets` library to load the XSum dataset and take the first document of the test split as our sample.


In [None]:
dataset = load_dataset("xsum")
sample = dataset["test"][0]["document"]
sample

'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said the need for acc

### 🔹 Step 3: Tokenize the Input Document

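Tokenize the sample with the PEGASUS tokenizer. `truncation=True` cuts inputs that exceed the model's maximum input length, `padding="longest"` pads every sequence in a batch to the length of the longest one (a no-op for a single document), and `return_tensors="pt"` returns PyTorch tensors.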


In [None]:
inputs = tokenizer(
    sample,
    truncation=True,
    padding="longest",
    return_tensors="pt"
)
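
To make the truncation and padding options concrete, here is a minimal pure-Python sketch of what they do to a batch of two texts. It is a toy stand-in, not the PEGASUS tokenizer: the whitespace splitting, the `toy_encode` and `encode_batch` helpers, and the `PAD_ID`/`EOS_ID` values are assumptions made for illustration only.

```python
# Toy illustration of truncation=True and padding="longest".
# PAD_ID, EOS_ID and the whitespace "tokenizer" are illustrative assumptions,
# not the real PEGASUS SentencePiece vocabulary.
PAD_ID = 0
EOS_ID = 1


def toy_encode(text, vocab):
    """Map whitespace-separated words to integer IDs, growing vocab as needed."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab) + 2  # IDs 0 and 1 are reserved for pad/eos
        ids.append(vocab[word])
    return ids


def encode_batch(texts, max_length):
    """Truncate each text to max_length (including EOS), then pad to the longest."""
    vocab = {}
    encoded = []
    for text in texts:
        ids = toy_encode(text, vocab)[: max_length - 1]  # leave room for EOS
        encoded.append(ids + [EOS_ID])
    longest = max(len(ids) for ids in encoded)
    input_ids = [ids + [PAD_ID] * (longest - len(ids)) for ids in encoded]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in encoded]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = encode_batch(["a b c d e f", "a b"], max_length=5)
print(batch["input_ids"])       # [[2, 3, 4, 5, 1], [2, 3, 1, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

The first text is cut to four tokens plus the end-of-sequence marker, and the second is padded up to the same length. The attention mask is what lets a model tell real tokens from padding, which matters as soon as more than one document is summarized at a time.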

### 🔹 Step 4: Generate Abstractive Summary

Use the `generate` method with beam search: five candidate sequences (beams) are kept at each decoding step, and the summary is capped at 60 tokens.


In [None]:
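# **inputs forwards both input_ids and attention_mask to generate().
summary_ids = model.generate(
    **inputs,
    max_length=60,        # upper bound on the summary length, in tokens
    num_beams=5,          # number of beams kept during beam search
    length_penalty=1.0,   # exponent used to length-normalize beam scores
    early_stopping=True   # stop once num_beams finished candidates exist
)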
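
The idea behind beam search can be sketched with a toy example. The code below is not how `generate` is implemented; it is a simplified illustration under stated assumptions: the `NEXT` table is a hypothetical "model" whose next-token probabilities depend only on the previous token, and the final score divides the summed log-probability by the generated length raised to `length_penalty`, which is roughly how length normalization works.

```python
import math

# Hypothetical toy "model": next-token probabilities depend only on the
# previous token. The real PEGASUS decoder conditions on the whole prefix
# and on the encoded article.
EOS = "</s>"
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"charity": 0.5, "report": 0.5},
    "a": {"charity": 0.9, "report": 0.1},
    "charity": {EOS: 1.0},
    "report": {EOS: 1.0},
}


def beam_search(num_beams=2, max_length=5, length_penalty=1.0):
    """Return the best token sequence found, without the start and end markers."""
    beams = [(["<s>"], 0.0)]  # each beam is (tokens, summed log-probability)
    finished = []
    for _ in range(max_length):
        # Expand every live beam by every possible next token.
        candidates = []
        for tokens, logp in beams:
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        # Keep only the num_beams most probable expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:num_beams]:
            if tokens[-1] == EOS:
                finished.append((tokens, logp))
            else:
                beams.append((tokens, logp))
        if not beams:
            break

    def score(item):
        tokens, logp = item
        generated = len(tokens) - 1  # exclude the start marker
        return logp / generated ** length_penalty

    best_tokens, _ = max(finished or beams, key=score)
    return [t for t in best_tokens if t not in ("<s>", EOS)]


print(beam_search(num_beams=1))  # greedy decoding: ['the', 'charity']
print(beam_search(num_beams=2))  # beam search:     ['a', 'charity']
```

With a single beam the search commits to "the" because it is the most likely first token, and ends with a sequence of probability 0.30. With two beams it keeps "a" alive long enough to find "a charity", which has probability 0.36. That is the trade-off `num_beams` controls: more beams cost more computation but reduce the chance of committing early to a locally attractive token.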

### 🔹 Step 5: Decode the Generated Summary

Convert token IDs back into human-readable text.


In [None]:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

# Print the original article and the generated summary
print("\n📰 Original Article:\n", sample)
print("\n📌 Abstractive Summary:\n", summary)


📰 Original Article:
 Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.
Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.
The Welsh Government said more people than ever were getting help to address housing problems.
Changes to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.
Prison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.
However, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.
Andrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said t