# 📚 Notebook 2: PEGASUS Evaluation

# 🚀 PROJECT PLAN
MKEM Implementation – Transformer-Based Abstractive Text Summarization

# 🎯 Problem Statement Recap

# 🔍 Objective:
Build and compare transformer-based summarization models (T5, BART, PEGASUS), then enhance them with MKEM (Multi-Knowledge-Enhanced Model) on curated English news datasets.

# 📌 Phase-1 Objective

# ✅ Implement the following 3 summarization models:
    
PEGASUS (Google): Notebook 2

BART (Facebook): Notebook 3

T5 (Google): Notebook 1

# ✅ Evaluate on 3 benchmark datasets:
    
CNN/DailyMail

XSum

MultiNews

# ✅ Evaluation Metrics:
    
ROUGE-1

ROUGE-2

ROUGE-L

BERTScore

# 📊 Final Output (Per Model × Dataset):
    
You must submit structured results:

Dataset name

Model used

ROUGE-1, ROUGE-2, ROUGE-L, BERTScore

Short analysis/observations

# 📌 Phase-2 Objective: Final Comparison + MKEM (Notebook 4)

**🎯 Task: Model Comparison + MKEM Fusion**

# 🚀PEGASUS Implementation Plan

# 1.🚀PEGASUS on CNN

**🔹 Step 1: Install & Import**

In [4]:
from transformers import PegasusTokenizer, PegasusForConditionalGeneration
import torch
import pandas as pd

In [2]:
import torch
print(torch.version.cuda)

None


In [3]:
!pip install torch==2.6.0+cpu --index-url https://download.pytorch.org/whl/cpu

Looking in indexes: https://download.pytorch.org/whl/cpu





**🔹 Step 2: Load PEGASUS model & tokenizer**

In [5]:
model_name = "google/pegasus-cnn_dailymail"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

Some weights of PegasusForConditionalGeneration were not initialized from the model checkpoint at google/pegasus-cnn_dailymail and are newly initialized: ['model.decoder.embed_positions.weight', 'model.encoder.embed_positions.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.




**🔹 Step 3: Load Dataset from CSV**

In [34]:
# Load full datasets 
from datasets import load_dataset

cnn_dailymail = load_dataset("cnn_dailymail", "3.0.0")

In [36]:
print("📊 CNN/DailyMail Test Set:", cnn_dailymail['test'].shape)

📊 CNN/DailyMail Test Set: (11490, 3)


In [37]:
# Load CNN dataset saved from Notebook 1
df_cnn = pd.read_csv("cnn_dailymail.csv")

**🔹Step 4: Inspect the Data**

In [8]:
df_cnn.head()

Unnamed: 0,article,highlights,id
0,"LONDON, England (Reuters) -- Harry Potter star...",Harry Potter star Daniel Radcliffe gets £20M f...,42c027e4ff9730fbb3de84c1af0d2c506e41c3e4
1,Editor's note: In our Behind the Scenes series...,Mentally ill inmates in Miami are housed on th...,ee8871b15c50d0db17b0179a6d2beab35065f1e9
2,"MINNEAPOLIS, Minnesota (CNN) -- Drivers who we...","NEW: ""I thought I was going to die,"" driver sa...",06352019a19ae31e527f37f7571c6dd7f0c5da37
3,WASHINGTON (CNN) -- Doctors removed five small...,"Five small polyps found during procedure; ""non...",24521a2abb2e1f5e34e6824e0f9e56904a2b0e88
4,(CNN) -- The National Football League has ind...,"NEW: NFL chief, Atlanta Falcons owner critical...",7fe70cc8b12fab2d0a258fababf7d9c6b5e1262a


**🔹 Step 5: Define Generation Function**

In [9]:
def generate_summary(text):
    # Tokenize one article; input beyond the tokenizer's maximum length is truncated
    inputs = tokenizer(text, truncation=True, padding="longest", return_tensors="pt").to(device)
    # Unpack the encoding so the attention mask is passed along with the input IDs
    summary_ids = model.generate(**inputs, max_length=60, min_length=10, length_penalty=2.0, num_beams=4, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
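
The CNN/DailyMail checkpoint marks sentence boundaries in its output with a literal `<n>` token, which is visible in the summaries printed in this notebook. Below is a minimal sketch of a post-processing helper that turns those markers into newlines before display or scoring. `clean_pegasus_summary` is a hypothetical name, not part of `transformers`, and the scores reported in this notebook were computed without this step.

```python
def clean_pegasus_summary(text: str, sep: str = "\n") -> str:
    """Replace PEGASUS '<n>' sentence markers with `sep` and drop empty pieces."""
    parts = [part.strip() for part in text.split("<n>")]
    return sep.join(part for part in parts if part)

# Excerpt of the first summary printed in this notebook
raw = (
    "Harry Potter star Daniel Radcliffe gains access to a reported £20 million fortune ."
    "<n>Young actor says he has no plans to fritter his cash away ."
)
print(clean_pegasus_summary(raw))
```

Newline-separated sentences are also the input format that the `rougeLsum` variant of ROUGE works with, so a helper like this may matter if that variant is added to the comparison later.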


**🔹 Step 6: Generate Summaries for Evaluation**

In [11]:
sample_articles = df_cnn["article"][:3] 
generated_summaries = []

for i, article in enumerate(sample_articles):
    print(f"\n📰 Original Article #{i+1}:\n", article[:500], "...\n")
    
    summary = generate_summary(article)
    generated_summaries.append(summary)
    
    print(f"✍️ PEGASUS Summary #{i+1}:\n", summary)
    print("-" * 80)


📰 Original Article #1:
 LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as s ...

✍️ PEGASUS Summary #1:
 Harry Potter star Daniel Radcliffe gains access to a reported £20 million fortune .<n>Young actor says he has no plans to fritter his cash away .<n>Radcliffe's earnings from the first five Potter films have been held in a trust fund .
--------------------------------------------------------------------------------

📰 Original Article #2:
 Editor's note: In our Behind the Scenes series, CNN correspondents share their experiences in covering n

**Define CNN predictions and references**

In [12]:
pegasus_cnn_preds = generated_summaries
pegasus_cnn_refs = df_cnn["highlights"][:len(generated_summaries)].tolist()  # plain list of reference strings

**🔹 Step 7: Evaluate with ROUGE & BERTScore**

In [15]:
from evaluate import load
import pandas as pd

# Load evaluation metrics
rouge = load("rouge")
bertscore = load("bertscore")

# Global list to collect results
summary_results = []

# Define evaluation function
def evaluate_metrics(dataset_name, predictions, references):
    # ROUGE computation
    rouge_scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)

    # BERTScore computation
    bert_scores = bertscore.compute(predictions=predictions, references=references, lang="en")
    avg_bertscore = sum(bert_scores["f1"]) / len(bert_scores["f1"])

    # Append results
    summary_results.append({
        "Dataset": dataset_name,
        "ROUGE-1": round(rouge_scores["rouge1"], 4),
        "ROUGE-2": round(rouge_scores["rouge2"], 4),
        "ROUGE-L": round(rouge_scores["rougeL"], 4),
        "BERTScore": round(avg_bertscore, 4)
    })
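
To make the numbers in the results tables easier to interpret, here is a simplified sketch of what ROUGE-N and ROUGE-L measure: the F1 of clipped n-gram overlap and of the longest common subsequence, respectively. This is an illustration only. It uses plain whitespace tokenization with no stemming, so it will not reproduce the values returned by `rouge.compute(..., use_stemmer=True)`, and the function names are hypothetical.

```python
from collections import Counter

def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def _f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision (overlap/pred) and recall (overlap/ref)."""
    if overlap == 0 or pred_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)

def rouge_n_f1(prediction, reference, n=1):
    pred = _ngrams(prediction.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((pred & ref).values())  # each n-gram counted at most as often as in both
    return _f1(overlap, sum(pred.values()), sum(ref.values()))

def rouge_l_f1(prediction, reference):
    a, b = prediction.lower().split(), reference.lower().split()
    # Row-by-row dynamic programme for the longest common subsequence length
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(a), len(b))

pred = "the cat sat on the mat"
ref = "the cat lay on the mat"
print(rouge_n_f1(pred, ref, 1), rouge_n_f1(pred, ref, 2), rouge_l_f1(pred, ref))
```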


In [16]:
evaluate_metrics("CNN DailyMail", pegasus_cnn_preds, pegasus_cnn_refs)

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


**🔹 Step 8: PEGASUS ROUGE & BERTScore on CNN/DailyMail**

In [17]:
df_summary = pd.DataFrame(summary_results)
print("✅ PEGASUS Model ROUGE + BERTScore Summary")
print(df_summary.to_string(index=False))

✅ PEGASUS Model ROUGE + BERTScore Summary
      Dataset  ROUGE-1  ROUGE-2  ROUGE-L  BERTScore
CNN DailyMail   0.5595   0.4425   0.5201     0.9102
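
These scores are computed on the first three articles only (`df_cnn["article"][:3]` in Step 6), so they are best read as a smoke test rather than a benchmark figure. Below is a minimal sketch of drawing a larger, reproducible evaluation subset. `sample_indices`, the subset size, and the seed are illustrative assumptions, not values used in this notebook.

```python
import random

def sample_indices(n_rows: int, k: int, seed: int = 42) -> list:
    """Return k distinct row positions in [0, n_rows), sorted, reproducible for a given seed."""
    if k > n_rows:
        raise ValueError("k cannot exceed the number of rows")
    rng = random.Random(seed)  # private generator; the global random state is untouched
    return sorted(rng.sample(range(n_rows), k))

# 11490 is the CNN/DailyMail test-set size printed in Step 3
idx = sample_indices(11490, k=100)
print(len(idx), idx[:5])
```

With the DataFrame from Step 3, something like `df_cnn.iloc[sample_indices(len(df_cnn), k)]` would select the corresponding rows, where `k` is chosen to fit the available compute.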


# 2.🚀PEGASUS on XSUM

**🔹 Step 9: Load XSum Dataset**

In [38]:
# ✅ Load datasets properly
from datasets import load_dataset

# XSum (extreme summarization)
xsum = load_dataset("xsum")

Using the latest cached version of the dataset since xsum couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at C:\Users\SAMIM IMTIAZ\.cache\huggingface\datasets\xsum\default\1.2.0\082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71 (last modified on Sun Aug  3 14:51:50 2025).


In [39]:
# ✅ Show original dataset sizes (test set only or all splits as needed)
print("📊 XSum Test Set:", xsum['test'].shape)

📊 XSum Test Set: (11334, 3)


In [18]:
import pandas as pd

# Load XSum dataset saved from Notebook 1
df_xsum = pd.read_csv("xsum.csv")

**🔹Step 10: Inspect the Data**

In [19]:
df_xsum.head()

Unnamed: 0,document,summary,id
0,"The full cost of damage in Newton Stewart, one...",Clean-up operations are continuing across the ...,35232142
1,A fire alarm went off at the Holiday Inn in Ho...,Two tourist buses have been destroyed by fire ...,40143035
2,Ferrari appeared in a position to challenge un...,Lewis Hamilton stormed to pole position at the...,35951548
3,"John Edward Bates, formerly of Spalding, Linco...",A former Lincolnshire Police officer carried o...,36266422
4,Patients and staff were evacuated from Cerahpa...,An armed man who locked himself into a room at...,38826984


**🔹 Step 11: Generate PEGASUS Summaries for XSum**

In [20]:
sample_articles = df_xsum["document"][:3]
generated_summaries = []

for i, article in enumerate(sample_articles):
    print(f"\n📰 Original XSum Article #{i+1}:\n", article[:500], "...\n")
    
    summary = generate_summary(article)
    generated_summaries.append(summary)
    
    print(f"✍️ PEGASUS Summary #{i+1}:\n", summary)
    print("-" * 80)

# Store predictions and references
pegasus_xsum_preds = generated_summaries
pegasus_xsum_refs = df_xsum["summary"][:len(generated_summaries)].tolist()


📰 Original XSum Article #1:
 The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The water ...

✍️ PEGASUS Summary #1:
 Many roads in Peeblesshire remain badly affected by standing water .<n>Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct .<n>First Minister Nicola Sturgeon visited the area to inspect the damage .
--------------------------------------------------------------------------------

📰 Original XSum Article #2:
 A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and gu

**🔹 Step 11 (continued): Evaluate with ROUGE & BERTScore**

In [21]:
evaluate_metrics("XSum", pegasus_xsum_preds, pegasus_xsum_refs) 



**🔹 Step 12: PEGASUS ROUGE & BERTScore on CNN/DailyMail and XSum**

In [23]:
df_summary = pd.DataFrame(summary_results)
print("✅ PEGASUS Model ROUGE + BERTScore Summary")
print(df_summary.to_string(index=False))

✅ PEGASUS Model ROUGE + BERTScore Summary
      Dataset  ROUGE-1  ROUGE-2  ROUGE-L  BERTScore
CNN DailyMail   0.5595   0.4425   0.5201     0.9102
         XSum   0.2265   0.0681   0.1677     0.8609
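
Both rows come from the same `google/pegasus-cnn_dailymail` checkpoint loaded in Step 2, which was fine-tuned to write multi-sentence CNN/DailyMail-style highlights, whereas XSum references are single sentences. One option is to pair each dataset with a PEGASUS checkpoint fine-tuned on it. The sketch below shows such a lookup. The `CHECKPOINTS` table and `checkpoint_for` helper are hypothetical names, and whether the swap improves the scores was not measured in this notebook.

```python
# PEGASUS checkpoints published on the Hugging Face Hub, keyed by the dataset labels used here
CHECKPOINTS = {
    "CNN DailyMail": "google/pegasus-cnn_dailymail",
    "XSum": "google/pegasus-xsum",
    "MultiNews": "google/pegasus-multi_news",
}

def checkpoint_for(dataset_name: str) -> str:
    """Look up the checkpoint for a dataset label, failing loudly on unknown names."""
    try:
        return CHECKPOINTS[dataset_name]
    except KeyError:
        raise ValueError(f"No checkpoint registered for {dataset_name!r}") from None

print(checkpoint_for("XSum"))
```

The returned name can be passed to `PegasusTokenizer.from_pretrained` and `PegasusForConditionalGeneration.from_pretrained` in place of `model_name`.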


# 3.🚀PEGASUS on MultiNews

**🔹Step 13: Load MultiNews Dataset**

In [40]:
# ✅ Load datasets properly
from datasets import load_dataset

# MultiNews (multi-document summarization)
multi_news = load_dataset("multi_news")

Using the latest cached version of the dataset since multi_news couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at C:\Users\SAMIM IMTIAZ\.cache\huggingface\datasets\multi_news\default\1.0.0\2f1f69a2bedc8ad1c5d8ae5148e4755ee7095f465c1c01ae8f85454342065a72 (last modified on Sun Aug  3 14:55:04 2025).


In [41]:
# ✅ Show original dataset sizes (test set only or all splits as needed)
print("📊 MultiNews Test Set:", multi_news['test'].shape)

📊 MultiNews Test Set: (5622, 2)


In [24]:
import pandas as pd

# Load dataset saved in Notebook 1
df_multi = pd.read_csv("multi_news.csv")

**🔹Step 14: Inspect the Data**

In [25]:
df_multi.head()

Unnamed: 0,document,summary
0,"National Archives \n \n Yes, it’s that time ag...",– The unemployment rate dropped to 8.2% last m...
1,LOS ANGELES (AP) — In her first interview sinc...,"– Shelly Sterling plans ""eventually"" to divorc..."
2,"GAITHERSBURG, Md. (AP) — A small, private jet ...",– A twin-engine Embraer jet that the FAA descr...
3,Tucker Carlson Exposes His Own Sexism on Twitt...,– Tucker Carlson is in deep doodoo with conser...
4,A man accused of removing another man's testic...,– What are the three most horrifying words in ...
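
Each MultiNews `document` bundles several source articles into one string, and `generate_summary` truncates whatever exceeds the tokenizer's maximum length, so later sources may never reach the model. Assuming the sources are delimited by the `|||||` marker (the convention described for the Hugging Face `multi_news` dataset, worth verifying against `df_multi`), the sketch below gives every source a share of the input. `split_sources` and `balanced_input` are hypothetical helpers, and the word budget is an illustrative value.

```python
def split_sources(document: str, sep: str = "|||||") -> list:
    """Split a multi-document string into its non-empty source articles."""
    return [part.strip() for part in document.split(sep) if part.strip()]

def balanced_input(document: str, max_words: int = 600, sep: str = "|||||") -> str:
    """Keep roughly max_words words in total, divided evenly across the sources."""
    sources = split_sources(document, sep)
    if not sources:
        return ""
    per_source = max(1, max_words // len(sources))
    trimmed = [" ".join(src.split()[:per_source]) for src in sources]
    return " ".join(trimmed)

doc = "first source text here ||||| second source text here ||||| third one"
print(split_sources(doc))
print(balanced_input(doc, max_words=6))
```

The budget is counted in whitespace-separated words, not tokenizer tokens, so it only approximates the model's real input limit.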


**🔹Step 15: Generate PEGASUS Summaries**

In [26]:
sample_articles = df_multi["document"][:3]
generated_summaries = []

for i, article in enumerate(sample_articles):
    print(f"\n📰 Original MultiNews Article #{i+1}:\n", article[:500], "...\n")
    
    summary = generate_summary(article)
    generated_summaries.append(summary)
    
    print(f"✍️ PEGASUS Summary #{i+1}:\n", summary)
    print("-" * 80)

# Set predictions and references
pegasus_multi_preds = generated_summaries
pegasus_multi_refs = df_multi["summary"][:len(generated_summaries)].tolist()



📰 Original MultiNews Article #1:
 National Archives 
 
 Yes, it’s that time again, folks. It’s the first Friday of the month, when for one ever-so-brief moment the interests of Wall Street, Washington and Main Street are all aligned on one thing: Jobs. 
 
 A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m. New York time offering one of the most important snapshots on how the economy fared during the previous month. Expectations are for 203,000 new jobs to be created, according to economists p ...

✍️ PEGASUS Summary #1:
 A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m.<n>Expectations are for 203,000 new jobs to be created, compared to 227,000 jobs added in February .<n>The unemployment rate is expected to hold steady at 8.3%
--------------------------------------------------------------------------------

📰 Original MultiNews Article #2:
 LOS ANGELES (AP) — In her first interview since the NBA banned her estra

**🔹 Step 16: Evaluate with ROUGE + BERTScore**

In [28]:
evaluate_metrics("MultiNews", pegasus_multi_preds, pegasus_multi_refs)



**🔹 Step 17: PEGASUS ROUGE & BERTScore on CNN/DailyMail, XSum & MultiNews**

In [29]:
# Final PEGASUS results summary
df_summary = pd.DataFrame(summary_results)
print("✅ PEGASUS Model ROUGE + BERTScore Summary")
print(df_summary.to_string(index=False))

✅ PEGASUS Model ROUGE + BERTScore Summary
      Dataset  ROUGE-1  ROUGE-2  ROUGE-L  BERTScore
CNN DailyMail   0.5595   0.4425   0.5201     0.9102
         XSum   0.2265   0.0681   0.1677     0.8609
    MultiNews   0.3221   0.1206   0.2295     0.8481


# 💾 Save the Scores to .CSV Files

**So that we can compare models across notebooks**

In [30]:
# Save to CSV
df_summary["Model"] = "PEGASUS"  
df_summary.to_csv("pegasus_scores.csv", index=False)
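
For the comparison notebook, the per-model score files need to be read back and compared per dataset. Below is a minimal sketch using only the standard library. `read_scores` and `best_per_dataset` are hypothetical helpers, and the in-memory CSV stands in for `pegasus_scores.csv` using the values reported in this notebook. Rows from other models would come from whatever files those notebooks actually write.

```python
import csv
import io

def read_scores(handle):
    """Parse one score CSV (as written by df_summary.to_csv) into a list of dicts."""
    return list(csv.DictReader(handle))

def best_per_dataset(rows, metric="ROUGE-L"):
    """For each dataset, keep the row with the highest value of `metric`."""
    best = {}
    for row in rows:
        current = best.get(row["Dataset"])
        if current is None or float(row[metric]) > float(current[metric]):
            best[row["Dataset"]] = row
    return best

# In-memory stand-in for pegasus_scores.csv, using the values reported in this notebook
pegasus_csv = io.StringIO(
    "Dataset,ROUGE-1,ROUGE-2,ROUGE-L,BERTScore,Model\n"
    "CNN DailyMail,0.5595,0.4425,0.5201,0.9102,PEGASUS\n"
    "XSum,0.2265,0.0681,0.1677,0.8609,PEGASUS\n"
    "MultiNews,0.3221,0.1206,0.2295,0.8481,PEGASUS\n"
)
rows = read_scores(pegasus_csv)
for dataset, row in best_per_dataset(rows).items():
    print(dataset, row["Model"], row["ROUGE-L"])
```

With real files, each one can be opened with `open(path, newline="")` and its rows appended to the same list before calling `best_per_dataset`.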