# 📚 NoteBook 6 ProphetNet Evaluation

# 🚀 PROJECT PLAN
MKEM Implementation – Transformer-Based Abstractive Text Summarization

# 🎯 Problem Statement Recap

# 🔍 Objective:

To build and compare transformer-based summarization models (T5, BART, Pegasus, BARTScore, ProphetNet, BigBird, LED, mT5, FLAN-T5, GPT-3.5 Turbo) and then enhance them using MKEM (Multi-Knowledge-Enhanced Model) on curated English news datasets.

# 📌 Phase-1 Objective

✅ Implement and compare the following summarization models (one notebook each):
    
T5 (Google)---NoteBook(1)

PEGASUS (Google)---NoteBook(2)

BART (Facebook)---NoteBook(3)

Final Comparison + MKEM---NoteBook(4)

NewsSum (Indian Newspaper)---NoteBook(5)

BARTScore---NoteBook(6)

ProphetNet---NoteBook(7)

BigBird-Pegasus---NoteBook(8)

LED (Longformer)---NoteBook(9)

mT5---NoteBook(10)

FLAN-T5---NoteBook(11)

GPT-3.5 Turbo---NoteBook(12)

# ✅ Evaluate on 2 benchmark datasets:

1. CNN/DailyMail

2. NewsSum (Indian Newspaper)

# ✅ Evaluation Metrics:
    
ROUGE-1

ROUGE-2

ROUGE-L

BERTScore
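For reference, these are the usual definitions behind the four metrics, written in simplified form (no stemming, no IDF weighting, no baseline rescaling). Here $\hat{y}$ is the generated summary, $y$ the reference summary, $\mathrm{cnt}_x(g)$ the number of times n-gram $g$ occurs in $x$, $\mathrm{LCS}$ the length of the longest common token subsequence, and $\mathbf{y}_i$, $\hat{\mathbf{y}}_j$ contextual token embeddings. The `rouge1`, `rouge2` and `rougeL` values that `evaluate` reports are F-measures of this kind, computed by the `rouge_score` package it wraps.

$$
P_n = \frac{\sum_{g} \min\big(\mathrm{cnt}_{\hat{y}}(g), \mathrm{cnt}_{y}(g)\big)}{\sum_{g} \mathrm{cnt}_{\hat{y}}(g)}, \qquad
R_n = \frac{\sum_{g} \min\big(\mathrm{cnt}_{\hat{y}}(g), \mathrm{cnt}_{y}(g)\big)}{\sum_{g} \mathrm{cnt}_{y}(g)}, \qquad
\text{ROUGE-}n = \frac{2 P_n R_n}{P_n + R_n}
$$

$$
P_L = \frac{\mathrm{LCS}(\hat{y}, y)}{|\hat{y}|}, \qquad
R_L = \frac{\mathrm{LCS}(\hat{y}, y)}{|y|}, \qquad
\text{ROUGE-L} = \frac{2 P_L R_L}{P_L + R_L}
$$

$$
P_{\text{BERT}} = \frac{1}{|\hat{y}|} \sum_{\hat{y}_j \in \hat{y}} \max_{y_i \in y} \cos(\mathbf{y}_i, \hat{\mathbf{y}}_j), \qquad
R_{\text{BERT}} = \frac{1}{|y|} \sum_{y_i \in y} \max_{\hat{y}_j \in \hat{y}} \cos(\mathbf{y}_i, \hat{\mathbf{y}}_j), \qquad
F_{\text{BERT}} = \frac{2 P_{\text{BERT}} R_{\text{BERT}}}{P_{\text{BERT}} + R_{\text{BERT}}}
$$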

# 📊 Final Output (Per Model × Dataset):
    
You must submit structured results:

Dataset name

Model used

ROUGE-1, ROUGE-2, ROUGE-L, BERTScore

Inference Time

GPU used

Short analysis/observations

# 1. 🚀 ProphetNet on CNN/DailyMail Dataset

**✏️ Step 1: Install & Import Libraries**

In [60]:
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration
import torch
import pandas as pd

**✏️ Step 2: Load Model & Tokenizer**

In [56]:
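# General-purpose pretrained ProphetNet checkpoint; it is not fine-tuned for summarization.
# A CNN/DailyMail fine-tuned variant is published on the Hugging Face Hub as
# "microsoft/prophetnet-large-uncased-cnndm".
model_name = "microsoft/prophetnet-large-uncased"
tokenizer = ProphetNetTokenizer.from_pretrained(model_name)
model = ProphetNetForConditionalGeneration.from_pretrained(model_name)

# Run on GPU when available, otherwise fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)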


**✏️ Step 3: Load CNN Dataset**

In [57]:
import pandas as pd

df_cnn = pd.read_csv("cnn_dailymail.csv")
df_cnn = df_cnn.dropna(subset=["article", "highlights"])
df_cnn = df_cnn[df_cnn["article"].str.strip().astype(bool)]
df_cnn = df_cnn[:5]  # For quick testing
df_cnn.head()

| | article | highlights | id |
|---|---|---|---|
| 0 | LONDON, England (Reuters) -- Harry Potter star... | Harry Potter star Daniel Radcliffe gets £20M f... | 42c027e4ff9730fbb3de84c1af0d2c506e41c3e4 |
| 1 | Editor's note: In our Behind the Scenes series... | Mentally ill inmates in Miami are housed on th... | ee8871b15c50d0db17b0179a6d2beab35065f1e9 |
| 2 | MINNEAPOLIS, Minnesota (CNN) -- Drivers who we... | NEW: "I thought I was going to die," driver sa... | 06352019a19ae31e527f37f7571c6dd7f0c5da37 |
| 3 | WASHINGTON (CNN) -- Doctors removed five small... | Five small polyps found during procedure; "non... | 24521a2abb2e1f5e34e6824e0f9e56904a2b0e88 |
| 4 | (CNN) -- The National Football League has ind... | NEW: NFL chief, Atlanta Falcons owner critical... | 7fe70cc8b12fab2d0a258fababf7d9c6b5e1262a |


**✏️ Step 4: Define Summarization Function**

In [61]:
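def summarize_with_prophetnet(text):
    # ProphetNet has 512 position embeddings, so truncate the article to 512 tokens
    inputs = tokenizer(text, max_length=512, return_tensors="pt", truncation=True).to(device)
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=150,       # longest summary, in tokens
        min_length=40,        # shortest summary, in tokens
        num_beams=4,          # beam search width
        length_penalty=2.0,   # values > 1.0 favor longer beams
        early_stopping=True,  # stop once all beams have finished
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)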
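The `length_penalty=2.0` argument in the function above changes which finished beam wins. As far as I know, Hugging Face beam search ranks each finished hypothesis by roughly `sum_logprobs / (length ** length_penalty)`; the sketch below reproduces only that normalization with made-up numbers. `normalized_beam_score` is a hypothetical helper, not a Transformers API.

```python
def normalized_beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized score of a finished beam hypothesis (higher is better)."""
    return sum_logprobs / (length ** length_penalty)


# Toy hypotheses: (total log-probability, length in tokens); values are made up
short_hyp = (-5.0, 10)
long_hyp = (-8.0, 20)

for penalty in (0.0, 1.0, 2.0):
    short_score = normalized_beam_score(*short_hyp, penalty)
    long_score = normalized_beam_score(*long_hyp, penalty)
    preferred = "long" if long_score > short_score else "short"
    print(f"length_penalty={penalty}: short={short_score:.3f}, long={long_score:.3f} -> {preferred}")
```

Because log-probabilities are negative, a larger exponent shrinks the longer hypothesis's score toward zero faster: here the short hypothesis wins at 0.0, while the long one wins at 1.0 and 2.0.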


**✏️ Step 5: Generate Predictions**

In [62]:
prophet_preds = [summarize_with_prophetnet(article) for article in df_cnn["article"]]
prophet_refs = df_cnn["highlights"].tolist()
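The results table below records `Inference_Time` as "TBD". One way to fill it is to time the same prediction loop. This is a minimal sketch using only the standard library; `timed_predictions` is a hypothetical helper, and the demo uses a stand-in summarizer so it runs without a model.

```python
import time
from typing import Callable, List, Tuple


def timed_predictions(
    summarize: Callable[[str], str], articles: List[str]
) -> Tuple[List[str], float, float]:
    """Summarize every article and measure wall-clock time.

    Returns (predictions, total_seconds, mean_seconds_per_article).
    """
    predictions: List[str] = []
    start = time.perf_counter()
    for article in articles:
        predictions.append(summarize(article))
    total_seconds = time.perf_counter() - start
    mean_seconds = total_seconds / len(articles) if articles else 0.0
    return predictions, total_seconds, mean_seconds


if __name__ == "__main__":
    # Stand-in summarizer (first sentence) so the sketch runs without a model
    preds, total, mean = timed_predictions(
        lambda text: text.split(".")[0], ["First. Second.", "Alpha. Beta."]
    )
    print(preds)
    print(f"{mean:.6f} s/article")
```

In this notebook the call would be `prophet_preds, total_s, per_article_s = timed_predictions(summarize_with_prophetnet, df_cnn["article"].tolist())`, and `per_article_s` could replace the "TBD" placeholder. The first article may be slower because of one-off setup costs, so a warm-up call before timing is worth considering.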

**✏️ Step 6: Evaluate with ROUGE & BERTScore**

In [65]:
!pip install evaluate rouge_score bert_score
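The `rouge` metric used in this step scores n-gram overlap (ROUGE-1/2) and longest-common-subsequence overlap (ROUGE-L) as F-measures. The stripped-down sketch below shows what is being counted for a single prediction/reference pair. It is an approximation, not the `rouge_score` implementation: tokenization only mimics its default lowercasing and alphanumeric splitting, and there is no stemming or cross-article aggregation. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs, similar in spirit to rouge_score's default tokenizer
    return re.findall(r"[a-z0-9]+", text.lower())


def f_measure(overlap: int, pred_len: int, ref_len: int) -> float:
    # Harmonic mean of precision (overlap / pred_len) and recall (overlap / ref_len)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(prediction: str, reference: str, n: int) -> float:
    def ngrams(tokens: List[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred_ngrams = ngrams(tokenize(prediction))
    ref_ngrams = ngrams(tokenize(reference))
    # Counter & Counter keeps min(count_pred, count_ref) per n-gram (clipped overlap)
    overlap = sum((pred_ngrams & ref_ngrams).values())
    return f_measure(overlap, sum(pred_ngrams.values()), sum(ref_ngrams.values()))


def rouge_l(prediction: str, reference: str) -> float:
    a, b = tokenize(prediction), tokenize(reference)
    # Dynamic programming table: lcs[i][j] = LCS length of a[:i] and b[:j]
    lcs = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
    return f_measure(lcs[len(a)][len(b)], len(a), len(b))


pred = "The cat sat on the mat."
ref = "The cat lay on the mat."
print(f"ROUGE-1: {rouge_n(pred, ref, 1):.4f}")  # 0.8333
print(f"ROUGE-2: {rouge_n(pred, ref, 2):.4f}")  # 0.6000
print(f"ROUGE-L: {rouge_l(pred, ref):.4f}")     # 0.8333
```

Swapping a single word costs one of six unigrams but two of five bigrams, which is why ROUGE-2 drops much faster than ROUGE-1 when a model paraphrases.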



In [70]:
import evaluate
import numpy as np

# Load metrics
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

# 🔍 ROUGE
rouge_result = rouge.compute(predictions=prophet_preds, references=prophet_refs)
rouge1 = rouge_result["rouge1"]
rouge2 = rouge_result["rouge2"]
rougeL = rouge_result["rougeL"]

# 🔍 BERTScore
bert_score = bertscore.compute(predictions=prophet_preds, references=prophet_refs, lang="en")
bertscore_f1 = np.mean(bert_score["f1"])

# 📊 Final Output
print("📊 ProphetNet on CNN Dataset")
print(f"ROUGE-1 Score: {rouge1:.4f}")
print(f"ROUGE-2 Score: {rouge2:.4f}")
print(f"ROUGE-L Score: {rougeL:.4f}")
print(f"BERTScore F1 : {bertscore_f1:.4f}")



📊 ProphetNet on CNN Dataset
ROUGE-1 Score: 0.2368
ROUGE-2 Score: 0.0647
ROUGE-L Score: 0.1433
BERTScore F1 : 0.8246
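BERTScore works differently from ROUGE: each token is embedded by a contextual encoder (with `lang="en"`, `bert_score` defaults to RoBERTa-large), and tokens are greedily matched by cosine similarity. The sketch below shows only the matching step, with toy 2-D vectors standing in for real embeddings and without IDF weighting or baseline rescaling. `greedy_bertscore` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_bertscore(candidate: List[Vector], reference: List[Vector]) -> Tuple[float, float, float]:
    """Greedy-matching precision, recall and F1 over token embeddings."""
    # sim[j][i] = cosine similarity of candidate token j and reference token i
    sim = [[cosine(c, r) for r in reference] for c in candidate]
    # Precision: every candidate token matched to its most similar reference token
    precision = sum(max(row) for row in sim) / len(candidate)
    # Recall: every reference token matched to its most similar candidate token
    recall = sum(
        max(sim[j][i] for j in range(len(candidate))) for i in range(len(reference))
    ) / len(reference)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy 2-D "embeddings" standing in for RoBERTa token vectors
candidate_tokens = [[1.0, 0.0], [0.0, 1.0]]
reference_tokens = [[1.0, 0.0]]
p, r, f = greedy_bertscore(candidate_tokens, reference_tokens)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # P=0.5000 R=1.0000 F1=0.6667
```

Because contextual embeddings of related English text are rarely orthogonal, raw F1 values from this setup tend to sit in a narrow, high band. The 0.8246 above is therefore most useful relative to the other models; `evaluate`'s `bertscore` also accepts `rescale_with_baseline=True` if a more spread-out scale is preferred.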


# 💾 Save the Scores to CSV Files

**So that the models evaluated in the different notebooks can be compared**

In [71]:
import pandas as pd

# ✅ Create a DataFrame to store model evaluation results
prophetnet_scores = pd.DataFrame({
    "Model": ["ProphetNet"],
    "Dataset": ["CNN/DailyMail"],
    "ROUGE-1": [rouge1],
    "ROUGE-2": [rouge2],
    "ROUGE-L": [rougeL],
    "BERTScore": [bertscore_f1],
    "GPU_Used": [torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU"],
    "Inference_Time": ["TBD"],  # You can later replace "TBD" with actual measured time
    "Comments": ["ProphetNet performs reasonably well on news summarization task."]
})

# ✅ Save to CSV
prophetnet_scores.to_csv("model_scores_prophetnet_cnn.csv", index=False)

# ✅ Preview
prophetnet_scores

| | Model | Dataset | ROUGE-1 | ROUGE-2 | ROUGE-L | BERTScore | GPU_Used | Inference_Time | Comments |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ProphetNet | CNN/DailyMail | 0.23685 | 0.064652 | 0.143279 | 0.82461 | CPU | TBD | ProphetNet performs reasonably well on news su... |
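Since each notebook writes its own scores file, the final comparison needs to gather them. Below is a minimal standard-library sketch. The glob patterns are assumptions based on the two file names this notebook writes (`model_scores_prophetnet_cnn.csv` and `ProphetNet_Newsum_Scores.csv`), and `collect_scores` / `print_comparison` are hypothetical helpers.

```python
import csv
import glob
from typing import Dict, Iterable, List, Sequence

SCORE_COLUMNS = ("Model", "Dataset", "ROUGE-1", "ROUGE-2", "ROUGE-L", "BERTScore")


def collect_scores(patterns: Iterable[str]) -> List[Dict[str, str]]:
    """Read every CSV matching any glob pattern and return its rows as dicts."""
    # A set removes duplicates when a file matches more than one pattern
    paths = sorted({path for pattern in patterns for path in glob.glob(pattern)})
    rows: List[Dict[str, str]] = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["Source_File"] = path  # remember which file produced the row
                rows.append(row)
    return rows


def print_comparison(rows: List[Dict[str, str]], columns: Sequence[str] = SCORE_COLUMNS) -> None:
    print(" | ".join(columns))
    for row in rows:
        # Missing columns (files with a different schema) print as empty cells
        print(" | ".join(row.get(column) or "" for column in columns))


if __name__ == "__main__":
    all_rows = collect_scores(["model_scores_*.csv", "*_Newsum_Scores.csv"])
    print_comparison(all_rows)
```

Keeping the column names identical across notebooks is what makes this kind of merge work without per-file special cases.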


# 2. 🚀 ProphetNet on NewsSum Dataset

**✏️ Step 1: Load NewsSum Dataset**

In [76]:
import pandas as pd

# Load the cleaned NewsSum dataset
df_newsum = pd.read_csv("newsum_cleaned.csv")

# Optional: Limit to top 5 for testing
df_newsum = df_newsum[:5]

# Preview the structure
df_newsum.head()

| | Headline | Article | Category | Summary |
|---|---|---|---|---|
| 0 | Elephant death brings to fore man-animal confl... | The death of a pregnant elephant in the buffer... | Local News | Thousands of farmers in Kerala have either aba... |
| 1 | Cases filed after two â€˜commit suicideâ€™ in ... | Two suicides were reported from Vadodara and D... | Crime and Justice | In the first incident, a 30-year-old woman all... |
| 2 | Woman alleges father tied to MP hospital bed o... | A day after a woman alleged that her father ha... | Health and Wellness | The hospital denied the allegation, saying the... |
| 3 | Sena member, author, app designer â€“ the many... | Assistant police inspector Sachin Vaze, who wa... | Defense | On Saturday, Vaze along with police constables... |
| 4 | Manager, owner of resort where Gujarat Congres... | The manager and owner of a resort in Rajkot, w... | Politics | The resort is reportedly owned by Indranil Raj... |
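Some NewsSum headlines contain sequences such as `â€˜` and `â€™` where curly quotes belong. That pattern usually means UTF-8 text was decoded as Windows-1252 somewhere upstream. Assuming that is the cause here, a best-effort repair is to re-encode as Windows-1252 and decode as UTF-8. `fix_mojibake` is a hypothetical helper; it is a heuristic, so repaired rows are worth spot-checking.

```python
def fix_mojibake(text: str) -> str:
    """Best-effort repair of UTF-8 text that was mis-decoded as Windows-1252.

    Example: 'â€™' -> '’'. Returns the input unchanged when it is not a string,
    when no typical marker is present, or when the round trip fails.
    """
    if not isinstance(text, str):  # e.g. NaN values coming from pandas
        return text
    if "â€" not in text and "Ã" not in text:
        return text
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_mojibake("Cases filed after two â€˜commit suicideâ€™"))
# Cases filed after two ‘commit suicide’
```

Applied to this dataset, it could run over each text column, e.g. `df_newsum[column] = df_newsum[column].map(fix_mojibake)` for `"Headline"`, `"Article"` and `"Summary"`. Cleaning changes both model inputs and references, so scores would need to be recomputed afterwards.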


**✏️ Step 2: Generate Summaries with ProphetNet**

In [77]:
# Generate summaries using ProphetNet for NewsSum
prophet_preds_newsum = [summarize_with_prophetnet(article) for article in df_newsum["Article"]]
prophet_refs_newsum = df_newsum["Summary"].tolist()

**✏️ Step 3: Evaluate with ROUGE and BERTScore**

In [80]:
import evaluate

# Load metrics
rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

# ROUGE evaluation
rouge_result_newsum = rouge.compute(predictions=prophet_preds_newsum, references=prophet_refs_newsum)
print("🔍 ProphetNet - NewsSum ROUGE:\n", rouge_result_newsum)

# BERTScore evaluation (mean F1 over all articles)
bertscore_result_newsum = bertscore.compute(predictions=prophet_preds_newsum, references=prophet_refs_newsum, lang="en")
avg_bertscore_newsum = sum(bertscore_result_newsum["f1"]) / len(bertscore_result_newsum["f1"])
print("🧠 ProphetNet - NewsSum BERTScore:", avg_bertscore_newsum)

# NewsSum scores are written to ProphetNet_Newsum_Scores.csv in Step 4

🔍 ProphetNet - NewsSum ROUGE:
 {'rouge1': 0.21813003663003663, 'rouge2': 0.10525130598522978, 'rougeL': 0.17054029304029303, 'rougeLsum': 0.17054029304029303}


🧠 ProphetNet - NewsSum BERTScore: 0.8251974701881408


# 💾 Step 4: Save Evaluation Scores to CSV

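In [79]:
import csv
import torch

# Save ProphetNet + NewsSum results to CSV, using the same columns as the CNN/DailyMail file
with open("ProphetNet_Newsum_Scores.csv", mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["Model", "Dataset", "ROUGE-1", "ROUGE-2", "ROUGE-L", "BERTScore",
                     "GPU_Used", "Inference_Time", "Comments"])
    writer.writerow([
        "ProphetNet", "NewsSum",
        rouge_result_newsum["rouge1"],
        rouge_result_newsum["rouge2"],
        rouge_result_newsum["rougeL"],
        avg_bertscore_newsum,
        torch.cuda.get_device_name(0) if torch.cuda.is_available() else "CPU",
        "TBD",  # replace with the measured inference time
        "",     # short analysis/observations
    ])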