# 📚 Notebook 3: BART Evaluation

# 🚀 PROJECT PLAN

MKEM Implementation – Transformer-Based Abstractive Text Summarization

# 🎯 Problem Statement Recap

# 🔍 Objective:

To build and compare transformer-based summarization models (T5, BART, Pegasus) and then enhance them using MKEM (Multi-Knowledge-Enhanced Model) on curated English news datasets.

# 📌 Phase-1 Objective

# ✅ Implement the following 3 summarization models:
    
PEGASUS (Google): Notebook 2

BART (Facebook): Notebook 3

T5 (Google): Notebook 1

# ✅ Evaluate on 3 benchmark datasets:
    
CNN/DailyMail

XSum

MultiNews

# ✅ Evaluation Metrics:
    
ROUGE-1

ROUGE-2

ROUGE-L

BERTScore
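To illustrate what the ROUGE numbers measure, the sketch below computes ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence), each reported as an F1 score. The helper names are hypothetical. The sketch assumes lowercase whitespace tokenization without stemming, so its values will not exactly match the `evaluate` "rouge" metric used later. That metric applies its own tokenizer and, with `use_stemmer=True`, Porter stemming.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap (simplified whitespace tokenization)."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # & keeps the minimum count of each n-gram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    # Dynamic-programming table: dp[i][j] = LCS length of c[:i] and r[:j]
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, ct in enumerate(c, 1):
        for j, rt in enumerate(r, 1):
            if ct == rt:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_n_f1("the cat sat on the mat", "the cat is on the mat", 2)` returns 0.6 because 3 of the 5 bigrams on each side are shared.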

# 📊 Final Output (Per Model × Dataset):
    
You must submit structured results:

Dataset name

Model used

ROUGE-1, ROUGE-2, ROUGE-L, BERTScore

Short analysis/observations

# 📌 Phase-2 Objective: Final Comparison + MKEM (Notebook 4)

**🎯 Task: Model Comparison + MKEM Fusion**

# 🚀 BART Implementation Plan

# 1. 🚀 BART on CNN/DailyMail

**🔹 Step 1: Setup & Imports**

In [29]:
from transformers import BartTokenizer, BartForConditionalGeneration
import torch
import pandas as pd

**🔹 Step 2: Load BART model & tokenizer**

In [30]:
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

**🔹 Step 3: Load Dataset (Hugging Face copy for the test-set size, CSV from Notebook 1 for the articles)**

In [31]:
# Load full datasets 
from datasets import load_dataset

cnn_dailymail = load_dataset("cnn_dailymail", "3.0.0")

In [32]:
print("📊 CNN/DailyMail Test Set:", cnn_dailymail['test'].shape)

📊 CNN/DailyMail Test Set: (11490, 3)


In [33]:
# Load CNN dataset saved from Notebook 1
df_cnn = pd.read_csv("cnn_dailymail.csv")

**🔹 Step 4: Inspect the Data**

In [34]:
df_cnn.head()

| Unnamed: 0 | article | highlights | id |
|---|---|---|---|
| 0 | LONDON, England (Reuters) -- Harry Potter star... | Harry Potter star Daniel Radcliffe gets £20M f... | 42c027e4ff9730fbb3de84c1af0d2c506e41c3e4 |
| 1 | Editor's note: In our Behind the Scenes series... | Mentally ill inmates in Miami are housed on th... | ee8871b15c50d0db17b0179a6d2beab35065f1e9 |
| 2 | MINNEAPOLIS, Minnesota (CNN) -- Drivers who we... | NEW: "I thought I was going to die," driver sa... | 06352019a19ae31e527f37f7571c6dd7f0c5da37 |
| 3 | WASHINGTON (CNN) -- Doctors removed five small... | Five small polyps found during procedure; "non... | 24521a2abb2e1f5e34e6824e0f9e56904a2b0e88 |
| 4 | (CNN) -- The National Football League has ind... | NEW: NFL chief, Atlanta Falcons owner critical... | 7fe70cc8b12fab2d0a258fababf7d9c6b5e1262a |


**🔹 Step 5: Clean Data**

In [35]:
# Clean the dataset
df_cnn = df_cnn.dropna(subset=["article", "highlights"])
df_cnn = df_cnn[df_cnn["article"].str.strip().astype(bool)]

**This removes rows with a missing article or highlight, as well as articles that are empty or whitespace-only.**

**🔹 Step 6: Define the Summary Generation Function**

In [36]:
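def generate_summary(text):
    # Tokenize one article; BART's encoder accepts at most 1024 tokens,
    # so longer articles are truncated
    inputs = tokenizer(
        text,
        truncation=True,
        max_length=1024,
        padding="longest",
        return_tensors="pt"
    ).to(device)

    # Beam search; max_length/min_length count tokens, so max_length=60
    # can cut a summary off mid-sentence
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=60,
        min_length=10,
        length_penalty=2.0,  # values > 1.0 favour longer beams
        num_beams=4,
        early_stopping=True
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)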
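`length_penalty=2.0` is an exponent, not a fixed bonus. Assuming the Hugging Face beam-search convention, a finished hypothesis is ranked by its summed log-probability divided by its length raised to `length_penalty`. The exact length counted can differ between `transformers` versions. Because log-probabilities are negative, a larger exponent pulls long hypotheses' scores toward zero and therefore favours them. The beam numbers below are hypothetical and only illustrate the formula.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalised beam score, assuming the
    sum_logprobs / length ** length_penalty ranking of finished beams."""
    return sum_logprobs / (length ** length_penalty)


# Two hypothetical finished beams: (sum of token log-probs, length in tokens)
short_beam = (-3.0, 5)
long_beam = (-5.0, 10)

for lp in (0.0, 1.0, 2.0):
    short_score = beam_score(*short_beam, lp)
    long_score = beam_score(*long_beam, lp)
    winner = "short" if short_score > long_score else "long"
    print(f"length_penalty={lp}: short={short_score:.3f}, long={long_score:.3f} -> {winner} beam wins")
```

With these numbers the short beam wins only at `length_penalty=0.0`. At 1.0 and 2.0 the longer beam scores higher, even though its raw log-probability is lower.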

**🔹 Step 7: Generate Summaries**

In [37]:
sample_articles = df_cnn["article"].iloc[:3]  # First 3 articles only (a quick sample)
generated_summaries = []

for i, article in enumerate(sample_articles):
    print(f"\n📰 Original Article #{i+1}:\n", article[:500], "...\n")
    
    summary = generate_summary(article)
    generated_summaries.append(summary)
    
    print(f"✍️ BART Summary #{i+1}:\n", summary)
    print("-" * 80)



📰 Original Article #1:
 LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as s ...

✍️ BART Summary #1:
 Harry Potter star Daniel Radcliffe turns 18 on Monday. He gains access to a reported £20 million ($41.1 million) fortune. Radcliffe's earnings from the first five Potter films have been held in a trust fund.
--------------------------------------------------------------------------------

📰 Original Article #2:
 Editor's note: In our Behind the Scenes series, CNN correspondents share their experiences in covering news and analyze the stories be
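Step 7 summarizes only the first three articles, one at a time. Scoring a full test split would require processing articles in batches. The sketch below keeps the batching logic independent of the model. `batched` and `summarize_all` are hypothetical helpers. `bart_batch` is an assumed batch version of `generate_summary` that reuses the notebook's `tokenizer`, `model` and `device`. It is left commented out so the block runs without downloading BART.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def summarize_all(texts: Iterable[str],
                  summarize_batch: Callable[[List[str]], List[str]],
                  batch_size: int = 8) -> List[str]:
    """Run a batch summarizer over every text, preserving input order."""
    texts = list(texts)
    outputs: List[str] = []
    for batch in batched(texts, batch_size):
        summaries = summarize_batch(batch)
        if len(summaries) != len(batch):
            raise ValueError("summarize_batch must return one summary per input")
        outputs.extend(summaries)
    return outputs


# Hypothetical batch function for this notebook (not executed here):
# padding="longest" pads each batch to its longest article and the
# attention_mask tells BART to ignore the padding tokens.
# def bart_batch(batch):
#     enc = tokenizer(batch, truncation=True, max_length=1024,
#                     padding="longest", return_tensors="pt").to(device)
#     ids = model.generate(**enc, max_length=60, min_length=10,
#                          length_penalty=2.0, num_beams=4, early_stopping=True)
#     return tokenizer.batch_decode(ids, skip_special_tokens=True)
```

Usage would look like `bart_cnn_preds = summarize_all(df_cnn["article"], bart_batch, batch_size=8)`. A workable batch size depends on GPU memory.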

**🔹 Step 8: Save Predictions & References**

**Pair the generated summaries with their reference highlights for evaluation:**

In [38]:
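bart_cnn_preds = generated_summaries
# Positional slice so the references line up with the summarized articles after dropna
bart_cnn_refs = df_cnn["highlights"].iloc[:len(generated_summaries)].tolist()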

**🔹 Step 9: Evaluate with ROUGE & BERTScore**

In [40]:
from evaluate import load

rouge = load("rouge")
bertscore = load("bertscore")

summary_results = []  # evaluate_metrics appends one row of scores per dataset here

**Evaluation function**

In [41]:
def evaluate_metrics(dataset_name, predictions, references):
    """Score predictions against references and append one row to summary_results."""
    # evaluate's ROUGE returns one aggregated F-measure per ROUGE type
    rouge_scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
    # BERTScore returns per-example precision/recall/F1 lists; average the F1 values
    bert_scores = bertscore.compute(predictions=predictions, references=references, lang="en")
    avg_bertscore = sum(bert_scores["f1"]) / len(bert_scores["f1"])

    summary_results.append({
        "Dataset": dataset_name,
        "ROUGE-1": round(rouge_scores["rouge1"], 4),
        "ROUGE-2": round(rouge_scores["rouge2"], 4),
        "ROUGE-L": round(rouge_scores["rougeL"], 4),
        "BERTScore": round(avg_bertscore, 4)
    })
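BERTScore's F1 comes from greedy matching of token embeddings. Each candidate token is paired with its most cosine-similar reference token to give precision, and each reference token with its most similar candidate token to give recall. The sketch below shows only that matching step on toy vectors. `greedy_bertscore` is a hypothetical helper. The real metric takes contextual embeddings from a transformer (`lang="en"` loads `roberta-large`, the model named in the warning printed when the metric first runs) and can add idf weighting and baseline rescaling, which this sketch omits.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def greedy_bertscore(cand_vecs, ref_vecs):
    """Return (precision, recall, F1) from greedy max-cosine token matching.

    cand_vecs / ref_vecs are lists of token embedding vectors (assumed given)."""
    sims = [[cosine(c, r) for r in ref_vecs] for c in cand_vecs]
    # Precision: best match in the reference for every candidate token
    precision = sum(max(row) for row in sims) / len(cand_vecs)
    # Recall: best match in the candidate for every reference token (columns of sims)
    recall = sum(max(col) for col in zip(*sims)) / len(ref_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With one candidate vector `[1.0, 0.0]` and reference vectors `[1.0, 0.0]` and `[0.0, 1.0]`, precision is 1.0 and recall is 0.5, because the second reference token has no good match. F1 is therefore about 0.667.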

In [42]:
evaluate_metrics("CNN DailyMail", bart_cnn_preds, bart_cnn_refs)

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
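(The RoBERTa pooler warning above is expected: BERTScore scores with the encoder's hidden states rather than the pooler layer, so the newly initialized pooler weights do not affect the results.)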


**🔹 Step 10: Display the Result**

In [43]:
import pandas as pd

df_summary = pd.DataFrame(summary_results)
print("✅ BART Model ROUGE + BERTScore Summary")
print(df_summary.to_string(index=False))

✅ BART Model ROUGE + BERTScore Summary
      Dataset  ROUGE-1  ROUGE-2  ROUGE-L  BERTScore
CNN DailyMail   0.5277   0.2867   0.3625     0.8904
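These scores come from only three articles, so a different sample could move them substantially. A percentile bootstrap over per-example scores gives a rough sense of that spread. The sketch assumes you have per-example values, such as the `bert_scores["f1"]` list computed inside `evaluate_metrics` or ROUGE computed with `use_aggregator=False`. `bootstrap_ci` is a hypothetical helper, and with n = 3 the interval itself is very coarse.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Return (mean, lower, upper): a percentile bootstrap CI for the mean score."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Resample the scores with replacement and record each resample's mean
    means = sorted(mean([rng.choice(scores) for _ in range(n)]) for _ in range(n_resamples))
    k = round(alpha / 2 * n_resamples)  # number of resamples trimmed from each tail
    return mean(scores), means[k], means[n_resamples - 1 - k]
```

A wide `(lower, upper)` range is a sign that the table row would change noticeably on a different three-article sample.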


# 2. 🚀 BART on XSum

**🔹 Step 1: Load XSum Dataset (from Notebook 1 CSV)**

In [44]:
# ✅ Load datasets properly
from datasets import load_dataset

# XSum (extreme summarization)
xsum = load_dataset("xsum")

Using the latest cached version of the dataset since xsum couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at C:\Users\SAMIM IMTIAZ\.cache\huggingface\datasets\xsum\default\1.2.0\082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71 (last modified on Sun Aug  3 14:51:50 2025).


In [45]:
# ✅ Show original dataset sizes (test set only or all splits as needed)
print("📊 XSum Test Set:", xsum['test'].shape)

📊 XSum Test Set: (11334, 3)


In [46]:
import pandas as pd

# Load XSum dataset saved from Notebook 1
df_xsum = pd.read_csv("xsum.csv")

**🔹 Step 2: Clean Up the Data**

In [47]:
df_xsum = df_xsum.dropna(subset=["document", "summary"])
df_xsum = df_xsum[df_xsum["document"].str.strip().astype(bool)]

**🔹 Step 3: Inspect the Data**

In [48]:
df_xsum.head()

| Unnamed: 0 | document | summary | id |
|---|---|---|---|
| 0 | The full cost of damage in Newton Stewart, one... | Clean-up operations are continuing across the ... | 35232142 |
| 1 | A fire alarm went off at the Holiday Inn in Ho... | Two tourist buses have been destroyed by fire ... | 40143035 |
| 2 | Ferrari appeared in a position to challenge un... | Lewis Hamilton stormed to pole position at the... | 35951548 |
| 3 | John Edward Bates, formerly of Spalding, Linco... | A former Lincolnshire Police officer carried o... | 36266422 |
| 4 | Patients and staff were evacuated from Cerahpa... | An armed man who locked himself into a room at... | 38826984 |


**🔹 Step 4: Generate Summaries with BART on XSum**

In [49]:
sample_articles = df_xsum["document"].iloc[:3]  # First 3 articles only (a quick sample)
generated_summaries = []

for i, article in enumerate(sample_articles):
    print(f"\n📰 Original XSum Article #{i+1}:\n", article[:500], "...\n")

    summary = generate_summary(article)
    generated_summaries.append(summary)

    print(f"✍️ BART Summary #{i+1}:\n", summary)
    print("-" * 80)


📰 Original XSum Article #1:
 The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed.
Repair work is ongoing in Hawick and many roads in Peeblesshire remain badly affected by standing water.
Trains on the west coast mainline face disruption due to damage at the Lamington Viaduct.
Many businesses and householders were affected by flooding in Newton Stewart after the River Cree overflowed into the town.
First Minister Nicola Sturgeon visited the area to inspect the damage.
The water ...

✍️ BART Summary #1:
 The full cost of damage in Newton Stewart, one of the areas worst affected, is still being assessed. First Minister Nicola Sturgeon visited the area to inspect the damage. A flood alert remains in place across the Borders.
--------------------------------------------------------------------------------

📰 Original XSum Article #2:
 A fire alarm went off at the Holiday Inn in Hope Street at about 04:20 BST on Saturday and guests were asked 

**🔹 Step 5: Evaluate Summaries for XSum**

In [50]:
# Save XSum predictions and references
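bart_xsum_preds = generated_summaries
# Positional slice so the references line up with the summarized articles after dropna
bart_xsum_refs = df_xsum["summary"].iloc[:len(generated_summaries)].tolist()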

In [51]:
# Evaluate
evaluate_metrics("XSum", bart_xsum_preds, bart_xsum_refs)



**🔹 Step 6: Display the Results**

In [27]:
#summary_results = []  # Uncomment to start a fresh results table; left commented so the CNN/DailyMail row is kept

In [52]:
import pandas as pd

df_summary = pd.DataFrame(summary_results)
print("✅ BART Model ROUGE + BERTScore Summary")
print(df_summary.to_string(index=False))

✅ BART Model ROUGE + BERTScore Summary
      Dataset  ROUGE-1  ROUGE-2  ROUGE-L  BERTScore
CNN DailyMail   0.5277   0.2867   0.3625     0.8904
         XSum   0.2018   0.0347   0.1296     0.8680
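XSum references are single-sentence summaries, while `facebook/bart-large-cnn` was fine-tuned on multi-sentence CNN/DailyMail highlights. The XSum sample above produced a three-sentence output. A quick way to check whether length mismatch contributes to the lower XSum ROUGE is to compare average word counts. `length_report` is a hypothetical helper, and whitespace splitting is only a rough stand-in for real tokenization.

```python
from statistics import mean


def length_report(predictions, references):
    """Average word counts of predictions and references, plus their ratio."""
    pred_words = mean([len(p.split()) for p in predictions])
    ref_words = mean([len(r.split()) for r in references])
    return {"pred_words": pred_words, "ref_words": ref_words, "ratio": pred_words / ref_words}
```

Calling `length_report(bart_xsum_preds, bart_xsum_refs)` and seeing a ratio well above 1 would suggest that the summaries are much longer than the references. That mismatch lowers ROUGE precision.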


# 3. 🚀 BART on MultiNews

**🔹 Step 1: Load MultiNews Dataset (from Notebook 1 CSV)**

In [53]:
# ✅ Load datasets properly
from datasets import load_dataset

# MultiNews (multi-document summarization)
multi_news = load_dataset("multi_news")

Using the latest cached version of the dataset since multi_news couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at C:\Users\SAMIM IMTIAZ\.cache\huggingface\datasets\multi_news\default\1.0.0\2f1f69a2bedc8ad1c5d8ae5148e4755ee7095f465c1c01ae8f85454342065a72 (last modified on Sun Aug  3 14:55:04 2025).


In [54]:
# ✅ Show original dataset sizes (test set only or all splits as needed)
print("📊 MultiNews Test Set:", multi_news['test'].shape)

📊 MultiNews Test Set: (5622, 2)


In [55]:
import pandas as pd

# Load dataset saved in Notebook 1
df_multi = pd.read_csv("multi_news.csv")

**🔹 Step 2: Inspect the Data**

In [56]:
df_multi.head()

| Unnamed: 0 | document | summary |
|---|---|---|
| 0 | National Archives \n \n Yes, it’s that time ag... | – The unemployment rate dropped to 8.2% last m... |
| 1 | LOS ANGELES (AP) — In her first interview sinc... | – Shelly Sterling plans "eventually" to divorc... |
| 2 | GAITHERSBURG, Md. (AP) — A small, private jet ... | – A twin-engine Embraer jet that the FAA descr... |
| 3 | Tucker Carlson Exposes His Own Sexism on Twitt... | – Tucker Carlson is in deep doodoo with conser... |
| 4 | A man accused of removing another man's testic... | – What are the three most horrifying words in ... |


**🔹 Step 3: Clean the data**

In [57]:
df_multi = df_multi.dropna(subset=["document", "summary"])
df_multi = df_multi[df_multi["document"].str.strip().astype(bool)]
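Each MultiNews `document` bundles several source articles. `generate_summary` truncates input at 1024 tokens, so later sources may never reach the model. One option is to give each source an equal share of the input budget before summarizing. This sketch assumes that sources are joined with `|||||`, as in the Hugging Face `multi_news` release; check a raw row before relying on it. It also uses a word budget as a rough proxy for BART's token limit. `split_sources` and `balanced_input` are hypothetical helpers.

```python
SEP = "|||||"  # assumed separator between source articles in a multi_news document


def split_sources(document, sep=SEP):
    """Split a MultiNews document into non-empty, stripped source articles."""
    return [part.strip() for part in document.split(sep) if part.strip()]


def balanced_input(document, word_budget=700, sep=SEP):
    """Keep the first word_budget // n_sources words of every source and re-join them.

    The word budget is a rough proxy for the 1024-token limit (assumption)."""
    sources = split_sources(document, sep)
    if not sources:
        return ""
    per_source = max(1, word_budget // len(sources))
    return "\n\n".join(" ".join(s.split()[:per_source]) for s in sources)
```

The result could be passed to `generate_summary` instead of the raw document. Whether this improves ROUGE on MultiNews would need to be measured.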

**🔹 Step 4: Generate Summaries**

In [58]:
sample_articles = df_multi["document"].iloc[:3]  # First 3 articles only (a quick sample)
generated_summaries = []

for i, article in enumerate(sample_articles):
    print(f"\n📰 Original MultiNews Article #{i+1}:\n", article[:500], "...\n")

    summary = generate_summary(article)
    generated_summaries.append(summary)

    print(f"✍️ BART Summary #{i+1}:\n", summary)
    print("-" * 80)


📰 Original MultiNews Article #1:
 National Archives 
 
 Yes, it’s that time again, folks. It’s the first Friday of the month, when for one ever-so-brief moment the interests of Wall Street, Washington and Main Street are all aligned on one thing: Jobs. 
 
 A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m. New York time offering one of the most important snapshots on how the economy fared during the previous month. Expectations are for 203,000 new jobs to be created, according to economists p ...

✍️ BART Summary #1:
 A fresh update on the U.S. employment situation for January hits the wires at 8:30 a.m. New York time. Expectations are for 203,000 new jobs to be created, according to economists polled by Dow Jones Newswires. The unemployment rate is expected
--------------------------------------------------------------------------------

📰 Original MultiNews Article #2:
 LOS ANGELES (AP) — In her first interview since the NBA banned her estranged 

**🔹 Step 5: Save Predictions and References**

In [59]:
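bart_multi_preds = generated_summaries
# Positional slice so the references line up with the summarized articles after dropna
bart_multi_refs = df_multi["summary"].iloc[:len(generated_summaries)].tolist()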

**🔹 Step 6: Evaluate**

In [60]:
evaluate_metrics("MultiNews", bart_multi_preds, bart_multi_refs)



**🔹 Step 7: BART ROUGE & BERTScore on CNN/DailyMail, XSum & MultiNews (first 3 articles each)**

In [61]:
# Final BART results summary
df_summary = pd.DataFrame(summary_results)
print("✅ BART Model ROUGE + BERTScore Summary")
print(df_summary.to_string(index=False))

✅ BART Model ROUGE + BERTScore Summary
      Dataset  ROUGE-1  ROUGE-2  ROUGE-L  BERTScore
CNN DailyMail   0.5277   0.2867   0.3625     0.8904
         XSum   0.2018   0.0347   0.1296     0.8680
    MultiNews   0.2866   0.1078   0.1727     0.8510


# 💾 Save the Scores to a .CSV File

**These scores are saved so the models can be compared across notebooks (Phase 2, Notebook 4).**

In [62]:
# Save to CSV
df_summary["Model"] = "BART"  
df_summary.to_csv("bart_scores.csv", index=False)
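For the Phase-2 comparison, each notebook's score file can be read back and merged. The sketch below uses only the standard `csv` module and assumes every file has the columns written here (Dataset, ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, Model). `combine_score_files` and `best_model_per_dataset` are hypothetical helpers. File names other than `bart_scores.csv` are assumptions about what the other notebooks write.

```python
import csv
from typing import Dict, List, Tuple


def combine_score_files(paths: List[str]) -> List[Dict[str, str]]:
    """Read score CSVs that share one header and return all rows as dicts."""
    rows: List[Dict[str, str]] = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows


def best_model_per_dataset(rows: List[Dict[str, str]], metric: str = "ROUGE-L") -> Dict[str, str]:
    """Return {dataset: model} for the highest value of `metric` in each dataset."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        score = float(row[metric])  # csv values are read as strings
        dataset = row["Dataset"]
        if dataset not in best or score > best[dataset][1]:
            best[dataset] = (row["Model"], score)
    return {dataset: model for dataset, (model, _) in best.items()}


# Hypothetical usage once every notebook has saved its scores:
# rows = combine_score_files(["t5_scores.csv", "pegasus_scores.csv", "bart_scores.csv"])
# print(best_model_per_dataset(rows, "ROUGE-L"))
```

In pandas, `pd.concat([pd.read_csv(p) for p in paths])` gives the same merged table as a DataFrame. Because every row here is based on only three articles per dataset, any "best model" conclusion should be treated as provisional.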