# Set Up Google Colab Environment
### Enable the GPU: Runtime > Change runtime type > Hardware accelerator > GPU
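
After switching the runtime, a quick check can confirm that the GPU is actually visible. The sketch below assumes PyTorch is available in the runtime (an assumption, although Colab normally ships with it). `describe_device` is a hypothetical helper written for this illustration, not a library function.

```python
def describe_device():
    """Return a short description of the accelerator PyTorch can see."""
    try:
        import torch  # assumption: PyTorch is preinstalled in the runtime
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Name of the first CUDA device
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; running on CPU"

print(describe_device())
```

If the message reports CPU only, the runtime type was probably not changed or the session needs to be restarted.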


The remaining steps of the notebook:

2. Install Required Libraries
3. Load the Dataset
4. Preprocess the Text
5. Apply the Transformer Model
6. Evaluate Summary Quality

In [6]:
# Step 2: Install Required Libraries
!pip install transformers datasets



In [7]:
from datasets import load_dataset


In [8]:
# Step 3: Load the Dataset
# Use a small sample (the first 1% of the test split) of the CNN/DailyMail dataset for demonstration
dataset = load_dataset("cnn_dailymail", "3.0.0", split="test[:1%]")


In [9]:
# Display some samples from the dataset
for i in range(2):
    print(f"Article {i+1}: {dataset[i]['article']}\n")
    print(f"Summary {i+1}: {dataset[i]['highlights']}\n")


Article 1: (CNN)The Palestinian Authority officially became the 123rd member of the International Criminal Court on Wednesday, a step that gives the court jurisdiction over alleged crimes in Palestinian territories. The formal accession was marked with a ceremony at The Hague, in the Netherlands, where the court is based. The Palestinians signed the ICC's founding Rome Statute in January, when they also accepted its jurisdiction over alleged crimes committed "in the occupied Palestinian territory, including East Jerusalem, since June 13, 2014." Later that month, the ICC opened a preliminary examination into the situation in Palestinian territories, paving the way for possible war crimes investigations against Israelis. As members of the court, Palestinians may be subject to counter-charges as well. Israel and the United States, neither of which is an ICC member, opposed the Palestinians' efforts to join the body. But Palestinian Foreign Minister Riad al-Malki, speaking at Wednesday's c
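
Full articles are long, so printing them whole floods the cell output. A minimal sketch of a preview helper using the standard library's `textwrap.shorten` follows. `preview` is a hypothetical name, and the sample string is the opening sentence of the article shown above.

```python
import textwrap

def preview(text, width=300):
    """Collapse whitespace and cut the text at a word boundary so it fits in `width` characters."""
    return textwrap.shorten(text, width=width, placeholder=" ...")

sample = (
    "(CNN)The Palestinian Authority officially became the 123rd member of the "
    "International Criminal Court on Wednesday, a step that gives the court "
    "jurisdiction over alleged crimes in Palestinian territories."
)
print(preview(sample, width=80))
```

In the display loop, `preview(dataset[i]['article'])` could replace the raw article text.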

In [10]:
# Step 4: Preprocess the Text
from transformers import BartTokenizer

# Load the BART tokenizer
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")



In [11]:
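# Tokenize the input texts, truncating or padding every article to 1024 tokens
# (the maximum input length of facebook/bart-large-cnn)
def tokenize_function(examples):
    return tokenizer(examples["article"], padding="max_length", truncation=True, max_length=1024)

# Tokenize the whole dataset in batches
tokenized_dataset = dataset.map(tokenize_function, batched=True)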
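
To illustrate what `padding="max_length"` and `truncation=True` do to a sequence of token ids, here is a toy sketch in plain Python. It is not the tokenizer's implementation. `pad_or_truncate` and the pad id `0` are made up for the example, and a real tokenizer also adds special tokens and keeps the end-of-sequence token when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Return (input_ids, attention_mask), both exactly `max_length` long."""
    kept = token_ids[:max_length]                    # truncation: drop ids past the limit
    n_pad = max_length - len(kept)                   # number of padding slots needed
    input_ids = kept + [pad_id] * n_pad              # padding: fill up to max_length
    attention_mask = [1] * len(kept) + [0] * n_pad   # 1 = real token, 0 = padding
    return input_ids, attention_mask

# A short sequence is padded (pad_id=0 is an illustrative value)
ids, mask = pad_or_truncate([10, 11, 12], max_length=5, pad_id=0)
print(ids, mask)

# A long sequence is truncated
long_ids, long_mask = pad_or_truncate(list(range(8)), max_length=5, pad_id=0)
print(long_ids, long_mask)
```

Padding every article to the full 1024 positions keeps batch shapes uniform, at the cost of extra memory for short articles.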

In [12]:
# Step 5: Apply the Transformer Model
from transformers import BartForConditionalGeneration
import torch


In [13]:
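# Load the BART model and move it to the GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn").to(device)

# Generate a summary for a single article using beam search
def generate_summary(article):
    # Truncate to the model's 1024-token input limit and keep the tensors on the model's device
    inputs = tokenizer(article, return_tensors="pt", max_length=1024, truncation=True).to(device)
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)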


In [14]:
# Generate summaries for the first 5 articles in the dataset
summaries = []
for i in range(5):
    summary = generate_summary(dataset[i]["article"])
    summaries.append(summary)
    print(f"Article {i+1} Summary: {summary}\n")

Article 1 Summary: The Palestinian Authority becomes the 123rd member of the International Criminal Court. The move gives the court jurisdiction over alleged crimes in Palestinian territories. Israel and the United States opposed the Palestinians' efforts to join the body.

Article 2 Summary: Theia, a one-year-old bully breed mix, was hit by a car and buried in a field. Four days after her apparent death, the dog managed to stagger to a nearby farm. She suffered a dislocated jaw, leg injuries and a caved-in sinus cavity. She still requires surgery to help her breathe.

Article 3 Summary: Mohammad Javad Zarif is the Iranian foreign minister. He has been John Kerry's opposite number in securing a breakthrough in nuclear discussions. But there are some facts about Zarif that are less well-known.

Article 4 Summary: The five were exposed to Ebola in Sierra Leone in March, but none developed the deadly virus. They are clinicians for Partners in Health, a Boston-based aid group. One of the f

# Evaluate Summary Quality
## ROUGE Score

To evaluate the summaries quantitatively, install the rouge-score library and compute ROUGE scores against the reference summaries (the dataset's `highlights` field).

ROUGE measures the overlap of unigrams (ROUGE-1), bigrams (ROUGE-2), and the longest common subsequence (ROUGE-L), so it shows how closely each generated summary matches its reference. You can adjust the number of articles or the metrics as needed.
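
To make the precision, recall, and F-measure values that ROUGE reports concrete, here is a minimal hand-rolled ROUGE-1 sketch in plain Python. It is a simplification: it splits on whitespace and does no stemming, so its numbers will not exactly match the rouge-score library. `rouge1` and the two example sentences are made up for illustration.

```python
from collections import Counter

def rouge1(reference, candidate):
    """Unigram-overlap precision, recall and F-measure (whitespace tokens, no stemming)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped matches: each word counts at most as often as it appears in both texts
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)  # share of candidate words that match
    recall = overlap / max(sum(ref_counts.values()), 1)      # share of reference words recovered
    f = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f

p, r, f = rouge1("the cat sat on the mat", "the cat lay on a mat")
print(f"precision={p:.3f} recall={r:.3f} fmeasure={f:.3f}")
```

Four of the six candidate words match the reference, and four of the six reference words are recovered, so all three values come out to 4/6.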

In [15]:
!pip install rouge-score


Collecting rouge-score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge-score
  Building wheel for rouge-score (setup.py) ... done
  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24935 sha256=7f1d084a01c9abc7a628937bd5af5772e65f799a3e84c8eadab076671e5c10aa
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge-score
Installing collected packages: rouge-score
Successfully installed rouge-score-0.1.2


In [16]:
from rouge_score import rouge_scorer

# Initialize the ROUGE scorer
scorer = rouge_scorer.RougeScorer(
    ['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

# Create lists to hold the original and generated summaries
original_summaries = [dataset[i]['highlights'] for i in range(5)]
generated_summaries = summaries  # Using the summaries generated earlier

# Calculate ROUGE scores
for i in range(5):
    scores = scorer.score(original_summaries[i], generated_summaries[i])
    print(f"Article {i+1} ROUGE Scores: {scores}\n")


Article 1 ROUGE Scores: {'rouge1': Score(precision=0.5135135135135135, recall=0.5588235294117647, fmeasure=0.5352112676056339), 'rouge2': Score(precision=0.3611111111111111, recall=0.3939393939393939, fmeasure=0.37681159420289856), 'rougeL': Score(precision=0.4594594594594595, recall=0.5, fmeasure=0.47887323943661975)}

Article 2 ROUGE Scores: {'rouge1': Score(precision=0.42592592592592593, recall=0.5348837209302325, fmeasure=0.4742268041237113), 'rouge2': Score(precision=0.20754716981132076, recall=0.2619047619047619, fmeasure=0.23157894736842105), 'rougeL': Score(precision=0.35185185185185186, recall=0.4418604651162791, fmeasure=0.3917525773195876)}

Article 3 ROUGE Scores: {'rouge1': Score(precision=0.4, recall=0.4, fmeasure=0.4000000000000001), 'rouge2': Score(precision=0.20588235294117646, recall=0.20588235294117646, fmeasure=0.20588235294117646), 'rougeL': Score(precision=0.2571428571428571, recall=0.2571428571428571, fmeasure=0.2571428571428571)}

Article 4 ROUGE Scores: {'rouge