In [1]:
!pip install torch transformers datasets rouge_score

Collecting datasets
  Downloading datasets-2.21.0-py3-none-any.whl.metadata (21 kB)
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-2.21.0-py3-none-any.whl (527 kB)
   527.3/527.3 kB 5.6 MB/s eta 0:00:00
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
   116.3/116.3 kB

In [21]:
import torch
from transformers import pipeline
from datasets import load_dataset
from rouge_score import rouge_scorer

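# BART-large fine-tuned on CNN/DailyMail for abstractive summarization
model_name = "facebook/bart-large-cnn"
summarizer = pipeline("summarization", model=model_name)

# "test[8:9]" slices a single example (index 8) from the CNN/DailyMail 3.0.0 test split
dataset = load_dataset("cnn_dailymail", "3.0.0", split="test[8:9]")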


In [22]:
dataset['article']

['(CNN)Filipinos are being warned to be on guard for flash floods and landslides as tropical storm Maysak approached the Asian island nation Saturday. Just a few days ago, Maysak gained super typhoon status thanks to its sustained 150 mph winds. It has since lost a lot of steam as it has spun west in the Pacific Ocean. It\'s now classified as a tropical storm, according to the Philippine national weather service, which calls it a different name, Chedeng. It boasts steady winds of more than 70 mph (115 kph) and gusts up to 90 mph as of 5 p.m. (5 a.m. ET) Saturday. Still, that doesn\'t mean Maysak won\'t pack a wallop. Authorities took preemptive steps to keep people safe such as barring outdoor activities like swimming, surfing, diving and boating in some locales, as well as a number of precautionary evacuations. Gabriel Llave, a disaster official, told PNA that tourists who arrive Saturday in and around the coastal town of Aurora "will not be accepted by the owners of hotels, resorts, 

In [30]:

### Defining different prompts
## Zero-shot prompt, one-shot prompt, negative prompt

prompts = [
    "Summarize the key points of the given news article in 3-4 sentences",  ## zero-shot prompt
    "The storm was located 350 miles northeast of london and it is moving east at 6 mph. Summarize me like this about the storm direction ", ## one-shot prompt
    " summarize about what might be impact of storm and where the storm is heading without mentioning about activities that peaple are barred from", ## negative prompt
]

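def generate_summaries(prompts, articles):
    """Return {prompt: [summary for each article]} using the BART summarizer."""
    summaries = {}
    for prompt in prompts:
        summaries[prompt] = []
        for article in articles:
            # Concatenate prompt and article: the prompts contain no "{}" field,
            # so prompt.format(article['article']) would drop the article entirely.
            input_text = f"{prompt}\n\n{article['article']}"
            # truncation=True keeps the input within BART's 1024-token limit
            summary = summarizer(input_text, max_length=40, min_length=12,
                                 do_sample=False, truncation=True)[0]['summary_text']
            summaries[prompt].append(summary)
    return summaries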

summaries = generate_summaries(prompts, dataset)

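# Evaluate summaries using ROUGE (F-measure, averaged over all articles for each prompt)
def evaluate_summaries(summaries, references):
    # use_stemmer=True applies Porter stemming, so e.g. "moves" and "move" count as a match
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)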
    results = {}

    for prompt, generated_summaries in summaries.items():
        total_scores = {'rouge1': 0, 'rouge2': 0, 'rougeL': 0}
        for generated, reference in zip(generated_summaries, references):
            scores = scorer.score(reference, generated)
            total_scores['rouge1'] += scores['rouge1'].fmeasure
            total_scores['rouge2'] += scores['rouge2'].fmeasure
            total_scores['rougeL'] += scores['rougeL'].fmeasure

        # Average the scores
        num_samples = len(generated_summaries)
        results[prompt] = {k: v / num_samples for k, v in total_scores.items()}

    return results

# Prepare reference summaries from the dataset
references = [article['highlights'] for article in dataset]

# Evaluate the generated summaries
evaluation_results = evaluate_summaries(summaries, references)

# Print evaluation results
for prompt, scores in evaluation_results.items():
    print(f"Prompt: {prompt}")
    print(f"ROUGE-1: {scores['rouge1']:.4f}, ROUGE-2: {scores['rouge2']:.4f}, ROUGE-L: {scores['rougeL']:.4f}\n")

Your max_length is set to 40, but your input_length is only 18. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)
Your max_length is set to 40, but your input_length is only 32. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=16)
Your max_length is set to 40, but your input_length is only 27. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=13)


Prompt: Summarize the key points of the given news article in 3-4 sentences
ROUGE-1: 0.1026, ROUGE-2: 0.0000, ROUGE-L: 0.0513

Prompt: The storm was located 350 miles northeast of london and it is moving east at 6 mph. Summarize me like this about the storm direction 
ROUGE-1: 0.2791, ROUGE-2: 0.0000, ROUGE-L: 0.0930

Prompt:  summarize about what might be impact of storm and where the storm is heading without mentioning about activities that peaple are barred from
ROUGE-1: 0.2041, ROUGE-2: 0.0000, ROUGE-L: 0.1224
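In the run recorded above, the warnings report input lengths of only 18, 32 and 27 tokens, which is consistent with the model receiving the prompt alone rather than the article: `str.format` only fills `{}` fields, and when a string has none it returns the string unchanged and silently ignores the extra argument. The sketch below first demonstrates that behavior in plain Python, then shows one possible way to write the zero-shot, one-shot and negative prompts as templates with an explicit `{article}` placeholder. The template wording and the `PROMPT_TEMPLATES` / `build_input` names are hypothetical. Also note that `facebook/bart-large-cnn` is a fine-tuned summarizer rather than an instruction-following model, so instructions like these may have little effect on its output.

```python
# 1) Why the article can go missing: format() ignores arguments it has no field for.
demo_prompt = "Summarize the key points of the given news article in 3-4 sentences"
demo_article = "Tropical storm Maysak approached the Philippines on Saturday."
print(demo_prompt.format(demo_article) == demo_prompt)  # True: the article is dropped

# 2) Hypothetical templates with an explicit {article} placeholder.
PROMPT_TEMPLATES = {
    "zero_shot": (
        "Summarize the key points of the following news article in 3-4 sentences.\n\n"
        "Article: {article}\nSummary:"
    ),
    "one_shot": (
        # One worked input -> output pair, then the real input in the same format.
        "Article: A storm is 350 miles northeast of London and moving east at 6 mph.\n"
        "Summary: The storm is northeast of London, heading east at 6 mph.\n\n"
        "Article: {article}\nSummary:"
    ),
    "negative": (
        "Summarize the likely impact of the storm and where it is heading. "
        "Do not mention activities that people are barred from.\n\n"
        "Article: {article}\nSummary:"
    ),
}


def build_input(template, article):
    """Fill the {article} placeholder, refusing templates that lack one."""
    if "{article}" not in template:
        raise ValueError("template has no {article} placeholder")
    return template.format(article=article)


model_input = build_input(PROMPT_TEMPLATES["one_shot"], demo_article)
print(demo_article in model_input)  # True
```

Raising an error for a template without a placeholder turns the silent failure into a visible one; each rendered string could then be passed to `summarizer(...)` in place of `input_text`.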



In [None]:
### Explaining ROUGE scores

# ROUGE-N: measures the overlap of n-grams (contiguous sequences of n words) between the generated summary and the reference summary.
# ROUGE-1: overlap of unigrams (single words).
# ROUGE-2: overlap of bigrams (pairs of adjacent words).
# ROUGE-L: based on the longest common subsequence (LCS) of the generated and reference summaries, i.e. the longest sequence of words that appears in both in the same order, though not necessarily adjacent. It therefore rewards sentence-level structural similarity that plain unigram overlap ignores.
# The values printed above are F-measures (harmonic mean of precision and recall), ranging from 0 to 1.
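The definitions above can be made concrete with a small pure-Python sketch of ROUGE-N and ROUGE-L F-measures. It assumes simple lower-case whitespace tokenization and omits stemming, so it is an illustration rather than a drop-in replacement for `rouge_score`, whose own tokenizer and `use_stemmer=True` option can give different numbers on real text. The helper names (`rouge_n`, `rouge_l`, `lcs_length`) are hypothetical.

```python
from collections import Counter


def _tokens(text):
    # Simplified tokenization (assumption): lower-case and split on whitespace.
    return text.lower().split()


def _f1(overlap, n_candidate, n_reference):
    # Harmonic mean of precision (overlap / candidate size) and recall (overlap / reference size).
    if overlap == 0:
        return 0.0
    precision = overlap / n_candidate
    recall = overlap / n_reference
    return 2 * precision * recall / (precision + recall)


def _ngrams(tokens, n):
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n):
    ref = _ngrams(_tokens(reference), n)
    cand = _ngrams(_tokens(candidate), n)
    # Clipped overlap: an n-gram counts at most min(count in ref, count in cand) times.
    overlap = sum((ref & cand).values())
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # dp[i][j] = length of the LCS of a[:i] and b[:j]; matched words need not be adjacent.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(reference, candidate):
    ref, cand = _tokens(reference), _tokens(candidate)
    return _f1(lcs_length(ref, cand), len(cand), len(ref))


# Same words, different order: ROUGE-1 is perfect, ROUGE-2 and ROUGE-L are not.
reference = "the storm moves west"
candidate = "west the storm moves"
print(round(rouge_n(reference, candidate, 1), 4))  # 1.0
print(round(rouge_n(reference, candidate, 2), 4))  # 0.6667
print(round(rouge_l(reference, candidate), 4))     # 0.75
```

The example shows why ROUGE-L is described as order-sensitive: ROUGE-1 treats both summaries as the same bag of words, while the LCS ("the storm moves") covers only 3 of the 4 words.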