# Task 2: Text Summarization

## Objective
Create a system that summarizes lengthy articles, blogs, or news into concise summaries.  
I will implement both extractive and abstractive summarization methods.

## Dataset
I use the CNN/DailyMail dataset (version 3.0.0), which contains news articles paired with human-written summaries (highlights).  
For speed, I load only the first 100 articles of the training split.

---

## Setup & Installation

```python
# Install required libraries
!pip install datasets transformers[sentencepiece] spacy --quiet
!python -m spacy download en_core_web_sm --quiet
```

## Load Dataset

```python
from datasets import load_dataset

# Load first 100 samples from train split of CNN/DailyMail dataset
dataset = load_dataset("cnn_dailymail", "3.0.0", split="train[:100]")

# Print sample article and summary
print("Sample article:\n", dataset[0]['article'][:500], "...\n")
print("Reference summary:\n", dataset[0]['highlights'])
```

```
Sample article:
 LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as s ...

Reference summary:
 Harry Potter star Daniel Radcliffe gets £20M fortune as he turns 18 Monday .
Young actor says he has no plans to fritter his cash away .
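Radcliffe's earnings from first five Potter films have been held in trust fund .
```

## Extractive Summarization
Extractive summarization selects existing sentences from the article rather than writing new ones. Here, each sentence is scored by the sum of the normalized frequencies of its non-stopword words, and the three highest-scoring sentences are kept.

```python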
import spacy
from heapq import nlargest

nlp = spacy.load("en_core_web_sm")

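def extractive_summary(text, max_sentences=3):
    """Return the max_sentences highest-scoring sentences of text, joined by spaces."""
    doc = nlp(text)
    # Take the stopword set from the loaded pipeline rather than relying on
    # spacy.lang.en having been imported implicitly
    stopwords = nlp.Defaults.stop_words

    # Count word frequencies, ignoring stopwords and non-alphabetic tokens
    word_freq = {}
    for word in doc:
        key = word.text.lower()
        if word.is_alpha and key not in stopwords:
            word_freq[key] = word_freq.get(key, 0) + 1

    # Normalize so that the most frequent word has weight 1.0
    max_freq = max(word_freq.values()) if word_freq else 1
    for word in word_freq:
        word_freq[word] /= max_freq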

    # Score sentences based on word frequencies
    sentence_scores = {}
    for sent in doc.sents:
        for word in sent:
            if word.text.lower() in word_freq:
                sentence_scores[sent] = sentence_scores.get(sent, 0) + word_freq[word.text.lower()]

    # Pick the top-scoring sentences (nlargest returns them in score order, not document order)
    summary_sentences = nlargest(max_sentences, sentence_scores, key=sentence_scores.get)
    return " ".join([sent.text for sent in summary_sentences])
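
# --- Illustrative sketch, not part of the original notebook ---
# The scoring idea used by extractive_summary, restated without spaCy so that
# each step can be checked by hand on a toy text. Assumptions: sentences are
# split naively on ., ! and ?, words are lowercase a-z runs, and TOY_STOPWORDS
# is a tiny hypothetical stand-in for spaCy's stopword list.
import re
from collections import Counter
from heapq import nlargest

TOY_STOPWORDS = {"the", "a", "an", "is", "are", "on", "in", "of", "and", "to"}

def toy_extractive_summary(text, max_sentences=1):
    # Naive sentence split: break after ., ! or ? when followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count non-stopword words over the whole text, then normalize to [0, 1]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in TOY_STOPWORDS]
    counts = Counter(words)
    max_count = max(counts.values()) if counts else 1
    weights = {w: c / max_count for w, c in counts.items()}

    # A sentence's score is the sum of the weights of the words it contains
    scores = {}
    for i, sent in enumerate(sentences):
        scores[i] = sum(weights.get(w, 0.0) for w in re.findall(r"[a-z]+", sent.lower()))

    # Keep the highest-scoring sentences, in score order like the function above
    top = nlargest(max_sentences, scores, key=scores.get)
    return " ".join(sentences[i] for i in top)

# Hand-checked example: "cats" occurs 3 times (weight 1.0) and every other
# content word once (weight 1/3), so the middle sentence scores highest.
# toy_extractive_summary("Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark.")
# returns "Cats chase mice and cats climb trees."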


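```

## Abstractive Summarization
Abstractive summarization generates new sentences that capture the meaning of the text instead of copying sentences from it. Here it uses the pre-trained `facebook/bart-large-cnn` model through the Hugging Face `pipeline` API.

```python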
from transformers import pipeline

# Load pre-trained BART summarization model
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

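def abstractive_summary(text):
    # BART reads at most 1024 input tokens, so longer articles are truncated;
    # max_length and min_length bound the generated summary, in tokens
    summary = summarizer(text, max_length=130, min_length=30, do_sample=False, truncation=True)
    return summary[0]['summary_text']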
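
# --- Illustrative sketch, not part of the original notebook ---
# BART-large-CNN reads at most 1024 input tokens, so a very long article has to
# be shortened or split before it is summarized. One possible workaround,
# sketched here under assumptions, is to summarize fixed-size word chunks and
# join the partial summaries. chunk_words and summarize_long are hypothetical
# helpers, and 400 words per chunk is an arbitrary budget, not a tuned value
# (word counts only approximate token counts).
def chunk_words(text, words_per_chunk=400):
    # Split on whitespace and regroup into consecutive chunks of words
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

def summarize_long(text, summarize_fn, words_per_chunk=400):
    # summarize_fn is any callable str -> str, for example abstractive_summary
    partial = [summarize_fn(chunk) for chunk in chunk_words(text, words_per_chunk)]
    return " ".join(partial)

# Possible usage: summarize_long(article, abstractive_summary)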


```

```
Device set to use cpu
```

## Run Summarization on Sample Article

```python
article = dataset[0]['article']
reference_summary = dataset[0]['highlights']

print("=== Original Article ===")
print(article[:1000], "\n")  # Print first 1000 characters

print("=== Extractive Summary ===")
print(extractive_summary(article), "\n")

print("=== Abstractive Summary ===")
print(abstractive_summary(article), "\n")

print("=== Reference Summary ===")
print(reference_summary)
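
# --- Illustrative sketch, not part of the original notebook ---
# Printing the summaries side by side only allows an eyeball comparison. A
# rough numeric check is unigram overlap with the reference, the idea behind
# ROUGE-1. unigram_f1 is a hypothetical helper with naive lowercase
# tokenization; it is not the official ROUGE implementation.
import re
from collections import Counter

def unigram_f1(candidate, reference):
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Counter intersection keeps, per word, the smaller of the two counts
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hand-checked example: unigram_f1("the cat sat", "the cat sat on the mat")
# has precision 3/3 and recall 3/6, giving an F1 of about 0.667.
# Possible usage: unigram_f1(extractive_summary(article), reference_summary)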


```

```
=== Original Article ===
LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar," he told an Australian interviewer earlier this month. "I don't think I'll be particularly extravagant. "The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," currently six places below his number one movie on the UK box off