# Task 2: Text Summarization

### Author: Dur e Yashfeen
### Date: 10-Feb-2025
### Objective: Create a system that summarizes lengthy articles, blogs, or news into concise summaries.



## 📖 The Story of Smart Summarization ✨📜

In a fast-paced digital world 🌍, people struggle to keep up with endless articles, research papers, and reports 📄. Meet Sarah, a student 📚 who often felt overwhelmed by lengthy academic papers 😵. She wished for a magical tool to extract key points quickly! 🏃‍♀️💡

One day, Sarah discovered an AI-powered text summarizer 🧠🤖. With just a click, the tool analyzed massive texts and generated concise, meaningful summaries ✍️🔍. Now, Sarah could focus on understanding concepts without spending hours reading! 🕒🎉

This AI summarizer changed Sarah’s life, making studying efficient and enjoyable! 🚀


## Steps for Text Summarization 📝
1. **Import Necessary Libraries** – Load essential tools for text processing.
2. **Load and Preprocess Text** – Clean the text, remove stopwords, and lemmatize the tokens with spaCy.
3. **Extract Important Sentences** – Score each sentence by word frequency and keep the highest-scoring ones.
4. **Generate and Evaluate Summary** – Produce an abstractive summary with BART and compare the two summaries with ROUGE.
5. **Conclusion** – Understanding how AI simplifies summarization.


In [None]:

# importing necessary libraries
import pandas as pd
import re
from tqdm.autonotebook import tqdm as notebook_tqdm
import spacy
import torch
from transformers import pipeline, BartForConditionalGeneration, BartTokenizer
from collections import Counter
from heapq import nlargest
import nltk
from nltk.corpus import stopwords
from rouge import Rouge
import warnings
warnings.filterwarnings("ignore")


In [3]:

# Load the dataset splits from the given CSV paths
# (for example "./datasets/train.csv" and "./datasets/validation.csv")
def load_data(train_path, test_path, val_path):
    train_data = pd.read_csv(train_path)
    test_data = pd.read_csv(test_path)
    val_data = pd.read_csv(val_path)
    return train_data, test_data, val_data


In [5]:
# Preprocess textual data for summarization
nltk.download('stopwords')

# Load English NLP model from spaCy
nlp = spacy.load("en_core_web_sm")

# Sample text for summarization
text = """Your input article or text goes here. The model will preprocess, extract important sentences, 
and generate an abstractive summary based on context."""


[nltk_data] Downloading package stopwords to C:\Users\DUR E
[nltk_data]     YASHFEEN\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [6]:

### 1️⃣ Preprocessing Function ###
def preprocess_text(text):
    """Return a lowercased copy of the text."""
    # Convert to lowercase
    text = text.lower()
    return text


In [7]:
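# Normalize whitespace and strip punctuation
# Note: this overwrites the global `text`, so later cells also receive the text without punctuation
text = re.sub(r'\s+', ' ', text)  # Collapse runs of whitespace into a single space
text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation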


In [10]:
# Tokenization, stopword removal, and lemmatization
doc = nlp(text)
stop_words = set(stopwords.words('english'))

# Keep the lemma of every token that is neither a stopword nor punctuation.
# The stopword check is case-sensitive, so capitalized words such as "The" are kept.
filtered_tokens = [token.lemma_ for token in doc if token.text not in stop_words and not token.is_punct]

filtered_text = " ".join(filtered_tokens)
print(filtered_text)
processed_text = preprocess_text(text)


your input article text go the model preprocess extract important sentence generate abstractive summary base context
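
The preprocessing steps above are spread over several cells. As a compact illustration, they can be sketched as one self-contained function. This is only a minimal sketch: `simple_preprocess` and the tiny `STOP_WORDS` set are hypothetical names that are not part of the notebook, the stopword set stands in for NLTK's full English list, and spaCy lemmatization is skipped so the example runs with the standard library alone.

```python
import re

# Hypothetical mini stopword set for illustration only;
# the notebook itself uses NLTK's full English stopword list.
STOP_WORDS = {"the", "a", "an", "and", "or", "will", "on", "your", "here"}

def simple_preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, collapse whitespace, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)       # remove punctuation
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)

sample = "The model will preprocess, extract important sentences,\nand generate an abstractive summary."
print(simple_preprocess(sample))
```

Because the text is lowercased before the stopword check, capitalized stopwords such as "The" are removed as well.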


In [11]:
### 2️⃣ Extractive Summarization (Frequency-based) ###
def extractive_summary(text, num_sentences=3):
    """Return the num_sentences highest-scoring sentences, in document order."""
    doc = nlp(text)

    # Frequency of every alphabetic token in the document
    word_freq = Counter(token.text for token in doc if token.is_alpha)

    # Score each sentence as the sum of the frequencies of its words
    sentence_scores = {}
    for sent in doc.sents:
        sentence_scores[sent] = sum(word_freq[word.text] for word in sent)

    # Pick the top sentences, then restore their original order
    summary_sentences = nlargest(num_sentences, sentence_scores, key=sentence_scores.get)
    summary_sentences.sort(key=lambda sent: sent.start)
    return " ".join(sent.text for sent in summary_sentences)

extractive_result = extractive_summary(text)
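
To make the scoring idea concrete, here is a small standard-library sketch of frequency-based extraction. It is an illustration under stated assumptions, not the notebook's method: `simple_extractive_summary` is a hypothetical helper, sentences are split with a regular expression instead of spaCy, and the sample article is invented.

```python
import re
from collections import Counter
from heapq import nlargest

def simple_extractive_summary(text, num_sentences=2):
    # Split after sentence-ending punctuation; this only works while
    # the punctuation is still present in the text.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count lowercase word frequencies over the whole text.
    word_freq = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each sentence as the sum of its word frequencies.
    scores = {
        i: sum(word_freq[w] for w in re.findall(r"[a-z]+", s.lower()))
        for i, s in enumerate(sentences)
    }
    top = nlargest(num_sentences, scores, key=scores.get)
    # Return the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))

article = (
    "Cats sleep a lot. "
    "Cats and dogs are popular pets, and cats are often kept indoors. "
    "Some people prefer birds. "
    "Dogs need daily walks, and dogs enjoy playing with cats."
)
print(simple_extractive_summary(article, num_sentences=2))
```

Two things are worth noticing. Summing raw frequencies favors long sentences, so dividing by sentence length is a common variation. And if punctuation is stripped before this step, the whole text collapses into a single "sentence" and the summary is simply the input.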


In [13]:

### 3️⃣ Abstractive Summarization (BART Transformer) ###
# Load pre-trained model
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)


In [14]:

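def abstractive_summary(text, max_length=150):
    # Encode the input, truncating anything beyond BART's 1024-token limit.
    # Note: BART is not trained with a task prefix (that is a T5 convention),
    # so "summarize: " is read as ordinary input and can be echoed in the summary.
    inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=1024, truncation=True)
    # Beam search with 4 beams; the summary is kept between 40 and max_length tokens
    summary_ids = model.generate(inputs, max_length=max_length, min_length=40, length_penalty=2.0, num_beams=4)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

abstractive_result = abstractive_summary(text)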
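
Since the input is truncated at 1024 tokens, anything beyond that point in a long article never reaches the model. One possible workaround is to summarize the article chunk by chunk. The sketch below assumes that a fixed word count is an acceptable rough proxy for the token limit; `chunk_words`, `summarize_long`, and the 400-word default are hypothetical choices, and a stand-in summarizer is used so the example runs without downloading a model.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def summarize_long(text, summarize_fn, max_words=400):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_words(text, max_words))

# Stand-in summarizer for demonstration: keeps the first five words of each chunk.
def first_five_words(chunk):
    return " ".join(chunk.split()[:5])

long_text = "word " * 1000
print(len(chunk_words(long_text)))                         # number of chunks
print(len(summarize_long(long_text, first_five_words).split()))  # words in the joined summary
```

In the notebook, `abstractive_summary` could be passed as `summarize_fn`. Splitting on word counts can cut a sentence in half, so splitting on sentence boundaries would likely give cleaner chunks.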


In [15]:

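### 4️⃣ Evaluate Summaries (ROUGE Score) ###
rouge = Rouge()
# get_scores(hypothesis, reference): no human-written reference is available here,
# so the extractive summary serves as the reference for the abstractive one
scores = rouge.get_scores(abstractive_result, extractive_result)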

# Print results
print("\n📌 Extractive Summary:")
print(extractive_result)

print("\n📌 Abstractive Summary:")
print(abstractive_result)

print("\n🚀 ROUGE Evaluation Scores:")
print(scores)



📌 Extractive Summary:
Your input article or text goes here The model will preprocess extract important sentences and generate an abstractive summary based on context

📌 Abstractive Summary:
summarize: Your input article or text goes here. The model will preprocess extract important sentences and generate an abstractive summary based on context. It will generate a summary of an article based on its context.

🚀 ROUGE Evaluation Scores:
[{'rouge-1': {'r': 1.0, 'p': 0.8148148148148148, 'f': 0.897959178725531}, 'rouge-2': {'r': 1.0, 'p': 0.6363636363636364, 'f': 0.7777777730246914}, 'rouge-l': {'r': 1.0, 'p': 0.8148148148148148, 'f': 0.897959178725531}}]
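
To see where these numbers come from, the following sketch recomputes ROUGE-N by hand from the two summaries printed above. It is a simplified approximation, assuming that n-grams are counted as unique sets and that periods are ignored; `ngrams` and `rouge_n` are hypothetical helpers, and the `rouge` package may differ in its details.

```python
def ngrams(text, n):
    """Return the set of unique word n-grams, ignoring periods."""
    words = text.replace(".", " ").split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def rouge_n(hypothesis, reference, n=1):
    """Set-based ROUGE-N: recall, precision, and F1 of n-gram overlap."""
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    overlap = len(hyp & ref)
    precision = overlap / len(hyp) if hyp else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"r": recall, "p": precision, "f": f1}

reference = ("Your input article or text goes here The model will preprocess extract "
             "important sentences and generate an abstractive summary based on context")
hypothesis = ("summarize: Your input article or text goes here. The model will preprocess "
              "extract important sentences and generate an abstractive summary based on context. "
              "It will generate a summary of an article based on its context.")

print(rouge_n(hypothesis, reference, n=1))
print(rouge_n(hypothesis, reference, n=2))
```

Recall is 1.0 because every word of the extractive summary also appears in the abstractive one. Precision is 22/27 ≈ 0.815 for unigrams because the abstractive summary contains five extra unique tokens, including the echoed `summarize:` prefix. These scores measure overlap between two machine-generated summaries, so they say little about quality against a human-written reference.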


### Step 5: Conclusion 🎯✅

With AI-driven text summarization, reading long documents becomes effortless! 🚀 Now, students like Sarah and professionals can grasp essential information quickly and stay ahead in their fields! 📚✨

---

The End 🎬🎉
