# 📚✨ Text Summarization Project ✨📚

## 🚀🔍 Introduction

Welcome to the Text Summarization project! In this project, we will use Natural Language Processing (NLP) techniques to build an extractive text summarization tool: it scores the sentences of a document and keeps the highest-scoring ones. The aim is to condense long articles or documents into shorter summaries while retaining the essential information. 📄💡

## 🛠️🔧 Libraries and Tools

In this project, we will utilize several libraries and tools to achieve our goal. Here’s a brief overview of each:

### 🐍📝 NLTK (Natural Language Toolkit)

**NLTK** is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources. In our project, we will use NLTK for tokenizing and processing the text. 📑🔠

### 🅰️🔡 String

The **string** module in Python's standard library contains common string constants and helpers. We will use its `string.punctuation` constant to skip punctuation tokens when counting words. 🧵🔤
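As a small illustration of how `string.punctuation` behaves, the sketch below compares two ways of testing whether a token is punctuation. The helper name `is_punctuation` is made up for this example; it is not part of the notebook.

```python
import string

# string.punctuation is a single str holding the ASCII punctuation characters.
PUNCT = set(string.punctuation)

def is_punctuation(token):
    """True when every character of a non-empty token is ASCII punctuation."""
    return bool(token) and all(ch in PUNCT for ch in token)

print(is_punctuation(","))     # True
print(is_punctuation("..."))   # True
print(is_punctuation("word"))  # False

# `token in string.punctuation` is a substring test, so a multi-character
# token such as "..." is not recognized by it.
print("..." in string.punctuation)  # False
```

The substring test is enough for single-character tokens such as `,` or `.`; the character-by-character check also covers longer punctuation tokens.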

### 🚫🔠 Stopwords

**Stopwords** are common words (like "the", "is", "in") that are usually ignored in text processing. NLTK provides a list of stopwords for various languages. We will use these stopwords to filter out non-essential words from our text. ❌📝

### ✂️🔠 Tokenization

Tokenization is the process of splitting text into individual words or sentences. We will use NLTK's `word_tokenize` and `sent_tokenize` functions to tokenize our text data. 🔍🔠
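To see why a dedicated sentence tokenizer is useful, consider what happens when text is split on periods by hand. The sketch below uses only `str.split` on a made-up sample sentence; it is a counterexample, not the method used in this project.

```python
# Two sentences, one of which contains the abbreviation "Mr."
sample = "Mr. Thornton baked bread. The villagers loved it."

# Naive approach: treat every ". " as a sentence boundary.
naive = [piece for piece in sample.split(". ") if piece]
print(len(naive))  # 3
print(naive[0])    # Mr
```

The naive split produces three pieces because it breaks after "Mr.". In the `sent_tokenize` output shown later in this notebook, the sentence containing "Mr. Thornton" and "Mrs. Patterson" stays in one piece.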

### 📈🔝 Heapq (nlargest)

The **heapq** module provides an implementation of the heap queue algorithm. We will use its `nlargest` function to extract the highest-scoring sentences from our text. 📊🏅
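Here is a minimal sketch of `nlargest` applied to a dictionary of scores. The sentence labels and score values are invented for illustration.

```python
from heapq import nlargest

# Hypothetical sentence scores.
scores = {"sentence A": 0.4, "sentence B": 1.7, "sentence C": 0.9}

# Iterating a dict yields its keys; `key=scores.get` ranks them by value.
top_two = nlargest(2, scores, key=scores.get)
print(top_two)  # ['sentence B', 'sentence C']
```

The result is ordered from highest to lowest score, not by the order in which the keys were inserted.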

## 📋💡 Project Workflow

Here's an overview of the workflow for our Text Summarization project:

1. **Text Preprocessing** 🛠️🔍
   - Load the text data.
   - Tokenize the text into sentences and words.
   - Lowercase words for comparison and skip punctuation tokens.

2. **Stopwords Removal** ❌🔤
   - Filter out stopwords from the tokenized words.

3. **Word Frequency Calculation** 📊🔡
   - Calculate the frequency of each word in the text.

4. **Sentence Scoring** 🏅🔍
   - Score each sentence based on the frequency of words it contains.

5. **Summary Generation** 📄✨
   - Select the top N sentences with the highest scores to generate the summary.

## 🔍🛠️ Implementation Details

Let's dive into the details of each step:

### 1. Text Preprocessing 🛠️🔍

In this step, we define the input text and tokenize it into words and sentences. Words are lowercased when they are compared with the stopword list, and punctuation tokens are skipped when counting.

### 2. Stopwords Removal ❌🔤

We will remove common stopwords from the tokenized words to focus on the more meaningful words in the text.

### 3. Word Frequency Calculation 📊🔡

Next, we will calculate the frequency of each word in the text. Words that appear more frequently will be considered more important.
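Steps 2 and 3 can be sketched with the standard library alone. The stopword set below is a tiny hand-written stand-in (an assumption for this example); the notebook itself uses NLTK's English stopword list and a plain dictionary.

```python
from collections import Counter

# Hypothetical mini stopword list, standing in for NLTK's.
STOP_WORDS = {"the", "a", "of", "and", "in"}

words = "the village and the tree in the village".split()

# Drop stopwords, then count what remains.
word_freq = Counter(w for w in words if w not in STOP_WORDS)
print(word_freq.most_common(2))  # [('village', 2), ('tree', 1)]
```

`Counter` is a dictionary subclass, so it can be used wherever the later steps expect a word-to-count mapping.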

### 4. Sentence Scoring 🏅🔍

Each sentence in the text will be scored based on the frequency of the words it contains. Sentences with higher scores are considered more important.
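A tiny worked example of the scoring idea, with invented weights and sentences and whitespace splitting in place of a real tokenizer:

```python
# Hypothetical normalized word weights.
word_freq = {"village": 1.0, "tree": 0.5}

sentences = [
    "the village slept",
    "the tree stood in the village",
    "rain fell",
]

def score(sentence):
    """Sum the weights of the words in the sentence; unknown words add 0."""
    return sum(word_freq.get(w, 0.0) for w in sentence.split())

sent_scores = {s: score(s) for s in sentences}
print(sent_scores["the tree stood in the village"])  # 1.5
```

Because the score is a plain sum, longer sentences that contain many weighted words tend to score higher than short ones.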

### 5. Summary Generation 📄✨

Finally, we will select the top N sentences with the highest scores to generate the summary of the text.

## 📈🔍 Conclusion

By the end of this project, you will have a working text summarization tool that can condense lengthy articles into concise summaries. This project demonstrates the power of NLP techniques in extracting meaningful information from large texts. 📚💼

## 🚀💻 Let's Get Started!

Open a new Jupyter notebook and follow along with the implementation steps to build your own text summarization tool. Happy coding! 🎉👩‍💻👨‍💻


In [85]:
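import string
from heapq import nlargest

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

# One-time download of the tokenizer models and the stopword list used below.
# Newer NLTK releases may ask for 'punkt_tab' instead of 'punkt'.
nltk.download('punkt')
nltk.download('stopwords')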

In [86]:
text = """Once upon a time, in a land far, far away, there was a small, quaint village nestled between rolling hills and lush, green valleys. This village, known to its inhabitants as Elmswood, was home to a curious assortment of characters. Among them were a kindly old baker named Mr. Thornton, who had been baking bread for the villagers for as long as anyone could remember, and Mrs. Patterson, the schoolteacher, whose stern demeanor hid a heart of gold. The village also had its fair share of children, who could often be seen playing games in the fields, their laughter echoing through the air like the sweetest of melodies. In the center of Elmswood, there was a magnificent old oak tree, which the villagers believed to be enchanted. It was said that the tree had stood there for centuries, silently watching over the village and its people. Every spring, the tree would burst into a brilliant display of blossoms, filling the air with a sweet, heady fragrance that could be smelled for miles around. Under the shade of this grand tree, villagers would gather for festivals, picnics, and, on occasion, to resolve disputes, which were often settled with the help of the village elder, Old Man Harper, whose wisdom was revered by all.

One fine day, as the sun rose over the horizon, casting a golden glow over the village, a traveler arrived. This traveler, whose name was Samuel, had journeyed from a distant city, seeking peace and quiet. He had heard tales of Elmswood's beauty and tranquility and had decided to see it for himself. As he walked through the village, he marveled at the quaint cottages with their thatched roofs, the vibrant gardens bursting with flowers of every color, and the friendly faces that greeted him at every turn. Samuel soon found himself standing in front of the old oak tree, feeling a sense of calm wash over him. He decided to sit beneath its branches for a while, to rest and take in the beauty of his surroundings. As he sat there, he noticed a small, leather-bound book lying on the ground. Curiosity piqued, he picked it up and opened it, revealing pages filled with elegant, flowing script. The book seemed to be a diary of sorts, chronicling the life of a young girl named Emily, who had lived in the village many years ago.

Samuel became engrossed in Emily's story, learning about her hopes, dreams, and the challenges she had faced. She wrote of her love for a boy named Thomas, who had been taken away to fight in a distant war, and of her longing for his return. As Samuel read on, he felt a deep connection to Emily's words, as if he were sharing in her joys and sorrows. The hours passed quickly, and before he knew it, the sun was beginning to set, casting long shadows across the village. Reluctantly, Samuel closed the book and stood up, determined to learn more about Emily and the village's history. Over the next few days, Samuel explored Elmswood, talking to the villagers and gathering stories about Emily and Thomas. He visited the local library, where he found old letters and photographs that provided further glimpses into their lives. Through these stories, he learned that Thomas had eventually returned from the war, and he and Emily had been reunited. They had lived long, happy lives together, raising a family and contributing to the village in many ways.

Inspired by their love and resilience, Samuel decided to stay in Elmswood and make it his home. He bought a small cottage near the edge of the village and began to integrate himself into the community. He took up gardening, planting flowers and vegetables, and often helped Mr. Thornton in the bakery, learning the secrets of his delicious bread. Samuel also became close friends with Mrs. Patterson, who shared with him her vast knowledge of the village's history and traditions. As the years went by, Samuel found a sense of belonging he had never known before. He became an integral part of the village, participating in festivals and celebrations, and sharing stories of his own travels with the villagers. He often thought of Emily and Thomas, whose love story had brought him to Elmswood and changed his life forever. Under the shade of the old oak tree, Samuel would sit and reflect on how fortunate he was to have found such a special place, where he could finally find peace and happiness.
"""

# ✂️🔠 Tokenization and Stopwords Removal

In this step, we will:

1. **Tokenize the Text** 📝🔍
   - We will use the `word_tokenize` function from NLTK to split the text into individual words (tokens).

2. **Remove Stopwords** 🚫🔤
   - We will filter out common stopwords using the list of stopwords provided by NLTK.

3. **Build Word Frequency Dictionary** 📊🔡
   - We will create a dictionary to store the frequency of each word that is not a stopword or punctuation.

In [87]:
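tokens = word_tokenize(text)
# Use a separate name so the imported `stopwords` corpus reader is not overwritten.
stop_words = set(stopwords.words('english'))


In [88]:
# Count every token that is neither a stopword nor a punctuation mark.
# Keys keep their original case ("Samuel" and "samuel" would be counted separately).
word_freq = {}
for word in tokens:
    if word.lower() not in stop_words and word not in string.punctuation:
        word_freq[word] = word_freq.get(word, 0) + 1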

# 📊🔡 Normalizing Frequencies and Scoring Sentences

In this step, we will:

1. **Calculate the Maximum Word Frequency** 📈🔍
   - Find the maximum frequency of any word in the text to normalize the word frequencies.

2. **Normalize Word Frequencies** ⚖️🔡
   - Divide the frequency of each word by the maximum frequency to get normalized frequencies.

3. **Tokenize Sentences** 📝🔠
   - Split the text into sentences using the `sent_tokenize` function from NLTK.

4. **Score Sentences** 🏅🔍
   - Calculate a score for each sentence based on the normalized frequency of the words it contains.

5. **Select Top Sentences for Summary** 📄✨
   - Select the top 40% of sentences with the highest scores to create the summary.
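Written as formulas, the normalization and scoring steps look roughly like this. The symbols are introduced here for illustration: $f(t)$ is the count of token $t$ after stopword and punctuation filtering, $w(t)$ its normalized weight, and the sum runs over every token occurrence in sentence $s$ that appears in the frequency dictionary.

```latex
w(t) = \frac{f(t)}{\max_{u} f(u)}, \qquad
\mathrm{score}(s) = \sum_{t \in s,\; f(t) > 0} w(t)
```

The most frequent word therefore has weight 1, and every other word has a weight between 0 and 1.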


In [89]:
max_freq = max(word_freq.values())
print(max_freq)

14


In [90]:
for word in word_freq.keys():
    word_freq[word] = word_freq[word] / max_freq

In [91]:
sent_tokens = sent_tokenize(text)
print(sent_tokens)

['Once upon a time, in a land far, far away, there was a small, quaint village nestled between rolling hills and lush, green valleys.', 'This village, known to its inhabitants as Elmswood, was home to a curious assortment of characters.', 'Among them were a kindly old baker named Mr. Thornton, who had been baking bread for the villagers for as long as anyone could remember, and Mrs. Patterson, the schoolteacher, whose stern demeanor hid a heart of gold.', 'The village also had its fair share of children, who could often be seen playing games in the fields, their laughter echoing through the air like the sweetest of melodies.', 'In the center of Elmswood, there was a magnificent old oak tree, which the villagers believed to be enchanted.', 'It was said that the tree had stood there for centuries, silently watching over the village and its people.', 'Every spring, the tree would burst into a brilliant display of blossoms, filling the air with a sweet, heady fragrance that could be smelle

In [92]:
sent_scores = {}
for sent in sent_tokens:
    for word in word_tokenize(sent.lower()):
        if word in word_freq:
            if sent not in sent_scores:
                sent_scores[sent] = word_freq[word]
            else:
                sent_scores[sent] += word_freq[word]
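One detail of the scoring cell above is worth noting: `word_freq` is keyed by tokens in their original case, while the lookup uses lowercased tokens, so capitalized words contribute nothing to a sentence's score. The sketch below reproduces the effect with invented counts and shows one possible adjustment; applying it in the notebook would change the scores and may change the summary shown below.

```python
# Hypothetical counts keyed by original-case tokens, as in the notebook.
word_freq = {"Samuel": 2, "sat": 1, "villagers": 1}

sentence = "Samuel sat beneath the tree."

# Lowercased lookup: "samuel" does not match the key "Samuel".
lookup = [w for w in sentence.lower().split() if w in word_freq]
print(lookup)  # ['sat']

# One possible adjustment: merge counts under lowercase keys.
word_freq_lower = {}
for word, count in word_freq.items():
    key = word.lower()
    word_freq_lower[key] = word_freq_lower.get(key, 0) + count

lookup_lower = [w for w in sentence.lower().split() if w in word_freq_lower]
print(lookup_lower)  # ['samuel', 'sat']
```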

In [93]:
select_len = int(len(sent_tokens) * 0.4)
summary = nlargest(select_len, sent_scores, key=sent_scores.get)
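Since `int()` truncates, a very short text can end up with `select_len == 0`, and `nlargest(0, ...)` returns an empty list. A minimal sketch of a guard follows; the helper `select_count` and its `minimum` parameter are hypothetical and not part of the notebook.

```python
from heapq import nlargest

def select_count(num_sentences, ratio=0.4, minimum=1):
    """Number of sentences to keep, never fewer than `minimum`."""
    return max(minimum, int(num_sentences * ratio))

print(int(2 * 0.4))            # 0  (two sentences at 40% truncates to zero)
print(nlargest(0, [3, 1, 2]))  # []
print(select_count(2))         # 1
print(select_count(10))        # 4
```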

In [94]:
summary = ' '.join(summary)

print(summary)

Under the shade of this grand tree, villagers would gather for festivals, picnics, and, on occasion, to resolve disputes, which were often settled with the help of the village elder, Old Man Harper, whose wisdom was revered by all. The village also had its fair share of children, who could often be seen playing games in the fields, their laughter echoing through the air like the sweetest of melodies. Under the shade of the old oak tree, Samuel would sit and reflect on how fortunate he was to have found such a special place, where he could finally find peace and happiness. Among them were a kindly old baker named Mr. Thornton, who had been baking bread for the villagers for as long as anyone could remember, and Mrs. Patterson, the schoolteacher, whose stern demeanor hid a heart of gold. As he walked through the village, he marveled at the quaint cottages with their thatched roofs, the vibrant gardens bursting with flowers of every color, and the friendly faces that greeted him at every 

In [95]:
len(summary)

2052

# ✨📚 Summarize Text Function

In this step, we will:

1. **Define the Function** 🛠️📜
   - Create a function `summarize_text` that takes the text and an optional summary ratio as inputs.

2. **Tokenize the Text** 📝🔍
   - Use the `word_tokenize` function from NLTK to split the text into individual words (tokens).

3. **Remove Stopwords** 🚫🔤
   - Filter out common stopwords using the list of stopwords provided by NLTK.

4. **Build Word Frequency Dictionary** 📊🔡
   - Create a dictionary to store the frequency of each word that is not a stopword or punctuation.

5. **Normalize Word Frequencies** ⚖️🔡
   - Divide the frequency of each word by the maximum frequency to get normalized frequencies.

6. **Tokenize Sentences** 📝🔠
   - Split the text into sentences using the `sent_tokenize` function from NLTK.

7. **Score Sentences** 🏅🔍
   - Calculate a score for each sentence based on the normalized frequency of the words it contains.

8. **Select Top Sentences for Summary** 📄✨
   - Select the top sentences with the highest scores to create the summary.

9. **Return Summary and Lengths** 📋📏
   - Return the summary along with the original and summary lengths, both measured in characters.

In [96]:
from nltk.corpus import stopwords  # re-import so the name refers to the NLTK corpus reader

def summarize_text(text, summary_ratio=0.4):
    """Return (summary, original_length, summary_length); lengths are in characters."""
    tokens = word_tokenize(text)
    stop_words = set(stopwords.words('english'))

    # Count tokens that are neither stopwords nor punctuation (keys keep their case).
    word_freq = {}
    for word in tokens:
        if word.lower() not in stop_words and word not in string.punctuation:
            word_freq[word] = word_freq.get(word, 0) + 1

    # Nothing to score (empty or stopword-only input): return an empty summary.
    if not word_freq:
        return '', len(text), 0

    # Normalize so that the most frequent word has weight 1.0.
    max_freq = max(word_freq.values())
    for word in word_freq:
        word_freq[word] = word_freq[word] / max_freq

    sent_tokens = sent_tokenize(text)

    # A sentence's score is the sum of the weights of its lowercased tokens.
    sent_scores = {}
    for sent in sent_tokens:
        for word in word_tokenize(sent.lower()):
            if word in word_freq:
                sent_scores[sent] = sent_scores.get(sent, 0) + word_freq[word]

    # Keep the top fraction of sentences, ordered by score (highest first).
    select_len = int(len(sent_tokens) * summary_ratio)
    summary_sentences = nlargest(select_len, sent_scores, key=sent_scores.get)

    summary = ' '.join(summary_sentences)
    return summary, len(text), len(summary)



In [97]:
text="""In the heart of a bustling city, there was a small, unassuming bookstore called "Whispers of Time." Despite its modest exterior, the shop was a treasure trove of forgotten tales and hidden gems. The owner, Mr. Benson, was a gentle, elderly man who had a knack for finding the perfect book for every visitor.

One rainy afternoon, a young woman named Lily stumbled into the store, seeking refuge from the storm. She was a writer struggling with a creative block, her once-flowing ideas now dried up. Mr. Benson greeted her with a warm smile and guided her to a dusty, leather-bound book tucked away on a high shelf.

"This book chooses its reader," he said mysteriously. Intrigued, Lily opened the book and found herself lost in the vivid world within its pages. The story of a courageous heroine on a quest to save her village resonated deeply with her.

As she read, inspiration began to flow once more. Lily spent hours in the bookstore, devouring the tale and scribbling notes. By the time she left, the rain had stopped, and she felt a renewed sense of purpose.

"Whispers of Time" had given her the spark she needed to write again, and she returned often, each visit igniting her creativity anew.






"""

In [98]:
summarize_text(text)

('"Whispers of Time" had given her the spark she needed to write again, and she returned often, each visit igniting her creativity anew. In the heart of a bustling city, there was a small, unassuming bookstore called "Whispers of Time." Mr. Benson greeted her with a warm smile and guided her to a dusty, leather-bound book tucked away on a high shelf. "This book chooses its reader," he said mysteriously. The owner, Mr. Benson, was a gentle, elderly man who had a knack for finding the perfect book for every visitor.',
 1208,
 517)
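The two numbers in the returned tuple are character counts, so they can be turned into a compression ratio directly. The values below are copied from the output above.

```python
# Character counts taken from the output of summarize_text(text) above.
original_length, summary_length = 1208, 517

ratio = summary_length / original_length
print(f"{ratio:.1%}")  # 42.8%
```

Note that `summary_ratio=0.4` applies to the number of sentences, not to characters, so the character-level ratio will usually differ somewhat from 40%.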

### 📝✨ Conclusion

In this project, we successfully implemented a text summarization algorithm using Natural Language Processing (NLP) techniques. We utilized several key libraries:

1. **NLTK** 🌐📚
   - For tokenizing text into words and sentences.
   - For removing stopwords.

2. **String** 🔤🔧
   - For handling punctuation.

3. **Heapq** 📊🔝
   - For selecting the top sentences based on their scores.

By following the steps of tokenization, stopwords removal, building a word frequency dictionary, normalizing word frequencies, scoring sentences, and selecting top sentences, we were able to generate a concise summary that captures the essence of the original text.

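This frequency-based, extractive approach is simple and efficient: it copies whole sentences from the source, so the summary stays readable. The selected sentences are joined in order of score rather than in their original order, which can make the summary read out of sequence. The approach can be further enhanced with additional NLP techniques and customized according to specific requirements.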
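One possible enhancement concerns sentence order. The summaries printed in this notebook list sentences by descending score (the second example starts with the story's closing sentence). The sketch below, using a hypothetical helper `top_sentences_in_order` and invented scores, picks the best sentences by index and then restores reading order.

```python
from heapq import nlargest

def top_sentences_in_order(sentences, scores, n):
    """Pick the n best-scoring sentences, then restore document order."""
    top_idx = nlargest(n, range(len(sentences)), key=lambda i: scores[i])
    return [sentences[i] for i in sorted(top_idx)]

sentences = ["A opens.", "B adds detail.", "C concludes."]
scores = [0.2, 0.9, 0.5]  # hypothetical scores, one per sentence

print(top_sentences_in_order(sentences, scores, 2))
# ['B adds detail.', 'C concludes.']
```

Ranking indices instead of sentence strings also keeps duplicate sentences distinct, which a dictionary keyed by sentence text cannot do.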

### 🌟🙏 Thank You!

Thank you for following along with this project on text summarization! 🎉 We hope you found it informative and helpful. If you have any questions or feedback, feel free to reach out. Happy summarizing! 📚✨
