# Text Summarization with TextRank

This Jupyter notebook demonstrates how to summarize a news article with TextRank, a graph-based algorithm that ranks sentences by running PageRank over a sentence-similarity graph. It includes functions for text preprocessing, sentence tokenization, similarity matrix creation, and TextRank summarization, plus an `ipywidgets` interface for interactive use.

## Features

- **Sentence Tokenization**: Splits text into sentences on `.`, `!`, and `?`, after stripping the periods from common abbreviations such as `Dr.` and `e.g.`.
- **Text Preprocessing**: Cleans text by collapsing whitespace and removing every character other than word characters, whitespace, and `.,!?`.
- **Similarity Matrix Creation**: Uses TF-IDF and cosine similarity to create a similarity matrix for sentences.
- **TextRank Summarization**: Applies the TextRank algorithm to rank sentences and generate a summary.
- **User Interface**: Provides an interactive UI for inputting text, selecting the number of summary sentences, and displaying the summary.

## Requirements

- `networkx`
- `numpy`
- `scikit-learn`
- `ipywidgets`

## Usage

1. **Install Required Packages**:
    ```python
    !pip install networkx numpy scikit-learn ipywidgets
    ```

2. **Import Libraries**:
    ```python
    import re
    import numpy as np
    import networkx as nx
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from IPython.display import display
    import ipywidgets as widgets
    ```

3. **Define Functions**:
    - `split_into_sentences(text)`: Tokenizes text into sentences.
    - `preprocess_text(text)`: Cleans the text.
    - `create_similarity_matrix(sentences)`: Creates a similarity matrix using TF-IDF.
    - `text_rank_summarize(text, num_sentences=3)`: Summarizes text using TextRank.

4. **Create User Interface**:
    ```python
    article_input = widgets.Textarea(...)
    sentence_slider = widgets.IntSlider(...)
    summary_output = widgets.Output(...)
    summarize_button = widgets.Button(...)
    ```

5. **Define Button Click Handler**:
    ```python
    def on_button_clicked(b):
        ...
    summarize_button.on_click(on_button_clicked)
    ```

6. **Display Widgets**:
    ```python
    header = widgets.HTML(value="<h1>News Article Summarizer</h1>")
    display(header)
    display(article_input)
    display(sentence_slider)
    display(summarize_button)
    display(widgets.HTML(value="<h2>Summary:</h2>"))
    display(summary_output)
    ```

## How to Use

1. Enter the news article text in the provided text area.
2. Adjust the slider to select the number of sentences for the summary.
3. Click the 'Summarize' button to generate and display the summary.

The summarizer is now ready to use. Enter your text and click 'Summarize' to get a concise summary of the article.

In [None]:
# Install required packages
!pip install networkx numpy scikit-learn ipywidgets

import re
import numpy as np
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from IPython.display import display
import ipywidgets as widgets



In [None]:
def split_into_sentences(text):
    """A simple sentence tokenizer that doesn't rely on NLTK"""
    # Treat '!' and '?' as sentence terminators by normalizing them to '.'
    text = text.replace('!', '.')
    text = text.replace('?', '.')
    # Strip the periods from common abbreviations so they don't end a sentence
    text = text.replace('Dr.', 'Dr')
    text = text.replace('Mr.', 'Mr')
    text = text.replace('Mrs.', 'Mrs')
    text = text.replace('Ms.', 'Ms')
    text = text.replace('Prof.', 'Prof')
    text = text.replace('e.g.', 'eg')
    text = text.replace('i.e.', 'ie')

    # Split on '.', drop empty fragments, and re-append a period to each sentence
    sentences = [s.strip() + '.' for s in text.split('.') if s.strip()]
    return sentences
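
The tokenizer above rewrites `!` and `?` as `.` and strips the periods from abbreviations, so the summary text can differ slightly from the input. As a rough alternative sketch (not part of the notebook), a regex-based splitter can keep the original punctuation. The function name `split_keep_punctuation` and the `ABBREVIATIONS` tuple are assumptions made for illustration.

```python
import re

# Hypothetical abbreviation list; extend it for your own text.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "e.g.", "i.e.")

def split_keep_punctuation(text):
    """Split on whitespace that follows '.', '!' or '?', keeping the terminator."""
    # Candidate pieces: break after a terminator that is followed by whitespace
    pieces = re.split(r'(?<=[.!?])\s+', text.strip())
    sentences = []
    for piece in pieces:
        if not piece:
            continue
        # Re-join a piece to the previous one if that one ended in an abbreviation
        if sentences and sentences[-1].endswith(ABBREVIATIONS):
            sentences[-1] = sentences[-1] + ' ' + piece
        else:
            sentences.append(piece)
    return sentences

example = "Dr. Smith arrived late. Was the meeting over? Yes! It ended at noon."
print(split_keep_punctuation(example))
```

A sentence that genuinely ends in one of the listed abbreviations would be merged with the next one, so this remains a heuristic like the original.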

In [None]:
def preprocess_text(text):
    """Clean the text"""
    # Remove special characters and extra whitespace
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(r'[^\w\s.,!?]', '', text)
    return text.strip()
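
To make the effect of the two substitutions concrete, here is a small worked example. It applies the same two regular expressions to a made-up string (the sample text is an assumption, not taken from the notebook). Note that `%`, `:`, and the parentheses are dropped, not just unusual symbols.

```python
import re

sample = "Breaking:   stocks rose 5%  today!\n\n(Source: wire)"

# Step 1: collapse every run of whitespace (including newlines) to one space
step1 = re.sub(r'\s+', ' ', sample)

# Step 2: drop everything except word characters, whitespace, and . , ! ?
step2 = re.sub(r'[^\w\s.,!?]', '', step1).strip()

print(step1)  # Breaking: stocks rose 5% today! (Source: wire)
print(step2)  # Breaking stocks rose 5 today! Source wire
```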

In [None]:
def create_similarity_matrix(sentences):
    """Create a similarity matrix using TF-IDF"""
    # Create a count matrix
    count_vectorizer = CountVectorizer(stop_words='english')
    count_matrix = count_vectorizer.fit_transform(sentences)

    # Convert to TF-IDF matrix
    tfidf = TfidfTransformer()
    tfidf_matrix = tfidf.fit_transform(count_matrix)

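    # TfidfTransformer L2-normalizes each row by default, so the product of the
    # matrix with its transpose gives the pairwise cosine similarities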
    similarity_matrix = (tfidf_matrix * tfidf_matrix.T).toarray()

    return similarity_matrix
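
The matrix product above yields cosine similarities because `TfidfTransformer` L2-normalizes each row by default (`norm='l2'`), so the dot product of two rows equals the cosine of the angle between them. The sketch below checks that identity in plain Python on two made-up term-weight vectors. The helper names are hypothetical, and nothing here calls scikit-learn.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Textbook cosine similarity: a.b / (|a| |b|)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up term weights for two sentences over a 4-word vocabulary
s1 = [1.0, 2.0, 0.0, 1.0]
s2 = [0.0, 1.0, 3.0, 1.0]

# The dot product of the normalized rows matches the cosine formula
print(math.isclose(dot(l2_normalize(s1), l2_normalize(s2)), cosine(s1, s2)))
```

By the same identity, every sentence with at least one non-stop-word term has similarity 1 with itself, so the diagonal of the matrix carries no ranking information.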

In [None]:
def text_rank_summarize(text, num_sentences=3):
    """Summarize text using TextRank algorithm"""
    try:
        # Preprocess text
        text = preprocess_text(text)

        # Split into sentences
        sentences = split_into_sentences(text)

        # If the text has no more sentences than requested, return it whole
        if len(sentences) <= num_sentences:
            return text

        # Create similarity matrix
        similarity_matrix = create_similarity_matrix(sentences)

        # Create graph and apply PageRank
        nx_graph = nx.from_numpy_array(similarity_matrix)
        scores = nx.pagerank(nx_graph)

        # Rank sentence indices by score, highest first
        ranked_indices = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

        # Keep the top-ranked indices, restored to their original order.
        # Working with indices avoids selecting extra copies of duplicate sentences.
        selected_indices = sorted(ranked_indices[:num_sentences])

        # Join the selected sentences
        summary = ' '.join(sentences[i] for i in selected_indices)
        return summary

    except Exception as e:
        return f"Error generating summary: {str(e)}"
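
For intuition about what `nx.pagerank` computes on the similarity graph, the following is a simplified power-iteration sketch in plain Python. It is not NetworkX's implementation: the function name `pagerank_scores`, the fixed iteration count, and the toy matrix are assumptions for illustration, and it assumes every row has at least one positive weight. The damping factor of 0.85 matches the default `alpha` of `nx.pagerank`.

```python
def pagerank_scores(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on a square matrix of edge weights."""
    n = len(weights)
    # Total outgoing weight of each node, used to turn weights into probabilities
    out_totals = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            # Score flowing into node j from every node i, proportional to w[i][j]
            incoming = sum(scores[i] * weights[i][j] / out_totals[i] for i in range(n))
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

# Toy similarity matrix: the sentence at index 1 is similar to both others,
# which are not similar to each other
toy = [
    [0.0, 0.6, 0.0],
    [0.6, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
scores = pagerank_scores(toy)
best = max(range(len(scores)), key=lambda i: scores[i])
print(best)  # 1
```

The sentence that shares vocabulary with the most other sentences accumulates the highest score, which is why TextRank tends to pick sentences that are central to the article.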

In [None]:
# Create the user interface
article_input = widgets.Textarea(
    value='',
    placeholder='Paste your news article here...',
    description='Article:',
    disabled=False,
    layout=widgets.Layout(width='100%', height='200px')
)

sentence_slider = widgets.IntSlider(
    value=3,
    min=1,
    max=10,
    step=1,
    description='Sentences:',
    disabled=False,
    continuous_update=False,
    orientation='horizontal',
    readout=True,
    readout_format='d'
)

summary_output = widgets.Output(
    layout=widgets.Layout(border='1px solid #ddd', padding='10px', min_height='100px', margin='10px 0')
)

summarize_button = widgets.Button(
    description='Summarize',
    disabled=False,
    button_style='success',
    tooltip='Click to summarize the text',
    icon='check'
)

# Define button click handler
def on_button_clicked(b):
    summary_output.clear_output()

    text = article_input.value
    num_sentences = sentence_slider.value

    # Nothing to summarize: prompt the user and stop
    if not text.strip():
        with summary_output:
            print("Please enter an article to summarize.")
        return

    with summary_output:
        # Show a progress message, then replace it with the summary
        print("Generating summary...")
        summary = text_rank_summarize(text, num_sentences)
        summary_output.clear_output()
        print(summary)

# Connect the button click event to the handler
summarize_button.on_click(on_button_clicked)

# Create a header
header = widgets.HTML(value="<h1>News Article Summarizer</h1>")

# Display all widgets
display(header)
display(article_input)
display(sentence_slider)
display(summarize_button)
display(widgets.HTML(value="<h2>Summary:</h2>"))
display(summary_output)

print("Simplified News Article Summarizer is ready to use! Enter your text and click 'Summarize'.")

