# Text Summarizer

## How the function works

1. A function is created to perform text summarization.
1. The function takes a passage as input.
1. The passage is split into a list of sentences.
1. Each sentence is then preprocessed.
1. Preprocessing replaces every character other than letters and digits with a space, converts the text to lowercase, removes stop words, and stems the remaining words with the PorterStemmer.
1. The preprocessed sentences are collected into a list called the corpus.
1. The corpus is vectorized using Bag of Words (CountVectorizer) or TF-IDF (TfidfVectorizer).
1. This yields a matrix with one row per sentence in the passage and one column per term in the vocabulary.
1. The row-wise sum of this matrix gives a score for each sentence.
1. Sentences with higher scores are treated as more important.
1. The highest-scoring 30% of the sentences are selected.
1. The selected sentences are joined, in their original order, to form the summary.
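The scoring and selection steps at the end of the list can be illustrated without NLTK or scikit-learn. This is a minimal plain-Python sketch, assuming the sentence-by-term matrix has already been built; `select_summary` and the example data are hypothetical, and the "at least one sentence" rule is an assumption added so that very short passages still produce a summary.

```python
def select_summary(sentences, matrix, ratio=0.3):
    """Return the top `ratio` of sentences by row sum, in passage order."""
    scores = [sum(row) for row in matrix]          # row-wise sum = sentence score
    k = max(1, int(ratio * len(sentences)))        # top 30%, but at least one sentence
    # sorted() is stable even with reverse=True, so tied sentences keep
    # passage order and each index appears exactly once.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])                    # restore order of appearance
    return " ".join(sentences[i] for i in chosen)


# Hypothetical 7-sentence passage and a made-up 7 x 3 count matrix.
sentences = ["A first.", "B second.", "C third.", "D fourth.",
             "E fifth.", "F sixth.", "G seventh."]
matrix = [[1, 0, 2], [0, 0, 1], [3, 1, 0], [0, 1, 0],
          [2, 2, 0], [1, 0, 0], [0, 0, 1]]
print(select_summary(sentences, matrix))  # → C third. E fifth.
```

Here two sentences tie at the top score of 4; ranking indexes by score keeps both, once each, rather than looking up a score's position and risking the same sentence being picked twice.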

#### Here the summarization is done with the help of the NLTK and scikit-learn libraries

# Importing the necessary libraries

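```python
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
import re
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# One-time download of the data used by sent_tokenize/word_tokenize and stopwords:
# nltk.download("punkt")  # recent NLTK releases may require "punkt_tab" instead
# nltk.download("stopwords")
```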

# Defining the Functions

## Using TF-IDF (TfidfVectorizer)


```python
def Find_summary_tfidf():
    # Taking in the input passage
    passage = input("Enter the passage you want to summarize: ")
    sentences = nltk.sent_tokenize(passage)
    n = len(sentences)
    ps = PorterStemmer()  # Creating an instance of the Porter stemmer
    stop_words = set(stopwords.words("english"))  # build the stop-word set once, not once per word

    # Preprocessing: keep letters and digits, lowercase, drop stop words, stem
    corpus = []
    for sentence in sentences:
        sent = re.sub("[^a-zA-Z0-9]", " ", sentence)
        sent = sent.lower()
        words = nltk.word_tokenize(sent)
        words = [ps.stem(word) for word in words if word not in stop_words]
        corpus.append(" ".join(words))

    # TF-IDF
    tf = TfidfVectorizer()
    # Vectorizing the corpus: one row per sentence, one column per term
    x = tf.fit_transform(corpus).toarray()
    # Score of each sentence = sum of its row
    scores = x.sum(axis=1)

    # Selecting the top 30% of sentences (at least one)
    k = max(1, int(0.3 * n))
    # A stable argsort on the negated scores ranks highest first; tied
    # sentences keep passage order and no index is picked twice
    indexes = np.argsort(-scores, kind="stable")[:k]

    # Sorting in order of appearance in the passage
    indexes = sorted(indexes)
    summary = [sentences[i] for i in indexes]
    final_summary = " ".join(summary)  # joining the sentences to form the summary

    # Printing the summary
    print("________________________________")
    print("Number of sentences in passage :", len(sentences))
    print("Number of sentences in Summary :", len(summary))
    print("\nThe Summary :\n")
    print(final_summary)
```
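TF-IDF replaces raw counts with counts weighted by how rare each term is across the passage's sentences. The sketch below re-implements that weighting in plain Python, assuming scikit-learn's documented `TfidfVectorizer` defaults (raw term counts, `smooth_idf=True`, `norm="l2"`). It splits on whitespace instead of using scikit-learn's token pattern, and `tfidf_rows` and the example corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_rows(corpus):
    """Return one {term: weight} dict per sentence, assuming scikit-learn's
    TfidfVectorizer defaults (raw counts, smooth_idf=True, norm="l2")."""
    docs = [Counter(doc.split()) for doc in corpus]      # raw term counts per sentence
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)   # number of sentences containing each term
    rows = []
    for doc in docs:
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, multiplied by the raw count
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in doc.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalize the row so every non-empty sentence vector has length 1
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


# Hypothetical preprocessed (stemmed, stop-word-free) sentences.
corpus = ["appl chip ipad", "appl chip", "secur flaw hardwar flaw"]
rows = tfidf_rows(corpus)
scores = [sum(row.values()) for row in rows]  # same row-wise sum as the notebook
print([round(s, 3) for s in scores])  # → [1.717, 1.414, 1.633]
```

Because each row has unit length and non-negative entries, a sentence's TF-IDF score is at most the square root of its number of distinct terms, so this variant rewards long sentences less strongly than raw counts do.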

## Testing on a News Article


```python
Find_summary_tfidf()
```

Enter the passage you want to summarize: The propriety M1 Chip from Apple has already made its way to a number of the company’s products. This includes MacBooks, iMacs, iMac mini, and even the iPad Pro. Now, a new report has found that the iPad Pro with the company’s silicon has an irreparable security flaw. According to developer Hector Martin (Via PhoneArena), the M1 based iPad Pro suffers from a vulnerability that exists on a hardware level of the M1. In other words, this is an issue that cannot be fixed through a simple software update. The Cupertino based giant has apparently violated an Arm architecture specification requirement as well. This means that there is no simple method of fixing the issue.  The developer further explained that the flaw basically allows two applications to covertly exchange data without using normal operating system features. While this is a vulnerability, it, fortunately, does not pose any serious security risks. Even in a worst case scenario this secur

## Using Bag of Words (CountVectorizer)

```python
def Find_summary_BoW():
    # Taking in the input passage
    passage = input("Enter the passage you want to summarize: ")
    sentences = nltk.sent_tokenize(passage)
    n = len(sentences)
    ps = PorterStemmer()  # Creating an instance of the Porter stemmer
    stop_words = set(stopwords.words("english"))  # build the stop-word set once, not once per word

    # Preprocessing: keep letters and digits, lowercase, drop stop words, stem
    corpus = []
    for sentence in sentences:
        sent = re.sub("[^a-zA-Z0-9]", " ", sentence)
        sent = sent.lower()
        words = nltk.word_tokenize(sent)
        words = [ps.stem(word) for word in words if word not in stop_words]
        corpus.append(" ".join(words))

    # Bag of words
    cv = CountVectorizer()
    # Vectorizing the corpus: one row per sentence, one column per term
    x = cv.fit_transform(corpus).toarray()
    # Score of each sentence = sum of its row
    scores = x.sum(axis=1)

    # Selecting the top 30% of sentences (at least one)
    k = max(1, int(0.3 * n))
    # A stable argsort on the negated scores ranks highest first; tied
    # sentences keep passage order and no index is picked twice
    indexes = np.argsort(-scores, kind="stable")[:k]

    # Sorting in order of appearance in the passage
    indexes = sorted(indexes)
    summary = [sentences[i] for i in indexes]
    final_summary = " ".join(summary)  # joining the sentences to form the summary

    # Printing the summary
    print("________________________________")
    print("Number of sentences in passage :", len(sentences))
    print("Number of sentences in Summary :", len(summary))
    print("\nThe Summary :\n")
    print(final_summary)
```
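With CountVectorizer, a sentence's row sum is simply the number of tokens left after preprocessing, so the Bag of Words variant in effect favors the longest sentences. The plain-Python sketch below illustrates this; it assumes CountVectorizer's default `token_pattern` (which ignores single-character tokens) and `lowercase=True`, and `bow_matrix`, `TOKEN`, and the example corpus are hypothetical.

```python
import re
from collections import Counter

# Assumed to be CountVectorizer's default token_pattern (with lowercase=True).
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def bow_matrix(corpus):
    """Build a (sentences x terms) count matrix with an alphabetically sorted vocabulary."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in corpus]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


# Hypothetical preprocessed sentences; "a" is a single character, so it is not a token.
corpus = ["appl chip chip", "ipad pro", "a flaw"]
vocab, matrix = bow_matrix(corpus)
scores = [sum(row) for row in matrix]
print(vocab)   # → ['appl', 'chip', 'flaw', 'ipad', 'pro']
print(scores)  # → [3, 2, 1]
```

Each score equals the sentence's token count, which is why the summaries from the two functions can differ even on the same article.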

## Testing on a News Article

```python
Find_summary_BoW()
```

Enter the passage you want to summarize: The propriety M1 Chip from Apple has already made its way to a number of the company’s products. This includes MacBooks, iMacs, iMac mini, and even the iPad Pro. Now, a new report has found that the iPad Pro with the company’s silicon has an irreparable security flaw. According to developer Hector Martin (Via PhoneArena), the M1 based iPad Pro suffers from a vulnerability that exists on a hardware level of the M1. In other words, this is an issue that cannot be fixed through a simple software update. The Cupertino based giant has apparently violated an Arm architecture specification requirement as well. This means that there is no simple method of fixing the issue.  The developer further explained that the flaw basically allows two applications to covertly exchange data without using normal operating system features. While this is a vulnerability, it, fortunately, does not pose any serious security risks. Even in a worst case scenario this secur