# Text Summarization

**Summarization** can be defined as a task of producing a concise and fluent summary while preserving key information and overall meaning.

**Below is the code flow to generate summarize text:-**

Input article → split into sentences → remove stop words → build a similarity matrix → generate rank based on matrix → pick top N sentences for summary.

# 1. Import all necessary libraries

In [1]:
import pandas as pd
import numpy as np
from nltk.corpus import stopwords
from nltk.cluster.util import cosine_distance
import numpy as np
import networkx as nx

# 2. Generate clean sentences

In [2]:
def read_article(file_name):
    file = open(file_name, "r", encoding="utf-8")
    filedata = file.readlines()
    article = filedata[0].split(". ")
    sentences = []
    for i, sentence in enumerate(article):
        sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" ")) 
#         if i >= 151: 
#             break            
    return sentences

# 3. sentence similarity function

In [3]:
def sentence_similarity(sent1, sent2, stopwords=None):
    if stopwords is None:
        stopwords = []
 
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
 
    all_words = list(set(sent1 + sent2))
 
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
 
    # build the vector for the first sentence
    for w in sent1:
        if w in stopwords:
            continue
        vector1[all_words.index(w)] += 1
 
    # build the vector for the second sentence
    for w in sent2:
        if w in stopwords:
            continue
        vector2[all_words.index(w)] += 1
 
    return 1 - cosine_distance(vector1, vector2)

# 4. Similarity matrix

**This is where we will be using cosine similarity to find similarity between sentences.**

In [4]:
def build_similarity_matrix(sentences, stop_words):
    # Create an empty similarity matrix
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
 
    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            if idx1 == idx2: #ignore if both are same sentences
                continue 
            similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1], sentences[idx2], stop_words)
    return similarity_matrix

# 5. Generate Summary Method

In [5]:
def generate_summary(file_name, top_n=10):
    
    stop_words = stopwords.words('english')
    
    summarize_text = []
    
    # Step 1 - Read text and tokenize
    sentences =  read_article(file_name)
    print(len(sentences))
    
    # Step 2 - Generate Similary Martix across sentences
    sentence_similarity_martix = build_similarity_matrix(sentences, stop_words)
    
    # Step 3 - Rank sentences in similarity martix
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_martix)
    scores = nx.pagerank(sentence_similarity_graph)
    
    # Step 4 - Sort the rank and pick top sentences
    ranked_sentence = sorted(((scores[i],s) for i,s in enumerate(sentences)), reverse=True)    
    #print("Indexes of top ranked_sentence order are ", ranked_sentence)
    
    for i in range(top_n):
        summarize_text.append(" ".join(ranked_sentence[i][1]))
        
    # Step 5 - Offcourse, output the summarize text
    print("Summarize Text: \n", ". ".join(summarize_text))


In [31]:
generate_summary('Social.txt', 10)

85
Summarize Text: 
 Social Networking on Teenagers Social media is not only giving a positive impact but also a critical role in the lives of teenagers. Social Media in Business Social Media has used verbal communication, and some sites also have webcams that provide the conversation along with online screening to watch each other. The skills of reading and writing of teenagers can also get better from social media. Social networking and social media provide people to share their content, news, ideas, and information at a faster speed. It has a significant influence on people, and technology is also increasing because of social media. Social media also provide you with the service of buying and selling.  If one doesn’t have the facility of social media, they can surround anyone who is using social media too much. Social Media Essay Conclusion
. Handling of Social Media bad influence Exposure to social media is a part of today’s modern life, but it has many adverse impacts on the youth