 Steps involved in Text Summarization

- Text Cleaning (Removing Stopwords and Punctuation)
- Sentence Tokenization
- Word Tokenization
- Word-frequency table
- Sentence Scoring
- Summarization
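
The steps above can be sketched end to end with only the standard library. This is a rough illustration rather than the notebook's method: the regex-based sentence and word splitting, the tiny `STOP` set, and the `summarize` helper are all assumptions made for the example, whereas the cells below rely on spaCy for tokenization and stop words.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word set (the notebook uses spaCy's STOP_WORDS).
STOP = {"a", "an", "the", "of", "and", "or", "in", "to", "is", "are", "it", "on"}


def words(sentence):
    # Word tokenization + cleaning: lowercase letter runs, minus stop words.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP]


def summarize(text, ratio=0.3):
    # Sentence tokenization: naive split after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word-frequency table over the whole text.
    freq = Counter(w for s in sentences for w in words(s))
    if not freq:
        return ""
    top = max(freq.values())
    # Sentence score: sum of the normalized frequencies of its words.
    scores = {s: sum(freq[w] / top for w in words(s)) for s in sentences}
    # Keep roughly `ratio` of the sentences, but never fewer than one.
    k = max(1, int(len(sentences) * ratio))
    best = set(nlargest(k, scores, key=scores.get))
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in best)


sample = "Cats sleep. Cats and dogs play in the garden. Birds sing."
print(summarize(sample))
```

With this sample, the second sentence scores highest because it contains the most frequent word ("cats") plus three other content words.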

In [29]:
text = """A brief state of unconsciousness may tear your world apart or may zap your mere life.
Lack of sleep plays an upper hand in road accidents every now and then.
Most people are aware of the dangers of driving while intoxicated, but many do not know that drowsiness also impairs judgment, performance and reaction times, just like alcohol and drugs do.
Severe injuries, thousands of deaths and billions of monetary loses results from drowsy driving.
So, it’s important to develop a system in vehicles to minimize such mis happenings."""

In [30]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [31]:
stopwords = list(STOP_WORDS)
stopwords

['really',
 'less',
 'during',
 'no',
 'ten',
 'over',
 "'ve",
 'up',
 'along',
 'becomes',
 'them',
 'yet',
 'each',
 'thus',
 'will',
 'latterly',
 'before',
 'was',
 'could',
 'per',
 'therein',
 'several',
 'what',
 'somewhere',
 'six',
 'sixty',
 'now',
 'doing',
 'although',
 'here',
 'become',
 'whole',
 'for',
 'twenty',
 'when',
 'some',
 'formerly',
 'whom',
 'they',
 'becoming',
 'latter',
 'then',
 '’m',
 'whether',
 'which',
 'hers',
 'somehow',
 'she',
 'amount',
 'back',
 "n't",
 'fifty',
 'against',
 'upon',
 'beforehand',
 'can',
 'were',
 'from',
 "'m",
 'part',
 'used',
 'done',
 'why',
 'across',
 'those',
 'never',
 'are',
 'as',
 'might',
 '‘d',
 'one',
 'an',
 'show',
 'too',
 'did',
 'last',
 'nine',
 'meanwhile',
 'put',
 'on',
 'thru',
 'or',
 'around',
 'be',
 'thereby',
 'down',
 'few',
 'once',
 'toward',
 'every',
 'various',
 'due',
 'together',
 'just',
 'these',
 'may',
 'either',
 'must',
 'something',
 'two',
 'after',
 'towards',
 'moreover',
 'where

In [32]:
nlp = spacy.load('en_core_web_sm')

In [33]:
doc = nlp(text)

In [34]:
tokens = [token.text for token in doc]
print(tokens)

['A', 'brief', 'state', 'of', 'unconsciousness', 'may', 'tear', 'your', 'world', 'apart', 'or', 'may', 'zap', 'your', 'mere', 'life', '.', '\n', 'Lack', 'of', 'sleep', 'plays', 'an', 'upper', 'hand', 'in', 'road', 'accidents', 'every', 'now', 'and', 'then', '.', '\n', 'Most', 'people', 'are', 'aware', 'of', 'the', 'dangers', 'of', 'driving', 'while', 'intoxicated', ',', 'but', 'many', 'do', 'not', 'know', 'that', 'drowsiness', 'also', 'impairs', 'judgment', ',', 'performance', 'and', 'reaction', 'times', ',', 'just', 'like', 'alcohol', 'and', 'drugs', 'do', '.', '\n', 'Severe', 'injuries', ',', 'thousands', 'of', 'deaths', 'and', 'billions', 'of', 'monetary', 'loses', 'results', 'from', 'drowsy', 'driving', '.', '\n', 'So', ',', 'it', '’s', 'important', 'to', 'develop', 'a', 'system', 'in', 'vehicles', 'to', 'minimize', 'such', 'mis', 'happenings', '.']


In [35]:
punctuation = punctuation +'\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

In [36]:
word_frequencies = {}
for word in doc:
    # Skip stop words and punctuation (including the newline token).
    if word.text.lower() in stopwords or word.text.lower() in punctuation:
        continue
    # Count each remaining token, keeping its original casing as the key.
    word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1

In [37]:
print(word_frequencies)

{'brief': 1, 'state': 1, 'unconsciousness': 1, 'tear': 1, 'world': 1, 'apart': 1, 'zap': 1, 'mere': 1, 'life': 1, 'Lack': 1, 'sleep': 1, 'plays': 1, 'upper': 1, 'hand': 1, 'road': 1, 'accidents': 1, 'people': 1, 'aware': 1, 'dangers': 1, 'driving': 2, 'intoxicated': 1, 'know': 1, 'drowsiness': 1, 'impairs': 1, 'judgment': 1, 'performance': 1, 'reaction': 1, 'times': 1, 'like': 1, 'alcohol': 1, 'drugs': 1, 'Severe': 1, 'injuries': 1, 'thousands': 1, 'deaths': 1, 'billions': 1, 'monetary': 1, 'loses': 1, 'results': 1, 'drowsy': 1, 'important': 1, 'develop': 1, 'system': 1, 'vehicles': 1, 'minimize': 1, 'mis': 1, 'happenings': 1}


In [38]:
max_frequency = max(word_frequencies.values())
max_frequency

2

In [39]:
for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word]/max_frequency
print(word_frequencies)

{'brief': 0.5, 'state': 0.5, 'unconsciousness': 0.5, 'tear': 0.5, 'world': 0.5, 'apart': 0.5, 'zap': 0.5, 'mere': 0.5, 'life': 0.5, 'Lack': 0.5, 'sleep': 0.5, 'plays': 0.5, 'upper': 0.5, 'hand': 0.5, 'road': 0.5, 'accidents': 0.5, 'people': 0.5, 'aware': 0.5, 'dangers': 0.5, 'driving': 1.0, 'intoxicated': 0.5, 'know': 0.5, 'drowsiness': 0.5, 'impairs': 0.5, 'judgment': 0.5, 'performance': 0.5, 'reaction': 0.5, 'times': 0.5, 'like': 0.5, 'alcohol': 0.5, 'drugs': 0.5, 'Severe': 0.5, 'injuries': 0.5, 'thousands': 0.5, 'deaths': 0.5, 'billions': 0.5, 'monetary': 0.5, 'loses': 0.5, 'results': 0.5, 'drowsy': 0.5, 'important': 0.5, 'develop': 0.5, 'system': 0.5, 'vehicles': 0.5, 'minimize': 0.5, 'mis': 0.5, 'happenings': 0.5}
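
The counting and normalization in the two cells above can also be written with `collections.Counter`. The snippet below is a small sketch on a made-up token list (`tokens` is hypothetical and stands in for the filtered spaCy tokens); dividing by the largest count maps every frequency into the range (0, 1].

```python
from collections import Counter

# Hypothetical token list standing in for the filtered spaCy tokens.
tokens = ["drowsy", "driving", "impairs", "driving"]

counts = Counter(tokens)
max_count = max(counts.values())

# Divide every count by the largest one so the most frequent word scores 1.0.
normalized = {word: count / max_count for word, count in counts.items()}
print(normalized)
```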


In [40]:
sentence_tokens = [sent for sent in doc.sents]
print(sentence_tokens)

[A brief state of unconsciousness may tear your world apart or may zap your mere life.
, Lack of sleep plays an upper hand in road accidents every now and then.
, Most people are aware of the dangers of driving while intoxicated, but many do not know that drowsiness also impairs judgment, performance and reaction times, just like alcohol and drugs do.
, Severe injuries, thousands of deaths and billions of monetary loses results from drowsy driving.
, So, it’s important to develop a system in vehicles to minimize such mis happenings.]


In [41]:
sentence_scores = {}
for sent in sentence_tokens:
    for word in sent:
        # Look words up by their original casing, which is how word_frequencies is keyed;
        # a lowercased lookup would silently skip capitalized words such as 'Lack' and 'Severe'.
        if word.text in word_frequencies:
            sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequencies[word.text]

In [42]:
sentence_scores

{A brief state of unconsciousness may tear your world apart or may zap your mere life.: 4.5,
 Lack of sleep plays an upper hand in road accidents every now and then.: 3.5,
 Most people are aware of the dangers of driving while intoxicated, but many do not know that drowsiness also impairs judgment, performance and reaction times, just like alcohol and drugs do.: 8.0,
 Severe injuries, thousands of deaths and billions of monetary loses results from drowsy driving.: 5.5,
 So, it’s important to develop a system in vehicles to minimize such mis happenings.: 3.5}

In [43]:
from heapq import nlargest

In [44]:
select_length = int(len(sentence_tokens)*0.3)
select_length

1
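
One caveat with the cell above: `int()` truncates, so a text with three or fewer sentences would yield a selection length of 0 and an empty summary. A possible guard is sketched below; the helper name `select_count` and the clamp to at least one sentence are assumptions for illustration, not part of the notebook.

```python
def select_count(n_sentences, ratio=0.3):
    """Number of sentences to keep: a fraction of the total, but at least one."""
    if n_sentences <= 0:
        return 0
    # int() truncates toward zero, so clamp the result to a minimum of 1.
    return max(1, int(n_sentences * ratio))


print(select_count(5))  # five sentences, as in this notebook
print(select_count(3))  # would be 0 without the clamp
```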

In [45]:
summary = nlargest(select_length,sentence_scores,key = sentence_scores.get)

In [46]:
summary

[Most people are aware of the dangers of driving while intoxicated, but many do not know that drowsiness also impairs judgment, performance and reaction times, just like alcohol and drugs do.]

In [47]:
final_summary = [sent.text for sent in summary]

In [48]:
summary = ' '.join(final_summary)

In [49]:
print(summary)

Most people are aware of the dangers of driving while intoxicated, but many do not know that drowsiness also impairs judgment, performance and reaction times, just like alcohol and drugs do.



In [50]:
print(text)

A brief state of unconsciousness may tear your world apart or may zap your mere life.
Lack of sleep plays an upper hand in road accidents every now and then.
Most people are aware of the dangers of driving while intoxicated, but many do not know that drowsiness also impairs judgment, performance and reaction times, just like alcohol and drugs do.
Severe injuries, thousands of deaths and billions of monetary loses results from drowsy driving.
So, it’s important to develop a system in vehicles to minimize such mis happenings.


In [51]:
len(text)

529

In [52]:
len(summary)

191
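
The two lengths above give a quick measure of how much the text was compressed. The snippet below just restates those reported character counts; the variable names are illustrative.

```python
# Character counts reported by the two cells above.
text_length = 529
summary_length = 191

ratio = summary_length / text_length
print(f"The summary keeps {ratio:.1%} of the original characters.")
```

Although only 30% of the sentences were requested, the summary retains about 36% of the characters because the selected sentence is the longest one in the text.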