Extractive Text Summarization using NLP Techniques

In [1]:
# In extractive text summarization, the summary is built by selecting important sentences or phrases
# directly from the original text, without modifying them.
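
As a rough illustration of the idea described above, the following sketch scores each sentence by the frequency of its words and returns the top sentences unchanged. It uses only the standard library; the regex-based splitting, the tiny `STOP` set, and the `summarize` helper are assumptions made for this example and are not the spaCy pipeline used in the rest of this notebook.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "to", "it", "we", "our", "us", "with"}

WORD_RE = re.compile(r"[a-z']+")


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count every word that is not a stop word.
    freq = Counter(w for w in WORD_RE.findall(text.lower()) if w not in STOP)

    # Score a sentence as the sum of its word counts (stop words count as 0).
    scores = {
        i: sum(freq[w] for w in WORD_RE.findall(sent.lower()))
        for i, sent in enumerate(sentences)
    }

    # Pick the best sentence indices, then restore document order.
    top = sorted(nlargest(n_sentences, scores, key=scores.get))
    return " ".join(sentences[i] for i in top)


sample = "Cats sleep. Cats and dogs are friendly pets. Birds sing."
one_sentence = summarize(sample, 1)
two_sentences = summarize(sample, 2)
```

The selected sentences are copied verbatim from the input, which is what distinguishes extractive summarization from approaches that generate new wording.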

In [2]:
# Input text to summarize

text = """The Earth plays a vital role in our lives. It provides us with habitat, water, food, etc. 
The Earth came into existence millions of years ago, and there have been billions of animals and humans that have 
walked the same Earth as we do now. The Earth is home to over 5 million species of plants and animals, most of which 
have still not been identified or recorded. Essay on Earth in English is a common subject in schools as it is an 
important topic for children to think about and discuss. The Earth can be studied and written about in many different 
ways; you can write about it in terms of climate change, species, land formation, water composition and even the formation 
of the solar system and Earth’s position in it. The possibilities are endless! Here, we will discuss essay on Earth for
class 1, 2 & 3 for kids."""

In [3]:
# Count the characters in the text
len(text)

825

In [4]:
# Import the required libraries
import spacy

# The other two imports are used for text cleaning
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [5]:
nlp = spacy.load('en_core_web_sm')

In [6]:
# Run the loaded spaCy pipeline on the text
doc = nlp(text)

In [7]:
doc  # The processed Doc holds the full text, including its tokens and sentences

The Earth plays a vital role in our lives. It provides us with habitat, water, food, etc. 
The Earth came into existence millions of years ago, and there have been billions of animals and humans that have 
walked the same Earth as we do now. The Earth is home to over 5 million species of plants and animals, most of which 
have still not been identified or recorded. Essay on Earth in English is a common subject in schools as it is an 
important topic for children to think about and discuss. The Earth can be studied and written about in many different 
ways; you can write about it in terms of climate change, species, land formation, water composition and even the formation 
of the solar system and Earth’s position in it. The possibilities are endless! Here, we will discuss essay on Earth for
class 1, 2 & 3 for kids.

In [8]:
# Tokenize the text and remove stop words

# This filters out stop words, punctuation marks, and newline tokens
tokens = [token.text.lower() for token in doc
          if not token.is_stop and not token.is_punct and token.text != '\n']

In [9]:
tokens

['earth',
 'plays',
 'vital',
 'role',
 'lives',
 'provides',
 'habitat',
 'water',
 'food',
 'etc',
 'earth',
 'came',
 'existence',
 'millions',
 'years',
 'ago',
 'billions',
 'animals',
 'humans',
 'walked',
 'earth',
 'earth',
 'home',
 '5',
 'million',
 'species',
 'plants',
 'animals',
 'identified',
 'recorded',
 'essay',
 'earth',
 'english',
 'common',
 'subject',
 'schools',
 'important',
 'topic',
 'children',
 'think',
 'discuss',
 'earth',
 'studied',
 'written',
 'different',
 'ways',
 'write',
 'terms',
 'climate',
 'change',
 'species',
 'land',
 'formation',
 'water',
 'composition',
 'formation',
 'solar',
 'system',
 'earth',
 'position',
 'possibilities',
 'endless',
 'discuss',
 'essay',
 'earth',
 'class',
 '1',
 '2',
 '3',
 'kids']

In [10]:
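# Another way to collect tokens: keep only selected parts of speech
token1 = []
stopWords = list(STOP_WORDS)
allowed_pos = ['ADJ', 'PROPN', 'VERB', 'NOUN']
for token in doc:
    # Skip stop words and punctuation
    if token.text in stopWords or token.text in punctuation:
        continue
    # Keep adjectives, proper nouns, verbs, and nouns
    if token.pos_ in allowed_pos:
        token1.append(token.text)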

In [11]:
token1

['Earth',
 'plays',
 'vital',
 'role',
 'lives',
 'provides',
 'habitat',
 'water',
 'food',
 'Earth',
 'came',
 'existence',
 'millions',
 'years',
 'billions',
 'animals',
 'humans',
 'walked',
 'Earth',
 'Earth',
 'home',
 'species',
 'plants',
 'animals',
 'identified',
 'recorded',
 'Essay',
 'Earth',
 'English',
 'common',
 'subject',
 'schools',
 'important',
 'topic',
 'children',
 'think',
 'discuss',
 'Earth',
 'studied',
 'written',
 'different',
 'ways',
 'write',
 'terms',
 'climate',
 'change',
 'species',
 'land',
 'formation',
 'water',
 'composition',
 'formation',
 'solar',
 'system',
 'Earth',
 'position',
 'possibilities',
 'endless',
 'discuss',
 'essay',
 'Earth',
 'class',
 'kids']
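
The two token lists differ slightly. The sketch below compares excerpts copied from the two outputs above (the first 16 entries of `tokens` and the first 14 entries of `token1`, which cover the same stretch of text) using `Counter` subtraction. The variable names are illustrative and not part of the notebook.

```python
from collections import Counter

# First 16 entries of `tokens` (stop-word / punctuation filter, lowercased).
by_flags = ['earth', 'plays', 'vital', 'role', 'lives', 'provides', 'habitat',
            'water', 'food', 'etc', 'earth', 'came', 'existence', 'millions',
            'years', 'ago']

# First 14 entries of `token1` (additional part-of-speech filter, original casing).
by_pos = ['Earth', 'plays', 'vital', 'role', 'lives', 'provides', 'habitat',
          'water', 'food', 'Earth', 'came', 'existence', 'millions', 'years']

# Lowercase the second list so casing does not count as a difference.
only_in_first = Counter(by_flags) - Counter(w.lower() for w in by_pos)
dropped_words = sorted(only_in_first)
```

Here `etc` and `ago` pass the stop-word filter but are dropped by the second approach, because their part-of-speech tags are not in `allowed_pos`. Across the full lists, `5`, `million`, `1`, `2`, and `3` are dropped the same way, and the second list keeps the original casing (`Earth` rather than `earth`).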

In [12]:
# Count how often each word appears in tokens
from collections import Counter

In [14]:
word_freq=Counter(tokens)
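
`Counter` also makes it easy to inspect the most frequent words directly. A minimal sketch on a small hypothetical token list (`sample_tokens` is made up for this example and is not the notebook's `tokens`):

```python
from collections import Counter

# Hypothetical tokens, chosen so that every word has a distinct count.
sample_tokens = ["earth", "water", "earth", "species", "water", "earth"]

sample_freq = Counter(sample_tokens)

# most_common(n) returns the n (word, count) pairs with the highest counts.
top_two = sample_freq.most_common(2)
```

Applied to the notebook's `word_freq`, the same call would put `earth` first, since it is the only word that appears eight times.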

In [15]:
word_freq

Counter({'earth': 8,
         'water': 2,
         'animals': 2,
         'species': 2,
         'essay': 2,
         'discuss': 2,
         'formation': 2,
         'plays': 1,
         'vital': 1,
         'role': 1,
         'lives': 1,
         'provides': 1,
         'habitat': 1,
         'food': 1,
         'etc': 1,
         'came': 1,
         'existence': 1,
         'millions': 1,
         'years': 1,
         'ago': 1,
         'billions': 1,
         'humans': 1,
         'walked': 1,
         'home': 1,
         '5': 1,
         'million': 1,
         'plants': 1,
         'identified': 1,
         'recorded': 1,
         'english': 1,
         'common': 1,
         'subject': 1,
         'schools': 1,
         'important': 1,
         'topic': 1,
         'children': 1,
         'think': 1,
         'studied': 1,
         'written': 1,
         'different': 1,
         'ways': 1,
         'write': 1,
         'terms': 1,
         'climate': 1,
         'change': 1,
     

In [18]:
max_freq=max(word_freq.values())
max_freq

8

In [19]:
# Normalize every word frequency by dividing it by the maximum frequency
for word in word_freq.keys():
    word_freq[word]=word_freq[word]/max_freq

In [20]:
word_freq

Counter({'earth': 1.0,
         'water': 0.25,
         'animals': 0.25,
         'species': 0.25,
         'essay': 0.25,
         'discuss': 0.25,
         'formation': 0.25,
         'plays': 0.125,
         'vital': 0.125,
         'role': 0.125,
         'lives': 0.125,
         'provides': 0.125,
         'habitat': 0.125,
         'food': 0.125,
         'etc': 0.125,
         'came': 0.125,
         'existence': 0.125,
         'millions': 0.125,
         'years': 0.125,
         'ago': 0.125,
         'billions': 0.125,
         'humans': 0.125,
         'walked': 0.125,
         'home': 0.125,
         '5': 0.125,
         'million': 0.125,
         'plants': 0.125,
         'identified': 0.125,
         'recorded': 0.125,
         'english': 0.125,
         'common': 0.125,
         'subject': 0.125,
         'schools': 0.125,
         'important': 0.125,
         'topic': 0.125,
         'children': 0.125,
         'think': 0.125,
         'studied': 0.125,
         'writte
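
The normalization divides each count by the largest count, so the most frequent word scores 1.0 and every other word falls between 0 and 1. In the output above, `earth` becomes 8/8 = 1.0, words seen twice become 2/8 = 0.25, and words seen once become 1/8 = 0.125. A minimal sketch on hypothetical counts follows; it builds a new dictionary instead of overwriting the `Counter` in place, which is a design choice of this example and not the notebook's approach.

```python
from collections import Counter

# Hypothetical counts for illustration only.
counts = Counter(["earth", "earth", "earth", "earth", "water", "water", "food"])

max_count = max(counts.values())

# Keep the raw counts intact and store the normalized weights separately.
normalized = {word: count / max_count for word, count in counts.items()}
```

Keeping the raw counts separate avoids ending up with a `Counter` that holds floats, and the cell can be re-run without dividing the values a second time.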

In [21]:
# Split the document into sentences
sent_token = [sent.text for sent in doc.sents]

In [22]:
sent_token

['The Earth plays a vital role in our lives.',
 'It provides us with habitat, water, food, etc. \n',
 'The Earth came into existence millions of years ago, and there have been billions of animals and humans that have \nwalked the same Earth as we do now.',
 'The Earth is home to over 5 million species of plants and animals, most of which \nhave still not been identified or recorded.',
 'Essay on Earth in English is a common subject in schools as it is an \nimportant topic for children to think about and discuss.',
 'The Earth can be studied and written about in many different \nways; you can write about it in terms of climate change, species, land formation, water composition and even the formation \nof the solar system and Earth’s position in it.',
 'The possibilities are endless!',
 'Here, we will discuss essay on Earth for\nclass 1, 2 & 3 for kids.']
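
Several of the sentences above still contain the line breaks from the original multi-line string. If clean sentences are wanted for display, one option is to collapse every run of whitespace into a single space. The sketch below uses two sentences copied from the output; `raw_sentences` and `clean_sentences` are illustrative names.

```python
# Two sentences copied from the sent_token output above.
raw_sentences = [
    "It provides us with habitat, water, food, etc. \n",
    "Here, we will discuss essay on Earth for\nclass 1, 2 & 3 for kids.",
]

# str.split() with no arguments splits on any whitespace run (spaces, newlines)
# and drops leading/trailing whitespace, so joining with " " normalizes spacing.
clean_sentences = [" ".join(sentence.split()) for sentence in raw_sentences]
```

This changes only whitespace, so the sentences remain verbatim extracts of the original text.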