# TEXT SUMMARIZATION USING THE FREQUENCY METHOD

In this method we count how often each word occurs in the text and store each word with its frequency in a dictionary. We then split the text into sentences and score each sentence by the frequencies of the words it contains. The sentences that contain more high-frequency words are kept in the final summary.

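### Importing Libraries
Import the libraries and download the required NLTK data packages. Called without arguments, `nltk.download()` opens NLTK's interactive downloader; the code below relies on the `stopwords` corpus and the Punkt tokenizer models.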

In [9]:
import nltk
nltk.download()


showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


In [28]:
from nltk.tokenize import word_tokenize,sent_tokenize
from nltk.corpus import stopwords
from string import punctuation


Input Text to summarize

In [None]:
text  = """
You see the problem with these young children is they do not want to listen now I told Jane
to take the trash out but she refused. Now, Ethan and Abby are fighting over a cup of tea. I
don't know what to do for them, but probably we shall take them to the daycare, then they
can be okay. Also, they will be having exams starting next week. So, they will need to be
sleeping early enough so that they can get enough rest for for so that they can perform well
in the exam. Other than that, everything at home is okay. How is your trip and the children
are expecting some chocolate when you return. So, whatever you do, stock up on chocolate
biscuits, sweets, and all these pleasantries that children like they will be waiting by
"""
text

Obtain the set of English stop words and tokenize the text into words. Stop words are common words (such as "the", "is", and "to") that add little meaning to a sentence.

In [69]:
stop_words = set(stopwords.words("english"))
words = word_tokenize(text)
words

['You',
 'see',
 'the',
 'problem',
 'with',
 'these',
 'young',
 'children',
 'is',
 'they',
 'do',
 'not',
 'want',
 'to',
 'listen',
 'now',
 'I',
 'told',
 'Jane',
 'to',
 'take',
 'the',
 'trash',
 'out',
 'but',
 'she',
 'refused',
 '.',
 'Now',
 ',',
 'Ethan',
 'and',
 'Abby',
 'are',
 'fighting',
 'over',
 'a',
 'cup',
 'of',
 'tea',
 '.',
 'I',
 'do',
 "n't",
 'know',
 'what',
 'to',
 'do',
 'for',
 'them',
 ',',
 'but',
 'probably',
 'we',
 'shall',
 'take',
 'them',
 'to',
 'the',
 'daycare',
 ',',
 'then',
 'they',
 'can',
 'be',
 'okay',
 '.',
 'Also',
 ',',
 'they',
 'will',
 'be',
 'having',
 'exams',
 'starting',
 'next',
 'week',
 '.',
 'So',
 ',',
 'they',
 'will',
 'need',
 'to',
 'be',
 'sleeping',
 'early',
 'enough',
 'so',
 'that',
 'they',
 'can',
 'get',
 'enough',
 'rest',
 'for',
 'for',
 'so',
 'that',
 'they',
 'can',
 'perform',
 'well',
 'in',
 'the',
 'exam',
 '.',
 'Other',
 'than',
 'that',
 ',',
 'everything',
 'at',
 'home',
 'is',
 'okay',
 '.',
 ...]

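The `punctuation` string from the `string` module does not include the newline character `'\n'`, so we append it. Each run of this cell appends one more newline, which is why the output below ends with three.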

In [70]:
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n\n\n'

### Creating Frequency Table

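Using the word tokens from the entire text, we build a dictionary whose keys are words and whose values are the number of times each word occurs. Stop words and punctuation tokens are skipped. Keys keep their original casing, so a capitalized token such as `'Also'` is stored separately from its lowercase form.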

In [71]:
word_frequencies = {}
for word in words:
    # Skip stop words and punctuation tokens (both compared in lowercase)
    if word.lower() not in stop_words and word.lower() not in punctuation:
        # Keys keep the token's original casing
        word_frequencies[word] = word_frequencies.get(word, 0) + 1

word_frequencies

{'see': 1,
 'problem': 1,
 'young': 1,
 'children': 3,
 'want': 1,
 'listen': 1,
 'told': 1,
 'Jane': 1,
 'take': 2,
 'trash': 1,
 'refused': 1,
 'Ethan': 1,
 'Abby': 1,
 'fighting': 1,
 'cup': 1,
 'tea': 1,
 "n't": 1,
 'know': 1,
 'probably': 1,
 'shall': 1,
 'daycare': 1,
 'okay': 2,
 'Also': 1,
 'exams': 1,
 'starting': 1,
 'next': 1,
 'week': 1,
 'need': 1,
 'sleeping': 1,
 'early': 1,
 'enough': 2,
 'get': 1,
 'rest': 1,
 'perform': 1,
 'well': 1,
 'exam': 1,
 'everything': 1,
 'home': 1,
 'trip': 1,
 'expecting': 1,
 'chocolate': 2,
 'return': 1,
 'whatever': 1,
 'stock': 1,
 'biscuits': 1,
 'sweets': 1,
 'pleasantries': 1,
 'like': 1,
 'waiting': 1}
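
As a hedged alternative, the same table can be built with `collections.Counter`. This is a minimal sketch, not the notebook's code: it lowercases every token before counting, and it swaps in a regular-expression tokenizer and a tiny hypothetical stop-word set (`STOP_WORDS`) for NLTK's `word_tokenize` and `stopwords` so that it runs without any downloads.

```python
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stop-word list (assumption).
STOP_WORDS = {"the", "is", "and", "are", "of", "to", "a", "they"}

def build_frequency_table(text, stop_words=STOP_WORDS):
    """Count lowercased word tokens that are not stop words."""
    # Regex tokenizer used in place of word_tokenize (assumption).
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tok for tok in tokens if tok not in stop_words)

sample = "The children are expecting chocolate. Children like chocolate biscuits."
table = build_frequency_table(sample)
print(table.most_common(2))
```

Because the regular expression only matches letters and apostrophes, punctuation never enters the table and no separate punctuation filter is needed.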

### Normalize word frequencies
Obtain the maximum frequency from the dictionary and divide every frequency by it, so that the normalized frequencies fall in the range (0, 1].

In [62]:
max_frequency = max(word_frequencies.values())
print(max_frequency)

3


In [63]:
for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word]/max_frequency
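
The normalization step can also be written as a small function that returns a new dictionary instead of updating the table in place. This is only a sketch; the function name `normalize_frequencies` and the sample counts are illustrative assumptions.

```python
def normalize_frequencies(frequencies):
    """Divide every count by the largest count so values fall in (0, 1]."""
    max_frequency = max(frequencies.values())
    return {word: count / max_frequency for word, count in frequencies.items()}

# Illustrative counts (assumption), with the same maximum of 3 as above.
counts = {"children": 3, "chocolate": 2, "trip": 1}
normalized = normalize_frequencies(counts)
print(normalized)
```

Returning a new dictionary keeps the raw counts available, which helps when cells are re-run out of order.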

### Sentence Tokenization


In [64]:
sentences = sent_tokenize(text)
sentences

['\nYou see the problem with these young children is they do not want to listen now I told Jane\nto take the trash out but she refused.',
 'Now, Ethan and Abby are fighting over a cup of tea.',
 "I\ndon't know what to do for them, but probably we shall take them to the daycare, then they\ncan be okay.",
 'Also, they will be having exams starting next week.',
 'So, they will need to be\nsleeping early enough so that they can get enough rest for for so that they can perform well\nin the exam.',
 'Other than that, everything at home is okay.',
 'How is your trip and the children\nare expecting some chocolate when you return.',
 'So, whatever you do, stock up on chocolate\nbiscuits, sweets, and all these pleasantries that children like they will be waiting by']

### Weighted frequencies of the sentences

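The weighted frequency of each sentence is obtained by adding together the frequencies of the table words that occur in it. The code tests each word with a substring check against the lowercased sentence, so a word contributes once per sentence however often it repeats, `exam` also matches inside `exams`, and capitalized keys such as `Jane` never match. The execution counts also show that the frequency table cell (In [71]) was re-run after the normalization cells (In [62] and In [63]), so the scores behind the summary shown here come from raw counts.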

In [72]:
sentences = sent_tokenize(text)
sentenceValue = dict()

for sentence in sentences:
    for word, freq in word_frequencies.items():
        # Substring test against the lowercased sentence
        if word in sentence.lower():
            sentenceValue[sentence] = sentenceValue.get(sentence, 0) + freq

# Total score across all scored sentences
sumValues = sum(sentenceValue.values())




In [73]:
average = int(sumValues / len(sentenceValue))
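
A variation on this scoring step, sketched below under stated assumptions, matches whole tokens instead of substrings. It is not the notebook's method: it uses a regular-expression tokenizer in place of NLTK, assumes the frequency table has lowercase keys, and counts a word every time it occurs in the sentence. The function name and the sample data are hypothetical.

```python
import re

def score_sentences(sentences, frequencies):
    """Sum the frequency of every token in each sentence (exact matches only)."""
    scores = {}
    for sentence in sentences:
        # Regex tokenizer used in place of word_tokenize (assumption).
        tokens = re.findall(r"[a-z']+", sentence.lower())
        scores[sentence] = sum(frequencies.get(tok, 0) for tok in tokens)
    return scores

# Illustrative frequency table with lowercase keys (assumption).
frequencies = {"exam": 1, "exams": 1, "enough": 2}
sentences = ["They will be having exams.", "Get enough rest, enough for the exam."]
scores = score_sentences(sentences, frequencies)
print(scores)
```

With exact matching, `exam` no longer adds to the sentence that only contains `exams`, and the repeated `enough` is counted twice.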

### Obtaining Summary

In [76]:
summary = ""
for sentence in sentences:
    if (sentence in sentenceValue) and (sentenceValue[sentence]>(1.2*average)):
        summary += " "+ sentence

print(summary)

 
You see the problem with these young children is they do not want to listen now I told Jane
to take the trash out but she refused. So, they will need to be
sleeping early enough so that they can get enough rest for for so that they can perform well
in the exam. So, whatever you do, stock up on chocolate
biscuits, sweets, and all these pleasantries that children like they will be waiting by
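
The selection rule can be sketched as a reusable function. The notebook truncates the average with `int(...)`; this sketch keeps the floating-point mean instead, which is a deliberate difference and matters most when the frequencies are normalized. The function name, the `factor` parameter, and the sample scores are assumptions for illustration.

```python
def select_sentences(sentences, scores, factor=1.2):
    """Keep sentences scoring above factor * mean score, in original order."""
    if not scores:
        return ""
    average = sum(scores.values()) / len(scores)  # float mean, no truncation
    kept = [s for s in sentences if scores.get(s, 0) > factor * average]
    return " ".join(kept)

# Illustrative sentences and scores (assumption).
sentences = ["A.", "B.", "C."]
scores = {"A.": 13, "B.": 3, "C.": 9}
summary = select_sentences(sentences, scores)
print(summary)
```

Raising `factor` makes the summary shorter, and lowering it keeps more sentences.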


In [75]:
print(text)


You see the problem with these young children is they do not want to listen now I told Jane
to take the trash out but she refused. Now, Ethan and Abby are fighting over a cup of tea. I
don't know what to do for them, but probably we shall take them to the daycare, then they
can be okay. Also, they will be having exams starting next week. So, they will need to be
sleeping early enough so that they can get enough rest for for so that they can perform well
in the exam. Other than that, everything at home is okay. How is your trip and the children
are expecting some chocolate when you return. So, whatever you do, stock up on chocolate
biscuits, sweets, and all these pleasantries that children like they will be waiting by



# METHOD 2

This variant repeats the pipeline with a few differences: every token is lowercased before counting, the frequencies are left unnormalized, and punctuation is not filtered out, so tokens such as `.` and `,` enter the frequency table and add to the score of each sentence that contains them.

### Importing libraries

In [None]:


from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize


### Input text to summarize

In [54]:

text  = """
You see the problem with these young children is they do not want to listen now I told Jane
to take the trash out but she refused. Now, Ethan and Abby are fighting over a cup of tea. I
don't know what to do for them, but probably we shall take them to the daycare, then they
can be okay. Also, they will be having exams starting next week. So, they will need to be
sleeping early enough so that they can get enough rest for for so that they can perform well
in the exam. Other than that, everything at home is okay. How is your trip and the children
are expecting some chocolate when you return. So, whatever you do, stock up on chocolate
biscuits, sweets, and all these pleasantries that children like they will be waiting by
"""


### Tokenizing the text

In [55]:
stopWords = set(stopwords.words("english"))
words = word_tokenize(text)

# Frequency table: the count of each lowercased token that is not a stop word

freqTable = dict()
for word in words:
	word = word.lower()
	if word in stopWords:
		continue
	freqTable[word] = freqTable.get(word, 0) + 1


### Creating a dictionary to keep the score of each sentence

In [56]:


sentences = sent_tokenize(text)
sentenceValue = dict()

for sentence in sentences:
	for word, freq in freqTable.items():
		if word in sentence.lower():
			if sentence in sentenceValue:
				sentenceValue[sentence] += freq
			else:
				sentenceValue[sentence] = freq



sumValues = 0
for sentence in sentenceValue:
	sumValues += sentenceValue[sentence]

# Average value of a sentence from the original text

average = int(sumValues / len(sentenceValue))


### Storing sentences into our summary.

In [57]:


summary = ''
for sentence in sentences:
	if (sentence in sentenceValue) and (sentenceValue[sentence] > (1.2 * average)):
		summary += " " + sentence
print(summary)


 So, they will need to be
sleeping early enough so that they can get enough rest for for so that they can perform well
in the exam.
