## Steps Involved in Text Summarization:

1) Text cleaning
2) Sentence tokenization
3) Word tokenization
4) Word-frequency table
5) Summarization

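Text summarization condenses a full text into a shorter version that keeps only its most important sentences. The approach used here is extractive: each sentence is scored by the frequency of the words it contains, and the highest-scoring sentences form the summary.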
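As a rough orientation, the whole pipeline can be sketched with the standard library alone. This is a simplified illustration, not the notebook's method: `summarize`, `TOY_STOPWORDS`, and the regex-based sentence split are assumptions made for the example, whereas the cells below use spaCy for tokenization and its `STOP_WORDS` list.

```python
import re
from collections import Counter
from heapq import nlargest

# Tiny illustrative stop-word set (an assumption; the notebook uses spaCy's STOP_WORDS).
TOY_STOPWORDS = {"the", "a", "an", "is", "are", "on", "in", "of", "and", "to"}

def summarize(text, n_sentences=1):
    # Naive sentence split after '.', '!' or '?' followed by whitespace (a simplification).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word-frequency table over lowercased tokens that are not stop words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in TOY_STOPWORDS]
    freq = Counter(words)
    top = max(freq.values())
    weights = {w: c / top for w, c in freq.items()}
    # Score each sentence by summing the weights of its words.
    scores = {
        s: sum(weights.get(w, 0.0) for w in re.findall(r"[a-z']+", s.lower()))
        for s in sentences
    }
    # Keep the highest-scoring sentences.
    return nlargest(n_sentences, sentences, key=scores.get)

demo = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(demo, n_sentences=1))
```

The sentence mentioning the most frequent word most often wins, which is the same idea the spaCy cells below implement step by step.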

In [25]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [26]:
stopwords = list(STOP_WORDS)
stopwords

['whatever',
 'whoever',
 'off',
 'your',
 'too',
 'fifteen',
 'give',
 'least',
 'along',
 'full',
 'there',
 'anyhow',
 'if',
 'although',
 'top',
 'not',
 'did',
 'wherever',
 '‘re',
 'sometime',
 'namely',
 'have',
 'rather',
 'they',
 'again',
 'back',
 'an',
 'towards',
 'against',
 'might',
 'six',
 'empty',
 'just',
 're',
 'am',
 '‘ll',
 'perhaps',
 'amongst',
 'became',
 'less',
 'we',
 'whereafter',
 'get',
 'whence',
 'then',
 'always',
 'often',
 '’s',
 'so',
 'thence',
 'may',
 'however',
 'ours',
 'under',
 'well',
 'call',
 'into',
 'yet',
 'anyway',
 'whereupon',
 'both',
 'enough',
 'seem',
 'except',
 'whereas',
 'yourselves',
 'the',
 'twelve',
 'latterly',
 'himself',
 'a',
 'about',
 'it',
 'yours',
 'ten',
 'nine',
 'by',
 'must',
 'keep',
 'ca',
 'when',
 'whether',
 'is',
 'move',
 'already',
 'be',
 'bottom',
 'made',
 'who',
 'whom',
 'moreover',
 'down',
 'meanwhile',
 'else',
 'somewhere',
 'their',
 'them',
 'she',
 'go',
 'nothing',
 'hereby',
 '’ve',
 's

In [27]:
nlp = spacy.load('en_core_web_sm')

In [28]:
text = """The self-study lessons in this section are written and organised according to the levels of the Common European Framework of Reference for languages (CEFR).
There are different types of texts and interactive exercises that practise the reading skills you need to do well in your studies, to get ahead at work and to communicate in English in your free time.
This story was really good and exciting with an sudden end. I thought the text message was a joke, thats exactly the reason why I think this kind of text is really interesting. The question is really awesome and difficult.
The input device used for games, the game controller, varies across platforms. Common controllers include gamepads, joysticks, mouse devices, keyboards, the touchscreens of mobile devices, or even a person's body, using a Kinect sensor.
Players view the game on a display device such as a television or computer monitor or sometimes on virtual reality head-mounted display goggles. There are often game sound effects, music and voice actor lines which come from loudspeakers or headphones.
Some games in the 2000s include haptic, vibration-creating effects, force feedback peripherals and virtual reality headsets.
The electronic systems used to play video games are called platforms. Video games are developed and released for one or several platforms and may not be available on others.
Specialized platforms such as arcade games, which present the game in a large, typically coin-operated chassis, were common in the 1980s in video arcades, but declined in popularity as other, more affordable platforms became available. 
These include dedicated devices such as video game consoles, as well as general-purpose computers like a laptop, desktop or handheld computing devices."""

In [29]:
doc = nlp(text)

In [30]:
tokens = [token.text for token in doc]
tokens

['The',
 'self',
 '-',
 'study',
 'lessons',
 'in',
 'this',
 'section',
 'are',
 'written',
 'and',
 'organised',
 'according',
 'to',
 'the',
 'levels',
 'of',
 'the',
 'Common',
 'European',
 'Framework',
 'of',
 'Reference',
 'for',
 'languages',
 '(',
 'CEFR',
 ')',
 '.',
 '\n',
 'There',
 'are',
 'different',
 'types',
 'of',
 'texts',
 'and',
 'interactive',
 'exercises',
 'that',
 'practise',
 'the',
 'reading',
 'skills',
 'you',
 'need',
 'to',
 'do',
 'well',
 'in',
 'your',
 'studies',
 ',',
 'to',
 'get',
 'ahead',
 'at',
 'work',
 'and',
 'to',
 'communicate',
 'in',
 'English',
 'in',
 'your',
 'free',
 'time',
 '.',
 '\n',
 'This',
 'story',
 'was',
 'really',
 'good',
 'and',
 'exciting',
 'with',
 'an',
 'sudden',
 'end',
 '.',
 'I',
 'thought',
 'the',
 'text',
 'message',
 'was',
 'a',
 'joke',
 ',',
 'that',
 's',
 'exactly',
 'the',
 'reason',
 'why',
 'I',
 'think',
 'this',
 'kind',
 'of',
 'text',
 'is',
 'really',
 'interesting',
 '.',
 'The',
 'question',
 'i

In [31]:
# Treat the newline character as punctuation so that '\n' tokens are skipped later.
punctuation += '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'
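One detail worth keeping in mind: `punctuation` is a plain string, so `x in punctuation` is a substring test rather than a membership test. A possible alternative, sketched below under the assumption that only single-character punctuation tokens should be filtered, is to keep the characters in a set (`punct_chars` is a name introduced here for illustration).

```python
from string import punctuation

# Same characters as in the cell above, but stored as a set of single characters.
punct_chars = set(punctuation) | {"\n"}

# With a string, `in` checks for a substring, so multi-character tokens can match by accident.
print("()" in punctuation + "\n")   # "()" occurs inside string.punctuation
print("()" in punct_chars)          # the set only contains single characters
```

For the single-character tokens spaCy produces for punctuation, both forms behave the same; the set simply avoids accidental matches.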

In [32]:
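# Count how often each token occurs, skipping stop words and punctuation.
# Note: keys keep the token's original casing (e.g. 'Common'), while the
# sentence-scoring cell further down looks words up in lowercase, so
# capitalised tokens such as 'European' are not matched there.
word_frequencies = {}
for word in doc:
    if word.text.lower() not in stopwords:
        if word.text.lower() not in punctuation:
            word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1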
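A variant of this counting step that stores lowercased keys could look like the sketch below. It is an assumption-laden alternative rather than the notebook's code: `build_word_frequencies` and the toy inputs are hypothetical, and using it on the real text would change the counts shown in the outputs that follow (for example, `Common` and `common` would merge).

```python
from collections import Counter

def build_word_frequencies(tokens, stopwords, punct_chars):
    """Count tokens in lowercase, skipping stop words and punctuation."""
    counts = Counter()
    for tok in tokens:
        key = tok.lower()
        if key in stopwords or key in punct_chars:
            continue
        counts[key] += 1
    return counts

toy_tokens = ["Common", "controllers", "were", "common", ",", "the", "Game", "game"]
toy_freq = build_word_frequencies(toy_tokens, {"the", "were"}, {",", "\n"})
print(toy_freq)
```

Lowercasing at counting time keeps the table consistent with any later lookup that also lowercases.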

In [33]:
word_frequencies

{'self': 1,
 'study': 1,
 'lessons': 1,
 'section': 1,
 'written': 1,
 'organised': 1,
 'according': 1,
 'levels': 1,
 'Common': 2,
 'European': 1,
 'Framework': 1,
 'Reference': 1,
 'languages': 1,
 'CEFR': 1,
 'different': 1,
 'types': 1,
 'texts': 1,
 'interactive': 1,
 'exercises': 1,
 'practise': 1,
 'reading': 1,
 'skills': 1,
 'need': 1,
 'studies': 1,
 'ahead': 1,
 'work': 1,
 'communicate': 1,
 'English': 1,
 'free': 1,
 'time': 1,
 'story': 1,
 'good': 1,
 'exciting': 1,
 'sudden': 1,
 'end': 1,
 'thought': 1,
 'text': 2,
 'message': 1,
 'joke': 1,
 's': 1,
 'exactly': 1,
 'reason': 1,
 'think': 1,
 'kind': 1,
 'interesting': 1,
 'question': 1,
 'awesome': 1,
 'difficult': 1,
 'input': 1,
 'device': 2,
 'games': 5,
 'game': 5,
 'controller': 1,
 'varies': 1,
 'platforms': 5,
 'controllers': 1,
 'include': 3,
 'gamepads': 1,
 'joysticks': 1,
 'mouse': 1,
 'devices': 4,
 'keyboards': 1,
 'touchscreens': 1,
 'mobile': 1,
 'person': 1,
 'body': 1,
 'Kinect': 1,
 'sensor': 1,
 'Pl

In [34]:
max_frequency = max(word_frequencies.values())
max_frequency

5

In [35]:
for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word]/max_frequency

word_frequencies

{'self': 0.2,
 'study': 0.2,
 'lessons': 0.2,
 'section': 0.2,
 'written': 0.2,
 'organised': 0.2,
 'according': 0.2,
 'levels': 0.2,
 'Common': 0.4,
 'European': 0.2,
 'Framework': 0.2,
 'Reference': 0.2,
 'languages': 0.2,
 'CEFR': 0.2,
 'different': 0.2,
 'types': 0.2,
 'texts': 0.2,
 'interactive': 0.2,
 'exercises': 0.2,
 'practise': 0.2,
 'reading': 0.2,
 'skills': 0.2,
 'need': 0.2,
 'studies': 0.2,
 'ahead': 0.2,
 'work': 0.2,
 'communicate': 0.2,
 'English': 0.2,
 'free': 0.2,
 'time': 0.2,
 'story': 0.2,
 'good': 0.2,
 'exciting': 0.2,
 'sudden': 0.2,
 'end': 0.2,
 'thought': 0.2,
 'text': 0.4,
 'message': 0.2,
 'joke': 0.2,
 's': 0.2,
 'exactly': 0.2,
 'reason': 0.2,
 'think': 0.2,
 'kind': 0.2,
 'interesting': 0.2,
 'question': 0.2,
 'awesome': 0.2,
 'difficult': 0.2,
 'input': 0.2,
 'device': 0.4,
 'games': 1.0,
 'game': 1.0,
 'controller': 0.2,
 'varies': 0.2,
 'platforms': 1.0,
 'controllers': 0.2,
 'include': 0.6,
 'gamepads': 0.2,
 'joysticks': 0.2,
 'mouse': 0.2,
 'de
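The normalisation above divides every count by the largest count, so the most frequent words get a weight of 1.0. A small non-mutating version, with `normalize_by_max` as a hypothetical helper name and a few counts taken from the table above, might look like this:

```python
def normalize_by_max(freq):
    """Return a new dict with every count divided by the largest count."""
    if not freq:
        return {}
    top = max(freq.values())
    return {word: count / top for word, count in freq.items()}

toy_weights = normalize_by_max({"games": 5, "include": 3, "device": 2, "self": 1})
print(toy_weights)
```

Returning a new dict leaves the raw counts available, which can be handy when re-running cells out of order.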

In [36]:
sentence_tokens = [sent for sent in doc.sents]
sentence_tokens

[The self-study lessons in this section are written and organised according to the levels of the Common European Framework of Reference for languages (CEFR).,
 There are different types of texts and interactive exercises that practise the reading skills you need to do well in your studies, to get ahead at work and to communicate in English in your free time.,
 This story was really good and exciting with an sudden end.,
 I thought the text message was a joke, thats exactly the reason why I think this kind of text is really interesting.,
 The question is really awesome and difficult.,
 The input device used for games, the game controller, varies across platforms.,
 Common controllers include gamepads, joysticks, mouse devices, keyboards, the touchscreens of mobile devices, or even a person's body, using a Kinect sensor.,
 Players view the game on a display device such as a television or computer monitor or sometimes on virtual reality head-mounted display goggles.,
 There are often game

In [37]:
len(sentence_tokens)

14

In [38]:
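# Score each sentence as the sum of the normalised frequencies of its words.
# The lookup uses the lowercased token, so only words stored in lowercase
# in word_frequencies contribute to the score.
sentence_scores = {}
for sent in sentence_tokens:
    for word in sent:
        if word.text.lower() in word_frequencies.keys():
            if sent not in sentence_scores.keys():
                sentence_scores[sent] = word_frequencies[word.text.lower()]
            else:
                sentence_scores[sent] += word_frequencies[word.text.lower()]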

In [39]:
sentence_scores

{The self-study lessons in this section are written and organised according to the levels of the Common European Framework of Reference for languages (CEFR).: 1.9999999999999998,
 There are different types of texts and interactive exercises that practise the reading skills you need to do well in your studies, to get ahead at work and to communicate in English in your free time.: 3.0000000000000004,
 This story was really good and exciting with an sudden end.: 1.0,
 I thought the text message was a joke, thats exactly the reason why I think this kind of text is really interesting.: 2.6,
 The question is really awesome and difficult.: 0.6000000000000001,
 The input device used for games, the game controller, varies across platforms.: 4.0,
 Common controllers include gamepads, joysticks, mouse devices, keyboards, the touchscreens of mobile devices, or even a person's body, using a Kinect sensor.: 4.400000000000001,
 Players view the game on a display device such as a television or compute
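The scoring rule can be illustrated on plain token lists. In this sketch, `score_sentences` is a hypothetical helper, sentences are identified by their index, and the rounding is an optional choice added here to hide floating-point residue such as `1.9999999999999998`.

```python
def score_sentences(sentences, weights):
    """Map each sentence index to the sum of the weights of its tokens."""
    scores = {}
    for idx, tokens in enumerate(sentences):
        total = sum(weights.get(tok.lower(), 0.0) for tok in tokens)
        if total > 0:
            # Rounding is optional; it only tidies floating-point noise.
            scores[idx] = round(total, 6)
    return scores

toy_weights_by_word = {"game": 1.0, "device": 0.4, "story": 0.2}
toy_sentences = [["The", "Game", "device"], ["A", "story"], ["Hello"]]
toy_sentence_scores = score_sentences(toy_sentences, toy_weights_by_word)
print(toy_sentence_scores)
```

Sentences without any weighted word are left out, which mirrors how the loop above only creates an entry once a matching word is found.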

In [41]:
from heapq import nlargest

In [40]:
# Keep roughly 30% of the sentences: int(14 * 0.3) = 4.
select_length = int(len(sentence_tokens) * 0.3)
select_length

4
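Because `int()` truncates, a very short text would yield a length of 0 and therefore an empty summary. One way to guard against that, assuming at least one sentence should always be kept, is sketched below (`summary_length` is a hypothetical helper, not part of the notebook).

```python
def summary_length(n_sentences, ratio=0.3):
    """Number of sentences to keep: truncated n * ratio, but at least 1 if any exist."""
    if n_sentences <= 0:
        return 0
    return max(1, int(n_sentences * ratio))

print(summary_length(14))  # the 14-sentence text used in this notebook
print(summary_length(2))   # a two-sentence text still keeps one sentence
```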

In [49]:
summary = nlargest(select_length, sentence_scores, key = sentence_scores.get)

In [50]:
summary

[Specialized platforms such as arcade games, which present the game in a large, typically coin-operated chassis, were common in the 1980s in video arcades, but declined in popularity as other, more affordable platforms became available. ,
 These include dedicated devices such as video game consoles, as well as general-purpose computers like a laptop, desktop or handheld computing devices.,
 Common controllers include gamepads, joysticks, mouse devices, keyboards, the touchscreens of mobile devices, or even a person's body, using a Kinect sensor.,
 Players view the game on a display device such as a television or computer monitor or sometimes on virtual reality head-mounted display goggles.]
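Note that `nlargest` returns the sentences ordered by descending score, not in the order they appear in the text, which is why the sentence about specialized platforms comes first above. If the original reading order is preferred, the selected items can be re-sorted by position. The sketch below uses made-up scores keyed by sentence index.

```python
from heapq import nlargest

# Hypothetical scores: sentence index -> score.
toy_scores = {0: 2.0, 1: 3.0, 2: 1.0, 3: 4.4, 4: 4.0}

# nlargest iterates over the dict keys and ranks them by their score.
top_by_score = nlargest(3, toy_scores, key=toy_scores.get)

# Sorting the chosen indices restores the original document order.
top_in_order = sorted(top_by_score)

print(top_by_score)
print(top_in_order)
```

With spaCy spans, the same effect could be obtained by sorting on each span's `start` attribute.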

In [51]:
# Convert each selected sentence (a spaCy Span) to its plain text.
final_summary = [sent.text for sent in summary]
final_summary

['Specialized platforms such as arcade games, which present the game in a large, typically coin-operated chassis, were common in the 1980s in video arcades, but declined in popularity as other, more affordable platforms became available. \n',
 'These include dedicated devices such as video game consoles, as well as general-purpose computers like a laptop, desktop or handheld computing devices.',
 "Common controllers include gamepads, joysticks, mouse devices, keyboards, the touchscreens of mobile devices, or even a person's body, using a Kinect sensor.\n",
 'Players view the game on a display device such as a television or computer monitor or sometimes on virtual reality head-mounted display goggles.']
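Some of these strings still carry trailing spaces and newlines from the source text, and those end up inside the joined summary. A small clean-up before joining, sketched here with a hypothetical `join_sentences` helper and toy strings, collapses all whitespace runs to single spaces.

```python
def join_sentences(sentences):
    """Join sentence strings with single spaces, dropping stray newlines and padding."""
    return " ".join(" ".join(s.split()) for s in sentences if s.strip())

toy = ["Platforms became available. \n", "These include consoles.", "A sensor.\n"]
joined = join_sentences(toy)
print(joined)
```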

In [52]:
summary = ' '.join(final_summary)
summary

"Specialized platforms such as arcade games, which present the game in a large, typically coin-operated chassis, were common in the 1980s in video arcades, but declined in popularity as other, more affordable platforms became available. \n These include dedicated devices such as video game consoles, as well as general-purpose computers like a laptop, desktop or handheld computing devices. Common controllers include gamepads, joysticks, mouse devices, keyboards, the touchscreens of mobile devices, or even a person's body, using a Kinect sensor.\n Players view the game on a display device such as a television or computer monitor or sometimes on virtual reality head-mounted display goggles."