In [1]:
text="""
There are broadly two types of extractive summarization tasks depending on what the summarization program focuses on. The first is generic summarization, which focuses on obtaining a generic summary or abstract of the collection (whether documents, or sets of images, or videos, news stories etc.). The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query. Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.
An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document. Sometimes one might be interested in generating a summary from a single source document, while others can use multiple source documents (for example, a cluster of articles on the same topic). This problem is called multi-document summarization. A related application is summarizing news articles. Imagine a system, which automatically pulls together news articles on a given topic (from the web), and concisely represents the latest news as a summary.
Image collection summarization is another application example of automatic summarization. It consists in selecting a representative set of images from a larger set of images.[3] A summary in this context is useful to show the most representative images of results in an image collection exploration system. Video summarization is a related domain, where the system automatically creates a trailer of a long video. This also has applications in consumer or personal videos, where one might want to skip the boring or repetitive actions. Similarly, in surveillance videos, one would want to extract important and suspicious activity, while ignoring all the boring and redundant frames captured.
"""

In [2]:
!pip3 install -U spacy

!python3 -m spacy download en_core_web_sm

Collecting spacy
  Downloading https://files.pythonhosted.org/packages/e5/bf/ca7bb25edd21f1cf9d498d0023808279672a664a70585e1962617ca2740c/spacy-2.3.5-cp36-cp36m-manylinux2014_x86_64.whl (10.4MB)
Collecting thinc<7.5.0,>=7.4.1
  Downloading https://files.pythonhosted.org/packages/c0/1a/c3e4ab982214c63d743fad57c45c5e68ee49e4ea4384d27b28595a26ad26/thinc-7.4.5-cp36-cp36m-manylinux2014_x86_64.whl (1.1MB)
Installing collected packages: thinc, spacy
  Found existing installation: thinc 7.4.0
    Uninstalling thinc-7.4.0:
      Successfully uninstalled thinc-7.4.0
  Found existing installation: spacy 2.2.4
    Uninstalling spacy-2.2.4:
      Successfully uninstalled spacy-2.2.4
Successfully installed spacy-2.3.5 thinc-7.4.5
Collecting en_core_web_sm==2.3.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_we

In [5]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [6]:
stopwords=list(STOP_WORDS)

In [7]:
nlp=spacy.load('en_core_web_sm')

In [8]:
doc=nlp(text)

In [9]:
tokens=[token.text for token in doc]
print(tokens)

['\n', 'There', 'are', 'broadly', 'two', 'types', 'of', 'extractive', 'summarization', 'tasks', 'depending', 'on', 'what', 'the', 'summarization', 'program', 'focuses', 'on', '.', 'The', 'first', 'is', 'generic', 'summarization', ',', 'which', 'focuses', 'on', 'obtaining', 'a', 'generic', 'summary', 'or', 'abstract', 'of', 'the', 'collection', '(', 'whether', 'documents', ',', 'or', 'sets', 'of', 'images', ',', 'or', 'videos', ',', 'news', 'stories', 'etc', '.', ')', '.', 'The', 'second', 'is', 'query', 'relevant', 'summarization', ',', 'sometimes', 'called', 'query', '-', 'based', 'summarization', ',', 'which', 'summarizes', 'objects', 'specific', 'to', 'a', 'query', '.', 'Summarization', 'systems', 'are', 'able', 'to', 'create', 'both', 'query', 'relevant', 'text', 'summaries', 'and', 'generic', 'machine', '-', 'generated', 'summaries', 'depending', 'on', 'what', 'the', 'user', 'needs', '.', '\n', 'An', 'example', 'of', 'a', 'summarization', 'problem', 'is', 'document', 'summarizatio

In [10]:
punctuation=punctuation+"\n"
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

In [18]:
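# Count every token that is neither a stop word nor punctuation.
# Note: keys keep the token's original casing, while the scoring step
# further down looks words up in lower case, so 'Summarization' and
# 'summarization' are counted as two separate entries.
word_freq={}
for word in doc:
    lowered=word.text.lower()
    # `punctuation` is a string, so `in` is a substring test here
    if lowered not in stopwords and lowered not in punctuation:
        word_freq[word.text]=word_freq.get(word.text,0)+1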
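
One way to keep the counting and the later lower-case lookup consistent is to lowercase the key itself. The following is a minimal sketch using `collections.Counter` on a plain list of strings; the `tokens` list and the one-word `stop` set are illustrative assumptions, not the notebook's data, and no spaCy objects are involved.

```python
from collections import Counter
from string import punctuation

# Illustrative inputs (assumptions, not taken from the notebook's text).
tokens = ["Summarization", "systems", "create", "summaries", ".",
          "summarization", "is", "useful"]
stop = {"is"}
punct = set(punctuation)  # set membership instead of a substring test

# Lowercase once and use the lowercased form as the dictionary key.
word_freq = Counter(
    t.lower() for t in tokens
    if t.lower() not in stop and t not in punct
)
print(word_freq.most_common(3))
```

With lowercased keys, "Summarization" and "summarization" share a single count, so the maximum frequency and every normalized score would differ from the outputs shown in this notebook.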

In [19]:
word_freq

{'Image': 1,
 'Imagine': 1,
 'Similarly': 1,
 'Summarization': 1,
 'Video': 1,
 'able': 1,
 'abstract': 2,
 'actions': 1,
 'activity': 1,
 'application': 2,
 'applications': 1,
 'articles': 3,
 'attempts': 1,
 'automatic': 1,
 'automatically': 3,
 'based': 1,
 'boring': 2,
 'broadly': 1,
 'called': 2,
 'captured': 1,
 'cluster': 1,
 'collection': 3,
 'concisely': 1,
 'consists': 1,
 'consumer': 1,
 'context': 1,
 'create': 1,
 'creates': 1,
 'depending': 2,
 'document': 4,
 'documents': 2,
 'domain': 1,
 'etc': 1,
 'example': 3,
 'exploration': 1,
 'extract': 1,
 'extractive': 1,
 'focuses': 2,
 'frames': 1,
 'generated': 1,
 'generating': 1,
 'generic': 3,
 'given': 2,
 'ignoring': 1,
 'image': 1,
 'images': 3,
 'images.[3': 1,
 'important': 1,
 'interested': 1,
 'larger': 1,
 'latest': 1,
 'long': 1,
 'machine': 1,
 'multi': 1,
 'multiple': 1,
 'needs': 1,
 'news': 4,
 'objects': 1,
 'obtaining': 1,
 'personal': 1,
 'problem': 2,
 'produce': 1,
 'program': 1,
 'pulls': 1,
 'query': 4

In [21]:
max_freq=max(word_freq.values())
max_freq

11

In [23]:
# normalize each count by the maximum frequency, so values lie in (0, 1]
for word in word_freq.keys():
    word_freq[word]=word_freq[word]/max_freq

word_freq

{'Image': 0.09090909090909091,
 'Imagine': 0.09090909090909091,
 'Similarly': 0.09090909090909091,
 'Summarization': 0.09090909090909091,
 'Video': 0.09090909090909091,
 'able': 0.09090909090909091,
 'abstract': 0.18181818181818182,
 'actions': 0.09090909090909091,
 'activity': 0.09090909090909091,
 'application': 0.18181818181818182,
 'applications': 0.09090909090909091,
 'articles': 0.2727272727272727,
 'attempts': 0.09090909090909091,
 'automatic': 0.09090909090909091,
 'automatically': 0.2727272727272727,
 'based': 0.09090909090909091,
 'boring': 0.18181818181818182,
 'broadly': 0.09090909090909091,
 'called': 0.18181818181818182,
 'captured': 0.09090909090909091,
 'cluster': 0.09090909090909091,
 'collection': 0.2727272727272727,
 'concisely': 0.09090909090909091,
 'consists': 0.09090909090909091,
 'consumer': 0.09090909090909091,
 'context': 0.09090909090909091,
 'create': 0.09090909090909091,
 'creates': 0.09090909090909091,
 'depending': 0.18181818181818182,
 'document': 0.3636
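
As a quick check of the normalization step, dividing a few of the raw counts shown earlier by the printed `max_freq` of 11 reproduces the values in this output. The counts below are copied from the notebook output; the name `sample_counts` is only illustrative.

```python
# Counts copied from the word_freq output; max_freq is the printed value.
sample_counts = {"document": 4, "articles": 3, "abstract": 2, "able": 1}
max_freq = 11

# Same operation as the loop in the cell, written as a dict comprehension.
normalized = {word: count / max_freq for word, count in sample_counts.items()}
for word, value in normalized.items():
    print(word, value)
```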

In [24]:
sentence_token=[sent for sent in doc.sents]
print(sentence_token)

[
There are broadly two types of extractive summarization tasks depending on what the summarization program focuses on., The first is generic summarization, which focuses on obtaining a generic summary or abstract of the collection (whether documents, or sets of images, or videos, news stories etc.)., The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query., Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.
, An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document., Sometimes one might be interested in generating a summary from a single source document, while others can use multiple source documents (for example, a cluster of articles on the same topic)., This problem is called multi-document summarization., A related application 

In [25]:
# Score each sentence as the sum of the normalized frequencies of its words.
sentence_score={}
for sent in sentence_token:
    for word in sent:
        key=word.text.lower()
        if key in word_freq:
            sentence_score[sent]=sentence_score.get(sent,0)+word_freq[key]

In [26]:
sentence_score

{
 There are broadly two types of extractive summarization tasks depending on what the summarization program focuses on.: 2.818181818181818,
 The first is generic summarization, which focuses on obtaining a generic summary or abstract of the collection (whether documents, or sets of images, or videos, news stories etc.).: 3.9999999999999987,
 The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query.: 3.909090909090909,
 Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.: 3.09090909090909,
 An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document.: 3.9999999999999996,
 Sometimes one might be interested in generating a summary from a single source document, while others can use multiple source documents (for example, a cluster of artic

In [28]:
# select the top 30 percent of sentences by score
from heapq import nlargest

In [30]:
select_length=int(len(sentence_token)*0.3)
select_length

4

In [32]:
summary=nlargest(select_length,sentence_score,key=sentence_score.get)
summary

[An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document.,
 The first is generic summarization, which focuses on obtaining a generic summary or abstract of the collection (whether documents, or sets of images, or videos, news stories etc.).,
 The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query.,
 Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.]
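
Note that `nlargest` returns items ordered by score, not by position in the text, which is why the fifth sentence appears first here. A small sketch of how the call behaves on a dictionary, and one possible way to restore the original order afterwards; the labels and scores are hypothetical stand-ins for the sentence spans.

```python
from heapq import nlargest

# Hypothetical sentence labels and scores, for illustration only.
scores = {"s1": 1.5, "s2": 1.75, "s3": 0.75, "s4": 4.0}

# Iterating a dict yields its keys; key=scores.get ranks them by value.
top = nlargest(2, scores, key=scores.get)
print(top)  # ['s4', 's2']

# Optional: put the selected keys back into insertion (document) order.
order = {k: i for i, k in enumerate(scores)}
print(sorted(top, key=order.get))  # ['s2', 's4']
```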

In [33]:
# collect the text of each selected sentence
final_summary=[sent.text for sent in summary]

In [35]:
final_summary=' '.join(final_summary)

In [36]:
final_summary

'An example of a summarization problem is document summarization, which attempts to automatically produce an abstract from a given document. The first is generic summarization, which focuses on obtaining a generic summary or abstract of the collection (whether documents, or sets of images, or videos, news stories etc.). The second is query relevant summarization, sometimes called query-based summarization, which summarizes objects specific to a query. Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.\n'
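
The steps in this notebook can be gathered into a single function. The sketch below is a rough, spaCy-free approximation that uses only the standard library: the regex sentence split, the regex tokenizer, the tiny `STOP` set, and the function name `summarize` are all assumptions made for illustration and are much cruder than `doc.sents` and `STOP_WORDS`. Unlike the cells above, it lowercases the frequency keys and returns the chosen sentences in their original order.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word set (the notebook uses STOP_WORDS).
STOP = {"a", "an", "and", "are", "from", "is", "of", "on", "the", "to", "which"}


def summarize(text, ratio=0.3):
    """Frequency-based extractive summary, sentences kept in original order."""
    # Naive split after ., ! or ? followed by whitespace (an assumption).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [s.strip() for s in parts if s.strip()]

    def words(sentence):
        # Lowercased alphanumeric tokens with stop words removed.
        found = re.findall(r"[a-z0-9]+", sentence.lower())
        return [w for w in found if w not in STOP]

    freq = Counter(w for s in sentences for w in words(s))
    if not freq:
        return ""
    max_freq = max(freq.values())

    # Sentence score = sum of normalized word frequencies.
    scores = [sum(freq[w] / max_freq for w in words(s)) for s in sentences]

    # Keep at least one sentence even for very short inputs.
    k = max(1, int(len(sentences) * ratio))
    top = nlargest(k, range(len(sentences)), key=scores.__getitem__)
    return " ".join(sentences[i] for i in sorted(top))


demo = ("Cats sleep a lot. Cats and dogs are pets. Birds sing. "
        "Cats chase birds and dogs chase cats.")
print(summarize(demo, ratio=0.5))
# prints: Cats and dogs are pets. Cats chase birds and dogs chase cats.
```

Because the score is a plain sum, longer sentences tend to rank higher; dividing by sentence length is a common adjustment, though it is not something this notebook does.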