### Automatic Summarization With Python
+ Extractive Summarization
        - (selects a subset of existing sentences or objects from the entire collection)
+ Abstractive Summarization
        - (paraphrases)
+ Aided Summarization
        - (highlights candidate passages to be included in the summary)

+ pip install "gensim<4.0" (the `gensim.summarization` module was removed in Gensim 4.0)
+ pip install gensim_sum_ext

+ Gensim's `summarize` uses a variation of the TextRank algorithm
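To make the idea behind TextRank concrete, here is a minimal pure-Python sketch. It is not Gensim's implementation: the sentence splitter is naive, the similarity is the simple word-overlap measure from the original TextRank paper, and the names `split_into_sentences`, `similarity`, `textrank_scores` and `toy_summarize` are made up for this illustration.

```python
import math
import re


def split_into_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalised by the log of each sentence's vocabulary size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoid log(1) = 0 in the denominator
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def toy_summarize(text, top_n=2):
    """Keep the top_n highest-scoring sentences, in their original order."""
    sentences = split_into_sentences(text)
    scores = textrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


sample = (
    "Cats chase mice in the barn. Dogs chase cats in the yard. "
    "The weather is nice today. Mice hide from cats in the barn."
)
print(toy_summarize(sample, top_n=2))
```

Sentences that share many words with the rest of the text collect the highest scores, so the off-topic weather sentence is ranked last.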

In [1]:
# Load the packages
from gensim.summarization import summarize



In [2]:
# Our Text
mytext = """
Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax.

Automatic data summarization is part of machine learning and data mining. The main idea of summarization is to find a subset of data which contains the "information" of the entire set. Such techniques are widely used in industry today. Search engines are an example; others include summarization of documents, image collections and videos. Document summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images.
For surveillance videos, one might want to extract the important events from the uneventful context.

There are two general approaches to automatic summarization: extraction and abstraction. Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary. In contrast, abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might express. Such a summary might include verbal innovations. Research to date has focused primarily on extractive methods, which are appropriate for image collection summarization and video summarization."""

In [4]:
# Length of text
len(mytext)

1574

In [3]:
# Summarize the Text
summarize(mytext)

'Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.\nDocument summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images.'

In [5]:
summary_txt = summarize(mytext)

In [6]:
# length of summarized text
len(summary_txt)

410
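As a quick check on how much shorter the summary is, the two character counts above can be compared directly. The variable names below are only for illustration.

```python
# Character counts reported by the cells above
original_chars = 1574
summary_chars = 410

# Share of the original characters that survive in the summary
char_share = summary_chars / original_chars
print(f"{char_share:.1%}")  # 26.0%
```

The summary keeps about a quarter of the characters. The share depends on how long the selected sentences happen to be, because the summarizer picks whole sentences.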

In [7]:
summary_txt

'Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.\nDocument summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images.'

#### How to Get the Result as a List of Strings
+ split=True

In [8]:
summarize(mytext,split=True)

['Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.',
 'Document summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images.']
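Judging from the outputs above, the default string result is the selected sentences joined with `\n`, so the two forms can be converted into each other with plain string methods. The sketch below uses two short stand-in sentences rather than real `summarize` output.

```python
# Stand-in for a summary returned as one string (sentences separated by "\n")
summary_text = "First selected sentence.\nSecond selected sentence."

# String form -> list form
summary_list = summary_text.split("\n")
print(summary_list)

# List form -> string form
print("\n".join(summary_list) == summary_text)
```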

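#### How to Set the Amount of Text You Want as a Summary
+ ratio
      - fraction of the original sentences to keep; default is 0.2 (20%)
+ word_count
      - approximate number of words in the summary; when given, ratio is ignored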

In [9]:
# Keep about 50% of the original sentences
summarize(mytext,ratio=0.5)

'Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.\nSearch engines are an example; others include summarization of documents, image collections and videos.\nDocument summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images.\nThere are two general approaches to automatic summarization: extraction and abstraction.\nExtractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary.\nResearch to date has focused primarily on extractive methods, which are appropriate for image collection summarization and video summarization.'

In [10]:
# Keep about 20% of the original sentences (the default)
summarize(mytext,ratio=0.2)

'Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.\nDocument summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images.'

#### Narrative
+ Notice that ratio=0.2 (20%) gives the same result as the first summary, because 0.2 is the default
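The outputs above are consistent with the summarizer keeping the integer part of (number of sentences × ratio). The sketch below works through that arithmetic. The rule itself and the hand count of 13 sentences in `mytext` are assumptions, and `sentences_kept` is a hypothetical helper, not a Gensim function.

```python
def sentences_kept(total_sentences, ratio=0.2):
    """Assumed rule: keep the integer part of total_sentences * ratio."""
    return int(total_sentences * ratio)


total = 13  # sentences in mytext, counted by hand (assumption)
for ratio in (0.2, 0.5):
    print(ratio, sentences_kept(total, ratio))
```

Under this rule, ratio=0.2 keeps two sentences and ratio=0.5 keeps six, which matches the two summaries shown above.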

#### How to Set the Approximate Number of Words in the Summary
+ word_count

In [11]:
# Ask for about 50 words
summarize(mytext,word_count=50)

'Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.\nDocument summarization tries to create a representative summary or abstract of the entire document, by finding the most informative sentences, while in image summarization the system finds the most representative and important (i.e. salient) images.'

In [12]:
summary_txt2 = summarize(mytext,word_count=50)

In [13]:
# Number of Words
len(summary_txt2.split())

61

### How to Find the Position of the Extracted Text

In [15]:
# Read the whole file; the with-block closes it afterwards
with open("example.txt", encoding="utf-8") as f:
    docx = f.read()

In [16]:
docx

'The nativity of Jesus or birth of Jesus is described in the gospels of Luke and Matthew. The two accounts agree that Jesus was born in Bethlehem in the time of Herod the Great, that his mother Mary was married to Joseph, who was of Davidic descent and was not his biological father, and that his birth was effected by divine intervention, but the two gospels agree on little else.[1] Matthew does not mention the census, annunciation to the shepherds or presentation in the Temple, and does not give the name of the angel that appeared to Joseph to foretell the birth. In Luke there is no mention of Magi, no flight into Egypt, or Massacre of the Innocents, and the angel who announces the coming birth to Mary is named (as Gabriel).[1]\n\nThe consensus of scholars is that both gospels were written about AD 75-85,[2] and while it is possible that one account might be based on the other, or that the two share common source material, the majority conclusion is that, in respect of the nativity sto

In [40]:
mysummary = summarize(docx,ratio=0.1)

In [41]:
mysummary

'Christian congregations of the Western tradition (including the Catholic Church, the Western Rite Orthodox, the Anglican Communion, and many Protestants) begin observing the season of Advent four Sundays before Christmas, the traditional feast-day of his birth, which falls on December 25.'

In [42]:
mysummary in docx

True

In [43]:
# Method 1 Using Find
docx.find(mysummary)

1631

In [44]:
# Method 2 Using Index
docx.index(mysummary)

1631
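Both methods return the same offset when the text is present. They differ only when it is missing: `str.find` returns -1, while `str.index` raises `ValueError`. A small sketch on a made-up string:

```python
text = "TextRank ranks sentences. The best ones form the summary."
present = "The best ones"
missing = "not in the text"

# Same offset from both methods when the substring exists
print(text.find(present) == text.index(present))

# Missing substring: find signals it with -1 ...
print(text.find(missing))

# ... while index raises an exception
try:
    text.index(missing)
except ValueError as err:
    print("index raised:", err)
```

`find` is the safer choice when the summary might not appear verbatim in the source. `index` suits cases where a missing match should stop the program.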

In [63]:
# Text from the start of the extracted sentence to the end of the document
docx[1631:]

'Christian congregations of the Western tradition (including the Catholic Church, the Western Rite Orthodox, the Anglican Communion, and many Protestants) begin observing the season of Advent four Sundays before Christmas, the traditional feast-day of his birth, which falls on December 25.\n\nChristians of the Eastern Orthodox Church and Oriental Orthodox Church observe a similar season, sometimes called Advent but also called the "Nativity Fast", which begins forty days before Christmas. Some Eastern Orthodox Christians (e.g. Greeks and Syrians) celebrate Christmas on December 25. Other Orthodox (e.g. Copts, Ethiopians, Georgians, and Russians) celebrate Christmas on (the Gregorian) January 7 (Koiak 29 on coptic calendar)[6] as a result of their churches continuing to follow the Julian calendar, rather than the modern day Gregorian calendar.\n\n'
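Slicing from the start offset runs to the end of the document. To get only the extracted sentence, add its length to the start offset. When a summary holds several sentences joined by `\n`, the summary as a whole is usually not a substring of the document, so each sentence has to be located separately. The sketch below uses a made-up document, and `locate` is a hypothetical helper.

```python
def locate(document, summary):
    """Return (start, end, sentence) for each newline-separated summary sentence."""
    spans = []
    for sentence in summary.split("\n"):
        start = document.find(sentence)
        if start == -1:
            continue  # sentence does not appear verbatim in the document
        spans.append((start, start + len(sentence), sentence))
    return spans


document = "First point. Some filler in between. Second point. The end."
summary = "First point.\nSecond point."

# The joined summary is not contiguous in the document
print(summary in document)

for start, end, sentence in locate(document, summary):
    # document[start:end] is exactly the extracted sentence
    print(start, end, document[start:end] == sentence)
```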

#### Method 3
+ Split the Document into Sentences
+ Find the Location in Our List of Sentences


In [64]:
from gensim.summarization.textcleaner import split_sentences

In [65]:
# Split Sentences
split_sentences(docx)

['The nativity of Jesus or birth of Jesus is described in the gospels of Luke and Matthew.',
 'The two accounts agree that Jesus was born in Bethlehem in the time of Herod the Great, that his mother Mary was married to Joseph, who was of Davidic descent and was not his biological father, and that his birth was effected by divine intervention, but the two gospels agree on little else.[1] Matthew does not mention the census, annunciation to the shepherds or presentation in the Temple, and does not give the name of the angel that appeared to Joseph to foretell the birth.',
 'In Luke there is no mention of Magi, no flight into Egypt, or Massacre of the Innocents, and the angel who announces the coming birth to Mary is named (as Gabriel).[1]',
 'The consensus of scholars is that both gospels were written about AD 75-85,[2] and while it is possible that one account might be based on the other, or that the two share common source material, the majority conclusion is that, in respect of the na

In [66]:
# List of all sentences
all_sentences = split_sentences(docx)

In [67]:
# Is Our Summary in Our List
mysummary in all_sentences

True

In [69]:
# Location of our Summary
all_sentences.index(mysummary)

8

In [70]:
# Sentence at That Location
all_sentences[8]

'Christian congregations of the Western tradition (including the Catholic Church, the Western Rite Orthodox, the Anglican Communion, and many Protestants) begin observing the season of Advent four Sundays before Christmas, the traditional feast-day of his birth, which falls on December 25.'
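The same lookup extends to summaries with more than one sentence, such as those returned with `split=True`. Below is a minimal sketch with made-up sentences. The dictionary avoids rescanning the list for every summary sentence.

```python
# Stand-ins for the output of split_sentences(docx) and summarize(docx, split=True)
all_sentences = ["Intro sentence.", "Key claim one.", "A detail.", "Key claim two."]
summary_sentences = ["Key claim one.", "Key claim two."]

# Map each sentence to the index of its first occurrence
position_of = {}
for i, sentence in enumerate(all_sentences):
    position_of.setdefault(sentence, i)

# Look up every summary sentence, skipping any that is not in the list
positions = [position_of[s] for s in summary_sentences if s in position_of]
print(positions)  # [1, 3]
```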

In [91]:
### Thanks For Watching
### Jesse E.Agbe (JCharis)
### Jesus Saves @JCharisTech
### J-Secur1ty