# Extractive summarization

The simplest way to summarize a block of text is to select its most important sentences. This is called *extractive summarization*.
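
As a minimal sketch of the idea (not one of the libraries used in this notebook), the following scores each sentence by the average frequency of its words and keeps the highest-scoring ones. The stop-word list, the regex-based sentence splitter, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
              "that", "was", "he", "she", "his", "her", "i"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def frequency_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in keep]
```

Because sentences are only selected and never rewritten, the output is always a subset of the input. That is what makes the approach extractive.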

Let's try some algorithms on the first chapter of "_A Scandal in Bohemia_".

In [1]:
%load_ext autoreload
%autoreload 2

In [46]:
from book_reading import Book

sherlock=Book("books/Sherlock.html")

first_chapter_p=[*sherlock.get_paragraphs(0,0)]

In [47]:
first_chapter_p[0]

'To Sherlock Holmes she is always the woman. I have seldom heard him mention her under any other name. In his eyes she eclipses and predominates the whole of her sex. It was not that he felt any emotion akin to love for Irene Adler. All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind. He was, I take it, the most perfect reasoning and observing machine that the world has seen, but as a lover he would have placed himself in a false position. He never spoke of the softer passions, save with a gibe and a sneer. They were admirable things for the observer—excellent for drawing the veil from men’s motives and actions. But for the trained reasoner to admit such intrusions into his own delicate and finely adjusted temperament was to introduce a distracting factor which might throw a doubt upon all his mental results. Grit in a sensitive instrument, or a crack in one of his own high-power lenses, would not be more disturbing than a strong emo

## TextRank

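A fairly naive algorithm: it builds a graph of sentences weighted by how many words they share, then ranks them PageRank-style. Importance is decided by word statistics alone, with no understanding of meaning.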
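
To show what graph-based sentence ranking looks like, here is a minimal sketch in the spirit of TextRank. It is not gensim's implementation: the tokenizer, the word-overlap similarity, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen to keep the example short.

```python
import math
import re


def tokenize(sentence):
    """Lowercased word tokens as a set (no stemming or stop-word removal)."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Shared-word count, normalised by the log of both sentence sizes."""
    wa, wb = tokenize(a), tokenize(b)
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # log(1) == 0 would make the denominator vanish
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over the similarity graph."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # proportional to the edge weight between them.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

`textrank(sentences)` returns one score per sentence. Sorting by score and keeping the top few sentences gives the summary. A sentence that shares no words with any other ends up with the minimum score of `1 - damping`.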

In [48]:
# Note: gensim.summarization was removed in gensim 4.0, so this needs gensim < 4.0.
import gensim
from gensim.summarization import summarize

In [49]:
summarize(first_chapter_p[0])

'It was not that he felt any emotion akin to love for Irene Adler.\nAll emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.'

In [50]:
print(summarize(first_chapter_p[1]))

I had seen little of Holmes lately.


What happens if we do this for every paragraph? Will the result be understandable?

In [53]:
for p in first_chapter_p:
    try:
        summary = summarize(p)
    except ValueError:
        # summarize() rejects input with fewer than two sentences
        continue
    if summary:
        print(summary)

It was not that he felt any emotion akin to love for Irene Adler.
All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
I had seen little of Holmes lately.
His rooms were brilliantly lit, and, even as I looked up, I saw his tall, spare figure pass twice in a dark silhouette against the blind.
Now, I know that there are seventeen steps, because I have both seen and observed.
Stay where you are.
From the lower part of the face he appeared to be a man of strong character, with a thick, hanging lip, and a long, straight chin suggestive of resolution pushed to the length of obstinacy.
Living in London—quite so!


What if we pass a full chapter?

In [54]:
print(summarize(" ".join(first_chapter_p)))

I had seen little of Holmes lately.
My own complete happiness, and the home-centred interests which rise up around the man who first finds himself master of his own establishment, were sufficient to absorb all my attention, while Holmes, who loathed every form of society with his whole Bohemian soul, remained in our lodgings in Baker Street, buried among his old books, and alternating from week to week between cocaine and ambition, the drowsiness of the drug, and the fierce energy of his own keen nature.
As I passed the well-remembered door, which must always be associated in my mind with my wooing, and with the dark incidents of the Study in Scarlet, I was seized with a keen desire to see Holmes again, and to know how he was employing his extraordinary powers.
How do I know that you have been getting yourself very wet lately, and that you have a most clumsy and careless servant girl?” “My dear Holmes,” said I, “this is too much.
“It is simplicity itself,” said he; “my eyes tell me tha

It is clear that this won't do for **literary** texts, no matter how much we fine-tune the options. Sentences rely heavily on context, and dialogue in particular makes no sense at all out of context.

This will not be an easy task.

## A bit more advanced: Sumy

### LexRank

In [55]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

In [70]:
my_parser = PlaintextParser.from_string("\n\n".join(first_chapter_p[:5]),Tokenizer('english'))

In [72]:
lex_rank_summarizer = LexRankSummarizer()
for sentence in lex_rank_summarizer(my_parser.document, sentences_count=10):
    print(sentence)

To Sherlock Holmes she is always the woman.
I have seldom heard him mention her under any other name.
It was not that he felt any emotion akin to love for Irene Adler.
He was, I take it, the most perfect reasoning and observing machine that the world has seen, but as a lover he would have placed himself in a false position.
Grit in a sensitive instrument, or a crack in one of his own high-power lenses, would not be more disturbing than a strong emotion in a nature such as his.
And yet there was but one woman to him, and that woman was the late Irene Adler, of dubious and questionable memory.
My own complete happiness, and the home-centred interests which rise up around the man who first finds himself master of his own establishment, were sufficient to absorb all my attention, while Holmes, who loathed every form of society with his whole Bohemian soul, remained in our lodgings in Baker Street, buried among his old books, and alternating from week to week between cocaine and ambition, t

### LSA

In [79]:
from sumy.summarizers.lsa import LsaSummarizer

lsa_summarizer=LsaSummarizer()
for sentence in lsa_summarizer(my_parser.document, sentences_count=10):
    print(sentence)

I have seldom heard him mention her under any other name.
In his eyes she eclipses and predominates the whole of her sex.
It was not that he felt any emotion akin to love for Irene Adler.
All emotions, and that one particularly, were abhorrent to his cold, precise but admirably balanced mind.
He was, I take it, the most perfect reasoning and observing machine that the world has seen, but as a lover he would have placed himself in a false position.
But for the trained reasoner to admit such intrusions into his own delicate and finely adjusted temperament was to introduce a distracting factor which might throw a doubt upon all his mental results.
My own complete happiness, and the home-centred interests which rise up around the man who first finds himself master of his own establishment, were sufficient to absorb all my attention, while Holmes, who loathed every form of society with his whole Bohemian soul, remained in our lodgings in Baker Street, buried among his old books, and alterna

### Luhn

In [81]:
from sumy.summarizers.luhn import LuhnSummarizer
luhn_summarizer=LuhnSummarizer()
for sentence in luhn_summarizer(my_parser.document, sentences_count=10):
    print(sentence)

He was, I take it, the most perfect reasoning and observing machine that the world has seen, but as a lover he would have placed himself in a false position.
Grit in a sensitive instrument, or a crack in one of his own high-power lenses, would not be more disturbing than a strong emotion in a nature such as his.
And yet there was but one woman to him, and that woman was the late Irene Adler, of dubious and questionable memory.
My own complete happiness, and the home-centred interests which rise up around the man who first finds himself master of his own establishment, were sufficient to absorb all my attention, while Holmes, who loathed every form of society with his whole Bohemian soul, remained in our lodgings in Baker Street, buried among his old books, and alternating from week to week between cocaine and ambition, the drowsiness of the drug, and the fierce energy of his own keen nature.
He was still, as ever, deeply attracted by the study of crime, and occupied his immense faculti