# Survey of Different Methods Used for Text Summarization
## Overview:
In this notebook we will look at three broad categories of text summarization algorithms.  
> A. Extractive Text Summarization Using Unsupervised Learning:  
1. Lead-3
2. Random Sampling
3. Text Rank
4. Rank Sentences Using Their Mean TF-IDF Scores
5. Using spaCy (based upon word frequency in the text)

> B. Abstractive Text Summarization Using Supervised Learning:  
1. RNN/LSTM based seq-seq models
2. Pointer-Generator Network
3. BERT based summarizer

> C. Reinforcement Learning + Supervised Learning Based Text Summarization:  
1. Fast Abstractive Summarization with Reinforce-Selected Sentence Re-writing
2. A Deep Reinforced Model for Abstractive Summarization


## Mount Google Drive and Import Packages

In [None]:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [None]:
import sys
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# for auto-reloading external modules (automatically reloads before using an imported module)
%load_ext autoreload
%autoreload 2

#To ensure that the Colab Python interpreter can load Python files from within
PATH_NAME = os.path.join('/', 'content', 'gdrive', 'My Drive', 'Colab Notebooks', 'UCSDX_MLE_Bootcamp', 'Text_Summarization_UCSD', 'Step_6_Literature_Survey_13-6-1')
sys.path.append(PATH_NAME)
print(sys.path)
os.chdir(PATH_NAME)

In [None]:
!pip install rouge

Collecting rouge
  Downloading https://files.pythonhosted.org/packages/43/cc/e18e33be20971ff73a056ebdb023476b5a545e744e3fc22acd8c758f1e0d/rouge-1.0.0-py3-none-any.whl
Installing collected packages: rouge
Successfully installed rouge-1.0.0


## Results from the BigPatent Paper
These results give us an idea of what to expect from the different models on BigPatent, which is a comparatively challenging dataset.

![bigpatent](https://drive.google.com/uc?export=view&id=1xwU4T11oWLl1oARsp1VkleOf9bwVryDJ)

Image source and for details: BigPatent Paper <https://arxiv.org/pdf/1906.03741.pdf>

## Load Data and Utility Functions
We will use cpc_codes 'de' from the BigPatent dataset

In [None]:
import utils

In [None]:
data = utils.load_data_string(split_type='train', cpc_codes='de', fname='data0_str_json.gz')
for df in data:
    print(df.head(5), df.shape, df.columns)
    print(df.iloc[0,0])
    print(df.iloc[0,1])
del data

data = utils.load_data_numpy(split_type='train', cpc_codes='de', fname='data0_np.npz')
for data_np in data:
    print(data_np['data'].shape, data_np['data'][0,0].shape[1], data_np['data'][0,1].shape[1])
    print(data_np['data'][0,1])
del data

                                         description  ...                                  original_abstract
0  [upon, review, of, the, detailed, description,...  ...  a hose puller that includes puller wheels that...
1  [referring, to, the, drawings, for, a, clearer...  ...  an articulated ballast cleaning system utilize...
2  [there, is, represented, in, --oov--, a, porti...  ...  a premium paving unit of elastomeric binder ma...
3  [in, the, drawings, ,, similar, or, correspond...  ...  apparatus for washing and / or dewatering cell...
4  [without, limiting, the, scope, of, the, inven...  ...  a patterned conductive textile is provided by ...

[5 rows x 5 columns] (21099, 5) Index(['description', 'abstract', 'cpc_code', 'original_description',
       'original_abstract'],
      dtype='object')
['upon', 'review', 'of', 'the', 'detailed', 'description', 'and', 'the', 'accompanying', 'drawings', 'provided', 'herein', ',', 'it', 'will', 'be', 'apparent', 'to', 'one', 'of', 'ordinary', '

## GloVe Word Embeddings

In [None]:
# !wget http://nlp.stanford.edu/data/glove.6B.zip
# !unzip glove*.zip
# !rm -f  glove*.zip

In [None]:
# Extract word vectors: each line holds a word followed by its 100 coefficients
word_embeddings = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        word_embeddings[word] = coefs

In [None]:
# !ls
# !pwd
# word_embeddings['the']
print(len(word_embeddings))
print(len(word_embeddings['the']))

400000
100


## A. Extractive Text Summarization Using Unsupervised Learning

> 1. Lead-3
> 2. Random Sampling
> 3. Text Rank
> 4. Rank Sentences Using Their Mean TF-IDF Scores
> 5. Using spaCy (based upon word frequency in the text)

In [None]:
import text_summarizer
import warnings
warnings.filterwarnings('ignore')

### A.1: Lead-3
This is a baseline model where we just select the first 3 sentences in the text as our summary.

In [None]:
#selecting the first three sentences from text
def text_lead3_summary(text):
    summary = text[0:3]
    summary = utils.sents2text(summary)
    return summary
print('Selecting First 3 Sentences From the Text')
text_summarizer.Text_Summarizer(algo_type=text_lead3_summary).predict(data_size='full')

Selecting First 3 Sentences From the Text
bleu-4     0.156101
rouge_1    0.313269
rouge_2    0.083678
rouge_l    0.258909
dtype: float64

Here's an example summary...
Original Abstract:
a tamper - resistant remotely monitorable electronic seal including a shaft portion , a socket arranged to engage the shaft position in a monitorable manner , whereby disengagement of the socket and the shaft portion results in a monitorable event , and a wireless communicator associated with at least one of the shaft portion and the socket and being operative to provide a remotely monitorable indication of the monitorable event .
Predicted Abstract:
reference is now made to --oov-- a and --oov-- , which are simplified pictorial illustrations of two stages in the assembly of a press - fit electronic seal constructed and operative in accordance with a preferred embodiment of the present invention . as seen in --oov-- a and --oov-- , there is provided a tamper - resistant electronic seal which preferably 

Note that the raw input text has been cleaned, i.e. preprocessed. In one of the preprocessing steps, all numbers are replaced by a generic number token, '--#number#--'. Since the text contains many numbers, we see a lot of --#number#-- tokens in the predicted abstracts.

### A.2: Randomly Selecting 3 Sentences
In this approach we randomly select three sentences from the text as our summary.

In [None]:
#selecting just random sentences from text
def text_random_summary(text):
    top_n = min(3, len(text))
    # sample without replacement so the same sentence is not picked twice
    ranks = np.random.choice(len(text), top_n, replace=False)
    summary = [text[r] for r in ranks]
    summary = utils.sents2text(summary)
    return summary
print('Randomly Selecting 3 Sentences From the Text')
text_summarizer.Text_Summarizer(algo_type=text_random_summary).predict(data_size='full')

Randomly Selecting 3 Sentences From the Text
bleu-4     0.167779
rouge_1    0.295383
rouge_2    0.067555
rouge_l    0.239744
dtype: float64

Here's an example summary...
Original Abstract:
a tamper - resistant remotely monitorable electronic seal including a shaft portion , a socket arranged to engage the shaft position in a monitorable manner , whereby disengagement of the socket and the shaft portion results in a monitorable event , and a wireless communicator associated with at least one of the shaft portion and the socket and being operative to provide a remotely monitorable indication of the monitorable event .
Predicted Abstract:
it is seen that this break produces a disconnection or significant change in the electrical properties of the conductive loop defined by conductors --#number#-- and --#number#-- . conventional wireless monitoring circuitry ( not shown ) may be employed to receive information which is transmitted by rf transceiver --#number#-- and indicates tampering with

### A.3: Text Rank
**Overview**  
The objective of this algorithm is to rank every sentence in the text and return the top k sentences as the summary. Here is a synopsis of how this algorithm works:  
1. Using some type of word embeddings (e.g. GloVe embeddings), each sentence in the text is represented as a vector.
2. This is achieved by summing the word embeddings of the words in the sentence and dividing by the sentence length, so it is a bag-of-words type of model.
3. A cosine similarity matrix is then constructed from every pair of sentences in the text.
4. The similarity matrix is converted into a probability (Markov transition) matrix using the softmax function (or a similar normalization).
5. The sentence ranking is the steady-state solution of this Markov matrix, i.e. it is proportional to the eigenvector corresponding to its largest eigenvalue (which is 1).
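
The five steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation inside `text_summarizer`: the helper names `sentence_vector` and `text_rank_scores`, the toy 2-d embeddings, and the use of power iteration to find the steady state are all assumptions made for the example.

```python
import numpy as np

def sentence_vector(words, word_embeddings, dim):
    """Bag-of-words sentence embedding: mean of the known word vectors."""
    vecs = [word_embeddings[w] for w in words if w in word_embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def text_rank_scores(sent_vectors, n_iter=200, tol=1e-10):
    """Steady-state sentence scores from a softmax-normalized cosine similarity matrix."""
    # Cosine similarity between every pair of sentences
    norms = np.linalg.norm(sent_vectors, axis=1, keepdims=True)
    unit = sent_vectors / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Row-wise softmax turns similarities into transition probabilities
    exp_sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    P = exp_sim / exp_sim.sum(axis=1, keepdims=True)
    # Power iteration converges to the left eigenvector with eigenvalue 1
    scores = np.full(len(P), 1.0 / len(P))
    for _ in range(n_iter):
        new_scores = scores @ P
        converged = np.abs(new_scores - scores).sum() < tol
        scores = new_scores
        if converged:
            break
    return scores

# Toy example with made-up 2-d embeddings (hypothetical values)
toy_embeddings = {
    'cat': np.array([1.0, 0.0]),
    'dog': np.array([0.9, 0.1]),
    'car': np.array([0.0, 1.0]),
}
sents = [['cat', 'dog'], ['dog', 'cat', 'cat'], ['car']]
vectors = np.stack([sentence_vector(s, toy_embeddings, 2) for s in sents])
scores = text_rank_scores(vectors)
top_k = np.argsort(scores)[::-1][:2]  # indices of the 2 highest-ranked sentences
print(scores, top_k)
```

In this toy case the two similar sentences reinforce each other, while the unrelated third sentence receives the lowest score.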

References:  
- https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
- https://www.analyticsvidhya.com/blog/2018/11/introduction-text-summarization-textrank-python/

In [None]:
%%timeit -r 1 -n 1
print('Text Rank Using 100d Glove Vectors/Features:')
text_summarizer.Text_Summarizer(word_embeddings=word_embeddings).predict(data_size=500)

Text Rank 100d:
bleu-4     0.138449
rouge_1    0.330658
rouge_2    0.089889
rouge_l    0.257293
dtype: float64

Here's an example summary...
Original Abstract:
a tamper - resistant remotely monitorable electronic seal including a shaft portion , a socket arranged to engage the shaft position in a monitorable manner , whereby disengagement of the socket and the shaft portion results in a monitorable event , and a wireless communicator associated with at least one of the shaft portion and the socket and being operative to provide a remotely monitorable indication of the monitorable event .
Predicted Abstract:
a pair of elongate conductors --#number#-- and --#number#-- , at least one of which includes a series connected reed switch --#number#-- which is closed by magnet --#number#-- when shaft portion --#number#-- is in lockable engagement with lock --#number#-- , extends through shaft portion --#number#-- through to the tip --#number#-- thereof and is configured and mounted in shaft port

Using 200-dimensional GloVe vectors for the word embeddings, we get the following (slightly better) results:

Text Rank 200d:  


> bleu-4:     0.140467  
> rouge_1:    0.333766  
> rouge_2:    0.096971   
> rouge_l:    0.260112

In [None]:
%%timeit -r 1 -n 1
print('Text Rank Using TF-IDF Features:')
text_summarizer.Text_Summarizer().predict(data_size=500)

Text Rank Using TF-IDF Features:
bleu-4     0.144048
rouge_1    0.321074
rouge_2    0.091552
rouge_l    0.269977
dtype: float64

Here's an example summary...
Original Abstract:
a tamper - resistant remotely monitorable electronic seal including a shaft portion , a socket arranged to engage the shaft position in a monitorable manner , whereby disengagement of the socket and the shaft portion results in a monitorable event , and a wireless communicator associated with at least one of the shaft portion and the socket and being operative to provide a remotely monitorable indication of the monitorable event .
Predicted Abstract:
a conductive loop --#number#-- , including a series connected reed switch --#number#-- which is closed by magnet --#number#-- when shaft portion --#number#-- is in lockable engagement with lock --#number#-- , preferably extends through shaft portion --#number#-- through to the tip --#number#-- thereof and is configured and mounted in shaft portion --#number#-- , suc

Based upon the BLEU and ROUGE metrics, the performance of the Text Rank model is not very good. It is actually quite similar to just randomly selecting 3 sentences or selecting the first 3 sentences from the text, which is surprising!

Upon further evaluation, I noticed that the probability matrix (generated from the cosine similarity between every sentence pair) is quite close to being uniformly distributed. This is because the cosine similarities (computed from the sentence embeddings) of different sentence pairs in the text do not differ much from each other. I suspect the sentence embeddings are similar to each other because we are using a bag-of-words type of model, which ignores the important information that word ordering carries. Moreover, the GloVe word embeddings we are using are trained on generic text data, which may not represent all of the important words present in the BigPatent dataset.

I also noticed that increasing the number of selected sentences from 3 to 5 or more degrades the model performance.

In [None]:
# #for debug only
# text = 'this is a ball . there is no text . aa bb ff .'
# text = text.split(' ')
# text = utils.word2sent_tokenizer(text)
# print(text_rank.text_rank_summary(text, word_embeddings)) #after modifying text_rank_summary() to accept word_embeddings
# print()
# text_rank.text_random_summary(text) #after modifying text_rank_summary() to accept word_embeddings


num_words_not_in_glove 0, num_tot_words 14
No. of sents is 3 and the sentence ranking scores are:
 [(1, 0.37631869814956), (0, 0.37317684000212326), (2, 0.25050446184831676)]
there is no text . this is a ball . aa bb ff .



'this is a ball . there is no text . aa bb ff . aa bb ff . there is no text .'

### A.4: Using TF-IDF

In this approach we compute the TF-IDF (term frequency-inverse document frequency) score of each word in a sentence, treating each sentence of the text as a document. We then use the mean of these word scores as the TF-IDF score of the sentence, and select the top 3 sentences (according to their TF-IDF scores) as the summary.

https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-real-world-dataset-796d339a4089

In [None]:
import sklearn.feature_extraction.text as sktext

#for debug only
corpus = ['This is the first document.',
          'This document is the second document.',
          'And this is the third one.',
          'Is this the first document?']
vectorizer = sktext.TfidfVectorizer()
X = vectorizer.fit_transform(corpus)
print(X.shape)
print(vectorizer.get_feature_names_out().tolist())  # get_feature_names() was removed in scikit-learn 1.2
# print(X.toarray().mean(1))
# print(type(X.toarray()))
# print(X.toarray().shape)
print(X.mean(1))

(4, 9)
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
[[0.24470366]
 [0.22994858]
 [0.25965077]
 [0.24470366]]


In [None]:
def tf_idf(text):
    # join each tokenized sentence back into a string; every sentence is one "document"
    res = [utils.words2text(sent) for sent in text]
    vectorizer = sktext.TfidfVectorizer()
    X = vectorizer.fit_transform(res)
    # mean TF-IDF score of each sentence (averaged over the whole vocabulary)
    arr = np.asarray(X.mean(1))[:, 0]
    # indices of the (up to) 3 highest-scoring sentences, in ascending score order
    top_idxs = arr.argsort()[-3:]
    return ' '.join(res[i] for i in top_idxs)

In [None]:
%%timeit -r 1 -n 1
print('TF-IDF Based Text Summarizer')
text_summarizer.Text_Summarizer(algo_type=tf_idf).predict(data_size=500, use_prl=True)

TF-IDF Based Text Summarizer
bleu-4     0.130911
rouge_1    0.307676
rouge_2    0.078908
rouge_l    0.231808
dtype: float64

Here's an example summary...
Original Abstract:
a tamper - resistant remotely monitorable electronic seal including a shaft portion , a socket arranged to engage the shaft position in a monitorable manner , whereby disengagement of the socket and the shaft portion results in a monitorable event , and a wireless communicator associated with at least one of the shaft portion and the socket and being operative to provide a remotely monitorable indication of the monitorable event .
Predicted Abstract:
a pair of elongate conductors --#number#-- and --#number#-- , at least one of which includes a series connected reed switch --#number#-- which is closed by magnet --#number#-- when shaft portion --#number#-- is in lockable engagement with lock --#number#-- , extends through shaft portion --#number#-- through to the tip --#number#-- thereof and is configured and mounted 

### A.5: Using spaCy (based upon word frequency in the text)
https://medium.com/analytics-vidhya/text-summarization-using-spacy-ca4867c6b744

## B. Abstractive Text Summarization Using Supervised Learning:  
There are many different abstractive text summarization approaches in the literature, but all of the state-of-the-art techniques are based upon Deep Learning, such as:
> 1. RNN/LSTM based seq-seq models
> 2. Pointer-Generator Network
> 3. BERT based summarizer

Note that apart from the BERT summarizer, I was not able to find a pre-trained model that I could evaluate on the BigPatent dataset. Nonetheless, for all of these models we will try to understand how they work.

### B.1: RNN/LSTM Based Seq2Seq Models

This is an encoder-decoder model using some type of recurrence structure (e.g. RNN, GRU, LSTM) for implicitly storing short-term dependencies.



![seq2seq](https://drive.google.com/uc?export=view&id=1CR_e4072BI-ZsgdBcYFZ4sJY89SRjuqD)

Image source & for further details: https://www.analyticsvidhya.com/blog/2019/06/comprehensive-guide-text-summarization-using-deep-learning-python/

The input text is fed sequentially into the encoder. The final output of the encoder is a latent vector representing the entire text.

This hidden vector is then used to initialize the internal state of the decoder, which generates the summary text.

One major drawback of this approach is that it is difficult for the model to learn very long-term dependencies, even with an LSTM. This is especially problematic because the documents we would like to summarize tend to be very long. To address this, the encoder outputs at every time step are stored, and at each decoding step the decoder attends to them through an attention distribution.
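
To make the attention step concrete, here is a minimal NumPy sketch of a single decoding step. It assumes simple dot-product scoring for brevity (the papers referenced in this section use a learned additive scoring function), and the function name `attention_context` and the toy inputs are hypothetical.

```python
import numpy as np

def attention_context(decoder_state, encoder_outputs):
    """One attention step: weight each stored encoder output by its relevance
    to the current decoder state (dot-product scoring, an assumption here)."""
    scores = encoder_outputs @ decoder_state      # one score per source position
    weights = np.exp(scores - scores.max())       # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_outputs           # weighted sum of encoder outputs
    return weights, context

# Toy example: 3 source positions with 3-d hidden states (made-up values)
encoder_outputs = np.eye(3)
decoder_state = np.array([2.0, 0.0, 0.0])
weights, context = attention_context(decoder_state, encoder_outputs)
print(weights, context)
```

The context vector is recomputed at every decoding step, so the decoder no longer has to rely on a single fixed-size vector to remember a long document.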

![seq2seq_attn](https://drive.google.com/uc?export=view&id=1wX0sZAIkbbi4LaU_qlDiSTC6ZAVZdUku)

https://www.aclweb.org/anthology/P17-1099/

https://www.aclweb.org/anthology/attachments/P17-1099.Presentation.pdf

Image source and for further details: http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html

As good as attention-based seq2seq models are, they still have a couple of major problems. 
> 1. Sometimes the summaries reproduce factual details incorrectly. This is because the vocabulary used for training the model only contains words that have sufficient statistics in the training data, i.e. infrequently occurring words are left out of the vocabulary because the dataset holds too little information for the model to "understand" them. Hence, the decoder cannot generate such words. Here is an example of this issue. Suppose we have the following sentence in our text: 'Germany emerge victorious in 2-0 win against Argentina on Saturday.' The token 2-0 is not going to be present in the vocabulary, but it is important enough that it should appear in the summary. Basically, the problem is that the network cannot copy text directly from the source; it can only generate words from its vocabulary.

> 2. The other problem is that summaries sometimes repeat themselves (e.g. Germany beat Germany beat Germany...). This is likely due to the decoder's over-reliance on the previous word (because in the above architecture the decoder does not attend to all of the words it has previously generated, unlike what it does with the encoder outputs). 


### B.2: Pointer Generator Network

The pointer-generator network addresses these challenges as follows:  

> 1. It replaces the decoder with a hybrid network that uses a pointer network to directly copy text from the source sentence in addition to a generator network (i.e. standard decoder) to generate text from a fixed set of vocabulary. So in a way it combines extractive and abstractive text summarization.

> 2. It reduces the problem of summary repetition with a technique called coverage: the attention distributions of the previous decoding steps are accumulated to keep track of the source words that have already been attended to, and the model is penalized for attending to the same part of the source again.
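
Both ideas reduce to short formulas, sketched here in NumPy. This is an illustrative sketch based on the pointer-generator paper rather than code from the `pointer-generator` repository; the function names and toy numbers are assumptions. The final word distribution mixes generation and copying, `P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)`, and the coverage penalty at each step is `sum_i min(a_i, c_i)`, where `c` is the sum of all previous attention distributions.

```python
import numpy as np

def final_distribution(p_gen, vocab_dist, attention, source_ids, extended_vocab_size):
    """Mix generating from the fixed vocabulary with copying from the source."""
    final = np.zeros(extended_vocab_size)
    final[:len(vocab_dist)] = p_gen * vocab_dist
    # Scatter-add the copy probabilities; repeated source tokens accumulate
    np.add.at(final, source_ids, (1.0 - p_gen) * attention)
    return final

def coverage_loss(attention_steps):
    """Sum over decoding steps of sum_i min(attention_i, coverage_i)."""
    coverage = np.zeros_like(attention_steps[0])
    loss = 0.0
    for attn in attention_steps:
        loss += np.minimum(attn, coverage).sum()
        coverage = coverage + attn
    return loss

# Toy example: vocabulary of 4 words; the source contains one OOV token
# (e.g. '2-0'), which is given the temporary id 4 in the extended vocabulary.
vocab_dist = np.array([0.1, 0.2, 0.3, 0.4])
attention = np.array([0.3, 0.7])      # attention over the 2 source tokens
source_ids = np.array([1, 4])         # second source token is the OOV word
final = final_distribution(0.6, vocab_dist, attention, source_ids, 5)
print(final)                          # the OOV token now has non-zero probability

repeated = coverage_loss(np.array([[0.9, 0.1], [0.9, 0.1]]))
alternating = coverage_loss(np.array([[0.9, 0.1], [0.1, 0.9]]))
print(repeated, alternating)          # attending to the same position twice costs more
```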

![pointer_gen](https://drive.google.com/uc?export=view&id=1EtgNwtIX2NhgL0zqMaNUwdzGNl1TUKTW)

Image source & for further details: http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html

In [None]:
# import os
# os.chdir('/content')
# !git clone https://github.com/abisee/pointer-generator.git
# !ls pointer-generator/

attention_decoder.py  data.py	   inspect_checkpoint.py  README.md
batcher.py	      decode.py    LICENSE.txt		  run_summarization.py
beam_search.py	      __init__.py  model.py		  util.py


### B.3: BERT Based Summarizer

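This approach uses the Transformer-based BERT (Bidirectional Encoder Representations from Transformers) architecture. For many NLP tasks, a pre-trained BERT model tends to perform much better than other architectures, so the idea here is to apply it to text summarization. Note that the `bert-extractive-summarizer` package used below is, as its name indicates, an extractive summarizer: it embeds the sentences with BERT and clusters the embeddings, selecting the sentences closest to the cluster centroids.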

Details can be found at:  
> https://pypi.org/project/bert-extractive-summarizer/  
> https://arxiv.org/pdf/1906.04165.pdf  
> https://github.com/dmmiller612/bert-extractive-summarizer


In [None]:
#Install bert-extractive-summarizer
!pip install bert-extractive-summarizer
from summarizer import Summarizer

In [None]:
def BERT_Model():
    #this does not work with multiprocessing module
    model = Summarizer(model='bert-large-uncased')
    def model_helper(body, ratio=0.02, max_length=300):
        result = model(body, ratio=ratio, max_length=max_length)
        return result
    return model_helper

# model = Summarizer() 
bert_model = BERT_Model()

In [None]:
%%timeit -r 1 -n 1
model = Summarizer()
# body = 'The Chrysler Building, the famous art deco New York skyscraper, will be sold for a small fraction of its previous sales price. The deal, first reported by The Real Deal, was for $150 million, according to a source familiar with the deal.'
# model(body, min_length=10, use_first=False) #for shorter sentences, it just copies them
text_summarizer.Text_Summarizer(word_embeddings=word_embeddings, algo_type=bert_model, use_sent_tokenization=False).predict(use_prl=False, data_size=20)

bleu-4     0.124808
rouge_1    0.297681
rouge_2    0.081295
rouge_l    0.240064
dtype: float64

Here's an example summary...
Original Abstract:
a tamper - resistant remotely monitorable electronic seal including a shaft portion , a socket arranged to engage the shaft position in a monitorable manner , whereby disengagement of the socket and the shaft portion results in a monitorable event , and a wireless communicator associated with at least one of the shaft portion and the socket and being operative to provide a remotely monitorable indication of the monitorable event .
Predicted Abstract:
reference is now made to --oov-- a and --oov-- , which are simplified pictorial illustrations of two stages in the assembly of a press - fit electronic seal constructed and operative in accordance with a preferred embodiment of the present invention . in accordance with a preferred embodiment of the present invention , sensing circuitry --#number#-- and an rf transceiver --#number#-- are housed wit

In [None]:
# model = Summarizer()
# body = 'The Chrysler Building, the famous art deco New York skyscraper, will be sold for a small fraction of its previous sales price. The deal, first reported by The Real Deal, was for $150 million, according to a source familiar with the deal.'
# model(body, min_length=10, use_first=False) #for shorter sentences, it just copies them

## C. Reinforcement Learning + Supervised Learning Based Text Summarization
> 1. Fast Abstractive Summarization with Reinforce-Selected Sentence Re-writing
> 2. A Deep Reinforced Model for Abstractive Summarization


### C.1: Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

This method is inspired by how humans summarize long documents: it first extracts the important sentences from the text using an extractor network, and then rewrites and compresses them using an abstractor network. One challenge in training is that the computation between the extractor and abstractor networks (selecting sentences) is non-differentiable, so training end-to-end with a gradient-based method is not possible. Instead, this approach uses a sentence-level policy gradient method to train the two networks hierarchically.


![RL_Sentwriting_1](https://drive.google.com/uc?export=view&id=1yqvcd3K5TteLWkHaiYYXFDGEix2X2duB)

![RL_Sentwriting_1](https://drive.google.com/uc?export=view&id=12Hs983c_NdsL-bPfu5-8bxnbyKlatuvR)

Image source & further details: https://arxiv.org/pdf/1805.11080.pdf

### C.2: A Deep Reinforced Model for Abstractive Summarization

As discussed above, standard attention-based seq2seq models tend to generate repetitive words because they attend only to the encoder hidden states and not to the previously generated decoder hidden states. This paper addresses the problem by attending over the decoder hidden states as well.

Furthermore, in addition to the standard supervised training procedure, it uses reinforcement learning. Standard supervised training suffers from "exposure bias": it assumes the ground truth is provided at each time step, which is not the case at test time. The paper combines standard word prediction with reinforcement-learning-based global sequence prediction, which is reported to improve the generalization and readability of the generated text.
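
A minimal sketch of how the two objectives can be combined, assuming the self-critical policy gradient form described in the paper: a sampled summary is rewarded relative to a greedy-decoded baseline, and the result is mixed with the usual maximum-likelihood loss through a weight `gamma`. The function names and all numbers are hypothetical; in practice the rewards would be ROUGE scores and the log-probabilities would come from the decoder.

```python
def self_critical_loss(sampled_log_probs, reward_sampled, reward_greedy):
    """(r(greedy) - r(sampled)) * sum of log-probabilities of the sampled tokens.
    Minimizing it raises the probability of samples that beat the greedy baseline."""
    return (reward_greedy - reward_sampled) * sum(sampled_log_probs)

def mixed_loss(rl_loss, ml_loss, gamma):
    """Weighted combination of the RL loss and the maximum-likelihood loss."""
    return gamma * rl_loss + (1.0 - gamma) * ml_loss

# Toy numbers (made up)
sampled_log_probs = [-0.5, -1.0, -0.25]   # log p of each sampled token
rl = self_critical_loss(sampled_log_probs, reward_sampled=0.4, reward_greedy=0.3)
total = mixed_loss(rl, ml_loss=2.0, gamma=0.5)
print(rl, total)
```

Here the sampled summary scores higher than the greedy one, so the RL term is positive and gradient descent on it would increase the log-probability of the sampled tokens.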

![rl_abstractive_gen](https://drive.google.com/uc?export=view&id=16_A5MBQNCGp7fDG7ltx384MJDpuCt8Ks)

Image source & further details: https://arxiv.org/pdf/1705.04304.pdf

## Model Evaluation Metrics Playground
As an overview, three metrics are commonly used for text summarization, namely BLEU, ROUGE, and METEOR. BLEU is precision-oriented, whereas ROUGE is typically reported as precision, recall, and F1. BLEU-4 combines the 1-, 2-, 3-, and 4-gram precisions (a weighted geometric mean, multiplied by a brevity penalty), whereas ROUGE-1 uses 1-grams, ROUGE-2 uses 2-grams, and ROUGE-L uses the longest common subsequence. 

Human text summarization is subjective, so a fixed metric will never measure it perfectly. It is important to note that both ROUGE and BLEU just compare the generated text to a reference text, which can cause a couple of problems:
1. They are intolerant to paraphrasing. Even a well-paraphrased version of the reference text will get a low ROUGE/BLEU score. For example, if we replace a word with its synonym, the ROUGE score will decrease. This is because ROUGE measures lexical (surface-level) overlap as opposed to semantic similarity between the reference and predicted summaries.
2. They tend to reward extractive summaries more than abstractive summaries, even when a human would judge both summaries as equally good. It is widely observed that just selecting the first 3 sentences of a text yields pretty good ROUGE scores, in many cases even better than published models, even though humans rate those summaries differently. See slides 25-26 of https://www.aclweb.org/anthology/P17-1099/.
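
The paraphrasing problem is easy to reproduce with a simplified ROUGE-N written from scratch. This sketch is not the exact computation performed by the `rouge` or `rouge-score` packages (it uses whitespace tokenization and no stemming), and the helper names and example sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, prediction, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap as precision, recall and F1."""
    ref_counts = ngrams(reference.split(), n)
    pred_counts = ngrams(prediction.split(), n)
    overlap = sum((ref_counts & pred_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(pred_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {'p': precision, 'r': recall, 'f': f1}

reference = 'the hose puller grips the hose'
paraphrase = 'the hose puller grasps the hose'   # one synonym swapped in

exact_r1 = rouge_n(reference, reference, n=1)
para_r1 = rouge_n(reference, paraphrase, n=1)
para_r2 = rouge_n(reference, paraphrase, n=2)
print(exact_r1, para_r1, para_r2)
```

Swapping a single word for its synonym removes one matching unigram and two matching bigrams, so ROUGE-2 drops much more sharply than ROUGE-1 even though the meaning is unchanged.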

References:  
https://towardsdatascience.com/the-ultimate-performance-metric-in-nlp-111df6c64460

https://www.freecodecamp.org/news/what-is-rouge-and-how-it-works-for-evaluation-of-summaries-e059fb8ac840/

https://towardsdatascience.com/evaluating-text-output-in-nlp-bleu-at-your-own-risk-e8609665a213

https://towardsdatascience.com/automatic-text-summarization-evaluation-2e312f66893b

In [None]:
pred = 'for example , the idler wheel may actually be a flat surface that functions to keep the hose in frictional engagement with the puller wheels --#number#-- . for example , all three puller wheels can be positioned in alignment to increase the amount of bend in the hose as it passes over each wheel . although a parallel positioning of puller wheels --#number#-- is disclosed , it is contemplated that the space between the wheels may be adjusted to create a deeper groove . in such a configuration the wheels are positioned to redirected the hose as it passes over each pair of wheels --#number#-- . it is further contemplated that the guide arms may be configured with rollers to reduce the friction between the hose and the forward guide arms --#number#-- . although the monitor is shown mounted to the back of cleaning truck --#number#-- , it is understood that the monitor may also be located in the cab --#number#-- . to the extent additional gripping is needed , a weight can be applied to the arm supporting the upper puller wheel --#number#-- . for example , in the configuration shown , the hose is realigned from an orientation that is parallel to the surface to one that is perpendicular to the surface . the forward guide arms --#number#-- are shown as two separate extensions , which allows the hose to be easily fed into the gripper wheels --#number#-- . a configuration with multiple sets of puller wheels is particularly adapted for straight line pulling applications where the hose direction is not changed as it passes through the hose puller --#number#-- .'
ref = "a hose puller that includes puller wheels that are motorized and configured to grip , among other things , a high pressure water hose . the hose puller further includes an idler wheel that is positioned to oppose the puller wheels . the idler wheel is spring loaded to help ensure that the hose maintains frictional relation with the puller wheels . alternatively , the hose puller may have puller wheels shaped to grip a high pressure water hose . the hose puller also includes a camera that is configured to show images that enable the operator to control the hose puller from a remote location ."

### Rouge
https://kavita-ganesan.com/what-is-rouge-and-how-it-works-for-evaluation-of-summaries/#.YFF4RftKhrR

In [None]:
!pip install rouge
!pip install rouge-score

Collecting rouge-score
  Downloading https://files.pythonhosted.org/packages/1f/56/a81022436c08b9405a5247b71635394d44fe7e1dbedc4b28c740e09c2840/rouge_score-0.0.4-py2.py3-none-any.whl
Installing collected packages: rouge-score
Successfully installed rouge-score-0.0.4


In [None]:
# #for debug 1
# ref = 'a hose puller that includes puller wheels that are motorized and configured to grip , among other things , a high pressure water hose . the hose puller further includes an idler wheel that is positioned to oppose the puller wheels . the idler wheel is spring loaded to help ensure that the hose maintains frictional relation with the puller wheels . alternatively , the hose puller may have puller wheels shaped to grip a high pressure water hose . the hose puller also includes a camera that is configured to show images that enable the operator to control the hose puller from a remote location .'
# pred = 'for example , the idler wheel may actually be a flat surface that functions to keep the hose in frictional engagement with the puller wheels --#number#-- . for example , all three puller wheels can be positioned in alignment to increase the amount of bend in the hose as it passes over each wheel . although a parallel positioning of puller wheels --#number#-- is disclosed , it is contemplated that the space between the wheels may be adjusted to create a deeper groove . in such a configuration the wheels are positioned to redirected the hose as it passes over each pair of wheels --#number#-- . it is further contemplated that the guide arms may be configured with rollers to reduce the friction between the hose and the forward guide arms --#number#-- . although the monitor is shown mounted to the back of cleaning truck --#number#-- , it is understood that the monitor may also be located in the cab --#number#-- . to the extent additional gripping is needed , a weight can be applied to the arm supporting the upper puller wheel --#number#-- . for example , in the configuration shown , the hose is realigned from an orientation that is parallel to the surface to one that is perpendicular to the surface . the forward guide arms --#number#-- are shown as two separate extensions , which allows the hose to be easily fed into the gripper wheels --#number#-- . a configuration with multiple sets of puller wheels is particularly adapted for straight line pulling applications where the hose direction is not changed as it passes through the hose puller --#number#-- .'

# # ref = 'a hose puller that includes puller wheels that are motorized and configured'
# # pred = 'for example , the idler wheel may actually be a flat surface that functions'

# # print(sentence_bleu([ref], pred, weights=[0,1,0,0]))
# print(sentence_bleu([ref.split(' ')], pred.split(' '))) #, weights=[1,0,0,0]))
# rouge_evaluator = Rouge()
# print(rouge_evaluator.get_scores(pred, ref))  # rouge expects the hypothesis first, then the reference

In [None]:
# # #for debug 2
# text = ['upon', 'review', 'of', 'the', 'detailed', 'description', 'and', 'the', 'accompanying', 'drawings', 'provided', 'herein', ',', 'it', 'will', 'be', 'apparent', 'to', 'one', 'of', 'ordinary', 'skill', 'in', 'the', 'art', 'that', 'a', 'portable', 'hose', 'puller', 'may', 'be', 'used', 'in', 'a', 'wide', 'array', 'of', 'applications', 'that', 'require', 'maneuvering', 'of', 'hoses', 'or', 'the', 'like', '.', 'accordingly', ',', 'the', 'present', 'invention', 'shall', 'not', 'be', 'limited', 'to', 'the', 'structures', 'and', 'methods', 'specifically', 'described', 'and', 'illustrated', 'herein', ',', 'although', 'the', 'following', 'description', 'is', 'particularly', 'directed', 'to', 'a', 'portable', 'hose', 'puller', 'for', 'use', 'in', 'sewer', 'cleaning', 'operations', '.', 'the', 'term', 'hose', 'with', 'which', 'the', 'present', 'invention', 'is', 'associated', ',', 'includes', 'various', 'types', 'of', 'hoses', ',', 'tubes', ',', 'ropes', ',', 'cables', ',', 'chains', ',', 'and', 'the', 'like', '.', 'the', 'term', 'portable', 'with', 'which', 'the', 'present', 'invention', 'is', 'associated', 'describes', 'an', 'apparatus', 'sized', 'to', 'be', 'moved', 'by', 'one', 'person', '.', 'further', ',', 'the', 'hose', 'puller', 'is', 'light', 'enough', 'that', 'it', 'does', 'not', 'damage', 'soft', 'ground', 'while', 'being', 'positioned', '.', 'portability', 'makes', 'the', 'disclosed', 'apparatus', 'uniquely', 'suited', 'to', 'be', 'positioned', 'near', 'a', 'work', 'site', '.', 'however', ',', 'it', 'is', 'contemplated', 'that', 'the', 'disclosed', 'apparatus', 'may', 'be', 'scaled', 'for', 'a', 'particular', 'application', '.', 'for', 'example', ',', 'in', 'large', 'cable', 'laying', 'applications', ',', 'the', 'disclosed', 'devise', 'may', 'be', 'scaled', 'to', 'handle', 'the', 'increased', 'loads', 'associated', 'with', 'such', 'applications', '.', '--oov--', 'shows', 'one', 'aspect', 'of', 'portable', 'hose', 'puller', '--#number#--', '.', 'the', 
'hose', 'puller', 'includes', 'a', 'frame', '--#number#--', ',', 'which', 'may', 'be', 'of', 'metal', ',', 'aluminum', ',', 'plastic', ',', 'or', 'combinations', 'thereof', '.', 'the', 'metal', 'frame', 'is', 'configured', 'with', 'handles', '--#number#--', 'and', 'wheels', '--#number#--', 'to', 'allow', 'for', 'easy', 'mobility', '.', 'handles', '--#number#--', 'may', 'be', 'telescoping', 'to', 'provide', 'greater', 'leverage', 'when', 'moving', 'the', 'portable', 'hose', 'puller', '.', 'frame', '--#number#--', 'is', 'also', 'configured', 'with', 'stand', 'arms', '--#number#--', '.', 'the', 'lower', 'portion', 'of', 'stand', 'arms', '--#number#--', 'include', 'a', 'gripping', 'shape', '--#number#--', '.', 'for', 'grass', 'and', 'other', 'soft', 'surfaces', ',', 'the', 'gripping', 'shape', 'may', 'be', 'shovel', 'shaped', 'to', 'dig', 'into', 'soft', 'surfaces', '.', 'however', ',', 'it', 'is', 'readily', 'understood', 'that', 'many', 'different', 'shapes', 'may', 'be', 'used', 'for', 'different', 'applications', '.', 'for', 'example', ',', 'it', 'is', 'contemplated', 'that', 'rubber', 'stoppers', 'may', 'also', 'be', 'used', 'in', 'some', 'applications', '.', 'the', 'essential', 'characteristic', 'of', 'all', 'gripping', 'shapes', ',', 'however', ',', 'is', 'that', 'they', 'inhibit', 'the', 'movement', 'of', 'the', 'hose', 'puller', '--#number#--', 'when', 'it', 'is', 'in', 'use', '.', '--oov--', 'shows', 'a', 'detailed', 'view', 'of', 'the', 'gripping', 'shape', '--#number#--', 'that', 'is', 'shaped', 'to', 'rest', 'inside', 'a', 'manhole', 'opening', '.', 'also', 'included', 'on', 'stand', 'arms', '--#number#--', 'are', 'forward', 'guide', 'arms', '--#number#--', '.', 'the', 'forward', 'guide', 'arms', '--#number#--', 'are', 'sized', 'to', 'keep', 'the', 'hose', 'in', 'guided', 'relation', 'with', 'the', 'gripper', 'wheels', '--#number#--', '.', 'the', 'forward', 'guide', 'arms', '--#number#--', 'are', 'shown', 'as', 'two', 'separate', 'extensions', ',', 
'which', 'allows', 'the', 'hose', 'to', 'be', 'easily', 'fed', 'into', 'the', 'gripper', 'wheels', '--#number#--', '.', 'however', ',', 'it', 'is', 'contemplated', 'that', 'the', 'arms', 'may', 'be', 'connected', 'to', 'enclose', 'the', 'area', 'in', 'which', 'the', 'hose', 'is', 'located', '.', 'it', 'is', 'further', 'contemplated', 'that', 'the', 'guide', 'arms', 'may', 'be', 'configured', 'with', 'rollers', 'to', 'reduce', 'the', 'friction', 'between', 'the', 'hose', 'and', 'the', 'forward', 'guide', 'arms', '--#number#--', '.', 'alternatively', ',', 'the', 'forward', 'guide', 'arms', '--#number#--', 'may', 'include', 'a', 'material', ',', 'such', 'as', 'teflon', ',', 'to', 'reduce', 'the', 'friction', 'between', 'the', 'hose', 'and', 'the', 'forward', 'guide', 'arms', '.', 'the', 'guide', 'arms', 'are', 'shown', 'attached', 'to', 'stand', 'arms', '--#number#--', '.', 'however', ',', 'it', 'is', 'readily', 'understood', 'that', 'the', 'guide', 'arms', 'may', 'extend', 'from', 'handles', '--#number#--', ',', 'extend', 'from', 'stand', 'arms', '--#number#--', 'to', 'handles', '--#number#--', ',', 'extend', 'from', 'some', 'other', 'frame', 'element', ',', 'or', 'any', 'combination', 'thereof', '.', 'attached', 'to', 'the', 'hose', 'puller', 'frame', '--#number#--', 'are', 'puller', 'wheels', '--#number#--', '.', 'the', 'puller', 'wheels', '--#number#--', 'are', 'made', 'from', 'a', 'soft', 'material', 'such', 'as', 'rubber', '.', 'although', 'rubber', 'is', 'disclosed', ',', 'one', 'skilled', 'in', 'the', 'art', 'understands', 'that', 'any', 'soft', 'compound', 'may', 'be', 'used', '.', 'additionally', ',', 'the', 'puller', 'wheels', '--#number#--', 'may', 'be', 'air', 'filled', '.', 'the', 'puller', 'wheels', '--#number#--', 'are', 'positioned', 'to', 'create', 'a', 'friction', 'groove', '--#number#--', 'between', 'the', 'wheels', '.', '--oov--', 'shows', 'a', 'front', 'view', 'of', 'the', 'hose', 'puller', 'to', 'show', 'the', 'friction', 'groove', 
'--#number#--', '.', 'the', 'puller', 'wheels', '--#number#--', 'are', 'shown', 'positioned', 'side', 'by', 'side', 'in', 'a', 'parallel', 'configuration', '.', 'in', 'such', 'a', 'configuration', ',', 'the', 'curvature', 'of', 'the', 'wheels', 'form', 'the', 'side', 'walls', 'of', 'the', 'friction', 'groove', '.', 'although', 'a', 'parallel', 'positioning', 'of', 'puller', 'wheels', '--#number#--', 'is', 'disclosed', ',', 'it', 'is', 'contemplated', 'that', 'the', 'space', 'between', 'the', 'wheels', 'may', 'be', 'adjusted', 'to', 'create', 'a', 'deeper', 'groove', '.', 'it', 'is', 'also', 'contemplated', 'that', 'the', 'angle', 'between', 'the', 'wheels', 'may', 'be', 'adjusted', 'to', 'change', 'the', 'depth', 'of', 'the', 'friction', 'groove', '--#number#--', '.', 'puller', 'wheels', '--#number#--', 'are', 'connected', 'to', 'drive', 'motor', '--#number#--', '.', 'the', 'drive', 'motor', '--#number#--', 'rotates', 'the', 'puller', 'wheels', '--#number#--', 'when', 'power', 'is', 'applied', '.', 'alternatively', ',', 'the', 'frictional', 'groove', 'can', 'be', 'created', 'by', 'a', 'single', 'wheel', '--#number#--', '.', '--oov--', 'shows', 'a', 'wheel', 'shaped', 'for', 'a', 'frictional', 'groove', '.', 'the', 'shaped', 'wheel', '--#number#--', 'may', 'be', 'made', 'out', 'of', 'any', 'suitable', 'material', '.', 'the', 'wheel', 'shown', 'in', '--oov--', 'is', 'made', 'out', 'of', 'aluminum', '.', 'the', 'puller', 'wheels', '--#number#--', 'are', 'positioned', 'relative', 'to', 'the', 'man', 'hole', 'such', 'that', 'the', 'weight', 'of', 'the', 'hose', 'pulls', 'the', 'hose', 'into', 'greater', 'frictional', 'engagement', 'with', 'the', 'puller', 'wheels', '--#number#--', '.', 'attached', 'to', 'the', 'hose', 'puller', 'frame', '--#number#--', 'is', 'an', 'idler', 'wheel', '--#number#--', 'and', 'idler', 'wheel', 'frame', '--#number#--', '.', 'the', 'idler', 'wheel', 'is', 'configured', 'to', 'ensure', 'that', 'the', 'hose', 'being', 'manipulated', 'by', 'the', 
'hose', 'puller', 'is', 'maintained', 'in', 'frictional', 'engagement', 'with', 'the', 'frictional', 'groove', '--#number#--', '.', 'like', 'the', 'puller', 'wheels', '--#number#--', ',', 'the', 'idler', 'wheel', 'is', 'made', 'out', 'of', 'a', 'soft', 'material', 'such', 'as', 'rubber', 'or', 'the', 'like', '.', 'the', 'idler', 'wheel', 'may', 'also', 'be', 'filled', 'with', 'air', '.', 'although', 'the', 'idler', 'wheel', '--#number#--', 'is', 'shown', 'a', 'different', 'size', 'than', 'the', 'puller', 'wheels', '--#number#--', ',', 'it', 'is', 'understood', 'that', 'the', 'idler', 'wheel', 'may', 'be', 'sized', 'to', 'suit', 'a', 'particular', 'purpose', '.', 'additionally', ',', 'the', 'idler', 'wheel', 'may', 'be', 'any', 'number', 'of', 'different', 'shapes', '.', 'for', 'example', ',', 'the', 'idler', 'wheel', 'may', 'actually', 'be', 'a', 'flat', 'surface', 'that', 'functions', 'to', 'keep', 'the', 'hose', 'in', 'frictional', 'engagement', 'with', 'the', 'puller', 'wheels', '--#number#--', '.', 'alternatively', ',', 'the', 'idler', 'wheel', '--#number#--', 'may', 'be', 'shaped', 'to', 'complement', 'the', 'puller', 'wheel', '--#number#--', 'shown', 'in', '--oov--', '.', 'the', 'disclosed', 'hose', 'puller', 'is', 'adapted', 'to', 'take', 'advantage', 'of', 'the', 'frictional', 'force', 'associated', 'with', 'redirecting', 'a', 'hose', 'as', 'it', 'is', 'being', 'manipulated', '.', 'for', 'example', ',', 'in', 'the', 'configuration', 'shown', ',', 'the', 'hose', 'is', 'realigned', 'from', 'an', 'orientation', 'that', 'is', 'parallel', 'to', 'the', 'surface', 'to', 'one', 'that', 'is', 'perpendicular', 'to', 'the', 'surface', '.', 'such', 'realignment', 'naturally', 'seats', 'the', 'hose', 'in', 'the', 'frictional', 'groove', '.', 'however', ',', 'in', 'other', 'applications', 'or', 'in', 'applications', 'requiring', 'greater', 'frictional', 'force', ',', 'the', 'idler', 'wheel', 'frame', 'may', 'be', 'adapted', 'to', 'provide', 'additional', 'force', 'to', 
'help', 'seat', 'the', 'hose', 'in', 'the', 'frictional', 'groove', '.', 'additionally', ',', 'the', 'hose', 'puller', 'may', 'be', 'configured', 'with', 'multiple', 'wheels', '--#number#--', '.', 'in', 'such', 'a', 'configuration', 'the', 'wheels', 'are', 'positioned', 'to', 'redirected', 'the', 'hose', 'as', 'it', 'passes', 'over', 'each', 'pair', 'of', 'wheels', '--#number#--', '.', 'redirecting', 'the', 'hose', 'acts', 'to', 'increases', 'the', 'gripping', 'friction', 'provided', 'by', 'the', 'gripping', 'groove', '.', 'a', 'configuration', 'with', 'multiple', 'sets', 'of', 'puller', 'wheels', 'is', 'particularly', 'adapted', 'for', 'straight', 'line', 'pulling', 'applications', 'where', 'the', 'hose', 'direction', 'is', 'not', 'changed', 'as', 'it', 'passes', 'through', 'the', 'hose', 'puller', '--#number#--', '.', 'one', 'skilled', 'in', 'the', 'art', 'understands', 'that', 'the', 'relationship', 'between', 'the', 'puller', 'wheels', '--#number#--', 'can', 'be', 'changed', 'to', 'further', 'increase', 'the', 'frictional', 'forces', '.', 'for', 'example', ',', 'all', 'three', 'puller', 'wheels', 'can', 'be', 'positioned', 'in', 'alignment', 'to', 'increase', 'the', 'amount', 'of', 'bend', 'in', 'the', 'hose', 'as', 'it', 'passes', 'over', 'each', 'wheel', '.', 'the', 'idler', 'wheel', '--#number#--', 'shown', 'in', '--oov--', 'is', 'attached', 'to', 'the', 'idler', 'wheel', 'frame', '--#number#--', '.', 'the', 'idler', 'wheel', 'frame', '--#number#--', 'may', 'be', 'selectively', 'positionable', 'or', 'configured', 'to', 'apply', 'rotational', 'force', 'such', 'that', 'the', 'idler', 'wheel', '--#number#--', 'applies', 'pressure', 'to', 'the', 'puller', 'wheels', '--#number#--', '.', 'the', 'rotational', 'force', 'may', 'be', 'the', 'result', 'of', 'a', 'spring', 'or', 'may', 'be', 'driven', 'by', 'some', 'other', 'means', ',', 'such', 'as', 'pneumatically', '.', 'further', ',', 'the', 'spring', 'tension', 'can', 'be', 'adjusted', 'using', 'spring', 'handle', 
'--#number#--', '.', 'the', 'puller', 'frame', '--#number#--', 'includes', 'aft', 'guide', 'arms', '--#number#--', '.', 'the', 'aft', 'guide', 'arms', 'function', 'similarly', 'to', 'the', 'forward', 'guide', 'arms', '--#number#--', 'and', 'may', 'be', 'similarly', 'shaped', 'and', 'configured', '.', 'the', 'hose', 'puller', '--#number#--', 'may', 'be', 'controlled', 'using', 'control', 'panel', '--#number#--', 'or', 'by', 'remote', 'control', '(', 'not', 'shown', ')', '.', 'the', 'hose', 'puller', '--#number#--', 'may', 'also', 'be', 'configured', 'with', 'a', 'camera', '--#number#--', '.', 'the', 'camera', 'is', 'positioned', 'to', 'capture', 'images', 'of', 'the', 'hose', 'as', 'it', 'is', 'feed', 'into', 'or', 'retrieved', 'from', 'a', 'sewer', 'line', '.', 'the', 'camera', 'may', 'also', 'be', 'trained', 'on', 'the', 'hose', 'puller', 'or', 'any', 'other', 'aspect', 'of', 'interest', '.', 'the', 'hose', 'puller', 'may', 'also', 'be', 'configured', 'to', 'view', 'counter', '--#number#--', '.', 'the', 'counter', '--#number#--', 'records', 'the', 'amount', 'of', 'hose', 'that', 'passes', 'over', 'wheel', '--#number#--', '.', 'this', 'information', 'is', 'used', 'by', 'the', 'operator', 'to', 'control', 'how', 'far', 'the', 'cleaning', 'nozzle', 'is', 'inserted', 'into', 'the', 'sewer', 'line', '.', 'in', 'a', 'normal', 'operation', ',', 'once', 'the', 'length', 'is', 'established', 'by', 'visual', 'inspection', 'at', 'the', 'downhole', 'manhole', ',', 'the', 'cleaning', 'nozzle', 'can', 'then', 'make', 'multiple', 'passes', 'through', 'the', 'sewer', 'line', 'without', 'additional', 'visual', 'inspections', '.', '--oov--', 'depicts', 'an', 'alternative', 'configuration', 'in', 'which', 'the', 'hose', 'puller', '--#number#--', 'is', 'configured', 'with', 'two', 'puller', 'wheels', '--#number#--', '.', 'both', 'puller', 'wheels', '--#number#--', 'are', 'connected', 'with', 'chain', '--#number#--', 'to', 'drive', 'motor', '--#number#--', 'and', 'drive', 'motor', 
'sprocket', '--#number#--', '.', 'the', 'hose', 'puller', '--#number#--', 'also', 'includes', 'a', 'tensioning', 'wheel', '--#number#--', '.', 'the', 'tensioning', 'wheel', 'is', 'designed', 'to', 'regulate', 'the', 'chain', 'tension', '.', 'the', 'tensioning', 'wheel', 'may', 'be', 'a', 'wheel', ',', 'sprocket', ',', 'or', 'the', 'like', '.', 'the', 'tension', 'may', 'be', 'set', 'manually', 'or', 'adjusted', 'by', 'way', 'of', 'a', 'spring', '.', 'the', 'hose', 'puller', 'is', 'hinged', 'at', 'point', '--#number#--', 'such', 'that', 'different', 'size', 'hoses', 'can', 'be', 'easily', 'inserted', 'into', 'the', 'hose', 'puller', '.', 'to', 'the', 'extent', 'additional', 'gripping', 'is', 'needed', ',', 'a', 'weight', 'can', 'be', 'applied', 'to', 'the', 'arm', 'supporting', 'the', 'upper', 'puller', 'wheel', '--#number#--', '.', 'optimally', ',', 'if', 'a', 'weight', 'is', 'needed', ',', 'it', 'is', 'applied', 'to', 'the', 'upper', 'arm', 'at', 'end', '--#number#--', '.', 'the', 'hose', 'puller', 'configured', 'as', 'shown', 'in', '--oov--', 'includes', 'a', 'camera', 'and', 'control', 'box', '.', 'further', ',', 'the', 'hose', 'puller', 'of', '--oov--', 'is', 'configured', 'to', 'be', 'operated', 'remotely', '.', 'puller', 'wheels', '--#number#--', 'may', 'be', 'made', 'out', 'of', 'a', 'hard', 'rubber', 'or', 'other', 'solid', 'material', 'that', 'is', 'also', 'suited', 'for', 'gripping', 'a', 'hose', '.', '--oov--', 'shows', 'the', 'hose', 'puller', 'positioned', 'over', 'a', 'manhole', '.', 'the', 'hose', 'puller', '--#number#--', 'is', 'shown', 'as', 'it', 'is', 'feeding', 'a', 'hose', 'into', 'a', 'manhole', 'for', 'cleaning', 'head', '--#number#--', '.', 'the', 'hose', 'puller', 'is', 'shown', 'connected', 'to', 'cleaning', 'truck', '--#number#--', '.', 'the', 'cleaning', 'truck', 'supplies', 'high', 'pressure', 'water', 'to', 'the', 'cleaning', 'head', '--#number#--', '.', 'although', 'the', 'cleaning', 'truck', 'is', 'shown', 'as', 'the', 'source', 'of', 
'the', 'water', 'used', 'by', 'cleaning', 'head', '--#number#--', ',', 'it', 'is', 'understood', 'that', 'the', 'cleaning', 'truck', '--#number#--', 'may', 'be', 'connected', 'to', 'a', 'fire', 'hydrant', 'or', 'other', 'similar', 'water', 'source', '.', 'dashed', 'line', '--#number#--', 'shows', 'a', 'connection', 'between', 'the', 'cleaning', 'truck', '--#number#--', 'and', 'camera', '.', 'images', 'from', 'the', 'video', 'camera', '--#number#--', 'are', 'displayed', 'on', 'monitor', '--#number#--', '.', 'although', 'the', 'monitor', 'is', 'shown', 'mounted', 'to', 'the', 'back', 'of', 'cleaning', 'truck', '--#number#--', ',', 'it', 'is', 'understood', 'that', 'the', 'monitor', 'may', 'also', 'be', 'located', 'in', 'the', 'cab', '--#number#--', '.', 'additionally', ',', '--oov--', 'shows', 'the', 'cleaning', 'truck', '--#number#--', 'being', 'located', 'in', 'close', 'proximity', 'to', 'the', 'hose', 'puller', '--#number#--', '.', 'in', 'reality', ',', 'the', 'cleaning', 'truck', '--#number#--', 'is', 'positioned', 'much', 'further', 'away', 'from', 'the', 'manhole', '.', 'the', 'hose', 'puller', 'engine', 'may', 'be', 'gas', 'powered', 'or', 'connected', 'via', 'a', 'power', 'line', '(', 'not', 'shown', ')', 'to', 'the', 'cleaning', 'truck', '--#number#--', '.', 'additionally', ',', 'the', 'hose', 'puller', 'is', 'not', 'show', 'to', 'scale', '.', 'in', 'particular', ',', 'the', 'hose', 'puller', 'is', 'not', 'scaled', 'relative', 'to', 'cleaning', 'truck', '--#number#--', '.', 'in', 'reality', ',', 'the', 'hose', 'puller', 'is', 'much', 'smaller', 'relative', 'to', 'the', 'cleaning', 'truck', '.', 'the', 'present', 'invention', 'is', ',', 'therefore', ',', 'well', 'adapted', 'to', 'carry', 'out', 'the', 'objects', 'and', 'attain', 'the', 'ends', 'and', 'the', 'advantages', 'mentioned', ',', 'as', 'well', 'as', 'others', 'inherent', 'therein', '.', 'while', 'presently', 'preferred', 'embodiments', 'have', 'been', 'described', ',', 'numerous', 'changes', 'to', 
'the', 'details', 'of', 'construction', ',', 'arrangement', 'of', 'the', 'article', '--oov--', '--#number#--', ';', 's', 'parts', 'or', 'components', ',', 'and', 'the', 'steps', 'to', 'the', 'processes', 'may', 'be', 'made', '.', 'for', 'example', ',', 'the', 'frame', 'may', 'be', 'reconfigured', 'in', 'a', 'number', 'of', 'different', 'ways', '.', 'however', ',', 'all', 'such', 'configurations', 'allow', 'for', 'the', 'frictional', 'groove', 'to', 'provide', 'the', 'primary', 'means', 'whereby', 'the', 'hose', 'puller', 'manipulates', 'hoses', '.', 'such', 'changes', 'will', 'readily', 'suggest', 'themselves', 'of', 'those', 'skilled', 'in', 'the', 'art', 'and', 'are', 'encompassed', 'within', 'the', 'spirit', 'of', 'invention', 'and', 'in', 'the', 'scope', 'of', 'the', 'appended', 'claims', '.', 'although', 'the', 'present', 'invention', 'and', 'its', 'advantages', 'have', 'been', 'described', 'in', 'detail', ',', 'it', 'should', 'be', 'understood', 'that', 'various', 'changes', ',', 'substitutions', 'and', 'alterations', 'can', 'be', 'made', 'herein', 'without', 'departing', 'from', 'the', 'invention', 'as', 'defined', 'by', 'the', 'appended', 'claims', '.', 'moreover', ',', 'the', 'scope', 'of', 'the', 'present', 'application', 'is', 'not', 'intended', 'to', 'be', 'limited', 'to', 'the', 'particular', 'embodiments', 'of', 'the', 'machine', ',', 'methods', 'and', 'steps', 'described', 'in', 'the', 'specification', '.', 'as', 'one', 'will', 'readily', 'appreciate', 'from', 'the', 'disclosure', ',', 'machines', ',', 'methods', ',', 'and', 'steps', ',', 'presently', 'existing', 'or', 'later', 'to', 'be', 'developed', 'that', 'perform', 'substantially', 'the', 'same', 'function', 'or', 'achieve', 'substantially', 'the', 'same', 'result', 'as', 'the', 'corresponding', 'embodiments', 'described', 'herein', 'may', 'be', 'utilized', '.', 'accordingly', ',', 'the', 'appended', 'claims', 'are', 'intended', 'to', 'include', 'within', 'their', 'scope', 'such', 'processes', 
',', 'machines', ',', 'manufacture', ',', 'compositions', 'of', 'matter', ',', 'means', ',', 'methods', ',', 'or', 'steps', '.']
# text = word2sent_tokenizer(text)
# res = text_rank_summary(text)

In [None]:
# https://pypi.org/project/rouge/
from rouge import Rouge
rouge_evaluator = Rouge()
ref = 'this is a good text adc. this is sent 2.'
pred = 'this is a good text. this is sent 2'
rouge_score = rouge_evaluator.get_scores(pred, ref)  # the hypothesis (pred) is the first argument; see the documentation linked above
# rouge_score = rouge_evaluator.get_scores([pred], [ref])
rouge_score  # e.g. rouge_score[0]['rouge-1']['f'] gives the ROUGE-1 F-score

[{'rouge-1': {'f': 0.9473684160664821, 'p': 1.0, 'r': 0.9},
  'rouge-2': {'f': 0.823529406782007, 'p': 0.875, 'r': 0.7777777777777778},
  'rouge-l': {'f': 0.9333333283555556, 'p': 1.0, 'r': 0.875}}]
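
To make the ROUGE-1 numbers above easier to interpret, here is a minimal sketch of clipped unigram overlap. It is not the `rouge` package's implementation: `simple_tokens` and `rouge_1` are hypothetical helpers, and the tokenizer simply assumes that periods act as separators. Under those assumptions it gives precision 1.0 and recall 0.9 for the same two strings, in line with the `rouge-1` entry above.

```python
from collections import Counter


def simple_tokens(text):
    """Hypothetical tokenizer: treat periods as separators, then split on whitespace."""
    return text.replace('.', ' ').split()


def rouge_1(ref, pred):
    """Clipped unigram-overlap precision, recall, and F1 (illustrative only)."""
    ref_counts = Counter(simple_tokens(ref))
    pred_counts = Counter(simple_tokens(pred))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & pred_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {'p': precision, 'r': recall, 'f': f1}


ref = 'this is a good text adc. this is sent 2.'
pred = 'this is a good text. this is sent 2'
print(rouge_1(ref, pred))
```

The only reference token missing from the prediction is `adc`, so 9 of the 10 reference tokens are recovered while every predicted token is matched.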

In [None]:
# https://pypi.org/project/rouge-score/
from rouge_score import rouge_scorer
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'],
                                  use_stemmer=True)
ref = 'this is a good text adc. this is sent 2.'
pred = 'this is a good text. this is sent 2'
rouge_score = scorer.score(ref, pred)  # here the prediction is the second argument; see the documentation linked above
rouge_score  # e.g. rouge_score['rouge1'].fmeasure gives the ROUGE-1 F-score

{'rouge1': Score(precision=1.0, recall=0.9, fmeasure=0.9473684210526316),
 'rouge2': Score(precision=0.875, recall=0.7777777777777778, fmeasure=0.823529411764706),
 'rougeL': Score(precision=1.0, recall=0.9, fmeasure=0.9473684210526316)}
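
ROUGE-L is based on the longest common subsequence (LCS) of the two token sequences. The sketch below is a hedged illustration, not the code of either library: `simple_tokens`, `lcs_length`, and `rouge_l` are hypothetical helpers, no stemming is applied, and the whole text is treated as one sequence. With those assumptions it yields precision 1.0 and recall 0.9, matching the `rougeL` entry printed by `rouge_score` above. The `rouge` package reports a recall of 0.875 for the same strings, so the two libraries evidently compute ROUGE-L differently.

```python
def simple_tokens(text):
    """Hypothetical tokenizer: treat periods as separators, then split on whitespace."""
    return text.replace('.', ' ').split()


def lcs_length(a, b):
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(ref, pred):
    """LCS-based precision, recall, and F1 over whole-text token sequences."""
    ref_tokens, pred_tokens = simple_tokens(ref), simple_tokens(pred)
    lcs = lcs_length(ref_tokens, pred_tokens)
    precision = lcs / max(len(pred_tokens), 1)
    recall = lcs / max(len(ref_tokens), 1)
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {'p': precision, 'r': recall, 'f': f1}


ref = 'this is a good text adc. this is sent 2.'
pred = 'this is a good text. this is sent 2'
print(rouge_l(ref, pred))
```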

### BLEU

In [None]:
from nltk.translate.bleu_score import corpus_bleu
from nltk.translate.bleu_score import sentence_bleu
# http://www.nltk.org/api/nltk.translate.html#nltk.translate.bleu_score.sentence_bleu

ref = 'this is a good text .'
pred = 'this is 2 good text .'
bleu_score = sentence_bleu([ref], pred, weights=(1,0,0,0))  # wrong: raw strings are iterated character by character, so this scores character overlap
print(bleu_score)
bleu_score = sentence_bleu([ref.split(' ')], pred.split(' '), weights=[1,0,0,0])  # right: the references and the hypothesis are lists of tokens
print(bleu_score)

0.9523809523809523
0.8333333333333334


Corpus/Sentence contains 0 counts of 4-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().


### METEOR

In [None]:
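# METEOR relies on WordNet data; run nltk.download('wordnet') once beforehand.
from nltk.translate.meteor_score import meteor_score
# The references come first, as a list; recent NLTK releases expect pre-tokenized input.
# meteor_score([ref.split(' ')], pred.split(' '))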