##### This notebook tests how well a pretrained Hugging Face transformer model generates summaries without any additional fine-tuning. The results give a rough idea of the base capability of these models, although only as an approximation, since only one model is evaluated and only a small number of example summaries are generated. The example also allows for some exploratory analysis of the generated summaries and establishes a metric for judging model performance going forward.

# Imports

In [88]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BartTokenizer, BartForConditionalGeneration, AutoModelForSequenceClassification
import pandas as pd
import torch
import spacy
from rouge_score import rouge_scorer
from rouge import Rouge
import random
import numpy as np

In [2]:
spacy.cli.download("en_core_web_md")
nlp = spacy.load("en_core_web_md") 

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')


# Read in data

In [3]:
df_chapters = pd.read_csv('./cleaned_data_copy/cleaned_summaries_and_texts.csv')

In [4]:
df_chapters.head(20)

Unnamed: 0,chapter_title,chapter_summary,book_title,chapters,chapter_text
0,The Age of Innocence: Novel Summary: Chapters 1-3,The story opens at the opera in New York....,The Age of Innocence,Chapters 1-3,"byEdith Wharton Etext prepared by JudithBoss,..."
1,The Age of Innocence: Novel Summary: Chapters 4-6,Archer and May begin their round of betrotha...,The Age of Innocence,Chapters 4-6,In the course of the next day the first of th...
2,The Age of Innocence: Novel Summary: Chapters 7-9,Mrs Archer and her son call on the van der L...,The Age of Innocence,Chapters 7-9,Mrs. Henry van der Luyden listened in silence...
3,The Age of Innocence: Novel Summary: Chapters ...,Archer tells May about his having sent roses...,The Age of Innocence,Chapters 10-12,"The Countess Olenska had said ""after five""; a..."
4,The Age of Innocence: Novel Summary: Chapters ...,"At the theatre, Archer is moved by an incide...",The Age of Innocence,Chapters 13-15,It was a crowded night at Wallack's theatre. ...
5,The Age of Innocence: Novel Summary: Chapters ...,"Archer arrives at St Augustine, impatient to...",The Age of Innocence,Chapters 16-18,When Archer walked down the sandy main street...
6,The Age of Innocence: Novel Summary: Chapters ...,"Under the eyes of New York society, Archer m...",The Age of Innocence,Chapters 19-21,"The day was fresh, with a lively spring wind ..."
7,The Age of Innocence: Novel Summary: Chapters ...,Mr and Mrs Emerson Sillerton invite the Well...,The Age of Innocence,Chapters 22-24,"A party for the Blenkers--the Blenkers?"" Mr. ..."
8,The Age of Innocence: Novel Summary: Chapters ...,"As he leaves Boston, Archer feels tranquil, ...",The Age of Innocence,Chapters 25-27,"Once more on the boat, and in the presence of..."
9,The Age of Innocence: Novel Summary: Chapters ...,Archer sends a telegram to Ellen asking her ...,The Age of Innocence,Chapters 28-30,"Ol-ol--howjer spell it, anyhow?"" asked the ta..."


# Test pretrained summarization model with one row of data
model from https://huggingface.co/docs/transformers/master/en/model_doc/bart#transformers.BartForConditionalGeneration

In [5]:
test_text = df_chapters.iloc[10]['chapter_text']

In [6]:
checkpoint = "sshleifer/distilbart-xsum-1-1"

In [7]:
model = BartForConditionalGeneration.from_pretrained(checkpoint)

In [8]:
tokenizer = BartTokenizer.from_pretrained(checkpoint)

In [9]:
# truncation=True makes the cut-off at 1024 tokens explicit (it was previously applied by default, with a warning)
inputs = tokenizer(test_text, max_length=1024, truncation=True, return_tensors="pt")


In [10]:
summary_ids = model.generate(inputs["input_ids"], num_beams=2, max_length=2000)

In [11]:
model_summary1 = tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
model_summary1

' The mother of a woman who died in a head-to-be-hought Hibernian was a bit of her life, the BBC has learned.'

This model-generated summary does not make much sense. The influence of the model's training on news sources (the XSum dataset consists of BBC articles) can definitely be seen in the reference to 'the BBC.'

## Try another model
model from https://huggingface.co/facebook/bart-large-xsum

In [12]:
checkpoint2 = "facebook/bart-large-xsum"

In [13]:
tokenizer2 = AutoTokenizer.from_pretrained(checkpoint2)

In [14]:
inputs2 = tokenizer2(test_text, padding=True, truncation=True, return_tensors="pt")

In [15]:
model2 = AutoModelForSeq2SeqLM.from_pretrained(checkpoint2)

In [16]:
summary_ids2 = model2.generate(inputs2["input_ids"], num_beams=2, max_length=5000)

In [17]:
#Model-generated summary

model_summary2 = tokenizer2.batch_decode(summary_ids2, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
model_summary2

"As he walked home from Mrs. Mingott's house, he was conscious of a growing distaste for what lay before him."

In [18]:
#Actual summary for chapter text passed to model

df_chapters.iloc[10]['chapter_summary']

'  Archer is stunned at Mrs Mingott\'s news. He feels that Ellen, frightened at the prospect of his making a decisive step from which there would be no possibility of return, has decided to compromise and have an affair with him. He worries that an affair might draw them into a life of lies and deceptions, and is concerned about how he will be perceived by society. He waits near Beaufort\'s house for Ellen. When she comes out, he grasps her hand and tells her that they will be together. He notices Lefferts and Chivers discreetly avoiding them, and is sickened by the thought that this could be their future. They arrange to meet the next day at the Metropolitan Museum. In the museum, Ellen tells Archer that she has decided to stay with Mrs Mingott because she feels she will be safer from temptation. She does not want to do irreparable harm, as others do in such situations. Archer protests that he is no different from the others; he has the same longings. She wavers, and asks if she shoul

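This summary seems to be better than the one generated by the previous model: it is coherent and clearly relates to the chapter, although it covers only a small part of the reference summary.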

## Try with another test text

In [19]:
test_text2 = df_chapters.iloc[12]['chapter_text']

In [20]:
#Actual summary for chapter text passed to model

text_summary = df_chapters.iloc[12]['chapter_summary']
text_summary

' Chapter one introduces Mr. and Mrs. Bennet of the Longbourn estate.\xa0 Mrs. Bennet has been told that a "young man of large fortune from the north of England" is moving to Netherfield, an estate near theirs, and she has designs on marrying him to one of her daughters.\xa0 Mrs. Bennet says that Mr. Bennet must go and see Bingley, the new neighbor, "as soon as he comes," and that he should think of his daughters and what a good marriage it would be. Mr. Bennet\'s preference for his daughter Elizabeth also becomes evident, when he says she "has something more of quickness than her sisters," whom he describes as "silly and ignorant like other girls." Mr. Bennet teasingly questions why his visit to Bingley could be so important.  Elizabeth, as well as three of her four sisters, Kitty, Mary, and Lydia are briefly introduced in chapter two.\xa0 While in Chapter one Mr. Bennet teases his wife saying he will not visit Bingley as soon as he arrives, in Chapter two we learn that indeed "Mr. Be

In [21]:
inputs3 = tokenizer2(test_text2, padding=True, truncation=True, return_tensors="pt")

In [22]:
summary_ids3 = model2.generate(inputs3["input_ids"], num_beams=2, max_length=5000)

In [23]:
#Model-generated summary

model_summary3 = tokenizer2.batch_decode(summary_ids3, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
model_summary3

'A young man of large fortune is to take Netherfield Park, and his daughters are to be the first to see him, writes Jane Austen.'

Based on a subjective comparison with the reference summary, this summary is not very accurate, but it is grammatical and seems to capture the overall tone of the reference summary along with some of its most important words.

# Exploratory analysis of model and evaluation metrics

## Comparison of sentiment of test and model-generated summaries

### Function to return sentiment analysis scores of text 
This needs to be done step by step, as opposed to with the Hugging Face transformers pipeline, in order to handle longer pieces of text.

In [24]:
def sentiment_analyzer(text):
  sent_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
  sent_tokenizer = AutoTokenizer.from_pretrained(sent_checkpoint)
  sent_inputs = sent_tokenizer(text, padding=True, truncation=True, return_tensors="pt")
  sent_model = AutoModelForSequenceClassification.from_pretrained(sent_checkpoint)
  sent_outputs = sent_model(**sent_inputs)
  sent_preds = torch.nn.functional.softmax(sent_outputs.logits, dim=-1)
  return f'POSITIVE: {sent_preds[0][1]}  ---  NEGATIVE: {sent_preds[0][0]}'

In [25]:
sentiment_analyzer(text_summary)

'POSITIVE: 0.6427821516990662  ---  NEGATIVE: 0.35721784830093384'

In [26]:
sentiment_analyzer(model_summary3)

'POSITIVE: 0.998404324054718  ---  NEGATIVE: 0.001595696434378624'
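The POSITIVE and NEGATIVE values returned above are a softmax over the two logits the classifier produces, which is why they always sum to 1. The following is a minimal sketch of that step in plain Python; the logit values are hypothetical and chosen only for illustration, and the `[NEGATIVE, POSITIVE]` ordering is taken from how `sentiment_analyzer` indexes its predictions.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in [NEGATIVE, POSITIVE] order, for illustration only
example_logits = [-1.2, 2.3]
negative, positive = softmax(example_logits)
print(f"POSITIVE: {positive}  ---  NEGATIVE: {negative}")
```

Because the two probabilities are complementary, comparing only the POSITIVE score of two texts loses no information.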

### Function to compare sentiment analysis scores of two pieces of text
Returns the absolute difference between the POSITIVE scores of the two texts (a value between 0 and 1; closer to 0 means more similar sentiment).

In [27]:
def sentiment_comparison(test_summary_text, model_summary_text):
    sent_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
    sent_tokenizer = AutoTokenizer.from_pretrained(sent_checkpoint)
    sent_model = AutoModelForSequenceClassification.from_pretrained(sent_checkpoint)

    test_sent_inputs = sent_tokenizer(test_summary_text, padding=True, truncation=True, return_tensors="pt")
    test_sent_outputs = sent_model(**test_sent_inputs)
    test_sent_preds = torch.nn.functional.softmax(test_sent_outputs.logits, dim=-1)
    test_positive = test_sent_preds[0][1]

    model_sent_inputs = sent_tokenizer(model_summary_text, padding=True, truncation=True, return_tensors="pt")
    model_sent_outputs = sent_model(**model_sent_inputs)
    model_sent_preds = torch.nn.functional.softmax(model_sent_outputs.logits, dim=-1)
    model_positive = model_sent_preds[0][1]

    return abs(float(model_positive) - float(test_positive))

    

In [28]:
sentiment_comparison(text_summary, model_summary3)

0.35562217235565186

The sentiment analysis scores of the reference summary and the model-generated summary are quite different in this case: the model-generated summary scores much higher on positivity. This is likely due in part to its much shorter length. It seems plausible that a longer piece of text would be more likely to score as neutral, since it contains more words that can balance one another out.

## spaCy similarity

### Function to find spaCy similarity (estimated similarity score) between two pieces of text (1 is most similar, 0 is least)

In [31]:
def get_similarity(test_summary_text, model_summary_text):
  test_doc = nlp(test_summary_text)
  model_doc = nlp(model_summary_text)
  return test_doc.similarity(model_doc)

In [32]:
get_similarity(text_summary, model_summary3)

0.9693890661161696

Based on this score, the model-generated summary is very close to the actual summary for this example.
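A score this high deserves some caution. For pipelines with static word vectors such as `en_core_web_md`, `Doc.similarity` by default compares the averaged token vectors of the two documents using cosine similarity, and averages over many words tend to point in similar directions. The sketch below reproduces that calculation in plain Python with hypothetical 3-dimensional vectors (real spaCy vectors are much larger); it illustrates the idea and is not spaCy's actual implementation.

```python
import math

def average_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical "word vectors" for a longer and a shorter text
long_text_vectors = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.4], [0.5, 0.5, 0.5], [0.7, 0.2, 0.6]]
short_text_vectors = [[0.8, 0.3, 0.4], [0.3, 0.6, 0.5]]

similarity = cosine_similarity(
    average_vector(long_text_vectors), average_vector(short_text_vectors)
)
print(similarity)
```

Even though the individual word vectors differ, the averaged vectors end up nearly parallel, so the similarity is close to 1.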

## Rouge score - primary method for summarization model analysis
Rouge calculates precision, recall, and F-scores for the two texts based on the overlap of words between them. Rouge1 considers unigrams, Rouge2 considers bigrams, and RougeL uses the longest common subsequence of words (the 'longest matching sequence'). All scores are between 0 and 1, and for all of them a score closer to 1 is better.
<br>
source: https://www.freecodecamp.org/news/what-is-rouge-and-how-it-works-for-evaluation-of-summaries-e059fb8ac840/ 
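To make the n-gram overlap concrete, here is a minimal sketch of a Rouge-N calculation in plain Python. It assumes simple lowercased whitespace tokenization and no stemming, so its numbers will not exactly match those of the `rouge_score` package; the two example sentences are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference, prediction, n=1):
    """Return (precision, recall, f-measure) for n-gram overlap."""
    ref_counts = ngrams(reference.lower().split(), n)
    pred_counts = ngrams(prediction.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram
    overlap = sum((ref_counts & pred_counts).values())
    pred_total = sum(pred_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / pred_total if pred_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Made-up sentences for illustration
reference = "the cat sat on the mat"
prediction = "the cat lay on the mat"

print(rouge_n(reference, prediction, n=1))  # 5 of 6 unigrams overlap
print(rouge_n(reference, prediction, n=2))  # 3 of 5 bigrams overlap
```

Precision divides the overlap by the length of the prediction, while recall divides it by the length of the reference, so a short prediction scored against a long reference can have high precision and very low recall at the same time.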

In [41]:
scorer = rouge_scorer.RougeScorer(rouge_types=['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)

In [42]:
scores = scorer.score(target=text_summary, prediction=model_summary3)

In [43]:
scores

{'rouge1': Score(precision=0.84, recall=0.045951859956236324, fmeasure=0.08713692946058092),
 'rouge2': Score(precision=0.3333333333333333, recall=0.017543859649122806, fmeasure=0.03333333333333333),
 'rougeL': Score(precision=0.76, recall=0.04157549234135667, fmeasure=0.07883817427385892)}

In [63]:
scores['rouge1']

Score(precision=0.84, recall=0.045951859956236324, fmeasure=0.08713692946058092)
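The `fmeasure` value is the harmonic mean of precision and recall, which explains why it stays low here despite the high precision: the harmonic mean is dominated by the smaller of the two values. A short check using the rouge1 precision and recall reported above (the helper name `f_measure` is only for this illustration):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# rouge1 values copied from the output above
rouge1_precision = 0.84
rouge1_recall = 0.045951859956236324

rouge1_f = f_measure(rouge1_precision, rouge1_recall)
print(round(rouge1_f, 4))
```

The result agrees with the reported `fmeasure` of roughly 0.087.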

The value of n for the n-grams being considered in the score can be increased

In [44]:
# Create a separate scorer for 5-grams without overwriting the original `scorer`
scorer2 = rouge_scorer.RougeScorer(rouge_types=['rouge5'], use_stemmer=True)

In [45]:
scores2 = scorer2.score(target=text_summary, prediction=model_summary3)
scores2

{'rouge5': Score(precision=0.09523809523809523, recall=0.004415011037527594, fmeasure=0.008438818565400845)}

# Pre-trained model for summarization

## Select data

In [54]:
data = df_chapters[['chapter_text', 'chapter_summary']]

## Set up model

In [60]:
bl_checkpoint = "facebook/bart-large-xsum"
bl_tokenizer = AutoTokenizer.from_pretrained(bl_checkpoint)
baseline_model = AutoModelForSeq2SeqLM.from_pretrained(bl_checkpoint)

baseline_model_summaries = []

def baseline_summary(chapter):
  bl_inputs = bl_tokenizer(chapter, padding=True, truncation=True, return_tensors="pt")
  bl_summary_ids = baseline_model.generate(bl_inputs["input_ids"], num_beams=2, max_length=5000)  
  bl_summary = bl_tokenizer.batch_decode(bl_summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
  return bl_summary

## Run and score model

### Test run 10 randomly selected chapters

In [66]:
rand_indexes = []
for i in range(10):
  rand_indexes.append(random.randint(0,295))

In [67]:
rand_indexes

[206, 256, 233, 270, 258, 117, 80, 10, 51, 121]
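Note that `random.randint` can return the same index more than once, and the selection changes on every run. A minimal sketch of a reproducible alternative that draws distinct indexes is shown below; the seed value is arbitrary, and the row count of 296 is an assumption inferred from the `randint(0, 295)` range used above.

```python
import random

random.seed(42)  # arbitrary seed, chosen only so the draw can be repeated

n_chapters = 296  # assumption: matches the 0..295 range used with randint
# random.sample draws without replacement, so no chapter is picked twice
sample_indexes = random.sample(range(n_chapters), k=10)
print(sample_indexes)
```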

In [89]:
reference_summaries = []
model_summaries = []


for i in rand_indexes:
  chapter_text = data.iloc[i]['chapter_text']
  chapter_summary = data.iloc[i]['chapter_summary']

  model_summary = baseline_summary(chapter_text)

  reference_summaries.append(chapter_summary)
  model_summaries.append(model_summary)


rouge = Rouge()
rouge.get_scores(model_summaries, reference_summaries, avg=True)

{'rouge-1': {'f': 0.06483084507293815,
  'p': 0.400385158298044,
  'r': 0.03683754277250695},
 'rouge-2': {'f': 0.005290374588741339,
  'p': 0.060648282072430684,
  'r': 0.0028607933694909564},
 'rouge-l': {'f': 0.05838192852861028,
  'p': 0.36527021576930835,
  'r': 0.03314770206159899}}

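Overall, based on the Rouge scores, this model did not perform very well: average rouge-1 precision is moderate (about 0.40), but recall is very low (about 0.04), which is what happens when short generated summaries are compared with much longer reference summaries.
<br>
Future fine-tuned models should score higher on these metrics than this baseline model. This model was also tested on only a small subset of the data in the interest of time. Future models will be trained and tested with a larger portion of the book texts and summaries dataset.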