<a href="https://colab.research.google.com/github/ajit-rajput/nlp-text-summarizer/blob/main/text_summarization_techniques.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Text Summarization

This notebook walks through several ways to summarize text, most of them based on deep learning models.

**Types of Summarizers**

**1. Extractive summarizers** select important sentences from the input text to form a summary. Most summarization approaches today are extractive in nature.

**2. Abstractive summarizers** do not select sentences from the original passage to create the summary. Instead, they paraphrase its main content, possibly using vocabulary that does not appear in the original document.
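To make the extractive idea concrete, here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top ones. It is only an illustration, not the method used by any library in this notebook; the function name `extractive_summary`, the regex-based sentence splitting, and the toy document are all assumptions made for the example.

```python
import re
from collections import Counter

def extractive_summary(document, num_sentences=1):
    """Pick the highest-scoring sentences, keeping their original order."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # Word frequencies over the whole document, lowercased.
    freqs = Counter(re.findall(r"[a-z']+", document.lower()))

    def score(sentence):
        # Average document frequency of the words in this sentence.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freqs[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)

toy_document = (
    "Solar panels convert sunlight into electricity. "
    "Many homes now install solar panels to cut electricity bills. "
    "The weather was pleasant yesterday."
)
toy_summary = extractive_summary(toy_document, num_sentences=1)
print(toy_summary)
```

Every sentence in the result is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive model would instead generate new wording.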

In [None]:
# Pin gensim below 4.0, which still ships the summarization module
!pip install "gensim<4.0"
!pip install bert-extractive-summarizer
!pip install spacy
!pip install transformers
!pip install torch
!pip install sentencepiece



In [None]:
text = """
Tesla reported second-quarter earnings after the bell Monday, and it’s a beat on both the top and bottom lines. Shares rose about 2% after-hours. Here are the results.
Earnings: $1.45 vs 98 cents per share adjusted expected, according to Refinitiv. Revenue: $11.96 billion vs $11.30 billion expected, according to Refinitiv
Tesla reported $1.14 billion in (GAAP) net income for the quarter, the first time it has surpassed $1 billion. In the year-ago quarter, net income amounted to $104 million.
Overall automotive revenue came in at $10.21 billion, of which only $354 million, about 3.5%, came from sales of regulatory credits. That’s a lower number for credits than in any of the previous four quarters. Automotive gross margins were 28.4%, higher than in any of the last four quarters.
Tesla had already reported deliveries (its closest approximation to sales) of 201,250 electric vehicles, and production of 206,421 total vehicles, during the quarter ended June 30, 2021.
The company also reported $801 million in revenue from its energy business, including solar photovoltaics and energy storage systems for homes, businesses and utilities, an increase of more than 60% from last quarter. While Tesla does not disclose how many energy storage units it sells each quarter, in recent weeks CEO Elon Musk said, in court, 
that the company would only be able to produce 30,000 to 35,000 at best during the current quarter, blaming the lag on chip shortages. Tesla also reported $951 million in services and other revenues. The company now operates 598 stores and service centers, and a mobile service fleet including 1,091 vehicles, 
an increase of just 34% versus a year ago. That compares with an increase of 121% in vehicle deliveries year over year.A $23 million impairment related to the value of its bitcoin holdings was reported as an operating expense under “Restructuring and other.”
"""

## Summarization using Gensim

In [None]:
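# Gensim's TextRank-based extractive summarizer
from gensim.summarization import summarize

# Hugging Face pipeline API
from transformers import pipeline

# T5 model and tokenizer
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# bert-extractive-summarizer (BERT, GPT2 and XLNet based extractive summarizers)
from summarizer import Summarizer, TransformerSummarizer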

In [None]:
# ratio is the fraction of the original sentences to keep in the summary
summary_by_ratio = summarize(text, ratio=0.15)
print("Summary : " + summary_by_ratio)

Summary : The company also reported $801 million in revenue from its energy business, including solar photovoltaics and energy storage systems for homes, businesses and utilities, an increase of more than 60% from last quarter.
Tesla also reported $951 million in services and other revenues.


In [None]:
# word_count limits the summary to roughly this many words
summary_by_count = summarize(text, word_count=60)
print("Summary : " + summary_by_count)

Summary : Overall automotive revenue came in at $10.21 billion, of which only $354 million, about 3.5%, came from sales of regulatory credits.
The company also reported $801 million in revenue from its energy business, including solar photovoltaics and energy storage systems for homes, businesses and utilities, an increase of more than 60% from last quarter.
Tesla also reported $951 million in services and other revenues.


## Abstractive Summarization using Transformers

## Summarization using Pipeline API

In [None]:
summarization = pipeline("summarization")

In [None]:
abstract_text = summarization(text)[0]['summary_text']
print("Summary:", abstract_text)

Summary:  Tesla reported $1.14 billion in (GAAP) net income for the quarter, the first time it has surpassed $1 billion . The company also reported $801 million in revenue from its energy business, including solar photovoltaics and energy storage systems . Shares rose about 2% after-hours .


In [None]:
# Use T5 with the TensorFlow backend
t5summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")


[{'summary_text': 'the company reported $1.45 vs 98 cents per share adjusted expected .'}]

In [None]:
t5summarizer(text, min_length=5, max_length=60)

[{'summary_text': 'Tesla reported $1.45 vs 98 cents per share adjusted expected, according to Refinitiv . overall automotive revenue came in at $10.21 billion, of which only $354 million, about 3.5%, came from sales of regulatory credits .'}]

## Summarization using T5 Transformer

In [None]:
t5model = T5ForConditionalGeneration.from_pretrained('t5-small')
t5tokenizer = T5Tokenizer.from_pretrained('t5-small')
device = torch.device('cpu')

In [None]:
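# Call the tokenizer directly (not .encode) so that the result contains
# both 'input_ids' and 'attention_mask', which generate() uses below.
t5tokenized_text = t5tokenizer("summarize: " + text,
                               truncation=True,
                               max_length=512,
                               return_attention_mask=True,
                               add_special_tokens=True,
                               padding='max_length',
                               return_tensors="pt").to(device)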


## Beam search

Beam search reduces the risk of missing high-probability word sequences that are hidden behind a low-probability first word. It keeps the num_beams most likely hypotheses at each time step and finally chooses the hypothesis with the highest overall probability.

Read more about Greedy Search and Beam Search here : https://huggingface.co/blog/how-to-generate
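The difference between greedy search and beam search can be sketched with a toy example. The table of next-token probabilities below is hypothetical and stands in for a real language model; the function `beam_search` is an illustrative helper, not part of the transformers library.

```python
import math

# Hypothetical next-token probabilities for a toy "language model".
NEXT_TOKEN_PROBS = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}

def beam_search(start, steps, num_beams):
    """Keep the num_beams most probable hypotheses after every step."""
    beams = [(0.0, start)]  # (sum of log-probabilities, token tuple)
    for _ in range(steps):
        candidates = []
        for logp, tokens in beams:
            for token, p in NEXT_TOKEN_PROBS[tokens].items():
                candidates.append((logp + math.log(p), tokens + (token,)))
        # Prune to the num_beams best candidates.
        beams = sorted(candidates, reverse=True)[:num_beams]
    best_logp, best_tokens = beams[0]
    return " ".join(best_tokens), math.exp(best_logp)

# num_beams=1 is greedy search: only the single best token is kept per step.
greedy_text, greedy_prob = beam_search(("The",), steps=2, num_beams=1)
beam_text, beam_prob = beam_search(("The",), steps=2, num_beams=2)
print(greedy_text, round(greedy_prob, 2))  # The nice woman 0.2
print(beam_text, round(beam_prob, 2))      # The dog has 0.36
```

Greedy search commits to "nice" because it is the most likely first word, and so never sees the more probable sequence that starts with "dog". Keeping two hypotheses is enough to find it.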

In [None]:
# Summarize with beam search
t5summary_ids =  t5model.generate(input_ids=t5tokenized_text['input_ids'],
                 attention_mask=t5tokenized_text['attention_mask'],
                 num_beams=3,
                 min_length=20,
                 max_length=70,
                 repetition_penalty=2.0,
                 early_stopping=True)

1. max_length: the maximum number of tokens to generate.

2. min_length: the minimum number of tokens to generate.

3. length_penalty: an exponential penalty on sequence length, used with beam search. The default is 1.0; larger values favor longer outputs and smaller values favor shorter ones. It is not set in the call above, so the default applies.

4. num_beams: values greater than 1 make the model use beam search instead of greedy search. With num_beams=3, as above, the three most likely hypotheses are kept at each step (greedy search keeps only one).

5. early_stopping: set to True so that generation finishes once all beam hypotheses have reached the end-of-sequence (EOS) token.

6. repetition_penalty: values above 1.0 penalize tokens that have already appeared, which discourages repeated phrases; 1.0 means no penalty.
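The effect of length_penalty can be illustrated with a small calculation. The sketch below assumes that a finished beam hypothesis is scored as its summed log-probability divided by its length raised to length_penalty, which is my understanding of how the transformers beam scorer normalizes hypotheses; treat the exact formula as an assumption. The helper `beam_score` and the two hypotheses are made up for the example.

```python
def beam_score(sum_logprobs, length, length_penalty=1.0):
    """Length-normalized score: sum_logprobs / length ** length_penalty (assumed formula)."""
    return sum_logprobs / (length ** length_penalty)

# Two hypothetical finished hypotheses: (sum of log-probabilities, number of tokens).
short_hyp = (-4.0, 4)
long_hyp = (-9.0, 10)

for penalty in (0.0, 1.0, 2.0):
    short_score = beam_score(*short_hyp, length_penalty=penalty)
    long_score = beam_score(*long_hyp, length_penalty=penalty)
    winner = "long" if long_score > short_score else "short"
    print(f"length_penalty={penalty}: short={short_score:.3f} long={long_score:.3f} -> {winner}")
```

Log-probabilities are negative, so dividing by a larger power of the length moves long hypotheses closer to zero. In this toy case the short hypothesis wins without normalization (penalty 0.0), and the long one wins once the penalty is 1.0 or higher.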


In [None]:
output = t5tokenizer.decode(t5summary_ids[0],  
                             skip_special_tokens=True, 
                             clean_up_tokenization_spaces=True)
print("Summary:", output)



Summarized text: 
 shares rose about 2% after-hours, according to Refinitiv. in the year-ago quarter, net income amounted to $104 million. overall automotive revenue came in at $10.21 billion, of which only $354 million, about 3.5%, came from sales of regulatory credits.


## Summarization with GPT2 Model

In [None]:
GPT2_model = TransformerSummarizer(transformer_type="GPT2",
                         transformer_model_key="gpt2-medium")




In [None]:
gpt_summary = ''.join(GPT2_model(text, min_length=60))
print("Summary:" + gpt_summary)

Summary:Tesla reported second-quarter earnings after the bell Monday, and it’s a beat on both the top and bottom lines. In the year-ago quarter, net income amounted to $104 million. The company also reported $801 million in revenue from its energy business, including solar photovoltaics and energy storage systems for homes, businesses and utilities, an increase of more than 60% from last quarter.


## Summarization using XLNet Model

In [None]:
model = TransformerSummarizer(transformer_type="XLNet",transformer_model_key="xlnet-base-cased")

In [None]:
xlnet_summary = ''.join(model(text, min_length=60))
print("Summary: " + xlnet_summary)

Summary: Tesla reported second-quarter earnings after the bell Monday, and it’s a beat on both the top and bottom lines. That’s a lower number for credits than in any of the previous four quarters. The company also reported $801 million in revenue from its energy business, including solar photovoltaics and energy storage systems for homes, businesses and utilities, an increase of more than 60% from last quarter.


## Summarization using BERT Model

In [None]:
bert_model = Summarizer()
bert_summary = ''.join(bert_model(text, min_length=60))
print("Summary: " + bert_summary)


Summary: Tesla reported second-quarter earnings after the bell Monday, and it’s a beat on both the top and bottom lines. The company also reported $801 million in revenue from its energy business, including solar photovoltaics and energy storage systems for homes, businesses and utilities, an increase of more than 60% from last quarter. That compares with an increase of 121% in vehicle deliveries year over year.
