# Problem Statement: Text Summarization

Natural Language Processing (NLP) is the area of machine learning this project works in: the goal is to train a model that can summarize text.

Text summarization is the process of creating a summary of a document that contains the most important information of the original; its purpose is to capture the document's main points.

In general, there are two types of summarization: abstractive and extractive.
- Abstractive Summarization:
  - Abstractive summarization teaches the system to grasp the entire context and then generate a summary based on that understanding, possibly using words that do not appear in the original. The resulting summaries read more like human-written ones and are often more useful than extractive alternatives.
  
- Extractive Summarization:
  - Extractive summarization selects existing text fragments (usually sentences) according to weights assigned to key words: a fragment's score is determined by the weights of the words it contains, and by default those weights are typically based on how frequently each word appears. The minimum and maximum number of sentences in the summary can be set to control its length.


In the given article, summarization is based only on the frequency of words in the text. On a large dataset covering many different topics, such a model cannot capture the essence of each topic using word frequencies alone.
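
A minimal sketch of the frequency-based extractive approach described above, in plain Python and without any NLP library. The regex sentence splitter, the tiny stop-word list, and the name `extractive_summary` are illustrative assumptions, not code from the referenced article.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; real systems use a much larger one
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it", "that", "on", "for"}


def extractive_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences, scoring each by the frequency of its words."""
    # Naive sentence split on '.', '!' or '?' followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    if not freq:
        return ""
    max_freq = max(freq.values())
    # Normalise word weights to the range (0, 1]
    weights = {w: c / max_freq for w, c in freq.items()}

    def score(sentence):
        return sum(weights.get(w, 0.0) for w in re.findall(r"[a-z']+", sentence.lower()))

    # Rank sentences by score, keep the top ones, then restore their original order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Because every sentence is scored only by word counts, the sketch also shows the limitation noted above: it has no notion of topic or meaning beyond repetition.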

### Goal
The main objective is to understand different approaches to text summarization in NLP using Python.

We use two approaches: a Transformer trained from scratch with deep learning (a resource-intensive method), and a pretrained Transformer model from HuggingFace.
Deploying both shows how a model trained on our own data and a pretrained model differ in the summaries they produce.


# Importance 

The amount of data being created digitally is enormous, so methods for quickly summarizing lengthy texts while retaining the core idea are essential. Text summarization also enables users to read faster, find information more quickly, and take in as much information as possible.

The basic objective of text summarization using machine learning is to condense the reference material while preserving its information and meaning. One commonly cited definition describes a summary as text that is produced from one or more documents, conveys the important information in the original text, and is no longer than half of the original (usually considerably shorter).

With the help of a text summarization model, we can therefore save a great deal of time when extracting important information from large volumes of text.

# Project Information

The following steps were taken to create the text summarization model:


1. Data preprocessing
2. Tokenizing text into numerical tokens
3. Obtaining maxlen for the numeric tokens
4. Padding/truncating sequences to identical lengths
5. Creating the dataset pipeline
6. Positional encoding
7. Masking:
    - padding mask for padded sequences
    - look-ahead mask, which prevents future tokens from influencing the current token in the decoder's self-attention
8. Building the model:
    - encoder
    - decoder
    - Transformer
    - Adam optimizer
    - metrics
    - inference
    - summarization
    - ROUGE score

### Using a Pretrained Model from HuggingFace

1. Import the model
2. Load the pretrained `facebook/bart-large-cnn` weights (no training or fitting is needed)
3. Summarize
4. Check the ROUGE score



In [1]:
# importing required libraries/ dependencies
import pandas as pd
import numpy as np
import tensorflow as tf
import time
import re
# import pickle

In [2]:
news_data = pd.read_excel("news.xlsx")

In [3]:
news_data.drop(['Source ', 'Time ', 'Publish Date'], axis=1, inplace=True)

In [4]:
news_data.head(10)

| Unnamed: 0 | Headline | Short |
|---|---|---|
| 0 | 4 ex-bank officials booked for cheating bank o... | The CBI on Saturday booked four former officia... |
| 1 | Supreme Court to go paperless in 6 months: CJI | Chief Justice JS Khehar has said the Supreme C... |
| 2 | At least 3 killed, 30 injured in blast in Sylh... | At least three people were killed, including a... |
| 3 | Why has Reliance been barred from trading in f... | Mukesh Ambani-led Reliance Industries (RIL) wa... |
| 4 | Was stopped from entering my own studio at Tim... | TV news anchor Arnab Goswami has said he was t... |
| 5 | New trailer of &#39;Justice League&#39; released | A new trailer for the upcoming superhero film ... |
| 6 | His touch was not right: Shilpa Shinde on sexu... | Television actress Shilpa Shinde, while openin... |
| 7 | Anti-Romeo squads must not trouble consenting ... | Uttar Pradesh Chief Minister Yogi Adityanath, ... |
| 8 | Both Romeo and Juliet are welcome in Delhi: AA... | In an apparent jibe at UP&#39;s anti-Romeo squ... |
| 9 | 30 blasts occur at ordnance factory in MP, 20... | At least 20 people were reportedly injured aft... |


In [5]:
doc = news_data['Short']
sumry = news_data['Headline']

## - Data Preprocessing

In [6]:
sumry = sumry.apply(lambda x: '<start> ' + x + ' <stop>')
sumry.head()

0    <start> 4 ex-bank officials booked for cheatin...
1    <start> Supreme Court to go paperless in 6 mon...
2    <start> At least 3 killed, 30 injured in blast...
3    <start> Why has Reliance been barred from trad...
4    <start> Was stopped from entering my own studi...
Name: Headline, dtype: object

## - Tokenization of texts into integer tokens

In [7]:
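filters = '!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n'  # Keras default filters minus '<' and '>', so '<start>' and '<stop>' survive
remove_token = '<unk>'  # out-of-vocabulary token: replaces words not seen during fitting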

In [8]:
doc_tokenizer = tf.keras.preprocessing.text.Tokenizer(oov_token=remove_token)
sumry_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters=filters, oov_token=remove_token)

In [9]:
doc_tokenizer.fit_on_texts(doc)
sumry_tokenizer.fit_on_texts(sumry)

In [10]:
inputs = doc_tokenizer.texts_to_sequences(doc)
targets = sumry_tokenizer.texts_to_sequences(sumry)
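
To make the text-to-integer mapping concrete, here is a simplified pure-Python sketch of what the Keras `Tokenizer` does with the settings above: lowercase the text, replace every character in `filters` with a space, split on whitespace, and number words by descending frequency, with 0 reserved for padding and 1 for the OOV token. The helper names `fit_word_index` and `to_sequence` are hypothetical, and Keras's own implementation has more options.

```python
from collections import Counter

# Same character set as the `filters` string above ('<' and '>' deliberately absent)
FILTERS = '!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n'


def fit_word_index(texts, filters=FILTERS, oov_token="<unk>"):
    """Build a word -> id map: id 0 stays free for padding, 1 is the OOV token,
    and the remaining ids follow descending word frequency."""
    table = str.maketrans({c: " " for c in filters})
    counts = Counter()
    for text in texts:
        counts.update(text.lower().translate(table).split())
    # Counter.most_common keeps first-seen order among equal counts
    vocab = [oov_token] + [w for w, _ in counts.most_common()]
    return {w: i + 1 for i, w in enumerate(vocab)}


def to_sequence(text, word_index, filters=FILTERS, oov_token="<unk>"):
    """Map a text to ids, sending unknown words to the OOV id."""
    table = str.maketrans({c: " " for c in filters})
    oov_id = word_index[oov_token]
    return [word_index.get(w, oov_id) for w in text.lower().translate(table).split()]
```

Because `<` and `>` are not in the filter set, `<start>` and `<stop>` survive as single tokens; this is why the summary tokenizer uses the custom `filters` string while the document tokenizer keeps the defaults.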

## - Testing the text-to-token conversion


In [11]:
sumry_tokenizer.texts_to_sequences(["This is a test"])

[[184, 22, 12, 71]]

In [12]:
sumry_tokenizer.sequences_to_texts([[184, 22, 12, 71]])

['this is a test']

## - Vocabulary sizes

In [13]:
evs = len(doc_tokenizer.word_index) + 1  # encoder vocab size; +1 because index 0 is reserved for padding
dvs = len(sumry_tokenizer.word_index) + 1  # decoder vocab size

# vocab sizes
evs, dvs

(76362, 29661)

## Obtaining Insights for defining maxlen

In [14]:
# NOTE: len() on raw strings counts characters, not tokens
doc_lengths = pd.Series([len(x) for x in doc])
sumry_lengths = pd.Series([len(x) for x in sumry])

In [15]:
doc_lengths.describe(), sumry_lengths.describe()

(count    55104.000000
 mean       368.003049
 std         26.235510
 min        280.000000
 25%        350.000000
 50%        369.000000
 75%        387.000000
 max        469.000000
 dtype: float64, count    55104.000000
 mean        66.620282
 std          7.267463
 min         23.000000
 25%         62.000000
 50%         66.000000
 75%         72.000000
 max         99.000000
 dtype: float64)

In [16]:
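encoder_maxlen = 400  # a little above the 75th percentile (387) of the document lengths above
decoder_maxlen = 75   # a little above the 75th percentile (72) of the summary lengths above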
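
The statistics above come from `len()` on raw strings, so they count characters, whereas `encoder_maxlen` and `decoder_maxlen` cap the number of tokens. A minimal sketch of measuring lengths in tokens instead, assuming it runs before padding while `inputs` and `targets` still hold the variable-length lists from `texts_to_sequences`; the helper `pick_maxlen` is hypothetical:

```python
import numpy as np


def pick_maxlen(sequences, percentile=75):
    """Return the given percentile of sequence lengths (in tokens), rounded up."""
    lengths = np.array([len(s) for s in sequences])
    return int(np.ceil(np.percentile(lengths, percentile)))
```

For example, `pick_maxlen(inputs)` and `pick_maxlen(targets)` would give token-based 75th-percentile cutoffs. Since a word spans several characters, these are likely to be much smaller than 400 and 75, which would shorten every padded sequence and speed up training.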

## Padding/Truncating sequences to identical sequence lengths
- The TensorFlow/Keras Tokenizer handles most of the text preparation: lowercasing, stripping the filtered punctuation, and mapping each word to an integer ID. One more step remains to give the model uniformly shaped input: padding or truncating every sequence to a predetermined length.

In [17]:
inputs = tf.keras.preprocessing.sequence.pad_sequences(inputs, maxlen=encoder_maxlen, padding='post', truncating='post')
targets = tf.keras.preprocessing.sequence.pad_sequences(targets, maxlen=decoder_maxlen, padding='post', truncating='post')
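
With `padding='post'` and `truncating='post'`, zeros are appended after short sequences and long sequences lose their tail (the Keras defaults are `'pre'` for both). A pure-Python sketch of this post/post behaviour; the name `pad_post` is hypothetical:

```python
def pad_post(sequences, maxlen, value=0):
    """Keep the first `maxlen` tokens, then append `value` until the length is `maxlen`."""
    out = []
    for seq in sequences:
        seq = list(seq)[:maxlen]  # truncating='post': drop tokens from the end
        out.append(seq + [value] * (maxlen - len(seq)))  # padding='post': pad at the end
    return out
```

Padding with 0 works because the tokenizer never assigns ID 0 to a word; `create_padding_mask` later relies on this to find the padded positions.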

## - Creating Dataset Pipeline

In [18]:
inputs = tf.cast(inputs, dtype=tf.int32)
targets = tf.cast(targets, dtype=tf.int32)

In [19]:
# arbitrarily chosen values
BUFFER_SIZE = 5000
BATCH_SIZE = 55

In [20]:
dataset = tf.data.Dataset.from_tensor_slices((inputs, targets)).shuffle(BUFFER_SIZE).batch(BATCH_SIZE)

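## - Positional Encoding

Self-attention has no built-in notion of word order: unlike an RNN, the Transformer does not process tokens one after another. Positional encoding therefore adds information about each word's position in the sequence to its embedding.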
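
For reference, the two functions that follow compute the sinusoidal encoding from the original Transformer paper (Vaswani et al., 2017), where $pos$ is the token position, $i$ indexes pairs of embedding dimensions and $d_{model}$ is the embedding size:

$$
PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)
$$

In `get_angles`, the expression `2 * (i // 2)` plays the role of $2i$, so each even/odd pair of dimensions shares one frequency; `positional_encoding` then applies `sin` to the even columns and `cos` to the odd ones.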

In [21]:
def get_angles(position, i, d_model):
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    return position * angle_rates

In [22]:
def positional_encoding(position, d_model):
    angle_rads = get_angles(
        np.arange(position)[:, np.newaxis],
        np.arange(d_model)[np.newaxis, :],
        d_model
    )

    # Applying sin to even indices in the array; 2i
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])

    # Applying cos to odd indices in the array; 2i+1
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])

    pos_encoding = angle_rads[np.newaxis, ...]

    return tf.cast(pos_encoding, dtype=tf.float32)


## - Masking
- Padding mask: hides padding tokens (ID 0) so they receive no attention
- Look-ahead mask: prevents future tokens from influencing the current token in the decoder's self-attention

In [23]:
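def create_padding_mask(seq):
    # 1.0 where the token id is 0 (padding), 0.0 elsewhere
    seq = tf.cast(tf.math.equal(seq, 0), tf.float32)
    # extra dimensions let the mask broadcast over heads and query positions:
    # (batch_size, 1, 1, seq_len)
    return seq[:, tf.newaxis, tf.newaxis, :]

In [24]:
def create_look_ahead_mask(size):
    # band_part(..., -1, 0) keeps the lower triangle; 1 minus that marks the future positions
    mask = 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)
    return mask  # (size, size)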
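
To see what these masks contain, and how `create_masks` later combines them with `tf.maximum`, here is a NumPy re-implementation for one target sequence whose last position is padding. `np.triu(..., k=1)` is used as an equivalent of `1 - band_part(ones, -1, 0)`; the function names are hypothetical:

```python
import numpy as np


def look_ahead_mask_np(size):
    # 1 strictly above the diagonal = future positions that must be hidden
    return np.triu(np.ones((size, size), dtype=np.float32), k=1)


def padding_mask_np(seq):
    # 1 where the token id is 0 (padding), shaped (batch, 1, 1, seq_len)
    seq = np.asarray(seq)
    return (seq == 0).astype(np.float32)[:, np.newaxis, np.newaxis, :]


tar = np.array([[5, 7, 9, 0]])  # one target sequence; the last position is padding
combined = np.maximum(padding_mask_np(tar), look_ahead_mask_np(tar.shape[1]))
print(combined[0, 0])
```

Row t of the combined mask has a 1 in every column the decoder at position t may not attend to: all later positions, plus the padding column.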

# Building the Model

In [25]:
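def scaled_dot_product_attention(q, k, v, mask):
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)

    # scale by sqrt(d_k) so the softmax does not saturate for large depths
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # masked positions (mask == 1) get a huge negative logit, i.e. ~0 weight after softmax
    if mask is not None:
        scaled_attention_logits += (mask * -1e9)

    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)  # (..., seq_len_q, seq_len_k)

    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, depth_v)
    return output, attention_weights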
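
The function above computes $\mathrm{softmax}(QK^\top/\sqrt{d_k} + M \cdot (-10^9))\,V$, where $M$ is the mask. A NumPy sketch of the same computation (the names `softmax` and `attention_np` are hypothetical), run on random inputs with a look-ahead mask, shows that every row of weights sums to 1 and masked positions get essentially zero weight:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_np(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k) + mask * -1e9) V, mirroring scaled_dot_product_attention."""
    d_k = k.shape[-1]
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        logits = logits + mask * -1e9  # masked positions get ~zero weight
    weights = softmax(logits, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))  # 3 positions, depth 4
k = rng.normal(size=(3, 4))
v = rng.normal(size=(3, 4))
mask = np.triu(np.ones((3, 3)), k=1)  # look-ahead mask
out, w = attention_np(q, k, v, mask)
```

Because the first query may only attend to position 0, its output equals the first value vector.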

In [26]:
class MultiHeadAttention(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        self.num_heads = num_heads
        self.d_model = d_model

        assert d_model % self.num_heads == 0

        self.depth = d_model // self.num_heads

        self.wq = tf.keras.layers.Dense(d_model)
        self.wk = tf.keras.layers.Dense(d_model)
        self.wv = tf.keras.layers.Dense(d_model)

        self.dense = tf.keras.layers.Dense(d_model)
        
    def split_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
        return tf.transpose(x, perm=[0, 2, 1, 3])
    
    def call(self, v, k, q, mask):
        batch_size = tf.shape(q)[0]

        q = self.wq(q)
        k = self.wk(k)
        v = self.wv(v)

        q = self.split_heads(q, batch_size)
        k = self.split_heads(k, batch_size)
        v = self.split_heads(v, batch_size)

        scaled_attention, attention_weights = scaled_dot_product_attention(
            q, k, v, mask)

        scaled_attention = tf.transpose(scaled_attention, perm=[0, 2, 1, 3])

        concat_attention = tf.reshape(scaled_attention, (batch_size, -1, self.d_model))
        output = self.dense(concat_attention)
            
        return output, attention_weights
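
The reshapes in `split_heads` and `call` are easy to get wrong. This NumPy sketch uses the notebook's `d_model = 128` and `num_heads = 8` (set in the hyperparameter cell further down) with an arbitrary batch of 2 sequences of length 5, to show the shapes and that merging the heads restores the original tensor:

```python
import numpy as np

batch_size, seq_len, d_model, num_heads = 2, 5, 128, 8
depth = d_model // num_heads  # 16, as in MultiHeadAttention.depth

x = np.arange(batch_size * seq_len * d_model, dtype=np.float32).reshape(batch_size, seq_len, d_model)

# split_heads: (batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)
heads = x.reshape(batch_size, -1, num_heads, depth).transpose(0, 2, 1, 3)

# after attention, call() undoes the split: transpose back, then merge the heads
merged = heads.transpose(0, 2, 1, 3).reshape(batch_size, -1, d_model)
```

Each head therefore works on its own 16-dimensional slice of the projected 128-dimensional representation; head 1, for example, sees dimensions 16 to 31.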

## Feed Forward Network

In [27]:
def point_wise_feed_forward_network(d_model, dff):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(dff, activation='relu'),
        tf.keras.layers.Dense(d_model)
    ])

## - Transformer
- encoder layer
- decoder layer

In [28]:
class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super(EncoderLayer, self).__init__()

        self.mha = MultiHeadAttention(d_model, num_heads)
        self.ffn = point_wise_feed_forward_network(d_model, dff)

        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)
    
    def call(self, x, training, mask):
        attn_output, _ = self.mha(x, x, x, mask)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(x + attn_output)

        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        out2 = self.layernorm2(out1 + ffn_output)

        return out2

In [29]:
class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super(DecoderLayer, self).__init__()

        self.mha1 = MultiHeadAttention(d_model, num_heads)
        self.mha2 = MultiHeadAttention(d_model, num_heads)

        self.ffn = point_wise_feed_forward_network(d_model, dff)

        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)
        self.dropout3 = tf.keras.layers.Dropout(rate)
    
    
    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        attn1, attn_weights_block1 = self.mha1(x, x, x, look_ahead_mask)
        attn1 = self.dropout1(attn1, training=training)
        out1 = self.layernorm1(attn1 + x)

        attn2, attn_weights_block2 = self.mha2(enc_output, enc_output, out1, padding_mask)
        attn2 = self.dropout2(attn2, training=training)
        out2 = self.layernorm2(attn2 + out1)

        ffn_output = self.ffn(out2)
        ffn_output = self.dropout3(ffn_output, training=training)
        out3 = self.layernorm3(ffn_output + out2)

        return out3, attn_weights_block1, attn_weights_block2


In [30]:
class Encoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, maximum_position_encoding, rate=0.1):
        super(Encoder, self).__init__()

        self.d_model = d_model
        self.num_layers = num_layers

        self.embedding = tf.keras.layers.Embedding(input_vocab_size, d_model)
        self.pos_encoding = positional_encoding(maximum_position_encoding, self.d_model)

        self.enc_layers = [EncoderLayer(d_model, num_heads, dff, rate) for _ in range(num_layers)]

        self.dropout = tf.keras.layers.Dropout(rate)
        
    def call(self, x, training, mask):
        seq_len = tf.shape(x)[1]

        x = self.embedding(x)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)
    
        for i in range(self.num_layers):
            x = self.enc_layers[i](x, training, mask)
    
        return x


In [31]:
class Decoder(tf.keras.layers.Layer):
    def __init__(self, num_layers, d_model, num_heads, dff, target_vocab_size, maximum_position_encoding, rate=0.1):
        super(Decoder, self).__init__()

        self.d_model = d_model
        self.num_layers = num_layers

        self.embedding = tf.keras.layers.Embedding(target_vocab_size, d_model)
        self.pos_encoding = positional_encoding(maximum_position_encoding, d_model)

        self.dec_layers = [DecoderLayer(d_model, num_heads, dff, rate) for _ in range(num_layers)]
        self.dropout = tf.keras.layers.Dropout(rate)
    
    def call(self, x, enc_output, training, look_ahead_mask, padding_mask):
        seq_len = tf.shape(x)[1]
        attention_weights = {}

        x = self.embedding(x)
        x *= tf.math.sqrt(tf.cast(self.d_model, tf.float32))
        x += self.pos_encoding[:, :seq_len, :]

        x = self.dropout(x, training=training)

        for i in range(self.num_layers):
            x, block1, block2 = self.dec_layers[i](x, enc_output, training, look_ahead_mask, padding_mask)

            attention_weights['decoder_layer{}_block1'.format(i+1)] = block1
            attention_weights['decoder_layer{}_block2'.format(i+1)] = block2
    
        return x, attention_weights


In [32]:
class Transformer(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff, input_vocab_size, target_vocab_size, pe_input, pe_target, rate=0.1):
        super(Transformer, self).__init__()

        self.encoder = Encoder(num_layers, d_model, num_heads, dff, input_vocab_size, pe_input, rate)

        self.decoder = Decoder(num_layers, d_model, num_heads, dff, target_vocab_size, pe_target, rate)

        self.final_layer = tf.keras.layers.Dense(target_vocab_size)
    
    def call(self, inp, tar, training, enc_padding_mask, look_ahead_mask, dec_padding_mask):
        enc_output = self.encoder(inp, training, enc_padding_mask)

        dec_output, attention_weights = self.decoder(tar, enc_output, training, look_ahead_mask, dec_padding_mask)

        final_output = self.final_layer(dec_output)

        return final_output, attention_weights


## - Training

In [33]:
# hyperparameters
num_layers = 4
d_model = 128
dff = 512
num_heads = 8
EPOCHS = 2

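Adam optimizer with a custom learning-rate schedule: linear warmup for the first `warmup_steps` steps, then decay proportional to the inverse square root of the step number, as in the original Transformer paper.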

In [34]:
class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(CustomSchedule, self).__init__()

        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps
    
    def __call__(self, step):
        # the optimizer may pass an integer step; rsqrt requires a float
        step = tf.cast(step, tf.float32)
        arg1 = tf.math.rsqrt(step)                  # decay phase: step^-0.5
        arg2 = step * (self.warmup_steps ** -1.5)   # warmup phase: linear increase

        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)
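
`CustomSchedule` implements $lrate = d_{model}^{-0.5} \cdot \min(step^{-0.5},\ step \cdot warmup\_steps^{-1.5})$. A plain-Python sketch of the same formula (the name `transformer_lr` is hypothetical) makes the peak at `warmup_steps`, about 0.0014 for `d_model = 128`, easy to check:

```python
def transformer_lr(step, d_model=128, warmup_steps=4000):
    """Same formula as CustomSchedule: d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 1000, 4000, 16000):
    print(step, round(transformer_lr(step), 6))
```

With 55,104 examples and `BATCH_SIZE = 55` there are 1,002 batches per epoch, so the two training epochs cover about 2,004 optimizer steps: training never leaves the warmup phase, and the learning rate is still rising when it stops.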


In [35]:
learning_rate = CustomSchedule(d_model)

optimizer = tf.keras.optimizers.Adam(learning_rate, beta_1=0.9, beta_2=0.98, epsilon=1e-9)

In [36]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

In [37]:
def loss_function(real, pred):
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)

    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_sum(loss_)/tf.reduce_sum(mask)
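
`loss_function` averages the per-token loss over real tokens only. Without the mask, the padding positions that fill the rest of each 75-token target would dominate the average. A NumPy sketch of the same masked mean, using made-up loss values (the name `masked_mean_loss` is hypothetical):

```python
import numpy as np


def masked_mean_loss(real, per_token_loss):
    """Average per-token losses over non-padding positions only (token id 0 = padding)."""
    mask = (np.asarray(real) != 0).astype(np.float32)
    return float((per_token_loss * mask).sum() / mask.sum())


real = np.array([[4, 9, 2, 0, 0]])  # two trailing padding tokens
per_token = np.array([[1.0, 2.0, 3.0, 10.0, 10.0]])  # made-up losses; padding entries are ignored
```

Here `masked_mean_loss(real, per_token)` averages only the first three losses, whereas a plain mean would include the two padding entries.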


In [38]:
train_loss = tf.keras.metrics.Mean(name='train_loss')

In [39]:
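transformer = Transformer(
    num_layers,
    d_model,
    num_heads,
    dff,
    evs,
    dvs,
    # positional-encoding tables only need to cover the longest sequence, not the vocabulary;
    # positions below the sequence length get identical encodings either way
    pe_input=encoder_maxlen,
    pe_target=decoder_maxlen,
)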

In [40]:
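def create_masks(inp, tar):
    # encoder self-attention: hide padding in the input document
    enc_padding_mask = create_padding_mask(inp)
    # decoder cross-attention attends over encoder outputs, so it also masks input padding
    dec_padding_mask = create_padding_mask(inp)

    # decoder self-attention: hide both future tokens and target padding
    look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])
    dec_target_padding_mask = create_padding_mask(tar)
    combined_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)

    return enc_padding_mask, combined_mask, dec_padding_mask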


## - Checkpoints

In [41]:
checkpoint_path = "checkpoints"

ckpt = tf.train.Checkpoint(transformer=transformer, optimizer=optimizer)

ckpt_manager = tf.train.CheckpointManager(ckpt, checkpoint_path, max_to_keep=5)

if ckpt_manager.latest_checkpoint:
    ckpt.restore(ckpt_manager.latest_checkpoint)
    print ('Latest checkpoint restored!!')

In [42]:
@tf.function
def train_step(inp, tar):
    tar_inp = tar[:, :-1]
    tar_real = tar[:, 1:]

    enc_padding_mask, combined_mask, dec_padding_mask = create_masks(inp, tar_inp)

    with tf.GradientTape() as tape:
        predictions, _ = transformer(
            inp, tar_inp, 
            True, 
            enc_padding_mask, 
            combined_mask, 
            dec_padding_mask
        )
        loss = loss_function(tar_real, predictions)

    gradients = tape.gradient(loss, transformer.trainable_variables)    
    optimizer.apply_gradients(zip(gradients, transformer.trainable_variables))

    train_loss(loss)
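
`train_step` uses teacher forcing: the decoder receives the target without its last token (`tar[:, :-1]`) and is trained to predict the target without its first token (`tar[:, 1:]`), while the combined mask stops position t from seeing later tokens. Shown on a made-up headline, with words instead of token IDs for readability:

```python
# Teacher forcing: the decoder reads tokens 0..n-2 and learns to predict tokens 1..n-1
tar = ["<start>", "supreme", "court", "goes", "paperless", "<stop>"]  # illustrative headline tokens
tar_inp = tar[:-1]   # what the decoder sees
tar_real = tar[1:]   # what it must predict at each position
pairs = list(zip(tar_inp, tar_real))
```

Each pair reads as "given everything up to this word, predict the next one"; the last pair teaches the model when to emit `<stop>`.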

In [44]:
for epoch in range(EPOCHS):
    start = time.time()

    train_loss.reset_states()
  
    for (batch, (inp, tar)) in enumerate(dataset):
        train_step(inp, tar)
    
        if batch % 429 == 0:
            print ('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, train_loss.result()))
      
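    # NOTE: with EPOCHS = 2 this condition is never true, so no checkpoint is saved in this run;
    # adding `or epoch + 1 == EPOCHS` would also save after the final epoch
    if (epoch + 1) % 5 == 0:
        ckpt_save_path = ckpt_manager.save()
        print ('Saving checkpoint for epoch {} at {}'.format(epoch+1, ckpt_save_path))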
    
    print ('Epoch {} Loss {:.4f}'.format(epoch + 1, train_loss.result()))

    print ('Time taken for 1 epoch: {} secs\n'.format(time.time() - start))


Epoch 1 Batch 0 Loss 9.9500
Epoch 1 Batch 429 Loss 8.4779
Epoch 1 Batch 858 Loss 7.8145
Epoch 1 Loss 7.6728
Time taken for 1 epoch: 347.0222644805908 secs

Epoch 2 Batch 0 Loss 7.4298
Epoch 2 Batch 429 Loss 6.7162
Epoch 2 Batch 858 Loss 6.4927
Epoch 2 Loss 6.4146
Time taken for 1 epoch: 343.43158054351807 secs



In [45]:
def evaluate(input_document):
    input_document = doc_tokenizer.texts_to_sequences([input_document])
    input_document = tf.keras.preprocessing.sequence.pad_sequences(input_document, maxlen=encoder_maxlen, padding='post', truncating='post')

    encoder_input = tf.expand_dims(input_document[0], 0)

    decoder_input = [sumry_tokenizer.word_index["<start>"]]
    output = tf.expand_dims(decoder_input, 0)
    
    for i in range(decoder_maxlen):
        enc_padding_mask, combined_mask, dec_padding_mask = create_masks(encoder_input, output)

        predictions, attention_weights = transformer(
            encoder_input, 
            output,
            False,
            enc_padding_mask,
            combined_mask,
            dec_padding_mask
        )

        predictions = predictions[: ,-1:, :]
        predicted_id = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)

        if predicted_id == sumry_tokenizer.word_index["<stop>"]:
            return tf.squeeze(output, axis=0), attention_weights

        output = tf.concat([output, predicted_id], axis=-1)

    return tf.squeeze(output, axis=0), attention_weights
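
`evaluate` performs greedy decoding: starting from `<start>`, it repeatedly runs the model, takes the arg-max token at the last position, and stops at `<stop>` or after `decoder_maxlen` steps. The same control flow in plain Python, with a hypothetical `next_token` callable standing in for the Transformer forward pass plus arg-max:

```python
def greedy_decode(next_token, start_id, stop_id, max_len):
    """Repeatedly append the most likely next token until stop_id or max_len steps.

    next_token(prefix) is a hypothetical stand-in for one forward pass + argmax
    over the last position; it is not part of the notebook.
    """
    output = [start_id]
    for _ in range(max_len):
        predicted = next_token(output)
        if predicted == stop_id:
            break  # like evaluate(), the stop token itself is not appended
        output.append(predicted)
    return output


# Toy "model" that emits 10, 11, 12 and then the stop token (id 2)
script = [10, 11, 12, 2]


def toy(prefix):
    return script[len(prefix) - 1]
```

Greedy decoding is the simplest strategy; beam search is a common alternative that keeps several candidate summaries at each step, but it is not used in this notebook.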


In [46]:
def summarize(input_document):
    # not considering attention weights for now, can be used to plot attention heatmaps in the future
    summarized = evaluate(input_document=input_document)[0].numpy()
    summarized = np.expand_dims(summarized[1:], 0)  # drop the leading <start> token
    return sumry_tokenizer.sequences_to_texts(summarized)[0]  # only one document was summarized

In [47]:
test_text = ''' 
Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. AI research has been defined as the field of study of intelligent agents, which refers to any system that perceives its environment and takes actions that maximize its chance of achieving its goals.
The term "artificial intelligence" had previously been used to describe machines that mimic and display "human" cognitive skills that are associated with the human mind, such as "learning" and "problem-solving". This definition has since been rejected by major AI researchers who now describe AI in terms of rationality and acting rationally, which does not limit how intelligence can be articulated.
AI applications include advanced web search engines (e.g., Google), recommendation systems (used by YouTube, Amazon and Netflix), understanding human speech (such as Siri and Alexa), self-driving cars (e.g., Tesla), automated decision-making and competing at the highest level in strategic game systems (such as chess and Go). As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. For instance, optical character recognition is frequently excluded from things considered to be AI, having become a routine technology.
Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success and renewed funding. AI research has tried and discarded many different approaches since its founding, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge and imitating animal behavior. In the first decades of the 21st century, highly mathematical-statistical machine learning has dominated the field, and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.
The various sub-fields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception, and the ability to move and manipulate objects. General intelligence (the ability to solve an arbitrary problem) is among the field's long-term goals. To solve these problems, AI researchers have adapted and integrated a wide range of problem-solving techniques—including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, probability and economics. AI also draws upon computer science, psychology, linguistics, philosophy, and many other fields.
The field was founded on the assumption that human intelligence "can be so precisely described that a machine can be made to simulate it". This raised philosophical arguments about the mind and the ethical consequences of creating artificial beings endowed with human-like intelligence; these issues have previously been explored by myth, fiction and philosophy since antiquity. Computer scientists and philosophers have since suggested that AI may become an existential risk to humanity if its rational capacities are not steered towards beneficial goals.
'''


In [52]:
pred_text = summarize(test_text)

In [53]:
pred_text

'new app launches new app in new 999'

In [None]:
# prerequisite
!pip install transformers
# !pip install rouge_score 
!pip install rouge

In [55]:
from rouge import Rouge

ROUGE = Rouge()

results1 = ROUGE.get_scores(pred_text, test_text)
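
`get_scores(hyps, refs)` takes the generated summary first and the reference second. As a rough guide to the numbers it reports, here is a simplified ROUGE-1 in plain Python (clipped unigram overlap on lowercase whitespace tokens; the `rouge` package may differ in tokenization and small numerical details, and the name `rouge1` is hypothetical): precision is the share of summary words found in the reference, recall the share of reference words recovered, and F1 their harmonic mean.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counted at most min(cand, ref) times
    p = overlap / max(sum(cand.values()), 1)  # share of candidate words found in the reference
    r = overlap / max(sum(ref.values()), 1)   # share of reference words recovered
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}
```

For example, `rouge1("the cat sat", "the cat sat on the mat")` gives precision 1.0 but recall 0.5: a short summary fully contained in a longer reference cannot reach high recall.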



## - Text Summarization Using a Pretrained HuggingFace Model

In [56]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
pred_test_text = summarizer(test_text, max_length=130, min_length=30, do_sample=False)


In [64]:
pred_test_text[0]['summary_text']

'Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. AI applications include advanced web search engines (e.g., Google), recommendation systems (used by YouTube, Amazon and Netflix), understanding human speech and self-driving cars.'

In [62]:

results2 = ROUGE.get_scores(pred_test_text[0]['summary_text'], test_text)


In [63]:
results1, results2

([{'rouge-1': {'f': 0.01290322548907389, 'p': 0.4, 'r': 0.006557377049180328},
   'rouge-2': {'f': 0.0, 'p': 0.0, 'r': 0.0},
   'rouge-l': {'f': 0.01290322548907389,
    'p': 0.4,
    'r': 0.006557377049180328}}],
 [{'rouge-1': {'f': 0.22674418403613442, 'p': 1.0, 'r': 0.12786885245901639},
   'rouge-2': {'f': 0.1592232994403997,
    'p': 0.9534883720930233,
    'r': 0.08686440677966102},
   'rouge-l': {'f': 0.22674418403613442, 'p': 1.0, 'r': 0.12786885245901639}}])

## Analysis

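1. The well-known Transformer model was implemented from scratch with the TensorFlow API.
2. We applied deep learning to a rather challenging use case. After only 2 epochs the training loss fell from 9.95 to 6.41, but the generated summary ('new app launches new app in new 999') is not yet meaningful, so considerably more training would likely be needed.
3. Training the Transformer from scratch is costly: each of the two epochs took about 345 seconds here.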


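#### Deep learning model vs. pretrained HuggingFace model

- ROUGE scores for the two approaches differ sharply:
    - The deep learning model, trained for only 2 epochs, scores very low (ROUGE-1 F1 ≈ 0.013, ROUGE-2 F1 = 0).
    - The pretrained HuggingFace model scores much higher (ROUGE-1 F1 ≈ 0.227, ROUGE-2 F1 ≈ 0.159), with a ROUGE-1 precision of 1.0.
- In both cases the reference passed to `get_scores` is the original article, not a human-written summary. Precision therefore measures how much of the summary's wording comes from the article (the BART summary reuses the article's sentences almost verbatim, hence precision 1.0), and recall is necessarily low because any summary is far shorter than the article.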

## References
1. https://huggingface.co/facebook/bart-large-cnn
2. https://towardsdatascience.com/introduction-to-text-summarization-with-rouge-scores-84140c64b471
3. https://towardsdatascience.com/transformers-explained-65454c0f3fa7
4. https://medium.com/swlh/abstractive-text-summarization-using-transformers-3e774cc42453