# Recurrent Neural Networks

Natural language processing (NLP) is the practice of identifying patterns in sequences of language in order to deduce the meaning behind a statement. In short, the goal of NLP is to derive information from natural language (which could be sequences of text or speech). Many NLP problems are also referred to as sequence to sequence (seq2seq) problems.

The purpose of this notebook is to download, prepare, and use a text dataset to build multiple recurrent neural network (RNN) models that make predictions from the text. Additionally, I will build a model on top of a pretrained model from TensorFlow Hub.

The dataset I am going to use is Kaggle's introduction to NLP dataset (text samples of Tweets labeled as disaster or not disaster).
* https://www.kaggle.com/competitions/nlp-getting-started

NOTE: Other sequence problems may include something like time series forecasting.

## Imports

In [None]:
from dataclasses import dataclass, asdict
import io
import os
import pathlib
import random
import sys
from typing import Dict

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from py_learning_toolbox import dl_toolbox
from py_learning_toolbox import performance_toolbox
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
import tensorflow as tf

## Helper Functions

## Download and Analyze Data

In [None]:
# Text dataset location
data_directory = pathlib.Path('./data/nlp_getting_started')
test_file = data_directory / 'test.csv'
train_file = data_directory / 'train.csv'

In [None]:
# Visualizing the text dataset
train_data = pd.read_csv(str(train_file))
test_data = pd.read_csv(str(test_file))
train_data.head()

In [None]:
# Let's shuffle the training data
train_data_shuffled = train_data.sample(frac=1, random_state=42)
train_data_shuffled.head()

In [None]:
# Let's look at 10 random tweets and whether each one was a disaster or not
for i in range(10):
    # random.randint is inclusive on both ends, so subtract 1 to stay in bounds
    row = train_data_shuffled.iloc[random.randint(0, len(train_data_shuffled) - 1)]
    print(f"Target: ({'Disaster' if row['target'] else 'Not Disaster'})")
    print(row['text'])
    print('\n', '-' * 40, '\n')

In [None]:
# Let's look at the number of each target (disaster or not a disaster)
train_data_shuffled.target.value_counts()

## Preparing Data

To prep this data, there are a few things I need to do to get everything ready to build out my models.

1. Shuffle the training data set.
2. Split the training data set into a training and validation set (I am going to use 10% of the training data as the validation data).
3. Convert the text into numbers.

In [None]:
# Let's shuffle the training data
train_data_shuffled = train_data.sample(frac=1, random_state=42)
train_data_shuffled.head()

In [None]:
# Split the shuffled data into training and validation datasets
train_sentences, val_sentences, train_labels, val_labels = train_test_split(
    train_data_shuffled['text'].to_numpy(),
    train_data_shuffled['target'].to_numpy(),
    test_size=0.1,
    random_state=42)

len(train_sentences), len(val_sentences), len(train_labels), len(val_labels)

In [None]:
# Verify the split worked as expected
train_sentences[:10], train_labels[:10]

#### TextVectorization Layer (To be Used in Models)

When dealing with a text problem, one of the first things to do before building a model is to convert text to numbers. There are a few ways to do this:

* Tokenization - direct mapping of token (a token could be a word or a character) to a number.
* Embedding - create a matrix with a feature vector for each token (the size of the feature vector can be defined, and the embedding can be learned).
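To make the tokenization idea concrete, here is a minimal pure-Python sketch of mapping words to integers and padding to a fixed length. The helper names `build_vocab` and `tokenize` are hypothetical, and the sketch only splits on whitespace, so it is a simplification of what a real text vectorization layer does. It does follow the same convention of reserving 0 for padding and 1 for out-of-vocabulary tokens.

```python
from collections import Counter


def build_vocab(sentences, max_tokens):
    """Map the most frequent words to integer ids (0 = padding, 1 = unknown)."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    # Two ids are reserved, so keep at most max_tokens - 2 real words
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return {word: index + 2 for index, word in enumerate(most_common)}


def tokenize(sentence, vocab, sequence_length):
    """Convert a sentence to a fixed-length list of integer ids."""
    ids = [vocab.get(word, 1) for word in sentence.lower().split()]
    ids = ids[:sequence_length]                      # Truncate long sentences
    return ids + [0] * (sequence_length - len(ids))  # Pad short sentences


toy_sentences = ['flood in my street', 'fire in my house', 'my street is fine']
toy_vocab = build_vocab(toy_sentences, max_tokens=10)
encoded = tokenize('flood in my garden', toy_vocab, sequence_length=6)
print(encoded)
```

The word `garden` never appears in the toy sentences, so it is encoded as 1 (unknown), and the two trailing zeros are padding.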

In [None]:
# Find average number of tokens
round(sum([len(i.split()) for i in train_sentences]) / len(train_sentences))

In [None]:
# Setup text vectorization params
max_vocab_length = 10000  # Max words to have in our vocab
max_length = 15  # Max length our sequence will be

In [None]:
# Setting up a text vectorization layer (tokenization)
text_vectorizer = tf.keras.layers.experimental.preprocessing.TextVectorization(
    max_tokens=max_vocab_length,  # How many words in the vocabulary (None sets as no maximum number of tokens)
    output_mode='int',
    output_sequence_length=max_length)  # Pads (adds 0's to the end of) or truncates each sequence so all are the same length

In [None]:
# Adapt the vectorizer to the training data
text_vectorizer.adapt(train_sentences)

In [None]:
# Verify the text vectorizer was adapted correctly
sample_sentence = 'There\'s a flood in my street!'
text_vectorizer([sample_sentence])

In [None]:
# Choose a random sentence from the train data and encode it
rand_i = random.randint(0, len(train_sentences) - 1)  # randint is inclusive on both ends
print(f'Sentence: {train_sentences[rand_i]}')
print(f'Vectorized: {text_vectorizer([train_sentences[rand_i]])}')

In [None]:
# Getting the words in the vocab from the training data
words_in_vocab = text_vectorizer.get_vocabulary()
top_5_words = words_in_vocab[:5]
least_common_5_words = words_in_vocab[-5:]
len(words_in_vocab), top_5_words, least_common_5_words

#### Creating Embedding Layer (To be Used in Models)

To make our embedding layer, I am going to use TensorFlow's `Embedding` layer. 

The parameters we care most about for our embedding layer are:

* `input_dim` - The size of the vocabulary.
* `output_dim` - The size of the output embedding vector; for example, a size of 100 means each token is represented by a vector of length 100.
* `input_length` - The length of the sequences being passed to the embedding layer.
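As a rough illustration of how these three parameters relate, an embedding can be thought of as a lookup table with one row per token id. The NumPy sketch below uses small made-up sizes and randomly initialized weights; in the real layer the rows are learned during training.

```python
import numpy as np

vocab_size = 10  # Plays the role of input_dim
embed_dim = 4    # Plays the role of output_dim
seq_len = 5      # Plays the role of input_length

rng = np.random.default_rng(42)
# One row of embed_dim values per token id (random here, learned in practice)
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))

# A batch of two tokenized, zero-padded sequences
token_ids = np.array([[5, 3, 2, 1, 0],
                      [7, 2, 0, 0, 0]])

# Indexing with the id array looks up one row per token
embedded = embedding_matrix[token_ids]
print(embedded.shape)  # (2, 5, 4)
```

Each sequence of `seq_len` integers becomes a `seq_len` x `embed_dim` matrix, which is the shape the downstream layers receive.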

In [None]:
embedding = tf.keras.layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,  # GPU's work well when number is divisible by 8
                                     input_length=max_length)
embedding

In [None]:
# Verify the embedding layer worked
random_sentence = random.choice(train_sentences)
embedded_sentence = embedding(text_vectorizer([random_sentence]))
print('Sentence: \n', random_sentence)
print('Embedded Version: \n', embedded_sentence)

## Experiments

To identify the best model, I am going to run the following experiments and compare how the different types of models perform on this sequence-based problem.

1. Naive Bayes with TF-IDF encoder (baseline model) NOTE: this is not a Deep Learning model
2. Feed-forward neural network (dense model)
3. LSTM (RNN)
4. GRU (RNN)
5. Bidirectional LSTM (RNN)
6. 1D Convolutional Neural Network
7. TensorFlow Hub Pretrained Feature Extractor
8. TensorFlow Hub Pretrained Feature Extractor (10% of Data)

In [None]:
# Setup
TENSORBOARD_LOGS_DIR = pathlib.Path('logs/disaster_tweets')
CHECKPOINTS_DIR = pathlib.Path('checkpoints/disaster_tweets')
MODELS_DIR = pathlib.Path('models/disaster_tweets')

### Model-0 (Baseline Model): Naive Bayes Model

As a baseline model, I am going to use scikit-learn's Multinomial Naive Bayes algorithm with the TF-IDF formula to convert words to numbers. The DL models will be compared against this model to judge their performance.

NOTE: It's common practice to use non-DL algorithms as a baseline because of their speed, and then use DL to see if you can improve upon them.
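For intuition on what TF-IDF does, here is a small pure-Python sketch. It assumes the smoothed formula `idf = ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which matches the documented defaults of `TfidfVectorizer`. The whitespace tokenization is a simplification, so the numbers will not match scikit-learn exactly on real text.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document (smoothed idf, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1
           for term, df in doc_freq.items()}
    rows = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        weights = {term: count * idf[term] for term, count in term_counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({term: w / norm for term, w in weights.items()})
    return rows


docs = ['forest fire near town', 'fire drill at school', 'sunny day at school']
weights = tfidf(docs)
print(weights[0])
```

In the first document, `forest` gets a higher weight than `fire` because `fire` also appears in the second document and is therefore less distinctive.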

In [None]:
# Building out the baseline model

# Build Model
model_0 = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', MultinomialNB()),
])

# Fit Model
model_0.fit(train_sentences, train_labels)

In [None]:
# Evaluate model (SKlearn uses accuracy as the metric)
baseline_score = model_0.score(val_sentences, val_labels)
baseline_score  # Accuracy

In [None]:
# Make predictions
baseline_preds = model_0.predict(val_sentences)
baseline_preds[:20]

In [None]:
baseline_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, baseline_preds)
baseline_results

### Model-1: Feed Forward Dense Model

The first test I am going to run against my baseline model is to use the traditional Dense DL model.

In [None]:
# Build model
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string) # Inputs are 1 dimensional strings
x = text_vectorizer(inputs)  # Turn the input text into numbers
x = embedding(x)  # Embed the text
x = tf.keras.layers.GlobalAveragePooling1D()(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)  # Create the output layer

model_1 = tf.keras.models.Model(inputs, outputs, name='DenseModel')
model_1.summary()

In [None]:
# Compile Model
model_1.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

In [None]:
# Setup Callbacks
tensorboard_callback_1 = dl_toolbox.modeling.callbacks.generate_tensorboard_callback('dense-model', str(TENSORBOARD_LOGS_DIR))

# Fit the model
model_1_history = model_1.fit(x=train_sentences,
            y=train_labels,
            epochs=5,
            validation_data=(val_sentences, val_labels),
            callbacks=[tensorboard_callback_1])

In [None]:
model_1.evaluate(val_sentences, val_labels)

In [None]:
# Get prediction probabilities
model_1_pred_probs = model_1.predict(val_sentences)
model_1_pred_probs[:10]

In [None]:
# Convert prediction probabilities to 1 or 0
model_1_preds = tf.squeeze(tf.round(model_1_pred_probs))
model_1_preds[:10]

In [None]:
model_1_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, model_1_preds)
model_1_results

In [None]:
# Analyze data against the baseline data
np.array(list(dict(model_1_results).values())) >= np.array(list(dict(baseline_results).values()))

#### Findings

Looks like the baseline outperformed the simple Dense DL model.

#### Visualizing the Learned Embeddings

To visualize the embedding matrix, TensorFlow has a handy tool called projector that visualizes the matrix.

NOTE: To utilize the projector tool, you need to create a vectors.tsv and metadata.tsv that will be uploaded to the projector website linked below.

* https://www.tensorflow.org/text/guide/word_embeddings
* https://projector.tensorflow.org/
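As a hedged sketch of what the two projector files contain: `vectors.tsv` holds one tab-separated embedding vector per line, and `metadata.tsv` holds the matching token on the same line number. The function name `write_projector_files` is hypothetical and is not the toolbox helper used in this notebook. It skips index 0 (the padding token), as the TensorFlow word embeddings guide does.

```python
import io


def write_projector_files(weights, vocab, vectors_file, metadata_file):
    """Write one vector and one token per line, skipping the padding token."""
    for index, word in enumerate(vocab):
        if index == 0:
            continue  # Index 0 is padding and has no meaningful token
        vectors_file.write('\t'.join(str(value) for value in weights[index]) + '\n')
        metadata_file.write(word + '\n')


# Toy data; in-memory buffers stand in for real files opened with open(..., 'w')
toy_vocab = ['', '[UNK]', 'fire', 'flood']
toy_weights = [[0.0, 0.0], [0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]]
vectors_out, metadata_out = io.StringIO(), io.StringIO()
write_projector_files(toy_weights, toy_vocab, vectors_out, metadata_out)

print(vectors_out.getvalue())
print(metadata_out.getvalue())
```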

In [None]:
# Get the weight matrix of embedding layer
# These are the numerical representation of each token in our training data, learned for 5 epochs.
embed_weights_1 = model_1.get_layer('embedding').get_weights()[0]
embed_weights_1

In [None]:
# Looking at the shape, the embedding matrix is 10,000 x 128 matrix
# (every token in vocabulary has 128 params to better represent each token)
embed_weights_1.shape

In [None]:
# Create embedding files (These will be uploaded to the embedding projector)
filepath = f'{str(TENSORBOARD_LOGS_DIR)}/dense-model/embedding_projector'
dl_toolbox.analysis.export.export_embedding_projector_data(embed_weights_1, words_in_vocab, filepath, True)

### Model-2: LSTM

LSTM (Long Short-Term Memory) is one of the most popular RNN models.

The structure of an RNN model typically looks like this:

```
Input (text) -> Tokenize -> Embedding -> Layers (RNN's/Dense) -> Output
```

In [None]:
# Create an LSTM Model

# Build model
inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string) # Inputs are 1 dimensional strings

x = text_vectorizer(inputs)  # Turn the input text into numbers
x = embedding(x)  # Embed the text
# x = tf.keras.layers.LSTM(64, return_sequences=True)(x)  # when you're stacking RNN cells together, you need to set return sequences to True
x = tf.keras.layers.LSTM(64)(x)
# x = tf.keras.layers.Dense(64, activation='relu')(x)

outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model_2 = tf.keras.models.Model(inputs, outputs, name='Model2LSTM')
model_2.summary()

In [None]:
# Compile the model
model_2.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

In [None]:
# Setup Callbacks
tensorboard_callback_2 = dl_toolbox.modeling.callbacks.generate_tensorboard_callback('lstm-model', str(TENSORBOARD_LOGS_DIR))

# Fit the Model
model_2_history = model_2.fit(train_sentences,
            train_labels,
            epochs=5,
            validation_data=(val_sentences, val_labels),
            callbacks=[tensorboard_callback_2])

In [None]:
dl_toolbox.analysis.history.plot_history(model_2_history, 'accuracy')

In [None]:
model_2_pred_probs = model_2.predict(val_sentences)
model_2_pred_probs[:10]

In [None]:
model_2_preds = tf.squeeze(tf.round(model_2_pred_probs))
model_2_preds[:10]

In [None]:
model_2_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, model_2_preds)
model_2_results

In [None]:
# Analyze data against the baseline data
np.array(list(dict(model_2_results).values())) >= np.array(list(dict(baseline_results).values()))

#### Findings

Looks like the baseline model is still outperforming the LSTM model.

### Model-3: GRU

Another popular and effective RNN component is the GRU, or gated recurrent unit. The GRU cell has similar features to an LSTM cell, but has fewer parameters.
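The parameter difference can be checked with a bit of arithmetic. An LSTM has four sets of weights (three gates plus the candidate cell state) and a GRU has three. The sketch below assumes the Keras default `reset_after=True` for the GRU, which keeps two bias vectors per gate. The function names are hypothetical.

```python
def lstm_params(input_dim, units):
    """4 weight sets, each with input weights, recurrent weights and one bias."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim, units, reset_after=True):
    """3 weight sets; reset_after=True keeps separate input and recurrent biases."""
    bias_per_gate = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + bias_per_gate)


# 128-dimensional embeddings feeding 64 recurrent units, as in this notebook
print(lstm_params(128, 64))  # 49408
print(gru_params(128, 64))   # 37248
```

These counts can be compared against the recurrent layer rows in `model_2.summary()` and `model_3.summary()`.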

In [None]:
# Create the GRU model

inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = text_vectorizer(inputs)
x = embedding(x)
x = tf.keras.layers.GRU(64)(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model_3 = tf.keras.models.Model(inputs, outputs, name='Model3GRU')
model_3.summary()

In [None]:
# Compile the Model
model_3.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

In [None]:
# Fit the model with Tensorboard callback
# Setup Callbacks
tensorboard_callback_3 = dl_toolbox.modeling.callbacks.generate_tensorboard_callback('gru-model', str(TENSORBOARD_LOGS_DIR))

# Fit Model
model_3_history = model_3.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[tensorboard_callback_3])

In [None]:
dl_toolbox.analysis.history.plot_history(model_3_history, 'accuracy')

In [None]:
model_3_pred_probs = model_3.predict(val_sentences)
model_3_pred_probs[:10]

In [None]:
model_3_preds = tf.squeeze(tf.round(model_3_pred_probs))
model_3_preds[:10]

In [None]:
model_3_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, model_3_preds)
model_3_results

In [None]:
np.array(list(dict(model_3_results).values())) >= np.array(list(dict(baseline_results).values()))

#### Findings

Still haven't beaten the baseline model :(

### Model-4: Bidirectional LSTM Model

Normal RNNs process a sequence from left to right; bidirectional RNNs process it from left to right as well as from right to left. In other words, one pass reads the sentence forwards, a second pass reads it backwards, and the results of the two passes are combined.

NOTE: These are really only useful when reading the sequence in both directions can teach the network something that a single direction cannot.
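To illustrate the idea, here is a toy NumPy sketch. It swaps the LSTM for a plain tanh RNN to stay short and uses random weights, so it only demonstrates the data flow. One set of weights reads the sequence forwards, another reads it reversed, and the two final states are concatenated, which is the default merge mode of the Keras `Bidirectional` wrapper.

```python
import numpy as np


def simple_rnn_last_state(sequence, w_in, w_rec):
    """Run a plain tanh RNN over the sequence and return the final hidden state."""
    state = np.zeros(w_rec.shape[0])
    for x_t in sequence:
        state = np.tanh(x_t @ w_in + state @ w_rec)
    return state


rng = np.random.default_rng(0)
seq_len, embed_dim, units = 15, 8, 4
sequence = rng.normal(size=(seq_len, embed_dim))  # One embedded sentence

# Each direction has its own, independent weights
fw_in, fw_rec = rng.normal(size=(embed_dim, units)), rng.normal(size=(units, units))
bw_in, bw_rec = rng.normal(size=(embed_dim, units)), rng.normal(size=(units, units))

forward_state = simple_rnn_last_state(sequence, fw_in, fw_rec)
backward_state = simple_rnn_last_state(sequence[::-1], bw_in, bw_rec)

bidirectional_output = np.concatenate([forward_state, backward_state])
print(bidirectional_output.shape)  # (8,)
```

The output is twice the number of units, which is why a bidirectional layer with 64 units feeds 128 features to the next layer and roughly doubles the parameter count.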

In [None]:
# Build out the Bidirectional Model

inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = text_vectorizer(inputs)
x = embedding(x)
x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model_4 = tf.keras.models.Model(inputs, outputs, name='Model4BidirectionalLSTM')
model_4.summary()

In [None]:
# compile model
model_4.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

In [None]:
# Callback and Fit
# Fit the model with Tensorboard callback
# Setup Callbacks
tensorboard_callback_4 = dl_toolbox.modeling.callbacks.generate_tensorboard_callback('bidirectional-lstm-model', str(TENSORBOARD_LOGS_DIR))

model_4_history = model_4.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[tensorboard_callback_4])

In [None]:
dl_toolbox.analysis.history.plot_history(model_4_history, 'accuracy')

In [None]:
model_4_pred_probs = model_4.predict(val_sentences)
model_4_pred_probs[:10]

In [None]:
model_4_preds = tf.squeeze(tf.round(model_4_pred_probs))
model_4_preds[:10]

In [None]:
model_4_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, model_4_preds)
model_4_results

In [None]:
np.array(list(dict(model_4_results).values())) >= np.array(list(dict(baseline_results).values()))

#### Findings

Looks like this performed worse than the LSTM and GRU.

### Model-5: 1D Convolutional Neural Network

We've used CNNs for images, which are typically 2D; text data, however, is 1D, so the convolution slides along a single dimension (the sequence).

The typical structure for Conv1D models:

```
Inputs -> Tokenization -> Embedding -> Layers (Conv1D + Pooling) -> Outputs
```
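The `Conv1D + Pooling` step can be sketched in NumPy as follows. This is an illustrative implementation with random weights, not what Keras runs internally. It slides a window of 5 tokens over a 15-token sequence with `valid` padding, applies ReLU, and then takes the maximum over the sequence axis, mirroring a global max pooling layer.

```python
import numpy as np


def conv1d_valid(x, kernels, bias):
    """1D convolution with 'valid' padding followed by ReLU.

    x: (seq_len, channels), kernels: (kernel_size, channels, n_filters)
    """
    seq_len, _ = x.shape
    kernel_size, _, n_filters = kernels.shape
    out_len = seq_len - kernel_size + 1  # No padding, so the output shrinks
    out = np.empty((out_len, n_filters))
    for start in range(out_len):
        window = x[start:start + kernel_size]
        # Sum over the window positions and the embedding channels
        out[start] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
x = rng.normal(size=(15, 128))           # 15 tokens, 128-dim embeddings
kernels = rng.normal(size=(5, 128, 32))  # kernel_size=5, 32 filters
bias = np.zeros(32)

feature_map = conv1d_valid(x, kernels, bias)
pooled = feature_map.max(axis=0)         # Global max pooling over the sequence
print(feature_map.shape, pooled.shape)   # (11, 32) (32,)
```

With a sequence length of 15 and a kernel size of 5, `valid` padding leaves 15 - 5 + 1 = 11 positions per filter.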

In [None]:
# build the model

inputs = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
x = text_vectorizer(inputs)
x = embedding(x)
x = tf.keras.layers.Conv1D(filters=32, kernel_size=5, activation='relu', padding='valid')(x)
x = tf.keras.layers.GlobalMaxPool1D()(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model_5 = tf.keras.models.Model(inputs, outputs, name='Model5CNN1D')
model_5.summary()

In [None]:
# Compile model
model_5.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

In [None]:
# Callback and Fit
# Fit the model with Tensorboard callback
# Setup Callbacks
tensorboard_callback_5 = dl_toolbox.modeling.callbacks.generate_tensorboard_callback('conv-1d-model', str(TENSORBOARD_LOGS_DIR))

model_5_history = model_5.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[tensorboard_callback_5])

In [None]:
dl_toolbox.analysis.history.plot_history(model_5_history, 'accuracy')

In [None]:
model_5_pred_probs = model_5.predict(val_sentences)
model_5_pred_probs[:10]

In [None]:
model_5_preds = tf.squeeze(tf.round(model_5_pred_probs))
model_5_preds[:10]

In [None]:
model_5_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, model_5_preds)
model_5_results

In [None]:
np.array(list(dict(model_5_results).values())) >= np.array(list(dict(baseline_results).values()))

#### Findings

Still not outperforming our Baseline Model.

### Model-6: TensorFlow Hub Pretrained Sentence Encoder

This model will use Transfer Learning with the `Universal Sentence Encoder` pretrained model on TensorFlow Hub (see link below).

* https://tfhub.dev/google/collections/universal-sentence-encoder/1

In [None]:
import tensorflow_hub as hub

In [None]:
use_url = 'https://tfhub.dev/google/universal-sentence-encoder/4'

In [None]:
# Testing out the transfer learning model
embed = hub.load(use_url)

embeddings = embed([
    "The quick brown fox jumps over the lazy dog.",
    "I am a sentence for which I would like to get its embedding"])

print(embeddings)

In [None]:
# Build Model

# Create a Keras Layer using the USE pretrained layer from TensorFlow Hub
sentence_encoder_layer = hub.KerasLayer(use_url, input_shape=[], dtype=tf.string, trainable=False, name='USE')

# Setup Layers
model_6 = tf.keras.models.Sequential([
    sentence_encoder_layer,
    tf.keras.layers.Dense(1, activation='sigmoid')], name='Model6USE')
model_6.summary()

In [None]:
# Compile model
model_6.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

In [None]:
# Callback and Fit
# Fit the model with Tensorboard callback
# Setup Callbacks
tensorboard_callback_6 = dl_toolbox.modeling.callbacks.generate_tensorboard_callback('use-model', str(TENSORBOARD_LOGS_DIR))

model_6_history = model_6.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[tensorboard_callback_6])

In [None]:
dl_toolbox.analysis.history.plot_history(model_6_history, 'accuracy')

In [None]:
model_6_pred_probs = model_6.predict(val_sentences)
model_6_pred_probs[:10]

In [None]:
model_6_preds = tf.squeeze(tf.round(model_6_pred_probs))
model_6_preds[:10]

In [None]:
model_6_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, model_6_preds)
model_6_results

In [None]:
np.array(list(dict(model_6_results).values())) >= np.array(list(dict(baseline_results).values()))

#### Findings

Looks like this model beat the baseline the first time I ran this, but it was very close and isn't guaranteed to beat it every time due to randomness.

### Model-7: TF Hub Pretrained USE but w/ 10% of Training Data

Transfer learning helps when you don't have a large dataset. To see how our model performs on a smaller dataset, I am going to replicate model 6, but I will only train it on 10% of the data.

In [None]:
# Creating 10% subset of the training data
train_10_percent_split = int(0.1 * len(train_sentences))
train_sentences_10_percent = train_sentences[:train_10_percent_split]
train_labels_10_percent = train_labels[:train_10_percent_split]
len(train_sentences_10_percent), len(train_labels_10_percent)

**NOTE**
When using the 10% sample, I need to verify that the subset is representative of the entire dataset (i.e., it has a similar ratio of disaster to non-disaster tweets).
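If the class ratios turned out to differ noticeably, one option (not used in this notebook) would be to draw the subset per class so the proportions are preserved. Below is a minimal pure-Python sketch; the function name `stratified_subset` and the toy data are made up for illustration.

```python
import random
from collections import defaultdict


def stratified_subset(samples, labels, fraction, seed=42):
    """Sample the same fraction from every class, then shuffle the result."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    rng = random.Random(seed)
    subset = []
    for label, group in by_label.items():
        k = max(1, round(fraction * len(group)))  # Keep at least one per class
        subset.extend((sample, label) for sample in rng.sample(group, k))
    rng.shuffle(subset)
    return [s for s, _ in subset], [label for _, label in subset]


# Toy data: 60 samples of class 0 and 40 samples of class 1
toy_samples = [f'tweet {i}' for i in range(100)]
toy_labels = [0] * 60 + [1] * 40
subset_samples, subset_labels = stratified_subset(toy_samples, toy_labels, 0.1)
print(len(subset_samples), sum(subset_labels))
```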

In [None]:
pd.Series(np.array(train_labels_10_percent)).value_counts()

In [None]:
train_data_shuffled['target'].value_counts()

In [None]:
# Build Model
model_7 = tf.keras.models.clone_model(model_6)
model_7.summary()

In [None]:
# Compile model
model_7.compile(loss='binary_crossentropy',
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

In [None]:
# Callback and Fit
# Fit the model with Tensorboard callback
# Setup Callbacks
tensorboard_callback_7 = dl_toolbox.modeling.callbacks.generate_tensorboard_callback('use-10-percent-model', str(TENSORBOARD_LOGS_DIR))

model_7_history = model_7.fit(train_sentences_10_percent,
                              train_labels_10_percent,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[tensorboard_callback_7])

In [None]:
dl_toolbox.analysis.history.plot_history(model_7_history, 'accuracy')

In [None]:
model_7_pred_probs = model_7.predict(val_sentences)
model_7_pred_probs[:10]

In [None]:
model_7_preds = tf.squeeze(tf.round(model_7_pred_probs))
model_7_preds[:10]

In [None]:
model_7_results = dl_toolbox.analysis.classification.generate_prediction_metrics(val_labels, model_7_preds)
model_7_results

In [None]:
np.array(list(dict(model_7_results).values())) >= np.array(list(dict(baseline_results).values()))

#### Findings

Even with only 10% of the data, it performed only slightly worse than when training the model on 100% of the data.

## Comparing the Performance of Each Model

In [None]:
# Let's look at the combined performance of each model
all_model_results = pd.DataFrame({
    '0_baseline': dict(baseline_results),
    '1_simple_dense': dict(model_1_results),
    '2_lstm': dict(model_2_results),
    '3_gru': dict(model_3_results),
    '4_bidirectional': dict(model_4_results),
    '5_conv1d': dict(model_5_results),
    '6_tf_hub_use_encoder': dict(model_6_results),
    '7_tf_hub_use_encoder_10_percent': dict(model_7_results),
})
all_model_results = all_model_results.transpose()
all_model_results

In [None]:
# Plot and compare all of the model results
all_model_results.plot(kind='bar', figsize=(10, 7)).legend(bbox_to_anchor=(1.0, 1.0))

In [None]:
all_model_results.sort_values('f1', ascending=False)['f1'].plot(kind='bar', figsize=(10, 7))

### Finding Most Wrong Examples

* If our best model still isn't perfect, what examples is it getting wrong?
* And of these wrong examples, which ones is it getting *most* wrong?

For example, if a sample should have a label of 0 but our model outputs a prediction probability of 0.999, that is pretty wrong.
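One way to put a number on "most wrong" is the distance between the predicted probability and the true label. The sketch below uses toy values and a hypothetical helper name. It assumes a 0.5 decision threshold and ranks the misclassified samples by that distance, so confident false positives and confident false negatives end up in a single list.

```python
def most_wrong(labels, pred_probs, top_k=3):
    """Return (error, index, label, prob) for the most confident mistakes."""
    wrong = []
    for index, (label, prob) in enumerate(zip(labels, pred_probs)):
        pred = 1 if prob >= 0.5 else 0  # Assumed decision threshold
        if pred != label:
            wrong.append((abs(prob - label), index, label, prob))
    wrong.sort(reverse=True)  # Largest error first
    return wrong[:top_k]


toy_labels = [0, 1, 1, 0, 1]
toy_probs = [0.999, 0.10, 0.80, 0.55, 0.30]
for error, index, label, prob in most_wrong(toy_labels, toy_probs):
    print(f'index={index} label={label} prob={prob} error={error:.3f}')
```

The sample with label 0 and probability 0.999 comes out on top, while the correctly classified sample at index 2 is never listed.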

To do this, I am going to look at Model 6 because that model performed the best.

In [None]:
model_6_pred_probs[:10], model_6_preds[:10]

In [None]:
# Creating a DataFrame with the validation sentences, labels, and model 6 predictions
val_df = pd.DataFrame({
    'text': val_sentences,
    'target': val_labels,
    'pred': model_6_preds,
    'pred_prob': tf.squeeze(model_6_pred_probs)
})
val_df.head()

In [None]:
# Find wrong predictions and sort by prediction probs
most_wrong = val_df[val_df['target'] != val_df['pred']].sort_values('pred_prob', ascending=False)
most_wrong[:10]

In [None]:
most_wrong[-10:]

In [None]:
for row in most_wrong[-10:].itertuples():
    _, text, target, pred, pred_prob = row
    print(f'Target: {target}, Pred: {pred}, Prob: {pred_prob}')
    print('Text: ', text)
    print('-' * 80)

## Making Predictions on Test Dataset

In [None]:
test_data

In [None]:
test_sentences = test_data['text'].to_list()
test_samples = random.sample(test_sentences, 10)

for test_sample in test_samples:
    pred_prob = tf.squeeze(model_6.predict([test_sample]))
    pred = tf.round(pred_prob)

    print(f'Pred: {int(pred)}, Prob: {pred_prob}')
    print('Text: ', test_sample)
    print('-' * 80, '\n')

## Speed vs. Score Tradeoff

In [None]:
model_6_performance = performance_toolbox.model.prediction_timer(model_6, val_sentences)
model_6_performance

In [None]:
model_0_performance = performance_toolbox.model.prediction_timer(model_0, val_sentences)
model_0_performance

In [None]:
plt.figure(figsize=(10,7))
plt.scatter(model_0_performance.time_per_prediction, baseline_results.f1, label='baseline')
plt.scatter(model_6_performance.time_per_prediction, model_6_results.f1, label='model 6')
plt.legend()
plt.title('F1-score vs. Time per prediction')

#### Findings

When comparing our best model against our baseline model, the time difference is significant, even though the `F1` is virtually identical.
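For reference, the timing measurement can be approximated with the standard library. The sketch below is a hypothetical stand-in, not the `performance_toolbox` implementation: it times a single call to a prediction function and divides by the number of samples.

```python
import time


def time_predictions(predict_fn, samples):
    """Return (total_seconds, seconds_per_sample) for one predict call."""
    start = time.perf_counter()
    predict_fn(samples)
    total_time = time.perf_counter() - start
    return total_time, total_time / len(samples)


def toy_predict(samples):
    """Placeholder model: stands in for something like model.predict."""
    return [len(sample) % 2 for sample in samples]


total, per_sample = time_predictions(toy_predict, ['fire', 'flood', 'sunny day'])
print(f'total: {total:.6f}s, per sample: {per_sample:.6f}s')
```

A single run is noisy, so averaging over several runs would give a steadier comparison between models.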