# Text Classification Model for Large Movie Review Using TensorFlow Take 1, Part B
### David Lowe
### March 5, 2021

Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery. [https://machinelearningmastery.com/]

SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The Large Movie Review dataset presents a binary classification problem: each review is predicted to be either positive or negative.

INTRODUCTION: The Large Movie Review Dataset is a collection of movie reviews used in the research paper "Learning Word Vectors for Sentiment Analysis" by Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts, The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011). The dataset comprises 25,000 highly polar movie reviews for training and 25,000 for testing.

This Take1 iteration will construct a bag-of-words model and analyze it with a simple TensorFlow deep learning network. Due to the system's memory limitation, we have to break up the script processing into two parts. Part A will evaluate the model on the training dataset using five-fold cross-validation. Part B will train the final model with the entire training dataset and make predictions on a previously unseen test dataset.

ANALYSIS: In this Take1 iteration, the bag-of-words model achieved an average accuracy of 87.18% after 20 epochs across the five cross-validation folds. The final model then scored an accuracy of 85.24% on the test dataset.

CONCLUSION: In this modeling iteration, the bag-of-words TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.

Dataset Used: Large Movie Review Dataset

Dataset ML Model: Binary class text classification with text-oriented features

Dataset Reference: https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.bib

One potential source of performance benchmarks: https://ai.stanford.edu/~amaas/data/sentiment/ and https://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf

A deep-learning text classification project generally can be broken down into five major tasks:

1. Prepare Environment
2. Load and Prepare Text Data
3. Define and Train Models
4. Evaluate and Optimize Models
5. Finalize Model and Make Predictions

# Task 1 - Prepare Environment

In [1]:
# # Install the packages that support environment variables, SQL databases, and AWS services
# !pip install python-dotenv PyMySQL boto3

In [2]:
# # Retrieve GPU configuration information from Colab
# gpu_info = !nvidia-smi
# gpu_info = '\n'.join(gpu_info)
# if gpu_info.find('failed') >= 0:
#     print('Select the Runtime → "Change runtime type" menu to enable a GPU accelerator, ')
#     print('and then re-execute this cell.')
# else:
#     print(gpu_info)

In [3]:
# # Retrieve memory configuration information from Colab
# from psutil import virtual_memory
# ram_gb = virtual_memory().total / 1e9
# print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

# if ram_gb < 20:
#     print('To enable a high-RAM runtime, select the Runtime → "Change runtime type"')
#     print('menu, and then select High-RAM in the Runtime shape dropdown. Then, ')
#     print('re-execute this cell.')
# else:
#     print('You are using a high-RAM runtime!')

In [4]:
# Retrieve CPU information from the system
ncpu = !nproc
print("The number of available CPUs is:", ncpu[0])

The number of available CPUs is: 4


## 1.a) Load libraries and modules

In [5]:
# Set the random seed number for reproducible results
seedNum = 888

In [6]:
# Load libraries and packages
import random
random.seed(seedNum)
import numpy as np
np.random.seed(seedNum)
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import sys
import boto3
import shutil
import string
import nltk
from nltk.corpus import stopwords
from collections import Counter
from datetime import datetime
from dotenv import load_dotenv
from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import tensorflow as tf
tf.random.set_seed(seedNum)
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.callbacks import ReduceLROnPlateau

In [7]:
nltk.download('popular')

[nltk_data] Downloading collection 'popular'
[nltk_data]    | 
[nltk_data]    | Downloading package cmudict to
[nltk_data]    |     /home/pythonml/nltk_data...
[nltk_data]    |   Package cmudict is already up-to-date!
[nltk_data]    | Downloading package gazetteers to
[nltk_data]    |     /home/pythonml/nltk_data...
[nltk_data]    |   Package gazetteers is already up-to-date!
[nltk_data]    | Downloading package genesis to
[nltk_data]    |     /home/pythonml/nltk_data...
[nltk_data]    |   Package genesis is already up-to-date!
[nltk_data]    | Downloading package gutenberg to
[nltk_data]    |     /home/pythonml/nltk_data...
[nltk_data]    |   Package gutenberg is already up-to-date!
[nltk_data]    | Downloading package inaugural to
[nltk_data]    |     /home/pythonml/nltk_data...
[nltk_data]    |   Package inaugural is already up-to-date!
[nltk_data]    | Downloading package movie_reviews to
[nltk_data]    |     /home/pythonml/nltk_data...
[nltk_data]    |   Package movie_reviews is a

True

## 1.b) Set up the controlling parameters and functions

In [8]:
# Begin the timer for the script processing
startTimeScript = datetime.now()

# Set up the number of CPU cores available for multi-thread processing
n_jobs = 1

# Set the flag that controls status notifications (True sends them, False disables them)
notifyStatus = False

# Set Pandas options
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 140)

# Set the percentage sizes for splitting the dataset
test_set_size = 0.2
val_set_size = 0.25

# Set the number of folds for cross validation
n_folds = 5
n_iterations = 1

# Set various default modeling parameters
default_loss = 'binary_crossentropy'
default_metrics = ['accuracy']
default_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
default_kernel_init = tf.keras.initializers.GlorotUniform(seed=seedNum)
default_epochs = 20
default_batch_size = 1

# Define the labels to use for graphing the data
train_metric = "accuracy"
validation_metric = "val_accuracy"
train_loss = "loss"
validation_loss = "val_loss"

# Check the number of GPUs accessible through TensorFlow
print('Num GPUs Available:', len(tf.config.list_physical_devices('GPU')))

# Print out the TensorFlow version for confirmation
print('TensorFlow version:', tf.__version__)

Num GPUs Available: 0
TensorFlow version: 2.3.1


In [9]:
# Set up the parent directory location for loading the dotenv files
# from google.colab import drive
# drive.mount('/content/gdrive')
# gdrivePrefix = '/content/gdrive/My Drive/Colab_Downloads/'
# env_path = '/content/gdrive/My Drive/Colab Notebooks/'
# dotenv_path = env_path + "python_script.env"
# load_dotenv(dotenv_path=dotenv_path)

# Set up the dotenv file for retrieving environment variables
# env_path = "/Users/david/PycharmProjects/"
# dotenv_path = env_path + "python_script.env"
# load_dotenv(dotenv_path=dotenv_path)

In [10]:
# Set up the email notification function
def status_notify(msg_text):
    access_key = os.environ.get('SNS_ACCESS_KEY')
    secret_key = os.environ.get('SNS_SECRET_KEY')
    aws_region = os.environ.get('SNS_AWS_REGION')
    topic_arn = os.environ.get('SNS_TOPIC_ARN')
    if (access_key is None) or (secret_key is None) or (aws_region is None) or (topic_arn is None):
        sys.exit("Incomplete notification setup info. Script Processing Aborted!!!")
    sns = boto3.client('sns', aws_access_key_id=access_key, aws_secret_access_key=secret_key, region_name=aws_region)
    response = sns.publish(TopicArn=topic_arn, Message=msg_text)
    if response['ResponseMetadata']['HTTPStatusCode'] != 200 :
        print('Status notification not OK with HTTP status code:', response['ResponseMetadata']['HTTPStatusCode'])

In [11]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 1 - Prepare Environment has begun on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

In [12]:
# Reset the random number generators
def reset_random(x):
    random.seed(x)
    np.random.seed(x)
    tf.random.set_seed(x)
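
Reseeding right before a model is built is what makes repeated runs comparable. The following is a minimal sketch of that idea using only the standard-library `random` module; `draw_three` is a hypothetical helper for illustration, and the NumPy and TensorFlow generators are assumed to behave analogously when reseeded.

```python
import random

def draw_three(seed):
    # Reseed the generator, then draw three pseudo-random numbers
    random.seed(seed)
    return [random.random() for _ in range(3)]

first_run = draw_three(888)
second_run = draw_three(888)
other_seed = draw_three(889)

# The same seed reproduces the same sequence; a different seed does not
print(first_run == second_run)
print(first_run == other_seed)
```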

In [13]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 1 - Prepare Environment completed on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

# Task 2 - Load and Prepare Text Data

In [14]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 2 - Load and Prepare Text Data has begun on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

## 2.a) Download Text Data Archive

In [15]:
# Clean up the old files and download directories before receiving new ones
!rm -rf staging/
!rm -f aclImdb_v1.tar.gz
!rm -f vocabulary.txt

In [16]:
!wget https://dainesanalytics.com/datasets/acl2011-large-movie-review/aclImdb_v1.tar.gz

--2021-03-01 06:09:00--  https://dainesanalytics.com/datasets/acl2011-large-movie-review/aclImdb_v1.tar.gz
Resolving dainesanalytics.com (dainesanalytics.com)... 13.226.254.77, 13.226.254.110, 13.226.254.13, ...
Connecting to dainesanalytics.com (dainesanalytics.com)|13.226.254.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: ‘aclImdb_v1.tar.gz’


2021-03-01 06:09:02 (33.4 MB/s) - ‘aclImdb_v1.tar.gz’ saved [84125825/84125825]



## 2.b) Unpack the Archive and List the Training and Test Files

In [17]:
staging_dir = 'staging/'
!mkdir staging/

In [18]:
local_archive = 'aclImdb_v1.tar.gz'
shutil.unpack_archive(local_archive, staging_dir)

In [19]:
training_dir = 'staging/aclImdb/train/'
testing_dir = 'staging/aclImdb/test/'
classA_name = 'pos'
classB_name = 'neg'

In [20]:
# Brief listing of training text files for both classes
training_classA_dir = os.path.join(training_dir, classA_name)
training_classA_files = os.listdir(training_classA_dir)
print('Number of training files for', classA_name, ':', len(training_classA_files))
print('Training samples for', classA_name, ':', training_classA_files[:10])

training_classB_dir = os.path.join(training_dir, classB_name)
training_classB_files = os.listdir(training_classB_dir)
print('Number of training files for', classB_name, ':', len(training_classB_files))
print('Training samples for', classB_name, ':', training_classB_files[:10])

Number of training files for pos : 12500
Training samples for pos : ['127_7.txt', '126_10.txt', '125_7.txt', '124_10.txt', '123_10.txt', '122_9.txt', '121_10.txt', '120_8.txt', '119_10.txt', '118_8.txt']
Number of training files for neg : 12500
Training samples for neg : ['127_4.txt', '126_1.txt', '125_1.txt', '124_2.txt', '123_1.txt', '122_1.txt', '121_4.txt', '120_1.txt', '119_4.txt', '118_2.txt']


In [21]:
# Brief listing of testing text files for both classes
testing_classA_dir = os.path.join(testing_dir, classA_name)
testing_classA_files = os.listdir(testing_classA_dir)
print('Number of testing files for', classA_name, ':', len(testing_classA_files))
print('Test samples for', classA_name, ':', testing_classA_files[:10])

testing_classB_dir = os.path.join(testing_dir, classB_name)
testing_classB_files = os.listdir(testing_classB_dir)
print('Number of testing files for', classB_name, ':', len(testing_classB_files))
print('Test samples for', classB_name, ':', testing_classB_files[:10])

Number of testing files for pos : 12500
Test samples for pos : ['127_10.txt', '126_10.txt', '125_7.txt', '124_10.txt', '123_10.txt', '122_8.txt', '121_8.txt', '120_9.txt', '119_9.txt', '118_10.txt']
Number of testing files for neg : 12500
Test samples for neg : ['127_3.txt', '126_4.txt', '125_3.txt', '124_2.txt', '123_4.txt', '122_4.txt', '121_4.txt', '120_2.txt', '119_3.txt', '118_1.txt']


## 2.c) Load Document and Build Vocabulary

In [22]:
# load doc into memory
def load_doc(filename):
    # open the file as read only; the context manager closes it on exit
    with open(filename, 'r') as file:
        # read all text
        text = file.read()
    return text

In [23]:
# turn a doc into clean tokens
def clean_doc(doc):
    # split into tokens by white space
    tokens = doc.split()
    # remove punctuation from each token
    table = str.maketrans('', '', string.punctuation)
    tokens = [w.translate(table) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out stop words
    stop_words = set(stopwords.words('english'))
    tokens = [w for w in tokens if w not in stop_words]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 1]
    return tokens
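
To make the effect of each cleaning step concrete, here is a small self-contained sketch that mirrors the steps of `clean_doc`. It is an illustration only: `clean_doc_demo`, `DEMO_STOP_WORDS`, and the sample sentence are hypothetical, and the tiny stop-word set stands in for NLTK's English list so the snippet runs without any download.

```python
import string

# Hypothetical, tiny stop-word set used only for this illustration;
# the notebook uses NLTK's full English stop-word list instead.
DEMO_STOP_WORDS = {'the', 'a', 'was', 'and', 'it', 'is', 'i'}

def clean_doc_demo(doc, stop_words=DEMO_STOP_WORDS):
    # strip punctuation from each whitespace-separated token
    table = str.maketrans('', '', string.punctuation)
    tokens = [w.translate(table) for w in doc.split()]
    # keep purely alphabetic tokens
    tokens = [w for w in tokens if w.isalpha()]
    # drop stop words (the comparison is case-sensitive)
    tokens = [w for w in tokens if w not in stop_words]
    # drop one-character tokens
    return [w for w in tokens if len(w) > 1]

sample = "The plot was thin, and the 2nd act dragged... but I loved it!"
cleaned = clean_doc_demo(sample)
print(cleaned)
```

Note that the capitalized "The" survives in this sketch because nothing lowercases the text before the stop-word comparison, which is also true of `clean_doc` as written.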

In [24]:
# load doc and add to vocab
def add_doc_to_vocab(filename, vocab):
    # load doc
    doc = load_doc(filename)
    # clean doc
    tokens = clean_doc(doc)
    # update counts
    vocab.update(tokens)

In [25]:
# load all docs in a directory
def build_vocab_from_docs(directory, vocab):
    # walk through all files in the folder
    i = 0
    print('Processing the text files and showing the first 10...')
    for filename in os.listdir(directory):
        # skip files that do not have the right extension
        if not filename.endswith(".txt"):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # add doc to vocab
        add_doc_to_vocab(path, vocab)
        i = i + 1
        if i < 10: print('Loaded %s' % path)
    print('Total number of text files loaded:', i, '\n')

In [26]:
# define vocab
vocab = Counter()
# add all docs to vocab
build_vocab_from_docs(training_classA_dir, vocab)
build_vocab_from_docs(training_classB_dir, vocab)
# print the size of the vocab
print('The total number of words in the vocabulary:', len(vocab))
# print the top words in the vocab
top_words = 50
print('The top', top_words, 'words in the vocabulary:\n', vocab.most_common(top_words))

Processing the text files and showing the first 10...
Loaded staging/aclImdb/train/pos/127_7.txt
Loaded staging/aclImdb/train/pos/126_10.txt
Loaded staging/aclImdb/train/pos/125_7.txt
Loaded staging/aclImdb/train/pos/124_10.txt
Loaded staging/aclImdb/train/pos/123_10.txt
Loaded staging/aclImdb/train/pos/122_9.txt
Loaded staging/aclImdb/train/pos/121_10.txt
Loaded staging/aclImdb/train/pos/120_8.txt
Loaded staging/aclImdb/train/pos/119_10.txt
Total number of text files loaded: 12500 

Processing the text files and showing the first 10...
Loaded staging/aclImdb/train/neg/127_4.txt
Loaded staging/aclImdb/train/neg/126_1.txt
Loaded staging/aclImdb/train/neg/125_1.txt
Loaded staging/aclImdb/train/neg/124_2.txt
Loaded staging/aclImdb/train/neg/123_1.txt
Loaded staging/aclImdb/train/neg/122_1.txt
Loaded staging/aclImdb/train/neg/121_4.txt
Loaded staging/aclImdb/train/neg/120_1.txt
Loaded staging/aclImdb/train/neg/119_4.txt
Total number of text files loaded: 12500 

The total number of words i

In [27]:
# keep tokens with a minimum number of occurrences
min_occurrence = 2
tokens = [k for k,c in vocab.items() if c >= min_occurrence]
print('The number of words with the minimum appearance:', len(tokens))

The number of words with the minimum appearance: 65663
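
The minimum-occurrence filter can be illustrated on a toy vocabulary. This is a sketch with made-up words and counts (`demo_vocab` and `min_count` are hypothetical names), not data from the review corpus.

```python
from collections import Counter

# Build a toy vocabulary from two already-tokenized "documents"
demo_vocab = Counter()
demo_vocab.update(['great', 'film', 'great', 'cast'])
demo_vocab.update(['dull', 'film'])

# Keep only the words that appear at least min_count times
min_count = 2
kept = [word for word, count in demo_vocab.items() if count >= min_count]
print(kept)
```

Words seen only once ("cast" and "dull" here) are dropped, which shrinks the vocabulary and therefore the width of the bag-of-words matrix.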


In [28]:
# save list to file
def save_list(lines, filename):
    # convert lines to a single blob of text
    data = '\n'.join(lines)
    # open the file and write the text; the context manager closes it on exit
    with open(filename, 'w') as file:
        file.write(data)

# save tokens to a vocabulary file
save_list(tokens, 'vocabulary.txt')

## 2.d) Create Tokenizer and Encode the Input Text

In [29]:
# load doc, clean and return line of tokens
def doc_to_line(filename, vocab):
    # load the doc
    doc = load_doc(filename)
    # clean doc
    tokens = clean_doc(doc)
    # filter by vocab
    tokens = [w for w in tokens if w in vocab]
    return ' '.join(tokens)

In [30]:
# load all docs in a directory
def process_docs_to_lines(directory, vocab):
    lines = list()
    # walk through all files in the folder
    for filename in os.listdir(directory):
        # create the full path of the file to open
        path = directory + '/' + filename
        # load and clean the doc
        line = doc_to_line(path, vocab)
        # add to list
        lines.append(line)
    return lines

In [31]:
# load the vocabulary
vocab_filename = 'vocabulary.txt'
vocab = load_doc(vocab_filename)
vocab = vocab.split()
vocab = set(vocab)
print('Number of tokens in the vocabulary:', len(vocab))

Number of tokens in the vocabulary: 65663


In [32]:
# prepare bag of words encoding of docs
def encode_training_data(train_docs, mode='binary'):
    # create the tokenizer
    tokenizer = Tokenizer()
    # fit the tokenizer on the documents
    tokenizer.fit_on_texts(train_docs)
    # encode training data set
    train_encoded = tokenizer.texts_to_matrix(train_docs, mode=mode)
    return train_encoded
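
The sketch below approximates, in plain Python, what a binary bag-of-words encoding produces: one row per document and one column per known word, with column 0 left unused so the width is the number of unique words plus one. It is not the Keras implementation; `fit_word_index` and `texts_to_binary_matrix` are hypothetical helpers, and the alphabetical tie-break between equally frequent words is a simplification.

```python
from collections import Counter

def fit_word_index(docs):
    # Rank words by frequency; indices start at 1 so that 0 stays unused
    counts = Counter(w for doc in docs for w in doc.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ranked)}

def texts_to_binary_matrix(docs, word_index):
    width = len(word_index) + 1
    matrix = []
    for doc in docs:
        row = [0] * width
        for w in doc.lower().split():
            idx = word_index.get(w)
            if idx is not None:  # words unseen during fitting are skipped
                row[idx] = 1
        matrix.append(row)
    return matrix

train = ['great film great cast', 'dull film']
index = fit_word_index(train)
encoded_train = texts_to_binary_matrix(train, index)
encoded_new = texts_to_binary_matrix(['great unseen word'], index)
print(encoded_train)
print(encoded_new)
```

Because the word index is learned only from the fitting documents, a new document is encoded with the same columns and any unknown words are simply ignored.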

In [33]:
# Load all training cases
positive_train_cases = process_docs_to_lines(training_classA_dir, vocab)
print('The number of positive reviews processed:', len(positive_train_cases))
negative_train_cases = process_docs_to_lines(training_classB_dir, vocab)
print('The number of negative reviews processed:', len(negative_train_cases))
training_docs =  negative_train_cases + positive_train_cases
y_train = np.array([0 for _ in range(len(negative_train_cases))] + [1 for _ in range(len(positive_train_cases))])
print('The shape of the encoded training classes:', y_train.shape)

The number of positive reviews processed: 12500
The number of negative reviews processed: 12500
The shape of the encoded training classes: (25000,)


In [34]:
# Encode the training dataset
X_train = encode_training_data(training_docs)
print('The shape of the encoded training dataset:', X_train.shape)

The shape of the encoded training dataset: (25000, 54889)


In [35]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 2 - Load and Prepare Text Data completed on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

# Task 3 - Define and Train Models

In [36]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 3 - Define and Train Models has begun on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

In [37]:
# Define the default numbers of input/output for modeling
num_inputs = X_train.shape[1]
num_outputs = 1

In [38]:
# Define the baseline model for benchmarking
def create_nn_model(n_inputs=num_inputs, n_outputs=num_outputs, layer1_nodes=64, layer1_dropout=0, opt_param=default_optimizer, init_param=default_kernel_init):
    nn_model = keras.Sequential([
        keras.layers.Dense(layer1_nodes, input_shape=(n_inputs,), activation='relu', kernel_initializer=init_param),
#         keras.layers.Dropout(layer1_dropout),
        keras.layers.Dense(n_outputs, activation='sigmoid', kernel_initializer=init_param)
    ])
    nn_model.compile(loss=default_loss, optimizer=opt_param, metrics=default_metrics)
    return nn_model

In [39]:
# # Initialize the default model and get a baseline result
# startTimeModule = datetime.now()
# results = list()
# iteration = 0
# cv = RepeatedKFold(n_splits=n_folds, n_repeats=n_iterations, random_state=seedNum)
# for train_ix, val_ix in cv.split(X_train):
#     feature_train, feature_validation = X_train[train_ix], X_train[val_ix]
#     target_train, target_validation = y_train[train_ix], y_train[val_ix]
#     reset_random(seedNum)
#     baseline_model = create_nn_model()
#     baseline_model.fit(feature_train, target_train, epochs=default_epochs, batch_size=default_batch_size, verbose=0)
#     model_metric = baseline_model.evaluate(feature_validation, target_validation, verbose=0)[1]
#     iteration = iteration + 1
#     print('Accuracy measurement from iteration %d >>> %.2f%%' % (iteration, model_metric*100))
#     results.append(model_metric)
# validation_score = np.mean(results)
# validation_variance = np.std(results)
# print('Average model accuracy from all iterations: %.2f%% (%.2f%%)' % (validation_score*100, validation_variance*100))
# print('Total time for model fitting and cross validating:', (datetime.now() - startTimeModule))
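
The cross-validation loop above is disabled in Part B, but its mechanics can be sketched without scikit-learn. The example below is a simplified illustration: `kfold_indices` is a hypothetical helper that deals shuffled indices into folds (scikit-learn's splitter partitions differently), and `demo_scores` are made-up accuracies used only to show the mean and standard deviation step.

```python
import random
from statistics import mean, pstdev

def kfold_indices(n_samples, n_folds, seed):
    # Shuffle the indices once, then deal them into n_folds validation folds
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_ix = folds[k]
        train_ix = [ix for j, fold in enumerate(folds) if j != k for ix in fold]
        yield train_ix, val_ix

splits = list(kfold_indices(n_samples=10, n_folds=5, seed=888))
for train_ix, val_ix in splits:
    print(len(train_ix), len(val_ix))

# Hypothetical per-fold accuracies, used only to show the aggregation step
demo_scores = [0.80, 0.90, 0.85, 0.85, 0.85]
print('mean=%.4f std=%.4f' % (mean(demo_scores), pstdev(demo_scores)))
```

Each sample lands in exactly one validation fold, so every observation is used for validation once and for training in the remaining folds.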

In [40]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 3 - Define and Train Models completed on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

# Task 4 - Evaluate and Optimize Models

In [41]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 4 - Evaluate and Optimize Models has begun on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

In [42]:
# Not applicable for this iteration of modeling

In [43]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 4 - Evaluate and Optimize Models completed on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

# Task 5 - Finalize Model and Make Predictions

In [44]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 5 - Finalize Model and Make Predictions has begun on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

## 5.a) Train the Final Model

In [45]:
# Create the final model for evaluating the test dataset
reset_random(seedNum)
final_model = create_nn_model()
final_model.fit(X_train, y_train, epochs=default_epochs, batch_size=default_batch_size, verbose=1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f551dd73400>

In [46]:
# Summarize the final model
final_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 64)                3512960   
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
Total params: 3,513,025
Trainable params: 3,513,025
Non-trainable params: 0
_________________________________________________________________
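
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output. The short calculation below reproduces the figures, taking the input width from the encoded training matrix shape reported earlier.

```python
n_inputs = 54889      # columns in the encoded training matrix
hidden_nodes = 64     # default value of layer1_nodes
n_outputs = 1

# weights plus biases for each dense layer
hidden_params = n_inputs * hidden_nodes + hidden_nodes
output_params = hidden_nodes * n_outputs + n_outputs
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

Nearly all of the parameters sit in the first layer, which is a direct consequence of the wide bag-of-words input.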


## 5.b) Make Predictions on Test Dataset

In [47]:
# load all test cases
positive_test_cases = process_docs_to_lines(testing_classA_dir, vocab)
print('The number of positive reviews processed:', len(positive_test_cases))
negative_test_cases = process_docs_to_lines(testing_classB_dir, vocab)
print('The number of negative reviews processed:', len(negative_test_cases))
testing_docs = negative_test_cases + positive_test_cases
y_test = np.array([0 for _ in range(len(negative_test_cases))] + [1 for _ in range(len(positive_test_cases))])
print('The shape of the encoded test classes:', y_test.shape)

The number of positive reviews processed: 12500
The number of negative reviews processed: 12500
The shape of the encoded test classes: (25000,)


In [48]:
# prepare bag of words encoding of the test docs
def encode_test_data(train_docs, test_docs, mode='binary'):
    # create the tokenizer
    tokenizer = Tokenizer()
    # fit the tokenizer on the training documents only
    tokenizer.fit_on_texts(train_docs)
    # encode test data set
    test_encoded = tokenizer.texts_to_matrix(test_docs, mode=mode)
    return test_encoded

In [49]:
# Encode the test dataset
X_test = encode_test_data(training_docs, testing_docs, 'binary')
print('The shape of the encoded test dataset:', X_test.shape)

The shape of the encoded test dataset: (25000, 54889)


In [50]:
# test_predictions = final_model.predict(X_test, batch_size=default_batch_size, verbose=1)
test_predictions = (final_model.predict(X_test) > 0.5).astype("int32").ravel()
print('Accuracy Score:', accuracy_score(y_test, test_predictions))
print(confusion_matrix(y_test, test_predictions))
print(classification_report(y_test, test_predictions))

Accuracy Score: 0.85244
[[10774  1726]
 [ 1963 10537]]
              precision    recall  f1-score   support

           0       0.85      0.86      0.85     12500
           1       0.86      0.84      0.85     12500

    accuracy                           0.85     25000
   macro avg       0.85      0.85      0.85     25000
weighted avg       0.85      0.85      0.85     25000
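
The headline figures in the report follow directly from the confusion matrix. The sketch below recomputes them from the four reported counts, assuming scikit-learn's usual layout in which rows are true labels and columns are predicted labels.

```python
# Counts taken from the confusion matrix above
tn, fp = 10774, 1726   # true class 0 (negative reviews)
fn, tp = 1963, 10537   # true class 1 (positive reviews)

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Per-class precision and recall
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)

# F1 is the harmonic mean of precision and recall
f1_pos = 2 * precision_pos * recall_pos / (precision_pos + recall_pos)

print('accuracy: %.5f' % accuracy)
print('class 0: precision %.2f, recall %.2f' % (precision_neg, recall_neg))
print('class 1: precision %.2f, recall %.2f, f1 %.2f' % (precision_pos, recall_pos, f1_pos))
```

The model misclassifies slightly more positive reviews as negative (1,963) than negative reviews as positive (1,726), which is why recall is a little lower for class 1.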



In [51]:
if notifyStatus: status_notify('(TensorFlow Text Classification) Task 5 - Finalize Model and Make Predictions completed on ' + datetime.now().strftime('%A %B %d, %Y %I:%M:%S %p'))

In [52]:
print('Total time for the script:', (datetime.now() - startTimeScript))

Total time for the script: 1:17:12.216865
