BERT Model - Perform sentiment analysis on a dataset of plain-text IMDB movie reviews.

In [1]:
!pip install tensorflow_text


In [2]:
import os
import shutil
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_hub as hub
import tensorflow_text as text  # registers the TF Text ops the BERT preprocessing model needs
import pandas as pd
import numpy as np

In [3]:
#(x_train, y_train), (x_test, y_test)=tf.keras.datasets.imdb.load_data()


Load the IMDB dataset.
The dataset can be downloaded from the following location:
url = 'https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'

In [4]:
url = 'https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'

dataset = tf.keras.utils.get_file('aclImdb_v1.tar.gz', url,
                                  untar=True, cache_dir='.',
                                  cache_subdir='')

dataset_dir = os.path.join(os.path.dirname(dataset), 'aclImdb')

train_dir = os.path.join(dataset_dir, 'train')

# Remove the 'unsup' folder of unlabeled reviews so only the labeled 'neg' and 'pos' folders remain
remove_dir = os.path.join(train_dir, 'unsup')
shutil.rmtree(remove_dir)

Downloading data from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
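`text_dataset_from_directory` infers one class per subdirectory, in alphabetical order, which is why the `unsup` folder is deleted: left in place, its unlabeled reviews would become a third class. The sketch below mimics that inference with the standard library on a tiny throwaway tree; `infer_class_names` is a hypothetical helper, not the Keras implementation.

```python
import os
import tempfile


def infer_class_names(directory):
    """Mimic label inference: each subdirectory is a class, sorted alphabetically."""
    return sorted(
        entry for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))  # loose files are ignored
    )


# Tiny stand-in for aclImdb/train (the real folders hold thousands of .txt files).
root = tempfile.mkdtemp()
train = os.path.join(root, "train")
for sub in ("neg", "pos", "unsup"):
    os.makedirs(os.path.join(train, sub))

before = infer_class_names(train)       # three classes, including 'unsup'
os.rmdir(os.path.join(train, "unsup"))  # the notebook uses shutil.rmtree on the non-empty folder
after = infer_class_names(train)        # label 0 = 'neg', label 1 = 'pos'
print(before, after)
```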


In [5]:
AUTOTUNE = tf.data.AUTOTUNE
batch_size = 32
seed = 42

raw_train_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='training',
    seed=seed)

class_names = raw_train_ds.class_names
train_ds = raw_train_ds.cache().prefetch(buffer_size=AUTOTUNE)

val_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='validation',
    seed=seed)

val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

test_ds = tf.keras.utils.text_dataset_from_directory(
    'aclImdb/test',
    batch_size=batch_size)

test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)

Found 25000 files belonging to 2 classes.
Using 20000 files for training.
Found 25000 files belonging to 2 classes.
Using 5000 files for validation.
Found 25000 files belonging to 2 classes.
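The training and validation datasets come from two separate calls over the same `aclImdb/train` folder. They stay disjoint because both calls pass the same `seed`: the file list is shuffled identically, then the last 20% is held out. The sketch below illustrates that idea with a hypothetical `split_train_validation` helper; Keras shuffles with NumPy's RNG rather than Python's `random`, so the exact file assignment differs.

```python
import random


def split_train_validation(paths, validation_split, seed):
    """Shuffle with a fixed seed, then hold out the last `validation_split` fraction."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    num_val = int(validation_split * len(shuffled))
    if num_val == 0:
        return shuffled, []
    return shuffled[:-num_val], shuffled[-num_val:]


files = [f"review_{i}.txt" for i in range(25000)]
train_a, val_a = split_train_validation(files, 0.2, seed=42)  # the "training" call
train_b, val_b = split_train_validation(files, 0.2, seed=42)  # the "validation" call
print(len(train_a), len(val_b))        # 20000 5000
print(set(train_a).isdisjoint(val_b))  # True
```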


In [6]:
for text_batch, label_batch in train_ds.take(1):
  for i in range(3):
    print(f'Review: {text_batch.numpy()[i]}')
    label = label_batch.numpy()[i]
    print(f'Label : {label} ({class_names[label]})')

Review: b'"Pandemonium" is a horror movie spoof that comes off more stupid than funny. Believe me when I tell you, I love comedies. Especially comedy spoofs. "Airplane", "The Naked Gun" trilogy, "Blazing Saddles", "High Anxiety", and "Spaceballs" are some of my favorite comedies that spoof a particular genre. "Pandemonium" is not up there with those films. Most of the scenes in this movie had me sitting there in stunned silence because the movie wasn\'t all that funny. There are a few laughs in the film, but when you watch a comedy, you expect to laugh a lot more than a few times and that\'s all this film has going for it. Geez, "Scream" had more laughs than this film and that was more of a horror film. How bizarre is that?<br /><br />*1/2 (out of four)'
Label : 0 (neg)
Review: b"David Mamet is a very interesting and a very un-equal director. His first movie 'House of Games' was the one I liked best, and it set a series of films with characters whose perspective of life changes as they
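The reviews print with a `b'...'` prefix because `text_batch.numpy()[i]` returns raw UTF-8 bytes. A minimal sketch of decoding one review and mapping its integer label back to a class name, using a shortened copy of the first review above:

```python
class_names = ["neg", "pos"]  # raw_train_ds.class_names: alphabetical folder order

raw_review = b'"Pandemonium" is a horror movie spoof that comes off more stupid than funny.'
label = 0

review = raw_review.decode("utf-8")  # bytes -> str, drops the b'' prefix when printed
print(f"Review: {review}")
print(f"Label : {label} ({class_names[label]})")
```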

Load a BERT encoder and its matching preprocessing model from TensorFlow Hub. Other BERT variants are listed in the collection at:
https://tfhub.dev/google/collections/bert/1

In [7]:
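# Tokenizer and input packer matched to the uncased BERT encoder
preprocess_url = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
# BERT-Base, uncased: 12 layers (L), hidden size 768 (H), 12 attention heads (A)
model_url = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"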

In [8]:
# A Keras layer (not a URL, despite the name) that tokenizes raw strings for BERT
bert_preprocess_url = hub.KerasLayer(preprocess_url)
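The preprocessing layer turns each raw string into three fixed-length tensors, `input_word_ids`, `input_mask` and `input_type_ids`, each of length 128 (the shapes in the model summary further down). The pure-Python sketch below mimics only the packing step, assuming the text has already been split into WordPiece ids; the word ids are made up, while [CLS]=101, [SEP]=102 and [PAD]=0 are the special-token ids of the uncased BERT vocabulary. `pack_single_segment` is a hypothetical helper, not the TF Hub API.

```python
SEQ_LENGTH = 128                      # matches the (None, 128) shapes in the summary
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special tokens in the uncased BERT vocab


def pack_single_segment(token_ids, seq_length=SEQ_LENGTH):
    """Mimic the dict the preprocessing layer returns for one sentence."""
    body = token_ids[: seq_length - 2]  # leave room for [CLS] and [SEP]
    word_ids = [CLS_ID] + body + [SEP_ID]
    padding = seq_length - len(word_ids)
    return {
        "input_word_ids": word_ids + [PAD_ID] * padding,
        "input_mask": [1] * len(word_ids) + [0] * padding,  # 1 = real token, 0 = padding
        "input_type_ids": [0] * seq_length,                  # single segment: all zeros
    }


# Hypothetical WordPiece ids for a short review.
packed = pack_single_segment([2023, 3185, 2001, 2307])
print(packed["input_word_ids"][:7])  # [101, 2023, 3185, 2001, 2307, 102, 0]
print(sum(packed["input_mask"]))     # 6
```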



In [9]:
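# The BERT encoder as a Keras layer. hub.KerasLayer defaults to trainable=False,
# so BERT's weights stay frozen and only the dense layers added below are trained.
bert_model_url = hub.KerasLayer(model_url)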

Build your own model by feeding BERT's pooled output into a small dense classifier.

In [10]:
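myinputs = tf.keras.layers.Input(shape=(), dtype=tf.string, name='InputLayer')

# Tokenize the raw strings into the input_word_ids / input_mask / input_type_ids dict
preprocessed_text = bert_preprocess_url(myinputs)

# 'pooled_output' is one 768-dimensional vector summarizing each review
myoutputs = bert_model_url(preprocessed_text)['pooled_output']

myoutputs = tf.keras.layers.Dense(128, activation='tanh', name='HiddenLayer1')(myoutputs)
myoutputs = tf.keras.layers.Dense(64, activation='tanh', name='HiddenLayer2')(myoutputs)
# A single sigmoid unit outputs the probability that the review is positive
myoutputs = tf.keras.layers.Dense(1, activation='sigmoid', name='OutputLayer')(myoutputs)

model = tf.keras.Model(inputs=myinputs, outputs=myoutputs)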

In [11]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 InputLayer (InputLayer)        [(None,)]            0           []                               
                                                                                                  
 keras_layer (KerasLayer)       {'input_word_ids':   0           ['InputLayer[0][0]']             
                                (None, 128),                                                      
                                 'input_type_ids':                                                
                                (None, 128),                                                      
                                 'input_mask':
                                (None, 128)}
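The summary above is cut off before the dense layers. Their parameter counts follow from the rule that a `Dense` layer holds one weight per input-output pair plus one bias per output, starting from BERT's 768-dimensional pooled output (the H-768 in the model URL). A worked sketch:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


POOLED_DIM = 768  # hidden size of bert_en_uncased_L-12_H-768_A-12
head = {
    "HiddenLayer1": dense_params(POOLED_DIM, 128),
    "HiddenLayer2": dense_params(128, 64),
    "output": dense_params(64, 1),
}
print(head)                # {'HiddenLayer1': 98432, 'HiddenLayer2': 8256, 'output': 65}
print(sum(head.values()))  # 106753
```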

In [12]:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
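With one sigmoid output per review, `binary_crossentropy` averages $-\big(y \log p + (1-y)\log(1-p)\big)$ over the batch, where $y \in \{0, 1\}$ is the label and $p$ the predicted probability. A minimal pure-Python sketch of that formula; clipping $p$ by a small epsilon (assumed here to be 1e-7) keeps the logarithms finite:

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)), with p clipped into [eps, 1-eps]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.5, 0.5]))  # ln 2, about 0.693 (an undecided model)
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # about 0.105 (confident and correct)
```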

Train your own model.

In [13]:
model.fit(x=train_ds, validation_data=val_ds, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f2726404b10>

In [14]:
model.evaluate(test_ds)



[0.47741469740867615, 0.7714400291442871]

In [15]:
check_predictions=[
    'Bromwell High is a cartoon comedy.','The movie Haggard is one of the funniest movie'
]

In [16]:
model.predict(check_predictions)



array([[0.7036338],
       [0.7679914]], dtype=float32)
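`model.predict` returns one probability per review rather than a class name. A short sketch that turns the two values above into labels, using a 0.5 cut-off (the same threshold Keras' binary accuracy applies) and the `neg`/`pos` class order from the training folders:

```python
class_names = ["neg", "pos"]  # label 0 = neg, label 1 = pos

reviews = [
    "Bromwell High is a cartoon comedy.",
    "The movie Haggard is one of the funniest movie",
]
# Sigmoid outputs copied from the model.predict result above.
probabilities = [[0.7036338], [0.7679914]]

predicted = []
for review, (p,) in zip(reviews, probabilities):
    label = class_names[int(p > 0.5)]  # above 0.5 -> 'pos', otherwise 'neg'
    predicted.append(label)
    print(f"{label} ({p:.3f}): {review}")
```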

Save the model in SavedModel format, then serve it from a Docker container running TensorFlow Serving, which exposes a REST API for predictions.

In [17]:
model.save("imdb/1")
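The `1` in `imdb/1` is a version number. TensorFlow Serving watches a model base path for numbered version subdirectories and, by default, serves the highest one, so the host folder mounted as `/models/IMDB` should be the one that directly contains `1/`. The hypothetical helper below sketches that lookup with the standard library, on a stand-in folder rather than the real export:

```python
import os
import tempfile


def latest_servable_version(model_base_path):
    """Return the highest numeric subdirectory holding a saved_model.pb, or None."""
    versions = [
        int(name) for name in os.listdir(model_base_path)
        if name.isdigit()
        and os.path.isfile(os.path.join(model_base_path, name, "saved_model.pb"))
    ]
    return max(versions, default=None)


# Stand-in for the exported tree: imdb/1/saved_model.pb (variables/ and assets/ omitted).
base = os.path.join(tempfile.mkdtemp(), "imdb")
os.makedirs(os.path.join(base, "1"))
open(os.path.join(base, "1", "saved_model.pb"), "wb").close()
print(latest_servable_version(base))  # 1
```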



In [17]:
!zip -r imdb.zip imdb

In [None]:
from google.colab import files
files.download('imdb.zip')

In [None]:
#docker pull tensorflow/serving

Deploy the model in a Docker container running TensorFlow Serving.
Check the model status and request predictions over the REST API from the command line.

In [None]:
#docker run -t --rm -p 8501:8501 -v "C:\Users\Hp\Downloads\divya\imdb\IMDB:/models/IMDB"  -e MODEL_NAME=IMDB  tensorflow/serving &

In [None]:
#curl -X GET http://localhost:8501/v1/models/IMDB
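The status endpoint returns JSON listing each loaded version and its state; the model is ready once a version reports `AVAILABLE`. A minimal Python sketch of the same check, assuming the container above is listening on port 8501 and that the response has the `model_version_status` shape described in the TensorFlow Serving REST API docs (the sample body is illustrative, not captured output):

```python
import json
import urllib.request

STATUS_URL = "http://localhost:8501/v1/models/IMDB"


def is_available(status):
    """True if any version reported by TensorFlow Serving is in the AVAILABLE state."""
    return any(v.get("state") == "AVAILABLE" for v in status.get("model_version_status", []))


def fetch_status(url=STATUS_URL):
    # Needs the TensorFlow Serving container from the docker run step to be running.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Illustrative response body; a live call would be is_available(fetch_status()).
sample = {"model_version_status": [{"version": "1", "state": "AVAILABLE",
                                    "status": {"error_code": "OK", "error_message": ""}}]}
print(is_available(sample))  # True
```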

In [None]:
#curl -d '{"instances": ["Bromwell High is a cartoon comedy.", "The movie Haggard is one of the funniest movie"]}' -X POST http://localhost:8501/v1/models/IMDB:predict
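The predict call sends the reviews as a JSON list under `"instances"` and returns one sigmoid output per review under `"predictions"`. A Python client avoids shell-quoting differences (single-quoted JSON, as in the curl line, does not work in Windows `cmd`). A minimal sketch with the standard library, assuming the same host, port and model name as the curl command:

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:8501/v1/models/IMDB:predict"


def build_payload(reviews):
    """Row format: one JSON string per example under "instances"."""
    return json.dumps({"instances": reviews}).encode("utf-8")


def predict(reviews, url=PREDICT_URL):
    # Needs the TensorFlow Serving container to be running.
    request = urllib.request.Request(
        url, data=build_payload(reviews),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["predictions"]


payload = build_payload(["Bromwell High is a cartoon comedy.",
                         "The movie Haggard is one of the funniest movie"])
print(payload.decode("utf-8"))
```

With the server up, `predict([...])` returns a nested list with one `[probability]` entry per review.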