# Sentiment Analysis in the Browser

In this notebook, we show how to create a `.air` file that performs sentiment analysis in the browser using a neural network. We train the initial model on the IMDB Movie Reviews dataset and then package it with the `aisquared` Python SDK.

## Dependencies

For this notebook, the following dependencies are required:

- `aisquared`

This package is available on [PyPI](https://pypi.org) via `pip`. The import cell below also uses `tensorflow`, `pandas`, and `scikit-learn`.

In [1]:
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
import tensorflow as tf
import pandas as pd
import aisquared



## Model Creation

Now that the required packages have been installed and imported, it is time to create the sentiment analysis model. To do this, we first load and preprocess the data, then create and train the model, and finally package it in the `.air` format. The following cells walk through each of these steps.

In [2]:
df = pd.read_csv('IMDB Dataset.csv')
df.head()

|   | review | sentiment |
|---|--------|-----------|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. <br /><br />The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |


In [3]:
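# Tokenize the reviews, keeping the 10,000 most frequent words
# (the integer 1 serves as the out-of-vocabulary token)

tokenizer = tf.keras.preprocessing.text.Tokenizer(10000, oov_token = 1)
tokenizer.fit_on_texts(df.review)
vocab = tokenizer.word_index

# Convert each review to a fixed-length sequence of 256 word indices
sequences = tokenizer.texts_to_sequences(df.review)
sequences = tf.keras.preprocessing.sequence.pad_sequences(sequences, 256, padding = 'pre', truncating = 'post')

# Encode the labels: positive -> 0, negative -> 1
labels = df.sentiment.apply(lambda x : {'positive' : 0, 'negative' : 1}[x]).values

x_train, x_test, y_train, y_test = train_test_split(sequences, labels, train_size = 0.7)

# Drop the out-of-vocabulary entry (keyed by the integer 1) and keep only the
# indices the tokenizer actually emits (below 10000), so that every index passed
# to the packaged model falls inside the embedding layer's range
del vocab[1]
vocab = {word : index for word, index in vocab.items() if index < 10000}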
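
The padding and truncation settings above determine which part of a review the model sees. As a rough illustration, the following plain-Python sketch mimics `pad_sequences(..., padding = 'pre', truncating = 'post')` on toy data. The helper name `pad_or_truncate` and the example sequences are hypothetical and not part of the notebook.

```python
def pad_or_truncate(seq, length, pad_value=0):
    """Plain-Python sketch of pad_sequences(padding='pre', truncating='post')."""
    # truncating='post': drop tokens from the end, keeping the first `length`
    seq = list(seq)[:length]
    # padding='pre': add pad values at the front until `length` is reached
    return [pad_value] * (length - len(seq)) + seq


short = pad_or_truncate([5, 8, 2], length=6)
long = pad_or_truncate([5, 8, 2, 9, 4, 7, 3, 6], length=6)

print(short)  # [0, 0, 0, 5, 8, 2]
print(long)   # [5, 8, 2, 9, 4, 7]
```

With these settings, short reviews are left-padded with zeros and long reviews keep only their first 256 tokens.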

In [4]:
# Create the model

input_layer = tf.keras.layers.Input(sequences.shape[1:])
embedding_layer = tf.keras.layers.Embedding(
    10000,
    4
)(input_layer)
x = tf.keras.layers.Flatten()(embedding_layer)
for _ in range(5):
    x = tf.keras.layers.Dense(1000, activation = 'relu')(x)
output_layer = tf.keras.layers.Dense(1, activation = 'sigmoid')(x)

model = tf.keras.models.Model(input_layer, output_layer)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 256)]             0         
                                                                 
 embedding (Embedding)       (None, 256, 4)            40000     
                                                                 
 flatten (Flatten)           (None, 1024)              0         
                                                                 
 dense (Dense)               (None, 1000)              1025000   
                                                                 
 dense_1 (Dense)             (None, 1000)              1001000   
                                                                 
 dense_2 (Dense)             (None, 1000)              1001000   
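
The parameter counts in the rows shown above can be checked by hand from the layer sizes in the model definition. The sketch below is only a sanity check of that arithmetic; the variable names are illustrative and do not come from the notebook.

```python
# Sizes taken from the model definition in the cell above
vocab_size = 10000       # first argument to Embedding
embedding_dim = 4        # second argument to Embedding
sequence_length = 256    # padded review length
hidden_units = 1000      # width of each hidden Dense layer

# Embedding: one vector of `embedding_dim` weights per vocabulary index
embedding_params = vocab_size * embedding_dim

# Flatten has no weights; it only reshapes (256, 4) into 1024 values
flattened_size = sequence_length * embedding_dim

# Dense: (inputs * units) weights plus one bias per unit
first_dense_params = flattened_size * hidden_units + hidden_units
later_dense_params = hidden_units * hidden_units + hidden_units

print(embedding_params)    # 40000
print(flattened_size)      # 1024
print(first_dense_params)  # 1025000
print(later_dense_params)  # 1001000
```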
                                                  

In [5]:
# Compile the model (this repeats the compile call from the previous cell)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# Train the model, holding out 20% of the training data for validation
model.fit(
    x_train,
    y_train.reshape(-1,1),
    epochs = 20,
    batch_size = 512,
    validation_split = 0.2,
    verbose = 2,
    callbacks = None
)

Epoch 1/20
55/55 - 9s - loss: 0.6739 - accuracy: 0.5537 - val_loss: 0.5550 - val_accuracy: 0.7250 - 9s/epoch - 171ms/step
Epoch 2/20
55/55 - 9s - loss: 0.3667 - accuracy: 0.8403 - val_loss: 0.3416 - val_accuracy: 0.8570 - 9s/epoch - 172ms/step
Epoch 3/20
55/55 - 9s - loss: 0.1798 - accuracy: 0.9331 - val_loss: 0.4031 - val_accuracy: 0.8500 - 9s/epoch - 173ms/step
Epoch 4/20
55/55 - 9s - loss: 0.0590 - accuracy: 0.9804 - val_loss: 0.6060 - val_accuracy: 0.8404 - 9s/epoch - 157ms/step
Epoch 5/20
55/55 - 8s - loss: 0.0242 - accuracy: 0.9919 - val_loss: 0.8298 - val_accuracy: 0.8396 - 8s/epoch - 137ms/step
Epoch 6/20
55/55 - 7s - loss: 0.0156 - accuracy: 0.9949 - val_loss: 0.8656 - val_accuracy: 0.8316 - 7s/epoch - 120ms/step
Epoch 7/20
55/55 - 7s - loss: 0.0174 - accuracy: 0.9940 - val_loss: 0.9225 - val_accuracy: 0.8370 - 7s/epoch - 123ms/step
Epoch 8/20
55/55 - 6s - loss: 0.0115 - accuracy: 0.9959 - val_loss: 1.0702 - val_accuracy: 0.8400 - 6s/epoch - 111ms/step
Epoch 9/20
55/55 - 5s - loss: 0.007

<keras.callbacks.History at 0x157a3d5b0>
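
Two things in the log above can be checked with simple arithmetic: the `55/55` step count and the epoch at which validation loss is lowest. The sketch below assumes the 15,000-row test set shown in the classification report later in the notebook, from which the training size follows via `train_size = 0.7`. The validation losses are copied from the eight complete epochs in the log; the variable names are illustrative.

```python
import math

test_rows = 15000      # total support in the classification report
train_pct = 70         # train_size = 0.7
validation_pct = 20    # validation_split = 0.2
batch_size = 512

# Work in integer percentages to avoid floating-point rounding
total_rows = test_rows * 100 // (100 - train_pct)
train_rows = total_rows - test_rows
fit_rows = train_rows * (100 - validation_pct) // 100
steps_per_epoch = math.ceil(fit_rows / batch_size)

print(total_rows, train_rows, fit_rows)  # 50000 35000 28000
print(steps_per_epoch)                   # 55

# Validation loss for epochs 1 through 8, copied from the log
val_losses = [0.5550, 0.3416, 0.4031, 0.6060, 0.8298, 0.8656, 0.9225, 1.0702]
best_epoch = val_losses.index(min(val_losses)) + 1
print(best_epoch)                        # 2
```

Validation loss is lowest at epoch 2 and rises afterwards while training loss keeps falling, which suggests the model overfits during the later epochs.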

In [6]:
# Check model performance
preds = (model.predict(x_test) >= 0.5).astype(int)
print('Model Performance on Test Data:')
print('\n')
print(confusion_matrix(y_test, preds))
print(classification_report(y_test, preds))

# Save the model
model.save('SentimentClassifier.h5')

Model Performance on Test Data:


[[6436 1011]
 [1188 6365]]
              precision    recall  f1-score   support

           0       0.84      0.86      0.85      7447
           1       0.86      0.84      0.85      7553

    accuracy                           0.85     15000
   macro avg       0.85      0.85      0.85     15000
weighted avg       0.85      0.85      0.85     15000
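
The figures in the classification report follow directly from the confusion matrix, where rows are true labels and columns are predicted labels. The sketch below recomputes them in plain Python as a check; the helper functions are illustrative and are not part of `scikit-learn`.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
confusion = [[6436, 1011],
             [1188, 6365]]


def precision(cm, cls):
    # Correct predictions of `cls` divided by all predictions of `cls` (column sum)
    return cm[cls][cls] / sum(row[cls] for row in cm)


def recall(cm, cls):
    # Correct predictions of `cls` divided by all true members of `cls` (row sum)
    return cm[cls][cls] / sum(cm[cls])


accuracy = (confusion[0][0] + confusion[1][1]) / sum(sum(row) for row in confusion)

for cls in (0, 1):
    print(cls, round(precision(confusion, cls), 2), round(recall(confusion, cls), 2))
# 0 0.84 0.86
# 1 0.86 0.84
print(round(accuracy, 2))  # 0.85
```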



## Package the Model

Now that the model has been created, we can package the model into a single `.air` file that enables integration into the browser.

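To perform this packaging, we use the `ModelConfiguration` class from the `aisquared` package, which combines the harvesting, preprocessing, analytic, postprocessing, rendering, and feedback steps defined in the cell below.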

In [7]:
harvester = aisquared.config.harvesting.InputHarvester()

preprocessor = aisquared.config.preprocessing.text.TextPreprocessor(
    [
        aisquared.config.preprocessing.text.RemoveCharacters(),
        aisquared.config.preprocessing.text.ConvertToCase(lowercase = True),
        aisquared.config.preprocessing.text.Tokenize(),
        aisquared.config.preprocessing.text.ConvertToVocabulary(vocabulary = vocab, oov_character = 1, start_character = 0),
        aisquared.config.preprocessing.text.PadSequences(length = 256, pad_location = 'pre', truncate_location = 'post')
    ]
)

analytic = aisquared.config.analytic.LocalModel('SentimentClassifier.h5', 'text')

postprocessor = aisquared.config.postprocessing.BinaryClassification(['positive', 'negative'], 0.5)

renderer = aisquared.config.rendering.DocumentRendering(include_probability = True)

feedback = aisquared.config.feedback.BinaryFeedback(['positive', 'negative'])

# Combine all of the steps and compile them into a `.air` file
aisquared.config.ModelConfiguration(
    'SentimentClassifier',
    harvester,
    preprocessor,
    analytic,
    postprocessor,
    renderer,
    feedback
).compile(dtype = 'float16')
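
The preprocessing steps in the configuration are meant to mirror what the Keras tokenizer did at training time, so that text harvested in the browser reaches the model in the same form. The following plain-Python sketch approximates that flow. It is not the `aisquared` implementation. The toy vocabulary, the helper names, and the exact character-removal rule are assumptions made for illustration, and the `start_character` setting is ignored.

```python
import re

# Hypothetical toy vocabulary; the notebook passes the fitted tokenizer's vocabulary instead
toy_vocab = {'the': 2, 'movie': 3, 'was': 4, 'great': 5}
OOV_INDEX = 1  # matches oov_character = 1 in the configuration


def preprocess(text, vocab, length):
    text = re.sub(r'[^\w\s]', '', text)    # remove punctuation (assumed rule)
    text = text.lower()                    # convert to lowercase
    tokens = text.split()                  # tokenize on whitespace
    indices = [vocab.get(token, OOV_INDEX) for token in tokens]
    indices = indices[:length]             # truncate at the end ('post')
    return [0] * (length - len(indices)) + indices  # pad at the front ('pre')


def postprocess(probability, labels=('positive', 'negative'), threshold=0.5):
    # Assumes scores at or above the threshold map to the second label,
    # consistent with the training labels (positive -> 0, negative -> 1)
    return labels[1] if probability >= threshold else labels[0]


encoded = preprocess('The movie was GREAT, truly!', toy_vocab, length=8)
print(encoded)           # [0, 0, 0, 2, 3, 4, 5, 1]
print(postprocess(0.9))  # negative
print(postprocess(0.1))  # positive
```

Any mismatch between these steps and the training-time tokenization (for example, a different padding side) would feed the model inputs it was not trained on.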