# Sentiment Analysis in the Browser

In this notebook, we will show how to create a `.air` file to perform sentiment analysis in the browser using a neural network.

## Dependencies

For this notebook, the following dependencies are required:

- `scikit-learn`
- `tensorflow`
- `aisquared`

All of these are available on [PyPI](https://pypi.org) and can be installed with `pip`.  The following cell imports them into the notebook environment.

In [1]:
from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf
import aisquared



## Model Creation

Now that the required packages have been installed and imported, it is time to create the sentiment analysis model.  To do this, we first download and preprocess the data, then train the model, and finally package the model in the `.air` format.  The following cells walk through each of these steps.

In [2]:
# Load the IMDB review dataset, keeping only the 10,000 most frequent words

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
    num_words = 10000,
    skip_top = 0,
    start_char = 1,
    oov_char = 2,
    index_from = 3
)
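# Pad or truncate the training and test reviews to 512 tokens each
# (both padding and truncation are applied at the end of the sequence)
x_train = tf.keras.preprocessing.sequence.pad_sequences(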
    x_train,
    maxlen = 512,
    padding = 'post',
    truncating = 'post'
)
x_test = tf.keras.preprocessing.sequence.pad_sequences(
    x_test,
    maxlen = 512,
    padding = 'post',
    truncating = 'post'
)

# Get the vocabulary
vocab = tf.keras.datasets.imdb.get_word_index()

# Add 2 to each vocab value to ensure matching with the needed values
vocab = {
    k : v + 2 for k, v in vocab.items()
}
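
To make the effect of `padding = 'post'` and `truncating = 'post'` concrete, here is a minimal pure-Python sketch of the same idea. The helper `pad_post` and the toy sequences are hypothetical and only illustrate the behaviour; the notebook itself relies on `pad_sequences`.

```python
def pad_post(sequence, length, pad_value=0):
    """Return `sequence` cut or extended to exactly `length` items."""
    # truncating='post': drop tokens from the end of long sequences
    truncated = list(sequence[:length])
    # padding='post': append the pad value to the end of short sequences
    return truncated + [pad_value] * (length - len(truncated))


short_review = [1, 14, 22, 16]      # hypothetical encoded review
long_review = list(range(1, 11))    # hypothetical 10-token review

print(pad_post(short_review, 6))  # [1, 14, 22, 16, 0, 0]
print(pad_post(long_review, 6))   # [1, 2, 3, 4, 5, 6]
```

Because both operations act on the end of the sequence, the first tokens of every review are always preserved.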


In [3]:
# Create the model

input_layer = tf.keras.layers.Input(x_train.shape[1:])
embedding_layer = tf.keras.layers.Embedding(
    10000,
    4
)(input_layer)
x = tf.keras.layers.Flatten()(embedding_layer)
for _ in range(5):
    x = tf.keras.layers.Dense(256, activation = 'relu')(x)
    x = tf.keras.layers.Dropout(0.1)(x)
output_layer = tf.keras.layers.Dense(1, activation = 'sigmoid')(x)

model = tf.keras.models.Model(input_layer, output_layer)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 512)]             0         
                                                                 
 embedding (Embedding)       (None, 512, 4)            40000     
                                                                 
 flatten (Flatten)           (None, 2048)              0         
                                                                 
 dense (Dense)               (None, 256)               524544    
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 256)               65792     
                                                                 
 dropout_1 (Dropout)         (N
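
The parameter counts reported in the summary can be checked by hand from the layer sizes used in the model definition. The short calculation below is a sketch of that arithmetic; the variable names are illustrative only.

```python
vocab_size = 10000      # first argument to the Embedding layer
embedding_dim = 4       # second argument to the Embedding layer
sequence_length = 512   # padded review length
hidden_units = 256      # width of each hidden Dense layer
hidden_layers = 5       # number of Dense/Dropout pairs in the loop

# Embedding: one vector of size embedding_dim per vocabulary entry
embedding_params = vocab_size * embedding_dim

# Flatten turns (512, 4) into a single vector of 2048 values
flattened_width = sequence_length * embedding_dim

# Dense layers: weights (inputs * units) plus one bias per unit
first_dense_params = flattened_width * hidden_units + hidden_units
hidden_dense_params = hidden_units * hidden_units + hidden_units
output_params = hidden_units * 1 + 1

total_params = (
    embedding_params
    + first_dense_params
    + (hidden_layers - 1) * hidden_dense_params
    + output_params
)

print(embedding_params)     # 40000
print(first_dense_params)   # 524544
print(hidden_dense_params)  # 65792
print(total_params)         # 827969
```

Dropout and Flatten layers have no trainable parameters, which is why they show a count of 0.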

In [4]:
# Train the model (it was already compiled in the previous cell)
model.fit(
    x_train,
    y_train.reshape(-1,1),
    epochs = 20,
    batch_size = 512,
    validation_split = 0.2,
    callbacks = [tf.keras.callbacks.EarlyStopping(min_delta = 0.001, patience = 3, restore_best_weights = True)]
)

Epoch 1/20




Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20


<keras.callbacks.History at 0x1489c0ca0>
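
Training ends well before the 20 requested epochs because of the `EarlyStopping` callback. The following is a simplified sketch of the stopping rule, assuming the callback monitors validation loss (its default); the function `simulate_early_stopping` and the loss values are hypothetical and are not taken from this training run.

```python
def simulate_early_stopping(val_losses, min_delta=0.001, patience=3):
    """Return (epoch at which training stops, epoch whose weights are kept)."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses, start=1):
        # An epoch only counts as an improvement if it beats the best
        # loss so far by more than min_delta
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch

    # Never ran out of patience: training uses every epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses, one per epoch
hypothetical_losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.30]
stopped_at, restored_from = simulate_early_stopping(hypothetical_losses)
print(stopped_at, restored_from)  # 6 3
```

With `restore_best_weights = True`, the model ends up with the weights from the best epoch rather than the last one, so the extra epochs spent waiting do not degrade the final model.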

In [5]:
# Check model performance
preds = (model.predict(x_test) >= 0.5).astype(int)
print('Model Performance on Test Data:')
print('\n')
print(confusion_matrix(y_test, preds))
print(classification_report(y_test, preds))

# Save the model
model.save('SentimentClassifier.h5')

Model Performance on Test Data:


[[10750  1750]
 [ 2239 10261]]
              precision    recall  f1-score   support

           0       0.83      0.86      0.84     12500
           1       0.85      0.82      0.84     12500

    accuracy                           0.84     25000
   macro avg       0.84      0.84      0.84     25000
weighted avg       0.84      0.84      0.84     25000
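
The figures in the classification report can be reproduced from the confusion matrix printed above, whose rows are the true labels and whose columns are the predicted labels. The snippet below is a small worked check using only those four numbers.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
confusion = [[10750, 1750],
             [2239, 10261]]

(true_0_pred_0, true_0_pred_1), (true_1_pred_0, true_1_pred_1) = confusion

# Precision: of everything predicted as a class, how much was correct
precision_0 = true_0_pred_0 / (true_0_pred_0 + true_1_pred_0)
precision_1 = true_1_pred_1 / (true_1_pred_1 + true_0_pred_1)

# Recall: of everything truly in a class, how much was found
recall_0 = true_0_pred_0 / (true_0_pred_0 + true_0_pred_1)
recall_1 = true_1_pred_1 / (true_1_pred_1 + true_1_pred_0)

# Accuracy: correct predictions over all predictions
total = sum(sum(row) for row in confusion)
accuracy = (true_0_pred_0 + true_1_pred_1) / total

print(round(precision_0, 2), round(recall_0, 2))  # 0.83 0.86
print(round(precision_1, 2), round(recall_1, 2))  # 0.85 0.82
print(round(accuracy, 2))                         # 0.84
```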



## Package the Model

Now that the model has been created, we can package the model into a single `.air` file that enables integration into the browser.

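To perform this packaging, we will use the classes in the `aisquared.config` subpackage: a harvester, preprocessor, analytic, postprocessor, renderer, and feedback step are defined and then combined into a single `ModelConfiguration`.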

In [6]:
harvester = aisquared.config.harvesting.TextHarvester()

preprocessor = aisquared.config.preprocessing.text.TextPreprocessor(
    [
        aisquared.config.preprocessing.text.Tokenize(),
        aisquared.config.preprocessing.text.ConvertToCase(lowercase = True),
        aisquared.config.preprocessing.text.RemoveCharacters(),
        aisquared.config.preprocessing.text.ConvertToVocabulary(vocabulary = vocab, max_vocab = 9999),
        aisquared.config.preprocessing.text.PadSequences(length = 512, pad_location = 'post', truncate_location = 'post')
    ]
)
analytic = aisquared.config.analytic.LocalModel('SentimentClassifier.h5', 'text')

# NOTE: we might be missing an analogue to "sequence_length" in the .config package

postprocessor = aisquared.config.postprocessing.BinaryClassification(['positive', 'negative'], 0.5)

renderer = aisquared.config.rendering.DocumentRendering(include_probability = True)

feedback = aisquared.config.feedback.BinaryFeedback(['positive', 'negative'])

# Combine all of the steps into one configuration and compile it into the `.air` file
aisquared.config.ModelConfiguration(
    'SentimentClassifier',
    harvester,
    preprocessor,
    analytic,
    postprocessor,
    renderer,
    feedback).compile(dtype = 'float16')
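
The preprocessing steps configured above are meant to turn raw page text into the same kind of fixed-length integer sequence the model was trained on. The sketch below mimics that chain in plain Python to show the intended transformation. It is not the `aisquared` implementation: the function `encode_review`, the toy vocabulary, and the handling of unknown or rare words (mapped to index 2, matching `oov_char = 2` used when loading the data) are assumptions, and start-of-sequence handling is left out because the source does not show it.

```python
import re


def encode_review(text, vocabulary, length=512, max_vocab=9999,
                  oov_index=2, pad_value=0):
    """Approximate the tokenize / lowercase / clean / vocab / pad chain."""
    # Lowercase the text and keep only word-like tokens (drops punctuation)
    tokens = re.findall(r"[a-z0-9']+", text.lower())

    # Map each token to its integer index; unknown words and words whose
    # index exceeds max_vocab are assumed to map to the OOV index
    indices = []
    for token in tokens:
        index = vocabulary.get(token, oov_index)
        indices.append(index if index <= max_vocab else oov_index)

    # Truncate and pad at the end ('post'), as in the training data
    indices = indices[:length]
    return indices + [pad_value] * (length - len(indices))


# Hypothetical vocabulary; the real one comes from `vocab` in the notebook
toy_vocab = {"the": 3, "movie": 19, "was": 15, "great": 86, "zyzzyva": 50000}

encoded = encode_review("The movie was GREAT, truly zyzzyva!", toy_vocab, length=8)
print(encoded)  # [3, 19, 15, 86, 2, 2, 0, 0]
```

Whatever the exact implementation, the browser-side preprocessing needs to agree with the training-time encoding (vocabulary indices, sequence length, and padding side), otherwise the model receives inputs it was never trained on.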