# Sentiment Analysis in the Browser

In this notebook, we show how to create a `.air` file that performs sentiment analysis in the browser using a neural network. We use the IMDB Movie Reviews dataset to build the initial model, prune the model using the `mann` package, and then package it using the `aisquared` Python SDK.

## Dependencies

For this notebook, the following dependencies are required:

- `mann`
- `aisquared`

Both are available on [PyPI](https://pypi.org) via `pip`. The following cell installs these dependencies and imports them into the notebook environment, along with TensorFlow (a dependency of the `mann` package) and the scikit-learn metrics used later to evaluate the model.

In [1]:
! pip install mann
! pip install aisquared

from sklearn.metrics import classification_report, confusion_matrix
import tensorflow as tf
import aisquared
import mann



## Model Creation

Now that the required packages have been installed and imported, we can create the sentiment analysis model. This involves four steps: downloading and preprocessing the data, creating the model, pruning the model so that it performs well in the browser, and packaging the model in the `.air` format. The following cells walk through each step.

In [2]:
# Loading the data

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
    num_words = 10000,
    skip_top = 0,
    start_char = 1,
    oov_char = 2,
    index_from = 3
)
x_train = tf.keras.preprocessing.sequence.pad_sequences(
    x_train,
    maxlen = 512,
    padding = 'post',
    truncating = 'post'
)
x_test = tf.keras.preprocessing.sequence.pad_sequences(
    x_test,
    maxlen = 512,
    padding = 'post',
    truncating = 'post'
)

# Get the vocabulary
vocab = tf.keras.datasets.imdb.get_word_index()

# Add 2 to each vocab value to ensure matching with the needed values
vocab = {
    k : v + 2 for k, v in vocab.items()
}
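
To make the preprocessing concrete, the sketch below mimics in plain Python what `pad_sequences` does with `padding = 'post'` and `truncating = 'post'`: sequences longer than `maxlen` lose their trailing tokens, and shorter ones are filled with zeros at the end. The helper name `pad_post` is hypothetical and only illustrates the behavior; the notebook itself relies on the TensorFlow function.

```python
def pad_post(sequence, maxlen, value=0):
    """Truncate from the end, then pad at the end, to exactly `maxlen` items."""
    truncated = list(sequence[:maxlen])             # truncating = 'post' drops trailing tokens
    padding = [value] * (maxlen - len(truncated))   # padding = 'post' appends filler values
    return truncated + padding


short_review = [1, 14, 22, 16]
long_review = list(range(1, 11))

print(pad_post(short_review, 6))  # [1, 14, 22, 16, 0, 0]
print(pad_post(long_review, 6))   # [1, 2, 3, 4, 5, 6]
```

With `maxlen = 512`, every review therefore becomes a fixed-length vector of 512 integers, which is what the model's input layer expects.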


In [3]:
# Create the model

input_layer = tf.keras.layers.Input(x_train.shape[1:])
embedding_layer = tf.keras.layers.Embedding(
    10000,
    4
)(input_layer)
x = tf.keras.layers.Flatten()(embedding_layer)
for _ in range(5):
    x = mann.layers.MaskedDense(1000, activation = 'relu')(x)
output_layer = mann.layers.MaskedDense(1, activation = 'sigmoid')(x)

model = tf.keras.models.Model(input_layer, output_layer)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.summary()

Metal device set to: Apple M1

systemMemory: 16.00 GB
maxCacheSize: 5.33 GB

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 512)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 512, 4)            40000     
_________________________________________________________________
flatten (Flatten)            (None, 2048)              0         
_________________________________________________________________
masked_dense (MaskedDense)   (None, 1000)              4098000   
_________________________________________________________________
masked_dense_1 (MaskedDense) (None, 1000)              2002000   
_________________________________________________________________
masked_dense_2 (MaskedDense) (None, 1000)              2002000   
__________________________________________________

2022-03-03 12:10:39.289509: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-03-03 12:10:39.289644: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
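
As a sanity check on the summary above, the parameter counts can be reproduced by hand. The arithmetic below assumes, based only on the numbers in the summary and not on the `mann` documentation, that each `MaskedDense` layer stores a mask value alongside every weight and bias, which doubles the count of an ordinary dense layer. The helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of an ordinary fully connected layer."""
    return n_inputs * n_units + n_units


def masked_dense_params(n_inputs, n_units):
    """Assumption: one mask entry per weight and bias, doubling the count."""
    return 2 * dense_params(n_inputs, n_units)


embedding_params = 10000 * 4   # vocabulary size x embedding dimension
flattened_size = 512 * 4       # sequence length x embedding dimension

print(embedding_params)                           # 40000
print(flattened_size)                             # 2048
print(masked_dense_params(flattened_size, 1000))  # 4098000
print(masked_dense_params(1000, 1000))            # 2002000
```

These values match the `embedding`, `flatten`, `masked_dense`, and `masked_dense_1` rows of the summary.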


In [4]:
# Prune the model and train it
model = mann.utils.mask_model(
    model,
    50,
    x = x_train[:1000],
    y = y_train[:1000].reshape(-1, 1)
)
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# Create a pruning callback that will increase pruning rate as performance improves
callback = mann.utils.ActiveSparsification(
    performance_cutoff = 0.8,
    starting_sparsification = 50
)

# Train the model with the sparsification callback
model.fit(
    x_train,
    y_train.reshape(-1, 1),
    epochs = 1000,
    batch_size = 512,
    validation_split = 0.2,
    verbose = 2,
    callbacks = [callback]
)

# Now that the model has been trained, convert all model layers to built-in TensorFlow layers
model = mann.utils.remove_layer_masks(model)
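
For intuition about what a sparsification level means, the sketch below zeroes out a given percentage of a weight list by magnitude and then applies a simple rule that raises the level only while validation accuracy stays at or above a cutoff. This is a generic illustration and not a description of how `mann` works internally: the function names, the magnitude criterion, and the `step` and `maximum` parameters are assumptions made for the example.

```python
def prune_by_magnitude(weights, percent):
    """Zero out roughly `percent` percent of the weights, smallest magnitudes first."""
    n_to_prune = int(len(weights) * percent / 100)
    # Indices ordered from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:n_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


def next_sparsification(current, val_accuracy, cutoff=0.8, step=5, maximum=99):
    """Hypothetical schedule: prune further only if performance meets the cutoff."""
    if val_accuracy >= cutoff:
        return min(current + step, maximum)
    return current


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
print(prune_by_magnitude(weights, 50))  # [0.9, 0.0, 0.3, -0.7, 0.0, 0.0]
print(next_sparsification(50, 0.85))    # 55
print(next_sparsification(50, 0.75))    # 50
```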

2022-03-03 12:10:50.868761: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2022-03-03 12:10:50.868933: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


Epoch 1/1000


InvalidArgumentError: Cannot assign a device for operation model/embedding/embedding_lookup: Could not satisfy explicit device specification '/job:localhost/replica:0/task:0/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
AssignSubVariableOp: GPU CPU 
RealDiv: GPU CPU 
Sqrt: GPU CPU 
AssignVariableOp: GPU CPU 
UnsortedSegmentSum: GPU CPU 
Identity: GPU CPU 
StridedSlice: CPU 
Const: GPU CPU 
NoOp: GPU CPU 
Mul: GPU CPU 
Shape: GPU CPU 
_Arg: GPU CPU 
ResourceScatterAdd: GPU CPU 
Unique: CPU 
ReadVariableOp: GPU CPU 
AddV2: GPU CPU 
ResourceGather: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  model_embedding_embedding_lookup_65233 (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  adam_adam_update_readvariableop_2_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  model/embedding/embedding_lookup (ResourceGather) /job:localhost/replica:0/task:0/device:GPU:0
  model/embedding/embedding_lookup/Identity (Identity) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/Unique (Unique) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/Shape (Shape) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice/stack (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice/stack_1 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice/stack_2 (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/strided_slice (StridedSlice) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/UnsortedSegmentSum (UnsortedSegmentSum) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/mul (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp (ReadVariableOp) 
  Adam/Adam/update/mul_1 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/AssignVariableOp (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ResourceScatterAdd (ResourceScatterAdd) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp_1 (ReadVariableOp) 
  Adam/Adam/update/mul_2 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/mul_3 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp_2 (ReadVariableOp) 
  Adam/Adam/update/mul_4 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/AssignVariableOp_1 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ResourceScatterAdd_1 (ResourceScatterAdd) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/ReadVariableOp_3 (ReadVariableOp) 
  Adam/Adam/update/Sqrt (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/mul_5 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/add (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/truediv (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/AssignSubVariableOp (AssignSubVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/group_deps/NoOp (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/group_deps/NoOp_1 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Adam/Adam/update/group_deps (NoOp) /job:localhost/replica:0/task:0/device:GPU:0

Op: ResourceGather
Node attrs: Tindices=DT_INT32, dtype=DT_FLOAT, batch_dims=0, validate_indices=true, _class=["loc:@model/embedding/embedding_lookup/65233"]
Registered kernels:
  device='XLA_CPU_JIT'; Tindices in [DT_INT32, DT_INT64]; dtype in [DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, DT_INT8, DT_COMPLEX64, DT_INT64, DT_BOOL, DT_QINT8, DT_QUINT8, DT_QINT32, DT_BFLOAT16, DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]
  device='GPU'; dtype in [DT_INT64]; Tindices in [DT_INT32]
  device='GPU'; dtype in [DT_INT64]; Tindices in [DT_INT64]
  device='GPU'; dtype in [DT_FLOAT]; Tindices in [DT_INT32]
  device='GPU'; dtype in [DT_FLOAT]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_UINT64]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_UINT64]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_INT64]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_INT64]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_UINT32]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_UINT32]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_UINT16]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_UINT16]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_INT16]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_INT16]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_UINT8]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_UINT8]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_INT8]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_INT8]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_INT32]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_INT32]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_HALF]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_HALF]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_BFLOAT16]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_BFLOAT16]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_FLOAT]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_FLOAT]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_DOUBLE]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_DOUBLE]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_COMPLEX64]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_COMPLEX64]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_COMPLEX128]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_COMPLEX128]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_BOOL]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_BOOL]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_STRING]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_STRING]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_RESOURCE]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_RESOURCE]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_VARIANT]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_VARIANT]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_QINT8]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_QINT8]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_QUINT8]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_QUINT8]; Tindices in [DT_INT64]
  device='CPU'; dtype in [DT_QINT32]; Tindices in [DT_INT32]
  device='CPU'; dtype in [DT_QINT32]; Tindices in [DT_INT64]

	 [[{{node model/embedding/embedding_lookup}}]] [Op:__inference_train_function_65668]
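
The error above reports that the embedding lookup could not be placed on the GPU device. One commonly used workaround, offered here as an untested suggestion for this notebook, is to hide GPU devices from TensorFlow before any model is built so that all operations run on the CPU. The helper name `hide_gpus` is hypothetical; `tf.config.set_visible_devices` is a TensorFlow call that must run before the runtime initializes its devices.

```python
def hide_gpus():
    """Hide all GPUs from TensorFlow so operations fall back to the CPU.

    Returns True if the setting was applied, False otherwise.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment
        return False
    try:
        tf.config.set_visible_devices([], 'GPU')
    except RuntimeError:
        # Devices were already initialized; restart the kernel and call this first
        return False
    return True


applied = hide_gpus()
print('GPUs hidden:', applied)
```

If this approach is used, it would go in the first cell, right after the imports, followed by rerunning the model creation and training cells.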

In [None]:
# Check model performance
preds = (model.predict(x_test) >= 0.5).astype(int)
print('Model Performance on Test Data:')
print('\n')
print(confusion_matrix(y_test, preds))
print(classification_report(y_test, preds))

# Save the model
model.save('SentimentClassifier.h5')
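
The evaluation cell thresholds the sigmoid outputs at 0.5 and tabulates the results. The plain-Python sketch below shows the same idea on made-up numbers, using the layout of scikit-learn's `confusion_matrix` (rows are true labels, columns are predicted labels). The helper names and the sample values are illustrative only and are not results from this model.

```python
def threshold(probabilities, cutoff=0.5):
    """Convert probabilities to 0/1 labels, counting values at the cutoff as 1."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def binary_confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels."""
    matrix = [[0, 0], [0, 0]]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


probs = [0.91, 0.40, 0.55, 0.08, 0.50]   # made-up model outputs
y_true = [1, 1, 0, 0, 1]                 # made-up ground truth
y_pred = threshold(probs)

print(y_pred)                                   # [1, 0, 1, 0, 1]
print(binary_confusion_matrix(y_true, y_pred))  # [[1, 1], [1, 2]]
```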

## Package the Model

Now that the model has been created, we can package the model into a single `.air` file that enables integration into the browser.

To perform this packaging, we use the `DocumentPredictor` class from the `aisquared` package.

In [None]:
aisquared.base.DocumentPredictor(
    model_path = 'SentimentClassifier.h5',
    vocabulary = vocab,
    sequence_length = 512,
    name = 'SentimentClassifier',
    label_map = ['Negative', 'Positive', 'Inconclusive'],
    include_probability = True,
    remove_punctuation = False,
    max_vocab = 9999
).compile()