# Multi-Domain: Video Model

This notebook shows the model and training process for the video side of the multi-domain model.

In [1]:
!apt-get install -y xxd

Reading package lists... Done
Building dependency tree       
Reading state information... Done
xxd is already the newest version (2:8.0.1453-1ubuntu1.4).
0 upgraded, 0 newly installed, 0 to remove and 15 not upgraded.


In [2]:
import random

import tensorflow as tf
import numpy as np
import PIL.Image  # import the submodule explicitly so PIL.Image.fromarray is available

In [3]:
from google.colab import drive 
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


## Data

The data has been preprocessed into NumPy arrays and saved as a single `.npz` file to save space on Google Drive. To do this yourself, you can use `PIL` or `cv2` to load the images and convert them to NumPy arrays.
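A minimal sketch of that preprocessing step with `PIL`. It assumes a hypothetical folder layout `root/<label>/<image file>`, where each sub-folder name (for example `happy` or `angry`) is used as the label; the helper `load_folder` is illustrative and not part of this project.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_folder(root, size=(224, 224)):
    """Load images from a hypothetical layout root/<label>/<image file>.

    Returns (images, labels): a uint8 array of shape (N, H, W, 3) and an
    array of string labels taken from the sub-folder names.
    """
    images, labels = [], []
    for label_dir in sorted(Path(root).iterdir()):
        if not label_dir.is_dir():
            continue  # skip stray files next to the label folders
        for img_path in sorted(label_dir.iterdir()):
            with Image.open(img_path) as img:
                # Force 3 colour channels and a fixed size (PIL takes (width, height))
                rgb = img.convert("RGB").resize(size)
                images.append(np.array(rgb, dtype=np.uint8))
            labels.append(label_dir.name)
    return np.stack(images), np.array(labels)
```

The arrays could then be saved with the keys loaded below, e.g. `np.savez_compressed("video_dataset.npz", x_train=x_train, y_train=y_train, x_val=x_val, y_val=y_val)`. String labels are stored as a Unicode array, so `np.load` reads them back without needing `allow_pickle`.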

In [4]:
!cp gdrive/MyDrive/multi_domain/video_dataset.npz .

In [5]:
dataset = np.load("video_dataset.npz")

x_train = dataset['x_train']
y_train = dataset['y_train']
x_val = dataset['x_val']
y_val = dataset['y_val']

In [6]:
x_train.shape

(454, 224, 224, 3)

I had originally planned for a `(224, 224, 3)` image model. However, the Pico has so little memory that this would require too much of it (even when represented with the `int8` dtype).

Instead, the images are resized with `PIL` to `28x28` and then converted to greyscale. This results in arrays of shape `(28, 28, 1)`.
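A quick back-of-the-envelope calculation shows the difference between the two input sizes. This sketch counts only a single input image; the model's weights and intermediate activations need additional memory on top of it.

```python
def tensor_bytes(shape, bytes_per_element):
    """Return the number of bytes needed to store a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


for shape in [(224, 224, 3), (28, 28, 1)]:
    int8_size = tensor_bytes(shape, 1)   # int8: 1 byte per value
    float_size = tensor_bytes(shape, 4)  # float32: 4 bytes per value
    print(shape, f"int8: {int8_size} bytes", f"float32: {float_size} bytes")
```

With `int8`, one `(224, 224, 3)` image takes 150,528 bytes while a `(28, 28, 1)` image takes 784 bytes, a factor of 192 smaller.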


In [7]:
def preprocess(images):
  """Resize each RGB image to 28x28, convert it to greyscale and add a channel axis."""
  processed = []
  for img in images:
    #convert the array into a PIL Image
    image = PIL.Image.fromarray(img)
    #resize -> 28x28, still with 3 colour channels
    image = image.resize((28,28))
    #convert to greyscale ('L' mode) -> a single channel
    image = image.convert('L')
    #back to numpy: this gives an array of shape (28, 28)
    image = np.array(image)
    #add the channel axis Keras expects -> (28, 28, 1)
    processed.append(image.reshape((28,28,1)))
  return np.array(processed)

#apply the same preprocessing to the training and validation sets
x_train = preprocess(x_train)
x_val = preprocess(x_val)

#finally, scale the pixel values from 0-255 to the range 0-1
x_train = x_train / 255
x_val = x_val / 255

We now need to convert the labels into integers that represent the classes. The two classes are "happy" and "angry". For a problem this simple, a list comprehension is enough to map "happy" to 1 and "angry" to 0.

- Happy: 1
- Angry: 0

In [8]:
y_train = np.array([1 if y == 'happy' else 0 for y in y_train])
y_val = np.array([1 if y == 'happy' else 0 for y in y_val])
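The list comprehension above maps any label other than `'happy'` to 0, so a typo or an unexpected class would silently become "angry". A stricter alternative, sketched here with a hypothetical `LABEL_TO_INT` mapping, fails loudly instead:

```python
# Explicit label -> class id mapping; unknown labels raise a KeyError.
LABEL_TO_INT = {"angry": 0, "happy": 1}


def encode_labels(labels):
    """Map string labels to integer class ids, failing loudly on unknown labels."""
    return [LABEL_TO_INT[label] for label in labels]


print(encode_labels(["happy", "angry", "happy"]))  # [1, 0, 1]
```

In the notebook this would be used as `y_train = np.array(encode_labels(y_train))`.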

The last step for the data is to shuffle the training set. This helps against overfitting because batches no longer contain clusters of the same input/output for the model to overfit to.

Because this is a small dataset, this can be done simply with the `random` module from the Python standard library.

In [9]:
#create a list of tuples
c = list(zip(x_train, y_train))
#shuffle the tuples
random.shuffle(c)
#return back to x_train and y_train
x_train, y_train = zip(*c)

In [10]:
x_train = np.array(x_train)
x_val = np.array(x_val)

y_train = np.array(y_train)
y_val = np.array(y_val)

#add a trailing axis so the labels have shape (N, 1), like the model output
y_train = y_train.reshape(-1, 1)
y_val = y_val.reshape(-1, 1)
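Once the data is in NumPy arrays, an alternative to the zip/shuffle/unzip round trip is to draw one random permutation of the indices and apply it to both arrays, which keeps every image paired with its label. A minimal sketch (the helper name `shuffle_together` is hypothetical). Note also that `model.fit(..., shuffle=True)`, as used in the training cells, already reshuffles the training data before each epoch.

```python
import numpy as np


def shuffle_together(x, y, seed=None):
    """Shuffle two arrays with the same random permutation so pairs stay aligned."""
    assert len(x) == len(y), "x and y must have the same number of samples"
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))  # a random ordering of 0..n-1
    return x[order], y[order]


# Toy data: each label equals the sum of its "image", so pairing can be checked.
x_demo = np.arange(12).reshape(6, 2)
y_demo = x_demo.sum(axis=1).reshape(-1, 1)
x_shuf, y_shuf = shuffle_together(x_demo, y_demo, seed=0)
print(np.array_equal(x_shuf.sum(axis=1).reshape(-1, 1), y_shuf))  # True
```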

## Modelling

The next step is to create the neural network that fits the images. Because this is a simple problem (binary classification), the model is built from scratch. However, TensorFlow and Keras provide many pretrained models that can be adapted to your problem, and pre-quantized models can be found on [TensorFlow Hub](https://tfhub.dev/s?q=quantized).

When creating a model for a microcontroller you need to think more carefully about model selection. A few important questions:
- Are the layers supported by TensorFlow Lite for Microcontrollers?
- Is the model too big?
- Is there a more efficient architecture?

For example, on a laptop you might create a really simple dense network by flattening the image. On a microcontroller this results in far too many weights and wastes precious memory, because the first dense layer needs a separate weight for every input pixel for each of its neurons.
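To put numbers on this: a Dense layer has one weight per input value per unit plus one bias per unit, whereas a Conv2D layer's parameter count depends only on its kernel size and channel counts, not on the image size. The helpers below are illustrative, and the 128-unit dense layer is a hypothetical example rather than part of this notebook's model.

```python
def dense_params(inputs, units):
    """Weights plus biases for a fully connected (Dense) layer."""
    return inputs * units + units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases for a Conv2D layer; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


# Hypothetical 128-unit Dense layer on a flattened 224x224x3 image
print(dense_params(224 * 224 * 3, 128))  # 19267712
# The same layer on a flattened 28x28x1 image
print(dense_params(28 * 28 * 1, 128))    # 100480
# First Conv2D layer of the model below: 32 filters of size 2x2 on 1 channel
print(conv2d_params(2, 2, 1, 32))        # 160
```

Even at one byte per weight after `int8` quantization, the first dense network would need roughly 19 MB for a single layer.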

The model in this example is a simple feed-forward convolutional network. Its final dense layer uses a sigmoid activation to squash the output into the range 0 to 1, matching the 0/1 labels.

In [11]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size = (2,2), activation="relu", input_shape = (28,28,1)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPool2D(pool_size=(2,2)),
    tf.keras.layers.Conv2D(32, kernel_size= (2,2), activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPool2D(pool_size=(2,2)),
    tf.keras.layers.Conv2D(64, kernel_size = (2,2), activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.MaxPool2D(pool_size=(2,2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation = "sigmoid")
    ]
)

In [12]:
model.build((None, 28,28, 1))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 27, 27, 32)        160       
_________________________________________________________________
batch_normalization (BatchNo (None, 27, 27, 32)        128       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 12, 12, 32)        4128      
_________________________________________________________________
batch_normalization_1 (Batch (None, 12, 12, 32)        128       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 5, 5, 64)          8
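The shapes in the summary above follow from simple arithmetic: with stride 1 and no padding, a 2x2 convolution shrinks each spatial dimension by 1, and 2x2 max pooling halves it (rounding down). Each BatchNormalization layer stores four values per channel (gamma, beta, moving mean and moving variance), of which only gamma and beta are trainable. A small sketch that reproduces the rows shown:

```python
def conv_valid_out(size, kernel):
    """Spatial output size of a stride-1 convolution with 'valid' (no) padding."""
    return size - kernel + 1


def pool_out(size, pool):
    """Spatial output size of 'valid' max pooling whose stride equals the pool size."""
    return size // pool


def batchnorm_params(channels):
    """Gamma, beta, moving mean and moving variance: four values per channel."""
    return 4 * channels


# Trace the spatial size through the layers listed in the summary above.
size = 28
trace = []
for layer in ["conv2d", "max_pooling2d", "conv2d_1", "max_pooling2d_1", "conv2d_2"]:
    if layer.startswith("conv"):
        size = conv_valid_out(size, 2)
    else:
        size = pool_out(size, 2)
    trace.append((layer, size))

print(trace)
print(batchnorm_params(32))  # 128, as in the batch_normalization rows
```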

## Training

The model is trained with a simple two-stage schedule. The loss function is `binary_crossentropy`, which is commonly used for binary classification problems, and the optimizer is Adam.
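For a label y in {0, 1} and a predicted probability p, binary cross-entropy is -(y·ln p + (1 - y)·ln(1 - p)), averaged over the batch. It is small when the sigmoid output is close to the label and grows quickly for confident wrong predictions. A plain-Python sketch of the formula (the function here is illustrative; in the notebook Keras computes the loss):

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of 0/1 labels and sigmoid outputs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0); Keras clips in a similar way
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.9, 0.1]))  # ~0.105, i.e. -ln(0.9)
```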

We first train for 3 epochs with a large learning rate, which lets the weights jump around a lot at the start to avoid falling into a local minimum. The model is then trained for 10 more epochs with a much smaller learning rate to get closer to the minimum without "jumping" out of it.
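The same two-stage schedule can also be written as a single function of the epoch number. This is a hypothetical helper (`two_stage_lr` and its parameter names are not from the notebook) that uses the learning rates and epoch counts given above:

```python
def two_stage_lr(epoch, lr=None, warm_epochs=3, high=0.1, low=0.0001):
    """Return the learning rate for a given (0-based) epoch.

    A high rate for the first `warm_epochs` epochs, then a much lower one.
    The `lr` argument (the current rate) is accepted so the function matches
    the (epoch, lr) call signature Keras' LearningRateScheduler uses, but it
    is ignored here.
    """
    return high if epoch < warm_epochs else low


print([two_stage_lr(e) for e in range(13)])
```

It could be passed as `callbacks=[tf.keras.callbacks.LearningRateScheduler(two_stage_lr)]` to one `model.fit(..., epochs=13)` call. One difference from the notebook: a single compile keeps Adam's internal state across the switch, whereas the second `compile` call below starts a fresh Adam optimizer.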

In [13]:
model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(0.1), metrics=['accuracy'])

In [14]:
history = model.fit(x_train, y_train, epochs=3, batch_size=8, validation_data=(x_val, y_val), shuffle=True)

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [15]:
model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(0.0001), metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=10, batch_size=8, validation_data=(x_val, y_val), shuffle=True)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Post-Training Quantization

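Before deployment, the model is quantized to keep it as small as possible. Here we use full-integer (int8) post-training quantization: weights and activations are stored as 8-bit integers instead of 32-bit floats, which makes the model roughly four times smaller. A representative dataset (a sample of the training inputs) is needed so the converter can calibrate the range of each activation.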
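Under int8 quantization, TensorFlow Lite maps real values to integers with an affine scheme defined by a `scale` and a `zero_point` (per tensor, or per channel for some weights): q = round(x / scale) + zero_point, and x ≈ (q - zero_point) · scale. The converter derives these parameters from the ranges it observes when running the `representative_dataset` samples. A minimal sketch of the mapping, using illustrative parameters for inputs in [0, 1] rather than the values the converter will actually choose:

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a real value to int8 with the affine scheme q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp (saturate) to the int8 range


def dequantize(q, scale, zero_point):
    """Approximately recover the real value: x ~= (q - zero_point) * scale."""
    return (q - zero_point) * scale


# Illustrative parameters: 256 int8 levels spread evenly across [0, 1].
scale, zero_point = 1 / 255, -128

for x in [0.0, 0.25, 1.0]:
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

Inside the calibrated range the round-trip error is at most half a quantization step (`scale / 2`); values outside it saturate at -128 or 127.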

In [16]:
def representative_dataset():
  for data in x_train:
    data = data.reshape(1, 28, 28, 1)
    yield [data.astype(np.float32)]

In [17]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 
converter.inference_output_type = tf.int8
quant_model = converter.convert()

INFO:tensorflow:Assets written to: /tmp/tmpm98l4ro3/assets


Once converted, the model can be saved as a `.tflite` file. For TensorFlow Lite for Microcontrollers, one final step is required: converting the model into a C source (`.cc`) file with `xxd -i`. This file contains the TensorFlow Lite model as a C `unsigned char` array, along with its length, so it can be compiled directly into the firmware.

In [18]:
with open("video_model_mobilenet.tflite", "wb") as f:
  f.write(quant_model)

In [19]:
from pathlib import Path

p = Path("video_model_mobilenet.tflite")

In [20]:
print(p.stat().st_size / 1024, "kb", (p.stat().st_size) / (1024*1024), "mb")

23.7578125 kb 0.02320098876953125 mb


In [21]:
!xxd -i "video_model_mobilenet.tflite" > "video_model_mobilenet.cc"
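If `xxd` is not available (for example on Windows), a few lines of Python can produce a similar C source file. This is a hypothetical helper, not part of any library: it mimics the layout of `xxd -i` (12 bytes per line, an `unsigned char` array plus an `unsigned int` length named after the file) but is not guaranteed to match its output byte for byte.

```python
import re


def c_identifier(filename: str) -> str:
    """Replace characters that are not valid in a C identifier with underscores."""
    return re.sub(r"[^0-9A-Za-z_]", "_", filename)


def to_c_array(data: bytes, var_name: str, per_line: int = 12) -> str:
    """Render bytes as C source in the style of `xxd -i`: a byte array plus its length."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"unsigned int {var_name}_len = {len(data)};\n"
    )


print(to_c_array(b"TFL3", c_identifier("video_model_mobilenet.tflite")))
```

In the notebook it could be used as `Path("video_model_mobilenet.cc").write_text(to_c_array(p.read_bytes(), c_identifier(p.name)))`, with `p` being the `Path` defined above.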

## Test Data

To test the model on the microcontroller, we can create some test data that can be fed to the model on the device to simulate a real input.

To do this, the float32 input data must be converted into int8 data using the input tensor's quantization parameters (scale and zero point), which are stored in the converted TensorFlow Lite model.

In [22]:
interpreter = tf.lite.Interpreter("video_model_mobilenet.tflite")
input_details = interpreter.get_input_details()

scale, zero = input_details[0]["quantization"]

In [23]:
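#quantize: q = round(x / scale) + zero_point, clipped to the int8 range
info = np.iinfo(input_details[0]['dtype'])
test_x_data = np.clip(np.round(x_val[0] / scale + zero), info.min, info.max).astype(input_details[0]['dtype'])
#write the raw int8 bytes (a binary file, despite the .txt extension)
test_x_data.tofile("x_video_test.txt")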

In [24]:
!xxd -i x_video_test.txt > x_video_test.cc