# **Task-Specific Autoencoder Enabled by TensorFlow Lite Micro: End-to-End Tutorial for Deploying to the nRF5340 and OSU MotionSenseHRV**





In this notebook we will show you how to deploy a TensorFlow Lite Micro autoencoder to a Cortex-M microcontroller such as the nRF5340 in order to transmit photoplethysmogram (PPG) IR signals to a mobile device for smooth heart rate (HR) prediction.

In the last few years, more powerful processors, combined with new techniques for reducing the size and cost of machine learning models (such as quantization), have brought AI all the way down to the Internet of Things (IoT) level: the level of microcontrollers and sensor systems. In other words, we can now deploy machine learning on tiny devices that require very little power, such as watches, security cameras, and wearable medical devices.

There are many benefits to being able to employ machine learning on IoT devices, and the possibilities are endless. In this tutorial, however, we will be taking a look at deploying a *task specific autoencoder*. Autoencoders are a type of neural network that compresses an input into a much smaller representation and then tries to recreate the original input from that representation. Training on this reconstruction task forces the model to learn which features of the signal are key and should be preserved, and which can be ignored. In our case, we will be using the autoencoder for feature-preserving compression. That is, we will encode data coming from a PPG sensor and send the encoded data over Bluetooth Low Energy (BLE) so that larger, deeper networks on the receiving device can analyze it further, for example for smooth heart rate (HR) prediction. BLE is very low-power and therefore cannot easily transmit large quantities of data such as a full PPG signal. The autoencoder compresses the data so that only the features needed for the task are kept, which makes it small enough to be sent over BLE.



Let us start working on the project itself. You will need:


1.   An OSU MotionSenseHRV chip
2.   Anaconda or Miniconda installed (See https://docs.anaconda.com/anaconda/install/)

After step 2, you'll have the conda package manager (for Python packages) available either through your terminal (Linux, macOS) or through Anaconda Prompt (Windows). Open it in the "tutorials" folder as admin/sudo and run the following commands. They install all Python dependencies in an environment named tf_MSHRV3_AEHR, activate it, and re-open this notebook from inside that environment.

```
conda env create -f environment.yml
conda activate tf_MSHRV3_AEHR
jupyter notebook AE_model_tutorial.ipynb
```

In [None]:
#Uncomment to download the data with an automatic Python script. Run this only once.

#from data import get_data
#get_data.main()

**Make sure parameters and sample data are in the correct folder**

We will start by getting the necessary model parameters and sample data in order to first test the model. To do this, run the get_data.py file in the repository, or uncomment and run the cell above. Then run the following cell to import the libraries and configure the paths:

In [None]:
# TensorFlow is an open source machine learning library
import tensorflow as tf

# Keras is TensorFlow's high-level API for deep learning
from tensorflow import keras
from tensorflow.keras import layers

# Numpy is a math library
import numpy as np
# Pandas is a data manipulation library 
import pandas as pd
# Scipy is a signal-processing library
from scipy.signal import detrend
# Matplotlib is a graphing library
import matplotlib.pyplot as plt
# Math is Python's math library
import math
import os
import glob

#import some custom utility functions
from utils import make_data_pipe, Rpeak2HR, sliding_window_fragmentation
#tf.config.set_visible_devices([], 'GPU')


path_prefix='./data/pre-training/'
weights_path_prefix='./data/post-training/model_weights' #for saved weights

val_files=[path_prefix+'2019092801_3154_clean.csv']
test_files=[path_prefix+'2019092820_5701_clean.csv']
win_len=8 #the window length, in seconds
step=1 #in n_samples
Fs_pks=100 #in Hz


#TODO: this code will not run here for obvious reasons. In the public repository, will these files be located in the directory?
#If so I will provide a paragraph to make sure it is in the right settings.


Now that we've configured our paths and imports, we start by loading the training data for the model. It is stored in the repository as CSV files, each containing a sample PPG signal recorded over time. We read each file with pd.read_csv() and then reformat and combine the columns using NumPy concatenation.
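As a minimal sketch of the loading pattern described above, the snippet below reads a header-less CSV with `pd.read_csv` and joins selected columns with `np.concatenate`. The in-memory CSV and its four-column layout are made up for illustration; the real recordings use the column indices shown in `get_train_data`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a recording: 3 rows, 4 unnamed columns.
csv_text = "0.1,0.2,0.3,1\n0.4,0.5,0.6,0\n0.7,0.8,0.9,1\n"

# header=None tells pandas that the first row is data, not column names.
df = pd.read_csv(io.StringIO(csv_text), header=None)
arr = df.values  # plain 2-D NumPy array, shape (3, 4)

# Slicing with a range (0:1) keeps the column axis, so the pieces stay 2-D
# and can be joined side by side along the last axis.
signals = np.concatenate([arr[:, 0:1], arr[:, 2:3]], axis=-1)

# The last column is a 0/1 peak marker in this toy layout.
peaks = arr[:, 3].astype(bool)

print(signals.shape, peaks)
```

Keeping every slice two-dimensional is what allows the tutorial's loader to concatenate several channels with `axis=-1` without reshaping.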

In [None]:
def get_train_data(path,val_files=[],test_files=[],
                   win_len=8,step=1,Fs_pks=100):

    def get_clean_ppg_and_ecg(files):
        list_clean_ppg=[];list_arr_pks=[]
        for i in range(len(files)):
            df=pd.read_csv(files[i],header=None)
            arr=df.values
            if 'clean' in files[i]:
                arr[:,41:45]=(detrend(arr[:,41:45].reshape(-1),0,'constant')
                                ).reshape((-1,4))
                list_clean_ppg+=[np.concatenate([arr[:,29:30],arr[:,41:45]],
                                    axis=-1),arr[:,30:31],arr[:,39:40],
                                arr[:,40:41]]
                list_arr_pks+=[arr[:,45:49].reshape(-1)]
        return list_clean_ppg,list_arr_pks
    files=glob.glob(path+'*.csv')
    #files=[fil for fil in files if 'WZ' in fil] #get wenxiao's data
    #separate val and test files
    s3=set(files);s4=set(val_files+test_files)
    files_2=list(s3.difference(s4))
    #files_2=[files_2[0]]
    #files_2=[fil for fil in files if not((val_names[0] in fil))]
    list_clean_ppg,list_arr_pks=get_clean_ppg_and_ecg(files_2)

    dsample_factr=4
    Fs_pks=int(Fs_pks/dsample_factr)
    win_len=win_len*Fs_pks

    list_r_pk_locs=[np.arange(len(arr_pks))[arr_pks.astype(bool)] for
                    arr_pks in list_arr_pks]

    #get nearest dsampled idx
    list_r_pk_locs_dsampled=[np.round(r_pk_locs/dsample_factr).astype(int) for
                             r_pk_locs in list_r_pk_locs]

    list_arr_pks_dsampled=[]
    for j in range(len(list_arr_pks)):
        arr_pks_dsampled=np.zeros([len(list_clean_ppg[4*j]),1])
        #check & correct for rare rounding up issue in the last element
        if list_r_pk_locs_dsampled[j][-1]==len(arr_pks_dsampled):
            list_r_pk_locs_dsampled[j][-1]-=1
        arr_pks_dsampled[list_r_pk_locs_dsampled[j]]=1
        list_arr_pks_dsampled.append(arr_pks_dsampled)
    #print([len(ppg) for ppg in list_arr_pks_dsampled])


    list_HR=[dsample_factr*[Rpeak2HR(arr_pks,win_len,step,Fs_pks)]
             for arr_pks in list_arr_pks_dsampled]
    list_HR=sum(list_HR,[])
    #list_HR=[HR[::dsample_factr] for HR in list_HR]

    return list_clean_ppg,list_HR



#input_list,output_list=[],[]
list_sigs,list_HR=get_train_data(path_prefix,val_files,test_files,win_len,
                                           step,Fs_pks)
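
The peak down-sampling step inside `get_train_data` can be hard to follow in place, so here is a small stand-alone sketch of the same idea. The peak positions are invented for illustration; only `dsample_factr=4` and the "pull the last index back" correction are taken from the cell above.

```python
import numpy as np

dsample_factr = 4  # same factor as in get_train_data

# Hypothetical peak train at the original rate: 1 marks an R-peak.
arr_pks = np.zeros(20, dtype=int)
arr_pks[[3, 9, 19]] = 1

# Indices of the peaks at the original rate.
r_pk_locs = np.arange(len(arr_pks))[arr_pks.astype(bool)]

# Nearest index at the down-sampled rate.
r_pk_locs_dsampled = np.round(r_pk_locs / dsample_factr).astype(int)

# The down-sampled signal has 20 / 4 = 5 samples, so valid indices are 0..4.
n_dsampled = len(arr_pks) // dsample_factr

# Rounding 19 / 4 = 4.75 gives 5, one past the end, so pull it back by one.
if r_pk_locs_dsampled[-1] == n_dsampled:
    r_pk_locs_dsampled[-1] -= 1

arr_pks_dsampled = np.zeros(n_dsampled, dtype=int)
arr_pks_dsampled[r_pk_locs_dsampled] = 1

print(r_pk_locs_dsampled, arr_pks_dsampled)
```

Without the correction, the last assignment would index one element past the end of the down-sampled array and raise an `IndexError`.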
    


**import training data from csv file**

**Convert and Visualize Data**

We will feed the data to the network using tensorflow's [tf.data](https://www.tensorflow.org/guide/data) pipeline which comes with lots of benefits (check out the link to learn more).
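The helper `make_data_pipe` comes from this repository's `utils` module and its implementation is not shown here. As a rough sketch of what such a helper could look like, the hypothetical `make_data_pipe_sketch` below builds a `tf.data` pipeline from an `[inputs, targets]` pair; it is an assumption for illustration, not the repository's actual code.

```python
import numpy as np
import tensorflow as tf

def make_data_pipe_sketch(data, batch_size=32, shuffle=True):
    """Hypothetical stand-in for utils.make_data_pipe; data is [inputs, targets]."""
    inputs, targets = data
    ds = tf.data.Dataset.from_tensor_slices((inputs, targets))
    if shuffle:
        # Shuffle over the whole array so every epoch sees a new order.
        ds = ds.shuffle(buffer_size=len(inputs))
    # Group samples into batches and prepare the next batch while training.
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Toy data: 100 windows of 400 samples, each paired with 200 target values.
x = np.random.rand(100, 400).astype(np.float32)
y = np.random.rand(100, 200).astype(np.float32)
toy_ds = make_data_pipe_sketch([x, y], batch_size=32, shuffle=True)

x_batch, y_batch = next(iter(toy_ds))
print(x_batch.shape, y_batch.shape)
```

A dataset built this way can be passed directly to `model.fit` and `model.evaluate`, which is how the training cells later in this notebook consume their pipelines.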

In [None]:
#Pre-process data
Fs_ppg=25 #Hz
sample_win_len,step_size=win_len*Fs_ppg,2*Fs_ppg
HR_win_len=sample_win_len
ppg_win_len=sample_win_len+HR_win_len
model_sigs_in,model_HR_out=[],[]
for j in range(len(list_HR)):
    #fragmenting our signal into PPG windows of 400 samples (16 s at 25 Hz) and HR windows of 200 samples
    ppg,HR=list_sigs[j][:,0:1],list_HR[j]
    ppg=sliding_window_fragmentation([ppg],ppg_win_len,step_size)
    HR=sliding_window_fragmentation([HR],HR_win_len,step_size)
    #chaining it all together
    model_sigs_in.append(ppg)
    model_HR_out.append(HR[:len(ppg)])

#concatenate the windows from all recordings along the first axis
model_sigs_in=np.concatenate(model_sigs_in,axis=0)
model_HR_out=np.concatenate(model_HR_out,axis=0)
model_in=model_sigs_in[:,:,0] 
model_out=model_HR_out[:,:,0]
print(model_in.shape,model_out.shape)

#Visualize our PPG signal
idx=1
plt.figure()
plt.subplot(211)
plt.title('A sample PPG and HR')
plt.plot(model_in[idx,:])
plt.ylabel('PPG')
plt.grid(True)
plt.subplot(212)
plt.plot(model_out[idx,:])
plt.ylabel('HR (BPS)')
plt.grid(True)
plt.xlabel('Sample No.')
#partition
val_perc=0.14
val_idx=int(val_perc*len(model_in))

#get our final formats and make the data pipe for our model
val_data_AE=[model_in[0:val_idx,:],model_in[0:val_idx]]
train_data_AE=[model_in[val_idx:,:],model_in[val_idx:]]
val_data_e2e=[model_in[0:val_idx,:],model_out[0:val_idx]]
train_data_e2e=[model_in[val_idx:,:],model_out[val_idx:]]

train_ds_AE=make_data_pipe(train_data_AE,batch_size=32,shuffle=True)
val_ds_AE=make_data_pipe(val_data_AE,batch_size=128,shuffle=False)
train_ds_e2e=make_data_pipe(train_data_e2e,batch_size=32,shuffle=True)
val_ds_e2e=make_data_pipe(val_data_e2e,batch_size=128,shuffle=False)
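
The windowing in the cell above relies on `sliding_window_fragmentation` from `utils`, which is not shown in this notebook. The sketch below uses a hypothetical helper, `sliding_windows`, to illustrate the window arithmetic with the same sizes (25 Hz, 400-sample PPG windows, 50-sample step); the real helper may differ in its interface and edge handling.

```python
import numpy as np

def sliding_windows(sig, win_len, step):
    """Hypothetical helper: split a (T, C) signal into overlapping windows.

    Returns an array of shape (n_windows, win_len, C).
    """
    if len(sig) < win_len:
        raise ValueError("signal is shorter than one window")
    n_windows = (len(sig) - win_len) // step + 1
    starts = np.arange(n_windows) * step
    return np.stack([sig[s:s + win_len] for s in starts], axis=0)

Fs_ppg = 25                       # Hz, as in the cell above
HR_win_len = 8 * Fs_ppg           # 200 samples
ppg_win_len = 2 * HR_win_len      # 400 samples
step_size = 2 * Fs_ppg            # 50 samples, a new window every 2 s

ppg = np.random.randn(60 * Fs_ppg, 1)   # one minute of fake single-channel PPG
windows = sliding_windows(ppg, ppg_win_len, step_size)
print(windows.shape)
```

For one minute of signal this gives (1500 - 400) // 50 + 1 = 23 windows, each overlapping the previous one by 350 samples.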

**Loading/Training the model**

Now that the data is loaded and pre-processed, we can build, train, and test our model.

We begin by constructing our machine learning model. Our target network is an autoencoder that encodes one window of the PPG signal (400 samples, or 16 seconds at 25 Hz with the settings above) into a single 1x16 latent representation (or "code"). The layers in between are responsible for the feature-selective compression. In the encoder, the input is first passed to a 256-unit dense layer with a rectified linear (ReLU) activation function. This is then shrunk to a 64-unit layer and finally to the 16-unit latent layer. The decoder mirrors this structure to reconstruct the signal.
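To get a feel for how small the encoder is, the following back-of-the-envelope sketch counts its parameters and the compression ratio. The input width of 400 is an assumption taken from the pre-processing cell (16 s of PPG at 25 Hz); the other widths come from the description above.

```python
# Layer widths of the encoder: 400 input samples -> 256 -> 64 -> 16.
encoder_widths = [400, 256, 64, 16]

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

encoder_params = sum(
    dense_params(n_in, n_out)
    for n_in, n_out in zip(encoder_widths[:-1], encoder_widths[1:])
)

compression_ratio = encoder_widths[0] / encoder_widths[-1]

print("encoder parameters:", encoder_params)
print("approx. size at 1 byte per weight: %.1f KiB" % (encoder_params / 1024))
print("values sent per window: %d instead of %d (%.0fx fewer)"
      % (encoder_widths[-1], encoder_widths[0], compression_ratio))
```

With these widths the encoder has 120,144 parameters, roughly 117 KiB at one byte per parameter, and each window is reduced from 400 values to 16 before transmission.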

In [None]:
def create_model_AE(in_shape,latent_shape=(16,),encoder=None,decoder=None):
    #We will be creating the AE_model here
    if encoder is None:
        encoder_in = layers.Input(shape=in_shape,name="encoder_in")
        x = layers.Dense(256,activation='relu')(encoder_in)
        x = layers.Dense(64,activation='relu')(x)
        encoder_out = layers.Dense(latent_shape[0],activation='relu')(x)
        encoder = keras.Model(encoder_in, encoder_out, name="encoder")

    #our encoder is what will run on the edge device. Since we also need to decompress,
    #or decode, the signal, we must also create a decoder and then combine the two in order to train them
    if decoder is None:
        decoder_in = layers.Input(shape=latent_shape,name="decoder_in")
        x = layers.Dense(64,activation='relu')(decoder_in)
        x = layers.Dense(256,activation='relu')(x)
        decoder_out = layers.Dense(in_shape[0])(x)
        decoder = keras.Model(decoder_in, decoder_out, name="decoder")
    
    AE_in = keras.Input(shape=in_shape, name="AE_in")
    z = encoder(AE_in)
    sig_hat = decoder(z)
    model_AE = keras.Model(AE_in, sig_hat, name="AE")
    return encoder,decoder,model_AE

#now we will use this function to make the AE model
encoder,decoder,model_AE = create_model_AE(model_in.shape[1:],
                                           latent_shape=(16,))

Now that we have our autoencoder, let's also connect our smooth heart rate prediction model to it so that the autoencoder can be optimized to be task specific. The HR model is appended to the decoder and finds the heart rate from the reconstructed PPG signal.

In [None]:
# Load an existing PPG to HR Neural Network model called model_HR for smooth Heart Rate Prediction from one channel PPG signal
def create_model_HR(in_shape,HR_win_len=200):
    expand_dims = layers.Lambda(lambda x: tf.expand_dims(x,axis=-1), 
                                name='expand_dims')
    #RNN model via. Functional API
    rnn = layers.GRU(64, return_sequences=True, return_state=True)
    sig_in = layers.Input(shape=in_shape)
    x = expand_dims(sig_in)
    _, final_state=rnn(x[:,:HR_win_len,:]) #warm-up RNN
    rnn_out, _ = rnn(x[:,HR_win_len:,:],initial_state=final_state)
    HR_hat=layers.Conv1D(filters=1,kernel_size=1, strides=1,padding='same',
                         activation=None,name='Conv_{}'.format(1))(rnn_out)
    HR_hat=layers.Flatten()(HR_hat)
    model = keras.Model(sig_in, HR_hat, name='model_HR')
    return model

#% Load HR model
model_HR = create_model_HR(model_in.shape[1:],HR_win_len)
weights_dir_HR=weights_path_prefix+'/sig2HR/checkpoints'
latest_ckpt = tf.train.latest_checkpoint(weights_dir_HR)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_HR.load_weights(latest_ckpt)

In [None]:
# We then stack the model_AE and model_HR together in a model called model_e2e (end2end).
def create_model_e2e(in_shape,encoder,decoder,model_HR):
    #Put the models together
    e2e_in = keras.Input(shape=in_shape, name="e2e_in")
    HR_hat=model_HR(decoder(encoder(e2e_in)))
    model_e2e = keras.Model(e2e_in, HR_hat, name="e2e")
    return model_e2e

# Make e2e model
model_e2e = create_model_e2e(model_in.shape[1:],encoder,decoder,model_HR)

Now, let's inspect our model and, finally, train it. The training process may take a while depending on your PC hardware.

In [None]:
#model_in.shape is the input format
#summary() prints the layer table itself and returns None, so no print() is needed
encoder.summary()
decoder.summary()
model_AE.summary()
model_HR.summary()
model_e2e.summary()

plot_model=False
file_path='./data/figures'
os.makedirs(file_path,exist_ok=True)

if plot_model:
    tf.keras.utils.plot_model(model_AE,to_file=file_path+'/AE.png', 
    dpi=200, show_shapes=True, show_layer_names=True, expand_nested=True)
    tf.keras.utils.plot_model(model_e2e,to_file=file_path+'/e2e.png', 
    dpi=200, show_shapes=True, show_layer_names=True, expand_nested=True)

In [None]:
#compiling the model prepares it for training
model_AE.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])
#we will also set a directory to save our trained weights once they are finished
weights_dir_AE=weights_path_prefix+'/AE/checkpoints'
os.makedirs(weights_dir_AE,exist_ok=True)
ckpt_filepath_AE=weights_dir_AE+'/cp.ckpt'
#callbacks
callbacks=[]
callbacks.append(tf.keras.callbacks.ModelCheckpoint(ckpt_filepath_AE,
                    save_weights_only=True,save_best_only=True,
                    monitor="val_loss",mode='min'))

# Train Simple AE for reconstruction
model_AE.fit(train_ds_AE,epochs=400,validation_data=val_ds_AE,callbacks=callbacks)

In [None]:
#Load best AE ckpt
latest_ckpt = tf.train.latest_checkpoint(weights_dir_AE)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_AE.load_weights(latest_ckpt)

# Train End2End model
model_e2e.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])
weights_dir_e2e=weights_path_prefix+'/AE/checkpoints_e2e'
os.makedirs(weights_dir_e2e,exist_ok=True)
ckpt_filepath_e2e=weights_dir_e2e+'/cp.ckpt'

#callbacks
callbacks=[]

callbacks.append(tf.keras.callbacks.ModelCheckpoint(ckpt_filepath_e2e,
                save_weights_only=True,save_best_only=True,
                monitor="val_loss",mode='min'))

# Train
model_e2e.fit(train_ds_e2e,epochs=100,validation_data=val_ds_e2e,
              callbacks=callbacks)

**Prepare for Quantization aware training**

Now that our model is trained, we will apply quantization aware training (QAT), which fine-tunes the network in preparation for conversion to a lighter format. Integer quantization is the process of converting the parameters (weights, biases, and activations) of the network to 8-bit integer values, which are both faster to compute with and smaller to store, and that is very helpful on microcontrollers. QAT is not the quantization process itself. Instead, it simulates how the already trained model would behave with 8-bit integers and fine-tunes the weights to minimize the loss function under that simulation. Note that invoking QAT does not actually convert any numbers; it only prepares the model.
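The "simulation" that QAT performs is often described as fake quantization: values are rounded onto an 8-bit grid during the forward pass but remain floating point. The hypothetical `fake_quantize` function below is a simplified NumPy illustration of that idea, not the code that `tensorflow_model_optimization` runs internally.

```python
import numpy as np

def fake_quantize(x, x_min, x_max, num_bits=8):
    """Round x onto a grid of 2**num_bits levels, but return floats.

    Simplified illustration only; the real QAT wrappers also learn the
    ranges and handle gradients.
    """
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    clipped = np.clip(x, x_min, x_max)
    return np.round((clipped - x_min) / scale) * scale + x_min

weights = np.random.uniform(-1.0, 1.0, size=1000).astype(np.float32)
weights_fq = fake_quantize(weights, -1.0, 1.0)

max_err = np.max(np.abs(weights - weights_fq))
print("distinct values:", len(np.unique(weights_fq)))
print("largest rounding error:", max_err)
```

Because the network sees this rounding error while it is still training, it can adjust its weights to compensate before the real integer conversion happens.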

In [None]:
import tensorflow_model_optimization as tfmopt

#Load best e2e ckpt
#latest_ckpt = tf.train.latest_checkpoint(weights_dir_e2e)
#print('Loading model from ckpt {}'.format(latest_ckpt))
#model_e2e.load_weights(latest_ckpt)

#performing generation of the quantization aware model of the encoder
quan_model_generator = tfmopt.quantization.keras.quantize_model
encoder_q = quan_model_generator(encoder) #quantization aware encoder
decoder_noq = keras.models.clone_model(decoder)
decoder_noq.set_weights(decoder.get_weights())
decoder_noq.trainable=False #Freeze decoder for now
    
# Make the AE model, for quantization of it later on
_,_,model_AE_q = create_model_AE(model_in.shape[1:],
latent_shape=(16,),encoder=encoder_q,decoder=decoder_noq)
# Make e2e model
model_e2e_q = create_model_e2e(model_in.shape[1:],encoder_q,decoder_noq,model_HR)


#model_in.shape is the input format; the encoder's output shape in the summary should be (None, 16)
encoder_q.summary()

if plot_model:
    tf.keras.utils.plot_model(model_e2e_q,to_file=file_path+'/e2e_q.png', 
    dpi=200, show_shapes=True, show_layer_names=True, expand_nested=True)

**Train QAT model**

The QAT version of our model is now configured and ready to be optimized for 8-bit inference through training. We train it in the same way as the original model.

In [None]:
model_e2e_q.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])
#we set the directory to our pretrained path
weights_dir_e2e_q = weights_path_prefix+'/AE/checkpoints_e2e_q'
os.makedirs(weights_dir_e2e_q,exist_ok=True)
ckpt_filepath_e2e_q=weights_dir_e2e_q+'/cp.ckpt'

callbacks=[]
callbacks.append(tf.keras.callbacks.ModelCheckpoint(ckpt_filepath_e2e_q,
                save_weights_only=True,save_best_only=True,
                monitor="val_loss",mode='min'))
# Train
model_e2e_q.fit(train_ds_e2e,epochs=100,validation_data=val_ds_e2e,callbacks=callbacks)


**Compare QAT results with normal model**

We should also make sure that most of our accuracy is preserved after quantization aware training. Here we check using only the validation set; repeating this verification with the test data is left as an exercise.

In [None]:
#Load the best e2e_q checkpoint saved during training
latest_ckpt = tf.train.latest_checkpoint(weights_dir_e2e_q)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_e2e_q.load_weights(latest_ckpt)

baseline_model_mse = model_e2e.evaluate(val_ds_e2e, verbose=1)
q_aware_model_mse = model_e2e_q.evaluate(val_ds_e2e, verbose=1)

print('Baseline val mse:', baseline_model_mse)
print('Quant val mse:', q_aware_model_mse)
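
To put the two printed results side by side, a relative change is often easier to read than two raw MSE values. The sketch below uses made-up numbers in the `[loss, mse]` layout that `evaluate()` returns for a model compiled with `loss="mse"` and `metrics=["mse"]`; substitute your own results.

```python
def relative_change(baseline, quantized):
    """Relative change of the quantized metric with respect to the baseline."""
    return (quantized - baseline) / baseline

# Hypothetical results, NOT measured values from this tutorial.
example_baseline = [0.0100, 0.0100]
example_quant = [0.0104, 0.0104]

change = relative_change(example_baseline[0], example_quant[0])
print("validation MSE changed by {:+.1%}".format(change))
```

A small positive change means the quantization aware model lost little accuracy; a large one suggests training it for longer or revisiting which layers are quantized.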

**Quantize the model**

We are now ready to convert the model into its lightweight form, an 8-bit fixed-point version. Machine learning models normally use 32-bit floating-point numbers, which support fractional values with high precision. An 8-bit integer can represent only 256 distinct values, so it is less precise, but it is much faster to compute with and four times smaller to store. Running the code below saves the converted model to a file named according to the string in "file_name". Quantization represents each real number as an integer multiplied by a scaling factor: real_value ≈ scale × (integer_value − zero_point). This is the essence of what the converter does, though there are many options you can tune to keep accuracy as high as possible by modifying the attributes of the converter object, as you can see in the code below. However, because we used quantization aware training, most of these settings are handled automatically, and we only need to set one flag, converter.optimizations.
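The scale-and-offset representation can be tried out directly in NumPy. The sketch below follows the affine mapping used by TensorFlow Lite's 8-bit scheme, real_value = scale × (int8_value − zero_point); the helper names and the example range [0, 6] are assumptions chosen for illustration.

```python
import numpy as np

def quantize_int8(x, scale, zero_point):
    """real -> int8 using q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, zero_point):
    """int8 -> real using x ~ scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.linspace(0.0, 6.0, 7, dtype=np.float32)   # example activations in [0, 6]

# Map the observed range onto the 256 int8 levels [-128, 127].
scale = (x.max() - x.min()) / 255.0
zero_point = int(np.round(-128 - x.min() / scale))

q = quantize_int8(x, scale, zero_point)
x_hat = dequantize_int8(q, scale, zero_point)

print(q)
print(x_hat)
print("bytes as float32:", x.nbytes, "bytes as int8:", q.nbytes)
```

The recovered values differ from the originals by at most half a quantization step, while the stored array takes a quarter of the memory.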

In [None]:
file_name = "tf_lit2_encoder.tflite"
#prepare to convert the encoder model for deployment by getting our converter object
converter = tf.lite.TFLiteConverter.from_keras_model(encoder_q)


#the following settings do not need to be set by hand because the model was trained with QAT:
#converter.representative_dataset
#converter.target_spec
#converter.inference_input_type and converter.inference_output_type

converter.optimizations = [tf.lite.Optimize.DEFAULT]

#this converts the model to its 8 bit weight, bias, and activation form
encoder_tf_lite = converter.convert()

#we now have our model, and we can save it to our directory for deployment
with open(file_name, "wb") as f:
    f.write(encoder_tf_lite)

**Compare our model to the original**

To make sure our model works, we will now test it against the baseline model and compare mean-squared-error values to measure how accurate each one is. In TensorFlow you can do this with model.evaluate, but because we are now also working with a TensorFlow Lite model, we need to pass individual samples to it in a loop and then stack the outputs with NumPy. Autoencoders work like any other network in that you can compare the network's output with the desired result, in this case our real PPG signal.
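Before running the comparison on the real encoder, it may help to see the interpreter loop on a tiny stand-in model. The sketch below converts a small, untrained Keras model (not the tutorial's encoder) to TensorFlow Lite without quantization, feeds it one sample at a time, and compares the stacked outputs with `predict`; all names in it are illustrative.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model (NOT the tutorial's encoder): 8 inputs -> 4 outputs.
toy_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(4, activation="relu"),
])

# Convert without quantization so both versions should agree closely.
toy_tflite = tf.lite.TFLiteConverter.from_keras_model(toy_model).convert()

interpreter = tf.lite.Interpreter(model_content=toy_tflite)
interpreter.allocate_tensors()
in_index = interpreter.get_input_details()[0]["index"]
out_index = interpreter.get_output_details()[0]["index"]

samples = np.random.rand(10, 8).astype(np.float32)

# The converted model expects a batch of one, so feed samples one at a time.
tflite_out = []
for i in range(len(samples)):
    interpreter.set_tensor(in_index, samples[i:i + 1])
    interpreter.invoke()
    tflite_out.append(interpreter.get_tensor(out_index))
tflite_out = np.vstack(tflite_out)

keras_out = toy_model.predict(samples, verbose=0)
print("mean absolute difference:", np.mean(np.abs(keras_out - tflite_out)))
```

For an unquantized conversion the difference should be close to zero; with the quantized encoder a small, non-zero difference is expected.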

In [None]:
test_tf_lite_model = tf.lite.Interpreter(model_path=file_name)
#file_name was set in the conversion cell above ("tf_lit2_encoder.tflite")
test_tf_lite_model.allocate_tensors()
#invoking the model works the same way in tflite micro, the only difference is the language
input_tensor = test_tf_lite_model.get_input_details()[0]["index"]
output_tensor = test_tf_lite_model.get_output_details()[0]["index"]

repre_data_set = np.array(val_data_AE[1], dtype=np.float32)
np.random.shuffle(repre_data_set)

#never request more samples than the validation set actually holds
sample_test_num = min(3493, len(repre_data_set))
test_samples = repre_data_set[:sample_test_num]

# Run the model's interpreter for each sample and store the results
tflite_outputs = []
for i in range(sample_test_num):
    #slicing with i:i+1 keeps the batch axis, giving shape (1, 400)
    test_tf_lite_model.set_tensor(input_tensor, test_samples[i:i+1])
    test_tf_lite_model.invoke()
    tflite_outputs.append(test_tf_lite_model.get_tensor(output_tensor))
result = np.vstack(tflite_outputs)

#codes produced by the baseline (float) encoder for the same samples
result_parent = encoder.predict(test_samples)

#mean absolute difference between the two sets of codes
final_diff = np.mean(np.absolute(result_parent - result))
print("the mean output difference across", sample_test_num, "runs is", final_diff)

#decode both sets of codes and compare the reconstructions with the real signal
decoder.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])
tflite_loss = decoder.evaluate(result, test_samples)
parent_loss = decoder.evaluate(result_parent, test_samples)
model_AE.evaluate(test_samples, test_samples)

**Convert the model to a Tflite micro model**

Now that we have our full TensorFlow Lite model, we need to convert it into a TensorFlow Lite Micro file (a C source file) so that it can be deployed to a microcontroller. You can do this by running the following commands in a Linux terminal. Make sure you replace MODEL_TFLITE and MODEL_TFLITE_MICRO with the .tflite file name and the name you want for the converted file, respectively. For Windows users, xxd is included with Vim for Windows.

### Install xxd if it is not available
!apt-get update && apt-get -qq install xxd

### Convert to a C source file, i.e, a TensorFlow Lite for Microcontrollers model
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
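
If xxd is not available, the same kind of C source file can be approximated in pure Python. The hypothetical `bytes_to_c_array` function below is a sketch that imitates the general layout of `xxd -i` output; it is not a drop-in replacement and has only been written with this use in mind.

```python
import re

def bytes_to_c_array(data, var_name):
    """Render bytes as a C array definition, similar in spirit to `xxd -i`."""
    # C identifiers may only contain letters, digits and underscores.
    var_name = re.sub(r"\W", "_", var_name)
    lines = []
    for start in range(0, len(data), 12):          # 12 bytes per line
        chunk = data[start:start + 12]
        lines.append("  " + ", ".join("0x{:02x}".format(b) for b in chunk))
    body = ",\n".join(lines)
    return (
        "unsigned char {name}[] = {{\n{body}\n}};\n"
        "unsigned int {name}_len = {n};\n"
    ).format(name=var_name, body=body, n=len(data))

# Demo on a few fake bytes; for a real model, read the .tflite file in
# binary mode and write the returned string to a .cc file.
demo_source = bytes_to_c_array(b"\x1c\x00\x00\x00TFL3", "model.tflite")
print(demo_source)
```

The generated array and its `_len` variable are what the firmware project includes to hand the model bytes to the TensorFlow Lite Micro interpreter.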

### **Load a tensorflow nrf-project**

This is where our project gets a little specific. We are currently working on the MotionSense_v3 device-specific instructions and will update this tutorial soon. Stay tuned!