# **Task-Specific Autoencoder Enabled by TensorFlow Lite Micro: End-to-End Tutorial on Deploying to the nRF5340 and OSU MotionSenseHRV**





In this notebook we will show you how to deploy a TensorFlow Lite Micro autoencoder to a Cortex-M microcontroller such as the nRF5340 in order to transmit photoplethysmogram (PPG) infrared (IR) signals to a mobile device for smooth heart rate (HR) prediction.

In the last few years, more capable processors, combined with new techniques for reducing the size and computational cost of machine learning models (such as quantization), have brought AI all the way down to the Internet of Things (IoT) level: the level of microcontrollers and sensor systems. In other words, we can now deploy machine learning on tiny devices that require very little power, such as watches, security cameras, and wearable medical devices.

There are many benefits to running machine learning on IoT devices. In this tutorial, we will look at deploying a *task-specific autoencoder*. An autoencoder is a type of neural network with two halves: an encoder that compresses its input into a much smaller representation, and a decoder that tries to recreate the original input from that compressed representation. Training the two together teaches the model which signal features are key and should be preserved, and which can be ignored.

In our case, we use the autoencoder for feature-preserving compression. We encode data coming from a PPG sensor on the device and send the encoded data over Bluetooth Low Energy (BLE), so that it can be analyzed further by larger, deeper networks for tasks such as smooth heart rate (HR) prediction. BLE is very low-power and, as such, cannot easily transmit large quantities of data such as a full PPG signal. The autoencoder compresses the data so that only the features needed for the downstream task are kept, which makes it small enough to send over BLE.
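To get a feel for the numbers, the short calculation below uses the window settings that appear later in this notebook (8-second windows, 100 Hz sampling downsampled by a factor of 4, two PPG channels, and a 16-value code). It only counts values per window; it is a back-of-the-envelope sketch, not a measurement of actual BLE payloads.

```python
# Rough size of one encoder input window versus its latent code.
win_len_s = 8          # window length in seconds (win_len in this notebook)
fs_hz = 100 // 4       # 100 Hz downsampled by a factor of 4 -> 25 Hz
n_channels = 2         # two PPG channels are fed to the encoder
latent_len = 16        # number of values in the latent code

values_in = win_len_s * fs_hz * n_channels   # values per input window
ratio = values_in / latent_len               # how many times smaller the code is

print("input values per window:", values_in)
print("latent values per window:", latent_len)
print("reduction factor:", ratio)
```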



Let us start working on the project itself. You will need:


1.   An OSU MotionSenseHRV chip
2.   Anaconda or Miniconda installed (See https://docs.anaconda.com/anaconda/install/)

After step 2, you will have the conda package manager (for Python packages), available either through your terminal (Linux, macOS) or through Anaconda Prompt (Windows). Open it in the "tutorials" folder as admin/sudo and run the following conda commands. They install all Python dependencies in an environment named tf_MSHRV3_AEHR and activate it. Then re-open this notebook from inside the tf_MSHRV3_AEHR environment by executing "jupyter notebook AE_model_tutorial.ipynb" from the command line.

Keep exp_id='1_1' to reuse the stored weights and skip all training/fitting commands. Set exp_id to anything else if you want to train from scratch. You can look at the commands following "if exp_id!='1_1'" in this notebook to see which steps are skipped when using pretrained weights. The downstream HR prediction model weights are always provided in the "sig2HR" folder; if you want to retrain that model as well, use the separate Python script "sig2HR_model.py". Most TensorFlow warnings can be ignored.

In [None]:
#Download the data via an automatic Python script. Run this only once.

from data import get_data
get_data.main()

**Make sure parameters and sample data are in the correct folder**

We will start by getting the necessary model parameters and sample data in order to first test the model. To do this, run the get_data.py file in the repository, or run the data download cell above.

In [None]:
# TensorFlow is an open source machine learning library
import tensorflow as tf

# Keras is TensorFlow's high-level API for deep learning
from tensorflow import keras
from tensorflow.keras import layers

# Numpy is a math library
import numpy as np
# Pandas is a data manipulation library 
import pandas as pd
# SciPy is a scientific computing library; scipy.signal provides signal-processing tools
from scipy.signal import detrend
# Matplotlib is a graphing library
import matplotlib.pyplot as plt
# Math is Python's math library
import math
import os
import glob
import shutil

#import some custom utility functions
from utils import make_data_pipe, Rpeak2HR, sliding_window_fragmentation
from sig2HR_model import create_model as create_model_HR
from sig2HR_model import create_infer_model as create_infer_model_HR
#tf.config.set_visible_devices([], 'GPU')


path_prefix='data/pre-training'
exp_id='1_2' #set to '1_1' to reuse the stored weights and skip training
#current_time = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
#log_prefix='../experiments/{}_{}'.format(exp_id,current_time)
log_prefix='data/post-training/experiments/{}'.format(exp_id)


#path_prefix= 'E:/Box Sync/' #'C:/Users/agarwal.270/Box/' #
path=(path_prefix+'/')
val_files=[path+'2019092801_3154_clean.csv']
test_files=[path+'2019092820_5701_clean.csv']
win_len=8 #in sec
step=1 #in n_samples
Fs_pks=100 #in Hz

Now that we have configured our paths and imports, we start by loading the training data for the model. It is stored in the repository as CSV files, each holding a recorded PPG signal over time. We read each file with pd.read_csv(), then reformat and combine the columns using NumPy concatenation.

In [None]:
def get_train_data(path,val_files=[],test_files=[],
                   win_len=8,step=1,Fs_pks=100):
    '''
    Use all files in the folder 'path' except the val_files and test_files
    '''
    def get_clean_ppg_and_ecg(files):
        list_clean_ppg=[];list_arr_pks=[]
        for i in range(len(files)):
            df=pd.read_csv(files[i],header=None)
            arr=df.values
            if 'clean' in files[i]:
                arr[:,41:45]=(detrend(arr[:,41:45].reshape(-1),0,'constant')
                                ).reshape((-1,4))
                list_clean_ppg+=[np.concatenate([arr[:,29:31],arr[:,41:45]],
                                    axis=-1),arr[:,39:41]]
                list_arr_pks+=[arr[:,45:49].reshape(-1)]    
        return list_clean_ppg,list_arr_pks
    files=glob.glob(path+'*.csv')
    #files=[fil for fil in files if 'WZ' in fil] #get wenxiao's data
    #separate val and test files
    s3=set(files);s4=set(val_files+test_files)
    files_2=list(s3.difference(s4))
    #files_2=[files_2[0]]
    #files_2=[fil for fil in files if not((val_names[0] in fil))]
    list_clean_ppg,list_arr_pks=get_clean_ppg_and_ecg(files_2)
    
    dsample_factr=4
    Fs_pks=int(Fs_pks/dsample_factr)
    win_len=win_len*Fs_pks
    
    list_r_pk_locs=[np.arange(len(arr_pks))[arr_pks.astype(bool)] for 
                    arr_pks in list_arr_pks]
    
    #get nearest dsampled idx
    #TODO: Started using round instead of floor
    list_r_pk_locs_dsampled=[np.round(r_pk_locs/dsample_factr).astype(int) for 
                             r_pk_locs in list_r_pk_locs]
    #print([np.max(r_pks) for r_pks in list_r_pk_locs_dsampled])
    #print([len(ppg) for ppg in list_clean_ppg[::4]])
    
    list_arr_pks_dsampled=[]
    for j in range(len(list_arr_pks)):
        arr_pks_dsampled=np.zeros([int(len(list_arr_pks[j])/dsample_factr),1])
        #check & correct for rare rounding up issue in the last element
        if list_r_pk_locs_dsampled[j][-1]==len(arr_pks_dsampled):
            list_r_pk_locs_dsampled[j][-1]-=1
        arr_pks_dsampled[list_r_pk_locs_dsampled[j]]=1
        list_arr_pks_dsampled.append(arr_pks_dsampled)
    #print([len(ppg) for ppg in list_arr_pks_dsampled])


    list_HR=[2*[Rpeak2HR(arr_pks,win_len,step,Fs_pks)] 
             for arr_pks in list_arr_pks_dsampled]
    list_HR=sum(list_HR,[])
    #list_HR=[HR[::dsample_factr] for HR in list_HR]
    
    return list_clean_ppg,list_HR


list_sigs,list_HR=get_train_data(path,val_files,test_files,win_len,
                                     step,Fs_pks)
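The R-peak downsampling step inside get_train_data is easy to get wrong at the array boundary, so here is the same idea on a toy example. This is a simplified sketch in plain Python rather than the notebook's NumPy code, and the function name `downsample_peak_locs` is made up for illustration.

```python
def downsample_peak_locs(peak_locs, n_samples, factor=4):
    """Map peak indices from the original rate to a rate `factor` times lower.

    peak_locs: sample indices of peaks at the original rate
    n_samples: number of samples at the original rate
    Returns a 0/1 list that marks the peaks at the lower rate.
    """
    n_out = n_samples // factor
    marks = [0] * n_out
    for loc in peak_locs:
        idx = round(loc / factor)   # nearest downsampled index
        if idx == n_out:            # rounding can overshoot the last index by one
            idx -= 1
        marks[idx] = 1
    return marks

# 16 samples at the original rate, with peaks at samples 1, 6 and 15
marks = downsample_peak_locs([1, 6, 15], n_samples=16, factor=4)
print(marks)
```

The peak at sample 15 rounds to index 4, which is one past the end of the 4-element downsampled array, so it is pulled back to the last valid index. This mirrors the "rare rounding up issue" correction in the cell above.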

**Import training data from CSV file**

**Convert and Visualize Data**

We will feed the data to the network using tensorflow's [tf.data](https://www.tensorflow.org/guide/data) pipeline which comes with lots of benefits (check out the link to learn more).

In [None]:
#Pre-process data
dsample_factr=4;Fs_new=int(Fs_pks/dsample_factr)
sample_win_len,step_size=win_len*Fs_new,2*Fs_new
HR_win_len=sample_win_len*3 #TODO: Can change this later, 4 is arbitrary choice after profs suggestion
ppg_win_len=sample_win_len+HR_win_len

model_sigs_in,model_HR_out=[],[]
for j in range(len(list_HR)):
    #HR=list_HR[j][list_arr_pks[j].astype(bool)]
    ppg,HR=list_sigs[j][:,0:2],list_HR[j]
    ppg=sliding_window_fragmentation([ppg],ppg_win_len,step_size)
    HR=sliding_window_fragmentation([HR],HR_win_len,step_size)
    #print(len(ppg),len(HR))
    model_sigs_in.append(ppg)
    model_HR_out.append(HR[:len(ppg)]) #clipping extra HRs at the end
model_sigs_in=np.concatenate(model_sigs_in,axis=0)
model_HR_out=np.concatenate(model_HR_out,axis=0)
model_in=model_sigs_in#[:,:,0] #removing last dummy dimension
model_out=model_HR_out[:,:,0] #removing last dummy dimension
print(model_in.shape,model_out.shape)

#Visualize our PPG signal
idx=1
plt.figure()
plt.subplot(211)
plt.title('A sample PPG and HR')
plt.plot(model_in[idx,:,:])
plt.ylabel('PPG')
plt.grid(True)
plt.subplot(212)
plt.plot(model_out[idx,:])
plt.ylabel('HR (BPS)')
plt.grid(True)
plt.xlabel('Sample No.')

#partition
val_perc=0.14
val_idx=int(val_perc*len(model_in))

#TODO: Changed here to have fused decoder output
val_data_AE=[model_in[0:val_idx,:],np.mean(model_in[0:val_idx],axis=-1,keepdims=True)]
train_data_AE=[model_in[val_idx:,:],np.mean(model_in[val_idx:],axis=-1,keepdims=True)]
val_data_e2e=[model_in[0:val_idx,:],model_out[0:val_idx]]
train_data_e2e=[model_in[val_idx:,:],model_out[val_idx:]]

train_ds_AE=make_data_pipe(train_data_AE,batch_size=32,shuffle=True)
val_ds_AE=make_data_pipe(val_data_AE,batch_size=128,shuffle=False)
train_ds_e2e=make_data_pipe(train_data_e2e,batch_size=32,shuffle=True)
val_ds_e2e=make_data_pipe(val_data_e2e,batch_size=128,shuffle=False)
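The helper `sliding_window_fragmentation` lives in `utils` and is not shown in this notebook. As a rough mental model only, the sketch below assumes it cuts a signal into fixed-length windows that advance by `step` samples; `sliding_windows_sketch` is a hypothetical stand-in, not the actual utility. It also shows why the HR windows are clipped to `len(ppg)` in the cell above: the longer PPG window fits fewer times into the same recording.

```python
def sliding_windows_sketch(signal, win_len, step):
    """Cut `signal` (a list of samples) into windows of `win_len` samples.

    Consecutive windows start `step` samples apart; a trailing remainder
    that is too short for a full window is dropped.
    """
    starts = range(0, len(signal) - win_len + 1, step)
    return [signal[s:s + win_len] for s in starts]

signal = list(range(2000))      # toy recording: 80 s at 25 Hz
ppg_windows = sliding_windows_sketch(signal, win_len=800, step=50)
hr_windows = sliding_windows_sketch(signal, win_len=600, step=50)
n_hr_before_clip = len(hr_windows)

print(len(ppg_windows), n_hr_before_clip)
hr_windows = hr_windows[:len(ppg_windows)]   # clip the extra HR windows
```

The window lengths (800 and 600 samples) and the step (50 samples) match `ppg_win_len`, `HR_win_len` and `step_size` as computed above for a 25 Hz signal.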

**Loading/Training the model**

With the data loaded and pre-processed, we can now build, train and test our model.

We begin by constructing the machine learning model. Our target network is an autoencoder that encodes 8 seconds of a PPG signal into a single 1x16 latent representation (or "code"). The layers in between are responsible for the feature-selective compression. If you look at the encoder, the input passes through four strided convolutional layers with rectified linear (ReLU) activations, which progressively shorten the time axis. The result is flattened and passed to a final dense layer that produces the 16-value code. The decoder mirrors this with transposed convolutions, followed by a small GRU that carries memory from one 8-second segment to the next.
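To see how the convolutional encoder in `create_model_AE` arrives at a 16-value code, it helps to track the length of the time axis through each layer. The sketch below assumes the layer settings used in this notebook (a 200-sample input and `padding='same'`), where a strided convolution yields `ceil(length / stride)` samples and a strided transposed convolution yields `length * stride` samples.

```python
import math

encoder_strides = [2, 2, 5, 5]      # strides of the four Conv2D layers
decoder_strides = [5, 5, 2, 2]      # strides of the four Conv2DTranspose layers

length = 200                         # 8 s at 25 Hz
encoder_lengths = [length]
for s in encoder_strides:
    length = math.ceil(length / s)   # 'same' padding: ceil(length / stride)
    encoder_lengths.append(length)

flat_features = length * 16          # the last Conv2D layer has 16 filters

decoder_lengths = [2]                # the decoder reshapes its Dense output to length 2
for s in decoder_strides:
    decoder_lengths.append(decoder_lengths[-1] * s)

print("encoder time axis:", encoder_lengths)
print("flattened features before the final Dense layer:", flat_features)
print("decoder time axis:", decoder_lengths)
```

The decoder ends at 200 samples again, which is why each 8-second segment can be reconstructed at its original length.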

In [None]:
def create_model_AE(in_shape,latent_shape=(16,),mem_shape=(1,),reps=4,
                    encoder=None,decoder=None):
    in_shape_AE = (int(in_shape[0]/reps),in_shape[1])
    #We will be creating the AE_model here
    if encoder is None:
        encoder_in = layers.Input(shape=in_shape_AE,name="encoder_in")
        x = layers.Reshape((1, in_shape_AE[0], in_shape_AE[1]),
            name='expand_dims')(encoder_in) #insert a dummy dim to use conv2d
        x = layers.Conv2D(filters=8,kernel_size=(1,3), strides=(1,2),
                    activation='relu',padding='same')(x)
        x = layers.Conv2D(filters=8,kernel_size=(1,3), strides=(1,2),
                    activation='relu',padding='same')(x)
        x = layers.Conv2D(filters=16,kernel_size=(1,5), strides=(1,5),
                    activation='relu',padding='same')(x)
        x = layers.Conv2D(filters=16,kernel_size=(1,5), strides=(1,5),
                    activation='relu',padding='same')(x)
        #x = layers.GlobalAveragePooling2D(name='GAP{}')(x)
        x = layers.Flatten()(x)
        
        #x = layers.Dense(256,activation='relu')(encoder_in)
        #x = layers.Dense(64,activation='relu')(x)
        encoder_out = layers.Dense(latent_shape[0],activation='relu')(x)
        encoder = keras.Model(encoder_in, encoder_out, name="encoder")
    if decoder is None:
        #dec_shape = (latent_shape[0]+mem_shape[0],)
        decoder_mem = layers.Input(shape=mem_shape,name="decoder_mem")
        decoder_in = layers.Input(shape=(latent_shape[0],),
                                  name="decoder_in")
        #net_in= tf.keras.layers.Concatenate(axis=-1)([decoder_mem,decoder_in])
        x = layers.Dense(units=1*2*latent_shape[0], activation='relu')(decoder_in)
        x = layers.Reshape(target_shape=(1, 2, latent_shape[0]))(x)
        x = layers.Conv2DTranspose(filters=16,kernel_size=(1,5), strides=(1,5),
                    activation='relu',padding='same')(x)
        x = layers.Conv2DTranspose(filters=16,kernel_size=(1,5), strides=(1,5),
                    activation='relu',padding='same')(x)
        x = layers.Conv2DTranspose(filters=8,kernel_size=(1,3), strides=(1,2),
                    activation='relu',padding='same')(x)
        x = layers.Conv2DTranspose(filters=1,kernel_size=(1,3), strides=(1,2),
                    activation='linear',padding='same')(x)
                     
        #x = layers.Dense(64,activation='relu')(net_in)
        #x = layers.Dense(256,activation='relu')(x)
        x = tf.squeeze(x, axis=1, name='squeeze_dims') #remove the dummy dim
        
        x,mem_out = layers.GRU(mem_shape[0], return_sequences=True, 
                      return_state=True)(x,initial_state=decoder_mem)
        decoder_out = layers.Conv1D(filters=1,kernel_size=1, strides=1,padding='same',
                             activation=None,name='Conv1')(x)
        #decoder_out = layers.Dense(in_shape_AE[0])(x)
        decoder = keras.Model([decoder_mem,decoder_in], [mem_out,decoder_out], name="decoder")
        decoder.mem_shape = mem_shape
    
    #return encoder,decoder,decoder

    AE_in = keras.Input(shape=in_shape, name="AE_in")
    #Functional API is "call" function of subclassing API
    inputs = layers.Reshape((reps, in_shape_AE[0], in_shape_AE[1]))(AE_in) #Check if reshaping in desired fashion
    dec_mem=tf.zeros([tf.shape(inputs)[0],mem_shape[0]])
    out_list=[]
    for i in range(reps):
        z = encoder(inputs[:,i,:,:])
        dec_mem,dec_out=decoder([dec_mem,z])
        out_list.append(dec_out)
        #dec_mem = z[:,-mem_shape[0]:]
    out = tf.stack(out_list,axis=1)
    out = layers.Reshape((in_shape[0],-1))(out)#Check if reshaping in desired fashion
    model_AE = keras.Model(AE_in, out, name="AE")
    
    return encoder,decoder,model_AE

#now we will use this function to make the AE model
# Make AE model
encoder,decoder,model_AE = create_model_AE(model_in.shape[1:],
                                           latent_shape=(16,),
                                           reps=4,mem_shape=(4,))
#compile AE and prep for training
model_AE.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])

Now that we have our autoencoder, let's also connect our smooth heart rate prediction model to it so that the autoencoder can be optimized to be task-specific. The HR model is appended to the decoder and predicts the heart rate from the reconstructed PPG signal.

In [None]:
# Load an existing PPG to HR Neural Network model called model_HR for smooth Heart Rate Prediction from one channel PPG signal
# Load HR model
model_HR = create_model_HR(val_data_AE[1].shape[1:],HR_win_len)
weights_dir_HR=log_prefix+'/sig2HR/checkpoints'

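#copy sig2HR weights to new directory if exp_id!='1_1'
if exp_id!='1_1':
    os.makedirs(log_prefix,exist_ok=True)
    #copytree fails if the destination already exists, so skip it on re-runs
    if not os.path.isdir(log_prefix+'/sig2HR'):
        shutil.copytree(log_prefix+'/../1_1/sig2HR', log_prefix+'/sig2HR')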
    
latest_ckpt = tf.train.latest_checkpoint(weights_dir_HR)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_HR.load_weights(latest_ckpt)
model_HR.trainable=False #freeze model_HR weights

In [None]:
# We then stack the model_AE and model_HR together in a model called model_e2e (end2end).
def create_model_e2e(in_shape,reps,encoder,decoder,model_HR):
    in_shape_AE = (int(in_shape[0]/reps),in_shape[1])
    #Put the models together
    e2e_in = keras.Input(shape=in_shape, name="e2e_in")
    #Functional API is "call" function of subclassing API
    inputs = layers.Reshape((reps, in_shape_AE[0], in_shape_AE[1]))(e2e_in) #Check if reshaping in desired fashion
    dec_mem=tf.zeros([tf.shape(inputs)[0],decoder.mem_shape[0]],name='dec_mem_e2e')
    out_list=[]
    for i in range(reps):
        z = encoder(inputs[:,i,:,:])
        dec_mem,dec_out=decoder([dec_mem,z])
        out_list.append(dec_out)
        #dec_mem = z[:,-decoder.mem_shape[0]:]
    out = tf.stack(out_list,axis=1)
    out = layers.Reshape((in_shape[0],-1))(out)#Check if reshaping in desired fashion
    HR_hat=model_HR(out)
    model_e2e = keras.Model(e2e_in, HR_hat, name="e2e")
    return model_e2e

# Make e2e model
model_e2e = create_model_e2e(model_in.shape[1:],4,encoder,decoder,model_HR)
model_e2e.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])

Now, let's inspect our model and, finally, train it. The training process may take a while depending on your PC hardware.

In [None]:
#model_in.shape is the input format
#summary() prints the table itself and returns None, so no print() is needed
encoder.summary()
decoder.summary()
model_AE.summary()
model_HR.summary()
model_e2e.summary()

ckpt_dir_AE=log_prefix+'/AE/checkpoints'
stdout_log_file = log_prefix + '/AE/stdout.log'
#ckpt_filepath=ckpt_dir+'/cp-{epoch:04d}.ckpt'
ckpt_filepath_AE=ckpt_dir_AE+'/cp-{epoch:04d}.ckpt'
# Include the epoch in the file name (uses `str.format`)
ckpt_dir_e2e=log_prefix+'/AE/checkpoints_e2e'
ckpt_filepath_e2e=ckpt_dir_e2e+'/cp-{epoch:04d}.ckpt'
#ckpt_filepath_e2e=ckpt_dir_e2e+'/cp.ckpt'
#we will also set a directory to save our trained weights once they are finished
weights_path_prefix=log_prefix#glob.glob('../experiments/' + exp_id + '*')[0] #for latest experiment weights


os.makedirs(ckpt_dir_AE,exist_ok=True)
os.makedirs(ckpt_dir_e2e,exist_ok=True)
file_path=log_prefix+'/AE'
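The `{epoch:04d}` placeholder in the checkpoint paths above is filled in with the epoch number through Python's `str.format`, zero-padded to four digits. A quick illustration with a made-up directory name:

```python
ckpt_template = "example_dir/cp-{epoch:04d}.ckpt"   # hypothetical directory
names = [ckpt_template.format(epoch=e) for e in (1, 25, 600)]
print(names)
```

Zero-padding keeps the checkpoint files in training order when they are sorted by name.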

In [None]:
#weights_dir_AE=weights_path_prefix+'/AE/checkpoints'
#os.makedirs(weights_dir_AE,exist_ok=True)
#ckpt_filepath_AE=weights_dir_AE+'/cp.ckpt'

if exp_id!='1_1':
    #callbacks
    callbacks=[]

    callbacks.append(tf.keras.callbacks.ModelCheckpoint(ckpt_filepath_AE,
                    save_weights_only=True,save_best_only=True,
                    monitor="val_loss",mode='min'))
    # Train
    model_AE.fit(train_ds_AE,epochs=600,validation_data=val_ds_AE,callbacks=callbacks)

In [None]:
#Load best AE ckpt
weights_dir_AE=weights_path_prefix+'/AE/checkpoints'
latest_ckpt = tf.train.latest_checkpoint(weights_dir_AE)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_AE.load_weights(latest_ckpt)

# Train End2End model
if exp_id!='1_1':
    #callbacks
    callbacks=[]

    callbacks.append(tf.keras.callbacks.ModelCheckpoint(ckpt_filepath_e2e,
                    save_weights_only=True,save_best_only=True,
                    monitor="val_loss",mode='min'))
    model_e2e.fit(train_ds_e2e,epochs=1000,validation_data=val_ds_e2e,
                  callbacks=callbacks)

**Prepare for quantization-aware training**

Now that our model is ready, we use quantization-aware training (QAT), which trains the network in preparation for conversion to a lighter format. Everything before this point can be run in a tensorflow_GPU conda environment for faster execution. For QAT, we need tf-nightly, which is the default installation in the tf_MS3 conda environment.

Integer quantization is the process of converting the parameters (weights, biases, and activations) of our network to 8-bit integer values. These are used because 8-bit integers are both faster to compute with and lighter in terms of storage, which is very helpful on microcontrollers. QAT is not the quantization process itself; it fine-tunes the weights so that they work well once converted to 8 bits. It does this by simulating, during training, how the model would behave with integer values, and then optimizing the weights of the already-trained model to minimize the loss function under that simulation. Note that invoking QAT does not actually convert the numbers; it only prepares the model.
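The simulation that QAT performs can be pictured as a quantize-then-dequantize round trip applied during the forward pass. The sketch below is a simplified, plain-Python illustration of that idea using a symmetric 8-bit scheme. It is not the code that `tensorflow_model_optimization` runs internally, and `fake_quantize` is a name chosen for this example.

```python
def fake_quantize(values, num_bits=8):
    """Round values to a symmetric signed integer grid and map them back to floats."""
    q_max = 2 ** (num_bits - 1) - 1                   # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0   # real-valued step size
    # Quantize: divide by the step size, round, and clamp to the integer range
    q = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    # Dequantize: map the integers back to real values
    dequantized = [qi * scale for qi in q]
    return q, dequantized, scale

weights = [0.5, -0.25, 0.1234, -1.0]
q, w_hat, scale = fake_quantize(weights)
max_err = max(abs(w - wh) for w, wh in zip(weights, w_hat))

print("integer codes:", q)
print("step size:", scale)
print("largest rounding error:", max_err)
```

The rounding error of each value is at most half a step. During QAT the network sees these slightly perturbed values while training, so the weights can adapt to them before the real conversion happens.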

In [None]:
import tensorflow_model_optimization as tfmopt

#Load best e2e ckpt
#latest_ckpt = tf.train.latest_checkpoint(weights_dir_e2e)
#print('Loading model from ckpt {}'.format(latest_ckpt))
#model_e2e.load_weights(latest_ckpt)

#performing generation of the quantization aware model of the encoder
quan_model_generator = tfmopt.quantization.keras.quantize_model
encoder_q = quan_model_generator(encoder) #quantization aware encoder
decoder_noq = keras.models.clone_model(decoder)
decoder_noq.mem_shape=decoder.mem_shape #copy mem_shape attribute explicitly
decoder_noq.set_weights(decoder.get_weights())
decoder_noq.trainable=False #Freeze decoder for now

# Make the AE model, for quantization of it later on

_,_,model_AE_q = create_model_AE(model_in.shape[1:],
latent_shape=(16,),reps=4,mem_shape=(4,),encoder=encoder_q,decoder=decoder_noq)

# Make e2e model
model_e2e_q = create_model_e2e(model_in.shape[1:],4,encoder_q,decoder_noq,model_HR)


#the summary should show an encoder output shape of (None, 16)
encoder_q.summary()

**Train QAT model**

The QAT version of our model is now configured and ready to be optimized for 8-bit operation through training. We train this model in the same fashion as the original.

In [None]:
model_e2e_q.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])
#we set the directory to our pretrained path
weights_dir_e2e_q = weights_path_prefix+'/AE/checkpoints_e2e_q'
os.makedirs(weights_dir_e2e_q,exist_ok=True)
ckpt_filepath_e2e_q=weights_dir_e2e_q+'/cp.ckpt'

# Train End2End model
if exp_id!='1_1':
    callbacks=[]
    callbacks.append(tf.keras.callbacks.ModelCheckpoint(ckpt_filepath_e2e_q,
                    save_weights_only=True,save_best_only=True,
                    monitor="val_loss",mode='min'))
    # Train
    model_e2e_q.fit(train_ds_e2e,epochs=50,validation_data=val_ds_e2e,callbacks=callbacks)

**Compare QAT results with normal model**

We should also make sure that most of our accuracy is preserved during this conversion process. Here we check using only the validation set; verifying this on the test data is left as an exercise.

In [None]:
#for evaluation
rmse=lambda y,y_hat:np.sqrt(np.mean((y.reshape(-1)-y_hat.reshape(-1))**2))

# Predictions from AE
weights_dir_AE=weights_path_prefix+'/AE/checkpoints'
latest_ckpt = tf.train.latest_checkpoint(weights_dir_AE)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_AE.load_weights(latest_ckpt)
AE_model_mse = model_e2e.evaluate(val_ds_e2e, verbose=1)

# Predictions for new e2e AE    
weights_dir_e2e=weights_path_prefix+'/AE/checkpoints_e2e'
latest_ckpt = tf.train.latest_checkpoint(weights_dir_e2e)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_AE.load_weights(latest_ckpt)
e2e_model_mse = model_e2e.evaluate(val_ds_e2e, verbose=1)

#Load best e2e_q checkpoint, training by the model
latest_ckpt = tf.train.latest_checkpoint(weights_dir_e2e_q)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_e2e_q.load_weights(latest_ckpt)
q_aware_model_mse = model_e2e_q.evaluate(val_ds_e2e, verbose=1)

print('Baseline AE val mse on HR:', AE_model_mse)
print('AE val mse on HR after end-to-end training:', e2e_model_mse)
print('End-to-end trained AE val mse on HR after Quantization:', q_aware_model_mse)

In [None]:
# Create inference model
pred_tsteps=decoder.outputs[1].shape.as_list()[1]
infer_model_HR=create_infer_model_HR([pred_tsteps,decoder.outputs[1].shape.as_list()[2]])
# Load HR model
weights_dir_HR=weights_path_prefix+'/sig2HR/checkpoints'
latest_ckpt = tf.train.latest_checkpoint(weights_dir_HR)
print('Loading model from ckpt {}'.format(latest_ckpt))
infer_model_HR.load_weights(latest_ckpt)

#Check if weights correctly loaded in infer model
HR_weights=[v.numpy() for v in model_HR.variables]
infer_HR_weights=[v.numpy() for v in infer_model_HR.variables]
HR_weights_check=[(HR_weights[i]==infer_HR_weights[i]).astype(int).reshape(-1).prod()
                  for i in range(len(HR_weights))]
print(HR_weights_check)

In [None]:
def get_test_data(file_path,win_len,step,Fs_pks):
    df=pd.read_csv(file_path,header=None)
    arr=df.values
    test_in=[np.concatenate([arr[:,29:31],arr[:,41:45]],axis=-1),
                            arr[:,39:41]]
    
    arr_pks=arr[:,45:49].reshape(-1)
    
    dsample_factr=4
    Fs_pks=int(Fs_pks/dsample_factr)
    win_len=win_len*Fs_pks
    
    r_pk_locs=np.arange(len(arr_pks))[arr_pks.astype(bool)]
    
    #get nearest dsampled idx
    #TODO: Started using round instead of floor
    r_pk_locs_dsampled=np.round(r_pk_locs/dsample_factr).astype(int)
    #print([np.max(r_pks) for r_pks in list_r_pk_locs_dsampled])
    #print([len(ppg) for ppg in list_clean_ppg[::4]])
    arr_pks_dsampled=np.zeros([len(test_in[0]),1])
        #check & correct for rare rounding up issue in the last element
    if r_pk_locs_dsampled[-1]==len(arr_pks_dsampled):
        r_pk_locs_dsampled[-1]-=1
    arr_pks_dsampled[r_pk_locs_dsampled]=1
    #print([len(ppg) for ppg in list_arr_pks_dsampled])


    list_HR=2*[Rpeak2HR(arr_pks_dsampled,win_len,step,Fs_pks)] 
    
    test_in=[ppg.astype('float32') for ppg in test_in]
    test_out=[HR[:,0].astype('float32') for HR in list_HR]
    return test_in,test_out

#load test data
ppg_in,HR_out=get_test_data(test_files[0],win_len,step,Fs_pks)
ppg,HR=ppg_in[0][:,0:2],HR_out[0]

ppg=sliding_window_fragmentation([ppg],pred_tsteps,pred_tsteps)
HR=sliding_window_fragmentation([HR],pred_tsteps,pred_tsteps)

In [None]:
#Some more Plotting and evaluation

# Predictions from AE
weights_dir_AE=weights_path_prefix+'/AE/checkpoints'
latest_ckpt = tf.train.latest_checkpoint(weights_dir_AE)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_AE.load_weights(latest_ckpt)


dec_mem=np.zeros([1,decoder.mem_shape[0]],dtype=np.float32)
HR_mem=np.zeros([1,*infer_model_HR.inputs[0].shape.as_list()[1:]],dtype=np.float32)

ppg_out_list=[];HR_out_list=[]
for i in range(ppg.shape[0]):
    z = encoder.predict(ppg[i:i+1,:,:])
    #dec_out=decoder.predict([dec_mem,z[:,:-decoder.mem_shape[0]]])
    dec_mem,dec_out=decoder.predict([dec_mem,z])
    #HR_mem is updated alongwith prediction
    HR_mem, HR_out = infer_model_HR.predict([HR_mem,dec_out]) 
    #dec_mem = z[:,-decoder.mem_shape[0]:] #update dec_mem

    ppg_out_list.append(dec_out[0])
    HR_out_list.append(HR_out[0])

ppg_hat_AE=np.concatenate(ppg_out_list,axis=0)
HR_hat_AE=np.concatenate(HR_out_list[1:],axis=0)

# Predictions for new e2e AE    
weights_dir_e2e=weights_path_prefix+'/AE/checkpoints_e2e'
latest_ckpt = tf.train.latest_checkpoint(weights_dir_e2e)
print('Loading model from ckpt {}'.format(latest_ckpt))
model_AE.load_weights(latest_ckpt)

dec_mem=np.zeros([1,decoder.mem_shape[0]])
HR_mem=np.zeros([1,*infer_model_HR.inputs[0].shape.as_list()[1:]])

ppg_out_list=[];HR_out_list=[]
for i in range(ppg.shape[0]):
    z = encoder.predict(ppg[i:i+1,:,:])
    dec_mem,dec_out=decoder.predict([dec_mem,z])
    #HR_mem is updated alongwith prediction
    HR_mem, HR_out = infer_model_HR.predict([HR_mem,dec_out]) 
    #dec_mem = z[:,-decoder.mem_shape[0]:] #update dec_mem

    ppg_out_list.append(dec_out[0])
    HR_out_list.append(HR_out[0])

ppg_hat_e2e=np.concatenate(ppg_out_list,axis=0)
HR_hat_e2e=np.concatenate(HR_out_list[1:],axis=0)

#Get HR from True ppg
HR_mem=np.zeros([1,*infer_model_HR.inputs[0].shape.as_list()[1:]])
HR_out_list=[]
for i in range(ppg.shape[0]):
    #HR_mem is updated alongwith prediction
    HR_mem, HR_out = infer_model_HR.predict([HR_mem,ppg[i:i+1,:,0:1]])#TODO: selected first LED channel for now
    HR_out_list.append(HR_out[0])
HR_from_ppg=np.concatenate(HR_out_list[1:],axis=0)

target=np.mean(ppg,axis=-1,keepdims=True).reshape(-1)
plt.figure()
plt.plot(target.reshape(-1),'b',
         ppg_hat_AE[...,0].reshape(-1),'r--',
         ppg_hat_e2e[...,0].reshape(-1),'g-.')
plt.legend(['Target','AE_out','AE_out_e2e'])
plt.ylabel('PPG magnitude')
plt.xlabel('Sample No.')
#RMSE is computed against the plotted target (the channel mean)
plt.title('AE_out_rmse={:.2f}, AE_out_e2e_rmse={:.2f}'.format(
            rmse(target,ppg_hat_AE),rmse(target,ppg_hat_e2e)))

#HR plot
plt.figure()
ax1=plt.subplot(311)
plt.plot(HR.reshape(-1),'b',HR_from_ppg.reshape(-1),'r--')
plt.legend(['True','HR_ppg_true'])
plt.subplot(312,sharex=ax1,sharey=ax1)
plt.plot(HR.reshape(-1),'b',HR_hat_AE.reshape(-1),'r--')
plt.legend(['True','HR_hat_AE'])
plt.subplot(313,sharex=ax1,sharey=ax1)
plt.plot(HR.reshape(-1),'b',HR_hat_e2e.reshape(-1),'r--')
plt.legend(['True','HR_hat_e2e'])
plt.xlabel('Sample No.');plt.ylabel('HR (BPS)')
plt.suptitle('RMSE: HR_ppg_true={:.4f}, HR_hat_AE={:.4f}, '
             'HR_hat_e2e={:.4f}'.format(rmse(HR,HR_from_ppg),
                rmse(HR,HR_hat_AE),rmse(HR,HR_hat_e2e)))

**Quantize the model**

We are now ready to convert the model into its lightweight counterpart, an 8-bit fixed-point version. Machine learning models normally use 32-bit floating-point numbers, which support fractional values and offer high precision over a wide range. An 8-bit integer can only take 256 distinct values and is therefore less precise, but it is much faster to compute with and takes a quarter of the storage. After running this code, a model named according to the string in "file_name" is saved. Quantization is typically done by representing each number in this format: real number = integer value * scaling factor. This is the essence of what the converter does, though there are many settings you can adjust to make the result as accurate as possible. You do this by modifying the attributes of the converter object, as you can see in the code below. However, since we have employed quantization-aware training, most of these settings are handled automatically, and we only need to set one flag, converter.optimizations.
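Written out, the mapping used by TensorFlow Lite's 8-bit quantization scheme relates a real value $r$ to its stored integer $q$ through a scale $S$ and a zero point $Z$. The symbols are chosen here for illustration:

$$
r \approx S\,(q - Z), \qquad q = \mathrm{clamp}\!\left(\mathrm{round}\!\left(\frac{r}{S}\right) + Z,\; -128,\; 127\right)
$$

For example, with $S = 0.02$ and $Z = 0$, the integer $q = 50$ represents $r \approx 1.0$, and any real value above $2.54$ saturates at $q = 127$.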

In [None]:
file_name = "tf_lit2_encoder.tflite"
#prepare to convert the encoder model for deployment by getting our converter object
converter = tf.lite.TFLiteConverter.from_keras_model(encoder_q)


#the following settings are already handled by QAT:
#converter.representative_dataset
#converter.target_spec
#converter.inference_input_type and converter.inference_output_type

converter.optimizations = [tf.lite.Optimize.DEFAULT]

#this converts the model to its 8-bit weight, bias, and activation form
encoder_tf_lite = converter.convert()

#we now have our model, and we can save it to our directory for deployment
with open(file_name, "wb") as f:
    f.write(encoder_tf_lite)

**Compare our model to the original**

To make sure our model works, we will now test it against the baseline model and compare mean-square error values to measure how accurate they are. In TensorFlow, you can do this with model.evaluate, but because we are now also working with a TensorFlow Lite model, we need to pass individual inputs to it in a loop and then stack the outputs with NumPy. Autoencoders work like any other network in that you can compare the network's output with the desired result, in this case our real PPG signal.

In [None]:
test_tf_lite_model = tf.lite.Interpreter("tf_lit2_encoder.tflite")
#you can change the name of this model in the program by renaming the "file_name" variable and then
#replaceing the above string with your file_name variable
test_tf_lite_model.allocate_tensors()
#the model involking actually works the same way in tflite micro, the only difference is the language
input_tensor = test_tf_lite_model.get_input_details()[0]["index"]
output_tensor = test_tf_lite_model.get_output_details()

repre_data_set = np.array(val_data_AE[1])
np.random.shuffle(repre_data_set)

#never request more samples than the validation set actually holds
sample_test_num = min(3493, len(repre_data_set))
test_data = repre_data_set[:sample_test_num]

#reference outputs from the original (non-TFLite) encoder
result_parent = encoder.predict(test_data)

results = []
tflite_outputs = []
# Run the model's interpreter for each sample and store the results
for i in range(sample_test_num):
    calibrat_data = np.reshape(test_data[i], (1, 400)).astype(np.float32)
    test_tf_lite_model.set_tensor(input_tensor, calibrat_data)
    test_tf_lite_model.invoke()
    output_data = test_tf_lite_model.get_tensor(output_tensor[0]['index'])
    tflite_outputs.append(output_data)

    #mean absolute difference between the two encoders' outputs for this sample
    result_difference = np.mean(np.absolute(result_parent[i] - output_data[0]))
    results.append(float(result_difference))

result = np.vstack(tflite_outputs)
final_diff = np.mean(results)
print("the mean output difference across", sample_test_num, "runs is ", final_diff)

#we now have latent vectors from both encoders; pass each set through the decoder
#and compare the reconstruction loss against the original signals
decoder.compile(optimizer = "adam", loss = "mse", metrics = ["mse"])
tflite_loss = decoder.evaluate(result, test_data)
parent_loss = decoder.evaluate(result_parent, test_data)
model_AE.evaluate(test_data, test_data)
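
The comparison itself is simple array arithmetic. As a minimal sketch (the helper name `compare_outputs` and the toy arrays are assumptions for illustration, not part of the project), the two metrics can be computed like this once both sets of encoder outputs are stacked:

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Return (mean absolute difference, mean squared error) between two arrays."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    diff = reference - candidate
    return float(np.mean(np.abs(diff))), float(np.mean(diff ** 2))


# toy latent vectors standing in for the two encoders' outputs
parent_out = np.array([[0.0, 1.0], [2.0, 3.0]])
tflite_out = np.array([[0.5, 1.0], [2.0, 2.5]])
mae, mse = compare_outputs(parent_out, tflite_out)
print(mae, mse)
```

The shape check is deliberate: NumPy would otherwise broadcast mismatched arrays silently and report a meaningless number.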

**Convert the model to a TFLite Micro model**

Now that we have our full TensorFlow Lite model, let's convert it into a TFLite Micro file so that it can be deployed directly to a microcontroller. You can do this on Linux by running the following commands (the leading `!` runs a shell command from a notebook cell). Make sure you set MODEL_TFLITE and MODEL_TFLITE_MICRO to the tflite file name and the name you want the converted file to have, respectively.

# Install xxd if it is not available
!apt-get update && apt-get -qq install xxd
# Convert to a C source file, i.e, a TensorFlow Lite for Microcontrollers model
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
# Update variable names
REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {MODEL_TFLITE_MICRO}
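
If `xxd` is not available (for example on Windows), the same conversion can be approximated in plain Python. This is a minimal sketch rather than part of the original project: the helper name `to_c_array` is hypothetical, and its output only imitates the general layout that `xxd -i` produces.

```python
def to_c_array(data: bytes, var_name: str = "g_model", per_line: int = 12) -> str:
    """Render bytes as a C source snippet similar to the output of `xxd -i`."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"unsigned int {var_name}_len = {len(data)};\n"
    )


# tiny stand-in payload; in practice read the bytes of your .tflite file
source = to_c_array(bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33]))
print(source)
```

To use it on the real model, read the file with `open(MODEL_TFLITE, "rb").read()`, pass the bytes to `to_c_array`, and write the returned string to `MODEL_TFLITE_MICRO`.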

**Load a TensorFlow nRF project**

This is where our project gets a little specific. Connect your MotionSense device to your computer. You should see a notification that J-Link is connected. Next, install the nRF Connect SDK and get the SEGGER Embedded Studio version that goes with it, following the Linux instructions in this [Nordic guide](https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/gs_installing.html). Once you are in SEGGER Embedded Studio, click on File: Open nRF Connect Project. Next, grab the project from the repository (located in edge-deployment). In the src tab, click on the (...) icon and select the project folder. In the board tab, select nrf5340pdk_nrf5340_cpuapp. This should load the project into the SEGGER IDE.

**Update the model with the newly created one**


Once the project has opened, you should see a file named model.cc. This is the current model that the application is using, and we will need to replace it with our own model. In the project directory, open the CMakeLists.txt file. You should see something like the following:

```cmake
cmake_minimum_required(VERSION 3.13.1)
find_package(Zephyr HINTS $ENV{ZEPHYR_BASE})
project(external_lib)

#target_sources(app PRIVATE src/hello_world_test.cc)
target_sources(app PRIVATE src/main.cc)
target_sources(app PRIVATE src/main_functions.cc)
target_sources(app PRIVATE src/constants.cc)
target_sources(app PRIVATE src/output_handler.cc)
target_sources(app PRIVATE src/Tencoder_data.cc)
target_sources(app PRIVATE src/assert.cc)
target_sources(app PRIVATE src/bluetooth_func.c)
target_sources(app PRIVATE src/sample_test.cc)

zephyr_include_directories(src)


#this file is the target c makelists file for the Ohio State University SENSE lab ble autoencoder.

#zephyr_cc_option(-lstdc++)

# The external static library that we are linking with does not know
# how to build for this platform so we export all the flags used in
# this zephyr build to the external build system.
#
# Other external build systems may be self-contained enough that they
# do not need any build information from zephyr. Or they may be
# incompatible with certain zephyr options and need them to be
# filtered out.

#If nithin plans to work on this code we will need to add some if then statements for
#controlling these marcos.
                # you need to fix the following path before you run the code
set(TF_SRC_DIR  /home/devan/Documents/ncs/nrf/applications/nrf-tensorflow/tensorflow)
#set(TF_SRC_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../tensorflow)    #Somehow this does not work, and I have no idea why CMAKE does not like the relative path.
set(TF_MAKE_DIR ${TF_SRC_DIR}/tensorflow/lite/micro/tools/make)
set(TF_LIB_DIR ${TF_MAKE_DIR}/gen/${TARGET}_${TARGET_ARCH}/lib)
# Create a wrapper CMake library that our app can link with
add_library(tf_lib STATIC IMPORTED GLOBAL)

set_target_properties(tf_lib PROPERTIES IMPORTED_LOCATION ${CMAKE_SOURCE_DIR}/lib/nrf5340_cortex-m33_libtensorflow-microlite.a)

#set_target_properties(tf_lib PROPERTIES INTERFACE_INCLUDE_DIRECTORIES "${TF_SRC_DIR};${TF_SRC_DIR}/tensorflow/lite/micro;${TF_MAKE_DIR}/downloads/flatbuffers/include")

target_link_libraries(app PUBLIC tf_lib)

```

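To link your own model, replace the source file named in the line
`target_sources(app PRIVATE src/Tencoder_data.cc)`
with the file you generated above. The array name inside that file must match the symbol passed to `tflite::GetModel` in main_functions.cc (`tf_lit2_encoder_tflite` in this project).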

**Inspect the code for the tensorflow lite micro model startup**

Inside the Embedded Studio project you will find a file named main_functions.cc. When you run the project on your device, this is where most of the TensorFlow inference work is done.

```cpp
tflite::ErrorReporter *error_reporter = nullptr;
const tflite::Model *model = nullptr;
tflite::MicroInterpreter *interpreter = nullptr;
TfLiteTensor *input = nullptr;
TfLiteTensor *output = nullptr;
int inference_count = 0;
TfLiteIntArray* n_dims;

/*
There is normally code here, but it has been removed because we are
eventually going to replace it with something better.
*/

outputInit();


	static tflite::MicroErrorReporter micro_error_reporter;
	error_reporter = &micro_error_reporter;

	// we are creating the model here. This doesn't involve any
	// copying or parsing, it's a very lightweight operation.
	model = tflite::GetModel(tf_lit2_encoder_tflite);

	if (model->version() != TFLITE_SCHEMA_VERSION) {
		TF_LITE_REPORT_ERROR(
			error_reporter,
			"Model provided is schema version %d not equal "
			"to supported version %d.",
			model->version(), TFLITE_SCHEMA_VERSION);
		return;
	}

	/* This pulls in all the operation implementations we need. However, we might change
       this later on, as we only really need the dense and convolution ops for the autoencoder.
       But if we want models to be swappable at runtime, it should stay this way. */

	// NOLINTNEXTLINE(runtime-global-variables)
	static tflite::AllOpsResolver resolver;

	//This creates our interpreter that runs the (soon to be) auto encoder model.
	static tflite::MicroInterpreter static_interpreter(model, resolver,
							   tensor_arena,
							   kTensorArenaSize,
							   error_reporter);
	interpreter = &static_interpreter;

	// Allocate memory from the tensor_arena for the model's tensors.
	TfLiteStatus allocate_status = interpreter->AllocateTensors();

	if (allocate_status != kTfLiteOk) {
		TF_LITE_REPORT_ERROR(error_reporter,
				     "AllocateTensors() failed");


		return;
	}
```

Let's walk through this code. First, we set up all of our variables. The main objects are the error reporter and the interpreter, which are declared `static` so that they stay in memory for the lifetime of the program rather than being allocated on the heap. The error reporter is passed to the interpreter, which uses it to report any problems that occur while setting up or running inference. Then our model object is created, which is really just the model we imported. It contains all the weights and biases from our parent model, as well as information about the layers, all encoded in the C byte array that you can see in your generated TFLite Micro file. At runtime, the interpreter reads these bytes, and the `AllocateTensors()` method reserves space in `tensor_arena` for the model's tensors. With this, we are ready to use the model.

**Verify the dimensions for the TensorFlow Lite Micro inputs and outputs**

When you first run the project with SEGGER Embedded Studio, you should see two print statements. These verify that the model is working properly. The second statement prints the dimensions of the input and output tensors. Verify that these match the model you imported. Also check that they match the code that fills the input tensor. The snippet looks like this:

```cpp
//get ready to load the test ppg data into the input tensor
        for (int i = 0; i<400; i++){
                input->data.f[i] = sample_data[i];
        }
        float test_bounds = input->data.f[399];
```

The loop bound (400 here) should match the number in the input tensor print statement.
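
To know what to expect on the device, you can print the same shapes on the desktop side first. The helper below is a hypothetical sketch: it formats the list of dictionaries returned by the Python interpreter's `get_input_details()` or `get_output_details()`, and the stand-in dictionary only mimics the keys used here.

```python
def describe_tensors(details):
    """Format name, shape and dtype for each entry of a tensor-details list."""
    lines = []
    for d in details:
        shape = tuple(int(n) for n in d["shape"])
        dtype = getattr(d["dtype"], "__name__", str(d["dtype"]))
        lines.append(f"{d['name']}: shape={shape}, dtype={dtype}")
    return lines


# stand-in for interpreter.get_input_details(); values are illustrative
fake_details = [{"name": "input_1", "shape": [1, 400], "dtype": float}]
summary = describe_tensors(fake_details)
print("\n".join(summary))
```

With the real model you would call `describe_tensors(test_tf_lite_model.get_input_details())` and compare the printed shape with the one reported by the firmware.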

**Receiving the setup data on target device**
