Reference websites:
* https://www.hackster.io/news/easy-tinyml-on-esp32-and-arduino-a9dbc509f26c
* https://github.com/eloquentarduino/EloquentTinyML
* https://github.com/atomic14/tensorflow-lite-esp32
* https://github.com/eloquentarduino/tinymlgen
* https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization
* https://medium.com/mlearning-ai/optimizing-tflite-models-for-on-edge-machine-learning-for-efficiency-a-comparison-of-quantization-2c0123959cb6

Links to check out:
* https://www.tensorflow.org/model_optimization

#### Code below is used to generate the FFT TFLite Model 
* Healthy data: Own dataset
* Unhealthy data: Online dataset

1. Loading of dataset

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import glob
import os

# Selecting the relevant data

vibe_col = [1,2,3]
healthy_vert = 'C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/healthy/second-batch-27-3-2023-cleaned/vertical-cleaned/OneKhz2023-03-24t0*.csv'
healthy_hori = 'C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/healthy/second-batch-27-3-2023-cleaned/horizontal-cleaned/OneKhz2023-03-24t0*.csv'
unhealthy_vert = 'C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/unhealthy/loose-base/first-batch-22-3-2023/vibration/vertical-cleaned/OneKhz2023-03-23t0*.csv'
unhealthy_hori = 'C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/unhealthy/loose-base/first-batch-22-3-2023/vibration/horizontal-cleaned/OneKhz2023-03-23t0*.csv'

# Loading the data
def dataReader(datapath):
    data_n = pd.DataFrame()
    for file in glob.glob(datapath):
        df = pd.read_csv(file, usecols=['x', 'y', 'z'])

        data_n = pd.concat([data_n, df], axis=0)
        print("done with file: " + file)

    return data_n

data_healthy_vert = dataReader(healthy_vert)
data_healthy_hori = dataReader(healthy_hori)
data_unhealthy_vert = dataReader(unhealthy_vert)
data_unhealthy_hori = dataReader(unhealthy_hori)

print("starting size: ")
print(data_healthy_vert.shape)
print(data_healthy_hori.shape)
print(data_unhealthy_vert.shape)
print(data_unhealthy_hori.shape)

done with file: C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/healthy/second-batch-27-3-2023-cleaned/vertical-cleaned\OneKhz2023-03-24t00-00-00.csv
done with file: C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/healthy/second-batch-27-3-2023-cleaned/vertical-cleaned\OneKhz2023-03-24t00-30-00.csv
done with file: C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/healthy/second-batch-27-3-2023-cleaned/vertical-cleaned\OneKhz2023-03-24t01-00-00.csv
done with file: C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/healthy/second-batch-27-3-2023-cleaned/vertical-cleaned\OneKhz2023-03-24t01-30-00.csv
done with file: C:/Users/jared/OneDrive - National University of Singapore/Y2/S2/EG3301R/DataCollection/healthy/second-batch-27-3-2023-cleaned/vertical-cleaned\OneKhz2023-03-24t02-00-00.csv
done with file: C:/Users/jared/OneDrive - National

In [2]:
# Normalise the data
def normalise(df):
    df_normalized = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)
    return df_normalized

data_healthy_hori_norm = normalise(data_healthy_hori)
data_healthy_vert_norm = normalise(data_healthy_vert)
data_unhealthy_hori_norm = normalise(data_unhealthy_hori)
data_unhealthy_vert_norm = normalise(data_unhealthy_vert)

2. Checking if data is loaded in properly

In [3]:
print(data_healthy_hori_norm.info())
print(data_healthy_vert_norm.info())
print(data_unhealthy_hori_norm.info())
print(data_unhealthy_vert_norm.info())

print(data_healthy_hori_norm.head())
print(data_healthy_vert_norm.head())
print(data_unhealthy_hori_norm.head())
print(data_unhealthy_vert_norm.head())

<class 'pandas.core.frame.DataFrame'>
Index: 4900690 entries, 0 to 245128
Data columns (total 3 columns):
 #   Column  Dtype  
---  ------  -----  
 0   x       float64
 1   y       float64
 2   z       float64
dtypes: float64(3)
memory usage: 149.6 MB
None
<class 'pandas.core.frame.DataFrame'>
Index: 4917380 entries, 0 to 245073
Data columns (total 3 columns):
 #   Column  Dtype  
---  ------  -----  
 0   x       float64
 1   y       float64
 2   z       float64
dtypes: float64(3)
memory usage: 150.1 MB
None
<class 'pandas.core.frame.DataFrame'>
Index: 4943184 entries, 0 to 243985
Data columns (total 3 columns):
 #   Column  Dtype  
---  ------  -----  
 0   x       float64
 1   y       float64
 2   z       float64
dtypes: float64(3)
memory usage: 150.9 MB
None
<class 'pandas.core.frame.DataFrame'>
Index: 4940044 entries, 0 to 244150
Data columns (total 3 columns):
 #   Column  Dtype  
---  ------  -----  
 0   x       float64
 1   y       float64
 2   z       float64
dtypes: float64

3. Downsampling to reduce size

In [38]:
def downSampler(data, a, b):
    """
    data = data
    a = start index
    b = sampling rate
    """
    x = b
    downsampled_data = [data.iloc[a:b,:].sum()/x for i in range(int(len(data)/x))]
    return pd.DataFrame(downsampled_data)

# Create donwsampled datasets
data_healthy_hori_norm_downsampled = downSampler(data_healthy_hori_norm, 0, 500)
data_healthy_vert_norm_downsampled = downSampler(data_healthy_vert_norm, 0, 500)
data_unhealthy_hori_norm_downsampled = downSampler(data_unhealthy_hori_norm, 0, 500)
data_unhealthy_vert_norm_downsampled = downSampler(data_unhealthy_vert_norm, 0, 500)

4. Checking that data is downsampled properly

In [39]:
print(data_healthy_hori_norm_downsampled.shape)
print(data_healthy_vert_norm_downsampled.shape)
print(data_unhealthy_hori_norm_downsampled.shape)
print(data_unhealthy_vert_norm_downsampled.shape)

(9801, 3)
(9834, 3)
(9886, 3)
(9880, 3)


5. Data processing. FFTConolve method is used here

In [40]:
from scipy import signal
def FFTConvolve(data):
    autocorr = signal.fftconvolve(data,data[::-1],mode='full')
    return pd.DataFrame(autocorr)

# Create FFTConvolved datasets
data_healthy_hori_norm_fftconvole = FFTConvolve(data_healthy_hori_norm_downsampled)
data_healthy_vert_norm_fftconvole = FFTConvolve(data_healthy_vert_norm_downsampled)
data_unhealthy_hori_norm_fftconvole = FFTConvolve(data_unhealthy_hori_norm_downsampled)
data_unhealthy_vert_norm_fftconvole = FFTConvolve(data_unhealthy_vert_norm_downsampled)

6. Checking that the data processing step is done correctly

In [41]:
print(data_healthy_hori_norm_fftconvole.shape)
print(data_healthy_vert_norm_fftconvole.shape)
print(data_unhealthy_hori_norm_fftconvole.shape)
print(data_unhealthy_vert_norm_fftconvole.shape)

(19601, 5)
(19667, 5)
(19771, 5)
(19759, 5)


7. Data labelling

In [42]:
# Setting up labels for horiztonal and vertical data
y_hori_healthy = pd.DataFrame(np.zeros(int(len(data_healthy_hori_norm_fftconvole)),dtype=int))
y_hori_unhealthy = pd.DataFrame(np.ones(int(len(data_unhealthy_hori_norm_fftconvole)),dtype=int))
y_vert_healthy = pd.DataFrame(np.zeros(int(len(data_healthy_vert_norm_fftconvole)),dtype=int))
y_vert_unhealthy = pd.DataFrame(np.ones(int(len(data_unhealthy_vert_norm_fftconvole)),dtype=int))

y_hori = pd.concat([y_hori_healthy, y_hori_unhealthy],axis=0)
y_vert = pd.concat([y_vert_healthy, y_vert_unhealthy],axis=0)

print(y_hori.head())
print(y_hori.tail())
print(y_vert.head())
print(y_vert.tail())

   0
0  0
1  0
2  0
3  0
4  0
       0
19766  1
19767  1
19768  1
19769  1
19770  1
   0
0  0
1  0
2  0
3  0
4  0
       0
19754  1
19755  1
19756  1
19757  1
19758  1


8. Preparing data to train model

In [43]:
x_hori = pd.concat([data_healthy_hori_norm_fftconvole, data_unhealthy_hori_norm_fftconvole], ignore_index=True) # Concatenate all the data
x_vert = pd.concat([data_healthy_vert_norm_fftconvole, data_unhealthy_vert_norm_fftconvole], ignore_index=True) # Concatenate all the data

In [44]:
x_hori # Check if data is concatenated correctly

Unnamed: 0,0,1,2,3,4
0,0.002180,0.000712,-0.001461,-0.000248,0.000265
1,0.004359,0.001425,-0.002921,-0.000497,0.000529
2,0.006539,0.002137,-0.004382,-0.000745,0.000794
3,0.008718,0.002850,-0.005843,-0.000993,0.001059
4,0.010898,0.003562,-0.007304,-0.001241,0.001323
...,...,...,...,...,...
39367,0.006315,0.010130,0.005048,0.000791,0.000039
39368,0.005052,0.008104,0.004039,0.000633,0.000031
39369,0.003789,0.006078,0.003029,0.000475,0.000023
39370,0.002526,0.004052,0.002019,0.000316,0.000015


In [45]:
x_vert # Check if data is concatenated correctly

Unnamed: 0,0,1,2,3,4
0,0.000356,-0.000585,-0.000008,0.000204,0.000043
1,0.000711,-0.001169,-0.000016,0.000408,0.000087
2,0.001067,-0.001754,-0.000023,0.000612,0.000130
3,0.001422,-0.002339,-0.000031,0.000816,0.000173
4,0.001778,-0.002923,-0.000039,0.001020,0.000216
...,...,...,...,...,...
39421,0.004556,0.003275,0.015486,0.005355,0.012177
39422,0.003645,0.002620,0.012389,0.004284,0.009742
39423,0.002734,0.001965,0.009292,0.003213,0.007306
39424,0.001823,0.001310,0.006194,0.002142,0.004871


9. Splitting the data

In [46]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
x_hori_train, x_hori_test, y_hori_train, y_hori_test = train_test_split(x_hori, y_hori, test_size=0.25, shuffle=True)

x_vert_train, x_vert_test, y_vert_train, y_vert_test = train_test_split(x_vert, y_vert, test_size=0.25, shuffle=True)

10. Training of model

In [47]:
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import Sequential

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True, mode="auto")

def representative_dataset_hori():
    for val in x_hori_test:
        yield [np.array(val, dtype=np.float32)]

def representative_dataset_vert():
    for val in x_vert_test:
        yield [np.array(val, dtype=np.float32)]

def get_model(x_train, y_train, epochs=10, validation_split=0.2, batch_size=100):
    model = Sequential()
    model.add(Dense(32, activation='relu', input_shape=(5,)))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(2, activation='softmax')) # Output layer needs to correspond to the number of classes for softmax
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=epochs, validation_split=validation_split, batch_size=batch_size, callbacks=[early_stopping])
    model.summary()
    return model

FFTmodel_hori = get_model(x_hori_train, y_hori_train, epochs=50, validation_split=0.2, batch_size=100)
converter = tf.lite.TFLiteConverter.from_keras_model(FFTmodel_hori)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_hori
tflite_model_hori = converter.convert()

# Save the model to disk
open("FFT_model_hori_quantized.tflite", "wb").write(tflite_model_hori)

FFTmodel_vert = get_model(x_vert_train, y_vert_train, epochs=50, validation_split=0.2, batch_size=100)
converter = tf.lite.TFLiteConverter.from_keras_model(FFTmodel_vert)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_vert
tflite_model_vert = converter.convert()

# Save the model to disk
open("FFT_model_vert_quantized.tflite", "wb").write(tflite_model_vert)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_36 (Dense)            (None, 32)                192       
                                                                 
 dense_37 (Dense)            (None, 64)                2112      
                                                                 
 dense_38 (Dense)            (None, 32)                2080      
         

INFO:tensorflow:Assets written to: C:\Users\jared\AppData\Local\Temp\tmpc263hret\assets


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_40 (Dense)            (None, 32)                192       
                                                                 
 dense_41 (Dense)            (None, 64)                2112      
                                                                 
 dense_42 (Dense)            (None, 32)                2080      
                                                                 
 dense_43 (Dense)            (None, 2)                 66        
                                                                 
Total params: 4450 (17.38 KB)
Trainable params: 4450 (17.38 KB)
Non-trainable params: 0 (0.00 Byte)
__________

INFO:tensorflow:Assets written to: C:\Users\jared\AppData\Local\Temp\tmpx1i4mon0\assets


20404

11. Converting of model to C array (Run below line on bash)

`xxd -i FFT_model_fullint_quantized.tflite > FFT_model_fullint_quantized.cc`

#### Evaluating the TFLite model

In [48]:
# Testing the baseline model on the test dataset.

# Evaluating the model on the test dataset.
_, baseline_hori_model_accuracy = FFTmodel_hori.evaluate(x=x_hori_test, y=y_hori_test, verbose=0)

# Printing the baseline test accuracy in percentage.
print('The Baseline test accuracy:', baseline_hori_model_accuracy * 100)

# Evaluating the model on the test dataset.
_, baseline_vert_model_accuracy = FFTmodel_vert.evaluate(x=x_vert_test, y=y_vert_test, verbose=0)

# Printing the baseline test accuracy in percentage.
print('The Baseline test accuracy:', baseline_vert_model_accuracy * 100)

The Baseline test accuracy: 100.0
The Baseline test accuracy: 99.95942115783691


In [50]:
# A helper function to evaluate the TF Lite model using "test" dataset.
def evaluate_model(interpreter, x_test, y_test):
    # Get input and output tensors.
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    input_shape = input_details[0]['shape']
    num_test_samples = len(x_test)

    # Run predictions on every set in the "test" dataset.
    prediction_y = []
    for i in range(num_test_samples):

        # Pre-processing the data to fit it with the model's input.
        input_data = np.array(x_test.iloc[i,:], dtype=np.float32)
        input_data = np.expand_dims(input_data, axis=0)
        interpreter.set_tensor(input_details[0]['index'], input_data)

        # Run inference.
        interpreter.invoke()

        # Post-processing: remove batch dimension and find the digit with highest
        # probability.
        output_data = interpreter.get_tensor(output_details[0]['index'])
        prediction_y.append(output_data.argmax())

    # Compare prediction results with ground truth labels to calculate accuracy.
    accurate_count = 0
    for index in range(len(prediction_y)):
        if prediction_y[index] == y_test.iloc[index][0]:
            accurate_count += 1
    accuracy = accurate_count * 1.0 / len(prediction_y)

    return accuracy


# Passing the FP-16 TF Lite model to the interpreter.
interpreter = tf.lite.Interpreter('FFT_model_hori_quantized.tflite')

# Allocating tensors.
interpreter.allocate_tensors()

# Evaluating the model on the test dataset.
test_accuracy_hori = evaluate_model(interpreter, x_hori_test, y_hori_test)

# Printing the test accuracy for the FP-16 quantized TFLite model and the baseline Keras model.
print('Hori Quantized TFLite Model Test Accuracy:', test_accuracy_hori*100)

# Printing the test accuracy for the baseline Keras model.
print('Baseline Hori Keras Model Test Accuracy:', baseline_hori_model_accuracy*100)

# Testing the full integer quantized model on the test dataset.

# Passing the full integer quantized TF Lite model to the interpreter.
interpreter = tf.lite.Interpreter('FFT_model_vert_quantized.tflite')

# Allocating tensors.
interpreter.allocate_tensors()

# Evaluating the model on the test dataset.
test_accuracy_vert = evaluate_model(interpreter, x_vert_test, y_vert_test)

# Printing the test accuracy for the full integer quantized TFLite model and the baseline Keras model.
print('Vert Quantized TFLite Model Test Accuracy:', test_accuracy_vert*100)

# Printing the test accuracy for the baseline Keras model.
print('Baseline Vert Keras Model Test Accuracy:', baseline_vert_model_accuracy*100)

Hori Quantized TFLite Model Test Accuracy: 100.0
Baseline Hori Keras Model Test Accuracy: 100.0
Vert Quantized TFLite Model Test Accuracy: 99.9594197017348
Baseline Vert Keras Model Test Accuracy: 99.95942115783691


Current to-dos:
* Find way to build FFTConvolve model without overfitting
* Find way to do hyperparameters testing for models
* Create template to test TF model and TFLite models quickly