<a href="https://colab.research.google.com/github/SebastianSaldarriagaC1/os-final-project-tinyml/blob/main/TinyML03_Model_validation_with_dirty_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Set Up the Environment

In [1]:
!pip install tflite-runtime

Collecting tflite-runtime
  Downloading tflite_runtime-2.14.0-cp310-cp310-manylinux2014_x86_64.whl (2.4 MB)
     2.4/2.4 MB 9.6 MB/s eta 0:00:00
Installing collected packages: tflite-runtime
Successfully installed tflite-runtime-2.14.0


In [2]:
!pip install gdown
import gdown



## Upload and Load Model

In this step, we upload the pre-trained TensorFlow Lite model (model.tflite) to our Colab environment.

In [3]:
from google.colab import files

# Assign the result so the raw model bytes are not echoed as cell output
uploaded = files.upload()

Saving model.tflite to model.tflite

Next, we download the dataset from Google Drive with gdown and load it into a pandas DataFrame. It contains the temperature records in which we will look for anomalies.

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
url = 'https://drive.google.com/uc?id=1lJWy8niBfia6uacFvtpPKLVQpn-zKQmN'
output = 'dirty-earth-surface-temperature-data.csv'
gdown.download(url, output, quiet=False)

df = pd.read_csv(output)

Downloading...
From (original): https://drive.google.com/uc?id=1lJWy8niBfia6uacFvtpPKLVQpn-zKQmN
From (redirected): https://drive.google.com/uc?id=1lJWy8niBfia6uacFvtpPKLVQpn-zKQmN&confirm=t&uuid=b08ed309-f3f2-4660-b1ee-a9a5cebaacc3
To: /content/dirty-earth-surface-temperature-data.csv
100%|██████████| 202M/202M [00:01<00:00, 104MB/s]


In [6]:
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
from tflite_runtime.interpreter import Interpreter

## Data Normalization

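We select the features the model expects and standardize them with StandardScaler, mirroring the preprocessing used during training. Note that fit_transform refits the scaler on this dataset, so the mean and scale applied here may differ from the ones used at training time.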

In [7]:

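# Select the model's input features
features = df[['AverageTemperature', 'Latitude', 'Longitude', 'Month', 'Year']]

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Inspect the first five scaled rows
print(scaled_features[:5])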


[[-1.16890645  1.4377177  -0.29439889  1.31217702 -3.41410721]
 [-0.70740105  1.4377177  -0.29439889 -0.45181951 -3.39542581]
 [-0.36379333  1.4377177  -0.29439889 -0.15782009 -3.39542581]
 [-0.15895996  1.4377177  -0.29439889  0.13617933 -3.39542581]
 [-0.49187721  1.4377177  -0.29439889  0.72417818 -3.39542581]]


In [8]:
print(scaler.mean_)

[  17.65814788   24.08938664   32.08945356    6.53680408 1925.7543933 ]
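
One way to keep the scaling identical to training is to apply stored statistics directly instead of refitting the scaler. The following is a minimal sketch, assuming the training-time mean and standard deviation were saved somewhere; the helper name standardize and the toy values are hypothetical and do not come from this notebook.

```python
import numpy as np

def standardize(x, mean, scale):
    """Apply z-score scaling with precomputed statistics: z = (x - mean) / scale."""
    x = np.asarray(x, dtype=np.float64)
    return (x - np.asarray(mean)) / np.asarray(scale)

# Toy "training" data (hypothetical values, not taken from the dataset)
train = np.array([[10.0, 1.0],
                  [20.0, 3.0],
                  [30.0, 5.0]])

# Statistics that would be saved at training time.
# np.std defaults to ddof=0, the same convention StandardScaler uses.
train_mean = train.mean(axis=0)
train_scale = train.std(axis=0)

# New rows are scaled with the stored statistics, not with their own
new_rows = np.array([[20.0, 3.0],
                     [40.0, 7.0]])
scaled = standardize(new_rows, train_mean, train_scale)
print(scaled)
```

With a fitted StandardScaler, the same values are available as scaler.mean_ and scaler.scale_, so they can be exported after training and reused at validation time.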


# Predict and Detect Anomalies

We define a function that runs a single sample through the TensorFlow Lite model and use it to reconstruct every row of the dataset. For each sample we compute the mean squared error (MSE) between the scaled input and its reconstruction, then set the threshold at the 95th percentile of those errors. Samples whose MSE exceeds the threshold are flagged as anomalies, and their indices are printed.

In [9]:

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run a single sample through the model and return its output
def predict(sample):
    # Cast to float32 and add a batch dimension: (n_features,) -> (1, n_features)
    sample = np.array(sample, dtype=np.float32)
    sample = np.expand_dims(sample, axis=0)

    # Set input data
    interpreter.set_tensor(input_details[0]['index'], sample)

    # Execute model
    interpreter.invoke()

    # Get prediction
    output_data = interpreter.get_tensor(output_details[0]['index'])
    return output_data


In [19]:
predictions = []
for sample in scaled_features:
    predictions.append(predict(sample))

predictions = np.array(predictions).squeeze()

mse = np.mean(np.power(scaled_features - predictions, 2), axis=1)

threshold = np.percentile(mse, 95)

anomalies = mse > threshold

anomaly_indices = np.where(anomalies)[0]

print(f'Number of anomalies detected: {len(anomaly_indices)}')
print('Indices of detected anomalies:')
print(anomaly_indices)
print(f'Threshold: {threshold}')

Number of anomalies detected: 357454
Indices of detected anomalies:
[      0       1       2 ... 7147246 7148188 7148464]
Threshold: 1.6664290202151997
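
The thresholding logic can be tried in isolation on synthetic data. This is a small sketch under stated assumptions: the function name flag_anomalies, the random toy inputs, and the two deliberately corrupted rows are all hypothetical, and the "reconstructions" are simulated rather than produced by the model.

```python
import numpy as np

def flag_anomalies(inputs, reconstructions, percentile=95.0):
    """Return per-sample MSE, the percentile threshold, and a boolean anomaly mask."""
    inputs = np.asarray(inputs, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    # Mean squared reconstruction error per row
    mse = np.mean((inputs - reconstructions) ** 2, axis=1)
    threshold = np.percentile(mse, percentile)
    return mse, threshold, mse > threshold

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))

# Simulated reconstructions: nearly perfect everywhere...
recon = x + rng.normal(scale=0.01, size=x.shape)
# ...except two rows that are reconstructed badly on purpose
recon[[3, 42]] += 5.0

mse, threshold, mask = flag_anomalies(x, recon)
print(f'Flagged rows: {np.where(mask)[0]}')
```

A percentile threshold flags a fixed share of the samples by construction, so about 5% of the rows are reported regardless of how clean the data is. The ranking of the errors is what carries information: the two corrupted rows end up among the flagged ones because their MSE is far larger than the rest.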


# Convert TFLite to C Array

In [None]:
def tflite_to_c_array(tflite_model_path, c_array_path):
    # Read the TensorFlow Lite model file
    with open(tflite_model_path, 'rb') as f:
        tflite_model = f.read()

    # Create the C array header content
    c_array_content = f'const unsigned char model[] = {{\n'
    c_array_content += ',\n'.join('  ' + ', '.join(f'0x{byte:02x}' for byte in tflite_model[i:i+12])
                                  for i in range(0, len(tflite_model), 12))
    c_array_content += f'\n}};\nconst int model_len = {len(tflite_model)};'

    # Save the C array to a file
    with open(c_array_path, 'w') as f:
        f.write(c_array_content)

In [None]:
tflite_to_c_array('model.tflite', 'temperature_model.h')
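
Before flashing the header to a device, it can be worth checking that the array round-trips back to the original bytes. The sketch below parses a header written in the `0x..` style used above; the helper names c_array_to_bytes and declared_length are hypothetical, and the sample header holds only a few stand-in bytes rather than a real model.

```python
import re

def c_array_to_bytes(header_text):
    """Recover the raw bytes from the initializer list between '{' and '}'."""
    body = header_text[header_text.index('{') + 1:header_text.index('}')]
    return bytes(int(tok, 16) for tok in re.findall(r'0x[0-9a-fA-F]{2}', body))

def declared_length(header_text):
    """Read the value assigned to model_len, or None if it is missing."""
    match = re.search(r'model_len\s*=\s*(\d+)', header_text)
    return int(match.group(1)) if match else None

# Tiny stand-in for a generated header (hypothetical bytes, not a real model)
sample_header = (
    "const unsigned char model[] = {\n"
    "  0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33\n"
    "};\n"
    "const int model_len = 8;"
)

recovered = c_array_to_bytes(sample_header)
print(len(recovered), declared_length(sample_header))
```

In the notebook, the same check would read temperature_model.h as text and compare the recovered bytes with the contents of model.tflite opened in binary mode.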