bharathsudharsan/Tiny-Impute

Tiny-Impute: On-device Hybrid Anomaly Detection and Data Imputation

Imputation Algorithms

Summary of the three hybrid anomaly detection and data imputation algorithms:

Moving Average with Simple Linear Regression (MA-SLR)

This algorithm is designed for MCUs and small CPU devices (such as Arduino boards) and takes their hardware limitations into account. It is a hybrid system that combines moving averages with Z-score thresholding to identify and remove anomalous data points in a dataset, then fills the resulting gaps using a modified linear regression method [Code for IoT Boards][Code for PC and RPi].
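As a rough illustration of the idea described above, the following minimal sketch flags points whose deviation from a trailing moving average has a large Z-score, then fills them with an ordinary least-squares line fitted to the remaining points. It is not the repository's implementation: the function names, the trailing window, the default thresholds, and the use of plain (unmodified) linear regression are all assumptions made for this example. It uses only built-in Python so it should also run under MicroPython or CircuitPython, and it assumes at least two non-flagged points remain.

```python
def moving_average(values, window):
    """Trailing moving average; early points use a shorter window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def flag_anomalies(values, window=5, z_threshold=2.0):
    """Return indices whose residual from the moving average has |z| > z_threshold."""
    averages = moving_average(values, window)
    residuals = [v - m for v, m in zip(values, averages)]
    mean = sum(residuals) / len(residuals)
    std = (sum((r - mean) ** 2 for r in residuals) / len(residuals)) ** 0.5
    if std == 0:
        return []  # all residuals identical, nothing stands out
    return [i for i, r in enumerate(residuals)
            if abs((r - mean) / std) > z_threshold]


def impute_linear(values, missing):
    """Fit y = a + b*x to the non-missing points and fill the missing indices."""
    missing = set(missing)
    xs = [i for i in range(len(values)) if i not in missing]
    ys = [values[i] for i in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [intercept + slope * i if i in missing else values[i]
            for i in range(len(values))]


# Hypothetical sensor trace with one spike at index 4.
readings = [10.0, 10.5, 11.0, 11.5, 50.0, 12.5, 13.0, 13.5, 14.0, 14.5]
flagged = flag_anomalies(readings, window=3, z_threshold=2.0)
cleaned = impute_linear(readings, flagged)
print(flagged, cleaned[4])
```

In this made-up trace only the spike at index 4 exceeds the threshold, and because the remaining points lie on a straight line the imputed value is close to 12.0. A trailing window is used here because it needs no future samples, which suits streaming data on a small board.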

K-Nearest Neighbors with Expectation-Maximization (KNN-EM)

This algorithm is designed for edge devices (such as gateways, AIoT boards, and SBCs) with more processing power and memory than MCUs. It combines our highly optimized unsupervised K-Nearest Neighbors (KNN) for anomaly detection with Expectation-Maximization (EM) for data imputation [Code for IoT Boards][Code for PC and RPi].
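To make the two stages concrete, here is a small, simplified sketch under stated assumptions; it is not the optimized code from this repository. The anomaly stage scores each point by its mean distance to its k nearest neighbors and flags scores far above the middle score. The imputation stage is an EM-style loop for a two-column case in which x is fully observed and y has gaps (marked `None`), modelled as a bivariate Gaussian. All function names, the `factor` rule, and the two-column restriction are illustrative choices, and at least two points are assumed.

```python
def knn_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(points) if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def flag_knn_anomalies(points, k=3, factor=3.0):
    """Flag points whose score exceeds `factor` times the middle (median-like) score."""
    scores = knn_scores(points, k)
    middle = sorted(scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if s > factor * middle]


def em_impute(xs, ys, max_iter=200, tol=1e-9):
    """Fill None entries in ys by alternating parameter and value updates."""
    observed = [y for y in ys if y is not None]
    start = sum(observed) / len(observed)
    filled = [start if y is None else y for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    for _ in range(max_iter):
        # M-step: re-estimate the mean of y and the x-y covariance from completed data.
        mean_y = sum(filled) / n
        cov_xy = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(xs, filled)) / n
        slope = cov_xy / var_x if var_x else 0.0
        # E-step: replace each missing y with its conditional mean given x.
        updated = [mean_y + slope * (x - mean_x) if y is None else y
                   for x, y in zip(xs, ys)]
        change = max(abs(a - b) for a, b in zip(updated, filled))
        filled = updated
        if change < tol:
            break
    return filled


# Hypothetical data: one far-away point, and one missing y value.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (8.0, 8.0)]
print(flag_knn_anomalies(points, k=2))
print(em_impute([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, None]))
```

In the example, the point at (8.0, 8.0) is the only one flagged, and the missing y converges toward 10.0 because the observed pairs follow y = 2x. The brute-force neighbor search is quadratic in the number of points, which is acceptable for the small sample files used here but would need an optimized search on larger data.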

Optimized Laplacian Convolutional Representation (LCR-Opt)

Here, we heavily modified and optimized LCR, a top-performing but resource-intensive method that imputes missing data using a low-rank approximation model complemented by regularization techniques [Code for IoT Boards][Code for PC and RPi].
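The sketch below illustrates only the general idea of low-rank imputation, not LCR or LCR-Opt themselves: it repeatedly replaces the missing entries of a matrix with the values of its best rank-1 approximation, computed by power iteration. The Laplacian and convolutional parts and all regularization are left out, and the function names, the rank-1 restriction, and the iteration counts are assumptions for this example. The power iteration starts from an all-ones vector, which assumes the data is not orthogonal to that vector (true for positive sensor readings).

```python
def rank1_approx(m, steps=20):
    """Best rank-1 approximation u * v^T of matrix m via power iteration."""
    rows, cols = len(m), len(m[0])
    v = [1.0] * cols
    u = [0.0] * rows
    for _ in range(steps):
        u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm = sum(x * x for x in u) ** 0.5 or 1.0
        u = [x / norm for x in u]          # unit left singular vector estimate
        v = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]


def low_rank_impute(matrix, iterations=300):
    """Fill None entries by iterating: approximate with rank 1, copy into the gaps."""
    rows, cols = len(matrix), len(matrix[0])
    missing = [(i, j) for i in range(rows) for j in range(cols)
               if matrix[i][j] is None]
    filled = [row[:] for row in matrix]
    # Start from the mean of the observed values in each column.
    for j in range(cols):
        observed = [matrix[i][j] for i in range(rows)
                    if matrix[i][j] is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        for i in range(rows):
            if filled[i][j] is None:
                filled[i][j] = mean
    for _ in range(iterations):
        approx = rank1_approx(filled)
        for i, j in missing:
            filled[i][j] = approx[i][j]    # observed entries are never changed
    return filled


# Hypothetical rank-1 table (each row is a multiple of [1, 2, 3]) with one gap.
table = [[1.0, 2.0, 3.0],
         [2.0, 4.0, 6.0],
         [3.0, 6.0, None]]
print(low_rank_impute(table)[2][2])
```

For this made-up table the filled value approaches 9.0, the entry that keeps the matrix rank 1. Real sensor data is rarely exactly low rank, which is one reason methods such as LCR add regularization on top of the low-rank model.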

Test Datasets

Datasets used to test the Tiny-Impute algorithms (MA-SLR, KNN-EM, LCR-Opt):

  • Gesture Phase Segmentation: The dataset is composed of features extracted from 7 videos of people gesticulating. It contains 50 attributes divided into two files for each video [Original Dataset] [Test Samples]

  • Iris Flowers: A small, classic dataset that is widely used for evaluating classification methods [Original Dataset] [Test Samples]

  • Mammographic Mass: Discrimination of benign and malignant mammographic masses based on BI-RADS attributes and the patient's age [Original Dataset] [Test Samples]

  • Daily and Sports Activities: The dataset comprises motion sensor data of 19 daily and sports activities, each performed by 8 subjects in their own style for 5 minutes [Original Dataset] [Test Samples]

  • Urban Observatory - CO: Carbon Monoxide (CO) data taken from the Urban Observatory, Newcastle University [Original Dataset]

IoT Boards

The IoT boards used to test the 3 imputation algorithms on the 5 test datasets:

  • Arduino MKR1000: [CPU] SAMD21 Cortex-M0+ 48MHz. [Memory] Flash 256KB, SRAM 32KB [Board]

  • ESP32 Dev Kit: [CPU] Xtensa LX6 240MHz. [Memory] Flash 4MB, SRAM 520KB [Board]

  • Raspberry Pi 4 Model B: [CPU] Cortex-A72 1.8GHz. [Memory] microSD 16GB, SDRAM 4GB [Board]

Imputation Experiments

CircuitPython & MicroPython - IoT Boards

Set up the IoT board by installing the appropriate Python implementation, following the [CircuitPython] or [MicroPython] instructions. For an easier experience coding and running the repo on MCUs, install and use the Thonny IDE.

To run the experiments on an IoT board, clone this repo, copy the dataset sample (.csv file) to the board's memory, reference the same file name in the code, then run the algorithm's .py file on the board.
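As an illustration of reading a copied sample file on the board, the sketch below parses a CSV without the `csv` module, which is typically not bundled with MicroPython or CircuitPython builds. The helper names and the file name `sample.csv` are hypothetical, and the scripts in this repo may load data differently. Any cell that cannot be converted to a number, including an empty cell or a text label, is treated as missing (`None`), which is an assumption of this example.

```python
def parse_csv_lines(lines, has_header=True):
    """Turn CSV text lines into rows of floats, using None for non-numeric cells."""
    rows = []
    header_pending = has_header
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if header_pending:
            header_pending = False        # skip the first non-blank line
            continue
        row = []
        for cell in line.split(","):
            try:
                row.append(float(cell.strip()))
            except ValueError:
                row.append(None)          # empty or non-numeric cell
        rows.append(row)
    return rows


def load_csv(path, has_header=True):
    """Read a CSV file stored on the board's filesystem."""
    with open(path) as f:
        return parse_csv_lines(f, has_header)


# On the board, after copying the file (hypothetical name):
# data = load_csv("sample.csv")
print(parse_csv_lines(["a,b\n", "1.0,2.5\n", "3.0,\n"]))
```

Splitting on commas is enough for plain numeric files like the test samples, but it does not handle quoted fields that contain commas.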

Jupyter Notebooks - PC / Colab

To run the experiments on a local PC, clone this repo, open the notebook (.ipynb file) for the algorithm of your choice, and run all cells in sequence.