```python
#!/usr/bin/env python
# coding: utf-8 

#   This software component is licensed by ST under BSD 3-Clause license,
#   the "License"; You may not use this file except in compliance with the
#   License. You may obtain a copy of the License at:
#                        https://opensource.org/licenses/BSD-3-Clause
  

# Script to prepare the data for Human Activity Recognition using Cartesiam NanoEdge AI Studio.

```

# Step by Step HAR using Classical Machine Learning and STM32CubeAI

This notebook is a step-by-step demonstration of a simple script that prepares input segments for NanoEdge&trade; AI Studio from a <u>H</u>uman <u>A</u>ctivity <u>R</u>ecognition (HAR) dataset. The script relies on the `DataHelper` class (also used to prepare data for the CNN and SVM models) to preprocess and segment the dataset, bringing it into a form that can be used to generate classification libraries with NanoEdge&trade; AI Studio.

For demonstration purposes, this script uses two HAR datasets recorded with an accelerometer sensor.

* A public dataset named **<u>WISDM</u>**, provided by the <u>WI</u>reless <u>S</u>ensing <u>D</u>ata <u>M</u>ining group. The details of the dataset are available [here](http://www.cis.fordham.edu/wisdm/dataset.php).

* Our own proprietary dataset called **<u>AST</u>**.

**Note**: The full datasets are not included in the function pack. The WISDM dataset can be downloaded from [here](http://www.cis.fordham.edu/wisdm/dataset.php). **<u>AST</u>** is a private dataset and is not provided in full; only a subset of it is available in the function pack at `/FP-AI-MONITOR1/Utilities/AI_Resources/Datasets/AST/`.

The following figure shows the detailed workflow of HAR based on the NanoEdge&trade; AI library.


<p align="center">
<img width="760" height="400" src="workflow_neai.png">    
</p>

Now, let us implement it step by step.

### Step1: Import necessary dependencies
The following section imports all the required dependencies. It also seeds NumPy's random number generator to make the results reproducible between runs.

In [None]:
%load_ext autoreload
%autoreload 2

# python libraries
import os
import warnings
from datetime import datetime

import numpy as np

# private libraries
from PrepareDataset import DataHelper

# disabling annoying warnings originating from python
warnings.simplefilter("ignore")
# setting the seed of the NumPy random generator
np.random.seed( 611 )
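
As a quick illustration of why the seed matters, the following minimal sketch (not part of the function pack) shows that reseeding NumPy's global generator with the same value reproduces the same random draws. The helper name `draw_indices` is hypothetical.

```python
import numpy as np

def draw_indices(seed, n_items=100, n_draws=5):
    """Hypothetical helper: seed NumPy's global generator, then draw indices without replacement."""
    np.random.seed(seed)
    return np.random.choice(n_items, size=n_draws, replace=False)

first = draw_indices(611)
second = draw_indices(611)

# identical seeds give identical draws
print(np.array_equal(first, second))
```

Note that this only covers code that uses NumPy's global generator; other sources of randomness would need their own seeds.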

### Step2: Set environment variables
The following section sets user variables that are later used to prepare the dataset for training and to create the NanoEdge AI libraries.

In [None]:
# data variables
dataset = 'AST'
# dataset = 'WISDM'
reducedClasses = True
segmentLength = 24
stepSize = 24
preprocessing = True

# training variables (to split data into train, valid and test sets)
trainTestSplit = 0.6
trainValidationSplit = 0.7
nrSamplesPerClass = 50
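
To make the roles of `segmentLength` and `stepSize` concrete, here is a minimal sketch of sliding-window segmentation on synthetic data. It is an assumption about the kind of windowing `DataHelper` performs, not its actual code, and the function name `segment_signal` is hypothetical. When the step equals the segment length, as in the settings above, consecutive windows do not overlap.

```python
import numpy as np

def segment_signal(samples, segment_length, step_size):
    """Hypothetical helper: cut an (n_samples, n_axes) array into fixed-length windows."""
    n_samples = samples.shape[0]
    starts = range(0, n_samples - segment_length + 1, step_size)
    return np.stack([samples[s:s + segment_length] for s in starts])

# synthetic 3-axis recording with 100 samples (stand-in for accelerometer data)
recording = np.arange(300, dtype=float).reshape(100, 3)

non_overlapping = segment_signal(recording, segment_length=24, step_size=24)
overlapping = segment_signal(recording, segment_length=24, step_size=12)

print(non_overlapping.shape)  # (windows, segment_length, axes)
print(overlapping.shape)
```

A smaller step produces more, partially overlapping windows from the same recording; trailing samples that do not fill a complete window are dropped in this sketch.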

### Step3: Result directory
The generated data segment files are saved in the `data_neai` folder, together with an `info.txt` file recording the configuration used to create them.

In [None]:
# create the parent directory for the results if it does not already exist
resDir = './data_neai/'
os.makedirs( resDir, exist_ok = True )
infoString = 'runTime : {}\nDatabase : {}\nModel : {}\nSeqLength : {}\nStepSize : {}\nReducedClasses : {}\nnrSamplesPerClass : {}'.\
format( datetime.now().strftime("%Y-%b-%d at %H:%M:%S"), dataset, 'SVC', segmentLength, stepSize, reducedClasses, nrSamplesPerClass )
with open( resDir + 'info.txt', 'w' ) as text_file:
    text_file.write( infoString )

### Step4: Create a `DataHelper` object
The following section creates a `DataHelper` object that preprocesses, segments and splits the dataset, and creates the output labels, so that the data is ready for training and testing.

In [None]:
myDataHelper = DataHelper( dataset = dataset, reducedClasses = reducedClasses, 
                          seqLength = segmentLength, seqStep = stepSize, preprocessing = preprocessing,
                          trainTestSplit = trainTestSplit, trainValidSplit = trainValidationSplit, resultDir = resDir )

### Step5: Prepare the dataset
The following section prepares the dataset. `prepare_data()` returns six values, of which only the training tensors `trainX` and `trainy` are kept here. Variables with a trailing `X` are inputs with shape `[_, ( segmentLength  * 3 ), 1 ]`, and variables with a trailing `y` are the corresponding outputs with shape `[ , _ ]`, where `_` corresponds to the number of samples.

In [None]:
trainX, trainy, _, _, _, _ = myDataHelper.prepare_data()

#### Step5.1: Print the shapes of the training inputs and labels, the number of classes and the class labels

In [None]:
print( 'Shape of training inputs : {}\nShape of training labels : {}\nNumber of classes : {}\nClass labels : {}'.\
      format( trainX.shape, trainy.shape, len( myDataHelper.classes ), myDataHelper.classes ) )
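
Array shapes alone do not show how balanced the classes are. The sketch below counts samples per class on synthetic labels. It assumes one-hot encoded labels of shape `[n_samples, n_classes]`, which may not match what `prepare_data()` returns, and the class names other than `Stationary` are placeholders.

```python
import numpy as np

# placeholder class names and synthetic one-hot labels (assumptions, not the real dataset)
class_names = ['Stationary', 'Walking', 'Running']
label_ids = np.array([0, 0, 1, 2, 2, 2])
one_hot = np.eye(len(class_names), dtype=int)[label_ids]

# recover the integer class id of every sample, then count occurrences per class
ids = one_hot.argmax(axis=1)
counts = np.bincount(ids, minlength=len(class_names))
samples_per_class = dict(zip(class_names, counts.tolist()))

print(samples_per_class)
```

Such a count is useful before choosing `nrSamplesPerClass`, since a class with fewer samples than requested cannot supply the full number.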

### Step6: Reshape the data to comply with NanoEdge&trade; AI Studio
NanoEdge AI Studio expects data from multiple axes to be vectorized using the following convention:
$$
\begin{pmatrix}
x_1 & y_1 & z_1 \\
x_2 & y_2 & z_2 \\
\vdots & \vdots & \vdots \\
x_n & y_n & z_n \\
\vdots & \vdots & \vdots
\end{pmatrix}
\Rightarrow
\begin{pmatrix}
x_1 & y_1 & z_1 & x_2 & y_2 & z_2 & \dots & \dots & \dots & x_{24} & y_{24} & z_{24} \\
x_{25} & y_{25} & z_{25} & x_{26} & y_{26} & z_{26} & \dots & \dots & \dots & x_{48} & y_{48} & z_{48} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots
\end{pmatrix}
$$
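
In NumPy terms, this convention is a row-major reshape: each block of `segmentLength` consecutive three-axis samples is flattened into one row of `segmentLength * 3` values. A minimal sketch on synthetic data (the variable names are illustrative and not taken from `DataHelper`):

```python
import numpy as np

segment_length = 24

# synthetic stream of 48 samples; columns are the x, y and z axes
stream = np.arange(48 * 3).reshape(48, 3)

# row-major reshape interleaves the axes: x1 y1 z1 x2 y2 z2 ... x24 y24 z24
vectorized = stream.reshape(-1, segment_length * 3)

print(vectorized.shape)
print(vectorized[0, :6])
```

The reshape only works when the number of samples is a multiple of `segment_length`; a real pipeline would have to drop or pad any incomplete trailing segment first.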

The following section randomly chooses `nrSamplesPerClass` samples for each class, vectorizes them, and dumps them into a CSV file named after the class label (for example `Stationary.csv`) in the `resDir` folder.

In [None]:
myDataHelper.create_neai_segments( trainX, trainy, nrSamplesPerClass )
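
For readers without access to `DataHelper`, the sketch below shows one way such per-class files could be produced. It is not the implementation of `create_neai_segments`: the function name `dump_class_segments`, the one-hot label format, the comma delimiter and the `Walking` class name are all assumptions made for illustration.

```python
import os
import tempfile
import numpy as np

def dump_class_segments(X, y, class_names, n_per_class, out_dir, seed=611):
    """Hypothetical helper: write up to n_per_class random segments of each class to <class>.csv."""
    rng = np.random.default_rng(seed)
    labels = y.argmax(axis=1)             # assumes one-hot labels
    flat = X.reshape(X.shape[0], -1)      # one vectorized segment per row
    paths = {}
    for class_id, name in enumerate(class_names):
        candidates = np.flatnonzero(labels == class_id)
        n_pick = min(n_per_class, candidates.size)
        picked = rng.choice(candidates, size=n_pick, replace=False)
        path = os.path.join(out_dir, name + '.csv')
        np.savetxt(path, flat[picked], delimiter=',', fmt='%.6f')
        paths[name] = path
    return paths

# synthetic data: 20 segments of 24 samples x 3 axes, two classes
X = np.random.default_rng(0).normal(size=(20, 72, 1))
y = np.eye(2, dtype=int)[np.arange(20) % 2]
out_dir = tempfile.mkdtemp()

paths = dump_class_segments(X, y, ['Stationary', 'Walking'], n_per_class=5, out_dir=out_dir)
written = np.loadtxt(paths['Stationary'], delimiter=',')
print(written.shape)
```

Each row of a generated file holds one vectorized segment, so a file contains at most `n_per_class` rows of `segmentLength * 3` values.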