# HDA - Project 3

In [1]:
import utils
import deeplearning
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns 

from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

from keras import regularizers
from keras.activations import relu
from keras.layers import Conv2D, BatchNormalization, Dropout, LeakyReLU, Flatten, Activation, Dense, MaxPooling2D, LSTM, Reshape
from keras.models import load_model, Model, Sequential
from keras.optimizers import Adam

Using TensorFlow backend.


The following cell sets the hyper-parameters that can be tuned before running the notebook:
- subjects: list of subjects to process, each between [1,4];
- folder: directory where the '.mat' files are stored;
- label: index of the label column used as the target for activity detection, between [0,6]:

|  Label |  Feature |
|:-:     |:-:|
|  0     | Locomotion (TASK A)  |
|  1     | High Level Activity |
|  2     | Low Level Left Arm  |
|  3     | Low Level Left Arm Object  |
|  4     | Low Level Right Arm  |
|  5     | Low Level Right Arm Object  |
|  6     | Medium Level Both Arms (TASK B2) |

- window_size: length, in samples, of the temporal windows on which the convolution is performed;
- stride: step, in samples, between the starts of consecutive windows.

The size of the temporal window appears to be critical for obtaining a more specific and powerful model; the step length between consecutive windows must of course be chosen consistently with it. In a real-time setting, samples arrive continuously and a sliding window can be applied to the incoming stream, so a small stride is also reasonable. Another important reason behind the choice of the value of the 

In [2]:
subjects = [1,2,3,4]
folder = "./data/full/"
#folder = "/floyd/input/hdadataset/full/" # To be used with FloydHub
label = 0     # default for task A
window_size = 64
stride = 3
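To make the roles of `window_size` and `stride` concrete, here is a minimal sketch of sliding-window segmentation on a synthetic recording. The helper `sliding_windows` is hypothetical and written only for illustration; the actual segmentation is done inside `utils.preprocessing`, whose implementation is not shown here and may differ.

```python
import numpy as np

def sliding_windows(samples, window_size, stride):
    """Cut a (n_samples, n_features) array into overlapping windows.

    Hypothetical helper for illustration only; not the code of utils.preprocessing.
    """
    n_samples = samples.shape[0]
    if n_samples < window_size:
        raise ValueError("recording is shorter than one window")
    # Number of complete windows that fit when moving `stride` samples at a time
    n_windows = (n_samples - window_size) // stride + 1
    starts = np.arange(n_windows) * stride
    return np.stack([samples[start:start + window_size] for start in starts])

# Synthetic recording: 100 time steps, 110 features
recording = np.random.rand(100, 110)
windows = sliding_windows(recording, window_size=64, stride=3)
print(windows.shape)  # (13, 64, 110)
```

With a stride of 3, consecutive windows share 61 of their 64 samples, so a small stride yields many strongly overlapping training examples from the same recording.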

# Classification

After the _detection_ step, we now exclude all samples associated with the _null class_. The network is thus trained only on the actual activity classes, which lets it better discriminate between motions.
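The null-class exclusion can be sketched as a boolean mask followed by one-hot encoding of the remaining labels. This is only an illustration under stated assumptions: the helper `drop_null_class`, the use of `0` as the null label, and the example label values are all hypothetical, and the real filtering is performed by `utils.preprocessing` with `null_class = False`.

```python
import numpy as np

def drop_null_class(windows, labels, null_label=0):
    """Remove null-class windows and one-hot encode the remaining labels.

    Hypothetical helper; null_label=0 is an assumption, not taken from utils.
    """
    keep = labels != null_label
    # Map the surviving label values to contiguous indices 0..n_classes-1
    classes, class_index = np.unique(labels[keep], return_inverse=True)
    one_hot = np.eye(len(classes))[class_index]
    return windows[keep], one_hot, classes

# Six dummy windows, two of which carry the (assumed) null label 0
windows = np.zeros((6, 64, 110))
labels = np.array([0, 1, 2, 0, 4, 5])
x, y, classes = drop_null_class(windows, labels)
print(x.shape)  # (4, 64, 110)
print(y.shape)  # (4, 4)
```

The one-hot label matrix has one column per remaining class, which matches the `(n_windows, 4)` label shape expected by a `categorical_crossentropy` loss with `n_classes = 4`.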

### Model definition, compilation and input reshaping

In [3]:
n_features = 110  # number of input features used to solve the problem
n_classes = 4

In [4]:
classification_model = deeplearning.MotionClassification((window_size,n_features,1), n_classes)
classification_model.summary() # model visualization

classification_model.compile(optimizer = Adam(lr=0.01), 
                   loss = "categorical_crossentropy", 
                   metrics = ["accuracy"])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_1 (Batch (None, 64, 110, 1)        4         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 54, 110, 50)       600       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 27, 110, 50)       0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 27, 5500)          0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 27, 300)           6961200   
_________________________________________________________________
lstm_2 (LSTM)                (None, 300)               721200    
_________________________________________________________________
dropout_1 (Dropout)          (None, 300)               0         
__________
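The parameter counts in the summary can be checked by hand with the standard formulas for convolutional and LSTM layers. The sketch below assumes an 11x1 convolution kernel with no padding; this is inferred from the output shape (64 to 54 time steps, 110 features unchanged) and the 600 parameters, since the definition of `deeplearning.MotionClassification` is not shown.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)

# Assumed 11x1 kernel: 64 - 11 + 1 = 54 time steps remain after the convolution
print(conv2d_params(11, 1, 1, 50))   # 600
# After pooling and reshape each time step carries 110 * 50 = 5500 values
print(lstm_params(110 * 50, 300))    # 6961200
print(lstm_params(300, 300))         # 721200
```

Almost all trainable parameters sit in the first LSTM layer, because the reshape feeds it 5500 values per time step.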

### Model fitting

After training on all subjects, the model and its weights are saved to disk.

In [5]:
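for s in subjects:

    print("Going for USER ", s)

    # Load and window the data of subject s; null_class = False excludes the null class
    x_train, y_train, x_test, y_test, n_classes = utils.preprocessing(s,
                                                                      folder,
                                                                      label,
                                                                      window_size,
                                                                      stride,
                                                                      null_class = False)

    # Add a trailing channel axis to match the model input shape (window_size, n_features, 1)
    input_train = x_train.reshape(x_train.shape[0], window_size, n_features, 1)
    input_test = x_test.reshape(x_test.shape[0], window_size, n_features, 1)

    # The same model instance keeps training on each subject in turn
    classification_model.fit(x = input_train,
                             y = y_train,
                             epochs = 20,
                             batch_size = 300,
                             verbose = 1,
                             validation_data = (input_test, y_test))

# Save the full model and, separately, its weights once all subjects have been processed
classification_model.save('./data/classification_model_A.h5')
classification_model.save_weights('./data/classification_model_weights_A.h5')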

Going for USER  1
Training samples:  157125 
Test samples:       57536 
Features:             110
TRAINING SET:
Dataset of Images have shape:  (46601, 64, 110) 
Dataset of Labels have shape:    (46601, 4) 
Fraction of labels:   [0.47170662 0.30868436 0.19233493 0.02727409]
TEST SET:
Dataset of Images have shape:  (15758, 64, 110) 
Dataset of Labels have shape:    (15758, 4) 
Fraction of labels:   [0.41743876 0.24666836 0.28899607 0.04689681]
Train on 46601 samples, validate on 15758 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Going for USER  2
Training samples:  145808 
Test samples:       57720 
Features:             110
TRAINING SET:
Dataset of Images have shape:  (44182, 64, 110) 
Dataset of Labels have shape:    (44182, 4) 
Fraction of labels:   [0.46335612 0.29527862 0.21861844 0.02274682]

Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Going for USER  4
Training samples:  118493 
Test samples:       45675 
Features:             110
TRAINING SET:
Dataset of Images have shape:  (36042, 64, 110) 
Dataset of Labels have shape:    (36042, 4) 
Fraction of labels:   [0.55102381 0.25009711 0.16564009 0.033239  ]
TEST SET:
Dataset of Images have shape:  (12956, 64, 110) 
Dataset of Labels have shape:    (12956, 4) 
Fraction of labels:   [0.44025934 0.36477308 0.15074097 0.04422661]
Train on 36042 samples, validate on 12956 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
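The shapes and label fractions printed in the log can be reproduced on a toy example. The sketch below assumes that each "Fraction of labels" line is the column-wise mean of the one-hot label matrix, which is consistent with the printed values summing to 1 but is not confirmed by the code of `utils.preprocessing`; the arrays are synthetic.

```python
import numpy as np

window_size, n_features = 64, 110

# Synthetic stand-ins for x_train and y_train: 8 windows, 4 classes
rng = np.random.default_rng(0)
x = rng.random((8, window_size, n_features))
y = np.eye(4)[[0, 0, 0, 0, 1, 1, 2, 3]]

# Share of windows per class (assumed meaning of "Fraction of labels")
fractions = y.mean(axis=0)  # class shares: 0.5, 0.25, 0.125, 0.125

# Add the trailing channel axis used in the training loop
inputs = x.reshape(x.shape[0], window_size, n_features, 1)
print(inputs.shape)  # (8, 64, 110, 1)
```

In the real log the last class accounts for only about 2 to 5 percent of the windows, so per-class metrics may be more informative than overall accuracy alone.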


In [6]:
#classification_model = load_model('./data/classification_model_A.h5')