# HDA - Project 3

In [1]:
import utils
import deeplearning
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns 

from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

from keras import regularizers
from keras.activations import relu
from keras.layers import Conv2D, BatchNormalization, Dropout, LeakyReLU, Flatten, Activation, Dense, MaxPooling2D, LSTM, Reshape
from keras.models import load_model, Model, Sequential
from keras.optimizers import Adam

Using TensorFlow backend.


The following cell contains the hyper-parameters that can be tuned before running the code:
- subjects: list of the subjects on which to train and test the model, each in [1,4];
- folder: name of the directory where the '.mat' files are stored;
- label: column of labels to be selected for activity detection, in [0,6]:

|  Label |  Feature |
|:-:     |:-:|
|  0     | Locomotion (TASK A)  |
|  1     | High Level Activity |
|  2     | Low Level Left Arm  |
|  3     | Low Level Left Arm Object  |
|  4     | Low Level Right Arm  |
|  5     | Low Level Right Arm Object  |
|  6     | Medium Level Both Arms (TASK B2) |

- window_size: length of the temporal windows on which the convolution is performed;
- stride: step length used to choose the next window.

The size of the temporal window appears to be fundamental for obtaining a more specific and powerful model; the step length between consecutive windows must of course be chosen consistently with it. In a real-time setting, data can be processed with a sliding window as the samples are collected, so a small value for the stride is also reasonable. Another important reason behind the choice of the value of the 

In [2]:
subjects = [1,2,3,4]
folder = "./data/full/"
#folder = "/floyd/input/hdadataset/full/" # To be used with FloydHub
label = 0     # default for task A
window_size = 64
stride = 3
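As a rough illustration of how `window_size` and `stride` interact, the sketch below cuts a signal of shape (samples, features) into overlapping windows. The helper `sliding_windows` is hypothetical and is not taken from `utils.preprocessing`, whose handling of the last samples and of window labels may differ.

```python
import numpy as np

def sliding_windows(data, window_size, stride):
    """Return an array of shape (n_windows, window_size, n_features).

    Hypothetical helper: a generic sliding window, not the project's own code.
    """
    n_samples = data.shape[0]
    # Start index of every window that fits entirely inside the signal
    starts = range(0, n_samples - window_size + 1, stride)
    return np.stack([data[s:s + window_size] for s in starts])

# Toy signal: 20 time steps, 2 features
demo = np.arange(20 * 2).reshape(20, 2)
windows = sliding_windows(demo, window_size=8, stride=3)
print(windows.shape)  # (5, 8, 2)
```

With a stride smaller than the window size, consecutive windows overlap, so a small stride yields many more (highly correlated) training examples from the same recording.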

# Detection

The following section covers the first part of the structure; our idea is to treat the detection of movement (i.e. separating it from the _null class_) and the classification of the movement itself as two separate problems.

The steps that we take are the following: first we set all the labels different from 0 to 1, making the problem binary; then, we build a suitable network that can spot the movement.
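The first step can be sketched as follows, assuming that label 0 is the null class and that the network expects one-hot targets (consistent with `n_classes = 2` and the `categorical_crossentropy` loss). The function `make_binary_one_hot` is a hypothetical stand-in for what `utils.preprocessing` does when `make_binary = True`; the actual implementation is not shown in this notebook.

```python
import numpy as np

def make_binary_one_hot(labels):
    """Map every non-zero label to 1, then one-hot encode into two columns.

    Hypothetical helper, assuming 0 marks the null class.
    """
    binary = (np.asarray(labels) != 0).astype(int)
    return np.eye(2)[binary]

raw = np.array([0, 1, 2, 4, 0, 5])
one_hot = make_binary_one_hot(raw)
print(one_hot)
# Share of each class, in the same spirit as the "Fraction of labels" log line
print(one_hot.mean(axis=0))
```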

### Model compilation and input reshaping

In [3]:
n_features = 110 #number of features taken into consideration for the solution of the problem
n_classes = 2

In [4]:
detection_model = deeplearning.MotionDetection((window_size,n_features,1), n_classes)
detection_model.summary() # model visualization

detection_model.compile(optimizer = Adam(lr=0.01), 
                   loss = "categorical_crossentropy", 
                   metrics = ["accuracy"])

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_1 (Batch (None, 64, 110, 1)        4         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 54, 108, 50)       1700      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 27, 108, 50)       0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 27, 5400)          0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 27, 20)            433680    
_________________________________________________________________
lstm_2 (LSTM)                (None, 20)                3280      
_________________________________________________________________
dense_1 (Dense)              (None, 512)               10752     
__________
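The shapes and parameter counts in the summary can be checked by hand. The sketch below assumes 'valid' padding and unit strides in the convolution, which would imply an 11x3 kernel (64 to 54 and 110 to 108) followed by a 2x1 max pooling (54 to 27); these values are inferred from the printed shapes, not read from `deeplearning.MotionDetection`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units

# Kernel size inferred from the shapes, assuming 'valid' padding and stride 1
kernel_h = 64 - 54 + 1    # 11 time steps
kernel_w = 110 - 108 + 1  # 3 features

counts = {
    "conv2d_1": conv2d_params(kernel_h, kernel_w, in_channels=1, filters=50),
    "lstm_1": lstm_params(input_dim=108 * 50, units=20),  # 5400 inputs per step
    "lstm_2": lstm_params(input_dim=20, units=20),
    "dense_1": dense_params(input_dim=20, units=512),
}
for name, value in counts.items():
    print(name, value)
```

Under these assumptions the computed values agree with the `Param #` column above; note that almost all of the weights sit in the first LSTM layer, because it receives 5400 inputs per time step.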

### Model fitting

After the training procedure, the model will be saved on the local disk.

In [5]:
for s in subjects:
    
    print("Going for USER ", s)
    
    [x_train, y_train, x_test, y_test, n_classes] = utils.preprocessing(s,
                                                                        folder,
                                                                        label,
                                                                        window_size,
                                                                        stride,
                                                                        make_binary = True)

    input_train = x_train.reshape(x_train.shape[0], window_size, n_features, 1)
    input_test = x_test.reshape(x_test.shape[0], window_size, n_features, 1)

    detection_model.fit(x = input_train, 
                   y = y_train, 
                   epochs = 20, 
                   batch_size = 300,
                   verbose = 1,
                   validation_data=(input_test, y_test))

detection_model.save('./detection_model_A.h5')
detection_model.save_weights('./detection_model_weights_A.h5')

Going for USER  1
Training samples:  157125 
Test samples:       57536 
Features:             110
TRAINING SET:
Dataset of Images have shape:  (52354, 64, 110) 
Dataset of Labels have shape:    (52354, 2) 
Fraction of labels:   [0.10986744 0.89013256]
TEST SET:
Dataset of Images have shape:  (19157, 64, 110) 
Dataset of Labels have shape:    (19157, 2) 
Fraction of labels:   [0.17737642 0.82262358]
Train on 52354 samples, validate on 19157 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Going for USER  2
Training samples:  145808 
Test samples:       57720 
Features:             110
TRAINING SET:
Dataset of Images have shape:  (48581, 64, 110) 
Dataset of Labels have shape:    (48581, 2) 
Fraction of labels:   [0.0901587 0.9098413]
TEST SET:
Dataset of Images have shape:  (19219, 64, 110) 
Dataset 

Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Going for USER  4
Training samples:  118493 
Test samples:       45675 
Features:             110
TRAINING SET:
Dataset of Images have shape:  (39476, 64, 110) 
Dataset of Labels have shape:    (39476, 2) 
Fraction of labels:   [0.08698956 0.91301044]
TEST SET:
Dataset of Images have shape:  (15204, 64, 110) 
Dataset of Labels have shape:    (15204, 2) 
Fraction of labels:   [0.14785583 0.85214417]
Train on 39476 samples, validate on 15204 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [6]:
#detection_model = load_model('./data/detection_model.h5')

### Performance evaluation

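# input_test and y_test still hold the data of the last subject processed in the training loop
output_test = detection_model.predict(input_test)
prediction = np.argmax(output_test, axis=1)  # index of the most probable class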

print("Accuracy: ", accuracy_score(np.argmax(y_test, axis=1), prediction))
print("F1-measure: ", utils.f1_score(np.argmax(y_test, axis=1), prediction, average='weighted'))

cnf_matrix = utils.confusion_matrix(np.argmax(y_test, axis=1), prediction)
np.set_printoptions(precision=2)

sns.set_style("dark")
plt.figure()
utils.plot_confusion_matrix(cnf_matrix, classes=[0,1],
                      title='Confusion matrix, without normalization')
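Because the two classes are strongly unbalanced (roughly 15% null class against 85% movement in the last test set), accuracy alone can be misleading. The sketch below computes a confusion matrix and a support-weighted F1 score with plain NumPy for a trivial predictor that always answers "movement". The helper names are hypothetical, and it is only an assumption that `utils.f1_score` and `utils.confusion_matrix` follow the usual scikit-learn definitions.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

def weighted_f1(cm):
    """Per-class F1 averaged with weights equal to the class support."""
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1).astype(float)
    predicted = cm.sum(axis=0).astype(float)
    zeros = np.zeros(len(tp))
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    recall = np.divide(tp, support, out=zeros.copy(), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return float((f1 * support).sum() / support.sum())

# Toy data mimicking the imbalance: 15 null-class windows, 85 movement windows
y_true = np.array([0] * 15 + [1] * 85)
y_pred = np.ones_like(y_true)  # baseline that always predicts "movement"

cm = confusion(y_true, y_pred)
baseline_accuracy = float(np.trace(cm) / cm.sum())
baseline_f1 = weighted_f1(cm)
print(cm)
print(baseline_accuracy, round(baseline_f1, 3))
```

On this toy data the baseline reaches 0.85 accuracy while never detecting the null class, so the scores of the trained model are best read against such a baseline and together with the confusion matrix.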