# HDA - Project 3

The first cell contains the parameters that can be tuned before running the notebook:
- subject: the subject on which to test the model, in the range [1,4];
- label: index of the label column used for activity detection, in the range [0,6];
- folder: name of the directory where the '.mat' files are stored;
- window_size: length of the temporal windows on which the convolution is performed;
- stride: step length used to choose the next window.
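
As an illustration of how window_size and stride interact, the following is a minimal NumPy sketch of sliding-window segmentation. The helper name `sliding_windows` and the toy signal are assumptions made for this example; the actual segmentation (and the way each window is labelled) is implemented in the `preprocessing` module and may differ.

```python
import numpy as np


def sliding_windows(data, window_size, stride):
    """Cut a (n_samples, n_features) array into overlapping windows.

    Hypothetical helper for illustration only; it is not the function
    used by the preprocessing module.
    """
    n_samples = data.shape[0]
    # Start index of every window that fits entirely inside the signal
    starts = range(0, n_samples - window_size + 1, stride)
    return np.stack([data[s:s + window_size] for s in starts])


# Toy signal: 30 time steps, 2 features
signal = np.arange(60).reshape(30, 2)
windows = sliding_windows(signal, window_size=15, stride=5)
print(windows.shape)  # (4, 15, 2)
```

With window_size = 15 and stride = 5, consecutive windows share 10 time steps, so a smaller stride yields more (and more strongly overlapping) training samples.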

In [1]:
subject = 1
label = 6   # default for task B1
folder = "../data/full/"
window_size = 15
stride = 5

In [2]:
import preprocessing
import models
import numpy as np
from sklearn.metrics import classification_report
from keras.models import load_model
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint
from keras.utils import to_categorical

Using TensorFlow backend.


The following cell uses some metric functions that have been removed from Keras, but whose code is still available at https://github.com/keras-team/keras/commit/a56b1a55182acf061b1eb2e2c86b48193a0e88f7. They evaluate the F1 score on batches of data during training; the result is only an approximation of the score over the whole dataset, which is why they were removed.

In [3]:
import keras.backend as K
from sklearn.metrics import f1_score


def precision(y_true, y_pred): 
    """Precision metric.
    
    Only computes a batch-wise average of precision. 
    Computes the precision, a metric for multi-label classification of 
    how many selected items are relevant. 
    """ 
    
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon()) 

    return precision


def recall(y_true, y_pred): 
    """Recall metric. 
    
    Only computes a batch-wise average of recall. 
    Computes the recall, a metric for multi-label classification of 
    how many relevant items are selected. 
    """ 
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1))) 
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon()) 
 
    return recall

def fbeta_score(y_true, y_pred, beta=1): 
    """Computes the F score. 

    The F score is the weighted harmonic mean of precision and recall.
    Here it is only computed as a batch-wise average, not globally.
    This is useful for multi-label classification, where input samples can be
    classified as sets of labels. By only using accuracy (precision) a model
    would achieve a perfect score by simply assigning every class to every
    input. In order to avoid this, a metric should penalize incorrect class
    assignments as well (recall). The F-beta score (ranged from 0.0 to 1.0)
    computes this, as a weighted mean of the proportion of correct class
    assignments vs. the proportion of incorrect class assignments.
    With beta = 1, this is equivalent to a F-measure. With beta < 1, assigning
    correct classes becomes more important, and with beta > 1 the metric is
    instead weighted towards penalizing incorrect class assignments.
    """
    
    if beta < 0:
        raise ValueError('The lowest choosable beta is zero (only precision).')
        
    # If there are no true positives, fix the F score at 0 like sklearn.
    if K.sum(K.round(K.clip(y_true, 0, 1))) == 0:
        return 0 

    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    bb = beta ** 2
    fbeta_score = (1 + bb) * (p * r) / (bb * p + r + K.epsilon()) 

    return fbeta_score 


def fmeasure(y_true, y_pred):
    """Computes the f-measure, the harmonic mean of precision and recall.
    
    Here it is only computed as a batch-wise average, not globally.
    """ 

    return fbeta_score(y_true, y_pred, beta=1) 
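
To see why the batch-wise value is only an approximation, the following NumPy sketch compares the F1 score computed over a whole toy dataset with the mean of the F1 scores of its batches. The helper `f1_binary` and the toy labels are assumptions made for this example; they mirror the formulas above but are not part of Keras.

```python
import numpy as np


def f1_binary(y_true, y_pred, eps=1e-7):
    """F1 score from 0/1 label arrays, following the formulas above."""
    true_positives = np.sum(y_true * y_pred)
    precision = true_positives / (np.sum(y_pred) + eps)
    recall = true_positives / (np.sum(y_true) + eps)
    return 2 * precision * recall / (precision + recall + eps)


# Toy labels: the first four samples are classified correctly,
# the last four contain one false positive and one false negative.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 1, 1, 1, 1, 0, 0, 0])

# F1 computed once over all eight samples
global_f1 = f1_binary(y_true, y_pred)

# Mean of the F1 scores of two batches of four samples each
batch_f1 = np.mean([f1_binary(y_true[i:i + 4], y_pred[i:i + 4])
                    for i in (0, 4)])

print(global_f1, batch_f1)
```

On this toy data the global score is about 0.8, while the batch-wise mean is about 0.5, because the second batch has no true positives and contributes a score of zero. The value reported during training should therefore be read as a rough indicator only.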

## Classification with null class
### Preprocessing

In [4]:
X_train, Y_train, X_test, Y_test, n_features, n_classes, class_weights = preprocessing.loadDataMultiple(
    label=label,
    folder=folder,
    window_size=window_size,
    stride=stride,
    make_binary=False,
    null_class=True,
    print_info=True)


Processing data from subject 1

Session shapes:
ADL1:   (45810, 110)
ADL2:   (28996, 110)
ADL3:   (30167, 110)
ADL4:   (30228, 110)
ADL5:   (27308, 110)
Drill:  (52152, 110)

Features: 110 
Classes: 18 
Fraction of labels:   [0.66779111 0.01365242 0.01473443 0.01120199 0.02020813 0.02074913
 0.01183846 0.01613468 0.01307959 0.01377972 0.01584826 0.01225217
 0.02132196 0.02199026 0.01152022 0.01724851 0.07889126 0.01775769]

Features: 110 
Classes: 18 
Fraction of labels:   [0.77548892 0.00764885 0.00591047 0.00808344 0.01425467 0.01156019
 0.00365059 0.01833985 0.00990874 0.0065189  0.00808344 0.00582355
 0.01642764 0.01399392 0.00295524 0.01521078 0.06996958 0.00617123]

Processing data from subject 2

Session shapes:
ADL1:   (38733, 110)
ADL2:   (26824, 110)
ADL3:   (31242, 110)
ADL4:   (29723, 110)
ADL5:   (27997, 110)
Drill:  (49009, 110)

Features: 110 
Classes: 18 
Fraction of labels:   [0.6284852  0.01879351 0.01416372 0.00823073 0.0201996  0.02157138
 0.00956823 0.02249734 0.0

### Model

In [5]:
detection_model = models.MotionDetection((window_size, n_features), n_classes)

detection_model.compile(optimizer = Adam(lr=0.001),
                        loss = "categorical_crossentropy", 
                        metrics = ["accuracy", fmeasure])

checkpointer = ModelCheckpoint(filepath='./weights_1.hdf5', verbose=1, save_best_only=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_1 (Batch (None, 15, 110)           440       
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 5, 36)             43596     
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 5, 36)             0         
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 2, 36)             0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 2, 600)            1528800   
_________________________________________________________________
lstm_2 (LSTM)                (None, 600)               2882400   
_________________________________________________________________
dense_1 (Dense)              (None, 512)               307712    
__________

### Training

In [6]:
detection_model.fit(x = X_train, 
                    y = Y_train, 
                    epochs = 20, 
                    batch_size = 16,
                    verbose = 1,
                    callbacks=[checkpointer],
                    validation_data=(X_test, Y_test),
                    class_weight=class_weights)

Train on 114295 samples, validate on 41979 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.55482, saving model to ./weights_1.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.55482 to 0.51959, saving model to ./weights_1.hdf5
Epoch 3/20

Epoch 00003: val_loss improved from 0.51959 to 0.51661, saving model to ./weights_1.hdf5
Epoch 4/20

Epoch 00004: val_loss improved from 0.51661 to 0.51395, saving model to ./weights_1.hdf5
Epoch 5/20

Epoch 00005: val_loss did not improve
Epoch 6/20

Epoch 00006: val_loss did not improve
Epoch 7/20

Epoch 00007: val_loss did not improve
Epoch 8/20

Epoch 00008: val_loss improved from 0.51395 to 0.50956, saving model to ./weights_1.hdf5
Epoch 9/20

Epoch 00009: val_loss did not improve
Epoch 10/20

Epoch 00010: val_loss did not improve
Epoch 11/20

Epoch 00011: val_loss did not improve
Epoch 12/20

Epoch 00012: val_loss did not improve
Epoch 13/20

Epoch 00013: val_loss did not improve
Epoch 14/20

Epoch 00014: val_loss did not i

<keras.callbacks.History at 0x227055cb748>

### Evaluation

In [7]:
Y_pred = detection_model.predict(X_test)
Y_pred = np.argmax(Y_pred, 1)

print(classification_report(Y_test, to_categorical(Y_pred)))

             precision    recall  f1-score   support

          0       0.91      0.95      0.93     33446
          1       0.43      0.29      0.34       319
          2       0.40      0.36      0.38       283
          3       0.28      0.09      0.14       220
          4       0.65      0.69      0.67       443
          5       0.67      0.72      0.69       406
          6       0.37      0.20      0.26       206
          7       0.57      0.50      0.53       805
          8       0.49      0.47      0.48       488
          9       0.38      0.32      0.35       359
         10       0.51      0.35      0.42       325
         11       0.44      0.19      0.27       224
         12       0.70      0.62      0.66       495
         13       0.81      0.64      0.71       501
         14       0.41      0.21      0.28       209
         15       0.55      0.36      0.44       913
         16       0.75      0.69      0.72      1843
         17       0.61      0.48      0.54   
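
The evaluation cell turns the softmax outputs into class indices with `np.argmax` and then back into one-hot rows. One caveat: when `to_categorical` is called without `num_classes`, it infers the number of columns from the largest index it sees, so the result can have fewer columns than `Y_test` if the highest class is never predicted. The sketch below shows the same conversion with a fixed number of columns; the helper `one_hot` and the probabilities are assumptions made for this example.

```python
import numpy as np


def one_hot(indices, n_classes):
    """One-hot encode integer class indices with a fixed number of columns."""
    encoded = np.zeros((len(indices), n_classes), dtype=int)
    encoded[np.arange(len(indices)), indices] = 1
    return encoded


# Hypothetical softmax outputs for 4 windows and 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])

pred_idx = np.argmax(probs, axis=1)  # class 2 is never predicted
pred_one_hot = one_hot(pred_idx, n_classes=probs.shape[1])

print(pred_idx, pred_one_hot.shape)
```

In the notebook, the equivalent precaution would be to pass `num_classes=n_classes` to `to_categorical`.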

In [9]:
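# fmeasure is a custom metric, so it has to be passed through custom_objects;
# without it, load_model raises "ValueError: Unknown metric function:fmeasure".
detection_model_best = load_model('./weights_1.hdf5', custom_objects={'fmeasure': fmeasure})

# Evaluate the checkpointed model rather than the one from the last epoch
Y_pred = detection_model_best.predict(X_test)
Y_pred = np.argmax(Y_pred, 1)

print(classification_report(Y_test, to_categorical(Y_pred)))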

## Classification without null class
### Preprocessing

In [13]:
X_train, Y_train, X_test, Y_test, n_features, n_classes, class_weights = preprocessing.loadDataMultiple(
    label=label,
    folder=folder,
    window_size=window_size,
    stride=stride,
    make_binary=False,
    null_class=False,
    print_info=True)


Processing data from subject 1

Session shapes:
ADL1:   (45810, 110)
ADL2:   (28996, 110)
ADL3:   (30167, 110)
ADL4:   (30228, 110)
ADL5:   (27308, 110)
Drill:  (52152, 110)

Features: 110 
Classes: 17 
Fraction of labels:   [0.04109589 0.04435291 0.0337197  0.06082958 0.06245809 0.0356356
 0.04856787 0.03937159 0.04147907 0.04770572 0.03688093 0.06418239
 0.06619408 0.03467765 0.05192068 0.23747485 0.0534534 ]

Features: 110 
Classes: 17 
Fraction of labels:   [0.03406891 0.02632598 0.03600465 0.06349206 0.05149051 0.01626016
 0.08168796 0.04413473 0.029036   0.03600465 0.02593883 0.07317073
 0.06233062 0.01316299 0.06775068 0.31165312 0.02748742]

Processing data from subject 2

Session shapes:
ADL1:   (38733, 110)
ADL2:   (26824, 110)
ADL3:   (31242, 110)
ADL4:   (29723, 110)
ADL5:   (27997, 110)
Drill:  (49009, 110)

Features: 110 
Classes: 17 
Fraction of labels:   [0.05058617 0.03812425 0.02215453 0.0543709  0.05806333 0.02575464
 0.06055571 0.04929382 0.05981723 0.03406259 0.02

### Model

In [14]:
detection_model = models.MotionDetection((window_size, n_features), n_classes)

detection_model.compile(optimizer = Adam(lr=0.001),
                        loss = "categorical_crossentropy", 
                        metrics = ["accuracy", fmeasure])

checkpointer = ModelCheckpoint(filepath='./weights_2.hdf5', verbose=1, save_best_only=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_3 (Batch (None, 15, 110)           440       
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 5, 36)             43596     
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 5, 36)             0         
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 2, 36)             0         
_________________________________________________________________
lstm_5 (LSTM)                (None, 2, 600)            1528800   
_________________________________________________________________
lstm_6 (LSTM)                (None, 600)               2882400   
_________________________________________________________________
dense_5 (Dense)              (None, 512)               307712    
__________

### Training

In [15]:
detection_model.fit(x = X_train, 
                    y = Y_train, 
                    epochs = 20, 
                    batch_size = 16,
                    verbose = 1,
                    callbacks=[checkpointer],
                    validation_data=(X_test, Y_test),
                    class_weight=class_weights)

Train on 39830 samples, validate on 8533 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 1.12972, saving model to ./weights_2.hdf5
Epoch 2/20

Epoch 00002: val_loss did not improve
Epoch 3/20

Epoch 00003: val_loss did not improve
Epoch 4/20

Epoch 00004: val_loss did not improve
Epoch 5/20

Epoch 00005: val_loss did not improve
Epoch 6/20

Epoch 00006: val_loss did not improve
Epoch 7/20

Epoch 00007: val_loss did not improve
Epoch 8/20

Epoch 00008: val_loss did not improve
Epoch 9/20

Epoch 00009: val_loss did not improve
Epoch 10/20

Epoch 00010: val_loss did not improve
Epoch 11/20

Epoch 00011: val_loss did not improve
Epoch 12/20

Epoch 00012: val_loss did not improve
Epoch 13/20

Epoch 00013: val_loss did not improve
Epoch 14/20

Epoch 00014: val_loss did not improve
Epoch 15/20

Epoch 00015: val_loss did not improve
Epoch 16/20

Epoch 00016: val_loss did not improve
Epoch 17/20

Epoch 00017: val_loss did not improve
Epoch 18/20

Epoch 00018: val_loss did not imp

<keras.callbacks.History at 0x227c681f588>

### Evaluation

In [16]:
Y_pred = detection_model.predict(X_test)
Y_pred = np.argmax(Y_pred, 1)

print(classification_report(Y_test, to_categorical(Y_pred)))

             precision    recall  f1-score   support

          0       0.44      0.46      0.45       319
          1       0.48      0.47      0.48       283
          2       0.36      0.15      0.21       220
          3       0.69      0.74      0.72       443
          4       0.72      0.76      0.74       406
          5       0.30      0.44      0.35       206
          6       0.64      0.69      0.67       805
          7       0.67      0.66      0.67       488
          8       0.37      0.53      0.44       359
          9       0.56      0.46      0.51       325
         10       0.41      0.29      0.34       224
         11       0.61      0.75      0.67       495
         12       0.72      0.77      0.75       501
         13       0.38      0.30      0.33       209
         14       0.74      0.64      0.68       913
         15       0.94      0.86      0.90      1843
         16       0.69      0.74      0.72       494

avg / total       0.67      0.67      0.67  

In [17]:
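# fmeasure is a custom metric, so it has to be passed through custom_objects;
# without it, load_model raises "ValueError: Unknown metric function:fmeasure".
detection_model_best = load_model('./weights_2.hdf5', custom_objects={'fmeasure': fmeasure})

# Evaluate the checkpointed model rather than the one from the last epoch
Y_pred = detection_model_best.predict(X_test)
Y_pred = np.argmax(Y_pred, 1)

print(classification_report(Y_test, to_categorical(Y_pred)))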

## Activity detection (binary classification)
### Preprocessing

In [18]:
X_train, Y_train, X_test, Y_test, n_features, n_classes, class_weights = preprocessing.loadData(
    subject=subject,
    label=label,
    folder=folder,
    window_size=window_size,
    stride=stride,
    make_binary=True,
    null_class=True,
    print_info=True)


Session shapes:
ADL1:   (45810, 110)
ADL2:   (28996, 110)
ADL3:   (30167, 110)
ADL4:   (30228, 110)
ADL5:   (27308, 110)
Drill:  (52152, 110)

Features: 110 
Classes: 2 
Fraction of labels:   [0.66718646 0.33281354]

Features: 110 
Classes: 2 
Fraction of labels:   [0.775402 0.224598]


### Model

In [19]:
detection_model = models.MotionDetection((window_size, n_features), n_classes)

detection_model.compile(optimizer = Adam(lr=0.001),
                        loss = "categorical_crossentropy", 
                        metrics = ["accuracy", fmeasure])

checkpointer = ModelCheckpoint(filepath='./weights_3.hdf5', verbose=1, save_best_only=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_4 (Batch (None, 15, 110)           440       
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 5, 36)             43596     
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 5, 36)             0         
_________________________________________________________________
max_pooling1d_4 (MaxPooling1 (None, 2, 36)             0         
_________________________________________________________________
lstm_7 (LSTM)                (None, 2, 600)            1528800   
_________________________________________________________________
lstm_8 (LSTM)                (None, 600)               2882400   
_________________________________________________________________
dense_7 (Dense)              (None, 512)               307712    
__________

### Training

In [20]:
detection_model.fit(x = X_train, 
                    y = Y_train, 
                    epochs = 20, 
                    batch_size = 16,
                    verbose = 1,
                    callbacks=[checkpointer],
                    validation_data=(X_test, Y_test),
                    class_weight=class_weights)

Train on 31423 samples, validate on 11505 samples
Epoch 1/20

Epoch 00001: val_loss improved from inf to 0.28809, saving model to ./weights_3.hdf5
Epoch 2/20

Epoch 00002: val_loss improved from 0.28809 to 0.25031, saving model to ./weights_3.hdf5
Epoch 3/20

Epoch 00003: val_loss did not improve
Epoch 4/20

Epoch 00004: val_loss improved from 0.25031 to 0.20039, saving model to ./weights_3.hdf5
Epoch 5/20

Epoch 00005: val_loss did not improve
Epoch 6/20

Epoch 00006: val_loss did not improve
Epoch 7/20

Epoch 00007: val_loss did not improve
Epoch 8/20

Epoch 00008: val_loss did not improve
Epoch 9/20

Epoch 00009: val_loss did not improve
Epoch 10/20

Epoch 00010: val_loss did not improve
Epoch 11/20

Epoch 00011: val_loss did not improve
Epoch 12/20

Epoch 00012: val_loss did not improve
Epoch 13/20

Epoch 00013: val_loss did not improve
Epoch 14/20

Epoch 00014: val_loss did not improve
Epoch 15/20

Epoch 00015: val_loss did not improve
Epoch 16/20

Epoch 00016: val_loss did not im

<keras.callbacks.History at 0x227c61baeb8>

### Evaluation

In [21]:
Y_pred = detection_model.predict(X_test)
Y_pred = np.argmax(Y_pred, 1)

print(classification_report(Y_test, to_categorical(Y_pred)))

             precision    recall  f1-score   support

          0       0.94      0.96      0.95      8921
          1       0.85      0.79      0.82      2584

avg / total       0.92      0.92      0.92     11505



In [22]:
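# fmeasure is a custom metric, so it has to be passed through custom_objects;
# without it, load_model raises "ValueError: Unknown metric function:fmeasure".
detection_model_best = load_model('./weights_3.hdf5', custom_objects={'fmeasure': fmeasure})

# Evaluate the checkpointed model rather than the one from the last epoch
Y_pred = detection_model_best.predict(X_test)
Y_pred = np.argmax(Y_pred, 1)

print(classification_report(Y_test, to_categorical(Y_pred)))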