# Classifiers
Exploring different classifiers with different autoencoders.

#### Table of contents:  

autoencoders:  
[Undercomplete Autoencoder](#Undercomplete-Autoencoder)  
[Sparse Autoencoder](#Sparse-Autoencoder)  
[Deep Autoencoder](#Deep-Autoencoder)  
[Contractive Autoencoder](#Contractive-Autoencoder)  

classifiers:  
[Simple dense layer](#Simple-dense-layer)  
[LSTM-based classifier](#LSTM-based-classifier)  
[kNN](#kNN)  
[SVC](#SVC)  
[Random Forest](#Random-Forest)  
[XGBoost](#XGBoost)  

In [1]:
import datareader # made by the previous author for reading the collected data
import dataextractor # same as above
import pandas
import numpy as np
import tensorflow as tf
# need to disable eager execution for .get_weights() in contractive autoencoder loss to work
tf.compat.v1.disable_eager_execution()
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout, Activation, Input
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Conv1D, MaxPooling1D
# required for the contractive autoencoder
import tensorflow.keras.backend as K
import json
from datetime import datetime

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import RandomizedSearchCV

import warnings

import talos
from talos.utils import lr_normalizer

from tensorflow import keras
from tensorflow.keras import layers, regularizers
import matplotlib.pyplot as plt

tf.keras.backend.set_floatx('float32') # call this, to set keras to use float32 to avoid a warning message
metrics = ['accuracy']#,
#            keras.metrics.TruePositives(),
#            keras.metrics.FalsePositives(),
#            keras.metrics.TrueNegatives(),
#            keras.metrics.FalseNegatives()]

In [2]:
# from https://github.com/ageron/handson-ml/blob/master/extra_tensorflow_reproducibility.ipynb
config = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
                        inter_op_parallelism_threads=1)

with tf.compat.v1.Session(config=config) as sess:
    #... this will run single threaded
    pass

In [3]:
import random

random.seed(1)
np.random.seed(4)
tf.random.set_seed(2)

In [4]:
# Start the notebook in the terminal with "PYTHONHASHSEED=0 jupyter notebook" 
# or in anaconda "set PYTHONHASHSEED=0" then start jupyter notebook
import os
if os.environ.get("PYTHONHASHSEED") != "0":
    raise Exception("You must set PYTHONHASHSEED=0 when starting the Jupyter server to get reproducible results.")

This is the original author's code, just copied into separate cells of this jupyter notebook:

In [5]:
def get_busy_vs_relax_timeframes(path, ident, seconds):
    """Returns raw data from either 'on task' or 'relax' time frames and their class (0 or 1).
    TODO: join functions"""

    dataread = datareader.DataReader(path, ident)  # initialize path to data
    data = dataread.read_grc_data()  # read from files
    samp_rate = int(round(len(data[1]) / max(data[0])))
    cog_res = dataread.read_cognitive_load_study(str(ident) + '-primary-extract.txt')

    tasks_data = np.empty((0, seconds*samp_rate))
    tasks_y = np.empty((0, 1))

    busy_n = dataread.get_data_task_timestamps(return_indexes=True)
    relax_n = dataread.get_relax_timestamps(return_indexes=True)

    for i in cog_res['task_number']:
        task_num_table = i - 225  # 0 - 17

        ### task versus relax (1 sample each)
        dataextract = dataextractor.DataExtractor(data[0][busy_n[task_num_table][0]:busy_n[task_num_table][1]],
                                                  data[1][busy_n[task_num_table][0]:busy_n[task_num_table][1]],
                                                  samp_rate)

        dataextract_relax = dataextractor.DataExtractor(data[0][relax_n[task_num_table][0]:relax_n[task_num_table][1]],
                                                        data[1][relax_n[task_num_table][0]:relax_n[task_num_table][1]],
                                                        samp_rate)
        try:
            tasks_data = np.vstack((tasks_data, dataextract.y[-samp_rate * seconds:]))
            tasks_y = np.vstack((tasks_y, 1))
            tasks_data = np.vstack((tasks_data, dataextract_relax.y[-samp_rate * seconds:]))
            tasks_y = np.vstack((tasks_y, 0))
        except ValueError:
            continue
#             print(ident)  # ignore short windows

    return tasks_data, tasks_y


In [6]:
def get_engagement_increase_vs_decrease_timeframes(path, ident, seconds):
    """Returns raw data from either engagement 'increase' or 'decrease' time frames and their class (0 or 1).
    TODO: join functions"""

    dataread = datareader.DataReader(path, ident)  # initialize path to data
    data = dataread.read_grc_data()  # read from files
    samp_rate = int(round(len(data[1]) / max(data[0])))
    cog_res = dataread.read_cognitive_load_study(str(ident) + '-primary-extract.txt')

    tasks_data = np.empty((0, seconds * samp_rate))
    tasks_y = np.empty((0, 1))

    busy_n = dataread.get_data_task_timestamps(return_indexes=True)
    relax_n = dataread.get_relax_timestamps(return_indexes=True)

    for i in cog_res['task_number']:
        task_num_table = i - 225  # 0 - 17

        ### engagement increase / decrease
        if task_num_table == 0:
            continue
        mid = int((relax_n[task_num_table][0] + relax_n[task_num_table][1])/2)
        length = int(samp_rate*30)
        for j in range(10):
            new_end = int(mid-j*samp_rate)

            new_start2 = int(mid+j*samp_rate)

            dataextract_decrease = dataextractor.DataExtractor(data[0][new_end - length:new_end],
                                                               data[1][new_end-length:new_end],
                                                               samp_rate)

            dataextract_increase = dataextractor.DataExtractor(data[0][new_start2: new_start2 + length],
                                                               data[1][new_start2: new_start2 + length], samp_rate)

            try:
                tasks_data = np.vstack((tasks_data, dataextract_increase.y))
                tasks_y = np.vstack((tasks_y, 1))
                tasks_data = np.vstack((tasks_data, dataextract_decrease.y))
                tasks_y = np.vstack((tasks_y, 0))
            except ValueError:
                print(ident)  # ignore short windows

    return tasks_data, tasks_y


In [7]:
def get_task_complexities_timeframes(path, ident, seconds):
    """Returns raw data along with task complexity class.
    TODO: join functions. Add parameter to choose different task types and complexities"""

    dataread = datareader.DataReader(path, ident)  # initialize path to data
    data = dataread.read_grc_data()  # read from files
    samp_rate = int(round(len(data[1]) / max(data[0])))
    cog_res = dataread.read_cognitive_load_study(str(ident) + '-primary-extract.txt')

    tasks_data = np.empty((0, seconds*samp_rate))
    tasks_y = np.empty((0, 1))

    busy_n = dataread.get_data_task_timestamps(return_indexes=True)
    relax_n = dataread.get_relax_timestamps(return_indexes=True)

    for i in cog_res['task_number']:
        task_num_table = i - 225  # 0 - 17

        ### task complexity classification
        if cog_res['task_complexity'][task_num_table] == 'medium':
            continue
        # if cog_res['task_label'][task_num_table] == 'FA' or cog_res['task_label'][task_num_table] == 'HP':
        #     continue
        if cog_res['task_label'][task_num_table] != 'NC':
            continue
        map_compl = {
            'low': 0,
            'medium': 2,
            'high': 1
        }
        for j in range(10):
            new_end = int(busy_n[task_num_table][1] - j * samp_rate)
            new_start = int(new_end - samp_rate*30)
            dataextract = dataextractor.DataExtractor(data[0][new_start:new_end],
                                                      data[1][new_start:new_end], samp_rate)
            try:
                tasks_data = np.vstack((tasks_data, dataextract.y))
                tasks_y = np.vstack((tasks_y, map_compl.get(cog_res['task_complexity'][task_num_table])))
            except ValueError:
                print(ident)

    return tasks_data, tasks_y


In [8]:
def get_TLX_timeframes(path, ident, seconds):
    """Returns raw data along with task load index class.
    TODO: join functions. Add parameter to choose different task types and complexities"""

    dataread = datareader.DataReader(path, ident)  # initialize path to data
    data = dataread.read_grc_data()  # read from files
    samp_rate = int(round(len(data[1]) / max(data[0])))
    cog_res = dataread.read_cognitive_load_study(str(ident) + '-primary-extract.txt')

    tasks_data = np.empty((0, seconds*samp_rate))
    tasks_y = np.empty((0, 1))

    busy_n = dataread.get_data_task_timestamps(return_indexes=True)
    relax_n = dataread.get_relax_timestamps(return_indexes=True)

    for i in cog_res['task_number']:
        task_num_table = i - 225  # 0 - 17

        ### task load index
        if cog_res['task_complexity'][task_num_table] == 'medium' or cog_res['task_label'][task_num_table] != 'PT':
            continue
        for j in range(10):
            new_end = int(busy_n[task_num_table][1] - j * samp_rate)
            new_start = int(new_end - samp_rate*30)
            dataextract = dataextractor.DataExtractor(data[0][new_start:new_end],
                                                      data[1][new_start:new_end], samp_rate)
            try:
                tasks_data = np.vstack((tasks_data, dataextract.y))
                tasks_y = np.vstack((tasks_y, cog_res['task_load_index'][task_num_table]))
            except ValueError:
                print(ident)

    return tasks_data, tasks_y


In [9]:
def get_data_from_idents(path, idents, seconds):
    """Go through all user data and take out windows of only <seconds> long time frames,
    along with the given class (from 'divide_each_task' function).
    """
    samp_rate = 43  # hard-coded sample rate
    data, ys = np.empty((0, samp_rate*seconds)), np.empty((0, 1))
    for i in idents:
        x, y = get_busy_vs_relax_timeframes(path, i, seconds) # either 'get_busy_vs_relax_timeframes',
        # get_engagement_increase_vs_decrease_timeframes, get_task_complexities_timeframes or get_TLX_timeframes
        # TODO: ^ modify, so that different functions can be accessible by parameter
        data = np.vstack((data, x))
        ys = np.vstack((ys, y))
    return data, ys


In [10]:
def model_train(model, x_train, y_train, batch_size, epochs, x_valid, y_valid, x_test, y_test):
    """Train model with the given training, validation, and test set, with appropriate batch size and # epochs."""
    epoch_data = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_data=(x_valid, y_valid), verbose=0)
    score = model.evaluate(x_test, y_test, batch_size=batch_size)
    acc = score[1]
    score = score[0]
    return score, acc, epoch_data


In [11]:
def sequence_padding(x, maxlen):
    """Pad sequences (all have to be same length)."""
    print('Pad sequences (samples x time)')
    return sequence.pad_sequences(x, maxlen=maxlen, dtype=np.float)


In [12]:
def get_task_complexities_timeframes_br_hb(path, ident, seconds):
    """Returns raw data along with task complexity class.
    TODO: join functions. Add parameter to choose different task types and complexities"""

    dataread = datareader.DataReader(path, ident)  # initialize path to data
    data = dataread.read_grc_data()  # read from files
    samp_rate = int(round(len(data[1]) / max(data[0])))
    cog_res = dataread.read_cognitive_load_study(str(ident) + '-primary-extract.txt')

    tasks_data = np.empty((0, seconds*samp_rate))
    tasks_y = np.empty((0, 1))
    breathing = np.empty((0,12))
    heartbeat = np.empty((0,10))

    busy_n = dataread.get_data_task_timestamps(return_indexes=True)
    
    for i in cog_res['task_number']:
        task_num_table = i - 225  # 0 - 17
        tmp_tasks_data = np.empty((0, seconds*samp_rate))
        tmp_tasks_y = np.empty((0, 1))
        tmp_breathing = np.empty((0,12))
        tmp_heartbeat = np.empty((0,10))
        
        ### task complexity classification
        if cog_res['task_complexity'][task_num_table] == 'medium':
            continue
        # if cog_res['task_label'][task_num_table] == 'FA' or cog_res['task_label'][task_num_table] == 'HP':
        #     continue
#         if cog_res['task_label'][task_num_table] != 'NC':
#             continue
            
        map_compl = {
            'low': 0,
            'medium': 2,
            'high': 1
        }
        for j in range(10):
            new_end = int(busy_n[task_num_table][1] - j * samp_rate)
            new_start = int(new_end - samp_rate*30)
            dataextract = dataextractor.DataExtractor(data[0][new_start:new_end],
                                                      data[1][new_start:new_end], samp_rate)
            # get extracted features for breathing
            tmpBR = dataextract.extract_from_breathing_time(data[0][new_start:new_end],
                                                                 data[1][new_start:new_end])
            #get extracted features for heartbeat
            tmpHB = dataextract.extract_from_heartbeat_time(data[0][new_start:new_end],
                                                                 data[1][new_start:new_end])
            
            try:
                
                tmp_tasks_data = np.vstack((tmp_tasks_data, dataextract.y[-samp_rate * seconds:]))
                tmp_tasks_y = np.vstack((tmp_tasks_y, map_compl.get(cog_res['task_complexity'][task_num_table])))

                tmp_breathing = np.vstack((tmp_breathing, tmpBR.to_numpy(dtype='float64', na_value=0)[0][:-1]))
                tmp_heartbeat = np.vstack((tmp_heartbeat, tmpHB.to_numpy(dtype='float64', na_value=0)[0][:-1]))
                
            except ValueError:
#                 print(ident)
                continue

            tasks_data = np.vstack((tasks_data, dataextract.y))
            tasks_y = np.vstack((tasks_y, map_compl.get(cog_res['task_complexity'][task_num_table])))
            breathing = np.vstack((breathing, tmpBR.to_numpy(dtype='float64', na_value=0)[0][:-1]))
            heartbeat = np.vstack((heartbeat, tmpHB.to_numpy(dtype='float64', na_value=0)[0][:-1]))
            
    return tasks_data, tasks_y, breathing, heartbeat

In [13]:
def get_data_from_idents_br_hb(path, idents, seconds):
    """Go through all user data and take out windows of only <seconds> long time frames,
    along with the given class (from 'divide_each_task' function).
    """
    samp_rate = 43  # hard-coded sample rate
    data, ys = np.empty((0, samp_rate*seconds)), np.empty((0, 1))
    brs = np.empty((0,12))
    hbs = np.empty((0,10))
    combined = np.empty((0,22))
    
    # was gettign some weird warnings; stack overflow said to ignore them
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        for i in idents:
            #x, y, br, hb = get_busy_vs_relax_timeframes_br_hb(path, i, seconds) # either 'get_busy_vs_relax_timeframes',
            # get_engagement_increase_vs_decrease_timeframes, get_task_complexities_timeframes or get_TLX_timeframes
            x, y, br, hb = get_task_complexities_timeframes_br_hb(path, i, seconds)
            
            data = np.vstack((data, x))
            ys = np.vstack((ys, y))
            brs = np.vstack((brs, br))
            hbs = np.vstack((hbs, hb))
        combined = np.hstack((brs,hbs))
    
    return data, ys, brs, hbs, combined

In [14]:
def scale_data(x_train, x_valid, x_test, standardScaler=True, minMaxScaler=True):
    
    # copy data variables
    xt_train = x_train
    xt_valid = x_valid
    xt_test = x_test
    
    if standardScaler:
        # Scale with standard scaler
        sscaler = StandardScaler()
        sscaler.fit(np.vstack((xt_train, xt_valid, xt_test)))
        xt_train = sscaler.transform(xt_train)
        xt_valid = sscaler.transform(xt_valid)
        xt_test = sscaler.transform(xt_test)

    if minMaxScaler:
        # Scale with MinMax to range [0,1]
        mmscaler = MinMaxScaler((0,1))
        mmscaler.fit(np.vstack((xt_train, xt_valid, xt_test)))
        xt_train = mmscaler.transform(xt_train)
        xt_valid = mmscaler.transform(xt_valid)
        xt_test = mmscaler.transform(xt_test)
    
    return xt_train, xt_valid, xt_test

In [15]:
# Accs is a dictionary which holds 1d arrays of accuracies in each key
# except the key 'test id' which holds strings of the id which yielded the coresponding accuracies
def print_accs_stats(accs):
    # loop over each key
    for key in accs:
    
        if (key == 'test id'):
            # skip calculating ids
            continue

        # calculate and print some statistics
        print(key, "accuracies:")
        print("- min:", np.min(accs[key]))
        print("- max:", np.max(accs[key]))
        print("- mean:", np.mean(accs[key]))
        print("- median:", np.median(accs[key]))
        print("")

## Autoencoders

#### Undercomplete Autoencoder  
from https://blog.keras.io/building-autoencoders-in-keras.html

In [16]:
def undercomplete_ae(x, encoding_dim=64, encoded_as_model=False):
    # Simplest possible autoencoder from https://blog.keras.io/building-autoencoders-in-keras.html

    # this is our input placeholder
    input_data = Input(shape=x[0].shape, name="input")
    dropout = Dropout(0.25, name="dropout")(input_data)
    # "encoded" is the encoded representation of the input
    encoded = Dense(encoding_dim, activation='relu', name="encoded")(dropout)
    
    # "decoded" is the lossy reconstruction of the input
    decoded = Dense(x[0].shape[0], activation='sigmoid', name="decoded")(encoded)

    autoencoder = Model(input_data, decoded)
    
    # compile the model
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy', metrics=metrics)
    
    # if return encoder in the encoded variable
    if encoded_as_model:
        encoded = Model(input_data, encoded)
    
    return autoencoder, encoded

#### Sparse Autoencoder  
from https://blog.keras.io/building-autoencoders-in-keras.html

In [17]:
def sparse_ae(x, encoding_dim=64, encoded_as_model=False):
    # Simplest possible autoencoder from https://blog.keras.io/building-autoencoders-in-keras.html

    # this is our input placeholder
    input_data = Input(shape=x[0].shape, name="input")
    dropout = Dropout(0.25, name="dropout") (input_data)
    # "encoded" is the encoded representation of the input
    # add a sparsity constraint
    encoded = Dense(encoding_dim, activation='relu', name="encoded",
                    activity_regularizer=regularizers.l1(10e-5))(dropout)
    
    # "decoded" is the lossy reconstruction of the input
    decoded = Dense(x[0].shape[0], activation='sigmoid', name="decoded")(encoded)

    # this model maps an input to its reconstruction
    autoencoder = Model(input_data, decoded, name="sparse_ae")
    
    # compile the model
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy', metrics=metrics)
    
    # if return encoder in the encoded variable
    if encoded_as_model:
        encoded = Model(input_data, encoded)
    
    return autoencoder, encoded

#### Deep Autoencoder  
from https://blog.keras.io/building-autoencoders-in-keras.html

In [18]:
def deep_ae(x, enc_layers=[512,256], encoding_dim=64, dec_layers=[256,512], encoded_as_model=False):
    # From https://www.tensorflow.org/guide/keras/functional#use_the_same_graph_of_layers_to_define_multiple_models
    input_data = keras.Input(shape=x[0].shape, name="normalized_signal")
    model = Dropout(0.25, name="dropout", autocast=False)(input_data)
    for i in enumerate(enc_layers):
        model = Dense(i[1], activation="relu", name="dense_enc_" + str(i[0]+1))(model)
    encoded_output = Dense(encoding_dim, activation="relu", name="encoded_signal")(model)

    encoded = encoded_output

    model = layers.Dense(dec_layers[0], activation="sigmoid", name="dense_dec_1")(encoded_output)
    for i in enumerate(dec_layers[1:]):
        model = Dense(i[1], activation="sigmoid", name="dense_dec_" + str(i[0]+2))(model)
    decoded_output = Dense(x[0].shape[0], activation="sigmoid", name="reconstructed_signal")(model)
    
    autoencoder = Model(input_data, decoded_output, name="autoencoder")
    
    # compile the model
    autoencoder.compile(optimizer='adam', loss='binary_crossentropy', metrics=metrics)
    
    # if return encoder in the encoded variable
    if encoded_as_model:
        encoded = Model(input_data, encoded)

    return autoencoder, encoded

#### Contractive Autoencoder
From: https://wiseodd.github.io/techblog/2016/12/05/contractive-autoencoder/

In [19]:
# define a function to be able to access the autoencoder in the loss funciton
def loss_with_params(autoencoder):
    # loss function from https://wiseodd.github.io/techblog/2016/12/05/contractive-autoencoder/
    def contractive_loss(y_pred, y_true):

        lam = 1e-4
        mse = K.mean(K.square(y_true - y_pred), axis=1)

        W = K.variable(value=autoencoder.get_layer('encoded').get_weights()[0])  # N x N_hidden
        W = K.transpose(W)  # N_hidden x N
        h = autoencoder.get_layer('encoded').output
        dh = h * (1 - h)  # N_batch x N_hidden

        # N_batch x N_hidden * N_hidden x 1 = N_batch x 1
        contractive = lam * K.sum(dh**2 * K.sum(W**2, axis=1), axis=1)

        return mse + contractive
    return contractive_loss

In [20]:
def contractive_ae(x, encoding_dim=64, encoded_as_model=False):
    # From https://wiseodd.github.io/techblog/2016/12/05/contractive-autoencoder/

    input_data = Input(shape=x[0].shape, name="input")
    encoded = Dense(encoding_dim, activation='sigmoid', name='encoded')(input_data)
    outputs = Dense(x[0].shape[0], activation='linear', name="output")(encoded)

    autoencoder = Model(input_data, outputs, name="autoencoder")
    
    # compile the model
    autoencoder.compile(optimizer='adam', loss=loss_with_params(autoencoder), metrics=metrics)
    
    # if return encoder in the encoded variable
    if encoded_as_model:
        encoded = Model(input_data, encoded)
    
    return autoencoder, encoded

## Classifiers

Initialize variables:

In [64]:
# initialize a dictionary to store accuracies for comparison
accuracies = {}

# used for reading the data into an array
seconds = 30  # time window length
idents = ['2gu87', 'iz2ps', '1mpau', '7dwjy', '7swyk', '94mnx', 'bd47a', 'c24ur', 'ctsax', 'dkhty', 'e4gay',
              'ef5rq', 'f1gjp', 'hpbxa', 'pmyfl', 'r89k1', 'tn4vl', 'td5pr', 'gyqu9', 'fzchw', 'l53hg', '3n2f9',
              '62i9y']
path = '../../../StudyData/'

# change to len(idents) at the end to use all the data
n = 3 #len(idents)

#### Simple dense layer

Define the classifier:

In [22]:
def dense_classifier(model, params):
    
    model = Dropout(params['dropout'], name='dropout_cl')(model)
    model = Dense(params['hidden_size'], activation=params['activation'], name='dense_cl1')(model)
    model = Dense(1, activation=params['last_activation'], name='dense_cl2')(model)

    return model

In [23]:
def dense_classifier_base():
    model = Sequential()
    model.add(Dropout(0))
    model.add(Dense(32))
    model.add(Activation('sigmoid'))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=metrics)
    return model

In [24]:
params = {
    'dropout': 0.24,
    'optimizer': 'Adam',
    'hidden_size': 64,
    'loss': 'binary_crossentropy',
    'last_activation': 'sigmoid',
    'activation': 'softmax',
    'batch_size': 256,
    'epochs': 100
}

Combine the autoencoders with the classifier: 

In [25]:
# set the variables in the dictionary
accuracies['simple_dense'] = {}
accs = accuracies['simple_dense']
accs['phase'] = []
accs['breathing'] = []
accs['heartbeat'] = []
accs['combined br hb'] = []
accs['undercomplete'] = []
accs['sparse'] = []
accs['deep'] = []
accs['contractive'] = []
accs['test id'] = []
start_time = datetime.now()

with tf.compat.v1.Session(config=config) as sess:
    # leave out person out validation
    for ident in range(n):

        # print current iteration and time elapsed from start
        print("iteration:", ident+1, "of", n, "; time elapsed:", datetime.now()-start_time)

        ## ----- Data preparation:
        # Split the data
        train_idents = [x for i, x in enumerate(idents) if (i != ident and i != (n-1+ident)%n)]
        validation_idents = [idents[ident]]
        test_idents = [idents[ident-1]]
        
        # save test id to see which id yielded which accuracies
        accs['test id'].append(test_idents[0])

        # Load data (xt-raw phase data, y-class, br-breathing data, hb-heartbeat data, cmb-combined [br,hb])
        xt_train, y_train, br_train, hb_train, cmb_train = get_data_from_idents_br_hb(path, train_idents, seconds)
        xt_valid, y_valid, br_valid, hb_valid, cmb_valid = get_data_from_idents_br_hb(path, validation_idents, seconds)
        xt_test, y_test, br_test, hb_test, cmb_test = get_data_from_idents_br_hb(path, test_idents, seconds)

        # Scale data with standard scaler then MinMax scaler
        # Raw Phase data:
        xt_train, xt_valid, xt_test = scale_data(xt_train, xt_valid, xt_test, standardScaler=True, minMaxScaler=True)
        # Hand extracted breathing data:
        br_train, br_valid, br_test = scale_data(br_train, br_valid, br_test, standardScaler=True, minMaxScaler=True)
        # Hand extracted Heartbeat data:
        hb_train, hb_valid, hb_test = scale_data(hb_train, hb_valid, hb_test, standardScaler=True, minMaxScaler=True)
        # Combined breathing and heartbeat data (joined together into one matrix)
        cmb_train, cmb_valid, cmb_test = scale_data(cmb_train, cmb_valid, cmb_test, standardScaler=True, minMaxScaler=True)
        
        
        
        ## ----- Classify without autoencoders:
        # Phase classifier:
        model = dense_classifier_base()
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['phase'].append(curr_acc)

        # Breathing classifier:
        model = dense_classifier_base()
        sc, curr_acc, epoch_data = model_train(model, br_train, y_train, params['batch_size'], params['epochs'],
                                               br_valid, y_valid, br_test, y_test)
        accs['breathing'].append(curr_acc)

        # Heartbeat classifier:
        model = dense_classifier_base()
        sc, curr_acc, epoch_data = model_train(model, hb_train, y_train, params['batch_size'], params['epochs'],
                                               hb_valid, y_valid, hb_test, y_test)
        accs['heartbeat'].append(curr_acc)

        # Combined classifier:
        model = dense_classifier_base()
        sc, curr_acc, epoch_data = model_train(model, cmb_train, y_train, params['batch_size'], params['epochs'],
                                               cmb_valid, y_valid, cmb_test, y_test)
        accs['combined br hb'].append(curr_acc)
        
        
        
        ## ----- Classify with autoencoders:
        # AE Training params
        batch_size = 256
        epochs = 100

        # Undercomplete AE:
        autoencoder, encoded = undercomplete_ae(xt_train, 40)
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = dense_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['undercomplete'].append(curr_acc)

        # Sparse AE:
        autoencoder, encoded = sparse_ae(xt_train, 40)
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = dense_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['sparse'].append(curr_acc)

        # Deep AE:
        autoencoder, encoded = deep_ae(xt_train, enc_layers=[512,256], encoding_dim=40, dec_layers=[256,512])
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = dense_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['deep'].append(curr_acc)

        # Contractive AE:
        autoencoder, encoded = contractive_ae(xt_train, 40)
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = dense_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['contractive'].append(curr_acc)

# Print total time required to run this
end_time = datetime.now()
elapsed_time = end_time - start_time
print("Completed!", "Time elapsed:", elapsed_time)

iteration: 1 of 3 ; time elapsed: 0:00:00.007950
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
iteration: 2 of 3 ; time elapsed: 0:03:20.478562
iteration: 3 of 3 ; time elapsed: 0:06:47.710645
Completed! Time elapsed: 0:10:25.853433


In [26]:
# print accuracies of each method and corresponding id which yielded that accuracy (same row)
pandas.DataFrame.from_dict(accs)

Unnamed: 0,phase,breathing,heartbeat,combined br hb,undercomplete,sparse,deep,contractive,test id
0,0.55,0.55,0.433333,0.691667,0.566667,0.591667,0.508333,0.558333,3n2f9
1,0.616667,0.708333,0.55,0.558333,0.658333,0.583333,0.641667,0.516667,2gu87
2,0.466667,0.333333,0.491667,0.425,0.533333,0.641667,0.633333,0.458333,iz2ps


In [27]:
# print some statistics for each method
print_accs_stats(accs)

phase accuracies:
- min: 0.46666667
- max: 0.6166667
- mean: 0.5444445
- median: 0.55

breathing accuracies:
- min: 0.33333334
- max: 0.7083333
- mean: 0.53055555
- median: 0.55

heartbeat accuracies:
- min: 0.43333334
- max: 0.55
- mean: 0.49166667
- median: 0.49166667

combined br hb accuracies:
- min: 0.425
- max: 0.69166666
- mean: 0.55833334
- median: 0.55833334

undercomplete accuracies:
- min: 0.53333336
- max: 0.65833336
- mean: 0.5861111
- median: 0.56666666

sparse accuracies:
- min: 0.5833333
- max: 0.64166665
- mean: 0.60555553
- median: 0.59166664

deep accuracies:
- min: 0.5083333
- max: 0.64166665
- mean: 0.59444445
- median: 0.6333333

contractive accuracies:
- min: 0.45833334
- max: 0.55833334
- mean: 0.51111114
- median: 0.51666665



#### LSTM-based classifier  
based on the original author's code

Optimize hyperparameters with talos:

In [28]:
def LSTM_classifier(model, params):

    model = layers.Reshape((-1, 1), input_shape=(model.shape), name='reshape_cl') (model)

    model = layers.Dropout(params['dropout'], name='dropout_cl1') (model)
    
    model = Conv1D(params['filters'],
                     params['kernel_size'],
                     padding='valid',
                     activation=params['activation'],
                     strides=params['strides'],
                     name='conv1d_cl1') (model)
    
    model = MaxPooling1D(pool_size=params['pool_size'], name='maxpool_cl1') (model)
    
    model = Conv1D(params['filters'],
                     params['kernel_size'],
                     padding='valid',
                     activation=params['activation'],
                     strides=params['strides'],
                     name='conv1d_cl2') (model)
    
    model = MaxPooling1D(pool_size=params['pool_size'], name='maxpool_cl2') (model)
    
    model = layers.Dropout(params['dropout'], name='dropout_cl2') (model)

    model = LSTM(params['lstm_output_size'], activation='sigmoid', name='lstm_cl') (model)

    model = Dense(1, activation=params['last_activation'], name='dense_cl') (model)
    return model

In [29]:
def LSTM_classifier_base(params):
    
    model = Sequential()
    model.add(Dropout(params['dropout']))
    model.add(Conv1D(params['filters'],
                     params['kernel_size'],
                     padding='valid',
                     activation=params['activation'],
                     strides=params['strides']))

    model.add(MaxPooling1D(pool_size=params['pool_size']))
    model.add(Conv1D(params['filters'],
                     params['kernel_size'],
                     padding='valid',
                     activation=params['activation'],
                     strides=params['strides']))
    model.add(MaxPooling1D(pool_size=params['pool_size']))

    model.add(Dropout(params['dropout']))
    model.add(LSTM(params['lstm_output_size']))
    model.add(Dense(1))
    model.add(Activation(params['last_activation']))

    model.compile(loss=params['loss'],
                  optimizer=params['optimizer'],
                  metrics=['acc'])
    
    return model

In [30]:
params_phase = {
    'kernel_size': 32,
    'strides': 4,
    'pool_size': 2,
    'filters': 8,
    'lstm_output_size': 236,
    'loss': 'binary_crossentropy',
    'dropout': 0.09,
    'activation': 'relu',
    'optimizer': 'Nadam',
    'last_activation': 'sigmoid'
}

In [31]:
params_br_hb = {
    'kernel_size': 2,
    'strides': 1,
    'pool_size': 1,
    'filters': 2,
    'lstm_output_size': 4,
    'loss': 'binary_crossentropy',
    'dropout': 0.09,
    'activation': 'relu',
    'optimizer': 'Nadam',
    'last_activation': 'sigmoid'
}

In [67]:
params = {
    'kernel_size': 4,
    'filters': 2,
    'strides': 2,
    'pool_size': 2,
    'dropout': 0.09,
    'optimizer': 'Nadam',
    'loss': 'binary_crossentropy',
    'activation': 'relu',
    'last_activation': 'sigmoid',
    'lstm_output_size': 256,
    'batch_size': 64,
    'epochs': 100
}

Combine the autoencoders with the classifier: 

In [68]:
# set the variables in the dictionary
accuracies['LSTM'] = {}
accs = accuracies['LSTM']
accs['phase'] = []
accs['breathing'] = []
accs['heartbeat'] = []
accs['combined br hb'] = []
accs['undercomplete'] = []
accs['sparse'] = []
accs['deep'] = []
accs['contractive'] = []
accs['test id'] = []
start_time = datetime.now()

with tf.compat.v1.Session(config=config) as sess:
    # leave out person out validation
    for ident in range(n):

        # print current iteration and time elapsed from start
        print("iteration:", ident+1, "of", n, "; time elapsed:", datetime.now()-start_time)

        ## ----- Data preparation:
        # Split the data
        train_idents = [x for i, x in enumerate(idents) if (i != ident and i != (n-1+ident)%n)]
        validation_idents = [idents[ident]]
        test_idents = [idents[ident-1]]

        # save test id to see which id yielded which accuracies
        accs['test id'].append(test_idents[0])
        
        # Load data (xt-raw phase data, y-class, br-breathing data, hb-heartbeat data, cmb-combined [br,hb])
        xt_train, y_train, br_train, hb_train, cmb_train = get_data_from_idents_br_hb(path, train_idents, seconds)
        xt_valid, y_valid, br_valid, hb_valid, cmb_valid = get_data_from_idents_br_hb(path, validation_idents, seconds)
        xt_test, y_test, br_test, hb_test, cmb_test = get_data_from_idents_br_hb(path, test_idents, seconds)

        # Scale data with standard scaler then MinMax scaler
        # Raw Phase data:
        xt_train, xt_valid, xt_test = scale_data(xt_train, xt_valid, xt_test, standardScaler=True, minMaxScaler=True)
        # Hand extracted breathing data:
        br_train, br_valid, br_test = scale_data(br_train, br_valid, br_test, standardScaler=True, minMaxScaler=True)
        # Hand extracted Heartbeat data:
        hb_train, hb_valid, hb_test = scale_data(hb_train, hb_valid, hb_test, standardScaler=True, minMaxScaler=True)
        # Combined breathing and heartbeat data (joined together into one matrix)
        cmb_train, cmb_valid, cmb_test = scale_data(cmb_train, cmb_valid, cmb_test, standardScaler=True, minMaxScaler=True)
        
        
        
        ## ----- Classify without autoencoders:
        # Phase classifier:
        model = LSTM_classifier_base(params_phase)
        # reshape data for the classifier
        xtt_train = xt_train.reshape(-1, xt_train[0].shape[0], 1)
        xtt_valid = xt_valid.reshape(-1, xt_valid[0].shape[0], 1)
        xtt_test = xt_test.reshape(-1, xt_test[0].shape[0], 1)
        # train and evaluate
        sc, curr_acc, epoch_data = model_train(model, xtt_train, y_train, params['batch_size'], params['epochs'],
                                               xtt_valid, y_valid, xtt_test, y_test)
        accs['phase'].append(curr_acc)

        # Breathing classifier:
        model = LSTM_classifier_base(params_br_hb)
        # reshape data for the classifier
        brt_train = br_train.reshape(-1, br_train[0].shape[0], 1)
        brt_valid = br_valid.reshape(-1, br_valid[0].shape[0], 1)
        brt_test = br_test.reshape(-1, br_test[0].shape[0], 1)
        # train and evaluate
        sc, curr_acc, epoch_data = model_train(model, brt_train, y_train, params['batch_size'], params['epochs'],
                                               brt_valid, y_valid, brt_test, y_test)
        accs['breathing'].append(curr_acc)

        # Heartbeat classifier:
        model = LSTM_classifier_base(params_br_hb)
        # reshape data for the classifier
        hbt_train = hb_train.reshape(-1, hb_train[0].shape[0], 1)
        hbt_valid = hb_valid.reshape(-1, hb_valid[0].shape[0], 1)
        hbt_test = hb_test.reshape(-1, hb_test[0].shape[0], 1)
        # train and evaluate
        sc, curr_acc, epoch_data = model_train(model, hbt_train, y_train, params['batch_size'], params['epochs'],
                                               hbt_valid, y_valid, hbt_test, y_test)
        accs['heartbeat'].append(curr_acc)

        # Combined classifier:
        model = LSTM_classifier_base(params_br_hb)
        # reshape data for the classifier
        cmbt_train = cmb_train.reshape(-1, cmb_train[0].shape[0], 1)
        cmbt_valid = cmb_valid.reshape(-1, cmb_valid[0].shape[0], 1)
        cmbt_test = cmb_test.reshape(-1, cmb_test[0].shape[0], 1)
        # train and evaluate
        sc, curr_acc, epoch_data = model_train(model, cmbt_train, y_train, params['batch_size'], params['epochs'],
                                               cmbt_valid, y_valid, cmbt_test, y_test)
        accs['combined br hb'].append(curr_acc)

        
        
        ## ----- Classify with autoencoders:
        # AE Training params
        batch_size = 256
        epochs = 100

        # undercomplete AE
        autoencoder, encoded = undercomplete_ae(xt_train, 40)
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = LSTM_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['undercomplete'].append(curr_acc)

        # sparse AE
        autoencoder, encoded = sparse_ae(xt_train, 40)
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = LSTM_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['sparse'].append(curr_acc)

        # deep AE
        autoencoder, encoded = deep_ae(xt_train, enc_layers=[512,256], encoding_dim=40, dec_layers=[256,512])
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = LSTM_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['deep'].append(curr_acc)

        # contractive AE
        autoencoder, encoded = contractive_ae(xt_train, 40)
        sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                               xt_valid, xt_valid, xt_test, xt_test)
        model = LSTM_classifier(encoded, params)
        model = Model(inputs=autoencoder.inputs, outputs=model)
        model.compile(loss=params['loss'],
                      optimizer=params['optimizer'],
                      metrics=metrics)
        sc, curr_acc, epoch_data = model_train(model, xt_train, y_train, params['batch_size'], params['epochs'],
                                               xt_valid, y_valid, xt_test, y_test)
        accs['contractive'].append(curr_acc)

end_time = datetime.now()
elapsed_time = end_time - start_time
print("Completed!", "Time elapsed:", elapsed_time)

iteration: 1 of 3 ; time elapsed: 0:00:00.005986


KeyboardInterrupt: 

In [34]:
# print accuracies of each method and corresponding id which yielded that accuracy (same row)
pandas.DataFrame.from_dict(accs)

Unnamed: 0,phase,breathing,heartbeat,combined br hb,undercomplete,sparse,deep,contractive,test id
0,0.691667,0.608333,0.416667,0.525,0.566667,0.575,0.566667,0.558333,3n2f9
1,0.45,0.633333,0.5,0.366667,0.541667,0.616667,0.483333,0.566667,2gu87
2,0.533333,0.316667,0.5,0.425,0.683333,0.516667,0.558333,0.575,iz2ps


In [35]:
# print some statistics for each method
print_accs_stats(accs)

phase accuracies:
- min: 0.45
- max: 0.69166666
- mean: 0.55833334
- median: 0.53333336

breathing accuracies:
- min: 0.31666666
- max: 0.6333333
- mean: 0.51944447
- median: 0.60833335

heartbeat accuracies:
- min: 0.41666666
- max: 0.5
- mean: 0.4722222
- median: 0.5

combined br hb accuracies:
- min: 0.36666667
- max: 0.525
- mean: 0.43888888
- median: 0.425

undercomplete accuracies:
- min: 0.5416667
- max: 0.68333334
- mean: 0.59722227
- median: 0.56666666

sparse accuracies:
- min: 0.51666665
- max: 0.6166667
- mean: 0.5694444
- median: 0.575

deep accuracies:
- min: 0.48333332
- max: 0.56666666
- mean: 0.5361111
- median: 0.55833334

contractive accuracies:
- min: 0.55833334
- max: 0.575
- mean: 0.56666666
- median: 0.56666666



### Helper loop function definition

In [36]:
# a helper loop funciton for the sklearn and XGBoost classifiers
def helper_loop(classifier_function, idents, n=5):
    #returns a dictionary with accuracies

    # set the variables in the dictionary
    accs = {}
    accs['phase'] = []
    accs['breathing'] = []
    accs['heartbeat'] = []
    accs['combined br hb'] = []
    accs['undercomplete'] = []
    accs['sparse'] = []
    accs['deep'] = []
    accs['contractive'] = []
    accs['test id'] = []
    start_time = datetime.now()

    with tf.compat.v1.Session(config=config) as sess:
        # leave out person out validation
        for ident in range(n):

            # print current iteration and time elapsed from start
            print("iteration:", ident+1, "of", n, "; time elapsed:", datetime.now()-start_time)

            ## ----- Data preparation:
            # Split the data
            train_idents = [x for i, x in enumerate(idents) if (i != ident and i != (n-1+ident)%n)]
            validation_idents = [idents[ident]]
            test_idents = [idents[ident-1]]

            # save test id to see which id yielded which accuracies
            accs['test id'].append(test_idents[0])

            # Load data (xt-raw phase data, y-class, br-breathing data, hb-heartbeat data, cmb-combined [br,hb])
            xt_train, y_train, br_train, hb_train, cmb_train = get_data_from_idents_br_hb(path, train_idents, seconds)
            xt_valid, y_valid, br_valid, hb_valid, cmb_valid = get_data_from_idents_br_hb(path, validation_idents, seconds)
            xt_test, y_test, br_test, hb_test, cmb_test = get_data_from_idents_br_hb(path, test_idents, seconds)

            # change the y arrays to flat 1d arrays
            y_train = y_train.ravel()
            y_valid = y_valid.ravel()
            y_test = y_test.ravel()
            
            # Scale data with standard scaler then MinMax scaler
            # Raw Phase data:
            xt_train, xt_valid, xt_test = scale_data(xt_train, xt_valid, xt_test, standardScaler=True, minMaxScaler=True)
            # Hand extracted breathing data:
            br_train, br_valid, br_test = scale_data(br_train, br_valid, br_test, standardScaler=True, minMaxScaler=True)
            # Hand extracted Heartbeat data:
            hb_train, hb_valid, hb_test = scale_data(hb_train, hb_valid, hb_test, standardScaler=True, minMaxScaler=True)
            # Combined breathing and heartbeat data (joined together into one matrix)
            cmb_train, cmb_valid, cmb_test = scale_data(cmb_train, cmb_valid, cmb_test, standardScaler=True, minMaxScaler=True)



            ## ----- Classify without autoencoders:
            # Phase classifier:
            model = classifier_function()
            model.fit(xt_train, y_train)
            curr_acc = np.sum(model.predict(xt_test) == y_test) / y_test.shape[0]
            accs['phase'].append(curr_acc)

            # Breathing classifier:
            base_model = classifier_function()
            base_model.fit(br_train, y_train)
            curr_acc = np.sum(base_model.predict(br_valid) == y_valid) / y_test.shape[0]
            accs['breathing'].append(curr_acc)

            # Heartbeat classifier:
            base_model = classifier_function()
            base_model.fit(hb_train, y_train)
            curr_acc = np.sum(base_model.predict(hb_valid) == y_valid) / y_test.shape[0]
            accs['heartbeat'].append(curr_acc)

            # Combined classifier:
            base_model = classifier_function()
            base_model.fit(cmb_train, y_train)
            curr_acc = np.sum(base_model.predict(cmb_valid) == y_valid) / y_test.shape[0]
            accs['combined br hb'].append(curr_acc)



            ## ----- Classify with autoencoders:
            # AE Training params
            batch_size = 256
            epochs = 100

            # undercomplete AE
            autoencoder, encoded = undercomplete_ae(xt_train, 40, encoded_as_model=True)
            sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                                   xt_valid, xt_valid, xt_test, xt_test)
            model = classifier_function()
            xtt_train = encoded.predict(xt_train)
            xtt_test = encoded.predict(xt_test)
            model.fit(xtt_train, y_train)
            curr_acc = np.sum(model.predict(xtt_test) == y_test) / y_test.shape[0]
            accs['undercomplete'].append(curr_acc)

            # sparse AE
            autoencoder, encoded = sparse_ae(xt_train, 60, encoded_as_model=True)
            sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                                   xt_valid, xt_valid, xt_test, xt_test)
            model = classifier_function()
            xtt_train = encoded.predict(xt_train)
            xtt_test = encoded.predict(xt_test)
            model.fit(xtt_train, y_train)
            curr_acc = np.sum(model.predict(xtt_test) == y_test) / y_test.shape[0]
            accs['sparse'].append(curr_acc)

            # deep AE
            autoencoder, encoded = deep_ae(xt_train, enc_layers=[512,256], encoding_dim=60, dec_layers=[256,512], encoded_as_model=True)
            sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                                   xt_valid, xt_valid, xt_test, xt_test)
            model = classifier_function()
            xtt_train = encoded.predict(xt_train)
            xtt_test = encoded.predict(xt_test)
            model.fit(xtt_train, y_train)
            curr_acc = np.sum(model.predict(xtt_test) == y_test) / y_test.shape[0]
            accs['deep'].append(curr_acc)

            # contractive AE
            autoencoder, encoded = contractive_ae(xt_train, 40, encoded_as_model=True)
            sc, curr_acc, epoch_data = model_train(autoencoder, xt_train, xt_train, batch_size, epochs,
                                                   xt_valid, xt_valid, xt_test, xt_test)
            model = classifier_function()
            xtt_train = encoded.predict(xt_train)
            xtt_test = encoded.predict(xt_test)
            model.fit(xtt_train, y_train)
            curr_acc = np.sum(model.predict(xtt_test) == y_test) / y_test.shape[0]
            accs['contractive'].append(curr_acc)

    # Print total time required to run this
    end_time = datetime.now()
    elapsed_time = end_time - start_time
    print("Completed!", "Time elapsed:", elapsed_time)
    
    return accs

#### kNN

In [37]:
from sklearn.neighbors import KNeighborsClassifier

def KNN_classifier():
    model = KNeighborsClassifier(p=3, n_neighbors=7, metric='cosine')
    return model

Combine the autoencoders with the classifier: 

In [38]:
accs = helper_loop(KNN_classifier, idents, n)

iteration: 1 of 3 ; time elapsed: 0:00:00.007594
iteration: 2 of 3 ; time elapsed: 0:03:18.464849
iteration: 3 of 3 ; time elapsed: 0:06:47.224069
Completed! Time elapsed: 0:10:17.814471


In [39]:
accuracies['kNN'] = accs

In [40]:
# print accuracies of each method and corresponding id which yielded that accuracy (same row)
pandas.DataFrame.from_dict(accs)

Unnamed: 0,phase,breathing,heartbeat,combined br hb,undercomplete,sparse,deep,contractive,test id
0,0.775,0.416667,0.416667,0.566667,0.583333,0.716667,0.733333,0.666667,3n2f9
1,0.633333,0.583333,0.458333,0.558333,0.55,0.525,0.608333,0.483333,2gu87
2,0.491667,0.5,0.583333,0.55,0.633333,0.625,0.616667,0.633333,iz2ps


In [41]:
# print some statistics for each method
print_accs_stats(accs)

phase accuracies:
- min: 0.49166666666666664
- max: 0.775
- mean: 0.6333333333333333
- median: 0.6333333333333333

breathing accuracies:
- min: 0.4166666666666667
- max: 0.5833333333333334
- mean: 0.5
- median: 0.5

heartbeat accuracies:
- min: 0.4166666666666667
- max: 0.5833333333333334
- mean: 0.48611111111111116
- median: 0.4583333333333333

combined br hb accuracies:
- min: 0.55
- max: 0.5666666666666667
- mean: 0.5583333333333333
- median: 0.5583333333333333

undercomplete accuracies:
- min: 0.55
- max: 0.6333333333333333
- mean: 0.5888888888888889
- median: 0.5833333333333334

sparse accuracies:
- min: 0.525
- max: 0.7166666666666667
- mean: 0.6222222222222222
- median: 0.625

deep accuracies:
- min: 0.6083333333333333
- max: 0.7333333333333333
- mean: 0.6527777777777778
- median: 0.6166666666666667

contractive accuracies:
- min: 0.48333333333333334
- max: 0.6666666666666666
- mean: 0.5944444444444444
- median: 0.6333333333333333



####  SVC

In [42]:
from sklearn.svm import SVC

def SVC_classifier():
    model = SVC(kernel='rbf', C=1.5)
    return model

Combine the autoencoders with the classifier: 

In [43]:
accs = helper_loop(SVC_classifier, idents, n)

iteration: 1 of 3 ; time elapsed: 0:00:00.006118
iteration: 2 of 3 ; time elapsed: 0:03:35.575648
iteration: 3 of 3 ; time elapsed: 0:07:14.893960
Completed! Time elapsed: 0:11:14.471448


In [44]:
accuracies['SVC'] = accs

In [45]:
# print accuracies of each method and corresponding id which yielded that accuracy (same row)
pandas.DataFrame.from_dict(accs)

Unnamed: 0,phase,breathing,heartbeat,combined br hb,undercomplete,sparse,deep,contractive,test id
0,0.566667,0.475,0.491667,0.433333,0.5,0.508333,0.475,0.533333,3n2f9
1,0.425,0.366667,0.408333,0.466667,0.5,0.491667,0.491667,0.55,2gu87
2,0.475,0.575,0.508333,0.441667,0.5,0.5,0.516667,0.575,iz2ps


In [46]:
# print some statistics for each method
print_accs_stats(accs)

phase accuracies:
- min: 0.425
- max: 0.5666666666666667
- mean: 0.48888888888888893
- median: 0.475

breathing accuracies:
- min: 0.36666666666666664
- max: 0.575
- mean: 0.47222222222222215
- median: 0.475

heartbeat accuracies:
- min: 0.4083333333333333
- max: 0.5083333333333333
- mean: 0.4694444444444444
- median: 0.49166666666666664

combined br hb accuracies:
- min: 0.43333333333333335
- max: 0.4666666666666667
- mean: 0.44722222222222224
- median: 0.44166666666666665

undercomplete accuracies:
- min: 0.5
- max: 0.5
- mean: 0.5
- median: 0.5

sparse accuracies:
- min: 0.49166666666666664
- max: 0.5083333333333333
- mean: 0.5
- median: 0.5

deep accuracies:
- min: 0.475
- max: 0.5166666666666667
- mean: 0.49444444444444446
- median: 0.49166666666666664

contractive accuracies:
- min: 0.5333333333333333
- max: 0.575
- mean: 0.5527777777777778
- median: 0.55



#### Random Forest

In [47]:
from sklearn.ensemble import RandomForestClassifier
def random_forest_classifier():
    model = RandomForestClassifier(n_estimators = 250,
                                     min_samples_split = 10,
                                     min_samples_leaf = 4,
                                     max_features = 'auto',
                                     max_depth = 90,
                                     bootstrap = True)
    return model

Combine the autoencoders with the classifier: 

In [48]:
accs = helper_loop(random_forest_classifier, idents, n)

iteration: 1 of 3 ; time elapsed: 0:00:00.006981
iteration: 2 of 3 ; time elapsed: 0:03:56.278379
iteration: 3 of 3 ; time elapsed: 0:07:58.041324
Completed! Time elapsed: 0:12:17.872026


In [49]:
accuracies['random_forest'] = accs

In [50]:
# print accuracies of each method and corresponding id which yielded that accuracy (same row)
pandas.DataFrame.from_dict(accs)

Unnamed: 0,phase,breathing,heartbeat,combined br hb,undercomplete,sparse,deep,contractive,test id
0,1.0,0.616667,0.425,0.625,1.0,0.991667,0.983333,0.991667,3n2f9
1,0.416667,0.483333,0.316667,0.466667,0.416667,0.491667,0.5,0.391667,2gu87
2,0.375,0.558333,0.558333,0.5,0.525,0.575,0.583333,0.525,iz2ps


In [51]:
# print some statistics for each method
print_accs_stats(accs)

phase accuracies:
- min: 0.375
- max: 1.0
- mean: 0.5972222222222222
- median: 0.4166666666666667

breathing accuracies:
- min: 0.48333333333333334
- max: 0.6166666666666667
- mean: 0.5527777777777778
- median: 0.5583333333333333

heartbeat accuracies:
- min: 0.31666666666666665
- max: 0.5583333333333333
- mean: 0.43333333333333335
- median: 0.425

combined br hb accuracies:
- min: 0.4666666666666667
- max: 0.625
- mean: 0.5305555555555556
- median: 0.5

undercomplete accuracies:
- min: 0.4166666666666667
- max: 1.0
- mean: 0.6472222222222223
- median: 0.525

sparse accuracies:
- min: 0.49166666666666664
- max: 0.9916666666666667
- mean: 0.6861111111111112
- median: 0.575

deep accuracies:
- min: 0.5
- max: 0.9833333333333333
- mean: 0.688888888888889
- median: 0.5833333333333334

contractive accuracies:
- min: 0.39166666666666666
- max: 0.9916666666666667
- mean: 0.6361111111111111
- median: 0.525



#### Naive Bayesian

In [52]:
from sklearn.naive_bayes import ComplementNB

def naive_bayesian_classifier():
    model = ComplementNB()
    return model

Combine the autoencoders with the classifier: 

In [53]:
accs = helper_loop(naive_bayesian_classifier, idents, n)

iteration: 1 of 3 ; time elapsed: 0:00:00.007977
iteration: 2 of 3 ; time elapsed: 0:03:43.846404
iteration: 3 of 3 ; time elapsed: 0:07:34.599746
Completed! Time elapsed: 0:11:46.461805


In [54]:
accuracies['naive_bayesian'] = accs

In [55]:
# print accuracies of each method and corresponding id which yielded that accuracy (same row)
pandas.DataFrame.from_dict(accs)

Unnamed: 0,phase,breathing,heartbeat,combined br hb,undercomplete,sparse,deep,contractive,test id
0,0.508333,0.6,0.533333,0.616667,0.475,0.566667,0.458333,0.633333,3n2f9
1,0.608333,0.366667,0.566667,0.333333,0.541667,0.641667,0.541667,0.516667,2gu87
2,0.508333,0.3,0.525,0.325,0.541667,0.583333,0.508333,0.516667,iz2ps


In [56]:
# print some statistics for each method
print_accs_stats(accs)

phase accuracies:
- min: 0.5083333333333333
- max: 0.6083333333333333
- mean: 0.5416666666666666
- median: 0.5083333333333333

breathing accuracies:
- min: 0.3
- max: 0.6
- mean: 0.4222222222222222
- median: 0.36666666666666664

heartbeat accuracies:
- min: 0.525
- max: 0.5666666666666667
- mean: 0.5416666666666666
- median: 0.5333333333333333

combined br hb accuracies:
- min: 0.325
- max: 0.6166666666666667
- mean: 0.425
- median: 0.3333333333333333

undercomplete accuracies:
- min: 0.475
- max: 0.5416666666666666
- mean: 0.5194444444444444
- median: 0.5416666666666666

sparse accuracies:
- min: 0.5666666666666667
- max: 0.6416666666666667
- mean: 0.5972222222222223
- median: 0.5833333333333334

deep accuracies:
- min: 0.4583333333333333
- max: 0.5416666666666666
- mean: 0.5027777777777778
- median: 0.5083333333333333

contractive accuracies:
- min: 0.5166666666666667
- max: 0.6333333333333333
- mean: 0.5555555555555555
- median: 0.5166666666666667



#### XGBoost

In [57]:
from xgboost import XGBClassifier

def XGBoost_classifier():
    model = XGBClassifier(n_estimators = 83)
    return model

Combine the autoencoders with the classifier: 

In [58]:
accs = helper_loop(XGBoost_classifier, idents, n)

iteration: 1 of 3 ; time elapsed: 0:00:00.009973
iteration: 2 of 3 ; time elapsed: 0:04:15.784391
iteration: 3 of 3 ; time elapsed: 0:08:29.976965
Completed! Time elapsed: 0:13:07.088634


In [59]:
accuracies['XGBoost'] = accs

In [60]:
# print accuracies of each method and corresponding id which yielded that accuracy (same row)
pandas.DataFrame.from_dict(accs)

Unnamed: 0,phase,breathing,heartbeat,combined br hb,undercomplete,sparse,deep,contractive,test id
0,1.0,0.583333,0.45,0.433333,1.0,1.0,0.991667,1.0,3n2f9
1,0.45,0.516667,0.35,0.433333,0.408333,0.433333,0.533333,0.416667,2gu87
2,0.425,0.516667,0.541667,0.483333,0.508333,0.541667,0.55,0.533333,iz2ps


In [61]:
# print some statistics for each method
print_accs_stats(accs)

phase accuracies:
- min: 0.425
- max: 1.0
- mean: 0.625
- median: 0.45

breathing accuracies:
- min: 0.5166666666666667
- max: 0.5833333333333334
- mean: 0.5388888888888889
- median: 0.5166666666666667

heartbeat accuracies:
- min: 0.35
- max: 0.5416666666666666
- mean: 0.44722222222222224
- median: 0.45

combined br hb accuracies:
- min: 0.43333333333333335
- max: 0.48333333333333334
- mean: 0.45
- median: 0.43333333333333335

undercomplete accuracies:
- min: 0.4083333333333333
- max: 1.0
- mean: 0.6388888888888888
- median: 0.5083333333333333

sparse accuracies:
- min: 0.43333333333333335
- max: 1.0
- mean: 0.6583333333333333
- median: 0.5416666666666666

deep accuracies:
- min: 0.5333333333333333
- max: 0.9916666666666667
- mean: 0.6916666666666668
- median: 0.55

contractive accuracies:
- min: 0.4166666666666667
- max: 1.0
- mean: 0.65
- median: 0.5333333333333333



###  Compare Accuracies

Print min, max, mean, median for each clasifier/autoencoder combination:

In [62]:
for classifier in accuracies:
    print("-----------", classifier + ":", "-----------")
    accs = accuracies[classifier]
    print_accs_stats(accs)
    print("\n")

----------- simple_dense: -----------
phase accuracies:
- min: 0.46666667
- max: 0.6166667
- mean: 0.5444445
- median: 0.55

breathing accuracies:
- min: 0.33333334
- max: 0.7083333
- mean: 0.53055555
- median: 0.55

heartbeat accuracies:
- min: 0.43333334
- max: 0.55
- mean: 0.49166667
- median: 0.49166667

combined br hb accuracies:
- min: 0.425
- max: 0.69166666
- mean: 0.55833334
- median: 0.55833334

undercomplete accuracies:
- min: 0.53333336
- max: 0.65833336
- mean: 0.5861111
- median: 0.56666666

sparse accuracies:
- min: 0.5833333
- max: 0.64166665
- mean: 0.60555553
- median: 0.59166664

deep accuracies:
- min: 0.5083333
- max: 0.64166665
- mean: 0.59444445
- median: 0.6333333

contractive accuracies:
- min: 0.45833334
- max: 0.55833334
- mean: 0.51111114
- median: 0.51666665



----------- LSTM: -----------
phase accuracies:
- min: 0.45
- max: 0.69166666
- mean: 0.55833334
- median: 0.53333336

breathing accuracies:
- min: 0.31666666
- max: 0.6333333
- mean: 0.51944447
- me

Print all accuracies in table form:

In [63]:
for classifier in accuracies:
    print(classifier + ":")
    print(pandas.DataFrame.from_dict(accuracies[classifier]))
    print("\n")

simple_dense:
      phase  breathing  heartbeat  combined br hb  undercomplete    sparse  \
0  0.550000   0.550000   0.433333        0.691667       0.566667  0.591667   
1  0.616667   0.708333   0.550000        0.558333       0.658333  0.583333   
2  0.466667   0.333333   0.491667        0.425000       0.533333  0.641667   

       deep  contractive test id  
0  0.508333     0.558333   3n2f9  
1  0.641667     0.516667   2gu87  
2  0.633333     0.458333   iz2ps  


LSTM:
      phase  breathing  heartbeat  combined br hb  undercomplete    sparse  \
0  0.691667   0.608333   0.416667        0.525000       0.566667  0.575000   
1  0.450000   0.633333   0.500000        0.366667       0.541667  0.616667   
2  0.533333   0.316667   0.500000        0.425000       0.683333  0.516667   

       deep  contractive test id  
0  0.566667     0.558333   3n2f9  
1  0.483333     0.566667   2gu87  
2  0.558333     0.575000   iz2ps  


kNN:
      phase  breathing  heartbeat  combined br hb  undercomplete 