# PAMAP 2 Model 1: Artificial Neural Network
#### Dataset Source: https://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring
#### Cleaning code source: https://www.kaggle.com/avrahamcalev/time-series-models-pamap2-dataset

This is the same architecture as our ANN, tested on the PAMAP2 dataset to validate the model architecture.

INPUT: **pamap2_clean.csv**



### Import libraries

In [4]:
import random

import numpy as np
import pandas as pd
import tensorflow as tf
import keras
from keras import models
from keras import layers
from keras.utils import np_utils

### Import data

In [5]:
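random.seed(321)  # seeds Python's RNG so the subject shuffle below is reproducible (NumPy/TensorFlow are not seeded)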

In [6]:
df = pd.read_csv('../../../../10_code/40_usable_data_for_models/42_PAMAP2/pamap2_clean.csv')

### Take only relevant classes
Classes 1, 3 and 4 are the classes most representative of our own data.

**Classes**
- 1: Lying (DB)
- 3: Standing
- 4: Walking

See the readme at the dataset source for the full list of activity IDs if you want to pick other classes.

In [7]:
df = df[df['activity_id'].isin([1, 3, 4])].copy()  # .copy() avoids SettingWithCopyWarning on the casts below

In [8]:
df['activity_id'] = df['activity_id'].astype(int)
df['id'] = df['id'].astype(int)

### Split into test and train sets (by Subject ID)

In [9]:
ID_list = list(df['id'].unique())
random.shuffle(ID_list)  # shuffle subjects so the train/test assignment is random

The size of the train/test split can be changed via the slice index below. For our purposes, the first 6 shuffled subjects are used for training and the remaining subjects for testing.
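The key property of this split is that no subject appears in both sets. A minimal stand-alone sketch of the same idea, assuming a hypothetical helper `split_subjects` and made-up subject IDs, is shown here; unlike the notebook, it sorts the IDs and uses a local `random.Random` so the result does not depend on row order or global RNG state.

```python
import random

def split_subjects(subject_ids, n_train, seed=321):
    """Shuffle unique subject IDs and split them into train/test groups (hypothetical helper)."""
    ids = sorted(set(subject_ids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)       # local RNG instead of the global random.seed
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# Hypothetical per-row subject IDs (several rows per subject)
rows = [101, 101, 102, 103, 103, 104, 105, 106, 107, 108]
train_ids, test_ids = split_subjects(rows, n_train=6)

# Rows are assigned by subject, so no subject can leak across the split
train_rows = [r for r in rows if r in train_ids]
test_rows = [r for r in rows if r in test_ids]
print(train_ids, test_ids)
```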

In [10]:
train = df[df['id'].isin(ID_list[:6])]
test = df[df['id'].isin(ID_list[6:])]

In [11]:
print(train.shape, test.shape)

(463666, 55) (157549, 55)


### Prepare data for input into neural networks

In [12]:
no_act = ['id', 'heart_rate', 'hand_temperature',
       'hand_3D_acceleration_16_x', 'hand_3D_acceleration_16_y',
       'hand_3D_acceleration_16_z', 'hand_3D_acceleration_6_x',
       'hand_3D_acceleration_6_y', 'hand_3D_acceleration_6_z',
       'hand_3D_gyroscope_x', 'hand_3D_gyroscope_y', 'hand_3D_gyroscope_z',
       'hand_3D_magnetometer_x', 'hand_3D_magnetometer_y',
       'hand_3D_magnetometer_z', 'hand_4D_orientation_x',
       'hand_4D_orientation_y', 'hand_4D_orientation_z',
       'hand_4D_orientation_w', 'chest_temperature',
       'chest_3D_acceleration_16_x', 'chest_3D_acceleration_16_y',
       'chest_3D_acceleration_16_z', 'chest_3D_acceleration_6_x',
       'chest_3D_acceleration_6_y', 'chest_3D_acceleration_6_z',
       'chest_3D_gyroscope_x', 'chest_3D_gyroscope_y', 'chest_3D_gyroscope_z',
       'chest_3D_magnetometer_x', 'chest_3D_magnetometer_y',
       'chest_3D_magnetometer_z', 'chest_4D_orientation_x',
       'chest_4D_orientation_y', 'chest_4D_orientation_z',
       'chest_4D_orientation_w', 'ankle_temperature',
       'ankle_3D_acceleration_16_x', 'ankle_3D_acceleration_16_y',
       'ankle_3D_acceleration_16_z', 'ankle_3D_acceleration_6_x',
       'ankle_3D_acceleration_6_y', 'ankle_3D_acceleration_6_z',
       'ankle_3D_gyroscope_x', 'ankle_3D_gyroscope_y', 'ankle_3D_gyroscope_z',
       'ankle_3D_magnetometer_x', 'ankle_3D_magnetometer_y',
       'ankle_3D_magnetometer_z', 'ankle_4D_orientation_x',
       'ankle_4D_orientation_y', 'ankle_4D_orientation_z',
       'ankle_4D_orientation_w']

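Here we choose the feature columns: everything except the timestamp and the activity label (`activity_id`). The subject `id` is kept for now so that it can be one-hot encoded below.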

In [13]:
X_train = train[no_act]
X_test = test[no_act]

print(X_train.shape, X_test.shape)
train_shape = len(X_train)

(463666, 53) (157549, 53)


In [14]:
X_train_df = X_train  # keep the unscaled DataFrame: its 'id' column supplies the cross-validation groups later

### Get y labels (Activity)

In [15]:
y_train = train['activity_id'].values
y_test = test['activity_id'].values

### Apply one-hot encoding to Subject ID

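One-hot encoding is applied to the subject ID so that it can be used as an input variable in our model. The intent is to let the model account for the fact that the test data contains new subjects who are not present in the training data. Note that, because encoding is done on the combined data, the dummy columns for test-only subjects are all zero in the training rows, and `drop_first=True` drops the column for the first subject.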
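A small sketch of this encoding on toy data (the frames and subject IDs below are made up) shows why train and test are concatenated before calling `pd.get_dummies`: both halves end up with identical dummy columns, and with `drop_first=True` there are k - 1 columns for k subjects.

```python
import pandas as pd

# Toy stand-ins for X_train / X_test: two training subjects, one unseen test subject
train_part = pd.DataFrame({'id': [101, 101, 102], 'heart_rate': [80.0, 82.0, 90.0]})
test_part = pd.DataFrame({'id': [105, 105], 'heart_rate': [70.0, 71.0]})

# Encode on the concatenated frame so train and test share the same dummy columns
full = pd.concat([train_part, test_part], ignore_index=True)
dummies = pd.get_dummies(full['id'], prefix='id', drop_first=True)  # 3 subjects -> 2 columns
full_dummy = pd.concat([full, dummies], axis=1).drop(columns=['id'])

# Split back by row position, as the notebook does with train_shape
train_enc = full_dummy.iloc[:len(train_part)]
test_enc = full_dummy.iloc[len(train_part):]
print(list(full_dummy.columns))
# The unseen subject's column is all zero in the training rows
print(train_enc['id_105'].sum(), test_enc['id_105'].sum())
```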

In [16]:
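# Concatenate train and test so both get the same dummy columns (DataFrame.append was removed in pandas 2.0)
full_df = pd.concat([X_train, X_test])
full_df_dummy = pd.concat([full_df, pd.get_dummies(full_df['id'], prefix='id', drop_first=True)], axis=1)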
full_df_dummy = full_df_dummy.drop(['id'], axis = 1)
full_df_dummy.shape

(621215, 59)

In [17]:
X_train = full_df_dummy.iloc[:train_shape]
X_test = full_df_dummy.iloc[train_shape:]
print(X_train.shape,  y_train.shape, X_test.shape, y_test.shape)

(463666, 59) (463666,) (157549, 59) (157549,)


### Scaling features

Scaling rescales each sensor's values without distorting the differences between values within that sensor. We do this because the sensors report values in very different ranges; without scaling, gradients may oscillate back and forth and take longer to converge to a minimum. Scaling may not be strictly necessary for this data, but to be safe we standardized the features.

The standard score of a sample x is calculated as:

$$z = \frac{x-u}{s}$$

where u is the mean and s is the standard deviation of that feature, computed over all training samples. The scaler is fit on the training set only and then applied to both the training and test sets, so no test-set statistics leak into training.
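The formula can be checked with a short NumPy sketch. It assumes the population standard deviation (`ddof=0`), which is what scikit-learn's `StandardScaler` uses, and mirrors its behaviour of leaving zero-variance features unscaled; the helper name `standardize` and the data are illustrative only.

```python
import numpy as np

def standardize(X_train, X_test):
    """z = (x - u) / s per feature, with u and s taken from the training set only."""
    u = X_train.mean(axis=0)       # per-feature mean over training samples
    s = X_train.std(axis=0)        # per-feature population std (ddof=0)
    s = np.where(s == 0, 1.0, s)   # constant features: divide by 1 instead of 0
    return (X_train - u) / s, (X_test - u) / s

# Two hypothetical sensor channels on very different scales
X_tr = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
X_te = np.array([[4.0, 200.0]])
Z_tr, Z_te = standardize(X_tr, X_te)
print(Z_tr.mean(axis=0), Z_tr.std(axis=0))  # per-feature mean ≈ 0, std ≈ 1
print(Z_te)                                 # test rows reuse the training u and s
```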

In [18]:
from sklearn.preprocessing import StandardScaler

In [19]:
ss = StandardScaler()

In [20]:
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

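Because the model is trained with the `categorical_crossentropy` loss, the target labels must be one-hot encoded (integer labels would instead require `sparse_categorical_crossentropy`). `to_categorical` creates one column for every integer from 0 up to the largest label, so activity IDs 1, 3 and 4 produce five columns, of which columns 0 and 2 are never used; this is why the output layer has 5 units.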
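The column layout this produces can be reproduced with a NumPy sketch. This is an assumed equivalent of `to_categorical` for non-negative integer labels (one column per integer from 0 to the maximum label), not the Keras implementation itself; the label array is made up.

```python
import numpy as np

def one_hot(labels):
    """One column per integer 0..max(labels), like to_categorical with integer labels."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(labels.max() + 1, dtype=np.float32)[labels]

y = np.array([1, 3, 4, 1])   # hypothetical activity IDs
Y = one_hot(y)
print(Y.shape)               # (4, 5): columns 0 and 2 are never used
print(Y)
```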

In [21]:
y_train_dummy = np_utils.to_categorical(y_train)
y_test_dummy = np_utils.to_categorical(y_test)

In [22]:
y_train_dummy

array([[0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       ...,
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1.]], dtype=float32)

### Neural Network
#### Architecture:
- 7 hidden **fully connected** (Dense) layers with 128 nodes each and ReLU activation

- A **Dropout** layer (rate 0.5) after the third hidden layer randomly sets its input units to 0 with probability 0.5 at each training step, which helps prevent overfitting.

- **Softmax** activation function: used in the final fully connected layer to output a probability for each class
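As a sanity check on the architecture listed above, the number of trainable parameters can be worked out by hand. This sketch assumes 59 input features and 5 output units, as in the code below; the total should match what `network3.summary()` reports, though that is not shown in this notebook.

```python
def dense_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

n_features, width, n_hidden, n_classes = 59, 128, 7, 5

total = dense_params(n_features, width)               # first hidden layer
total += (n_hidden - 1) * dense_params(width, width)  # remaining 6 hidden layers
total += dense_params(width, n_classes)               # softmax output layer
# Dropout has no trainable parameters
print(total)  # 107397
```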

We use Adam as our optimizer because it is computationally efficient and adapts the step size for each parameter individually, based on exponential moving averages of that parameter's gradient and of its squared gradient.
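A single Adam update can be sketched in NumPy to show the per-parameter behaviour described above. This is the textbook form with the Keras default hyperparameters (`learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`); Keras' own implementation differs in minor details, and `adam_step` is a hypothetical helper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for parameters theta at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero initialisation
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step
    return theta, m, v

theta = np.zeros(2)
m, v = np.zeros(2), np.zeros(2)
grad = np.array([0.001, 1000.0])  # gradients six orders of magnitude apart
theta, m, v = adam_step(theta, grad, m, v, t=1)
print(theta)  # both parameters move by roughly lr = 0.001
```

Because the step is normalised by the square root of the squared-gradient average, parameters with tiny and huge gradients take similarly sized first steps, which is why unscaled inputs hurt Adam less than plain SGD.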

In [23]:
from keras.callbacks import ModelCheckpoint
import datetime
model_checkpoint = ModelCheckpoint('./models/HARnet.hdf5', monitor='val_loss', verbose=1, save_best_only=True)
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

Model checkpoint is used to save the best model. In the code above, the best model is defined as the one with the lowest validation loss (`monitor='val_loss'` with `save_best_only=True`).
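The "save only the best" rule can be sketched in plain Python. This is a simplified model of `ModelCheckpoint(save_best_only=True, monitor='val_loss')`, not the Keras code: a save happens only when the monitored value improves on the best seen so far. Note that `val_loss` only exists when `fit` receives validation data (`validation_data` or `validation_split`); otherwise Keras warns and skips saving, which the `None` entries below stand for.

```python
import math

def epochs_saved(val_losses):
    """Return the epochs at which a save_best_only checkpoint would write the model."""
    best = math.inf
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss is None:
            continue  # monitored metric missing: nothing is saved
        if loss < best:  # 'val_loss' is minimised
            best = loss
            saved.append(epoch)
    return saved

print(epochs_saved([0.9, 0.7, 0.8, 0.5, 0.6]))  # [1, 2, 4]
print(epochs_saved([None] * 5))                 # []
```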

In [24]:
dict1 = dict.fromkeys(['id', 'acc'])  # a list (not a set) keeps the CSV column order fixed

In [None]:
from sklearn.model_selection import LeaveOneGroupOut

# Lists to store metrics
acc_per_fold = []
loss_per_fold = []
dict_list = []

# Leave-one-subject-out cross-validator: each fold holds out one training subject
groups = X_train_df['id'].values
inputs = X_train
targets = y_train_dummy
logo = LeaveOneGroupOut()

logo.get_n_splits(inputs, targets, groups)

cv = logo.split(inputs, targets, groups)

fold_no = 1
# train_idx/test_idx are row indices; renamed so they do not overwrite the train/test DataFrames
for train_idx, test_idx in cv:
    # Define the model architecture (only the first layer needs input_dim)
    network3 = models.Sequential()
    network3.add(layers.Dense(128, activation='relu', input_dim=59))
    network3.add(layers.Dense(128, activation='relu'))
    network3.add(layers.Dense(128, activation='relu'))
    network3.add(layers.Dropout(0.5))
    network3.add(layers.Dense(128, activation='relu'))
    network3.add(layers.Dense(128, activation='relu'))
    network3.add(layers.Dense(128, activation='relu'))
    network3.add(layers.Dense(128, activation='relu'))
    network3.add(layers.Dense(5, activation='softmax'))
    network3.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')

    # Fit data to model
    # NOTE: no validation data is passed, so 'val_loss' is unavailable and
    # ModelCheckpoint(monitor='val_loss', save_best_only=True) will skip saving
    history = network3.fit(inputs[train_idx], targets[train_idx],
                           batch_size=32,
                           epochs=5,
                           verbose=1,
                           callbacks=[model_checkpoint])

    # Generate generalization metrics on the held-out subject
    scores = network3.evaluate(inputs[test_idx], targets[test_idx], verbose=0)
    print(f'Score for fold {fold_no}: {network3.metrics_names[0]} of {scores[0]}; {network3.metrics_names[1]} of {scores[1]*100}%')
    acc_per_fold.append(scores[1] * 100)
    loss_per_fold.append(scores[0])

    # Increase fold number
    fold_no = fold_no + 1


# == Provide average scores ==
print('------------------------------------------------------------------------')
print('Score per fold')

# LeaveOneGroupOut yields folds in sorted (np.unique) order, so fold i held out
# np.unique(groups)[i], not ID_list[i] (ID_list is in shuffled order)
held_out_ids = np.unique(groups)
for i in range(0, len(acc_per_fold)):
    print('------------------------------------------------------------------------')
    print(f'> Fold {i+1} - Loss: {loss_per_fold[i]} - Accuracy: {acc_per_fold[i]}%')
    dict_new = dict1.copy()
    dict_new['id'] = held_out_ids[i]
    dict_new['acc'] = acc_per_fold[i]
    dict_list.append(dict_new)
print('------------------------------------------------------------------------')
print('Average scores for all folds:')
print(f'> Accuracy: {np.mean(acc_per_fold)} (+- {np.std(acc_per_fold)})')
print(f'> Loss: {np.mean(loss_per_fold)}')
print('------------------------------------------------------------------------')

------------------------------------------------------------------------
Training for fold 1 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Score for fold 1: loss of 37.284629821777344; accuracy of 69.47875618934631%
------------------------------------------------------------------------
Training for fold 2 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Score for fold 2: loss of 2.0920610427856445; accuracy of 98.62642288208008%
------------------------------------------------------------------------
Training for fold 3 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Score for fold 3: loss of 0.07124839723110199; accuracy of 99.03648495674133%
------------------------------------------------------------------------
Training for fold 4 ...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Score for fold 4: loss of 0.28446343541145325; accuracy of 98.81771206855774%
------------------------------------------------------------------------
Training for fold 5 ...
Epoch
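The leave-one-subject-out scheme used in the cross-validation cell can be written from scratch in a few lines. This NumPy sketch (the helper name and group values are hypothetical) follows the same fold order as scikit-learn's `LeaveOneGroupOut`, which iterates over `np.unique(groups)`, i.e. sorted group IDs rather than the order in which groups appear. That order matters when mapping per-fold scores back to subject IDs.

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per unique group."""
    groups = np.asarray(groups)
    for g in np.unique(groups):  # sorted order, like sklearn's LeaveOneGroupOut
        test_mask = groups == g
        yield g, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

groups = np.array([105, 105, 101, 103, 101, 103])  # hypothetical subject ID per row
folds = list(leave_one_group_out(groups))
for g, tr, te in folds:
    print(g, tr, te)
```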

### Prediction

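`np.argmax` selects, for each test sample, the output column with the highest predicted probability; these are the predicted labels for our test data. Because the targets were one-hot encoded directly from the activity IDs, the column index is the activity ID itself. Note that `network3` at this point is the model trained in the last cross-validation fold.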
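A tiny example with made-up softmax outputs shows how the argmax over the five output columns maps straight back to the activity IDs 1, 3 and 4.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 5 columns (indices 0-4)
probs = np.array([
    [0.01, 0.90, 0.02, 0.05, 0.02],
    [0.00, 0.10, 0.01, 0.80, 0.09],
    [0.02, 0.03, 0.00, 0.15, 0.80],
])
y_pred = np.argmax(probs, axis=-1)  # column index of the most probable class
print(y_pred)  # [1 3 4]
```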

In [None]:
y_pred = np.argmax(network3.predict(X_test), axis=-1)

In [None]:
results = network3.evaluate(X_test, y_test_dummy, batch_size=128)
print("Test loss, Test acc:", results)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

A **confusion matrix** is generated to show which classes the model classifies well and which classes it confuses with one another.
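A from-scratch sketch (hypothetical labels, helper name `confusion` assumed) clarifies the conventions: rows are actual classes and columns are predicted classes, as in scikit-learn's `confusion_matrix(y_true, y_pred)`. Row-normalising needs `keepdims=True`; dividing by a plain 1-D row-sum vector broadcasts across columns instead and gives rows that no longer sum to 1. scikit-learn can also do this directly with `confusion_matrix(y_true, y_pred, labels=..., normalize='true')`.

```python
import numpy as np

def confusion(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

y_true = [1, 1, 1, 1, 3, 3, 4, 4]
y_pred = [1, 1, 1, 3, 3, 4, 4, 4]
cm = confusion(y_true, y_pred, labels=[1, 3, 4])

cm_norm = cm / cm.sum(axis=1, keepdims=True)  # each row divided by its own total
wrong = cm / cm.sum(axis=1)                   # pitfall: divides column j by row total j
print(cm_norm)
print(wrong.sum(axis=1))  # rows do not sum to 1
```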

In [None]:
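# Rows = actual labels, columns = predicted labels; fix the label order to match the tick labels
cm = confusion_matrix(y_test, y_pred, labels=[1, 3, 4])
# Normalize each row by its total so the diagonal shows per-class recall
cm = cm / cm.sum(axis=1, keepdims=True)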

In [None]:
import seaborn as sns
from matplotlib import pyplot as plt

ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt='.2f', cmap='Blues', xticklabels=['Lying', 'Standing', 'Walking'], yticklabels=['Lying', 'Standing', 'Walking'], ax=ax)
ax.set_xlabel("Predicted labels")
ax.set_ylabel('Actual labels')
plt.title('PAMAP2 ANN Confusion Matrix')
plt.savefig('PAMAP2_ANN_conf_matrix.png')

The **accuracy** score represents the proportion of correct classifications over all classifications.

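The **F1 score** is a composite of two other metrics:

Precision: proportion of correct 'positive' predictions over all 'positive' predictions.

Recall (sensitivity): proportion of actual positives that are correctly predicted as positive.

The F1 score is their harmonic mean:

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

For multiple classes, F1 is computed per class and then averaged; `average='weighted'` weights each class by its number of samples. The F1 score gives insight into whether all classes are predicted correctly at the same rate: a low F1 score with high accuracy can indicate that only the majority class is being predicted.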
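To make the accuracy-versus-F1 point concrete, here is a from-scratch sketch of per-class precision, recall and F1 on made-up labels, for a degenerate classifier that always predicts the majority class. It also compares the unweighted (macro) average with a support-weighted average like `average='weighted'`; the helper name `per_class_prf` is hypothetical, and undefined ratios (0/0) are set to 0 here.

```python
import numpy as np

def per_class_prf(y_true, y_pred, labels):
    """Per-class precision, recall, F1 (one-vs-rest) and class supports."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p, r, f, support = [], [], [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0  # class never predicted -> 0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p.append(prec); r.append(rec); f.append(f1); support.append(np.sum(y_true == c))
    return np.array(p), np.array(r), np.array(f), np.array(support)

# Always predicts class 1 (the majority class)
y_true = np.array([1] * 8 + [3, 4])
y_pred = np.array([1] * 10)
_, _, f1, support = per_class_prf(y_true, y_pred, labels=[1, 3, 4])

accuracy = np.mean(y_true == y_pred)
macro_f1 = f1.mean()                                # every class counts equally
weighted_f1 = np.sum(f1 * support) / support.sum()  # weighted by class frequency
print(accuracy, round(macro_f1, 3), round(weighted_f1, 3))  # 0.8 0.296 0.711
```

The macro average exposes the ignored minority classes much more strongly than the weighted average, which stays closer to accuracy.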

In [None]:
a_s = accuracy_score(y_test, y_pred)
f1_s = f1_score(y_test, y_pred, average = 'weighted')
print(f'Accuracy Score: {a_s} \nF1 Score: {f1_s}')

In [None]:
network3.save("./model/")

In [None]:
pd.DataFrame(dict_list).to_csv('ANN_both_results.csv', index = False)