# Classifying state of activity with acceleration data

In this document, we evaluate the performance of a one-dimensional convolutional network on multiclass classification of activity from acceleration data, and compare it with other models (logistic regression, SVM, gradient boosting).

### Loading/splitting data

The data are stored in five CSV files. Each row holds three acceleration components (columns 1 to 3) and an integer activity label (column 8).

We first load the data and labels, then print the first few rows of each, along with the set of labels, to confirm that there are four classes.

In [1]:
import pandas as pd
import numpy as np

# Load the five recordings and stack them into a single frame.
files = ['data/motion%d.csv' % i for i in range(1, 6)]
df = pd.concat([pd.read_csv(f, header=None) for f in files])
print('# rows of acceleration data:', len(df))
print('')
acc_data = df.loc[:,1:3]
lbl_data = df.loc[:,8]
print('first 5 lines of acceleration data:')
print(acc_data.head())
print('')
print('first 5 lines of labels:')
print(lbl_data.head())
print('')
print('classes:')
print(set(lbl_data))

# rows of acceleration data: 2018

first 5 lines of acceleration data:
         1        2         3
0  0.27203  1.00820 -0.082102
1  0.27203  1.00820 -0.082102
2  0.44791  0.91636 -0.013684
3  0.44791  0.91636 -0.013684
4  0.34238  0.96229 -0.059296

first 5 lines of labels:
0    1
1    1
2    1
3    1
4    1
Name: 8, dtype: int64

classes:
{1, 2, 3, 4}


We then split the data and labels into training (70%) and test (30%) sets.

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(acc_data, lbl_data, test_size=0.3, shuffle=True)
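
The split above is unseeded, so the numbers reported below will vary from run to run. A minimal sketch of a reproducible, class-balanced split follows. It assumes that fixing a seed and stratifying by label are acceptable here; neither `random_state=0` nor `stratify` appears in the original notebook, and the arrays are synthetic stand-ins for `acc_data` and `lbl_data`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for acc_data / lbl_data (assumption: 3 features, labels 1..4).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.repeat([1, 2, 3, 4], [80, 40, 60, 20])  # imbalanced on purpose

# random_state fixes the shuffle; stratify keeps class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, shuffle=True, random_state=0, stratify=y_demo
)

# Per-class counts in the training and test splits.
print(np.bincount(y_tr)[1:], np.bincount(y_te)[1:])
```

Stratifying matters most for a rare class: with a purely random split, a small class can end up under-represented in the test set, which makes its precision and recall noisy.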

### One-dimensional convolutional network

We will start by evaluating the performance of a 1d conv net on classifying this data. 

The network consists of a single Conv1D layer with 100 filters, a dropout layer, and a softmax output layer over the four classes.

In [3]:
import tensorflow as tf
from tensorflow import keras

# conv net
cnn = keras.Sequential([
    keras.layers.Conv1D(filters=100, kernel_size=2, input_shape=(None, 3), padding='same'), 
    keras.layers.Dropout(0.5), 
    #keras.layers.MaxPooling1D(2, padding='same'), 
    keras.layers.Dense(4, activation='softmax')
])
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

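As in the previous document, we need to reshape the data and labels to the dimensions the network expects: inputs of shape (samples, 1, 3), which treats each reading as a sequence of length 1, and one-hot labels of shape (samples, 1, 4).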

In [4]:
X_train_cnn = np.array(X_train)
X_test_cnn = np.array(X_test)

idx_train = [int(idx) for idx in y_train-1]
idx_test = [int(idx) for idx in y_test-1]
y_train_cnn = np.zeros((len(y_train), 4))
y_train_cnn[np.arange(len(y_train)), idx_train] = 1
y_test_cnn = np.zeros((len(y_test), 4))
y_test_cnn[np.arange(len(y_test)), idx_test] = 1

X_train_cnn = X_train_cnn.reshape(X_train_cnn.shape[0], 1, X_train_cnn.shape[1])
X_test_cnn = X_test_cnn.reshape(X_test_cnn.shape[0], 1, X_test_cnn.shape[1])
y_train_cnn = y_train_cnn.reshape(y_train_cnn.shape[0], 1, y_train_cnn.shape[1])
y_test_cnn = y_test_cnn.reshape(y_test_cnn.shape[0], 1, y_test_cnn.shape[1])
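
The one-hot encoding and reshaping above can be written more compactly. The following is a sketch, assuming the labels are the integers 1 to 4 as printed earlier; `to_one_hot` is a hypothetical helper and not part of the notebook.

```python
import numpy as np

def to_one_hot(labels, num_classes=4):
    """Map labels 1..num_classes to one-hot rows of shape (n, 1, num_classes)."""
    idx = np.asarray(labels, dtype=int) - 1   # shift to 0-based class indices
    one_hot = np.eye(num_classes)[idx]        # row i of the identity is the one-hot vector for index i
    return one_hot[:, np.newaxis, :]          # add the length-1 sequence axis

y_demo = to_one_hot([1, 4, 2])
print(y_demo.shape)
```

Calling `to_one_hot(y_train)` should give the same array as `y_train_cnn` in the cell above, under the stated assumption about the labels.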

We then train the model and evaluate it on the test set using accuracy, precision, recall, and F1-score.

In [5]:
cnn.fit(X_train_cnn, y_train_cnn, validation_split=0.3, epochs=50)
loss, acc = cnn.evaluate(X_test_cnn, y_test_cnn)
print('1d conv net results:')
print('loss:', loss)
print('accuracy:', acc)

from sklearn.metrics import precision_score, recall_score, f1_score
y_pred_cnn_test = cnn.predict(X_test_cnn)
y_pred_cnn_test = np.argmax(y_pred_cnn_test, axis=2)
y_pred_cnn_test = [row[0] for row in (y_pred_cnn_test+1)]
print('precision for 4 classes:', precision_score(y_test, y_pred_cnn_test, average=None))
print('recall for 4 classes:', recall_score(y_test, y_pred_cnn_test, average=None))
print('f1-score for 4 classes:', f1_score(y_test, y_pred_cnn_test, average=None))

Train on 988 samples, validate on 424 samples
Epoch 1/50
...
Epoch 50/50
1d conv net results:
loss: 0.24605347122689677
accuracy: 0.9092409229121192
precision for 4 classes: [0.77486911 0.88679245 1.         0.64705882]
recall for 4 classes: [0.925      0.75806452 1.         0.28205128]
f1-score for 4 classes: [0.84330484 0.8173913  1.         0.39285714]


This network reaches about 91% accuracy on the test set. F1-scores are 0.84, 0.82, and 1.00 for the first three classes, but only 0.39 for class 4, mostly because of its low recall (0.28).
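
To see how the class 4 F1-score follows from the precision and recall printed above, recall that F1 is their harmonic mean. A small check using the reported values (`f1_from` is a hypothetical helper introduced only for this illustration):

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Class 4 values copied from the 1d conv net output above.
f1_class4 = f1_from(0.64705882, 0.28205128)
print(round(f1_class4, 4))
```

Because the harmonic mean is dominated by the smaller of the two values, the low recall pulls the F1-score down even though precision is moderate.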

### Comparison with other models

To put the results of the 1d conv net in context, we run other models on the same train/test split.

We first run logistic regression and an SVM on our data.

In [6]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression().fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)  # predict once and reuse for every metric
print('logistic regression results:')
print('accuracy', accuracy_score(y_test, y_pred_lr))
print('precision for 4 classes:', precision_score(y_test, y_pred_lr, average=None))
print('recall for 4 classes:', recall_score(y_test, y_pred_lr, average=None))
print('f1-score for 4 classes:', f1_score(y_test, y_pred_lr, average=None))
print('')

from sklearn.svm import SVC
sv = SVC().fit(X_train, y_train)
y_pred_sv = sv.predict(X_test)  # predict once and reuse for every metric
print('svm results:')
print('accuracy', accuracy_score(y_test, y_pred_sv))
print('precision for 4 classes:', precision_score(y_test, y_pred_sv, average=None))
print('recall for 4 classes:', recall_score(y_test, y_pred_sv, average=None))
print('f1-score for 4 classes:', f1_score(y_test, y_pred_sv, average=None))

logistic regression results:
accuracy 0.8712871287128713
precision for 4 classes: [0.67521368 1.         1.         0.5       ]
recall for 4 classes: [0.9875     0.37096774 1.         0.05128205]
f1-score for 4 classes: [0.80203046 0.54117647 1.         0.09302326]

svm results:
accuracy 0.8811881188118812
precision for 4 classes: [0.70183486 0.85714286 0.99710983 0.        ]
recall for 4 classes: [0.95625    0.58064516 1.         0.        ]
f1-score for 4 classes: [0.80952381 0.69230769 0.99855282 0.        ]




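Both models have only slightly lower accuracy (about 87% and 88%), but their per-class precision and recall are much worse: class 2 recall drops to 0.37 and 0.58, and class 4 is almost never identified (recall of 0.05 for logistic regression and 0 for the SVM).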
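
A precision of 0, like the SVM's class 4 value, can arise when a class is never predicted at all. In that case precision is undefined, and scikit-learn substitutes 0 by default. The sketch below makes that choice explicit. It assumes scikit-learn 0.22 or later, where `precision_score` accepts `zero_division`; the labels are made up for illustration.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true_demo = np.array([1, 1, 2, 3, 4, 4])
y_pred_demo = np.array([1, 1, 2, 3, 1, 2])   # class 4 is never predicted

# zero_division=0 reports 0 for the undefined precision without emitting a warning.
prec = precision_score(y_true_demo, y_pred_demo, average=None, zero_division=0)
print(prec)
```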

We will now try the gradient boosting classifier. 

In [7]:
from sklearn.ensemble import GradientBoostingClassifier
gb = GradientBoostingClassifier().fit(X_train, y_train)
y_pred_gb = gb.predict(X_test)  # predict once and reuse for every metric
print('gradient boosting results:')
print('accuracy', accuracy_score(y_test, y_pred_gb))
print('precision for 4 classes:', precision_score(y_test, y_pred_gb, average=None))
print('recall for 4 classes:', recall_score(y_test, y_pred_gb, average=None))
print('f1-score for 4 classes:', f1_score(y_test, y_pred_gb, average=None))

gradient boosting results:
accuracy 0.9620462046204621
precision for 4 classes: [0.89714286 0.93333333 1.         0.96296296]
recall for 4 classes: [0.98125    0.90322581 0.99710145 0.66666667]
f1-score for 4 classes: [0.93731343 0.91803279 0.99854862 0.78787879]


The gradient boosting classifier works really well in this setup: accuracy rises to about 96%, and F1-scores are above 0.9 for the first three classes. Class 4 remains the hardest, with a precision of 0.96 but a recall of 0.67.
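
Since every scikit-learn model above is scored the same way, the comparison can be wrapped in one helper. This is a sketch only: `summarize` is a hypothetical function, the data are synthetic, and the models are scored on their own training data to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score

def summarize(model, X_eval, y_eval):
    """Predict once and return (accuracy, per-class F1)."""
    y_pred = model.predict(X_eval)
    return accuracy_score(y_eval, y_pred), f1_score(y_eval, y_pred, average=None)

# Synthetic, well-separated data standing in for the acceleration features.
rng = np.random.default_rng(0)
y_demo = np.repeat([1, 2, 3, 4], 50)
X_demo = rng.normal(scale=0.1, size=(200, 3)) + y_demo[:, None]

results = {}
for name, model in [('logistic regression', LogisticRegression()),
                    ('gradient boosting', GradientBoostingClassifier(random_state=0))]:
    # Fit and score on the same synthetic data (illustration only).
    results[name] = summarize(model.fit(X_demo, y_demo), X_demo, y_demo)
    print(name, results[name][0])
```

In the notebook itself, the same helper would be called with the fitted `lr`, `sv`, and `gb` models and the held-out `X_test` and `y_test`.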