# Human Activity Recognition using TSFEL

In this example we will perform Human Activity Recognition using our library **Time Series Features Library**. 

The first step consists of importing the library. Press play. 

The import can take a few seconds, but the play button will change so that you know the import has started. Please, try to import just once.

In [0]:
#@title Import Time Series Features Library
import warnings
warnings.filterwarnings('ignore')
!git clone https://github.com/TSFDlib/TSFEL.git >/dev/null 2>&1
!pip install --upgrade -q gspread >/dev/null 2>&1
!pip install gspread oauth2client >/dev/null 2>&1
!pip install pandas >/dev/null 2>&1
!pip install scipy >/dev/null 2>&1
!pip install novainstrumentation >/dev/null 2>&1
!pip install pandas_profiling >/dev/null 2>&1
!pip install matplotlib

To check that everything was imported correctly, open "Files" (on the left side of the screen) and press "Refresh". If a TSFEL folder does not appear, run the import again.

---


# Dataset

The dataset we will be using is open-source. It is provided by UCI and was recorded by 30 volunteers carrying a smartphone on the waist. It contains 6 activities: 

*   Walking
*   Standing
*   Sitting
*   Laying
*   Upstairs
*   Downstairs

To access this dataset click [here](https://archive.ics.uci.edu/ml/machine-learning-databases/00240/) and download the zip "UCI_HAR_Dataset". Then unzip the folder and upload the following txt files to "Files":

*   UCI HAR Dataset/train/Inertial Signals/total_acc_x_train.txt
*   UCI HAR Dataset/train/Inertial Signals/total_acc_y_train.txt
*   UCI HAR Dataset/train/Inertial Signals/total_acc_z_train.txt
*   UCI HAR Dataset/test/Inertial Signals/total_acc_x_test.txt
*   UCI HAR Dataset/test/Inertial Signals/total_acc_y_test.txt
*   UCI HAR Dataset/test/Inertial Signals/total_acc_z_test.txt
*   UCI HAR Dataset/train/y_train.txt
*   UCI HAR Dataset/test/y_test.txt
*   UCI HAR Dataset/activity_labels.txt
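
Before running the next cell, it can help to confirm that every file in the list above was uploaded. The sketch below assumes the files sit directly in the working directory, which is what the `np.loadtxt` calls in the next cell expect. The helper name `missing_files` is hypothetical and not part of TSFEL.

```python
import os

# File names from the list above, with the folder part stripped because the
# files are assumed to be uploaded directly into the working directory.
REQUIRED_FILES = [
    "total_acc_x_train.txt",
    "total_acc_y_train.txt",
    "total_acc_z_train.txt",
    "total_acc_x_test.txt",
    "total_acc_y_test.txt",
    "total_acc_z_test.txt",
    "y_train.txt",
    "y_test.txt",
    "activity_labels.txt",
]


def missing_files(names, folder="."):
    """Return the names that are not present as files in `folder`."""
    return [n for n in names if not os.path.isfile(os.path.join(folder, n))]


missing = missing_files(REQUIRED_FILES)
if missing:
    print("Still to upload:", ", ".join(missing))
else:
    print("All files found.")
```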






In [0]:
#@title Data Preparation
# Import libraries
import pandas as pd 
import numpy as np
import TSFEL as tslib
import matplotlib.pyplot as plt 
import seaborn as sns
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
sns.set()

# Load data
x_train_sig = np.loadtxt('total_acc_x_train.txt', dtype='float32')
x_test_sig = np.loadtxt('total_acc_x_test.txt', dtype='float32')
y_test = np.loadtxt('y_test.txt')
y_train = np.loadtxt('y_train.txt')
activity_labels = np.array(pd.read_csv('activity_labels.txt', header=None, delimiter=' '))[:,1]

In [0]:
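#@title Signal Preview
plt.close('all')
plt.figure()
# Plot the first five windows one after another
plt.plot(np.concatenate(x_train_sig[0:5]))
# The x axis is the sample index; the total acceleration is stored in units of g
plt.xlabel("Sample")
plt.ylabel("Acceleration (g)")
plt.title("Accelerometer Signal")
# legend() expects a list of labels; a bare string is split into characters
plt.legend(['x axis'])
plt.show()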

# Feature Extraction

The features to extract are defined in the [google sheet](https://docs.google.com/spreadsheets/d/15Db3m7if7xkZBqHDUXtFxrwIcBqKvIBU0XnV6aKa4SI/edit?ts=5bd1eca0#gid=0). Save a copy on your local drive and share it with featext@featext.iam.gserviceaccount.com.

Through **Feature Extraction** methodologies, the data is translated into a feature vector containing information about the signal properties of each window. These properties can be classified according to their domain as Time, Frequency and Statistical features, and they characterise the signal in a compact way, enhancing its characteristics. These features are used as input to the machine learning classifier, so the chosen set of features can strongly influence the classification output.

A feature report is saved in /utils.
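
To make the three feature domains concrete, the sketch below computes a few illustrative features for a single window using plain NumPy. It is not TSFEL's implementation: the function name `window_features`, the choice of features and the 50 Hz sampling frequency are assumptions made for illustration, and TSFEL's own definitions may differ.

```python
import numpy as np

FS = 50.0  # assumed sampling frequency in Hz (not stated in this notebook)


def window_features(window, fs=FS):
    """Return a few illustrative features for one 1-D signal window."""
    window = np.asarray(window, dtype=float)
    centered = window - window.mean()

    # Statistical domain
    mean = window.mean()
    std = window.std()

    # Time domain: fraction of consecutive sample pairs where the
    # mean-removed signal changes sign
    signs = np.sign(centered)
    zero_cross_rate = float(np.mean(signs[:-1] * signs[1:] < 0))

    # Frequency domain: power-weighted mean frequency (spectral centroid)
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    centroid = np.sum(freqs * power) / np.sum(power) if power.sum() > 0 else 0.0

    return {
        "mean": mean,
        "std": std,
        "zero_cross_rate": zero_cross_rate,
        "spectral_centroid": centroid,
    }


# Usage on a synthetic 128-sample window: a 6.25 Hz sine around 1.0
t = np.arange(128) / FS
demo_window = 1.0 + 0.1 * np.sin(2 * np.pi * 6.25 * t)
print(window_features(demo_window))
```

Applying such a function to every row of `x_train_sig` would give one feature vector per window, which is the shape of table the extraction step produces.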

In [23]:
#@title Feature Extraction

# Extract excel info
googleSheet_name = 'Configuration Manager2'
cfg_file = tslib.extract_sheet(googleSheet_name)

# Get features
X_train = tslib.extract_features(x_train_sig, 'x', cfg_file, segment=False)
X_test = tslib.extract_features(x_test_sig, 'x', cfg_file, segment=False)

*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***


# Feature Selection

After feature extraction, redundancies and noise should be removed, which minimises the algorithm's error, time and computational complexity. 
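
One common way to remove redundancy is to drop one feature from every highly correlated pair. The sketch below shows that idea with pandas. It is not the code behind `tslib.correlation_report`; the helper name `drop_correlated` and the 0.95 threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def drop_correlated(features, threshold=0.95):
    """Drop one column from every pair whose |Pearson r| exceeds `threshold`."""
    corr = features.corr().abs()
    # Keep only the upper triangle so that each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop


# Synthetic example: "a_scaled" is a linear function of "a", "b" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({"a": a, "a_scaled": 2.0 * a + 1.0, "b": rng.normal(size=200)})

reduced, removed = drop_correlated(demo)
print("Removed:", removed)
print("Kept:", list(reduced.columns))
```

Within each correlated pair the later column is the one dropped, so the result depends on column order.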

In [25]:
# Concatenation of entire data
features = pd.concat([X_train, X_test])
# Highly correlated features are removed
features = tslib.correlation_report(features)
X_train = features[:len(X_train)]
X_test = features[len(X_train):]

Do you wish to remove correlated features? Enter y/n: y
Removing x_Mean
Removing x_Mean absolute deviation
Removing x_Mean absolute diff
Removing x_Median
Removing x_Median absolute deviation
Removing x_Median absolute diff
Removing x_Minimum peaks
Removing x_Root mean square
Removing x_Signal distance
Removing x_Spectral centroid
Removing x_Spectral roll-off
Removing x_Spectral skewness
Removing x_Spectral slope
Removing x_Standard Deviation
Removing x_Sum absolute diff
Removing x_Total energy
Removing x_Variance


In [26]:
#@title Preview Features
features

Unnamed: 0,x_Autocorrelation,x_Centroid,x_Curve distance,x_Fundamental frequency,x_Interquartile range,x_Kurtosis,x_Linear regression,x_Max,x_Max power spectrum,x_Maximum frequency,...,x_Median frequency,x_Min,x_Skewness,x_Spectral decrease,x_Spectral kurtosis,x_Spectral maximum peaks,x_Spectral roll-on,x_Spectral spread,x_Spectral variation,x_Zero crossing rate
0,132.990082,0.600000,-5.378850,7.142857,0.002154,1.252338,0.000040,1.024606,0.145345,42.857143,...,19.841270,1.012817,-0.278930,0.054072,1.754656,15.0,2.380952,185.373373,0.169098,0.0
1,133.027573,0.600174,-11.461497,7.142857,0.002227,1.274470,0.000057,1.024606,0.168814,38.095238,...,14.285714,1.012893,-0.395466,-0.112505,2.501218,18.0,1.587302,126.667827,0.243198,0.0
2,133.153091,0.600440,-20.330274,6.349206,0.003097,1.796904,0.000430,1.027664,0.158406,35.714286,...,14.285714,1.009013,-0.565047,-0.155699,3.313346,18.0,1.587302,99.562581,0.289037,0.0
3,133.263214,0.600048,-17.354680,2.380952,0.002786,3.924806,0.000038,1.027664,0.072581,37.301587,...,14.285714,1.009013,-0.810924,0.090457,3.719072,17.0,0.793651,103.574423,0.370093,0.0
4,133.238373,0.599864,-12.770025,6.349206,0.002506,0.980626,-0.000138,1.026194,0.138429,38.888889,...,14.285714,1.013645,-0.220642,-0.048981,3.064475,16.0,1.587302,127.444790,0.149942,0.0
5,133.218201,0.600062,-14.468570,2.380952,0.002937,0.231196,-0.000064,1.026194,0.225888,37.301587,...,14.285714,1.013645,-0.096764,0.117347,2.697658,20.0,3.174603,116.400343,0.183154,0.0
6,133.336243,0.600031,-13.127302,4.761905,0.003302,-0.403270,0.000281,1.027167,0.270912,42.857143,...,15.079365,1.014804,0.004810,0.037938,2.609747,15.0,2.380952,162.824526,0.393111,0.0
7,133.331238,0.599679,-18.351105,7.142857,0.003470,-0.369774,-0.000419,1.027167,0.174296,38.095238,...,12.698413,1.014804,0.126112,-0.040477,3.048659,18.0,1.587302,122.737242,0.266086,0.0
8,133.321686,0.599966,-15.176105,6.349206,0.002640,0.457663,0.000194,1.027036,0.115106,37.301587,...,11.904762,1.014837,0.313581,-0.030006,2.709078,17.0,0.793651,122.824971,0.491425,0.0
9,133.487946,0.600263,-20.852336,5.555556,0.003564,0.501358,0.000183,1.028210,0.236294,36.507937,...,12.698413,1.011610,-0.262807,0.086988,2.914249,16.0,2.380952,104.471844,0.427126,0.0


# Classification

In this example the classification is performed with a [Decision Tree](https://scikit-learn.org/stable/modules/tree.html) classifier.
  

In [27]:
classifier = DecisionTreeClassifier()

# Train the classifier
classifier.fit(X_train, y_train.ravel())

# Predict test data
y_test_predict = classifier.predict(X_test)

# Get the classification report
accuracy = accuracy_score(y_test, y_test_predict) *100
print(classification_report(y_test, y_test_predict))
print("Accuracy: " + str(accuracy) + '%')

              precision    recall  f1-score   support

         1.0       0.74      0.86      0.80       496
         2.0       0.79      0.72      0.76       471
         3.0       0.83      0.76      0.79       420
         4.0       0.58      0.54      0.56       491
         5.0       0.60      0.64      0.62       532
         6.0       1.00      1.00      1.00       537

   micro avg       0.76      0.76      0.76      2947
   macro avg       0.76      0.75      0.75      2947
weighted avg       0.76      0.76      0.75      2947

Accuracy: 0.7556837461825585%


In [0]:
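# Confusion matrix (rows: true labels, columns: predicted labels)
cm = confusion_matrix(y_test, y_test_predict)
df_cm = pd.DataFrame(cm, index=[i for i in activity_labels], columns=[i for i in activity_labels])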
plt.figure()
ax = sns.heatmap(df_cm,  cbar = False, cmap="BuGn", annot=True, fmt="d")
plt.setp(ax.get_xticklabels(), rotation=45)
plt.ylabel('True label', fontweight='bold', fontsize = 18)
plt.xlabel('Predicted label', fontweight='bold', fontsize = 18)
plt.show()

# Conclusion

As can be seen in the confusion matrix, misclassification was highest among WALKING UPSTAIRS, WALKING DOWNSTAIRS and WALKING, and between SITTING and STANDING. Dynamic activities, due to their distinct motion characteristics and cyclic behaviour, were clearly separated from static activities.
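
The most confused class pairs can also be read off programmatically rather than by eye. The sketch below ranks the off-diagonal cells of a confusion matrix by count. The helper name `most_confused` is hypothetical, and the 3-class matrix is made up purely for illustration; it is not the matrix obtained above.

```python
import numpy as np


def most_confused(cm, labels, top=3):
    """Return the (true, predicted, count) off-diagonal cells with the largest counts."""
    off = np.array(cm)          # copy, so the caller's matrix is left untouched
    np.fill_diagonal(off, 0)    # ignore correct predictions
    order = np.argsort(off, axis=None)[::-1][:top]
    rows, cols = np.unravel_index(order, off.shape)
    return [(labels[r], labels[c], int(off[r, c])) for r, c in zip(rows, cols)]


# Made-up matrix (rows: true label, columns: predicted label)
demo_cm = [[50, 2, 0],
           [3, 40, 9],
           [0, 7, 45]]
demo_labels = ["A", "B", "C"]

for true_label, predicted_label, count in most_confused(demo_cm, demo_labels, top=2):
    print(f"{true_label} predicted as {predicted_label}: {count} windows")
```

With the notebook's own variables, the call would be `most_confused(cm, activity_labels)`, assuming `cm` holds the output of `confusion_matrix(y_test, y_test_predict)`.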

In [56]:
#@title Improvements
# Load data
y_train_sig = np.loadtxt('total_acc_y_train.txt', dtype='float32')
y_test_sig = np.loadtxt('total_acc_y_test.txt', dtype='float32')
z_train_sig = np.loadtxt('total_acc_z_train.txt', dtype='float32')
z_test_sig = np.loadtxt('total_acc_z_test.txt', dtype='float32')
# The magnitude is computed with vectorised NumPy operations, so numba is not needed


def magnitude(all_sig):
  # Euclidean norm of the three axes, computed sample by sample
  x, y, z = all_sig
  return np.sqrt(x**2 + y**2 + z**2)

## Data Preparation
def extract_sig(files, sig_name, mag=True):
  feat = pd.DataFrame()
  all_sig = []
  for idx_d, d in enumerate(files):
    sig = np.loadtxt(d, dtype='float32')
    # With mag=True, sig_name holds one magnitude entry after every three axes,
    # so those entries are skipped when naming the per-axis features.
    # With mag=False, sig_name is assumed to hold one entry per file.
    idx_name = idx_d + idx_d // 3 if mag else idx_d
    feat_axis = tslib.extract_features(sig, sig_name[idx_name], cfg_file, segment=False)
    feat = pd.concat([feat, feat_axis], axis=1)
    if mag:
      all_sig.append(sig)
      # After the third axis of each sensor, add the magnitude features
      if idx_d % 3 == 2:
        _mag = magnitude(all_sig)
        feat_mag = tslib.extract_features(_mag, sig_name[idx_name + 1], cfg_file, segment=False)
        feat = pd.concat([feat, feat_mag], axis=1)
        all_sig = []
  return feat


# Note: the body_acc_* and body_gyro_* files are not in the upload list above.
# They come from the same "Inertial Signals" folders and must be uploaded too.
train_files = ['total_acc_x_train.txt', 'total_acc_y_train.txt', 'total_acc_z_train.txt',
               'body_acc_x_train.txt', 'body_acc_y_train.txt', 'body_acc_z_train.txt',
               'body_gyro_x_train.txt', 'body_gyro_y_train.txt', 'body_gyro_z_train.txt']
sig_name = ['tot_x', 'tot_y', 'tot_z', 'tot_mag', 'body_x', 'body_y', 'body_z', 'body_mag',
            'gyr_x', 'gyr_y', 'gyr_z', 'gyr_mag']
X_train = extract_sig(train_files, sig_name, mag=True)

test_files = ['total_acc_x_test.txt', 'total_acc_y_test.txt', 'total_acc_z_test.txt',
              'body_acc_x_test.txt', 'body_acc_y_test.txt', 'body_acc_z_test.txt',
              'body_gyro_x_test.txt', 'body_gyro_y_test.txt', 'body_gyro_z_test.txt']
X_test = extract_sig(test_files, sig_name, mag=True)

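## Feature Selection
# Concatenation of entire data
features = pd.concat([X_train, X_test])
# Replace spaces in the feature names with underscores
# (the `names` argument of pd.concat does not rename columns)
features.columns = [name.replace(" ", "_") for name in features.columns]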
# Highly correlated features are removed
#features = tslib.correlation_report(features)
X_train = features[:len(X_train)]
X_test = features[len(X_train):]

# Classification
classifier = DecisionTreeClassifier()
# Train the classifier
classifier.fit(X_train, y_train.ravel())
# Predict test data
y_test_predict = classifier.predict(X_test)

# Get the classification report
accuracy = accuracy_score(y_test, y_test_predict)*100
print(classification_report(y_test, y_test_predict))
print("Accuracy: " + str(accuracy) + '%')

*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Feature extraction started ***
*** Feature extraction finished ***
*** Fe

In [0]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn import svm
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

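# Labels for the concatenated train + test features
# (assumption: `labels` is not defined elsewhere in this notebook)
labels = np.concatenate([y_train, y_test])

# Finds best supervised learning classifier
classifier = tslib.find_best_slclassifier(features, labels, X_train, X_test, y_train, y_test)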

# Feature Selection
FS_X_train, FS_X_test, FS_lab_description = tslib.FSE(X_train, X_test, y_train, y_test, list(X_train.columns), classifier)
    
# Train the classifier
classifier.fit(FS_X_train, y_train.ravel())

# Predict test data
y_test_predict = classifier.predict(FS_X_test)

# Get the classification accuracy
accuracy = accuracy_score(y_test, y_test_predict)*100
# 10-fold cross-validation of the selected classifier on train and test data combined
scores = cross_val_score(classifier, np.concatenate([FS_X_train, FS_X_test]), np.concatenate([y_train, y_test]), cv=10)
print("Accuracy: %0.3f (+/- %0.3f)" % (scores.mean(), scores.std() * 2))
print("Accuracy: " + str(accuracy) + '%')

Nearest Neighbors
Accuracy: 0.786 (+/- 0.051)
Accuracy: 78.79199185612488%
-----------------------------------------
Decision Tree
Accuracy: 0.852 (+/- 0.065)
Accuracy: 84.45877163216831%
-----------------------------------------
Random Forest
Accuracy: 0.820 (+/- 0.050)
Accuracy: 79.74211062097048%
-----------------------------------------
SVM
