# HUMAN ACTIVITY RECOGNITION

This repository implements sequence-based classification on the UCI-HAR dataset for human activity recognition.

This is a multi-class sequence classification problem whose inputs are sensing signals from an accelerometer and a gyroscope.

Input: the features used from the dataset are:
* 3-axial linear acceleration
* 3-axial angular velocity


Output classes:
* walking
* walking upstairs
* walking downstairs
* sitting
* standing
* laying



## Implement learning based models to classify the sequential data provided

In this subtask, a detailed explanation of the following steps is provided:

1. Importing the data
2. Processing the data
3. Building the following models to classify the sequential data provided in the dataset:


*   Simple RNN
*   LSTM
*   Bi-directional recurrent network
*   CNN
*   Reinforcement learning


#### Importing the data

First, we import the libraries needed to load and process the data.

In [None]:
#Importing necessary libraries to import and process the data
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook
from pandas import Series, DataFrame, read_csv

from numpy import mean,std,dstack
from keras.utils import to_categorical

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


The provided code segment loads training and testing data for linear acceleration and angular velocity from respective text files using Pandas, organizing them into numpy arrays.
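
The loading step can also be sketched as one helper that loops over the six signal files. This is a minimal sketch, not the notebook's code: `load_signals` and `SIGNALS` are hypothetical names, and the demo writes tiny synthetic files to a temporary directory instead of reading the real dataset. It uses `sep=r"\s+"`, which recent pandas releases recommend in place of `delim_whitespace=True`, and builds paths with `pathlib` so separators do not have to be managed by hand.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# File name stems of the six inertial signals (acceleration and angular velocity).
SIGNALS = ["body_acc_x", "body_acc_y", "body_acc_z",
           "body_gyro_x", "body_gyro_y", "body_gyro_z"]

def load_signals(signal_dir, split):
    """Read the six whitespace-delimited signal files of one split into a list of arrays."""
    signal_dir = Path(signal_dir)
    return [
        pd.read_csv(signal_dir / f"{name}_{split}.txt", header=None, sep=r"\s+").to_numpy()
        for name in SIGNALS
    ]

# Demo on synthetic files (2 windows x 4 time steps), not the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    for i, name in enumerate(SIGNALS):
        np.savetxt(Path(tmp) / f"{name}_train.txt", np.full((2, 4), float(i)))
    arrays = load_signals(tmp, "train")

print(len(arrays), arrays[0].shape)  # 6 (2, 4)
```

With the real dataset, the same helper would be called once with the train directory and once with the test directory.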


In [None]:
#loading the training linear acceleration data
train_dir= '/content/drive/MyDrive/UCI HAR Dataset/train/Inertial Signals'
x_acc_train = pd.read_csv(train_dir+'/body_acc_x_train.txt', header=None, delim_whitespace=True)
y_acc_train = pd.read_csv(train_dir+'/body_acc_y_train.txt', header=None, delim_whitespace=True)
z_acc_train = pd.read_csv(train_dir+'/body_acc_z_train.txt', header=None, delim_whitespace=True)

In [None]:
#loading the training angular velocity data
x_gyro_train = pd.read_csv(train_dir+'/body_gyro_x_train.txt', header=None, delim_whitespace=True)
y_gyro_train = pd.read_csv(train_dir+'/body_gyro_y_train.txt', header=None, delim_whitespace=True)
z_gyro_train = pd.read_csv(train_dir+'/body_gyro_z_train.txt', header=None, delim_whitespace=True)

In [None]:
#loading the testing linear acceleration data
test_dir= '/content/drive/MyDrive/UCI HAR Dataset/test/Inertial Signals'
x_acc_test = pd.read_csv(test_dir+'/body_acc_x_test.txt', header=None, delim_whitespace=True)
y_acc_test = pd.read_csv(test_dir+'/body_acc_y_test.txt', header=None, delim_whitespace=True)
z_acc_test = pd.read_csv(test_dir+'/body_acc_z_test.txt', header=None, delim_whitespace=True)

In [None]:
#loading the testing angular velocity data
x_gyro_test = pd.read_csv(test_dir+'/body_gyro_x_test.txt', header=None, delim_whitespace=True)
y_gyro_test = pd.read_csv(test_dir+'/body_gyro_y_test.txt', header=None, delim_whitespace=True)
z_gyro_test = pd.read_csv(test_dir+'/body_gyro_z_test.txt', header=None, delim_whitespace=True)

In [None]:
# loading the output training data
Y_train= pd.read_csv('/content/drive/MyDrive/CS5062_AssessmentII_Dataset/train/y_train.txt', header=None, delim_whitespace=True)

In [None]:
# loading the output test data
Y_test= pd.read_csv('/content/drive/MyDrive/CS5062_AssessmentII_Dataset/test/y_test.txt', header=None, delim_whitespace=True)

In [None]:
num_categories = Y_train[0].nunique()
print("Number of categories in Y_train:", num_categories)


Number of categories in Y_train: 6


#### Processing the data

The raw imported data is stacked along a new third axis (`axis=2`) to create input sequences. Specifically, `X_train` and `X_test` are created as numpy arrays by stacking the six per-axis data arrays along axis 2.

The shape of `X_train` is (7352, 128, 6), signifying 7352 samples, each with a sequence length of 128 time steps and 6 features.
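
The effect of stacking on the shape can be checked on synthetic data. The sketch below uses hypothetical random arrays (5 windows instead of 7352) in place of the six loaded signal tables.

```python
import numpy as np

# Hypothetical stand-ins for the six per-axis tables: 5 windows x 128 time steps each.
rng = np.random.default_rng(0)
channels = [rng.normal(size=(5, 128)) for _ in range(6)]

# Stacking on a new last axis turns six (samples, timesteps) arrays
# into one (samples, timesteps, features) tensor.
X = np.stack(channels, axis=2)

print(X.shape)  # (5, 128, 6)

# Feature k of the stacked tensor is exactly the k-th input array.
assert np.array_equal(X[:, :, 3], channels[3])
```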


In [None]:
import numpy as np

# The arrays are stacked along a new third axis (axis=2) to create a single input sequence.
X_train = np.stack((x_acc_train, y_acc_train, z_acc_train, x_gyro_train, y_gyro_train, z_gyro_train), axis=2)
X_test = np.stack((x_acc_test, y_acc_test, z_acc_test, x_gyro_test, y_gyro_test, z_gyro_test), axis=2)
# The shape of X_train(input_sequence) will be (7352,128,6).

The raw output labels are adjusted by subtracting 1 (zero-offsetting) for both training and testing data. The adjusted labels are then one-hot encoded with the `to_categorical` function, which converts them into binary vectors.
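
To make the two label steps concrete, here is a NumPy-only sketch on a handful of hypothetical labels. It indexes an identity matrix instead of calling `to_categorical`, which gives the same kind of one-hot rows for integer class labels.

```python
import numpy as np

# Hypothetical raw labels in the dataset's 1..6 convention, one per window.
raw_labels = np.array([1, 6, 3, 3, 2])

# Zero-offset so the classes become 0..5 and can be used as column indices.
zero_based = raw_labels - 1

# One-hot encode: row i of the identity matrix is the vector for class i.
n_classes = 6
one_hot = np.eye(n_classes, dtype=np.float32)[zero_based]

print(one_hot.shape)  # (5, 6)

# argmax recovers the zero-based class index from each one-hot row.
assert np.array_equal(one_hot.argmax(axis=1), zero_based)
```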

In [None]:
# zero-offset class values
Y_train = Y_train - 1
Y_test = Y_test - 1
# one hot encode y
Y_train = to_categorical(Y_train)
Y_test = to_categorical(Y_test)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)

(7352, 128, 6) (7352, 6) (2947, 128, 6) (2947, 6)


#### Building model(s)
Using Keras, a popular deep learning library, we build the following models:
1. Simple RNN model
2. Bidirectional RNN model
3. LSTM model
4. Convolutional LSTM model

The code below develops the Simple RNN, Bidirectional RNN, LSTM and Convolutional LSTM models in detail.
Recurrent neural networks are a suitable choice for human activity recognition because they can capture temporal dependencies in the data, such as the sequences of linear acceleration and angular velocity used here, which supports accurate classification.

All the RNN models defined in this task leverage this ability to model sequential information.
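
What "modelling sequential information" means can be illustrated with the basic tanh recurrence behind a simple RNN. The sketch below is a plain NumPy illustration, not the Keras implementation; the weights are random and the layer size `n_units = 8` is an arbitrary choice for the example.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b):
    """Run one window of shape (timesteps, features) through a tanh recurrence."""
    h = np.zeros(W_h.shape[0])
    for x_t in x:
        # The new state depends on the current input and on the previous state.
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(0)
n_timesteps, n_features, n_units = 128, 6, 8  # n_units is an illustrative size
W_x = rng.normal(scale=0.1, size=(n_features, n_units))
W_h = rng.normal(scale=0.1, size=(n_units, n_units))
b = np.zeros(n_units)

window = rng.normal(size=(n_timesteps, n_features))
h_forward = simple_rnn_forward(window, W_x, W_h, b)
h_reversed = simple_rnn_forward(window[::-1], W_x, W_h, b)

print(h_forward.shape)  # (8,)
```

Feeding the same samples in reverse order gives a different final state, which is the property that lets recurrent models distinguish activities by the order of their sensor readings.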


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.layers import SimpleRNN, Flatten

In [None]:
# define an LSTM model
def lstm_model(X_train, Y_train, X_test, Y_test):
 n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2], Y_train.shape[1]
 model = Sequential()
 model.add(LSTM(128, input_shape=(n_timesteps,n_features)))
 model.add(Dropout(0.5))
 model.add(Dense(100, activation='relu'))
 model.add(Dense(n_outputs, activation='softmax'))
 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model

In [None]:
# define a SimpleRNN model
def simpleRNN_model(X_train, Y_train, X_test, Y_test):
  model = Sequential()
  n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2], Y_train.shape[1]
  # return_sequences=True keeps the hidden state of every time step; Flatten then feeds them all to the dense layers
  model.add(SimpleRNN(128, return_sequences=True, input_shape=(n_timesteps, n_features)))
  model.add(Dropout(0.5))
  model.add(Flatten())  # Flatten before dense layers
  model.add(Dense(100, activation='relu'))
  model.add(Dense(n_outputs, activation='softmax'))
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

In [None]:
# define a bidirectional RNN (a bidirectional LSTM layer)
def bidirectionalRNN():
  # Define the model
  model = Sequential([
      Bidirectional(LSTM(128, return_sequences=True)),
      Dropout(0.5),
      Flatten(),
      Dense(100, activation='relu'),
      Dense(6, activation='softmax')
  ])
  # Compile the model
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

In [None]:
# fit network
def fit_model(model):
  verbose, epochs, batch_size = 0, 25, 64
  history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=verbose)
  model.save('/content/drive/MyDrive/model.h5')
  return history, model

In [None]:
# calling an LSTM model
model = lstm_model(X_train, Y_train, X_test, Y_test)
# other models can also be called by uncommenting the below lines
#model = simpleRNN_model(X_train, Y_train, X_test, Y_test)
#model = bidirectionalRNN()
history, model = fit_model(model)



### Convolutional LSTM 2D

The `convLSTM_model` function defines a Convolutional LSTM (ConvLSTM) model for sequence data classification. It reshapes the input data and includes layers such as ConvLSTM2D, Dropout, Flatten, and Dense. The model is compiled with categorical cross-entropy loss and the Adam optimizer.
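
The reshape step can be inspected on its own. In the sketch below, a hypothetical input stores the time-step index in every feature, so it is easy to see which time steps end up in which subsequence after reshaping to `(samples, n_steps, 1, n_length, n_features)`.

```python
import numpy as np

n_samples, n_timesteps, n_features = 3, 128, 6
n_steps, n_length = 4, 32  # 4 subsequences of 32 time steps each

# Hypothetical input: every value equals the index of its time step.
X = np.broadcast_to(
    np.arange(n_timesteps)[None, :, None],
    (n_samples, n_timesteps, n_features),
).copy()

# Same reshape as in the ConvLSTM function: (samples, time steps, rows, cols, channels).
X5d = X.reshape((n_samples, n_steps, 1, n_length, n_features))
print(X5d.shape)  # (3, 4, 1, 32, 6)

# Subsequence k holds the consecutive time steps 32*k .. 32*k + 31.
assert X5d[0, 1, 0, 0, 0] == 32
assert X5d[0, 3, 0, -1, 0] == 127
```

Because the reshape keeps consecutive time steps together, each 128-step window is presented to the ConvLSTM layer as four ordered chunks of 32 steps.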

In [None]:
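# Defining the convolutional LSTM model
from keras.layers import ConvLSTM2D
import matplotlib.pyplot as plt
# Metrics used inside convLSTM_model; plt_conf_matrix and plot_training (defined in the
# evaluation section) must also be defined before this function is called
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score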

def convLSTM_model(X_train, Y_train, X_test, Y_test):
 # define model
 verbose, epochs, batch_size = 0, 25, 64
 n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2], Y_train.shape[1]
 # reshape into subsequences (samples, time steps, rows, cols, channels)
 n_steps, n_length = 4, 32
 X_train = X_train.reshape((X_train.shape[0], n_steps, 1, n_length, n_features))
 X_test = X_test.reshape((X_test.shape[0], n_steps, 1, n_length, n_features))
 print(X_train.shape)
 print(X_test.shape)
 # define model
 model = Sequential()
 model.add(ConvLSTM2D(filters=128, kernel_size=(1,3), activation='relu', input_shape=(n_steps, 1, n_length, n_features)))
 model.add(Dropout(0.5))
 model.add(Flatten())
 model.add(Dense(100, activation='relu'))
 model.add(Dense(n_outputs, activation='softmax'))
 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
 #history = convL_model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=verbose, callbacks=[history])
 #history = History()
 history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=verbose)
 y_pred = model.predict(X_test)
 y_pred_classes = np.argmax(y_pred, axis=1)
 y_true = np.argmax(Y_test, axis=1)
 # Accuracy
 accuracy = accuracy_score(y_true, y_pred_classes)
 print("Accuracy: %.3f" % accuracy)
 # Precision, recall, and classification report
 report = classification_report(y_true, y_pred_classes)
 print("Classification Report:\n", report)
 plt_conf_matrix(Y_test, y_pred)
 # AUROC
 auroc = roc_auc_score(Y_test, y_pred, multi_class='ovr')
 print("AUROC: %.3f" % auroc)
 #plt_auroc(y_true, y_pred_classes)
 # Plot training history
 plot_training(history)


In [None]:
convLSTM_model(X_train, Y_train, X_test, Y_test)

(7352, 4, 1, 32, 6)
(2947, 4, 1, 32, 6)
Accuracy: 0.809
Classification Report:
               precision    recall  f1-score   support

           0       1.00      0.62      0.76       496
           1       0.60      0.87      0.71       471
           2       0.74      0.85      0.79       420
           3       0.87      0.81      0.84       491
           4       0.79      0.75      0.77       532
           5       1.00      0.95      0.97       537

    accuracy                           0.81      2947
   macro avg       0.83      0.81      0.81      2947
weighted avg       0.84      0.81      0.81      2947




AUROC: 0.951



## Evaluation of training process and metrics:

To evaluate the models' performance, we first define methods for the following:
* Plotting the training loss and training accuracy, to show the intermediate results of the training process
* Plotting the confusion matrix of each model
* Reporting the following three metrics of the model's performance:
  1. Precision and recall
  2. Accuracy
  3. Area under the ROC curve (AUROC)

All of these methods are called by the `resultMetrics` method.
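
To show how accuracy, precision and recall relate to the confusion matrix, here is a small NumPy sketch on a hypothetical 3-class example. `confusion_counts` is an illustrative helper, not part of the notebook; it follows the common convention of true classes on rows and predicted classes on columns.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Return a matrix whose entry [i, j] counts samples of true class i predicted as j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Hypothetical labels for a 3-class toy problem.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_counts(y_true, y_pred, n_classes=3)
true_pos = np.diag(cm)

precision = true_pos / cm.sum(axis=0)  # correct predictions / all predictions of that class
recall = true_pos / cm.sum(axis=1)     # correct predictions / all true members of that class
accuracy = true_pos.sum() / cm.sum()   # correct predictions / all samples

print(cm)
print(precision, recall, accuracy)
```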




In [None]:
# Plotting training loss and accuracy curves
from keras.callbacks import History
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score
import numpy as np
import matplotlib.pyplot as plt

def plot_training(history):
  # Plotting the training loss and accuracy curves
  plt.figure(figsize=(12, 6))

  # Training loss curve
  plt.subplot(1, 2, 1)
  plt.plot(history.history['loss'], label='Training Loss')
  plt.xlabel('Epochs')
  plt.ylabel('Loss')
  plt.legend()
  plt.savefig('Training_loss.png')

  # Training accuracy curve
  plt.subplot(1, 2, 2)
  plt.plot(history.history['accuracy'], label='Training Accuracy')
  plt.xlabel('Epochs')
  plt.ylabel('Accuracy')
  plt.legend()
  plt.savefig('Training_accuracy.png')

  plt.show()


In [None]:
# Activities are the class labels
# It is a 6 class classification
ACTIVITIES = {
    0: 'WALKING',
    1: 'WALKING_UPSTAIRS',
    2: 'WALKING_DOWNSTAIRS',
    3: 'SITTING',
    4: 'STANDING',
    5: 'LAYING',
}
Y_pred = model.predict(X_test)  # class probabilities from the trained model
Y_pred = pd.Series([ACTIVITIES[y] for y in np.argmax(Y_pred, axis=1)])
Y_true = pd.Series([ACTIVITIES[y] for y in np.argmax(Y_test, axis=1)])

In [None]:
# Confusion matrix
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Y_true is the one-hot encoded ground truth and Y_pred the predicted class
# probabilities, both of shape (n_samples, 6)

def plt_conf_matrix(Y_true, Y_pred):
  # Map one-hot / probability rows to activity names
  Y_pred = pd.Series([ACTIVITIES[y] for y in np.argmax(Y_pred, axis=1)])
  Y_true = pd.Series([ACTIVITIES[y] for y in np.argmax(Y_true, axis=1)])
  # Fix the label order so the matrix rows and columns match the DataFrame labels below
  labels = list(ACTIVITIES.values())
  cm = confusion_matrix(Y_true, Y_pred, labels=labels)

  # Create a DataFrame for visualization using Seaborn
  cm_df = pd.DataFrame(cm, index=labels, columns=labels)

  # Create a heatmap of the confusion matrix
  plt.figure(figsize=(10, 8))
  sns.heatmap(cm_df, annot=True, fmt='d', cmap='Blues')
  plt.xlabel('Predicted')
  plt.ylabel('True')
  plt.title('Confusion Matrix')
  # Save before show(), otherwise the saved figure may be empty
  plt.savefig('confusion_matrix.png')
  plt.show()

In [None]:
#AUROC
from sklearn.metrics import roc_curve, auc

def plt_auroc(y_true, y_pred, n_classes):
    fpr = dict()
    tpr = dict()
    roc_auc = dict()

    # Calculate one-vs-all ROC curve and AUC for each class
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_true == i, y_pred[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Plot the ROC curves
    plt.figure(figsize=(8, 6))
    colors = ['b', 'g', 'r', 'c', 'm', 'y']  # Colors for each class

    for i in range(n_classes):
        plt.plot(fpr[i], tpr[i], color=colors[i], lw=2, label=f'Class {i} (AUC = {roc_auc[i]:.2f})')

    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('One-vs-All ROC Curves')
    plt.legend(loc='lower right')
    plt.savefig('AUROC.png')
    plt.show()




## Result Metrics

The `resultMetrics` function is designed to evaluate the performance of a classification model on the test set. It calculates and displays the following metrics:

* Accuracy: Calculated using `accuracy_score`.
* Precision, recall, and classification report: Generated using `classification_report`.
* Confusion matrix: Visualized using `plt_conf_matrix`.
* Area under the receiver operating characteristic curve (AUROC): Computed in a one-vs-rest setting using `roc_auc_score`.
* AUROC curves: Plotted with `plt_auroc` for a multi-class problem.
* Training history: Displayed using `plot_training`.

This function provides a comprehensive assessment of the model's performance on the test data, including classification metrics, AUROC, and training history.
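
The one-vs-rest AUROC can be understood as an average of binary AUROCs, one per class. The sketch below is a NumPy-only illustration of that idea on hypothetical scores, assuming an unweighted (macro) average over classes; `binary_auc` and `macro_ovr_auc` are illustrative names and not the scikit-learn implementation.

```python
import numpy as np

def binary_auc(is_positive, scores):
    """AUROC as the probability that a positive outscores a negative (ties count half)."""
    pos = scores[is_positive]
    neg = scores[~is_positive]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def macro_ovr_auc(y_true, y_score):
    """Average the per-class one-vs-rest AUROC over all classes."""
    n_classes = y_score.shape[1]
    return np.mean([binary_auc(y_true == k, y_score[:, k]) for k in range(n_classes)])

# Hypothetical softmax outputs for a 3-class toy problem.
y_true = np.array([0, 1, 2, 0, 1, 2])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.5, 0.2, 0.3],
])

auroc = macro_ovr_auc(y_true, y_score)
print(round(float(auroc), 3))
```

The pairwise comparison is quadratic in the number of samples, so it is meant for building intuition on small examples rather than for evaluating the full test set.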

In [None]:
# Predict on the test set
def resultMetrics(model, history):
  y_pred = model.predict(X_test)
  y_pred_classes = np.argmax(y_pred, axis=1)
  y_true = np.argmax(Y_test, axis=1)
  # Accuracy
  accuracy = accuracy_score(y_true, y_pred_classes)
  print("Accuracy: %.3f" % accuracy)
  # Precision, recall, and classification report
  report = classification_report(y_true, y_pred_classes)
  print("Classification Report:\n", report)
  plt_conf_matrix(Y_test, y_pred)
  # AUROC
  auroc = roc_auc_score(Y_test, y_pred, multi_class='ovr')
  print("AUROC: %.3f" % auroc)
  plt_auroc(y_true, y_pred,6)
  # Plot training history
  plot_training(history)

In [None]:
resultMetrics(model, history)

Accuracy: 0.913
Classification Report:
               precision    recall  f1-score   support

           0       0.90      0.94      0.92       496
           1       0.90      0.94      0.92       471
           2       0.93      0.89      0.91       420
           3       0.88      0.84      0.86       491
           4       0.88      0.86      0.87       532
           5       0.99      1.00      0.99       537

    accuracy                           0.91      2947
   macro avg       0.91      0.91      0.91      2947
weighted avg       0.91      0.91      0.91      2947




AUROC: 0.987

