# Human Activity Recognition (HAR) with MLP

This notebook demonstrates how to load and explore a preprocessed human activity recognition (HAR) dataset (`COEN498-691_HAR_preprocessed_dataset.csv`), build and train a simple Multi-Layer Perceptron (MLP) for activity recognition, and evaluate its performance.

### 1. Load the dataset

```python
import pandas as pd
import numpy as np
from IPython.display import display  # already available inside Jupyter/Colab

# Path to the CSV file (here on a mounted Google Drive in Colab); change it to match your machine
dataset_path = '/content/drive/MyDrive/HAR prepocessed dataset/COEN498-691_HAR_preprocessed_dataset.csv'

# Load the dataset
df = pd.read_csv(dataset_path)

# Display the shape of the loaded dataframe
print("Shape of the dataset:", df.shape)

# Display the first few rows of the dataset
print("\nFirst 5 rows of the dataset:")
display(df.head())
```

Shape of the dataset: (4751, 39)

First 5 rows of the dataset:

|  | ax_mean | ax_std | ax_max | ax_min | ax_range | ax_skew | ax_kurt | ax_zcr | ay_mean | ay_std | ... | ayG_mean | azG_mean | Gx | Gy | Gz | Gx_angle | Gy_angle | Gz_angle | activity_id | participant_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.00204 | 0.007937 | 0.013687 | -0.024382 | 0.038069 | -0.428371 | 1.735067 | 0 | 1.8e-05 | 0.003702 | ... | -1.023843 | 0.187873 | 0.065561 | -0.981462 | 0.180096 | 1.505189 | 2.948741 | 1.389712 | 1 | LL |
| 1 | 0.00489 | 0.009201 | 0.032755 | -0.011421 | 0.044177 | 1.177575 | 2.846603 | 0 | 0.001896 | 0.004891 | ... | -1.023652 | 0.190706 | 0.069292 | -0.980722 | 0.182708 | 1.501449 | 2.944921 | 1.387056 | 1 | LL |
| 2 | -0.010652 | 0.043875 | 0.070725 | -0.110145 | 0.18087 | -0.53521 | 0.442954 | 6 | -0.003511 | 0.019515 | ... | -1.023306 | 0.193795 | 0.07318 | -0.979901 | 0.185575 | 1.497551 | 2.940762 | 1.384139 | 1 | LL |
| 3 | -0.006527 | 0.066108 | 0.150761 | -0.110145 | 0.260906 | 0.463284 | -0.030079 | 9 | 0.000702 | 0.028293 | ... | -1.022801 | 0.197092 | 0.077102 | -0.979012 | 0.188654 | 1.493618 | 2.936354 | 1.381005 | 1 | LL |
| 4 | 0.009423 | 0.05573 | 0.150761 | -0.108607 | 0.259368 | 0.306438 | 0.663858 | 8 | 0.002152 | 0.026648 | ... | -1.022139 | 0.200553 | 0.080937 | -0.97807 | 0.191906 | 1.489771 | 2.931781 | 1.377692 | 1 | LL |

### 2. Explore the dataset

```python
# Check column data types and non-null counts
print("Dataset Info:")
df.info()

# Count missing values across the whole dataframe
print("\nMissing values in the dataset:", df.isnull().sum().sum())

# Check the distribution of activities in the dataset
print("\nActivity distribution in the dataset:")
display(df['activity_id'].value_counts())
```

```text
Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4751 entries, 0 to 4750
Data columns (total 39 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   ax_mean         4751 non-null   float64
 1   ax_std          4751 non-null   float64
 2   ax_max          4751 non-null   float64
 3   ax_min          4751 non-null   float64
 4   ax_range        4751 non-null   float64
 5   ax_skew         4751 non-null   float64
 6   ax_kurt         4751 non-null   float64
 7   ax_zcr          4751 non-null   int64
 8   ay_mean         4751 non-null   float64
 9   ay_std          4751 non-null   float64
 10  ay_max          4751 non-null   float64
 11  ay_min          4751 non-null   float64
 12  ay_range        4751 non-null   float64
 13  ay_skew         4751 non-null   float64
 14  ay_kurt         4751 non-null   float64
 15  ay_zcr          4751 non-null   int64
 16  az_mean         4751 non-null   float64
 17  az_std          475
```

| activity_id | count |
|---|---|
| 1 | 1205 |
| 4 | 1196 |
| 3 | 1184 |
| 2 | 1166 |
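
The four activities are close to balanced, with 1,166 to 1,205 windows each. The sketch below is one way to express that as proportions with `value_counts(normalize=True)` and to see how many windows each participant contributes to each activity with `pd.crosstab`. The small `toy` dataframe, the helper names, and the participant code `AB` are hypothetical stand-ins; in the notebook you would pass `df`.

```python
import pandas as pd

def class_balance(frame: pd.DataFrame) -> pd.Series:
    """Fraction of windows per activity_id, sorted by activity ID."""
    return frame["activity_id"].value_counts(normalize=True).sort_index()

def windows_per_participant(frame: pd.DataFrame) -> pd.DataFrame:
    """Number of windows for each participant (rows) and activity (columns)."""
    return pd.crosstab(frame["participant_id"], frame["activity_id"])

# Hypothetical toy data standing in for df; "AB" is a made-up participant code
toy = pd.DataFrame({
    "activity_id":    [1, 1, 2, 2, 3, 4, 4, 1],
    "participant_id": ["LL", "LL", "LL", "AB", "AB", "AB", "LL", "AB"],
})

print(class_balance(toy))
print(windows_per_participant(toy))
```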


### 3. Prepare the data

```python
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split

# Separate features (X) and labels (y): drop the label column and the participant
# identifier, leaving the 37 feature columns
X = df.drop(['activity_id', 'participant_id'], axis=1)
y = df['activity_id']

# Split into training (80%) and test (20%) sets, preserving class proportions (stratify=y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale the features: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# One-hot encode the labels (the sparse_output argument requires scikit-learn >= 1.2)
encoder = OneHotEncoder(sparse_output=False)
y_train_encoded = encoder.fit_transform(y_train.values.reshape(-1, 1))
y_test_encoded = encoder.transform(y_test.values.reshape(-1, 1))

print("Shape of X_train_scaled:", X_train_scaled.shape)
print("Shape of y_train_encoded:", y_train_encoded.shape)
print("Shape of X_test_scaled:", X_test_scaled.shape)
print("Shape of y_test_encoded:", y_test_encoded.shape)
```

```text
Shape of X_train_scaled: (3800, 37)
Shape of y_train_encoded: (3800, 4)
Shape of X_test_scaled: (951, 37)
Shape of y_test_encoded: (951, 4)
```
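
`train_test_split` splits individual windows at random, so windows from the same participant (and, if windows overlap, neighbouring windows of the same recording) can end up in both the training and the test set. That can make test scores optimistic compared with performance on an unseen person. One alternative, sketched here as an assumption rather than the notebook's method, is to hold out whole participants with scikit-learn's `GroupShuffleSplit`, using `participant_id` as the group label. The helper `participant_split` and the toy arrays are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def participant_split(X, y, groups, test_size=0.25, random_state=42):
    """Split row indices so that every participant lands in exactly one of train/test.

    test_size is the fraction of *groups* (participants) held out, not of rows.
    """
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
    train_idx, test_idx = next(splitter.split(X, y, groups=groups))
    return train_idx, test_idx

# Hypothetical toy data: 12 windows from 4 made-up participants
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(12, 3))
y_toy = np.array([1, 2, 3, 4] * 3)
groups_toy = np.array(["P1"] * 3 + ["P2"] * 3 + ["P3"] * 3 + ["P4"] * 3)

train_idx, test_idx = participant_split(X_toy, y_toy, groups_toy)
print("train participants:", sorted(set(groups_toy[train_idx])))
print("test participants: ", sorted(set(groups_toy[test_idx])))
```

In the notebook this would be called as `participant_split(X, y, df['participant_id'].values)`, with `X.iloc[train_idx]` and `X.iloc[test_idx]` then going through the same scaling and encoding steps. A group split does not stratify by activity, so the class balance of each part is worth re-checking; scikit-learn's `StratifiedGroupKFold` is an option when both constraints matter.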


### 4. Build the MLP model

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout

# Define the number of features and classes
n_features = X_train_scaled.shape[1]  # 37 features
n_classes = y_train_encoded.shape[1]  # 4 activities

# Build the MLP; declaring the input with an Input layer, instead of passing input_shape
# to the first Dense layer, avoids the UserWarning raised by recent Keras versions
model = Sequential()
model.add(Input(shape=(n_features,)))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))

# Compile the model: categorical cross-entropy matches the one-hot labels and softmax output
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Display the model summary
model.summary()
```

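
The parameter counts that `model.summary()` lists can be worked out by hand: a `Dense` layer has `inputs × units` weights plus `units` biases, and `Dropout` layers add no parameters. The helpers below are a small illustration of that arithmetic for the layer sizes used here (37 features, 128 and 64 hidden units, 4 classes); they are hypothetical helpers, not a Keras API.

```python
def dense_params(n_in: int, n_units: int) -> int:
    """Trainable parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_units + n_units

def mlp_params(n_features: int, hidden: list[int], n_classes: int) -> list[int]:
    """Per-layer parameter counts of the Dense layers; Dropout layers contribute zero."""
    sizes = [n_features, *hidden, n_classes]
    return [dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

per_layer = mlp_params(n_features=37, hidden=[128, 64], n_classes=4)
print(per_layer)       # [4864, 8256, 260]
print(sum(per_layer))  # 13380
```

Summed over the three `Dense` layers this gives 13,380 trainable parameters, which is the total `model.summary()` is expected to report for this architecture.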


### 5. Train the model

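```python
# Train for 50 epochs in mini-batches of 32; validation_split=0.2 makes Keras hold out
# the last 20% of the training arrays as a validation set
history = model.fit(X_train_scaled, y_train_encoded, epochs=50, batch_size=32, validation_split=0.2)
```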

```text
Epoch 1/50
95/95 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6527 - loss: 0.9068 - val_accuracy: 0.9974 - val_loss: 0.0355
Epoch 2/50
95/95 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9774 - loss: 0.0997 - val_accuracy: 0.9987 - val_loss: 0.0124
Epoch 3/50
95/95 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9894 - loss: 0.0525 - val_accuracy: 0.9974 - val_loss: 0.0111
Epoch 4/50
95/95 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9917 - loss: 0.0330 - val_accuracy: 0.9974 - val_loss: 0.0105
Epoch 5/50
95/95 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.9946 - loss: 0.0244 - val_accuracy: 0.9987 - val_loss: 0.0107
Epoch 6/50
95/95 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9985 - loss: 0.0144 - val_accuracy: 0.9974 - val_loss: 0.0119
Epoch 7/50
95/95 ━━━━━━━━━━
```
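
Two counters in these logs can be checked by hand. With `validation_split=0.2`, 760 of the 3,800 training rows are held out, leaving 3,040 rows for fitting; at `batch_size=32` that is ⌈3040 / 32⌉ = 95 steps per epoch, the `95/95` counter. The same arithmetic gives ⌈951 / 32⌉ = 30 batches when predicting on the test set (`30/30` in the evaluation output). The sketch also shows one way to find the epoch with the lowest validation loss in `history.history`, the per-epoch metrics dictionary Keras returns; the helper names are hypothetical and the values are only the six epochs visible in the truncated log.

```python
import math

def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover n_samples (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

n_train = 3800                   # rows in X_train_scaled
n_val = round(n_train * 0.2)     # validation rows with validation_split=0.2
print(steps_per_epoch(n_train - n_val, 32))  # 95 -> "95/95" in the training log
print(steps_per_epoch(951, 32))              # 30 -> "30/30" during model.predict

def best_epoch(history_dict: dict, monitor: str = "val_loss") -> tuple[int, float]:
    """Return the 1-based epoch with the lowest value of `monitor`, and that value."""
    values = history_dict[monitor]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Only the six epochs visible in the log; in the notebook pass history.history instead
partial_history = {"val_loss": [0.0355, 0.0124, 0.0111, 0.0105, 0.0107, 0.0119]}
print(best_epoch(partial_history))  # (4, 0.0105)
```

Keras' `EarlyStopping` callback (for example with `monitor='val_loss'` and `restore_best_weights=True`) can apply the same idea automatically during `model.fit`.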

### 6. Evaluate the model

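```python
from sklearn.metrics import classification_report, confusion_matrix

# Evaluate the model on the test data
loss, accuracy = model.evaluate(X_test_scaled, y_test_encoded, verbose=0)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Predict class probabilities for the test set and keep the most likely class.
# argmax returns column indices 0-3; encoder.categories_[0] maps them back to activity IDs 1-4.
y_pred = model.predict(X_test_scaled)
y_pred_classes = np.argmax(y_pred, axis=1)
y_test_classes = np.argmax(y_test_encoded, axis=1)

# Inspect the raw probabilities and the predicted / true class indices
print(y_pred, y_pred_classes, y_test_classes)

# Generate the classification report, labelling each row with its activity ID
print("\nClassification Report:")
activity_names = [str(c) for c in encoder.categories_[0]]
print(classification_report(y_test_classes, y_pred_classes, target_names=activity_names))

# Generate the confusion matrix (rows: true class, columns: predicted class)
print("\nConfusion Matrix:")
display(confusion_matrix(y_test_classes, y_pred_classes))
```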

```text
Test Loss: 0.0032
Test Accuracy: 0.9989
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
[[0.0000000e+00 0.0000000e+00 9.9999994e-01 1.6910529e-32]
 [2.5686865e-07 9.9999875e-01 9.7447673e-07 4.8975242e-09]
 [9.7143951e-38 2.6283655e-37 9.9999994e-01 2.4632521e-28]
 ...
 [3.3282436e-25 1.5334740e-23 1.0000000e+00 4.0788886e-19]
 [1.2511656e-22 6.9068080e-28 8.8979033e-19 1.0000000e+00]
 [4.1594875e-23 6.3106609e-28 1.3822409e-18 1.0000000e+00]] [2 1 2 2 3 2 3 2 1 0 0 0 1 1 1 2 3 1 1 1 3 1 3 3 0 1 0 2 2 1 2 2 2 3 3 1 3
 2 0 2 2 3 3 0 1 0 0 3 2 0 0 3 0 2 1 1 2 0 0 2 3 2 0 0 2 1 2 0 1 3 3 2 2 3
 3 1 1 2 3 2 1 0 3 3 1 2 3 0 2 1 0 2 2 3 0 3 0 1 3 0 3 3 3 1 3 0 2 2 1 0 3
 2 3 3 2 3 0 1 1 1 0 0 1 2 1 1 3 3 0 1 2 1 1 3 3 2 2 0 3 0 3 2 1 0 3 1 3 0
 0 2 1 1 2 1 2 1 3 0 1 0 0 2 0 2 2 0 3 1 1 3 2 2 3 3 3 3 2 0 2 2 0 2 1 1 2
 0 2 2 1 3 3 0 3 1 3 0 2 2 2 1 3 2 3 3 2 3 3 3 0 3 3 3 2 0 1 3 1 2 3 3 0 3
 2 2 0 1 3 2 3 0 1 3 1 1 2 2 3 0 1 2 2 2 0 0 3 3 2 3 3 3 1 0 1 2 1 0 1 1 3
 1

array([[241,   0,   0,   0],
       [  1, 232,   0,   0],
       [  0,   0, 237,   0],
       [  0,   0,   0, 240]])
```
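
The per-class figures can also be recovered directly from the confusion matrix above. In scikit-learn's convention rows are true classes and columns are predicted classes, so recall is the diagonal divided by the row sums and precision is the diagonal divided by the column sums. The sketch below does this with NumPy; the helper `per_class_metrics` is hypothetical, and the row labels `[1, 2, 3, 4]` assume the sorted activity IDs that `OneHotEncoder` stores in `encoder.categories_[0]`.

```python
import numpy as np

def per_class_metrics(cm: np.ndarray):
    """Precision, recall, F1 per class and overall accuracy (rows = true, cols = predicted)."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)
    recall = tp / cm.sum(axis=1)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

# Confusion matrix printed by the notebook for the 951 test windows
cm = np.array([[241,   0,   0,   0],
               [  1, 232,   0,   0],
               [  0,   0, 237,   0],
               [  0,   0,   0, 240]])
activity_ids = [1, 2, 3, 4]  # assumed column order of the one-hot encoding

precision, recall, f1, accuracy = per_class_metrics(cm)
for act, p, r, f in zip(activity_ids, precision, recall, f1):
    print(f"activity {act}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
print(f"accuracy={accuracy:.4f}")  # 950 / 951, matching the reported test accuracy of 0.9989
```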

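### 7. Summary

The notebook loaded the preprocessed HAR dataset (4,751 windows, 37 features, 4 activities, no missing values reported by `df.info()` for the columns shown), trained a two-hidden-layer MLP, and evaluated it on a stratified 20% test split of 951 windows. The model reached a test accuracy of 0.9989 with a test loss of 0.0032. The confusion matrix shows a single error: one window of activity 2 was predicted as activity 1. The classification report gives the corresponding per-class precision, recall and F1 score for each activity.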