# Human Activity Recognition Analysis

This dataset consists of sensor data from smartphones that can be used to predict the activity a person is engaged in, such as walking, sitting, or standing. Our main objective is to build and evaluate machine learning models to predict human activities accurately based on smartphone sensor data.

The dataset is split into two files, `train.csv` and `test.csv`, in a 7:3 ratio of training to testing data.

## Data Exploration

### Import libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import cross_val_score, GridSearchCV

pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

In [None]:
train_data = pd.read_csv("datasets/train.csv")
test_data = pd.read_csv("datasets/test.csv")


train_data.head()

### Pre-processing

In [None]:
# Shape

print(f'Shape of Train data: {train_data.shape}')
print(f'Shape of Test data: {test_data.shape}')
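
The 7:3 split stated in the introduction can be checked from the row counts printed above. A minimal sketch, assuming a hypothetical helper `split_fractions` that is not part of the original notebook; the counts in the example are illustrative only:

```python
def split_fractions(n_train, n_test):
    """Return the share of rows in the training and test splits."""
    total = n_train + n_test
    if total == 0:
        raise ValueError("both splits are empty")
    return n_train / total, n_test / total


# Illustrative counts only; in the notebook, pass len(train_data) and len(test_data).
train_share, test_share = split_fractions(70, 30)
print(f"train: {train_share:.2f}, test: {test_share:.2f}")
```

A result close to `(0.7, 0.3)` for the real files would confirm the stated ratio.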

In [None]:
# Missing values

print(f'Missing values in Train data: {train_data.isnull().values.sum()}')
print(f'Missing values in Test data: {test_data.isnull().values.sum()}')

In [None]:
train_data.columns

In [None]:
# Target: number of samples per activity
pd.crosstab(index=train_data["Activity"], columns="count")
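
Relative class frequencies make it easier to judge whether the activities are balanced, which matters when reading accuracy figures later. A small sketch, assuming a hypothetical helper `class_distribution` and toy labels in place of the real `Activity` column:

```python
import pandas as pd


def class_distribution(labels):
    """Return per-class counts and shares, sorted by count (descending)."""
    counts = pd.Series(labels).value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Toy labels for illustration; in the notebook, pass train_data["Activity"].
toy_labels = ["WALKING", "WALKING", "SITTING", "STANDING"]
print(class_distribution(toy_labels))
```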



#### Feature Scaling

In [None]:
# Separate features and target (the `subject` column is dropped as well)
X_train = train_data.drop(columns=['Activity', 'subject'])
y_train = train_data['Activity'].values.astype(object)

X_test = test_data.drop(columns=['Activity', 'subject'])
y_test = test_data['Activity'].values.astype(object)

print(X_train.shape)
print(y_train.shape)

In [None]:
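# Standardize features: fit the scaler on the training set only, then apply it to the test set.
# Note: the models below are fit and evaluated on the unscaled X_train / X_test;
# X_train_scaled and X_test_scaled are not used further in this notebook.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)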
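
One way to make sure scaling is applied consistently at fit and predict time is to bundle the scaler and the classifier in a single scikit-learn pipeline. The sketch below is an assumption about how this could be done, not what the notebook does: `make_scaled_svm` is a hypothetical helper, and the synthetic data stands in for `X_train`, `y_train`, and `X_test`.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_scaled_svm():
    """Build a pipeline that standardizes features before fitting an SVM."""
    return make_pipeline(StandardScaler(), SVC())


# Synthetic stand-in data; in the notebook, use X_train, y_train, X_test.
X_demo, y_demo = make_classification(n_samples=120, n_features=8, random_state=0)
scaled_svm = make_scaled_svm()
scaled_svm.fit(X_demo[:90], y_demo[:90])     # scaler statistics come from these rows only
demo_pred = scaled_svm.predict(X_demo[90:])  # the same scaling is reused at predict time
print(demo_pred.shape)
```

Distance- and margin-based models such as KNN and SVM are usually the ones most sensitive to feature scale, so they are the natural candidates for this pattern.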

## Model selection & evaluation

We evaluate the performance of the following classifiers on the dataset:

- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
- k-Nearest Neighbors (KNN)

#### Decision Tree

In [None]:
dtc_model = DecisionTreeClassifier(random_state=42)  # fixed (arbitrary) seed so results are reproducible
dtc_model.fit(X_train, y_train)

#### Random Forest

In [None]:
rf_model = RandomForestClassifier(random_state=42)  # fixed (arbitrary) seed so results are reproducible
rf_model.fit(X_train, y_train)

#### SVM

In [None]:
svm_model = SVC()
svm_model.fit(X_train, y_train)

#### KNN

In [None]:
knn_model = KNeighborsClassifier()
knn_model.fit(X_train, y_train)
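
Before touching the test set, cross-validation on the training data gives an estimate of each model's accuracy together with its spread across folds. Because `train_data` carries a `subject` column, a grouped split that keeps each subject's rows in a single fold is one reasonable option. This is a sketch under that assumption: `grouped_cv_accuracy` is a hypothetical helper and the data is synthetic.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def grouped_cv_accuracy(model, X, y, groups, n_splits=5):
    """Mean and standard deviation of accuracy over group-wise CV folds."""
    cv = GroupKFold(n_splits=n_splits)
    scores = cross_val_score(model, X, y, groups=groups, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()


# Synthetic stand-in: 6 "subjects" with 10 rows each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)
groups_demo = np.repeat(np.arange(6), 10)

mean_acc, std_acc = grouped_cv_accuracy(
    DecisionTreeClassifier(random_state=0), X_demo, y_demo, groups_demo, n_splits=3
)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

In the notebook this would be called with `X_train`, `y_train`, and `train_data['subject']` for each of the four models.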

### Evaluate models

In [None]:
def evaluate_model(model, X, y):
    """Return accuracy, weighted precision/recall/F1, and the confusion matrix."""
    y_pred = model.predict(X)

    accuracy = accuracy_score(y, y_pred)
    # 'weighted' averages the per-class scores, weighted by class support
    precision = precision_score(y, y_pred, average='weighted')
    recall = recall_score(y, y_pred, average='weighted')
    f1 = f1_score(y, y_pred, average='weighted')

    cm = confusion_matrix(y, y_pred)

    return accuracy, precision, recall, f1, cm

dt_accuracy, dt_precision, dt_recall, dt_f1, dt_cm = evaluate_model(dtc_model, X_test, y_test)
rf_accuracy, rf_precision, rf_recall, rf_f1, rf_cm = evaluate_model(rf_model, X_test, y_test)
svm_accuracy, svm_precision, svm_recall, svm_f1, svm_cm = evaluate_model(svm_model, X_test, y_test)
knn_accuracy, knn_precision, knn_recall, knn_f1, knn_cm = evaluate_model(knn_model, X_test, y_test)
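
The confusion matrices returned by `evaluate_model` show which activities get confused with each other. As a sketch, per-class recall can be read off a confusion matrix as the diagonal divided by the row sums (rows are true labels in scikit-learn's convention). `per_class_recall` is a hypothetical helper, and the matrix below is made up for illustration.

```python
import numpy as np


def per_class_recall(cm, labels):
    """Map each label to its recall: correct predictions / true samples of that class."""
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    # Classes with no true samples get a recall of 0.0 instead of a division error.
    recall = np.divide(
        np.diag(cm), row_totals, out=np.zeros_like(row_totals), where=row_totals > 0
    )
    return {label: float(value) for label, value in zip(labels, recall)}


# Made-up 2x2 matrix for illustration; rows are true classes, columns are predictions.
demo_recall = per_class_recall([[8, 2], [1, 9]], ["A", "B"])
print(demo_recall)
```

With the notebook's objects, `labels` would need to be in sorted order, since `confusion_matrix` orders the classes that way by default.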

In [None]:
# Compare Model Performance
model_performances = {
    "Decision Tree": (dt_accuracy, dt_precision, dt_recall, dt_f1, dt_cm),
    "Random Forest": (rf_accuracy, rf_precision, rf_recall, rf_f1, rf_cm),
    "Support Vector Machine": (svm_accuracy, svm_precision, svm_recall, svm_f1, svm_cm),
    "K-Nearest Neighbors": (knn_accuracy, knn_precision, knn_recall, knn_f1, knn_cm)
}

print("Model Comparison:")
for model_name, (accuracy, precision, recall, f1, _) in model_performances.items():
    print(f"{model_name}:")
    print(f"  Accuracy: {accuracy:.4f}")
    print(f"  Precision: {precision:.4f}")
    print(f"  Recall: {recall:.4f}")
    print(f"  F1-score: {f1:.4f}")
    print()
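
For a side-by-side comparison, the same numbers can be collected in a DataFrame and sorted. A minimal sketch, assuming the `model_performances` layout used above (a name mapped to a 5-tuple ending in the confusion matrix); `performance_table` is a hypothetical helper and the values below are placeholders, not results.

```python
import pandas as pd

METRIC_NAMES = ["accuracy", "precision", "recall", "f1"]


def performance_table(model_performances):
    """Build a metrics table (one row per model), best accuracy first."""
    # Keep the four scalar metrics and drop the trailing confusion matrix.
    rows = {name: metrics[:4] for name, metrics in model_performances.items()}
    table = pd.DataFrame.from_dict(rows, orient="index", columns=METRIC_NAMES)
    return table.sort_values("accuracy", ascending=False)


# Placeholder values for illustration only.
placeholder = {
    "Model A": (0.80, 0.81, 0.80, 0.80, None),
    "Model B": (0.90, 0.91, 0.90, 0.90, None),
}
print(performance_table(placeholder))
```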


## Key Findings

The model that best suits the main objective of this analysis is the Support Vector Machine (SVM), which reaches an accuracy of 95% on the test data.

## Flaws and Plan of Action

- The models may overfit or underfit. To address this, we can revisit the analysis with additional data, if available, or experiment with other predictive modeling techniques.
- All models were trained with their default hyperparameters. Hyperparameter optimization, for example with `GridSearchCV` (imported above but not yet used), could improve model performance.
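
Both points can be explored together with a cross-validated grid search: it tunes hyperparameters and, with `return_train_score=True`, also reports the gap between training and validation accuracy as a rough overfitting signal. This is a sketch, not the notebook's method: `tune_svm` is a hypothetical helper, the grid values are arbitrary starting points, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svm(X, y, param_grid=None, cv=3):
    """Grid-search a scaled SVM; return the fitted search and the train/validation gap."""
    if param_grid is None:
        # Arbitrary illustrative grid; "svc__" targets the SVC step of the pipeline.
        param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}
    pipeline = make_pipeline(StandardScaler(), SVC())
    search = GridSearchCV(
        pipeline, param_grid, cv=cv, scoring="accuracy", return_train_score=True
    )
    search.fit(X, y)
    # A large positive gap for the best setting hints at overfitting.
    best_train = search.cv_results_["mean_train_score"][search.best_index_]
    gap = best_train - search.best_score_
    return search, gap


# Synthetic stand-in data; in the notebook, use X_train and y_train.
X_demo, y_demo = make_classification(n_samples=90, n_features=6, random_state=0)
search, gap = tune_svm(X_demo, y_demo)
print(search.best_params_, f"validation accuracy: {search.best_score_:.3f}", f"gap: {gap:.3f}")
```

The test set should only be used once, on the configuration selected this way, so that the reported accuracy is not biased by the tuning itself.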