# Real-Time Gait Asymmetry Detection

This notebook covers the following steps:

1. **Create the Training Dataset**: Preprocess and combine data from different sources into a comprehensive training dataset for real-time gait asymmetry detection.
2. **Feature Selection and Dimensionality Reduction**: Identify the most relevant features for real-time gait asymmetry detection using dimensionality reduction techniques.
3. **Model Evaluation**: Test and compare the performance of multiple machine learning and deep learning algorithms for real-time gait asymmetry detection.

In [None]:
# Libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import GroupKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
from feature_extraction import generate_rolling_windows
from data_preprocessing import (
    clean_extra_files,
    detection_merge_subject_features,
    detection_merge_raw_npz_files,
    detection_merge_csv_datasets,
    detection_merge_npz_datasets,
)

# Constants
base_dir    = 'Data'
healthy_dir = 'Data/Healthy'
stroke_dir  = 'Data/Stroke'

detection_time_domain_name = 'detection_time_domain.csv'
detection_asymmetry_name   = 'detection_asymmetry.csv'

## Data Preprocessing, Feature Selection and Dimensionality Reduction

The first step is to build the gait asymmetry dataset and extract meaningful features from the raw data. The goal is a set of complementary datasets, each giving a different view of the signals, that we can use to train a real-time gait asymmetry detection model. The process includes the following steps:
1. Create a new base dataset that splits the data into 2-second windows with a 1-second stride.
2. Extract simple statistical features from each 2-second window, such as max, min and std. These simple statistics proved to be meaningful for the classification task.
3. Extract a set of simple symmetry and asymmetry features from swing and stride times.
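
Steps 2 and 3 can be illustrated with a minimal sketch. The helper names below (`window_statistics`, `symmetry_ratio`, `asymmetry_index`) are hypothetical and are not the functions in `feature_extraction`. The asymmetry index uses one common definition, the absolute left/right difference normalised by the mean of the two sides, which is an assumption; the project may define its features differently.

```python
import numpy as np


def window_statistics(window: np.ndarray) -> dict:
    """Max, min and standard deviation of a 1-D signal window."""
    return {
        "max": float(np.max(window)),
        "min": float(np.min(window)),
        "std": float(np.std(window)),
    }


def symmetry_ratio(left: float, right: float) -> float:
    """Smaller value divided by the larger one; 1.0 means perfect symmetry."""
    larger = max(left, right)
    return min(left, right) / larger if larger > 0 else float("nan")


def asymmetry_index(left: float, right: float) -> float:
    """Absolute difference as a percentage of the left/right mean."""
    mean = 0.5 * (left + right)
    return abs(left - right) / mean * 100.0 if mean > 0 else float("nan")


# Illustrative values only (not taken from the dataset)
stats = window_statistics(np.array([0.0, 1.0, -2.0, 3.0]))
ratio = symmetry_ratio(1.0, 1.25)        # e.g. left/right stride times in seconds
index = asymmetry_index(1.0, 1.2)
```

Both measures are zero-safe and side-agnostic, so the same function can be applied to swing times and stride times alike.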

The provided data were designed for subject-level classification: each recording carries a single label for the whole signal, either 'Stroke Patient' or 'Healthy Subject'. To train the new real-time asymmetry detection model we need to ensure that the windows carry the correct labels. According to [Physio-pedia](https://www.physio-pedia.com/The_Gait_Cycle), the average gait cycle of a healthy adult lasts between 1.2 and 1.8 seconds. Because impaired patients tend to have a longer stride, we use a 2-second window with a 1-second stride.
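
The windowing scheme can be sketched as follows. This is an illustration only, not the implementation of `generate_rolling_windows`; the function name `sketch_rolling_windows` and the 100 Hz sampling rate are assumptions made for the example.

```python
import numpy as np


def sketch_rolling_windows(signal: np.ndarray, fs: float,
                           window_s: float = 2.0, stride_s: float = 1.0) -> np.ndarray:
    """Split a 1-D signal into overlapping windows.

    fs is the sampling rate in Hz; window_s and stride_s are in seconds.
    Trailing samples that do not fill a whole window are dropped.
    """
    win = int(round(window_s * fs))
    step = int(round(stride_s * fs))
    if len(signal) < win:
        return np.empty((0, win))
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])


# 5 seconds of samples at an assumed 100 Hz sampling rate
signal = np.arange(500)
windows = sketch_rolling_windows(signal, fs=100)
```

With a 1-second stride, consecutive 2-second windows overlap by half, so every sample away from the edges of the recording appears in two windows.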

In [None]:
# # Create the new datasets for the healthy subjects

# # 1. Create all the feature files for the statistical, asymmetry and raw data 
# for patient_id in os.listdir(healthy_dir):
#     patient_path = os.path.join(healthy_dir, patient_id)
    
#     if not os.path.isdir(patient_path) or '.DS_Store' in patient_path:
#         continue
    
#     try:
#         generate_rolling_windows(patient_path)
#     except Exception as e:
#         print(f'Error processing {patient_id}: {e}')
        
# # 2. Merge all types of datasets into a single file for better training
# detection_merge_subject_features(healthy_dir, detection_time_domain_name, detection_time_domain_name)
# detection_merge_subject_features(healthy_dir, detection_asymmetry_name, detection_asymmetry_name)
# detection_merge_raw_npz_files(healthy_dir)

In [None]:
# # Create the new datasets for the stroke patients

# # 1. Create all the feature files for the statistical, asymmetry and raw data 
# for patient_id in os.listdir(stroke_dir):
#     patient_path = os.path.join(stroke_dir, patient_id)
    
#     if not os.path.isdir(patient_path) or '.DS_Store' in patient_path:
#         continue
    
#     try:
#         generate_rolling_windows(patient_path)
#     except Exception as e:
#         print(f'Error processing {patient_id}: {e}')
        
# # 2. Merge all types of datasets into a single file for better training
# detection_merge_subject_features(stroke_dir, detection_time_domain_name, detection_time_domain_name)
# detection_merge_subject_features(stroke_dir, detection_asymmetry_name, detection_asymmetry_name)
# detection_merge_raw_npz_files(stroke_dir)

In [None]:
# # Merge the healthy subjects and stroke patients to a single file
# detection_merge_csv_datasets(healthy_dir, stroke_dir, detection_time_domain_name)
# detection_merge_csv_datasets(healthy_dir, stroke_dir, detection_asymmetry_name)
# detection_merge_npz_datasets(base_dir)

## 1. Gait Asymmetry Detection: Time-Domain Feature Model

This part of the notebook trains a binary classifier to detect gait asymmetry using time-domain features extracted from rolling IMU windows.
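
The training cell below splits the data with `GroupKFold`, using `patient_id` as the group, so that windows from one subject never appear in both the training and the test fold. The toy example here, built on made-up patient IDs, is a small sketch of that property and does not use the real dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Five hypothetical patients with four windows each
groups = np.repeat(["p1", "p2", "p3", "p4", "p5"], 4)
X = np.arange(len(groups)).reshape(-1, 1)
y = np.tile([0, 1], len(groups) // 2)

overlaps = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    # Patients that appear on both sides of the split (should be none)
    shared = set(groups[train_idx]) & set(groups[test_idx])
    overlaps.append(shared)

print(overlaps)
```

Because overlapping windows from the same recording are highly correlated, a plain random split would likely overestimate performance on unseen subjects.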

In [None]:
# Load Dataset
df = pd.read_csv(detection_time_domain_name)

# Filter for Gait Windows Only (label 0 or 1)
df_gait = df[df['label_moderate'].isin([0, 1])].copy()

# Define Features and Labels
feature_cols = [
    'gyro-right-z-axis-max', 'gyro-left-z-axis-max',
    'gyro-right-z-axis-min', 'gyro-left-z-axis-min',
    'accel-right-z-axis-max', 'accel-left-z-axis-max',
    'accel-right-z-axis-min', 'accel-left-z-axis-min'
]

X = df_gait[feature_cols].values
y = df_gait['label_moderate'].values
groups = df_gait['patient_id'].values

# Cross-Validation Setup (grouped by patient so no subject is in both train and test)
gkf = GroupKFold(n_splits=5)

# Training and Evaluation
y_true_all, y_pred_all = [], []

for fold, (train_idx, test_idx) in enumerate(gkf.split(X, y, groups)):
    print(f"Fold {fold + 1}")

    # Normalize Features: fit the scaler on the training fold only,
    # so that test-fold statistics do not leak into training
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X[train_idx])
    X_test = scaler.transform(X[test_idx])

    # Train a fresh model for every fold
    model = RandomForestClassifier(n_estimators=100, random_state=8)
    model.fit(X_train, y[train_idx])
    y_pred = model.predict(X_test)

    # Collect predictions for the overall confusion matrix
    y_true_all.extend(y[test_idx])
    y_pred_all.extend(y_pred)

    print(classification_report(y[test_idx], y_pred, digits=3))

# Confusion Matrix
cm = confusion_matrix(y_true_all, y_pred_all)
class_names = ["Symmetric", "Asymmetric"]
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=class_names, yticklabels=class_names)
plt.title("Overall Confusion Matrix (5-fold Group CV)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
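
Sensitivity and specificity can be read off the pooled confusion matrix above. The sketch below uses a made-up 2x2 matrix rather than the notebook's results, and assumes the scikit-learn layout in which rows are true labels and columns are predicted labels, with label 1 (asymmetric) as the positive class.

```python
import numpy as np

# Hypothetical confusion matrix, NOT results from this notebook:
# rows = true (0, 1), columns = predicted (0, 1)
cm_example = np.array([[50, 10],
                       [5, 35]])

tn, fp, fn, tp = cm_example.ravel()

sensitivity = tp / (tp + fn)  # share of asymmetric windows that were detected
specificity = tn / (tn + fp)  # share of symmetric windows correctly left alone

print(f"Sensitivity: {sensitivity:.3f}, Specificity: {specificity:.3f}")
```

For a detection task, sensitivity shows how many asymmetric windows are caught, while specificity shows how often symmetric gait triggers a false alarm.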

## 2. Stride-Time Asymmetry Values

## 3. Raw Data Values

In [None]:
clean_extra_files(healthy_dir)
clean_extra_files(stroke_dir)