<a href="https://colab.research.google.com/github/Ella-Shuyan/Portfolio/blob/main/train_and_test_acc_ml.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📈 Train and Test ACC ML: An example machine learning script for human activity recognition 🤾‍♀️

Install PyCaret and SHAP
- PyCaret is a machine learning library
- For more information, see the documentation: https://pycaret.readthedocs.io/en/latest/
- SHAP is a model explanation library
- **When the installation finishes, look for a button labeled 'restart runtime'**
- **Click this button, then run the remaining cells**

In [None]:
!pip install pycaret shap

# Data Prep

Import libraries

In [None]:
from pycaret.classification import *
import pandas as pd
import numpy as np
import plotly.express as px
from google.colab import files
import io
import logging
import sys
logging.disable(sys.maxsize)
print("Imported libraries - you may proceed to the next cell")

Read in your training data file
- This should be one of the two experiments you performed with 3 different activities
- This file will be used to train a model to identify the activities you performed from the accelerometer data

In [None]:
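uploaded = files.upload()
# files.upload() returns {file name: file contents as bytes}; parse the first uploaded file
file_name = next(iter(uploaded))
df = pd.read_csv(io.BytesIO(uploaded[file_name]))
df.describe()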
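The cell above parses whichever file was uploaded. A small extension, sketched below with a hypothetical helper `load_uploaded_csv`, also checks that the columns used later in this notebook are present, so a wrong export fails immediately with a clear message instead of a `KeyError` several cells later. It runs without Colab by passing a dict shaped like the one `files.upload()` returns (file name mapped to bytes); the CSV contents are made up.

```python
import io

import pandas as pd

# Columns the rest of this notebook relies on (names copied from the cells in this notebook)
REQUIRED_COLUMNS = [
    'Time (s)',
    'Acceleration x (m/s^2)',
    'Acceleration y (m/s^2)',
    'Acceleration z (m/s^2)',
    'Absolute acceleration (m/s^2)',
]


def load_uploaded_csv(uploaded):
    """Parse the first file of a {file name: bytes} dict into a DataFrame.

    Hypothetical helper: raises ValueError if an expected column is missing.
    """
    file_name = next(iter(uploaded))
    frame = pd.read_csv(io.BytesIO(uploaded[file_name]))
    missing = [col for col in REQUIRED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"{file_name} is missing columns: {missing}")
    return frame


# Usage with an in-memory stand-in for files.upload() (made-up values)
fake_upload = {
    'Raw Data.csv': (
        b'"Time (s)","Acceleration x (m/s^2)","Acceleration y (m/s^2)",'
        b'"Acceleration z (m/s^2)","Absolute acceleration (m/s^2)"\n'
        b'0.00,0.1,0.2,9.8,9.80\n0.01,0.2,0.1,9.7,9.70\n'
    )
}
sample = load_uploaded_csv(fake_upload)
print(sample.shape)
```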

Visualize the raw acceleration along each of the three axes over time

In [None]:
px.line(df,x='Time (s)',y = ['Acceleration x (m/s^2)', 'Acceleration y (m/s^2)', 'Acceleration z (m/s^2)'])

Visualize absolute acceleration (the magnitude of the acceleration vector, which combines all three axes) over time

In [None]:
px.line(df, x= 'Time (s)', y = ['Absolute acceleration (m/s^2)'])
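The absolute acceleration is the length of the acceleration vector, not an average of the axes. The sketch below recomputes it as sqrt(x² + y² + z²) from the three axis columns, which can be used to sanity-check a recording or rebuild the column if an export lacks it. The helper name `absolute_acceleration` is hypothetical, and the synthetic values are chosen so the results are exact.

```python
import numpy as np
import pandas as pd

AXES = ['Acceleration x (m/s^2)', 'Acceleration y (m/s^2)', 'Acceleration z (m/s^2)']


def absolute_acceleration(frame):
    """Return the per-sample magnitude sqrt(x^2 + y^2 + z^2) of the three axes."""
    return np.sqrt((frame[AXES] ** 2).sum(axis=1))


# Synthetic example: (3, 4, 0) has magnitude 5 and (1, 2, 2) has magnitude 3
demo = pd.DataFrame({AXES[0]: [3.0, 1.0], AXES[1]: [4.0, 2.0], AXES[2]: [0.0, 2.0]})
demo['Absolute acceleration (m/s^2)'] = absolute_acceleration(demo)
print(demo['Absolute acceleration (m/s^2)'].tolist())  # [5.0, 3.0]
```

On a real recording, comparing this against the recorded column (for example with `np.allclose`) is a quick way to confirm which quantity the file contains.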

# Activity Annotation

Fill in the form fields below to mark when each activity took place
- Each of the following cells provides input fields
- Enter the correct information in each field
- Your activity names should match the activities you performed
- Your start and finish times (in seconds) should match the times shown in the graphs above, or the times you noted while performing the activities
- **Run each cell when all information is correct**

***Activity 1***

In [None]:
Activity_1 = "Slow Walking" #@param {type:"string"}
Activity_1_Start_Time = 0 #@param {type:"number"}
Activity_1_Finish_Time = 30 #@param {type:"number"}

***Activity 2***

In [None]:
Activity_2 = "Speed Walking" #@param {type:"string"}
Activity_2_Start_Time = 31 #@param {type:"number"}
Activity_2_Finish_Time = 59 #@param {type:"number"}

***Activity 3***

In [None]:
Activity_3 = "Running" #@param {type:"string"}
Activity_3_Start_Time = 60 #@param {type:"number"}
Activity_3_Finish_Time = 78 #@param {type:"number"}

In [None]:
# Label each sample with the activity whose time window contains it
df['class'] = 'unlabeled'
activities = [
    (Activity_1, Activity_1_Start_Time, Activity_1_Finish_Time),
    (Activity_2, Activity_2_Start_Time, Activity_2_Finish_Time),
    (Activity_3, Activity_3_Start_Time, Activity_3_Finish_Time),
]
for name, start, finish in activities:
    # between() includes both the start and the finish time
    df.loc[df['Time (s)'].between(start, finish), 'class'] = name
# Drop samples that fall outside every activity window
df = df[df['class'] != 'unlabeled'].reset_index(drop=True)
px.scatter(df, x='Time (s)', y='class', color='class')
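The labeling step maps every timestamp to the activity interval that contains it. A compact, reusable form of that idea is sketched below with a hypothetical `label_by_time` helper built on `np.select`; bounds are inclusive like `Series.between`, and if intervals overlap, the first matching one wins. With the default form values, samples between 30 s and 31 s fall in no interval and receive the default label, so they are discarded.

```python
import numpy as np


def label_by_time(times, intervals, default='unlabeled'):
    """Assign each timestamp the name of the first interval containing it.

    intervals: list of (name, start, finish) tuples with inclusive bounds.
    Hypothetical helper for illustration.
    """
    times = np.asarray(times, dtype=float)
    conditions = [(times >= start) & (times <= finish) for _, start, finish in intervals]
    names = [name for name, _, _ in intervals]
    return np.select(conditions, names, default=default)


# Example using the default form values from the cells above
intervals = [('Slow Walking', 0, 30), ('Speed Walking', 31, 59), ('Running', 60, 78)]
labels = label_by_time([0.0, 30.5, 45.0, 78.0, 80.0], intervals)
print(labels.tolist())
# ['Slow Walking', 'unlabeled', 'Speed Walking', 'Running', 'unlabeled']
```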

Plot absolute acceleration over time colored by activity

In [None]:
px.scatter(df, x = 'Time (s)', y = 'Absolute acceleration (m/s^2)', color = 'class')

View the distribution of absolute acceleration by each activity

In [None]:
px.histogram(df,x='Absolute acceleration (m/s^2)', color = 'class')
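The histogram shows how much the activities overlap; a per-activity numeric summary gives the same comparison as a table. In the notebook this would be `df.groupby('class')['Absolute acceleration (m/s^2)'].describe()`; the sketch below uses a small made-up frame so it runs on its own.

```python
import pandas as pd

# Synthetic stand-in for the labeled data frame (values are made up)
labeled = pd.DataFrame({
    'class': ['Slow Walking'] * 3 + ['Running'] * 3,
    'Absolute acceleration (m/s^2)': [1.0, 2.0, 3.0, 10.0, 14.0, 18.0],
})

# One row per activity: count, mean, std, min, quartiles, max
summary = labeled.groupby('class')['Absolute acceleration (m/s^2)'].describe()
print(summary[['count', 'mean', 'std']])
```

Activities whose means differ by several standard deviations are usually easy to separate; heavily overlapping ones are where the extra window features help.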

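# Feature Extraction

Split the absolute acceleration signal into non-overlapping 50-sample windows and compute summary statistics for each window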

In [None]:
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Keep only the signal and its label; reset the index so positions match row numbers
df_train = df[['Absolute acceleration (m/s^2)', 'class']].reset_index(drop=True)
signal = df_train['Absolute acceleration (m/s^2)'].values
z_list = []
train_labels = []

window_size = 50  # samples per window
step_size = 50    # same as window_size, so windows do not overlap

# creating windows (the + 1 keeps the last complete window)
for i in range(0, len(signal) - window_size + 1, step_size):
    z_list.append(signal[i: i + window_size])
    # label each window with its most frequent activity; pandas mode() handles string
    # labels, whereas scipy.stats.mode's handling of such input changed in recent SciPy releases
    label = df_train['class'].iloc[i: i + window_size].mode().iloc[0]
    train_labels.append(label)

# Statistical Features
windows = pd.Series(z_list)
X_train = pd.DataFrame()

# mean
X_train['resultant_mean'] = windows.apply(lambda x: x.mean())

# std dev
X_train['resultant_std'] = windows.apply(lambda x: x.std())

# min
X_train['resultant_min'] = windows.apply(lambda x: x.min())

# max
X_train['resultant_max'] = windows.apply(lambda x: x.max())

# max-min diff
X_train['resultant_maxmin_diff'] = X_train['resultant_max'] - X_train['resultant_min']

# median
X_train['resultant_median'] = windows.apply(lambda x: np.median(x))

# mean absolute deviation (average distance from the window mean)
X_train['resultant_mad'] = windows.apply(lambda x: np.mean(np.absolute(x - np.mean(x))))

# skewness
X_train['resultant_skewness'] = windows.apply(lambda x: stats.skew(x))

# kurtosis
X_train['resultant_kurtosis'] = windows.apply(lambda x: stats.kurtosis(x))

df = X_train
df['class'] = train_labels
print('Features extracted - model is ready for training')
df.describe()
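The windowing follows simple arithmetic: with N samples, window size w and step s, the number of complete windows is ⌊(N − w)/s⌋ + 1, and any leftover samples at the end are dropped. The sketch below shows the same windowing with NumPy's `sliding_window_view` (available in NumPy 1.20 and later) plus a majority-vote label per window; `make_windows` is a hypothetical helper and the data are synthetic.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(signal, labels, window_size=50, step_size=50):
    """Cut a 1-D signal into windows and give each one its most frequent label.

    Returns an array of shape (n_windows, window_size) and a list of labels,
    where n_windows = (len(signal) - window_size) // step_size + 1.
    """
    signal = np.asarray(signal, dtype=float)
    windows = sliding_window_view(signal, window_size)[::step_size]
    labels = pd.Series(labels)
    window_labels = [
        # mode() returns modes in sorted order, so ties go to the first name alphabetically
        labels.iloc[start:start + window_size].mode().iloc[0]
        for start in range(0, len(signal) - window_size + 1, step_size)
    ]
    return windows, window_labels


# 120 samples, window 50, step 50 -> (120 - 50) // 50 + 1 = 2 windows; the last 20 samples are dropped
sig = np.arange(120, dtype=float)
lab = ['Slow Walking'] * 60 + ['Running'] * 60
windows, window_labels = make_windows(sig, lab)
print(windows.shape)   # (2, 50)
print(window_labels)   # ['Slow Walking', 'Running']
```

The second window mixes 10 "Slow Walking" samples with 40 "Running" samples and is labeled "Running"; windows that straddle an activity change are a common source of training noise.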

Display features in relation to each activity

In [None]:
pc_df = df.copy()
classes = df['class'].unique()
# Encode each activity name as an integer so it can drive the color scale
class_codes = {name: code for code, name in enumerate(classes)}
pc_df['class'] = pc_df['class'].map(class_codes)
print("Classes: " + " // ".join(f"CLASS {code} = {name}" for name, code in class_codes.items()))
pc = px.parallel_coordinates(pc_df,
                             color='class',
                             title='How do the resultant features relate to your classes?',
                             width=1300)
pc.show()

# Model Training

For more advanced machine learning data prep, check out the documentation:
https://pycaret.readthedocs.io/en/latest/api/classification.html

Steps for model training
1. Set up the data
2. Compare models and their performance metrics
3. Evaluate the best model
4. Apply the model to new, unseen data

In [None]:
exp = setup(df, target = 'class')

Compare multiple models to see which performs best on this data, and keep the top model as `best`

In [None]:
best = compare_models()
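Conceptually, `compare_models` trains several candidate classifiers with cross-validation and ranks them by a metric (Accuracy by default for classification). The sketch below reproduces that idea directly in scikit-learn, which PyCaret builds on. The three models, the 10-fold stratified split and the synthetic `make_blobs` data are illustrative choices, not PyCaret's exact internal model list or settings.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the window features: 3 well-separated classes
X, y = make_blobs(n_samples=150, centers=3, n_features=4, random_state=0)

candidates = {
    'K Neighbors': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'Random Forest': RandomForestClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the class proportions the same in every fold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring='accuracy').mean()
    for name, model in candidates.items()
}

# Rank the candidates by mean cross-validated accuracy, best first
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} mean accuracy = {acc:.3f}")
best_name = max(results, key=results.get)
```

With only a few dozen windows per activity, differences of a few percentage points between models are often within fold-to-fold noise.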

In [None]:
print("Model selected with key parameters")
print(best)

Visualize the performance of the model

In [None]:
evaluate_model(best)

# Model Testing
Apply your model to unseen data
- Upload your second activity file
- This file should include the same three activities, but they may be in a different order or have different timing

In [None]:
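TEST_FILE = files.upload()
# Read whichever file was uploaded instead of a hard-coded file name
test_file_name = next(iter(TEST_FILE))
TEST = pd.read_csv(io.BytesIO(TEST_FILE[test_file_name]))
TEST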

Extract features from the test data

In [None]:
from scipy import stats

# Use exactly the same windowing and features as for the training data
df_test = TEST[['Absolute acceleration (m/s^2)']].reset_index(drop=True)
signal = df_test['Absolute acceleration (m/s^2)'].values
z_list = []

window_size = 50
step_size = 50

# creating windows (the + 1 keeps the last complete window)
for i in range(0, len(signal) - window_size + 1, step_size):
    z_list.append(signal[i: i + window_size])

# Statistical Features
windows = pd.Series(z_list)
X_test = pd.DataFrame()

# mean
X_test['resultant_mean'] = windows.apply(lambda x: x.mean())

# std dev
X_test['resultant_std'] = windows.apply(lambda x: x.std())

# min
X_test['resultant_min'] = windows.apply(lambda x: x.min())

# max
X_test['resultant_max'] = windows.apply(lambda x: x.max())

# max-min diff
X_test['resultant_maxmin_diff'] = X_test['resultant_max'] - X_test['resultant_min']

# median
X_test['resultant_median'] = windows.apply(lambda x: np.median(x))

# mean absolute deviation (average distance from the window mean)
X_test['resultant_mad'] = windows.apply(lambda x: np.mean(np.absolute(x - np.mean(x))))

# skewness
X_test['resultant_skewness'] = windows.apply(lambda x: stats.skew(x))

# kurtosis
X_test['resultant_kurtosis'] = windows.apply(lambda x: stats.kurtosis(x))

TEST = X_test
print("Extracted features from the test dataset")
TEST.describe()
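The training and test cells repeat the same feature code, and the model can only score the test data if both produce the same columns in the same order. One way to guarantee that is a single function used for both recordings; `extract_features` below is a hypothetical sketch whose column names are copied from the cells above, applied here to synthetic signals of different lengths.

```python
import numpy as np
import pandas as pd
from scipy import stats

FEATURE_COLUMNS = [
    'resultant_mean', 'resultant_std', 'resultant_min', 'resultant_max',
    'resultant_maxmin_diff', 'resultant_median', 'resultant_mad',
    'resultant_skewness', 'resultant_kurtosis',
]


def extract_features(signal, window_size=50, step_size=50):
    """Compute the nine per-window statistics used in this notebook for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for start in range(0, len(signal) - window_size + 1, step_size):
        x = signal[start:start + window_size]
        rows.append([
            x.mean(), x.std(), x.min(), x.max(), x.max() - x.min(),
            np.median(x),
            np.mean(np.abs(x - x.mean())),  # mean absolute deviation
            stats.skew(x), stats.kurtosis(x),
        ])
    return pd.DataFrame(rows, columns=FEATURE_COLUMNS)


# Recordings of different lengths still yield identical columns
rng = np.random.default_rng(0)
train_features = extract_features(rng.normal(10, 2, size=500))
test_features = extract_features(rng.normal(10, 2, size=260))
print(train_features.shape, test_features.shape)  # (10, 9) (5, 9)
```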

### Model Predictions on the Test Data

In [None]:
predictions = predict_model(best, data = TEST)
print("Model Predictions on the Test Dataset")
predictions

Predicted time spent on each activity

In [None]:
#@title
# Each prediction covers one window of 50 samples. Multiplying by 0.5 assumes each window
# spans half a second (50 samples at roughly 100 Hz); change it if your sampling rate differs.
SECONDS_PER_WINDOW = 0.5
# Count windows per predicted activity (works for any number of predicted activities)
window_counts = predictions['prediction_label'].value_counts()
act_numbers = pd.DataFrame({'Activities': window_counts.index,
                            'Time': window_counts.values * SECONDS_PER_WINDOW})
px.histogram(act_numbers, y='Time', x='Activities', color_discrete_sequence=["#808080"])
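The factor of 0.5 s per prediction is only right if 50 samples take half a second, i.e. a sampling rate of about 100 Hz. A sketch of estimating the window duration from the raw `Time (s)` column instead (read before feature extraction replaces it) is below; `seconds_per_window` is a hypothetical helper and the timestamps are synthetic.

```python
import numpy as np


def seconds_per_window(times, window_size=50):
    """Estimate how many seconds one window covers from a 'Time (s)' column."""
    # The median gap between timestamps is robust to occasional dropped samples
    sample_interval = np.median(np.diff(np.asarray(times, dtype=float)))
    return window_size * sample_interval


# Synthetic 100 Hz timestamps: 0.00, 0.01, 0.02, ...
times = np.arange(1000) * 0.01
duration = seconds_per_window(times)
print(round(duration, 3))  # 0.5

# Predicted time for an activity = number of windows predicted x seconds per window
windows_predicted_running = 36
print(round(windows_predicted_running * duration, 3))  # 18.0
```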

### Visualize Model Predictions

In [None]:
# Approximate time of each window: window index x 0.5 s
# (assumes 50-sample windows at roughly 100 Hz)
predictions['Time in Seconds'] = predictions.index * 0.5
Your_Name = "Cole" #@param {type:"string"}
title_str = Your_Name + "'s model predictions on test data"

fig = px.scatter(predictions, x = 'Time in Seconds',
                 y = 'resultant_mean',
                 color = 'prediction_label',
                 title = title_str,
                 color_discrete_sequence=["#F1948A", "#48C9B0", "#85C1E9"])

fig.show()