<a href="https://colab.research.google.com/github/GKnibbs/EMG_Prosthetic_Project/blob/main/Model_Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Clone the EMG_Prosthetic_Project GitHub repository
Cloning the repository makes all project scripts available in the Colab environment for training and evaluation.

# Git Clone

In [None]:
# If the repo already exists, remove it to avoid conflicts (safe for Colab sessions)
!rm -rf EMG_Prosthetic_Project
# Clone the latest version of your GitHub repo
!git clone https://github.com/GKnibbs/EMG_Prosthetic_Project.git

# Data Retrieval from Google Drive

In [None]:
!apt-get install -y p7zip-full

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
p7zip-full is already the newest version (16.02+dfsg-8).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from getpass import getpass
pw = getpass("7z password: ")
!7z x '/content/drive/MyDrive/Data_Secure' -o'/content' -p"$pw"

7z password: ··········

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.00GHz (50653),ASM,AES-NI)

Scanning the drive for archives:
  0M Scan /content/drive/MyDrive/                                 1 folder, 1 file, 839935592 bytes (802 MiB)

Extracting archive: /content/drive/MyDrive/Data_Secure/Segregated_Data.7z
--
Path = /content/drive/MyDrive/Data_Secure/Segregated_Data.7z
Type = 7z
Physical Size = 839935592
Headers Size = 456
Method = LZMA2:25 7zAES
Solid = +
Blocks = 1

  0% 1 - Segregated_Data/0_REST.csv

# Running Scripts

In [None]:
# Install required packages (Colab already has many pre-installed, but you can force versions if needed)
!pip install --upgrade pip
!pip install tensorflow==2.15.0 numpy pandas

In [None]:
# Generating data manifest
# This script will create a manifest file for the dataset, which is useful for training models.
!python EMG_Prosthetic_Project/Scripts/Make_Manifest.py
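
For readers without the repository at hand, a manifest of this kind can be sketched as a CSV that lists each data file together with a label parsed from its name. This is not the contents of `Make_Manifest.py`. The `<index>_<NAME>.csv` naming is inferred from the single `0_REST.csv` entry in the extraction log above, and `build_manifest`, the column names, and the second file name are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def build_manifest(data_dir, out_path):
    """Write one manifest row per CSV file found in data_dir (hypothetical layout)."""
    rows = []
    for path in sorted(Path(data_dir).glob("*.csv")):
        # Assumption: file names look like "<index>_<NAME>.csv", e.g. "0_REST.csv".
        index, _, name = path.stem.partition("_")
        rows.append({"file": path.name, "index": index, "name": name})
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["file", "index", "name"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Demo on a throwaway directory so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("0_REST.csv", "1_FIST.csv"):  # "1_FIST.csv" is a made-up example name
        (Path(tmp) / fname).write_text("ch1,ch2\n0.0,0.0\n")
    manifest_rows = build_manifest(tmp, Path(tmp) / "manifest.csv")
    print(manifest_rows)
```

Listing the files once up front lets later steps iterate over the manifest instead of re-scanning the data directory.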

In [None]:
# Splitting subject IDs into train/val/test
# This script will create train_ids.txt, val_ids.txt, and test_ids.txt by randomly assigning subjects to each split.
!python EMG_Prosthetic_Project/Scripts/Data_Split.py --n_subjects 40 --train_ratio 0.7 --val_ratio 0.15 --test_ratio 0.15 --out_dir EMG_Prosthetic_Project/artifacts
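
To make the arithmetic of the split concrete, the following is a minimal sketch of a subject-wise split using the same numbers as the command above (40 subjects, 0.7/0.15/0.15). It is not the code in `Data_Split.py`; the function `split_subjects`, the 1-based subject numbering, and the fixed seed are assumptions made for illustration.

```python
import random


def split_subjects(n_subjects, train_ratio, val_ratio, test_ratio, seed=0):
    """Shuffle subject IDs and cut them into train/val/test lists."""
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    ids = list(range(1, n_subjects + 1))  # assumption: subjects are numbered 1..n
    random.Random(seed).shuffle(ids)
    n_train = round(n_subjects * train_ratio)
    n_val = round(n_subjects * val_ratio)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder, so every subject lands in exactly one split
    return train, val, test


train_ids, val_ids, test_ids = split_subjects(40, 0.7, 0.15, 0.15)
print(len(train_ids), len(val_ids), len(test_ids))
```

Splitting by subject rather than by window keeps all recordings from one person in a single split, so the validation and test scores reflect performance on unseen subjects.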

In [None]:
# Running Fit_Scaler.py
# This script fits a scaler to the training data only, so that input features can be normalized.
# Normalized inputs help the model train stably and learn effectively from the data.
# Fitting on the training subjects alone keeps validation and test statistics from leaking into preprocessing.
!python EMG_Prosthetic_Project/Scripts/Fit_Scaler.py --data_dir Segregated_Data --train_ids EMG_Prosthetic_Project/artifacts/train_ids.txt
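
The idea of fitting normalization statistics on the training split and reusing them elsewhere can be sketched with plain NumPy. This assumes per-channel standardization (zero mean, unit variance); the scaler type actually used by `Fit_Scaler.py` is not shown in this notebook, and the function names, array shapes, and synthetic data below are illustrative only.

```python
import numpy as np


def fit_scaler(train_data):
    """Return per-channel mean and standard deviation computed on training rows only."""
    mean = train_data.mean(axis=0)
    std = train_data.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant channels
    return mean, std


def apply_scaler(data, mean, std):
    """Standardize any split with statistics that came from the training split."""
    return (data - mean) / std


rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(1000, 8))  # synthetic stand-in; 8 channels is arbitrary
val = rng.normal(loc=5.0, scale=2.0, size=(200, 8))

mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
val_scaled = apply_scaler(val, mean, std)  # reuses training statistics, never refits
print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
```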

In [None]:
# Generating TFRecords (Raw Data Method)
# This script will create TFRecord files from the dataset, which are optimized for TensorFlow training.
# TFRecords are a binary file format that allows for efficient data loading and processing in TensorFlow.
!python EMG_Prosthetic_Project/Scripts/Make_TFRecords.py --data_dir Segregated_Data
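
The debug cell further down requests windows with `win_len=200`, so at some point the raw recordings are cut into fixed-length windows. Whether that happens when the TFRecords are written or when they are read is not visible from this notebook. As a rough sketch of the windowing step on its own, assuming a `(samples, channels)` array and a hypothetical stride of 100 samples:

```python
import numpy as np


def sliding_windows(signal, win_len=200, stride=100):
    """Cut a (samples, channels) array into overlapping (win_len, channels) windows."""
    n_samples = signal.shape[0]
    if n_samples < win_len:
        return np.empty((0, win_len, signal.shape[1]), dtype=signal.dtype)
    starts = range(0, n_samples - win_len + 1, stride)
    return np.stack([signal[s:s + win_len] for s in starts])


recording = np.arange(1000 * 8, dtype=np.float32).reshape(1000, 8)  # synthetic stand-in
windows = sliding_windows(recording)
print(windows.shape)
```

A stride smaller than the window length produces overlapping windows, which increases the number of training examples at the cost of correlated samples.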

In [None]:
# Generating TFRecords (Feature Vector Extraction Data Method)
# This script will create TFRecord files from the dataset, which are optimized for TensorFlow training.
# TFRecords are a binary file format that allows for efficient data loading and processing in TensorFlow.
!python EMG_Prosthetic_Project/Scripts/Make_TFRecords_Features.py --data_dir Segregated_Data

In [None]:
# Generating TFRecords (Virtual Channels Method)
# This script will create TFRecord files from the dataset with virtual channels included, optimized for TensorFlow training.
# TFRecords are a binary file format that allows for efficient data loading and processing in TensorFlow.
!python EMG_Prosthetic_Project/Scripts/Make_TFRecords_VirtualChannels.py --data_dir Segregated_Data

In [None]:
# Debug: Inspect a batch from the training dataset to check data and label integrity
import sys
sys.path.append('EMG_Prosthetic_Project/Scripts')
from Train_Baseline import get_dataset
import numpy as np

# Print a batch of windows and labels
for x, y in get_dataset('train', win_len=200, batch_size=8).take(1):
    print("X shape:", x.shape)
    print("Y (one-hot) shape:", y.shape)
    print("Y (class indices):", np.argmax(y.numpy(), axis=1))
    print("Unique labels in batch:", np.unique(np.argmax(y.numpy(), axis=1)))
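
The visual check above can be complemented by an explicit assertion that the labels really are one-hot. The helper below is a small sketch that works on any NumPy array of shape `(batch, n_classes)`; `check_one_hot` is a hypothetical name, and the synthetic batch stands in for `y.numpy()` from the loop above.

```python
import numpy as np


def check_one_hot(y):
    """Return class indices after verifying that each row of y is a valid one-hot vector."""
    y = np.asarray(y)
    if y.ndim != 2:
        raise ValueError("expected shape (batch, n_classes)")
    if not np.all((y == 0) | (y == 1)):
        raise ValueError("labels contain values other than 0 and 1")
    if not np.all(y.sum(axis=1) == 1):
        raise ValueError("each row must contain exactly one 1")
    return y.argmax(axis=1)


batch = np.eye(4)[[0, 2, 2, 3]]  # synthetic batch: 4 samples, 4 classes
class_ids = check_one_hot(batch)
class_counts = np.bincount(class_ids, minlength=batch.shape[1])
print(class_ids, class_counts)
```

Per-class counts over several batches give a quick indication of whether any class is missing or heavily under-represented in the training stream.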

In [None]:
# Running the training script (Raw Data Method)
# This script will train the baseline model using the prepared dataset and the fitted scaler.
# It will save the trained model to the script's output directory; this is the final step in the training process.
!python EMG_Prosthetic_Project/Scripts/Train_Baseline.py --data_dir Segregated_Data

In [None]:
# Running the training script (Feature Vector Extraction Data Method)
# This script will train the baseline model using the prepared dataset and the fitted scaler.
# It will save the trained model to the specified directory - this is the final step in the training process.
!python EMG_Prosthetic_Project/Scripts/Train_Baseline_Features.py --data_dir Segregated_Data

In [None]:
# Running the training script (Virtual Channels Method)
# This script will train the baseline model on the virtual-channel dataset using the fitted scaler.
# It will save the trained model to the script's output directory; this is the final step in the training process.
!python EMG_Prosthetic_Project/Scripts/Train_Baseline_VirtualChannels.py --data_dir Segregated_Data

In [None]:
# Generating TFRecords (Feature Vector Extraction with Virtual Channels)
# This script will create TFRecord files with 60 features (10 channels × 6 features) for the feature-based virtual channel pipeline.
!python EMG_Prosthetic_Project/Scripts/Make_TFRecords_Features_VirtualChannels.py --data_dir Segregated_Data
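
The notebook states only that each of the 10 channels yields 6 features, not which ones. Purely as an illustration of how a 60-element vector could arise, the sketch below uses six common time-domain EMG descriptors (mean absolute value, root mean square, variance, waveform length, zero crossings, slope sign changes). This feature set, the function name, and the channel-major ordering are assumptions, not the project's confirmed choice.

```python
import numpy as np


def window_features(window):
    """Map a (win_len, channels) window to a flat vector of 6 features per channel."""
    diff = np.diff(window, axis=0)
    mav = np.mean(np.abs(window), axis=0)             # mean absolute value
    rms = np.sqrt(np.mean(window ** 2, axis=0))       # root mean square
    var = np.var(window, axis=0)                      # variance
    wl = np.sum(np.abs(diff), axis=0)                 # waveform length
    zc = np.sum(window[:-1] * window[1:] < 0, axis=0)  # zero crossings
    ssc = np.sum(diff[:-1] * diff[1:] < 0, axis=0)    # slope sign changes
    # Channel-major layout: [ch0's 6 features, ch1's 6 features, ...]
    return np.stack([mav, rms, var, wl, zc, ssc], axis=1).reshape(-1)


rng = np.random.default_rng(0)
window = rng.standard_normal((200, 10))  # synthetic window: 200 samples, 10 channels
features = window_features(window)
print(features.shape)
```

Reducing each window to a short feature vector is what allows a small MLP to replace a model that operates on the raw time series.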

In [None]:
# Training the feature-based model with virtual channels
# This script will train an MLP on the 60-feature vectors (10 channels × 6 features).
!python EMG_Prosthetic_Project/Scripts/Train_Baseline_Features_VirtualChannels.py --data_dir Segregated_Data

In [None]:
# Evaluating the feature-based virtual channel model
# This script will evaluate the trained MLP on the test set and print/save results.
!python EMG_Prosthetic_Project/Scripts/Evaluate_Features_VirtualChannels.py

In [None]:
# Evaluating model outcome
# This script will evaluate the trained model on the test dataset and print the results.
!python EMG_Prosthetic_Project/Scripts/Evaluate_Model.py
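
The metrics reported by the evaluation scripts are not shown in this notebook. As a sketch of what a typical classification evaluation computes, the snippet below builds a confusion matrix and derives overall accuracy and per-class recall from it. The labels are made up for illustration and `confusion_matrix` is a hand-written helper, not a function from the repository.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated index pairs
    return cm


y_true = np.array([0, 0, 1, 1, 2, 2])  # made-up labels for illustration
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy = np.trace(cm) / cm.sum()
per_class_recall = np.diag(cm) / cm.sum(axis=1)
print(cm)
print(accuracy, per_class_recall)
```

Per-class recall is worth inspecting alongside accuracy, since a model can score well overall while consistently confusing two similar gestures.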