<a href="https://colab.research.google.com/github/bermanlabemory/gait_signatures/blob/main/Gait_Signatures_Script_1_Train_Model_Architectures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## This script creates individual model folders corresponding to user-specified model architectures. Each architecture is trained, and the best model (.h5 file) and its loss curves (.png) are saved in the respective folder for later gait signature development and further analyses.

### This code allows testing of various model architectures (different numbers of LSTM units and lookback values for a single-hidden-layer model)

**notes:**
1.   The data is stored in .csv files: PareticvsNonP_RNNData.csv, Speedlabels.csv, Subjectlabels.csv
2.   A single hidden layer LSTM model is run on the data for each model architecture.
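For a sense of scale, the number of trainable parameters in this single-hidden-layer architecture can be worked out by hand. The sketch below is illustrative and not part of the original pipeline: the helper names are hypothetical, and it assumes the CuDNN-backed LSTM layer used in Step 5 keeps two bias vectors per gate (8 × units biases in total), whereas the standard Keras `LSTM` keeps one (4 × units).

```python
def lstm_param_count(units: int, feats: int, cudnn: bool = True) -> int:
    """Trainable parameters of one LSTM layer (assumption: CuDNN variant has 2 bias vectors per gate)."""
    input_weights = 4 * feats * units        # kernel: input -> 4 gates
    recurrent_weights = 4 * units * units    # recurrent kernel: hidden state -> 4 gates
    biases = (8 if cudnn else 4) * units     # CuDNN keeps separate input and recurrent biases
    return input_weights + recurrent_weights + biases


def dense_param_count(units: int, feats: int) -> int:
    """Weights plus biases of the Dense output layer mapping `units` -> `feats`."""
    return units * feats + feats


def model_param_count(units: int, feats: int, cudnn: bool = True) -> int:
    """LSTM layer plus Dense output layer."""
    return lstm_param_count(units, feats, cudnn) + dense_param_count(units, feats)


for units in [512]:
    print(units, model_param_count(units, feats=6))
```

Under that assumption, 512 units and 6 features give 1,068,038 parameters; `model.summary()` on the built Keras model reports the authoritative count.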




**Created by**: Taniel Winner

**Date**: 07/18/22

**Step 0**: Mount (connect to) the Google Drive folder where you want to save the simulation results and model parameters.


In [None]:
from google.colab import drive
drive.mount('/content/drive')
#drive.mount("/content/drive", force_remount=True)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# check python version
from platform import python_version

print(python_version())

3.8.16


In [None]:
# check tensorflow version
import tensorflow as tf
print(tf.__version__)

2.9.2


In [None]:
# Code works with these versions
!pip install tensorflow==2.15.0 keras==2.15.0

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 54.8 gigabytes of available RAM

You are using a high-RAM runtime!


## Enabling and testing the GPU

First, you'll need to enable GPUs for the notebook:

- Navigate to Edit→Notebook Settings
- select GPU from the Hardware Accelerator drop-down

Next, we'll confirm that we can connect to the GPU with tensorflow:

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Sun Dec 18 00:28:14 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   52C    P0    27W /  70W |    312MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
%tensorflow_version 2.x
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.
Found GPU at: /device:GPU:0


**Step 1**: Import the packages needed to develop the model

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.layers import LSTM # public API; the private tensorflow.python.keras path is internal and may break across versions
from sklearn.model_selection import train_test_split
import sklearn.model_selection as model_selection
import matplotlib.pyplot as plt
import math
import keras as k
import pandas as pd
import numpy as np
from copy import copy
import scipy.io
from sklearn.decomposition import PCA

from scipy.signal import find_peaks
from scipy import interpolate
from numpy import sin,cos,pi,array,linspace,cumsum,asarray,dot,ones
from pylab import plot, legend, axis, show, randint, randn, std,lstsq

from sklearn.manifold import MDS
import csv
import os
from tqdm import tqdm

**Step 2**: Create a folder in your Google Drive and add it to the Python module search path


In [None]:
# The path to save the models and read data from

### !!! Create and name the main folder in which all results will be saved; make sure the function and data files are in this folder
folder = 'Gait_signatures/'


path = '/content/drive/My Drive/'+ folder

# Insert the directory
import sys
sys.path.insert(0,path)

**Step 3**: Load in data and specify variables/parameters

---



In [None]:
# Non-changing variables (specific to our gait signature dataset)

# number of trials in dataset
trialnum = 72 # 72 total trials

# number of samples in each trial
trialsamp = 1500 # 1500 samples (15 seconds, 100 Hz)

# number of features per trial
feats = 6

#Batch size - same as the number of trials
batch_size = trialnum

# Number of Layers
numlayers = 1

# Choose the maximum number of iterations to train the model
finalepoch = 10000

# load the input data/kinematics
datafilepath = path + 'PareticvsNonP_RNNData.csv' # input data (path already ends with '/')
all_csvnp = np.loadtxt(datafilepath,delimiter=',').T

# reshape all the input data into a tensor
all_inputdata_s = all_csvnp.reshape(trialnum,trialsamp,feats)
csvnp = all_inputdata_s
print('original input data shape is: '+ str(all_csvnp.shape ))
print('input data reshaped is: '+ str(all_inputdata_s.shape))

original input data shape is: (108000, 6)
input data reshaped is: (72, 1500, 6)
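The reshape above assumes the transposed CSV stacks the trials row-wise: rows 0 to 1499 are trial 0, rows 1500 to 2999 are trial 1, and so on (108000 = 72 × 1500). That is an assumption about the file layout, not something the code checks. A toy NumPy check of how `reshape` maps stacked rows to (trial, sample), using made-up sizes rather than the real file:

```python
import numpy as np

trialnum, trialsamp, feats = 4, 5, 3  # toy sizes, not the real dataset

# Row r of the stacked array belongs to trial r // trialsamp, sample r % trialsamp
rows = np.arange(trialnum * trialsamp)
stacked = np.stack([rows] * feats, axis=1)  # shape (trialnum * trialsamp, feats), like all_csvnp

tensor = stacked.reshape(trialnum, trialsamp, feats)

# tensor[t, s, :] should come from stacked row t * trialsamp + s
expected = np.arange(trialnum)[:, None] * trialsamp + np.arange(trialsamp)[None, :]
print(tensor.shape)                               # (4, 5, 3)
print(np.array_equal(tensor[:, :, 0], expected))  # True
```

If the CSV were instead interleaved (sample-major across trials), the same `reshape` would run without error but silently mix trials, so the layout is worth confirming once against the raw file.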


**Step 4**: Develop list of model architectures and corresponding variables to train. This step also generates a list of folder names and pathways where the models will be saved and accessed later.
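Step 4 restricts the lookback to values for which lookback + 1 divides the trial length, as explained in the comments of the cell below. For 1500-sample trials the admissible values can be listed directly. A minimal sketch; `valid_lookbacks` is a hypothetical helper, not part of the original code:

```python
from typing import List


def valid_lookbacks(trialsamp: int) -> List[int]:
    """Lookbacks L such that (L + 1) divides trialsamp, i.e. one less than each divisor > 1."""
    return [d - 1 for d in range(2, trialsamp + 1) if trialsamp % d == 0]


lookbacks = valid_lookbacks(1500)
print(lookbacks)

# Number of input/output mini-batches (segments) each trial splits into
segments = {L: 1500 // (L + 1) for L in (249, 499, 749)}
print(segments)  # {249: 6, 499: 3, 749: 2}
```

The segment counts match the trainseg + valseg splits hard-coded below (4 + 2, 2 + 1, 1 + 1).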

In [None]:
# generate a list of models and corresponding parameters to test
test_model_nodes = [512] # specify the number of LSTM units for each model here

# Lookback parameter
# The lookback parameter is the number of samples used to predict the outputs and compute the error; the resulting gradient is
# backpropagated through those samples only (truncated backpropagation through time) to update the weights. The lookback is typically 1 less than a divisor of the trial length,
# i.e., trialsamp should be divisible by lookback + 1. This set-up is specific to training input and output sequences that are one-step time-shifted (lag = 1)
# versions of each other. E.g., trial length = 500, lookback = 99: input = samples 0:99, output = samples 1:100. Since lookback + 1 = 100 and trial length/(lookback + 1) = 5,
# there are 5 mini-batches of input/output sequences per trial over which the error is evaluated.

seqs = [249,499] #lookback parameter

runs = 1 # stability analysis - repeat each model architecture this many times to test stability of cost function outputs with different random initializations on each run

test_model_seq = np.repeat(seqs, runs)

All_nodes = np.empty([0,1], dtype='int')
All_seq = np.empty([0,1],dtype='int')
All_valseg = np.empty([0,1],dtype='int')
All_trainseg = np.empty([0,1],dtype='int')
All_modelname = []
All_mod_name = []


# Training and Validation Set-up
# Based on the trial length and the lookback parameter, set how many mini-batches are used for training vs. validation.
# For example, if the trial length is 1500 and lookback + 1 = 250, each trial splits into 1500/250 = 6 mini-batches.
# One can then use 4 of them for training (trainseg = 4, the first 4 mini-batches of the trial) and 2 for
# validation (valseg = 2, the last 2 mini-batches of the trial).


count = 0 # initialize model run counter; this serves as the model run ID number
for a in test_model_nodes:
  for b in test_model_seq:
    if count < runs:
      count = count + 1
    else:
      count = 1 # reset counter when all runs of a given model have been attained
    # set trainseg and valseg based on the sequence length (lookback)
    if int(b) == 249:
      trainseg = 4
      valseg = 2
    elif int(b) == 499:
      trainseg = 2
      valseg = 1
    elif int(b) == 749:
      trainseg = 1
      valseg = 1
    else:
      # without this branch, trainseg/valseg would silently keep the previous lookback's values
      raise ValueError('No trainseg/valseg defined for lookback ' + str(b))

    All_nodes = np.append(All_nodes, a)
    All_seq = np.append(All_seq, int(b))
    All_valseg = np.append(All_valseg, valseg)
    All_trainseg = np.append(All_trainseg, trainseg)
    All_modelname = np.append(All_modelname, 'run_' + str(count) + '_UNIT_' + str(a) + '_LB_' + str(b) + '/' )
    All_mod_name = np.append(All_mod_name, 'run_' + str(count) + '_UNIT_' + str(a) + '_LB_' + str(b) )
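The hard-coded trainseg/valseg pairs above must fit inside each trial once the windows are sliced in Step 5. NumPy slicing past the end of an array truncates silently rather than raising, so a mismatch only surfaces later as an obscure shape error. A minimal sketch that mirrors the Step 5 slicing formulas and checks the fit; `segment_windows` is a hypothetical helper, not part of the original code:

```python
from typing import List, Tuple

Window = Tuple[int, int]


def segment_windows(lookback: int, trainseg: int, valseg: int,
                    trialsamp: int = 1500) -> Tuple[List[Window], List[Window]]:
    """Return (start, stop) input windows for training and validation.

    Training inputs are consecutive windows of `lookback` samples starting at 0;
    validation inputs start at (lookback + 1) * trainseg. Each output window is
    its input window shifted forward by one sample.
    """
    train = [(i * lookback, (i + 1) * lookback) for i in range(trainseg)]
    valindex = (lookback + 1) * trainseg
    val = [(valindex + i * lookback, valindex + (i + 1) * lookback)
           for i in range(valseg)]
    # The last output window ends one sample after the last input window
    last_output_stop = (val[-1][1] if val else train[-1][1]) + 1
    if last_output_stop > trialsamp:
        raise ValueError(f"windows need {last_output_stop} samples, trial has {trialsamp}")
    return train, val


for lookback, trainseg, valseg in [(249, 4, 2), (499, 2, 1), (749, 1, 1)]:
    train, val = segment_windows(lookback, trainseg, valseg)
    print(lookback, "train:", train, "val:", val)
```

For lookback 249 this shows the training outputs ending at sample 997 and validation inputs starting at sample 1000, so the two sets do not overlap.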

**Step 5**: Train model architectures and save in a loop.
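The model in this step is a stateful LSTM trained with `batch_size=trialnum` and `shuffle=False`. Per the Keras documentation for stateful RNNs, the final state of sample i in one batch is used as the initial state of sample i in the next batch, so each batch must hold one segment from every trial, always in the same trial order. A toy NumPy check that the concatenation used in the loop below produces that ordering (sizes are made up for illustration, not the real dataset):

```python
import numpy as np

# Toy sizes (hypothetical): 3 trials, 12 samples, 2 features
trialnum, trialsamp, feats = 3, 12, 2
lookback, trainseg = 3, 2

# Encode trial and sample index into each value: 1000 * trial + sample
trial_idx = np.arange(trialnum)[:, None, None]
sample_idx = np.arange(trialsamp)[None, :, None]
csvnp = (1000 * trial_idx + sample_idx) * np.ones((1, 1, feats), dtype=int)

# Same concatenation as the training loop: segment i of every trial, stacked segment by segment
trainx = np.concatenate([csvnp[:, i*lookback:(i+1)*lookback, :] for i in range(trainseg)], axis=0)

first_values = trainx[:, 0, 0]
trial_ids = first_values // 1000      # which trial each row came from
first_samples = first_values % 1000   # first sample index of each row's window

print(trainx.shape)            # (6, 3, 2)
print(trial_ids.tolist())      # [0, 1, 2, 0, 1, 2]
print(first_samples.tolist())  # [0, 0, 0, 3, 3, 3]
```

With `batch_size=trialnum`, batch b covers rows b*trialnum to (b+1)*trialnum - 1, so row k of every batch is trial k and each batch continues the previous one by `lookback` samples; shuffling would break this correspondence.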

In [None]:
for j in range(len(All_mod_name)): # loop through all model architectures

    newfoldpath = path + All_mod_name[j]  # folder path in which to store the model
    try: # create the model folder if it does not already exist
      os.makedirs(newfoldpath, exist_ok=True)
    except OSError as err:
      print(err)

    # specify path to store each model and generated results
    savepath = path + All_modelname[j]
    mod_name = All_mod_name[j]

    modnum = j+1 #model number counter for print statement
    print('Working on: ' + mod_name + ' model ' + str(modnum) + ' / ' + str(len(All_mod_name)))

    # Specify variables for model run instance

    # Number of Units
    numunits = All_nodes[j]

    lookback = All_seq[j]

    # Select the 1st X segments to be training
    trainseg = All_trainseg[j]

    # Select the last Y segments to be validation
    valseg = All_valseg[j]

    # Set up training and validation input and output sequences
    trainx = np.concatenate([csvnp[:,i*lookback:(i+1)*lookback,:] for i in range(trainseg)], axis=0)
    trainy = np.concatenate([csvnp[:,i*lookback+1:(i+1)*lookback+1,:] for i in range(trainseg)], axis=0)

    valindex = (lookback+1)*trainseg
    valx = np.concatenate([csvnp[:,valindex+i*lookback:valindex+(i+1)*lookback,:] for i in range(valseg)], axis=0)
    valy = np.concatenate([csvnp[:,valindex+1+i*lookback:valindex+1+(i+1)*lookback,:] for i in range(valseg)], axis=0)

    # Develop LSTM model
    with tf.device('/device:GPU:0'):

        model=k.models.Sequential()
        model.add(tf.compat.v1.keras.layers.CuDNNLSTM(units = numunits, stateful=True, return_sequences=True, batch_input_shape =(batch_size,lookback,feats)))
        model.add(tf.compat.v1.keras.layers.Dense(units=feats))

        # compile model
        model.compile(loss='mse', optimizer='adam') # no 'accuracy' metric: it is not meaningful for a regression (MSE) loss

        # train the model using training and validation
        checkpoint_cb = k.callbacks.ModelCheckpoint(savepath + mod_name + '_bestwhole.h5',save_best_only = True)
        early_stopping_cb = k.callbacks.EarlyStopping(patience = 500,restore_best_weights = True)
        history = model.fit(trainx,trainy,batch_size=trialnum,epochs=finalepoch, validation_data=(valx,valy),shuffle=False, verbose=1,callbacks= [checkpoint_cb,early_stopping_cb])

        # Save history
        np.save(savepath +  mod_name + '_history_loss.npy', history.history['loss'])
        np.save(savepath +  mod_name + '_history_val_loss.npy', history.history['val_loss'])

        # Plot the training and validation loss curve
        fig = plt.figure(figsize=(20,15))
        loss_train = history.history['loss']
        loss_val = history.history['val_loss']
        epochs = np.array(range(1,len(loss_train)+1))
        epshift = epochs - 0.5
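        plt.plot(epshift,loss_train, 'g', label='Training loss') # shifted half an epoch: training loss is averaged over the epoch, validation loss is computed at its end
        plt.plot(epochs, loss_val, 'b', label='Validation loss')
        plt.yscale('log', base=10) # log-scale y axis, base 10 ('basey' was removed in newer Matplotlib releases)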
        plt.title('Training and Validation loss')
        plt.xlabel('Epochs')
        plt.ylabel('Loss')
        plt.legend(loc="upper right")
        #plt.show()
        plt.savefig(savepath + mod_name + 'Training_vs_Validation.png', dpi = 300)
        plt.close(fig)# close figure in loop

        # save minimum training and validation loss
        min_val_loss = np.min(loss_val)
        min_train_loss = np.min(loss_train)

        np.save(savepath +  mod_name + '_MIN_val_loss.npy', min_val_loss)
        np.save(savepath +  mod_name + '_MIN_train_loss.npy', min_train_loss)

Streaming output truncated to the last 5000 lines.
Epoch 1077/10000
...
Epoch 1131/10000
