# Acceleroseismology: A model for identifying FoG in Parkinson's Disease patients.

## Background:

Freezing of Gait (FoG) is a debilitating symptom of Parkinson's Disease (PD) that lowers quality of life, reduces independence, and increases the risk of falling. To identify FoG signatures in accelerometer data (or, better yet, to predict episodes before they occur), this notebook applies a machine learning algorithm to time-series accelerometry. I hope the results prove useful for diagnosing FoG and, potentially, for warning PD patients ahead of an episode.


## Acceleroseismology:

As a mental model, I have approached this problem the way a seismologist approaches earthquakes. Seismologists discovered that the faster P-waves arrive ahead of the more destructive S-waves, which makes early-warning systems possible. Similarly, in what I am calling acceleroseismology, I analyze both time-domain and frequency-domain features of the accelerometry data with a moving window, so that each time point is placed in its broader context. Ultimately, I hope to engineer features that can predict the onset of a FoG episode. Such a capability could let a wearable device alert its user, possibly reducing fall risk and improving the chances of avoiding the FoG event altogether. This would contribute to the United Nations' [Third Sustainable Development Goal: Good health and well-being](https://sdgs.un.org/goals).
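To make the moving-window idea concrete, here is a minimal sketch on a synthetic signal. For a chosen time point it takes a centred window and reports one time-domain feature (the amplitude range) and one frequency-domain feature (the RMS spectral amplitude in a band). The helper name `window_features`, the synthetic 1 Hz sway, the 4 Hz component that switches on at 10 s, and the 2 to 6 Hz band are illustrative assumptions; the notebook's own features differ in detail.

```python
import numpy as np

def window_features(signal, centre, half_len, fs, band=(2.0, 6.0)):
    """Time- and frequency-domain features of the window centred on `centre` (illustrative sketch)."""
    clip = signal[centre - half_len: centre + half_len + 1]
    spectrum = np.fft.rfft(clip)
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / fs)
    in_band = (freqs > band[0]) & (freqs < band[1])
    amplitude_range = clip.max() - clip.min()                     # time domain
    band_rms = np.sqrt(np.mean(np.abs(spectrum[in_band]) ** 2))   # frequency domain
    return amplitude_range, band_rms

fs = 128                         # samples per second (the tdcsfog rate)
t = np.arange(20 * fs) / fs      # 20 s of synthetic data
# Hypothetical signal: a slow 1 Hz sway everywhere, plus a 4 Hz oscillation only after t = 10 s.
signal = 0.5 * np.sin(2 * np.pi * 1.0 * t) + np.where(t >= 10, 0.5 * np.sin(2 * np.pi * 4.0 * t), 0.0)

half_len = 256                   # 2 s on either side of the centre
early = window_features(signal, centre=5 * fs, half_len=half_len, fs=fs)   # window covers 3 s to 7 s
late = window_features(signal, centre=15 * fs, half_len=half_len, fs=fs)   # window covers 13 s to 17 s
print(early, late)  # the band RMS is far larger for the window that contains the 4 Hz component
```

The same two numbers computed at every time point form a feature time series in which a change in rhythm shows up before and after the sample where it happens, which is the "broader context" the moving window is meant to supply.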


## Motivation Behind Features:



```python
# Import libraries:
import os
import csv

import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score  # currently unused


####################################################################
########              PARAMETERS & CONSTANTS:               ########
####################################################################
sampleRateTDCSFOG = 128  # samples per second
sampleRateDEFOG = 100    # samples per second

# Samples on either side of the point of interest in the moving window,
# so each window spans 2*windowHalfLength + 1 samples.
windowHalfLength = 640

# Window lengths (in samples) for the disabled "loud then quiet" experiment (experiment1).
Q_window = 200
L_window = 400

# Low-frequency band (Hz). lowfBandMin is not used by the current features:
# the low-band RMS includes every bin below lowfBandMax, DC (0 Hz) included.
lowfBandMin = 0.0
lowfBandMax = 2.0
# High-frequency band (Hz)
highfBandMin = 2.0
highfBandMax = 6.0

# Placeholder cutoff passed to the FFT helpers. It is ignored whenever
# filterType == "none", which is the only filter type used below.
dummyVariable = 9
```
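Because `windowHalfLength` is given in samples, the moving window covers a different duration in each dataset, and the spacing of its FFT bins (the sample rate divided by the window length) differs too. A quick check with the constants above (restated so the snippet runs on its own):

```python
windowHalfLength = 640
windowLength = 2 * windowHalfLength + 1  # samples per window, centre sample included

for name, fs in [("tdcsfog", 128), ("defog", 100)]:
    duration = windowLength / fs     # seconds covered by one window
    resolution = fs / windowLength   # spacing between rfft bins, in Hz
    print(f"{name}: {duration:.2f} s window, {resolution:.4f} Hz bin spacing")
# tdcsfog: 10.01 s window, 0.0999 Hz bin spacing
# defog: 12.81 s window, 0.0781 Hz bin spacing
```

So the same setting spans about 10 s of tdcsfog data but almost 13 s of defog data. If matching durations turn out to matter, one option (an untested suggestion) is to scale the half-length by each dataset's sample rate.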



**Functions:**

```python
def getRMS(inputArray):
    """Root-mean-square of the magnitudes of inputArray (works for complex FFT amplitudes)."""
    return np.sqrt(np.mean(np.abs(inputArray) ** 2))


def lowPassFilter(kArr, freqArr, cutOffFreq):
    """Zero (in place) every FFT bin above cutOffFreq to remove high-frequency noise."""
    kArr[freqArr > cutOffFreq] = 0
    return kArr


def highPassFilter(kArr, freqArr, cutOffFreq):
    """Zero (in place) every FFT bin below cutOffFreq so that only high frequencies remain."""
    kArr[freqArr < cutOffFreq] = 0
    return kArr


def quickFFT_k(inputT, inputW, sampleRate, filterType, cutOff):
    """Real FFT of one acceleration channel W (x, y, z, ...), optionally filtered.

    Returns the bin frequencies (Hz) and the complex amplitudes (k-space).
    filterType is "low" or "high"; any other value means no filtering, and cutOff is ignored.
    """
    kspaceData = np.fft.rfft(inputW)
    freq = np.fft.rfftfreq(inputT.shape[-1], d=1.0 / sampleRate)
    if filterType == "low":
        kspaceData = lowPassFilter(kspaceData, freq, cutOff)
    elif filterType == "high":
        kspaceData = highPassFilter(kspaceData, freq, cutOff)
    return freq, kspaceData


def quickFFT(inputT, inputW, sampleRate, filterType, cutOff):
    """Like quickFFT_k, but transforms the (filtered) spectrum back to the time domain."""
    _, filteredData = quickFFT_k(inputT, inputW, sampleRate, filterType, cutOff)
    return np.fft.irfft(filteredData, len(inputW))


def experiment1(inputT1, inputW1, inputT2, inputW2, sampleRate, filterType, cutOff):
    """Experimental feature: loud amplitudes followed by quiescence.

    Returns the total spectral power of window 2 divided by that of window 1.
    The +0.01 avoids division by zero (to be replaced by a better guard later).
    """
    _, amps1 = quickFFT_k(inputT1, inputW1, sampleRate, filterType, cutOff)
    _, amps2 = quickFFT_k(inputT2, inputW2, sampleRate, filterType, cutOff)
    P1 = np.sum(np.abs(amps1) ** 2)
    P2 = np.sum(np.abs(amps2) ** 2)
    return P2 / (P1 + 0.01)


def experiment2(ap, vert):
    """Experimental feature rmsRadius_AP_Vert: Euclidean norm of the AP and Vert values."""
    return np.sqrt(ap * ap + vert * vert)
```
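A quick way to sanity-check the filtering logic behind `lowPassFilter` and `quickFFT`: build a signal from a 1 Hz component plus a 20 Hz "noise" component, zero every rfft bin above a 5 Hz cutoff with a boolean mask, and transform back. The signal, the amplitudes, and the 5 Hz cutoff are illustrative assumptions. The length is chosen as exactly 10 s so both tones fall on FFT bins, which makes the recovery exact up to floating-point error.

```python
import numpy as np

fs = 128                                     # samples per second (tdcsfog rate)
t = np.arange(1280) / fs                     # exactly 10 s: 1 Hz and 20 Hz fall on FFT bins
slow = np.sin(2 * np.pi * 1.0 * t)
fast = 0.3 * np.sin(2 * np.pi * 20.0 * t)    # hypothetical high-frequency noise

spectrum = np.fft.rfft(slow + fast)
freq = np.fft.rfftfreq(len(t), d=1.0 / fs)
spectrum[freq > 5.0] = 0                     # zero the same bins lowPassFilter(spectrum, freq, 5.0) would
filtered = np.fft.irfft(spectrum, len(t))    # pass the length so odd-length inputs also round-trip

print(np.max(np.abs(filtered - slow)))       # close to machine precision: the 20 Hz part is gone
```

With real accelerometry the components rarely sit exactly on bins, so a hard cutoff like this also removes some leaked energy near the cutoff; that is a property of rectangular spectral masking in general, not of this notebook in particular.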

**Read in training data:**


```python
# Paths to the competition data. Each directory holds one CSV file per recording;
# the files are read one at a time in the processing cells below.
trainTDCSFOG_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/tdcsfog/'
trainDEFOG_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/train/defog/'

testTDCSFOG_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/tdcsfog/'
testDEFOG_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/defog/'
```

**Process data:**

```python
colNames_TDCS = ['Time', 'aVert', 'aML', 'aAP', 'StartHesitation', 'Turn', 'Walking', 'id_t']
colNames_DEFOG = ['Time', 'aVert', 'aML', 'aAP', 'StartHesitation', 'Turn', 'Walking', 'Valid', 'Task', 'id_t']

# Engineered features. An earlier set also used 'maxAmpAP', 'rmsAmpsLowfAP', 'rmsAmpsLowfVert',
# and 'QLratio_ML' (experiment1 over the L_window/Q_window samples preceding each point).
features = ['maxAccelAmpML', 'maxAccelAmpAP', 'maxAccelAmpVert', 'rmsAmpsHighfML',
            'rmsRadius_AP_Vert', 'MLmAP', 'VertmAP', 'MLbyVert', 'APbyVert']
outputCols = ['StartHesitation', 'Turn', 'Walking']

# Both datasets share the same engineered layout: features, labels, then the row Id.
newColNames_TDCS = features + outputCols + ['id_t']
newColNames_DEFOG = features + outputCols + ['id_t']


def windowFeatures(t, aML, aAP, aVert, tm, sampleRate):
    """Feature vector for sample tm, computed over a centred window of 2*windowHalfLength + 1 samples."""
    window = slice(tm - windowHalfLength, tm + windowHalfLength + 1)
    tClip = t[window]
    aML_Clip, aAP_Clip, aVert_Clip = aML[window], aAP[window], aVert[window]
    freqML, ampsML = quickFFT_k(tClip, aML_Clip, sampleRate, "none", dummyVariable)
    freqAP, ampsAP = quickFFT_k(tClip, aAP_Clip, sampleRate, "none", dummyVariable)
    freqVert, ampsVert = quickFFT_k(tClip, aVert_Clip, sampleRate, "none", dummyVariable)
    highBandML = (freqML > highfBandMin) & (freqML < highfBandMax)
    return [
        np.max(np.abs(aML_Clip)) - np.min(np.abs(aML_Clip)),     # max|a| - min|a| in the window (ML)
        np.max(np.abs(aAP_Clip)) - np.min(np.abs(aAP_Clip)),     # same for AP
        np.max(np.abs(aVert_Clip)) - np.min(np.abs(aVert_Clip)), # same for Vert
        getRMS(ampsML[highBandML]),                               # RMS of the ML spectrum in the high band
        experiment2(getRMS(ampsAP[freqAP < lowfBandMax]),         # rmsRadius_AP_Vert: norm of the low-band
                    getRMS(ampsVert[freqVert < lowfBandMax])),    # RMS amplitudes of AP and Vert
        aML[tm] - aAP[tm],                                        # instantaneous channel differences
        aVert[tm] - aAP[tm],
        aML[tm] / aVert[tm],                                      # instantaneous channel ratios
        aAP[tm] / aVert[tm],
    ]


def processTrainFile(filePath, colNames, sampleRate, outList):
    """Append one row per sample of a training CSV: features, labels, then id_t.

    Samples closer than windowHalfLength to either end of the file have no full
    window, so their features are zero-padded (their true labels are kept).
    """
    fileId = os.path.basename(filePath)[:-4]  # strip the trailing .csv
    # Columns are renamed positionally; everything is read as text, as with csv.reader.
    df = pd.read_csv(filePath, header=0, names=colNames[:-1], dtype=str)
    t = df['Time'].astype('float32').to_numpy()
    aML = df['aML'].astype('float32').to_numpy()
    aAP = df['aAP'].astype('float32').to_numpy()
    aVert = df['aVert'].astype('float32').to_numpy()
    labels = df[outputCols].astype('int8').to_numpy()
    id_t = (fileId + "_" + df['Time']).to_numpy()
    numPoints = len(df)
    for tm in range(numPoints):
        if windowHalfLength <= tm < numPoints - windowHalfLength:
            row = windowFeatures(t, aML, aAP, aVert, tm, sampleRate)
        else:
            row = [0] * len(features)  # zero padding near the file edges
        outList.append(row + list(labels[tm]) + [id_t[tm]])


trainTDCS_List = []
trainDEFOG_List = []

print('Creating TDCS list...')
for dirname, _, filenames in os.walk(trainTDCSFOG_path):
    for filename in filenames:
        processTrainFile(os.path.join(dirname, filename), colNames_TDCS, sampleRateTDCSFOG, trainTDCS_List)

print('Creating DEFOG list...')
for dirname, _, filenames in os.walk(trainDEFOG_path):
    for filename in filenames:
        processTrainFile(os.path.join(dirname, filename), colNames_DEFOG, sampleRateDEFOG, trainDEFOG_List)

print('Creating train data dataframes...')
dfTrain_TDCS = pd.DataFrame(trainTDCS_List, columns=newColNames_TDCS)
dfTrain_DEFOG = pd.DataFrame(trainDEFOG_List, columns=newColNames_DEFOG)
```







```
Creating TDCS list...
Creating DEFOG list...
Creating train data dataframes...
```
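The per-sample loop above runs three FFTs for every interior sample, which dominates the run time. One possible speed-up, sketched here as an assumption rather than something the notebook does, is to build every centred window at once with NumPy's `sliding_window_view` (NumPy 1.20 or later) and take the FFT along the last axis. The sketch computes only an `rmsAmpsHighfML`-style band RMS, and the helper name `band_rms_all_windows` is hypothetical; it checks the result against a one-FFT-per-centre loop on random data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def band_rms_all_windows(signal, half_len, fs, f_lo, f_hi):
    """RMS FFT amplitude in (f_lo, f_hi) for every centred window that fits in the signal.

    Element i corresponds to the window centred on sample i + half_len.
    """
    windows = sliding_window_view(signal, 2 * half_len + 1)  # (n - 2*half_len, 2*half_len + 1) view, no copy
    spectra = np.fft.rfft(windows, axis=-1)                   # one spectrum per window
    freq = np.fft.rfftfreq(2 * half_len + 1, d=1.0 / fs)
    band = (freq > f_lo) & (freq < f_hi)
    return np.sqrt(np.mean(np.abs(spectra[:, band]) ** 2, axis=-1))

rng = np.random.default_rng(0)
signal = rng.normal(size=600)
half_len, fs = 64, 128

fast = band_rms_all_windows(signal, half_len, fs, 2.0, 6.0)

# Reference: one FFT per centre, as in the notebook's loop.
freq = np.fft.rfftfreq(2 * half_len + 1, d=1.0 / fs)
band = (freq > 2.0) & (freq < 6.0)
slow = np.array([
    np.sqrt(np.mean(np.abs(np.fft.rfft(signal[tm - half_len: tm + half_len + 1])[band]) ** 2))
    for tm in range(half_len, len(signal) - half_len)
])
print(np.allclose(fast, slow))  # True
```

The window view costs no memory, but the spectra array holds 641 complex values per window at the notebook's window length, which can reach gigabytes for long recordings; processing a recording in chunks of centres would keep that bounded.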


```python
print(len(trainTDCS_List))
print(len(trainDEFOG_List))
print(len(trainTDCS_List[13]))
print(len(trainDEFOG_List[13]))
print(dfTrain_TDCS.head())
print(dfTrain_DEFOG.head())
print(dfTrain_TDCS.isnull().sum())
print(dfTrain_DEFOG.isnull().sum())
print(dfTrain_TDCS.loc[13, outputCols])
print(dfTrain_DEFOG.loc[13, outputCols])
print(dfTrain_TDCS.iloc[513])
print(dfTrain_DEFOG.iloc[513])

#print(dfTrain_TDCS.Turn.unique())
#print(dfTrain_TDCS.Turn.max())
#print(dfTrain_TDCS.Turn.min())
#print(dfTrain_TDCS.Turn.mean())
```

**Train the models:**

```python
# TDCS
print("Training the TDCS model...")
X_TDCS = dfTrain_TDCS.loc[:, features]
y_TDCS = dfTrain_TDCS.loc[:, outputCols]
# One binary logistic regression per event type; n_jobs on MultiOutputClassifier
# fits the three outputs in parallel (LogisticRegression's own n_jobs has no effect on binary problems).
clf_TDCS = MultiOutputClassifier(LogisticRegression(max_iter=2000), n_jobs=-1).fit(X_TDCS, y_TDCS)

# Optional experiment: random-forest feature importances for the TDCS features (disabled by default).
RUN_RF_EXPERIMENT = False
if RUN_RF_EXPERIMENT:
    from sklearn.ensemble import RandomForestClassifier
    print("Training the RF TDCS model...")
    addl_clf_test = MultiOutputClassifier(RandomForestClassifier(n_estimators=50, random_state=314),
                                          n_jobs=-1).fit(X_TDCS, y_TDCS)
    feat_impts = [clf.feature_importances_ for clf in addl_clf_test.estimators_]
    print("Average Feature Importances: ", np.mean(feat_impts, axis=0))

# DEFOG
print("Training the DEFOG model...")
X_DEFOG = dfTrain_DEFOG.loc[:, features]
y_DEFOG = dfTrain_DEFOG.loc[:, outputCols]
clf_DEFOG = MultiOutputClassifier(LogisticRegression(max_iter=2000), n_jobs=-1).fit(X_DEFOG, y_DEFOG)
```



```
Training the TDCS model...
Training the DEFOG model...
```
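`average_precision_score` is imported at the top but never used. Below is a minimal sketch of how the per-event classifiers could be scored on held-out recordings. It assumes a split by recording (`GroupShuffleSplit`), so that heavily overlapping windows from the same file never land on both sides of the split. The features, labels, and recording ids are synthetic placeholders; with the notebook's data, the recording id could be taken from the prefix of `id_t`, e.g. `dfTrain_TDCS['id_t'].str.rsplit('_', n=1).str[0]`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import GroupShuffleSplit
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
n = 3000
# Synthetic stand-ins for the engineered features and the three event labels.
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=["f1", "f2", "f3", "f4"])
logits = X.to_numpy() @ rng.normal(size=(4, 3))
y = pd.DataFrame((logits + rng.normal(size=(n, 3)) > 1.0).astype(int),
                 columns=["StartHesitation", "Turn", "Walking"])
groups = np.repeat(np.arange(30), n // 30)  # hypothetical recording id for each row

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(X, y, groups))
clf = MultiOutputClassifier(LogisticRegression(max_iter=2000)).fit(X.iloc[train_idx], y.iloc[train_idx])

proba = clf.predict_proba(X.iloc[val_idx])  # one (n_val, 2) array per event type
scores = {col: average_precision_score(y.iloc[val_idx][col], p[:, 1])
          for col, p in zip(y.columns, proba)}
print(scores, np.mean(list(scores.values())))
```

A random row-level split would instead let the model see almost identical windows during training and validation, which tends to give optimistic scores for sliding-window features like these.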


**Now test the models:**

```python
# Read in test data
print("Reading testing files...")
testTDCSFOG_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/tdcsfog/'
testDEFOG_path = '/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/test/defog/'

# Same engineered features as in training, followed by the row Id (the test set has no labels).
testCols = ['maxAccelAmpML', 'maxAccelAmpAP', 'maxAccelAmpVert', 'rmsAmpsHighfML', 'rmsRadius_AP_Vert',
            'MLmAP', 'VertmAP', 'MLbyVert', 'APbyVert', 'id_t']


def windowFeatures(t, aML, aAP, aVert, tm, sampleRate):
    """Feature vector for sample tm over a centred window of 2*windowHalfLength + 1 samples.

    Computes the same features, in the same order, as the training cell.
    """
    window = slice(tm - windowHalfLength, tm + windowHalfLength + 1)
    tClip = t[window]
    aML_Clip, aAP_Clip, aVert_Clip = aML[window], aAP[window], aVert[window]
    freqML, ampsML = quickFFT_k(tClip, aML_Clip, sampleRate, "none", dummyVariable)
    freqAP, ampsAP = quickFFT_k(tClip, aAP_Clip, sampleRate, "none", dummyVariable)
    freqVert, ampsVert = quickFFT_k(tClip, aVert_Clip, sampleRate, "none", dummyVariable)
    highBandML = (freqML > highfBandMin) & (freqML < highfBandMax)
    return [
        np.max(np.abs(aML_Clip)) - np.min(np.abs(aML_Clip)),     # max|a| - min|a| in the window (ML)
        np.max(np.abs(aAP_Clip)) - np.min(np.abs(aAP_Clip)),     # same for AP
        np.max(np.abs(aVert_Clip)) - np.min(np.abs(aVert_Clip)), # same for Vert
        getRMS(ampsML[highBandML]),                               # RMS of the ML spectrum in the high band
        experiment2(getRMS(ampsAP[freqAP < lowfBandMax]),         # rmsRadius_AP_Vert
                    getRMS(ampsVert[freqVert < lowfBandMax])),
        aML[tm] - aAP[tm],
        aVert[tm] - aAP[tm],
        aML[tm] / aVert[tm],
        aAP[tm] / aVert[tm],
    ]


def processTestFile(filePath, sampleRate, outList):
    """Append one row per sample of a test CSV: features then id_t (edge samples zero-padded)."""
    fileId = os.path.basename(filePath)[:-4]  # strip the trailing .csv
    # Columns are renamed positionally; everything is read as text, as with csv.reader.
    df = pd.read_csv(filePath, header=0, names=['Time', 'aVert', 'aML', 'aAP'], dtype=str)
    t = df['Time'].astype('float32').to_numpy()
    aML = df['aML'].astype('float32').to_numpy()
    aAP = df['aAP'].astype('float32').to_numpy()
    aVert = df['aVert'].astype('float32').to_numpy()
    id_t = (fileId + "_" + df['Time']).to_numpy()
    numPoints = len(df)
    for tm in range(numPoints):
        if windowHalfLength <= tm < numPoints - windowHalfLength:
            row = windowFeatures(t, aML, aAP, aVert, tm, sampleRate)
        else:
            row = [0] * (len(testCols) - 1)  # zero padding near the file edges
        outList.append(row + [id_t[tm]])


testTDCS_List = []
testDEFOG_List = []

print('Creating TDCS list...')
for dirname, _, filenames in os.walk(testTDCSFOG_path):
    for filename in filenames:
        processTestFile(os.path.join(dirname, filename), sampleRateTDCSFOG, testTDCS_List)

print('Creating DEFOG list...')
for dirname, _, filenames in os.walk(testDEFOG_path):
    for filename in filenames:
        processTestFile(os.path.join(dirname, filename), sampleRateDEFOG, testDEFOG_List)

print('Creating test data dataframes...')
dfTest_TDCS = pd.DataFrame(testTDCS_List, columns=testCols)
dfTest_DEFOG = pd.DataFrame(testDEFOG_List, columns=testCols)
```


```
Reading testing files...
Creating TDCS list...
Creating DEFOG list...
Creating test data dataframes...
```


```python
def keepMostLikelyEvent(probaList):
    """Keep only the most likely event per row; the other two probabilities are set to 0.

    probaList is the output of MultiOutputClassifier.predict_proba: one (n_samples, 2)
    array per output, where column 0 is P(no event) and column 1 is P(event).
    """
    eventProb = np.column_stack([p[:, 1] for p in probaList])  # shape (n_samples, 3)
    rows = np.arange(len(eventProb))
    best = np.argmax(eventProb, axis=1)  # ties go to the first column, as with np.argmax on each row
    kept = np.zeros_like(eventProb)
    kept[rows, best] = eventProb[rows, best]
    return pd.DataFrame(kept, columns=['StartHesitation', 'Turn', 'Walking'])


# TDCS
X_val_TDCS = dfTest_TDCS.loc[:, features]
print("Making TDCS predictions...")
clf_predictions_TDCS = clf_TDCS.predict_proba(X_val_TDCS)
df_clf_predictions_TDCS = keepMostLikelyEvent(clf_predictions_TDCS)

# DEFOG
X_val_DEFOG = dfTest_DEFOG.loc[:, features]
print("Making DEFOG predictions...")
clf_predictions_DEFOG = clf_DEFOG.predict_proba(X_val_DEFOG)
df_clf_predictions_DEFOG = keepMostLikelyEvent(clf_predictions_DEFOG)
```


```
Making TDCS predictions...
Making DEFOG predictions...
```
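The prediction step keeps, for each row, only the largest of the three event probabilities and sets the other two to zero. On a toy array of made-up probabilities (columns StartHesitation, Turn, Walking), the same row-wise rule looks like this:

```python
import numpy as np

# Made-up event probabilities for three rows: columns are StartHesitation, Turn, Walking.
eventProb = np.array([[0.10, 0.70, 0.20],
                      [0.40, 0.30, 0.35],
                      [0.05, 0.05, 0.60]])
rows = np.arange(len(eventProb))
best = eventProb.argmax(axis=1)            # most likely event per row: Turn, StartHesitation, Walking
kept = np.zeros_like(eventProb)
kept[rows, best] = eventProb[rows, best]   # keeps 0.7, 0.4 and 0.6; everything else becomes 0
print(kept)
```

One consequence worth keeping in mind: a zeroed probability is the lowest possible score, so ranking-based metrics such as average precision treat those rows as confident negatives for the suppressed events, even when the model gave them a sizeable probability (0.35 for Walking in the second row).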


```python
# Skip post-processing?
```

```python
# Store a DataFrame of the submission results.
finalCols = ['Id', 'StartHesitation', 'Turn', 'Walking']

# Row Ids ("<file id>_<Time>") in the same order as the prediction rows.
df_clf_predictions_TDCS['Id'] = dfTest_TDCS['id_t'].to_numpy()
df_clf_predictions_DEFOG['Id'] = dfTest_DEFOG['id_t'].to_numpy()

output_df = pd.concat([df_clf_predictions_TDCS[finalCols], df_clf_predictions_DEFOG[finalCols]],
                      ignore_index=True)

# Optional comparison with the sample submission (disabled by default).
CHECK_AGAINST_SAMPLE = False
if CHECK_AGAINST_SAMPLE:
    samp = pd.read_csv('/kaggle/input/tlvmc-parkinsons-freezing-gait-prediction/sample_submission.csv')
    print(len(output_df), len(samp))
    print(output_df.dtypes, samp.dtypes, sep='\n')
    print(output_df.head(), samp.head(), sep='\n')
    print(output_df.tail(), samp.tail(), sep='\n')

print('Writing submission file...')
output_df.to_csv("submission.csv", index=False)
```


```
Writing submission file...
```
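The optional comparison with `sample_submission.csv` relies on reading printed output by eye. A minimal sketch of the same check as explicit tests is below; `checkSubmission` is a hypothetical helper, and the two tiny frames stand in for `output_df` and the sample submission (assumed to share the `Id, StartHesitation, Turn, Walking` layout used above).

```python
import pandas as pd

def checkSubmission(submission, sample):
    """Raise ValueError describing every mismatch between a submission and the sample submission."""
    problems = []
    if list(submission.columns) != list(sample.columns):
        problems.append("column names or order differ")
    if not submission['Id'].is_unique:
        problems.append("duplicate Ids")
    if set(submission['Id']) != set(sample['Id']):
        problems.append("Id sets differ")
    probs = submission.drop(columns='Id')
    if probs.isna().any().any():
        problems.append("missing probabilities")
    elif not ((probs >= 0) & (probs <= 1)).all().all():
        problems.append("probabilities outside [0, 1]")
    if problems:
        raise ValueError("; ".join(problems))

cols = ['Id', 'StartHesitation', 'Turn', 'Walking']
sample = pd.DataFrame([['a_0', 0, 0, 0], ['a_1', 0, 0, 0]], columns=cols)
submission = pd.DataFrame([['a_1', 0.0, 0.2, 0.0], ['a_0', 0.9, 0.0, 0.0]], columns=cols)
checkSubmission(submission, sample)  # passes: same Ids, row order is not required
```

Collecting every problem before raising makes a single run report all mismatches at once, which is handy when a long feature-engineering pass precedes the check.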
