# Data Pre-Processing 

This code will read in data from the experiments carried out and extract the important information which is fed to the LSTM cell. 

The very first step would be to extract the start and end indices of the data for each subject using the following function:


In [4]:
def start_end_extractor (data, prior_seconds): 
    # data is the dataset; prior_seconds is the number of seconds to be considered in the data
    # before the warning is given 

    warningStartIndices= data[data['warning'].notnull()].index # Extracts the warning indices
    startindex = np.zeros(shape=(len(warningStartIndices),1))
    endindex = np.zeros(shape=(len(warningStartIndices),1))
    ei = 0
    si = 0
    
    for i in warningStartIndices:
        k = 0
        
        # The following WHILE loop identifies the startindex for relevant data
        while ((data['time(s)'][i] - data['time(s)'][i + k]) < (prior_seconds + 0.001) ):
            k -= 1
        j = 0
        
        # The following WHILE loop identifies the endindex for relevant data
        while (abs(data['steering angle(deg)'][i] - data['steering angle(deg)'][i+j]) < 2)\
        & (data['padel1'][i+j] <= 32511):
            j += 1
        
        startindex[si] = i + k 
        si += 1
        endindex[ei] = i + j
        ei += 1
        
        startindex = startindex.astype(int)
        endindex = endindex.astype(int)
    dataSplitVector = endindex - startindex
    
    #print(startindex, endindex, dataSplitVector,warningStartIndices)   
    return startindex, endindex, dataSplitVector,warningStartIndices
    
    

Once we have the relevant start and end points for each event we can extract other information such as the collision/no-collision output for the events as done in the function below:


In [5]:
def output_extractor (data, startindex, endindex):
    # This function takes as its input the unlceaned raw data and the event start and end 
    # indices. It outputs an array which tells whether colision took place for every event
    # for a given subject
    
    output = np.zeros(shape = (endindex.size))
    
    for i in range(endindex.size):
        if i != (endindex.size -1): # The last index will not have a start index for the next event 
            if data['crash'].values[endindex[i]][0] != data['crash'].values[startindex[i+1]]:
            # Comparing the values in the crash column for end index of event i to start index of 
            # event i+1. If these values are different then it suggests there was a collision.
            # This means the output will be 1 = collision
                output[i] = 1
            else:
                output[i] = 0
        else:
            if data['crash'].values[endindex[i]][0] != data['crash'].values[-1]:
                output[i] = 1
            else:
                output[i] = 0
                
    output = pd.DataFrame(output)
            
    return output




The following set of functions will now start modeling the lead vehicle movement in terms of lead vehicle velocity and position:


In [6]:
def event_lenghts (event_sequence, incident_distance, max_distance):
    # This function will take as its input the event_sequence which is taken from the 'event'
    # column of the 'SUMMARY' sheet. The incident_distance and max_distance arguments are the 
    # parameters defined in the experiment setup. The former is the distance at which the 
    # incident takes place and the latter is the max distance over which an event occurs. This
    # function returns the distance from the very start at which every scenario begins and ends.
    

    incident_startpt = np.zeros(shape= (16,1))
    incident_endpt = np.zeros(shape= (16,1))
    
    # dist_counter keeps track of the distance covered
    dist_counter = 0   
    
    for i in range (0,16): 
        # incident_startpt is the distance where the lead vehicle starts decelerating
        incident_startpt[i] = dist_counter + incident_distance[int(event_sequence[i])-1]
        # incident_endpt is the distance where the scenario ends
        incident_endpt[i] = dist_counter + max_distance[int(event_sequence[i])-1]
        dist_counter += max_distance[int(event_sequence[i])-1]
    
    return incident_startpt, incident_endpt

In [7]:
def lead_vehicle_movement(data,event_sequence,scen_data,incident_startpt,incident_endpt,warningStartIndices):
    # This function takes as its input the raw data which has been modified to include the 
    # columns for lead vehicle velocity and distance. It also takes the sequence of events 
    # from the summary sheet, the summary data along with incident_startpt and incident_endpt 
    # as its inputs. 
    
    # dp_counter will keep incrimenting to go to the next data point - dp stands 
    # for data point here
    dp_counter = 0

    # We will check every scenario and match it to the events 1 through 8
    for i in range (0,16):
        # The following condition allows events only from 1,2,3,4,7 and 8 as they have similar
        # lead vehicle movement
        if (int((event_sequence[i]) != 5) & (int(event_sequence[i]) != 6)):
            
            # v_finder keeps track of when the deceleration of lead vehicle starts
            v_finder = 0
            
            # This loop will acertain that all the data points of this event are  
            while (data['dis(feet)'][dp_counter] <= incident_endpt[i]):
                #print(dp_counter)
                # The lead vehicle mimics the movement of the subject vehicle and is always 
                # 100 ft ahead of it before the latter reaches the designated distance for
                # the deceleration to begin. This is already decided in the experiment.
                if data['dis(feet)'][dp_counter] < (incident_startpt[i]-100):
                    data['lvv'][dp_counter] = data['long v(m/s)'][dp_counter]
                    data['lvd'][dp_counter] = data['dis(feet)'][dp_counter] + 100
                    
                # Once the event starts the lead vehicle decelerates to a stop in 2 seconds    
                elif data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                    
                    if v_finder == 0: 
                    # This condition ensures that the following calculations take place only
                    # once per event
                        v_finder += 1
                        init_v = data['long v(m/s)'][dp_counter]       # decc start velocity
                        init_t = data['time(s)'][dp_counter]           # decc start time
                        init_d = data['dis(feet)'][dp_counter] + 100   # decc start position
                        t = 2                                          # time to deccelerate 
                        a = -init_v/t                                  # decceleration

                    # Time elapsed between decceleration start and the current data point
                    T = data['time(s)'][dp_counter]-init_t               
                         
                    # Velocity will keep reducing until it is 0 after which it will stay 0 
                    data['lvv'][dp_counter] = max(0, init_v + a*T)
                    
                    # If the velocity is zero then the distance will stop changing
                    if data['lvv'][dp_counter] == 0:
                        data['lvd'][dp_counter] = data['lvd'][dp_counter-1]
                    else:
                        data['lvd'][dp_counter] =  init_d + init_v*T + 0.5*a*T*T

                dp_counter += 1
                if dp_counter == len(data):
                    break

        # The following condition will allow only event 6 with warning lead time of 2.5s
        elif ((int(event_sequence[i]) == 6) & (scen_data['lead time'][i] == 2.5)) :
            
            # v_finder will keep track of the decceleration start
            v_finder = 0
            
            while data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                
                # The lead vehicle will be 50 ft ahead of the subject before the incident 
                # starts and its velocity will be the same as that of the subject vehicle
                if data['dis(feet)'][dp_counter] < (incident_startpt[i] - 50):
                    data['lvv'][dp_counter] = data['long v(m/s)'][dp_counter]
                    data['lvd'][dp_counter] = data['dis(feet)'][dp_counter] + 50
                
                # When the lead vehicle starts to deccelerate it will do so in 1s.
                elif data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                    if v_finder == 0:
                        v_finder += 1
                        init_v = data['long v(m/s)'][dp_counter]     # decc start velocity
                        init_t = data['time(s)'][dp_counter]         # decc start time
                        init_d = data['dis(feet)'][dp_counter] + 50  # decc start position
                        t = 1                                         # time for decceleration
                        a = -init_v/t                                 # decceleration

                    # Time elapsed between decceleration start and the current data point
                    T = data['time(s)'][dp_counter]-init_t               
                         
                    # Velocity will keep reducing until it is 0 after which it will stay 0 
                    data['lvv'][dp_counter] = max(0, init_v + a*T)
                    
                    # If the velocity is zero then the distance will stop changing
                    if data['lvv'][dp_counter] == 0:
                        data['lvd'][dp_counter] = data['lvd'][dp_counter-1]
                    else:
                        data['lvd'][dp_counter] =  init_d + init_v*T + 0.5*a*T*T

                dp_counter += 1
                if dp_counter == len(data):
                    break
                
        # The following condition will allow only event 6 with warning lead time of 4.5s
        elif ((int(event_sequence[i]) == 6) & (scen_data['lead time'][i] == 4.5)) :
            
            v_finder = 0
            while data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                
                # The lead vehicle will be 90 ft ahead of the subject before the incident 
                # starts and its velocity will be the same as that of the subject vehicle
                if data['dis(feet)'][dp_counter] < (incident_startpt[i] - 90):
                    data['lvv'][dp_counter] = data['long v(m/s)'][dp_counter]
                    data['lvd'][dp_counter] = data['dis(feet)'][dp_counter] + 90
                
                # When the lead vehicle starts to deccelerate it will do so in 1s.
                elif data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                    if v_finder == 0:
                        v_finder += 1
                        init_v = data['long v(m/s)'][dp_counter]     # decc start velocity
                        init_t = data['time(s)'][dp_counter]         # decc start time
                        init_d = data['dis(feet)'][dp_counter] + 90  # decc start position 
                        t = 1                                        # time for decceleration 
                        a = -init_v/t                                # decceleration

                    # Time elapsed between decceleration start and the current data point
                    T = data['time(s)'][dp_counter]-init_t               
                         
                    # Velocity will keep reducing until it is 0 after which it will stay 0 
                    data['lvv'][dp_counter] = max(0, init_v + a*T)
                    
                    # If the velocity is zero then the distance will stop changing
                    if data['lvv'][dp_counter] == 0:
                        data['lvd'][dp_counter] = data['lvd'][dp_counter-1]
                    else:
                        data['lvd'][dp_counter] =  init_d + init_v*T + 0.5*a*T*T

                dp_counter += 1
                if dp_counter == len(data):
                    break

        # The following condition will allow event 5 with warning lead time of 2.5s       
        elif ((int(event_sequence[i]) == 5) & (scen_data['lead time'][i] == 2.5)) :
            
            WT = data['time(s)'][warningStartIndices[i]]            # Waring time instant
            u = data['long v(m/s)'][warningStartIndices[i]] + 66    # Relative velocity of lead vehicle
            import math
            v = -((134.36)**2 + 2*(-66)*179)
            v = math.sqrt(v)                                        # Relative velocity when distance is covered 
            t = (u-v)/66                                            # Time to deccelerate to that velocity
            T = 2.5 - t                                             # Time before WT that decc starts
            decc_start_time = WT + T                                # Time when decc starts
            
            while data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                #print(dp_counter, decc_start_time, WT, u, v , t , T)
                # Before decc_start_time the speed of the oncoming vehicle will be constant at
                # 66 ft/s and its distance will be calculated accordingly
                if data['time(s)'][dp_counter] <= decc_start_time:
                    data['lvv'][dp_counter] = 66
                    data['lvd'][dp_counter] = (decc_start_time - data['time(s)'][dp_counter])\
                    *66 + 179 + data['dis(feet)'][dp_counter]
                    
                elif data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                    data['lvv'][dp_counter] = max(0, (data['lvv'][dp_counter-1] + (data['time(s)'][dp_counter]\
                                                                             - data['time(s)'][dp_counter- 1])*(-66)))
                    if data['lvv'][dp_counter] == 0:
                        data['lvd'][dp_counter] = data['lvd'][dp_counter-1]
                    else:
                        time = (data['time(s)'][dp_counter] - data['time(s)'][dp_counter-1])
                        init_v = data['lvv'][dp_counter - 1]
                        data['lvd'][dp_counter] = data['lvd'][dp_counter-1] - init_v*time - (0.5*(-66)*(time**2))
                
                dp_counter += 1
                if dp_counter == len(data):
                    break
                
                
        # The following condition will allow event 5 with warning lead time of 4.5s       
        elif ((int(event_sequence[i]) == 5) & (scen_data['lead time'][i] == 4.5)) :
            
            WT = data['time(s)'][warningStartIndices[i]]            # Waring time instant
            u = data['long v(m/s)'][warningStartIndices[i]] + 66    # Relative velocity of lead vehicle
            import math
            v = -((134.36)**2 + 2*(-66)*179)
            v = math.sqrt(v)                                        # Relative velocity when distance is covered 
            t = (u-v)/66                                            # Time to deccelerate to that velocity
            T = 4.5 - t                                             # Time before WT that decc starts
            decc_start_time = WT + T                                # Time when decc starts
            
            while data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                
                # Before decc_start_time the speed of the oncoming vehicle will be constant at
                # 66 ft/s and its distance will be calculated accordingly
                if data['time(s)'][dp_counter] <= decc_start_time:
                    data['lvv'][dp_counter] = 66
                    data['lvd'][dp_counter] = (decc_start_time - data['time(s)'][dp_counter])\
                    *66 + 311 + data['dis(feet)'][dp_counter]
                    
                elif data['dis(feet)'][dp_counter] <= incident_endpt[i]:
                    data['lvv'][dp_counter] = max(0, (data['lvv'][dp_counter-1] + (data['time(s)'][dp_counter]\
                                                                             - data['time(s)'][dp_counter- 1])*(-66)))
                    if data['lvv'][dp_counter] == 0:
                        data['lvd'][dp_counter] = data['lvd'][dp_counter-1]
                    else:
                        time = (data['time(s)'][dp_counter] - data['time(s)'][dp_counter-1])
                        init_v = data['lvv'][dp_counter - 1]
                        data['lvd'][dp_counter] = data['lvd'][dp_counter-1] - init_v*time - (0.5*(-66)*(time**2))
                
                dp_counter += 1
                if dp_counter == len(data):
                    break
    
    return data

Now that we have a data frame which also has the lead vehicle movement in it we will work on extracting data relevent to us and store it in a new data frame.

The following function modifies the dataframe to include the columns lvv and lvd and initialise them to 0

In [8]:
def df_modifier_1 (data):
    data['lvv'] = 0
    data['lvd'] = 0
    
    return data

The follwoing function will modify the dataframe to rename the columns and and drop the unnecessary ones

In [9]:
def df_modifier_2 (data, startindex, endindex, dataSplitVector):
    
    # Write the code here
    data2 = np.zeros(shape=(sum(dataSplitVector).astype(int)[0],data.shape[1]))
    j = 0
    for i in range (endindex.size):
        for k in range (startindex[i][0], endindex[i][0]):
            data2[j] = data.values[k]
            j += 1
        print("The no. of time stamps for event {} are {}".format(i+1,(endindex[i][0] - startindex[i][0])))
    featureList = data.columns[0:16]
    #print(featureList)
    data3 = pd.DataFrame(data = data2)
    data3.columns = featureList
    #data3
    
    return data3

The main function is as follows:


In [10]:
def warning_scenario_extractor(scen_data, i):
    feature_list = ["message type (0:command type; 1: notification type)", "lead time"] 
    data = scen_data[feature_list]
    data = data.dropna()
    data["scenario"] = 0
    data["warning reliability"] = 0
    if i < 16 :
        for m in range(16):
            data["warning reliability"][(i*16)+m] = 1 # All events with 90% warning relability are coded as 1
    for l in range(len(scen_data["event"])):
        if (scen_data["event"][l] == 5) :
            data["scenario"] = 1 # Scenarios with straight paths are coded as 1
        elif (scen_data["event"][l] == 6):
            data["scenario"] = 2
            
    return data

In [11]:
def function1(data, scen_data, i):

    # Step 2: Dropping unnecessary columns
    data = data.drop(data.columns[14:],axis = 1)

    # Step 3: Extracting start and end indices of the data which are relevnt to our model training
    startindex, endindex, dataSplitVector, warningStartIndices = start_end_extractor(data, prior_seconds = 1)

    # Step 4: Extracting the output of the events 
    output = output_extractor(data,startindex,endindex)

    # Step 5: Modifying the initial dataset to add lead vehicle movement columns
    data = df_modifier_1(data)

    # Step 6: Extract the event sequence data from the Summary sheet  
    event_sequence = scen_data['event'].values

    # Step 7: Extract the incidence start point and end point to model the movement
    incident_distance = [2880,3880,4880,3880,0,1100,2400,880]
    max_distance = [4000,5000,6000,5000,3000,2000,3000,2000]
    incident_startpt, incident_endpt = event_lenghts(event_sequence, incident_distance, max_distance)
    incident_endpt[-1] = data['dis(feet)'][len(data['dis(feet)'])-1]

    # Step 8: Model the lead vehicle movement and update the 'lvv' and 'lvd' columns
    data2 = lead_vehicle_movement(data,event_sequence,scen_data,incident_startpt,incident_endpt, warningStartIndices)

    # Step 9: Extract the relevant portion of the data 
    data_rel = df_modifier_2(data2, startindex, endindex, dataSplitVector)
    
    # Step 10: Extract the "message type (0:command type; 1: notification type)", "lead time"
    #          "warning reliability" and "scenario data"
    warning_scenario = warning_scenario_extractor(scen_data, i)
    

    return data_rel, dataSplitVector, output, warning_scenario

In [12]:
import pandas as pd
import numpy as np

filepaths = pd.read_excel('/Users/RSM/Desktop/cvs_data/filepaths.xlsx', header = None, sheet_name = 'Sheet1')

input_data = pd.DataFrame()
output_data = pd.DataFrame()
splitVector = pd.DataFrame()
warning_scenario_data  = pd.DataFrame() 

for i in range (len(filepaths)):
    filepath = filepaths[0][i]
    data = pd.read_excel(filepath, sheet_name = 'Data')
    scen_data = pd.read_excel(filepath, sheet_name = 'SUMMARY')
    cleaned_data, dataSplitVector, output, warning_scenario = function1(data, scen_data, i)
    dataSplitVector = pd.DataFrame(dataSplitVector)
    input_data = input_data.append(cleaned_data, ignore_index = True)
    output_data = output_data.append(output, ignore_index = True)
    splitVector = splitVector.append(dataSplitVector, ignore_index = True)
    warning_scenario_data = warning_scenario_data.append(warning_scenario, ignore_index = True)
    print("Data for Subject {} has been cleaned.".format(i+1))


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a

The no. of time stamps for event 1 are 19
The no. of time stamps for event 2 are 18
The no. of time stamps for event 3 are 14
The no. of time stamps for event 4 are 19
The no. of time stamps for event 5 are 31
The no. of time stamps for event 6 are 31
The no. of time stamps for event 7 are 16
The no. of time stamps for event 8 are 50
The no. of time stamps for event 9 are 19
The no. of time stamps for event 10 are 17
The no. of time stamps for event 11 are 16
The no. of time stamps for event 12 are 17
The no. of time stamps for event 13 are 15
The no. of time stamps for event 14 are 16
The no. of time stamps for event 15 are 17
The no. of time stamps for event 16 are 15


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Data for Subject 1 has been cleaned.


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a

The no. of time stamps for event 1 are 22
The no. of time stamps for event 2 are 18
The no. of time stamps for event 3 are 16
The no. of time stamps for event 4 are 19
The no. of time stamps for event 5 are 18
The no. of time stamps for event 6 are 21
The no. of time stamps for event 7 are 17
The no. of time stamps for event 8 are 16
The no. of time stamps for event 9 are 16
The no. of time stamps for event 10 are 16
The no. of time stamps for event 11 are 11
The no. of time stamps for event 12 are 13
The no. of time stamps for event 13 are 16
The no. of time stamps for event 14 are 15
The no. of time stamps for event 15 are 16
The no. of time stamps for event 16 are 16
Data for Subject 2 has been cleaned.
The no. of time stamps for event 1 are 17
The no. of time stamps for event 2 are 17
The no. of time stamps for event 3 are 14
The no. of time stamps for event 4 are 17
The no. of time stamps for event 5 are 15
The no. of time stamps for event 6 are 16
The no. of time stamps for event

The no. of time stamps for event 1 are 22
The no. of time stamps for event 2 are 14
The no. of time stamps for event 3 are 16
The no. of time stamps for event 4 are 20
The no. of time stamps for event 5 are 15
The no. of time stamps for event 6 are 15
The no. of time stamps for event 7 are 18
The no. of time stamps for event 8 are 16
The no. of time stamps for event 9 are 17
The no. of time stamps for event 10 are 15
The no. of time stamps for event 11 are 15
The no. of time stamps for event 12 are 15
The no. of time stamps for event 13 are 16
The no. of time stamps for event 14 are 25
The no. of time stamps for event 15 are 19
The no. of time stamps for event 16 are 23
Data for Subject 14 has been cleaned.
The no. of time stamps for event 1 are 21
The no. of time stamps for event 2 are 18
The no. of time stamps for event 3 are 18
The no. of time stamps for event 4 are 20
The no. of time stamps for event 5 are 65
The no. of time stamps for event 6 are 18
The no. of time stamps for even

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)


The no. of time stamps for event 1 are 23
The no. of time stamps for event 2 are 19
The no. of time stamps for event 3 are 17
The no. of time stamps for event 4 are 10
The no. of time stamps for event 5 are 16
The no. of time stamps for event 6 are 10
The no. of time stamps for event 7 are 31
The no. of time stamps for event 8 are 32
The no. of time stamps for event 9 are 16
The no. of time stamps for event 10 are 15
The no. of time stamps for event 11 are 14
The no. of time stamps for event 12 are 9
The no. of time stamps for event 13 are 14
The no. of time stamps for event 14 are 13
The no. of time stamps for event 15 are 12
The no. of time stamps for event 16 are 14
Data for Subject 17 has been cleaned.
The no. of time stamps for event 1 are 34
The no. of time stamps for event 2 are 22
The no. of time stamps for event 3 are 25
The no. of time stamps for event 4 are 22
The no. of time stamps for event 5 are 19
The no. of time stamps for event 6 are 27
The no. of time stamps for event

The no. of time stamps for event 9 are 16
The no. of time stamps for event 10 are 18
The no. of time stamps for event 11 are 17
The no. of time stamps for event 12 are 14
The no. of time stamps for event 13 are 18
The no. of time stamps for event 14 are 19
The no. of time stamps for event 15 are 25
The no. of time stamps for event 16 are 17
Data for Subject 28 has been cleaned.
The no. of time stamps for event 1 are 52
The no. of time stamps for event 2 are 40
The no. of time stamps for event 3 are 22
The no. of time stamps for event 4 are 21
The no. of time stamps for event 5 are 17
The no. of time stamps for event 6 are 26
The no. of time stamps for event 7 are 22
The no. of time stamps for event 8 are 25
The no. of time stamps for event 9 are 18
The no. of time stamps for event 10 are 25
The no. of time stamps for event 11 are 18
The no. of time stamps for event 12 are 10
The no. of time stamps for event 13 are 21
The no. of time stamps for event 14 are 16
The no. of time stamps for

In [13]:
input_data = input_data.drop(['crash', 'trans_gear', 'trans gear', 'warning'], axis = 1)

In [12]:
#output_copy = output_data[:]
#output_copy = output_copy.drop(splitVector[splitVector[0] == 0].index)

In [13]:
output_data.size

512

In [14]:
#output_copy.size

In [14]:
splitVector[splitVector[0]==0].index

Int64Index([480], dtype='int64')

In [None]:
416/16

In [14]:
input_data.to_excel('cleaned_data_version_01.xlsx')
output_data.to_excel('output_data_version_01.xlsx')
splitVector.to_excel('data_split_version_01.xlsx')
warning_scenario_data.to_excel('scenario_data_version_01.xlsx')

# Modeling the LSTM

This part of the code will do some further preprocessing of the data to make it ready for the LSTM. Follwoing which, an LSTM model will be built, trained and tested. 

In [3]:
# done
import torch 
import torch.nn as nn
from torch.autograd import Variable

# Defining the LSTM model as a class. The model we will train later will be the instance of this class.

class collisionClassifier(nn.Module):
    def __init__(self, no_of_features, features_to_hidden, features_in_hidden, max_sequence_length, no_of_layers = 1):
        super().__init__()
        self.features_in_hidden = features_in_hidden
        self.features_to_hidden = features_to_hidden
        self.no_of_layers = no_of_layers
        self.max_sequence_length = max_sequence_length
        self.linear_layer_1 = nn.Linear(no_of_features, features_to_hidden)
        self.lstm_cell = nn.LSTM(features_to_hidden, features_in_hidden, no_of_layers)
        self.dropout_layer = nn.Dropout(0.2)
        self.relu_layer = nn.ReLU()
        features_in_hidden += 4
        self.linear_layer_2 = nn.Linear(features_in_hidden,2)
        self.smc = nn.Softmax()
        
    def __init__hidden(self, max_sequence_length):
        hidden = torch.rand(1, max_sequence_length, self.features_in_hidden)
        return Variable(hidden, requires_grad = True)
        
    def __init__cellstate(self, max_sequence_length):
        cellstate = torch.rand(1,max_sequence_length , self.features_in_hidden)
        return Variable(cellstate, requires_grad = True)
    
    def forward_pass(self, input_data,sequence_length, scenario_vect):
        output = self.linear_layer_1(input_data)
        output, _ = self.lstm_cell(output)
        output = self.relu_layer(output[0,int(sequence_length-1),:])
        output = torch.cat((output,scenario_vect),0) # keep an eye out for the dimension which is now 2 assuming its a 2D tensor
        output = self.linear_layer_2(output)
        output = self.smc(output)     #hidden[:,(sequence_length-1),:])
        return output

In [4]:
# done
def data_multiplier_and_shaper(driving_data, output_data, data_split_vector, scenario_vector, multiplier):
    driving_data_large = driving_data
    output_data_large = output_data
    data_split_vector_large = data_split_vector
    scenario_vector_large = scenario_vector
    for i in range (0,multiplier-1):
        driving_data_large = np.append(driving_data_large, driving_data)
        output_data_large = np.append(output_data_large,output_data)
        data_split_vector_large = np.append(data_split_vector_large, data_split_vector)
        scenario_vector_large = np.append(scenario_vector_large, scenario_vector)
    x = int(driving_data_large.shape[0]/13)
    
    driving_data_large = driving_data_large.reshape(x,13)
    output_data_large = output_data_large.reshape(output_data_large.shape[0],1)
    data_split_vector_large = data_split_vector_large.reshape(data_split_vector_large.shape[0],1)
    scenario_vector_large = scenario_vector_large.reshape(int(scenario_vector_large.shape[0]/4),4)
    
    print(driving_data_large.shape, data_split_vector_large.shape,scenario_vector_large.shape, output_data_large.shape)
    a = len(data_split_vector)
    b = len(data_split_vector_large)
    for i in range(a,b):
        driving_data_large[a,:] = driving_data_large[a,:] + \
        np.random.normal(loc = 0, scale = 1, size = (1,driving_data.shape[1]))

    return data_scaler(driving_data_large, data_split_vector_large) , output_data_large, data_split_vector_large, scenario_vector_large

In [5]:
# done
def data_scaler(driving_data, data_split_vector):
    from sklearn.preprocessing import MinMaxScaler

    no_of_data_points = driving_data.shape[0]
    no_of_columns = driving_data.shape[1]

    scaler = MinMaxScaler(feature_range = (0,1))
    scaledData = scaler.fit_transform(driving_data)
    scaledData = scaledData.reshape([1, no_of_data_points, no_of_columns])
    
    return paddingMethod(scaledData, data_split_vector)


In [6]:
# done
def paddingMethod(scenario_data, data_split_vector):
    scenario_tensor = torch.zeros((len(data_split_vector),max(max(data_split_vector)),scenario_data.shape[2])).float()

    j = 0    
    for idx, scen_length in enumerate(data_split_vector):
        for i in range (max(scen_length)):
            scenario_tensor[idx,i,:] = torch.FloatTensor(scenario_data[0,j+i,:])
        j += max(scen_length)
        
    print("The input data shape is {}".format(scenario_tensor.shape))
    
    return scenario_tensor

In [7]:
# done
def avg_error(linear_output, target):
    loss = nn.CrossEntropyLoss()
    error = loss(linear_output,target)
    return error


In [8]:
# done
def test_train_split(k_folds, input_data, output_data, data_split_vector, scenario_vector, k): # k is the fold number
    #k_folds = 8
    independent_events = input_data.size()[0]
    data_per_fold = int(independent_events/k_folds)
    train_events = (k_folds - 1)*data_per_fold    #int(independent_events*0.8)
    input_test_data = torch.zeros((data_per_fold,input_data.size()[1],input_data.size()[2]))
    input_train_data = torch.zeros((data_per_fold*(k_folds-1),input_data.size()[1],input_data.size()[2]))
    input_train_target = torch.zeros((data_per_fold*(k_folds - 1)))
    input_train_target = input_train_target.type(torch.LongTensor)
    train_datasplit = np.zeros([1,data_per_fold*(k_folds - 1)])
    test_datasplit = np.zeros([1, data_per_fold])
    train_scenario = torch.zeros((data_per_fold*(k_folds - 1),4))
    train_scenario = train_scenario.type(torch.FloatTensor)
    #print(output_data)
    counter = 0 
    for l in range (k_folds):
        if l == k:
            input_test_data = input_data[l*data_per_fold:(l+1)*data_per_fold,:,:]
            test_target = output_data[l*data_per_fold:(l+1)*data_per_fold,0]
            test_datasplit = data_split_vector[l*data_per_fold:(l+1)*data_per_fold,0]
            test_scenario = scenario_vector[l*data_per_fold:(l+1)*data_per_fold,:]
        else:
            input_train_data[counter*data_per_fold:(counter+1)*data_per_fold,:,:] = input_data[l*data_per_fold:\
                                                                                               (l+1)*data_per_fold,:,:]
            #print(output_data[l*data_per_fold:(l+1)*data_per_fold] ,output_data.shape)
            input_train_target[counter*data_per_fold:(counter+1)*data_per_fold]= output_data[l*data_per_fold:(l+1)*data_per_fold,0] 
            train_datasplit[0,counter*data_per_fold:(counter+1)*data_per_fold] = data_split_vector[l*data_per_fold:(l+1)*data_per_fold,0]
            train_scenario[counter*data_per_fold:(counter+1)*data_per_fold,:] = scenario_vector[l*data_per_fold:(l+1)*data_per_fold,:]
            counter += 1
            
    return input_train_data, input_train_target, input_test_data, test_target, train_events, train_datasplit, test_datasplit, train_scenario, test_scenario

In [9]:
# done
def model_init(input_train_data, input_train_target, no_of_batches, data_split_vector, train_datasplit,train_scenario, epoch):
    
    collision_predictor = collisionClassifier(no_of_features = input_train_data.size()[2],features_to_hidden = 20, \
                                          features_in_hidden = 5 , max_sequence_length = max(data_split_vector))
    
    batch_size = int(input_train_data.size()[0]/no_of_batches)
    optimizer = torch.optim.Adam(collision_predictor.parameters(), lr=0.005)
    
    return model_trainer(input_train_data, input_train_target, no_of_batches, data_split_vector,\
                         train_datasplit, train_scenario, batch_size, optimizer, collision_predictor, epoch)

In [10]:
# done
def model_trainer(input_train_data, input_train_target, no_of_batches, data_split_vector,\
                         train_datasplit, train_scenario, batch_size, optimizer, collision_predictor, epoch):
    
    for i in range(epoch):
        for j in range(no_of_batches):
            pred_tensor = torch.zeros(batch_size,2)
            target = torch.LongTensor(input_train_target)
            target_tensor = target[j*batch_size:(j+1)*batch_size] 
            for k in range (batch_size):
                event_no = (batch_size*j)+k
                prediction = collision_predictor.forward_pass(input_train_data[event_no,:,:].unsqueeze(0),\
                                                                      train_datasplit[0,event_no], train_scenario[event_no,:])
                pred_tensor[k] = prediction
        
            #print(pred_tensor.shape, target_tensor.shape)
            error = avg_error(pred_tensor,target_tensor)
            collision_predictor.zero_grad()
            error.backward()
            optimizer.step()
        
        print("The error for epoch {} was: {}".format((i+1),error))
            
    return collision_predictor

In [11]:
# done
def test(input_test_data, test_target, collision_predictor, test_datasplit, test_scenario):
    
    test_target = np.array(test_target)
    test_predictions = np.zeros(test_target.shape)
    event_no = 0
    for i in range(0,input_test_data.shape[0]):
        test_output = collision_predictor.forward_pass(input_test_data[i,:,:].unsqueeze(0), \
                                                                test_datasplit[event_no], test_scenario[event_no,:])
        if test_output[0] < test_output[1]:
            test_predictions[i] = 1
        else :
            test_predictions[i] = 0
        event_no += 1
    tp = 0
    fp = 0
    fn = 0
    tn = 0
    for i in range(0, input_test_data.shape[0]):
        if ((test_predictions[i] == test_target[i]) & (test_predictions[i] == 1)):
            tp += 1
        if ((test_predictions[i] == test_target[i]) & (test_predictions[i] == 0)):
            tn += 1
        elif((test_predictions[i] != test_target[i]) & (test_predictions[i] == 1)):
            fp += 1
        elif((test_predictions[i] != test_target[i]) & (test_predictions[i] == 0)):
            fn += 1
    return tp, fp, fn, tn, test_predictions

In [12]:
# done
def main_function(driving_data, output_data,data_split_vector, scenario_vector,multiplier = 0, k_folds = 8,no_of_batches = 4, epoch = 20):

    input_data, output_data, data_split_vector, scenario_vector = data_multiplier_and_shaper(driving_data, output_data,data_split_vector, scenario_vector,multiplier)
    print("There are {} inputs to the LSTM. How many folds do you want for your cross-validation, default being 8?".format(input_data.shape[0]))
    k_folds = input()
    k_folds = int(k_folds)
    print("The number of folds: {}".format(k_folds))
    print("What is the number of batches you desire the training data be split into? Default is set at 4")
    no_of_batches = input()
    no_of_batches = int(no_of_batches)
    print("The number of batches: {}".format(no_of_batches))
    print("What is the number of epochs for which the training should be carried out? Default is set at 20")
    epoch = input()
    epoch = int(epoch)
    print("The number of epoch: {}".format(epoch))
    output_data = torch.LongTensor(output_data)
    scenario_vector = torch.FloatTensor(scenario_vector)
    #print(scenario_vector, scenario_vector.shape)
    true_positive  = 0
    false_positive = 0
    false_negative = 0
    true_negative = 0
    
    
    for k in range(k_folds):
        input_train_data, input_train_target, input_test_data, test_target,train_events, \
        train_datasplit, test_datasplit, train_scenario, test_scenario = test_train_split(k_folds, input_data, output_data, data_split_vector, scenario_vector, k)
        collision_predictor = model_init(input_train_data, input_train_target, no_of_batches, data_split_vector, train_datasplit, train_scenario, epoch)
        tp, fp, fn, tn, test_predictions = test(input_test_data, test_target, collision_predictor, test_datasplit, test_scenario)
        true_positive  += tp
        false_positive += fp
        false_negative += fn
        true_negative  += tn

    #confusion_matrix = pd.DataFrame(np.array([true_positive,true_negative],[false_positive, false_negative]),columns = ['Positive (Collision)','Negative (No Collision)'], index = ['true','false'])
    
    return true_positive,true_negative,false_positive, false_negative, test_predictions
                     

In [13]:
# done
import numpy as np
import pandas as pd

# importing the time series input data

filepath_main_data = "/Users/RSM/cleaned_data_version_01.xlsx"
driving_data = pd.read_excel(filepath_main_data)
driving_data = driving_data.values

# importing the collision outcome for the corresponding input data

filepath_output = "/Users/RSM/output_data_version_01.xlsx"
output_data = pd.read_excel(filepath_output)
output_data = output_data.values

# importing the data split vector which stores the length of each time series data

filepath_data_split = "/Users/RSM/data_split_version_01.xlsx"
data_split_vector = pd.read_excel(filepath_data_split)
data_split_vector = data_split_vector.values

# importing the scenario data 

filepath_scenario_data = "/Users/RSM/scenario_data_version_01.xlsx"
scenario_vector = pd.read_excel(filepath_scenario_data)
scenario_vector = scenario_vector.values

print("The shape of the input driving data is:{}\
The shape of the output data is: {}\
The shape of the data split vector is: {}\
The shape of the scenario vector is: {}".format(driving_data.shape, output_data.shape, data_split_vector.shape, scenario_vector.shape))


The shape of the input driving data is:(10300, 13)The shape of the output data is: (512, 1)The shape of the data split vector is: (512, 1)The shape of the scenario vector is: (512, 4)


In [None]:
# done
true_positive,true_negative,false_positive, false_negative, test_predictions = main_function(driving_data, output_data,data_split_vector, scenario_vector, multiplier = 100, k_folds = 8,no_of_batches = 4, epoch = 20)


(1030000, 13) (51200, 1) (51200, 4) (51200, 1)
The input data shape is torch.Size([51200, 67, 13])
There are 51200 inputs to the LSTM. How many folds do you want for your cross-validation, default being 8?
8
The number of folds: 8
What is the number of batches you desire the training data be split into? Default is set at 4
448
The number of batches: 448
What is the number of epochs for which the training should be carried out? Default is set at 20
200
The number of epoch: 200




The error for epoch 1 was: 0.5184559226036072
The error for epoch 2 was: 0.4842184782028198
The error for epoch 3 was: 0.47389012575149536
The error for epoch 4 was: 0.4672342538833618
The error for epoch 5 was: 0.46413493156433105
The error for epoch 6 was: 0.46269145607948303
The error for epoch 7 was: 0.4692426323890686
The error for epoch 8 was: 0.4602414667606354
The error for epoch 9 was: 0.45742371678352356
The error for epoch 10 was: 0.45778006315231323
The error for epoch 11 was: 0.4578185975551605
The error for epoch 12 was: 0.4646086096763611
The error for epoch 13 was: 0.4631819427013397
The error for epoch 14 was: 0.46441298723220825
The error for epoch 15 was: 0.465087890625
The error for epoch 16 was: 0.46917784214019775
The error for epoch 17 was: 0.46438854932785034
The error for epoch 18 was: 0.46529877185821533
The error for epoch 19 was: 0.46726658940315247
The error for epoch 20 was: 0.4703204333782196
The error for epoch 21 was: 0.46053218841552734
The error for e

The error for epoch 174 was: 0.4833434820175171
The error for epoch 175 was: 0.4734594225883484
The error for epoch 176 was: 0.45886093378067017
The error for epoch 177 was: 0.46094998717308044
The error for epoch 178 was: 0.4661977291107178
The error for epoch 179 was: 0.46808701753616333
The error for epoch 180 was: 0.5358789563179016
The error for epoch 181 was: 0.48550277948379517
The error for epoch 182 was: 0.4775751233100891
The error for epoch 183 was: 0.4640921652317047
The error for epoch 184 was: 0.44818341732025146
The error for epoch 185 was: 0.4816450774669647
The error for epoch 186 was: 0.4469529986381531
The error for epoch 187 was: 0.4630793631076813
The error for epoch 188 was: 0.46140769124031067
The error for epoch 189 was: 0.46352654695510864
The error for epoch 190 was: 0.44784966111183167
The error for epoch 191 was: 0.4550076723098755
The error for epoch 192 was: 0.445015549659729
The error for epoch 193 was: 0.4448782205581665
The error for epoch 194 was: 0.44

The error for epoch 147 was: 0.5132619738578796
The error for epoch 148 was: 0.5132619738578796
The error for epoch 149 was: 0.5132619738578796
The error for epoch 150 was: 0.5132619738578796
The error for epoch 151 was: 0.5132619738578796
The error for epoch 152 was: 0.5132619738578796
The error for epoch 153 was: 0.5132619738578796
The error for epoch 154 was: 0.5132619738578796
The error for epoch 155 was: 0.5132619738578796
The error for epoch 156 was: 0.5132619738578796
The error for epoch 157 was: 0.5132619738578796
The error for epoch 158 was: 0.5132619738578796
The error for epoch 159 was: 0.5132619738578796
The error for epoch 160 was: 0.5132619738578796
The error for epoch 161 was: 0.5132619738578796
The error for epoch 162 was: 0.5132619738578796
The error for epoch 163 was: 0.5132619738578796
The error for epoch 164 was: 0.5132619738578796
The error for epoch 165 was: 0.5132619738578796
The error for epoch 166 was: 0.5132619738578796
The error for epoch 167 was: 0.513261973

The error for epoch 119 was: 0.4059862792491913
The error for epoch 120 was: 0.42090195417404175
The error for epoch 121 was: 0.40417224168777466
The error for epoch 122 was: 0.4133135974407196
The error for epoch 123 was: 0.41124945878982544
The error for epoch 124 was: 0.40382859110832214
The error for epoch 125 was: 0.41388213634490967
The error for epoch 126 was: 0.40394049882888794
The error for epoch 127 was: 0.4463594853878021
The error for epoch 128 was: 0.4076773524284363
The error for epoch 129 was: 0.40563347935676575
The error for epoch 130 was: 0.4038527011871338
The error for epoch 131 was: 0.4038524627685547
The error for epoch 132 was: 0.4458048939704895
The error for epoch 133 was: 0.40486401319503784
The error for epoch 134 was: 0.4092780649662018
The error for epoch 135 was: 0.40379002690315247
The error for epoch 136 was: 0.40378338098526
The error for epoch 137 was: 0.45654261112213135
The error for epoch 138 was: 0.44732993841171265
The error for epoch 139 was: 0.

The error for epoch 91 was: 0.45286595821380615
The error for epoch 92 was: 0.4529167413711548
The error for epoch 93 was: 0.4529786705970764
The error for epoch 94 was: 0.45303016901016235
The error for epoch 95 was: 0.4648865759372711
The error for epoch 96 was: 0.45200589299201965
The error for epoch 97 was: 0.4521849751472473
The error for epoch 98 was: 0.4525068998336792
The error for epoch 99 was: 0.4526086449623108
The error for epoch 100 was: 0.4526869058609009
The error for epoch 101 was: 0.45275741815567017
The error for epoch 102 was: 0.455372154712677
The error for epoch 103 was: 0.46654778718948364
The error for epoch 104 was: 0.45999470353126526
The error for epoch 105 was: 0.45832452178001404
The error for epoch 106 was: 0.5172900557518005
The error for epoch 107 was: 0.4577833116054535
The error for epoch 108 was: 0.4571799337863922
The error for epoch 109 was: 0.5232251286506653
The error for epoch 110 was: 0.5227597951889038
The error for epoch 111 was: 0.522744655609

The error for epoch 63 was: 0.43650010228157043
The error for epoch 64 was: 0.4107685089111328
The error for epoch 65 was: 0.41865208745002747
The error for epoch 66 was: 0.41795676946640015
The error for epoch 67 was: 0.4250456988811493
The error for epoch 68 was: 0.42789822816848755
The error for epoch 69 was: 0.4138927459716797
The error for epoch 70 was: 0.41692063212394714
The error for epoch 71 was: 0.41975563764572144
The error for epoch 72 was: 0.41428741812705994
The error for epoch 73 was: 0.41788098216056824
The error for epoch 74 was: 0.409673810005188
The error for epoch 75 was: 0.4100501537322998
The error for epoch 76 was: 0.41038990020751953
The error for epoch 77 was: 0.4084830582141876
The error for epoch 78 was: 0.41999301314353943
The error for epoch 79 was: 0.418701171875
The error for epoch 80 was: 0.4567100405693054
The error for epoch 81 was: 0.41834408044815063
The error for epoch 82 was: 0.41844138503074646
The error for epoch 83 was: 0.420955091714859
The err

The error for epoch 34 was: 0.45502543449401855
The error for epoch 35 was: 0.4667535424232483
The error for epoch 36 was: 0.46219930052757263
The error for epoch 37 was: 0.474400132894516
The error for epoch 38 was: 0.4553617238998413
The error for epoch 39 was: 0.4727317690849304
The error for epoch 40 was: 0.4635385274887085
The error for epoch 41 was: 0.48298102617263794
The error for epoch 42 was: 0.4534952640533447
The error for epoch 43 was: 0.464219868183136
The error for epoch 44 was: 0.464446097612381
The error for epoch 45 was: 0.46085861325263977
The error for epoch 46 was: 0.4620027244091034
The error for epoch 47 was: 0.4540601968765259
The error for epoch 48 was: 0.46008121967315674
The error for epoch 49 was: 0.4701193571090698
The error for epoch 50 was: 0.4793239235877991
The error for epoch 51 was: 0.4629918038845062
The error for epoch 52 was: 0.45243626832962036
The error for epoch 53 was: 0.4538523554801941
The error for epoch 54 was: 0.455515056848526
The error f

The error for epoch 6 was: 0.4878391921520233
The error for epoch 7 was: 0.4825390875339508
The error for epoch 8 was: 0.4846537709236145
The error for epoch 9 was: 0.48259153962135315
The error for epoch 10 was: 0.489576518535614
The error for epoch 11 was: 0.4871387183666229
The error for epoch 12 was: 0.47870877385139465
The error for epoch 13 was: 0.4893416166305542
The error for epoch 14 was: 0.48178744316101074
The error for epoch 15 was: 0.4888462722301483
The error for epoch 16 was: 0.48275238275527954
The error for epoch 17 was: 0.48290374875068665
The error for epoch 18 was: 0.47111445665359497
The error for epoch 19 was: 0.47415977716445923
The error for epoch 20 was: 0.4702937602996826
The error for epoch 21 was: 0.4685893654823303
The error for epoch 22 was: 0.4761420786380768
The error for epoch 23 was: 0.45568832755088806
The error for epoch 24 was: 0.47218388319015503
The error for epoch 25 was: 0.46949055790901184
The error for epoch 26 was: 0.4644591212272644
The erro

The error for epoch 178 was: 0.44359448552131653
The error for epoch 179 was: 0.44353243708610535
The error for epoch 180 was: 0.44406089186668396
The error for epoch 181 was: 0.4372440278530121
The error for epoch 182 was: 0.4458073377609253
The error for epoch 183 was: 0.44571149349212646
The error for epoch 184 was: 0.4750107228755951
The error for epoch 185 was: 0.4735282063484192
The error for epoch 186 was: 0.4742606282234192
The error for epoch 187 was: 0.4542381167411804
The error for epoch 188 was: 0.4543016850948334
The error for epoch 189 was: 0.4550814926624298
The error for epoch 190 was: 0.4560507833957672
The error for epoch 191 was: 0.4562549591064453
The error for epoch 192 was: 0.45612427592277527
The error for epoch 193 was: 0.45543086528778076
The error for epoch 194 was: 0.4547966718673706
The error for epoch 195 was: 0.45489585399627686
The error for epoch 196 was: 0.46409597992897034
The error for epoch 197 was: 0.4560401141643524
The error for epoch 198 was: 0.4

The error for epoch 152 was: 0.6884765625
The error for epoch 153 was: 0.6884765028953552
The error for epoch 154 was: 0.6884765028953552
The error for epoch 155 was: 0.6884762644767761
The error for epoch 156 was: 0.6884763836860657
The error for epoch 157 was: 0.6884763240814209
The error for epoch 158 was: 0.6884760856628418
The error for epoch 159 was: 0.688476026058197
The error for epoch 160 was: 0.6884762048721313
The error for epoch 161 was: 0.6884760856628418
The error for epoch 162 was: 0.6884758472442627
The error for epoch 163 was: 0.6884756684303284
The error for epoch 164 was: 0.6884758472442627
The error for epoch 165 was: 0.6884755492210388
The error for epoch 166 was: 0.6884757280349731
The error for epoch 167 was: 0.6884755492210388
The error for epoch 168 was: 0.6884759664535522
The error for epoch 169 was: 0.6884752511978149
The error for epoch 170 was: 0.6884750127792358
The error for epoch 171 was: 0.6884751915931702
The error for epoch 172 was: 0.6884751319885254

In [17]:
sum = (true_positive + true_negative + false_positive + false_negative)/100
print("true positive: {} \
true negative: {} \
false positive: {} \
false negative: {}".format(true_positive/sum,true_negative/sum,false_positive/sum, false_negative/sum))

true positive: 12.55078125 true negative: 69.0859375 false positive: 3.1796875 false negative: 15.18359375


In [18]:
print("accuracy: {}".format((true_positive + true_negative)/(true_positive + true_negative+ false_positive + false_negative)))

accuracy: 0.8163671875


The warning parameters we need to include are 1. warning reliability 2. warning lead time 3. warning type 4. Missed warning

1. Warning reliability relates to missed warnings. This parameter has two values 75% and 90%. The first set of experiments were subjected to warnings with 75% reliability while the second half was subjected to 90% reliability. 

2. Warning lead time: This has two levels 2.5s and 4.5s hence a categorical value which can be

3. Warning type: There are two warning types - command and notification. This again is of the categorical type. 

The scenario data over here pertains to whether it was a straight line or a cross-section at which these events took place. Once again it is of the categorical type. Now this value has to be coded into the dataset from the experiment and consequently needs confirmation

The strategy to go about using this new information in the model is to concatenate a tensor of this new information to the output of the relevant last hidden layer of the LSTM. This will then in turn be fed to the linear layer and the softmax classifier. 

This new tensor will be fed to the model along with the training data. It can be treated in a way similar to the datasplitvector. This will require us to first extract the relevant data from the summary sheet and store it in a variable. This can be further manipulated during the formation of the folds and then fed to the model. 

In [None]:
# learning rate = 0.001 - overfit after 30 epochs
