# Extracting labels

In this script we extract event labels and timestamps from the TSV event file for dyad number 5, and then correlate those labels with ECG and EEG sensor readings.

## This part comes from Tomas Dang and Morgan Belcher

### Reading ECG data 

In [20]:
import csv
from tkinter import Tk, filedialog  # file-picker dialogs (Python 3)

import pandas as pd

# These are for heart data processing
import numpy as np
import matplotlib.pyplot as plt
import heartpy as hp
from scipy.signal import resample

In [21]:
# Return the element of lst whose value is closest to K
# (linear scan; on ties the earliest element wins)
def closest(lst, K):
    return lst[min(range(len(lst)), key=lambda i: abs(lst[i] - K))]
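`closest()` scans every element, which is O(n) per lookup and slow on a recording with hundreds of thousands of samples. If the list is sorted in ascending order (as a timestamp column normally is), a binary search returns the same element in O(log n). A minimal sketch using the standard-library `bisect` module; `closest_sorted` is a hypothetical helper, not part of the original code:

```python
from bisect import bisect_left


def closest_sorted(sorted_values, target):
    """Return the element of an ascending list that is closest to target.

    Ties go to the smaller element, matching closest(), which keeps the
    first index with the minimal distance.
    """
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    pos = bisect_left(sorted_values, target)  # first index with value >= target
    if pos == 0:
        return sorted_values[0]
    if pos == len(sorted_values):
        return sorted_values[-1]
    before, after = sorted_values[pos - 1], sorted_values[pos]
    return before if target - before <= after - target else after


times = [0.0, 4.06543, 8.130859, 12.196289, 16.261719]
print(closest_sorted(times, 9.0))  # 8.130859
```

The trade-off is that the list must already be sorted; `closest()` works on any list.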

Here we choose the TSV file with the labels and the CSV file with the ECG readings. For dyad number 5 these files are ECG_P_session_5.csv and event_ARL_RWNVDEDP_session_5_task_RealworldDriving_subjectLabId_4069_recording_1.tsv.

In [68]:
# Open a dialog box and get the selected CSV file (ECG readings)
root = Tk()
root.withdraw()  # hide the empty Tk main window
csvfile = filedialog.askopenfile(parent=root, mode='r', filetypes=[('CSV file', '*.csv')], title='Choose CSV file')

# Open another dialog box and get the selected TSV file (event labels)
tsvfile = filedialog.askopenfile(parent=root, mode='r', filetypes=[('TSV file', '*.tsv')], title='Choose TSV file')

### Extract acceleration

Here we just set up the environment: the exact event descriptors, empty lists for the acceleration start and end times, and the ECG sampling frequency.

In [38]:
# Which events do we want? These must match the EXACT text in the TSV column
eventtype_start = 'Event/Description/Driver begins accelerating at an aggressive rate'
eventtype_end = 'Event/Description/Driver ends accelerating at an aggressive rate'
eventtimes_start = []  # start times of matching events
eventtimes_end = []    # end times of matching events
f = 1000 / 4.06522225562659  # Zephyr BioHarness 3 sampling frequency in Hz (about 246 Hz);
                             # 4.065... ms is the average sampling interval from the CSV_UNIX_TO_MS.py file

### Extract braking

In [51]:
# Which events do we want? These must match the EXACT text in the TSV column
eventtype_start = 'Event/Description/Driver begins braking'
eventtype_end = 'Event/Description/Driver stops braking'
eventtimes_start = []  # start times of matching events
eventtimes_end = []    # end times of matching events
f = 1000 / 4.06522225562659  # Zephyr BioHarness 3 sampling frequency in Hz (about 246 Hz);
                             # 4.065... ms is the average sampling interval from the CSV_UNIX_TO_MS.py file

### Extract lane change

In [69]:
# Which events do we want? These must match the EXACT text in the TSV column
eventtype_start = 'Driver changes lanes'
# Lane changes have no end event, so eventtype_end is not set here
eventtimes_start = []  # start times of matching events
eventtimes_end = []    # stays empty for lane changes
f = 1000 / 4.06522225562659  # Zephyr BioHarness 3 sampling frequency in Hz (about 246 Hz);
                             # 4.065... ms is the average sampling interval from the CSV_UNIX_TO_MS.py file
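Editing and re-running the three setup cells by hand works, but the descriptors could also live in a single mapping that a loop iterates over. A sketch under that assumption; `EVENT_TYPES`, `SAMPLE_INTERVAL_MS` and `SAMPLE_RATE_HZ` are hypothetical names, and the descriptor strings are copied from the cells above:

```python
# Hypothetical configuration: one entry per event type.
# None means the event type has no end descriptor (lane changes).
EVENT_TYPES = {
    "accel": ("Event/Description/Driver begins accelerating at an aggressive rate",
              "Event/Description/Driver ends accelerating at an aggressive rate"),
    "brake": ("Event/Description/Driver begins braking",
              "Event/Description/Driver stops braking"),
    "lane": ("Driver changes lanes", None),
}

# Average interval between ECG samples in milliseconds (value used for f above)
SAMPLE_INTERVAL_MS = 4.06522225562659
SAMPLE_RATE_HZ = 1000 / SAMPLE_INTERVAL_MS  # equals f, about 246 Hz

for name, (start, end) in EVENT_TYPES.items():
    print(name, "| start:", start, "| end:", end)
print(f"sampling frequency: {SAMPLE_RATE_HZ:.2f} Hz")
```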

### Lane change only (no end events are collected)

In [70]:
# Open the selected TSV file
with open(tsvfile.name) as fd:
    rd = csv.reader(fd, delimiter="\t", quotechar='"')  # read the file row by row
    for row in rd:  # for each row in the TSV file
        data = row[2].split(',')  # split the event string on commas into a list of fields
        if eventtype_start in data[1]:  # is our event type a substring of this event's description?
            eventtimes_start.append(float(row[1]))  # if so, store its onset time (seconds)
            print(data[1])
        # Lane changes have no end event, so end times are not collected here

 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Ev

#### My intervention

Since Tomas is only concerned with aggressive acceleration, and I need braking and lane changes, I modified the code above to extract braking and lane-change events as well. The braking event descriptors are **Driver begins braking** and **Driver stops braking**. For lane changes there is only a start descriptor, **Driver changes lanes**, which the TSV continues with "to the left" or "to the right". These are the exact wordings used to extract events from the TSV file. I am not concerned with the direction of the change, so left and right lane changes are treated as the same category; the substring match does this automatically.
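The substring test is what merges left and right lane changes: "Driver changes lanes" is contained in both "...to the left" and "...to the right". A standalone sketch of the extraction loop as a function, assuming the TSV layout the cells here rely on (onset time in column index 1, and a comma-separated string in column index 2 whose second field is the description); `extract_event_times` and the three-row sample are hypothetical:

```python
import csv
import io


def extract_event_times(tsv_text, start_descriptor, end_descriptor=None):
    """Collect onset times (seconds) of events matching the descriptors.

    A row matches when the descriptor is a substring of its description.
    """
    starts, ends = [], []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quotechar='"')
    for row in reader:
        if len(row) < 3:
            continue  # skip short or malformed rows
        fields = row[2].split(",")
        if len(fields) < 2:
            continue  # no description field
        description = fields[1]
        if start_descriptor in description:
            starts.append(float(row[1]))
        if end_descriptor is not None and end_descriptor in description:
            ends.append(float(row[1]))
    return starts, ends


# Hypothetical three-row example mimicking the assumed layout
sample = (
    "0\t2256.1367\tx, Event/Description/Driver changes lanes to the left\n"
    "1\t2276.3086\tx, Event/Description/Driver changes lanes to the right\n"
    "2\t2300.0\tx, Event/Description/Driver begins braking\n"
)
lane_starts, lane_ends = extract_event_times(sample, "Driver changes lanes")
print(lane_starts)  # [2256.1367, 2276.3086]
```

Passing `None` as the end descriptor covers lane changes without relying on a leftover `eventtype_end` from an earlier run.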

Tomas's code continues here; I have not modified it yet.

In [66]:
# Open the selected TSV file
with open(tsvfile.name) as fd:
    rd = csv.reader(fd, delimiter="\t", quotechar='"')  # read the file row by row
    for row in rd:  # for each row in the TSV file
        data = row[2].split(',')  # split the event string on commas into a list of fields
        if eventtype_start in data[1]:  # is our start descriptor a substring of this event's description?
            eventtimes_start.append(float(row[1]))  # if so, store its onset time (seconds)
            print(data[1])
        if eventtype_end in data[1]:  # same check for the end descriptor
            eventtimes_end.append(float(row[1]))

 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Event/Description/Driver changes lanes to the left
 Event/Description/Driver changes lanes to the right
 Ev

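Here we print the timestamps (in seconds) of the beginning and end of the selected events. The output below comes from the lane-change run, which has start times only, so the end-time list is empty.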

In [71]:
print("First few accelerations' start times\n")
print(eventtimes_start[:10])
print(len(eventtimes_start))
print("\n\n")
print("First few accelerations' end times\n")
print(eventtimes_end[:10])

First few accelerations' start times

[2256.1367, 2276.3086, 2305.4727, 2346.3516, 2349.6172, 2420.8711, 2585.207, 2644.3633, 2671.2734, 2707.4922]
53



First few accelerations' end times

[]
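Lane changes have no end events, but for acceleration and braking both lists are filled, and the windowing step further down notes that some events are too short to use their end indexes. One way to check event durations is to pair each start with the next end. A greedy sketch that assumes events of one type never overlap; `pair_events` and the example times are hypothetical:

```python
def pair_events(starts, ends):
    """Pair each start time with the first unused end time at or after it.

    Greedy sketch assuming events of one type do not overlap; unmatched
    starts or ends are dropped. Times are in seconds.
    """
    starts, ends = sorted(starts), sorted(ends)
    pairs = []
    j = 0
    for s in starts:
        while j < len(ends) and ends[j] < s:
            j += 1  # skip ends that precede this start
        if j == len(ends):
            break  # no end left for this or any later start
        pairs.append((s, ends[j]))
        j += 1  # each end is used at most once
    return pairs


# Hypothetical times, for illustration only
pairs = pair_events([10.0, 20.0], [12.5, 21.0])
durations = [end - start for start, end in pairs]
print(pairs)      # [(10.0, 12.5), (20.0, 21.0)]
print(durations)  # [2.5, 1.0]
```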


#### Reading ECG file 

Here we read the ECG readings and extract their timestamps into a separate list.

In [72]:
df = pd.read_csv(csvfile.name)  # ECG data file
times = df.Time.tolist()  # ECG timestamps (ms)
print('The total number of start events = ' + str(len(eventtimes_start)))
print('The total number of end events = ' + str(len(eventtimes_end)))

The total number of start events = 53
The total number of end events = 0


What is the P300, and why add 0.650 seconds (see the commented-out lines below)? Need to talk to Tomas.

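Here Tomas multiplies the event start and end times, which are in seconds, by 1000 to convert them to milliseconds, and then divides by the sampling interval (1000/f ≈ 4.065 ms) to get the index of the nearest ECG sample.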
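In formula form the conversion is index = round(1000 · t / Δt), with t the event onset in seconds and Δt the average sampling interval in milliseconds. A small sketch of the same arithmetic as a function, assuming (as the cell below implicitly does) that the TSV onsets and the ECG Time column share the same zero point and that samples are evenly spaced; `event_time_to_index` is a hypothetical name:

```python
SAMPLE_INTERVAL_MS = 4.06522225562659  # average ms between ECG samples (1000 / f)


def event_time_to_index(event_time_s, sample_interval_ms=SAMPLE_INTERVAL_MS):
    """Nearest ECG sample index for an event onset given in seconds.

    Assumes the event onsets and the ECG timestamps share the same zero
    point and that samples are evenly spaced.
    """
    event_time_ms = event_time_s * 1000  # seconds -> milliseconds
    return round(event_time_ms / sample_interval_ms)  # milliseconds -> samples


# First lane-change onset printed above
print(event_time_to_index(2256.1367))
```

If the sampling rate drifts during the session, looking up the nearest timestamp in the Time column (as `closest()` does) is more robust than this constant-interval formula.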

In [73]:
StartIndex = []
EndIndex = []
for eventtime in eventtimes_start:  # go through each event start time, one at a time
    EventStartTime = eventtime * 1000  # convert seconds to milliseconds
    #EventEndTime = (eventtime) * 1000  # no need to add 0.650 s for the P300 before converting to milliseconds

    # Divide by the sampling interval (1000/f ms) to get the nearest ECG sample index
    StartIndex.append(round(EventStartTime / (1000 / f)))

for eventtime in eventtimes_end:  # go through each event end time, one at a time
    EventEndTime = eventtime * 1000  # convert seconds to milliseconds
    #EventEndTime = (eventtime) * 1000  # no need to add 0.650 s for the P300 before converting to milliseconds

    # Divide by the sampling interval (1000/f ms) to get the nearest ECG sample index
    EndIndex.append(round(EventEndTime / (1000 / f)))

In [74]:
# Let's look at the first rows of the ECG data
df.head()

| Unnamed: 0 | Time | ECG |
|---|---|---|
| 0 | 0.0 | 512.0 |
| 1 | 4.06543 | 512.0 |
| 2 | 8.130859 | 511.0 |
| 3 | 12.196289 | 511.0 |
| 4 | 16.261719 | 511.0 |


In [75]:
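### Data to NumPy array: column 0 = Time (ms), column 1 = ECG
### Select the columns by name so the 'Unnamed: 0' index column from the CSV is not included
ECG_Data = df[['Time', 'ECG']].to_numpy(dtype=float)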

In [76]:
### Min-max normalization of the ECG signal (column 1) to the range [0, 1]
v = ECG_Data[:, 1]   
ECG_Data[:, 1] = (v - v.min()) / (v.max() - v.min())
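The cell rescales the signal with x' = (x − min x) / (max x − min x). That expression divides by zero if the column is constant, for example if a sensor reports a fixed value for a whole recording (a hypothetical failure case). A plain-Python sketch of the same scaling with a guard for that case; `min_max_normalize` is a hypothetical helper:

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # avoid division by zero on a flat signal
    return [(v - lo) / span for v in values]


print(min_max_normalize([511.0, 512.0, 513.0]))  # [0.0, 0.5, 1.0]
```

Note that normalizing the whole recording at once means each event window is scaled by the global minimum and maximum, not by its own.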

In [77]:
### Extract a fixed window after each event start index (end indexes are not used because some events are too short)
### The window covers 5 seconds after the event start:
### 5 s = 5000 ms; average sampling interval = 4.065514218205907 ms; 5000 / 4.065514218205907 = 1229.857, rounded up to 1230 samples
ECG_Agg_Acc = []  # one window per event (name kept from the acceleration analysis, reused for every event type)
for start in StartIndex:
    # Slicing truncates the window if it runs past the end of the recording
    ECG_Agg_Acc.append(ECG_Data[start:start + 1230, :])
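The 1230-sample window length follows from rounding 5000 ms divided by the sampling interval up to a whole sample. A sketch of that calculation, plus a bounds-checked extraction that skips windows running past the end of the recording so every window has the same length; `window_length` and `extract_windows` are hypothetical helpers, and skipping (rather than truncating) is a design choice, not what the cell above does:

```python
import math


def window_length(duration_ms, sample_interval_ms):
    """Number of samples needed to cover duration_ms, rounded up."""
    return math.ceil(duration_ms / sample_interval_ms)


def extract_windows(samples, start_indices, n_samples):
    """Slice n_samples items after each start index.

    Windows that would run past the end of the recording are skipped,
    so every returned window has exactly n_samples items.
    """
    windows = []
    for start in start_indices:
        if 0 <= start and start + n_samples <= len(samples):
            windows.append(samples[start:start + n_samples])
    return windows


print(window_length(5000, 4.065514218205907))  # 1230
```

If windows are skipped, keep track of which events they belong to, because the rows of the resulting IBI/BPM table would no longer line up one-to-one with the event list.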

In [78]:
##############################################################################
# Extracting IBI and BPM for the selected event type in session 5
##############################################################################

### Empty arrays 
ibi = np.zeros(len(ECG_Agg_Acc))
bpm = np.zeros(len(ECG_Agg_Acc))
sample_rate = np.zeros(len(ECG_Agg_Acc))
filtered = []

In [79]:
### Source: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html
from scipy.signal import find_peaks
for i in range(len(ECG_Agg_Acc)):
    # Estimate the sample rate from the millisecond timestamps, then remove baseline wander
    sample_rate[i] = hp.get_samplerate_mstimer(ECG_Agg_Acc[i][:,0])
    filtered.append(hp.remove_baseline_wander(ECG_Agg_Acc[i][:,1], sample_rate[i]))
    #plt.figure(figsize=(12,3))
    #plt.title('Signal with Baseline Wander Removed')
    #plt.plot(filtered[i])
    #plt.show()

    # Find the R peaks (QRS complexes): keep maxima above 0 that are
    # at least 150 samples (about 610 ms) apart
    peaks, _ = find_peaks(filtered[i], height=0, distance=150)
    # Plot the detected R peaks (a few peaks are misdetected, but we judged this negligible)
    #x = filtered[i]
    #plt.plot(x)
    #plt.plot(peaks, x[peaks], "x")
    #plt.plot(np.zeros_like(x), "--", color="gray")
    #plt.show()

    # Interbeat interval (IBI): mean time difference (ms) between consecutive R peaks
    ibi[i] = np.diff(ECG_Agg_Acc[i][:,0][peaks]).mean()
    bpm[i] = 60000 / ibi[i]  # 60000 ms per minute divided by the IBI in ms = beats per minute
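The loop computes the IBI as the mean spacing between detected R peaks and converts it to beats per minute with 60000 / IBI. A plain-Python sketch of that arithmetic, plus a check on what `distance=150` implies: at an interval of about 4.065 ms per sample, 150 samples is about 610 ms, so peaks closer together than that cannot both be kept and heart rates above roughly 98 bpm would be underestimated. `ibi_and_bpm` and `max_detectable_bpm` are hypothetical helpers, and the peak times are made-up example values:

```python
def ibi_and_bpm(peak_times_ms):
    """Mean inter-beat interval (ms) and heart rate (bpm) from R-peak times."""
    if len(peak_times_ms) < 2:
        raise ValueError("need at least two peaks")
    intervals = [b - a for a, b in zip(peak_times_ms, peak_times_ms[1:])]
    ibi = sum(intervals) / len(intervals)
    return ibi, 60000 / ibi


def max_detectable_bpm(min_distance_samples, sample_interval_ms):
    """Highest heart rate representable when peaks must be at least
    min_distance_samples apart (as with find_peaks' distance argument)."""
    return 60000 / (min_distance_samples * sample_interval_ms)


print(ibi_and_bpm([0.0, 800.0, 1600.0, 2400.0]))  # (800.0, 75.0)
print(round(max_detectable_bpm(150, 4.065), 1))   # 98.4
```

Because aggressive driving events may raise heart rate, it may be worth checking whether any computed BPM values sit near this ceiling.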

In [80]:
ibibpm = np.vstack((ibi, bpm)).T

We re-run the cells above once per event type (acceleration, braking, lane change), each time storing the resulting IBI and BPM values in a DataFrame and saving it to a CSV file.
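An alternative to three separate files is one "long" table with an `event_type` column, which makes comparisons across event types easier later (for example with a pandas `groupby`). A standard-library sketch, assuming the per-event results are collected in a dictionary; `results_to_long_csv` and the values in `demo` are hypothetical:

```python
import csv
import io


def results_to_long_csv(results):
    """Write {event_type: [(ibi, bpm), ...]} as CSV text with an event_type column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["event_type", "ibi", "bpm"])
    for event_type, rows in results.items():
        for ibi, bpm in rows:
            writer.writerow([event_type, ibi, bpm])
    return buffer.getvalue()


# Hypothetical values, for illustration only
demo = {"brake": [(800.0, 75.0)], "lane": [(750.0, 80.0)]}
text = results_to_long_csv(demo)
print(text)
```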

In [50]:
df_ecg_accel = pd.DataFrame(ibibpm, columns = ['ibi', 'bpm'])
print(df_ecg_accel.shape)
df_ecg_accel.to_csv("ecg_accel.csv", index = False)

(97, 2)


In [63]:
df_ecg_brake = pd.DataFrame(ibibpm, columns = ['ibi', 'bpm'])
print(df_ecg_brake.shape)
df_ecg_brake.to_csv("ecg_brake.csv", index = False)

(56, 2)


In [81]:
df_ecg_lane = pd.DataFrame(ibibpm, columns = ['ibi', 'bpm'])
print(df_ecg_lane.shape)
df_ecg_lane.to_csv("ecg_lane.csv", index = False)

(53, 2)
