## Seer Medical Database 101
Note: the dataframes seen in this notebook are a work of fiction. Any similarity to actual dataframes is purely coincidental.

In [245]:
import os
import pandas as pd
import numpy as np
import datetime

### The data is separated by patient and modality. Here's an example of a baseline dataframe:
Note: this dataframe concatenates all the files that do not contain any seizure.

In [246]:
modality = 'EDA'  # signal to load; matches the Empatica-EDA column shown below
baseline_df = pd.read_pickle(f'baseline_data_Empatica-{modality}')
baseline_df

|  | Empatica-EDA |
|---|---|
| 2019-02-27 16:08:12.000000000 | -0.453108 |
| 2019-02-27 16:08:12.007815105 | 0.447005 |
| 2019-02-27 16:08:12.015630210 | -0.339824 |
| 2019-02-27 16:08:12.023445315 | 0.449507 |
| 2019-02-27 16:08:12.031260420 | -0.566362 |
| 2019-02-27 16:08:12.039075525 | 0.452315 |
| 2019-02-27 16:08:12.046890630 | -0.339824 |
| 2019-02-27 16:08:12.054705735 | 0.453413 |
| 2019-02-27 16:08:12.062520840 | -0.396466 |
| 2019-02-27 16:08:12.070335945 | 0.454115 |

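A minimal sketch of how per-file dataframes could be concatenated into one baseline dataframe. It assumes each seizure-free file has already been loaded into a dataframe with a `DatetimeIndex` sampled at 128 Hz; `fake_recording` is a hypothetical stand-in for loading a file and the data is synthetic. The point it illustrates is that concatenation keeps each file's own timestamps, so the gap between two recordings survives as a jump in the index.

```python
import numpy as np
import pandas as pd

fs = 128  # assumed sampling frequency (Hz)

def fake_recording(start, n_samples, seed):
    """Hypothetical stand-in for one loaded file: synthetic EDA at fs Hz."""
    rng = np.random.default_rng(seed)
    index = pd.date_range(start, periods=n_samples, freq=pd.Timedelta(seconds=1 / fs))
    return pd.DataFrame({'Empatica-EDA': rng.normal(size=n_samples)}, index=index)

# two seizure-free files recorded about an hour apart
files = [fake_recording('2019-02-27 16:08:12', 256, seed=0),
         fake_recording('2019-02-27 17:10:00', 256, seed=1)]

# stack the files in time order; the index keeps every file's original timestamps
baseline_df = pd.concat(files).sort_index()

# step between consecutive samples (NaT for the first row)
gaps = baseline_df.index.to_series().diff()
print(len(baseline_df))  # 512
print(gaps.max())        # the jump between the two files (about an hour)
```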

### Again, the data is separated by patient and modality. Here's an example of a seizure dataframe:
Note: compared with the baseline dataframe, this one has an additional column, `sz`, that holds the seizure annotation (0 means no seizure and i marks the i-th annotated seizure).

In [247]:
seizures_df = pd.read_pickle(f'seizures_data_Empatica-{modality}')
seizures_df

|  | Empatica-EDA | sz |
|---|---|---|
| 2019-02-28 13:06:00.000000000 | -0.726279 | 0.0 |
| 2019-02-28 13:06:00.007812516 | 0.492721 | 0.0 |
| 2019-02-28 14:07:39.961673344 | 0.491073 | 0.0 |
| 2019-02-28 14:07:39.969485899 | -0.307657 | 0.0 |
| 2019-02-28 14:07:39.977298453 | 0.491073 | 0.0 |
| 2019-02-28 14:07:39.985111007 | -0.128178 | 0.0 |
| 2019-02-28 14:07:39.992923561 | 0.491073 | 0.0 |
| 2019-02-28 14:07:40.000736116 | -0.038453 | 27.0 |
| 2019-02-28 14:07:40.008548670 | 0.491073 | 27.0 |
| 2019-02-28 14:07:40.016361224 | 0.051302 | 27.0 |


In [248]:
# non-zero annotation ids; setdiff1d drops 0 wherever it appears (unique() keeps order of appearance)
print(f'annotated seizures: {np.setdiff1d(seizures_df["sz"].unique(), [0])}')

annotated seizures: [27.]

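Since every non-zero value of `sz` identifies one annotated seizure, a per-seizure summary (first and last annotated timestamp and number of samples) can be obtained with a `groupby`. A minimal sketch on a small synthetic dataframe shaped like `seizures_df`; the signal values, the 128 Hz rate and the seizure ids 3 and 27 are made up for illustration.

```python
import numpy as np
import pandas as pd

fs = 128  # assumed sampling frequency (Hz)
index = pd.date_range('2019-02-28 14:07:39', periods=12, freq=pd.Timedelta(seconds=1 / fs))
labels = np.array([0, 0, 3, 3, 3, 0, 0, 0, 27, 27, 0, 0], dtype=float)  # synthetic: 0 = no seizure
toy_df = pd.DataFrame({'Empatica-EDA': np.zeros(len(index)), 'sz': labels}, index=index)

# keep only annotated samples, then summarize the timestamps of each seizure id
labelled = toy_df[toy_df['sz'] != 0]
summary = (labelled.index.to_series()
           .groupby(labelled['sz'])
           .agg(onset='min', offset='max', n_samples='count'))
print(summary)
```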

### Here you can see how the features are extracted and how the new timestamps are generated

Check whether there are any discontinuities in the timestamps and, if so, split the dataframe at the discontinuity.

In [249]:
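fs = 128  # sampling frequency (Hz): one sample every 1/128 s = 7.8125 ms

# time between consecutive samples, truncated to whole milliseconds (7.8125 ms -> 7 ms)
diff_time = np.diff(seizures_df.index).astype('timedelta64[ms]')
print(diff_time)

# row positions where the step differs from the nominal period floored to ms (7 ms), i.e. discontinuities
diff_time = np.argwhere(diff_time != datetime.timedelta(milliseconds=np.floor((1/fs)*1000)))
print(diff_time)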

[      7 3699953       7       7       7       7       7       7       7
       7       7       7       7]
[[1]]


In [250]:
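# first row after the discontinuity; assumes a single break, as printed above
split = int(diff_time[0, 0]) + 1
crop_df1 = seizures_df.iloc[:split]  # segment before the gap
crop_df2 = seizures_df.iloc[split:]  # segment after the gap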
crop_df2

|  | Empatica-EDA | sz |
|---|---|---|
| 2019-02-28 14:07:39.961673344 | 0.491073 | 0.0 |
| 2019-02-28 14:07:39.969485899 | -0.307657 | 0.0 |
| 2019-02-28 14:07:39.977298453 | 0.491073 | 0.0 |
| 2019-02-28 14:07:39.985111007 | -0.128178 | 0.0 |
| 2019-02-28 14:07:39.992923561 | 0.491073 | 0.0 |
| 2019-02-28 14:07:40.000736116 | -0.038453 | 27.0 |
| 2019-02-28 14:07:40.008548670 | 0.491073 | 27.0 |
| 2019-02-28 14:07:40.016361224 | 0.051302 | 27.0 |
| 2019-02-28 14:07:40.024173778 | 0.491104 | 27.0 |
| 2019-02-28 14:07:40.031986333 | 0.230781 | 27.0 |

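The exact-equality test above works because every regular step truncates to the same value (1000/128 = 7.8125 ms, truncated to 7 ms), but a step that happens to truncate to 8 ms would also be flagged, and only the first break is used when splitting. A more tolerant sketch, assuming 128 Hz and treating any step longer than 1.5 sampling periods as a discontinuity (the 1.5 factor is an arbitrary choice), splits a dataframe into all of its continuous segments. `split_on_gaps` is a hypothetical helper and the toy timestamps mimic the ones shown above.

```python
import numpy as np
import pandas as pd

def split_on_gaps(df, fs=128, tol=1.5):
    """Split `df` (DatetimeIndex) into continuous segments.

    A discontinuity is any step between consecutive samples longer than
    `tol` sampling periods; fs=128 Hz and tol=1.5 are assumed defaults.
    """
    max_step = np.timedelta64(int(tol * 1e9 / fs), 'ns')
    steps = np.diff(df.index.values)                # timedelta64[ns] between samples
    breaks = np.flatnonzero(steps > max_step) + 1   # first row of each new segment
    bounds = [0, *breaks.tolist(), len(df)]
    return [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

# toy example: two short runs at 128 Hz separated by about an hour
period = pd.Timedelta(seconds=1 / 128)
idx = pd.date_range('2019-02-28 13:06:00', periods=2, freq=period).append(
      pd.date_range('2019-02-28 14:07:39.961673344', periods=8, freq=period))
toy_df = pd.DataFrame({'sz': np.zeros(len(idx))}, index=idx)

segments = split_on_gaps(toy_df)
print([len(s) for s in segments])  # [2, 8]
```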

Expand each seizure annotation by `preseizure` samples before its first annotated sample and `postseizure` samples after its last one. After this, the rows that still do not belong to a seizure are dropped.

In [251]:
## illustrative values in samples; in practice preseizure/postseizure are given in seconds and converted with the expressions in the comments
preseizure = 3  # int(preseizure * fs)
postseizure = 1  # int(postseizure * fs)

In [252]:
seizure_array = crop_df2['sz'].values

expanded_seizures = np.copy(seizure_array)

for sz in np.unique(seizure_array):
    if sz == 0:  # 0 means "no seizure": nothing to expand
        continue

    # positions of the samples annotated with this seizure
    indx = np.flatnonzero(seizure_array == sz)

    # label from `preseizure` samples before the first one to `postseizure` samples after the last one, clipped to the array
    start_ind = max(indx[0] - preseizure, 0)
    end_ind = min(indx[-1] + postseizure + 1, len(seizure_array))

    expanded_seizures[start_ind:end_ind] = sz

In [253]:
df = pd.DataFrame({'original': seizure_array, 'expanded': expanded_seizures})
df

|  | original | expanded |
|---|---|---|
| 0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 |
| 2 | 0.0 | 27.0 |
| 3 | 0.0 | 27.0 |
| 4 | 0.0 | 27.0 |
| 5 | 27.0 | 27.0 |
| 6 | 27.0 | 27.0 |
| 7 | 27.0 | 27.0 |
| 8 | 27.0 | 27.0 |
| 9 | 27.0 | 27.0 |

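The expansion loop can also be wrapped in a function that takes the margins in seconds and converts them to samples, following the `int(preseizure * fs)` comments. A minimal sketch: the name `expand_seizures` and the 128 Hz default are hypothetical, and like the loop it labels the whole span from a seizure's first to last annotated sample. Passing `fs=1` makes the margins count samples, which reproduces the `expanded` column above.

```python
import numpy as np

def expand_seizures(labels, preseizure_s, postseizure_s, fs=128):
    """Extend every non-zero label `preseizure_s` seconds before its first sample
    and `postseizure_s` seconds after its last one, clipped to the array.
    Hypothetical helper for illustration."""
    labels = np.asarray(labels)
    pre, post = int(preseizure_s * fs), int(postseizure_s * fs)  # seconds -> samples
    expanded = labels.copy()
    for sz in np.unique(labels[labels != 0]):
        idx = np.flatnonzero(labels == sz)
        start = max(idx[0] - pre, 0)
        end = min(idx[-1] + post + 1, len(labels))
        expanded[start:end] = sz
    return expanded

original = np.array([0, 0, 0, 0, 0, 27, 27, 27, 27, 27], dtype=float)
print(expand_seizures(original, 3, 1, fs=1))  # same as the `expanded` column above
```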

In [254]:
crop_df2 = crop_df2.assign(sz=expanded_seizures)  # assign() returns a new dataframe, so no SettingWithCopyWarning
crop_df2 = crop_df2[crop_df2.sz != 0.]  # keep only rows that belong to an (expanded) seizure
crop_df2


|  | Empatica-EDA | sz |
|---|---|---|
| 2019-02-28 14:07:39.977298453 | 0.491073 | 27.0 |
| 2019-02-28 14:07:39.985111007 | -0.128178 | 27.0 |
| 2019-02-28 14:07:39.992923561 | 0.491073 | 27.0 |
| 2019-02-28 14:07:40.000736116 | -0.038453 | 27.0 |
| 2019-02-28 14:07:40.008548670 | 0.491073 | 27.0 |
| 2019-02-28 14:07:40.016361224 | 0.051302 | 27.0 |
| 2019-02-28 14:07:40.024173778 | 0.491104 | 27.0 |
| 2019-02-28 14:07:40.031986333 | 0.230781 | 27.0 |
| 2019-02-28 14:07:40.039798887 | 0.491104 | 27.0 |


Segment the timestamps and the _sz_ column according to the chosen window and overlap: each window keeps the timestamp and label of its first sample.

In [255]:
## illustrative value in samples; in practice the window is given in seconds and converted with the expression in the comment
window = 2  # int(window * fs)
overlap = int(window * (1 - 0))  # step between window starts for an overlap fraction of 0 (no overlap)

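In the cell above, `overlap` is really the step (hop) between the starts of consecutive windows, derived from an overlap fraction of 0. A small sketch of how the step and the resulting window start positions change with the overlap fraction; `window_starts` is a hypothetical helper that mirrors the `range(0, len(...) - window, overlap)` expression used in the next cell, including its stop value.

```python
def window_starts(n_samples, window, overlap_fraction):
    """Start positions of each window, mirroring range(0, n - window, step).
    Hypothetical helper for illustration."""
    step = int(window * (1 - overlap_fraction))  # hop between consecutive windows
    if step < 1:
        raise ValueError('overlap_fraction must leave a step of at least one sample')
    return list(range(0, n_samples - window, step))

print(window_starts(9, 2, 0))    # [0, 2, 4, 6]: no overlap, step = window
print(window_starts(9, 4, 0.5))  # [0, 2, 4]: 50% overlap, step = window / 2
```

With real data the window would be in samples, e.g. a 2 s window at an assumed 128 Hz is 256 samples.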
In [256]:
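# each window is represented by its first sample (its timestamp and its seizure label);
# the stop value len - window excludes a window that would end exactly on the last sample
time1_ = [crop_df1.index[i] for i in range(0, len(crop_df1)-window, overlap)]
time2_ = [crop_df2.index[i] for i in range(0, len(crop_df2)-window, overlap)]

sz2_ = [crop_df2['sz'].iloc[i] for i in range(0, len(crop_df2)-window, overlap)]  # .iloc: positional access
sz2_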

[27.0, 27.0, 27.0, 27.0]

In [257]:
# every sample timestamp next to the timestamps kept as window starts (NaT elsewhere)
pd.concat([pd.DataFrame(crop_df2.index.values, index=crop_df2.index, columns=['original time']),
           pd.DataFrame(time2_, index=time2_, columns=['features time'])], axis=1)

|  | original time | features time |
|---|---|---|
| 2019-02-28 14:07:39.977298453 | 2019-02-28 14:07:39.977298453 | 2019-02-28 14:07:39.977298453 |
| 2019-02-28 14:07:39.985111007 | 2019-02-28 14:07:39.985111007 | NaT |
| 2019-02-28 14:07:39.992923561 | 2019-02-28 14:07:39.992923561 | 2019-02-28 14:07:39.992923561 |
| 2019-02-28 14:07:40.000736116 | 2019-02-28 14:07:40.000736116 | NaT |
| 2019-02-28 14:07:40.008548670 | 2019-02-28 14:07:40.008548670 | 2019-02-28 14:07:40.008548670 |
| 2019-02-28 14:07:40.016361224 | 2019-02-28 14:07:40.016361224 | NaT |
| 2019-02-28 14:07:40.024173778 | 2019-02-28 14:07:40.024173778 | 2019-02-28 14:07:40.024173778 |
| 2019-02-28 14:07:40.031986333 | 2019-02-28 14:07:40.031986333 | NaT |
| 2019-02-28 14:07:40.039798887 | 2019-02-28 14:07:40.039798887 | NaT |


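Structure of the final feature dataframe: one row per window, indexed by the timestamp of the window's first sample, with the window's label in the `sz` column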

In [260]:
features = pd.read_pickle('features_s_EDA_2s_0.h5')
features

|  | Empatica-EDA_mean | Empatica-EDA_median | Empatica-EDA_var | Empatica-EDA_std | Empatica-EDA_abs_dev | Empatica-EDA_kurtosis | Empatica-EDA_skewness | Empatica-EDA_iqr | Empatica-EDA_rms | Empatica-EDA_maxAmp | ... | Empatica-EDA_amps_mean | Empatica-EDA_amps_median | Empatica-EDA_amps_var | Empatica-EDA_amps_std | Empatica-EDA_amps_abs_dev | Empatica-EDA_amps_kurtosis | Empatica-EDA_amps_skewness | Empatica-EDA_amps_iqr | Empatica-EDA_amps_rms | sz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-02-28 14:07:39.977298453 | 0.242266 | 0.253545 | 0.016317 | 0.12774 | 243.330722 | 0.716919 | 0.274979 | 0.128531 | 0.27388 | 0.388357 | ... | 0.158654 | 0.089185 | 0.0229 | 0.151327 | 1.721878 | 1.705695 | 1.42056 | 0.174629 | 0.219251 | 27.0 |
| 2019-02-28 14:07:39.992923561 | 0.236606 | 0.23939 | 0.019116 | 0.138262 | 263.937687 | 0.503763 | 0.461193 | 0.142597 | 0.274042 | 0.394018 | ... | 0.148718 | 0.152124 | 0.023261 | 0.152514 | 1.668006 | 2.019658 | 1.429625 | 0.188687 | 0.21302 | 27.0 |
| 2019-02-28 14:07:40.008548670 | 0.221635 | 0.209432 | 0.027773 | 0.166652 | 334.146958 | -0.421653 | 0.361765 | 0.198193 | 0.277299 | 0.40039 | ... | 0.235393 | 0.187679 | 0.030246 | 0.173915 | 1.931705 | -0.209262 | 0.692913 | 0.184611 | 0.292671 | 27.0 |
| 2019-02-28 14:07:40.024173778 | 0.202829 | 0.21221 | 0.031793 | 0.178305 | 368.09581 | -0.715261 | 0.037166 | 0.247868 | 0.27006 | 0.419196 | ... | 0.225302 | 0.262848 | 0.024918 | 0.157855 | 1.927647 | -1.112483 | 0.118821 | 0.253459 | 0.275099 | 27.0 |

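The exact definitions behind each feature column are not shown in this notebook. As a rough sketch only, a few of the listed statistics (mean, median, var, std, iqr, rms) plus the `sz` label could be computed per window with NumPy's `sliding_window_view` (NumPy 1.20+), indexing each row by the window's first timestamp and using the same start rule as the segmentation step. The helper name `window_features`, the population variance (`ddof=0`), the percentile-based IQR, the 128 Hz rate and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def window_features(df, col, window, step):
    """Per-window statistics of `df[col]`, indexed by each window's first timestamp.
    Hypothetical helper; the feature definitions are assumptions."""
    x = df[col].to_numpy()
    starts = np.arange(0, len(x) - window, step)  # same start rule as the notebook
    w = sliding_window_view(x, window)[starts]     # shape: (n_windows, window)
    q75, q25 = np.percentile(w, [75, 25], axis=1)
    feats = pd.DataFrame({
        f'{col}_mean': w.mean(axis=1),
        f'{col}_median': np.median(w, axis=1),
        f'{col}_var': w.var(axis=1),  # population variance (ddof=0)
        f'{col}_std': w.std(axis=1),
        f'{col}_iqr': q75 - q25,
        f'{col}_rms': np.sqrt(np.mean(w ** 2, axis=1)),
    }, index=df.index[starts])
    feats['sz'] = df['sz'].to_numpy()[starts]  # label of each window's first sample
    return feats

# synthetic 10 s recording at an assumed 128 Hz, fully inside seizure 27
fs = 128
idx = pd.date_range('2019-02-28 14:07:40', periods=10 * fs, freq=pd.Timedelta(seconds=1 / fs))
rng = np.random.default_rng(0)
toy = pd.DataFrame({'Empatica-EDA': rng.normal(size=len(idx)), 'sz': 27.0}, index=idx)

toy_features = window_features(toy, 'Empatica-EDA', window=2 * fs, step=2 * fs)
print(toy_features.shape)  # (4, 7)
```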