### Generating Feature Set for Base Model  
  
The purpose of this notebook is to generate the feature set for my base model. The features will be band power (delta, theta, alpha, beta, gamma) and sample entropy. These are calculated by channel, so each sub EEG channel will have 6 features. There are 19 EEG channels, so that means each sub EEG will have 114 features total.
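As a quick check on the arithmetic, the 114 column names can be enumerated directly. This is only a sketch: the electrode names are the 19 EEG channels that show up later in this notebook, the `_ent` and `_<band>` suffixes follow the naming used in later cells, and the column order is illustrative.

```python
# The 19 EEG electrodes that appear in the later cells of this notebook
electrodes = ['Fp1', 'F3', 'C3', 'P3', 'F7', 'T3', 'T5', 'O1', 'Fz', 'Cz',
              'Pz', 'Fp2', 'F4', 'C4', 'P4', 'F8', 'T4', 'T6', 'O2']
bands = ['delta', 'theta', 'alpha', 'beta', 'gamma']

# Five band-power columns plus one entropy column per electrode
band_names = ['{}_{}'.format(e, b) for e in electrodes for b in bands]
entropy_names = ['{}_ent'.format(e) for e in electrodes]
feature_names = band_names + entropy_names

print(len(feature_names))  # 19 * (5 + 1) = 114
```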

In [1]:
import numpy as np
import pandas as pd
import fastparquet, pyarrow
import matplotlib.pyplot as plt
import mne
from mne.decoding import Scaler
from sklearn.preprocessing import StandardScaler

In [2]:
df = pd.read_csv('metadata_80.csv')

### Overview of Coding Goals  
  
Main Task: Calculate feature data for enough sub EEGs to get a representative sample to train and evaluate a base model with.  
  
Individual Tasks:  
  
Sort metadata by patient ID low to high  
  
Concatenate sub EEGs by patient ID  
  
Load in patient data and scale  
  
Using get_data function, calculate feature data on specified splits of the patient data  
  
Loop over this pulling patient IDs at random until the target variable distribution is similar to the distribution of the target variable for the overall data  
  
I want everything other than the loop that pulls patient IDs at random to be done by a function with two outputs. The first output should be a pandas DataFrame holding the feature set and the second a pandas DataFrame holding the corresponding target values. The function will be run once to establish these two DataFrames; after that it will be called in a loop over randomly pulled patient IDs, and each patient's outputs will be concatenated with the established feature and target sets.  
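The "pull patients at random until the target distribution is similar" step can be sketched as follows. Everything here is an assumption made for illustration: the helper names, the use of total variation distance as the similarity measure, the `tol` threshold, and the toy metadata standing in for `metadata_80.csv`.

```python
import numpy as np
import pandas as pd

def label_distribution(labels, classes):
    """Relative frequency of each class, reported in a fixed class order."""
    return labels.value_counts(normalize=True).reindex(classes, fill_value=0.0)

def sample_until_representative(meta, tol=0.05, seed=0):
    """Add random patients until the sampled label mix is within `tol`
    (total variation distance) of the overall mix. `tol` is an assumed threshold."""
    rng = np.random.default_rng(seed)
    classes = sorted(meta['expert_consensus'].unique())
    target = label_distribution(meta['expert_consensus'], classes)
    chosen = []
    for patient in rng.permutation(meta['patient_id'].unique()):
        chosen.append(patient)
        sampled = meta[meta['patient_id'].isin(chosen)]
        current = label_distribution(sampled['expert_consensus'], classes)
        distance = 0.5 * (current - target).abs().sum()
        if distance <= tol:
            break
    return chosen, distance

# Toy metadata: 40 patients with 5 labelled sub EEGs each
toy = pd.DataFrame({
    'patient_id': np.repeat(np.arange(40), 5),
    'expert_consensus': np.tile(['Seizure', 'Other', 'LPD', 'GPD'], 50),
})
patients, dist = sample_until_representative(toy, tol=0.05)
```

Because the loop can run through every patient, it always terminates: with all patients selected the distance is zero.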
  
### Sorting Metadata by Patient ID Low to High

In [3]:
patient_list = np.unique(df['patient_id'])

In [4]:
df_sorted = df[df['patient_id'] == patient_list[0]].copy()

In [5]:
for i in range(1, len(patient_list)):
    to_stack = df[df['patient_id'] == patient_list[i]].copy()
    df_sorted = pd.concat([df_sorted, to_stack])

In [6]:
df_sorted = df_sorted.reset_index().drop(columns = 'index')

In [7]:
df_sorted.to_csv('by_patient.csv', index = None)
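The loop above groups rows by ascending patient ID while keeping each patient's rows in their original relative order. A stable sort should give the same result in one call. A minimal sketch on toy metadata (the real frame has more columns):

```python
import pandas as pd

# Toy metadata standing in for the real file
df = pd.DataFrame({'patient_id': [30, 10, 20, 10, 30],
                   'eeg_id': [5, 1, 3, 2, 4]})

# kind='stable' keeps each patient's rows in their original relative order
df_sorted = df.sort_values('patient_id', kind='stable').reset_index(drop=True)
print(df_sorted)
```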

### Sub EEG Limit  
  
I've written code that will concatenate a patient's sub EEGs, scale that data by channel individually, and then load that data as an MNE RawArray. Then that RawArray is appended to a list of RawArrays which will be looped over to calculate feature data.  
  
I pulled 5 patients at random to test the code. The kernel died on the first attempt but not on the second. In the second run, the fourth patient had 169 sub EEGs, which is 1.69 million rows of data, so the patient whose data killed the kernel presumably had more than this, and it was too much for my computer to handle. I'm therefore setting an upper limit and will only select patients with fewer sub EEGs than that limit. Eventually I'll alter my code so that it pulls a specified number of sub EEGs from the excluded patients at random, so that I'm not ignoring them entirely, but the goal right now is to get a feature set and run a base model. Ignoring patients with an unusually large number of sub EEGs shouldn't alter the distribution of the target variable substantially.
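The later plan of pulling a fixed number of sub EEGs at random from the largest patients could look something like the sketch below. `cap_sub_eegs` is a hypothetical helper, and the toy frame with a `sub_id` column only stands in for the metadata.

```python
import pandas as pd

def cap_sub_eegs(meta, cap, seed=0):
    """Keep at most `cap` randomly chosen rows (sub EEGs) per patient."""
    shuffled = meta.sample(frac=1, random_state=seed)      # random row order
    capped = shuffled.groupby('patient_id').head(cap)      # first `cap` rows per patient
    return capped.sort_index()                             # restore original row order

toy = pd.DataFrame({'patient_id': [1] * 5 + [2] * 2, 'sub_id': list(range(7))})
capped = cap_sub_eegs(toy, cap=3)
print(capped.groupby('patient_id').size().to_dict())
```

Patients already under the cap keep all of their rows.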

In [8]:
n_segs = []
for p in patient_list:
    n_segs.append(df_sorted[df_sorted['patient_id'] == p].shape[0])

In [9]:
n_segs = pd.DataFrame(n_segs, columns = ['segments'])
n_segs['patient'] = patient_list
seg_80 = n_segs['segments'].quantile(q = 0.80)
seg_85 = n_segs['segments'].quantile(q = 0.85)

In [10]:
n_segs

Unnamed: 0,segments,patient
0,52,56
1,38,105
2,9,149
3,59,195
4,9,198
...,...,...
1773,4,65377
1774,362,65378
1775,43,65430
1776,30,65442


In [11]:
segments_85 = n_segs[n_segs['segments'] < seg_85].copy()
segments_80 = n_segs[n_segs['segments'] < seg_80].copy()
segments_85 = segments_85.reset_index().drop(columns = 'index')
segments_80 = segments_80.reset_index().drop(columns = 'index')

In [12]:
p85 = segments_85['patient']
p80 = segments_80['patient']
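The per-patient counts and percentile cut-off computed in this section can also be obtained without an explicit loop. A minimal sketch on toy data, where `meta` stands in for `df_sorted`:

```python
import pandas as pd

meta = pd.DataFrame({'patient_id': [1, 1, 1, 2, 3, 3, 4, 4, 4, 4]})

# Number of sub EEGs (rows) per patient
counts = meta.groupby('patient_id').size().rename('segments')

# Keep patients strictly below the 80th percentile of the counts
limit = counts.quantile(0.80)
kept_patients = counts[counts < limit].index.to_numpy()
print(limit, kept_patients)
```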

### Functions to Concatenate EEGs by Patient ID

In [13]:
def within_eeg(data, eeg_id):
    subset = data[data['eeg_id'] == eeg_id].copy()
    subset = subset.reset_index().drop(columns = 'index')
    eeg = pd.read_parquet('train_eegs/{}.parquet'.format(eeg_id), engine = 'pyarrow')
    # One 10,000-row window (50 s at 200 Hz) per labelled sub EEG, each cut at its own
    # offset. The first row is handled the same way, so nothing assumes its offset is 0.
    segments = []
    for offset in subset['eeg_label_offset_seconds']:
        start = int(offset * 200)
        stop = start + 10000
        segments.append(eeg[start:stop])
    return pd.concat(segments, ignore_index = True)
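The index arithmetic inside `within_eeg` converts a label offset in seconds into a row range at 200 Hz, with each sub EEG spanning 10,000 rows (50 seconds). A small sketch of that conversion, with `segment_bounds` as a hypothetical helper name:

```python
SFREQ = 200            # Hz, the sampling rate used throughout this notebook
WINDOW_SECONDS = 50    # 10,000 samples at 200 Hz

def segment_bounds(offset_seconds, sfreq=SFREQ, window_seconds=WINDOW_SECONDS):
    """Return (start, stop) row indices for the window starting at `offset_seconds`."""
    start = int(offset_seconds * sfreq)
    return start, start + int(window_seconds * sfreq)

print(segment_bounds(0.0))   # window starting at the beginning of the recording
print(segment_bounds(6.0))   # window starting 6 seconds in
```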

In [14]:
def within_patient(data, patient):
    patient_data = data[data['patient_id'] == patient].copy()
    patient_data = patient_data.reset_index().drop(columns = 'index')
    unique_eegs = np.unique(patient_data['eeg_id'])
    full = within_eeg(patient_data, unique_eegs[0])
    for i in range(1, len(unique_eegs)):
        eeg = within_eeg(patient_data, unique_eegs[i])
        full = pd.concat([full, eeg])
    return full.reset_index().drop(columns = 'index')

In [15]:
patient1 = within_patient(df_sorted, patient_list[0])

### Scaling with Scikit-Learn  
  
I've written two functions to scale EEG data by channel. I'm going to use these functions and then load in the scaled data and generate epochs with MNE. Eventually, I'd like to use MNE's scaler to do this, but I'm having some difficulty getting the scaler to transform the data in place. My best alternative right now seems to be setting a new variable equal to the transformed data and loading that in with MNE, but I'm a bit confused by the dimensionality of the epoched data. When I load in the raw data at first, it's two dimensions. When I establish epochs, it's three dimensions. If I load that back in, will MNE treat it as already epoched? I need to figure these things out, but I also need to get base model results for tomorrow.

In [16]:
def standardize_channel(eeg, channel):
    scaler = StandardScaler()
    scaled = scaler.fit_transform(eeg[[channel]]).flatten()
    return scaled

def standardize_eeg(eeg):
    channels = eeg.columns
    for channel in channels:
        eeg[channel] = standardize_channel(eeg, channel)
    return eeg
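For comparison, the same per-channel z-scoring can be written with pandas alone. This is a sketch, not a drop-in replacement: `standardize_eeg_vectorized` is a hypothetical name, it returns a new frame instead of modifying its input, and a constant channel would produce NaN here rather than being left unscaled.

```python
import numpy as np
import pandas as pd

def standardize_eeg_vectorized(eeg):
    """Z-score every column at once; the input frame is left untouched."""
    return (eeg - eeg.mean()) / eeg.std(ddof=0)   # ddof=0 matches StandardScaler

rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(5, 3, size=(1000, 3)), columns=['Fp1', 'F3', 'EKG'])
scaled = standardize_eeg_vectorized(toy)
print(scaled.mean().round(6).tolist(), scaled.std(ddof=0).round(6).tolist())
```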

In [17]:
patient1_scaled = standardize_eeg(patient1)

### Loading in First Patient with MNE

In [18]:
info = mne.create_info(
    patient1_scaled.columns.to_list(),
    ch_types=(["eeg"]*(len(patient1_scaled.columns)-1))+['ecg'],
    sfreq=200
)
info.set_montage("standard_1020")

Measurement date,Unknown
Experimenter,Unknown
Participant,Unknown

Digitized points,22 points
Good channels,"19 EEG, 1 ECG"
Bad channels,
EOG channels,Not available
ECG channels,EKG

Sampling frequency,200.00 Hz
Highpass,0.00 Hz
Lowpass,100.00 Hz


In [19]:
raw = mne.io.RawArray(
    patient1_scaled.to_numpy().T,
    info
)

Creating RawArray with float64 data, n_channels=20, n_times=520000
    Range : 0 ... 519999 =      0.000 ...  2599.995 secs
Ready.
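On the dimensionality question raised in the scaling section: continuous data is laid out as (n_channels, n_times), while epoched data is (n_epochs, n_channels, n_times), and MNE's `EpochsArray` constructor expects the 3D layout. The NumPy-only sketch below shows how the two layouts relate when every epoch is a fixed 10,000-sample block. The array is synthetic and the sizes are chosen for illustration.

```python
import numpy as np

n_channels, segment_length, n_segments = 20, 10000, 3

# Synthetic continuous recording shaped like the output of raw.get_data()
continuous = np.arange(n_channels * segment_length * n_segments, dtype=float)
continuous = continuous.reshape(n_channels, segment_length * n_segments)

# Split the time axis into segments, then move the segment axis to the front
epoched = continuous.reshape(n_channels, n_segments, segment_length).transpose(1, 0, 2)
print(epoched.shape)  # (3, 20, 10000)
```

Segment `s` of channel `c` in `epoched` is the same data as `continuous[c, s * 10000:(s + 1) * 10000]`.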


### Data Loading, Scaling, and Feature Extraction Overview  
  
- Select Patient at Random  
- Concatenate Sub EEGs  
- Scale Channels Individually  
- Load Scaled Data with MNE  
- Calculate Feature Data for Each Sub EEG  
- Return Feature Data as DataFrame and Target Values as DataFrame  
- Loop Over Randomly Selected Patients, Repeating the Concatenate-Through-Return Steps for Each  
- Concatenate Each New Patient's DataFrames with the DataFrames Already Built

In [20]:
from mne_features.univariate import compute_samp_entropy

In [21]:
help(compute_samp_entropy)

Help on function compute_samp_entropy in module mne_features.univariate:

compute_samp_entropy(data, emb=2, metric='chebyshev')
    Sample Entropy (SampEn, per channel).
    
    Parameters
    ----------
    data : ndarray, shape (n_channels, n_times)
    
    emb : int (default: 2)
        Embedding dimension.
    
    metric : str (default: chebyshev)
        Name of the metric function used with KDTree. The list of available
        metric functions is given by: `KDTree.valid_metrics`.
    
    Returns
    -------
    output : ndarray, shape (n_channels,)
    
    Notes
    -----
    Alias of the feature function: **samp_entropy**. See [1]_.
    
    References
    ----------
    .. [1] Richman, J. S. et al. (2000). Physiological time-series analysis
           using approximate entropy and sample entropy. American Journal of
           Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.



In [22]:
entropies = compute_samp_entropy(raw.get_data(start = 0, stop = 10000, picks = 'eeg'))

In [23]:
channels = raw.ch_names

In [24]:
electrodes = []
for i in range(len(channels)):
    if channels[i] != 'EKG':
        electrodes.append(channels[i])

In [25]:
entropy_df = pd.DataFrame(entropies, index = electrodes, columns = ['Entropy']).transpose()

In [26]:
entropy_df

Unnamed: 0,Fp1,F3,C3,P3,F7,T3,T5,O1,Fz,Cz,Pz,Fp2,F4,C4,P4,F8,T4,T6,O2
Entropy,0.17802,0.720601,0.712341,0.960745,0.581941,0.590132,0.727185,0.754397,0.595437,0.966532,0.694161,0.408214,0.788874,0.690479,0.714794,0.590244,0.645452,0.726696,0.739333


### To Do  
 
The function below calculates the sample entropy for each channel of an EEG and reads those entropies into a pandas df. Next I need to write code that will loop over the rest of the sub EEGs in a patient's data and calculate the entropies for those sub EEGs. As those entropies are calculated, I need them concatenated with the other entropies so that the result is one entropy df with the entropies for each channel for each sub EEG.  
  
Then I need code that does this with band power features.  
  
Lastly, I need to gather this feature data together and split it into training and testing sets to run a base model on.

In [27]:
def entropies(data):
    ent_vals = compute_samp_entropy(data)
    return pd.DataFrame(ent_vals, 
                        index = ['{}_ent'.format(electrodes[i]) for i in range(len(electrodes))]).transpose()

In [28]:
entropy_df = entropies(raw.get_data(start = 0, stop = 10000, picks = 'eeg'))

In [29]:
entropy_df

Unnamed: 0,Fp1_ent,F3_ent,C3_ent,P3_ent,F7_ent,T3_ent,T5_ent,O1_ent,Fz_ent,Cz_ent,Pz_ent,Fp2_ent,F4_ent,C4_ent,P4_ent,F8_ent,T4_ent,T6_ent,O2_ent
0,0.17802,0.720601,0.712341,0.960745,0.581941,0.590132,0.727185,0.754397,0.595437,0.966532,0.694161,0.408214,0.788874,0.690479,0.714794,0.590244,0.645452,0.726696,0.739333


### Raw Array List

In [30]:
raw_list = []
def mne_patient(data, patient):
    patient_eegs = within_patient(data, patient)
    scaled_eegs = standardize_eeg(patient_eegs)
    info = mne.create_info(
        scaled_eegs.columns.to_list(),
        ch_types=(["eeg"]*(len(scaled_eegs.columns)-1))+['ecg'],
        sfreq=200
    )
    info.set_montage("standard_1020")
    raw = mne.io.RawArray(
        scaled_eegs.to_numpy().T,
        info
    )
    return raw

random_patients = np.random.choice(p80, size = 5, replace = False)

for p in random_patients:
    raw_list.append(mne_patient(df_sorted, p))

Creating RawArray with float64 data, n_channels=20, n_times=220000
    Range : 0 ... 219999 =      0.000 ...  1099.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=10000
    Range : 0 ... 9999 =      0.000 ...    49.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=40000
    Range : 0 ... 39999 =      0.000 ...   199.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=210000
    Range : 0 ... 209999 =      0.000 ...  1049.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=80000
    Range : 0 ... 79999 =      0.000 ...   399.995 secs
Ready.


In [31]:
segment_length = 10000    # length of each sub EEG = 10k 

all_entropy_df = pd.DataFrame()

# raw_list = Raw objects for each patient (i.e. list of patients as RawArray objects) 
# 
# example: raw_list = [patient_0_RawArray, patient_1_RawArray, ..., patient_N_RawArray]
for patient_idx, raw in enumerate(raw_list):
    total_samples = raw.n_times                    # n_times = total sample count 
    n_segments = total_samples // segment_length   # integer division to get full segments only 
    
    # loop over each segment for the current patient 
    for seg_idx in range(n_segments):
        start_sample = seg_idx * segment_length     # e.g. seg_idx = 0 -> 0, seg_idx = 1 -> 10000 
        stop_sample = start_sample + segment_length # e.g. seg_idx = 0 -> 10000 (exclusive)  
        
        # NOTE: to ensure this code is as computationally efficient as possible, 
        # make sure to use raw.get_data() to only extract the necessary parts of 
        # the patient's data at this iteration. 
        segment_data = raw.get_data(start = start_sample, stop = stop_sample, picks = 'eeg')
        
        entropy_features_df = entropies(segment_data)
        
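        # Add some metadata to entropy dataframe 
        # (patient_id here is the patient's position in raw_list, not the ID from the metadata)
        entropy_features_df['patient_id'] = patient_idx 
        entropy_features_df['segment_idx'] = seg_idx 
        
        # append this segment's row to the running feature table 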
        all_entropy_df = pd.concat([all_entropy_df, entropy_features_df], ignore_index = True)
        
# confirm shape of data 
all_entropy_df.shape

(56, 21)
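Calling `pd.concat` once per segment copies the growing table on every iteration. A common alternative is to collect the one-row frames in a list and concatenate once at the end. The sketch below uses synthetic arrays and a stand-in `fake_features` function in place of `entropies`, so the numbers mean nothing; only the loop structure matters.

```python
import numpy as np
import pandas as pd

def fake_features(segment):
    """Stand-in for entropies(): one row with one value per channel."""
    cols = ['ch{}_feat'.format(i) for i in range(segment.shape[0])]
    return pd.DataFrame([segment.std(axis=1)], columns=cols)

segment_length = 100
# Two synthetic "patients" with 3 channels and 250 / 350 samples
recordings = [np.random.default_rng(i).normal(size=(3, 250 + 100 * i)) for i in range(2)]

rows = []
for patient_idx, data in enumerate(recordings):
    for seg_idx in range(data.shape[1] // segment_length):   # full segments only
        seg = data[:, seg_idx * segment_length:(seg_idx + 1) * segment_length]
        row = fake_features(seg)
        row['patient_id'] = patient_idx
        row['segment_idx'] = seg_idx
        rows.append(row)

all_features = pd.concat(rows, ignore_index=True)   # one concat instead of one per segment
print(all_features.shape)
```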

In [32]:
all_entropy_df

Unnamed: 0,Fp1_ent,F3_ent,C3_ent,P3_ent,F7_ent,T3_ent,T5_ent,O1_ent,Fz_ent,Cz_ent,...,Fp2_ent,F4_ent,C4_ent,P4_ent,F8_ent,T4_ent,T6_ent,O2_ent,patient_id,segment_idx
0,0.322159,0.574096,0.505417,0.466664,0.339096,0.41008,0.557342,0.432998,0.577314,0.512105,...,0.317492,0.496561,0.524223,0.449031,0.503805,0.439627,0.492933,0.463704,0,0
1,0.635825,1.02935,0.984118,0.967424,0.841308,1.180598,1.071399,1.052801,0.988318,0.934891,...,0.64112,1.012501,1.078486,1.109073,1.08246,1.11054,1.110861,1.13469,0,1
2,0.54167,0.9811,0.948508,0.945381,0.697895,1.123398,1.072462,1.041799,0.947347,0.908774,...,0.545767,0.983262,1.044389,1.092711,1.045189,1.077772,1.099917,1.118449,0,2
3,0.343638,0.791828,0.85781,0.92611,0.506485,1.113662,1.071187,0.995813,0.750125,0.697293,...,0.348159,0.828127,0.868198,0.951077,0.858004,0.924422,1.007323,1.027495,0,3
4,0.278006,0.726103,0.790374,0.870756,0.444595,1.059577,1.048448,0.951224,0.673407,0.608701,...,0.281814,0.761886,0.779521,0.87362,0.769365,0.832562,0.942819,0.967038,0,4
5,0.259211,0.701748,0.778225,0.858347,0.427345,1.03273,1.043421,0.953998,0.64561,0.593847,...,0.26353,0.735866,0.759079,0.851935,0.748036,0.816206,0.926854,0.95121,0,5
6,0.222502,0.63808,0.718437,0.794094,0.388594,1.079916,0.996838,0.962586,0.586451,0.533484,...,0.222333,0.672846,0.681266,0.760643,0.666524,0.79953,0.865476,0.875372,0,6
7,0.322716,0.731156,0.759652,0.763035,0.550862,1.153079,0.906015,0.89292,0.675321,0.61616,...,0.308378,0.719938,0.764429,0.847896,0.740774,0.861481,0.877064,0.886682,0,7
8,0.576533,0.74873,0.717122,0.771789,0.77268,0.98042,0.926144,0.823641,0.755644,0.846204,...,0.518594,0.762555,0.877906,0.943342,0.769206,0.935407,0.854804,0.968286,0,8
9,0.476278,0.671077,0.632878,0.688296,0.663375,0.905741,0.84224,0.77082,0.666417,0.727904,...,0.440712,0.658918,0.772305,0.855493,0.675859,0.834264,0.778136,0.875758,0,9


In [33]:
y_vals = pd.DataFrame()
for p in random_patients:
    target_vals = df_sorted[df_sorted['patient_id'] == p]['expert_consensus']
    y_vals = pd.concat([y_vals, target_vals], ignore_index = True)

In [34]:
all_entropy_df['activity'] = y_vals
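One thing worth checking in the assignment above is that the rows of `y_vals` are in the same order as the feature rows. `within_patient` walks a patient's recordings in ascending `eeg_id` order (via `np.unique`), while the labels are taken in metadata row order, so the two only agree if each patient's metadata rows are already grouped by ascending `eeg_id`. A minimal sketch, on toy metadata, of building the labels in concatenation order (`labels_in_segment_order` is a hypothetical helper):

```python
import pandas as pd

# Toy metadata where the file order is not ascending eeg_id
meta = pd.DataFrame({
    'patient_id': [7, 7, 7, 7],
    'eeg_id': [900, 900, 100, 100],
    'expert_consensus': ['Seizure', 'Other', 'LPD', 'GPD'],
})

def labels_in_segment_order(meta, patient):
    rows = meta[meta['patient_id'] == patient]
    # Stable sort: eeg_ids ascending, rows within one eeg_id keep file order
    ordered = rows.sort_values('eeg_id', kind='stable')
    return ordered['expert_consensus'].reset_index(drop=True)

labels = labels_in_segment_order(meta, 7)
print(labels.tolist())
```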

In [35]:
all_entropy_df

Unnamed: 0,Fp1_ent,F3_ent,C3_ent,P3_ent,F7_ent,T3_ent,T5_ent,O1_ent,Fz_ent,Cz_ent,...,F4_ent,C4_ent,P4_ent,F8_ent,T4_ent,T6_ent,O2_ent,patient_id,segment_idx,activity
0,0.322159,0.574096,0.505417,0.466664,0.339096,0.41008,0.557342,0.432998,0.577314,0.512105,...,0.496561,0.524223,0.449031,0.503805,0.439627,0.492933,0.463704,0,0,Other
1,0.635825,1.02935,0.984118,0.967424,0.841308,1.180598,1.071399,1.052801,0.988318,0.934891,...,1.012501,1.078486,1.109073,1.08246,1.11054,1.110861,1.13469,0,1,Seizure
2,0.54167,0.9811,0.948508,0.945381,0.697895,1.123398,1.072462,1.041799,0.947347,0.908774,...,0.983262,1.044389,1.092711,1.045189,1.077772,1.099917,1.118449,0,2,Seizure
3,0.343638,0.791828,0.85781,0.92611,0.506485,1.113662,1.071187,0.995813,0.750125,0.697293,...,0.828127,0.868198,0.951077,0.858004,0.924422,1.007323,1.027495,0,3,Seizure
4,0.278006,0.726103,0.790374,0.870756,0.444595,1.059577,1.048448,0.951224,0.673407,0.608701,...,0.761886,0.779521,0.87362,0.769365,0.832562,0.942819,0.967038,0,4,Seizure
5,0.259211,0.701748,0.778225,0.858347,0.427345,1.03273,1.043421,0.953998,0.64561,0.593847,...,0.735866,0.759079,0.851935,0.748036,0.816206,0.926854,0.95121,0,5,Seizure
6,0.222502,0.63808,0.718437,0.794094,0.388594,1.079916,0.996838,0.962586,0.586451,0.533484,...,0.672846,0.681266,0.760643,0.666524,0.79953,0.865476,0.875372,0,6,Seizure
7,0.322716,0.731156,0.759652,0.763035,0.550862,1.153079,0.906015,0.89292,0.675321,0.61616,...,0.719938,0.764429,0.847896,0.740774,0.861481,0.877064,0.886682,0,7,Seizure
8,0.576533,0.74873,0.717122,0.771789,0.77268,0.98042,0.926144,0.823641,0.755644,0.846204,...,0.762555,0.877906,0.943342,0.769206,0.935407,0.854804,0.968286,0,8,Seizure
9,0.476278,0.671077,0.632878,0.688296,0.663375,0.905741,0.84224,0.77082,0.666417,0.727904,...,0.658918,0.772305,0.855493,0.675859,0.834264,0.778136,0.875758,0,9,Seizure


### Band Power  
  
The mne-features package has a function to calculate frequency band power, so I'll use it the same way as above to calculate those features. For now, the plan is to calculate bands individually. In other words, a function will be written that calculates theta. Then a function that calculates delta. Then alpha, beta, and gamma. These functions will be similar to the entropy function. Their outputs will be placed in their own dataframes, and once all of them have been calculated they will be concatenated with all_entropy_df.

In [36]:
from mne_features.univariate import compute_pow_freq_bands

In [37]:
freq_bands = ['delta', 'theta', 'alpha', 'beta', 'gamma']
band_features = []
for e in electrodes:
    for b in freq_bands:
        feat_name = '{}_{}'.format(e, b)
        band_features.append(feat_name)
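The column names built above only line up with the values if the band-power vector is flat and ordered channel by channel (all bands of the first electrode, then all bands of the second, and so on). That ordering is an assumption about `compute_pow_freq_bands` worth confirming with `help(compute_pow_freq_bands)`. The sketch below illustrates the assumed layout with a made-up array and a shortened electrode list.

```python
import numpy as np

electrodes = ['Fp1', 'F3', 'C3']              # shortened list for illustration
freq_bands = ['delta', 'theta', 'alpha', 'beta', 'gamma']

# Made-up per-channel band powers, shape (n_channels, n_bands)
power = np.arange(len(electrodes) * len(freq_bands)).reshape(len(electrodes), len(freq_bands))
flat = power.ravel()                          # channel-major: all bands of Fp1, then F3, ...

# Same nesting as the loop above: electrodes outside, bands inside
names = ['{}_{}'.format(e, b) for e in electrodes for b in freq_bands]
lookup = dict(zip(names, flat))
print(lookup['F3_delta'])                     # first band of the second electrode
```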

In [38]:
def activity_df(activity):
    return pd.DataFrame(df_sorted[df_sorted['expert_consensus'] == activity]).reset_index().drop(columns = 'index')

other_df = activity_df('Other')
seizure_df = activity_df('Seizure')
lpd_df = activity_df('LPD')
gpd_df = activity_df('GPD')
lrda_df = activity_df('LRDA')
grda_df = activity_df('GRDA')

In [39]:
def activity_patients(data):
    return np.unique(data['patient_id'])

other_ids = activity_patients(other_df)
seizure_ids = activity_patients(seizure_df)
lpd_ids = activity_patients(lpd_df)
gpd_ids = activity_patients(gpd_df)
lrda_ids = activity_patients(lrda_df)
grda_ids = activity_patients(grda_df)

In [40]:
random_other = np.random.choice(other_ids, size = 100, replace = False)
random_seizure = np.random.choice(seizure_ids, size = 100, replace = False)
random_grda = np.random.choice(grda_ids, size = 100, replace = False)
random_lrda = np.random.choice(lrda_ids, size = 100, replace = False)
random_gpd = np.random.choice(gpd_ids, size = 100, replace = False)
random_lpd = np.random.choice(lpd_ids, size = 100, replace = False)

In [41]:
def mne_patient(data, patient):
    patient_eegs = within_patient(data, patient)
    scaled_eegs = standardize_eeg(patient_eegs)
    info = mne.create_info(
        scaled_eegs.columns.to_list(),
        ch_types=(["eeg"]*(len(scaled_eegs.columns)-1))+['ecg'],
        sfreq=200
    )
    info.set_montage("standard_1020")
    raw = mne.io.RawArray(
        scaled_eegs.to_numpy().T,
        info
    )
    return raw

def get_raw_list(data, patients):
    raw_list = []
    for p in patients:
        raw_list.append(mne_patient(data, p))
    return raw_list

In [42]:
def freq_band_features(data):
    band_vals = compute_pow_freq_bands(200, data)
    return pd.DataFrame(band_vals, index = band_features).transpose()

In [43]:
def band_feature_df(raw_list):
    band_df = pd.DataFrame()
    segment_length = 10000
    for patient_idx, raw in enumerate(raw_list):
        total_samples = raw.n_times                     
        n_segments = total_samples // segment_length   

        for seg_idx in range(n_segments):
            start_sample = seg_idx * segment_length     # e.g. seg_idx = 0 -> 0, seg_idx = 1 -> 10000 
            stop_sample = start_sample + segment_length # e.g. seg_idx = 0 -> 10000 (exclusive)  

            segment_data = raw.get_data(start = start_sample, stop = stop_sample, picks = 'eeg')

            band_features_df = freq_band_features(segment_data)

            band_features_df['patient_id'] = patient_idx 
            band_features_df['segment_idx'] = seg_idx 

            band_df = pd.concat([band_df, band_features_df], ignore_index = True)
    
    return band_df

### Brief Note  
  
The code appears to be working. I need to adjust the band_feature_df function so that the target variable is included as a column of the feature set. I may also need to redo the activity datasets, because I built them from the full metadata rather than from the patients under the sub EEG limit. This may result in a patient being pulled at random that kills the kernel. If that happens, the datasets will need to be adjusted so that those patients aren't included.

In [44]:
def bands_activity(i_vals, ids, data):
    activity_band_df = pd.DataFrame()
    for i in i_vals:
        patient_list = [ids[j] for j in range(i, i + 25)]
        raw_list = get_raw_list(data, patient_list)   # load only this batch of 25, not every ID
        band_subset = band_feature_df(raw_list)
        activity_band_df = pd.concat([activity_band_df, band_subset], ignore_index = True)
    return activity_band_df
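The batching in `bands_activity` can also be written so that the last batch is allowed to be short, which avoids an `IndexError` when the number of IDs is not a multiple of the batch size. A minimal sketch with a hypothetical `batches` helper:

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

ids = list(range(60))
sizes = [len(b) for b in batches(ids, 25)]
print(sizes)  # [25, 25, 10]
```

Each batch can then be passed to `get_raw_list` in place of the index-based `patient_list`.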

In [None]:
i_vals = [i * 25 for i in range(4)]
other_band_df = bands_activity(i_vals, random_other, other_df)

Creating RawArray with float64 data, n_channels=20, n_times=940000
    Range : 0 ... 939999 =      0.000 ...  4699.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=80000
    Range : 0 ... 79999 =      0.000 ...   399.995 secs
Ready.
...

In [45]:
other_band_df

Unnamed: 0,Fp1_delta,Fp1_theta,Fp1_alpha,Fp1_beta,Fp1_gamma,F3_delta,F3_theta,F3_alpha,F3_beta,F3_gamma,...,T6_alpha,T6_beta,T6_gamma,O2_delta,O2_theta,O2_alpha,O2_beta,O2_gamma,patient_id,segment_idx
0,0.873875,0.024793,0.009951,0.008037,0.006614,0.797807,0.048296,0.014525,0.010866,0.073873,...,0.013854,0.011357,0.073922,0.805501,0.079954,0.010912,0.012005,0.063105,0,0
1,0.893718,0.013167,0.020272,0.020496,0.004838,0.724514,0.024545,0.016274,0.017523,0.186106,...,0.019562,0.019056,0.113812,0.784289,0.046815,0.015118,0.018854,0.092235,0,1
2,0.831039,0.028669,0.022758,0.022194,0.005297,0.727416,0.027063,0.020194,0.017829,0.134972,...,0.014285,0.015780,0.087268,0.813517,0.036729,0.013620,0.015903,0.071548,0,2
3,0.835627,0.045243,0.017670,0.011373,0.007576,0.720713,0.049159,0.015016,0.012369,0.134079,...,0.016389,0.012264,0.095388,0.828913,0.028960,0.012837,0.012608,0.081821,0,3
4,0.814237,0.044268,0.022961,0.016622,0.004930,0.769363,0.035808,0.020817,0.010391,0.080948,...,0.026290,0.023046,0.087322,0.808283,0.046559,0.022635,0.020682,0.071968,0,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1952,0.532840,0.037277,0.038241,0.130056,0.191200,0.672104,0.100924,0.049316,0.099561,0.032166,...,0.031351,0.047890,0.082293,0.879723,0.019982,0.019502,0.010378,0.017501,22,0
1953,0.772732,0.044988,0.026833,0.032355,0.050461,0.619722,0.086896,0.067787,0.094173,0.092466,...,0.015287,0.162635,0.602849,0.258310,0.052804,0.041447,0.130090,0.499418,23,0
1954,0.776403,0.037602,0.025436,0.051686,0.047832,0.667907,0.065614,0.056118,0.100016,0.076022,...,0.018617,0.144183,0.589753,0.308150,0.059612,0.039597,0.136117,0.440514,23,1
1955,0.753272,0.043242,0.033843,0.040734,0.046945,0.579593,0.087471,0.085798,0.119413,0.090069,...,0.015121,0.146290,0.599183,0.287962,0.050806,0.039303,0.122679,0.483276,23,2


In [46]:
y_vals = pd.DataFrame()
for p in other_ids:
    target_vals = other_df[other_df['patient_id'] == p]['expert_consensus']
    y_vals = pd.concat([y_vals, target_vals], ignore_index = True)

In [47]:
y_vals.shape

(15061, 1)

In [62]:
i_vals = [i * 25 for i in range(6)]
other_band_df = pd.DataFrame()
for i in i_vals:
    patient_list = [other_ids[j] for j in range(i, i + 25)]
    raw_list = get_raw_list(other_df, patient_list)
    band_subset = band_feature_df(raw_list)
    other_band_df = pd.concat([other_band_df, band_subset], ignore_index = True)

Creating RawArray with float64 data, n_channels=20, n_times=520000
    Range : 0 ... 519999 =      0.000 ...  2599.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=220000
    Range : 0 ... 219999 =      0.000 ...  1099.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=250000
    Range : 0 ... 249999 =      0.000 ...  1249.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=90000
    Range : 0 ... 89999 =      0.000 ...   449.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=60000
    Range : 0 ... 59999 =      0.000 ...   299.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=10000
    Range : 0 ... 9999 =      0.000 ...    49.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=30000
    Range : 0 ... 29999 =      0.000 ...   149.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=900000
    Range : 0 ... 899999 

    Range : 0 ... 349999 =      0.000 ...  1749.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=400000
    Range : 0 ... 399999 =      0.000 ...  1999.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=530000
    Range : 0 ... 529999 =      0.000 ...  2649.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=10000
    Range : 0 ... 9999 =      0.000 ...    49.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=430000
    Range : 0 ... 429999 =      0.000 ...  2149.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=20000
    Range : 0 ... 19999 =      0.000 ...    99.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=20000
    Range : 0 ... 19999 =      0.000 ...    99.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=40000
    Range : 0 ... 39999 =      0.000 ...   199.995 secs
Ready.
Creating RawArray with float

Ready.
Creating RawArray with float64 data, n_channels=20, n_times=60000
    Range : 0 ... 59999 =      0.000 ...   299.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=30000
    Range : 0 ... 29999 =      0.000 ...   149.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=40000
    Range : 0 ... 39999 =      0.000 ...   199.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=30000
    Range : 0 ... 29999 =      0.000 ...   149.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=90000
    Range : 0 ... 89999 =      0.000 ...   449.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=50000
    Range : 0 ... 49999 =      0.000 ...   249.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=60000
    Range : 0 ... 59999 =      0.000 ...   299.995 secs
Ready.
Creating RawArray with float64 data, n_channels=20, n_times=40000
    Range : 0 ... 39999 
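
Each log entry above reports both a sample count and the time of the last sample, which is enough to back out the sampling rate and the number of full sub EEGs a recording can supply. The snippet below is only a quick sanity check on three entries copied from the log. The names `logged` and `summary` are made up for illustration, and `segment_length` is assumed to be the 10,000-sample sub EEG length used elsewhere in this notebook.

```python
# (n_times, time of last sample in seconds), copied from the RawArray log lines above
logged = [(520000, 2599.995), (220000, 1099.995), (10000, 49.995)]

segment_length = 10000  # samples per sub EEG (assumed, see lead-in)

summary = []
for n_times, last_time in logged:
    # The last sample has index n_times - 1, so its timestamp is (n_times - 1) / sfreq
    sfreq = (n_times - 1) / last_time
    # Integer division keeps only complete segments
    n_segments = n_times // segment_length
    summary.append((n_times, round(sfreq), n_segments))

for n_times, sfreq, n_segments in summary:
    print(f"{n_times} samples at ~{sfreq} Hz -> {n_segments} full segments")
```

If the rate came out different for some recordings, a fixed `segment_length` in samples would no longer correspond to a fixed duration, so this is worth checking before segmenting.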

In [63]:
other_band_df

| | Fp1_delta | Fp1_theta | Fp1_alpha | Fp1_beta | Fp1_gamma | F3_delta | F3_theta | F3_alpha | F3_beta | F3_gamma | ... | T6_alpha | T6_beta | T6_gamma | O2_delta | O2_theta | O2_alpha | O2_beta | O2_gamma | patient_id | segment_idx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.873875 | 0.024793 | 0.009951 | 0.008037 | 0.006614 | 0.797807 | 0.048296 | 0.014525 | 0.010866 | 0.073873 | ... | 0.013854 | 0.011357 | 0.073922 | 0.805501 | 0.079954 | 0.010912 | 0.012005 | 0.063105 | 0 | 0 |
| 1 | 0.893718 | 0.013167 | 0.020272 | 0.020496 | 0.004838 | 0.724514 | 0.024545 | 0.016274 | 0.017523 | 0.186106 | ... | 0.019562 | 0.019056 | 0.113812 | 0.784289 | 0.046815 | 0.015118 | 0.018854 | 0.092235 | 0 | 1 |
| 2 | 0.831039 | 0.028669 | 0.022758 | 0.022194 | 0.005297 | 0.727416 | 0.027063 | 0.020194 | 0.017829 | 0.134972 | ... | 0.014285 | 0.015780 | 0.087268 | 0.813517 | 0.036729 | 0.013620 | 0.015903 | 0.071548 | 0 | 2 |
| 3 | 0.835627 | 0.045243 | 0.017670 | 0.011373 | 0.007576 | 0.720713 | 0.049159 | 0.015016 | 0.012369 | 0.134079 | ... | 0.016389 | 0.012264 | 0.095388 | 0.828913 | 0.028960 | 0.012837 | 0.012608 | 0.081821 | 0 | 3 |
| 4 | 0.814237 | 0.044268 | 0.022961 | 0.016622 | 0.004930 | 0.769363 | 0.035808 | 0.020817 | 0.010391 | 0.080948 | ... | 0.026290 | 0.023046 | 0.087322 | 0.808283 | 0.046559 | 0.022635 | 0.020682 | 0.071968 | 0 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1952 | 0.532840 | 0.037277 | 0.038241 | 0.130056 | 0.191200 | 0.672104 | 0.100924 | 0.049316 | 0.099561 | 0.032166 | ... | 0.031351 | 0.047890 | 0.082293 | 0.879723 | 0.019982 | 0.019502 | 0.010378 | 0.017501 | 22 | 0 |
| 1953 | 0.772732 | 0.044988 | 0.026833 | 0.032355 | 0.050461 | 0.619722 | 0.086896 | 0.067787 | 0.094173 | 0.092466 | ... | 0.015287 | 0.162635 | 0.602849 | 0.258310 | 0.052804 | 0.041447 | 0.130090 | 0.499418 | 23 | 0 |
| 1954 | 0.776403 | 0.037602 | 0.025436 | 0.051686 | 0.047832 | 0.667907 | 0.065614 | 0.056118 | 0.100016 | 0.076022 | ... | 0.018617 | 0.144183 | 0.589753 | 0.308150 | 0.059612 | 0.039597 | 0.136117 | 0.440514 | 23 | 1 |
| 1955 | 0.753272 | 0.043242 | 0.033843 | 0.040734 | 0.046945 | 0.579593 | 0.087471 | 0.085798 | 0.119413 | 0.090069 | ... | 0.015121 | 0.146290 | 0.599183 | 0.287962 | 0.050806 | 0.039303 | 0.122679 | 0.483276 | 23 | 2 |
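
Two quick checks help when reading a feature table like the one above: how many segments each patient contributed, and whether the five band values for a channel add up to something sensible. The sketch below runs both on a toy frame holding only the `Fp1` columns, with values copied from rows 0, 1 and 1952. The names `toy_df`, `segments_per_patient` and `band_sums` are illustrative, and reading the values as fractions of total power is my assumption based on them all lying between 0 and 1.

```python
import pandas as pd

bands = ["delta", "theta", "alpha", "beta", "gamma"]

# Toy stand-in for other_band_df: Fp1 columns only, rows 0, 1 and 1952 from the table above
toy_df = pd.DataFrame({
    "Fp1_delta": [0.873875, 0.893718, 0.532840],
    "Fp1_theta": [0.024793, 0.013167, 0.037277],
    "Fp1_alpha": [0.009951, 0.020272, 0.038241],
    "Fp1_beta": [0.008037, 0.020496, 0.130056],
    "Fp1_gamma": [0.006614, 0.004838, 0.191200],
    "patient_id": [0, 0, 22],
    "segment_idx": [0, 1, 0],
})

# Number of segments (rows) contributed by each patient
segments_per_patient = toy_df.groupby("patient_id").size()

# Channel names are whatever precedes "_delta" in the column names
channels = [c[: -len("_delta")] for c in toy_df.columns if c.endswith("_delta")]

# Sum of the five band values per channel, one column per channel
band_sums = pd.DataFrame({
    ch: toy_df[[f"{ch}_{b}" for b in bands]].sum(axis=1) for ch in channels
})

print(segments_per_patient)
print(band_sums)
```

On the real frame the same code should work unchanged by swapping `toy_df` for `other_band_df`, provided the columns follow the `<channel>_<band>` naming shown in the table.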


In [40]:
segment_length = 10000    # samples per sub EEG (50 s at the 200 Hz rate shown in the RawArray log)

all_band_df = pd.DataFrame()

# raw_list holds one RawArray object per patient, e.g.
# raw_list = [patient_0_RawArray, patient_1_RawArray, ..., patient_N_RawArray]
for patient_idx, raw in enumerate(raw_list):
    total_samples = raw.n_times                    # total sample count for this patient
    n_segments = total_samples // segment_length   # integer division keeps full segments only

    # loop over each segment for the current patient
    for seg_idx in range(n_segments):
        start_sample = seg_idx * segment_length      # e.g. seg_idx = 0 -> 0
        stop_sample = start_sample + segment_length  # e.g. seg_idx = 0 -> 10000 (exclusive)

        # Passing start/stop to raw.get_data() extracts only this segment's samples
        # rather than the patient's whole recording on every iteration.
        segment_data = raw.get_data(start = start_sample, stop = stop_sample, picks = 'eeg')

        band_features_df = freq_band_features(segment_data)

        # Tag the row with its origin. Note that patient_idx is the position in
        # raw_list, not the patient_id from the metadata.
        band_features_df['patient_id'] = patient_idx
        band_features_df['segment_idx'] = seg_idx

        # Append this segment's features to the running feature set
        all_band_df = pd.concat([all_band_df, band_features_df], ignore_index = True)

# confirm shape of data
all_band_df.shape

(50, 97)
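
Calling `pd.concat` inside the inner loop copies the growing frame on every iteration, which gets slow once there are thousands of segments. One alternative is to collect the per-segment frames in a list and concatenate once at the end. The sketch below shows that pattern on synthetic NumPy arrays so it runs without the EEG data. `toy_features` is a hypothetical stand-in for `freq_band_features` (I am assuming that function returns a one-row DataFrame per segment, which matches the shapes above), and `recordings`, `frames` and `toy_band_df` are illustrative names.

```python
import numpy as np
import pandas as pd

segment_length = 10000  # samples per sub EEG


def toy_features(segment):
    """Hypothetical stand-in for freq_band_features: one row with a mean per channel."""
    columns = [f"ch{i}_mean" for i in range(segment.shape[0])]
    return pd.DataFrame([segment.mean(axis=1)], columns=columns)


rng = np.random.default_rng(0)
# Two fake "patients" as (n_channels, n_samples) arrays.
# The second one has a partial trailing segment that should be dropped.
recordings = [rng.normal(size=(3, 30000)), rng.normal(size=(3, 25000))]

frames = []  # per-segment one-row DataFrames
for patient_idx, data in enumerate(recordings):
    n_segments = data.shape[1] // segment_length  # full segments only
    for seg_idx in range(n_segments):
        start = seg_idx * segment_length
        stop = start + segment_length
        row = toy_features(data[:, start:stop])
        row["patient_id"] = patient_idx
        row["segment_idx"] = seg_idx
        frames.append(row)

# Single concatenation after the loop
toy_band_df = pd.concat(frames, ignore_index=True)
print(toy_band_df.shape)
```

One caveat with this pattern: `pd.concat` raises a `ValueError` on an empty list, so it needs a guard if no recording is long enough to yield a full segment.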