### Working on MNE Pipeline for Preprocessing and Feature Extraction  
  
The goal of this notebook is a working pipeline that loads a sub EEG into MNE, scales it, loads the scaled data back into MNE as a RawArray (reusing the info from the original object), bandpass filters the scaled sub EEG, and extracts the band power features that were used in the base models. As this project moves forward, more steps will be added to the pipeline, including notch filtering, re-referencing, and artifact handling.

In [1]:
import numpy as np
import pandas as pd
import fastparquet, pyarrow
import matplotlib.pyplot as plt
import mne
from mne_features.univariate import compute_pow_freq_bands

In [2]:
df = pd.read_csv('by_patient.csv')

Splitting metadata by activity type

In [3]:
def activity_df(activity):
    # Keep only the rows labeled with this activity and renumber them from 0
    return df[df['expert_consensus'] == activity].reset_index(drop = True)

other_df = activity_df('Other')
seizure_df = activity_df('Seizure')
lpd_df = activity_df('LPD')
gpd_df = activity_df('GPD')
lrda_df = activity_df('LRDA')
grda_df = activity_df('GRDA')

Loading the specified sub EEG

In [4]:
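def get_sub_eeg(data, row):
    # Read the full recording that this metadata row points to
    whole_eeg = pd.read_parquet('train_eegs/{}.parquet'.format(data['eeg_id'][row]), engine = 'pyarrow')
    # Convert the label offset from seconds to samples (200 Hz sampling rate)
    start = int(data['eeg_label_offset_seconds'][row] * 200)
    # Keep a 50-second window: 10000 samples at 200 Hz
    stop = start + 10000
    sub_eeg = whole_eeg[start: stop].reset_index(drop = True)
    return sub_eeg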
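
The window arithmetic in `get_sub_eeg` can be checked on its own, without reading any parquet files. The sketch below is illustrative only: `window_bounds` is a hypothetical helper that does not exist in this notebook, and the 6-second offset is a made-up value rather than one taken from the metadata.

```python
SFREQ = 200           # samples per second, the rate used in get_sub_eeg
WINDOW_SECONDS = 50   # 10000 samples / 200 Hz

def window_bounds(offset_seconds, sfreq=SFREQ, window_seconds=WINDOW_SECONDS):
    """Return the (start, stop) sample indices of a sub EEG window.

    Hypothetical helper mirroring the start/stop arithmetic in get_sub_eeg.
    """
    start = int(offset_seconds * sfreq)
    stop = start + int(window_seconds * sfreq)
    return start, stop

# Made-up offset of 6 seconds into the recording
bounds = window_bounds(6.0)
print(bounds)  # (1200, 11200)
```

The slice `whole_eeg[start: stop]` then covers exactly 10000 rows, provided the recording is at least that long past the offset.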

The sub EEG is loaded into MNE as a RawArray object, and the entire sub EEG is then set up as a single epoch so that the dimensions fit what MNE's Scaler expects. After scaling, the scaled sub EEG data is loaded back into MNE as a RawArray object, but RawArray objects expect 2D data. This is resolved by indexing: selecting the first and only epoch of the scaled sub EEG takes the dimensions from (1, 20, 10000) to (20, 10000).

In [5]:
def initialize_eeg(data, row):
    sub_eeg = get_sub_eeg(data, row)
    info = mne.create_info(
        sub_eeg.columns.to_list(),
        ch_types=(["eeg"]*(len(sub_eeg.columns)-1))+['ecg'],
        sfreq=200
    )
    info.set_montage("standard_1020")

    raw = mne.io.RawArray(
        sub_eeg.to_numpy().T,
        info
    )
    return raw

def epoch_eeg(data, row):
    raw = initialize_eeg(data, row)
    return mne.make_fixed_length_epochs(raw, duration = 50.0)

In [6]:
from mne.decoding import Scaler

In [7]:
def mne_scale(raw_data, raw_info):
    scaler = Scaler(scalings = 'mean')
    scaled_eeg = scaler.fit_transform(raw_data)
    raw_scaled = mne.io.RawArray(scaled_eeg[0], raw_info)
    return raw_scaled
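
To make the shape handling concrete, here is a NumPy-only sketch of the same idea on synthetic data. It assumes that scaling with `scalings = 'mean'` amounts to standardizing each channel with a mean and standard deviation estimated over all epochs and time points; it is a stand-in for illustration, not a replacement for MNE's Scaler, and the random array is not real EEG.

```python
import numpy as np

# Synthetic stand-in for epoched data: (n_epochs, n_channels, n_times)
rng = np.random.default_rng(0)
epochs = rng.normal(loc=5.0, scale=3.0, size=(1, 20, 10000))

# Per-channel statistics, estimated over every epoch and time point
mean = epochs.mean(axis=(0, 2), keepdims=True)
std = epochs.std(axis=(0, 2), keepdims=True)
scaled = (epochs - mean) / std

# Selecting the only epoch drops the leading dimension,
# giving the 2D (n_channels, n_times) array a RawArray needs
scaled_2d = scaled[0]
print(epochs.shape, "->", scaled_2d.shape)  # (1, 20, 10000) -> (20, 10000)
```

After this step every channel has zero mean and unit standard deviation, which is what the indexing in `mne_scale` hands back to `mne.io.RawArray`.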

### MNE Pipeline  
  
The goal is a single function that runs through a series of functions and outputs feature data to be stored in a feature set. As of now, the pipeline loads a sub EEG, stores it in MNE, and establishes one epoch per sub EEG. That epoch encapsulates the entire sub EEG and exists only to add the third dimension needed for scaling. The sub EEG is then scaled with MNE's Scaler, and the result is made two dimensional and read back into MNE as a RawArray object.

In [8]:
def load_scale(data, row):
    epoched_raw = epoch_eeg(data, row)
    scaled_raw = mne_scale(epoched_raw.get_data(), epoched_raw.info)
    return scaled_raw

In [9]:
scaled_raw = load_scale(lrda_df, 0)

Creating RawArray with float64 data, n_channels=20, n_times=10000
    Range : 0 ... 9999 =      0.000 ...    49.995 secs
Ready.
Not setting metadata
1 matching events found
No baseline correction applied
0 projection items activated
Using data from preloaded Raw for 1 events and 10000 original time points ...
0 bad epochs dropped
Creating RawArray with float64 data, n_channels=20, n_times=10000
    Range : 0 ... 9999 =      0.000 ...    49.995 secs
Ready.


### Next Step  
  
The code above loads and scales a sub EEG with MNE. The next step is filtering, which can be done by calling scaled_raw.filter with l_freq set to 1 and h_freq set to 70. After that, I need to take the function for calculating band powers and use it in a function that loops over a random set of indexes for each activity df and generates a band power feature set.
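
As a rough picture of the band power step, the sketch below computes relative band powers with a plain NumPy periodogram and draws a random set of row indexes. It is not the `compute_pow_freq_bands` function imported at the top of this notebook: `band_powers` and `sample_rows` are hypothetical helpers, the band edges in `BANDS` are assumed for illustration (the notebook does not state which bands the base models used), and the input is a synthetic sine wave rather than an EEG channel.

```python
import numpy as np

# Assumed band edges in Hz, for illustration only
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(signal, sfreq, bands=BANDS):
    """Fraction of periodogram power falling in each band of a 1D signal."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    psd = np.abs(np.fft.rfft(signal)) ** 2
    lowest = min(low for low, _ in bands.values())
    highest = max(high for _, high in bands.values())
    # Normalize by the power inside the overall band range
    total = psd[(freqs >= lowest) & (freqs < highest)].sum()
    return {
        name: psd[(freqs >= low) & (freqs < high)].sum() / total
        for name, (low, high) in bands.items()
    }

def sample_rows(n_rows, n_samples, seed=0):
    """Pick a random set of distinct row indexes from one activity df."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_rows, size=min(n_samples, n_rows), replace=False)

# Synthetic check: a 10 Hz sine sampled at 200 Hz for 50 seconds
sfreq = 200
t = np.arange(10000) / sfreq
powers = band_powers(np.sin(2 * np.pi * 10 * t), sfreq)
dominant = max(powers, key=powers.get)
print(dominant)  # alpha

rows = sample_rows(n_rows=5, n_samples=3)
```

In the actual pipeline, each sampled row would go through `load_scale`, then the 1 to 70 Hz filter, and the band power function would run on every channel of the filtered data to produce one row of the feature set.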