In [1]:
from scipy.io import loadmat

import mne
import pandas as pd
import numpy as np

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import MinMaxScaler

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Dropout, Flatten, Conv1D, 
                                     MaxPooling1D, GlobalAveragePooling1D)
from tensorflow.keras import utils

import itertools

import matplotlib.pyplot as plt

from IPython.utils import io

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


### Common Spatial Patterns (CSP)

Next, we will implement and test CSP against our data to try to improve our predictive ability. In general, CSP is a signal processing technique (used particularly for two-class classification problems) in which multivariate signals (e.g., from an EEG device with 30 electrodes, like the one we are using here) are separated into subcomponents whose variance differs as much as possible between the classes of signal.

In practice, this will collapse our dataset for each trial from an array of 30 channels x thousands of samples to just a handful of CSP component features. We will perform a grid search to find the ideal number of components for our problem, as well as whether a neural network or Linear Discriminant Analysis is the better modeling tool for making class predictions based on those CSP features.
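
As a rough illustration of what CSP does under the hood, the sketch below implements a two-class version with NumPy only. It is not the `mne.decoding.CSP` implementation used later in this notebook (which also handles regularization and other options); the function names `fit_csp_filters` and `csp_features` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_csp_filters(X, y, n_components=2):
    """Two-class CSP filters, shape (n_components, n_channels).

    X: (n_epochs, n_channels, n_samples); y: array of 0/1 labels.
    """
    covs = []
    for label in (0, 1):
        # Join this class's epochs along time and estimate channel covariance
        class_data = np.concatenate(list(X[y == label]), axis=1)
        covs.append(np.cov(class_data))
    cov_0, cov_1 = covs
    # Whiten the summed covariance so the two classes share an identity total
    evals, evecs = np.linalg.eigh(cov_0 + cov_1)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Eigenvalues of the whitened class-0 covariance lie between 0 and 1;
    # values far from 0.5 mark directions where one class dominates
    lam, rot = np.linalg.eigh(whitener @ cov_0 @ whitener)
    order = np.argsort(np.abs(lam - 0.5))[::-1]
    return (rot.T @ whitener)[order[:n_components]]

def csp_features(X, filters):
    """Log of the average power of each spatially filtered epoch."""
    projected = np.einsum('fc,ecs->efs', filters, X)
    return np.log(np.mean(projected ** 2, axis=2))

# Synthetic data (assumed): 40 epochs, 6 channels, 200 samples
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.standard_normal((40, 6, 200))
X[y == 0, 0, :] *= 3  # class 0 is stronger on channel 0
X[y == 1, 5, :] *= 3  # class 1 is stronger on channel 5

filters = fit_csp_filters(X, y, n_components=2)
feats = csp_features(X, filters)
print(feats.shape)  # (40, 2)
```

Each epoch of 6 channels x 200 samples is reduced to two numbers, which is the kind of collapse described above.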

### First, let's ingest our data using the data ingester we built
After running, this script will load several dictionaries into memory, as well as other needed objects:
1. raw_dict - containing MNE raw objects with all the data
2. event_dict - which indicates the sample number at which each stimulus was applied
3. y_dict - which has the type of experiment conducted in each trial
4. info - object used to create MNE raw objects, including channel names, types, and sampling frequency
5. events_explained - dictionary which provides the names for each of the five trial types
6. ch_names - list of all channel names

In [2]:
%run data_ingester.py

### Let's find the best parameters to use to create the clearest differentiation using CSP

In this grid search we'll be changing three major kinds of parameters to optimize our CSP settings: 

1. A smaller subset of the preprocessing options we tested using a CNN
2. The CSP parameters to use in CSP feature extraction
3. Whether an LDA or simple neural network makes more successful predictions based on CSP features

Later, we will refine and iterate on the models we use to make predictions using this CSP data, but for now we will use a straightforward LDA and shallow neural network to test which sets of parameters lead to the most differentiable CSP features.

Here is a full explanation of the CSP parameters and models we will be testing. For a full explanation of the preprocessing options, see the preprocessing options grid search notebook:

1. Preprocessing options
2. CSP parameters
    - Number of CSP components to create (n_components) and transform data into
    - Whether covariance matrices are estimated on all of a class's epochs concatenated together ('concat') or on each epoch separately and then averaged ('epoch') (cov_est)
    - Whether a log transform is applied to standardize features (log)
3. Model to run through
    - LDA
    - Shallow neural network
4. Which pairwise combination of trials to compare
    - (1, 5) Word association vs imagining foot movement
    - (1, 4) Word association vs imagining hand movement
    - (2, 4) Mental subtraction vs imagining hand movement
    - (1, 3) Word association vs mental navigation
    
**Also note that these CSP models will be created individually (i.e., looking at the data for only one participant in training and validation)**
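
To make the two `cov_est` options more concrete, here is a conceptual NumPy sketch of the two ways of estimating a class covariance. It uses `np.cov` on random data and is only an approximation of the idea; MNE's internal estimator may differ in details such as centering and regularization, and the array shape is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical epochs for one class: (n_epochs, n_channels, n_samples)
epochs = rng.standard_normal((10, 4, 256))

# 'concat' idea: join all epochs along time, then estimate one covariance
concat_cov = np.cov(np.concatenate(list(epochs), axis=1))

# 'epoch' idea: estimate a covariance per epoch, then average them
epoch_cov = np.mean([np.cov(epoch) for epoch in epochs], axis=0)

print(concat_cov.shape, epoch_cov.shape)  # (4, 4) (4, 4)
```

On well-behaved data the two estimates are close but not identical, which is why it is treated as a parameter worth searching over.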

In [2]:
#Preprocessing parameters to gridsearch
l_freq_filter_options = [None]
h_freq_filter_options = [40]
channels_to_drop_options = [['AFz', 'F7', 'F8']]
baseline_correction_options = [None]
projectors_to_apply_options = [slice(1)] #check that this generated best results
selected_frequency_options = [256]
tmin_options = [1] #Later start to avoid initialization of thought pattern
tmax_options = [4.5]
detrend_options = [None]
reject_options = [{'eeg': 150}] #Customize per subject later
flat_options = [{'eeg': 20}] #Customize per subject later
ica_to_exclude_options = [None] #Incorporate later if helpful in other gridsearch
scaler_options = ['robust', None] #Test no scaler for CSP
#CSP parameters to gridsearch
n_components_options = [4, 12, 24]
cov_est_options = ['concat', 'epoch']
log_options = [True, False]
#Model to run through
model_type_options = ['NN', 'LDA']
#Combinations of trial types to compare
trial_combo_options = [(1, 5), (1, 4), (2, 4), (1, 3)]

In [3]:
#Create column names for test dataframe
columns = ['l_freq_filter',
           'h_freq_filter',
           'channels_to_drop',
           'baseline_correction',
           'projectors_to_apply',
           'selected_frequency',
           'tmin',
           'tmax',
           'detrend',
           'reject',
           'flat',
           'ica_to_exclude',
           'scaler', 
           'n_components',
           'cov_est', 
           'log', 
           'model_type',
           'trial_combo']

In [4]:
#Create dataframe with all combinations of tests as rows
test_df = pd.DataFrame(itertools.product(l_freq_filter_options, 
                                         h_freq_filter_options, 
                                         channels_to_drop_options, 
                                         baseline_correction_options, 
                                         projectors_to_apply_options, 
                                         selected_frequency_options,
                                         tmin_options,
                                         tmax_options, 
                                         detrend_options,
                                         reject_options,
                                         flat_options,
                                         ica_to_exclude_options,
                                         scaler_options, 
                                         n_components_options, 
                                         cov_est_options, 
                                         log_options, 
                                         model_type_options, 
                                         trial_combo_options), 
                      columns=columns)

In [5]:
#Append columns for each subject, where we will record results for each test
subject_columns = ['sub_A', 'sub_C', 'sub_D', 'sub_E', 'sub_F', 'sub_G', 
                   'sub_H', 'sub_J', 'sub_L']

In [6]:
#Add those combos to our test_df to save highest val accuracy achieved
test_df = test_df.reindex(columns=columns + subject_columns)

In [7]:
test_df.shape

(1152, 27)
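
A quick sanity check on the grid size: the number of rows should equal the product of the lengths of the option lists. The sketch below recomputes it from the multi-valued lists as written in this notebook (every other list has a single entry, so it contributes a factor of 1). The dictionary `option_lists` is a helper introduced only for this example.

```python
import itertools
from math import prod

# Only the option lists with more than one value, copied from the cell above
option_lists = {
    'scaler': ['robust', None],
    'n_components': [4, 12, 24],
    'cov_est': ['concat', 'epoch'],
    'log': [True, False],
    'model_type': ['NN', 'LDA'],
    'trial_combo': [(1, 5), (1, 4), (2, 4), (1, 3)],
}

# Product of list lengths, and the same count by enumerating combinations
expected_rows = prod(len(options) for options in option_lists.values())
n_rows = sum(1 for _ in itertools.product(*option_lists.values()))
print(expected_rows, n_rows)  # 192 192
```

If a recorded shape differs from this count, the option lists were probably edited after the cell was last run, so it is worth re-running `test_df.shape` before starting a long search.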

### Let's test these options

**set number of ICA components to create before running**

In [None]:
for row in range(test_df.shape[0]):
    #Load each sessions data into an MNE raw object
    raw_dict = {}
    for key, value in data_dict.items():
        raw_dict[key] = mne.io.RawArray(value.T, info, verbose=0)
    
    #Filter data with bandpass. Note raw.filter applies in place
    for key, value in raw_dict.items():
        value.filter(l_freq=test_df.l_freq_filter[row], 
                     h_freq=test_df.h_freq_filter[row], 
                     method='fir', phase='zero', verbose=0)

    #Create epoch object with our raw objects and events arrays
    channels_to_keep = [ch for ch in ch_names if 
                        ch not in test_df.channels_to_drop[row]]
    epoch_dict = {}
    for key, value in raw_dict.items():
        epoch_dict[key] = mne.Epochs(value, events=event_dict[key], 
                                    event_id=events_explained, 
                                    tmin=-3, tmax=test_df.tmax[row], 
                                    baseline=test_df.baseline_correction[row],
                                    preload=True,
                                    picks=channels_to_keep, verbose=0,
                                    detrend=test_df.detrend[row],
                                    reject=test_df.reject[row],
                                    flat=test_df.flat[row],
                                    reject_tmin=test_df.tmin[row],
                                    reject_tmax=test_df.tmax[row])

    #Skip creating projectors step to save compute time if not being
    #applied in this iteration
    if test_df.projectors_to_apply[row]:
        #Create dictionary of signal space projection vectors for each epoch
        #(n_eeg=1 computes a single EEG projector)
        proj_dict = {}
        for key, value in epoch_dict.items():
            proj_dict[key] = mne.compute_proj_epochs(value, n_eeg=1, verbose=0)
        #apply projectors
        for key, value in epoch_dict.items():
            value.add_proj(proj_dict[key][test_df.projectors_to_apply[row]], 
                           verbose=0)
            value.apply_proj(verbose=0)

    #Skip creating ICA components step to save compute time if not
    #being applied in this iteration
    if test_df.ica_to_exclude[row]:
        #create and fit ICA object to epochs
        for key, value in epoch_dict.items():
            ica = mne.preprocessing.ICA(n_components=5, method='picard', 
                                        max_iter='auto', verbose=0)
            ica.fit(value, verbose=0)
            #Apply the ICA
            ica.apply(value, exclude=test_df.ica_to_exclude[row],
                     verbose=0)

    #Resample the data at a new frequency
    for key, value in epoch_dict.items():
        value.resample(sfreq=test_df.selected_frequency[row])

    #Extract and standard scale data from all non-dropped epochs
    #Creates intermediate data dictionary
    int_data_dict = {}
    #Use robust sklearn scaler
    if test_df.scaler[row] == 'robust':
        mne_scaler = mne.decoding.Scaler(scalings='median')
        for key, value in epoch_dict.items():
            #with scalings=median implements sklearn robust scaler
            int_data_dict[key] = (mne_scaler.
                                  fit_transform(value.
                                                get_data(tmin=test_df.tmin[row], 
                                                         tmax=test_df.tmax[row])))
    #No scaling option
    if test_df.scaler[row] is None:
        for key, value in epoch_dict.items():
            int_data_dict[key] = value.get_data(tmin=test_df.tmin[row], 
                                                  tmax=test_df.tmax[row])
    
    #Create updated dictionary of y values to reflect dropped epochs
    int_y_dict = {}
    for key, value in y_dict.items():
        temp_y_list = []
        for i, epoch in enumerate(epoch_dict[key].drop_log):
            #MNE drop log shows empty parens for epochs that were not dropped - 
            #these are the trials we are keeping in each iteration
            if epoch == ():
                temp_y_list.append(value[i])
        int_y_dict[key] = temp_y_list
    
    #Assemble final y dict with only trials in our current combo
    #In each combo, coding 1st trial type to 0, 2nd trial type to 1
    final_y_dict = {}
    for key, value in int_y_dict.items():
        temp_y_list = []
        for y in value:
            if y == test_df.trial_combo[row][0]:
                temp_y_list.append(0)
            if y == test_df.trial_combo[row][1]:
                temp_y_list.append(1)
        final_y_dict[key] = np.array(temp_y_list)

    #Assemble data dict with only trials in our current combo
    final_data_dict = {}
    for key, value in int_data_dict.items():
        index_list = []
        for i, y in enumerate(int_y_dict[key]):
            if (y == test_df.trial_combo[row][0] or 
                y == test_df.trial_combo[row][1]):
                index_list.append(i)
        final_data_dict[key] = value[index_list]

    #Create csp_dict of csp objects
    csp_dict = {}
    for key, value in epoch_dict.items():
        #Only want to create csp objects for our train data - from session 1
        if 'sesh_1' in key:
            csp_dict[key] = mne.decoding.CSP(n_components=int(test_df.n_components[row]), 
                                             cov_est=test_df.cov_est[row], 
                                             log=bool(test_df.log[row]));

    #Suppress output from this noisy function with no verbose option
    with io.capture_output() as captured:
    #Fit csp objects to training data from session 1        
        for key, value in csp_dict.items():
            value.fit(X=final_data_dict[key], 
                      y=final_y_dict[key]);

    #Use csp objects to transform and save resulting data
    csp_data_dict = {}
    for key, value in csp_dict.items():
        csp_data_dict[key] = value.transform(final_data_dict[key]);
        #Matching session 2 key; replace the full token so other 1s are untouched
        key2 = key.replace('sesh_1', 'sesh_2')
        csp_data_dict[key2] = value.transform(final_data_dict[key2]);


    #Model against our data for each subject and save the resulting score
    #First, LDA model
    if test_df.model_type[row] == 'LDA':
        #Base LDA objects on csp dict keys so only created for sesh 1
        lda_dict = {}
        for key in csp_dict.keys():
            lda_dict[key] = LinearDiscriminantAnalysis()

        #Fit LDA objects to training data from sesh 1
        for key, value in lda_dict.items():
            value.fit(csp_data_dict[key], final_y_dict[key])
            
        #Score on testing data from sesh 2 and save in test_df
        for key, value in lda_dict.items():
            key2 = key.replace('sesh_1', 'sesh_2')
            subject = key[:5]
            test_df.at[row, subject] = value.score(csp_data_dict[key2], 
                                               final_y_dict[key2])
    
    #Neural network
    if test_df.model_type[row] == 'NN':
        #Build model for each subject
        #Base NN analysis on csp dict keys so only created for sesh 1
        for key in csp_dict.keys():      
            #Build model
            model = Sequential()
            #inputs are equal to n_components created via CSP
            model.add(Dense(int(test_df.n_components[row]), 
                               input_dim=int(test_df.n_components[row]), 
                               activation='relu'))
            model.add(Dropout(0.2))
            #Add hidden layer with half as many nodes as input
            #Integer division because Dense expects a whole number of units
            model.add(Dense(int(test_df.n_components[row]) // 2, activation='relu'))
            model.add(Dropout(0.2))
            #Hidden layer with 1/4 as many nodes as input
            model.add(Dense(int(test_df.n_components[row]) // 4, activation='relu'))
            model.add(Dropout(0.2))
            #output layer
            model.add(Dense(1, activation='sigmoid'))
            
            #Compile model
            model.compile(loss='binary_crossentropy', 
                          optimizer='adam', 
                          metrics=['acc'])
            
            #Fit model
            key2 = key.replace('sesh_1', 'sesh_2')
            history = model.fit(csp_data_dict[key], final_y_dict[key], 
                                validation_data=(csp_data_dict[key2], 
                                                 final_y_dict[key2]), 
                                epochs=5, verbose=0)
            
            #Save validation accuracy into dataframe
            subject = key[:5]
            test_df.at[row, subject] = max(history.history['val_acc'])
    test_df.to_csv('data/csp_grid_search.csv', index=False)
    if row % 20 == 0:
        print(f'Grid search complete through row {row} of {test_df.shape[0]}')
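
The label bookkeeping inside the loop (dropping labels for rejected epochs, then keeping only the two trial types in the current combo and recoding them to 0 and 1) can also be expressed with boolean masks. This is a minimal sketch of the same idea on toy data, not a drop-in replacement; `keep_labels` and `select_pair` are hypothetical helper names, and the drop log is assumed to be a sequence of tuples in which an empty tuple marks a kept epoch.

```python
import numpy as np

def keep_labels(drop_log, labels):
    """Labels of the epochs whose drop-log entry is empty (i.e. kept)."""
    kept = np.array([len(entry) == 0 for entry in drop_log])
    return np.asarray(labels)[kept]

def select_pair(data, labels, combo):
    """Keep epochs of the two trial types in combo, recoded to 0 and 1."""
    mask = np.isin(labels, combo)
    return data[mask], (labels[mask] == combo[1]).astype(int)

# Toy example: six events, two of them dropped during epoching
drop_log = ((), ('Fz',), (), (), ('Cz',), ())
labels = [1, 5, 5, 3, 1, 1]
kept_labels = keep_labels(drop_log, labels)

# Data for the four surviving epochs: (n_epochs, n_channels, n_samples)
data = np.arange(4 * 2 * 3).reshape(4, 2, 3)
X_pair, y_pair = select_pair(data, kept_labels, combo=(1, 5))
print(kept_labels.tolist(), y_pair.tolist())  # [1, 5, 3, 1] [0, 1, 0]
```

Using one mask for both the data and the labels keeps the two arrays aligned by construction.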

### Model refinement

Ensemble model testing - later
- Testing LDA on an individual model vs NN on an individual model vs NN on a higher-sample-weighted model that ingests all data

**I think that if transform_into is set to CSP space, it will yield data in a shape that I should still feed into a CNN - I don't think that really belongs in this test set. Do that separately.**

**Grid search for reject and flat settings to maintain 90%, 75%, 60% of epochs for each trial participant; right now we're dropping way more with some participants**
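
One possible starting point for that search, sketched under assumptions: since an epoch is rejected when any channel's peak-to-peak amplitude exceeds the `reject` threshold, a per-participant threshold that keeps a target fraction of epochs can be read off as a quantile of each epoch's worst-channel peak-to-peak value. The function name and the random data below are hypothetical, and the threshold is in whatever units the data uses.

```python
import numpy as np

def reject_threshold_for_fraction(data, keep_fraction):
    """Peak-to-peak threshold that keeps roughly keep_fraction of epochs.

    data: (n_epochs, n_channels, n_samples) for one participant.
    """
    # Largest peak-to-peak amplitude across channels, one value per epoch
    worst_ptp = np.ptp(data, axis=2).max(axis=1)
    return np.quantile(worst_ptp, keep_fraction)

rng = np.random.default_rng(0)
# Hypothetical participant: 100 epochs with varying noise levels
scales = rng.uniform(5, 50, size=(100, 1, 1))
data = rng.standard_normal((100, 4, 50)) * scales

kept_fractions = {}
for target in (0.9, 0.75, 0.6):
    threshold = reject_threshold_for_fraction(data, target)
    worst_ptp = np.ptp(data, axis=2).max(axis=1)
    kept_fractions[target] = np.mean(worst_ptp <= threshold)
print(kept_fractions)
```

A `flat` threshold could be derived the same way from the smallest per-channel peak-to-peak value in each epoch, using the complementary quantile.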

Grid search sending in a shorter period of data into the CSP

Grid search parameters for LDA if it is performing better than simple NN

Test pulling more time shifted samples

- Resampling our training data (e.g., including -0.3 to 4.9, 0.0 to 5.2, and 0.3 to 5.5) to give our model more data to train on
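
A minimal sketch of how time-shifted windows could be cut from already-epoched data, assuming the epochs are stored as a NumPy array that starts at `epoch_tmin` seconds relative to the stimulus. The function name, window start times, and array sizes are illustrative assumptions rather than settings that have been tested here.

```python
import numpy as np

def shifted_windows(epochs, sfreq, epoch_tmin, starts, duration):
    """Cut one fixed-length window per start time from every epoch.

    epochs: (n_epochs, n_channels, n_samples); times are in seconds.
    Returns an array with n_epochs * len(starts) windows.
    """
    n_samples = int(round(duration * sfreq))
    windows = []
    for start in starts:
        first = int(round((start - epoch_tmin) * sfreq))
        windows.append(epochs[:, :, first:first + n_samples])
    return np.concatenate(windows, axis=0)

# Hypothetical epochs spanning -3 s to 5.5 s at 256 Hz
sfreq, epoch_tmin = 256, -3.0
epochs = np.zeros((5, 27, int(8.5 * sfreq) + 1))
labels = np.array([0, 1, 0, 1, 1])

starts = (-0.3, 0.0, 0.3)
X_aug = shifted_windows(epochs, sfreq, epoch_tmin, starts, duration=5.2)
# Repeat labels once per start time, in the same order as the windows
y_aug = np.tile(labels, len(starts))
print(X_aug.shape, y_aug.shape)  # (15, 27, 1331) (15,)
```

Windows cut from the same trial overlap heavily, so all shifted copies of a trial should stay on the same side of any train/validation split.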

When doing the individual models, I can set the rejection criteria for epochs to match the individual in question better - if I have time.

Read this article for other preprocessing ideas after getting dropping epochs set up: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5915520/

Another thing I could play around with:

- EEGLib
    - https://www.sciencedirect.com/science/article/pii/S2352711021000753
    - Primarily used for feature extraction after the data has been processed, but it does have some preprocessing capability
    - Appears to be written to allow visual inspection of data and then creation of features based on the selected point - certainly worth investigating