## Tutorial #4: Applying Machine Learning Methods to EEG Data

In this tutorial, machine learning methods that are commonly used for EEG analysis will be discussed. Additionally, different techniques for approaching a classification problem on EEG data will be demonstrated.

##### Dataset:
The dataset selected for this tutorial is 'Emotion-Antecedent Appraisal Checks: EEG and EMG data sets for Novelty and Pleasantness'. It contains 26 participants, each with a varying number of trials. There are three main categories: Target, Novel and Familiar. Although 70% of all images are in the Familiar category, 20% belong to the Novel category and the rest are Target, each category has an equal portion of Pleasant, Unpleasant and Neutral color images. If an image is selected from the 'Pleasant' images and it is also familiar to the participant, then the trial conducted with this image has the 'FP' (Familiar-Pleasant) label.
Therefore, in total there are 6 categories in this dataset. However, the number of categories can be changed by combining trials with the same familiarity level or the same pleasantness level.

Since familiar images form 70% of the dataset, trials with familiar pictures will be used throughout this tutorial to analyse the effects of stimuli with different pleasantness levels on EEG.

Note that only one participant's data will be used in this tutorial.



##### Machine Learning Methods:
For modelling EEG data, there are three common methods: Support Vector Machines, Linear Discriminant Analysis and Logistic Regression. Each method will be applied to three two-class problems, so in the end we will get 3 models per method.

In [53]:
# Load necessary libraries
import mne
from mne.decoding import Vectorizer

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score, train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report, accuracy_score

# Models
from sklearn import svm
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

As the first step, load the epoch data from file. Please note that this dataset is originally provided as csv files, which cannot be used directly with the MNE library. Therefore, to follow this tutorial, first go through the tutorial named 'study1'.

In [17]:
data_file = '../../study1/study1_eeg/epochdata/P-01'

# Read the EEG epochs:
epochs = mne.read_epochs(data_file + '.fif')
print(len(epochs))

Reading ../../study1/study1_eeg/epochdata/P-01.fif ...
    Found the data of interest:
        t =       0.00 ...    1496.09 ms
        0 CTF compensation matrices available
484 matching events found
No baseline correction applied
Not setting metadata
0 projection items activated
484




In [18]:
epochs.event_id

{'FU': 0, 'FN': 1, 'NP': 2, 'FP': 3, 'NU': 4, 'NN': 5}

Keep familiar events only. Among FU, FN and FP events, create datasets with all possible event type pairs to build models for binary classification.

In [73]:
epochs_UN = epochs['FU', 'FN']
epochs_UP = epochs['FU', 'FP']
epochs_NP = epochs['FN', 'FP']
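
The three pairs above can also be enumerated programmatically. The following is a minimal sketch using only the event names from `epochs.event_id`; the variable names `familiar_events` and `event_pairs` are introduced here for illustration.

```python
from itertools import combinations

# Familiar-condition event names, as listed in epochs.event_id above.
familiar_events = ['FU', 'FN', 'FP']

# Every unordered pair of event types, one per binary classification task.
event_pairs = list(combinations(familiar_events, 2))
print(event_pairs)

# With the real data, each pair could then select its epochs, for example:
# subsets = {pair: epochs[list(pair)] for pair in event_pairs}
```

Three event types give three pairs, which is why three models are built per method.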

### Example #1:  Classification between Unpleasant and Neutral Events

Get data and the label for the dataset with unpleasant and neutral events.

In [20]:
# Dataset with unpleasant and neutral events
data_UN = epochs_UN.get_data()
labels_UN = epochs_UN.events[:,-1]
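
The labels are the integer event codes from `epochs.event_id`. As a small sketch of how to check class balance, the codes can be mapped back to their names and counted. The label vector `example_labels` below is hypothetical and only stands in for `labels_UN`.

```python
from collections import Counter

# Event codes copied from epochs.event_id above.
event_id = {'FU': 0, 'FN': 1, 'NP': 2, 'FP': 3, 'NU': 4, 'NN': 5}
code_to_name = {code: name for name, code in event_id.items()}

# Hypothetical label vector standing in for epochs_UN.events[:, -1].
example_labels = [0, 1, 1, 0, 1, 0, 0, 1]

# Count how many trials each event type contributes.
class_counts = Counter(code_to_name[code] for code in example_labels)
print(class_counts)
```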

Split dataset into two sub-datasets as training set and test set with 70 - 30 ratio. 

In [21]:
train_data_UN, test_data_UN, labels_train_UN, labels_test_UN = train_test_split(data_UN, labels_UN, test_size=0.3, random_state=42)
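
The split above shuffles trials without considering their class. When classes are imbalanced, `train_test_split` also accepts a `stratify` argument that keeps class proportions the same in both subsets. The sketch below uses synthetic data as a stand-in for the EEG epochs; stratification is an option shown for illustration and is not used in the rest of this tutorial.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for EEG epochs: 100 epochs, 4 channels, 20 time points.
X = rng.standard_normal((100, 4, 20))
# Imbalanced labels: 60 epochs of class 0 and 40 of class 1.
y = np.array([0] * 60 + [1] * 40)

# stratify=y keeps the 60/40 class ratio in both the training and test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```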

Construct the pipeline with the make_pipeline() function of the sklearn library. The steps must be listed in the order of execution. Apart from the classifier, Vectorizer() and StandardScaler() will be used. Vectorizer converts the EEG data from its (n_epochs, n_channels, n_times) structure to a 2D array of shape (n_epochs, n_channels * n_times), so that each epoch becomes a single feature vector.

Once the data is vectorized, StandardScaler standardizes it feature-wise using the standard score, z = (x - u) / s, where u is the mean of the feature and s is the standard deviation of the same feature. After this step, each feature has a mean of zero and a standard deviation of one. Standardizing the features, each of which is the value of one EEG channel at one time point in our case, prevents a feature from dominating just because it has a larger variance.
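
To make the two preprocessing steps concrete, here is a minimal NumPy sketch of what they compute. It uses synthetic data with an assumed shape of 50 epochs, 8 channels and 30 time points, and plain array operations in place of Vectorizer() and StandardScaler().

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for epochs.get_data(): 50 epochs, 8 channels, 30 samples.
data = rng.standard_normal((50, 8, 30)) * 5.0 + 2.0

# Vectorizing: flatten each epoch into one feature vector.
X = data.reshape(len(data), -1)
print(X.shape)  # (50, 240)

# Standard score per feature (column): z = (x - u) / s.
u = X.mean(axis=0)
s = X.std(axis=0)
Z = (X - u) / s

# Each feature now has mean 0 and standard deviation 1.
print(np.allclose(Z.mean(axis=0), 0.0), np.allclose(Z.std(axis=0), 1.0))
```

Inside a pipeline, the mean and standard deviation are estimated on the training data only and then reused to transform the test data.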

The final argument of make_pipeline() is the machine learning model. In the following example, the selected model is a support vector machine with an RBF kernel and penalty parameter C=1.

In [22]:
clf_svm_0 = make_pipeline(Vectorizer(), StandardScaler(), svm.SVC(kernel='rbf', C=1))

If the hyperparameters of the classifier are known in advance, they can be fixed directly, as in the cell above. However, it is usually quite difficult to determine optimal hyperparameters without testing different values for each parameter. If we test each value manually, we run into another issue: overfitting. In that case we would train the classifier on the training set and measure its performance on the test set for every candidate value, which means the hyperparameters are effectively optimized on the test set. In general, the test set is smaller than the training set, and the data the classifier meets once it is deployed in a real system may differ from it. Therefore, a model overfitted to the test set would not generalize well.

Cross validation is a solution to this problem. In this approach, the training set is divided into equally sized folds and the model is trained and evaluated once per fold. This way, hyperparameters are not optimized on just one dataset; instead, different chunks of the data are used for hyperparameter optimization. Note that in each run one fold is excluded from the training set and used as the test set during evaluation.

For cross validation, the sklearn library provides the cross_val_score function, which runs cross validation and returns the accuracy of each fold. The number of folds is a parameter of this function; as a heuristic, 5-fold or 10-fold cross validation is preferred.

In [23]:
clf_svm_0 = make_pipeline(Vectorizer(), StandardScaler(), svm.SVC(kernel='rbf', C=1))
scores = cross_val_score(clf_svm_0, data_UN, labels_UN, cv=5)
for i in range(len(scores)):   
    print('Accuracy of ' + str(i+1) + 'th fold is ' + str(scores[i]) + '\n')

Accuracy of 1th fold is 0.6470588235294118

Accuracy of 2th fold is 0.64

Accuracy of 3th fold is 0.76

Accuracy of 4th fold is 0.6326530612244898

Accuracy of 5th fold is 0.673469387755102
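
To show what cross_val_score does internally, the sketch below writes the fold loop by hand and compares it with the library call. It runs on synthetic, already-vectorized data (an assumption made so the example is self-contained), so the scores differ from the ones above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic, already-vectorized data: 100 trials x 20 features, two classes.
y = np.array([0, 1] * 50)
X = rng.standard_normal((100, 20)) + y[:, None] * 0.8

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1))

manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_clf = clone(clf)                     # fresh, unfitted copy per fold
    fold_clf.fit(X[train_idx], y[train_idx])  # scaler is fit on the training folds only
    manual_scores.append(fold_clf.score(X[test_idx], y[test_idx]))

# For a classifier and an integer cv, cross_val_score uses stratified folds too.
library_scores = cross_val_score(clf, X, y, cv=5)
print(np.allclose(manual_scores, library_scores))
```

Because the scaler is part of the pipeline, it is refit inside every fold, so no information from the held-out fold leaks into the preprocessing.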



Another option is GridSearchCV, which exhaustively searches for the best-performing parameters among the given lists of candidate values. You can specify the scoring method and the cross validation strategy inside GridSearchCV.

In [34]:
#svm
clf_svm_pip = make_pipeline(Vectorizer(), StandardScaler(), svm.SVC())
parameters = {'svc__kernel':['linear', 'rbf', 'sigmoid'], 'svc__C':[0.1, 1, 10]}
gs_cv_svm = GridSearchCV(clf_svm_pip, parameters, scoring='accuracy', cv=StratifiedKFold(n_splits=5), return_train_score=True)

Train the classifier by passing the training data and their labels to the fit() function.

In [35]:
gs_cv_svm.fit(train_data_UN, labels_train_UN)
print('Best Parameters: {}'.format(gs_cv_svm.best_params_))
print('Best Score: {}'.format(gs_cv_svm.best_score_))



Best Parameters: {'svc__C': 0.1, 'svc__kernel': 'linear'}
Best Score: 0.7011494252873564
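
Beyond the best parameters, a fitted GridSearchCV stores the scores of every candidate in its `cv_results_` attribute. The sketch below, run on synthetic data rather than the EEG epochs, prints the mean training and cross validation accuracy for each of the nine kernel/C combinations; a large gap between the two suggests overfitting.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic, already-vectorized data: 120 trials x 30 features, two classes.
y = np.array([0, 1] * 60)
X = rng.standard_normal((120, 30)) + y[:, None] * 0.7

pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {'svc__kernel': ['linear', 'rbf', 'sigmoid'], 'svc__C': [0.1, 1, 10]}
search = GridSearchCV(pipe, param_grid, scoring='accuracy',
                      cv=StratifiedKFold(n_splits=5), return_train_score=True)
search.fit(X, y)

# One entry per parameter combination; train scores exist because
# return_train_score=True was set.
results = search.cv_results_
for params, train, test in zip(results['params'],
                               results['mean_train_score'],
                               results['mean_test_score']):
    print(f"{params}: train={train:.2f}, cv={test:.2f}")
```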


Finally, evaluate the model on the test set.

In [54]:
predictions_svm = gs_cv_svm.predict(test_data_UN)
report_svm = classification_report(labels_test_UN, predictions_svm, target_names=['Unpleasant', 'Neutral'])
print('SVM Clasification Report:\n {}'.format(report_svm))
acc_svm = accuracy_score(labels_test_UN, predictions_svm)
print("Accuracy of SVM model: {}".format(acc_svm))

SVM Clasification Report:
               precision    recall  f1-score   support

  Unpleasant       0.79      0.59      0.68        37
     Neutral       0.68      0.84      0.75        38

    accuracy                           0.72        75
   macro avg       0.73      0.72      0.71        75
weighted avg       0.73      0.72      0.72        75

Accuracy of SVM model: 0.72
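
The columns of the report follow directly from the counts of correct and incorrect predictions. As a worked example, the sketch below recomputes the 'Unpleasant' row. The four counts are not printed by the notebook; they are inferred here from the supports (37 and 38) and the rounded recall values, so treat them as an assumption.

```python
# Counts inferred from the rounded report above (an assumption, not logged output).
tp = 22  # true Unpleasant, predicted Unpleasant
fn = 15  # true Unpleasant, predicted Neutral
fp = 6   # true Neutral, predicted Unpleasant
tn = 32  # true Neutral, predicted Neutral

precision = tp / (tp + fp)                          # share of Unpleasant predictions that are right
recall = tp / (tp + fn)                             # share of Unpleasant trials that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + fn + fp + tn)          # share of all trials classified correctly

print(round(precision, 2), round(recall, 2), round(f1, 2), round(accuracy, 2))
# prints: 0.79 0.59 0.68 0.72
```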


The same steps can be applied to train any other machine learning model.
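
Since the steps are identical for every model, they can be wrapped in a small helper. The sketch below is one possible way to do this; `flatten_epochs` and `evaluate` are hypothetical names, the data is synthetic, and a plain reshape stands in for Vectorizer() so the example runs without MNE.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def flatten_epochs(X):
    """Reshape (n_epochs, n_channels, n_times) to (n_epochs, n_features)."""
    return X.reshape(len(X), -1)


def evaluate(model, X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit scaler + model and return the test accuracy."""
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(flatten_epochs(X_train), y_train)
    return accuracy_score(y_test, pipe.predict(flatten_epochs(X_test)))


rng = np.random.default_rng(0)
# Synthetic stand-in for EEG epochs: 120 epochs, 4 channels, 10 time points.
y = np.array([0, 1] * 60)
X = rng.standard_normal((120, 4, 10)) + y[:, None, None] * 0.8
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {'SVM': SVC(kernel='linear', C=0.1),
          'LR': LogisticRegression(max_iter=1000),
          'LDA': LinearDiscriminantAnalysis(solver='svd')}
accuracies = {name: evaluate(model, X_train, y_train, X_test, y_test)
              for name, model in models.items()}
print(accuracies)
```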

1. Classification between Unpleasant and Neutral events:

The example above classifies unpleasant and neutral events with SVM. In the following cells, Logistic Regression and LDA models will be built for the same classification task, and then the performance of the three models will be compared.

In [56]:
# Logistic Regression
# The 'liblinear' solver supports both the l1 and l2 penalties searched below
clf_lr_pip = make_pipeline(Vectorizer(), StandardScaler(), LogisticRegression(solver='liblinear'))
parameters = {'logisticregression__penalty':['l1', 'l2']}
gs_cv_lr = GridSearchCV(clf_lr_pip, parameters, scoring='accuracy')
gs_cv_lr.fit(train_data_UN, labels_train_UN)

print('Best Parameters: {}'.format(gs_cv_lr.best_params_))
print('Best Score: {}'.format(gs_cv_lr.best_score_))

predictions_lr = gs_cv_lr.predict(test_data_UN)
report_lr = classification_report(labels_test_UN, predictions_lr, target_names=['Unpleasant', 'Neutral'])
print('LR Clasification Report:\n {}'.format(report_lr))
acc_lr = accuracy_score(labels_test_UN, predictions_lr)
print("Accuracy of LR model: {}".format(acc_lr))



Best Parameters: {'logisticregression__penalty': 'l1'}
Best Score: 0.7701149425287356
LR Clasification Report:
               precision    recall  f1-score   support

  Unpleasant       0.74      0.62      0.68        37
     Neutral       0.68      0.79      0.73        38

    accuracy                           0.71        75
   macro avg       0.71      0.71      0.70        75
weighted avg       0.71      0.71      0.70        75

Accuracy of LR model: 0.7066666666666667


In [55]:
# Linear Discriminant Analysis
clf_lda_pip = make_pipeline(Vectorizer(), StandardScaler(), LinearDiscriminantAnalysis(solver='svd'))
clf_lda_pip.fit(train_data_UN,labels_train_UN)

predictions_lda = clf_lda_pip.predict(test_data_UN)
report_lda = classification_report(labels_test_UN, predictions_lda, target_names=['Unpleasant', 'Neutral'])
print('LDA Clasification Report:\n {}'.format(report_lda))
acc_lda = accuracy_score(labels_test_UN, predictions_lda)
print("Accuracy of LDA model: {}".format(acc_lda))

LDA Clasification Report:
               precision    recall  f1-score   support

  Unpleasant       0.76      0.68      0.71        37
     Neutral       0.71      0.79      0.75        38

    accuracy                           0.73        75
   macro avg       0.74      0.73      0.73        75
weighted avg       0.74      0.73      0.73        75

Accuracy of LDA model: 0.7333333333333333




In [59]:
print('accuracy of SVM = {0}\naccuracy of LR = {1}\naccuracy of LDA = {2}'.format(acc_svm, acc_lr, acc_lda))

accuracy of SVM = 0.72
accuracy of LR = 0.7066666666666667
accuracy of LDA = 0.7333333333333333


### Example #2:  Classification between Unpleasant and Pleasant Events

In [None]:
# Prepare dataset

In [90]:
# Dataset with unpleasant and pleasant events
data_UP = epochs_UP.get_data()
labels_UP = epochs_UP.events[:,-1]
train_data_UP, test_data_UP, labels_train_UP, labels_test_UP = train_test_split(data_UP, labels_UP, test_size=0.3, random_state=42)


Build SVM, LR and LDA models as in the first example.

In [84]:
# SVM
clf_svm_pip = make_pipeline(Vectorizer(), StandardScaler(), svm.SVC())
parameters = {'svc__kernel':['linear', 'rbf', 'sigmoid'], 'svc__C':[0.1, 1, 10]}
gs_cv_svm = GridSearchCV(clf_svm_pip, parameters, scoring='accuracy', cv=StratifiedKFold(n_splits=5), return_train_score=True)
gs_cv_svm.fit(train_data_UP, labels_train_UP)

print('Best Parameters: {}'.format(gs_cv_svm.best_params_))
print('Best Score: {}'.format(gs_cv_svm.best_score_))

# Make prediction
predictions_svm = gs_cv_svm.predict(test_data_UP)
report_svm = classification_report(labels_test_UP, predictions_svm, target_names=['Unpleasant', 'Pleasant'])
print('SVM Clasification Report:\n {}'.format(report_svm))
acc_svm = accuracy_score(labels_test_UP, predictions_svm)
print("Accuracy of SVM model: {}".format(acc_svm))

77
SVM Clasification Report:
               precision    recall  f1-score   support

  Unpleasant       0.52      0.52      0.52        31
    Pleasant       0.67      0.67      0.67        46

    accuracy                           0.61        77
   macro avg       0.60      0.60      0.60        77
weighted avg       0.61      0.61      0.61        77

Accuracy of SVM model: 0.6103896103896104


In [86]:
#Logistic Regression
# The 'liblinear' solver supports both the l1 and l2 penalties searched below
clf_lr_pip = make_pipeline(Vectorizer(), StandardScaler(), LogisticRegression(solver='liblinear'))
parameters = {'logisticregression__penalty':['l1', 'l2']}
gs_cv_lr = GridSearchCV(clf_lr_pip, parameters, scoring='accuracy')
gs_cv_lr.fit(train_data_UP, labels_train_UP)

print('Best Parameters: {}'.format(gs_cv_lr.best_params_))
print('Best Score: {}'.format(gs_cv_lr.best_score_))

# Make prediction
predictions_lr = gs_cv_lr.predict(test_data_UP)
report_lr = classification_report(labels_test_UP, predictions_lr, target_names=['Unpleasant', 'Pleasant'])
print('LR Clasification Report:\n {}'.format(report_lr))
acc_lr = accuracy_score(labels_test_UP, predictions_lr)
print("Accuracy of LR model: {}".format(acc_lr))



Best Parameters: {'logisticregression__penalty': 'l1'}
Best Score: 0.6815642458100558
LR Clasification Report:
               precision    recall  f1-score   support

  Unpleasant       0.61      0.61      0.61        31
    Pleasant       0.74      0.74      0.74        46

    accuracy                           0.69        77
   macro avg       0.68      0.68      0.68        77
weighted avg       0.69      0.69      0.69        77

Accuracy of LR model: 0.6883116883116883


In [87]:
clf_lda_pip = make_pipeline(Vectorizer(), StandardScaler(), LinearDiscriminantAnalysis(solver='svd'))
clf_lda_pip.fit(train_data_UP,labels_train_UP)

predictions_lda = clf_lda_pip.predict(test_data_UP)
report_lda = classification_report(labels_test_UP, predictions_lda, target_names=['Unpleasant', 'Pleasant'])
print('LDA Clasification Report:\n {}'.format(report_lda))
acc_lda = accuracy_score(labels_test_UP, predictions_lda)
print("Accuracy of LDA model: {}".format(acc_lda))

LDA Clasification Report:
               precision    recall  f1-score   support

  Unpleasant       0.48      0.52      0.50        31
    Pleasant       0.66      0.63      0.64        46

    accuracy                           0.58        77
   macro avg       0.57      0.57      0.57        77
weighted avg       0.59      0.58      0.59        77

Accuracy of LDA model: 0.5844155844155844




In [88]:
print('accuracy of SVM = {0}\naccuracy of LR = {1}\naccuracy of LDA = {2}'.format(acc_svm, acc_lr, acc_lda))

accuracy of SVM = 0.6103896103896104
accuracy of LR = 0.6883116883116883
accuracy of LDA = 0.5844155844155844


### Example #3: Classification between Neutral and Pleasant Events

In [89]:
# Dataset with neutral and pleasant events
data_NP = epochs_NP.get_data()
labels_NP = epochs_NP.events[:,-1]
train_data_NP, test_data_NP, labels_train_NP, labels_test_NP = train_test_split(data_NP, labels_NP, test_size=0.3, random_state=42)

In [92]:
# SVM
clf_svm_pip = make_pipeline(Vectorizer(), StandardScaler(), svm.SVC())
parameters = {'svc__kernel':['linear', 'rbf', 'sigmoid'], 'svc__C':[0.1, 1, 10]}
gs_cv_svm = GridSearchCV(clf_svm_pip, parameters, scoring='accuracy', cv=StratifiedKFold(n_splits=5), return_train_score=True)
gs_cv_svm.fit(train_data_NP, labels_train_NP)

print('Best Parameters: {}'.format(gs_cv_svm.best_params_))
print('Best Score: {}'.format(gs_cv_svm.best_score_))

# Make prediction
predictions_svm = gs_cv_svm.predict(test_data_NP)
report_svm = classification_report(labels_test_NP, predictions_svm, target_names=['Neutral', 'Pleasant'])
print('SVM Clasification Report:\n {}'.format(report_svm))
acc_svm = accuracy_score(labels_test_NP, predictions_svm)
print("Accuracy of SVM model: {}".format(acc_svm))



Best Parameters: {'svc__C': 10, 'svc__kernel': 'sigmoid'}
Best Score: 0.6132596685082873
SVM Clasification Report:
               precision    recall  f1-score   support

     Neutral       0.71      0.63      0.67        43
    Pleasant       0.60      0.69      0.64        35

    accuracy                           0.65        78
   macro avg       0.66      0.66      0.65        78
weighted avg       0.66      0.65      0.65        78

Accuracy of SVM model: 0.6538461538461539


In [93]:
#Logistic Regression
# The 'liblinear' solver supports both the l1 and l2 penalties searched below
clf_lr_pip = make_pipeline(Vectorizer(), StandardScaler(), LogisticRegression(solver='liblinear'))
parameters = {'logisticregression__penalty':['l1', 'l2']}
gs_cv_lr = GridSearchCV(clf_lr_pip, parameters, scoring='accuracy')
gs_cv_lr.fit(train_data_NP, labels_train_NP)

print('Best Parameters: {}'.format(gs_cv_lr.best_params_))
print('Best Score: {}'.format(gs_cv_lr.best_score_))

# Make prediction
predictions_lr = gs_cv_lr.predict(test_data_NP)
report_lr = classification_report(labels_test_NP, predictions_lr, target_names=['Neutral', 'Pleasant'])
print('LR Clasification Report:\n {}'.format(report_lr))
acc_lr = accuracy_score(labels_test_NP, predictions_lr)
print("Accuracy of LR model: {}".format(acc_lr))



Best Parameters: {'logisticregression__penalty': 'l1'}
Best Score: 0.6629834254143646
LR Clasification Report:
               precision    recall  f1-score   support

     Neutral       0.79      0.70      0.74        43
    Pleasant       0.68      0.77      0.72        35

    accuracy                           0.73        78
   macro avg       0.73      0.73      0.73        78
weighted avg       0.74      0.73      0.73        78

Accuracy of LR model: 0.7307692307692307


In [96]:
clf_lda_pip = make_pipeline(Vectorizer(), StandardScaler(), LinearDiscriminantAnalysis(solver='svd'))
clf_lda_pip.fit(train_data_NP,labels_train_NP)

predictions_lda = clf_lda_pip.predict(test_data_NP)
report_lda = classification_report(labels_test_NP, predictions_lda, target_names=['Neutral', 'Pleasant'])
print('LDA Clasification Report:\n {}'.format(report_lda))
acc_lda = accuracy_score(labels_test_NP, predictions_lda)
print("Accuracy of LDA model: {}".format(acc_lda))

LDA Clasification Report:
               precision    recall  f1-score   support

     Neutral       0.74      0.60      0.67        43
    Pleasant       0.60      0.74      0.67        35

    accuracy                           0.67        78
   macro avg       0.67      0.67      0.67        78
weighted avg       0.68      0.67      0.67        78

Accuracy of LDA model: 0.6666666666666666




In [97]:
print('accuracy of SVM = {0}\naccuracy of LR = {1}\naccuracy of LDA = {2}'.format(acc_svm, acc_lr, acc_lda))

accuracy of SVM = 0.6538461538461539
accuracy of LR = 0.7307692307692307
accuracy of LDA = 0.6666666666666666
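
To compare the three tasks side by side, the test accuracies printed in the comparison cells above can be collected in one structure. The sketch below copies those values (rounded to three decimals) and reports the best model per task; the names `results` and `best_per_task` are introduced here for illustration.

```python
# Test accuracies copied from the three comparison cells above (rounded).
results = {
    'Unpleasant vs Neutral': {'SVM': 0.720, 'LR': 0.707, 'LDA': 0.733},
    'Unpleasant vs Pleasant': {'SVM': 0.610, 'LR': 0.688, 'LDA': 0.584},
    'Neutral vs Pleasant': {'SVM': 0.654, 'LR': 0.731, 'LDA': 0.667},
}

# Pick the model with the highest test accuracy for each task.
best_per_task = {task: max(scores, key=scores.get)
                 for task, scores in results.items()}

for task, best in best_per_task.items():
    print(f"{task}: best = {best} ({results[task][best]:.3f})")
```

These numbers come from a single participant and a single train/test split of 75 to 78 test trials, so small differences between models should be read with caution.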
