## Building a filterbank model with alignment

This notebook is based on the [MNE example](https://mne.tools/dev/auto_examples/decoding/decoding_csp_eeg.html) and illustrates how to build filterbank models that include alignment steps. Here, we perform cross-subject classification.

First, we load the data of two subjects from the EEGBCI dataset: one for training, which we refer to as the source subject, and one for testing, the target subject.

In [1]:
import numpy as np
import pandas as pd

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

import mne
from mne import Epochs, pick_types, events_from_annotations
from mne.io import concatenate_raws, read_raw_edf
from mne.datasets import eegbci

from coffeine import compute_coffeine, make_filter_bank_classifier

In [2]:
mne.set_log_level('critical')
pd.set_option("large_repr", "info")

In [3]:
tmin, tmax = -1.0, 4.0
event_id = dict(hands=2, feet=3)
subject_source = 1
subject_target = 5
runs = [6, 10, 14]  # motor imagery: hands vs feet
raw_fnames_source = eegbci.load_data(subject_source, runs)
raw_fnames_target = eegbci.load_data(subject_target, runs)
raw_source = concatenate_raws([read_raw_edf(f, preload=True) for f in raw_fnames_source])
raw_target = concatenate_raws([read_raw_edf(f, preload=True) for f in raw_fnames_target])
eegbci.standardize(raw_source)  # set channel names
eegbci.standardize(raw_target)  # set channel names

In [4]:
# Apply band-pass filter
raw_source.filter(4.0, 35.0, fir_design="firwin", skip_by_annotation="edge")
raw_target.filter(4.0, 35.0, fir_design="firwin", skip_by_annotation="edge")

events_source, _ = events_from_annotations(raw_source, event_id=dict(T1=2, T2=3))
events_target, _ = events_from_annotations(raw_target, event_id=dict(T1=2, T2=3))
picks_source = pick_types(raw_source.info, meg=False, eeg=True, stim=False, eog=False, exclude="bads")
picks_target = pick_types(raw_target.info, meg=False, eeg=True, stim=False, eog=False, exclude="bads")

# Read epochs from tmin to tmax around each cue, without baseline correction
epochs_source = Epochs(
    raw_source,
    events_source,
    event_id,
    tmin,
    tmax,
    proj=True,
    picks=picks_source,
    baseline=None,
    preload=True,
)
epochs_target = Epochs(
    raw_target,
    events_target,
    event_id,
    tmin,
    tmax,
    proj=True,
    picks=picks_target,
    baseline=None,
    preload=True,
)

labels_source = epochs_source.events[:, -1] - 2
labels_target = epochs_target.events[:, -1] - 2
conditions = ['hands', 'feet']  # index matches label value: 0 = hands, 1 = feet

## Building separate coffeine dataframes for source and target data

For each subject, covariances are computed in pre-defined frequency bands and stored in a dataframe with one column per frequency band and one row per epoch. Each cell of the dataframe holds a covariance matrix.

In [5]:
X_df_source, feature_info_source = compute_coffeine(epochs_source, frequencies=('ipeg', ['alpha1', 'alpha2']))
X_df_target, feature_info_target = compute_coffeine(epochs_target, frequencies=('ipeg', ['alpha1', 'alpha2']))
X_df_source.head()
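
To make the content of a dataframe cell concrete, the sketch below band-pass filters synthetic epochs and computes one covariance matrix per epoch and band. It is not the implementation used by `compute_coffeine`: the helper `band_covariances`, the Butterworth filter, and the band edges are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def band_covariances(epochs_data, sfreq, bands, order=4):
    """Return {band name: covariances of shape (n_epochs, n_channels, n_channels)}."""
    covs = {}
    for name, (fmin, fmax) in bands.items():
        # Zero-phase band-pass filter along the time axis.
        sos = butter(order, [fmin, fmax], btype="bandpass", fs=sfreq, output="sos")
        filtered = sosfiltfilt(sos, epochs_data, axis=-1)
        # Remove the per-channel mean, then compute the sample covariance per epoch.
        filtered = filtered - filtered.mean(axis=-1, keepdims=True)
        n_times = filtered.shape[-1]
        covs[name] = filtered @ filtered.transpose(0, 2, 1) / (n_times - 1)
    return covs


rng = np.random.default_rng(0)
# Synthetic stand-in for epochs: 5 epochs, 4 channels, 5 s sampled at 160 Hz.
epochs_data = rng.standard_normal((5, 4, 800))
# Assumed band edges in Hz, for illustration only.
bands = {"alpha1": (8.5, 10.5), "alpha2": (10.5, 12.5)}
covs = band_covariances(epochs_data, sfreq=160.0, bands=bands)
print({name: c.shape for name, c in covs.items()})
```

Each entry of `covs` plays the role of one dataframe column: a stack of symmetric positive definite matrices, one per epoch.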

## Comparing classification accuracy with and without alignment

We first construct a model without alignment steps, fit it on the source subject, and score it on the target subject.

In [6]:
fb_model = make_filter_bank_classifier(
    names=list(X_df_source.columns),
    method='riemann',
    projection_params=dict(scale=1, n_compo=60, reg=0),
    estimator=LogisticRegression(solver='liblinear', C=1e7)
)
fb_model.fit(X_df_source, labels_source)
score = fb_model.score(X_df_target, labels_target)

In [7]:
score

0.5333333333333333

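Next, we build the same model with re-centering and re-scaling alignment steps. The domain of every sample is passed to `fit` and `score` so that source and target data can be aligned separately.

In [8]: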
fb_model = make_filter_bank_classifier(
    names=list(X_df_source.columns),
    method='riemann',
    alignment=['re-center', 're-scale'],
    # domains=['source']*X_df_source.shape[0],
    projection_params=dict(scale=1, n_compo=60, reg=0),
    estimator=LogisticRegression(solver='liblinear', C=1e7)
)
fb_model.fit(X_df_source, labels_source, domains=['source']*X_df_source.shape[0])
score = fb_model.score(X_df_target, labels_target, domains=['target_domain']*X_df_target.shape[0])

In [9]:
score

0.6222222222222222
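
As a rough intuition for what a re-centering step does, the sketch below whitens each covariance matrix by the mean covariance of its own domain, so that every domain ends up centered on the identity. This is a simplified stand-in and not coffeine's implementation: the helpers `inv_sqrtm`, `recenter`, and `random_spd` are hypothetical, and the arithmetic mean is used here where Riemannian pipelines would typically use a geometric mean.

```python
import numpy as np


def inv_sqrtm(spd):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    return (eigvecs / np.sqrt(eigvals)) @ eigvecs.T


def recenter(covs, domains):
    """Whiten each covariance by the mean covariance of its own domain."""
    covs = np.asarray(covs, dtype=float)
    domains = np.asarray(domains)
    aligned = np.empty_like(covs)
    for domain in np.unique(domains):
        mask = domains == domain
        # Simplification: arithmetic mean instead of a Riemannian (geometric) mean.
        whitener = inv_sqrtm(covs[mask].mean(axis=0))
        aligned[mask] = whitener @ covs[mask] @ whitener
    return aligned


rng = np.random.default_rng(0)


def random_spd(n, dim, scale):
    """Draw n random SPD matrices; `scale` mimics a domain-specific amplitude."""
    a = scale * rng.standard_normal((n, dim, 2 * dim))
    return a @ a.transpose(0, 2, 1) / (2 * dim)


# Two synthetic domains whose covariances differ strongly in scale.
covs = np.concatenate([random_spd(6, 3, 1.0), random_spd(4, 3, 5.0)])
domains = ["source"] * 6 + ["target_domain"] * 4
aligned = recenter(covs, domains)
print(np.round(aligned[:6].mean(axis=0), 3))
print(np.round(aligned[6:].mean(axis=0), 3))
```

Because the mean is estimated per domain, target samples need their own domain label at prediction time, which matches the `domains` argument passed to `score` above.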

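Finally, we evaluate the aligned model when part of the target data is available for training: over 10 random splits, 20% of the target epochs are added to the source training set and the remaining 80% are used for testing.

In [13]: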
cv = ShuffleSplit(10, test_size=0.8, random_state=42)
scores = []
for train_index, test_index in cv.split(X_df_target):
    X_df_target_train = X_df_target.iloc[train_index]
    labels_target_train = labels_target[train_index]
    X_df_target_test = X_df_target.iloc[test_index]
    labels_target_test = labels_target[test_index]
    X_df_train = pd.concat([X_df_source, X_df_target_train])
    y_train = np.concatenate([labels_source, labels_target_train])
    domains = ['source']*X_df_source.shape[0] + ['target_domain']*X_df_target_train.shape[0]
    fb_model.fit(X_df_train, y_train, domains=domains)
    scores.append(fb_model.score(X_df_target_test, labels_target_test,
                                 domains=['target_domain']*X_df_target_test.shape[0]))
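
The effect of `test_size=0.8` on the split sizes can be checked in isolation. The sketch below runs the same `ShuffleSplit` configuration on a placeholder array; the epoch count of 45 is an assumption that is consistent with the scores reported in this notebook, not a value read from the data.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

n_target = 45  # assumed number of target epochs, for illustration
cv = ShuffleSplit(10, test_size=0.8, random_state=42)

# Only the number of rows matters for the split, so a placeholder array suffices.
placeholder = np.zeros((n_target, 1))
split_sizes = [(len(train), len(test)) for train, test in cv.split(placeholder)]
print(split_sizes[0])
```

Under this assumption, each split adds 9 target epochs to the training set and holds out 36 for testing, so the target subject contributes only a small calibration set.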

In [14]:
print(f'Mean classification accuracy: {np.mean(scores):0.2f}')

Mean classification accuracy: 0.61


In [15]:
scores

[0.75,
 0.5833333333333334,
 0.6388888888888888,
 0.6666666666666666,
 0.4722222222222222,
 0.5833333333333334,
 0.6388888888888888,
 0.6111111111111112,
 0.5,
 0.6944444444444444]
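
Since the scores vary considerably across splits, it can help to report their spread alongside the mean. The sketch below copies the scores printed above and summarizes them; using the sample standard deviation (`ddof=1`) is one possible choice, not something prescribed by the notebook.

```python
import numpy as np

# Scores copied from the output above.
scores = np.array([
    0.75, 0.5833333333333334, 0.6388888888888888, 0.6666666666666666,
    0.4722222222222222, 0.5833333333333334, 0.6388888888888888,
    0.6111111111111112, 0.5, 0.6944444444444444,
])

mean = scores.mean()
std = scores.std(ddof=1)  # sample standard deviation across the 10 splits
print(f"accuracy: {mean:0.2f} +/- {std:0.2f} "
      f"(min {scores.min():0.2f}, max {scores.max():0.2f})")
```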