In [1]:
import numpy as np
import mne
from mne.datasets import eegbci
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from mne.decoding import CSP

## 1. Setup for Hyperparameter Tuning

This notebook searches for good hyperparameters for our BCI model. We run the search on pooled data from a subset of subjects (20 here) rather than a single subject, so the selected parameters should be less tied to one person's brain patterns. The best combination found here will be used to train the final model.

In [2]:
# --- 1. Setup ---
# Define a small group of subjects to tune on
tuning_subjects = list(range(1, 21))  # Subjects 1-20
all_tuning_epochs = []
print(f"Loading data for {len(tuning_subjects)} subjects to run GridSearch...")

Loading data for 20 subjects to run GridSearch...


## 2. Data Loading and Preprocessing

As in the baseline notebook, we first load and prepare the data. The key difference here is that we loop over the tuning subjects and pool their epochs, so that the parameters we find are not overly specialized to a single person's brain patterns. The preprocessing steps (filtering and epoching) are kept identical to the baseline for consistency.
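
As a quick sanity check on what epoching produces, the sketch below works out the expected shape of a single epoch. It assumes the settings used in this notebook's preprocessing cell (`tmin=-0.5`, `tmax=3.5`, a 160 Hz sampling rate) and the 64 data channels reported in the log output; the helper `epoch_n_samples` is a hypothetical name for illustration, not an MNE function.

```python
def epoch_n_samples(tmin: float, tmax: float, sfreq: float) -> int:
    """Number of time points in a window that keeps both end points."""
    return int(round((tmax - tmin) * sfreq)) + 1


n_channels = 64  # assumption: taken from the "64 data channels" log line
n_times = epoch_n_samples(tmin=-0.5, tmax=3.5, sfreq=160.0)
print(f"Each epoch is a ({n_channels}, {n_times}) array")
```

The array passed to the grid search therefore has shape `(n_epochs, n_channels, n_times)`, which is the layout `CSP` expects.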

In [3]:
# --- 2. Load and Process Data for All Tuning Subjects ---
for subject_id in tuning_subjects:
    try:
        runs_lr = [4, 8, 12]
        runs_f = [6, 10, 14]
        fnames_lr = eegbci.load_data(subject_id, runs=runs_lr, verbose=False)
        fnames_f = eegbci.load_data(subject_id, runs=runs_f, verbose=False)

        raw_lr = mne.concatenate_raws([mne.io.read_raw_edf(f, preload=True, verbose=False) for f in fnames_lr])
        raw_f = mne.concatenate_raws([mne.io.read_raw_edf(f, preload=True, verbose=False) for f in fnames_f])

        def process_and_epoch(raw, event_id_map, event_id_labels):
            raw.filter(l_freq=8., h_freq=35., verbose=False)
            events, _ = mne.events_from_annotations(raw, event_id=event_id_map, verbose=False)
            epochs = mne.Epochs(raw, events, event_id_labels, tmin=-0.5, tmax=3.5, preload=True,
                                baseline=None, picks='eeg', verbose=False)
            epochs.resample(160., verbose=False)
            return epochs

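        # Runs 4/8/12: T1 = left fist, T2 = right fist
        epochs_lr = process_and_epoch(raw_lr, {'T1': 1, 'T2': 2}, {'left_fist': 1, 'right_fist': 2})
        # Runs 6/10/14: T2 = both feet (T1, both fists, is ignored here).
        # NOTE: 'both_feet' reuses code 2, so it is merged with 'right_fist' in the labels used later.
        # Give it a distinct code (e.g. 3) if a separate feet class is intended.
        epochs_f = process_and_epoch(raw_f, {'T2': 2}, {'both_feet': 2})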

        all_tuning_epochs.append(mne.concatenate_epochs([epochs_lr, epochs_f], verbose=False))
        print(f"  Successfully processed subject {subject_id}.")
    except Exception as e:
        print(f"  Skipping subject {subject_id} due to error: {e}")

  Successfully processed subject 1.
  Successfully processed subject 2.
  Successfully processed subject 3.
  Successfully processed subject 4.
  Successfully processed subject 5.
  Successfully processed subject 6.
  Successfully processed subject 7.
  Successfully processed subject 8.
  Successfully processed subject 9.
  Successfully processed subject 10.
  Successfully processed subject 11.
  Successfully processed subject 12.
  Successfully processed subject 13.
  Successfully processed subject 14.
  Successfully processed subject 15.
  Successfully processed subject 16.
  Successfully processed subject 17.
  Successfully processed subject 18.
  Successfully processed subject 19.
  Successfully processed subject 20.


## 3. Hyperparameter Tuning with GridSearchCV

This is the core of this notebook. The goal of **hyperparameter tuning** is to find the combination of model settings that yields the highest performance. A model is not just an algorithm; it has several "dials" or settings that need to be tuned for a specific dataset.

We use `GridSearchCV` from scikit-learn, which performs an exhaustive search over a specified parameter grid. For every possible combination of parameters, it trains and evaluates a model using cross-validation. Finally, it reports which combination performed the best.

We will test:
* **CSP Components**: How many spatial filters to use.
* **Classifier Type**: Comparing `LDA` vs. `SVC` (SVM).
* **SVM Parameters**: The `kernel`, regularization strength `C`, and kernel coefficient `gamma`.
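
To get a feel for the cost of an exhaustive search, the sketch below counts the candidates in a grid with the same structure as the one searched in this notebook. It is a plain-Python illustration: the classifiers are replaced by placeholder strings, and `count_candidates` is a hypothetical helper, not a scikit-learn function.

```python
from math import prod

csp_components = [4, 6, 8, 10]

# Same structure as the notebook's grid; classifier objects are replaced
# by placeholder strings so the example runs without scikit-learn.
param_grid = [
    {"CSP__n_components": csp_components, "Classifier": ["LDA"]},
    {"CSP__n_components": csp_components, "Classifier": ["SVC-linear"],
     "Classifier__C": [0.1, 1, 10]},
    {"CSP__n_components": csp_components, "Classifier": ["SVC-rbf"],
     "Classifier__C": [0.1, 1, 10], "Classifier__gamma": ["scale", 0.1]},
]


def count_candidates(grid):
    """Each sub-grid contributes the product of its option counts."""
    return sum(prod(len(values) for values in sub.values()) for sub in grid)


n_candidates = count_candidates(param_grid)  # 4 + 12 + 24
n_splits = 5                                 # folds used for cross-validation
print(n_candidates, n_candidates * n_splits)  # 40 200
```

Because the grid is a list of dictionaries, the sub-grids are searched separately rather than crossed with each other, which is why `gamma` is only varied for the RBF kernel.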

In [4]:
# --- 3. Run Grid Search on the Combined Data ---
if all_tuning_epochs:
    print("\nStarting GridSearchCV on the collected data...")
    # Combine all epochs from all tuning subjects
    final_tuning_epochs = mne.concatenate_epochs(all_tuning_epochs, verbose=False)

    labels = final_tuning_epochs.events[:, -1]
    data = final_tuning_epochs.get_data(copy=False)

    # Define the pipeline and parameter grid as before
    pipeline = Pipeline([('CSP', CSP(reg=None, log=True)), ('Classifier', LDA())])
    param_grid = [
        {'CSP__n_components': [4, 6, 8, 10], 'Classifier': [LDA()]},
        {'CSP__n_components': [4, 6, 8, 10], 'Classifier': [SVC(kernel='linear')], 'Classifier__C': [0.1, 1, 10]},
        {'CSP__n_components': [4, 6, 8, 10], 'Classifier': [SVC(kernel='rbf')], 'Classifier__C': [0.1, 1, 10], 'Classifier__gamma': ['scale', 0.1]}
    ]
    
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    grid_search = GridSearchCV(pipeline, param_grid, cv=cv, n_jobs=-1, verbose=1)
    grid_search.fit(data, labels)

    print("\n--- Grid Search Results ---")
    print(f"Best parameters found: {grid_search.best_params_}")
    print(f"Best cross-validation score on {len(all_tuning_epochs)} subjects: {grid_search.best_score_:.4f}")
else:
    print("\nNo data was processed, could not run GridSearch.")



Starting GridSearchCV on the collected data...
Fitting 5 folds for each of 40 candidates, totalling 200 fits

## 4. Tuning Results

The grid search has completed. The results below show the best-scoring combination of parameters and the mean 5-fold cross-validation accuracy it achieved on our multi-subject tuning dataset.

Best Parameters Found:

* **CSP Components**: 10
* **Classifier**: Support Vector Machine (`SVC`) with an RBF kernel
* **SVM C**: 10
* **SVM Gamma**: `'scale'`

Best Cross-Validation Score: 66.57%

These parameters will now be used to train the final, optimized model on the full training set.
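
An accuracy figure is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent class. The helper `majority_baseline` and the example label counts are hypothetical and are not taken from the tuning data. Since the preprocessing cell assigns code 2 to both `right_fist` and `both_feet`, the two classes are likely imbalanced, which makes this comparison worth running on the real labels.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical label vector: one third class 1, two thirds class 2.
example_labels = [1] * 30 + [2] * 60
baseline = majority_baseline(example_labels)
print(f"Majority-class baseline: {baseline:.2%}")
```

In this notebook the same function could be called as `majority_baseline(labels)` with the `labels` array from the grid search cell; a tuned model is only informative to the extent that it beats this number.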