In [None]:
import numpy as np
import mne
from mne.datasets import eegbci
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from pyriemann.estimation import Covariances
from pyriemann.tangentspace import TangentSpace

## 1. Setup for Hyperparameter Tuning

This notebook searches for the best hyperparameters for our BCI model. We run the search on data pooled from a group of subjects rather than from a single subject, so the selected parameters should generalize better across people. The best parameters found here will be used to train the final model.

In [None]:
# --- 1. Setup ---
# Define the group of subjects to tune on (subjects 1 through 20)
tuning_subjects = list(range(1, 21))
all_tuning_epochs = []
print(f"Loading data for {len(tuning_subjects)} subjects to run GridSearch...")

## 2. Data Loading and Preprocessing

As in the baseline notebook, we first need to load and prepare the data. The key difference here is that we loop over all tuning subjects and pool their epochs into one dataset, so that the parameters we find are not overly specialized to a single person's brain patterns. The preprocessing steps (filtering and epoching) remain identical to ensure consistency.

In [None]:
# --- 2. Load and Process Data for All Tuning Subjects ---
for subject_id in tuning_subjects:
    try:
        runs_lr = [4, 8, 12]  # motor imagery: left fist vs. right fist
        runs_f = [6, 10, 14]  # motor imagery: both fists vs. both feet
        fnames_lr = eegbci.load_data(subject_id, runs=runs_lr, verbose=False)
        fnames_f = eegbci.load_data(subject_id, runs=runs_f, verbose=False)

        raw_lr = mne.concatenate_raws([mne.io.read_raw_edf(f, preload=True, verbose=False) for f in fnames_lr])
        raw_f = mne.concatenate_raws([mne.io.read_raw_edf(f, preload=True, verbose=False) for f in fnames_f])

        def process_and_epoch(raw, event_id_map, event_id_labels):
            raw.filter(l_freq=8., h_freq=35., verbose=False)
            events, _ = mne.events_from_annotations(raw, event_id=event_id_map, verbose=False)
            epochs = mne.Epochs(raw, events, event_id_labels, tmin=-0.5, tmax=3.5, preload=True,
                                baseline=None, picks='eeg', verbose=False)
            epochs.resample(160., verbose=False)
            return epochs

        epochs_lr = process_and_epoch(raw_lr, {'T1': 1, 'T2': 2}, {'left_fist': 1, 'right_fist': 2})
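        # In the feet runs, T1 (both fists) is ignored. T2 (both feet) is mapped to code 3
        # so that it does not collide with 'right_fist' (code 2) after concatenation.
        epochs_f = process_and_epoch(raw_f, {'T2': 3}, {'both_feet': 3})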

        all_tuning_epochs.append(mne.concatenate_epochs([epochs_lr, epochs_f], verbose=False))
        print(f"  Successfully processed subject {subject_id}.")
    except Exception as e:
        print(f"  Skipping subject {subject_id} due to error: {e}")

In [None]:
# Combine all epochs from all tuning subjects into one object
final_tuning_epochs = mne.concatenate_epochs(all_tuning_epochs, verbose=False)

# Now, extract the data and labels
data = final_tuning_epochs.get_data()
labels = final_tuning_epochs.events[:, -1]

## 3. The Riemannian Geometry Pipeline

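This pipeline replaces the traditional CSP feature extractor with an approach based on Riemannian geometry. Instead of learning spatial filters, it works directly with each epoch's covariance matrix and accounts for the curved geometry of the space these matrices live in, which often yields better classification performance on EEG.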

### Step 1: Covariance Matrices
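For each epoch, we compute a **covariance matrix**. This matrix is a compact representation of the spatial information in the EEG channels, i.e., how the signal from each electrode relates to every other electrode. The pipeline below uses the Ledoit-Wolf shrinkage estimator (`estimator='lwf'`), a regularized variant of the sample covariance. For a zero-mean epoch $X$ (a reasonable approximation after band-pass filtering), the sample covariance is: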

$$ C = \frac{1}{n-1} X X^T $$

- $X$ is the EEG data for one trial (channels x time points).
- $n$ is the number of time points.
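
As a minimal sketch of this formula, the NumPy snippet below computes the covariance matrix of one synthetic epoch. The epoch shape (64 channels by 640 samples) and the random data are assumptions chosen for illustration only. The channels are de-meaned first so that the formula agrees with `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical epoch: 64 channels x 640 time points of random data
n_channels, n_times = 64, 640
X = rng.standard_normal((n_channels, n_times))

# The formula assumes zero-mean channels, so remove each channel's mean
X = X - X.mean(axis=1, keepdims=True)

# C = X X^T / (n - 1), where n is the number of time points
C = (X @ X.T) / (n_times - 1)

print(C.shape)                    # (64, 64)
print(np.allclose(C, C.T))        # checks that C is symmetric
print(np.allclose(C, np.cov(X)))  # checks agreement with NumPy's estimator
```

Note that this is the plain sample covariance, not the shrinkage estimate that `Covariances(estimator='lwf')` produces.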

### Step 2: Tangent Space Projection
Covariance matrices are symmetric positive-definite, and the space of such matrices is not a "flat" Euclidean space; it's a curved manifold. To use standard classifiers like SVM, we project each matrix onto the tangent space at a reference point (the mean covariance matrix), which is a flat vector space. This **Tangent Space** projection turns each matrix into an ordinary feature vector that a standard SVM can classify. This is the key feature extraction step of the pipeline.
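
The projection can be sketched in plain NumPy as follows. This is an illustrative sketch, not pyRiemann's implementation: the helper names `_sym_apply` and `tangent_vector` are hypothetical, the matrices are random, and the reference point is the arithmetic mean for simplicity, whereas `TangentSpace(metric='riemann')` uses the Riemannian mean.

```python
import numpy as np

def _sym_apply(mat, func):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * func(eigvals)) @ eigvecs.T

def tangent_vector(cov, ref):
    """Map an SPD matrix to a vector in the tangent space at `ref`."""
    # Whiten with the reference: ref^{-1/2} @ cov @ ref^{-1/2}
    ref_inv_sqrt = _sym_apply(ref, lambda w: 1.0 / np.sqrt(w))
    whitened = ref_inv_sqrt @ cov @ ref_inv_sqrt
    whitened = (whitened + whitened.T) / 2  # remove round-off asymmetry
    # The matrix logarithm flattens the manifold around the reference
    log_map = _sym_apply(whitened, np.log)
    # Keep the upper triangle; off-diagonal terms are scaled by sqrt(2)
    # so that the vector norm equals the Frobenius norm of log_map
    rows, cols = np.triu_indices(cov.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_map[rows, cols]

# Hypothetical data: five random 4x4 SPD matrices
rng = np.random.default_rng(0)
n_channels = 4
covs = []
for _ in range(5):
    A = rng.standard_normal((n_channels, 50))
    covs.append(A @ A.T / 49)
covs = np.array(covs)

ref = covs.mean(axis=0)  # simplification: arithmetic mean as reference
features = np.array([tangent_vector(c, ref) for c in covs])
print(features.shape)  # (5, 10)
```

Each $c \times c$ matrix becomes a vector of length $c(c+1)/2$, so feature dimensionality grows quadratically with the number of channels.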

## 4. Hyperparameter Tuning with GridSearchCV

Now we will search for the optimal settings for the SVM classifier that follows the Riemannian feature extraction. We use `GridSearchCV` to perform an exhaustive search over a grid of parameters, using cross-validation to find the combination that yields the highest accuracy.

We will test:
* **SVM Kernel**: Comparing a `linear` kernel to a non-linear `rbf` kernel.
* **Regularization Strength (C)**: How much to penalize misclassified points.
* **Kernel Coefficient (gamma)**: The influence of a single training example. It only affects the `rbf` kernel and is ignored by the `linear` kernel.
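
To get a feel for the cost of an exhaustive search, the sketch below counts the combinations using only the standard library. The candidate values are the ones this notebook searches over, and the fold count of 5 matches the cross-validation used here. The snippet only counts configurations and does not train anything.

```python
from itertools import product

kernels = ['rbf', 'linear']
C_values = [0.1, 1, 10, 100]
gammas = ['scale', 'auto', 0.1]

# Every combination a full Cartesian grid would try
full_grid = list(product(kernels, C_values, gammas))

# gamma has no effect on a linear kernel, so collapse those duplicates
distinct = {(k, c, g if k == 'rbf' else None) for k, c, g in full_grid}

n_folds = 5  # 5-fold cross-validation
print(len(full_grid))            # 24
print(len(distinct))             # 16
print(len(full_grid) * n_folds)  # 120
```

A full Cartesian grid therefore fits 120 models, while only 16 of the 24 settings are actually distinct. Searching `gamma` for the `rbf` kernel alone avoids the redundant fits.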

In [None]:
riemann_pipeline = Pipeline([
    # Step 1: Calculate the covariance matrix for each epoch
    ('Covariances', Covariances(estimator='lwf')),
    # Step 2: Project the covariance matrices onto the tangent space
    ('TangentSpace', TangentSpace(metric='riemann')),
    # Step 3: Classify the resulting feature vectors with an SVM
    # (placeholder settings; the grid search sets kernel, C, and gamma)
    ('Classifier', SVC())
])


In [None]:
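param_grid = [
    # gamma is ignored by the linear kernel, so only C is searched for it
    {'Classifier__kernel': ['linear'],
     'Classifier__C': [0.1, 1, 10, 100]},
    # For the rbf kernel, search over both C and gamma
    {'Classifier__kernel': ['rbf'],
     'Classifier__C': [0.1, 1, 10, 100],
     'Classifier__gamma': ['scale', 'auto', 0.1]},
]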

In [None]:
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid_search = GridSearchCV(riemann_pipeline, param_grid, cv=cv, n_jobs=-1, verbose=1)
grid_search.fit(data, labels)

In [None]:
print(f"Best parameters for Riemannian pipeline: {grid_search.best_params_}")
print(f"Best score: {grid_search.best_score_:.4f}")

## 5. Tuning Results

The grid search has completed. The results show the optimal combination of parameters for the Riemannian pipeline and the average cross-validation score achieved on our multi-subject tuning dataset.

* **Best Parameters Found**:
    * **Classifier**: Support Vector Machine (SVC) with a **Linear kernel**
    * **SVM C**: 1
* **Best Cross-Validation Score**: **77.28%** (mean accuracy over the 5 folds)

This is an improvement over our previous models. These parameters will now be used to train the final model on the full training set.