## EEG Feature Extraction and Classification Using FB-CCA

This notebook focuses on EEG feature extraction and classification using Filter Bank Canonical Correlation Analysis (FB-CCA), a method particularly effective in steady-state visual evoked potential (SSVEP) brain-computer interfaces (BCIs).

### Why Use Temporal Features (like CCA) Instead of Spectral Features?
While traditional spectral methods (e.g., power spectral density) analyze EEG frequency components independently, temporal feature extraction methods like Canonical Correlation Analysis (CCA) leverage correlations between EEG signals and predefined reference signals across multiple channels. This approach effectively captures spatial and temporal coherence patterns inherent in EEG data, providing robustness against noise and improving discriminative performance.
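
To make the idea concrete, here is a minimal NumPy sketch of CCA between a multichannel EEG segment and sine/cosine reference signals. It is an illustration only, not the `FeatureExtractor` used later in this notebook: the helper names `make_reference` and `max_canonical_corr`, the synthetic 8-channel signal, and the candidate frequency list are all assumptions made for the example.

```python
import numpy as np

def make_reference(freq, sfreq, n_samples, num_harmonics=2):
    """Sine/cosine references for one stimulus frequency.

    Returns an array of shape (n_samples, 2 * num_harmonics).
    """
    t = np.arange(n_samples) / sfreq
    refs = []
    for h in range(1, num_harmonics + 1):
        refs.append(np.sin(2 * np.pi * h * freq * t))
        refs.append(np.cos(2 * np.pi * h * freq * t))
    return np.column_stack(refs)

def max_canonical_corr(X, Y):
    """Largest canonical correlation between the columns of X and Y.

    Rows are time samples. Uses the QR/SVD formulation: after centering,
    the canonical correlations are the singular values of Qx.T @ Qy.
    Assumes both matrices have full column rank.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))

# Synthetic example (hypothetical data): 8 channels, 4 s at 256 Hz,
# a 10 Hz response mixed into every channel plus white noise.
rng = np.random.default_rng(0)
sfreq, n_samples = 256, 4 * 256
t = np.arange(n_samples) / sfreq
mixing = rng.normal(size=8)
eeg = np.outer(np.sin(2 * np.pi * 10 * t + 0.3), mixing)
eeg += rng.normal(size=(n_samples, 8))

candidates = [6, 8, 10, 12, 14]
scores = [max_canonical_corr(eeg, make_reference(f, sfreq, n_samples))
          for f in candidates]
best = candidates[int(np.argmax(scores))]
print(dict(zip(candidates, np.round(scores, 3))), "->", best)
```

Because CCA learns a weighted combination of channels, the 10 Hz component is recovered even though no single channel is selected by hand.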

### What is FB-CCA and Why Might it Work?
Filter Bank CCA (FB-CCA) extends standard CCA by decomposing the EEG into multiple sub-bands with a bank of bandpass filters. CCA is run separately on each sub-band, and the resulting correlations are combined, typically as a weighted sum of squared correlations. Because SSVEP responses carry energy at harmonics of the stimulus frequency, this multiband approach uses harmonic information that single-band CCA can underweight. This tends to improve classification accuracy, especially with noisy recordings or large differences between subjects.
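
The combination step can be sketched in a few lines of NumPy. One common formulation weights the squared correlation of sub-band `n` by `w(n) = n**(-a) + b`, so lower sub-bands count more. The values `a=1.25` and `b=0.25` are often-quoted defaults and are assumptions here, as are the function names and the toy correlation values; the notebook's own `FeatureExtractor` may combine sub-bands differently.

```python
import numpy as np

def fbcca_weights(n_bands, a=1.25, b=0.25):
    """Sub-band weights w(n) = n**(-a) + b for n = 1..n_bands (assumed defaults)."""
    n = np.arange(1, n_bands + 1, dtype=float)
    return n ** (-a) + b

def combine_subbands(rho, a=1.25, b=0.25):
    """Combine sub-band correlations into one score per stimulus frequency.

    rho: array of shape (n_bands, n_freqs) holding the CCA correlation of
    each sub-band with each candidate frequency.
    Returns an array of shape (n_freqs,).
    """
    rho = np.asarray(rho, dtype=float)
    weights = fbcca_weights(rho.shape[0], a, b)
    return weights @ rho ** 2

# Toy correlations (hypothetical numbers): 3 sub-bands x 3 candidate frequencies.
rho_demo = [
    [0.30, 0.55, 0.35],
    [0.20, 0.50, 0.25],
    [0.15, 0.40, 0.45],
]
combined = combine_subbands(rho_demo)
predicted_index = int(np.argmax(combined))
print(np.round(combined, 3), "-> candidate index", predicted_index)
```

The predicted class is then the candidate frequency with the largest combined score, exactly as in plain CCA but with the sub-band evidence pooled first.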

### Dataset and Experimental Setup
The dataset used here currently includes EEG recordings from 11 subjects under SSVEP stimulation. Each subject's EEG data is segmented into epochs corresponding to specific stimulus frequencies, allowing evaluation of the FB-CCA method's effectiveness in classifying these frequencies. The experimental details, including stimulus conditions and EEG recording protocols, are described in a separate documentation file.

This notebook specifically demonstrates the preprocessing, FB-CCA feature extraction, and classification workflows, emphasizing the temporal coherence captured by FB-CCA.

### Classification Methods Compared
Initially, we will classify EEG trials simply by selecting the stimulus frequency that yields the highest CCA correlation coefficient per trial. This baseline approach will be compared with training a simple classifier (KNN) on the extracted CCA features. Although KNN might not be the most efficient classifier, it is chosen here for its simplicity and minimal hyperparameter tuning requirements, which lets us focus on the feature extraction method instead. Finally, the performance of these methods will be compared against the enhanced FB-CCA approach.

In [None]:
from dataset import EEGDataset

data_path = r"path\to\your\dataset"

# Initialize dataset loader
dataset = EEGDataset(data_path)
print(dir(dataset))  # This should list all available methods
# Load all subjects
dataset.load_all_subjects()
print("Subjects loaded:", dataset.raw_data.keys())

# After loading the data, we preprocess it with the following steps:

 - EEG data re-referencing using channels A1 and A2 (mastoid references) to reduce noise and common artifacts.
 - Filtering and resampling EEG signals to a standard sampling frequency (256 Hz) to ensure consistency.
 - Removing irrelevant frequencies and noise through bandpass filtering.
 - Segmenting the EEG data into epochs corresponding to each stimulation event, specifically focusing on 4-second intervals from stimulus onset for subsequent analysis.
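
Two of these steps, re-referencing and epoching, can be illustrated with plain NumPy arrays. This is a simplified stand-in and not the `EEGPreprocessor` implementation: the function names, the four-channel layout, and the onset sample indices are hypothetical, and filtering and resampling are left out.

```python
import numpy as np

def rereference_to_mastoids(data, ch_names, refs=("A1", "A2")):
    """Subtract the mean of the reference channels from every channel.

    data: array of shape (n_channels, n_samples).
    """
    ref_idx = [ch_names.index(ch) for ch in refs]
    return data - data[ref_idx].mean(axis=0, keepdims=True)

def cut_epochs(data, onsets, sfreq, tmin=0.0, tmax=4.0):
    """Slice fixed-length windows relative to each stimulus onset.

    onsets: sample indices of stimulus onsets.
    Returns an array of shape (n_epochs, n_channels, n_window_samples).
    Events whose window would run past the recording are skipped.
    """
    start_offset = int(round(tmin * sfreq))
    length = int(round((tmax - tmin) * sfreq))
    epochs = []
    for onset in onsets:
        start = onset + start_offset
        if start < 0 or start + length > data.shape[1]:
            continue
        epochs.append(data[:, start:start + length])
    if not epochs:
        return np.empty((0, data.shape[0], length))
    return np.stack(epochs)

# Hypothetical 10 s recording at 256 Hz with two occipital and two mastoid channels.
rng = np.random.default_rng(1)
ch_names = ["O1", "O2", "A1", "A2"]
raw_demo = rng.normal(size=(4, 10 * 256))

reref = rereference_to_mastoids(raw_demo, ch_names)
epochs_demo = cut_epochs(reref, onsets=[256, 1280, 2000], sfreq=256)
print(epochs_demo.shape)  # the third onset is too close to the end and is dropped
```

After re-referencing, the two mastoid channels sum to zero at every sample, which is a quick sanity check that the reference was applied.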


In [None]:
from preprocessing import EEGPreprocessor

# Initialize preprocessing
preprocessor = EEGPreprocessor(sfreq=256, l_freq=6.0, h_freq=80.0, invalid_keys=['100', '99', '36'])

# Preprocess EEG data
epochs_data = {}
for subject, raw in dataset.raw_data.items():
    epochs = preprocessor.create_epochs_from_raw(raw, tmin=0, tmax=4)

    if epochs:  # Only add epochs if valid
        epochs_data[subject] = epochs

# Feature Extraction and Label Standardization

In this step, we:

- Ensure consistent labeling across all subjects by creating a standardized mapping based on the first subject's event labels.
- Extract CCA features from EEG epochs for each subject. The extractor below is configured with `num_harmonics=2`; increasing the number of harmonics could potentially enhance classification accuracy in other contexts.


In [None]:
from feature_extraction import FeatureExtractor
from classifier import EEGClassifier
import numpy as np



# Storage for features and labels per subject
features_by_subject = {}
labels_by_subject = {}

# Define stimulation frequencies
stim_frequencies = [6, 8, 10, 12, 14, 20, 25, 30, 0.1]  # Stimulation flicker frequencies in Hz; 0.1 stands in for the non-stimulus class

# Get standardized event mapping from the first subject
first_subject = next(iter(epochs_data.keys()))
first_subject_labels = epochs_data[first_subject].events[:, -1]  # Extract labels from the first subject
unique_labels_first_subject = np.unique(first_subject_labels)

# Create a mapping: {original label → standardized label (stimulation frequency)}
label_mapping = {orig_label: stim_frequencies[idx] for idx, orig_label in enumerate(unique_labels_first_subject)}

print("Standardized Label Mapping (from first subject):", label_mapping)

# Initialize feature extractor (method="CCA" with num_harmonics=2)
feature_extractor = FeatureExtractor(method="CCA", sfreq=256, num_harmonics=2)

for subject, epochs in epochs_data.items():
    if epochs is None:
        continue

    X = feature_extractor.extract_features(epochs, stim_frequencies)
    y = epochs.events[:, -1]

    features_by_subject[subject] = X
    labels_by_subject[subject] = y

    print(f"Subject: {subject} | Features shape: {X.shape} | Labels: {y[:5]}")





Standardized Label Mapping (from first subject): {1: 6, 2: 8, 3: 10, 4: 12, 5: 14, 6: 20, 7: 25, 8: 30, 9: 0.1}
Subject: S1 | Features shape: (182, 9) | Labels: [9 2 1 6 8]
Subject: S10 | Features shape: (184, 9) | Labels: [3 2 5 7 4]
Subject: S11 | Features shape: (183, 9) | Labels: [9 6 2 1 3]
Subject: S2 | Features shape: (182, 9) | Labels: [4 1 2 9 3]
Subject: S3 | Features shape: (182, 9) | Labels: [9 6 2 1 3]
Subject: S4 | Features shape: (183, 9) | Labels: [2 4 5 9 1]
Subject: S5 | Features shape: (185, 9) | Labels: [3 6 4 1 2]
Subject: S6 | Features shape: (185, 9) | Labels: [9 2 1 6 8]
Subject: S7 | Features shape: (180, 9) | Labels: [6 3 7 8 5]
Subject: S8 | Features shape: (183, 9) | Labels: [2 9 8 3 6]
Subject: S9 | Features shape: (184, 9) | Labels: [4 9 2 6 5]


### Classification Results and Discussion

We now evaluate the performance of two classification methods on our EEG data:

**1. Max CCA Feature Selection**

In this unsupervised approach, each EEG trial is independently classified by selecting the stimulus frequency with the highest CCA correlation coefficient. This simple method achieves an accuracy of **53.3%**, well above the roughly 11% chance level for nine classes, but with several obvious limitations:

- **6 Hz** stimulation trials frequently show stronger responses at the second harmonic (**12 Hz**), causing misclassification.
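- A large share of trials from one class is mislabeled as **10 Hz**: in the Max-CCA confusion matrix, 116 trials of the seventh class (**25 Hz** under the label mapping above) fall in the 10 Hz column, whereas **20 Hz** trials are mostly classified correctly (152 of 221).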
- Many trials are incorrectly labeled as **non-stimuli (0.1 Hz)** due to unexpectedly high correlation with baseline reference signals.
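
The decision rule itself is a per-trial `argmax` over the feature columns. The sketch below uses made-up correlation values for three trials and three candidate frequencies; `predict_max_corr` is a hypothetical helper, not the `classify_max_cca` method called later. The first toy trial mimics the 6 Hz to 12 Hz confusion described above.

```python
import numpy as np

def predict_max_corr(features, stim_frequencies):
    """Pick, for each trial, the frequency whose CCA correlation is largest.

    features: array of shape (n_trials, n_freqs), one column per candidate.
    """
    features = np.asarray(features, dtype=float)
    freqs = np.asarray(stim_frequencies, dtype=float)
    return freqs[np.argmax(features, axis=1)]

# Toy values (not taken from the dataset).
toy_freqs = [6, 12, 0.1]
toy_features = [
    [0.40, 0.60, 0.10],  # true 6 Hz, but the 12 Hz reference correlates more
    [0.70, 0.20, 0.10],  # true 6 Hz, classified correctly
    [0.20, 0.30, 0.50],  # true non-stimulus (0.1 placeholder)
]
toy_true = np.array([6, 6, 0.1])

toy_pred = predict_max_corr(toy_features, toy_freqs)
toy_accuracy = float(np.mean(toy_pred == toy_true))
print(toy_pred, toy_accuracy)
```

No training data is involved, which is why this rule cannot learn that a strong 12 Hz correlation is sometimes evidence for 6 Hz stimulation.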

**2. KNN Classification (Leave-Subject-Out Cross-Validation)**

This supervised approach trains a KNN classifier on EEG data from all subjects except one, who serves as the test set, and repeats this for every subject. This simulates the real-world scenario in which no data is yet available from a new BCI user. Although KNN is a simple classifier chosen for ease of use and minimal hyperparameter tuning, it raises overall classification accuracy from 53.3% to **63.7%**. Notably:

- The number of trials misclassified as harmonics or non-stimuli is reduced.
- Some subjects achieve individual accuracies exceeding **86%**, highlighting the effectiveness of supervised learning even with simple classifiers.
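
The leave-one-subject-out loop can be sketched with NumPy alone. This is a hedged illustration rather than the `EEGClassifier.classify_knn` implementation: `k=5`, Euclidean distance, majority voting, and the synthetic three-subject data are all assumptions made for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote among the k nearest training trials (Euclidean distance)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for neighbour_labels in y_train[nearest]:
        values, counts = np.unique(neighbour_labels, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

def leave_one_subject_out(features_by_subject, labels_by_subject, k=5):
    """Train on all subjects but one, test on the held-out subject, repeat."""
    accuracies = {}
    for test_subject in features_by_subject:
        train = [s for s in features_by_subject if s != test_subject]
        X_train = np.vstack([features_by_subject[s] for s in train])
        y_train = np.concatenate([labels_by_subject[s] for s in train])
        preds = knn_predict(X_train, y_train, features_by_subject[test_subject], k)
        accuracies[test_subject] = float(
            np.mean(preds == labels_by_subject[test_subject])
        )
    return accuracies

# Synthetic data (hypothetical): 3 subjects, 3 classes, 20 trials per class.
# Each trial has one feature per class; the true class gets a higher value.
rng = np.random.default_rng(0)
features_demo, labels_demo = {}, {}
for subject in ["S1", "S2", "S3"]:
    y = np.repeat(np.arange(3), 20)
    X = 0.2 + 0.05 * rng.normal(size=(y.size, 3))
    X[np.arange(y.size), y] += 0.4
    features_demo[subject] = X
    labels_demo[subject] = y

loso_accuracy = leave_one_subject_out(features_demo, labels_demo, k=5)
print(loso_accuracy)
```

Keeping every trial of the test subject out of the training set is what makes the estimate representative of a new user; shuffling trials across subjects before splitting would leak subject-specific patterns.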


In [5]:
# Initialize classifier
classifier = EEGClassifier()

# Method 1: Max CCA feature selection
accuracy_maxcca, conf_matrix_maxcca = classifier.classify_max_cca(features_by_subject, labels_by_subject, stim_frequencies)

# Method 2: KNN classification
accuracy_knn, conf_matrix_knn = classifier.classify_knn(features_by_subject, labels_by_subject)

print("\nFinal Results:")
print(f"Max-CCA Classification Accuracy: {accuracy_maxcca:.4f}")
print(f"KNN Classification Accuracy: {accuracy_knn:.4f}")


Classifying subject S1 with 182 trials.
Classifying subject S10 with 184 trials.
Classifying subject S11 with 183 trials.
Classifying subject S2 with 182 trials.
Classifying subject S3 with 182 trials.
Classifying subject S4 with 183 trials.
Classifying subject S5 with 185 trials.
Classifying subject S6 with 185 trials.
Classifying subject S7 with 180 trials.
Classifying subject S8 with 183 trials.
Classifying subject S9 with 184 trials.

Confusion Matrix (Max-CCA):
[[ 71   7   4 123   2   0   0   0  12]
 [ 13 172  12  27   8   5   1   1  19]
 [  7   3 176   7   4  12   0   0  11]
 [  6   1   4 192   3   0   0   0  14]
 [  8   4   4   5 183   1   0   1  12]
 [  6   5   9  18   5 152   2   1  23]
 [  5   4 116  17   1  28  29   2  17]
 [ 26  21  27  36   9   3   1  78  18]
 [ 24  51  57  45  11   8   1   2  20]]
Overall Max-CCA Classification Accuracy: 0.5330
Testing on subject S1, training on remaining subjects.
Subject S1 Accuracy: 0.6538
Confusion Matrix for Subject S1:
[[ 9  1  1  4