## 📦 Imports & Configuration

This cell sets up the environment for the EEG classification project. It imports the libraries used for data handling, preprocessing, visualization, and deep learning, and defines the key configuration parameters: number of subjects, EEG channels, original and downsampled sampling frequencies, epoch duration, number of classes, batch size, and cross-validation splits. Defining these values once keeps array shapes and the model input consistent throughout the workflow.
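
As a rough sketch of how the configuration values determine the model's input shape, the arithmetic can be checked in isolation. The constants are copied from the configuration cell; `n_samples_inclusive` and `decimation` are illustrative names, and the extra sample assumes MNE's default behaviour of including the sample at `tmax` when epoching.

```python
# Values copied from the configuration cell
SFREQ = 160.0
DOWNSAMPLE_FREQ = 80.0
EPOCH_DURATION_SEC = 4.0
N_CHANNELS = 64

# Number of time samples the model expects per epoch
N_TIMESTEPS = int(EPOCH_DURATION_SEC * DOWNSAMPLE_FREQ)

# Assumption: an epoch from tmin=0 to tmax=4.0 includes both endpoints,
# so it carries one extra sample before the length fix truncates it
n_samples_inclusive = N_TIMESTEPS + 1

# Ratio between the original and the downsampled rate
decimation = SFREQ / DOWNSAMPLE_FREQ

print("timesteps per epoch:", N_TIMESTEPS)
print("samples before truncation:", n_samples_inclusive)
print("decimation factor:", decimation)
print("model input shape:", (N_CHANNELS, N_TIMESTEPS, 1))
```

This is why the loading function later truncates or pads every epoch to `N_TIMESTEPS` samples.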


In [None]:
import os
import mne
import numpy as np
import gc
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import (
    Input, Conv1D, MaxPooling1D, TimeDistributed,
    LSTM, Dense, Dropout, BatchNormalization, Reshape, Permute
)
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras import backend as K

# --- CONFIGURATION ---
N_SUBJECTS = 109
RUN_ID = '04'  # Left vs Right Fist
N_CHANNELS = 64
SFREQ = 160.0
DOWNSAMPLE_FREQ = 80.0
EPOCH_DURATION_SEC = 4.0
N_TIMESTEPS = int(EPOCH_DURATION_SEC * DOWNSAMPLE_FREQ)
N_CLASSES = 2
EVENT_ID = {'T1': 1, 'T2': 2}
BATCH_SIZE = 64
KFOLD_SPLITS = 5


## 🧠 EEG Classifier Model Definition

This cell defines a deep learning model for EEG classification. The model combines convolutional and recurrent layers to capture both spatial and temporal patterns in the EEG signals:

- **Input & Dropout:** Accepts EEG data with the specified shape and applies initial dropout for regularization.  
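- **CNN Block:** Applies the same 1D convolution along the time axis of each channel (via `TimeDistributed`), followed by batch normalization and max pooling that halves the number of time steps.  
- **Permute & Reshape:** Moves the time axis first and flattens channels and filters into one feature vector per time step, so the recurrent layers receive a sequence of 160 steps with `N_CHANNELS * 16` features each.  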
- **LSTM Block:** Captures temporal dependencies in the EEG signal using stacked LSTM layers with dropout for regularization.  
- **Dense Layers & Output:** Processes features through a fully connected layer and outputs class probabilities via a softmax layer.

This architecture is designed to effectively combine spatial and temporal information for motor imagery EEG classification.
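
To make the tensor shapes concrete, here is a minimal NumPy sketch that traces one epoch through the convolution, pooling, permute, and reshape steps. It uses zero-filled dummy arrays rather than Keras layers, omits the batch axis, and the name `N_FILTERS` is introduced only for illustration.

```python
import numpy as np

N_CHANNELS, N_TIMESTEPS, N_FILTERS = 64, 320, 16

# One dummy epoch shaped like the model input: (channels, time, 1)
x = np.zeros((N_CHANNELS, N_TIMESTEPS, 1), dtype="float32")

# Conv1D with padding='same' keeps the time length and yields 16 filters
conv_out = np.zeros((N_CHANNELS, N_TIMESTEPS, N_FILTERS), dtype="float32")

# MaxPooling1D(2): max over non-overlapping pairs of time steps
pooled = conv_out.reshape(N_CHANNELS, N_TIMESTEPS // 2, 2, N_FILTERS).max(axis=2)

# Permute((2, 1, 3)) swaps the channel and time axes (batch axis omitted here)
permuted = pooled.transpose(1, 0, 2)

# Reshape((-1, N_CHANNELS * 16)) flattens channels and filters per time step
sequence = permuted.reshape(-1, N_CHANNELS * N_FILTERS)

for name, arr in [("input", x), ("conv", conv_out), ("pooled", pooled),
                  ("permuted", permuted), ("lstm input", sequence)]:
    print(f"{name:>10}: {arr.shape}")
```

The last shape is the sequence the first LSTM layer consumes: one row per pooled time step, with all channels and filters flattened into the feature axis.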


In [None]:
def create_classifier_model(input_shape=(N_CHANNELS, N_TIMESTEPS, 1)):
    input_layer = Input(shape=input_shape, name='eeg_input')
    x = Dropout(0.2)(input_layer)

    # CNN Block
    x = TimeDistributed(Conv1D(16, 3, activation='relu', padding='same'))(x)
    x = TimeDistributed(BatchNormalization())(x)
    x = TimeDistributed(MaxPooling1D(2))(x)

    # Permute for RNN
    x = Permute((2, 1, 3))(x)
    x = Reshape((-1, N_CHANNELS * 16))(x)

    # LSTM Block
    x = LSTM(32, return_sequences=True)(x)
    x = Dropout(0.5)(x)
    x = LSTM(16, return_sequences=False)(x)

    x = Dense(32, activation='relu')(x)
    output_layer = Dense(N_CLASSES, activation='softmax')(x)

    return Model(inputs=input_layer, outputs=output_layer)


## 📥 Loading and Preprocessing EEG Data

This cell defines a function to load EEG data from all 109 subjects and preprocess it for model training. 

Key steps include:

- **Data Loading:** Reads each subject's EDF file for the selected motor imagery task.  
- **Filtering & Resampling:** Band-pass filters the EEG signals (8–30 Hz) and downsamples them to the target frequency.  
- **Epoching:** Segments continuous EEG into fixed-length epochs corresponding to the task events.  
- **Length Adjustment:** Ensures all epochs have the same number of time points, padding or truncating as necessary.  
- **Normalization:** Applies z-score normalization to each channel within each epoch.  
- **Data Aggregation:** Combines all subjects' data into single arrays for features (`X`) and labels (`Y`) and adds a trailing singleton dimension for the model input.  

This function returns preprocessed EEG data ready for training the deep learning classifier.
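
The length adjustment and normalization steps can be sketched on their own with synthetic data. The helper names `fix_length` and `zscore_per_channel` are hypothetical and do not appear in the notebook; the sketch is a vectorized equivalent of the per-epoch loop, not a replacement for it.

```python
import numpy as np

def fix_length(data, n_timesteps):
    """Truncate or zero-pad the last axis to exactly n_timesteps samples."""
    n = data.shape[-1]
    if n >= n_timesteps:
        return data[..., :n_timesteps]
    pad = [(0, 0)] * (data.ndim - 1) + [(0, n_timesteps - n)]
    return np.pad(data, pad, mode="constant")

def zscore_per_channel(data, eps=1e-6):
    """Standardize each channel of each epoch along the time axis."""
    mean = data.mean(axis=-1, keepdims=True)
    std = data.std(axis=-1, keepdims=True)
    return (data - mean) / (std + eps)

# Synthetic stand-in for epochs.get_data(): (epochs, channels, samples)
rng = np.random.default_rng(0)
epochs = rng.normal(loc=5.0, scale=3.0, size=(10, 64, 321))

fixed = fix_length(epochs, 320)
normed = zscore_per_channel(fixed)

print("shape after length fix:", fixed.shape)
print("largest |mean| after z-scoring:", float(np.abs(normed.mean(axis=-1)).max()))
```

Because the statistics are computed per channel and per epoch, no information from other epochs or subjects enters the normalization.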


In [None]:
def load_all_109_subjects():
    X_list, Y_list = [], []
    print(f"Starting Data Load for {N_SUBJECTS} Subjects...")
    RAW_DATA_PATH = r'/content/drive/MyDrive/EEG'

    for sub_idx in range(1, N_SUBJECTS + 1):
        sub_str = f'S{sub_idx:03d}'
        file_path = os.path.join(RAW_DATA_PATH, sub_str, f'{sub_str}R{RUN_ID}.edf')

        try:
            raw = mne.io.read_raw_edf(file_path, preload=True, verbose=False)
            raw.filter(8., 30., fir_design='firwin', skip_by_annotation='edge', verbose=False)
            raw.resample(DOWNSAMPLE_FREQ, npad="auto")

            events, _ = mne.events_from_annotations(raw, event_id=EVENT_ID, verbose=False)
            epochs = mne.Epochs(raw, events, EVENT_ID, tmin=0., tmax=EPOCH_DURATION_SEC,
                                baseline=None, preload=True, verbose=False)

            data = epochs.get_data(units='uV')

            # Fix variable lengths
            if data.shape[2] > N_TIMESTEPS:
                data = data[:, :, :N_TIMESTEPS]
            elif data.shape[2] < N_TIMESTEPS:
                pad_width = N_TIMESTEPS - data.shape[2]
                data = np.pad(data, ((0,0),(0,0),(0, pad_width)), mode='constant')

            # Per-channel normalization
            for i in range(len(data)):
                mean = np.mean(data[i], axis=1, keepdims=True)
                std = np.std(data[i], axis=1, keepdims=True)
                data[i] = (data[i] - mean) / (std + 1e-6)

            X_list.append(data.astype('float32'))
            Y_list.append(epochs.events[:, 2] - 1)

            del raw, epochs, data
            gc.collect()

            if sub_idx % 10 == 0:
                print(f" -> Loaded {sub_idx} subjects...")

        except Exception as exc:
            # Skip missing or unreadable recordings, but report them
            print(f" -> Skipping {sub_str}: {exc}")

    if not X_list:
        return None, None

    X_pool = np.concatenate(X_list, axis=0)
    Y_pool = np.concatenate(Y_list, axis=0)
    X_pool = np.expand_dims(X_pool, axis=-1)

    print(f"Total Epochs: {X_pool.shape[0]}")
    return X_pool, Y_pool


## 🏋️ Training the Universal EEG Classifier

This cell defines a function to train the EEG classifier using stratified K-Fold cross-validation. 

Key steps include:

- **K-Fold Splitting:** Divides the dataset into training and testing folds while preserving class balance.  
- **Model Creation & Compilation:** Initializes the CNN-LSTM model and compiles it with the Adam optimizer and sparse categorical cross-entropy loss.  
- **Training with Callbacks:** Trains the model on each fold with early stopping and learning rate reduction to prevent overfitting and improve convergence.  
- **Evaluation:** Measures fold accuracy on the test set and displays a confusion matrix to visualize classification performance.  
- **Aggregation:** After all folds, calculates the mean accuracy across folds as an overall performance metric.

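With shuffled, stratified folds, every epoch appears in a test set exactly once. Two caveats apply when reading the results: each fold's test set is also passed as `validation_data` for early stopping and learning-rate scheduling, so the reported accuracies are likely optimistic, and epochs from the same subject can appear in both the training and test folds, so the folds are not subject-independent.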
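
The training cell passes each fold's test set as `validation_data`. One possible alternative, sketched here on synthetic data, holds out part of the training fold for early stopping instead. The array sizes, the 20% validation fraction, and the names `fit_idx` and `val_idx` are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(42)
Y = rng.integers(0, 2, size=200)   # synthetic binary labels
X = rng.normal(size=(200, 4))      # synthetic features; the shape is arbitrary

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, Y)):
    # Carve a stratified validation subset out of the training fold only
    fit_idx, val_idx = train_test_split(
        train_idx, test_size=0.2, stratify=Y[train_idx], random_state=42
    )
    # The test fold stays untouched until the final evaluation
    assert not set(val_idx) & set(test_idx)
    print(f"fold {fold + 1}: fit={len(fit_idx)} "
          f"val={len(val_idx)} test={len(test_idx)}")
```

In the training function this would correspond to fitting on `X[fit_idx]`, passing `(X[val_idx], Y[val_idx])` as `validation_data`, and evaluating only on `X[test_idx]`.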


In [None]:
def train_universal_classifier(X, Y):
    skf = StratifiedKFold(n_splits=KFOLD_SPLITS, shuffle=True, random_state=42)
    fold_accuracies = []

    for fold, (train_idx, test_idx) in enumerate(skf.split(X, Y)):
        print(f"\n{'='*20} FOLD {fold+1}/{KFOLD_SPLITS} {'='*20}")

        X_train, Y_train = X[train_idx], Y[train_idx]
        X_test, Y_test = X[test_idx], Y[test_idx]

        K.clear_session()
        model = create_classifier_model()

        optimizer = Adam(learning_rate=0.001)

        model.compile(
            optimizer=optimizer,
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy']
        )

        early_stop = EarlyStopping(monitor='val_loss', patience=7, restore_best_weights=True)
        reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

        model.fit(
            X_train, Y_train,
            epochs=30,
            batch_size=BATCH_SIZE,
            validation_data=(X_test, Y_test),
            callbacks=[early_stop, reduce_lr],
            verbose=1
        )

        loss, acc = model.evaluate(X_test, Y_test, verbose=0)
        fold_accuracies.append(acc)
        print(f"Fold {fold+1} Accuracy: {acc:.4f}")

        # Confusion matrix
        Y_pred = np.argmax(model.predict(X_test), axis=1)
        cm = confusion_matrix(Y_test, Y_pred)
        plt.figure(figsize=(4, 3))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
        plt.title(f'Fold {fold+1}')
        plt.show()

        del model
        gc.collect()

    print(f"FINAL MEAN ACCURACY: {np.mean(fold_accuracies):.4f}")


## 🚀 Running the Full Pipeline

This cell executes the complete EEG classification workflow:

1. **Data Loading & Preprocessing:** Calls the function to load and preprocess EEG data from all 109 subjects.  
2. **Model Training & Evaluation:** If the data is successfully loaded, it trains the universal EEG classifier using stratified K-Fold cross-validation and evaluates its performance on each fold.

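Effectively, this cell ties together all previous steps and prints the accuracy of each fold and the mean accuracy across folds. The per-fold models are deleted after evaluation, so no trained model is kept once the run finishes.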


In [None]:
X_pool, Y_pool = load_all_109_subjects()
if X_pool is not None:
    train_universal_classifier(X_pool, Y_pool)
