# OO Based Approach

https://www.perplexity.ai/search/what-is-the-best-practice-for-GsIM4MKiSryinGQldzwVug

# EEG Dataset Processing Pipeline

Scope:
- Process a raw EEG dataset containing the results of an EEG study on multiple subjects
- The dataset has been downloaded from OpenNeuro, is structured as per the BIDS standard, and is in EEGLab '.set' format

The Pipeline Stages (For each subject in an EEG study dataset):
- EEG Dataset Load - Get the raw source EEG signal data
- EEG Preprocessing - Filter and otherwise clean the raw EEG time series data
- Power Spectra Calculate - Calculate the power spectra for all recorded channels
- Spectral Parameterisation - Determine the best-fitting aperiodic and periodic components
- Features Set - Collate the subject and EEG data into a features set, held as a Pandas DataFrame
- Save Features Set - Save the study and subject metadata and the features set for separate ML use
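
The last two stages could be sketched roughly as below. This is a minimal illustration, not the notebook's implementation: the helper name `build_features_row`, the `<channel>_<feature>` column naming and the CSV output are all assumptions, and the numeric values are placeholders rather than study results.

```python
import io
import pandas as pd

def build_features_row(subject_info, channel_features):
    """Flatten one subject's metadata and per-channel features into one row.

    Hypothetical helper: subject_info is a dict of subject metadata and
    channel_features maps channel name -> dict of feature name -> value.
    """
    row = dict(subject_info)
    for channel, features in channel_features.items():
        for name, value in features.items():
            row[f"{channel}_{name}"] = value
    return row

# Placeholder values for illustration only (not real study results)
rows = [
    build_features_row(
        {"subject_id": "sub-001", "age": 60},
        {"Fz": {"exponent": 1.2, "offset": -11.0}, "Cz": {"exponent": 1.1, "offset": -11.2}},
    ),
    build_features_row(
        {"subject_id": "sub-002", "age": 65},
        {"Fz": {"exponent": 1.4, "offset": -10.8}, "Cz": {"exponent": 1.3, "offset": -10.9}},
    ),
]
features_df = pd.DataFrame(rows)

# Write to an in-memory buffer here; a real run would pass a file path instead
buffer = io.StringIO()
features_df.to_csv(buffer, index=False)
```

One row per subject keeps the table directly usable as an ML feature matrix, at the cost of a wide table when many channels are recorded.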


## Dependencies

General dependencies:
- python = 3.11.13
- numpy = 2.0.2
- scipy = 1.15.3
- pandas = 2.2.3
- matplotlib = 3.10.3

ML dependencies:
- scikit-learn = 1.6.1

EEG specific dependencies:
- mne = 1.9.0
- specparam = 2.0.0rc3

## Python-MNE

Used for Import:
- MNE-Python: https://mne.tools/stable/index.html
- The Brain Imaging Data Structure (BIDS): https://bids.neuroimaging.io

Used for Power Spectrum Calculate:
- MNE vs NeuroDSP: https://www.perplexity.ai/search/using-python-which-package-is-zOoiPqUvTnKbO.QfgmPsJQ

Formats:
- Assumes OpenNeuro, BIDS compliant datasets manually downloaded into the defined folders structure
- Assumes EEGLab '.set' format
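
The BIDS layout assumed above can be illustrated with a small path builder. The helper name `bids_eeg_file` and its parameters are hypothetical; it only covers the subject, optional session and task entities used in this notebook.

```python
from pathlib import Path

def bids_eeg_file(dataset_path, subject_id, task, session=None, extension=".set"):
    """Build the expected path of one subject's EEG recording (hypothetical helper).

    Follows the BIDS layout <dataset>/<sub>[/<ses>]/eeg/<sub>[_<ses>]_task-<task>_eeg<ext>.
    """
    entities = [subject_id]
    folder = Path(dataset_path) / subject_id
    if session:  # the session level is optional in BIDS
        entities.append(session)
        folder = folder / session
    entities.append(f"task-{task}")
    file_name = "_".join(entities) + "_eeg" + extension
    return folder / "eeg" / file_name

path = bids_eeg_file("ds004584-1.0.0", "sub-001", task="Rest")
print(path.as_posix())  # ds004584-1.0.0/sub-001/eeg/sub-001_task-Rest_eeg.set
```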


## Spectral Parameterisation

Spectral Parameterisation:
- The Aperiodic Methods project - Documentation: https://aperiodicmethods.github.io/docs/index.html and Repo: https://github.com/AperiodicMethods/AperiodicMethods
- And cite: https://www.biorxiv.org/content/10.1101/2024.09.15.613114v1

Documentation:
- SpecParam: https://specparam-tools.github.io and https://github.com/fooof-tools
- FOOOF: https://fooof-tools.github.io/fooof/ and https://github.com/fooof-tools/fooof

FOOOF vs SpecParam:
- FOOOF: More stable and more widely used, but deprecated
- SpecParam: Still a release candidate, but with some improved model/fit selection: https://pmc.ncbi.nlm.nih.gov/articles/PMC11326208/
- Summary: https://www.perplexity.ai/search/using-python-which-package-is-M7kzhERoTLuCrIKbXxN9sQ
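
To illustrate what the aperiodic component describes: without a knee, it is a straight line in log-log space, log10(power) = offset - exponent * log10(frequency). The sketch below recovers both parameters from a synthetic, peak-free spectrum using a plain least-squares line fit in NumPy. This is a deliberately simplified stand-in, not the SpecParam or FOOOF fitting procedure, which also models the periodic peaks; the synthetic parameter values are arbitrary.

```python
import numpy as np

# Synthetic, peak-free spectrum: power = 10**offset / f**exponent
true_offset, true_exponent = 1.0, 1.5
freqs = np.linspace(1.0, 40.0, 79)
powers = 10 ** true_offset / freqs ** true_exponent

# Straight-line fit in log-log space: log10(P) = offset - exponent * log10(f)
slope, intercept = np.polyfit(np.log10(freqs), np.log10(powers), deg=1)
fit_exponent, fit_offset = -slope, intercept
print(f"exponent={fit_exponent:.2f}, offset={fit_offset:.2f}")
```

On real spectra, oscillatory peaks bias a plain line fit like this one, which is the reason to fit the periodic and aperiodic components together.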


## Imports


In [1]:
# Not available through a Conda install/environment
%pip install specparam

Note: you may need to restart the kernel to use updated packages.


In [2]:
# General imports
import os
import gc
from datetime import datetime
import math
import numpy as np
import pandas as pd

# Plots
import matplotlib.pyplot as plt
plt.style.use('ggplot')

# MNE-Python
import mne

# Specparam
from specparam import SpectralGroupModel

In [3]:
# Check the version of the module
from specparam import __version__ as specparam_version
print('Current specparam version:', specparam_version)

Current specparam version: 2.0.0rc3


# Classes & Functions


In [4]:
# A utility function to establish the path of a given folder, relative to the project root
def get_folder_path(folder_name):
    # The project root is taken to be the parent of the current working directory
    project_root = os.path.dirname(os.getcwd())
    folder_path = os.path.join(project_root, folder_name)
    if not os.path.isdir(folder_path):
        raise NotADirectoryError(f'Directory not found: {folder_path}')

    return folder_path

# Utility function to check for the existence of a file in a given directory
def get_file_path(folder, file_name):
    file_path = os.path.join(folder, file_name)
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f'File not found: {file_path}')
    return file_path

In [5]:
# Class for the study / dataset information

class Study:
    """
    Class for information on an EEG study

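    Attributes
    ----------
    dataset_name : str
        Name of the dataset folder, e.g. 'ds004584-1.0.0'
    dataset_path : str
        Full path to the dataset folder
    subjects_df : pandas.DataFrame
        One row per subject, with columns subject_id, age, gender and pd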
    """

    # BIDS structure source of subjects data
    __subjects_source_file = 'participants.tsv'

    def __init__(self, dataset_folder_path, dataset_name):
        """
        Initialise Study instance.

        Parameters
        ----------
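        dataset_folder_path : str
            Folder containing the downloaded datasets
        dataset_name : str
            Name of the dataset folder, which must start with 'ds'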

        Returns
        -------
        Study : class instance
        """

        # Input validation - Valid Dataset
        datasets_list = os.listdir(dataset_folder_path)
        datasets_list = [d for d in datasets_list if d.startswith('ds') and os.path.isdir(os.path.join(dataset_folder_path, d))]

        if dataset_name not in datasets_list:
            raise ValueError(f"Dataset '{dataset_name}' not found in available datasets: {datasets_list}")
        dataset_path = os.path.join(dataset_folder_path, dataset_name)
        if not os.path.exists(dataset_path):
            raise FileNotFoundError(f"Path does not exist: {dataset_path}")
        subjects_file = os.path.join(dataset_path, self.__subjects_source_file)
        if not os.path.isfile(subjects_file):
            raise FileNotFoundError(f'File not found: {subjects_file}')

        # Private Attributes
        # TODO: Any private attributes?

        # Public Attributes
        self.dataset_name = dataset_name
        self.dataset_path = dataset_path
        self.subjects_df = self._create_subjects_df(subjects_file)

    def _create_subjects_df(self, subjects_file):
        # Read the dataset's participants.tsv file and keep the selected subject columns
        try:
            temp_subjects_df = pd.read_csv(subjects_file, sep='\t')
        except Exception as e:
            raise IOError(f"Failed to read subjects file '{subjects_file}': {e}") from e
        subjects_df = temp_subjects_df[['participant_id', 'AGE', 'GENDER', 'TYPE']].copy()
        subjects_df.columns = ['subject_id', 'age', 'gender', 'pd']

        return subjects_df
    
    # Public functions
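
The column selection and renaming done in `_create_subjects_df` can be tried in isolation on an in-memory table. The rows below are made up purely to show the shape of a `participants.tsv` file (the `OTHER` column and all values are hypothetical); only the four source column names come from the class above. Using a rename mapping rather than assigning `columns` positionally is a design suggestion, not what the class currently does.

```python
import io
import pandas as pd

# Made-up rows in the shape of a BIDS participants.tsv (values are illustrative only)
sample_tsv = (
    "participant_id\tAGE\tGENDER\tTYPE\tOTHER\n"
    "sub-001\t60\tM\t1\tx\n"
    "sub-002\t65\tF\t0\ty\n"
)

raw_df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Select and rename in one step so the mapping between old and new names is explicit
column_map = {"participant_id": "subject_id", "AGE": "age", "GENDER": "gender", "TYPE": "pd"}
subjects_df = raw_df[list(column_map)].rename(columns=column_map)
```

An explicit mapping stays correct if the selection order changes, whereas positional assignment silently mislabels columns.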
    



In [None]:
# Subject Class with all EEG processing etc results

class Subject:
    """
    Class for information on an individual subject within an EEG study

    Attributes
    ----------
    eeg_raw : MNE Raw object (RawEEGLAB)
        Raw EEG recording loaded from the subject's EEGLab '.set' file
    """

    def __init__(self, study, subject_id):
        """
        Initialise Subject instance.

        Parameters
        ----------
        study : Study
            Study instance providing the dataset path
        subject_id : str
            BIDS subject label, e.g. 'sub-001'

        Returns
        -------
        Subject : class instance
        """

        # Use the notebook-level 'verbose' flag, if defined, to set the MNE log level
        if globals().get('verbose', False):
            mne.set_log_level('DEBUG')
        else:
            mne.set_log_level('WARNING')

        # BIDS file structure: <dataset>/<subject>/[<session>/]<datatype>/
        subject = subject_id
        session = ''  # No session level in the path
        task = 'Rest'
        datatype = 'eeg'

        # EEGLab .set file name
        temp_path = os.path.join(study.dataset_path, subject, session, datatype)
        temp_file_name = f'{subject}_task-{task}_{datatype}.set'
        eeg_lab_file_path = get_file_path(temp_path, temp_file_name)

        # Load the raw EEG data
        try:
            self.eeg_raw = mne.io.read_raw_eeglab(eeg_lab_file_path, preload=True)
        except Exception as e:
            raise ValueError(f"Failed to load EEG data for subject {subject}: {e}") from e

        # TODO: Any raw data to keep in class? e.g. number of channels, sampling frequency

    
    def inspect_EEG_Raw(self):
        """
        Summarise the EEG information
        """

        print(f"EEG Raw Description: {self.eeg_raw.info['description']} on {self.eeg_raw.info['meas_date']}")
        print(self.eeg_raw)
        print(self.eeg_raw.info)


        


# Pipeline Execute


In [7]:
# Establish data paths & check EEG source datasets available
#

# Enable progress messages (read by the Subject class to set the MNE log level)
verbose = True

# Establish Data Folders
eeg_datasets_source_folder = 'Data/EEG_Datasets_Source_exgithub'
eeg_study_features_folder = 'Data/EEG_Datasets_Processed'


eeg_datasets_path = get_folder_path(eeg_datasets_source_folder)
eeg_study_features_path = get_folder_path(eeg_study_features_folder)

# # Get a list of datasets in the EEG datasets source folder
# datasets_list = os.listdir(eeg_datasets_path)
# datasets_list = [d for d in datasets_list if d.startswith('ds') and os.path.isdir(os.path.join(eeg_datasets_path, d))]

print('\n---------------------------------')
print('EEG Pipeline Parameters - Data')
print(f'EEG Source Datasets Folder: {eeg_datasets_path}')
# print('Datasets found:', datasets_list)
print(f'EEG Study Features Folder: {eeg_study_features_path}')

del eeg_datasets_source_folder, eeg_study_features_folder


---------------------------------
EEG Pipeline Parameters - Data
EEG Source Datasets Folder: /Users/stuartgow/GitHub/EEG_ML_Pipeline/Data/EEG_Datasets_Source_exgithub
EEG Study Features Folder: /Users/stuartgow/GitHub/EEG_ML_Pipeline/Data/EEG_Datasets_Processed


In [8]:
# Execute the end to end pipeline for a given EEG source dataset
#

# Get a valid dataset and extract essential data
study_details = Study(eeg_datasets_path, 'ds004584-1.0.0')

print('\n-----------------------------------------------------------------------------------------------')
print(f'EEG Pipeline Processing start for study/dataset: {study_details.dataset_name} with {len(study_details.subjects_df)} subjects')

# Process each subject in the dataset
for idx, subject in study_details.subjects_df.iterrows():
    print('\n-----------------------------------------------------------------------------------------------')
    print(f"Processing subject: {subject['subject_id']}")

    subject_eeg = Subject(study_details, subject['subject_id'])
    subject_eeg.inspect_EEG_Raw()

    break  # Only the first subject is processed for now

-----------------------------------------------------------------------------------------------
EEG Pipeline Processing start for study/dataset: ds004584-1.0.0 with 149 subjects

-----------------------------------------------------------------------------------------------
Processing subject: sub-001
Reading /Users/stuartgow/GitHub/EEG_ML_Pipeline/Data/EEG_Datasets_Source_exgithub/ds004584-1.0.0/sub-001/eeg/sub-001_task-Rest_eeg.fdt
Reading 0 ... 140829  =      0.000 ...   281.658 secs...
Cropping annotations 1970-01-01 00:00:00+00:00 - 1970-01-01 00:04:41.660000+00:00
  [0] Keeping  (1970-01-01 00:00:00+00:00 - 1970-01-01 00:00:00+00:00 -> 0.0 - 0.0)
Cropping complete (kept 1)
EEG Raw Description: None on None
<RawEEGLAB | sub-001_task-Rest_eeg.fdt, 63 x 140830 (281.7 s), ~67.8 MiB, data loaded>
<Info | 8 non-empty values
 bads: []
 ch_names: Fp1, Fz, F3, F7, FT9, FC5, FC1, C3, T7, TP9, CP5, CP1, P3, P7, ...
 chs: 63 EEG
 custom_ref_applied: False
 dig: 66 items (3 Cardinal, 63 EEG)

