# Organize EEG Session Files
## Purpose
Each experiment currently in the data folder is stored under `Raw_Data`. Each file name identifies the subject, the experiment, and (where applicable) the session number.

sXX_exXX_sXX: s(Subject Number)_ex(Experiment Number)_s(Session Number; present only for ex01 and ex02).

Example: s03_ex02_s01 is Subject 03, Experiment 02, Session 01.
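
The naming convention above can be checked with a small parser. The following is a minimal sketch, not the notebook's own code: the names `STEM_PATTERN`, `Recording`, and `parse_stem` are hypothetical, and it assumes a file stem contains nothing besides the subject, experiment, and optional session fields.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: the session group is optional, so one regex
# covers both "sXX_exXX_sXX" and "sXX_exXX".
STEM_PATTERN = re.compile(r"s(\d+)_ex(\d+)(?:_s(\d+))?")


class Recording(NamedTuple):
    subject: int
    experiment: int
    session: Optional[int]  # None when the file name has no session part


def parse_stem(stem: str) -> Optional[Recording]:
    """Parse a file stem; return None if it does not follow the convention."""
    # fullmatch rejects stems with extra leading or trailing characters
    match = STEM_PATTERN.fullmatch(stem)
    if match is None:
        return None
    subject, experiment, session = match.groups()
    return Recording(
        int(subject),
        int(experiment),
        int(session) if session is not None else None,
    )


print(parse_stem("s03_ex02_s01"))  # Recording(subject=3, experiment=2, session=1)
print(parse_stem("s03_ex05"))      # Recording(subject=3, experiment=5, session=None)
print(parse_stem("RECORDS"))       # None
```

Using `fullmatch` instead of `search` is a stricter choice: any stem with unexpected extra text is rejected rather than partially matched.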

There were a total of 10 experiments, recorded as follows:
    The subject was asked to sit down and relax on a comfortable chair. The recording was performed in a single day per subject with the same order of tasks. Noise-free two-minute recordings were segmented from the raw data. The recordings involve the acquisition of the electroencephalographic signal as follows:

    Three minutes of resting-state, eyes open for three sessions.
    Three minutes of resting-state, eyes closed for three sessions.
    Non-Related experiment (Not provided in the dataset).
    Three minutes of resting-state, eyes open for three sessions using noise isolation headset.
    Non-Related experiment (Not provided in the dataset).
    Three minutes of resting-state, eyes open for three sessions using noise isolation headset.
    Three minutes of listening to a song in their native language using in-ear headphones.
    Three minutes of listening to a song in a non-native language using in-ear headphones.
    Three minutes of listening to neutral music using in-ear headphones.
    Three minutes of listening to a song in their native language using bone-conducting headphones.
    Three minutes of listening to a song in a non-native language using bone-conducting headphones.
    Three minutes of listening to neutral music using bone-conducting headphones.

### Import Libraries

In [14]:
import pandas as pd
from pathlib import Path
import re
from collections import defaultdict

In [None]:
# Define the path to the raw data directory
raw = Path("data") / "auditory-evoked-potential-eeg-biometric-dataset-1.0.0" / "Raw_Data"
# Sanity check: print a message only if the directory is missing
if not raw.exists():
    print(f"Directory {raw} does not exist.")

In [None]:
# ------------------------------------------------------------
# Define regex patterns to handle both filename formats:
#
# 1) sXX_exXX_sXX.txt  → subject, experiment, session
# 2) sXX_exXX.txt      → subject, experiment (no session)
#
# Some experiments include session numbers (ex01, ex02),
# while others do not — this logic handles both cases.
# ------------------------------------------------------------

pattern_with_session = re.compile(r"s(\d+)_ex(\d+)_s(\d+)")
pattern_no_session = re.compile(r"s(\d+)_ex(\d+)")

# Dictionary to group files by experiment number (ex01, ex02, etc.)
# Each experiment key will store a list of metadata dictionaries
experiments = defaultdict(list)

# ------------------------------------------------------------
# Iterate through all raw EEG files
# ------------------------------------------------------------
for file in raw.glob("*.txt"):

    # Remove file extension for cleaner pattern matching
    filename = file.stem

    # --------------------------------------------------------
    # First attempt: match filenames that include session info
    # --------------------------------------------------------
    match = pattern_with_session.search(filename)

    if match:
        # Extract subject, experiment, and session numbers
        subject, experiment, session = match.groups()

    else:
        # ----------------------------------------------------
        # Second attempt: match filenames without session info
        # ----------------------------------------------------
        match = pattern_no_session.search(filename)

        if match:
            subject, experiment = match.groups()
            session = None  # Explicitly mark missing session
        else:
            # If filename doesn't match any expected format,
            # skip it and log for inspection
            print(f"Skipped unrecognized file format: {file.name}")
            continue

    # --------------------------------------------------------
    # Normalize experiment key (e.g., ex1 → ex01)
    # --------------------------------------------------------
    ex_key = f"ex{int(experiment):02d}"

    # --------------------------------------------------------
    # Store parsed metadata for downstream processing
    # --------------------------------------------------------
    experiments[ex_key].append({
        "subject": int(subject),
        "session": int(session) if session is not None else None,
        "path": file
    })

# Summarize how many files were found for each experiment
for ex, files in sorted(experiments.items()):
    print(f"{ex}: {len(files)} files")

Skipped unrecognized file format: RECORDS.txt
ex01: 60 files
ex02: 60 files
ex05: 20 files
ex06: 20 files
ex07: 20 files
ex08: 20 files
ex09: 20 files
ex10: 20 files
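
The per-experiment counts above can be broken down further by flattening the grouped metadata into a table. The following is a minimal sketch under stated assumptions: the `experiments` dict here is a toy stand-in with made-up paths, and the names `index` and `files_per_subject` are hypothetical. With the real dict, the same steps would show whether every subject has the expected number of files per experiment.

```python
from collections import defaultdict
from pathlib import Path

import pandas as pd

# Toy stand-in for the `experiments` dict built above (paths are made up).
experiments = defaultdict(list)
experiments["ex01"] += [
    {"subject": 1, "session": s, "path": Path(f"s01_ex01_s{s:02d}.txt")}
    for s in (1, 2, 3)
]
experiments["ex05"].append(
    {"subject": 1, "session": None, "path": Path("s01_ex05.txt")}
)

# Flatten to one row per file, carrying the experiment key along
rows = [
    {"experiment": ex, **meta}
    for ex, files in sorted(experiments.items())
    for meta in files
]
index = pd.DataFrame(rows)

# Nullable integer dtype keeps missing sessions as <NA>
# while the remaining values stay integers
index["session"] = index["session"].astype("Int64")

# Number of files per (experiment, subject) pair
files_per_subject = index.groupby(["experiment", "subject"]).size()
print(files_per_subject)
```

A table like this makes gaps easy to spot, for example a subject with fewer than three sessions in ex01 or ex02.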
