### MATLAB Data Structure
```
Subject01_s1.mat
│
│  MATLAB .mat file for ONE subject + ONE session
│  Contains multiple recording runs (blocks) of continuous EEG
│
└── run  (cell array / list, length = 10)
    │
    │  Each cell corresponds to ONE continuous recording run
    │  (e.g., separate blocks to reduce fatigue or reset tasks)
    │
    ├── run{1} (struct)
    │   │
    │   │  One run = one uninterrupted EEG recording
    │   │
    │   ├── eeg : [n_samples x n_channels] numeric matrix
    │   │   │
    │   │   │  The raw EEG signal
    │   │   │  - rows    = time samples (uniformly sampled)
    │   │   │  - columns = EEG channels / electrodes
    │   │   │  Value = voltage at that channel & time point
    │   │   │
    │   │   │  Example:
    │   │   │    eeg(1000, 5) → channel 5 voltage at sample 1000
    │   │   │
    │   └── header : struct
    │       │
    │       │  Metadata describing how to interpret the EEG
    │       │  (who, how, when, and where events occurred)
    │       │
    │       ├── Subject
    │       │   │
    │       │   │  Subject identifier (ID or label)
    │       │
    │       ├── Session
    │       │   │
    │       │   │  Session identifier (e.g., "s1")
    │       │   │  Useful when subjects have multiple sessions
    │       │
    │       ├── SampleRate
    │       │   │
    │       │   │  Sampling frequency in Hz
    │       │   │  Defines time resolution (e.g., 512 samples/sec)
    │       │
    │       ├── Label
    │       │   │
    │       │   │  Channel labels / electrode names
    │       │   │  Length = n_channels
    │       │   │  Column names for eeg matrix
    │       │
    │       └── EVENT : struct
    │           │
    │           │  Event markers embedded in the continuous EEG
    │           │  Used to align cognitive events with the signal
    │           │
    │           ├── POS : [n_events x 1] vector
    │           │   │
    │           │   │  Sample indices (MATLAB 1-based)
    │           │   │  Indicate *when* an event occurred in this run
    │           │
    │           └── TYP : [n_events x 1] vector
    │               │
    │               │  Event type / trigger code
    │               │  Encodes *what kind* of event occurred
    │               │  (e.g., correct response vs error response)
    │
    ├── run{2}
    │   │
    │   │  Same structure as run{1}, but for the next recording block
    │
    └── ...
        │
        │  Additional runs, all with identical internal structure
```
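
The tree above maps onto Python fairly directly. The following is a minimal sketch, assuming the file is a pre-v7.3 `.mat` file that `scipy.io.loadmat` can read and that the top-level variable is named `run` as shown. `load_runs` and `describe_run` are hypothetical helper names, and the synthetic run at the bottom is only a stand-in so the sketch runs without the real file.

```python
from types import SimpleNamespace

import numpy as np


def load_runs(mat_path):
    """Return the run structs stored in one subject/session file.

    Assumption: a pre-v7.3 .mat file whose top-level variable is `run`.
    """
    from scipy.io import loadmat  # deferred so describe_run works without SciPy

    # squeeze_me + struct_as_record=False give attribute access (run.eeg, run.header)
    mat = loadmat(mat_path, squeeze_me=True, struct_as_record=False)
    return list(np.atleast_1d(mat["run"]))


def describe_run(run):
    """Summarize one run and convert event positions to 0-based indices."""
    eeg = np.asarray(run.eeg, dtype=float)  # [n_samples x n_channels]
    sfreq = float(run.header.SampleRate)
    pos = np.atleast_1d(run.header.EVENT.POS).astype(int)
    typ = np.atleast_1d(run.header.EVENT.TYP).astype(int)
    onset_idx = pos - 1  # MATLAB indices are 1-based, NumPy indices are 0-based
    return {
        "n_samples": eeg.shape[0],
        "n_channels": eeg.shape[1],
        "duration_s": eeg.shape[0] / sfreq,
        "onset_idx": onset_idx,
        "onset_s": onset_idx / sfreq,
        "typ": typ,
    }


# Synthetic stand-in for one run (2 s at 512 Hz, 4 channels, 2 events).
# The trigger codes 5 and 6 are placeholders, not the dataset's real codes.
demo_run = SimpleNamespace(
    eeg=np.zeros((1024, 4)),
    header=SimpleNamespace(
        SampleRate=512,
        EVENT=SimpleNamespace(POS=np.array([1, 513]), TYP=np.array([5, 6])),
    ),
)
demo_info = describe_run(demo_run)
```

Subtracting 1 from `POS` is the step that is easiest to forget: without it every event lands one sample late once the data is indexed from Python.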

### Output Structure

#### Brain Imaging Data Structure (out_bids directory)  
Contains the canonical, read-only source EEG data stored in BrainVision format, organized according to the BIDS specification. These files are never modified after export and serve as the single source of truth for the dataset.
```
out_bids/
└── sub-01/
    └── ses-01/
        └── eeg/
            ├── sub-01_ses-01_task-errp_run-01_eeg.vhdr
            ├── sub-01_ses-01_task-errp_run-01_eeg.eeg
            ├── sub-01_ses-01_task-errp_run-01_eeg.vmrk
            ├── sub-01_ses-01_task-errp_run-01_events.tsv
            ├── sub-01_ses-01_task-errp_run-01_channels.tsv
            └── sub-01_ses-01_task-errp_run-01_eeg.json  ← never edited
```
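
The naming pattern in the tree can be reproduced with plain string formatting. This is only an illustrative sketch: `bids_eeg_paths` is a hypothetical helper, it assumes subject, session, and run labels are integers padded to two digits, and in practice a dedicated library such as MNE-BIDS would typically generate these names during export.

```python
from pathlib import Path


def bids_eeg_paths(root, sub, ses, task, run):
    """Build the BrainVision and sidecar paths for one run.

    The entity order (sub, ses, task, run) mirrors the tree above.
    """
    stem = f"sub-{sub:02d}_ses-{ses:02d}_task-{task}_run-{run:02d}"
    eeg_dir = Path(root) / f"sub-{sub:02d}" / f"ses-{ses:02d}" / "eeg"
    suffixes = [
        "_eeg.vhdr",      # BrainVision header
        "_eeg.eeg",       # BrainVision binary data
        "_eeg.vmrk",      # BrainVision markers
        "_events.tsv",    # event onsets and types
        "_channels.tsv",  # channel names and types
        "_eeg.json",      # recording metadata sidecar
    ]
    return [eeg_dir / f"{stem}{suffix}" for suffix in suffixes]


demo_paths = bids_eeg_paths("out_bids", sub=1, ses=1, task="errp", run=1)
```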

#### "Working" FIF files (out_fif directory)
Contains MNE-native working copies of the data for preprocessing, filtering, referencing, and rapid iteration. Multiple FIF files may exist simultaneously, each corresponding to a different preprocessing pipeline or experimental configuration.
```
out_fif/
└── sub-01/
    └── ses-01/
        ├── errp_bp1-20_car_raw.fif
        ├── errp_bp0.1-40_car_raw.fif        ← alternative bandpass
        ├── errp_bp1-20_laplacian_raw.fif    ← alternative referencing
        ├── errp_minimal_raw.fif             ← minimal preprocessing
        └── ...
```
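
Because each working file encodes its pipeline in its name, it can help to generate the names from the pipeline settings rather than typing them by hand. A small sketch follows; `fif_name` is a hypothetical helper, and the rule that an unconfigured pipeline is called `minimal` is an assumption read off the listing above.

```python
def fif_name(task, ref=None, l_freq=None, h_freq=None):
    """Encode the preprocessing configuration in a working-file name."""
    parts = [task]
    if l_freq is not None and h_freq is not None:
        # :g drops trailing zeros, so 1.0 -> "1" and 0.1 -> "0.1"
        parts.append(f"bp{l_freq:g}-{h_freq:g}")
    if ref is not None:
        parts.append(ref)
    if len(parts) == 1:
        parts.append("minimal")  # no filtering or referencing configured
    # MNE expects continuous-data files to end in raw.fif
    return "_".join(parts) + "_raw.fif"


demo_names = [
    fif_name("errp", ref="car", l_freq=1, h_freq=20),
    fif_name("errp", ref="car", l_freq=0.1, h_freq=40),
    fif_name("errp", ref="laplacian", l_freq=1, h_freq=20),
    fif_name("errp"),
]
```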

### Pre-processing Notes

Preprocessing that should be done on raw continuous EEG (these operations assume continuity and stationarity):  
	•	Bandpass filtering  
	•	Notch filtering (50/60 Hz)  
	•	Re-referencing (average, mastoids)  
	•	Artifact correction via ICA (if used)  

Why?  
	•	Filtering short epochs instead of the continuous signal causes edge artifacts at the epoch boundaries  
	•	ICA needs long continuous data  
	•	Re-referencing should be consistent across the recording  

Raw EEG --> Filter / Re-reference --> Epoch  
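
As one concrete case of a continuous-data step, a common average reference can be computed once over the whole recording. This is a minimal NumPy sketch, assuming the `[n_samples x n_channels]` layout of the `eeg` matrix; `common_average_reference` is a hypothetical helper name and the random data is synthetic.

```python
import numpy as np


def common_average_reference(eeg):
    """Subtract the instantaneous mean over channels from every channel.

    eeg: [n_samples x n_channels] continuous recording.
    """
    eeg = np.asarray(eeg, dtype=float)
    return eeg - eeg.mean(axis=1, keepdims=True)


# Synthetic recording: 1000 samples, 8 channels, shared 5-unit offset
rng = np.random.default_rng(0)
demo_eeg = rng.normal(size=(1000, 8)) + 5.0
demo_car = common_average_reference(demo_eeg)
```

After this step the channels sum to zero at every time point, so any later epoch cut from `demo_car` inherits the same reference.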

Preprocessing that is usually done on epoched data (these are trial-specific):  
	•	Baseline correction  
	•	Epoch rejection (amplitude thresholds, peak-to-peak)  
	•	Trial-wise normalization  
	•	Feature extraction  
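
The first two epoch-level steps can be sketched with plain NumPy. This assumes 0-based event onsets and the `[n_samples x n_channels]` layout of the continuous data. All function names are hypothetical, and the window, baseline interval, and rejection threshold in the demo are illustrative values, not recommendations.

```python
import numpy as np


def extract_epochs(eeg, onsets, sfreq, tmin, tmax):
    """Cut [n_epochs x n_channels x n_times] windows around 0-based onsets."""
    start = int(round(tmin * sfreq))
    stop = int(round(tmax * sfreq)) + 1  # include the sample at tmax
    times = np.arange(start, stop) / sfreq
    epochs = []
    for onset in onsets:
        lo, hi = onset + start, onset + stop
        if lo < 0 or hi > eeg.shape[0]:
            continue  # skip events too close to the edges of the run
        epochs.append(eeg[lo:hi].T)
    if not epochs:
        return np.empty((0, eeg.shape[1], stop - start)), times
    return np.stack(epochs), times


def baseline_correct(epochs, times, bmin, bmax):
    """Subtract each channel's mean over the baseline window, per epoch."""
    mask = (times >= bmin) & (times <= bmax)
    return epochs - epochs[:, :, mask].mean(axis=2, keepdims=True)


def reject_peak_to_peak(epochs, threshold):
    """Drop epochs in which any channel exceeds the peak-to-peak threshold."""
    ptp = epochs.max(axis=2) - epochs.min(axis=2)  # [n_epochs x n_channels]
    keep = (ptp <= threshold).all(axis=1)
    return epochs[keep], keep


# Synthetic run: constant offset of 10 plus one large spike at sample 520
demo_eeg = np.full((1000, 2), 10.0)
demo_eeg[520, 0] += 500.0
demo_epochs, demo_times = extract_epochs(demo_eeg, [200, 500, 800], 100.0, -0.2, 0.5)
demo_epochs = baseline_correct(demo_epochs, demo_times, -0.2, 0.0)
demo_clean, demo_keep = reject_peak_to_peak(demo_epochs, threshold=100.0)
```

The threshold is in the same units as the data, so it has to be chosen to match however the recording is scaled.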

For ML:  
	•	Z-scoring across time  
	•	Channel-wise normalization  
	•	Spatial filters (e.g., xDAWN, CSP) — often applied to epochs  
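
The two normalization variants differ only in which axes the statistics are computed over. Here is a sketch assuming epochs shaped `[n_epochs x n_channels x n_times]`; the function names are hypothetical. Fitting the channel-wise statistics on training epochs only is an assumption made here to avoid leaking test-set information into the scaler.

```python
import numpy as np


def zscore_time(epochs, eps=1e-12):
    """Z-score every channel of every epoch across its own time axis."""
    mean = epochs.mean(axis=2, keepdims=True)
    std = epochs.std(axis=2, keepdims=True)
    return (epochs - mean) / (std + eps)


def fit_channel_scaler(train_epochs):
    """Per-channel mean and std pooled over training epochs and time."""
    mean = train_epochs.mean(axis=(0, 2), keepdims=True)
    std = train_epochs.std(axis=(0, 2), keepdims=True)
    return mean, std


def apply_channel_scaler(epochs, mean, std, eps=1e-12):
    """Scale any set of epochs with statistics fitted on the training set."""
    return (epochs - mean) / (std + eps)


# Synthetic epochs: 5 trials, 3 channels, 50 time points
rng = np.random.default_rng(0)
demo_epochs = rng.normal(size=(5, 3, 50)) * 4.0 + 2.0
demo_z = zscore_time(demo_epochs)
demo_mean, demo_std = fit_channel_scaler(demo_epochs)
demo_scaled = apply_channel_scaler(demo_epochs, demo_mean, demo_std)
```

Trial-wise z-scoring removes amplitude differences between trials, while the channel-wise scaler preserves them, so the choice depends on whether the model should see absolute amplitude.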

A caveat that matters for ML: for classification, some models perform better with minimal preprocessing.  
CNNs in particular can learn:  
	•	baseline offsets  
	•	frequency filters  
	•	spatial combinations  

So, over-cleaning can remove signal your model could exploit. This is why many modern ErrP pipelines:  
	•	apply only bandpass + notch  
	•	skip aggressive artifact rejection  
	•	let the classifier handle variability  

Recommended Workflow:  
Raw EEG  
  → bandpass + notch  
  → re-reference  
  → epoch with varied tmin/tmax  
  → optional baseline (or not)  
  → model-specific normalization  
  → classification
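
The workflow up to epoching could look roughly like this in MNE. It is a sketch under several assumptions: every channel in `labels` is an EEG channel, the data is stored in microvolts (hence `unit_scale=1e-6`, since MNE works in volts), and the default band, line frequency, and window are illustrative rather than tuned. `to_mne_events` and `preprocess_and_epoch` are hypothetical helper names.

```python
import numpy as np


def to_mne_events(pos, typ):
    """Convert 1-based MATLAB POS/TYP vectors into an MNE events array."""
    pos = np.atleast_1d(pos).astype(int)
    typ = np.atleast_1d(typ).astype(int)
    events = np.zeros((pos.size, 3), dtype=int)
    events[:, 0] = pos - 1  # MNE sample indices are 0-based
    events[:, 2] = typ      # middle column stays 0 (previous trigger value)
    return events


def preprocess_and_epoch(eeg, sfreq, labels, pos, typ, *, l_freq=1.0,
                         h_freq=20.0, line_freq=50.0, tmin=-0.2, tmax=0.8,
                         baseline=None, event_id=None, unit_scale=1e-6):
    """Raw EEG -> bandpass + notch -> average reference -> epochs."""
    import mne  # deferred so to_mne_events stays usable without MNE installed

    info = mne.create_info(ch_names=list(labels), sfreq=sfreq, ch_types="eeg")
    # MNE expects [n_channels x n_times] in volts
    data = np.asarray(eeg, dtype=float).T * unit_scale
    raw = mne.io.RawArray(data, info)
    raw.filter(l_freq=l_freq, h_freq=h_freq)  # bandpass on continuous data
    raw.notch_filter(freqs=line_freq)         # line-noise notch
    raw.set_eeg_reference("average")          # common average reference
    return mne.Epochs(raw, to_mne_events(pos, typ), event_id=event_id,
                      tmin=tmin, tmax=tmax, baseline=baseline, preload=True)


# Placeholder positions and trigger codes, not the dataset's real values
demo_events = to_mne_events([1, 513], [5, 6])
```

For a run laid out as in the MATLAB structure, a call might look like `preprocess_and_epoch(run.eeg, run.header.SampleRate, run.header.Label, run.header.EVENT.POS, run.header.EVENT.TYP)`. Varying `tmin`/`tmax` then only requires calling the function again with a different window, and `epochs.get_data()` returns the array that the normalization and classification steps operate on.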