
# EEG Data Analysis Assignment


In this assignment you will analyze the EEG data that you collected yourself. There are different approaches to preprocessing. Here we will model our pipeline after this recent paper (check Methods): https://pmc.ncbi.nlm.nih.gov/articles/PMC10659264/pdf/nihpp-2023.11.07.566051v2.pdf.

## Part 1: Put all data in BIDS format

In [None]:
import os, glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import mne
from mne_bids import BIDSPath, write_raw_bids, read_raw_bids

# Define directories:
project_dir = os.getcwd()
print(project_dir)
raw_dir = os.path.join(project_dir, 'raw')
print(raw_dir)
bids_dir = os.path.join(project_dir, 'bids')

# Define event IDs:
event_id = {
    'A/standard': 1,
    'A/oddball':  2,
    'B/standard': 3,
    'B/oddball':  4,
}

# Subject IDs:
subjects = ['01', '02', '03', '04', '05', '06']

C:\Users\Gebruiker\Desktop\Code Data Science\Code-Data-Science\Code-Data-Science\Code-Data-Science


🔥 Please loop over filenames and put all raw data in BIDS format (look up what this means). You can use the functions ```BIDSPath``` and ```write_raw_bids```. We need to take care of a couple of things.

First, after loading the raw EEG data, we need to change channels 'M1', 'M2', 'EXG7' and 'EXG8' to type 'misc', and channels 'UP', 'DOWN', 'LEFT' and 'RIGH' to type 'eog'. We also need to set the right montage ('biosemi64').

Second, observe that ```write_raw_bids``` takes the "events" and "event_id" parameters. During EEG acquisition we always sent the same trigger (2) at every sound presentation. However, we would like to include information about state and stimulus, as per the dictionary called "event_id" (see previous cell). Please make sure to: (i) find all events in the EEG file, (ii) read the information about state and stimulus ID from the corresponding "*_events.tsv" file, (iii) update the events according to "event_id", (iv) pass "events" and "event_id" to ```write_raw_bids```.

Third, look at all the json sidecar files, and check whether all information is correct. 
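Steps (i) to (iii) can be tried out on toy data before touching a real recording. The sketch below assumes an events array in MNE's three-column layout (sample, previous value, event code), as returned by `mne.find_events`, and one row per sound in the "*_events.tsv" file. The helper name `relabel_events` is hypothetical. The coding of `state` (1 for A, -1 for B) and the standard tone frequencies (2000 Hz in A, 1000 Hz in B) are assumptions that should be checked against your own events files.

```python
import numpy as np

event_id = {'A/standard': 1, 'A/oddball': 2, 'B/standard': 3, 'B/oddball': 4}


def relabel_events(events, state, frequency):
    """Return a copy of `events` whose third column encodes state and stimulus.

    Assumptions (check against your own data): state 1 means A and -1 means B;
    the standard tone is 2000 Hz in A and 1000 Hz in B.
    """
    events = np.asarray(events).copy()
    state = np.asarray(state, dtype=float)
    frequency = np.asarray(frequency, dtype=float)
    if not (len(events) == len(state) == len(frequency)):
        raise ValueError("events and TSV rows must have the same length")

    standard_freq = np.where(state == 1, 2000.0, 1000.0)
    is_standard = frequency == standard_freq
    codes = np.select(
        [(state == 1) & is_standard, (state == 1) & ~is_standard,
         (state == -1) & is_standard, (state == -1) & ~is_standard],
        [event_id['A/standard'], event_id['A/oddball'],
         event_id['B/standard'], event_id['B/oddball']],
        default=0,  # rows with an unexpected state value
    )
    events[:, 2] = codes
    return events


# Toy input: four triggers that were all recorded with code 2.
events = np.array([[100, 0, 2], [600, 0, 2], [1100, 0, 2], [1600, 0, 2]])
state = [1, 1, -1, -1]
frequency = [2000, 1000, 1000, 2000]
new_events = relabel_events(events, state, frequency)
print(new_events[:, 2])  # [1 2 3 4]
```

For step (iv), the relabeled array can be passed to ```write_raw_bids``` as `events=new_events, event_id=event_id`.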


In [2]:
def rawtobids(raw_dir):
    """Convert every BDF file in raw_dir to BIDS and return the BIDS paths."""
    # Sort the files so that the file-to-subject assignment is reproducible
    bdf_files = sorted(glob.glob(os.path.join(raw_dir, "*.bdf")))
    print("Files found:", bdf_files)
    if len(bdf_files) > len(subjects):
        raise ValueError("More BDF files than subject IDs.")

    misc = ['M1', 'M2', 'EXG7', 'EXG8']
    # The assignment text spells the last channel 'RIGH'; check raw.ch_names
    eog = ['UP', 'DOWN', 'LEFT', 'RIGH']
    bids_paths = []
    for subject, bdf_file in zip(subjects, bdf_files):
        # Read the BDF file without preloading, so the original file can be copied as is
        raw = mne.io.read_raw_bdf(bdf_file, preload=False)

        # Set channel types and montage before writing, so they end up in the sidecar files
        raw.set_channel_types({ch: 'misc' for ch in misc if ch in raw.ch_names})
        raw.set_channel_types({ch: 'eog' for ch in eog if ch in raw.ch_names})
        raw.set_montage('biosemi64', on_missing='ignore')

        # Create the BIDS path
        bids_path = BIDSPath(subject=subject,
                             root=bids_dir,
                             datatype='eeg',
                             extension='.bdf',
                             suffix='eeg',
                             task='experiment')  # Set the task name as required
        # Convert to BIDS (the events are not passed yet, see the next cell)
        write_raw_bids(raw, bids_path, overwrite=True)
        print(f"Converted {bdf_file} to BIDS format.")
        bids_paths.append(bids_path)
    return bids_paths

rawtobids(raw_dir)


In [None]:
# TODO: mernan's code still needs to be merged into the previous cell
# Filter the phase 2 events and save them as new files
# Read the events file
df = pd.read_csv("1_2_2025_08_26_15_11_10_events.tsv", sep="\t", dtype=str)

# Filter: keep only the rows where phase == "2"
filtered = df[df["phase"].str.strip() == "2"]

# Save to new files
filtered.to_csv("events_only_phase2.tsv", sep="\t", index=False)
filtered.to_csv("events_only_phase2.csv", index=False)

print(filtered.head())

# Read the filtered TSV file back in
events_tsv = pd.read_csv("events_only_phase2.tsv", sep="\t")

# Create a new list for the updated event codes, as defined in the event_id dictionary
new_event_codes = []

# Loop over the rows of the DataFrame and assign the matching event code
for st, fq in zip(events_tsv["state"].astype(float), events_tsv["frequency"].astype(float)):
    if st == 1:  # sequence A
        if fq == 2000:
            new_event_codes.append(event_id["A/standard"])
        else:
            new_event_codes.append(event_id["A/oddball"])
    elif st == -1:  # sequence B
        if fq == 1000:
            new_event_codes.append(event_id["B/standard"])
        else:
            new_event_codes.append(event_id["B/oddball"])
    else:
        new_event_codes.append(0)  # unknown state

👉 **Question:** Why would we bother with storing our data in BIDS format?

**Answer:**


## Part 2: Preprocess example data set

🔥 Please take an example block and load the eeg and events data (from the BIDS directory).

In [6]:
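# Load one example block from the BIDS directory (subject '01' is an arbitrary choice).
bids_path = BIDSPath(subject='01', task='experiment', datatype='eeg',
                     suffix='eeg', extension='.bdf', root=bids_dir)
raw = read_raw_bids(bids_path)  # the events are attached as annotations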

🔥 Please plot the raw data. Show examples of (i) clean data, (ii) blinks (look at eog channels), and (iii) muscle artefacts. Print the shape of the data, and indicate what the dimensions correspond to. Print all channel names. Print all channel types.

In [None]:
# Your code goes here. Please add comments.


👉 **Question:** What kind of information can you extract just from visually inspecting raw EEG traces?

**Answer:**

🔥 Please set the channels 'M1', 'M2', 'EXG7' and 'EXG8' to type 'misc', and channels 'UP', 'DOWN', 'LEFT' and 'RIGH' to type 'eog'. Also set the correct montage and plot it.

PS: Yes, it is annoying that we did this before. This information did end up correctly in the json sidecar files (check this), but not in the *eeg.bdf files.

In [5]:
# Your code goes here. Please add comments.

🔥 Please re-reference the EEG data to the average of the two mastoids.
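Re-referencing is a subtraction: the reference signal is removed from every EEG channel at every sample. The NumPy sketch below shows the arithmetic on toy data; all arrays are made up for illustration. In MNE the corresponding method is `raw.set_eeg_reference(ref_channels=['M1', 'M2'])`. How that call treats reference channels typed as 'misc' may depend on the MNE version, so check the result.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000
brain = rng.normal(size=(3, n_samples))                  # toy activity at Fz, Cz, Pz
mastoid_noise = 0.1 * rng.normal(size=(2, n_samples))    # toy activity at M1, M2
shared = np.cumsum(rng.normal(size=n_samples))           # drift picked up by every electrode

eeg = brain + shared
mastoids = mastoid_noise + shared

# Average of the two mastoids, subtracted from every EEG channel
reference = mastoids.mean(axis=0)
eeg_reref = eeg - reference

# The shared drift dominates before, and is gone after, re-referencing
print(eeg.std(axis=1))
print(eeg_reref.std(axis=1))
```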

In [6]:
# Your code goes here. Please add comments.

👉 **Question:** Why would you want to re-reference the EEG data? What are common approaches? What are advantages and disadvantages of each?

**Answer:**

🔥 Please combine eog channels into bipolar horizontal ("HEOG") and vertical ("VEOG") EOG derivations. Check if the resulting channel type is correct.
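A bipolar derivation is the difference between two electrodes. The toy example below assumes, purely for illustration, that a blink appears with opposite sign above and below the eye while drift and line noise are identical at both electrodes. In MNE, `mne.set_bipolar_reference(raw, anode='UP', cathode='DOWN', ch_name='VEOG')` creates such a channel; inspect `raw.get_channel_types()` afterwards to see which type it received.

```python
import numpy as np

sfreq = 250.0
t = np.arange(int(2 * sfreq)) / sfreq

# Toy blink: a Gaussian bump centred at t = 1 s
blink = np.exp(-0.5 * ((t - 1.0) / 0.05) ** 2)
# Signal shared by both electrodes: 50 Hz line noise plus slow drift
shared = 0.5 * np.sin(2 * np.pi * 50 * t) + 0.3 * t

up = blink + shared            # electrode above the eye
down = -0.5 * blink + shared   # electrode below the eye (assumed opposite sign)

veog = up - down               # the shared part cancels, the blink parts add up
print(np.abs(veog - 1.5 * blink).max())
```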

In [7]:
# Your code goes here. Please add comments.

👉 **Question:** Why do we take the difference between, for example, the UP and DOWN electrodes?

**Answer:** 

🔥 Please apply a bandpass filter. Look up what a bandpass filter is, and use common cutoffs for EEG.

In [8]:
# Your code goes here. Please add comments.

👉 **Question:** Why should we filter EEG data? What type of temporal filters are typically applied, and what are common cut-offs? 

**Answer:**

🔥 Please downsample the EEG data to 250 Hz.

In [9]:
# Your code goes here. Please add comments.

👉 **Question:** What are the advantages of downsampling, and why does 250 Hz make sense?

**Answer:**

🔥 Please perform independent component analysis and plot the components.

In [10]:
# Your code goes here. Please add comments.

👉 **Question:** What does this plot show?  

**Answer:** 


🔥 Please remove components associated with blinks and eye movements. Include diagnostic figures (check which plotting methods exist for ICA objects).

In [11]:
# Your code goes here. Please add comments.

👉 **Question:** What do these plots show?  

**Answer:** 

🔥 Please plot the same bit of data, before and after ica-based artefact removal.

In [12]:
# Your code goes here. Please add comments.


👉 **Question:** What kinds of artifacts can ICA help to remove in EEG data? Why do we want to do this? 

**Answer:**

🔥 Please epoch the cleaned data (make sure to use "event_id"), by cutting out segments from 500ms before each sound to 1000ms after. Plot the resulting epochs object.

In [13]:
# Your code goes here. Please add comments.


👉 **Question:** Why do we extract epochs relative to events?  

**Answer:**


## Part 3: Preprocess all the data

🔥 Please put all the code we wrote so far in a function called ```preprocess_eeg_data```. The function should take "bids_path" as input and return "epochs". You can then loop across all data and concatenate all the epochs.

In [14]:
# Your code goes here. Please add comments.

## Part 4: Analyze the data

🔥 Please define the following sensor groupings: frontal, central, temporal, parietal and occipital.

In [70]:
# define sensor groupings:
frontal = ['Fp1', 'AF7', 'AF3', 'F1', 'F3', 'F5', 'F7',
           'Fp2', 'AF8', 'AF4', 'F2', 'F4', 'F6', 'F8']
central = ['FC5', 'FC3', 'FC1', 'C1', 'C3', 'C5', 'CP5', 'CP3', 'CP1',
           'FC6', 'FC4', 'FC2', 'C2', 'C4', 'C6', 'CP6', 'CP4', 'CP2']
temporal = ['FT7', 'T7', 'TP7', 'FT8', 'T8', 'TP8']
parietal = ['P1', 'P3', 'P5', 'P7', 'P9', 
            'P2', 'P4', 'P6', 'P8', 'P10']
occipital = ['PO7', 'PO3', 'O1',
             'PO8', 'PO4', 'O2']