# Group: Hyacinthara - Report

## Introductory Notes

This project, consisting of Jupyter notebooks and additional source code, was created as a submission to the EEG course in winter 2023/24.
This notebook, `01_prepare_data.ipynb`, serves as a guide and as an interactive installation and setup tool.
The pipeline itself is run in the subsequent notebook, `02_run_pipeline.ipynb`.

To run this notebook, suitable software needs to be installed, including Python and, for example, Jupyter Lab.

For a better overview in Jupyter Lab, the Table of Contents tab can be opened in the left column.
Its entries are interactive: the list updates automatically, and a click on a title jumps to the corresponding section.

## System and Dependencies

All operations were tested on computers with a 64-bit multi-core processor and at least 16 GB of RAM.
The operating systems used were Microsoft Windows 10, Microsoft Windows 11, Manjaro Linux 23.1.3, and Fedora Linux 39.
On all computers, Python 3.10 or 3.11 (e.g. Python 3.11.8) was installed for running `pip` and the pipeline.

To make sure that all required Python packages are available, the installation is started in the following cell.

- `!` runs the command that follows it on the command line.
- `pip install -r <textfile>` (or, depending on the installation, `python3 -m pip …`) installs the module dependencies in the versions with which this notebook was created.
- The notebook was tested and ran successfully with these module versions combined with Python 3.11.8. With other versions, errors can occur. One example of a problem caused by non-matching versions is that the step names differ between `mne-bids-pipeline` versions.
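
Since mismatched versions are a likely source of errors, it can help to check the interpreter version before installing anything. The following is a minimal sketch; the helper `is_tested_python` and the constant `TESTED_MINOR_VERSIONS` are illustrative names and not part of the project's source code.

```python
import sys

# Python versions named in this report; an assumption for illustration,
# not a hard requirement enforced by the project.
TESTED_MINOR_VERSIONS = {(3, 10), (3, 11)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the major.minor version is one the notebook was tested with."""
    return (version_info[0], version_info[1]) in TESTED_MINOR_VERSIONS


if not is_tested_python():
    print(
        "Warning: running Python {}.{}; the notebook was tested with 3.10 and 3.11.".format(
            sys.version_info[0], sys.version_info[1]
        )
    )
```

The check only warns and does not abort, because other versions may still work.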

In [1]:
# Install dependencies
!pip install -r requirements.txt
# Try to set UTF8 Encoding (required for MNE-Bids pipeline)
%env PYTHONIOENCODING=utf8

env: PYTHONIOENCODING=utf8





## Download and Pre-process the Dataset

Now that all required Python packages are installed, we can fetch the required data and perform some clean-up operations on it.
Afterwards, the data can be used with the MNE-BIDS pipeline.

### Preamble
To perform the pre-processing, the project's custom Python modules are added to the search path.
Then, the modules needed for loading the configuration file and for fetching the dataset are imported.

To allow for plotting, Matplotlib, a plotting library with an API similar to that of MATLAB, is imported.
It is then set to use the Qt-based `qtagg` backend for rendering.

In [2]:
# Add the project source code directory to the search path
import sys
import os
sys.path.append(os.path.abspath('./src/'))

# import function to load configuration from file
from mne_bids_pipeline._config_import import _import_config as getConfig
from tools.logtools import *

# tools to get fresh data
import data_handling.data_downloader as dl
import data_handling.data_cleaner as clean

import matplotlib
matplotlib.use('qtagg')

### Load Configuration

First of all, the configuration for the MNE-BIDS pipeline is loaded from the prepared file.
Note that the file existence checks are disabled here (`check=False`).
With the checks enabled, the import fails if the data are not yet available.
Loading the configuration would therefore fail in the first run of this notebook.

In the Jupyter notebook for running the pipeline, short checks are performed.
These show a few of the values stored in the configuration if the file was loaded properly.

In [3]:
# set the file path of the main configuration file
bids_config_path = "./mne-bids/config/mne-bids-pipeline.py"
# load configured settings from file
bids_cfg = getConfig(
    config_path=bids_config_path,
    check=False
)

### Dataset Download

After the custom data handling module has been loaded, a check for the existence of the dataset is started.
To ensure an unchanged version of the dataset, a fresh download can be enforced.

If the dataset does not exist in the location specified in the configuration file, a copy is downloaded and extracted.
Please ensure that enough disk space is available. About 130 GB are needed for download and extraction. An additional 40 GB should be free for running the pipeline; this space can be freed by deleting the file `ds003702.zip` after successful extraction.
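
The available disk space can be checked before starting the download. The sketch below uses only the Python standard library; the helper name `has_enough_space` and the default of 130 GB are assumptions taken from the figures above, and the path `"."` stands in for the actual data directory.

```python
import shutil

# The report states sizes in "GB"; binary gigabytes are used here as a
# slightly conservative reading of that figure.
GIB = 1024 ** 3


def has_enough_space(path=".", required_gb=130):
    """Return True if the file system containing `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GIB


if not has_enough_space("."):
    print("Less than 130 GB free: download or extraction may fail.")
```

The same helper can be called with `required_gb=40` after extraction to see whether the zip file needs to be deleted before running the pipeline.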

The following cell contains some configuration options for the existence check and the download.

In [4]:
from data_handling import getDataPathFromBidsRoot

dl.CLEAN_DATA = False # if true, clears the data directory in order to force downloading a fresh copy of the data
dl.DATA_BASE_DIR = getDataPathFromBidsRoot(bids_cfg.bids_root) # get the data folder from the bids pipeline configuration
dl.VALIDATE_DATA = True # if true, checks that the downloaded zip file is the expected file (this may take a while)

In [15]:
if dl.CLEAN_DATA:
    clean.removeDirectory(dl.DATA_BASE_DIR)
dl.fetchData()

[0m[92mData directory is found: [0m [94mB:\Uni\Master\Semester_03\EEG\project\repo\eeg_course_project\data/[0m[0m
[0m[93mChecking file:[0m [94mB:\Uni\Master\Semester_03\EEG\project\repo\eeg_course_project\data/ds003702/sub-01/eeg/sub-01_task-SocialMemoryCuing_channels.tsv[0m[0m
[0m[93mChecking file:[0m [94mB:\Uni\Master\Semester_03\EEG\project\repo\eeg_course_project\data/ds003702/sub-01/eeg/sub-01_task-SocialMemoryCuing_eeg.eeg[0m[0m
[0m[93mChecking file:[0m [94mB:\Uni\Master\Semester_03\EEG\project\repo\eeg_course_project\data/ds003702/sub-01/eeg/sub-01_task-SocialMemoryCuing_eeg.json[0m[0m
[0m[93mChecking file:[0m [94mB:\Uni\Master\Semester_03\EEG\project\repo\eeg_course_project\data/ds003702/sub-01/eeg/sub-01_task-SocialMemoryCuing_eeg.vhdr[0m[0m
[0m[93mChecking file:[0m [94mB:\Uni\Master\Semester_03\EEG\project\repo\eeg_course_project\data/ds003702/sub-01/eeg/sub-01_task-SocialMemoryCuing_eeg.vmrk[0m[0m
[0m[93mChecking file:[0m [94mB:\Uni\Ma

### Pre-processing of Data Files

Once all data are downloaded and unpacked, the format and content of several of the contained files need to be updated.
This allows the updated dataset to be used directly with the tools of the MNE-BIDS pipeline.

For this dataset, this consists mainly of two tasks:

1. Fix the file links in the `*.vhdr` and `*.vmrk` files. This is needed because the files were renamed after exporting, but the original authors did not update the file links.
2. Generate a `*_events.tsv` file for each subject, which contains the onset time, duration, and type of each labelled time frame.
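
The first task can be pictured as a small text substitution. The sketch below is not the project's `data_patcher` implementation. It only assumes that the header and marker files reference their companion files through `DataFile=` and `MarkerFile=` entries, as is usual for the BrainVision format. The function name `patch_brainvision_links` and the old file name in the example are hypothetical.

```python
import re


def patch_brainvision_links(text, basename):
    """Point DataFile/MarkerFile entries at files named after `basename`.

    The extension of each referenced file is kept; only the stem is replaced.
    """

    def _replace(match):
        key, old_name = match.group(1), match.group(2).strip()
        extension = old_name.rsplit(".", 1)[-1] if "." in old_name else ""
        new_name = basename + ("." + extension if extension else "")
        return "{}={}".format(key, new_name)

    # MULTILINE lets ^ and $ match at every line of the header text.
    return re.sub(r"^(DataFile|MarkerFile)=(.*)$", _replace, text, flags=re.MULTILINE)


# Illustrative header fragment with outdated links (made-up old file name).
header = "[Common Infos]\nDataFile=old_name.eeg\nMarkerFile=old_name.vmrk\n"
patched = patch_brainvision_links(header, "sub-01_task-SocialMemoryCuing_eeg")
print(patched)
```

Operating on the text rather than on the file keeps the sketch easy to test; reading and writing the files would be added around this function.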

In [16]:
# import tools to patch fresh data
import data_handling.data_patcher as patch
import data_handling.convert_brainvision2bids as convert

# run patches
patch.patchAllFiles(bids_cfg.bids_root)
convert.buildEventTSV(bids_cfg.bids_root)

[0m[0mPatching file names for Subject:[0m [94m1[0m[0m
[0m[0mPatching file names for Subject:[0m [94m2[0m[0m
[0m[0mPatching file names for Subject:[0m [94m3[0m[0m
[0m[0mPatching file names for Subject:[0m [94m4[0m[0m
[0m[0mPatching file names for Subject:[0m [94m5[0m[0m
[0m[0mPatching file names for Subject:[0m [94m6[0m[0m
[0m[0mPatching file names for Subject:[0m [94m7[0m[0m
[0m[0mPatching file names for Subject:[0m [94m9[0m[0m
[0m[0mPatching file names for Subject:[0m [94m10[0m[0m
[0m[0mPatching file names for Subject:[0m [94m11[0m[0m
[0m[0mPatching file names for Subject:[0m [94m12[0m[0m
[0m[0mPatching file names for Subject:[0m [94m13[0m[0m
[0m[0mPatching file names for Subject:[0m [94m14[0m[0m
[0m[0mPatching file names for Subject:[0m [94m15[0m[0m
[0m[0mPatching file names for Subject:[0m [94m16[0m[0m
[0m[0mPatching file names for Subject:[0m [94m17[0m[0m
[0m[0mPatching file names for 

### Validity Check

Now that all required data are available, we can import the configuration again.
This time, the checks that all parameters are valid are enabled.

In [17]:
bids_cfg = getConfig(
    config_path=bids_config_path,
)

### Electrode Coordinates
Next, we look at the electrode coordinates used.
The authors chose the 10-10 system.
Since these coordinates are not directly given, we chose to load the coordinates of the 10-05 system electrodes instead of defining a custom 10-10 montage.

Unused positions are ignored.
The electrodes for which recorded signals are available are virtually placed at the correct positions in the pipeline.
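
In an MNE-BIDS-Pipeline configuration file, a template montage is typically selected by name. The fragment below is a hypothetical excerpt: the project's actual configuration file is not shown in this notebook, so the use of the `eeg_template_montage` option here is an assumption.

```python
# Hypothetical excerpt of an MNE-BIDS-Pipeline configuration file.
# Assumption: the template montage is selected by name; "standard_1005" is
# the MNE-Python built-in montage holding the 10-05 electrode positions.
eeg_template_montage = "standard_1005"
```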

## Run the Pipeline

Once the preparatory steps are done and a configuration is loaded, the pipeline can be run.

In this notebook, only the initial setup of the pipeline output is run.
These steps include loading modules as dependencies, optionally resetting the output data directory, and initialising the output directory if it is empty.

### Load Dependencies

In addition to the modules imported in the next cell, some dependencies were already imported in a previous cell.
These include `mne-bids-pipeline`, which is used for loading the configuration from the prepared file.

In [18]:
# allow for calling mne_bids_pipeline within Python
import sys

from mne_bids import BIDSPath
from typing import Optional

### Deletion of Prior Outputs (Optional)

If errors occur while running the pipeline, the output of previous pipeline runs can be removed.
This is done by running the following two cells after setting `CLEAR_PIPELINE_OUTPUT` to `True`.

Note: If the value is set to `True`, the pipeline needs more computation time than when it can reuse previously processed output.

In [27]:
CLEAR_PIPELINE_OUTPUT = True # False: Keep previous pipeline output; True: Delete all previous pipeline outputs

In [28]:
if CLEAR_PIPELINE_OUTPUT:
    clean.removeDirectory("{}/derivatives/mne-bids-pipeline".format(bids_cfg.bids_root))


[0m[31mRemoving directory:[0m [94mB:\Uni\Master\Semester_03\EEG\project\repo\eeg_course_project\data\ds003702/derivatives/mne-bids-pipeline[0m[0m


### Pipeline: Initial Pipeline Run

At this point, the preparations are finished.
Therefore, we can start running the pipeline based on the configuration file.

The initialisation is expected to create, for example, the directories needed by the subsequent steps.

In [24]:
curr_steps = "init"
!mne_bids_pipeline --config {bids_config_path} --steps {curr_steps}

┌────────┬ Welcome aboard MNE-BIDS-Pipeline! 👋 ───────────────────────────────
¦16:49:21¦ 📝 Using configuration: ./mne-bids/config/mne-bids-pipeline.py
└────────┴ 
┌────────┬ init/_01_init_derivatives_dir ──────────────────────────────────────
¦16:49:21¦ ⏳️ Initializing output directories.
└────────┴ done (1s)
┌────────┬ init/_02_find_empty_room ───────────────────────────────────────────
¦16:49:21¦ ⏩ Skipping, empty-room data only relevant for MEG …
└────────┴ done (1s)
┌────────┬ init/_01_init_derivatives_dir ──────────────────────────────────────
¦16:49:21¦ ✅ Output directories already exist …
└────────┴ done (1s)
┌────────┬ init/_02_find_empty_room ───────────────────────────────────────────
¦16:49:21¦ ⏩ Skipping, empty-room data only relevant for MEG …
└────────┴ done (1s)


If Unicode encode errors occur when attempting to run the pipeline, make sure that the environment variable `PYTHONIOENCODING` is set to `utf8`.

The variable should already be set if the first cell (the cell containing the module installation via `pip`) was run after starting the current Python kernel.

Remember to restart Jupyter after setting the environment variable.
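
Whether the variable `PYTHONIOENCODING` (the one set via `%env` in the first cell) is visible to the current kernel can be checked from Python. The following is a minimal sketch; the helper name `ensure_utf8_io` is illustrative and not part of the project's code.

```python
import os


def ensure_utf8_io(environ=os.environ):
    """Set PYTHONIOENCODING to utf8 if it is unset and return the effective value."""
    return environ.setdefault("PYTHONIOENCODING", "utf8")


print("PYTHONIOENCODING =", ensure_utf8_io())
```

A value set this way is inherited by processes started from the kernel afterwards, such as commands run with `!`. It does not change the encoding of the Python interpreter that is already running.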

### Pipeline: Pre-processing and Analysis

The subsequent steps are run in the next Jupyter notebook:
```
02_run_pipeline.ipynb
```
