# Initializing a BIDS Study

This notebook operates on the `sourcedata` folder inside the StudyTemplate and attempts to BIDSify the content.

This notebook also assumes that you have run at least one subject through `explore_source.ipynb` and confirmed which actions you want applied to the data before it is put into BIDS or shared.

The first step is to see what is present inside the `sourcedata` folder via the `glob` package:

In [1]:
import glob

glob.glob('../data/eeg-study-template/sourcedata/*')

['../sourcedata/IC_trn_2.bdf',
 '../sourcedata/IC_trn_1.bdf',
 '../sourcedata/IC_trn_3.bdf',
 '../sourcedata/IC_trn_4.bdf',
 '../sourcedata/README.md']

Good, this is the sample data that is expected.
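
The listing above mixes recordings with the `README.md`. As a minimal sketch (the helper name `list_recordings` is hypothetical and not part of the template), the recordings can be separated out and sorted with `pathlib`, so later steps always see them in a stable order:

```python
from pathlib import Path


def list_recordings(source_dir, suffix=".bdf"):
    """Return recording files in source_dir, sorted by name.

    Files with any other extension (for example README.md) are skipped.
    """
    source_dir = Path(source_dir)
    return sorted(
        path for path in source_dir.iterdir()
        if path.is_file() and path.suffix.lower() == suffix
    )


# Path taken from the cell above; adjust it to your own layout.
source = Path("../data/eeg-study-template/sourcedata")
if source.is_dir():
    for recording in list_recordings(source):
        print(recording.name)
```

Sorting is a design choice here: `glob.glob` makes no promise about ordering, which is why the listing above shows `IC_trn_2.bdf` first.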

Next, confirm that a single recording can be loaded easily:

In [2]:
import mne
raw = mne.io.read_raw('../data/eeg-study-template/sourcedata/IC_trn_2.bdf')
raw

Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_2.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...


| Section | Field | Value |
|---|---|---|
| General | Filename(s) | IC_trn_2.bdf |
| General | MNE object type | RawEDF |
| General | Measurement date | 2009-08-31 at 14:39:08 UTC |
| General | Participant | |
| General | Experimenter | Unknown |
| Acquisition | Duration | 00:05:03 (HH:MM:SS) |
| Acquisition | Sampling frequency | 1024.00 Hz |
| Acquisition | Time points | 310272 |
| Channels | | |


Some things that jump out and require intervention:

* There's no montage information
* The reference is still based on the one from the amplifier
* The data contains a lot of high frequency activity that should likely be filtered out

Recall that `explore_source.ipynb` discusses and fixes these issues, and those fixes can be adapted to prep your own data.

For now, we will wrap the actions we used to prep the data into a function that can be run on any loaded file.

In [3]:
# Takes in a raw MNE object, and performs data prep/cleaning
# Intended to be used in the body of a loop to standardize prep/cleaning of a large group of files
def prep_data(to_prep):
    to_prep.load_data() # Make sure MNE has the data in memory as it is to be manipulated
    to_prep = to_prep.set_montage('biosemi128') # Set the montage as before
    to_prep = to_prep.set_eeg_reference('average') # Set the reference as before
    to_prep = to_prep.filter(l_freq=1.0, h_freq=30.0) # Band-pass 1-30 Hz to remove the high frequency activity

    return to_prep

The intended use of this function is to apply it to each recording file found in `sourcedata`. To test it on a single file, see the code snippet below:

```python
raw = mne.io.read_raw('../data/eeg-study-template/sourcedata/IC_trn_2.bdf') # Reload data
raw = prep_data(raw) # Call function defined above
raw # Confirm and display changed properties
```

Other things that may be nice to include in the `prep_data` function:
* Manually marking out already known bad channels
* Setting in-task/out-task time periods
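
As a hedged sketch of the first item only, known bad channels could be kept in a per-subject lookup and merged into `raw.info['bads']`, which is where MNE stores channels marked as bad. The `KNOWN_BADS` table, its channel names, and the `mark_known_bads` helper are all hypothetical placeholders for illustration:

```python
# Hypothetical lookup of channels already known to be bad, keyed by subject id.
# The channel names are placeholders in the BioSemi 128 naming style and
# must exist in raw.ch_names for your own recordings.
KNOWN_BADS = {
    "1": ["A12", "B3"],
    "2": [],
}


def mark_known_bads(raw, subject_id, known_bads=KNOWN_BADS):
    """Add known bad channels for subject_id to raw.info['bads'], in place."""
    already_marked = list(raw.info["bads"])
    for channel in known_bads.get(subject_id, []):
        if channel not in already_marked:
            already_marked.append(channel)
    raw.info["bads"] = already_marked
    return raw
```

The helper merges rather than overwrites, so channels already marked in the file are kept. Because it needs a subject id, it would be called from the loop rather than from inside `prep_data` as currently written.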

The next step is to take the `prep_data` function and turn it into a loop for all subjects inside of the `sourcedata` folder as shown below.

In [4]:
import mne_bids, re, json

task_name = 'fhbc' # Known task name for experiment, change to your own

# The following dictionary maps stim channel markers to strings
# For your own studies, this may not be necessary.
# Check the Epochs section in `explore_source.ipynb` for more information
event_dict = {
    "static/checker/left": 215,
    "press/left": 201,
    "static/checker/right": 216,
    "press/right": 204,
    "static/face/upright": 211,
    "static/face/inverted": 212,
    "static/house/upright": 213,
    "static/house/inverted": 214,
    "boundary": 65790,
}

# Begin BIDS procedure
root_location = '..' # Path to the root, don't change unless very certain
for file in glob.glob('../data/eeg-study-template/sourcedata/*.bdf'): # A little different from before, only grabs BDFs
    raw = mne.io.read_raw(file) # Load the file

    # Take our previously defined function and use it to apply our "prepared" changes
    raw = prep_data(raw)

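    # Some intermediate Python; extract the subject id out of the file path/name.
    # This takes the first run of digits found anywhere in the path.
    # For your own studies, replace this line with a pattern that matches your
    # own file naming scheme (something from ChatGPT can be a fine starting point).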
    subject_id = re.findall(r'\d+', file)[0]

    # The below two functions are part of the mne bids package and have their
    # own documentation that outlines how to interact with them
    bids_path = mne_bids.BIDSPath(subject=subject_id, task=task_name, root=root_location)
    mne_bids.write_raw_bids(raw, bids_path, events=mne.find_events(raw), event_id=event_dict, format='EDF', allow_preload=True, overwrite=True)

Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_2.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 310271  =      0.000 ...   302.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 1 - 30 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Upper passband edge: 30.00 Hz
- Upper transition bandwidth: 7.50 Hz (-6 dB cutoff frequency: 33.75 Hz)
- Filter length: 3381 samples (3.302 s)



[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  71 tasks      | elapsed:    1.4s


Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
Removing orphaned offset at the beginning of the file.
399 events found on stim channel Status
Event IDs: [201 204 211 212 213 214 215 216]
Writing '../README.md'...
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-2/eeg/sub-2_space-CapTrak_electrodes.tsv'...
Writing '../sub-2/eeg/sub-2_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: ['press/left', 'press/right', 'static/checker/left', 'static/checker/right', 'static/face/inverted', 'static/face/upright', 'static/house/inverted', 'static/house/upright']
Writing '../sub-2/eeg/sub-2_task-fhbc_events.tsv'...
Writing '../sub-2/eeg/sub-2_task-fhbc_events.json'...
Writing '../dataset_description.json'...
Writing '../sub-2/eeg/sub-2_task-fhbc_eeg.json'...
Writing '../sub-2/eeg/sub-2_task-fhbc_channels.tsv'...
Copying data files to sub-2_task-fhbc_eeg.edf




Writing '../sub-2/sub-2_scans.tsv'...
Wrote ../sub-2/sub-2_scans.tsv entry with eeg/sub-2_task-fhbc_eeg.edf.
Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_1.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 280575  =      0.000 ...   273.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 1 - 30 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Upper passband edge: 30.00 Hz
- Upper transition bandwidth: 7.50 Hz (-6 dB cutoff frequency: 33.75 Hz)
- Filter length: 3

[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  71 tasks      | elapsed:    1.4s


Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
401 events found on stim channel Status
Event IDs: [  201   204   211   212   213   214   215   216 65790]
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-1/eeg/sub-1_space-CapTrak_electrodes.tsv'...
Writing '../sub-1/eeg/sub-1_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: ['boundary', 'press/left', 'press/right', 'static/checker/left', 'static/checker/right', 'static/face/inverted', 'static/face/upright', 'static/house/inverted', 'static/house/upright']
Writing '../sub-1/eeg/sub-1_task-fhbc_events.tsv'...
Writing '../sub-1/eeg/sub-1_task-fhbc_events.json'...
Writing '../dataset_description.json'...
Writing '../sub-1/eeg/sub-1_task-fhbc_eeg.json'...
Writing '../sub-1/eeg/sub-1_task-fhbc_channels.tsv'...
Copying data files to sub-1_task-fhbc_eeg.edf




Writing '../sub-1/sub-1_scans.tsv'...
Wrote ../sub-1/sub-1_scans.tsv entry with eeg/sub-1_task-fhbc_eeg.edf.
Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_3.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 288767  =      0.000 ...   281.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 1 - 30 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Upper passband edge: 30.00 Hz
- Upper transition bandwidth: 7.50 Hz (-6 dB cutoff frequency: 33.75 Hz)
- Filter length: 3

[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  71 tasks      | elapsed:    1.0s


Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
Removing orphaned offset at the beginning of the file.
398 events found on stim channel Status
Event IDs: [201 204 211 212 213 214 215 216]
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-3/eeg/sub-3_space-CapTrak_electrodes.tsv'...
Writing '../sub-3/eeg/sub-3_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: ['press/left', 'press/right', 'static/checker/left', 'static/checker/right', 'static/face/inverted', 'static/face/upright', 'static/house/inverted', 'static/house/upright']
Writing '../sub-3/eeg/sub-3_task-fhbc_events.tsv'...
Writing '../sub-3/eeg/sub-3_task-fhbc_events.json'...
Writing '../dataset_description.json'...
Writing '../sub-3/eeg/sub-3_task-fhbc_eeg.json'...
Writing '../sub-3/eeg/sub-3_task-fhbc_channels.tsv'...
Copying data files to sub-3_task-fhbc_eeg.edf




Writing '../sub-3/sub-3_scans.tsv'...
Wrote ../sub-3/sub-3_scans.tsv entry with eeg/sub-3_task-fhbc_eeg.edf.
Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_4.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 266239  =      0.000 ...   259.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Filtering raw data in 1 contiguous segment
Setting up band-pass filter from 1 - 30 Hz

FIR filter parameters
---------------------
Designing a one-pass, zero-phase, non-causal bandpass filter:
- Windowed time-domain design (firwin) method
- Hamming window with 0.0194 passband ripple and 53 dB stopband attenuation
- Lower passband edge: 1.00
- Lower transition bandwidth: 1.00 Hz (-6 dB cutoff frequency: 0.50 Hz)
- Upper passband edge: 30.00 Hz
- Upper transition bandwidth: 7.50 Hz (-6 dB cutoff frequency: 33.75 Hz)
- Filter length: 3

[Parallel(n_jobs=1)]: Done  17 tasks      | elapsed:    0.3s
[Parallel(n_jobs=1)]: Done  71 tasks      | elapsed:    1.2s


Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
Removing orphaned offset at the beginning of the file.
384 events found on stim channel Status
Event IDs: [201 204 211 212 213 214 215 216]
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-4/eeg/sub-4_space-CapTrak_electrodes.tsv'...
Writing '../sub-4/eeg/sub-4_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: ['press/left', 'press/right', 'static/checker/left', 'static/checker/right', 'static/face/inverted', 'static/face/upright', 'static/house/inverted', 'static/house/upright']
Writing '../sub-4/eeg/sub-4_task-fhbc_events.tsv'...
Writing '../sub-4/eeg/sub-4_task-fhbc_events.json'...
Writing '../dataset_description.json'...
Writing '../sub-4/eeg/sub-4_task-fhbc_eeg.json'...
Writing '../sub-4/eeg/sub-4_task-fhbc_channels.tsv'...
Copying data files to sub-4_task-fhbc_eeg.edf




Writing '../sub-4/sub-4_scans.tsv'...
Wrote ../sub-4/sub-4_scans.tsv entry with eeg/sub-4_task-fhbc_eeg.edf.


Assuming that the above cell completed, you have successfully BIDSified data!
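
One fragile spot in the loop above is the subject label: `re.findall(r'\d+', file)[0]` returns the first run of digits anywhere in the path, so a digit in a parent folder name would be picked up instead of the recording number. A minimal sketch of a stricter version follows, assuming your files follow the sample naming pattern `IC_trn_<number>.bdf`; the helper name `subject_from_filename` and the example folder `study2` are hypothetical:

```python
import re
from pathlib import Path

# Assumed naming pattern, based on the sample files: IC_trn_<number>.bdf
SUBJECT_PATTERN = re.compile(r"IC_trn_(\d+)\.bdf")


def subject_from_filename(file_path):
    """Return the subject label encoded in a sample-data file name."""
    name = Path(file_path).name  # drop parent folders so their digits are ignored
    match = SUBJECT_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"Cannot find a subject id in {name!r}")
    return match.group(1)


example = "../data/study2/sourcedata/IC_trn_4.bdf"
print(re.findall(r"\d+", example)[0])  # prints 2 (taken from the folder name)
print(subject_from_filename(example))  # prints 4
```

Raising on an unexpected name is deliberate: a silently wrong subject label would write one participant's data into another participant's folder.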

At this point, the study passes the latest version of the BIDS validator.
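
Before running the validator, a quick local check can confirm that the files named in the log above were actually written. The sketch below is not a substitute for the BIDS validator; it only tests for the existence of a handful of files, and the helper name `missing_bids_files` is hypothetical:

```python
from pathlib import Path


def missing_bids_files(root, subjects, task):
    """Return the expected BIDS files that are absent under root."""
    root = Path(root)
    expected = [
        root / "dataset_description.json",
        root / "participants.tsv",
        root / "participants.json",
    ]
    for subject in subjects:
        eeg_dir = root / f"sub-{subject}" / "eeg"
        stem = f"sub-{subject}_task-{task}"
        expected += [
            eeg_dir / f"{stem}_eeg.edf",
            eeg_dir / f"{stem}_eeg.json",
            eeg_dir / f"{stem}_channels.tsv",
            eeg_dir / f"{stem}_events.tsv",
        ]
    return [path for path in expected if not path.exists()]


# Root, subject labels, and task name as used in the loop above.
for path in missing_bids_files("..", ["1", "2", "3", "4"], "fhbc"):
    print("missing:", path)
```

No output means every listed file was found.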

However, a significant part of BIDSification is adding extra metadata to your dataset to make it easier for people to interact with in the future.

You can either edit the `dataset_description.json` file in the project's root folder manually or use the following code block.

Note that this isn't an exhaustive list of the fields, just a few for demonstration purposes.

In [5]:
mne_bids.make_dataset_description(
    path=root_location,
    name='StudyTemplate',
    authors=["Tyler K. Collins", "James A. Desjardins"],
    how_to_acknowledge="This is part of a StudyTemplate taken from https://github.com/Andesha/StudyTemplate/",
    acknowledgements="Tyler K. Collins and James A. Desjardins",
    data_license="CC0",
    references_and_links=[
        "https://github.com/Andesha/StudyTemplate/",
    ],
    overwrite=True,
)
desc_json_path = bids_path.root / "dataset_description.json"
with open(desc_json_path, encoding="utf-8-sig") as fid:
    display(json.loads(fid.read()))

Writing '../dataset_description.json'...


{'Name': 'StudyTemplate',
 'BIDSVersion': '1.7.0',
 'DatasetType': 'raw',
 'License': 'CC0',
 'Authors': ['Tyler K. Collins', 'James A. Desjardins'],
 'Acknowledgements': 'Tyler K. Collins and James A. Desjardins',
 'HowToAcknowledge': 'This is part of a StudyTemplate taken from https://github.com/Andesha/StudyTemplate/',
 'ReferencesAndLinks': ['https://github.com/Andesha/StudyTemplate/']}
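
For the manual route mentioned earlier, the file can also be edited with the standard `json` module instead of by hand. This is a minimal sketch under the assumption that the file already exists; the helper name `update_dataset_description` is hypothetical, and it does not check that the field names you pass are valid BIDS keys:

```python
import json
from pathlib import Path


def update_dataset_description(root, **fields):
    """Merge fields into <root>/dataset_description.json and return the result."""
    path = Path(root) / "dataset_description.json"
    # utf-8-sig tolerates a byte-order mark if one is present
    description = json.loads(path.read_text(encoding="utf-8-sig"))
    description.update(fields)
    path.write_text(json.dumps(description, indent=4) + "\n", encoding="utf-8")
    return description


# Example call, commented out so that pasting the block has no side effects:
# update_dataset_description("..", Authors=["Tyler K. Collins", "James A. Desjardins"])
```

Existing keys that are not passed in are left untouched, which makes this safer than rewriting the whole file for a one-field change.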

In [None]:
from pathlib import Path

Path("../bids_init_completed").touch()

In general, the last steps of BIDSification should be dedicated to reviewing both the modality-specific and modality-agnostic requirements and recommendations, as well as any optional information that would be nice to include.

If you are considering sharing to EEGNet and/or CONP, the following should be included in the data:

CONP (see [DATS.json editor](https://portal.conp.ca/dats-editor))

EEGNet expected fields that are not REQUIRED by BIDS:

* dataset_description.json
    * BIDSVersion
    * HEDVersion
    * Authors
    * License
    * DatasetType

* participants.tsv  (with accompanying participants.json)
    * age and/or dob
    * sex
    * handedness

* scans.tsv
    * acq_time

* eeg.json
    * CapManufacturer
    * CapManufacturersModelName
    * EEGChannelCount
    * EOGChannelCount
    * ECGChannelCount
    * EMGChannelCount
    * EEGGround
    * EEGPlacementScheme
    * HardwareFilters
    * InstitutionName
    * InstitutionAddress
    * MiscChannelCount
    * Manufacturer
    * ManufacturerModelName
    * RecordingType
    * RecordingDuration
    * SoftwareVersions
    * TriggerChannelCount
    * TaskDescription
    * OutputType
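
Going through this list by hand for every subject is tedious, so a small script can report what is still missing. The sketch below uses only the standard library. The two field lists are deliberately partial subsets of the list above (with `age` chosen from "age and/or dob"), and the helper names are hypothetical:

```python
import csv
import json
from pathlib import Path

# Subsets of the fields listed above; extend these to the full list as needed.
EEG_JSON_FIELDS = ["CapManufacturer", "EEGChannelCount", "EEGGround",
                   "InstitutionName", "Manufacturer"]
PARTICIPANT_COLUMNS = ["age", "sex", "handedness"]


def missing_json_fields(json_path, fields):
    """Return the entries of fields that are absent from a JSON sidecar."""
    content = json.loads(Path(json_path).read_text(encoding="utf-8-sig"))
    return [field for field in fields if field not in content]


def missing_tsv_columns(tsv_path, columns):
    """Return the entries of columns that are absent from a TSV header row."""
    with open(tsv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [column for column in columns if column not in header]
```

For example, `missing_json_fields('../sub-1/eeg/sub-1_task-fhbc_eeg.json', EEG_JSON_FIELDS)` would list the sidecar fields still to be filled in for subject 1, and `missing_tsv_columns('../participants.tsv', PARTICIPANT_COLUMNS)` would do the same for the participant columns. These checks only test for presence, not for whether the values are sensible.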