# Initializing a BIDS Study

This notebook operates on the `sourcedata` folder inside the StudyTemplate and attempts to BIDSify its contents.

The first step is to list the contents of the `sourcedata` folder with the `glob` module:

In [1]:
import glob

glob.glob('../sourcedata/*')

['../sourcedata/IC_trn_2.bdf',
 '../sourcedata/IC_trn_1.bdf',
 '../sourcedata/IC_trn_3.bdf',
 '../sourcedata/IC_trn_4.bdf',
 '../sourcedata/README.md']
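
The listing above also returns `README.md`, and the files come back in no particular order. A minimal sketch, assuming every recording ends in `.bdf`, of a helper that returns only the recordings, sorted by name. The name `list_recordings` is hypothetical and not part of the StudyTemplate.

```python
from pathlib import Path


def list_recordings(folder, pattern="*.bdf"):
    """Return the paths in `folder` matching `pattern`, sorted by name."""
    return sorted(Path(folder).glob(pattern))
```

Calling `list_recordings('../sourcedata')` from the code folder should then give the four BDF files in order, without the README.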

Next, let's look at what one recording looks like:

In [2]:
import mne
raw = mne.io.read_raw('../sourcedata/IC_trn_2.bdf')
raw

Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_2.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...


| Field | Value |
| --- | --- |
| Measurement date | August 31, 2009 14:39:08 GMT |
| Experimenter | Unknown |
| Participant | |
| Digitized points | Not available |
| Good channels | 128 EEG, 1 Stimulus |
| Bad channels | |
| EOG channels | Not available |
| ECG channels | Not available |
| Sampling frequency | 1024.00 Hz |
| Highpass | 0.00 Hz |
| Lowpass | 268.00 Hz |
| Filenames | IC_trn_2.bdf |
| Duration | 00:05:03 (HH:MM:SS) |


Some things that jump out and require intervention:

* The sampling rate is quite high
* There's no montage information
* The reference is still based on the one from the amplifier

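Here's how to fix them. The resampling step is left commented out because it is slow, which is why the output still reports 1024 Hz:

In [3]:
raw.load_data()
# raw = raw.resample(128) # Optional: needs the data loaded first and can take a lot of time to run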
raw = raw.set_montage('biosemi128')
raw = raw.set_eeg_reference('average')
raw

Reading 0 ... 310271  =      0.000 ...   302.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.


| Field | Value |
| --- | --- |
| Measurement date | August 31, 2009 14:39:08 GMT |
| Experimenter | Unknown |
| Participant | |
| Digitized points | 131 points |
| Good channels | 128 EEG, 1 Stimulus |
| Bad channels | |
| EOG channels | Not available |
| ECG channels | Not available |
| Sampling frequency | 1024.00 Hz |
| Highpass | 0.00 Hz |
| Lowpass | 268.00 Hz |
| Filenames | IC_trn_2.bdf |
| Duration | 00:05:03 (HH:MM:SS) |
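
To make "quite high" concrete, the numbers in the output above can be related with a little arithmetic. This is a rough sketch: the memory estimate assumes the loaded data is held as 64-bit floats, and the 128 Hz target is the value from the commented-out `resample` call.

```python
# Values read off the output above.
sfreq = 1024.0           # sampling frequency in Hz
first, last = 0, 310271  # first and last sample index from the "Reading ..." line
n_channels = 129         # 128 EEG + 1 Stimulus

n_samples = last - first + 1
duration_s = n_samples / sfreq
last_sample_time_s = last / sfreq

# Assumption: 8 bytes per value (64-bit floats) once the data is loaded.
memory_mb = n_channels * n_samples * 8 / 1e6

# Effect of resampling to 128 Hz.
target_sfreq = 128.0
reduction_factor = sfreq / target_sfreq
n_samples_resampled = int(n_samples / reduction_factor)
nyquist_hz = target_sfreq / 2

print(f"{n_samples} samples over {duration_s:.3f} s, about {memory_mb:.0f} MB in memory")
print(f"At {target_sfreq:.0f} Hz: {n_samples_resampled} samples, nothing above {nyquist_hz:.0f} Hz is kept")
```

The trade-off is that resampling shrinks the data by the reduction factor but discards everything above the new Nyquist frequency.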


Use cells like the ones above to work out what your data needs before it is in good enough shape to write to BIDS.

Other steps may include:
* Manually marking out already known bad channels
* Merging files together
* Setting in-task/out-task time periods

Once everything has been figured out, you can take that procedure and turn it into a loop for all subjects inside of your `sourcedata` folder as shown below.
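
Before looping, each file needs a subject label. A minimal sketch, assuming the file names follow the `IC_trn_<number>.bdf` pattern from the listing at the top: searching only the file name, not the whole path, avoids picking up digits from a parent folder. The helper name `subject_id_from_path` and the example path are hypothetical.

```python
import re
from pathlib import Path


def subject_id_from_path(file_path):
    """Return the first run of digits in the file name (not the full path)."""
    stem = Path(file_path).stem  # e.g. "IC_trn_2"
    match = re.search(r"\d+", stem)
    if match is None:
        raise ValueError(f"No subject number found in {file_path!r}")
    return match.group()


# A path with digits in a parent folder (made up for illustration).
example = "/data/run07/sourcedata/IC_trn_2.bdf"
print(subject_id_from_path(example))     # taken from the file name
print(re.findall(r"\d+", example)[0])    # taken from the parent folder instead
```

With the relative `../sourcedata/` paths used in this notebook both approaches agree; the difference only shows up when the path itself contains digits.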

In [4]:
del raw # Clear the previous raw, just to be safe.

In [8]:
import mne_bids, re, json

task_name = 'fhbc' # Short task label; it ends up in the BIDS file names
root_location = '..' # Remember, this is running from inside of the code folder

event_dict = {
    "static/checker/left": 215,
    "press/left": 201,
    "static/checker/right": 216,
    "press/right": 204,
    "static/face/upright": 211,
    "static/face/inverted": 212,
    "static/house/upright": 213,
    "static/house/inverted": 214,
    "boundary": 65790,
}

# A little different than before, only grabs BDFs
for file in glob.glob('../sourcedata/*.bdf'):
    raw = mne.io.read_raw(file) # Load the file
    subject_id = re.findall(r'\d+', file)[0] # First run of digits in the path is the subject ID; assumes no digits appear earlier in the path

    raw.load_data()
    raw = raw.set_montage('biosemi128')
    raw = raw.set_eeg_reference('average')

    # BIDSPath and write_raw_bids come from the mne_bids package; see its
    # documentation for the full set of options
    bids_path = mne_bids.BIDSPath(subject=subject_id, task=task_name, root=root_location)
    mne_bids.write_raw_bids(raw, bids_path, events=mne.find_events(raw), event_id=event_dict, format='EDF', allow_preload=True, overwrite=True)

Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_2.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 310271  =      0.000 ...   302.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
Removing orphaned offset at the beginning of the file.
399 events found on stim channel Status
Event IDs: [201 204 211 212 213 214 215 216]
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-2/eeg/sub-2_space-CapTrak_electrodes.tsv'...
Writing '../sub-2/eeg/sub-2_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: [np.str_('press/left'), np.str_('press/right'), np.str_('static/checker/left'), np.str_('static/checker/right'), np.str_('static/face/inverted'), np.str_('st

  mne_bids.write_raw_bids(raw, bids_path, events=mne.find_events(raw), event_id=event_dict, format='EDF', allow_preload=True, overwrite=True)


Writing '../sub-2/sub-2_scans.tsv'...
Wrote ../sub-2/sub-2_scans.tsv entry with eeg/sub-2_task-fhbc_eeg.edf.
Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_1.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 280575  =      0.000 ...   273.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
401 events found on stim channel Status
Event IDs: [  201   204   211   212   213   214   215   216 65790]
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-1/eeg/sub-1_space-CapTrak_electrodes.tsv'...
Writing '../sub-1/eeg/sub-1_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: [np.str_('boundary'), np.str_('press/left'), np.str_('press/right'), np.str_('sta

  mne_bids.write_raw_bids(raw, bids_path, events=mne.find_events(raw), event_id=event_dict, format='EDF', allow_preload=True, overwrite=True)


Writing '../sub-1/sub-1_scans.tsv'...
Wrote ../sub-1/sub-1_scans.tsv entry with eeg/sub-1_task-fhbc_eeg.edf.
Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_3.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 288767  =      0.000 ...   281.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
Removing orphaned offset at the beginning of the file.
398 events found on stim channel Status
Event IDs: [201 204 211 212 213 214 215 216]
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-3/eeg/sub-3_space-CapTrak_electrodes.tsv'...
Writing '../sub-3/eeg/sub-3_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: [np.str_('press/left'), np.str_('press/right'), 

  mne_bids.write_raw_bids(raw, bids_path, events=mne.find_events(raw), event_id=event_dict, format='EDF', allow_preload=True, overwrite=True)


Writing '../sub-3/sub-3_scans.tsv'...
Wrote ../sub-3/sub-3_scans.tsv entry with eeg/sub-3_task-fhbc_eeg.edf.
Extracting EDF parameters from /home/tyler/Documents/eeg-dev/StudyTemplate/sourcedata/IC_trn_4.bdf...
BDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading 0 ... 266239  =      0.000 ...   259.999 secs...
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Trigger channel Status has a non-zero initial value of {initial_value} (consider using initial_event=True to detect this event)
Removing orphaned offset at the beginning of the file.
384 events found on stim channel Status
Event IDs: [201 204 211 212 213 214 215 216]
Writing '../participants.tsv'...
Writing '../participants.json'...
Writing '../sub-4/eeg/sub-4_space-CapTrak_electrodes.tsv'...
Writing '../sub-4/eeg/sub-4_space-CapTrak_coordsystem.json'...
Used Annotations descriptions: [np.str_('press/left'), np.str_('press/right'), 

  mne_bids.write_raw_bids(raw, bids_path, events=mne.find_events(raw), event_id=event_dict, format='EDF', allow_preload=True, overwrite=True)


Writing '../sub-4/sub-4_scans.tsv'...
Wrote ../sub-4/sub-4_scans.tsv entry with eeg/sub-4_task-fhbc_eeg.edf.
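
As a sanity check, the `Event IDs` lines in the log above can be compared against `event_dict`. This is a minimal sketch with the codes copied by hand from the log; in practice they would come from `mne.find_events`.

```python
event_dict = {
    "static/checker/left": 215,
    "press/left": 201,
    "static/checker/right": 216,
    "press/right": 204,
    "static/face/upright": 211,
    "static/face/inverted": 212,
    "static/house/upright": 213,
    "static/house/inverted": 214,
    "boundary": 65790,
}

# Event IDs reported in the log for each recording.
common_codes = [201, 204, 211, 212, 213, 214, 215, 216]
found_codes = {
    "sub-1": common_codes + [65790],
    "sub-2": common_codes,
    "sub-3": common_codes,
    "sub-4": common_codes,
}

# Invert the mapping so codes can be looked up by number.
code_to_name = {code: name for name, code in event_dict.items()}

# Codes present in a recording but missing from event_dict.
unmapped = {
    subject: [code for code in codes if code not in code_to_name]
    for subject, codes in found_codes.items()
}
print(unmapped)
```

An empty list for every subject means each code in the recordings has a name in `event_dict`.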


Much of BIDS is about adding metadata to your dataset so that it is easier for others to work with in the future.

You can either edit the `dataset_description.json` file in the project's root manually or use the following function.

Note that this isn't an exhaustive list of the fields, just a few for example purposes.

In [9]:
mne_bids.make_dataset_description(
    path=root_location,
    name='StudyTemplate',
    authors=["Tyler K. Collins", "James A. Desjardins"],
    how_to_acknowledge="This is part of a StudyTemplate taken from https://github.com/Andesha/StudyTemplate/",
    acknowledgements="Tyler K. Collins and James A. Desjardins",
    data_license="CC0",
    references_and_links=[
        "https://github.com/Andesha/StudyTemplate/",
    ],
    overwrite=True,
)
desc_json_path = bids_path.root / "dataset_description.json"
with open(desc_json_path, encoding="utf-8-sig") as fid:
    display(json.loads(fid.read()))

Writing '../dataset_description.json'...


{'Name': 'StudyTemplate',
 'BIDSVersion': '1.7.0',
 'DatasetType': 'raw',
 'License': 'CC0',
 'Authors': ['Tyler K. Collins', 'James A. Desjardins'],
 'Acknowledgements': 'Tyler K. Collins and James A. Desjardins',
 'HowToAcknowledge': 'This is part of a StudyTemplate taken from https://github.com/Andesha/StudyTemplate/',
 'ReferencesAndLinks': ['https://github.com/Andesha/StudyTemplate/']}
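
For the manual route mentioned above, a minimal sketch using only the standard library. The field names are taken from the output above; the helper name `write_description` is hypothetical, and the file is written to a temporary folder so the real `dataset_description.json` is left untouched.

```python
import json
import tempfile
from pathlib import Path

description = {
    "Name": "StudyTemplate",
    "BIDSVersion": "1.7.0",
    "DatasetType": "raw",
    "License": "CC0",
    "Authors": ["Tyler K. Collins", "James A. Desjardins"],
}


def write_description(root, fields):
    """Write `fields` to <root>/dataset_description.json and return the path."""
    path = Path(root) / "dataset_description.json"
    path.write_text(json.dumps(fields, indent=4) + "\n", encoding="utf-8")
    return path


# Round trip through a temporary folder to confirm the file parses back.
with tempfile.TemporaryDirectory() as tmp_root:
    written = write_description(tmp_root, description)
    loaded = json.loads(written.read_text(encoding="utf-8"))

print(loaded["Authors"])
```

Checking that `Authors` comes back as a list with one entry per person is a quick way to catch quoting slips in the author list.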