## Welcome to the Brainhack BIDS Demo!

In this demo, we'll work with some example EEG source data. We're going to rename and re-organise the files into the BIDS format, and create some metadata files to describe our data. 

These data were collected to create a machine learning training dataset with the aim of continuously classifying which of two features was currently attended at each moment of each trial. We call the experiment “FeatAttnClass” for short. Below is a description of the task:

*We set out to collect an EEG dataset to use to train various machine learning algorithms to detect the focus of feature-selective attention. Subjects were cued to attend to either black or white moving dots, and respond to brief periods of coherent motion in the cued colour. The display consisted of either both black and white dots, or only the cued colour in randomly interleaved trials. The field of moving dots in the uncued colour never moved coherently, and should thus not have captured attention. The fields of dots flickered at 6 and 7.5 Hz. Colour and frequency were fully counterbalanced. Each trial consisted of a 1 second cue followed by 15 s of the dot motion stimulus.*

The task instructions were as follows: 

*Participants were informed of the purpose of the study, and instructed to press the arrow keys corresponding to the direction of any epoch of coherent motion they saw in the cued colour.*

The data were sampled at 1200 Hz using a g.tec amplifier (model g.USBamp) through the g.tec API running in MATLAB 2017a. Continuous data were recorded from five EEG channels (Iz, Oz, POz, O1, O2) arranged according to the international 10-20 system for electrode placement in a nylon head cap. The ground electrode was placed at Cz, and an ear reference was used. The powerline frequency was 50 Hz, and data were collected with a high pass filter at 1 Hz and a low pass filter at 100 Hz. The data are stored such that the EEG channels are in columns 1-5 of the matrix, and a trigger channel is in column 6. Changes in the amplitude of this trigger channel represent events.
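
The matrix layout described above (five EEG columns followed by a trigger column) can be illustrated with a small sketch. The sample values are invented, and the sketch assumes the trigger channel rests at 0 between events, which the description does not state. `find_trigger_events` is a hypothetical helper, not part of the recording software.

```python
# Synthetic stand-in for the recording: 5 EEG columns plus a trigger column.
# All values are invented for illustration only.
samples = [
    # Iz,  Oz,  POz, O1,  O2,  TRIG
    [0.1, 0.2, 0.0, 0.3, 0.1, 0],
    [0.2, 0.1, 0.1, 0.2, 0.0, 0],
    [0.0, 0.3, 0.2, 0.1, 0.2, 3],  # trigger steps from 0 to 3
    [0.1, 0.2, 0.1, 0.0, 0.1, 3],
    [0.3, 0.0, 0.2, 0.1, 0.3, 0],  # trigger returns to 0
    [0.2, 0.1, 0.0, 0.2, 0.1, 5],  # trigger steps from 0 to 5
]

SAMPLING_RATE_HZ = 1200  # from the recording description
TRIGGER_COLUMN = 5       # sixth column, zero-based index


def find_trigger_events(rows, trigger_column=TRIGGER_COLUMN):
    """Return (sample_index, new_value) for each change to a non-zero trigger value.

    Assumption: a value of 0 means "no event"; returns to 0 are ignored.
    """
    events = []
    previous = rows[0][trigger_column]
    for index, row in enumerate(rows[1:], start=1):
        current = row[trigger_column]
        if current != previous and current != 0:
            events.append((index, current))
        previous = current
    return events


events = find_trigger_events(samples)
for sample_index, value in events:
    onset_s = sample_index / SAMPLING_RATE_HZ
    print(f"trigger {value} at sample {sample_index} ({onset_s:.4f} s)")
```

Dividing the sample index by the sampling rate converts each event to an onset time in seconds.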

The data were recorded at the Queensland Brain Institute at The University of Queensland, which is located at: Building 79, The University of Queensland, St Lucia, Australia, 4072. 


For more resources on how to "BIDS-ify" your EEG data, we recommend checking out this paper:

Pernet, C.R., Appelhoff, S., Gorgolewski, K.J. et al. EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Sci Data 6, 103 (2019). https://doi.org/10.1038/s41597-019-0104-8

as well as the BIDS Starter Kit:
https://github.com/bids-standard/bids-starter-kit


### Step 1: Import libraries and set paths ###

In [None]:
# Import necessary libraries for file manipulation
import h5py
import json
import numpy as np
import os
import pandas as pd
from pathlib import Path
import shutil

In [None]:
# Set the project root (the parent of the current working directory);
# the source data and BIDS data folders live beneath it
ROOTPATH = Path.cwd().parent

In [None]:
# Get file names for relevant EEG and behavioural data 
eegFiles = sorted((ROOTPATH / '01_Sourcedata').glob('**/eeg*'))
behFiles = sorted((ROOTPATH / '01_Sourcedata').glob('**/bhv*'))

print('EEG Files found:', eegFiles)
print('Behavioural Files found:', behFiles)

Note that the filenames are the same for every subject: eegData.mat for EEG data and bhvData.mat for behavioural data. Without the filepath, there is no way to know which subject, or even which experiment, these data belong to. This could easily lead to errors where we accidentally analyse the same subject's data multiple times, or lose subject data.

Further, the folder names vary: some contain unique initials, while others don't. This sort of variability makes it difficult to automate your analysis pipeline and file loading, and means each new person who uses the data needs to write custom code to load it.
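
To see how inconsistent folder names can still be mapped onto one subject ID scheme, here is a minimal sketch of the parsing used in the loops later in this notebook. The folder names are hypothetical examples, and `subject_id_from_folder` is a helper name introduced only for illustration. It assumes each folder name starts with a single letter followed by the subject number.

```python
def subject_id_from_folder(folder_name):
    """Extract a zero-padded, two-digit subject ID from a source folder name.

    Assumption: the name starts with one letter followed by the subject
    number, optionally followed by '_' or '-' and extra text such as initials.
    """
    # Treat '_' and '-' as separators and keep the first chunk, e.g. 'S1'
    first_token = folder_name.replace("_", "\t").replace("-", "\t").split()[0]
    # Drop the leading letter and left-pad the number with zeros, e.g. '01'
    return first_token[1:].rjust(2, "0")


# Hypothetical folder names showing the kind of variability described above
for folder in ["S1_AB", "S2", "S10-CD"]:
    print(folder, "->", "sub-" + subject_id_from_folder(folder))
```

Whatever the source folder looks like, every subject ends up with the same `sub-XX` label, which is what makes the later steps easy to automate.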

### Step 2: Copy the source data into a BIDS folder structure with BIDS-compliant file names ###

To avoid the aforementioned issues, we'll copy our data into a new folder "02_Rawdata" where each subject has their own subfolder ("sub-01", "sub-02", etc.).

We will also rename the files to be BIDS compliant, with a name that features the subject ID, task name, and data type.
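
As a rough sketch of that naming pattern, the helper below assembles a path from the subject ID, task name, and data type. `bids_path` and its parameter names are hypothetical, introduced here for illustration; the cells that follow build the same paths inline.

```python
from pathlib import Path


def bids_path(root, subject_id, task, datatype, suffix, extension):
    """Build <root>/sub-<ID>/<datatype>/sub-<ID>_task-<task>_<suffix><extension>."""
    filename = f"sub-{subject_id}_task-{task}_{suffix}{extension}"
    return Path(root) / f"sub-{subject_id}" / datatype / filename


example = bids_path("02_Rawdata", "01", "FeatAttnDec", "eeg", "eeg", ".mat")
print(example.as_posix())
```

The subject label appears in both the folder and the file name, so a file remains identifiable even if it is moved out of its folder.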

In [None]:
# Specify the taskname:
taskname = 'FeatAttnDec'

In [None]:
# Use the path separator of the current operating system
# ('\\' on Windows, '/' on macOS and Linux)
splitter = os.sep

In [None]:
# transfer source behavioural files to raw behavioural files
for fpath in behFiles:
    # get subject ID
    pname = str(fpath.parent).split(splitter)[-1]
    subID = pname.replace('_', '\t').replace('-', '\t').split()[0][1:].rjust(2, '0')
    
    # Make Directory
    (ROOTPATH / '02_Rawdata' / 'sub-{}'.format(subID) / 'beh').mkdir(exist_ok=True, parents=True)
    
    # Specify new file name
    filename = 'sub-{}_task-'.format(subID) + taskname +  '_beh.mat'
    rawfile = ROOTPATH / '02_Rawdata' / 'sub-{}'.format(subID) / 'beh'/ filename
    
    # Copy file
    shutil.copyfile(fpath, rawfile)
    

In [None]:
# transfer source EEG files to raw EEG files
for fpath in eegFiles:
    # get subject ID
    pname = str(fpath.parent).split(splitter)[-1]
    subID = pname.replace('_', '\t').replace('-', '\t').split()[0][1:].rjust(2, '0')
    
    # Make Directory
    (ROOTPATH / '02_Rawdata' / 'sub-{}'.format(subID) / 'eeg').mkdir(exist_ok=True, parents=True)
    
    # Specify new file name
    filename = 'sub-{}_task-'.format(subID) + taskname +  '_eeg.mat'
    rawfile = ROOTPATH / '02_Rawdata' / 'sub-{}'.format(subID) / 'eeg'/ filename
    
    # Copy file
    shutil.copyfile(fpath, rawfile)

Note that BIDS restricts which file formats are acceptable for EEG. The EEG-BIDS paper recommends: "The European Data Format (EDF), which is an ongoing international effort to provide a common data format for electrophysiological recordings that began in 1992, and the BrainVision Core Data Format, developed by Brain Products GmbH".

We won't cover data conversion in this script, but note that you would need to convert your own data if they were not in an approved format.
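
A quick way to spot files that still need converting is to check their extension against the accepted formats. The sketch below is only that: the extension list is an assumption based on the EEG-BIDS formats (EDF, BrainVision, plus EEGLAB and BioSemi, which the specification also permits) and should be checked against the current specification. `needs_conversion` is a hypothetical helper.

```python
from pathlib import Path

# Assumed list of EEG file extensions accepted by BIDS; verify against the
# current specification before relying on it.
ACCEPTED_EEG_EXTENSIONS = {
    ".edf",                    # European Data Format
    ".vhdr", ".vmrk", ".eeg",  # BrainVision Core Data Format
    ".set", ".fdt",            # EEGLAB
    ".bdf",                    # BioSemi
}


def needs_conversion(path):
    """Return True if the file extension is not in the accepted list."""
    return Path(path).suffix.lower() not in ACCEPTED_EEG_EXTENSIONS


for name in ["sub-01_task-FeatAttnDec_eeg.mat", "sub-01_task-FeatAttnDec_eeg.edf"]:
    print(name, "-> needs conversion:", needs_conversion(name))
```

By this check, the .mat files copied in this demo would still need converting before the dataset is fully compliant.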

### Step 3: Create and store Channel metadata in tsv format ###

Each subject's EEG data needs to be stored with relevant metadata to help future researchers (including yourself!) understand exactly how the data were recorded.

We'll start with a tsv file that describes each channel in our EEG data matrix. This will have the following columns:
- "name": The name of the electrode
- "type": What sort of data is this?
- "units": What units are the data in?
- "low_cutoff": If the data were filtered, what was the highpass filter cutoff (the lowest frequency retained)?
- "high_cutoff": If the data were filtered, what was the lowpass filter cutoff (the highest frequency retained)?
- "reference": What sort of reference was used?
- "group": Was this electrode from a specific separate group?
- "sampling_frequency": What rate were the data sampled at?
- "description": Do you want to give this variable a description?
- "notch": Was there a notch filter? If so, at what frequency?
- "status": Describe the status of the data, e.g. "good" or "bad"
- "status_description": Describe what you mean by the status, e.g. very noisy in second half of the experiment
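
To make the target format concrete before building the full table, here is a minimal sketch that writes a two-channel example with Python's standard `csv` module (the notebook itself uses pandas). Only a subset of the columns is shown, and the row values are illustrative, taken from the recording description above.

```python
import csv
import io

# A subset of the columns, for illustration
columns = ["name", "type", "units", "low_cutoff", "high_cutoff", "status"]

rows = [
    # low_cutoff holds the highpass cutoff (1 Hz);
    # high_cutoff holds the lowpass cutoff (100 Hz)
    {"name": "Oz", "type": "EEG", "units": "μV",
     "low_cutoff": 1, "high_cutoff": 100, "status": "good"},
    # Fields left out of a row are filled with "n/a" via restval
    {"name": "TRIG", "type": "TRIG"},
]

buffer = io.StringIO()  # in-memory stand-in for a *_channels.tsv file
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                        restval="n/a", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

tsv_text = buffer.getvalue()
print(tsv_text)
```

Each channel gets one row, and anything unknown or not applicable is written as "n/a" rather than left blank.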

In [None]:
# create tsv file

# settings
nchannels = 6
ncolumns = 12

# specify column names and create empty data vessel
columnNames = ["name", "type", "units", "low_cutoff", "high_cutoff", "reference", "group", "sampling_frequency", "description", "notch", "status", "status_description"]
datempty = np.empty([nchannels, ncolumns]) # N rows x M cols

# loop through subjects to create tsv for each dataset

for fpath in eegFiles:
    # get subject ID
    pname = str(fpath.parent).split(splitter)[-1]
    subID = pname.replace('_', '\t').replace('-', '\t').split()[0][1:].rjust(2, '0')
    
    # create Meta data structure
    MetaData = pd.DataFrame(data = datempty, columns = columnNames)

    # Specify properties (one value per channel, or a single value for all channels)
    MetaData['name'] = ['Iz', 'Oz', 'POz', 'O1', 'O2', 'TRIG']
    MetaData['type'] = ['EEG', 'EEG', 'EEG', 'EEG', 'EEG', 'TRIG']
    MetaData['units'] = 'μV'
    MetaData['low_cutoff'] = 1     # low_cutoff is the highpass filter cutoff (Hz)
    MetaData['high_cutoff'] = 100  # high_cutoff is the lowpass filter cutoff (Hz)
    MetaData['reference'] = ['ear', 'ear', 'ear', 'ear', 'ear', 'n/a']
    MetaData['group'] = 1
    MetaData['sampling_frequency'] = 1200
    MetaData['description'] = 'n/a'
    MetaData['notch'] = 50
    
    # specify channel statuses for specific subjects if necessary
    if subID == "01":
        MetaData['status'] = ['GOOD', 'BAD', 'GOOD', 'GOOD', 'GOOD', 'GOOD']
        MetaData['status_description'] = ['n/a', 'Channel excessively noisy throughout, suspect broken electrode', 'n/a', 'n/a', 'n/a', 'n/a']
    elif subID == "02":
        MetaData['status'] = ['OK', 'OK', 'OK', 'OK', 'OK', 'OK']
        MetaData['status_description'] = 'Regular artifact throughout, suspect electrical interference'
    else:
        MetaData['status'] = ['GOOD', 'GOOD', 'GOOD', 'GOOD', 'GOOD', 'GOOD']
        MetaData['status_description'] = ['n/a', 'n/a', 'n/a', 'n/a', 'n/a', 'n/a']
    
    
    
    # Specify file name
    filename = 'sub-{}_task-'.format(subID) + taskname +  '_channels.tsv'
    metadatafile = ROOTPATH / '02_Rawdata' / 'sub-{}'.format(subID) / 'eeg'/ filename
    
    # Write to tsv
    MetaData.to_csv(
        metadatafile, 
        sep = '\t', na_rep = 'n/a', index = False
    )

In [None]:
print(MetaData)

### Step 4: Create and store task metadata in json format ###

In [None]:
# create json file
metadata = dict(
    SubjectArtefactDescription = "Strange regular artifact - visually looks to be at about 0.5 Hz, but resistant to filtering. Suspect this is some sort of electrical artifact in the room - testing was not performed in a Faraday cage. Should not affect FFT-based analyses",
    TaskName = taskname,
    SamplingFrequency =   1200,    
    PowerLineFrequency = 50,
    SoftwareFilters = "n/a",
    DCOffsetCorrection = "n/a",
    EEGReference = "ear",
    EEGGround = "Cz",
    HardwareFilters = dict(
        highpassfilter = dict(CutoffFrequency = 1),
        lowpassfilter =  dict(CutoffFrequency = 100)
    ),
    Manufacturer = "g.tec",
    ManufacturersModelName = "g.USBamp",
    SoftwareVersions = "g.tec API functions running in MATLAB 2017a",
    InstitutionName = "Queensland Brain Institute, The University of Queensland",
    InstitutionAddress = "Building 79, The University of Queensland, St Lucia, Australia, 4072",
    EEGChannelCount = 5,
    TriggerChannelCount = 1,
    RecordingDuration = 2939.0933,
    RecordingType = "continuous",
    TaskDescription = "We set out to collect an EEG dataset to use to train various machine learning algorithms to detect the focus of feature-selective attention. Subjects were cued to attend to either black or white moving dots, and respond to brief periods of coherent motion in the cued colour. The display consisted of either both black and white dots, or only the cued colour in randomly interleaved trials. The field of moving dots in the uncued colour never moved coherently, and should thus not have captured attention. The fields of dots flickered at 6 and 7.5 Hz. Colour and frequency were fully counterbalanced. Each trial consisted of a 1 second cue followed by 15 s of the dot motion stimulus.",
    Instructions = "Participants were informed of the purpose of the study, and instructed to press the arrow keys corresponding to the direction of any epoch of coherent motion they saw in the cued colour."
)

# save json file
for fpath in eegFiles:
    # get subject ID
    pname = str(fpath.parent).split(splitter)[-1]
    subID = pname.replace('_', '\t').replace('-', '\t').split()[0][1:].rjust(2, '0')
    
    # generate file name
    filename = 'sub-{}_task-'.format(subID) + taskname +  '_eeg.json'
    metadatafile = ROOTPATH / '02_Rawdata' / 'sub-{}'.format(subID) / 'eeg'/ filename
    
    with metadatafile.open('w') as f:
        json.dump(metadata, f, indent=2)
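
Before moving on, it can be worth checking a sidecar for missing fields and for numbers stored as strings. The sketch below is a simple hand-rolled check, not the official BIDS validator. The list of required keys reflects our reading of the EEG-BIDS specification and should be treated as an assumption, and `check_sidecar` is a hypothetical helper.

```python
import json

# Assumed required keys for an EEG sidecar; verify against the specification
REQUIRED_KEYS = ["TaskName", "SamplingFrequency", "PowerLineFrequency",
                 "EEGReference", "SoftwareFilters"]
# Keys expected to hold numbers rather than strings
NUMERIC_KEYS = ["SamplingFrequency", "EEGChannelCount",
                "TriggerChannelCount", "RecordingDuration"]


def check_sidecar(sidecar):
    """Return a list of human-readable problems found in a sidecar dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in sidecar:
            problems.append(f"missing required key: {key}")
    for key in NUMERIC_KEYS:
        if key in sidecar:
            value = sidecar[key]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number:
                problems.append(f"{key} should be a number, got {type(value).__name__}")
    return problems


# Deliberately flawed example: SoftwareFilters is missing and the
# channel count is stored as a string
example_text = json.dumps({
    "TaskName": "FeatAttnDec",
    "SamplingFrequency": 1200,
    "PowerLineFrequency": 50,
    "EEGReference": "ear",
    "EEGChannelCount": "5",
})
problems = check_sidecar(json.loads(example_text))
for problem in problems:
    print(problem)
```

In the notebook, the same function could be applied to the `metadata` dictionary before it is written to disk.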


### Step 5: Congratulations! ###

Congrats! You now have the essential skills you'll need to organise your EEG data in the BIDS convention. 
1. You know how to rename and reorganise your files
2. You know how to create a tsv file describing the channel metadata
3. You know how to create a json file specifying the experiment metadata

To completely follow the BIDS format, you still need a dataset_description.json file describing the dataset, a participants.tsv file and a participants.json file specifying your participant metadata, as well as README and CHANGES files. If you have some spare time at the end of the session, you could go ahead and create these below!
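
As a starting point, here is a minimal sketch that writes placeholder versions of these dataset-level files, together with the dataset_description.json file that BIDS expects at the dataset root. Everything in it is an assumption for illustration: the dataset name, the BIDS version string, the subject list, and the choice of `age` and `sex` columns. It writes to a temporary folder, so in the notebook you would point `bids_root` at `ROOTPATH / '02_Rawdata'` instead.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical dataset root inside a temporary folder
bids_root = Path(tempfile.mkdtemp()) / "02_Rawdata"
bids_root.mkdir(parents=True)

# dataset_description.json: Name and BIDSVersion are the required fields
dataset_description = {
    "Name": "FeatAttnDec demo dataset",  # placeholder name
    "BIDSVersion": "1.8.0",              # placeholder; use the version you follow
}
with (bids_root / "dataset_description.json").open("w", encoding="utf-8") as f:
    json.dump(dataset_description, f, indent=2)

# participants.tsv: one row per subject, "n/a" where a value is unknown
subject_ids = ["01", "02"]  # placeholder subject list
with (bids_root / "participants.tsv").open("w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["participant_id", "age", "sex"])
    for subject_id in subject_ids:
        writer.writerow([f"sub-{subject_id}", "n/a", "n/a"])

# participants.json: describes the columns of participants.tsv
participants_sidecar = {
    "age": {"Description": "Age of the participant", "Units": "years"},
    "sex": {"Description": "Sex of the participant",
            "Levels": {"M": "male", "F": "female"}},
}
with (bids_root / "participants.json").open("w", encoding="utf-8") as f:
    json.dump(participants_sidecar, f, indent=2)

created = sorted(p.name for p in bids_root.iterdir())
print(created)
```

Replace the placeholder values with your real participant details before sharing the dataset.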