# Quick Clean a Study

This notebook iterates over every BIDS subject in a dataset, cleans each recording, and saves the results as a derivative called `quick_clean`.

This notebook assumes that you have followed and understood the concepts in:

* `explore_source.ipynb`
* `init_bids_study.ipynb`

The following cell imports the required modules and defines the variables needed to create a derivative.

If you would like to create your own custom derivative, or avoid overwriting this sample, change the `derivatives_path` variable.

In [1]:
import mne, mne_bids, os, time

bids_root = '..'
task_name = 'fhbc' # TODO: change this to read from the dataset description
derivatives_path = '../derivatives/quick_clean'

# This is to ensure that this notebook waits for the previous 
# notebook to complete (NeuroLibre)
while not os.path.exists('../bids_init_completed'):
    print("waiting for bids init to complete")
    time.sleep(5)

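The polling loop above waits indefinitely if the marker file never appears. As a minimal sketch of a bounded wait, the helper below (`wait_for_marker` is a hypothetical name, and the timeout and polling defaults are assumptions, not values from this project) gives up after a deadline:

```python
import time
from pathlib import Path


def wait_for_marker(marker, timeout_s=600.0, poll_s=5.0):
    """Poll until `marker` exists; return True if found, False on timeout."""
    marker = Path(marker)
    # Use a monotonic clock so system clock changes cannot affect the deadline
    deadline = time.monotonic() + timeout_s
    while not marker.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

A caller could then check `wait_for_marker('../bids_init_completed')` and raise an error when it returns `False`, rather than blocking forever.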

As in `init_bids_study.ipynb`, it is good practice to define a function that contains the operations used to clean each BIDS subject.

For this quick cleaning, the following operations are performed:

* Mark the channel `C10`, which is bad in all recordings, as bad and interpolate it from the remaining channels
* Recompute the average reference
* Run a basic ICA (independent component analysis) and reject two hard-coded components (indices 1 and 2 in `ica.exclude`), based on known data properties
* Recompute the average reference once again

When working with non-tutorial data, the `quick_clean_subject` function should be rewritten or expanded to investigate which cleaning actions are actually needed.
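As one possible starting point for such an investigation, the sketch below flags channels whose variance is a robust outlier compared with the other channels. It is not part of this project: `flag_noisy_channels` and its `threshold` default are hypothetical, and it works on plain Python lists so it can be read without MNE. With MNE, the rows could plausibly come from `raw.get_data().tolist()` and the labels from `raw.ch_names`.

```python
from statistics import median, pvariance


def flag_noisy_channels(data, names, threshold=5.0):
    """Return channel names whose variance is a robust outlier.

    data: list of per-channel sample lists; names: matching channel labels.
    """
    variances = [pvariance(channel) for channel in data]
    med = median(variances)
    # Median absolute deviation: a spread estimate that outliers barely move
    mad = median(abs(v - med) for v in variances)
    if mad == 0:
        return []
    return [name for name, v in zip(names, variances)
            if abs(v - med) / mad > threshold]
```

Channels flagged this way are only candidates; they would still need visual inspection before being added to `info['bads']`.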

In [2]:
# Intended to be called inside the loop over BIDS subjects to quickly clean each recording
def quick_clean_subject(bids_raw):
    bids_raw.load_data()
    bids_raw.info['bads'] = ['C10'] # Manually note that C10 is a "bad" channel
    bids_raw = bids_raw.interpolate_bads() # Replaces the bad channel's signal with values interpolated from the good channels
    bids_raw = bids_raw.set_eeg_reference('average') # In general, when you change the data, you need to recompute the reference

    # Set up and fit the ICA
    # Originally introduced in `explore_source.ipynb`
    ica = mne.preprocessing.ICA(n_components=20, random_state=97, max_iter=800)
    ica.fit(bids_raw) # Run the algorithm
    ica.exclude = [1, 2] # Zero-based component indices; details on how we picked these are omitted here
    bids_raw = ica.apply(bids_raw) # Remove components from signal based on component IDs in `ica.exclude`
    bids_raw = bids_raw.set_eeg_reference('average') # In general, when you change the data, you need to recompute the reference

    return bids_raw

To show the power of working from within a BIDS project, the following loop uses `mne_bids` functions to read directly from the EEGStudyFlow project itself.

Note that the structure of the loop never needs to change; only the cleaning steps inside the `quick_clean_subject` function do.
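The separation between a fixed loop and a swappable cleaning step can be illustrated with a toy sketch. Everything here is hypothetical (`run_pipeline`, the lambdas, and the fake "recordings" are stand-ins, not project code); the point is only that the loop body stays the same while `clean` changes.

```python
def run_pipeline(subjects, load, clean, save):
    """Apply `clean` to every subject; the loop itself never changes."""
    for subject in subjects:
        save(subject, clean(load(subject)))


# Toy stand-ins: "recordings" are lists of numbers, "cleaning" removes the mean
store = {}
run_pipeline(
    subjects=["1", "2"],
    load=lambda s: [1.0, 2.0, 3.0],
    clean=lambda x: [v - sum(x) / len(x) for v in x],
    save=store.__setitem__,
)
```

In this notebook, `quick_clean_subject` plays the role of `clean`.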

The following cell demonstrates how to create a naively cleaned derivative.

In [3]:
# Identify all subjects in the dataset via mne_bids
subjects = mne_bids.get_entity_vals(bids_root, entity_key='subject')

# Prep derivative location
os.makedirs(derivatives_path, exist_ok=True)

# Loop through each subject label returned by mne_bids
for subject in subjects:
    # Read the raw data for the subject
    bids_path = mne_bids.BIDSPath(root=bids_root, subject=subject, task=task_name, suffix='eeg', extension='.edf')
    raw = mne_bids.read_raw_bids(bids_path=bids_path)
    
    # Call custom cleaning function
    raw = quick_clean_subject(raw)

    print(subject)
    # Prep and save the final cleaned derivative
    derivative_subject_path = f'{derivatives_path}/sub-{subject}/eeg'
    os.makedirs(derivative_subject_path, exist_ok=True)
    print("writes dir")
    raw.save(f'{derivative_subject_path}/sub-{subject}_eeg.fif', overwrite=True)

Extracting EDF parameters from /media/tyler/PathstoneProject/EEGStudyFlow/sub-1/eeg/sub-1_task-fhbc_eeg.edf...
EDF file detected
Setting channel info structure...
Creating raw.info structure...
Reading events from ../sub-1/eeg/sub-1_task-fhbc_events.tsv.
Reading channel info from ../sub-1/eeg/sub-1_task-fhbc_channels.tsv.
Reading electrode coords from ../sub-1/eeg/sub-1_space-CapTrak_electrodes.tsv.
Not fully anonymizing info - keeping his_id, sex, and hand info
Reading 0 ... 280575  =      0.000 ...   273.999 secs...
Setting channel interpolation method to {'eeg': 'spline'}.
Interpolating bad channels.
    Automatic origin fit: head of radius 95.0 mm
Computing interpolation matrix from 127 sensor positions
Interpolating 1 sensors
EEG channel type selected for re-referencing
Applying average reference.
Applying a custom ('EEG',) reference.
Fitting ICA to data using 128 channels (please be patient, this may take a while)
Selecting by number: 20 components
Fitting ICA took 19.9s.
Applyin

In [None]:
from pathlib import Path

Path("../quick_clean_completed").touch()

Once this has completed, you can explore the derivatives folder to view your cleaned data.
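As a small sketch of that exploration, the snippet below lists the cleaned files, assuming the `sub-<label>/eeg/` layout written by the loop above. `list_cleaned_files` is a hypothetical helper, not part of `mne` or `mne_bids`.

```python
from pathlib import Path


def list_cleaned_files(derivatives_path):
    """Return sorted .fif files found under each sub-*/eeg/ folder."""
    root = Path(derivatives_path)
    if not root.is_dir():
        return []
    return sorted(root.glob("sub-*/eeg/*.fif"))


# Prints nothing if the derivative has not been created yet
for fif_path in list_cleaned_files("../derivatives/quick_clean"):
    print(fif_path)
```

Any of the listed files can then be reopened with `mne.io.read_raw_fif` for inspection.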