In [1]:
from IPython.display import HTML


def css_styling():
    # Load the notebook's custom stylesheet and return it for rendering.
    with open("../static/css/custom.css", "r") as css_file:
        styles = css_file.read()
    return HTML(styles)


css_styling()

### Tutorial Dataset

For this tutorial, we will be using a subset of a publicly available dataset, ds000030, from [openneuro.org](https://openneuro.org/datasets/ds000030). The dataset is structured according to the Brain Imaging Data Structure (BIDS). BIDS is a simple and intuitive way to organize and describe your neuroimaging and behavioural data. Neuroimaging experiments produce complicated data that can be arranged in many different ways. BIDS tackles this problem by proposing a standard, based on consensus among researchers across the world, for arranging neuroimaging datasets.

Using the same structure for all of your studies will allow you to easily reuse all of your scripts between studies. Additionally, sharing code with other researchers will be much easier.
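To make the naming convention concrete, here is a minimal sketch of how a BIDS filename breaks down into `key-value` entities, a suffix, and an extension. The helper `parse_bids_filename` is a hypothetical name written for this illustration, not part of any BIDS library, and it only handles the simple case of a well-formed filename.

```python
def parse_bids_filename(filename):
    """Split a BIDS-style filename into entities, suffix and extension."""
    # Everything after the first dot is treated as the extension.
    stem, _, extension = filename.partition(".")
    parts = stem.split("_")
    # The last underscore-separated chunk is the suffix (e.g. "bold", "T1w").
    suffix = parts[-1]
    # Every earlier chunk is a "key-value" pair such as "sub-10788".
    entities = dict(part.split("-", 1) for part in parts[:-1])
    return entities, suffix, extension


entities, suffix, extension = parse_bids_filename("sub-10788_task-rest_bold.nii.gz")
print(entities, suffix, extension)
```

Because every file follows the same pattern, a script that relies on these pieces can be reused across studies without modification.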

Let's take a look at the `participants.tsv` file to see what the demographics for this dataset look like.

In [None]:
import pandas as pd

In [None]:
participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata

<div class=exercise>
    <b>EXERCISE:</b> Which diagnosis-related groups make up the dataset?
</div>

<div class=solution>
    <b>SOLUTION:</b>
</div>

In [None]:
participant_metadata.diagnosis.unique()

For this tutorial, we're just going to work with participants that are either CONTROL or SCHZ (`diagnosis`) and have both a T1w (`T1w == 1`) and rest (`rest == 1`) scan. Also, you'll notice some of the T1w scans included in this dataset have a ghosting artifact. These should be filtered out as well (`ghost_NoGhost == 'No_ghost'`).

<div class=exercise>
    <b>EXERCISE:</b> Filter <code>participant_metadata</code> so that the dataframe only contains participants who meet the criteria mentioned above.
</div>

<div class=solution>
    <b>SOLUTION:</b>
</div>

In [None]:
participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) & 
                                            (participant_metadata.T1w == 1) & 
                                            (participant_metadata.rest == 1) & 
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata
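The same kind of filter can also be written with `DataFrame.query`, which some people find easier to read than chained boolean masks. The sketch below runs on a small toy table rather than the real `participants.tsv`; the rows, including the `'Ghost'` label, are made up for illustration.

```python
import pandas as pd

# Toy stand-in for participants.tsv; every row here is invented.
toy = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03", "sub-04"],
    "diagnosis": ["CONTROL", "SCHZ", "ADHD", "CONTROL"],
    "T1w": [1, 1, 1, 1],
    "rest": [1, 1, 1, 0],
    "ghost_NoGhost": ["No_ghost", "Ghost", "No_ghost", "No_ghost"],
})

# All four criteria expressed as a single query string.
keep = toy.query(
    "diagnosis in ['CONTROL', 'SCHZ'] and T1w == 1 and rest == 1 "
    "and ghost_NoGhost == 'No_ghost'"
)
print(keep.participant_id.tolist())
```

Only `sub-01` passes: `sub-02` has a ghosting label, `sub-03` is in a different diagnosis group, and `sub-04` has no rest scan.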

To simplify the analysis and reduce download time, we're just going to use scans from 10 randomly sampled CONTROL and 10 randomly sampled SCHZ participants.

In [None]:
# Group participants by diagnosis, then draw 10 at random from each group.
diagnosis_groups = participant_metadata.groupby('diagnosis')
filtered_participant_metadata = diagnosis_groups.apply(lambda x: x.sample(n=10))
filtered_participant_metadata
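Random sampling gives a different set of participants on every run. If a repeatable selection is wanted, one option is to pass a seed. The sketch below uses `DataFrameGroupBy.sample`, a different call from the `apply` approach used in this notebook that is available in pandas 1.1 and later, on a made-up toy table. The function name `sample_per_group` and the seed value are assumptions for this example.

```python
import pandas as pd

# Toy table with four invented participants per diagnosis group.
toy = pd.DataFrame({
    "participant_id": [f"sub-{i:02d}" for i in range(1, 9)],
    "diagnosis": ["CONTROL"] * 4 + ["SCHZ"] * 4,
})


def sample_per_group(df, n, seed):
    """Draw n rows from each diagnosis group, reproducibly for a given seed."""
    return df.groupby("diagnosis").sample(n=n, random_state=seed)


first = sample_per_group(toy, n=2, seed=0)
second = sample_per_group(toy, n=2, seed=0)
# The same seed yields the same participants on both calls.
print(first.participant_id.tolist() == second.participant_id.tolist())
```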

In [None]:
participant_list = filtered_participant_metadata.participant_id.tolist()
participant_list
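A list like this can be saved as a plain text file with one participant ID per line, which is the layout that line-oriented tools such as `xargs -I` read. The following is a minimal sketch that writes to a temporary directory so nothing in the repository is overwritten; `write_download_list` and the example IDs are hypothetical stand-ins for `participant_list`.

```python
import tempfile
from pathlib import Path

# Stand-ins for `participant_list`; the second ID is made up.
example_ids = ["sub-10788", "sub-99999"]


def write_download_list(ids, path):
    """Write one participant ID per line."""
    Path(path).write_text("\n".join(ids) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / "download_list"
    write_download_list(example_ids, list_path)
    # Read the file back to confirm its layout.
    written = list_path.read_text().splitlines()

print(written)
```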

### Downloading Data

We've already randomly sampled 10 CONTROL and 10 SCHZ participants and placed the participant list in the `../download_list` text file. Let's download that data now.

In [None]:
# download T1w scans
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/anat \
  ../data/ds000030/{}/anat

# download resting state fMRI scans
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/func \
  ../data/ds000030/{}/func \
  --exclude '*' \
  --include '*task-rest_bold*'
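To see what the shell commands above expand to, the sketch below rebuilds the source and destination that `xargs` substitutes for each `{}` placeholder. It does not contact S3. The helper `sync_pair` is a hypothetical name, and `fnmatchcase` is used only as an approximation of how the `--include` glob pattern selects files; the `task-other` filename is made up.

```python
from fnmatch import fnmatchcase

S3_ROOT = "s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed"
LOCAL_ROOT = "../data/ds000030"


def sync_pair(participant_id, datatype):
    """Return the (source, destination) pair for one participant and datatype."""
    return (
        f"{S3_ROOT}/{participant_id}/{datatype}",
        f"{LOCAL_ROOT}/{participant_id}/{datatype}",
    )


source, destination = sync_pair("sub-10788", "anat")
print(source)
print(destination)

# Approximate the '--include' pattern with a shell-style glob match.
pattern = "*task-rest_bold*"
is_rest = fnmatchcase("sub-10788_task-rest_bold.nii.gz", pattern)
is_other = fnmatchcase("sub-10788_task-other_bold.nii.gz", pattern)
print(is_rest, is_other)
```

The `--exclude '*'` flag first drops every file, and the `--include` pattern then adds back only the resting-state scans.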

### Querying a BIDS Dataset

There are multiple ways to traverse a BIDS dataset. The simplest is to print the directory tree.

In [None]:
!tree ../data/ds000030

[pybids](https://bids-standard.github.io/pybids/) is a Python API for querying, summarizing and manipulating the BIDS folder structure.

In [None]:
import bids.layout

In [None]:
layout = bids.layout.BIDSLayout('../data/ds000030')

The pybids layout object lets you query your BIDS dataset according to a number of parameters by using a `get_*()` method.  
We can get a list of the subjects we've downloaded from the dataset.

In [None]:
layout.get_subjects()

We can list the modalities in the dataset.

In [None]:
layout.get_modalities()

We can get the fMRI tasks.

In [None]:
layout.get_tasks()

Or even all of the data types.

In [None]:
layout.get_types()

We can be more specific. List the data types for the 'func' modality.

In [None]:
layout.get_types(modality='func')

What if we wanted to get all of our fMRI NIfTI files?

In [None]:
layout.get(task='rest', type='bold', extensions='nii.gz', return_type='file')
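For comparison, a similar query can be approximated with the standard library alone, because BIDS encodes the relevant information in file and folder names. The sketch below builds a tiny mock dataset of empty files in a temporary directory and searches it with `pathlib`. It illustrates the idea only and does not show how pybids works internally.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create empty placeholder files laid out like a BIDS dataset.
    for rel in [
        "sub-10788/anat/sub-10788_T1w.nii.gz",
        "sub-10788/func/sub-10788_task-rest_bold.nii.gz",
        "sub-10788/func/sub-10788_task-rest_bold.json",
    ]:
        mock_file = root / rel
        mock_file.parent.mkdir(parents=True, exist_ok=True)
        mock_file.touch()

    # Resting-state BOLD images, found by filename pattern.
    rest_bold = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*_task-rest_bold.nii.gz")
    )
    # Subject labels, taken from the top-level "sub-*" folders.
    subjects = sorted(
        p.name[len("sub-"):] for p in root.glob("sub-*") if p.is_dir()
    )

print(rest_bold)
print(subjects)
```

Pattern matching like this gets brittle as queries grow more complex, which is where a dedicated layout object becomes useful.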

We can have it all.

Use the `get_metadata()` method to pull metadata from the JSON sidecar.

In [None]:
layout.get_metadata('../data/ds000030/sub-10788/func/sub-10788_task-rest_bold.nii.gz')
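In the simplest case, the JSON sidecar is a file with the same name as the image but a `.json` extension. The sketch below locates and reads such a sidecar using only the standard library. `sidecar_path` is a hypothetical helper, the metadata values are made up, and the sketch ignores the BIDS inheritance principle, under which metadata may also come from JSON files higher up in the directory tree.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(nifti_path):
    """Return the same-named .json path for a .nii or .nii.gz file."""
    nifti_path = Path(nifti_path)
    name = nifti_path.name
    for ext in (".nii.gz", ".nii"):
        if name.endswith(ext):
            return nifti_path.with_name(name[: -len(ext)] + ".json")
    raise ValueError(f"Not a NIfTI filename: {name}")


with tempfile.TemporaryDirectory() as tmp:
    nifti = Path(tmp) / "sub-10788_task-rest_bold.nii.gz"
    nifti.touch()
    # Write a mock sidecar; these values are invented for the example.
    sidecar_path(nifti).write_text(
        json.dumps({"RepetitionTime": 2.0, "TaskName": "rest"})
    )
    metadata = json.loads(sidecar_path(nifti).read_text())

print(metadata)
```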

Convert `layout` to a data frame.

In [None]:
df = layout.as_data_frame()
df