### Tutorial Dataset

For this tutorial, we will be using a subset of a publicly available dataset, ds000030, from [openneuro.org](https://openneuro.org/datasets/ds000030). The dataset is structured according to the Brain Imaging Data Structure (BIDS). BIDS is a simple and intuitive way to organize and describe your neuroimaging and behavioural data. Neuroimaging experiments result in complicated data that can be arranged in several different ways. BIDS tackles this problem by proposing a standard (based on consensus from multiple researchers across the world) for the arrangement of neuroimaging datasets.

Using the same structure for all of your studies will allow you to easily reuse all of your scripts between studies. Additionally, sharing code with other researchers will be much easier.

Let's take a look at the `participants.tsv` file to see what the demographics for this dataset look like.

In [None]:
import pandas as pd

In [None]:
participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata

<b>EXERCISE:</b> Which diagnosis-related groups make up the dataset?

In [None]:
participant_metadata.diagnosis.unique()

For this tutorial, we're just going to work with participants who are either CONTROL or SCHZ (`diagnosis`) and have both a T1w (`T1w == 1`) and a resting-state (`rest == 1`) scan. Also, you'll notice that some of the T1w scans included in this dataset have a ghosting artifact. We'll need to filter these out as well (`ghost_NoGhost == 'No_ghost'`).

<b>EXERCISE:</b> Filter <code>participant_metadata</code> so that only participants meeting the above conditions remain.

In [None]:
participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) & 
                                            (participant_metadata.T1w == 1) & 
                                            (participant_metadata.rest == 1) & 
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata
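
The same filter can also be written with `DataFrame.query`, which some people find easier to read than chained boolean masks. The sketch below runs on a small made-up table that only borrows its column names from `participants.tsv`; the rows and the `'ghost'` label are invented for illustration and are not real participants.

```python
import pandas as pd

# Toy table: column names mirror participants.tsv, the rows are invented.
toy_metadata = pd.DataFrame({
    'participant_id': ['sub-01', 'sub-02', 'sub-03', 'sub-04', 'sub-05'],
    'diagnosis': ['CONTROL', 'SCHZ', 'ADHD', 'CONTROL', 'SCHZ'],
    'T1w': [1, 1, 1, 1, 0],
    'rest': [1, 1, 1, 0, 1],
    'ghost_NoGhost': ['No_ghost', 'No_ghost', 'No_ghost', 'No_ghost', 'ghost'],
})

# All four conditions expressed in a single query string.
toy_filtered = toy_metadata.query(
    "diagnosis in ['CONTROL', 'SCHZ'] and T1w == 1 and rest == 1 "
    "and ghost_NoGhost == 'No_ghost'"
)

# Sanity check: how many participants are left in each group?
print(toy_filtered.diagnosis.value_counts())
```

Counting the remaining participants per group is a quick way to confirm that enough of them survive the filter before sampling.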

To simplify the analysis and reduce the download time, we're just going to use scans from 10 randomly sampled CONTROL and 10 randomly sampled SCHZ participants.

In [None]:
diagnosis_groups = participant_metadata.groupby('diagnosis')
filtered_participant_metadata = diagnosis_groups.apply(lambda x: x.sample(n=10))
filtered_participant_metadata
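
Because `sample` draws at random, each run of the cell above selects different participants. Passing `random_state` makes the draw repeatable. The sketch below shows this on a made-up table; the seed value `0` is arbitrary, and `DataFrameGroupBy.sample` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Toy table with invented participant labels, 4 per group.
toy_metadata = pd.DataFrame({
    'participant_id': [f'sub-{i:02d}' for i in range(1, 9)],
    'diagnosis': ['CONTROL'] * 4 + ['SCHZ'] * 4,
})

# Draw 2 participants per group; random_state makes the draw repeatable.
toy_sample = toy_metadata.groupby('diagnosis').sample(n=2, random_state=0)
toy_sample_again = toy_metadata.groupby('diagnosis').sample(n=2, random_state=0)

print(toy_sample)
```

Both calls return the same rows, so anyone re-running the notebook with the same seed would end up with the same participant list.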

In [None]:
participant_list = filtered_participant_metadata.participant_id.tolist()
participant_list
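
A list like this can be saved as a plain-text file with one participant ID per line, which is the form the `xargs` commands in the next section read from `../download_list`. This is a minimal sketch: the IDs are placeholders and the file is written to a temporary directory so nothing in the project is overwritten.

```python
import tempfile
from pathlib import Path

# Placeholder IDs; in the notebook this would be participant_list.
example_ids = ['sub-10159', 'sub-10171', 'sub-10189']

# Write to a temporary directory so nothing in the project is overwritten.
out_file = Path(tempfile.mkdtemp()) / 'download_list'
out_file.write_text('\n'.join(example_ids) + '\n')

print(out_file.read_text())
```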

### Downloading Data

We've already randomly sampled 10 CONTROL and 10 SCHZ participants and placed the participant list in the `../download_list` text file. Let's download that data now.

In [2]:
!echo $(pwd)

/home/jerry/projects/workshops/scwg2018_python_neuroimaging/bin


In [12]:
!aws s3 ls --no-sign-request s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/

                           PRE sub-10159/
                           PRE sub-10171/
                           PRE sub-10189/
                           PRE sub-10193/
                           PRE sub-10206/
                           PRE sub-10217/
                           PRE sub-10225/
                           PRE sub-10227/
                           PRE sub-10228/
                           PRE sub-10235/
                           PRE sub-10249/
                           PRE sub-10269/
                           PRE sub-10271/
                           PRE sub-10273/
                           PRE sub-10274/
                           PRE sub-10280/
                           PRE sub-10290/
                           PRE sub-10292/
                           PRE sub-10304/
                           PRE sub-10316/
                           PRE sub-10321/
                           PRE sub-10325/
                           PRE sub-10329/
           

2017-11-18 04:59:43   25873470 sub-10707.html
2017-11-18 04:59:44   21858390 sub-10708.html
2017-11-18 04:59:46   21792550 sub-10719.html
2017-11-18 04:59:47   22127385 sub-10724.html
2017-11-18 04:59:49   23482594 sub-10746.html
2017-11-18 04:59:51   21321039 sub-10762.html
2017-11-18 04:59:52   20522547 sub-10779.html
2017-11-18 04:59:54   21092596 sub-10785.html
2017-11-18 04:59:56   20722256 sub-10788.html
2017-11-18 04:59:57   23838092 sub-10844.html
2017-11-18 04:59:59   21921289 sub-10855.html
2017-11-18 05:00:01   22978322 sub-10871.html
2017-11-18 05:00:03   22821219 sub-10877.html
2017-11-18 05:00:05   21855931 sub-10882.html
2017-11-18 05:00:07   22277511 sub-10891.html
2017-11-18 05:00:08   21128372 sub-10893.html
2017-11-18 05:00:10   20581584 sub-10912.html
2017-11-18 05:00:12   20633148 sub-10934.html
2017-11-18 05:00:14   23469074 sub-10940.html
2017-11-18 05:00:16   10982360 sub-10948.html
2017-11-18 05:00:17   21967029 sub-10949.html
2017-11-18 05

2017-11-18 05:04:38   22115476 sub-70083.html
2017-11-18 05:04:39   22018888 sub-70086.html


In [16]:
# download T1w scans
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/anat \
  ../data/ds000030/{}/anat

# download fMRIPrep derivatives (every task and output space for each participant, so this needs a lot of disk space)
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/{} \
  ../data/ds000030/fmriprep/{}

# # download resting state fMRI scans
# !cat ../download_list | \
#   xargs -I '{}' aws s3 sync --no-sign-request \
#   s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/func \
#   ../data/ds000030/{}/func \
#   --exclude '*' \
#   --include '*task-rest_bold*'

download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-10788/func/sub-10788_task-rest_bold_space-T1w_preproc.nii.gz to ../data/ds000030/fmriprep/sub-10788/func/sub-10788_task-rest_bold_space-T1w_preproc.nii.gz [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50052/func/sub-50052_task-pamret_bold_space-T1w_preproc.nii.gz to ../data/ds000030/fmriprep/sub-50052/func/sub-50052_task-pamret_bold_space-T1w_preproc.nii.gz [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50052/func/sub-50052_task-rest_bold_space-T1w_preproc.nii.gz to ../data/ds000030/fmriprep/sub-50052/func/sub-50052_task-rest_bold_space-T1w_preproc.nii.gz [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50052/func/sub-50052_task-pamret_bold_space-MN

download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50075/func/sub-50075_task-scap_bold_space-T1w_preproc.nii.gz to ../data/ds000030/fmriprep/sub-50075/func/sub-50075_task-scap_bold_space-T1w_preproc.nii.gz [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50075/func/sub-50075_task-stopsignal_bold_space-T1w_preproc.nii.gz to ../data/ds000030/fmriprep/sub-50075/func/sub-50075_task-stopsignal_bold_space-T1w_preproc.nii.gz [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50075/func/sub-50075_task-stopsignal_bold_space-MNI152NLin2009cAsym_preproc.nii.gz to ../data/ds000030/fmriprep/sub-50075/func/sub-50075_task-stopsignal_bold_space-MNI152NLin2009cAsym_preproc.nii.gz [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmripre

download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50010/func/sub-50010_task-stopsignal_bold_confounds.tsv to ../data/ds000030/fmriprep/sub-50010/func/sub-50010_task-stopsignal_bold_confounds.tsv [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50010/func/sub-50010_task-scap_bold_space-T1w_preproc.nii.gz to ../data/ds000030/fmriprep/sub-50010/func/sub-50010_task-scap_bold_space-T1w_preproc.nii.gz [Errno 28] No space left on device
download failed: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50010/func/sub-50010_task-stopsignal_bold_space-MNI152NLin2009cAsym_preproc.nii.gz to ../data/ds000030/fmriprep/sub-50010/func/sub-50010_task-stopsignal_bold_space-MNI152NLin2009cAsym_preproc.nii.gz [Errno 28] No space left on device
download: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/derivatives/fmriprep/sub-50010/func/sub-50010_ta
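
The `[Errno 28] No space left on device` errors above mean the disk filled up part-way through the sync. One way to catch this earlier is to check the free space before starting a download. The sketch below uses only the standard library; the 50 GB threshold is an arbitrary placeholder, not a measured size for this dataset.

```python
import shutil

def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

# Placeholder threshold; adjust to the amount of data you plan to download.
required_gb = 50
if has_free_space('.', required_gb):
    print('Enough free space to start the download.')
else:
    print(f'Less than {required_gb} GB free; free up space or download fewer files.')
```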

### Querying a BIDS Dataset

There are multiple ways to traverse a BIDS dataset. 

In [None]:
!tree ../data/ds000030
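
Besides `tree`, a BIDS dataset can be walked with plain Python, since the layout is just nested folders with predictable file names. The sketch below builds a tiny mock dataset in a temporary directory (empty files, made-up subject labels) and uses a `pathlib` glob pattern to find the resting-state scans.

```python
import tempfile
from pathlib import Path

# Build a tiny mock BIDS tree with empty files (labels are made up).
root = Path(tempfile.mkdtemp())
for sub in ['sub-01', 'sub-02']:
    (root / sub / 'anat').mkdir(parents=True)
    (root / sub / 'func').mkdir(parents=True)
    (root / sub / 'anat' / f'{sub}_T1w.nii.gz').touch()
    (root / sub / 'func' / f'{sub}_task-rest_bold.nii.gz').touch()

# Find every resting-state BOLD image: sub-*/func/*_task-rest_bold.nii.gz
rest_files = sorted(root.glob('sub-*/func/*_task-rest_bold.nii.gz'))
for path in rest_files:
    print(path.relative_to(root))
```

Glob patterns work for simple lookups, but they get unwieldy once a query has to combine several BIDS entities.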

[pybids](https://bids-standard.github.io/pybids/) is a Python API for querying, summarizing and manipulating the BIDS folder structure.

In [None]:
import bids.layout

In [None]:
layout = bids.layout.BIDSLayout('../data/ds000030')

The pybids layout object lets you query your BIDS dataset according to a number of parameters by using a `get_*()` method.  
We can get a list of the subjects we've downloaded from the dataset.

In [None]:
layout.get_subjects()

We can list the modalities in the dataset.

In [None]:
layout.get_modalities()

We can get the fMRI tasks.

In [None]:
layout.get_tasks()

Or even all of the data types.

In [None]:
layout.get_types()

We can be more specific. List the data types for the 'func' modality.

In [None]:
layout.get_types(modality='func')

What if we wanted to get all of our resting-state fMRI NIfTI files?

In [None]:
layout.get(task='rest', type='bold', extensions='nii.gz', return_type='file')

We can also retrieve the metadata associated with a file. Use the `get_metadata()` method to pull metadata from the JSON sidecar.

In [None]:
layout.get_metadata('../data/ds000030/sub-10788/func/sub-10788_task-rest_bold.nii.gz')
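
A JSON sidecar is a plain JSON file that shares the image's file name but ends in `.json`. The sketch below reads one directly with the standard library, using a mock file with made-up `RepetitionTime` and `TaskName` values; `read_sidecar` is a hypothetical helper, not part of pybids. It only looks next to the image, whereas BIDS also allows metadata to be inherited from JSON files higher up in the dataset, which this sketch does not handle.

```python
import json
import tempfile
from pathlib import Path

# Mock image and sidecar in a temporary directory (values are made up).
func_dir = Path(tempfile.mkdtemp())
image = func_dir / 'sub-01_task-rest_bold.nii.gz'
image.touch()
sidecar = func_dir / 'sub-01_task-rest_bold.json'
sidecar.write_text(json.dumps({'RepetitionTime': 2.0, 'TaskName': 'rest'}))

def read_sidecar(image_path):
    """Load the JSON file that sits next to a .nii.gz image (hypothetical helper)."""
    json_path = image_path.with_name(image_path.name.replace('.nii.gz', '.json'))
    return json.loads(json_path.read_text())

metadata = read_sidecar(image)
print(metadata['RepetitionTime'])
```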

Convert `layout` to a data frame.

In [None]:
df = layout.as_data_frame()
df