### Tutorial Dataset

For this tutorial, we will be using a subset of a publicly available dataset, ds000030, from [openneuro.org](https://openneuro.org/datasets/ds000030). The dataset is structured according to the Brain Imaging Data Structure (BIDS), a simple and intuitive way to organize and describe your neuroimaging and behavioural data. Neuroimaging experiments produce complicated data that can be arranged in many different ways. BIDS tackles this problem by proposing a standard, based on consensus among researchers across the world, for the arrangement of neuroimaging datasets.

Using the same structure for all of your studies will allow you to easily reuse all of your scripts between studies. Additionally, sharing code with other researchers will be much easier.
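To make the naming convention concrete, here is a minimal sketch of how a raw BIDS filename breaks down into underscore-separated `key-value` entities followed by a suffix and an extension. The helper `parse_bids_filename` is a hypothetical name written for this illustration, not part of any library, and it only handles simple raw-data names such as the ones in this dataset.

```python
def parse_bids_filename(filename):
    """Split a simple BIDS filename into entities, suffix and extension.

    Hypothetical helper for illustration only; it assumes every chunk
    before the last underscore has the form key-value.
    """
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition('.')
    # The last underscore-separated chunk is the suffix (e.g. "bold", "T1w").
    *entity_parts, suffix = stem.split('_')
    parsed = {}
    for part in entity_parts:
        key, _, value = part.partition('-')
        parsed[key] = value
    parsed['suffix'] = suffix
    parsed['extension'] = '.' + extension if extension else ''
    return parsed


print(parse_bids_filename('sub-11106_task-rest_bold.nii.gz'))
```

Because every study follows the same pattern, a script that selects files by `sub`, `task` or suffix can usually be reused on another BIDS dataset without changes.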

Let's take a look at the `participants.tsv` file to see what the demographics for this dataset look like.

In [5]:
import pandas as pd

In [6]:
participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata

,participant_id,diagnosis,age,gender,bart,bht,dwi,pamenc,pamret,rest,scap,stopsignal,T1w,taskswitch,ScannerSerialNumber,ghost_NoGhost
0,sub-10159,CONTROL,30,F,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
1,sub-10171,CONTROL,24,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
2,sub-10189,CONTROL,49,M,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
3,sub-10193,CONTROL,40,M,1.0,,1.0,,,,,,1.0,,35343.0,No_ghost
4,sub-10206,CONTROL,21,M,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
5,sub-10217,CONTROL,33,F,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
6,sub-10225,CONTROL,35,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
7,sub-10227,CONTROL,31,F,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
8,sub-10228,CONTROL,40,F,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
9,sub-10235,CONTROL,22,M,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost


**EXERCISE**: Which diagnosis-related groups make up the dataset?

In [7]:
participant_metadata.diagnosis.unique()

array(['CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'], dtype=object)
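Beyond listing the groups, it is often useful to see how many participants fall into each one. The sketch below uses `value_counts()` on a small toy table; the participant IDs and counts are made up for illustration and are not taken from ds000030.

```python
import pandas as pd

# Toy stand-in for participant_metadata; these rows are illustrative, not real data.
toy_metadata = pd.DataFrame({
    'participant_id': ['sub-A', 'sub-B', 'sub-C', 'sub-D', 'sub-E'],
    'diagnosis': ['CONTROL', 'CONTROL', 'SCHZ', 'BIPOLAR', 'ADHD'],
})

# Count the number of participants per diagnosis group.
group_sizes = toy_metadata['diagnosis'].value_counts()
print(group_sizes)
```

The same call on the real table would be `participant_metadata.diagnosis.value_counts()`.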

For this tutorial, we're just going to work with participants that are either CONTROL or SCHZ (`diagnosis`) and have both a T1w (`T1w == 1`) and rest (`rest == 1`) scan. Also, you'll notice some of the T1w scans included in this dataset have a ghosting artifact. We'll filter these out as well (`ghost_NoGhost == 'No_ghost'`).

In [8]:
participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) & 
                                            (participant_metadata.T1w == 1) & 
                                            (participant_metadata.rest == 1) & 
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata

,participant_id,diagnosis,age,gender,bart,bht,dwi,pamenc,pamret,rest,scap,stopsignal,T1w,taskswitch,ScannerSerialNumber,ghost_NoGhost
0,sub-10159,CONTROL,30,F,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
1,sub-10171,CONTROL,24,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
2,sub-10189,CONTROL,49,M,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
4,sub-10206,CONTROL,21,M,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
5,sub-10217,CONTROL,33,F,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
6,sub-10225,CONTROL,35,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
7,sub-10227,CONTROL,31,F,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
8,sub-10228,CONTROL,40,F,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
9,sub-10235,CONTROL,22,M,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
10,sub-10249,CONTROL,28,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
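Notice that participant `sub-10193` (row 3) disappeared: its `rest` value is missing, and a comparison against a missing value evaluates to `False`. The sketch below shows the same filter written with `DataFrame.query` on a toy table, so you can see this behaviour in isolation. The toy rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy table with the same column names as participant_metadata (values are made up).
toy = pd.DataFrame({
    'participant_id': ['sub-A', 'sub-B', 'sub-C', 'sub-D'],
    'diagnosis': ['CONTROL', 'SCHZ', 'ADHD', 'CONTROL'],
    'T1w': [1.0, 1.0, 1.0, 1.0],
    'rest': [1.0, 1.0, 1.0, np.nan],  # sub-D has no resting-state scan
    'ghost_NoGhost': ['No_ghost'] * 4,
})

# Same four conditions as the boolean-mask version, written as one expression.
keep = toy.query(
    "diagnosis in ['CONTROL', 'SCHZ'] and T1w == 1 and rest == 1 "
    "and ghost_NoGhost == 'No_ghost'"
)
print(keep['participant_id'].tolist())
```

Here `sub-C` is dropped because of its diagnosis and `sub-D` because `NaN == 1` is `False`.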


To simplify the analysis and reduce the time required to download the data, we're just going to use scans from 10 randomly sampled CONTROL and 10 randomly sampled SCHZ participants.

In [9]:
diagnosis_groups = participant_metadata.groupby('diagnosis')
filtered_participant_metadata = diagnosis_groups.apply(lambda x: x.sample(n=10))
filtered_participant_metadata

diagnosis,,participant_id,diagnosis,age,gender,bart,bht,dwi,pamenc,pamret,rest,scap,stopsignal,T1w,taskswitch,ScannerSerialNumber,ghost_NoGhost
CONTROL,110,sub-11068,CONTROL,39,M,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,35426.0,No_ghost
CONTROL,96,sub-10975,CONTROL,32,M,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
CONTROL,74,sub-10724,CONTROL,22,F,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
CONTROL,38,sub-10448,CONTROL,37,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
CONTROL,15,sub-10280,CONTROL,27,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
CONTROL,21,sub-10321,CONTROL,34,M,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,,35343.0,No_ghost
CONTROL,14,sub-10274,CONTROL,43,M,1.0,,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
CONTROL,35,sub-10429,CONTROL,36,F,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
CONTROL,13,sub-10273,CONTROL,30,F,1.0,1.0,1.0,,,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
CONTROL,89,sub-10940,CONTROL,25,M,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,35343.0,No_ghost
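Because the sampling is random, rerunning the cell will select different participants each time. If you want a repeatable selection, one option is to fix the random seed. The sketch below assumes pandas 1.1 or later, where `groupby(...).sample` is available; the toy table and the seed value `0` are arbitrary choices for illustration.

```python
import pandas as pd

# Toy table: 6 CONTROL and 6 SCHZ participants with made-up IDs.
toy = pd.DataFrame({
    'participant_id': [f'sub-{i:02d}' for i in range(12)],
    'diagnosis': ['CONTROL'] * 6 + ['SCHZ'] * 6,
})

# Draw 3 participants per group; random_state makes the draw reproducible.
sampled = toy.groupby('diagnosis').sample(n=3, random_state=0)
sampled_again = toy.groupby('diagnosis').sample(n=3, random_state=0)

print(sampled)
print(sampled.equals(sampled_again))
```

Unlike the `apply` version, this keeps the original flat index rather than adding `diagnosis` as an extra index level.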


In [10]:
participant_list = filtered_participant_metadata.participant_id.tolist()
participant_list

['sub-11068',
 'sub-10975',
 'sub-10724',
 'sub-10448',
 'sub-10280',
 'sub-10321',
 'sub-10274',
 'sub-10429',
 'sub-10273',
 'sub-10940',
 'sub-50061',
 'sub-50083',
 'sub-50077',
 'sub-50023',
 'sub-50052',
 'sub-50067',
 'sub-50076',
 'sub-50047',
 'sub-50048',
 'sub-50073']
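A list like this can be saved to a plain text file, one ID per line, so that command-line tools can read it later. The sketch below writes and re-reads such a file in a temporary directory. The file name `download_list` and the helpers `write_id_list` and `read_id_list` are assumptions made for this example, and only three IDs from the list above are used.

```python
import tempfile
from pathlib import Path

# A few IDs taken from the participant list above.
participant_ids = ['sub-11068', 'sub-10975', 'sub-50061']


def write_id_list(ids, path):
    """Write one participant ID per line (hypothetical helper)."""
    Path(path).write_text('\n'.join(ids) + '\n')


def read_id_list(path):
    """Read the IDs back, skipping blank lines (hypothetical helper)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    list_path = Path(tmp) / 'download_list'
    write_id_list(participant_ids, list_path)
    round_trip = read_id_list(list_path)

print(round_trip)
```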

### Downloading Data

We've already randomly sampled 10 CONTROL and 10 SCHZ participants and placed the participant list in the `../download_list` text file. Let's download that data now.

In [11]:
# download T1w scans
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/anat \
  ../data/ds000030/{}/anat

# download resting state fMRI scans
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/func \
  ../data/ds000030/{}/func \
  --exclude '*' \
  --include '*task-rest_bold*'
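If you prefer to drive the download from Python rather than `xargs`, the same commands can be assembled per subject. The sketch below only builds and prints the commands, using the bucket path and flags from the cell above; the function name `build_sync_command` is a hypothetical helper, and nothing is downloaded.

```python
import shlex

S3_ROOT = 's3://openneuro/ds000030/ds000030_R1.0.5/uncompressed'
LOCAL_ROOT = '../data/ds000030'


def build_sync_command(subject, datatype, include=None):
    """Build the `aws s3 sync` argument list for one subject folder."""
    cmd = [
        'aws', 's3', 'sync', '--no-sign-request',
        f'{S3_ROOT}/{subject}/{datatype}',
        f'{LOCAL_ROOT}/{subject}/{datatype}',
    ]
    if include is not None:
        # Exclude everything first, then re-include only the matching files.
        cmd += ['--exclude', '*', '--include', include]
    return cmd


for subject in ['sub-10171', 'sub-50052']:
    print(shlex.join(build_sync_command(subject, 'anat')))
    print(shlex.join(build_sync_command(subject, 'func', include='*task-rest_bold*')))
```

To actually run a command, each list could be passed to `subprocess.run(cmd, check=True)`, assuming the AWS CLI is installed.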

### Querying a BIDS Dataset

There are multiple ways to traverse a BIDS dataset. The simplest is to print the directory tree.

In [12]:
!tree ../data/ds000030

../data/ds000030
├── CHANGES
├── dataset_description.json
├── derivatives
│   └── fmriprep
│       ├── sub-10171
│       │   ├── anat
│       │   │   ├── sub-10171_T1w_brainmask.nii.gz
│       │   │   ├── sub-10171_T1w_dtissue.nii.gz
│       │   │   ├── sub-10171_T1w_inflated.L.surf.gii
│       │   │   ├── sub-10171_T1w_inflated.R.surf.gii
│       │   │   ├── sub-10171_T1w_midthickness.L.surf.gii
│       │   │   ├── sub-10171_T1w_midthickness.R.surf.gii
│       │   │   ├── sub-10171_T1w_pial.L.surf.gii
│       │   │   ├── sub-10171_T1w_pial.R.surf.gii
│       │   │   ├── sub-10171_T1w_preproc.nii.gz
│       │   │   ├── sub-10171_T1w_smoothwm.L.surf.gii
│       │   │   ├── sub-10171_T1w_smoothwm.R.surf.gii
│       │   │   ├── sub-10171_T1w_space-MNI152NLin2009cAsym_brainmask.nii.gz
│       │   │   ├── sub-10171_T1w_space-MNI152NLin2009cAsym_class-CSF_probtissue.ni

│       │   │   ├── sub-11106_T1w_midthickness.L.surf.gii
│       │   │   ├── sub-11106_T1w_midthickness.R.surf.gii
│       │   │   ├── sub-11106_T1w_pial.L.surf.gii
│       │   │   ├── sub-11106_T1w_pial.R.surf.gii
│       │   │   ├── sub-11106_T1w_preproc.nii.gz
│       │   │   ├── sub-11106_T1w_smoothwm.L.surf.gii
│       │   │   ├── sub-11106_T1w_smoothwm.R.surf.gii
│       │   │   ├── sub-11106_T1w_space-MNI152NLin2009cAsym_brainmask.nii.gz
│       │   │   ├── sub-11106_T1w_space-MNI152NLin2009cAsym_class-CSF_probtissue.nii.gz
│       │   │   ├── sub-11106_T1w_space-MNI152NLin2009cAsym_class-GM_probtissue.nii.gz
│       │   │   ├── sub-11106_T1w_space-MNI152NLin2009cAsym_class-WM_probtissue.nii.gz
│       │   │   ├── sub-11106_T1w_space-MNI152NLin2009cAsym_preproc.nii.gz
│       │   │   └── sub-11106_T1w_space-MNI152NLin2009cAsym_warp.h5
│       │   └── func
│       │       ├── sub-11106_tas

│       │   ├── anat
│       │   │   ├── sub-50052_T1w_brainmask.nii.gz
│       │   │   ├── sub-50052_T1w_dtissue.nii.gz
│       │   │   ├── sub-50052_T1w_inflated.L.surf.gii
│       │   │   ├── sub-50052_T1w_inflated.R.surf.gii
│       │   │   ├── sub-50052_T1w_midthickness.L.surf.gii
│       │   │   ├── sub-50052_T1w_midthickness.R.surf.gii
│       │   │   ├── sub-50052_T1w_pial.L.surf.gii
│       │   │   ├── sub-50052_T1w_pial.R.surf.gii
│       │   │   ├── sub-50052_T1w_preproc.nii.gz
│       │   │   ├── sub-50052_T1w_smoothwm.L.surf.gii
│       │   │   ├── sub-50052_T1w_smoothwm.R.surf.gii
│       │   │   ├── sub-50052_T1w_space-MNI152NLin2009cAsym_brainmask.nii.gz
│       │   │   ├── sub-50052_T1w_space-MNI152NLin2009cAsym_class-CSF_probtissue.nii.gz
│       │   │   ├── sub-50052_T1w_space-MNI152NLin2009cAsym_class-GM_probtissue.nii.gz
│       │   │   ├── sub-50052_T1w_space-MNI152N

│   └── func
│       ├── sub-11106_task-rest_bold.json
│       └── sub-11106_task-rest_bold.nii.gz
├── sub-11108
│   ├── anat
│   │   ├── sub-11108_T1w.json
│   │   └── sub-11108_T1w.nii.gz
│   └── func
│       ├── sub-11108_task-rest_bold.json
│       └── sub-11108_task-rest_bold.nii.gz
├── sub-11122
│   ├── anat
│   │   ├── sub-11122_T1w.json
│   │   └── sub-11122_T1w.nii.gz
│   └── func
│       ├── sub-11122_task-rest_bold.json
│       └── sub-11122_task-rest_bold.nii.gz
├── sub-11131
│   ├── anat
│   │   ├── sub-11131_T1w.json
│   │   └── sub-11131_T1w.nii.gz
│   └── func
│       ├── sub-11131_task-rest_bold.json
│       └── sub-11131_task-rest_bold.nii.gz
├── sub-50010
│   ├── anat
│   │   ├── sub-50010_T1w.json
[pybids](https://bids-standard.github.io/pybids/) is a Python API for querying the BIDS folder structure for specific files and metadata.

In [13]:
import bids.layout


In [14]:
layout = bids.layout.BIDSLayout('../data/ds000030')

You can list the subject labels in the dataset.

In [15]:
layout.get_subjects()

['10171',
 '10292',
 '10365',
 '10438',
 '10565',
 '10788',
 '11106',
 '11108',
 '11122',
 '11131',
 '50010',
 '50035',
 '50047',
 '50048',
 '50052',
 '50067',
 '50075',
 '50077',
 '50081',
 '50083']

You can list the modalities in the dataset.

In [16]:
layout.get_modalities()

['anat', 'func']

You can list the data types in the dataset.

In [17]:
layout.get_types()

['bold',
 'brainmask',
 'confounds',
 'description',
 'dtissue',
 'fsaverage5',
 'inflated',
 'midthickness',
 'participants',
 'pial',
 'preproc',
 'probtissue',
 'smoothwm',
 'T1w',
 'warp']

We can be more specific. List the data types for the 'func' modality.

In [18]:
layout.get_types(modality='func')

['bold', 'brainmask', 'confounds', 'fsaverage5', 'preproc']

You can also get everything at once.

Create a DataFrame of scan parameters, with one row per file in the layout.

In [19]:
df = layout.as_data_frame()
df

In [21]:
# Get session variables as a dataframe and merge them back in with the layout
ses_df = layout.get_collections(level='subject', merge=True, variables=['type', 'datatype', 'task']).to_df()
# The query here limits the results to files related to the resting-state task
df.merge(ses_df, how='left', on=['session', 'subject', 'run', 'datatype', 'task']).query('task == "rest"').head()

IndexError: list index out of range
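If all you need is the resting-state rows, filtering the file table directly may be enough and avoids the merge. The sketch below shows the idea on a toy table. The column names `path`, `subject`, `task` and `type` are assumptions about what the layout DataFrame contains, so check `df.columns` on your own data first.

```python
import pandas as pd

# Toy stand-in for the layout DataFrame; column names are assumed, rows are illustrative.
file_table = pd.DataFrame({
    'path': [
        'sub-11108/func/sub-11108_task-rest_bold.nii.gz',
        'sub-11108/anat/sub-11108_T1w.nii.gz',
        'sub-11122/func/sub-11122_task-rest_bold.nii.gz',
    ],
    'subject': ['11108', '11108', '11122'],
    'task': ['rest', None, 'rest'],  # anatomical scans have no task
    'type': ['bold', 'T1w', 'bold'],
})

# Keep only the rows whose task is "rest"; rows with a missing task are dropped.
rest_files = file_table[file_table['task'] == 'rest']
print(rest_files[['subject', 'type', 'path']])
```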