### Tutorial Dataset

For this tutorial, we will be using a subset of a publicly available dataset, ds000030, from [openneuro.org](https://openneuro.org/datasets/ds000030). The dataset is structured according to the Brain Imaging Data Structure (BIDS). BIDS is a simple and intuitive way to organize and describe your neuroimaging and behavioural data. Neuroimaging experiments produce complicated data that can be arranged in many different ways. BIDS tackles this problem by proposing a standard, based on consensus among researchers across the world, for how neuroimaging datasets are arranged.

Using the same structure for all of your studies lets you reuse your scripts from one study to the next. It also makes it much easier to share your code with other researchers and to run their scripts on your data.

Let's take a look at the `participants.tsv` file to see what the demographics for this dataset look like.

In [None]:
import pandas as pd

In [None]:
participant_metadata = pd.read_csv('../data/ds000030/participants.tsv', sep='\t')
participant_metadata

Which diagnosis-related groups make up the dataset?

In [None]:
participant_metadata.diagnosis.unique()
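Beyond listing the group labels, it is often useful to know how many participants fall into each one. The sketch below shows `value_counts()` next to `unique()`. It runs on a small made-up table (the rows are invented for illustration and are not taken from ds000030), so it works without the dataset on disk.

```python
import pandas as pd

# Hypothetical stand-in for participants.tsv; these rows are invented.
toy_metadata = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03", "sub-04", "sub-05"],
    "diagnosis": ["CONTROL", "SCHZ", "CONTROL", "ADHD", "CONTROL"],
})

# unique() lists the distinct labels in the column.
group_labels = toy_metadata["diagnosis"].unique()

# value_counts() also reports how many rows carry each label.
group_counts = toy_metadata["diagnosis"].value_counts()

print(group_labels)
print(group_counts)
```

The same two calls should work on `participant_metadata` once the real file is loaded.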

For this tutorial, we're just going to work with participants that are either CONTROL or SCHZ (`diagnosis`) and have both a T1w (`T1w == 1`) and rest (`rest == 1`) scan. Also, you'll notice some of the T1w scans included in this dataset have a ghosting artifact. We'll filter these out as well (`ghost_NoGhost == 'No_ghost'`).

In [None]:
participant_metadata = participant_metadata[(participant_metadata.diagnosis.isin(['CONTROL', 'SCHZ'])) & 
                                            (participant_metadata.T1w == 1) & 
                                            (participant_metadata.rest == 1) & 
                                            (participant_metadata.ghost_NoGhost == 'No_ghost')]
participant_metadata

In [None]:
participant_list = participant_metadata.participant_id.tolist()
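If the combined filter above is hard to read, one option is to give each condition its own name before combining them. This is a minimal sketch on an invented table that reuses the column names from `participants.tsv`; the rows themselves are hypothetical.

```python
import pandas as pd

# Invented rows that mimic the columns used in the filter above.
toy = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03", "sub-04"],
    "diagnosis": ["CONTROL", "SCHZ", "ADHD", "SCHZ"],
    "T1w": [1, 1, 1, 1],
    "rest": [1, 1, 1, 0],
    "ghost_NoGhost": ["No_ghost", "No_ghost", "No_ghost", "ghost"],
})

# Each condition is a boolean Series with one True/False per row.
in_groups = toy["diagnosis"].isin(["CONTROL", "SCHZ"])
has_t1w = toy["T1w"] == 1
has_rest = toy["rest"] == 1
no_ghost = toy["ghost_NoGhost"] == "No_ghost"

# & keeps only the rows where every condition is True.
kept = toy[in_groups & has_t1w & has_rest & no_ghost]
kept_ids = kept["participant_id"].tolist()
print(kept_ids)
```

Naming the masks also makes it easy to check how many participants each condition excludes on its own, for example with `has_rest.sum()`.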


### Downloading Data

We've randomly sampled 10 CONTROL and 10 SCHZ participants in the `../download_list` text file. Let's download that data now.

In [None]:
# download T1w scans
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/anat \
  ../data/ds000030/{}/anat

# download resting state fMRI scans
!cat ../download_list | \
  xargs -I '{}' aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/{}/func \
  ../data/ds000030/{}/func \
  --exclude '*' \
  --include '*task-rest_bold*'
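The tutorial does not show how the download list was produced. As a rough sketch only, a list of 10 participants per group could be drawn with pandas as below. The table, the random seed, and the one-ID-per-line output format are all assumptions made for illustration.

```python
import pandas as pd

# Invented participant table: 30 CONTROL and 20 SCHZ rows.
toy = pd.DataFrame({
    "participant_id": [f"sub-{i:03d}" for i in range(50)],
    "diagnosis": ["CONTROL"] * 30 + ["SCHZ"] * 20,
})

n_per_group = 10

# Draw the same number of rows from each group; random_state makes it repeatable.
samples = [
    toy[toy["diagnosis"] == group].sample(n=n_per_group, random_state=0)
    for group in ["CONTROL", "SCHZ"]
]
sampled = pd.concat(samples)
download_ids = sampled["participant_id"].tolist()

# One ID per line (assumed format), so each line can fill the {} placeholder in xargs.
download_text = "\n".join(download_ids) + "\n"
print(download_text)
```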

### Querying a BIDS Dataset

There are multiple ways to traverse a BIDS dataset. The simplest is to list the directory tree.

In [None]:
!tree ../data/ds000030
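Because BIDS folders and file names follow a predictable pattern, the standard library alone can answer simple questions about a dataset. The sketch below builds a tiny throwaway tree with empty files (the names are illustrative) and uses `pathlib` glob patterns to find subjects and resting-state scans.

```python
import tempfile
from pathlib import Path

# Build a small BIDS-like tree in a temporary folder; the files are empty placeholders.
root = Path(tempfile.mkdtemp()) / "toy_bids"
for rel in [
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-01/func/sub-01_task-rest_bold.nii.gz",
    "sub-02/anat/sub-02_T1w.nii.gz",
]:
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()

# Subject folders are the top-level directories whose names start with "sub-".
subjects = sorted(p.name for p in root.glob("sub-*") if p.is_dir())

# Resting-state BOLD files sit in each subject's func/ folder.
rest_bold = sorted(p.name for p in root.glob("sub-*/func/*task-rest_bold.nii.gz"))

print(subjects)
print(rest_bold)
```

Glob patterns get unwieldy once queries involve several entities or sidecar metadata, which is where a dedicated library becomes more convenient.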

`pybids` is a Python API for querying the BIDS folder structure for specific files and metadata.

In [None]:
import bids.layout  # pip install pybids

In [None]:
layout = bids.layout.BIDSLayout('../data/ds000030')

List the subject labels in the dataset.

In [None]:
layout.get_subjects()

List the modalities in the dataset.

In [None]:
layout.get_modalities()

List the data types in the dataset.

In [None]:
layout.get_types()

We can be more specific. List the data types for the 'func' modality.

In [None]:
layout.get_types(modality='func')
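Queries like these are possible because BIDS encodes information in the file name itself, as `key-value` pairs followed by a suffix such as `bold` or `T1w`. The sketch below illustrates that idea with a hypothetical helper, `parse_bids_name`. It is not how `pybids` is implemented, and it only handles simple file names.

```python
def parse_bids_name(filename):
    """Split a BIDS-style file name into entities, suffix, and extension."""
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition(".")
    # The last underscore-separated chunk is the suffix; the rest are key-value pairs.
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, extension


entities, suffix, extension = parse_bids_name("sub-01_task-rest_bold.nii.gz")
print(entities, suffix, extension)
```

Here the entities are `sub` and `task`, and the suffix is `bold`, the kind of label the data type query above appears to return.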

Create a dataframe of scan parameters.

In [None]:
df = layout.as_data_frame()

In [None]:
df

In [None]:
# Get session variables as a dataframe and merge them back in with the layout
ses_df = layout.get_collections(level='subject', merge=True, variables=['type', 'datatype', 'task']).to_df()
# The query function here limits results to only files related to a resting state task
df.merge(ses_df, how='left', on=['session', 'subject', 'run', 'datatype', 'task']).query('task=="rest"').head()
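The merge-then-query pattern in the cell above can be tried out on a small example that does not need `pybids`. In this sketch both tables and the `n_volumes` column are invented. A left merge keeps every row of the first table, and `query` then narrows the result to the resting-state rows.

```python
import pandas as pd

# Invented file listing: one row per file.
files = pd.DataFrame({
    "subject": ["01", "01", "02"],
    "task": ["rest", "bart", "rest"],
    "path": ["a.nii.gz", "b.nii.gz", "c.nii.gz"],
})

# Invented per-run variables; there is no row for subject 01's "bart" task.
variables = pd.DataFrame({
    "subject": ["01", "02"],
    "task": ["rest", "rest"],
    "n_volumes": [100, 100],
})

# how="left" keeps all rows of files; unmatched rows get NaN in n_volumes.
merged = files.merge(variables, how="left", on=["subject", "task"])

# query() filters rows with a string expression.
rest_only = merged.query('task == "rest"')
print(rest_only)
```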

In [None]:
layout.get_collections(level='dataset').to_df()

In [None]:
?layout.get_collections