# Introduction to `pybids`

[`pybids`](https://github.com/INCF/pybids) is a Python package for querying, summarizing and manipulating data organized according to the BIDS standard.
In this tutorial we use a `pybids` test dataset to illustrate some of the functionality of the `bids.layout` module (formerly `bids.grabbids`).

In [1]:
import bids.layout
import bids.tests
import os



## `BIDSLayout`

One of the most fundamental tools offered by pybids is `BIDSLayout`. `BIDSLayout` is a lightweight class to represent a BIDS project file tree.

In [2]:
# Initialise a BIDSLayout of an example dataset
layout = bids.layout.BIDSLayout(os.path.join(bids.tests.get_test_data_path(), '7t_trt'))
layout

BIDS Layout: .../pybids/bids/tests/data/7t_trt | Subjects: 10 | Sessions: 20 | Runs: 20

### Using `get()`
A `BIDSLayout` object can be queried with its [`get()`](https://incf.github.io/pybids/generated/bids.grabbids.BIDSLayout.html#bids.grabbids.BIDSLayout.get) method. The `BIDSLayout` object contains `File` objects, and calling `get()` with no arguments returns the whole list of them.

In [3]:
# get() returns a list; by default each entry is a named tuple of key-value pairs
layout.get()[10]

File(filename='/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-01/ses-1/fmap/sub-01_ses-1_run-2_phasediff.json', subject='01', session='1', run='2', type='phasediff', modality='fmap')

By default, `get()` returns a named-tuple view of each matched `File` object, listing its key-value pairs. We can filter on these key-value pairs by passing them to `get()` as keyword arguments.
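Conceptually, a record is kept only if every keyword you pass agrees with it, and a list of values (as in the `type=['T1w', 'T1map']` query further down) matches any one of them. The sketch below mimics that behaviour over plain named tuples; `filter_records` and the sample records are hypothetical, and this is not how pybids implements `get()`.

```python
from collections import namedtuple

# Hypothetical stand-in for the named tuples that get() returns
File = namedtuple("File", ["filename", "subject", "session", "type", "modality"])

records = [
    File("sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz", "01", "1", "T1w", "anat"),
    File("sub-01/ses-1/anat/sub-01_ses-1_T1map.nii.gz", "01", "1", "T1map", "anat"),
    File("sub-02/ses-1/anat/sub-02_ses-1_T1w.nii.gz", "02", "1", "T1w", "anat"),
]


def filter_records(records, **criteria):
    """Keep the records whose fields match every keyword criterion."""
    def accepts(value, wanted):
        # A list (or tuple/set) of wanted values matches any one of them
        if isinstance(wanted, (list, tuple, set)):
            return value in wanted
        return value == wanted

    return [rec for rec in records
            if all(accepts(getattr(rec, key, None), wanted)
                   for key, wanted in criteria.items())]
```

For example, `filter_records(records, type='T1w', subject='01')` keeps only the first record, while `type=['T1w', 'T1map']` keeps both of subject 01's files.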

In [4]:
# Query for files of type 'T1w' that belong to subject '01'
layout.get(type='T1w', subject='01')

[File(filename='/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz', subject='01', session='1', type='T1w', modality='anat')]

So far, [`get()`](https://incf.github.io/pybids/generated/bids.grabbids.BIDSLayout.html#bids.grabbids.BIDSLayout.get) has returned named-tuple representations of the `File` objects; the optional `return_type` argument lets us request something else.

In [5]:
# Ask get() to return the matching file objects
layout.get(type='T1w', return_type='obj')
# You can convert an obj to a tuple by using the .as_named_tuple() method

[<grabbit.core.File at 0x7fd8ddc81a58>,
 <grabbit.core.File at 0x7fd8ddc92748>,
 <grabbit.core.File at 0x7fd8ddc232e8>,
 <grabbit.core.File at 0x7fd8ddc29f98>,
 <grabbit.core.File at 0x7fd8ddc38b00>,
 <grabbit.core.File at 0x7fd8ddc4b668>,
 <grabbit.core.File at 0x7fd8ddc5a2e8>,
 <grabbit.core.File at 0x7fd8ddbe1d68>,
 <grabbit.core.File at 0x7fd8ddbf2b38>,
 <grabbit.core.File at 0x7fd8ddc04780>]

In [6]:
# Ask get() to return the filenames of the matching files
layout.get(type='T1w', return_type='file')

['/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-02/ses-1/anat/sub-02_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-03/ses-1/anat/sub-03_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-04/ses-1/anat/sub-04_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-05/ses-1/anat/sub-05_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-06/ses-1/anat/sub-06_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-07/ses-1/anat/sub-07_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-08/ses-1/anat/sub-08_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-09/ses-1/anat/sub-09_ses-1_T1w.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-10/ses-1/anat/sub-10_ses-1_T1w.nii.gz']

We can also ask `get()` to return the values of a single key across the matching `File` objects by passing `return_type='id'` together with the `target` argument.
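With `return_type='id'`, the matching files are reduced to the distinct values of the `target` key. A hypothetical stdlib sketch of that reduction follows; `unique_ids` is our own name, and sorting the result is our choice (the outputs below happen to be sorted as well).

```python
from collections import namedtuple

# Hypothetical records; only the fields needed for this illustration
File = namedtuple("File", ["filename", "subject", "type"])

records = [
    File("sub-02/ses-1/anat/sub-02_ses-1_T1w.nii.gz", "02", "T1w"),
    File("sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz", "01", "T1w"),
    File("sub-01/ses-2/anat/sub-01_ses-2_T1w.nii.gz", "01", "T1w"),
    File("sub-01/ses-1/func/sub-01_ses-1_task-rest_bold.nii.gz", "01", "bold"),
]


def unique_ids(records, target, **criteria):
    """Sorted unique values of `target` among records matching all criteria."""
    values = {
        getattr(rec, target)
        for rec in records
        if all(getattr(rec, key) == value for key, value in criteria.items())
        and getattr(rec, target) is not None
    }
    return sorted(values)
```

Here `unique_ids(records, 'subject', type='T1w')` collapses the three T1w files to the two subject labels `'01'` and `'02'`.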

In [7]:
# Ask get() to return the ids of subjects that have T1w files
layout.get(type='T1w', return_type='id', target='subject')

['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']

In [8]:
# See all modality labels in this dataset
layout.get(return_type='id', target='modality')

['anat', 'fmap', 'func']

If the `target` is a key that corresponds to a directory level in the BIDS spec (e.g. subject or session), we can also ask `get()` to return the `target` directory for each matching file, using `return_type='dir'`.
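For subject and session, that directory is the leading `sub-<label>` (and `ses-<label>`) part of a path written relative to the dataset root. The sketch below derives those directories from relative paths; `target_dirs` and its prefix table are hypothetical, and this is not pybids' implementation.

```python
# Hypothetical mapping from a target key to the directory prefixes it spans
PREFIXES = {"subject": ("sub-",), "session": ("sub-", "ses-")}


def target_dirs(paths, target):
    """Sorted unique target directories for paths relative to the dataset root."""
    prefixes = PREFIXES[target]
    dirs = set()
    for path in paths:
        parts = path.split("/")
        head = parts[:len(prefixes)]
        # Keep the path only if its leading directories carry the expected prefixes
        if len(parts) > len(prefixes) and all(
                part.startswith(prefix) for part, prefix in zip(head, prefixes)):
            dirs.add("/".join(head))
    return sorted(dirs)
```

Top-level files such as `dataset_description.json` are skipped, and a path without a session directory contributes nothing when `target='session'`.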

In [9]:
# get() also accepts a list of values for a key and matches any of them
layout.get(type=['T1w', 'T1map'], subject='01')

[File(filename='/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-01/ses-1/anat/sub-01_ses-1_T1map.nii.gz', subject='01', session='1', type='T1map', modality='anat'),
 File(filename='/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz', subject='01', session='1', type='T1w', modality='anat')]

In [10]:
# See all the type values for json files in this dataset
layout.get(return_type='id', target='type', extensions='.json')

['bold', 'description', 'phasediff', 'physio']

### What can we do with `File` objects?

In [11]:
# Select one File object and convert it into a named tuple
f = layout.get(return_type='obj', subject='01', session='1', run='1', type='phasediff', modality='fmap')[1]
f.as_named_tuple()

File(filename='/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-01/ses-1/fmap/sub-01_ses-1_run-1_phasediff.json', subject='01', session='1', run='1', type='phasediff', modality='fmap')

In [12]:
# get_metadata reads the associated json file
layout.get_metadata(f.path)

{'EchoTime2': 0.00702,
 'EchoTime1': 0.006,
 'IntendedFor': 'ses-1/func/sub-01_ses-1_task-rest_acq-fullbrain_run-1_bold.nii.gz'}
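In BIDS, metadata can also come from JSON files higher up the tree (the inheritance principle): a sidecar such as a top-level `phasediff.json` applies to every matching file below it, and values in more specific files override it. The stdlib sketch below shows that merge order. `collect_metadata` and `parse_name` are hypothetical helpers, and the applicability rule used here (same suffix, and every entity in the sidecar's name appears with the same value in the data file's name) is a simplification of the specification, not pybids' code.

```python
import json
import os


def parse_name(filename):
    """Split 'sub-01_run-1_phasediff.json' into ({'sub': '01', 'run': '1'}, 'phasediff')."""
    stem = filename.split(".")[0]
    *pairs, suffix = stem.split("_")
    if not all("-" in pair for pair in pairs):
        return None  # not an entity-based name, e.g. dataset_description.json
    return dict(pair.split("-", 1) for pair in pairs), suffix


def collect_metadata(root, rel_path):
    """Merge applicable JSON sidecars for rel_path, from the root downwards."""
    target_entities, target_suffix = parse_name(os.path.basename(rel_path))
    # Directories from the dataset root down to the file's own directory
    dirs = [root]
    for part in os.path.dirname(rel_path).split("/"):
        if part:
            dirs.append(os.path.join(dirs[-1], part))
    metadata = {}
    for directory in dirs:
        for name in sorted(os.listdir(directory)):
            parsed = parse_name(name) if name.endswith(".json") else None
            if parsed is None:
                continue
            entities, suffix = parsed
            if suffix == target_suffix and all(
                    target_entities.get(k) == v for k, v in entities.items()):
                with open(os.path.join(directory, name)) as fh:
                    # Deeper (more specific) files override earlier values
                    metadata.update(json.load(fh))
    return metadata
```

Usage would look like `collect_metadata('/path/to/dataset', 'sub-01/ses-1/fmap/sub-01_ses-1_run-1_phasediff.nii.gz')`.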

In [13]:
# Get file object as a dictionary
dict(f.as_named_tuple()._asdict())

{'filename': '/home/ltirrell/projects/pybids/bids/tests/data/7t_trt/sub-01/ses-1/fmap/sub-01_ses-1_run-1_phasediff.json',
 'subject': '01',
 'session': '1',
 'run': '1',
 'type': 'phasediff',
 'modality': 'fmap'}

In [14]:
# The entities attribute holds the file's key-value pairs as a dictionary
f.entities

{'subject': '01',
 'session': '1',
 'run': '1',
 'type': 'phasediff',
 'modality': 'fmap'}

In [15]:
# Layout can parse filenames to create this same dictionary of entities
layout.parse_file_entities(f.path)

{'subject': '01',
 'session': '1',
 'run': '1',
 'type': 'phasediff',
 'modality': 'fmap'}

In [16]:
# We can even make up a filename and ask layout to parse it 
layout.parse_file_entities('/home/isla/anaconda3/lib/python3.6/site-packages/bids/tests/data/7t_trt/sub-02/ses-1/sub-02_ses-1_T2w.nii.gz')

{'subject': '02', 'session': '1', 'type': 'T2w'}
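One way to parse entities is to match one regular expression per entity against the full path. The patterns below are a hypothetical, simplified set written for this illustration (pybids reads its own entity definitions from a configuration file). In this sketch `modality` comes from the directory name, which is consistent with the made-up path above, which has no `anat/` directory, producing no `modality` entry.

```python
import re

# Hypothetical, simplified entity patterns; each captures one entity value
ENTITY_PATTERNS = {
    "subject": r"[/\\]sub-([a-zA-Z0-9]+)",
    "session": r"[_/\\]ses-([a-zA-Z0-9]+)",
    "run": r"[_/\\]run-0*(\d+)",
    "type": r"_([a-zA-Z0-9]+)\.[^/\\]+$",
    "modality": r"[/\\](anat|func|fmap|dwi)[/\\]",
}


def parse_entities(path):
    """Return {entity: value} for every pattern that matches the path."""
    entities = {}
    for name, pattern in ENTITY_PATTERNS.items():
        match = re.search(pattern, path)
        if match:
            entities[name] = match.group(1)
    return entities
```

Applied to the phasediff sidecar path and the made-up T2w path used above, this sketch returns the same dictionaries that `parse_file_entities()` printed.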

## Build New Paths

You may want to create valid BIDS filenames for new or hypothetical files within your BIDS project. `build_path()` does this by filling a filename pattern with a dictionary of entities.

In [17]:
# You need to define a pattern for filenames. This is a string in which keys in
# curly brackets are replaced by entity values; sections in square brackets are
# optional and are dropped when the entity they contain is not provided
pattern = "sub-{subject}[/ses-{session}]/{modality}/sub-{subject}[_ses-{session}][_acq-{acquisition}]_{type}.nii.gz"
# And you need a dictionary of entities: the key-value pairs that define how
# each {key} in the pattern is replaced
entities = {'subject': '01', 'session': '1', 'modality': 'anat', 'type': 'T1w'}

In [18]:
# You can pass patterns directly to build_path
layout.build_path(entities, path_patterns=[pattern])

'sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz'

In [19]:
# Or you can define the default patterns for layout to use
layout.path_patterns = [pattern]
layout.build_path(entities)

'sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz'
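The pattern syntax can be emulated in a few lines: resolve each square-bracketed optional section first, keeping it only if every key inside it has a value, then substitute the remaining `{key}` placeholders. `build_path_sketch` is a hypothetical helper that supports only this subset of the syntax (no nested brackets); raising an error on a missing required key is this sketch's choice, and pybids' `build_path()` may behave differently.

```python
import re

KEY = re.compile(r"\{(\w+)\}")


def build_path_sketch(entities, pattern):
    """Fill {key} placeholders; drop [optional] sections with a missing key."""
    def fill_optional(match):
        section = match.group(1)
        # Keep the optional section only if every key in it has a value
        if all(key in entities for key in KEY.findall(section)):
            return section
        return ""

    # Resolve optional sections first (assumes they are not nested)
    path = re.sub(r"\[([^\[\]]*)\]", fill_optional, pattern)
    missing = [key for key in KEY.findall(path) if key not in entities]
    if missing:
        raise ValueError(f"missing required entities: {missing}")
    return KEY.sub(lambda m: str(entities[m.group(1)]), path)
```

With the `pattern` and `entities` defined above, this sketch produces `'sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz'`; leaving out `session` yields `'sub-01/anat/sub-01_T1w.nii.gz'`.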

### Loading subdomains

You can declare your `derivatives` subfolder when you initialise your `BIDSLayout`. This will endow it with the extra structure specified in the [derivatives config file](https://github.com/INCF/pybids/blob/master/bids/grabbids/config/derivatives.json).

In [20]:
# Define paths to root and derivatives folders
root = os.path.join(bids.tests.get_test_data_path(), 'synthetic')
deriv = os.path.join(root, 'derivatives')
# Index root as a 'bids' domain, and derivatives as both a 'bids' and a 'derivatives' domain
layout2 = bids.layout.BIDSLayout([(root, 'bids'), (deriv, ['bids', 'derivatives'])])



In [21]:
# Get all files in derivatives
layout2.get(domains='derivatives', return_type='file')

['/home/ltirrell/projects/pybids/bids/tests/data/synthetic/derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_bold_space-MNI152NLin2009cAsym_brainmask.nii',
 '/home/ltirrell/projects/pybids/bids/tests/data/synthetic/derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_bold_space-MNI152NLin2009cAsym_brainmask.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/synthetic/derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii',
 '/home/ltirrell/projects/pybids/bids/tests/data/synthetic/derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz',
 '/home/ltirrell/projects/pybids/bids/tests/data/synthetic/derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_bold_space-T1w_brainmask.nii',
 '/home/ltirrell/projects/pybids/bids/tests/data/synthetic/derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_

### `DataFrame` option
The `BIDSLayout` class has built-in support for pandas `DataFrame`s.

In [22]:
# Convert the layout to a pandas dataframe
df = layout.as_data_frame()
df.head()

|   | path | acq | acquisition | bval | fmap | modality | run | scans | session | subject | task | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /home/ltirrell/projects/pybids/bids/tests/data... | prefrontal | prefrontal |  |  |  |  |  |  |  | rest | physio |
| 1 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  |  |  |  |  |  |  |  | description |
| 2 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  |  |  |  |  |  | 10.0 |  | sessions |
| 3 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  | magnitude2 | fmap | 1.0 |  | 2.0 | 10.0 |  | magnitude2 |
| 4 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  | phasediff | fmap | 2.0 |  | 2.0 | 10.0 |  | phasediff |
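Each row of this table is one file and each column one entity, so a file simply has an empty cell for every entity it lacks. A hypothetical stdlib sketch of that flattening follows; `records_to_table` is our own name, and sorting the entity columns alphabetically is our choice, which happens to match the column order shown above.

```python
def records_to_table(records):
    """Flatten entity dicts into a header and rows; absent entities become None."""
    # Union of all entity keys across files, sorted, with the path first
    entity_cols = sorted({key for rec in records for key in rec if key != "path"})
    columns = ["path"] + entity_cols
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows
```

A `dataset_description.json` record, for instance, would carry only `path` and `type`, leaving every other column empty, just like row 1 above.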


## Retrieving BIDS variables 
BIDS variables are stored in `.tsv` files at the run, session, subject, or dataset level. You can retrieve them with `layout.get_collections()`. The resulting collections can be converted to DataFrames and merged with the layout's DataFrame to associate each variable with the corresponding scans.

In [23]:
# Get subject variables as a dataframe and merge them back in with the layout
subj_df = layout.get_collections(level='subject', merge=True, variables=['thirst','vigilance','words']).to_df()
# The query function here limits results to only files that have a modality defined
df.merge(subj_df, how='left', on=['session','subject']).query('modality.notnull()', engine='python').head()

|   | path | acq | acquisition | bval | fmap | modality | run | scans | session | subject | task | type | thirst | vigilance | words |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  | magnitude2 | fmap | 1 |  | 2 | 10 |  | magnitude2 | 4.0 | 2.0 | 20.0 |
| 4 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  | phasediff | fmap | 2 |  | 2 | 10 |  | phasediff | 4.0 | 2.0 | 20.0 |
| 5 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  | magnitude1 | fmap | 2 |  | 2 | 10 |  | magnitude1 | 4.0 | 2.0 | 20.0 |
| 6 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  |  | fmap | 1 |  | 2 | 10 |  | phasediff | 4.0 | 2.0 | 20.0 |
| 7 | /home/ltirrell/projects/pybids/bids/tests/data... |  |  |  | magnitude2 | fmap | 2 |  | 2 | 10 |  | magnitude2 | 4.0 | 2.0 | 20.0 |


In [24]:
# Get session variables as a dataframe and merge them back in with the layout
ses_df = layout.get_collections(level='session', merge=True, variables=['type', 'modality', 'task', 'future', 'past']).to_df()
# The query function here limits results to only files related to a resting state task
df.merge(ses_df, how='left', on=['session', 'subject', 'run', 'modality', 'task']).query('task=="rest"').head()

|   | path | acq_x | acquisition | bval | fmap | modality | run | scans | session | subject | task | type_x | type_y | acq_y | future | past |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /home/ltirrell/projects/pybids/bids/tests/data... | prefrontal | prefrontal |  |  |  |  |  |  |  | rest | physio |  |  |  |  |
| 11 | /home/ltirrell/projects/pybids/bids/tests/data... | fullbrain | fullbrain |  |  | func | 2.0 |  | 2.0 | 10.0 | rest | bold | bold | fullbrain | 70.0 | 80.0 |
| 12 | /home/ltirrell/projects/pybids/bids/tests/data... | fullbrain | fullbrain |  |  | func | 2.0 |  | 2.0 | 10.0 | rest | physio | bold | fullbrain | 70.0 | 80.0 |
| 13 | /home/ltirrell/projects/pybids/bids/tests/data... | fullbrain | fullbrain |  |  | func | 1.0 |  | 2.0 | 10.0 | rest | bold | bold | fullbrain | 65.0 | 75.0 |
| 14 | /home/ltirrell/projects/pybids/bids/tests/data... | fullbrain | fullbrain |  |  | func | 1.0 |  | 2.0 | 10.0 | rest | physio | bold | fullbrain | 65.0 | 75.0 |
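A left merge keeps every row of the layout's table and attaches the variables from rows whose key columns match; rows without a match keep empty values. The hypothetical `left_merge` below shows this on lists of dicts. It assumes the two tables share no columns other than the keys; when they do, pandas keeps both and adds its default `_x`/`_y` suffixes, as with `acq_x`/`acq_y` and `type_x`/`type_y` in the output above.

```python
def left_merge(left, right, on):
    """Left join two lists of dict rows on the columns named in `on`."""
    # Columns contributed by the right table (the key columns are shared)
    right_cols = sorted({col for row in right for col in row if col not in on})
    # Index right rows by their key tuple for constant-time lookup
    index = {}
    for row in right:
        index.setdefault(tuple(row.get(k) for k in on), []).append(row)
    merged = []
    for row in left:
        # One output row per match; an empty dict stands in for "no match"
        for match in index.get(tuple(row.get(k) for k in on), [{}]):
            out = dict(row)
            out.update({col: match.get(col) for col in right_cols})
            merged.append(out)
    return merged
```

Unmatched rows get `None` here, where pandas would use `NaN`.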


## BIDSValidator

`pybids` includes a BIDS validator. It can tell you whether a filepath is a valid BIDS filepath, and answer questions about what kind of data the file should represent.

In [25]:
# Note that when using the bids validator, the filepath MUST be relative to the top level bids directory
validator = bids.layout.BIDSValidator()
validator.is_bids('/sub-02/ses-01/anat/sub-02_ses-01_T2w.nii.gz')

True

In [26]:
# Can decide if a filepath represents an anat file
validator.is_anat('/sub-02/ses-01/anat/sub-02_ses-01_T2w.nii.gz')

True

In [27]:
# Can decide if a filepath represents behavioural data
validator.is_behavioral('/sub-02/ses-01/anat/sub-02_ses-01_T2w.nii.gz')

False

And so on. See the [docs](https://incf.github.io/pybids/generated/bids.grabbids.BIDSValidator.html#bids-grabbids-bidsvalidator) for the full list of `BIDSValidator` options.
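Checks like `is_anat()` come down to matching the path against naming rules. As a rough illustration only, the regular expression below accepts a few anatomical filenames written relative to the dataset root with a leading `/`, and requires the subject (and, if present, session) label in the filename to repeat the directory labels. `ANAT_RULE` and `looks_like_anat` are hypothetical; the real validator covers far more datatypes, entities and suffixes.

```python
import re

# Hypothetical, heavily simplified rule for a few anatomical files only
ANAT_RULE = re.compile(
    r"^/sub-(?P<sub>[a-zA-Z0-9]+)"            # subject directory (leading slash)
    r"(?:/ses-(?P<ses>[a-zA-Z0-9]+))?"        # optional session directory
    r"/anat/sub-(?P=sub)"                     # filename repeats the subject label
    r"(?:_ses-(?P=ses))?"                     # ...and the session label, if any
    r"(?:_acq-[a-zA-Z0-9]+)?(?:_run-[0-9]+)?"  # optional acquisition and run
    r"_(?:T1w|T2w|T1map|T2map)"               # a few anatomical suffixes
    r"\.(?:nii\.gz|nii|json)$"
)


def looks_like_anat(path):
    """True if the root-relative path matches the simplified anat rule."""
    return ANAT_RULE.match(path) is not None
```

Like the validator calls above, this sketch rejects paths without the leading `/`.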