# Introduction to `pybids`

[`pybids`](https://github.com/bids-standard/pybids) is a tool to query, summarize, and manipulate data organized according to the BIDS standard.
In this tutorial we will use a `pybids` test dataset to illustrate some of the functionality of `pybids.layout`.

In [1]:
from bids import BIDSLayout, BIDSValidator
from bids.tests import get_test_data_path
import os

## `BIDSLayout`

One of the most fundamental tools offered by pybids is `BIDSLayout`. `BIDSLayout` is a lightweight class to represent a BIDS project file tree.

In [2]:
# Initialise a BIDSLayout of an example dataset
data_path = os.path.join(get_test_data_path(), '7t_trt')
layout = BIDSLayout(data_path)
layout

BIDS Layout: .../pybids/bids/tests/data/7t_trt | Subjects: 10 | Sessions: 20 | Runs: 20

### Querying and working with `BIDSFile` objects
A `BIDSLayout` object can be queried with its [`get()`](https://bids-standard.github.io/pybids/generated/bids.grabbids.BIDSLayout.html#bids.grabbids.BIDSLayout.get) method. The layout contains `BIDSFile` objects, and we can retrieve the whole list of them by calling `get()` with no arguments:

In [3]:
# Print a summary of the last (out of >300) BIDSFile in the list
layout.get()[-1]

<BIDSFile filename='task-rest_acq-prefrontal_physio.json'>

A `BIDSFile` has various attributes we might be interested in:
* `.path`: The full path of the associated file
* `.filename`: The associated file's filename (without directory)
* `.dirname`: The directory containing the file
* `.image`: The file contents as a nibabel image, if the file is an image
* `.metadata`: A dictionary of all metadata found in associated JSON files
* `.entities`: A dictionary of BIDS entities (or keywords) extracted from the filename

For example, here's the `dict` of entities for the last file in our list:

In [4]:
f = layout.get()[-1]
f.entities

{u'acquisition': 'prefrontal', u'suffix': 'physio', u'task': 'rest'}

And here's the metadata:

In [5]:
f.metadata

{u'Columns': [u'cardiac', u'respiratory', u'trigger', u'oxygen saturation'],
 u'SamplingFrequency': 100,
 u'StartTime': 0}

The entity and metadata dictionaries aren't just there for our casual perusal once we've already retrieved a `BIDSFile`; we can directly filter files from the `BIDSLayout` by requesting only files that match specific values. Some examples:

In [6]:
# We query for any files with the suffix 'T1w', only for subject '01'
layout.get(suffix='T1w', subject='01')

[<BIDSFile filename='sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz'>]

In [7]:
# Retrieve all files where SamplingFrequency (a metadata key) = 100
# and acquisition = prefrontal, for the first two subjects
layout.get(subject=['01', '02'], SamplingFrequency=100, acquisition="prefrontal")

[<BIDSFile filename='sub-01/ses-1/func/sub-01_ses-1_task-rest_acq-prefrontal_physio.tsv.gz'>,
 <BIDSFile filename='sub-01/ses-2/func/sub-01_ses-2_task-rest_acq-prefrontal_physio.tsv.gz'>,
 <BIDSFile filename='sub-02/ses-1/func/sub-02_ses-1_task-rest_acq-prefrontal_physio.tsv.gz'>,
 <BIDSFile filename='sub-02/ses-2/func/sub-02_ses-2_task-rest_acq-prefrontal_physio.tsv.gz'>]
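To make the matching rule concrete, here is a minimal pure-Python sketch of how keyword filters like the ones above could be applied to a list of entity dictionaries. It is not how `pybids` implements `get()`. The `files` list and the `match` helper are hypothetical, and the only behavior borrowed from the queries above is that a scalar filter must match exactly while a list filter matches any of its values.

```python
# Hypothetical stand-ins for the entities/metadata of a few indexed files.
files = [
    {"subject": "01", "session": "1", "suffix": "T1w"},
    {"subject": "01", "session": "1", "suffix": "physio", "acquisition": "prefrontal"},
    {"subject": "02", "session": "1", "suffix": "physio", "acquisition": "prefrontal"},
    {"subject": "03", "session": "1", "suffix": "physio", "acquisition": "prefrontal"},
]


def match(entities, **filters):
    """Return True if `entities` satisfies every keyword filter.

    A list-valued filter matches when the entity equals any listed value.
    """
    for key, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if entities.get(key) not in allowed:
            return False
    return True


# Keep only prefrontal acquisitions for the first two subjects.
selected = [f for f in files if match(f, subject=["01", "02"], acquisition="prefrontal")]
print([f["subject"] for f in selected])
```

A file that lacks a requested key (such as the `T1w` file, which has no `acquisition`) is simply excluded in this sketch.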

By default, [`get()`](https://bids-standard.github.io/pybids/generated/bids.grabbids.BIDSLayout.html#bids.grabbids.BIDSLayout.get) returns a list of `BIDSFile` objects, but we can also specify alternative return types using the `return_type` argument. Here, we return only the file paths as strings:

In [8]:
# Ask get() to return the paths of the matching files as strings.
# We also request relative paths instead of the absolute paths that
# return_type='file' gives by default.
layout.get(suffix='T1w', return_type='file', absolute_paths=False)

['sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz',
 'sub-02/ses-1/anat/sub-02_ses-1_T1w.nii.gz',
 'sub-03/ses-1/anat/sub-03_ses-1_T1w.nii.gz',
 'sub-04/ses-1/anat/sub-04_ses-1_T1w.nii.gz',
 'sub-05/ses-1/anat/sub-05_ses-1_T1w.nii.gz',
 'sub-06/ses-1/anat/sub-06_ses-1_T1w.nii.gz',
 'sub-07/ses-1/anat/sub-07_ses-1_T1w.nii.gz',
 'sub-08/ses-1/anat/sub-08_ses-1_T1w.nii.gz',
 'sub-09/ses-1/anat/sub-09_ses-1_T1w.nii.gz',
 'sub-10/ses-1/anat/sub-10_ses-1_T1w.nii.gz']

We can also ask `get()` to return unique values (or ids) of particular entities. For example, say we want to know which subjects have at least one `T1w` file. We can request that information by setting `return_type='id'` and `target='subject'`:

In [9]:
# Ask get() to return the ids of subjects that have T1w files
layout.get(return_type='id', target='subject', suffix='T1w')

['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']

If our `target` is a BIDS entity that corresponds to a particular directory in the BIDS spec (e.g., `subject` or `session`) we can also use `return_type='dir'` to get all matching subdirectories:

In [10]:
layout.get(return_type='dir', target='subject', absolute_paths=False)

['sub-01',
 'sub-02',
 'sub-03',
 'sub-04',
 'sub-05',
 'sub-06',
 'sub-07',
 'sub-08',
 'sub-09',
 'sub-10']
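The `id` and `dir` return types can be pictured as two views of the same information. The sketch below derives both from a few relative paths that appear elsewhere in this tutorial, using only the standard library. The regular expression and variable names are illustrative assumptions, not part of `pybids`.

```python
import re

# Relative paths taken from outputs shown in this tutorial.
paths = [
    "sub-01/ses-1/anat/sub-01_ses-1_T1map.nii.gz",
    "sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz",
    "sub-02/ses-1/anat/sub-02_ses-1_T1w.nii.gz",
]

# Collect the unique subject labels found in the leading "sub-<label>/" directory.
subject_ids = set()
for path in paths:
    found = re.match(r"sub-([^/_]+)/", path)
    if found:
        subject_ids.add(found.group(1))

subject_ids = sorted(subject_ids)
# The matching directories are the labels with the "sub-" prefix restored.
subject_dirs = [f"sub-{label}" for label in subject_ids]

print(subject_ids)
print(subject_dirs)
```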

## Other utilities

Say you have a filename, and you want to manually extract BIDS entities from it. The `parse_file_entities` method provides the facility:

In [11]:
path = "/a/fake/path/to/a/BIDS/file/sub-01_run-1_T2w.nii.gz"
layout.parse_file_entities(path)

{u'run': 1, u'subject': '01', u'suffix': 'T2w'}
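For intuition, the key-value structure of a BIDS filename can be picked apart with a few string operations. The following is a simplified sketch, not the parser `pybids` uses: `parse_entities_sketch` and the `KEY_NAMES` table are hypothetical, and only a handful of keys are covered.

```python
import posixpath

# Short BIDS keys mapped to the long entity names used in the output above.
# This is an illustrative subset, not the full list of BIDS entities.
KEY_NAMES = {"sub": "subject", "ses": "session", "task": "task",
             "acq": "acquisition", "run": "run"}


def parse_entities_sketch(path):
    """Extract key-value entities and the suffix from a BIDS-style filename."""
    filename = posixpath.basename(path)
    stem = filename.split(".", 1)[0]      # drop the extension(s), e.g. ".nii.gz"
    *pairs, suffix = stem.split("_")      # the last chunk is the suffix
    entities = {"suffix": suffix}
    for pair in pairs:
        key, _, value = pair.partition("-")
        name = KEY_NAMES.get(key)
        if name is None:
            continue                      # skip keys this sketch does not know
        # Mirror the output above, where the run index is an integer.
        entities[name] = int(value) if name == "run" else value
    return entities


print(parse_entities_sketch("/a/fake/path/to/a/BIDS/file/sub-01_run-1_T2w.nii.gz"))
```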

You may want to create valid BIDS filenames for new or hypothetical files that would sit within your BIDS project. This is useful when you know which entity values you need to write out, but don't want to look up the precise BIDS file-naming syntax. In the example below, imagine we've acquired a new BOLD run for an n-back task and want to save it under a name that follows the BIDS conventions. All we need to do is define a dictionary with the name components, and `build_path` takes care of the rest (including injecting sub-directories!):

In [12]:
entities = {
    'subject': '01',
    'run': 2,
    'task': 'nback',
    'suffix': 'bold'
}

layout.build_path(entities)

u'sub-01/func/sub-01_task-nback_run-2_bold.nii.gz'

You can also use `build_path` in more sophisticated ways, for example by defining your own set of matching templates that cover cases not supported by BIDS out of the box. Suppose you want to create a template for naming a new z-stat file. You could do something like:

In [13]:
# Define the pattern to build out of the components passed in the dictionary
pattern = "sub-{subject}[_ses-{session}]_task-{task}[_acq-{acquisition}][_rec-{reconstruction}][_run-{run}][_echo-{echo}]_{suffix<z>}.nii.gz"

entities = {
    'subject': '01',
    'run': 2,
    'task': 'n-back',
    'suffix': 'z'
}

# Notice we pass the new pattern as the second argument
layout.build_path(entities, pattern)


'sub-01_task-n-back_run-2_z.nii.gz'
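The template syntax can be read as follows: `{name}` is a required field, `[...]` marks an optional section that is dropped when its entities are missing, and `{name<a|b>}` restricts a field to the listed values. The sketch below implements that reading in plain Python so the expansion can be traced by hand. It is an assumption-laden simplification (no nested brackets, no defaults) and not the `pybids` implementation. `fill` and `build_path_sketch` are hypothetical names.

```python
import re

# Matches "{name}" or "{name<a|b>}"; the angle brackets list the allowed values.
FIELD = re.compile(r"\{(\w+)(?:<([^>]*)>)?\}")
# Matches a non-nested optional section such as "[_run-{run}]".
OPTIONAL = re.compile(r"\[([^\[\]]*)\]")


def fill(template, entities):
    """Substitute every field in `template`; return None if any field fails."""
    failed = False

    def substitute(match):
        nonlocal failed
        name, allowed = match.group(1), match.group(2)
        value = entities.get(name)
        if value is None or (allowed and str(value) not in allowed.split("|")):
            failed = True
            return ""
        return str(value)

    result = FIELD.sub(substitute, template)
    return None if failed else result


def build_path_sketch(entities, pattern):
    """Expand optional sections first, then the remaining required fields."""
    # An optional section is kept only if all of its fields can be filled.
    expanded = OPTIONAL.sub(lambda m: fill(m.group(1), entities) or "", pattern)
    return fill(expanded, entities)


pattern = ("sub-{subject}[_ses-{session}]_task-{task}[_acq-{acquisition}]"
           "[_rec-{reconstruction}][_run-{run}][_echo-{echo}]_{suffix<z>}.nii.gz")
entities = {"subject": "01", "run": 2, "task": "n-back", "suffix": "z"}
print(build_path_sketch(entities, pattern))
```

In this sketch, a suffix other than `z` makes the required `{suffix<z>}` field fail, so the function returns `None` rather than a path.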

### Loading derivatives

By default, `BIDSLayout` objects are initialized without scanning contained `derivatives/` directories. But you can easily ensure that all derivatives files are loaded and endowed with the extra structure specified in the [derivatives config file](https://github.com/bids-standard/pybids/blob/master/bids/layout/config/derivatives.json):

In [14]:
# Define paths to root and derivatives folders
root = os.path.join(get_test_data_path(), 'synthetic')
layout2 = BIDSLayout(root, derivatives=True)
layout2

BIDS Layout: ...bids/bids/tests/data/synthetic | Subjects: 5 | Sessions: 10 | Runs: 10

The `scope` argument to `get()` specifies which part of the project to look in. By default, valid values are `'bids'` (for the "raw" BIDS project that excludes derivatives) and `'derivatives'` (for all BIDS-derivatives files). The following call returns the filenames of all derivatives files.

In [15]:
# Get all files in derivatives
layout2.get(scope='derivatives', return_type='file', absolute_paths=False)

['derivatives/fmriprep/dataset_description.json',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_desc-confounds_regressors.tsv.gz',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-brain_mask.nii',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-brain_mask.nii.gz',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_space-T1w_desc-brain_mask.nii',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_space-T1w_desc-brain_mask.nii.gz',
 'derivatives/fmriprep/sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_space-T1w_desc-preproc_bold.nii',
 'derivatives/fmripr

### `DataFrame` option
The `BIDSLayout` class has built-in support for pandas `DataFrame`s:

In [16]:
# Convert the layout to a pandas DataFrame
# to_df() accepts the same query arguments as get()
df = layout.to_df(absolute_paths=False)
df.head(9)

| | path | acquisition | datatype | fmap | run | scans | session | subject | suffix | task |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | dataset_description.json | | | | | | | | description | |
| 1 | participants.tsv | | | | | | | | participants | |
| 2 | README | | | | | | | | | |
| 3 | sub-01/ses-1/anat/sub-01_ses-1_T1map.nii.gz | | anat | | | | 1.0 | 1.0 | T1map | |
| 4 | sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz | | anat | | | | 1.0 | 1.0 | T1w | |
| 5 | sub-01/ses-1/fmap/sub-01_ses-1_run-1_magnitude... | | fmap | magnitude1 | 1.0 | | 1.0 | 1.0 | magnitude1 | |
| 6 | sub-01/ses-1/fmap/sub-01_ses-1_run-1_magnitude... | | fmap | magnitude2 | 1.0 | | 1.0 | 1.0 | magnitude2 | |
| 7 | sub-01/ses-1/fmap/sub-01_ses-1_run-1_phasediff... | | fmap | | 1.0 | | 1.0 | 1.0 | phasediff | |
| 8 | sub-01/ses-1/fmap/sub-01_ses-1_run-1_phasediff... | | fmap | phasediff | 1.0 | | 1.0 | 1.0 | phasediff | |


In [17]:
layout.to_df(absolute_paths=False, subject='01').head()

| | path | acquisition | datatype | fmap | run | scans | session | subject | suffix | task |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sub-01/ses-1/anat/sub-01_ses-1_T1map.nii.gz | | anat | | | | 1 | 1 | T1map | |
| 1 | sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz | | anat | | | | 1 | 1 | T1w | |
| 2 | sub-01/ses-1/fmap/sub-01_ses-1_run-1_magnitude... | | fmap | magnitude1 | 1.0 | | 1 | 1 | magnitude1 | |
| 3 | sub-01/ses-1/fmap/sub-01_ses-1_run-1_magnitude... | | fmap | magnitude2 | 1.0 | | 1 | 1 | magnitude2 | |
| 4 | sub-01/ses-1/fmap/sub-01_ses-1_run-1_phasediff... | | fmap | | 1.0 | | 1 | 1 | phasediff | |
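The blank cells in the output above arise because each file carries only some of the entities, while the table has one column per entity seen anywhere. The standard-library sketch below shows the same effect on two hand-written rows. The `rows` data and the column ordering are illustrative assumptions and do not reproduce what `to_df()` does internally.

```python
import csv
import io

# Hypothetical per-file records: each file defines only some of the entities.
rows = [
    {"path": "participants.tsv", "suffix": "participants"},
    {"path": "sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz", "datatype": "anat",
     "session": "1", "subject": "01", "suffix": "T1w"},
]

# One column per entity seen in any row, with "path" placed first.
entity_names = sorted({key for row in rows for key in row if key != "path"})
columns = ["path"] + entity_names

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")  # blank when missing
writer.writeheader()
writer.writerows(rows)

table_lines = buffer.getvalue().splitlines()
print("\n".join(table_lines))
```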


## Retrieving BIDS variables 
BIDS variables are stored in .tsv files at the run, session, subject, or dataset level. You can retrieve these variables with `layout.get_collections()`. The resulting objects can be converted to dataframes and merged with the layout to associate the variables with corresponding scans.

In the following example, we request all subject-level variable data available anywhere in the BIDS project, and merge the results into a single `DataFrame` (by default, we'll get back a single `BIDSVariableCollection` object for each subject). 

In [18]:
# Get subject variables as a dataframe and merge them back in with the layout
subj_df = layout.get_collections(level='subject', merge=True).to_df()
subj_df.head()

| | session | suffix | subject | CCPT_FN_count | CCPT_FP_count | CCPT_avg_FN_RT | CCPT_avg_FP_RT | CCPT_avg_succ_RT | CCPT_succ_count | caffeine_daily | ... | relative_water_intake | specific_vague | subject_id | surroundings | systolic_blood_pressure_left | systolic_blood_pressure_right | thirst | vigilance | vigilance_nyc-q | words |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | sessions | 1 | 0.0 | 1.0 | | 507.0 | 500.770833 | 96.0 | 0.5 | ... | 7.0 | 95.0 | 1.0 | 0.0 | 108.0 | 109.0 | 9.0 | 9.0 | 100.0 | 100.0 |
| 1 | 1 | sessions | 2 | 0.0 | 5.0 | | 297.6 | 351.729167 | 96.0 | 0.0 | ... | 1.0 | 100.0 | 2.0 | 70.0 | 99.0 | 100.0 | 2.0 | 7.0 | 100.0 | 100.0 |
| 2 | 1 | sessions | 3 | 0.0 | 1.0 | | 441.0 | 426.71875 | 96.0 | 1.0 | ... | 3.0 | 100.0 | 3.0 | 10.0 | 122.0 | 128.0 | 3.0 | 8.0 | 100.0 | 0.0 |
| 3 | 1 | sessions | 4 | 0.0 | 1.0 | | 443.0 | 417.90625 | 96.0 | 0.1 | ... | 5.0 | 80.0 | 4.0 | 0.0 | 130.0 | 110.0 | 6.0 | 5.0 | 100.0 | 85.0 |
| 4 | 1 | sessions | 5 | 0.0 | 2.0 | | 355.5 | 372.114583 | 96.0 | 0.0 | ... | 5.0 | 75.0 | 5.0 | 80.0 | 105.0 | 117.0 | 7.0 | 7.0 | 60.0 | 30.0 |


## BIDSValidator

`pybids` includes a BIDS validator. It can tell you whether a filepath is a valid BIDS filepath, and it can answer questions about what kind of data the file should represent.

In [19]:
# Note that when using the BIDS validator, the filepath MUST be relative to the top-level BIDS directory
validator = BIDSValidator()
validator.is_bids('/sub-02/ses-01/anat/sub-02_ses-01_T2w.nii.gz')

True
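Because the validator expects paths relative to the dataset root, an absolute path has to be converted first. Here is a minimal sketch of that conversion using only the standard library. `to_validator_path` and the `/data/my_study` root are hypothetical, and the leading slash simply follows the form of the paths used in these examples.

```python
import posixpath


def to_validator_path(file_path, dataset_root):
    """Express `file_path` relative to `dataset_root`, with a leading slash."""
    relative = posixpath.relpath(file_path, dataset_root)
    # A result that climbs out of the root means the file is not in the dataset.
    if relative == ".." or relative.startswith("../"):
        raise ValueError(f"{file_path!r} is not inside {dataset_root!r}")
    return "/" + relative


# "/data/my_study" is a made-up dataset root used only for illustration.
relative_path = to_validator_path(
    "/data/my_study/sub-02/ses-01/anat/sub-02_ses-01_T2w.nii.gz",
    "/data/my_study",
)
print(relative_path)
```

The resulting string has the same shape as the argument passed to `is_bids()` above.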

In [20]:
# Check whether a filepath represents a file that is part of the specification
validator.is_file('/sub-02/ses-01/anat/sub-02_ses-01_T2w.json')

True

In [21]:
# Check whether a file sits at the top level of the dataset
validator.is_top_level('/dataset_description.json')

True

In [22]:
# ... or at the subject (or session) level
validator.is_subject_level('/dataset_description.json')

False

In [23]:
validator.is_session_level('/sub-02/ses-01/sub-02_ses-01_scans.json')

True

In [24]:
# Check whether a filepath represents phenotypic data
validator.is_phenotypic('/sub-02/ses-01/anat/sub-02_ses-01_T2w.nii.gz')

False

And so on. See the [docs](https://bids-standard.github.io/pybids/generated/bids.grabbids.BIDSValidator.html#bids-grabbids-bidsvalidator) for the full list of `BIDSValidator` options.