# Introduction to BIDS and `pybids`

Brain Imaging Data Structure (**BIDS**) is a standard for organizing and describing MRI datasets (https://bids.neuroimaging.io/).

Gorgolewski, K., Auer, T., Calhoun, V. et al. *The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments.* Sci Data 3, 160044 (2016). https://doi.org/10.1038/sdata.2016.44

`pybids` is a tool to query, summarize and manipulate data using the BIDS standard. 

**In this tutorial I will illustrate some of the functionality of the `bids.layout` module of `pybids`.**

The material is adapted from https://github.com/bids-standard/pybids/tree/master/examples

## Example dataset

When `pybids` is installed, some example datasets are added to the library. They are useful for testing the structure of a dataset, but they do not contain the actual image data.
I will use the `synthetic` dataset for the examples.
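Since entity-based queries rely on file paths rather than on image content, placeholder files are enough to experiment with a BIDS-like structure. The following is a minimal sketch using only the standard library; the three-file toy tree and the folder name `toy_dataset` are hypothetical and are not part of the bundled dataset.

```python
import tempfile
from pathlib import Path

# Hypothetical toy tree that follows the BIDS naming pattern.
relative_paths = [
    'dataset_description.json',
    'sub-01/ses-01/anat/sub-01_ses-01_T1w.nii.gz',
    'sub-01/ses-01/func/sub-01_ses-01_task-nback_run-01_bold.nii.gz',
]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'toy_dataset'
    for rel in relative_paths:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()  # create an empty placeholder file

    # Collect the files back from disk, relative to the dataset root.
    files = [p for p in root.rglob('*') if p.is_file()]
    created = sorted(p.relative_to(root).as_posix() for p in files)
    sizes = {p.stat().st_size for p in files}

print(created)
print(sizes)
```

Every file in the toy tree is empty, yet the folder hierarchy and file names still carry the subject, session, task and run information.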

In [1]:
from bids.layout import BIDSLayout
from bids.tests import get_test_data_path
import os

ds_path = os.path.join(get_test_data_path(), 'synthetic')

In [3]:
import seedir as sd
sd.seedir(ds_path, style='emoji')

📁 synthetic/
├─📄 dataset_description.json
├─📁 derivatives/
│ └─📁 fmriprep/
│   ├─📄 dataset_description.json
│   ├─📁 sub-01/
│   │ ├─📁 ses-01/
│   │ │ └─📁 func/
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_desc-confounds_regressors.tsv
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-brain_mask.nii
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-brain_mask.nii.gz
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-T1w_desc-brain_mask.nii
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-T1w_desc-brain_mask.nii.gz
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-T1w_desc-preproc_bold.nii
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-01_space-T1w_desc-preproc_bold.nii.gz
│   │ │   ├─📄 sub-01_ses-01_task-nback_run-02_desc-confounds_regressors.

## The `BIDSLayout`

A `BIDSLayout` is a Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. The `BIDSLayout` initializer has a large number of arguments you can use to control the way files are indexed and accessed. Usually, though, you initialize a `BIDSLayout` with just the root path of the dataset.

In [2]:
# Initialize the layout
layout = BIDSLayout(ds_path)

# Print some basic information about the layout
layout

BIDS Layout: ...ages\bids\tests\data\synthetic | Subjects: 5 | Sessions: 10 | Runs: 10

### Querying the BIDSLayout

The main method for querying `BIDSLayout` is `.get()`.

If we call `.get()` with no additional arguments, we get back a list of all the BIDS files in our dataset.

Get all files in the layout:

In [4]:
all_files = layout.get()
print("There are {} files in the layout.".format(len(all_files)))
print("\nThe first 5 files are:")
all_files[:5]

There are 153 files in the layout.

The first 5 files are:


[<BIDSJSONFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\dataset_description.json'>,
 <BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\participants.tsv'>,
 <BIDSImageFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-01\anat\sub-01_ses-01_T1w.nii'>,
 <BIDSImageFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-01\anat\sub-01_ses-01_T1w.nii.gz'>,
 <BIDSImageFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-01\func\sub-01_ses-01_task-nback_run-01_bold.nii'>]

The returned object is a **Python list**. Each element in the list is a `BIDSFile` object. 

We can also get just filenames.

In [6]:
layout.get(return_type='filename')[:5]

['D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\dataset_description.json',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\participants.tsv',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-01\\anat\\sub-01_ses-01_T1w.nii',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-01\\anat\\sub-01_ses-01_T1w.nii.gz',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-01\\func\\sub-01_ses-01_task-nback_run-01_bold.nii']

We can also retrieve information such as:
* all `subject` IDs
* all `task` names
* the dataset `description`
* the BOLD repetition time (TR)

In [34]:
layout.get_subjects()

['03', '04', '02', '05', '01']

In [35]:
layout.get_tasks()

['nback', 'rest']

In [29]:
layout.get_dataset_description()

{'Name': 'Synthetic dataset for inclusion in BIDS-examples',
 'BIDSVersion': '1.0.2',
 'License': 'PD',
 'Authors': ['Markiewicz, C. J.']}

In [38]:
layout.get_tr()

2.5

### Filtering files by entities

We can pass any BIDS-defined entities (keywords) to the `.get()` method. For example, here's how we would retrieve all BOLD runs with the `.nii.gz` extension for subject `01`.

In [14]:
# Retrieve filenames of all BOLD runs for subject 01
layout.get(subject='01', extension='nii.gz', suffix='bold', return_type='filename')

['D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-01\\func\\sub-01_ses-01_task-nback_run-01_bold.nii.gz',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-01\\func\\sub-01_ses-01_task-nback_run-02_bold.nii.gz',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-01\\func\\sub-01_ses-01_task-rest_bold.nii.gz',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-02\\func\\sub-01_ses-02_task-nback_run-01_bold.nii.gz',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-02\\func\\sub-01_ses-02_task-nback_run-02_bold.nii.gz',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01\\ses-02\\func\\sub-01_ses-02_task-rest_bold.nii.gz']

All of the entities are found in the names of BIDS files. For example, *'sub-01_ses-01_task-nback_run-01_bold.nii'* encodes the **subject** (`01`), **session** (`01`), **task** (`nback`), **run** (`01`) and **suffix** (`bold`).
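To make the naming scheme concrete, here is a minimal sketch that splits a filename into its entities with plain string operations. It is not how `pybids` parses files internally; the helper `parse_bids_filename` and the mapping `SHORT_TO_LONG` are hypothetical names introduced for illustration, and the sketch assumes the name contains no dots before the extension.

```python
# Hypothetical mapping from filename keys to the long entity names used in this tutorial.
SHORT_TO_LONG = {'sub': 'subject', 'ses': 'session'}

def parse_bids_filename(filename):
    """Split a BIDS-style filename into entities, suffix and extension."""
    # Everything after the first dot is the extension (so '.nii.gz' stays whole).
    stem, _, ext = filename.partition('.')
    parts = stem.split('_')
    entities = {}
    # All parts except the last are 'key-value' pairs; the last part is the suffix.
    for part in parts[:-1]:
        key, _, value = part.partition('-')
        entities[SHORT_TO_LONG.get(key, key)] = value
    entities['suffix'] = parts[-1]
    entities['extension'] = '.' + ext if ext else ''
    return entities

example = parse_bids_filename('sub-01_ses-01_task-nback_run-01_bold.nii')
print(example)
```

In this sketch all values are kept as strings; `pybids` may represent some entities, such as `run`, with other types.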

You can get the list of all available entities with `layout.get_entities()`.

Here are a few of the most common entities:

* `suffix`: The part of a BIDS filename just before the extension (e.g., 'bold', 'events', 'T1w', etc.).
* `subject`: The subject label
* `session`: The session label
* `run`: The run index
* `task`: The task name

In [12]:
# A list of all available entities
layout.get_entities()

{'subject': <bids.layout.models.Entity at 0x20dff673c70>,
 'session': <bids.layout.models.Entity at 0x20dff673f70>,
 'task': <bids.layout.models.Entity at 0x20dff673520>,
 'acquisition': <bids.layout.models.Entity at 0x20dff9999a0>,
 'ceagent': <bids.layout.models.Entity at 0x20dff9996d0>,
 'reconstruction': <bids.layout.models.Entity at 0x20dff999850>,
 'direction': <bids.layout.models.Entity at 0x20dff9997f0>,
 'run': <bids.layout.models.Entity at 0x20dff999d00>,
 'proc': <bids.layout.models.Entity at 0x20dff999be0>,
 'modality': <bids.layout.models.Entity at 0x20dff9995e0>,
 'echo': <bids.layout.models.Entity at 0x20dff999e80>,
 'flip': <bids.layout.models.Entity at 0x20dff999c70>,
 'inv': <bids.layout.models.Entity at 0x20dff999070>,
 'mt': <bids.layout.models.Entity at 0x20dff999640>,
 'part': <bids.layout.models.Entity at 0x20dff999a00>,
 'recording': <bids.layout.models.Entity at 0x20dff9994f0>,
 'space': <bids.layout.models.Entity at 0x20dff77b790>,
 'suffix': <bids.layout.mode

### Filtering by metadata

Sometimes we want to search for files based not just on their names, but also based on metadata defined in JSON files. We can pass any key that occurs in any JSON file in our project as an argument to `.get()`. We can combine these with any number of core BIDS entities (like `subject`, `run`, etc.).

For example, say we want to retrieve all files where (a) the value of `SamplingFrequency` (a metadata key) is `10`, and (b) the subject is `01` or `02`.

Note that we can pass a list for `subject` rather than just a string. This principle applies to all filters: you can always pass in a list instead of a single value, and this will be interpreted as a logical disjunction (i.e., a file must match any one of the provided values).
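The matching rule described above (every filter must hold, and a list filter holds if any of its values matches) can be sketched in plain Python. This only illustrates the semantics and is not the `pybids` implementation; the function `matches` and the `records` list are hypothetical, with entities and metadata merged into one dictionary per file.

```python
def matches(record, **filters):
    """Return True if the record satisfies every filter.

    A filter value may be a single value or a list; a list matches
    when the record's value equals any one of its items.
    """
    for key, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if record.get(key) not in allowed:
            return False
    return True

# Hypothetical records standing in for indexed files.
records = [
    {'subject': '01', 'suffix': 'physio', 'SamplingFrequency': 10.0},
    {'subject': '02', 'suffix': 'physio', 'SamplingFrequency': 10.0},
    {'subject': '03', 'suffix': 'physio', 'SamplingFrequency': 10.0},
    {'subject': '01', 'suffix': 'bold', 'RepetitionTime': 2.5},
]

selected = [r for r in records
            if matches(r, subject=['01', '02'], SamplingFrequency=10.0)]
print([(r['subject'], r['suffix']) for r in selected])
```

The record for subject `03` is rejected by the `subject` list, and the `bold` record is rejected because it has no `SamplingFrequency` key.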

In [18]:
# Retrieve all files where SamplingFrequency (a metadata key) = 10
# and the subject is 01 or 02
layout.get(subject=['01', '02'], SamplingFrequency=10.0)

[<BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-01\func\sub-01_ses-01_task-nback_run-01_physio.tsv.gz'>,
 <BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-01\func\sub-01_ses-01_task-nback_run-02_physio.tsv.gz'>,
 <BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-01\func\sub-01_ses-01_task-rest_physio.tsv.gz'>,
 <BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-02\func\sub-01_ses-02_task-nback_run-01_physio.tsv.gz'>,
 <BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-02\func\sub-01_ses-02_task-nback_run-02_physio.tsv.gz'>,
 <BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-01\ses-02\func\sub-01_ses-02_task-rest_physio.tsv.gz'>,
 <BIDSDataFile filename='D:\miniconda3\lib\site-packages\bids\tests\data\synthetic\sub-02\ses-01\func\sub-02_s

### Other `return_type` values

We can also ask `get()` to return unique values (or ids) of particular entities. For example, say we want to know which subjects have at least one `T1w` file. We can request that information by setting `return_type='id'`. When using this option, we also need to specify the entity (or metadata keyword) of interest with the `target` argument. This combination tells the `BIDSLayout` to return the unique values for the specified `target` entity. In the next cell, we ask for all of the unique subject IDs that have at least one file with a `T1w` suffix.

In [19]:
# Ask get() to return the ids of subjects that have T1w files
layout.get(return_type='id', target='subject', suffix='T1w')

['03', '04', '02', '05', '01']
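Conceptually, this query collects the distinct values of one entity across all files that pass the other filters. Below is a minimal sketch of that idea on bare filenames, not the `pybids` implementation. The helper `unique_ids` and the short file list are hypothetical, and the result is sorted here only to make it deterministic.

```python
def unique_ids(filenames, target, suffix):
    """Collect the unique values of one filename key among files with a given suffix."""
    prefix = target + '-'
    found = set()
    for name in filenames:
        stem = name.split('.', 1)[0]   # drop the extension
        parts = stem.split('_')
        if parts[-1] != suffix:        # the last part of the stem is the suffix
            continue
        for part in parts[:-1]:
            if part.startswith(prefix):
                found.add(part[len(prefix):])
    return sorted(found)

# Hypothetical file list for illustration.
filenames = [
    'sub-01_ses-01_T1w.nii',
    'sub-01_ses-01_T1w.nii.gz',
    'sub-01_ses-01_task-nback_run-01_bold.nii',
    'sub-02_ses-01_T1w.nii',
    'sub-03_ses-01_task-rest_bold.nii.gz',
]

subjects_with_t1w = unique_ids(filenames, target='sub', suffix='T1w')
print(subjects_with_t1w)
```

Subject `01` appears once even though it has two `T1w` files, and subject `03` is absent because it has no file with that suffix in this toy list.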

If our `target` is a BIDS entity that corresponds to a particular directory in the BIDS spec (e.g., `subject` or `session`) we can also use `return_type='dir'` to get all matching subdirectories. 

In [20]:
layout.get(return_type='dir', target='subject')

['D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-01',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-02',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-03',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-04',
 'D:\\miniconda3\\lib\\site-packages\\bids\\tests\\data\\synthetic\\sub-05']

In [None]:
from bids.reports import BIDSReport
#from bids.tests import get_test_data_path
#layout = BIDSLayout(os.path.join(get_test_data_path(), 'synthetic'))

report = BIDSReport(layout)

counter = report.generate()

In [None]:
main_report = counter.most_common()[0][0]
print(main_report)

In [None]:
# Convert the layout to a pandas dataframe
df = layout.to_df()
df.head(10)

In [None]:
layout.to_df(metadata=True).head(10)