# BIDS: Brain Imaging Data Structure and PyBIDS

This notebook is a revision of the excellent [Dartbrains Introductory Notebook](https://dartbrains.org/content/Introduction_to_Neuroimaging_Data.html).

**BIDS**: There has been growing interest in sharing datasets across labs and on public repositories such as [OpenNeuro](https://openneuro.org/). For sharing to succeed, we need standards for naming and organizing data. Historically, each lab has used its own idiosyncratic conventions, which can make a dataset difficult for outsiders to analyze. In the past few years, the neuroimaging community has made heroic efforts to create standardized file organization and naming practices. The resulting specification is called **BIDS**, for [Brain Imaging Data Structure](http://bids.neuroimaging.io/).

**PyBIDS**: Because BIDS is a consistent format, a Python package can query any BIDS dataset in the same way. [PyBIDS](https://github.com/bids-standard/pybids) is a set of tools for doing just that. This notebook allows you to explore its capabilities.

**TIP**: Some of these cells are slow to run! To determine whether a cell has been run, look at the square brackets on the left:
- `[ ]` indicates the cell has not been run
- `[*]` indicates the cell is running.  Be patient!
- A number in the brackets, e.g., `[35]`, indicates that the cell has been run, and you can move on to the next one. 

# Get the Data

Download the dataset from [OSF](https://osf.io/5q3m8).

In [None]:
import os
import wget
import zipfile

site_url = 'https://osf.io/6gm3j/download'

# Download the data with wget and unzip it to create the data directory. The BIDS dataset is in data/inputs.
if not os.path.isdir('data'):
    wget.download(site_url)
    with zipfile.ZipFile('Jupyter_pybids_data.zip', 'r') as zip_ref:
        zip_ref.extractall()

In [None]:
# Remove the zip file to save space
if os.path.exists('Jupyter_pybids_data.zip'):
    os.remove('Jupyter_pybids_data.zip')

# PyBIDS: BIDSLayout

[PyBIDS](https://github.com/bids-standard/pybids) is a package for querying and navigating a BIDS neuroimaging dataset. At the core of PyBIDS is the `BIDSLayout` object: a lightweight Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. The `BIDSLayout` initializer accepts many arguments that control how files are indexed and accessed, but you will most commonly initialize a `BIDSLayout` by passing in the BIDS dataset root location as a single argument. This creates an SQLite database containing information about the BIDS dataset. BIDS apps built on PyBIDS (e.g., fMRIPrep) create a `BIDSLayout` and use it to identify subjects, sessions, files, and relevant metadata.

In our case, the image files are empty, except for the T1w anatomical image and a standard-space 4D functional image in derivatives for sub-219. These empty files still work with `BIDSLayout` and keep the dataset small.
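
A skeleton of this kind can be sketched with the standard library alone. The example below is an illustration, not the script used to build this notebook's dataset: the helper `make_skeleton`, the dataset name, the `RepetitionTime` value, and the session-free layout are all assumptions made for brevity.

```python
import json
import tempfile
from pathlib import Path


def make_skeleton(root, subjects, task="rest"):
    """Create a tiny BIDS-style tree whose image files are empty (hypothetical helper)."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # A dataset_description.json sits at the top level of a BIDS dataset.
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "Skeleton demo", "BIDSVersion": "1.8.0"})
    )
    for sub in subjects:
        anat = root / f"sub-{sub}" / "anat"
        func = root / f"sub-{sub}" / "func"
        anat.mkdir(parents=True, exist_ok=True)
        func.mkdir(parents=True, exist_ok=True)
        # Empty placeholders: only the names matter when a dataset is indexed.
        (anat / f"sub-{sub}_T1w.nii.gz").touch()
        (func / f"sub-{sub}_task-{task}_bold.nii.gz").touch()
        # The JSON sidecar carries (made-up) metadata for the BOLD image.
        (func / f"sub-{sub}_task-{task}_bold.json").write_text(
            json.dumps({"TaskName": task, "RepetitionTime": 2.0})
        )
    return root


skeleton = make_skeleton(tempfile.mkdtemp(), ["188", "219"])
files = sorted(
    p.relative_to(skeleton).as_posix() for p in skeleton.rglob("*") if p.is_file()
)
for rel_path in files:
    print(rel_path)
```

Because the placeholders are zero bytes, a tree like this costs almost nothing to store or share, which is the trade-off this notebook's dataset makes.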

See [Querying BIDS datasets](https://bids-standard.github.io/pybids/layout/index.html) for more examples.

In [None]:
from bids import BIDSLayout, BIDSValidator

data_dir = 'data/inputs'
layout = BIDSLayout(data_dir, derivatives=False)

Initializing a BIDSLayout finds and indexes all files and metadata found under the specified root folder. This can take a few seconds (or, for very large datasets, a minute or two). Once initialization is complete, you can start querying the BIDSLayout in various ways. 

- The main query method is `.get()`. If you call `.get()` with no additional arguments, you get information for **all** the files in the BIDS dataset. 
- The information about each file is contained in an object of type `BIDSFile`. 
- There are several classes of BIDSFile objects, each representing a type of file recognized by PyBIDS: `BIDSFile`, `BIDSJSONFile`, `BIDSImageFile`, and `BIDSDataFile`.

## BIDSLayout: .get

When you call `.get()` on a BIDSLayout, the default returned values are BIDSFile objects. A BIDSFile is a container for information about an individual file in a BIDS dataset. As we saw above, there are several sub-classes of BIDSFile, each representing one of the kinds of files recognized by BIDS.

Each sub-class has attributes and methods appropriate for the corresponding file type.
Below are *some* of the **attributes** and **methods** available. Note that some of the methods are only available for certain sub-classes of BIDSFile; e.g., you can't call `get_image()` on a BIDSJSONFile because it doesn't correspond to an image file:

- `.path`: This attribute contains the full path of the associated file
- `.filename`: This attribute contains the associated file's filename (without directory)
- `.dirname`: This attribute contains the directory containing the file
- `.get_entities()`: This method returns information about entities associated with this BIDSFile (optionally including metadata)
- `.get_image()`: This method returns the file contents as a NiBabel image (only works for image files)
- `.get_df()`: This method returns file contents as a pandas DataFrame (only works for TSV files)
- `.get_metadata()`: This method returns a dictionary of all JSON metadata associated with the file (most useful for image files)
- `.get_associations()`: This method returns a list of all files associated with a specified file (usually the JSON file associated with an image file and vice-versa)
- `.get_subjects()`: Unlike the entries above, this is a method of the `BIDSLayout` rather than of a BIDSFile; it returns a list of the subject IDs

In [None]:
layout.get()

### Try This!

That's a lot of files! To clear the giant list that was just produced, right-click the output cell (the one that was created by running the code) and choose **Clear Outputs**.

### Get Just Filenames

To see a list of filenames (as full paths) instead of the clunky BIDSFile objects, just add a `return_type='file'` argument, like this:

In [None]:
layout.get(return_type='file')

### Get Just Subject IDs

As you saw above, using just the generic `.get()` call gives us information about **all** of the files. We will usually want to query the BIDSLayout to extract more specific information. For example, to return a list of the subject IDs, we can say:

In [None]:
layout.get(target='subject', return_type='id')

### Try This!
In addition to `subject`, other working targets include `run`, `session`, and `task`.  Add cells and try them out!

### Concise Get Methods 

These `get` operations are so common that there are more concise calls for each: e.g., `get_subjects` returns the same Python list of the subject IDs as the call above.  

In [None]:
layout.get_subjects()

### Try This!
Analogous to `get_subjects`, there are concise get commands for other properties. Create additional cells and try to retrieve data for some of the most common: `run`, `session`, and `task`. 

### Get BIDSLayout Entities
Many of the BIDSLayout methods mention `entities`. The expression `sorted(layout.get_entities().keys())` gets the entities with `layout.get_entities()`, retains just the `keys` (i.e., the entity names), and sorts the output alphabetically.
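
To make the idea of an entity concrete: in a BIDS filename, entities appear as `key-value` pairs separated by underscores, followed by a suffix and an extension. The sketch below splits a hypothetical filename into those parts using only the standard library. It is a simplified illustration, not how PyBIDS parses names; the helper `parse_entities` and the two-entry name mapping are assumptions.

```python
# Map the short keys used in filenames to the longer entity names used in queries.
# Only two keys are mapped here; other keys are kept as written.
SHORT_TO_LONG = {"sub": "subject", "ses": "session"}


def parse_entities(filename):
    """Split a BIDS-style filename into entities, suffix, and extension (toy parser)."""
    # Everything after the first dot is the extension, e.g. ".nii.gz".
    stem, dot, extension = filename.partition(".")
    # The last underscore-separated chunk is the suffix; the rest are key-value pairs.
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, value = pair.split("-", 1)
        entities[SHORT_TO_LONG.get(key, key)] = value
    entities["suffix"] = suffix
    entities["extension"] = dot + extension
    return entities


parsed = parse_entities("sub-188_ses-itbs_task-rest_bold.nii.gz")
for name in sorted(parsed):
    print(f"{name}: {parsed[name]}")
```

The names printed here (`subject`, `session`, `task`, `suffix`, `extension`) are the same kind of keywords you pass to `layout.get()` as filters.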

### Get a Subset of Files

Provide arguments to the general `get` method to select a subset of our files. For example, select only the BOLD-related file objects for subject number 188:

In [None]:
layout.get(suffix='bold', subject='188')

#### Try This!
Try these alternative layout.get queries in new code cells

```Python
layout.get(suffix='bold')
layout.get(suffix='bold', return_type='file')
layout.get(subject='188', extension='.json')
layout.get(ManufacturersModelName='Skyra')
layout.get(Modality='MR')
layout.get(EchoTime='0.025')
layout.get(FlipAngle='90')
```

### Get Specific Information for a Subset of the Files

In addition to selecting only certain files, we can also extract specific information from those file objects. 
For example, return only the filenames (not the BIDSFile objects) for the BOLD NIfTI images for subject number 188:

In [None]:
layout.get(suffix='bold', subject='188', return_type='file', extension='nii.gz')

### Get a Subset by Task
Query the BOLD files associated with a task (here, `rest`).

In [None]:
layout.get(task='rest', suffix='bold')

#### Try This!
Add a cell to get information about the `nad1` task instead of `rest`.

### Get Individual File Objects

Use `layout.get()` to retrieve an individual file so you can drill down to learn more about it.
The first file in the BIDSLayout list is indexed with [0].  
(Change the index to see a different file).

In [None]:
f = layout.get()[0]
f

In [None]:
# In this dataset: 
# index [0] retrieves the JSON file for the task rest
f_rest_json = layout.get(task='rest')[0]

# index [1] retrieves the NIfTI file for the task rest
f_rest_nifti = layout.get(task='rest')[1]

# This gets just the phasediff JSON file for sub-219
f_phasediff_json = layout.get(subject='219', suffix='phasediff')[0]

# This gets just the phasediff NIfTI file for sub-219
f_phasediff_nifti = layout.get(subject='219', suffix='phasediff')[1]

### Get Information for One File
Get the entities associated with one image file

In [None]:
# Get entities just for the phasediff image file
f_phasediff_nifti.get_entities()

In [None]:
# Get_associations identifies related files, like the paired NIfTI and JSON files
f_phasediff_nifti.get_associations()
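
The simplest kind of association, an image and its JSON sidecar, can be sketched by grouping filenames that share everything before the extension. This is a naive illustration under that single assumption; PyBIDS tracks more kinds of relationships than this, and the helper `pair_by_stem` and the filenames are hypothetical.

```python
from collections import defaultdict


def pair_by_stem(filenames):
    """Group filenames that differ only in their extension (toy association finder)."""
    groups = defaultdict(list)
    for name in filenames:
        # The stem is everything before the first dot.
        stem = name.split(".", 1)[0]
        groups[stem].append(name)
    # Keep only stems that have more than one file, i.e. an actual pairing.
    return {stem: sorted(names) for stem, names in groups.items() if len(names) > 1}


names = [
    "sub-219_ses-itbs_phasediff.nii.gz",
    "sub-219_ses-itbs_phasediff.json",
    "sub-219_ses-itbs_T1w.nii.gz",
]
pairs = pair_by_stem(names)
for stem, members in pairs.items():
    print(stem, "->", members)
```

In this toy list the T1w image has no sidecar, so it does not appear in the result.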

#### Try This!
Add cells and do the following:
- Get entities for a different file.
- Get associations for a different file.

### Make Information Retrieval more Efficient

#### Store Data Subsets of Interest

In [None]:
# Several files are associated with the `rest` task.  Put that list into a variable
rest_task = layout.get(task='rest')
rest_task

#### Loop over Data Subsets

In [None]:
# Now loop over the items in the list and get associations 

for fyl in rest_task:
    print(f"Associations for {fyl.filename} in {os.path.basename(fyl.dirname)}:")
    assoc = fyl.get_associations()
    for related in assoc:
        print(f"\t{related.filename}")
    print()

In [None]:
# Now loop over the items in the list and get entities 

for fyl in rest_task:
    print(f"Entities for {fyl.filename} in {os.path.basename(fyl.dirname)}:")
    ent = fyl.get_entities()
    for e in ent:
        print(f"\t{e}")
    print()

### Get Information about the IntendedFor Field

In [None]:
# Get every file in the BIDSLayout and put the list in a variable
fyls_all = layout.get()
# Just to check, see how many files there are
num_fyls = len(fyls_all)
num_fyls

In [None]:
# Print the IntendedFor metadata field for every file that has one
for fyl in fyls_all:
    intend = fyl.get_metadata().get('IntendedFor')
    if intend is not None:
        print(f"IntendedFor in {fyl.filename}:")
        print(f"{intend}\n")
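
In BIDS, `IntendedFor` is a field in a fieldmap's JSON sidecar that lists the images the fieldmap is meant to correct. Because it is ordinary JSON, it can also be read without PyBIDS. The sketch below scans a hypothetical temporary dataset with the standard library; the file names and the echo-time and repetition-time values are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Build a hypothetical dataset: one fieldmap sidecar and one BOLD sidecar.
root = Path(tempfile.mkdtemp())
fmap_dir = root / "sub-219" / "ses-itbs" / "fmap"
func_dir = root / "sub-219" / "ses-itbs" / "func"
fmap_dir.mkdir(parents=True)
func_dir.mkdir(parents=True)

(fmap_dir / "sub-219_ses-itbs_phasediff.json").write_text(
    json.dumps(
        {
            "EchoTime1": 0.00492,
            "EchoTime2": 0.00738,
            # Made-up target path, written relative to the subject directory.
            "IntendedFor": ["ses-itbs/func/sub-219_ses-itbs_task-rest_bold.nii.gz"],
        }
    )
)
(func_dir / "sub-219_ses-itbs_task-rest_bold.json").write_text(
    json.dumps({"RepetitionTime": 2.0})
)

# Collect IntendedFor from every sidecar that defines it.
intended = {}
for sidecar in sorted(root.rglob("*.json")):
    metadata = json.loads(sidecar.read_text())
    if "IntendedFor" in metadata:
        intended[sidecar.name] = metadata["IntendedFor"]

for name, targets in intended.items():
    print(f"IntendedFor in {name}:")
    print(targets)
```

Only the fieldmap sidecar is reported; the BOLD sidecar has no `IntendedFor` key and is skipped, just as files without the field are skipped in the PyBIDS loop.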

## Display BIDSLayout in a Dataframe  

Display a handsome summary of all the files in your BIDSLayout using the `to_df()` method.

In [None]:
layout.to_df()

### Try This!
Add a cell and filter the table displayed by to_df:    
- You can limit this by supplying arguments like subject, suffix, extension etc. as we did before. 
- For example, `layout.to_df(suffix='T1w')`

## BIDS Reports

The `bids.reports.BIDSReport` class requires a `bids.BIDSLayout` object as an argument. PyBIDS reports are then generated with the `generate` method.
The `generate` method allows for keyword restrictions, just like `bids.BIDSLayout`'s `get` method. For example, to generate a report only for nback task data in session 01:
`counter = report.generate(session='01', task='nback')`.

The report knows which images are empty, so for this dataset we can run the report only on the sub-219 T1w image, which isn't empty!

See [PyBIDS Reports](https://bids-standard.github.io/pybids/reports/index.html) to learn more.

In [None]:
from os.path import join
from bids import BIDSLayout
from bids.reports import BIDSReport

data_dir = 'data/inputs'
layout = BIDSLayout(data_dir, derivatives=False)

report = BIDSReport(layout)

counter = report.generate(subject='219', session='itbs', suffix='T1w')
main_report = counter.most_common()[0][0]
print(main_report)

In [None]:
# This is a report for a synthetic test dataset.
# In a dataset with real images, it is not necessary to restrict report.generate

from bids.tests import get_test_data_path
layout2 = BIDSLayout(join(get_test_data_path(), 'synthetic'))
report = BIDSReport(layout2)
counter = report.generate()
main_report = counter.most_common()[0][0]
print(main_report)

## PyBIDS BIDSLayout Summary

If your data conforms to the BIDS data structure, you can use BIDSLayout to retrieve information about the contents of the dataset.
But, of course, you'd like to DO something with the data, not just look at its structure.  That's where our next tool NiBabel is useful, and can be explored in an additional notebook.