# BIDS: Brain Imaging Data Structure

- This notebook is a revision of the excellent [Dartbrains Introductory Notebook](https://dartbrains.org/content/Introduction_to_Neuroimaging_Data.html).
- Recently, there has been growing interest in sharing datasets across labs and on public repositories such as [openneuro](https://openneuro.org/). For this to succeed, the community needs standards for how data are named and organized. Historically, each lab has used its own idiosyncratic conventions, which can make datasets difficult for outsiders to analyze. In the past few years, the neuroimaging community has made heroic efforts to create standardized file organization and naming practices. This specification is called **BIDS**, for [Brain Imaging Data Structure](http://bids.neuroimaging.io/).
- Because BIDS is a consistent format, it is possible to have a python package to make it easy to query a dataset. We recommend using [pybids](https://github.com/bids-standard/pybids).
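
To make the naming convention concrete, here is a minimal sketch of how a BIDS-style filename breaks down into key-value entities, a suffix, and an extension. The filename and the helper `parse_bids_name` are illustrative assumptions, not part of pybids, which handles many more cases.

```python
def parse_bids_name(filename):
    """Split a BIDS-style filename into entities, suffix, and extension."""
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    parts = stem.split("_")
    # Key-value chunks look like "sub-219"; a final chunk without a dash is the suffix.
    entities = dict(p.split("-", 1) for p in parts if "-" in p)
    suffix = parts[-1] if "-" not in parts[-1] else None
    return {"entities": entities, "suffix": suffix, "extension": dot + ext}

# Hypothetical filename following the sub-/ses-/task-/run- pattern
info = parse_bids_name("sub-219_ses-itbs_task-rest_run-1_bold.nii.gz")
print(info)
```

Because every file follows the same pattern, a tool can answer questions such as "which subjects have a resting-state run?" from the filenames alone.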


# Get the Data and Unzip it
Download the data from OSF with wget, unzip it, and then delete the zip file.

In [None]:
import wget

site_url = 'https://osf.io/5q3m8/download'
wget.download(site_url)

In [None]:
import zipfile

with zipfile.ZipFile("Jupyter_neuro_data.zip","r") as zip_ref: 
    zip_ref.extractall(path=None)

In [None]:
import os

# Remove the archive once its contents have been extracted
if os.path.exists("Jupyter_neuro_data.zip"):
    os.remove("Jupyter_neuro_data.zip")


## The `BIDSLayout`
[Pybids](https://github.com/bids-standard/pybids) is a package to help query and navigate a neuroimaging dataset that is in the BIDS format. At the core of pybids is the `BIDSLayout` object. A `BIDSLayout` is a lightweight Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. While the `BIDSLayout` initializer has a large number of arguments you can use to control the way files are indexed and accessed, you will most commonly initialize a `BIDSLayout` by passing in the BIDS dataset root location as a single argument.

If we set `derivatives=True`, the layout will also index and validate the derivatives subfolder, which might contain preprocessed data, analyses, or other user-generated files. (In our case, except for sub-219, the datasets are just stubs. These stubs will work with `BIDSLayout` even though the images are empty.)
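
If you are not sure whether a dataset ships with derivatives, you can check for the subfolder before choosing the flag. This is a small sketch using only the standard library; the helper name `has_derivatives` is an assumption for illustration, and the temporary folder stands in for a real dataset root.

```python
import os
import tempfile

def has_derivatives(bids_root):
    """Return True if the dataset root contains a 'derivatives' subfolder."""
    return os.path.isdir(os.path.join(bids_root, "derivatives"))

# Demonstrate on a throwaway folder rather than the tutorial dataset
with tempfile.TemporaryDirectory() as root:
    before = has_derivatives(root)
    os.makedirs(os.path.join(root, "derivatives"))
    after = has_derivatives(root)

print(before, after)
```

In the notebook, the result could be passed along as `BIDSLayout(data_dir, derivatives=has_derivatives(data_dir))`.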

See [Querying BIDS datasets](https://bids-standard.github.io/pybids/layout/index.html) for more examples.

In [None]:
from bids import BIDSLayout, BIDSValidator
import os

data_dir = 'data/inputs'
# layout = BIDSLayout(data_dir, derivatives=True)
layout = BIDSLayout(data_dir, derivatives=False)

When we initialize a BIDSLayout, all of the files and metadata found under the specified root folder are indexed. This can take a few seconds (or, for very large datasets, a minute or two). Once initialization is complete, we can start querying the BIDSLayout in various ways. 

The main query method is `.get()`. If we call `.get()` with no additional arguments, we get back a list of all the BIDS files in our dataset.

In [None]:
layout.get()

As you can see, just a generic `.get()` call gives us *all* of the files. We will definitely want to be a bit more specific. We can specify the type of data we would like to query. For example, suppose we want to return the first 10 subject ids.

### Clear Cell Outputs

That's a lot of files! To clear the giant list that was just produced, right-click the output cell and choose **Clear Outputs**.

Return just the first 10 results instead.

In [None]:
pwd

In [None]:
layout.get()[:10]

Return just the subject numbers.  This can be accomplished in a couple of ways.

In [None]:
layout.get(target='subject', return_type='id')[:10]

In [None]:
layout.get_subjects()

Or perhaps we would like to get the file names of the raw bold functional nifti images. We can restrict a query to the `raw` or `derivatives` files with the `scope` keyword. Here, `scope='raw'` limits the query to raw bold files, and we keep the first 10 matches.

In [None]:
layout.get(target='subject', scope='raw', suffix='bold', return_type='file')[:10]
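
Conceptually, a filtered `.get()` call keeps only the files whose entities match every keyword you pass. The sketch below imitates that idea on a plain list of hypothetical filenames. `toy_get` is an illustrative stand-in and is not how pybids is implemented.

```python
def toy_get(filenames, suffix=None, **entities):
    """Keep filenames whose key-value entities and suffix match the filters."""
    matches = []
    for name in filenames:
        stem = name.split(".", 1)[0]
        parts = stem.split("_")
        found = dict(p.split("-", 1) for p in parts if "-" in p)
        if suffix is not None and parts[-1] != suffix:
            continue
        if all(found.get(key) == value for key, value in entities.items()):
            matches.append(name)
    return matches

# Hypothetical file list, not the tutorial dataset
files = [
    "sub-219_ses-itbs_task-rest_run-1_bold.nii.gz",
    "sub-219_ses-itbs_T1w.nii.gz",
    "sub-220_ses-itbs_task-rest_run-1_bold.nii.gz",
]
bold_219 = toy_get(files, suffix="bold", sub="219")
subjects = sorted({f.split("_")[0].split("-", 1)[1] for f in files})
print(bold_219)
print(subjects)
```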

When you call .get() on a BIDSLayout, the default returned values are objects of class BIDSFile. A BIDSFile is a lightweight container for individual files in a BIDS dataset. 

Here are some of the attributes and methods available to us in a BIDSFile (note that some of these are only available for certain subclasses of BIDSFile; e.g., you can't call get_image() on a BIDSFile that doesn't correspond to an image file!):

- .path: The full path of the associated file
- .filename: The associated file's filename (without directory)
- .dirname: The directory containing the file
- .get_entities(): Returns information about entities associated with this BIDSFile (optionally including metadata)
- .get_image(): Returns the file contents as a nibabel image (only works for image files)
- .get_df(): Get file contents as a pandas DataFrame (only works for TSV files)
- .get_metadata(): Returns a dictionary of all metadata found in associated JSON files
- .get_associations(): Returns a list of all files associated with this one in some way
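- .get_subjects(): Return a list of the subject ID numbers (note: this is a method of the `BIDSLayout` itself, as in `layout.get_subjects()`, rather than of an individual BIDSFile)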

Explore the first file in the list in a little more detail. It is indexed with [0].  

Change the index to see a different file.

In [None]:
f = layout.get()[0]
f

If we wanted to get the path of the file, we can use `.path`.

In [None]:
f.path

Suppose we were interested in getting a list of tasks included in the dataset.

In [None]:
layout.get_task()

We can query all of the files associated with this task.

In [None]:
layout.get(task='rest', suffix='bold', scope='raw')[:10]

Notice that there are nifti and json files. We can get the filename for the first participant's functional run.

In [None]:
f = layout.get(task='rest')[0].filename
f

If you want a summary of all the files in your BIDSLayout, but don't want to have to iterate over BIDSFile objects and extract their entities, you can get a nice bird's-eye view of your dataset using the `to_df()` method.

In [None]:
layout.to_df()

## Loading Data with Nibabel
Neuroimaging data is often stored in the format of nifti files `.nii` which can also be compressed using gzip `.nii.gz`.  These files store both 3D and 4D data and also contain structured metadata in the image **header**.
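
To get a feel for what "structured metadata in the header" means, here is a rough sketch that reads the image dimensions straight from the fixed-size NIfTI-1 header using only the standard library. It assumes a little-endian NIfTI-1 file and ignores every other header field. The synthetic file it writes contains only the header fields it needs and is not a complete NIfTI image. For real work, use nibabel as shown in the rest of this section.

```python
import gzip
import os
import struct
import tempfile

def read_nifti1_shape(path):
    """Read the image dimensions from a NIfTI-1 header (little-endian assumed)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        header = fh.read(348)  # a NIfTI-1 header is 348 bytes long
    (sizeof_hdr,) = struct.unpack_from("<i", header, 0)
    if sizeof_hdr != 348:
        raise ValueError("not a little-endian NIfTI-1 header")
    dim = struct.unpack_from("<8h", header, 40)  # dim[0] is the number of dimensions
    return dim[1 : dim[0] + 1]

# Build a synthetic header for a 4D image (64 x 64 x 36 voxels, 200 volumes)
header = bytearray(348)
struct.pack_into("<i", header, 0, 348)
struct.pack_into("<8h", header, 40, 4, 64, 64, 36, 200, 1, 1, 1)
path = os.path.join(tempfile.mkdtemp(), "toy_bold.nii.gz")
with gzip.open(path, "wb") as fh:
    fh.write(bytes(header))

shape = read_nifti1_shape(path)
print(shape)
```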

There is a very nice Python tool for accessing nifti data stored on your file system called [nibabel](http://nipy.org/nibabel/). If you don't already have nibabel installed on your computer, it is easy to install via `pip`. In a Jupyter cell, prefix the command with `!` to run it in the system shell outside of the notebook: `!pip install nibabel`. You only need to run this once (unless you would like to update the version).

nibabel objects can be initialized by simply pointing to a nifti file even if it is compressed through gzip.  First, we will import the nibabel module as `nib` (short and sweet so that we don't have to type so much when using the tool).  I'm also including a path to where the data file is located so that I don't have to constantly type this.  It is easy to change this on your own computer.

We will be loading an anatomical image from subject 219.

Use pybids to grab subject's T1 image.

In [None]:
import nibabel as nib

# Query the raw T1-weighted image for subject 219, then load the first match
t1w_files = layout.get(subject='219', session='itbs', scope='raw', suffix='T1w',
                       extension='nii.gz', return_type='file')
T1w_data = nib.load(t1w_files[0])

If we want to get more help on how to work with the nibabel data object we can either consult the [documentation](https://nipy.org/nibabel/tutorials.html#tutorials) or add a `?`.

In [None]:
T1w_data?

The imaging data is stored in either a 3D or 4D numpy array. Just like numpy, it is easy to get the dimensions of the data using `shape`. 

In [None]:
T1w_data.shape

Looks like there are 3 dimensions (x,y,z); each value is the number of voxels along that dimension. If we know the voxel size, we can convert these counts into millimeters.
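
As a quick worked example of that conversion, the sketch below multiplies the voxel counts by the voxel size along each axis. The numbers are hypothetical. In the notebook you would take them from `T1w_data.shape` and `T1w_data.header.get_zooms()`.

```python
# Hypothetical values, not read from the tutorial image
shape = (176, 512, 512)          # voxels along x, y, z
voxel_size_mm = (1.0, 0.5, 0.5)  # millimeters per voxel along x, y, z

# Physical extent of the image along each axis
extent_mm = tuple(n * size for n, size in zip(shape, voxel_size_mm))
print(extent_mm)
```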

We can also directly access the data and plot a single slice using standard matplotlib functions.

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt

plt.imshow(T1w_data.get_fdata()[:,:,100])

Try slicing different dimensions (x,y,z) yourself to get a feel for how the data is represented in this anatomical image.

We can also access data from the image header. Let's assign the header of an image to a variable and print it to view its contents.

In [None]:
header = T1w_data.header
print(header)      

Some of the important information in the header is information about the orientation of the image in space. This can be represented as the affine matrix, which can be used to transform images between different spaces.

In [None]:
T1w_data.affine
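
To illustrate what the affine does, here is a small sketch that maps a voxel index to millimeter coordinates and back. The matrix below is a made-up example (2 mm isotropic voxels with a shifted origin), not the affine of the tutorial image. nibabel also provides `nibabel.affines.apply_affine` for the same purpose.

```python
import numpy as np

# Hypothetical affine: 2 mm isotropic voxels with a translated origin
affine = np.array([
    [2.0, 0.0, 0.0,  -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0,  -72.0],
    [0.0, 0.0, 0.0,    1.0],
])

voxel = np.array([45, 63, 36, 1])     # (i, j, k) plus a 1 for the translation
world = affine @ voxel                # millimeter coordinates (x, y, z, 1)
back = np.linalg.inv(affine) @ world  # inverse maps millimeters back to voxels

print(world[:3])
print(back[:3])
```

The rows of the upper-left 3x3 block set the voxel size and orientation, while the last column sets where voxel (0, 0, 0) sits in space.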

We will dive deeper into affine transformations in the preprocessing tutorial.

## Plotting Data with Nilearn
There are many useful tools from the [nilearn](https://nilearn.github.io/index.html) library to help manipulate and visualize neuroimaging data. See their [documentation](https://nilearn.github.io/plotting/index.html#different-plotting-functions) for an example.

In this section, we will explore a few of their different plotting functions, which can work directly with nibabel instances.

Some of these functions are SLOW. If the cell is marked `[*]`, wait until it finishes running and is assigned a number.

In [None]:
%matplotlib inline

from nilearn.plotting import view_img, plot_glass_brain, plot_anat, plot_epi

In [None]:
plot_anat(T1w_data)

Nilearn plotting functions are very flexible and allow us to easily customize our plots.

In [None]:
plot_anat(T1w_data, draw_cross=False, display_mode='z')

## Get More Information about plot_anat
Get more information about how to use the function with `?` and try to add different commands to change the plot.

nilearn also has a neat interactive viewer called `view_img` for examining images directly in the notebook. 

In [None]:
plot_anat?

In [None]:
view_img(T1w_data)

The `view_img` function is particularly useful for overlaying statistical maps over an anatomical image so that we can interactively examine where the results are located.

As an example, let's load a mask of the amygdala and try to find where it is located. We will download it from [Neurovault](https://neurovault.org/images/18632/) using a function from `nltools`.

In [None]:
from nltools.data import Brain_Data
amygdala_mask = Brain_Data('https://neurovault.org/media/images/1290/FSL_BAmyg_thr0.nii.gz').to_nifti()

view_img(amygdala_mask, T1w_data)

We can also plot a glass brain which allows us to see through the brain from different slice orientations. In this example, we will plot the binary amygdala mask.

In [None]:
plot_glass_brain(amygdala_mask)

## Manipulating Data with Nltools
Ok, we've now learned how to use nibabel to load imaging data and nilearn to plot it.

Next we are going to learn how to use the `nltools` package that tries to make loading, plotting, and manipulating data easier. It uses many functions from nibabel, nilearn, and other python libraries. The bulk of the nltools toolbox is built around the `Brain_Data()` class. The concept behind the class is to have a similar feel to a pandas dataframe, which means that it should feel intuitive to manipulate the data.

The `Brain_Data()` class has several attributes that may be helpful to know about. First, it stores imaging data in `.data` as a vectorized observations by features matrix. Each image is an observation and each voxel is a feature. Space is flattened using `NiftiMasker` from nilearn. This object is also stored as an attribute in `.nifti_masker` to allow transformations from 2D to 3D/4D matrices. In addition, a brain_mask is stored in `.mask`. Finally, there are attributes to store either class labels for prediction/classification analyses in `.Y` and design matrices in `.X`. These are both expected to be pandas `DataFrames`.
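
The idea of flattening space with a mask can be sketched in plain numpy. This is only an illustration on random toy data with a made-up box-shaped mask. nltools itself relies on nilearn's masker rather than this code.

```python
import numpy as np

rng = np.random.default_rng(0)
volumes = rng.normal(size=(4, 5, 6, 10))  # toy 4D data: x, y, z, time
mask = np.zeros((4, 5, 6), dtype=bool)
mask[1:3, 1:4, 2:5] = True                # toy "brain" of 2 * 3 * 3 = 18 voxels

# Flatten: keep only in-mask voxels, one row per image
data_2d = volumes[mask].T                 # shape (10 images, 18 voxels)

# Unflatten: put each row back into its 3D location, zeros elsewhere
restored = np.zeros_like(volumes)
restored[mask] = data_2d.T

print(data_2d.shape)
```

Keeping the mask around is what makes the round trip possible, which is why `Brain_Data` stores the masker alongside the data.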

We will give a quick overview of basic Brain_Data operations, but we encourage you to see our [documentation](https://nltools.org/) for more details.

### Brain_Data basics
To get a feel for `Brain_Data`, let's load an example anatomical overlay image that comes packaged with the toolbox.

In [None]:
from nltools.data import Brain_Data
from nltools.utils import get_anatomical

anat = Brain_Data(get_anatomical())
anat

To view the attributes of `Brain_Data` use the `vars()` function.

In [None]:
print(vars(anat))

`Brain_Data` has many methods to help manipulate, plot, and analyze imaging data. We can use the `dir()` function to get a quick list of all of the available methods that can be used on this class.

To learn more about how to use these tools, either append `?` to the method name or look it up in the [API documentation](https://nltools.org/api.html).


In [None]:
print(dir(anat))

Ok, now let's load a single subject's functional data from the run 1 resting state dataset. We will load one that has already been preprocessed with fmriprep and is stored in the derivatives folder.

Loading data can be **slow** especially if the data need to be resampled to the template, which is set at $2mm^3$ by default. However, once it's loaded into the workspace it should be relatively fast to work with it.


In [None]:
sub = 'sub-219'
ses = 'ses-itbs'

fmr_data = Brain_Data(os.path.join(data_dir, 'derivatives', 'fmriprep', sub, ses, 'func', f'{sub}_{ses}_task-rest_run-1_space-MNI152NLin6Asym_desc-smoothAROMAnonaggr_bold.nii.gz'))

Here are a few quick basic data operations.

Find number of images in Brain_Data() instance

In [None]:
print(len(fmr_data))

Find the dimensions of the data (images x voxels)

In [None]:
print(fmr_data.shape())

We can use any type of indexing to slice the data such as integers, lists of integers, slices, or boolean vectors.

In [None]:
import numpy as np

print(fmr_data[5].shape())

print(fmr_data[[1,6,2]].shape())

print(fmr_data[0:10].shape())

index = np.zeros(len(fmr_data), dtype=bool)
index[[1,5,9, 16, 20, 22]] = True

print(fmr_data[index].shape())

### Simple Arithmetic Operations

Calculate the mean for every voxel over images

In [None]:
print(fmr_data.mean())

Calculate the standard deviation for every voxel over images

In [None]:
fmr_data.std()

Methods can be chained.  Here we get the shape of the mean.

In [None]:
print(fmr_data.mean().shape())

Brain_Data instances can be added and subtracted

In [None]:
new = fmr_data[1]+fmr_data[2]

Brain_Data instances can be manipulated with basic arithmetic operations.

Here we add 10 to every voxel and scale by 2

In [None]:
fmr_data2 = (fmr_data + 10) * 2

Brain_Data instances can be copied

In [None]:
new = fmr_data.copy()

Brain_Data instances can be easily converted to nibabel instances, which store the data in a 3D/4D matrix.  This is useful for interfacing with other python toolboxes such as [nilearn](http://nilearn.github.io)


In [None]:
fmr_data.to_nifti()

Brain_Data instances can be concatenated using the append method

In [None]:
new = new.append(fmr_data[4])

Lists of `Brain_Data` instances can also be concatenated by recasting as a `Brain_Data` object.

In [None]:
print(type([x for x in fmr_data[:4]]))

type(Brain_Data([x for x in fmr_data[:4]]))

Any Brain_Data object can be written out to a nifti file.

In [None]:
fmr_data.write('data/outputs/Tmp_Data.nii.gz')

Images within a Brain_Data() instance are iterable.  Here we use a list comprehension to calculate the overall mean across all voxels within an image.

In [None]:
[x.mean() for x in fmr_data]

Though, we could also do this with the `mean` method by setting `axis=1`.

In [None]:
fmr_data.mean(axis=1)
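
The two directions of averaging are easy to mix up, so here is a tiny numpy sketch on a made-up images by voxels matrix. It only mirrors the axis convention described above and does not use `Brain_Data`.

```python
import numpy as np

# Toy data: 3 images (rows) by 4 voxels (columns)
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [5.0, 6.0, 7.0, 8.0],
])

voxel_means = data.mean(axis=0)  # one value per voxel, averaged over images
image_means = data.mean(axis=1)  # one value per image, averaged over voxels

print(voxel_means)
print(image_means)
```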

Let's plot the mean to see how the global signal changes over time.

In [None]:
plt.plot(fmr_data.mean(axis=1))

Notice the slow linear drift over time, where the global signal intensity gradually decreases. We will learn how to remove this with a high pass filter in future tutorials.
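
As a simpler stand-in for the high pass filter mentioned above, a linear drift can be removed by fitting a straight line and subtracting it. The sketch below does this on a synthetic signal (the drift rate and oscillation are made-up values), not on the tutorial data.

```python
import numpy as np

# Toy global signal: a downward linear drift plus a small oscillation
t = np.arange(100)
signal = 1000.0 - 0.5 * t + 2.0 * np.sin(2 * np.pi * t / 20)

# Fit a straight line and subtract it (linear detrending)
slope, intercept = np.polyfit(t, signal, deg=1)
detrended = signal - (slope * t + intercept)

# The detrended signal should have no remaining linear trend
residual_slope = np.polyfit(t, detrended, deg=1)[0]
print(slope, residual_slope)
```

Linear detrending only removes a straight-line trend. A high pass filter also removes slower, curved fluctuations.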

### Plotting
There are multiple ways to plot your data.

For a very quick plot, you can return a montage of axial slices with the `.plot()` method. As an example, we will plot the mean of each voxel over time.

In [None]:
f = fmr_data.mean().plot()

Brain_Data() instances can be converted to a nibabel instance and plotted using any nilearn plot method such as glass brain.


In [None]:
plot_glass_brain(fmr_data.mean().to_nifti())

Ok, that's the basics. `Brain_Data` can do much more!
Check out some of our [tutorials](https://nltools.org/auto_examples/index.html) for more detailed examples.
