# Prepare & preprocess your neuroimaging data

## Organization of neuroimaging data: the Brain Imaging Data Structure (BIDS)

### Introduction

Before you can process your neuroimaging data, a few preparation steps may be needed. Raw images usually come in either DICOM or NIfTI format, and since most neuroimaging software tools expect NIfTI, DICOM data generally has to be converted first.

Regarding the organization of the clinical and imaging data, the **Brain Imaging Data Structure (BIDS)** [(Gorgolewski et al., 2016)](https://doi.org/10.1038/sdata.2016.44) is the standard adopted for organizing datasets. The BIDS standard is based on a file hierarchy rather than on a database management system, which makes it easy to deploy.

Thanks to its clear and simple way to describe neuroimaging and behavioral data, the BIDS standard has been easily adopted by the neuroimaging community. Organizing a dataset following the BIDS hierarchy simplifies the execution of neuroimaging software tools. 

Here is a general overview of the BIDS structure. If you need more details, please check the [documentation](https://bids-specification.readthedocs.io/en/latest/) on the [website](http://bids.neuroimaging.io/).

```Text
BIDS_Dataset/
├── participants.tsv
├── sub-CLNC01/
│   ├── ses-M00/
│   │   └── anat/
│   │       └── sub-CLNC01_ses-M00_T1w.nii.gz
│   └── sub-CLNC01_sessions.tsv
├── sub-CLNC02/
│   ├── ses-M00/
│   │   └── anat/
│   │       └── sub-CLNC02_ses-M00_T1w.nii.gz
│   └── sub-CLNC02_sessions.tsv
└── ...
```
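
The naming scheme above can be read mechanically: each file name is a chain of `key-value` entities followed by a suffix and an extension. Below is a minimal sketch, using only the Python standard library, of how such a name could be split and how the expected path of an image could be rebuilt. The helper names `parse_bids_name` and `bids_anat_path` are hypothetical and only cover the simple layout shown in this overview, not the full BIDS specification.

```python
from pathlib import Path


def parse_bids_name(filename):
    """Split a BIDS file name into its key-value entities, suffix and extension."""
    name = Path(filename).name
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz").
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    # All parts but the last are "key-value" entities; the last one is the suffix.
    entities = dict(part.split("-", 1) for part in parts[:-1])
    return {"entities": entities, "suffix": parts[-1], "extension": dot + ext}


def bids_anat_path(root, participant, session, suffix="T1w"):
    """Build the expected path of an anatomical image in the hierarchy above."""
    name = f"sub-{participant}_ses-{session}_{suffix}.nii.gz"
    return Path(root) / f"sub-{participant}" / f"ses-{session}" / "anat" / name


info = parse_bids_name("sub-CLNC01_ses-M00_T1w.nii.gz")
print(info["entities"], info["suffix"], info["extension"])
print(bids_anat_path("BIDS_Dataset", "CLNC01", "M00").as_posix())
```

Because the hierarchy is plain files and folders, this kind of lookup needs no database, which is the deployment advantage mentioned above.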



### Convert OASIS dataset into BIDS

The OASIS dataset contains imaging data in DICOM format but does not provide a BIDS version of the data. To solve this issue, [Clinica provides a converter](http://www.clinica.run/doc/Converters/OASIS2BIDS/) that automatically converts the DICOM files into NIfTI files organized according to the BIDS standard.

A command line instruction is enough to get the data in BIDS format:

```bash
clinica convert oasis-to-bids <dataset_directory> <clinical_data_directory> <bids_directory>
```

where:

  - `dataset_directory` is the path to the original OASIS images' directory;
  - `clinical_data_directory` is the path to the directory where the xls file with the clinical data is located;
  - `bids_directory` is the path to the output directory, where the BIDS-converted version of OASIS will be stored.


In [None]:
# Download the example dataset of 4 images
!curl -k https://aramislab.paris.inria.fr/files/data/databases/tuto/OasisDatabase.tar.gz -o OasisDatabase.tar.gz
!tar xf OasisDatabase.tar.gz

In [None]:
# Convert the example dataset to BIDS
!clinica convert oasis-to-bids OasisDatabase/Dicoms OasisDatabase/ClinicalData OasisBids_example
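
Once the conversion has finished, it can be useful to check which images ended up in the output folder. The sketch below assumes the converted files follow the `sub-*/ses-*/anat/` layout shown in the BIDS overview; the helper name `list_t1w_images` is hypothetical and is not part of Clinica.

```python
from pathlib import Path


def list_t1w_images(bids_directory):
    """Return the T1w images found under sub-*/ses-*/anat/, sorted by path."""
    root = Path(bids_directory)
    if not root.is_dir():
        # Nothing to list if the conversion has not been run yet.
        return []
    pattern = "sub-*/ses-*/anat/sub-*_ses-*_T1w.nii.gz"
    return sorted(root.glob(pattern))


# "OasisBids_example" is the output directory used in the cell above.
for image in list_t1w_images("OasisBids_example"):
    print(image)
```

For the example dataset, one would expect one line per converted image; an empty listing suggests the conversion did not produce files in the assumed layout.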

## Preprocess raw images with `t1-linear` pipeline

Although CNNs have the potential to extract low-to-high level features from raw images, a proper image preprocessing procedure is a fundamental step to ensure a good classification performance (in particular for Alzheimer's Disease classification where datasets are relatively small).

In the context of Alzheimer's Disease classification, image preprocessing procedures typically include:

- **Bias field correction:** MR images can be corrupted by a low frequency and smooth signal caused by magnetic field inhomogeneities. This bias field induces variations in the intensity of the same tissue in different locations of the image, which deteriorates the performance of image analysis algorithms such as registration.
- **Image registration:** Medical image registration consists of spatially aligning two or more images, either globally (rigid and affine registration) or locally (non-rigid registration), so that voxels in corresponding positions contain comparable information.

Finally, the registered images are **cropped** to remove the background and to reduce the computing power required when training deep learning models. The final image size is 169×208×179 with 1 mm³ isotropic voxels.
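
To make the cropping step concrete, here is a minimal sketch that computes which voxels a centered crop would keep along each axis. It is an illustration only: the uncropped grid size of 193×229×193 is an assumption, the helper `center_crop_slices` is hypothetical, and the bounding box actually used by the pipeline may not be centered.

```python
def center_crop_slices(shape, target):
    """Return one slice per axis that keeps the central `target` voxels."""
    slices = []
    for size, wanted in zip(shape, target):
        if wanted > size:
            raise ValueError(f"cannot crop {size} voxels down to {wanted}")
        start = (size - wanted) // 2
        slices.append(slice(start, start + wanted))
    return tuple(slices)


# Assumed uncropped grid (hypothetical); the target size comes from the text.
full_shape = (193, 229, 193)
crop_shape = (169, 208, 179)

crop = center_crop_slices(full_shape, crop_shape)
print(crop)

# Fraction of voxels that remain after cropping.
kept = crop_shape[0] * crop_shape[1] * crop_shape[2]
total = full_shape[0] * full_shape[1] * full_shape[2]
print(f"voxels kept: {kept / total:.1%}")
```

With a NumPy array or a PyTorch tensor named `volume`, the returned tuple can be used directly as an index, as in `volume[crop]`.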

For this tutorial, we propose a "minimal preprocessing" with the [`t1-linear` pipeline](http://www.clinica.run/doc/Pipelines/T1_Linear/), which uses the [ANTs](http://stnava.github.io/ANTs/) software package [(Avants et al., 2014)](https://doi.org/10.3389/fninf.2014.00044) and performs the following steps:

- **Bias field correction** is performed using the N4ITK method [(Tustison et al., 2010)](https://doi.org/10.1109/TMI.2010.2046908).

- **Image registration** is an affine registration to the MNI152NLin2009cSym template (Fonov et al., [2011](https://doi.org/10.1016/j.neuroimage.2010.07.033), [2009](https://doi.org/10.1016/S1053-8119(09)70884-5)) in MNI space with the SyN algorithm [(Avants et al., 2008)](https://doi.org/10.1016/j.media.2007.06.004).

- **Cropping** produces final images of size 169×208×179 with 1 mm³ isotropic voxels.
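
As a reminder of what "affine" means in the registration step, the sketch below applies a 4×4 homogeneous matrix to a 3D coordinate. A rigid transform only combines rotation and translation (6 parameters), while an affine transform also allows scaling and shearing (12 parameters). The matrix values are made up for illustration; in practice the matrix is estimated by the registration software, not written by hand.

```python
def apply_affine(matrix, point):
    """Apply a 4x4 homogeneous affine matrix to a 3D point (x, y, z)."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Only the first three rows are needed; the last row is always (0, 0, 0, 1).
    return tuple(
        sum(matrix[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )


# Illustrative matrix: scale x by 1.5, then translate by (2, -3, 5) mm.
affine = [
    [1.5, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, -3.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0, 1.0],
]

print(apply_affine(affine, (10.0, 20.0, 30.0)))
```

The same matrix is applied to every voxel coordinate, which is why such a registration is called global, in contrast to non-rigid registration where the displacement varies from one location to another.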


These steps can be run with a single command line:

```bash
clinica run t1-linear <bids_directory> <caps_directory>
```

where:

- `bids_directory` is the input folder containing the dataset in a [BIDS](../../BIDS) hierarchy.
- `caps_directory` is the output folder containing the results in a [CAPS](../../CAPS/Introduction) hierarchy.


In [None]:
!clinica run t1-linear ./OasisBids_example ./OasisCaps_example

Once the pipeline has run, the outputs needed for the next steps are saved with a specific suffix appended to the source file name: `_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_T1w.nii.gz`.

For example, processed images from our dataset are:

In [None]:
from nilearn import plotting

# File name suffix of the cropped T1w images produced by `t1-linear`
suffix = '_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_T1w.nii.gz'

# Participants of the example dataset (all at session M00)
subjects = ['sub-OASIS10094', 'sub-OASIS10304', 'sub-OASIS10284', 'sub-OASIS10326']

for subject in subjects:
    # Path of the preprocessed image inside the CAPS hierarchy
    image_path = (
        f'OasisCaps_example/subjects/{subject}/ses-M00/t1_linear/'
        f'{subject}_ses-M00{suffix}'
    )
    plotting.plot_anat(image_path, title=subject)

plotting.show()

## Quality check of your preprocessed data

TODO

## Prepare input data for deep learning with PyTorch

Once the dataset has been preprocessed, we need to obtain files suited for the training phase.
This task can be performed using the [Clinica `deeplearning-prepare-data` pipeline](http://www.clinica.run/doc/Pipelines/DeepLearning_PrepareData/).

This pipeline prepares images generated by Clinica to be used with the PyTorch deep learning library [(Paszke et al., 2019)](https://papers.nips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library). Three types of tensors are proposed: 3D images, 3D patches or 2D slices.

This pipeline selects the preprocessed images, extracts the "tensors", and writes them as output files for the entire images, for each slice or for each patch.

You simply need to type the following command line:

```bash
clinica run deeplearning-prepare-data <caps_directory> <tensor_format>
```
where:

- `caps_directory` is the folder containing the results of the [`t1-linear` pipeline](#Preprocess-raw-images-with-t1-linear-pipeline) and the output of the present command, both in a CAPS hierarchy.
- `tensor_format` is the format of the extracted tensors. You can choose between `image` to convert the whole 3D image to a PyTorch tensor, `patch` to extract 3D patches and `slice` to extract 2D slices from the image.

Output files are stored in a new folder (inside the CAPS) and follow a structure like this:

```
deeplearning_prepare_data
├── image_based
│   └── t1_linear
│       └── sub-<participant_label>_ses-<session_label>_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_T1w.pt
├── patch_based
│   └── t1_linear
│       ├── sub-<participant_label>_ses-<session_label>_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_patchsize-50_stride-50_patch-0_T1w.pt
│       ├── sub-<participant_label>_ses-<session_label>_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_patchsize-50_stride-50_patch-1_T1w.pt
│       ├── ...
│       └── sub-<participant_label>_ses-<session_label>_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_patchsize-50_stride-50_patch-N_T1w.pt
└── slice_based
    └── t1_linear
        ├── sub-<participant_label>_ses-<session_label>_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_axis-axi_channel-rgb_slice-0_T1w.pt
        ├── sub-<participant_label>_ses-<session_label>_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_axis-axi_channel-rgb_slice-1_T1w.pt
        ├── ...
        └── sub-<participant_label>_ses-<session_label>_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_axis-axi_channel-rgb_slice-N_T1w.pt
```

In short, there is one folder for each type of feature (image, slice or patch), and each folder contains the numbered tensor files for that feature.
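
To get a feeling for how many patch files to expect per image, the sketch below counts the patches that fit in a 169×208×179 volume for the `patchsize-50_stride-50` setting that appears in the file names above. It assumes that patches are taken on a regular grid without padding and that incomplete patches at the border are dropped; the pipeline's actual behaviour may differ, and the helper names are hypothetical.

```python
def patch_starts(size, patch_size, stride):
    """Start indices of the patches that fit entirely along one axis."""
    return list(range(0, size - patch_size + 1, stride))


def count_patches(shape, patch_size, stride):
    """Number of patches on a regular 3D grid (assumes no padding)."""
    total = 1
    for size in shape:
        total *= len(patch_starts(size, patch_size, stride))
    return total


image_shape = (169, 208, 179)  # size of the cropped images

for axis_size in image_shape:
    print(axis_size, patch_starts(axis_size, patch_size=50, stride=50))

print("patches per image:", count_patches(image_shape, patch_size=50, stride=50))
```

Under these assumptions, the index `N` of the last patch file would be the printed count minus one, since numbering starts at `patch-0`.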

<div class="alert alert-info">

**Note:** You can extract only the tensors for the full images (`clinica run deeplearning-prepare-data <caps_directory> image`) and continue working with one single file per subject/session.

The package `clinicadl` is able to extract patches or slices _on-the-fly_ (from one single file) when running train or inference tasks. The downside of this approach is that, depending on the size of your dataset, you have to guarantee enough memory resources on your GPU card to host the full images/tensors for all your data.

If the memory of your GPU card is not sufficient, we suggest extracting the patches and/or the slices using the proper `tensor_format` option of the command described above.
</div>
    
To perform the feature extraction in our dataset, run the following cell: 

In [None]:
!clinica run deeplearning-prepare-data ./OasisCaps_example image

At the end of this command, a new directory named `deeplearning_prepare_data` is created inside each subject/session of the CAPS structure. We can easily verify:

In [None]:
!ls ./OasisCaps_example/subjects/sub-OASIS10094/ses-M00
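
The same check can be done from Python. The sketch below rebuilds the expected location of the image tensor for one subject, combining the per-session `deeplearning_prepare_data` folder mentioned above with the `image_based/t1_linear` layout shown earlier. The helper name `image_tensor_path` is hypothetical, and the exact layout is an assumption based on the structure described in this section.

```python
from pathlib import Path

# Suffix of the image tensors, as shown in the output structure above.
SUFFIX = "_T1w_space-MNI152NLin2009cSym_desc-Crop_res-1x1x1_T1w.pt"


def image_tensor_path(caps_directory, participant, session):
    """Expected path of the full-image tensor for one subject/session."""
    session_dir = (
        Path(caps_directory) / "subjects" / f"sub-{participant}" / f"ses-{session}"
    )
    file_name = f"sub-{participant}_ses-{session}{SUFFIX}"
    return (
        session_dir / "deeplearning_prepare_data" / "image_based" / "t1_linear" / file_name
    )


path = image_tensor_path("OasisCaps_example", "OASIS10094", "M00")
print(path.as_posix(), "->", "found" if path.exists() else "missing")
```

If the file is found, it should be loadable with `torch.load(path)`, which is expected to return the image as a PyTorch tensor.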