# fMRIPrep pre-processing and post-processing - connectivity matrices extraction using a parcellation

This notebook is intended to give a brief overview of a pipeline to:

1) Download data from Beluga, an HPC  
2) Verify the data with bids-validator using a Docker container  
3) Run the pre-processing with fMRIPrep using a Docker container  
4) Create and run a post-processing pipeline in Nilearn to obtain connectivity matrices

## Background

The data used in this tutorial is anonymized data from the Prevent-AD cohort. Data from Prevent-AD is open and available at this [address](https://portal.conp.ca/dataset?id=projects/preventad-open). Unfortunately, the images from the cohort on the CONP are only available in .mnc format. While .mnc images can be converted to .nii format, they cannot be converted to BIDS at this time.

As such, the tutorial uses closed data from the Prevent-AD cohort, which was readily available in BIDS format. As the data is not open, the participant ID was anonymized and the data cannot be shared for reproduction.

However, the tools used and the tutorial should be applicable to any BIDS-valid dataset, which is why we run the bids-validator first.

## 1) Downloading data from Beluga

In this step, we simply download the data from Beluga. The exact code is not shared as it would leak the actual participant ID. However, I share below the command that would be used to download the data. 

```scp -r user@beluga.computecanada.ca:/path/to/directory/to/copy /directory/local/computer/to/copy```

In this ```scp``` command, the ```-r``` argument copies whole directories recursively. It is important to put ```:``` right after the host name of the HPC (here, ```beluga.computecanada.ca```). The path after the ```:``` is the path on the supercomputer that needs to be copied to the local computer. After this path come a white space and the path of the folder on the local computer where the folder from the HPC should be copied.

Should you want to copy a folder from your computer to the HPC (e.g. after the pre-processing), you simply need to invert the order of the paths.

```scp -r /directory/local/computer/to/copy/to/HPC user@beluga.computecanada.ca:/path/to/remote/directory/```
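If you prefer to script the transfer from Python, the two ```scp``` calls above can be assembled as shown below. This is only a minimal sketch: the helper name ```build_scp_command``` and its parameters are hypothetical (they are not part of any library), and the paths are the same placeholders used above.

```python
import shlex


def build_scp_command(user, host, remote_path, local_path, upload=False):
    """Return the scp argument list for a recursive copy.

    Hypothetical helper: downloads (remote -> local) by default,
    uploads (local -> remote) when upload=True.
    """
    remote = f"{user}@{host}:{remote_path}"  # the ':' separates the host from the path
    if upload:
        source, destination = local_path, remote
    else:
        source, destination = remote, local_path
    return ["scp", "-r", source, destination]


download = build_scp_command(
    "user",
    "beluga.computecanada.ca",
    "/path/to/directory/to/copy",
    "/directory/local/computer/to/copy",
)
print(shlex.join(download))
# To actually launch the copy: subprocess.run(download, check=True)
```

Inverting the two paths, as described above, is then just a matter of passing ```upload=True```.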

## 2) Verify BIDS validity with bids-validator

Once the data is downloaded and, in our case, anonymized, we are ready to validate our BIDS dataset! You can find everything there is to know about BIDS [here](https://bids-specification.readthedocs.io/en/stable/). Note that the actual bids-specification version might differ slightly from the version available within the Docker container.

Running the bids-validator on our local computer might seem redundant, since fMRIPrep also runs the bids-validator before launching the pre-processing. It becomes particularly useful, however, when we need to launch jobs on remote HPCs, where we might not be notified right away that our pre-processing failed because our data is not in BIDS format. It is also good practice for learning how to use containers. Note that there are several ways to download and use bids-validator, as described [here](https://pypi.org/project/bids-validator/), including a [web browser version](http://bids-standard.github.io/bids-validator/) where you can upload the dataset and verify it there.

For the Docker container, you first need to install [Docker](https://hub.docker.com/editions/community/docker-ce-desktop-mac/). The hyperlink shows instructions for Mac, but Docker is available for Windows and Linux. Once you have gone through the instructions, your Docker version should be ready to go! 

To use the bids-validator, we first need to ```pull``` the Docker image of the bids-validator. In short, by pulling, we are "installing" the software on our computer, without actually installing software on the hardware. This might not make a ton of sense, but for now, you can think of it as accessing another computer, that is not yours, which only contains the software necessary to run bids-validator.

The first time you run the command below, Docker will ```pull``` the Docker image for you and run the analyses right away. Note that I split the code below into four lines for readability using backslashes, but you can run this command in a single line too.

In [2]:
#docker run -ti --rm \
#-v /path/to/data:/data:ro \
#bids/validator \
#/data

Let's unpack the command. 

The first line: Calls Docker and tells it to run in interactive mode with a terminal attached (```-ti```), i.e. once we run Docker, we will be "warped" inside the container, and the output will be displayed on the terminal as the software runs. We also use the ```--rm``` option, which removes the container automatically once it exits. This ensures that stopped containers do not pile up on our computer after each run. It is basically like throwing away a cardboard box once you have taken out what you needed, so that empty boxes don't clutter the room.

The second line: This is called a "mount" and is set in Docker using the ```-v``` argument. It is telling Docker where on our computer it can fetch the data we want the software inside the container to analyze. When using the ```-v``` argument, we need to tell it: 1) where to find the data on our computer, 2) what to call this path in the container and 3) whether or not Docker has permission to modify the files in this folder (in this case, ```ro``` stands for read-only). To summarize, this mount tells Docker that the data is on our computer at a certain path (i.e. ```/path/to/data```). Then, it tells Docker that inside the container, we should refer to it as ```/data```. Finally, we tell Docker that it can't modify this data: it is read-only.

The third line: This is straightforward: it is the name of the Docker image to run, which contains the bids-validator program.

The fourth line: This is an argument given to the program 'bids-validator'. In this case, the program looks for a BIDS dataset within the path you gave it.
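To make the three parts of a mount more concrete, here is a minimal Python sketch that builds the ```-v``` value and the full four-line command as a list of arguments. The helper names ```mount_spec``` and ```validator_command``` are hypothetical, and the absolute-path check assumes Unix/Mac-style paths (Docker expects an absolute path on both sides of a folder mount).

```python
from pathlib import PurePosixPath


def mount_spec(host_path, container_path, read_only=False):
    """Build the value passed to `docker run -v` (hypothetical helper)."""
    for path in (host_path, container_path):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"mounts need absolute paths, got {path!r}")
    spec = f"{host_path}:{container_path}"
    return spec + ":ro" if read_only else spec


def validator_command(data_dir):
    """Assemble the bids-validator call, one entry per explained line."""
    return [
        "docker", "run", "-ti", "--rm",                        # line 1: Docker options
        "-v", mount_spec(data_dir, "/data", read_only=True),  # line 2: the mount
        "bids/validator",                                      # line 3: the image
        "/data",                                               # line 4: validator argument
    ]


print(" ".join(validator_command("/path/to/data")))
```

Keeping the command as a list makes it easy to hand over to ```subprocess.run``` later, and the check catches a forgotten leading ```/``` before Docker does.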

Once run, the command will open in your terminal in 'interactive' mode. With the paths used for this tutorial, the command looks like this:

In [3]:
#docker run -ti --rm \
#-v /Users/stong3/Desktop/test_fmriprep_PAD/sourcedata:/data:ro \
#bids/validator \
#/data


Running this command gives us an output that looks a little bit like this:

[Image: bids-validator output in the command line]

In red, we have errors: These are things that will be problematic when trying to run the bids apps (in our case, fMRIPrep). Full disclosure, in Prevent-AD, it seems that the field 'TaskName' is missing from our .json files. As such, bids-validator will throw an error. However, fMRIPrep still ran in our case.

In blue: we have references that the bids-validator recommends checking to get more information on the error. Note that these links do not always work, as they are auto-generated, so a Google search is often more helpful.

In greenish/yellow: we have warnings. These warnings are 'recommendations' from the BIDS specification. However, they are not essential for the code to run properly.

Once we have verified our BIDS dataset and corrected the errors, we can re-run the validator to ensure that the dataset is completely BIDS-compliant. Then, we are ready for our pre-processing!
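Since the missing 'TaskName' field was the error we ran into, here is a minimal sketch of how one might list the affected .json files before re-running the validator. It assumes the functional sidecars end in ```_bold.json``` and carry a ```task-<label>``` entity in their file name, as BIDS file names do. The function name ```find_missing_taskname``` is hypothetical, and the small demonstration dataset (file names and the ```RepetitionTime``` value) is made up. The script only reports files; it does not modify them.

```python
import json
import re
import tempfile
from pathlib import Path


def find_missing_taskname(bids_root):
    """Return (sidecar path, task label read from the file name) for
    every *_bold.json file that has no 'TaskName' field."""
    missing = []
    for sidecar in sorted(Path(bids_root).rglob("*_bold.json")):
        metadata = json.loads(sidecar.read_text())
        if "TaskName" in metadata:
            continue
        match = re.search(r"task-([a-zA-Z0-9]+)", sidecar.name)
        missing.append((sidecar, match.group(1) if match else None))
    return missing


# Demonstration on a throwaway folder with two made-up sidecars.
with tempfile.TemporaryDirectory() as tmp:
    func_dir = Path(tmp) / "sub-00001" / "func"
    func_dir.mkdir(parents=True)
    (func_dir / "sub-00001_task-rest_bold.json").write_text(
        json.dumps({"RepetitionTime": 2.0})  # no TaskName: should be reported
    )
    (func_dir / "sub-00001_task-memory_bold.json").write_text(
        json.dumps({"TaskName": "memory"})   # complete: should be skipped
    )
    report = [(path.name, label) for path, label in find_missing_taskname(tmp)]

print(report)
```

The label recovered from the file name is only a suggestion for what to write in the 'TaskName' field; check it against your own protocol before editing the sidecars.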

## 3) Run fMRIPrep pre-processing

As we now know how to use Docker containers, this next part should not be too difficult. To run fMRIPrep, a single command line is necessary. However, I would recommend editing this command in a text editor first so that you can make sure that all the necessary arguments are there. The goal here is not to describe what fMRIPrep **does** in terms of pre-processing. The [fMRIPrep documentation](https://fmriprep.readthedocs.io/en/stable/), though quite long, covers this thoroughly. The goal is to describe the correct command(s) to go through the pre-processing.

fMRIPrep gives a few options to download and use the software, but their recommended method is to use a Docker container and use their Python script wrapper to simplify the command complexity (i.e. you do not need to call Docker as the script will do it for you). However, running the Docker container directly will give you a lot more options to fine-tune your analyses. Here is an example of a command from the [fMRIPrep documentation](https://fmriprep.readthedocs.io/en/stable/docker.html):

In [4]:
#docker run -ti --rm \
#-v /path/to/data:/data:ro \
#-v /path/to/output:/out \
#poldracklab/fmriprep:<latest-version> \
#/data /out/out \
#participant

Let's unpack the command.

The first line: Simply calls Docker in interactive mode and removes the container once it exits (```--rm```), as we have seen with the bids-validator.

The second line: A mount telling fMRIPrep where to get the data on our computer and telling it that it can't modify these files in their original folders (with the ```:ro``` option). We will call it '/data' in the container.

The third line: A mount telling fMRIPrep where to send back the pre-processed data on our computer. We will call it '/out' in the container.

The fourth line: The name of the fMRIPrep Docker image, which is what calls fMRIPrep. The next lines will be arguments that we give directly to fMRIPrep to specify what and how we want to process our data.

The fifth line: We tell fMRIPrep two things: 1) the data to analyze is in the '/data' folder, which we defined with a mount before, and 2) the output is to be stored in the '/out' folder, which we also defined in a mount. The second 'out' is basically to create a separate folder for fMRIPrep in the output folder.

The sixth line: We tell fMRIPrep the analysis level. Here, ```participant``` means that each participant is pre-processed on its own.

These are the basic arguments that fMRIPrep needs to pre-process the data. The full list of arguments that can be used can be found [here](https://fmriprep.readthedocs.io/en/stable/usage.html).
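Since I recommended preparing the command before running it, here is a minimal Python sketch that assembles the basic call and lets you inspect it first. The helper name ```fmriprep_command``` and its parameters are hypothetical. The image tag is kept as the same ```<latest-version>``` placeholder as above, and the optional ```--participant-label``` argument is the one used in the full command further down.

```python
import shlex


def fmriprep_command(data_dir, out_dir,
                     image="poldracklab/fmriprep:<latest-version>",
                     participant_labels=()):
    """Assemble the basic fMRIPrep Docker call as a list of arguments."""
    command = [
        "docker", "run", "-ti", "--rm",
        "-v", f"{data_dir}:/data:ro",  # input dataset, read-only
        "-v", f"{out_dir}:/out",       # where the results are sent back
        image,                         # replace the placeholder tag with a real version
        "/data", "/out/out",           # input and output folders inside the container
        "participant",                 # analysis level
    ]
    if participant_labels:
        command += ["--participant-label", *participant_labels]
    return command


command = fmriprep_command("/path/to/data", "/path/to/output",
                           participant_labels=["sub-00001"])
print(shlex.join(command))
# Once the image tag is filled in: subprocess.run(command, check=True)
```

Printing the command before launching it is a cheap way to catch a missing mount or a typo in a path, which matters given how long the pre-processing takes.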

In the case of the pre-processing I did for the current tutorial, I ran the following command:

In [None]:
#docker run -it --rm \
#-v /Users/stong3/Desktop/test_fmriprep_PAD/sourcedata:/data:ro \
#-v /Users/stong3/Desktop/test_fmriprep_PAD/derivative:/out \
#-v /Users/stong3/Desktop/test_fmriprep_PAD/fs_license/license.txt:/opt/freesurfer/license.txt \
#-v /Users/stong3/Desktop/test_fmriprep_PAD/work_dir:/work \
#poldracklab/fmriprep:latest \
#/data /out/fmriprep \
#participant \
#--participant-label sub-00001 \
#-w /work \
#--low-mem \
#--output-spaces T1w \
#--write-graph \
#--fs-license-file /opt/freesurfer/license.txt

So what changed?

The first, second and third lines: These are unchanged from the basic structure, i.e., we run Docker, and give an input and output mount. 

The fourth line: This mount tells Docker where to find a FreeSurfer license. As part of the fMRIPrep processing, FreeSurfer is run on all available anatomical images to reconstruct surfaces and, in part, to help with the registration between the anatomical and functional images. To do this, fMRIPrep needs access to a license that authorizes users to use FreeSurfer. The license is free and can be obtained [here](https://surfer.nmr.mgh.harvard.edu/registration.html). 

The fifth line: This mount is to make it a bit easier on your computer to process the data. It provides a work directory where the intermediate outputs of the pre-processing are stored during processing so that fMRIPrep doesn't keep all of them in the computer's memory. We will tell fMRIPrep to use this work directory a little bit later in the command.

The sixth and seventh lines: These are the basic fMRIPrep arguments: we name the image that starts the software, then tell it where to find the data and where to output the pre-processed files.

The eighth and ninth lines: This is where we define the participants we want to process. We first give the ```participant``` analysis level, followed by the ```--participant-label``` argument and the labels of the participants we want to process. In our case, we simply want subject 00001. 

The tenth line: This defines the work directory where to put the intermediate files. 

The eleventh line: This tells fMRIPrep that our computer does not have a lot of RAM, and to go a bit easier on it.

The twelfth line: We