# BigBrain data processing with CBRAIN and DataLad

This tutorial introduces several infrastructure tools used in HIBALL:
* <a href="#part1">Part I</a>: DataLad and Boutiques, to access and process BigBrain data through uniform command-line interfaces
* <a href="#part2">Part II</a>: CBRAIN, to process BigBrain data on HPC clusters through a web portal

<div id="part1"/>


## Part I: accessing and reusing BigBrain data with DataLad and Boutiques


This part of the tutorial will walk you through the following steps:
1. <a href="#finding">Finding BigBrain datasets in DataLad</a>
2. <a href="#installing">Installing and downloading BigBrain datasets</a>
3. <a href="#processing">Processing BigBrain datasets with Boutiques</a>
4. <a href="#adding">Uploading derived data to DataLad</a>

The main tools and platforms involved in this tutorial are [DataLad](https://www.datalad.org), [Boutiques](http://boutiques.github.io), and the [Canadian Open Neuroscience Platform](http://portal.conp.ca). Please refer to the documentation of these tools for additional information.

This tutorial notebook is available on Google Colab and can be done entirely online, without the need for any local software installation. Familiarity with Linux command lines is recommended but not required.

Alternatively, if you are familiar with [Docker](http://docker.io) and want to run the tutorial on your own computer, you can run all of its commands in the Docker image `glatard/hws`.

### Software installation

The following script installs the required software in the Google Colab environment:

In [None]:
!git clone https://github.com/glatard/hws.git && (cd hws && ./install.sh)

<div id="finding"/>


### Finding BigBrain datasets in DataLad

One of our goals in [HIBALL](https://bigbrainproject.org/hiball.html) is to distribute BigBrain datasets through the uniform interface provided by DataLad. In this part of the tutorial, we will demonstrate how BigBrain data can be downloaded and manipulated using DataLad. A complete introduction to DataLad, including detailed tutorials, is available in the [DataLad handbook](http://handbook.datalad.org/en/latest/index.html).

BigBrain DataLad datasets are available through the web portal of the Canadian Open Neuroscience Platform (CONP), at http://portal.conp.ca. They can be found by entering "BigBrain" in the search field:

![screenshot](figures/search_data.png)

Direct link to the search results: https://portal.conp.ca/search?search=bigbrain&sortKey=conpStatus&sortComparitor=asc&page=1&max_per_page=10&cursor=0&limit=10

<div id="installing"/>

### Installing and downloading BigBrain datasets

Once a dataset is identified, instructions on how to download it using DataLad are available on the corresponding dataset page of the CONP portal:

![screenshot](figures/download_instructions.png)

The next steps will go through these instructions.

#### Dataset installation

First, the CONP dataset should be installed to your local machine using `datalad install`:

In [None]:
!datalad install https://github.com/CONP-PCNO/conp-dataset.git
%cd conp-dataset

The CONP DataLad dataset contains many datasets, located under `projects`:

In [None]:
!ls projects

The specific BigBrain dataset of interest in this tutorial can be installed as follows:

In [None]:
!datalad install projects/BigBrain

Importantly, this step does not download the data. Instead, it installs a set of links to the data files, whose content can be downloaded at a later stage. Feel free to install any other dataset you might be interested in; this won't involve long transfer times!
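DataLad relies on git-annex to manage these links. The following is a minimal sketch that checks a file's local status from Python. It assumes git-annex's default (indirect) mode, where an annexed file whose content has not been downloaded yet is a symbolic link with no target on disk. The helper `content_status` is hypothetical, and it does not handle adjusted branches (used, for instance, on Windows), where annexed files are not symlinks.

```python
import os
from pathlib import Path


def content_status(path):
    """Classify a file of a DataLad dataset (hypothetical helper).

    Assumes git-annex's default indirect mode, in which annexed files
    are symbolic links pointing into the dataset's annex object store.
    """
    p = Path(path)
    if p.is_symlink():
        # os.path.exists follows the link: False means the target
        # (the annexed content) has not been downloaded yet
        return "present" if os.path.exists(p) else "not downloaded"
    if p.is_file():
        return "regular file"  # stored directly in Git, not annexed
    return "missing"
```

For instance, `content_status("projects/BigBrain/3D_Volumes/MNI-ICBM152_Space/nii/full8_400um_2009b_sym.nii.gz")` should report `"not downloaded"` before `datalad get` is run on that file, and `"present"` afterwards.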

This dataset contains the 40$\mu$m BigBrain blocks, in the NIfTI and MINC formats:

In [None]:
!ls projects/BigBrain/3D_Blocks/40um/*
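To see which files are available in each format from Python, a small sketch can group paths by extension. It assumes that NIfTI files end in `.nii` or `.nii.gz` and MINC files in `.mnc`; the function name `group_by_format` is illustrative.

```python
from collections import defaultdict
from pathlib import Path


def group_by_format(paths):
    """Group file paths by imaging format, based on their extension."""
    groups = defaultdict(list)
    for p in map(Path, paths):
        name = p.name.lower()
        if name.endswith((".nii", ".nii.gz")):
            groups["NIfTI"].append(p)
        elif name.endswith(".mnc"):
            groups["MINC"].append(p)
        else:
            groups["other"].append(p)
    return dict(groups)
```

For example, `group_by_format(p for p in Path("projects/BigBrain/3D_Blocks/40um").rglob("*") if not p.is_dir())` groups the block files listed above. Files that are not downloaded yet are still listed, since they exist as links.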

It also contains reconstructed 3D volumes at various resolutions and in various spaces:

In [None]:
!ls projects/BigBrain/3D_Volumes/*/*

#### Data download

The actual data can be downloaded on demand, using `datalad get`:

In [None]:
!datalad get projects/BigBrain/3D_Volumes/MNI-ICBM152_Space/nii/full8_400um_2009b_sym.nii.gz

The data is now available:

In [None]:
%matplotlib inline
import nibabel as nib
import nilearn.plotting as nilp
# Load the 400um BigBrain volume retrieved with `datalad get`
im1 = nib.load('projects/BigBrain/3D_Volumes/MNI-ICBM152_Space/nii/full8_400um_2009b_sym.nii.gz')
# Interactive viewer for a 100x100x100-voxel sub-volume
nilp.view_img(im1.slicer[100:200,100:200,100:200], bg_img=None, cmap='gray', resampling_interpolation='nearest')

<div id="processing"/>


### Processing BigBrain datasets with Boutiques

<a href="https://docs.google.com/presentation/d/1w9SC6IMxhTneR1Mac3-RoF8_ps84XDMIexiwpU_0ERo" target="_blank">
    <img src="./figures/Boutiques.svg"></a>

#### How does Boutiques facilitate the processing of BigBrain?

Boutiques wraps command-line tools to facilitate their porting to different environments.
For instance, you might process BigBrain locally using an FSL [Docker](https://www.docker.com/)
container. However, for security reasons, HPC environments typically use [Singularity](https://singularity.lbl.gov/)
for containerization. While the Docker and Singularity command lines are similar, you would still have to
adapt your scripts to support both.

Boutiques abstracts away the interfacing with container technologies entirely. All you need is Boutiques
(installed with `pip install boutiques`), a Boutiques tool, and an invocation file, and you're ready to go!
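To illustrate the kind of differences Boutiques hides, here is a hypothetical sketch, not Boutiques' actual implementation, that wraps the same command for either container engine. It assumes the working directory is mounted at the same path inside the container (`-v` for Docker, `-B` for Singularity). The image names are placeholders; Singularity typically needs a `docker://` prefix or a local image file.

```python
def containerize(command, image, engine, workdir):
    """Return an argument list running `command` inside a container.

    Hypothetical, simplified sketch: real tools also handle user IDs,
    environment variables, additional mounts, etc.
    """
    mount = f"{workdir}:{workdir}"  # same path on the host and in the container
    if engine == "docker":
        prefix = ["docker", "run", "--rm", "-v", mount, "-w", workdir, image]
    elif engine == "singularity":
        prefix = ["singularity", "exec", "-B", mount, "--pwd", workdir, image]
    else:
        raise ValueError(f"unsupported container engine: {engine}")
    # Run the tool's command line through a shell inside the container
    return prefix + ["sh", "-c", command]


cmd = "fslstats image.nii.gz -h 10"
print(containerize(cmd, "example/fsl:latest", "docker", "/data"))
print(containerize(cmd, "docker://example/fsl:latest", "singularity", "/data"))
```

Either list could be passed to `subprocess.run`. With Boutiques, you do not have to maintain such engine-specific branches yourself.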

#### What makes a tool a Boutiques tool?

A Boutiques tool is any command-line tool described by a descriptor file that follows the Boutiques JSON schema.

The content of a Boutiques descriptor can be seen below:

In [None]:
!bosh pprint zenodo.4472771
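For comparison with the published descriptor printed above, the following sketch builds a hypothetical, minimal descriptor for `fslstats -h` as a Python dictionary. Field names follow the Boutiques schema (version 0.5). The input ids `input_file` and `h` match the invocation used later in this tutorial, but every other value is illustrative and is not taken from zenodo.4472771. A real descriptor would usually also declare a `container-image` and `output-files`.

```python
import json

# Hypothetical minimal descriptor (not the published zenodo.4472771 one)
descriptor = {
    "name": "fslstats_histogram_example",
    "description": "Compute an intensity histogram with fslstats (illustrative).",
    "tool-version": "unknown",
    "schema-version": "0.5",
    # Value-keys in brackets are replaced by input values at launch time
    "command-line": "fslstats [INPUT_FILE] [NBINS]",
    "inputs": [
        {
            "id": "input_file",
            "name": "Input image",
            "type": "File",
            "value-key": "[INPUT_FILE]",
        },
        {
            "id": "h",
            "name": "Number of histogram bins",
            "type": "Number",
            "integer": True,
            "optional": True,
            "command-line-flag": "-h",
            "value-key": "[NBINS]",
        },
    ],
}

print(json.dumps(descriptor, indent=2))
```

Once saved to a file, such a descriptor can be checked against the schema with `bosh validate`.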

#### Steps to process data with Boutiques

1. Search for your desired tool with `bosh search`
2. Use `bosh example` as a guide to create a valid invocation for the tool
3. Launch the tool with `bosh exec launch`
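These steps can also be scripted from Python by calling `bosh` through `subprocess`. This is a sketch that assumes `bosh` is on the `PATH` and that `bosh example` prints only the JSON invocation on its standard output. The helper names are hypothetical. Boutiques also provides its own Python API, which may be a better fit for larger scripts.

```python
import json
import subprocess


def run_bosh(*args):
    """Run a bosh subcommand and return its standard output as text."""
    result = subprocess.run(
        ["bosh", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def make_invocation(example_json, **overrides):
    """Start from a `bosh example` output and override selected parameters."""
    invocation = json.loads(example_json)
    invocation.update(overrides)
    return invocation


def process(tool, invocation_path, **params):
    """Steps 2 and 3: build an invocation, save it, and launch the tool."""
    invocation = make_invocation(run_bosh("example", tool), **params)
    with open(invocation_path, "w") as f:
        json.dump(invocation, f, indent=2)
    return run_bosh("exec", "launch", tool, invocation_path)
```

Step 1 remains exploratory, for example `print(run_bosh("search", "fsl"))`. After that, `process("zenodo.4472771", "invocation.json", input_file="image.nii.gz", h=10)` would launch fslstats on a hypothetical `image.nii.gz`.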

#### BoSh search

To make it easier to find tools published to Zenodo, a search function is built into the **Bo**utiques **Sh**ell (bosh) command-line interface.

Let's take a look at the top 10 most pulled descriptors:

In [None]:
!bosh search

If we have a tool in mind, we can include its name in the query to return descriptors with a matching name:

In [None]:
!bosh search fsl

The `--exact` flag restricts the results to descriptors that match the query exactly:

In [None]:
!bosh search fsl --exact

#### BoSh example

To run a Boutiques tool, you need an invocation JSON file.
Invocation files contain the values of the parameters and inputs passed to the tool.

The `bosh example` command provides an example combination of tool parameters in the expected JSON format.

Let's get an example invocation of fslstats (zenodo.4472771):

In [None]:
!bosh example zenodo.4472771

To also include optional parameters, use the `--complete` flag:

In [None]:
!bosh example zenodo.4472771 --complete
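At launch time, the values of an invocation are substituted into the descriptor's `command-line` template. The following sketch illustrates the idea on a hypothetical two-input descriptor fragment. It is a simplification of what Boutiques does: it assumes value-keys appear as standalone, whitespace-separated tokens in the template, and it ignores list inputs, output files, and many other features.

```python
import shlex


def build_command(descriptor, invocation):
    """Substitute invocation values into a descriptor's command-line template."""
    inputs = {inp["value-key"]: inp for inp in descriptor["inputs"]}
    tokens = []
    for token in descriptor["command-line"].split():
        inp = inputs.get(token)
        if inp is None:
            tokens.append(token)  # literal part of the command
        elif inp["id"] in invocation:
            if "command-line-flag" in inp:
                tokens.append(inp["command-line-flag"])
            # Quote values so that paths with spaces stay a single argument
            tokens.append(shlex.quote(str(invocation[inp["id"]])))
        elif not inp.get("optional", False):
            raise ValueError(f"missing required input: {inp['id']}")
        # Optional inputs absent from the invocation are simply dropped
    return " ".join(tokens)


# Hypothetical descriptor fragment, for illustration only
example_descriptor = {
    "command-line": "fslstats [INPUT_FILE] [NBINS]",
    "inputs": [
        {"id": "input_file", "value-key": "[INPUT_FILE]"},
        {"id": "h", "value-key": "[NBINS]", "command-line-flag": "-h", "optional": True},
    ],
}

print(build_command(example_descriptor, {"input_file": "image.nii.gz", "h": 10}))
# fslstats image.nii.gz -h 10
```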

Today we'll use fslstats to calculate the histogram of the 400$\mu$m BigBrain volume downloaded earlier. We will do that in a new DataLad dataset so that we can easily publish the result later:

In [None]:
%cd /content
!datalad create histogram
%cd histogram

We're just missing the invocation file, so let's create it:

In [None]:
!echo '{"input_file": "../conp-dataset/projects/BigBrain/3D_Volumes/MNI-ICBM152_Space/nii/full8_400um_2009b_sym.nii.gz", "h": 10}' > invocation.json
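The same invocation file can also be written from Python. This avoids shell-quoting pitfalls when paths contain spaces or quotes. The sketch below writes the same content as the `echo` command above, indented for readability.

```python
import json

invocation = {
    # Path of the 400um volume, relative to the histogram dataset
    "input_file": "../conp-dataset/projects/BigBrain/3D_Volumes/MNI-ICBM152_Space/nii/full8_400um_2009b_sym.nii.gz",
    "h": 10,  # number of histogram bins (fslstats option -h)
}

with open("invocation.json", "w") as f:
    json.dump(invocation, f, indent=2)
```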

#### Launching tools with `bosh exec launch`

Once we know which Boutiques tool we'd like to use and have created a valid invocation, we are ready to launch our tool.
This can be achieved using the `bosh exec launch` command.

In [None]:
!bosh exec launch zenodo.4472771 invocation.json

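The execution produced one file, a text file containing the histogram. Because it was written next to the input image in the CONP dataset, let's move it into our new dataset: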

In [None]:
!mv ../conp-dataset/projects/BigBrain/3D_Volumes/MNI-ICBM152_Space/nii/full8_400um_2009b_sym.txt .
!head full8_400um_2009b_sym.txt

Let's finally plot the histogram:

In [None]:
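from matplotlib import pyplot as plt
import numpy as np
# Load the bin counts written by fslstats
hist_data = np.genfromtxt('full8_400um_2009b_sym.txt')
# One bar per bin; the last value of the file is left out of the plot
plt.bar(np.arange(len(hist_data[:-1])), hist_data[:-1])
plt.xlabel('Bin index')
plt.ylabel('Voxel count')
plt.show()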
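To cross-check such a result, NumPy can compute a comparable histogram. This sketch assumes that `fslstats -h` splits the range between the minimum and maximum intensities into equal-width bins, which is what `numpy.histogram` does when given that range. Counts may differ slightly at bin boundaries. A synthetic array stands in for the image here; with the actual volume, the data could come from `im1.get_fdata()`, using the `im1` image loaded earlier.

```python
import numpy as np


def intensity_histogram(data, nbins=10):
    """Equal-width histogram of all values between the data min and max."""
    values = np.asarray(data, dtype=float).ravel()
    # The last bin includes the maximum, so every value is counted
    counts, edges = np.histogram(values, bins=nbins, range=(values.min(), values.max()))
    return counts, edges


# Synthetic stand-in for an image volume
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(20, 20, 20))
counts, edges = intensity_histogram(data, nbins=10)
print(counts)
print(edges)
```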

#### Other useful features

- Boutiques provides a Python API to integrate Boutiques tools directly into Python code
- Boutiques is integrated into [CBRAIN](http://cbrain.ca/), which uses it to execute tools (see <a href="#part2">Part II</a>)
- Boutiques interfaces with existing neuroimaging pipeline engines such as [Pydra](https://github.com/nipype/pydra) and [TIGR_PURR](https://github.com/TIGRLab/TIGR_PURR)

<div id="adding"/>

### Adding derived data to DataLad

We will publish our newly-created histogram text file as a new DataLad dataset on the [Open Science Framework](https://osf.io/), a research data archive.

To publish our histogram text file as a new DataLad dataset, we first have to save it:

In [None]:
!datalad save

The save command records the new files, including the histogram text file, in a new commit of the dataset. We will then declare an OSF "sibling" of our dataset, to which we will later push our data. Although our dataset will be publicly accessible on OSF, this step requires an authentication token linked to the OSF account in which the dataset will be deposited (ask your instructor for it):

In [None]:
!OSF_TOKEN=<ask_your_instructor> datalad create-sibling-osf --title 'BigBrain histogram' \
  --mode exportonly \
  -s osf-export \
  --description "This carefully acquired data will bring science forward" \
  --public

Check the URL that was created for your dataset. Note that OSF datasets can also be configured to be private. 

We can finally push our dataset to OSF:

In [None]:
!OSF_TOKEN=<ask_your_instructor> git-annex export HEAD --to osf-export-storage

<div id="part2"/>

## Part II: CBRAIN, a web portal to process BigBrain datasets on HPC clusters

CBRAIN is a web portal where you can access the BigBrain dataset and launch tools on it. No command line required! In this tutorial, we will launch `fslstats` on the BigBrain 40$\mu$m blocks.

<a href="https://docs.google.com/presentation/d/1YXFlPkxiGzUyHpbSzpS8X5RviVaosW0vDI0aZ3UlhIE/edit?usp=sharing"><img src="figures/cbrain.png"/>
    </a>

Link to the tutorial slides: [https://docs.google.com/presentation/d/1YXFlPkxiGzUyHpbSzpS8X5RviVaosW0vDI0aZ3UlhIE/edit?usp=sharing](https://docs.google.com/presentation/d/1YXFlPkxiGzUyHpbSzpS8X5RviVaosW0vDI0aZ3UlhIE/edit?usp=sharing)

CBRAIN portal: [https://portal.cbrain.mcgill.ca/session/new](https://portal.cbrain.mcgill.ca/session/new)

### Accessing CBRAIN files with SFTP

We will use SFTP to download result files from the CBRAIN servers.

Detailed documentation on interacting with CBRAIN using SFTP can be found [here](https://portal.cbrain.mcgill.ca/doc/manual/uploading.html).

Let's first create a directory to retrieve our files:

In [None]:
%cd /content
%mkdir cbrain_outputs
%cd cbrain_outputs

The following command initiates an interactive SFTP session with CBRAIN. Once the connection is established, you will have to type:
- Your CBRAIN password, to authenticate
- `get block*`, to download BigBrain blocks
- `exit`, to end the session


In [None]:
!ptyrun sftp -o StrictHostKeyChecking=no -P 7500 YOUR_LOGIN@ace-cbrain-1.cbrain.mcgill.ca

The files are now available locally, in the `cbrain_outputs` directory:

In [None]:
%ls
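To double-check the transfer, a short sketch can list the retrieved files with their sizes. The `block*` pattern mirrors the `get block*` command typed in the SFTP session. The helper name `summarize_downloads` is hypothetical.

```python
from pathlib import Path


def summarize_downloads(directory=".", pattern="block*"):
    """Return (file name, size in bytes) for files matching `pattern`."""
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    return [(p.name, p.stat().st_size) for p in files]


for name, size in summarize_downloads():
    print(f"{name}\t{size / 1e6:.1f} MB")
```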

## Conclusion

We hope you enjoyed the session! Feel free to reach out to us for any additional information:
    
[Natacha Beck](mailto:natacha.beck@mcgill.ca), [Tristan Glatard](mailto:tristan.glatard@concordia.ca), [Valérie Hayot-Sasson](mailto:valeriehayot@gmail.com)
