# TVB PIPELINE: constructing **Brain Network Models** from empirical MRI data
 
## The pipeline will generate structural connectomes, region-average fMRI time series and functional connectomes for simulation within The Virtual Brain

### This pipeline is based on the Apps
* thevirtualbrain/tvb-pipeline-sc:1.0 (dwMRI preprocessing, tractography)
* thevirtualbrain/tvb-pipeline-fmriprep:1.0 (fMRI preprocessing)
* thevirtualbrain/tvb_converter (structural connectome, region-wise fMRI, functional connectivity, TVB input data set)

### In this tutorial you will learn how to ...
* ...upload a BIDS data set to EBRAINS Collab 2.0
* ...import it into a Jupyter notebook
* ...upload it to a supercomputer via PyUnicore
* ...create and execute batch job scripts for the supercomputer that execute the pipeline
* ...download the results to the notebook
* ...from there to the Storage and finally to your computer.

### Authors / Feedback 
michael.schirner@charite.de  
petra.ritter@charite.de  

### Acknowledgments
Thank you to Paul Triebkorn for co-developing the original version of this script on Collab 1.

## 1. Create EBRAINS Collab and upload BIDS data set

1. Input data **must** be in BIDS format to run the following operations. Check out the BIDS format [here](https://bids.neuroimaging.io/) for more info. Hint: there are programs that transform data from other formats into BIDS.
2. Navigate to the [Collabs](https://wiki.ebrains.eu/bin/view/Collabs/) page
3. Click on the "Create a Collab" button and fill out the form.
4. Navigate to the Drive in the leftmost menu. You might need to wait for a few seconds and refresh the page before it is visible.
5. Create one folder for your notebooks and one for your data. In the following we will assume that you called the folders "notebooks" and "data".
6. Download this IPython notebook to your local file system and upload it into your newly created folder "notebooks" in your Collab.
7. Upload your BIDS data set as a zip file into "data".
8. Make sure that the two folders were successfully created and that the pipeline notebook and your input data were successfully uploaded into these folders.
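Step 7 can be done locally before uploading. The following is a minimal sketch, assuming your BIDS data set lives in a local folder; the helper name `zip_bids_dataset` is hypothetical and not part of the pipeline. It zips the folder so that the archive contains a single top-level directory and leaves out macOS metadata (`__MACOSX`, `.DS_Store`), which would otherwise travel to the supercomputer with the data.

```python
import os
import zipfile


def zip_bids_dataset(src_dir, zip_path):
    """Zip src_dir into zip_path with one top-level folder named after src_dir.

    zip_path should lie outside src_dir, otherwise the archive would include itself.
    """
    src_dir = os.path.abspath(src_dir)
    parent = os.path.dirname(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(src_dir):
            # prune macOS metadata folders so os.walk does not descend into them
            dirs[:] = [d for d in dirs if d != "__MACOSX"]
            for name in files:
                if name == ".DS_Store":
                    continue
                full = os.path.join(root, name)
                # store paths relative to the parent, e.g. "dataset/sub-.../..."
                zf.write(full, os.path.relpath(full, parent))
    return zip_path


# Example (hypothetical local paths):
# zip_bids_dataset("dataset", "dataset.zip")
```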

In this example we uploaded the file
```
dataset.zip
```

into the folder
```
Collabs / TVB PIPELINE / data

```

Depending on whether you used a private or a public Collab, the file will end up in one of the following folders in the file system of the EBRAINS JupyterHub at https://lab.ebrains.eu/
 
public_drive = 'drive/Shared with all'   

private_drive = 'drive/Shared with groups'



## 2. Access Collab drive from IPython notebook

To read a file from a Collab's drive, build the path as in the example below. Note that we specify two paths:
* one to the BIDS MRI input data set and
* one to the FreeSurfer license file, which we need to run fmriprep. If you do not already have a license file, you can obtain one by following the instructions here: https://surfer.nmr.mgh.harvard.edu/fswiki/License

To upload the data into your drive go to the main Collab site and click on "Drive" at the top of the left sidebar menu. Note that you must be logged in and you must be the creator/owner of the Collab to be able to do that. Once you are in your Collab's drive, use the buttons to create the folders 'data' and 'FreeSurfer_license' (or name them however you want, just make sure the names are correct when we build the path below) and upload the zipped BIDS data set into the former and the license.txt from FreeSurfer into the latter.

In [1]:
# get path of home folder
import os
home = os.getenv('HOME')

# paths for public and private drives
public_drive = 'drive/Shared with all'
private_drive = 'drive/Shared with groups'

which_drive = private_drive

# Collab name
collab = 'TVB PIPELINE'

# data folder name
path = 'data'

# filename
dataset = 'dataset.zip'

# full path
full_path = os.path.join(home, which_drive, collab, path, dataset)
print(full_path)

# Freesurfer license file folder
fs_path = 'FreeSurfer_license'
full_fs_path = os.path.join(home, which_drive, collab, fs_path, 'license.txt')
print(full_fs_path)

/opt/app-root/src/drive/Shared with groups/TVB PIPELINE/data/dataset.zip
/opt/app-root/src/drive/Shared with groups/TVB PIPELINE/FreeSurfer_license/license.txt


Let's check whether the file is really there

In [2]:
print(os.path.exists(full_path))
print(os.path.exists(full_fs_path))

True
True


If the above cell prints `False`, something went wrong. Note: you can use Bash commands like `ls` and `cd` to navigate the file system and look for the file.
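As an alternative to browsing with `ls` and `cd`, the search can be done from Python. This is a small sketch using only the standard library; the helper name `find_file` is hypothetical. It walks a directory tree and returns every path whose file name matches.

```python
import os


def find_file(root, filename):
    """Return a sorted list of all paths below root whose basename equals filename."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# Example (in the notebook, using the variables defined above):
# find_file(os.path.join(home, "drive"), dataset)
```

If the list is empty, the upload to the drive most likely did not complete; if the file shows up under a different folder, adjust `which_drive`, `collab` or `path` accordingly.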

Now it's time to upload our BIDS data set to the supercomputer. To do so, we create a PyUnicore client.

## 3. Upload BIDS data set to supercomputer
First, we update PyUnicore, if necessary. Then, we import it. Finally, we connect to Piz Daint. To see which other supercomputers are available, and to learn their IDs, run
```
r.site_urls
```

To select a different supercomputer replace the supercomputer identifier string in
```
site_client = r.site('DAINT-CSCS')
```
with your preferred supercomputer.

In [3]:
# use the pyunicore library
!pip install pyunicore --upgrade
import pyunicore.client as unicore_client

tr = unicore_client.Transport(clb_oauth.get_token())
r = unicore_client.Registry(tr, unicore_client._HBP_REGISTRY_URL)
site_client = r.site('DAINT-CSCS')

Requirement already up-to-date: pyunicore in /opt/app-root/lib/python3.6/site-packages (0.5.9)
You should consider upgrading via the '/opt/app-root/bin/python3 -m pip install --upgrade pip' command.


Next, we start an "empty" interactive job to get a workspace on Piz Daint

In [4]:
job_description = {}
job = site_client.new_job(job_description)
storage = job.working_dir
#storage.properties

First, let's check the contents of the folder

In [5]:
storage.listdir()

{}

Good, it's empty. If it weren't empty, we could remove files or folders with `storage.rm(filename)` or `storage.rmdir(foldername)`. Run `help(storage)` for more information.

Before uploading the BIDS ZIP file, let's look in which folder we landed on the supercomputer.

In [6]:
working_dir = (storage.properties['mountPoint']).encode('ascii')
working_dir = working_dir.decode('utf-8') # we get a "byte"-type but need a string type
working_dir

'/scratch/snx3000/unicore/FILESPACE/8728b479-3392-4a2f-a984-e64ba09ca627/'

Good. We will need the path to the working directory later.

Now, let's copy the ZIP file to the supercomputer and check whether it arrived.

In [7]:
storage.upload(input_name = full_path, destination = dataset)
storage.listdir()

{'dataset.zip': PathFile: dataset.zip}

`dataset.zip` is there -- the upload was successful.
Now we need to extract the ZIP file. We will use the program `unzip` for this task.

We use the site client to execute the `unzip` command and then take a quick look at the result folder to see whether `unzip` did its job.

In [8]:
exec_result = site_client.execute("unzip " + working_dir + dataset)

The execute command launches unzip and forwards the result into a new folder. Let's find out the path of the folder...

In [9]:
base_folder = exec_result.working_dir.properties['mountPoint'].encode('ascii')
wd_handle = exec_result.working_dir
base_folder = base_folder.decode('utf-8')
base_folder

'/scratch/snx3000/unicore/FILESPACE/2dc283ff-6984-44f3-95b8-2ab7bd7bb44f/'

...and its contents.

In [10]:
exec_result.working_dir.listdir()

{'UNICORE_SCRIPT_EXIT_CODE': PathFile: UNICORE_SCRIPT_EXIT_CODE,
 'UNICORE_SCRIPT_PID': PathFile: UNICORE_SCRIPT_PID,
 'stderr': PathFile: stderr,
 'dataset/': PathDir: dataset/,
 'stdout': PathFile: stdout,
 '__MACOSX/': PathDir: __MACOSX/}

`unzip` worked: the folder `dataset` has been created (along with some UNICORE-related output files that capture everything written to stdout/stderr during the command).  
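The command string passed to `execute` above is built by plain concatenation, which breaks if a folder or file name contains spaces. The following sketch (the helper name `build_unzip_command` is hypothetical) quotes the path with `shlex.quote` and optionally adds `unzip`'s `-x` option to skip entries such as the `__MACOSX/` folder seen in the listing.

```python
import posixpath
import shlex


def build_unzip_command(working_dir, archive, exclude=None):
    """Build a shell-safe unzip command for an archive inside working_dir."""
    archive_path = posixpath.join(working_dir, archive)
    cmd = "unzip " + shlex.quote(archive_path)
    if exclude:
        # -x tells unzip which archive members to leave out
        cmd += " -x " + shlex.quote(exclude)
    return cmd


print(build_unzip_command("/scratch/example/", "dataset.zip", exclude="__MACOSX/*"))
```

The resulting string could be passed to `site_client.execute(...)` in place of the concatenated one.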

## 4. Define environment variables

We have a working directory on the supercomputer and our data is there. Let's define some environment variables with important folder paths that we can use to call the BIDS Apps.

In [11]:
tvb_output = "tvb_converter_workdir"# output folder name

input_dir = base_folder + "dataset" # the name of the folder extracted from the ZIP file
output_dir = base_folder + tvb_output # full path to output folder
mrtrix_output = output_dir + "/mrtrix_output"

fmriprep_output = output_dir + "/fmriprep_output"
fmriprep_workdir = fmriprep_output + "/tmp"
tvb_output = output_dir + "/TVB_output"
tvb_workdir = tvb_output + "/tmp"

participant_label = "CON03" # participant_label of BIDS dataset
parcellation = "desikan" # parcellation atlas for SC and FC -- check out MRtrix3_connectome github page for available options
n_cpus = "36" # how many CPUs does your HPC node have? We'll set the number of parallel threads accordingly

task_name="ArchiSocial" # name of task fmri, as specified in the dataset
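The folder layout defined above can also be collected in one place. This is a sketch only, assuming the same folder names as in the cell above; the function name `build_pipeline_paths` is hypothetical. Using `posixpath.join` instead of `+` avoids missing or doubled slashes, regardless of whether `base_folder` ends with `/`.

```python
import posixpath


def build_pipeline_paths(base_folder, dataset_dir="dataset", workdir="tvb_converter_workdir"):
    """Return the remote folder layout used by the pipeline as a dictionary."""
    output_dir = posixpath.join(base_folder, workdir)
    fmriprep_output = posixpath.join(output_dir, "fmriprep_output")
    tvb_output = posixpath.join(output_dir, "TVB_output")
    return {
        "input_dir": posixpath.join(base_folder, dataset_dir),
        "output_dir": output_dir,
        "mrtrix_output": posixpath.join(output_dir, "mrtrix_output"),
        "fmriprep_output": fmriprep_output,
        "fmriprep_workdir": posixpath.join(fmriprep_output, "tmp"),
        "tvb_output": tvb_output,
        "tvb_workdir": posixpath.join(tvb_output, "tmp"),
    }


paths = build_pipeline_paths("/scratch/example/")
for name, value in paths.items():
    print(name, "->", value)
```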

## 5. Run BIDS App `mrtrix3_connectome` to generate a structural connectome

Here we create a SLURM batch job script for launching the `mrtrix3_connectome` BIDS App. Instead of starting a job on the batch system directly, we use PyUnicore to run a command on the login node, which in turn submits a job to the batch system. This gives us greater flexibility in configuring the job, requires less PyUnicore-specific knowledge (although it's great!), and still works if PyUnicore lacks bindings for a particular job manager. Note that before we run the container, we make sure that the image is up to date or, if it does not exist yet, that it gets pulled for the first time.

Below is a brief outline of the `sarus run` command. For a great in-depth tutorial check out the help pages of the Swiss CSCS supercomputing site: https://user.cscs.ch/tools/containers/

`sarus run <container_name>` is the standard way of running a container. Here, we additionally use the `--mount` option to bind-mount the input and output folders directly into top-level directories of the container's file system (`/BIDS_dataset` and `/mrtrix3_out` in the job script below).
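To make the structure of such a command easier to see, here is a small sketch that assembles it from its parts. The helper names `bind_mount` and `sarus_run_command` are hypothetical; the option format is the one used in the job script of this tutorial, and the example paths are placeholders.

```python
def bind_mount(source, target):
    """Format one bind-mount option in the form used by the job script in this tutorial."""
    return "--mount=type=bind,source={},target={}".format(source, target)


def sarus_run_command(image, mounts, args=""):
    """Assemble 'srun sarus run <mounts> <image> <args>' as a single string."""
    parts = ["srun", "sarus", "run"]
    parts += [bind_mount(src, dst) for src, dst in mounts]
    parts.append(image)
    if args:
        parts.append(args)
    return " ".join(parts)


print(sarus_run_command(
    "thevirtualbrain/tvb-pipeline-sc:1.0",
    [("/scratch/example/dataset", "/BIDS_dataset"),
     ("/scratch/example/mrtrix_output", "/mrtrix3_out")],
))
```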

For in-depth instructions for the BIDS App `mrtrix3_connectome` please refer to its help page at Docker hub:  
https://hub.docker.com/r/bids/mrtrix3_connectome/


For an in-depth discussion of Sarus usage, check out this documentation:  
https://user.cscs.ch/tools/containers/sarus/

**Please note:** There seems to be a strange bug in Sarus: sometimes the pull command `srun -C mc sarus pull thevirtualbrain/tvb-pipeline-sc:1.0` works and sometimes it does not. It is often easier to log into your supercomputing account via SSH and run the pull command in the shell. Once the image with the correct tag has been pulled, it does not need to be pulled again. Sometimes the pull succeeds after the command has been issued several times.

In [12]:
# ADJUSTABLE PARAMETERS
################################################
wall_time = "23:59:00" # ADJUST wall time of job
job_script = "job_script"
################################################

job_script_path = os.path.join(home, which_drive, collab, path, job_script)
print(job_script_path)

# create job_script with bash commands
# this script will get forwarded to the supercomputer and there run the pipeline
with open(job_script_path, "w") as f:
    f.write("#!/bin/bash -l\n") 
    f.write("#SBATCH --time=" + wall_time + "\n")
    f.write("#SBATCH --output=slurm-" + job_script + ".out\n")
    f.write("#SBATCH --nodes=1\n")
    f.write("#SBATCH --ntasks-per-core=1\n")    
    f.write("#SBATCH --ntasks-per-node=1\n")
    f.write("#SBATCH --cpus-per-task=" + n_cpus + "\n")
    f.write("#SBATCH --partition=normal\n")
    f.write("#SBATCH --constraint=mc\n")
    f.write("#SBATCH --hint=nomultithread\n") # disable hyperthreading such that all cores become available for multithreading
    f.write("export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n")
    f.write("module load /apps/daint/UES/easybuild/modulefiles/daint-mc\n")
    f.write("module load /apps/daint/system/modulefiles/sarus/1.1.0\n\n")
    f.write("mkdir -p " + output_dir + " " + mrtrix_output + " " + fmriprep_output + " " + fmriprep_workdir + " " + tvb_output + " " + tvb_workdir + "\n\n")    
    f.write("srun sarus pull thevirtualbrain/tvb-pipeline-sc:1.0\n\n")
    
    f.write("srun sarus run " + 
            "--mount=type=bind,source=$HOME,target=$HOME " + 
            "--mount=type=bind,source=" + input_dir + ",target=/BIDS_dataset " +
            "--mount=type=bind,source=" + mrtrix_output + ",target=/mrtrix3_out " +
            "--entrypoint " + # necessary for sarus, to overwrite default entrypoint
            "thevirtualbrain/tvb-pipeline-sc:1.0 python -c " + 
                "\"from mrtrix3 import app; app.cleanup=False; import sys; sys.argv=" + 
                    "'/mrtrix3_connectome.py /BIDS_dataset /mrtrix3_out participant1 " + 
                    "--participant_label " + participant_label + " -skip "
                    " --output_verbosity 2 --template_reg ants --n_cpus $SLURM_CPUS_PER_TASK --debug'" 
                    ".split(); execfile('/mrtrix3_connectome.py')\"")
    
    
#        f.write("srun sarus run " + 
#            "--mount=type=bind,source=$HOME,target=$HOME " + 
#            "--mount=type=bind,source=" + input_dir + ",target=/BIDS_dataset " +
#            "--mount=type=bind,source=" + mrtrix_output + ",target=/mrtrix3_out " +
#            "--entrypoint " + # necessary for sarus, to overwrite default entrypoint
#            "thevirtualbrain/tvb-pipeline-sc:1.0 python -c " + 
#                "\"from mrtrix3 import app; app.cleanup=False; import sys; sys.argv=" + 
#                    "'/mrtrix3_connectome.py /BIDS_dataset /mrtrix3_out participant1 " + 
#                    "--participant_label " + participant_label + " --parcellation " + parcellation +
#                    " --output_verbosity 2 --template_reg ants --n_cpus $SLURM_CPUS_PER_TASK --debug'" 
#                    ".split(); execfile('/mrtrix3_connectome.py')\"")

/opt/app-root/src/drive/Shared with groups/TVB PIPELINE/data/job_script


Check the job script

In [13]:
with open(job_script_path, 'r') as fin:
    print(fin.read())

#!/bin/bash -l
#SBATCH --time=23:59:00
#SBATCH --output=slurm-job_script.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --partition=normal
#SBATCH --constraint=mc
#SBATCH --hint=nomultithread
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load /apps/daint/UES/easybuild/modulefiles/daint-mc
module load /apps/daint/system/modulefiles/sarus/1.1.0

mkdir -p /scratch/snx3000/unicore/FILESPACE/2dc283ff-6984-44f3-95b8-2ab7bd7bb44f/tvb_converter_workdir /scratch/snx3000/unicore/FILESPACE/2dc283ff-6984-44f3-95b8-2ab7bd7bb44f/tvb_converter_workdir/mrtrix_output /scratch/snx3000/unicore/FILESPACE/2dc283ff-6984-44f3-95b8-2ab7bd7bb44f/tvb_converter_workdir/fmriprep_output /scratch/snx3000/unicore/FILESPACE/2dc283ff-6984-44f3-95b8-2ab7bd7bb44f/tvb_converter_workdir/fmriprep_output/tmp /scratch/snx3000/unicore/FILESPACE/2dc283ff-6984-44f3-95b8-2ab7bd7bb44f/tvb_converter_workdir/TVB_output /scratch/snx3000/unicore/FILESPACE/2dc283ff

Looking good!
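The `#SBATCH` header is the same for every job script in this tutorial apart from a few values, so it can be generated from a list of options. The sketch below is an illustration only; the function name `sbatch_header` is hypothetical, and the option names are the ones used in the script above.

```python
def sbatch_header(options, shebang="#!/bin/bash -l"):
    """Render a shebang line followed by one '#SBATCH --key=value' line per option."""
    lines = [shebang]
    for key, value in options:
        lines.append("#SBATCH --{}={}".format(key, value))
    return "\n".join(lines) + "\n"


header = sbatch_header([
    ("time", "23:59:00"),
    ("nodes", 1),
    ("ntasks-per-node", 1),
    ("cpus-per-task", 36),
    ("partition", "normal"),
    ("constraint", "mc"),
])
print(header)
```

Writing `header` to the file first and appending the `module load` and `srun` lines afterwards keeps the three job scripts consistent with each other.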

## 6. Upload SLURM script to supercomputer

Here we use the same upload function that we used previously for the BIDS data set, followed by a quick check whether the file arrived.

In [14]:
wd_handle.upload(input_name=job_script_path, destination = job_script)
wd_handle.listdir()

{'UNICORE_SCRIPT_EXIT_CODE': PathFile: UNICORE_SCRIPT_EXIT_CODE,
 'UNICORE_SCRIPT_PID': PathFile: UNICORE_SCRIPT_PID,
 'stderr': PathFile: stderr,
 'job_script': PathFile: job_script,
 'dataset/': PathDir: dataset/,
 'stdout': PathFile: stdout,
 '__MACOSX/': PathDir: __MACOSX/}

If a file with the name stored in the variable `job_script` exists, the upload worked.

## 7. Launching the pipeline on the supercomputer

Now we will run the first of the three BIDS Apps: mrtrix3_connectome.  
We do this by executing the SLURM command `sbatch <job_script>`, which evaluates our batch file and turns it into a job that is added to the queue.
After submitting the job, we extract the working directory of this submission (which will again be a new directory).

In [15]:
mrtrix_job = site_client.execute("sbatch " + base_folder + job_script)
wd_mrtrix_res = mrtrix_job.working_dir.properties['mountPoint'].encode('ascii')
wd_mrtrix_res = wd_mrtrix_res.decode('utf-8')
wd_mrtrix_res

'/scratch/snx3000/unicore/FILESPACE/0088f7c3-4077-47ca-92af-ee80eaeca747/'

Note that we now have two important folder handles and associated folder paths. The following table gives an overview of the handles, their associated paths, and their contents.


Handle | Path variable | Description
:---: | :---: | :---:
`wd_handle` | `base_folder` | BIDS input, pipeline output
`mrtrix_job` | `wd_mrtrix_res` | SLURM job meta output


You can use the handles and path variables to access files and folders contained in the respective folders, which may be helpful, e.g. for debugging or to download other results.
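For debugging it can help to know the SLURM job ID. `sbatch` normally reports it with a line of the form `Submitted batch job <id>`, which we assume ends up in the `stdout` file of the job's working directory. The sketch below only parses such a text; the function name `parse_sbatch_job_id` is hypothetical, and how you read the `stdout` file is left open.

```python
import re


def parse_sbatch_job_id(text):
    """Return the job ID from sbatch's confirmation message, or None if absent."""
    match = re.search(r"Submitted batch job (\d+)", text)
    return int(match.group(1)) if match else None


print(parse_sbatch_job_id("Submitted batch job 20156225\n"))
```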

## 8. Checking the structural connectome

The MRtrix pipeline takes some time, depending on parameters like: number of generated tracks, parcellation, resolution of imaging data, number of parallel threads, etc.

With our test dataset and configuration it took around 5 hours.  

Let's check whether it finished successfully by inspecting the last lines of the SLURM meta output, which gives us information related to the job execution.

Note that after the job was submitted to the queue it usually takes a while until the job starts. Until then you will receive a "404 Client Error: Not found for URL" error message.

In [17]:
slurm_output = mrtrix_job.working_dir.stat("slurm-" + job_script + ".out").raw().readlines()
slurm_output[-40:]

[b'> extracting     : "/scratch/snx3000/bp000225/.sarus/cache/sha256:1898ce6247527866b535f3e91cd5e964dcf012e4dfb9569a9688a52bf543b7f1.tar"\n',
 b'> extracting     : "/scratch/snx3000/bp000225/.sarus/cache/sha256:e88b3011503fa6b7d195696c2428a2456bb03c4afb560a986e7d05b2e7c820ab.tar"\n',
 b'> extracting     : "/scratch/snx3000/bp000225/.sarus/cache/sha256:96c7fcaee4deef3f52e874f4b434488d7c0553d49a807ea5822bb3f84b7e613c.tar"\n',
 b'> extracting     : "/scratch/snx3000/bp000225/.sarus/cache/sha256:60644d31c59d5a22624716dbe68ccf89dc38b52312a153d67ff07fa0988b6159.tar"\n',
 b'> extracting     : "/scratch/snx3000/bp000225/.sarus/cache/sha256:9d3ce4e5876875d880dcb19295e404876a05723fbf39882c8676399a6176e922.tar"\n',
 b'> extracting     : "/scratch/snx3000/bp000225/.sarus/cache/sha256:231cad235466b6d86dbe6946b10afcbca7504e2d93e1fda2002fc478916f7e16.tar"\n',
 b'> extracting     : "/scratch/snx3000/bp000225/.sarus/cache/sha256:b2a3cf8961c46db36e486825f2ed2a15a685d18c2ef44b21ed3126669019e45f.tar"\n',

If the job finished successfully, the last few lines of the output should contain a batch job summary result, looking like this:

'mrtrix3_connectome.py: Contents of temporary directory kept; location: /mrtrix3_out/mrtrix3_connectome.py-tmp-2BNQRE/\n',  
 ' \n',  
 'Batch Job Summary Report for Job "job_script" (20156225) on daint\n',  
 '-----------------------------------------------------------------------------------------------------\n',  
 '             Submit            Eligible               Start                 End    Elapsed  Timelimit \n',  
 '------------------- ------------------- ------------------- ------------------- ---------- ---------- \n',  
 '2020-02-11T09:13:50 2020-02-11T09:13:50 2020-02-11T09:13:50 2020-02-11T13:45:58   04:32:08   23:59:00 \n',  
 '-----------------------------------------------------------------------------------------------------\n',  
 'Username    Account     Partition   NNodes   Energy\n',  
 '----------  ----------  ----------  ------  --------------\n',  
 'bp000225    ich012      normal           1        0 joules\n',  
 ' \n',  
 'This job did not utilize any GPUs\n',  
 ' \n',  
 '----------------------------------------------------------\n',  
 'Scratch File System        Files       Quota\n',  
 '--------------------  ----------  ----------\n',  
 '/scratch/snx3000            1207     1000000\n',  
 ' \n']  
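Instead of reading the tail of the output by eye, the check can be automated. The following sketch assumes that the output ends with a report like the one shown above (this report format is what we observed on Piz Daint and may differ on other systems); the function name `job_summary` is hypothetical. It accepts the list of lines as returned by `readlines()`, as bytes or strings.

```python
def job_summary(lines):
    """Return (finished, elapsed) extracted from SLURM output lines."""
    text = [l.decode("utf-8", "replace") if isinstance(l, bytes) else l for l in lines]
    finished = any("Batch Job Summary Report" in l for l in text)
    elapsed = None
    for i, line in enumerate(text):
        if "Elapsed" in line and "Timelimit" in line and i + 2 < len(text):
            # the values sit two lines below the column names (after a dashed rule)
            header = line.split()
            fields = text[i + 2].split()
            if len(fields) == len(header):
                elapsed = fields[header.index("Elapsed")]
    return finished, elapsed


# Example (in the notebook): finished, elapsed = job_summary(slurm_output)
```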
 
 



Once we have made sure that the job has finished, we fetch a handle to the result folder where the final connectome matrix is stored. Relative to the base directory of `wd_handle`, the folder path has the format

tvb_converter_workdir/mrtrix_output/sub-<participant_label>/connectome

In [20]:
connectome_folder = "tvb_converter_workdir/mrtrix_output/sub-" + participant_label + "/connectome"
SC_folder = wd_handle.stat(connectome_folder)

## 9. Re-use FreeSurfer's `recon-all` output

If you used any of the parcellations {"desikan", "destrieux", "hcpmmp1"} you can re-use recon-all output during the `fmriprep` step. Otherwise, `fmriprep` will run `recon-all` again, which increases computation time.  

Using the specified demo filenames, FreeSurfer output is stored in the folder 

```
<base_folder>/tvb_converter_workdir/mrtrix_output/mrtrix3_connectome.py-tmp-<ID string>/freesurfer
```

Note how the MRtrix App generates a folder 

```
mrtrix3_connectome.py-tmp-<ID string>
```

This folder is created at runtime by the image, so we don't know its name until it's there. So, let's first find out the folder name.

In [21]:
mrtrix_output_content = wd_handle.contents(path="/tvb_converter_workdir/mrtrix_output") # get contents of folder
mrtrix_output_dir = [i for i in mrtrix_output_content['content'] if "mrtrix3_connectome.py-tmp-" in i][0].encode('ascii') # get mrtrix3_connectome.py-tmp-<ID string> folder name
mrtrix_output_dir = mrtrix_output_dir.decode('utf-8')
mrtrix_output_dir

'/tvb_converter_workdir/mrtrix_output/mrtrix3_connectome.py-tmp-ASX3DR/'
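The list comprehension above takes the first match and fails with an `IndexError` if there is none. A slightly more defensive variant is sketched below; the function name `find_mrtrix_tmp_dir` is hypothetical. It reports a clear error if no temporary folder exists yet, or if several are present, for example after the App was run more than once.

```python
def find_mrtrix_tmp_dir(paths, marker="mrtrix3_connectome.py-tmp-"):
    """Return the single path containing marker; raise if there is none or several."""
    matches = sorted(p for p in paths if marker in p)
    if not matches:
        raise FileNotFoundError("no folder containing %r found" % marker)
    if len(matches) > 1:
        raise RuntimeError("expected one temporary folder, found: %s" % ", ".join(matches))
    return matches[0]


# Example (in the notebook): find_mrtrix_tmp_dir(mrtrix_output_content['content'])
```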

Now we use the MRtrix results folder name to build the FreeSurfer folder path.

In [22]:
freesurfer_path = mrtrix_output_dir + "freesurfer"
freesurfer_path

'/tvb_converter_workdir/mrtrix_output/mrtrix3_connectome.py-tmp-ASX3DR/freesurfer'

Good. Now we build the destination path where we copy the freesurfer folder. We copy it to the `fmriprep_output` folder, in a subfolder called  

`sub-<participant_label>`.  


Due to some technical restrictions of PyUnicore (or my inability to grasp a better solution) we first copy the FreeSurfer folder (in the mrtrix3 App the FreeSurfer subject name is just "freesurfer") and then rename it to have the same name as our subject.

First, create the target folder and then copy the `freesurfer` folder into the `fmriprep` output folder...

In [23]:
freesurfer_target = "/tvb_converter_workdir/fmriprep_output/freesurfer"
wd_handle.mkdir(freesurfer_target)
wd_handle.copy(source=freesurfer_path, target=freesurfer_target)

<Response [204]>

Second, rename the freesurfer folder from subject name `freesurfer` to our current subject name.

In [25]:
tmp_source = freesurfer_target + "/freesurfer"
freesurfer_target = freesurfer_target + "/sub-" + participant_label
wd_handle.rename(source=tmp_source, target=freesurfer_target)

<Response [204]>

In [26]:
wd_handle.path_urls

{'action:rename': 'https://brissago.cscs.ch:8080/DAINT-CSCS/rest/core/storages/d385c828-d1fd-4d5f-80e0-4ac310e03802-uspace/actions/rename',
 'self': 'https://brissago.cscs.ch:8080/DAINT-CSCS/rest/core/storages/d385c828-d1fd-4d5f-80e0-4ac310e03802-uspace',
 'files': 'https://brissago.cscs.ch:8080/DAINT-CSCS/rest/core/storages/d385c828-d1fd-4d5f-80e0-4ac310e03802-uspace/files',
 'action:copy': 'https://brissago.cscs.ch:8080/DAINT-CSCS/rest/core/storages/d385c828-d1fd-4d5f-80e0-4ac310e03802-uspace/actions/copy'}

`<Response [204]>` (HTTP status "No Content") from the above operations indicates success. But, to be sure, let's check the folder content.

In [27]:
checkdir = wd_handle.contents(path=freesurfer_target)
checkdir['content']

{'/tvb_converter_workdir/fmriprep_output/freesurfer/sub-01/tmp/': {'owner': 'bp000225',
  'size': 4096,
  'permissions': 'rwx------',
  'lastAccessed': '2020-05-28T17:00:04+0200',
  'isDirectory': True,
  'group': 'bp0'},
 '/tvb_converter_workdir/fmriprep_output/freesurfer/sub-01/trash/': {'owner': 'bp000225',
  'size': 4096,
  'permissions': 'rwx------',
  'lastAccessed': '2020-05-28T17:00:08+0200',
  'isDirectory': True,
  'group': 'bp0'},
 '/tvb_converter_workdir/fmriprep_output/freesurfer/sub-01/scripts/': {'owner': 'bp000225',
  'size': 4096,
  'permissions': 'rwx------',
  'lastAccessed': '2020-05-28T16:59:58+0200',
  'isDirectory': True,
  'group': 'bp0'},
 '/tvb_converter_workdir/fmriprep_output/freesurfer/sub-01/stats/': {'owner': 'bp000225',
  'size': 4096,
  'permissions': 'rwx------',
  'lastAccessed': '2020-05-28T17:00:08+0200',
  'isDirectory': True,
  'group': 'bp0'},
 '/tvb_converter_workdir/fmriprep_output/freesurfer/sub-01/label/': {'owner': 'bp000225',
  'size': 4096

This looks like the typical `recon-all` output folder schema. Nice!  
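The visual check can be complemented by a small test that the subfolders `recon-all` normally creates are present. This is a sketch under the assumption that `label`, `mri`, `scripts`, `stats` and `surf` are the folders of interest; the function name `missing_recon_all_dirs` is hypothetical.

```python
import posixpath

# subfolders that a recon-all subject folder is expected to contain (assumed subset)
EXPECTED_RECON_ALL_DIRS = ("label", "mri", "scripts", "stats", "surf")


def missing_recon_all_dirs(paths, expected=EXPECTED_RECON_ALL_DIRS):
    """Return the expected subfolder names that do not occur among the given paths."""
    present = {posixpath.basename(p.rstrip("/")) for p in paths}
    return [name for name in expected if name not in present]


# Example (in the notebook): missing_recon_all_dirs(checkdir['content'])
```

An empty list means that all expected subfolders were found.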

## 10. Upload FreeSurfer license file

What we need to do now, in order to finally preprocess the fMRI data, is to upload a FreeSurfer license file. Many users will already have obtained a FreeSurfer license while downloading/installing FreeSurfer. It is usually located in the FreeSurfer main folder and called `license.txt`. For more information on how to obtain a FreeSurfer license file, see:  

https://surfer.nmr.mgh.harvard.edu/fswiki/License

Note: the mrtrix3_connectome container already contains a FreeSurfer license, but it is a person-specific license from the container developer. To comply with the usage terms, please generate and upload your own license file as outlined above.

So you generated your license file. Let's upload it. To upload, we perform the same operations as we did to upload the MRI data.

In [28]:
# upload into base_folder
wd_handle.upload(input_name = full_fs_path, destination = 'license.txt')
wd_handle.listdir()
# create file path
fs_license = base_folder + "license.txt"

## 11. Run BIDS App `fmriprep` for fMRI preprocessing

Ok, everything is in place, let's create a batch file for `fmriprep`.

In [29]:
# ADJUSTABLE PARAMETERS
################################################
job_script_fmriprep = "job_script_fmriprep" # name of the job script file
wall_time = "23:59:00"
aroma_melodic_dimensionality = "-120"
################################################

with open(job_script_fmriprep, "w") as f:
    f.write("#!/bin/bash -l\n")  
    f.write("#SBATCH --time=" + wall_time + "\n")
    f.write("#SBATCH --output=slurm-" + job_script_fmriprep + ".out\n")
    f.write("#SBATCH --nodes=1\n")
    f.write("#SBATCH --ntasks-per-core=1\n")    
    f.write("#SBATCH --ntasks-per-node=1\n")
    f.write("#SBATCH --cpus-per-task=" + n_cpus + "\n")
    f.write("#SBATCH --partition=normal\n")
    f.write("#SBATCH --constraint=mc\n")
    f.write("#SBATCH --hint=nomultithread\n") # disable hyperthreading such that all cores become available for multithreading
    f.write("export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n")
    f.write("module load /apps/daint/UES/easybuild/modulefiles/daint-mc\n")
    f.write("module load /apps/daint/system/modulefiles/sarus/1.1.0\n\n")
    f.write("srun sarus pull thevirtualbrain/tvb-pipeline-fmriprep\n")
    f.write("srun sarus run " + 
            "--mount=type=bind,source=$HOME,destination=$HOME " +
            "--mount=type=bind,source=" + input_dir + ",destination=/dataset " +
            "--mount=type=bind,source=" + fmriprep_output + ",destination=/fmriprep_out/ " +
            "--mount=type=bind,source=" + fmriprep_workdir + ",destination=/fmriprep_workdir/ " +
            "--mount=type=bind,source=" + fs_license + ",destination=/license.txt " +
            "thevirtualbrain/tvb-pipeline-fmriprep " +
            "/dataset /fmriprep_out/ participant " +
            "--use-aroma --bold2t1w-dof 6 --nthreads $SLURM_CPUS_PER_TASK " +
            "--omp-nthreads $SLURM_CPUS_PER_TASK " +
            "--output-spaces T1w MNI152NLin6Asym:res-2 fsaverage5 " +
            "--participant_label " + participant_label + " " +
            "--fs-license-file /license.txt " +
            "--aroma-melodic-dimensionality " + aroma_melodic_dimensionality + " " +
            "-w /fmriprep_workdir\n")

Check whether it looks good

In [30]:
!cat job_script_fmriprep

#!/bin/bash -l
#SBATCH --time=23:59:00
#SBATCH --output=slurm-job_script_fmriprep.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --partition=normal
#SBATCH --constraint=mc
#SBATCH --hint=nomultithread
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load /apps/daint/UES/easybuild/modulefiles/daint-mc
module load /apps/daint/system/modulefiles/sarus/1.1.0

srun sarus pull poldracklab/fmriprep
srun sarus run --mount=type=bind,source=$HOME,destination=$HOME --mount=type=bind,source=/scratch/snx3000/unicore/FILESPACE/d385c828-d1fd-4d5f-80e0-4ac310e03802/dataset,destination=/dataset --mount=type=bind,source=/scratch/snx3000/unicore/FILESPACE/d385c828-d1fd-4d5f-80e0-4ac310e03802/tvb_converter_workdir/fmriprep_output,destination=/fmriprep_out/ --mount=type=bind,source=/scratch/snx3000/unicore/FILESPACE/d385c828-d1fd-4d5f-80e0-4ac310e03802/tvb_converter_workdir/fmriprep_output/tmp,destination=/fmriprep_workdir/ --mount=type=bi
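Long commands built with `+` are easy to get wrong, because a forgotten space silently merges two options. One way around this is to keep the arguments in a list and join them at the end. The sketch below is illustrative only; the helper name `join_command` is hypothetical, and the arguments are a subset of those used in the cell above.

```python
def join_command(tokens):
    """Join command tokens with single spaces, dropping empty entries."""
    return " ".join(str(t) for t in tokens if str(t) != "")


fmriprep_args = join_command([
    "/dataset", "/fmriprep_out/", "participant",
    "--use-aroma",
    "--bold2t1w-dof", 6,
    "--participant_label", "CON03",
    "--fs-license-file", "/license.txt",
    "-w", "/fmriprep_workdir",
])
print(fmriprep_args)
```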

Upload it

In [31]:
wd_handle.upload(input_name=job_script_fmriprep, destination = job_script_fmriprep)
wd_handle.listdir()

{'UNICORE_SCRIPT_EXIT_CODE': PathFile: UNICORE_SCRIPT_EXIT_CODE,
 'UNICORE_SCRIPT_PID': PathFile: UNICORE_SCRIPT_PID,
 'stderr': PathFile: stderr,
 'license.txt': PathFile: license.txt,
 'job_script': PathFile: job_script,
 'dataset/': PathDir: dataset/,
 'job_script_fmriprep': PathFile: job_script_fmriprep,
 'stdout': PathFile: stdout,
 '__MACOSX/': PathDir: __MACOSX/,
 'tvb_converter_workdir/': PathDir: tvb_converter_workdir/}

The file is there, upload worked. Time to submit the batch job to the queue.

In [32]:
fmriprep_job = site_client.execute("sbatch " + base_folder + job_script_fmriprep)
wd_fmriprep = fmriprep_job.working_dir.properties['mountPoint'].encode('ascii')
wd_fmriprep

b'/scratch/snx3000/unicore/FILESPACE/cb8f84ef-b2c7-4a01-a2ad-8735a713ec99/'

Let's update the table to get a better overview of the handles and folders.

Handle | Path variable | Description
:---: | :---: | :---:
`wd_handle` | `base_folder` | BIDS input, pipeline output
`mrtrix_job` | `wd_mrtrix_res` | SLURM job meta output MRtrix3
`fmriprep_job` | `wd_fmriprep` | SLURM job meta output fmriprep


Now let's check whether fmriprep is (still) running or finished.

**Please note:** There seems to be a strange bug in Sarus: sometimes the command `srun sarus pull thevirtualbrain/tvb-pipeline-fmriprep` works and sometimes it does not. It is often easier to log into your supercomputing account via SSH and run the pull command in the shell. Once the container with the "latest" tag has been pulled, it does not need to be pulled again.

In [35]:
slurm_output = fmriprep_job.working_dir.stat("slurm-" + job_script_fmriprep + ".out").raw().readlines()
slurm_output[-40:]

[b'# image            : index.docker.io/poldracklab/fmriprep:latest\n',
 b'# cache directory  : "/scratch/snx3000/bp000225/.sarus/cache"\n',
 b'# temp directory   : "/tmp"\n',
 b'# images directory : "/scratch/snx3000/bp000225/.sarus/images"\n',
 b'> save image layers ...\n',
 b'> found in cache : sha256:b859591002b6d9f24b9df7427b56290084866a327b33c92d5a43aa13a6327b49\n',
 b'> found in cache : sha256:8de46f1b04d0fae16915dcc8896d1af6354601a8602be7435f6c5a3446b6830c\n',
 b'> found in cache : sha256:b21d5cc69c8fb02df67725712134ad6d7123d9cea585dc524c5ba03734435000\n',
 b'> found in cache : sha256:a0f74950c55bb6b072add5997c5b4693182d82bcbd58f000cb1bf107419ce0e4\n',
 b'> found in cache : sha256:a82cc7ba20ad040d1ae6b0e1e53bee7fc72d53707b992fb6f8f1e5a7aac35a2a\n',
 b'> found in cache : sha256:42da0942cc0e3de8b86ba649f506f3a0c87533451756c51d7b8bf181ba94a2eb\n',
 b'> found in cache : sha256:7abb3d638ec4853b85fef7192689023a3d394b78198bd8e7894e3857106687a7\n',
 b'> found in cache : sha256:19197c55

You know the drill: unless there is the job summary at the end, the job hasn't finished.
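Rather than re-running the cell by hand, the check can be repeated in a loop. The following sketch is generic and makes no assumption about PyUnicore: `fetch` is any function that returns the output lines (and may raise while the job is still queued, like the "404" error mentioned earlier), and `is_done` decides whether the job has finished. The name `poll_until` and its parameters are hypothetical.

```python
import time


def poll_until(fetch, is_done, interval=60.0, max_tries=10, sleep=time.sleep):
    """Call fetch() up to max_tries times; return its result once is_done(result) is true."""
    for attempt in range(max_tries):
        try:
            result = fetch()
        except Exception:
            # e.g. the output file does not exist yet because the job has not started
            result = None
        if result is not None and is_done(result):
            return result
        if attempt < max_tries - 1:
            sleep(interval)
    return None
```

In the notebook, `fetch` could wrap the `stat(...).raw().readlines()` call from the cell above, and `is_done` could look for the job summary in the returned lines. A return value of `None` means the job had not finished within the allotted number of tries.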

## 12. Select `recon-all` output

Either of the two Apps, `mrtrix3_connectome` and `fmriprep`, may have run `recon-all` (depending on which parcellation you have chosen). Here, we set the appropriate folder. In this example we used the parcellation `desikan`, and `mrtrix3_connectome` generated the `recon-all` output, which we later re-used in `fmriprep`. So let's configure accordingly. The two alternatives are given below in separate cells; run only one of them.

In [52]:
# Option 1: recon_all run by fmriprep
recon_all_dir = base_folder + freesurfer_target + "/freesurfer/"
recon_all_subject_name = "sub-" + participant_label

In [53]:
# Option 2: recon_all run by MRtrix3
recon_all_dir = base_folder + freesurfer_path[:-10] # cut-off the last "freesurfer"
recon_all_subject_name = "freesurfer"

In [54]:
recon_all_dir

'/scratch/snx3000/unicore/FILESPACE/3b57e398-e641-4080-8766-0f94fa3fe913//tvb_converter_workdir/mrtrix_output/mrtrix3_connectome.py-tmp-R3O573/'
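
The slice `[:-10]` in Option 2 works because `"freesurfer"` is exactly ten characters long. A more explicit variant, sketched below with made-up example paths, takes the parent directory instead and also collapses the double slash visible in the output above. `parent_dir` is a hypothetical helper, and the values of `base_folder` and `freesurfer_path` are placeholders, not the ones from this run.

```python
import posixpath


def parent_dir(path):
    """Return the parent directory of `path` with a trailing slash."""
    return posixpath.dirname(path.rstrip("/")) + "/"


# Placeholder values for illustration only
base_folder = "/scratch/example/"
freesurfer_path = "/tvb_converter_workdir/mrtrix_output/tmp-dir/freesurfer"

# Same result as the slice used in Option 2
assert parent_dir(freesurfer_path) == freesurfer_path[:-10]

# normpath collapses the "//" that results from joining the two strings
recon_all_dir = posixpath.normpath(base_folder + parent_dir(freesurfer_path)) + "/"
print(recon_all_dir)
```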

## 13. Run BIDS App `tvb_converter` to generate TVB input

What remains to be done is to create a batch file for the tvb_converter BIDS App, upload it and execute it.

In [55]:
# ADJUSTABLE PARAMETERS
################################################
wall_time = "01:00:00"
cpu_per_task = "36"
job_script_tvb_conv = "job_script_tvb_conv" # name of the job script file
################################################

# FIXED PARAMETERS
################################################
weights_path = "/mrtrix3_out/sub-" + participant_label + "/connectome/sub-" + participant_label + "_parc-" + parcellation + "_level-participant_connectome.csv"
tracts_path = "/mrtrix3_out/sub-" + participant_label + "/connectome/sub-" + participant_label + "_parc-"+ parcellation+ "_meanlength.csv"
################################################


with open(job_script_tvb_conv, "w") as f:
    f.write("#!/bin/bash -l\n")  
    f.write("#SBATCH --time=" + wall_time + "\n")
    f.write("#SBATCH --output=slurm-" + job_script_tvb_conv + ".out\n")
    f.write("#SBATCH --nodes=1\n")
    f.write("#SBATCH --ntasks-per-core=1\n")    
    f.write("#SBATCH --ntasks-per-node=1\n")
    f.write("#SBATCH --cpus-per-task=" + cpu_per_task + "\n")
    f.write("#SBATCH --partition=normal\n")
    f.write("#SBATCH --constraint=mc\n")
    f.write("#SBATCH --hint=nomultithread\n") # disable hyperthreading such that all cores become available for multithreading
    f.write("export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n")
    f.write("module load /apps/daint/UES/easybuild/modulefiles/daint-mc\n")
    f.write("module load /apps/daint/system/modulefiles/sarus/1.0.1\n")
    f.write("srun sarus pull thevirtualbrain/tvb-pipeline-converter\n")
    f.write("srun sarus run " +
            "--mount=type=bind,source=`dirname " + fs_license + "`,destination=/freesurfer_license_dir/ " + 
            "--mount=type=bind,source=" + input_dir + ",destination=/input_dir " +
            "--mount=type=bind,source=" + output_dir + ",destination=/output_dir " +
            "--mount=type=bind,source=" + mrtrix_output + ",destination=/mrtrix3_out " +
            "--mount=type=bind,source=" + fmriprep_output + ",destination=/fmriprep_out " +
            "--mount=type=bind,source=" + fmriprep_workdir + ",destination=/fmriprep_workdir " +
            "--mount=type=bind,source=" + tvb_output + ",destination=/tvb_out " +
            "--mount=type=bind,source=" + tvb_workdir + ",destination=/tvb_workdir " +
            "--mount=type=bind,source=" + recon_all_dir + ",destination=/recon_all_dir " +
            "--entrypoint thevirtualbrain/tvb-pipeline-converter " +
            "/bin/bash -c \"cp /freesurfer_license_dir/license.txt /opt/freesurfer/ && " +
                            "mkdir -p /recon_all_dir/" + recon_all_subject_name + "/bem && " +
                            "cd /recon_all_dir/" + recon_all_subject_name + "/bem && " +
                            "/tvb_converter_pipeline.sh /input_dir /output_dir /mrtrix3_out /fmriprep_out " +
                            "/fmriprep_workdir /tvb_out /tvb_workdir /recon_all_dir " + recon_all_subject_name + " " +
                            participant_label + " " + task_name + " " + parcellation + " " + weights_path + 
                            " " + tracts_path + " " + cpu_per_task + "\"\n")
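
The long chain of `--mount` options can also be generated from a list of (host path, container path) pairs, which makes it easier to spot a wrong mapping. This is only a sketch: `bind_mounts` is a hypothetical helper and the host paths below are placeholders; the option syntax is the one used in the job script above.

```python
def bind_mounts(pairs):
    """Build `--mount=type=bind,...` options from (source, destination) pairs."""
    return " ".join(
        "--mount=type=bind,source={},destination={}".format(source, destination)
        for source, destination in pairs
    )


# Placeholder host paths for illustration only
mounts = bind_mounts([
    ("/scratch/example/BIDS_test", "/input_dir"),
    ("/scratch/example/tvb_converter_workdir", "/output_dir"),
])
print(mounts)
```

The resulting string can be concatenated into the `srun sarus run` line in place of the hand-written options.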

Check it

In [56]:
!cat job_script_tvb_conv

#!/bin/bash -l
#SBATCH --time=01:00:00
#SBATCH --output=slurm-job_script_tvb_conv.out
#SBATCH --nodes=1
#SBATCH --ntasks-per-core=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=36
#SBATCH --partition=normal
#SBATCH --constraint=mc
#SBATCH --hint=nomultithread
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
module load /apps/daint/UES/easybuild/modulefiles/daint-mc
module load /apps/daint/system/modulefiles/sarus/1.0.1
srun sarus pull thevirtualbrain/tvb_converter
srun sarus run --mount=type=bind,source=`dirname /scratch/snx3000/unicore/FILESPACE/3b57e398-e641-4080-8766-0f94fa3fe913/license.txt`,destination=/freesurfer_license_dir/ --mount=type=bind,source=/scratch/snx3000/unicore/FILESPACE/3b57e398-e641-4080-8766-0f94fa3fe913/BIDS_test,destination=/input_dir --mount=type=bind,source=/scratch/snx3000/unicore/FILESPACE/3b57e398-e641-4080-8766-0f94fa3fe913/tvb_converter_workdir,destination=/output_dir --mount=type=bind,source=/scratch/snx3000/unicore/FILESPACE/3b57e398-e641-4080-8766-0

Upload it

In [57]:
wd_handle.upload(input_name=job_script_tvb_conv, destination = job_script_tvb_conv)
wd_handle.listdir()

{u'BIDS_test/': PathDir: BIDS_test/,
 u'UNICORE_SCRIPT_EXIT_CODE': PathFile: UNICORE_SCRIPT_EXIT_CODE,
 u'UNICORE_SCRIPT_PID': PathFile: UNICORE_SCRIPT_PID,
 u'__MACOSX/': PathDir: __MACOSX/,
 u'job_script': PathFile: job_script,
 u'job_script_fmriprep': PathFile: job_script_fmriprep,
 u'job_script_tvb_conv': PathFile: job_script_tvb_conv,
 u'license.txt': PathFile: license.txt,
 u'stderr': PathFile: stderr,
 u'stdout': PathFile: stdout,
 u'tvb_converter_workdir/': PathDir: tvb_converter_workdir/}

The file is there, upload worked. Time to submit the batch job to the queue.

In [58]:
tvb_conv_job = site_client.execute("sbatch " + base_folder + job_script_tvb_conv)
wd_tvb_conv = tvb_conv_job.working_dir.properties['mountPoint'].encode('ascii')
wd_tvb_conv

'/scratch/snx3000/unicore/FILESPACE/251c77e3-07b3-4924-a1a7-69d32007583c/'

Let's update the table one last time

Handle | Path variable | Description
:---: | :---: | :---:
`wd_handle` | `base_folder` | BIDS input, pipeline output
`mrtrix_job` | `wd_mrtrix_res` | SLURM job meta output MRtrix3
`fmriprep_job` | `wd_fmriprep` | SLURM job meta output fmriprep
`tvb_conv_job` | `wd_tvb_conv` | SLURM job meta output tvb_converter


Let's check whether tvb_converter is (still) running or finished.

In [60]:
slurm_output = tvb_conv_job.working_dir.stat("slurm-" + job_script_tvb_conv + ".out").raw().readlines()
slurm_output[-40:]

['> completed      : sha256:0acb0ab97510317eceec3f64cd6d60a01502e15a9867c739700ae504d3c4fa6a\n',
 '> completed      : sha256:d2083d800e5813079efc401a41b5ef17ba3e188759f1dde597c705a3806d97a5\n',
 '> completed      : sha256:35c102085707f703de2d9eaad8752d6fe1b8f02b5d2149f1d8357c9cc7fb7d0a\n',
 '> completed      : sha256:cf2c808e01bd05576dae1e1b5d81e0c55909082419365c5829b1cc00b927e217\n',
 '> completed      : sha256:180be0bebdb64000b9a30b3fbf313fbfbac92dc751a6e5bff68f7fff41d39c31\n',
 '> completed      : sha256:24c54dd105b4378e24b228793f3bf8ac2cb5f354e41391f128317c1917191935\n',
 '> completed      : sha256:f72d004efc5ffbd672187f45c9e06c60935942fd45e4b09fba4b3bf3ef3f4c93\n',
 '> completed      : sha256:f6d009f607640ca352523ff93323f3133877dd4652bd84b53df506828e19c881\n',
 '> completed      : sha256:952472aec7fb3f407a404c870120d52f721e9fa15fe4a8db8ded620e4b1a6a2d\n',
 '> failed         : sha256:66665379e90bb2ea40c6632650c30f5977510b67de4c8b74aafebda433e8f194\n',
 '[1454066.7962521] [nid01143-
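
The excerpt above contains a line marked `> failed`, so it is worth scanning the whole log for such lines rather than only reading its tail. A minimal sketch, assuming that failed steps are reported on lines beginning with `> failed` as in the output above; `failed_lines` is a hypothetical helper and the sample log is shortened, made-up data.

```python
def failed_lines(lines, prefix="> failed"):
    """Return all log lines that start with `prefix`, decoded and stripped."""
    hits = []
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8", errors="replace")
        if line.startswith(prefix):
            hits.append(line.rstrip("\n"))
    return hits


# Shortened, made-up sample in the format of the log above
sample = [
    "> completed      : sha256:aaaa\n",
    "> failed         : sha256:bbbb\n",
]
for hit in failed_lines(sample):
    print(hit)
```

In the notebook this would be called as `failed_lines(slurm_output)`.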

## 14. Download TVB ready data

On the supercomputer, the results are stored in the folder
```
<tvb_output>
```

If everything went smoothly, there will be nine files in this folder:  

```
sub-<participant_label>_task-rest_parc-desikan_ROI_timeseries.txt
sub-<participant_label>_EEGProjection.mat
sub-<participant_label>_region_mapping.txt
sub-<participant_label>_inner_skull_surface.zip
sub-<participant_label>_Cortex.zip
sub-<participant_label>_outer_skull_surface.zip
sub-<participant_label>_outer_skin_surface.zip
sub-<participant_label>_EEG_Locations.txt
sub-<participant_label>_Connectome.zip
```
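
To verify that the run was complete, the nine expected file names can be generated and compared against a directory listing. The sketch below assumes the naming pattern shown above; `expected_tvb_files` and `missing_tvb_files` are hypothetical helpers, and the participant label is a placeholder.

```python
def expected_tvb_files(participant_label, task_name="rest", parcellation="desikan"):
    """Return the nine file names the converter is expected to produce."""
    prefix = "sub-" + participant_label
    return [
        prefix + "_task-" + task_name + "_parc-" + parcellation + "_ROI_timeseries.txt",
        prefix + "_EEGProjection.mat",
        prefix + "_region_mapping.txt",
        prefix + "_inner_skull_surface.zip",
        prefix + "_Cortex.zip",
        prefix + "_outer_skull_surface.zip",
        prefix + "_outer_skin_surface.zip",
        prefix + "_EEG_Locations.txt",
        prefix + "_Connectome.zip",
    ]


def missing_tvb_files(listing, participant_label, **naming):
    """Return the expected file names that do not occur in `listing`."""
    present = set(listing)
    return [name for name in expected_tvb_files(participant_label, **naming)
            if name not in present]


# Placeholder label and an incomplete, made-up listing
listing = expected_tvb_files("01")[:-1]
print(missing_tvb_files(listing, "01"))
```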

We will go on to zip the folder and download the results to the Docker image filesystem in which the Jupyter notebook client is running. 



In [42]:
site_client.execute("zip -r " + tvb_output+".zip " + tvb_output)
output_hdl = wd_handle.stat("/tvb_converter_workdir/TVB_output.zip")
output_hdl.download("TVB_output.zip")
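
Before uploading the archive anywhere else, it can be useful to confirm that the download is a readable zip file. A minimal sketch using only the standard library; `check_zip` is a hypothetical helper, and the demo creates a small throwaway archive instead of using the real `TVB_output.zip`.

```python
import os
import tempfile
import zipfile


def check_zip(path):
    """Return the member names of a zip archive, raising if it is corrupt."""
    if not zipfile.is_zipfile(path):
        raise ValueError(path + " is not a zip archive")
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # first corrupt member, or None
        if bad_member is not None:
            raise ValueError("corrupt member: " + bad_member)
        return archive.namelist()


# Demo on a throwaway archive
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("TVB_output/example.txt", "placeholder")
print(check_zip(demo_zip))
```

In the notebook this would be called as `check_zip("TVB_output.zip")`.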

All that is left to do is to copy the files from the image's virtual filesystem into Collab Storage where you can download the files by clicking on them and then hitting the "Download" button.

In [44]:
collab_storage.upload_file("TVB_output.zip", COLLAB_PATH+"/TVB_output.zip")

'8ff55081-8e05-4c54-bc70-9ef1154efdd0'

Additionally, the derivatives created for TVB are stored in BIDS format inside the data set provided at the beginning.
We will now zip and download this one, too.

## 15. Download pipeline output in BIDS format

The pipeline results are also stored according to the BIDS specification inside the `/derivatives/TVB` directory of the given BIDS folder.
The data follow the specifications of the Common derivatives and Computational models BIDS extension proposals:

* Common derivatives: https://docs.google.com/document/d/1Wwc4A6Mow4ZPPszDIWfCUCRNstn7d_zzaWPcfcHmgI4/edit
* Computational models: https://docs.google.com/document/d/1oaBWmkrUqH28oQb1PTO-rG_kuwNX9KqAoE9i5iDh1xw/edit#heading=h.mqkmyp254xh6

What remains is to zip the BIDS folder, download it from the HPC, and add an additional `dataset_description.json` to the `/derivatives/TVB` directory to comply with BIDS.


In [51]:
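import os

# zip the BIDS folder on the HPC
# os.path.splitext drops the ".zip" extension; str.strip(".zip") would instead
# remove any of the characters ".", "z", "i", "p" from both ends of the name
dataset_dir = os.path.splitext(dataset)[0]
site_client.execute("cd " + base_folder + " && zip -r " + dataset + " "
                    + dataset_dir)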

# download the BIDS folder
output_hdl = wd_handle.stat("/"+dataset)
output_hdl.download(dataset)

# unzip 
import zipfile
with zipfile.ZipFile(dataset, 'r') as zip_ref:
    zip_ref.extractall("./")

In [54]:
# generate BIDS-compliant metadata file dataset_description.json

import json
from collections import OrderedDict


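import os

# Arguments
# (os.path.splitext drops the ".zip" extension of the archive name)
dataset_dir = os.path.splitext(dataset)[0]
input_dataset_description_json = dataset_dir + '/dataset_description.json'
output_dataset_description_json = dataset_dir + '/derivatives/TVB/dataset_description.json'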


# 1 read the dataset_description.json from the raw data set 
with open(input_dataset_description_json, "r") as input_json:
    data = json.load(input_json)


# 2 prepend "TVB pipeline derivative: " to the value of the key "Name" 
# to indicate that this is not the original raw data set, but
# a derivative generated by the TVB pipeline.
data['Name'] = "TVB pipeline derivative: " + data['Name'] 


## Uncomment below to generate the data dict from scratch
#data= OrderedDict()
#data['Name'] = ''
#data['BIDSVersion'] = '1.0.2'
#data['License'] = ''
#data['Authors'] = ['','','']
#data['HowToAcknowledge'] = ''
#data['Funding'] = ['','','']
#data['ReferencesAndLinks'] = ['','','']
#data['DatasetDOI'] = ''


# 3 Add fields for BIDS derivatives
data['PipelineDescription'] = {
    "Name": "TVB",  # REQUIRED: this field must be a substring of the folder name of the pipeline output
    "Version": "1.0",
    "CodeURL": "https://github.com/BrainModes",
    "DockerHubContainerTag": "thevirtualbrain/tvb_pipeline",
    "SingularityContainerURL": "",
    "SingularityContainerVersion": ""
}

data['SourceDatasets'] = [
    {
        "URL": "",
        # DatasetDOI is optional in the raw data set, so fall back to ""
        "DOI": data.get('DatasetDOI', ''),
        "Version": ""
    }
#    ,{
#        "URL": "",
#        "DOI": "",
#        "Version": ""
#    }
]


# 4 Save JSON
with open(output_dataset_description_json, 'w') as ff:
    json.dump(data, ff,sort_keys=False, indent=4)
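
A quick sanity check of the generated metadata can catch missing fields before the data set is shared. The sketch below checks only a few things: that `Name` and `BIDSVersion`, which BIDS requires in `dataset_description.json`, are present, and that the pipeline name is a substring of the output folder name, as the comment in the cell above demands. `check_description` is a hypothetical helper and the example dictionary is made up.

```python
def check_description(description, pipeline_folder="TVB"):
    """Return a list of problems found in a dataset_description dictionary."""
    problems = []
    for key in ("Name", "BIDSVersion"):
        if not description.get(key):
            problems.append("missing or empty field: " + key)
    pipeline_name = description.get("PipelineDescription", {}).get("Name", "")
    if not pipeline_name:
        problems.append("missing or empty field: PipelineDescription.Name")
    elif pipeline_name not in pipeline_folder:
        problems.append("pipeline name is not a substring of the folder name")
    return problems


# Made-up example
example = {
    "Name": "TVB pipeline derivative: demo",
    "BIDSVersion": "1.0.2",
    "PipelineDescription": {"Name": "TVB"},
}
print(check_description(example))
```

In the notebook this would be called as `check_description(data)` right before the JSON file is written.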

In [56]:
ls -l dataset/

total 20
-rw-r--r-- 1 jovyan users  104 Jan 27 09:59 dataset_description.json
drwxr-xr-x 4 jovyan users 4096 Jan 27 09:59 derivatives/
-rw-r--r-- 1 jovyan users   51 Jan 27 09:59 participants.tsv
drwxr-xr-x 6 jovyan users 4096 Jan 27 09:59 sub-QL20120814/
-rw-r--r-- 1 jovyan users   57 Jan 27 09:59 task-rest_bold.json


In [57]:
import shutil
import os
# add README, CHANGES and participants.tsv to the derivatives folder
dataset_dir = os.path.splitext(dataset)[0]  # archive name without ".zip"
for filename in ("README", "CHANGES", "participants.tsv"):
    source_file = os.path.join(dataset_dir, filename)
    if os.path.exists(source_file):
        shutil.copyfile(source_file, os.path.join(dataset_dir, "derivatives", "TVB", filename))

In [58]:
# zip the BIDS folder
dataset_dir = os.path.splitext(dataset)[0]  # archive name without ".zip"
shutil.make_archive(dataset_dir, 'zip', dataset_dir)

'/home/jovyan/BIDS_test.zip'

In [59]:
# and upload to the Collab storage
collab_storage.upload_file(dataset, COLLAB_PATH+"/BIDS_TVB_derivatives.zip")

'521f92bb-1c6c-4dd9-907d-8a225ca914ae'

## Integration with the HBP Knowledge Graph

To integrate the pipeline output into the Knowledge Graph, the user is required to provide values for the MINDS metadata keys. The provided metadata are stored in the file `minds_metadata.json` so that the curation team can read them.

Please read through the following lines of code and enter the applicable information.

In [60]:
# generate MINDS-compliant metadata file minds_metadata.json

# Arguments
output_MINDS_JSON = 'minds_metadata.json'

# Pipeline users are requested to fill out the following fields
# in order to integrate data with KnowledgeGraph
# Enter the information between the ""
Dataset__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
Dataset__has_contributor = "" # enter corresponding Person block _ID 
Dataset_created_as = "" # choose: HBP-SGA1, HBP-SGA2, HBP-SGA3, external 
Dataset__has_custodian = "" # enter corresponding Person block _ID 
Dataset_description = "" # describe the content of this metadata block 
Dataset_DOI = "" # if this dataset already has a DOI, please enter it here (otherwise a DOI will be assigned to this dataset when it is released)
Dataset_embargo_status = "" # choose: embargoed, not-embargoed 
Dataset_intended_release_date = "" # if you chose "embargoed" please state here the intended release date (format: yyyy-mm-dd) 
Dataset_license = "" # enter the data license, e.g. chosen from the Creative Commons license list
Dataset__has_main_contact = "" # enter corresponding Person block _ID 
Dataset__has_main_file_bundle = "" # enter the corresponding FileBundle block _ID 
Dataset_title = "" # enter a meaningful title for this metadata block 
EthicsApproval__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
EthicsApproval_authority = "" # enter the ethics authority name of this ethics approval 
EthicsApproval_country = "" # enter the country in which this ethics approval was issued
EthicsApproval_ID = "" # enter the received identifier of this ethics approval 
EthicsApproval_SP12_approval = "" # choose: yes, no 
File__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
File_description = "" # describe the content of this metadata block 
File_format = "" # enter format of file of this File block 
File_title = "" # enter a meaningful title for this metadata block 
File_URL = "" # enter URL to file of this File block 
FileBundle__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
FileBundle_description = "" # describe the content of this metadata block 
FileBundle_format = "" # enter format of file occurring in this FileBundle block 
FileBundle_tag = "" # enter tag for grouping reason 
FileBundle_title = "" # enter a meaningful title for this metadata block 
FileBundle_URL = "" # enter URL to the file-folder or URLs to corresponding files belonging to this FileBundle block (multiple entries as multiple rows) 
FundingInformation__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
FundingInformation_grant_ID = "" # enter identification number of the grant of this FundingInformation block  
FundingInformation_name = "" # enter name of funding institution of this FundingInformation block  
MethodParadigm__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
MethodParadigm_abbreviation = "" # enter abbreviated name of instance described in this metadata block 
MethodParadigm_description = "" # describe the content of this metadata block 
MethodParadigm_experimental_type = "" # choose: in vivo, ex vivo, in utero, in vitro, in situ, in silico 
MethodParadigm_full_name = "" # enter full name of instance described in this metadata block 
MethodParadigm_type = "" # enter high level type to which the method/paradigm of this Method/Paradigm block belongs to (e.g., imaging, behavioral assay) 
ModelInstance__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
ModelInstance__is_modelling_sub_cellular_target = "" # enter corresponding StudyTarget block _ID 
ModelInstance_abstraction_level = "" # choose: protein structure, systems biology, spiking neurons, rate neurons, population modelling, cognitive modelling or enter other abstraction level 
ModelInstance_alias = "" # enter alternative name for this metadata block  
ModelInstance__is_modelling_brain_structure = "" # enter corresponding StudyTarget block _ID 
ModelInstance__has_contributor = "" # enter corresponding Person block _ID 
ModelInstance__has_custodian = "" # enter corresponding Person block _ID 
ModelInstance_description = "" # describe content of this metadata block 
ModelInstance__has_main_contact = "" # enter corresponding Person block _ID 
ModelInstance_model_format = "" # choose: NeuroML, PyNN, SONATA, NEURON-Python, NEURON-Hoc, NEST-SLI, NEST-PYTHON, Java, C++, C, Brian, NineML, MATLAB, NetPyNE 
ModelInstance_model_format_version_compatibility = "" # enter which model format version is compatible with the model of this ModelInstance block
ModelInstance_model_scope = "" # choose: subcellular model (spine, ion channel, signalling, or molecular), single cell model, network model (microcircuit, brain region, or whole brain) or enter other model scope 
ModelInstance_species = "" # enter binomial name of species used for this ModelInstance (e.g., Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Macaca fascicularis)
ModelInstance_title = "" # enter a meaningful title for this metadata block 
ModelInstance_version = "" # enter version number for model described in this ModelInstance 
Person__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
Person_email = "" # if no ORCID, please enter email of this Person 
Person_first_name = "" # if no ORCID, please enter first name of this Person 
Person_last_name = "" # if no ORCID, please enter last name of this Person 
Person_ORCID = "" # if available, enter ORCID of this Person 
PLAComponent__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
PLAComponent_associated_task_ID = "" # enter identifier of associated HBP task for corresponding PLA component 
PLAComponent_ID = "" # enter identifier of corresponding PLA component 
PLAComponent__has_owner = "" # enter corresponding Person block _ID 
PLAComponent_phase = "" # enter HBP project phase of corresponding PLA component (e.g., HBP-SGA1) 
Project__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
Project__has_coordinator = "" # enter corresponding Person block _ID
Project_description = "" # describe the content of this metadata block 
Project_title = "" # enter a meaningful title for this metadata block 
PublicationResource__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
PublicationResource_ID = "" # enter identifier of corresponding publication/resource 
PublicationResource_ID_type = "" # choose: DOI, ISSN, ISBN, URL, or other identifier type 
StudyTarget__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
StudyTarget_abbreviation = "" # enter abbreviated name of instance described in this metadata block 
StudyTarget_full_name = "" # enter full name of instance described in this metadata block 
StudyTarget_source_of_name = "" # if full name is chosen from a terminology or ontology, enter name of corresponding terminology or ontology list 
StudyTarget_type = "" # choose: disease, disease model, tissue type, brain structure, cell type, subcellular structure, macromolecular complex, biological process 
Subject__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
Subject_age = "" # enter age of subject described in this Subject block 
Subject_age_category = "" # choose: embryo, neonate, juvenile, young adult, adult, aged adult 
Subject_age_range_max = "" # within the time frame of the experiment, enter maximum age of subject described in this Subject block
Subject_age_range_min = "" # within the time frame of the experiment, enter minimum age of subject described in this Subject block
Subject_alias = "" # enter alternative name for this metadata block  
Subject_disabilitydisease = "" # enter disability, disease or disease model the subject of this Subject block has  
Subject_genotype = "" # enter genotype of the subject described in this Subject block 
Subject_handedness = "" # choose: left, right, ambidextrous 
Subject_sex = "" # choose: female, male, hermaphrodite 
Subject_species = "" # enter binomial species name of this Subject (e.g., Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Macaca fascicularis) 
Subject_strain = "" # enter strain of the subject described in this Subject block 
SubjectGroup__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
SubjectGroup_age_category = "" # choose: embryo, neonate, juvenile, young adult, adult, aged adult 
SubjectGroup_age_range_max = "" # within the time frame of the experiment, enter maximum age of subjects connected to this SubjectGroup block
SubjectGroup_age_range_min = "" # within the time frame of the experiment, enter minimum age of subjects connected to this SubjectGroup block
SubjectGroup_alias = "" # enter alternative name for this metadata block  
SubjectGroup_description = "" # describe content of this metadata block 
SubjectGroup_disabilitydisease = "" # enter disability, disease or disease model the subjects connected to this SubjectGroup block have  
SubjectGroup_genotype = "" # enter genotype of subjects connected to this SubjectGroup block 
SubjectGroup_handedness = "" # choose: left, right, ambidextrous 
SubjectGroup_number_of_subjects = "" # enter number of subjects connected to this SubjectGroup block
SubjectGroup_sex = "" # choose: female, male, hermaphrodite 
SubjectGroup_species = "" # enter binomial name of species occurring in this SubjectGroup (e.g., Homo sapiens, Mus musculus, Rattus norvegicus, Macaca mulatta, Macaca fascicularis) 
SubjectGroup_strain = "" # enter strain of subjects connected to this SubjectGroup block 
TissueSample__ID = "" # enter your unique identifier for this metadata block, will later be replaced by a system (Knowledge Graph) wide unique identifier 
TissueSample_alias = "" # enter alternative name for this metadata block  
TissueSample_hemisphere = "" # choose: left, right 
TissueSample_pathology = "" # enter pathology state of tissue described in this TissueSample block 
TissueSample_type = "" # choose: whole brain, hemisphere, slice, brain part, cortical column, or enter other tissue sample type 

##################################################
# Create MINDS JSON object
import json
from collections import OrderedDict

data = OrderedDict()

data['Dataset'] = [
    	{
    		"_ID": Dataset__ID,
    		"_has_contributor": Dataset__has_contributor,
    		"created as": Dataset_created_as,
            "_has_custodian": Dataset__has_custodian,
    		"description": Dataset_description,
    		"DOI": Dataset_DOI,
            "embargo status": Dataset_embargo_status,
    		"intended release date": Dataset_intended_release_date,
    		"license": Dataset_license,
            "_has_main_contact": Dataset__has_main_contact,
    		"_has_main_file_bundle": Dataset__has_main_file_bundle,
    		"title": Dataset_title
    	}  
    ]

data['EthicsApproval'] = [
    	{
    		"_ID": EthicsApproval__ID,
    		"authority": EthicsApproval_authority,
    		"country": EthicsApproval_country,
            "ID": EthicsApproval_ID,
    		"SP12 approval": EthicsApproval_SP12_approval
    	}  
    ]

data['File'] = [
    	{
    		"_ID": File__ID,
    		"description": File_description,
    		"format": File_format,
            "title": File_title,
    		"URL": File_URL
    	}  
    ]

data['FileBundle'] = [
    	{
    		"_ID": FileBundle__ID,
    		"description": FileBundle_description,
    		"format": FileBundle_format,
            "tag": FileBundle_tag,
    		"title": FileBundle_title,
    		"URL": FileBundle_URL
    	}  
    ]

data['FundingInformation'] = [
    	{
    		"_ID": FundingInformation__ID,
    		"grant ID": FundingInformation_grant_ID,
    		"name": FundingInformation_name
    	}  
    ]

data['Method/Paradigm'] = [
    	{
    		"_ID": MethodParadigm__ID,
    		"abbreviation": MethodParadigm_abbreviation,
    		"description": MethodParadigm_description,
            "experimental type": MethodParadigm_experimental_type,
    		"full name": MethodParadigm_full_name,
            "type": MethodParadigm_type
    	}  
    ]

data['ModelInstance'] = [
    	{
    		"_ID": ModelInstance__ID,
    		"_is_modelling_sub_cellular_target": ModelInstance__is_modelling_sub_cellular_target,
    		"abstraction level": ModelInstance_abstraction_level,
    		"alias": ModelInstance_alias,
            "_is_modelling_brain_structure": ModelInstance__is_modelling_brain_structure,
    		"_has_contributor": ModelInstance__has_contributor,
            "_has_custodian": ModelInstance__has_custodian,
            "description": ModelInstance_description,
    		"_has_main_contact": ModelInstance__has_main_contact,
    		"model format": ModelInstance_model_format,
    		"model format version compatibility": ModelInstance_model_format_version_compatibility,
            "model scope": ModelInstance_model_scope,
    		"species":ModelInstance_species,
    		"title": ModelInstance_title,
            "version": ModelInstance_version
    	}  
    ]


data['Person'] = [
    	{
    		"_ID": Person__ID,
    		"email": Person_email,
    		"first name": Person_first_name,
            "last name": Person_last_name,
    		"ORCID": Person_ORCID
    	}  
    ]

data['PLAComponent'] = [
    	{
    		"_ID": PLAComponent__ID,
    		"associated task (ID)": PLAComponent_associated_task_ID,
    		"ID": PLAComponent_ID,
            "_has_owner": PLAComponent__has_owner,
    		"phase": PLAComponent_phase
    	}  
    ]


data['Project'] = [
    	{
    		"_ID": Project__ID,
    		"_has_coordinator": Project__has_coordinator,
    		"description": Project_description,
            "title": Project_title
    	}  
    ]

data['Publication/Resource'] = [
    	{
    		"_ID": PublicationResource__ID,
    		"ID": PublicationResource_ID,
    		"ID type": PublicationResource_ID_type
    	}  
    ]


data['StudyTarget'] = [
    	{
    		"_ID": StudyTarget__ID,
    		"abbreviation": StudyTarget_abbreviation,
    		"full name": StudyTarget_full_name,
            "source of name": StudyTarget_source_of_name,
    		"type": StudyTarget_type
    	}  
    ]



data['Subject'] = [
    	{
    		"_ID": Subject__ID,
    		"age": Subject_age,
    		"age category": Subject_age_category,
    		"age range (max)": Subject_age_range_max,
            "age range (min)": Subject_age_range_min,
    		"alias": Subject_alias,
            "disability/disease": Subject_disabilitydisease,
            "genotype": Subject_genotype,
    		"handedness": Subject_handedness,
    		"sex": Subject_sex,
    		"species": Subject_species,
            "strain": Subject_strain
    	}  
    ]

data['SubjectGroup'] = [
    	{
    		"_ID": SubjectGroup__ID,
    		"age category": SubjectGroup_age_category,
    		"age range (max)": SubjectGroup_age_range_max,
            "age range (min)": SubjectGroup_age_range_min,
    		"alias": SubjectGroup_alias,
    		"description": SubjectGroup_description,
            "disability/disease": SubjectGroup_disabilitydisease,
            "genotype": SubjectGroup_genotype,
    		"handedness": SubjectGroup_handedness,
    		"number of subjects": SubjectGroup_number_of_subjects,
    		"sex": SubjectGroup_sex,
    		"species": SubjectGroup_species,
            "strain": SubjectGroup_strain
    	}  
    ]

data['TissueSample'] = [
    	{
    		"_ID": TissueSample__ID,
    		"alias": TissueSample_alias,
    		"hemisphere": TissueSample_hemisphere,
            "pathology": TissueSample_pathology,
            "type": TissueSample_type
    	}  
    ]


# Save JSON
with open(output_MINDS_JSON, 'w') as ff:
    json.dump(data, ff,sort_keys=False, indent=4)
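
Since every field starts out as an empty string, it is easy to upload a file with fields that were overlooked. The following sketch lists all entries that are still empty. It assumes the structure built above (a dictionary of block names, each holding a list of dictionaries); `empty_fields` is a hypothetical helper and the example data are made up.

```python
def empty_fields(metadata):
    """Return "Block[index].key" for every value that is an empty string."""
    missing = []
    for block_name, entries in metadata.items():
        for index, entry in enumerate(entries):
            for key, value in entry.items():
                if value == "":
                    missing.append("{}[{}].{}".format(block_name, index, key))
    return missing


# Made-up example in the same layout as the MINDS dictionary
example = {
    "Person": [{"_ID": "person_1", "email": "", "ORCID": ""}],
    "Project": [{"_ID": "project_1", "title": "Demo"}],
}
for name in empty_fields(example):
    print(name)
```

In the notebook this would be called as `empty_fields(data)` before the file is uploaded.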

In [61]:
# upload minds_metadata.json to the HBP Collab storage
collab_storage.upload_file(output_MINDS_JSON, COLLAB_PATH+"/"+output_MINDS_JSON)

'70ce6cd9-4b4c-4f85-9006-8e07a9b4cfef'