In [1]:
import getpass
import pprint
import os

import pandas
import qmenta.client
from tqdm import tqdm

# 1. Sign-up and login at QMENTA

Go to https://platform.qmenta.com/#/register and create an account on the QMENTA platform **using the promotional code specified below**. Then log in with your new account to access the platform.

#### PROMOTIONAL CODE: **VISUM2018**

![QMENTA Platform Registration form](assets/qmenta_platform_registration.png)

Once you log in for the first time you should be able to see the Public Project **CNIC-QMENTA 1000 Brains Challenge**

![Platform dashboard](assets/platform_dashboard.png)

_NOTE_: _Check this [Getting started](https://support.qmenta.com/hc/en-us/sections/115000599931-Sign-up-and-login) article in case you need help with registration and login_

# 2. Set up the QMENTA Client

Fill in the following cells with your username and password

In [2]:
username = 'user'

In [3]:
password = getpass.getpass('Password:')

Instantiate a QMENTA Client account and get a Project instance to interact with the hackathon project on the platform

In [4]:
acc = qmenta.client.Account(username=username, password=password)

In [5]:
project = acc.get_project('QMENTA 1000 Brains Challenge')

# 3. List all the subjects in the project and their metadata

Get all the subjects and sort them by patient_secret_name

In [6]:
subjects_metadata = project.get_subjects_metadata()  # This returns a list of dictionaries
subjects_metadata = sorted(subjects_metadata, key=lambda x: x['patient_secret_name'])

In [7]:
len(subjects_metadata)

951

Let's have a look at the data structure that represents each subject

In [8]:
subjects_metadata[0]

{u'_id': 144299.0,
 u'age_at_scan': None,
 u'container_id': 194963,
 u'data_location': u'eu',
 u'date_at_scan': {u'$date': 1480032000000},
 u'md_age': 38.0,
 u'md_set': u'train',
 u'owner': u'Albert Puente Encinas',
 u'patient_secret_name': u'28634',
 u'qa_comments': u'',
 u'qa_status': u'',
 u'ssid': u'1',
 u'tags': [],
 u'user_id': u'apuente'}

The relevant fields are:
- **patient_secret_name**: unique identifier for this subject in this project. We also refer to this field as **SubjectID** 
- **md_set**: identifies if the subject belongs to the 'train' or the 'test' set
- **md_age**: the age of the subject
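
Before splitting the data it can help to check how many subjects fall in each set and whether any training subject lacks an age. The following is a minimal sketch on made-up records that only mimic the three fields listed above; the IDs and ages of the second and third records are illustrative, not taken from the project.

```python
from collections import Counter

# Hypothetical records with the same keys as the subject metadata shown above
sample_subjects = [
    {'patient_secret_name': '28634', 'md_set': 'train', 'md_age': 38.0},
    {'patient_secret_name': '28698', 'md_set': 'train', 'md_age': 47.0},
    {'patient_secret_name': '99999', 'md_set': 'test', 'md_age': 30.0},
]

# Number of subjects per set
set_counts = Counter(subject['md_set'] for subject in sample_subjects)

# Training subjects without an age could not be used as regression targets
missing_age = [s['patient_secret_name'] for s in sample_subjects
               if s['md_set'] == 'train' and s['md_age'] is None]

print(dict(set_counts))
print(missing_age)
```

With the real data, `sample_subjects` would be replaced by `subjects_metadata`.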

Let's split the subjects into the train and test sets

In [9]:
train_subjects = [x for x in subjects_metadata if x['md_set'] == 'train']
len(train_subjects)

856

In [10]:
test_subjects = [x for x in subjects_metadata if x['md_set'] == 'test']
len(test_subjects)

95

We will only use the subjects in 'train' to build a CSV that maps each SubjectID to its Age.

In [11]:
train_csv_info_dict = {x['patient_secret_name']: {'Age': x['md_age']} for x in train_subjects}

In [12]:
train_csv_dataframe = pandas.DataFrame.from_dict(train_csv_info_dict, orient='index')

In [13]:
train_csv_dataframe

Unnamed: 0,Age
28634,38.0
28698,47.0
28703,64.0
28708,58.0
28713,12.0
28717,62.0
28723,25.0
28733,28.0
28736,21.0
28740,14.0


Let's create a local folder to store all the data from the hackathon

In [14]:
hackaton_dir = os.path.expanduser('~/qmenta_1000_brains_challenge')
print(hackaton_dir)

/home/santi/qmenta_1000_brains_challenge


In [15]:
if not os.path.isdir(hackaton_dir):
    os.makedirs(hackaton_dir)

Now we will store the pandas DataFrame as a CSV in this folder, so that we can use it later to train a Machine Learning model.

In [16]:
train_csv_dataframe.to_csv(os.path.join(hackaton_dir, 'train.csv'))
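
When a DataFrame is written without an index label, the first column of the CSV has an empty header and pandas names it `Unnamed: 0` when reading the file back. The sketch below shows one possible way to avoid this: it labels the index column `SubjectID` (a name chosen here for illustration) and reads the IDs back as strings. It writes to a temporary folder and uses two sample rows so that it runs on its own.

```python
import os
import tempfile

import pandas

# Two sample rows in the same shape as train_csv_info_dict
ages = {'28634': {'Age': 38.0}, '28698': {'Age': 47.0}}
frame = pandas.DataFrame.from_dict(ages, orient='index')

# Write with an explicit header for the index column
csv_path = os.path.join(tempfile.mkdtemp(), 'train.csv')
frame.to_csv(csv_path, index_label='SubjectID')

# Read the IDs as strings so they are not converted to integers
reloaded = pandas.read_csv(csv_path, dtype={'SubjectID': str}).set_index('SubjectID')
print(reloaded)
```

Keeping the IDs as strings matters later, because the file names of the downloaded results are built from the same identifiers.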

# 4. Fetch the analysis results for all subjects

As we did with the subjects, we will list all the completed analyses in the project and sort them by patient_secret_name

In [17]:
analysis = project.list_analysis()
analysis = [x for x in analysis if x['state'] == 'completed']
analysis = sorted(analysis, key=lambda x: x['patient_secret_name'])

In [18]:
len(analysis)

951
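
The count matches the number of subjects, which suggests one completed analysis per subject. A minimal sketch of how this could be verified explicitly is shown below; the records are made up, and the `'failed'` state is only an assumed example of a non-completed value.

```python
from collections import Counter

# Hypothetical analysis records with the keys used in the cells above
sample_analyses = [
    {'patient_secret_name': '28634', 'state': 'completed', 'out_container_id': 1},
    {'patient_secret_name': '28698', 'state': 'completed', 'out_container_id': 2},
    {'patient_secret_name': '28698', 'state': 'completed', 'out_container_id': 3},
    {'patient_secret_name': '28703', 'state': 'failed', 'out_container_id': 4},
]

completed = [a for a in sample_analyses if a['state'] == 'completed']

# Subjects that appear in more than one completed analysis
per_subject = Counter(a['patient_secret_name'] for a in completed)
duplicated = sorted(name for name, count in per_subject.items() if count > 1)

print(duplicated)
```

An empty list on the real `analysis` list would confirm that each subject maps to a single output container.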

Let's see the data structure that represents each analysis

In [19]:
analysis[0]

{u'_id': 83827,
 u'config': {u'time_project_end': {u'$date': 1528115957123},
  u'time_project_start': {u'$date': 1528103238822}},
 u'description': u'',
 u'in_container_id': 194963,
 u'name': u'ANTs Morphology 2.1 (v.4.6)',
 u'out_container_id': 206911,
 u'owner': u'Albert Puente Encinas',
 u'patient_secret_name': u'28634',
 u'progress': [-1, u'Completed', {u'$date': 1528115956987}],
 u'projectset_id': 1658,
 u'qa_comments': u'',
 u'qa_data': {},
 u'qa_status': u'',
 u'script_name': u'qmenta_ants_morphology_2',
 u'settings': {u'acpc_alignment': u'1',
  u'age_months': 456.0,
  u'alignment_tool': u'ants',
  u'atlas_template': u'DKT40',
  u'do_thickness': u'1',
  u'input': {u'container_id': 194963,
   u'date': {u'$date': 1480032000000},
   u'filters': {u'c_T1': {u'files': [{u'_id': 2542875.0,
       u'modality': u'T1',
       u'name': u'anat.nii.gz',
       u'tags': []}],
     u'has_to_choose': 0,
     u'passed': True,
     u'range': [1, 1]}},
   u'in_out': u'in',
   u'passed': True,
   u'

As mentioned before, the analysis run for this database quantifies the volumetry and morphometry of the brain from a structural MR image, specifically a T1-weighted MR image.
The tool that we used is built upon ANTs. You can learn more about it in [this support article](https://support.qmenta.com/hc/en-us/articles/115000760611-ANTs-Morphology-2-1-0-).

The important field in this case is the **out_container_id**, because it identifies the data container that stores the result files. We will need to download one or more of these result files to use them as the predictor variables for our ML model.

Let's see the typical set of files produced by an ANTs analysis

In [20]:
out_container_id_example = analysis[0]['out_container_id']
results_files_example = project.list_container_files_metadata(out_container_id_example)
pprint.pprint(results_files_example)

[{u'metadata': {u'format': u'nifti', u'info': {}, u'modality': u'T1'},
  u'name': u'T1_acpc.nii.gz',
  u'size': 27157393,
  u'tags': [u'head']},
 {u'metadata': {u'format': None, u'info': {}},
  u'name': u'T1strip_bin.nii.gz',
  u'size': 104754,
  u'tags': [u'mask']},
 {u'metadata': {u'format': u'nifti', u'info': {}, u'modality': u'T1'},
  u'name': u'T1_original.nii.gz',
  u'size': 6371740,
  u'tags': []},
 {u'metadata': {u'format': u'nifti', u'info': {}, u'modality': u'T1'},
  u'name': u'T1strip.nii.gz',
  u'size': 4905960,
  u'tags': [u'strip']},
 {u'metadata': {u'format': None, u'info': {}},
  u'name': u'report.pdf',
  u'size': 481209,
  u'tags': []},
 {u'metadata': {u'format': None, u'info': {}},
  u'name': u'tissueSegmentation.csv',
  u'size': 647,
  u'tags': []},
 {u'metadata': {u'format': None, u'info': {}},
  u'name': u'volumetric.csv',
  u'size': 9254,
  u'tags': []},
 {u'metadata': {u'format': None, u'info': {}},
  u'name': u'labeled.nii.gz',
  u'size': 274288,
  u'tags': [u'l

As you can see, ANTs generates a lot of intermediate results. However, we are only interested in a few files:
- **T1strip.nii.gz** (modality=T1, tags=strip): skull-stripped brain, in case you want to train a Machine or Deep Learning algorithm on the raw structural data.
- **thickness.nii.gz** (tags=thickness): thickness map of the brain, in which each voxel belonging to the cortex has a value that indicates the thickness in mm.
- **tissueSegmentation.nii.gz** (tags=tissue_segmentation): segmentation map of the brain into its main tissues, namely Gray Matter, White Matter, CerebroSpinal Fluid (CSF), Deep Brain, Brain-Stem and Cerebellum.
- **labeled.nii.gz** (tags=labels): brain parcellation of the cortex and other important structures.
- **volumetric.csv**: CSV with the volume, average thickness and standard deviation of thickness for each tissue and region in the brain.
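
Since most of these files are identified by a tag, a small helper can pick them out of the metadata list. The sketch below is a hypothetical helper (`files_with_tag` is not part of the QMENTA client) applied to a shortened copy of the listing printed above.

```python
def files_with_tag(files_metadata, tag):
    """Return the names of the files whose 'tags' list contains the given tag."""
    return [entry['name'] for entry in files_metadata
            if tag in entry.get('tags', [])]


# Shortened copy of the file listing shown above
sample_files = [
    {'name': 'T1_acpc.nii.gz', 'size': 27157393, 'tags': ['head']},
    {'name': 'T1strip_bin.nii.gz', 'size': 104754, 'tags': ['mask']},
    {'name': 'T1strip.nii.gz', 'size': 4905960, 'tags': ['strip']},
    {'name': 'volumetric.csv', 'size': 9254, 'tags': []},
]

print(files_with_tag(sample_files, 'strip'))
```

Files without tags, such as volumetric.csv, still have to be selected by name.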

In our example we will only use the information provided by **volumetric.csv**. However, you are free to use any of the other result files, and even the original T1 image.

Let's download the volumetric.csv for our example analysis and inspect it using pandas

In [21]:
project.download_file(container_id=out_container_id_example, file_name='volumetric.csv', local_filename='/tmp/volumetric.csv', overwrite=True)

True

In [22]:
volumetric_csv_example = pandas.read_csv('/tmp/volumetric.csv')

In [23]:
volumetric_csv_example

Unnamed: 0,x,y,z,t,value,mass,volume,count,label,group,thick_avg,thick_std
0,90.477601,115.384626,84.866236,0,1,250883,2.508830e+05,250883,CSF,CSF,0.00000,0.000000
1,89.407179,106.321591,83.597271,0,2,503624,5.036240e+05,503624,Gray matter,Gray matter,0.00000,0.000000
2,89.278643,108.537062,87.133985,0,3,427407,4.274070e+05,427407,White matter,White matter,0.00000,0.000000
3,89.861248,117.922166,70.414811,0,4,39430,3.943000e+04,39430,Deep brain,Deep brain,0.00000,0.000000
4,89.153674,98.397373,37.404017,0,5,19717,1.971700e+04,19717,Brain-Stem,Brain-Stem,0.00000,0.000000
5,89.615404,71.588065,36.966495,0,6,146096,1.460960e+05,146096,Cerebellum,Cerebellum,0.00000,0.000000
6,0.000000,0.000000,0.000000,0,-1,0,1.387157e+06,0,ICV,ICV,0.00000,0.000000
7,0.000000,0.000000,0.000000,0,-1,0,8.191387e+01,0,BPF,BPF,0.00000,0.000000
8,99.643965,107.150580,75.860853,0,10,6123,6.123000e+03,6123,thalamusproper_L,Subcortical-Left,0.00000,0.000000
9,103.467513,134.084018,79.544810,0,11,2678,2.678000e+03,2678,caudate_L,Subcortical-Left,0.00000,0.000000
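
Each row of this table describes one tissue or region, so a per-subject feature vector can be obtained by indexing a column by `label`. The sketch below does this for the `volume` column, using four rows copied from the table above; choosing `volume` as the feature is only one option, and `thick_avg` could be handled the same way.

```python
import pandas

# Four rows copied from the table above (label and volume columns only)
sample_volumetric = pandas.DataFrame({
    'label': ['CSF', 'Gray matter', 'White matter', 'ICV'],
    'volume': [250883.0, 503624.0, 427407.0, 1387157.0],
})

# One value per region, indexed by the region name
volumes = sample_volumetric.set_index('label')['volume']

print(volumes['Gray matter'])
print(volumes.to_dict())
```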


Now we can download the **volumetric.csv** file for every subject and use this information to train an ML algorithm to predict age.

First we will create folders on our computer to store the volumetric data from the train and test sets

In [24]:
train_volumetric_data_dir = os.path.join(hackaton_dir, 'train')
print(train_volumetric_data_dir)

/home/santi/qmenta_1000_brains_challenge/train


In [25]:
if not os.path.isdir(train_volumetric_data_dir):
    os.makedirs(train_volumetric_data_dir)

In [26]:
test_volumetric_data_dir = os.path.join(hackaton_dir, 'test')
print(test_volumetric_data_dir)

/home/santi/qmenta_1000_brains_challenge/test


In [27]:
if not os.path.isdir(test_volumetric_data_dir):
    os.makedirs(test_volumetric_data_dir)

We separate the analyses that belong to the train set from those that belong to the test set

In [28]:
train_subjects_set = set([x['patient_secret_name'] for x in train_subjects])
test_subjects_set = set([x['patient_secret_name'] for x in test_subjects])

In [29]:
train_analysis = [x for x in analysis if x['patient_secret_name'] in train_subjects_set]
test_analysis = [x for x in analysis if x['patient_secret_name'] in test_subjects_set]
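
Because the split relies on matching identifiers, a set difference is a quick way to spot subjects that have no completed analysis. This is a minimal sketch with made-up identifier sets standing in for `train_subjects_set` and the IDs found in `train_analysis`.

```python
# Hypothetical identifier sets for illustration
train_ids = {'28634', '28698', '28703'}
analysed_ids = {'28634', '28703'}

# Subjects in the train set that have no completed analysis
missing = sorted(train_ids - analysed_ids)
print(missing)
```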

For each analysis we download the volumetric data

In [30]:
for analysis_instance in tqdm(train_analysis):
    analysis_container_id = analysis_instance['out_container_id']
    patient_name = analysis_instance['patient_secret_name']
    volumetric_filepath = os.path.join(train_volumetric_data_dir, '{}_volumetric.csv'.format(patient_name))
    project.download_file(container_id=analysis_container_id, file_name='volumetric.csv', local_filename=volumetric_filepath, overwrite=True)

100%|██████████| 856/856 [06:25<00:00,  2.22it/s]
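
The same loop is needed for the train and test sets, so it could be factored into a function. The sketch below is one possible version: `download_volumetric` is a hypothetical helper that receives the download callable as an argument, and `fake_download` stands in for `project.download_file` so that the example runs offline. It assumes that the callable returns a falsy value when a download fails, which the source does not confirm (it only shows `True` on success).

```python
import os


def download_volumetric(download, analyses, target_dir):
    """Download volumetric.csv for each analysis and return the paths that failed."""
    failed = []
    for analysis_instance in analyses:
        patient_name = analysis_instance['patient_secret_name']
        local_path = os.path.join(target_dir, '{}_volumetric.csv'.format(patient_name))
        ok = download(container_id=analysis_instance['out_container_id'],
                      file_name='volumetric.csv',
                      local_filename=local_path,
                      overwrite=True)
        if not ok:  # assumption: a falsy return value signals a failed download
            failed.append(local_path)
    return failed


def fake_download(container_id, file_name, local_filename, overwrite):
    """Offline stand-in for project.download_file; fails for container 2."""
    return container_id != 2


sample = [{'patient_secret_name': '28634', 'out_container_id': 1},
          {'patient_secret_name': '28698', 'out_container_id': 2}]
failed_paths = download_volumetric(fake_download, sample, 'train')
print(failed_paths)
```

With the real client, the call would be `download_volumetric(project.download_file, train_analysis, train_volumetric_data_dir)`.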


In [31]:
for analysis_instance in tqdm(test_analysis):
    analysis_container_id = analysis_instance['out_container_id']
    patient_name = analysis_instance['patient_secret_name']
    volumetric_filepath = os.path.join(test_volumetric_data_dir, '{}_volumetric.csv'.format(patient_name))
    project.download_file(container_id=analysis_container_id, file_name='volumetric.csv', local_filename=volumetric_filepath, overwrite=True)

100%|██████████| 95/95 [00:42<00:00,  2.23it/s]
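
Once the files are on disk, they can be combined into a single table with one row per subject and one column per region. The sketch below is one way to do it, under the assumption that `volume` is the feature of interest. It creates two small sample files in a temporary folder so that it runs without the downloaded data; the values for the second subject are made up.

```python
import glob
import os
import tempfile

import pandas

# Create two sample files named like the downloaded ones
data_dir = tempfile.mkdtemp()
samples = {
    '28634': [('CSF', 250883.0), ('Gray matter', 503624.0)],
    '28698': [('CSF', 240000.0), ('Gray matter', 510000.0)],  # made-up values
}
for subject_id, rows in samples.items():
    frame = pandas.DataFrame(rows, columns=['label', 'volume'])
    frame.to_csv(os.path.join(data_dir, '{}_volumetric.csv'.format(subject_id)))

# Read every file and keep the volume of each region, keyed by SubjectID
feature_rows = {}
for path in sorted(glob.glob(os.path.join(data_dir, '*_volumetric.csv'))):
    subject_id = os.path.basename(path).split('_')[0]
    table = pandas.read_csv(path, index_col=0)
    feature_rows[subject_id] = table.set_index('label')['volume']

# Subjects become rows, regions become columns
features = pandas.DataFrame(feature_rows).T
print(features)
```

The resulting index holds the SubjectID as a string, so it can be joined with the Age column stored in train.csv as long as those IDs are also read as strings.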
