<a href="https://colab.research.google.com/github/Warvito/Normative-modelling-using-deep-autoencoders/blob/master/notebooks/predict_deviation_bootstrap.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deviation scores using all trained models

In this notebook, we provide an easy way for you to try our normative models, which are based on autoencoders trained on UK Biobank data.

Let's start!

---
## Enabling the GPU

First, you'll need to enable [GPUs](https://cloud.google.com/gpu) for the notebook:

- Navigate to Edit→Notebook Settings
- Select GPU from the Hardware Accelerator drop-down

## Load trained models
Next, we load the normative models trained in this study into the Colab environment. We trained 1,000 different models using the bootstrap resampling method.
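To make the bootstrap idea concrete, here is a minimal sketch of how bootstrap training sets can be drawn: each iteration samples participant indices uniformly *with replacement* from the original training set, so some participants appear several times and others not at all. The function name `bootstrap_indices` and the toy sizes are hypothetical; this illustrates the general method, not the exact code of the training script.

```python
import numpy as np


def bootstrap_indices(n_subjects, n_bootstrap, seed=42):
    """Return an (n_bootstrap, n_subjects) array of row indices.

    Each row is one bootstrap sample: n_subjects indices drawn
    uniformly with replacement from range(n_subjects).
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_subjects, size=(n_bootstrap, n_subjects))


# Hypothetical example: 10 training subjects, 3 bootstrap iterations
samples = bootstrap_indices(n_subjects=10, n_bootstrap=3)
for i, idx in enumerate(samples):
    # Duplicated indices mean a subject was drawn more than once
    print('{:03d}'.format(i), sorted(idx.tolist()), 'unique:', len(np.unique(idx)))
```

Because each model is trained on a slightly different sample, the 1,000 models give 1,000 slightly different deviation scores for the same participant.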

All the saved files are available at https://www.dropbox.com/s/7zatvu6f1vwtfgp/supervised_aae.zip?dl=0 .

This archive contains the files saved by the bootstrap_train_aae_supervised.py script. The files are organized in subdirectories, one per bootstrap iteration. For each iteration, we stored the data scaler, the age and gender encoders, and the encoder, decoder, and discriminator of the normative model.
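After unzipping, you may want to confirm that every iteration folder is complete before running the prediction loop. The sketch below assumes the `000`, `001`, ... folder naming shown in the unzip output and checks only the files that the prediction loop in this notebook loads; `find_incomplete_iterations` is a hypothetical helper, not part of the original code.

```python
from pathlib import Path

# Files loaded from each iteration folder by the prediction loop in this notebook
REQUIRED_FILES = ('encoder.h5', 'decoder.h5', 'scaler.joblib',
                  'age_encoder.joblib', 'gender_encoder.joblib')


def find_incomplete_iterations(model_dir, n_bootstrap):
    """Return names of iteration folders ('000', '001', ...) that are
    missing entirely or lack any of REQUIRED_FILES."""
    model_dir = Path(model_dir)
    incomplete = []
    for i in range(n_bootstrap):
        iteration_dir = model_dir / '{:03d}'.format(i)
        if not all((iteration_dir / name).is_file() for name in REQUIRED_FILES):
            incomplete.append(iteration_dir.name)
    return incomplete
```

For example, `find_incomplete_iterations('supervised_aae', 1000)` returns an empty list when every iteration folder has all the required files.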

In [1]:
!wget -O models.zip --no-check-certificate https://www.dropbox.com/s/7zatvu6f1vwtfgp/supervised_aae.zip?dl=0


## Unzip models


In [4]:
!unzip models.zip

Archive:  models.zip
   creating: supervised_aae/
   creating: supervised_aae/000/
  inflating: supervised_aae/000/encoder.h5  
  inflating: supervised_aae/000/decoder.h5  
  inflating: supervised_aae/000/discriminator.h5  
  inflating: supervised_aae/000/scaler.joblib  
  inflating: supervised_aae/000/age_encoder.joblib  
  inflating: supervised_aae/000/gender_encoder.joblib  
   creating: supervised_aae/001/
  inflating: supervised_aae/001/encoder.h5  
  inflating: supervised_aae/001/decoder.h5  
  inflating: supervised_aae/001/discriminator.h5  
  inflating: supervised_aae/001/scaler.joblib  
  inflating: supervised_aae/001/age_encoder.joblib  
  inflating: supervised_aae/001/gender_encoder.joblib  
   creating: supervised_aae/002/
  inflating: supervised_aae/002/encoder.h5  
  inflating: supervised_aae/002/decoder.h5  
  inflating: supervised_aae/002/discriminator.h5  
  inflating: supervised_aae/002/scaler.joblib  
  inflating: supervised_aae/002/age_encoder.joblib  
  ...

As shown below, in the Google Colab environment there is an arrow that looks like “>” on the left-hand side of the cells.

**FIGURE**

When you click it, a panel with three tabs opens; select “Files” to explore the loaded models.

## Importing libraries


In [0]:
%tensorflow_version 2.x
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from google.colab import files


In [3]:
tf.__version__

'2.0.0'

## Download freesurferData.csv and participants.tsv templates
To make predictions on your data, the data must be in the format expected by this script. To facilitate this, we supply template files for you to fill in with your data.

As shown below, these template files contain the column names required to run the script.
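Before uploading, it can help to check that a filled-in file still contains every column of its template. The sketch below reads only the header row with Python's `csv` module, treating `.tsv` files as tab-separated and everything else as comma-separated. The helper names `read_header` and `missing_columns` are hypothetical.

```python
import csv
from pathlib import Path


def read_header(path):
    """Return the column names in the first row of a .csv or .tsv file."""
    path = Path(path)
    delimiter = '\t' if path.suffix.lower() == '.tsv' else ','
    with path.open(newline='') as f:
        return next(csv.reader(f, delimiter=delimiter))


def missing_columns(template_path, filled_path):
    """List template columns that are absent from the filled-in file."""
    filled = set(read_header(filled_path))
    return [col for col in read_header(template_path) if col not in filled]
```

For example, `missing_columns('participants.tsv', 'my_participants.tsv')` (the second file name is hypothetical) returns an empty list when no template column was dropped.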

In [0]:
pd.read_csv('freesurferData.csv')

In [0]:
pd.read_csv('participants.tsv', sep='\t')

The next cells download the templates to your computer.

In [0]:
files.download('freesurferData.csv')

In [0]:
files.download('participants.tsv')

After filling in the templates, upload the files to the Google Colab environment.

Note: your data will only be loaded into this Google Colab runtime. By default, this code is executed on Google Cloud Platform, and your data is not made available to our team. If you are concerned about uploading your data to Google Cloud Platform, please consider executing this notebook in a local runtime on your computer (https://research.google.com/colaboratory/local-runtimes.html).

First, upload freesurferData.csv.

In [0]:
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))

Saving freesurferData.csv to freesurferData.csv
User uploaded file "freesurferData.csv" with length 646877 bytes


Then, upload the participants.tsv file.

In [0]:
uploaded = files.upload()

for fn2 in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn2, length=len(uploaded[fn2])))

In [0]:
freesurfer_data_df = pd.read_csv(fn)
# participants.tsv is tab-separated
participants_df = pd.read_csv(fn2, sep='\t')

## Making predictions
After loading the data, we compute each participant's deviation from our trained normative models.

We begin by setting the random seeds and then calculate the relative brain region volumes (original volume divided by the total intracranial volume).
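The division by the total intracranial volume (TIV) relies on NumPy broadcasting: the volumes form an (n_participants, n_regions) array, while TIV has one value per participant, so it must be reshaped to (n_participants, 1) before dividing. Without the reshape, NumPy would try to align TIV with the region axis instead. The toy numbers below are hypothetical and only illustrate the operation.

```python
import numpy as np

# Hypothetical absolute volumes (mm^3): 2 participants x 3 brain regions
volumes = np.array([[1500.0, 3000.0, 4500.0],
                    [2000.0, 1000.0, 500.0]])
# Hypothetical total intracranial volumes (mm^3), one per participant
tiv = np.array([1.5e6, 1.0e6])

# tiv[:, np.newaxis] has shape (2, 1), so it broadcasts across the 3 columns:
# every region of participant i is divided by that participant's TIV
relative_volumes = np.true_divide(volumes, tiv[:, np.newaxis]).astype('float32')
print(relative_volumes)
```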

In [0]:
# Set random seed
random_seed = 42
tf.random.set_seed(random_seed)
np.random.seed(random_seed)

In [0]:
# Get the relative brain region volumes
# Region volumes: every column after the first two
x_dataset = freesurfer_data_df[freesurfer_data_df.columns[2:]].values

tiv = freesurfer_data_df['EstimatedTotalIntraCranialVol'].values
tiv = tiv[:, np.newaxis]

x_dataset = (np.true_divide(x_dataset, tiv)).astype('float32')

Finally, we iterate over all bootstrap models and calculate the deviations.
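Each bootstrap model is applied in the same way: the encoder maps the normalized volumes to a latent code, the decoder reconstructs the volumes from that code concatenated with the one-hot age and gender vectors, and the deviation is the mean squared difference between input and reconstruction across brain regions. The NumPy sketch below reproduces only the conditioning and error steps on toy arrays. The `one_hot` helper and its category lists are hypothetical stand-ins for the saved `age_encoder.joblib` and `gender_encoder.joblib` objects, and the latent code and reconstruction are made-up numbers, not model outputs.

```python
import numpy as np


def one_hot(values, categories):
    """Hypothetical stand-in for the saved encoders: one column per category."""
    values = np.asarray(values)
    categories = np.asarray(categories)
    return (values[:, np.newaxis] == categories[np.newaxis, :]).astype('float32')


# Toy data for 2 participants (all values are made up)
latent = np.array([[0.1, -0.3], [0.5, 0.2]], dtype='float32')   # encoder output
one_hot_age = one_hot([50, 70], categories=[50, 60, 70])         # hypothetical age bins
one_hot_gender = one_hot([0, 1], categories=[0, 1])              # hypothetical gender codes

# Decoder input: latent code followed by the conditioning vector
decoder_input = np.concatenate((latent, one_hot_age, one_hot_gender), axis=1)

# Deviation score: mean squared error across regions, one value per participant
x_normalized = np.array([[0.0, 0.0], [1.0, 1.0]], dtype='float32')
reconstruction = np.array([[0.0, 2.0], [1.0, 1.0]], dtype='float32')
reconstruction_error = np.mean((x_normalized - reconstruction) ** 2, axis=1)
print(decoder_input.shape, reconstruction_error)
```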

In [0]:
model_dir = Path('supervised_aae')
N_BOOTSTRAP = 1000

# Create dataframe to store outputs
# Assumes participants.tsv and freesurferData.csv list participants in the same row order
reconstruction_error_df = pd.DataFrame({'Participant_ID': participants_df['Participant_ID'].values})

# Conditioning variables (identical for every bootstrap model)
age = participants_df['Age'].values[:, np.newaxis].astype('float32')
gender = participants_df['Gender'].values[:, np.newaxis].astype('float32')

# ----------------------------------------------------------------------------
for i_bootstrap in range(N_BOOTSTRAP):
    bootstrap_model_dir = model_dir / '{:03d}'.format(i_bootstrap)

    # Load encoder and decoder (compile=False: only inference is needed)
    encoder = keras.models.load_model(str(bootstrap_model_dir / 'encoder.h5'), compile=False)
    decoder = keras.models.load_model(str(bootstrap_model_dir / 'decoder.h5'), compile=False)

    # Load the preprocessing objects fitted in this bootstrap iteration
    scaler = joblib.load(bootstrap_model_dir / 'scaler.joblib')
    enc_age = joblib.load(bootstrap_model_dir / 'age_encoder.joblib')
    enc_gender = joblib.load(bootstrap_model_dir / 'gender_encoder.joblib')

    # Normalize the relative volumes with this iteration's scaler
    x_normalized = scaler.transform(x_dataset).astype('float32')

    # One-hot encode age and gender to build the conditioning vector
    one_hot_age = enc_age.transform(age)
    one_hot_gender = enc_gender.transform(gender)
    y_data = np.concatenate((one_hot_age, one_hot_gender), axis=1).astype('float32')

    # Encode, then decode conditioned on age and gender
    encoded = encoder(x_normalized, training=False)
    reconstruction = decoder(tf.concat([encoded, y_data], axis=1), training=False).numpy()

    # Deviation score: mean squared reconstruction error per participant
    reconstruction_error = np.mean((x_normalized - reconstruction) ** 2, axis=1)
    reconstruction_error_df['Reconstruction error {:03d}'.format(i_bootstrap)] = reconstruction_error

    # Free the models of this iteration before loading the next ones
    keras.backend.clear_session()

reconstruction_error_df.to_csv('reconstruction_error.csv', index=False)
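The saved CSV holds one column per bootstrap model. One possible way to summarize it, sketched below with pandas, is to compute each participant's mean error and a 2.5th to 97.5th percentile interval across the bootstrap columns. This summary is an illustration, not a step of the original analysis, and `summarize_bootstrap_errors` is a hypothetical helper; it assumes the `Reconstruction error 000`, `Reconstruction error 001`, ... column names written by the loop above.

```python
import re

import pandas as pd


def summarize_bootstrap_errors(df, prefix='Reconstruction error '):
    """Per-participant mean and 2.5th/97.5th percentiles across bootstrap columns."""
    # Keep only columns named '<prefix>000', '<prefix>001', ...
    errors = df.filter(regex='^' + re.escape(prefix) + r'\d{3}$')
    return pd.DataFrame({
        'Participant_ID': df['Participant_ID'],
        'Mean error': errors.mean(axis=1),
        'Lower 2.5%': errors.quantile(0.025, axis=1),
        'Upper 97.5%': errors.quantile(0.975, axis=1),
    })
```

Usage: `summary = summarize_bootstrap_errors(pd.read_csv('reconstruction_error.csv'))`.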

## Download predictions
Finally, you can download the results from the "Files" tab or by running the cell below.