# Assignment

In this assignment we will build an unsupervised CNN autoencoder network. Subsequently, the pretrained encoding (contracting) backbone of the autoencoder will be used to create a model for survival prediction in brain tumor patients.

This assignment is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at: https://github.com/peterchang77/dl_tutor/tree/master/cs190.

### Submission

Once complete, the following items must be submitted:

* final `*.ipynb` notebook
* final trained `*.hdf5` model file
* final compiled `*.csv` file with performance statistics

# Google Colab

The following lines of code will configure your Google Colab environment for this assignment.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [None]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md


### Imports

Use the following lines to import any additional needed libraries:

In [2]:
import numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers
from jarvis.train import datasets

from google.colab import drive
drive.mount('/content/drive')



# Data

The data used in this assignment will consist of brain tumor MRI exams derived from the MICCAI Brain Tumor Segmentation Challenge (BraTS). More information about the BraTS Challenge can be found here: http://braintumorsegmentation.org/. Each 3D input volume will contain four different sequences stacked as channels (T2, FLAIR, T1 pre-contrast and T1 post-contrast).

The following lines of code will:

1. Download the dataset (if not already present)
2. Prepare the necessary Python generators to iterate through the dataset
3. Prepare the corresponding TensorFlow `Input(...)` objects for model definition

In [3]:
# --- Download dataset
datasets.download(name='mr/brats-2020-096')

# --- Prepare generators and model inputs
gen_train, gen_valid, client = datasets.prepare(name='mr/brats-2020-096', keyword='096*glb-org')
inputs = client.get_inputs(Input)



# Autoencoder

In this assignment we will train a convolutional autoencoder. Compared to a standard contracting-expanding U-Net architecture for semantic segmentation, two important distinctions should be emphasized:

* no "skip" connections between the contracting and expanding layers
* use of a regression loss function (e.g., MAE or MSE) for optimization
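To make the second distinction concrete, the two regression losses named above can be computed by hand on a toy reconstruction. The arrays below are made-up values for illustration only; the notebook itself compiles the autoencoder with `losses.MeanSquaredError()`.

```python
import numpy as np

# Hypothetical voxel intensities: an original patch and its reconstruction
original = np.array([0.0, 0.5, 1.0, 2.0])
recon = np.array([0.1, 0.5, 0.7, 1.0])

error = recon - original          # per-voxel residuals
mae = np.mean(np.abs(error))      # mean absolute error: penalizes residuals linearly
mse = np.mean(error ** 2)         # mean squared error: penalizes large residuals quadratically

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")
```

Here the single large residual (1.0) contributes most of the MSE, while MAE weights all residuals in proportion to their size; this is why MSE tends to push harder on badly reconstructed regions.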

### Define model layers

*Hint*: Recall that both a full autoencoder and a standalone encoder built from the same (shared) contracting layers are needed, so that the contracting layers may be reused in a future model.

In [None]:
# --- Define kwargs dictionary
kwargs = {
    'kernel_size': (3, 3, 3),
    'padding': 'same'}

# --- Define lambda functions
conv = lambda x, filters, strides : layers.Conv3D(filters=filters, strides=strides, **kwargs)(x)
norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.ReLU()(x)
tran = lambda x, filters, strides : layers.Conv3DTranspose(filters=filters, strides=strides, **kwargs)(x)

# --- Define stride-1, stride-2 blocks (conv > batch norm > ReLU)
conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=2)))
tran2 = lambda filters, x : relu(norm(tran(x, filters, strides=2)))

# --- Define contracting layers (each stride-2 block halves the spatial size)
l1 = conv1(8, inputs['dat'])
l2 = conv1(16, conv2(16, l1))
l3 = conv1(32, conv2(32, l2))
l4 = conv1(48, conv2(48, l3))
l5 = conv1(64, conv2(64, l4))

# --- Define expanding layers (each stride-2 transposed block doubles it; no skip connections)
l6  = tran2(48, l5)
l7  = tran2(32, conv1(48, l6))
l8  = tran2(16, conv1(32, l7))
l9  = tran2(8,  conv1(16, l8))
l10 = conv1(8,  l9)

# --- Create autoencoder (linear output with 4 channels to match the input sequences)
ae_outputs = {'recon': layers.Conv3D(filters=4, name='recon', **kwargs)(l10)}
ae = Model(inputs=inputs, outputs=ae_outputs)

# --- Create encoder (shares the contracting layers, and their weights, with ae)
encoder = Model(inputs=inputs, outputs=l5)
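Because every stride-2 `Conv3D` with `'same'` padding halves each spatial dimension (rounding up) and every stride-2 `Conv3DTranspose` with `'same'` padding doubles it, the reconstruction only matches the input shape when each input dimension is divisible by 2⁴ = 16. The sketch below traces the sizes through the four downsampling and four upsampling stages, assuming a 96-voxel edge (suggested by the `096` in the dataset name, but not stated explicitly in this notebook).

```python
import math

def conv_same_output(size, stride):
    """Output length of a 'same'-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(size, n_down=4):
    """Trace one spatial dimension through n_down stride-2 convolutions and
    back up through n_down stride-2 transposed convolutions."""
    down = [size]
    for _ in range(n_down):
        down.append(conv_same_output(down[-1], 2))
    up = [down[-1]]
    for _ in range(n_down):
        up.append(up[-1] * 2)  # 'same' transposed conv: output = input * stride
    return down, up

down, up = trace_shapes(96)
print("contracting:", down)
print("expanding:  ", up)

# A size not divisible by 16 does not round-trip: 100 -> ... -> 7 -> ... -> 112
print("size 100:   ", trace_shapes(100))
```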

### Generator

*Hint*: Recall that the default training generators, which yield labels corresponding to survival scores, need to be modified to instead yield the original input data.

In [5]:
def ae_generator(G):
    """Replace the survival labels with the input image as the reconstruction target."""
    for xs, ys in G:
        ys = {'recon': xs['dat']}
        yield xs, ys
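The generator can be checked offline, without downloading BraTS, by feeding it a stand-in iterator. `mock_batches` below is a hypothetical helper and its array shapes are assumptions; only the `'dat'` and `'survival'` keys come from this notebook.

```python
import numpy as np

def ae_generator(G):
    """Replace the survival labels with the input image as the reconstruction target."""
    for xs, ys in G:
        ys = {'recon': xs['dat']}
        yield xs, ys

def mock_batches(n_batches=2, shape=(1, 16, 16, 16, 4)):
    """Hypothetical stand-in for gen_train: yields (xs, ys) dicts with the same keys."""
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        xs = {'dat': rng.random(shape, dtype=np.float32)}
        ys = {'survival': rng.random((shape[0], 1), dtype=np.float32)}
        yield xs, ys

batches = list(ae_generator(mock_batches()))
xs, ys = batches[0]
print(list(ys.keys()), ys['recon'].shape)
```

The target is the very same array object as the input, so no extra copy of each volume is made per batch.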

### Compile the model

In [6]:
# --- Compile model
ae.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
    loss={'recon': losses.MeanSquaredError()},
    experimental_run_tf_function=False)

### Train the model

In [7]:
client.load_data_in_memory()



In [None]:
# --- Train model
ae.fit(
    x=ae_generator(gen_train), 
    steps_per_epoch=350, 
    epochs=8,
    use_multiprocessing=True)

# Survival Model

In the second part of this assignment, you will create a dedicated survival prediction model using the pretrained encoder layers.

In [None]:
# --- Define model
encoder.trainable = False
latent = encoder(inputs)

# --- Finalize model
h0 = layers.Flatten()(latent)
h1 = layers.Dense(32, activation='relu')(h0)

logits = {}
logits['survival'] = layers.Dense(1, activation='sigmoid', name='survival')(h1)

# --- Create survival model
model = Model(inputs=inputs, outputs=logits)
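Note that despite the variable name `logits`, the `'survival'` head applies a `sigmoid`, so its outputs are bounded to the open interval (0, 1). This is only appropriate if the survival targets are scaled into that range; the notebook does not state the normalization, so treat that as an assumption. A quick numerical check of the bound:

```python
import numpy as np

def sigmoid(z):
    """Logistic function, as applied by Dense(1, activation='sigmoid')."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation values from the final Dense layer
pre_activations = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
probs = sigmoid(pre_activations)
print(probs)
```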

In [None]:
model.summary()
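With `encoder.trainable = False`, `model.summary()` should list all encoder weights as non-trainable, leaving only the `Flatten`/`Dense` head trainable. A back-of-the-envelope count of that head, assuming a 96³ input so that `l5` has shape 6 × 6 × 6 × 64:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Assumed encoder output l5: 6 x 6 x 6 spatial grid with 64 channels
flat = 6 * 6 * 6 * 64                                # length after Flatten
head = dense_params(flat, 32) + dense_params(32, 1)  # Dense(32) + Dense(1)
print(flat, head)
```

Almost all trainable parameters sit in the first `Dense` layer because `Flatten` exposes every latent voxel; a global pooling layer would shrink this considerably, at the cost of discarding spatial layout.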

### Compile the model

In [None]:
# --- Compile model
model.compile(
    optimizer=optimizers.Adam(learning_rate=2e-4),
    loss={'survival': losses.MeanSquaredError()},
    experimental_run_tf_function=False)

### Train the model

In [12]:
# --- Train model
model.fit(
    x=gen_train, 
    steps_per_epoch=275, 
    epochs=8,
    validation_data=gen_valid,
    validation_steps=275,
    validation_freq=4,
    use_multiprocessing=True)
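With an integer `validation_freq`, Keras runs validation after every N-th epoch, so `epochs=8` with `validation_freq=4` evaluates on `gen_valid` only twice. The helper below is a hypothetical illustration of that rule, not a Keras function:

```python
def validation_epochs(epochs, validation_freq):
    """1-indexed epochs on which validation runs for an integer validation_freq."""
    return [e for e in range(1, epochs + 1) if e % validation_freq == 0]

print(validation_epochs(8, 4))
```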


# Evaluation

Based on the tutorial discussion, use the following cells to calculate model performance. The following metrics should be calculated:

* absolute error (mean, median, 25th percentile, 75th percentile)

### Performance

The following minimum performance metrics must be met for full credit:

* median absolute error of < 0.09

In [None]:
# --- Create validation generator
test_train, test_valid = client.create_generators(test=True, expand=True)

preds = []
trues = []
mae = []

for x, y in test_valid:

    # --- Predict
    logits = model.predict(x['dat'])

    if isinstance(logits, dict):
        logits = logits['survival']

    # --- Aggregate (per-case predictions, labels and absolute errors)
    preds.append(logits.ravel())
    trues.append(y['survival'].ravel())
    mae.append(np.abs(preds[-1] - trues[-1]))

# --- Flatten into 1D arrays (concatenate also handles unequal batch sizes)
preds = np.concatenate(preds)
trues = np.concatenate(trues)
mae = np.concatenate(mae)
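The four statistics listed under Evaluation, plus the median threshold from Performance, can be summarized from the per-case absolute errors. `summarize_abs_error` is a hypothetical helper, and the example errors are made up; in the notebook, pass the `mae` array instead.

```python
import numpy as np

def summarize_abs_error(abs_err, threshold=0.09):
    """Mean, median and 25th/75th percentiles of absolute error, plus pass/fail."""
    abs_err = np.asarray(abs_err, dtype=float).ravel()
    stats = {
        'mean': float(np.mean(abs_err)),
        'median': float(np.median(abs_err)),
        'p25': float(np.percentile(abs_err, 25)),
        'p75': float(np.percentile(abs_err, 75)),
    }
    stats['meets_threshold'] = stats['median'] < threshold
    return stats

# Made-up errors with one outlier, to show why the median is the graded metric
example = summarize_abs_error([0.02, 0.04, 0.06, 0.08, 0.30])
print(example)
```

A single poorly predicted case pulls the mean up to 0.1 here while the median stays at 0.06, which is one reason to grade on the median.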

### Results

When ready, create a `*.csv` file with your compiled **validation** cohort absolute prediction error statistics. There is no need to submit training performance accuracy.

In [14]:
# --- Define columns
df = pd.DataFrame(index=np.arange(mae.size))
df['MAE'] = mae
df.to_csv('./results.csv')
model.save('./survival.hdf5')
drive.mount('/content/drive')
# --- Print mean absolute error
print(df['MAE'].mean())

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
0.05868181213736534
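The cell above writes one absolute error per validation case. If a compact table of the compiled statistics is preferred as well, pandas' `Series.describe()` already computes the required quantiles. This is a sketch with made-up values: in the notebook you would use `df['MAE']` and write to the `[UCInetID]_results.csv` path rather than to a string.

```python
import pandas as pd

# Made-up absolute errors; in the notebook this would be df['MAE']
errors = pd.Series([0.02, 0.04, 0.06, 0.08, 0.30], name='MAE')

# describe() reports count, mean, std, min, 25%, 50%, 75% and max
summary = errors.describe()[['mean', '25%', '50%', '75%']].rename({'50%': 'median'})

# One column per statistic, written without the index
csv_text = summary.to_frame().T.to_csv(index=False)
print(csv_text)
```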


# Submission

The trained model is saved for submission by the `model.save('./survival.hdf5')` call in the Results cell above.

### Canvas

Once you have completed this assignment, download the necessary files from Google Colab and your Google Drive. You will then need to submit the following items:

* final (completed) notebook: `[UCInetID]_assignment.ipynb`
* final (results) spreadsheet: `[UCInetID]_results.csv`
* final (trained) model: `[UCInetID]_model.hdf5`

**Important**: please submit all your files prefixed with your UCInetID as listed above. Your UCInetID is the part of your UCI email address that comes before `@uci.edu`. For example, Peter Anteater has an email address of panteater@uci.edu, so his notebook file would be submitted under the name `panteater_notebook.ipynb`, his spreadsheet would be submitted under the name `panteater_results.csv` and his model file would be submitted under the name `panteater_model.hdf5`.
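The naming convention can be applied programmatically before downloading the files. The helpers below are hypothetical; they follow the bulleted list above and assume the outputs were saved as `./results.csv` and `./survival.hdf5`, as in the Results cell. The notebook itself must still be downloaded from Colab and renamed by hand.

```python
import shutil
from pathlib import Path

def submission_names(ucinetid):
    """File names following the [UCInetID]_<suffix> convention."""
    return {
        'notebook': f'{ucinetid}_assignment.ipynb',
        'results': f'{ucinetid}_results.csv',
        'model': f'{ucinetid}_model.hdf5',
    }

def copy_for_submission(ucinetid, results='results.csv', model='survival.hdf5', out_dir='.'):
    """Copy saved outputs to prefixed names; files that do not exist are skipped."""
    names = submission_names(ucinetid)
    copied = []
    for src, key in ((results, 'results'), (model, 'model')):
        src_path = Path(src)
        if src_path.exists():
            dst = Path(out_dir) / names[key]
            shutil.copy(src_path, dst)
            copied.append(dst)
    return copied

print(submission_names('panteater'))
```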