# Google Colab

The following lines of code will configure your Google Colab environment for this assignment.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Mount Google Drive

The Google Colab environment is transient and will reset after any prolonged break in activity. To retain important and/or large files between sessions, use the following lines of code to mount your personal Google Drive to this Colab instance:

In [5]:
try:
    # --- Mount gdrive to /content/drive/My Drive/
    from google.colab import drive
    drive.mount('/content/drive')

except ImportError:
    # --- Not running in Google Colab; skip mounting
    pass

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Throughout this assignment we will use the following global `MOUNT_ROOT` variable to reference a location to store long-term data. If you are using a local Jupyter server and/or wish to store your data elsewhere, please update this variable now.

In [0]:
# --- Set data directory
MOUNT_ROOT = '/content/drive/My Drive'
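If the same notebook may run both in Colab and on a local Jupyter server, one option is to pick the data directory based on whether the Drive mount exists. The sketch below is only an illustration: the helper name `resolve_mount_root` and the local fallback `./data` are hypothetical and not part of the assignment template.

```python
import os

def resolve_mount_root(colab_root='/content/drive/My Drive', fallback='./data'):
    """Return colab_root if that directory exists, otherwise the fallback.

    Both the helper name and the './data' fallback are hypothetical.
    """
    return colab_root if os.path.isdir(colab_root) else fallback

# --- Set data directory
MOUNT_ROOT = resolve_mount_root()

# --- Example location for long-term storage under MOUNT_ROOT
print(os.path.join(MOUNT_ROOT, 'models'))
```

When the Drive is mounted, this resolves to the same value as the cell above; otherwise files are written relative to the current working directory.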

### Select TensorFlow library version

This assignment will use the (new) TensorFlow 2.0 library. Use the following line of code to select this updated version:

In [0]:
# --- Select TensorFlow 2.0 (only in Google Colab)
%tensorflow_version 2.x
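After selecting the library version, it can be useful to confirm which TensorFlow version was loaded and whether the GPU runtime is visible. The following is a minimal sketch: the helper name `describe_runtime` is hypothetical, and it assumes `tf.config.list_physical_devices` is available (TensorFlow 2.1 or later; earlier 2.0 releases expose it under `tf.config.experimental`).

```python
def describe_runtime():
    """Return (tensorflow_version, gpu_device_names).

    Returns (None, []) if TensorFlow is not installed. The helper name is
    hypothetical; it is not part of the assignment template.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None, []

    # --- Assumes TensorFlow >= 2.1 for this call
    gpus = tf.config.list_physical_devices('GPU')
    return tf.__version__, [gpu.name for gpu in gpus]

version, gpus = describe_runtime()
print('TensorFlow:', version)
print('GPUs found:', gpus if gpus else 'none (check Runtime > Change runtime type)')
```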

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [8]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md



### Imports

Use the following lines to import any additional needed libraries:

In [0]:
import os, numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers, callbacks
from jarvis.train import datasets

# Data

As in the tutorial, data for this assignment will consist of prostate MRI exams. Each image will consist of one of four different sequences (T2, low b-value DWI, high b-value DWI, ADC). In this initial exercise, the goal is simply to develop an algorithm capable of differentiating image type so that downstream models for cancer prediction can be used properly. The following lines of code will:

1. Download the dataset (if not already present) 
2. Prepare the necessary Python generators to iterate through the dataset
3. Prepare the corresponding TensorFlow Input(...) objects for model definition

In [0]:
# --- Download dataset
datasets.download(name='mr/prostatex')

# --- Prepare generators and model inputs
configs = {'batch': {'size': 12}}
gen_train, gen_valid, client = datasets.prepare(name='mr/prostatex', configs=configs)
inputs = client.get_inputs(Input)

# Training

In this assignment we will train a basic convolutional neural network to predict the correct imaging series protocol on prostate MRI. At minimum, you must include one of the following modern CNN architecture motifs covered in the tutorial:

* residual function with bottleneck operation
* Inception module

You are also **encouraged** to try different permutations and customizations to achieve optimal validation accuracy.
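To see why a bottleneck is attractive, it can help to count parameters. The sketch below compares a single 3x3 convolution against a 1x1 / 3x3 / 1x1 bottleneck stack; the channel counts (256 and 64) are illustrative assumptions, not values required by the assignment, and `conv_params` is a hypothetical helper.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

# --- Plain 3x3 convolution, 256 -> 256 channels
plain = conv_params(3, 256, 256)

# --- Bottleneck: 1x1 reduce (256 -> 64), 3x3 (64 -> 64), 1x1 expand (64 -> 256)
bottleneck = (
    conv_params(1, 256, 64) +
    conv_params(3, 64, 64) +
    conv_params(1, 64, 256))

print(plain, bottleneck)
```

Under these assumed channel counts the bottleneck stack uses roughly an eighth of the parameters of the plain convolution. The same formula gives 2560 for a 3x3 convolution from 1 to 256 channels, which is the count Keras reports for such a layer.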

### Define the model

In [0]:
proj = lambda filters, x, strides : layers.Conv2D(
    filters=filters, 
    strides=strides, 
    kernel_size=(1, 1),
    padding='same')(x)

In [0]:
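def residual(a, b):
    """
    Method to implement residual connection between two arbitrary tensors (a + b)

    If the shapes differ, b is projected with a strided 1x1 convolution to match a.
    """
    if a.shape[1:] == b.shape[1:]:
        return a + b

    # --- Integer stride that downsamples b to the spatial size of a
    stride = b.shape[1] // a.shape[1]

    return a + proj(a.shape[3], b, (stride, stride))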


In [0]:


preconv = lambda x, filters, strides, kernel_size : layers.Conv2D(filters=filters, strides=strides, kernel_size=kernel_size, padding='same')(x)

norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.LeakyReLU()(x)
pool = lambda x : layers.MaxPool2D(pool_size = (3,3), strides = 2, padding = 'same')(x)

conv = lambda filters, x, strides: relu(norm(preconv(x,filters, strides, kernel_size = (3,3))))
maxpool = lambda x : relu(norm(pool(x)))


In [39]:
# --- Define model

l1 = conv(256, inputs['dat'], 1)
l2 = conv(192, l1, 1)
l3 = residual(conv(168,l2, 1),l1)
l4 = maxpool(l3)
l5 = conv(140, l4, 1)
l6 = maxpool(l5)
l7 = residual(conv(108,l6,1),l3)
l7_d = layers.Dropout(0.2)(l7)
l8 = maxpool(l7_d)
l9 = conv(72,l8,1)
l9_d = layers.Dropout(0.2)(l9)
l10 = maxpool(l9_d)
l11 = residual(conv(48,l10,2),l7)
l11_d = layers.Dropout(0.2)(l11)
l12 = maxpool(l11_d)


f0 = layers.Flatten()(l12)


logits = {}
logits['class'] = layers.Dense(4, name = 'class')(f0)

# --- Create model

model = Model(inputs=inputs, outputs=logits)
model.summary()

Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
dat (InputLayer)                [(None, 256, 256, 1) 0                                            
__________________________________________________________________________________________________
conv2d_48 (Conv2D)              (None, 256, 256, 256 2560        dat[0][0]                        
__________________________________________________________________________________________________
batch_normalization_54 (BatchNo (None, 256, 256, 256 1024        conv2d_48[0][0]                  
__________________________________________________________________________________________________
leaky_re_lu_54 (LeakyReLU)      (None, 256, 256, 256 0           batch_normalization_54[0][0]     
____________________________________________________________________________________________
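When combining pooling with residual connections, it helps to trace the spatial size of each layer, since that determines the stride the 1x1 projection needs. The sketch below assumes a 256 x 256 input (as reported in the summary) and that `padding='same'` yields an output size of `ceil(n / stride)`; the layer names and strides mirror the model cell, while `same_size` is a hypothetical helper.

```python
import math

def same_size(n, stride):
    """Spatial size after a 'same'-padded convolution or pooling layer."""
    return math.ceil(n / stride)

# --- (layer name, stride) pairs following the model definition
plan = [
    ('l1', 1), ('l2', 1), ('l3', 1), ('l4', 2), ('l5', 1), ('l6', 2),
    ('l7', 1), ('l8', 2), ('l9', 1), ('l10', 2), ('l11', 2), ('l12', 2)]

size = 256
trace = {}
for name, stride in plan:
    size = same_size(size, stride)
    trace[name] = size

# --- Stride needed to project each skip tensor onto its target
proj_stride_l7 = trace['l3'] // trace['l7']
proj_stride_l11 = trace['l7'] // trace['l11']

# --- Flattened feature size before the Dense layer (48 channels at l12)
flat = trace['l12'] ** 2 * 48

print(trace)
print(proj_stride_l7, proj_stride_l11, flat)
```

A projection with a large stride discards most of the skip tensor's spatial positions, which is one reason to keep skip connections between layers of similar resolution.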

### Compile the model

In [0]:
# --- Compile model
model.compile(
    optimizer=optimizers.Adam(learning_rate = 1e-5),
    loss={'class': losses.SparseCategoricalCrossentropy(from_logits=True)}, 
    metrics={'class': 'sparse_categorical_accuracy'},)
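Because the final `Dense` layer has no activation, the loss is configured with `from_logits=True` so that the softmax is applied inside the loss. As a rough NumPy sketch of the quantity being minimized (not the Keras implementation itself; `sparse_ce_from_logits` is a hypothetical helper):

```python
import numpy as np

def sparse_ce_from_logits(logits, labels):
    """Mean cross-entropy for integer labels and raw (unnormalized) logits."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # --- Numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # --- Negative log-probability of the true class, averaged over the batch
    return float(-log_probs[np.arange(labels.size), labels].mean())

# --- Uninformative logits over four classes give a loss of log(4)
uniform = sparse_ce_from_logits([[0.0, 0.0, 0.0, 0.0]], [2])

# --- A larger logit on the true class lowers the loss
confident = sparse_ce_from_logits([[2.0, 0.0, 0.0, 0.0]], [0])

print(uniform, confident)
```

A loss near `log(4)` early in training therefore indicates predictions that are no better than chance across the four series types.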

### Train the model

In [42]:
callback = callbacks.EarlyStopping(monitor='val_sparse_categorical_accuracy', restore_best_weights=True)
model.fit(
    x=gen_train, 
    steps_per_epoch=50, 
    epochs=50,
    validation_data=gen_valid,
    callbacks=[callback],
    validation_steps=50,
    validation_freq=4)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50


<tensorflow.python.keras.callbacks.History at 0x7ff3e1998390>
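Training can end before the requested number of epochs because `EarlyStopping` defaults to `patience=0`, and with `validation_freq=4` the monitored metric only exists on every fourth epoch. The following is a simplified sketch of that stopping rule, not the Keras source; the helper name `early_stop_epoch` and the validation scores are hypothetical values chosen for illustration.

```python
def early_stop_epoch(val_scores, patience=0):
    """Return the epoch at which training would stop, or None.

    val_scores maps a 1-based epoch number to the monitored value (higher is
    better) and contains only epochs on which validation actually ran.
    """
    best = float('-inf')
    wait = 0
    for epoch in sorted(val_scores):
        score = val_scores[epoch]
        if score > best:
            # --- Improvement: remember it and reset the counter
            best, wait = score, 0
        else:
            # --- No improvement: stop once patience is exhausted
            wait += 1
            if wait >= patience:
                return epoch
    return None

# --- Hypothetical accuracies, available every 4th epoch only
scores = {4: 0.80, 8: 0.90, 12: 0.95, 16: 0.93}

print(early_stop_epoch(scores))
print(early_stop_epoch(scores, patience=2))
```

With no patience, a single validation pass that fails to improve ends training, so a larger `patience` may be worth trying if training stops while accuracy is still noisy.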

# Evaluation

Based on the tutorial discussion, use the following cells to check your algorithm performance. Consider loading a saved model and running prediction using `model.predict(...)` on the data aggregated via a test generator.

**Important**: In this assignment, you must obtain >90% performance accuracy to receive full credit. Accuracy is determined on a patient by patient (volume by volume) basis, so please *aggregate* results per volume while calculating your performance accuracy here. One common approach is to take the mean prediction across the volume for the final prediction; however, many alternatives exist. If you determine a better method to calculate accuracy, feel free to implement it here.
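The choice of aggregation matters because class labels are categorical: averaging per-slice class indices can land on a class that no slice predicted. The NumPy sketch below compares three options on hypothetical logits for a three-slice volume (the values are made up for illustration): rounding the mean class index, a majority vote, and the argmax of the mean softmax probability.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# --- Hypothetical logits: 3 slices x 4 classes
logits = np.array([
    [4.0, 0.0, 1.0, 0.0],
    [3.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 5.0, 0.0]])

# --- Per-slice predictions: classes 0, 0 and 2
per_slice = np.argmax(logits, axis=1)

# --- Option 1: round the mean class index (can yield an unpredicted class)
mean_index = int(np.round(per_slice.mean()))

# --- Option 2: majority vote across slices
majority = int(np.bincount(per_slice, minlength=4).argmax())

# --- Option 3: average the class probabilities, then take the argmax
mean_prob = int(softmax(logits).mean(axis=0).argmax())

print(mean_index, majority, mean_prob)
```

In this example the rounded mean index selects class 1 even though no slice predicted it, whereas the majority vote and the mean-probability rule both select class 0.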

In [0]:
# --- Create validation generator
test_train, test_valid = client.create_generators(test=True, expand=True)

**Note**: this cell is used only to check model performance prior to submission. It will not be graded. Once submitted, your model will be benchmarked against the (same) validation cohort to determine final algorithm performance and grade. If your evaluation code is correct, the algorithm accuracy should match and you can be confident that you will receive full credit for the assignment. Once you are satisfied with your model, proceed to submission of your assignment below.

In [51]:
trues = []
preds = []

for x, y in test_valid:
    
    # --- Predict
    logits = model.predict(x['dat'][0])

    if isinstance(logits, dict):
        logits = logits['class']

    # --- Argmax
    pred = np.argmax(logits, axis=1)

    trues.append(y['class'][0, 0])
    preds.append(int(np.round(pred.mean())))

trues = np.array(trues)
preds = np.array(preds)



In [52]:
# --- Create DataFrame
df = pd.DataFrame(index=np.arange(preds.size))

# --- Define columns
df['true'] = trues
df['pred'] = preds
df['corr'] = df['true'] == df['pred']

# --- Print accuracy
print(df['corr'].mean())

0.9903846153846154


In [0]:
#np.mean(trues == preds)

### Results

When ready, create a `*.csv` file with your compiled **validation** cohort statistics. There is no need to submit training performance accuracy. As in the tutorial, ensure that there are at least three columns in the `*.csv` file:

* true (ground-truth)
* pred (prediction)
* corr (correct prediction, True or False)

In [0]:
# --- Create *.csv

# --- Serialize *.csv to the path defined in fname
fname = '{}/models/series_id/results.csv'.format(MOUNT_ROOT)
os.makedirs(os.path.dirname(fname), exist_ok=True)
df.to_csv(fname)