# Assignment

In this assignment we will train a convolutional neural network (CNN) to detect brain tumors from MR imaging data. For each 2D slice of data, four different MR imaging sequences are used as four input channels for a global (slice-level) classification task: detecting tumor presence.

This assignment is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found here: https://github.com/peterchang77/dl_tutor/tree/master/cs190.

### Submission

Once complete, the following items must be submitted:

* final `*.ipynb` notebook
* final trained `*.hdf5` model file
* final compiled `*.csv` file with performance statistics

# Google Colab

The following steps will configure your Google Colab environment for this assignment.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package that facilitates data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [1]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md



### Imports

Use the following lines to import any additional needed libraries:

In [2]:
import os, numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers
from jarvis.train import datasets

# Data

The data used in this assignment consist of brain tumor MRI exams derived from the MICCAI Brain Tumor Segmentation Challenge (BraTS). More information about the BraTS Challenge can be found here: http://braintumorsegmentation.org/. Each 2D slice contains four different sequences (T2, FLAIR, T1 pre-contrast and T1 post-contrast), which serve as the input channels. In this assignment, we will use this dataset to derive a model for slice-by-slice tumor detection. The following lines of code will:

1. Download the dataset (if not already present) 
2. Prepare the necessary Python generators to iterate through the dataset
3. Prepare the corresponding TensorFlow `Input(...)` objects for model definition

In [3]:
# --- Download dataset
datasets.download(name='mr/brats-2020-mip')

# --- Prepare generators and model inputs
gen_train, gen_valid, client = datasets.prepare(name='mr/brats-2020-mip', keyword='mip*glb')
inputs = client.get_inputs(Input)
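The generators are used later as `x['dat']` (model input) and `y['tumor']` (label), so each batch appears to be a pair of dictionaries. The NumPy sketch below mimics that structure with random data so the shapes are easy to inspect. The generator name `fake_generator`, the batch size, the 96 × 96 slice size and the `(batch, 1, height, width, channels)` layout are assumptions (the layout is chosen to fit the `Conv3D` kernels of size `(1, 3, 3)` in the model), not values read from `mr/brats-2020-mip`.

```python
import numpy as np

def fake_generator(batch_size=4, height=96, width=96, channels=4, seed=0):
    """Hypothetical stand-in for gen_train / gen_valid.

    Yields (x, y) dictionaries shaped like the batches the notebook consumes:
    x['dat']   -> (batch, 1, height, width, channels) float32 MR slices
    y['tumor'] -> (batch, 1, 1, 1, 1) integer labels (0 = no tumor, 1 = tumor)
    All sizes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    while True:
        x = {'dat': rng.standard_normal((batch_size, 1, height, width, channels)).astype('float32')}
        y = {'tumor': rng.integers(0, 2, size=(batch_size, 1, 1, 1, 1)).astype('int32')}
        yield x, y

x, y = next(fake_generator())
print(x['dat'].shape, y['tumor'].shape)  # (4, 1, 96, 96, 4) (4, 1, 1, 1, 1)
```

To inspect real batches instead, `next(gen_train)` should return one `(x, y)` pair in the same form, assuming the Jarvis generators follow the standard Python iterator protocol.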

# Training

In this assignment we will train a basic convolutional neural network to detect tumor presence on brain MRI. At minimum, you must include one of the following modern CNN architecture motifs covered in the tutorial:

* residual function with bottleneck operation
* Inception module
* squeeze-and-excitation module
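
The squeeze-and-excitation motif used in the model below can be written out in plain NumPy to show what each step computes. This is a sketch with random weights: the reduction ratio `R = 4` and the 32 channels mirror the model cell, while the 48 × 48 spatial size and the function name `squeeze_excite` are illustrative assumptions.

```python
import numpy as np

def squeeze_excite(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation on a channels-last feature map (numpy sketch).

    feature_map: (batch, Z, H, W, C)
    w1: (C, C // R) and w2: (C // R, C) are the weights of the two dense layers
    """
    # --- Squeeze: global average pool over all non-channel axes -> (batch, C)
    squeezed = feature_map.mean(axis=(1, 2, 3))
    # --- Excitation: bottleneck dense + ReLU, then dense + sigmoid -> (batch, C)
    hidden = np.maximum(squeezed @ w1 + b1, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))
    # --- Scale: broadcast the per-channel weights back over Z, H and W
    return feature_map * weights[:, None, None, None, :]

rng = np.random.default_rng(0)
C, R = 32, 4
fmap = rng.standard_normal((2, 1, 48, 48, C))
out = squeeze_excite(
    fmap,
    rng.standard_normal((C, C // R)), np.zeros(C // R),
    rng.standard_normal((C // R, C)), np.zeros(C))
print(out.shape)  # (2, 1, 48, 48, 32)
```

Because every channel is multiplied by a weight in (0, 1), the module can only rescale channels, never change the feature-map shape, which is why it can be inserted after almost any convolutional block.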

You are also **encouraged** to try different permutations and customizations to achieve optimal validation accuracy.

### Define the model

In [4]:
# --- Define model
# --- Define kwargs dictionary
kwargs = {
    'kernel_size': (1, 3, 3),
    'padding': 'same'}

# --- Define lambda functions
conv = lambda x, filters, strides : layers.Conv3D(filters=filters, strides=strides, **kwargs)(x)
norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.ReLU()(x)

# --- Define stride-1, stride-2 blocks
conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=(1, 2, 2))))

# --- Define projection (1 x 1 x 1 convolution that changes the channel count)
proj = lambda filters, x : layers.Conv3D(
    filters=filters, 
    strides=1, 
    kernel_size=(1, 1, 1),
    padding='same')(x)

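# --- Block 1: stride-2 conv-norm-relu block with 32 filters
l1 = conv2(32, inputs['dat'])

# --- Squeeze-and-excitation applied to block 1
# --- Squeeze: global average pooling to one value per channel
p1 = layers.GlobalAveragePooling3D()(l1)

# --- Excitation, step 1: reduce channels to C / R (here R = 4, chosen arbitrarily)
ch = int(p1.shape[-1] / 4)
f1 = layers.Dense(ch, activation='relu')(p1)

# --- Excitation, step 2: expand back to C channels; sigmoid gives per-channel weights in (0, 1)
scale = layers.Dense(l1.shape[-1], activation='sigmoid')(f1)
scale = layers.Reshape((1, 1, 1, l1.shape[-1]))(scale)

# --- Scale: reweight each channel of l1
l1 = l1 * scale

# --- Block 2: bottleneck (1x1x1 reduce to 8 channels, stride-2 conv, 1x1x1 expand to 48)
l2 = proj(48, conv2(8, proj(8, l1)))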
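# --- Optional third block with residual connection (not used). As written the shapes
#     would not match: l2 is downsampled 2x relative to l1, so the skip projection
#     would also need strides=(1, 2, 2)
# l3 = conv1(32, l2) + proj(32, l1)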

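# --- Output head: flatten the features of each slice into a single vector
n0, n1, c = l2.shape[-3:]
f0 = layers.Reshape([-1, 1, 1, n0 * n1 * c])(l2)

# --- Kernel-size-1 convolution acts as a dense layer producing 2 logits (no tumor / tumor)
logits = {}
logits['tumor'] = layers.Conv3D(filters=2, kernel_size=1, name='tumor')(f0)

# --- Create model
model = Model(inputs=inputs, outputs=logits)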
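
Both `proj` and the final `tumor` layer use a convolution with kernel size 1. The NumPy sketch below (random weights, illustrative shapes, hypothetical helper name `conv_1x1`) checks that such a convolution is a single matrix multiply over the channel axis applied at every position, which is why, once the `Reshape` flattens each slice into one feature vector, the output layer behaves like a dense classifier.

```python
import numpy as np

def conv_1x1(x, kernel, bias):
    """Kernel-size-1 convolution on a channels-last tensor (numpy sketch).

    x: (..., C_in), kernel: (C_in, C_out), bias: (C_out,)
    Every spatial position is multiplied by the same (C_in, C_out) matrix.
    """
    return np.einsum('...i,io->...o', x, kernel) + bias

rng = np.random.default_rng(0)
# --- Projection: 48 channels -> 8 channels at every (z, y, x) position
x = rng.standard_normal((2, 1, 12, 12, 48))
k = rng.standard_normal((48, 8))
b = rng.standard_normal(8)
y = conv_1x1(x, k, b)

# --- Same result computed position by position
manual = np.empty_like(y)
for idx in np.ndindex(*x.shape[:-1]):
    manual[idx] = x[idx] @ k + b
print(np.allclose(y, manual))  # True

# --- Output head: mirror Reshape([-1, 1, 1, n0 * n1 * c]); with one slice per input
#     the kernel-size-1 convolution is then a dense layer applied to each slice
n0, n1, c = x.shape[-3:]
flat = x.reshape(x.shape[0], -1, 1, 1, n0 * n1 * c)
w_head = rng.standard_normal((flat.shape[-1], 2))
logits = conv_1x1(flat, w_head, np.zeros(2))
print(logits.shape)  # (2, 1, 1, 1, 2)
```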

### Compile the model

In [5]:
# --- Compile model

model.compile(
    optimizer=optimizers.Adam(learning_rate=2e-4), 
    loss={'tumor': losses.SparseCategoricalCrossentropy(from_logits=True)}, 
    metrics={'tumor': 'sparse_categorical_accuracy'})
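
`from_logits=True` tells the loss that the `tumor` layer outputs raw, unnormalized scores, so the softmax is applied inside the loss rather than in the model. The NumPy sketch below computes the same quantity for a small batch; `sparse_ce_from_logits` is our own illustrative helper (not the TensorFlow implementation), and it assumes the default behavior of averaging the loss over samples.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Mean sparse categorical cross-entropy computed from raw logits.

    labels: integer class ids, shape (N,)
    logits: unnormalized scores, shape (N, K)
    """
    # --- Numerically stable log-softmax: subtract the row max before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # --- Negative log-probability of the true class, averaged over samples
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[0.0, 0.0], [2.0, -1.0]])
labels = np.array([0, 1])
print(sparse_ce_from_logits(labels, logits))
```

The first sample (equal scores) contributes ln 2 regardless of its label, while the second, confidently wrong sample contributes a much larger penalty.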



### In-memory data

To speed up training, consider loading all of the data into RAM:

In [6]:
# --- Load data into memory for faster training
client.load_data_in_memory()
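
Whether `load_data_in_memory()` is practical depends on how much RAM the Colab runtime provides. A rough estimate is the number of stored values times the bytes per value. The sketch below assumes uncompressed arrays; the helper `in_memory_bytes`, the slice count and the 240 × 240 size are placeholders, not the real size of `mr/brats-2020-mip`.

```python
import numpy as np

def in_memory_bytes(n_slices, height, width, channels, dtype='float32'):
    """Approximate RAM needed to hold n_slices arrays of shape (height, width, channels)."""
    return n_slices * height * width * channels * np.dtype(dtype).itemsize

# --- Hypothetical numbers, not the actual dataset size
gib = in_memory_bytes(n_slices=10_000, height=240, width=240, channels=4) / 2**30
print(f'{gib:.2f} GiB')
```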



### Train the model

In [7]:
model.fit(
    x=gen_train, 
    steps_per_epoch=175, 
    epochs=10,
    validation_data=gen_valid,
    validation_steps=125,
    validation_freq=3)

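
With `validation_freq=3`, validation runs only every third epoch, so with `epochs=10` the final epoch is never validated. The short check below reproduces that schedule, assuming the integer rule described in the Keras `Model.fit` documentation (validate every `validation_freq` epochs, counting epochs from 1).

```python
# --- With an integer validation_freq, validation follows epoch e (1-indexed)
#     whenever e % validation_freq == 0
epochs, validation_freq = 10, 3
validated = [e for e in range(1, epochs + 1) if e % validation_freq == 0]
print(validated)  # [3, 6, 9]
```

If you want a validation reading for the last epoch, choose `epochs` as a multiple of `validation_freq` (for example 9 or 12) or set `validation_freq=1`.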

# Evaluation

Based on the tutorial discussion, use the following cells to check your algorithm performance. Consider loading a saved model and running prediction using `model.predict(...)` on the data aggregated via a test generator.

**Important**: In this assignment, you must obtain >75% accuracy to receive full credit. Accuracy is measured slice by slice, as demonstrated in the tutorial; please refer to the tutorial materials if you have questions.
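
Slice-by-slice accuracy is the fraction of slices whose predicted label matches the ground truth. Because tumor and non-tumor slices need not be balanced, it can be useful to compare against the accuracy of always predicting the majority class. A minimal NumPy sketch with made-up labels (the helper names are illustrative):

```python
import numpy as np

def slice_accuracy(trues, preds):
    """Fraction of slices whose predicted label equals the ground truth."""
    trues, preds = np.asarray(trues).ravel(), np.asarray(preds).ravel()
    return float((trues == preds).mean())

def majority_baseline(trues):
    """Accuracy of always predicting the most frequent class."""
    counts = np.bincount(np.asarray(trues).ravel())
    return float(counts.max() / counts.sum())

# --- Toy labels (hypothetical): 1 = tumor present, 0 = absent
trues = np.array([1, 0, 1, 1, 0, 1, 0, 1])
preds = np.array([1, 0, 0, 1, 0, 1, 1, 1])
print(slice_accuracy(trues, preds))  # 0.75
print(majority_baseline(trues))      # 0.625
```

Note that the toy accuracy of exactly 0.75 would not clear the strict >75% requirement.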

In [8]:
# --- Create test-mode generators (only the validation generator is used below)
test_train, test_valid = client.create_generators(test=True, expand=True)

trues = []
preds = []

for x, y in test_valid:
    
    # --- Predict
    logits = model.predict(x['dat'])

    if isinstance(logits, dict):
        logits = logits['tumor']

    # --- Argmax
    pred = np.squeeze(np.argmax(logits, axis=-1))

    trues.append(y['tumor'].ravel())
    preds.append(pred.ravel())

trues = np.concatenate(trues)
preds = np.concatenate(preds)
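
Assuming each input holds a single slice, the model emits one pair of logits per slice with shape `(batch, 1, 1, 1, 2)`, which follows from the `Reshape` and the two-filter `tumor` layer. The NumPy sketch below, using made-up logits, traces how `argmax`, `squeeze` and `ravel` in the loop turn that output into a flat vector of 0/1 predictions.

```python
import numpy as np

# --- Hypothetical logits for a batch of 3 slices, shaped like the model output
#     (batch, 1, 1, 1, 2): last axis = [no-tumor score, tumor score]
logits = np.array([2.0, -1.0, 0.5, 1.5, -0.3, -0.2]).reshape(3, 1, 1, 1, 2)

labels = np.argmax(logits, axis=-1)   # (3, 1, 1, 1): index of the larger score
pred = np.squeeze(labels)             # (3,): singleton axes removed
print(labels.shape, pred.shape, pred) # (3, 1, 1, 1) (3,) [0 1 1]

# --- Caveat: with a batch of one, squeeze also drops the batch axis (0-d array),
#     which is why the loop calls .ravel() before appending
single = np.squeeze(np.argmax(logits[:1], axis=-1))
print(single.shape, single.ravel().shape)  # () (1,)
```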



In [9]:
# --- Create DataFrame
df = pd.DataFrame(index=np.arange(preds.size))

# --- Define columns
df['true'] = trues
df['pred'] = preds
df['corr'] = df['true'] == df['pred']

# --- Print accuracy
print(df['corr'].mean())

0.8410226270937409


**Note**: this cell is used only to check model performance prior to submission. It will not be graded. Once submitted, your model will be benchmarked against the (same) validation cohort to determine final algorithm performance and grade. If your evaluation code above is correct, the algorithm accuracy should match and you can be confident that you will receive full credit for the assignment. Once you are satisfied with your model, proceed to submission of your assignment below.

### Results

When ready, create a `*.csv` file with your compiled **validation** cohort statistics. There is no need to submit training performance accuracy. As in the tutorial, ensure that there are at least three columns in the `*.csv` file:

* true (ground-truth)
* pred (prediction)
* corr (correct prediction, True or False)
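
Before submitting, a quick check that the spreadsheet actually contains the three required columns can catch mistakes such as a renamed column. The sketch below uses only the standard library; the helper name `missing_columns` and the sample file contents are hypothetical.

```python
import csv
import os
import tempfile

REQUIRED_COLUMNS = {'true', 'pred', 'corr'}

def missing_columns(path):
    """Return the required columns absent from the header of a results CSV."""
    with open(path, newline='') as f:
        header = next(csv.reader(f))
    return REQUIRED_COLUMNS - set(header)

# --- Demo on a small file laid out the way DataFrame.to_csv writes it by default
#     (an unnamed index column first, then the data columns)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'panteater_results.csv')
    with open(path, 'w', newline='') as f:
        f.write(',true,pred,corr\n0,1,1,True\n1,0,1,False\n')
    print(missing_columns(path))  # set()
```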

In [11]:
# --- Save validation results; file name follows the [UCInetID]_results.csv convention
df.to_csv('./Fnaghman_results.csv')

# Submission

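Use the following lines to save your model for submission. Note that in Google Colab the relative path `./` points to the runtime's local `/content` directory, which is cleared when the runtime ends; to save directly into your Google Drive, mount the drive first and use a path under `/content/drive/MyDrive/`: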

In [12]:
# --- Serialize a model
fname = './Fnaghman_model.hdf5'
model.save(fname)

### Canvas

Once you have completed this assignment, download the necessary files from Google Colab and your Google Drive. You will then need to submit the following items:

* final (completed) notebook: `[UCInetID]_assignment.ipynb`
* final (results) spreadsheet: `[UCInetID]_results.csv`
* final (trained) model: `[UCInetID]_model.hdf5`

**Important**: please submit all your files prefixed with your UCInetID as listed above. Your UCInetID is the part of your UCI email address that comes before `@uci.edu`. For example, Peter Anteater has the email address panteater@uci.edu, so his notebook file would be submitted under the name `panteater_assignment.ipynb`, his spreadsheet under the name `panteater_results.csv` and his model file under the name `panteater_model.hdf5`.
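
The naming rule can be expressed as a small helper that derives all three file names from a UCI email address, following the names in the list above. The function `submission_filenames` is hypothetical and only illustrates the convention.

```python
def submission_filenames(email):
    """Build the three submission file names from a UCI email address."""
    local, _, domain = email.partition('@')
    if domain != 'uci.edu' or not local:
        raise ValueError(f'expected an address of the form <UCInetID>@uci.edu, got {email!r}')
    return {
        'notebook': f'{local}_assignment.ipynb',
        'results': f'{local}_results.csv',
        'model': f'{local}_model.hdf5',
    }

print(submission_filenames('panteater@uci.edu'))
```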

In [13]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
