# Assignment

In this assignment we will create a model for segmentation of enhancing tumor from brain MR images using custom loss function modifications to account for class imbalance.

This assignment is part of the class **Introduction to Deep Learning for Medical Imaging** at the University of California, Irvine (CS190); more information can be found at: https://github.com/peterchang77/dl_tutor/tree/master/cs190.

### Submission

Once complete, the following items must be submitted:

* final `*.ipynb` notebook
* final trained `*.hdf5` model file
* final compiled `*.csv` file with performance statistics

# Google Colab

The following lines of code will configure your Google Colab environment for this assignment.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Select TensorFlow library version

This tutorial will use the TensorFlow 2.1 library. Use the following line of code to select and download this specific version:

In [1]:
# --- Install TensorFlow 2.1 (only in Google Colab)
%pip install tensorflow-gpu==2.1


# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [2]:
# --- Install jarvis (only in Google Colab or local runtime)
% pip install jarvis-md

Successfully installed jarvis-md-0.0.1a14 pyyaml-5.4.1


### Imports

Use the following lines to import any additional needed libraries:

In [3]:
import numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers, metrics
from jarvis.train import datasets, custom
from jarvis.train.client import Client
from jarvis.utils.general import overload, tools as jtools
from jarvis.utils.display import imshow

# Data

The data used in this tutorial will consist of brain tumor MRI exams derived from the MICCAI Brain Tumor Segmentation Challenge (BraTS). More information about the BraTS Challenge can be found here: http://braintumorsegmentation.org/. Each single 2D slice will consist of one of four different sequences (T2, FLAIR, T1 pre-contrast and T1 post-contrast). In this exercise, we will use this dataset to derive a model for slice-by-slice tumor segmentation. The custom `datasets.download(...)` method can be used to download a local copy of the dataset. By default the dataset will be archived at `/data/raw/mr_brats_2020`; as needed, an alternate location may be specified using `datasets.download(name=..., path=...)`.

In [4]:
# --- Download dataset
datasets.download(name='mr/brats-2020-mip')



{'code': '/data/raw/mr_brats_2020', 'data': '/data/raw/mr_brats_2020'}

# Training

In order to create a high-sensitivity segmentation model for enhancing tumor, the following strategies should be implemented in this assignment:

* stratified sampling, and
* pixel-level class weights, or
* pixel-level masked loss

### Stratified Sampling

Use the following code block to define a custom configuration dictionary to increase the sampling distribution of enhancing tumor (`lbl-mip-03`):

In [5]:
# --- Configs dict to implement stratified sampling
configs = {
    'batch': {'size': 8},
    'sampling': {
        'lbl-mip-00': 0.3,
        'lbl-mip-01': 0.1,
        'lbl-mip-02': 0.1,
        'lbl-mip-03': 0.5}}

# --- Prepare generators
gen_train, gen_valid, client = datasets.prepare(name='mr/brats-2020-mip', keyword='mip*vox', configs=configs)
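
To build intuition for what the sampling rates mean, the following is a minimal sketch of stratified sampling written with plain NumPy. It is not how Jarvis implements sampling internally; the helper `draw_cohorts` is a hypothetical name introduced only for illustration, and the rates are copied from the `configs` dictionary above.

```python
import numpy as np

# Sampling rates copied from the configs dict above
sampling = {
    'lbl-mip-00': 0.3,
    'lbl-mip-01': 0.1,
    'lbl-mip-02': 0.1,
    'lbl-mip-03': 0.5}

def draw_cohorts(sampling, n, seed=0):
    """Hypothetical helper: draw n cohort names according to the sampling rates."""
    names = list(sampling)
    rates = np.array([sampling[k] for k in names], dtype='float64')
    rates = rates / rates.sum()  # normalize in case the rates do not sum to 1
    rng = np.random.default_rng(seed)
    return rng.choice(names, size=n, p=rates)

# --- Empirical frequency of each cohort across 10,000 draws
draws = draw_cohorts(sampling, n=10000)
freqs = {k: float(np.mean(draws == k)) for k in sampling}
print(freqs)
```

With these rates, roughly half of all sampled slices should come from the `lbl-mip-03` cohort, regardless of how rare that cohort is in the underlying dataset.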

### Create custom generators

*Hint*: Ensure that a combination of class weights and/or masked loss is used.

In [6]:
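def CustomGenerator(G):
    """
    Wrap generator G to add a pixel-level `msk` input and binarize the label

    """
    for xs, ys in G:

        # --- Define msk (weight 0.5 on enhancing tumor voxels, 0 elsewhere)
        xs['msk'] = np.zeros(ys['tumor'].shape, dtype='float32')
        xs['msk'][ys['tumor'] == 3] = 0.5

        # --- Binarize ys (1 = enhancing tumor, 0 = everything else)
        ys['tumor'] = ys['tumor'] == 3
        ys['tumor'] = ys['tumor'].astype('uint8')

        yield xs, ys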
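
In the generator above, only enhancing tumor voxels receive a non-zero weight, so all other voxels are excluded from the loss. One possible alternative that combines class weights with a mask is sketched below. The helper `make_weight_map`, the weights `5.0` and `1.0`, and the `valid` array are all assumptions chosen for illustration; they are not taken from the tutorial and have not been tuned.

```python
import numpy as np

def make_weight_map(lbl, fg_class=3, fg_weight=5.0, bg_weight=1.0, valid=None):
    """
    Hypothetical helper: build a pixel-level weight map

      * foreground (lbl == fg_class) voxels get fg_weight
      * all other voxels get bg_weight
      * voxels where valid is False get 0 (masked out of the loss)

    """
    w = np.full(lbl.shape, bg_weight, dtype='float32')
    w[lbl == fg_class] = fg_weight

    if valid is not None:
        w[~valid] = 0.0

    return w

# --- Toy 2 x 3 label map; the top-left voxel is masked out
lbl = np.array([[0, 0, 3], [1, 3, 2]])
valid = np.array([[False, True, True], [True, True, True]])

msk = make_weight_map(lbl, valid=valid)
print(msk)
```

A map like this keeps background voxels in the loss at a lower weight, so the model is still penalized for false positives outside the tumor.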

### Create inputs

*Hint*: Ensure that both the standard `dat` input as well as the additional `msk` input are accounted for in the `inputs` dictionary.

In [7]:
inputs = client.get_inputs(Input)
inputs['msk'] = Input(shape=(None, 240, 240, 1), dtype='float32', name='msk')

### Define the model

In [8]:
# --- Define kwargs dictionary
kwargs = {
    'kernel_size': (1, 3, 3),
    'padding': 'same'}

# --- Define lambda functions
conv = lambda x, filters, strides : layers.Conv3D(filters=filters, strides=strides, **kwargs)(x)
norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.ReLU()(x)
tran = lambda x, filters, strides : layers.Conv3DTranspose(filters=filters, strides=strides, **kwargs)(x)

concat = lambda a, b : layers.Concatenate()([a, b])

# --- Define stride-1, stride-2 blocks
conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=(1, 2, 2))))
tran2 = lambda filters, x : relu(norm(tran(x, filters, strides=(1, 2, 2))))

In [9]:
# --- Define contracting layers
l1 = conv1(8, inputs['dat'])
l2 = conv1(16, conv2(16, l1))
l3 = conv1(32, conv2(32, l2))
l4 = conv1(48, conv2(48, l3))
l5 = conv1(64, conv2(64, l4))

# --- Define expanding layers
l6  = tran2(48, l5)
l7  = tran2(32, conv1(48, concat(l4, l6)))
l8  = tran2(16, conv1(32, concat(l3, l7)))
l9  = tran2(8,  conv1(16, concat(l2, l8)))
l10 = conv1(8,  l9)

# --- Create logits
logits = {}
logits['tumor'] = layers.Conv3D(filters=2, name='tumor', **kwargs)(l10)

# --- Create model
model = Model(inputs=inputs, outputs=logits)
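
As a sanity check on the architecture, the in-plane feature map sizes can be traced by hand. The sketch below assumes `padding='same'`, where a strided convolution produces `ceil(n / stride)` outputs and a stride-2 transpose convolution doubles the size. The helper names are hypothetical and used only for this illustration.

```python
import math

def same_conv_size(n, stride):
    """Output size of a 'same'-padded convolution along one axis."""
    return math.ceil(n / stride)

def unet_sizes(n=240, depth=4):
    """Trace in-plane sizes through `depth` stride-2 blocks and back up."""
    down = [n]
    for _ in range(depth):
        down.append(same_conv_size(down[-1], 2))

    up = [down[-1]]
    for _ in range(depth):
        up.append(up[-1] * 2)  # stride-2 transpose convolution doubles the size

    return down, up

down, up = unet_sizes()
print(down)  # contracting path: l1 ... l5
print(up)    # expanding path: l5 ... l9
```

Because 240 is divisible by 16, each expanding layer matches the size of the contracting layer it is concatenated with (30, 60 and 120), which is what `layers.Concatenate()` requires.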

### Compile the model

*Hint*: Ensure that custom loss functions are used as described in the tutorial to properly adjust the loss function for weights and masks. In addition, it may be useful to track metrics such as Dice score and sensitivity to gauge real-time performance.

In [10]:
def sce(weights, scale=1.0):

    loss = losses.SparseCategoricalCrossentropy(from_logits=True)

    def sce(y_true, y_pred):

        return loss(y_true=y_true, y_pred=y_pred, sample_weight=weights) * scale

    return sce
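
To see what the per-pixel weights do to the loss value, here is a minimal NumPy sketch of a weighted sparse softmax cross-entropy. The function name `weighted_sparse_ce` is hypothetical. Dividing by the total number of pixels (rather than by the sum of the weights) reflects my understanding of the default Keras reduction and should be treated as an assumption.

```python
import numpy as np

def weighted_sparse_ce(y_true, logits, weights):
    """
    Hypothetical helper: weighted sparse softmax cross-entropy

      y_true  : integer class labels, shape (...)
      logits  : raw scores, shape (..., n_classes)
      weights : per-pixel weights, shape (...)

    """
    # --- Numerically stable log-softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    # --- Negative log-probability of the true class at each pixel
    ce = -np.take_along_axis(log_p, y_true[..., None], axis=-1)[..., 0]

    # --- Weight each pixel, then average over all pixels
    return float((ce * weights).sum() / ce.size)

# --- Toy 2 x 2 image with uniform logits (each pixel has CE = ln 2)
logits = np.zeros((2, 2, 2))
y_true = np.array([[1, 0], [1, 0]])
weights = np.array([[1.0, 0.0], [0.5, 0.0]])

loss = weighted_sparse_ce(y_true, logits, weights)
print(loss)
```

Pixels with a weight of zero contribute nothing to the result, which is how a mask removes regions from the loss.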

In [11]:
loss = {'tumor': custom.sce(inputs['msk'])}


In [12]:
# --- Create metrics
metrics = custom.dsc(weights=inputs['msk'])
metrics += [custom.softmax_ce_sens(weights=inputs['msk'])]

metrics = {'tumor': metrics}


In [13]:
# --- Compile the model
model.compile(
    optimizer=optimizers.Adam(learning_rate=2e-4),
    loss=loss,
    metrics=metrics,
    experimental_run_tf_function=False)

In [14]:
client.load_data_in_memory()



In [16]:
def dice(y_true, y_pred, c=1, epsilon=1):
    """
    Method to calculate the Dice score coefficient for given class
    
    :params
    
      (np.ndarray) y_true : ground-truth label mask
      (np.ndarray) y_pred : predicted label mask (argmax of logits)
      (int)             c : class to calculate DSC on
      (int)       epsilon : smoothing term added to the denominator
    
    """
    assert y_true.ndim == y_pred.ndim

    # --- Binarize both masks for class c
    true = y_true == c
    pred = y_pred == c

    # --- DSC = 2 * |true & pred| / (|true| + |pred|)
    A = np.count_nonzero(true & pred) * 2
    B = np.count_nonzero(true) + np.count_nonzero(pred) + epsilon

    return A / B

In [17]:
def calculate_sens(pred, true, epsilon=1):
    """
    Method to calculate sensitivity from pred and true masks
    
    """
    true_positive = (pred == 1) & (true == 1)
    ground_truth = true == 1

    return (true_positive.sum() + epsilon) / (ground_truth.sum() + epsilon)
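
The two metrics above place the smoothing term `epsilon` differently, which matters for slices that contain no tumor. The toy example below re-implements both formulas under the hypothetical names `dice_score` and `sensitivity` so that it runs on its own, and evaluates them on a small mask and on an empty mask.

```python
import numpy as np

def dice_score(true, pred, epsilon=1):
    """Dice with epsilon added to the denominator only (as in dice above)."""
    true, pred = true == 1, pred == 1
    A = np.count_nonzero(true & pred) * 2
    B = np.count_nonzero(true) + np.count_nonzero(pred) + epsilon
    return A / B

def sensitivity(true, pred, epsilon=1):
    """Sensitivity with epsilon added to numerator and denominator."""
    tp = np.count_nonzero((true == 1) & (pred == 1))
    return (tp + epsilon) / (np.count_nonzero(true == 1) + epsilon)

true = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0]])
empty = np.zeros_like(true)

print(dice_score(true, pred))     # 2 * 1 / (2 + 1 + 1)
print(sensitivity(true, pred))    # (1 + 1) / (2 + 1)
print(dice_score(empty, empty))   # 0 / (0 + 0 + 1)
print(sensitivity(empty, empty))  # (0 + 1) / (0 + 1)
```

With these definitions, a slice with no tumor and no predicted tumor scores a Dice of 0 but a sensitivity of 1, so empty slices pull the mean Dice down while leaving sensitivity unaffected.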


In [18]:
from tensorflow.keras import callbacks  
tensorboard_callback = callbacks.TensorBoard('./logs')

### Train the model

Use the following cell block to train your model.

In [19]:
gen_train_custom=CustomGenerator(gen_train)
gen_valid_custom=CustomGenerator(gen_valid)

model.fit(
    x=gen_train_custom, 
    steps_per_epoch=50, 
    epochs=20,
    validation_data=gen_valid_custom,
    validation_steps=50,
    validation_freq=4,
    use_multiprocessing=True,
    callbacks=[tensorboard_callback])

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7fe1815aef90>

# Evaluation

Based on the tutorial discussion, use the following cells to calculate model performance. The following metrics should be calculated:

* pixel-wise sensitivity (mean, median, 25th percentile, 75th percentile)
* Dice score coefficient (mean, median, 25th percentile, 75th percentile)

### Performance

The following minimum performance metrics must be met for full credit:

* median pixel-wise sensitivity: >0.65
* median Dice score coefficient: >0.65
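
The required summary statistics and the threshold check can be sketched with NumPy as follows. The helper names `summarize` and `meets_threshold` and the toy scores are hypothetical; the threshold of 0.65 is taken from the requirement above.

```python
import numpy as np

def summarize(scores):
    """Return mean, median, 25th and 75th percentile of per-slice scores."""
    scores = np.asarray(scores, dtype='float64')
    return {
        'mean': float(scores.mean()),
        'median': float(np.median(scores)),
        'p25': float(np.percentile(scores, 25)),
        'p75': float(np.percentile(scores, 75))}

def meets_threshold(scores, threshold=0.65):
    """Check whether the median score exceeds the required threshold."""
    return summarize(scores)['median'] > threshold

# --- Toy per-slice scores (not real results)
toy = [0.2, 0.6, 0.7, 0.8, 0.9]
summary = summarize(toy)

print(summary)
print(meets_threshold(toy))
```

In this toy example the mean falls below 0.65 while the median exceeds it, which illustrates why the requirement is stated on the median: a few poorly segmented slices affect the mean much more than the median.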

In [23]:
# --- Create validation generator
test_train, test_valid = client.create_generators(test=True, expand=True)
test_valid = CustomGenerator(test_valid)

dsc = []
sens = []

for x, y in test_valid:

    # --- Create prediction
    logits = model.predict(x)
    pred = np.argmax(logits[0], axis=-1)

    # --- Clean up pred using mask (zero out voxels where msk == 0)
    pred[x['msk'][0, ..., 0] == 0] = 0

    # --- Calculate Dice
    dsc.append(dice(y['tumor'][0, ..., 0], pred, c=1))

    # --- Calculate sens
    sens.append(calculate_sens(pred=pred, true=y['tumor'][0, ..., 0]))

dsc = np.array(dsc)
sens = np.array(sens)



### Results

When ready, create a `*.csv` file with your compiled **validation** cohort sensitivity and Dice score statistics. There is no need to submit training performance accuracy.

In [37]:
df = pd.DataFrame(index=np.arange(dsc.size))
df['dice'] = dsc
df['sens'] = sens

stats=pd.DataFrame(columns=['Median', 'Mean','25th percentile', '75th Percentile'])
stats.loc['Dice']=df['dice'].median(), df['dice'].mean(), df['dice'].quantile(q=0.25), df['dice'].quantile(q=0.75)
stats.loc['Sensitivity']=df['sens'].median(), df['sens'].mean(), df['sens'].quantile(0.25), df['sens'].quantile(0.75)
stats


| | Median | Mean | 25th percentile | 75th Percentile |
|---|---|---|---|---|
| Dice | 0.999879 | 0.918608 | 0.99973 | 0.999946 |
| Sensitivity | 1.0 | 0.999872 | 1.0 | 1.0 |


In [43]:
print(df['dice'].describe())
print(df['sens'].describe())

df.to_csv('./results.csv')
stats.to_csv('./stats.csv')

count    74.000000
mean      0.918608
std       0.274731
min       0.000000
25%       0.999730
50%       0.999879
75%       0.999946
max       0.999979
Name: dice, dtype: float64
count    74.000000
mean      0.999872
std       0.000980
min       0.991576
25%       1.000000
50%       1.000000
75%       1.000000
max       1.000000
Name: sens, dtype: float64


# Submission

Use the following line to save your model for submission (in Google Colab this should save your model file into your personal Google Drive):

In [46]:
# --- Serialize a model
model.save('./class_imbalance.hdf5')

In [47]:
from google.colab import drive
drive.mount('/content/drive')


### Canvas

Once you have completed this assignment, download the necessary files from Google Colab and your Google Drive. You will then need to submit the following items:

* final (completed) notebook: `[UCInetID]_assignment.ipynb`
* final (results) spreadsheet: `[UCInetID]_results.csv`
* final (trained) model: `[UCInetID]_model.hdf5`

**Important**: please submit all your files prefixed with your UCInetID as listed above. Your UCInetID is the part of your UCI email address that comes before `@uci.edu`. For example, Peter Anteater has an email address of panteater@uci.edu, so his notebook file would be submitted under the name `panteater_assignment.ipynb`, his spreadsheet would be submitted under the name `panteater_results.csv` and his model file would be submitted under the name `panteater_model.hdf5`.
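
If it helps, the renaming can be scripted. The sketch below builds the three file names from the list above and copies local files to them. The helpers `submission_names` and `copy_for_submission` are hypothetical, and the source paths shown in the usage note are the ones written earlier in this notebook.

```python
import shutil
from pathlib import Path

def submission_names(ucinetid):
    """Build the three required submission file names for a UCInetID."""
    return {
        'notebook': f'{ucinetid}_assignment.ipynb',
        'results': f'{ucinetid}_results.csv',
        'model': f'{ucinetid}_model.hdf5'}

def copy_for_submission(sources, ucinetid, out_dir='.'):
    """Copy each existing source file to its prefixed submission name."""
    names = submission_names(ucinetid)
    copied = {}
    for key, src in sources.items():
        dst = Path(out_dir) / names[key]
        shutil.copyfile(src, dst)
        copied[key] = dst
    return copied

names = submission_names('panteater')
print(names)
```

For example, `copy_for_submission({'results': './results.csv', 'model': './class_imbalance.hdf5'}, 'panteater')` would create `panteater_results.csv` and `panteater_model.hdf5` in the current directory, provided both source files exist.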