# Assignment 1 - MRI Contrast Classifier
### Course: Convolutional Neural Networks with Applications in Medical Image Analysis


Welcome to the first course assignment! We have collected a dataset based on the popular BraTS challenge (http://braintumorsegmentation.org/). It contains MRI slices of the brain in different contrasts (sometimes referred to as modalities): T1-weighted (T1w), T1-weighted with contrast agent (T1w-CE), T2-weighted (T2w), and FLAIR, along with a manually segmented binary map of the tumor, if one is visible on the slice.

The assignments build on each other, and all three use the same dataset and the same data generator, so take your time to familiarize yourself with these.

In the first assignment you are tasked with training a convolutional neural network to classify the acquired MR data into their contrasts (T1w, T1w-CE, T2w, FLAIR).

The code below is a working but poor implementation that classifies between the T1w and T2w contrasts. Your exercise is to expand and improve the code so that the final model handles all four contrasts and achieves an accuracy of $95\%$.

The most important aspect of the assignment is that all your choices in the final code are explained and supported in writing. Show your thought process, even if you have managed to improve the accuracy by trial and error. Make sure that the report includes:
- How you reached the required performance.
- A plot of the confusion matrix of the validation data, using the final model.
- A description of the thought process behind building your model and choosing the model hyper-parameters.
- A description of what you think are the biggest issues with the current setup, and how to solve them.
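
For the confusion-matrix item in the list above, a minimal sketch of how the counts could be computed with NumPy. The helper `confusion_counts` and the label arrays `y_true` and `y_pred` are hypothetical and made up for illustration; in the notebook they would be collected from the validation generator and the final model.

```python
import numpy as np

CLASS_NAMES = ["T1w", "T1w-CE", "T2w", "FLAIR"]

def confusion_counts(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Hypothetical class indices, for illustration only (not model output).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]

cm = confusion_counts(y_true, y_pred, n_classes=len(CLASS_NAMES))
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(f"accuracy: {accuracy:.3f}")
```

The resulting array can be shown with `plt.imshow(cm)`; scikit-learn's `confusion_matrix`, which is part of the course environment, could replace the helper.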

Upload the updated notebook to Canvas before February $16^{th}$, 15:00.

Good luck and have fun!

## Environment setup
- Create a conda environment:
conda create --name 3ra023vt23 python=3.8.12
- You now have an environment named “3ra023vt23”. Activate it by:
conda activate 3ra023vt23

- Install CUDA and cuDNN:
conda install cudatoolkit=10.1.243 cudnn=7.6.5
- Install Tensorflow **with GPU** support:
conda install tensorflow-gpu=2.2.0
- Or, install Tensorflow **without GPU** support:
conda install tensorflow=2.2.0
- Install the other packages we need:
conda install jupyter=1.0.0
conda install matplotlib=3.5.0
conda install scikit-learn=1.0.2
conda install scikit-image=0.18.3
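
After the installation, a short script can compare the installed versions with the pinned ones. This is only a sketch: the helper `installed_version` is hypothetical, and it assumes the packages register standard Python distribution metadata, which conda packages do not always do, so a result of `None` is not proof that a package is missing.

```python
from importlib import metadata

# Pinned versions copied from the setup instructions above.
EXPECTED = {
    "tensorflow": "2.2.0",
    "jupyter": "1.0.0",
    "matplotlib": "3.5.0",
    "scikit-learn": "1.0.2",
    "scikit-image": "0.18.3",
}

def installed_version(package):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name, wanted in EXPECTED.items():
    found = installed_version(name)
    status = "ok" if found == wanted else "check"
    print(f"{name:14s} expected {wanted:8s} found {found} [{status}]")
```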

In [4]:
import os
import numpy as np
np.random.seed(2023)  # Set seed for reproducibility

In [1]:
import tensorflow as tf
# tf.random.set_seed(2023)

2023-02-11 09:50:05.092357: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [2]:
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
import tensorflow.keras as keras

import matplotlib.pyplot as plt

gpus = tf.config.experimental.list_physical_devices('GPU')
if len(gpus) > 0:
    tf.config.experimental.set_memory_growth(gpus[0], True)
    print(f"GPU(s) available (using '{gpus[0].name}'). Training will be lightning fast!")
else:
    print("No GPU(s) available. Training will be suuuuper slow!")

# NOTE: These are the packages you will need for the assignment.
# NOTE: You are encouraged to use the course virtual environment, which already has GPU support.

No GPU(s) available. Training will be suuuuper slow!


##### The cell below will define the data generator for the data you will be using. You should not change anything in the below code!

In [5]:
class DataGenerator(keras.utils.Sequence):
    def __init__(self,
                 data_path,
                 arrays,
                 batch_size=32,
                 ):

        self.data_path = data_path
        self.arrays = arrays
        self.batch_size = batch_size

        if data_path is None:
            raise ValueError('The data path is not defined.')

        if not os.path.isdir(data_path):
            raise ValueError('The data path is incorrectly defined.')

        self.file_idx = 0
        self.file_list = [self.data_path + '/' + s for s in
                          os.listdir(self.data_path)]
        
        self.on_epoch_end()
        with np.load(self.file_list[0]) as npzfile:
            self.in_dims = []
            self.n_channels = 1
            for i in range(len(self.arrays)):
                im = npzfile[self.arrays[i]]
                self.in_dims.append((self.batch_size,
                                    *np.shape(im),
                                    self.n_channels))

    def __len__(self):
        """Get the number of batches per epoch."""
        return int(np.floor((len(self.file_list)) / self.batch_size))

    def __getitem__(self, index):
        """Generate one batch of data."""
        # Generate indexes of the batch
        indexes = self.indexes[index * self.batch_size:(index + 1) *
                               self.batch_size]

        # Find list of IDs
        list_IDs_temp = [self.file_list[k] for k in indexes]

        # Generate data
        a = self.__data_generation(list_IDs_temp)
        return a

    def on_epoch_end(self):
        """Update indexes after each epoch."""
        self.indexes = np.arange(len(self.file_list))
        np.random.shuffle(self.indexes)
    
    #@threadsafe_generator
    def __data_generation(self, temp_list):
        """Generate data containing batch_size samples."""
        # X : (n_samples, *dim, n_channels)
        # Initialization
        arrays = []

        for i in range(len(self.arrays)):
            arrays.append(np.empty(self.in_dims[i]).astype(np.single))

        for i, ID in enumerate(temp_list):
            with np.load(ID) as npzfile:
                for idx in range(len(self.arrays)):
                    x = npzfile[self.arrays[idx]] \
                        .astype(np.single)
                    x = np.expand_dims(x, axis=2)
                    arrays[idx][i, ] = x

        return arrays

# NOTE: Don't change the data generator!

In [6]:
gen_dir = "/import/software/3ra023/vt23/brats/data/"  # Change if you have copied the data locally on your machine
array_labels = ['t1', 't2']  # Available arrays are: 't1', 't1ce', 't2', 'flair', 'mask'.
batch_size = 4

gen_train = DataGenerator(data_path=gen_dir + 'training',
                          arrays=array_labels,
                          batch_size=batch_size)

gen_val = DataGenerator(data_path=gen_dir + 'validating',
                        arrays=array_labels,
                        batch_size=batch_size)

gen_test = DataGenerator(data_path=gen_dir + 'testing',
                         arrays=array_labels,
                         batch_size=batch_size)

# NOTE: What arrays are you using? Their order will be the same as their unpacking order during training!
# NOTE: What batch size are you using? Should you use more? Or less?
# NOTE: Are you using the correct generators for the correct task? Training for training and validating for validating?

ValueError: The data path is incorrectly defined.

### Let's plot some example images from the dataset:

In [None]:
imgs = gen_train[0]
for inp in range(np.shape(imgs)[0]):
    plt.figure(figsize=(12,5))
    for i in range(4):
        plt.subplot(1, 4, i + 1)
        plt.imshow(imgs[inp][i, :, :, 0], cmap='gray')
        plt.title('Image size: ' + str(np.shape(imgs[inp][i, :, :, 0])))
        plt.tight_layout()
    plt.suptitle('Array: ' + gen_train.arrays[inp])
    plt.show()

### The dataset preprocessing so far has been there to help you; you should not change anything above. However, from now on, take nothing for granted.

A quick summary of the data sizes:

In [None]:
# A quick summary of the data:
print(f"Number of training images: {len(gen_train.file_list)}")
print(f"Training batch shapes    : {gen_train.in_dims}")

In [None]:
from tensorflow.keras import backend as K
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense, Conv2D, Conv2DTranspose
from tensorflow.keras.layers import Flatten, Input
from tensorflow.keras.layers import MaxPooling2D, AveragePooling2D, UpSampling2D
from tensorflow.keras.layers import Activation, Concatenate
from tensorflow.keras.layers import Dropout, BatchNormalization
from tensorflow.keras.optimizers import Adam, RMSprop, Nadam

# NOTE: Take inspiration from the imported layers and components; you are not required to use all of them.

In [3]:
# Define global variables:
H, W, IN_CHANS = gen_train.in_dims[0][1:]

In [4]:
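class ConvBlock:
    """Conv2D wrapper; call an instance on a tensor to apply the layer."""
    def __init__(self,
                 out_channels: int,
                 kernel_size: int = 3,  # assumed default; the draft did not set one
                 stride: tuple = (1, 1),
                 activation: str = None,
                 padding: str = "same",
                 kernel: str = "he_normal"):
        self.conv = Conv2D(out_channels, kernel_size, strides=stride,
                           activation=activation, padding=padding,
                           kernel_initializer=kernel)

    def __call__(self, inp):
        # Apply the wrapped convolution to the incoming tensor.
        return self.conv(inp)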


In [None]:
# NOTE: This is a very basic network, that you will need to improve.

def build_model(height, width, channels):
    inp = Input(shape=(height, width, channels), name='input_1')
    
    conv1 = Conv2D(2, 3, activation=None, padding='same', kernel_initializer='he_normal')(inp)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    
    flat = Flatten()(pool1)
    output_1 = Dense(2, activation='softmax')(flat)

    return Model(inputs=[inp], outputs=[output_1])

# NOTE: A better-designed network will improve performance. Look at the imported layers in the cell above for inspiration.

In [None]:
height, width, channels = gen_train.in_dims[0][1:]
model = build_model(height=height, width=width, channels=channels)
model.summary()

# NOTE: Are the input sizes correct?
# NOTE: Are the output sizes correct?
# NOTE: Try to imagine the model layer-by-layer and think it through. Is it doing something reasonable?
# NOTE: Are the model parameters split "evenly" between the layers? Or is there one huge layer?
# NOTE: Will the model fit into memory? Is the model too small? Is the model too large?
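
The question of how the parameters are split between layers can be checked by hand before reading `model.summary()`. The sketch below counts the parameters of the baseline `build_model` layers using the standard formulas. The 256 x 256 single-channel input size is an assumption made for illustration; the real size comes from `gen_train.in_dims`. The helpers `conv2d_params` and `dense_params` are hypothetical names.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    """Weights plus one bias per unit for a Dense layer."""
    return in_features * units + units

# ASSUMPTION: 256 x 256 single-channel slices; check gen_train.in_dims for the real size.
height, width, channels = 256, 256, 1

conv = conv2d_params(kernel_size=3, in_channels=channels, filters=2)
# 'same' padding keeps height and width; the 2 x 2 max pooling halves both.
flat_features = (height // 2) * (width // 2) * 2
dense = dense_params(flat_features, units=2)

print(f"Conv2D: {conv} parameters")
print(f"Dense : {dense} parameters")
print(f"Dense share of all parameters: {dense / (conv + dense):.4f}")
```

Under this assumed input size, nearly all parameters sit in the final `Dense` layer, which is the kind of imbalance the notes above ask about.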

In [None]:
custom_lr = 0.00001
custom_optimizer = RMSprop(learning_rate=custom_lr)  # 'lr' is a deprecated alias
custom_loss = "mse"
custom_metric = "accuracy"

model.compile(loss=custom_loss,
              optimizer=custom_optimizer,
              metrics=[custom_metric])

# NOTE: Are you satisfied with the loss function?
# NOTE: Are you satisfied with the metric?
# NOTE: Are you satisfied with the optimizer? Look at the cell where the optimizers are imported for inspiration.
# NOTE: Are you satisfied with the optimizer's parameters?
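
The note about the loss function can be explored numerically. The sketch below compares mean squared error with categorical cross-entropy for a single two-class prediction, using plain Python rather than Keras. The probabilities are made-up values, and both functions are hypothetical helpers that follow the standard definitions, with the squared error averaged over the classes.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error, averaged over the classes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_crossentropy(y_true, y_pred):
    """Negative log-probability assigned to the true class (one-hot y_true)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [1.0, 0.0]  # one-hot label: the first class is correct
for p in (0.9, 0.6, 0.1):  # made-up probabilities for the correct class
    y_pred = [p, 1.0 - p]
    print(f"p(correct)={p:.1f}  "
          f"mse={mse(y_true, y_pred):.3f}  "
          f"cross-entropy={categorical_crossentropy(y_true, y_pred):.3f}")
```

In this two-class setting the squared error never exceeds 1, even for a confidently wrong prediction, while the cross-entropy keeps growing as the probability of the correct class approaches zero. This is one reason cross-entropy is a common choice for softmax outputs.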

In [None]:
n_epochs = 50
n_classes = 2
t1_label = tf.one_hot(np.repeat(0, batch_size), n_classes)
t2_label = tf.one_hot(np.repeat(1, batch_size), n_classes)

for epoch in range(n_epochs):
    training_loss = []
    validating_loss = []
    
    for idx, (t1, t2) in enumerate(gen_train):
        images = np.concatenate((t1, t2), axis=0)
        labels = np.concatenate((t1_label, t2_label), axis=0)
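        # train_on_batch returns [loss, accuracy]; keep only the loss.
        loss, _ = model.train_on_batch(images, labels)
        training_loss.append(loss)
    
    for idx, (t1, t2) in enumerate(gen_val):
        images = np.concatenate((t1, t2), axis=0)
        labels = np.concatenate((t1_label, t2_label), axis=0)
        # Index 0 is the loss; index -1 would be the accuracy metric.
        validating_loss.append(model.test_on_batch(images, labels)[0])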
        pred = model.predict_on_batch(images)
    
    print(f"Epoch: {epoch + 1:2d}, training loss: {np.mean(training_loss):.3f}, validation loss: {np.mean(validating_loss):.3f}")
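
The loop above hard-codes two label tensors. One possible way to generalise the batch assembly to any number of contrasts is sketched below. The helper `make_batch` is hypothetical, and the data is random stand-in input with made-up sizes; it assumes the generator yields one array per contrast, in the order given by its `arrays` argument.

```python
import numpy as np

def make_batch(arrays):
    """Stack per-contrast batches and build matching one-hot labels.

    `arrays` holds one (batch, H, W, 1) array per contrast, in the same
    order as the generator's `arrays` argument.
    """
    n_classes = len(arrays)
    images = np.concatenate(arrays, axis=0)
    # Class index i is repeated once for every image of contrast i.
    class_ids = np.concatenate([np.full(len(a), i) for i, a in enumerate(arrays)])
    labels = np.eye(n_classes, dtype=np.float32)[class_ids]
    return images, labels

# Random stand-in data: 4 contrasts, batch of 2, 8 x 8 images (hypothetical sizes).
rng = np.random.default_rng(0)
fake_batch = [rng.random((2, 8, 8, 1), dtype=np.float32) for _ in range(4)]

images, labels = make_batch(fake_batch)
print(images.shape, labels.shape)
```

Because the images are concatenated contrast by contrast, every batch is ordered by class; whether that ordering matters for training is worth considering.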