# Detecting brain tumors on MRI images using VGG19 model and tuning its hyperparameters to improve performance

In [1]:
import os
import matplotlib.pyplot as plt

import tensorflow as tf
import tensorflow.keras.layers as tfl

from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.applications.vgg19 import VGG19
from tensorflow.keras.layers import Flatten, Dense, Input, Dropout, Rescaling
from tensorflow.keras.models import Model

In [2]:
notebook_path = os.path.dirname(os.path.abspath('brain_MRI_classification.ipynb'))
datasets_combined = os.path.join(notebook_path, 'brainMRI_data')

train_directory = os.path.join(datasets_combined, 'Training')
test_directory = os.path.join(datasets_combined, 'Testing')

In [3]:
BATCH_SIZE = 64
IMG_SIZE = (224, 224)

train_dataset = image_dataset_from_directory(train_directory,
                                             batch_size = BATCH_SIZE,
                                             image_size = IMG_SIZE,
                                             shuffle = True,
                                             validation_split = 0.2,
                                             subset = 'training',
                                             seed = 42,
                                             label_mode='categorical')

validation_dataset = image_dataset_from_directory(train_directory,
                                                  batch_size = BATCH_SIZE,
                                                  image_size = IMG_SIZE,
                                                  shuffle = True,
                                                  validation_split = 0.2,
                                                  subset = 'validation',
                                                  seed = 42,
                                                  label_mode='categorical')

Found 2870 files belonging to 4 classes.
Using 2296 files for training.
Found 2870 files belonging to 4 classes.
Using 574 files for validation.


In [4]:
test_dataset = image_dataset_from_directory(test_directory,
                                            shuffle = False,
                                            image_size = IMG_SIZE,
                                            label_mode='categorical')

Found 394 files belonging to 4 classes.


In [5]:
AUTOTUNE = tf.data.AUTOTUNE

train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
validation_dataset = validation_dataset.prefetch(buffer_size=AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)

In [6]:
# (224, 224) -> (224, 224, 3) for the 3 color channels
IMG_SHAPE = IMG_SIZE + (3,)

# load network without the top classification layers
base_model = VGG19(include_top = False,
                   weights = 'imagenet',
                   input_shape=IMG_SHAPE)

In [7]:
data_augmentation = tf.keras.Sequential([
  tf.keras.layers.RandomFlip('horizontal'),
  tf.keras.layers.RandomRotation(0.1),
])

In [8]:
# only augment training set
train_augmented = train_dataset.map(lambda x, y: (data_augmentation(x, training = False), y))

In [9]:
preprocess_input = tf.keras.applications.vgg19.preprocess_input

In [10]:
base_model.trainable = False

inputs = tf.keras.Input(shape=IMG_SHAPE)
x = preprocess_input(inputs)
x = base_model(x, training=False)

# Flatten the output layer to 1D
x = Flatten()(x)

# Add a fully connected layer with 4096 hidden units, ReLU activation
x = Dense(4096, activation = 'relu')(x)

# Add a dropout layer with 0.2 (20%) rate
x = Dropout(0.5)(x) 

# Add another FC layer, 4096 units, ReLU activation
x = Dense(4096, activation = 'relu')(x)

# Add another dropout layer with 0.2 (20%) rate
x = Dropout(0.5)(x) 

# Add a final FC layer for classification with 4 units using softmax activation function
outputs = Dense(4, activation = 'softmax')(x)

In [11]:
# Configure and compile the model
model = Model(inputs, outputs)
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.0001),
              metrics=['accuracy'])

In [12]:
results = model.fit(train_augmented, epochs = 40, validation_data = validation_dataset, verbose = 1)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


In [13]:
loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)

Test accuracy : 0.7360405921936035


<pre>
While training the model, the accuracy came out to 99.13% for the training set and 92.33% for the validation set, after epoch 40/40. However, when evaluating the mdoel against the testing set, accuracy came to about 73.6%. 

We took different approaches to tune the model in order to improve its performance. It is important to note that the process involved using the strategy of "orthogonalization" in the context of tuning hyperparameters, though not all the models and results are shown. We optimized the model in the following ways (not necessarily in the order given):

We preprocessed the data the same way the pretrained model processed the ImageNet data to get the learned weights. Based on tensorflow documentation, the images are converted from RGB to BGR and each color channel is zero-centered based on the ImageNet data. This normalizes the data, which helps the learning algorithm converge faster. We lowered the learning rate from 0.001 (default) to 0.0001 while using the Adam optimization algorithm, which allowed us to take advantage of the techniques behind momentum and RMSprop. We lowered this parameter when the training accuracy was relatively low in attempts to increase it. Additionally, the training time was increased from 20 epochs to 40 epochs (ultimate choice was influenced by research papers), which was also done in attempt to increase the learning accuracy for the training set. With a combination of these, the training accuracy increased significantly based on previous drafts of the model, but some methods compromised the validation accuracy slightly. 

Additionally, we performed data augmentation as a method of regularization in order to avoid overfitting the data and decrease variance. Implementing the Dropout layers with a 50% rate was also a method of decreasing variance. We lowered the learning rate from 0.001 (default) to 0.0001 while using the Adam optimization algorithm, which allowed us to take advantage of the techniques behind momentum and RMSprop. We increased the batch size from 32 to 64, which helped the model converge more quickly, but may have compromised the long term learning accuracy, potentionally contibuting to the ongoing fluctuations in accuracy/losses. However it's important to note that a combination of some of the methods mentioned above decreased the extent of this fluctuation/oscillation in the latter epochs.

There are many approaches we can take to further improve the model's performance. We can see how the model performs using batch size of 32 with the rest of the parameters fixed, since this change was made early on. This would increase the training time, however, and given how long it currently takes to run, that's something I would consider after trying out other approaches. The relatively high accuracy for the training and validation set compared to that of the testing set indicates that the model may have overfit the data and is unable to generalize when given new data. Thus, a promising approach would be to further consider implementing other methods of regularization.

We will attempt to train a Transformer network (Swin) to compare its performance on the testing set with what we see here.

<pre>