# Brain Tumor Classifier Using Convolutional Neural Network (CNN)

## Business Understanding
According to the [National Brain Tumor Society](https://braintumor.org/brain-tumors/about-brain-tumors/brain-tumor-facts/), approximately 700,000 individuals in the United States are living with a primary brain tumor. In 2023, over 94,000 people are expected to be diagnosed with a brain tumor, and more than 18,000 will succumb to the disease. Even "benign" tumors can significantly impact a patient's quality of life, while malignant tumors like gliomas can often be fatal.

Accurate tumor classification is crucial for effective treatment. However, the [National Cancer Institute](https://www.cancer.gov/rare-brain-spine-tumor/blog/2020/brain-tumors-diagnosed-treated) reports that as of 2020, 5-10% of brain tumor diagnoses are incorrect. This may be due to the vast number of brain tumor types, as [classified by the World Health Organization](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8328013/). A well-trained deep learning model on MRI images from various brain tumors could be a valuable tool for clinicians.

Current deep learning models typically focus on three tumor types (pituitary, meningioma, and glioma) due to limited MRI image availability. Some models also include a "healthy" or "no tumor" category. While there is some publicly available data for other tumor types, the sample sizes are often too small for effective deep learning. This project aims to enhance existing models by adding an "other tumor" class, encompassing images from patients with rarer tumor types.

An accurate tumor classification model trained on a broader dataset could be highly beneficial to stakeholders such as American College of Radiology-accredited facilities, which encounter patients with a wide variety of tumor types.

In [21]:
# Imports

# For Data Processing And Evaluation
import numpy as np
import pandas as pd
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

# For ML Models
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import Accuracy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing.image import load_img, ImageDataGenerator
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import layers
from tensorflow.keras.models import Model

# For Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Miscellaneous
import os
import shutil
import random

# Suppress future, deprecation, and SettingWithCopy warnings
import warnings
warnings.filterwarnings("ignore", category= FutureWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)
pd.options.mode.chained_assignment = None

# make all columns in a df viewable and wider
pd.options.display.max_columns = None
pd.options.display.width = None
pd.set_option('max_colwidth', 400)

## Data Understanding and Preparation
I used two datasets:
- 7,023 total T1C-enhanced MRI images of pituitary, meningioma, and glioma, as well as images of brains with no tumors. These images are from a [dataset available on Kaggle from Masoud Nickparvar.](https://www.kaggle.com/datasets/masoudnickparvar/brain-tumor-mri-dataset)
- A subset of 1,177 "other" tumor MRI images from a [dataset available on Kaggle from Fernando Feltrin.](https://www.kaggle.com/datasets/fernando2rad/brain-tumor-mri-images-44c)

The use of this second dataset is what allowed me to construct the "other" category, which differentiates this model from many other proposed approaches.

In [22]:
# specify local directories for training and testing data
train_dir = 'data/Training/'
test_dir = 'data/Testing/'

# Create a new directory named "other" in the train directory
new_train_dir = os.path.join(train_dir, 'other')
if not os.path.exists(new_train_dir):
    os.makedirs(new_train_dir)

# Create a new directory named "other" in the test directory
new_test_dir = os.path.join(test_dir, 'other')
if not os.path.exists(new_test_dir):
    os.makedirs(new_test_dir)

In [23]:
# specify folders with images to move from second dataset
supplement_labels = ['Astrocitoma T1C+', 'Carcinoma T1C+', 'Ependimoma T1C+', 'Ganglioglioma T1C+',
                'Germinoma T1C+', 'Granuloma T1C+', 'Meduloblastoma T1C+', 'Neurocitoma T1C+',
                'Oligodendroglioma T1C+', 'Papiloma T1C+', 'Schwannoma T1C+', 'Tuberculoma T1C+']

# specify path with images from second dataset
supplement_path = 'data/supplement/'

# move 80% of images to the new "other" directory in train_dir and 20% to the new "other" directory in test_dir
for label in supplement_labels:
    folder_path = os.path.join(supplement_path, label)
    files = os.listdir(folder_path)
    random.shuffle(files)
    num_files = len(files)
    num_train_files = int(num_files * 0.8)
    train_files = files[:num_train_files]
    test_files = files[num_train_files:]
    
    for filename in train_files:
        if filename.endswith((".jpg", ".jpeg")):
            src_path = os.path.join(folder_path, filename)
            if os.path.isfile(src_path):
                dst_path = os.path.join(new_train_dir, filename)
                shutil.move(src_path, dst_path)

    for filename in test_files:
        if filename.endswith((".jpg", ".jpeg")):
            src_path = os.path.join(folder_path, filename)
            if os.path.isfile(src_path):
                dst_path = os.path.join(new_test_dir, filename)
                shutil.move(src_path, dst_path)

Now that the images are all in the right place, we can create our training and test sets. We'll make corresponding lists of filepaths and labels, and store the information in a DataFrame to use in our ImageDataGenerators later.

In [24]:
# list of labels corresponding to folders
labels = ['pituitary', 'notumor', 'meningioma', 'glioma', 'other']

# initialize empty lists
X_train = []
y_train = []

# add filepaths and labels to train lists
for label in labels:
    for image in os.listdir(train_dir+label):
        X_train.append(train_dir+label+'/'+image)
        y_train.append(label)

In [25]:
# initialize empty lists
X_test = []
y_test = []

# add filepaths and labels to test lists
for label in labels:
    for image in os.listdir(test_dir+label):
        X_test.append(test_dir+label+'/'+image)
        y_test.append(label)

In [26]:
# shuffle lists
X_train, y_train = shuffle(X_train, y_train)

In [None]:
# create dataframe for later use
tumor_train_df = pd.concat([pd.Series(X_train, name = 'paths'), 
                            pd.Series(y_train, name = 'label')], 
                            axis = 1)
tumor_train_df

In [28]:
# shuffle lists
X_test, y_test = shuffle(X_test, y_test)

In [None]:
# create dataframe for later use
tumor_test_df = pd.concat([pd.Series(X_test, name = 'paths'), 
                            pd.Series(y_test, name = 'label')], 
                            axis = 1)
tumor_test_df

Now that we have training and test DataFrames, let's take a look at some of the characteristics of our data.

### Exploratory Data Analysis

Let's look at the distribution of our target variable - tumor types.

In [None]:
tumor_train_df['label'].value_counts(normalize = True)

In [None]:
# instantiate figure
fig, ax = plt.subplots()

# plot bar chart of tumor type counts
tumor_train_df['label'].value_counts().plot(kind = 'bar', ax = ax)

# set title and axis labels
ax.set_title('Distribution of tumor type in training data')
ax.set_ylabel('Count')
ax.set_xlabel('Tumor Type');

Our training classes are slightly imbalanced, but not terribly so.

In [None]:
tumor_test_df['label'].value_counts(normalize = True)

In [None]:
# instantiate figure
fig, ax = plt.subplots()

# plot tumor types
tumor_test_df['label'].value_counts().plot(kind = 'bar')

# set title and axis labels
ax.set_title('Distribution of tumor type in test data')
ax.set_ylabel('Count')
ax.set_xlabel('Tumor Type');

The test data is the same - imbalanced, but not alarmingly so. Based on this distribution of classes, we would expect a naive model that simply predicts the majority class every time to achieve an accuracy of about 26%.
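
The naive-baseline figure above is simply the share of the most common class. Here is a minimal sketch of that calculation, using a small made-up label list rather than the project data; the helper name `majority_class_accuracy` is hypothetical and not part of the original notebook.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Accuracy of a model that always predicts the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# toy labels for illustration only (not the project data)
toy_labels = ['glioma', 'notumor', 'notumor', 'pituitary',
              'other', 'notumor', 'meningioma', 'glioma']
print(majority_class_accuracy(toy_labels))
```

On the notebook's data, the same number should come from `tumor_test_df['label'].value_counts(normalize = True).max()`.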

Let's take a look at some of these MRI images.

In [None]:
# with gratitude to MD Mushfirat Mohaimin for this code
# https://www.kaggle.com/code/mushfirat/brain-tumor-classification-accuracy-96

IMAGE_SIZE = (224, 224)

def open_images(paths):
    '''
    Given a list of paths to images, this function returns the images as arrays.
    '''
    images = []
    for path in paths:
        image = load_img(path, target_size=IMAGE_SIZE)
        image = np.array(image)
        images.append(image)
    return np.array(images)

# open 16 images
images = open_images(X_train[50:66])
labels = y_train[50:66]
fig = plt.figure(figsize=(12, 16))
for x in range(1, 17):
    fig.add_subplot(4, 4, x)
    plt.axis('off')
    plt.title(labels[x-1])
    plt.imshow(images[x-1])
plt.rcParams.update({'font.size': 12})
plt.show()

They appear to be taken from different angles, but are all grayscale. The images with tumors of any kind appear distinct from images with no tumors, which makes sense. Pituitary tumors seem localized in one spot (the pituitary gland, at the base of the brain), whereas gliomas, meningiomas, and other tumors are distributed elsewhere in the brain. Let's construct a simple baseline model as a starting point.

## Baseline Model

For this model, we will use a single dense layer with 32 nodes before our output layer. We will not perform any image augmentation for now - we just want to get the simplest possible baseline sense of the model's accuracy.

In [None]:
# rescale images
train_datagen_baseline = ImageDataGenerator(rescale=1./255)
test_datagen_baseline = ImageDataGenerator(rescale=1./255)
batch_size = 32

# generate image data from training df
train_generator_baseline = train_datagen_baseline.flow_from_dataframe(
        tumor_train_df,
        x_col = 'paths',
        y_col = 'label',
        target_size=IMAGE_SIZE,
        batch_size=batch_size,
        class_mode='categorical',
        color_mode='grayscale',
        seed = 1990,
        shuffle = False)

# generate image data from test df
test_generator_baseline = test_datagen_baseline.flow_from_dataframe(
        tumor_test_df,
        x_col = 'paths',
        y_col = 'label',
        target_size=IMAGE_SIZE,
        batch_size=batch_size,
        class_mode='categorical',
        color_mode='grayscale',
        seed = 1990,
        shuffle = False)

In [36]:
# specify hyperparameters for model
cce = keras.losses.CategoricalCrossentropy()
opt = keras.optimizers.Adam()
es = EarlyStopping(monitor='loss', patience=5)

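# define learning rate scheduler: hold the rate for 10 epochs, then decay it exponentially
def scheduler(epoch, lr):
    # lr is the optimizer's current learning rate, so the decay compounds each epoch
    if epoch < 10:
        return lr
    else:
        return float(lr * tf.math.exp(-0.1))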
    
sched = LearningRateScheduler(scheduler)

In [None]:
# instantiate model and add dense layer and output layer
model = Sequential()
model.add(layers.Flatten())
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(5, activation='softmax'))

# compile model
model.compile(optimizer=opt, loss=cce,  metrics=['accuracy'])

In [None]:
base_model = model.fit(train_generator_baseline,
                       verbose = 1,
                       validation_data = test_generator_baseline, 
                       epochs=50,
                       callbacks = [es])

In [None]:
model.summary()

In [None]:
# summarize history for accuracy
plt.plot(base_model.history['accuracy'])
plt.plot(base_model.history['val_accuracy'])
plt.title('Model Accuracy - Baseline')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(base_model.history['loss'])
plt.plot(base_model.history['val_loss'])
plt.title('Model Loss - Baseline')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

The model peaked at around 66% accuracy on test data, which is better than a naive model but not nearly good enough to be deployed in clinical practice. It is also badly overfit to the training data. We can likely improve on this accuracy by augmenting our training data, and adding layers to the model.

## Data Augmentation and Second Model

For this second model, we'll augment our training images by flipping them horizontally and vertically; adjusting the brightness of images to make them darker or brighter; and rotating them slightly to account for potential rotational differences in test images. We won't perform any augmentation on the test images.

In [None]:
# instantiate generators with augmentation for training data
train_datagen_aug = ImageDataGenerator(rescale=1./255,
                                       horizontal_flip=True,
                                       vertical_flip=True,
                                       brightness_range=[0.75, 1.25],
                                       rotation_range = 15)
test_datagen_aug = ImageDataGenerator(rescale=1./255)

# generate image data
train_generator_aug = train_datagen_aug.flow_from_dataframe(
        tumor_train_df,
        x_col = 'paths',
        y_col = 'label',
        target_size=IMAGE_SIZE,
        batch_size=32,
        class_mode='categorical',
        color_mode='grayscale',
        seed = 1990,
        shuffle = False)

# generate image data
test_generator_aug = test_datagen_aug.flow_from_dataframe(
        tumor_test_df,
        x_col = 'paths',
        y_col = 'label',
        target_size=IMAGE_SIZE,
        batch_size=32,
        class_mode='categorical',
        color_mode='grayscale',
        seed = 1990,
        shuffle = False)

For this model, we'll add in an additional dense layer, but more importantly we will add in two convolutional and pooling layers. Convolutional layers aid in feature extraction (such as edge detection) by using a defined filter to recognize patterns in the image. Max pooling layers are used to downsample the input image by taking only the maximum value from the defined window, which helps to consolidate the features learned by the convolutional layers. 
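
To make the two operations concrete, here is a small NumPy sketch of a single 3x3 filter followed by 2x2 max pooling on a toy image. It is an illustration under stated assumptions, not what Keras runs internally: the functions `conv2d_valid` and `max_pool_2x2` are hypothetical helpers, and the vertical-edge filter is hand-written, whereas a CNN learns its filter values during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2-D image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# toy 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# hand-written vertical-edge filter (a CNN would learn its filters instead)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

features = conv2d_valid(image, kernel)   # shape (4, 4), large where the edge is
pooled = max_pool_2x2(features)          # shape (2, 2), edge response preserved
print(features)
print(pooled)
```

The filter responds only where the window straddles the dark-to-bright boundary, and pooling halves each spatial dimension while keeping that response.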

In [25]:
model_conv = Sequential()

# define 3x3 filter window sizes and create 64 filters
model_conv.add(layers.Conv2D(64, (3, 3), activation='relu',
                        input_shape=(224, 224, 1)))
# max pool in 2x2 window
model_conv.add(layers.MaxPooling2D((2, 2)))

# define 3x3 filter window sizes and create 128 filters
model_conv.add(layers.Conv2D(128, (3, 3), activation='relu'))
# max pool in 2x2 window
model_conv.add(layers.MaxPooling2D((2, 2)))

model_conv.add(layers.Flatten())

# add dense layers and output layer
model_conv.add(layers.Dense(128, activation='relu'))
model_conv.add(layers.Dense(64, activation='relu'))
model_conv.add(layers.Dense(5, activation='softmax'))

# compile model (with a fresh optimizer so no state carries over from the baseline)
model_conv.compile(optimizer=keras.optimizers.Adam(), loss=cce,  metrics=['accuracy'])

In [None]:
model_conv.summary()

In [None]:
conv_model = model_conv.fit(train_generator_aug,
                       verbose = 1,
                       validation_data = test_generator_aug, 
                       epochs=50,
                       callbacks = [es])

In [None]:
# summarize history for accuracy
plt.plot(conv_model.history['accuracy'])
plt.plot(conv_model.history['val_accuracy'])
plt.title('Model Accuracy - Simple CNN')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(conv_model.history['loss'])
plt.plot(conv_model.history['val_loss'])
plt.title('Model Loss - Simple CNN')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

As we can see, the accuracy rose for this model from the baseline considerably - from 66% to nearly 91%. We also reduced the overfitting observed in the baseline model, though not completely. Our next model will focus on trying to eliminate overfitting altogether.

## Model 3: With Regularization

We'll add some regularization to this model in three forms: batch normalization, which makes the values in each layer more stable; dropout layers, which randomly set some of the outputs from the previous layer to zero at each training step to encourage the network to learn robust features rather than features of specific inputs; and kernel regularization applied to the convolutional layers, which functions similarly to the regularization penalties applied to other models. We'll also use a learning rate scheduler going forward, which automatically decays the learning rate starting at a specified epoch so that changes made to the model weights later in training are smaller and thus less likely to result in overfitting.
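
As a rough illustration of the learning rate decay described above, the sketch below simulates the schedule in plain Python, outside Keras. The function name `exponential_decay_schedule` is hypothetical; the start epoch of 10, the decay factor of `exp(-0.1)`, and the initial rate of 0.001 are assumed to mirror the values used in this notebook's scheduler.

```python
import math

def exponential_decay_schedule(epoch, lr, start_epoch=10, rate=0.1):
    """Hold lr constant until start_epoch, then shrink it by exp(-rate) each epoch."""
    if epoch < start_epoch:
        return lr
    return lr * math.exp(-rate)

# simulate how a callback would feed the previous epoch's rate back in
lr = 0.001
history = []
for epoch in range(15):
    lr = exponential_decay_schedule(epoch, lr)
    history.append(lr)

# rate at the last constant epoch, the first decayed epoch, and five decays in
print(history[9], history[10], history[14])
```

Because each call receives the rate returned for the previous epoch, the decay compounds: five epochs after the start the rate is the initial value multiplied by `exp(-0.5)`.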

In [29]:
# specify level of regularization
reg = l2(1e-2)

# instantiate model
model_conv_drop = Sequential()

# first conv layer
model_conv_drop.add(layers.Conv2D(64, (3, 3), activation='relu',
                        input_shape=(224, 224, 1), kernel_regularizer = reg))
model_conv_drop.add(BatchNormalization())
model_conv_drop.add(layers.MaxPooling2D((2, 2)))

# second conv layer
model_conv_drop.add(layers.Conv2D(128, (3, 3), activation='relu', kernel_regularizer = reg))
model_conv_drop.add(layers.MaxPooling2D((2, 2)))

model_conv_drop.add(layers.Flatten())

# dense layers and output layer
model_conv_drop.add(layers.Dense(128, activation='relu'))
model_conv_drop.add(Dropout(0.3))
model_conv_drop.add(layers.Dense(64, activation='relu'))
model_conv_drop.add(layers.Dense(32, activation='relu'))
model_conv_drop.add(layers.Dense(5, activation='softmax'))

# compile model (with a fresh optimizer so no state carries over from earlier models)
model_conv_drop.compile(optimizer=keras.optimizers.Adam(), loss=cce,  metrics=['accuracy'])

In [None]:
model_conv_drop.summary()

In [None]:
conv_drop_model = model_conv_drop.fit(train_generator_aug,
                       verbose = 1,
                       validation_data = test_generator_aug, 
                       epochs=50,
                       callbacks = [es, sched])

In [None]:
# summarize history for accuracy
plt.plot(conv_drop_model.history['accuracy'])
plt.plot(conv_drop_model.history['val_accuracy'])
plt.title('Model Accuracy - CNN W/ Normalization')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(conv_drop_model.history['loss'])
plt.plot(conv_drop_model.history['val_loss'])
plt.title('Model Loss - CNN W/ Normalization')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

The performance seems to have remained about the same from that of the prior model. Let's try a deeper model.

## Transfer Learning - VGG

Transfer learning is the use of a model developed for one task in a separate task. In this case, I will use the VGG-16 model, which is a 16-layer CNN originally developed for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) computer vision competition. There are two approaches to transfer learning - the model can be "frozen" with its existing weights and used purely for feature extraction before the dense and output layers are added on, or parts can be "unfrozen" and fine-tuned on new data. I will start with the first approach.

In [None]:
# instantiate generators with augmentation and preprocessing for training data
train_datagen_vgg = ImageDataGenerator(rescale=1./255,
                                       preprocessing_function=keras.applications.vgg16.preprocess_input,
                                       horizontal_flip=True,
                                       vertical_flip=True,
                                       brightness_range=[0.75, 1.25],
                                       rotation_range = 15)
test_datagen_vgg = ImageDataGenerator(rescale=1./255, 
                                      preprocessing_function=keras.applications.vgg16.preprocess_input)

# generate image data for training set
train_generator_vgg = train_datagen_vgg.flow_from_dataframe(
        tumor_train_df,
        x_col = 'paths',
        y_col = 'label',
        target_size=IMAGE_SIZE,
        batch_size=32,
        class_mode='categorical',
        seed = 1990,
        shuffle = False)

# generate image data for test set
test_generator_vgg = test_datagen_vgg.flow_from_dataframe(
        tumor_test_df,
        x_col = 'paths',
        y_col = 'label',
        target_size=IMAGE_SIZE,
        batch_size=32,
        class_mode='categorical',
        seed = 1990,
        shuffle = False)

### Model 4: Frozen VGG

In [34]:
# instantiate vgg model
vgg = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
output = vgg.layers[-1].output
output = layers.Flatten()(output)
vgg_model = Model(vgg.input, output)

# freeze layers
vgg_model.trainable = False
for layer in vgg_model.layers:
    layer.trainable = False
    
input_shape = vgg_model.output_shape[1]

In [None]:
# check model summary
vgg_model.summary()

As we can see, the model has over 14 million parameters - but none of them are trainable! Let's try fitting it to our data.

In [None]:
# instantiate model
vggmodel = Sequential()

# add vgg layers
vggmodel.add(vgg_model)
vggmodel.add(layers.Flatten())

# add dense layers and output layer
vggmodel.add(Dense(512, activation='relu', input_dim=input_shape))
vggmodel.add(Dropout(0.3))
vggmodel.add(Dense(512, activation='relu'))
vggmodel.add(Dropout(0.3))
vggmodel.add(Dense(5, activation='softmax'))

# compile and fit
vggmodel.compile(optimizer=keras.optimizers.Adam(), loss=cce,  metrics=['accuracy'])
vgg_model_feats = vggmodel.fit(train_generator_vgg,
                       verbose = 1,
                       validation_data = test_generator_vgg, 
                       epochs=50,
                       callbacks = [es, sched])

In [None]:
# summarize history for accuracy
plt.plot(vgg_model_feats.history['accuracy'])
plt.plot(vgg_model_feats.history['val_accuracy'])
plt.title('Model Accuracy - VGG Frozen')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(vgg_model_feats.history['loss'])
plt.plot(vgg_model_feats.history['val_loss'])
plt.title('Model Loss - VGG Frozen')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

This model performed the best of any so far, peaking at over 93% accuracy. Let's try fine-tuning it.

### Model 5: VGG With Fine-Tuning

In [38]:
# instantiate vgg model
vgg = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
output = vgg.layers[-1].output
output = layers.Flatten()(output)
vgg_model_finetune = Model(vgg.input, output)
input_shape = vgg_model_finetune.output_shape[1]

# set the last convolutional block (block5_conv1 onward) to trainable
vgg_model_finetune.trainable = True
set_trainable = False
for layer in vgg_model_finetune.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False

In [None]:
# examine model summary
vgg_model_finetune.summary()

Now we can see that about half of the parameters are trainable. Let's try fitting the model to our data.
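
The "about half" figure can be checked by hand. This sketch counts weights and biases per convolutional layer, assuming the standard VGG-16 configuration of thirteen 3x3 convolutions with no dense top; the helper `conv_params` is hypothetical and the layer list is written out manually rather than read from the Keras model.

```python
def conv_params(in_channels, out_channels, kernel_size=3):
    """Weights plus biases for one 2-D convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# (input channels, output channels) for VGG-16's 13 convolutional layers
vgg16_convs = [
    (3, 64), (64, 64),                      # block1
    (64, 128), (128, 128),                  # block2
    (128, 256), (256, 256), (256, 256),     # block3
    (256, 512), (512, 512), (512, 512),     # block4
    (512, 512), (512, 512), (512, 512),     # block5
]

total = sum(conv_params(i, o) for i, o in vgg16_convs)
# unfreezing from block5_conv1 onward makes the last three conv layers trainable
trainable = sum(conv_params(i, o) for i, o in vgg16_convs[-3:])
print(total, trainable, round(trainable / total, 3))
```

The three block5 layers account for roughly 48% of the convolutional parameters, which matches the trainable share reported by the summary.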

In [None]:
# instantiate model
vggmodel_finetune = Sequential()

# add VGG layers
vggmodel_finetune.add(vgg_model_finetune)

vggmodel_finetune.add(layers.Flatten())

# add dense layers and output layer
vggmodel_finetune.add(Dense(512, activation='relu', input_dim=input_shape))
vggmodel_finetune.add(Dropout(0.3))
vggmodel_finetune.add(Dense(512, activation='relu'))
vggmodel_finetune.add(Dropout(0.3))
vggmodel_finetune.add(Dense(5, activation='softmax'))

# compile and fit model
vggmodel_finetune.compile(optimizer=keras.optimizers.Adam(), loss=cce,  metrics=['accuracy'])
finetune_vgg = vggmodel_finetune.fit(train_generator_vgg,
                       verbose = 1,
                       validation_data = test_generator_vgg, 
                       epochs=50,
                       callbacks = [es, sched])

In [None]:
# summarize history for accuracy
plt.plot(finetune_vgg.history['accuracy'])
plt.plot(finetune_vgg.history['val_accuracy'])
plt.title('Model Accuracy - VGG Fine-Tuned')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(finetune_vgg.history['loss'])
plt.plot(finetune_vgg.history['val_loss'])
plt.title('Model Loss - VGG Fine-Tuned')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

Again, this model performed similarly to the simple CNN. Let's take a closer look at the performance of the simple CNN, regularized CNN, and both VGG-based models.

## Model Comparison and Selection

To examine the results of each model, we'll need to generate predictions for our test data and compare them to the true classes.

In [None]:
# generate predictions for the four top models
# (batch size comes from the generators, so it is not passed to predict)
y_pred_conv = model_conv.predict(test_generator_aug)
y_pred_conv_drop = model_conv_drop.predict(test_generator_aug)
y_pred_vgg = vggmodel.predict(test_generator_vgg)
y_pred_vgg_finetune = vggmodel_finetune.predict(test_generator_vgg)

In [43]:
# specify true and predicted classes for each model
y_pred_classes_conv = np.argmax(y_pred_conv, axis=1)
y_true_classes_conv = test_generator_aug.classes
y_pred_classes_conv_drop = np.argmax(y_pred_conv_drop, axis=1)
y_true_classes_conv_drop = test_generator_aug.classes
y_pred_classes_vgg = np.argmax(y_pred_vgg, axis=1)
y_true_classes_vgg = test_generator_vgg.classes
y_pred_classes_vgg_finetune = np.argmax(y_pred_vgg_finetune, axis=1)
y_true_classes_vgg_finetune = test_generator_vgg.classes

In [None]:
# confirm order of classes for each generator
print(test_generator_aug.class_indices.keys())
print(test_generator_vgg.class_indices.keys())

In [None]:
# instantiate figure
fig, axes = plt.subplots(figsize = (16, 16), ncols = 2, nrows = 2)
report_labels = ['glioma', 'meningioma', 'notumor', 'other', 'pituitary']

# plot confusion matrices for each model
ConfusionMatrixDisplay.from_predictions(y_true_classes_conv, y_pred_classes_conv, 
                                        display_labels = report_labels,
                                        xticks_rotation = 90,
                                        ax = axes[0, 0])
ConfusionMatrixDisplay.from_predictions(y_true_classes_conv_drop, y_pred_classes_conv_drop, 
                                        display_labels = report_labels,
                                        xticks_rotation = 90,
                                        ax = axes[0, 1])
ConfusionMatrixDisplay.from_predictions(y_true_classes_vgg, y_pred_classes_vgg, 
                                        display_labels = report_labels,
                                        xticks_rotation = 90,
                                        ax = axes[1, 0])
ConfusionMatrixDisplay.from_predictions(y_true_classes_vgg_finetune, y_pred_classes_vgg_finetune, 
                                        display_labels = report_labels,
                                        xticks_rotation = 90,
                                        ax = axes[1, 1])

axes[0,0].set_title('Simple CNN')
axes[0,1].set_title('CNN With Normalization')
axes[1,0].set_title('VGG-16 - Feature Extraction')
axes[1,1].set_title('VGG-16 - Fine-Tuned')

plt.tight_layout()
plt.show();

In [None]:
print('Simple CNN')
print(classification_report(test_generator_aug.classes, y_pred_classes_conv, target_names = report_labels))
print('--------------------------------------------------------')
print('CNN With Regularization')
print(classification_report(test_generator_aug.classes, y_pred_classes_conv_drop, target_names = report_labels))
print('--------------------------------------------------------')
print('VGG-16 - Feature Extraction')
print(classification_report(test_generator_vgg.classes, y_pred_classes_vgg, target_names = report_labels))
print('--------------------------------------------------------')
print('Fine-Tuned VGG-16')
print(classification_report(test_generator_vgg.classes, y_pred_classes_vgg_finetune, target_names = report_labels))

### Model Selection
Overall, each of the four models above performed relatively well in terms of both accuracy and overall recall. The VGG-16 model using all of the pre-trained weights had the highest accuracy, but we can dig deeper into where each model went wrong and select our final model.

The worst possible outcome for our model would be to predict that a person does not have a tumor when they actually do. We want our model to correctly identify the presence of a tumor as often as possible so the patient can follow up, receive a final diagnosis from their clinician, and start treatment appropriately.

The VGG-16 model using pre-trained weights has the highest recall for the "no tumor" class at 99%, which means it correctly classified 99% of the test images with no tumor present. It also has the highest precision for that class at 97%, meaning 97% of the images it labeled "no tumor" truly showed no tumor (so fewer of its "no tumor" predictions were actually images of a tumor than for any other model). **Consequently, we can select the VGG-16 feature extraction model as our final model because it has the highest overall accuracy and makes fewer mistakes on the "no tumor" class than any other model.**
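
For reference, per-class precision and recall can be read directly off a confusion matrix. The sketch below uses made-up counts for three classes, not this notebook's results, and the helper `precision_recall` is hypothetical. It assumes the scikit-learn layout, with true classes on the rows and predicted classes on the columns.

```python
import numpy as np

def precision_recall(conf_matrix, class_index):
    """Per-class precision and recall from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    true_pos = conf_matrix[class_index, class_index]
    predicted = conf_matrix[:, class_index].sum()   # everything predicted as this class
    actual = conf_matrix[class_index, :].sum()      # everything truly in this class
    return true_pos / predicted, true_pos / actual

# made-up counts for illustration only
toy_matrix = np.array([[90,  5,  5],
                       [ 2, 97,  1],
                       [ 8,  0, 92]])
precision, recall = precision_recall(toy_matrix, 2)
print(round(precision, 3), round(recall, 3))
```

For the "no tumor" class, the off-diagonal entries in its column are the dangerous errors: images that show a tumor but were predicted as "no tumor". High precision on that class means few of them.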

### Examining Incorrect Predictions
Now that we've selected a model, let's look more deeply at some of the incorrect predictions generated by that model to better understand what those images look like.

In [None]:
# create new df with test data and corresponding predictions
tumor_test_df_preds = pd.concat([tumor_test_df, 
                            pd.Series(y_pred_classes_vgg, name = 'pred_label')], 
                            axis = 1)
tumor_test_df_preds

In [None]:
# relabel predictions
pred_dict = {0: 'glioma', 1: 'meningioma', 2: 'notumor', 3: 'other', 4: 'pituitary'}
tumor_test_df_preds['pred_label'] = tumor_test_df_preds['pred_label'].replace(pred_dict)
tumor_test_df_preds

In [None]:
# create df with subset of only incorrect predictions
wrong_preds = tumor_test_df_preds.loc[tumor_test_df_preds['label'] != 
                                      tumor_test_df_preds['pred_label']]
wrong_preds.reset_index(drop = True, inplace = True)
wrong_preds

In [None]:
# open the first 16 incorrectly predicted images
images = open_images(wrong_preds['paths'][0:16])
labels = wrong_preds['label'][0:16]
pred_labels = wrong_preds['pred_label'][0:16]
fig = plt.figure(figsize=(12, 16))
for x in range(1, 17):
    fig.add_subplot(4, 4, x)
    plt.axis('off')
    plt.title(f'True: {labels.iloc[x-1]}\n Predicted: {pred_labels.iloc[x-1]}')
    plt.imshow(images[x-1])
plt.rcParams.update({'font.size': 12})
plt.show()

In [None]:
# save model for use in deployment
vggmodel.save('/Users/eli/Desktop/brain_tumor_CNN_classifier/final_model')