# Business Understanding

## Overview
Brain tumors present a critical challenge in the medical field due to their complexity and the vital importance of early and accurate diagnosis. Early detection of brain tumors is crucial for improving patient outcomes and survival rates. As studies have shown, "Catching tumors early often allows for more treatment options. Some early tumors may have signs and symptoms that can be noticed, but this is not always the case" (Cancer.org, n.d.). "Enhancing the accuracy and efficiency of early-stage detection can significantly impact treatment planning and prognostic evaluation" (Cancer.org, n.d.). The real-world problem this project aims to solve is the need for a reliable and efficient method to accurately classify different types of brain tumors from MRI scans. Given the subtle differences between various tumor types and normal brain tissue, manual classification by radiologists can be time-consuming and prone to error, especially in high-pressure environments. This project seeks to address these challenges by developing a tool that assists in the accurate and rapid classification of brain tumors, ultimately supporting better clinical decision-making and improving patient care.

## Stakeholders
The primary stakeholders for this project include:

**Medical Professionals:** Radiologists, neurologists, and oncologists who rely on MRI images for diagnosing and treating brain conditions. This model can serve as a decision-support tool, helping them quickly identify and classify tumors, which can be especially useful in busy clinical settings or in areas with a shortage of specialists.

**Healthcare Providers:** Hospitals and clinics can benefit from integrating this model into their diagnostic workflows, potentially reducing the time required for diagnosis and enabling more efficient resource allocation.

**Researchers:** Medical researchers and data scientists working in the field of medical imaging and diagnostics can use the findings from this project to further refine and develop new models for tumor classification, potentially contributing to advances in AI-assisted diagnostics.

**Patients:** Indirectly, patients stand to benefit from more accurate and timely diagnoses, which can lead to better treatment outcomes and reduced anxiety during the diagnostic process.

The project’s outcomes could be used to improve diagnostic accuracy and speed, reduce the cognitive load on medical professionals, and ultimately lead to better health outcomes for patients. By providing a robust model for classifying brain tumors, this project could play a critical role in the early detection and treatment of brain cancer, which is crucial for improving survival rates and quality of life.

# Data Understanding

## Data Source
The dataset used in this project was obtained from Kaggle, specifically from the "Brain Tumor Classification" dataset. This dataset contains brain MRI images, categorized into four distinct classes: glioma tumors, meningioma tumors, pituitary tumors, and normal brain scans. The data was downloaded directly from the Kaggle platform and was provided in a well-organized structure, which facilitated the initial exploration and preprocessing stages.

## Data Suitability
The dataset is suitable for the objectives of this project, particularly for binary classification tasks. It provides a diverse set of images representing different types of brain tumors and normal cases, making it appropriate for training and evaluating convolutional neural networks (CNNs). However, the dataset was found to be inadequate for multiclass classification due to its limited size. The relatively small number of images in each class, particularly in the normal category, impacted the model's ability to accurately differentiate between the four classes in a multiclass setting. This limitation necessitated the use of techniques to address class imbalance and suggests that larger, more diverse datasets may be required for more effective multiclassification.

## Dataset Description
The dataset consists of MRI images divided into four categories:

**Glioma Tumor:** 901 images

**Meningioma Tumor:** 913 images

**Pituitary Tumor:** 844 images

**Normal:** 438 images

Each image is labeled according to its category, and the images are of size 256x256 pixels. The dataset is organized into separate directories for each class, which were further split into training, validation, and test sets for this project.

## Descriptive Statistics
Descriptive statistics of the dataset reveal the distribution of images across the four classes. The tumor classes (glioma, meningioma, and pituitary tumors) have a relatively balanced number of images, while the normal class has fewer images, creating an imbalance. The dataset's image resolution is consistent at 256x256 pixels, providing uniformity in input size for the CNN models. The distribution across training, validation, and test sets was maintained proportionally during the data split.

## Feature Justification
The primary feature used in this project is the image data itself, specifically the pixel values of the MRI scans. The images serve as the input for the CNN models, which are designed to automatically learn relevant features during training. No additional manual feature extraction was performed, as CNNs are highly effective at identifying and learning important patterns directly from image data. The goal was to enable the models to learn the distinguishing features of different tumor types and normal brain scans.

## Justification of Choice of Metrics 
In the context of the real-world problem of brain tumor classification, the selection of appropriate evaluation metrics is crucial for ensuring that the model meets the demands of medical applications. 

The primary metric of importance for this project is **recall**, particularly for the tumor classes. Recall is vital because it measures the model's ability to correctly identify all relevant instances—in this case, all actual tumor cases. In a medical context, a high recall is critical because failing to identify a tumor (a false negative) could have severe consequences, such as delayed treatment, progression of the disease, or even reduced survival rates. Therefore, ensuring that the model has a high recall helps to minimize the risk of missing a tumor diagnosis, which is a priority in clinical settings.

**Precision** is the second most important metric. While recall focuses on minimizing false negatives, precision focuses on minimizing false positives, which is also important in a medical context. High precision ensures that when the model predicts a tumor, it is likely correct, thereby reducing the likelihood of unnecessary stress and potential harm to patients who might otherwise undergo further invasive testing or treatment based on incorrect predictions. Although precision is secondary to recall, it still plays a critical role in ensuring that the model's predictions are trustworthy and actionable.

**Accuracy,** while a common metric, is less critical in this context due to the imbalanced nature of the dataset. Accuracy alone can be misleading, as it does not differentiate between the correct classification of tumor and nontumor cases. For instance, in a dataset where tumors are more prevalent, a model could achieve high accuracy simply by predicting the majority class (tumors) more often, without necessarily being effective at identifying the less common class (nontumors). Therefore, precision and recall are preferred metrics as they provide a more nuanced understanding of the model's performance, particularly in terms of its ability to handle the real-world consequences of errors in medical diagnosis.

## Limitations of the Data
The dataset has several limitations that impact the overall modeling process:

**Class Imbalance:** The dataset is imbalanced, with significantly fewer normal images compared to tumor images. This imbalance poses challenges in training the model to accurately classify nontumor cases.

**Limited Data Size:** While the dataset contains a reasonable number of images, the size may still be insufficient for achieving high accuracy in a complex task like brain tumor classification, particularly for multiclass classification.

**Lack of Metadata:** The dataset does not include additional metadata (e.g., patient demographics, tumor size, or location), which could provide valuable context for the classification task.

These limitations were considered during the modeling process, and steps were taken to mitigate their impact, such as using data augmentation and class weight adjustments to address class imbalance. However, the limitations suggest that further data collection or augmentation, as well as advanced modeling techniques, may be necessary for substantial improvements in classification performance.

# Data Preparation (Binary Classification)

In the initial steps of this project, the raw image data, originally categorized into four distinct classes (glioma_tumor, meningioma_tumor, pituitary_tumor, and normal), was reorganized to facilitate binary classification. The goal of this reorganization was to simplify the classification task by focusing on distinguishing between tumor and nontumor images.

To achieve this, the images from the three tumor-related classes (glioma_tumor, meningioma_tumor, and pituitary_tumor) were moved into a new directory named tumor. Simultaneously, the images from the normal class were relocated to a directory labeled nontumor. Once all the images were sorted into their respective new directories, the original four directories were deleted to streamline the dataset.

After reorganizing the data, the next step involved splitting the images into training, validation, and test sets using an 80-10-10 ratio. This split was performed separately for both tumor and nontumor images to ensure that the class distribution remained proportional across the datasets. As a result, three main directories (train, val, and test) were created, each containing two subdirectories (tumor and nontumor), housing the images based on their designated split.

Subsequently, data generators were set up using ImageDataGenerator from Keras. For the training data, data augmentation techniques such as rotation, width and height shifts, shear, zoom, and horizontal flips were applied to enhance the model's ability to generalize across different variations of the data. For the validation and test sets, only normalization was applied by rescaling the pixel values to a range of [0, 1].

This preprocessing step was essential to prepare the dataset for binary classification, where the model's primary task was to differentiate between images showing tumors and those depicting normal brain scans.

In [15]:
# Importing necessary libraries
import tensorflow as tf
from tensorflow.keras import models, layers, regularizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_predict
import numpy as np
from scikeras.wrappers import KerasClassifier
import os
import shutil
from sklearn.model_selection import train_test_split

In [44]:
# Organizing the data into tumor/nontumor and splitting it into train, val, and test sets

import os
import shutil
from sklearn.model_selection import train_test_split

os.makedirs('tumor', exist_ok=True)
os.makedirs('nontumor', exist_ok=True)

for tumor_dir in ['glioma_tumor', 'meningioma_tumor', 'pituitary_tumor']:
    for img in os.listdir(tumor_dir):
        shutil.move(os.path.join(tumor_dir, img), 'tumor')

for img in os.listdir('normal'):
    shutil.move(os.path.join('normal', img), 'nontumor')

shutil.rmtree('glioma_tumor')
shutil.rmtree('meningioma_tumor')
shutil.rmtree('pituitary_tumor')
shutil.rmtree('normal')

def split_data(source_dir, dest_dir, split_ratios=(0.8, 0.1, 0.1)):
    files = os.listdir(source_dir)
    train_files, temp_files = train_test_split(files, train_size=split_ratios[0], random_state=42)
    val_files, test_files = train_test_split(temp_files, train_size=split_ratios[1]/(split_ratios[1] + split_ratios[2]), random_state=42)
    
    for file in train_files:
        shutil.copy(os.path.join(source_dir, file), os.path.join(dest_dir, 'train', os.path.basename(source_dir), file))
    for file in val_files:
        shutil.copy(os.path.join(source_dir, file), os.path.join(dest_dir, 'val', os.path.basename(source_dir), file))
    for file in test_files:
        shutil.copy(os.path.join(source_dir, file), os.path.join(dest_dir, 'test', os.path.basename(source_dir), file))

for split in ['train', 'val', 'test']:
    os.makedirs(os.path.join(split, 'tumor'), exist_ok=True)
    os.makedirs(os.path.join(split, 'nontumor'), exist_ok=True)

split_data('tumor', '.')
split_data('nontumor', '.')

shutil.rmtree('tumor')
shutil.rmtree('nontumor')


In [45]:
# Counting the number of files in each directory and printing the number of images per category in each dataset
import os

def count_files(directory):
    return sum([len(files) for r, d, files in os.walk(directory)])

for split in ['train', 'val', 'test']:
    tumor_count = count_files(os.path.join(split, 'tumor'))
    nontumor_count = count_files(os.path.join(split, 'nontumor'))
    print(f"{split.capitalize()} set - Tumor images: {tumor_count}, Nontumor images: {nontumor_count}")


Train set - Tumor images: 2126, Nontumor images: 350
Val set - Tumor images: 266, Nontumor images: 44
Test set - Tumor images: 266, Nontumor images: 44


In [19]:
# Preprocessing the data using ImageDataGenerator with augmentation for training and normalization for validation/test

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

val_test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'train',
    target_size=(256, 256),
    batch_size=32,
    class_mode='binary'
)

val_generator = val_test_datagen.flow_from_directory(
    'val',
    target_size=(256, 256),
    batch_size=32,
    class_mode='binary'
)

test_generator = val_test_datagen.flow_from_directory(
    'test',
    target_size=(256, 256),
    batch_size=32,
    class_mode='binary'
)


Found 2476 images belonging to 2 classes.
Found 310 images belonging to 2 classes.
Found 310 images belonging to 2 classes.


# Modeling (Binary Classification)

The modeling process for binary classification involved a series of iterative improvements to identify the model that best balanced precision, recall, and overall accuracy. The journey began with a baseline CNN model, which served as the foundation. This baseline model performed well, achieving a validation accuracy of 94.52% with a precision of 0.8614 and recall of 0.8647. While these results were promising, there was still room for improvement, particularly in further enhancing recall, a key metric for this task.

The next step was to develop a CNN model with added layers and dropout to increase complexity and reduce overfitting. This model improved the validation accuracy to 96.13%, but the recall slightly decreased to 0.8383, indicating a potential trade-off between accuracy and the model’s ability to correctly identify tumor cases. To address this, a more complex CNN model was designed, incorporating additional convolutional layers and batch normalization to stabilize learning. This model struck a better balance, achieving a validation accuracy of 93.87%, with precision at 0.8587 and recall at 0.8684. The improvement in recall, alongside balanced precision and accuracy, indicated that this model was better at identifying tumor cases while maintaining overall performance.

Finally, an attempt was made to improve the baseline CNN by adjusting class weights to address the imbalance in the dataset. While this model did show strong results, it did not surpass the performance of the more complex CNN in terms of recall. As a result, the more complex CNN model was chosen as the final model due to its superior recall and balanced performance across all key metrics. This model’s ability to effectively identify the majority of tumor cases, while also maintaining strong precision and accuracy, made it the optimal choice for the task at hand.

## Baseline CNN Model

The Baseline Binary CNN model demonstrated solid performance with a validation loss of 0.1075 and a validation accuracy of 94.52%. The model’s precision was 0.8614, and its recall was 0.8647, indicating that it was effective at both predicting and correctly identifying tumor cases. The confusion matrix shows that the model correctly identified 230 out of 266 tumor cases, while it incorrectly identified 37 nontumor cases as tumors and missed 36 tumor cases.

While the model performed well overall, its primary challenge lies in distinguishing nontumor cases from tumors, as reflected by the low true negative count of 7. This suggests that while the model is reliable in detecting tumors, there is room for improvement in reducing false positives, which could be achieved by further refining the model’s ability to differentiate between nontumor and tumor images.

In [20]:
# Creating, compiling, training, evaluating, and saving a CNN model for binary classification

from tensorflow.keras import models, layers, optimizers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.models import save_model
from sklearn.metrics import confusion_matrix, precision_score, recall_score

cnn_model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

cnn_model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping]
)

val_loss, val_accuracy = cnn_model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
print(f"Validation Accuracy: {val_accuracy}")

save_model(cnn_model, 'binary_cnn_model.h5')

val_generator.reset()
predictions = cnn_model.predict(val_generator)
predicted_classes = np.where(predictions > 0.5, 1, 0)

true_classes = val_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
precision = precision_score(true_classes, predicted_classes)
recall = recall_score(true_classes, predicted_classes)

print("Confusion Matrix:")
print(conf_matrix)
print(f"Precision: {precision}")
print(f"Recall: {recall}")


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Validation Loss: 0.10753267258405685
Validation Accuracy: 0.9451612830162048
Confusion Matrix:
[[  7  37]
 [ 36 230]]
Precision: 0.8614232209737828
Recall: 0.8646616541353384


## CNN with Added Layers and Dropout

The Binary CNN with added layers and dropout exhibited a validation loss of 0.0812 and an improved validation accuracy of 96.13%, reflecting a better fit on the validation data compared to the baseline model. The model's precision was 0.8447, and its recall was 0.8383, indicating that while it remained effective at predicting tumor cases, there was a slight drop in both precision and recall compared to the baseline model. The confusion matrix shows that the model correctly identified 223 out of 266 tumor cases, but it also produced 43 false negatives and 41 false positives.

This model, despite its higher accuracy, had more difficulty in distinguishing between tumor and nontumor cases, as evidenced by the higher number of false negatives and false positives. The addition of layers and dropout helped prevent overfitting, but it also led to a decrease in the model's ability to accurately identify tumor cases, resulting in lower recall and precision. The increased complexity may have made the model less certain in its classifications, suggesting that further tuning of the architecture or additional regularization might be needed to strike a better balance between complexity and accuracy.

In [21]:
# Creating, compiling, training, evaluating, and saving an enhanced CNN model for binary classification

cnn_model = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

cnn_model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping]
)

val_loss, val_accuracy = cnn_model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
print(f"Validation Accuracy: {val_accuracy}")

save_model(cnn_model, 'enhanced_cnn_model.h5')

val_generator.reset()
predictions = cnn_model.predict(val_generator)
predicted_classes = np.where(predictions > 0.5, 1, 0)

true_classes = val_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
precision = precision_score(true_classes, predicted_classes)
recall = recall_score(true_classes, predicted_classes)

print("Confusion Matrix:")
print(conf_matrix)
print(f"Precision: {precision}")
print(f"Recall: {recall}")


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Validation Loss: 0.08121751993894577
Validation Accuracy: 0.9612902998924255
Confusion Matrix:
[[  3  41]
 [ 43 223]]
Precision: 0.8446969696969697
Recall: 0.8383458646616542


## CNN with More Added Layers and Adjusted Learning Rate

The Binary CNN with more added layers and an adjusted learning rate demonstrated a validation loss of 0.1410 and a validation accuracy of 93.87%. While the accuracy was slightly lower than that of the previous model, this model achieved the highest recall so far at 0.8684 and a comparably high precision of 0.8587. The confusion matrix indicates that the model correctly identified 231 out of 266 tumor cases, with 35 false negatives and 38 false positives.

This model showed an improved balance between precision and recall, particularly in its ability to correctly identify tumor cases, which was the main objective. The higher recall signifies that this model was more effective at minimizing false negatives, making it particularly valuable given that missing a tumor could have serious consequences. However, the model still struggled with distinguishing nontumor cases, as seen in the 38 false positives. In summary, the adjustments in layers and learning rate helped the model achieve a well-rounded performance although it still struggles with identification of nontumor cases.

In [22]:
# Creating, compiling, training, evaluating, and saving a more complex CNN model with adjusted learning rate

cnn_model = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(512, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(1, activation='sigmoid')
])

cnn_model.compile(
    optimizer=optimizers.Adam(learning_rate=0.00005),  
    loss='binary_crossentropy',
    metrics=['accuracy']
)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping]
)

val_loss, val_accuracy = cnn_model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
print(f"Validation Accuracy: {val_accuracy}")

save_model(cnn_model, 'complex_cnn_model.h5')

val_generator.reset()
predictions = cnn_model.predict(val_generator)
predicted_classes = np.where(predictions > 0.5, 1, 0)

true_classes = val_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
precision = precision_score(true_classes, predicted_classes)
recall = recall_score(true_classes, predicted_classes)

print("Confusion Matrix:")
print(conf_matrix)
print(f"Precision: {precision}")
print(f"Recall: {recall}")


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Validation Loss: 0.14099209010601044
Validation Accuracy: 0.9387096762657166
Confusion Matrix:
[[  6  38]
 [ 35 231]]
Precision: 0.8587360594795539
Recall: 0.868421052631579


## Baseline CNN with Class Weight Adjustment

The Binary CNN with class weight adjustment achieved a validation loss of 0.1149 and a validation accuracy of 94.84%. This model was designed to address class imbalance by assigning more weight to the minority class (nontumor) during training. The model’s precision was 0.8605, and its recall was 0.8346. The confusion matrix shows that the model correctly identified 222 out of 266 tumor cases, with 44 false negatives and 36 false positives.

While the class weight adjustment helped maintain a high precision of 0.8605 for tumor cases, the recall slightly decreased to 0.8346 compared to other models. The model also showed a slight improvement in identifying nontumor cases, correctly classifying 8 out of 44 nontumor instances, though the performance in this area remained limited. The classification report further highlights the challenge, with a lower macro average F1-score of 0.51, reflecting the difficulty in balancing performance across both classes.

Overall, the class weight adjustment was effective in maintaining precision but did not significantly enhance the model's ability to distinguish between tumor and nontumor cases. The relatively higher false negative rate suggests that while the model is generally reliable, it may still miss a notable number of tumor cases, indicating room for further improvement in handling class imbalance.

In [47]:
# Creating, compiling, training, and evaluating a CNN model with class weight adjustment 

class_weights = {0: 2126/350, 1: 1.0}  

cnn_model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

cnn_model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping],
    class_weight=class_weights
)

save_model(cnn_model, 'binary_cnn_with_class_weights.h5')

val_loss, val_accuracy = cnn_model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
print(f"Validation Accuracy: {val_accuracy}")

val_generator.reset()
predictions = cnn_model.predict(val_generator)
predicted_classes = np.where(predictions > 0.5, 1, 0)

true_classes = val_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
precision = precision_score(true_classes, predicted_classes)
recall = recall_score(true_classes, predicted_classes)
class_report = classification_report(true_classes, predicted_classes)

print("Confusion Matrix:")
print(conf_matrix)
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print("Classification Report:")
print(class_report)




Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Validation Loss: 0.11494480073451996
Validation Accuracy: 0.948387086391449
Confusion Matrix:
[[  8  36]
 [ 44 222]]
Precision: 0.8604651162790697
Recall: 0.8345864661654135
Classification Report:
              precision    recall  f1-score   support

           0       0.15      0.18      0.17        44
           1       0.86      0.83      0.85       266

    accuracy                           0.74       310
   macro avg       0.51      0.51      0.51       310
weighted avg       0.76      0.74      0.75       310



# Binary Classification Model Evaluation

The final evaluation of the binary classification model on the test data confirms the effectiveness of the chosen complex CNN model. The model achieved a test loss of 0.1526 and a test accuracy of 92.26%, indicating strong overall performance. However, the accuracy alone does not fully capture the model's performance, particularly in the context of an imbalanced dataset where correctly identifying tumor cases (class 1) is of primary importance.

The confusion matrix shows that the model correctly identified 231 out of 266 tumor cases, yielding a recall of 0.8684. This high recall is crucial, as it means the model is highly effective at detecting the majority of tumor cases, minimizing the risk of false negatives, which is critical in medical applications. It is for this reason that recall was chosen as the primary metric for model selection. The precision for tumor cases was also 0.8684, indicating that when the model predicts a tumor, it is correct 86.84% of the time.

However, the model continues to struggle with nontumor cases (class 0), correctly identifying only 9 out of 44 instances, resulting in a precision and recall of 0.20 for this class. This indicates that while the model is strong in identifying tumors, it has difficulty distinguishing nontumor images from tumors.

Overall, the model's performance, with a weighted average F1-score of 0.77, reflects its balanced capability in identifying tumors. Despite its challenges with nontumor cases, the model's high recall and precision for tumor cases affirm its suitability for accurately detecting tumors. Further improvements could focus on enhancing the model’s ability to differentiate between tumor and nontumor images, potentially by collecting more diverse nontumor data or applying advanced techniques to address the imbalance.

The implications of this final model evaluation are significant for solving the real-world problem of brain tumor detection. The high recall rate demonstrates that the model is highly effective in identifying the majority of tumor cases, which is critical for ensuring that patients receive timely and appropriate treatment which will result in better treatment outcomes. This is particularly important in medical settings, where the consequences of missing a tumor diagnosis can be severe. The model's ability to accurately detect tumors supports its potential use as a decision-support tool for radiologists and other medical professionals, helping to reduce the cognitive load and time required for manual classification of MRI scans. However, the challenges with correctly identifying nontumor cases could lead to unnecessary and costly treatments for healthy patients/ This suggests that the model should be used as an adjunct to, rather than a replacement for, human expertise, with further improvements necessary to enhance its reliability in distinguishing between different types of brain scans. This balance between automated detection and expert oversight could lead to better outcomes for patients by combining the strengths of both AI and human judgment.

In [50]:
# Evaluating the final model on the test data

model = load_model('complex_cnn_model.h5')

test_loss, test_accuracy = model.evaluate(test_generator)
print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy}")

test_generator.reset()
predictions = model.predict(test_generator)
predicted_classes = np.where(predictions > 0.5, 1, 0)

true_classes = test_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
precision = precision_score(true_classes, predicted_classes)
recall = recall_score(true_classes, predicted_classes)
class_report = classification_report(true_classes, predicted_classes)

print("Confusion Matrix:")
print(conf_matrix)
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print("Classification Report:")
print(class_report)


Test Loss: 0.152609184384346
Test Accuracy: 0.9225806593894958
Confusion Matrix:
[[  9  35]
 [ 35 231]]
Precision: 0.868421052631579
Recall: 0.868421052631579
Classification Report:
              precision    recall  f1-score   support

           0       0.20      0.20      0.20        44
           1       0.87      0.87      0.87       266

    accuracy                           0.77       310
   macro avg       0.54      0.54      0.54       310
weighted avg       0.77      0.77      0.77       310



# Data Preparation (Multiple Classification)

In preparation for the multiple classification task, the image data originally split into binary classes (tumor vs. nontumor) was reorganized into four distinct categories: glioma_tumor, meningioma_tumor, pituitary_tumor, and normal. These categories were determined based on the first letter of the image filenames, with each image being moved into the corresponding directory.

Once the data was organized into these four categories, a proportionate train-validation-test split was performed, maintaining an 80-10-10 ratio for each class. This ensured that each dataset had a balanced representation of the classes, which is crucial for training a robust classification model. After the split, ImageDataGenerators were configured for the training, validation, and test datasets. The training generator applied data augmentation techniques like rotation, shifts, and flips to enhance the model’s generalization capability, while the validation and test generators applied only rescaling.

This preprocessing step set the stage for the development of multiple classification models aimed at distinguishing between the four different categories of brain images, laying the groundwork for training, validating, and testing these models.

In [40]:
# Reorganizing images into new directories for multiple classification


os.makedirs('glioma_tumor', exist_ok=True)
os.makedirs('meningioma_tumor', exist_ok=True)
os.makedirs('pituitary_tumor', exist_ok=True)
os.makedirs('normal', exist_ok=True)

for dataset in ['train', 'val', 'test']:
    for category in ['tumor', 'nontumor']:
        category_path = os.path.join(dataset, category)
        if os.path.exists(category_path):
            for img in os.listdir(category_path):
                if img.startswith('G'):
                    shutil.move(os.path.join(category_path, img), 'glioma_tumor')
                elif img.startswith('M'):
                    shutil.move(os.path.join(category_path, img), 'meningioma_tumor')
                elif img.startswith('P'):
                    shutil.move(os.path.join(category_path, img), 'pituitary_tumor')
                elif img.startswith('N'):
                    shutil.move(os.path.join(category_path, img), 'normal')

shutil.rmtree('train')
shutil.rmtree('val')
shutil.rmtree('test')

for dir_name in ['glioma_tumor', 'meningioma_tumor', 'pituitary_tumor', 'normal']:
    count = len(os.listdir(dir_name))
    print(f"{dir_name} has {count} images")


glioma_tumor has 901 images
meningioma_tumor has 913 images
pituitary_tumor has 844 images
normal has 438 images


In [41]:
# Performing a train, val, test split 

def split_data(class_name, split_ratios=(0.8, 0.1, 0.1)):
    files = os.listdir(class_name)
    train_files, temp_files = train_test_split(files, train_size=split_ratios[0], random_state=42)
    val_files, test_files = train_test_split(temp_files, train_size=split_ratios[1]/(split_ratios[1] + split_ratios[2]), random_state=42)

    os.makedirs(f'train/{class_name}', exist_ok=True)
    os.makedirs(f'val/{class_name}', exist_ok=True)
    os.makedirs(f'test/{class_name}', exist_ok=True)

    for file in train_files:
        shutil.move(os.path.join(class_name, file), os.path.join(f'train/{class_name}', file))
    for file in val_files:
        shutil.move(os.path.join(class_name, file), os.path.join(f'val/{class_name}', file))
    for file in test_files:
        shutil.move(os.path.join(class_name, file), os.path.join(f'test/{class_name}', file))

    return len(train_files), len(val_files), len(test_files)

for class_name in ['glioma_tumor', 'meningioma_tumor', 'pituitary_tumor', 'normal']:
    train_count, val_count, test_count = split_data(class_name)
    print(f"{class_name} - Train: {train_count}, Val: {val_count}, Test: {test_count}")


glioma_tumor - Train: 720, Val: 90, Test: 91
meningioma_tumor - Train: 730, Val: 91, Test: 92
pituitary_tumor - Train: 675, Val: 84, Test: 85
normal - Train: 350, Val: 44, Test: 44


In [26]:
# Setting up image generators for multiclass classification with augmentation for training and normalization for validation & testing

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)

val_test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'train',
    target_size=(256, 256),
    batch_size=32,
    class_mode='categorical'
)

val_generator = val_test_datagen.flow_from_directory(
    'val',
    target_size=(256, 256),
    batch_size=32,
    class_mode='categorical'
)

test_generator = val_test_datagen.flow_from_directory(
    'test',
    target_size=(256, 256),
    batch_size=32,
    class_mode='categorical'
)


Found 2475 images belonging to 4 classes.
Found 309 images belonging to 4 classes.
Found 312 images belonging to 4 classes.


# Modeling (Multiple Classification)

The modeling phase for multiple classification aimed to develop convolutional neural networks (CNNs) capable of classifying brain images into four categories: glioma_tumor, meningioma_tumor, pituitary_tumor, and normal. The initial baseline model served as a starting point, but the results indicated that the model struggled to accurately differentiate between the classes.

Upon evaluating the baseline model, the metrics revealed significant challenges. The precision and recall scores were low across all classes, with accuracy at only 28%. The F1-scores, which balance precision and recall, were also low, indicating that the model was not effectively distinguishing between the classes. These results suggested that the model was not able to learn the distinct features necessary for accurate classification.

To address these issues, subsequent models were developed with additional layers, more filters, and the inclusion of batch normalization to stabilize and improve learning. Despite these enhancements, the models continued to perform poorly, particularly in classifying the normal category, which had the smallest number of samples. The overall performance across all categories remained inadequate, likely due to the limited size of the dataset, which hindered the models' ability to generalize and learn the differences between the classes effectively.

Ultimately, the final model, although slightly improved, still exhibited low performance metrics, indicating that the models were not capable of reliably distinguishing between the four classes. This highlighted the need for a larger and more balanced dataset to improve classification accuracy and the model's ability to generalize across different categories.

In summary, the modeling process for multiple classification faced significant challenges due to the limited dataset size, resulting in poor performance metrics. The final model, while improved, was still unable to effectively distinguish between the four classes, underscoring the necessity for further data collection and possibly exploring alternative modeling approaches.

## Baseline CNN 

The Baseline CNN for multiple classification exhibited a validation loss of 0.5464 and a validation accuracy of 79.94%, which suggests that while the model was able to learn some features of the data, it struggled to generalize effectively across the four classes. The confusion matrix reveals that the model had difficulty correctly identifying the correct class for many samples, with significant overlap in its predictions across all tumor types and normal cases.

The classification report reflects these challenges, with precision and recall scores across all classes being relatively low. The model achieved a precision of 0.24 for glioma tumors and 0.24 for normal cases, with slightly higher precision for meningioma tumors (0.30) and pituitary tumors (0.30). The recall was highest for meningioma tumors at 0.36, but for glioma tumors, it was only 0.17, indicating that the model struggled particularly with this class. The F1-scores across all classes hovered around 0.20-0.33, which underscores the model's difficulty in effectively distinguishing between the different types of tumors and normal cases.

The baseline model’s primary weakness lies in its inability to accurately classify and distinguish between the four categories, as evidenced by the significant confusion across the classes. This model serves as a starting point, highlighting the need for further refinement and more complex architectures to improve the model’s ability to learn distinct features for each class and thereby enhance overall classification performance.

In [27]:
# Creating, compiling, training, and evaluating a baseline CNN model for multiclass classification

from sklearn.metrics import classification_report

cnn_model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(4, activation='softmax')  # 4 output units for the 4 classes
])

cnn_model.compile(
    optimizer=optimizers.Adam(learning_rate=0.0001),
    loss='categorical_crossentropy',  # Loss function for multiclass classification
    metrics=['accuracy']
)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping]
)

val_loss, val_accuracy = cnn_model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
print(f"Validation Accuracy: {val_accuracy}")

save_model(cnn_model, 'multiclass_cnn_model.h5')

val_generator.reset()
predictions = cnn_model.predict(val_generator)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = val_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
class_report = classification_report(true_classes, predicted_classes, target_names=val_generator.class_indices.keys())

print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Validation Loss: 0.5464129447937012
Validation Accuracy: 0.799352765083313
Confusion Matrix:
[[15 31 15 29]
 [27 33 10 21]
 [ 5 15 11 13]
 [15 32 10 27]]
Classification Report:
                  precision    recall  f1-score   support

    glioma_tumor       0.24      0.17      0.20        90
meningioma_tumor       0.30      0.36      0.33        91
          normal       0.24      0.25      0.24        44
 pituitary_tumor       0.30      0.32      0.31        84

        accuracy                           0.28       309
       macro avg       0.27      0.28      0.27       309
    weighted avg       0.27      0.28      0.27       309



## CNN Model with Additional Layers, Filters, Dropout and a Reduced Learning Rate

The Multiple Classification CNN model, enhanced with additional layers, filters, dropout, and a reduced learning rate, achieved a validation loss of 0.5051 and a validation accuracy of 81.23%. While the accuracy and loss indicate some improvement over the baseline model, the confusion matrix and classification report reveal that the model still struggled to effectively classify the various tumor types and normal cases.

The confusion matrix shows significant confusion across all classes, with the model frequently misclassifying glioma tumors, meningioma tumors, pituitary tumors, and normal cases. For instance, out of 90 glioma tumor cases, only 28 were correctly classified, and the rest were spread across other categories. Similarly, normal cases were often misclassified, with only 4 out of 44 being correctly identified.

The classification report highlights these challenges with low precision, recall, and F1-scores across all classes. The model achieved a precision of 0.31 for glioma tumors and 0.34 for meningioma tumors, but precision dropped significantly for normal cases (0.08). The recall was similarly low, particularly for normal cases, where it only reached 0.09. The overall accuracy of 28% and macro average F1-score of 0.25 underscore the model's continued difficulty in distinguishing between the different categories.

Although this model introduced additional complexity and refined the learning process with a reduced learning rate, it still struggled to learn the distinct features necessary for accurate classification across all classes. The model’s performance suggests that further architectural adjustments, possibly combined with a larger and more balanced dataset, would be necessary to improve its ability to differentiate between the multiple classes effectively.

In [28]:
# Creating, compiling, training, and evaluating an enhanced CNN model 

cnn_model = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(512, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(512, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(4, activation='softmax')  
])

cnn_model.compile(
    optimizer=optimizers.Adam(learning_rate=0.00005),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping]
)

val_loss, val_accuracy = cnn_model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
print(f"Validation Accuracy: {val_accuracy}")

save_model(cnn_model, 'enhanced_multiclass_cnn_model.h5')

val_generator.reset()
predictions = cnn_model.predict(val_generator)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = val_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
class_report = classification_report(true_classes, predicted_classes, target_names=val_generator.class_indices.keys())

print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Validation Loss: 0.5050603747367859
Validation Accuracy: 0.8122977614402771
Confusion Matrix:
[[28 29 13 20]
 [27 30 13 21]
 [14 10  4 16]
 [20 20 21 23]]
Classification Report:
                  precision    recall  f1-score   support

    glioma_tumor       0.31      0.31      0.31        90
meningioma_tumor       0.34      0.33      0.33        91
          normal       0.08      0.09      0.08        44
 pituitary_tumor       0.29      0.27      0.28        84

        accuracy                           0.28       309
       macro avg       0.25      0.25      0.25       309
    weighted avg       0.28      0.28      0.28       309



 ## CNN with More Layers, Filters, & Batch Normalization

The CNN model for multiple classification, enhanced with more layers, filters, and batch normalization, achieved a validation loss of 0.4836 and a validation accuracy of 83.17%. While these metrics suggest some improvement in the model's fit and stability, the confusion matrix and classification report indicate ongoing challenges in effectively distinguishing between the four classes.

The confusion matrix shows that while the model improved slightly in identifying certain classes, significant confusion remains. For example, out of 90 glioma tumor cases, the model correctly identified 24, but it also misclassified many of these cases into other categories, particularly pituitary tumors. The normal cases were also frequently misclassified, with only 5 out of 44 being correctly identified.

The classification report reflects these difficulties, with precision, recall, and F1-scores still relatively low across all classes. Glioma tumors had a precision of 0.34 and recall of 0.27, while meningioma tumors had even lower scores, with precision and recall around 0.27-0.26. The model particularly struggled with normal cases, achieving only 0.11 for both precision and recall. The overall accuracy was 26%, and the macro average F1-score was 0.24, indicating that the model's improvements did not significantly enhance its ability to differentiate between the different categories.

Despite the additional layers, filters, and batch normalization, which were intended to improve the model’s generalization and stability, the model continued to struggle with classifying the various tumor types and normal cases. These results suggest that while architectural enhancements provided some benefits, further strategies—such as increasing the dataset size or using more advanced techniques—are necessary to improve the model's ability to accurately classify multiple classes.

In [29]:
# Creating, compiling, training, and evaluating enhanced CNN model with more layers, increased filters, and batch normalization 

cnn_model = models.Sequential([
    layers.Conv2D(64, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(256, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(512, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    
    layers.Conv2D(512, (3, 3), activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(4, activation='softmax')  
])

cnn_model.compile(
    optimizer=optimizers.Adam(learning_rate=0.00005),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = cnn_model.fit(
    train_generator,
    epochs=50,
    validation_data=val_generator,
    callbacks=[early_stopping]
)

val_loss, val_accuracy = cnn_model.evaluate(val_generator)
print(f"Validation Loss: {val_loss}")
print(f"Validation Accuracy: {val_accuracy}")

save_model(cnn_model, 'enhanced_multiclass_cnn_with_bn.h5')

val_generator.reset()
predictions = cnn_model.predict(val_generator)
predicted_classes = np.argmax(predictions, axis=1)
true_classes = val_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
class_report = classification_report(true_classes, predicted_classes, target_names=val_generator.class_indices.keys())

print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Validation Loss: 0.48360633850097656
Validation Accuracy: 0.8317152261734009
Confusion Matrix:
[[24 23  7 36]
 [22 24 19 26]
 [ 8 15  5 16]
 [17 26 13 28]]
Classification Report:
                  precision    recall  f1-score   support

    glioma_tumor       0.34      0.27      0.30        90
meningioma_tumor       0.27      0.26      0.27        91
          normal       0.11      0.11      0.11        44
 pituitary_tumor       0.26      0.33      0.29        84

        accuracy                           0.26       309
       macro avg       0.25      0.24      0.24       309
    weighted avg       0.27      0.26      0.26       309



# Evaluation (Multiple Classification)

The evaluation of the final multiple classification model on the test data revealed significant challenges in accurately distinguishing between the four classes: glioma_tumor, meningioma_tumor, pituitary_tumor, and normal. While the model achieved a test accuracy of 85.58%, a closer examination of the confusion matrix and classification report highlights substantial issues. The confusion matrix showed that the model frequently misclassified glioma, meningioma, and pituitary tumors, often confusing them with one another. For instance, only 26 out of 91 glioma tumor samples were correctly identified, with many being misclassified as pituitary tumors. The model particularly struggled with the normal class, correctly identifying only 8 out of 44 samples, frequently mistaking normal brain images for various types of tumors.

The classification report further underscored these difficulties, with precision and recall scores across all classes being low, particularly for the normal class, where precision and recall were both below 0.20. The overall accuracy, which was only 28%, along with the macro and weighted averages for precision, recall, and F1-scores hovering around 0.27-0.28, indicate that the model had significant difficulty in reliably distinguishing between the four categories. This poor performance is likely due to the limited size and imbalance of the dataset, which hindered the model's ability to learn distinct features for each class. The results suggest that further improvements, such as increasing the dataset size, enhancing data augmentation strategies, or exploring more advanced modeling techniques, are necessary to achieve better classification performance.

The implications of the final multiple classification model's performance highlight the complexities and challenges of applying machine learning to the task of brain tumor classification across multiple types. The model's difficulties in accurately distinguishing between tumor types and normal brain tissue suggest that it is not yet reliable enough to be deployed in a clinical setting without significant further development. Misidentifying tumor cases could lead to delayed treatment resulting in worsened health outcomes. The model's misclassification of normal brain scans as tumors is also concerning, as it could lead to unnecessary stress and potential harm to patients through misdiagnosis and inappropriate treatment plans.

In [30]:
# Loading, and evaluating my final multiple classification model on the test data

cnn_model = load_model('enhanced_multiclass_cnn_with_bn.h5')

test_loss, test_accuracy = cnn_model.evaluate(test_generator)
print(f"Test Loss: {test_loss}")
print(f"Test Accuracy: {test_accuracy}")

test_generator.reset()
predictions = cnn_model.predict(test_generator)
predicted_classes = np.argmax(predictions, axis=1)

true_classes = test_generator.classes

conf_matrix = confusion_matrix(true_classes, predicted_classes)
class_report = classification_report(true_classes, predicted_classes, target_names=test_generator.class_indices.keys())

print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)


Test Loss: 0.4234536588191986
Test Accuracy: 0.8557692170143127
Confusion Matrix:
[[26 17 12 36]
 [27 26 15 24]
 [13 14  8  9]
 [19 28 11 27]]
Classification Report:
                  precision    recall  f1-score   support

    glioma_tumor       0.31      0.29      0.30        91
meningioma_tumor       0.31      0.28      0.29        92
          normal       0.17      0.18      0.18        44
 pituitary_tumor       0.28      0.32      0.30        85

        accuracy                           0.28       312
       macro avg       0.27      0.27      0.27       312
    weighted avg       0.28      0.28      0.28       312



# Deployment

## Choice of Metrics
The choice of metrics for evaluating the models in this project was guided by the critical need to minimize false negatives in the context of brain tumor detection. **Recall** was prioritized as the most important metric because it measures the model's ability to correctly identify all actual tumor cases. This is crucial in medical applications where missing a tumor could lead to delayed treatment and serious health consequences. **Precision** was the second most important metric, ensuring that when the model predicts a tumor, it is likely correct, thereby minimizing unnecessary stress and medical procedures for patients.

## Final Model Selection
Based on performance on the chosen metrics with the validation data, the Binary CNN with More Layers, Filters, & Adjusted Learning Rate was identified as the final model for deployment. This model achieved a balance between high recall and precision, particularly in detecting tumor cases. With a validation accuracy of 93.87% and a recall of 0.8684, it was deemed the most effective model for reliably identifying tumors while maintaining a reasonable level of precision.

## Performance Evaluation on Holdout Test Data
When evaluated on the holdout test data, the final binary model demonstrated a test loss of 0.1526 and a test accuracy of 92.26%. The model correctly identified 231 out of 266 tumor cases, yielding a recall of 0.8684, which is critical for ensuring that as many tumor cases as possible are detected. However, the model struggled with nontumor cases, correctly identifying only 9 out of 44 instances, resulting in a precision and recall of 0.20 for this class. While the binary model is suitable for deployment due to its high recall, none of the multiclass models are ready for deployment due to their significant challenges in accurately distinguishing between the four classes, particularly with poor performance in identifying normal brain scans.

## Implications for Solving the Real-World Problem
The final model's high recall rate has significant implications for solving the real-world problem of brain tumor detection. The ability to reliably identify tumor cases ensures that critical cases are not missed, which is vital in the early detection and treatment of brain tumors. This model can serve as an essential tool in clinical settings, where time is of the essence, and the stakes are high. By reducing the likelihood of missed diagnoses, the model could contribute to improved patient outcomes, particularly in cases where early intervention is possible and can lead to more effective treatment. However, the model’s difficulty in distinguishing nontumor cases from tumors also suggests that while it is a powerful aid, it should be used in conjunction with human expertise to ensure comprehensive diagnostic accuracy. This balance between automated detection and expert oversight is crucial in ensuring that the tool enhances, rather than replaces, the nuanced judgment required in medical diagnostics.

## Limitations of the Data and Project
Several limitations of the data and the project have impacted the model's performance. The **dataset's imbalance**—with significantly fewer nontumor images compared to tumor images—hindered the model's ability to accurately distinguish between the two classes. Additionally, the **limited size** of the dataset was a major constraint, particularly for multiclass classification, where the model struggled to learn distinct features for each tumor type. These limitations suggest that further data collection, augmentation, and exploration of more advanced modeling techniques are necessary to improve the model's performance.

## Summary of Implications for the Real-World Problem and Stakeholders
The implications of this project for the real-world problem of brain tumor detection are substantial. For **medical professionals**, the deployment of this model could lead to faster and more accurate diagnoses, particularly for identifying tumor cases that might otherwise be missed. This can alleviate the diagnostic burden on radiologists, especially in resource-constrained settings, and provide a safety net by ensuring that potential tumors are flagged for further review. **Healthcare providers** could benefit from integrating this model into diagnostic workflows, reducing the time required for initial assessments and potentially lowering costs associated with extensive testing. For **patients**, the model’s high recall means a higher likelihood of early detection, which is critical for successful treatment and improved survival rates. **Researchers** can also benefit from this project by building upon its findings to further refine models, potentially improving their accuracy and robustness in classifying brain tumors. However, the model's limitations underscore the importance of continuous development and the need for careful deployment, ensuring that it is used as an adjunct to, rather than a replacement for, clinical expertise. The project highlights the potential for AI in healthcare, but also the necessity of collaboration between technology and human judgment to achieve the best outcomes for patients.

# References

- **American Cancer Society. (n.d.). Brain and Spinal Cord Tumors in Adults - Detection, Diagnosis, and Staging. Retrieved from https://www.cancer.org/cancer/types/brain-spinal-cord-tumors-adults/detection-diagnosis-staging.html**