# Brain Tumor MRI Classification Project

## Project Overview
This project aims to classify brain tumors from MRI images into four categories: Glioma, Meningioma, No Tumor, and Pituitary. In this notebook, we load the preprocessed data, train a classification head on top of an ImageNet-pretrained InceptionV3 base model, and evaluate the result on the test set with and without test-time augmentation (TTA).

## Extract Preprocessed Data

In [None]:
import os
import zipfile
import glob

def extract_preprocessed_data():
    data_dir = '/content/preprocessed_data'
    required_files = [
        'X_train.npy', 'X_val.npy', 'X_test.npy',
        'y_train.npy', 'y_val.npy', 'y_test.npy',
        'y_train_cat.npy', 'y_val_cat.npy', 'y_test_cat.npy',
        'config.json'
    ]

    # Skip extraction when every required file is already present (works even if the zip is gone)
    missing = [f for f in required_files if not os.path.exists(f'{data_dir}/{f}')]
    if not missing:
        print("preprocessed_data folder already exists")
        return True

    # Prefer the canonical archive name, then any other *preprocessed*.zip in /content/
    zip_candidates = ['/content/preprocessed_data.zip', *sorted(glob.glob('/content/*preprocessed*.zip'))]
    zip_path = next((c for c in zip_candidates if os.path.exists(c)), None)
    if zip_path is None:
        print("preprocessed_data.zip not found in /content/")
        return False

    try:
        # Extract into /content/; the archive is expected to contain the preprocessed_data/ folder
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall('/content/')
        print(f"Extraction completed: {os.path.basename(zip_path)}")
        return True
    except Exception as e:
        print(f"ERROR: {e}")
        return False

if extract_preprocessed_data():
    for f in sorted(os.listdir('/content/preprocessed_data')):
        print(f"├── {f}")
else:
    print("Cannot proceed without preprocessed data")

Extraction completed: preprocessed_data.zip
├── X_test.npy
├── X_train.npy
├── X_val.npy
├── config.json
├── y_test.npy
├── y_test_cat.npy
├── y_train.npy
├── y_train_cat.npy
├── y_val.npy
├── y_val_cat.npy
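If extraction fails or the folder comes out incomplete, it can help to inspect the archive before extracting it. The sketch below uses the standard-library `ZipFile.namelist()` to report which expected files are absent. It assumes, based on the folder listing above, that the archive stores everything under a top-level `preprocessed_data/` folder; `missing_from_archive` is a hypothetical helper name.

```python
import os
import zipfile

# File names this notebook expects inside preprocessed_data/
REQUIRED_FILES = [
    'X_train.npy', 'X_val.npy', 'X_test.npy',
    'y_train.npy', 'y_val.npy', 'y_test.npy',
    'y_train_cat.npy', 'y_val_cat.npy', 'y_test_cat.npy',
    'config.json',
]

def missing_from_archive(zip_path, folder='preprocessed_data', required=REQUIRED_FILES):
    """Return the required files absent from the archive (an empty list means complete)."""
    with zipfile.ZipFile(zip_path, 'r') as zf:
        # namelist() returns member paths with forward slashes, e.g. 'preprocessed_data/X_train.npy'
        members = set(zf.namelist())
    return [f for f in required if f'{folder}/{f}' not in members]

# Usage (assumed archive location):
# print(missing_from_archive('/content/preprocessed_data.zip'))
```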


## Environment and Dependencies
We use TensorFlow and Keras to build the neural network, NumPy and Pandas for data handling, and Matplotlib and Seaborn for performance visualization.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import json
import time
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, LearningRateScheduler
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
np.random.seed(42)
tf.random.set_seed(42)
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)
print(f"TensorFlow Version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")

TensorFlow Version: 2.19.0
GPU Available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## Data Pipeline
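The dataset consists of preprocessed MRI scans stored as NumPy arrays, already split into training, validation, and test sets, with one-hot encoded labels (the *_cat.npy files) and a config.json file holding metadata such as the number of classes. We define the input path and create output folders for models, training histories, and plots.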

In [None]:
DATA_PATH = '/content/preprocessed_data'
OUTPUT_PATH = '/content/training_results'
os.makedirs(OUTPUT_PATH, exist_ok=True)
os.makedirs(f'{OUTPUT_PATH}/models', exist_ok=True)
os.makedirs(f'{OUTPUT_PATH}/histories', exist_ok=True)
os.makedirs(f'{OUTPUT_PATH}/plots', exist_ok=True)
X_train = np.load(f'{DATA_PATH}/X_train.npy')
X_val = np.load(f'{DATA_PATH}/X_val.npy')
X_test = np.load(f'{DATA_PATH}/X_test.npy')
y_train_cat = np.load(f'{DATA_PATH}/y_train_cat.npy')
y_val_cat = np.load(f'{DATA_PATH}/y_val_cat.npy')
y_test_cat = np.load(f'{DATA_PATH}/y_test_cat.npy')
with open(f'{DATA_PATH}/config.json', 'r') as f:
    config = json.load(f)
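Before training, a quick consistency check on each split can catch mismatched or malformed arrays. The pixel range is also worth printing: the notebook imports `preprocess_input` from `tensorflow.keras.applications.inception_v3` but never calls it, and that function maps pixel values from 0 to 255 into -1 to 1. Whether the stored arrays need it depends on how they were preprocessed, which this notebook does not show. `check_split` is a hypothetical helper, sketched with NumPy only.

```python
import numpy as np

def check_split(name, X, y_cat):
    """Consistency checks for one split; returns the number of samples per class."""
    # Images and labels must line up one-to-one
    assert X.shape[0] == y_cat.shape[0], f"{name}: {X.shape[0]} images vs {y_cat.shape[0]} labels"
    # Every one-hot label row should sum to exactly 1
    assert np.allclose(y_cat.sum(axis=1), 1.0), f"{name}: labels are not one-hot"
    counts = y_cat.sum(axis=0).astype(int)
    print(f"{name}: X shape {X.shape}, pixel range [{X.min():.3f}, {X.max():.3f}], "
          f"class counts {counts.tolist()}")
    return counts

# Usage with the arrays loaded above:
# for name, X, y in [('train', X_train, y_train_cat), ('val', X_val, y_val_cat), ('test', X_test, y_test_cat)]:
#     check_split(name, X, y)
```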

## Data Augmentation Strategy

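To improve generalization and reduce overfitting, we apply moderate on-the-fly augmentation to the training images: random rotations of up to 20 degrees, width and height shifts of up to 15% of the image size, zoom of ±15%, a small shear, and horizontal and vertical flips. Vertical flipping is considered safe for brain MRI scans because flipping a scan does not change which tumor class it shows. Validation and test images are not augmented.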

In [None]:
train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.15,
    height_shift_range=0.15,
    shear_range=0.15,
    zoom_range=0.15,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)
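To illustrate what the two flip options do to an image array, here is a NumPy-only sketch of the flip step alone. It is not `ImageDataGenerator`'s implementation (which also applies the rotation, shift, shear, and zoom above); it assumes single images in height × width × channels order and flips each axis independently with probability 0.5. The label is left unchanged, which is exactly why flips are only appropriate when they do not change the class.

```python
import numpy as np

def random_flip(image, rng, horizontal=True, vertical=True):
    """Flip an (H, W, C) image left-right and/or upside-down, each with probability 0.5."""
    if horizontal and rng.random() < 0.5:
        image = image[:, ::-1, :]   # reverse the width (column) axis
    if vertical and rng.random() < 0.5:
        image = image[::-1, :, :]   # reverse the height (row) axis
    return image

rng = np.random.default_rng(42)
img = np.arange(2 * 3).reshape(2, 3, 1)   # tiny 2x3 single-channel "image"
flipped = random_flip(img, rng)
```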

## Model Architecture

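We use InceptionV3 with ImageNet weights as the base model for transfer learning. The base model is frozen, so only the new classification head on top of it is trained. The head applies global average pooling to the base model's feature maps, followed by two fully connected blocks (512 and 256 ReLU units), each followed by Batch Normalization and Dropout with a rate of 0.55, and a softmax output layer with one unit per class.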

In [None]:
def build_inceptionv3(input_shape=(224, 224, 3), num_classes=4):
    # Load InceptionV3 with ImageNet weights and without its top classification layers
    base_model = InceptionV3(
        weights='imagenet',
        include_top=False,
        input_shape=input_shape
    )

    # Freeze base model layers
    base_model.trainable = False

    # Build model with InceptionV3
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),
        layers.Dense(512, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.55),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.55),
        layers.Dense(num_classes, activation='softmax')
    ])

    return model
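As a worked check on the size of the trainable head, the arithmetic below counts its parameters. It assumes the InceptionV3 base with `include_top=False` outputs 2048 feature maps, so `GlobalAveragePooling2D` yields a 2048-dimensional vector regardless of input height and width, and that `num_classes` is 4. Each Batch Normalization unit has four parameters: the scale and offset are trainable, while the moving mean and variance are not.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(n_units):
    """Returns (trainable, non_trainable): gamma/beta vs. moving mean/variance."""
    return 2 * n_units, 2 * n_units

GAP_FEATURES = 2048   # assumed output width of InceptionV3 + GlobalAveragePooling2D
NUM_CLASSES = 4       # Glioma, Meningioma, No Tumor, Pituitary

bn512_train, bn512_fixed = batchnorm_params(512)
bn256_train, bn256_fixed = batchnorm_params(256)

head_trainable = (
    dense_params(GAP_FEATURES, 512) + bn512_train   # Dense(512) + BatchNorm
    + dense_params(512, 256) + bn256_train          # Dense(256) + BatchNorm
    + dense_params(256, NUM_CLASSES)                # softmax output layer
)
head_non_trainable = bn512_fixed + bn256_fixed

print(head_trainable)       # 1182980
print(head_non_trainable)   # 1536
```

Because the base model is frozen, these roughly 1.18 million weights should match the trainable-parameter total reported by `model.summary()`, with the InceptionV3 weights listed as non-trainable.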

In [None]:
model = build_inceptionv3(
    input_shape=X_train.shape[1:],
    num_classes=config['num_classes']
)

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
87910968/87910968 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


## Training Process

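The model is trained for 100 epochs with the Adam optimizer (initial learning rate 0.001), categorical cross-entropy loss, and a batch size of 32 on the augmented training data. ModelCheckpoint saves the model with the highest validation accuracy, and ReduceLROnPlateau halves the learning rate whenever the validation loss has not improved for 7 epochs, down to a minimum of 1e-7.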

In [None]:
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy', tf.keras.metrics.Precision(name='precision'), tf.keras.metrics.Recall(name='recall')]
)

In [None]:
callbacks = [
    ModelCheckpoint(filepath=f'{OUTPUT_PATH}/models/best_model.h5', monitor='val_accuracy', save_best_only=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=7, min_lr=1e-7)
]
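To see how the plateau rule behaves with `factor=0.5, patience=7, min_lr=1e-7`, here is a simplified pure-Python model of it: the learning rate is halved once the validation loss has failed to beat its best value by more than `min_delta` for `patience` consecutive epochs. The `min_delta=1e-4` and zero cooldown mirror Keras' defaults, but `simulate_reduce_lr_on_plateau` is a hypothetical illustration, not the callback's source. In the training log below, the learning rate reads 5.0000e-04 by epoch 32, consistent with one such reduction.

```python
def simulate_reduce_lr_on_plateau(val_losses, lr=1e-3, factor=0.5, patience=7,
                                  min_lr=1e-7, min_delta=1e-4):
    """Simplified ReduceLROnPlateau (mode='min', cooldown=0).

    Returns the learning rate in effect for the next epoch after each epoch's check.
    """
    best = float('inf')
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best - min_delta:       # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:          # plateau: shrink the LR, never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs

# Usage: seven non-improving epochs after the best loss trigger a single halving
print(simulate_reduce_lr_on_plateau([0.30, 0.25] + [0.26] * 7))
# [0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.0005]
```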

In [None]:
history = model.fit(
    train_datagen.flow(X_train, y_train_cat, batch_size=32),
    epochs=100,
    validation_data=(X_val, y_val_cat),
    callbacks=callbacks
)

Epoch 1/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 105s 551ms/step - accuracy: 0.6263 - loss: 1.2016 - precision: 0.6407 - recall: 0.5934 - val_accuracy: 0.7760 - val_loss: 0.6742 - val_precision: 0.7873 - val_recall: 0.7643 - learning_rate: 0.0010
Epoch 2/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 59s 388ms/step - accuracy: 0.7722 - loss: 0.6587 - precision: 0.7902 - recall: 0.7543 - val_accuracy: 0.8693 - val_loss: 0.3621 - val_precision: 0.8770 - val_recall: 0.8483 - learning_rate: 0.0010
Epoch 3/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 384ms/step - accuracy: 0.8008 - loss: 0.5497 - precision: 0.8242 - recall: 0.7798 - val_accuracy: 0.8565 - val_loss: 0.3472 - val_precision: 0.8764 - val_recall: 0.8436 - learning_rate: 0.0010
Epoch 4/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 59s 389ms/step - accuracy: 0.8114 - loss: 0.4929 - precision: 0.8353 - recall: 0.7819 - val_accuracy: 0.8168 - val_loss: 0.3993 - val_precision: 0.8382 - val_recall: 0.7981 - learning_rate: 0.0010
Epoch 5/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 83s 393ms/step - accuracy: 0.8227 - loss: 0.4855 - precision: 0.8439 - recall: 0.8044 - val_accuracy: 0.8845 - val_loss: 0.2949 - val_precision: 0.9039 - val_recall: 0.8670 - learning_rate: 0.0010
Epoch 6/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 59s 390ms/step - accuracy: 0.8438 - loss: 0.4283 - precision: 0.8596 - recall: 0.8256 - val_accuracy: 0.9020 - val_loss: 0.2813 - val_precision: 0.9166 - val_recall: 0.8845 - learning_rate: 0.0010
Epoch 7/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 59s 391ms/step - accuracy: 0.8433 - loss: 0.4371 - precision: 0.8600 - recall: 0.8234 - val_accuracy: 0.8763 - val_loss: 0.2988 - val_precision: 0.8967 - val_recall: 0.8611 - learning_rate: 0.0010
Epoch 8/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 379ms/step - accuracy: 0.8421 - loss: 0.4252 - precision: 0.8609 - recall: 0.8268 - val_accuracy: 0.8845 - val_loss: 0.2924 - val_precision: 0.8994 - val_recall: 0.8553 - learning_rate: 0.0010
Epoch 9/100
...
152/152 ━━━━━━━━━━━━━━━━━━━━ 83s 385ms/step - accuracy: 0.8795 - loss: 0.3433 - precision: 0.8942 - recall: 0.8587 - val_accuracy: 0.9160 - val_loss: 0.2353 - val_precision: 0.9228 - val_recall: 0.9067 - learning_rate: 0.0010
Epoch 19/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 378ms/step - accuracy: 0.8690 - loss: 0.3518 - precision: 0.8846 - recall: 0.8527 - val_accuracy: 0.9125 - val_loss: 0.2548 - val_precision: 0.9205 - val_recall: 0.8915 - learning_rate: 0.0010
Epoch 20/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 381ms/step - accuracy: 0.8712 - loss: 0.3474 - precision: 0.8882 - recall: 0.8556 - val_accuracy: 0.9148 - val_loss: 0.2325 - val_precision: 0.9211 - val_recall: 0.8996 - learning_rate: 0.0010
Epoch 21/100
...
152/152 ━━━━━━━━━━━━━━━━━━━━ 60s 393ms/step - accuracy: 0.8806 - loss: 0.3087 - precision: 0.8945 - recall: 0.8666 - val_accuracy: 0.9172 - val_loss: 0.2223 - val_precision: 0.9284 - val_recall: 0.9078 - learning_rate: 5.0000e-04
Epoch 32/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 385ms/step - accuracy: 0.8789 - loss: 0.3190 - precision: 0.8892 - recall: 0.8632 - val_accuracy: 0.9183 - val_loss: 0.2214 - val_precision: 0.9230 - val_recall: 0.9090 - learning_rate: 5.0000e-04
Epoch 33/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 379ms/step - accuracy: 0.8753 - loss: 0.3230 - precision: 0.8920 - recall: 0.8652 - val_accuracy: 0.9067 - val_loss: 0.2472 - val_precision: 0.9133 - val_recall: 0.8973 - learning_rate: 5.0000e-04
Epoch 34/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 59s 386ms/step - accuracy: 0.8883 - loss: 0.3031 - precision: 0.9021 - recall: 0.8737 - val_accuracy: 0.9102 - val_loss: 0.2181 - val_precision: 0.9222 - val_recall: 0.8985 - learning_rate: 5.0000e-04
Epoch 35/100
...
152/152 ━━━━━━━━━━━━━━━━━━━━ 59s 385ms/step - accuracy: 0.8875 - loss: 0.2891 - precision: 0.8984 - recall: 0.8751 - val_accuracy: 0.9218 - val_loss: 0.2158 - val_precision: 0.9286 - val_recall: 0.9102 - learning_rate: 5.0000e-04
Epoch 44/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 381ms/step - accuracy: 0.8951 - loss: 0.2902 - precision: 0.9079 - recall: 0.8832 - val_accuracy: 0.9183 - val_loss: 0.2230 - val_precision: 0.9217 - val_recall: 0.9067 - learning_rate: 5.0000e-04
Epoch 45/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 383ms/step - accuracy: 0.8887 - loss: 0.2943 - precision: 0.9029 - recall: 0.8763 - val_accuracy: 0.9230 - val_loss: 0.2091 - val_precision: 0.9291 - val_recall: 0.9172 - learning_rate: 5.0000e-04
Epoch 46/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 380ms/step - accuracy: 0.8934 - loss: 0.2897 - precision: 0.9063 - recall: 0.8825 - val_accuracy: 0.9067 - val_loss: 0.2321 - val_precision: 0.9190 - val_recall: 0.8996 - learning_rate: 5.0000e-04
Epoch 47/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 59s 389ms/step - accuracy: 0.8915 - loss: 0.2781 - precision: 0.8993 - recall: 0.8791 - val_accuracy: 0.9090 - val_loss: 0.2387 - val_precision: 0.9180 - val_recall: 0.9008 - learning_rate: 5.0000e-04
Epoch 48/100
...
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 383ms/step - accuracy: 0.8950 - loss: 0.2880 - precision: 0.9064 - recall: 0.8853 - val_accuracy: 0.9347 - val_loss: 0.1917 - val_precision: 0.9433 - val_recall: 0.9312 - learning_rate: 5.0000e-04
Epoch 54/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 58s 380ms/step - accuracy: 0.8968 - loss: 0.2769 - precision: 0.9067 - recall: 0.8865 - val_accuracy: 0.8985 - val_loss: 0.2384 - val_precision: 0.9147 - val_recall: 0.8880 - learning_rate: 5.0000e-04
Epoch 55/100
152/152 ━━━━━━━━━━━━━━━━━━━━ 60s 394ms/step - accuracy: 0.8936 - loss: 0.2772 - precision: 0.9050 - recall: 0.8798 - val_accuracy: 0.9102 - val_loss: 0.2166 - val_precision: 0.9215 - val_recall: 0.9043 - learning_rate: 5.0000e-04
Epoch 56/100
...
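The 152/152 counter in each log line is the number of batches per epoch. The iterator returned by `train_datagen.flow` keeps the final partial batch, so that count is ceil(N / 32) for N training images, which pins N to a narrow range. The snippet below is only that arithmetic; `steps_per_epoch` is a hypothetical helper name.

```python
import math

def steps_per_epoch(n_samples, batch_size=32):
    """Batches per epoch when the last partial batch is kept."""
    return math.ceil(n_samples / batch_size)

# Training-set sizes consistent with the "152/152" counter in the log
sizes = [n for n in range(1, 10_000) if steps_per_epoch(n) == 152]
print(sizes[0], sizes[-1])   # 4833 4864
```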

In [None]:
MODEL_NAME = 'inceptionv3'

In [None]:
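history_path = f'{OUTPUT_PATH}/histories/{MODEL_NAME}_history.npy'
# history.history is a dict, so NumPy stores it as a pickled object array;
# reload it with np.load(history_path, allow_pickle=True).item()
np.save(history_path, history.history)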
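Loading the `.npy` history requires `allow_pickle=True`. As an alternative sketch, the same dict can be written as JSON, which any tool can read, and the epoch that `ModelCheckpoint(monitor='val_accuracy', save_best_only=True)` would have kept can be recovered from it (the first epoch reaching the maximum). `save_history_json`, `load_history_json`, `best_epoch`, and the `.json` file name are hypothetical.

```python
import json

def save_history_json(history_dict, path):
    """Write a Keras History.history dict to JSON; float() also converts NumPy scalars."""
    serializable = {key: [float(v) for v in values] for key, values in history_dict.items()}
    with open(path, 'w') as f:
        json.dump(serializable, f, indent=2)

def load_history_json(path):
    with open(path) as f:
        return json.load(f)

def best_epoch(history_dict, metric='val_accuracy'):
    """1-based epoch with the highest value of `metric` (the first one wins on ties)."""
    values = history_dict[metric]
    idx = max(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Usage with the History object returned by model.fit above:
# save_history_json(history.history, f'{OUTPUT_PATH}/histories/{MODEL_NAME}_history.json')
# print(best_epoch(history.history))
```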

In [None]:
final_model_path = f'{OUTPUT_PATH}/models/{MODEL_NAME}_final.h5'
model.save(final_model_path)



In [None]:
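# NOTE: this loads the final-epoch model saved above; the model with the best val_accuracy
# is stored separately by ModelCheckpoint at f'{OUTPUT_PATH}/models/best_model.h5'
best_model = keras.models.load_model(f'{OUTPUT_PATH}/models/{MODEL_NAME}_final.h5')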



## Test Time Augmentation (TTA)

TTA is applied at inference time: each test image is predicted once as-is and once for each of 10 randomly augmented copies (rotations, shifts, and flips), and the 11 softmax outputs are averaged. In this run, TTA raised test accuracy from 91.30% to 93.67%.
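To make the averaging concrete, the sketch below computes accuracy from a stack of prediction sets with NumPy, once for the unaugmented view alone and once for the mean over all views. The toy softmax values and the helper name `accuracy_from_views` are made up for illustration; averaging probabilities rather than voting on hard labels matches the `np.mean(predictions, axis=0)` step in `predict_with_tta`.

```python
import numpy as np

def accuracy_from_views(view_preds, y_cat):
    """view_preds: (n_views, n_samples, n_classes) softmax outputs; y_cat: one-hot labels."""
    mean_probs = np.mean(view_preds, axis=0)   # average over views, per image and class
    y_true = np.argmax(y_cat, axis=1)
    return float(np.mean(np.argmax(mean_probs, axis=1) == y_true))

# Toy example: 2 images, 3 views (original + 2 augmented copies), 4 classes
view_preds = np.array([
    [[0.40, 0.45, 0.05, 0.10], [0.10, 0.10, 0.70, 0.10]],   # view 0: unaugmented
    [[0.60, 0.30, 0.05, 0.05], [0.10, 0.20, 0.60, 0.10]],   # view 1: augmented
    [[0.55, 0.35, 0.05, 0.05], [0.05, 0.15, 0.70, 0.10]],   # view 2: augmented
])
y_cat = np.eye(4)[[0, 2]]   # true classes: 0 and 2

print(accuracy_from_views(view_preds[:1], y_cat))   # 0.5 (original view only)
print(accuracy_from_views(view_preds, y_cat))       # 1.0 (mean over all views)
```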

In [None]:
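def predict_with_tta(model, X, n_augmentations=10):
    """Average softmax outputs over the original images plus n_augmentations augmented copies."""
    # Prediction on the unaugmented images
    predictions = [model.predict(X, verbose=0)]
    # Milder transforms than in training (smaller rotation and shifts, no shear or zoom)
    tta_gen = ImageDataGenerator(
        rotation_range=15,
        width_shift_range=0.1,
        height_shift_range=0.1,
        horizontal_flip=True,
        vertical_flip=True
    )
    for _ in range(n_augmentations):
        # One batch holding a randomly augmented copy of every image, in the original order
        aug_iterator = tta_gen.flow(X, batch_size=len(X), shuffle=False)
        X_aug = next(iter(aug_iterator))
        predictions.append(model.predict(X_aug, verbose=0))
    # Mean over the (1 + n_augmentations) prediction sets
    return np.mean(predictions, axis=0)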

test_preds_tta = predict_with_tta(best_model, X_test, n_augmentations=10)
test_acc_tta = np.mean(np.argmax(test_preds_tta, axis=1) == np.argmax(y_test_cat, axis=1))

print(f"\nTTA completed!")

# EVALUATION

print("\n")
print("EVALUATION")

# Validation
val_results = best_model.evaluate(X_val, y_val_cat, verbose=0)
print(f"\nValidation Results (Best Model):")
print(f"Loss: {val_results[0]:.4f}")
print(f"Accuracy: {val_results[1]*100:.2f}%")
print(f"Precision: {val_results[2]:.4f}")
print(f"Recall: {val_results[3]:.4f}")

# Test (standard)
test_results = best_model.evaluate(X_test, y_test_cat, verbose=0)
print(f"\nTest Results (Standard):")
print(f"Loss: {test_results[0]:.4f}")
print(f"Accuracy: {test_results[1]*100:.2f}%")
print(f"Precision: {test_results[2]:.4f}")
print(f"Recall: {test_results[3]:.4f}")

# Test (with TTA)
print(f"\nTest Results (With TTA):")
print(f"Accuracy: {test_acc_tta*100:.2f}%")

print("\nSUMMARY:")
print("Baseline Test Acc: 93.82%")  # hard-coded reference value, not computed in this notebook
print(f"Test Acc (Standard): {test_results[1]*100:.2f}%")
print(f"Test Acc (TTA): {test_acc_tta*100:.2f}%")


TTA completed!


EVALUATION

Validation Results (Best Model):
Loss: 0.1966
Accuracy: 92.88%
Precision: 0.9339
Recall: 0.9230

Test Results (Standard):
Loss: 0.2196
Accuracy: 91.30%
Precision: 0.9194
Recall: 0.9047

Test Results (With TTA):
Accuracy: 93.67%

SUMMARY:
Baseline Test Acc: 93.82%
Test Acc (Standard): 91.30%
Test Acc (TTA): 93.67%
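The Precision and Recall figures above come from `tf.keras.metrics.Precision` and `Recall` with their default 0.5 threshold, which treat each of the four softmax outputs as a separate binary decision and pool the counts, so they do not show how each tumor class fares. A per-class view can be sketched with a NumPy confusion matrix; `confusion_matrix` and `per_class_recall` are hypothetical helpers, and the class order is whatever label index order the preprocessing used.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered add counts repeated (row, col) pairs
    return cm

def per_class_recall(cm):
    """Fraction of each true class predicted correctly (diagonal / row sum)."""
    return np.diag(cm) / np.maximum(cm.sum(axis=1), 1)

# Usage with the TTA predictions from this notebook:
# cm = confusion_matrix(np.argmax(y_test_cat, axis=1), np.argmax(test_preds_tta, axis=1), config['num_classes'])
# print(cm)
# print(per_class_recall(cm))
```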
