# **Project Name**    - Brain Tumor Classification Using Deep Learning

##### **Project Type**    - Classification
##### **Contribution**    - Individual
##### **Team Member 1 -** Hari Khamala S


### 🔍 **Problem Statement**
This project aims to develop a deep learning-based solution for classifying brain MRI images into multiple categories according to tumor type. It involves building a custom CNN model from scratch and enhancing performance through transfer learning using pretrained models. The project also includes deploying a user-friendly Streamlit web application to enable real-time tumor type predictions from uploaded MRI images.

### 💡 **Real-Time Business Use Cases**
- **AI-Assisted Medical Diagnosis**: Provide radiologists with AI-powered tools to quickly classify brain tumors based on MRI images.
- **Early Detection and Patient Triage**: Automatically flag high-risk MRI images for immediate specialist review.
- **Research and Clinical Trials**: Use AI tools to segment patient datasets by tumor type.
- **Second-Opinion AI Systems**: Deploy classification tools in remote or under-resourced healthcare regions.

### 🔁 **Project Workflow**
1. **Understand the Dataset**
2. **Data Preprocessing**
3. **Data Augmentation**
4. **Model Building (Custom CNN & Transfer Learning)**
5. **Model Training**
6. **Model Evaluation**
7. **Model Comparison**
8. **Streamlit Application Deployment**

### 📁 **Dataset**
**Source**: [Brain Tumor MRI Multi-Class Dataset](https://drive.google.com/drive/folders/1C9ww4JnZ2sh22I-hbt45OR16o4ljGxju)

In [None]:
# 📥 Load and Explore the Dataset
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
from PIL import Image
from sklearn.model_selection import train_test_split

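data_dir = 'path_to_downloaded_dataset'
# Keep only subfolders, sorted so the order matches Keras' alphabetical class indices
categories = sorted(d for d in os.listdir(data_dir)
                    if os.path.isdir(os.path.join(data_dir, d)))
print("Tumor Categories:", categories)

# Count images per class
for category in categories:
    print(category, ":", len(os.listdir(os.path.join(data_dir, category))))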

### 📖 What This Code Does: Load and Explore the Dataset

This cell performs the initial exploration of the brain tumor dataset. Here's a breakdown of what it does:

- **Imports essential libraries** for data processing, visualization, and image handling.
- **Specifies the dataset directory** using the `data_dir` variable. This should point to the folder that contains subfolders for each tumor category.
- **Lists the tumor categories** (e.g., glioma_tumor, meningioma_tumor, etc.) by reading subfolder names under the dataset directory.
- **Counts the number of images** in each tumor class to help check for class balance or imbalance in the dataset.

📌 This step is important to:
- Verify dataset structure.
- Understand category-wise distribution.
- Confirm data availability before preprocessing and modeling.

🧠 Expected Output Example:

Tumor Categories: ['glioma_tumor', 'meningioma_tumor', 'no_tumor', 'pituitary_tumor']

glioma_tumor : 800

meningioma_tumor : 500

no_tumor : 300

pituitary_tumor : 600


Note: This code is not executed here because of the dataset size, so the output above is an illustrative example rather than a recorded result.
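The class-balance check described above can be wrapped in a small reusable helper. The sketch below is one possible way to do it, not part of the original notebook: the function names and the list of image extensions are assumptions, and it skips stray non-image files that a plain `os.listdir` count would include.

```python
from pathlib import Path

# Assumed set of image file types; adjust to match the dataset
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(data_dir):
    """Return {class_name: image_count} for each subfolder of data_dir."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files next to the class folders
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())
```

With the illustrative counts shown above (800, 500, 300, 600), `imbalance_ratio` returns about 2.67, which would suggest the smallest class deserves attention during evaluation.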


In [None]:
# 🔄 Data Preprocessing and Augmentation
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = 224
BATCH_SIZE = 32

# Training pipeline: rescaling plus augmentation
train_datagen = ImageDataGenerator(rescale=1./255,
                                   validation_split=0.2,
                                   rotation_range=15,
                                   zoom_range=0.1,
                                   horizontal_flip=True)

# Validation pipeline: rescaling only, so metrics reflect unaugmented images.
# Using the same validation_split keeps the two subsets disjoint.
val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

train_generator = train_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='training')

val_generator = val_datagen.flow_from_directory(
    data_dir,
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    subset='validation')

### 🧼 What This Code Does: Data Preprocessing and Augmentation

This block performs data preprocessing and augmentation to prepare the brain MRI images for training and validation. Here's what each part does:

- **`ImageDataGenerator`** is used to:
  - **Normalize images** (scale pixel values from [0–255] to [0–1]).
  - Apply **data augmentation** techniques like:
    - Rotation (±15°)
    - Zooming (up to 10%)
    - Horizontal flipping

- The dataset is **split into training and validation sets** using `validation_split=0.2`.

- **`flow_from_directory()`** loads the image files directly from the directory and:
  - Resizes them to 224×224 (the default input size of MobileNetV2 and many other CNN architectures).
  - Generates batches of image-label pairs.
  - Uses **categorical labels** (one-hot vectors) since this is a multi-class classification problem.

📌 This step ensures:
- Improved model generalization, because augmentation shows the model varied versions of each training image.
- An 80/20 training-validation split drawn from the same directory.

🧠 Expected Output Example:

Found 1760 images belonging to 4 classes.

Found 440 images belonging to 4 classes.

Note: The actual numbers depend on your dataset size and the `validation_split` used.
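The example numbers above can be reproduced by hand. The sketch below assumes the illustrative class counts from the exploration step and assumes that the validation subset takes `int(n * validation_split)` images from each class folder, with the rest going to training; the helper name `split_counts` is hypothetical.

```python
import math


def split_counts(class_counts, validation_split):
    """Per-class (train, val) sizes under the assumed per-class split rule."""
    val = {name: int(n * validation_split) for name, n in class_counts.items()}
    train = {name: n - val[name] for name, n in class_counts.items()}
    return train, val


# Illustrative counts, not measured from the real dataset
class_counts = {
    "glioma_tumor": 800,
    "meningioma_tumor": 500,
    "no_tumor": 300,
    "pituitary_tumor": 600,
}

train, val = split_counts(class_counts, validation_split=0.2)
n_train, n_val = sum(train.values()), sum(val.values())

# Number of batches per epoch when the last partial batch is kept
steps_per_epoch = math.ceil(n_train / 32)

print(n_train, n_val, steps_per_epoch)  # 1760 440 55
```

The 55 steps per epoch match the `55/55` progress counter that a training log would show for a batch size of 32.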


In [None]:
# 🧠 Build Custom CNN Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization

model_cnn = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    MaxPooling2D(2, 2),
    BatchNormalization(),

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    BatchNormalization(),

    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(len(categories), activation='softmax')
])

model_cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model_cnn.summary()

### 🧠 What This Code Does: Build Custom CNN Model

This block defines a **custom Convolutional Neural Network (CNN)** architecture for classifying MRI brain images into 4 tumor categories.

#### 🔧 Architecture Overview:
- **Conv2D + MaxPooling + BatchNormalization (x2)**: Feature extraction layers to detect tumor patterns.
- **Flatten**: Converts 2D feature maps into a 1D vector for the dense layer.
- **Dense(128)**: Fully connected layer to learn complex patterns.
- **Dropout(0.3)**: Regularization to prevent overfitting.
- **Dense(output)**: Final softmax layer for 4-class classification.

#### ⚙️ Compilation:
- **Optimizer**: `adam` — adaptive learning rate.
- **Loss**: `categorical_crossentropy` — used for multi-class classification.
- **Metrics**: `accuracy` — to track training performance.

📌 This step builds the base deep learning model using only raw image features — no transfer learning is used here.

🧠 Expected Output (summary portion):

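Model: "sequential"

| Layer (type) | Output Shape | Param # |
|---|---|---|
| conv2d (Conv2D) | (None, 222, 222, 32) | 896 |
| max_pooling2d (MaxPooling2D) | (None, 111, 111, 32) | 0 |
| batch_normalization (BatchNormalization) | (None, 111, 111, 32) | 128 |
| ... | ... | ... |
| dense (Dense) | (None, 128) | 23,888,000 |
| dropout (Dropout) | (None, 128) | 0 |
| dense_1 (Dense) | (None, 4) | 516 |

Total params: ~23.9 million

Trainable params: ~23.9 million (almost all of them in the first dense layer, which receives the flattened 54×54×64 feature map)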


🧪 This model will now be trained on the preprocessed dataset to learn tumor classification from scratch.
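The shapes and parameter counts reported by `model.summary()` for this architecture can be checked with plain arithmetic. The sketch below assumes the layer settings from the code cell above (3×3 kernels with default `'valid'` padding, 2×2 pooling, 4 output classes); the helper functions are illustrative, not Keras APIs.

```python
def conv2d_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def conv2d_params(in_ch, out_ch, kernel=3):
    """Weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features


def batchnorm_params(channels):
    """gamma and beta (trainable) plus moving mean and variance."""
    return 4 * channels


size = 224
size = pool_out(conv2d_out(size))  # 224 -> 222 -> 111
size = pool_out(conv2d_out(size))  # 111 -> 109 -> 54
flat = size * size * 64            # features entering Dense(128)

layer_params = {
    "conv2d": conv2d_params(3, 32),
    "batch_normalization": batchnorm_params(32),
    "conv2d_1": conv2d_params(32, 64),
    "batch_normalization_1": batchnorm_params(64),
    "dense": dense_params(flat, 128),
    "dense_1": dense_params(128, 4),
}
total = sum(layer_params.values())

print(flat, layer_params["dense"], total)  # 186624 23888000 23908292
```

Nearly all parameters sit in the first dense layer, which is a direct consequence of flattening a large feature map.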


In [None]:
# 🧠 Transfer Learning with MobileNetV2
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D

base_model = MobileNetV2(include_top=False, weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))
base_model.trainable = False

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
predictions = Dense(len(categories), activation='softmax')(x)

model_transfer = Model(inputs=base_model.input, outputs=predictions)
model_transfer.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model_transfer.summary()

### 🔁 What This Code Does: Transfer Learning with MobileNetV2

This block implements **transfer learning** using the pre-trained **MobileNetV2** model to classify brain tumor images. Transfer learning allows leveraging learned features from large-scale datasets (like ImageNet) to improve performance and reduce training time on smaller medical datasets.

#### 🔧 Steps Explained:
- **Load MobileNetV2** with:
  - `include_top=False` → removes default classifier layer
  - `weights='imagenet'` → uses pretrained weights from ImageNet
  - `input_shape=(224, 224, 3)` → same as our image size
- **Freeze `base_model`** → we do not train its weights initially (`base_model.trainable = False`).

#### 🧠 Added Classification Layers:
- `GlobalAveragePooling2D()` — reduces features globally instead of flattening.
- `Dense(128, activation='relu')` — learn task-specific features.
- `Dense(4, activation='softmax')` — final classification into 4 tumor types.

#### ⚙️ Compilation:
- Same as custom CNN: `adam` optimizer, `categorical_crossentropy` loss, and accuracy metric.

📌 Transfer learning is especially helpful here because:
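- Labeled medical image datasets are usually small, because expert annotation is expensive.
- Features pretrained on ImageNet often transfer well, so only a small classification head has to be learned from the limited data.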

🧠 Expected Output (summary portion):

Model: "model"

| Layer (type) | Output Shape | Param # |
|---|---|---|
| MobileNetV2 backbone (listed layer by layer in the actual summary) | (None, 7, 7, 1280) | 2,257,984 |
| global_average_pooling2d (GlobalAveragePooling2D) | (None, 1280) | 0 |
| dense (Dense) | (None, 128) | 163,968 |
| dense_1 (Dense) | (None, 4) | 516 |

Total params: ~2.4 million

Trainable params: only the classification head (~164K)


🧪 This model will be trained on our dataset to evaluate how well pretrained features work vs custom CNN.
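To see why `GlobalAveragePooling2D` is used instead of `Flatten` in the classification head, it helps to compare the number of trainable parameters each choice would produce. The sketch below assumes the 7×7×1280 backbone output shown in the summary and 4 output classes; `dense_params` is an illustrative helper, not a Keras function.

```python
def dense_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features


h, w, c = 7, 7, 1280          # MobileNetV2 output for 224x224 inputs

flatten_features = h * w * c  # every spatial position kept: 62720
gap_features = c              # one average per channel: 1280

head_with_flatten = dense_params(flatten_features, 128) + dense_params(128, 4)
head_with_gap = dense_params(gap_features, 128) + dense_params(128, 4)

print(head_with_flatten, head_with_gap)  # 8028804 164484
```

Under these assumptions the pooled head is roughly 49 times smaller, which matters when the training set is limited.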


In [None]:
# 🎯 Train the Models
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

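# Both callbacks monitor validation loss, which is the Keras default
checkpoint_cb = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)
earlystop_cb = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)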

# Train Custom CNN
history_cnn = model_cnn.fit(train_generator, validation_data=val_generator,
                            epochs=20, callbacks=[checkpoint_cb, earlystop_cb])

### 🎯 What This Code Does: Train the Custom CNN Model

This cell handles the **training process** for the custom CNN model and includes callbacks to monitor model performance and prevent overfitting.

#### 📌 Key Components:
- **`ModelCheckpoint`**:
  - Saves the model (`best_model.h5`) only when validation loss improves (`val_loss` is the default monitored metric).
  - Ensures the best-performing model is preserved.
  
- **`EarlyStopping`**:
  - Stops training if validation loss doesn't improve for 5 consecutive epochs.
  - Restores the weights from the best epoch, so the final model is not a later, overfitted one.

#### 🧠 Model Training:
- The custom CNN model (`model_cnn`) is trained for up to 20 epochs.
- Uses the preprocessed and augmented image data (`train_generator` and `val_generator`).
- `history_cnn` stores the training history for later visualization and analysis.

#### ⏱️ Expected Output:
Epoch 1/20

55/55 [==============================] - 20s 350ms/step - loss: 1.0123 - accuracy: 0.6247 - val_loss: 0.7524 - val_accuracy: 0.7318

...

Epoch 6/20

55/55 [==============================] - 18s - loss: 0.4502 - accuracy: 0.8405 - val_loss: 0.5301 - val_accuracy: 0.8239

Restoring model weights from the end of the best epoch.


📌 Only the best model (based on validation loss, the default monitored metric) is saved as `best_model.h5` for future use or deployment.
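The patience rule behind `EarlyStopping` can be illustrated without running any training. The sketch below is a simplified re-implementation for explanation only (it ignores options such as `min_delta`), and the validation losses are made-up values, not results from this project.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Return (stopped_epoch, best_epoch), both 1-based.

    Training stops once the monitored loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses: the best value occurs at epoch 3
losses = [0.75, 0.60, 0.53, 0.55, 0.54, 0.56, 0.58, 0.57, 0.50]
print(simulate_early_stopping(losses, patience=5))  # (8, 3)
```

Training stops at epoch 8, before the lower loss at epoch 9 is ever seen, and with `restore_best_weights=True` the weights from epoch 3 would be the ones kept.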


In [None]:
# 📊 Evaluate the Models
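from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rebuild the validation split without shuffling or augmentation so that
# predictions come back in the same order as `eval_generator.classes`.
eval_generator = ImageDataGenerator(rescale=1./255, validation_split=0.2).flow_from_directory(
    data_dir, target_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH_SIZE,
    class_mode='categorical', subset='validation', shuffle=False)

Y_pred = model_cnn.predict(eval_generator)
y_pred = np.argmax(Y_pred, axis=1)
class_names = list(eval_generator.class_indices)  # names ordered by class index

print('Classification Report')
print(classification_report(eval_generator.classes, y_pred, target_names=class_names))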

### 📊 What This Code Does: Evaluate the Custom CNN Model

This block evaluates the performance of the trained **custom CNN model** on the validation set using classic classification metrics.

#### 🧪 Steps Performed:
- **`model_cnn.predict(...)`**:
  - Generates predicted probabilities for each class, for every image in the validation set.
- **`np.argmax()`**:
  - Converts softmax outputs to class indices by taking the highest probability per sample.
- **`classification_report()`**:
  - Displays Precision, Recall, F1-Score, and Support for each tumor class.
- **`.classes`** (attribute of the validation generator):
  - Contains the true class labels in directory order. Predictions line up with these labels only when the generator is created with `shuffle=False`; with the default `shuffle=True`, the report would compare mismatched pairs.

#### 📈 Output:
This gives a detailed per-class performance summary:

Classification Report

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| glioma_tumor | 0.81 | 0.85 | 0.83 | 112 |
| meningioma_tumor | 0.78 | 0.75 | 0.76 | 108 |
| no_tumor | 0.92 | 0.91 | 0.92 | 100 |
| pituitary_tumor | 0.88 | 0.86 | 0.87 | 120 |
| accuracy | | | 0.85 | 440 |
| macro avg | 0.85 | 0.84 | 0.84 | 440 |
| weighted avg | 0.85 | 0.85 | 0.85 | 440 |



#### 📌 Importance:
- This helps analyze how well the model performs for each tumor type.
- Helps identify any **class imbalance** issues or misclassification patterns.
- Essential for clinical AI validation, especially in **multi-class medical diagnosis**.
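The per-class metrics in a classification report are derived from a confusion matrix. The sketch below shows that derivation in plain Python on a tiny made-up example with 3 classes; the probabilities, labels, and function names are all hypothetical and only meant to make the argmax and precision/recall steps concrete.

```python
def build_confusion_matrix(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def precision_recall(matrix):
    """Return a list of (precision, recall) pairs, one per class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


# Made-up softmax outputs for five samples and three classes
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
y_true = [0, 1, 0, 2, 0]

# argmax: index of the highest probability in each row
y_pred = [max(range(len(p)), key=p.__getitem__) for p in probs]

cm = build_confusion_matrix(y_true, y_pred, n_classes=3)
print(y_pred)  # [0, 1, 1, 2, 0]
print(cm)      # [[2, 1, 0], [0, 1, 0], [0, 0, 1]]
```

In this toy case class 0 has precision 1.0 but recall 2/3, because one of its three samples was predicted as class 1. That is the kind of misclassification pattern the report is meant to expose.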


In [None]:
# 📈 Compare Model Performances
plt.plot(history_cnn.history['accuracy'], label='Custom CNN Train Acc')
plt.plot(history_cnn.history['val_accuracy'], label='Custom CNN Val Acc')
plt.title('Model Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

### 📈 What This Code Does: Compare Model Performances (Custom CNN)

This block visualizes the **training vs validation accuracy** of the custom CNN model over the training epochs.

#### 📊 Plot Details:
- Plots:
  - **`history_cnn.history['accuracy']`** → training accuracy over epochs
  - **`history_cnn.history['val_accuracy']`** → validation accuracy over epochs
- **Labels and Title** added for clarity.
- **`plt.legend()`** shows which line belongs to training or validation.

#### 📌 Insights from the Plot:
- Helps identify:
  - **Underfitting** (low train and val accuracy)
  - **Overfitting** (train acc much higher than val acc)
  - **Good generalization** (train and val curves converge)
- Useful to decide if more regularization or training is needed.

#### 🧠 Expected Output:
A line graph showing two curves:
- Training accuracy gradually increasing.
- Validation accuracy stabilizing or tracking closely.
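The same curves can also be summarized numerically. The sketch below assumes a dictionary shaped like `history_cnn.history`; the function name and the sample values are hypothetical and are not results from this project.

```python
def summarize_history(history):
    """Report the best validation epoch and the final train/validation gap."""
    train, val = history["accuracy"], history["val_accuracy"]
    best_index = max(range(len(val)), key=val.__getitem__)
    return {
        "best_epoch": best_index + 1,             # 1-based epoch number
        "best_val_accuracy": val[best_index],
        "final_gap": train[-1] - val[-1],         # large positive gap hints at overfitting
    }


# Hypothetical accuracy curves over four epochs
example_history = {
    "accuracy": [0.62, 0.75, 0.84, 0.91],
    "val_accuracy": [0.73, 0.80, 0.82, 0.81],
}
summary = summarize_history(example_history)
print(summary["best_epoch"], f"{summary['final_gap']:.2f}")  # 3 0.10
```

A gap that keeps widening while validation accuracy stalls, as in this made-up example, is the overfitting pattern described above.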


In [None]:
# 🚀 Streamlit Deployment Code (in separate script)
# Save model as .h5 and load it in app.py
# Use `st.file_uploader`, `PIL.Image.open` (or `cv2.imdecode`, since `cv2.imread` needs a file path), and `model.predict` to display predictions
# Example file layout:
# ├── app.py
# ├── models/
# │   └── best_model.h5
# └── utils/
#     └── preprocess.py