# **Project Name**    - Brain Tumor MRI Image Classification



##### **Project Type**    - Classification
##### **Contribution**    - Individual
##### **Contributor -** UNNIMAYA K


# **Project Summary -**



---

## 🧠 Brain Tumor MRI Image Classification – Project Summary

### 📌 Overview

Brain tumors are life-threatening conditions that require early diagnosis for effective treatment. Radiologists typically rely on MRI (Magnetic Resonance Imaging) scans for tumor detection and classification. However, manual analysis can be time-consuming, prone to human error, and resource-intensive—especially in remote areas or during a healthcare surge. This project presents a deep learning-based solution to **automate the classification of brain MRI images** into distinct tumor types using both a **custom-built CNN** and **transfer learning** approaches. The final solution is deployed as a **Streamlit web application** to allow seamless real-time predictions for medical professionals and researchers.

---

### 🎯 Objective

The primary aim is to build a robust, accurate, and scalable machine learning pipeline that can:

* Classify brain MRI images into predefined categories: **Glioma Tumor**, **Meningioma Tumor**, **Pituitary Tumor**, and **No Tumor**.
* Leverage both **custom convolutional neural networks** and **pretrained transfer learning models**.
* Provide an interactive and accessible **Streamlit interface** for end-users to upload images and receive instant classification results.

---

### 🧾 Dataset Description

The dataset used is the **Brain Tumor MRI Multi-Class Dataset**, which contains thousands of labeled MRI images categorized into four classes. The images differ in quality, brightness, orientation, and resolution, which introduces real-world variability. Thus, preprocessing and augmentation become essential steps in the pipeline.

---

### 🛠️ Project Workflow

#### 1. **Data Understanding**

* Explored sample images from each class.
* Analyzed the class distribution to detect any **class imbalance**.
* Identified inconsistencies in image resolution and structure.

#### 2. **Data Preprocessing & Augmentation**

* Resized all images to a standard input shape (**224×224 pixels** for the custom CNN, **128×128 pixels** for the MobileNetV2 model).
* Normalized pixel values to the 0–1 range.
* Applied **data augmentation** (rotation, flip, zoom, brightness shift) to enhance generalization and combat overfitting.
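
As a rough illustration of two of the steps above, the NumPy sketch below scales an 8-bit image to the 0-1 range and applies a horizontal flip. The function names `normalize_image` and `horizontal_flip` are illustrative and not part of the notebook, which relies on Keras' `ImageDataGenerator` for these operations.

```python
import numpy as np

def normalize_image(img_uint8):
    """Scale 8-bit pixel values (0-255) to floats in the 0-1 range."""
    return img_uint8.astype(np.float32) / 255.0

def horizontal_flip(img):
    """Mirror an image of shape (height, width, channels) left to right."""
    return img[:, ::-1, :]

# Tiny 1x2 RGB "image": a black pixel next to a white pixel.
sample = np.array([[[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
normalized = normalize_image(sample)
flipped = horizontal_flip(normalized)
```

After the flip, the white pixel sits on the left and the black pixel on the right, while the array shape is unchanged.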

#### 3. **Model Building**

* **Custom CNN**: Designed from scratch with convolutional layers, pooling, batch normalization, and dropout layers to minimize overfitting.
* **Transfer Learning**: Used pretrained models like **ResNet50**, **EfficientNetB0**, and **MobileNetV2**, replacing top layers to adapt them to the four tumor classes.
* Used `EarlyStopping`, `ModelCheckpoint`, and validation monitoring during training.

#### 4. **Model Evaluation**

* Evaluated model performance on unseen test data using:

  * **Accuracy**
  * **Precision**, **Recall**, and **F1-score**
  * **Confusion Matrix**
* Visualized training history (accuracy/loss curves) to detect overfitting or underfitting.
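
The metrics listed above can all be derived from a confusion matrix. The sketch below shows one way to compute them with NumPy. The helper names and the label arrays are hypothetical and do not represent results from this project.

```python
import numpy as np

def confusion_matrix_counts(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

def per_class_scores(matrix):
    """Return per-class (precision, recall, f1) arrays from a confusion matrix."""
    tp = np.diag(matrix).astype(float)
    predicted = matrix.sum(axis=0).astype(float)  # column sums: TP + FP
    actual = matrix.sum(axis=1).astype(float)     # row sums: TP + FN
    # Guard against division by zero for classes never predicted or absent.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Hypothetical labels for a 4-class problem (not results from this project).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]
cm = confusion_matrix_counts(y_true, y_pred, num_classes=4)
precision, recall, f1 = per_class_scores(cm)
accuracy = np.trace(cm) / cm.sum()
```

In the notebook, `y_true` could plausibly come from `test_generator.classes` and `y_pred` from `np.argmax(model.predict(test_generator), axis=1)`, since the test generator is created with `shuffle=False`.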

#### 5. **Model Comparison**

* The pretrained models outperformed the custom CNN in terms of accuracy and training speed.
* **EfficientNetB0** was selected as the final model due to its **high accuracy, smaller size**, and **fast inference speed**.

#### 6. **Deployment with Streamlit**

* Built a responsive **Streamlit web app** where users can upload MRI images.
* Displays the predicted tumor class and model confidence.
* The UI is simple, intuitive, and accessible to non-technical users (e.g., medical practitioners).

---

### 📈 Key Insights

* Transfer learning significantly boosts model performance, especially with limited labeled data.
* Data augmentation is essential for preventing overfitting and ensuring model generalization.
* Early detection of brain tumors via automated classification can support radiologists, especially in resource-constrained settings.

---

### 📦 Deliverables

* Trained models saved as `.keras` and `.h5` files.
* Jupyter notebooks/scripts for preprocessing, training, evaluation, and deployment.
* Streamlit app script (`app.py`) for web-based prediction.
* Documentation including README, performance plots, and insights.

---

### 💡 Business Impact

* **Faster diagnosis**: Reduces diagnostic delay and speeds up clinical decisions.
* **Scalability**: Easily deployable in hospitals, research labs, and rural health centers.
* **Cost-effective**: Automates repetitive image classification tasks to reduce labor costs.
* **Supports telemedicine**: Helps doctors in remote areas or with limited access to radiologists.

---



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Brain tumors are among the most life-threatening and challenging medical conditions to diagnose and treat. The traditional diagnosis process, which relies heavily on radiologists interpreting MRI (Magnetic Resonance Imaging) scans, can be time-consuming, prone to human error, and limited by the availability of medical experts—especially in under-resourced regions. Early and accurate classification of brain tumors is critical for timely intervention and improved patient outcomes.

This project aims to develop an AI-powered image classification system using deep learning techniques to automatically classify brain MRI scans into multiple tumor types, including Glioma, Meningioma, Pituitary Tumor, and No Tumor. The solution should be accurate, scalable, and user-friendly, supporting radiologists and medical professionals in making faster and more informed decisions.

The system will be trained using both custom CNN architectures and pretrained transfer learning models, evaluated for performance, and finally deployed using Streamlit as an interactive web application. By automating the tumor classification process, this project seeks to reduce the diagnostic burden on medical professionals and enhance healthcare delivery, especially in regions lacking specialized medical expertise.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with the updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
# 📦 Basic Libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 🖼 Image Handling
import cv2
from PIL import Image

# 🔁 Data Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# 🧠 Deep Learning Libraries (TensorFlow & Keras)
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.applications import ResNet50, MobileNetV2, InceptionV3, EfficientNetB0
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobilenet_preprocess
from tensorflow.keras.applications.inception_v3 import preprocess_input as inception_preprocess
from tensorflow.keras.applications.efficientnet import preprocess_input as efficientnet_preprocess
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Input

# ✅ Suppress warnings and logs for clean output
import warnings
warnings.filterwarnings('ignore')
tf.get_logger().setLevel('ERROR')


### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')


In [None]:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

valid_datagen = ImageDataGenerator(rescale=1./255)

In [None]:
import os

dataset_path = "/content/drive/MyDrive/Tumour"
print("Dataset splits:", os.listdir(dataset_path))  # top level holds the train/valid/test folders


In [None]:
# Define paths
base_path = "/content/drive/MyDrive/Tumour"
train_path = os.path.join(base_path, "train")
valid_path = os.path.join(base_path, "valid")
test_path = os.path.join(base_path, "test")


In [None]:
training_set = train_datagen.flow_from_directory(
    train_path,
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical'
)

In [None]:
validation_set = valid_datagen.flow_from_directory(
    valid_path,
    target_size=(128, 128),
    batch_size=32,
    class_mode='categorical'
)


In [None]:
num_classes = training_set.num_classes  # Dynamically get number of classes

In [None]:

import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import GlobalAveragePooling2D, Dropout, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping


In [None]:
# Load Pretrained Model
base_model = MobileNetV2(input_shape=(128, 128, 3), include_top=False, weights='imagenet')
base_model.trainable = False  # Freeze all pretrained layers

In [None]:

# Define the Model
cnn = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(512, activation='relu'),
    Dropout(0.4),
    Dense(num_classes, activation='softmax')
])

In [None]:

# Compile the Model
cnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
            loss='categorical_crossentropy',
            metrics=['accuracy'])

In [None]:
# Callbacks
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6)
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
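
To build intuition for how the learning rate evolves with `factor=0.5`, `patience=3`, and `min_lr=1e-6`, the sketch below simulates a plateau schedule in plain Python. It is a simplified model rather than the Keras implementation: it ignores cooldown, and the function name and loss values are hypothetical.

```python
def simulate_reduce_lr_on_plateau(val_losses, initial_lr=1e-4, factor=0.5,
                                  patience=3, min_lr=1e-6, min_delta=1e-4):
    """Return the learning rate in effect after each epoch.

    Simplified model: an epoch counts as an improvement only if the loss
    drops below the best value so far by more than min_delta; no cooldown.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Halve the rate (factor=0.5) but never go below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule

# Hypothetical validation losses that stall after the second epoch.
losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]
lrs = simulate_reduce_lr_on_plateau(losses)
```

With these losses, the rate stays at `1e-4` until three consecutive epochs pass without improvement, then drops to `5e-5`. Because `EarlyStopping` uses a patience of 5, the scheduler gets a chance to lower the rate before training is halted.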

In [None]:

training_history = cnn.fit(training_set,
                           validation_data=validation_set,
                           epochs=20,
                           callbacks=[early_stopping, lr_scheduler])

In [None]:

# Evaluate Model Performance
train_loss, train_acc = cnn.evaluate(training_set)
val_loss, val_acc = cnn.evaluate(validation_set)
print(f"Train Loss: {train_loss:.4f}, Train Accuracy: {train_acc:.4f}")
print(f"Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.4f}")

In [None]:

# Save Training History
import json
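with open('/content/drive/MyDrive/tumor_training_hist.json', 'w') as f:
    # Cast to plain floats: entries logged by ReduceLROnPlateau can be NumPy scalars,
    # which the json module cannot serialize.
    json.dump({k: [float(v) for v in vals] for k, vals in training_history.history.items()}, f)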

print(training_history.history.keys())

In [None]:

# Save the Model
cnn.save('/content/drive/MyDrive/trained_braintumor_disease_model.keras')

In [None]:
cnn.save("/content/drive/MyDrive/trained_braintumor_disease_model.h5")

In [None]:

# Plot Accuracy Curves
import matplotlib.pyplot as plt

# Extract number of epochs from history
num_epochs = len(training_history.history['accuracy'])
epochs = list(range(1, num_epochs + 1))

plt.figure(figsize=(8,5))
plt.plot(epochs, training_history.history['accuracy'], color='red', label='Training Accuracy')
plt.plot(epochs, training_history.history['val_accuracy'], color='blue', label='Validation Accuracy')
plt.xlabel('No. of Epochs')
plt.ylabel('Accuracy')
plt.title('Accuracy Visualization')
plt.legend()
plt.show()


In [None]:
import matplotlib.pyplot as plt

history = training_history.history
epochs = list(range(1, len(history['accuracy']) + 1))

# 1. Accuracy Plot
plt.figure(figsize=(8,5))
plt.plot(epochs, history['accuracy'], 'r-', label='Training Accuracy')
plt.plot(epochs, history['val_accuracy'], 'b-', label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('1️⃣ Training vs Validation Accuracy')
plt.legend()
plt.grid(True)
plt.show()

# 2. Loss Plot
plt.figure(figsize=(8,5))
plt.plot(epochs, history['loss'], 'orange', label='Training Loss')
plt.plot(epochs, history['val_loss'], 'green', label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('2️⃣ Training vs Validation Loss')
plt.legend()
plt.grid(True)
plt.show()

# 3. Accuracy & Loss Together
plt.figure(figsize=(10,6))
plt.plot(epochs, history['accuracy'], 'r--', label='Train Accuracy')
plt.plot(epochs, history['val_accuracy'], 'b--', label='Val Accuracy')
plt.plot(epochs, history['loss'], 'r-', label='Train Loss')
plt.plot(epochs, history['val_loss'], 'b-', label='Val Loss')
plt.xlabel('Epochs')
plt.title('3️⃣ Accuracy & Loss Together')
plt.legend()
plt.grid(True)
plt.show()

# 4. Zoomed-in Validation Accuracy
plt.figure(figsize=(8,5))
plt.plot(epochs, history['val_accuracy'], color='purple', marker='o')
plt.xlabel('Epochs')
plt.ylabel('Validation Accuracy')
plt.title('4️⃣ Zoomed-in Validation Accuracy')
plt.ylim(min(history['val_accuracy']) - 0.05, max(history['val_accuracy']) + 0.05)
plt.grid(True)
plt.show()

# 5. Accuracy Gain Between Epochs
accuracy_gain = [history['accuracy'][i] - history['accuracy'][i-1] for i in range(1, len(epochs))]
plt.figure(figsize=(8,5))
plt.bar(range(2, len(epochs)+1), accuracy_gain, color='teal')
plt.xlabel('Epochs')
plt.ylabel('Accuracy Gain')
plt.title('5️⃣ Accuracy Gain Per Epoch (Train Set)')
plt.grid(True)
plt.show()


In [None]:
# Load Model for Testing
cnn = tf.keras.models.load_model('/content/drive/MyDrive/trained_braintumor_disease_model.keras')
cnn.summary()

In [None]:
import tensorflow as tf

try:
    model = tf.keras.models.load_model("/content/drive/MyDrive/trained_braintumor_disease_model.keras")
    print("✅ Model loaded successfully!")
except Exception as e:
    print(f"🚨 Error loading model: {e}")

In [None]:


# Install OpenCV (Only if needed)
try:
    import cv2
except ImportError:
    %pip install opencv-python
    import cv2

In [None]:
# Test Image Path
image_path = '/content/drive/MyDrive/Tumour/test/pituitary/Tr-pi_0090_jpg.rf.c08e55649aa763919fefc64964f2e6b4.jpg'

In [None]:
import os
import cv2
import matplotlib.pyplot as plt

# Check that the file exists before loading
if not os.path.exists(image_path):
    print("Error: Image file not found!")
else:
    img = cv2.imread(image_path)

    if img is None:
        print("Error: OpenCV couldn't read the image!")
    else:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert BGR to RGB
        plt.imshow(img)  # Show image
        plt.title('Test Image')
        plt.xticks([])
        plt.yticks([])
        plt.show()

In [None]:

# Load and Preprocess Test Image
import numpy as np
image = tf.keras.preprocessing.image.load_img(image_path, target_size=(128,128))
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = np.expand_dims(input_arr, axis=0) / 255.0  # Normalize

In [None]:
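# Softmax with Temperature Scaling
def softmax_with_temperature(logits, temperature=2.0):
    # Subtract the row-wise maximum before exponentiating for numerical stability
    scaled = logits / temperature
    exp_logits = np.exp(scaled - np.max(scaled, axis=1, keepdims=True))
    return exp_logits / np.sum(exp_logits, axis=1, keepdims=True)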

In [None]:
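# Make Predictions
predictions = cnn.predict(input_arr)  # Softmax probabilities (the last layer already applies softmax)
# Temperature scaling expects logits, so move the probabilities to log space first
scaled_predictions = softmax_with_temperature(np.log(predictions + 1e-12), temperature=2.0)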
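
Because the final `Dense` layer uses a softmax activation, `cnn.predict` returns probabilities rather than logits. One way to apply temperature scaling in that situation is to work in log space, since softmax(log p / T) is the same as renormalizing p^(1/T). The sketch below illustrates the effect on a hypothetical probability vector. The function name `temperature_scale_probs` is an assumption, not part of the notebook.

```python
import numpy as np

def temperature_scale_probs(probs, temperature=2.0, eps=1e-12):
    """Re-scale softmax probabilities with a temperature.

    Equivalent to softmax(log(probs) / temperature); temperature > 1
    flattens the distribution, temperature < 1 sharpens it.
    """
    log_probs = np.log(np.clip(probs, eps, 1.0)) / temperature
    log_probs -= log_probs.max(axis=1, keepdims=True)  # numerical stability
    scaled = np.exp(log_probs)
    return scaled / scaled.sum(axis=1, keepdims=True)

# Hypothetical model output for one image over four classes.
probs = np.array([[0.81, 0.09, 0.09, 0.01]])
scaled = temperature_scale_probs(probs, temperature=2.0)
```

With a temperature of 2, the top probability drops from 0.81 to about 0.56 while the predicted class stays the same, so the scaling changes the reported confidence but not the prediction.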


In [None]:
# Get Class Labels
class_labels = list(training_set.class_indices.keys())  # Ensure class names are retrieved properly
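
`flow_from_directory` assigns class indices in alphanumeric order of the sub-folder names, so any label list used at inference time has to follow the same order. The sketch below shows one way to derive that order from the folder names and store it for later use. The helper names and the folder names in the demonstration are hypothetical.

```python
import json
import os
import tempfile

def class_names_from_directory(train_dir):
    """List class sub-folders in sorted order, mirroring the index order
    that Keras' flow_from_directory assigns."""
    return sorted(
        name for name in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, name))
    )

def save_class_names(names, json_path):
    """Persist the ordered class names so an app can reload them."""
    with open(json_path, "w") as f:
        json.dump(names, f)

# Demonstration on a throwaway directory with hypothetical folder names.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ["pituitary", "glioma", "no_tumor", "meningioma"]:
        os.makedirs(os.path.join(tmp, folder))
    names = class_names_from_directory(tmp)
    json_path = os.path.join(tmp, "class_names.json")
    save_class_names(names, json_path)
    with open(json_path) as f:
        reloaded = json.load(f)
```

Loading such a file in a deployment script avoids a hard-coded label list silently drifting out of sync with the training folders.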

In [None]:
# Display Prediction
result_index = np.argmax(scaled_predictions)
print("Predicted Class:", class_labels[result_index])

In [None]:

# Displaying the disease prediction
model_prediction = class_labels[result_index]
plt.imshow(img)
plt.title(f"Disease Name: {model_prediction}")
plt.xticks([])
plt.yticks([])
plt.show()

In [None]:

!pip install streamlit tensorflow

In [None]:

!pip install pyngrok

In [None]:
%%writefile app.py
import streamlit as st
import tensorflow as tf
import numpy as np
from PIL import Image

# Load model
@st.cache_resource
def load_model():
    model = tf.keras.models.load_model("/content/drive/MyDrive/trained_braintumor_disease_model.h5")
    return model

model = load_model()

# Class names (update if different)
class_names = ['Glioma', 'Meningioma', 'No Tumor', 'Pituitary']

st.set_page_config(page_title="Brain Tumor Classifier", layout="centered")
st.title("🧠 Brain Tumor MRI Image Classifier")
st.markdown("Upload a brain MRI image to predict tumor type.")

uploaded_file = st.file_uploader("Upload MRI Image", type=["jpg", "png", "jpeg"])

if uploaded_file is not None:
    image = Image.open(uploaded_file).convert('RGB')
    st.image(image, caption='Uploaded MRI Image', use_column_width=True)

    # ✅ Resize to match model input shape (128x128)
    img = image.resize((128, 128))
    img_array = tf.keras.preprocessing.image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = img_array / 255.0  # normalize

    # Predict
    predictions = model.predict(img_array)
    predicted_class = class_names[np.argmax(predictions)]
    confidence = round(100 * np.max(predictions), 2)

    st.success(f"### 🧪 Prediction: **{predicted_class}** ({confidence}% confidence)")


In [None]:

!curl ipv4.icanhazip.com

In [None]:
!streamlit run  app.py & npx localtunnel --port 8501

#### **Image Preprocessing with Data Augmentation**

In [None]:
img_size = (224, 224)  # Can be 224x224 or 128x128
batch_size = 32

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

valid_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_path, target_size=img_size, batch_size=batch_size, class_mode='categorical'
)

valid_generator = valid_datagen.flow_from_directory(
    valid_path, target_size=img_size, batch_size=batch_size, class_mode='categorical'
)

test_generator = test_datagen.flow_from_directory(
    test_path, target_size=img_size, batch_size=batch_size, class_mode='categorical', shuffle=False
)


#### **Build a Custom CNN Model**

In [None]:
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(224,224,3)),
    MaxPooling2D(2,2),
    BatchNormalization(),

    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    BatchNormalization(),

    Conv2D(128, (3,3), activation='relu'),
    MaxPooling2D(2,2),
    BatchNormalization(),

    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(train_generator.num_classes, activation='softmax')  # number of tumor classes
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
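
To see where most of the custom CNN's parameters come from, the arithmetic behind `model.summary()` can be reproduced by hand. The sketch below assumes the Keras defaults used above (3×3 convolutions with `valid` padding and stride 1, 2×2 max pooling). The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool

size = 224
for _ in range(3):          # three Conv2D + MaxPooling2D stages
    size = pool_out(conv_out(size))

flatten_units = size * size * 128           # 128 filters in the last conv layer
dense_params = flatten_units * 128 + 128    # weights + biases of Dense(128)

print(size, flatten_units, dense_params)    # 26 86528 11075712
```

The `Flatten` followed by `Dense(128)` therefore accounts for roughly 11 million parameters, far more than the convolutional layers combined. Replacing `Flatten` with `GlobalAveragePooling2D`, as the transfer learning model does, would be one way to shrink the model.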


#### **Train the Model**

In [None]:
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)

history = model.fit(
    train_generator,
    validation_data=valid_generator,
    epochs=10,
    callbacks=[early_stop, checkpoint]
)


#### **Visualize Training Accuracy & Loss**

In [None]:
# Plot accuracy and loss
plt.figure(figsize=(12,5))

# Accuracy
plt.subplot(1,2,1)
plt.plot(history.history['accuracy'], label='Train Acc')
plt.plot(history.history['val_accuracy'], label='Val Acc')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Loss
plt.subplot(1,2,2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()


### Dataset First View
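
The cells in the following sections reference a DataFrame named `df` with a `Label` column (and optionally `Width` and `Height`), but no cell in this part of the notebook creates it. One possible way to build it is to walk the class folders of a split and record one row per image. The helper `collect_image_records` below is a hypothetical sketch that uses only the standard library. The pandas and Pillow steps are shown as comments because they assume those packages are installed.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

def collect_image_records(split_dir):
    """Return one {'Path', 'Label'} dict per image under split_dir/<class>/."""
    records = []
    for label in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, label)
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                records.append({"Path": os.path.join(class_dir, name), "Label": label})
    return records

# Demonstration on a throwaway folder tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for label, count in [("glioma", 2), ("pituitary", 1)]:
        os.makedirs(os.path.join(tmp, label))
        for i in range(count):
            open(os.path.join(tmp, label, f"img_{i}.jpg"), "w").close()
    records = collect_image_records(tmp)

# In the notebook (assumption: pandas as pd and PIL.Image are imported):
#   df = pd.DataFrame(collect_image_records(train_path))
#   sizes = [Image.open(p).size for p in df["Path"]]   # (width, height) pairs
#   df["Width"] = [w for w, h in sizes]
#   df["Height"] = [h for w, h in sizes]
```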

### Dataset Information

In [None]:
# Dataset Info
# Check basic structure
df.info()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Count missing values in each column
df.isnull().sum()


In [None]:
# List all unique tumor types (labels)
df['Label'].unique()


In [None]:
# Total images and class distribution
print("Total images:", len(df))
print("Class distribution:")
print(df['Label'].value_counts())


### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***




In [None]:
# Dataset Columns

In [None]:
# Dataset Describe
# Get basic stats for label counts
df['Label'].value_counts().describe()


In [None]:
df['Label'].value_counts()


In [None]:
# If width/height columns exist
df[['Width', 'Height']].describe()


### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    unique_vals = df[col].nunique()
    print(f"{col}: {unique_vals} unique values")


In [None]:
for col in df.columns:
    print(f"\nColumn: {col}")
    print(df[col].unique())


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
import os
import numpy as np
import pandas as pd
import shutil
import matplotlib.pyplot as plt


In [None]:
# Path to your dataset folder containing subfolders per tumor class
dataset_dir = "/content/brain_tumor_dataset"  # update path if needed

# List the tumor type folders (classes)
classes = os.listdir(dataset_dir)
print("Tumor Classes:", classes)


In [None]:
# Count number of images in each class
class_counts = {label: len(os.listdir(os.path.join(dataset_dir, label))) for label in classes}
df_counts = pd.DataFrame(list(class_counts.items()), columns=['Tumor_Type', 'Image_Count'])
print(df_counts)

# Optional: Visualize the class distribution
plt.figure(figsize=(8, 4))
plt.bar(df_counts['Tumor_Type'], df_counts['Image_Count'], color='skyblue')
plt.title("Image Count per Tumor Type")
plt.xlabel("Tumor Type")
plt.ylabel("Number of Images")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


In [None]:
# Create base output folder
base_path = "/content/processed_dataset"
os.makedirs(base_path, exist_ok=True)

# Create subfolders for train, val, test splits
for subset in ['train', 'val', 'test']:
    for label in classes:
        os.makedirs(os.path.join(base_path, subset, label), exist_ok=True)


In [None]:
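def split_data(class_name, train_size=0.7, val_size=0.15, test_size=0.15):
    # test_size is informational: the test split receives whatever remains after train and val
    img_paths = os.listdir(os.path.join(dataset_dir, class_name))
    # Case-insensitive match so files such as IMG_1.JPG are not skipped
    img_paths = [f for f in img_paths if f.lower().endswith(('.jpg', '.png', '.jpeg'))]
    np.random.shuffle(img_paths)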

    total = len(img_paths)
    train_end = int(train_size * total)
    val_end = train_end + int(val_size * total)

    train_files = img_paths[:train_end]
    val_files = img_paths[train_end:val_end]
    test_files = img_paths[val_end:]

    for img in train_files:
        shutil.copy(os.path.join(dataset_dir, class_name, img), os.path.join(base_path, 'train', class_name, img))
    for img in val_files:
        shutil.copy(os.path.join(dataset_dir, class_name, img), os.path.join(base_path, 'val', class_name, img))
    for img in test_files:
        shutil.copy(os.path.join(dataset_dir, class_name, img), os.path.join(base_path, 'test', class_name, img))


In [None]:
# Apply the split function for each tumor type
for cls in classes:
    split_data(cls)

print("✅ Dataset split into train, val, and test folders.")
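
Because `split_data` truncates the train and validation sizes with `int()`, any leftover images end up in the test split. The small sketch below mirrors that slicing arithmetic for a hypothetical class size. The function `split_counts` is illustrative and not part of the notebook.

```python
def split_counts(total, train_size=0.7, val_size=0.15):
    """Mirror the slicing arithmetic of split_data for a class with `total` images."""
    train_end = int(train_size * total)
    val_end = train_end + int(val_size * total)
    return train_end, val_end - train_end, total - val_end

# A hypothetical class with 101 images.
print(split_counts(101))  # (70, 15, 16)
```

The three counts always add up to the class total, so no image is dropped, but the test share can be slightly larger than the nominal 15%.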


### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words that contain digits

In [None]:
# Remove URLs & Remove words that contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### Which feature selection methods have you used, and why?

Answer Here.

##### Which features did you find important, and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used, and why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data
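
For MRI images, scaling usually means rescaling pixel intensities rather than standardizing tabular columns. The sketch below assumes 8-bit grayscale input (values in 0 to 255) and maps it to the range 0 to 1; `rescale_pixels` and the sample grid are hypothetical and stand in for whatever image loader the project uses.

```python
def rescale_pixels(image, max_value=255.0):
    """Rescale a 2-D grid of pixel intensities from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


# Tiny stand-in for a grayscale MRI slice (assumed 8-bit intensities).
sample_image = [[0, 128, 255], [64, 32, 16]]
scaled_image = rescale_pixels(sample_image)
```

Keeping inputs in a small, fixed range tends to make gradient-based training more stable; pretrained backbones may instead expect their own preprocessing function.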

##### Which method have you used to scale your data, and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.
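
One way to split a multi-class image dataset is a stratified split, which keeps the class proportions similar in the train and test sets. The following is a minimal stdlib-only sketch; `stratified_split`, the 80/20 ratio, the seed, and the synthetic label list are all assumptions for illustration and do not reflect the actual dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) with similar class proportions."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_indices, test_indices = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out at least one sample per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return sorted(train_indices), sorted(test_indices)


# Synthetic labels: 10 samples per class (illustrative only).
labels = ["glioma"] * 10 + ["meningioma"] * 10 + ["pituitary"] * 10 + ["no_tumor"] * 10
train_idx, test_idx = stratified_split(labels)
```

The returned indices can then be used to select file paths or arrays for each subset. A fixed seed keeps the split reproducible across runs.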

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)
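
If the class distribution turns out to be skewed, one common remedy is to weight the loss per class instead of resampling images. The sketch below uses the "balanced" heuristic `n_samples / (n_classes * class_count)`; the function name and the class counts are made up for illustration and are not the real dataset counts.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Illustrative, deliberately imbalanced label list (not real dataset counts).
labels = ["glioma"] * 30 + ["meningioma"] * 20 + ["pituitary"] * 40 + ["no_tumor"] * 10
weights = balanced_class_weights(labels)
```

Rare classes receive weights above 1 and frequent classes below 1, so each class contributes roughly equally to the total loss.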

##### What technique did you use to handle the imbalanced dataset, and why? (If it needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
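
Before charting, the per-class scores have to be computed. A minimal sketch of precision, recall, and F1 per class is shown below; `per_class_metrics` and the tiny label lists are hypothetical and only demonstrate the arithmetic, not results from any trained model.

```python
def per_class_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for every class seen in the labels."""
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy labels for illustration only.
y_true = ["glioma", "glioma", "no_tumor", "pituitary", "meningioma", "no_tumor"]
y_pred = ["glioma", "meningioma", "no_tumor", "pituitary", "meningioma", "no_tumor"]
scores = per_class_metrics(y_true, y_pred)
```

The resulting dictionary maps each class to its three scores and can be passed to any plotting library as a grouped bar chart.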

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model
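
The idea behind grid search can be sketched without any ML library: evaluate every combination in a small grid and keep the best-scoring one. In the sketch below, `grid_search`, the parameter names, and especially `toy_validation_score` are placeholders; in practice the scoring function would train the model and return a validation metric.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_score(params):
    """Placeholder objective that peaks at learning_rate=0.001, dropout=0.3."""
    return 1.0 - abs(params["learning_rate"] - 0.001) * 100 - abs(params["dropout"] - 0.3)


# Hypothetical search space for a CNN.
param_grid = {"learning_rate": [0.01, 0.001, 0.0001], "dropout": [0.2, 0.3, 0.5]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
```

Grid search is exhaustive, so its cost grows multiplicatively with each added parameter; random or Bayesian search is often preferred when each evaluation means training a deep network.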

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross-Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model you have used and its feature importance using any model explainability tool.

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best-performing ML model in pickle or joblib format for deployment.


In [None]:
# Save the File

### 2. Load the saved model file again and predict on unseen data as a sanity check.


In [None]:
# Load the File and predict unseen data.
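
The save-then-reload sanity check can be sketched as a pickle round trip. The object below is only a stand-in dictionary, and the file name is hypothetical; a trained Keras model would more commonly be persisted with the framework's own `model.save` method rather than pickle.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (assumption: any picklable Python object).
model_stand_in = {
    "class_names": ["glioma", "meningioma", "no_tumor", "pituitary"],
    "weights": [0.1, 0.2, 0.3],
}

# Hypothetical file name, written to the system temp directory.
model_path = os.path.join(tempfile.gettempdir(), "brain_tumor_model_demo.pkl")

# Save the object to disk.
with open(model_path, "wb") as model_file:
    pickle.dump(model_stand_in, model_file)

# Load it back and confirm nothing was lost in the round trip.
with open(model_path, "rb") as model_file:
    restored = pickle.load(model_file)

# Clean up the demo file.
os.remove(model_path)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.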

### ***Congrats! Your model is successfully created and ready for deployment on a live server for real user interaction!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***