**<center><font size=5>Build your brain tumor AI model</font></center>**
***


**Table of Contents**
- <a href='#intro'>1. Project Overview and Objectives</a>
    - <a href='#dataset'>1.1. Data Set Description</a>
    - <a href='#tumor'>1.2. What is a Brain Tumor?</a>
- <a href='#env'>2. Setting up the Environment</a>
- <a href='#import'>3. Data Import and Preprocessing</a>
- <a href='#cnn'>4. Building the AI model</a>
- <a href='#cnn'>5. Model evaluation</a>
- <a href='#concl'>6. Testing the model</a>
- <a href='#concl'>7. Conclusion</a>

# Introduction
Welcome to the Vizuara AI Labs project notebook. This guide is designed to help you build your own machine learning model for medical imaging diagnosis, starting with brain tumor detection. The structure of this notebook is organized into modular building blocks, allowing you to easily adapt and apply this workflow to other projects, such as heart disease classification, by modifying specific sections.

# <a id='intro'>1. Project Overview and Objectives</a>

The main purpose of this project is to build a CNN model that classifies whether a subject has a brain tumor, based on an MRI scan.

## <a id='dataset'>1.1. Data Set Description</a>

The image data used for this project is the [Brain MRI Images for Brain Tumor Detection](https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection) dataset. It consists of MRI scans in two classes:

* `NO` - no tumor, encoded as `0`
* `YES` - tumor, encoded as `1`

Unfortunately, the data set description gives no information about where these MRI scans come from or how they were collected.

## <a id='tumor'>1.2. What is a Brain Tumor?</a>

> A brain tumor occurs when abnormal cells form within the brain. There are two main types of tumors: cancerous (malignant) tumors and benign tumors. Cancerous tumors can be divided into primary tumors, which start within the brain, and secondary tumors, which have spread from elsewhere, known as brain metastasis tumors. All types of brain tumors may produce symptoms that vary depending on the part of the brain involved. These symptoms may include headaches, seizures, problems with vision, vomiting and mental changes. The headache is classically worse in the morning and goes away with vomiting. Other symptoms may include difficulty walking, speaking or with sensations. As the disease progresses, unconsciousness may occur.
>
> ![](https://upload.wikimedia.org/wikipedia/commons/5/5f/Hirnmetastase_MRT-T1_KM.jpg)
>
> *Brain metastasis in the right cerebral hemisphere from lung cancer, shown on magnetic resonance imaging.*

Source: [Wikipedia](https://en.wikipedia.org/wiki/Brain_tumor)

# <a id='env'>2. Setting up the Environment: Import Statements</a>


## import os
**What:**  
Python module to interact with files and folders.

**Why:**  
Used to load images, read directories, and create file paths.

**Advantages:**  
- Built-in  
- Works on all operating systems  

**Disadvantages:**  
- Wrong usage can delete files  
- Not directly ML related  


In [1]:
import os  # os = Python’s Operating System module.


## import keras
**What:**  
High-level deep learning library. Used to build, train, and evaluate neural networks easily.

**Advantages:**  
- Very simple  
- Beginner-friendly  

**Disadvantages:**  
- Less flexible than PyTorch  
- Version mismatch possible  

In [2]:
import keras




## from keras.models import Sequential
**What:**  
A linear stack of layers. Used when building simple CNN models layer-by-layer.

**Advantages:**  
- Very easy to write  
- Good for basic CNNs  

**Disadvantages:**  
- Cannot handle complex architectures  

In [3]:
from keras.models import Sequential

## Conv2D
**What:**  
Convolution layer to extract features from images. Used in all CNNs for feature detection.

**Advantages:**  
- Learns edges, textures, shapes  
- Key part of image models  

**Disadvantages:**  
- Heavy computation  

In [4]:
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout, BatchNormalization

In [5]:
from PIL import Image
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('dark_background')
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# <a id='import'>3(a) Data Import</a>

When working on a different project, you will need to load a different dataset. A convenient way to do this in Google Colab is as follows:

(a) Upload the dataset to Google Drive

(b) The image path will be `/content/drive/My Drive/name_of_your_dataset`

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

# <a id='intro'>3(b) Data Processing</a>

1. First, we create a `data` list that stores each image as a NumPy array.
2. Second, we create a `paths` list that stores the file path of every image.
3. Third, we create a `result` list that stores the one-hot encoded target class (normal or tumor) of each image.

With two classes, one-hot encoding works as follows:

The label 0 is transformed into [1, 0].

The label 1 is transformed into [0, 1].

# How CNN Inputs Are Generated (Image → Tensor Pipeline)

We want to convert images into a numeric format that a CNN can understand.

## Step 0: Folder Structure

We assume images are stored like this:

dataset/  
├── cat/  
│   ├── cat1.jpg  
│   ├── cat2.jpg  
├── dog/  
│   ├── dog1.jpg  
│   ├── dog2.jpg  
└── horse/  
    ├── horse1.jpg  

Each folder name is the **class label**.
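
As a rough illustration of how folder names can become class labels, the sketch below builds a throwaway folder tree with the same layout and reads the class names back. The helper name `list_classes` and the alphabetical index assignment are assumptions made for this example, not something the notebook prescribes.

```python
import os
import tempfile


def list_classes(dataset_dir):
    """Return the sorted names of the sub-folders of dataset_dir."""
    return sorted(
        name for name in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, name))
    )


# Build a temporary folder tree that mirrors the layout above (hypothetical data).
with tempfile.TemporaryDirectory() as dataset_dir:
    for name in ("cat", "dog", "horse"):
        os.makedirs(os.path.join(dataset_dir, name))

    class_names = list_classes(dataset_dir)
    # Assign one integer index per class, in alphabetical order.
    class_to_index = {name: i for i, name in enumerate(class_names)}

print(class_names)
print(class_to_index)
```

Sorting the folder names keeps the index assignment stable between runs, which matters when the same mapping is reused at prediction time.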

## Step 1: Load Image

We read an image from disk using PIL:

- Open file: `"dataset/cat/cat1.jpg"`
- Still in image format (not numbers yet)

## Step 2: Resize Image

A CNN needs **fixed-size images**, e.g.:

- 128 × 128
- or 224 × 224

So we resize every image to the same size to keep input shape consistent.

## Step 3: Convert to NumPy Array

Image → NumPy Array:

- Shape becomes: `(height, width, channels)`
- Example: `(128, 128, 3)` for RGB

Now the image is just an array of numbers.
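
Steps 1 to 3 can be sketched as follows. To keep the example self-contained, an in-memory image created with `Image.new` stands in for `Image.open("dataset/cat/cat1.jpg")`; the 300×200 source size is an arbitrary assumption.

```python
import numpy as np
from PIL import Image

# Stand-in for Image.open(...): a 300x200 solid-colour RGB image.
img = Image.new("RGB", (300, 200), color=(255, 0, 0))

# Resize to the fixed input size. PIL expects (width, height).
resized = img.resize((128, 128))

# Convert to a NumPy array. NumPy reports (height, width, channels).
arr = np.array(resized)

print(arr.shape, arr.dtype)
```

Note the ordering difference: PIL sizes are `(width, height)` while the resulting array is `(height, width, channels)`. With a square target size the two coincide, but they differ for rectangular inputs.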

## Step 4: Normalize (Scale Pixel Values)

Raw pixels are in range **0–255**.

We divide by 255.0 to bring values into **0–1** range:

- This helps training converge faster and more stably.
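
A minimal sketch of this scaling step, using a tiny hand-made array instead of a real image (the values are illustrative only):

```python
import numpy as np

# Three example pixel values covering the full 8-bit range.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Cast to float first, then divide, so the result is float32 in [0, 1].
scaled = pixels.astype("float32") / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

Casting to `float32` before dividing is a choice made here to keep memory use lower than the default `float64`.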

## Step 5: Stack All Images into One Big Array

If we have `N` images, each of shape `(H, W, C)`, then:

- Final input shape: `(N, H, W, C)`

This is what we pass to the CNN.
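
The stacking step could look like the sketch below, which uses four blank placeholder arrays in place of real images:

```python
import numpy as np

# Four placeholder "images", each of shape (H, W, C) = (128, 128, 3).
images = [np.zeros((128, 128, 3), dtype="float32") for _ in range(4)]

# Stack along a new leading axis to get (N, H, W, C).
X = np.stack(images, axis=0)

print(X.shape)
```

`np.stack` raises an error if the arrays do not all share the same shape, which is one practical reason for resizing every image first.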

## Simple Diagram: Image → Tensor

Raw Images (JPG/PNG)  
↓ (load with PIL)  
Resized Images (128×128)  
↓ (convert with NumPy)  
Arrays of Shape (128, 128, 3)  
↓ (divide by 255)  
Normalized Tensors  
↓ (stack all)  
Final Input: Shape (N, 128, 128, 3) → fed into CNN

## Labels (y)

- Each image belongs to a class: `cat`, `dog`, `horse`, etc.
- We convert text labels into numbers and then (optionally) into one-hot vectors.
- Example one-hot for 3 classes:

cat   → [1, 0, 0]  
dog   → [0, 1, 0]  
horse → [0, 0, 1]

These become the **target outputs** for training the CNN.
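
As a small illustration of the label conversion, the following sketch maps text labels to integers and then to one-hot vectors with NumPy. The label list is made up for the example, and indexing an identity matrix is just one of several ways to produce one-hot rows.

```python
import numpy as np

# Hypothetical text labels, one per image.
labels = ["cat", "dog", "horse", "dog"]

# Text label -> integer index (alphabetical order).
class_names = sorted(set(labels))
class_to_index = {name: i for i, name in enumerate(class_names)}
indices = np.array([class_to_index[name] for name in labels])

# Integer index -> one-hot row: pick rows of the identity matrix.
one_hot = np.eye(len(class_names), dtype="float32")[indices]

print(indices)
print(one_hot)
```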


# Explanation: Processing Brain Tumor Images (YES category)

This code loads all “YES” (tumor present) images, resizes them, converts them into arrays, and assigns label = 0.



## Step 1: Initialize empty lists

data = []      → stores the processed images as NumPy arrays  
paths = []     → stores file paths of images  
result = []    → stores encoded labels for each image  



## Step 2: Loop through all files inside the "yes" folder

for r, d, f in os.walk('/content/drive/My Drive/brain_tumor_dataset/yes'):

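- `os.walk()` visits the given directory and every subfolder inside it.
- r = path of the directory currently being visited  
- d = list of subdirectory names in that directory  
- f = list of file names in that directory  

We only want `.jpg` images, so any other file is skipped.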



## Step 3: Build full file paths

paths.append(os.path.join(r, file))

- Joins folder path + file name  
- Example:  
  "/content/.../yes" + "Y1.jpg" → "/content/.../yes/Y1.jpg"  

We store all image paths in `paths` list.
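
Steps 2 and 3 together can be sketched as below. The example runs against a temporary folder with empty placeholder files so that it does not depend on Google Drive. The helper name `collect_jpg_paths` is hypothetical, and matching the extension case-insensitively is a choice made for this sketch rather than something the notebook specifies.

```python
import os
import tempfile


def collect_jpg_paths(folder):
    """Walk folder recursively and return full paths of all .jpg files."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for file in sorted(files):
            if file.lower().endswith(".jpg"):
                paths.append(os.path.join(root, file))
    return paths


# Temporary stand-in for the "yes" folder, with empty placeholder files.
with tempfile.TemporaryDirectory() as folder:
    for name in ("Y1.jpg", "Y2.JPG", "notes.txt"):
        open(os.path.join(folder, name), "wb").close()

    found = [os.path.basename(p) for p in collect_jpg_paths(folder)]

print(found)
```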



## Step 4: Process each image one by one

for path in paths:



## Step 5: Open the image

img = Image.open(path)

- Loads the image from disk into memory.



## Step 6: Resize the image

img = img.resize((128,128))

- All images must be the **same size** (128×128).  
- The model is built for one fixed input shape, and images can only be stacked into a single array if their shapes match.



## Step 7: Convert the image to NumPy array

img = np.array(img)

- Changes the image into an array; for an RGB image the shape is (128, 128, 3).  
- CNNs operate on numeric arrays, not on image files.



## Step 8: Filter only RGB images

if(img.shape == (128,128,3)):

Why?

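- Some images may be grayscale. After `np.array`, a grayscale image has shape (128, 128) with no channel axis, and an image with an alpha channel has shape (128, 128, 4).  
- The CNN here expects 3 channels (RGB).  
- This check keeps only arrays of shape (128, 128, 3); every other image is skipped.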
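
The effect of this shape check can be seen with in-memory test images in three PIL modes. The last line shows `convert("RGB")`, an alternative that keeps grayscale images instead of dropping them; the notebook's code does not use it, so treat it as an optional variation.

```python
import numpy as np
from PIL import Image

# In-memory 128x128 images in three different modes.
rgb = np.array(Image.new("RGB", (128, 128)))
gray = np.array(Image.new("L", (128, 128)))      # grayscale
rgba = np.array(Image.new("RGBA", (128, 128)))   # RGB plus alpha

shapes = [a.shape for a in (rgb, gray, rgba)]
print(shapes)

# The filter from Step 8: only 3-channel arrays pass.
kept = [a for a in (rgb, gray, rgba) if a.shape == (128, 128, 3)]
print(len(kept))

# Optional variation: convert to RGB before creating the array.
gray_as_rgb = np.array(Image.new("L", (128, 128)).convert("RGB"))
print(gray_as_rgb.shape)
```

Filtering is simpler but shrinks the dataset, while converting keeps every image at the cost of duplicating the single gray channel three times.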



## Step 9: Store the processed image

data.append(np.array(img))

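- Adds the image array to the `data` list. `img` is already a NumPy array at this point, so the extra `np.array(...)` call only makes a copy.  
- `data` becomes your X (input images).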



## Step 10: Add the label for "YES" tumor class

result.append(encoder.transform([[0]]).toarray())

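Important:

- `0` is the label this code uses for YES (tumor present). This is the reverse of the encoding listed in Section 1.1, where `YES` is `1`, so keep one convention throughout when you interpret predictions.  
- `encoder.transform([[0]])` converts label 0 into **one-hot format**.  
- Example with 2 classes:  
  0 → [1, 0]  
  1 → [0, 1]
- `.toarray()` converts the sparse matrix returned by the encoder into a regular NumPy array.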

So for every tumor image, we add:

Label → `[1, 0]`

This becomes your y (output labels).
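
The encoder call can be tried in isolation as shown below. The walkthrough does not show how `encoder` is created, so the `OneHotEncoder()` construction and the `fit([[0], [1]])` call here are assumptions about that setup.

```python
from sklearn.preprocessing import OneHotEncoder

# Assumed setup: an encoder fitted on the two class labels 0 and 1.
encoder = OneHotEncoder()
encoder.fit([[0], [1]])

# transform() returns a sparse matrix; toarray() makes it a dense array.
yes_label = encoder.transform([[0]]).toarray()
other_label = encoder.transform([[1]]).toarray()

print(yes_label, yes_label.shape)
print(other_label, other_label.shape)
```

Each transformed label has shape `(1, 2)`, a single row with two columns, rather than a flat vector of length 2.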


## Final understanding

### data → list of processed images  
Shape of each element: (128, 128, 3)

### result → list of labels for each image  
Each element is a 1×2 array, e.g. [ [[1,0]], [[1,0]], [[1,0]], ... ]

You now have:

- **X = data**
- **y = result**

Ready for training the CNN.
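
Before training, the two Python lists are typically turned into NumPy arrays. The sketch below uses placeholder data with the same element shapes as described above. The variable names `X` and `y` and the reshape to `(N, 2)` are assumptions about how the lists are used afterwards.

```python
import numpy as np

# Placeholder lists with the same element shapes as `data` and `result`.
data = [np.zeros((128, 128, 3), dtype=np.uint8) for _ in range(3)]
result = [np.array([[1.0, 0.0]]) for _ in range(3)]  # each label is (1, 2)

# List of (128, 128, 3) arrays -> one array of shape (N, 128, 128, 3).
X = np.array(data)

# List of (1, 2) arrays -> (N, 1, 2); drop the middle axis to get (N, 2).
y = np.array(result).reshape(len(result), 2)

print(X.shape, y.shape)
```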


In [None]:
# Step 1: Initialize empty lists
data = []   # stores the processed images as NumPy arrays 
paths = []  # stores file paths of images 
result = [] #  stores encoded labels for each image
 
  
