## Lab 6, Group 2
### Names: Hailey DeMark, Deborah Park, Karis Park
### Student IDs: 48869449, 48878679, 48563429

Dataset: (Link to dataset)

In [3]:
import os 
import pandas as pd
import numpy as np
import tensorflow as tf
from PIL import Image
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

In [4]:
np.random.seed(0)
tf.random.set_seed(2)

# define dataset directory
dataset_path = 'Brain Tumor Data Set' 

# collect all image paths and labels
image_paths = []
image_labels = []

for root, _, files in os.walk(dataset_path):
    for file in files:
        if file.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_paths.append(os.path.join(root, file))
            image_labels.append(os.path.basename(root)) 

print(f"Found {len(image_paths)} images.")

# load and preprocess images in grayscale (size 64x64)
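image_data = []
loaded_labels = []                                    # labels of images that loaded successfully
for file, label in zip(image_paths, image_labels):
    try:
        image = Image.open(file).convert('L')         # convert to grayscale
        image = image.resize((64, 64))                # resize to fixed size
        image_array = np.array(image)                 # convert to NumPy array
        image_data.append(image_array)
        loaded_labels.append(label)
    except Exception as e:
        print(f"Error loading {file}: {e}")
image_labels = loaded_labels                          # keep labels aligned with image_data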

# convert list to NumPy array and normalize pixel values
X = np.array(image_data, dtype='float32') / 255.0
X = X.reshape(-1, 64, 64, 1)  # Add grayscale channel dimension

print("X shape:", X.shape)

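# encode text labels to binary integers; LabelEncoder assigns codes in alphabetical order of the folder names
# (check label_encoder.classes_ to confirm the tumor class maps to 1, the class Keras Recall treats as positive)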
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(image_labels)
print("Encoded labels:", np.unique(y))

Found 4514 images.
X shape: (4514, 64, 64, 1)
Encoded labels: [0 1]


### Preparation (3 points total)  
* [1.5 points] Choose and explain what metric(s) you will use to evaluate your algorithm’s performance. You should give a detailed argument for why this (these) metric(s) are appropriate on your data. That is, why is the metric appropriate for the task (e.g., in terms of the business case for the task). Please note: rarely is accuracy the best evaluation metric to use. Think deeply about an appropriate measure of performance.

    * We will use recall as our evaluation metric. Our dataset consists of MRI scans of potential brain tumors, so a false negative (predicting that no tumor is present when one is) is the most serious error. Failing to detect a brain tumor can lead to missed or delayed treatment, which in turn can lead to more severe conditions or death. A false positive may cause unnecessary procedures, more testing, and emotional distress, but it is significantly less harmful than overlooking a real tumor. Recall, TP / (TP + FN), is the fraction of true tumor cases the model catches, so optimizing for it pushes the model to catch as many tumors as possible, even at the expense of predicting more false positives. Because a model that flags every scan as a tumor would reach perfect recall, we also report precision alongside it. In the context of medical diagnostics, especially for conditions as serious as brain cancer, it is better to err on the side of caution than to risk missing a tumor.

* [1.5 points] Choose the method you will use for dividing your data into training and testing (i.e., are you using Stratified 10-fold cross validation? Shuffle splits? Why?). Explain why your chosen method is appropriate or use more than one method as appropriate. Convince me that your cross validation method is a realistic mirroring of how an algorithm would be used in practice. 

    *  (I asked AI to verify if we could do a stratified 80/20 split for this lab like our last lab since I wasn't sure how big the dataset should be to be considered big enough to use this split...AI said a stratified 80/20 split is best with another stratified 80/20 split on the training data. I explain more in my explanation, but feel free to change!!)
    

    * We will use a stratified 80/20 split to separate the data into training and test sets, so that the test set remains unseen throughout training and tuning. From the 80% training portion, we will then apply another stratified 80/20 split to create a separate validation set, so that 64% of the data is training data, 16% is validation data, and 20% is test data. This two-step stratified approach ensures that every subset keeps the same proportion of tumor and non-tumor images, which keeps metrics comparable across subsets. Our dataset has about 55% cancerous images and 44% non-cancerous images, so stratifying guarantees that each split reflects this original distribution. The separate validation set allows hyperparameter tuning and model changes without touching the test set, which helps avoid overfitting to test metrics. We chose a single hold-out split over K-fold cross-validation because retraining a CNN K times on our 4,514 images is computationally expensive.
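
To make the recall argument above concrete, here is a minimal sketch of how recall and precision trade off as the decision threshold moves. The labels and probabilities are hypothetical, not taken from our dataset, and `recall_and_precision` is a helper name introduced only for this illustration.

```python
import numpy as np

def recall_and_precision(y_true, y_prob, threshold=0.5):
    """Return (recall, precision), treating label 1 as the positive (tumor) class."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # tumors correctly flagged
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # tumors missed
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # healthy scans flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Hypothetical labels and predicted tumor probabilities
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1]

default_recall, default_precision = recall_and_precision(y_true, y_prob, threshold=0.5)
lenient_recall, lenient_precision = recall_and_precision(y_true, y_prob, threshold=0.3)
print(f"threshold 0.5: recall={default_recall:.2f}, precision={default_precision:.2f}")
print(f"threshold 0.3: recall={lenient_recall:.2f}, precision={lenient_precision:.2f}")
```

Lowering the threshold catches more tumors (higher recall) while flagging more healthy scans (lower precision), which is the trade-off we are choosing to accept.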

In [5]:
# 1. First split: training vs test
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 2. Second split: training vs validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2, stratify=y_train_full, random_state=42
)
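
As a sanity check on the 64/16/20 proportions, the subset sizes can be worked out by hand. This sketch assumes that the held-out portion of each split is rounded up, which is our understanding of how `train_test_split` treats a float `test_size`; `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size=0.2):
    """Sizes (train, val, test) after two successive hold-out splits.

    Assumption: the held-out part of each split is rounded up.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train_full = n_samples - n_test
    n_val = math.ceil(test_size * n_train_full)
    n_train = n_train_full - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(4514)       # 4514 images were found above
steps_per_epoch = math.ceil(n_train / 32)        # batch_size=32; last batch may be partial
print(n_train, n_val, n_test, steps_per_epoch)
for name, n in [("train", n_train), ("val", n_val), ("test", n_test)]:
    print(f"{name}: {n / 4514:.1%}")
```

Under that assumption the training set has 2,888 images, which gives 91 batches of 32 per epoch.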

### Modeling (6 points total)
* [1.5 points]  Setup the training to use data expansion in Keras (also called data augmentation). Explain why the chosen data expansion techniques are appropriate for your dataset. You should make use of Keras augmentation layers, like in the class examples.

    * (add in explanation about Data Expansion in Keras)
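
To show what this kind of augmentation does to an image array, here is a rough NumPy sketch of a horizontal flip and a small horizontal shift. It is a simplification for illustration only: the Keras layers choose random amounts on each call and have their own fill modes, whereas this sketch uses fixed amounts and zero padding. The random batch stands in for normalized scans, and both function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((4, 64, 64, 1), dtype=np.float32)  # stand-in for normalized scans

def flip_horizontal(images):
    """Mirror each image left-to-right (axis 2 is width in NHWC layout)."""
    return images[:, :, ::-1, :]

def translate_right(images, pixels):
    """Shift each image right by `pixels`, zero-filling the vacated columns."""
    shifted = np.zeros_like(images)
    width = images.shape[2]
    shifted[:, :, pixels:, :] = images[:, :, :width - pixels, :]
    return shifted

flipped = flip_horizontal(batch)
shifted = translate_right(batch, 6)   # about 10% of a 64-pixel-wide image

print(flipped.shape, shifted.shape)   # shapes, and therefore labels, are unchanged
```

Neither transform changes the array shape or the label, so the model sees new variants of each scan without any new labeling.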

In [6]:
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import RandomContrast, RandomTranslation, RandomZoom
from tensorflow.keras.layers import RandomRotation, RandomFlip
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers import Adam

keras.__version__

'3.9.2'

In [7]:
# Data Augmentation Layer
data_augmentation = Sequential([
    RandomFlip("horizontal"),         # Reflect common real-world scan orientation variance
    RandomRotation(0.05),             # Small tilts from patient positioning
    RandomZoom(0.1),                  # Variance in zoom/focus during imaging
    RandomTranslation(0.1, 0.1),      # Simulate slight patient movement
    RandomContrast(0.1)               # Reflect scanner lighting/contrast differences
], name="data_augmentation")

# CNN Model with augmentation included
model = Sequential()
model.add(keras.Input(shape=(64, 64, 1)))  # Declare the input shape on the first layer
model.add(data_augmentation)  # Augmentation applied during training only
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))  # Binary output for tumor classification

# Compile the model
model.compile(
    loss='binary_crossentropy',
    optimizer=Adam(),
    metrics=['Recall', 'Precision', 'accuracy']
)

# Train the model
history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    batch_size=32,
    epochs=20,
    shuffle=True,
    verbose=1
)

Epoch 1/20
91/91 ━━━━━━━━━━━━━━━━━━━━ 3s 26ms/step - Precision: 0.5268 - Recall: 0.5209 - accuracy: 0.5457 - loss: 0.6858 - val_Precision: 0.7438 - val_Recall: 0.4521 - val_accuracy: 0.6750 - val_loss: 0.6086
Epoch 2/20
91/91 ━━━━━━━━━━━━━━━━━━━━ 2s 23ms/step - Precision: 0.6912 - Recall: 0.5930 - accuracy: 0.6782 - loss: 0.6165 - val_Precision: 0.8571 - val_Recall: 0.5569 - val_accuracy: 0.7524 - val_loss: 0.5441
Epoch 3/20
91/91 ━━━━━━━━━━━━━━━━━━━━ 2s 23ms/step - Precision: 0.7329 - Recall: 0.6517 - accuracy: 0.7194 - loss: 0.5530 - val_Precision: 0.7730 - val_Recall: 0.7036 - val_accuracy: 0.7676 - val_loss: 0.5097
Epoch 4/20
91/91 ━━━━━━━━━━━━━━━━━━━━ 2s 24ms/step - Precision: 0.7366 - Recall: 0.6786 - accuracy: 0.7288 - loss: 0.5430 - val_Precision: 0.7604 - val_Recall: 0.8174 - val_accuracy: 0.7967 - val_loss: 0.4403
Epoch 5/20
91/91 ━━━━━━━━━

* [2 points] Create a convolutional neural network to use on your data using Keras. Investigate at least two different convolutional network architectures and investigate changing one or more parameters of each architecture such as the number of filters. This means, at a  minimum, you will train a total of four models (2 different architectures, with 2 parameters changed in each architecture). Use the method of train/test splitting and evaluation metric that you argued for at the beginning of the lab. Visualize the performance of the training and validation sets per iteration (use the "history" parameter of Keras). Be sure that models converge. 

In [None]:
# code
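
As a starting point for the per-epoch analysis, here is a small sketch that summarizes a Keras-style history dictionary (the `history.history` attribute returned by `model.fit`). The values are the validation metrics for epochs 1 to 4 copied from the training log above; `best_epoch` is a hypothetical helper name.

```python
# Validation metrics for epochs 1-4, copied from the training log above
history_dict = {
    "val_Recall": [0.4521, 0.5569, 0.7036, 0.8174],
    "val_loss": [0.6086, 0.5441, 0.5097, 0.4403],
}

def best_epoch(history_dict, key="val_Recall", mode="max"):
    """Return (1-based epoch, value) where the chosen metric is best."""
    values = history_dict[key]
    pick = max if mode == "max" else min
    idx = pick(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

recall_epoch, recall_value = best_epoch(history_dict, "val_Recall", "max")
loss_epoch, loss_value = best_epoch(history_dict, "val_loss", "min")
print(f"best val_Recall: {recall_value} at epoch {recall_epoch}")
print(f"best val_loss:   {loss_value} at epoch {loss_epoch}")
```

If the best epoch is also the last epoch, as it is for these four, the metric is still improving and the model has probably not converged yet.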

* [1.5 points] Visualize the final results of all the CNNs and interpret/compare the performances. Use proper statistics as appropriate, especially for comparing models. 

In [None]:
# code
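
One option for comparing two models evaluated on the same test set is McNemar's test, which looks only at the images where the two models disagree. The sketch below is an exact (binomial) version written in plain Python. The predictions are hypothetical and `mcnemar_exact` is a name introduced for this illustration, so treat it as a starting point rather than our final analysis.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts cases only model A got right
    and c counts cases only model B got right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree
    # Under H0 each disagreement favors A or B with probability 0.5
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical predictions from two models on ten tumor scans
y_true = [1] * 10
pred_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
pred_b = [1] * 10

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, p={p_value:.4f}")
```

With only five disagreements the p-value stays above 0.05 even though model B wins every one of them, which shows why the number of disagreements matters when comparing models.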

* [1 points] Compare the performance of your convolutional network to a standard multi-layer perceptron (MLP) using the receiver operating characteristic and area under the curve. Use proper statistical comparison techniques.  

In [1]:
# code
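
For the ROC comparison, the area under the curve can be computed directly as the probability that a randomly chosen tumor scan receives a higher score than a randomly chosen healthy scan. Here is a minimal NumPy sketch of that pairwise definition. The scores are hypothetical stand-ins for CNN and MLP outputs, and `roc_auc` is a helper written for this example, not a library function.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]          # every positive score minus every negative score
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

# Hypothetical scores for the same four test scans
y_true = [0, 0, 1, 1]
mlp_scores = [0.1, 0.4, 0.35, 0.8]
cnn_scores = [0.2, 0.3, 0.6, 0.9]

mlp_auc = roc_auc(y_true, mlp_scores)
cnn_auc = roc_auc(y_true, cnn_scores)
print(f"MLP AUC: {mlp_auc:.2f}, CNN AUC: {cnn_auc:.2f}")
```

A single AUC difference is not a statistical comparison on its own. One way to get there would be to repeat the calculation over bootstrap resamples of the test set and look at the spread of the differences.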

### Exceptional Work (1 points total)
You have free rein to provide additional analyses. 
One idea (required for 7000 level students): Use transfer learning with pre-trained weights for your initial layers of your CNN. Compare the performance when using transfer learning to your best model from above in terms of classification performance. 

In [2]:
# code