# Complete Brain Tumor Classification Workflow

This notebook walks through the full brain tumor classification pipeline: setting up the environment, preprocessing the data, training a CNN model, evaluating it, and visualizing the results.

## Workflow Overview

1. **Mount Google Drive** (for Colab)
2. **Clone/Update Repository**
3. **Install Dependencies**
4. **Preprocess Data** (required for first run or when raw data changes)
5. **Train Model**
6. **Evaluate Model**
7. **Visualize Results**

## 1. Mount Google Drive (for Google Colab)

If you're running this in Google Colab, mount your Google Drive first. The later steps read data from and write results to a project folder under /content/drive/MyDrive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
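
Outside Colab the import above fails, because the google.colab package is only available there. A minimal sketch that mounts Drive only when Colab is detected; the helper name running_in_colab is hypothetical and not part of the repository:

```python
def running_in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available inside Colab)
    except ImportError:
        return False
    return True


if running_in_colab():
    from google.colab import drive
    drive.mount('/content/drive')
else:
    print("Not running in Colab; skipping the Drive mount.")
```

When running locally, point the Drive paths used in the later cells at a local folder instead.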

## 2. Clone/Update Repository

If the repository doesn't exist, it will be cloned. If it already exists, it will be updated with the latest changes.

In [None]:
# Check if repo exists, clone if it doesn't, or update if it does
import os

repo_name = "SE4050-Deep-Learning-Assignment"

# If this cell is re-run, the working directory is already the repo.
# Step back out so the check below doesn't clone a nested copy.
if os.path.basename(os.getcwd()) == repo_name:
    %cd ..

if not os.path.exists(repo_name):
    print(f"Repository {repo_name} not found. Cloning...")
    !git clone https://github.com/IT22052124/SE4050-Deep-Learning-Assignment.git
    %cd {repo_name}
else:
    print(f"Repository {repo_name} already exists. Updating...")
    %cd {repo_name}
    # Pull the latest changes from the repository
    !git pull
    # Show the latest commit to confirm the update
    !git log -1 --pretty=format:"Updated to: %h - %s (%an, %ar)"
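
The ! and %cd lines above only work inside IPython or Colab. For a plain Python script, the same clone-or-update logic could be sketched with subprocess. The function name sync_repo and its injectable run parameter are illustrative assumptions, and this version checks for a .git folder rather than just the directory name:

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/IT22052124/SE4050-Deep-Learning-Assignment.git"


def sync_repo(url, dest, run=subprocess.run):
    """Clone the repository into dest, or pull if dest already holds a checkout.

    `run` is injectable so the git calls can be replaced in tests.
    Returns the git command that was executed.
    """
    dest = Path(dest)
    if (dest / ".git").exists():
        cmd = ["git", "-C", str(dest), "pull"]
    else:
        cmd = ["git", "clone", url, str(dest)]
    run(cmd, check=True)  # raises CalledProcessError if git fails
    return cmd
```

Calling sync_repo(REPO_URL, "SE4050-Deep-Learning-Assignment") would clone on the first run and pull on later runs. Unlike %cd, it does not change the working directory.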

## 3. Install Dependencies

Install all required packages from requirements.txt.

In [None]:
!pip install -r requirements.txt
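
To confirm the install worked, one option is to list the installed version of each package named in requirements.txt. This is a rough sketch using only the standard library. The helper installed_versions is hypothetical, and its name parsing is deliberately simple (it does not handle URL or editable requirements):

```python
import re
from importlib import metadata
from pathlib import Path


def installed_versions(requirements_file):
    """Map each package named in a requirements file to its installed version.

    Packages that are not installed map to None. Comments, blank lines,
    and pip options (lines starting with "-") are skipped.
    """
    versions = {}
    for raw in Path(requirements_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name is everything before extras, version specifiers, or markers.
        name = re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if Path("requirements.txt").exists():
    for name, version in installed_versions("requirements.txt").items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```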

## 4. Preprocess the Data

**IMPORTANT**: This step is required when:
- You are setting up the project for the first time
- The raw data has changed
- You want to recreate the processed data structure

This step converts your raw images into a structured train/val/test format that the models can use. The preprocessed data will be saved in the designated output location.

In [None]:
# Configure data paths for preprocessing
# Adjust these to your specific folder structure if needed
import os
from pathlib import Path

# Define where your raw data is located
drive_root = "/content/drive/MyDrive/brain_tumor_project"  # Update this to your specific drive path
raw_data_path = Path(drive_root) / "data/archive"
output_data_path = Path(drive_root) / "data/processed"

# Create output directory if it doesn't exist
os.makedirs(output_data_path, exist_ok=True)

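# Run preprocessing script
# Note: no paths are passed on the command line here, so the script is assumed
# to locate the raw and output folders itself. Confirm that its configured
# paths match raw_data_path and output_data_path above.
!python src/common/preprocess.py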
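
After preprocessing, it can help to check how many images ended up in each split before training. The sketch below assumes the processed folder is laid out as split/class/image (for example train/some_class/img.jpg). That layout is a guess at what the preprocessing script writes, and count_images is a hypothetical helper:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(processed_dir):
    """Count image files per split and class under processed_dir.

    Assumes the layout processed_dir/<split>/<class>/<image>.
    """
    counts = {}
    for split_dir in sorted(Path(processed_dir).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts


processed = Path("/content/drive/MyDrive/brain_tumor_project/data/processed")
if processed.exists():
    for split, per_class in count_images(processed).items():
        print(split, per_class)
```

An empty split or a heavily unbalanced class count here is worth investigating before starting a training run.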

## 5. Train the CNN Model

Train the CNN model using the preprocessed data. You can customize hyperparameters as needed.

In [None]:
# Define paths and parameters for training
data_dir = "/content/drive/MyDrive/brain_tumor_project/data/processed"
output_dir = "/content/drive/MyDrive/brain_tumor_project/experiments/cnn"

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

# Run training with processed data structure
!python src/models/cnn/train_cnn.py \
  --data_dir={data_dir} \
  --output_dir={output_dir} \
  --use_processed \
  --epochs=20 \
  --batch_size=32 \
  --learning_rate=0.001
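
To try several hyperparameter settings, the command above can be assembled programmatically. This sketch reuses only the flags shown in the cell above. The helper build_train_command and the per-run output folder names are illustrative assumptions:

```python
import shlex


def build_train_command(data_dir, output_dir, epochs=20, batch_size=32, learning_rate=0.001):
    """Assemble the train_cnn.py command shown above as an argument list."""
    return [
        "python", "src/models/cnn/train_cnn.py",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        "--use_processed",
        f"--epochs={epochs}",
        f"--batch_size={batch_size}",
        f"--learning_rate={learning_rate}",
    ]


# Hypothetical sweep over two learning rates, each writing to its own folder.
for lr in (0.001, 0.0001):
    cmd = build_train_command("data/processed", f"experiments/cnn_lr{lr}", learning_rate=lr)
    # Prints the command only; pass cmd to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```

Writing each run to its own output folder keeps a later run from overwriting an earlier model.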

## 6. Evaluate the Model

Evaluate the trained model on the test set to get performance metrics and visualizations.

In [None]:
# Evaluate the trained model
# (model.h5 is the file the training step is expected to write to output_dir)
model_path = f"{output_dir}/model.h5"

!python src/models/cnn/evaluate_cnn.py \
  --data_dir={data_dir} \
  --model_path={model_path} \
  --output_dir={output_dir}/evaluation \
  --use_processed
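
The exact metrics the evaluation script reports are not shown here. As an illustration of how common classification metrics relate to a confusion matrix, the following sketch computes accuracy and per-class precision and recall in plain Python. The function name and the example counts are made up:

```python
def metrics_from_confusion(matrix):
    """Compute accuracy and per-class precision/recall from a confusion matrix.

    matrix[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    per_class = []
    for i in range(n):
        predicted_i = sum(matrix[r][i] for r in range(n))  # column sum
        actual_i = sum(matrix[i])                          # row sum
        per_class.append({
            "precision": matrix[i][i] / predicted_i if predicted_i else 0.0,
            "recall": matrix[i][i] / actual_i if actual_i else 0.0,
        })
    return {"accuracy": correct / total if total else 0.0, "per_class": per_class}


# Made-up counts for a two-class example, for illustration only.
example = [[5, 1],
           [2, 4]]
print(metrics_from_confusion(example))
```

Per-class recall matters in a medical setting because overall accuracy can hide a class that the model frequently misses.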

## 7. Visualize Results

Display evaluation metrics and visualizations.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

# Load and display metrics
eval_dir = Path(output_dir) / "evaluation"
metrics_file = eval_dir / "metrics.csv"
if metrics_file.exists():
    metrics = pd.read_csv(metrics_file)
    print("Model Performance Metrics:")
    print(metrics)
else:
    print(f"No metrics file found at {metrics_file}")

# Display confusion matrix if available
cm_file = eval_dir / "confusion_matrix.png"
if cm_file.exists():
    plt.figure(figsize=(8, 6))
    img = plt.imread(cm_file)
    plt.imshow(img)
    plt.axis('off')
    plt.title('Confusion Matrix')
    plt.show()
else:
    print(f"No confusion matrix image found at {cm_file}")