<a href="https://colab.research.google.com/github/KedarPanchal/Breast-Cancer-Detector/blob/main/tumor_detector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning EfficientNet-B1 for Breast Cancer Classification

This notebook walks through the process of fine-tuning the EfficientNet-B1 model on the Breast Ultrasound Images Dataset (BUSI) to classify identified tumors in ultrasounds as benign or malignant.

AI models such as this one can act as a "second set of eyes" to help improve radiologist accuracy and ensure that fewer malignant tumors are misclassified as benign, allowing more people to receive the treatment they need.

**Dataset:** Dataset of breast ultrasound images by Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy.

### Notebook Sections:
1. Check Python Version and Import Dependencies
2. Download, Preprocess, and Load Data
3. Initialize Training and Evaluation Device and Functions
4. Define Model Architecture and Prepare for Training
5. Cross-Validate the Model
6. Final Model, Conclusions, and Bibliography

This model was developed on an M4 MacBook Pro with 16 GB of unified memory, a 10-core CPU, and a 10-core GPU.

## Section 1: Check Python Version and Import Dependencies

### Python Version
This neural network runs on Python 3.12 to ensure compatibility with its dependencies. If you are running this notebook in a virtual environment, ensure you have the correct runtime selected by running the cell below.

In [None]:
!python --version

### Install Packages
Install the following packages for use in the notebook:
* **Torch:** The model is built using the PyTorch framework (this is also what limits the Python version to <= 3.12)
* **Torchvision:** Has functions for handling and preparing datasets for PyTorch models
* **Opendatasets:** Download datasets from the Kaggle online repository
* **Scikit-Learn:** Use its k-fold dataset splitting functionality for k-fold cross validation.

In [None]:
%pip install torch
%pip install torchvision
%pip install opendatasets
%pip install scikit-learn

### Import Necessary Dependencies
Import necessary dependencies for modifying and fine-tuning a pretrained model, loading and transforming data, splitting data, calculating hyperparameters, and logging information during training.

In [None]:
# Model development
import torch
import torch.nn as nn
from torchvision import models

# Model training
import torch.optim as optim
from torch.optim import lr_scheduler
import copy
import os

# Data loading, transforming, and splitting
from torchvision.datasets import ImageFolder
from torchvision.transforms import v2
from torch.utils.data import DataLoader, Subset
from sklearn.model_selection import KFold

# Logging information during training
import time

## Section 2: Download, Preprocess, and Load Data

### Download Dataset
> Prior to running this code block, ensure you have access to your Kaggle username and API Key, as the download will prompt you to enter this information. Visit the Kaggle website for information on how to acquire an API key.

Download the breast tumor ultrasound images from Kaggle for use in training the model.

The neural network uses breast cancer ultrasound data from:

* The Breast Ultrasound Images (BUSI) Dataset (Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data in Brief. 2020 Feb;28:104863. DOI: 10.1016/j.dib.2019.104863.)

The BUSI dataset has an additional "normal" class of ultrasounds that contain no tumors. This class is deleted in this cell: the purpose of this model is to identify whether a detected tumor is malignant or benign, so images with no tumor provide no value to it. The segmentation mask images (`*_mask*.png`) that ship with the dataset are deleted as well, since only the ultrasound images themselves are used for classification.

In [None]:
import opendatasets
opendatasets.download("https://www.kaggle.com/datasets/aryashah2k/breast-ultrasound-images-dataset")

!mkdir data
# Remove the normal class of ultrasounds
!rm -rf breast-ultrasound-images-dataset/Dataset_BUSI_with_GT/normal
!mv breast-ultrasound-images-dataset/Dataset_BUSI_with_GT/* data
!rm -rf breast-ultrasound-images-dataset
!find data -type f -name "*_mask*.png" -delete

### Delete .DS_Store Files
macOS (the platform this model was developed on) adds `.DS_Store` files to folders. Delete these as they're not needed in the training data.

In [None]:
!find . -name ".DS_Store" -print -delete
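The shell commands in the two cells above assume a Unix-like environment. As a rough, platform-independent alternative, the same cleanup can be sketched with Python's standard library. The helper name `prepare_busi` and its parameters are hypothetical and not part of the original notebook; the folder names are taken from the commands above.

```python
import shutil
from pathlib import Path


def prepare_busi(download_dir, data_dir):
    """Move the benign/malignant folders into data_dir and delete unneeded files.

    Hypothetical helper: a sketch of the shell cleanup steps, not the notebook's own code.
    Returns the number of mask and .DS_Store files that were deleted.
    """
    download_dir, data_dir = Path(download_dir), Path(data_dir)
    source = download_dir / "Dataset_BUSI_with_GT"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Remove the "normal" class, which contains no tumors
    shutil.rmtree(source / "normal", ignore_errors=True)

    # Move the remaining class folders into the data directory
    for entry in sorted(source.iterdir()):
        shutil.move(str(entry), str(data_dir / entry.name))
    shutil.rmtree(download_dir)

    # Delete segmentation masks and macOS metadata files
    removed = 0
    for pattern in ("*_mask*.png", ".DS_Store"):
        for path in list(data_dir.rglob(pattern)):
            if path.is_file():
                path.unlink()
                removed += 1
    return removed


# Example call, assuming the dataset was downloaded into the current directory:
# prepare_busi("breast-ultrasound-images-dataset", "data")
```

The function returns a count so that a quick sanity check on the number of deleted files is possible after it runs.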

### Verify Mean and Standard Deviation of Dataset
Calculate the mean and standard deviation of the dataset in order to effectively normalize the data to boost model performance. The standard deviation calculation utilizes a variation of the variance formula, where:
$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2
$$
$\sigma^2 =$ variance, or standard deviation squared

$n =$ number of data points

$x_i =$ value in the data at index $i$

$\mu =$ mean of the data

In [None]:
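to_tensor = v2.Compose([
    v2.Grayscale(num_output_channels=1),
    v2.Resize((224, 224)),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
])

mean_std_dataset = ImageFolder(root="data", transform=to_tensor)
mean_std_loader = DataLoader(dataset=mean_std_dataset, batch_size=32, shuffle=False)

# Running sums of pixel values and squared pixel values
# (named to avoid shadowing Python's built-in sum)
pixel_sum = torch.tensor([0.0])
pixel_sq_sum = torch.tensor([0.0])
# Total pixel count; based on the dataset length because the last batch may hold fewer than 32 images
n = len(mean_std_dataset) * (224**2)

for images, _ in mean_std_loader:
    # Sum over the batch, height, and width dimensions, leaving the single channel
    pixel_sum += images.sum(dim=[0, 2, 3])
    pixel_sq_sum += (images**2).sum(dim=[0, 2, 3])

mean = pixel_sum / n
std = torch.sqrt(pixel_sq_sum / n - mean**2)

# The transforms later in this notebook use a mean of 0.3178 and a standard deviation of 0.2253;
# compare the printed values against these
print(f"Mean: {mean}, Standard Deviation: {std}")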
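The variance identity used in this section can be checked on a handful of values without loading any images. The following is a minimal sketch in plain Python; the function name `mean_and_std` and the sample values are made up for illustration.

```python
import math
import statistics


def mean_and_std(values):
    """Single-pass mean and population standard deviation via E[x^2] - mu^2."""
    n = len(values)
    total = math.fsum(values)
    total_sq = math.fsum(v * v for v in values)
    mean = total / n
    # Clamp at zero: rounding can make the difference very slightly negative
    variance = max(total_sq / n - mean**2, 0.0)
    return mean, math.sqrt(variance)


# Illustrative pixel intensities already scaled to the range [0, 1]
pixels = [0.0, 0.25, 0.5, 0.75, 1.0]
mean, std = mean_and_std(pixels)

# The result should agree with the textbook population standard deviation
print(mean, std, statistics.pstdev(pixels))
```

Only two running sums are needed, which is why the dataset can be processed batch by batch instead of being held in memory all at once.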

### Initialize and Transform Datasets
Load and transform images from the dataset into a more suitable format for the model.

The data is turned into a labeled dataset with the following labels:
* Images in `data/benign` will have a label `0`
* Images in `data/malignant` will have a label `1`

Images in the dataset are also transformed, depending on whether they're used for training or evaluation: 
* Training images are converted to grayscale (ultrasounds are in black and white anyway, so training on 3 color channels is a waste of compute), randomly augmented each time they are loaded to improve model generalizability (horizontal flips, rotations, autocontrast, sharpness adjustment, and histogram equalization), resized to `224x224` pixels, transformed to tensors, and normalized using a mean of 0.3178 and standard deviation of 0.2253.
* Evaluation images are converted to grayscale, resized to `224x224` pixels, transformed to tensors, and normalized using a mean of 0.3178 and standard deviation of 0.2253.

In [None]:
train_transform = v2.Compose([
    v2.Grayscale(num_output_channels=1),
    v2.RandomHorizontalFlip(0.5),
    v2.RandomRotation(20),
    v2.RandomAutocontrast(0.3),
    v2.RandomAdjustSharpness(2, 0.3),
    v2.RandomEqualize(0.2),

    v2.Resize((224, 224)),

    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.3178], std=[0.2253])
])

test_transform = v2.Compose([
    v2.Grayscale(num_output_channels=1),
    v2.Resize((224, 224)),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.3178], std=[0.2253])
])

dataset = ImageFolder(root="data", transform=None)
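To make the last two steps of each transform concrete, the sketch below applies the same arithmetic to a single 8-bit pixel value: scaling to the range [0, 1], then subtracting the mean and dividing by the standard deviation. The function name `normalize_pixel` is hypothetical and only illustrates the calculation.

```python
def normalize_pixel(value_8bit, mean=0.3178, std=0.2253):
    """Scale an 8-bit intensity to [0, 1], then standardize it."""
    scaled = value_8bit / 255.0
    return (scaled - mean) / std


# Black, a pixel at the dataset mean, and white
for value in (0, 0.3178 * 255, 255):
    print(f"{value:7.2f} -> {normalize_pixel(value):+.4f}")
```

A pixel at the dataset mean maps to roughly zero, so the model sees inputs centered around zero with unit-scale spread.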

## Section 3: Initialize Training and Evaluation Device and Functions

### Select Device for Training
Select the best available device for training, testing, and performing inferences with the AI model. If a CUDA GPU is available, all calculations will be performed on the GPU. If an M-series Mac is used, PyTorch's MPS backend is used. Otherwise, all calculations will be done on the CPU.

In [None]:
device = "cpu"
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"

print(f"Device: {device}")

### Define Training Function
Define the function used to train the model. The function logs the following:
* The current fold if performing k-fold cross validation
* The current epoch the model is being trained on
* The current batch in the current epoch the model is being trained on
* The cumulative loss across the past 10 batches
* The time it took to train the past 10 batches

The function also contains the necessary reshaping and casting to make the model's outputs compatible with the loss function used to train this model (initialized in a later cell).

In [None]:
def train_model(model, data_loader, optimizer, loss_fn, scheduler, current_fold=None, num_epochs=20, device=device):
    model.train()
    for epoch in range(num_epochs):
        i = 0
        start_time = time.time()
        current_loss = 0.0

        for inputs, labels in data_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = loss_fn(outputs.view(-1), labels.float())

            loss.backward()
            optimizer.step()
            scheduler.step()

            current_loss += loss.item()
            if i % 10 == 9 or i == len(data_loader) - 1:
                end_time = time.time()
                if current_fold is not None:
                    print(f"[Fold: {current_fold + 1}, Epoch: {epoch + 1}/{num_epochs}, Batch: {i + 1}/{len(data_loader)}] Loss: {current_loss:0.5f}, Time Elapsed: {end_time - start_time:0.5f}s")
                else:
                    print(f"[Epoch: {epoch + 1}/{num_epochs}, Batch: {i + 1}/{len(data_loader)}] Loss: {current_loss:0.5f}, Time Elapsed: {end_time - start_time:0.5f}s")
                current_loss = 0.0
                start_time = end_time
            i += 1


    print("Training Complete!")

### Define Evaluation Function
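Define the function used to evaluate the model at a given classification threshold: an image is predicted as malignant when the sigmoid of the model's output logit exceeds the threshold. The function returns a 4-element tuple containing the classification model's accuracy, precision, recall, and F1 score (in that order).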

In [None]:
def evaluate_model(model, data_loader, threshold, device=device):
    total = 0
    correct = 0
    total_positive = 0
    predicted_positive = 0
    predicted_positive_correct = 0
    model.eval()
    with torch.no_grad():
        for images, labels in data_loader:
            images, labels = images.to(device), labels.to(device)
            # Flatten the (batch, 1) logits to (batch,) so comparisons with labels are element-wise
            outputs = model(images).view(-1)
            predicted = (torch.sigmoid(outputs) > threshold).long()
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            total_positive += (labels == 1).sum().item()
            predicted_positive += (predicted == 1).sum().item()
            # True positives: element-wise AND (Python's `and` fails on multi-element tensors)
            predicted_positive_correct += ((predicted == 1) & (labels == 1)).sum().item()

    accuracy = correct / total
    # Guard against division by zero when no positives are predicted or present
    precision = predicted_positive_correct / predicted_positive if predicted_positive else 0.0
    recall = predicted_positive_correct / total_positive if total_positive else 0.0
    f1_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return (accuracy, precision, recall, f1_score)
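The effect of the threshold on these four metrics can be seen on a small, made-up batch. The sketch below reimplements the same calculations in plain Python; the function `binary_metrics`, the logits, and the labels are invented for illustration and do not come from the model.

```python
import math


def binary_metrics(logits, labels, threshold):
    """Return (accuracy, precision, recall, f1) for logits thresholded after a sigmoid."""
    tp = fp = fn = correct = 0
    for logit, label in zip(logits, labels):
        probability = 1.0 / (1.0 + math.exp(-logit))
        predicted = 1 if probability > threshold else 0
        correct += predicted == label
        tp += predicted == 1 and label == 1
        fp += predicted == 1 and label == 0
        fn += predicted == 0 and label == 1
    accuracy = correct / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Made-up logits and labels (1 = malignant, 0 = benign)
logits = [2.0, -0.5, -3.0, 0.1, -1.2, 1.5]
labels = [1, 1, 0, 0, 0, 1]

for threshold in (0.5, 0.3):
    print(threshold, binary_metrics(logits, labels, threshold))
```

On this toy batch, lowering the threshold from 0.5 to 0.3 recovers the malignant example with logit -0.5, so recall rises from 2/3 to 1 while the one false positive remains.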

## Section 4: Define Model Architecture and Prepare for Training

### Initialize EfficientNet-B1 Model
Initialize the classification model.

The classification model used is a fine-tuning of the EfficientNet-B1 architecture. EfficientNet-B1 was selected due to its balance between computational efficiency, performance, training time, and generalizability. The model is initialized using its pretrained weights used in classifying the ImageNet dataset (the data the original EfficientNet models were trained on) to leverage its existing knowledge as a starting point. Since ImageNet has RGB images as its data and 1000 output classes, the model's input and classification layers are modified to accept grayscale images and have a single output logit for binary classification.

* The input layer is modified to accept grayscale images (1 input feature in the initial `Conv2d` layer instead of the 3 usually used when processing RGB images).
* The final classification head is replaced with a more robust multilayer perceptron classifier that outputs a single logit: whether the input image contains a benign or malignant tumor. `SiLU` activation layers are utilized in this classification head to maintain consistency with the rest of the EfficientNet-B1 model (which uses `SiLU` consistently as an activation function), and `Dropout` layers are placed to ensure that the model does not overfit.

In [None]:
cancer_net = models.efficientnet_b1(weights=models.EfficientNet_B1_Weights.DEFAULT)
cancer_net.features[0] = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.SiLU(inplace=True)
)
cancer_net.classifier = nn.Sequential(
    nn.Dropout(0.5),
    nn.Linear(cancer_net.classifier[1].in_features, 512),
    nn.SiLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(512, 128),
    nn.SiLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(128, 1)    
)

### Initialize Loss, Model State Dictionary, and Number of Training Epochs
Set up variables for use in final model training and k-fold cross-validation:
* `num_epochs`: The number of epochs the model is trained for during cross-validation and final training. 10 epochs were selected as a balance between underfitting and overfitting the model.
* `cancer_net`: Moves the model to the selected device.
* `loss_fn`: Uses `BCEWithLogitsLoss` to calculate the binary cross-entropy loss of the model, as the model is a binary classification model whose output is a single logit. Its `pos_weight` argument scales the loss on malignant (positive) examples by 1.2 times the ratio of benign to malignant images, which offsets the class imbalance and penalizes missed malignant tumors more heavily.
* `state_dict`: Saves a deep copy of the model's initial weights so that they can be restored before each fold of k-fold cross-validation.

In [None]:
num_epochs = 10
cancer_net = cancer_net.to(device)
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([1.2 * len(os.listdir("./data/benign"))/len(os.listdir("./data/malignant"))]).to(device))
state_dict = copy.deepcopy(cancer_net.state_dict())
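To see what `pos_weight` does, the per-example loss can be written out by hand. The sketch below is plain Python rather than the PyTorch implementation, and the class counts are hypothetical placeholders, not the dataset's real counts.

```python
import math


def bce_with_logits(logit, label, pos_weight=1.0):
    """Binary cross-entropy on a raw logit, with the positive term scaled by pos_weight."""
    log_p = -math.log1p(math.exp(-logit))     # log(sigmoid(logit))
    log_not_p = -math.log1p(math.exp(logit))  # log(1 - sigmoid(logit))
    return -(pos_weight * label * log_p + (1 - label) * log_not_p)


# Hypothetical class counts, for illustration only
num_benign, num_malignant = 400, 200
pos_weight = 1.2 * num_benign / num_malignant

# Two equally confident mistakes: a missed malignant tumor and a false alarm
missed_malignant = bce_with_logits(-1.0, 1, pos_weight)
false_alarm = bce_with_logits(1.0, 0, pos_weight)
print(pos_weight, missed_malignant, false_alarm)
```

With these placeholder counts, a missed malignant example costs `pos_weight` times as much as an equally confident false alarm, which pushes training toward higher recall.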

## Section 5: Cross-Validate the Model

### Train the EfficientNet-B1 Model Using K-Fold Cross Validation
Cross-validate the custom EfficientNet-B1 model across 5 folds, keeping track of its average accuracy, precision, recall, and F1 score across all 5 folds. 

During each fold, the following is done:
* The model's optimizer and scheduler are reinitialized.
* The training and evaluation data is split, transformed, and loaded according to the current fold.
* The model is trained on the training data in batches of 32 images.
* The model is evaluated on the evaluation data.
* The model's weights are reinitialized to the default for the next fold.
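
The splitting step can be illustrated without scikit-learn. The sketch below is a simplified stand-in for the idea behind `KFold`: shuffle the indices once with a fixed seed, then let each fold take a turn as the evaluation set. The function `k_fold_indices` is hypothetical, and it will not reproduce the exact indices scikit-learn generates.

```python
import random


def k_fold_indices(num_samples, num_folds, seed=42):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)

    # Spread any remainder over the first folds so sizes differ by at most one
    base, extra = divmod(num_samples, num_folds)
    start = 0
    for fold in range(num_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for fold, (train_i, test_i) in enumerate(k_fold_indices(10, 5)):
    print(f"Fold {fold + 1}: train={sorted(train_i)}, test={sorted(test_i)}")
```

Every sample lands in exactly one evaluation set, so averaging the per-fold metrics uses each image for evaluation once.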

The model uses the `AdamW` optimizer with a weight decay of 0.0001 and a `CyclicLR` scheduler with exponential scaling and quick cycles. The `KFold` dataset splitter is initialized with a random state of 42 to ensure reproducibility. When implementing this model, the random state can be removed to ensure random dataset splitting. The number 42 was selected because it is the answer to life, the universe, and everything, and thus seemed to satisfy the question: "what value should I seed my dataset splitter's randomizer to?"

The model's average performance across all folds upon training was as follows (this may differ slightly when re-run due to the random transformations performed on the training data and some random weight initializations in the modified model):
* Accuracy: 84.5343%
* Precision: 71.3774%
* Recall: 88.2413%
* F1 Score: 0.788155

In [None]:
folds = 5
batch_size = 32

total_a, total_p, total_r, total_f1 = 0, 0, 0, 0

# Two views of the same files. Subsets of one shared ImageFolder also share its transform,
# so setting the evaluation transform would overwrite the training transform.
train_dataset = ImageFolder(root="data", transform=train_transform)
test_dataset = ImageFolder(root="data", transform=test_transform)

k_fold = KFold(n_splits=folds, shuffle=True, random_state=42)
for fold, (train_i, test_i) in enumerate(k_fold.split(dataset)):
    optimizer = optim.AdamW(cancer_net.parameters(), lr=1e-3, weight_decay=1e-4)
    scheduler = lr_scheduler.CyclicLR(optimizer, base_lr=3e-4, max_lr=1e-3, step_size_up=2, mode="exp_range")

    # Training images get the random augmentations
    train_data = Subset(dataset=train_dataset, indices=train_i)
    train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)

    # Evaluation images are only resized and normalized
    test_data = Subset(dataset=test_dataset, indices=test_i)
    test_loader = DataLoader(dataset=test_data, batch_size=1, shuffle=False)

    train_model(cancer_net, train_loader, optimizer, loss_fn, scheduler, current_fold=fold, num_epochs=num_epochs)
    accuracy, precision, recall, f1_score = evaluate_model(cancer_net, data_loader=test_loader, threshold=0.3)
    total_a += accuracy
    total_p += precision
    total_r += recall
    total_f1 += f1_score

    print(f"Test Accuracy: {accuracy:0.7f}")
    print(f"Test Precision: {precision:0.7f}")
    print(f"Test Recall: {recall:0.7f}")
    print(f"Test F1 Score: {f1_score:0.7f}")
    
    cancer_net.load_state_dict(state_dict)

print()
print(f"Average Accuracy: {total_a/folds:0.7f}")
print(f"Average Precision: {total_p/folds:0.7f}")
print(f"Average Recall: {total_r/folds:0.7f}")
print(f"Average F1 Score: {total_f1/folds:0.7f}")
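The learning-rate cycle configured above can be traced by hand. The sketch below implements the triangular cyclical schedule formula in plain Python using the notebook's `base_lr`, `max_lr`, and `step_size_up` values; it is not PyTorch's implementation, and the function name `cyclic_lr` is hypothetical. The `gamma` factor is an assumption: the notebook does not pass one, so if the library default of 1.0 applies, the `exp_range` amplitude does not decay.

```python
import math


def cyclic_lr(iteration, base_lr=3e-4, max_lr=1e-3, step_size=2, gamma=1.0):
    """Learning rate at a given batch index for a triangular cycle scaled by gamma**iteration."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    # Distance from the peak of the current cycle: 1 at the base, 0 at the peak
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * gamma**iteration


# The training loop steps the scheduler once per batch, so one full cycle spans 4 batches
for batch in range(9):
    print(batch, f"{cyclic_lr(batch):.6f}", f"{cyclic_lr(batch, gamma=0.9):.6f}")
```

The second column shows a hypothetical `gamma` of 0.9, where each peak is lower than the last; comparing the two columns makes it easier to decide whether a decaying amplitude is wanted.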

## Section 6: Final Model, Conclusions, and Bibliography

### Train the Final Model
Train the model now that the model's architecture and hyperparameters have been cross-validated. 

The model's weights are reinitialized to be trained on the entirety of the BUSI dataset, transformed according to the training transformations specified earlier.

In [None]:
cancer_net.load_state_dict(state_dict=state_dict)

optimizer = optim.AdamW(cancer_net.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = lr_scheduler.CyclicLR(optimizer, base_lr=3e-4, max_lr=1e-3, step_size_up=2, mode="exp_range")

dataset.transform = train_transform
dataset_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True)

train_model(cancer_net, dataset_loader, optimizer, loss_fn, scheduler, None, num_epochs)

### Save the Final Model's Weights
Save the model's weights for future inference since it has been fully trained.

In [None]:
torch.save(cancer_net.state_dict(), "cancer_net_weights.pth")

### Conclusion
The fine-tuned version of EfficientNet-B1 creates a moderately accurate classification model that balances training time and inference speed with performance. Because this model was developed on macOS, it was trained using the Metal Performance Shaders (MPS) backend. This backend is missing many of the features available to CUDA devices that improve training speed, most notably mixed-precision training and JIT compilation, so model performance was sacrificed for reduced development time. If developed on a CUDA device, a much deeper custom architecture (such as Fus2Net) could be used without exorbitant training and inference times.

If this model were to be used in an applied setting, tumors marked as benign should also be analyzed by an experienced radiologist to ensure that false negatives don't slip through the model's 88% recall. Due to its lower precision (about 71%), there's also a chance that the model may classify benign tumors as potentially malignant, but this is less worrisome in a medical setting, as further investigation of potential false positives is always the safest move.

### Bibliography
* Al-Dhabyani W, et al. Dataset of breast ultrasound images. Data in Brief. 2020 Feb;28:104863. *DOI: 10.1016/j.dib.2019.104863*
* Ma H, et al. Fus2Net: A Novel Convolutional Neural Network for Classification of Benign and Malignant Breast Tumor in Ultrasound Images. *Research Square preprint. DOI: 10.21203/rs.3.rs-853246/v1*
* Tan M, Le QV. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. *arXiv:1905.11946*