# **Let's make a Binary Tumor Classifier using Deep Learning and Pytorch**

Our main focus is getting a model than can classify benign and malignant tumors just by looking at an image

For this project we will be working with the ISIC dataset, in this case a small subset with a bit less than 400 images classified in a CSV file as malignant or benign. 

(Available right here:  [https://challenge.isic-archive.com/data/](http://))

First, we'll define the objective, getting at least 70% accuracy for our baseline(a baseline is just a first prototype) that we'll later train to over 95% accuracy.

In this notebook you will learn how:

*     Prepare the metadata from a CSV (**C**omma **S**eparated **V**alues) file

*     Use different techniques so we can get enough data (more on this later)

*     Create a CustomDataset class so we can create our DataLoaders (DataLoaders are split into Training and Validation, more on this later)    

*     Briefly go over the concept of how a Neural Network learns and how Gradient Descent and Loss functions work

*     Define the training and validation functions to assess our model's effectiveness

*     Choose what kind of architecture/pretrained model we will use for this task

*     Train our baseline-model so it can reach our objective accuracy of 80%

# Preparing the data and metadata

Here we'll import the necessary libraries:

In [1]:
!pip install torch-summary
!pip install timm

Collecting torch-summary
  Downloading torch_summary-1.4.5-py3-none-any.whl.metadata (18 kB)
Downloading torch_summary-1.4.5-py3-none-any.whl (16 kB)
Installing collected packages: torch-summary
Successfully installed torch-summary-1.4.5


In [2]:
import torch
from pathlib import Path
import pandas as pd
from PIL import Image
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader

Define the paths to our images and our csv file, We'll be working with **Kaggle**, an ML competition site with free **GPU's** to train models faster.

In [3]:
image_path = Path('/kaggle/input/isic-small/ISIC-images')
csv_path = Path('/kaggle/input/isic-small/ISIC-images/metadata.csv')

We read the CSV file with *pandas* (imported as ***pd***) and print the first lines.

In [4]:
# Read the csv file
metadata_df = pd.read_csv(csv_path)
metadata_df.head()

Unnamed: 0,isic_id,attribution,copyright_license,age_approx,anatom_site_general,benign_malignant,clin_size_long_diam_mm,concomitant_biopsy,diagnosis,diagnosis_confirm_type,family_hx_mm,image_type,melanocytic,personal_hx_mm,sex
0,ISIC_0000003,Anonymous,CC-0,30.0,upper extremity,benign,,False,nevus,,,dermoscopic,True,,male
1,ISIC_0000012,Anonymous,CC-0,30.0,posterior torso,benign,,False,nevus,,,dermoscopic,True,,male
2,ISIC_0000013,Anonymous,CC-0,30.0,posterior torso,malignant,,True,melanoma,histopathology,,dermoscopic,True,,female
3,ISIC_0000014,Anonymous,CC-0,35.0,posterior torso,benign,,False,nevus,,,dermoscopic,True,,male
4,ISIC_0000015,Anonymous,CC-0,35.0,posterior torso,benign,,False,nevus,,,dermoscopic,True,,male


That's a lot of columns, for this problem we only care about the **'isic_id'** column and the **'benign_malignant'** column. Luckily, *pandas* allows us to throw away the other columns in just one line.
We will also print the value counts to see how many benign and how many malignant images we have:

In [5]:
metadata_df = metadata_df[['isic_id', 'benign_malignant']]
print(metadata_df['benign_malignant'].value_counts())
metadata_df.head()

benign_malignant
benign                     303
malignant                   73
indeterminate                1
indeterminate/malignant      1
Name: count, dtype: int64


Unnamed: 0,isic_id,benign_malignant
0,ISIC_0000003,benign
1,ISIC_0000012,benign
2,ISIC_0000013,malignant
3,ISIC_0000014,benign
4,ISIC_0000015,benign


There are 303 benign images and 73 malignant images, we will have to oversample the malignant ones to get better results (oversampling is basically repeating images so we can balance the two classes). We also have to remove the indeterminate images as we are making a binary classification model.

In [6]:
# We remove the indeterminate values
metadata_df = metadata_df[(metadata_df['benign_malignant'] == 'benign') | (metadata_df['benign_malignant'] == 'malignant')]
print(metadata_df['benign_malignant'].value_counts())
metadata_df.head()

benign_malignant
benign       303
malignant     73
Name: count, dtype: int64


Unnamed: 0,isic_id,benign_malignant
0,ISIC_0000003,benign
1,ISIC_0000012,benign
2,ISIC_0000013,malignant
3,ISIC_0000014,benign
4,ISIC_0000015,benign


After keeping only the relevant columns and images, we define our *get_image_path* function to return the path of each image. We create an **'image_path'** column and ensure that all rows in *metadata_df* have a valid image_path. We do this by creating an **'image_exists'** column and then dropping it after we're done.

In [7]:
# Define a function to return an image path
def get_image_path(isic_id):
    return image_path / f"{isic_id}.jpg"

# Create a image_path column
metadata_df['image_path'] = metadata_df['isic_id'].apply(get_image_path)

# Ensure the image exists
metadata_df['image_exists'] = metadata_df['image_path'].apply(lambda x: x.exists())

# Remove rows where images do not exist
metadata_df = metadata_df[metadata_df['image_exists']]

# Remove the image_exists column
metadata_df = metadata_df.drop(columns = ['image_exists'])

metadata_df.head()

Unnamed: 0,isic_id,benign_malignant,image_path
0,ISIC_0000003,benign,/kaggle/input/isic-small/ISIC-images/ISIC_0000...
1,ISIC_0000012,benign,/kaggle/input/isic-small/ISIC-images/ISIC_0000...
2,ISIC_0000013,malignant,/kaggle/input/isic-small/ISIC-images/ISIC_0000...
3,ISIC_0000014,benign,/kaggle/input/isic-small/ISIC-images/ISIC_0000...
4,ISIC_0000015,benign,/kaggle/input/isic-small/ISIC-images/ISIC_0000...


If we print the value_counts of the **benign_malignant** column:

In [8]:
metadata_df['benign_malignant'].value_counts()

benign_malignant
benign       303
malignant     73
Name: count, dtype: int64

# Balancing the classes

We can clearly see that there is a class imbalance. This means that there are a lot more images of the benign class than there are of the malignant class. To solve this we will use a technique called oversampling, where we basically repeat the malignant images until we have enough to make the imbalance less noticeable

To do this we will separate *metadata_df* into **benign_df** and **malignant_df**

In [9]:
benign_df = metadata_df[metadata_df['benign_malignant'] == 'benign']
malignant_df = metadata_df[metadata_df['benign_malignant'] == 'malignant']

We now define a custom function to oversample images called *oversample_items*.

This function is a simple integer division added to a remainder to reach the target count.

In [10]:
# Define a custom function to oversample the malignant images
def oversample_items(items, target_count):
    return (items * (target_count // len(items))) + items[:target_count % len(items)]

> *(items * (target_count // len(items)))* multiplies the items, let items = [1,2,3], by the integer division of target_count / len(items). Let *target_count* be 10. This would multiply the list by 10//3 (which equals 3), making our original list [1,2,3,1,2,3,1,2,3].
> 
> After this we add *items[:target_count % len(items)]* which in this case would be items[:1], which is equal to the first element of the list, in this case one. Our final list being [1,2,3,1,2,3,1,2,3,1].

In [11]:
# Desired number of augmented malignant images
oversampled_malignant = 200

# Oversample the malignant image file paths
malignant_items_list = oversample_items(malignant_df['image_path'].to_list(), oversampled_malignant)

After calling the function to oversample the malignant items we can get *malignant_items_list* and concatenate it with **pd.concat** along with *benign_df*, creating a dataframe that holds the image_paths and the labels.

In [12]:
# Create a combined DataFrame
combined_df = pd.concat([
    pd.DataFrame({'image_path': benign_df['image_path'], 'label': 'benign'}),
    pd.DataFrame({'image_path': malignant_items_list, 'label': 'malignant'})
])
combined_df.head()

Unnamed: 0,image_path,label
0,/kaggle/input/isic-small/ISIC-images/ISIC_0000...,benign
1,/kaggle/input/isic-small/ISIC-images/ISIC_0000...,benign
3,/kaggle/input/isic-small/ISIC-images/ISIC_0000...,benign
4,/kaggle/input/isic-small/ISIC-images/ISIC_0000...,benign
5,/kaggle/input/isic-small/ISIC-images/ISIC_0000...,benign


Now we move on to step two. We create the *CustomDataset* class so we can later create the *Dataset* object and eventually create the *DataLoaders*.

# Creating the CustomDataset class and the DataLoaders

In [13]:
# Let's create the CustomDataset class so we can create the Dataset for our model
class CustomDataset(Dataset):
    def __init__(self, image_paths, categories, transforms):
        self.image_paths = image_paths
        self.categories = categories
        self.transforms = transforms
        
    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, idx):
        img_path = self.image_paths[idx]
        image = Image.open(img_path).convert('RGB')
        label = 1 if (self.categories[idx] == 'malignant') else 0
        if self.transforms:
            image = self.transforms(image)
        label = torch.tensor(label)
        return image, label

Next, we convert all the values in the **'image_path'** column to a list, same for the **'label'** column, and we create our *CustomDataset* object with the pertinent transformations.

In [26]:
import torchvision.transforms as transforms
from sklearn.model_selection import train_test_split

train_transformss = transforms.Compose([
    transforms.Resize((224, 224)), # Resize the image
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=20),
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.RandomGrayscale(p=0.1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

valid_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

images = combined_df['image_path'].to_list()
categories = combined_df['label'].to_list()

# Split the dataset into training and validation as to only transform the training images
train_images, valid_images, train_categories, valid_categories = train_test_split(
    images, categories, test_size=0.2, random_state=42, stratify=categories
)

train_dataset = CustomDataset(images, categories, train_transformss)
valid_dataset = CustomDataset(valid_images, valid_categories, valid_transforms)

> The transformations defined are:
>
>     -Resize(height, width) which ensures all images are the same size.
>
>     -ToTensor(image) which turns the image into pixel values for all three channels(RGB).
>
>     -Normalize(mean, standard deviation) which applies normalization to the image tensor so the values are similar to what the model has seen. These are the standard values for ImageNet which have been used for models like ResNet.  


>We have split the transforms as to not alter the validation dataset and only apply data augmentation in the training dataset.
The rest are just transformations to increase the variability in our model and its ability to generalize and correctly classify images it has not seen before.

Let's display an image tensor. We can fetch it from our *CustomDataset* object by calling the __getitem__ function and giving the index parameter a value, in this case the value 1.

In [27]:
from IPython.display import display

image, label = train_dataset.__getitem__(1)
display(image)

tensor([[[ 0.0227,  0.0227,  0.0741,  ..., -0.7650, -0.8164, -0.9192],
         [ 0.0227, -0.0458,  0.1254,  ..., -0.6965, -0.7822, -0.8849],
         [ 0.0398,  0.1083,  0.2111,  ..., -0.6965, -0.7822, -0.9020],
         ...,
         [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
         [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179],
         [-2.1179, -2.1179, -2.1179,  ..., -2.1179, -2.1179, -2.1179]],

        [[ 0.1352,  0.1352,  0.1877,  ..., -0.6352, -0.6702, -0.7402],
         [ 0.1001,  0.0476,  0.2052,  ..., -0.5826, -0.6527, -0.7577],
         [ 0.0826,  0.1176,  0.2402,  ..., -0.5826, -0.6527, -0.7927],
         ...,
         [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
         [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357],
         [-2.0357, -2.0357, -2.0357,  ..., -2.0357, -2.0357, -2.0357]],

        [[ 0.5834,  0.6008,  0.6356,  ...,  0.0431, -0.0267, -0.1312],
         [ 0.5659,  0.4962,  0.6531,  ...,  0

We will now create our two **DataLoader** objects, one for training and one for validation. Our training and validation datasets are already split into 80% training and 20% validation

To create the Datasets and Dataloaders we will import from from *torch.utils.data* the **Dataset** and **Dataloader** objects.

In [28]:
# Create the Dataloaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=32, shuffle=False)

xb, yb = next(iter(train_loader))
xb.shape, yb.shape

(torch.Size([32, 3, 224, 224]), torch.Size([32]))

The dataset is split into training and validation, and the DataLoader splits the dataset into batches to feed into the GPU which performs calculations (primarily matrix multiplication) at incredible speeds in the CUDA language.

Now we have to make a decision about the model we'll be using. We can make our own model using Pytorch or make use of **[transfer-learning](http://https://builtin.com/data-science/transfer-learning#:~:text=Transfer%20learning%20is%20the%20reuse,networks%20with%20comparatively%20little%20data.)** and fine-tune a pretrained model for this task.

Just for the fun of it we will try both, but first let's define our neural network and understand how it works

# How does a simple Neural Network make a prediction?

We will make a model from scratch and see how it actually makes decisions.

To make a model from scratch with the least Pytorch functionalities possible we need to randomly initialize our model's parameters. A model's parameters are the weights and bias:
* Weights: The weights are the strength of the connection between two neurons, determining how much the value of one neuron influences the second neuron. They shape how the neural network makes decisions and are learned during training.

* Bias: The bias is an additional parameter in each neuron that allows the model to shift the activation function to fit the data better. It helps the model learn the patterns even when all the input features are zero, adding flexibility to the learning process.

In [29]:
def init_params(size, std=1.0):
    return (torch.randn(size)*std).requires_grad_()
weights = init_params((224*224*3, 2))
bias = init_params(2)
print(weights.shape, bias.shape)

torch.Size([150528, 2]) torch.Size([2])


We'll make our *weights* be a tensor of ([2, 128 * 128 * 3]) as we have two classes (benign and malignant), and our images are resized for 128 * 128 pixels in 3 *RGB* channels.

We initialize our *bias* to (2).

Now we're going to bring in the most useful operation in Deep Learning, the *dot product*, also called matrix multiplication. We're going to get every image's tensor and we're going to multiply it with the weights, and then, using [broadcasting](http://https://medium.com/@krinaljoshi/broadcasting-in-pytorch-fc438ee04cc5), add the bias. This way we're going to end up with a matrix that contains the probabilities for each class

In [32]:
def linearOne(image_batch):
    # Flatten each image in the batch to shape (batch_size, height*width*3)
    batch_size = image_batch.size(0)
    image_batch = image_batch.view(batch_size, -1)  # Flatten each image while preserving batch size
    return image_batch @ weights + bias  # Broadcasting will handle adding bias to each row
xb, yb = next(iter(train_loader))
predictions = linearOne(xb)

> The *'@'* symbol represents the dot product operation in Python.


*xb* is a batch from our training Dataloader. We pass the first image to our *linearOne* function, and we get back a matrix of two elements.

Let's see what's in *predictions[0]* :

In [33]:
for p in predictions[0]:
    print(p)

tensor(-619.2442, grad_fn=<UnbindBackward0>)
tensor(844.1592, grad_fn=<UnbindBackward0>)


We get back two values for the first item, the first represents the *benign* class, and the second one represents the *malignant* class. The bigger value is the model's prediction.

# How does a Neural Network actually learn?

A Neural Network(NN for short) does not actually learn, its parameters are improved by an algorithm called gradient descent. Let's define the function:

In [34]:
def calculate_gradient(images, labels, model):
    predictions = model(images)
    loss = loss_function(predictions, labels)
    loss.backward()

What's actually going on in the function:

**Get the predictions:**

*predictions = model(images)*

**Measure the prediction against the actual label:**

*loss = loss_function(predictions, labels)*

* **Loss Function**: The loss function measures how well or how poorly our model is doing.

**Backpropagation:**

*loss.backward()*

When we call *loss.backward()*, PyTorch performs backpropagation, which is just a fancy word for calculating the gradients of the loss function.

**Gradient**: The gradient is the derivative of the loss function with respect to the model's parameters (weights and biases). It tells us how much the loss function would change if we changed the parameters slightly.

In [35]:
def update_parameters(parameters, learning_rate):
    for parameter in parameters:
        parameter.data -= parameter.grad * learning_rate
        parameter.grad.zero_()

Now we update the parameters by subtracting from the originals their gradient multiplied by the learning rate. Think of the learning rate as the size of our steps to reach the lowest point of a valley while blindfolded, if we make our steps too big we'll skip the lowest point, if we make them too small it's going to take us a long time.

>         parameter.data -= parameter.grad * learning_rate



We subtract from the original parameter as we're moving in the opposite direction of the gradient

After, we set the parameter's grad to zero after each iteration as they accumulate and can mess up the calculations
>         parameter.grad.zero_()

# Training and Validation functions

Let's put it all together:

In [36]:
def train_model(model, train_loader, parameters):
    for images, labels in train_loader:
        calculate_gradient(images, labels, model)
        update_parameters(parameters, learning_rate)

Parameters passed to the training function

* **Model**: The model is the neural network architecture, that we have defined for the task at hand.

* **Dataloader**: The Dataloader, simply put, splits our dataset into batches.

* **Parameters**: The model's parameters (weight and bias), in this case initialized with the *init_params* function to a random value.

Ok, that's how our model's parameters get better and how it actually "learns", but we need to create two more functions to measure how good the performance is:

*batch_accuracy*, which returns the accuracy of the batch.

*validate_model*, which appends all the batches accuracy in a list and then returns the mean of all the items rounded to 4 decimals.

In [37]:
def batch_accuracy(predictions, labels):
    predictions = predictions[:, 1].sigmoid()
    correct = (predictions>0.5).float() == labels.float()
    return correct.float().mean()

def validate_model(model, valid_dataloader):
    accs = []
    for images,labels in valid_dataloader:
        accuracy = batch_accuracy(model(images), labels)
        accs.append(accuracy)
    return round(torch.stack(accs).mean().item(), 4)

Ok, now that we have defined all the functions for our model to learn, let's define the parameters that we'll pass to train_model and validate_model

In [38]:
import torch.nn as nn

model = linearOne
train_loader = train_loader
valid_loader = valid_loader
parameters = weights, bias
learning_rate = 1e-3
loss_function = nn.CrossEntropyLoss()

For our loss function we will use nn.CrossEntropyLoss as it is ideal for binary classification

Our learning rate is 0.001 as of now

In [39]:
xb, yb = next(iter(train_loader))
xb.shape, yb.shape

(torch.Size([32, 3, 224, 224]), torch.Size([32]))

> It's always a good practice to get our dataloaders batch sizes to debug our model

In [40]:
for i in range(5):
    train_model(model, train_loader, parameters)
    print(validate_model(model, valid_loader))

0.4547
0.5125
0.5203
0.5125
0.5703


Our model is not doing all that bad, with just one layer it can predict tumors at almost 60% accuracy, I tried with 128x128 images and the accuracy jumped up to 70%, with just one layer of matrix multiplication!! Let's add a couple more layers and see how it affects the accuracy:

Let's create a new architecture function:

In [41]:
def linearThree(image_batch):
    # Flatten each image in the batch to shape (batch_size, height*width*3)
    batch_size = image_batch.size(0)
    image_batch = image_batch.view(batch_size, -1) # Flattened while preserving batch_size
    layer_one = image_batch @ w1 + b1
    layer_two = layer_one @ w2 + b2
    layer_three = layer_two @ w3 + b3
    return layer_three

In [42]:
w1 = init_params((224*224*3, 256))
b1 = init_params((256))
w2 = init_params((256, 128))
b2 = init_params((128))
w3 = init_params((128, 2))
b3 = init_params((2))

> We initialize 3 layers of weights, the values of 256 and 128 neurons for the hidden layers are random, but I have only seen ^2 numbers used for the middle layers.

Now we can try training it, we'll see if it actually improves or if the accuracy gets worse.

In [43]:
model = linearThree
parameters = [w1, b1, w2, b2, w3, b3]
learning_rate = 1e-3
loss_function = nn.CrossEntropyLoss()

In [44]:
for i in range(5):
    train_model(model, train_loader, parameters)
    print(validate_model(model, valid_loader))

0.5188
0.5188
0.5188
0.5188
0.5188


Surprisingly, it has gotten worse. This could be due to a myriad of reasons, including but not limited to: missing learning rate adjustment, overfitting to the dataset as it is smaller than the model's capacity, vanishing gradients...

Now, let's make use of transfer-learning, which is just getting a pre-trained model on a large dataset (like ResNet which is trained on the ImageNet dataset), and fine-tune it for our specific task. Let's use a small architecture for efficiency reasons and since we've seen bigger != better.

We're going to have to automate our manual steps in our training and validation functions with Pytorch objects and functions, in this case, replace our manual gradient descent with an optimizer and add a few lines to fine-tune our pretrained model:

In [45]:
def train_model(model, train_loader, optimizer, loss_function):
    model.train()
    for images, labels in train_loader:
        # CUDA
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        predictions = model(images)
        loss = loss_function(predictions, labels)
        loss.backward()
        optimizer.step()

def batch_accuracy(predictions, labels):
    predictions = predictions[:, 1].sigmoid()
    correct = (predictions>0.5).float() == labels.float()
    return correct.float().mean()

def validate_model(model, valid_dataloader):
    model.eval()
    accs = []
    with torch.no_grad():
        for images,labels in valid_dataloader:
            # CUDA
            images, labels = images.to(device), labels.to(device)
            accuracy = batch_accuracy(model(images), labels)
            accs.append(accuracy)
    return round(torch.stack(accs).mean().item(), 4)

Let's define our parameters to pass to the training and validation functions. We'll use the mobilenetV2 model and change its last layer to be of two neurons instead of 1000.

In [48]:
from torch import optim
import torchvision.models as models
import timm

# Load pre-trained model
model = models.mobilenet_v2(pretrained = True)
# Get the number of input features for the classifier
num_features = model.classifier[1].in_features
# Replace the last layer with a new Linear layer with 2 output features (for binary classification)
model.classifier[1] = torch.nn.Linear(num_features, 2)

# CUDA
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Using Pytorch functionalities
optimizer = optim.SGD(model.parameters(), lr=5e-3)
loss_function = torch.nn.CrossEntropyLoss()

> In this block of code we have loaded our model and changed its last layer to be of 2 neurons, defined an optimizer and a loss_function and wrote a line to send the model to train on a GPU if available.

In [49]:
for i in range(10):
    train_model(model, train_loader, optimizer, loss_function)
    print(validate_model(model, valid_loader))

0.575
0.6781
0.7094
0.8328
0.8719
0.8906
0.875
0.9219
0.9609
0.9609


Our model reaches 96% accuracy, it may sound to good to be true but the only way of knowing if the model just overfitted is testing it with a test set or seeing how it evolves. The notebook is getting too long so let's just keep this as a baseline and feed it more data later.

Let's export our model to use it later:

In [54]:
torch.save(model.state_dict, "tumor_baseline_model_mobileV2_parameters.pth")
torch.save(model, "tumor_baseline_model_mobileV2_full.pth")

As we've seen, fine-tuning a specific model for our task has rendered better results than using our own. This might not always be the case so you need to learn through trial and error, don't get discouraged as the road is long and with enough work you can be great at Deep Learning.

**Good Luck!**