# Introduction to the ISIC dataset
The ISIC (International Skin Imaging Collaboration) dataset aims to improve melanoma diagnosis. It provides the largest publicly available collection of quality-controlled dermoscopic images of skin lesions.

Since 2016, ISIC has organized annual challenges for the computer science community, aiming to advance melanoma diagnosis through machine learning and computer vision. The challenges have evolved over the years, focusing in turn on diagnostic accuracy, out-of-distribution problems, and the impact of clinical context. By 2018, the challenges were attracting global participation and the submitted algorithms were reported to outperform clinicians in melanoma diagnosis. The ISIC Archive contains over 13,000 quality-controlled images vetted by melanoma experts. This centralized dataset, combined with growing research on automated dermoscopic analysis, is a valuable resource for developing robust, AI-powered skin cancer detection algorithms.

# Problem Addressed
Skin cancer is a major public health problem. Melanoma, its deadliest form, accounted for over 330,000 new cases globally in 2023 and is responsible for the majority of skin cancer-related deaths. Resource constraints limit how much screening clinicians can conduct by hand, which makes automated diagnosis of melanoma important. Early detection is vital, as survival rates exceed 95% if the disease is caught in its early stages. However, diagnosing melanoma is challenging because of its visual similarity to benign lesions, which leads to missed or late diagnoses.

# Dataset Description
The ISIC 2019 dataset contains 25,331 dermoscopic images for classification into nine diagnostic categories:

- Melanoma
- Melanocytic nevus
- Basal cell carcinoma
- Actinic keratosis
- Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis)
- Dermatofibroma
- Vascular lesion
- Squamous cell carcinoma
- None of the above
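
A quick way to see how the images are distributed over these categories is to count the files in each class folder. The sketch below assumes an `ImageFolder`-style layout with one sub-directory per class; the helper name `count_images_per_class` and the list of extensions are illustrative assumptions, not part of the dataset's tooling.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to match the files actually on disk.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Count image files in each class sub-directory of `root`."""
    counts = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling the helper on the training folder returns a mapping from class name to image count, which makes any class imbalance visible before training.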

In [1]:
# Downloading dataset and storing it into the working folder ./data/skin-cancer/
import os


data_folder = './data/skin-cancer/'

os.makedirs(data_folder, exist_ok=True)


!kaggle datasets download -d nodoubttome/skin-cancer9-classesisic

!unzip skin-cancer9-classesisic.zip -d {data_folder}
print("done")

Dataset URL: https://www.kaggle.com/datasets/nodoubttome/skin-cancer9-classesisic
License(s): other
Downloading skin-cancer9-classesisic.zip to /kaggle/working
 34%|█████████████▉                           | 267M/786M [00:01<00:03, 146MB/s]^C
 37%|███████████████                          | 288M/786M [00:02<00:03, 146MB/s]
User cancelled operation
Archive:  skin-cancer9-classesisic.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of skin-cancer9-classesisic.zip or
        skin-cancer9-classesisic.zip.zip, and cannot find skin-cancer9-classesisic.zip.ZIP, period.
done
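
The output above shows what happens when the download is interrupted: the partial file is not a valid archive, `unzip` fails, and the cell still prints "done". A minimal sketch of a more defensive extraction step, using only Python's standard `zipfile` module, could look like this. The helper name `extract_if_valid` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_if_valid(zip_path, dest_dir):
    """Extract `zip_path` into `dest_dir` only if it is an intact zip archive."""
    zip_path = Path(zip_path)
    if not zip_path.is_file() or not zipfile.is_zipfile(zip_path):
        # A truncated download lacks the end-of-central-directory record.
        return False
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        if archive.testzip() is not None:
            return False
        archive.extractall(dest_dir)
    return True
```

Printing "done" only when the helper returns `True` would keep a failed download from going unnoticed.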


In [2]:
# Making necessary imports 
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader
from sklearn.model_selection import train_test_split
from tqdm import tqdm



In [3]:
# Composing the transforms that are applied to each image, in order.
# Training uses random crops and flips as augmentation; validation is deterministic.
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),  # shrink first so the center crop keeps most of the image
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
}
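
`transforms.Normalize` standardizes each color channel as (x - mean) / std, applied after `ToTensor` has scaled pixel values to the range [0, 1]. The plain-Python sketch below reproduces that arithmetic for a single RGB pixel; `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
# Channel statistics used in the transforms above (the usual ImageNet values).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (x - mean) / std to each channel of one pixel in [0, 1]."""
    return tuple((x - m) / s for x, m, s in zip(rgb, mean, std))


white = normalize_pixel((1.0, 1.0, 1.0))   # every channel becomes positive
black = normalize_pixel((0.0, 0.0, 0.0))   # every channel becomes negative
```

A pixel equal to the channel means maps to zero in every channel, so the network sees inputs roughly centered around zero, as it did during ImageNet pretraining.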


In [4]:
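# Building the train and validation loaders. A DataLoader wraps a dataset in an
# iterable that handles loading, batching and shuffling.
train_dir = os.path.join(data_folder, 'Skin cancer ISIC The International Skin Imaging Collaboration/Train')

# Two views of the same folder, so the validation split does not receive the random training augmentations
full_dataset = datasets.ImageFolder(train_dir, data_transforms['train'])
val_view = datasets.ImageFolder(train_dir, data_transforms['val'])

# 80/20 split over a seeded permutation of the sample indices (the seed value is arbitrary)
train_size = int(0.8 * len(full_dataset))
val_size = len(full_dataset) - train_size
indices = torch.randperm(len(full_dataset), generator=torch.Generator().manual_seed(42)).tolist()
train_dataset = torch.utils.data.Subset(full_dataset, indices[:train_size])
val_dataset = torch.utils.data.Subset(val_view, indices[train_size:])

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)

dataloaders = {'train': train_loader, 'val': val_loader}
dataset_sizes = {'train': len(train_dataset), 'val': len(val_dataset)}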
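
A purely random split ignores class labels, so rare classes can end up under-represented in the validation set. One alternative is a stratified split over the sample indices. The sketch below is an assumption about how this could be done, not what the notebook uses: `stratified_split` is a hypothetical helper, and `labels` stands for a list of integer class labels such as the `targets` attribute of an `ImageFolder`.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same validation share."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_indices, val_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_indices.extend(indices[:n_val])
        train_indices.extend(indices[n_val:])
    return train_indices, val_indices
```

The two index lists can then be wrapped with `torch.utils.data.Subset` to obtain the train and validation datasets.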


In [5]:
# Loading ResNet18 with ImageNet-pretrained weights
# (the `weights` argument replaces the deprecated `pretrained=True` in torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# The fully connected layer must output 9 values, one per class
num_in_ftrs = model.fc.in_features
model.fc = nn.Linear(num_in_ftrs, 9)

# Moving the model to the GPU, when one is available, for faster processing
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 53.3MB/s]


In [6]:
# Setting the loss function and the optimizer 
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
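
`nn.CrossEntropyLoss` takes raw logits, applies a softmax internally, and by default returns the mean negative log-probability of the correct class over the batch. The plain-Python sketch below works through that computation by hand; the function names are illustrative and the code is meant to clarify the formula, not to replace the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(batch_logits, targets):
    """Average the per-sample losses, mirroring the default reduction='mean'."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)
```

For nine equally scored classes the loss is ln 9, about 2.197, which is a useful reference point for the loss an untrained classification head should report early in training.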


In [5]:
# Wrapping with tqdm for ease of tracking progress
from tqdm import tqdm

def train_model(model, criterion, optimizer, dataloaders, dataset_sizes, num_epochs=25):
    # Defining what is to be done in a single epoch
    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)
        
        # Setting model to train or eval mode based on current phase 
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()
            else:
                model.eval()  

            running_loss = 0.0 # loss for this epoch
            running_corrects = 0 # number of correct predictions for this epoch

            
            for inputs, labels in tqdm(dataloaders[phase], desc=f'{phase} {epoch}/{num_epochs - 1}', leave=False):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Clear the gradients left over from the previous batch.
                # loss.backward() adds to the existing .grad values rather than
                # overwriting them, so without this call the gradients would
                # accumulate across batches.
                optimizer.zero_grad()

                # Gradients are to be computed when phase is train 
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    
                    
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                
                running_loss += loss.item() * inputs.size(0)  # batch-mean loss times batch size gives the summed loss
                running_corrects += torch.sum(preds == labels)  # number of correct predictions in this batch

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

        print()

    return model
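
`train_model` multiplies each batch's mean loss by the batch size before summing, then divides by the dataset size. This matters when the final batch is smaller than the others, because a plain average of batch means would give that batch too much weight. The sketch below illustrates the difference with made-up loss values; `dataset_mean_loss` is a hypothetical helper.

```python
def dataset_mean_loss(batch_mean_losses, batch_sizes):
    """Recover the per-sample mean loss from per-batch means and batch sizes."""
    total = sum(loss * size for loss, size in zip(batch_mean_losses, batch_sizes))
    return total / sum(batch_sizes)


# Hypothetical epoch: two full batches of 16 samples and a final batch of 4.
batch_means = [0.5, 0.7, 2.0]
batch_sizes = [16, 16, 4]

weighted = dataset_mean_loss(batch_means, batch_sizes)  # what train_model computes
naive = sum(batch_means) / len(batch_means)             # over-weights the small batch
```

Here the naive average is noticeably larger than the weighted one because the small, high-loss final batch counts as much as a full batch.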


In [6]:
# For testing, the model has to be set to eval mode.
# The test dataloader is built on the test dataset, with shuffling set to False.
# The running loss and running corrects are computed as in the training process.

def test_model(model, criterion, data_folder):
    
    
    test_dir = os.path.join(data_folder, 'Skin cancer ISIC The International Skin Imaging Collaboration/Test/')
    
    data_transforms = transforms.Compose([
        transforms.Resize((224, 224)),  
        transforms.ToTensor(),          
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  
    ])
    
    test_dataset = datasets.ImageFolder(test_dir, transform=data_transforms)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False, num_workers=4)
    
    model.eval()
    
    running_loss = 0.0
    running_corrects = 0
    
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels)
    
    test_loss = running_loss / len(test_dataset)
    test_acc = running_corrects.double() / len(test_dataset)
    
    print(f'Test Loss: {test_loss:.4f} Acc: {test_acc:.4f}')
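
Overall accuracy can hide poor performance on a rare but critical class such as melanoma. A per-class view is one way to check this. The sketch below is an assumed extension, not part of `test_model`: it builds a confusion matrix and per-class recall from two lists of integer labels, which could be collected inside the test loop with `labels.tolist()` and `preds.tolist()`.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(true_labels, predicted_labels):
        matrix[true][pred] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for class_index, row in enumerate(matrix):
        total = sum(row)
        # Report None for classes that have no samples.
        recalls.append(row[class_index] / total if total else None)
    return recalls
```

With `num_classes=9`, the recall entry at the melanoma class index shows what share of true melanoma cases the model actually caught.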




In [7]:
# Training the model
trained_model = train_model(model, criterion, optimizer, dataloaders, dataset_sizes, num_epochs=25)


NameError: name 'model' is not defined

In [8]:
# Test the model for the output
test_model(trained_model, criterion, data_folder='./data/skin-cancer/')


NameError: name 'trained_model' is not defined