<a href="https://colab.research.google.com/github/acybermind/Application-of-Transfer-Learning-on-Discremination-of-Leukemia-/blob/main/Transfer_Learning_Project_pre.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Application of Transfer Learning on Discremination of Leukemia**

Recently, **machine learning** has played a crucial role in unearthing the hidden parts of various disciplines such as physics, neuroscience, and engineering. Its societal impacts will reach beyond our predictions in a couple of decades. **Deep learning**, a sophisticated machine learning technique, consists of neural networks simulating the behavior of any dynamic system. The constraints in biological sciences strengthen the importance of deep learning in hypothesizing and testing possible explanations about underpinning mechanisms of that phenomena. **The opportunity cost of minuscule imaging parts of cells and making decisions based on that imaging is very high**, and thanks to deep learning, we can overcome these constraints.

Although network architecture and optimization techniques are some of the powerful primary weapons of deep learning, these weapons are ineffective for some problems in which researchers have technical limitations. At this point, **transfer learning** offers a solution with the help of gained knowledge of pre-trained models applied to similar issues. The success of transfer learning depends on how pre-trained models generalize the patterns vital for similar problems. Acquired knowledge from the models trained for object detection tasks, for example, is transferred to the disease detection tasks.

## Required Libraries and Its Specific Functions for Performing Transfer Learning 

In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
from google.colab import drive

import torch
from torch import nn
import torch.optim as optim
import torch.nn.functional as F
from torch.optim import lr_scheduler
from torch.autograd import Variable
import torchvision
from torchvision import datasets, transforms, models
from torch.utils.data.sampler import SubsetRandomSampler
from tqdm import tqdm
import time
import os
import copy
# print("PyTorch Version: ",torch.__version__)   # Be sure about having pytorch properly!
# print("Torchvision Version: ",torchvision.__version__)

## Data Augmentation for Training 

In [3]:
# convert data to a normalized torch.FloatTensor
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(1200), # for the C_NMC datase to make it 300 since images are 450x450
                                       transforms.RandomHorizontalFlip(),
                                       transforms.Resize(224),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.RandomResizedCrop(1200), # for the C_NMC datase to make it 300 since images are 450x450
                                      transforms.RandomRotation(30),
                                      transforms.Resize(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])



### Connect to Drive 

In [4]:
drive.mount._DEBUG = True
drive.mount('/content/drive', force_remount=True)
drive.mount('/content/drive')

unset HISTFILE; export PS1="root@5eb8afa60cf0-629dada5b567475eb4557b79f7c69708: "
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
[01;34m/content[00m# root@5eb8afa60cf0-629dada5b567475eb4557b79f7c69708: umount -f /content/drive || umount /content/drive; pkill -9 -x drive
umount: /content/drive: no mount point specified.
umount: /content/drive: no mount point specified.
root@5eb8afa60cf0-629dada5b567475eb4557b79f7c69708: pkill -9 -f /opt/google/drive/directoryprefetcher_binary
root@5eb8afa60cf0-629dada5b567475eb4557b79f7c69708: ( while `sleep 0.5`; do if [[ -d "/content/drive" && "$(ls -A /content/drive)" != "" ]]; then echo "google.colab.drive MOUNTED"; break; fi; done ) &
[1] 1233
root@5eb8afa60cf0-629dada5b567475eb4557b79f7c69708: ( /opt/google/drive/drive --features=fuse_max_background:1000,max_read_qps:1000,max_write_qps:1000,max_operation_batch_size:15,max_parallel_push_task_instances:10,opendir_timeout_ms:120000,vi

## If you want to use the **C_NMA** dataset, then skip to run the code below and run the subsequent one


In [None]:
#### For the leukemia dataset

data_dir= '/content/drive/MyDrive/Colab Notebooks/NeuroMatch_DL/Transfer_Learning/SN-AM'
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/val', transform=test_transforms)
#defining classes
classes=['ALL','MM'] 


# number of subprocesses to use for data loading
num_workers = 2
# how many samples per batch to load
batch_size = 32
# percentage of training set to use as validation
valid_size = 0.2

num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

# define samplers for obtaining training and validation batches
train_sampler = SubsetRandomSampler(train_idx)
valid_sampler = SubsetRandomSampler(valid_idx)   

#torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True, generator=None)

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
    sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, 
    sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, 
    num_workers=num_workers)

### If you run the SN-AM dataset, then do not run the code below which is written for the C_NMC dataset 

In [5]:
# number of subprocesses to use for data loading
num_workers = 2
# how many samples per batch to load
batch_size = 32

data_dir= '/content/drive/MyDrive/Colab Notebooks/NeuroMatch_DL/Transfer_Learning/C_NMC'
print(os.path.exists(data_dir))
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
valid_data = datasets.ImageFolder(data_dir + '/val', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/testing_data', transform=test_transforms)
#defining classes
classes=['all','hem'] 

# prepare data loaders (combine dataset and sampler)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(valid_data, batch_size=batch_size, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)

True


In [6]:
# helper function to un-normalize and display an image
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    plt.imshow(np.transpose(img, (1, 2, 0))) 

In [7]:
# obtain one batch of training images
dataiter = iter(train_loader)
print(dataiter)
images, labels = dataiter.next()
images = images.numpy() # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
# display 20 images
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20/2, idx+1, xticks=[], yticks=[])
    imshow(images[idx])
    ax.set_title(classes[labels[idx]])# Load the next batch
batch_images, batch_labels = next(iter(train_loader))
print('Batch size:', batch_images.shape)

# Display the first image from the batch
plt.imshow(batch_images[0].permute(1, 2, 0))
plt.show()


<torch.utils.data.dataloader._MultiProcessingDataLoaderIter object at 0x7f0abcc10ee0>


AttributeError: ignored

In [None]:
def save_model(epochs, model, optimizer, criterion):
    """
    Function to save the trained model to disk.
    """
    torch.save({
                'epoch': epochs,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': criterion,
                }, './model.pth')

In [None]:
# training
def train(model, train_loader, optimizer, criterion):
    model.train()
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    running_loss = 0.0
    correct = 0.0
    total = 0.0
    for data in train_loader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        # forward pass
        outputs = model(images)
        # calculate the loss
        loss = criterion(outputs, labels)
        running_loss += loss.item()

        # calculate the accuracy
        _, preds = torch.max(outputs.data, 1)
        
        total += labels.size(0)
        correct += (preds == labels).sum().item()
        # backpropagation
        loss.backward()
        # update the optimizer parameters
        optimizer.step()
        
                
    # loss and accuracy for the complete epoch
    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100. * (correct / total)
    
    return epoch_loss, epoch_acc

In [None]:
# for test and validation 
def test(model, test_loader, criterion):
    model.eval()
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    running_loss = 0.0
    correct = 0.0
    total = 0.0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            images = images.to(device)
            labels = labels.to(device)
            # forward pass
            outputs = model(images)
            # calculate the loss
            loss = criterion(outputs, labels)
            running_loss += loss.item()
            # calculate the accuracy
            total += labels.size(0)
            _, preds = torch.max(outputs.data, 1)
            correct += (preds == labels).sum().item()
        
    # loss and accuracy for the complete epoch
    epoch_loss = running_loss / len(test_loader)
    epoch_acc = 100. * (correct / total)
    
    return epoch_loss, epoch_acc

In [None]:
def save_plots(train_acc, valid_acc, test_acc, train_loss, valid_loss, test_loss):
    """
    Function to save the loss and accuracy plots to disk.
    """
    # create figure
    fig = plt.figure(figsize=(18, 5))

    # setting values to rows and column variables
    rows = 1
    columns = 2

    # Adds a subplot at the 1st position
    fig.add_subplot(rows, columns, 1)

    plt.plot(train_acc,'-o')
    plt.plot(valid_acc,'-o')
    plt.plot(test_acc,'-o')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend(['Train','Valid','Test'])
    plt.title('Train vs Valid vs Test Accuracy')

    # Adds a subplot at the 2nd position
    fig.add_subplot(rows, columns, 2)

    plt.plot(train_loss,'-o')
    plt.plot(valid_loss,'-o')
    plt.plot(test_loss,'-o')
    plt.xlabel('epoch')
    plt.ylabel('losses')
    plt.legend(['Train','Valid','Test'])
    plt.title('Train vs Valid vs Test Losses')

    plt.show()

In [None]:
def main():
  # lists to keep track of losses and accuracies
  train_loss, valid_loss, test_loss = [], [], []
  train_acc, valid_acc, test_acc = [], [], []

  # start the training
  for epoch in range(epochs):
      print(f"[INFO]: Epoch {epoch+1} of {epochs}")
      train_epoch_loss, train_epoch_acc = train(model, train_loader,optimizer, criterion)
      valid_epoch_loss, valid_epoch_acc = test(model, valid_loader,criterion)
      test_epoch_loss, test_epoch_acc = test(model, test_loader,criterion)
      train_loss.append(train_epoch_loss)
      valid_loss.append(valid_epoch_loss)
      test_loss.append(test_epoch_loss)
      train_acc.append(train_epoch_acc)
      valid_acc.append(valid_epoch_acc)
      test_acc.append(test_epoch_acc)
      print(f"Training loss: {train_epoch_loss:.3f} | Training acc: {train_epoch_acc:.3f}")
      print(f"Validation loss: {valid_epoch_loss:.3f} | Validation acc: {valid_epoch_acc:.3f}")
      print(f"Test loss: {test_epoch_loss:.3f} | Test acc: {test_epoch_acc:.3f}")
      print('-*-'*20)
  # save the trained model weights
  save_model(epochs, model, optimizer, criterion)
  print('TRAINING COMPLETE')
  # save the loss and accuracy plots
  save_plots(train_acc, valid_acc, test_acc, train_loss, valid_loss, test_loss )

# The First Pre-trained Model     ----->     VGG16


In [None]:
model = models.vgg16(pretrained=True)
# model   # if you want to see the architecture uncomment this line 

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth


  0%|          | 0.00/528M [00:00<?, ?B/s]

In [None]:
# Freeze parameters so we don't backprop through them 
for param in model.parameters():
    param.requires_grad = False
    
# Newly created modules have require_grad=True by default
num_features = model.classifier[6].in_features
features = list(model.classifier.children())[:-1] # Remove last layer
features.extend([nn.Linear(num_features, len(classes))]) # Add our layer with 4 outputs
model.classifier = nn.Sequential(*features) # Replace the model classifier

In [None]:
from torch.optim import Adam
 
# Define the loss function with Classification Cross-Entropy loss and an optimizer with Adam optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.0005, momentum=0.9) 

# criterion = nn.NLLLoss()
# optimizer = optim.Adam(model.classifier.parameters(), lr=0.0005) 
epochs = 10

main()

[INFO]: Epoch 1 of 10
Training loss: 0.486 | Training acc: 94.500
Validation loss: 28.533 | Validation acc: 35.988
Test loss: 29.507 | Test acc: 33.333
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
[INFO]: Epoch 2 of 10
Training loss: 7.579 | Training acc: 81.271
Validation loss: 44.405 | Validation acc: 35.988
Test loss: 45.950 | Test acc: 33.333
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
[INFO]: Epoch 3 of 10
Training loss: 11.255 | Training acc: 77.786
Validation loss: 45.377 | Validation acc: 35.988
Test loss: 44.893 | Test acc: 33.333
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
[INFO]: Epoch 4 of 10
Training loss: 11.190 | Training acc: 77.999
Validation loss: 44.814 | Validation acc: 35.988
Test loss: 43.892 | Test acc: 33.333
-*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*--*-
[INFO]: Epoch 5 of 10
Training loss: 10.935 | Training acc: 78.331
Validation loss: 43.249 | Validation acc: 35.988
Test loss: 43.529 | Test acc: 

# The Second Pre-trained Model     ----->     DenseNet121

In [None]:
model = models.densenet121(pretrained=True)
# model   # if you want to see the architecture uncomment this line  

In [None]:
# Freeze parameters so we don't backprop through them 
for param in model.parameters():
    param.requires_grad = False

model.classifier = nn.Sequential(nn.Linear(1024, 512),
                                 nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(512,256),
                                 nn.Linear(256, 2),
                                 nn.LogSoftmax(dim=1))


In [None]:
from torch.optim import Adam
 
# Define the loss function with Classification Cross-Entropy loss and an optimizer with Adam optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.0005, momentum=0.9) 

# criterion = nn.NLLLoss()
# optimizer = optim.Adam(model.classifier.parameters(), lr=0.0005) 
epochs = 10

main()

# The Third Pre-trained Model     ----->     GoogleNet

In [None]:
model = models.googlenet(pretrained=True)
# model   # if you want to see the architecture uncomment this line 

In [None]:
# Freeze parameters so we don't backprop through them 
for param in model.parameters():
    param.requires_grad = False

model.fc = nn.Sequential(nn.Linear(1024, 512),
                                 nn.ReLU(),
                                 nn.Dropout(0.6), # it was 0.5
                                 nn.Linear(512,2),
                                 nn.LogSoftmax(dim=1)

In [None]:
from torch.optim import Adam
 
# Define the loss function with Classification Cross-Entropy loss and an optimizer with Adam optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.0005, momentum=0.9) 

# criterion = nn.NLLLoss()
# optimizer = optim.Adam(model.classifier.parameters(), lr=0.0005) 
epochs = 10

main()