# Blood Smears Classification

We'll compare the performance of a dense neural network and of a convolutional neural network with a dense top on the classification of blood smears, using a dataset of blood cells with and without malaria.

## Load Data

We'll load and uncompress the single cell dataset from 
https://lhncbc.nlm.nih.gov/LHC-research/LHC-projects/image-processing/malaria-datasheet.html

The Malaria dataset was published in S. Rajaraman, S. K. Antani, M. Poostchi, K. Silamut, Md A. Hossain, R. J. Maude, S. Jaeger, and G. R. Thoma. Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ, 6:e4568, 2018.
It is available at https://data.lhncbc.nlm.nih.gov/public/Malaria/cell_images.zip

In [None]:
import os
from torchvision.datasets.utils import download_url, _extract_zip

dataset_path = os.path.join(".", "blood_smears_dataset")
if not os.path.exists(dataset_path):
    url = "https://data.lhncbc.nlm.nih.gov/public/Malaria/cell_images.zip"
    download_url(url, ".")
    _extract_zip("cell_images.zip", dataset_path, None)
    os.remove("cell_images.zip")

We'll list the image files in the directories containing the `Uninfected` and `Parasitized` images.

In [None]:
import glob

base_dir = os.path.join(dataset_path, "cell_images")
uninfected_files = glob.glob(os.path.join(base_dir, "Uninfected", "*.png"))
parasitized_files = glob.glob(os.path.join(base_dir, "Parasitized", "*.png"))


## Visualize Data

We'll then visualize some of the data.

In [None]:
def plot_blood_smears(title, files):
    import matplotlib.pyplot as plt

    fig, axs = plt.subplots(3, 3, figsize=(8, 8))
    for i, ax in enumerate(axs.ravel()):
        image = plt.imread(files[i])
        ax.imshow(image)
        
    fig.suptitle(title, fontsize=16)
    plt.show()

In [None]:
from blood_smears import plot_blood_smears

plot_blood_smears("Uninfected", uninfected_files)
plot_blood_smears("Parasitized", parasitized_files)

## Data Preprocessing

We'll define a pipeline to resize the images to 28 by 28 pixels and convert them to PyTorch tensors (note that `ToTensor()` also normalizes their values between 0 and 1) ...

In [None]:
from torchvision.transforms import Compose, Resize, ToTensor

pipeline = Compose([Resize((28, 28)), ToTensor()])
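
As a rough illustration of what the conversion step does to the pixel values, the pure-Python sketch below mimics the two effects of `ToTensor()` on an 8-bit image: moving the channel axis first and dividing by 255. The helper name `to_chw_unit_range` and the toy image are hypothetical; the real transform works on PIL images and returns a PyTorch tensor.

```python
def to_chw_unit_range(image_hwc):
    """Turn an H x W x C nested list of 0-255 integers into C x H x W floats in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[row][col][ch] / 255 for col in range(width)] for row in range(height)]
        for ch in range(channels)
    ]


# Hypothetical 1 x 2 RGB image: one red pixel and one mostly blue pixel.
toy_image = [[[255, 0, 0], [0, 51, 255]]]
toy_tensor = to_chw_unit_range(toy_image)

# Number of channels, height, width after the conversion.
print(len(toy_tensor), len(toy_tensor[0]), len(toy_tensor[0][0]))
```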

... we set up a dataset where images are loaded from a structured directory (`base_dir`) ...

In [None]:
from torchvision.datasets import ImageFolder
from torchvision.transforms import Lambda

dataset = ImageFolder(base_dir, transform=pipeline)

In [None]:
print(f"Default mapping: {dataset.class_to_idx}")

In [None]:
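dataset.class_to_idx = {"Parasitized": 1, "Uninfected": 0}
# ImageFolder returns labels from `samples`, so flip them there and in `targets`.
dataset.samples = dataset.imgs = [(path, 1 - target) for path, target in dataset.samples]
dataset.targets = [target for _, target in dataset.samples]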
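
The remapping above makes `Parasitized` the positive class (label 1). A minimal sketch of the same flip on toy data follows; it assumes the default mapping is the alphabetical one (`Parasitized` first), and the names `flip_binary_labels`, `default_mapping`, and `toy_targets` are illustrative only.

```python
def flip_binary_labels(targets):
    """Swap the two class indices of a binary problem (0 becomes 1, 1 becomes 0)."""
    return [1 - target for target in targets]


# Assumed default mapping: class folders sorted alphabetically.
default_mapping = {"Parasitized": 0, "Uninfected": 1}
flipped_mapping = {name: 1 - index for name, index in default_mapping.items()}

toy_targets = [0, 0, 1, 0, 1]
flipped_targets = flip_binary_labels(toy_targets)

print(flipped_mapping)
print(flipped_targets)
```

For labels restricted to 0 and 1, `1 - target` gives the same result as `abs(target - 1)`.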

... create a random subset of the full dataset and split it into `train` and `test` sets ...

In [None]:
from torch import randperm
from torch.utils.data import random_split, Subset

images_num = 5000
images_idx = randperm(len(dataset))[:images_num]
# Keep the subset under its own name: `images` is reused later for the test images.
subset = Subset(dataset, images_idx)

train_size = int(0.8 * len(subset))
test_size = len(subset) - train_size
train, test = random_split(subset, [train_size, test_size])
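
To make the resulting sizes concrete, here is a standard-library sketch of the same idea: draw a random subset of indices, then split it 80/20. The function `subset_and_split`, the seed, and the dataset size of 20000 are hypothetical stand-ins, not values taken from the dataset.

```python
import random


def subset_and_split(num_items, subset_size, train_fraction, seed=0):
    """Pick `subset_size` random indices and split them into train and test lists."""
    rng = random.Random(seed)
    chosen = rng.sample(range(num_items), subset_size)  # no repeated indices
    train_size = int(train_fraction * subset_size)
    return chosen[:train_size], chosen[train_size:]


# Hypothetical dataset size; only the subset size and fraction mirror the notebook.
toy_train, toy_test = subset_and_split(num_items=20000, subset_size=5000, train_fraction=0.8)
print(len(toy_train), len(toy_test))
```

With 5000 images and an 80% training fraction, this yields 4000 training and 1000 test indices with no overlap.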

... and define the dataloaders for both sets. For the training, we'll set `batch_size = 32`.

In [None]:
from torch.utils.data import DataLoader

train_loader = DataLoader(train, batch_size=32, shuffle=True)
test_loader = DataLoader(test, batch_size=124, shuffle=False)
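
The batch sizes determine how many batches each loader yields per epoch. The sketch below works this out for 4000 training and 1000 test images, assuming the last, smaller batch is kept (the `DataLoader` default of `drop_last=False`); `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches needed to cover `num_samples` items once."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(4000, 32)
test_batches = batches_per_epoch(1000, 124)
print(train_batches, test_batches)
```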

## Fully-connected Neural Network

We'll define a Fully-connected Neural Network (FCNN) using `deeplay`. The FCNN has 2 hidden layers with 128 features each.

In [None]:
import deeplay as dl
import torch

FCNN = dl.MultiLayerPerceptron(in_features = 28 * 28 * 3,
                                hidden_features = [128, 128],
                                out_features = 1,
                                out_activation = torch.nn.Sigmoid,
)
FCNN.blocks.activation.configure(torch.nn.Sigmoid)
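
To get a feel for the size of this model, the sketch below counts the weights and biases of a multilayer perceptron with the same layer sizes. It assumes plain linear layers with biases and nothing else (no normalization layers), so the number printed by `deeplay` may differ; `mlp_parameter_count` is a hypothetical helper.

```python
def mlp_parameter_count(in_features, hidden_features, out_features):
    """Count weights and biases of a stack of fully connected layers."""
    sizes = [in_features, *hidden_features, out_features]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


fcnn_parameters = mlp_parameter_count(28 * 28 * 3, [128, 128], 1)
print(fcnn_parameters)
```

Almost all of these parameters sit in the first layer, which connects the 2352 flattened pixel values to the first 128 hidden units.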

We'll define a classifier based on the FCNN architecture, including loss function, evaluation metrics and other hyperparameters ...

In [None]:
import torchmetrics as tm

FCNN_classifier_template = dl.BinaryClassifier(
        model=FCNN,
        optimizer=dl.RMSprop(lr=.001),
        )

FCNN_classifier = FCNN_classifier_template.create()
print(FCNN_classifier)

... and a trainer including other hyperparameters ...

In [None]:
FCNN_trainer = dl.Trainer(
    max_epochs=20, # How many times to run through the entire dataset
    accelerator="auto", # Use GPU if available
)

We'll start the training and visualize the evaluation metrics.

In [None]:
FCNN_trainer.fit(FCNN_classifier, train_loader)

We'll calculate the performance over the test set.

In [None]:
_ = FCNN_trainer.test(FCNN_classifier, test_loader)

## ROC curve and AUC
We'll use the function `plot_ROC_AUC` to get the ground truth and predictions for all the images in the test set, calculate the ROC and AUC, and visualize the results.
```python
def plot_ROC_AUC(classifier, dataset):
    from torch import tensor, stack
    from sklearn.metrics import roc_curve, auc
    from matplotlib import pyplot as plt

    im, gt = zip(*dataset)
    pred = classifier(stack(im)).tolist()  # stack() already returns a tensor
    # calculate the ROC curve
    fpr, tpr, thresholds = roc_curve(gt, pred, pos_label=1) 
    roc_auc = auc(fpr, tpr) 

    # plot the ROC curve
    plt.figure(figsize=(5, 5))
    plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})", linewidth=2)
    plt.title("ROC curve")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.axis("square")
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.legend(loc = 'center right')
    plt.show()

    return im, gt, pred, roc_auc
```

In [None]:
from blood_smears import plot_ROC_AUC
_,_,_,_ = plot_ROC_AUC(classifier = FCNN_classifier, dataset=test)
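
For intuition about what `roc_curve` and `auc` compute, here is a minimal pure-Python sketch that sweeps a threshold over the scores and integrates the curve with the trapezoidal rule. It is a simplified illustration and not scikit-learn's implementation; the function names and toy data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    fpr, tpr = [0.0], [0.0]
    # Go from the strictest threshold to the most permissive one.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [score >= threshold for score in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, labels))
        fp = sum(p and y == 0 for p, y in zip(predicted, labels))
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr))
    )


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_fpr, toy_tpr = roc_points(toy_labels, toy_scores)
toy_auc = trapezoid_auc(toy_fpr, toy_tpr)
print(toy_auc)
```

An AUC of 1 corresponds to a perfect ranking of infected above uninfected cells, while 0.5 corresponds to random guessing.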

## Convolutional Neural Network with a dense top
We'll now build a convolutional neural network (CNN) with a FCNN at the end ...


In [None]:
CNN = dl.Sequential(
    dl.ConvolutionalNeuralNetwork(in_channels = 3, hidden_channels = [32, 32, 64], out_channels = 64),
    dl.Layer(torch.nn.AdaptiveAvgPool2d,output_size = 1),
    dl.MultiLayerPerceptron(in_features = 64, hidden_features = [], out_features = 1,out_activation = torch.nn.Sigmoid)
)
CNN[0].blocks[2].pool.configure(torch.nn.MaxPool2d, kernel_size = 2)
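
The `AdaptiveAvgPool2d` layer with `output_size = 1` reduces each of the 64 feature maps to a single number, which is why the dense top takes `in_features = 64`. A pure-Python sketch of this global average pooling on toy data follows; `global_average_pool` and `toy_maps` are illustrative names.

```python
def global_average_pool(feature_maps):
    """Average each channel's H x W map to one value (C x H x W -> C values)."""
    pooled = []
    for channel in feature_maps:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two hypothetical 2 x 2 feature maps.
toy_maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
toy_pooled = global_average_pool(toy_maps)
print(toy_pooled)
```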

... define a classifier using the CNN ...

In [None]:
CNN_classifier_template = dl.BinaryClassifier(
    model=CNN, 
    optimizer=dl.RMSprop(lr=.001),
)

CNN_classifier = CNN_classifier_template.create()
print(CNN_classifier)

... train it ...

In [None]:
CNN_trainer = dl.Trainer(
    max_epochs=20, # How many times to run through the entire dataset
    accelerator="auto", # Use GPU if available
)

CNN_trainer.fit(CNN_classifier, train_loader)

... evaluate the performance over the test set ...

In [None]:
_ = CNN_trainer.test(CNN_classifier, test_loader)

... and display the ROC curve with the AUC value.

In [None]:
images, gt, pred, _ = plot_ROC_AUC(classifier=CNN_classifier, dataset=test)

## Failure analysis
We'll use a function to visualize some of the wrongly classified cells, looking for common patterns.
```python
def plot_failure(images, gt, pred, threshold = 0.5, num_of_plots = 5):
    from matplotlib import pyplot as plt 
    from numpy import array, squeeze   
    
    pred = array(pred).squeeze()
    gt = array(gt).squeeze()
    images = array(images)

    pred_class = pred > threshold

    false_positives = (pred_class == 1) & (gt == 0)
    false_positives_images = images[false_positives]

    false_negatives = (pred_class == 0) & (gt == 1)
    false_negatives_images = images[false_negatives]

    plt.figure(figsize=(num_of_plots*2, 5))
    for i in range(num_of_plots):

        # false positives
        plt.subplot(2, num_of_plots, i + 1)
        plt.imshow(false_positives_images[i].transpose(1, 2, 0))
        if i == 0:
            plt.title("False positives", fontsize=16, y=1.1)

        # false negatives
        plt.subplot(2, num_of_plots, i + num_of_plots + 1)
        plt.imshow(false_negatives_images[i].transpose(1, 2, 0))
        if i == 0:
            plt.title("False negatives", fontsize=16, y=1.1)

    plt.tight_layout()
    plt.show()
```

In [None]:
from blood_smears import plot_failure
plot_failure(images=images, gt=gt, pred=pred, threshold = 0.5, num_of_plots = 4)
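
The selection logic inside `plot_failure` can be checked on a few hand-made values. The sketch below applies the same rule (a prediction counts as positive when it is strictly above the threshold) and returns the indices of the misclassified samples; `find_failures` and the toy values are hypothetical.

```python
def find_failures(gt, pred, threshold=0.5):
    """Return the indices of false positives and false negatives."""
    false_positives = [
        i for i, (y, p) in enumerate(zip(gt, pred)) if p > threshold and y == 0
    ]
    false_negatives = [
        i for i, (y, p) in enumerate(zip(gt, pred)) if p <= threshold and y == 1
    ]
    return false_positives, false_negatives


toy_gt = [0, 1, 1, 0, 1]
toy_pred = [0.9, 0.8, 0.2, 0.1, 0.5]
toy_fp, toy_fn = find_failures(toy_gt, toy_pred)
print(toy_fp, toy_fn)
```

Counting these indices first is also a quick way to confirm that there are at least `num_of_plots` failures of each kind to display.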

## Filters
We can access and visualize the filters used by the network at a specific layer.

In [None]:
weights = CNN_classifier.model[0].input_block.layer.weight
w = weights.clone().detach()

from blood_smears import plot_filters_activations
plot_filters_activations(input = w, n_rows=4, label = 'Filters', normalize = True)

## Activations and Grad-CAM
To visualize the network features, we'll use `hooks`: functions that allow us to access the information that the model sees during the forward and backward passes, such as activations and gradients, respectively. We'll define them as context manager classes, so that we can use them with the `with` statement:
```python
class fwd_hook():
    def __init__(self, m):
        self.hook = m.register_forward_hook(self.hook_func)   
    def hook_func(self, m, i, o):
        print('Forward hook running...') 
        self.stored = o.detach().clone()
        print(f'Activations size: {self.stored.size()}')
    def __enter__(self, *args): 
        return self
    def __exit__(self, *args): 
        self.hook.remove()

class bwd_hook():
    def __init__(self, m):
        self.hook = m.register_full_backward_hook(self.hook_func)
    def hook_func(self, m, gi, go):
        print('Backward hook running...')
        self.stored = go[0].detach().clone()
        print(f'Gradients size: {self.stored.size()}')
    def __enter__(self, *args): 
        return self
    def __exit__(self, *args): 
        self.hook.remove()
```

We'll randomly pick the image of an infected smear.

In [None]:
import numpy as np
import matplotlib.image as mpimg

# gt holds one label per test image; convert it to an array to find the infected ones
ind_infect = np.where(np.array(gt) == 1)[0]
ind=np.random.choice(ind_infect,1)[0]

test_image = images[ind]
test_image_hr=mpimg.imread(dataset.imgs[subset.indices[test.indices[ind]]][0])

Activations and gradients at a specific layer can be obtained from the forward and backward pass, respectively ...

In [None]:
from blood_smears import fwd_hook, bwd_hook

test_layer = CNN_classifier.model[0].blocks[3].layer

with bwd_hook(test_layer) as bh:
    with fwd_hook(test_layer) as fh:
        # backward() returns None, so there is nothing to assign here
        CNN_classifier.model(test_image.unsqueeze(0)).backward()
activations = fh.stored
gradients = bh.stored

... we can plot the activations ...

In [None]:
plot_filters_activations(input = activations.permute(1,0,2,3),n_rows=8,label = 'Feature maps', normalize = False)

... or combine gradients and activations to calculate Grad-CAM and inspect which part of an image the CNN focuses on to predict its output.

In [None]:
pooled_grad = gradients[0].mean(dim=[1,2], keepdim = True)
grad_cam = torch.nn.functional.relu((pooled_grad*activations[0]).sum(0)).detach().numpy()

from blood_smears import plot_gradcam
plot_gradcam(grad_cam, test_image_hr)
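
The two lines computing `pooled_grad` and `grad_cam` can be summarized by the usual Grad-CAM expressions. The notation is introduced here for illustration and does not come from the notebook: $y$ is the scalar model output, $A^k$ is the $k$-th feature map of the chosen layer with height $H$ and width $W$, and $\alpha_k$ is the pooled gradient of that map.

$$
\alpha_k = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\frac{\partial y}{\partial A^k_{ij}},
\qquad
L_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_k \alpha_k A^k\right)
$$

Here `pooled_grad` plays the role of $\alpha_k$ (the mean over the two spatial dimensions), and the sum over `dim 0` followed by `relu` gives the map $L_{\text{Grad-CAM}}$.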