![logo](https://github.com/donatellacea/DL_tutorials/blob/main/notebooks/figures/1128-191-max.png?raw=true)

# Image classification with Deep Learning 
---

The term **Machine Learning** (ML) refers to a set of algorithms and methods that give a machine the ability to learn automatically and improve from experience without being explicitly programmed.
When we have labeled data, we can use the labels to guide the learning process; this is called **Supervised learning**. If the data are not labeled, there is no guide or supervision, and this is called **Unsupervised learning**.
Within Supervised learning, we can have two different kinds of problems:
 - **Regression problem**: the task of predicting a continuous quantity,  
 - **Classification problem**: the task of predicting a label or a class (discrete values).

This tutorial shows you how to classify images with Deep Neural Networks (NNs). We will work with two public datasets and look at both a binary classification and a multi-class classification problem.

To start working on the notebook, click the following button. It opens this page in the Colab environment, where you can execute the code yourself.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/donatellacea/DL_tutorials/blob/main/notebooks/DL_Classification_tutorial.ipynb)


### Set up the environment

**If you already did this step for the TensorFlow Playground tutorial, you can skip the setup section and start with the Import and install section. Otherwise, complete the next step before starting the tutorial.**

Now that you are viewing the notebook in Colab, run the next cell to mount your Google Drive; the files for this tutorial will be downloaded to a folder there. The first time you run it, you might receive some warnings and notifications. Follow these instructions:
1. Warning: This notebook was not authored by Google. Click on Run anyway.
2. Permit this notebook to access your Google Drive files? Click on Yes, and select your account.
3. Google Drive for desktop wants to access your Google Account. Click on 'Allow'.

At this point, your Google Drive is mounted and you can navigate it through the left-hand panel in Colab. You might also receive an email informing you that your Google Drive has been accessed.

In [None]:
# Mount your Google Drive in Colab (under /content/drive)
from google.colab import drive
drive.mount('/content/drive')

Execute the next cells to clone the repository from GitHub, so that the files and notebooks needed for this tutorial are downloaded to your working folder on your Drive.

In [None]:
%cd /content/drive/MyDrive

In [None]:
!git clone https://github.com/donatellacea/DL_tutorials

In [None]:
%cd DL_tutorials/notebooks

### Import and install

In [None]:
!pip install alive_progress

In [None]:
# Run this cell to import the main packages we will use
import pandas as pd
import numpy as np
import os
import shutil
import glob
import sklearn
import random
random.seed(1)
import matplotlib.pyplot as plt 
import PIL.Image  # explicitly load the Image submodule used as PIL.Image.open below
import plotly.graph_objects as go
import scipy.ndimage
from skimage import io 
from alive_progress import alive_bar
from check_file import *
from utils import *
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

import torch
torch.manual_seed(0)
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms

## Binary Classification
---

In this problem, we will use the Lung CT scans dataset to predict whether a patient has Covid-19 or not. Since the output can only be positive or negative, this is a classic example of **binary classification**.

### Dataset 
The dataset, available on Kaggle (https://www.kaggle.com/datasets/luisblanche/covidct), will be downloaded to the Google Drive folder created in the setup step.

It contains 746 images in total, divided as follows:
- 397 No Covid
- 349 Covid

The images, i.e. CT scans, are obtained through Computed Tomography, a medical imaging technique used in radiology (x-ray) to obtain detailed internal images of the body noninvasively for diagnostic purposes. Interpreting the scans requires proper training, so without a radiology or medical background it is hard to tell from a scan whether Covid-19 is present. However, we will see that a well-trained NN can help technicians and doctors diagnose this kind of disease.

The next cells download the data. Afterwards, you should see a folder that contains two subfolders, one for each class: `CT_COVID` and `CT_NonCOVID`.

In [None]:
# Define path
main_path = '/content/drive/MyDrive/DL_tutorials/notebooks/'

The next cell downloads the dataset to your Google Drive. From there you can browse the images or download them to your own machine.

In [None]:
!curl -L "https://www.dropbox.com/s/ynxtbh7t0mts30k/Dataset_CT_lungs.zip?dl=1" -o /content/drive/MyDrive/DL_tutorials/notebooks/Dataset_CT_lungs.zip

In [None]:
# Unzip the archive and remove the macOS metadata folder, if present
shutil.unpack_archive(main_path + 'Dataset_CT_lungs.zip', main_path)
shutil.rmtree(main_path + '__MACOSX', ignore_errors=True)

Now, let's have a look at the kind of images we have.

In [None]:
# Create the path to each folder
data_path = os.path.join(main_path, 'Dataset_CT_lungs')
pos_files = glob.glob(os.path.join(data_path, "CT_COVID",'*.*'))
neg_files = glob.glob(os.path.join(data_path, 'CT_NonCOVID','*.*'))
images = pos_files + neg_files
num_total = len(images)

In [None]:
# Plot 9 random CT scans from the dataset to see what they look like
plt.subplots(3, 3, figsize=(8, 8)) 
num_fig = 9
ax_name = ['No Covid'] * num_fig
for i, number in enumerate(random.sample(range(num_total), num_fig)):
    im = PIL.Image.open(images[number])
    arr = np.array(im)
    plt.subplot(3, 3, i + 1)
    if 'CT_COVID' in images[number]:        
        ax_name[i] = 'Covid'
    plt.xlabel(ax_name[i], fontsize=15)
    plt.imshow(arr, cmap="gray", vmin=0, vmax=255)
plt.tight_layout()
plt.show()

Let's see whether the [Teachable Machine](https://teachablemachine.withgoogle.com/) is able to recognize Covid from the CT scans. If you did not do the previous tutorial on the Teachable Machine, or you have any doubts about it, you can open the corresponding notebook for more information.

You can upload the dataset directly from the Drive, but it might be quicker to download the dataset to your own computer and upload it from there, which takes only a couple of minutes.
Create the labels for the two classes and upload the data. Do not use all the images: keep a couple of images from each class aside, so that you can use them later in the preview.

Now start training the Teachable Machine with the 'Under the hood' panel open, to look at the learning curves and the accuracies.

### Model evaluation - learning curves

Your goal is to check whether the model is able to recognize the disease and how well it performs. To test the model, you can upload a single scan from the folders and see whether the predicted output is correct.
However, to get a broader understanding of what is happening, open the 'Under the hood' panel and check the learning curves during training.

**Accuracy per epoch**

Accuracy is one of the evaluation metrics we can use to evaluate how good a model is. It can be defined as the number of samples correctly classified over the total number of samples.

<center>
$\text{Accuracy} = \dfrac{\text{number of correctly classified samples}}{\text{total number of samples}}$
</center>
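The formula above can be sketched in a few lines of NumPy. The function name `accuracy` and the toy labels (1 = Covid, 0 = No Covid) are illustrative only and are not part of the tutorial's helpers.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the ground-truth label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Toy labels: 4 of the 5 predictions are correct
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
```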


Looking at the accuracy plot over the epochs, we can say that the model is not performing badly.



**Loss per epoch**

Another useful way to check whether the model is learning correctly is to look at the loss function at different epochs. A NN learns by minimizing the difference between its predictions and the labels, which is measured by a *loss function*. The network is learning if the loss is decreasing over time.
What is the difference between the training and the test curve?

What do you think might be the reason? Discuss it with your team and think about possible changes that could improve the model's performance.
Run the next cell to check whether your answer is correct.

In [None]:
# Run this cell to have the answer to the first question 
check_task_tm_2_1()

**Confusion matrix**

Accuracy alone is not always enough to evaluate a model. Consider an example: we have 10 patients and only one of them has a disease. If our model always predicts that a patient is normal, its accuracy is $\frac{9}{10} = 0.90$, which is pretty high. Yet the model gives us no useful information: we are interested in detecting the disease, and the model misses the only sick patient.

For this reason, it is useful to introduce other metrics such as sensitivity and specificity:
- **sensitivity**: the true positive rate, i.e. the probability of predicting positive given that the patient has the disease;
- **specificity**: the true negative rate, i.e. the probability of predicting negative given that the patient is normal.

This information is usually summarized in a table called a confusion matrix, which shows the performance of the classifier at a glance. The rows represent the ground truth (GT) and the columns the output predicted by the model. Each cell contains the number of samples with a given GT/prediction combination:
- True Positive (TP): both the GT and the prediction are positive
- False Negative (FN): the GT is positive but the prediction is negative
- False Positive (FP): the GT is negative (i.e. normal patient) but the prediction is positive
- True Negative (TN): both the GT and the prediction are negative
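A minimal sketch of how the four cells can be counted from two lists of labels, assuming binary labels with 1 = Covid (positive) and 0 = No Covid (negative). The helper name `binary_confusion_matrix` and the toy labels are hypothetical. The layout (ground truth on rows, prediction on columns) matches the one used by `sklearn.metrics.confusion_matrix`, which is called later in this notebook.

```python
import numpy as np

def binary_confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for labels {0, 1}.

    Rows are the ground truth, columns the prediction: [[TN, FP], [FN, TP]].
    """
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # one count per (ground truth, prediction) pair
    return cm

# Toy labels: 1 = Covid (positive), 0 = No Covid (negative)
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0]
cm = binary_confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)              # [[3 1]
                       #  [1 2]]
print(tp, fn, fp, tn)  # 2 1 1 3
```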

In medical applications, FN and FP in particular should be reduced as much as possible, to avoid missing a disease or alarming people who are actually healthy.
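In terms of the four counts, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below applies them to the 10-patient example from the beginning of this section (one sick patient, a model that always predicts "normal"); the function names are illustrative only.

```python
def true_positive_rate(tp, fn):
    """Sensitivity: TP / (TP + FN), the fraction of sick patients that are detected."""
    return tp / (tp + fn)

def true_negative_rate(tn, fp):
    """Specificity: TN / (TN + FP), the fraction of healthy patients predicted as normal."""
    return tn / (tn + fp)

# The 10-patient example: 1 sick patient, and a model that always predicts "normal"
tp, fn = 0, 1   # the only sick patient is missed
tn, fp = 9, 0   # all 9 healthy patients are predicted normal
acc = (tp + tn) / (tp + tn + fp + fn)

print(acc)                          # 0.9
print(true_positive_rate(tp, fn))   # 0.0
print(true_negative_rate(tn, fp))   # 1.0
```

The high accuracy hides a sensitivity of zero: the model never detects the disease.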

<div>
<img src="https://github.com/donatellacea/DL_tutorials/blob/main/notebooks/figures/confusionmatrix.png?raw=true" width="300" height="200"/>
</div>

To see the confusion matrix of our model, click the confusion matrix button in the 'Under the hood' panel. You will notice that, even if the model is performing well, there are still some FP and FN.

As you can see, the DL algorithm, even if it is not perfect, is able to distinguish Covid from no-Covid cases, a task that is difficult for a technician and impossible for people without a medical background.

Now that we have explored a binary classification example through the Teachable Machine, let's take a closer look at a NN trained on a different dataset for multi-class classification. In the next part of the tutorial, you will see a coding example of a deep neural network, to understand what hides behind an interface like the one we used in the first part of this tutorial.

#### Extra exercise

Come back here later or after the lesson to solve this small exercise and review what we learned.

According to what we explained, could you compute the sensitivity and specificity of this model? 
Insert your solution in the following cell and run to check whether your answer is correct.

In [None]:
# Substitute None with the values that you read in the confusion matrix created on the Teachable Machine
# (named tm_confusion_matrix so that it does not overwrite sklearn's confusion_matrix function used later)
tm_confusion_matrix = []
tm_confusion_matrix.append(None) # 'Class Covid & Pred Covid'
tm_confusion_matrix.append(None) # 'Class Covid & Pred No Covid'
tm_confusion_matrix.append(None) # 'Class No Covid & Pred Covid'
tm_confusion_matrix.append(None) # 'Class No Covid & Pred No Covid'

# Substitute None in the sensitivity and specificity fields with your solution (approximated to two decimal places)
sensitivity = None
specificity = None

check_task_tm_2_2(tm_confusion_matrix, sensitivity, specificity)

## Multi-class Classification
---

In this problem, we will use the MedNIST dataset to predict which of six possible classes an image belongs to. Since the output can take one of several (more than two) values, this is a classic example of **multi-class classification**.

### Dataset 

The [MedNIST](https://github.com/Project-MONAI/MONAI/blob/master/examples/notebooks/mednist_tutorial.ipynb) dataset was gathered from several sets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset, and is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic).

The original dataset contains more than 50k images, which we reduced for the purposes of this tutorial. Run the following commands to download the dataset to your Drive and unzip it. Note that this step might take a few minutes.

In [None]:
!curl -L "https://www.dropbox.com/s/wrbfk4o63f3cn5k/MedNIST_0.5.zip?dl=1" -o /content/drive/MyDrive/DL_tutorials/notebooks/MedNIST_0.5.zip

In [None]:
# Unzip the archive and remove the macOS metadata folder, if present
shutil.unpack_archive(main_path + 'MedNIST_0.5.zip', main_path)
shutil.rmtree(main_path + '__MACOSX', ignore_errors=True)

Now let's have a look at the data.
First, we need to save all the image names in a dataframe (df), i.e. a table, to have direct and quick access to them.

In [None]:
df, mp = get_MedNIST_dataframe()
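`get_MedNIST_dataframe` comes from the tutorial's `utils` module and its internals are not shown here. To give an idea of the resulting table, here is a hypothetical sketch (`build_image_table` is not part of the tutorial) that builds a dataframe with the same column layout used later in this notebook (0 = file path, 1 = integer label, 2 = class name) from a folder with one subfolder per class. It assumes labels are assigned in alphabetical order, which is *not* the order used by the real helper (see the label list further below).

```python
import os
import tempfile
import pandas as pd

def build_image_table(root):
    """Hypothetical sketch: one row per image in root/<class_name>/<file>.

    Columns: 0 = path relative to root, 1 = integer label, 2 = class name.
    Labels follow the alphabetical order of the class names (an assumption).
    """
    class_names = sorted(d for d in os.listdir(root)
                         if os.path.isdir(os.path.join(root, d)))
    mapping = {name: label for label, name in enumerate(class_names)}
    rows = []
    for name in class_names:
        for fname in sorted(os.listdir(os.path.join(root, name))):
            rows.append((os.path.join(name, fname), mapping[name], name))
    return pd.DataFrame(rows), mapping

# Demo on a throw-away folder with two classes and three empty "images"
with tempfile.TemporaryDirectory() as root:
    for cls, files in {"Hand": ["a.jpeg", "b.jpeg"], "CXR": ["c.jpeg"]}.items():
        os.makedirs(os.path.join(root, cls))
        for f in files:
            open(os.path.join(root, cls, f), "w").close()
    df_demo, mp_demo = build_image_table(root)

print(mp_demo)   # {'CXR': 0, 'Hand': 1}
print(df_demo)
```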

Now, let's see how many possible classes we have and their names and labels.

In [None]:
print(mp)

Finally, we can plot some random samples from the dataset to take a closer look at the images. Before we start building our classification model, run the next cell and take some time to analyze the images, and try to anticipate how the network will behave.

Which classes do you expect will be harder to classify and why?

In [None]:
plt.subplots(4, 4, figsize=(8, 8))
random.seed(7) 
for i, k in enumerate(random.sample(range(len(df)), 16)):
    im = PIL.Image.open(main_path + "MedNIST_0.5/" + df[0].iloc[k])
    arr = np.array(im)
    plt.subplot(4, 4, i + 1)
    plt.xlabel(df[2].iloc[k])
    plt.imshow(arr, cmap="gray", vmin=0, vmax=255)
plt.tight_layout()
plt.show()

### Define the structure of the Convolutional Neural Network (CNN)

The network we build is formed by several hidden layers, as shown in the image ([Image credit](https://www.mathworks.com/videos/introduction-to-deep-learning-what-are-convolutional-neural-networks--1489512765771.html)).
The aim of each layer is briefly explained below.

<div>
<img src="https://github.com/donatellacea/DL_tutorials/blob/main/notebooks/figures/cnn.png?raw=true" width="500" height="300"/>
</div>


 


**Convolutional layer**: convolution is the mathematical name for what is essentially a window, or filter, moving across the image being studied. As the filter slides over the image, the dot product between the pixel values under the window and the filter weights is computed at each position, creating the so-called convolved feature map (see image below - [credit](https://media4.giphy.com/media/i4NjAwytgIRDW/giphy.gif?cid=ecf05e477z2u4ge19e34frejcm5q6o228fiyohcg0viafep7&rid=giphy.gif&ct=g)).
<div>
<img src="https://media4.giphy.com/media/i4NjAwytgIRDW/giphy.gif?cid=ecf05e477z2u4ge19e34frejcm5q6o228fiyohcg0viafep7&rid=giphy.gif&ct=g" width="300" height="200"/>
</div>
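The sliding dot product can be sketched directly in NumPy. This minimal version assumes a single-channel image, no padding and stride 1, like the `nn.Conv2d` layers used below; like `nn.Conv2d`, it does not flip the kernel (strictly speaking a cross-correlation). With these settings a 3x3 filter shrinks each side of the image by 2, e.g. 64 to 62. The 5x5 image and 3x3 kernel are toy values.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and take a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)  # dot product of window and filter
    return out

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d_valid(image, kernel).astype(int))
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```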

**Max pooling layer**: max pooling is another sliding-window technique, but instead of applying weights as in the convolution, it takes the maximum over the contents of the window. A pooling layer subsamples its input, in our case the feature map produced by a convolutional layer, which has already extracted salient features from the image; this operation is also called downsampling.
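A minimal NumPy sketch of 2x2 max pooling with stride 2, the setting used by the `nn.MaxPool2d` layers below. As with `nn.MaxPool2d` in its default mode, a trailing odd row or column is dropped, so a 29x29 map becomes 14x14. The feature map values are made up.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2; an odd trailing row/column is dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    # Group the array into 2x2 blocks and keep the maximum of each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]])
print(max_pool_2x2(fmap))  # [[6 5]
                           #  [7 9]]
```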

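**Dropout layer**: during training, dropout randomly sets a fraction of the neurons' outputs to zero (20% in the model below). This helps prevent overfitting, because the network cannot rely on any single feature; it is used with convolutional and, especially, dense layers. Dropout is switched off when the model is evaluated.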
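PyTorch's `nn.Dropout(p)` uses so-called inverted dropout: in training mode each element is zeroed with probability `p` and the surviving elements are scaled by `1 / (1 - p)`, while in evaluation mode (`model.eval()`) the layer returns its input unchanged. The NumPy sketch below mimics that behavior; the function `dropout` and its `rng` argument are illustrative, not PyTorch's API.

```python
import numpy as np

def dropout(x, p, training, rng):
    """Inverted dropout: during training zero each value with probability p and
    scale the survivors by 1 / (1 - p); at evaluation time return x unchanged."""
    if not training or p == 0:
        return x
    keep = rng.random(x.shape) >= p      # True with probability 1 - p
    return np.where(keep, x / (1 - p), 0.0)

rng = np.random.default_rng(0)
x = np.ones(10)
print(dropout(x, p=0.2, training=True, rng=rng))   # each value is either 0.0 or 1.25
print(dropout(x, p=0.2, training=False, rng=rng))  # unchanged: all ones
```

The scaling keeps the expected value of each output equal to its input, so no rescaling is needed at evaluation time.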

**Linear layer**: the linear layer, also called a fully connected layer, is used in the final stages of the network. Every output is connected to every input: the layer multiplies the flattened features from the preceding layer by a weight matrix and adds a bias, which changes their dimensionality. The last linear layer outputs one score per class.
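`nn.Linear(in_features, out_features)` stores a weight matrix of shape `(out_features, in_features)` and a bias of length `out_features`, and computes `y = x W^T + b`. A NumPy sketch of the same computation, with random toy weights sized like the first linear layer of the model below:

```python
import numpy as np

def linear(x, W, b):
    """Fully connected layer: y = x W^T + b (same convention as torch.nn.Linear)."""
    return x @ W.T + b

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3136))           # batch of 2 flattened feature maps
W = rng.standard_normal((64, 3136)) * 0.01   # weights: (out_features, in_features)
b = np.zeros(64)
print(linear(x, W, b).shape)  # (2, 64)
```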

In [None]:
class CNN(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        # Two convolution + max-pooling blocks extract image features
        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=8, kernel_size=(3,3))  # alternative tried: out_channels=32
        self.pool1 = nn.MaxPool2d(kernel_size=(2,2), stride=(2,2))
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3,3))  # in_channels must match conv1's out_channels
        self.pool2 = nn.MaxPool2d(kernel_size=(2,2), stride=(2,2))
        self.flatten = nn.Flatten()
        self.dropout = nn.Dropout(0.2)  # randomly zeroes 20% of the features during training
        # 3136 = 16 channels x 14 x 14, the feature-map size for a 64 x 64 input image
        self.lin1 = nn.Linear(3136, 64)
        self.lin2 = nn.Linear(64, num_classes)  # one output score per class

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.dropout(x)  # try removing dropout and compare the learning curves
        x = F.relu(self.lin1(x))
        x = self.lin2(x)  # raw scores (logits); nn.CrossEntropyLoss applies softmax internally

        return x
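Where does the 3136 in `nn.Linear(3136, 64)` come from? Assuming 64 x 64 grayscale input images (the size used at the end of this notebook), the spatial size after each layer follows the standard formula `(size + 2*padding - kernel) // stride + 1`, which applies to both `nn.Conv2d` and `nn.MaxPool2d` with their default settings. The helper `conv_out` is illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64                              # assumed input: 64 x 64 grayscale image
size = conv_out(size, 3)               # conv1 (3x3)       -> 62
size = conv_out(size, 2, stride=2)     # pool1 (2x2, s=2)  -> 31
size = conv_out(size, 3)               # conv2 (3x3)       -> 29
size = conv_out(size, 2, stride=2)     # pool2 (2x2, s=2)  -> 14
flat_features = 16 * size * size       # conv2 has 16 output channels
print(flat_features)                   # 3136
```

If you change `conv2` to 32 output channels, the first linear layer would need `32 * 14 * 14 = 6272` input features instead.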

### Set up the parameters

We define the hyperparameters, which, unlike the parameters that describe the model itself, characterize the learning process. In particular, we define:

- `in_channels`: number of input channels (1 for grayscale images);
- `num_classes`: number of possible outputs, i.e. the classes that can be predicted (for the Covid dataset there are 2 classes; in this dataset there are 6);
- `lr`: the learning rate, i.e. the step size the optimizer uses when updating the weights; it affects both how fast and how well the model trains;
- `batch_size`: number of samples processed before the model is updated, often set to a power of 2;
- `num_epochs`: number of complete passes over the training set.
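To make `batch_size` and `num_epochs` concrete: the optimizer performs one update per batch, so the number of updates per epoch equals the number of batches, which is how `torch.utils.data.DataLoader` counts them (`len(loader)`, with its `drop_last` option discarding an incomplete final batch). The sketch below uses a hypothetical training set of 1,000 images, not the actual size of the MedNIST split.

```python
import math

def updates_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of optimizer steps in one epoch: one per batch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)  # the last batch may be smaller

# Hypothetical example: 1,000 training images, batch_size = 64
print(updates_per_epoch(1000, 64))          # 16 (15 full batches + one batch of 40)
print(updates_per_epoch(1000, 64) * 30)     # 480 updates over num_epochs = 30
```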

In [None]:
# Set device in case it is possible to access a GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Number of input and output
in_channels = 1
num_classes = 6

# Hyperparameters
lr = 0.0001
batch_size = 64
num_epochs = 30

### Train-test split

No matter if we are dealing with classification or regression problems, a crucial aspect of determining if the results are meaningful or not is the evaluation of the performance of our model. 

The train-test split is a technique for evaluating the performance of a machine learning algorithm that can be used with any supervised learning method.

The goal is to divide the dataset into two sub-sets:

- **Train set**: the sample of data used to fit the model.
- **Test set**: the sample of data, unseen during training, used to evaluate the fitted machine learning model.

It is essential to point out that the evaluation must be made on data that are not visible to the network during the training. In other words, the objective is to estimate the machine learning model's performance on new data not used to train the model, i.e. the test data.

In [None]:
train_loader, test_loader = create_train_test_dataset(df, train_ratio=0.5, batch_size=batch_size)
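`create_train_test_dataset` comes from the tutorial's `utils` module and its internals are not shown here; whether it shuffles or stratifies by class is not stated. As a rough idea of what a random split involves, here is a minimal NumPy sketch (the helper `train_test_indices` is hypothetical) that shuffles the sample indices and splits them into two disjoint sets, with `train_ratio=0.5` as in the call above.

```python
import numpy as np

def train_test_indices(num_samples, train_ratio=0.5, seed=0):
    """Shuffle sample indices and split them into disjoint train and test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)
    n_train = int(round(train_ratio * num_samples))
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = train_test_indices(10, train_ratio=0.5)
# Every index ends up in exactly one of the two sets
print(sorted(train_idx.tolist()), sorted(test_idx.tolist()))
```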

### Initializing the model

In [None]:
model = CNN(in_channels, num_classes).to(device)

### Training

The training will take a few minutes. Notice that the 'Current Loss' printed after each epoch decreases during training, meaning that the network is learning.
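The `train` function uses `nn.CrossEntropyLoss`, which takes the raw class scores (logits) produced by the network, applies a log-softmax internally and returns the negative log-probability of the true class (averaged over the batch by default). A NumPy sketch for a single sample, with made-up scores for the 6 classes:

```python
import numpy as np

def cross_entropy(scores, target):
    """Cross-entropy for one sample: -log(softmax(scores)[target])."""
    shifted = scores - np.max(scores)               # for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target]

scores = np.array([2.0, 0.5, 0.1, -1.0, 0.0, 0.3])  # raw model outputs for 6 classes
print(cross_entropy(scores, target=0))  # small: the true class has the highest score
print(cross_entropy(scores, target=3))  # large: the true class has the lowest score
```

Minimizing this loss pushes the score of the correct class up relative to the others.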

In [None]:
# Train the network and plot the learning curves
def train(model, train_data, test_data, num_epochs):
    # Loss and optimizer (uses the global learning rate `lr`)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    loss_list = []
    loss_test_list = []
    for epoch in range(num_epochs):
        model.train()  # enable dropout; evaluation may have switched the model to eval mode
        epoch_loss = 0.0
        with alive_bar(len(train_data), title=f'Epoch {epoch}', force_tty=True, bar='classic', spinner='dots_waves') as bar:
            for data, targets, _ in train_data:
                data = data.to(device=device)
                targets = targets.to(device=device)

                # Forward pass: compute the class scores and the loss
                scores = model(data)
                loss = criterion(scores, targets)

                # Backward pass: reset old gradients, then compute new ones
                optimizer.zero_grad()
                loss.backward()

                # Gradient descent step: update the weights
                optimizer.step()
                epoch_loss += loss.item()
                bar()
        epoch_loss /= len(train_data)  # mean training loss over all batches of the epoch
        print(epoch, "Current Loss:", epoch_loss)
        loss_list.append(epoch_loss)
        loss_test_list.append(evaluate_loss(model, test_data, device))

    # Display learning curves
    fig = go.Figure(layout=go.Layout(xaxis=dict(title="Epochs"),
                                     yaxis=dict(title="Loss"),
                                     title='Learning curves from train and test set'))
    fig.add_scatter(x=list(range(len(loss_list))), y=loss_list, name='train loss',
                    marker=dict(size=7, color="dodgerblue"))
    fig.add_scatter(x=list(range(len(loss_test_list))), y=loss_test_list, name='test loss',
                    marker=dict(size=7, color="coral"))
    fig.show()

    return loss_list, loss_test_list

In [None]:
loss_list, loss_test_list = train(model, train_loader, test_loader, num_epochs)

Looking at the learning curves, the network seems to perform pretty well. Let's now look at the accuracy and at the samples it gets wrong, and try to understand why it makes mistakes.

### Showcases

In [None]:
# Compute accuracy  
print('Train set:')
list_of_train_incorrect_preds, list_of_train_preds = evaluate_score(model, train_loader, device)
print('Test set:')
list_of_test_incorrect_preds, list_of_test_preds = evaluate_score(model, test_loader, device)

In [None]:
mislabeled_image = main_path + 'MedNIST_0.5/' + list_of_test_incorrect_preds[0][0]
plt.figure()
im = PIL.Image.open(mislabeled_image)
plt.imshow(im, cmap="gray", vmin=0, vmax=255)
plt.tight_layout()
plt.show()
print('Ground Truth: ', get_key(mp, list_of_test_incorrect_preds[0][1]), '- class', list_of_test_incorrect_preds[0][1])
print('Predicted class: ', get_key(mp, list_of_test_incorrect_preds[0][2]), '- class', list_of_test_incorrect_preds[0][2])

Is this the kind of error you were expecting after your first look at the dataset?
Run the next cell to see other incorrectly classified samples.

In [None]:
# Show up to 6 random misclassified test images with their ground truth (GT) and predicted class
num_show = min(6, len(list_of_test_incorrect_preds))
plt.subplots(2, 3, figsize=(10, 10))
for i, k in enumerate(random.sample(range(len(list_of_test_incorrect_preds)), num_show)):
    im = PIL.Image.open(main_path + 'MedNIST_0.5/' + list_of_test_incorrect_preds[k][0])
    arr = np.array(im)
    plt.subplot(2, 3, i + 1)
    plt.xlabel(f"GT: {get_key(mp, list_of_test_incorrect_preds[k][1])}\n"
               f"Pred: {get_key(mp, list_of_test_incorrect_preds[k][2])}")
    plt.imshow(arr, cmap="gray", vmin=0, vmax=255)
plt.show()

### Confusion matrix

In [None]:
cm = confusion_matrix([v[0] for v in list_of_test_preds],[v[1] for v in list_of_test_preds])

disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp = disp.plot(include_values=True, cmap=plt.cm.Blues)

plt.show()

Discuss possible explanations with your team. To gain more insight, use the following interactive table to explore the dataset in more detail. Run the cells so that you can interact directly with the dataframe: click the filter button in the upper-right corner to investigate the dataset.

As a reminder, the classes and their respective labels are:
 - Hand - 0
 - BreastMRI - 1
 - ChestCT - 2
 - HeadCT - 3
 - AbdomenCT - 4
 - CXR - 5
 
If you want a suggestion, run the hint cell below.

In [None]:
# Run this cell if you want a hint
hint()

In [None]:
from google.colab import data_table
data_table.enable_dataframe_formatter()

In [None]:
df_explore = df.rename(columns={0: 'filename',
                                1: 'class label',
                                2: 'class name'})
df_explore

In [None]:
# Run this cell to check your answer
check_MedNIST()

### Training on the full dataset

Now that we have discovered that the first (reduced) dataset was biased, let's train the model on a well-balanced dataset and look at the results. We need to repeat some of the previous steps on the new dataset.
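A quick way to check how many images each class contains is pandas' `value_counts`; on the notebook's tables this would be, for example, `df[2].value_counts()`, since column 2 holds the class name. The toy table below is made up and only illustrates the call.

```python
import pandas as pd

# Toy table in the same layout as `df` (column 2 = class name); the counts are made up
toy = pd.DataFrame({0: ["a.jpeg", "b.jpeg", "c.jpeg", "d.jpeg"],
                    1: [0, 0, 0, 5],
                    2: ["Hand", "Hand", "Hand", "CXR"]})
counts = toy[2].value_counts()
print(counts)                  # number of images per class
print(counts / counts.sum())   # class proportions
```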

In [None]:
# Dataframe creation 
df_complete, _ = get_MedNIST_dataframe(percentage_to_treat=[1., 1., 1., 1., 1., 1.])

In [None]:
# Train-test split
train_loader_complete, test_loader_complete = create_train_test_dataset(df_complete, train_ratio=0.5, batch_size=batch_size)

In [None]:
# Initialization
model = CNN(in_channels, num_classes).to(device)

In [None]:
# Training
train(model, train_loader_complete, test_loader_complete, num_epochs)

In [None]:
print('Train set:')
list_of_train_incorrect_preds, list_of_train_preds = evaluate_score(model, train_loader_complete, device)
print('Test set:')
list_of_test_incorrect_preds, list_of_test_preds = evaluate_score(model, test_loader_complete, device)

The accuracy has improved. However, let's look at which samples are still misclassified.

As we might have expected from our first look at the data, the network, even when performing well, still has trouble distinguishing the ChestCT and AbdomenCT classes, since these images are quite similar to each other.

Other errors are probably due to the fact that some samples (like hands, CXRs or heads) have a large portion of the image in black or gray, which might confuse the network.

### Test on a different image

Let's see now how our model behaves when we feed it with a completely new, different image.

In [None]:
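# Read the new image as grayscale and resize it to the 64 x 64 input size expected by the CNN
single_image = np.asarray(PIL.Image.open(main_path + 'image_number.jpg').convert('L'))
zoom_factors = (64 / single_image.shape[0], 64 / single_image.shape[1])
resized_image = scipy.ndimage.zoom(single_image, zoom_factors, order=1)
plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255)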

In [None]:
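# Transform the image into a tensor before giving it to the model
transform = transforms.ToTensor()               # HxW uint8 array -> 1xHxW float tensor in [0, 1]
assert resized_image.shape == (64, 64)          # the CNN expects 64 x 64 inputs
input_image = transform(resized_image).to(device=device)
input_image = input_image.unsqueeze(0)          # add the batch dimension -> 1x1x64x64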

In [None]:
# Evaluate the model on the new image
model.eval()              # turn the model to evaluate mode
with torch.no_grad():     # does not calculate gradient
    class_index = model(input_image).argmax()   #gets the prediction for the image's class
    
print('Predicted class:', get_key(mp, class_index.item()), '- class label: ', class_index.item())

As you can see, the network is unable to recognize that the image does not belong to any of the classes. Even though the image has nothing to do with the dataset we used for training, the model always makes a prediction!
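The reason is structural: `argmax` always returns one of the 6 classes, and a softmax over the scores produces probabilities that sum to 1 over those known classes, so there is no "none of the above" output. A simple heuristic is to reject predictions whose top probability is below a threshold. The sketch below shows this with NumPy; `predict_or_reject` and `threshold=0.9` are hypothetical choices, and this check is not reliable in general, because networks can be highly confident on images unlike anything they were trained on.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def predict_or_reject(scores, threshold=0.9):
    """Return the arg-max class, or None if the top probability is below `threshold`."""
    probs = softmax(scores)
    best = int(np.argmax(probs))
    return best if probs[best] >= threshold else None

print(predict_or_reject(np.array([8.0, 0.0, 0.0, 0.0, 0.0, 0.0])))  # 0: confident
print(predict_or_reject(np.array([1.0, 0.9, 0.8, 1.1, 0.7, 1.0])))  # None: no clear winner
```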

Congratulations! You completed this tutorial!