![logo](https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/main/figures/1128-191-max.png?raw=true)

# Image classification with Deep Learning 
---

In the previous tutorial we performed binary classification and discovered that, although it is not perfect, the Teachable Machine can distinguish Covid from Non-Covid cases: a task that is difficult for a technician and impossible for people with no medical background.

Now let's take a closer look at a Neural Network (NN) applied to a different dataset for multi-class classification. In this tutorial, you will work directly with a coding example of a deep NN to see what hides behind an interface like the one we used in the first section of this lecture.

To start working on the notebook, click on the following button. It opens this page in the Colab environment, where you can execute the code yourself in the browser.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/translatum_2023/notebooks/DL_Classification_tutorial.ipynb)

In this tutorial you will only have to execute the cells, look at the outputs, and discuss them with your group.

### Artificial Neural Networks

#### 💡 Key concepts
Deep Learning (DL) is a subset of Machine Learning (ML) that uses Artificial Neural Networks (ANN or simply NN) to mimic the learning process of the human brain.

An NN is made up of vertically stacked components called **layers**. There are three types of layers:
- **Input Layer:** the first layer of the NN; it accepts the data and passes it to the rest of the network
- **Hidden Layer:** there can be one or more; these layers are responsible for the excellent performance and the complexity of NNs. They apply a series of functions and create features that make it possible to solve highly non-linear problems
- **Output Layer:** the last layer; it holds the results, i.e. the output of the problem
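
The three layer types can be sketched in a few lines of PyTorch. This is only an illustrative toy network, not the model used later in this tutorial; the layer sizes (4, 8 and 3) and the name `tiny_net` are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration
n_inputs, n_hidden, n_outputs = 4, 8, 3

tiny_net = nn.Sequential(
    nn.Linear(n_inputs, n_hidden),   # input layer -> hidden layer
    nn.ReLU(),                       # non-linearity applied in the hidden layer
    nn.Linear(n_hidden, n_outputs),  # hidden layer -> output layer
)

toy_batch = torch.rand(5, n_inputs)  # a batch of 5 random samples
toy_out = tiny_net(toy_batch)
print(toy_out.shape)                 # one row per sample, one column per output
```

Each sample enters with 4 values and leaves with 3, one per output unit.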

To visualize and explore the different layers, you can use the Google website TensorFlow Playground.

![ANN](https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/translatum_2023/figures/ANN.png?raw=true)

We prepared an extra tutorial that you can do at home; you can find it in the Homework section.

### Setup the environment
If you already did the "Teachable Machine" tutorial or you are not running the notebook in Colab, go directly to the Import and install section. Otherwise, run and follow the steps in the upcoming cell.

In [None]:
#@markdown #### Step 1. Run this cell to connect your Google Drive to Colab and install packages
#@markdown * Click on the URL.
#@markdown * Sign in to your Google Account.
#@markdown * Click on the "Files" icon on the left. Refresh the panel. Your Google Drive folder should now be available here as "drive".
# Mount the user's Google Drive in Google Colab.
#@markdown At this point, a folder has been created in your Drive and you can navigate it through the left-hand panel in Colab. You might also have received an email that informs you about the access to your Google Drive.
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive
!git clone --branch translatum_2023 https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials.git
%cd /content/drive/MyDrive/DL-lecture-tutorials

### Import and install

In [None]:
!pip install alive_progress

In [None]:
# Run this cell to import the main packages we will use
import pandas as pd
import numpy as np
import os
import shutil
import glob
import sklearn
import random
random.seed(1)
import matplotlib.pyplot as plt 
import PIL
import plotly.graph_objects as go
import scipy.ndimage
from skimage import io 
from alive_progress import alive_bar
from check_file import *
from utils import *
from sklearn.metrics import confusion_matrix
from sklearn.metrics import ConfusionMatrixDisplay

import torch
torch.manual_seed(0)
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as transforms

### Dataset 

The [MedNIST](https://github.com/Project-MONAI/MONAI/blob/master/examples/notebooks/mednist_tutorial.ipynb) dataset was gathered from several sets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset, and is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic).

The original dataset contains more than 50k images; we reduced it for the purposes of this tutorial. Run the following commands to download the dataset to your Drive and unzip it. Note that this step might take a few minutes.

In this tutorial, we will use the MedNIST dataset to predict which of the six possible classes an image belongs to. This is therefore a classic example of **multi-class classification**.

In [None]:
# Define path
main_path = '/content/drive/MyDrive/DL-lecture-tutorials/'

In [None]:
!curl -L https://www.dropbox.com/s/wrbfk4o63f3cn5k/MedNIST_0.5.zip?dl=1 > /content/drive/MyDrive/DL-lecture-tutorials/MedNIST_0.5.zip

In [None]:
shutil.unpack_archive(main_path + 'MedNIST_0.5.zip', main_path)
shutil.rmtree(main_path + '__MACOSX')

Now let's have a look at the data.
First, we need to save all the image names in a dataframe (df), i.e. a table, to have direct and quick access to them.

In [None]:
df, mp = get_MedNIST_dataframe()

Now, let's see how many classes we have, along with their names and labels, and have a look at what a dataframe looks like.

In [None]:
print(mp)

In [None]:
from google.colab import data_table
data_table.enable_dataframe_formatter()

In [None]:
df_explore = df.rename(columns={0: 'filename',
                                1: 'class label',
                                2: 'class name'})
df_explore

Before we start building our classification model, run the next cell and take some time to analyze the images, and try to anticipate how the network will behave.

Which classes do you expect will be harder to classify and why?

In [None]:
plt.subplots(4, 4, figsize=(8, 8))
random.seed(6) 
for i, k in enumerate(random.sample(range(len(df)), 16)):
    im = PIL.Image.open(main_path + "MedNIST_0.5/" + df[0].iloc[k])
    arr = np.array(im)
    plt.subplot(4, 4, i + 1)
    plt.xlabel(df[2].iloc[k])
    plt.imshow(arr, cmap="gray", vmin=0, vmax=255)
plt.tight_layout()
plt.show()

### Define the structure of the Convolutional Neural Network (CNN)

#### 💡 Key concepts

The network we build is formed by several hidden layers, as shown in the image ([Image credit](https://www.mathworks.com/videos/introduction-to-deep-learning-what-are-convolutional-neural-networks--1489512765771.html)).

<div>
<img src="https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/main/figures/cnn.png?raw=true" width="500" height="300"/>
</div>

The other layers are explained in the Extra material section at the end of this notebook. For now we focus only on the layer that gives the whole architecture its name.

**Convolutional layer**: convolution is a mathematical term for what is essentially a window, or filter, moving across the image being studied. As the filter slides over the image, the dot product between the pixel values and the filter is computed at each position, creating the so-called convolved feature map (see image below - [credit](https://media4.giphy.com/media/i4NjAwytgIRDW/giphy.gif?cid=ecf05e477z2u4ge19e34frejcm5q6o228fiyohcg0viafep7&rid=giphy.gif&ct=g)).
<div>
<img src="https://media4.giphy.com/media/i4NjAwytgIRDW/giphy.gif?cid=ecf05e477z2u4ge19e34frejcm5q6o228fiyohcg0viafep7&rid=giphy.gif&ct=g" width="300" height="200"/>
</div>
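
The sliding-window computation described above can be written out by hand. The sketch below uses a toy 4x4 "image" and a toy 2x2 filter (both made up for illustration) and no padding or stride; like deep learning libraries, it does not flip the filter, so strictly speaking it is a cross-correlation.

```python
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                    # toy 2x2 filter

kh, kw = kernel.shape
out_h = image.shape[0] - kh + 1                     # no padding, stride 1
out_w = image.shape[1] - kw + 1
feature_map = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + kh, j:j + kw]          # patch under the filter
        feature_map[i, j] = np.sum(window * kernel) # dot product with the filter

print(feature_map)
```

In a convolutional layer the values inside the filter are not fixed by hand as they are here: they are the parameters that the network learns during training.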

### Define the structure of the CNN

In [None]:
class CNN(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channels, out_channels=8, kernel_size=(3,3))
        self.pool1 = nn.MaxPool2d(kernel_size=(2,2), stride=(2,2))
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3,3)) 
        self.pool2 = nn.MaxPool2d(kernel_size=(2,2), stride=(2,2))
        self.flatten = nn.Flatten()
        self.dropout = nn.Dropout(0.2)
        self.lin1 = nn.Linear(3136, 64)
        self.lin2 = nn.Linear(64, num_classes)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.dropout(x) 
        x = F.relu(self.lin1(x))
        x = self.lin2(x)
        
        return x
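
The value 3136 in `lin1` is the number of features left after the convolution and pooling layers. A small sketch can trace the shapes, assuming 64x64 single-channel input images (the size the new image is reshaped to later in this notebook); the names `shape_probe` and `probe` are made up for this example.

```python
import torch
import torch.nn as nn

# Same convolution and pooling settings as in the CNN class above
shape_probe = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(8, 16, kernel_size=3),
    nn.MaxPool2d(2, 2),
    nn.Flatten(),
)

probe = torch.zeros(1, 1, 64, 64)   # one dummy image: (batch, channels, height, width)
for layer in shape_probe:
    probe = layer(probe)
    print(type(layer).__name__, tuple(probe.shape))
```

The last line printed shows a flattened vector of 16 x 14 x 14 = 3136 values, which is why the first linear layer expects 3136 inputs. With a different image size this number would change.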

### Set up the parameters

We define the input/output and the hyperparameters which, unlike the parameters that describe the model itself, characterize the learning process. In particular, we define:

- in_channels: number of input channels (1 for the grayscale images used here)
- num_classes: number of possible outputs, i.e. the classes that can be predicted (the Covid dataset had 2 classes; this dataset has 6)
- lr: the learning rate, i.e. the step size of each parameter update; it determines how fast and how well the model trains
- batch_size: number of samples processed before the model is updated; it is often set to a power of 2
- num_epochs: number of complete passes over the training dataset
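
To get a feeling for these hyperparameters, here is a small sketch in plain Python. The number of samples (1000) is a made-up value, and the function `descend` is a toy gradient descent on f(w) = w², not the Adam optimizer used later.

```python
import math

# Hypothetical numbers, for illustration only
toy_n_samples = 1000
toy_batch_size = 64
toy_num_epochs = 30

# One parameter update per batch (the last batch may be smaller)
updates_per_epoch = math.ceil(toy_n_samples / toy_batch_size)
total_updates = updates_per_epoch * toy_num_epochs
print(updates_per_epoch, total_updates)   # 16 480

def descend(lr, steps=10, w=1.0):
    """Plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

w_small_lr = descend(lr=0.01)
w_large_lr = descend(lr=0.1)
print(w_small_lr, w_large_lr)             # the larger step gets closer to the minimum at 0
```

In this toy problem the larger learning rate converges faster, but on real networks a learning rate that is too large can make the training unstable, which is why it has to be tuned.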

In [None]:
# Set device in case it is possible to access a GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Number of input and output
in_channels = 1
num_classes = 6

# Hyperparameters
lr = 0.0001
batch_size = 64
num_epochs = 30

### Train-test split

No matter if we are dealing with classification or regression problems, a crucial aspect of determining if the results are meaningful or not is the evaluation of the performance of our model. 

The train-test split is a technique for evaluating the performance of a machine learning algorithm, and it can be used with any supervised learning method.

The goal is to divide the dataset into two sub-sets:

- **Train set**: the sample of data used to fit the model.
- **Test set**: the sample of data, unseen during the training, used to evaluate the fitted machine learning model.

It is essential to point out that the evaluation must be made on data that are not visible to the network during the training. In other words, the objective is to estimate the machine learning model's performance on new data not used to train the model, i.e. the test data.
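
A train-test split can be sketched with PyTorch's own utilities. This is not the implementation of the `create_train_test_dataset` helper used below, whose internals are not shown here; the random toy data and all `toy_` names are assumptions for the example.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Toy data: 100 random 8x8 single-channel "images" with labels from 6 classes
toy_images = torch.rand(100, 1, 8, 8)
toy_labels = torch.randint(0, 6, (100,))
toy_dataset = TensorDataset(toy_images, toy_labels)

# 50% of the samples for training, the rest for testing
n_train = int(0.5 * len(toy_dataset))
n_test = len(toy_dataset) - n_train
generator = torch.Generator().manual_seed(0)       # makes the split reproducible
toy_train_set, toy_test_set = random_split(toy_dataset, [n_train, n_test], generator=generator)

toy_train_loader = DataLoader(toy_train_set, batch_size=64, shuffle=True)
toy_test_loader = DataLoader(toy_test_set, batch_size=64, shuffle=False)
print(len(toy_train_set), len(toy_test_set))
```

The two subsets share no samples, so the test set really is unseen during training.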

In [None]:
train_loader, test_loader = create_train_test_dataset(df, train_ratio=0.5, batch_size=batch_size)

### Initializing 

In [None]:
model = CNN(in_channels, num_classes).to(device)

### Training

The training will take a few minutes. Notice that the 'Current loss' decreases during the training phase, meaning that the network is learning.

In [None]:
# train network
def train(model, train_data, test_data, num_epochs):
    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    loss_list = []
    loss_test_list = []
    for epoch in range(num_epochs):
        with alive_bar(len(train_data), title= (f'Epoch {epoch}'), force_tty=True, bar='classic', spinner='dots_waves') as bar:
            for batch, (data, targets, _) in enumerate(train_data):
                data = data.to(device=device)
                targets = targets.to(device=device)

                #Forward
                scores = model(data)
                loss = criterion(scores, targets)

                #Backward
                optimizer.zero_grad()
                loss.backward()

                # Gradient descent
                optimizer.step()
                bar()
        print(epoch, "Current Loss:", loss.item())
        loss_list.append(loss.detach().item())
        loss_test_list.append(evaluate_loss(model, test_data, device))
        
    # Display learning curves
    fig = go.Figure(layout=go.Layout(xaxis=dict(title="Epochs"),
                                 yaxis=dict(title="Loss"),
                                 title = 'Learning curves from train and test set'))

    fig.add_scatter(marker=dict(size=7, color="dodgerblue"))
    fig.data[-1].x = [i for i in range(len(loss_list))]
    fig.data[-1].y = loss_list
    fig.data[-1].name = 'train loss'

    fig.add_scatter(marker=dict(size=7, color="coral"))
    fig.data[-1].x = [i for i in range(len(loss_test_list))]
    fig.data[-1].y = loss_test_list
    fig.data[-1].name = 'test loss'

    fig.show()

    return loss_list, loss_test_list

In [None]:
loss_list, loss_test_list = train(model, train_loader, test_loader, num_epochs)

Judging by the learning curves produced at the end of the training, the network seems to perform pretty well. Let's also have a look at the accuracy.

In [None]:
# Compute accuracy  
print('Train set:')
list_of_train_incorrect_preds, list_of_train_preds = evaluate_score(model, train_loader, device)
print('Test set:')
list_of_test_incorrect_preds, list_of_test_preds = evaluate_score(model, test_loader, device)

The global accuracy looks pretty high, but let's look at the errors the network is making.

### Showcases

In [None]:
mislabeled_image = main_path + 'MedNIST_0.5/' + list_of_test_incorrect_preds[2][0]
plt.figure()
im = PIL.Image.open(mislabeled_image)
plt.imshow(im, cmap="gray", vmin=0, vmax=255)
plt.tight_layout()
plt.show()
print('Ground Truth: ', get_key(mp, list_of_test_incorrect_preds[2][1]), '- class', list_of_test_incorrect_preds[2][1])
print('Predicted class: ', get_key(mp, list_of_test_incorrect_preds[2][2]), '- class', list_of_test_incorrect_preds[2][2])

The previous cell shows a mislabeled image; you might have expected the network to be wrong for this class.
Let's have a look at other examples.

In [None]:
plt.subplots(2, 3, figsize=(10, 10))
for i, k in enumerate(random.sample(range(len(list_of_test_incorrect_preds)), 6)):
    im = PIL.Image.open(main_path + 'MedNIST_0.5/' + list_of_test_incorrect_preds[k][0])
    arr = np.array(im)
    plt.subplot(2, 3, i + 1)
    plt.xlabel({'GT: ': get_key(mp, list_of_test_incorrect_preds[k][1]),
                'Pred: ': get_key(mp, list_of_test_incorrect_preds[k][2])})
    plt.imshow(arr, cmap="gray", vmin=0, vmax=255)
plt.show()

It seems that the network often predicts 'Chest' even when this is clearly not the case. To investigate the kind of errors the network is making, plot the confusion matrix and look at the accuracy for each class.

### Confusion matrix

In [None]:
cm = confusion_matrix([v[0] for v in list_of_test_preds],[v[1] for v in list_of_test_preds])
display_labels = list(mp.keys())
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=display_labels)
disp = disp.plot(include_values=True, cmap=plt.cm.Blues, xticks_rotation=45)
plt.show()

In [None]:
# Accuracy per class
for x, acc in enumerate(cm.diagonal()/cm.sum(axis=1)):
    print(get_key(mp, x), ' - ', acc)
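
To see what the per-class computation does, consider a toy confusion matrix with made-up numbers. It follows the scikit-learn convention in which rows are the true classes and columns are the predicted classes, so dividing the diagonal by the row sums gives the fraction of each true class that was predicted correctly.

```python
import numpy as np

# Made-up confusion matrix for 3 classes with 10 samples each
toy_cm = np.array([[8, 2, 0],
                   [1, 9, 0],
                   [5, 0, 5]])

per_class_acc = toy_cm.diagonal() / toy_cm.sum(axis=1)   # correct / total, per true class
global_acc = toy_cm.trace() / toy_cm.sum()               # all correct / all samples

print(per_class_acc)   # [0.8 0.9 0.5]
print(global_acc)
```

A single global number can hide large differences between classes, which is why both are worth checking.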

🤔 **Discuss with your team** the possible reasons why the accuracy is so poor in certain classes.

If you want, you can also go back up to the dataframe we printed at the beginning of this part of the tutorial and explore the dataset in more detail by clicking on the filter button in the upper-right corner.

In case you need a suggestion to understand why the accuracy is so low in certain classes, run the hint cell below.

In [None]:
# Run this cell if you want a hint otherwise execute the next cells
hint()

In [None]:
# Run this cell to check your answer
check_MedNIST()

### Training on the full dataset

Now that we discovered that the dataset we used was biased, let's train the model over the well-balanced dataset (same number of samples for each class) and look at the results. We will need to repeat some of the previous steps on the new dataset.

In [None]:
# Dataframe creation 
df_complete, _ = get_MedNIST_dataframe(percentage_to_treat=[1., 1., 1., 1., 1., 1.])

In [None]:
# Train-test split
train_loader_complete, test_loader_complete = create_train_test_dataset(df_complete, train_ratio=0.5, batch_size=batch_size)

In [None]:
#Initialization
model = CNN(in_channels, num_classes).to(device)

In [None]:
# Training
loss_list_complete, loss_test_list_complete = train(model, train_loader_complete, test_loader_complete, num_epochs)

In [None]:
print('Train set:')
list_of_train_incorrect_preds_complete, list_of_train_preds_complete = evaluate_score(model, train_loader_complete, device)
print('Test set:')
list_of_test_incorrect_preds_complete,  list_of_test_preds_complete = evaluate_score(model, test_loader_complete, device)

In [None]:
cm_complete = confusion_matrix([v[0] for v in list_of_test_preds_complete],[v[1] for v in list_of_test_preds_complete])
disp = ConfusionMatrixDisplay(confusion_matrix=cm_complete, display_labels=display_labels)
disp = disp.plot(include_values=True, cmap=plt.cm.Blues, xticks_rotation=45)
plt.show()

In [None]:
# Accuracy per class
for x, acc in enumerate(cm_complete.diagonal()/cm_complete.sum(axis=1)):
    print(get_key(mp, x), ' - ', acc)

We can conclude that not only has the global accuracy improved, but every class now also has roughly the same accuracy, meaning that the network is properly learning to classify images without under-representing any class.

The goal of this tutorial was to show the difference between an imbalanced and a well-balanced dataset. Most of the time it won't be possible to add more data as we did today, but there are techniques for handling an imbalanced dataset (e.g. using weighted loss functions or resampling the dataset). The most important thing is to be aware of the existence of this phenomenon in order to find the best way to deal with it and avoid missing important pieces of information.
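
As an illustration of a weighted loss function, the sketch below passes per-class weights to `nn.CrossEntropyLoss`. The class counts are made up, and weighting each class inversely to its frequency is only one common heuristic, not the method used in this tutorial.

```python
import torch
import torch.nn as nn

# Hypothetical number of training samples in each of 3 classes
class_counts = torch.tensor([900.0, 100.0, 500.0])

# Rare classes get a larger weight: n_samples / (n_classes * count)
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
print(class_weights)

weighted_criterion = nn.CrossEntropyLoss(weight=class_weights)

# Toy scores (logits) for 4 samples and their true classes
toy_logits = torch.zeros(4, 3)
toy_targets = torch.tensor([0, 1, 1, 2])
toy_loss = weighted_criterion(toy_logits, toy_targets)
print(toy_loss.item())
```

With these weights, a mistake on the rare class contributes more to the loss than a mistake on the frequent one, which pushes the network to pay attention to it.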

### Test on a different image

Finally, let's see how our model behaves when we feed it a completely new, different image.

In [None]:
# Read the new image and resize it
single_image = PIL.Image.open(main_path + 'image_number.jpg')
resized_image = scipy.ndimage.zoom(single_image, 2.3, order=1)
plt.imshow(resized_image, cmap="gray", vmin=0, vmax=255)

In [None]:
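# Transform the image before giving it to the model
transform = transforms.ToTensor()
# Sanity check: reshape fails unless the image has exactly 64*64 values (the result itself is not used)
resized_image.reshape((1, 64, 64))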
input_image = transform(resized_image).to(device=device)
input_image = input_image.unsqueeze(0)

In [None]:
# Evaluate the model on the new image
model.eval()              # turn the model to evaluate mode
with torch.no_grad():     # does not calculate gradient
    class_index = model(input_image).argmax()   #gets the prediction for the image's class
    
print('Predicted class:', get_key(mp, class_index.item()), '- class label: ', class_index.item())

As you can see, the network is unable to recognize that the image does not belong to any of the classes. Even if the image is not related to the dataset that we used for the training, the model always makes a prediction!
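
The reason is that the output layer has exactly one score per class and `argmax` simply picks the largest one. The sketch below uses random scores as a stand-in for the model output (an assumption for illustration) and converts them to probabilities with a softmax.

```python
import torch
import torch.nn.functional as F

toy_scores = torch.randn(1, 6)             # stand-in for model(input_image): 6 class scores
toy_probs = F.softmax(toy_scores, dim=1)   # probabilities over the 6 known classes

toy_confidence, toy_class = toy_probs.max(dim=1)
print(toy_probs.sum().item())              # the probabilities always add up to 1
print(toy_class.item(), toy_confidence.item())
```

Whatever the input, the probabilities are distributed only among the known classes, so there is no way for the model to answer "none of them". Looking at the confidence of the prediction can sometimes help to spot doubtful cases, but it is not a reliable test.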

---

Congratulations! You completed this tutorial!

---

### Homework

To better understand the structure of an NN, open the [Tensorflow Playground Tutorial](https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/translatum_2023/notebooks/Tensorflow_playground_tutorial.ipynb). You can follow it on your own, performing the tasks and getting automatic feedback and answers.

### Extra material

Here you can find other details about the network architecture, in particular the explanation of the other layers.

**Max pooling layer**: another sliding-window technique, but instead of applying weights as in the convolution, it applies the max function over the contents of the window. A pooling layer subsamples (downsamples) the feature map produced by a convolutional layer that has already extracted salient features from the image.
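
A small sketch of max pooling on a made-up 4x4 feature map, using the same 2x2 window and stride of 2 as the network above:

```python
import torch
import torch.nn as nn

# Made-up 4x4 feature map, shaped as (batch, channels, height, width)
toy_map = torch.tensor([[1.0, 2.0, 5.0, 6.0],
                        [3.0, 4.0, 7.0, 8.0],
                        [9.0, 10.0, 13.0, 14.0],
                        [11.0, 12.0, 15.0, 16.0]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled = pool(toy_map)

print(pooled.squeeze())   # the maximum of each 2x2 block: 4, 8, 12, 16
```

The 4x4 map is reduced to 2x2: each output value keeps only the strongest response in its window.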

**Dropout layer**: during training, dropout randomly sets a fraction of the neuron activations to zero (20% in the network above), which helps to prevent overfitting, especially in dense layers.
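
The behavior of `nn.Dropout` can be checked on a toy input of ones (made up for illustration). In training mode some values are set to zero and the remaining ones are rescaled by 1 / (1 - p); in evaluation mode the layer does nothing.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.2)
toy_input = torch.ones(1, 1000)

drop.train()                       # training mode: dropout is active
out_train = drop(toy_input)

drop.eval()                        # evaluation mode: dropout is switched off
out_eval = drop(toy_input)

print((out_train == 0).float().mean().item())   # fraction of zeroed values, roughly 0.2
print(torch.equal(out_eval, toy_input))         # True
```

This is why `model.eval()` is called before making predictions: it ensures that no activations are dropped at prediction time.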

**Linear layer**: the linear layer, also called a fully connected layer, is used in the final stages of the neural network. It changes the dimensionality of the output from the preceding layer so that the model can relate the extracted features to the labels and produce a final score for each label. These scores are turned into probabilities by a softmax, which `nn.CrossEntropyLoss` applies internally during training.