![logo](https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/main/figures/1128-191-max.png?raw=true)

Getting to know the power of Deep Learning with the Teachable Machine
---
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/translatum_2023/notebooks/Teachable_machine_tutorial.ipynb)

This notebook introduces you to the fascinating world of Machine Learning (ML) through the Google experiment Teachable Machine.

This tutorial goes through the three major steps any ML (or deep learning) project relies on:

- Providing data
- Training a model
- Testing and evaluating the trained model

These will be the main subjects of this tutorial.

## 📄 Format

For each subject we go through three steps:
1.  **Introduction to key concepts:** If you already know the concepts, you can simply skip the reading
2.  **Task:** What you need to accomplish at this step
3.  **Step-by-step tutorial:** How to accomplish the previously stated task

Throughout this tutorial, we will work on a binary classification problem with COVID data: we use a dataset of lung CT scans to predict whether a patient has COVID-19 or not. Since the output can only be positive or negative, this is a classic example of **binary classification**.

## 🤖 Teachable Machine
As mentioned earlier, we will conduct experiments with the Teachable Machine. To access the website just click on this link: [Teachable Machine](https://teachablemachine.withgoogle.com/train) and you will see the following page.
This tutorial is focused on image classification, but feel free to come back later and explore the other projects.

<img  src="https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/main/figures/Teachable_machine.png?raw=true"  alt="Teachable_machine"  style="width: 700px;"/>

## Providing data

### 💡 Key concepts
- **Classification:** Classification in machine learning is a process of categorizing data into predefined classes or categories based on patterns and features.
- **Training data:** In order to learn, a model needs to see enough examples of each class to be able to correctly classify new examples later. The set of examples that the model learns from is called the **training set**.

### ✅ Task: Get to know the data and upload it to the Teachable Machine

#### Dataset
The dataset, available on Kaggle (https://www.kaggle.com/datasets/luisblanche/covidct), will be downloaded into your Google Drive.
The dataset contains a total of 746 images, divided as follows:
- 397 No Covid
- 349 Covid

The images, i.e. CT scans, are obtained through Computed Tomography, a medical imaging technique used in radiology that relies on X-rays to noninvasively obtain detailed internal images of the body for diagnostic purposes. Interpreting the scans requires proper training, so without a radiology or medical background it is hard to tell from a scan whether COVID-19 is present. But we will see that a well-trained neural network (NN) can help technicians and doctors diagnose this kind of disease.

### 📝 Step by step tutorial

In [None]:
#@markdown #### Step 1. Run this cell to connect your Google Drive to Colab and clone the tutorial repository
#@markdown * Click on the URL.
#@markdown * Sign in to your Google Account.
#@markdown * Click on "Files" in the sidebar and refresh. Your Google Drive folder should now be available there as "gdrive".
# Mount the user's Google Drive in Google Colab, then clone the tutorial repository into it.
from google.colab import drive
drive.mount('/content/gdrive')
%cd /content/gdrive/MyDrive/
!git clone https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials.git
%cd /content/gdrive/MyDrive/DL-lecture-tutorials

####  Step 2. Look at the data
Although not very intuitive in this example, it's always a good idea to take a look at your data first to get a sense of what you are working with.
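
A quick first look can be as simple as counting how many images each class contains. The sketch below assumes, hypothetically, that `data/train` holds one sub-folder per class; the function name `count_images_per_class` and the list of file extensions are illustrative and not part of the tutorial repository.

```python
from pathlib import Path

# Assumption: these are the image formats used in the dataset folders.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def count_images_per_class(root):
    """Return {class_folder_name: number_of_image_files} for each sub-folder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


# Hypothetical usage inside the cloned repository:
# print(count_images_per_class("data/train"))
```

If the folder layout matches the assumption, the returned dictionary shows at a glance whether the two classes are roughly balanced.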

####  🤖Teachable Machine interface: Classes

After creating a new project, on the left-hand side of the page, you can edit the classes, choosing a label for each class you want the model to learn. You can either upload images from your computer or record them using the webcam.
To start collecting images through the camera, hold down the blue button below the camera window, or open the menu and choose to record for a few seconds without holding the button.

<img  src="https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/translatum_2023/figures/Teachable_machine_classes.png?raw=true"  alt="Teachable_machine_classes"  style="width: 500px;"/>

####  Step 3. Upload training data
For each class:
1. Give it a name (COVID or No COVID)
2. Import images (in our case, from Google Drive)
3. Navigate to `data/train` and upload all corresponding images

## Training a model

### 💡 Key concepts
Hyper-parameters: different Machine Learning models can be obtained by configuring so-called hyper-parameters. In this example we focus on three of them:
- **Epochs**: the number of complete passes over the training dataset.
- **Batch Size**: the number of samples processed before the model is updated.
- **Learning Rate**: the step size of each model update during training; it determines how quickly, and how well, the model trains.
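
To make these three hyper-parameters concrete, here is a minimal sketch of mini-batch gradient descent on a toy problem (fitting `y ≈ w * x`). It is not how the Teachable Machine trains its image model; the function names and the toy data are assumptions chosen only to show where epochs, batch size and learning rate enter the training loop.

```python
import math
import random


def updates_per_epoch(n_samples, batch_size):
    """Number of model updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)


def train_linear_model(xs, ys, epochs, batch_size, learning_rate, seed=0):
    """Fit y ≈ w * x by mini-batch gradient descent on the squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):                      # one epoch = one pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of (w * x - y)^2 with respect to w over the batch.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad            # the learning rate scales each update
    return w


# Toy data generated with w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = train_linear_model(xs, ys, epochs=50, batch_size=2, learning_rate=0.05)
print(w)
print(updates_per_epoch(746, 16))  # illustrative: 746 images, batches of 16
```

A smaller batch size means more updates per epoch, and a larger learning rate means bigger steps per update; too large a value can make training unstable.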

### ✅ Task: Train the model to recognize Covid vs Non-Covid CT scans

### 📝 Step by step tutorial

#### Step 1. Click on Train Model!
This uses the interface's default hyper-parameter values.
Move on to the next step (testing and evaluating the model), and see how the model is doing.

#### Step 2. Play with hyper-parameters
After trying default hyper-parameters, you can try to play with them and see what happens (Click on "Advanced").

<img  src="https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/translatum_2023/figures/Teachable_machine_model.png?raw=true"  alt="Teachable_machine_model"  style="width: 250px;"/>

While the model trains, open the panel "Under the hood" and look at the learning curves and the accuracy.

## Testing and evaluating the trained model

### 💡 Key concepts
Your goal is to check whether the model is able to recognize the disease and how well it performs.
To get a complete picture of what is happening, one needs to check the learning curves recorded during training.

**Accuracy per epoch** Accuracy is one of the evaluation metrics we can use to evaluate how good a model is. It can be defined as the number of samples correctly classified over the total number of samples.

<center>

$$\text{Accuracy} = \dfrac{\text{\# of samples correctly classified}}{\text{total \# of samples}}$$

</center>
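
The formula above can be sketched in a few lines of Python. The function name `accuracy` and the example labels are illustrative (1 = COVID, 0 = No COVID is an assumed encoding).

```python
def accuracy(labels, predictions):
    """Fraction of samples whose prediction equals the ground-truth label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Four samples, three of which are classified correctly.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```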


**Loss per epoch** Another interesting way to see if the model is learning correctly is to look at the loss function at different epochs. A NN works by trying to minimize the difference between its prediction and the label; this difference is usually described through a *loss function*. If the loss decreases over time, it means that the network is learning.
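
As an illustration, one common loss for binary classification is the binary cross-entropy. The sketch below assumes this choice for the sake of example; the loss actually used by the Teachable Machine is not stated in this tutorial.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1, 0]
unsure = binary_cross_entropy(labels, [0.5, 0.5])     # model has not learned anything yet
confident = binary_cross_entropy(labels, [0.9, 0.1])  # predictions close to the labels
print(unsure, confident)
```

Predictions that move closer to the labels give a lower loss, which is what a decreasing loss curve reflects.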

**Confusion matrix** Accuracy alone is not always enough to evaluate a model. Let's see why with an example. Assume that we have 10 patients and only one of them has a disease. If our model always predicts that a patient is healthy, its accuracy is $\frac{9}{10} = 0.90$, which is pretty high, even though the model gives us no useful information. In this case, we care more about detecting the disease than about reaching a high accuracy.

For this reason, it is useful to introduce other metrics such as sensitivity and specificity.
-  **sensitivity**: the true positive rate, i.e. the probability of predicting positive given that the patient has the disease;
-  **specificity**: the true negative rate, i.e. the probability of predicting negative given that the patient is healthy.

This information is usually summarized in a table called the **'confusion matrix'**, which shows the performance of the classifier. The rows represent the ground truth (GT) and the columns the output predicted by the model. Each cell contains the number of samples for the corresponding GT/prediction combination and is called:

- **True Positive (TP)**: when both GT and the prediction are positive
- **False Negative (FN)**: when the GT is positive but the output is negative
- **False Positive (FP)**: when the GT is negative (i.e. healthy patient) but the output is positive
- **True Negative (TN)**: when both GT and the prediction are negative

In medical applications especially, FN and FP should be reduced as much as possible, in order to avoid missing a disease or alarming people who are actually in good health.
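
The 10-patient example can be worked through in code. This is a minimal sketch; the function names and the 1 = diseased, 0 = healthy encoding are assumptions made for illustration.

```python
def confusion_counts(ground_truth, predictions):
    """Return (TP, FN, FP, TN) for binary labels where 1 = positive and 0 = negative."""
    tp = fn = fp = tn = 0
    for gt, pred in zip(ground_truth, predictions):
        if gt == 1 and pred == 1:
            tp += 1
        elif gt == 1 and pred == 0:
            fn += 1
        elif gt == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# 10 patients, one of them diseased; the model always predicts "healthy".
ground_truth = [1] + [0] * 9
predictions = [0] * 10

tp, fn, fp, tn = confusion_counts(ground_truth, predictions)
print("accuracy:", (tp + tn) / len(ground_truth))  # 0.9
print("sensitivity:", sensitivity(tp, fn))         # 0.0
print("specificity:", specificity(tn, fp))         # 1.0
```

The accuracy looks good, but a sensitivity of 0 reveals that the only diseased patient was missed.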

<div>

<img  src="https://github.com/HelmholtzAI-Consultants-Munich/DL-lecture-tutorials/blob/main/figures/confusionmatrix.png?raw=true"  width="300"  height="200"/>

</div>

**Train-test split**
The train-test split is a technique for evaluating the performance of a machine learning algorithm, and it can be applied to any supervised learning method. The whole dataset is divided into two subsets:
  
-  **Train set**: the sample of data used to fit the model.
-  **Test set**: the sample of data, unseen during training, used to evaluate the fitted machine learning model.
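
A split like this can be sketched in plain Python. The function name `split_train_test`, the 20% test fraction and the file names are hypothetical choices for illustration, not the split used to prepare the tutorial data.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and return (train_set, test_set) without overlap."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for CT scans.
files = [f"scan_{i}.png" for i in range(10)]
train_set, test_set = split_train_test(files, test_fraction=0.2)
print(len(train_set), len(test_set))  # 8 2
```

Shuffling before splitting avoids a test set that contains only one class when the files are sorted by label.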


### ✅ Task: Check the model performance

### 📝 Step by step tutorial

#### Step 1. Use your model on unseen data

1. The input option should be set to `on`. Next to it, there's a dropdown menu set to `Webcam` by default. Set it to `File`
2. Import images
3. Navigate to `data/unseen` and select an image (either corresponding to COVID or not). How is the model doing?


#### Step 2. Look at metrics obtained after training
Open the 'Under the hood' panel and check the learning curves during the training.
> Looking at the accuracy plot over the epochs we can state that the model is not performing badly.

🤔 What is the difference between the training and the test curve? What do you think might be the reason? Discuss it with your team and think about possible changes that could improve the model's performance.

#### Step 3. Look at the confusion matrix
To see the confusion matrix of our model, click the confusion matrix button in the **'Under the hood'** panel. You will notice that, even if the model is performing well, there are still some FP and FN.

## 🧑‍🏫 Learnings: TODO
- understand the limitations and pitfalls of DL
- think about and discuss possible interesting applications of classification with Teachable Machine.

## Bonus tasks: play with your own data

### Image recognition hand/fingers
1) Using the camera, create your own dataset with two labeled classes in order to train the machine to recognize an open hand and one finger.
2) Train the model and look at the result. Are you satisfied with your model? How could you change the dataset in order to improve the performance?
3) Is your model able to recognize a fist? Probably not. What would you do to teach this new class to your NN?
4) Now try changing the background or using the other hand. How does the model react?

### Be creative!
1. Try to build your own classifier with more than two classes (cat/dog/cow, apple/banana/strawberry, ...).
2. Now think about your domain: how could these kinds of models help your research or work?