# W3 practical

## 📝 Learning goals of this practical

- You can discuss how and why to use train/test splits for training machine learning models

- You can explain how overfitting of neural networks can arise and give an example of how to combat this

- You can reflect on neural networks' dependence on (unbiased) training data

- You can list various methods of assessing model performance and discuss their up- and downsides.

TIP: To speed up training these deep neural networks, use a GPU: in the top right next to 'RAM', click the downward-pointing triangle, select 'Change runtime type' and choose 'GPU'.



## Data setup and inspection

In this practical you will train a model to classify tomato leaves as healthy or infected by a biotic stress (i.e. a pathogen or pest). Along the way, we will also investigate how a model's dependence on its training data can be misleading, and perhaps even harmful.

In [None]:
!git clone https://github.com/gabrieldgf4/PlantVillage-Dataset.git
!pip install git+https://github.com/CropXR/EduXR.git

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
from collections import Counter
import tensorflow as tf
from tensorflow import keras
import numpy as np
from pathlib import Path
from dsplantbreeding.Datasets.biotic_stress_images import get_image_biotic_stress_dataset
from dsplantbreeding.actions import count_labels_in_dataset, decrease_brightness_on_label, augment_image, preview_images, show_classification_examples
from dsplantbreeding.metrics import show_accuracy, show_confusion_matrix, show_auroc
from dsplantbreeding.Models import train_dl_model, get_the_best_model_ever

In [None]:
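# We will only investigate tomato: healthy vs. infected.
base_dir = Path('/content/PlantVillage-Dataset')
healthy_dir = base_dir / 'Tomato___healthy'
# Every other tomato folder is a disease class. (A glob character class such as
# '[!healthy]' matches one character, not the word 'healthy', so filter explicitly.)
infected_dirs = sorted(d for d in base_dir.glob('Tomato___*') if d.is_dir() and d != healthy_dir)
infected_dirs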

N.B. Here we group all infected classes into a single category, but we could also train our model to predict each disease category separately.

In [None]:
dataset = get_image_biotic_stress_dataset(healthy_dir, infected_dirs)

In [None]:
preview_images(dataset)

This is the 'human readable input'. But what does the input for the model 'look' like?

In [None]:
next(dataset.as_numpy_iterator())

### ❓Questions

- What do these numbers represent exactly?

In [None]:
train_dataset, validation_dataset = keras.utils.split_dataset(dataset, left_size=0.8, shuffle=True)

### ❓Questions

- Why do we split the data into training and test data (the held-out part is called `validation_dataset` in the code)?
- List some things that should be taken into consideration when splitting data into training and test sets.
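One such consideration is class balance: with a plain random split, a rare class can end up under-represented in the held-out part. Another is that in this notebook `validation_dataset` is both passed to training and used for evaluation; a common refinement is a three-way split into training, validation and test sets, where the test set is only touched at the very end. The sketch below illustrates both ideas with scikit-learn's `train_test_split` and its `stratify=` argument, which is a different tool from the `keras.utils.split_dataset` call above. The labels are made up for illustration, and the split fractions are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical, made-up labels: 0 = healthy, 1 = infected (imbalanced classes).
labels = np.array([0] * 100 + [1] * 900)
indices = np.arange(len(labels))

# First split off a held-out test set (20%), preserving class proportions.
trainval_idx, test_idx = train_test_split(
    indices, test_size=0.2, stratify=labels, random_state=42
)
# Then split the remainder into training and validation sets (75% / 25% of it).
train_idx, val_idx = train_test_split(
    trainval_idx, test_size=0.25, stratify=labels[trainval_idx], random_state=42
)

for name, idx in [("train", train_idx), ("validation", val_idx), ("test", test_idx)]:
    print(f"{name}: n={len(idx)}, fraction infected={labels[idx].mean():.2f}")
```

Stratifying keeps the infected fraction the same in all three parts, so an evaluation on the test set reflects the same class distribution the model was trained on.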

## Using the best model

Here I provide you with a model I developed that has 90% accuracy! It's now up to you to evaluate whether you agree that this is indeed the best model ever.

In [None]:
test_model = get_the_best_model_ever()
show_accuracy(test_model, validation_dataset)

### ❓Questions

- With this accuracy, do you think you would use this model? What extra steps would you take to look further into the classification performance?

Let's look at how many examples of each class the dataset contains.

In [None]:
count_labels_in_dataset(dataset)

### ❓Questions

- Does this class distribution change your view on the model's accuracy score?
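A useful reference point is the accuracy of a trivial classifier that always predicts the most frequent class: its accuracy equals that class's share of the data. A minimal sketch, using made-up counts rather than the real output of `count_labels_in_dataset`:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Hypothetical label list for illustration: 0 = healthy, 1 = infected.
example_labels = [0] * 150 + [1] * 850
majority_label, baseline_acc = majority_baseline_accuracy(example_labels)
print(f"Always predicting label {majority_label} gives {baseline_acc:.0%} accuracy")
```

If the real class distribution is similarly skewed, a reported accuracy should be compared against this baseline rather than against 50%.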

Let's investigate this model further by plotting a confusion matrix and a receiver operating characteristic (ROC) curve.

In [None]:
show_confusion_matrix(test_model, validation_dataset)

In [None]:
show_auroc(test_model, validation_dataset)
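To see what these plots summarise, here is a small sketch using scikit-learn's `confusion_matrix` (already imported above), `roc_curve` and `roc_auc_score` on made-up labels and predicted probabilities. It assumes label 1 (infected) is the positive class and, for the confusion matrix, a decision threshold of 0.5; which threshold `show_confusion_matrix` uses internally is not shown here.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities of class 1 (infected).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Confusion matrix at an assumed 0.5 threshold: rows = true class, columns = predicted class.
y_pred = (y_score >= 0.5).astype(int)
cm = confusion_matrix(y_true, y_pred)
print(cm)

# The ROC curve sweeps the threshold and traces the false positive rate
# against the true positive rate; the AUC is the area under that curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")

# A model that gives every image the same score cannot rank the classes: AUC = 0.5.
print(roc_auc_score(y_true, np.full(4, 0.9)))
```

The AUC can also be read as the probability that a randomly chosen infected leaf gets a higher score than a randomly chosen healthy leaf, which is why it does not depend on the chosen threshold.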

### ❓Questions

- Explain what both visualisations show exactly.
- Based on these evaluations, does the model provide useful predictions?

## Deep learning
Now let's train a deep learning model to see whether it can outperform the model you just used. Here we use a convolutional neural network (CNN), a special type of the neural networks discussed in the lecture that is designed for image data.

In [None]:
model = train_dl_model(train_dataset, validation_dataset, epochs=3)
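The architecture inside `train_dl_model` is not shown here, so the following is only a hypothetical sketch of what a small binary-classification CNN could look like in Keras. The input size (128×128 RGB), pixel range, layer sizes and optimiser are all assumptions, not the settings used by `train_dl_model`. The model is stored as `cnn_sketch` so it does not overwrite `model`.

```python
import numpy as np
from tensorflow import keras

def build_small_cnn(image_size=(128, 128)):
    """Hypothetical small CNN for healthy (0) vs infected (1) classification."""
    model = keras.Sequential([
        keras.Input(shape=(*image_size, 3)),
        keras.layers.Rescaling(1.0 / 255),               # assumes pixel values in [0, 255]
        keras.layers.Conv2D(16, 3, activation="relu"),   # learn small local filters
        keras.layers.MaxPooling2D(),                      # downsample feature maps
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.GlobalAveragePooling2D(),            # one value per filter
        keras.layers.Dense(1, activation="sigmoid"),      # probability of class 1
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

cnn_sketch = build_small_cnn()
cnn_sketch.summary()
```

The convolution layers share the same small filters across the whole image, which is what makes CNNs far more parameter-efficient on images than fully connected networks.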

Let's look at the raw outputs of the neural network. Here are the outputs for 10 images.

In [None]:
model.predict(train_dataset.batch(10).take(1))

### ❓Questions

- What does this predicted number represent?
- What would have to be modified in the neural network model to change it into a multi-class classifier?
- How would that impact performance?
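Assuming the network ends in a single sigmoid unit (a typical binary setup), each output is a value between 0 and 1 that can be read as the predicted probability of class 1 and turned into a label with a threshold. For a multi-class classifier, one common choice is one output unit per class followed by a softmax. A numpy sketch with made-up numbers:

```python
import numpy as np

# Hypothetical sigmoid outputs for 4 images (probability of label 1, infected).
binary_probs = np.array([[0.02], [0.97], [0.51], [0.40]])
binary_labels = (binary_probs[:, 0] >= 0.5).astype(int)
print(binary_labels)

def softmax(logits):
    """Row-wise softmax: turns raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical raw scores (logits) for 2 images and 3 classes,
# e.g. healthy / early blight / late blight.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
probs = softmax(logits)
print(probs.round(3), probs.argmax(axis=1))
```

In Keras this would correspond to an output layer such as `Dense(n_classes, activation="softmax")` together with a loss such as `sparse_categorical_crossentropy` for integer labels.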

Again let's evaluate the performance of the model we just trained.

In [None]:
show_confusion_matrix(model, train_dataset)

In [None]:
show_confusion_matrix(model, validation_dataset)

In [None]:
show_auroc(model, validation_dataset)

### ❓Questions

- Which confusion matrix is more informative: the one on the training dataset or the one on the validation (test) dataset? Why?
- Would you prefer this deep learning model over the model you tested earlier?
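A large gap between performance on the training data and on the validation data is a typical sign of overfitting. One common countermeasure is early stopping: stop training once the validation loss stops improving. Whether `train_dl_model` does this internally is not shown; the sketch below demonstrates Keras's `EarlyStopping` callback on a tiny synthetic dataset. The data, model and patience value are hypothetical and chosen only so the example runs quickly.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
# Hypothetical synthetic data: 200 samples, 8 features, label depends on the first feature.
x = rng.normal(size=(200, 8)).astype("float32")
y = (x[:, 0] > 0).astype("float32")

model_es = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model_es.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop when val_loss has not improved for 3 epochs and roll back to the best weights.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
history = model_es.fit(
    x, y, validation_split=0.2, epochs=100, callbacks=[early_stop], verbose=0
)
print(f"Trained for {len(history.history['loss'])} of at most 100 epochs")
```

Other common countermeasures include more (or augmented) training data, dropout and weight regularisation.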

Let's look at some example misclassifications:

In [None]:
show_classification_examples(model, validation_dataset)
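Finding misclassifications comes down to comparing thresholded predictions with the true labels. A hypothetical numpy sketch (not necessarily how `show_classification_examples` works) that also ranks the errors by how confident the model was, assuming a 0.5 threshold on the predicted probability of label 1:

```python
import numpy as np

def most_confident_errors(y_true, y_prob, threshold=0.5):
    """Return indices of misclassified samples, most confident mistakes first.

    y_true: array of 0/1 labels; y_prob: predicted probability of label 1.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)
    wrong = np.flatnonzero(y_pred != y_true)
    # Confidence of a (wrong) prediction: distance of its probability from the threshold.
    confidence = np.abs(y_prob[wrong] - threshold)
    return wrong[np.argsort(-confidence, kind="stable")]

# Hypothetical labels and probabilities for 6 images.
example_true = [0, 1, 1, 0, 1, 0]
example_prob = [0.10, 0.45, 0.90, 0.95, 0.20, 0.60]
errors = most_confident_errors(example_true, example_prob)
print(errors)
```

Inspecting the most confident mistakes first often reveals systematic problems, such as mislabelled images or a shortcut the model has learned.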

Let's simulate a case in which, in the training dataset, all healthy leaves were photographed slightly darker than the infected leaves, perhaps because the farmer photographed the healthy field early in the morning and the infected field later in the day. In the validation dataset (i.e. the kind of data other farmers might apply this model to), the opposite is the case: the infected leaves are the darker ones. How do you think this will impact model performance?

In [None]:
dataset = get_image_biotic_stress_dataset(healthy_dir, infected_dirs)

train_dataset, validation_dataset = keras.utils.split_dataset(dataset, left_size=0.8, shuffle=True)

# Dim healthy (label 0)
train_ds = train_dataset.map(decrease_brightness_on_label(0))
# Dim infected (label 1)
val_ds = validation_dataset.map(decrease_brightness_on_label(1))
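`decrease_brightness_on_label` comes from the course package and its implementation is not shown here. A hypothetical sketch of how such a label-dependent dimming function could be written for `Dataset.map`, assuming each element is an unbatched `(image, label)` pair with an integer label and that dimming means multiplying pixel values by a constant factor:

```python
import tensorflow as tf

def make_dim_on_label(target_label, factor=0.6):
    """Hypothetical stand-in for decrease_brightness_on_label.

    Returns a function for Dataset.map that scales pixel values by `factor`
    only for images whose label equals `target_label`.
    """
    def dim(image, label):
        image = tf.cast(image, tf.float32)
        is_target = tf.equal(tf.cast(label, tf.int64), target_label)
        image = tf.where(is_target, image * factor, image)  # scalar condition broadcasts
        return image, label
    return dim

# Tiny synthetic dataset: two all-ones "images" with labels 0 and 1.
toy = tf.data.Dataset.from_tensor_slices((tf.ones((2, 4, 4, 3)), tf.constant([0, 1])))
dimmed = list(toy.map(make_dim_on_label(0)).as_numpy_iterator())
print([(float(img.mean()), int(lbl)) for img, lbl in dimmed])
```

Because only one class is changed, brightness becomes a shortcut feature: it correlates perfectly with the label in the training data but says nothing about disease.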

In [None]:
preview_images(train_ds)

In [None]:
preview_images(val_ds)

In [None]:
model = train_dl_model(train_ds, val_ds, epochs=2)
show_confusion_matrix(model, val_ds)

In [None]:
show_classification_examples(model, val_ds)

### ❓Questions

- How would you explain these results?
- Can you think of one or more methods to fix this?

## How to fix it?

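One way to mitigate the problem we just encountered is image augmentation: randomly transforming the training images so that the model is less able to rely on incidental properties such as brightness.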

In [None]:
augmented_train_ds = train_ds.map(augment_image)
preview_images(augmented_train_ds)
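`augment_image` is likewise provided by the course package. As a hypothetical sketch, an augmentation function for `Dataset.map` could randomly flip images and randomly shift their brightness and contrast. The chosen transformations and ranges are assumptions, and pixel values are assumed to lie in [0, 1].

```python
import tensorflow as tf

def augment_image_sketch(image, label):
    """Hypothetical augmentation: random flips, brightness and contrast changes.

    Assumes float images with pixel values in [0, 1].
    """
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_brightness(image, max_delta=0.3)       # add a random offset in [-0.3, 0.3]
    image = tf.image.random_contrast(image, lower=0.7, upper=1.3)  # rescale contrast by a random factor
    image = tf.clip_by_value(image, 0.0, 1.0)                      # keep pixels in the valid range
    return image, label

# Tiny synthetic dataset of three random 8x8 RGB "images".
toy = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((3, 8, 8, 3), seed=0), tf.constant([0, 1, 1]))
)
augmented = list(toy.map(augment_image_sketch).as_numpy_iterator())
print([img.shape for img, _ in augmented])
```

Random brightness changes are applied independently of the label, which weakens (though it may not completely remove) the brightness cue the model learned from the dimmed training images.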

Assignment: fill in the `FILL_IN` placeholders below to train a model on this augmented data and evaluate its performance.

In [None]:
model = train_dl_model(train_dataset=FILL_IN, validation_dataset=FILL_IN, epochs=2)
show_confusion_matrix(model, FILL_IN)
show_auroc(model, FILL_IN)
show_classification_examples(model, FILL_IN)

### ❓Questions

- Explain how image augmentation aids generalisability.
- If you have extra time, pick one or more of these questions to investigate:
  - How does model performance change if you shrink the dataset or change the class distribution?
  - Can the classifier be applied to a species other than tomato?
  - Can you think of or find other evaluation metrics that would be useful?
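As a starting point for other metrics, scikit-learn provides precision, recall, specificity (recall of the negative class), F1 score and balanced accuracy. The sketch below applies them to a hypothetical classifier that always predicts "infected" on a made-up, imbalanced set of labels, treating label 1 (infected) as the positive class:

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, precision_score, recall_score)

# Hypothetical labels (9 infected, 1 healthy) and a classifier that always predicts 1.
y_true = [1] * 9 + [0]
y_pred = [1] * 10

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),            # of predicted infected, fraction truly infected
    "recall": recall_score(y_true, y_pred),                  # of infected leaves, fraction detected
    "specificity": recall_score(y_true, y_pred, pos_label=0),  # of healthy leaves, fraction recognised
    "f1": f1_score(y_true, y_pred),                          # harmonic mean of precision and recall
    "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),  # mean recall over classes
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Accuracy looks high here, while specificity and balanced accuracy expose that the classifier never recognises a healthy leaf.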