This demo notebook showcases some of the steps required to train a convolutional neural network (CNN) for image classification. This particular model is designed to discriminate between parasitized and uninfected cells, using images of segmented cells from thin blood smear slides in a dataset for [malaria detection](https://www.tensorflow.org/datasets/catalog/malaria).

Note that this notebook is best run in a programming environment that is connected to a GPU. In Google Colab, you can set this up by going to "Runtime" > "Change runtime type" and selecting "T4 GPU". This notebook can work in other environments, but they also need to support the other software packages it uses, like `opencv`.

# Important: Run this code cell each time you start a new session!

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import cv2
import os
import tensorflow_datasets as tfds
import tensorflow as tf
import tensorflow.keras.backend as K
import copy
from google.colab.patches import cv2_imshow
from tqdm.notebook import tqdm
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers, regularizers
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import Conv2D, MaxPool2D, UpSampling2D, Dropout, Concatenate, Input, Reshape, MaxPooling2D, Dense, Conv2DTranspose, Softmax, Flatten, BatchNormalization, Activation
from tensorflow.keras import Model

np.random.seed(42)
tf.random.set_seed(42)

# Loading the Dataset

Deep learning requires lots of data for model training. Loading all of the training data at once can be extremely resource intensive for computers, making it nearly impossible to train on all of the data at the same time. Instead, we must feed the data into the model in small ***batches***.

Larger batch sizes can lead to more stable updates to the model's parameters but require more memory, while smaller batch sizes can provide noisier updates but consume less memory.
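
As a rough illustration of the memory side of this trade-off, the sketch below estimates how many bytes one batch of input images occupies. It assumes 100×100 RGB images stored as 32-bit floats, and the helper name `batch_bytes` is made up for this example. It counts only the input pixels; the intermediate values a network computes during training typically need far more memory.

```python
def batch_bytes(batch_size, height=100, width=100, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of float32 images (inputs only)."""
    return batch_size * height * width * channels * bytes_per_value

for batch_size in (32, 64, 128):
    megabytes = batch_bytes(batch_size) / 1e6
    print(f"batch_size={batch_size:>3}: {megabytes:.2f} MB of input pixels")
```

Under these assumptions, the input memory grows in direct proportion to the batch size.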

The code below prepares a data structure that will allow us to load the dataset. It then prepares a series of operations that will resize the images, apply one-hot encoding to the labels, and then load up the data in batches.

Note that this function is not actually loading the data right away; it is simply creating the objects that will load the data on demand once we are ready to train.
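
The idea of preparing operations now and running them later can be sketched with a plain Python generator. This is only an analogy, not how `tensorflow_datasets` works internally; the helper names `one_hot` and `lazy_batches` and the labels are made up for illustration.

```python
def one_hot(label, num_classes=2):
    """Turn an integer label into a one-hot list, e.g. 1 -> [0.0, 1.0]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def lazy_batches(labels, batch_size):
    """Yield one-hot encoded batches on demand, dropping an incomplete final batch."""
    for start in range(0, len(labels) - batch_size + 1, batch_size):
        yield [one_hot(label) for label in labels[start:start + batch_size]]

labels = [0, 1, 1, 0, 1]                        # made-up labels standing in for a dataset
pipeline = lazy_batches(labels, batch_size=2)   # nothing has been processed yet
first_batch = next(pipeline)                    # work happens only when a batch is requested
print(first_batch)
```

Creating `pipeline` does no encoding at all; each batch is built only when it is requested, which is the behavior the loading function relies on.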

In [None]:
image_size = (100, 100)

In [None]:

def load_dataset(num_train_samples, batch_size):
    # Create the objects that will load data for us on the fly
    # Always use the last 20% of the data for testing
    # Use num_train_samples for training
    dataset_name = "malaria"
    split_info = [f'train[:{num_train_samples}]', 'train[80%:]']
    (ds_train, ds_test), ds_info = tfds.load(dataset_name, split=split_info,
                                            with_info=True, as_supervised=True,
                                            download=True, shuffle_files=True)

    # Resize the images
    ds_train = ds_train.map(lambda image, label: (tf.image.resize(image, image_size), label))
    ds_test = ds_test.map(lambda image, label: (tf.image.resize(image, image_size), label))

    # Convert the labels to one-hot encoding (e.g., 0 = [1, 0] and 1 = [0, 1])
    def input_preprocess(image, label):
        label = tf.one_hot(label, 2)
        return image, label
    ds_train = ds_train.map(input_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    ds_test = ds_test.map(input_preprocess)

    # Prepare to load data in batches
    ds_train = ds_train.batch(batch_size=batch_size, drop_remainder=True)
    ds_test = ds_test.batch(batch_size=batch_size, drop_remainder=True)
    return ds_train, ds_test

Let's look at the data to see how it is formatted:

In [None]:
ds_train, ds_test = load_dataset(100, 5)
for img_batch, label_batch in ds_train.take(1):
    for img, label in zip(img_batch, label_batch):
        img = img.numpy().astype(int)
        plt.figure(figsize=(1, 1))
        plt.imshow(img)
        plt.title(f'Label: {label}')
        plt.show()

# Prepare the Model

In order to train a model, we first need to define its structure. Covering the ins and outs of this process would require nearly a semester's worth of lectures, so we are only going to discuss the process on a surface level.

## Define the Model Architecture

We are going to use a ***convolutional neural network (CNN)***. This architecture is specifically designed for working with images. You might recall that we talked about this concept briefly when we discussed image kernels in the context of image processing.

<img src="https://drive.google.com/uc?id=1aYq_6S6Plf2NFlFipAYdZbcuv00Z9jUe" width=500px/>

The first stage of the model (on the left) passes a series of kernels along the input image and produces a series of new images that encode different characteristics of the input. The second stage repeats the same process, but the input to this process is now the output from the previous stage. As the model goes through multiple stages, it is gradually able to combine features across the entire image, similar to how we need to see eyes and mouths before we can identify faces.
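
To make "passing a kernel along an image" concrete, here is a minimal sketch in plain Python. The image and kernel values are made up for illustration, and the helper name `apply_kernel` is hypothetical. In a CNN the kernel values are learned during training rather than written by hand.

```python
def apply_kernel(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the resulting feature map."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# A tiny made-up image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# A hand-written kernel that responds to left-to-right increases in brightness
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = apply_kernel(image, vertical_edge)
print(feature_map)
```

The resulting feature map is nonzero only where the window straddles the dark-to-bright boundary, so this kernel encodes "there is a vertical edge here".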

## Model Training

Training a deep learning model basically entails the following steps:
1. Initialize the model's parameters (e.g., kernel weights) to some random values
2. Use the model to generate predictions for a single batch of training data
3. Measure the discrepancy between the predictions and the known labels for the batch
4. Update the model's parameters depending on the discrepancy from step 3
5. Repeat steps 2–4 until the model gets satisfactory performance
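
The five steps above can be sketched with a toy model that has a single parameter `w`. This is a minimal illustration under made-up assumptions: data that follows y = 3x, a mean-squared-error discrepancy, a fixed learning rate, and a fixed number of passes rather than a stopping criterion. It is not how Keras implements training.

```python
import random

# Made-up data that follows y = 3x, so the ideal parameter is w = 3
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x for x in xs]
batches = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

rng = random.Random(0)
w = rng.uniform(-1.0, 1.0)            # step 1: random starting value
learning_rate = 0.01

for epoch in range(200):              # step 5: repeat (fixed count for simplicity)
    for batch_x, batch_y in batches:
        preds = [w * x for x in batch_x]                     # step 2: predict
        errors = [p - y for p, y in zip(preds, batch_y)]     # step 3: discrepancy
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * e * x for e, x in zip(errors, batch_x)) / len(batch_x)
        w -= learning_rate * grad                            # step 4: update

print(round(w, 3))
```

Larger errors produce a larger gradient and therefore a larger update, which is why `w` moves quickly at first and then settles near 3.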

An apt analogy for this process might be learning multiplication tables from flash cards. When someone goes through the flash cards for the first time, they may get most of the answers wrong. As they go through the same cards over and over, they hopefully adjust their understanding of multiplication to the point where they get most of the answers right.

All of these steps have significant nuance that we could discuss, but let's focus on steps 3 and 4. At any given moment during model training, bigger discrepancies between the model's predictions and the known labels should result in larger updates to the model; otherwise, the model isn't learning to improve itself. What this means is that the way we measure the discrepancy is very important.

The most intuitive thing to do is to measure model performance using the metrics we've discussed already (e.g., mean-absolute error for regression, and accuracy for classification). However, some of these metrics have flaws when they are used to guide model training. Take classification accuracy for example. With accuracy, a prediction for a single data point can either be right or wrong, nothing in between; there is no notion of being "closer to or further from the right answer". Because this metric is so rigid, it's difficult to reflect when the model is making marginal improvements that will eventually get it to make the right predictions.

This leads to the idea of a ***loss function***. Like the performance metrics we've discussed already, a loss function measures the inconsistency between model predictions and known labels for a given batch, but it does so in a way that helps guide model training. The ultimate goal of model training is to minimize the loss, which should eventually result in improved performance.

There are different types of loss functions, each designed for specific types of machine learning tasks. For example, mean squared error (MSE) is often used for regression tasks, while something called ***categorical cross-entropy*** is used for multi-class classification tasks. We aren't going to go into this function in detail, but just know that it addresses the limitations of classification accuracy mentioned earlier.
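
As a small illustration of that last point, the sketch below compares accuracy and categorical cross-entropy on three made-up predictions for a single example whose true class is index 1. The helper names and the probabilities are hypothetical; the cross-entropy here is the standard formula for one one-hot labeled example (the negative log of the probability assigned to the true class).

```python
import math

def accuracy(pred, label):
    """1 if the most probable class matches the true class, else 0."""
    return int(pred.index(max(pred)) == label.index(max(label)))

def categorical_cross_entropy(pred, label):
    """-sum(label_i * log(pred_i)) for a single one-hot labeled example."""
    return -sum(t * math.log(p) for t, p in zip(label, pred) if t > 0)

label = [0.0, 1.0]                                   # true class is index 1
snapshots = [[0.9, 0.1], [0.7, 0.3], [0.55, 0.45]]   # made-up predictions over time

for pred in snapshots:
    acc = accuracy(pred, label)
    loss = categorical_cross_entropy(pred, label)
    print(f"prediction={pred}  accuracy={acc}  loss={loss:.3f}")
```

All three predictions are still wrong, so accuracy stays at 0, while the loss shrinks as the probability assigned to the correct class rises. That shrinking loss is the signal that training can follow.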

## Transfer Learning

Learning model weights from scratch can take a significant amount of time and training data. However, we don't need to start from scratch. We can take advantage of a concept called ***transfer learning*** (or pre-training), where knowledge gained from solving one problem is applied to a different but related problem. In the case of images, we can rely on the fact that learning about edges, corners, and other shapes to classify images in one dataset may be useful for performing another image classification task, even if the datasets are distinct.

For our specific problem, we are going to start with a CNN model called [EfficientNet](https://arxiv.org/abs/1905.11946) that was trained on the [ImageNet](https://image-net.org/) dataset. ImageNet is one of the foundational datasets that accelerated the evolution of deep learning models for images. If you look at the dataset, you'll notice that it contains images of generic objects like cats, dogs, and planes. Even though our target problem is very different, learning basic visual features from this dataset is better than starting completely from scratch.

As we will see shortly, deep learning models can have hundreds of thousands or even millions of parameters that get learned during model training. We could use EfficientNet as a starting point and update all of the parameters when we introduce our data, but this could undo some of the benefits of transfer learning. Instead, we are going to force part of the model to stay "frozen" and create some extra layers at the end so that it can be customized to our problem.
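
Conceptually, "freezing" means that the update step skips certain parameters. The sketch below shows this with a plain dictionary of parameters; the names, values, and the helper `apply_update` are hypothetical and only illustrate the idea. In Keras the equivalent switch is setting `layer.trainable = False`.

```python
def apply_update(params, grads, trainable, learning_rate=0.1):
    """Gradient-descent step that leaves frozen parameters untouched."""
    return {
        name: (value - learning_rate * grads[name]) if trainable[name] else value
        for name, value in params.items()
    }

# Hypothetical parameter names and values, for illustration only
params = {"backbone_kernel": 0.8, "new_dense_weight": 0.2}
grads = {"backbone_kernel": 0.5, "new_dense_weight": 0.5}
trainable = {"backbone_kernel": False, "new_dense_weight": True}   # freeze the backbone

updated = apply_update(params, grads, trainable)
print(updated)
```

The frozen parameter keeps its pre-trained value even though a gradient was computed for it, while the newly added parameter moves.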

## Defining the Entire Model

The model is defined using the code below. We don't have enough time to talk about each and every line, but the key steps are explained in the comments:

In [None]:
def create_model(pre_train, size):
    # Create the base model structure,
    # and load parameters from ImageNet if pretraining
    weight_source = 'imagenet' if pre_train else None
    base_model = EfficientNetB0(weights=weight_source, include_top=False)

    # Freeze all but the last two layers of the base model
    for layer in base_model.layers[:-2]:
        layer.trainable = False

    # Add extra layers so that the model can be customized for our problem
    channels = 3  # RGB images
    x_in = Input(shape=(size[0], size[1], channels))
    x = base_model(x_in)
    x = Conv2D(64, 3, padding='same', activation='relu')(x)
    x = Flatten()(x)
    x = Dense(100, activation='relu',
              kernel_regularizer=regularizers.l2(1e-2),
              bias_regularizer=regularizers.l2(1e-2))(x)
    x_out = Dense(2, activation='softmax', # assuming 2 possible classes
                  kernel_regularizer=regularizers.l2(1e-2),
                  bias_regularizer=regularizers.l2(1e-2))(x)
    model = Model(inputs=x_in, outputs=x_out)

    # Attach an optimizer, a loss function, and a performance metric
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return base_model, model

In [None]:
base_model, model = create_model(True, image_size)

Let's look at the structure of this model:

In [None]:
base_model.summary()  # the EfficientNetB0 backbone
model.summary()       # the full model, including the layers we added

Observe that there are hundreds of thousands of trainable parameters in this model. These parameters are the values that go inside our image kernels, along with some other numbers that are learned towards the end of the model architecture. With any machine learning model, more trainable parameters require more training data to avoid overfitting. This is one of the many reasons why deep learning requires tons of training data.
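
To see where a count of that size can come from, the sketch below tallies the parameters in the layers added on top of the base model, using the standard formulas for convolutional and dense layers. It assumes the base model produces a 4×4 feature map with 1280 channels for 100×100 inputs; that shape is an assumption here and can be checked against the summary output. The helper names are made up.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights in a square-kernel Conv2D layer, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights in a Dense layer, plus one bias per unit."""
    return inputs * units + units

# Assumed backbone output for 100x100 inputs: a 4 x 4 feature map with 1280 channels
feat_h, feat_w, feat_c = 4, 4, 1280

conv = conv2d_params(3, feat_c, 64)                 # Conv2D(64, 3, padding='same')
dense1 = dense_params(feat_h * feat_w * 64, 100)    # Flatten -> Dense(100)
dense2 = dense_params(100, 2)                       # Dense(2)
print(conv, dense1, dense2, conv + dense1 + dense2)
```

Under these assumptions, most of the added parameters sit in the convolutional layer, because every one of its 64 kernels spans all 1280 input channels.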

# Putting It All Together

The code block below will allow you to train a model while tweaking four different settings:
1. `train_samples`: The amount of data used for model training
2. `batch_size`: The size of the batches that are used for model training
3. `epochs`: The number of times that the entire training dataset is fed into the model for model training
4. `pre_train`: Whether or not the model is pre-trained

Play around with them to see how they affect model performance.
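
As a rough guide while experimenting, the first three settings together determine how many times the model's parameters get updated. The sketch below assumes one update per batch and that incomplete final batches are dropped, as in the `drop_remainder=True` setting used when the data is batched; the helper name `num_updates` is made up.

```python
def num_updates(train_samples, batch_size, epochs):
    """Parameter updates performed when incomplete final batches are dropped."""
    batches_per_epoch = train_samples // batch_size
    return batches_per_epoch * epochs

print(num_updates(train_samples=250, batch_size=32, epochs=3))    # 7 batches x 3 epochs
print(num_updates(train_samples=2000, batch_size=32, epochs=3))   # 62 batches x 3 epochs
```

With 250 samples and a batch size of 32, only 224 samples are used per epoch, so small training sets combined with large batches leave the model very few chances to learn.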

In [None]:
train_samples = 250 # @param {type: "slider", min:250, max:2001, step:250}
batch_size = 32 # @param {type: "slider", min:32, max:128, step:32}
epochs = 3  # @param {type: "slider", min:1, max:25}
pre_train = True  # @param {type: "boolean"}

ds_train, ds_test = load_dataset(train_samples, batch_size)
base_model, model = create_model(pre_train, image_size)
hist = model.fit(ds_train, epochs=epochs,
                 validation_data=ds_test, verbose=1)

The `.fit()` method already prints out a lot of useful information for us when we make it verbose (`verbose=1`). However, we can also plot how the loss and the model's accuracy change after each epoch, i.e., each time the model has looked at the entire training dataset.

In [None]:
plt.figure(figsize=(6, 3))
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

In [None]:
plt.figure(figsize=(6, 3))
plt.plot(hist.history['accuracy'])
plt.plot(hist.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()