# Image classification with a Convolutional Neural Network

## Import dependencies

At the beginning of each project, we import the libraries and required dependencies, as well as configure preferred settings in the notebook.

In [None]:
import keras
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_score, recall_score, confusion_matrix
import os
import seaborn as sns
import sys
from datetime import datetime
import glob

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from model.trainer import Cifar10Trainer
from data.processing import standardize_data
from model.evaluate import evaluate_model
from utils.plotting import plot_confusion_matrix, plot_multi_auc
np.set_printoptions(threshold=sys.maxsize)
%matplotlib inline

## Load Dataset

The CIFAR-10 dataset contains 50,000 images for training and 10,000 images for testing.

Let's load the dataset and take a look at the contents to understand how it is structured.

The dataset is already available as part of the Keras dataset library. We can use the function `cifar10.load_data` to load the data into arrays in memory.

We will then inspect the array objects to see the contents.

In [None]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
label_list = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
label_decoder = {i: label for i, label in enumerate(label_list)}
print(f"the training dataset image dimensions are: {x_train.shape}")
print(f"the training dataset label dimensions are: {y_train.shape}")
print(f"the test dataset image dimensions are: {x_test.shape}")
print(f"the test dataset label dimensions are: {y_test.shape}")

### Question

What do the 4 numbers in the training dataset image dimensions mean?

What do the 2 numbers in the training data label dimensions mean?

### Answer

<br> <br>

## Explore the image dataset
`x_train` and `x_test` hold the training and test images as raw numerical values.

We see that the `x_train` object has a dimension of (50000, 32, 32, 3)

<br>
Here's how to interpret these dimensional numbers:

The first dimension is 50,000 units in length. This dimension indexes the individual image examples.

The second dimension is 32 units in length. This dimension is the vertical axis of the image.

The third dimension is 32 units in length. This dimension is the horizontal axis of the image.

The fourth dimension is 3 units in length. This dimension is the color channel axis of the image.
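
As a quick check of this axis order, the sketch below builds a small stand-in array with the same layout and indexes it along each axis. The name `batch` and its size of 5 images are made up for illustration; they are not part of the notebook.

```python
import numpy as np

# Hypothetical stand-in for x_train: 5 images, 32 rows, 32 columns, 3 channels.
batch = np.zeros((5, 32, 32, 3), dtype=np.uint8)

first_image = batch[0]             # one full image
first_channel = batch[0, :, :, 0]  # first color channel of the first image
top_row = batch[0, 0]              # top row of pixels, all channels
one_pixel = batch[0, 0, 0]         # the channel values of a single pixel

print(first_image.shape)    # (32, 32, 3)
print(first_channel.shape)  # (32, 32)
print(top_row.shape)        # (32, 3)
print(one_pixel.shape)      # (3,)
```

Each index removes one axis from the left, which is why selecting a single example leaves a 3-dimensional array.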

<br>
`y_train` and `y_test` are the training and test ground truth labels for each of the image examples.

Let's take a look at the training image data.
We can select the first channel (red) of the first example.



In [None]:
print(x_train[0, ::2, ::2, 0])

### Question
The notation of ::2 means selecting elements in steps of 2 (every other element). If we take every other element in the horizontal and vertical axis, how many numbers in the rows and columns do you expect?

### Answer

<br> <br>

There are a total of 10 object classes, they are each represented by a digit from 0 to 9.

Respectively, they are:

In [None]:
_ = [print(f"{label}, {value}") for label, value in label_decoder.items()]

Let's take a look at how many image examples there are per class.

In [None]:
label_collections = {}
for i in range(len(label_decoder)):
    indices, _ = np.where(y_train == i)
    label_collections[i] = indices
    print(f"number of {label_decoder[i]} examples: {indices.size}")

## Precision / Recall

Two common metrics to measure the performance of classification models are 
1. Precision
2. Recall

Precision is defined as:

`number of correct positive predictions / number of predicted positive examples`

Recall is defined as:

`number of correct positive predictions / number of actual positive examples`
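
To make the two definitions concrete, here is a minimal sketch that computes both metrics for a single positive class. The labels are made up for illustration and are not taken from CIFAR-10.

```python
# Toy labels, made up for illustration.
actual    = ["cat", "cat", "dog", "cat", "dog", "dog", "cat", "cat"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog", "cat", "dog"]

positive = "cat"  # treat "cat" as the positive class

# Correct positive predictions: predicted "cat" and actually "cat".
true_positives = sum(a == positive and p == positive
                     for a, p in zip(actual, predicted))
predicted_positives = sum(p == positive for p in predicted)
actual_positives = sum(a == positive for a in actual)

precision = true_positives / predicted_positives
recall = true_positives / actual_positives
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here 3 of the 4 predicted cats are correct (precision 0.75), while only 3 of the 5 actual cats are found (recall 0.60), so the two metrics can differ on the same predictions.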


Below, we create a mock model that predicts the class of 20 different images. Each image is shown with its actual label as well as its predicted label.


In [None]:
np.random.seed(423)
fig = plt.figure(figsize=(10, 10))
columns = 4
rows = 5
proba = 0.8
true_labels = []
preds = []
for i in range(1, columns*rows +1):
    img_id = np.random.choice(x_train.shape[0])
    img = x_train[img_id]
    fig.add_subplot(rows, columns, i)
    plt.imshow(img)
    plt.xticks([], [])
    plt.yticks([], [])
    true_labels.append(y_train[img_id].item())
    if np.random.rand(1) > proba:
        preds.append(np.random.choice([x for x in range(10) if x != true_labels[i-1]]))
    else:
        preds.append(true_labels[i-1])
    plt.title(f"actual: {label_decoder[true_labels[i-1]]} \n predicted: {label_decoder[preds[i-1]]}")
fig.tight_layout()
plt.show()

### Question

A model produced the predictions above (given the actual label as shown)

What is the precision of the model?

What is the recall of the model?

### Answer

<br> <br>

## Confusion Matrix

A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. Each row of the matrix represents the instances in an actual class, while each column represents the instances in a predicted class.

Based on the same mock model predictions for the 20 images, we can visualize the performance of the model using a confusion matrix. Each row of the confusion matrix represents the actual class of the image examples, while each column represents the predicted class for the same images.

For instance, the top row is labeled `airplane`. Looking across this row, we see that the number 1 appears in the first column (which is the predicted label `airplane`). This means that the one image that is actually an `airplane` was also predicted to be an `airplane`.

Let's take a look at the row labeled `bird`. Looking across this row, the number 2 appears in the column also labeled `bird`, meaning that 2 images are predicted correctly as `bird`. We also see that the number 1 appears in the predicted column labeled `dog`. This means that one image which is actually a `bird` is predicted incorrectly as a `dog`.

In [None]:
precision = precision_score(true_labels, preds, average="micro")
print(f"precision score of the model is {precision}")
plot_confusion_matrix(true_labels, preds, classes=label_list,
                      title='Confusion matrix, without normalization')
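
The row and column convention can also be reproduced by hand. The following is a small sketch on made-up data with three classes; it is not the implementation behind `plot_confusion_matrix`, only an illustration of how the counts are tallied.

```python
# Toy data, made up for illustration: three classes, six examples.
classes = ["bird", "cat", "dog"]
actual    = [0, 0, 0, 1, 2, 2]
predicted = [0, 0, 2, 1, 2, 1]

# matrix[row][column]: row = actual class, column = predicted class.
matrix = [[0] * len(classes) for _ in classes]
for a, p in zip(actual, predicted):
    matrix[a][p] += 1

for name, row in zip(classes, matrix):
    print(f"{name:>5}: {row}")
```

In this toy matrix the `bird` row reads `[2, 0, 1]`: two birds were predicted correctly and one was predicted as `dog`. Correct predictions always land on the diagonal.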

### Question

What is the model prediction accuracy for the class `horse`? Accuracy is defined as 

`the number of correct predictions / total number of images in that class`

Which classes of images did the model perform the worst on? i.e., lowest percentage of correct predictions.

Which classes of images did the model predict perfectly? i.e., all image labels correctly predicted

### Answer

<br> <br>

## RGB Image

Each image in the CIFAR10 dataset is composed of 3 color channels, Red, Green and Blue.

Below, we plot the colored image next to each of the channels in grayscale.


In [None]:
image_id_to_view = 24
image = x_train[image_id_to_view]
fig = plt.figure(figsize=(10, 4))
plt.grid(False)  # turn off grid lines
fig.add_subplot(1, 4, 1)
plt.imshow(image)
plt.grid(False)
plt.title("Colored")
titles = ["Channel 1", "Channel 2", "Channel 3"]
channel_mapping = {0:2, 1:1, 2:0}
for i in range(3):
    fig.add_subplot(1, 4, i + 2)
    plt.imshow(image[:,:,channel_mapping[i]], cmap='gray')
    plt.title(titles[i])
    plt.grid(False)

### Questions

Based on the color image and the grayscale maps for each channel, can you figure out which Channel is Red, Green and Blue?

### Answers

Channel 1 is 

Channel 2 is 

Channel 3 is 

You can modify the `image_id_to_view` above and rerun the cell to see different images.

## Label one-hot-encoding

The variable `y_train` contains the training dataset ground truth labels. As we have seen previously, it has the following dimensions

In [None]:
print(f"the training dataset label dimensions are: {y_train.shape}")

There are 50,000 example images, each with a single label, represented by a number from 0 to 9 (for a total of 10 image classes)

Let's see the first 10 examples of the labels in the `y_train` variable.

In [None]:
print(y_train[:10, 0])
print([label_decoder[x] for x in y_train[:10, 0]])

Each numerical value can be mapped to a text description of the class

In [None]:
_ = [print(f"{label}, {value}") for label, value in label_decoder.items()]

These labels constitute the targets which the model will attempt to learn during the training process. Given each input image, the model optimizer will slowly adjust its internal parameters to improve its accuracy in predicting the correct class labels.

Given the numerical representation of the class labels, it would seem that this is similar to a regression model, i.e., given an input image, predict a numerical value. However, if we solve this as a regression problem, the model performance will likely be poor.

### Question

What are some reasons that this model will not work well if it is constructed as a regression problem?

### Answer

<br> <br>

We cannot simply ask the model to learn to predict a value between 0 and 9. Instead, since there are 10 classes of images, we can set up the model to predict 10 separate numbers, each representing the likelihood that the image belongs to the corresponding class.

To this end, we use a technique called one-hot encoding, where each original value between 0 and 9 is represented (or encoded) by an array of 10 binary values, each of which is either 0 or 1. The position of the element that takes on a value of 1 corresponds to the value of the original class label. Note that in each array only one element can have a value of 1; the rest must all be zeros.

For example, a label value of 2 can be represented by the following array:

`[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]`

The value 1 appears in the 3rd element in the array, therefore, this array represents the value of 2 (counting from 0)
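
The encoding can be sketched in a few lines of NumPy. The labels below are made up for illustration, and the identity-matrix trick is just one convenient way to build the arrays.

```python
import numpy as np

num_classes = 10
labels = np.array([2, 0, 9])  # made-up class labels

# Row i of the identity matrix is the one-hot array for class i.
one_hot = np.eye(num_classes, dtype=int)[labels]
print(one_hot)

# The position of the 1 in each row recovers the original label.
decoded = one_hot.argmax(axis=1)
print(decoded)
```

The first row printed is the array for the label 2 shown above, and `argmax` reverses the encoding.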

### Question

How do you represent the number 5 (out of 10 classes) using one-hot encoding?

### Answer

<br> <br>

We can use a convenient function `to_categorical` which converts numerical representations to one-hot encoded representations.

Let's take a look at the 11th to 20th examples in the training dataset in this one-hot encoded form, displayed as a table of values. Each row in this table is one example. The column position of the "1" element represents the original numerical value.

In [None]:
y_train_one_hot = keras.utils.to_categorical(y_train, num_classes=np.unique(y_train).size)
print(y_train_one_hot[10:20,:])

We can also plot this in a color coded chart (called a heatmap).

In [None]:
one_hot_plot = y_train_one_hot[10:20,:].astype(int)
fig, ax = plt.subplots(figsize=(6, 6))
im = ax.imshow(one_hot_plot, interpolation='nearest', cmap="Blues")
ax.grid(False)
fmt = 'd'
thresh = np.max(one_hot_plot) / 2
for i in range(one_hot_plot.shape[0]):
    for j in range(one_hot_plot.shape[1]):
        ax.text(j, i, format(one_hot_plot[i, j], fmt),
                ha="center", va="center",
                color="white" if one_hot_plot[i, j] > thresh else "black")
ax.set_xticks([x for x in range(len(label_list))])
ax.set_yticks([i for i in range(one_hot_plot.shape[0])])
fig.tight_layout()
plt.xlabel("label")
plt.ylabel("example")
plt.show()

### Question

Given these one-hot encoded labels, can you decode the original text descriptions for each of the 10 examples?
You can use the following code to display the decoding dictionary

`_ = [print(f"{label}, {value}") for label, value in label_decoder.items()]`

In [None]:
### execute the decoding dictionary here

### Answer

The class labels for the 10 examples are:

<br> <br>

## Data Preprocessing

Before the images can be used by the learning algorithm to build a classifier efficiently, there are several preprocessing steps that are typically performed to improve the training process and model results.

We take the first 500 image examples and collect the pixel values of their first color channel. Next, we create a histogram, which shows the fraction of pixels that fall within each binned range of values.


In [None]:
test_images = x_train[:500,:,:,0].copy()
# stat="probability" makes each bar height the fraction of pixels in that bin
sns.histplot(test_images.ravel(), bins=50, stat="probability", kde=True)
plt.xlabel('Bin values')
plt.ylabel('Fraction of pixels')
plt.show()

The figure above has 50 bins between 0 and 255. The height of each bar shows the fraction of pixels that fall within the bin's range.
For example, a bar with a height of 0.002 means that 0.2% of the pixels fall within that bin's range.

### Question

What is the size (width) of each bin?

Which bin is the most common (approximately)?

### Answer

<br> <br>

A very common preprocessing step in machine learning for continuous input data (numerical values, such as image pixel values) is mean subtraction. This is done in two simple steps.

1. Calculate the average (mean) pixel value
2. Subtract the average (mean) pixel value from every pixel

The effect of this is that the distribution of pixel values will be 'centered' at 0. The reason for doing this is beyond the scope of this exercise, but in a nutshell, this makes it easier for the model optimizer to learn the weights quickly (and reach better results).

Below, we perform the mean subtraction step and display the resulting pixel value distribution.

In [None]:
test_images = test_images.astype(float)
test_images -= np.mean(test_images)
sns.histplot(test_images.ravel(), bins=50, stat="probability", kde=True)
plt.xlabel('Bin values')
plt.ylabel('Fraction of pixels')
plt.show()
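
The two steps of mean subtraction can be verified on a handful of numbers. This is a minimal sketch with made-up pixel values, separate from the notebook's `test_images`.

```python
import numpy as np

pixels = np.array([10.0, 20.0, 60.0, 110.0])  # made-up pixel values

mean_value = pixels.mean()      # step 1: calculate the mean
centered = pixels - mean_value  # step 2: subtract it from every pixel

print(mean_value)       # 50.0
print(centered)         # [-40. -30.  10.  60.]
print(centered.mean())  # 0.0
```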

### Questions

Based on the new distribution, what is the approximate average pixel value in the original data?

Did the `shape` of the histogram change?

If it changed, why?

If it did not change, why?

### Answer

<br> <br>

A common technique for image preprocessing is called `standardization`. The objective of this step is to transform the data such that the pixel values have a distribution with a mean of 0 and a standard deviation of 1. The process is performed in two steps.

1. Mean subtraction (already completed in the previous cells)
2. Division by the standard deviation


Below, we perform the standardization step and display the new distribution

In [None]:
test_images /= np.std(test_images)
sns.histplot(test_images.ravel(), bins=50, stat="probability", kde=True)
plt.xlabel('Bin values')
plt.ylabel('Fraction of pixels')
plt.show()
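
As a numerical check, the sketch below standardizes a random batch of image-shaped data and confirms the resulting mean and standard deviation. The helper `standardize` is hypothetical and uses a single mean and standard deviation for the whole array; the `standardize_data` function imported from this repository may work differently (for example, per channel).

```python
import numpy as np

def standardize(images):
    """Hypothetical helper: subtract the global mean, divide by the global std."""
    images = images.astype(np.float64)
    return (images - images.mean()) / images.std()

# Made-up batch: 8 random "images" with the same layout as CIFAR-10.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

standardized = standardize(fake_images)
print(f"mean: {standardized.mean():.6f}")
print(f"std:  {standardized.std():.6f}")
```

Up to floating-point rounding, the printed mean is 0 and the printed standard deviation is 1.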

### Question

Did the `shape` of the distribution change?

What information is lost in the standardization step?

Do you think the lost information matters for the purpose of model training?

Here we apply the one-hot encoding and standardization steps to the actual dataset.

In [None]:
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)
x_train = standardize_data(x_train)
x_test = standardize_data(x_test)

## Model Training Parameters

Now that the input data has been well prepared, we can begin training a convolutional neural network model.

Due to the complexity of the model architecture, and the programming involved in defining a network, here we will use a pre-built network and allow you to adjust the parameters that affect the various aspects of the optimization process as well as image augmentation methods.

All the tunable parameters are specified using the `hyperparameters` variable. It is a dictionary, indicating the `key` and `value` pairs. The `key` is the name of the parameter, the `value` is the setting for that parameter.

Here's a brief explanation of each parameter

`learning_rate`: This determines how fast the neural network weights are updated per iteration step. The larger the number, the faster the model learns; however, too high a number will result in learning instability and model divergence.

`batch_normalization`: This is an advanced (but now commonplace) normalization technique. In addition to normalizing the input image data, this parameter tells the algorithm to normalize the output of certain network layers for each batch of images. You can set this to False and see what the effects are.

`weight_decay`: This is a regularization technique which prevents the network weights from getting too large. The effect is that the model is more generalizable and performs better on the test dataset when properly tuned. However, when the value is too large, the performance of the model will drop.

`base_filters`: base number of convolution filters

`batch_size`: number of images to train per iteration

`fc_size`: number of nodes per dense neural network layer

`dropout`: the probability with which a node output is set to 0. This improves the generalizability of the model and prevents the network from relying heavily on any one node

`lr_decay`: exponential decay rate of the learning rate, this helps the model stabilize around the loss minimum, however, if the decay is too large, the model will learn very slowly

`rotation_angle`: image augmentation, rotation of the image

`width_shift_range`: image augmentation, width shift of the image

`height_shift_range`: image augmentation, height shift of the image

`shear_range`: image augmentation, range of image shear

`zoom_range`: image augmentation, range of image zoom

`horizontal_flip`: image augmentation, whether to flip the image horizontally

`early_stopping_patience`: how many epochs to wait till the model training is stopped (after the loss ceases to decrease)

`reduce_lr_patience`: how many epochs between learning rate reduction

`reduce_lr_factor`: reduction factor of the learning rate for every "reduce_lr_patience" epochs

`activation`: activation function (relu, sigmoid, tanh)

`classifier_activation`: activation function of the output layer, "softmax" for multi-class problems

`loss`: loss function, "categorical_crossentropy" for multi-class problems with one-hot encoded labels
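
To illustrate how the last two parameters fit together, here is a minimal NumPy sketch of a softmax output followed by categorical cross-entropy against a one-hot label. The scores and the three-class setup are made up, and the functions are simplified stand-ins rather than the Keras implementations (which, for instance, also guard against taking the log of zero).

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot_label, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -np.sum(one_hot_label * np.log(probabilities))

scores = np.array([2.0, 1.0, 0.1])  # made-up raw outputs for 3 classes
label = np.array([1, 0, 0])         # one-hot label: the true class is 0

probs = softmax(scores)
loss = categorical_crossentropy(label, probs)
print(probs, probs.sum())
print(loss)
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1, which is what the optimizer works toward during training.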


In [None]:
hyperparameters = {
  "learning_rate": 0.001,
  "batch_normalization": True,
  "weight_decay": 1e-4,
  "base_filters": 32,
  "batch_size": 64,
  "fc_size": 128,
  "dropout": 0.2,
  "lr_decay": 1e-6,
  "rotation_angle": 15,
  "width_shift_range": 0.1, 
  "height_shift_range": 0.1,
  "shear_range": 0.1,
  "zoom_range": 0.1,
  "horizontal_flip": True,
  "early_stopping_patience": 15,
  "reduce_lr_patience": 6,
  "reduce_lr_factor": 0.3,
  "activation": "relu",
  "classifier_activation": "softmax",
  "loss": "categorical_crossentropy"
}
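
The interplay of `reduce_lr_patience`, `reduce_lr_factor` and `early_stopping_patience` can be simulated without training anything. The sketch below assumes plateau-based behavior similar to the Keras `ReduceLROnPlateau` and `EarlyStopping` callbacks; the function `simulate_schedule` and the loss values are hypothetical, and the `Cifar10Trainer` in this repository may behave differently.

```python
def simulate_schedule(val_losses, learning_rate, reduce_lr_patience,
                      reduce_lr_factor, early_stopping_patience):
    """Return (epoch, learning_rate) pairs for a made-up loss curve."""
    best = float("inf")
    since_best = 0    # epochs since the best loss (early stopping counter)
    since_reduce = 0  # stalled epochs since the last learning rate change
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_reduce = 0
        else:
            since_best += 1
            since_reduce += 1
        if since_reduce >= reduce_lr_patience:
            learning_rate *= reduce_lr_factor  # shrink the learning rate
            since_reduce = 0
        history.append((epoch, learning_rate))
        if since_best >= early_stopping_patience:
            break  # stop training early
    return history

# Made-up validation losses: improvement for 3 epochs, then a plateau.
losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
history = simulate_schedule(losses, learning_rate=0.001, reduce_lr_patience=2,
                            reduce_lr_factor=0.5, early_stopping_patience=5)
for epoch, lr in history:
    print(f"epoch {epoch}: learning rate {lr}")
```

With these small patience values the learning rate is halved at epochs 5 and 7, and training stops after epoch 8 even though 9 loss values were supplied.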

Set up the directory and training job names to save artifacts

In [None]:
artifact_base = "/tmp/artifact/cifar10"
job_name = "training_job_"
timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
artifact_directory = os.path.join(artifact_base, job_name+timestamp)

### Model training

First, we will create a model trainer object. This will also display the model architecture and the output shape of each individual layer.

In [None]:
trainer = Cifar10Trainer(
    model_name="convnet6",
    hyperparameters=hyperparameters,
    artifact_directory=artifact_directory,
    x_train=x_train,
    y_train=y_train,
    x_test=x_test,
    y_test=y_test
)

### Questions

How many convolution layers are there?

How many pooling layers are there?

What is the size of the flattened vector which is feeding the final classifier dense_1 neural network?

### Answers

<br> <br>

We can initiate the training using the method `train()`

The progress of the training will be displayed in epochs. Each epoch is one complete run through the entire training dataset.

In [None]:
trainer.train()
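
The number of weight updates in one epoch follows from the dataset size and the batch size. The arithmetic below assumes the trainer uses every training image once per epoch and keeps the final, smaller batch; that is an assumption about `Cifar10Trainer`, not something confirmed by the notebook.

```python
import math

num_training_images = 50_000  # size of the CIFAR-10 training set
batch_size = 64               # value of hyperparameters["batch_size"]

steps_per_epoch = math.ceil(num_training_images / batch_size)
images_in_last_batch = num_training_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)       # 782
print(images_in_last_batch)  # 16
```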

## Evaluate model

Let's look at the model evaluation results for a pre-trained artifact.

We will look at the confusion matrix, as well as the Receiver Operating Characteristic (ROC) curves.

In [None]:
model_artifact = "../artifacts/weights.92-0.449348.hdf5"
fpr, tpr, roc_auc, cm = evaluate_model(model_artifact, x_test, y_test, label_list)

### Question

Based on the confusion matrix, which class of object did the model have the most trouble with?

Which other classes did the model confuse this class with most?

### Answer

<br> <br>

### Evaluate your own model

In [None]:
artifact_paths = glob.glob(os.path.join(artifact_directory, "*.hdf5"))
print('all available model artifacts:')
for i, path in enumerate(sorted(artifact_paths)):
    # the position in this list is not necessarily the epoch number
    print(f'artifact {i+1}: {path}')

Enter the file path for the model artifact you'd like to evaluate below

In [None]:
model_artifact = ''

In [None]:
fpr, tpr, roc_auc, cm = evaluate_model(model_artifact, x_test, y_test, label_list)

### Additional exercise

Can you tune the model to obtain accuracy > 0.88?

### Bonus exercise

Can you add one more convolutional layer to the model, right before the flatten layer?

Hint: you'll need to modify /cifar10/model/model.py source code
