# Prepare Environment

In [None]:
from IPython.display import display

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams["axes.grid"] = False
%matplotlib inline

In [None]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)

**Note**: most of the code in this notebook is a simplified version of the TensorFlow tutorial example ([here](https://www.tensorflow.org/tutorials/keras/classification)).

# Import the CIFAR10 dataset



<table>
  <tr><td>
    <img src="https://cdn-images-1.medium.com/freeze/max/1000/1*LyV7_xga4jUHdx4_jHk1PQ.png"
         alt="CIFAR10 samples"  width="600">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> <a href="https://www.cs.toronto.edu/~kriz/cifar.html">CIFAR10 samples</a>, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.<br/>&nbsp;
  </td></tr>
</table>

Here, 50,000 images are used to train the network and 10,000 images to evaluate how accurately the network learned to classify images. TensorFlow (through `keras.datasets`) provides the CIFAR10 small-image classification dataset, which we can load as follows:

The expected output:

```
Training set: (50000, 32, 32, 3), (50000, 1)
Test set: (10000, 32, 32, 3), (10000, 1)
```

In [None]:
from keras.datasets import cifar10

(X_train, y_train), (X_test, y_test) = # YOUR CODE HERE

print(f'Training set: {X_train.shape}, {y_train.shape}')
print(f'Test set: {X_test.shape}, {y_test.shape}')

Further split the training set into training (40,000) and validation (10,000) sets.

The expected output:
```
Training set: (40000, 32, 32, 3), (40000, 1)
Validation set: (10000, 32, 32, 3), (10000, 1)
Test set: (10000, 32, 32, 3), (10000, 1)
```
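
`train_test_split` shuffles the samples and then cuts them into two subsets. A minimal NumPy-only sketch of that idea follows, assuming a hypothetical helper `random_split` and a small synthetic array in place of CIFAR10 (it is not scikit-learn's implementation). One shared permutation indexes both `X` and `y`, which keeps every image aligned with its label. With `train_test_split` itself, an integer `test_size=10000` gives the 40,000/10,000 split, and `stratify=y_train` additionally preserves the class proportions in both subsets.

```python
import numpy as np

def random_split(X, y, valid_size, seed=0):
    """Hypothetical helper: shuffle (X, y) together and hold out `valid_size` samples."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))  # one shared permutation keeps X and y aligned
    valid_idx, train_idx = perm[:valid_size], perm[valid_size:]
    return X[train_idx], X[valid_idx], y[train_idx], y[valid_idx]

# Synthetic stand-in for CIFAR10: 50 "images" of shape 32x32x3, labels of shape (50, 1)
X = np.zeros((50, 32, 32, 3), dtype=np.uint8)
y = np.arange(50).reshape(-1, 1) % 10
for i in range(50):
    X[i] = i  # fill each image with its own index so alignment can be checked

X_tr, X_va, y_tr, y_va = random_split(X, y, valid_size=10)
print(X_tr.shape, X_va.shape, y_tr.shape, y_va.shape)
# (40, 32, 32, 3) (10, 32, 32, 3) (40, 1) (10, 1)
```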

In [None]:
# YOUR CODE HERE
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = # YOUR CODE HERE

print(f'Training set: {X_train.shape}, {y_train.shape}')
print(f'Validation set: {X_valid.shape}, {y_valid.shape}')
print(f'Test set: {X_test.shape}, {y_test.shape}')

Because the labels have an extra trailing dimension (i.e., shape `(N, 1)`), we need to squeeze them to shape `(N,)`. You may find the `np.squeeze()` function useful. The expected output is:

```
Training set: (40000, 32, 32, 3), (40000,)
Validation set: (10000, 32, 32, 3), (10000,)
Test set: (10000, 32, 32, 3), (10000,)
```
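
A quick illustration of the squeeze on a small hypothetical label array. Passing `axis=1` is an optional safeguard: it removes only the label column, whereas a bare `np.squeeze()` would also drop the sample axis if an array ever contained a single sample.

```python
import numpy as np

# Labels shaped like the output of cifar10.load_data(): shape (N, 1); values are arbitrary
y = np.array([[6], [9], [9], [4]])
print(y.shape)  # (4, 1)

# Remove only the length-1 label column
y_flat = np.squeeze(y, axis=1)
print(y_flat.shape)  # (4,)

# Equivalent alternatives for this shape
assert np.array_equal(y_flat, y.ravel())
assert np.array_equal(y_flat, y[:, 0])
```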

In [None]:
# YOUR CODE HERE

print(f'Training set: {X_train.shape}, {y_train.shape}')
print(f'Validation set: {X_valid.shape}, {y_valid.shape}')
print(f'Test set: {X_test.shape}, {y_test.shape}')

The images are 32x32x3 NumPy arrays, with pixel values ranging from 0 to 255. The *labels* are an array of integers, ranging from 0 to 9. These correspond to the following classes:

<table>
  <tr>
    <th>Label</th>
    <th>Class</th>
  </tr>
  <tr>
    <td>0</td>
    <td>Airplane</td>
  </tr>
  <tr>
    <td>1</td>
    <td>Automobile</td>
  </tr>
    <tr>
    <td>2</td>
    <td>Bird</td>
  </tr>
    <tr>
    <td>3</td>
    <td>Cat</td>
  </tr>
    <tr>
    <td>4</td>
    <td>Deer</td>
  </tr>
    <tr>
    <td>5</td>
    <td>Dog</td>
  </tr>
    <tr>
    <td>6</td>
    <td>Frog</td>
  </tr>
    <tr>
    <td>7</td>
    <td>Horse</td>
  </tr>
    <tr>
    <td>8</td>
    <td>Ship</td>
  </tr>
    <tr>
    <td>9</td>
    <td>Truck</td>
  </tr>
</table>

Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:

In [None]:
class_names = [
    'airplane', 'automobile', 'bird', 'cat', 'deer',
    'dog', 'frog', 'horse', 'ship', 'truck']

Let's look at an example image from the CIFAR10 training set.

In [None]:
plt.figure()
plt.imshow(X_train[0])
plt.colorbar()
plt.grid(False)
plt.xlabel(class_names[y_train[0]])
plt.show()

# Data Preprocessing

It is common practice to **normalize the range of independent variables (features) in the data**. This is mainly because many classifiers measure the distance between two points with the Euclidean distance: if one feature has a much broader range of values than the others, that feature dominates the distance. The range of all features should therefore be normalized **so that each feature contributes approximately proportionately to the final distance**.
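
The distance argument can be checked numerically. The sketch below uses two hypothetical samples whose second feature spans [0, 255] while the first spans [0, 1]. Before scaling, the second feature accounts for almost all of the squared distance; min-max scaling each feature to [0, 1] restores the balance.

```python
import numpy as np

# Two hypothetical samples: feature 0 lies in [0, 1], feature 1 lies in [0, 255]
a = np.array([0.1, 200.0])
b = np.array([0.9, 210.0])

# Raw Euclidean distance: sqrt(0.8**2 + 10**2) = sqrt(100.64), dominated by feature 1
raw = np.linalg.norm(a - b)
print(round(raw, 3))  # 10.032

# Min-max scale each feature to [0, 1] using its known range
lo = np.array([0.0, 0.0])
hi = np.array([1.0, 255.0])
a_s, b_s = (a - lo) / (hi - lo), (b - lo) / (hi - lo)

# Scaled distance: sqrt(0.8**2 + (10/255)**2), now dominated by feature 0
scaled = np.linalg.norm(a_s - b_s)
print(round(scaled, 3))  # 0.801
```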

Many other feature scaling techniques are described [here](https://en.wikipedia.org/wiki/Feature_scaling).

In this exercise, we only scale the pixel values from the range [0, 255] to [0, 1].
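
For 8-bit images, scaling to [0, 1] amounts to dividing by 255. A minimal sketch on a synthetic `uint8` batch standing in for `X_train` (the same operation would apply to `X_valid` and `X_test`). The explicit cast to `float32` is a choice made here because it matches Keras's default float type and uses half the memory of `float64`.

```python
import numpy as np

# Synthetic uint8 batch standing in for X_train (pixel values 0..255)
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
X[0, 0, 0, 0], X[0, 0, 0, 1] = 0, 255  # make sure both extremes are present

# Cast first, then divide, so the result stays in single precision
X_scaled = X.astype(np.float32) / 255.0
print(X_scaled.dtype, X_scaled.min(), X_scaled.max())  # float32 0.0 1.0
```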

In [None]:
# YOUR CODE HERE

# Define a Model

Define the deep learning model you would like to use for this problem.

**Hint**: The model needs to flatten each image from a three-dimensional array into a one-dimensional array before feeding it to a `tf.keras.layers.Dense` layer.
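
To make the shapes concrete, here is a NumPy sketch of what flattening followed by a 10-unit dense layer computes for a hypothetical batch of 8 images. The random weights `W` and bias `b` are placeholders for illustration only; in Keras they are learned, and the layer's activation (omitted here) is applied afterwards.

```python
import numpy as np

# Hypothetical mini-batch of 8 CIFAR10-shaped images, already scaled to [0, 1]
rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3), dtype=np.float32)

# Flattening keeps the batch axis and unrolls height x width x channels:
# 32 * 32 * 3 = 3072 features per image
flat = batch.reshape(len(batch), -1)
print(flat.shape)  # (8, 3072)

# A Dense layer (before its activation) is a matrix product plus a bias:
# (8, 3072) @ (3072, 10) + (10,) -> (8, 10)
W = rng.normal(scale=0.01, size=(3072, 10)).astype(np.float32)
b = np.zeros(10, dtype=np.float32)
logits = flat @ W + b
print(logits.shape)  # (8, 10)
```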

In [None]:
from keras.models import Sequential
from keras.layers import *

num_classes = # YOUR CODE HERE

model = # YOUR CODE HERE

model.summary()

# Train a Model

In this section, we first define several hyperparameters used during training.

*   `epochs`: the number of training epochs (one epoch means the model has seen every training sample once).
*   `batch_size`: the number of examples processed in one training step, i.e., per weight update.
*   `learning_rate`: a hyperparameter that scales how far the weights are moved against the loss gradient at each training step.
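
The three hyperparameters interact as follows: each epoch is split into `ceil(num_samples / batch_size)` steps, and each step moves the weights against the mini-batch gradient by an amount scaled by `learning_rate`. The sketch below shows this loop as plain mini-batch SGD on a toy one-feature linear regression problem (an assumption chosen for brevity, not the CIFAR10 model); Keras's `fit()` runs broadly the same kind of loop internally.

```python
import numpy as np

# Toy regression problem (an assumption, not CIFAR10): y = 3x + 1 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = 3 * X + 1 + rng.normal(scale=0.1, size=200)

epochs = 20
batch_size = 32
learning_rate = 0.1

w, b = 0.0, 0.0
# 200 / 32 = 6.25 -> 7 steps per epoch (the last batch has 8 samples)
steps_per_epoch = int(np.ceil(len(X) / batch_size))
for epoch in range(epochs):
    perm = rng.permutation(len(X))  # reshuffle the samples every epoch
    for step in range(steps_per_epoch):
        idx = perm[step * batch_size:(step + 1) * batch_size]
        err = (w * X[idx] + b) - y[idx]
        # Gradients of the mean squared error over this mini-batch
        grad_w = 2 * np.mean(err * X[idx])
        grad_b = 2 * np.mean(err)
        # One training step: move against the gradient, scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(steps_per_epoch, round(w, 2), round(b, 2))
```

With these settings the loop performs `epochs * steps_per_epoch = 140` weight updates in total.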


In [None]:
epochs = # YOUR CODE HERE
batch_size = # YOUR CODE HERE
learning_rate = # YOUR CODE HERE

## Loss Function

Which loss function should we use for CIFAR10?
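
The labels here are integers 0 to 9 rather than one-hot vectors, so one common choice is categorical cross-entropy in its "sparse" form, which Keras exposes as `tf.keras.losses.SparseCategoricalCrossentropy`. The NumPy sketch below, assuming the model outputs softmax probabilities and using a hypothetical helper name, shows what that loss computes: the mean of `-log(p[true class])`, which equals ordinary categorical cross-entropy on one-hot labels.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log(p[true class]) over the batch; y_true holds integer labels."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

# Hypothetical softmax outputs for 3 images over the 10 CIFAR10 classes (rows sum to 1)
probs = np.full((3, 10), 0.05)
probs[0, 3] = 0.55  # confident and correct (true class 3, "cat")
probs[1, 8] = 0.55  # predicts "ship", but the true class is 0, "airplane"
probs[2, 9] = 0.55  # confident and correct (true class 9, "truck")
y_true = np.array([3, 0, 9])

loss = sparse_categorical_crossentropy(y_true, probs)

# Same value as categorical cross-entropy with one-hot targets
one_hot = np.eye(10)[y_true]
loss_one_hot = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
print(round(loss, 4), round(loss_one_hot, 4))  # 1.3971 1.3971
```

The wrongly classified second image contributes `-log(0.05) ≈ 3.0` to the sum, far more than the two correct ones (`-log(0.55) ≈ 0.6` each), which is how the loss penalizes confident mistakes.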

In [None]:
loss = # YOUR CODE HERE

## Optimizer

Optimizers commonly used to train deep learning models include Stochastic Gradient Descent (SGD), Adam, RMSProp, and Adadelta. The full list of optimizers provided by `tf.keras` is available [here](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers).
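
To see how an adaptive optimizer differs from plain SGD, here is a from-scratch sketch of the Adam update rule next to ordinary gradient descent, both minimizing the one-dimensional function f(x) = (x - 3)^2. The argument names mirror those of `tf.keras.optimizers.Adam` (`learning_rate`, `beta_1`, `beta_2`, `epsilon`), but this is an illustration rather than the Keras implementation, and the learning rate of 0.1 is chosen for the toy problem (Keras's Adam defaults to 0.001).

```python
import numpy as np

def adam_minimize(grad_fn, x0, learning_rate=0.1, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-7, steps=500):
    """From-scratch sketch of the Adam update rule for one parameter vector."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # running mean of gradients (first moment)
    v = np.zeros_like(x)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g ** 2
        m_hat = m / (1 - beta_1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta_2 ** t)
        # Per-parameter step: the gradient is rescaled by its recent magnitude
        x = x - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return x

def sgd_minimize(grad_fn, x0, learning_rate=0.1, steps=500):
    """Plain gradient descent for comparison: x <- x - learning_rate * g."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - learning_rate * grad_fn(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 (x - 3)
grad = lambda x: 2 * (x - 3)
x_adam = adam_minimize(grad, [0.0])
x_sgd = sgd_minimize(grad, [0.0])
print(x_adam, x_sgd)
```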

In [None]:
optimizer = # YOUR CODE HERE

## Compile the Model

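Next, we configure the model for training by calling its `compile()` method with the loss and optimizer defined above. Pass `metrics=['accuracy']` as well, so that the accuracy curves plotted later are recorded during training.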

In [None]:
# YOUR CODE HERE

## Train a model

We are now ready to train the model: we feed it the training data, and it learns to classify the images.

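**Note**: `fit()` returns a `History` object whose `history` attribute is a dictionary of per-epoch loss and metric values; we keep it in `hist`. Pass the validation set with `validation_data=(X_valid, y_valid)` so that `val_loss` and `val_accuracy` are also recorded.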

You can read more about the arguments of the `fit()` method [here](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#fit).

In [None]:
hist = # YOUR CODE HERE

Next, we plot the loss and accuracy curves to check whether the model is overfitting or underfitting.
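
Reading the curves usually comes down to two questions: at which epoch does the validation loss bottom out, and how far above the training loss does it end? The helper below is a hypothetical sketch that answers both from a dictionary shaped like `hist.history`. The loss values in the usage example are made up for illustration, and the overfitting flag is only a rough heuristic.

```python
import numpy as np

def summarize_history(history):
    """Hypothetical helper: best epoch by validation loss and the final train/valid gap.

    `history` is assumed to be a dict shaped like `hist.history`, with
    'loss' and 'val_loss' lists holding one value per epoch.
    """
    val_loss = np.asarray(history['val_loss'])
    best_epoch = int(np.argmin(val_loss)) + 1  # 1-based, as epochs are usually counted
    final_gap = val_loss[-1] - history['loss'][-1]
    # Heuristic: validation loss bottomed out before the last epoch and ends
    # above the training loss, which is the classic overfitting pattern
    overfitting = best_epoch < len(val_loss) and final_gap > 0
    return best_epoch, final_gap, overfitting

# Hypothetical loss curves, made up for illustration only
toy_history = {
    'loss':     [1.9, 1.6, 1.4, 1.2, 1.0, 0.8],
    'val_loss': [1.8, 1.6, 1.5, 1.5, 1.6, 1.8],
}
print(summarize_history(toy_history))
```

If the validation loss clearly turns upward, the `tf.keras.callbacks.EarlyStopping(monitor='val_loss', restore_best_weights=True)` callback can stop training near that epoch automatically.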

In [None]:
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(hist.history['loss'], label='train')
ax.plot(hist.history['val_loss'], label='valid')
ax.set_ylabel('Loss')
ax.set_xlabel('Epochs')
plt.legend()
plt.show()

fig, ax = plt.subplots(figsize=(8,6))
ax.plot(hist.history['accuracy'], label='train')
ax.plot(hist.history['val_accuracy'], label='valid')
ax.set_ylabel('Accuracy')
ax.set_xlabel('Epochs')
plt.legend()
plt.show()

plt.close('all')

Let's look at the model's predictions in detail by applying the trained model to the validation set.

The expected output:
```
(10000, 10)
(10000,)
```
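
Assuming the model ends in a 10-unit softmax layer, `model.predict(X_valid)` returns one row of class probabilities per image, so the predicted class is the column index of each row's maximum. A sketch with hypothetical probabilities, generated here from random logits:

```python
import numpy as np

# Hypothetical softmax outputs for 4 validation images, shape (N, num_classes)
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# The predicted class is the index of the largest probability in each row
y_hat = np.argmax(probs, axis=1)
print(probs.shape, y_hat.shape)  # (4, 10) (4,)
```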

In [None]:
# Predict the probability of each image
# y_hat_valid_probs = # YOUR CODE HERE

# Select the class with the highest probability as the predicted class.
# y_hat_valid = # YOUR CODE HERE

print(y_hat_valid_probs.shape)
print(y_hat_valid.shape)

To make the results easier to interpret, we visualize each input image together with its predicted class probabilities to see how the model performs.

In [None]:
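def plot_image(i, probs, true_label, img):
    """Show image i with its predicted class, confidence, and true class.

    `probs` holds the predicted probabilities for image i only, while
    `true_label` and `img` are the full label and image arrays.
    """
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img)  # RGB image, so no colormap is needed
    predicted_label = np.argmax(probs)
    # Blue caption for a correct prediction, red for an incorrect one
    color = 'blue' if predicted_label == true_label else 'red'
    plt.xlabel(
        '{} {:2.0f}% ({})'.format(
            class_names[predicted_label],
            100*np.max(probs),
            class_names[true_label]),
        color=color)

def plot_prob_dist(i, probs, true_label):
    """Bar chart of the class probabilities for image i.

    The predicted class is drawn in red and the true class in blue
    (blue wins when the two coincide).
    """
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(len(probs)))
    plt.yticks([])
    thisplot = plt.bar(range(len(probs)), probs, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(probs)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')

def plot_output(probs, images, labels):
    """Plot the first 15 images in a 5x3 grid, each next to its probability bar chart."""
    num_rows = 5
    num_cols = 3
    num_images = num_rows*num_cols
    plt.figure(figsize=(2*2*num_cols, 2*num_rows))
    for i in range(num_images):
        # Left panel: the image; right panel: its predicted probability distribution
        plt.subplot(num_rows, 2*num_cols, 2*i+1)
        plot_image(i, probs[i], labels, images)
        plt.subplot(num_rows, 2*num_cols, 2*i+2)
        plot_prob_dist(i, probs[i], labels)
    plt.tight_layout()
    plt.show()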


In [None]:
# Color correct predictions in blue and incorrect predictions in red.
plot_output(y_hat_valid_probs, X_valid, y_valid)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score

print('Validation Set')
print(confusion_matrix(y_true=y_valid, y_pred=y_hat_valid))
print(f'Accuracy: {accuracy_score(y_true=y_valid, y_pred=y_hat_valid):.2f}')
print(f'Macro F1-score: {f1_score(y_true=y_valid, y_pred=y_hat_valid, average="macro"):.2f}')
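
The two metrics printed above can be computed directly from the confusion matrix. The sketch below, using hypothetical helper names and a 3-class toy example, follows the same convention as `sklearn.metrics.confusion_matrix` (rows are true classes, columns are predicted classes): accuracy is the trace divided by the total count, and macro F1 is the unweighted mean of the per-class F1 scores.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def macro_f1_np(cm):
    """Unweighted mean of per-class F1 scores computed from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    pred_counts, true_counts = cm.sum(axis=0), cm.sum(axis=1)
    # Guard every division so classes with no samples score 0 instead of NaN
    precision = np.divide(tp, pred_counts, out=np.zeros_like(tp), where=pred_counts > 0)
    recall = np.divide(tp, true_counts, out=np.zeros_like(tp), where=true_counts > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()

# Small hypothetical example with 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix_np(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy, macro_f1_np(cm))
```

Here the per-class F1 scores are 0.5, 0.8, and 2/3, so macro F1 weights the rarer mistakes on each class equally, unlike plain accuracy.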

# Evaluate Performance on Test Set

Once model training is finished, evaluate the classification performance on the test set (i.e., data the model has not seen during training or validation).

In [None]:
# y_hat_test_probs = # YOUR CODE HERE
# y_hat_test = # YOUR CODE HERE

In [None]:
print('Test Set')
print(confusion_matrix(y_true=y_test, y_pred=y_hat_test))
print(f'Accuracy: {accuracy_score(y_true=y_test, y_pred=y_hat_test):.2f}')
print(f'Macro F1-score: {f1_score(y_true=y_test, y_pred=y_hat_test, average="macro"):.2f}')

# Error Analysis

It's always a good idea to inspect the output and make sure everything looks fine. Here we look at some test-set examples the model gets right and some it gets wrong.

First, we determine which samples are correct or incorrect on the test set.
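
A sample is correct when its predicted label equals its true label, so both index arrays can be read off a boolean mask. A sketch with hypothetical labels; `np.where(mask)[0]` is an equivalent alternative to `np.flatnonzero(mask)`.

```python
import numpy as np

# Hypothetical true and predicted labels for 8 test images
y_true = np.array([3, 8, 8, 0, 6, 6, 1, 6])
y_pred = np.array([3, 8, 1, 0, 4, 6, 1, 2])

# np.flatnonzero returns the positions where the boolean mask is True
correct_indices = np.flatnonzero(y_pred == y_true)
incorrect_indices = np.flatnonzero(y_pred != y_true)
print(correct_indices)    # [0 1 3 5 6]
print(incorrect_indices)  # [2 4 7]
```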

In [None]:
correct_indices = # YOUR CODE HERE
incorrect_indices = # YOUR CODE HERE

Then we plot a random selection of 15 images from each group. Each caption shows the predicted class and its probability, followed by the ground-truth class in parentheses for comparison.

In [None]:
# Correct
idx = np.random.choice(np.arange(len(correct_indices)), 15, replace=False)  # 15 distinct samples
print('Correct')
plot_output(
    y_hat_test_probs[correct_indices[idx]],
    X_test[correct_indices[idx]],
    y_test[correct_indices[idx]])

In [None]:
# Incorrect
idx = np.random.choice(np.arange(len(incorrect_indices)), 15, replace=False)  # 15 distinct samples
print('Incorrect')
plot_output(
    y_hat_test_probs[incorrect_indices[idx]],
    X_test[incorrect_indices[idx]],
    y_test[incorrect_indices[idx]])