## 1. The MNIST Data

The [MNIST](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/mnist/load_data) data is a database of handwritten digit images. It contains 60K 28 x 28 grayscale images in the training set and 10K images in the test set. Each image shows a single handwritten digit from 0 to 9.

- ``X_train``: NumPy array of grayscale image data with shape (60000, 28, 28).

- ``y_train``: NumPy array of digit labels (integers in the range 0-9) with shape (60000,).

- ``X_test``: NumPy array of grayscale image data with shape (10000, 28, 28).

- ``y_test``: NumPy array of digit labels with shape (10000,).


### 1.1 Load the data

In [None]:
# Uncomment the two lines below to skip SSL certificate verification if you run this notebook on macOS (e.g., if the download fails with a certificate error)
#import ssl
#ssl._create_default_https_context = ssl._create_unverified_context

from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

display(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

Let's take a look at the first image, which is a 28 x 28 matrix.

- As the image is grayscale, each pixel holds a single integer value from 0 to 255: 0 represents black and 255 represents white.

In [None]:
print(X_train[0])     #alternatively, print(X_train[0, :, :])

### 1.2 Select 24 Images Randomly for Display

In [None]:
import numpy as np
rng = np.random.RandomState(0)                                  # random seed at 0
index = rng.choice(np.arange(len(X_train)), 24, replace=False)  # generate 24 indices randomly

X_train24 = X_train[index]
y_train24 = y_train[index]

display(X_train24.shape, y_train24.shape)

In [None]:
import matplotlib.pyplot as plt
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(12, 8))         # a figure with 4 * 6 subplots (axes is a 2D array)

# loop over the flattened axes, images, and target labels
# numpy.ravel() flattens the 2D array of axes into 1D so we can loop over it
# (the loop variable is named `ax` so it does not overwrite `axes`)
for ax, image, target in zip(axes.ravel(), X_train24, y_train24):
    ax.matshow(image, cmap=plt.cm.gray_r)        # display each image (28 x 28 matrix) in reversed grayscale
    ax.set_xticks([])                            # remove x-axis tick marks
    ax.set_yticks([])                            # remove y-axis tick marks
    ax.set_title(target)                         # use the target label as the subplot title

## 2. Prepare the Data

### 2.1 Reshape and Scale Features

CNNs in Keras expect the features as a 4D array with shape `(number of images, height, width, channels)`, so we reshape the 3D ``X_train`` and ``X_test`` into 4D by adding a channel dimension of size 1 (grayscale images have a single channel).
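
Adding a size-1 channel axis does not change any pixel values; it only changes how the array is indexed. The sketch below checks this on a small synthetic array (random values standing in for `X_train`; the name `images` is illustrative only), comparing `reshape` with `-1` (letting NumPy infer the number of images) against indexing with `np.newaxis`.

```python
import numpy as np

# Synthetic stand-in for X_train: 5 "images" of 28 x 28 uint8 pixels
images = np.random.RandomState(0).randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Option A: reshape with an explicit channel dimension of 1;
# -1 lets NumPy infer the number of images instead of hard-coding it
images_a = images.reshape((-1, 28, 28, 1))

# Option B: append a new last axis (same as np.expand_dims(images, -1))
images_b = images[..., np.newaxis]

print(images.shape, images_a.shape, images_b.shape)
print(np.array_equal(images_a, images_b))   # both give identical 4D arrays
```

Using `-1` avoids hard-coding 60000 or 10000, so the same line works for any number of images.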



In [None]:
# reshape from 3D to 4D (1 element in the 4th Dimension)

X_train_new = X_train.reshape((60000, 28, 28, 1))
X_test_new = X_test.reshape((10000, 28, 28, 1))

display(X_train.shape, X_test.shape, X_train_new.shape, X_test_new.shape)

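Neural networks generally train better on scaled inputs (either min-max scaling or standardization works). Here we simply divide the pixel values by 255; since pixels range from 0 to 255, this is min-max scaling into [0, 1].

- Also, neural networks in Keras expect `X` as a floating-point tensor rather than integers, and the division produces floating-point values.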
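
A small check of what the division does to `uint8` pixel values, using three illustrative pixels (black, mid-gray, white) rather than the real data. It also shows the optional `astype("float32")` cast, which keeps the result in 32-bit floats instead of NumPy's default 64-bit floats and so uses half the memory.

```python
import numpy as np

# Three illustrative uint8 pixel values: black, mid-gray, white
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Dividing by 255 maps 0 -> 0.0 and 255 -> 1.0; NumPy promotes the
# integer array to a floating-point dtype
scaled = pixels / 255
print(scaled.dtype, scaled)

# Casting first keeps the result in float32 (half the memory of float64)
scaled32 = pixels.astype("float32") / 255
print(scaled32.dtype, scaled32)
```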

In [None]:
X_train_new = X_train_new.astype('float32') / 255   # scale pixels to [0, 1] as float32
X_test_new = X_test_new.astype('float32') / 255

### 2.2 Reshape Target Variable

The target variable is a 1D array with a class label for each image. There are two approaches to handle it in a multi-class classification task.


- Keep the original class labels as an ``integer tensor`` (i.e., a 1D array with one value per instance), which is what we have now. <font color = 'green'> **See part2 of Week9_NNs for a demonstration.**</font>

- Apply `one-hot encoding` (also known as `categorical encoding`) to the labels, which transforms the 1D array into a 2D array of shape `(no. of instances, no. of classes)`. Each row is a vector of all `0`s except for a `1` at the position of the label index. <font color='red'>**We take this approach here for demonstration purposes.**</font>

  Technically, one-hot encoding can be done with (1) the `tensorflow.keras.layers.CategoryEncoding` layer with `output_mode='one_hot'` (check [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/CategoryEncoding) for details); or (2) the `tensorflow.keras.utils.to_categorical` function (check [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) for details).
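
To see what one-hot encoding produces, here is a NumPy-only sketch (not the Keras API) that indexes the identity matrix with the labels: row `i` of a 10 x 10 identity matrix is exactly the one-hot vector for class `i`. The labels below are example values; for the same labels, the result should match what `to_categorical` returns, up to dtype.

```python
import numpy as np

labels = np.array([5, 0, 4, 1])   # example integer class labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so fancy-indexing it with the labels one-hot encodes them
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot.shape)   # (4, 10)
print(one_hot[0])      # a single 1 at index 5

# argmax along axis 1 recovers the original integer labels
print(one_hot.argmax(axis=1))
```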


  

In [None]:
# apply one-hot encoding to the class labels

from tensorflow.keras.utils import to_categorical

y_train_new = to_categorical(y_train)
y_test_new  = to_categorical(y_test)
display(y_train.shape, y_test.shape, y_train_new.shape, y_test_new.shape)

In [None]:
# check target of the first 10 training images

display(y_train[:10], y_train_new[:10])

## 3. Build a CNN Model on the Training Set


### 3.1 Define a CNN Model

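1. The first hidden layer is a [``Conv2D`` layer](https://keras.io/api/layers/convolution_layers/convolution2d/) with the settings below. Note that the input for each image is `28 x 28 x 1` and the output is `26 x 26 x 64` (with a 3 x 3 kernel, stride 1, and no padding, each spatial dimension becomes 28 - 3 + 1 = 26).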

> - ``filters``= 64: the number of filters, i.e., the number of feature maps (output channels) this layer produces.
> - ``kernel_size``= (3, 3): the size of the kernel used in each filter.
> -  ``strides``=(1, 1) (default)
> -  ``padding``= 'valid' (default, no padding)
> - ``activation``= 'relu': the 'relu' (Rectified Linear Unit) activation function.


2. The second hidden layer is a [``MaxPooling2D`` layer](https://keras.io/api/layers/pooling_layers/max_pooling2d/). With the settings below, it keeps the maximum of each non-overlapping 2 x 2 window, reducing the previous layer's output from `26 x 26 x 64` to `13 x 13 x 64` (i.e., a 75% reduction in the number of values).

> - ``pool_size``=(2, 2) (default)
> - ``strides``= None (defaults to `pool_size`)

3. Then we use another `Conv2D` layer with 128 filters (3 x 3 kernel), followed by another `MaxPooling2D` layer.

> - The input to the 2nd `Conv2D` layer is `13 x 13 x 64`, so its output is `11 x 11 x 128` (13 - 3 + 1 = 11 with a 3 x 3 kernel).
> - For an odd size like 11 x 11, `MaxPooling2D` with the default `padding='valid'` drops the last row and column that cannot fill a 2 x 2 window (effectively pooling over 10 x 10), so the output is `5 x 5 x 128`.

4. Flatten the 3D output of the 2nd `MaxPooling2D` layer into a 1D array of 3200 values (i.e., 5 * 5 * 128) with a `Flatten` layer (check [here](https://keras.io/api/layers/reshaping_layers/flatten/) for details).

5. A CNN classifier like this one typically ends with **at least one `Dense` layer**. Here we have two: the first `Dense` layer (128 units, `relu`) learns new features from the flattened values, and the second (10 units, `softmax` activation) is the output layer, which returns 10 class probabilities for each image.
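
The output sizes in the list above follow from simple arithmetic. A minimal sketch, assuming stride 1 for the convolutions and `padding='valid'` everywhere (the settings described above); the helper names `conv_output_size` and `pool_output_size` are illustrative, not Keras functions.

```python
def conv_output_size(size, kernel, stride=1):
    # padding='valid': the kernel must fit entirely inside the input
    return (size - kernel) // stride + 1


def pool_output_size(size, pool=2, stride=None):
    # strides=None means stride = pool size; with padding='valid', a leftover
    # row/column that cannot fill a whole window is dropped (floor division)
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1


# (layer type, kernel or pool size, output channels) for the model described above
layers = [("Conv2D", 3, 64), ("MaxPooling2D", 2, 64),
          ("Conv2D", 3, 128), ("MaxPooling2D", 2, 128)]

size = 28          # input images are 28 x 28 x 1
shapes = []
for kind, k, channels in layers:
    if kind == "Conv2D":
        size = conv_output_size(size, k)
    else:
        size = pool_output_size(size, k)
    shapes.append((size, size, channels))
    print(f"{kind:<13} -> {shapes[-1]}")

flat = size * size * 128   # Flatten: 5 * 5 * 128
print(f"{'Flatten':<13} -> ({flat},)")
```

It prints `(26, 26, 64)`, `(13, 13, 64)`, `(11, 11, 128)`, `(5, 5, 128)` and `3200`, matching the shapes listed above; note how floor division turns the odd 11 into 5.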


In [None]:
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn = Sequential()
cnn.add(Input(shape=(28, 28, 1)))                                     # each image is 28 x 28 with 1 channel
cnn.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))    # use default value for strides and padding
cnn.add(MaxPooling2D())                                               # default pool_size and strides
cnn.add(Conv2D(filters=128, kernel_size=(3, 3), activation='relu'))
cnn.add(MaxPooling2D())
cnn.add(Flatten())
cnn.add(Dense(units=128, activation='relu'))
cnn.add(Dense(units=10, activation='softmax'))    # output probabilities of 10 classes

### 3.2 Compile and Train the Model  

The model is compiled and trained the same way as a feed-forward neural network.

- Note that since ``one-hot encoding`` is applied to the target labels, we use ``categorical_crossentropy`` as the loss function here. If the target labels are an `integer tensor` (e.g., the original labels), use ``sparse_categorical_crossentropy`` instead.
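
The two losses compute the same quantity; they only differ in the label format they accept. A NumPy sketch of the textbook definitions (it ignores the small clipping Keras may apply to predictions for numerical stability), using hypothetical probabilities for 3 images over 4 classes:

```python
import numpy as np

# Hypothetical predicted probabilities for 3 images over 4 classes (rows sum to 1)
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
int_labels = np.array([0, 1, 3])            # integer labels -> sparse loss
one_hot = np.eye(4)[int_labels]             # one-hot labels -> categorical loss

# categorical crossentropy: -sum(y_true * log(y_pred)) per row, averaged over rows
categorical = -np.sum(one_hot * np.log(probs), axis=1).mean()

# sparse categorical crossentropy: -log of the probability at each label index
sparse = -np.log(probs[np.arange(len(int_labels)), int_labels]).mean()

print(categorical, sparse)   # the same value
```

Because the one-hot vector is zero everywhere except at the true class, the sum in the categorical version reduces to the single term the sparse version picks out directly.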


In [None]:
# compile the model
cnn.compile(optimizer='adam',                   # other optimizers work as well
            loss='categorical_crossentropy',    # with one-hot encoding, use `categorical_crossentropy` as loss function
            metrics=['accuracy'])


# train the model (it may take a while due to the big training size)
cnn.fit(x = X_train_new,         # use reshaped x_train
        y = y_train_new,         # use reshaped y_train
        epochs=5,
        batch_size= 5000,        # use a big batch size to speed up training
        validation_split = 0.2)  # 20% saved for validation

### 3.3 Model Summary and Visualization

How many parameters does the model learn? About 485K (485,514 in total).
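
Where does that number come from? A sketch of the standard counting rules for convolutional and dense layers with biases (the helper names are illustrative); pooling and `Flatten` layers have no trainable parameters. The per-layer values should match the `Param #` column of `cnn.summary()`.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one kernel_h x kernel_w x in_channels weight tensor per filter, plus one bias per filter
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(inputs, units):
    # one weight per (input, unit) pair, plus one bias per unit
    return inputs * units + units


counts = {
    "conv2d (64 filters)": conv2d_params(3, 3, 1, 64),
    "conv2d (128 filters)": conv2d_params(3, 3, 64, 128),
    "dense (128 units)": dense_params(5 * 5 * 128, 128),
    "dense (10 units)": dense_params(128, 10),
}
for name, n in counts.items():
    print(f"{name:>22}: {n:,}")

total = sum(counts.values())
print(f"{'total':>22}: {total:,}")   # 485,514
```

The first `Dense` layer alone accounts for 409,728 of the parameters, because it connects all 3200 flattened values to each of its 128 units.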


In [None]:
cnn.summary()      # the batch dimension (no. of instances) is shown as None


Alternatively, we can display the model structure in a more readable, graphical format.

- Here we visualize the model and (optionally) save it as a `png` or `jpeg` image in Google Drive, using the `plot_model` function from the `tensorflow.keras.utils` module (check [here](https://keras.io/api/utils/model_plotting_utils/) for details).
- We can load it back for display later with the `Image` class from `IPython.display` (check [here](https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html#IPython.display.Image) for details).



In [None]:
# connect Google Drive with Colab first

from google.colab import drive
drive.mount('/content/drive')

In [None]:
from tensorflow.keras.utils import plot_model
from IPython.display import Image

# Option 1: save in Google Drive (so that we can find it later); mount the drive first.
plot_model(cnn, to_file='/content/drive/MyDrive/Colab Notebooks/Deep Learning/cnn.png', show_shapes=True,  show_layer_names=True)
#Image(filename='/content/drive/MyDrive/Colab Notebooks/Deep Learning/cnn.png')       # load and display the saved image


# Option 2: save the image as a temp file in the runtime (no need to mount the drive). Note the temp file is deleted once the runtime is closed.
#plot_model(cnn, to_file='cnn.png', show_shapes=True,  show_layer_names=True)
#Image(filename='cnn.png')

## 4. Model Evaluation on Test Set

Let's take this model as the final model and test its performance on the test data. Please complete the blocks below.

- Use ``X_test_new`` and ``y_test_new`` for prediction and testing.
- You may notice that `batch_size` defaults to 32 in `predict` and `evaluate`.

### 4.1 Estimate Class Probabilities

<font color=red>***Exercise 1: Your Codes Here***</font>  


In [None]:
predictions = cnn.predict(X_test_new)

# Display the shape of the result and probabilities for the first image
# your code here

### 4.2 Get Predicted Class Labels  

<font color=red>***Exercise 2: Your Codes Here***</font>  

Since each class label equals its column index in the probability matrix, we can return the column index of the maximum probability for each image (row).
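
To make the role of `axis=1` concrete, here is a toy example with hypothetical softmax outputs for 3 images (not the model's real predictions): `np.argmax` with `axis=1` scans across the columns of each row and returns one column index, i.e., one class label, per image.

```python
import numpy as np

# Hypothetical softmax outputs for 3 images over 10 classes (each row sums to 1)
probs = np.full((3, 10), 0.02)
probs[0, 7] = 0.82
probs[1, 2] = 0.82
probs[2, 1] = 0.82

# axis=1: argmax across the columns of each row -> one label per image
labels = np.argmax(probs, axis=1)
print(labels)   # [7 2 1]
```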


In [None]:
# Return the indices of the maximum values along an axis (axis=1 works across the columns of each row, giving one label per image)

# your code here

### 4.3 Check Model loss and Accuracy  

<font color=red>***Exercise 3: Your Codes Here***</font>  


In [None]:
loss, accuracy = cnn.evaluate(X_test_new, y_test_new)

# Display the loss and accuracy
# your code here

### 4.4 Locate Incorrect Predictions

<font color=red>***Exercise 4: Your Codes Here***</font>  

Please use ``X_test`` and ``y_test`` for visualization here, as ``X_test_new`` is a scaled 4D version of ``X_test`` with values in the range [0, 1], and ``y_test_new`` is a 2D one-hot array.

- **Step 1**: obtain the feature values, actual and predicted class labels for incorrect predictions.

In [None]:
# predicted_labels comes from Exercise 2 (section 4.2)
incorrect = predicted_labels != y_test      # boolean mask: True where the prediction is wrong

X_test2 = X_test[incorrect]
y_test2 = y_test[incorrect]
y_pred2 = predicted_labels[incorrect]

display(X_test2.shape, y_test2.shape, y_pred2.shape)

- **Step 2**: Randomly select 24 misclassified images for visualization.

In [None]:
rng = np.random.RandomState(1)                                  # random seed at 1
# your code here (reference to 1.2)


- **Step 3**: visualize each feature matrix as a heatmap, and show the actual and predicted class labels for comparison (e.g., in the subplot titles). Looking at these digits, you can see why handwritten digit recognition is challenging.


In [None]:
figure, axes = plt.subplots(nrows=4, ncols=6, figsize=(12, 10))

# loop over the axes, image pixels, actual and predicted label
# your code here (reference to 1.2)

## 5. Save and load a model

The `save` method of Keras models (check [here](https://www.tensorflow.org/guide/keras/serialization_and_saving) for details) stores the model's architecture, weights, and training configuration in a single `model.keras` zip archive.



In [None]:
#Save in my Google Drive (mount drive first)
cnn.save("/content/drive/MyDrive/Colab Notebooks/Deep Learning/cnn.keras")

#Alternatively, save the model as a temp file in runtime (which will be gone when this runtime is closed)
#cnn.save('cnn.keras')

Load the model with the `load_model` function from the `tensorflow.keras.models` module. Once the model is loaded, we can use it for prediction or evaluation, or inspect its summary.


In [None]:
from tensorflow.keras.models import load_model

cnn_reloaded = load_model("/content/drive/MyDrive/Colab Notebooks/Deep Learning/cnn.keras")
cnn_reloaded.summary()

#Alternatively, reload the temp file saved in runtime (before closing the runtime)
#cnn_reloaded = load_model('cnn.keras')

**Extension: Data Augmentation**

To reduce overfitting, we can also augment the image data by adding augmentation layers to the model. For details, please visit [this link](https://keras.io/api/layers/preprocessing_layers/image_augmentation/).
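
The Keras augmentation layers apply random transformations on the fly during training. As a library-independent illustration of the idea (not the Keras API), here is a NumPy sketch of one common augmentation, random translation; `random_shift` is a hypothetical helper, and filling uncovered pixels with 0 assumes a black background as in the raw MNIST arrays.

```python
import numpy as np


def random_shift(images, max_shift=2, seed=None):
    """Randomly translate each (N, H, W) image by up to max_shift pixels.

    Hypothetical helper: pixels shifted in from the border are filled with 0.
    """
    rng = np.random.default_rng(seed)
    n, h, w = images.shape
    # pad with zeros so a shifted crop never runs off the edge
    padded = np.pad(images, ((0, 0), (max_shift, max_shift), (max_shift, max_shift)))
    out = np.empty_like(images)
    for i in range(n):
        # independent random offset for each image, in [-max_shift, max_shift]
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        top, left = max_shift + dy, max_shift + dx
        out[i] = padded[i, top:top + h, left:left + w]
    return out


# Usage on two synthetic 28 x 28 "digits" (a white bar on black)
images = np.zeros((2, 28, 28), dtype=np.uint8)
images[:, 10:18, 12:16] = 255
augmented = random_shift(images, max_shift=2, seed=0)
print(augmented.shape, augmented.dtype)
```

Each call yields slightly different versions of the same digits with the same labels, which is what makes augmentation act as a regularizer.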