<a href="https://colab.research.google.com/github/Cherishings/Deep_Learning/blob/main/Lab_3%2C_Part_2%2C_Image_Recognition_in_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 3, Part 2: Image Recognition in Python with Keras-TensorFlow

## 2.1 Introduction

**Note: before starting save a copy of this file in your own Google Drive - go to File and click Save a copy in Drive**

The objective of this part of the lab is to understand how to design and implement a deep convolutional network in Python with Keras-TensorFlow for image recognition, i.e. classifying an image into one of a fixed set of classes.

The dataset used in this lab is the well-known Fashion MNIST dataset, which contains grayscale images of 10 different types of clothing, one class per type.

## 2.2 Methods

#### **Data**

The problem in this lab is one of multiclass classification. The input data $\mathbf{x}$ is an image (written as a vector for notational convenience, i.e. assume the image is flattened into a vector), and the class label output data $y$ takes one of 10 class values, each representing a different type of clothing.

#### **Model**

The model predicts the probability of each class using a deep convolutional network,
$$ \hat{\mathbf{y}} = f\left( \phi(\mathbf{x}) ; \boldsymbol{\theta} \right) $$
where $\hat{\mathbf{y}}$ is the vector of predicted class probabilities, produced by a softmax output layer function $f$. For notational convenience we collect all the model parameters of the deep network in a single vector $\boldsymbol{\theta}$. The function $\phi(\mathbf{x})$ performs feature extraction on the input and is constructed from layers of simple functions, mainly convolutional functions here.

To obtain $\phi(\mathbf{x})$, the model uses chains of convolutional layers, among other types of layer, to extract features from the raw input image. Convolutional layer $l$ is described as
\begin{equation}
    z_{i,j,k}^{(l)} = \sum_{c}\sum_{m}\sum_{n} w_{m,n,c,k}^{(l)} \, x_{i + m,j + n,c}^{(l - 1)} + b_{k}^{(l)} \quad \text{for} \quad k = 1,\ldots,N_{F}
\end{equation}
where $N_{F}$ is the number of filters, $c$ indexes the input channels, $m$ and $n$ index the rows and columns of the filter kernel, and the bias $b_{k}^{(l)}$ is added once per filter. The output of the convolutional layer is passed through a nonlinear activation function
\begin{equation}
    x_{i,j,k}^{(l)} = h\left( z_{i,j,k}^{(l)} \right)
\end{equation}
where the activation function $h(\cdot)$ could be e.g. a rectified linear unit (ReLU).
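
The two equations above can be sketched directly in NumPy. This is only an illustrative sketch, not how Keras implements convolution: it assumes stride 1 and no padding (so the output is smaller than the input), and the function names `conv_layer` and `relu` are made up for this example.

```python
import numpy as np

def conv_layer(x, w, b):
    """Convolutional layer with stride 1 and no padding (assumption).

    x: input of shape (H, W, C)
    w: filters of shape (M, N, C, K), where K is the number of filters N_F
    b: biases of shape (K,)
    """
    H, W, C = x.shape
    M, N, _, K = w.shape
    z = np.zeros((H - M + 1, W - N + 1, K))
    for i in range(H - M + 1):
        for j in range(W - N + 1):
            patch = x[i:i + M, j:j + N, :]  # shape (M, N, C)
            # sum over m, n and c for every filter k, then add the bias once
            z[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return z

def relu(z):
    """Rectified linear unit activation h(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(28, 28, 1))   # one random 28x28 single-channel "image"
w = rng.normal(size=(3, 3, 1, 4))  # four 3x3 filters
b = np.zeros(4)

a = relu(conv_layer(x, w, b))
print(a.shape)  # (26, 26, 4)
```

Without padding, a 3x3 filter shrinks each spatial dimension by 2, which is why the output here is 26x26 rather than 28x28.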

#### **Loss function**

The loss function, $J(\boldsymbol{\theta})$, that we minimise to estimate the model parameters, $\boldsymbol{\theta}$, is the categorical cross-entropy for multiclass classification,
$$ J(\boldsymbol{\theta}) =  - \sum_{j = 1}^{m}{{\sum_{k = 1}^{K} y_{j,k}\log{\hat{y}}_{j,k}}} $$
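where the number of classes $K=10$, $m$ is the number of data samples, $y_{j,k}$ is 1 if sample $j$ belongs to class $k$ and 0 otherwise (a one-hot encoding), and $\hat{y}_{j,k}$ is the predicted probability of class $k$ for sample $j$.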
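
As a small worked example of the loss above, the following NumPy sketch computes softmax probabilities and the summed categorical cross-entropy for two samples and three classes. The logits and labels are made-up illustrative values, the function names are hypothetical, and the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum improves numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Summed cross-entropy J = -sum_j sum_k y_jk * log(yhat_jk)."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

# two samples, three classes (illustrative values only)
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])  # one-hot labels

y_hat = softmax(logits)
J = categorical_cross_entropy(y_true, y_hat)
print(y_hat.sum(axis=1))  # each row of probabilities sums to 1
print(J)
```

Because the labels are one-hot, only the log-probability of the true class contributes for each sample. Note that Keras by default reports this loss averaged over the samples rather than summed, so its values differ from $J$ by a factor of $m$.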


#### **Parameter estimation algorithm**

The parameter estimation algorithm used here is Adam, which combines a momentum-like term, $v_{j}$, with an adaptive learning rate term, $r_{j}$, using bias-corrected versions of these terms, $\hat{v}_j$ and $\hat{r}_j$. The update of the $j$-th parameter is

\begin{equation}
    \theta_{j} \leftarrow \theta_{j} - \frac{\epsilon}{\sqrt{\hat{r}_{j}}} \hat{v}_{j}
\end{equation}

where $\epsilon$ is a learning rate and

\begin{gather}
    \hat{r}_j = \frac{r_{j}}{1- \beta_2^t} \\
    \hat{v}_j = \frac{v_{j}}{1- \beta_1^t} \\
    v_{j} \leftarrow \beta_1 v_{j} + \left( 1 - \beta_1 \right)g_{j} \\
    r_{j} \leftarrow \beta_2 r_{j} + \left( 1 - \beta_2 \right)g_{j}^{2}  
\end{gather}
where $t$ is the time-step of the optimization, $\beta_1$ and $\beta_2$ are the decay rates of the two moving averages, and $g_j$ is the stochastic estimate of the gradient of the loss function with respect to parameter $j$,
\begin{equation}
        g_{j} = \frac{\partial \hat{J}(\boldsymbol{\theta})}{\partial \theta_j}
\end{equation}
and the estimate of the gradient of the loss function, $\nabla_{\boldsymbol{\theta}} \hat{J}(\boldsymbol{\theta})$, is obtained from a mini-batch of data via automatic differentiation.
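
The update equations can be sketched in a few lines of NumPy. This is a minimal sketch rather than the Keras implementation: the function name `adam_step` is hypothetical, the default values of `beta1` and `beta2` are commonly used choices assumed here, and the small constant `delta` in the denominator is an assumption added for numerical stability (it does not appear in the equations above). The toy loss $J(\boldsymbol{\theta}) = \sum_j \theta_j^2$ is used only so that the gradient is known exactly.

```python
import numpy as np

def adam_step(theta, g, v, r, t, lr=0.001, beta1=0.9, beta2=0.999, delta=1e-8):
    """One Adam update for all parameters at time-step t (t starts at 1)."""
    v = beta1 * v + (1 - beta1) * g        # momentum-like moving average of the gradient
    r = beta2 * r + (1 - beta2) * g ** 2   # moving average of the squared gradient
    v_hat = v / (1 - beta1 ** t)           # bias correction of v uses beta1
    r_hat = r / (1 - beta2 ** t)           # bias correction of r uses beta2
    theta = theta - lr * v_hat / (np.sqrt(r_hat) + delta)
    return theta, v, r

# toy problem: minimise J(theta) = sum(theta^2), whose gradient is 2 * theta
theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)
r = np.zeros_like(theta)
for t in range(1, 2001):
    g = 2 * theta
    theta, v, r = adam_step(theta, g, v, r, t, lr=0.01)

print(theta)  # both parameters should have moved close to the minimum at 0
```

At the first step the bias-corrected terms reduce to $\hat{v}_j = g_j$ and $\hat{r}_j = g_j^2$, so every parameter moves by approximately the learning rate in the direction opposite to the sign of its gradient, regardless of the gradient's magnitude.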



## 2.3 Import data

This section imports the standard data set Fashion MNIST from an online repository. It should take just a few seconds to download and unzip.

In [None]:
# import libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization, Activation, Input, Conv2D, MaxPooling2D, Flatten, Softmax
from keras import optimizers, regularizers

# import data
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# print data info
print('Number of training data:' , len(train_images))
print('Number of testing data:' , len(test_images))

# define class labels
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
num_classes = 10 # number of classes

# display example image
plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()

## 2.4 Data pre-processing

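This section performs some basic data pre-processing: the pixel values are normalised to the range 0 to 1, an explicit channel dimension is added to the images, and the class labels are one-hot encoded. The data set is already split into training and testing sets when it is loaded.

This section also displays some example class images from the dataset.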

In [None]:
# normalise image data
train_images = train_images / 255.0
test_images = test_images / 255.0

# display some sample images
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

# These are gray-scale images and therefore they only have a depth of 1, unlike
# a colour image that would have a depth of 3 (RGB values). However, the depth
# is not stored in the array shape, so we need to explicitly add a final
# dimension of size 1 to give the images the shape Keras is expecting
train_img = np.expand_dims(train_images, axis=-1)
test_img = np.expand_dims(test_images, axis=-1)
print(train_img[19].shape)
print(test_img[11].shape)

# one-hot encode the training and test labels
train_labels = tf.keras.utils.to_categorical(train_labels, num_classes)
test_labels = tf.keras.utils.to_categorical(test_labels, num_classes)
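
To see what these pre-processing steps do to the array shapes, here is a minimal NumPy-only sketch on three synthetic blank images (made-up data, not Fashion MNIST). The one-hot encoding is written with `np.eye` as a stand-in assumed to give the same values as `to_categorical` for integer labels.

```python
import numpy as np

# three synthetic 28x28 "images" with integer pixel values 0-255 (made-up data)
images = np.zeros((3, 28, 28), dtype=np.uint8)
images[0, 0, 0] = 255
labels = np.array([9, 0, 3])  # integer class labels
num_classes = 10

scaled = images / 255.0                          # pixel values now lie in [0, 1]
with_channel = np.expand_dims(scaled, axis=-1)   # add channel axis: (3, 28, 28, 1)
one_hot = np.eye(num_classes)[labels]            # one row per sample, a single 1 per row

print(with_channel.shape)  # (3, 28, 28, 1)
print(one_hot.shape)       # (3, 10)
print(one_hot[0])          # 1 in position 9, zeros elsewhere
```

The one-hot rows correspond to the $y_{j,k}$ values in the categorical cross-entropy loss, and taking `np.argmax` along the last axis recovers the original integer labels.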

## 2.5 Model design

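The network design has two convolutional blocks, each of the form

Conv --> Batch norm --> ReLU

with a max pooling layer between the two blocks for dimension reduction. The output of the second block is flattened and passed to a dense layer with a softmax output.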

In the quiz, you will be prompted to change the number of filters, but it is set low to start with at num_filters = 4.

In [None]:
# number of convolutional filters
num_filters = 4

# define model
model = Sequential()

model.add(Input(shape=(28,28,1)))  # image input size is 28x28x1

model.add(Conv2D(num_filters, kernel_size =(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size =(2, 2), strides=(2, 2), padding= 'same'))

model.add(Conv2D(num_filters, kernel_size =(3, 3), padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))

model.add(Flatten())

model.add(Dense(num_classes))
model.add(Softmax())

# set the optimization options and compile the model
opt = optimizers.Adam(learning_rate=0.001)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# print out the model summary
model.summary()
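
The parameter counts reported in the model summary can be cross-checked by hand. The sketch below is plain Python with hypothetical helper names, and it assumes the layer settings defined above (3x3 kernels, 'same' padding, one 2x2 max pooling layer with stride 2, and batch normalisation with four values per channel, two of which are not trainable).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def batchnorm_params(channels):
    """Scale and shift (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels

def dense_params(inputs, units):
    """Weights plus one bias per output unit."""
    return inputs * units + units

def count_params(num_filters=4, num_classes=10, image_size=28):
    pooled = -(-image_size // 2)           # ceiling division: 28 -> 14 after pooling
    flat = pooled * pooled * num_filters   # length of the flattened feature vector
    total = (conv2d_params(3, 3, 1, num_filters)
             + batchnorm_params(num_filters)
             + conv2d_params(3, 3, num_filters, num_filters)
             + batchnorm_params(num_filters)
             + dense_params(flat, num_classes))
    non_trainable = 2 * 2 * num_filters    # moving statistics of the two batch norm layers
    return total, total - non_trainable

total, trainable = count_params(num_filters=4)
print(total, trainable)  # 8070 8054
```

Almost all of the parameters sit in the final dense layer (7850 of 8070 when `num_filters = 4`), so increasing the number of filters grows the model mainly through the size of the flattened feature vector.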

## 2.6 Train the model

This section trains the deep convolutional network using the Adam algorithm.

In [None]:
history = model.fit(train_img, train_labels, batch_size=64, epochs=10, validation_split=0.1)
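
The number of parameter updates performed by this call follows from the arguments. As a small worked calculation, assuming the standard Fashion MNIST training set of 60000 images: `validation_split=0.1` holds out the last 10% of the training data for validation, and the remainder is processed in mini-batches of 64.

```python
import math

num_samples = 60000       # size of the Fashion MNIST training set
validation_split = 0.1
batch_size = 64
epochs = 10

num_val = round(num_samples * validation_split)       # samples held out for validation
num_train = num_samples - num_val                     # samples used for gradient updates
steps_per_epoch = math.ceil(num_train / batch_size)   # the last mini-batch may be smaller
total_updates = steps_per_epoch * epochs

print(num_train, num_val, steps_per_epoch, total_updates)  # 54000 6000 844 8440
```

Each of these steps is one Adam update computed from one mini-batch, so $t$ in the bias-correction terms reaches 8440 by the end of training.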

## 2.7 Plot accuracy and loss over training epochs

Plot accuracy and loss over training epochs (for both training data and validation data) - it is important to monitor convergence of the algorithm via these plots to assess whether the parameter estimation has converged.

In [None]:
# list all data in history
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()



## 2.8 Print out the accuracy on the independent Test data set


In [None]:
# print out the accuracy on independent test data
score = model.evaluate(test_img, test_labels, verbose=0)
print("Test accuracy:", score[1])

## 2.9 Confusion matrix

Calculate and display the confusion matrix for this problem - the confusion matrix is important to inspect because it gives more insight into classifier performance across all classes than simply inspecting accuracy, which obscures detail.

In [None]:
# obtain model predictions and convert softmax outputs 0-1 to integer class label predictions
Yhat = model.predict(test_img)                    # predict model outputs on test data as softmax outputs of probability of each class
Yhat_integer = np.argmax(Yhat, axis=1)            # obtain the most likely class prediction as the argument of the max softmax output
Y_test_integer = np.argmax(test_labels, axis=1)   # obtain the true class as an integer

# calculate and plot confusion matrix
cm = confusion_matrix(Y_test_integer, Yhat_integer, normalize="pred")     # confusion matrix, each predicted-class column normalised to sum to 1
plt.figure(2).set_figwidth(15)                                            # setup new figure
sns.heatmap(cm/np.sum(cm), annot=True, fmt=".2%", cmap="Blues")           # rescale so the whole matrix sums to 100% and plot using the sns package
plt.title("Confusion Matrix", fontsize = 12)                              # title
plt.xlabel("Predicted Class", fontsize = 12)                              # xlabel
plt.ylabel("True Class", fontsize = 12)                                   # ylabel
plt.show()                                                                # show plot

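Note that each column of the confusion matrix (predicted class) is normalised and the matrix is then rescaled so that all of its entries sum to 100%. With 10 classes, the maximum value of any single entry is therefore 10%, not 100%. A diagonal entry of 10% is the best possible result for a class, and the 10 diagonal entries of a perfect classifier add up to 100%.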
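
The diagonal of a confusion matrix relates directly to per-class precision and recall. The following is a minimal sketch on a small made-up 3-class matrix of raw counts (rows are true classes, columns are predicted classes); the numbers are illustrative only and are not results from this lab.

```python
import numpy as np

# made-up confusion matrix of counts: rows = true class, columns = predicted class
cm_counts = np.array([[9, 1, 0],
                      [2, 7, 1],
                      [0, 2, 8]])

diagonal = np.diag(cm_counts)                 # correctly classified samples per class
recall = diagonal / cm_counts.sum(axis=1)     # fraction of each true class that was found
precision = diagonal / cm_counts.sum(axis=0)  # fraction of each predicted class that was correct
accuracy = diagonal.sum() / cm_counts.sum()   # overall fraction classified correctly

print("recall:   ", recall)
print("precision:", precision)
print("accuracy: ", accuracy)
```

With `normalize="pred"`, as used in the cell above, each diagonal entry before rescaling is the precision of that class, so dividing by the number of classes gives the 10% maximum. Using `normalize="true"` instead would put the per-class recall on the diagonal.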

## End of part 2 of the lab