## Introduction to Convolutional Neural Networks

- Basic concepts behind `Convolutional Neural Networks` (CNNs), a type of artificial neural network used in image recognition & processing that is specifically designed to **process pixel data**


- **Part One**: Learn to implement a Convolutional Neural Network classifier with `TensorFlow 2.0` & `Keras`


- **Part Two**: Evaluate the CNN after it was trained


#### Reference

- **Conv2D** layers provide magnifier-like operations in two dimensions, ie, they slide a small 2D window (the kernel) over a larger 2D array (the image itself)


- **Dense** layers are used later on for generating predictions, ie classifications


- **Max pooling** layers are (typically) added after a Conv2D layer to downsample its output. They also slide a window over the data, but select only the maximum value in each window for further propagation


- **Flatten** connects the convolutional parts of the model with the dense parts. Dense layers can only handle flat data, ie one-dimensional data, whereas the output of convolutional layers is multi-dimensional. Flatten takes all dimensions & concatenates them into a single vector


- **Dropout** introduces a small amount of random noise during training by randomly setting a fraction of a layer's outputs to zero. Essentially, it breaks a bit of the magnifier in an effort to improve model performance & reduce overfitting. This way, unusual images can potentially be classified correctly
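
To make the sliding-window idea concrete, here is a minimal NumPy sketch of a single-channel convolution followed by 2×2 max pooling. It is an illustration only, not how Keras implements these layers; the helper names `conv2d_valid` and `max_pool2d`, the toy 6×6 image and the averaging kernel are all assumptions made for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size-by-size block."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=np.float32).reshape(6, 6)  # toy 6x6 "image"
kernel = np.ones((3, 3), dtype=np.float32) / 9.0        # hypothetical averaging kernel

features = conv2d_valid(image, kernel)  # 3x3 window over 6x6 input gives 4x4
pooled = max_pool2d(features)           # 2x2 pooling over 4x4 gives 2x2
print(features.shape, pooled.shape)
```

A real Conv2D layer learns the kernel values during training & applies many kernels at once; the sketch fixes one kernel by hand purely to show the mechanics.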

### 1. Model dependencies

In [1]:
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D

## MNIST dataset
from tensorflow.keras.datasets import mnist

### 2. Model configuration steps

- 1. MNIST images are 28×28 pixels, thus we set both `image_width` & `image_height` to 28


- 2. A batch size of 250 samples will be used


- 3. The number of epochs will be 25. That is, data is fed to the NN 25 times (in batches of 250 samples)


- 4. The number of classes will be 10 (ie, 0 to 9)


- 5. 20% of the training data will be used for validation during optimisation


- 6. To view as much output as possible, configure training to be `verbose`

In [2]:
## Step 1.
image_width, image_height = 28, 28

In [3]:
## Step 2. 
batch_size = 250

In [4]:
## Step 3. 
number_epochs = 25

In [5]:
## Step 4. 
number_class = 10

In [6]:
## Step 5.
validation_split = 0.2

In [7]:
## Step 6.
verbosity = 1
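
As a quick sanity check on the configuration above, the arithmetic below works out how many samples are actually used for fitting & how many batches make up one epoch. It assumes the standard 60,000-image MNIST training set; the variable names `train_samples`, `fit_samples`, `val_samples` & `steps_per_epoch` are introduced here for illustration only.

```python
import math

train_samples = 60000     # size of the MNIST training set
validation_split = 0.2    # same value as Step 5
batch_size = 250          # same value as Step 2

val_samples = round(train_samples * validation_split)   # held out for validation
fit_samples = train_samples - val_samples                # used to update weights
steps_per_epoch = math.ceil(fit_samples / batch_size)    # batches per epoch

print(fit_samples, val_samples, steps_per_epoch)
```

With these settings each epoch should consist of 192 weight updates over 48,000 images, with the remaining 12,000 images used only to compute validation metrics.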

### 3. Loading & preparing MNIST data

In [8]:
## Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()

### 4. Reshape Feature Vectors

- 1. Reshape INPUT data (ie, Feature Vectors)


- 2. Cast the numbers to `float32`. This balances the trade-off between memory use & number precision


- 3. Scale the pixel values into the [0, 1] range
    - MNIST images are already `greyscale`, with pixel values from 0 to 255; dividing the image samples by 255 normalises them
    - Our only interest is in the actual number itself & not the colour of the number

In [9]:
print(f"> Old shape: {input_train.shape}")

## Input Train
input_train = input_train.reshape(input_train.shape[0], image_width, image_height, 1)
print(f"> New shape: {input_train.shape}")

> Old shape: (60000, 28, 28)
> New shape: (60000, 28, 28, 1)


In [10]:
## Input Test
print(f"> Old shape: {input_test.shape}")

## Add the channel dimension to the test inputs
input_test = input_test.reshape(input_test.shape[0], image_width, image_height, 1)
print(f"> New shape: {input_test.shape}")

> Old shape: (10000, 28, 28)
> New shape: (10000, 28, 28, 1)


In [11]:
## Input Shape
input_shape = (image_width, image_height, 1)
input_shape

(28, 28, 1)

In [12]:
## Step 2. 
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

In [13]:
## Step 3. 
input_train = input_train / 255
input_test = input_test / 255
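
The effect of Steps 2 & 3 can be checked on a tiny array. This is a minimal sketch using made-up pixel values rather than the MNIST data itself; `pixels` & `scaled` are names chosen for the example.

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)  # toy pixel intensities

scaled = pixels.astype('float32') / 255  # cast first, then scale into [0, 1]

print(scaled.dtype, scaled.min(), scaled.max())
print(pixels.itemsize, scaled.itemsize)  # bytes per value before & after
```

Each value grows from 1 byte to 4 bytes, which is the memory side of the trade-off mentioned in Step 2.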

### 5. Convert Target Vectors

- Convert Target Vectors to Categorical Targets, ie from integer labels (0 to 9) into one-hot encoded vectors of length 10

In [14]:
## Before conversion
target_train

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [15]:
target_train = tensorflow.keras.utils.to_categorical(target_train, number_class)

In [16]:
## After conversion
target_train

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]], dtype=float32)

In [17]:
## Before conversion
target_test

array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)

In [18]:
target_test = tensorflow.keras.utils.to_categorical(target_test, number_class)

In [19]:
## After conversion
target_test

array([[0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
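
For intuition, the same kind of one-hot encoding can be reproduced with plain NumPy. This is a sketch of the idea, not the Keras implementation; it uses the first three training labels shown earlier (5, 0, 4), & the names `labels`, `one_hot` & `recovered` are assumptions for the example.

```python
import numpy as np

labels = np.array([5, 0, 4], dtype=np.uint8)  # first three training targets

# Row k of the 10x10 identity matrix is the one-hot vector for class k
one_hot = np.eye(10, dtype=np.float32)[labels]
print(one_hot)

# argmax reverses the encoding & recovers the integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```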

### 6. Model Architecture
#### Steps

- 1. Create the model itself


- 2. Begin with a 2D convolutional layer
    - it learns 32 filters based on the data, each producing one feature map
    - the kernel, ie the small window that slides over the image, is 3x3 pixels
    - activation is 'relu' for nonlinearity
    - input shape will be (28, 28, 1)


- 3. This is followed with a MaxPooling2D layer with a pool size of 2x2
    - this window slides over the 32 feature maps (see Step 2)
    - for each slide it will take the maximum value & pass it on to the next layer
    
    
- 4. Add Dropout to introduce random noise during training in an effort to reduce potential overfitting


- 5. Repeat the 2D convolutional layer process
    - this time with 64 filters (feature maps)
    
    
- 6. Repeat the MaxPooling2D layer


- 7. Repeat Dropout 


- 8. Convert filters learnt & processed into a flat structure (before predictions can be generated)


- 9. Finally, allow data to be passed through two Dense layers
    - the first which will be 'relu' activated
    - the second will be 'softmax' activated to generate a multiclass probability distribution, ie, it computes the probability that the 'item' belongs to each of the classes (0 to 9)
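
To illustrate what the 'softmax' activation in Step 9 does, here is a minimal NumPy sketch that turns ten raw scores into a probability distribution. The `softmax` helper & the scores in `logits` are made up for the example & are not taken from the trained model.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores for the classes 0 to 9
logits = np.array([1.0, 2.0, 0.5, 4.0, 0.0, -1.0, 0.2, 3.0, 1.5, 0.3])

probs = softmax(logits)
print(probs.round(3))
print(probs.sum(), probs.argmax())  # the highest score gets the highest probability
```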

In [20]:
## Step 1. 
model = Sequential()



In [21]:
## Step 2.
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))

In [22]:
## Step 3. 
model.add(MaxPooling2D(pool_size=(2, 2)))

In [23]:
## Step 4.
model.add(Dropout(0.25))

In [24]:
## Step 5. 
## input_shape is only needed on the first layer of the model
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))

In [25]:
## Step 6. 
model.add(MaxPooling2D(pool_size=(2, 2)))

In [26]:
## Step 7. 
model.add(Dropout(0.25))

In [27]:
## Step 8. 
model.add(Flatten())

In [28]:
## Step 9. 
model.add(Dense(256, activation='relu'))
model.add(Dense(number_class, activation='softmax'))
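
The shapes & parameter counts produced by the layers added above can be traced by hand. The sketch below assumes the Keras defaults (no padding & a stride of 1 for Conv2D, non-overlapping windows for MaxPooling2D); the helper names `conv_out` & `pool_out` are hypothetical. Calling `model.summary()` on the model built above is the usual way to cross-check such figures.

```python
def conv_out(size, kernel):
    """Output width/height of a convolution with no padding & stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping pooling (remainder dropped)."""
    return size // pool

size = 28
size = conv_out(size, 3)  # after the first Conv2D
size = pool_out(size, 2)  # after the first MaxPooling2D
size = conv_out(size, 3)  # after the second Conv2D
size = pool_out(size, 2)  # after the second MaxPooling2D
flat = size * size * 64   # length of the vector produced by Flatten

# weights + biases per trainable layer (Dropout & pooling have none)
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": flat * 256 + 256,
    "dense2": 256 * 10 + 10,
}
total = sum(params.values())
print(size, flat, params, total)
```

Under these assumptions the first Dense layer holds the vast majority of the weights, which is typical for small CNNs of this shape.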

### 7. Compile model

- Compiling configures the model created above for training


- Here the loss function, optimizer & additional metrics that are used during the `Training` process are defined


- The loss function is used to compute the difference between the actual targets & the targets **generated by the model** during an epoch


- The higher the difference, ie the higher the loss, the worse the model performs


- The overall goal of the ML training process is to **MINIMISE LOSS**


- Given that this is a Classification task, the function `cross entropy` will be used


- This compares the actual outcomes with the **generated outcomes**: the loss is small when the model assigns a high probability to the correct class & grows as that probability falls


- Given that data is categorical - `categorical_crossentropy` is used


- `Adaptive Moment Estimation` or `Adam` is selected for optimisation (a widely used default optimiser)


- For the sake of being more intuitive to humans, accuracy is used as a metric

In [29]:
model.compile(loss = tensorflow.keras.losses.categorical_crossentropy,
              optimizer = tensorflow.keras.optimizers.Adam(),
              metrics = ['accuracy'])
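
To show how categorical cross-entropy rewards confident correct predictions, here is a small NumPy sketch. It is a simplified stand-in for the Keras loss, using three classes instead of ten & made-up predictions; the function below & its `eps` clipping value are assumptions for the example.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: minus the log of the probability given to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Both samples belong to the last class (one-hot targets)
y_true = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0]])

# Hypothetical model outputs: confident & right, then confident & wrong
y_pred = np.array([[0.05, 0.05, 0.90],
                   [0.60, 0.30, 0.10]])

losses = categorical_crossentropy(y_true, y_pred)
print(losses)         # the second loss is much larger than the first
print(losses.mean())  # averaging over samples gives the batch loss
```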

### 8. Train model

- ie, fit training data (both inputs & targets) to the model to initiate the training process 

In [30]:
model.fit(input_train, 
          target_train,
          batch_size = batch_size,
          epochs = number_epochs,
          verbose = verbosity,
          validation_split = validation_split)



Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x7f7a7a442fa0>

### 9. Adding test metrics for testing generalisation

- Once the model is trained using both `Training` & `Validation` data, `Test` data can be passed to the model in order to evaluate the model’s predictive performance on images it has never seen


- This is executed after the Training process

In [35]:
## Evaluate
score = model.evaluate(input_test, target_test, verbose=0)

In [36]:
## Report test loss & accuracy
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')

Test loss: 0.02900446392595768 / Test accuracy: 0.9926999807357788
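
With one-hot targets, the accuracy metric amounts to comparing the most probable predicted class with the true class for each sample. The sketch below reproduces that calculation in NumPy on four made-up predictions over three classes; none of these numbers come from the model above.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 samples & 3 classes
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])

# Matching one-hot targets (true classes 1, 0, 1, 1)
targets = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [0, 1, 0],
                    [0, 1, 0]], dtype=np.float32)

predicted = probs.argmax(axis=1)  # most probable class per sample
actual = targets.argmax(axis=1)   # true class per sample
accuracy = np.mean(predicted == actual)

print(predicted, actual, accuracy)
```

Here three of the four samples are classified correctly, giving an accuracy of 0.75.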


### 10. Interpreting model performance

- In 25 epochs, the model has achieved a validation accuracy of approximately 99.76%, ie the model correctly classified over 99% of the validation inputs passed to the network (CNN)


- The model also shows a similar performance in the generalisation test (see 9), reaching an accuracy of 99.27% on the test data


- Note that the model loss on the test data (approximately 0.029) is better, ie lower, than the loss during training