# Exercise 3 - Convolutional Neural Networks for Image Analysis

In this workshop we will use a CNN to classify images in the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset.

We will classify the fashion products in Fashion MNIST into 10 classes. The dataset contains 70,000 grayscale images, each with a resolution of 28x28 pixels.  
We will use 60,000 images to train the classification CNN and 10,000 images to evaluate the accuracy of the network.

## Load the data

Generally, we have to download a dataset from a repository and load it into our application ourselves. However, the Keras Python library can load a small number of benchmark datasets directly.  
Thus, we can access and load the Fashion-MNIST dataset directly from Keras.

In [None]:
# Select TensorFlow 1.x (Colab magic command) and import the dataset loader
%tensorflow_version 1.x
from keras.datasets import fashion_mnist

The dataset is already split into training and testing sets (60K images for training and 10K for testing).

In [None]:
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

In [None]:
# View the dataset shape
print('Train data shape:', X_train.shape)
print('Test data shape:', X_test.shape)

X_train contains 60,000 training images with a resolution of 28x28.  
The *labels* are an array of integers, ranging from 0 to 9. These correspond to the *class* of clothing the image represents:

<table>
  <tr>
    <th>Label</th>
    <th>Class</th> 
  </tr>
  <tr>
    <td>0</td>
    <td>T-shirt/top</td> 
  </tr>
  <tr>
    <td>1</td>
    <td>Trouser</td> 
  </tr>
    <tr>
    <td>2</td>
    <td>Pullover</td> 
  </tr>
    <tr>
    <td>3</td>
    <td>Dress</td> 
  </tr>
    <tr>
    <td>4</td>
    <td>Coat</td> 
  </tr>
    <tr>
    <td>5</td>
    <td>Sandal</td> 
  </tr>
    <tr>
    <td>6</td>
    <td>Shirt</td> 
  </tr>
    <tr>
    <td>7</td>
    <td>Sneaker</td> 
  </tr>
    <tr>
    <td>8</td>
    <td>Bag</td> 
  </tr>
    <tr>
    <td>9</td>
    <td>Ankle boot</td> 
  </tr>
</table>

Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:

In [None]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

## Explore and pre-process the data

In [None]:
# Load visualization library
import matplotlib.pyplot as plt

We will plot an image from the dataset to get a feel for the data.

In [None]:
image_index = 1
plt.figure()
plt.imshow(X_train[image_index])
plt.colorbar()
plt.grid(False)
plt.show()

Display the first 25 images from the training set and display the class name below each image.   
It is important to verify that the data is in the correct format before building the model.

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(X_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()

A key step in image pre-processing is scaling the data. Before scaling, we inspect the range of the pixel values.

In [None]:
# Import numpy library to process matrices
import numpy as np

In [None]:
print('min:', np.min(X_train[image_index]))
print('max:', np.max(X_train[image_index]))

The pixel values range from 0 to 255.  
We scale these values to a range of 0 to 1 before feeding to the neural network model.  
For this, we divide the values by 255.   
It's important that the training set and the testing set are preprocessed in the same way:

In [None]:
X_train = X_train / 255.0
X_test = X_test / 255.0
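To see what the division does, the following minimal sketch applies the same scaling to a tiny synthetic array. `demo_pixels` is a made-up stand-in for an image and is not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for an 8-bit grayscale image (values 0..255)
demo_pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Dividing by a float converts the integer array to floating point
demo_scaled = demo_pixels / 255.0

print(demo_scaled.dtype)                     # float64
print(demo_scaled.min(), demo_scaled.max())  # 0.0 1.0
```

The integer pixel values become floating-point values between 0 and 1, which is the range the network will be trained on.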

Prepare for CNN processing by reshaping the images into (number of training/testing samples, height, width, 1), where the final 1 is the single grayscale channel.

In [None]:
height, width = 28, 28

In [None]:
X_train = X_train.reshape(X_train.shape[0], height, width, 1)
X_test = X_test.reshape(X_test.shape[0], height, width, 1)
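The reshape only appends a channel axis; no pixel values change. A small sketch on a hypothetical batch of blank images (`demo_batch` is an illustrative name) shows the effect, along with `np.expand_dims` as an equivalent alternative:

```python
import numpy as np

# Hypothetical batch of 5 blank 28x28 grayscale images
demo_batch = np.zeros((5, 28, 28))

# Append a channel axis of size 1: (samples, height, width, channels)
demo_reshaped = demo_batch.reshape(demo_batch.shape[0], 28, 28, 1)

# np.expand_dims gives the same shape without repeating the sizes
demo_expanded = np.expand_dims(demo_batch, axis=-1)

print(demo_reshaped.shape)  # (5, 28, 28, 1)
print(demo_expanded.shape)  # (5, 28, 28, 1)
```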

## Enable GPU

CPUs are designed for general-purpose computing workloads. GPUs, in contrast, are less flexible but are designed for parallel computation. Deep Neural Networks (DNNs) are structured very uniformly: at each layer of the network, thousands of identical artificial neurons perform the same computation. The structure of a DNN therefore fits well with the kind of computation that a GPU performs efficiently.

To enable the GPU in Google Colab:


1. Click the Runtime menu on the toolbar.
2. Select Change runtime type.
3. Select GPU from the Hardware accelerator option list.
4. Select SAVE.



## Model Building

In this workshop, we will develop 3 deep learning models with increasing complexity to evaluate the classification accuracy.  


1.   3 Layer DNN
2.   1 Layer CNN
3.   3 Layer CNN

### 3-DNN Model

In this DNN model, we flatten the 28x28 image into a 784 input feature vector.  
Note that by flattening the image we will face the inherent problems discussed in class (i.e., scale, rotation, location, and high dimensionality).
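As a rough illustration of why flattening discards location information, the sketch below flattens a synthetic image whose pixel values equal their own flat index. The names `demo_image` and `demo_vector` are illustrative only.

```python
import numpy as np

demo_h, demo_w = 28, 28

# Hypothetical image whose pixel values are their own flat index
demo_image = np.arange(demo_h * demo_w).reshape(demo_h, demo_w)
demo_vector = demo_image.flatten()

print(demo_vector.shape)  # (784,)

# Pixel (row, col) ends up at index row * width + col, so vertical
# neighbours in the image are 28 positions apart in the vector
row, col = 10, 5
print(demo_vector[row * demo_w + col] == demo_image[row, col])  # True
print(demo_image[row + 1, col] - demo_image[row, col])          # 28
```

A dense layer treats each of the 784 entries as an independent feature, so it has no built-in notion that two entries 28 positions apart were adjacent pixels.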

In [None]:
# Import Keras libraries
from keras.models import Sequential
from keras.layers import Dense, Flatten

In [None]:
dnn3_model = Sequential()

In [None]:
dnn3_model.add(Flatten(input_shape=(height, width, 1)))  # Add a Keras Flatten layer to convert each (28x28) image -> 784 feature vector

In [None]:
dnn3_model.add(Dense(128, activation='relu')) 
dnn3_model.add(Dense(64, activation='relu')) 
dnn3_model.add(Dense(10, activation='softmax'))  # In the final layer we use softmax activation to classify the input into 10 classes.

Softmax is an activation function that turns logits into probabilities that sum to one. For example:

---


 
![alt text](https://engmrk.com/wp-content/uploads/2018/05/Fig1-3.jpg)
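The computation can also be sketched in a few lines of NumPy. This is a minimal illustration with made-up logits, not the Keras implementation; the function name `softmax` and the values in `demo_logits` are assumptions chosen for the example.

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D array of logits into probabilities."""
    # Subtracting the max does not change the result but avoids overflow
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

demo_logits = np.array([2.0, 1.0, 0.1])  # illustrative values
demo_probs = softmax(demo_logits)

print(demo_probs)        # three probabilities, largest for the first logit
print(demo_probs.sum())  # sums to 1 (up to floating-point rounding)
```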

For a complete understanding of the available activation functions, please refer to the following resources:  


1.   [ML Cheatsheet](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html)
2.   [Towards data science](https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6)




**Compiling the model**

Before the model is ready for training, it needs a few more settings. These are added during the model's compile step:  

* Loss function: This measures how far the model's predictions are from the true labels during training. We want to minimize this function to "steer" the model in the right direction. [Further details](https://keras.io/losses/) 
* Optimizer: This is how the model is updated based on the data it sees and its loss function. [Further details.](https://keras.io/optimizers/)
* Metrics: Used to monitor the training and testing steps. The following example uses accuracy, the fraction of the images that are correctly classified. [Further details.](https://keras.io/metrics/) 

In [None]:
dnn3_model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
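To make the loss concrete, here is a minimal NumPy sketch of what sparse categorical cross-entropy computes: the mean negative log of the probability assigned to the true class. The probabilities and labels are invented for illustration, and the sketch omits the small clipping Keras applies for numerical stability.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    # Pick, for each row, the probability of its integer label
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Illustrative softmax outputs for 2 samples and 3 classes
demo_probs = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.1, 0.8]])
demo_labels = np.array([0, 2])  # integer labels, not one-hot vectors

demo_loss = sparse_categorical_crossentropy(demo_probs, demo_labels)
print(demo_loss)
```

The "sparse" variant is the appropriate choice here because `y_train` holds integer class labels (0 to 9) rather than one-hot vectors.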

Initially, we will train the model with 10 learning epochs. Based on the learning curve, visualized below, you could update this hyper-parameter to best fit the training data.

In [None]:
# Train the model
epochs = 10
batch_size = 32
validation_split = 0.1
history = dnn3_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split = validation_split)
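The effect of these settings on the amount of data per epoch can be worked out by hand. The sketch below assumes the 60,000-image training set described earlier; with `validation_split`, Keras holds out the last fraction of the samples for validation.

```python
import math

n_samples = 60000        # size of the Fashion MNIST training set
validation_split = 0.1
batch_size = 32

# Samples held out for validation and samples left for training
n_val = int(n_samples * validation_split)
n_train = n_samples - n_val

# Number of batches (weight updates) per epoch
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 54000 6000 1688
```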

In [None]:
# Visualize the training curve
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

In [None]:
# Visualize accuracy (fraction of correctly classified images)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

In [None]:
test_loss, test_acc = dnn3_model.evaluate(X_test, y_test)
print('Test accuracy:', test_acc)
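The accuracy reported above is the fraction of images whose most probable class matches the label. A small sketch with invented softmax outputs (`demo_probs` and `demo_labels` are illustrative, not real model output) shows the calculation:

```python
import numpy as np

# Illustrative softmax outputs for 4 samples and 3 classes
demo_probs = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.5, 0.3],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1]])
demo_labels = np.array([0, 1, 1, 0])

# The predicted class is the index of the highest probability
demo_pred = np.argmax(demo_probs, axis=1)

# Accuracy is the fraction of predictions that match the labels
demo_acc = np.mean(demo_pred == demo_labels)
print(demo_pred, demo_acc)  # [0 1 2 0] 0.75
```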

### 1-CNN

As discussed in the lecture, a CNN is specifically designed to process two-dimensional data (such as images).  
First, we will design a single-layer CNN. This will include one convolutional layer, one max pooling layer and two fully connected layers.

In [None]:
# Import required libraries
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

In [None]:
# Set input shape
input_shape = (width, height, 1)

Construct the CNN model.

In [None]:
cnn1_model = Sequential()

Here we will use 3x3 filters (or kernels) to learn.

In [None]:
cnn1_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))   # We will use 32 kernels of size 3x3 to learn
cnn1_model.add(MaxPooling2D(pool_size=(2, 2)))   # We use 2x2 max pooling
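To illustrate what these two layers do to the image size, the following NumPy sketch applies a single random 3x3 kernel without padding, followed by ReLU and 2x2 max pooling. It is a simplified stand-in for the Keras layers (one kernel, no bias, untrained weights); the helper names `conv2d_valid` and `max_pool_2x2` are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D image without padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h * 2, :w * 2]
    return trimmed.reshape(h, 2, w, 2).max(axis=(1, 3))

demo_image = np.random.rand(28, 28)  # hypothetical input image
demo_kernel = np.random.rand(3, 3)   # one untrained 3x3 kernel

demo_conv = np.maximum(conv2d_valid(demo_image, demo_kernel), 0)  # ReLU
demo_pooled = max_pool_2x2(demo_conv)

print(demo_conv.shape)    # (26, 26)
print(demo_pooled.shape)  # (13, 13)
```

The layer in the model applies 32 such kernels, so its output after pooling has shape (13, 13, 32) per image.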

Flatten the previous output before sending it to the fully connected layers.

In [None]:
cnn1_model.add(Flatten())

We will use one fully connected (FC) layer with 64 nodes, followed by the 10-node softmax output layer.

In [None]:
cnn1_model.add(Dense(64, activation='relu'))
cnn1_model.add(Dense(10, activation='softmax'))

In [None]:
# Compile the model
cnn1_model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

As before, initially we will train the model for 10 epochs to evaluate the performance.  
Then, based on the accuracy/learning curve we will refine the hyper-parameters.

In [None]:
# Train the model
epochs = 10
batch_size = 32
validation_split = 0.1
history_1 = cnn1_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split = validation_split)

In [None]:
cnn1_score = cnn1_model.evaluate(X_test, y_test)
print('Test loss:', cnn1_score[0])
print('Test accuracy:', cnn1_score[1])

In [None]:
# Visualize the training curve
plt.plot(history_1.history['loss'])
plt.plot(history_1.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

By analyzing the learning curve, you should be able to identify whether training the model longer would provide better results.
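One simple way to read the curve numerically is to find the epoch with the lowest validation loss. The sketch below uses invented loss values purely for illustration (they are not results from this model); with a real run, the list would come from `history_1.history['val_loss']`.

```python
import numpy as np

# Hypothetical validation losses, one per epoch (not real results)
demo_val_loss = [0.45, 0.38, 0.35, 0.34, 0.36, 0.39]

# Epoch (1-based) at which the validation loss was lowest
best_epoch = int(np.argmin(demo_val_loss)) + 1
print(best_epoch)  # 4

# If the minimum is at the last epoch, the curve may still be falling
still_improving = best_epoch == len(demo_val_loss)
print(still_improving)  # False
```

If the lowest validation loss occurs at the final epoch, training longer may help; if it occurs earlier while the training loss keeps falling, the model is likely overfitting.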

### 3-CNN

In this step, we will design a more complex three-layer CNN, aiming to improve the accuracy.

In [None]:
# Import required libraries
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

In [None]:
# Set input shape
input_shape = (width, height, 1)

In [None]:
cnn3_model = Sequential()

In [None]:
# CNN Layer 1
cnn3_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
cnn3_model.add(MaxPooling2D((2, 2)))

# CNN Layer 2
cnn3_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
cnn3_model.add(MaxPooling2D(pool_size=(2, 2)))

# CNN Layer 3
cnn3_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))

In [None]:
cnn3_model.add(Flatten())  # Flatten the previous output before sending to fully connected layers

In [None]:
cnn3_model.add(Dense(128, activation='relu'))
cnn3_model.add(Dense(10, activation='softmax'))
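It can help to trace how the feature-map size and parameter count evolve through this architecture. The arithmetic below is a sketch that assumes the Keras defaults used above (no padding, stride 1 for convolutions, non-overlapping pooling); the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output size of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of non-overlapping pooling (remainder is dropped)."""
    return size // pool

size = 28
size = pool_out(conv_out(size))  # layer 1: 28 -> 26 -> 13
size = pool_out(conv_out(size))  # layer 2: 13 -> 11 -> 5
size = conv_out(size)            # layer 3: 5 -> 3
flat = size * size * 128         # 3 * 3 * 128 values after Flatten

# Parameters = weights + one bias per filter or unit
params = [
    32 * (3 * 3 * 1 + 1),    # Conv2D(32)
    64 * (3 * 3 * 32 + 1),   # Conv2D(64)
    128 * (3 * 3 * 64 + 1),  # Conv2D(128)
    flat * 128 + 128,        # Dense(128)
    128 * 10 + 10,           # Dense(10)
]
print(size, flat, sum(params))  # 3 1152 241546
```

Calling `cnn3_model.summary()` prints the per-layer output shapes and parameter counts, which can be compared against these numbers.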

In [None]:
# Compile the model
cnn3_model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# Train the model
epochs = 10
batch_size = 32
validation_split = 0.1
history_3 = cnn3_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split = validation_split)

In [None]:
cnn3_score = cnn3_model.evaluate(X_test, y_test)
print('Test loss:', cnn3_score[0])
print('Test accuracy:', cnn3_score[1])

In [None]:
# Visualize the training curve
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

Based on the learning curve, it seems that the model starts to overfit after the first epoch.  
In such scenarios, **dropout** is used to mitigate the effect of overfitting. 

### 3-CNN with Dropout

Dropout aims to address the significant challenge of overfitting. It is a widely used regularization technique in deep learning.

The idea is simple: randomly drop units in a deep neural network during training.

Learning the relationship between the inputs and the outputs of a dataset is a very complicated procedure. If you have a very small dataset, the learned relationship may be a result of noise in the input samples.

Dropout refers to randomly and temporarily removing a unit, either in a hidden or a visible layer, along with all of its incoming and outgoing connections.

Further reading on dropout: 


*   [Machine Learning Mastery](https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/)
*   [Marko Jerkic](https://markojerkic.com/what-is-dropout-deep-learning/)

![alt text](https://i1.wp.com/cdn-images-1.medium.com/max/800/1*iWQzxhVlvadk6VAJjsgXgg.png?resize=800%2C398&ssl=1)
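The mechanism can be sketched in NumPy as follows. This is a minimal illustration of "inverted" dropout as applied during training, not the Keras source; the function name `dropout` and the all-ones `demo_activations` are assumptions for the example.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each unit with probability `rate` and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    # Dividing by (1 - rate) keeps the expected activation unchanged
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)  # fixed seed for repeatability
demo_activations = np.ones(10)  # hypothetical layer output
demo_dropped = dropout(demo_activations, rate=0.3, rng=rng)

print(demo_dropped)  # each entry is either 0 or 1 / 0.7
```

A different random mask is drawn for every training batch, and dropout is switched off when the model is evaluated or used for prediction.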



Let's now attempt the same 3-CNN model with **dropout**.

In [None]:
# Load dropout from keras library
from keras.layers import Dropout

In [None]:
# Updated model layers
cnn3_dp_model = Sequential()

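Now we add a dropout layer after each layer. The dropout rate is a hyper-parameter that we can select based on our model requirements. Note that the fully connected layer below uses 64 nodes rather than the 128 used in the previous 3-CNN model.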

In [None]:
# CNN Layer 1
cnn3_dp_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
cnn3_dp_model.add(MaxPooling2D((2, 2)))
cnn3_dp_model.add(Dropout(0.3))

# CNN Layer 2
cnn3_dp_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
cnn3_dp_model.add(MaxPooling2D(pool_size=(2, 2)))
cnn3_dp_model.add(Dropout(0.3))

# CNN Layer 3
cnn3_dp_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
cnn3_dp_model.add(Dropout(0.3))

In [None]:
cnn3_dp_model.add(Flatten())  # Flatten the previous output before sending to fully connected layers

In [None]:
cnn3_dp_model.add(Dense(64, activation='relu'))
cnn3_dp_model.add(Dropout(0.3))
cnn3_dp_model.add(Dense(10, activation='softmax'))

In [None]:
# Compile the model
cnn3_dp_model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# Train the model
epochs = 15
batch_size = 32
validation_split = 0.1
history_3_dp = cnn3_dp_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split = validation_split)

In [None]:
cnn3_dp_score = cnn3_dp_model.evaluate(X_test, y_test)
print('Test loss:', cnn3_dp_score[0])
print('Test accuracy:', cnn3_dp_score[1])

In [None]:
# Visualize the training curve
plt.plot(history_3_dp.history['loss'])
plt.plot(history_3_dp.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

In [None]:
plt.plot(history_3_dp.history['accuracy'])
plt.plot(history_3_dp.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper right')
plt.show()

The learning curves should now show that adding dropout has reduced the overfitting.  
To improve accuracy further, we could train the model for longer, i.e. for more epochs (Hint: Try 20, 30, 40, 50, etc.).