## Exercise 3

# **Improving Computer Vision Accuracy using Convolutions**

We saw how to do fashion recognition using a Deep Neural Network (DNN) containing three layers:

1.   the input layer (in the shape of the data)
2.   the output layer (in the shape of the desired output)
3.   a hidden layer

We also experimented with how the size of the hidden layer, the number of training epochs, and other settings affect the final accuracy. Our accuracy was about 89% on training and 87% on validation. That is not bad, but is it possible to do better? One way is to use a Convolutional Neural Network (CNN). In short, a convolution takes a small array called a kernel (usually 3x3 or 5x5) and passes it over the image. The idea behind a CNN is to add layers that perform convolutions before the dense layers, so that the information reaching the dense layers is more focused and possibly more accurate.
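
To make "passing a kernel over the image" concrete, here is a minimal pure-Python sketch of the sliding multiply-and-sum that a convolutional layer performs, with no padding and a stride of 1. The function name `convolve2d_valid`, the tiny 5x5 image, and the vertical-edge kernel are all hypothetical and chosen for illustration. In a real CNN the kernel values are learned during training, not written by hand.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in deep learning libraries, the kernel is not flipped before it is applied.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the kernel with the patch under it and sum the products
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 5x5 image: dark on the left (0), bright on the right (9)
image = [[0, 0, 9, 9, 9] for _ in range(5)]

# Hypothetical hand-written kernel that responds to dark-to-bright vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is 3x3, not 5x5, because a 3x3 kernel cannot be centred on the border pixels. Its values are large where the kernel straddles the edge and zero over the flat bright region.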

## **Imports**

In [7]:
import tensorflow as tf
from os import path, getcwd, chdir

# DO NOT CHANGE THE LINE BELOW. If you are developing in a local
# environment, then grab mnist.npz from the Coursera Jupyter Notebook
# and place it inside a local folder and edit the path to that location
path = f"{getcwd()}/../tmp2/mnist.npz"

The parameters of the `Conv2D` layer are:

1.   The number of convolutions we want to generate. This is purely arbitrary, but something on the order of 32 is a good starting point.
2.   The size of the convolution, in this case a 3x3 grid.
3.   The activation function to use. In this case we'll use relu, which you might recall returns x when x>0 and 0 otherwise.
4.   In the first layer, the shape of the input data.

We'll follow the convolution with a MaxPooling layer, which compresses the image while preserving the features that the convolution highlighted. Specifying (2,2) for the MaxPooling quarters the size of the image: each 2x2 block of pixels is replaced by its largest value, turning 4 pixels into 1. Repeating this across the image halves the number of horizontal pixels and halves the number of vertical pixels, reducing the image to 25% of its original size.
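
The relu activation and the 2x2 max pooling described above can be sketched in a few lines of pure Python. This is an illustration only, not how Keras implements these layers. The helper names `relu` and `max_pool_2x2` and the 4x4 feature map are hypothetical.

```python
def relu(x):
    """Return x when x > 0, else 0."""
    return x if x > 0 else 0


def max_pool_2x2(feature_map):
    """Replace each non-overlapping 2x2 block with its largest value.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            block = [
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 output of a convolution, before activation
feature_map = [
    [1, -3, 2, 0],
    [4, 6, -1, 5],
    [-2, 0, 7, 8],
    [3, 1, -4, 2],
]

# Apply relu element-wise, then pool
activated = [[relu(v) for v in row] for row in feature_map]
pooled = max_pool_2x2(activated)
print(activated)
print(pooled)
```

The 16 input values become 4, so the pooled map has half the height, half the width, and a quarter of the pixels.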

In [8]:
def train_mnist_conv():
    # Load the MNIST dataset that ships with Keras
    mnist = tf.keras.datasets.mnist
    (training_images, training_labels), (test_images, test_labels) = mnist.load_data()

    # Add a channel dimension and scale pixel values to the range [0, 1]
    training_images = training_images.reshape(60000, 28, 28, 1)
    training_images = training_images / 255.0
    test_images = test_images.reshape(10000, 28, 28, 1)
    test_images = test_images / 255.0

    # One convolution + pooling stage, followed by the dense layers
    model = tf.keras.models.Sequential([
      tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
      tf.keras.layers.MaxPooling2D(2, 2),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # Alternative (disabled): train, then evaluate on the test images
    #model.fit(training_images, training_labels, epochs=10)
    #test_loss, test_acc = model.evaluate(test_images, test_labels)
    #print(test_acc)

    # model fitting
    history = model.fit(training_images, training_labels, epochs=10)
    # YOUR CODE SHOULD END HERE

    # Return the epoch indices and the training accuracy of the final epoch
    return history.epoch, history.history['accuracy'][-1]
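
It can help to trace how the tensor shape changes through the model defined above. The sketch below does the arithmetic in plain Python. It assumes the Keras defaults that the code relies on: no padding and stride 1 for `Conv2D`, a pooling stride equal to the pool size, and one bias per filter or unit. The helper names are hypothetical.

```python
def conv2d_output(height, width, kernel_size, filters):
    """Output shape of an unpadded, stride-1 convolution."""
    return height - kernel_size + 1, width - kernel_size + 1, filters


def max_pool_output(height, width, channels, pool_size):
    """Output shape of non-overlapping pooling (odd remainders are dropped)."""
    return height // pool_size, width // pool_size, channels


# Input: 28x28 grayscale image with one channel
h, w, ch = 28, 28, 1

# Conv2D(32, (3,3)): each filter has 3*3*ch weights plus one bias
conv_params = (3 * 3 * ch + 1) * 32
h, w, ch = conv2d_output(h, w, 3, 32)
conv_shape = (h, w, ch)

# MaxPooling2D(2, 2): no trainable parameters
h, w, ch = max_pool_output(h, w, ch, 2)
pool_shape = (h, w, ch)

# Flatten: one long vector
flat_size = h * w * ch

# Dense(128) and Dense(10): one weight per input plus one bias, per unit
dense1_params = (flat_size + 1) * 128
dense2_params = (128 + 1) * 10

total_params = conv_params + dense1_params + dense2_params
print(conv_shape, pool_shape, flat_size, total_params)
```

Under these assumptions, nearly all trainable parameters sit in the first dense layer, not in the convolution.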

In [9]:
train_mnist_conv()

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 0.9982500076293945)

# **Evaluation**
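Using convolutions, we reached an accuracy of 99.8%, compared with the earlier 87%. We used a single convolutional layer and a single MaxPooling2D layer, trained for 10 epochs. Note that 99.8% is the training accuracy of the final epoch on the MNIST data loaded above, whereas 87% was a validation figure, so the two numbers are not directly comparable.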
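
For reference, the accuracy reported here is the fraction of images whose highest-probability class matches the integer label. The sketch below shows that calculation in plain Python. The helper names and the four made-up 3-class predictions are hypothetical and are not output from the model above.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, labels):
    """Fraction of samples whose most probable class equals the label."""
    correct = sum(
        1 for probs, label in zip(predictions, labels) if argmax(probs) == label
    )
    return correct / len(labels)


# Hypothetical softmax outputs for four samples and three classes
predictions = [
    [0.7, 0.2, 0.1],   # predicts class 0
    [0.1, 0.8, 0.1],   # predicts class 1
    [0.3, 0.3, 0.4],   # predicts class 2
    [0.6, 0.3, 0.1],   # predicts class 0
]
labels = [0, 1, 2, 1]  # the last sample is misclassified

result = accuracy(predictions, labels)
print(result)
```

Running the same calculation on predictions for `test_images` would give a held-out figure that can be compared with a validation accuracy.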