# Convolutional Neural Networks using TensorFlow


Note: most of the explanation carries over from the earlier ANN walkthrough.


Here, we will use TensorFlow and Keras to build our neural network. Keras now ships as part of TensorFlow (```tf.keras```). It used to be installed as a standalone package, but the bundled version is the better choice here.

We will work with the MNIST dataset, which contains tens of thousands of grayscale images of handwritten digits. We will train our model to predict which digit is written.

Note: Don't try to memorize the syntax and functions, but make sure you understand what is going on.
You will pick up the syntax as you practice more problems. [This](https://www.tensorflow.org/api_docs) is the link to the TensorFlow documentation. Be careful about versions, though: TF 2.0 is fairly different from the version used here, TF 1.13.

In [1]:
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np


In [2]:
print(tf.__version__)

1.13.1


### Dataset

This is probably the most commonly used dataset, so a loader for it ships with Keras (the data is downloaded on first use). As we can see, it is split into images and labels, and further into train and test sets.

The labels contain the actual number that is written in the image.

The following steps show that there are 60000 training images, each 28 pixels high and 28 pixels wide.

In [3]:
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [4]:
train_images.shape

(60000, 28, 28)

In [5]:
len(train_images)

60000

#### Preprocessing

It is recommended that you normalize your input before sending it to your neural network. A short answer on why from [this](https://www.quora.com/Why-does-mean-normalization-help-in-gradient-descent) Quora post:

"The problem is when you have features with different scales. A commonly used optimization technique, such as gradient descent, uses the product of the learning rate times the gradient of the cost function as the step size. As a result, when the features have different scales, the step size can be different for each feature. When you try different learning rates, you will find that 1) the learning rate is too small and it will take a long time to converge, or 2) the learning rate is too big for one or more features, and it never converges."


In these images, the highest pixel value is 255 and the lowest is 0. The images are grayscale, so they have only one channel (as opposed to the three red, green, and blue channels in full-color images).

To scale the values into the range 0 to 1, one method is to divide every pixel value by 255, which is what we do in the next step.
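As a small illustration of the scaling, here is a sketch on a made-up 2 x 2 array of 8-bit pixel values (not real MNIST data). It only assumes NumPy and shows that the division yields floating-point values between 0 and 1.

```python
import numpy as np

# Hypothetical 2x2 "image" with raw 8-bit pixel intensities (not real MNIST data).
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# True division converts the integers to floats and maps 0..255 onto 0..1.
scaled = raw / 255

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```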

#### Reshaping inputs

We need to reshape the data into the format that ```Conv2D``` expects: (number of images, height, width, channels). If we skip the reshaping step, we will encounter errors.
More details [here](https://stackoverflow.com/questions/37085653/when-bulding-a-cnn-i-am-getting-complaints-from-keras-that-do-not-make-sense-to).

In [6]:
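# Scale pixel values from 0-255 down to 0-1
train_images = train_images / 255
test_images = test_images / 255

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
train_images = train_images.reshape((-1, 28, 28, 1))
test_images = test_images.reshape((-1, 28, 28, 1))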
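To see what the reshape does without downloading MNIST, the sketch below applies it to a stand-in batch of zeros. The array name `batch` and the batch size of 5 are arbitrary choices made for illustration.

```python
import numpy as np

# Stand-in for a batch of 5 grayscale images (zeros, not real MNIST data).
batch = np.zeros((5, 28, 28))

# -1 lets NumPy infer the batch dimension; the trailing 1 is the channel axis.
with_channel = batch.reshape((-1, 28, 28, 1))
print(with_channel.shape)  # (5, 28, 28, 1)

# Adding a new axis at the end is an equivalent way to get the same shape.
same = batch[..., np.newaxis]
print(same.shape)  # (5, 28, 28, 1)
```

The pixel values are untouched; only a channel axis of length 1 is added.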

## Building the convolutional neural network

We start with a convolutional layer with 32 filters, each of dimension 3 x 3. The activation is ```relu``` (you can experiment with other activations such as ```tanh``` and see what happens), and we also define the input shape.
For more information, see the link at the end. Even though it uses PyTorch, it can give you some insight into how CNNs work.
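To make the idea of a 3 x 3 filter concrete, here is a rough NumPy sketch of what a single filter does to a single image. It assumes stride 1 and no padding, omits the bias term, and is far slower than the real layer. The helper `conv2d_valid` and the random inputs are hypothetical and not part of Keras.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the filter and the patch beneath it, summed.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))          # stand-in for one normalized image
kernel = rng.standard_normal((3, 3))  # stand-in for one learned filter

# ReLU keeps positive responses and zeroes out negative ones.
feature_map = np.maximum(conv2d_valid(image, kernel), 0)
print(feature_map.shape)  # (26, 26)
```

The layer in this notebook learns 32 such filters, so it produces 32 feature maps per image.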

After that comes a max-pooling layer with a pool size of 2 x 2, then a dropout layer with a drop rate of 25%.
Finally, we flatten the result and pass it to a fully connected layer with 10 units (one per digit), which is also the output layer.
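The shapes and parameter counts implied by this architecture can be worked out by hand. The sketch below does the arithmetic in plain Python, assuming the Keras defaults of stride 1 and no padding for the convolution. The variable names are only illustrative.

```python
# Trace the tensor shape through each layer (batch dimension omitted).
height, width, channels = 28, 28, 1
filters, kernel = 32, 3

# Conv2D with no padding and stride 1: each spatial dimension shrinks by kernel - 1.
conv_h, conv_w = height - kernel + 1, width - kernel + 1   # 26, 26
# Each filter has kernel*kernel*channels weights plus one bias.
conv_params = (kernel * kernel * channels + 1) * filters

# MaxPool2D with a 2x2 window halves each spatial dimension.
pool_h, pool_w = conv_h // 2, conv_w // 2                  # 13, 13

# Dropout leaves the shape unchanged; Flatten turns the volume into a vector.
flat = pool_h * pool_w * filters

# Dense(10): one weight per input per unit, plus one bias per unit.
dense_params = (flat + 1) * 10

print(conv_params, flat, dense_params, conv_params + dense_params)
# 320 5408 54090 54410
```

Calling `model.summary()` on the model once it is built is a convenient way to check these numbers.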

In [7]:
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPool2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


We will use the ```adam``` optimizer, an alternative to plain gradient descent, together with the ```sparse_categorical_crossentropy``` loss (a solid choice because we are predicting one of several categories and the labels are plain integers). You can read more about these at the link provided at the end of the notebook.
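For a single example, this loss is the negative log of the probability the model assigns to the true class. "Sparse" means the label is an integer class index rather than a one-hot vector. Below is a minimal sketch with made-up softmax outputs. The helper is written here for illustration and is not the Keras implementation.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(probs[label])

# Hypothetical softmax outputs over the 10 digit classes (made-up numbers).
confident = [0.01] * 10
confident[7] = 0.91   # the model is fairly sure the digit is a 7
unsure = [0.1] * 10   # the model spreads its belief evenly

print(round(sparse_categorical_crossentropy(7, confident), 4))  # 0.0943
print(round(sparse_categorical_crossentropy(7, unsure), 4))     # 2.3026
```

A confident correct prediction gives a loss near zero, while a uniform guess over 10 classes gives about 2.3.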

In [8]:
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy',
             metrics = ['accuracy'])

## Training the model

We use ```model.fit(input_data, target_labels)``` to train our model on the input data so that it fits the target labels. We are going to train for five epochs. Each epoch is one full pass over the training data.

Because of the extra computation in the convolutional layer, this will take longer to train than a small fully connected network.
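Within an epoch, the data is processed in mini-batches, with one weight update per batch. The sketch below counts those updates, assuming the Keras default ```batch_size``` of 32, which applies because the ```fit``` call in this notebook does not pass one.

```python
import math

num_examples = 60000   # training images in MNIST
batch_size = 32        # Keras default when `batch_size` is not passed to `fit`
epochs = 5

# Number of mini-batches (and therefore weight updates) in one pass over the data.
steps_per_epoch = math.ceil(num_examples / batch_size)

print(steps_per_epoch)           # 1875
print(steps_per_epoch * epochs)  # 9375
```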

In [9]:
model.fit(train_images, train_labels, epochs = 5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7ff502ff07f0>

### Testing our model

We test our model on the test images and measure its performance by comparing its predictions with the actual test labels.
We see that it gives 98.18% accuracy, higher than what we previously achieved with an ANN, just by adding a single convolutional layer. The gain might not seem like much, but on bigger problems CNNs tend to give significantly better results.

You can experiment by adding more layers.

In [10]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(test_acc)

0.9818
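The accuracy reported here is the fraction of test images whose highest-scoring class matches the label. Here is a small sketch of that computation on made-up probabilities, using 4 images and 3 classes instead of the real 10000 images and 10 classes.

```python
import numpy as np

# Made-up softmax outputs for 4 images over 3 classes (the real model has 10).
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])

predicted = np.argmax(probs, axis=1)     # most likely class per image
accuracy = np.mean(predicted == labels)  # fraction predicted correctly

print(predicted)  # [0 1 2 0]
print(accuracy)   # 0.75
```

With the trained model, ```model.predict(test_images)``` would supply the probabilities in place of the hand-written array.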


### PyTorch Implementation

[This](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) tutorial has a PyTorch implementation of a convolutional neural network along with a detailed explanation.