<a href="https://colab.research.google.com/github/ccarpenterg/LearningTensorFlow2.0/blob/master/02_introduction_to_convnets_and_deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction to Convolutional Neural Networks with TensorFlow 2

Convolutional neural networks (CNNs, or convnets for short) are one of the most exciting developments in the field of computer vision. A key feature of CNNs is that feature extraction is carried out by the convolutional layers themselves, so little or no manual feature engineering is needed.

### GPUs in Colab
We'll be using the GPU that Google provides in Colab. To enable the GPU for this notebook, follow these steps:

* Navigate to **Edit** → **Notebook settings**
* Open the **Hardware accelerator** drop-down menu and select **GPU**

# Image classification using a convnet with TensorFlow 2

We import some standard Python libraries, the TensorFlow framework and some Keras modules. Finally, we check that we're using the right TensorFlow version:

In [1]:
#import print function from future
from __future__ import absolute_import, division, print_function, unicode_literals

import os

%tensorflow_version 2.x

#import TensorFlow and check version
import tensorflow as tf

from tensorflow.keras import datasets, layers, models

print(tf.__version__)

TensorFlow 2.x selected.
2.1.0-rc1


**MNIST Dataset**

For this introduction to convolutional neural networks with TensorFlow 2.0, we'll be using the MNIST dataset. First we download the dataset, and then we normalize the pixel values to the range [0, 1] by dividing by 255:

In [9]:
mnist = datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)
print(x_test.shape)

x_train, x_test = x_train / 255.0, x_test / 255.0

(60000, 28, 28)
(10000, 28, 28)
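As a small illustration of the normalization step, the sketch below uses a made-up 2x2 array in place of a real MNIST image (`fake_image` is a hypothetical name used only for this example). It assumes NumPy is available and shows that dividing `uint8` pixel values by `255.0` yields floating-point values between 0 and 1:

```python
import numpy as np

# Hypothetical stand-in for an MNIST image: uint8 pixel intensities in [0, 255].
fake_image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by a Python float promotes the array to float64 and rescales it.
scaled = fake_image / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```

Scaling the inputs to a small, fixed range generally helps gradient-based training behave more predictably.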


Since the MNIST images have only one channel, the loaded arrays have no channel axis. We need to explicitly reshape them to include a fourth dimension (that one channel):

In [3]:
print(x_train.shape)
print(x_test.shape)

x_train = x_train.reshape((60000, 28, 28, 1))
x_test = x_test.reshape((10000, 28, 28, 1))

print(x_train.shape)
print(x_test.shape)

(60000, 28, 28)
(10000, 28, 28)
(60000, 28, 28, 1)
(10000, 28, 28, 1)
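The reshape above hard-codes the number of images. As a hedged alternative, the sketch below shows two ways of adding the channel axis without spelling out the batch size. It uses a small all-zero array (`batch`, a hypothetical stand-in for the MNIST data) and assumes NumPy is available:

```python
import numpy as np

# Hypothetical stand-in for a batch of grayscale images: 5 images of 28x28 pixels.
batch = np.zeros((5, 28, 28))

# Option 1: let NumPy infer the batch size by passing -1 for that axis.
with_channel_a = batch.reshape((-1, 28, 28, 1))

# Option 2: append a new trailing axis of length 1 for the channel.
with_channel_b = batch[..., np.newaxis]

print(with_channel_a.shape)  # (5, 28, 28, 1)
print(with_channel_b.shape)  # (5, 28, 28, 1)
```

Either form keeps working if the number of images changes, for example when only a subset of the dataset is used.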


We begin with 3 convolutional layers. The first one has 32 filters of 3x3 pixels, takes our MNIST images as input (28, 28, 1) and applies ReLU as the activation function. The output of this operation is 32 feature maps, over which we then apply max pooling of 2x2 pixels. The second convolutional layer follows the same pattern with 64 filters, and the third applies 64 filters without a pooling step afterwards.

In [0]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
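To make the convolution, ReLU and max-pooling steps more concrete, here is a minimal pure-Python sketch for a single-channel image and a single filter. It is not how Keras implements these layers; the helper names (`conv2d_valid`, `max_pool_2x2`), the 6x6 toy image and the edge-detecting kernel are all assumptions made for illustration:

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (stride 1, no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(max(0.0, total))  # ReLU: negative responses become 0
        output.append(row)
    return output


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Toy 6x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Toy 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool_2x2(feature_map)         # 2x2 after pooling

print(feature_map[0])  # [0.0, 3.0, 3.0, 0.0]
print(pooled)          # [[3.0, 3.0], [3.0, 3.0]]
```

A `Conv2D` layer with 32 filters repeats this with 32 different learned kernels, producing 32 feature maps.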

Now we can get a summary of this convolutional neural network. Our model includes the method summary, which shows the output tensor shape of each layer and how many parameters each layer has:

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________


### Neural Network Bookkeeping: Parameters

Now let's find out how TensorFlow calculated the number of parameters for each convolutional layer.

***1st convolutional layer***

In the first convolutional layer we have 32 kernels or filters of 3x3. 
Each filter has 9 (3x3) parameters and that gives us a total of 288 parameters: 32 x 9.
But at the same time, each one of these filters has a bias: 32 x 1

32 x 9 weights + 32 x 1 biases = 320 parameters

***2nd convolutional layer***

After applying the first pooling we end up with 32 feature maps, and that's the input for our second convolutional layer. In the second layer we use 64 filters of 3x3, but now each of these 64 filters spans all 32 input feature maps.

For each input feature map, each filter has 9 (3x3) weights, which gives a total of 576 weights per input feature map: 64 x 9. And each one of the filters has a single bias: 64 x 1

32 x (64 x 9 weights) + 64 x 1 biases = 18496 parameters.

***3rd convolutional layer***

After applying the second pooling we end up with 64 feature maps. In the third layer we also use 64 filters of 3x3, and each of these filters spans all 64 input feature maps.

For each input feature map, each filter has 9 (3x3) weights, which gives a total of 576 weights per input feature map: 64 x 9. And each one of the filters has a single bias: 64 x 1

64 x (64 x 9 weights) + 64 x 1 biases = 36928 parameters.
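The three calculations above follow the same pattern, so they can be wrapped in a small helper. The function name `conv2d_param_count` is an assumption made for this sketch and is not part of Keras. It counts the kernel weights for every input channel and filter, plus one bias per filter:

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Return the number of trainable parameters of a 2D convolutional layer."""
    kernel_h, kernel_w = kernel_size
    # Every filter holds one kernel_h x kernel_w kernel per input channel.
    weights = kernel_h * kernel_w * in_channels * filters
    # Every filter adds a single bias term.
    biases = filters
    return weights + biases


print(conv2d_param_count((3, 3), in_channels=1, filters=32))   # 320
print(conv2d_param_count((3, 3), in_channels=32, filters=64))  # 18496
print(conv2d_param_count((3, 3), in_channels=64, filters=64))  # 36928
```

The results match the `Param #` column of the model summary, and their sum is the 55,744 total reported there.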

### Dense Classifier

We now add a dense layer and a final layer with 10 neurons, one for each of the digit classes (zero through nine). Because dense layers expect a one-dimensional input per sample, we need to convert the 3D tensor into a vector, and for that task we use a Flatten layer.

In [6]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

#Let's check our neural network architecture again
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                36928     

Now let's do the math for the rest of the neural network. First we have to stretch out the output tensor of our 3rd convolution layer:

tensor dimensions = (3, 3, 64) --> 3 x 3 x 64 = 576-element vector

That's the input layer dimension of the second part of our neural network. So now we have a vector of length 576, a hidden dense layer composed of 64 neurons and an output layer of 10 neurons.

Each of the dense layer's neurons is connected to all the neurons in the input layer. And each of the 64 neurons has a bias:

64 x 576 weights + 64 x 1 biases = 36928 parameters

And finally, for our output layer we have:

10 x 64 weights + 10 x 1 biases = 650 parameters
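The same bookkeeping can be checked for the dense layers. The helper name `dense_param_count` is an assumption for this sketch. The 55,744 figure is the convolutional total from the first model summary, and the grand total is simply the sum of the numbers derived in this section:

```python
def dense_param_count(inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units


flattened = 3 * 3 * 64                                # 576-element vector
hidden_params = dense_param_count(flattened, 64)      # hidden dense layer
output_params = dense_param_count(64, 10)             # output layer
conv_params = 55744                                   # from the first summary

print(hidden_params)                                  # 36928
print(output_params)                                  # 650
print(conv_params + hidden_params + output_params)    # 93322
```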

### Neurons and Tensor Dimensions

Now let's take a look at each layer's neurons and how the input tensor is transformed as it goes through the neural network.

Our MNIST dataset contains 28x28 grayscale images, which means each image has only one channel. Therefore the input tensor dimensions are:

(28, 28, 1) = 784 input neurons

In the case of the convolution layers, the neurons live in the feature maps. For the first convolutional layer we have 32 feature maps (given by the numbers of filters), and we use a formula to calculate the width and height for a particular feature map:

w_out = (w_in - f + 2p) / s + 1 --> width: (28 - 3 + 2x0) / 1 + 1 = 26

h_out = (h_in - f + 2p) / s + 1 --> height: (28 - 3 + 2x0) / 1 + 1 = 26

f is the size of the filter; p is the padding; s is the stride.
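The formula above can be wrapped in a small helper to experiment with other settings. The function name `conv_output_size` is an assumption for this sketch, and so is the use of floor division for cases where the stride does not divide evenly:

```python
def conv_output_size(size, filter_size, padding=0, stride=1):
    """Output width (or height) of a convolution along one spatial axis."""
    return (size - filter_size + 2 * padding) // stride + 1


print(conv_output_size(28, 3))             # 26: the first layer in this notebook
print(conv_output_size(28, 3, padding=1))  # 28: padding of 1 preserves the size
print(conv_output_size(28, 3, stride=2))   # 13: a stride of 2 roughly halves it
```

Only the first call corresponds to a layer in this model. The other two are hypothetical settings shown for comparison.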

(26, 26, 32) = 21632 neurons. That's a lot, but after applying 2 x 2 max pooling, which outputs the maximum activation within each 2 x 2 window, we end up with:

(13, 13, 32) = 5408 neurons.

This convolution process continues until we reach the last convolution layer and we end up with:

(3, 3, 64) = 576 neurons and we are ready to add our dense layers.
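As a rough check of the whole chain, the sketch below traces the tensor shape through the five layers of the convolutional base. The `layer_specs` list and the `output_size` helper are assumptions written for this example. It relies on no padding being used and on the pooling stride being equal to the pool size, which is the Keras default for `MaxPooling2D`:

```python
def output_size(size, window, stride):
    """Spatial size after sliding a window with the given stride, no padding."""
    return (size - window) // stride + 1


# (layer name, window size, stride, number of filters or None for pooling)
layer_specs = [
    ("conv2d", 3, 1, 32),
    ("max_pooling2d", 2, 2, None),
    ("conv2d_1", 3, 1, 64),
    ("max_pooling2d_1", 2, 2, None),
    ("conv2d_2", 3, 1, 64),
]

size, channels = 28, 1  # MNIST input: (28, 28, 1)
shapes = []
for name, window, stride, filters in layer_specs:
    size = output_size(size, window, stride)
    if filters is not None:
        channels = filters  # a convolution sets the number of feature maps
    neurons = size * size * channels
    shapes.append((name, (size, size, channels), neurons))
    print(name, (size, size, channels), neurons)
```

The printed shapes agree with the `Output Shape` column of the model summary, ending at (3, 3, 64) with 576 neurons.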



### Compile and train the model

In [7]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fd060462e48>
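For intuition about the loss chosen in `model.compile`, here is a minimal pure-Python sketch of what sparse categorical cross-entropy measures for a single sample: the negative log of the probability that the softmax output assigns to the true class. The helper names and the example values are assumptions made for illustration, and details of the Keras implementation (such as clipping probabilities for numerical safety and averaging over a batch) are left out:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample, where `label` is an integer class index."""
    return -math.log(probabilities[label])


# An untrained-looking prediction: all 10 digit classes equally likely.
uniform = softmax([0.0] * 10)
loss_uniform = sparse_categorical_crossentropy(3, uniform)

# A confident, correct prediction for class 3 (made-up probabilities).
confident = [0.9 if i == 3 else 0.1 / 9 for i in range(10)]
loss_confident = sparse_categorical_crossentropy(3, confident)

print(loss_uniform)    # about 2.30, i.e. log(10)
print(loss_confident)  # about 0.105
```

The "sparse" variant takes integer labels such as the MNIST `y_train` values directly, so the labels do not need to be one-hot encoded.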

### Evaluating accuracy

So far we have measured the accuracy for our training set, so now we determine the accuracy for our test set (the unseen data):

In [8]:
test_loss, test_acc = model.evaluate(x_test, y_test)
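The accuracy metric itself is simple: it is the fraction of samples whose highest-probability class matches the true label. The sketch below computes it by hand for a few made-up predictions over 3 classes. The `accuracy` helper and the example data are assumptions for illustration and do not come from the trained model:

```python
def accuracy(predictions, labels):
    """Fraction of samples whose most probable class equals the true label."""
    correct = 0
    for probabilities, label in zip(predictions, labels):
        # Index of the largest probability (the predicted class).
        predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Made-up softmax outputs for 4 samples and 3 classes.
example_predictions = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
example_labels = [1, 0, 1, 1]

print(accuracy(example_predictions, example_labels))  # 0.75
```

In the notebook, `test_acc` holds this value as computed by Keras on the test set, and `print(test_acc)` would display it.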

