#Convolutional Neural Networks

---


##In this tutorial I will be using Keras with TensorFlow as backend to calssify digits from the MNIST Dataset

First you need to install Keras using the following lines

In [1]:
# https://keras.io/
!pip install -q keras
import keras

Using TensorFlow backend.


Then import Keras and all the layers and libraries we need

In [0]:
import numpy
from keras import backend as K
from keras.datasets import mnist
from keras.utils import np_utils

We then import the layers of the convolutional neural network.The network consists of two main components :

1. Convolutional layers : the convolutional layer is responsible for the convolutional operation in which feature maps identifies features in the images.
and is usually followed by two types of layers which are :
>*   **Dropout** : Dropout is a regulization technique where you turn off part of the network's layers randomally to increase regulization and hense decrease overfitting. We use when the training set accuracy is muuch higher than the test set accuracy.
>*   **Max Pooling** : The maximum output in a rectangular neighbourhood. It is used to make the network more flexible to slight changes and decrease the network computationl expenses by extracting the group of pixels that are highly contributing to each feature in the feature maps in the layer.
2. Dense layers : The dense layer is a fully connected layer that comes after the convolutional layers and they give us the output vector of the Network.

As a convention in Convolutional Neural Network we decrease the dimensions of the layers as we go deeper and increase the number of feature maps to make it detect more features and decrease the number of computational cost.

![alt text](https://raw.githubusercontent.com/MoghazyCoder/Machine-Learning-Tutorials/master/Untitled.png)

 

In [0]:
from keras.layers import Dense, Dropout,Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D

Sequential layers are stacked such that every layer passes its output to the next layer without you specifying extra information so we import Sequential from models

In [0]:
from keras.models import Sequential
model = Sequential()

We must specify which data format convention Keras will follow using the following line of code. Keras can accept the number of channels before other dimensions or after it but here we have to specify which convention we will use. We will use channels last which is Tensorflow's convention .

In [0]:
K.set_image_data_format('channels_last')
numpy.random.seed(0)

We should call mnist.load_data() which contains the mnist Dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.  when we call mnist.load_data() it returns two tuples one for the training set containing the images and their corresponding lables and another one for the test set.

In [6]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


We then reshape the samples according to TensorFlow convention which we chosed previously using "K.set_image_data_format('channels_last')" samples,rows,columns,channels as we are using channels_last if you are using channels_first you will need to change the order to samples,channels,rows,column and here we have only one channel because we are using the image in grayscale not RGB.

In [0]:
X_train = X_train.reshape(X_train.shape[0], 28, 28 , 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 28, 28 , 1).astype('float32')

To increase the efficiency and the convergence of the algorithm we normalize the data based on the fact that the pixels' maximum value is 255 so we divide all the pixels by 255 to obtain results between 0 and 1.

In [0]:
X_train = X_train / 255
X_test2 = X_test / 255

Making the output in the form of one vs all (aka one hot encoding) which means that we will have 10 calsses from 0 to 9 one class for each number from 0 to 9


In [0]:
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)

Now lets implement the first layer of the convolutional network as shown in the schema below .
![alt text](https://raw.githubusercontent.com/MoghazyCoder/Machine-Learning-Tutorials/master/Layer.png)
For the sequential model you just stack the layers and only specify the image input dimensions in the first layer.
Our first layer will be a convolutional layer Conv2D() where we specify the number of feature maps , the input shape and the activation function which is here relu .The relu activation function is represented mathematically by max(0,X).
We then add the max pooling layer (which is the most common kind of pooling) with a kernel of dimensions 2 * 2 .


In [0]:
model.add(Conv2D(30, (5, 5), input_shape=( 28, 28 , 1), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

Lets add the 2nd layer but this time we increase the feature maps .

In [0]:
model.add(Conv2D(70, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

Now we add a flatten layer that takes the output of the CNN and flattens it and passes it as an input to the Dense Layers which passes it to the output layer.
we have used number of classes = 10 because we have 10 numbers from 0 to 9 .
every dense layer contains 300 neurons except for the output layer.
We use Softmax with the output layer to output estimated probability vector for  multi-class classification .

In [0]:
num_classes = 10
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))

We have to compile the model and then try training it using the fit() function which fits the training data and labels , the number of epochs and the batch_size which is the number of photos per training cycle.
The last thing that we are going to do is to evaluate the model to ensure that it doesn't overfit the trainig data .Evaluating the model is done by using the weights that resulted from the training step and using it to estimate the value of the test data that the model haven't seen before to estimate how well the model will perform in the future on new data.

if you are using cross-validation split then the convention is to split the data by 60% training set , 20% validation set and 20% test set but in the era of big data this ratio may vary according to the amount of data you have.

We have used categorical_crossentropy as the cost function for that model but what does we mean by **cost function**

####Cost function : It is a measure of the overall loss in our network after assigning values to the parameters during the forward phase so it indicates how well the parameters were chosen during the forward probagation phase.

#### Optimizer : It is the gradiant descent algorithm that is used. We use it to minimize the cost function to approach the minimum point. We are using adam optimizer which is one of the best gradient descent algorithms. You can refere to this paper to know how it works https://arxiv.org/abs/1412.6980v8

You can use other metrics to measure the performance other than accuracy as precision or recall or F1 score. the choice depends on the problem itself. Where high recall means low number of false negatives , High precision means low number of false positives and     F1 score is a trade off between them. You can refere to this article for more about precision and recall http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html

In [14]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs= 10, batch_size=200)

scores = model.evaluate(X_test, y_test, verbose = 10)

print ( scores )


Epoch 1/10
Epoch 2/10
Epoch 3/10
 3000/60000 [>.............................] - ETA: 57s - loss: 0.0025 - acc: 0.9993

Epoch 4/10
Epoch 5/10
10000/60000 [====>.........................] - ETA: 50s - loss: 0.0023 - acc: 0.9993

Epoch 6/10
Epoch 7/10
11400/60000 [====>.........................] - ETA: 49s - loss: 0.0087 - acc: 0.9973

Epoch 8/10
Epoch 9/10
12800/60000 [=====>........................] - ETA: 47s - loss: 0.0052 - acc: 0.9981

Epoch 10/10
[0.11124998168051242, 0.9931]


###This tutorial is written by AbdElRhman ElMoghazy.

### Refrences ,Textbooks and Tutorials :
Hands on machine learning with scikit-learn and TensorFlow by Aurélien Géron

Pyhron machine learning 2nd edition by Sebastian Raschka ,Vahid Mirjalili

http://www.deeplearningbook.org/

https://keras.io/

https://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/

https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/index.html?index=..%2F..%2Findex#0