# CONVOLUTIONAL NEURAL NETWORK with MNIST Dataset

In [44]:
import tensorflow as tf
tf.__version__ #check version of tensorflow

'1.14.0'

We first classify MNIST using a simple model and then, in the second part, use deep learning to improve the accuracy of our results.

## I. Classify MNIST using a simple model (Multi-layer Perceptron, MLP)

<h3>What is MNIST?</h3>

According to LeCun's website, MNIST is a "database of handwritten digits that has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image".

<h3>Import the MNIST dataset using TensorFlow built-in feature</h3>

It is important to note that this loader returns heavily preprocessed data and __it does not contain image files__: each digit is stored as a flat vector of 784 pixel intensities. You will need to write your own code if you want to see the actual digits.

In [45]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
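
To look at a digit, each 784-element row can be reshaped back into 28 rows of 28 pixels. The following is a minimal sketch, assuming the pixels are stored row by row with values in [0, 1]; `render_digit` and the tiny 4x4 sample are hypothetical names and data used only for illustration.

```python
def render_digit(flat_pixels, width=28, threshold=0.5):
    """Render a flattened grayscale image as ASCII art, one text row per pixel row."""
    if len(flat_pixels) % width != 0:
        raise ValueError("pixel count must be a multiple of width")
    rows = []
    for start in range(0, len(flat_pixels), width):
        row = flat_pixels[start:start + width]
        # Bright pixels become '#', dark pixels become '.'
        rows.append("".join("#" if p > threshold else "." for p in row))
    return "\n".join(rows)

# Hypothetical 4x4 "image" of a vertical bar (not real MNIST data)
sample = [0.0, 1.0, 0.0, 0.0,
          0.0, 1.0, 0.0, 0.0,
          0.0, 1.0, 0.0, 0.0,
          0.0, 1.0, 0.0, 0.0]
art = render_digit(sample, width=4)
print(art)
```

In the notebook, something like `print(render_digit(list(mnist.train.images[0])))` should draw the first training digit with the default `width=28`.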


### Creating an interactive session

In [46]:
sess = tf.InteractiveSession()



### Creating placeholders

In [47]:
x = tf.placeholder(tf.float32, shape = [None, 784]) # 784 = 28x28 pixels
y_act = tf.placeholder(tf.float32, shape = [None, 10] ) # 10 = number of digit classes (0 to 9)

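### Initializing weights and biases as zero tensors

For a single example, $x$ and $y$ have sizes (784,1) and (10,1) respectively, so the weight matrix $W$ has size (784,10) and the bias $b$ has size (10,1), with $y=W^Tx+b$. In the code, each row of `x` is one example, so the same computation is written as `tf.matmul(x,W)+b`.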

In [48]:
#Weight tensor
W = tf.Variable(tf.zeros([784,10],tf.float32)) 
#Bias tensor
b = tf.Variable(tf.zeros([10],tf.float32))

### Execute the assignment operation

In [49]:
#run the operation global_variables_initializer using an interactive session
sess.run(tf.global_variables_initializer())

### Adding Weights and Biases to input

In [50]:
tf.matmul(x,W)+b

<tf.Tensor 'add_6:0' shape=(?, 10) dtype=float32>

### Softmax regression

In [51]:
y_pred = tf.nn.softmax(tf.matmul(x,W)+b)

The _logistic function_ is used for classification between two target classes (0/1). The _softmax function_ generalizes the logistic function to more than two classes: it outputs a categorical probability distribution over all classes.
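
To make the relation concrete, here is a small pure-Python sketch (the helper names `sigmoid` and `softmax` are ours, not TensorFlow's). It shows that the softmax outputs sum to 1 and that, for two classes with logits `[z, 0]`, softmax gives the same probability as the logistic function applied to `z`.

```python
import math

def sigmoid(z):
    """Logistic function for a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Softmax over a list of logits."""
    # Subtracting the maximum improves numerical stability and does not change the result
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])      # three classes
two_class = softmax([1.5, 0.0])       # two classes, second logit fixed at 0

print(probs, sum(probs))
print(two_class[0], sigmoid(1.5))     # the two numbers agree
```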

### Cost function

We use the cross-entropy function to measure the loss of the model.

$$H(p,q) = -\sum_{x}p(x)\log q(x)$$

where $p$ is the true distribution over classes (the one-hot label) and $q$ is the predicted distribution.

In [52]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_act*tf.log(y_pred),axis=1)) #axis replaces the deprecated reduction_indices
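
Because the labels are one-hot, only the term for the true class survives in the sum, so the loss for one example is simply $-\log q(\text{true class})$. A minimal pure-Python sketch of this, with made-up probabilities for illustration:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x); terms with p(x) == 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

label = [0, 0, 1]               # one-hot: the true class is index 2
confident = [0.05, 0.05, 0.9]   # hypothetical prediction, mostly correct
unsure = [0.4, 0.4, 0.2]        # hypothetical prediction, mostly wrong

loss_confident = cross_entropy(label, confident)
loss_unsure = cross_entropy(label, unsure)
print(loss_confident, loss_unsure)
```

The confident, correct prediction gets the smaller loss, which is what the optimizer exploits.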

### Type of optimization: Gradient Descent

Several optimizers are available. Here we use gradient descent because it is a simple, well-established optimizer.

In [53]:
train_step = tf.train.GradientDescentOptimizer(learning_rate =  0.5).minimize(cross_entropy)

### Training batches

Train using minibatch Gradient Descent

In [54]:
#Load 50 training examples for each training iteration
for i in range(1000):
    batch = mnist.train.next_batch(50)
    train_step.run(feed_dict = {x:batch[0], y_act: batch[1] })
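
What each `train_step` does can be sketched without TensorFlow. The example below is an illustration under simplifying assumptions, not the notebook's model: it fits a single weight `w` in `y = w * x` on made-up data, drawing a random mini-batch at every step and moving `w` against the gradient of the mean squared error. The function name `minibatch_sgd` and all its parameters are hypothetical.

```python
import random

def minibatch_sgd(xs, ys, learning_rate=0.1, batch_size=4, steps=200, seed=0):
    """Fit y = w * x by mini-batch gradient descent on the mean squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(steps):
        batch = rng.sample(indices, batch_size)   # pick a random mini-batch
        # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        w -= learning_rate * grad                 # gradient descent update
    return w

xs = [i / 10.0 for i in range(1, 11)]   # 0.1, 0.2, ..., 1.0
ys = [2.0 * x for x in xs]              # noise-free data generated with w = 2
w_fit = minibatch_sgd(xs, ys)
print(w_fit)                            # approaches 2.0
```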

### Test

In [55]:
correct_prediction = tf.equal(tf.argmax(y_pred,1),tf.argmax(y_act,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
acc = accuracy.eval(feed_dict = {x:mnist.test.images,y_act:mnist.test.labels})*100
print("The final accuracy for the simple ANN model is: {} %".format(acc))

The final accuracy for the simple ANN model is: 91.21000170707703 %
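
The accuracy computation above compares the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. A pure-Python sketch of the same idea, using made-up predictions:

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    hits = sum(1 for p, t in zip(predictions, labels) if argmax(p) == argmax(t))
    return hits / len(labels)

# Hypothetical predictions for 4 examples and 3 classes
preds = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
acc = accuracy(preds, labels)
print(acc)
```

Three of the four rows match here; the third row predicts class 2 while the label is class 1.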


In [56]:
sess.close()#finish session

<a id="ref5"></a>
<h2>How to improve our model?</h2>

<h4>Several options are as follows:</h4>
<ul>
    <li>Regularization of Neural Networks using DropConnect</li>
    <li>Multi-column Deep Neural Networks for Image Classification</li> 
    <li>APAC: Augmented Pattern Classification with Neural Networks</li>
    <li>Simple Deep Neural Network with Dropout</li>
</ul>
<h4>In the next part we are going to explore the option:</h4>
<ul>
    <li>Simple Deep Neural Network with Dropout (more than 1 hidden layer)</li>
</ul> 

## II. Classify MNIST using a Deep Neural Network.

In the first part, we learned how to use a simple ANN to classify MNIST. Now we are going to expand our knowledge using a Deep Neural Network. 


The architecture of our network is:

- (Input) -> [batch_size, 28, 28, 1]  >> Apply 32 filters of [5x5]
- (Convolutional layer 1)  -> [batch_size, 28, 28, 32]
- (ReLU 1)  -> [batch_size, 28, 28, 32]
- (Max pooling 1) -> [batch_size, 14, 14, 32]
- (Convolutional layer 2)  -> [batch_size, 14, 14, 64]
- (ReLU 2)  -> [batch_size, 14, 14, 64]
- (Max pooling 2)  -> [batch_size, 7, 7, 64]
- (Fully connected layer 3) -> [batch_size, 1024]
- (ReLU 3)  -> [batch_size, 1024]
- (Dropout)  -> [batch_size, 1024]
- (Fully connected layer 4) -> [batch_size, 10]

The next cells will explore this new architecture.

### Starting the code

In [57]:
import tensorflow as tf

#finish possible remaining session
sess.close()

#Start interactive session
sess = tf.InteractiveSession()


<h3>1. The MNIST data</h3>

In [58]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


### Initial parameters

Create general parameters for the model

In [59]:
width = 28 #width of the image in pixels
height = 28 #height of the image in pixels
flat = width * height # number of pixels in one image
class_output = 10 #number of possible classifications for the problem

### Input and Output

Create placeholders for inputs and outputs:

In [60]:
x = tf.placeholder(tf.float32,shape = [None, flat])
y_act = tf.placeholder(tf.float32,shape = [None, class_output])

#### Converting images of the data set to tensors

In [61]:
x_image = tf.reshape(x,[-1,28,28,1])

_1st_ dimension: batch size (can be any size; -1 lets TensorFlow infer it),

_2nd_ dimension: height,

_3rd_ dimension: width,

_4th_ dimension: image channels (1 channel as grayscale).

### 2. Convolutional Layer 1

<h4>Defining kernel weight and bias</h4>
We define the kernel here. The filter/kernel size is 5x5, the number of input channels is 1 (grayscale), and we need 32 different feature maps. Here, 32 feature maps means that 32 different filters are applied to each image, so the output of the convolutional layer is 28x28x32. In this step, we create a filter/kernel tensor of shape <code>[filter_height, filter_width, in_channels, out_channels]</code>.

In [62]:
W_conv1 = tf.Variable(tf.truncated_normal([5,5,1,32], stddev = 0.1 ))
b_conv1 = tf.Variable(tf.constant(0.1, shape = [32] )) # 32 biases for 32 outputs


<h4>Convolve image with weight tensor and add biases.</h4>

In [63]:
convolve1= tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1


In [64]:
convolve1

<tf.Tensor 'add_8:0' shape=(?, 28, 28, 32) dtype=float32>

#### Apply the ReLU activation function

The ReLU function $f(x)$ is defined as
$$f(x)=\max(0,x)$$

In [65]:
h_conv1 = tf.nn.relu(convolve1)

#### Apply the max pooling

In [66]:
conv1 = tf.nn.max_pool(h_conv1, ksize = [1,2,2,1], strides = [1,2,2,1], padding = 'SAME')#max_pool 2x2
conv1

<tf.Tensor 'MaxPool_2:0' shape=(?, 14, 14, 32) dtype=float32>
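
Max pooling with a 2x2 window and stride 2 keeps the largest value in each non-overlapping 2x2 block, which is why 28x28 becomes 14x14. A minimal pure-Python sketch on a single 4x4 feature map with made-up values (`max_pool_2x2` is a hypothetical helper that assumes even height and width):

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            # Largest of the four values in the current 2x2 block
            row.append(max(image[r][c], image[r][c + 1],
                           image[r + 1][c], image[r + 1][c + 1]))
        pooled.append(row)
    return pooled

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```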

First layer completed!

### 3. Convolutional Layer 2

#### Weights and Biases of kernels

We apply the convolution again in this layer. Let's look at the second layer kernel:  
- Filter/kernel: 5x5 (25 pixels) 
- Input channels: 32 (from the 1st Conv layer, we had 32 feature maps) 
- 64 output feature maps  

<b>Notice:</b> here, the input volume is [14x14x32] and each filter is [5x5x32]. We use 64 such filters, so the output of the convolutional layer is 64 convolved images, that is, a volume of [14x14x64].

<b>Notice:</b> applying one filter of size [5x5x32] to an input of size [14x14x32] produces a single image of size [14x14x1]; that is, the convolution operates on the whole volume, across all 32 channels.
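
The "convolution on a volume" idea can be checked on a tiny example. The sketch below is a plain-Python illustration, not TensorFlow's implementation: it applies one filter to a multi-channel input with stride 1 and zero `'SAME'` padding, assuming an odd kernel size. Each output value sums over the spatial window and over all channels, so one filter yields one 2-D map. The function name and the toy data are hypothetical.

```python
def conv2d_same_single_filter(volume, kernel):
    """Apply one [k][k][C] filter to an [H][W][C] volume (stride 1, zero 'SAME' padding).

    Returns an [H][W] map. As in tf.nn.conv2d, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    height, width, channels = len(volume), len(volume[0]), len(volume[0][0])
    k = len(kernel)
    pad = k // 2  # assumes an odd kernel size
    output = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0.0
            for dr in range(k):
                for dc in range(k):
                    rr, cc = r + dr - pad, c + dc - pad
                    # Positions outside the image count as zeros (padding)
                    if 0 <= rr < height and 0 <= cc < width:
                        for ch in range(channels):
                            total += volume[rr][cc][ch] * kernel[dr][dc][ch]
            output[r][c] = total
    return output

# Toy case: a 4x4 volume with 2 channels of ones and a 3x3x2 filter of ones
volume = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]
kernel = [[[1.0, 1.0] for _ in range(3)] for _ in range(3)]
feature_map = conv2d_same_single_filter(volume, kernel)
print(feature_map[1][1], feature_map[0][0])  # interior value, corner value
```

The output has the same height and width as the input but only one channel; stacking the maps from 64 different filters gives the 64-channel output described above.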

In [67]:
W_conv2 = tf.Variable(tf.truncated_normal([5,5,32,64], stddev = 0.1 ))
b_conv2 = tf.Variable(tf.constant(0.1, shape = [64] )) # 64 biases for 64 outputs


<h4>Convolve image with weight tensor and add biases.</h4>

In [68]:
convolve2= tf.nn.conv2d(conv1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2


#### Apply the ReLU activation function

In [69]:
h_conv2 = tf.nn.relu(convolve2)

#### Apply the max pooling

In [70]:
conv2 = tf.nn.max_pool(h_conv2, ksize = [1,2,2,1], strides = [1,2,2,1], padding = 'SAME')#max_pool 2x2
conv2

<tf.Tensor 'MaxPool_3:0' shape=(?, 7, 7, 64) dtype=float32>

Second layer completed. The output of layer 2 is 64 feature maps of size [7x7].

### 4. Fully Connected Layer (Dense)

You need a fully connected layer to use softmax and produce the class probabilities at the end. The fully connected layer takes the high-level filtered images from the previous layer, that is, all 64 matrices, and converts them to a flat array.

So, each [7x7] matrix is converted to a [49x1] vector, and all 64 of them are combined into one array of size [3136x1]. We connect it to another layer of size [1024x1], so the weight matrix between these two layers is [3136x1024].
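
The arithmetic behind these sizes, and the number of trainable parameters it implies, can be checked with a few lines of Python. This is a back-of-the-envelope sketch based on the layer sizes listed in the architecture above; `dense_params` and `conv_params` are hypothetical helpers that count weights plus biases.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(k, in_channels, out_channels):
    """Weights plus biases of a k x k convolutional layer."""
    return k * k * in_channels * out_channels + out_channels

flat_size = 7 * 7 * 64   # 64 feature maps of 7x7 values each
counts = {
    "conv1": conv_params(5, 1, 32),
    "conv2": conv_params(5, 32, 64),
    "fc1": dense_params(flat_size, 1024),
    "readout": dense_params(1024, 10),
}
total = sum(counts.values())
print(flat_size, counts, total)
```

Almost all of the parameters sit in the [3136x1024] weight matrix of the fully connected layer.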


<img src="https://ibm.box.com/shared/static/pr9mnirmlrzm2bitf1d4jj389hyvv7ey.png" alt="HTML5 Icon" style="width: 800px; height: 400px;"> 


#### Flattening Second Layer

In [71]:
layer2_matrix = tf.reshape(conv2,[-1,7*7*64]) # -1 lets TensorFlow infer the batch dimension


#### Weights and biases between layer 2 and 3

In [72]:
W_fc1 = tf.Variable(tf.truncated_normal([7*7*64,1024], stddev = 0.1 ))
b_fc1 = tf.Variable(tf.constant(0.1, shape = [1024] )) # 1024 biases for 1024 outputs


#### Matrix Multiplication (Applying weights and biases)

In [73]:
fc1 = tf.matmul(layer2_matrix, W_fc1)+b_fc1

#### Apply the ReLU activation function

In [74]:
h_fc1 = tf.nn.relu(fc1)
h_fc1

<tf.Tensor 'Relu_5:0' shape=(?, 1024) dtype=float32>

Third layer is completed!

### 5. Dropout Layer, (Optional phase for reducing overfitting)

In this phase the network "forgets" some features. At each training step, some units are switched off at random for the current mini-batch, so they do not take part in the forward pass. Their weights are therefore not updated in that step, and they do not influence the learning of the other units. This is especially useful for preventing overfitting in very large neural networks.

In [75]:
keep_prob = tf.placeholder(tf.float32)
layer_drop = tf.nn.dropout(h_fc1, rate = 1-keep_prob)
layer_drop

<tf.Tensor 'dropout_1/mul_1:0' shape=(?, 1024) dtype=float32>

Note: p (keep_prob) = 0.5 is a commonly recommended setting for hidden layers; for the input layer, p = 0.8 is usually recommended instead.

At training time, the units that are kept are scaled up by 1/keep_prob so that the expected activation stays the same. At testing time we feed keep_prob = 1.0, so the network behaves like a normal neural network without dropout.
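
The scaling by 1/keep_prob can be illustrated in plain Python. The sketch below is an illustration of the idea, not TensorFlow's implementation: each value is kept with probability `keep_prob` and, if kept, divided by `keep_prob`, so the average activation is roughly preserved. The name `inverted_dropout` and the explicit `rng` argument are our own choices.

```python
import random

def inverted_dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale survivors by 1 / keep_prob."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

# Training-time behaviour: about half of the units are zeroed, the rest are doubled
dropped = inverted_dropout(activations, keep_prob=0.5, rng=rng)
mean_after = sum(dropped) / len(dropped)   # stays close to the original mean of 1.0

# Test-time behaviour: keep_prob = 1.0 leaves the activations unchanged
unchanged = inverted_dropout(activations, keep_prob=1.0, rng=rng)
print(mean_after)
```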

### 6. Readout Layer (Softmax Layer)

Type: Softmax, Fully Connected Layer.

<h4>Weights and Biases</h4>

In the last layer, the CNN takes the high-level features and translates them into votes for each digit using softmax.
Input: 1024 (neurons from the 3rd layer); output: 10 (one per digit class).

In [76]:
W_fc2 = tf.Variable(tf.truncated_normal([1024,10], stddev = 0.1 ))
b_fc2 = tf.Variable(tf.constant(0.1, shape = [10] )) # 10 biases for 10 outputs


#### Matrix Multiplication (Applying weights and biases)

In [77]:
fc = tf.matmul(layer_drop, W_fc2)+b_fc2

#### Apply the Softmax activation function

In [78]:
y_CNN = tf.nn.softmax(fc)

In [79]:
y_CNN

<tf.Tensor 'Softmax_3:0' shape=(?, 10) dtype=float32>

<a id="ref7"></a>
<h2>Summary of the Deep Convolutional Neural Network</h2>

Now is a good time to recall the structure of our network.

#### 0) Input - MNIST dataset
#### 1) Convolutional and Max-Pooling
#### 2) Convolutional and Max-Pooling
#### 3) Fully Connected Layer
#### 4) Processing - Dropout
#### 5) Readout layer - Fully Connected
#### 6) Outputs - Classified digits

We've completed our CNN model. Now we will train it on the MNIST dataset.

### Define functions

#### Define the loss function

We use the cross-entropy function to measure the loss of the model, averaged over the batch.

$$H(p,q) = -\dfrac{1}{N}\sum_{i=1}^{N}\sum_{x}p_i(x)\log q_i(x)$$

where $p_i$ is the true distribution (the one-hot label) of instance $i$, $q_i$ is its predicted distribution, and $N$ is the number of instances.

In [80]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_act*tf.log(y_CNN),axis=1)) #axis replaces the deprecated reduction_indices
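
One caveat with taking the log of a softmax output: if a predicted probability underflows to 0, `log` returns minus infinity and the loss can become infinite or NaN. A common remedy is to compute the loss directly from the logits with the log-sum-exp trick; TensorFlow 1.x offers `tf.nn.softmax_cross_entropy_with_logits_v2` for this, which would take the pre-softmax tensor `fc` instead of `y_CNN`. The plain-Python sketch below shows the idea with made-up logits; the helper names are hypothetical.

```python
import math

def log_softmax(logits):
    """log(softmax(z)) computed as z - logsumexp(z), which never evaluates log(0)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]

def cross_entropy_from_logits(one_hot, logits):
    """Cross-entropy of one example, computed from raw logits."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))

def mean_cross_entropy(batch_labels, batch_logits):
    """Average the per-example losses over the batch (the 1/N factor)."""
    losses = [cross_entropy_from_logits(t, z) for t, z in zip(batch_labels, batch_logits)]
    return sum(losses) / len(losses)

# Extreme logits: a softmax followed by log would evaluate log(0) for the first example
labels = [[1, 0], [0, 1]]
logits = [[-1000.0, 1000.0], [0.0, 0.0]]
loss = mean_cross_entropy(labels, logits)
print(loss)  # large but finite
```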

#### Define the optimizer

We use the Adam optimizer (`tf.train.AdamOptimizer`).

In [81]:
train_step = tf.train.AdamOptimizer(learning_rate = 1e-4).minimize(cross_entropy)
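
For intuition, the textbook Adam update for a single scalar parameter is sketched below. It keeps running averages of the gradient and of the squared gradient, corrects their initial bias, and scales the step accordingly. This is an illustration of the published update rule with its usual default constants; TensorFlow's implementation may differ in details such as where epsilon is applied. `adam_minimize` and the quadratic objective are hypothetical.

```python
import math

def adam_minimize(grad_fn, w, learning_rate=1e-4, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=1):
    """Run `steps` Adam updates on a single scalar parameter w."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w

def grad(w):
    """Gradient of the toy objective f(w) = w^2."""
    return 2.0 * w

w_one_step = adam_minimize(grad, w=1.0, learning_rate=0.1, steps=1)
w_many = adam_minimize(grad, w=1.0, learning_rate=0.1, steps=200)
print(w_one_step, w_many)
```

Note that the very first step has a size of about `learning_rate` regardless of the gradient's magnitude, which is one reason a small value such as 1e-4 works well here.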

#### Define prediction

In [82]:
correct_prediction =  tf.equal(tf.argmax(y_CNN,1),tf.argmax(y_act,1))

#### Define accuracy

In [83]:
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

#### Run session

In [84]:
sess.run(tf.global_variables_initializer())

### Train and evaluate the model

In [None]:
#### Run it if you want to see the train accuracy and loss values 

#### Train model
# train_accuracy=0.0
# train_loss=0.0
# for i in range(1100):
#     batch = mnist.train.next_batch(50)
#     if i%100 == 0:
#         #train_accuracy = accuracy.eval(feed_dict = {x:batch[0], y_act:batch[1],keep_prob: 1.0})
#         train_accuracy, train_loss = sess.run([accuracy,cross_entropy],feed_dict = {x:batch[0], y_act:batch[1],keep_prob: 1.0})
#         print("Step %d, training accuracy %g, training loss %g"%(i,float(train_accuracy),float(train_loss)))
#     train_step.run(feed_dict = {x:batch[0], y_act:batch[1],keep_prob: 0.5})



##### Evaluate model

# #evaluate in batches to avoid out-of-memory issues
# n_batches = mnist.test.images.shape[0]//50
# cumulative_accuracy = 0.0
# for index in range(n_batches):
#     batch = mnist.test.next_batch(50)
#     cumulative_accuracy += accuracy.eval(feed_dict = {x:batch[0], y_act:batch[1],keep_prob: 1.0})
# print("Test accuracy {}".format(cumulative_accuracy/n_batches))




In [86]:
#Lists for storing accuracy and loss values
train_accuracy = []
train_loss = []
test_accuracy = []
test_loss = []
#Setting batch size to avoid costly computation
size_batch = 50
train_n_batches = mnist.train.images.shape[0]//size_batch
test_n_batches = mnist.test.images.shape[0]//size_batch

for _ in range(40):
    cumulative_train_accuracy = 0.0
    cumulative_train_loss =0.0
    cumulative_test_accuracy = 0.0
    cumulative_test_loss =0.0
    #Train the model
    for i in range(train_n_batches):
        batch = mnist.train.next_batch(size_batch)
        batch_train_accuracy, batch_train_loss = sess.run([accuracy,cross_entropy],feed_dict = {x:batch[0], y_act:batch[1],keep_prob: 1.0})
        cumulative_train_accuracy += batch_train_accuracy
        cumulative_train_loss += batch_train_loss
        train_step.run(feed_dict = {x:batch[0], y_act:batch[1],keep_prob: 0.5})
    train_accuracy.append(cumulative_train_accuracy/train_n_batches)
    train_loss.append(cumulative_train_loss/train_n_batches)
    #Evaluate the model (evaluate in batches to avoid out-of-memory issues)
    for index in range(test_n_batches):
        batch = mnist.test.next_batch(size_batch)
        batch_test_accuracy, batch_test_loss = sess.run([accuracy,cross_entropy],feed_dict = {x:batch[0], y_act:batch[1],keep_prob: 1.0})
        cumulative_test_accuracy += batch_test_accuracy
        cumulative_test_loss += batch_test_loss
    test_accuracy.append(cumulative_test_accuracy/test_n_batches)
    test_loss.append(cumulative_test_loss/test_n_batches)    
    

### Plot training and test loss and accuracy

In [None]:
import matplotlib.pyplot as plt

#Training-Test loss
plt.plot(range(len(train_loss)),train_loss, color = 'b', label = "Training Loss" )
plt.plot(range(len(train_loss)),test_loss, color = 'r', label = "Test Loss" )
plt.title("Training-Test loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

In [None]:
#Training-Test accuracy
plt.plot(range(len(train_accuracy)),train_accuracy, color = 'b', label = "Training accuracy" )
plt.plot(range(len(train_accuracy)),test_accuracy, color = 'r', label = "Test accuracy" )
plt.title("Training-Test accuracy")
plt.xlabel("Epochs")
plt.ylabel("accuracy")
plt.legend()
plt.show()

### Visualization

First, we will visualize all the filters of the first convolutional layer.

In [None]:
kernels = sess.run(tf.reshape(tf.transpose(W_conv1, perm=[2, 3, 0,1]),[32, -1]))

In [None]:
!wget --output-document utils1.py http://deeplearning.net/tutorial/code/utils.py
import utils1
from utils1 import tile_raster_images
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline
image = Image.fromarray(tile_raster_images(kernels, img_shape=(5, 5) ,tile_shape=(4, 8), tile_spacing=(1, 1)))
# Plot image
plt.rcParams['figure.figsize'] = (18.0, 18.0)
imgplot = plt.imshow(image)
imgplot.set_cmap('gray')  

Second, we will look at the output of an image passing through the 1st convolutional layer.

In [None]:
import numpy as np
plt.rcParams['figure.figsize'] = (5.0,5.0)
sampleimage = mnist.test.images[1]
plt.imshow(np.reshape(sampleimage,[28,28]),cmap="gray")

In [None]:
ActivatedUnits = sess.run(convolve1,feed_dict={x:np.reshape(sampleimage,[1,784],order='F'),keep_prob:1.0})
filters = ActivatedUnits.shape[3] #remember that size of 1st convolution layer is
                                  # [-1,28,28,32]      
plt.figure(1, figsize=(20,20))
n_columns = 6
n_rows = int(np.ceil(filters / n_columns)) + 1 #np.math is not available in newer NumPy versions
for i in range(filters):
    plt.subplot(n_rows, n_columns, i+1)
    plt.subplots_adjust(wspace=0.5, hspace=0.5)
    plt.title('Image through filter ' + str(i))
    plt.imshow(ActivatedUnits[0,:,:,i], interpolation="nearest", cmap="gray")

Finally, we will look at the output of an image passing through the 2nd convolutional layer.

In [None]:
ActivatedUnits = sess.run(convolve2,feed_dict={x:np.reshape(sampleimage,[1,784],order='F'),keep_prob:1.0})
filters = ActivatedUnits.shape[3] #size of the 2nd convolution output (before pooling) is
                                  # [-1,14,14,64]
plt.figure(1, figsize=(20,20))
n_columns = 8
n_rows = int(np.ceil(filters / n_columns)) + 1 #np.math is not available in newer NumPy versions
for i in range(filters):
    plt.subplot(n_rows, n_columns, i+1)
    plt.subplots_adjust(wspace=0.5, hspace=0.5)
    plt.title('Image through filter ' + str(i))
    plt.imshow(ActivatedUnits[0,:,:,i], interpolation="nearest", cmap="gray")