# 2nd: Convolutional Neural Network

## Data

In [1]:
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data',one_hot=True)

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz


## CNN Architecture

The architecture of the network will be:

 - **Input** -> [batch_size,28,28,1] >> Apply 32 filters of [5x5]
 
 
 - **Convolutional layer 1**
    - **Convolve Operation 1** -> [batch_size,28,28,32]
    - **ReLU 1** -> [?,28,28,32]
    - **Max Pooling 1** -> [?,14,14,32]


 - **Convolutional layer 2**
     - **Convolve Operation 2** -> [?,14,14,64]
     - **ReLU 2** -> [?,14,14,64]
     - **Max Pooling 2** -> [?,7,7,64]


 - **Fully connected layer 3** -> [1x1024]
     - **ReLU 3** -> [1x1024]
     - **Dropout** -> [1x1024]


 - **Fully connected layer 4** -> [1x10]
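
The shapes listed above can be checked with a little arithmetic. The following is a minimal sketch in plain Python (no TensorFlow), assuming 'SAME' padding everywhere, stride 1 for the convolutions and stride 2 for the 2x2 pooling, as in the cells further down. The helper names `same_out` and `trace_shapes` are made up for this illustration.

```python
import math

def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

def trace_shapes(height=28, width=28):
    """Follow one image through conv1, pool1, conv2 and pool2."""
    shapes = [("input", (height, width, 1))]
    for name, filters in (("conv1", 32), ("conv2", 64)):
        # Stride-1 convolution with 'SAME' padding keeps height and width.
        height, width = same_out(height, 1), same_out(width, 1)
        shapes.append((name, (height, width, filters)))
        # 2x2 max pooling with stride 2 halves height and width.
        height, width = same_out(height, 2), same_out(width, 2)
        shapes.append((name + "_pool", (height, width, filters)))
    return shapes

for name, shape in trace_shapes():
    print(name, shape)
```

The last entry is `(7, 7, 64)`, which is why the fully connected layer receives 7\*7\*64 = 3136 values per image.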
 

In [2]:
sess = tf.InteractiveSession()

### Initial parameters

Create general parameters for the model

In [3]:
width = 28 #Width of the image in pixels
height = 28 #Height of the image in pixels
flat = width * height #Number of pixels in one image
class_output = 10 #Number of possible classes

## Input Layer

### Placeholders



In [4]:
x = tf.placeholder(dtype=tf.float32,shape=[None,flat],name='x_placeholder')
y_ = tf.placeholder(dtype=tf.float32,shape=[None,class_output],name='y_gold_placeholder')

### Converting images of the data set to tensors

The input image is 28x28 pixels with 1 channel (grayscale).

**We need to reshape the image to match this format:**

Image = [batch_size, height, width, number of channels]

In [5]:
x_image = tf.reshape(x,[-1,28,28,1]) #-1 in batch_size means any size
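
To see what the reshape does to a single image, here is a small sketch in plain Python. It assumes the flat vector is stored row by row (row-major), which is how **tf.reshape** reads its input. The function `reshape_flat_image` is a hypothetical helper for this illustration only.

```python
def reshape_flat_image(flat_pixels, height, width):
    """Split a flat, row-major pixel list into rows of single-channel pixels."""
    assert len(flat_pixels) == height * width
    return [
        [[flat_pixels[r * width + c]] for c in range(width)]
        for r in range(height)
    ]

# A tiny 2x3 "image" instead of 28x28, so the result is easy to read.
tiny = reshape_flat_image([0, 1, 2, 3, 4, 5], height=2, width=3)
print(tiny)  # [[[0], [1], [2]], [[3], [4], [5]]]

# The same helper on a 784-element vector gives a 28 x 28 x 1 structure.
image = reshape_flat_image(list(range(784)), height=28, width=28)
print(len(image), len(image[0]), len(image[0][0]))  # 28 28 1
```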

## Layer 1: Convolutional layer

Let's make our first convolutional layer.

### Convolve operation

**Defining kernel weight and bias**

 - Kernel: 5x5
 - Input channels: 1
 - Feature maps: 32 => Number of filters applied to each image; it acts like a depth, so the output becomes [height,width,depth]
 
Kernel of shape: [filter_height,filter_width,in_channels,out_channels]

<img src="https://ibm.box.com/shared/static/f4touwscxlis8f2bqjqg4u5zxftnyntc.png" style="width:400px;height:200px;" alt="HTML5 Icon" >

In [6]:
W_conv1 = tf.Variable(tf.truncated_normal(shape=[5,5,1,32],stddev=0.1)) #<- The weights are just our kernel :D

#One bias per output feature map
b_conv1 = tf.Variable(tf.constant(shape=[32],value=0.1))

**Convolve operation with weights and biases**

To create a convolutional layer we use **tf.nn.conv2d**, which computes a 2D convolution given 4D input and filter tensors.

Inputs:
 - Tensor of shape [batch, in_height, in_width, channels] => x_image : [batch_size,28,28,1]
 - A kernel of shape [filter_height, filter_width, in_channels, out_channels] => W_conv1 : [5,5,1,32]
 - Strides of [1,1,1,1]
 
Process:
 - Flattens the filter to a 2D matrix with shape [5*5*1,32]
 - Extracts the image patches from the input tensor to a virtual tensor of shape [batch,28,28,5*5*1]
 - For each patch, right-multiplies the image patch vector by the filter matrix
 
Output:
 - A tensor (2D convolution) of size [batch_size,28,28,32]
  - Notice that the output is like 32 images of [28x28]. Here 32 is considered the volume/depth of the output image
  
  <img src="https://ibm.box.com/shared/static/brosafd4eaii7sggpbeqwj9qmnk96hmx.png" style="width:400px;height:200px;" alt="HTML5 Icon" >
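
The process described above (flatten the filters, extract patches, multiply) can be sketched in plain Python for a single-channel image. This is only an illustration under simplifying assumptions: one input channel, odd square kernels, stride 1 and zero padding as with 'SAME'. The function `conv2d_same` is a made-up name, not a TensorFlow API. Like **tf.nn.conv2d**, it does not flip the kernel.

```python
def conv2d_same(image, kernels):
    """image: H x W list of floats (one channel).
    kernels: list of K x K filters. Returns an H x W x len(kernels) list."""
    h, w = len(image), len(image[0])
    k = len(kernels[0])
    pad = k // 2  # zero padding on each side for odd K and stride 1
    # Flatten each K x K filter into a vector of length K*K.
    flat_kernels = [[v for row in kern for v in row] for kern in kernels]
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            # Extract the K x K patch centred on (r, c); 0 outside the image.
            patch = []
            for dr in range(-pad, pad + 1):
                for dc in range(-pad, pad + 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < h and 0 <= cc < w
                    patch.append(image[rr][cc] if inside else 0.0)
            # One dot product per filter gives the depth vector at (r, c).
            out_row.append(
                [sum(p * f for p, f in zip(patch, fk)) for fk in flat_kernels]
            )
        out.append(out_row)
    return out

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box_sum = [[1.0] * 3 for _ in range(3)]
result = conv2d_same(image, [identity, box_sum])
print(result[1][1])  # [5.0, 45.0]
```

With two filters the output has depth 2 at every pixel; with the 32 filters of `W_conv1` it would have depth 32.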

In [7]:
convolve1 = tf.nn.conv2d(input=x_image,filter=W_conv1,strides=[1,1,1,1],padding='SAME') + b_conv1

### ReLU

Go through all outputs of the convolution layer and apply ReLU (turn negative values into 0).

In [8]:
h_conv1 = tf.nn.relu(convolve1)

### Max Pooling

Use max pooling; it'll reduce the output to [14,14,32] per image.

This is because max pooling keeps only the maximum value in each window, simplifying the input by exploiting spatial correlations between neighbouring values.

- **Value:** The input => The previous ReLU output


- **Kernel size:** 2x2 (halves the input)

 - Caution: It's defined as 'The size of the window for each dimension of the input tensor.' -> [2x2] => [1,2,2,1]


- **Strides:** The sliding behaviour, 2 pixels every time, thus no overlapping



- **Padding:** SAME

<img src="https://ibm.box.com/shared/static/awyoq0e2r3hfx3n7xrvhw4y7gly683p4.png" alt="HTML5 Icon" style="width:400px;height:200px;"> 
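
As a rough sketch of the operation, the following plain-Python function applies a 2x2 window with stride 2 to one feature map. It assumes even height and width, so no padding is ever needed. `max_pool_2x2` is a hypothetical helper, not the TensorFlow call.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an H x W list (H and W even)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 0, 5, 6],
      [1, 2, 7, 8]]
pooled = max_pool_2x2(fm)
print(pooled)  # [[4, 2], [2, 8]]
```

A 4x4 map becomes 2x2, in the same way that each 28x28 feature map becomes 14x14.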

In [34]:
h_pool1 = tf.nn.max_pool(value=h_conv1,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

#### First Layer Complete

In [35]:
layer1 = h_pool1

End of Convolutional Layer 1

## Layer 2: Convolutional layer

### Convolve operation

**Defining kernel weight and bias**

 - Kernel: 5x5
 - Input channels: 32 (from the 1st Conv Layer, we had 32 feature maps) 
 - Feature maps: 64 => Number of filters applied to each image; it acts like a depth, so the output becomes [height,width,depth]
 
**Notice:**
 - Input: [14x14x32]
 - Kernel: [5x5x32]
 - Filters: 64

So the output of the convolution will be [14,14,64]

In [36]:
W_conv2 = tf.Variable(tf.truncated_normal(shape=[5,5,32,64],stddev=0.1)) #<- The weights are just our kernel :D

#One bias per output feature map
b_conv2 = tf.Variable(tf.constant(shape=[64],value=0.1))

#### Convolve operation

In [37]:
convolve2 = tf.nn.conv2d(input=layer1,filter=W_conv2,strides=[1,1,1,1],padding='SAME') + b_conv2

#### ReLU 

In [38]:
h_conv2 = tf.nn.relu(convolve2)

#### Max Pooling

In [39]:
h_pool2 = tf.nn.max_pool(value=h_conv2,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

#### Second Layer Complete

In [40]:
layer2 = h_pool2

End of Convolutional Layer 2, whose output is:
 - 64 matrices of 7x7

## Layer 3: Fully Connected Layer

A fully connected layer is needed before the softmax that produces the class probabilities at the end. Fully connected layers take the high-level 'images' from the previous layer, that is, all 64 matrices, and convert them to an array (flatten).

1. Each matrix [7x7] -> [49x1]
2. All 64 matrices of [49,1] are joined into [3136x1]
3. Connect the [3136x1] to [1024x1] 
    - The weights between them will be [3136,1024]
    
<img src="https://ibm.box.com/shared/static/hvbegd0lfr1maxpq2gpq3g8ibvk8d2eo.png" alt="HTML5 Icon" style="width:400px;height:200px;"> 
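
The flattening step can be sketched in plain Python. One caveat, stated as an assumption about ordering: a row-major reshape (which is what **tf.reshape** performs) walks the [height,width,channels] volume with the channel index varying fastest, so the 64 maps end up interleaved rather than stacked one after another. The resulting length is 3136 either way. `flatten` is a made-up helper name.

```python
def flatten(volume):
    """Flatten an H x W x C nested list in row-major order."""
    return [v for row in volume for pixel in row for v in pixel]

# A tiny 2 x 2 x 2 volume instead of 7 x 7 x 64.
volume = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]
print(flatten(volume))  # [1, 2, 3, 4, 5, 6, 7, 8]

flat_size = 7 * 7 * 64
print(flat_size)  # 3136
```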


#### Flattening layer 

So we take the output of the last layer and reshape it

In [41]:
layer2_matrix = tf.reshape(layer2,[-1,7*7*64])

#### Weights and biases between layer 2 and 3 (in the fully connected layer)

 - The 64 matrices of [7x7] => 3136 inputs
 - The outputs passed on to the readout (softmax) layer => 1024

In [42]:
W_fc1 = tf.Variable(tf.truncated_normal(shape=[7*7*64,1024],stddev=0.1))
b_fc1 = tf.Variable(tf.constant(value=0.1,shape=[1024]))

#### Regular operation 

In [43]:
fcl3 = tf.matmul(layer2_matrix,W_fc1) + b_fc1

#### ReLU

In [44]:
h_fc1 = tf.nn.relu(fcl3)

#### Third layer complete

In [45]:
layer3 = h_fc1

### Improvements for Fully Connected Layer

### Dropout

It's a phase where the network 'forgets' some features: during training, some units are randomly switched off so that they do not take part in that step. This helps prevent overfitting.

 - **keep_prob:** A placeholder of type float32 holding the probability of keeping each unit
 - **x:** The layer to apply dropout to
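
A minimal sketch of what dropout does to a vector of activations, in plain Python. It follows the behaviour of **tf.nn.dropout** with `keep_prob`: each value is kept with probability `keep_prob` and the survivors are scaled by 1/keep_prob so the expected sum stays the same. The function `dropout` and the seeded generator here are illustrative assumptions, not the TensorFlow implementation.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob; scale survivors by 1/keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.5, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(round(kept_fraction, 2), round(mean_after, 2))
```

Roughly half of the units survive, each with value 2.0, so the mean stays close to 1.0. With `keep_prob=1.0` nothing is dropped, which is the setting used for evaluation.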

In [46]:
keep_prob = tf.placeholder(dtype=tf.float32)
layer3_drop = tf.nn.dropout(x=layer3,keep_prob=keep_prob) #keep_prob is the probability of keeping a unit (1 - dropout rate)

End of Fully Connected Layer 3

## Layer 4: Readout layer (Softmax layer, output layer)

A Softmax, fully connected layer for the output


### Weight and biases

In the last layer, the CNN takes the high-level filtered images and translates them into votes using softmax.

 - Input: 1024 neurons from the 3rd (fully connected) layer
 - Output: 10 possible classes

In [47]:
W_fc2 = tf.Variable(tf.truncated_normal(shape=[1024,10],stddev=0.1))
b_fc2 = tf.Variable(tf.constant(value=0.1,shape=[10]))

#### Regular operation 

In [48]:
fcl4 = tf.matmul(layer3_drop,W_fc2) + b_fc2

#### Softmax activation

Softmax allows us to interpret the outputs of fcl4 as probabilities, so **y_conv** is a tensor of probabilities.
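
For reference, softmax on a single row of scores can be sketched in plain Python as follows. Subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result. The `softmax` function and the example scores are made up for illustration.

```python
import math

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(sum(probs))
```

The largest score gets the largest probability, and the values always add up to 1 (up to floating-point rounding).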

In [49]:
y_conv = tf.nn.softmax(fcl4)

#### Fourth Layer Complete

In [50]:
layer4 = y_conv

## Summary of the CNN

0. Input - MNIST dataset
1. Convolutional - ReLU - MaxPooling
2. Convolutional - ReLU - MaxPooling
3. Fully Connected Layer w/dropout
4. Readout Layer - Softmax - Fully connected
5. Output - Classified Digits
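
As a sanity check on the size of this network, the trainable parameters can be counted from the weight and bias shapes defined in the cells above. This is a small arithmetic sketch in plain Python; `conv_params` and `dense_params` are hypothetical helper names.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights of a kernel x kernel convolution plus one bias per filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(inputs, outputs):
    """Weights of a fully connected layer plus one bias per output."""
    return inputs * outputs + outputs

layer_params = {
    "conv1": conv_params(5, 1, 32),        # W_conv1, b_conv1
    "conv2": conv_params(5, 32, 64),       # W_conv2, b_conv2
    "fc1": dense_params(7 * 7 * 64, 1024), # W_fc1, b_fc1
    "fc2": dense_params(1024, 10),         # W_fc2, b_fc2
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(name, count)
print("total", total_params)  # total 3274634
```

Almost all of the parameters sit in the first fully connected layer, which is one reason dropout is applied there.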

### Define functions and train the model

#### Loss function

We need to compare our output layer 4 tensor with the correct labels. We can use cross entropy.
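
With one-hot labels, cross entropy reduces to the negative log of the probability assigned to the correct class, averaged over the batch. A minimal plain-Python sketch, with made-up labels and probabilities:

```python
import math

def cross_entropy(one_hot_labels, probabilities):
    """Mean over the batch of -sum(y * log(p)) for each example."""
    per_example = [
        -sum(y * math.log(p) for y, p in zip(label, probs))
        for label, probs in zip(one_hot_labels, probabilities)
    ]
    return sum(per_example) / len(per_example)

labels = [[1, 0, 0], [0, 1, 0]]
probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]
loss = cross_entropy(labels, probs)
print(round(loss, 3))  # 0.693, i.e. -log(0.5)
```

Note that `math.log(0)` raises an error here, and the TensorFlow expression has the analogous problem of producing infinities when a predicted probability reaches 0. A commonly used alternative in TensorFlow 1.x is **tf.nn.softmax_cross_entropy_with_logits**, which works on the pre-softmax values; this notebook does not use it.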

In [51]:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_*tf.log(layer4),reduction_indices=[1]))

#### Optimizer

It'll be done by an optimizer called Adam

In [52]:
learning_rate = 1e-4
train_step = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

#### Define prediction

Let's check which predictions are correct and compute the accuracy

In [53]:
#Boolean vector: True where the predicted class equals the true class
correct_prediction = tf.equal(tf.argmax(layer4,1),tf.argmax(y_,1))

#Accuracy: fraction of correct predictions
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
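
The same computation on a handful of made-up predictions, as a plain-Python sketch: take the index of the largest value in each row, compare it with the index of the 1 in the one-hot label, and average the matches.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_of(predictions, one_hot_labels):
    """Fraction of rows whose predicted class matches the label."""
    correct = [argmax(p) == argmax(y)
               for p, y in zip(predictions, one_hot_labels)]
    return sum(correct) / len(correct)

preds = [[0.1, 0.7, 0.2],
         [0.8, 0.1, 0.1],
         [0.3, 0.3, 0.4],
         [0.2, 0.5, 0.3]]
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 1, 0]]
acc_example = accuracy_of(preds, labels)
print(acc_example)  # 0.75
```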

### Let's Train!

In [59]:
sess.close()

sess = tf.InteractiveSession()

tf.global_variables_initializer().run()

### Lightweight version

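Try 20,000 steps for a really hardcore version. Note that the `epochs` variable below counts training steps (one batch each), not full passes over the data set.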
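
To relate the number of steps to passes over the data, here is a small arithmetic sketch. It assumes the training split holds 55,000 images, which is the default for `input_data.read_data_sets`; if your split differs, `mnist.train.num_examples` gives the actual number.

```python
train_examples = 55000  # assumption: default training split size
batch_size = 50

def passes_over_data(steps):
    """How many times the training set is seen after the given number of steps."""
    return steps * batch_size / train_examples

print(passes_over_data(1100))             # 1.0
print(round(passes_over_data(20000), 1))  # 18.2
```

Under that assumption, the lightweight run sees each training image about once, while 20,000 steps correspond to roughly 18 passes.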

In [60]:
epochs = 1100
batch_size = 50

for step in range(epochs):
    batch = mnist.train.next_batch(batch_size)
    feed = {x:batch[0],y_:batch[1],keep_prob:0.5} #Because of the dropout we need to specify the keep probability
    
    if step % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x:batch[0],y_:batch[1],keep_prob:1.0}) 
        print('At step %d the training accuracy: %.2f' %(step, train_accuracy))
        
    train_step.run(feed_dict=feed)

At step 0 the training accuracy: 0.10
At step 100 the training accuracy: 0.94
At step 200 the training accuracy: 0.92
At step 300 the training accuracy: 1.00
At step 400 the training accuracy: 0.96
At step 500 the training accuracy: 0.94
At step 600 the training accuracy: 0.94
At step 700 the training accuracy: 0.98
At step 800 the training accuracy: 0.92
At step 900 the training accuracy: 0.96
At step 1000 the training accuracy: 0.98


---

## Evaluate the model

Print the evaluation on the test set

In [63]:
acc = accuracy.eval(feed_dict={x : mnist.test.images, y_ : mnist.test.labels,keep_prob:1.0}) * 100

print('The final accuracy for the CNN model on test set is: %.2f' % acc)

The final accuracy for the CNN model on test set is: 96.56
