# Convolutional Neural Networks

## Setup

H = height, W = width, D = depth

    We have an input of shape 32x32x3 (HxWxD)
    20 filters of shape 8x8x3 (HxWxD)
    A stride of 2 for both the height and width (S)
    Padding of size 1 (P)

Formula for calculating the new height or width:

**new_height = (input_height - filter_height + 2 * P)/S + 1**  
**new_width = (input_width - filter_width + 2 * P)/S + 1**

What's the shape of the output? The answer format is HxWxD

The answer is **14x14x20**.

Plugging the values into the formula gives the new height and width:

new_height = (32 - 8 + 2 * 1)/2 + 1 = 14  
new_width = (32 - 8 + 2 * 1)/2 + 1 = 14

The new depth is equal to the number of filters, which is 20.
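
The calculation above can be wrapped in a small helper. This is a minimal sketch in plain Python; the function name `conv_output_size` and its parameter names are illustrative assumptions, not part of any library.

```python
def conv_output_size(input_size, filter_size, stride, padding):
    """Apply (input - filter + 2 * P) / S + 1 along one spatial dimension."""
    # Integer division: if the stride does not divide evenly, the remainder is dropped.
    return (input_size - filter_size + 2 * padding) // stride + 1

new_height = conv_output_size(32, 8, stride=2, padding=1)
new_width = conv_output_size(32, 8, stride=2, padding=1)
new_depth = 20  # one output channel per filter

print('%dx%dx%d' % (new_height, new_width, new_depth))  # 14x14x20
```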

## Question on number of parameters without parameter sharing 

Without parameter sharing, each neuron in the output layer must connect to each neuron in the filter. In addition, each neuron in the output layer must also connect to a single bias neuron.

In [4]:
# 8 * 8 * 3 is the number of weights in one filter; we add 1 for the bias.
# Without sharing, every output neuron (14 * 14 * 20 of them) needs its own copy of these parameters.
# So we multiply the two numbers together to get the final answer.
print((8*8*3+1)*(14*14*20))
# That's a HUGE amount!

756560


## Question on number of parameters with parameter sharing  

With parameter sharing, each neuron in an output channel shares its weights with every other neuron in that channel. So the number of parameters is equal to the number of neurons in the filter, plus a bias neuron, all multiplied by the number of channels in the output layer.

In [6]:
print((8*8*3+1)*(20))
# That's 196 times fewer parameters!

3860


That's 3840 weights and 20 biases. This should look similar to the answer from the previous quiz. The difference is that the multiplier is just 20 instead of (14 * 14 * 20). Remember, with weight sharing we use the same filter for an entire depth slice. Because of this, we can drop the 14 * 14 factor and are left with only 20.
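
The two counts can be put side by side. The sketch below only restates the arithmetic from the two quizzes; the variable names are illustrative.

```python
filter_height, filter_width, input_depth = 8, 8, 3
output_height, output_width, output_depth = 14, 14, 20

# Weights in one filter, plus one bias
params_per_filter = filter_height * filter_width * input_depth + 1

# Without sharing: every output neuron owns its own filter weights and bias
without_sharing = params_per_filter * (output_height * output_width * output_depth)

# With sharing: one filter (and one bias) per output channel
with_sharing = params_per_filter * output_depth

print(without_sharing)                  # 756560
print(with_sharing)                     # 3860
print(without_sharing // with_sharing)  # 196, i.e. 14 * 14
```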

# CNN in TensorFlow

In [1]:
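# NOTE: the cells below use the TensorFlow 1.x graph API (tf.placeholder, tf.truncated_normal).
# On TensorFlow 2.x, one option is `import tensorflow.compat.v1 as tf` followed by `tf.disable_v2_behavior()`.
import tensorflow as tf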

In [2]:
input = tf.placeholder(tf.float32, (None, 32, 32, 3)) # batch size, image H, image W, RGB
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20))) # (height, width, input_depth, output_depth)
filter_bias = tf.Variable(tf.zeros(20))
strides = [1, 2, 2, 1] # (batch, height, width, depth)
padding = 'VALID'
conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias

Note that the output shape of **conv** will be [1, 13, 13, 20] for a batch of one image. It's 4D to account for the batch size, but more importantly, it's not [1, 14, 14, 20]. This is because the padding algorithm TensorFlow uses is not the same as the formula above: with **'VALID'**, TensorFlow adds no padding at all, so each spatial dimension becomes ceil((32 - 8 + 1) / 2) = 13. Switching **padding** from **'VALID'** to **'SAME'** pads the input so that the output size is ceil(32 / 2) = 16, which gives an output shape of [1, 16, 16, 20]. If you're curious how padding works in TensorFlow, read this document: https://www.tensorflow.org/api_guides/python/nn#Convolution
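
TensorFlow's two padding modes can be checked without running a graph. The sketch below encodes the documented rules (`'VALID'`: ceil((input - filter + 1) / stride), `'SAME'`: ceil(input / stride)); the helper name `tf_conv_output_size` is an assumption made for illustration and is not a TensorFlow function.

```python
import math

def tf_conv_output_size(input_size, filter_size, stride, padding):
    """Output size along one spatial dimension under TensorFlow's padding rules."""
    if padding == 'VALID':
        # No padding: only positions where the filter fits entirely are used
        return math.ceil((input_size - filter_size + 1) / stride)
    if padding == 'SAME':
        # Input is padded as needed; the size depends only on the stride
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

valid_size = tf_conv_output_size(32, 8, 2, 'VALID')
same_size = tf_conv_output_size(32, 8, 2, 'SAME')

print(valid_size)  # 13
print(same_size)   # 16
```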

## TensorFlow Convolution

Let's examine how to implement a CNN in TensorFlow.

TensorFlow provides the **tf.nn.conv2d()** and **tf.nn.bias_add()** functions to create your own convolutional layers.

In [7]:
# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(tf.float32, shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal([filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
# strides: (batch, height, width, depth)
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)

The code above uses the **tf.nn.conv2d()** function to compute the convolution with **weight** as the filter and **[1, 2, 2, 1]** for the strides. TensorFlow uses a stride for each **input** dimension, **[batch, input_height, input_width, input_channels]**. We will almost always set the stride for **batch** and **input_channels** (i.e. the first and fourth elements in the **strides** array) to **1**.

You'll focus on changing **input_height** and **input_width** while setting **batch** and **input_channels** to 1. The **input_height** and **input_width** strides are for striding the filter over **input**. 

**This example code uses a stride of 2 with a 5x5 filter over input.**
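
To see what striding the filter over the input means, here is a minimal pure-Python sketch. It is deliberately simplified and is not how TensorFlow implements the operation: it handles a single channel, uses no padding (unlike the `'SAME'` setting in the cell above), and works on a small 4x4 input with a 2x2 filter instead of the 10x10x3 input and 5x5 filter. The function name `conv2d_single_channel` is hypothetical.

```python
def conv2d_single_channel(image, kernel, stride):
    """Slide `kernel` over `image` with no padding, moving `stride` cells per step."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = (len(image) - kernel_h) // stride + 1
    out_w = (len(image[0]) - kernel_w) // stride + 1
    output = []
    for out_row in range(out_h):
        row = []
        for out_col in range(out_w):
            # Top-left corner of the current window in the input
            top, left = out_row * stride, out_col * stride
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]

strided = conv2d_single_channel(image, kernel, stride=2)
print(strided)  # [[7, 11], [23, 27]]
```

With `stride=1` the same call produces a 3x3 output, because the filter then visits every position where it fits rather than every second one.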

The **tf.nn.bias_add()** function adds a 1-d bias to the last dimension of a tensor, so each output channel gets its own bias value.
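
The effect of adding a 1-d bias along the last dimension can be illustrated with nested lists. This is only a sketch of the idea for one image indexed as [height][width][channel]; `bias_add_sketch` is a hypothetical helper, not the TensorFlow function.

```python
def bias_add_sketch(feature_map, bias):
    """Add bias[c] to every value in channel c of a [height][width][channel] list."""
    return [[[value + bias[c] for c, value in enumerate(pixel)]
             for pixel in row]
            for row in feature_map]

# A 2x2 feature map with 2 channels
feature_map = [[[1.0, 2.0], [3.0, 4.0]],
               [[5.0, 6.0], [7.0, 8.0]]]
bias = [10.0, 100.0]  # one bias per channel

biased = bias_add_sketch(feature_map, bias)
print(biased)
# [[[11.0, 102.0], [13.0, 104.0]], [[15.0, 106.0], [17.0, 108.0]]]
```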

## TensorFlow Max Pooling

![title](img/max-pooling.png)

The image above is an example of **max pooling with a 2x2 filter and stride of 2**. Each of the four colored 2x2 regions marks one position where the filter was applied to find the maximum value.

Conceptually, the benefit of the max pooling operation is to reduce the size of the input, and allow the neural network to focus on only the most important elements. Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values.

TensorFlow provides the **tf.nn.max_pool()** function to apply max pooling to your convolutional layers.

In [8]:
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')

The **tf.nn.max_pool()** function performs max pooling with the **ksize** parameter as the size of the filter and the **strides** parameter as the length of the stride. **2x2 filters with a stride of 2x2 are common in practice**.

The **ksize** and **strides** parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor (**[batch, height, width, channels]**). For both **ksize** and **strides**, the batch and channel dimensions are typically set to **1**.
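
The sketch below shows how the 4-element **ksize** and **strides** lists are interpreted for a single image. It is a simplified pure-Python illustration without padding (for even input sizes and a 2x2 window with stride 2, this matches `'SAME'` as well); `max_pool_hwc` is a hypothetical name, and only the height and width entries of each list are used.

```python
def max_pool_hwc(tensor, ksize, strides):
    """Max-pool one image indexed [height][width][channel], without padding."""
    _, pool_h, pool_w, _ = ksize          # batch and channel entries are 1
    _, stride_h, stride_w, _ = strides    # batch and channel entries are 1
    channels = len(tensor[0][0])
    out_h = (len(tensor) - pool_h) // stride_h + 1
    out_w = (len(tensor[0]) - pool_w) // stride_w + 1
    output = []
    for out_row in range(out_h):
        row = []
        for out_col in range(out_w):
            top, left = out_row * stride_h, out_col * stride_w
            pixel = []
            for ch in range(channels):
                # Each channel is pooled independently of the others
                window = [tensor[top + i][left + j][ch]
                          for i in range(pool_h) for j in range(pool_w)]
                pixel.append(max(window))
            row.append(pixel)
        output.append(row)
    return output

# A 4x4 image with 2 channels: channel 0 counts up, channel 1 counts down
image = [[[r * 4 + c, -(r * 4 + c)] for c in range(4)] for r in range(4)]

pooled = max_pool_hwc(image, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1])
print(pooled)  # [[[5, 0], [7, -2]], [[13, -8], [15, -10]]]
```

The height and width are halved while the number of channels stays at 2.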

Max pooling is generally used to:

* decrease the size of the output
* prevent overfitting

Preventing overfitting is a consequence of reducing the output size, which in turn reduces the number of parameters in future layers.

Recently, pooling layers have fallen out of favor. Some reasons are:

* Recent datasets are so big and complex we're more concerned about underfitting.
* Dropout is a much better regularizer.
* Pooling results in a loss of information. Think about the max pooling operation as an example. We only keep the largest of n numbers, thereby disregarding n-1 numbers completely.

## Quiz Max Pooling 

H = height, W = width, D = depth

    We have an input of shape 4x4x5 (HxWxD)
    Filter of shape 2x2 (HxW)
    A stride of 2 for both the height and width (S)

Recall the formula for calculating the new height or width:

**new_height = (input_height - filter_height)/S + 1  
new_width = (input_width - filter_width)/S + 1**

NOTE: For a pooling layer the output depth is the same as the input depth. Additionally, the pooling operation is applied individually for each depth slice.

What's the shape of the output? Format is HxWxD.

In [16]:
H1=(4-2)/2+1
W1=(4-2)/2+1
print(H1,W1)
D=5

2.0 2.0


In [18]:
print('%dx%dx%d'%(H1,W1,D))

2x2x5


Here's the corresponding code:

In [19]:
input = tf.placeholder(tf.float32, (None, 4, 4, 5))
filter_shape = [1, 2, 2, 1]
strides = [1, 2, 2, 1]
padding = 'VALID'
pool = tf.nn.max_pool(input, filter_shape, strides, padding)

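The output shape of **pool** will be [1, 2, 2, 5] for a single input, even if **padding** is changed to **'SAME'**, because 2x2 windows with a stride of 2 tile the 4x4 input exactly and no padding is needed.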
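
A short check of why both padding modes agree here, using TensorFlow's documented size rules (`'VALID'`: ceil((input - window + 1) / stride), `'SAME'`: ceil(input / stride)). The helper `pool_output_size` is an illustrative assumption, not a TensorFlow function.

```python
import math

def pool_output_size(input_size, window, stride, padding):
    """Output size along one spatial dimension for a pooling window."""
    if padding == 'VALID':
        return math.ceil((input_size - window + 1) / stride)
    if padding == 'SAME':
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")

# 4x4 input, 2x2 window, stride 2: both modes give 2
valid_4 = pool_output_size(4, 2, 2, 'VALID')
same_4 = pool_output_size(4, 2, 2, 'SAME')
print(valid_4, same_4)  # 2 2

# With an odd input size such as 5, the two modes differ
valid_5 = pool_output_size(5, 2, 2, 'VALID')
same_5 = pool_output_size(5, 2, 2, 'SAME')
print(valid_5, same_5)  # 2 3
```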

What's the result of a max pooling operation on the input:

[[[0, 1, 0.5, 10],
   [2, 2.5, 1, -8],
   [4, 0, 5, 6],
   [15, 1, 2, 3]]]

Assume the filter is 2x2 and the stride is 2 for both height and width. The output shape is 2x2x1.

The answering format will be 4 numbers, each separated by a comma, such as: 1,2,3,4.

Work from the top left to the bottom right.

In [21]:
print(2.5,10,15,6)

2.5 10 15 6


What's the result of an average (or mean) pooling operation on the same input?

In [22]:
print((0+1+2+2.5)/4,(.5+10+1-8)/4,(4+0+15+1)/4,(5+6+2+3)/4)

1.375 0.875 5.0 4.0
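
Both hand-computed answers can be verified with one small routine. This is a plain-Python sketch for a single 2-D slice; `pool_2d` and `mean` are hypothetical helpers, and the only difference between max and average pooling is the function applied to each window.

```python
def pool_2d(grid, size, stride, reduce_fn):
    """Apply reduce_fn to each size x size window, top left to bottom right."""
    results = []
    for top in range(0, len(grid) - size + 1, stride):
        for left in range(0, len(grid[0]) - size + 1, stride):
            window = [grid[top + i][left + j]
                      for i in range(size) for j in range(size)]
            results.append(reduce_fn(window))
    return results

def mean(values):
    return sum(values) / len(values)

quiz_input = [[0, 1, 0.5, 10],
              [2, 2.5, 1, -8],
              [4, 0, 5, 6],
              [15, 1, 2, 3]]

max_pooled = pool_2d(quiz_input, size=2, stride=2, reduce_fn=max)
avg_pooled = pool_2d(quiz_input, size=2, stride=2, reduce_fn=mean)

print(max_pooled)  # [2.5, 10, 15, 6]
print(avg_pooled)  # [1.375, 0.875, 5.0, 4.0]
```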
