# Convnet Core Concepts

The video introduces the idea through the example of identifying cats in images. A cat can appear anywhere in a picture, and learning to recognize it separately at every position would take a lot of computing power. If we impose 'translation invariance', we can teach the machine that a cat is still a cat regardless of its position in the picture.

Similarly, if you have a lot of text about kittens, the meaning of 'kittens' doesn't change throughout the text, so the network should only need to learn what 'kittens' means once. To do this, you use a technique called 'weight sharing': inputs that carry the same kind of information share the same weights, and those weights are trained jointly.

Statistical invariants are things that don't change on average across space or time. For images, sharing weights across space is a huge help in classification, which brings us to convolutional neural networks.

For text and sequence processing, the same idea leads to embeddings and recurrent neural networks.

# The first step for a CNN:

The first step is to break up the image into smaller pieces. We do this by selecting a width and height that define a filter. The filter looks at small pieces, or patches, of the image. These patches are the same size as the filter.

It's common to have more than one filter. Different filters pick up different qualities of a patch. For example, one filter might look for a particular color, while another might look for an object of a specific shape. The number of filters in a convolutional layer is called the filter depth.

We then simply slide the filter horizontally or vertically to focus on a different piece of the image. The amount by which the filter slides is referred to as the 'stride'. The stride is a hyperparameter which you, the engineer, can tune. Increasing the stride reduces the size of your model by reducing the number of total patches each layer observes. However, this usually comes with a reduction in accuracy.
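To make the sliding concrete, here is a minimal pure-Python sketch of one filter moving over a single-channel image without padding. The helper name `slide_filter`, the toy 4x4 image, and the 2x2 kernel are assumptions chosen for illustration; they are not part of the lesson code.

```python
def slide_filter(image, kernel, stride):
    """Slide `kernel` over `image` (both 2-D lists) without padding.

    Each output value is the sum of element-wise products between the
    kernel and the patch it currently covers.
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    output = []
    # `top` and `left` mark the upper-left corner of the current patch.
    for top in range(0, img_h - k_h + 1, stride):
        row = []
        for left in range(0, img_w - k_w + 1, stride):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
# Hypothetical kernel: adds the top-left and bottom-right value of each patch.
kernel = [[1, 0], [0, 1]]

stride_1 = slide_filter(image, kernel, stride=1)  # 3x3 output from 9 patches
stride_2 = slide_filter(image, kernel, stride=2)  # 2x2 output from 4 patches
print(stride_1)
print(stride_2)
```

With a stride of 1 the filter visits nine patches, while a stride of 2 visits only four, which is the reduction in observed patches described above.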

Picture the data at each layer as a stack of pancakes, where each pancake is a 'feature map'. When dealing with pictures, because the colors are split between red, green, and blue channels, each color channel can be considered a 'pancake' in the stack. The number of pancakes in the stack is the 'depth', and for this example the depth is 3.

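Valid padding is when you slide the kernel across the image and stop at the edges, so every patch lies entirely inside the image. If the kernel is allowed to go past the edge, 0s are added outside the bounds of the image. When enough 0s are added for the output to keep the same height and width as the input (at a stride of 1), this is called 'same padding'.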
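As a rough check on the two padding modes, the sketch below computes the output size along one dimension using the rules TensorFlow documents for its 'VALID' and 'SAME' options. The function name `output_size` is hypothetical and exists only for this illustration.

```python
import math


def output_size(input_size, filter_size, stride, padding):
    """Output length along one dimension under TensorFlow's padding rules."""
    if padding == 'VALID':
        # No zeros are added; the filter must fit entirely inside the input.
        return math.ceil((input_size - filter_size + 1) / stride)
    if padding == 'SAME':
        # Zeros are added as needed, so the size depends only on the stride.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


valid_size = output_size(32, 8, 2, 'VALID')
same_size = output_size(32, 8, 2, 'SAME')
print(valid_size, same_size)  # prints: 13 16
```

At a stride of 1, 'SAME' padding returns the input size unchanged, which is where the name comes from.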

# Dimensionality:

Given an input layer with a width (or height) of W, a filter with a width (or height) of F, a stride of S, and a padding of P, the following formula gives the width (or height) of the next layer: (W−F+2P)/S+1. The depth of the next layer is equal to the number of filters.

Knowing the dimensionality of each additional layer helps us understand how large our model is and how our decisions around filter size and stride affect the size of our network.

# Equations:

new_height = (input_height - filter_height + 2 * P)/S + 1

new_width = (input_width - filter_width + 2 * P)/S + 1

# Setup

1. H = height, W = width, D = depth
2. We have an input of shape 32x32x3 (HxWxD)
3. 20 filters of shape 8x8x3 (HxWxD)
4. A stride of 2 for both the height and width (S)
5. Padding of size 1 (P)
6. The output depth is always equal to the number of filters





In [2]:
iheight = 32
fheight = 8
paddingsize = 1
stridesize = 2

tempcheck = (iheight - fheight + 2 * paddingsize)/stridesize + 1

print(tempcheck)

14.0


In [8]:
# implementation with Tensorflow

import tensorflow as tf

input = tf.placeholder(tf.float32, (None, 32, 32, 3))
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20))) # (height, width, input_depth, output_depth)
filter_bias = tf.Variable(tf.zeros(20))
strides = [1, 2, 2, 1] # (batch, height, width, depth)
padding = 'VALID'
conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias



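<tf.Tensor 'add_4:0' shape=(?, 13, 13, 20) dtype=float32>

TensorFlow reports a 13x13x20 output rather than the 14x14x20 the formula gives for a padding of 1. With padding = 'VALID', TensorFlow adds no 0s (P = 0) and computes the output height and width as ceil((32 - 8 + 1) / 2) = 13.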

# To calculate how many parameters are being used:
1. H = height, W = width, D = depth
2. We have an input of shape 32x32x3 (HxWxD)
3. 20 filters of shape 8x8x3 (HxWxD)
4. A stride of 2 for both the height and width (S)
5. Padding of size 1 (P)
6. The output depth is always equal to the number of filters
7. Output is 14x14x20

# Without reusing parameters:
Multiply the dimensions of each filter and add 1 for the bias to get the number of parameters per filter. Without weight sharing, every neuron in the output layer has its own set of these parameters, so multiply that value by the dimensions of the output layer. This is a huge number of parameters.

(8 x 8 x 3 + 1) x (14 x 14 x 20) = 756560

# With reusing parameters:
The difference here is we swap (14 * 14 * 20) for 20 by itself. Remember, with weight sharing we use the same filter for an entire depth slice. Because of this we can get rid of 14 * 14 and be left with only 20.

(8 x 8 x 3 + 1) * 20 = 3860

By reusing parameters, we reduce the number of parameters by about 99.5%.
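The two counts and the size of the saving can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation above; the variable names are chosen for illustration.

```python
filter_height, filter_width, input_depth = 8, 8, 3
output_height, output_width, output_depth = 14, 14, 20

# Weights in one filter, plus one bias.
params_per_filter = filter_height * filter_width * input_depth + 1

# Without sharing: every output neuron has its own filter parameters.
without_sharing = params_per_filter * (
    output_height * output_width * output_depth)

# With sharing: one filter per output channel (depth slice).
with_sharing = params_per_filter * output_depth

reduction = 1 - with_sharing / without_sharing
print(without_sharing, with_sharing, round(reduction * 100, 1))
# prints: 756560 3860 99.5
```

The saving works out to 1 - 1/(14 * 14), because weight sharing removes exactly the 14 * 14 spatial positions from the product.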


# Now for practical applications

The code below uses the tf.nn.conv2d() function to compute the convolution with weight as the filter and [1, 2, 2, 1] for the strides. TensorFlow uses a stride for each input dimension, [batch, input_height, input_width, input_channels]. 

We are generally always going to set the stride for batch and input_channels (i.e. the first and fourth element in the strides array) to be 1.

This example code uses a 5x5 filter with a stride of 2 over the input. The tf.nn.bias_add() function adds a 1-d bias to the last dimension of a tensor, here the output depth of the convolution.

In [13]:
import tensorflow as tf

# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)

# Max Pooling
Conceptually, the benefit of the max pooling operation is to reduce the size of the input, and allow the neural network to focus on only the most important elements. Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values. 

TensorFlow provides the tf.nn.max_pool() function to apply max pooling to your convolutional layers.

The tf.nn.max_pool() function performs max pooling with the ksize parameter as the size of the filter and the strides parameter as the length of the stride. 2x2 filters with a stride of 2x2 are common in practice.

The ksize and strides parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor ([batch, height, width, channels]). For both ksize and strides, the batch and channel dimensions are typically set to 1.
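To show what the operation keeps and discards, here is a minimal pure-Python sketch of max pooling on a single feature map, with no padding. The function name `max_pool_2d` and the toy 4x4 feature map are assumptions for illustration; tf.nn.max_pool() applies the same idea to every example and channel of a 4-D tensor.

```python
def max_pool_2d(feature_map, ksize, stride):
    """Keep only the largest value inside each ksize x ksize window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - ksize + 1, stride):
        row = []
        for left in range(0, width - ksize + 1, stride):
            window = [
                feature_map[top + i][left + j]
                for i in range(ksize)
                for j in range(ksize)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]
pooled = max_pool_2d(feature_map, ksize=2, stride=2)
print(pooled)  # prints: [[6, 8], [3, 4]]
```

A 2x2 window with a stride of 2 turns the 4x4 map into a 2x2 map, keeping one value out of every four.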

Recently, pooling layers have fallen out of favor. Some reasons are:
1. Recent datasets are so big and complex we're more concerned about underfitting.
2. Dropout is a much better regularizer.
3. Pooling results in a loss of information. Think about the max pooling operation as an example. We only keep the largest of n numbers, thereby disregarding n-1 numbers completely.


In [15]:
# How to apply max pooling with tensorflow
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')

In [16]:
# Calculate output shape with Tensorflow
input = tf.placeholder(tf.float32, (None, 4, 4, 5))
filter_shape = [1, 2, 2, 1]
strides = [1, 2, 2, 1]
padding = 'VALID'
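pool = tf.nn.max_pool(input, filter_shape, strides, padding)
# pool has shape (?, 2, 2, 5): ceil((4 - 2 + 1) / 2) = 2 for height and width,
# and pooling leaves the depth of 5 unchanged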

# Convolutional Network in TensorFlow
It's time to walk through an example Convolutional Neural Network (CNN) in TensorFlow.

The structure of this network follows the classic structure of CNNs, which is a mix of convolutional layers and max pooling, followed by fully-connected layers.

You've seen this section of code in previous lessons. Here we're importing the MNIST dataset and using a convenient TensorFlow function to batch, scale, and one-hot encode the data.

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

import tensorflow as tf

# Parameters
learning_rate = 0.00001
epochs = 10
batch_size = 128

# Number of samples to calculate validation and accuracy
# Decrease this if you're running out of memory to calculate accuracy
test_valid_size = 256

# Network Parameters
n_classes = 10  # MNIST total classes (0-9 digits)
dropout = 0.75  # Dropout, probability to keep units

# Store layers weight & bias
weights = {
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    'out': tf.Variable(tf.random_normal([1024, n_classes]))}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))}

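def conv2d(x, W, b, strides=1):
    # Convolution, then bias, then ReLU activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # Assumed helper: k x k max pooling with stride k and 'SAME' padding,
    # which halves the height and width when k=2, as conv_net expects
    return tf.nn.max_pool(
        x,
        ksize=[1, k, k, 1],
        strides=[1, k, k, 1],
        padding='SAME')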

def conv_net(x, weights, biases, dropout):
    # Layer 1 - 28*28*1 to 14*14*32
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    conv1 = maxpool2d(conv1, k=2)

    # Layer 2 - 14*14*32 to 7*7*64
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer - 7*7*64 to 1024
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output Layer - class prediction - 1024 to 10
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
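The shape comments in conv_net can be checked with a short pure-Python trace. This sketch assumes each convolution uses a stride of 1 with 'SAME' padding and that each max pooling step halves the height and width, as the comments in conv_net indicate; the helper name `same_size` is hypothetical.

```python
import math


def same_size(size, stride):
    """Output length along one dimension with 'SAME' padding."""
    return math.ceil(size / stride)


height = width = 28
depth = 1
shapes = [(height, width, depth)]

for filters in (32, 64):
    # Convolution: stride 1 keeps height and width; depth becomes the
    # number of filters.
    height, width, depth = same_size(height, 1), same_size(width, 1), filters
    # Max pooling with k=2: stride 2 halves height and width.
    height, width = same_size(height, 2), same_size(width, 2)
    shapes.append((height, width, depth))

flattened = height * width * depth
print(shapes)     # prints: [(28, 28, 1), (14, 14, 32), (7, 7, 64)]
print(flattened)  # prints: 3136
```

The flattened size of 3136 matches the 7*7*64 rows of the 'wd1' weight matrix, which is why the reshape before the fully connected layer works.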

In [24]:
# tf Graph input
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Model
logits = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(\
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\
    .minimize(cost)

# Accuracy
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for batch in range(mnist.train.num_examples//batch_size):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: dropout})

            # Calculate batch loss and accuracy
            loss = sess.run(cost, feed_dict={
                x: batch_x,
                y: batch_y,
                keep_prob: 1.})
            valid_acc = sess.run(accuracy, feed_dict={
                x: mnist.validation.images[:test_valid_size],
                y: mnist.validation.labels[:test_valid_size],
                keep_prob: 1.})

            print('Epoch {:>2}, Batch {:>3} - '
                  'Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(
                epoch + 1,
                batch + 1,
                loss,
                valid_acc))

    # Calculate Test Accuracy
    test_acc = sess.run(accuracy, feed_dict={
        x: mnist.test.images[:test_valid_size],
        y: mnist.test.labels[:test_valid_size],
        keep_prob: 1.})
    print('Testing Accuracy: {}'.format(test_acc))

0.875