# Assignment 4: Convolutional Models
## Design and train a Convolutional Neural Network

*Note: The assignments in this course build on each other, so please finish them in order.*

### Starter Code
Open the iPython notebook for this assignment ([4_convolutions.ipynb](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/4_convolutions.ipynb)), and follow the instructions to implement and run each indicated step. Some steps have been implemented for you.

### Evaluation
This is a self-evaluated assignment. As you go through the notebook, make sure you are able to solve each problem and answer any posed questions (save your responses as markdown in the notebook).

Improve the model by experimenting with its structure - how many layers, how they are connected, stride, pooling, etc. For more efficient training, try applying techniques such as dropout and learning rate decay. What does your final architecture look like?

------------------------------

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [5]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:

- convolutions need the image data formatted as a cube (width by height by #channels), i.e. each input is a $\text{width}\times\text{height}\times\text{depth}$ volume, where depth is the number of channels
- labels as float 1-hot encodings. $[ 0 \quad 1 \quad 0 \quad 0 \quad \cdots \quad 0 \quad 0 \quad]$
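
Both bullets can be checked on a toy batch. The following is a minimal NumPy sketch. It assumes three made-up labels and blank 28×28 images (both hypothetical) and uses the same broadcasting comparison as `reformat`:

```python
import numpy as np

num_labels = 10
image_size = 28
num_channels = 1

# Hypothetical toy data: three grayscale images and their integer labels
toy_images = np.zeros((3, image_size, image_size), dtype=np.float32)
toy_labels = np.array([2, 0, 9])

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
cube = toy_images.reshape((-1, image_size, image_size, num_channels))

# Broadcasting (10,) against (3, 1) gives a (3, 10) boolean matrix with
# exactly one True per row, in the column equal to that row's label
one_hot = (np.arange(num_labels) == toy_labels[:, None]).astype(np.float32)

print(cube.shape)   # (3, 28, 28, 1)
print(one_hot[0])   # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```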

In [6]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [21]:
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))/predictions.shape[0])

### Convnet: Two Convolutional Layers, One Fully Connected Layer and Softmax

- Let's build a small network with two convolutional layers, followed by one fully connected layer.
- Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.

### Part 1: Load Data & Build Computation Graph
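- **Weight Initialization**
    - Weights should generally be initialized with a small amount of noise, both to break symmetry between units and to avoid zero gradients.
    - Since we're using ReLU neurons, it is also good practice to initialize the biases to a slightly positive value to avoid "dead neurons".
        - Here the weights are drawn from a truncated normal distribution with stddev=0.1. The biases of layers 2 to 4 start at 1.0, and those of layer 1 start at 0.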
        
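- **Calculating Output Size**
    - $O = \left\lfloor \frac{W-K+2P}{S} \right\rfloor + 1$
        - O is the output height/width
        - W is the input height/width
        - K is the filter size
        - P is the padding added on each side
            - "valid" = 0 (no padding)
            - "same": TensorFlow pads just enough that $O = \lceil W/S \rceil$, independent of K
        - S is the stride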
        
Manual Calculation of Image Output Size
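
These calculations can be written as a small plain-Python sketch. The helper names `conv_output_size` and `tf_output_size` are hypothetical. The first helper uses an explicit symmetric padding P on each side with a floored quotient. The second follows TensorFlow's documented rules for its `'SAME'` and `'VALID'` padding strings:

```python
import math

def conv_output_size(w, k, s, p=0):
    """Output length for explicit symmetric padding p on each side."""
    return (w - k + 2 * p) // s + 1

def tf_output_size(w, k, s, padding):
    """Output length for TensorFlow's 'SAME' / 'VALID' padding strings."""
    if padding == 'SAME':
        # Pads just enough that every stride position is used; k does not matter
        return math.ceil(w / s)
    if padding == 'VALID':
        # No padding: the filter must fit entirely inside the input
        return math.ceil((w - k + 1) / s)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# 28x28 input, 5x5 filter, stride 2
print(tf_output_size(28, 5, 2, 'SAME'))   # 14
print(tf_output_size(14, 5, 2, 'SAME'))   # 7
print(tf_output_size(28, 5, 2, 'VALID'))  # 12
print(conv_output_size(28, 5, 2, p=0))    # 12
```

For "valid" padding the two helpers agree. For "same", TensorFlow may pad asymmetrically, which is why it is easier to use the ceiling rule directly.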

In [22]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

In [23]:
graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    # - convolutional layers
    layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    # - fully connected
    layer3_weights = tf.Variable(tf.truncated_normal([image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    # - Output : Softmax Layer
    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    # Model.
    def model(data):
        # convolutional layers
        conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer1_biases)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer2_biases)
        # fully connected
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        return tf.matmul(hidden, layer4_weights) + layer4_biases # Output: logits (softmax is applied outside the model)
  
    # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))
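
As a sanity check on the variable shapes in this graph, the trainable parameters can be counted by hand. This sketch assumes the hyperparameters from the previous cell (`patch_size = 5`, `depth = 16`, `num_hidden = 64`) and two stride-2 `'SAME'` convolutions, which shrink 28×28 to 7×7. It also shows that `image_size // 4 * image_size // 4 * depth` is evaluated left to right but still gives 7 × 7 × 16 = 784 here:

```python
image_size, num_channels, num_labels = 28, 1, 10
patch_size, depth, num_hidden = 5, 16, 64

# Two stride-2 'SAME' convolutions: 28 -> 14 -> 7
final_size = image_size // 2 // 2
flat = final_size * final_size * depth

# Weight shapes as declared in the graph: (weight shape, bias length)
layers = {
    'conv1': ((patch_size, patch_size, num_channels, depth), depth),
    'conv2': ((patch_size, patch_size, depth, depth), depth),
    'fc3': ((flat, num_hidden), num_hidden),
    'out4': ((num_hidden, num_labels), num_labels),
}

counts = {}
for name, (w_shape, b_len) in layers.items():
    n = 1
    for dim in w_shape:
        n *= dim
    counts[name] = n + b_len

print(flat)                                     # 784
print(image_size // 4 * image_size // 4 * depth)  # 784, same as flat
print(counts)
print(sum(counts.values()))                     # 57722
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.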

In [24]:
num_steps  = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
          [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
            valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.737810
Minibatch accuracy: 12.5%
Validation accuracy: 10.0%
Minibatch loss at step 50: 1.634895
Minibatch accuracy: 37.5%
Validation accuracy: 57.5%
Minibatch loss at step 100: 0.898253
Minibatch accuracy: 75.0%
Validation accuracy: 71.4%
Minibatch loss at step 150: 0.732456
Minibatch accuracy: 75.0%
Validation accuracy: 74.2%
Minibatch loss at step 200: 0.498377
Minibatch accuracy: 87.5%
Validation accuracy: 77.9%
Minibatch loss at step 250: 0.901544
Minibatch accuracy: 75.0%
Validation accuracy: 78.3%
Minibatch loss at step 300: 1.158538
Minibatch accuracy: 62.5%
Validation accuracy: 77.9%
Minibatch loss at step 350: 0.336097
Minibatch accuracy: 93.8%
Validation accuracy: 79.7%
Minibatch loss at step 400: 1.201921
Minibatch accuracy: 56.2%
Validation accuracy: 78.8%
Minibatch loss at step 450: 0.848476
Minibatch accuracy: 68.8%
Validation accuracy: 80.1%
Minibatch loss at step 500: 0.234797
Minibatch accuracy: 93.8%
Validation accuracy: 81.0%

---------------------

## Problem 1
The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. 
- Replace the strides with a max pooling operation (nn.max_pool()) of stride 2 and kernel (patch) size 2.

### Formats

- tf.nn.conv2d(input, filter, strides, padding)
- tf.nn.max_pool(value, ksize, strides, padding)
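
To see what `tf.nn.max_pool` with `ksize=[1, 2, 2, 1]` and `strides=[1, 2, 2, 1]` does to one feature map, here is a plain NumPy sketch. It is not TensorFlow's implementation. It assumes a single 2-D channel whose sides are even, so no padding is involved:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then take each block's max
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 5],
                        [0, 1, 7, 2],
                        [6, 2, 3, 1]])
print(max_pool_2x2(feature_map))
# [[4 5]
#  [6 7]]
```

With a 2×2 window and stride 2, TensorFlow's `padding='SAME'` only adds padding when a side has odd length.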

### Hyperparameters

- how many layers?
- how many conv layers?
- K : the filter size
- S : the stride
- P : the padding ("valid" = 0; "same" pads so that the output size is ceil(W / S))

image_size = 28x28, filter_size = 5, stride = 2

In [30]:
# Output size calculation for TensorFlow "same" padding:
# O = ceil(W / S), which does not depend on the filter size
output_1 = np.ceil(28.00 / 2)
print(output_1, 'x', output_1)
output_2 = np.ceil(output_1 / 2)
print(output_2, 'x', output_2)

14.0 x 14.0
7.0 x 7.0


New function for Image Size: No Pooling

In [32]:
# Compute the image size after 2 convolutions (no pooling) from the
# input size, filter size, padding and stride, using TensorFlow's rules:
#   'same':  O = ceil(W / S)
#   'valid': O = ceil((W - K + 1) / S)
def output_size_no_pool(input_size, filter_size, padding, conv_stride):
    if padding not in ('same', 'valid'):
        return None
    size = float(input_size)
    for _ in range(2):  # two convolutional layers
        if padding == 'same':
            size = np.ceil(size / conv_stride)
        else:
            size = np.ceil((size - filter_size + 1) / conv_stride)
    return int(size)

patch_size = 5
final_image_size = output_size_no_pool(image_size, patch_size, padding='same', conv_stride=2)
print(final_image_size)

7


**Part 1: Load Data & Build Computation Graph**

In [35]:
# batch_size = 16
# patch_size = 5
# depth = 16
# num_hidden = 64

batch_size = 16
# depth is the number of output channels of each convolution;
# num_channels (set to 1 earlier) is the number of input channels
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    '''Input data'''
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    '''Variables'''
    # Convolution 1 Layer
    # Input channels: num_channels = 1
    # Output channels: depth = 16
    layer1_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    
    # Convolution 2 Layer
    # Input channels: depth = 16
    # Output channels: depth = 16
    layer2_weights = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    
    # Fully Connected Layer (Densely Connected Layer)
    # Input: the flattened final_image_size x final_image_size x depth feature maps; output: num_hidden units
    layer3_weights = tf.Variable(tf.truncated_normal([final_image_size * final_image_size * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    
    # Readout layer: Softmax Layer
    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))

    '''Model'''
    def model(data):
        # First Convolutional Layer with Pooling
        conv_1 = tf.nn.conv2d(data, layer1_weights, strides=[1, 1, 1, 1], padding='SAME')
        hidden_1 = tf.nn.relu(conv_1 + layer1_biases)
        pool_1 = tf.nn.max_pool(hidden_1, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        # Second Convolutional Layer with Pooling
        conv_2 = tf.nn.conv2d(pool_1, layer2_weights, strides=[1, 1, 1, 1], padding='SAME')
        hidden_2 = tf.nn.relu(conv_2 + layer2_biases)
        pool_2 = tf.nn.max_pool(hidden_2, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        
        # Fully Connected Layer
        shape = pool_2.get_shape().as_list()
        reshape = tf.reshape(pool_2, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        
        # Readout Layer: returns logits; the softmax is applied outside the model
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    '''Training computation'''
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf_train_labels))

    '''Optimizer'''
    # Learning rate of 0.05
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

    '''Predictions for the training, validation, and test data'''
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))
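
Replacing the stride-2 convolutions with stride-1 convolutions plus pooling leaves the final 7×7×16 shape unchanged, so `layer3_weights` keeps the same shape. The convolutions, however, now run on larger maps. The rough sketch below counts multiply-accumulates (MACs) per image for the two convolutional layers. It assumes `'SAME'` padding throughout and ignores biases, ReLU and the pooling itself:

```python
def conv_macs(out_size, patch, in_ch, out_ch):
    # Each output pixel of each output channel needs patch * patch * in_ch MACs
    return out_size * out_size * patch * patch * in_ch * out_ch

patch, depth = 5, 16

# Original model: stride-2 convolutions, 28 -> 14 -> 7
strided = conv_macs(14, patch, 1, depth) + conv_macs(7, patch, depth, depth)

# Problem 1 model: stride-1 convolutions, each followed by 2x2 / stride-2 max pooling
# conv1 runs on 28x28, pool -> 14, conv2 runs on 14x14, pool -> 7
pooled = conv_macs(28, patch, 1, depth) + conv_macs(14, patch, depth, depth)

print(strided)           # 392000
print(pooled)            # 1568000
print(pooled / strided)  # 4.0
```

Under these assumptions, the pooled model does roughly four times as much convolution arithmetic per image.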

#### Part 2: Run Computation & Iterate

In [None]:
num_steps = 30000

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 5000 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.075693
Minibatch accuracy: 12.5%
Validation accuracy: 10.0%
Minibatch loss at step 5000: 0.400405
Minibatch accuracy: 87.5%
Validation accuracy: 87.7%
Minibatch loss at step 10000: 0.795989
Minibatch accuracy: 75.0%
Validation accuracy: 88.8%
Minibatch loss at step 15000: 0.364743
Minibatch accuracy: 87.5%
Validation accuracy: 89.1%
Minibatch loss at step 20000: 0.461208
Minibatch accuracy: 87.5%
Validation accuracy: 89.6%
Minibatch loss at step 25000: 0.450123
Minibatch accuracy: 87.5%
Validation accuracy: 90.4%
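
Both training loops select minibatches with `offset = (step * batch_size) % (train_labels.shape[0] - batch_size)`. This walks through the training set in order and wraps around. The sketch below uses the sizes printed earlier (200000 training examples, `batch_size = 16`) to show where the offsets land. With these numbers, the last 16 examples never appear in a batch:

```python
num_examples = 200000   # train_dataset.shape[0] as printed above
batch_size = 16

def batch_offset(step):
    # Same expression as in the training loops
    return (step * batch_size) % (num_examples - batch_size)

# (200000 - 16) is a multiple of 16, so the offsets cycle every 12499 steps
steps_per_pass = (num_examples - batch_size) // batch_size
print(steps_per_pass)                             # 12499
print(batch_offset(12498), batch_offset(12499))   # 199968 0

# Collect every example index that appears in a batch over one full cycle
seen = set()
for step in range(steps_per_pass):
    off = batch_offset(step)
    seen.update(range(off, off + batch_size))
print(num_examples - len(seen))                   # 16
print(30000 / steps_per_pass)                     # passes made by the 30000-step run
```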


-----------------------

## Problem 2
Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.
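
Dropout and learning rate decay are both short to state. The NumPy sketch below shows two things:

- Inverted dropout: each activation is zeroed with probability `1 - keep_prob` during training, and the survivors are scaled by `1 / keep_prob`. This matches the documented behaviour of TF 1.x `tf.nn.dropout(x, keep_prob)`.
- Exponential decay, computed as `tf.train.exponential_decay` documents it: `lr0 * decay_rate ** (step / decay_steps)`, where `staircase=True` floors the exponent.

The values `keep_prob=0.5`, `decay_steps=1000` and `decay_rate=0.96` are illustrative assumptions, not tuned settings:

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob, scale by 1/keep_prob."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

def exponential_decay(lr0, step, decay_steps, decay_rate, staircase=False):
    """lr0 * decay_rate ** (step / decay_steps); staircase floors the exponent."""
    exponent = step / decay_steps
    if staircase:
        exponent = np.floor(exponent)
    return lr0 * decay_rate ** exponent

rng = np.random.default_rng(0)
hidden = np.ones((4, 64))   # stand-in for a batch of hidden activations
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
# Surviving units become 2.0 and the rest 0.0, so the expected value stays 1.0
print(np.unique(dropped))   # [0. 2.]

for step in (0, 999, 1000, 5000):
    print(step, exponential_decay(0.05, step, 1000, 0.96, staircase=True))
```

In this notebook's graph, dropout would typically be applied only on the training path (`model(tf_train_dataset)`), not when computing `valid_prediction` and `test_prediction`.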

### Details

- image_size = 28
- Convolutions
    - conv_filter_size = 5
    - conv_stride = 1
- Average Pooling
    - pool_filter_size = 2
    - pool_stride = 2
- padding='valid'
- Prevent overfitting and improve convergence
    - Learning rate decay
    - Regularization
    - Dropout
- Layers
    - Convolution
    - Pooling
    - Convolution
    - Pooling
    - Fully-connected
    - Fully-connected
    - Readout
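
With `padding='valid'`, 5×5 convolutions at stride 1 and 2×2 pooling at stride 2, the spatial size shrinks quickly. The sketch below traces the shapes using TensorFlow's `'VALID'` rule O = ceil((W - K + 1) / S). The channel depths (6 and 16) and the fully connected widths (120 and 84) are assumptions borrowed from the classic LeNet-5 layout, not values fixed by this notebook:

```python
import math

def valid_out(w, k, s):
    # TensorFlow 'VALID' padding: the window must fit entirely inside the input
    return math.ceil((w - k + 1) / s)

# (layer name, kernel size, stride, output channels); depths are assumed
conv_pool = [
    ('conv1', 5, 1, 6),
    ('pool1', 2, 2, None),
    ('conv2', 5, 1, 16),
    ('pool2', 2, 2, None),
]

size, channels = 28, 1
for name, k, s, out_ch in conv_pool:
    size = valid_out(size, k, s)
    if out_ch is not None:
        channels = out_ch
    print(name, (size, size, channels))

flat = size * size * channels
print('flatten', flat)   # 256

# Weight shapes of the two fully connected layers and the readout layer
widths = [flat, 120, 84, 10]
for name, (n_in, n_out) in zip(['fc1', 'fc2', 'readout'], zip(widths, widths[1:])):
    print(name, 'weights', (n_in, n_out))
```

With these assumed depths, the flattened vector has 256 values instead of the 784 in the `'SAME'` models, so the first fully connected layer is much smaller.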