Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
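To see what the one-hot line in `reformat` does, here is a minimal sketch on a toy label array. The three labels are made up for illustration; only the broadcasting expression comes from the cell above.

```python
import numpy as np

num_labels = 10
labels = np.array([0, 3, 9])  # toy integer class ids, shape (3,)

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (10,).
# Broadcasting compares every label with every class index, giving shape (3, 10).
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)
print(one_hot[1])  # the row for label 3
```

Each row contains a single 1.0 at the position of its label, so `np.argmax(one_hot, 1)` recovers the original integer labels.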


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
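
As a quick sanity check of `accuracy`, the sketch below applies it to a hand-made batch of four predictions over three classes. The toy arrays are assumptions for illustration and are not part of the notMNIST data.

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows whose most probable class matches the one-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

toy_predictions = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.3, 0.4, 0.3],   # predicted class 1, true class 0
                            [0.2, 0.2, 0.6]])
toy_labels = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [1, 0, 0],
                       [0, 0, 1]], dtype=np.float32)

toy_accuracy = accuracy(toy_predictions, toy_labels)
print(toy_accuracy)  # 3 of the 4 rows are correct
```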

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit the network's depth and its number of fully connected nodes.

In [5]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

Instructions for updating:
Colocations handled automatically by placer.
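
The first dimension of `layer3_weights` has to match the flattened output of the second convolution. A small sketch of that arithmetic follows, assuming the usual rule that a convolution with `padding='SAME'` produces `ceil(input_size / stride)` outputs per spatial dimension. The helper name `same_padding_output_size` is hypothetical.

```python
import math

image_size = 28
depth = 16

def same_padding_output_size(input_size, stride):
    """Spatial output size of a convolution with padding='SAME'."""
    return math.ceil(input_size / stride)

after_conv1 = same_padding_output_size(image_size, 2)   # first stride-2 convolution
after_conv2 = same_padding_output_size(after_conv1, 2)  # second stride-2 convolution
flattened = after_conv2 * after_conv2 * depth            # inputs to the fully connected layer

print(after_conv1, after_conv2, flattened)
```

With two stride-2 layers the 28x28 image shrinks by a factor of 4 per side, which is where the `image_size // 4` terms in the weight shape come from.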


In [6]:
num_steps = 101

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.747132
Minibatch accuracy: 6.2%
Validation accuracy: 9.6%
Minibatch loss at step 50: 1.673349
Minibatch accuracy: 37.5%
Validation accuracy: 57.5%
Minibatch loss at step 100: 1.073132
Minibatch accuracy: 56.2%
Validation accuracy: 68.9%
Test accuracy: 75.9%


---
Problem 1
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---
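
For reference, here is a minimal NumPy sketch of what a max pooling operation with kernel size 2 and stride 2 computes on a single feature map. It is an illustration only, not the TensorFlow implementation; the function name `max_pool_2x2` and the toy input are assumptions. In the graph-based model above, the equivalent change would presumably be stride-1 convolutions, each followed by `tf.nn.max_pool` with `ksize=[1, 2, 2, 1]` and `strides=[1, 2, 2, 1]`.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    height, width = feature_map.shape
    # Split the map into non-overlapping 2x2 blocks, then keep the max of each block.
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 2, 0, 1],
                        [3, 4, 1, 0],
                        [0, 0, 5, 6],
                        [1, 0, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each side of the map is halved, the same reduction the stride-2 convolutions gave, but the layer keeps the strongest activation in each block instead of skipping positions.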

In [7]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, Flatten, Dense

In [8]:
batch_size = 128


pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


In [9]:
model_1 = Sequential([
    Conv2D(filters= 32,kernel_size=(3,3), strides=1, padding='valid', activation='relu', input_shape=(28,28,1) ),
    MaxPool2D(pool_size=(2,2)),
    
    Dropout(rate=0.25),
    
    Flatten(),
    Dense(1024, activation='relu'),
    Dropout(rate=0.25),
    Dense(10, activation='softmax'),
    
])


model_1.compile( optimizer='adam'
              , loss='sparse_categorical_crossentropy'
              , metrics=['accuracy']
             )


model_2 = Sequential([
    Conv2D(filters= 32,kernel_size=(3,3), strides=1, padding='valid', activation='relu', input_shape=(28,28,1) ),
    MaxPool2D(pool_size=(2,2)),
    
    Dropout(rate=0.25),
    
    Flatten(),
    Dense(1024, activation='relu'),
    Dropout(rate=0.25),
    Dense(10, activation='softmax'),
    
])


model_2.compile( optimizer='adam'
              , loss='sparse_categorical_crossentropy'
              , metrics=['accuracy']
             )


Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


In [10]:
import numpy as np
train_dataset_shape = train_dataset.shape
train_dataset = train_dataset.reshape((train_dataset_shape[0], train_dataset_shape[1], train_dataset_shape[2], 1))
valid_shape = valid_dataset.shape
valid_dataset = valid_dataset.reshape((valid_shape[0], valid_shape[1],valid_shape[2],1))
test_shape = test_dataset.shape
test_dataset = test_dataset.reshape((test_shape[0], test_shape[1],test_shape[2],1))

In [11]:
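def my_generator():
    # Draw every sample index at most once (replace=False) and arrange the
    # indices as (number of batches, batch_size). The generator is finite:
    # it is exhausted after one pass over these batches.
    for batch_index in np.random.choice(range(train_dataset.shape[0])
                                        ,size=(train_dataset.shape[0]//batch_size, batch_size)
                                        ,replace=False
                                       ):
        yield train_dataset[batch_index], train_labels[batch_index]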
    

In [12]:
model_1.fit(  x=train_dataset
          , y=train_labels
          , batch_size=batch_size
          , epochs=5
          , validation_data=(valid_dataset,valid_labels))

Train on 200000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x295026748d0>

In [17]:
model_1.evaluate(test_dataset,test_labels)



[0.10526585588380695, 0.9695]

In [14]:
model_2.fit_generator(my_generator()
                    , train_dataset.shape[0]//batch_size//5
                    , epochs=5
                , validation_data=(valid_dataset,valid_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x29508789400>

In [16]:
model_2.evaluate(test_dataset,test_labels)



[0.15810049138963223, 0.9554]

In [18]:

model_3 = Sequential([
    Conv2D(filters= 32,kernel_size=(3,3), strides=1, padding='valid', activation='relu', input_shape=(28,28,1) ),
    MaxPool2D(pool_size=(2,2)),
    
    Dropout(rate=0.25),
    
    Flatten(),
    Dense(1024, activation='relu'),
    Dropout(rate=0.25),
    Dense(10, activation='softmax'),
    
])


model_3.compile( optimizer='adam'
              , loss='sparse_categorical_crossentropy'
              , metrics=['accuracy']
             )

In [19]:
for i in range(2):
    model_3.fit_generator(my_generator()
                    , train_dataset.shape[0]//batch_size//5
                    , epochs=5
                , validation_data=(valid_dataset,valid_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [20]:
model_3.evaluate(test_dataset,test_labels)



[0.1306341567747295, 0.9621]

##### Let us give model_3 more rounds, which would bring it close to the way training was done for model_1.
i.e. model_1 trained on the entire training dataset 5 times (5 epochs), while here each `fit_generator` call spreads roughly one full set of batches across its 5 epochs. Matching model_1 would take 5 such calls in total; the cell below runs 2 more, for 4 in total.
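
The batch counts behind this comparison can be checked with a few lines of arithmetic. This sketch only reuses the numbers from the cells above (200000 training samples, `batch_size = 128`, 5 epochs); the variable names are introduced here for illustration.

```python
num_train = 200000
batch_size = 128
epochs = 5

batches_per_pass = num_train // batch_size    # batches one my_generator() can yield
steps_per_epoch = batches_per_pass // 5       # value passed to fit_generator
batches_per_call = steps_per_epoch * epochs   # batches consumed by one fit_generator call

print(batches_per_pass, steps_per_epoch, batches_per_call)
```

One `fit_generator` call therefore consumes almost exactly one pass over the training set, whereas a single `model_1.fit` call with `epochs=5` makes five passes.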

In [21]:
for i in range(2):
    model_3.fit_generator(my_generator()
                    , train_dataset.shape[0]//batch_size//5
                    , epochs=5
                , validation_data=(valid_dataset,valid_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [22]:
model_3.evaluate(test_dataset,test_labels)



[0.10911551758013666, 0.9678]

* It is merely coincidental that the batched runs gave an early indication of the final result. We only saw a snapshot of training over the entire dataset; even model_1 would have shown the same result at the same snapshot.

* We should make sure that each batch has balanced labels. We know for sure that the dataset is balanced; we should extend the same balance to each batch.

* If possible, we should also try to find the outliers and try a technique called batch normalization.
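
One possible way to get balanced batches is sketched below, assuming integer class labels as in the reloaded `train_labels`. The function `balanced_batches` and the toy labels are hypothetical and were not used for the results above.

```python
import numpy as np

def balanced_batches(labels, num_classes, per_class, rng):
    """Yield index arrays holding `per_class` samples of every class."""
    # Shuffle the sample indices of each class independently.
    by_class = [rng.permutation(np.flatnonzero(labels == c))
                for c in range(num_classes)]
    # The smallest class limits how many full batches can be built.
    num_batches = min(len(idx) for idx in by_class) // per_class
    for b in range(num_batches):
        batch = np.concatenate(
            [idx[b * per_class:(b + 1) * per_class] for idx in by_class])
        yield rng.permutation(batch)  # mix the classes inside the batch

rng = np.random.RandomState(0)
toy_labels = np.repeat(np.arange(10), 20)  # 10 classes, 20 samples each
batches = list(balanced_batches(toy_labels, num_classes=10, per_class=2, rng=rng))
counts = np.bincount(toy_labels[batches[0]], minlength=10)
print(len(batches), counts)
```

Each yielded index array could be used like `batch_index` in `my_generator`, i.e. to slice `train_dataset` and `train_labels`.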

---
Problem 2
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---
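
As a starting point for the learning rate decay part, here is a plain-Python sketch of an exponential schedule. It follows the formula `initial_rate * decay_rate ** (step / decay_steps)`, which is also what `tf.train.exponential_decay` computes in TensorFlow 1.x. The function below is a stand-in written for illustration, and `decay_steps=1000` and `decay_rate=0.5` are assumed values; only the initial rate 0.05 comes from the model above.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` updates under exponential decay."""
    # With staircase=True the rate drops in discrete jumps every `decay_steps` steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_rate * decay_rate ** exponent

for step in (0, 1000, 2000):
    print(step, exponential_decay(0.05, step, decay_steps=1000, decay_rate=0.5))
```

With these settings the rate halves every 1000 steps, so early updates are large and later ones become progressively smaller.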

In [None]:
Dense()