Multi-Digit Training on the SVHN Dataset
========================================

We build a convolutional neural network that learns to read house numbers of up to 5 digits from images in the Street View House Numbers (SVHN) dataset.

- Arman Uygur # au2205
- Jonathan Galsurkar # jfg2150
- Nitesh Surtani #ns3148


In [1]:
"""
Importing Packages
"""
from __future__ import print_function
import time

import h5py
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

tf.__version__

'1.4.1'

In [2]:
"""
Importing the preprocessed SVHN dataset, which is stored in HDF5 (.h5) format.
"""
# Opening the h5 file
h5 = h5py.File('SVHN_preprocessed.h5','r')

# Extract the datasets
train_dataset = h5['train_dataset'][:]
train_labels = h5['train_labels'][:]
test_dataset = h5['test_dataset'][:]
test_labels = h5['test_labels'][:]
valid_dataset = h5['valid_dataset'][:]
valid_labels = h5['valid_labels'][:]

# Closing the h5 file
h5.close()
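The rest of the notebook relies on a particular layout of these arrays: the training loop slices `train_dataset` as a 4-D array and feeds it into the placeholder `X_train` of shape `(batch_size, H, W, num_channels)`, and the loss treats each of the 5 label columns as a class index below `num_labels = 11`. A minimal sanity check of those assumptions could look like the sketch below; `check_split` is a hypothetical helper (not part of the original notebook), and the synthetic 32 x 32 arrays only stand in for the real data.

```python
import numpy as np

def check_split(images, labels, num_channels=1, num_labels=11, num_digits=5):
    """Hypothetical sanity check for one (images, labels) split loaded from the .h5 file."""
    if images.ndim != 4 or images.shape[-1] != num_channels:
        raise ValueError("expected images of shape (N, H, W, %d), got %s" % (num_channels, images.shape))
    if labels.shape != (images.shape[0], num_digits):
        raise ValueError("expected labels of shape (%d, %d), got %s" % (images.shape[0], num_digits, labels.shape))
    if labels.min() < 0 or labels.max() >= num_labels:
        raise ValueError("label values must lie in [0, %d)" % num_labels)
    return images.shape[1:3]  # (height, width) of the images

# Synthetic stand-ins for e.g. train_dataset / train_labels (sizes chosen for illustration)
images = np.zeros((8, 32, 32, 1), dtype=np.float32)
labels = np.full((8, 5), 10, dtype=np.int32)
print(check_split(images, labels))  # (32, 32)
```

In the notebook it would be called once per split, e.g. `check_split(valid_dataset, valid_labels)`.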

In [3]:
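num_labels = 11   # classes per digit slot: the ten digits plus a "no digit" class
num_channels = 1  # single-channel (grayscale) images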
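With `num_labels = 11` and labels of shape `(batch, 5)`, every image gets five label slots, each holding one of the ten digits or a "no digit" class. The preprocessing that produced the .h5 file is not shown here, so the convention in this sketch (digits stored as 0-9, left-aligned, unused slots padded with 10) is an assumption; `encode_number` and `decode_slots` are hypothetical helpers that only illustrate the idea.

```python
BLANK = 10     # assumed index of the "no digit" class
NUM_SLOTS = 5  # the model predicts at most 5 digits

def encode_number(number_str):
    """Encode a house number such as "123" as five class indices, padded with BLANK."""
    if not number_str.isdigit() or len(number_str) > NUM_SLOTS:
        raise ValueError("expected 1 to 5 digits, got %r" % number_str)
    digits = [int(c) for c in number_str]
    return digits + [BLANK] * (NUM_SLOTS - len(digits))

def decode_slots(slots):
    """Turn five predicted class indices back into a string, skipping BLANK slots."""
    return "".join(str(s) for s in slots if s != BLANK)

print(encode_number("123"))             # [1, 2, 3, 10, 10]
print(decode_slots([4, 0, 7, 10, 10]))  # 407
```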

In [4]:
""" 
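Defining some helper functions.
- accuracy returns the per-digit prediction accuracy in percent, i.e. the fraction of correctly predicted digit slots over all 5 slots of every image.
- get_conv_params returns the weight and bias tf.Variables of a conv layer, given the kernel size and the numbers of input and output channels.
- get_fc_params returns the weight and bias tf.Variables of a fully connected layer, given the numbers of input and output units.
"""
def accuracy(predictions, labels):
    # predictions: (5, batch, num_labels) softmax outputs; labels: (batch, 5) class indices
    return (100.0 * np.sum(np.argmax(predictions, 2).T == labels) / predictions.shape[1] / predictions.shape[0])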

def get_conv_params(kernel_shape, in_channel, out_channel):
    return tf.Variable(tf.truncated_normal([kernel_shape, kernel_shape, in_channel, out_channel], stddev=0.1)), tf.Variable(tf.zeros([out_channel]))

def get_fc_params(in_channel, out_channel):
    return tf.Variable(tf.truncated_normal([in_channel, out_channel], stddev=0.1)), tf.Variable(tf.zeros([out_channel]))
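The `accuracy` helper compares argmax predictions slot by slot, so it reports per-digit accuracy: an image with one wrong digit still contributes 4 correct slots out of 5. A stricter metric counts an image as correct only if all five slots match. A minimal numpy sketch of both, assuming the same shapes as `accuracy` (predictions of shape `(5, batch, num_labels)`, labels of shape `(batch, 5)`); `sequence_accuracy` is a hypothetical addition, not a metric the notebook reports.

```python
import numpy as np

def per_digit_accuracy(predictions, labels):
    """Same quantity as accuracy(): percentage of correctly predicted digit slots."""
    predicted = np.argmax(predictions, axis=2).T  # (5, batch) -> (batch, 5)
    return 100.0 * np.sum(predicted == labels) / labels.size

def sequence_accuracy(predictions, labels):
    """Percentage of images whose five slots are all predicted correctly."""
    predicted = np.argmax(predictions, axis=2).T
    return 100.0 * np.sum(np.all(predicted == labels, axis=1)) / labels.shape[0]

# Two images; one-hot "predictions" that are correct except for a single slot
labels = np.array([[1, 2, 10, 10, 10],
                   [3, 4, 5, 10, 10]])
predictions = np.eye(11)[labels].transpose(1, 0, 2)  # (batch, 5, 11) -> (5, batch, 11)
predictions[0, 1] = np.eye(11)[9]  # slot 0 of image 1: predict 9 instead of 3
print(per_digit_accuracy(predictions, labels))  # 90.0
print(sequence_accuracy(predictions, labels))   # 50.0
```

A single wrong digit costs only one tenth of the per-digit score here but half of the sequence score, which is why the two metrics can differ noticeably on real data.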

In [5]:
""" 
Defining layer functions.
- conv_layer takes the following parameters:
    - input_x: input tensor
    - weight: conv kernel
    - bias: conv bias
    - stride: stride of the 2x2 max pooling (defaults to 1); the convolution itself always uses stride 1
    - pooling: boolean; if True, applies a 2x2 max pooling
    - activation: tf.nn.relu by default; pass the string 'maxout' to use a maxout function instead
    - batch_norm: boolean; if True (the default), applies batch normalization

- fc_layer takes the following parameters:
    - input_x: input tensor
    - weight: weight matrix
    - bias: bias vector
    - activation: boolean; if True, applies a relu activation (defaults to False)
"""

def conv_layer(input_x, weight, bias, stride=1, pooling=False, activation=tf.nn.relu, batch_norm=True):
    # Convolution with stride 1 and SAME padding, plus bias
    conv_out = tf.add(tf.nn.conv2d(input_x, weight, [1, 1, 1, 1], padding='SAME'), bias)
    
    if batch_norm:
        # training=True is hard-coded, so batch statistics are also used when evaluating.
        # No name/reuse is given, so every call to model() creates its own gamma/beta variables.
        conv_out = tf.layers.batch_normalization(conv_out, training=True)
    if activation == 'maxout':
        # Only conv1 uses maxout; it outputs 48 channels, so num_units=48 gives groups of
        # size 1 and the maxout leaves its input unchanged.
        cell_out = tf.contrib.layers.maxout(conv_out, num_units=48)
    else:
        cell_out = activation(conv_out)
    if pooling:
        cell_out = tf.layers.max_pooling2d(inputs=cell_out, pool_size=[2, 2], strides=stride, padding='SAME')
    return cell_out

def fc_layer(input_x, weight, bias, activation = False):
    cell_out = tf.add(tf.matmul(input_x, weight), bias)
    if activation:
        cell_out = tf.nn.relu(cell_out)
    return cell_out
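`conv_layer` delegates maxout to `tf.contrib.layers.maxout`, which (as we read the TF 1.x implementation) reshapes the channel axis into `num_units` groups of consecutive channels and keeps the maximum of each group. The numpy sketch below mirrors that reshape-then-max behavior; it is an illustration, not TensorFlow's code. It also shows the consequence for conv1: with 48 channels and `num_units=48` every group has size 1, so the output equals the input. A value such as 24 would make maxout an actual nonlinearity, but it would also halve the channels that `w2` expects as input.

```python
import numpy as np

def maxout(x, num_units):
    """Channel-wise maxout over the last axis: max within groups of consecutive channels."""
    channels = x.shape[-1]
    if channels % num_units:
        raise ValueError("num_units must divide the number of channels")
    grouped = x.reshape(x.shape[:-1] + (num_units, channels // num_units))
    return grouped.max(axis=-1)

x = np.array([[1.0, 5.0, 3.0, 2.0]])
print(maxout(x, 2))  # [[5. 3.]]

feature_map = np.random.randn(2, 8, 8, 48)  # e.g. a conv1 output with 48 channels
print(np.array_equal(maxout(feature_map, 48), feature_map))  # True: groups of size 1
print(maxout(feature_map, 24).shape)  # (2, 8, 8, 24)
```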

In [6]:
""" 
Defining the model function. It follows this architecture:

Input -> Conv Layer 1 -> Conv Layer 2 -> Conv Layer 3 -> Conv Layer 4 -> Conv Layer 5 -> Conv Layer 6 -> Conv Layer 7 
-> Conv Layer 8 -> Flatten Output -> Fully Connected Layer 1 -> Fully Connected Layer 2 -> 5 Softmax Logits -> Output

Each convolutional layer includes batch normalization, a relu activation (except the first layer, which uses maxout),
a 2x2 max pooling whose stride alternates between 2 and 1, and dropout (keep probability 0.9 in the best model).
The fully connected layers are called without an activation, so they are linear.

In other scenarios, we also tried different dropout probabilities (0.75 for the conv layers and 0.5 for the FC layers),
9 and 5 conv layers, different learning rates (1e-2 vs 1e-3), and different FC layer sizes (256 vs 3072 units).
"""

def model(data, keepProb = 0.90):
    conv1 = tf.nn.dropout(conv_layer(data, w1, b1, pooling = True, stride=2, activation='maxout'), keepProb)
    conv2 = tf.nn.dropout(conv_layer(conv1, w2, b2, pooling = True, stride=1), keepProb)
    conv3 = tf.nn.dropout(conv_layer(conv2, w3, b3, pooling = True, stride=2), keepProb)
    conv4 = tf.nn.dropout(conv_layer(conv3, w4, b4, pooling = True, stride=1), keepProb)
    conv5 = tf.nn.dropout(conv_layer(conv4, w5, b5, pooling = True, stride=2), keepProb)
    conv6 = tf.nn.dropout(conv_layer(conv5, w6, b6, pooling = True, stride=1), keepProb)
    conv7 = tf.nn.dropout(conv_layer(conv6, w7, b7, pooling = True, stride=2), keepProb)
    conv8 = tf.nn.dropout(conv_layer(conv7, w8, b8, pooling = True, stride=1), keepProb)
#     conv9 = tf.nn.dropout(conv_layer(conv8, w11, b11, pooling = True, stride=2), keepProb)

    # Fully connected layers (2 layers; fc_layer is called without activation, so both are linear)
    shape = conv8.get_shape().as_list()
    reshape = tf.reshape(conv8, [shape[0], shape[1] * shape[2] * shape[3]])
    fc1 = tf.nn.dropout(fc_layer(reshape, w9, b9), keepProb)
    fc2 = tf.nn.dropout(fc_layer(fc1, w10, b10), keepProb)
    
    logitsC1 = fc_layer(fc2, w11, b11)
    logitsC2 = fc_layer(fc2, w12, b12)
    logitsC3 = fc_layer(fc2, w13, b13)
    logitsC4 = fc_layer(fc2, w14, b14)
    logitsC5 = fc_layer(fc2, w15, b15)
    logits = tf.stack([logitsC1, logitsC2, logitsC3, logitsC4, logitsC5])
    
    return logits
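`model` returns the five heads' logits stacked into one tensor of shape `(5, batch, num_labels)`. The training graph scores it with one sparse softmax cross-entropy per digit slot and sums the five batch means. A numpy sketch of that loss, for intuition only (the notebook itself uses `tf.nn.sparse_softmax_cross_entropy_with_logits`):

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy for integer labels; logits has shape (batch, num_classes)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.shape[0]), labels]

def multi_digit_loss(stacked_logits, labels):
    """Sum over the slots of the mean cross-entropy; stacked_logits is (5, batch, C), labels is (batch, 5)."""
    return sum(sparse_softmax_cross_entropy(stacked_logits[i], labels[:, i]).mean()
               for i in range(stacked_logits.shape[0]))

# With all-zero logits every head predicts uniformly over 11 classes,
# so each slot contributes ln(11) and the total is 5 * ln(11), about 11.99.
stacked = np.zeros((5, 4, 11))
labels = np.array([[1, 2, 10, 10, 10]] * 4)
print(multi_digit_loss(stacked, labels))
```

Because the five terms are simply added, every slot (including the padded "no digit" slots) carries equal weight in the gradient.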

In [7]:
"""
Defining variables and predicting. We use a 3x3 kernel for the first conv layer and 5x5 kernels for the rest. We tried
both 3072 and 256 units for the FC layers; this cell uses num_hidden = 256. With 100 GB of RAM and 2 GPUs we could use
batch_size = 1024; this cell uses batch_size = 512. Each conv layer uses the same number of units as in the paper:
Conv Layer 1:  48 Units
Conv Layer 2:  64 Units
Conv Layer 3:  128 Units
Conv Layer 4:  160 Units
Conv Layer 5:  192 Units
Conv Layer 6:  192 Units
Conv Layer 7:  192 Units
Conv Layer 8:  192 Units

FC Layer 1: 192 x 4 inputs (the 4 is the number of spatial positions left in the conv8 output, so it depends on the
conv architecture) and num_hidden units
FC Layer 2: num_hidden units

"""
batch_size = 512
kernel1 = 5  # kernel size for conv layers 2-8
kernel2 = 3  # kernel size for conv layer 1

num_hidden = 256  # units in each FC layer
graph = tf.Graph()

with graph.as_default():

    # Input data
    X_train = tf.placeholder(tf.float32, shape=(batch_size, train_dataset.shape[1], train_dataset.shape[2], num_channels))
    y_train = tf.placeholder(tf.int32, shape=(batch_size, 5))
    X_val = tf.constant(valid_dataset)
    X_test = tf.constant(test_dataset)
   
    # Variables
    w1, b1 = get_conv_params(kernel2, num_channels, 48)
    w2, b2 = get_conv_params(kernel1, 48, 64)
    w3, b3 = get_conv_params(kernel1, 64, 128)
    w4, b4 = get_conv_params(kernel1, 128, 160)
    w5, b5 = get_conv_params(kernel1, 160, 192)
    w6, b6 = get_conv_params(kernel1, 192, 192)
    w7, b7 = get_conv_params(kernel1, 192, 192)
    w8, b8 = get_conv_params(kernel1, 192, 192)
    
    
    # Fully connected layers (2 layers); 192*4 is the flattened size of the conv8 output
    w9, b9 = get_fc_params(192*4, num_hidden)
    w10, b10 = get_fc_params(num_hidden, num_hidden)

    # parameters for logits
    w11, b11 = get_fc_params(num_hidden, num_labels)
    w12, b12 = get_fc_params(num_hidden, num_labels)
    w13, b13 = get_fc_params(num_hidden, num_labels)
    w14, b14 = get_fc_params(num_hidden, num_labels)
    w15, b15 = get_fc_params(num_hidden, num_labels)
  
    # Training computation.
    logits = model(X_train)
    loss1 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,0], logits=logits[0]))
    loss2 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,1], logits=logits[1]))
    loss3 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,2], logits=logits[2]))
    loss4 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,3], logits=logits[3]))
    loss5 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,4], logits=logits[4])) 
    loss = loss1+loss2+loss3+loss4+loss5
  
    # Optimizer.
    optimizer = tf.train.AdamOptimizer(1e-3).minimize(loss)
  
    # Predictions for the training, validation, and test data.
    train_pred = tf.nn.softmax(logits)
    valid_pred = tf.nn.softmax(model(X_val, 1.0))
    test_pred = tf.nn.softmax(model(X_test, 1.0))
    saver = tf.train.Saver()
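`w9` must match the flattened conv8 output. With SAME padding, a pooling of stride s maps a side of length n to ceil(n / s), and the convolutions use stride 1, so only the stride-2 poolings shrink the feature map. The hypothetical helper below traces that arithmetic; the 32 x 32 input size is an assumption (the notebook does not state it), chosen because it is consistent with both `192*4` here and `192*4*4` in the 5-layer model.

```python
import math

def flattened_size(height, width, pool_strides, out_channels):
    """Flattened feature size after a stack of SAME-padded poolings with the given strides."""
    for s in pool_strides:
        height = math.ceil(height / s)
        width = math.ceil(width / s)
    return height * width * out_channels

eight_layer = [2, 1, 2, 1, 2, 1, 2, 1]  # pooling strides of conv1..conv8 in model()
five_layer = [2, 1, 2, 1, 2]            # pooling strides of conv1..conv5 in the 5-layer model
print(flattened_size(32, 32, eight_layer, 192))  # 768  (= 192 * 4)
print(flattened_size(32, 32, five_layer, 192))   # 3072 (= 192 * 4 * 4)
```

If the input size or the stride pattern changes, the first FC layer's input dimension has to change with it, which is what "depends on the conv architecture" refers to.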

In [8]:
"""
Training step: we ran all scenarios for up to 5,000 iterations.
"""
num_iter = 5000
best_acc = 0

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
   
    for step in range(num_iter):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        batch_x = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_y = train_labels[offset:(offset + batch_size), :]
        
        feed_dict = {X_train : batch_x, y_train : batch_y}
        
        _, l, predictions = session.run([optimizer, loss, train_pred], feed_dict=feed_dict)
        
        if (step % 100 == 0):
            train_acc = accuracy(predictions, batch_y[:,:])
            val_acc = accuracy(valid_pred.eval(), valid_labels[:,:])
            print('Batch accuracy at step {0}: {1:.1f}% | Validation accuracy: {2:.1f}% | Batch loss: {3:.2f} '
                  .format(step, train_acc, val_acc,l))

            # saving the best model (one with highest validation accuracy)
            if val_acc > best_acc:
                best_acc = val_acc
                saver.save(session, './best_model')

Batch accuracy at step 0: 13.4% | Validation accuracy: 56.4% | Batch loss: 35.25 
Batch accuracy at step 100: 53.5% | Validation accuracy: 58.1% | Batch loss: 7.31 
Batch accuracy at step 200: 55.8% | Validation accuracy: 59.7% | Batch loss: 6.44 
Batch accuracy at step 300: 57.7% | Validation accuracy: 60.2% | Batch loss: 6.20 
Batch accuracy at step 400: 59.8% | Validation accuracy: 63.1% | Batch loss: 5.85 
Batch accuracy at step 500: 70.5% | Validation accuracy: 69.3% | Batch loss: 4.40 
Batch accuracy at step 600: 73.5% | Validation accuracy: 75.9% | Batch loss: 3.88 
Batch accuracy at step 700: 81.0% | Validation accuracy: 83.1% | Batch loss: 2.71 
Batch accuracy at step 800: 85.3% | Validation accuracy: 85.6% | Batch loss: 2.11 
Batch accuracy at step 900: 82.1% | Validation accuracy: 89.5% | Batch loss: 3.07 
Batch accuracy at step 1000: 90.4% | Validation accuracy: 90.0% | Batch loss: 1.57 
Batch accuracy at step 1100: 91.4% | Validation accuracy: 90.6% | Batch loss: 1.57 
...
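The training loop walks through the training set with `offset = (step * batch_size) % (N - batch_size)`, where `N = train_labels.shape[0]`. This keeps every slice full-sized, but the offset never exceeds `N - batch_size - 1`, so the last training example never appears in a batch, and the order is not reshuffled between passes. A pure-Python sketch of the arithmetic on a toy dataset (the helper names are hypothetical):

```python
def batch_offsets(num_examples, batch_size, num_steps):
    """Start index of each batch, computed exactly as in the training loop."""
    return [(step * batch_size) % (num_examples - batch_size) for step in range(num_steps)]

def covered_indices(num_examples, batch_size, num_steps):
    """Set of example indices that appear in at least one batch."""
    seen = set()
    for offset in batch_offsets(num_examples, batch_size, num_steps):
        seen.update(range(offset, offset + batch_size))
    return seen

print(batch_offsets(10, 3, 5))  # [0, 3, 6, 2, 5]
print(sorted(set(range(10)) - covered_indices(10, 3, 100)))  # [9]: the last example is never used
```

With a training set of tens of thousands of images, losing one example is negligible; the fixed order matters more if the data was not shuffled during preprocessing.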

In [9]:
"""
Restoring the best model (one with highest validation accuracy) to predict test labels.
"""
with tf.Session(graph=graph) as session:
    saver.restore(session, "./best_model")
    prediction = session.run(test_pred)  # X_test is a constant holding test_dataset, so no feed is needed
    test_acc = accuracy(prediction, test_labels[:,:])
print('Test accuracy {0:.2f}%'.format(test_acc))

INFO:tensorflow:Restoring parameters from ./best_model
Test accuracy 95.59%


In [6]:
"""
Simulation example: performance of a smaller model with 5 conv layers and 1 FC layer of 256 units.

Defining variables and predicting. We use a 3x3 kernel for the first conv layer, 5x5 kernels for the rest, and
batch_size = 512. The conv layers use the same numbers of units as the first five conv layers of the paper:
Conv Layer 1:  48 Units
Conv Layer 2:  64 Units
Conv Layer 3:  128 Units
Conv Layer 4:  160 Units
Conv Layer 5:  192 Units

FC Layer 1: 192 x 4 x 4 inputs (the flattened conv5 output; with only three stride-2 poolings, a larger spatial grid
remains than in the 8-layer model) and 256 units

"""
t0 = time.time()
def small_model(data, keepProb = 0.9):
    conv1 = tf.nn.dropout(conv_layer(data, w1, b1, pooling = True, stride=2, activation='maxout'), keepProb)
    conv2 = tf.nn.dropout(conv_layer(conv1, w2, b2, pooling = True, stride=1), keepProb)
    conv3 = tf.nn.dropout(conv_layer(conv2, w3, b3, pooling = True, stride=2), keepProb)
    conv4 = tf.nn.dropout(conv_layer(conv3, w4, b4, pooling = True, stride=1), keepProb)
    conv5 = tf.nn.dropout(conv_layer(conv4, w5, b5, pooling = True, stride=2), keepProb)

    # Fully connected layer (1 layer, no activation)
    shape = conv5.get_shape().as_list()
    reshape = tf.reshape(conv5, [shape[0], shape[1] * shape[2] * shape[3]])
    fc1 = tf.nn.dropout(fc_layer(reshape, w9, b9), keepProb)
    
    logitsC1 = fc_layer(fc1, w11, b11)
    logitsC2 = fc_layer(fc1, w12, b12)
    logitsC3 = fc_layer(fc1, w13, b13)
    logitsC4 = fc_layer(fc1, w14, b14)
    logitsC5 = fc_layer(fc1, w15, b15)
    
    logits = tf.stack([logitsC1, logitsC2, logitsC3, logitsC4, logitsC5])
    return logits

batch_size = 512
kernel1 = 5
kernel2 = 3

num_hidden = 256
graph = tf.Graph()

with graph.as_default():

    # Input data
    X_train = tf.placeholder(tf.float32, shape=(batch_size, train_dataset.shape[1], train_dataset.shape[2], num_channels))
    y_train = tf.placeholder(tf.int32, shape=(batch_size, 5))
    X_val = tf.constant(valid_dataset)
    X_test = tf.constant(test_dataset)
   
    # Variables
    w1, b1 = get_conv_params(kernel2, num_channels, 48)
    w2, b2 = get_conv_params(kernel1, 48, 64)
    w3, b3 = get_conv_params(kernel1, 64, 128)
    w4, b4 = get_conv_params(kernel1, 128, 160)
    w5, b5 = get_conv_params(kernel1, 160, 192)
    
    
    # Fully connected layer (1 layer); 192*4*4 is the flattened size of the conv5 output
    w9, b9 = get_fc_params(192*4*4, num_hidden)

    # parameters for logits
    w11, b11 = get_fc_params(num_hidden, num_labels)
    w12, b12 = get_fc_params(num_hidden, num_labels)
    w13, b13 = get_fc_params(num_hidden, num_labels)
    w14, b14 = get_fc_params(num_hidden, num_labels)
    w15, b15 = get_fc_params(num_hidden, num_labels)
  
    # Training computation.
    logits = small_model(X_train)
    loss1 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,0], logits=logits[0]))
    loss2 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,1], logits=logits[1]))
    loss3 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,2], logits=logits[2]))
    loss4 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,3], logits=logits[3]))
    loss5 = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y_train[:,4], logits=logits[4])) 
    loss = loss1+loss2+loss3+loss4+loss5
  
    # Optimizer.
    optimizer = tf.train.AdamOptimizer(1e-3).minimize(loss)
  
    # Predictions for the training, validation, and test data.
    train_pred = tf.nn.softmax(logits)
    valid_pred = tf.nn.softmax(small_model(X_val, 1.0))
    test_pred = tf.nn.softmax(small_model(X_test, 1.0))
    saver = tf.train.Saver()


num_iter = 5000
best_acc = 0

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
   
    for step in range(num_iter):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        
        batch_x = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_y = train_labels[offset:(offset + batch_size), :]
        
        feed_dict = {X_train : batch_x, y_train : batch_y}
        
        _, l, predictions = session.run([optimizer, loss, train_pred], feed_dict=feed_dict)
        
        if (step % 100 == 0):
            train_acc = accuracy(predictions, batch_y[:,:])
            val_acc = accuracy(valid_pred.eval(), valid_labels[:,:])
            print('Batch accuracy at step {0}: {1:.1f}% | Validation accuracy: {2:.1f}% | Batch loss: {3:.2f} '
                  .format(step, train_acc, val_acc,l))

            # saving the best model (one with highest validation accuracy)
            if val_acc > best_acc:
                best_acc = val_acc
                saver.save(session, './best_model_5ConvModel')

t1 = time.time()
print('Training Time: {0:.2f} seconds'.format(t1-t0))

Batch accuracy at step 0: 6.2% | Validation accuracy: 54.7% | Batch loss: 67.94 
Batch accuracy at step 100: 53.3% | Validation accuracy: 58.1% | Batch loss: 9.01 
Batch accuracy at step 200: 55.1% | Validation accuracy: 60.6% | Batch loss: 7.89 
Batch accuracy at step 300: 60.8% | Validation accuracy: 66.2% | Batch loss: 6.64 
Batch accuracy at step 400: 68.8% | Validation accuracy: 72.1% | Batch loss: 5.20 
Batch accuracy at step 500: 78.4% | Validation accuracy: 79.3% | Batch loss: 3.82 
Batch accuracy at step 600: 80.2% | Validation accuracy: 82.6% | Batch loss: 3.26 
Batch accuracy at step 700: 83.7% | Validation accuracy: 86.0% | Batch loss: 2.69 
Batch accuracy at step 800: 86.7% | Validation accuracy: 87.6% | Batch loss: 2.21 
Batch accuracy at step 900: 82.9% | Validation accuracy: 89.5% | Batch loss: 2.89 
Batch accuracy at step 1000: 89.3% | Validation accuracy: 90.4% | Batch loss: 1.71 
Batch accuracy at step 1100: 90.0% | Validation accuracy: 91.0% | Batch loss: 1.80 
...

In [7]:
"""
Restoring the best small architecture model (one with highest validation accuracy) to predict test labels.
"""
with tf.Session(graph=graph) as session:
    saver.restore(session, "./best_model_5ConvModel")
    prediction = session.run(test_pred)  # X_test is a constant holding test_dataset, so no feed is needed
    test_acc = accuracy(prediction, test_labels[:,:])
print('Test accuracy {0:.2f}%'.format(test_acc))

INFO:tensorflow:Restoring parameters from ./best_model_5ConvModel
Test accuracy 94.04%
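One way to put the two test accuracies (95.59% for the 8-conv model, 94.04% for the 5-conv model) in context is model size. The sketch below counts the weights and biases that `get_conv_params` and `get_fc_params` create for each architecture, using the layer sizes from the code above. It ignores the batch-normalization gamma/beta variables and takes the FC input sizes `192*4` and `192*4*4` from the two graphs.

```python
def conv_params(kernel, in_channels, out_channels):
    return kernel * kernel * in_channels * out_channels + out_channels  # weights + bias

def fc_params(in_units, out_units):
    return in_units * out_units + out_units  # weights + bias

# (kernel, in_channels, out_channels) for conv1..conv8
CONV_LAYERS = [(3, 1, 48), (5, 48, 64), (5, 64, 128), (5, 128, 160),
               (5, 160, 192), (5, 192, 192), (5, 192, 192), (5, 192, 192)]
HEADS = 5 * fc_params(256, 11)  # five logit layers, num_hidden=256, num_labels=11

large = (sum(conv_params(*c) for c in CONV_LAYERS)
         + fc_params(192 * 4, 256) + fc_params(256, 256) + HEADS)
small = (sum(conv_params(*c) for c in CONV_LAYERS[:5])
         + fc_params(192 * 4 * 4, 256) + HEADS)
print(large, small)  # 4604791 2363447
```

By this count the 8-conv model has roughly twice as many parameters. Most of the difference sits in conv6 to conv8, partly offset by the larger FC input of the 5-conv model.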
