# Machine Learning Engineer Nanodegree
## Deep Learning
## Project: Build a Digit Recognition Program

In this notebook, a template is provided for you to implement your functionality in stages which is required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission, if necessary. Sections that begin with **'Implementation'** in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional, and will be marked with **'Optional'** in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a **'Question'** header. Carefully read each question and provide thorough answers in the following text boxes that begin with **'Answer:'**. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

>**Note:** Code and Markdown cells can be executed using the **Shift + Enter** keyboard shortcut. In addition, Markdown cells can be edited by typically double-clicking the cell to enter edit mode.

----
## Step 1: Design and Test a Model Architecture
Design and implement a deep learning model that learns to recognize sequences of digits. Train the model using synthetic data generated by concatenating character images from [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) or [MNIST](http://yann.lecun.com/exdb/mnist/). To produce a synthetic sequence of digits for testing, you can for example limit yourself to sequences up to five digits, and use five classifiers on top of your deep network. You would have to incorporate an additional ‘blank’ character to account for shorter number sequences.

There are various aspects to consider when thinking about this problem:
- Your model can be derived from a deep neural net or a convolutional network.
- You could experiment sharing or not the weights between the softmax classifiers.
- You can also use a recurrent network in your deep neural net to replace the classification layers and directly emit the sequence of digits one-at-a-time.

You can use ** Keras ** to implement your model. Read more at [keras.io](https://keras.io/).

Here is an example of a [published baseline model on this problem](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42241.pdf). ([video](https://www.youtube.com/watch?v=vGPI_JvLoN0)). You are not expected to model your architecture precisely using this model nor get the same performance levels, but this is more to show an exampe of an approach used to solve this particular problem. We encourage you to try out different architectures for yourself and see what works best for you. Here is a useful [forum post](https://discussions.udacity.com/t/goodfellow-et-al-2013-architecture/202363) discussing the architecture as described in the paper and here is [another one](https://discussions.udacity.com/t/what-loss-function-to-use-for-multi-digit-svhn-training/176897) discussing the loss function.

### Implementation
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [1]:
from os.path import isfile
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import tqdm
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline



In [2]:
filepath_mnist = './models/mnist/checkpoint'
mnist = input_data.read_data_sets('.',one_hot=True,reshape=False)

# Save File #
save_file = "./models/mnist/save_test_model"

## Input Parameters ##
n_classes = 10
last_loss = None

## Model Hyperparameters ##
epochs =20000
learning_rate = 0.0001
dropout = 0.5
test_size = 256
batch_size = 50
epsilon = 1e-3
decay = 0.999

## Definition ##

def neural_net_image_input(image_shape):

    image_shape = [None,image_shape[0],image_shape[1],image_shape[2]]
    X = tf.placeholder(tf.float32,shape=image_shape,name="x")
    return X


def neural_net_label_input(n_classes):

    Y = tf.placeholder(tf.float32,[None,n_classes],name="y")
    return Y


def neural_net_keep_prob_input():

    keep_prob = tf.placeholder(tf.float32,name="keep_prob")
    return keep_prob

def batch_norm_wrapper(x_tensor,is_training):

    ## Tensor Shape ##
    tensor_shape = x_tensor.get_shape().as_list()
    ## Gamma and Beta ##
    gamma = tf.Variable(tf.ones(shape=[tensor_shape[-1]]))
    beta = tf.Variable(tf.zeros(shape=[tensor_shape[-1]]))
    pop_mean = tf.Variable(tf.zeros(shape=[tensor_shape[-1]]), trainable=False)
    pop_var = tf.Variable(tf.ones(shape=[tensor_shape[-1]]), trainable=False)

    if is_training:
        if len(tensor_shape) == 2:
            batch_mean, batch_var = tf.nn.moments(x_tensor,axes=[0])
        elif len(tensor_shape) == 4:
            batch_mean, batch_var = tf.nn.moments(x_tensor,axes=[0,1,2])
        else:
            print("Wrong Dimensions")
            exit()

        train_mean = tf.assign(pop_mean,pop_mean * decay + (1 - decay) * batch_mean)
        train_var = tf.assign(pop_var,pop_var * decay + (1 - decay) * batch_var)

        with tf.control_dependencies([train_mean,train_var]):
            return tf.nn.batch_normalization(x_tensor,batch_mean,batch_var,beta,gamma,epsilon)
    else:
        return tf.nn.batch_normalization(x_tensor,pop_mean,pop_var,beta,gamma,epsilon)


def xavier_init(x_tensor,weight_dim):

    Nin = x_tensor.get_shape().as_list()[-1]
    Nout = weight_dim[-1]

    return tf.cast(tf.sqrt(tf.divide(2,tf.add(Nin,Nout))),tf.float32)


def conv2d_maxpool(x_tensor,conv_num_outputs,conv_ksize,conv_stride,pool_ksize,pool_stride,is_training):

    ## Tensor Shape ##
    # [batch,height,width,depth]
    tensor_shape = x_tensor.get_shape().as_list()

    ## Weight and Bias Dimensions #
    weight_dim = [*conv_ksize,tensor_shape[-1],conv_num_outputs]
    bias_dim = [conv_num_outputs]

    ## Filter Dimensions ##
    filter_stride = [1,*conv_stride,1]

    ## Pooling Dimensions ##
    pool_stride = [1,*pool_stride,1]
    pool_ksize = [1,*pool_ksize,1]

    ## Weights and Biases ##
    weights = tf.Variable(tf.truncated_normal(shape=weight_dim,stddev=xavier_init(x_tensor,weight_dim)))
    biases = tf.Variable(tf.constant(0.1,shape=bias_dim))

    ## Convolution ##
    conv_layer = tf.nn.bias_add(tf.nn.conv2d(x_tensor,weights,filter_stride,padding="SAME"),biases)

    ## Batch Normalization ##
    conv_layer = batch_norm_wrapper(conv_layer,is_training)

    ## Activation ##
    conv_layer = tf.nn.relu(conv_layer)

    ## Max Pooling ##
    conv_layer = tf.nn.max_pool(conv_layer,pool_ksize,pool_stride,padding="SAME")

    return conv_layer


## Change tensor from 4D to 2D for dense layers
def flatten(x_tensor):

    ## Tensor shape ##
    tensor_shape = x_tensor.get_shape().as_list()
    tensor_shape = tensor_shape[1] * tensor_shape[2] * tensor_shape[3]

    return tf.reshape(x_tensor,shape=[-1,tensor_shape])

def fully_conn(x_tensor,num_outputs,is_training):

    ## Tensor shape ##
    tensor_shape = x_tensor.get_shape().as_list()

    ## Weight and Bias Dimensions ##
    weight_dim = [tensor_shape[-1],num_outputs]
    bias_dim = [num_outputs]

    ##Weights and Biases
    weights = tf.Variable(tf.truncated_normal(shape=weight_dim,stddev=xavier_init(x_tensor,weight_dim)))
    biases = tf.Variable(tf.constant(0.1,shape=bias_dim))

    ## Forward Propogation ##
    fc1 = tf.add(tf.matmul(x_tensor,weights),biases)

    ## Batch Normalization ##
    fc1 = batch_norm_wrapper(x_tensor,is_training)

    ## Activation ##
    fc1 = tf.nn.relu(fc1)

    return fc1


def output(x_tensor,num_outputs):

    ## Weight and Bias Dimensions ##
    weight_dim = [x_tensor.get_shape().as_list()[-1],num_outputs]
    bias_dim = [num_outputs]

    ## Weights and Biases ##
    weights = tf.Variable(tf.truncated_normal(shape=weight_dim,stddev=xavier_init(x_tensor,weight_dim)))
    biases = tf.Variable(tf.constant(0.1,shape=bias_dim))

    ## Forward Propogation ##
    out = tf.add(tf.matmul(x_tensor,weights),biases,name="output")

    return out

def conv_net(is_training,keep_prob):

    ## Features and Labels ##
    features = neural_net_image_input((28,28,1))
    labels = neural_net_label_input((10))


    ## Convolutions layer parameters ## 
    conv_param = {"conv1_num_outputs" : 32, "conv1_conv_ksize" : (5,5), "conv1_conv_strides" : (1,1),
                  "conv1_pool_ksize" : (2,2), "conv1_pool_strides" : (2,2), "conv2_num_outputs" : 64,
                  "conv2_conv_ksize" : (5,5), "conv2_conv_strides" : (1,1), "conv2_pool_ksize" : (2,2),
                  "conv2_pool_strides" : (2,2), "conv3_num_outputs" : 128 , "conv3_conv_ksize" : (3,3),
                  "conv3_conv_strides" : (1,1), "conv3_pool_ksize" : (2,2),"conv3_pool_strides" : (2,2) }

    ## Parameters: Fully Connected Layer ##
    ## Current best fc_param = {"fc1_num_outputs" : 1024 , "fc2_num_outputs" : 256,"fc3_num_outputs" : 64, "dropout": keep_prob }
    fc_param = {"fc1_num_outputs" : 1024 , "fc2_num_outputs" : 64,"fc3_num_outputs" : 30, "dropout": keep_prob }

    ## Paramaters: Output layer ##
    output_param = {"output_num_outputs" : 10}


    ## Layer 1 ##
    conv1_layer = conv2d_maxpool(features,conv_param["conv1_num_outputs"], conv_param["conv1_conv_ksize"], conv_param["conv1_conv_strides"],
                                 conv_param["conv1_pool_ksize"],conv_param["conv1_pool_strides"],is_training)

    ## Layer 2 ##
    conv2_layer = conv2d_maxpool(conv1_layer,conv_param["conv2_num_outputs"], conv_param["conv2_conv_ksize"], conv_param["conv2_conv_strides"],
                                 conv_param["conv2_pool_ksize"],conv_param["conv2_pool_strides"],is_training)
    # Layer 3 ##
    conv3_layer = conv2d_maxpool(conv2_layer,conv_param["conv3_num_outputs"], conv_param["conv3_conv_ksize"], conv_param["conv3_conv_strides"],
                                 conv_param["conv3_pool_ksize"],conv_param["conv3_pool_strides"],is_training)

    ## Flattening ##
    # Convert from 4D to 2D
    flat = flatten(conv3_layer)

    ## Fully Connected Layer 1 ##
    fc1_layer = fully_conn(flat,fc_param["fc1_num_outputs"],is_training)
    # Dropout #
    fc1_layer = tf.nn.dropout(fc1_layer,keep_prob=keep_prob)
    
    ## Fully Connected Layer 2 ##
    fc2_layer = fully_conn(fc1_layer,fc_param["fc1_num_outputs"],is_training)
    # Dropout #
    fc2_layer = tf.nn.dropout(fc2_layer,keep_prob=keep_prob)
    
    ## Output Layer ##
    out_layer = output(fc2_layer,output_param["output_num_outputs"])

    ## Cost or Cross Entropy Loss ##
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=labels))

    ## Optimizer ##
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    ## Predictions ##
    predictions = tf.equal(tf.argmax(out_layer,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(predictions,tf.float32), name="accuracy")

    return (features,labels), optimizer, cost, accuracy, out_layer, tf.train.Saver()

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


In [3]:
def save_test_model():
    ## Dropout Probability ##
    tf.reset_default_graph()
    keep_probability = neural_net_keep_prob_input()
    (features,labels_), optimizer, cost, accuracy, _, saver = conv_net(True,keep_probability)

    acc = []
    last_loss = None

    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())

        total_batches = int(mnist.train.num_examples / batch_size)

        for epoch in tqdm.tqdm(range(epochs)):

            #for i in range(total_batches):

            batch_x, batch_y = mnist.train.next_batch(batch_size)
            ## Run Optimizer ##
            sess.run(optimizer,feed_dict={features : batch_x, labels_ : batch_y, keep_probability : dropout})

            if epoch % (epochs / 10) == 0:
                loss = sess.run(cost,feed_dict={features : batch_x, labels_ : batch_y, keep_probability : 1.0})
                vali_acc = sess.run(accuracy, feed_dict={features : mnist.validation.images, labels_ : mnist.validation.labels, keep_probability : 1.0})
                #acc.append(vali_acc)
                if last_loss and last_loss > loss:
                    saver.save(sess,save_file)
                else:
                    print("Validation loss has not decreased")
                last_loss = loss
                print("Epoch #: {:}, Loss: {:}, Validation Accuracy: {:} " .format(epoch+1,loss,vali_acc))


In [4]:
def load_test_model(new_paths):
    
    tf.reset_default_graph()
    keep_probability = neural_net_keep_prob_input()
    (features,labels_), _, _, accuracy, out_layer, saver = conv_net(False,keep_probability)
    save_file = new_paths[1]
    predictions = []
    correct = 0
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.restore(sess, save_file)
        for i in range(100):
            pred, corr = sess.run([tf.argmax(out_layer,1), accuracy],
                                 feed_dict={features: [mnist.test.images[i]], labels_: [mnist.test.labels[i]], keep_probability : 1.0})
            correct += corr
            predictions.append(pred[0])
    print("PREDICTIONS:", predictions)
    print("ACCURACY:", correct/100)



In [5]:
def saver_loader_start(filepath):
    if not isfile(filepath):
        save_test_model()
    paths = filepath.split("/")
    paths = paths[:-1]
    files = ["/save_test_model.meta","/save_test_model"]
    new_paths = []
    for file in files:
        new_paths.append('/'.join(paths) + file)
    load_test_model(new_paths)

In [6]:
saver_loader_start(filepath_mnist)

INFO:tensorflow:Restoring parameters from ./models/mnist/save_test_model
PREDICTIONS: [7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 8, 4, 9, 6, 6, 5, 4, 0, 7, 4, 0, 1, 3, 1, 3, 4, 7, 2, 7, 1, 2, 1, 1, 7, 4, 2, 3, 5, 1, 2, 4, 4, 6, 3, 5, 5, 6, 0, 4, 1, 9, 5, 7, 8, 9, 3, 7, 4, 6, 4, 3, 0, 7, 0, 2, 9, 1, 7, 3, 2, 9, 7, 7, 6, 2, 7, 8, 4, 7, 3, 6, 1, 3, 6, 4, 3, 1, 4, 1, 7, 6, 9]
ACCURACY: 0.98


----
## Step 2: Train a Model on a Realistic Dataset
Once you have settled on a good architecture, you can train your model on real data. In particular, the [Street View House Numbers (SVHN)](http://ufldl.stanford.edu/housenumbers/) dataset is a good large-scale dataset collected from house numbers in Google Street View. Training on this more challenging dataset, where the digits are not neatly lined-up and have various skews, fonts and colors, likely means you have to do some hyperparameter exploration to perform well.

### Implementation
Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [7]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import tqdm
import helper
#import matplotlib.pyplot as plt
import numpy as np
from os.path import isfile,abspath
from sklearn.model_selection import train_test_split
import tqdm


In [8]:
filepath_svhn = './models/svhn/checkpoint'
features_train, labels_train, features_test, labels_test = helper.preprocess(abspath('./'))
features_train,features_validate, labels_train, labels_validate = train_test_split(features_train,labels_train,test_size=0.25,random_state=42)

# Save File #
save_file = "./models/svhn/save_streetview_model"

## Input Parameters ##
n_classes = 10
last_loss = None

## Model Hyperparameters ##
epochs = 10000
learning_rate = 0.0001
dropout = 0.7
test_size = 256
batch_size = 50
epsilon = 1e-3
decay = 0.999



## Definition ##
def conv_net_svhn(is_training,keep_prob):

    ## Features and Labels ##
    features = neural_net_image_input((32,32,1))
    labels = neural_net_label_input((10))


    ## Convolutions layer parameters ## 
    conv_param = {"conv1_num_outputs" : 32, "conv1_conv_ksize" : (5,5), "conv1_conv_strides" : (1,1),
                  "conv1_pool_ksize" : (2,2), "conv1_pool_strides" : (2,2), "conv2_num_outputs" : 64,
                  "conv2_conv_ksize" : (5,5), "conv2_conv_strides" : (1,1), "conv2_pool_ksize" : (2,2),
                  "conv2_pool_strides" : (2,2), "conv3_num_outputs" : 128 , "conv3_conv_ksize" : (3,3),
                  "conv3_conv_strides" : (1,1), "conv3_pool_ksize" : (2,2),"conv3_pool_strides" : (2,2) }

    ## Parameters: Fully Connected Layer ##
    ## Current best fc_param = {"fc1_num_outputs" : 1024 , "fc2_num_outputs" : 256,"fc3_num_outputs" : 64, "dropout": keep_prob }
    fc_param = {"fc1_num_outputs" : 1024 , "fc2_num_outputs" : 64,"fc3_num_outputs" : 30, "dropout": keep_prob }

    ## Paramaters: Output layer ##
    output_param = {"output_num_outputs" : 10}


    ## Layer 1 ##
    conv1_layer = conv2d_maxpool(features,conv_param["conv1_num_outputs"], conv_param["conv1_conv_ksize"], conv_param["conv1_conv_strides"],
                                 conv_param["conv1_pool_ksize"],conv_param["conv1_pool_strides"],is_training)

    ## Layer 2 ##
    conv2_layer = conv2d_maxpool(conv1_layer,conv_param["conv2_num_outputs"], conv_param["conv2_conv_ksize"], conv_param["conv2_conv_strides"],
                                 conv_param["conv2_pool_ksize"],conv_param["conv2_pool_strides"],is_training)
    # Layer 3 ##
    conv3_layer = conv2d_maxpool(conv2_layer,conv_param["conv3_num_outputs"], conv_param["conv3_conv_ksize"], conv_param["conv3_conv_strides"],
                                 conv_param["conv3_pool_ksize"],conv_param["conv3_pool_strides"],is_training)

    ## Flattening ##
    # Convert from 4D to 2D
    flat = flatten(conv3_layer)

    ## Fully Connected Layer 1 ##
    fc1_layer = fully_conn(flat,fc_param["fc1_num_outputs"],is_training)
    # Dropout #
    fc1_layer = tf.nn.dropout(fc1_layer,keep_prob=keep_prob)
    
    ## Fully Connected Layer 2 ##
    fc2_layer = fully_conn(fc1_layer,fc_param["fc1_num_outputs"],is_training)
    # Dropout #
    fc2_layer = tf.nn.dropout(fc2_layer,keep_prob=keep_prob)
    
    ## Output Layer ##
    out_layer = output(fc2_layer,output_param["output_num_outputs"])

    ## Cost or Cross Entropy Loss ##
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=labels))

    ## Optimizer ##
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    ## Predictions ##
    predictions = tf.equal(tf.argmax(out_layer,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(predictions,tf.float32), name="accuracy")

    return (features,labels), optimizer, cost, accuracy, out_layer, tf.train.Saver()


Reading Preprocessed data....


In [9]:
def save_test_model_svhn():
    ## Dropout Probability ##
    tf.reset_default_graph()
    keep_probability = neural_net_keep_prob_input()
    (features,labels_), optimizer, cost, accuracy, _, saver = conv_net(True,keep_probability)

    acc = []

    with tf.Session() as sess:

        sess.run(tf.global_variables_initializer())

        for epoch in tqdm.tqdm(range(epochs)):
            total_batches = int(features_train.shape[0] // batch_size)
            #for batch_x, batch_y in helper.batch_features_labels(features_train,labels_train,batch_size):
            batch_x,batch_y = helper.batch_features_labels(features_train,labels_train,batch_size)[epoch % total_batches]
            ## Run Optimizer ##
            sess.run(optimizer,feed_dict={features : batch_x, labels_ : batch_y, keep_probability : dropout})

            if epoch % (epochs / 10) == 0:
                loss = sess.run(cost,feed_dict={features : features_validate, labels_ : labels_validate, keep_probability : 1.0})
                vali_acc = sess.run(accuracy, feed_dict={features : features_validate, labels_ : labels_validate, keep_probability : 1.0})
                #acc.append(vali_acc)
                if last_loss and last_loss > loss:
                    saver.save(sess,save_file)
                else:
                    print("Validation loss has not decreased")
                last_loss = loss
                print("Epoch #: {:}, Loss: {:}, Validation Accuracy: {:} " .format(epoch+1,loss,vali_acc))


In [19]:
def load_test_model_svhn(new_paths):
    
    tf.reset_default_graph()
    keep_probability = neural_net_keep_prob_input()
    (features,labels_), _, _, accuracy, out_layer, saver = conv_net_svhn(False,keep_probability)
    save_file = new_paths[1]
    predictions = []
    correct = 0
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.restore(sess, save_file)
        test_acc = sess.run(accuracy, feed_dict = {features : features_test[:5000], labels_ : labels_test[:5000], keep_probability : 1.0})
        print("Test Accuracy: {:}%".format(test_acc*100))



In [20]:
def saver_loader_start_svhn(filepath):
    if not isfile(filepath):
        save_test_model_svhn()
    paths = filepath.split("/")
    paths = paths[:-1]
    files = ["/save_streetview_model.meta","/save_streetview_model"]
    new_paths = []
    for file in files:
        new_paths.append('/'.join(paths) + file)
    load_test_model_svhn(new_paths)

In [21]:
saver_loader_start_svhn(filepath_svhn)

INFO:tensorflow:Restoring parameters from ./models/svhn/save_streetview_model
Test Accuracy: 95.4800009727478%
