## Building a Feed-Forward Neural Network

A neural network is most likely your first step into deep learning. The name "neural network" comes from the hope of mimicking the structure of the brain, with units that play the role of neurons. The key feature of this model is that it can separate data that is not linearly separable. We assume you already know the basic concepts and are mainly interested in the TensorFlow implementation of neural nets. If you want to learn more about neural nets, we suggest taking this amazing course on machine learning. To build any classifier, your code needs these parts:

1. Prepare the needed libraries, input data, and hyper-parameters for the network
2. Build the graph of the network
3. Train the network
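
To make the claim about non-linearly separable data concrete, here is a minimal NumPy sketch (not part of the original notebook). XOR cannot be separated by a single straight line, yet one hidden layer with two ReLU units reproduces it. The weights are hand-picked for illustration, not learned, and all names are assumptions.

```python
import numpy as np

# The four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 1, 1, 0], dtype=float)

# Hand-picked (not learned) weights: an illustrative assumption.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])      # both hidden units see x1 + x2
b1 = np.array([0.0, -1.0])       # the second unit only fires when x1 + x2 > 1
W2 = np.array([1.0, -2.0])       # output = h1 - 2 * h2

hidden = np.maximum(0.0, X @ W1 + b1)   # ReLU hidden layer
xor_out = hidden @ W2                   # linear output layer

print(xor_out)
```

Without the ReLU the two layers would collapse into a single linear map, which cannot produce these targets.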

#### Preparation:
_Imports: We start by importing the required libraries._

In [None]:
import tensorflow as tf
import numpy as np 
import matplotlib.pyplot as plt
print('Tensorflow Version : {}'.format(tf.__version__))

**Import the TensorFlow package for loading the data**

Note: This module is deprecated and will be removed in future TensorFlow versions.

In [None]:
from tensorflow.examples.tutorials.mnist import input_data

In [None]:
mnist = input_data.read_data_sets('data', one_hot=True, fake_data=False)

In [None]:
print("Size of:")
print("- Training-set:\t\t{}".format(len(mnist.train.labels)))
print("- Test-set:\t\t{}".format(len(mnist.test.labels)))
print("- Validation-set:\t{}".format(len(mnist.validation.labels)))
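
Because the data is loaded with `one_hot=True`, each label arrives as a one-hot row rather than a single digit, which is why the helper functions in this notebook call `np.argmax` on the labels. The following small sketch uses made-up labels to show the encoding and how to undo it.

```python
import numpy as np

n_classes = 10
digits = np.array([3, 0, 7])          # made-up integer labels

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes, dtype=np.float32)[digits]

# argmax along the class axis recovers the integer labels.
recovered = np.argmax(one_hot, axis=1)
print(one_hot.shape, recovered)
```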

#### Utilities

Below are the image-plotting utility functions that we will use to show the true and predicted class labels.

In [None]:
def plot_images(images, cls_true, cls_pred=None, title=None):
    """
    Create figure with 3x3 sub-plots.
    :param images: array of images to be plotted, (9, img_h*img_w)
    :param cls_true: corresponding one-hot true labels, (9, n_classes)
    :param cls_pred: corresponding predicted class indices, (9,); optional
    """
    fig, axes = plt.subplots(3, 3, figsize=(9, 9))
    fig.subplots_adjust(hspace=0.3, wspace=0.3)
    img_h = img_w = np.sqrt(images.shape[-1]).astype(int)
    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape((img_h, img_w)), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            ax_title = "True: {0}".format(np.argmax(cls_true[i]))
        else:
            ax_title = "True: {0}, Pred: {1}".format(np.argmax(cls_true[i]), cls_pred[i])

        ax.set_title(ax_title)

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
    
    if title:
        plt.suptitle(title, size=20)
    plt.show()


In [None]:
def plot_example_errors(images, cls_true, cls_pred, title=None):
    """
    Function for plotting examples of images that have been mis-classified
    :param images: array of all images, (#imgs, img_h*img_w)
    :param cls_true: corresponding one-hot true labels, (#imgs, n_classes)
    :param cls_pred: corresponding predicted labels, (#imgs,)
    """
    # Boolean mask that is True where the prediction differs from the true class.
    incorrect = np.logical_not(np.equal(cls_pred, np.argmax(cls_true, axis=1)))

    # Get the images from the test-set that have been
    # incorrectly classified.
    incorrect_images = images[incorrect]

    # Get the true and predicted classes for those images.
    cls_pred = cls_pred[incorrect]
    cls_true = cls_true[incorrect]

    # Plot the first 9 images.
    plot_images(images=incorrect_images[0:9],
                cls_true=cls_true[0:9],
                cls_pred=cls_pred[0:9],
                title=title)
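
The selection step in `plot_example_errors` relies on NumPy boolean-mask indexing. Here is a minimal sketch of that idea on made-up data, with three classes and five samples. The values are purely illustrative.

```python
import numpy as np

# Made-up predictions and one-hot true labels for five samples, three classes.
cls_pred = np.array([0, 2, 1, 1, 0])
cls_true = np.eye(3)[[0, 1, 1, 2, 0]]

# True where the predicted class differs from the true class.
incorrect = cls_pred != np.argmax(cls_true, axis=1)

# Indexing with the mask keeps only the misclassified entries.
wrong_idx = np.flatnonzero(incorrect)
wrong_pred = cls_pred[incorrect]
print(wrong_idx, wrong_pred)
```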

In [None]:
# Get the first 9 training images and labels for plotting
images = mnist.train.images[0:9]
images_class_true = mnist.train.labels[0:9]
plot_images(images, images_class_true)

#### Hyper-parameters:
Hyper-parameters are settings that the network does not learn during training, so we have to specify them ourselves. They stay fixed for the whole training run.

In [None]:
# hyper-parameters
learning_rate = 0.001  # The optimization learning rate
epochs = 10  # Total number of training epochs
batch_size = 128  # Training batch size
display_freq = 100  # Frequency of displaying the training results

# Network Parameters
# We know that MNIST images are 28 pixels in each dimension.
img_h = img_w = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_h * img_w

# Number of classes, one class for each of 10 digits.
n_classes = 10

# number of units in the first hidden layer
hidden_units_first_layer = 500
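
As a rough sanity check, the arithmetic below shows what these settings imply. It assumes a training split of 55,000 images (compare with the sizes printed earlier) and the single-hidden-layer network this notebook builds. `num_train` is an assumed value, not something read from the data.

```python
# Values copied from the hyper-parameter cell.
batch_size = 128
epochs = 10
img_size_flat = 28 * 28
hidden_units_first_layer = 500
n_classes = 10

# Assumption: the training split holds 55,000 images.
num_train = 55000

iters_per_epoch = num_train // batch_size        # full batches per epoch
total_updates = iters_per_epoch * epochs         # optimizer steps over all epochs

# Weights plus biases for the two fully connected layers.
n_params = (img_size_flat * hidden_units_first_layer + hidden_units_first_layer
            + hidden_units_first_layer * n_classes + n_classes)

print(iters_per_epoch, total_updates, n_params)
```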

In [None]:
with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, img_size_flat], name='x-input')
    y_ = tf.placeholder(tf.float32, [None, n_classes], name='y-input')
    keep_prob = tf.placeholder(tf.float32)

In [None]:
def weight_variable(shape):
    """Create a weight variable with appropriate initialization."""
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

In [None]:
def bias_variable(shape):
    """Create a bias variable with appropriate initialization."""
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
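
`tf.truncated_normal` draws from a normal distribution and re-picks any value that falls more than two standard deviations from the mean. The NumPy sketch below imitates that behavior to show what the initial weights look like. The helper `truncated_normal` is a hypothetical stand-in, not TensorFlow's implementation.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    """Hypothetical NumPy stand-in: redraw samples beyond 2 standard deviations."""
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        # Replace only the offending entries with fresh draws.
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

weights = truncated_normal((784, 500))
biases = np.full(500, 0.1)   # small positive constant, as in bias_variable
print(weights.shape, np.abs(weights).max() <= 0.2)
```

Small random weights break the symmetry between units. A slightly positive bias is a common choice with ReLU so that units start out active.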

#### Readings

[Activation Functions](https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6), [Searching for activation function](https://arxiv.org/pdf/1710.05941.pdf)

In [None]:
def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
    """Reusable code for making a simple neural net layer.
        It does a matrix multiply, adds a bias, and then applies the activation
        function `act` (ReLU by default) as the nonlinearity.
        It also sets up name scoping so that the resulting graph is easy to read.
    """
    # Adding a name scope ensures logical grouping of the layers in the graph.
    with tf.name_scope(layer_name):
        # This Variable will hold the state of the weights for the layer
        with tf.name_scope('weights'):
            weights = weight_variable([input_dim, output_dim])
        with tf.name_scope('biases'):
            biases = bias_variable([output_dim])
        with tf.name_scope('Wx_plus_b'):
            preactivate = tf.matmul(input_tensor, weights) + biases
        activations = act(preactivate, name='activation')
        return activations
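
Stripped of the graph and name scopes, the computation inside `nn_layer` is a single line of linear algebra. The NumPy sketch below shows it with the shapes used in this notebook. `dense_relu` and the random inputs are illustrative assumptions.

```python
import numpy as np

def dense_relu(inputs, weights, biases):
    """Hypothetical NumPy equivalent of one layer: ReLU(inputs @ weights + biases)."""
    preactivate = inputs @ weights + biases    # biases broadcast over the batch
    return np.maximum(0.0, preactivate)

rng = np.random.default_rng(0)
batch = rng.random((4, 784))                   # 4 fake flattened 28x28 images
W = rng.normal(0.0, 0.1, size=(784, 500))
b = np.full(500, 0.1)

hidden = dense_relu(batch, W, b)
print(hidden.shape)
```

Each row of the result holds the 500 hidden activations for one input image.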

In [None]:
# Create the network
# The network has one hidden layer with 500 units (change the size to see the difference)
hidden1 = nn_layer(x, img_size_flat, hidden_units_first_layer, 'layer1')
# Dropout layer applied to the hidden activations
dropped = tf.nn.dropout(hidden1, keep_prob)
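# Output layer: returns raw logits (no softmax here), because
# tf.nn.softmax_cross_entropy_with_logits_v2 applies softmax internally.
# tf.argmax over the logits picks the same class as over the softmax probabilities.
y = nn_layer(dropped, hidden_units_first_layer, n_classes, 'layer2', act=tf.identity)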
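
With the `keep_prob` argument, `tf.nn.dropout` keeps each unit with probability `keep_prob` and scales the survivors by `1 / keep_prob`, so the expected activation stays the same. The NumPy sketch below imitates this "inverted dropout". The `dropout` helper is a hypothetical illustration, not the TensorFlow implementation.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero units at random, rescale survivors by 1 / keep_prob."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1000, 500))

dropped_train = dropout(acts, keep_prob=0.3, rng=rng)   # the training value fed in this notebook
dropped_eval = dropout(acts, keep_prob=1.0, rng=rng)    # evaluation: nothing is dropped

print(np.unique(dropped_train).size, dropped_train.mean())
```

Because of the rescaling, no extra correction is needed at evaluation time. Feeding `keep_prob` as 1.0 simply turns dropout off.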

In [None]:
with tf.name_scope('prediction'):
    class_prediction = tf.argmax(y, axis=1, name='predictions')

In [None]:
with tf.name_scope('cross_entropy'):
    diff = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_, logits=y)
    with tf.name_scope('loss'):
        loss = tf.reduce_mean(diff)
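
`tf.nn.softmax_cross_entropy_with_logits_v2` expects unnormalized logits and applies softmax internally. The NumPy sketch below reproduces the per-example computation. It also shows what happens when probabilities are passed in place of logits, in which case softmax is effectively applied twice. The function names and example values are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy(labels, logits):
    """Per-example cross-entropy, computed from raw logits via log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[4.0, 1.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0]])

loss_from_logits = softmax_cross_entropy(labels, logits)
# Feeding probabilities instead of logits applies softmax twice.
loss_from_probs = softmax_cross_entropy(labels, softmax(logits))
print(loss_from_logits, loss_from_probs)
```

The second loss is larger even though the prediction is confident and correct, because a second softmax flattens the distribution.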

#### Readings

[Matrix Calculus](http://explained.ai/matrix-calculus/index.html), [Overview of Gradient Descent](http://ruder.io/optimizing-gradient-descent/), [Neural networks: training with backpropagation](https://www.jeremyjordan.me/neural-networks-training/)


In [None]:
with tf.name_scope('adam_optimizer'):
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
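
To give a feel for what the optimizer does on each call, here is a sketch of the textbook Adam update rule applied to a one-dimensional toy problem. It is an illustration only: TensorFlow's implementation differs in small details, and `adam_step` is a hypothetical helper. Its default values for `beta1`, `beta2`, and `eps` are chosen to match the defaults of `tf.train.AdamOptimizer`.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(w) = w**2, whose gradient is 2 * w.
w, m, v = 1.0, 0.0, 0.0
history = []
for t in range(1, 4):
    w, m, v = adam_step(w, 2 * w, m, v, t)
    history.append(w)
print(history)
```

On the first step the bias-corrected ratio is close to 1, so the parameter moves by roughly the learning rate regardless of the gradient's scale.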

In [None]:
with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
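
The accuracy op compares the arg-max of the network output with the arg-max of the one-hot label, then averages the resulting booleans. The same steps in NumPy, on made-up outputs for four samples and three classes:

```python
import numpy as np

# Made-up network outputs (rows: samples, columns: classes) and one-hot labels.
outputs = np.array([[0.1, 0.7, 0.2],
                    [0.8, 0.1, 0.1],
                    [0.3, 0.3, 0.4],
                    [0.2, 0.5, 0.3]])
labels = np.eye(3)[[1, 0, 1, 1]]

correct = np.argmax(outputs, axis=1) == np.argmax(labels, axis=1)
acc = correct.astype(np.float32).mean()   # mirrors tf.cast followed by tf.reduce_mean
print(correct, acc)
```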

In [None]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('/notebooks/graphs' + '/dnn', sess.graph)
    # Number of training iterations in each epoch
    num_tr_iter = int(mnist.train.num_examples // batch_size)
    print('Total number of iterations: {}'.format(num_tr_iter))
    for epoch in range(epochs):
        print('Training epoch: {}'.format(epoch+1))
        for iteration in range(num_tr_iter):
            batch_x, batch_y = mnist.train.next_batch(batch_size)

            # Run optimization op (backprop)
            feed_dict_batch = {x: batch_x, y_: batch_y, keep_prob:0.3}
            _ = sess.run(train_step, feed_dict=feed_dict_batch)

            if iteration % display_freq == 0:
                # Calculate and display the batch loss and accuracy
                # (dropout is still active here because feed_dict_batch is reused)
                loss_batch, acc_batch = sess.run([loss, accuracy],
                                                 feed_dict=feed_dict_batch)
                print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".format(iteration, 
                                                                                        loss_batch, 
                                                                                        acc_batch))
        
        # Run validation after every epoch
        feed_dict_valid = {x: mnist.validation.images, y_: mnist.validation.labels, keep_prob: 1.0}
        loss_valid, acc_valid = sess.run([loss, accuracy], feed_dict=feed_dict_valid)
        print('---------------------------------------------------------')
        print("Epoch: {0}, validation loss: {1:.2f}, validation accuracy: {2:.01%}".format(epoch + 1, 
                                                                                           loss_valid, 
                                                                                           acc_valid))
        print('---------------------------------------------------------')
    # Test the network after training
    # Accuracy
    feed_dict_test = {x: mnist.test.images, y_: mnist.test.labels, keep_prob:1.0}
    cls_pred,loss_test, acc_test = sess.run([class_prediction,loss, accuracy], feed_dict=feed_dict_test)
    print('---------------------------------------------------------')
    print("Test loss: {0:.2f}, test accuracy: {1:.01%}".format(loss_test, acc_test))
    print('---------------------------------------------------------')

    # Plot some of the correct and misclassified examples
    cls_true = mnist.test.labels
    plot_images(mnist.test.images, cls_true, cls_pred, title='Examples')
    plot_example_errors(mnist.test.images, cls_true, cls_pred, title='Misclassified Examples')
    

As you can see, some of the examples are hard even for a human to classify. We will see later that other networks (such as Convolutional Neural Networks) can reduce the classification error.

#### Recommended Book
[Deep Learning](http://www.deeplearningbook.org/)