# Implementation of Neural Networks using Tensorflow

In [1]:
import tensorflow as tf
import numpy as np

In [2]:
from sklearn.datasets import fetch_mldata

# mnist = fetch_mldata('MNIST original')

## Multi-Layer Perceptron

This has an input layer and an output layer, and one or more Hidden Layers. Every layer except the input and output layers has a bias neuron and each layer is fully connected to the next layer.

## Training a DNN for MNIST

When an neural network has two or more hidden layers it is called a Deep Neural Network (DNN).

### Construction

In [3]:
n_inputs = 28*28  # MNIST
n_hidden1 = 300
n_hidden2 = 200
n_hidden3 = 100
n_outputs = 10
learning_rate = 0.05
momentum = 0.9

X will be the input layer. Each image will be placed along the 1st dimension and each pixel will go down the 2nd dimension. 

$$
X =
\begin{bmatrix}
    image_1 / pixel_1 & image_2 / pixel_1 & \dots & image_n / pixel_1 \\
    image_1 / pixel_2 & image_2 / pixel_2 & \dots & image_n / pixel_2  \\
    \vdots & \vdots & \ddots & \vdots \\
    image_1 / pixel_{784} & image_2 / pixel_{784} & \dots & image_n / pixel_{784} \\
\end{bmatrix}
$$

X will be replaced by one training batch at a time, but we dont know the size of each training batch. Hence, the shape of the input layer is $(None, 28 * 28)$. 

All instances of a training batch are processed at the same time by the neural network. 

In [4]:
# Input layer
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

The W matrix will hold weights which will be updated during training. It will hold the connection weights between each input and each neuron, hence the shape (n_inputs, n_neurons). It will be initialised with a truncated normal distribution, with a standard deviation: $$\sigma = \frac{2}{\sqrt{n_{inputs}}}$$

A truncated normal distribution prevents any larger number which may slow down training.

We need to initialise weights randomly so that when each neuron in each layer is updated they will be updated differently.

The W (weights) and b (biases) are a part of the trainable variables within the Tensorflow graph. These will be updated 

In [5]:
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name="weights")
        b = tf.Variable(tf.zeros([n_neurons]), name="biases")
        z = tf.matmul(X, W) + b
        if activation == "relu":
            return tf.nn.relu(z)
        return z

In [6]:
with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, "hidden1", "relu")
    hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", "relu")
    logits = neuron_layer(hidden2, n_outputs, "output")

Fully_connected internally creates a weights matrix for the connection weights between each input and each neuron.

[Dropout regularisation](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf) is applied to every layer of the netural network except the output layer. Since each neuron may or may not be present, there are $2^N$ possible neural networks (N is the total number of dropable neurons) that can be produced to learn a training step. So we essentially have an ensemble neural network of all the neural networks produced at each training step.

In [7]:
from tensorflow.contrib.layers import fully_connected, dropout

keep_prob = 0.5
activation_fn = tf.nn.elu
is_training = tf.placeholder(tf.bool, shape=(), name="is_training")

with tf.name_scope("fc_dnn"):
    X_drop = dropout(X, keep_prob=keep_prob, is_training=is_training)
    
    hidden1 = fully_connected(X_drop, n_hidden1,
                              activation_fn=activation_fn, 
                              scope="hidden1")
    hidden1_drop = dropout(hidden1, keep_prob=keep_prob, is_training=is_training)
    
    hidden2 = fully_connected(hidden1, n_hidden2,
                              activation_fn=activation_fn,
                              scope="hidden2")
    hidden2_drop = dropout(hidden2, keep_prob=keep_prob, is_training=is_training)
    
    hidden3 = fully_connected(hidden2, n_hidden2,
                              activation_fn=activation_fn,
                              scope="hidden3")
    hidden3_drop = dropout(hidden3, keep_prob=keep_prob, is_training=is_training)
    
    logits = fully_connected(hidden3_drop, n_outputs, scope="outputs")

In [8]:
with tf.name_scope("loss"):
    entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y, logits=logits)
    loss = tf.reduce_mean(entropy, name="loss")

In [9]:
with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate,
                                           momentum=momentum,
                                           use_nesterov=True)
    training_op = optimizer.minimize(loss)

In [10]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

In [11]:
init = tf.global_variables_initializer()

### Execution Phase

In [12]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data")

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.


In [13]:
n_epochs = 10
batch_size = 50

with tf.Session() as sess:
    init.run()
    
    for epoch in range(n_epochs):
        for _ in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch,
                                             is_training: True})
            
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch,
                                             is_training: False})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, 
                                            y: mnist.test.labels,
                                            is_training: False})
        print(epoch,
              "Train accuracy:", acc_train,
              "Test accuracy:", acc_test)

0 Train accuracy: 0.92 Test accuracy: 0.9096
1 Train accuracy: 1.0 Test accuracy: 0.9671
2 Train accuracy: 0.96 Test accuracy: 0.974
3 Train accuracy: 0.94 Test accuracy: 0.9893
4 Train accuracy: 1.0 Test accuracy: 0.9799
5 Train accuracy: 1.0 Test accuracy: 0.9829
6 Train accuracy: 1.0 Test accuracy: 0.9991
7 Train accuracy: 1.0 Test accuracy: 0.9992
8 Train accuracy: 1.0 Test accuracy: 0.9991
9 Train accuracy: 1.0 Test accuracy: 0.9993
