## Neural Networks

#### Perceptrons

- The perceptron algorithm was introduced by Frank Rosenblatt in 1957.
- The perceptron is a binary linear classifier: it can only predict the classes of samples correctly if those samples
can be separated by a straight line (a hyperplane in higher dimensions).
- It classifies samples using hand-crafted features that represent information about the samples. Each feature is
weighted according to how important it is to the final prediction, and the resulting weighted sum is compared against
a threshold value.

- A step function is an instant transition of a value from 0 to 1. Here z denotes the weighted sum of the features
minus the threshold: if z is greater than or equal to 0, the perceptron predicts one class, else it predicts the other.
- At each iteration, the predicted class is compared to the actual class. The weights are updated if the prediction
was wrong and left unchanged if it was correct. Updates continue until all samples are correctly predicted, at which
point the perceptron classifier has found a linear decision boundary that perfectly separates all samples into two
mutually exclusive classes.
- During training, the weights are updated by adding a small value to the original weights. The amount added is
determined by the perceptron learning rule.

- In the usual form of the rule, $\Delta w_j = \eta\,(y - \hat{y})\,x_j$, where $y$ is the actual class, $\hat{y}$ the
predicted class and $x_j$ the j-th feature, the first coefficient on the right-hand side, $\eta$, is called the
learning rate. It acts as a scaling factor that increases or decreases the extent of the update.
- The perceptron learning algorithm described here is severely limited, as it can only learn simple functions that
have a clear linear boundary. The perceptron is almost never used in practice, but it served as an integral
building block during the early development of artificial neural networks.
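
The training procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation. The function names (`step`, `predict`, `train_perceptron`), the learning rate of 0.5, the zero initialisation and the toy AND dataset are all assumptions made for the example. The update uses a common form of the perceptron learning rule, which leaves the weights untouched when a prediction is correct.

```python
def step(z):
    """Step activation: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    """Weighted sum of the features followed by the step function."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return step(z)


def train_perceptron(samples, targets, lr=0.5, max_epochs=100):
    """Apply the perceptron learning rule until every sample is classified correctly."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(samples, targets):
            # lr * (actual - predicted): zero when the prediction is right
            update = lr * (target - predict(weights, bias, x))
            if update != 0.0:
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:
            break  # a separating boundary has been found
    return weights, bias


# Toy, linearly separable data (logical AND), assumed for illustration
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

and_weights, and_bias = train_perceptron(and_inputs, and_targets)
and_predictions = [predict(and_weights, and_bias, x) for x in and_inputs]
print(and_predictions)  # [0, 0, 0, 1]
```

The `max_epochs` cap is a safeguard: on data that is not linearly separable the loop would otherwise never terminate.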

#### Multi-Layer Perceptrons

- Modern iterations are known as multi-layer perceptrons: feedforward neural networks made up of many nodes, each
structured like a perceptron. However, there are important differences.
- A multi-layer perceptron is made up of multiple layers of neurons stacked to form a network. The activation
functions used are differentiable non-linear functions, unlike the step function of the perceptron model.
- Stacking layers with non-linear activations lets the network capture richer representations of the data, so the
input data need not be linearly separable.
- The other important difference is that multi-layer perceptrons are trained using a different kind of algorithm called
backpropagation, which enables training across multiple layers.
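
To make the point about linear separability concrete, here is a small sketch of a network with one hidden layer that computes XOR, a function no single perceptron can represent. The weights are hand-picked for illustration (they are not learned), and the choice of ReLU as the activation and the name `tiny_mlp` are assumptions of this example.

```python
def relu(z):
    """Rectified linear unit, a common non-linear activation."""
    return max(0.0, z)


def tiny_mlp(x1, x2):
    """Two hidden ReLU units followed by a linear output unit."""
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)  # active when at least one input is on
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)  # active only when both inputs are on
    return 1.0 * h1 - 2.0 * h2            # linear output layer


xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [tiny_mlp(x1, x2) for x1, x2 in xor_inputs]
print(xor_outputs)  # [0.0, 1.0, 1.0, 0.0]
```

Without the `relu` calls the two layers would collapse into a single linear map, which could not produce this output.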

#### Backpropagation

- Backpropagation is an algorithm used to solve the credit assignment problem in artificial neural networks. That is,
it is used to determine how much an input's features and the model's weights contribute to the final output of the model.
- Unlike the perceptron learning rule, backpropagation calculates gradients, which tell us how much a change in the
parameters of the model affects the final output. The gradients are used to train the model: they act as an error
signal that indicates how far the model's predictions are from the ground truth.
- The backpropagation algorithm can be thought of as the chain rule of derivatives applied across layers.
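
The chain-rule view can be illustrated on a deliberately tiny network with one tanh hidden unit and one linear output unit. This is a sketch under stated assumptions: the network shape, the squared-error loss, the function names and the numeric values are all hypothetical choices for the demonstration. The gradients from the backward pass are compared with finite-difference estimates as a sanity check.

```python
import math


def forward(w1, w2, x):
    """Forward pass: tanh hidden unit followed by a linear output unit."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y


def loss_fn(w1, w2, x, target):
    """Squared error between the prediction and the ground truth."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def backward(w1, w2, x, target):
    """Backward pass: apply the chain rule from the loss back to each weight."""
    h, y = forward(w1, w2, x)
    dloss_dy = y - target                      # derivative of the loss w.r.t. the output
    dloss_dw2 = dloss_dy * h                   # because y = w2 * h
    dloss_dh = dloss_dy * w2                   # error signal passed back to the hidden unit
    dloss_dw1 = dloss_dh * (1.0 - h ** 2) * x  # tanh'(z) = 1 - tanh(z)^2
    return dloss_dw1, dloss_dw2


# Hypothetical values chosen only for the demonstration
w1, w2, x, target = 0.5, -0.3, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, target)

# Central finite differences estimate the same gradients numerically
eps = 1e-6
num_w1 = (loss_fn(w1 + eps, w2, x, target) - loss_fn(w1 - eps, w2, x, target)) / (2 * eps)
num_w2 = (loss_fn(w1, w2 + eps, x, target) - loss_fn(w1, w2 - eps, x, target)) / (2 * eps)
print(grad_w1, num_w1)
print(grad_w2, num_w2)
```

Each pair of printed values should agree to several decimal places. Frameworks such as TensorFlow carry out the same bookkeeping automatically for every layer.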

In [2]:
# Import MNIST data from the Keras datasets module
import numpy as np
from tensorflow import keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Flatten each 28x28 image to 784 values in [0, 1] and one-hot encode the labels
x_train, x_test = x_train.reshape(-1, 784) / 255.0, x_test.reshape(-1, 784) / 255.0
y_train, y_test = np.eye(10)[y_train], np.eye(10)[y_test]

In [12]:
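import numpy as np
import tensorflow.compat.v1 as tf
import matplotlib.pyplot as plt

# The cells below build a TensorFlow 1.x style graph, so eager execution is switched off
tf.disable_eager_execution()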

In [13]:
# x_train holds the flattened training images loaded earlier
plt.imshow(np.reshape(x_train[8], [28, 28]), cmap='gray')
plt.show()

In order to train an artificial neural network model on our data, we first need to define the parameters that describe
the computation graph, such as the number of neurons in each hidden layer, the number of hidden layers, the input size
and the number of output classes. Each image in the dataset is 28 by 28 pixels; therefore, the input size is 784 (28 × 28).
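
As a quick illustration of where 784 comes from, the sketch below flattens a 28 by 28 grid row by row. The stand-in image is hypothetical (each pixel value simply encodes its own position) and plain Python lists are used so the snippet runs on its own.

```python
height, width = 28, 28

# Hypothetical stand-in image: each pixel value encodes its (row, col) position
image = [[row * width + col for col in range(width)] for row in range(height)]

# Flatten row by row into a single vector, the layout a fully connected input layer expects
flat = [pixel for row in image for pixel in row]

print(len(flat))  # 784
```

With this row-major layout, the pixel at `(row, col)` ends up at index `row * 28 + col` of the flattened vector.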

In [None]:
# Parameters
learning_rate = 0.1
num_steps = 500
batch_size = 128
display_step = 100

In [None]:
# Network Parameters
n_hidden_1 = 10 # 1st layer number of neurons
n_hidden_2 = 10 # 2nd layer number of neurons
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)

In [None]:
# tf Graph input
X = tf.placeholder("float", [None, num_input])
Y = tf.placeholder("float", [None, num_classes])

We then declare the weights and biases, which are trainable parameters, and initialise them randomly (here with values
drawn from a normal distribution). The declarations are stored in Python dictionaries.

In [None]:
# Store layers weight & bias
weights = {
 'h1': tf.Variable(tf.random_normal([num_input, n_hidden_1])),
 'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
 'out': tf.Variable(tf.random_normal([n_hidden_2, num_classes]))
}
biases = {
 'b1': tf.Variable(tf.random_normal([n_hidden_1])),
 'b2': tf.Variable(tf.random_normal([n_hidden_2])),
 'out': tf.Variable(tf.random_normal([num_classes]))
}
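
To get a feel for the size of this model, the number of trainable parameters can be worked out from the shapes declared above. The snippet below is only an arithmetic sketch. It repeats the layer sizes as plain integers, and the names `layer_shapes` and `params_per_layer` are introduced here for illustration.

```python
num_input, n_hidden_1, n_hidden_2, num_classes = 784, 10, 10, 10

layer_shapes = [
    (num_input, n_hidden_1),    # weights['h1'] and biases['b1']
    (n_hidden_1, n_hidden_2),   # weights['h2'] and biases['b2']
    (n_hidden_2, num_classes),  # weights['out'] and biases['out']
]

# Each layer holds fan_in * fan_out weights plus one bias per output unit
params_per_layer = [fan_in * fan_out + fan_out for fan_in, fan_out in layer_shapes]
total_params = sum(params_per_layer)

print(params_per_layer)  # [7850, 110, 110]
print(total_params)      # 8070
```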


We then describe a 3-layer neural network (two hidden layers and an output layer) with 10 units in the
output layer, one for each digit class, and define the model by creating a
function that forward propagates the inputs through the layers. Note that
we are still only describing these operations on the computation graph.

In [None]:
# Create model
def neural_net(x):
    # Hidden fully connected layer with 10 neurons
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 10 neurons
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

Next we call our function, define the loss objective, choose the optimizer that will be used to train the model,
and initialise all variables.

In [None]:
# Construct model
logits = neural_net(X)

In [None]:
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
 logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
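
For intuition about what the loss above measures, here is a plain-Python sketch of softmax followed by cross-entropy, averaged over a batch. It is an illustration of the idea, not TensorFlow's implementation. The helper names and the 3-class logits and labels are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """Negative log-probability that the model assigns to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


# Hypothetical logits and one-hot labels for a 3-class problem
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
batch_labels = [[1, 0, 0], [0, 0, 1]]

per_sample = [cross_entropy(row, label) for row, label in zip(batch_logits, batch_labels)]
mean_loss = sum(per_sample) / len(per_sample)  # the averaging step, as tf.reduce_mean does
print(mean_loss)
```

The loss shrinks towards zero as the probability assigned to the correct class approaches one.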

In [None]:
# Evaluate model: a prediction is correct when the largest logit matches the true class
correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
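
The same accuracy computation can be traced by hand on a few values. The sketch below mirrors the argmax, equality and mean steps in plain Python. The logits, labels and the `argmax` helper are hypothetical and exist only for this example.

```python
def argmax(values):
    """Index of the largest entry (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits and one-hot labels for four samples and three classes
sample_logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 2.2]]
sample_labels = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]

# 1.0 where the predicted class equals the true class, else 0.0
correct = [float(argmax(row) == argmax(label))
           for row, label in zip(sample_logits, sample_labels)]
accuracy_value = sum(correct) / len(correct)

print(correct)         # [1.0, 1.0, 0.0, 1.0]
print(accuracy_value)  # 0.75
```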

In [None]:
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

Finally, we create a session, supply images in batches to the model for
training, and print the loss and accuracy of the current mini-batch every 100 steps.

In [None]:
# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)

    for step in range(1, num_steps + 1):
        # Draw a random mini-batch from the training arrays loaded earlier
        idx = np.random.randint(0, len(x_train), batch_size)
        batch_x, batch_y = x_train[idx], y_train[idx]
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy],
                                 feed_dict={X: batch_x, Y: batch_y})
            print("Step {}, Minibatch Loss= {:.4f}, Training Accuracy= {:.3f}".format(
                step, loss, acc))

    print("Optimization Finished!")

    # Calculate accuracy for MNIST test images
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={X: x_test, Y: y_test}))

The loss drops to 0.4863 after training for 500 steps and we achieve an accuracy of 85% on the test set.