### Intro to deep neural networks
---

Code and examples have been extracted from the following sources:
 - Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, Géron A, 2017

---
##### The Perceptron

The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron called a linear threshold unit (LTU): the inputs and output are numbers, and each input connection is associated with a weight. The LTU computes a weighted sum of its inputs, $z=w_1x_1 + w_2x_2 + \dots + w_nx_n=\mathbf{w}^T\mathbf{x}$, and then applies a step function to that sum to produce its output.

![alt text](./imgs/perceptron.png "Perceptron")

![alt text](./imgs/step.png "Step")
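To make the LTU computation concrete, here is a minimal NumPy sketch. The bias term `b` and the hand-picked weights `w_and` and `b_and` are assumptions introduced for illustration; they are not taken from the source.

```python
import numpy as np

def step(z):
    """Heaviside step function: 1 where z >= 0, otherwise 0."""
    return (np.asarray(z) >= 0).astype(int)

def ltu(x, w, b=0.0):
    """Linear threshold unit: weighted sum of the inputs, then a step."""
    return step(np.dot(x, w) + b)

# Hypothetical weights chosen by hand so the unit behaves like a logical AND.
w_and = np.array([1.0, 1.0])
b_and = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = ltu(inputs, w_and, b_and)
print(outputs)  # [0 0 0 1]
```

Only the input (1, 1) pushes the weighted sum above the threshold, so a single LTU is enough for a linearly separable problem such as AND.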

In their 1969 monograph titled Perceptrons, Marvin Minsky and Seymour Papert highlighted a number of serious weaknesses of Perceptrons, in particular the fact that they are incapable of solving some trivial problems (e.g., the Exclusive OR (XOR) classification problem).

![alt text](./imgs/xor.png "XOR")
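The XOR limitation can be illustrated with a small brute-force search. The sketch below tries every combination of two weights and a bias on a coarse grid (the grid itself is an arbitrary assumption) and checks whether a single threshold unit reproduces a given truth table. A grid search is not a proof, but the result agrees with the known fact that XOR is not linearly separable.

```python
import itertools

POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def ltu(x1, x2, w1, w2, b):
    """Single threshold unit with two inputs and a bias."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

def find_weights(targets, grid):
    """Return the first (w1, w2, b) on the grid matching targets, else None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all(ltu(x1, x2, w1, w2, b) == t
               for (x1, x2), t in zip(POINTS, targets)):
            return (w1, w2, b)
    return None

grid = [i / 2 for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0

and_solution = find_weights([0, 0, 0, 1], grid)  # AND truth table
xor_solution = find_weights([0, 1, 1, 0], grid)  # XOR truth table

print("AND:", and_solution)
print("XOR:", xor_solution)
```

The search finds weights for AND but returns `None` for XOR: no single straight line separates the points (0, 1) and (1, 0) from (0, 0) and (1, 1).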


##### Multi-Layer Perceptron 

However, it turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a Multi-Layer Perceptron (MLP).

An MLP is composed of one (passthrough) input layer, one or more layers of LTUs, called hidden layers, and one final layer of LTUs called the output layer. Every layer except the output layer includes a bias neuron and is fully connected to the next layer. When an ANN has two or more hidden layers, it is called a deep neural network (DNN).

![alt text](./imgs/mlp.png "MLP")
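As a sketch of why stacking helps, the following two-layer network of threshold units computes XOR. All weights are hand-picked assumptions for illustration (one hidden unit behaves like AND, the other like OR); nothing here is learned.

```python
import numpy as np

def step(z):
    """Heaviside step function applied element-wise."""
    return (z >= 0).astype(int)

# Hidden layer: column 0 acts like AND, column 1 like OR (hypothetical weights).
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])   # shape (n_inputs, n_hidden)
b_hidden = np.array([-1.5, -0.5])

# Output layer: fires when OR is on and AND is off.
W_out = np.array([[-1.0],
                  [1.0]])           # shape (n_hidden, n_outputs)
b_out = np.array([-0.5])

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
hidden = step(X_xor @ W_hidden + b_hidden)
y_pred = step(hidden @ W_out + b_out).ravel()
print(y_pred)  # [0 1 1 0]
```

The hidden layer maps the four inputs into a space where the two classes become linearly separable, which is what the output unit then exploits.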

##### Backpropagation

For many years researchers struggled to find a way to train MLPs, without success. But in 1986, D. E. Rumelhart et al. published a groundbreaking article introducing the backpropagation training algorithm.

For each training instance, the algorithm feeds it to the network and computes the output of every neuron in each consecutive layer (this is the forward pass, just like when making predictions). Then it measures the network's output error (i.e., the difference between the desired output and the actual output of the network), and it computes how much each neuron in the last hidden layer contributed to each output neuron's error. It then proceeds to measure how much of these error contributions came from each neuron in the previous hidden layer, and so on until the algorithm reaches the input layer.

![alt text](./imgs/backpropagation.png "Backpropagation")
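The forward and reverse passes described above can be sketched in NumPy for a tiny network with one hidden layer. This is a minimal illustration under stated assumptions, not the algorithm as published: it assumes a logistic activation in both layers, a squared-error loss, random weights, and layer sizes of 3, 4 and 2. The final lines compare one backpropagated gradient with a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: returns hidden activations and network output."""
    h = sigmoid(x @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    return h, out

def loss_fn(out, target):
    """Squared error between actual and desired output (an assumption)."""
    return 0.5 * np.sum((out - target) ** 2)

def backward(x, target, W1, b1, W2, b2):
    """Reverse pass: propagate the output error back, layer by layer."""
    h, out = forward(x, W1, b1, W2, b2)
    # Error contribution of each output neuron's weighted sum.
    delta_out = (out - target) * out * (1.0 - out)
    # How much each hidden neuron contributed to those errors.
    delta_hidden = (delta_out @ W2.T) * h * (1.0 - h)
    return {
        "W2": np.outer(h, delta_out), "b2": delta_out,
        "W1": np.outer(x, delta_hidden), "b1": delta_hidden,
    }

rng = np.random.default_rng(0)
x = rng.normal(size=3)
target = np.array([1.0, 0.0])
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)
grads = backward(x, target, W1, b1, W2, b2)

# Finite-difference check on a single weight.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_fn(forward(x, W1_plus, b1, W2, b2)[1], target)
           - loss_fn(forward(x, W1_minus, b1, W2, b2)[1], target)) / (2 * eps)
print(grads["W1"][0, 0], numeric)
```

The two printed values should agree to several decimal places, which is a common sanity check for a hand-written backward pass.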

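In order for this algorithm to work properly, the authors made a key change to the MLP's architecture: they replaced the step function with the logistic function, $\sigma(z) = 1 / (1 + \exp(-z))$. The step function contains only flat segments, so it provides no gradient to work with, whereas the logistic function has a well-defined nonzero derivative everywhere.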

![alt text](./imgs/activation.png "Activation")
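The difference between the two activations shows up as soon as their slopes are measured. The sketch below estimates both slopes numerically at an arbitrary point (`z = 0.5` is an assumption) and compares the logistic slope with its closed form $\sigma(z)(1 - \sigma(z))$.

```python
import math

def step(z):
    return 1.0 if z >= 0 else 0.0

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def numerical_slope(f, z, eps=1e-6):
    """Central finite-difference estimate of the derivative of f at z."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = 0.5
step_slope = numerical_slope(step, z)            # flat segment: no gradient
logistic_slope = numerical_slope(logistic, z)    # smooth: nonzero gradient
analytic_slope = logistic(z) * (1.0 - logistic(z))

print(step_slope, logistic_slope, analytic_slope)
```

Away from the jump at zero the step function's slope is exactly zero, so gradient-based training gets no signal from it, while the logistic function always offers a direction to move in.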

An MLP is often used for classification, with each output corresponding to a different binary class (e.g., spam/ham, urgent/not-urgent, and so on). When the classes are exclusive (e.g., classes 0 through 9 for digit image classification), the output layer is typically modified by replacing the individual activation functions with a shared softmax function.

![alt text](./imgs/neural_net.png "Neural Net")
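For reference, a shared softmax turns a vector of output scores into probabilities that sum to 1. The sketch below is a plain NumPy version; subtracting the row maximum before exponentiating is a standard numerical-stability trick and an addition here, not something the source describes. The example scores are made up.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical scores for two instances and three exclusive classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax(logits)
print(probs)
```

The second row would overflow without the shift; with it, three equal scores give three equal probabilities.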


###### Training a DNN Using Plain TensorFlow

In [10]:
import numpy as np
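import tensorflow as tf  # TensorFlow 1.x graph API (placeholders, sessions)

# Global parameters for the MNIST dataset
n_inputs = 28*28  # one input per pixel of a 28x28 image
n_hidden1 = 300   # neurons in the first hidden layer
n_hidden2 = 100   # neurons in the second hidden layer
n_outputs = 10    # one output per digit class (0 through 9)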

Since we don't know in advance how many instances each training batch will contain, we define both X and y as placeholders whose first dimension is None (meaning any size).

In [11]:
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
y = tf.placeholder(tf.int64, shape=(None), name='y')

Now we need to create the two hidden layers and the output layer. Let's create a neuron_layer() function that builds one layer at a time. It takes parameters to specify the inputs, the number of neurons, the activation function, and the name of the layer:

In [12]:
def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1]) # different shape per layer
        stddev = 2 / np.sqrt(n_inputs) # Works better in practice
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name='weights')
        b = tf.Variable(tf.zeros([n_neurons]), name="biases")
        z = tf.matmul(X, W) + b
        if activation == 'relu':
            return tf.nn.relu(z)
        return z
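To see what one such layer computes without building a TensorFlow graph, here is a rough NumPy analogue. The names `neuron_layer_np` and `truncated_normal` are hypothetical helpers written for this sketch; the truncation rule (redraw any value more than two standard deviations from the mean) mirrors the documented behaviour of tf.truncated_normal. Unlike the original function, this version also returns the weight matrix so it can be inspected.

```python
import numpy as np

def truncated_normal(shape, stddev, rng):
    """Normal samples, redrawing values more than 2 stddev from the mean."""
    values = rng.normal(0.0, stddev, size=shape)
    outside = np.abs(values) > 2 * stddev
    while outside.any():
        values[outside] = rng.normal(0.0, stddev, size=int(outside.sum()))
        outside = np.abs(values) > 2 * stddev
    return values

def neuron_layer_np(X, n_neurons, rng, activation=None):
    """NumPy sketch of one fully connected layer: activation(X W + b)."""
    n_inputs = X.shape[1]                 # depends on the previous layer
    stddev = 2 / np.sqrt(n_inputs)        # same scaling as in the cell above
    W = truncated_normal((n_inputs, n_neurons), stddev, rng)
    b = np.zeros(n_neurons)
    z = X @ W + b
    if activation == "relu":
        return np.maximum(z, 0.0), W
    return z, W

rng = np.random.default_rng(42)
X_demo = rng.random((5, 28 * 28))         # 5 fake "images"
hidden, W = neuron_layer_np(X_demo, 300, rng, activation="relu")
print(hidden.shape, np.abs(W).max())
```

The random initialization breaks the symmetry between neurons, and the truncation keeps any single initial weight from being unusually large.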

Now, let's create the deep neural network! The first hidden layer takes X as its input. The second takes the output of the first hidden layer as its input. Finally, the output layer takes the output of the second hidden layer as its input.

In [13]:
with tf.name_scope('dnn'):
    hidden1 = neuron_layer(X, n_hidden1, 'hidden1', activation='relu')
    hidden2 = neuron_layer(hidden1, n_hidden2, 'hidden2', activation='relu')
    logits = neuron_layer(hidden2, n_outputs, 'outputs')
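A quick way to get a feel for the size of this network is to count its trainable parameters: each fully connected layer holds one weight per input-output pair plus one bias per neuron. The helper `count_parameters` below is a hypothetical name used only for this sketch.

```python
def count_parameters(sizes):
    """Total weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus bias vector
    return total

layer_sizes = [28 * 28, 300, 100, 10]   # inputs, hidden1, hidden2, outputs
n_params = count_parameters(layer_sizes)
print(n_params)  # 266610
```

Most of the parameters sit in the first hidden layer, because it connects all 784 pixels to 300 neurons.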

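Note that logits is the output of the network before the softmax activation function. For optimization reasons, the softmax computation is handled later, as part of the loss.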

In [14]:
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y,
                                                             logits = logits)
    loss = tf.reduce_mean(xentropy, name='loss')
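The reason for working from logits can be sketched in NumPy: the cross entropy only needs the log-probability of the target class, and that can be computed directly from the logits with a log-sum-exp, without ever forming probabilities that might underflow to zero. The function below is an illustrative stand-in written for this note, not TensorFlow's implementation, and the example values are made up.

```python
import numpy as np

def sparse_softmax_xent(logits, labels):
    """Per-instance cross entropy from raw logits and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])                # class indices, not one-hot vectors
xent = sparse_softmax_xent(logits, labels)
mean_loss = xent.mean()

# Extreme logits stay finite because no probability is ever materialized.
extreme = sparse_softmax_xent(np.array([[1000.0, 0.0]]), np.array([1]))
print(xent, mean_loss, extreme)
```

"Sparse" refers to the labels being class indices from 0 to the number of classes minus 1, which matches the int64 placeholder y defined earlier.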

We have the neural network model, we have the cost function, and now we need to define a GradientDescentOptimizer that will tweak the model parameters to minimize the cost function.

In [15]:
learning_rate = 0.01
with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
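Conceptually, each run of the training operation moves every parameter a small step against its gradient: $\theta \leftarrow \theta - \eta \nabla_\theta \text{loss}$. The sketch below applies that rule to a made-up one-dimensional quadratic, so the cost function, starting point, step count and learning rate are all assumptions chosen for illustration.

```python
def cost(theta):
    """Hypothetical cost with its minimum at theta = 3."""
    return (theta - 3.0) ** 2

def gradient(theta):
    """Derivative of the cost above."""
    return 2.0 * (theta - 3.0)

learning_rate = 0.1
theta = 0.0
for _ in range(100):
    theta = theta - learning_rate * gradient(theta)   # gradient descent step

print(theta, cost(theta))
```

With this learning rate the distance to the minimum shrinks by a constant factor at every step; a rate that is too large would make the updates overshoot instead.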

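Finally, we define how to evaluate the model. A prediction counts as correct when the highest logit corresponds to the target class, and the accuracy is the mean of these results over all instances.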

In [17]:
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
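In NumPy terms, the evaluation above is close to comparing the index of the largest logit with the label and averaging the results. This is a sketch with made-up logits and labels; it matches in_top_k with k = 1 only when there are no ties for the top score, since the two may treat ties differently.

```python
import numpy as np

# Hypothetical logits for four instances and three classes.
logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 3.0],
                   [2.0, 2.5, 0.0]])
labels = np.array([1, 0, 2, 0])

correct = logits.argmax(axis=1) == labels          # boolean per instance
accuracy = correct.astype(np.float32).mean()       # fraction of True values
print(correct, accuracy)  # [ True  True  True False] 0.75
```

Casting the booleans to floats before averaging is the same trick the TensorFlow cell uses with tf.cast.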

In [18]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

##### Execution phase

In [21]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data')

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [22]:
n_epochs = 400
batch_size = 50
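With these settings, each epoch runs one training step per mini-batch of 50 instances. The generator below is a generic sketch of mini-batch iteration over in-memory arrays; `iterate_minibatches` and the fake data are hypothetical, and it is not a reimplementation of the MNIST helper's next_batch() method used in the training loop.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs, dropping the final remainder."""
    indices = rng.permutation(len(X))
    for start in range(0, len(X) - batch_size + 1, batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X_fake = rng.random((230, 4))                 # 230 fake instances, 4 features
y_fake = rng.integers(0, 10, size=230)        # fake class labels

batches = list(iterate_minibatches(X_fake, y_fake, 50, rng))
print(len(batches), batches[0][0].shape)      # 4 (50, 4)
```

The number of batches per epoch is the integer division of the dataset size by the batch size, the same expression the training loop uses for its inner range.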

In [23]:
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # Train accuracy is measured on the last mini-batch only (50 instances)
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images,
                                           y: mnist.test.labels})
        print(epoch, 
              "Train accuracy:", acc_train,
              "Test accuracy:", acc_test)
    save_path = saver.save(sess, './final_model.ckpt')

0 Train accuracy: 0.94 Test accuracy: 0.9121
1 Train accuracy: 0.96 Test accuracy: 0.9289
2 Train accuracy: 0.92 Test accuracy: 0.9376
3 Train accuracy: 0.94 Test accuracy: 0.9424
4 Train accuracy: 0.96 Test accuracy: 0.9454
5 Train accuracy: 0.96 Test accuracy: 0.9501
6 Train accuracy: 0.9 Test accuracy: 0.9559
7 Train accuracy: 0.94 Test accuracy: 0.9575
8 Train accuracy: 0.92 Test accuracy: 0.9597
9 Train accuracy: 0.98 Test accuracy: 0.9606
10 Train accuracy: 1.0 Test accuracy: 0.9631
11 Train accuracy: 0.96 Test accuracy: 0.9648
12 Train accuracy: 1.0 Test accuracy: 0.9652
13 Train accuracy: 0.94 Test accuracy: 0.9674
14 Train accuracy: 0.94 Test accuracy: 0.9677
15 Train accuracy: 0.96 Test accuracy: 0.9692
16 Train accuracy: 0.98 Test accuracy: 0.9701
17 Train accuracy: 0.98 Test accuracy: 0.9708
18 Train accuracy: 0.98 Test accuracy: 0.9711
19 Train accuracy: 0.98 Test accuracy: 0.9725
20 Train accuracy: 1.0 Test accuracy: 0.9735
21 Train accuracy: 1.0 Test accuracy: 0.9737
22 

181 Train accuracy: 1.0 Test accuracy: 0.9793
182 Train accuracy: 1.0 Test accuracy: 0.98
183 Train accuracy: 1.0 Test accuracy: 0.9796
184 Train accuracy: 1.0 Test accuracy: 0.9794
185 Train accuracy: 1.0 Test accuracy: 0.9791
186 Train accuracy: 1.0 Test accuracy: 0.9795
187 Train accuracy: 1.0 Test accuracy: 0.9795
188 Train accuracy: 1.0 Test accuracy: 0.9795
189 Train accuracy: 1.0 Test accuracy: 0.9793
190 Train accuracy: 1.0 Test accuracy: 0.9795
191 Train accuracy: 1.0 Test accuracy: 0.9797
192 Train accuracy: 1.0 Test accuracy: 0.9793
193 Train accuracy: 1.0 Test accuracy: 0.9798
194 Train accuracy: 1.0 Test accuracy: 0.9797
195 Train accuracy: 1.0 Test accuracy: 0.98
196 Train accuracy: 1.0 Test accuracy: 0.9795
197 Train accuracy: 1.0 Test accuracy: 0.9796
198 Train accuracy: 1.0 Test accuracy: 0.9796
199 Train accuracy: 1.0 Test accuracy: 0.9798
200 Train accuracy: 1.0 Test accuracy: 0.9794
201 Train accuracy: 1.0 Test accuracy: 0.9797
202 Train accuracy: 1.0 Test accuracy:

360 Train accuracy: 1.0 Test accuracy: 0.9799
361 Train accuracy: 1.0 Test accuracy: 0.9797
362 Train accuracy: 1.0 Test accuracy: 0.9799
363 Train accuracy: 1.0 Test accuracy: 0.9798
364 Train accuracy: 1.0 Test accuracy: 0.9796
365 Train accuracy: 1.0 Test accuracy: 0.9798
366 Train accuracy: 1.0 Test accuracy: 0.9798
367 Train accuracy: 1.0 Test accuracy: 0.9798
368 Train accuracy: 1.0 Test accuracy: 0.9797
369 Train accuracy: 1.0 Test accuracy: 0.9799
370 Train accuracy: 1.0 Test accuracy: 0.9797
371 Train accuracy: 1.0 Test accuracy: 0.9798
372 Train accuracy: 1.0 Test accuracy: 0.9798
373 Train accuracy: 1.0 Test accuracy: 0.9799
374 Train accuracy: 1.0 Test accuracy: 0.9798
375 Train accuracy: 1.0 Test accuracy: 0.9798
376 Train accuracy: 1.0 Test accuracy: 0.9798
377 Train accuracy: 1.0 Test accuracy: 0.9797
378 Train accuracy: 1.0 Test accuracy: 0.9798
379 Train accuracy: 1.0 Test accuracy: 0.9798
380 Train accuracy: 1.0 Test accuracy: 0.9796
381 Train accuracy: 1.0 Test accur