# Chapter 10: Introduction to Artificial Neural Networks
## Exercise 1
Draw an ANN using the original artificial neurons that computes A XOR B.

![solution](img/xor.png)
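
As a cross-check on the drawing, here is a minimal sketch of one possible XOR network in plain Python. It is not necessarily the network shown in the figure: it assumes simple threshold units, and it uses a negative weight as a stand-in for an inhibitory connection. The names `threshold_neuron` and `xor` are hypothetical helpers introduced for illustration.

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def xor(a, b):
    # Hidden layer: one neuron detects "A or B", another detects "A and B".
    a_or_b = threshold_neuron([a, b], [1, 1], threshold=1)
    a_and_b = threshold_neuron([a, b], [1, 1], threshold=2)
    # Output neuron: excited by the OR neuron, inhibited by the AND neuron.
    return threshold_neuron([a_or_b, a_and_b], [1, -1], threshold=1)


# Enumerate the full truth table as (A, B, A XOR B) tuples.
truth_table = [(a, b, xor(a, b)) for a in (0, 1) for b in (0, 1)]
for row in truth_table:
    print(row)
```

The decomposition used here is A XOR B = (A OR B) AND NOT (A AND B), which is why a single layer is not enough: the output neuron needs both intermediate signals.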

## Exercise 2
Why is it generally preferable to use a Logistic Regression classifier rather than a classical Perceptron? How can you tweak a perceptron to make it equivalent to a Logistic Regression classifier?

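A Logistic Regression classifier is generally preferable for two reasons. A classical Perceptron converges only when the dataset is linearly separable, and it outputs hard class predictions rather than probabilities. Logistic Regression converges to a good solution even when the data is not linearly separable, and it gives you the estimated probability of each class.
To make a Perceptron equivalent to a Logistic Regression classifier, replace its step activation function with the logistic activation function (or softmax if there are several output neurons) and train it with Gradient Descent, typically minimizing the cross-entropy cost.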

## Exercise 3
Why was the logistic activation function a key ingredient in training the first MLPs?

The step function contains only flat segments, so its gradient is zero everywhere (and undefined at the jump), which gives Gradient Descent nothing to follow. The logistic function has a well-defined, nonzero derivative everywhere, so Gradient Descent can make progress at every step. That is what made it possible to train MLPs with backpropagation.
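
The difference in gradients can be checked numerically. The sketch below assumes NumPy and uses a central finite difference (the helper `numerical_derivative` is a hypothetical name) to estimate the slope of each function at a few sample points away from zero.

```python
import numpy as np


def step(z):
    """Heaviside step function: 0 for negative inputs, 1 otherwise."""
    return (z >= 0).astype(float)


def logistic(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-z))


def numerical_derivative(f, z, eps=1e-6):
    """Central finite-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


# Sample points on both sides of zero, none of them at the jump itself.
z = np.array([-2.0, -0.5, 0.5, 2.0])
step_grad = numerical_derivative(step, z)
logistic_grad = numerical_derivative(logistic, z)

print("step gradient:    ", step_grad)
print("logistic gradient:", logistic_grad)
```

The step function's estimated slope is zero at every sample point, while the logistic function's slope is strictly positive and matches the closed form $\sigma(z)(1 - \sigma(z))$.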

## Exercise 4
Name three popular activation functions. Can you draw them?

* Step function
* Logistic function
* ReLU function
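
As a rough aid for drawing them, the three functions can be written in a few lines of NumPy. This is a minimal sketch; the function names and the sample grid are choices made here for illustration.

```python
import numpy as np


def step(z):
    """Heaviside step: 0 for negative inputs, 1 otherwise."""
    return np.where(z >= 0, 1.0, 0.0)


def logistic(z):
    """S-shaped curve that maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def relu(z):
    """Rectified linear unit: 0 for negative inputs, identity otherwise."""
    return np.maximum(0.0, z)


# Evaluate each function on the integer grid -4, -3, ..., 4.
z = np.linspace(-4, 4, 9)
for name, fn in [("step", step), ("logistic", logistic), ("relu", relu)]:
    print(name, np.round(fn(z), 3))
```

Passing a denser grid and the resulting arrays to a plotting library such as matplotlib would draw the curves: a jump from 0 to 1 for the step function, a smooth S shape for the logistic function, and a hinge at zero for ReLU.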

## Exercise 5
Suppose you have an MLP composed of one input layer with 10 passthrough neurons, followed by one hidden layer with 50 artificial neurons, and finally one output layer with 3 artificial neurons. All artificial neurons use the ReLU activation function.
* What is the shape of the input matrix $X$?<br>
The input matrix X will have n rows (as many as training instances) and 10 columns (number of passthrough neurons).<br><br>
* What about the shape of the hidden layer's weight vector $W_h$, and its bias vector $b_h$?<br>
$W_h$ will have 10 rows and 50 columns. $b_h$ will have a length of 50.<br><br>
* What is the shape of the output layer's weight vector $W_o$, and its bias vector $b_o$?<br>
$W_o$ will have 50 rows and 3 columns. $b_o$ will have a length of 3.<br><br>
* What is the shape of the output matrix $Y$?<br>
n rows and 3 columns<br><br>
* Write the equation that computes the network's output matrix Y as a function of $X$, $W_h$, $b_h$, $W_o$ and $b_o$.<br>
$Y = \mathrm{ReLU}(\mathrm{ReLU}(X W_h + b_h) W_o + b_o)$
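
The shapes above can be verified with a small NumPy sketch. The batch size `n = 5` and the random weights are arbitrary assumptions used only to check the dimensions; the products are matrix products, and the bias vectors are broadcast across rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # hypothetical number of instances in the batch

X = rng.normal(size=(n, 10))     # n instances, 10 input features
W_h = rng.normal(size=(10, 50))  # hidden layer weights
b_h = np.zeros(50)               # hidden layer biases
W_o = rng.normal(size=(50, 3))   # output layer weights
b_o = np.zeros(3)                # output layer biases


def relu(z):
    return np.maximum(0.0, z)


# Forward pass: each bias vector is broadcast over the n rows.
hidden = relu(X @ W_h + b_h)
Y = relu(hidden @ W_o + b_o)

print(hidden.shape, Y.shape)
```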

## Exercise 6
How many neurons do you need in the output layer if you want to classify email into spam or ham? What activation function should you use in the output layer? If instead you want to tackle MNIST, how many neurons do you need in the output layer, using what activation function? Answer the same questions for getting your network to predict housing prices as in Chapter 2.

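You need 1 neuron in the output layer to classify email into spam or ham, and you should use the logistic activation function so that the output can be read as the estimated probability of spam.<br>
If you want to tackle MNIST, you need 10 neurons in the output layer (1 for each digit), and you should use the softmax activation function to get the probability of each digit for the input image.<br>
To predict housing prices as in Chapter 2, you need 1 output neuron with no activation function at all, since the target is an unbounded numeric value.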
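
To illustrate how a 10-neuron softmax output layer turns raw scores into class probabilities, here is a minimal NumPy sketch. The logits are made-up values for a single image, not the output of any trained model.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


# Hypothetical raw scores for one image, one score per digit 0..9.
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0, -2.0, 0.3, 1.5]])
probs = softmax(logits)

# The predicted class is the digit with the highest probability.
predicted_digit = int(np.argmax(probs, axis=1)[0])
print(np.round(probs, 3), predicted_digit)
```

The probabilities in each row sum to 1, which is what lets the network's output be read as a distribution over the ten digits.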

## Exercise 7
What is the backpropagation and how does it work? What is the difference between backpropagation and reverse-mode autodiff?

Backpropagation is a technique used to train artificial neural networks. It first computes the gradients of the cost function with regard to every model parameter (all the weights and biases), and then it performs a Gradient Descent step using these gradients. This backpropagation step is typically performed thousands or millions of times, using many training batches, until the model parameters converge to values that minimize the cost function. To compute the gradients, backpropagation uses reverse-mode autodiff. Reverse-mode autodiff performs a forward pass through a computation graph, computing every node's value for the current training batch, and then it performs a reverse pass, computing all the gradients at once. The difference is one of scope: backpropagation refers to the whole training procedure (gradient computation plus Gradient Descent steps), whereas reverse-mode autodiff is only the technique used to compute the gradients efficiently.
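
The forward pass, reverse pass, and Gradient Descent step can be sketched by hand for a tiny network. This is an illustration under assumptions chosen here, not a general implementation: one hidden layer with logistic activations, a linear output, a mean squared error cost, random data, and a learning rate of 0.01. A finite-difference check on one weight confirms the hand-derived gradient.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(4, 3))  # 4 instances, 3 features (made-up data)
y = rng.normal(size=(4, 1))  # made-up regression targets
W1 = rng.normal(size=(3, 5))
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1))
b2 = np.zeros(1)


def forward(W1, b1, W2, b2):
    """Forward pass: compute every intermediate value and the cost."""
    z1 = X @ W1 + b1
    a1 = 1.0 / (1.0 + np.exp(-z1))  # logistic activation
    y_hat = a1 @ W2 + b2            # linear output
    cost = np.mean((y_hat - y) ** 2)
    return a1, y_hat, cost


a1, y_hat, cost = forward(W1, b1, W2, b2)

# Reverse pass: apply the chain rule from the cost back to each parameter.
d_y_hat = 2.0 * (y_hat - y) / y.size
d_W2 = a1.T @ d_y_hat
d_b2 = d_y_hat.sum(axis=0)
d_a1 = d_y_hat @ W2.T
d_z1 = d_a1 * a1 * (1.0 - a1)  # derivative of the logistic function
d_W1 = X.T @ d_z1
d_b1 = d_z1.sum(axis=0)

# Sanity check: compare one analytic gradient with a finite difference.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[2]
           - forward(W1_minus, b1, W2, b2)[2]) / (2 * eps)

# One Gradient Descent step using the gradients from the reverse pass.
learning_rate = 0.01
W1_new, b1_new = W1 - learning_rate * d_W1, b1 - learning_rate * d_b1
W2_new, b2_new = W2 - learning_rate * d_W2, b2 - learning_rate * d_b2
new_cost = forward(W1_new, b1_new, W2_new, b2_new)[2]

print("analytic:", d_W1[0, 0], "numeric:", numeric)
print("cost before:", cost, "cost after one step:", new_cost)
```

Note that a single reverse pass yields the gradients for all four parameter arrays, whereas the finite-difference approach needs two extra forward passes per individual weight.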

## Exercise 8
Can you list all the hyperparameters you can tweak in an MLP? If the MLP overfits the training data, how could you tweak these hyperparameters to try to solve the problem?

The hyperparameters you can tweak include the number of hidden layers, the number of neurons in each hidden layer, and the activation function used in each hidden layer and in the output layer.<br>If the MLP overfits the training data, you can try reducing the number of hidden layers and reducing the number of neurons per hidden layer.

## Exercise 9
Train a deep MLP on the MNIST dataset and see if you can get over 98% precision. Just like in the last exercise of Chapter 9, try adding all the bells and whistles.

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

%matplotlib inline

In [2]:
from multilayer_perceptron import MultilayerPerceptron

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
n_inputs = len(mnist.train.images)
n_classes = len(mnist.train.labels[0])
batch_size = 100

print('Num Inputs: {}'.format(n_inputs))
print('Num classes: {}'.format(n_classes))
print('Batch size: {}'.format(batch_size))

mlp = MultilayerPerceptron(n_inputs, n_classes, batch_size)
mlp.add_layer(300, activation=tf.nn.relu, name="hidden_layer")
mlp.add_layer(100, activation=tf.nn.relu, name="hidden_layer_2")
mlp.add_layer(10, name="output_layer")
mlp.fit(mnist.train.images, mnist.train.labels)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Num Inputs: 55000
Num classes: 10
Batch size: 100
Epoch: 0001 cost=0.690056207 acc=0.795800001
Epoch: 0002 cost=0.261716428 acc=0.924709093
Epoch: 0003 cost=0.196611674 acc=0.944400003
Epoch: 0004 cost=0.160751927 acc=0.954672732
Epoch: 0005 cost=0.136119253 acc=0.962036369
Epoch: 0006 cost=0.117553622 acc=0.967600006
Epoch: 0007 cost=0.102834631 acc=0.972090917
Epoch: 0008 cost=0.090684658 acc=0.975400009
Epoch: 0009 cost=0.080365193 acc=0.978181828
Epoch: 0010 cost=0.071489379 acc=0.980909102
Epoch: 0011 cost=0.063780992 acc=0.983236375
Epoch: 0012 cost=0.057070995 acc=0.985200011
Epoch: 0013 cost=0.051134932 acc=0.987054556
Epoch: 0014 cost=0.045841789 acc=0.988763645
Epoch: 0015 cost=0.041127240 acc=0.990309099
Epoch: 0016 cost=0.036892837 acc=0.991690917
Epoch: 0017 cost=0.033102133 acc=0.9

In [3]:
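import numpy as np  # imported here so the function does not depend on a later cell

def get_accuracy(y_pred, y_true):
    """Return the fraction of predictions that match the true labels."""
    assert np.shape(y_pred) == np.shape(y_true)

    # Count element-wise matches, then divide by the number of instances.
    correct_predictions = np.sum(y_pred == y_true)
    return correct_predictions / np.shape(y_pred)[0]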

In [4]:
import numpy as np

model_pred = mlp.predict(mnist.test.images)
true_pred = np.argmax(mnist.test.labels, axis=1)

accuracy = get_accuracy(model_pred, true_pred)
print('Accuracy: ', accuracy)

Accuracy:  0.9793
