# Neural Networks

Neural networks are a powerful machine learning framework used to learn complex input-output mappings from examples. Examples of successful applications of neural networks include:
1. Classification of handwritten digits
2. Speech recognition

Neural networks can be viewed as a series of nonlinear transformations applied to the input variables, where the form of each transformation is learned from the training data. There are several neural network architectures, but we will focus on the feedforward architecture, in which information flows in one direction from input to output with no feedback from the output to the input.

## The Multilayer Perceptron

The multilayer perceptron (MLP) is a feedforward neural network. The figure below shows an MLP with a single hidden layer (from http://deeplearning.net/tutorial/mlp.html).
![MLP](mlp.png)

The input to a given layer is the output of the previous layer. The output of a given layer is obtained by applying an activation function to a weighted linear combination of its inputs. Mathematically, let $x_1,\ldots,x_D$ be the inputs to a given layer. The output of the $j$th neuron in the hidden layer is given by
\begin{eqnarray*}
z_j=h\left(\sum_{i=1}^Dw_{ji}x_i+w_{j0}\right)
\end{eqnarray*}
where:
1. $h(\cdot)$ is a nonlinear activation function
2. $w_{ji}$ is the weight from input node (neuron) $i$ to output node $j$
3. $w_{j0}$ is known as the bias of neuron $j$

Similarly, the output $y_k$ of the $k$th output neuron is obtained by applying an activation function to a weighted linear combination of the hidden-layer outputs. This output is a function of the weights $\mathbf{w}$ and the inputs $\mathbf{x}$, so we write $y_k(\mathbf{x},\mathbf{w})$. We can collect all the outputs into a vector $\mathbf{y}(\mathbf{x},\mathbf{w})$.
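
The layer equations above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `mlp_forward`, the matrix names `W1`, `b1`, `W2`, `b2`, the choice of `tanh` in the hidden layer and the identity output activation are all assumptions made for the example.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2, h=np.tanh, out=lambda a: a):
    """Forward pass of a single-hidden-layer MLP (illustrative sketch).

    x  : (D,) input vector
    W1 : (M, D) hidden-layer weights, W1[j, i] plays the role of w_ji
    b1 : (M,) hidden-layer biases, b1[j] plays the role of w_j0
    W2 : (K, M) output-layer weights
    b2 : (K,) output-layer biases
    h, out : hidden and output activation functions (assumed choices)
    """
    z = h(W1 @ x + b1)      # hidden-layer outputs z_j
    y = out(W2 @ z + b2)    # network outputs y_k
    return z, y

# Tiny made-up network: D=3 inputs, M=4 hidden neurons, K=2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
x = np.array([0.5, -1.0, 2.0])
z, y = mlp_forward(x, W1, b1, W2, b2)
print(z.shape, y.shape)
```

Each row of `W1` holds the weights of one hidden neuron, so the matrix product computes all the weighted sums in one step.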


### Activation Functions
There are a number of activation functions used depending on the nature of the data and target variables. These include:
1. The sigmoid function 
\begin{eqnarray*}
\sigma(a)=\frac{1}{1+\exp(-a)}
\end{eqnarray*}
2. The hyperbolic tangent (tanh) function
\begin{eqnarray*}
\tanh(a)=\frac{\exp(a)-\exp(-a)}{\exp(a)+\exp(-a)}
\end{eqnarray*}
3. The rectified linear unit (ReLU)
\begin{eqnarray*}
f(a)=\max\{0,a\}
\end{eqnarray*}
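
As a rough sketch, the three activation functions can be written with NumPy as below. The function names are chosen for this example only, and the sigmoid is the direct form of the formula above, which may emit an overflow warning for very large negative inputs.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid 1 / (1 + exp(-a)), evaluated elementwise."""
    a = np.asarray(a, dtype=float)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    """Hyperbolic tangent, evaluated elementwise."""
    return np.tanh(np.asarray(a, dtype=float))

def relu(a):
    """Rectified linear unit max{0, a}, evaluated elementwise."""
    return np.maximum(0.0, np.asarray(a, dtype=float))

# Evaluate each function on a small grid of points
a = np.linspace(-5, 5, 11)
print(sigmoid(a))
print(tanh(a))
print(relu(a))
```

Each function takes a vector of points and returns the activation at those points, so the results can be passed directly to a plotting call such as `plt.plot(a, sigmoid(a))`.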



In [None]:
#write three functions to plot the three activation functions. 
#The functions you write should take in a vector of points and return 
#the activation function evaluated at those points

### Network Learning

The aim of training the neural network is to learn an input-output mapping from examples. We aim to learn a set of weights and biases that gives the appropriate mapping. Given $N$ training examples $\mathbf{x}_n$ and the corresponding target output vectors $\mathbf{t}_n$, we aim to learn weights and biases that minimize the error
\begin{eqnarray*}
E(\mathbf{w})=\frac{1}{2}\sum_{n=1}^N\|\mathbf{y}(\mathbf{x}_n,\mathbf{w})-\mathbf{t}_n\|^2
\end{eqnarray*}

Learning is often carried out by gradient descent, in which the weights at step $\tau$ are moved in the direction of the negative gradient according to
\begin{eqnarray*}
\mathbf{w}^{(\tau+1)}=\mathbf{w}^{(\tau)}-\eta\nabla E(\mathbf{w}^{(\tau)})
\end{eqnarray*}
where $\eta>0$ is the learning rate. In practice, for the MLP the gradient of the error function is found by backpropagation.
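
To make the update rule concrete, here is a small sketch of batch gradient descent on the sum-of-squares error, with the gradient obtained by backpropagation. It assumes a single hidden layer with `tanh` activations and linear outputs; the function names, the toy data ($t=x^2$ on 20 points), the learning rate and the number of steps are all made up for illustration.

```python
import numpy as np

def forward(X, params):
    """Return hidden outputs Z (N, M) and network outputs Y (N, K)."""
    W1, b1, W2, b2 = params
    Z = np.tanh(X @ W1.T + b1)
    Y = Z @ W2.T + b2           # linear output units (an assumption)
    return Z, Y

def error(X, T, params):
    """Sum-of-squares error E(w) = 1/2 * sum_n ||y_n - t_n||^2."""
    _, Y = forward(X, params)
    return 0.5 * np.sum((Y - T) ** 2)

def gradients(X, T, params):
    """Gradient of E with respect to each parameter, by backpropagation."""
    W1, b1, W2, b2 = params
    Z, Y = forward(X, params)
    dY = Y - T                           # error signal at the outputs
    gW2 = dY.T @ Z
    gb2 = dY.sum(axis=0)
    dA = (dY @ W2) * (1.0 - Z ** 2)      # propagate back through tanh
    gW1 = dA.T @ X
    gb1 = dA.sum(axis=0)
    return [gW1, gb1, gW2, gb2]

def gradient_descent(X, T, params, eta=0.01, n_steps=200):
    """Apply w <- w - eta * grad E(w) for n_steps; return weights and errors."""
    params = [p.copy() for p in params]
    history = [error(X, T, params)]
    for _ in range(n_steps):
        grads = gradients(X, T, params)
        params = [p - eta * g for p, g in zip(params, grads)]
        history.append(error(X, T, params))
    return params, history

# Toy regression problem (illustrative data only)
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 20).reshape(-1, 1)
T = X ** 2
init = [0.5 * rng.normal(size=(3, 1)), np.zeros(3),
        0.5 * rng.normal(size=(1, 3)), np.zeros(1)]
params, history = gradient_descent(X, T, init)
print("initial error:", history[0], "final error:", history[-1])
```

Because the error sums over all examples, the gradient grows with the size of the training set, which is one reason the learning rate usually has to be tuned for each problem.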

### Example
We will consider the example of learning to classify handwritten digits. We will use the MNIST dataset, which consists of 70,000 examples, and the MLP implementation in scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

In [None]:
import sklearn
print(sklearn.__version__)

In [None]:
#load the data
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import fetch_openml
#fetch_mldata has been removed from scikit-learn; fetch_openml replaces it.
#as_frame=False returns NumPy arrays so that rows can be indexed by position
mnist = fetch_openml('mnist_784', as_frame=False)

Let us explore the data by plotting a few examples.

In [None]:
#The flattened images of the digits are 784 dimensional arrays
mnist.data.shape

In [None]:
#numpy must be imported before it is first used
import numpy as np
np.max(mnist.data)

In [None]:
#The targets are stored in mnist.target
mnist.target.shape

In [None]:
#Plot a random digit and show the label
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

#np.asarray makes positional indexing work even if the data were loaded as a DataFrame
images = np.asarray(mnist.data)
labels = np.asarray(mnist.target)
rand_index = np.random.randint(0, images.shape[0])
plt.imshow(images[rand_index].reshape(28, 28), cmap='Greys_r')
plt.xticks([])
plt.yticks([])
print(int(labels[rand_index]))

Now divide the data into training, validation and test sets. Use a 60/20/20 split: 42,000 examples for training, 14,000 for validation and 14,000 for testing.

In [None]:
#write code to divide the data into training, validation and test sets
#X_train,X_val,X_test=
#y_train,y_val,y_test=
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(mnist.data, 
                                                    mnist.target,
                                                    test_size=.4)# 60% for training
X_val, X_test, y_val, y_test = train_test_split(X_test, 
                                                y_test,
                                                test_size=.5) # 20% validation and 20 % test

#Normalise the data by dividing the pixel values by 255

X_train = np.array(X_train) / 255
X_val = np.array(X_val) / 255
X_test = np.array(X_test) / 255

X_train.max()

In [None]:
#Use the validation data to select an appropriate number of hidden layer neurons
#for an MLP with a single hidden layer

In [None]:
#Define the number of hidden neurons per layer as a tuple
HL=(100,)


#create the classifier
mlp = MLPClassifier(hidden_layer_sizes=HL, 
                    activation='tanh',
                    max_iter=50, 
                    alpha=1e-4, 
                    solver='sgd', 
                    verbose=10, 
                    tol=1e-4, 
                    random_state=1,
                    learning_rate_init=.1)

mlp.fit(X_train, y_train)
print("Training set score: %f" % mlp.score(X_train, y_train))
print("Validation set score: %f" % mlp.score(X_val, y_val))


In [None]:
#Modify the above code to search through MLPs with 50,100,150,200,500 and 1000 neurons
#in the hidden layer to find the best
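
One possible way to organise this search is sketched below. The helper name `select_hidden_size` and its arguments are hypothetical; only `MLPClassifier`, `fit` and `score` come from scikit-learn. The helper fits one classifier per candidate size and keeps the size with the highest validation accuracy.

```python
from sklearn.neural_network import MLPClassifier

def select_hidden_size(X_train, y_train, X_val, y_val,
                       sizes=(50, 100, 150, 200, 500, 1000), **mlp_kwargs):
    """Fit a single-hidden-layer MLP for each size and compare validation scores.

    mlp_kwargs are passed unchanged to MLPClassifier (for example
    activation, solver, max_iter, random_state).
    Returns the best size and a dict mapping each size to its validation accuracy.
    """
    scores = {}
    for n_hidden in sizes:
        clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), **mlp_kwargs)
        clf.fit(X_train, y_train)
        scores[n_hidden] = clf.score(X_val, y_val)
    best = max(scores, key=scores.get)   # size with the highest validation accuracy
    return best, scores
```

It could be called as `select_hidden_size(X_train, y_train, X_val, y_val, activation='tanh', solver='sgd', max_iter=50, learning_rate_init=.1, random_state=1)`, reusing the settings of the classifier above. The test set is deliberately left out of the selection so that it still gives an unbiased estimate of the final accuracy.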


In [None]:
#with the best network, learn a model and use it to find the accuracy 
#on the test set

Let us view some of the digits in the test set and their corresponding classifications.

In [None]:
#Use mlp.predict to see how your classifier labels some test digits
rand_index = np.random.randint(0, X_test.shape[0])
plt.imshow(X_test[rand_index].reshape(28,28), cmap='Greys_r')
plt.xticks([]);
plt.yticks([]);
mlp.predict(X_test[rand_index].reshape(1, -1))