# Training a Neural Network
*Curtis Miller*

In this notebook I demonstrate how to train the neural network known as the **multilayer perceptron (MLP)**. We will use an MLP to classify the iris dataset and also a dataset of handwritten digits, where the goal is to recognize which digit each image shows.

Neural networks have a lot of parameters to set when training. These include:

* How many hidden layers to have
* How many neurons to include in each layer
* The activation functions of neurons in the hidden layers
* Value of the regularization term to control overfitting (referred to as $\alpha$)
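
As a rough illustration of how the choices listed above map onto code, here is a minimal sketch that searches over them with scikit-learn's `GridSearchCV`. The candidate values in `param_grid`, the use of the iris data, and the 3-fold cross-validation are all assumptions made for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Candidate values are illustrative assumptions only
param_grid = {
    "hidden_layer_sizes": [(10,), (20,)],  # one tuple entry per hidden layer
    "activation": ["logistic", "relu"],    # activation function of hidden neurons
    "alpha": [0.01, 1.0],                  # regularization strength
}

search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid,
    cv=3,  # 3-fold cross-validation (assumed)
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params)
print(best_score)
```

Each key in `param_grid` corresponds to one of the bullet points: the length of the `hidden_layer_sizes` tuple sets the number of hidden layers, and its entries set the neurons per layer.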

Issues that arise when training a neural network are also acute. These involve choices about the optimization algorithm that estimates the parameters (weights) used for prediction. For neural networks this fitting process is very involved.
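
To make the optimization-related choices more concrete, the sketch below sets a few of the solver options that `MLPClassifier` exposes and then inspects how the fit went. The specific values (`solver="adam"`, `learning_rate_init=0.01`, and so on) are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

mlp = MLPClassifier(hidden_layer_sizes=(20,),
                    solver="adam",            # optimization algorithm (assumed choice)
                    learning_rate_init=0.01,  # initial step size (assumed value)
                    max_iter=1000,            # upper bound on training epochs
                    tol=1e-4,                 # stop when the loss stops improving by this much
                    random_state=0)
mlp.fit(X, y)

# Number of epochs actually run, and the training loss at the start and end
print(mlp.n_iter_)
print(mlp.loss_curve_[0], mlp.loss_curve_[-1])
```

Plotting `mlp.loss_curve_` is a simple way to see whether the solver has settled or stopped only because it hit `max_iter`.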

MLPs can be trained online, just like perceptrons. This is especially advantageous for training on large datasets that don't necessarily fit into memory. Additionally, MLPs are *not* linear classifiers/regressors. This makes MLPs popular for learning problems where the data isn't linearly separable.
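
Online training can be sketched with the `partial_fit` method of `MLPClassifier`, which updates the model one batch at a time. Here the digits data is small enough to fit in memory, so the batching is simulated; the batch size of 200 is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
classes = np.unique(y)  # all labels must be declared up front for partial_fit

mlp = MLPClassifier(hidden_layer_sizes=(50,), random_state=0)

batch_size = 200  # assumed value, for illustration only
for start in range(0, X.shape[0], batch_size):
    stop = start + batch_size
    # Each call performs one update pass using only this batch
    mlp.partial_fit(X[start:stop], y[start:stop], classes=classes)

print(mlp.score(X, y))
```

In a real out-of-memory setting the batches would come from disk or a stream rather than from slices of an array, and several passes over the data would usually be needed.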

MLPs can be used for classification and regression. This notebook focuses on classification.

First, let's load the datasets we will use.

In [None]:
from sklearn.datasets import load_iris, load_digits
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# First, the iris dataset
iris_obj = load_iris()
iris_data_train, iris_data_test, species_train, species_test = train_test_split(iris_obj.data, iris_obj.target)

# Next, the digits dataset
digits_obj = load_digits()
print(digits_obj.DESCR)

In [None]:
digits_obj.data.shape

In [None]:
digits_data_train, digits_data_test, number_train, number_test = train_test_split(digits_obj.data, digits_obj.target)
number_train[:5]

In [None]:
digits_data_train[0, :]

In [None]:
digits_data_train[0, :].reshape((8, 8))

In [None]:
plt.imshow(digits_data_train[0, :].reshape((8, 8)))

## Fitting an MLP to the Iris Data

MLP models for classification are implemented by the `MLPClassifier` class in **scikit-learn**. The MLP classifier I train:

* Has one hidden layer with 20 neurons
* Uses the logistic activation function for the hidden layers
* Uses a regularization parameter of $\alpha = 1$

I demonstrate its use below.

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

In [None]:
mlp_iris = MLPClassifier(hidden_layer_sizes=(20,),    # A tuple with the number of neurons for each hidden layer
                         activation='logistic',         # Which activation function to use
                         alpha=1,                       # Regularization parameter
                         max_iter=1000)                 # Maximum number of iterations taken by the solver
mlp_iris = mlp_iris.fit(iris_data_train, species_train)
mlp_iris.predict(iris_data_train[:1,:])

In [None]:
species_pred_train = mlp_iris.predict(iris_data_train)
accuracy_score(species_train, species_pred_train)    # Signature is accuracy_score(y_true, y_pred)

In [None]:
species_pred_test = mlp_iris.predict(iris_data_test)
accuracy_score(species_test, species_pred_test)

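The classifier has very high accuracy on this dataset, for both the training and test sets. The exact values vary from run to run, since the train/test split and the initial weights are random.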
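
To connect the architecture described in this section to the fitted object, here is a small sketch that inspects the learned weight arrays. It refits the same kind of model on the full iris data (an assumption made to keep the example self-contained) and reads the `coefs_`, `intercepts_`, `n_layers_`, and `out_activation_` attributes.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

mlp = MLPClassifier(hidden_layer_sizes=(20,),
                    activation='logistic',
                    alpha=1,
                    max_iter=1000,
                    random_state=0).fit(X, y)

# One weight matrix per pair of adjacent layers: input->hidden, hidden->output
weight_shapes = [w.shape for w in mlp.coefs_]
# One bias vector per non-input layer
bias_shapes = [b.shape for b in mlp.intercepts_]

print(weight_shapes)        # [(4, 20), (20, 3)]
print(bias_shapes)          # [(20,), (3,)]
print(mlp.n_layers_)        # 3 (input, hidden, output)
print(mlp.out_activation_)  # softmax
```

The 4 inputs are the iris measurements, the 20 columns are the hidden neurons, and the 3 outputs correspond to the three species.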

## Fitting an MLP to the Digits Dataset

Let's now see how the MLP classifier performs on the digits dataset. Again there is only one hidden layer, this one with 50 neurons.

In [None]:
mlp_digits = MLPClassifier(hidden_layer_sizes=(50,),
                           activation='logistic',
                           alpha=1)
mlp_digits = mlp_digits.fit(digits_data_train, number_train)

In [None]:
mlp_digits.predict(digits_data_train[[0], :])

In [None]:
number_pred_train = mlp_digits.predict(digits_data_train)
accuracy_score(number_train, number_pred_train)

In [None]:
number_pred_test = mlp_digits.predict(digits_data_test)
accuracy_score(number_test, number_pred_test)

The classifier performs well on this kind of nonlinear problem.
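
A single accuracy number hides which digits get mistaken for which. As a possible follow-up, the sketch below builds a confusion matrix for a digits classifier with the same hidden layer, activation, and $\alpha$ as above. The fixed `random_state` values and `max_iter=1000` are assumptions added so the example is reproducible and self-contained.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(50,),
                    activation='logistic',
                    alpha=1,
                    max_iter=1000,
                    random_state=0)
mlp.fit(X_train, y_train)

# Rows are the true digits, columns are the predicted digits
cm = confusion_matrix(y_test, mlp.predict(X_test), labels=list(range(10)))
print(cm)
```

Off-diagonal entries show the misclassifications; for example, the entry in row 8, column 1 counts test images of an 8 that were predicted as a 1.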