# Classification example

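The main limitation of the perceptron is that it can only separate classes that are linearly separable.

A multilayer perceptron (MLP) overcomes this limitation: with at least one hidden layer and a nonlinear activation function, it can learn nonlinear decision boundaries and therefore solve problems that are not linearly separable.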
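
The difference can be checked on a small example. The following sketch is not part of the original notebook: it fits scikit-learn's `Perceptron` and an `MLPClassifier` to two concentric circles, a dataset that no straight line can separate. The dataset parameters, the `lbfgs` solver and the `*_demo` variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

# Two concentric circles: not linearly separable (illustrative parameters)
X_demo, y_demo = make_circles(n_samples=200, noise=0.05, factor=0.3,
                              random_state=0)

# Linear perceptron: can only draw a straight decision boundary
linear_demo = Perceptron(random_state=0).fit(X_demo, y_demo)

# MLP with one small hidden layer: can learn a closed, nonlinear boundary
mlp_demo = MLPClassifier(hidden_layer_sizes=(10,), solver='lbfgs',
                         max_iter=1000, random_state=0).fit(X_demo, y_demo)

# Accuracy on the training data itself, only to compare expressive power
linear_acc_demo = linear_demo.score(X_demo, y_demo)
mlp_acc_demo = mlp_demo.score(X_demo, y_demo)
print('Perceptron accuracy: %.2f' % linear_acc_demo)
print('MLP accuracy:        %.2f' % mlp_acc_demo)
```

The scores are computed on the training data on purpose: the point here is what each model is able to represent, not how well it generalizes.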

In this notebook you will learn how to apply the MLP to a typical [classification task](http://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification).

In [None]:
import numpy as np

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification

import matplotlib.pyplot as plt
from packages.plot import plot_contour, plot_ds
%matplotlib inline

## Load in the data

scikit-learn includes various [random sample generators](http://scikit-learn.org/stable/datasets/#sample-generators) that can be used to build artificial datasets of controlled size and complexity. Choose one of these datasets by running one of the three code cells below:

<table border="0">
<tr>
<th>Moons</th>
<th>Circles</th>
<th>Linearly separable classes</th>
</tr>
<tr><td>
<img src="img/ds_moons.png">
</td><td>
<img src="img/ds_circles.png">
</td><td>
<img src="img/ds_linear_separable.png">
</td></tr>
</table>

In [None]:
# Run this cell to use the "Moons" dataset
ds = make_moons(noise=0.3, random_state=0)

In [None]:
# Run this cell to use the "Circles" dataset
ds = make_circles(noise=0.2, factor=0.5, random_state=1)

In [None]:
# Run this cell to use the "Linearly separable classes" dataset
ds = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1)
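
Whichever cell you run, `ds` is a tuple `(X, y)` holding the feature matrix and the class labels. As a quick check, the sketch below inspects the output of the "Moons" generator; the `*_demo` names are illustrative and were chosen so that they do not overwrite `ds`.

```python
import numpy as np
from sklearn.datasets import make_moons

# Same call as in the "Moons" cell
X_demo, y_demo = make_moons(noise=0.3, random_state=0)

print(X_demo.shape)         # (n_samples, n_features)
print(y_demo.shape)         # one label per sample
print(np.unique(y_demo))    # the class labels
print(np.bincount(y_demo))  # number of samples per class
```

With the default `n_samples=100`, the generator returns 100 two-dimensional points split evenly between the two classes.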

## Cross-validation: evaluating network performance

Learning the parameters of the MLP and testing it on the same data is a methodological mistake: a model that simply repeated the labels of the samples it had just seen would get a perfect score but would fail to predict anything useful on unseen data. This situation is called [**overfitting**](https://en.wikipedia.org/wiki/Overfitting). To avoid it, it is common practice to hold out part of the available data as a test set `[X_test, y_test]`.

In scikit-learn a random split into training and test sets can be quickly computed with the [`train_test_split`](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split) helper function.

In [None]:
X, y = ds
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)
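
The effect of the split can be verified on its shapes. The sketch below is an illustration on the "Moons" data; the `random_state` and `stratify` arguments are optional additions (not used in the cell above) that make the split reproducible and keep the class proportions equal in both subsets.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_moons(noise=0.3, random_state=0)

# test_size=.4 keeps 40% of the samples for testing;
# random_state fixes the shuffle, stratify preserves the class proportions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=.4, random_state=0, stratify=y_demo)

print(X_tr.shape, X_te.shape)                # sizes of the two subsets
print(np.bincount(y_tr), np.bincount(y_te))  # samples per class in each subset
```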

## Data scaling

Standardizing the dataset is a common requirement for neural networks: they may train poorly if the individual features do not look roughly like standard normally distributed data, that is, Gaussian with zero mean and unit variance.

In scikit-learn, the preprocessing module provides a utility class [StandardScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) that computes the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set.

In [None]:
# Don't cheat - fit only on training data
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
# apply same transformation to test data
X_test = scaler.transform(X_test)
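
To see what the scaler does, the following sketch compares it with the explicit formula `(x - mean) / std` and prints the resulting statistics. It is an illustration on the "Moons" data with hypothetical variable names, not part of the original workflow.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_moons(noise=0.3, random_state=0)
X_tr, X_te, _, _ = train_test_split(X_demo, y_demo, test_size=.4,
                                    random_state=0)

scaler_demo = StandardScaler().fit(X_tr)  # statistics from training data only
X_tr_s = scaler_demo.transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

# Equivalent manual computation: (x - mean) / std, per feature
X_tr_manual = (X_tr - X_tr.mean(axis=0)) / X_tr.std(axis=0)

print(X_tr_s.mean(axis=0), X_tr_s.std(axis=0))  # ~0 and ~1 by construction
print(X_te_s.mean(axis=0), X_te_s.std(axis=0))  # only approximately 0 and 1
```

The test set is only approximately standardized because it is transformed with the training statistics, which is exactly what keeps information about the test data out of the training step.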

In [None]:
plot_ds(X_train, X_test, y_train, y_test)

## Build the model

Create an [MLP object](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html) with the following arguments:
* [stochastic gradient descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent) solver (standard technique in [backpropagation](https://en.wikipedia.org/wiki/Backpropagation))
* one hidden layer with 50 neurons
* at most 4000 iterations (for the stochastic solvers, `max_iter` counts epochs, i.e. passes over the training data)

The rest of the arguments are set to their default values (see documentation).

In [None]:
net = MLPClassifier(solver='sgd',
                    hidden_layer_sizes=(50,),
                    max_iter=4000)
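
The values of all remaining arguments can be listed with `get_params()` instead of being looked up in the documentation. The sketch below prints a hand-picked subset; which parameters are shown is an arbitrary choice for illustration, and `net_demo` is a hypothetical name used to avoid overwriting `net`.

```python
from sklearn.neural_network import MLPClassifier

net_demo = MLPClassifier(solver='sgd', hidden_layer_sizes=(50,), max_iter=4000)

# get_params() returns every constructor argument, explicit or default
params_demo = net_demo.get_params()
for name in ('solver', 'hidden_layer_sizes', 'max_iter',
             'activation', 'learning_rate_init', 'alpha', 'tol'):
    print('%-20s %r' % (name, params_demo[name]))
```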

## Train the network

The `fit` method iterates automatically until training converges or the maximum number of iterations is reached, so you only need to run the following cell once.

In [None]:
net.fit(X_train, y_train)
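
When the iteration limit is reached before convergence, scikit-learn emits a `ConvergenceWarning`. The sketch below provokes this on purpose with a very small `max_iter` and records the warning; the value `max_iter=5` and the `*_demo` names are assumptions for illustration only.

```python
import warnings
from sklearn.datasets import make_moons
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = make_moons(noise=0.3, random_state=0)

# Deliberately tiny iteration budget (illustrative value)
net_demo = MLPClassifier(solver='sgd', hidden_layer_sizes=(50,),
                         max_iter=5, random_state=0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    net_demo.fit(X_demo, y_demo)

hit_limit = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print('Iterations run:', net_demo.n_iter_)
print('Stopped at max_iter without converging:', hit_limit)
```

If no such warning appears with the notebook's own settings, training stopped because the loss no longer improved by more than the tolerance `tol`, not because the iteration budget ran out.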

## Plot decision boundary

Plot the decision boundary as a contour plot. To do so, we assign each point in the plane a color that reflects its predicted probability of belonging to one class or the other.

In [None]:
plot_contour(net, X_train, X_test, y_train, y_test)
plot_ds(X_train, X_test, y_train, y_test)
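
The `plot_contour` helper comes from the notebook's own `packages.plot` module, whose implementation is not shown here. As a rough idea of how such a plot can be produced, the sketch below evaluates `predict_proba` on a regular grid and draws the result with `contourf`. The function `probability_grid`, the grid resolution and the color map are hypothetical choices, not the helper's actual code.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

def probability_grid(model, X, steps=100, margin=0.5):
    """Evaluate P(class 1) on a regular grid covering the data (hypothetical helper)."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                         np.linspace(y_min, y_max, steps))
    grid = np.c_[xx.ravel(), yy.ravel()]
    # Column 1 of predict_proba is the probability of the second class
    zz = model.predict_proba(grid)[:, 1].reshape(xx.shape)
    return xx, yy, zz

X_demo, y_demo = make_moons(noise=0.3, random_state=0)
net_demo = MLPClassifier(solver='sgd', hidden_layer_sizes=(50,),
                         max_iter=4000, random_state=0).fit(X_demo, y_demo)

xx, yy, zz = probability_grid(net_demo, X_demo)
contour = plt.contourf(xx, yy, zz, levels=20, cmap='RdBu', alpha=0.8)
plt.colorbar(contour, label='P(class 1)')
plt.scatter(X_demo[:, 0], X_demo[:, 1], c=y_demo, cmap='RdBu', edgecolors='k')
```

The decision boundary is the contour line where the predicted probability equals 0.5.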

## Analysis of the network

Percentage of correctly classified test samples:

In [None]:
print('Score: %.2f%%' % (net.score(X_test, y_test) * 100))
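
For a classifier, `score` returns the mean accuracy, i.e. the fraction of test samples whose predicted label matches the true one. The sketch below recomputes it by hand and adds a confusion matrix for a per-class view. It retrains a small network on the "Moons" data under illustrative settings (`random_state=0`, hypothetical `*_demo` names) so that it can run on its own.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = make_moons(noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=.4,
                                          random_state=0)
net_demo = MLPClassifier(solver='sgd', hidden_layer_sizes=(50,),
                         max_iter=4000, random_state=0).fit(X_tr, y_tr)

y_pred = net_demo.predict(X_te)
acc_manual = np.mean(y_pred == y_te)    # fraction of correct predictions
acc_score = net_demo.score(X_te, y_te)  # same quantity via score()

print('Accuracy: %.2f%%' % (acc_manual * 100))
# Rows: true class, columns: predicted class
cm_demo = confusion_matrix(y_te, y_pred)
print(cm_demo)
```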

Number of iterations during training:

In [None]:
net.n_iter_

Loss curve (currently, [MLPClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier) supports only the [cross-entropy loss function](https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_error_function_and_logistic_regression)):

In [None]:
plt.plot(net.loss_curve_);
plt.xlabel('Iterations');
plt.ylabel('Loss');
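
For two classes, the cross-entropy is the mean of `-(y*log(p) + (1-y)*log(1-p))`, where `p` is the predicted probability of class 1. The sketch below computes it by hand on the "Moons" data and compares it with `sklearn.metrics.log_loss`. The settings and `*_demo` names are illustrative. The last value of `loss_curve_` is printed for reference only: it is expected to be close but not identical, since to our understanding the training loss also contains the L2 penalty controlled by `alpha` and is recorded while the weights are still being updated.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics import log_loss
from sklearn.neural_network import MLPClassifier

X_demo, y_demo = make_moons(noise=0.3, random_state=0)
net_demo = MLPClassifier(solver='sgd', hidden_layer_sizes=(50,),
                         max_iter=4000, random_state=0).fit(X_demo, y_demo)

# Predicted probability of class 1, clipped to avoid log(0)
proba_demo = net_demo.predict_proba(X_demo)
p = np.clip(proba_demo[:, 1], 1e-15, 1 - 1e-15)

# Binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))
ce_manual = -np.mean(y_demo * np.log(p) + (1 - y_demo) * np.log(1 - p))
ce_sklearn = log_loss(y_demo, proba_demo)

print('Manual cross-entropy:   %.4f' % ce_manual)
print('sklearn log_loss:       %.4f' % ce_sklearn)
print('Last loss_curve_ value: %.4f' % net_demo.loss_curve_[-1])
```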