# Multilayer Perceptrons

You can read a good introduction to Multilayer Perceptrons (MLPs) on [scikit-learn's page](http://scikit-learn.org/stable/modules/neural_networks_supervised.html).

MLPs can be applied to the same problems as regression models, but they can also capture non-linear relationships, and they do so without hand-crafted features (i.e., without requiring expert knowledge of the domain).

An MLP can model non-linear relationships in data by separating the input and output layers with one or more intermediate ("hidden") layers. Each hidden layer computes intermediate values of the relation, applies a non-linear activation function to them, and provides the results as input to the next layer.
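To make the role of a hidden layer concrete, here is a minimal sketch of a tiny network that computes XOR, a relationship that no single linear model can represent. The weights are hand-picked for illustration (they are an assumption, not something a library produced), and the names `X_xor`, `W_hidden`, `b_hidden`, and `W_out` are hypothetical.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: negative values become 0.
    return np.maximum(z, 0)

# The four XOR inputs, one row per example.
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (hypothetical) weights: 2 inputs -> 2 hidden nodes -> 1 output.
W_hidden = np.array([[1, 1],
                     [1, 1]])
b_hidden = np.array([0, -1])
W_out = np.array([1, -2])

hidden = relu(X_xor @ W_hidden + b_hidden)  # intermediate values
output = hidden @ W_out                     # combine the hidden values

print(output)
```

The output is 1 only when exactly one input is 1. Without the `relu` step the two layers would collapse into a single linear map, which is why the non-linearity in the hidden layer matters.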

Each layer is a set of nodes. The number of nodes in a layer depends on the problem the network is meant to solve. Below I am including a picture from the scikit-learn documentation linked above.

<img src="http://scikit-learn.org/stable/_images/multilayerperceptron_network.png" width=50%>A Multilayer Perceptron</img>

The **input layer** represents the features present in each input to the network. For example, if the network is designed to recognize images, the input layer will usually have one node for each pixel of the image.

An MLP includes one or more **hidden layers**, which are used to separate calculations and generate the intermediate features used in the network.

Finally, the MLP will have a set of nodes that form the **output layer**. The output may provide a regression output, but that is unusual. Most of the time an MLP will be used to predict a class of responses. In this case, each output node will represent a specific class. For example, if the network was built to recognize digits, then there would be 10 nodes in the output layer, one for each digit.

Between each pair of adjacent layers is a set of **weights**, each of which connects a single node in one layer to a node in the next layer.
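The layers and weights described above can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: the sizes (784 inputs, 100 hidden nodes, 10 outputs), the random untrained weights, the ReLU hidden activation, and the softmax output are all choices made for this sketch, and `forward` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 784, 100, 10  # sizes chosen for illustration

# One weight matrix per pair of adjacent layers, plus one bias per node.
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(x):
    """Propagate a batch of inputs through both weight layers."""
    hidden = np.maximum(x @ W1 + b1, 0)                  # ReLU hidden activations
    scores = hidden @ W2 + b2                            # one score per class
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)          # softmax probabilities

batch = rng.random((5, n_inputs))  # five fake "images" of 784 pixel values each
probs = forward(batch)
predicted = probs.argmax(axis=1)   # index of the most probable output node

print(probs.shape, predicted.shape)
```

Each row of `probs` sums to 1, and the predicted class is simply the output node with the largest value. Training a network means adjusting `W1`, `b1`, `W2`, and `b2` so that those predictions match the labels.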

In [2]:
# MNIST Logistic Regression
import idx2numpy

X = idx2numpy.convert_from_file('train-images-idx3-ubyte')
y = idx2numpy.convert_from_file('train-labels-idx1-ubyte')
X.shape, y.shape

((60000, 28, 28), (60000,))

In [3]:
# Reduce dimensions on X. Keep the first dim but combine the second and third
X = X.reshape(X.shape[0],-1)
X.shape

(60000, 784)
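As a small check of what the reshape does, the sketch below flattens three tiny fake "images" (2x4 pixels, standing in for the 28x28 MNIST images) and confirms where each pixel lands. The array `images` and the index values are made up for illustration.

```python
import numpy as np

# Three tiny fake "images" of 2x4 pixels, standing in for 28x28 MNIST images.
images = np.arange(3 * 2 * 4).reshape(3, 2, 4)

# Keep the first dimension; -1 lets NumPy infer the rest (2 * 4 = 8).
flat = images.reshape(images.shape[0], -1)
print(flat.shape)

# Pixel (row r, column c) of image i ends up at column r * width + c.
i, r, c = 1, 1, 2
width = images.shape[2]
print(images[i, r, c] == flat[i, r * width + c])
```

No pixel values change; each image's rows are simply laid end to end, so the model sees one feature per pixel.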

In [29]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

model = LogisticRegression(solver='lbfgs').fit(X, y)
model

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='lbfgs', tol=0.0001,
          verbose=0, warm_start=False)

In [4]:
X_test = idx2numpy.convert_from_file('t10k-images-idx3-ubyte')
y_test = idx2numpy.convert_from_file('t10k-labels-idx1-ubyte')
X_test.shape, y_test.shape

((10000, 28, 28), (10000,))

In [5]:
# Reduce dimensions on X_test. Keep the first dim but combine the second and third
X_test = X_test.reshape(X_test.shape[0],-1)
X_test.shape

(10000, 784)

In [32]:
y_predicted = model.predict(X_test)


# print confusion matrix
confusion = confusion_matrix(y_test, y_predicted)
print('Confusion matrix')
print(confusion)

Confusion matrix
[[ 957    0    0    4    0    3    6    2    6    2]
 [   0 1116    3    1    0    1    4    1    8    1]
 [   8   12  906   18    9    5   10   11   50    3]
 [   3    0   19  916    2   23    5   11   24    7]
 [   1    2    5    3  910    0   11    2   10   38]
 [  11    2    1   40   10  754   16    8   39   11]
 [   7    3    7    2    4   17  909    1    8    0]
 [   3    6   24    4    7    1    1  946    5   31]
 [   9   15    7   22   11   26    7   12  854   11]
 [   9    6    2   13   30    4    0   25   16  904]]
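The confusion matrix above already contains everything needed for overall accuracy and per-class precision and recall. The sketch below copies the printed matrix by hand into a NumPy array (the name `cm` is my own) and relies on the scikit-learn convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Confusion matrix printed above: rows = true digit, columns = predicted digit.
cm = np.array([
    [957,    0,   0,   4,   0,   3,   6,   2,   6,   2],
    [  0, 1116,   3,   1,   0,   1,   4,   1,   8,   1],
    [  8,   12, 906,  18,   9,   5,  10,  11,  50,   3],
    [  3,    0,  19, 916,   2,  23,   5,  11,  24,   7],
    [  1,    2,   5,   3, 910,   0,  11,   2,  10,  38],
    [ 11,    2,   1,  40,  10, 754,  16,   8,  39,  11],
    [  7,    3,   7,   2,   4,  17, 909,   1,   8,   0],
    [  3,    6,  24,   4,   7,   1,   1, 946,   5,  31],
    [  9,   15,   7,  22,  11,  26,   7,  12, 854,  11],
    [  9,    6,   2,  13,  30,   4,   0,  25,  16, 904],
])

correct = np.diag(cm)                 # correctly classified count per digit
accuracy = correct.sum() / cm.sum()   # share of all test images classified correctly
recall = correct / cm.sum(axis=1)     # divide by the number of true examples
precision = correct / cm.sum(axis=0)  # divide by the number of predictions

print(accuracy)
print(np.round(recall, 2))
print(np.round(precision, 2))
```

Reading along a row shows where a digit's errors go. For example, the largest off-diagonal entry in row 2 is the 50 images of a 2 that were predicted as 8.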


In [33]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_predicted))

             precision    recall  f1-score   support

          0       0.95      0.98      0.96       980
          1       0.96      0.98      0.97      1135
          2       0.93      0.88      0.90      1032
          3       0.90      0.91      0.90      1010
          4       0.93      0.93      0.93       982
          5       0.90      0.85      0.87       892
          6       0.94      0.95      0.94       958
          7       0.93      0.92      0.92      1028
          8       0.84      0.88      0.86       974
          9       0.90      0.90      0.90      1009

avg / total       0.92      0.92      0.92     10000



In [47]:
# Now with MLP
from sklearn.neural_network import MLPClassifier
# (100,) is a one-element tuple: a single hidden layer with 100 nodes
mlp = MLPClassifier(solver='lbfgs', alpha=1e-5,
                     hidden_layer_sizes=(100,), random_state=1)
mlp.fit(X,y)

y_predicted = mlp.predict(X_test)


# print confusion matrix
confusion = confusion_matrix(y_test, y_predicted)
print('Confusion matrix')
print(confusion)
print(classification_report(y_test, y_predicted))

Confusion matrix
[[ 945    1    6    1    0    5   10    6    6    0]
 [   0 1109    3    3    0    1    2    1   15    1]
 [   4    3  977   16    5    2    6   10    6    3]
 [   1    2   13  943    0   20    1    3   19    8]
 [   0    1   10    1  923    3    6    2    6   30]
 [   5    1    0   24    1  822   13    2   18    6]
 [   7    4    5    0    3    9  922    2    6    0]
 [   2    4   20    7    5    4    0  955    5   26]
 [   5    0    6    9    5   10    4    4  922    9]
 [   3    5    4    7   29    7    0    7   14  933]]
             precision    recall  f1-score   support

          0       0.97      0.96      0.97       980
          1       0.98      0.98      0.98      1135
          2       0.94      0.95      0.94      1032
          3       0.93      0.93      0.93      1010
          4       0.95      0.94      0.95       982
          5       0.93      0.92      0.93       892
          6       0.96      0.96      0.96       958
          7       0.96     

In [48]:
# Another model with a smaller hidden layer (50 nodes)
from sklearn.neural_network import MLPClassifier
mlp2 = MLPClassifier(solver='lbfgs', alpha=1e-5,
                     hidden_layer_sizes=(50,), random_state=1)
mlp2.fit(X,y)

y_predicted = mlp2.predict(X_test)
print(classification_report(y_test, y_predicted))

             precision    recall  f1-score   support

          0       0.96      0.92      0.94       980
          1       0.99      0.97      0.98      1135
          2       0.81      0.89      0.85      1032
          3       0.86      0.81      0.84      1010
          4       0.93      0.78      0.85       982
          5       0.84      0.75      0.79       892
          6       0.93      0.93      0.93       958
          7       0.95      0.88      0.91      1028
          8       0.76      0.87      0.81       974
          9       0.77      0.92      0.84      1009

avg / total       0.88      0.88      0.88     10000



In [49]:
# One more model with a larger hidden layer (200 nodes)
from sklearn.neural_network import MLPClassifier
mlp3 = MLPClassifier(solver='lbfgs', alpha=1e-5,
                     hidden_layer_sizes=(200,), random_state=1)
mlp3.fit(X,y)
y_predicted = mlp3.predict(X_test)
print(classification_report(y_test, y_predicted))

             precision    recall  f1-score   support

          0       0.98      0.98      0.98       980
          1       0.99      0.98      0.99      1135
          2       0.96      0.96      0.96      1032
          3       0.94      0.96      0.95      1010
          4       0.97      0.96      0.97       982
          5       0.95      0.95      0.95       892
          6       0.97      0.97      0.97       958
          7       0.96      0.97      0.96      1028
          8       0.95      0.94      0.95       974
          9       0.96      0.95      0.96      1009

avg / total       0.96      0.96      0.96     10000



In [50]:
# One more model with an even larger hidden layer (400 nodes)
from sklearn.neural_network import MLPClassifier
mlp4 = MLPClassifier(solver='lbfgs', alpha=1e-5,
                     hidden_layer_sizes=(400,), random_state=1)
mlp4.fit(X,y)
y_predicted = mlp4.predict(X_test)
print(classification_report(y_test, y_predicted))

             precision    recall  f1-score   support

          0       0.98      0.98      0.98       980
          1       0.99      0.99      0.99      1135
          2       0.97      0.96      0.97      1032
          3       0.96      0.97      0.96      1010
          4       0.97      0.97      0.97       982
          5       0.96      0.95      0.95       892
          6       0.97      0.97      0.97       958
          7       0.97      0.96      0.97      1028
          8       0.96      0.95      0.96       974
          9       0.95      0.96      0.96      1009

avg / total       0.97      0.97      0.97     10000



In [51]:
# Now a deeper fully connected model with two hidden layers (100 and 50 nodes)
from sklearn.neural_network import MLPClassifier
mlpdeep = MLPClassifier(solver='lbfgs', alpha=1e-5,
                     hidden_layer_sizes=(100, 50), random_state=1)
mlpdeep.fit(X,y)
y_predicted = mlpdeep.predict(X_test)
print(classification_report(y_test, y_predicted))

             precision    recall  f1-score   support

          0       0.97      0.98      0.98       980
          1       0.99      0.99      0.99      1135
          2       0.96      0.96      0.96      1032
          3       0.96      0.96      0.96      1010
          4       0.97      0.96      0.96       982
          5       0.95      0.96      0.96       892
          6       0.98      0.97      0.97       958
          7       0.97      0.97      0.97      1028
          8       0.96      0.96      0.96       974
          9       0.95      0.94      0.95      1009

avg / total       0.97      0.97      0.97     10000



In [52]:
print(confusion_matrix(y_test, y_predicted))

[[ 963    0    3    0    1    5    1    5    2    0]
 [   0 1122    5    1    0    1    1    0    4    1]
 [   7    1  989   13    3    2    3    7    7    0]
 [   0    3   10  972    1    9    0    3    9    3]
 [   1    0    5    0  943    1    9    1    2   20]
 [   2    0    1   12    0  859    5    0    6    7]
 [   8    2    0    0    7    6  931    0    4    0]
 [   3    3   10    1    0    0    0  993    5   13]
 [   4    1    7    5    3    8    2    5  937    2]
 [   2    3    0    9   17   11    1   11    4  951]]
