### Most of the information presented in this notebook originated from the resource linked below, although it has been modified
[taken from here](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)

More useful links:
+ [Scikit Learn Get Started](https://scikit-learn.org/stable/getting_started.html)
+ [Scikit Learn Neural Networks](https://scikit-learn.org/stable/modules/neural_networks_supervised.html)

In [1]:
# kind of standard import
import numpy as np

In [4]:
from sklearn.neural_network import MLPClassifier  # the neural network 
from sklearn.datasets import make_classification # easy generation of synthetic input data
from sklearn.model_selection import train_test_split # to conveniently split the data into training and test sets
X, y = make_classification(n_samples=100, random_state=1) # 100 points, default: 20 features, 2 classes

# the use of X for the input feature data (2D array) and y (1D) for the target values (prediction goal) is convention
# we fix the random_state to make multiple runs reproducible
# we use stratify=y to have the same class ratios in the training and in the testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)

print(X_test.shape)
print(y_test.shape)
# the call to fit with the provided training data is the standard way to train a model in sklearn
clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)

# prints out the probability for each of the classes
# here only the first test instance is used [:1] (slicing)
print(clf.predict_proba(X_test[:1]))

# here we predict the class of the first 5 test instances
print(clf.predict(X_test[:5, :]))

# the performance on the complete test set
print(clf.score(X_test, y_test))


(25, 20)
(25,)
[[0.03838405 0.96161595]]
[1 0 1 0 1]
0.88
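
The relation between predict_proba, predict, and score used above can be checked directly. The following is a minimal sketch that rebuilds the same data and model so that it runs on its own; the variable names proba and labels_from_proba are chosen here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# same data, split, and model settings as in the cell above
X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)
clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)

# one row per test instance, one column per class
proba = clf.predict_proba(X_test)

# pick the most probable column and map it back to a class label
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

# each row of probabilities sums to 1
print(np.allclose(proba.sum(axis=1), 1.0))
# the most probable class agrees with the output of predict
print(np.array_equal(labels_from_proba, clf.predict(X_test)))
# score is the fraction of correctly predicted test labels (accuracy)
accuracy = np.mean(clf.predict(X_test) == y_test)
print(np.isclose(accuracy, clf.score(X_test, y_test)))
```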


If you have no internet connection or do not want to fire up your browser, there is an easy way to access the API documentation: the built-in help function.

In [5]:
help(MLPClassifier)

Help on class MLPClassifier in module sklearn.neural_network._multilayer_perceptron:

class MLPClassifier(sklearn.base.ClassifierMixin, BaseMultilayerPerceptron)
 |  MLPClassifier(hidden_layer_sizes=(100,), activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)
 |  
 |  Multi-layer Perceptron classifier.
 |  
 |  This model optimizes the log-loss function using LBFGS or stochastic
 |  gradient descent.
 |  
 |  .. versionadded:: 0.18
 |  
 |  Parameters
 |  ----------
 |  hidden_layer_sizes : tuple, length = n_layers - 2, default=(100,)
 |      The ith element represents the number of neurons in the ith
 |      hidden layer.
 |  
 |  activation : {'identity

In [None]:
help(make_classification)

In [None]:
help(train_test_split)
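
Besides help, the hyperparameters of an estimator can also be inspected programmatically. The sketch below uses get_params on an unfitted MLPClassifier with default settings; which parameters are printed is an arbitrary choice made for this illustration.

```python
from sklearn.neural_network import MLPClassifier

# an unfitted estimator with default settings
params = MLPClassifier().get_params()

# print a few hyperparameters together with their default values
for name in ("hidden_layer_sizes", "activation", "solver",
             "max_iter", "early_stopping"):
    print(name, "=", params[name])
```

The values should match the defaults listed in the signature printed by help(MLPClassifier).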

The following example demonstrates overfitting. With max_iter=1 and warm_start=True, every call to fit continues from the current weights and trains for exactly one more epoch, after which the performance on the training and on the testing set is determined. (scikit-learn emits a ConvergenceWarning on each call because the optimizer is stopped after a single iteration; this is expected here.) We can observe that up to a certain number of epochs the score on both the training and the testing set improves. In this phase the model is generalizing (i.e. learning) well. After some point the performance on the training set continues to improve, whereas the performance on the testing set gets worse again. This is the point where overfitting starts: the model no longer generalizes but picks up information that is specific to the training set.

In [None]:
clf = MLPClassifier(random_state=1, max_iter=1, solver='adam', warm_start=True)
for i in range(300):
    clf.fit(X_train, y_train)
    print("epoch: " + str(i))
    print("Score on training set: " + str(clf.score(X_train, y_train)))
    print("Score on testing set: " + str(clf.score(X_test, y_test)))
    print("Training loss: " + str(clf.loss_))
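
Scanning 300 blocks of printed output by eye is tedious. The following sketch repeats the same training loop but collects the scores in lists, so that the epoch with the best test score can be located. The names train_scores, test_scores, and best_epoch are introduced here for illustration, and the ConvergenceWarning is suppressed on the assumption that it is not of interest in this experiment.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# same data and split as before
X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)

clf = MLPClassifier(random_state=1, max_iter=1, solver='adam', warm_start=True)
train_scores, test_scores = [], []

with warnings.catch_warnings():
    # each fit call stops after one epoch, which triggers a ConvergenceWarning
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for epoch in range(300):
        clf.fit(X_train, y_train)  # one further epoch thanks to warm_start
        train_scores.append(clf.score(X_train, y_train))
        test_scores.append(clf.score(X_test, y_test))

# first epoch (0-based) at which the test score reaches its maximum
best_epoch = int(np.argmax(test_scores))
print("best test score:", test_scores[best_epoch], "at epoch", best_epoch)
print("final train score:", train_scores[-1])
print("final test score:", test_scores[-1])
```

If the final test score is lower than the best one while the training score has kept rising, that gap is the overfitting described above. Note that selecting an epoch based on the test set is only done here for demonstration; in practice a separate validation set should be used for that decision.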


The coefs_ attribute gives access to the weights of a trained model. Printing the raw weight matrices shows the final values, but it is not a convenient way to read off the network architecture.

In [None]:
print(clf.coefs_)
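
The architecture is easier to see from the shapes of the weight matrices than from their values. The sketch below fits a fresh model on the same data so that it runs on its own; weight_shapes, bias_shapes, and layer_sizes are names introduced here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# same data, split, and model settings as in the first example
X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)
clf = MLPClassifier(random_state=1, max_iter=300).fit(X_train, y_train)

# coefs_[i] connects layer i to layer i + 1: shape (n_units_in, n_units_out)
weight_shapes = [w.shape for w in clf.coefs_]
# intercepts_[i] holds one bias value per unit of layer i + 1
bias_shapes = [b.shape for b in clf.intercepts_]
# input size followed by the size of every subsequent layer
layer_sizes = [clf.coefs_[0].shape[0]] + [w.shape[1] for w in clf.coefs_]

print("weight shapes:", weight_shapes)
print("bias shapes:", bias_shapes)
print("layer sizes:", layer_sizes)
print("number of layers:", clf.n_layers_)
```

With the 20 input features and the default hidden_layer_sizes=(100,), this should show one hidden layer of 100 units between the input and the output layer.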

The following example demonstrates the early_stopping feature, which is meant to stop training before overfitting sets in. Unfortunately, the result is still far from optimal in this case. Judging from the output obtained with verbose=True, the score on the rather small validation set (10% of the 75 training instances, so fewer than 10 instances) stops improving quickly, and this triggers the termination. You can explore this by, e.g., setting n_iter_no_change to 50.

In [None]:
clf = MLPClassifier(random_state=1, max_iter=300
                    ,early_stopping=True
                    #,tol=1e-10
                    ,verbose=True
                    #,n_iter_no_change=50
                   ).fit(X_train, y_train)
print(clf.predict_proba(X_test[:1]))

print(clf.predict(X_test[:5, :]))

print(clf.score(X_test, y_test))
print(clf.n_iter_)
print(clf.loss_curve_) # see how the loss on the training set decreases over the epochs
print(clf.best_validation_score_)
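
To explore the effect of n_iter_no_change suggested above, the two settings can be compared side by side. This is a minimal sketch; the loop variable patience and the results dictionary are names introduced here for illustration, and the ConvergenceWarning is suppressed in case training reaches max_iter.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# same data and split as before
X, y = make_classification(n_samples=100, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=1)

results = {}
for patience in (10, 50):  # 10 is the default value of n_iter_no_change
    clf = MLPClassifier(random_state=1, max_iter=300,
                        early_stopping=True, n_iter_no_change=patience)
    with warnings.catch_warnings():
        # silence the warning that appears if max_iter is reached
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        clf.fit(X_train, y_train)
    results[patience] = {
        "n_iter": clf.n_iter_,  # number of epochs actually run
        "best_validation_score": clf.best_validation_score_,
        "test_score": clf.score(X_test, y_test),
    }

for patience, res in results.items():
    print("n_iter_no_change =", patience, "->", res)
```

A larger n_iter_no_change makes the stopping criterion more patient, so training runs at least as long as with the default; whether that improves the test score here has to be read from the printed results.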