# Multilayer Perceptrons (feedforward NN) examples in scikit-learn
"A multilayer perceptron (MLP) is a fully connected class of feedforward artificial neural network (ANN). The term MLP is used ambiguously, sometimes loosely to mean any feedforward ANN, sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation). An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activation distinguish MLP from a linear perceptron. It can distinguish data that is not linearly separable."
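The last two sentences of the quote can be made concrete with a minimal sketch in NumPy: a network of threshold (step-activation) perceptrons with one hidden layer that computes XNOR, a function no single linear perceptron can represent. The weights below are hypothetical, hand-picked for illustration rather than learned. One hidden unit computes AND, the other computes NOR, and the output unit ORs them, which yields XNOR.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

# Hypothetical hand-picked weights (not learned), for illustration only.
# Column 0 -> hidden unit computing AND(x1, x2); column 1 -> NOR(x1, x2).
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])   # shape (2 inputs, 2 hidden units)
b_hidden = np.array([-1.5, 0.5])

# Output unit computes OR(h1, h2), which equals XNOR(x1, x2).
w_out = np.array([1.0, 1.0])
b_out = -0.5

def xnor_mlp(X):
    h = step(X @ W_hidden + b_hidden)   # hidden layer
    return step(h @ w_out + b_out)      # output layer

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xnor_mlp(X))  # → [1 0 0 1]
```

Neither hidden unit alone separates the XNOR classes; the nonlinearity between layers is what lets the output unit combine two linear boundaries.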

## XNOR example

In [2]:
# Regarding solvers:
# The `solver` hyperparameter selects the algorithm used for weight optimization.
# 'lbfgs' is an optimizer in the family of quasi-Newton methods.
# The default solver, 'adam', works well on relatively large datasets (thousands of training samples or more)
# in terms of both training time and validation score. For small datasets, however, 'lbfgs' can converge faster and perform better.
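A minimal sketch for exploring the solver trade-off, assuming a 500-sample subset of the digits dataset (used later in this notebook) stands in for a "small dataset". It records training accuracy and the number of iterations each solver ran (`n_iter_`). The exact numbers depend on the scikit-learn version and `random_state`, so treat this as a way to explore the trade-off, not as a benchmark.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Assumption: a 500-sample subset stands in for a "small dataset".
X, y = load_digits(return_X_y=True)
X_small, y_small = X[:500], y[:500]

results = {}
for solver in ("lbfgs", "adam"):
    mlp = MLPClassifier(solver=solver, hidden_layer_sizes=(50,),
                        max_iter=200, random_state=0)
    with warnings.catch_warnings():
        # Either solver may stop at max_iter before converging; ignore that here.
        warnings.simplefilter("ignore", ConvergenceWarning)
        mlp.fit(X_small, y_small)
    # Training accuracy and iteration count for this solver
    results[solver] = (mlp.score(X_small, y_small), mlp.n_iter_)

for solver, (acc, n_iter) in results.items():
    print(f"{solver}: training accuracy={acc:.3f}, iterations={n_iter}")
```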

In [None]:
# Regarding hidden-layer size:
# With fewer than 5 hidden units, the optimizer can more easily get stuck in a poor local minimum and fail to fit XNOR.
# A hidden layer of 5 units is therefore used here as a practical minimum; more units make a good fit more likely,
# but no size guarantees that training reaches the global minimum of the cost function.
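The claim about small hidden layers can be checked empirically. A minimal sketch, assuming 20 random seeds and a handful of hidden-layer sizes chosen for illustration: for each size it counts how many seeds produce a network that classifies all four XNOR points correctly. The counts depend on the scikit-learn version, so none are shown here.

```python
import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
y = [1, 0, 0, 1]  # XNOR

n_seeds = 20  # assumption: number of random initializations to try
solved = {}
for n_units in (1, 2, 3, 5, 10):
    count = 0
    for seed in range(n_seeds):
        mlp = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(n_units,),
                            random_state=seed)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", ConvergenceWarning)
            mlp.fit(X, y)
        # A seed counts as a success only if all four points are correct.
        if mlp.score(X, y) == 1.0:
            count += 1
    solved[n_units] = count

for n_units, count in solved.items():
    print(f"{n_units:>2} hidden units: {count}/{n_seeds} seeds fit XNOR")
```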

In [9]:
from sklearn.neural_network import MLPClassifier

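# XNOR truth table: the label is 1 when both inputs are equal
X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
y = [1, 0, 0, 1]

# One hidden layer with 5 units, trained with the quasi-Newton 'lbfgs' solver
mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(5,))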

mlp.fit(X, y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(5,), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='lbfgs',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

In [10]:
predictions = mlp.predict([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print(predictions)

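[1 0 1 0]

This run misclassifies two of the four points (`[1., 0.]` and `[1., 1.]`); the targets are `[1 0 0 1]`. With `random_state=None`, the weights are initialized differently on every fit, so rerunning the cell can give different predictions. Passing a fixed `random_state` makes the result reproducible.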
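A minimal sketch of making this cell reproducible by passing a fixed `random_state` (the seed value 0 is an arbitrary assumption; `fit_predict` is a hypothetical helper name). Two fits with the same seed give identical predictions; whether those predictions match `y` still depends on which seed is chosen.

```python
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]]
y = [1, 0, 0, 1]  # XNOR

def fit_predict(seed):
    """Hypothetical helper: fit a fresh 5-unit MLP with a fixed seed, predict on X."""
    mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(5,), random_state=seed)
    mlp.fit(X, y)
    return mlp.predict(X)

pred_a = fit_predict(0)
pred_b = fit_predict(0)
print(pred_a, pred_b)
print("identical:", (pred_a == pred_b).all())
print("matches y:", (pred_a == y).all())
```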


## Digits dataset

In [11]:
from sklearn.model_selection import cross_val_score
from sklearn import datasets

digits = datasets.load_digits()
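Before training, it helps to know what `load_digits` returns. A quick sketch that inspects the arrays: each sample is an 8×8 grayscale image flattened into 64 features with pixel intensities from 0 to 16, and there are 10 classes (the digits 0 to 9).

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

print(digits.data.shape)                     # → (1797, 64)
print(digits.images.shape)                   # → (1797, 8, 8)
print(np.unique(digits.target))              # → [0 1 2 3 4 5 6 7 8 9]
print(digits.data.min(), digits.data.max())  # → 0.0 16.0
```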

In [14]:
mlp_dig = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(50,20))
# hidden_layer_sizes=(50, 20): two hidden layers, with 50 units in the first and 20 in the second.
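The layer structure can be confirmed from the fitted weight matrices. A minimal sketch, assuming a short `max_iter=20` and `random_state=0` just to obtain a fitted model quickly (accuracy is not the point here): `coefs_` holds one weight matrix per layer transition, from the 64 input features through the 50- and 20-unit hidden layers to the 10 output classes.

```python
import warnings

from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()
mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(50, 20),
                    max_iter=20, random_state=0)
with warnings.catch_warnings():
    # max_iter=20 is deliberately too small to converge
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp.fit(digits.data, digits.target)

print([w.shape for w in mlp.coefs_])       # → [(64, 50), (50, 20), (20, 10)]
print([b.shape for b in mlp.intercepts_])  # → [(50,), (20,), (10,)]
print(mlp.n_layers_)                       # → 4 (input + 2 hidden + output)
```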

In [15]:
# 5-fold cross-validation; for a classifier the default score is accuracy
scores = cross_val_score(mlp_dig, digits.data, digits.target, cv=5)
print(scores.mean())

0.9271200866604767
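`cross_val_score` returns one score per fold, so the spread across folds is available alongside the mean. A minimal sketch, assuming a fixed `random_state=0` (not in the original cell) so reruns give the same scores; with an integer `cv` and a classifier, scikit-learn uses stratified K-fold splits.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()
mlp_dig = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(50, 20), random_state=0)

# One accuracy value per fold (5 folds, stratified by class)
scores = cross_val_score(mlp_dig, digits.data, digits.target, cv=5)
print(scores)
print(f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```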


### Score with scaled data

In [16]:
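from sklearn.preprocessing import StandardScaler
# Standardize each pixel feature to zero mean and unit variance.
# Note: fitting the scaler on the full dataset before cross-validation lets
# statistics from the validation folds leak into training.
scaler = StandardScaler()
scaler.fit(digits.data)
scaled_digits = scaler.transform(digits.data)
scores = cross_val_score(mlp_dig, scaled_digits, digits.target, cv=5)

print(scores.mean())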

0.9293423088826989
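Fitting the scaler on all of `digits.data` before cross-validation lets each validation fold influence the scaling statistics. A minimal sketch of a leak-free alternative, assuming scikit-learn's `make_pipeline` and a fixed `random_state=0`: the pipeline refits `StandardScaler` on the training portion of every fold only. The resulting mean may differ from the score above.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()

# The scaler is fit inside each CV training split, never on the validation split.
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver='lbfgs', hidden_layer_sizes=(50, 20), random_state=0),
)
scores = cross_val_score(pipe, digits.data, digits.target, cv=5)
print(scores.mean())
```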


## Regression example (Diabetes dataset)

In [17]:
from sklearn.neural_network import MLPRegressor

diabetes = datasets.load_diabetes()

mlp_diab = MLPRegressor(solver="sgd", hidden_layer_sizes=(30,), max_iter=2000)
# solver: 'sgd' refers to stochastic gradient descent.
# hidden_layer_sizes: 1 hidden layer with 30 units.

# For a regressor, cross_val_score uses the estimator's default score, R^2
scores = cross_val_score(mlp_diab, diabetes.data, diabetes.target, cv=5)
print(scores.mean())

0.47702096367100644
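The value printed above is an R² score, not an accuracy: `MLPRegressor.score` returns the coefficient of determination, which is what `cross_val_score` uses by default for regressors. A minimal sketch on a single train/test split (the `test_size=0.2` and `random_state=0` values are arbitrary assumptions for illustration) checks that `score` agrees with `sklearn.metrics.r2_score` on the same predictions.

```python
import warnings

from sklearn import datasets
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

diabetes = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=0)

mlp_diab = MLPRegressor(solver="sgd", hidden_layer_sizes=(30,),
                        max_iter=2000, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp_diab.fit(X_train, y_train)

r2_from_score = mlp_diab.score(X_test, y_test)
r2_manual = r2_score(y_test, mlp_diab.predict(X_test))
print(r2_from_score, r2_manual)  # the two values agree
```

An R² of 1.0 would be a perfect fit and 0.0 matches always predicting the mean target, so a mean around 0.48 means the network explains roughly half of the target variance.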
