# Neural networks

Principle: In this section we focus on one specific type of deep learning algorithm, the multilayer perceptron (MLP). MLPs can be viewed as generalizations of linear models that perform multiple stages of processing to reach a decision.
Compared with a linear model, an MLP has many more coefficients (also called weights) to learn: one between every input and every hidden unit (the hidden units make up the hidden layer), and one between every hidden unit and the output.
After a weighted sum is computed for each hidden unit, a nonlinear function is applied to the result, usually the rectifying nonlinearity (also known as the rectified linear unit, or relu) or the hyperbolic tangent (tanh). The relu cuts off values below zero, while tanh saturates to -1 for low input values and +1 for high input values.
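
The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the function names, the layer sizes (3 inputs, 5 hidden units, 1 output) and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    # Cut off values below zero.
    return np.maximum(0.0, z)

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out, activation=relu):
    # Weighted sum for each hidden unit, followed by the nonlinearity.
    hidden = activation(x @ w_hidden + b_hidden)
    # Weighted sum of the hidden units gives the raw output score.
    return hidden @ w_out + b_out

# Hypothetical sizes: 4 samples, 3 inputs, 5 hidden units, 1 output.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w_hidden = rng.normal(size=(3, 5))   # one weight per (input, hidden unit) pair
b_hidden = np.zeros(5)
w_out = rng.normal(size=(5, 1))      # one weight per (hidden unit, output) pair
b_out = np.zeros(1)

scores = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(scores.shape)
```

Passing `activation=np.tanh` instead of `relu` gives the tanh variant of the same network.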

In [1]:
%load_ext autoreload
%autoreload
from utils import feature_selection

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

### First test

In [4]:
gt = pd.read_csv('../dumps/2020.01.13-14.25.csv')
cols = [col for col in gt.columns if col not in ['label']]
data = gt[cols]
target = gt['label']

data_train, data_test, target_train, target_test = train_test_split(data,target, test_size = 0.20, random_state = 0)

mlp = MLPClassifier(solver='lbfgs', random_state=0, max_iter=10000)
mlp.fit(data_train, target_train)
print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train))) 
print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

Accuracy on training set: 0.958
Accuracy on test set: 0.724


The gap between the training accuracy (0.958) and the test accuracy (0.724) suggests that the model is overfitting.

### Further tests

In [3]:
gt = pd.read_csv('../dumps/2020.02.10-12.14.csv')
cols = [col for col in gt.columns if col not in ['label']]
data = gt[cols]
target = gt['label']

data_train, data_test, target_train, target_test = train_test_split(data,target, test_size = 0.20, random_state = 0)

#### Solver

The solver for weight optimization.
- ‘lbfgs’ is an optimizer in the family of quasi-Newton methods.
- ‘sgd’ refers to stochastic gradient descent.
- ‘adam’ refers to a stochastic gradient-based optimizer proposed by Diederik Kingma and Jimmy Ba.

Note: The default solver ‘adam’ works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, ‘lbfgs’ can converge faster and perform better.

In [5]:
solver = ['lbfgs','sgd', 'adam']
for i in solver:
    print("Solver : %s" % i)
    mlp = MLPClassifier(solver=i, random_state=0, max_iter=10000) 
    mlp.fit(data_train, target_train)
    print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train))) 
    print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

Solver : lbfgs
Accuracy on training set: 0.859
Accuracy on test set: 0.843
Solver : sgd
Accuracy on training set: 0.895
Accuracy on test set: 0.899
Solver : adam
Accuracy on training set: 0.972
Accuracy on test set: 0.853


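The 'adam' solver reaches the highest training accuracy (0.972), but its test accuracy (0.853) is well below that of 'sgd' (0.899), which suggests that it overfits with the default settings. Its performance on the test set might be improved by tuning other parameters.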

#### Activation

Activation function for the hidden layer.
- ‘identity’, no-op activation, useful to implement linear bottleneck, returns f(x) = x
- ‘logistic’, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).
- ‘tanh’, the hyperbolic tan function, returns f(x) = tanh(x).
- ‘relu’, the rectified linear unit function, returns f(x) = max(0, x)
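
To make the four formulas above concrete, the following sketch evaluates each of them at a low, a zero and a high input. It uses only the standard library; the function names and the sample inputs are illustrative assumptions, not part of scikit-learn.

```python
import math

def identity(x):
    return x

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

# Same names as the values accepted by the `activation` parameter.
ACTIVATIONS = {
    "identity": identity,
    "logistic": logistic,
    "tanh": math.tanh,
    "relu": relu,
}

for name, func in ACTIVATIONS.items():
    values = [round(func(x), 3) for x in (-5.0, 0.0, 5.0)]
    print("{:>8}: {}".format(name, values))
```

The printed values show the saturation of 'logistic' and 'tanh' at the extremes, while 'relu' is unbounded above and 'identity' leaves the input unchanged.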

In [7]:
act = ['identity','logistic', 'tanh', 'relu']
for i in act:
    print("function : %s" % i)
    mlp = MLPClassifier(activation=i, random_state=0, max_iter=10000) 
    mlp.fit(data_train, target_train)
    print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train))) 
    print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

function : identity
Accuracy on training set: 0.877
Accuracy on test set: 0.875
function : logistic
Accuracy on training set: 0.895
Accuracy on test set: 0.900
function : tanh
Accuracy on training set: 0.895
Accuracy on test set: 0.900
function : relu
Accuracy on training set: 0.972
Accuracy on test set: 0.853


In this case, training with the 'identity' activation did not converge. Among the others, 'logistic' and 'tanh' gave the same results and performed slightly better on the test set than on the training set, whereas 'relu' scored much higher on the training set (0.972) than on the test set (0.853).

#### Learning rate

Learning rate schedule for weight updates (only for 'sgd' solver).
- ‘constant’ is a constant learning rate given by ‘learning_rate_init’.
- ‘invscaling’ gradually decreases the learning rate at each time step ‘t’ using an inverse scaling exponent of ‘power_t’. effective_learning_rate = learning_rate_init / pow(t, power_t)
- ‘adaptive’ keeps the learning rate constant to ‘learning_rate_init’ as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if ‘early_stopping’ is on, the current learning rate is divided by 5.
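
The schedules above can be sketched as plain functions. This is a simplified reading of the descriptions, not scikit-learn's code: the function names are hypothetical, the defaults `learning_rate_init=0.001`, `power_t=0.5` and `tol=1e-4` are the library defaults, and the 'adaptive' sketch only looks at the training loss (no early stopping).

```python
def invscaling_rate(t, learning_rate_init=0.001, power_t=0.5):
    # effective_learning_rate = learning_rate_init / pow(t, power_t)
    return learning_rate_init / pow(t, power_t)

def adaptive_rates(losses, learning_rate_init=0.001, tol=1e-4):
    # Divide the rate by 5 each time two consecutive epochs fail to
    # decrease the training loss by at least `tol`.
    rate = learning_rate_init
    best_loss = float("inf")
    stalled_epochs = 0
    rates = []
    for loss in losses:
        if loss > best_loss - tol:
            stalled_epochs += 1
        else:
            stalled_epochs = 0
        best_loss = min(best_loss, loss)
        if stalled_epochs >= 2:
            rate /= 5
            stalled_epochs = 0
        rates.append(rate)
    return rates

for t in (1, 4, 100):
    print("invscaling, t={}: {:.6f}".format(t, invscaling_rate(t)))

# Hypothetical loss history that stalls at 0.5 for two epochs.
print(adaptive_rates([1.0, 0.5, 0.5, 0.5, 0.4]))
```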

In [10]:
learning_rate = ['constant','invscaling', 'adaptive']
for i in learning_rate:
    print("function : %s" % i)
    mlp = MLPClassifier(solver='sgd', learning_rate=i, random_state=0, max_iter=10000) 
    mlp.fit(data_train, target_train)
    print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train))) 
    print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

function : constant
Accuracy on training set: 0.895
Accuracy on test set: 0.899
function : invscaling
Accuracy on training set: 0.895
Accuracy on test set: 0.899
function : adaptive
Accuracy on training set: 0.895
Accuracy on test set: 0.899


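With the 'sgd' solver, the learning rate schedule made no difference on this dataset: all three schedules give identical training (0.895) and test (0.899) accuracy.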

#### Alpha

L2 penalty (regularization term) parameter.
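
As a rough illustration of what the penalty does, the sketch below computes an L2 term of the form alpha / (2 * n_samples) times the sum of squared weights, which is the form given in the scikit-learn user guide. The function name and the toy weight matrix are assumptions made for the example.

```python
def l2_penalty(weight_matrices, alpha, n_samples):
    # Sum of squared weights over every layer (biases are not included here).
    squared_norm = sum(
        w * w
        for matrix in weight_matrices
        for row in matrix
        for w in row
    )
    return 0.5 * alpha * squared_norm / n_samples

# Hypothetical network with a single 2x2 weight matrix.
toy_weights = [[[1.0, -2.0], [0.5, 0.0]]]
for alpha in (0.0001, 1, 100):
    print("alpha={}: penalty={:.6f}".format(alpha, l2_penalty(toy_weights, alpha, 10)))
```

The penalty grows linearly with alpha, so large values push the optimizer towards smaller weights and a smoother decision function.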

In [12]:
alpha = [0.0001,0.001,0.1,1,10,100,1000]
for i in alpha:
    print("alpha : %s" % i)
    mlp = MLPClassifier(alpha=i, random_state=0, max_iter=10000) 
    mlp.fit(data_train, target_train)
    print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train))) 
    print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

alpha : 0.0001
Accuracy on training set: 0.972
Accuracy on test set: 0.853
alpha : 0.001
Accuracy on training set: 0.972
Accuracy on test set: 0.849
alpha : 0.1
Accuracy on training set: 0.967
Accuracy on test set: 0.840
alpha : 1
Accuracy on training set: 0.971
Accuracy on test set: 0.853
alpha : 10
Accuracy on training set: 0.973
Accuracy on test set: 0.848
alpha : 100
Accuracy on training set: 0.932
Accuracy on test set: 0.868
alpha : 1000
Accuracy on training set: 0.906
Accuracy on test set: 0.892


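Overfitting only decreases noticeably for large *alpha* values: up to alpha = 10 the scores barely change, while for alpha = 100 and alpha = 1000 the training accuracy drops (0.932, then 0.906) and the test accuracy rises (0.868, then 0.892).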

#### Hidden layers

This parameter sets the number of hidden layers and the number of units in each of them. It is a tuple whose length is the number of hidden layers and whose ith element is the number of units in the ith hidden layer; for example, (50, 100, 50) describes three hidden layers with 50, 100 and 50 units.
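
To get a feel for how the tuple affects model size, the following sketch counts the weights and biases of a fully connected network. The helper name is hypothetical, and both the 20 input features and the single output unit (as used for binary classification) are assumptions made for the example, not properties of the dataset used here.

```python
def count_parameters(n_features, hidden_layer_sizes, n_outputs=1):
    # Layer widths from input to output, e.g. [20, 50, 50, 50, 1].
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    # One weight between every unit of one layer and every unit of the next.
    n_weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    # One bias per hidden or output unit.
    n_biases = sum(sizes[1:])
    return n_weights + n_biases

for layers in [(100,), (50, 50, 50), (100, 100, 100)]:
    print(layers, count_parameters(20, layers))
```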

In [4]:
hidden_layers_size = [(50,50,50), (100,100,100), (50,100,50), (100,50,50), (50,50,100), (100,)]
for i in hidden_layers_size:
    print("layer size : %s" % (i,))
    mlp = MLPClassifier(hidden_layer_sizes=i, random_state=0, max_iter=10000) 
    mlp.fit(data_train, target_train)
    print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train))) 
    print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

layer size : (50, 50, 50)
Accuracy on training set: 0.939
Accuracy on test set: 0.873
layer size : (100, 100, 100)
Accuracy on training set: 0.950
Accuracy on test set: 0.867
layer size : (50, 100, 50)
Accuracy on training set: 0.911
Accuracy on test set: 0.889
layer size : (100, 50, 50)
Accuracy on training set: 0.931
Accuracy on test set: 0.878
layer size : (50, 50, 100)
Accuracy on training set: 0.921
Accuracy on test set: 0.885
layer size : (100,)
Accuracy on training set: 0.972
Accuracy on test set: 0.853


### Best match

In [None]:
gt = pd.read_csv('../dumps/2020.02.10-12.14.csv')
cols = [col for col in gt.columns if col not in ['label']]
data = gt[cols]
target = gt['label']
data_train, data_test, target_train, target_test = train_test_split(data,target, test_size = 0.20, random_state = 0)

In [None]:
parameters = {'solver': ['lbfgs','sgd','adam'], 'max_iter': [1000,10000], 'alpha': [0.0001,0.001,0.1,1,10,100], 'hidden_layer_sizes':[(50,50,50), (100,100,100), (50,100,50), (100,50,50), (50,50,100), (100,)], 'activation':['identity','logistic', 'tanh', 'relu']}
clf = GridSearchCV(MLPClassifier(), parameters, n_jobs=-1)
clf.fit(data_train, target_train)
print(clf.score(data_train, target_train))
print(clf.best_params_)

In [None]:
parameters = {'solver': ['lbfgs','sgd','adam'], 'max_iter': [1000,10000], 'alpha': [0.0001,0.001,0.1,1,10,100], 'hidden_layer_sizes':[(50,50,50), (100,100,100), (50,100,50), (100,50,50), (50,50,100), (100,)], 'activation':['identity','logistic', 'tanh', 'relu']}
clf = RandomizedSearchCV(MLPClassifier(), parameters, n_jobs=-1)
clf.fit(data_train, target_train)
print(clf.score(data_train, target_train))
print(clf.best_params_)
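
Before launching the two searches above, it can help to estimate how many models they fit. The sketch below counts the candidates in the same grid using plain Python. It assumes 5-fold cross-validation and `n_iter=10`, which are the defaults in recent scikit-learn versions; neither is set explicitly in the cells above.

```python
parameters = {
    'solver': ['lbfgs', 'sgd', 'adam'],
    'max_iter': [1000, 10000],
    'alpha': [0.0001, 0.001, 0.1, 1, 10, 100],
    'hidden_layer_sizes': [(50, 50, 50), (100, 100, 100), (50, 100, 50),
                           (100, 50, 50), (50, 50, 100), (100,)],
    'activation': ['identity', 'logistic', 'tanh', 'relu'],
}

# Number of distinct parameter combinations in the full grid.
n_candidates = 1
for values in parameters.values():
    n_candidates *= len(values)

cv_folds = 5   # assumption: default number of folds
n_iter = 10    # assumption: default number of sampled candidates

print("Grid search fits:", n_candidates * cv_folds)
print("Randomized search fits:", n_iter * cv_folds)
```

Under these assumptions the exhaustive search trains several thousand models while the randomized search trains a few dozen, which is the main trade-off between the two.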

### Features relevance

In [2]:
gt = pd.read_csv('../dumps/2020.02.10-12.14.csv')
cols = [col for col in gt.columns if col not in ['label']]
data = gt[cols]
target = gt['label']

data_train, data_test, target_train, target_test = train_test_split(data,target, test_size = 0.20, random_state = 0)

mlp = MLPClassifier()
mlp.fit(data_train, target_train)



MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(100,), learning_rate='constant',
              learning_rate_init=0.001, max_fun=15000, max_iter=200,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

In [10]:
feature_selection('../dumps/2020.02.10-12.14.csv',0.15,"mlp")

0.15
(1196, 119)




ValueError: Either fit the model before transform or set "prefit=True" while passing the fitted estimator to the constructor.
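
The `feature_selection` helper is not shown here, so the cause of the error above cannot be confirmed from this notebook. As a possible alternative for estimating feature relevance with an MLP, which exposes neither `coef_` nor `feature_importances_`, the sketch below uses permutation importance from `sklearn.inspection`. This is a different technique from the one in `utils`, and the synthetic data (four features, only the first one informative) is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data: the label depends only on column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

mlp = MLPClassifier(random_state=0, max_iter=1000)
mlp.fit(X_train, y_train)

# Shuffle each column in turn and measure the drop in test accuracy.
result = permutation_importance(
    mlp, X_test, y_test, n_repeats=5, random_state=0)

for idx in np.argsort(result.importances_mean)[::-1]:
    print("feature {}: {:.3f}".format(idx, result.importances_mean[idx]))
```

On the real data, `X` and `y` would be replaced by `data` and `target`, and a threshold on the mean importance could play the role of the 0.15 passed to `feature_selection`.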

### Test with Thomas datasets

In [5]:
gt = pd.read_csv("../dumps/2019-08.Merged_thomas.csv")
cols = [col for col in gt.columns if col not in ['label']]
data = gt[cols]
target = gt['label']

data_train, data_test, target_train, target_test = train_test_split(data,target, test_size = 0.20, random_state = 0)

mlp = MLPClassifier(solver='adam', activation='tanh', alpha=100, hidden_layer_sizes=(50, 50, 100))
mlp.fit(data_train, target_train)
print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

Accuracy on training set: 0.769
Accuracy on test set: 0.761


In [4]:
gt = pd.read_csv("../dumps/2019-09.Merged_thomas.csv")
cols = [col for col in gt.columns if col not in ['label']]
data = gt[cols]
target = gt['label']

data_train, data_test, target_train, target_test = train_test_split(data,target, test_size = 0.20, random_state = 0)

mlp = MLPClassifier(solver='adam', activation='tanh', alpha=100, hidden_layer_sizes=(50, 50, 100))
mlp.fit(data_train, target_train)
print("Accuracy on training set: {:.3f}".format(mlp.score(data_train, target_train)))
print("Accuracy on test set: {:.3f}".format(mlp.score(data_test, target_test)))

Accuracy on training set: 0.784
Accuracy on test set: 0.787
