## Multilayer Perceptron: Fit and evaluate a model

This notebook uses the Titanic dataset from [this](https://www.kaggle.com/c/titanic/overview) Kaggle competition.

In this section, we will fit and evaluate a simple Multilayer Perceptron model.

I will use GridSearchCV to run a grid search with 5-fold cross-validation and find the best hyperparameter settings among the candidates I define.
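To make the cost of this approach concrete, here is a minimal sketch of a grid search inside K-fold cross-validation. The synthetic data from `make_classification` and the names `X_demo`, `y_demo`, `demo_grid`, and `demo_search` are assumptions for illustration only; they are not the Titanic features. Every candidate setting is fitted once per fold, so a grid of 3 x 3 x 3 = 27 candidates with 5 folds means 135 fits, plus one final refit of the best candidate.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Small demo networks may stop before converging; silence that warning here.
warnings.filterwarnings('ignore', category=ConvergenceWarning)

# Synthetic stand-in data (an assumption, not the Titanic features).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Two candidate settings, each scored on 5 folds: 2 x 5 = 10 fits,
# plus one final refit of the best candidate on all of the data.
demo_grid = {'hidden_layer_sizes': [(5,), (20,)]}
demo_search = GridSearchCV(MLPClassifier(max_iter=200, random_state=0), demo_grid, cv=5)
demo_search.fit(X_demo, y_demo)

n_candidates = len(demo_search.cv_results_['params'])
n_folds = demo_search.n_splits_
print('candidates:', n_candidates, 'folds:', n_folds)
print('best:', demo_search.best_params_, round(demo_search.best_score_, 3))
```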

### Read in Data

In [1]:
import joblib
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)

tr_features = pd.read_csv('../data/train_features.csv')
tr_labels = pd.read_csv('../data/train_labels.csv')



### Hyperparameter tuning
A quick reminder of the three hyperparameters that I will seek to optimize in this section.

First, what is the right size for the hidden layer, which is shown in yellow in the diagram below?

Second, which activation function (sigmoid, hyperbolic tangent, or ReLU) is the right choice for this problem?

Third, which learning rate schedule (constant, inverse scaling, or adaptive) finds the best model for this problem?

![hidden layer](../img/hidden_layers.png)
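As a small illustration of how the hidden layer size is expressed in scikit-learn, the sketch below fits two tiny networks and inspects their weight matrices. In `hidden_layer_sizes`, each tuple entry is the number of nodes in one hidden layer, so the tuple length is the number of hidden layers. The synthetic data and the names `one_layer` and `two_layers` are assumptions for illustration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# The demo stops after a few iterations on purpose; silence that warning.
warnings.filterwarnings('ignore', category=ConvergenceWarning)

# Synthetic stand-in data with 4 input features (an assumption).
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)

# (10,) -> one hidden layer with 10 nodes.
one_layer = MLPClassifier(hidden_layer_sizes=(10,), max_iter=50, random_state=0)
one_layer.fit(X_demo, y_demo)

# (10, 5) -> two hidden layers with 10 and 5 nodes.
two_layers = MLPClassifier(hidden_layer_sizes=(10, 5), max_iter=50, random_state=0)
two_layers.fit(X_demo, y_demo)

# coefs_ holds one weight matrix per connection between consecutive layers.
one_layer_shapes = [w.shape for w in one_layer.coefs_]
two_layer_shapes = [w.shape for w in two_layers.coefs_]
print(one_layer_shapes)
print(two_layer_shapes)
```

For a binary target the output layer has a single node, which is why the last matrix in each list has one column.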

In [2]:
def print_results(results):
    print('BEST PARAMS: {}\n'.format(results.best_params_))

    means = results.cv_results_['mean_test_score']
    stds = results.cv_results_['std_test_score']
    for mean, std, params in zip(means, stds, results.cv_results_['params']):
        print('{} (+/-{}) for {}'.format(round(mean, 3), round(std * 2, 3), params))
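An alternative way to read the same information is to load `cv_results_` into a DataFrame and sort by rank. This is a sketch, not a replacement for `print_results`. The helper name `results_table`, the synthetic data, and the decision tree (used only because it fits quickly) are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def results_table(results):
    """Return the key cv_results_ columns, ordered from best to worst rank."""
    table = pd.DataFrame(results.cv_results_)
    columns = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
    return table[columns].sort_values('rank_test_score').reset_index(drop=True)


# Synthetic stand-in data and a fast stand-in model (both assumptions).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
demo_search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                           {'max_depth': [1, 3, 5]}, cv=5)
demo_search.fit(X_demo, y_demo)

demo_table = results_table(demo_search)
print(demo_table)
```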

In [3]:
# Define the hyperparameter dictionary: each key matches the name of a
# hyperparameter accepted by MLPClassifier, and each value is the list of
# settings to try for that key.

# Explore models with only one hidden layer because this is a simple problem.
# Try one layer with 10, 50, and 100 nodes. Each tuple has one entry per hidden
# layer, and each entry is the number of nodes in that layer, so (10,) means a
# single hidden layer with 10 nodes.
mlp = MLPClassifier()
parameters = {
    'hidden_layer_sizes': [(10,), (50,), (100,)],
    'activation': ['relu', 'tanh', 'logistic'],
    'learning_rate': ['constant', 'invscaling', 'adaptive']
}

cv = GridSearchCV(mlp, parameters, cv=5)
cv.fit(tr_features, tr_labels.values.ravel())

print_results(cv)









BEST PARAMS: {'activation': 'tanh', 'hidden_layer_sizes': (100,), 'learning_rate': 'adaptive'}

0.736 (+/-0.122) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'constant'}
0.745 (+/-0.117) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'invscaling'}
0.73 (+/-0.105) for {'activation': 'relu', 'hidden_layer_sizes': (10,), 'learning_rate': 'adaptive'}
0.785 (+/-0.128) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'constant'}
0.783 (+/-0.136) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'invscaling'}
0.779 (+/-0.12) for {'activation': 'relu', 'hidden_layer_sizes': (50,), 'learning_rate': 'adaptive'}
0.794 (+/-0.104) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rate': 'constant'}
0.798 (+/-0.114) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learning_rate': 'invscaling'}
0.788 (+/-0.117) for {'activation': 'relu', 'hidden_layer_sizes': (100,), 'learnin



### Results
The output shows that the best results come from the tanh activation function with 100 hidden-layer nodes and the adaptive learning rate.

This combination reaches 80.2% accuracy, which is a little better than I saw with both logistic regression and SVM.

Two things I want to call out about these results. First, across pretty much all combinations, the model with only 10 hidden nodes appears to be a little too simple, as increasing the count to 50 or 100 improves the results pretty much across the board.

Second, the relu and tanh activation functions appear to generate better results than the logistic activation function.
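One caveat when comparing the `learning_rate` rows: according to the scikit-learn documentation, `learning_rate` is only used when `solver='sgd'`, while `MLPClassifier()` defaults to `solver='adam'`. With the default solver, differences between the three schedules may therefore reflect random weight initialization rather than the schedule itself. The sketch below checks this on synthetic data with a fixed seed; the data and the helper name `fit_weights` are assumptions for illustration.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Short demo runs may stop before converging; silence that warning.
warnings.filterwarnings('ignore', category=ConvergenceWarning)

# Synthetic stand-in data (an assumption, not the Titanic features).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)


def fit_weights(solver, schedule):
    """Fit a small MLP with a fixed seed and return its first weight matrix."""
    model = MLPClassifier(hidden_layer_sizes=(10,), solver=solver,
                          learning_rate=schedule, max_iter=100, random_state=0)
    return model.fit(X_demo, y_demo).coefs_[0]


# Same seed, same solver, different schedule: compare the learned weights.
adam_same = np.allclose(fit_weights('adam', 'constant'), fit_weights('adam', 'invscaling'))
sgd_same = np.allclose(fit_weights('sgd', 'constant'), fit_weights('sgd', 'invscaling'))
print('adam weights identical across schedules:', adam_same)
print('sgd weights identical across schedules:', sgd_same)
```

If the schedule is meant to be part of the search, one option is to pass `solver='sgd'` (or add `'solver'` to the grid) and fix `random_state` so that runs are comparable.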

In [4]:
# Display the best estimator found by the grid search
cv.best_estimator_

### Write out pickled model

In [5]:
joblib.dump(cv.best_estimator_, '../data/MLP_model.pkl')

['../data/MLP_model.pkl']
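To reuse the saved model later, `joblib.load` restores the fitted estimator. The sketch below shows a dump and load round trip in a temporary directory with a small stand-in model, so it does not touch `../data/MLP_model.pkl`. The synthetic data and the names `demo_model` and `restored` are assumptions for illustration.

```python
import os
import tempfile
import warnings

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# The demo model trains briefly on purpose; silence the convergence warning.
warnings.filterwarnings('ignore', category=ConvergenceWarning)

# Synthetic stand-in data and model (assumptions, not the tuned Titanic model).
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)
demo_model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=50, random_state=0)
demo_model.fit(X_demo, y_demo)

# Write the fitted model to disk, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'MLP_model.pkl')
    joblib.dump(demo_model, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly like the original.
same_predictions = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print('predictions match after reload:', same_predictions)
```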