# Multi-Layer Perceptron Classifier
Let's try some neural networks and start with an [MLP classifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html).

In [1]:
from util import get_wpm_train_test, fit_predict_print_wp
from sklearn.neural_network import MLPClassifier

train_x, train_y, test_x, test_y = get_wpm_train_test()
model = MLPClassifier(random_state=42)

fit_predict_print_wp(model, train_x, train_y, test_x, test_y)

Accuracy: 56.04% (102/182)




The accuracy is again slightly higher, now also slightly better than our RandomClassifier, but the difference is not significant (+1.90%). Note: compared with the RandomClassifier's average score over multiple runs, the increase is +14.97%.
Since scikit-learn warns that the optimizer hasn't converged, we can try to increase the maximum number of iterations (max_iter, 200 by default):

In [2]:
model = MLPClassifier(random_state=42, max_iter=1000)
fit_predict_print_wp(model, train_x, train_y, test_x, test_y)

Accuracy: 56.59% (103/182)


The accuracy has slightly increased (only +0.55%) and the warning has disappeared.
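With 182 test examples, each additional correct prediction changes the accuracy by 1/182 ≈ 0.55 percentage points, so the improvement above amounts to exactly one more correctly predicted test case. A small sketch of this arithmetic, using only the counts printed in the outputs:

```python
# Counts taken from the printed outputs (182 test examples)
N_TEST = 182

def accuracy_pct(correct: int, total: int = N_TEST) -> float:
    """Accuracy in percent, rounded to two decimals like the printed outputs."""
    return round(100 * correct / total, 2)

step = accuracy_pct(1)      # weight of a single test example
before = accuracy_pct(102)  # default MLPClassifier
after = accuracy_pct(103)   # MLPClassifier with max_iter=1000
print(step, before, after)  # 0.55 56.04 56.59
```

The same arithmetic applies to the other results in this notebook, which is worth keeping in mind when comparing differences of a fraction of a percent.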

## Hyperparameter Tuning

In [3]:
from sklearn.model_selection import GridSearchCV

print("Fitting 5 folds for each of 900 candidates, totalling 4500 fits")  # Prevent accidentally running this
raise KeyboardInterrupt

model = MLPClassifier(random_state=42)

# Parameters suggested by GitHub Copilot
param_grid = {'hidden_layer_sizes': [(100,), (50,), (25,), (10,), (5,)],
              'activation': ['identity', 'logistic', 'tanh', 'relu'],
              'solver': ['lbfgs', 'sgd', 'adam'],
              'alpha': [0.0001, 0.001, 0.01, 0.1, 1],
              'learning_rate': ['constant', 'invscaling', 'adaptive'],
              'max_iter': [1000]}

grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', verbose=1)

grid_search.fit(train_x, train_y["Winner"])

Fitting 5 folds for each of 900 candidates, totalling 4500 fits


KeyboardInterrupt: 
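The number of fits in the log follows directly from the grid: GridSearchCV tries every combination of the listed values (5 × 4 × 3 × 5 × 3 × 1 = 900 candidates) and fits each one once per fold (900 × 5 = 4500). Below is a stdlib-only sketch of that count; scikit-learn's own ParameterGrid from sklearn.model_selection should give the same number via len(). It also uses the fact that, according to the scikit-learn documentation, learning_rate is only used when solver='sgd', so for lbfgs and adam the three learning-rate settings train the same model. The split into two sub-grids is an illustrative assumption, not part of the original search:

```python
from itertools import product
from math import prod

# Same grid as in the cell above
param_grid = {'hidden_layer_sizes': [(100,), (50,), (25,), (10,), (5,)],
              'activation': ['identity', 'logistic', 'tanh', 'relu'],
              'solver': ['lbfgs', 'sgd', 'adam'],
              'alpha': [0.0001, 0.001, 0.01, 0.1, 1],
              'learning_rate': ['constant', 'invscaling', 'adaptive'],
              'max_iter': [1000]}

def count_fits(grid: dict, cv: int) -> tuple:
    """Return (number of candidates, total fits) for an exhaustive grid search."""
    n_candidates = prod(len(values) for values in grid.values())
    return n_candidates, n_candidates * cv

# Enumerate the combinations explicitly to double-check the product
combinations = list(product(*param_grid.values()))
print(len(combinations), count_fits(param_grid, cv=5))  # 900 (900, 4500)

# learning_rate only matters for solver='sgd', so split into two sub-grids
sgd_grid = {**param_grid, 'solver': ['sgd']}
other_grid = {**param_grid, 'solver': ['lbfgs', 'adam'], 'learning_rate': ['constant']}
reduced = sum(count_fits(g, cv=5)[0] for g in (sgd_grid, other_grid))
print(reduced)  # 500
```

GridSearchCV also accepts a list of dicts as param_grid, so passing [sgd_grid, other_grid] would search the same distinct models with 500 instead of 900 candidates.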

In [4]:
# Result of an earlier run of the grid search above, hard-coded to avoid re-running it
# best_params = grid_search.best_params_
best_params = {'activation': 'identity',
               'alpha': 0.0001,
               'hidden_layer_sizes': (10,),
               'learning_rate': 'constant',
               'max_iter': 1000,
               'solver': 'adam'}
best_params

{'activation': 'identity',
 'alpha': 0.0001,
 'hidden_layer_sizes': (10,),
 'learning_rate': 'constant',
 'max_iter': 1000,
 'solver': 'adam'}

So the best parameters are a single hidden layer of 10 neurons with the identity activation, the adam solver, alpha=0.0001 and a constant learning rate (with max_iter=1000).

In [5]:
model = MLPClassifier(random_state=42, **best_params)
fit_predict_print_wp(model, train_x, train_y, test_x, test_y)

Accuracy: 55.49% (101/182)


Even though these parameters scored best in the 5-fold cross-validation on the training data, the accuracy on the test set is slightly lower (55.49% vs. 56.59%, i.e. two fewer correct predictions).
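It is also worth noting what the selected configuration implies: with activation='identity' the hidden layer applies no non-linearity, so the hidden and output layers collapse into a single linear map, and the classifier (which applies a logistic function on top) can only learn a linear decision boundary, much like logistic regression. The sketch below illustrates the collapse in pure Python with made-up weights (hypothetical values, not taken from the fitted model):

```python
# Hypothetical weights: 3 input features -> 2 hidden units -> 1 output unit
W1 = [[0.5, -1.0, 2.0],
      [1.5, 0.25, -0.5]]
b1 = [0.1, -0.2]
W2 = [[2.0, -3.0]]
b2 = [0.4]

def matvec(W, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product A @ B for matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def two_layer_identity(x):
    """Forward pass with identity activation: W2 (W1 x + b1) + b2."""
    h = [v + b for v, b in zip(matvec(W1, x), b1)]
    return [v + b for v, b in zip(matvec(W2, h), b2)]

# Equivalent single linear layer: W = W2 W1, b = W2 b1 + b2
W = matmul(W2, W1)
b = [v + c for v, c in zip(matvec(W2, b1), b2)]

def single_linear(x):
    return [v + c for v, c in zip(matvec(W, x), b)]

x = [1.0, 2.0, -1.0]
print(two_layer_identity(x), single_linear(x))  # both approximately [-13.3]
```

Since W2 W1 is again just a matrix, adding more identity-activated neurons or layers does not increase what the model can represent.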

## MLPRegressor
A regressor predicts a number, while a classifier predicts a class. We want to predict a score that represents how good a headline is, so that we can select the best one; this sits somewhere between the two (a number that is then used to pick a class).
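The idea of using a regressor here is that its continuous output can serve as a ranking score: among competing headlines, the one with the highest predicted score is taken as the predicted winner. How fit_predict_print_wp does this internally is not shown in this notebook (the proba=False flag suggests it otherwise uses predicted probabilities), so the following is only a sketch under that assumption, with hypothetical headlines and scores:

```python
# Hypothetical predicted scores (e.g. outputs of MLPRegressor.predict) for the
# competing headlines of one article; the data is made up for illustration.
candidates = {
    "Headline A": 0.31,
    "Headline B": 0.72,
    "Headline C": 0.55,
}

def pick_winner(scores: dict) -> str:
    """Return the headline with the highest predicted score."""
    return max(scores, key=scores.get)

predicted = pick_winner(candidates)
actual = "Headline B"  # hypothetical true winner
print(predicted, predicted == actual)  # Headline B True
```

Only the ordering of the scores matters for this selection, not their absolute values.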

In [6]:
from sklearn.neural_network import MLPRegressor

model = MLPRegressor(random_state=42)
fit_predict_print_wp(model, train_x, train_y, test_x, test_y, proba=False)

Accuracy: 53.85% (98/182)


In [11]:
# Some parameters based on: https://stackoverflow.com/questions/61163759/tuning-mlpregressor-hyper-parameters
param_grid = {'hidden_layer_sizes': [(50, 50, 50), (50, 100, 50), (100, 1)],
              'alpha': [0.0001, 0.001, 0.05],
              'max_iter': [1000]}

# No scoring argument: candidates are ranked by MLPRegressor's default score (R²)
grid_search = GridSearchCV(model, param_grid, cv=5, verbose=1)

# train_x_int = train_x.replace({False: 0, True: 1})
# train_y_int = train_y.replace({False: 0, True: 1})

grid_search.fit(train_x, train_y["Winner"])

best_params = grid_search.best_params_
best_params

Fitting 5 folds for each of 9 candidates, totalling 45 fits


{'alpha': 0.05, 'hidden_layer_sizes': (100, 1), 'max_iter': 1000}

So the best parameters are alpha=0.05 and hidden_layer_sizes=(100, 1), i.e. a hidden layer of 100 neurons followed by a second hidden layer with a single neuron.

Let's test them:

In [12]:
model = MLPRegressor(random_state=42, **best_params)
fit_predict_print_wp(model, train_x, train_y, test_x, test_y, proba=False)

Accuracy: 53.85% (98/182)


These hyperparameters don't seem to make any difference (the accuracy is unchanged at 53.85%). Let's try some more:

In [13]:
param_grid = {'hidden_layer_sizes': [(50,), (100,), (200,), (100, 100, 100)],
              'alpha': [0.05, 0.1, 1],
              'max_iter': [1000]}
grid_search = GridSearchCV(model, param_grid, cv=5, verbose=1)
grid_search.fit(train_x, train_y["Winner"])
best_params = grid_search.best_params_
best_params

Fitting 5 folds for each of 12 candidates, totalling 60 fits


{'alpha': 1, 'hidden_layer_sizes': (200,), 'max_iter': 1000}

In [14]:
model = MLPRegressor(random_state=42, **best_params)
fit_predict_print_wp(model, train_x, train_y, test_x, test_y, proba=False)

Accuracy: 53.30% (97/182)


The search again favors the upper end of the grid, with the widest single hidden layer (200,) and the largest alpha (1), but the accuracy on the test set has decreased slightly again (53.30% vs. 53.85%). Let's still try some larger values:

In [15]:
param_grid = {'hidden_layer_sizes': [(200,), (400,), (800,)],  # tuples, like the other grids
              'alpha': [1, 2, 4, 8],
              'max_iter': [1000]}
grid_search = GridSearchCV(model, param_grid, cv=5, verbose=1)
grid_search.fit(train_x, train_y["Winner"])
best_params = grid_search.best_params_
print(best_params)

model = MLPRegressor(random_state=42, **best_params)
fit_predict_print_wp(model, train_x, train_y, test_x, test_y, proba=False)

Fitting 5 folds for each of 12 candidates, totalling 60 fits
{'alpha': 1, 'hidden_layer_sizes': (200,), 'max_iter': 1000}
Accuracy: 53.30% (97/182)
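When the best value lies on the boundary of the grid, the optimum may lie outside the range that was searched, which is why extending the grid, as done above, is a reasonable next step. A small helper to check this; at_grid_edge is a hypothetical name, and for simplicity it only looks at numeric parameters with more than one candidate value (so the tuple-valued hidden_layer_sizes are skipped):

```python
def at_grid_edge(param_grid: dict, best_params: dict) -> dict:
    """Map each numeric parameter whose best value is the grid's max or min to 'max'/'min'."""
    edges = {}
    for name, values in param_grid.items():
        numeric = [v for v in values if isinstance(v, (int, float))]
        if len(numeric) < 2 or len(numeric) != len(values):
            continue  # skip single-value and non-numeric (e.g. tuple) parameters
        best = best_params[name]
        if best == max(numeric):
            edges[name] = "max"
        elif best == min(numeric):
            edges[name] = "min"
    return edges

# Second MLPRegressor grid and its result
grid_2 = {'hidden_layer_sizes': [(50,), (100,), (200,), (100, 100, 100)],
          'alpha': [0.05, 0.1, 1],
          'max_iter': [1000]}
best_2 = {'alpha': 1, 'hidden_layer_sizes': (200,), 'max_iter': 1000}
print(at_grid_edge(grid_2, best_2))  # {'alpha': 'max'}

# Third grid and its result
grid_3 = {'hidden_layer_sizes': [(200,), (400,), (800,)],
          'alpha': [1, 2, 4, 8],
          'max_iter': [1000]}
best_3 = {'alpha': 1, 'hidden_layer_sizes': (200,), 'max_iter': 1000}
print(at_grid_edge(grid_3, best_3))  # {'alpha': 'min'}
```

In the last search, alpha=1 is now the smallest value tried (and (200,) is likewise the smallest hidden layer in that grid), which suggests the best setting lies within the range already explored rather than beyond it.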


Alright, it now sticks with the previous parameters (but the results are still worse than those of the MLPClassifier, which in turn scores lower than the Naïve Bayes classifier).
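When comparing these numbers, keep in mind that the MLPRegressor searches above pass no scoring argument to GridSearchCV, so the candidates are ranked by the estimator's default score, which for a regressor is the coefficient of determination R², whereas the classifier search used scoring='accuracy'. R² rewards predictions that are close to the 0/1 targets, while accuracy only depends on which decision the prediction leads to, so a higher R² need not translate into more correct picks. A pure-Python sketch on made-up values (thresholding at 0.5 is an assumption for illustration; the notebook's helper may decide differently):

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the (assumed) threshold."""
    return sum((p >= threshold) == bool(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up 0/1 targets (1 = winning headline) and two sets of predictions
y_true = [1, 0, 0, 1]
pred_a = [0.6, 0.4, 0.4, 0.6]  # correct decisions, scores close to 0.5
pred_b = [0.9, 0.1, 0.1, 0.9]  # same decisions, scores close to the targets

print(f"{r2(y_true, pred_a):.2f} {r2(y_true, pred_b):.2f}")  # 0.36 0.96
print(accuracy(y_true, pred_a), accuracy(y_true, pred_b))    # 1.0 1.0
```

The built-in 'accuracy' scorer cannot be applied directly to a regressor's continuous outputs; a custom scorer built with sklearn.metrics.make_scorer that applies the same selection rule as the evaluation would be one way to align the search with the reported metric.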