# Neural Networks

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

from sklearn.neural_network import MLPClassifier
# MLP: multilayer perceptron

I will train several neural networks, starting with an out-of-the-box MLP classifier, which I will then optimise. Next I will optimise a CNN and, lastly, an RNN. I expect the RNN to perform best, since RNNs make use of the order of the input sequence, which matters for text analysis.

# Import Headline Data

In [2]:
# Import TFIDF data
X_train_head = pd.read_csv('data/X_train_head.csv')

X_test_head = pd.read_csv('data/X_test_head.csv')

y_train_head = pd.read_csv('data/y_train_head.csv')

y_test_head = pd.read_csv('data/y_test_head.csv')
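
The label files come back from `pd.read_csv` as single-column DataFrames, while scikit-learn classifiers generally expect a 1-D target (a column vector is accepted, but with a conversion warning that `warnings.filterwarnings('ignore')` hides here). Below is a minimal sketch of flattening the labels, using a small in-memory CSV in place of the real files. The column name `label` and the values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/y_train_head.csv (column name and values are assumptions)
csv_text = "label\n0\n1\n1\n0\n"
y_frame = pd.read_csv(io.StringIO(csv_text))

# A single-column DataFrame has shape (n_samples, 1) ...
print(y_frame.shape)

# ... while .values.ravel() gives a 1-D array of shape (n_samples,)
y_flat = y_frame.values.ravel()
print(y_flat.shape)
```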

# Multi-Layer Perceptron Classifier

In [16]:
# Build an MLPClassifier with a single hidden layer of one unit (all other settings default)
NN_model = MLPClassifier(hidden_layer_sizes=(1,), random_state=1)
NN_model.fit(X_train_head, y_train_head);

In [17]:
NN_model.score(X_test_head, y_test_head)

0.6232445520581114

So the out-of-the-box MLPClassifier reaches a test accuracy of 62.3%, similar to the Logistic Regression. For now, the out-of-the-box (OOTB) Random Forest is leading the way!
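
For a classifier, `score` reports mean accuracy: the fraction of test rows whose predicted label matches the true label. A small sketch of that calculation with made-up labels (not the headline data):

```python
import numpy as np

# Made-up labels for illustration only
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy is the mean of the element-wise matches
accuracy = np.mean(y_true == y_pred)
print(accuracy)
```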

This is not too surprising: the model has a single hidden layer containing just one unit, so it behaves much like a Logistic Regression model. It cannot fit anything complex, so let's increase the number of hidden layers and the size of each layer.
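
To see why one hidden unit cannot do more than a linear model, here is a rough NumPy sketch of such a network's forward pass, assuming a binary target with a logistic output. The weights are invented for illustration and are not taken from the fitted model. Every input reaches the output only through the single value `z`, so the predicted class is a threshold on a linear function of the features.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # made-up inputs with 5 features

# Hypothetical weights, chosen only for illustration
w = rng.normal(size=5)             # input -> hidden unit
b = 0.1
v, c = 2.0, -1.0                   # hidden unit -> output

z = X @ w + b                      # the single pre-activation value per row
h = np.maximum(z, 0.0)             # ReLU, the MLPClassifier default activation
logit = v * h + c                  # input to the logistic output

pred = logit > 0                   # same as predicted probability > 0.5

# With these weights, logit > 0 exactly when z > 0.5,
# so the decision boundary is the hyperplane w.x + b = 0.5
same_as_linear_rule = np.array_equal(pred, z > 0.5)
print(same_as_linear_rule)
```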

In [19]:
# Build an out-of-the-box Neural Network with 3 hidden layers of sizes (5, 10, 5)
NN_model = MLPClassifier(hidden_layer_sizes=(5, 10, 5), random_state=1)
NN_model.fit(X_train_head, y_train_head);

In [20]:
NN_model.score(X_test_head, y_test_head)

0.636319612590799

Simply adding two layers and increasing the layer sizes improved the test accuracy by 1.3 percentage points (from 62.3% to 63.6%). This already puts it ahead of the other optimised models so far.
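
As a quick check of the size of that improvement, the two test scores reported above can be compared directly. The sketch separates the absolute gain (in percentage points) from the gain relative to the baseline; the variable names are my own.

```python
base_acc = 0.6232445520581114      # one hidden unit
deeper_acc = 0.636319612590799     # hidden_layer_sizes=(5, 10, 5)

point_gain = (deeper_acc - base_acc) * 100                  # percentage points
relative_gain = (deeper_acc - base_acc) / base_acc * 100    # percent of the baseline

print(f'{point_gain:.2f} percentage points, {relative_gain:.2f}% relative')
```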

Let's keep going! 

We will want to optimise over a number of parameters:

1) hidden_layer_sizes: both the number of hidden layers and their sizes \
2) solver: the algorithm used to minimise the cost function \
3) activation: the activation function applied to the hidden-layer units \
4) alpha: L2 regularisation strength, to reduce overfitting \
5) learning_rate_init: the initial step size for the weight updates (used by the sgd and adam solvers only) \
6) learning_rate: the schedule by which the learning rate changes during training (used by sgd only) \
7) momentum: helps gradient descent escape local minima (used by sgd only)
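
To make the roles of the step size and momentum concrete, here is a toy sketch of gradient descent with a classical momentum update on a one-dimensional quadratic. It is not scikit-learn's implementation; the function, the starting point and the values `lr=0.1` and `momentum=0.6` are chosen only for illustration.

```python
def minimise(lr, momentum, steps=200, start=0.0):
    """Gradient descent with classical momentum on f(x) = (x - 3)**2."""
    x, velocity = start, 0.0
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)                       # derivative of f at x
        velocity = momentum * velocity - lr * grad   # blend the previous step with the new gradient
        x += velocity
    return x

x_plain = minimise(lr=0.1, momentum=0.0)      # plain gradient descent
x_momentum = minimise(lr=0.1, momentum=0.6)   # with momentum
print(x_plain, x_momentum)
```

Both runs approach the minimum at x = 3. The step size scales each gradient step, while momentum carries part of the previous step forward.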

In [28]:
from sklearn.model_selection import GridSearchCV

params = {
    'hidden_layer_sizes': [(5, 10, 5), (50, 100, 50),
                           (5, 10, 10, 10, 5), (50, 100, 100, 100, 50),
                           (5, 10, 10, 10, 10, 10, 5), (10, 25, 25, 25, 25, 25, 10)],
    'solver': ['lbfgs', 'sgd', 'adam'],
    'activation': ['tanh', 'relu'],
    'alpha': [0.0001, 0.01, 1, 100, 1000],
    'learning_rate_init': [0.00001, 0.001, 0.1, 10],
    'learning_rate': ['adaptive', 'invscaling'],
    'momentum': [0.3, 0.6, 0.9]
}

gridsearch = GridSearchCV(MLPClassifier(), params, cv=3, n_jobs=-1, verbose=1)

gridsearch_results = gridsearch.fit(X_train_head, y_train_head)

Fitting 3 folds for each of 4320 candidates, totalling 12960 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  42 tasks      | elapsed:   41.5s
[Parallel(n_jobs=-1)]: Done 192 tasks      | elapsed:  2.5min
[Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed: 18.1min
[Parallel(n_jobs=-1)]: Done 792 tasks      | elapsed: 63.5min
[Parallel(n_jobs=-1)]: Done 1242 tasks      | elapsed: 86.2min
[Parallel(n_jobs=-1)]: Done 1792 tasks      | elapsed: 478.7min
[Parallel(n_jobs=-1)]: Done 2442 tasks      | elapsed: 591.3min
[Parallel(n_jobs=-1)]: Done 3192 tasks      | elapsed: 642.3min
[Parallel(n_jobs=-1)]: Done 4042 tasks      | elapsed: 708.7min
[Parallel(n_jobs=-1)]: Done 4992 tasks      | elapsed: 760.0min
[Parallel(n_jobs=-1)]: Done 6042 tasks      | elapsed: 830.5min
[Parallel(n_jobs=-1)]: Done 7192 tasks      | elapsed: 978.8min
[Parallel(n_jobs=-1)]: Done 8442 tasks      | elapsed: 1079.0min
[Parallel(n_jobs=-1)]: Done 9792 tasks      | elapsed: 1140.0min
[Parallel(n_jobs=-1)]: Done 11242 t
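
The candidate and fit counts in the log follow directly from the grid: the number of candidates is the product of the number of options per parameter, and each candidate is fitted once per fold. A short sketch of that arithmetic, with the option counts read off the `params` dictionary:

```python
import math

# Number of options per parameter in the grid above
n_options = {
    'hidden_layer_sizes': 6,
    'solver': 3,
    'activation': 2,
    'alpha': 5,
    'learning_rate_init': 4,
    'learning_rate': 2,
    'momentum': 3,
}

n_candidates = math.prod(n_options.values())
n_fits = n_candidates * 3      # cv=3 folds per candidate
print(n_candidates, n_fits)
```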

In [33]:
best_params = gridsearch_results.best_params_
best_params

{'activation': 'tanh',
 'alpha': 1,
 'hidden_layer_sizes': (10, 25, 25, 25, 25, 25, 10),
 'learning_rate': 'adaptive',
 'learning_rate_init': 0.1,
 'momentum': 0.6,
 'solver': 'lbfgs'}

In [31]:
gridsearch_results.best_score_

0.6319923719578641

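After all of that, the best mean cross-validated accuracy the grid search found for my MLP Classifier is 63.2%. Note that because the selected solver is lbfgs, the chosen learning_rate, learning_rate_init and momentum values have no effect: scikit-learn only uses them with the sgd solver (and learning_rate_init with adam as well).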
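
One way to avoid searching over settings that a solver ignores is to give `GridSearchCV` a list of grids, one per solver, which its `param_grid` argument accepts. The sketch below assumes, as the scikit-learn documentation describes, that `learning_rate` and `momentum` apply to sgd only and `learning_rate_init` to sgd and adam. The names `shared`, `step_sizes` and `solver_aware_params` are my own, and the block only counts candidates rather than running a search.

```python
import math

# Options shared by every solver (same values as the grid above)
shared = {
    'hidden_layer_sizes': [(5, 10, 5), (50, 100, 50),
                           (5, 10, 10, 10, 5), (50, 100, 100, 100, 50),
                           (5, 10, 10, 10, 10, 10, 5), (10, 25, 25, 25, 25, 25, 10)],
    'activation': ['tanh', 'relu'],
    'alpha': [0.0001, 0.01, 1, 100, 1000],
}
step_sizes = [0.00001, 0.001, 0.1, 10]

# One sub-grid per solver, each listing only the settings that solver reads
solver_aware_params = [
    {**shared, 'solver': ['lbfgs']},
    {**shared, 'solver': ['adam'], 'learning_rate_init': step_sizes},
    {**shared, 'solver': ['sgd'], 'learning_rate_init': step_sizes,
     'learning_rate': ['adaptive', 'invscaling'], 'momentum': [0.3, 0.6, 0.9]},
]

def count(grid):
    """Number of parameter combinations in one sub-grid."""
    return math.prod(len(options) for options in grid.values())

n_candidates = sum(count(grid) for grid in solver_aware_params)
print(n_candidates)
```

Under these assumptions the search covers 1740 distinct candidates instead of 4320, because the lbfgs and adam sub-grids no longer repeat fits that differ only in ignored settings.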

In [35]:
# Build the optimised MLPClassifier with optimal parameters
NN_model = MLPClassifier(hidden_layer_sizes=best_params["hidden_layer_sizes"], 
                         activation=best_params["activation"], 
                         alpha=best_params["alpha"],
                         learning_rate=best_params["learning_rate"],
                         learning_rate_init=best_params["learning_rate_init"],
                         momentum=best_params["momentum"],
                         solver=best_params["solver"])
NN_model.fit(X_train_head, y_train_head);

In [36]:
train_score = NN_model.score(X_train_head, y_train_head)
test_score = NN_model.score(X_test_head, y_test_head)

In [37]:
print(f'Training score: {train_score}')
print(f'Test score: {test_score}')

Training score: 0.6409542262048922
Test score: 0.6372881355932203


In [41]:
# Build an MLPClassifier with more nodes in the hidden layers
NN_model = MLPClassifier(hidden_layer_sizes=(100, 1000, 1000, 1000, 1000, 1000, 100), 
                         activation=best_params["activation"], 
                         alpha=best_params["alpha"],
                         learning_rate=best_params["learning_rate"],
                         learning_rate_init=best_params["learning_rate_init"],
                         momentum=best_params["momentum"],
                         solver=best_params["solver"])
NN_model.fit(X_train_head, y_train_head);

In [42]:
train_score = NN_model.score(X_train_head, y_train_head)
test_score = NN_model.score(X_test_head, y_test_head)

In [43]:
print(f'Training score: {train_score}')
print(f'Test score: {test_score}')

Training score: 0.6313877452167596
Test score: 0.6242130750605327
