## Make your network
__July, 2018 - By: Christopher Sanchez__

Hello, thanks for stopping by. Today we have a few goals to accomplish. We will build a multi-layer perceptron neural network model to make predictions on a labeled dataset that we will find online. We will build the best model we can by trying different hyperparameter settings and seeing which works best for our data. We will then compare our perceptron model to a boosted tree model to examine the efficiency of both, and determine which model better answers our question. We will check for complexity and accuracy.

__Our checklist:__
- Find a dataset
- Introduce our dataset
- Import our necessary modules, and the data
- Examine the data to get a better understanding of it
- Clean and process the data
- Build our models
- Conclude by comparing and contrasting our two models

The dataset we will be tearing into today is an interesting bank marketing dataset graciously hosted over at UCI. You can find it here. It represents data collected from the direct marketing campaigns of a Portuguese banking institution. The dataset consists of 41,188 rows and 21 columns: 20 input features plus the outcome. Clients were often contacted more than once. The question we will be trying to answer with the data is whether or not a client will subscribe to a term deposit. Time to dig in! Let’s start by discussing and importing the modules (libraries that give us access to more advanced functions) we’re going to need.

__Modules that we will use:__
- Numpy
- Pandas
- Sklearn
- Operator
- Time

Numpy is an excellent library and it allows us to compute numerous advanced mathematical functions, work with arrays, and much more. 

Pandas is built off of the Numpy library, and allows us to work with the data in a more structured manner. Pandas allows you to use data frames, and manipulate them in various ways.

Sklearn is a great package. It is very powerful, and it is the library that we will use to build our models today. We will use its classification models, along with its preprocessing and model selection tools.

The operator module provides function versions of Python’s built-in operators, which come in handy for tasks like sorting.

The time module will allow us to see how long our models take to run.


Great, let’s start building.


In [1]:
import numpy as np
import pandas as pd

import sklearn
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn import metrics, preprocessing

import operator
import time
from datetime import timedelta

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)


Above you can see the imports that we’re going to use which we previously discussed. We will also set some display options that will allow us to better view the data. 

Let’s import our data:

In [2]:
df = pd.read_csv('bankmarketing.csv', delimiter=';')
df.head()

| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | housemaid | married | basic.4y | no | no | no | telephone | may | mon | 261 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 1 | 57 | services | married | high.school | unknown | no | no | telephone | may | mon | 149 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 2 | 37 | services | married | high.school | no | yes | no | telephone | may | mon | 226 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 3 | 40 | admin. | married | basic.6y | no | no | no | telephone | may | mon | 151 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 4 | 56 | services | married | high.school | no | no | yes | telephone | may | mon | 307 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |


Great, our data is imported, and the only modification we had to make was splitting the data on semicolons.

Now let’s take a quick dive into the data and look at the shape, nulls and data types of our features:

In [3]:
print('Shape:\n',df.shape,'\n')
print('Nulls:\n', df.isnull().sum(),'\n')
print('Data types:\n', df.dtypes)

Shape:
 (41188, 21) 

Nulls:
 age               0
job               0
marital           0
education         0
default           0
housing           0
loan              0
contact           0
month             0
day_of_week       0
duration          0
campaign          0
pdays             0
previous          0
poutcome          0
emp.var.rate      0
cons.price.idx    0
cons.conf.idx     0
euribor3m         0
nr.employed       0
y                 0
dtype: int64 

Data types:
 age                 int64
job                object
marital            object
education          object
default            object
housing            object
loan               object
contact            object
month              object
day_of_week        object
duration            int64
campaign            int64
pdays               int64
previous            int64
poutcome           object
emp.var.rate      float64
cons.price.idx    float64
cons.conf.idx     float64
euribor3m         float64
nr.employed       float64
y 

Great, so we can see that we are working with 41,188 rows and 21 columns: 20 features plus our outcome, y. There are also no nulls. However, many of our columns are of type object (text), and our models need numerical input, so we need to convert them. Let’s get started.

In [4]:
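# 'age' is already numeric, but it is label-encoded here along with the text columns.
# For an integer column like 'age', a code can coincide with a raw value that
# has not been replaced yet, so some of its codes may end up merged.
categorical_features = ['age', 'job', 'marital', 'education', 'default', 'housing',
                        'loan', 'contact', 'month', 'day_of_week', 'poutcome', 'y']
for category in categorical_features:
    # Map each distinct value to an integer code, in order of first appearance.
    counter = 0
    for x in df[category].unique():
        df[category] = df[category].replace(x, counter)
        counter += 1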
    

In [5]:
df.head()

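| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 261 | 1 | 999 | 0 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 149 | 1 | 999 | 0 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 226 | 1 | 999 | 0 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 3 | 3 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 151 | 1 | 999 | 0 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |
| 4 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 307 | 1 | 999 | 0 | 0 | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | 0 |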


Great, we created a for loop that walks through our list of categorical features and replaces each distinct value with an integer code. Keep in mind that these codes impose an arbitrary order on the categories.
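
For comparison, here is a minimal sketch of the same idea using pd.factorize, which assigns integer codes in order of first appearance in a single pass. This is an alternative, not the code used for the results in this post. The toy frame and its values are made up for illustration, and only the text columns are encoded.

```python
import pandas as pd

# Toy frame standing in for the bank data (values are illustrative only).
toy = pd.DataFrame({
    "job": ["housemaid", "services", "services", "admin."],
    "marital": ["married", "single", "married", "married"],
    "duration": [261, 149, 226, 151],
})

encoded = toy.copy()
# Encode only the text columns; numeric columns such as duration stay as they are.
for column in ["job", "marital"]:
    codes, uniques = pd.factorize(encoded[column])
    encoded[column] = codes

print(encoded)
```

Because factorize builds the whole mapping before changing anything, a code can never be confused with a raw value that is still waiting to be replaced.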

Time to split our data set up. We will create our input variable and our output variable.

In [6]:
X = df.drop(columns=['y'])  # keyword form; the positional axis argument was removed in pandas 2.0
y = df['y']

X will represent our input variables, and y will be our outcome. 

Now let’s process the data. We will make sure everything is numerical, then rescale the data with Sklearn’s normalize function, which scales each row (each client) to unit length. Note that this does not give the features a normal distribution; it only puts every row on a common scale. Finally, we will split our data so that we can train our model on the training data and test it on the testing data to check for overfitting.

In [7]:
columns = X.columns

X = X.apply(pd.to_numeric, errors = 'coerce')
X = preprocessing.normalize(X)
X = pd.DataFrame(X, columns=columns)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)


Excellent. Our input variable X has been processed and is ready to fire on all cylinders. The split has also been executed successfully: our training set is 67% of the data and our testing set is 33%.
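
To make the rescaling step concrete, here is a small sketch of what preprocessing.normalize does to a tiny array, next to column-wise standardization with StandardScaler for contrast. The two-row array is made up for illustration, and StandardScaler is not used anywhere else in this post.

```python
import numpy as np
from sklearn import preprocessing

# Two toy rows; the second is simply the first one doubled.
rows = np.array([[3.0, 4.0],
                 [6.0, 8.0]])

# normalize works row by row: each row is divided by its own L2 norm.
unit_rows = preprocessing.normalize(rows)

# StandardScaler works column by column: subtract the mean, divide by the standard deviation.
standardized = preprocessing.StandardScaler().fit_transform(rows)

print(unit_rows)
print(standardized)
```

After normalize both rows come out identical, because only the direction of each row is kept and its overall magnitude is discarded.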

Let’s start building our first neural network. We will use Sklearn’s MLPClassifier with 5 hidden layers of 1,000 units each, and start off with the ‘relu’ activation function, which is usually the best in my experience. We will time our model to determine how long it takes to fit the data. Next we will print out cross validation scores on the test data to see how well our model performs and to check for overfitting. Note that cross_val_score refits a fresh copy of the model on each fold, so these scores come from models trained on folds of the test set.

In [9]:
start_time = time.monotonic()


mlp_relu = MLPClassifier(hidden_layer_sizes=(1000, 1000, 1000, 1000, 1000), activation='relu')
mlp_relu.fit(X_train, y_train)

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

Time: 0:08:13.911743


Our model is now fit to the data. Look at the time, though: with 67% of the data, it took over 8 minutes to fit.
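
Since we repeat the same timing lines around every model, they could be wrapped in a small helper. The sketch below uses only the standard library; the name timed and the dictionary it yields are my own choices for illustration.

```python
import time
from contextlib import contextmanager
from datetime import timedelta


@contextmanager
def timed(label="Time"):
    """Print how long the enclosed block took and record it in a dict."""
    result = {}
    start = time.monotonic()
    try:
        yield result
    finally:
        result["elapsed"] = timedelta(seconds=time.monotonic() - start)
        print(f"{label}:", result["elapsed"])


# Usage: any slow call, such as a model fit, goes inside the with block.
with timed("Fit") as timing:
    total = sum(range(100_000))
```

Using time.monotonic, as we do throughout this post, keeps the measurement safe from system clock adjustments.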

In [10]:
start_time = time.monotonic()


print(cross_val_score(mlp_relu, X_test, y_test, cv=5))

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

[0.89959544 0.91062891 0.91798455 0.90327326 0.9076187 ]
Time: 0:49:32.707731


Our model worked pretty well. We achieved an average accuracy of about 90.8% across our cross validation folds, and the scores are consistent from fold to fold, so it doesn’t look like there was any overfitting. It took about 49 and a half minutes to run our cross validation test on our test data, which is only 33% of the data.
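
Because cross_val_score fits fresh copies of the model on each fold, a complementary check is to score the model we already fitted on the held-out test set and compare that with its training accuracy. Here is a minimal sketch on synthetic data from make_classification. The names X_demo, y_demo and the small boosted trees model are placeholders so the example runs in seconds; they are not the models timed in this post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (not the bank marketing dataset).
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.33, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

# Cross validation on the training split: each fold trains its own clone of the model.
cv_scores = cross_val_score(model, X_tr, y_tr, cv=5)

# Fit once, then compare accuracy on data the model has seen and has not seen.
model.fit(X_tr, y_tr)
train_accuracy = model.score(X_tr, y_tr)
test_accuracy = model.score(X_te, y_te)

print("CV mean:", cv_scores.mean())
print("Train accuracy:", train_accuracy)
print("Test accuracy:", test_accuracy)
```

A large gap between the training and test accuracy would be a sign of overfitting.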
Now let’s compare our relu model with a tanh model. Without further ado, I give you our tanh model:

In [11]:
start_time = time.monotonic()

mlp_tanh = MLPClassifier(hidden_layer_sizes=(1000, 1000, 1000, 1000, 1000), activation='tanh')
mlp_tanh.fit(X_train, y_train)

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

Time: 0:23:34.447781


Our tanh model took nearly three times as long as our relu model to fit our data. Hopefully the performance was worth the wait. Let's check it out.

In [12]:
start_time = time.monotonic()

print(cross_val_score(mlp_tanh, X_test, y_test, cv=5))

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

[0.90069879 0.90805443 0.91136447 0.90511217 0.8951049 ]
Time: 0:38:22.637670


It seems our relu model is a slightly better fit than our tanh model (about 90.8% versus 90.4% on average). That works out nicely, since the relu model also took much less time to fit.

What about the sigmoid function, also known as logistic? Let’s see how it performs now. 

In [13]:
start_time = time.monotonic()

mlp_logistic = MLPClassifier(hidden_layer_sizes=(1000, 1000, 1000, 1000, 1000), activation='logistic')
mlp_logistic.fit(X_train, y_train)

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

Time: 0:04:31.612431


Wow! Our resident Speedy Gonzales has appeared. It only took four and a half minutes to fit the data using the logistic function. I’m intrigued to see how it is going to perform.

In [14]:
start_time = time.monotonic()

print(cross_val_score(mlp_logistic, X_test, y_test, cv=5))

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

[0.88856197 0.88856197 0.88856197 0.88856197 0.88884799]
Time: 0:11:39.000124


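A fast performer here too. It did decently well at about 88.9%, but that is a drop of roughly two percentage points from our relu and tanh models. The nearly identical scores across all five folds also suggest that the model may simply be predicting the same class for every client.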
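
One way to put an accuracy score in context is to compare it with a baseline that always predicts the most common class. The sketch below uses Sklearn's DummyClassifier. The 89/11 label split is made up for illustration and is not the real class balance of our data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Made-up imbalanced labels: 89 of class 0 and 11 of class 1.
y_toy = np.array([0] * 89 + [1] * 11)
X_toy = np.zeros((len(y_toy), 1))  # the features are ignored by this baseline

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_toy, y_toy)

# Accuracy equals the share of the majority class.
baseline_accuracy = baseline.score(X_toy, y_toy)
print(baseline_accuracy)
```

If a trained model scores no better than this kind of baseline on the real data, it has not learned anything useful about the minority class.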
Now let’s take a look at our final model: our boosted trees classifier.

In [15]:
start_time = time.monotonic()

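# The loss is left at its default (logistic loss, named 'deviance' in older
# Sklearn releases and 'log_loss' in newer ones) so this runs on either.
params = {'n_estimators': 500,
          'max_depth': 2}

gbr = GradientBoostingClassifier(**params)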
gbr.fit(X_train, y_train)

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

Time: 0:00:09.370913


Our boosted trees model was extremely quick, taking only nine seconds to fit, but how well did it do?

In [16]:
start_time = time.monotonic()

print(cross_val_score(gbr, X_test, y_test, cv=5))

end_time = time.monotonic()
print('Time:', timedelta(seconds=end_time - start_time))

[0.91026113 0.91945568 0.92092681 0.91651342 0.91534781]
Time: 0:00:17.656936


Clearly our boosted trees model is the most efficient. It has the highest average accuracy, at about 91.7%, and its cross validation finished in under 18 seconds.
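
To compare all four runs at a glance, we can average the fold scores printed above and rank them. This sketch uses only the standard library, including the operator module we imported at the start. The dictionary keys are just labels matching our variable names.

```python
import operator
from statistics import mean

# Fold scores copied from the cross validation outputs above.
fold_scores = {
    "mlp_relu": [0.89959544, 0.91062891, 0.91798455, 0.90327326, 0.9076187],
    "mlp_tanh": [0.90069879, 0.90805443, 0.91136447, 0.90511217, 0.8951049],
    "mlp_logistic": [0.88856197, 0.88856197, 0.88856197, 0.88856197, 0.88884799],
    "gbr": [0.91026113, 0.91945568, 0.92092681, 0.91651342, 0.91534781],
}

mean_scores = {name: mean(scores) for name, scores in fold_scores.items()}

# Sort by mean accuracy, best first.
ranked = sorted(mean_scores.items(), key=operator.itemgetter(1), reverse=True)
best_model = ranked[0][0]

for name, score in ranked:
    print(f"{name}: {score:.4f}")
```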

### Discussion and conclusion:
All of our models performed well. Our MLP classifier models took a long time to run: their accuracy scores are satisfactory, but the time required was just totally inefficient. Our boosted trees model performed the best, though only slightly, and it was by far the fastest.

I believe the neural network could outperform the boosted trees model; however, I don’t have the computational resources to push it to a much higher capacity. Neural networks and boosted trees are both very effective, and both have a multitude of parameters that you can adjust to better fit your data. Neural networks, however, are exceptionally computationally expensive and are hard to do much with unless you have a nicely sized server. If you are working with neural networks, or any computationally expensive algorithm for that matter, I recommend that you start with a smaller chunk of data to make sure that all of your code is working correctly and efficiently. Once everything is in place, run all of the data through your models.
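
As a rough sketch of that last recommendation, one way to carve out a smaller chunk is a stratified split, which keeps the class balance of the outcome intact. The frame below is synthetic, and the 10% size is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large dataset: 1,000 rows, 10% positive outcomes.
full = pd.DataFrame({"feature": np.arange(1000),
                     "y": [0] * 900 + [1] * 100})

# Keep 10% of the rows while preserving the share of each class in y.
subset, _ = train_test_split(full, train_size=0.1, stratify=full["y"], random_state=0)

print(len(subset), subset["y"].mean())
```

Once the code runs cleanly on the subset, the same pipeline can be pointed at the full data.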