Create a multi-layer perceptron neural network model to predict on a labeled dataset of your choosing. Compare this model to either a boosted tree or a random forest model and describe the relative tradeoffs between complexity and accuracy. Be sure to vary the hyperparameters of your MLP!

In [1]:
#Importing the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Import the model.
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import GradientBoostingClassifier

# Import Metrics
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score

In [2]:
data= pd.read_csv('Dataset_spine.csv')

About the dataset
310 Observations, 13 Attributes (12 Numeric Predictors, 1 Binary Class Attribute - No Demographics)

Lower back pain can be caused by a variety of problems with any part of the complex, interconnected network of spinal muscles, nerves, bones, discs or tendons in the lumbar spine. Typical sources of low back pain include:

The large nerve roots in the low back that go to the legs may be irritated
The smaller nerves that supply the low back may be irritated
The large paired lower back muscles (erector spinae) may be strained
The bones, ligaments or joints may be damaged
An intervertebral disc may be degenerating
An irritation or problem with any of these structures can cause lower back pain and/or pain that radiates or is referred to other parts of the body. Many lower back problems also cause back muscle spasms, which don't sound like much but can cause severe pain and disability.

While lower back pain is extremely common, the symptoms and severity of lower back pain vary greatly. A simple lower back muscle strain might be excruciating enough to necessitate an emergency room visit, while a degenerating disc might cause only mild, intermittent discomfort.

The goal with this data set is to identify whether a person is abnormal or normal using the collected physical spine measurements.

In [3]:
data.head()

Unnamed: 0,Col1,Col2,Col3,Col4,Col5,Col6,Col7,Col8,Col9,Col10,Col11,Col12,Class_att,Unnamed: 13
0,63.027818,22.552586,39.609117,40.475232,98.672917,-0.2544,0.744503,12.5661,14.5386,15.30468,-28.658501,43.5123,Abnormal,
1,39.056951,10.060991,25.015378,28.99596,114.405425,4.564259,0.415186,12.8874,17.5323,16.78486,-25.530607,16.1102,Abnormal,
2,68.832021,22.218482,50.092194,46.613539,105.985135,-3.530317,0.474889,26.8343,17.4861,16.65897,-29.031888,19.2221,Abnormal,Prediction is done by using binary classificat...
3,69.297008,24.652878,44.311238,44.64413,101.868495,11.211523,0.369345,23.5603,12.7074,11.42447,-30.470246,18.8329,Abnormal,
4,49.712859,9.652075,28.317406,40.060784,108.168725,7.918501,0.54336,35.494,15.9546,8.87237,-16.378376,24.9171,Abnormal,


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 310 entries, 0 to 309
Data columns (total 14 columns):
Col1           310 non-null float64
Col2           310 non-null float64
Col3           310 non-null float64
Col4           310 non-null float64
Col5           310 non-null float64
Col6           310 non-null float64
Col7           310 non-null float64
Col8           310 non-null float64
Col9           310 non-null float64
Col10          310 non-null float64
Col11          310 non-null float64
Col12          310 non-null float64
Class_att      310 non-null object
Unnamed: 13    14 non-null object
dtypes: float64(12), object(2)
memory usage: 34.0+ KB


The 12 predictor columns and Class_att contain no missing values. Only Unnamed: 13 is mostly empty (14 non-null entries out of 310).
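The null counts behind a statement like this can be checked per column. The sketch below is only an illustration: it uses a small made-up frame (the column names mirror the dataset, but the values are invented) instead of reading Dataset_spine.csv.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; values are invented, not taken from Dataset_spine.csv
toy = pd.DataFrame({
    'Col1': [63.0, 39.1, 68.8, 69.3],
    'Class_att': ['Abnormal', 'Abnormal', 'Normal', 'Normal'],
    'Unnamed: 13': [np.nan, np.nan, 'some note', np.nan],
})

# Count missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Columns where more than half of the entries are missing are candidates for dropping
mostly_empty = [c for c in toy.columns if toy[c].isnull().mean() > 0.5]
print(mostly_empty)
```

On the real data, the same two calls on `data` would show which columns are safe to keep as predictors.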


Building a Model - Default Settings
We will use multi-layer perceptron modeling (MLP) to classify if a person is normal or abnormal using the data.
We will drop the non-numeric, mostly empty column: Unnamed: 13

In [7]:
data.drop(['Unnamed: 13'], axis=1, inplace=True)

In [9]:
# Identifying the predictors (X) and the target (Y)
X = data.drop('Class_att', axis= 1)
Y = data.Class_att

Model 1 (Default settings)

In [11]:
# Establishing and fitting the model with a single hidden layer of 100 perceptrons
mlp = MLPClassifier()
mlp.fit(X,Y)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

In [13]:
Y.value_counts()/ len(Y)

Abnormal    0.677419
Normal      0.322581
Name: Class_att, dtype: float64

In [17]:
# We will calculate the adjusted Rand score.
# This score tells us how closely the predictions agree with the ground-truth labels

#5 fold cross validation
ars = cross_val_score(mlp, X, Y, scoring = 'adjusted_rand_score', cv=5)
print(ars)
print('Cross Validation Score: {:.3f}(+/-{:.2f})'.format(ars.mean(), ars.std()*2))



[-0.02324533  0.2525545   0.74840687  0.69201162  0.5774233 ]
Cross Validation Score: 0.449(+/-0.58)




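The mean adjusted Rand score of 0.449 indicates only moderate agreement between the predictions and the true labels: a score of 1.0 means the two labelings are identical, while a score near 0 means chance-level agreement. The wide spread across folds (from -0.02 to 0.75) shows that performance is unstable from split to split, which suggests the model may be overfitting.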
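For reference, the adjusted Rand score compares two labelings by counting pairs of samples that are grouped together. A minimal pure-Python sketch of the standard formula is shown here; the helper name `adjusted_rand` and the six-element label lists are made up for illustration, and in practice `sklearn.metrics.adjusted_rand_score` does this work.

```python
from collections import Counter
from math import comb

def adjusted_rand(labels_true, labels_pred):
    """Adjusted Rand index from pair counts (illustrative helper, assumes >= 2 samples)."""
    n = len(labels_true)
    # Contingency counts: how many samples fall in each (true, predicted) cell
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # pair agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (sum_cells - expected) / (max_index - expected)

truth = ['Abnormal', 'Abnormal', 'Abnormal', 'Normal', 'Normal', 'Normal']
perfect = adjusted_rand(truth, truth)
# The index only looks at groupings, so consistently swapped label names still score 1.0
swapped = adjusted_rand(truth, ['Normal', 'Normal', 'Normal', 'Abnormal', 'Abnormal', 'Abnormal'])
one_wrong = adjusted_rand(truth, ['Abnormal', 'Abnormal', 'Normal', 'Normal', 'Normal', 'Normal'])
print(perfect, swapped, one_wrong)
```

Because consistently swapped labels still score 1.0, the adjusted Rand score is a looser check for a classifier than accuracy or recall, which is worth keeping in mind when reading the numbers in this notebook.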

In [18]:
# Get predicted classes on the full training data.
full_pred = mlp.predict(X)
pd.crosstab(Y, full_pred)

col_0,Abnormal,Normal
Class_att,Unnamed: 1_level_1,Unnamed: 2_level_1
Abnormal,185,25
Normal,15,85


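The Abnormal class is more common (about 68% of the observations), so the data is imbalanced. Note that these predictions are made on the same data the model was trained on, so the table shows the training fit rather than out-of-sample performance.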
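The counts in the crosstab above can be turned into accuracy, per-class recall, and a majority-class baseline with a few lines of arithmetic. This is a small sketch that hard-codes the four counts from that table; the variable names are chosen here for illustration.

```python
# Counts copied from the crosstab above: keys are (true class, predicted class)
confusion = {
    ('Abnormal', 'Abnormal'): 185, ('Abnormal', 'Normal'): 25,
    ('Normal', 'Abnormal'): 15, ('Normal', 'Normal'): 85,
}

total = sum(confusion.values())
correct = sum(count for (true, pred), count in confusion.items() if true == pred)
accuracy = correct / total

# Number of true samples per class (row totals)
true_totals = {}
for (true, pred), count in confusion.items():
    true_totals[true] = true_totals.get(true, 0) + count

# Recall: share of each true class that was predicted correctly
recall = {cls: confusion[(cls, cls)] / n for cls, n in true_totals.items()}

# Accuracy of always predicting the most common class
majority_baseline = max(true_totals.values()) / total

print('accuracy: {:.3f}'.format(accuracy))
print('recall:', {cls: round(r, 3) for cls, r in recall.items()})
print('majority baseline: {:.3f}'.format(majority_baseline))
```

Comparing the accuracy with the majority baseline gives a sense of how much the model adds over always predicting Abnormal, keeping in mind that these are in-sample counts.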

Model 2: Logistic Activation

In [19]:
mlp2 = MLPClassifier(activation = 'logistic')
mlp2.fit(X, Y)



MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

In [21]:
#5 fold cross validation 
ars = cross_val_score(mlp2, X, Y, scoring = 'adjusted_rand_score', cv =5)
print('Cross Validation Score: {:.3f}(+/-{:.2f})'.format(ars.mean(), ars.std()*2))



Cross Validation Score: 0.401(+/-0.51)




The logistic activation gives a slightly lower adjusted Rand score (0.401 vs. 0.449) and a slightly smaller spread across folds (0.51 vs. 0.58).

In [22]:
# Get predicted classes on the full training data.
full_pred2 = mlp2.predict(X)
pd.crosstab(Y, full_pred2)

col_0,Abnormal,Normal
Class_att,Unnamed: 1_level_1,Unnamed: 2_level_1
Abnormal,191,19
Normal,13,87


The results didn't change much, so we will change further parameters and try to optimize our models.
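One way to organize this kind of parameter variation is a small grid search. The following is only a sketch under stated assumptions: it runs on synthetic data from `make_classification` rather than the spine file, it adds a `StandardScaler` step that this notebook does not use, and the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the spine data (310 rows, 12 features)
X_demo, y_demo = make_classification(
    n_samples=310, n_features=12, n_informative=6,
    weights=[0.68, 0.32], random_state=0,
)

# Assumption: scale the features before the MLP (not done in the notebook itself)
pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=300, random_state=0))

# Illustrative grid; the parameter prefix comes from the pipeline step name
param_grid = {
    'mlpclassifier__activation': ['relu', 'logistic'],
    'mlpclassifier__alpha': [1e-4, 1e-2],
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, scoring='adjusted_rand_score', cv=cv)
search.fit(X_demo, y_demo)

print(search.best_params_)
print('Best mean CV score: {:.3f}'.format(search.best_score_))
```

Fixing `random_state` in both the model and the folds makes the comparison between settings repeatable, which is hard to get from separate runs with `random_state=None`.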

Model 3: Varying the size of the hidden layer

In [23]:
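# Establish and fit the model with logistic activation and a hidden layer of 1000 units.
# Note: (1000) is the integer 1000, not a tuple; scikit-learn treats it as one hidden layer.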
mlp3 = MLPClassifier(activation='logistic', hidden_layer_sizes=(1000))
mlp3.fit(X, Y)



MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=1000, learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

In [24]:
#5 fold cross validation 
ars = cross_val_score(mlp3, X, Y, scoring = 'adjusted_rand_score', cv =5)
print('Cross Validation Score: {:.3f}(+/-{:.2f})'.format(ars.mean(), ars.std()*2))



Cross Validation Score: 0.478(+/-0.67)


Widening the single hidden layer from 100 to 1000 units gives a higher adjusted Rand score (0.478 vs. 0.401) but also a larger spread across folds (0.67 vs. 0.51).
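The difference between a wider layer and more layers comes down to how `hidden_layer_sizes` is written. The sketch below is an illustration on synthetic data (not the spine file), with small layer sizes and a low `max_iter` chosen only to keep it quick; it inspects the shapes of the fitted weight matrices to show the resulting architecture.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

print(type((1000)))   # int: parentheses alone do not make a tuple
print(type((1000,)))  # tuple with one element, i.e. one hidden layer

# Synthetic binary problem with 12 features, for illustration only
X_demo, y_demo = make_classification(n_samples=100, n_features=12, random_state=0)

# One hidden layer of 50 units vs. two hidden layers of 20 units each
# (max_iter is kept low, so a ConvergenceWarning may be shown)
wide = MLPClassifier(hidden_layer_sizes=(50,), max_iter=50, random_state=0).fit(X_demo, y_demo)
deep = MLPClassifier(hidden_layer_sizes=(20, 20), max_iter=50, random_state=0).fit(X_demo, y_demo)

# coefs_ holds one weight matrix per connection between consecutive layers
wide_shapes = [w.shape for w in wide.coefs_]
deep_shapes = [w.shape for w in deep.coefs_]
print(wide_shapes)
print(deep_shapes)
```

Adding an entry to the tuple adds a hidden layer, while increasing a single entry only makes that layer wider.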

Model 4: Reduce alpha

In [25]:
# Establish and fit the model with a 1000-unit hidden layer and a smaller alpha (weaker L2 regularization).
mlp4 = MLPClassifier(activation='logistic', hidden_layer_sizes=(1000), alpha=1e-6)
mlp4.fit(X, Y)

MLPClassifier(activation='logistic', alpha=1e-06, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=1000, learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

In [26]:
#5 fold cross validation 
ars = cross_val_score(mlp4, X, Y, scoring = 'adjusted_rand_score', cv =5)
print('Cross Validation Score: {:.3f}(+/-{:.2f})'.format(ars.mean(), ars.std()*2))



Cross Validation Score: 0.458(+/-0.64)


In [27]:
# Get predicted classes on the full training data.
full_pred4 = mlp4.predict(X)
pd.crosstab(Y, full_pred4)

col_0,Abnormal,Normal
Class_att,Unnamed: 1_level_1,Unnamed: 2_level_1
Abnormal,185,25
Normal,14,86


Compared to the previous model, the adjusted Rand score (0.458 vs. 0.478) and the spread across folds (0.64 vs. 0.67) both decreased slightly.

Model 5 (Small layer and lower alpha)

In [29]:
# Establish and fit the model with the default 100-unit hidden layer and an even smaller alpha.
mlp5 = MLPClassifier(activation='logistic', alpha=1e-7)
mlp5.fit(X, Y)



MLPClassifier(activation='logistic', alpha=1e-07, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

In [30]:
#5 fold cross validation 
ars = cross_val_score(mlp5, X, Y, scoring = 'adjusted_rand_score', cv =5)
print('Cross Validation Score: {:.3f}(+/-{:.2f})'.format(ars.mean(), ars.std()*2))



Cross Validation Score: 0.447(+/-0.62)




In [31]:
# Get predicted classes on the full training data.
full_pred5 = mlp5.predict(X)
pd.crosstab(Y, full_pred5)

col_0,Abnormal,Normal
Class_att,Unnamed: 1_level_1,Unnamed: 2_level_1
Abnormal,189,21
Normal,13,87


Decreasing alpha further doesn't seem to change the results much either (0.447 with a spread of 0.62).

Gradient Boosting Classifier

In [35]:
gbc = GradientBoostingClassifier()
gbc.fit(X, Y)

GradientBoostingClassifier(criterion='friedman_mse', init=None,
              learning_rate=0.1, loss='deviance', max_depth=3,
              max_features=None, max_leaf_nodes=None,
              min_impurity_decrease=0.0, min_impurity_split=None,
              min_samples_leaf=1, min_samples_split=2,
              min_weight_fraction_leaf=0.0, n_estimators=100,
              n_iter_no_change=None, presort='auto', random_state=None,
              subsample=1.0, tol=0.0001, validation_fraction=0.1,
              verbose=0, warm_start=False)

In [37]:
# 5-fold cross validation
ars6 = cross_val_score(gbc, X, Y, scoring='adjusted_rand_score', cv=5)
print('Cross Validation Adjusted Rand Scores: {:.3f}(+/- {:.2f})'.format(ars6.mean(), ars6.std()*2))

Cross Validation Adjusted Rand Scores: 0.457(+/- 0.61)


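The MLP and the Gradient Boosting Classifier seem to perform similarly on this dataset: the boosted model scores 0.457 (+/- 0.61), which falls within the range of the MLP variants (0.401 to 0.478). Both models give a moderate adjusted Rand score, but the variance across folds is high.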
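To put the two model families side by side on equal terms, both can be scored on the same folds, with fit time recorded as a rough proxy for training cost. This is a hedged sketch, not the notebook's procedure: it uses synthetic data from `make_classification`, fixed seeds, and shuffled stratified folds, all of which are assumptions made here for repeatability.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in with the same shape as the spine data (310 rows, 12 features)
X_demo, y_demo = make_classification(
    n_samples=310, n_features=12, n_informative=6,
    weights=[0.68, 0.32], random_state=0,
)

# Same folds for both models so the scores are directly comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    'mlp': MLPClassifier(max_iter=500, random_state=0),
    'gbc': GradientBoostingClassifier(random_state=0),
}

summary = {}
for name, model in models.items():
    result = cross_validate(model, X_demo, y_demo, scoring='adjusted_rand_score', cv=cv)
    summary[name] = {
        'mean_score': result['test_score'].mean(),
        'spread': result['test_score'].std() * 2,
        'mean_fit_time': result['fit_time'].mean(),  # seconds per fold, machine dependent
    }

for name, stats in summary.items():
    print('{}: {:.3f} (+/- {:.2f}), fit time {:.2f}s'.format(
        name, stats['mean_score'], stats['spread'], stats['mean_fit_time']))
```

Fit times depend on the machine and on settings such as `max_iter` and `n_estimators`, so they are only a coarse indication of relative complexity.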