# Assignment 6: Multi-layer Perceptron Classifier
For this assignment we used the Pokemon data set from the kaggle repository

https://www.kaggle.com/abcsds/pokemon

Our data set contains 13 features consisting of 8 quantitative features:

Encyclopedia number
Sum of all Stats
Hit Points
Attack
Defense
Special Attack
Special Defense
Speed
and 5 categorical features (sex)

Name
Type 1
Type 2
Generation
Legendary

For the purposes of this assignment we only used

In [1]:
import pandas as pd #data analysis library
import matplotlib.pyplot as plt #graphing
import seaborn as sns #graphing
import sklearn
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn import metrics

In [2]:
df = pd.read_csv("Pokemon.csv") #read in data
df.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


## Data Preprocessing
For the purposes of this assignment, we eliminate the following features:

   * #
   * Name
   * Type 1
   * Type 2
   * Total
   * Sp. Atk
   * Sp. Def
   * Generation
   * Legendary

In [3]:
poke_stats = ['HP', "Attack", "Defense", "Speed"]
df = df.drop(columns = ['#','Name', 'Type 1', 'Type 2', 'Total', 'Sp. Atk', 'Sp. Def', 'Generation', "Legendary"])
df.head()

Unnamed: 0,HP,Attack,Defense,Speed
0,45,49,49,45
1,60,62,63,60
2,80,82,83,80
3,80,100,123,80
4,39,52,43,65


In [4]:
#set independent and dependent variables
x = df.iloc[:,1:9] #all entries from column 1 to 3
leng = df.iloc[:,1]
dim = df.iloc[:,2]
rin = df.iloc[:,3]
x

Unnamed: 0,Attack,Defense,Speed
0,49,49,45
1,62,63,60
2,82,83,80
3,100,123,80
4,52,43,65
...,...,...,...
795,100,150,50
796,160,110,110
797,110,60,70
798,160,60,80


In [5]:
y = df.iloc[:,0]
print(y)

0      45
1      60
2      80
3      80
4      39
       ..
795    50
796    50
797    80
798    80
799    80
Name: HP, Length: 800, dtype: int64


In [6]:
#train test split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.2, random_state = 0) #20% testing

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(x_train) 
x_train = scaler.transform(x_train)
scaler.fit(x_test)
x_test = scaler.transform(x_test)  # apply same transformation to test data

## Training with MLP Regressor

In [7]:
#Train and fit data
mlp_regr = MLPRegressor(hidden_layer_sizes = (100,), random_state=1, max_iter=2000).fit(x_train,y_train)

print("Parameters: ")
print("Iterations: ", mlp_regr.n_iter_)
print("Learning Rate: 'constant'")
print("alpha: ", .0001)
print ("Learning Rate Iniitialization: ", .001)
print("Activation Function: 'relu'")
print("Number of Hidden Layers: ", mlp_regr.n_layers_ - 2)
print("Number of Neurons: ", 100)

#probability estimates

Parameters: 
Iterations:  1116
Learning Rate: 'constant'
alpha:  0.0001
Learning Rate Iniitialization:  0.001
Activation Function: 'relu'
Number of Hidden Layers:  1
Number of Neurons:  100


In [8]:
ypred = mlp_regr.predict(x_train)
ypred

array([ 61.07132137,  71.64285975,  81.97071186,  56.14803345,
        73.95268867,  65.35413266,  69.09325737,  62.38211701,
        77.65710611,  74.53535991,  52.61166673,  70.44264011,
        86.95545631,  75.94001198,  73.36875251,  81.88829366,
        81.53668183,  64.98708538,  59.98862375,  69.54599656,
        49.17337536,  57.46090933,  55.67214031,  61.42939942,
        61.28662387, 106.75482771,  66.1824447 ,  81.55876971,
        84.90250358,  57.88981736,  80.10813134,  49.26756228,
        80.56843862,  65.73080056,  52.50592405,  63.33500428,
        93.37060748,  60.516072  ,  54.12451554,  68.00640133,
        63.68065213,  59.38490053,  83.02051844,  88.78720186,
        80.75713722,  63.663905  ,  55.08961133,  89.37684281,
        72.80923601,  64.13275861,  63.961776  ,  48.86099241,
        90.57438351,  59.00854246,  50.73432709,  58.61348553,
        71.35591895,  46.31785626,  53.10675995,  73.04026902,
        62.20295264,  86.02566625,  53.58691298,  58.99

## Training Set Performance Metrics

In [9]:
mlp_score = mlp_regr.score(x_train,y_train)
mlp_score
print("R^2: ", mlp_score)
print("R: ", np.sqrt(mlp_score))

R^2:  0.3168650996013702
R:  0.5629077185484049


In [10]:
print("Evaluation Metrics for Training Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_train, ypred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_train, ypred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_train,ypred))) #RMSE

Evaluation Metrics for Training Set: 
Absolute Error:  13.403091834095338
MSE:  413.05694213164514
RMSE:  20.323802354176866


## Test Set Performance Metrics

In [11]:
y_test_pred = mlp_regr.predict(x_test)
y_test_pred

array([ 44.37548176,  79.24652554,  60.25472595,  54.53446779,
        67.82449957,  62.63304518,  60.39731093,  79.55705269,
        57.74620896,  80.69905152,  56.22241727,  53.3603741 ,
        62.54083885,  86.48575157,  68.80210686,  41.58373189,
        54.16440128,  76.42277003,  79.36276169,  74.85102389,
        47.77352413,  68.10236811,  86.07899523,  88.43884541,
        78.13597086,  49.2815806 ,  80.59462431,  61.53978718,
        76.57755845,  80.5958376 ,  77.82340389,  60.63366754,
        59.47565506,  68.21430688,  87.70259768,  62.07121319,
        79.49953797,  71.83510883,  73.1397742 ,  53.66347419,
        52.57417464,  95.8438388 ,  70.18781025,  68.48084226,
        53.66347419,  56.92408449,  58.98615203,  66.877922  ,
        57.26871751,  66.04829868,  64.93735025,  51.5844842 ,
        49.16932199,  56.9960055 ,  58.3898752 ,  51.83885739,
        92.70260779,  80.45182231,  61.60175076,  76.8409267 ,
        86.96283534,  76.74268519,  68.60272671,  56.97

In [12]:
mlp_test_score = mlp_regr.score(x_test,y_test)
print("R^2: ", mlp_test_score)
print("R: ", np.sqrt(mlp_test_score))

R^2:  0.1646513181266498
R:  0.40577249552754285


In [13]:

print("Evaluation Metrics for Test Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_test, y_test_pred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_test, y_test_pred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_test,y_test_pred))) #RMSE

Evaluation Metrics for Test Set: 
Absolute Error:  16.412070434778144
MSE:  682.2767789422467
RMSE:  26.120428383589857


# MLP Regressor Alternate Parameters #1

In [34]:

#Train and fit data
mlp_regr = MLPRegressor(hidden_layer_sizes=(5000,), activation='relu',
                        alpha=0.001, learning_rate='adaptive', learning_rate_init=0.001,
                        max_iter=2000, random_state=1, tol=0.0001).fit(x_train,y_train)

print("Parameters: ")
print("Iterations: ", mlp_regr.n_iter_)
print("Learning Rate: 'adaptive'")
print("alpha: ", .001)
print ("Learning Rate Iniitialization: ", .001)
print("Activation Function: 'relu'")
print("Number of Hidden Layers: ", mlp_regr.n_layers_ - 2)
print("Number of Neurons: ", 5000)

print("\nTraining Set Performance Metrics")

ypred = mlp_regr.predict(x_train)
mlp_score = mlp_regr.score(x_train,y_train)
print("R^2: ", mlp_score)
print("R: ", np.sqrt(mlp_score))
print("\nEvaluation Metrics for Training Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_train, ypred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_train, ypred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_train,ypred))) #RMSE


print("\nTest Set Performance Metrics")

y_test_pred = mlp_regr.predict(x_test)
mlp_test_score = mlp_regr.score(x_test,y_test)
print("R^2: ", mlp_test_score)
print("R: ", np.sqrt(mlp_test_score))
print("\nEvaluation Metrics for Test Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_test, y_test_pred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_test, y_test_pred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_test,y_test_pred))) #RMSE




Parameters: 
Iterations:  323
Learning Rate: 'adaptive'
alpha:  0.001
Learning Rate Iniitialization:  0.001
Activation Function: 'relu'
Number of Hidden Layers:  1
Number of Neurons:  5000

Training Set Performance Metrics
R^2:  0.3210706906730604
R:  0.5666310004518464

Evaluation Metrics for Training Set: 
Absolute Error:  13.230800223254999
MSE:  410.5140350324547
RMSE:  20.26114594568764

Test Set Performance Metrics
R^2:  0.15720150009130773
R:  0.39648644376738496

Evaluation Metrics for Test Set: 
Absolute Error:  16.334191039477908
MSE:  688.3614690401113
RMSE:  26.236643631381497


# MLP Regressor Alternate Parameters #2

In [29]:


#Train and fit data
mlp_regr = MLPRegressor(hidden_layer_sizes=(250,5), activation='identity', 
                        alpha=0.0001, learning_rate='adaptive', learning_rate_init=0.001,
                        max_iter=2000, random_state=1).fit(x_train,y_train)

print("Parameters: ")
print("Iterations: ", mlp_regr.n_iter_)
print("Learning Rate: 'adaptive'")
print("alpha: ", .0001)
print ("Learning Rate Iniitialization: ", .001)
print("Activation Function: 'identity'")
print("Number of Hidden Layers: ", mlp_regr.n_layers_ - 2)
print("Number of Neurons: ", 250)

print("\nTraining Set Performance Metrics")

ypred = mlp_regr.predict(x_train)
mlp_score = mlp_regr.score(x_train,y_train)
print("R^2: ", mlp_score)
print("R: ", np.sqrt(mlp_score))
print("\nEvaluation Metrics for Training Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_train, ypred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_train, ypred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_train,ypred))) #RMSE


print("\nTest Set Performance Metrics")

y_test_pred = mlp_regr.predict(x_test)
mlp_test_score = mlp_regr.score(x_test,y_test)
print("R^2: ", mlp_test_score)
print("R: ", np.sqrt(mlp_test_score))
print("\nEvaluation Metrics for Test Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_test, y_test_pred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_test, y_test_pred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_test,y_test_pred))) #RMSE

Parameters: 
Iterations:  77
Learning Rate: 'adaptive'
alpha:  0.0001
Learning Rate Iniitialization:  0.001
Activation Function: 'identity'
Number of Hidden Layers:  2
Number of Neurons:  100

Training Set Performance Metrics
R^2:  0.20462065467304746
R:  0.4523501460959723

Evaluation Metrics for Training Set: 
Absolute Error:  15.021714646255697
MSE:  480.9254512157257
RMSE:  21.930012567614405

Test Set Performance Metrics
R^2:  0.07386298004198288
R:  0.2717774457933971

Evaluation Metrics for Test Set: 
Absolute Error:  17.47737947648736
MSE:  756.4287782427227
RMSE:  27.50325032142061


# MLP Classifier Alternate Parameters # 3


In [33]:

#Train and fit data
mlp_regr = MLPRegressor(hidden_layer_sizes=(50,5), activation='relu', 
                        alpha=0.01, learning_rate='constant', learning_rate_init=0.01,
                        max_iter=5000, random_state=1, tol=0.0001).fit(x_train,y_train)

print("Parameters: ")
print("Iterations: ", mlp_regr.n_iter_)
print("Learning Rate: 'constant'")
print("alpha: ", .01)
print ("Learning Rate Iniitialization: ", .01)
print("Activation Function: 'relu'")
print("Number of Hidden Layers: ", mlp_regr.n_layers_ - 2)
print("Number of Neurons: ", 50)

print("\nTraining Set Performance Metrics")

ypred = mlp_regr.predict(x_train)
mlp_score = mlp_regr.score(x_train,y_train)
print("R^2: ", mlp_score)
print("R: ", np.sqrt(mlp_score))
print("\nEvaluation Metrics for Training Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_train, ypred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_train, ypred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_train,ypred))) #RMSE


print("\nTest Set Performance Metrics")

y_test_pred = mlp_regr.predict(x_test)
mlp_test_score = mlp_regr.score(x_test,y_test)
print("R^2: ", mlp_test_score)
print("R: ", np.sqrt(mlp_test_score))
print("\nEvaluation Metrics for Test Set: ")
print("Absolute Error: " ,metrics.mean_absolute_error(y_test, y_test_pred)) #Absolute error
print("MSE: ", metrics.mean_squared_error(y_test, y_test_pred)) #MSE
print("RMSE: ", np.sqrt(metrics.mean_squared_error(y_test,y_test_pred))) #RMSE

Parameters: 
Iterations:  181
Learning Rate: 'constant'
alpha:  0.01
Learning Rate Iniitialization:  0.01
Activation Function: 'relu'
Number of Hidden Layers:  2
Number of Neurons:  50

Training Set Performance Metrics
R^2:  0.3171375028780288
R:  0.5631496274330906

Evaluation Metrics for Training Set: 
Absolute Error:  13.278759177872228
MSE:  412.8922337198548
RMSE:  20.319749843929053

Test Set Performance Metrics
R^2:  0.1517552773027726
R:  0.3895577971274258

Evaluation Metrics for Test Set: 
Absolute Error:  16.791522963192403
MSE:  692.8097089454291
RMSE:  26.321278634318453


Findings:
    At one point, we compared the performance of two solvers for weight optimization. The 'adam" solver, porposed by Kingma, Iederik, and Ba, performed well. The "lbfgs' solver, which is better suited for small data sets seemed to take longer to converge which seems to indicate that the data set we are working with is considered a "large dataset." 
    We experimented with lowering the number of neurons and found that decreasing the number of neurons caused iteration count to increase. While increasing the number of neurons decreased iteraction count. When increasing neuron count, evaluation metrics improved slightly, and when decreasing neuron count, evaluation metrics worsened slightly. 
    Changing the alpha and learning rate did not seem to have much of an impact, especially in comparison to altering the number of neurons. However, we did note that lowering the alpha and learning rate initialization seemed to increase iteration count dramatically. 
    Decreasing the number of hidden layers also appeared to lower the iteration count and slightly improve evaluation metrics (< 1 point). 
    Note that all these observations were consistent for both training and test sets. 
    In comparison to the 'relu' activation function, the 'identity' activation function converged extremely quickly, but had slightly worse evaluation metrics (higher error scores and lower R scores). 
    *Note that above examples do not necessarily reflect these findings, as we altered each parameter individually for a control regressor and noted the changes

Comparison to Assignment 2:
    Our best regressor utilized mostly default parameters, with the exception of an adaptive learning rate, and a high number of neurons and hidden layers. 
    Using our best performing results: in comparison to the OLS algorithm from assignment 2, we noted that our evaluation metrics improved significantly for the training set, and slightly for the test set. In particular, the R^2 score was slightly better, but the OLS algorithm had better error scores. 
    In comparison to Gradient Descent: Our Regressor had slightly higher errors for both the training and test set. 