<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view> 

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?


In [1]:
import pandas as pd
pd.options.display.max_columns= 99

from sklearn.preprocessing import StandardScaler

In [2]:
##### Your Code Here #####

churnframe = pd.read_csv('https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv')
print(churnframe.shape)
churnframe.head()

(7043, 21)


Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [3]:
churnframe['Churn'].value_counts()

No     5174
Yes    1869
Name: Churn, dtype: int64

In [4]:
features = churnframe.columns.tolist()
features.remove('customerID')
features.remove('Churn')

X_train = churnframe[features]
y_train = churnframe['Churn']

In [5]:
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.model_selection import GridSearchCV

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

In [6]:
encoder = OrdinalEncoder()
scaler = StandardScaler()

X_encoded = encoder.fit_transform(X_train)
X_scaled = scaler.fit_transform(X_encoded)

In [7]:
mapdict = {'No': 0, 'Yes': 1}
y_encoded = y_train.map(mapdict)
y_array = y_encoded.values

inputs = X_scaled.shape[1]

In [8]:
def create_model(optimizer='adam',activation='relu', learning_rate=0.1, momentum=0.1, kernel_initializer='uniform'):
    # Create Model
    model = Sequential()
    model.add(Dense(42, activation=activation, input_shape=(inputs,),kernel_initializer=kernel_initializer))
    model.add(Dropout(0.3))
    model.add(Dense(42, activation=activation,kernel_initializer=kernel_initializer))
    model.add(Dropout(0.3))
    model.add(Dense(1, activation=activation,kernel_initializer=kernel_initializer))

    # Compile Model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

churnmodel = KerasClassifier(build_fn=create_model, verbose=0)

In [9]:
# grid search on batch size
param_grid = {'batch_size': [20, 40, 60, 80, 100],
              'epochs': [20]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7978122591972351 using {'batch_size': 20, 'epochs': 20}
Means: 0.7978122591972351, Stdev: 0.011825576309982697 with: {'batch_size': 20, 'epochs': 20}
Means: 0.7861741065979004, Stdev: 0.024427246974087345 with: {'batch_size': 40, 'epochs': 20}
Means: 0.7965368747711181, Stdev: 0.006703794662105161 with: {'batch_size': 60, 'epochs': 20}
Means: 0.7874523162841797, Stdev: 0.0232637697245823 with: {'batch_size': 80, 'epochs': 20}
Means: 0.7877306580543518, Stdev: 0.013218734708044091 with: {'batch_size': 100, 'epochs': 20}


In [10]:
# optimizer
param_grid = {'batch_size': [40],
              'epochs': [20],
             'optimizer': ['adam','SGD','RMSprop','Adagrad','Adadelta','Adamax','Nadam']}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8033519268035889 using {'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}
Means: 0.8022148609161377, Stdev: 0.005439149547100095 with: {'batch_size': 40, 'epochs': 20, 'optimizer': 'adam'}
Means: 0.2653713583946228, Stdev: 0.004757620185291315 with: {'batch_size': 40, 'epochs': 20, 'optimizer': 'SGD'}
Means: 0.7978127598762512, Stdev: 0.008551378685846711 with: {'batch_size': 40, 'epochs': 20, 'optimizer': 'RMSprop'}
Means: 0.786312735080719, Stdev: 0.002973181564303053 with: {'batch_size': 40, 'epochs': 20, 'optimizer': 'Adagrad'}
Means: 0.7346286535263061, Stdev: 0.004757632470334884 with: {'batch_size': 40, 'epochs': 20, 'optimizer': 'Adadelta'}
Means: 0.8019309759140014, Stdev: 0.003889237205019425 with: {'batch_size': 40, 'epochs': 20, 'optimizer': 'Adamax'}
Means: 0.8033519268035889, Stdev: 0.007138327381597007 with: {'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}


In [11]:
# activation
param_grid = {'batch_size': [40],
              'epochs': [20],
             'optimizer': ['Nadam'],
             'activation': ['relu','sigmoid','tanh','elu','selu']}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8023574113845825 using {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}
Means: 0.8023574113845825, Stdev: 0.005742851611722875 with: {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}
Means: 0.8019323945045471, Stdev: 0.008145671184592146 with: {'activation': 'sigmoid', 'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}
Means: 0.7824754834175109, Stdev: 0.01886766806881873 with: {'activation': 'tanh', 'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}
Means: 0.7978139758110047, Stdev: 0.004884563056720921 with: {'activation': 'elu', 'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}
Means: 0.8002269387245178, Stdev: 0.005922236858385498 with: {'activation': 'selu', 'batch_size': 40, 'epochs': 20, 'optimizer': 'Nadam'}


In [28]:
from tensorflow.keras.optimizers import Nadam

In [34]:
def create_model(activation='relu', learning_rate=0.1, beta_1=0.5, beta_2=0.5, kernel_initializer='uniform',dropout_val=0.3,l1_count=42,l2_count=42):
    # Create Model
    model = Sequential()
    model.add(Dense(l1_count, activation=activation, input_shape=(inputs,),kernel_initializer=kernel_initializer))
    model.add(Dropout(dropout_val))
    model.add(Dense(l2_count, activation=activation,kernel_initializer=kernel_initializer))
    model.add(Dropout(dropout_val))
    model.add(Dense(1, activation=activation,kernel_initializer=kernel_initializer))
    
    # create a Nadam optimizer object
    optimizer = Nadam(learning_rate=learning_rate, beta_1=beta_1, beta_2=beta_2 )

    # Compile Model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

churnmodel = KerasClassifier(build_fn=create_model, verbose=0)

In [30]:
param_grid = {'batch_size': [40],
              'epochs': [20],
             'activation': ['relu'],
             'learning_rate': [0.01,0.05,0.1,0.5,1]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7953987002372742 using {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7953987002372742, Stdev: 0.006161596665631569 with: {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7824800133705139, Stdev: 0.004589577086116449 with: {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'learning_rate': 0.05}
Means: 0.7425818800926208, Stdev: 0.010788285413146042 with: {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'learning_rate': 0.1}
Means: 0.45699663162231446, Stdev: 0.23070316519385528 with: {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'learning_rate': 0.5}
Means: 0.5438550531864166, Stdev: 0.23054277596898592 with: {'activation': 'relu', 'batch_size': 40, 'epochs': 20, 'learning_rate': 1}


In [31]:
param_grid = {'batch_size': [40],
              'epochs': [20],
             'activation': ['relu'],
             'learning_rate': [0.01],
             'beta_1': [0.01,0.05,0.1,0.5,1]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7956836938858032 using {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.785032594203949, Stdev: 0.010661899060774097 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 0.01, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7883015513420105, Stdev: 0.0068083964534755344 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 0.05, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7860258102416993, Stdev: 0.012515183421584603 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 0.1, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7918490767478943, Stdev: 0.006211014833619207 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 0.5, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7956836938858032, Stdev: 0.002850223507180162 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'epochs': 20, 'learning_rate': 0.01}


In [32]:
param_grid = {'batch_size': [40],
              'epochs': [20],
             'activation': ['relu'],
             'learning_rate': [0.01],
             'beta_1': [1],
             'beta_2': [0.01,0.05,0.1,0.5,1]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7965362668037415 using {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7965362668037415, Stdev: 0.0035969081916756474 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7938372969627381, Stdev: 0.006950129413571321 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.05, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7932689309120178, Stdev: 0.006931439803202795 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.1, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7963929057121277, Stdev: 0.007627370003907808 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.5, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7346286535263061, Stdev: 0.004757632470334884 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 1, 'epochs': 20, 'learning_rate': 0.01}


In [33]:
param_grid = {'batch_size': [40],
              'epochs': [20],
             'activation': ['relu'],
             'learning_rate': [0.01],
             'beta_1': [1],
             'beta_2': [0.01],
             'kernel_initializer': ['Ones','RandomNormal','RandomUniform','TruncatedNormal','Orthogonal','LecunUniform','GlorotUniform']}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

ValueError: Unknown initializer:LecunUniform



Best: 0.7954002141952514 using {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'learning_rate': 0.01}
Means: 0.4208420753479004, Stdev: 0.012591194688671814 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'epochs': 20, 'kernel_initializer': 'Ones', 'learning_rate': 0.01}
Means: 0.7901450157165527, Stdev: 0.004981377339241919 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'epochs': 20, 'kernel_initializer': 'RandomNormal', 'learning_rate': 0.01}
Means: 0.7936954617500305, Stdev: 0.007690514458304882 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'epochs': 20, 'kernel_initializer': 'RandomUniform', 'learning_rate': 0.01}
Means: 0.7931271672248841, Stdev: 0.006999306569871932 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'epochs': 20, 'kernel_initializer': 'TruncatedNormal', 'learning_rate': 0.01}
Means:

In [35]:
param_grid = {'batch_size': [40],
              'epochs': [20],
             'activation': ['relu'],
             'learning_rate': [0.01],
             'beta_1': [1],
             'beta_2': [0.01],
             'kernel_initializer': ['Orthogonal'],
             'dropout_val': [0.25,0.3,0.35,0.4,0.45,0.5]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7969608902931213 using {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'learning_rate': 0.01}
Means: 0.7965354561805725, Stdev: 0.005530400216428808 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.25, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'learning_rate': 0.01}
Means: 0.7823394775390625, Stdev: 0.02053831495636072 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.3, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'learning_rate': 0.01}
Means: 0.7949735760688782, Stdev: 0.006067711841546868 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.35, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'learning_rate': 0.01}
Means: 0.7969608902931213, Stdev: 0.006914998326539822 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'd

In [36]:
param_grid = {'batch_size': [40],
              'epochs': [20],
             'activation': ['relu'],
             'learning_rate': [0.01],
             'beta_1': [1],
             'beta_2': [0.01],
             'kernel_initializer': ['Orthogonal'],
             'dropout_val': [0.4],
             'l1_count': [21,30,40,50,60]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7956836938858032 using {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'learning_rate': 0.01}
Means: 0.7946905851364136, Stdev: 0.0074534666224943405 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 21, 'learning_rate': 0.01}
Means: 0.7824835419654846, Stdev: 0.021080561196511464 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 30, 'learning_rate': 0.01}
Means: 0.7956836938858032, Stdev: 0.005824627620676899 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'learning_rate': 0.01}
Means: 0.7837511777877808, Stdev: 0.02768640684605893 with: {'activat

In [40]:
param_grid = {'batch_size': [40],
              'epochs': [20],
             'activation': ['relu'],
             'learning_rate': [0.01],
             'beta_1': [1],
             'beta_2': [0.01],
             'kernel_initializer': ['Orthogonal'],
             'dropout_val': [0.40],
             'l1_count': [40],
             'l2_count': [21,30,40,50,60]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7980974674224853 using {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 21, 'learning_rate': 0.01}
Means: 0.7980974674224853, Stdev: 0.005844469756870711 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 21, 'learning_rate': 0.01}
Means: 0.796109426021576, Stdev: 0.0042997271639112315 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 30, 'learning_rate': 0.01}
Means: 0.7844708681106567, Stdev: 0.021965161096534947 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 40, 'learning_rate': 0.01}
Means:

In [41]:
param_grid = {'batch_size': [40],
              'epochs': [20,40,60,80,100],
             'activation': ['relu'],
             'learning_rate': [0.01],
             'beta_1': [1],
             'beta_2': [0.01],
             'kernel_initializer': ['Orthogonal'],
             'dropout_val': [0.40],
             'l1_count': [40],
             'l2_count': [21]}

grid = GridSearchCV(estimator=churnmodel, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_scaled, y_array)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7973879337310791 using {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 40, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 21, 'learning_rate': 0.01}
Means: 0.7946890711784362, Stdev: 0.003776898788220109 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 20, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 21, 'learning_rate': 0.01}
Means: 0.7973879337310791, Stdev: 0.009964218113131027 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 40, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 21, 'learning_rate': 0.01}
Means: 0.79667729139328, Stdev: 0.005020058079491214 with: {'activation': 'relu', 'batch_size': 40, 'beta_1': 1, 'beta_2': 0.01, 'dropout_val': 0.4, 'epochs': 60, 'kernel_initializer': 'Orthogonal', 'l1_count': 40, 'l2_count': 21, 'learning_rate': 0.01}
Means: 0

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimiation tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?