<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: [Available Here](https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv)

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass over the above hyperparameters.
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
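
To get a feel for what Grid Search with Cross Validation costs, the sketch below expands a parameter grid into every combination and counts the resulting model fits. The helper names `expand_grid` and `count_fits` and the example grid are made up for illustration, and the 5-fold setting is an assumption (it matches the default in recent scikit-learn versions).

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the grid as a list of dicts."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def count_fits(param_grid, n_folds=5):
    """Number of model fits: one per combination per fold (final refit excluded)."""
    return len(expand_grid(param_grid)) * n_folds


# Hypothetical grid over two of the hyperparameters listed above
example_grid = {"batch_size": [10, 20, 40], "epochs": [20, 50]}
combos = expand_grid(example_grid)
print(len(combos), count_fits(example_grid))
```

Because the number of fits is the product of the list lengths times the number of folds, tuning a few hyperparameters at a time keeps the search affordable.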



In [2]:
import pandas as pd
import numpy as np

In [3]:
#load data in
df = pd.read_csv('churn.csv')
print(df.shape)

(7043, 21)


In [4]:
#high level look at data
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [5]:
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [6]:
#drop high cardinality customerID column that doesn't provide useful information
df = df.drop(columns='customerID')

In [7]:
df['OnlineBackup'].unique  # no parentheses, so this shows the bound method; call .unique() to list the distinct values

<bound method Series.unique of 0       Yes
1        No
2       Yes
3        No
4        No
       ... 
7038     No
7039    Yes
7040     No
7041     No
7042     No
Name: OnlineBackup, Length: 7043, dtype: object>

In [8]:
#change values to binary if they are yes or no // male or female
#note: the regex 'No' also matches 'No internet service', so those rows become 0 as well
yes_no_cols = ['Churn', 'Partner', 'Dependents', 'PhoneService', 'OnlineSecurity',
               'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
               'StreamingMovies', 'PaperlessBilling']

for col in yes_no_cols:
    df[col] = df[col].replace(regex='Yes', value=1).replace(regex='No', value=0)

df['gender'] = df['gender'].replace(regex='Female', value=1).replace(regex='Male', value=0)

In [9]:
df.head()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,1,0,1,0,1,0,No phone service,DSL,0,1,0,0,0,0,Month-to-month,1,Electronic check,29.85,29.85,0
1,0,0,0,0,34,1,No,DSL,1,0,1,0,0,0,One year,0,Mailed check,56.95,1889.5,0
2,0,0,0,0,2,1,No,DSL,1,1,0,0,0,0,Month-to-month,1,Mailed check,53.85,108.15,1
3,0,0,0,0,45,0,No phone service,DSL,1,0,1,1,0,0,One year,0,Bank transfer (automatic),42.3,1840.75,0
4,1,0,0,0,2,1,No,Fiber optic,0,0,0,0,0,0,Month-to-month,1,Electronic check,70.7,151.65,1


In [10]:
#now split the data into train and validation sets
from sklearn.model_selection import train_test_split

train, val = train_test_split(df, train_size=0.80, test_size=0.20, stratify=df['Churn'], random_state=42)
print(train.shape, val.shape)

(5634, 20) (1409, 20)


In [11]:
train['Churn'].value_counts(normalize=True)

0    0.734647
1    0.265353
Name: Churn, dtype: float64

In [12]:
val['Churn'].value_counts(normalize=True)

0    0.734564
1    0.265436
Name: Churn, dtype: float64

# Baseline ML model for Churn dataset - 0 features, using majority class

In [13]:
#initialize target and check the class balance (normalized value counts)
target = 'Churn'
y_train = train[target]
y_train.value_counts(normalize=True)

0    0.734647
1    0.265353
Name: Churn, dtype: float64

In [14]:
#initialize majority class
majority = y_train.mode()[0]
y_pred = [majority] * len(y_train)

#show accuracy, should be the same as the majority class proportion
from sklearn.metrics import accuracy_score
accuracy_score(y_train, y_pred)
#as expected, accuracy is 73.46% just guessing the majority class

0.7346467873624423

# Prep the data to be encoded and normalized for future analysis

# Hyperparameter tuning on NN (learning rate, epochs, batch_size, activation function)

In [19]:
import tensorflow
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, PReLU
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import GridSearchCV
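# Note: this wrapper module was removed from newer TensorFlow releases; SciKeras provides a replacement KerasClassifier
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier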
from tensorflow.keras.optimizers import Adam, Nadam
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import Normalizer

In [20]:
target = 'Churn'
X_train = train.drop(columns=target)
y_train = train[target]
X_val = val.drop(columns=target)
y_val = val[target]

In [21]:
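# Fit the encoder on the training set only, then reuse it on the validation set
# so both share the same codes; values unseen during fitting are encoded as -1
encoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
X_train_encoded = encoder.fit_transform(X_train)
X_val_encoded = encoder.transform(X_val)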
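
Ordinal encoding maps each category to an integer code, and the mapping should be learned from the training data alone so that the validation data is encoded with the same codes. A minimal pure-Python sketch of that idea follows; `fit_ordinal`, `apply_ordinal`, and the toy values (including the unseen `"Three year"`) are hypothetical and are not scikit-learn's implementation.

```python
def fit_ordinal(values):
    """Learn a category -> integer code mapping from training values."""
    categories = sorted(set(values))
    return {category: code for code, category in enumerate(categories)}


def apply_ordinal(values, mapping, unknown_code=-1):
    """Encode values with a fitted mapping; unseen categories get unknown_code."""
    return [mapping.get(value, unknown_code) for value in values]


# Toy columns for illustration only
train_contract = ["Month-to-month", "One year", "Month-to-month", "Two year"]
val_contract = ["One year", "Two year", "Three year"]  # last value is unseen

mapping = fit_ordinal(train_contract)
train_codes = apply_ordinal(train_contract, mapping)
val_codes = apply_ordinal(val_contract, mapping)
print(mapping, train_codes, val_codes)
```

Refitting the mapping on the validation values instead could assign a different integer to the same category, which is why the fitted mapping is reused.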

In [23]:
scaler = StandardScaler() #mean 0 std 1

X_train = scaler.fit_transform(X_train_encoded)
X_val = scaler.transform(X_val_encoded)
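
Standardization subtracts the column mean and divides by the column standard deviation, with both statistics taken from the training set and then reused for the validation set. The sketch below shows that arithmetic on toy numbers; the function names and values are illustrative assumptions, not the scikit-learn internals.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Return the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(x - mean) / std for x in column]


# Toy values chosen so the arithmetic is easy to follow
train_col = [1.0, 3.0, 1.0, 3.0]
val_col = [2.0, 4.0]

mean, std = fit_standardizer(train_col)        # statistics come from train only
train_scaled = standardize(train_col, mean, std)
val_scaled = standardize(val_col, mean, std)   # validation reuses train statistics
print(mean, std, train_scaled, val_scaled)
```

The validation column is not guaranteed to end up with mean 0 and standard deviation 1, which is expected when it is scaled with training statistics.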

In [24]:
inputs = X_train.shape[1]
print(inputs)

19


In [39]:
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# Function to create model, required for KerasClassifier
def create_model(learning_rate=0.0001):
    # create model
    model = Sequential()
    model.add(Dense(128, input_shape=(inputs,), activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate = learning_rate, name='Adam'), metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
opt_model = KerasClassifier(build_fn=create_model, verbose=0, epochs=20, batch_size=100)

# define the grid search parameters
learning_rate = [0.0005, 0.001, 0.01, 0.05, 0.1]
param_grid = dict(learning_rate=learning_rate, epochs=[20])

# Create Grid Search
grid = GridSearchCV(estimator=opt_model, param_grid=param_grid, n_jobs=-1, verbose=0, scoring='accuracy')
grid_result = grid.fit(X_train, y_train)


# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.799783924690426 using {'epochs': 20, 'learning_rate': 0.001}
Means: 0.7983635959596598, Stdev: 0.017453128274491234 with: {'epochs': 20, 'learning_rate': 0.0005}
Means: 0.799783924690426, Stdev: 0.013778664858896068 with: {'epochs': 20, 'learning_rate': 0.001}
Means: 0.7903779505469652, Stdev: 0.016245490956909504 with: {'epochs': 20, 'learning_rate': 0.01}
Means: 0.7818504620166082, Stdev: 0.032022556010854034 with: {'epochs': 20, 'learning_rate': 0.05}
Means: 0.7346473843224833, Stdev: 0.011082225108839655 with: {'epochs': 20, 'learning_rate': 0.1}
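
One informal way to read these results is to ask which settings score within one standard deviation of the best mean. The sketch below does that with the learning-rate figures above, rounded to four decimals; the one-standard-deviation threshold is a rule of thumb chosen for illustration, not a statistical test.

```python
# (learning_rate, mean accuracy, std) rounded from the grid search output above
results = [
    (0.0005, 0.7984, 0.0175),
    (0.001, 0.7998, 0.0138),
    (0.01, 0.7904, 0.0162),
    (0.05, 0.7819, 0.0320),
    (0.1, 0.7346, 0.0111),
]

# Setting with the highest mean cross-validated accuracy
best_lr, best_mean, best_std = max(results, key=lambda row: row[1])

# Settings whose mean is within one standard deviation of the best mean
comparable = [lr for lr, mean, _ in results if best_mean - mean <= best_std]
print(best_lr, comparable)
```

Under this reading the three smallest learning rates are hard to tell apart, while 0.05 and 0.1 fall clearly behind.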


In [35]:
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(inputs,)))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7999618768692016 using {'batch_size': 100, 'epochs': 20}
Means: 0.7896652579307556, Stdev: 0.018079179032468005 with: {'batch_size': 10, 'epochs': 20}
Means: 0.7901987552642822, Stdev: 0.015969832144096415 with: {'batch_size': 20, 'epochs': 20}
Means: 0.7972988367080689, Stdev: 0.013816716124492676 with: {'batch_size': 40, 'epochs': 20}
Means: 0.7996072411537171, Stdev: 0.014168622466091518 with: {'batch_size': 60, 'epochs': 20}
Means: 0.798717737197876, Stdev: 0.016961854769572977 with: {'batch_size': 80, 'epochs': 20}
Means: 0.7999618768692016, Stdev: 0.013748301880961553 with: {'batch_size': 100, 'epochs': 20}
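
Batch size controls how many weight updates happen in each epoch, which is one reason it interacts with the number of epochs and the learning rate. The sketch below computes the updates per epoch for the batch sizes searched above; it assumes 5-fold cross-validation, so each fit sees roughly four fifths of the 5634 training rows, and the helper name is made up.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over n_samples rows."""
    return math.ceil(n_samples / batch_size)


# Assumption: 5-fold CV, so each fit trains on about 4/5 of the 5634 training rows
n_fit_rows = 5634 * 4 // 5

for batch_size in [10, 20, 40, 60, 80, 100]:
    print(batch_size, updates_per_epoch(n_fit_rows, batch_size))
```

With 20 epochs fixed, a batch size of 10 gets roughly ten times as many updates as a batch size of 100, so the comparison above is not purely about batch size.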


In [36]:
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(inputs,)))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [100],
              'epochs': [20, 50, 75]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.798541533946991 using {'batch_size': 100, 'epochs': 20}
Means: 0.798541533946991, Stdev: 0.016667958719076056 with: {'batch_size': 100, 'epochs': 20}
Means: 0.7974755048751831, Stdev: 0.014310099764887337 with: {'batch_size': 100, 'epochs': 50}
Means: 0.7932157635688781, Stdev: 0.016483458756306007 with: {'batch_size': 100, 'epochs': 75}
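
Since more epochs did not help here, an alternative to searching over a fixed epoch count is to stop training once the validation loss stops improving. The notebook imports `EarlyStopping` but does not use it; the sketch below illustrates the patience rule behind that idea in plain Python. The function and the loss values are made up and simplify what the Keras callback does.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses: improvement stalls after epoch 4
losses = [0.60, 0.50, 0.45, 0.44, 0.46, 0.45, 0.47, 0.48]
print(early_stopping_epoch(losses, patience=3))
```

A larger `patience` tolerates longer plateaus at the cost of extra training time.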


In [38]:
def create_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(inputs,)))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, activation='sigmoid'))     
    model.compile(loss = "binary_crossentropy", optimizer = optimizer, metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=100, verbose=0)

# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8036896586418152 using {'optimizer': 'Nadam'}
Means: 0.7873598337173462, Stdev: 0.01592568534004453 with: {'optimizer': 'SGD'}
Means: 0.8020920395851135, Stdev: 0.014076566109028712 with: {'optimizer': 'RMSprop'}
Means: 0.7843434333801269, Stdev: 0.012922711209439336 with: {'optimizer': 'Adagrad'}
Means: 0.3924576938152313, Stdev: 0.11447528152020713 with: {'optimizer': 'Adadelta'}
Means: 0.7983640670776367, Stdev: 0.01478940489843911 with: {'optimizer': 'Adam'}
Means: 0.7971219897270203, Stdev: 0.015493040209876454 with: {'optimizer': 'Adamax'}
Means: 0.8036896586418152, Stdev: 0.013460231138909215 with: {'optimizer': 'Nadam'}


In [40]:
def create_model(activation='relu'):
    model = Sequential()
    model.add(Dense(64, activation=activation, input_shape=(inputs,)))
    model.add(Dense(32, activation=activation))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, activation='sigmoid'))   
    optimizer = Nadam(learning_rate=0.001)
    model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=100, verbose=0)

# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8049328446388244 using {'activation': 'softplus'}
Means: 0.7346473813056946, Stdev: 0.011082224181538 with: {'activation': 'softmax'}
Means: 0.8049328446388244, Stdev: 0.01241332675179136 with: {'activation': 'softplus'}
Means: 0.803866982460022, Stdev: 0.014320240500888983 with: {'activation': 'softsign'}
Means: 0.7983640670776367, Stdev: 0.014444663473309808 with: {'activation': 'relu'}
Means: 0.7999620079994202, Stdev: 0.011237134406431705 with: {'activation': 'tanh'}
Means: 0.8040447473526001, Stdev: 0.014975978961969102 with: {'activation': 'sigmoid'}
Means: 0.8029795169830323, Stdev: 0.014031589625561922 with: {'activation': 'hard_sigmoid'}
Means: 0.7997855067253112, Stdev: 0.014302840898040186 with: {'activation': 'linear'}


In [41]:
def create_model(init_mode='uniform'):
    model = Sequential()
    model.add(Dense(64, kernel_initializer=init_mode, activation='softplus', input_shape=(inputs,)))
    model.add(Dense(32, kernel_initializer=init_mode, activation='softplus'))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))   
    optimizer = Nadam(learning_rate=0.001)
    model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=100, verbose=0)

# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
param_grid = dict(init_mode=init_mode)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8051101684570312 using {'init_mode': 'glorot_uniform'}
Means: 0.8040452241897583, Stdev: 0.011764942964412868 with: {'init_mode': 'uniform'}
Means: 0.8029809355735779, Stdev: 0.011827988097390375 with: {'init_mode': 'lecun_uniform'}
Means: 0.8024489998817443, Stdev: 0.011405939157751777 with: {'init_mode': 'normal'}
Means: 0.7983651638031006, Stdev: 0.013148051765674518 with: {'init_mode': 'zero'}
Means: 0.8044004559516906, Stdev: 0.013414160950635315 with: {'init_mode': 'glorot_normal'}
Means: 0.8051101684570312, Stdev: 0.01479167537706617 with: {'init_mode': 'glorot_uniform'}
Means: 0.8038674473762513, Stdev: 0.011711181203371538 with: {'init_mode': 'he_normal'}
Means: 0.8020918726921081, Stdev: 0.01068338929631471 with: {'init_mode': 'he_uniform'}


In [42]:
from tensorflow.keras.constraints import MaxNorm


def create_model(dropout_rate=0.0, weight_constraint=0):
    model = Sequential()
    model.add(Dense(64, kernel_initializer='glorot_uniform', activation='softplus', kernel_constraint=MaxNorm(weight_constraint), input_shape=(inputs,)))
    model.add(Dense(32, kernel_initializer='glorot_uniform', kernel_constraint=MaxNorm(weight_constraint), activation='softplus'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer='glorot_uniform', kernel_constraint=MaxNorm(weight_constraint), activation='sigmoid'))   
    optimizer = Nadam(learning_rate=0.001)
    model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=100, verbose=0)

# define the grid search parameters
weight_constraint = [1, 2, 3, 4, 5]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
param_grid = dict(dropout_rate=dropout_rate, weight_constraint=weight_constraint)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8051109552383423 using {'dropout_rate': 0.2, 'weight_constraint': 1}
Means: 0.7990761399269104, Stdev: 0.015242866861710239 with: {'dropout_rate': 0.0, 'weight_constraint': 1}
Means: 0.7962350010871887, Stdev: 0.010845697465406813 with: {'dropout_rate': 0.0, 'weight_constraint': 2}
Means: 0.8001356959342957, Stdev: 0.02344108278084131 with: {'dropout_rate': 0.0, 'weight_constraint': 3}
Means: 0.7962339043617248, Stdev: 0.013210501359921729 with: {'dropout_rate': 0.0, 'weight_constraint': 4}
Means: 0.8051098465919495, Stdev: 0.014076354353409598 with: {'dropout_rate': 0.0, 'weight_constraint': 5}
Means: 0.8024482250213623, Stdev: 0.014064680194122865 with: {'dropout_rate': 0.1, 'weight_constraint': 1}
Means: 0.8035120487213134, Stdev: 0.013036865599745491 with: {'dropout_rate': 0.1, 'weight_constraint': 2}
Means: 0.8013815760612488, Stdev: 0.01428907751555653 with: {'dropout_rate': 0.1, 'weight_constraint': 3}
Means: 0.8026253819465637, Stdev: 0.008631946059488837 with: {'dropou

In [43]:
def create_model(neurons=1):
    model = Sequential()
    model.add(Dense(neurons, kernel_initializer='glorot_uniform', activation='softplus', kernel_constraint=MaxNorm(1), input_shape=(inputs,)))
    model.add(Dense(neurons, kernel_initializer='glorot_uniform', kernel_constraint=MaxNorm(1), activation='softplus'))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, kernel_initializer='glorot_uniform', kernel_constraint=MaxNorm(1), activation='sigmoid'))   
    optimizer = Nadam(learning_rate=0.001)
    model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=['accuracy'])
    return model

# Wrap the model for scikit-learn; grid search clones and refits it for each setting
model = KerasClassifier(build_fn=create_model, epochs=20, batch_size=100, verbose=0)

# define the grid search parameters
neurons = [1, 5, 10, 25, 50, 100]
param_grid = dict(neurons=neurons)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.8052871465682984 using {'neurons': 25}
Means: 0.7346473813056946, Stdev: 0.011082224181538 with: {'neurons': 1}
Means: 0.7951695799827576, Stdev: 0.017150368145742608 with: {'neurons': 5}
Means: 0.8010285258293152, Stdev: 0.013156178445846932 with: {'neurons': 10}
Means: 0.8052871465682984, Stdev: 0.015403809352465287 with: {'neurons': 25}
Means: 0.8024486899375916, Stdev: 0.012339726679339313 with: {'neurons': 50}
Means: 0.804933774471283, Stdev: 0.010195194258291384 with: {'neurons': 100}


## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning on other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?
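
As a starting point for the Random Search stretch goal, the sketch below samples a fixed number of random configurations instead of enumerating every combination. The `sample_configs` helper and the search space are hypothetical, reusing values from the grid searches above; scikit-learn's `RandomizedSearchCV` is the ready-made tool for the real thing.

```python
import random


def sample_configs(param_space, n_iter, seed=0):
    """Draw n_iter random configurations, one value per hyperparameter."""
    rng = random.Random(seed)  # local generator so results are repeatable
    return [
        {name: rng.choice(values) for name, values in param_space.items()}
        for _ in range(n_iter)
    ]


# Hypothetical search space reusing values explored in the grid searches above
space = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "neurons": [1, 5, 10, 25, 50, 100],
}
configs = sample_configs(space, n_iter=10)
print(len(configs), configs[0])
```

The full grid over this space has 216 combinations, so ten samples cover only a small fraction of it, but every sample varies all three hyperparameters at once.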