
# Hyperparameter Tuning

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: [Available Here](https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv)

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?


#Load the data

In [None]:
import pandas as pd

df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv' ).dropna().drop(['customerID'], axis=1, )
df

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.30,1840.75,No
4,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.70,151.65,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,Male,0,Yes,Yes,24,Yes,Yes,DSL,Yes,No,Yes,Yes,Yes,Yes,One year,Yes,Mailed check,84.80,1990.5,No
7039,Female,0,Yes,Yes,72,Yes,Yes,Fiber optic,No,Yes,Yes,No,Yes,Yes,One year,Yes,Credit card (automatic),103.20,7362.9,No
7040,Female,0,Yes,Yes,11,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,Yes,Electronic check,29.60,346.45,No
7041,Male,1,Yes,No,4,Yes,Yes,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Mailed check,74.40,306.6,Yes


#Clean the data if necessary (it will be)

In [None]:
df.isna().sum()

gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

In [None]:
df.dtypes

gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object

In [None]:
df['Churn'].map({'No': 0, 'Yes': 1}).value_counts()

0    5174
1    1869
Name: Churn, dtype: int64

In [None]:
df['gender'].value_counts()

Male      3555
Female    3488
Name: gender, dtype: int64

In [None]:
df['Partner'].map({'No': 0, 'Yes': 1}).value_counts()

0    3641
1    3402
Name: Partner, dtype: int64

In [None]:
df['TotalCharges'].str.replace(' ', '0').astype(float)

0         29.85
1       1889.50
2        108.15
3       1840.75
4        151.65
         ...   
7038    1990.50
7039    7362.90
7040     346.45
7041     306.60
7042    6844.50
Name: TotalCharges, Length: 7043, dtype: float64

In [None]:
def wrangle(dataframe):
  df = dataframe.copy()

  #ONE-HOT ENCODE:
  #1.INTERNET SERVICE:
  df['fiber_optic'] = df['InternetService'] == 'Fiber_Optic'
  df['dsl'] = df['InternetService'] == 'DSL'

  #2. Contract Types:
  df['Contract_mtm'] = df['Contract'] == 'Month_to_Mont'
  df['Contract_2yrs'] = df['Contract'] == 'Two_years'
  df['Contract_1yr'] = df['Contract'] == 'One_year'
  df = df.drop(columns='Contract')

  #3.Payment Method:
  df['Payment_echeck'] = df['PaymentMethod'] == 'Electronic_check'
  df['Payment_Mailcheck'] = df['PaymentMethod'] == 'Mailed_check'
  df['Payment_bank'] = df['PaymentMethod'] == 'Bank_Transfer'
  df['Payment_credCard']  = df['PaymentMethod'] == 'Credit_card'
  df = df.drop(columns='PaymentMethod')

  #Categorical Columns:
  Bin_Columns = ['gender', 'Partner', 'Dependents', 'PhoneService', 'MultipleLines',
                 'InternetService', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                 'TechSupport', 'StreamingTV', 'StreamingMovies', 'PaperlessBilling',
                 'Churn']

  for column in Bin_Columns:
    df[column] = df [column].map({
            'Male': 0,
            'Female': 1,
            'No': 0,
            'Yes': 1,
            'No phone service': 0,
            'Fiber optic': 1,
            'DSL': 1,
            'No internet service': 0,
    })

  #Nan 
  df['TotalCharges'] = df['TotalCharges'].str.replace(' ', '0').astype(float)
  
  for column in df.columns:
    df[column] = df[column].astype(float)

  return df

In [None]:
Data_clean = wrangle(df)
Data_clean

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,PaperlessBilling,MonthlyCharges,TotalCharges,Churn,fiber_optic,dsl,Contract_mtm,Contract_2yrs,Contract_1yr,Payment_echeck,Payment_Mailcheck,Payment_bank,Payment_credCard
0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,29.85,29.85,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,34.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,56.95,1889.50,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,2.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,53.85,108.15,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,45.0,0.0,0.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,42.30,1840.75,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,0.0,0.0,0.0,2.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,70.70,151.65,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,0.0,0.0,1.0,1.0,24.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,84.80,1990.50,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7039,1.0,0.0,1.0,1.0,72.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,103.20,7362.90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7040,1.0,0.0,1.0,1.0,11.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,29.60,346.45,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7041,0.0,1.0,1.0,0.0,4.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,74.40,306.60,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
Data_clean['Churn'].value_counts(normalize=True)

0.0    0.73463
1.0    0.26537
Name: Churn, dtype: float64

#Normalize

In [None]:
from tensorflow.keras.utils import normalize

target = 'Churn'
features = Data_clean.drop(columns=target).columns

X = Data_clean[features].values
y = Data_clean[target].values

print('X', X.shape, type(X))
print('y', y.shape, type(y))

X (7043, 26) <class 'numpy.ndarray'>
y (7043,) <class 'numpy.ndarray'>


#Create and fit a baseline Keras MLP model to the data.

In [None]:
#https://www.tensorflow.org/api_docs/python/tf/keras/wrappers

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
def create_model(additional_layers=0, 
                 nodes_per_layer=26, 
                 activation_per_layer='relu',
                 loss_function='binary_crossentropy', 
                 optimizer='adam'):
  
    model = Sequential()

    model.add(Dense(nodes_per_layer, 
                    activation='relu', 
                    input_dim=26))
    
    for _ in range(additional_layers):
        model.add(Dense(nodes_per_layer, 
                        activation=activation_per_layer))
    model.add(Dense(1, activation='sigmoid'))
    
    model.compile(loss=loss_function, 
                  optimizer=optimizer, 
                  metrics=['accuracy'])
    
    return model

In [None]:
#Implement scikit and keras
model = KerasClassifier(build_fn=create_model, verbose=1)
results = model.fit(X, y, epochs=10, 
                    validation_split=0.2, 
                    verbose=1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#Hyperparameter tune (at least) the following parameters:

-batch_size

-training epochs

-optimizer

-learning rate (if applicable to optimizer)

-momentum (if applicable to optimizer)

-activation functions

-network weight initialization

-dropout regularization

-number of neurons in the hidden layer

In [None]:
#Machine learning Quick reference book
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.callbacks import  EarlyStopping

stop = EarlyStopping(monitor='accuracy', 
                     min_delta=0.001, 
                     patience=5)
#Number of Nodes
params = {
    'nodes_per_layer': [20, 42, 78, 100]
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    n_jobs=1,
    cv=5,
)

grid.fit(X, y, callbacks=[stop], 
         epochs=10, validation_split=0.2, verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 1/10
Ep

In [None]:
grid.score(X,y)



0.7861706614494324

In [None]:
#Additional layers
params = {
    'nodes_per_layer': [50],
    'additional_layers': [10,8,4,0]
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    scoring='accuracy',
    n_jobs=1,
    cv=5,
)

grid.fit(X, y, callbacks=[stop], 
         epochs=10,
         validation_split=0.2, verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Instructions for updating:
Please use instead:* `np.argmax(model.predict(x), axis=-1)`,   if your model does multi-class classification   (e.g. if it uses a `softmax` last-layer activation).* `(model.predict(x) > 0.5).astype("int32")`,   if your model does binary classification   (e.g. if it uses a `sigmoid` last-layer activation).
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
#Batch Size:
params = {
    'nodes_per_layer': [50],
    'additional_layers': [4],
    'batch_size': [10, 30, 60, 100],
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    scoring='accuracy',
    n_jobs=1,
    cv=5,
    verbose=1
)

grid.fit(X, y, callbacks=[stop], 
         epochs=10,
         validation_split=0.2, verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)


Fitting 5 folds for each of 4 candidates, totalling 20 fits
Epoch 1/10


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
E

[Parallel(n_jobs=1)]: Done  20 out of  20 | elapsed:  1.6min finished


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Best score: 0.7877345917478547
Best params: {'additional_layers': 4, 'batch_size': 10, 'nodes_per_layer': 50}


In [None]:
#Optimizing:
params = {
    'nodes_per_layer': [50],
    'additional_layers': [4],
    'batch_size': [10],
    'epochs': [100],
    'optimizer': ['adam', 'adadelta'
                  'nadam', 'rmsprop', 'ftrl']
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    n_jobs=1,
    cv=5,
    verbose=1
)

grid.fit(X, y, callbacks=[stop], 
         epochs=10,
         validation_split=0.2, verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Fitting 5 folds for each of 4 candidates, totalling 20 fits
Epoch 1/10


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


ValueError: Unknown optimizer: adadeltanadam



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
E

[Parallel(n_jobs=1)]: Done  20 out of  20 | elapsed:  2.4min finished


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Best score: 0.7860297441482544
Best params: {'additional_layers': 4, 'batch_size': 10, 'epochs': 100, 'nodes_per_layer': 50, 'optimizer': 'adam'}



In [None]:
from tensorflow.keras.optimizers import Adam

def create_model(additional_layers=0, 
                 nodes_per_layer=26, 
                 activation_per_layer='relu',
                 loss_function='binary_crossentropy', 
                 learning_rate=0.001):
  
    model2 = Sequential()

    model2.add(Dense(nodes_per_layer, 
                    activation='relu', 
                    input_dim=26))
    
    for _ in range(additional_layers):
        model2.add(Dense(nodes_per_layer, 
                        activation=activation_per_layer))
    model2.add(Dense(1, activation='sigmoid'))

    #Optimiser model
    optimizer = Adam(learning_rate=learning_rate)

    model2.compile(loss=loss_function, 
                  optimizer=optimizer, 
                  metrics=['accuracy'])
    
    return model2

In [None]:
#Learning rate for Adam Optimizer:

params = {
    'nodes_per_layer': [50],
    'additional_layers': [4],
    'batch_size': [10],
    'epochs': [100],
    'learning_rate': [0.001, 0.01, 0.1] 
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    n_jobs=1,
    cv=5,
    verbose=1
)

grid.fit(X, y, callbacks=[stop], 
         epochs=10,
         validation_split=0.2, verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Fitting 5 folds for each of 3 candidates, totalling 15 fits
Epoch 1/10


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epo

[Parallel(n_jobs=1)]: Done  15 out of  15 | elapsed:  2.2min finished


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Best score: 0.7794981718063354
Best params: {'additional_layers': 4, 'batch_size': 10, 'epochs': 100, 'learning_rate': 0.001, 'nodes_per_layer': 50}



In [None]:
#Adam Optimizer
params = {
    'nodes_per_layer': [52],
    'additional_layers': [4],
    'batch_size': [10],
    'epochs': [100],
    'learning_rate': [0.001, 0.01, 0.1]
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    n_jobs=1,
    cv=5,
    verbose=1
)

grid.fit(X, y, callbacks=[stop], 
         validation_split=0.2, 
         verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Fitting 5 folds for each of 3 candidates, totalling 15 fits
Epoch 1/100


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18

[Parallel(n_jobs=1)]: Done  15 out of  15 | elapsed:  3.5min finished


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100

Best score: 0.7819140672683715
Best params: {'additional_layers': 4, 'batch_size': 10, 'epochs': 100, 'learning_rate': 0.001, 'nodes_per_layer': 52}



In [None]:
from tensorflow.keras.optimizers import Adam

#Compile the model
def create_model(additional_layers=0, 
                 nodes_per_layer=26, 
                 activation_per_layer='relu',
                 loss_function='binary_crossentropy', 
                 learning_rate=0.001, 
                 network_weights_init='uniform'):
  
    model4 = Sequential()

    model4.add(Dense(nodes_per_layer, 
                    activation='relu', 
                    input_dim=26, 
                    kernel_initializer=network_weights_init))
    
    for _ in range(additional_layers):
        model4.add(Dense(nodes_per_layer, 
                        activation=activation_per_layer, 
                        kernel_initializer=network_weights_init))
    model4.add(Dense(1, activation='sigmoid', 
                    kernel_initializer=network_weights_init))

    optimizer = Adam(learning_rate=learning_rate)
    
    model4.compile(loss=loss_function, 
                  optimizer=optimizer, 
                  metrics=['accuracy'])
    
    return model4

In [None]:
params = {
    'nodes_per_layer': [52],
    'additional_layers': [4],
    'batch_size': [10],
    'epochs': [100],
    'learning_rate': [0.001],
    'network_weights_init': ['uniform', 
                             'zero', 
                             'glorot_normal']
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    n_jobs=1,
    cv=5,
    verbose=1
)

grid.fit(X, y, callbacks=[stop], 
         validation_split=0.2, verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Fitting 5 folds for each of 3 candidates, totalling 15 fits
Epoch 1/100


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100

[Parallel(n_jobs=1)]: Done  15 out of  15 | elapsed:  3.9min finished


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100

Best score: 0.7936962604522705
Best params: {'additional_layers': 4, 'batch_size': 10, 'epochs': 100, 'learning_rate': 0.001, 'network_weights_init': 'uniform', 'nodes_per_layer': 52}



In [None]:
#activation functions

params = {
    'nodes_per_layer': [50],
    'additional_layers': [4],
    'batch_size': [10],
    'epochs': [100],
    'learning_rate': [0.001],
    'network_weights_init': ['glorot_normal'],
    'activation_per_layer': ['relu', 'sigmoid', 'softmax'],
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    n_jobs=1,
    cv=5,
    verbose=1
)

grid.fit(X, y, callbacks=[stop], 
         validation_split=0.2, 
         verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Fitting 5 folds for each of 3 candidates, totalling 15 fits
Epoch 1/100


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 1/

[Parallel(n_jobs=1)]: Done  15 out of  15 | elapsed:  3.4min finished


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100

Best score: 0.791709554195404
Best params: {'activation_per_layer': 'relu', 'additional_layers': 4, 'batch_size': 10, 'epochs': 100, 'learning_rate': 0.001, 'network_weights_init': 'glorot_normal', 'nodes_per_layer': 50}



In [None]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dropout

#Compile the model and dropout rate
def create_model(additional_layers=0, 
                 nodes_per_layer=26, 
                 activation_per_layer='relu',
                 loss_function='binary_crossentropy', 
                 learning_rate=0.001,
                 network_weights_init='uniform', 
                 dropout_rate=0.1):
  
    model6 = Sequential()

    model6.add(Dense(nodes_per_layer, 
                    activation=activation_per_layer, input_dim=26, 
                    kernel_initializer=network_weights_init))
    
    for _ in range(additional_layers):
        model6.add(Dropout(rate=dropout_rate))
        model6.add(Dense(nodes_per_layer, 
                        activation=activation_per_layer, 
                        kernel_initializer=network_weights_init))
    model6.add(Dense(1, activation='sigmoid', 
                    kernel_initializer=network_weights_init))

    optimizer = Adam(learning_rate=learning_rate)
    
    model6.compile(loss=loss_function, 
                  optimizer=optimizer, 
                  metrics=['accuracy'])
    
    return model6

In [None]:
#Dropout

params = {
    'nodes_per_layer': [4],
    'batch_size': [10],
    'epochs': [100],
    'learning_rate': [0.001],
    'network_weights_init': ['glorot_normal'],
    'activation_per_layer': ['relu'],
    'dropout_rate': [0.1, 0.2, 0.3]
}

grid = GridSearchCV(
    estimator=KerasClassifier(build_fn=create_model),
    param_grid=params,
    n_jobs=1,
    cv=5,
    verbose=1
)

grid.fit(X, y, callbacks=[stop], 
         validation_split=0.2, 
         verbose=1)

print()
print('Best score:', grid.best_score_)
print('Best params:', grid.best_params_)
print()

Fitting 5 folds for each of 3 candidates, totalling 15 fits
Epoch 1/100


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Ep

[Parallel(n_jobs=1)]: Done  15 out of  15 | elapsed:  2.8min finished


Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100

Best score: 0.7851788878440857
Best params: {'activation_per_layer': 'relu', 'batch_size': 10, 'dropout_rate': 0.3, 'epochs': 100, 'learning_rate': 0.001, 'network_weights_init': 'glorot_normal', 'nodes_per_layer': 4}




- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimiation tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?