<a href="https://colab.research.google.com/github/ShreyasJothish/DS-Unit-4-Sprint-3-Neural-Networks/blob/master/module4-Hyperparameter-Tuning/LS_DS_434_Hyperparameter_Tuning_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Your Mission, should you choose to accept it...

Tune the hyperparameters of a neural network to extract every ounce of accuracy from this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view>

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass over the above hyperparameters.
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
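Before launching a grid search over all of these at once, it helps to estimate how many model fits it implies. The sketch below uses only the standard library and made-up candidate values (the assignment names the hyperparameters but not their ranges, so every value in `param_grid` here is an assumption).

```python
from itertools import product

# Hypothetical candidate values; the assignment does not prescribe any ranges.
param_grid = {
    "batch_size": [20, 50, 80],
    "epochs": [20, 50],
    "optimizer": ["sgd", "adam"],
    "learning_rate": [0.001, 0.01],
    "dropout_rate": [0.1, 0.2],
    "units": [12, 32, 64],
}

def count_fits(grid, cv):
    """Return (number of candidates, number of fits) for an exhaustive grid search."""
    n_candidates = 1
    for values in grid.values():
        n_candidates *= len(values)
    # Each candidate is trained once per cross-validation fold.
    return n_candidates, n_candidates * cv

def iter_candidates(grid):
    """Yield every combination in the grid as a dict of parameter values."""
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, combo))

n_candidates, n_fits = count_fits(param_grid, cv=3)
print(n_candidates, "candidates,", n_fits, "fits")
```

Because the count is a product of the list lengths, adding one more value to any list multiplies the total rather than adding to it, which is why the initial pass is usually kept coarse.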


In [0]:
##### Your Code Here #####
# generic imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [0]:
#from google.colab import files
#files.upload()

In [0]:
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head(20)
df.shape

# customerID is unique per row and carries no predictive signal, so drop it
df.drop(columns='customerID', inplace=True)
# Blank TotalCharges entries cannot be parsed as numbers; coerce them to NaN, then fill with 0.0
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0.0)

# Use .values to pass NumPy arrays instead of the DataFrame
X = df.drop(columns='Churn').values
y = df['Churn'].values
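The cleaning step above turns blank `TotalCharges` entries into 0.0 and parses everything else as a float. A minimal sketch of the same idea in plain Python, assuming blank entries are empty or whitespace-only strings: it tests for blankness by stripping whitespace, so a short but valid value such as "5" is still parsed. The helper name and the sample values are illustrative, not taken from the dataset.

```python
def to_float_or_zero(value):
    """Convert a numeric string to float, treating blank entries as 0.0."""
    text = value.strip()
    return float(text) if text else 0.0

# Illustrative inputs: two ordinary values, one whitespace-only, one empty, one single digit.
samples = ["29.85", " ", "1889.5", "", "5"]
cleaned = [to_float_or_zero(s) for s in samples]
print(cleaned)
```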

In [4]:
for column in df.columns:
  print(column, df[column].dtype, df[column].nunique())

gender object 2
SeniorCitizen int64 2
Partner object 2
Dependents object 2
tenure int64 73
PhoneService object 2
MultipleLines object 3
InternetService object 3
OnlineSecurity object 3
OnlineBackup object 3
DeviceProtection object 3
TechSupport object 3
StreamingTV object 3
StreamingMovies object 3
Contract object 3
PaperlessBilling object 2
PaymentMethod object 4
MonthlyCharges float64 1585
TotalCharges float64 6531
Churn object 2


In [5]:
# Use category encoder to update object columns
!pip install category_encoders
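For intuition about what a binary encoder does to an object column, here is a conceptual sketch: each category gets an integer code, and the code is then written out as binary digits, one column per digit. This is an assumption-laden illustration (codes start at 1 in order of first appearance); the actual column layout produced by `category_encoders` may differ.

```python
def binary_encode(values):
    """Map each category to an integer code, then spell that code in binary digits."""
    # Assumption: codes start at 1 and follow order of first appearance.
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    # Number of binary digits needed for the largest code.
    width = max(codes.values()).bit_length()
    return [[int(bit) for bit in format(codes[v], f"0{width}b")] for v in values]

contract = ["Month-to-month", "One year", "Two year", "Month-to-month"]
encoded = binary_encode(contract)
for label, bits in zip(contract, encoded):
    print(label, bits)
```

Three categories fit in two binary columns, whereas one-hot encoding would need three, so the saving grows with the number of distinct values.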



In [6]:
# Imports for pipeline
from sklearn.pipeline import make_pipeline

import category_encoders as ce
from sklearn.preprocessing import RobustScaler

from sklearn.model_selection import GridSearchCV

# Keras imports
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasClassifier

# Function to create model, required for KerasClassifier
def create_model():
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=44, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model

# Create pipeline
pipeline = make_pipeline(
                         ce.BinaryEncoder(),
                         RobustScaler(), 
                         KerasClassifier(build_fn=create_model, verbose=1))

# Model validation.
param_grid = {
}

gridsearch = GridSearchCV(pipeline, param_grid=param_grid, cv=3, 
                         scoring='accuracy', verbose=10)

gridsearch.fit(X, y)

# Interpret the results.

# Best cross validation score
print('Cross Validation Score:', gridsearch.best_score_)

# Best parameters which resulted in the best score
print('Best Parameters:', gridsearch.best_params_)

Using TensorFlow backend.


Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV]  ................................................................
Instructions for updating:
Colocations handled automatically by placer.


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Instructions for updating:
Use tf.cast instead.
Epoch 1/1
[CV] ....................... , score=0.7904599659284497, total=   0.9s
[CV]  ................................................................


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s remaining:    0.0s


Epoch 1/1
[CV] ....................... , score=0.7781090289608177, total=   1.0s
[CV]  ................................................................


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    2.3s remaining:    0.0s


Epoch 1/1
[CV] ....................... , score=0.7869620792501065, total=   1.0s


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    3.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    3.4s finished


Epoch 1/1
Cross Validation Score: 0.7851767712622462
Best Parameters: {}


Without any hyperparameter tuning:

Cross Validation Score: 0.7755217946897629
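The cross-validation score that `GridSearchCV` reports is an average over the per-fold scores. As a quick check, the sketch below averages the three fold scores printed in the baseline log above and compares the result with the `best_score_` from that same log. The tiny remaining gap is probably due to folds of slightly unequal size being weighted differently, which is an assumption about the scikit-learn version used.

```python
from statistics import mean

# Per-fold accuracies copied from the baseline run's log output.
fold_scores = [0.7904599659284497, 0.7781090289608177, 0.7869620792501065]
# best_score_ printed at the end of that same run.
reported_best_score = 0.7851767712622462

unweighted = mean(fold_scores)
gap = abs(unweighted - reported_best_score)
print(unweighted, gap)
```

Scores from a single epoch also move noticeably between runs, so small differences between baseline numbers are expected.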

Hyperparameters to tune:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer

In [7]:
from keras.optimizers import Adam
from keras.layers import advanced_activations

# fix random seed for reproducibility
np.random.seed(42)

# Function to create model, required for KerasClassifier
def create_model(activation_hidden='relu', learning_rate=0.001, rate=0.1,
                units=12):
                 #optimizer='adam'):
  # create model
  model = Sequential()
  model.add(Dense(units, input_dim=44, activation=activation_hidden))
  model.add(Dropout(rate))
  model.add(Dense(1, activation='sigmoid'))
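  # NOTE: this applies dropout to the sigmoid output itself. Dropout is usually
  # placed only after hidden layers; the scores below were produced with it here.
  model.add(Dropout(rate))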
  # Compile model
  optimizer = Adam(lr=learning_rate)
  model.compile(loss='binary_crossentropy', optimizer=optimizer, \
                metrics=['accuracy'])
  return model

# Create pipeline
pipeline = make_pipeline(
                         ce.BinaryEncoder(),
                         RobustScaler(), 
                         KerasClassifier(build_fn=create_model, verbose=1))

# Model validation.
param_grid = {
    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [50],
    #'kerasclassifier__optimizer': ['sgd', 'rmsprop', 'adagrad', 'adadelta',\
    #'adam', 'adamax', 'nadam'],
    'kerasclassifier__learning_rate': [0.001],
    'kerasclassifier__activation_hidden': ['exponential'],
    'kerasclassifier__rate': [0.1],
    'kerasclassifier__units': [64]
}

gridsearch = GridSearchCV(pipeline, param_grid=param_grid, cv=3, 
                         scoring='accuracy', verbose=10)

gridsearch.fit(X, y)

# Interpret the results.

# Best cross validation score
print('Cross Validation Score:', gridsearch.best_score_)

# Best parameters which resulted in the best score
print('Best Parameters:', gridsearch.best_params_)

Fitting 3 folds for each of 1 candidates, totalling 3 fits
[CV] kerasclassifier__activation_hidden=exponential, kerasclassifier__batch_size=80, kerasclassifier__epochs=50, kerasclassifier__learning_rate=0.001, kerasclassifier__rate=0.1, kerasclassifier__units=64 
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
[CV]  kerasclassifier__activation_hidden=exponential, kerasclassifier__batch_size=80, kerasclassifier__epochs=50, kerasclassifier__learning_rate=0.001, kerasclassifier__rate=0.1, kerasclassifier__units=64, score=0.7977001703577513, total=   4.7s
[CV] kerasclassifier__activation_hidden=exponential, kerasclassifier__batch_size=80, kerasclassifier__epochs=50, kerasclassifier__learning_rate=0.001, kerasclassif

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    4.8s remaining:    0.0s


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
[CV]  kerasclassifier__activation_hidden=exponential, kerasclassifier__batch_size=80, kerasclassifier__epochs=50, kerasclassifier__learning_rate=0.001, kerasclassifier__rate=0.1, kerasclassifier__units=64, score=0.8019591141396933, total=   4.7s
[CV] kerasclassifier__activation_hidden=exponential, kerasclassifier__batch_size=80, kerasclassifier__epochs=50, kerasclassifier__learning_rate=0.001, kerasclassif

[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    9.8s remaining:    0.0s


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
[CV]  kerasclassifier__activation_hidden=exponential, kerasclassifier__batch_size=80, kerasclassifier__epochs=50, kerasclassifier__learning_rate=0.001, kerasclassifier__rate=0.1, kerasclassifier__units=64, score=0.8010225820195995, total=   4.7s


[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   14.7s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   14.7s finished


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Cross Validation Score: 0.8002271759193526
Best Parameters: {'kerasclassifier__activation_hidden': 'exponential', 'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 50, 'kerasclassifier__learning_rate': 0.001, 'kerasclassifier__rate': 0.1, 'kerasclassifier__units': 64}


1: param_grid = {

    'kerasclassifier__batch_size': [20, 50, 80, 100, 200],
    'kerasclassifier__epochs': [20],
}

Cross Validation Score: 0.8046287093568082

Best Parameters: {'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20}

2: param_grid = {

    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [20],
    'kerasclassifier__optimizer': ['sgd', 'rmsprop', 'adagrad', 'adadelta', 'adam', 'adamax', 'nadam']
}


Cross Validation Score: 0.8054806190543803

Best Parameters: {'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20, 'kerasclassifier__optimizer': 'adam'}

3: param_grid = {

    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [20],
    'kerasclassifier__learning_rate': [0.001, 0.01, 0.1, 0.2, 0.5],
}

Cross Validation Score: 0.8007951157177339

Best Parameters: {'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20, 'kerasclassifier__learning_rate': 0.001}

4: param_grid = {

    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [20],
    'kerasclassifier__learning_rate': [0.001, 0.002, 0.005, 0.01],
}


Cross Validation Score: 0.8009371006673293

Best Parameters: {'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20, 'kerasclassifier__learning_rate': 0.001}

5: param_grid = {

    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [20],
    'kerasclassifier__learning_rate': [0.001],
    'kerasclassifier__activation_hidden': ['linear', 'exponential', 
                                           'hard_sigmoid', 'sigmoid', 'tanh',
                                           'relu', 'softsign', 'softplus', 
                                           'selu', 'elu', 'softmax'],
}

Cross Validation Score: 0.8039187846088315

Best Parameters: {'kerasclassifier__activation_hidden': 'exponential', 'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20, 'kerasclassifier__learning_rate': 0.001}


6: param_grid = {

    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [20],
    'kerasclassifier__learning_rate': [0.001],
    'kerasclassifier__activation_hidden': ['exponential'],
    'kerasclassifier__rate': [0.1, 0.2, 0.3, 0.4]
    
}

Cross Validation Score: 0.7985233565242085

Best Parameters: {'kerasclassifier__activation_hidden': 'exponential', 'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20, 'kerasclassifier__learning_rate': 0.001, 'kerasclassifier__rate': 0.1}

7: param_grid = {

    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [20],
    'kerasclassifier__learning_rate': [0.001],
    'kerasclassifier__activation_hidden': ['exponential'],
    'kerasclassifier__rate': [0.1],
    'kerasclassifier__units': [12, 16, 32, 64]
    
}

Cross Validation Score: 0.802072980264092

Best Parameters: {'kerasclassifier__activation_hidden': 'exponential', 'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20, 'kerasclassifier__learning_rate': 0.001, 'kerasclassifier__rate': 0.1, 'kerasclassifier__units': 64}

8: param_grid = {

    'kerasclassifier__batch_size': [80],
    'kerasclassifier__epochs': [20, 50, 100, 200],
    'kerasclassifier__learning_rate': [0.001],
    'kerasclassifier__activation_hidden': ['exponential'],
    'kerasclassifier__rate': [0.1],
    'kerasclassifier__units': [64]
}

Cross Validation Score: 0.80335084481045

Best Parameters: {'kerasclassifier__activation_hidden': 'exponential', 'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 50, 'kerasclassifier__learning_rate': 0.001, 'kerasclassifier__rate': 0.1, 'kerasclassifier__units': 64}
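
The eight passes above tune one hyperparameter at a time and carry each winner forward into the next pass. That strategy can be sketched as a small loop. The scoring function here is a hypothetical stand-in for a cross-validated accuracy, chosen only so the example runs without training a network.

```python
def coordinate_search(score_fn, search_space, defaults):
    """Tune one hyperparameter at a time, keeping earlier winners fixed."""
    best = dict(defaults)
    n_evaluations = 0
    for name, candidates in search_space.items():
        scored = []
        for value in candidates:
            trial = {**best, name: value}
            scored.append((score_fn(trial), value))
            n_evaluations += 1
        # Keep the best value for this parameter before moving to the next one.
        best[name] = max(scored, key=lambda pair: pair[0])[1]
    return best, n_evaluations

def toy_score(params):
    """Hypothetical score that peaks at batch_size=80 and units=64."""
    return -abs(params["batch_size"] - 80) / 1000 - abs(params["units"] - 64) / 1000

search_space = {"batch_size": [20, 50, 80, 100, 200], "units": [12, 16, 32, 64]}
best, n_evaluations = coordinate_search(
    toy_score, search_space, defaults={"batch_size": 20, "units": 12}
)
print(best, n_evaluations)
```

This evaluates 9 candidates where the full grid over the same two lists would need 20. The trade-off is that it can miss combinations where two hyperparameters only work well together.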

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in Keras and tune their hyperparameters with cross validation?
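
As a starting point for the random search stretch goal, here is a minimal standard-library sketch. Instead of enumerating a grid, it draws a fixed number of configurations at random and keeps the best one. The ranges, the `sample_config` helper, and the toy scoring function are all assumptions for illustration; in practice the score would come from cross-validating the Keras model.

```python
import math
import random

def sample_config(rng):
    """Draw one hypothetical configuration; the ranges are illustrative only."""
    return {
        "batch_size": rng.choice([20, 50, 80, 100, 200]),
        "units": rng.choice([12, 16, 32, 64]),
        "dropout_rate": rng.uniform(0.1, 0.4),
        # Sample the exponent uniformly so learning rates are log-uniform in [1e-3, 1e-1].
        "learning_rate": 10 ** rng.uniform(-3, -1),
    }

def random_search(score_fn, n_iter, seed=42):
    """Evaluate n_iter random configurations and return the best (score, config)."""
    rng = random.Random(seed)
    best_score, best_config = -math.inf, None
    for _ in range(n_iter):
        config = sample_config(rng)
        score = score_fn(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config

def toy_score(config):
    """Hypothetical stand-in for mean cross-validated accuracy."""
    return -abs(math.log10(config["learning_rate"]) + 3) - abs(config["units"] - 64) / 64

best_score, best_config = random_search(toy_score, n_iter=25)
print(best_score, best_config)
```

The budget (`n_iter`) is set directly rather than being determined by the size of the grid, which makes the cost easier to control when many hyperparameters are searched together.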