<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view> 

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?


In [52]:
import numpy as np
import pandas as pd
import tensorflow.keras as keras
from tensorflow.keras.optimizers import Adam, Nadam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.preprocessing import OrdinalEncoder
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split as tts
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

In [2]:
# load dataframe
df0 = pd.read_csv('dataset.csv')
df0.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [3]:
# droping customerID
df0.drop(columns=['customerID'], inplace=True)

In [4]:
# creates encoder, scaler
encoder = OrdinalEncoder()
scaler = MinMaxScaler()

# sets X and y
X = np.array(df0.drop(columns=['Churn']))
y = np.array(df0['Churn'].replace({'No': 0, 'Yes': 1}))

In [5]:
# encodes features
X_enc = encoder.fit_transform(X, y)

# scales encoded features
X_scal = scaler.fit_transform(X_enc, y)

In [6]:
# creating train test split
X_train, X_test, y_train, y_test = tts(X_scal, y, test_size=0.1)

In [7]:
# checks number of features
len(X_train[1])

19

In [128]:
# initialize model
mod0 = [] # used to clear model incase cell is rerun
mod0 = Sequential()

# add input layer
mod0.add(Dense(10, input_shape=(19,), activation='sigmoid'))

# add hidden layer 
mod0.add(Dense(8, activation='sigmoid'))

# add output layer
mod0.add(Dense(1, activation='sigmoid'))

# compile model
mod0.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# get model summary
mod0.summary()

# fit model and save output to history var
history0 = mod.fit(X_train, 
                   y_train, 
                   validation_data=(X_test, y_test),
                   epochs=100, 
                   verbose=0)

# gets score
score = mod0.evaluate(X_train, y_train, verbose=False)

# prints score
print(f'\nloss: {score[0]}  ---   accuracy: {score[1]}')

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_21 (Dense)             (None, 10)                200       
_________________________________________________________________
dense_22 (Dense)             (None, 8)                 88        
_________________________________________________________________
dense_23 (Dense)             (None, 1)                 9         
Total params: 297
Trainable params: 297
Non-trainable params: 0
_________________________________________________________________
loss: 0.5802153913746801  ---   accuracy: 0.7343010306358337


In [50]:
# defines function for building model for keras classifier
def create_model(optimizer, input_act, hidden_nodes, dropout_rate, weight_func, learning_rate):
    mod = Sequential()
    mod.add(Dense(hidden_nodes, input_shape=(19,), kernel_initializer=weight_func, activation=input_act, name='input'))
    mod.add(Dropout(dropout_rate))
    mod.add(Dense(1, activation='sigmoid', name='output'))
    optimizer_updated = optimizer(lr=learning_rate)
    mod.compile(loss='binary_crossentropy', optimizer=optimizer_updated, metrics=['accuracy'])
    return(mod)

# create model
mod = KerasClassifier(build_fn=create_model, verbose=0)

param_grid = {'batch_size': [10],
              'epochs': [1],
              'optimizer': ['Adam'],
              'input_act': ['sigmoid'],
              'hidden_nodes': [5],
              'dropout_rate': [0],
              'weight_func': ['RandomUniform', 'Ones'],
              'learning_rate': [.01, .1],
              }

# Create Grid Search
grid = GridSearchCV(estimator=mod, param_grid=param_grid, n_jobs=1, cv=3)

# fits grid
grid_fit = grid.fit(X_train, y_train)

# gets best score and params
best_score = grid_fit.best_score_
best_params = grid_fit.best_params_

print(f'Best Params were: {best_params}  --- Which gave an accuracy score of {best_score}')

Best Params were: {'batch_size': 10, 'dropout_rate': 0, 'epochs': 1, 'hidden_nodes': 5, 'input_act': 'sigmoid', 'learning_rate': 0.01, 'optimizer': <class 'tensorflow.python.keras.optimizer_v2.adam.Adam'>, 'weight_func': 'RandomUniform'}  --- Which gave an accuracy score of 0.7868412829190258


In [None]:
param_grid2 = {'batch_size': [10, 50, 100],
              'epochs': [10, 15],
              'optimizer': [Adam, Nadam],
              'input_act': ['sigmoid', 'relu'],
              'hidden_nodes': [5, 10],
              'dropout_rate': [0, .2,.5],
              'weight_func': ['RandomUniform', 'Ones'],
              'learning_rate': [.01, .1]
              }

# Create Random Search
grid2 = RandomizedSearchCV(estimator=mod, param_distributions=param_grid2, n_jobs=1, cv=3, n_iter=20)

# fits grid
grid_fit2 = grid2.fit(X_train, y_train)

# gets best score and params
best_score2 = grid_fit2.best_score_
best_params2 = grid_fit2.best_params_

print(f'Best Params were: {best_params2}  --- Which gave an accuracy score of {best_score2}')

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimiation tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?