<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

Tune hyperparameters to extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view>

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
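
Grid search fits one model per combination per fold, so the cost grows multiplicatively with every hyperparameter added to the list above. The sketch below counts the fits for a hypothetical grid; the candidate values and the 3-fold setting are illustrative assumptions, not values prescribed by the assignment.

```python
from itertools import product

# Hypothetical candidate values; adjust to taste.
param_grid = {
    "batch_size": [10, 25, 50],
    "epochs": [20, 50],
    "optimizer": ["adam", "sgd"],
    "dropout_rate": [0.0, 0.25, 0.5],
    "units": [16, 32],
}

def expand_grid(grid):
    """Return every combination in the grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

def count_fits(grid, n_folds):
    """Number of model fits a full grid search with n_folds-fold CV performs
    (not counting any final refit on the whole training set)."""
    return len(expand_grid(grid)) * n_folds

combos = expand_grid(param_grid)
n_fits = count_fits(param_grid, n_folds=3)
print(f"{len(combos)} combinations, {n_fits} fits with 3-fold CV")
```

Counting fits up front helps decide whether to search everything at once or in smaller passes, one or two hyperparameters at a time.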


In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn+(1).csv')

In [2]:
# Let's get the percentage of customers who churn, to set a baseline
df['Churn'].value_counts(normalize=True)

No     0.73463
Yes    0.26537
Name: Churn, dtype: float64
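
These proportions imply a majority-class baseline: always predicting "No" is correct about 73.5% of the time, so a tuned model has to beat that figure to be useful. A minimal sketch using the proportions printed above:

```python
# Class proportions copied from the value_counts output above.
class_share = {"No": 0.73463, "Yes": 0.26537}

# Always predicting the most frequent class yields that class's share as accuracy.
majority_class = max(class_share, key=class_share.get)
baseline_accuracy = class_share[majority_class]

print(f"Majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.4f}")
```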

In [3]:
#converting the target to a 0/1 categorical

df['Churn'] = df['Churn'].replace({'Yes':1, 'No':0})

In [4]:

# Digging up Unit 2 notes on working with tabular data.
# Knowing how many unique values each column has is important for feature engineering.

for col in df.columns: print(col, df[col].nunique())

customerID 7043
gender 2
SeniorCitizen 2
Partner 2
Dependents 2
tenure 73
PhoneService 2
MultipleLines 3
InternetService 3
OnlineSecurity 3
OnlineBackup 3
DeviceProtection 3
TechSupport 3
StreamingTV 3
StreamingMovies 3
Contract 3
PaperlessBilling 2
PaymentMethod 4
MonthlyCharges 1585
TotalCharges 6531
Churn 2
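
`TotalCharges` has 6531 unique values, which is what a continuous column would show. In the public version of this dataset that column is often read as text because a few rows hold blank strings; that is an assumption worth checking with `df.dtypes`. If it holds, a cleaning step along these lines converts it to numbers before encoding. The toy frame below is hypothetical and stands in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for the churn data: one blank TotalCharges entry.
toy = pd.DataFrame({
    "tenure": [0, 12, 24],
    "MonthlyCharges": [50.0, 20.0, 80.0],
    "TotalCharges": [" ", "240.0", "1920.0"],
})

# Coerce text to floats; anything unparseable (such as " ") becomes NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

# One reasonable choice (an assumption): fill missing totals with 0.
toy["TotalCharges"] = toy["TotalCharges"].fillna(0.0)

print(toy.dtypes)
print(toy)
```

Dropping the affected rows is another option; which one is better depends on why the values are missing.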


In [5]:
#Dropping Customer ID
X = df.drop(columns=['Churn', 'customerID']).values
y = df['Churn'].values

In [6]:
# !pip install category_encoders

In [7]:
#ordinal encoding will map the many low cardinality fields into a numerical representation
import category_encoders as ce
ord_enc = ce.OrdinalEncoder()

#standard scaling will rescale each input to zero mean and unit variance, which suits a neural network
from sklearn.preprocessing import StandardScaler
scaler  = StandardScaler()

X = ord_enc.fit_transform(X)
X = scaler.fit_transform(X)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,random_state=1337)
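
Fitting the scaler on all rows before splitting lets test-set statistics influence the training features. A common alternative, sketched here on hypothetical random data rather than the churn features, is to split first and fit the scaler on the training rows only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features and labels standing in for the encoded data.
rng = np.random.RandomState(1337)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = rng.randint(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1337
)

# Learn mean and standard deviation from the training rows only...
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
# ...and reuse those same statistics on the held-out rows.
X_te_scaled = demo_scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(3), X_te_scaled.mean(axis=0).round(3))
```

The training columns come out with zero mean by construction, while the test columns are only approximately centered, which is the expected result when no test information leaks into the scaler.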

In [8]:
import pandas as pd
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# fix random seed for reproducibility
seed = 42
numpy.random.seed(seed)

# load dataset

# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(19, input_dim=19, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
# batch_size = [10, 20, 40, 60, 80, 100]
# param_grid = dict(batch_size=batch_size, epochs=epochs)

In [9]:
# define the grid search parameters
param_grid = {'batch_size': [10, 25, 50],
              'epochs': [20]}

# Create Grid Search
# Keep n_jobs=1: n_jobs=-1 does not play nicely with the Keras wrapper, which does not parallelize well
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.8017796346496696 using {'batch_size': 25, 'epochs': 20}
Means: 0.8002650513765442, Stdev: 0.0018325075247239823 with: {'batch_size': 10, 'epochs': 20}
Means: 0.8017796346496696, Stdev: 0.005173583521320478 with: {'batch_size': 25, 'epochs': 20}
Means: 0.8000757090411643, Stdev: 0.0012796621415408743 with: {'batch_size': 50, 'epochs': 20}
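
The three mean scores differ by less than the fold-to-fold standard deviation of the best configuration, so batch size has little measurable effect on this grid. A short check using the numbers printed above:

```python
# (mean accuracy, standard deviation) per batch size, copied from the output above.
results = {
    10: (0.8002650513765442, 0.0018325075247239823),
    25: (0.8017796346496696, 0.005173583521320478),
    50: (0.8000757090411643, 0.0012796621415408743),
}

best_batch = max(results, key=lambda b: results[b][0])
best_mean, best_std = results[best_batch]

# Gap between the best and worst mean score across the grid.
spread = best_mean - min(mean for mean, _ in results.values())

print(f"best batch_size={best_batch}, spread={spread:.4f}, best std={best_std:.4f}")
print("spread within one std of best:", spread < best_std)
```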


In [10]:
# Best: 0.8017796346496696 using {'batch_size': 25, 'epochs': 20}

In [11]:
# define the grid search parameters
param_grid = {'batch_size': [10],
              'epochs': [20,50,100]}

# Create Grid Search
# Keep n_jobs=1: n_jobs=-1 does not play nicely with the Keras wrapper, which does not parallelize well
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")



Best: 0.8014009849608102 using {'batch_size': 10, 'epochs': 20}
Means: 0.8014009849608102, Stdev: 0.003401740389408124 with: {'batch_size': 10, 'epochs': 20}
Means: 0.7968572580037989, Stdev: 0.0026085012557913166 with: {'batch_size': 10, 'epochs': 50}
Means: 0.7906096140370591, Stdev: 0.003532912877744876 with: {'batch_size': 10, 'epochs': 100}
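
Accuracy falls as epochs increase from 20 to 100, which suggests the network starts to overfit with longer training. Rather than grid searching the epoch count, training can stop once validation loss stops improving; Keras offers this as the `EarlyStopping` callback. The pure-Python sketch below illustrates only the patience logic, on a hypothetical list of validation losses.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops after `patience` consecutive epochs without an
    improvement of more than `min_delta` over the best loss seen so far.
    If that never happens, the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses: improving, then drifting upward.
val_losses = [0.50, 0.45, 0.43, 0.42, 0.425, 0.43, 0.44, 0.46]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"Would stop after epoch {stop_epoch}")
```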


In [12]:
# The second pass (batch_size fixed at 10) scores best at 20 epochs, so refit with batch_size=10 and epochs=20

updated_model = create_model()

opt = updated_model.fit(X_train, y_train,
                        epochs=20,
                        batch_size=10,
                        validation_split=.1,
                        verbose=True)

Train on 4753 samples, validate on 529 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [14]:
scores = updated_model.evaluate(X_test, y_test)

print('Neural Network ACC: ', scores[1])

Neural Network ACC:  0.7938671


In [15]:
from tensorflow.keras.layers import Dropout
from tensorflow.keras.optimizers import Adam


def create_model_with_drops():
    model = Sequential()
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model, passing the configured optimizer object so the learning rate takes effect
    adam = Adam(lr=0.001)

    model.compile(loss='binary_crossentropy',
                  optimizer=adam,
                  metrics=['accuracy'])

    return model

model_with_drops = KerasClassifier(build_fn=create_model_with_drops, verbose=1)

In [16]:
final_model = model_with_drops.fit(X, y,
                         epochs=20,
                         batch_size=10,
                         validation_split=.1,
                         verbose=1)

Train on 6338 samples, validate on 705 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?
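
For the random search stretch goal, the idea is to sample a fixed number of configurations from the search space instead of trying every combination. The sketch below is minimal and its ranges are hypothetical; scikit-learn's `RandomizedSearchCV` provides the same idea with cross validation built in.

```python
import random

# Hypothetical search space; each entry lists the candidate values.
search_space = {
    "batch_size": [10, 25, 50, 100],
    "epochs": [10, 20, 50],
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "dropout_rate": [0.0, 0.25, 0.5],
    "units": [16, 32, 64],
}

def sample_configs(space, n_iter, seed=42):
    """Draw n_iter random configurations (with replacement) from the space."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]

configs = sample_configs(search_space, n_iter=10)
for config in configs:
    print(config)
```

The full grid over this space has 432 combinations, so ten samples cover a small fraction of it while still touching every hyperparameter.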