<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view> 

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
  - batch_size
  - training epochs
  - optimizer
  - learning rate (if applicable to optimizer)
  - momentum (if applicable to optimizer)
  - activation functions
  - network weight initialization
  - dropout regularization
  - number of neurons in the hidden layer
 
You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters.

Try to get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?


In [1]:
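import pandas as pd
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
# Import every Keras object from tensorflow.keras so the model classes and
# the scikit-learn wrapper come from the same Keras implementation
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD
# Deprecated wrapper, removed in newer TensorFlow releases;
# scikeras.wrappers.KerasClassifier is the maintained replacement there
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

df = pd.read_csv('Customer.csv')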

In [2]:
def cat_encode(X):
    """Return a copy of X with every object (string) column label-encoded."""
    from sklearn.preprocessing import LabelEncoder

    X = X.copy()
    le = LabelEncoder()

    # Boolean mask selecting the categorical (object-dtype) columns
    categorical_feature_mask = X.dtypes == object
    categorical_cols = X.columns[categorical_feature_mask].tolist()

    # Refit the encoder on each column in turn, mapping its categories to 0..k-1
    X[categorical_cols] = X[categorical_cols].apply(lambda col: le.fit_transform(col))

    return X
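`cat_encode` label-encodes every `object` column. In the widely distributed copy of this Telco churn dataset, `TotalCharges` is read as text because a handful of rows contain a blank string, so it would be label-encoded as if it were a category rather than a dollar amount. Assuming `Customer.csv` behaves the same way (worth confirming with `df.dtypes`), here is a minimal sketch of coercing it to numbers first. The helper name `coerce_total_charges` and the fill value of 0.0 are assumptions for illustration, not part of the assignment.

```python
import pandas as pd

def coerce_total_charges(df, column="TotalCharges"):
    """Return a copy of df with `column` parsed as floats.

    Blank or otherwise unparseable strings become NaN and are then filled
    with 0.0 (assumption: blank charges belong to accounts not yet billed).
    """
    df = df.copy()
    df[column] = pd.to_numeric(df[column], errors="coerce").fillna(0.0)
    return df

# Hypothetical toy frame reproducing the blank-string problem
toy = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5"]})
clean = coerce_total_charges(toy)
print(clean["TotalCharges"].tolist())  # [29.85, 0.0, 1889.5]
```

Running `df = coerce_total_charges(df)` before `cat_encode(df)` would leave the column numeric, because `cat_encode` only touches object columns.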

In [3]:
from sklearn.metrics import accuracy_score

df = cat_encode(df)
target = 'Churn'
features = ['gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges']

X = df[features]
y = df[target]

# Majority-class baseline: predict the most frequent Churn label for every row
majority = y.mode()[0]
y_pred = [majority] * len(y)

# Accuracy of that constant prediction is the score any model has to beat
print("Baseline Accuracy:", accuracy_score(y, y_pred))

Baseline Accuracy: 0.7346301292063041


In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, 
    y,
    test_size=0.2)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((5634, 19), (1409, 19), (5634,), (1409,))
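The features enter the network on very different scales (`tenure` counts months, while most label-encoded columns are small integers), and MLPs usually train more smoothly on standardized inputs. The cells below feed raw values; a minimal sketch of standardizing with scikit-learn's `StandardScaler`, fitting only on the training rows so no test-set statistics leak in. The `*_demo` arrays are hypothetical stand-ins for `X_train` and `X_test`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for X_train / X_test: two features on very different scales
X_train_demo = np.array([[1.0, 20.0], [12.0, 55.0], [72.0, 110.0], [30.0, 80.0]])
X_test_demo = np.array([[5.0, 70.0]])

scaler = StandardScaler()
# Learn per-column mean and standard deviation from the training rows only...
X_train_scaled = scaler.fit_transform(X_train_demo)
# ...then apply those same statistics to the held-out rows
X_test_scaled = scaler.transform(X_test_demo)

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
```

In the notebook this would be `scaler.fit_transform(X_train)` and `scaler.transform(X_test)`. Inside `GridSearchCV`, wrapping the scaler and the `KerasClassifier` in a scikit-learn `Pipeline` would refit the scaler on each training fold instead of on the whole training set.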

In [5]:
batch_size = 20
epochs = 10

# SGD with momentum; `lr` is a deprecated alias of `learning_rate` in tf.keras
opt = SGD(learning_rate=0.1, momentum=0.9)
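`learning_rate` and `momentum` are two of the hyperparameters the assignment asks to tune. According to the Keras documentation, plain (non-Nesterov) SGD with momentum keeps a velocity `v` per weight and updates `v = momentum * v - learning_rate * g`, then `w = w + v`. A pure-Python sketch of that rule on the toy loss f(w) = w² (an illustrative assumption, not the churn model) shows how the two settings interact:

```python
def sgd_momentum(w0, learning_rate=0.1, momentum=0.9, steps=200):
    """Minimize f(w) = w**2 with the momentum update used by Keras's SGD."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        velocity = momentum * velocity - learning_rate * grad
        w = w + velocity
    return w

w_final = sgd_momentum(5.0)
print(f"w after 200 steps: {w_final:.6f}")  # close to the minimum at 0

# Too large a learning rate overshoots further on every step and diverges
print(sgd_momentum(5.0, learning_rate=1.1, momentum=0.0, steps=10))
```

With momentum the iterate oscillates around the minimum while it converges; pushing `learning_rate` past what the loss curvature allows makes it diverge, which is why the two are usually searched together.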

In [6]:
model = Sequential()
model.add(Dense(5, activation='relu', input_shape=(19,)))
model.add(Dense(3, activation='relu'))
model.add(Dense(3, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(
    loss='binary_crossentropy', 
    optimizer=opt, 
    metrics=['accuracy'],
)

history = model.fit(
    X_train, 
    y_train, 
    validation_data=(X_test, y_test), 
    epochs=epochs,
    batch_size=batch_size,
    verbose=0)

In [7]:
score = model.evaluate(X_test, y_test, verbose=0)
print('Test accuracy:', score[1])

Test accuracy: 0.7501774430274963


In [8]:
def create_model():  
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(19,)))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')
    return model

# create model
model = KerasClassifier(
    build_fn=create_model, 
    verbose=0, 
    batch_size=2, 
)

# define the grid search parameters
param_grid = {
    'epochs': [5],
}

In [9]:
# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=8)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7353518724441528 using {'epochs': 5}
Means: 0.7353518724441528, Stdev: 0.015311538562747359 with: {'epochs': 5}
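`mean_test_score` and `std_test_score` summarize the per-fold validation accuracies. With no `cv` argument, recent scikit-learn versions run 5-fold cross-validation, and the reported standard deviation is the population (ddof=0) standard deviation across folds. A stdlib sketch using hypothetical fold scores (not the notebook's actual folds):

```python
from statistics import mean, pstdev

# Hypothetical validation accuracies from 5 CV folds for one parameter setting
fold_scores = [0.72, 0.75, 0.74, 0.71, 0.78]

mean_test_score = mean(fold_scores)   # what cv_results_['mean_test_score'] holds
std_test_score = pstdev(fold_scores)  # population std, like cv_results_['std_test_score']

print(f"Means: {mean_test_score:.4f}, Stdev: {std_test_score:.4f}")
```

Read this way, the single setting above scored about 0.735 on average with a spread of roughly ±0.015 across folds, so differences between candidate settings smaller than that spread may just be fold-to-fold noise.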


## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
  - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
  - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?

In [10]:
def create_model_beta(depth, drop, activation):  
    model = Sequential()
    model.add(Dense(depth, activation=activation, input_shape=(19,)))
    model.add(Dropout(drop))
    model.add(Dense(depth, activation=activation))
    model.add(Dropout(drop))
    model.add(Dense(depth, activation=activation))
    model.add(Dropout(drop))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')
    return model

In [11]:
model = KerasClassifier(
    verbose=0,
    build_fn=create_model_beta,
    batch_size=16,
    epochs=100,
    drop=0.15,
    activation='linear',
)

param_distributions = {
#     'batch_size': range(12, 24, 2),
#     'epochs': range(80, 150, 5),
#     'drop': [0.1, 0.15, 0.2, 0.25, 0.3],
#     'activation': ['tanh', 'relu', 'linear', 'softmax', 'softplus', 'softsign', 'sigmoid', 'hard_sigmoid'],
#     'activation': ['relu', 'linear', 'softplus'],
    'depth': range(100, 400, 100),
}

search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_distributions,
    cv=3,
    n_iter=3,
    n_jobs=8,
    random_state=42,
)

In [12]:
# Fit RandomizedSearchCV Model
search_result = search.fit(X_train, y_train)

# Report Results
print(f"Best: {search_result.best_score_} using {search_result.best_params_}")
means = search_result.cv_results_['mean_test_score']
stds = search_result.cv_results_['std_test_score']
params = search_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7777777910232544 using {'depth': 200}
Means: 0.7722754875818888, Stdev: 0.018121748892126432 with: {'depth': 100}
Means: 0.7777777910232544, Stdev: 0.02435224213138495 with: {'depth': 200}
Means: 0.7538161277770996, Stdev: 0.03009134947121012 with: {'depth': 300}
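Because `depth` is the only searched parameter and `range(100, 400, 100)` yields just three values (100, 200, 300), `n_iter=3` makes this randomized search try every candidate, as the three result lines show. When every entry in `param_distributions` is a list or range, scikit-learn samples combinations without replacement. A stdlib sketch of that idea follows; `sample_param_combos` is a hypothetical helper, not scikit-learn's actual sampler, so the order of draws will differ.

```python
import itertools
import random

def sample_param_combos(param_lists, n_iter, seed=42):
    """Draw up to n_iter distinct combinations from a grid of lists, without replacement."""
    names = sorted(param_lists)
    grid = list(itertools.product(*(param_lists[name] for name in names)))
    rng = random.Random(seed)
    picks = rng.sample(grid, min(n_iter, len(grid)))
    return [dict(zip(names, combo)) for combo in picks]

# Same single-parameter space as the cell above: n_iter equals the grid size
combos = sample_param_combos({"depth": range(100, 400, 100)}, n_iter=3)
print(sorted(c["depth"] for c in combos))  # [100, 200, 300]

# A larger, hypothetical space: only 4 of the 5 * 3 = 15 combinations get tried
print(sample_param_combos({"drop": [0.1, 0.15, 0.2, 0.25, 0.3],
                           "activation": ["relu", "tanh", "linear"]}, n_iter=4))
```

Random search only saves work once `n_iter` is smaller than the number of combinations, which is the situation in Take 2.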


### Take 2

In [13]:
def create_model(depth, drop, activation, input_shape, output_size, n_layers):
    model = Sequential()
    model.add(Dense(depth, activation=activation, input_shape=input_shape))
    model.add(Dropout(drop))

    for _ in range(n_layers):
        model.add(Dense(depth, activation=activation))
        model.add(Dropout(drop))

    if output_size == 1:
        model.add(Dense(output_size, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')
    else:
        model.add(Dense(output_size, activation='softmax'))
        model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

    return model
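In this `create_model` the first hidden layer is always added, so `n_layers` counts the *extra* hidden layers: the network has `n_layers + 1` hidden `Dense` layers of width `depth`. Each `Dense` layer holds `inputs * units` weights plus `units` biases, and `Dropout` adds none, so model size can be worked out by hand. The helper below is hypothetical and only meant to give intuition for how quickly `depth` and `n_layers` grow the model relative to a few thousand training rows.

```python
def mlp_param_count(n_inputs, depth, n_layers, output_size=1):
    """Trainable parameters of create_model: (n_layers + 1) hidden Dense layers of
    width `depth`, then a Dense output layer. Dropout layers have no weights."""
    first_hidden = n_inputs * depth + depth            # weights + biases
    extra_hidden = n_layers * (depth * depth + depth)  # each extra depth->depth layer
    output = depth * output_size + output_size
    return first_hidden + extra_hidden + output

# create_model_beta with depth=200 corresponds to n_layers=2 here (three hidden layers)
print(mlp_param_count(19, 200, n_layers=2))  # 84601
print(mlp_param_count(19, 32, n_layers=1))   # 1729
```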

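Searching all of these parameters at once can take a long time, because every sampled combination is trained once per cross-validation fold (`cv=3` below).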
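For scale: the `param_distributions` in the next cell define 11 batch sizes, 9 epoch counts, 5 dropout rates, 8 activations, 7 depths and 9 layer counts. The arithmetic below, reusing those same ranges, compares an exhaustive grid search of that space with the 9 settings `RandomizedSearchCV` samples.

```python
from math import prod

# Same candidate values as param_distributions in the next cell
param_distributions = {
    'batch_size': range(2, 24, 2),
    'epochs': range(20, 200, 20),
    'drop': [0.1, 0.15, 0.2, 0.25, 0.3],
    'activation': ['tanh', 'relu', 'linear', 'softmax', 'softplus',
                   'softsign', 'sigmoid', 'hard_sigmoid'],
    'depth': range(32, 256, 32),
    'n_layers': range(1, 10),
}
cv, n_iter = 3, 9

grid_size = prod(len(values) for values in param_distributions.values())
print("Full grid:", grid_size, "combinations,", grid_size * cv, "model fits")
print("Random search:", n_iter, "combinations,", n_iter * cv, "model fits (plus one refit)")
```

The full grid works out to 249,480 combinations (748,440 fits at 3 folds), versus 27 fits for the random search, which is why random search is the practical choice once the space is this large.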

In [17]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, stratify=y)

classifier = KerasClassifier(
    build_fn=create_model,
    input_shape=(X_train.shape[1],),  # one input per feature column: (19,)
    output_size=1,
    verbose=1,
#     batch_size=16,
#     epochs=180,
#     drop=0.15,
#     activation='linear',
#     depth=160,
#     n_layers=4,
)

param_distributions = {
    'batch_size': range(2, 24, 2),
    'epochs': range(20, 200, 20),
    'drop': [0.1, 0.15, 0.2, 0.25, 0.3],
    'activation': ['tanh', 'relu', 'linear', 'softmax', 'softplus', 'softsign', 'sigmoid', 'hard_sigmoid'],
    'depth': range(32, 256, 32),
    'n_layers': range(1, 10),
}

search = RandomizedSearchCV(
    estimator=classifier,
    param_distributions=param_distributions,
    cv=3,
    n_iter=9,
    n_jobs=8,
    random_state=42,
)

search_result = search.fit(X_train, y_train)

print(f"Best: {search_result.best_score_} using {search_result.best_params_}")
means = search_result.cv_results_['mean_test_score']
stds = search_result.cv_results_['std_test_score']
params = search_result.cv_results_['params']

Epoch 1/180
...
Epoch 180/180
Best: 0.8008835315704346 using {'n_layers': 2}
