# Your Mission, should you choose to accept it...

Tune hyperparameters to extract every ounce of accuracy from this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view>

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
You must use Grid Search and Cross Validation for your initial pass over the hyperparameters above.

Try to get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
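
Before launching a grid search over all of the hyperparameters listed above, it can help to count the model fits involved. The sketch below uses made-up candidate values (assumptions for illustration, not values prescribed by this assignment) to show how quickly an exhaustive grid grows under 5-fold cross-validation, which is one reason to tune a few parameters at a time.

```python
from itertools import product

# Hypothetical candidate values, chosen only to illustrate grid size.
param_grid = {
    "batch_size": [10, 20, 40],
    "epochs": [20, 40],
    "optimizer": ["sgd", "adam"],
    "dropout_rate": [0.0, 0.2, 0.4],
}
n_folds = 5

# Every combination of candidate values is one grid point.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

# Each grid point is trained once per cross-validation fold.
n_fits = len(combinations) * n_folds
print(f"{len(combinations)} combinations x {n_folds} folds = {n_fits} model fits")
```

Adding one more hyperparameter with three candidate values would triple the number of fits again.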


In [1]:
##### Your Code Here #####
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
import keras
from keras.models import Sequential
from keras.layers import Dense



Using TensorFlow backend.


### First pass with cross-validation

In [36]:
df = pd.read_csv('Telco-Customer-Churn.csv')
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [37]:
df.shape

(7043, 21)

In [38]:
df.isna().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

In [39]:
df.dtypes

customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object
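
`TotalCharges` is reported as `object` even though it holds monetary amounts, which usually means some entries cannot be parsed as numbers. A minimal sketch of one way to handle this, assuming the offending entries are blank strings (an assumption; the notebook does not show them) and using a tiny made-up frame rather than the real data:

```python
import pandas as pd

# Made-up rows standing in for the real data; the blank string is assumed.
demo = pd.DataFrame({
    "tenure": [1, 34, 0],
    "MonthlyCharges": [29.85, 56.95, 52.55],
    "TotalCharges": ["29.85", "1889.5", " "],
})

# errors="coerce" turns anything unparseable into NaN instead of raising.
demo["TotalCharges"] = pd.to_numeric(demo["TotalCharges"], errors="coerce")
n_missing = int(demo["TotalCharges"].isna().sum())

# One option: drop the rows that could not be parsed.
demo_clean = demo.dropna(subset=["TotalCharges"]).reset_index(drop=True)
print(demo["TotalCharges"].dtype, n_missing, len(demo_clean))
```

Dropping rows is only one choice; filling the missing values is another, depending on what the blank entries turn out to mean.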

In [40]:
# Dropping customerID feature as it holds no informational value

df_1 = df.drop('customerID', axis=1)
df_1.head()


| | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [41]:
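# Use LabelEncoder on the categorical data.
# Note: applying it to every column also replaces the values of
# MonthlyCharges and TotalCharges with integer codes, as the output below shows.

from sklearn.preprocessing import LabelEncoder

# The same encoder object is refitted on each column in turn
df_1 = df_1.apply(LabelEncoder().fit_transform)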
df_1.head()

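| | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 142 | 2505 | 0 |
| 1 | 1 | 0 | 0 | 0 | 34 | 1 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 3 | 498 | 1466 | 0 |
| 2 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 436 | 157 | 1 |
| 3 | 1 | 0 | 0 | 0 | 45 | 0 | 1 | 0 | 2 | 0 | 2 | 2 | 0 | 0 | 1 | 0 | 0 | 266 | 1400 | 0 |
| 4 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 729 | 925 | 1 |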
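
In the output above, the `MonthlyCharges` value 29.85 has become the code 142, and multi-valued categories such as `PaymentMethod` have been given an arbitrary order. An alternative sketch, on a tiny made-up frame, that one-hot encodes only the non-numeric columns and leaves numeric columns untouched. This is a different preprocessing choice, not what the notebook does:

```python
import pandas as pd

# Made-up rows that mimic a few of the dataset's columns.
demo = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "Contract": ["Month-to-month", "One year", "Two year"],
    "tenure": [1, 34, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85],
})

# Treat every non-numeric column as categorical.
categorical_cols = demo.select_dtypes(exclude="number").columns.tolist()

# drop_first=True drops one dummy column per feature to avoid redundancy.
encoded = pd.get_dummies(demo, columns=categorical_cols, drop_first=True)
print(encoded.columns.tolist())
```

The numeric columns keep their original values here, so a later `StandardScaler` step scales real charges rather than codes.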


In [42]:
df_1.dtypes

gender              int64
SeniorCitizen       int64
Partner             int64
Dependents          int64
tenure              int64
PhoneService        int64
MultipleLines       int64
InternetService     int64
OnlineSecurity      int64
OnlineBackup        int64
DeviceProtection    int64
TechSupport         int64
StreamingTV         int64
StreamingMovies     int64
Contract            int64
PaperlessBilling    int64
PaymentMethod       int64
MonthlyCharges      int64
TotalCharges        int64
Churn               int64
dtype: object

In [43]:
# Separate our target from DataFrame

churn = df_1.pop('Churn')
df_1.head()

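| | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 142 | 2505 |
| 1 | 1 | 0 | 0 | 0 | 34 | 1 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 3 | 498 | 1466 |
| 2 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 436 | 157 |
| 3 | 1 | 0 | 0 | 0 | 45 | 0 | 1 | 0 | 2 | 0 | 2 | 2 | 0 | 0 | 1 | 0 | 0 | 266 | 1400 |
| 4 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 729 | 925 |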


In [44]:

y = churn.values

X = df_1.values
X

array([[   0,    0,    1, ...,    2,  142, 2505],
       [   1,    0,    0, ...,    3,  498, 1466],
       [   1,    0,    0, ...,    3,  436,  157],
       ...,
       [   0,    0,    1, ...,    2,  137, 2994],
       [   1,    1,    1, ...,    3,  795, 2660],
       [   1,    0,    0, ...,    0, 1388, 5407]])

In [45]:
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from keras.wrappers.scikit_learn import KerasClassifier

# fix random seed for reproducibility
seed = 42
np.random.seed(seed)

# define 5-fold cross validation test harness
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

inputs = X.shape[1]
epochs = 100
batch_size = 10

# baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=inputs, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# evaluate model with standardized dataset using a pipeline
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=baseline_model, epochs=epochs, batch_size=batch_size, verbose=0)))
pipeline = Pipeline(estimators)
results = cross_val_score(pipeline, X, y, cv=kfold)
print("K-Fold Cross-Validation results -> Mean: {:.2f}, Standard Deviation: {:.2f}".format(results.mean(), results.std()))




K-Fold Cross-Validation results -> Mean: 0.79, Standard Deviation: 0.01
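
An accuracy of 0.79 is easier to interpret next to the score of always predicting the most common class. A small sketch of that sanity check, using a made-up label vector (the real class balance is not shown in this notebook); on the real data, `y` from the cell above would take the place of `y_demo`:

```python
import numpy as np

# Made-up labels: 0 = stayed, 1 = churned. Not the real class balance.
y_demo = np.array([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])

# Accuracy of a model that always predicts the most frequent class.
class_counts = np.bincount(y_demo)
majority_accuracy = class_counts.max() / len(y_demo)
print(f"Majority-class accuracy: {majority_accuracy:.2f}")
```

If the tuned model's accuracy sits only slightly above this number, accuracy alone may not say much about how well churners are being identified.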


## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?
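
For the Random Search stretch goal, the core idea is to sample a fixed number of configurations instead of enumerating them all. A hand-rolled sketch using only the standard library follows; the search space and the helper name `sample_configs` are illustrative assumptions, and scikit-learn's `RandomizedSearchCV` offers the same idea with cross-validation built in.

```python
import random

# Hypothetical search space; each list is sampled uniformly.
search_space = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [20, 40, 60],
    "learning_rate": [0.001, 0.01, 0.1],
    "dropout_rate": [0.0, 0.1, 0.2, 0.3],
}


def sample_configs(space, n_iter, seed=42):
    """Draw n_iter random configurations (duplicates are possible)."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]


configs = sample_configs(search_space, n_iter=5)
for config in configs:
    print(config)
```

Each sampled configuration would then be scored with the same cross-validation harness used for the grid search, so the cost is fixed by `n_iter` rather than by the size of the grid.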

### Hyperparameter Tuning: Batch Size

In [56]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# Reuse baseline_model and kfold from the baseline cell to tune the batch size.
# The pipeline wraps KerasClassifier, so the classifier does not need to be
# instantiated separately here.

# Note: the pipeline interface throws errors that I don't understand.
# make_pipeline names each step after its lowercased class name, which is why
# the grid keys below start with "kerasclassifier__".
pipe = make_pipeline(StandardScaler(), KerasClassifier(build_fn=baseline_model, verbose=1))

# Define the grid search parameters
param_grid = {'kerasclassifier__batch_size': [10, 20, 40, 60, 80, 100],
             'kerasclassifier__epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=kfold, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report results
print("Best: {:.2f} using {}".format(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Epoch 1/20
...
Epoch 20/20
Best: 0.80 using {'kerasclassifier__batch_size': 60, 'kerasclassifier__epochs': 20}
Means: 0.7961096106969581, Stdev: 0.009450144792409998 with: {'kerasclassifier__batch_size': 10, 'kerasclassifier__epochs': 20}
Means: 0.7973874776214086, Stdev: 0.00798071910696262 with: {'kerasclassifier__batch_size': 20, 'kerasclassifier__epochs': 20}
Means: 0.7946897609894306, Stdev: 0.00736092455805145 with: {'kerasclassifier__batch_size': 40, 'kerasclassifier__epochs': 20}
Means: 0.7995172537610423, Stdev: 0.00849223869453875 with: {'kerasclassifier__batch_size': 60, 'kerasclassifier__epochs': 20}
Means: 0.7949737326319916, Stdev: 0.010953429301452999 with: {'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 20}
Means: 0.796535563319985, Stdev: 0.0077159250782
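
The mean scores above differ by less than half a percentage point, so it is worth checking whether the gaps exceed the fold-to-fold spread. A quick sketch using the five complete rows printed above (the row for batch size 100 is cut off in the output and is left out):

```python
# (batch_size, mean_test_score, std_test_score) copied from the output above.
results = [
    (10, 0.7961096106969581, 0.009450144792409998),
    (20, 0.7973874776214086, 0.00798071910696262),
    (40, 0.7946897609894306, 0.00736092455805145),
    (60, 0.7995172537610423, 0.00849223869453875),
    (80, 0.7949737326319916, 0.010953429301452999),
]

best_size, best_mean, _ = max(results, key=lambda row: row[1])

# Flag each setting whose gap to the best mean is within its own std.
within_one_std = {}
for batch_size, mean, std in results:
    gap = best_mean - mean
    within_one_std[batch_size] = gap <= std
    print(f"batch_size={batch_size}: gap={gap:.4f}, std={std:.4f}")
```

Every gap here is smaller than the corresponding standard deviation, which suggests that batch size has little measurable effect at 20 epochs and that the "best" value of 60 could easily change with a different seed.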

### Hyperparameter Tuning: Epochs

In [58]:
param_grid = {'kerasclassifier__batch_size': [60, 80],
             'kerasclassifier__epochs': [20, 40, 60]}

pipe = make_pipeline(StandardScaler(), KerasClassifier(build_fn=baseline_model, verbose=1))

# Create Grid Search
grid = GridSearchCV(estimator=pipe, param_grid=param_grid, cv=kfold, n_jobs=-1)
grid_result = grid.fit(X, y)

# Report results
print("Best: {:.2f} using {}".format(grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Epoch 1/60
...
Epoch 60/60
Best: 0.80 using {'kerasclassifier__batch_size': 80, 'kerasclassifier__epochs': 60}
Means: 0.7968195419868048, Stdev: 0.006285406386007481 with: {'kerasclassifier__batch_size': 60, 'kerasclassifier__epochs': 20}
Means: 0.798239387928314, Stdev: 0.010125809674158513 with: {'kerasclassifier