# Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view> 

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
  - batch_size
  - training epochs
  - optimizer
  - learning rate (if applicable to optimizer)
  - momentum (if applicable to optimizer)
  - activation functions
  - network weight initialization
  - dropout regularization
  - number of neurons in the hidden layer

You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters.

Try to get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
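
The grid-search-with-cross-validation requirement comes down to three steps: expand every combination in a parameter grid, score each combination on k train/validation folds, and keep the combination with the best mean score. The sketch below shows that loop in plain Python. `train_and_score(params, train_idx, val_idx)` is a hypothetical callback (in this notebook it would build, fit, and evaluate a Keras model), and the folds are contiguous and unshuffled for simplicity; for an imbalanced target like churn, shuffled, stratified folds are the safer choice. In practice, scikit-learn's `GridSearchCV` wrapped around Keras 2.x's `keras.wrappers.scikit_learn.KerasClassifier` performs the same kind of search.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def grid_search_cv(param_grid, train_and_score, n_samples, k=3):
    """Try every combination in param_grid; return (best_params, best_mean_score, results).

    train_and_score(params, train_idx, val_idx) is a hypothetical callback that
    fits on train_idx and returns a validation score (higher is better).
    """
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [train_and_score(params, tr, va)
                  for tr, va in k_fold_indices(n_samples, k)]
        results.append((params, sum(scores) / len(scores)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


# Toy scorer for illustration only: it simply prefers batch_size=32 and epochs=20.
def toy_score(params, train_idx, val_idx):
    return -abs(params["batch_size"] - 32) - abs(params["epochs"] - 20)


grid = {"batch_size": [16, 32, 64], "epochs": [10, 20]}
best, score, results = grid_search_cv(grid, toy_score, n_samples=90, k=3)
print(best, score)  # {'batch_size': 32, 'epochs': 20} 0.0
```

The grid grows multiplicatively (here 3 × 2 = 6 combinations, each trained k times), which is why a coarse grid is usually followed by a narrower second pass.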


## Need to download these each time the AWS instance is started

In [1]:
#!conda install -c conda-forge category_encoders -y

In [2]:
#!wget https://raw.githubusercontent.com/treselle-systems/customer_churn_analysis/master/WA_Fn-UseC_-Telco-Customer-Churn.csv

In [3]:
#!pip install h5py scikit-optimize

## Imports I know I'll need

In [4]:
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
import category_encoders as ce

Using TensorFlow backend.


## Load and Transform data

In [5]:
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

In [6]:
pd.set_option("display.max_columns", 21)

In [7]:
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [8]:
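# Map Yes/No and Male/Female strings to 1/0 across the whole frame.
# Note: this also turns the "No" level of multi-valued columns (e.g. OnlineSecurity) into 0.
df = df.replace("No", 0).replace("Yes", 1)
df = df.replace("Male", 1).replace("Female", 0)
# TotalCharges contains blank strings (" ") for some customers; treat those as 0.
df = df.replace(" ", 0)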
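
The global `replace` calls above also rewrite the `"No"` level inside multi-valued columns such as `OnlineSecurity` (leaving `"No internet service"` untouched), and they rely on `" "` being the only non-numeric value in `TotalCharges`. A more targeted sketch, assuming the column names shown in `df.head()`, maps only the two-level columns and lets `pd.to_numeric(..., errors="coerce")` turn any unparsable `TotalCharges` entry into `NaN` before filling it with 0. `clean_churn_frame` and the tiny sample frame are illustrative, not part of the original notebook.

```python
import pandas as pd

# Two-level Yes/No columns in the Telco churn data (names taken from df.head()).
YES_NO_COLS = ["Partner", "Dependents", "PhoneService", "PaperlessBilling", "Churn"]


def clean_churn_frame(df):
    """Return a copy with binary columns mapped to 0/1 and TotalCharges made numeric."""
    out = df.copy()
    for col in YES_NO_COLS:
        out[col] = out[col].map({"No": 0, "Yes": 1})
    out["gender"] = out["gender"].map({"Female": 0, "Male": 1})
    # Unparsable strings such as " " become NaN, then 0.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce").fillna(0)
    return out


# Tiny illustrative frame (not the real dataset).
sample = pd.DataFrame({
    "gender": ["Female", "Male"],
    "Partner": ["Yes", "No"],
    "Dependents": ["No", "No"],
    "PhoneService": ["No", "Yes"],
    "PaperlessBilling": ["Yes", "No"],
    "OnlineSecurity": ["No", "No internet service"],
    "TotalCharges": ["29.85", " "],
    "Churn": ["No", "Yes"],
})
clean = clean_churn_frame(sample)
print(clean)
```

Multi-valued columns like `OnlineSecurity` keep their original strings, so the one-hot encoder sees three clean categories instead of a mix of `0` and strings.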

In [9]:
from sklearn.pipeline import make_pipeline

# Binary-encode the two-level columns and one-hot encode the multi-level ones.
pipe = make_pipeline(
    ce.BinaryEncoder(cols=["Partner", "gender", "SeniorCitizen", "Dependents",
                           "PhoneService", "PaperlessBilling"]),
    ce.OneHotEncoder(cols=["MultipleLines", "InternetService", "OnlineSecurity",
                           "OnlineBackup", "DeviceProtection", "TechSupport",
                           "StreamingTV", "StreamingMovies", "Contract",
                           "PaymentMethod"]),
)
df_enc = pipe.fit_transform(df)

In [10]:
df_enc["TotalCharges"] = pd.to_numeric(df_enc["TotalCharges"])

In [11]:
X = df_enc.drop(columns=["customerID", "Churn"])
y = df_enc["Churn"]

In [12]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train,y_test = train_test_split(X,y,random_state=42,test_size=.15,stratify=y)
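
`stratify=y` asks `train_test_split` to keep the churn/no-churn ratio the same in the train and test parts, which matters because churners are the minority class. The idea can be sketched in plain Python by splitting each class separately and concatenating the pieces. `stratified_split` is a hypothetical helper for illustration; scikit-learn's own implementation differs in its shuffling and rounding details.

```python
import random


def stratified_split(labels, test_size=0.15, seed=42):
    """Split indices class by class so each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: 80 non-churners (0) and 20 churners (1), a 20% churn rate.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_size=0.15)
test_churn_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_churn_rate)  # 85 15 0.2
```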

In [27]:
import os
import skopt
import tensorflow
# Use the standalone Keras backend, matching the keras.models/keras.layers used to
# build the models, so clear_session() resets the session those models live in.
from keras import backend as K
from keras import optimizers
from keras.models import load_model
from skopt import gp_minimize, forest_minimize, gbrt_minimize
from skopt.space import Real, Categorical, Integer
from skopt.plots import plot_convergence
from skopt.plots import plot_objective, plot_evaluations
from keras.regularizers import l1, l2, l1_l2
from skopt.utils import use_named_args


In [28]:
dim_learning_rate = Real(low=1e-6, high=1e-2, prior='log-uniform',
                         name='learning_rate')
dim_num_dense_layers = Integer(low=1, high=5, name='num_dense_layers')
dim_num_dense_nodes = Integer(low=16, high=72, name='num_dense_nodes')
dim_activation = Categorical(categories=['relu', 'sigmoid'],
                             name='activation')
dim_batch_size = Integer(low=28, high=128, name='batch_size')
dim_adam_decay = Real(low=1e-6,high=1e-2,name="adam_decay")

dimensions = [dim_learning_rate,
              dim_num_dense_layers,
              dim_num_dense_nodes,
              dim_activation,
              dim_batch_size,
              dim_adam_decay
             ]
# One value per entry in `dimensions`, in the same order.
default_parameters = [1e-3, 1, 16, 'relu', 128, 1e-3]
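
`prior='log-uniform'` on `dim_learning_rate` means the exponent is sampled uniformly, so each decade between 1e-6 and 1e-2 is roughly equally likely. A plain uniform prior on the same range would instead put about 90% of its draws above 1e-3. A small sketch of that sampling rule (an illustration, not scikit-optimize's internal code):

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 exponent is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(0)
draws = [sample_log_uniform(1e-6, 1e-2, rng) for _ in range(10_000)]

# Count draws per decade: [1e-6, 1e-5), [1e-5, 1e-4), [1e-4, 1e-3), [1e-3, 1e-2].
decade_counts = [0, 0, 0, 0]
for d in draws:
    idx = math.floor(math.log10(d)) + 6
    decade_counts[min(max(idx, 0), 3)] += 1  # clamp float round-off at the edges
print(decade_counts)  # roughly 2,500 in each decade

# Share of a plain uniform prior on [1e-6, 1e-2] that lies above 1e-3.
print((1e-2 - 1e-3) / (1e-2 - 1e-6))  # about 0.9
```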


In [29]:
dim_num_dense_nodes

Integer(low=16, high=72)

In [30]:
input_shape = X_train.shape[1]

In [37]:
def create_model(learning_rate, num_dense_layers,
                 num_dense_nodes, activation, adam_decay):
    """
    Hyper-parameters:
    learning_rate:     Learning-rate for the optimizer.
    num_dense_layers:  Number of dense layers added after the first hidden layer.
    num_dense_nodes:   Number of nodes in each dense layer.
    activation:        Activation function for all hidden layers.
    adam_decay:        Learning-rate decay for the Adam optimizer.
    """
    
    # Start construction of a Keras Sequential model.
    model = Sequential()

    # The first hidden layer also declares the input shape: a tuple
    # holding the number of input features (columns of X_train).
    model.add(Dense(num_dense_nodes, activation=activation, input_shape=(input_shape,)))

    # Add fully-connected / dense layers.
    # The number of layers is a hyper-parameter we want to optimize.
    for i in range(num_dense_layers):
        # Name of the layer. This is not really necessary
        # because Keras should give them unique names.
        name = 'layer_dense_{0}'.format(i+1)

        # Add the dense / fully-connected layer to the model.
        # This has two hyper-parameters we want to optimize:
        # The number of nodes and the activation function.
        model.add(Dense(num_dense_nodes,
                        activation=activation,
                        name=name,
                        ))
        

    # Output layer: a single sigmoid unit that outputs the churn
    # probability for binary classification.
    model.add(Dense(1, activation='sigmoid'))
    
    # Use the Adam method for training the network.
    # We want to find the best learning-rate for the Adam method.
    adam = optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=None, decay=adam_decay, amsgrad=False)
    
    # In Keras we need to compile the model so it can be trained.
    model.compile(optimizer=adam,
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    
    return model
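
Because `create_model` adds one hidden layer before the loop and then `num_dense_layers` more inside it, the network always has `num_dense_layers + 1` hidden layers. A quick parameter count makes the size of each candidate concrete. `count_mlp_params` is a hypothetical helper, and `n_features = 40` is a placeholder for `input_shape` (the real number of encoded columns is not shown here); `model.summary()` reports the same totals for an actual model.

```python
def count_mlp_params(n_features, num_dense_layers, num_dense_nodes):
    """Trainable weights + biases of the network built by create_model."""
    # First hidden layer: n_features -> num_dense_nodes
    total = n_features * num_dense_nodes + num_dense_nodes
    # num_dense_layers further hidden layers: num_dense_nodes -> num_dense_nodes
    total += num_dense_layers * (num_dense_nodes * num_dense_nodes + num_dense_nodes)
    # Sigmoid output layer: num_dense_nodes -> 1
    total += num_dense_nodes + 1
    return total


n_features = 40  # placeholder feature count, for illustration only
print(count_mlp_params(n_features, 1, 16))  # 945   (default_parameters)
print(count_mlp_params(n_features, 5, 72))  # 29305 (largest point in the search space)
```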

In [38]:
path_best_model = 'best_model.h5'
best_accuracy = 0.0

In [39]:
@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_dense_layers,
            num_dense_nodes, activation, batch_size, adam_decay):
    """
    Hyper-parameters:
    learning_rate:     Learning-rate for the optimizer.
    num_dense_layers:  Number of dense layers added after the first hidden layer.
    num_dense_nodes:   Number of nodes in each dense layer.
    activation:        Activation function for all hidden layers.
    batch_size:        Mini-batch size passed to model.fit.
    adam_decay:        Learning-rate decay for the Adam optimizer.
    """

    # Print the hyper-parameters.
    print('learning rate: {0:.1e}'.format(learning_rate))
    print('num_dense_layers:', num_dense_layers)
    print('num_dense_nodes:', num_dense_nodes)
    print('activation:', activation)
    print()
    
    # Create the neural network with these hyper-parameters.
    model = create_model(learning_rate=learning_rate,
                         num_dense_layers=num_dense_layers,
                         num_dense_nodes=num_dense_nodes,
                         activation=activation,adam_decay=adam_decay
                        )
    

    # Use Keras to train the model.
    history = model.fit(x=X_train,
                        y=y_train,
                        epochs=3,
                        batch_size=batch_size,
                        validation_split=0.3,
                        )

    # Get the classification accuracy on the validation-set
    # after the last training-epoch.
    accuracy = history.history['val_acc'][-1]

    # Print the classification accuracy.
    print()
    print("Accuracy: {0:.2%}".format(accuracy))
    print()


    # Delete the Keras model with these hyper-parameters from memory.
    del model
    
    # Clear the Keras session, otherwise it will keep adding new
    # models to the same TensorFlow graph each time we create
    # a model with a different set of hyper-parameters.
    K.clear_session()
    tensorflow.reset_default_graph()

    
    # NOTE: Scikit-optimize does minimization so it tries to
    # find a set of hyper-parameters with the LOWEST fitness-value.
    # Because we are interested in the HIGHEST classification
    # accuracy, we need to negate this number so it can be minimized.
    return -accuracy
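
`@use_named_args(dimensions=dimensions)` lets the optimizer call `fitness` with a single list of values (in the order of `dimensions`) while the function body works with named arguments. A simplified stand-in showing the idea, written from scratch for illustration (it is not scikit-optimize's source; `named_args`, `Dim`, and `toy_fitness` are hypothetical names):

```python
from collections import namedtuple
from functools import wraps

# Hypothetical minimal stand-in for skopt.space dimensions: only the name matters here.
Dim = namedtuple("Dim", ["name"])


def named_args(dimensions):
    """Decorator: the wrapped function is called with a list, but receives keyword arguments."""
    def decorator(func):
        @wraps(func)
        def wrapper(x):
            if len(x) != len(dimensions):
                raise ValueError("expected %d values, got %d" % (len(dimensions), len(x)))
            kwargs = {dim.name: value for dim, value in zip(dimensions, x)}
            return func(**kwargs)
        return wrapper
    return decorator


dims = [Dim("learning_rate"), Dim("batch_size")]


@named_args(dims)
def toy_fitness(learning_rate, batch_size):
    # Pretend accuracy; negated because the optimizers minimize.
    accuracy = 0.8 if batch_size == 64 else 0.7
    return -accuracy


print(toy_fitness([1e-3, 64]))  # -0.8
```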

In [40]:
os.path.isfile('best_model.h5')
#os.remove('best_model.h5')

False

In [41]:
K.clear_session()
tensorflow.reset_default_graph()


In [42]:
#fitness(x=default_parameters)

In [43]:
search_result = gbrt_minimize(func=fitness,
                              dimensions=dimensions,
                              acq_func='EI',  # Expected Improvement.
                              n_calls=40,
                              n_jobs=-1,
                              x0=default_parameters)

learning rate: 1.0e-03
num_dense_layers: 1
num_dense_nodes: 16
activation: relu

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.45%

learning rate: 4.5e-04
num_dense_layers: 3
num_dense_nodes: 56
activation: relu



Exception ignored in: <bound method BaseSession._Callable.__del__ of <tensorflow.python.client.session.BaseSession._Callable object at 0x7fb1be5333c8>>
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1455, in __del__
    self._session._session, self._handle, status)
  File "/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No such callable handle: 94920658202256


Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.56%

learning rate: 4.7e-03
num_dense_layers: 1
num_dense_nodes: 30
activation: relu

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.51%

learning rate: 1.9e-05
num_dense_layers: 2
num_dense_nodes: 63
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 79.79%

learning rate: 5.5e-03
num_dense_layers: 3
num_dense_nodes: 55
activation: relu

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.56%

learning rate: 6.8e-06
num_dense_layers: 5
num_dense_nodes: 53
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 79.40%

learning rate: 3.2e-06
num_dense_layers: 1
num_dense_nodes: 55
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.90%

lear

learning rate: 4.1e-05
num_dense_layers: 5
num_dense_nodes: 19
activation: relu

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.95%

learning rate: 8.3e-03
num_dense_layers: 4
num_dense_nodes: 52
activation: relu

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 80.01%

learning rate: 3.0e-03
num_dense_layers: 5
num_dense_nodes: 47
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 77.56%

learning rate: 1.6e-04
num_dense_layers: 4
num_dense_nodes: 28
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.84%

learning rate: 1.5e-06
num_dense_layers: 4
num_dense_nodes: 26
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 76.06%

learning rate: 9.6e-06
num_dense_layers: 1
num_dense_nodes: 54
activation: sigmoid

Train on 4190 sam


Accuracy: 79.01%

learning rate: 1.7e-04
num_dense_layers: 2
num_dense_nodes: 18
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 79.68%

learning rate: 9.5e-03
num_dense_layers: 2
num_dense_nodes: 50
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.23%

learning rate: 8.4e-04
num_dense_layers: 1
num_dense_nodes: 23
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 78.90%

learning rate: 1.6e-04
num_dense_layers: 4
num_dense_nodes: 31
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 77.56%

learning rate: 1.3e-05
num_dense_layers: 2
num_dense_nodes: 71
activation: sigmoid

Train on 4190 samples, validate on 1796 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3

Accuracy: 79.34%

learning rate: 1.7e-04
num_dense_layers: 2
num_dense_nodes: 32
activation: r
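
Every complete trial in the log above reports "Train on 4190 samples, validate on 1796 samples". Assuming the CSV has its usual 7,043 rows, the counts follow from two rules: scikit-learn rounds a float `test_size` up to a whole number of test rows, and Keras's `validation_split` holds out the *last* fraction of the training arrays (before shuffling).

```python
import math

n_rows = 7043            # assumed row count of WA_Fn-UseC_-Telco-Customer-Churn.csv
test_size = 0.15         # train_test_split(..., test_size=.15)
validation_split = 0.3   # model.fit(..., validation_split=0.3)

# scikit-learn: test rows are rounded up, training gets the rest.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

# Keras: the first split_at rows train, the remaining rows validate.
split_at = int(n_train * (1.0 - validation_split))
n_val = n_train - split_at

print(n_test, n_train, split_at, n_val)  # 1057 5986 4190 1796
```

Since the validation rows are always the same tail of `X_train`, each trial is scored on one fixed 1,796-row holdout rather than on cross-validation folds.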

In [46]:
dir(search_result)

['fun',
 'func_vals',
 'models',
 'random_state',
 'space',
 'specs',
 'x',
 'x_iters']
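
The optimizer returns a scipy `OptimizeResult`: `x` holds the best point found, `fun` its objective value (the negated accuracy here, so `-search_result.fun` is the best validation accuracy), and `x_iters`/`func_vals` hold every evaluated point and score. A sketch of turning those into a ranked list, using a placeholder object named `demo_result` with made-up numbers (the real `func_vals` output is not shown in this notebook):

```python
from types import SimpleNamespace

# Placeholder stand-in for `search_result`; these numbers are made up, not from the run above.
demo_result = SimpleNamespace(
    x_iters=[[1e-3, 1, 16, "relu", 128, 1e-3],
             [5e-4, 3, 56, "relu", 77, 6e-3],
             [2e-5, 2, 63, "sigmoid", 80, 4e-4]],
    func_vals=[-0.78, -0.79, -0.80],
)

names = ["learning_rate", "num_dense_layers", "num_dense_nodes",
         "activation", "batch_size", "adam_decay"]  # same order as `dimensions`

# Objective values are negated accuracies: sort ascending, then flip the sign back.
ranked = sorted(zip(demo_result.func_vals, demo_result.x_iters),
                key=lambda pair: pair[0])
for value, point in ranked:
    print("accuracy {:.2%}".format(-value), dict(zip(names, point)))

top_accuracy = -ranked[0][0]
top_params = dict(zip(names, ranked[0][1]))
```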

In [56]:
search_result.x_iters

[[0.001, 1, 16, 'relu', 128, 0.001],
 [0.0004543790298067931, 3, 56, 'relu', 77, 0.00642928687944211],
 [0.004682524930126848, 1, 30, 'relu', 39, 0.007556750271075004],
 [1.9234539726165893e-05, 2, 63, 'sigmoid', 80, 0.000435220567199416],
 [0.005457179440167175, 3, 55, 'relu', 103, 0.00911446307953481],
 [6.755465856583113e-06, 5, 53, 'sigmoid', 122, 0.005595517987302317],
 [3.190267815272856e-06, 1, 55, 'sigmoid', 111, 0.0053740120058309564],
 [2.943208606262637e-05, 1, 30, 'relu', 34, 0.0035065244736867097],
 [0.0007143831479586419, 3, 23, 'relu', 70, 0.00869618195525584],
 [0.00016194786750667762, 4, 36, 'sigmoid', 39, 0.0026077482236392053],
 [0.0041556010783669025, 3, 29, 'sigmoid', 61, 0.009083405089553798],
 [0.00128461673929366, 4, 30, 'sigmoid', 63, 0.00039036299893700706],
 [9.661521077934171e-06, 4, 71, 'relu', 41, 0.0003818069334530619],
 [0.0007227146510068309, 4, 40, 'relu', 38, 0.002664201608680341],
 [4.158561456022096e-06, 3, 39, 'relu', 39, 0.0006627428956980823],
 [

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset
- Practice hyperparameter tuning on other datasets that we have looked at. How high can you get MNIST accuracy? Above 99%?
- Study for the Sprint Challenge
  - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
  - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?
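
For the random-search stretch goal, the core loop is short: draw each hyperparameter independently from its range, evaluate, and keep the best. The sketch below samples a space shaped like `dimensions` above, with a hypothetical `toy_objective` standing in for `fitness` (lower is better, as in scikit-optimize). scikit-optimize also ships `dummy_minimize`, which performs random search over the same kind of `dimensions` list.

```python
import math
import random


def sample_point(rng):
    """Draw one point from a space shaped like the notebook's `dimensions` list."""
    return [
        10 ** rng.uniform(math.log10(1e-6), math.log10(1e-2)),  # learning_rate (log-uniform)
        rng.randint(1, 5),                                      # num_dense_layers
        rng.randint(16, 72),                                    # num_dense_nodes
        rng.choice(["relu", "sigmoid"]),                        # activation
        rng.randint(28, 128),                                   # batch_size
        rng.uniform(1e-6, 1e-2),                                # adam_decay (uniform)
    ]


def random_search(objective, n_calls, seed=0):
    """Evaluate n_calls random points; return (best_point, best_value, trials)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_calls):
        point = sample_point(rng)
        trials.append((objective(point), point))
    best_value, best_point = min(trials, key=lambda pair: pair[0])
    return best_point, best_value, trials


# Hypothetical cheap objective for illustration: prefers 3 layers and lr near 1e-3.
def toy_objective(point):
    lr, layers = point[0], point[1]
    return abs(math.log10(lr) + 3) + abs(layers - 3)


best_point, best_value, trials = random_search(toy_objective, n_calls=200)
print(best_point, best_value)
```

Unlike `gbrt_minimize`, nothing here learns from earlier trials, so each draw is independent; that makes random search trivially parallel but usually less sample-efficient.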