# Your Mission, should you choose to accept it...

Tune hyperparameters to extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view>

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
  - batch_size
  - training epochs
  - optimizer
  - learning rate (if applicable to optimizer)
  - momentum (if applicable to optimizer)
  - activation functions
  - network weight initialization
  - dropout regularization
  - number of neurons in the hidden layer

You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters.

Try to get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
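
The grid-search-with-cross-validation requirement can be sketched without any ML library. This is a minimal illustration, not the assignment's reference solution: `k_fold_indices`, `grid_search_cv`, and `toy_score` are hypothetical names, and `toy_score` is only a stand-in for "train a Keras model on the training fold and return validation accuracy".

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def grid_search_cv(param_grid, n_samples, k, score_fn):
    """Try every combination in param_grid; return (best_params, best_mean_score).

    score_fn(params, train_idx, val_idx) is a hypothetical callable that
    fits a model on train_idx and returns accuracy on val_idx.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy scoring function (assumption): peaks at batch_size=64, dropout=0.2.
def toy_score(params, train_idx, val_idx):
    return 1.0 - abs(params["batch_size"] - 64) / 128 - abs(params["dropout"] - 0.2)


grid = {"batch_size": [32, 64, 128], "dropout": [0.0, 0.2, 0.5]}
best_params, best_score = grid_search_cv(grid, n_samples=100, k=5, score_fn=toy_score)
print(best_params, best_score)
```

The cost grows as the product of the grid sizes times `k`, which is why a coarse grid is a sensible first pass before refining around the best cell.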


## Setup: reinstall these each time the AWS instance is started

In [8]:
!conda install -c conda-forge category_encoders -y

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [2]:
!wget https://raw.githubusercontent.com/treselle-systems/customer_churn_analysis/master/WA_Fn-UseC_-Telco-Customer-Churn.csv

'wget' is not recognized as an internal or external command,
operable program or batch file.


In [3]:
!pip install h5py scikit-optimize



In [9]:
!pip install --upgrade jupyterthemes

Collecting jupyterthemes
  Downloading https://files.pythonhosted.org/packages/8a/08/9dee6dfd7f2aad6c30282d55c8f495b4dc1e4747b4e2bdbeb80572ddf312/jupyterthemes-0.20.0-py2.py3-none-any.whl (7.0MB)
Collecting lesscpy>=0.11.2 (from jupyterthemes)
  Downloading https://files.pythonhosted.org/packages/10/d0/fdd9874972e07ae8727a3d26b433891d8605b96999ea99bbf506e756a7b1/lesscpy-0.13.0-py2.py3-none-any.whl (48kB)
Installing collected packages: lesscpy, jupyterthemes
Successfully installed jupyterthemes-0.20.0 lesscpy-0.13.0


In [10]:
!jt -t chesterish

In [None]:
!conda install keras -y

## Imports I know I'll need

In [5]:
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
import category_encoders as ce

ModuleNotFoundError: No module named 'keras'

## Load and Transform data

In [None]:
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

In [None]:
pd.set_option("display.max_columns", 21)

In [None]:
df.head()

In [None]:
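# Map binary string values to integers. replace() matches whole cells only,
# so values such as "No internet service" are left untouched.
df = df.replace("No", 0).replace("Yes", 1)
df = df.replace("Male", 1).replace("Female", 0)
# Blank strings become 0 so TotalCharges can be converted to numeric later.
df = df.replace(" ", 0)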
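
The blanket `replace` calls above touch every column at once. As a hedged alternative, the sketch below cleans a tiny made-up frame column by column; the three rows in `toy` are invented for illustration and are not taken from the dataset. It uses `Series.map` for a binary column and `pd.to_numeric(..., errors="coerce")` to turn unparseable blanks into `NaN` before filling them.

```python
import pandas as pd

# Tiny invented frame that mimics the column types in the churn data.
toy = pd.DataFrame({
    "Partner": ["Yes", "No", "No"],
    "OnlineSecurity": ["No internet service", "Yes", "No"],
    "TotalCharges": ["29.85", " ", "108.15"],
})

# Map only the column we know is binary; other columns are left alone.
toy["Partner"] = toy["Partner"].map({"Yes": 1, "No": 0})

# Coerce unparseable strings (such as " ") to NaN, then fill with 0.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce").fillna(0.0)

print(toy.dtypes)
```

Being explicit per column makes it harder to rewrite a value by accident in a column that was meant to be one-hot encoded later.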

In [None]:
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(
    ce.BinaryEncoder(cols=["Partner", "gender", "SeniorCitizen", "Dependents",
                           "PhoneService", "PaperlessBilling"]),
    ce.OneHotEncoder(cols=["MultipleLines", "InternetService", "OnlineSecurity",
                           "OnlineBackup", "DeviceProtection", "TechSupport",
                           "StreamingTV", "StreamingMovies", "Contract", "PaymentMethod"])
)
df_enc = pipe.fit_transform(df)

In [None]:
df_enc["TotalCharges"] = pd.to_numeric(df_enc["TotalCharges"])

In [None]:
X = df_enc.drop(columns=["customerID", "Churn"])
y = df_enc["Churn"]

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=.15, stratify=y)

In [None]:
import skopt
import tensorflow
# Use the backend and loader from the same `keras` package that provides
# Sequential/Dense above, so clear_session() acts on the right session.
from keras import backend as K
from keras import optimizers
from keras.models import load_model
from keras.regularizers import l1, l2, l1_l2
from skopt import gp_minimize, forest_minimize, gbrt_minimize
from skopt.space import Real, Categorical, Integer
from skopt.plots import plot_convergence, plot_objective, plot_evaluations
from skopt.utils import use_named_args


In [None]:
dim_learning_rate = Real(low=1e-6, high=1e-2, prior='log-uniform',
                         name='learning_rate')
dim_num_dense_layers = Integer(low=1, high=5, name='num_dense_layers')
dim_num_dense_nodes = Integer(low=16, high=72, name='num_dense_nodes')
dim_activation = Categorical(categories=['relu', 'sigmoid'],
                             name='activation')
dim_batch_size = Integer(low=28, high=128, name='batch_size')
dim_adam_decay = Real(low=1e-6,high=1e-2,name="adam_decay")

dimensions = [dim_learning_rate,
              dim_num_dense_layers,
              dim_num_dense_nodes,
              dim_activation,
              dim_batch_size,
              dim_adam_decay
             ]
# Starting point for the search, in the same order as `dimensions`.
default_parameters = [1e-3, 1, 16, 'relu', 128, 1e-3]
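
`dim_learning_rate` uses `prior='log-uniform'`, while `dim_adam_decay` spans the same range with the default uniform prior. The sketch below illustrates the difference with the standard library only; `sample_log_uniform` is a hypothetical helper that mirrors the idea of sampling uniformly in log space, not scikit-optimize's own code.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


rng = random.Random(0)
low, high = 1e-6, 1e-2
log_draws = [sample_log_uniform(low, high, rng) for _ in range(10_000)]
uni_draws = [rng.uniform(low, high) for _ in range(10_000)]

# Fraction of draws below 1e-4, the midpoint of the range on a log scale.
frac_log = sum(d < 1e-4 for d in log_draws) / len(log_draws)
frac_uni = sum(d < 1e-4 for d in uni_draws) / len(uni_draws)
print(f"log-uniform: {frac_log:.2f}, uniform: {frac_uni:.2f}")
```

With a uniform prior almost all samples land in the top two decades of the range, so small decay values are rarely explored; a log-uniform prior spreads samples evenly across the decades.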


In [None]:
dim_num_dense_nodes

In [None]:
input_shape = X_train.shape[1]

In [None]:
def create_model(learning_rate, num_dense_layers,
                 num_dense_nodes, activation, adam_decay
                ):
    """
    Hyper-parameters:
    learning_rate:     Learning-rate for the optimizer.
    num_dense_layers:  Number of dense layers added after the first one.
    num_dense_nodes:   Number of nodes in each dense layer.
    activation:        Activation function for all hidden layers.
    adam_decay:        Learning-rate decay for the Adam optimizer.
    """
    
    # Start construction of a Keras Sequential model.
    model = Sequential()

    # First dense layer. input_shape tells Keras how many features each
    # sample has; it must be a tuple, here (number of feature columns,).
    model.add(Dense(num_dense_nodes, activation=activation, input_shape=(input_shape,) ))

    

    # Add further fully-connected / dense layers on top of the first one.
    # The number of these extra layers is a hyper-parameter we want to optimize.
    for i in range(num_dense_layers):
        # Name of the layer. This is not really necessary
        # because Keras should give them unique names.
        name = 'layer_dense_{0}'.format(i+1)

        # Add the dense / fully-connected layer to the model.
        # This has two hyper-parameters we want to optimize:
        # The number of nodes and the activation function.
        model.add(Dense(num_dense_nodes,
                        activation=activation,
                        name=name,
                        ))
        

    # Last fully-connected / dense layer with a single sigmoid unit,
    # which outputs the churn probability for binary classification.
    model.add(Dense(1, activation='sigmoid'))
    
    # Use the Adam method for training the network.
    # We want to find the best learning-rate for the Adam method.
    adam = optimizers.Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=None, decay=adam_decay, amsgrad=False)
    
    # In Keras we need to compile the model so it can be trained.
    model.compile(optimizer=adam,
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    
    return model
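
Because `create_model` adds one dense layer before the loop and `num_dense_layers` more inside it, the network always has `num_dense_layers + 1` hidden layers. The following sketch counts the trainable parameters of that architecture by hand; `count_dense_params` is a hypothetical helper, and the feature count of 10 is an arbitrary example rather than the real width of `X_train`.

```python
def count_dense_params(n_features, num_dense_layers, num_dense_nodes):
    """Trainable parameters (weights + biases) of the network built by create_model."""
    # First Dense layer: n_features inputs -> num_dense_nodes units.
    total = n_features * num_dense_nodes + num_dense_nodes
    # Each loop iteration adds a num_dense_nodes -> num_dense_nodes layer.
    total += num_dense_layers * (num_dense_nodes * num_dense_nodes + num_dense_nodes)
    # Output layer: num_dense_nodes -> 1 sigmoid unit.
    total += num_dense_nodes + 1
    return total


# Example with an assumed 10 input features and the default search point.
print(count_dense_params(n_features=10, num_dense_layers=1, num_dense_nodes=16))
```

In the notebook the same number should be reported by `model.summary()`, which is a quick way to confirm the layer count matches what you intended.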

In [None]:
@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_dense_layers,
            num_dense_nodes, activation, batch_size, adam_decay
           ):
    """
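    Hyper-parameters:
    learning_rate:     Learning-rate for the optimizer.
    num_dense_layers:  Number of dense layers added after the first one.
    num_dense_nodes:   Number of nodes in each dense layer.
    activation:        Activation function for all hidden layers.
    batch_size:        Number of samples per gradient update.
    adam_decay:        Learning-rate decay for the Adam optimizer.
    """

    # Print the hyper-parameters.
    print('learning rate: {0:.1e}'.format(learning_rate))
    print('num_dense_layers:', num_dense_layers)
    print('num_dense_nodes:', num_dense_nodes)
    print('activation:', activation)
    print('batch_size:', batch_size)
    print('adam_decay: {0:.1e}'.format(adam_decay))
    print()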
    
    # Create the neural network with these hyper-parameters.
    model = create_model(learning_rate=learning_rate,
                         num_dense_layers=num_dense_layers,
                         num_dense_nodes=num_dense_nodes,
                         activation=activation,adam_decay=adam_decay
                        )
    

    # Use Keras to train the model.
    history = model.fit(x=X_train,
                        y=y_train,
                        epochs=3,
                        batch_size=batch_size,
                        validation_split=0.3,
                        )

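    # Get the classification accuracy on the validation-set
    # after the last training-epoch. Older Keras versions name the
    # key 'val_acc'; newer ones use 'val_accuracy'.
    val_key = 'val_acc' if 'val_acc' in history.history else 'val_accuracy'
    accuracy = history.history[val_key][-1]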

    # Print the classification accuracy.
    print()
    print("Accuracy: {0:.2%}".format(accuracy))
    print()


    # Delete the Keras model with these hyper-parameters from memory.
    del model
    
    # Clear the Keras session, otherwise it will keep adding new
    # models to the same TensorFlow graph each time we create
    # a model with a different set of hyper-parameters.
    K.clear_session()
    tensorflow.reset_default_graph()

    
    # NOTE: Scikit-optimize does minimization so it tries to
    # find a set of hyper-parameters with the LOWEST fitness-value.
    # Because we are interested in the HIGHEST classification
    # accuracy, we need to negate this number so it can be minimized.
    return -accuracy
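
`fitness` relies on `validation_split=0.3`. Keras documents that this fraction is taken from the last samples of the arrays passed to `fit`, before any shuffling. The sketch below mimics that documented behavior with plain index lists; `keras_style_validation_split` is a hypothetical helper written for illustration, not Keras's actual implementation.

```python
def keras_style_validation_split(n_samples, validation_split):
    """Return (train_idx, val_idx) where validation is the tail of the data."""
    split_at = int(n_samples * (1.0 - validation_split))
    train_idx = list(range(0, split_at))
    val_idx = list(range(split_at, n_samples))
    return train_idx, val_idx


train_idx, val_idx = keras_style_validation_split(n_samples=20, validation_split=0.25)
print(len(train_idx), val_idx)
```

Here `X_train` comes out of `train_test_split`, which shuffles by default, so the tail is effectively a random sample. Note, though, that it is a single hold-out split and not the cross validation the requirements ask for.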

In [None]:
import os

os.path.isfile('best_model.h5')
#os.remove('best_model.h5')

In [None]:
K.clear_session()
tensorflow.reset_default_graph()


In [None]:
#fitness(x=default_parameters)

In [None]:
search_result = gp_minimize(func=fitness,
                            dimensions=dimensions,
                            acq_func='EI', # Expected Improvement.
                            n_calls=40,
                            n_jobs=-1,
                            x0=default_parameters)

In [None]:
dir(search_result)

In [None]:
search_result.x_iters
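
The object returned by `gp_minimize` exposes the best point as `search_result.x` and its objective value as `search_result.fun`. Since `fitness` returns the negated accuracy, the sign has to be flipped back when reporting. A minimal sketch follows, with `summarize_result` as a hypothetical helper; the values passed in are placeholders (the notebook's `default_parameters` and a made-up score), not output from a real run.

```python
def summarize_result(dimension_names, best_x, best_fun):
    """Pair each tuned value with its dimension name and undo the sign flip."""
    best_params = dict(zip(dimension_names, best_x))
    best_accuracy = -best_fun  # fitness returned -accuracy
    return best_params, best_accuracy


names = ["learning_rate", "num_dense_layers", "num_dense_nodes",
         "activation", "batch_size", "adam_decay"]
# Placeholder inputs standing in for search_result.x and search_result.fun.
params, acc = summarize_result(names, [1e-3, 1, 16, "relu", 128, 1e-3], -0.75)
print(params, acc)
```

In the notebook the call would be `summarize_result([d.name for d in dimensions], search_result.x, search_result.fun)`.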

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
  - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
  - Can you implement both perceptron and MLP models in Keras and tune their hyperparameters with cross validation?