<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Tobias Reaper

---
---

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this [telecom customer churn dataset](https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv).

## Requirements

- [x] Load the data
- [x] Clean the data if necessary (it will be)
- [ ] Create and fit a baseline Keras MLP model to the data.
- [ ] Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer

You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters.

Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?

---

## Load and Look

- [x] Load the data

In [1]:
# === Initial imports === #
import pandas as pd
import numpy as np

# !pip install pyjanitor
import janitor

In [2]:
# === Load the data === #
data_url = "https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv"

df1 = pd.read_csv(data_url)

In [3]:
# === Configure pandas display settings === #
pd.options.display.max_columns = 100
pd.options.display.max_rows = 100

In [4]:
# === First looks === #
print(df1.shape)
df1.head(2)

(7043, 21)


| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |


In [5]:
# === Nulls? === #
df1.isnull().sum() # Nope! At least no NaN values

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

In [6]:
# === Look at the data in other ways === #
df1.select_dtypes(exclude="number").describe().T.sort_values(by="unique")

| | count | unique | top | freq |
|---|---|---|---|---|
| Churn | 7043 | 2 | No | 5174 |
| gender | 7043 | 2 | Male | 3555 |
| Partner | 7043 | 2 | No | 3641 |
| Dependents | 7043 | 2 | No | 4933 |
| PhoneService | 7043 | 2 | Yes | 6361 |
| PaperlessBilling | 7043 | 2 | Yes | 4171 |
| Contract | 7043 | 3 | Month-to-month | 3875 |
| StreamingMovies | 7043 | 3 | No | 2785 |
| StreamingTV | 7043 | 3 | No | 2810 |
| TechSupport | 7043 | 3 | No | 3473 |


I'm curious why some of the yes/no columns have 3 categories.

In [7]:
# === Look at an example of yes/no feature with 3 categories === #
df1["StreamingMovies"].value_counts()

No                     2785
Yes                    2732
No internet service    1526
Name: StreamingMovies, dtype: int64

In [8]:
# === Look at another example of yes/no feature with 3 categories === #
df1["TechSupport"].value_counts()

No                     3473
Yes                    2044
No internet service    1526
Name: TechSupport, dtype: int64

That value is a kind of stand-in for nulls, just already filled in. We can treat it as its own category.
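One way to check that reading, sketched on a toy frame (the rows below are made up for illustration, not taken from the dataset): if "No internet service" really is a structural placeholder, it should show up on exactly the same rows in every internet-dependent column. The identical count of 1526 in both columns above is at least consistent with that.

```python
import pandas as pd

# Toy stand-in for the real data; values are illustrative only
toy = pd.DataFrame({
    "InternetService": ["DSL", "No", "Fiber optic", "No"],
    "StreamingMovies": ["Yes", "No internet service", "No", "No internet service"],
    "TechSupport": ["No", "No internet service", "Yes", "No internet service"],
})

placeholder = "No internet service"
dependent_cols = ["StreamingMovies", "TechSupport"]

# Boolean frame: True where a dependent column holds the placeholder
flags = toy[dependent_cols].eq(placeholder)

# Rows without internet service, according to the main column
no_internet = toy["InternetService"].eq("No")

# True only if every dependent column flags exactly the no-internet rows
consistent = bool(flags.eq(no_internet, axis=0).all().all())
print(consistent)
```

On the real frame, the same comparison could be run with `df1` in place of `toy` and the full list of internet-dependent columns.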

---

## Data Day Spa

- [ ] Clean the data if necessary (it will be)
  - [x] Clean column names
  - [x] Fix incorrect datatypes
  - [x] Encode categorical columns

In [35]:
# === First, clean up the column names using pyjanitor === #
df2 = (df1.clean_names())
df2.head(2)

| | customerid | gender | seniorcitizen | partner | dependents | tenure | phoneservice | multiplelines | internetservice | onlinesecurity | onlinebackup | deviceprotection | techsupport | streamingtv | streamingmovies | contract | paperlessbilling | paymentmethod | monthlycharges | totalcharges | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |


In [36]:
# === Look at datatypes === #
df2.dtypes

customerid           object
gender               object
seniorcitizen         int64
partner              object
dependents           object
tenure                int64
phoneservice         object
multiplelines        object
internetservice      object
onlinesecurity       object
onlinebackup         object
deviceprotection     object
techsupport          object
streamingtv          object
streamingmovies      object
contract             object
paperlessbilling     object
paymentmethod        object
monthlycharges      float64
totalcharges         object
churn                object
dtype: object

The `totalcharges` column should be float, just like `monthlycharges`.

In [37]:
# === Convert totalcharges to float === #

# First attempt: strip everything except digits and the decimal point
df2["total_charges"] = df2["totalcharges"].str.replace(r"[^0-9.]", "", regex=True)

# Then convert to float
# df2["total_charges"] = df2["total_charges"].astype(float)

In [38]:
# === Convert totalcharges to float === #

# Second attempt: capture the decimal number from each string
# Note: the pattern needs a literal ".", so values without one (whole numbers, blanks) become NaN
df2["total_charges"] = df2["totalcharges"].str.extract(r"(\d*\.\d*)")

# Then convert to float
df2["total_charges"] = df2["total_charges"].astype(float)

In [39]:
# === Remove original column and confirm === #
df2 = df2.drop(columns=["totalcharges", "customerid"])
df2["total_charges"].dtype

dtype('float64')

In [40]:
df2.isnull().sum()

gender                0
seniorcitizen         0
partner               0
dependents            0
tenure                0
phoneservice          0
multiplelines         0
internetservice       0
onlinesecurity        0
onlinebackup          0
deviceprotection      0
techsupport           0
streamingtv           0
streamingmovies       0
contract              0
paperlessbilling      0
paymentmethod         0
monthlycharges        0
churn                 0
total_charges       335
dtype: int64
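The pattern `(\d*\.\d*)` only matches strings that contain a decimal point, so any whole-number charge is turned into NaN along with the genuinely blank entries. That may account for a good share of the 335 nulls here, though I haven't verified the split. A minimal sketch of the difference on toy values (not real rows), using `pd.to_numeric` with `errors="coerce"` as an alternative:

```python
import pandas as pd

# Toy values: a decimal, a whole number, and a blank string (illustrative only)
raw = pd.Series(["29.85", "108", " "])

# The regex approach requires a literal "." so "108" fails to match
via_regex = raw.str.extract(r"(\d*\.\d*)")[0].astype(float)

# to_numeric parses both number forms; unparseable strings become NaN
via_to_numeric = pd.to_numeric(raw, errors="coerce")

print(via_regex.isnull().sum(), via_to_numeric.isnull().sum())
```

With `to_numeric`, only the truly unparseable entries would be dropped by the `dropna` step that follows.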

In [41]:
# === Drop missing values === #
df3 = df2.dropna(axis=0)

#### Encoding

We want to be sure that the encoding for "yes" and "no" is consistent throughout the dataset. To do that, a single mapping dictionary can be passed to `DataFrame.replace`, which applies it to every column at once.

In [43]:
# === Replace the values with numbers === #

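# Set up mapping dictionary
# Note: the mapping applies to every column, so "No" and "Fiber optic" both
# become 0 wherever they appear (e.g. in internetservice)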
mapper = {
    "Male": 0,
    "Female": 1,
    "Yes": 1,
    "No": 0,
    "No internet service": -1,
    "No phone service": -1,
    "Fiber optic": 0,
    "DSL": 1,
    "Month-to-month": 0,
    "Two year": 2,
    "One year": 1,
    "Electronic check": 0,
    "Mailed check": 1,
    "Bank transfer (automatic)": 2,
    "Credit card (automatic)": 3,
}

# Replace throughout dataframe
df4 = df3.replace(to_replace=mapper)
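One side effect of a frame-wide mapping is that two different strings with the same code become indistinguishable inside a column. Here "No" and "Fiber optic" both map to 0. A hedged sketch of a per-column alternative, assuming `internetservice` takes the values DSL, Fiber optic, and No (the `column_maps` dictionary and its codes are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Toy frame; assumes internetservice takes these three values
toy = pd.DataFrame({
    "internetservice": ["DSL", "Fiber optic", "No"],
    "partner": ["Yes", "No", "No"],
})

# Hypothetical per-column mappings: each column gets its own codes
column_maps = {
    "internetservice": {"No": 0, "DSL": 1, "Fiber optic": 2},
    "partner": {"No": 0, "Yes": 1},
}

encoded = toy.copy()
for col, codes in column_maps.items():
    # Series.map leaves NaN for any value missing from the mapping,
    # which makes unmapped categories easy to spot
    encoded[col] = toy[col].map(codes)

print(encoded["internetservice"].nunique())
```

For columns with no natural order, such as `paymentmethod`, one-hot encoding with `pd.get_dummies` is another option worth trying.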

In [44]:
df4.head()

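| | gender | seniorcitizen | partner | dependents | tenure | phoneservice | multiplelines | internetservice | onlinesecurity | onlinebackup | deviceprotection | techsupport | streamingtv | streamingmovies | contract | paperlessbilling | paymentmethod | monthlycharges | churn | total_charges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 0 | 1 | 0 | -1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 29.85 | 0 | 29.85 |
| 1 | 0 | 0 | 0 | 0 | 34 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 56.95 | 0 | 1889.5 |
| 2 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 53.85 | 1 | 108.15 |
| 3 | 0 | 0 | 0 | 0 | 45 | 0 | -1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 42.3 | 0 | 1840.75 |
| 4 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 70.7 | 1 | 151.65 |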


In [45]:
df4.dtypes

gender                int64
seniorcitizen         int64
partner               int64
dependents            int64
tenure                int64
phoneservice          int64
multiplelines         int64
internetservice       int64
onlinesecurity        int64
onlinebackup          int64
deviceprotection      int64
techsupport           int64
streamingtv           int64
streamingmovies       int64
contract              int64
paperlessbilling      int64
paymentmethod         int64
monthlycharges      float64
churn                 int64
total_charges       float64
dtype: object

In [50]:
# === Scale the two float columns === #
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

for col in ["monthlycharges", "total_charges"]:
    array = df4[col].values.reshape(-1, 1)
    df4[col] = scaler.fit_transform(array)
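Because the scaler above is fit on every row, including rows that later land in the test split, a little information about the test data leaks into the features. A minimal sketch of the usual alternative, fitting on the training rows only, using synthetic numbers in place of the two charge columns:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two charge columns (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=20.0, size=(200, 2))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Learn mean and standard deviation from the training rows only
scaler = StandardScaler().fit(X_train)

# Apply the same training statistics to both splits
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))
```

The same pattern would also cover `tenure`, which ranges from 1 to 72 and is left unscaled in this notebook.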

In [51]:
df4.describe()

| | gender | seniorcitizen | partner | dependents | tenure | phoneservice | multiplelines | internetservice | onlinesecurity | onlinebackup | deviceprotection | techsupport | streamingtv | streamingmovies | contract | paperlessbilling | paymentmethod | monthlycharges | churn | total_charges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 | 6708.0 |
| mean | 0.496422 | 0.162791 | 0.481067 | 0.297704 | 32.36449 | 0.903846 | 0.325432 | 0.343918 | 0.0726 | 0.130739 | 0.127013 | 0.076625 | 0.168754 | 0.172928 | 0.686643 | 0.593023 | 1.314252 | -5.514704e-17 | 0.266995 | -1.383641e-16 |
| std | 0.500024 | 0.369202 | 0.499679 | 0.457283 | 24.553474 | 0.294824 | 0.641791 | 0.475049 | 0.705644 | 0.737739 | 0.735861 | 0.708067 | 0.755574 | 0.75739 | 0.832589 | 0.491307 | 1.148544 | 1.000075 | 0.442423 | 1.000075 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | -1.0 | 0.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | 0.0 | 0.0 | -1.549161 | 0.0 | -0.9972627 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.9711837 | 0.0 | -0.829505 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 29.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.1843553 | 0.0 | -0.3897469 |
| 75% | 1.0 | 0.0 | 1.0 | 1.0 | 55.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0.8308921 | 1.0 | 0.6660007 |
| max | 1.0 | 1.0 | 1.0 | 1.0 | 72.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 3.0 | 1.791556 | 1.0 | 2.822466 |


---

## Baseline model

- [x] Create and fit a baseline Keras MLP model to the data

In [52]:
# === Split data into train, test, features, target === #
from sklearn.model_selection import train_test_split

train, test = train_test_split(df4)

X_train = train.drop(columns=["churn"])
X_test = test.drop(columns=["churn"])

y_train = train["churn"]
y_test = test["churn"]

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(5031, 19) (5031,)
(1677, 19) (1677,)


In [53]:
# === Tensorflow / keras imports === #
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [55]:
# Hyperparameters: untuned
inputs = X_train.shape[1]
epochs = 50
batch_size = 10

# Build the model
model = Sequential()
model.add(Dense(16, activation='sigmoid', input_shape=(inputs,)))
model.add(Dense(8, activation='tanh'))
# Sigmoid output so binary_crossentropy receives probabilities in [0, 1]
model.add(Dense(1, activation='sigmoid'))

# Compile the model according to the specs
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit model
model.fit(X_train, y_train, 
          validation_data=(X_test, y_test),
          epochs=epochs,
          batch_size=batch_size,
         )

Train on 5031 samples, validate on 1677 samples
Epoch 1/50
...
Epoch 50/50

<tensorflow.python.keras.callbacks.History at 0x7f0e96694be0>

In [56]:
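# Returns [loss, accuracy]. Note: this scores the training data;
# pass X_test, y_test instead for a held-out estimate
model.evaluate(X_train, y_train)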



[0.4092990502439657, 0.81117076]
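For context on the 0.811 accuracy, it helps to know what a model that always predicts "no churn" would score. The `describe()` output above puts the churn mean at about 0.267, so that naive baseline sits around 0.733. A small sketch of the calculation on toy labels with the same class balance (the array below is constructed for illustration, not the real target):

```python
import numpy as np

# Toy labels with roughly the churn rate reported by describe() (~0.267)
y = np.array([0] * 733 + [1] * 267)

# Most frequent class in the labels
majority_class = int(np.bincount(y).argmax())

# Accuracy of always predicting that class
baseline_accuracy = float(np.mean(y == majority_class))
print(majority_class, baseline_accuracy)
```

On the real data the same two lines could be applied to `y_train` or `y_test` to get the exact figure for each split.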

---

## Hyperamatunage

- [ ] Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 

In [57]:
# === Hyper imports === #
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

In [None]:
# === Hyperparameterized model === #
from tensorflow.keras.optimizers import Adam

# Function to create model, required for KerasClassifier
def create_model(
    learning_rate=0.01,
    init_mode_1="glorot_uniform",
    init_mode_2="zero",
    activation_1="relu",
    activation_2="tanh",
):

    # Create model
    model = Sequential()
    model.add(Dense(16, activation="sigmoid", input_shape=(inputs,)))
    model.add(Dense(12, activation=activation_1, kernel_initializer=init_mode_1))
    model.add(Dense(8, activation=activation_2, kernel_initializer=init_mode_2))
    model.add(Dense(1, activation="sigmoid"))

    # Compile model, passing learning_rate to the optimizer so the grid can tune it
    model.compile(
        loss="binary_crossentropy",
        optimizer=Adam(learning_rate=learning_rate),
        metrics=["accuracy"],
    )
    return model

# Instantiate the classifier model
model = KerasClassifier(build_fn=create_model, verbose=0)

# Define the grid search parameters
param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [20],
    "learning_rate": [.001, .01, .1, .2, .3, .5],
    "init_mode_1": ["lecun_uniform", "he_normal", "glorot_uniform"],
    "init_mode_2": ["zero", "he_normal", "glorot_uniform"],
    "activation_1": ["relu", "sigmoid", "tanh"],
    "activation_2": ["tanh", "relu", "sigmoid"],
}

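# Create Grid Search
# Note: this grid has 2,916 parameter combinations; with cv=3 that is 8,748
# model fits of 20 epochs each on a single job, so expect a very long run
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=1)
grid_result = grid.fit(X_train, y_train)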

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']

for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 
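Before launching a search like this, it's worth counting how many models it will train. A quick sketch using scikit-learn's `ParameterGrid` on the same grid as above:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the search above
param_grid = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "epochs": [20],
    "learning_rate": [.001, .01, .1, .2, .3, .5],
    "init_mode_1": ["lecun_uniform", "he_normal", "glorot_uniform"],
    "init_mode_2": ["zero", "he_normal", "glorot_uniform"],
    "activation_1": ["relu", "sigmoid", "tanh"],
    "activation_2": ["tanh", "relu", "sigmoid"],
}

# Number of distinct parameter combinations in the grid
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is trained once per cross-validation fold
cv_folds = 3
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)
```

If the count is too large to be practical, trimming each list to two or three values, or tuning a few parameters at a time, brings it down quickly.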

In [None]:
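# Score the best model (refit by GridSearchCV on all of X_train) on the held-out test set
model = grid_result.best_estimator_
model.score(X_test, y_test)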

---
---

## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?
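
As a starting point for the random search stretch goal, here is a minimal sketch of the sampling idea using scikit-learn's `ParameterSampler`. The parameter lists are a trimmed, illustrative subset rather than a recommended search space, and no model is trained here:

```python
from sklearn.model_selection import ParameterSampler

# Illustrative subset of the search space (values chosen for the example)
param_distributions = {
    "batch_size": [10, 20, 40, 60, 80, 100],
    "learning_rate": [0.001, 0.01, 0.1],
    "activation_1": ["relu", "sigmoid", "tanh"],
}

# Draw a fixed number of random candidates instead of trying every combination
candidates = list(
    ParameterSampler(param_distributions, n_iter=10, random_state=0)
)

for params in candidates[:3]:
    print(params)
```

`RandomizedSearchCV` wraps the same idea with cross validation: it takes `param_distributions` and `n_iter` where `GridSearchCV` takes `param_grid`.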