# Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: <https://drive.google.com/file/d/1dfbAsM9DwA7tYhInyflIpZnYs7VT-0AQ/view> 

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
  - batch_size
  - training epochs
  - optimizer
  - learning rate (if applicable to optimizer)
  - momentum (if applicable to optimizer)
  - activation functions
  - network weight initialization
  - dropout regularization
  - number of neurons in the hidden layer

You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters.

Try to get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?
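
A minimal sketch of what a grid search with cross validation can look like. To stay self-contained it uses scikit-learn's `MLPClassifier` on synthetic data as a stand-in for the Keras model, so the parameter names (`hidden_layer_sizes`, `learning_rate_init`) and the `X_demo` / `y_demo` arrays are assumptions for illustration, not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption): 19 features, like the encoded churn table.
X_demo, y_demo = make_classification(n_samples=200, n_features=19, random_state=42)

# Every combination in this grid is evaluated: 2 * 2 * 2 = 8 candidates.
param_grid = {
    "hidden_layer_sizes": [(16,), (32,)],
    "learning_rate_init": [0.001, 0.01],
    "batch_size": [32, 64],
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

search = GridSearchCV(
    MLPClassifier(max_iter=100, random_state=42),
    param_grid,
    cv=cv,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

print("Best params:", search.best_params_)
print("Best mean CV accuracy:", round(search.best_score_, 3))
```

The same pattern should carry over to a Keras model once it is wrapped in a scikit-learn compatible estimator; only the estimator and the parameter names would change.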


### Imports & Data Loading

In [31]:
from pprint import pprint

import numpy as np
import pandas as pd

from sklearn.preprocessing import LabelEncoder, LabelBinarizer, OneHotEncoder, StandardScaler

from keras.models import Sequential
from keras.layers import Dense

In [48]:
df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


### Data Cleaning

In [49]:
print('NaN Counts')
pprint(df.isna().sum())

NaN Counts
customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64


Only 3 features are numeric right now. I'll `.describe()` the non-numeric columns, encode most of them, and then run `.describe()` again.

In [50]:
df.describe()

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges
count,7043.0,7043.0,7043.0
mean,0.162147,32.371149,64.761692
std,0.368612,24.559481,30.090047
min,0.0,0.0,18.25
25%,0.0,9.0,35.5
50%,0.0,29.0,70.35
75%,0.0,55.0,89.85
max,1.0,72.0,118.75


In [51]:
df.describe(exclude='number').T

Unnamed: 0,count,unique,top,freq
customerID,7043,7043,6543-XRMYR,1
gender,7043,2,Male,3555
Partner,7043,2,No,3641
Dependents,7043,2,No,4933
PhoneService,7043,2,Yes,6361
MultipleLines,7043,3,No,3390
InternetService,7043,3,Fiber optic,3096
OnlineSecurity,7043,3,No,3498
OnlineBackup,7043,3,No,3088
DeviceProtection,7043,3,No,3095


Now I'll encode the above features, and remove the `customerID`.

In [52]:
# A column of strings representing money... better to convert to floats.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df = df.drop(columns='customerID')

In [54]:
categorical_features = df.describe(exclude='number').columns

oh_enc = OneHotEncoder(sparse=False)
blah = oh_enc.fit_transform(df[categorical_features])
print(blah.shape)
blah
# df[categorical_features].head()

(7043, 43)


array([[1., 0., 0., ..., 0., 1., 0.],
       [0., 1., 1., ..., 1., 1., 0.],
       [0., 1., 1., ..., 1., 0., 1.],
       ...,
       [1., 0., 0., ..., 0., 1., 0.],
       [0., 1., 0., ..., 1., 0., 1.],
       [0., 1., 1., ..., 0., 1., 0.]])

In [57]:
np.append(df.drop(columns=categorical_features).to_numpy(), blah).shape

(331021,)
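
The `(331021,)` shape above is 7043 × 47 flattened into one dimension: `np.append` without an `axis` argument flattens both inputs before joining them. A small sketch on toy arrays (the names `numeric_part` and `onehot_part` are made up for illustration) shows the difference and one way to keep the rows intact:

```python
import numpy as np

# Toy stand-ins: 3 rows, 2 numeric columns and 4 one-hot columns.
numeric_part = np.arange(6, dtype=float).reshape(3, 2)
onehot_part = np.ones((3, 4))

# Without axis, np.append flattens both arrays into a single 1-D array.
flattened = np.append(numeric_part, onehot_part)

# hstack (or np.append(..., axis=1)) joins column-wise and keeps one row per sample.
combined = np.hstack([numeric_part, onehot_part])

print(flattened.shape)  # (18,)
print(combined.shape)   # (3, 6)
```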

In [7]:
# A column of strings representing money... better to convert to floats.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')

categorical_features = df.describe(exclude='number').columns

# Replace each categorical column with integer labels (one integer per category)
for col in categorical_features:
    label_enc = LabelEncoder()
    encoded_df = label_enc.fit_transform(df[col])
    df[col] = encoded_df

df[categorical_features].head()

Unnamed: 0,gender,Partner,Dependents,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,Churn
0,0,1,0,0,1,0,0,2,0,0,0,0,0,1,2,0
1,1,0,0,1,0,0,2,0,2,0,0,0,1,0,3,0
2,1,0,0,1,0,0,2,2,0,0,0,0,0,1,3,1
3,1,0,0,0,1,0,2,0,2,2,0,0,1,0,0,0
4,0,0,0,1,0,1,0,0,0,0,0,0,0,1,2,1


In [8]:
df.describe()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
count,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7043.0,7032.0,7043.0
mean,0.504756,0.162147,0.483033,0.299588,32.371149,0.903166,0.940508,0.872923,0.790004,0.906432,0.904444,0.797104,0.985376,0.992475,0.690473,0.592219,1.574329,64.761692,2283.300441,0.26537
std,0.500013,0.368612,0.499748,0.45811,24.559481,0.295752,0.948554,0.737796,0.859848,0.880162,0.879949,0.861551,0.885002,0.885091,0.833755,0.491457,1.068104,30.090047,2266.771362,0.441561
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.25,18.8,0.0
25%,0.0,0.0,0.0,0.0,9.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,35.5,401.45,0.0
50%,1.0,0.0,0.0,0.0,29.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,2.0,70.35,1397.475,0.0
75%,1.0,0.0,1.0,1.0,55.0,1.0,2.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,1.0,1.0,2.0,89.85,3794.7375,1.0
max,1.0,1.0,1.0,1.0,72.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,1.0,3.0,118.75,8684.8,1.0


In [9]:
# Verify feature types are numeric for model training
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
gender              7043 non-null int64
SeniorCitizen       7043 non-null int64
Partner             7043 non-null int64
Dependents          7043 non-null int64
tenure              7043 non-null int64
PhoneService        7043 non-null int64
MultipleLines       7043 non-null int64
InternetService     7043 non-null int64
OnlineSecurity      7043 non-null int64
OnlineBackup        7043 non-null int64
DeviceProtection    7043 non-null int64
TechSupport         7043 non-null int64
StreamingTV         7043 non-null int64
StreamingMovies     7043 non-null int64
Contract            7043 non-null int64
PaperlessBilling    7043 non-null int64
PaymentMethod       7043 non-null int64
MonthlyCharges      7043 non-null float64
TotalCharges        7032 non-null float64
Churn               7043 non-null int64
dtypes: float64(2), int64(18)
memory usage: 1.1 MB
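
Note that `TotalCharges` has 7032 non-null values out of 7043: `errors='coerce'` turned 11 unparseable entries into `NaN`, and the earlier NaN count ran before that conversion. `StandardScaler` passes `NaN` through, so these rows would reach the model unless they are handled. A sketch of two options on a toy frame, assuming the bad entries are blank strings (the `demo` frame and the choice of `0.0` as the fill value are illustrative assumptions):

```python
import pandas as pd

# Toy stand-in (assumption): one blank TotalCharges entry among valid ones.
demo = pd.DataFrame({
    "tenure": [0, 1, 34],
    "TotalCharges": [" ", "29.85", "1889.5"],
})

# Coercion turns the unparseable blank into NaN.
demo["TotalCharges"] = pd.to_numeric(demo["TotalCharges"], errors="coerce")
n_missing = int(demo["TotalCharges"].isna().sum())
print("Missing after coercion:", n_missing)

# Option A: drop the affected rows.
dropped = demo.dropna(subset=["TotalCharges"])

# Option B: fill with a constant (0.0 here is an assumption, e.g. nothing billed yet).
filled = demo.assign(TotalCharges=demo["TotalCharges"].fillna(0.0))

print(dropped.shape, filled["TotalCharges"].tolist())
```

Either option would need to run before the `X`, `y` split so that no `NaN` reaches the scaler or the network.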


### Normalize the input data & convert to NumPy arrays

In [10]:
from keras.utils import normalize

# X, y split into NumPy arrays
X = df.drop(columns='Churn').values.astype('float64')
y = LabelBinarizer().fit_transform(df['Churn']).flatten().astype('float64')

X = StandardScaler().fit_transform(X)
# X = normalize(X, axis=1, order=2)
print(X.shape, type(X))
print(X)
print(y.shape, type(y))
print(y)

(7043, 19) <class 'numpy.ndarray'>
[[-1.00955867 -0.43991649  1.03453023 ...  0.39855772 -1.16032292
  -0.99419409]
 [ 0.99053183 -0.43991649 -0.96662231 ...  1.33486261 -0.25962894
  -0.17373982]
 [ 0.99053183 -0.43991649 -0.96662231 ...  1.33486261 -0.36266036
  -0.95964911]
 ...
 [-1.00955867 -0.43991649  1.03453023 ...  0.39855772 -1.1686319
  -0.85451414]
 [ 0.99053183  2.27315869  1.03453023 ...  1.33486261  0.32033821
  -0.87209546]
 [ 0.99053183 -0.43991649 -0.96662231 ... -1.47405205  1.35896134
   2.01234407]]
(7043,) <class 'numpy.ndarray'>
[0. 0. 1. ... 0. 1. 0.]


### Create and fit a baseline Keras MLP model to the data

In [24]:
from keras.optimizers import SGD

np.random.seed(42)

inputs = X.shape[1]  # 19
epochs = 20
batch_size = 1

model = Sequential()
model.add(Dense(40, input_shape=(inputs,), activation='relu'))
model.add(Dense(40, activation='relu'))
# Sigmoid output so binary_crossentropy receives probabilities in [0, 1]
model.add(Dense(1, activation='sigmoid'))

model.summary()
optimizer = SGD(lr=0.001, clipvalue=0.5, clipnorm=1.0)

model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model.fit(X, y, epochs=epochs, batch_size=batch_size)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_37 (Dense)             (None, 40)                800       
_________________________________________________________________
dense_38 (Dense)             (None, 40)                1640      
_________________________________________________________________
dense_39 (Dense)             (None, 1)                 41        
Total params: 2,481
Trainable params: 2,481
Non-trainable params: 0
_________________________________________________________________
Epoch 1/20

KeyboardInterrupt: 
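
The run above was interrupted during the first epoch. With `batch_size = 1`, every row triggers its own weight update, which is likely why training was so slow. A quick sketch of the arithmetic (the helper name `updates_per_epoch` is hypothetical):

```python
import math


def updates_per_epoch(n_rows: int, batch_size: int) -> int:
    """Number of weight updates in one pass over the data."""
    return math.ceil(n_rows / batch_size)


n_rows = 7043  # rows in the churn dataset
for batch_size in (1, 32, 128):
    print(batch_size, updates_per_epoch(n_rows, batch_size))
# 1 7043
# 32 221
# 128 56
```

Larger batches mean far fewer updates per epoch, which is one reason `batch_size` is on the list of hyperparameters to tune.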

## Stretch Goals:

- Try to implement Random Search hyperparameter tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset
- Practice hyperparameter tuning on other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
  - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
  - Can you implement both perceptron and MLP models in Keras and tune their hyperparameters with cross validation?
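
For the Random Search stretch goal, a rough sketch of the idea: instead of trying every combination, sample a fixed number of candidates from distributions. As a stand-in for the Keras model this uses scikit-learn's `MLPClassifier` on synthetic data, so the estimator, the parameter names, and the `X_demo` / `y_demo` arrays are assumptions for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption), not the churn dataset.
X_demo, y_demo = make_classification(n_samples=200, n_features=19, random_state=42)

# Lists are sampled uniformly; the learning rate is drawn on a log scale.
param_distributions = {
    "hidden_layer_sizes": [(16,), (32,), (64,)],
    "batch_size": [16, 32, 64],
    "learning_rate_init": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=100, random_state=42),
    param_distributions,
    n_iter=4,          # only 4 sampled candidates are evaluated
    cv=3,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_demo, y_demo)

print("Best params:", search.best_params_)
print("Best mean CV accuracy:", round(search.best_score_, 3))
```

The budget is set by `n_iter` rather than by the size of the grid, which tends to matter when many hyperparameters are tuned at once.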