<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>

# Hyperparameter Tuning

## *Data Science Unit 4 Sprint 2 Assignment 4*

## Your Mission, should you choose to accept it...

To hyperparameter tune and extract every ounce of accuracy out of this telecom customer churn dataset: [Available Here](https://lambdaschool-data-science.s3.amazonaws.com/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv)

## Requirements

- Load the data
- Clean the data if necessary (it will be)
- Create and fit a baseline Keras MLP model to the data.
- Hyperparameter tune (at least) the following parameters:
 - batch_size
 - training epochs
 - optimizer
 - learning rate (if applicable to optimizer)
 - momentum (if applicable to optimizer)
 - activation functions
 - network weight initialization
 - dropout regularization
 - number of neurons in the hidden layer
 
 You must use Grid Search and Cross Validation for your initial pass of the above hyperparameters.
 
 Try and get the maximum accuracy possible out of this data! You'll save big telecoms millions! Doesn't that sound great?


### DOWNLOAD DATA

In [1]:
import pandas as pd
import numpy as np

In [2]:
DATA_URL = './data/WA_Fn-UseC_-Telco-Customer-Churn+(1).csv'

In [3]:
df = pd.read_csv(DATA_URL)

In [4]:
df.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [5]:
df.shape

(7043, 21)

### CLEAN UP

- Convert as many columns as possible to 1/0 before encoding the remaining categorical columns

In [6]:
df.columns

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

In [7]:
df['gender'].value_counts()

Male      3555
Female    3488
Name: gender, dtype: int64

In [8]:
df['Partner'].value_counts()

No     3641
Yes    3402
Name: Partner, dtype: int64

In [9]:
df['Dependents'].value_counts()

No     4933
Yes    2110
Name: Dependents, dtype: int64

In [10]:
df['PhoneService'].value_counts()

Yes    6361
No      682
Name: PhoneService, dtype: int64

In [11]:
df['OnlineSecurity'].value_counts()

No                     3498
Yes                    2019
No internet service    1526
Name: OnlineSecurity, dtype: int64

In [12]:
df['OnlineBackup'].value_counts()

No                     3088
Yes                    2429
No internet service    1526
Name: OnlineBackup, dtype: int64

In [13]:
df['DeviceProtection'].value_counts()

No                     3095
Yes                    2422
No internet service    1526
Name: DeviceProtection, dtype: int64

In [14]:
df['TechSupport'].value_counts()

No                     3473
Yes                    2044
No internet service    1526
Name: TechSupport, dtype: int64

In [15]:
df['StreamingTV'].value_counts()

No                     2810
Yes                    2707
No internet service    1526
Name: StreamingTV, dtype: int64

In [16]:
df['StreamingMovies'].value_counts()

No                     2785
Yes                    2732
No internet service    1526
Name: StreamingMovies, dtype: int64

In [17]:
df['Contract'].value_counts()

Month-to-month    3875
Two year          1695
One year          1473
Name: Contract, dtype: int64

In [18]:
df['PaperlessBilling'].value_counts()

Yes    4171
No     2872
Name: PaperlessBilling, dtype: int64

In [20]:
df['Churn'].value_counts()

No     5174
Yes    1869
Name: Churn, dtype: int64

#### USING LABEL ENCODERS
- https://stackoverflow.com/questions/40901770/is-there-a-simple-way-to-change-a-column-of-yes-no-to-1-0-in-a-pandas-dataframe
- https://stackoverflow.com/questions/24458645/label-encoding-across-multiple-columns-in-scikit-learn

Something cool I learned: label encoders can convert binary classes (as well as other categorical data) to numbers. I knew this, but did not realize how succinct it could be.
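
As a minimal sketch of what the encoder does to a Yes/No column (the small `toy` frame below is made up for illustration and is not part of the churn data): `LabelEncoder` sorts the distinct values and replaces each with its position, so `No` becomes 0 and `Yes` becomes 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column, for illustration only
toy = pd.DataFrame({'Partner': ['Yes', 'No', 'No', 'Yes']})

le = LabelEncoder()
toy['Partner_encoded'] = le.fit_transform(toy['Partner'])

# classes_ holds the original labels; a label's position is its integer code
classes = list(le.classes_)
encoded = toy['Partner_encoded'].tolist()
print(classes, encoded)
```

The codes follow sorted label order rather than any meaningful ranking, which is worth keeping in mind for columns with more than two values such as `PaymentMethod`.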

In [21]:
df.isnull().sum()

customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64

In [22]:
def cat_var(df): 
    """Identify categorical features. 

    Parameters
    ----------
    df: original df after missing operations 

    Returns
    -------
    cat_var_df: summary df with col index and col name for all categorical vars
    """
    col_type = df.dtypes
    col_names = list(df)

    cat_var_index = [i for i, x in enumerate(col_type) if x=='object']
    cat_var_name = [x for i, x in enumerate(col_names) if i in cat_var_index]

    cat_var_df = pd.DataFrame({'cat_ind': cat_var_index, 
                               'cat_name': cat_var_name})

    return cat_var_df

In [23]:
from sklearn.preprocessing import LabelEncoder 

In [24]:
def column_encoder(df, cat_var_list):
    """Label-encode every categorical (object-dtype) column of the dataframe.

    Parameters
    ----------
    df: input dataframe (modified in place)
    cat_var_list: summary df of categorical features, from cat_var function

    Returns
    -------
    df: the same dataframe with its categorical features label-encoded
    """

    # Use the summary passed in by the caller instead of recomputing it
    cat_list = cat_var_list.loc[:, 'cat_name']

    for cat_feature in cat_list:
        # fit_transform learns the classes and maps each one to an integer code
        le = LabelEncoder()
        df[cat_feature] = le.fit_transform(df[cat_feature])

    return df

In [25]:
df1 = df.copy()

In [26]:
cat_var_list = cat_var(df1)

In [27]:
df2 = column_encoder(df1, cat_var_list)

In [28]:
df2

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,5375,0,0,1,0,1,0,1,0,0,...,0,0,0,0,0,1,2,29.85,2505,0
1,3962,1,0,0,0,34,1,0,0,2,...,2,0,0,0,1,0,3,56.95,1466,0
2,2564,1,0,0,0,2,1,0,0,2,...,0,0,0,0,0,1,3,53.85,157,1
3,5535,1,0,0,0,45,0,1,0,2,...,2,2,0,0,1,0,0,42.30,1400,0
4,6511,0,0,0,0,2,1,0,1,0,...,0,0,0,0,0,1,2,70.70,925,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7038,4853,1,0,1,1,24,1,2,0,2,...,2,2,2,2,1,1,3,84.80,1597,0
7039,1525,0,0,1,1,72,1,2,1,0,...,2,0,2,2,1,1,1,103.20,5698,0
7040,3367,0,0,1,1,11,0,1,0,2,...,0,0,0,0,0,1,2,29.60,2994,0
7041,5934,1,1,1,0,4,1,2,1,0,...,0,0,0,0,0,1,3,74.40,2660,1
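
In the encoded table, `TotalCharges` holds integer codes (2505, 1466, ...) instead of the dollar amounts shown by `df.head()` (29.85, 1889.5, ...). That happens when pandas reads the column as `object`, so `cat_var` treats it as categorical and it gets label-encoded. A possible fix, sketched here on a made-up frame and assuming the non-numeric entries are blank strings, is to convert the column with `pd.to_numeric` before encoding:

```python
import pandas as pd

# Hypothetical rows; the blank string stands in for whatever non-numeric
# entries made pandas read the column as object dtype
toy = pd.DataFrame({'TotalCharges': ['29.85', '1889.5', ' ', '108.15']})

# errors='coerce' turns anything unparseable into NaN instead of raising
toy['TotalCharges'] = pd.to_numeric(toy['TotalCharges'], errors='coerce')

# One possible way to handle the resulting gaps: fill with the median
n_missing = int(toy['TotalCharges'].isna().sum())
toy['TotalCharges'] = toy['TotalCharges'].fillna(toy['TotalCharges'].median())
print(toy['TotalCharges'].dtype, n_missing)
```

Whether to fill or drop the affected rows is a judgment call; the median fill above is only one option.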


In [29]:
#Initialize majority class
y = df2['Churn']
majority = y.mode()[0]
y_pred = [majority]*len(y)

In [30]:
from sklearn.metrics import accuracy_score

In [31]:
accuracy_score(y, y_pred)

0.7346301292063041

### TRAIN/TEST SPLIT

In [32]:
from sklearn.model_selection import train_test_split

In [33]:
train, test = train_test_split(
    df2, 
    test_size = 0.20, 
    stratify = df2['Churn'],
    random_state = 42
    )

#### BASELINE

In [34]:
train['Churn'].value_counts(normalize=True)

0    0.734647
1    0.265353
Name: Churn, dtype: float64

#### ARRANGE DATA INTO X FEATURES MATRIX AND Y TARGET VECTOR

In [35]:
target = 'Churn'
X_train = train.drop(columns=['customerID',target])
y_train = train[target]
X_test = test.drop(columns=['customerID',target])
y_test = test[target]

In [36]:
X_train.columns

Index(['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure',
       'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
       'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
       'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod',
       'MonthlyCharges', 'TotalCharges'],
      dtype='object')
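
The feature matrix mixes 0/1 flags with much larger values such as `tenure` and `MonthlyCharges`. Neural networks often train more stably on standardized inputs, so one optional step is to scale the features, fitting the scaler on the training split only. This was not part of the original run, and the scores reported in this notebook were obtained without it. A sketch on made-up arrays:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for X_train / X_test: a 0/1 flag, tenure, monthly charges
X_train_toy = np.array([[0, 1, 29.85], [1, 34, 56.95], [1, 2, 53.85], [0, 45, 42.30]])
X_test_toy = np.array([[1, 2, 70.70]])

scaler = StandardScaler()
# Fit on the training split only, then reuse the same statistics for the test split
X_train_scaled = scaler.fit_transform(X_train_toy)
X_test_scaled = scaler.transform(X_test_toy)

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```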

### HYPERPARAMETER TUNING

Hyperparameter tune (at least) the following parameters:

- batch_size ✓
- training epochs ✓
- optimizer ✓
- activation functions ✓

Will tune with RandomizedSearchCV:
- learning rate (if applicable to optimizer)
- momentum (if applicable to optimizer)
- network weight initialization 
- dropout regularization 
- number of neurons in the hidden layer
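
`RandomizedSearchCV` samples a fixed number of configurations from the search space rather than trying every combination. The sketch below uses scikit-learn's `ParameterSampler` to show what such draws could look like for the remaining hyperparameters. The candidate values and the parameter names (`learning_rate`, `momentum`, `init`, `dropout_rate`, `neurons`) are illustrative assumptions; they would need to match the keyword arguments of whatever `create_model` function is tuned.

```python
from sklearn.model_selection import ParameterSampler

# Illustrative search space; the values are assumptions, not tuned recommendations
param_distributions = {
    'learning_rate': [0.001, 0.01, 0.1],
    'momentum': [0.0, 0.5, 0.9],
    'init': ['glorot_uniform', 'he_normal', 'lecun_uniform'],
    'dropout_rate': [0.0, 0.2, 0.4],
    'neurons': [16, 32, 64, 128],
}

# Draw 5 of the 3*3*3*3*4 = 324 possible combinations
candidates = list(ParameterSampler(param_distributions, n_iter=5, random_state=42))

for params in candidates:
    print(params)
```

Passing a dictionary like this as `param_distributions` to `RandomizedSearchCV` with `n_iter=5` would fit and cross-validate only that many sampled configurations, instead of all 324.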


In [51]:
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

In [38]:
inputs  = X_train.shape[1]
inputs

19

#### Model 1

In [43]:
# fix NumPy's random seed (TensorFlow's own RNG also needs tf.random.set_seed for reproducible runs)
seed = 7
np.random.seed(seed)

# Function to create model, required for KerasClassifier
def create_model():
    
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=inputs, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
search = grid_result.best_params_
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7715697884559631 using {'batch_size': 20, 'epochs': 20}
Means: 0.7680112242698669, Stdev: 0.024855562966980244 with: {'batch_size': 10, 'epochs': 20}
Means: 0.7715697884559631, Stdev: 0.025624836284666944 with: {'batch_size': 20, 'epochs': 20}
Means: 0.7566457748413086, Stdev: 0.03894861657197795 with: {'batch_size': 40, 'epochs': 20}
Means: 0.7642763376235961, Stdev: 0.036728566000628814 with: {'batch_size': 60, 'epochs': 20}
Means: 0.7577220439910889, Stdev: 0.0493439939184053 with: {'batch_size': 80, 'epochs': 20}
Means: 0.6622490763664246, Stdev: 0.12251863124911333 with: {'batch_size': 100, 'epochs': 20}
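
The printed lines above can be easier to compare as a sorted table. A small sketch, assuming a dictionary shaped like `grid_result.cv_results_`; the helper name `summarize_cv_results` is made up here, and the sample values are rounded copies of the Model 1 results:

```python
import pandas as pd

def summarize_cv_results(cv_results):
    """Return one row per parameter setting, best mean score first."""
    table = pd.DataFrame(list(cv_results['params']))
    table['mean_test_score'] = list(cv_results['mean_test_score'])
    table['std_test_score'] = list(cv_results['std_test_score'])
    return table.sort_values('mean_test_score', ascending=False).reset_index(drop=True)

# Rounded copies of the Model 1 scores printed above
sample_results = {
    'params': [{'batch_size': b, 'epochs': 20} for b in [10, 20, 40, 60, 80, 100]],
    'mean_test_score': [0.7680, 0.7716, 0.7566, 0.7643, 0.7577, 0.6622],
    'std_test_score': [0.0249, 0.0256, 0.0389, 0.0367, 0.0493, 0.1225],
}

summary = summarize_cv_results(sample_results)
print(summary)
```

With a real search, `summarize_cv_results(grid_result.cv_results_)` should give the same layout. The spread matters as much as the mean here: batch size 100 has a standard deviation near 0.12, so its folds disagree widely.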


#### Model 2

In [45]:
# fix NumPy's random seed (TensorFlow's own RNG also needs tf.random.set_seed for reproducible runs)
seed = 7
np.random.seed(seed)

# Function to create model, required for KerasClassifier
def create_model():
    
    # create model
    model = Sequential()
    model.add(Dense(64, input_dim=inputs, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [20],
              'epochs': [20, 50, 75]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
search = grid_result.best_params_
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7719232916831971 using {'batch_size': 20, 'epochs': 75}
Means: 0.7145980834960938, Stdev: 0.09617776758201642 with: {'batch_size': 20, 'epochs': 20}
Means: 0.6781951427459717, Stdev: 0.08908856460980505 with: {'batch_size': 20, 'epochs': 50}
Means: 0.7719232916831971, Stdev: 0.04157771147871243 with: {'batch_size': 20, 'epochs': 75}


#### Model 3

In [52]:
# fix NumPy's random seed (TensorFlow's own RNG also needs tf.random.set_seed for reproducible runs)
seed = 7
np.random.seed(seed)

# Function to create model, required for KerasClassifier
def create_model(optimizer='adam'):
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(inputs,)))
    model.add(Dense(32, activation='relu'))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, activation='sigmoid'))  
    
    #Compile model
    model.compile(loss = "binary_crossentropy", optimizer = optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, epochs=75, batch_size=20, verbose=0) 

# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
search = grid_result.best_params_
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7893166422843934 using {'optimizer': 'RMSprop'}
Means: 0.7348248481750488, Stdev: 0.011299655878343624 with: {'optimizer': 'SGD'}
Means: 0.7893166422843934, Stdev: 0.014089292449175497 with: {'optimizer': 'RMSprop'}
Means: 0.784165358543396, Stdev: 0.00972925385076679 with: {'optimizer': 'Adagrad'}
Means: 0.7300333619117737, Stdev: 0.013374783588156563 with: {'optimizer': 'Adadelta'}
Means: 0.7552377343177795, Stdev: 0.013983304669407372 with: {'optimizer': 'Adam'}
Means: 0.7520410537719726, Stdev: 0.020348469786460734 with: {'optimizer': 'Adamax'}
Means: 0.7598482966423035, Stdev: 0.03155538913317772 with: {'optimizer': 'Nadam'}


#### Model 4

In [54]:
# fix NumPy's random seed (TensorFlow's own RNG also needs tf.random.set_seed for reproducible runs)
seed = 7
np.random.seed(seed)

# Function to create model, required for KerasClassifier
def create_model(activation = 'relu'):
    model = Sequential()
    model.add(Dense(64, activation = activation, input_shape=(inputs,)))
    model.add(Dense(32, activation = activation))
    model.add(Dropout(rate=0.2))
    model.add(Dense(1, activation='sigmoid'))  
    
    #Compile model
    model.compile(loss = "binary_crossentropy", optimizer = 'RMSprop', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, epochs=75, batch_size=20, verbose=0) 

# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
search = grid_result.best_params_
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7839880466461182 using {'activation': 'softsign'}
Means: 0.696493637561798, Stdev: 0.09437899483369952 with: {'activation': 'softmax'}
Means: 0.7731631755828857, Stdev: 0.02304453866013961 with: {'activation': 'softplus'}
Means: 0.7839880466461182, Stdev: 0.013022461774514788 with: {'activation': 'softsign'}
Means: 0.7706777334213257, Stdev: 0.022454343441841586 with: {'activation': 'relu'}
Means: 0.7660635828971862, Stdev: 0.01773752962319886 with: {'activation': 'tanh'}
Means: 0.7593190670013428, Stdev: 0.017661071944078267 with: {'activation': 'sigmoid'}
Means: 0.7671259760856628, Stdev: 0.01984849571658241 with: {'activation': 'hard_sigmoid'}
Means: 0.7797247052192688, Stdev: 0.018782291651571255 with: {'activation': 'linear'}


## Stretch Goals:

- Try to implement Random Search Hyperparameter Tuning on this dataset
- Try to implement Bayesian Optimization tuning on this dataset using hyperas or hyperopt (if you're brave)
- Practice hyperparameter tuning other datasets that we have looked at. How high can you get MNIST? Above 99%?
- Study for the Sprint Challenge
 - Can you implement both perceptron and MLP models from scratch with forward and backpropagation?
 - Can you implement both perceptron and MLP models in keras and tune their hyperparameters with cross validation?