# Tuning and Optimizing Neural Networks - Lab

## Introduction

Now that we've discussed some regularization, initialization and optimization techniques, it's time to synthesize those concepts into a cohesive modelling pipeline.  

With this pipeline, you will not only fit an initial model but also tune hyperparameters for several regularization techniques. Your final model selection will be based on the validation metrics across these models, with the hold-out test set reserved for a final check. This more closely simulates a problem you might face in practice, along with the modelling decisions you are apt to encounter along the way.  

Recall that our end objective is to achieve a balance between overfitting and underfitting. We've discussed the bias-variance tradeoff and the role of regularization in reducing overfitting on training data and improving generalization to new cases. Common frameworks for such a procedure include a train/validate/test methodology when data is plentiful, and k-fold cross-validation for smaller, more limited datasets. In this lab, you'll perform the latter, as the dataset in question is fairly limited. 

## Objectives

You will be able to:

* Implement a K-folds cross validation modelling pipeline
* Apply normalization as a preprocessing technique
* Apply regularization techniques to improve your model's generalization
* Choose an appropriate optimization strategy 

## Loading the Data

In [55]:
#Your code here; load and preview the dataset

import pandas as pd

data = pd.read_csv('loan_final.csv')
data = data.dropna()
data.columns

Index(['loan_amnt', 'funded_amnt_inv', 'term', 'int_rate', 'installment',
       'grade', 'emp_length', 'home_ownership', 'annual_inc',
       'verification_status', 'loan_status', 'purpose', 'addr_state',
       'total_acc', 'total_pymnt', 'application_type'],
      dtype='object')

## Defining the Problem

Set up the problem by defining X and Y. 

For this problem use the following variables for X:
* loan_amnt
* home_ownership
* funded_amnt_inv
* verification_status
* emp_length
* installment
* annual_inc

Be sure to use dummy variables for categorical variables and to normalize numerical quantities. Be sure to also remove any rows with null data.  

For Y, we are looking to build a model to predict the total payment received for a loan.
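
As a minimal sketch of the two preprocessing steps described above, the snippet below standardizes one numeric column and one-hot encodes one categorical column. The `toy` frame and the `standardize` helper are assumptions made up for illustration; they are not part of the lab's dataset or solution.

```python
import pandas as pd

# Toy stand-in for the loan data (values are made up for illustration)
toy = pd.DataFrame({
    'loan_amnt': [5000.0, 2500.0, 2400.0, 10000.0],
    'home_ownership': ['RENT', 'OWN', 'RENT', 'MORTGAGE'],
})

def standardize(series):
    """Z-score a numeric column using the population standard deviation (ddof=0)."""
    return (series - series.mean()) / series.std(ddof=0)

numeric = standardize(toy['loan_amnt'])           # mean 0, standard deviation 1
dummies = pd.get_dummies(toy['home_ownership'])   # one indicator column per category

features = pd.concat([numeric, dummies], axis=1)
print(features.shape)
```

Each category becomes its own indicator column, so the feature count grows with the number of distinct values in each categorical variable.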

In [56]:
data.head()

| | loan_amnt | funded_amnt_inv | term | int_rate | installment | grade | emp_length | home_ownership | annual_inc | verification_status | loan_status | purpose | addr_state | total_acc | total_pymnt | application_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5000.0 | 4975.0 | 36 months | 10.65% | 162.87 | B | 10+ years | RENT | 24000.0 | Verified | Fully Paid | credit_card | AZ | 9.0 | 5863.155187 | Individual |
| 1 | 2500.0 | 2500.0 | 60 months | 15.27% | 59.83 | C | < 1 year | RENT | 30000.0 | Source Verified | Charged Off | car | GA | 4.0 | 1014.53 | Individual |
| 2 | 2400.0 | 2400.0 | 36 months | 15.96% | 84.33 | C | 10+ years | RENT | 12252.0 | Not Verified | Fully Paid | small_business | IL | 10.0 | 3005.666844 | Individual |
| 3 | 10000.0 | 10000.0 | 36 months | 13.49% | 339.31 | C | 10+ years | RENT | 49200.0 | Source Verified | Fully Paid | other | CA | 37.0 | 12231.89 | Individual |
| 4 | 3000.0 | 3000.0 | 60 months | 12.69% | 67.79 | B | 1 year | RENT | 80000.0 | Source Verified | Fully Paid | other | OR | 38.0 | 4066.908161 | Individual |


In [57]:
import numpy as np

In [58]:
# Your code here; appropriately define X and Y using dummy variables and normalization for preprocessing.


X_0 = data['loan_amnt']
X_1 = data['home_ownership'] #categorical
X_2 = data['funded_amnt_inv']
X_3 = data['verification_status'] #categorical
X_4 = data['emp_length'] #categorical
X_5 = data['installment']
X_6 = data['annual_inc']

X_0 = (X_0 - np.mean(X_0)) / np.std(X_0)
X_1 = pd.get_dummies(X_1)
X_2 = (X_2 - np.mean(X_2)) / np.std(X_2)
X_3 = pd.get_dummies(X_3)
X_4 = pd.get_dummies(X_4)
X_5 = (X_5 - np.mean(X_5)) / np.std(X_5)
X_6 = (X_6 - np.mean(X_6)) / np.std(X_6)

X = pd.concat([X_0,X_2,X_5,X_6,X_1,X_3,X_4], axis = 1)
X = X.dropna()
X.head()

y = data['total_pymnt']
y = (y - np.mean(y)) / np.std(y)


In [59]:
np.shape(X)

(41394, 23)

## Generating a Hold Out Test Set

While we will be using K-fold cross validation to select an optimal model, we still want a final hold out test set that is completely independent of any modelling decisions. As such, pull out a sample of 10% of the total available data. For consistency of results, use random seed 123. 

In [60]:
# Your code here; generate a hold out test set for final model evaluation. Use random seed 123.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .1, random_state = 123)

X_train.reset_index(drop = True, inplace= True)
y_train.reset_index(drop = True, inplace= True)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(37254, 23) (4140, 23) (37254,) (4140,)



## Defining a K-fold Cross Validation Methodology

Now that you have a complete holdout test set, write a function that takes in the remaining data and performs k-folds cross validation given a model object. Be sure your function returns performance metrics for both the training and validation sets.
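
To make the fold arithmetic concrete, here is a small NumPy-only sketch of how k-fold splitting partitions the row indices. The helper `k_fold_indices` is a hypothetical stand-in written for illustration, not scikit-learn's implementation; the sample count 37254 is the training set size reported by the hold-out split in this lab.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=123):
    """Yield (train_idx, val_idx) pairs that partition range(n_samples) into k folds."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)  # fold sizes differ by at most one
    for i in range(k):
        val_idx = folds[i]
        # Every fold except the i-th one is used for training
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

sizes = [(len(train), len(val)) for train, val in k_fold_indices(37254, k=10)]
print(sizes[0], sizes[-1])
```

Because 37254 is not divisible by 10, the first four validation folds hold 3726 rows and the remaining six hold 3725. Every row is used for validation exactly once.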

In [61]:
from sklearn.model_selection import KFold

In [62]:
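#Your code here; define a function to evaluate a model object using K folds cross validation.

def k_folds(features_train, labels_train, model_obj, k=10, n_epochs=100):
    """Run k-fold cross validation on model_obj and return the mean validation score."""
    
    val_scores = []
    # No random_state is set, so the fold assignment changes from run to run
    kf = KFold(n_splits = k, shuffle = True)
    
    for i, (train_index, val_index) in enumerate(kf.split(features_train)):
        X_train, X_val = (features_train.iloc[train_index], features_train.iloc[val_index])
        y_train, y_val = (labels_train.iloc[train_index], labels_train.iloc[val_index])
        
        # Report the size of each split for this fold
        print(np.shape(X_train))
        print(np.shape(X_val))
        print(np.shape(y_train))
        print(np.shape(y_val))
        
        print()
        
        # Caveat: the same model object is reused for every fold, so weights
        # learned on earlier folds carry over into later ones
        model = model_obj
        history = model.fit(X_train, y_train, batch_size = 32, epochs = n_epochs, verbose = 0, validation_data = (X_val, y_val))
        # With metrics = ['mse'], evaluate returns a list: [loss, mse]
        val_score = model.evaluate(X_val, y_val)
        val_scores.append(val_score)
    
    # np.average flattens val_scores, so loss and mse are averaged together;
    # they are identical unless a regularization penalty is added to the loss
    validation_score = np.average(val_scores)
    print('mean val score: {}'.format(validation_score))
    print('std val score: {}'.format(np.std(val_scores)))
    
    return validation_score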
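
One possible variation, sketched here under stated assumptions, builds a fresh model for every fold and returns both training and validation metrics. The names `k_folds_fresh`, `build_model`, `LeastSquares` and `mse` are hypothetical; `LeastSquares` is only a tiny NumPy stand-in so the sketch runs without Keras. With Keras, `build_model` would be a function that constructs and compiles a new network each time it is called.

```python
import numpy as np

class LeastSquares:
    """Tiny stand-in model with fit/predict, used only to make the sketch runnable."""
    def fit(self, X, y):
        X_bias = np.c_[X, np.ones(len(X))]  # append an intercept column
        self.coef_, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
        return self

    def predict(self, X):
        return np.c_[X, np.ones(len(X))] @ self.coef_

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def k_folds_fresh(X, y, build_model, k=10, seed=123):
    """Cross-validate with a new model per fold; return mean train and validation MSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    train_scores, val_scores = [], []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = build_model()  # untrained model, so no fold sees another fold's weights
        model.fit(X[train_idx], y[train_idx])
        train_scores.append(mse(y[train_idx], model.predict(X[train_idx])))
        val_scores.append(mse(y[val_idx], model.predict(X[val_idx])))
    return float(np.mean(train_scores)), float(np.mean(val_scores))

# Synthetic regression data, made up for the demonstration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

train_mse, val_mse = k_folds_fresh(X_demo, y_demo, LeastSquares, k=5)
print(train_mse, val_mse)
```

Returning both numbers makes it possible to compare training and validation error directly, which is what reveals overfitting.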
    
    
    
    

## Building a Baseline Model

Here, it is also important to define the evaluation metric that you will look to optimize while tuning the model.   

In general, model training to optimize this metric may consist of using a validation and test set if data is plentiful, or k-folds cross-validation if data is limited. We set up a k-folds cross-validation for this task since the dataset is not overly large.  

Build an initial sequential model with 2 hidden relu layers. The first should have 7 hidden units, and the second 10 hidden units. Finally, add a third layer with a linear activation function to output our predictions for the total loan payment. 
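
To get a feel for how small this baseline is, the sketch below counts the trainable parameters of a stack of fully connected layers. The helper `dense_param_count` is hypothetical and written only for illustration; the input width of 23 is the number of feature columns reported earlier in this lab.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by the unit count of each layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total

# 23 inputs -> 7 units -> 10 units -> 1 output
baseline_params = dense_param_count([23, 7, 10, 1])
print(baseline_params)  # 168 + 80 + 11 = 259
```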

In [63]:

from keras import models, layers

In [64]:
#Your code here; define and compile an initial model as described

np.random.seed(123)
model = models.Sequential()
model.add(layers.Dense(7, input_dim = 23, kernel_initializer = 'normal', activation = 'relu')) #23 input columns
model.add(layers.Dense(10, activation = 'relu'))
model.add(layers.Dense(1, kernel_initializer = 'normal', activation = 'linear'))
model.compile(optimizer = 'sgd', loss = 'mse', metrics = ['mse'])






## Evaluating the Baseline Model with K-Folds Cross Validation

Use your k-folds function to evaluate the baseline model.  

Note: This code block is likely to take 10-20 minutes to run depending on the specs of your computer.
Because these runs are slow, it can be useful to time them for future reference.

Here's a simple little recipe to achieve this:
```
import datetime

now = datetime.datetime.now()
# ... the code you want to time goes here ...
later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)
```
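
If you find yourself timing many cells, one option is to wrap the same idea in a context manager. This is a sketch rather than part of the lab; the name `timed` is hypothetical, and `time.perf_counter` is used here in place of `datetime` because it is intended for measuring elapsed time.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print how long the wrapped block took, even if it raises an exception."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f'{label}: {elapsed:.2f} s')

# Toy workload standing in for a long training run
with timed('toy workload'):
    total = sum(i * i for i in range(100_000))
```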

In [47]:
#Your code here; use your k-folds function to evaluate the baseline model.

import time
import datetime

now = datetime.datetime.now()

k_folds(X_train, y_train, model)

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.18795952415298084
std val score: 0.01382453425549332
Time Elapsed: 0:19:32.161351


## Intentionally Overfitting a Model

Now that you've developed a baseline model, it's time to intentionally overfit a model. To overfit a model, you can:
* Add layers
* Make the layers bigger
* Increase the number of training epochs

Again, be careful here. Think about the limitations of your resources, both in terms of your computer's specs and how much time and patience you have to let the process run. Also keep in mind that you will then be regularizing these overfit models, meaning another round of experiments and more time and resources.  

For example, here are some timing notes on potential experiments run on a MacBook Pro with a 3.1 GHz Intel Core i5 and 16 GB of RAM:

* Using our 10-fold cross validation methodology, a 5-layer neural network with 10 units per hidden layer and 100 epochs took approximately 15 minutes to train and validate  

* Using our 10-fold cross validation methodology, a 5-layer neural network with 25 units per hidden layer and 100 epochs took approximately 25 minutes to train and validate  

* Using our 10-fold cross validation methodology, a 5-layer neural network with 10 units per hidden layer and 250 epochs took approximately 45 minutes to train and validate
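
To judge whether a configuration has actually overfit, it helps to compare training and validation loss per epoch. The sketch below assumes you have two lists of per-epoch losses, such as the `'loss'` and `'val_loss'` entries of the `history.history` dictionary that Keras returns from `fit` when validation data is supplied. The helper `summarize_history` and the example curves are made up for illustration.

```python
def summarize_history(train_loss, val_loss):
    """Summarize how far validation loss has drifted from training loss."""
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        'best_epoch': best_epoch,                                   # epoch with the lowest validation loss
        'final_gap': val_loss[-1] - train_loss[-1],                 # large positive gap suggests overfitting
        'val_rise_since_best': val_loss[-1] - val_loss[best_epoch], # how much validation loss has worsened
    }

# Made-up curves: training loss keeps falling while validation loss turns upward
train_loss = [0.50, 0.30, 0.22, 0.18, 0.15, 0.13]
val_loss = [0.52, 0.34, 0.27, 0.26, 0.28, 0.31]

summary = summarize_history(train_loss, val_loss)
print(summary)
```

A validation loss that bottoms out and then climbs while training loss keeps dropping is the typical signature of overfitting.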


In [65]:
from keras.layers import Dense
from keras.models import Sequential

In [66]:
#Your code here; try some methods to overfit your network
now = datetime.datetime.now()


model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.1844731237768995
std val score: 0.011000509872108027
Time Elapsed: 0:21:00.097571


In [67]:
#Your code here; try some methods to overfit your network
now = datetime.datetime.now()

model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model, n_epochs= 200)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.18597067575976264
std val score: 0.011261382507406565
Time Elapsed: 0:39:14.566520


In [68]:
#Your code here; try some methods to overfit your network
now = datetime.datetime.now()

model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(25, activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model, n_epochs= 250)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.18571682495570146
std val score: 0.015079453328531065
Time Elapsed: 0:22:19.495069


## Regularizing the Model to Achieve Balance  

Now that you have a powerful model (albeit an overfit one), you can improve its generalization by using some of the regularization techniques we discussed. Some options to try include:  
* Adding dropout
* Adding L1/L2 regularization
* Altering the layer architecture (add or remove layers similar to above)  

This process will be constrained by time and resources. Be sure to test at least 2 different methodologies, such as dropout and L2 regularization. If you have the time, feel free to continue experimenting.
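
As a rough sketch of what these techniques do numerically, the snippet below computes L1 and L2 weight penalties and applies inverted dropout to a batch of activations using NumPy. It assumes the usual definitions (a penalty of lambda times the sum of absolute or squared weights, and dropout that rescales surviving activations during training). The function names are hypothetical and this is not Keras's own code.

```python
import numpy as np

def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * np.sum(np.square(weights))

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`, rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

weights = np.array([[0.5, -1.0], [2.0, 0.0]])
l1_value = l1_penalty(weights, 0.005)  # 0.005 * (0.5 + 1.0 + 2.0 + 0.0)
l2_value = l2_penalty(weights, 0.005)  # 0.005 * (0.25 + 1.0 + 4.0 + 0.0)

rng = np.random.default_rng(123)
activations = np.ones((1000, 10))
dropped = inverted_dropout(activations, rate=0.3, rng=rng)
print(l1_value, l2_value, dropped.mean())
```

The penalties are added to the training loss, so larger weights cost more. Dropout leaves the expected activation unchanged because the surviving units are scaled up.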

Notes: 

In [71]:
from keras import regularizers

In [72]:
#Your code here; try some regularization or other methods to tune your network

#L1


now = datetime.datetime.now()

model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_regularizer= regularizers.l1(0.005), activation='relu'))
model.add(Dense(10, kernel_regularizer= regularizers.l1(0.005), activation='relu'))
model.add(Dense(10, kernel_regularizer= regularizers.l1(0.005), activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)




(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.2121264752812695
std val score: 0.01962147924309608
Time Elapsed: 0:21:07.960561


In [73]:
#Your code here; try some regularization or other methods to tune your network

#L2


now = datetime.datetime.now()

model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_regularizer= regularizers.l2(0.005), activation='relu'))
model.add(Dense(10, kernel_regularizer= regularizers.l2(0.005), activation='relu'))
model.add(Dense(10, kernel_regularizer= regularizers.l2(0.005), activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)




(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.19358888022514265
std val score: 0.008947212189328677
Time Elapsed: 0:22:15.967185


In [78]:
#Your code here; try some regularization or other methods to tune your network

#Dropout



now = datetime.datetime.now()

model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(10, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(10, activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.4366396045967737
std val score: 0.04212377185340278
Time Elapsed: 0:23:37.733007


In [79]:
#Your code here; try some regularization or other methods to tune your network

#Dropout and L1



now = datetime.datetime.now()

model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_regularizer = regularizers.l1(.005), activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(10, kernel_regularizer = regularizers.l1(.005), activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(10, kernel_regularizer = regularizers.l1(.005), activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.4103178487378713
std val score: 0.04449041599837636
Time Elapsed: 0:25:15.784208


## Final Evaluation

Now that you have selected a network architecture, tested various regularization procedures and tuned hyperparameters via a validation methodology, it is time to evaluate your finalized model once and for all. Fit the model using all of the training and validation data using the architecture and hyperparameters that were most effective in your experiments above. Afterwards, measure the overall performance on the hold-out test data which has been left untouched (and hasn't leaked any data into the modelling process)!
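
One simple way to pick the "most effective" configuration is to compare the mean validation scores recorded above. The sketch below copies those means and standard deviations from the experiment outputs in this lab (rounded to four decimals); the dictionary labels are informal names chosen here. Treat the ranking as rough, since the `k_folds` function reuses one model object across folds and averages the loss and the metric together.

```python
# (mean, std) validation scores copied from the k_folds outputs above, rounded to 4 decimals
cv_results = {
    'baseline (7, 10)': (0.1880, 0.0138),
    '5x25 hidden, 100 epochs': (0.1845, 0.0110),
    '1x25 hidden, 200 epochs': (0.1860, 0.0113),
    '6x25 hidden, 250 epochs': (0.1857, 0.0151),
    'L1': (0.2121, 0.0196),
    'L2': (0.1936, 0.0089),
    'dropout': (0.4366, 0.0421),
    'dropout + L1': (0.4103, 0.0445),
}

# Lower MSE is better
best_name = min(cv_results, key=lambda name: cv_results[name][0])
best_mean, best_std = cv_results[best_name]

# Models whose mean score lies within one standard deviation of the best
close_calls = sorted(
    name for name, (mean, _) in cv_results.items() if mean <= best_mean + best_std
)
print(best_name)
print(close_calls)
```

Several configurations fall within one standard deviation of the best score, so the differences between them may not be meaningful.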

In [80]:
#Your code here; final model training on entire training set followed by evaluation on hold-out data

#Dropout and L2
#Note: this cell cross validates the dropout + L2 model with k_folds;
#it does not fit on the full training set or evaluate on the hold-out test set



now = datetime.datetime.now()

model = Sequential()
model.add(Dense(7, input_dim=23, kernel_initializer='normal', activation='relu'))
model.add(Dense(10, kernel_regularizer = regularizers.l2(.005), activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(10, kernel_regularizer = regularizers.l2(.005), activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(10, kernel_regularizer = regularizers.l2(.005), activation='relu'))
model.add(layers.Dropout(0.3))
model.add(Dense(1, kernel_initializer='normal', activation = 'linear'))
model.compile(optimizer="sgd" ,loss='mse',metrics=['mse'])

k_folds(X_train, y_train, model)    

later = datetime.datetime.now()
elapsed = later - now
print('Time Elapsed:', elapsed)





(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33528, 23)
(3726, 23)
(33528,)
(3726,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

(33529, 23)
(3725, 23)
(33529,)
(3725,)

mean val score: 0.31387846067040803
std val score: 0.03680113372101816
Time Elapsed: 0:28:12.812584


## Additional Resources

* https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/
* https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/
* https://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
* https://stackoverflow.com/questions/37232782/nan-loss-when-training-regression-network
* https://www.springboard.com/blog/free-public-data-sets-data-science-project/

## Summary

In this lab, we worked through a complete neural network modelling pipeline using data from *The Lending Club*. We began by reserving a hold-out test set that was never touched during the modelling phase. From there, we implemented a k-fold cross validation methodology to assess an initial baseline model and various regularization methods. Next, we'll begin to investigate other neural network architectures such as CNNs.