# Create an L-layer neural network (classification)

Since the goal is to learn the details of neural networks, this model is implemented using numpy matrix operations. We also have added l2 and dropout regularization options.

## Model parameters

* X = $n_x$ by m array where training vectors are in columns
* Y = 1 by m array of true class values
* $n_x$ = number of predictors
* m = number of training examples
* L = number of layers
* n = list containing number of neurons per layer, n[0] = $n_x$
* Wi = n[i] by n[i-1] array of model parameters for layer i
* bi = n[i] by i array of bias parameters for layer i
* dWi = n[i] by n[i-1] array of partial derivatives of the cost function with respect to the corresponding parameters of Wi
* dbi = n[i] by i array of partial derivatives of the cost function with respect to the corresponding bias parameters
* regularization = None or 'l2'
* lambd = regularization parameter for l2

In [1]:
import numpy as np

In [2]:
# activation function for output vector
def sigmoid(z):
    return 1/(1+np.exp(-z))

In [3]:
# activation function for hidden layers
def relu(z):
    return np.maximum(0, z)

In [66]:
# derivative of the relu function. We keep the shape of the input array for vectorization purposes. Strictly
# speaking, the total derivative should be a (much larger) diagonal matrix
def der_relu(z):
    return (z>0).astype(int)

In [68]:
def predict(X, parameters):
    '''Predict binary class value given input X and model parameters. The activation function
    for each hidden layer is a ReLU and the output activation function is a sigmoind. Since this 
    function is only used for model predictions, no intermediate values of the model are cached
    or returned to the user.
    
    Input:
        X = n_x by m array of feature vectors
        parameters = dictionary of parameters Wi and bi for the neural network
    
    Output:
        A = 1 by m array of predicted class values.'''
    
    # Calculate the number of layers in the network
    L = len(parameters)//2
    
    # hidden layer computations
    A = X
    for i in range(1, L):
        A = relu(np.dot(parameters['W'+str(i)], A) + parameters['b'+str(i)])
        
    # output layer
    A = sigmoid(np.dot(parameters['W'+str(L)], A) + parameters['b'+str(L)])
    
    # predict class
    A = (A >= 0.5)
    
    return A.astype(int)

In [379]:
def initialize(n):
    '''Initialize model parameters using the input specification. We use He initialization since we have ReLU
    activation functions.
    
    Input:
        n = length L+1 list where n[i] is the number of neurons in layer i
        
    Output:
        parameters = dictionary containing initialized model parameters Wi and bi'''
    
    parameters = dict()
    for i in range(1, len(n)):
        parameters['W'+str(i)] = np.random.randn(n[i], n[i-1])*(2/np.sqrt(n[i-1]))
        parameters['b'+str(i)] = np.zeros((n[i], 1))
        
    return parameters

In [322]:
def propogate(X, Y, parameters, regularization=None):
    '''Perform forward and backward propogation.
    
    Input:
        X = n_x by m array containing training examples
        Y = 1 by m array containing true binary class value for training examples
        parameters = dictionary of model parameters Wi and bi
        
    Output:
        gradient = dictionary of gradient arrays dWi and dbi corresponding to model parameters Wi and bi'''
    
    m = X.shape[1]    # infer the number of training examples from X
    L = len(parameters)//2    # infer the number of layers from the number of parameters
    sum_vec = np.full((m, 1), 1)    # vector used in gradient calculations
    cache = dict()    # cache for results of intermediate steps
    
    # forward propogation
    
    # hidden layer computations
    A = X    # A stores the activation of the previous layer, initializes to training vectors
    cache['A0'] = A
    for i in range(1, L):
        Z = np.dot(parameters['W'+str(i)], A) + parameters['b'+str(i)]    # linear step
        A = relu(Z)    # activation
        cache['Z'+str(i)] = Z    # store intermediate values for use in gradient calcululation
        cache['A'+str(i)] = A
        
    # output layer
    Z = np.dot(parameters['W'+str(L)], A) + parameters['b'+str(L)]    # linear step
    A = sigmoid(Z)    # activation
    cache['Z'+str(L)] = Z    # store intermediate values for use in gradient calculation
    cache['A'+str(L)] = A
    
    #backward propogation
    gradient = dict()
    
    # output layer
    dA = (1/m)*(A-Y)    # tracking matrix for chain rule
    gradient['dW'+str(L)] = np.dot(dA, cache['A'+str(L-1)].T)    # dWL
    gradient['db'+str(L)] = np.dot(dA, sum_vec)    # dbL
    
    # hidden layers
    for i in reversed(range(1, L)):
        dA = np.dot(dA.T, parameters['W'+str(i+1)])*der_relu(cache['Z'+str(i)].T)    # update dA. See page 3 of notes for derivation/proof
        dA = dA.T
        gradient['dW'+str(i)] = np.dot(dA, cache['A'+str(i-1)].T)    # dWi
        gradient['db'+str(i)] = np.dot(dA, sum_vec)    # dbi
    
    return gradient

In [424]:
def cost(Yhat, Y, eps=0.000000001, regularization=None, lambd=0, parameters=None):
    '''Calculate cost of prediction Yhat. The small value eps is used to prevent log(0) error.
    
    Input:
        Yhat = 1 by m array of class probabilities
        Y = 1 by m array of true binary class
        eps = small value to prevent log(0) error
        regularization = None or l2
        lambd = l2 regularization constant
        parameters = dictionary of model parameters
        
    Output:
        cost = cost of the given prediction Yhat'''
    
    m = Yhat.shape[1]    # infer the number of training examples
    
    if regularization == 'l2':
        L = len(parameters)//2   # infer the number of layers
        l2_sum = 0   # sum squares of parameters
        for i in range(1, L+1):
            l2_sum += np.sum(parameters['W'+str(i)]**2)
        return -(1/m)*np.sum(Y*np.log(Yhat + eps)+(1-Y)*np.log(1-Yhat + eps)) + (lambd/(2*m))*l2_sum
    else:
        return -(1/m)*np.sum(Y*np.log(Yhat + eps)+(1-Y)*np.log(1-Yhat + eps))

In [449]:
def fit(X, Y, parameters, learning_rate=0.01, iterations=2000, regularization=None, lambd=0, print_cost=False):
    
    L = len(parameters)//2
    m = X.shape[0]
    
    for i in range(iterations):
        gradient = propogate(X, Y, parameters, regularization=regularization)
        
        # print cost every 1000 iterations
        if (i%1000 == 0) and (print_cost == True):
            current_cost = cost(predict(X, parameters), Y, regularization=regularization, lambd=lambd, parameters=parameters)
            print('Cost after {} iterations: {:.12f}'.format(i, current_cost))
        
        # update parameters
        if regularization == 'l2':
            for i in range(1, L+1):
                parameters['W'+str(i)] = (1-lambd/m)*parameters['W'+str(i)] - learning_rate*gradient['dW'+str(i)]
                parameters['b'+str(i)] = parameters['b'+str(i)] - learning_rate*gradient['db'+str(i)]
        else:
            for i in range(1, L+1):
                parameters['W'+str(i)] = parameters['W'+str(i)] - learning_rate*gradient['dW'+str(i)]
                parameters['b'+str(i)] = parameters['b'+str(i)] - learning_rate*gradient['db'+str(i)]
            
    return parameters

In [310]:
def L_layer_model(X_train, Y_train, X_test, Y_test, n, learning_rate=0.01, iterations=2000, regularization=None, lambd=0, print_cost=False):
    
    # initialize model parameters
    parameters = initialize(n)
    
    # fit model
    parameters = fit(X_train, Y_train, parameters, learning_rate=learning_rate, iterations=iterations, regularization=regularization, lambd=lambd, print_cost=print_cost)
    
    train_predictions = predict(X_train, parameters)
    train_accuracy = 100-np.average(np.abs(Y_train-train_predictions))*100
    
    test_predictions = predict(X_test, parameters)
    test_accuracy = 100-np.average(np.abs(Y_test-test_predictions))*100
    
    print('Training set accuracy: {:.4f}%'.format(train_accuracy))
    print('Test set accuracy: {:.4f}%'.format(test_accuracy))
    
    return parameters

## Test the functions with a random input

In [427]:
# Test
X = np.random.randn(4,5)
Y = np.array([[1, 1, 0, 0, 1]])
n = [4, 6, 5, 4, 3, 7, 2, 1]

In [428]:
parameters = initialize(n)

In [429]:
relu(X)

array([[0.        , 0.32929271, 0.        , 0.        , 0.        ],
       [0.        , 1.00885492, 0.        , 0.0689359 , 0.        ],
       [0.97381624, 0.        , 0.        , 0.28006397, 0.32178048],
       [0.        , 0.68706224, 0.        , 0.48878788, 0.05322104]])

In [430]:
sigmoid(X)

array([[0.38588934, 0.58158727, 0.39221099, 0.37108738, 0.46023881],
       [0.34169732, 0.732796  , 0.25808524, 0.51722715, 0.30479082],
       [0.7258795 , 0.15544154, 0.35515476, 0.56956191, 0.57975811],
       [0.38838279, 0.66531309, 0.44656568, 0.61982085, 0.51330212]])

In [431]:
predict(X, parameters)

array([[1, 1, 1, 1, 1]])

In [432]:
predict?

In [433]:
gradient = propogate(X, Y, parameters)

In [434]:
parameters['b4'] - 0.001*gradient['db4']

array([[0.],
       [0.],
       [0.]])

In [435]:
parameters = fit(X, Y, parameters, regularization='l2', lambd=0.01)

Cost after 0 iterations: 8.396865708003
Cost after 1000 iterations: 8.290026540589


In [436]:
Yhat = predict(X, parameters)

In [437]:
cost(Yhat, Y, regularization='l2', lambd=0.01, parameters=parameters)

8.28931115660599

In [440]:
parameters = L_layer_model(X, Y, X, Y, n, learning_rate=0.5, iterations=20000, regularization='l2', lambd=0.1)

Cost after 0 iterations: 9.226144710205
Cost after 1000 iterations: 4.243249766608
Cost after 2000 iterations: 8.289306334179
Cost after 3000 iterations: 8.289306334179
Cost after 4000 iterations: 8.289306334179
Cost after 5000 iterations: 8.289306334179
Cost after 6000 iterations: 8.289306334179
Cost after 7000 iterations: 8.289306334179
Cost after 8000 iterations: 8.289306334179
Cost after 9000 iterations: 8.289306334179
Cost after 10000 iterations: 8.289306334179
Cost after 11000 iterations: 8.289306334179
Cost after 12000 iterations: 8.289306334179
Cost after 13000 iterations: 8.289306334179
Cost after 14000 iterations: 8.289306334179
Cost after 15000 iterations: 8.289306334179
Cost after 16000 iterations: 8.289306334179
Cost after 17000 iterations: 8.289306334179
Cost after 18000 iterations: 8.289306334179
Cost after 19000 iterations: 8.289306334179
Training set accuracy: 60.0000%
Test set accuracy: 60.0000%


In [441]:
parameters['W1']

array([[ 6.54593054e-210, -2.57532995e-210, -2.60437688e-210,
        -4.78209990e-210],
       [-2.54282026e-209, -2.89467025e-209, -4.91050848e-209,
        -2.99867850e-209],
       [ 1.40403499e-209, -5.52505015e-210, -5.58606812e-210,
        -1.02583736e-209],
       [ 1.47818700e-220,  3.99197220e-222,  8.84529653e-222,
        -5.14954609e-221],
       [-2.26273085e-209, -2.63616066e-209, -4.14555713e-209,
        -2.69758290e-209],
       [ 2.46643797e-209, -9.70424321e-210, -9.81406194e-210,
        -1.80184342e-209]])

## Build an 'or' network

Test the L layer model by building a simple network to represent an or gate.

In [450]:
X = np.array([[1, 1, 0, 0], [1, 0, 1, 0]])
Y = np.array([[1, 1, 1, 0]])
n = [2, 4, 1]

In [451]:
# model with single hidden layer of four neurons
parameters = L_layer_model(X, Y, X, Y, n, learning_rate=0.9, iterations=2000, regularization='l2', lambd=0.1, print_cost=True)

Cost after 0 iterations: 5.415018940264
Cost after 1000 iterations: 0.092390400668
Training set accuracy: 100.0000%
Test set accuracy: 100.0000%


In [452]:
# add a second hidden layer
n = [2, 4, 2, 1]
parameters = L_layer_model(X, Y, X, Y, n, learning_rate=0.1, iterations=20000, print_cost=False)

Training set accuracy: 100.0000%
Test set accuracy: 100.0000%


In [455]:
# Simple networks are best in this situation
n = [2, 2, 1]
parameters = L_layer_model(X, Y, X, Y, n, learning_rate=0.1, iterations=20000, regularization='l2', lambd=0.01)

Training set accuracy: 100.0000%
Test set accuracy: 100.0000%


In [457]:
predict(np.array([[1], [0]]), parameters)

array([[1]])

After running the above tests successfully, it appears that the L layer model is functional.

## Test the L layer model on interview data set

We import an interview data set to investigate the performance of an L layer neural network on a larger data set with more predictors.

In [346]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

In [459]:
train_location = '' # insert file path here

train_df = pd.read_csv(train_location)

In [460]:
# Clean data. Details can be found in other notebook
train_df = train_df.dropna()

# Correct days so that all are spelled out
train_df['x35'] = train_df['x35'].map(lambda x: 'wednesday' if x=='wed' else 'thursday' if (x=='thur' or x=='thurday') else 'friday' if x=='fri' else x)

# Correct sept. to Sept and Dev to Dec in column x68
train_df['x68'] = train_df['x68'].map(lambda x: 'Jan' if x=='January' else 'Sept' if x=='sept.' else 'Dec' if x=='Dev' else x)

# Transform columns x34, x35, x68, and x93 to dummy variables
train_df = pd.get_dummies(train_df, columns=['x34', 'x35', 'x68', 'x93'])

# Transform columns x41 and x45 to floats
train_df['x41'] = train_df['x41'].map(lambda x: x.lstrip('$'))
train_df['x41'] = pd.to_numeric(train_df['x41'])

train_df['x45'] = train_df['x45'].map(lambda x: x.rstrip('%'))
train_df['x45'] = pd.to_numeric(train_df['x45'])

# Take the first 1000 samples for some quick tests
train_df = train_df.iloc[:1000,:]
train_df.shape

(1000, 127)

Note that there are 127 data columns in the data set. The column labeled 'y' contains the true binary class of each training example.

In [461]:
# split data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(train_df.drop('y', axis=1), (train_df['y']), test_size=0.2, random_state=42)

# scale data using the standard scaler in sklearn
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Get numpy columns for Y
Y_train = Y_train.values.reshape(len(Y_train),1)
Y_test = Y_test.values.reshape(len(Y_test),1)

In [462]:
X_train.shape

(800, 126)

Run model with one hidden layer of three neurons.

In [464]:
# note that samples are in rows of train_df, so the transposes are fed into the models
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 3, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.01, print_cost=True)

Cost after 0 iterations: 17.174500831276
Cost after 1000 iterations: 0.725473019850
Cost after 2000 iterations: 0.621933756145
Cost after 3000 iterations: 0.596062268569
Cost after 4000 iterations: 0.596077053197
Cost after 5000 iterations: 0.596083724827
Cost after 6000 iterations: 0.596086945799
Cost after 7000 iterations: 0.596087305082
Cost after 8000 iterations: 0.596086729450
Cost after 9000 iterations: 0.440663387838
Cost after 10000 iterations: 0.440663100800
Cost after 11000 iterations: 0.414758287677
Cost after 12000 iterations: 0.414757619184
Cost after 13000 iterations: 0.388852851078
Cost after 14000 iterations: 0.337044893018
Cost after 15000 iterations: 0.311142009118
Cost after 16000 iterations: 0.311142667390
Cost after 17000 iterations: 0.311142842948
Cost after 18000 iterations: 0.311142726611
Cost after 19000 iterations: 0.311142472852
Cost after 20000 iterations: 0.311142135066
Cost after 21000 iterations: 0.311141810185
Cost after 22000 iterations: 0.311141524736


Results
* Training accuracy = 98.5%
* Test accuracy = 84.5%
We appear to be overfitting the model to the data. We can try increasing the regularization to prevent this problem.

In [466]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 3, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.5, print_cost=True)

Cost after 0 iterations: 5.470268891816
Cost after 1000 iterations: 1.271012331030
Cost after 2000 iterations: 1.141639926927
Cost after 3000 iterations: 0.960348981523
Cost after 4000 iterations: 1.012160190017
Cost after 5000 iterations: 1.012166777467
Cost after 6000 iterations: 1.012167700083
Cost after 7000 iterations: 1.038072339403
Cost after 8000 iterations: 1.038065651807
Cost after 9000 iterations: 1.038066926702
Cost after 10000 iterations: 1.038067147211
Cost after 11000 iterations: 1.038067082628
Cost after 12000 iterations: 1.038067374484
Cost after 13000 iterations: 1.038067305621
Cost after 14000 iterations: 1.038067353607
Cost after 15000 iterations: 1.038067585201
Cost after 16000 iterations: 1.038067236351
Cost after 17000 iterations: 1.038067399130
Cost after 18000 iterations: 1.038067140730
Cost after 19000 iterations: 1.038067292885
Cost after 20000 iterations: 1.038067564015
Cost after 21000 iterations: 1.038067381729
Cost after 22000 iterations: 1.038067547840
C

Results indicate that increasing the value of lambd reduces the amount of variance.

In [471]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 3, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.6, print_cost=True)

Cost after 0 iterations: 8.218089347854
Cost after 1000 iterations: 1.478261382085
Cost after 2000 iterations: 1.581859493972
Cost after 3000 iterations: 1.581828718192
Cost after 4000 iterations: 1.581813130745
Cost after 5000 iterations: 1.581816937729
Cost after 6000 iterations: 1.581815329419
Cost after 7000 iterations: 1.581811210604
Cost after 8000 iterations: 1.581811317183
Cost after 9000 iterations: 1.581811568895
Cost after 10000 iterations: 1.581811607957
Cost after 11000 iterations: 1.555907457306
Cost after 12000 iterations: 1.555907716240
Cost after 13000 iterations: 1.581812039888
Cost after 14000 iterations: 1.581811839860
Cost after 15000 iterations: 1.555907672412
Cost after 16000 iterations: 1.555907825443
Cost after 17000 iterations: 1.555907793317
Cost after 18000 iterations: 1.555907786853
Cost after 19000 iterations: 1.555907804617
Cost after 20000 iterations: 1.581811992176
Cost after 21000 iterations: 1.555907859592
Cost after 22000 iterations: 1.581811751734
C

Results indicate that increasing the value of lambd further does not necessarily improve performance any more.

We can try models with more hidden layers/neurons.

In [472]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 6, 3, 1], learning_rate=0.1, iterations=30000)

Training set accuracy: 97.6250%
Test set accuracy: 86.0000%


In [475]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 3, 1], learning_rate=0.1, iterations=30000)

Training set accuracy: 94.5000%
Test set accuracy: 80.0000%


In [476]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 3, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.05)

Training set accuracy: 100.0000%
Test set accuracy: 89.5000%


Let's compare the neural networks to simple logistic regression.

In [477]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 1], learning_rate=0.01, iterations=30000)

Training set accuracy: 91.8750%
Test set accuracy: 90.0000%


In [478]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 1], learning_rate=0.1, iterations=30000)

Training set accuracy: 92.2500%
Test set accuracy: 89.5000%


In [479]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 1], learning_rate=1, iterations=30000)

Training set accuracy: 92.6250%
Test set accuracy: 88.0000%


In [481]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 1], learning_rate=0.005, iterations=30000)

Training set accuracy: 91.6250%
Test set accuracy: 90.5000%


These results indicate that neural networks don't have performance gains over simple logistic regression when the amount of available data is small.

Increase the amount of data to 10000 samples to evaluate performance.

In [482]:
train_location = '/Users/connorodell/Documents/Data_Science/learning/exercise_03_train.csv'

train_df = pd.read_csv(train_location)

In [483]:
# Clean data. Details can be found in other notebook
train_df = train_df.dropna()

# Correct days so that all are spelled out
train_df['x35'] = train_df['x35'].map(lambda x: 'wednesday' if x=='wed' else 'thursday' if (x=='thur' or x=='thurday') else 'friday' if x=='fri' else x)

# Correct sept. to Sept and Dev to Dec in column x68
train_df['x68'] = train_df['x68'].map(lambda x: 'Jan' if x=='January' else 'Sept' if x=='sept.' else 'Dec' if x=='Dev' else x)

# Transform columns x34, x35, x68, and x93 to dummy variables
train_df = pd.get_dummies(train_df, columns=['x34', 'x35', 'x68', 'x93'])

# Transform columns x41 and x45 to floats
train_df['x41'] = train_df['x41'].map(lambda x: x.lstrip('$'))
train_df['x41'] = pd.to_numeric(train_df['x41'])

train_df['x45'] = train_df['x45'].map(lambda x: x.rstrip('%'))
train_df['x45'] = pd.to_numeric(train_df['x45'])

train_df = train_df.iloc[:10000,:]
train_df.shape

(10000, 127)

In [484]:
# split data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(train_df.drop('y', axis=1), (train_df['y']), test_size=0.2, random_state=42)

# scale data using the standard scaler in sklearn
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Get numpy columns for Y
Y_train = Y_train.values.reshape(len(Y_train),1)
Y_test = Y_test.values.reshape(len(Y_test),1)

In [485]:
X_train.shape

(8000, 126)

In [486]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 3, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.01)

Training set accuracy: 99.8000%
Test set accuracy: 97.2000%


In [38]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 3, 1], learning_rate=0.1, iterations=200000)

Model accuracy: 97.2000%


With more data, test accuracy has improved on the order of 10%.

We can see if the logistic regression model has similary performance gains with more data.

In [487]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 1], learning_rate=0.1, iterations=30000)

Training set accuracy: 89.0625%
Test set accuracy: 89.1500%


Results indicate that more data does not improve the logistic regression model. We can interpret this to mean that the model has too much bias.

Now we test the models with the full data set.

In [488]:
train_location = '/Users/connorodell/Documents/Data_Science/learning/exercise_03_train.csv'

train_df = pd.read_csv(train_location)

In [489]:
# Clean data. Details can be found in other notebook
train_df = train_df.dropna()

# Correct days so that all are spelled out
train_df['x35'] = train_df['x35'].map(lambda x: 'wednesday' if x=='wed' else 'thursday' if (x=='thur' or x=='thurday') else 'friday' if x=='fri' else x)

# Correct sept. to Sept and Dev to Dec in column x68
train_df['x68'] = train_df['x68'].map(lambda x: 'Jan' if x=='January' else 'Sept' if x=='sept.' else 'Dec' if x=='Dev' else x)

# Transform columns x34, x35, x68, and x93 to dummy variables
train_df = pd.get_dummies(train_df, columns=['x34', 'x35', 'x68', 'x93'])

# Transform columns x41 and x45 to floats
train_df['x41'] = train_df['x41'].map(lambda x: x.lstrip('$'))
train_df['x41'] = pd.to_numeric(train_df['x41'])

train_df['x45'] = train_df['x45'].map(lambda x: x.rstrip('%'))
train_df['x45'] = pd.to_numeric(train_df['x45'])

#train_df = train_df.iloc[:10000,:]
#train_df.shape

In [490]:
# split data into train/test sets
X_train, X_test, Y_train, Y_test = train_test_split(train_df.drop('y', axis=1), (train_df['y']), test_size=0.2, random_state=42)

# scale data using the standard scaler in sklearn
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Get numpy columns for Y
Y_train = Y_train.values.reshape(len(Y_train),1)
Y_test = Y_test.values.reshape(len(Y_test),1)

Test various hyperparameters to evaluate relative performance.

In [493]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 3, 1], learning_rate=0.1, iterations=2000, regularization='l2', lambd=0.01, print_cost=True)

Cost after 0 iterations: 13.580543559158
Cost after 1000 iterations: 0.893377214231
Training set accuracy: 98.4823%
Test set accuracy: 97.0157%


In [494]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=2000, print_cost=True)

Cost after 0 iterations: 9.930490139564
Cost after 1000 iterations: 1.192002408580
Training set accuracy: 97.9594%
Test set accuracy: 96.3398%


In [397]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.01)

Cost after 0 iterations: 13.846125549303
Cost after 1000 iterations: 1.180769569909
Cost after 2000 iterations: 0.399096149436
Cost after 3000 iterations: 0.171796355148
Cost after 4000 iterations: 0.100434791825
Cost after 5000 iterations: 0.075986848835
Cost after 6000 iterations: 0.058807213220
Cost after 7000 iterations: 0.049556640197
Cost after 8000 iterations: 0.040306067173
Cost after 9000 iterations: 0.035020025446
Cost after 10000 iterations: 0.031055494150
Cost after 11000 iterations: 0.027751718070
Cost after 12000 iterations: 0.025108697206
Cost after 13000 iterations: 0.018501145047
Cost after 14000 iterations: 0.015858124183
Cost after 15000 iterations: 0.013215103319
Cost after 16000 iterations: 0.007268306375
Cost after 17000 iterations: 0.004625285512
Cost after 18000 iterations: 0.003964530296
Cost after 19000 iterations: 0.002643019864
Cost after 20000 iterations: 0.001982264648
Cost after 21000 iterations: 0.000660754216
Cost after 22000 iterations: 0.000660754216


In [398]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 10, 10, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.01)

Cost after 0 iterations: 4.545995884759
Cost after 1000 iterations: 0.905234644856
Cost after 2000 iterations: 0.291393049235
Cost after 3000 iterations: 0.117614427440
Cost after 4000 iterations: 0.066075520595
Cost after 5000 iterations: 0.042288332821
Cost after 6000 iterations: 0.031055494150
Cost after 7000 iterations: 0.023126431558
Cost after 8000 iterations: 0.015858124183
Cost after 9000 iterations: 0.011893592887
Cost after 10000 iterations: 0.005946795944
Cost after 11000 iterations: 0.003303775080
Cost after 12000 iterations: 0.002643019864
Cost after 13000 iterations: 0.001982264648
Cost after 14000 iterations: 0.001982264648
Cost after 15000 iterations: 0.001321509432
Cost after 16000 iterations: 0.000660754216
Cost after 17000 iterations: -0.000000001000
Cost after 18000 iterations: -0.000000001000
Cost after 19000 iterations: -0.000000001000
Cost after 20000 iterations: -0.000000001000
Cost after 21000 iterations: -0.000000001000
Cost after 22000 iterations: -0.00000000

In [399]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.05)

Cost after 0 iterations: 4.869765940577
Cost after 1000 iterations: 0.807442872895
Cost after 2000 iterations: 0.227299793288
Cost after 3000 iterations: 0.124221979599
Cost after 4000 iterations: 0.099113281393
Cost after 5000 iterations: 0.090523463586
Cost after 6000 iterations: 0.089862708370
Cost after 7000 iterations: 0.089201953154
Cost after 8000 iterations: 0.087880442722
Cost after 9000 iterations: 0.087219687506
Cost after 10000 iterations: 0.086558932290
Cost after 11000 iterations: 0.085237421858
Cost after 12000 iterations: 0.082594400994
Cost after 13000 iterations: 0.081933645778
Cost after 14000 iterations: 0.081933645778
Cost after 15000 iterations: 0.082594400994
Cost after 16000 iterations: 0.081933645778
Cost after 17000 iterations: 0.080612135346
Cost after 18000 iterations: 0.079290624914
Cost after 19000 iterations: 0.079290624914
Cost after 20000 iterations: 0.079290624914
Cost after 21000 iterations: 0.078629869698
Cost after 22000 iterations: 0.077969114483
C

In [400]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.05)

Cost after 0 iterations: 16.477252819230
Cost after 1000 iterations: 0.954791286053
Cost after 2000 iterations: 0.243157918471
Cost after 3000 iterations: 0.116292917008
Cost after 4000 iterations: 0.099113281393
Cost after 5000 iterations: 0.094487994881
Cost after 6000 iterations: 0.089201953154
Cost after 7000 iterations: 0.087219687506
Cost after 8000 iterations: 0.086558932290
Cost after 9000 iterations: 0.085898177074
Cost after 10000 iterations: 0.084576666642
Cost after 11000 iterations: 0.085237421858
Cost after 12000 iterations: 0.084576666642
Cost after 13000 iterations: 0.083915911426
Cost after 14000 iterations: 0.082594400994
Cost after 15000 iterations: 0.082594400994
Cost after 16000 iterations: 0.082594400994
Cost after 17000 iterations: 0.081933645778
Cost after 18000 iterations: 0.081933645778
Cost after 19000 iterations: 0.081933645778
Cost after 20000 iterations: 0.081933645778
Cost after 21000 iterations: 0.081272890562
Cost after 22000 iterations: 0.081272890562


In [401]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.05)

Cost after 0 iterations: 9.189122787264
Cost after 1000 iterations: 0.864928576683
Cost after 2000 iterations: 0.228621303720
Cost after 3000 iterations: 0.122239713951
Cost after 4000 iterations: 0.101095547041
Cost after 5000 iterations: 0.092505729233
Cost after 6000 iterations: 0.089862708370
Cost after 7000 iterations: 0.089862708370
Cost after 8000 iterations: 0.088541197938
Cost after 9000 iterations: 0.087880442722
Cost after 10000 iterations: 0.088541197938
Cost after 11000 iterations: 0.087219687506
Cost after 12000 iterations: 0.087219687506
Cost after 13000 iterations: 0.085898177074
Cost after 14000 iterations: 0.086558932290
Cost after 15000 iterations: 0.084576666642
Cost after 16000 iterations: 0.083255156210
Cost after 17000 iterations: 0.083255156210
Cost after 18000 iterations: 0.083915911426
Cost after 19000 iterations: 0.083915911426
Cost after 20000 iterations: 0.084576666642
Cost after 21000 iterations: 0.084576666642
Cost after 22000 iterations: 0.083915911426
C

In [402]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.06)

Cost after 0 iterations: 6.873175755347
Cost after 1000 iterations: 0.641593313691
Cost after 2000 iterations: 0.180386172955
Cost after 3000 iterations: 0.118275182656
Cost after 4000 iterations: 0.103738567905
Cost after 5000 iterations: 0.101095547041
Cost after 6000 iterations: 0.096470260529
Cost after 7000 iterations: 0.095809505313
Cost after 8000 iterations: 0.095809505313
Cost after 9000 iterations: 0.095148750097
Cost after 10000 iterations: 0.094487994881
Cost after 11000 iterations: 0.091844974018
Cost after 12000 iterations: 0.091184218802
Cost after 13000 iterations: 0.090523463586
Cost after 14000 iterations: 0.090523463586
Cost after 15000 iterations: 0.091184218802
Cost after 16000 iterations: 0.091184218802
Cost after 17000 iterations: 0.091184218802
Cost after 18000 iterations: 0.089862708370
Cost after 19000 iterations: 0.089201953154
Cost after 20000 iterations: 0.088541197938
Cost after 21000 iterations: 0.088541197938
Cost after 22000 iterations: 0.088541197938
C

In [403]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.07)

Cost after 0 iterations: 12.787595693346
Cost after 1000 iterations: 0.708990345718
Cost after 2000 iterations: 0.179725417739
Cost after 3000 iterations: 0.119596693088
Cost after 4000 iterations: 0.109024609632
Cost after 5000 iterations: 0.107042343984
Cost after 6000 iterations: 0.105720833553
Cost after 7000 iterations: 0.103738567905
Cost after 8000 iterations: 0.102417057473
Cost after 9000 iterations: 0.101095547041
Cost after 10000 iterations: 0.101756302257
Cost after 11000 iterations: 0.101756302257
Cost after 12000 iterations: 0.100434791825
Cost after 13000 iterations: 0.100434791825
Cost after 14000 iterations: 0.100434791825
Cost after 15000 iterations: 0.099774036609
Cost after 16000 iterations: 0.099113281393
Cost after 17000 iterations: 0.099774036609
Cost after 18000 iterations: 0.101756302257
Cost after 19000 iterations: 0.101095547041
Cost after 20000 iterations: 0.101095547041
Cost after 21000 iterations: 0.101095547041
Cost after 22000 iterations: 0.101095547041


In [404]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.08)

Cost after 0 iterations: 16.392676151588
Cost after 1000 iterations: 0.700400527911
Cost after 2000 iterations: 0.171135599932
Cost after 3000 iterations: 0.124221979599
Cost after 4000 iterations: 0.119596693088
Cost after 5000 iterations: 0.117614427440
Cost after 6000 iterations: 0.112989140928
Cost after 7000 iterations: 0.112989140928
Cost after 8000 iterations: 0.111006875280
Cost after 9000 iterations: 0.110346120064
Cost after 10000 iterations: 0.109024609632
Cost after 11000 iterations: 0.109024609632
Cost after 12000 iterations: 0.108363854416
Cost after 13000 iterations: 0.107042343984
Cost after 14000 iterations: 0.106381588768
Cost after 15000 iterations: 0.105720833553
Cost after 16000 iterations: 0.105720833553
Cost after 17000 iterations: 0.105060078337
Cost after 18000 iterations: 0.105060078337
Cost after 19000 iterations: 0.105720833553
Cost after 20000 iterations: 0.105720833553
Cost after 21000 iterations: 0.105060078337
Cost after 22000 iterations: 0.105060078337


In [405]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.09)

Cost after 0 iterations: 15.112132543070
Cost after 1000 iterations: 0.770440580802
Cost after 2000 iterations: 0.175100131228
Cost after 3000 iterations: 0.129508021327
Cost after 4000 iterations: 0.124221979599
Cost after 5000 iterations: 0.122239713951
Cost after 6000 iterations: 0.118275182656
Cost after 7000 iterations: 0.116292917008
Cost after 8000 iterations: 0.114310651360
Cost after 9000 iterations: 0.114310651360
Cost after 10000 iterations: 0.113649896144
Cost after 11000 iterations: 0.113649896144
Cost after 12000 iterations: 0.111667630496
Cost after 13000 iterations: 0.113649896144
Cost after 14000 iterations: 0.114310651360
Cost after 15000 iterations: 0.111667630496
Cost after 16000 iterations: 0.111006875280
Cost after 17000 iterations: 0.111667630496
Cost after 18000 iterations: 0.111006875280
Cost after 19000 iterations: 0.111006875280
Cost after 20000 iterations: 0.111667630496
Cost after 21000 iterations: 0.111006875280
Cost after 22000 iterations: 0.111667630496


In [406]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 63, 6, 1], learning_rate=0.1, iterations=30000, regularization='l2', lambd=0.1)

Cost after 0 iterations: 11.098705361369
Cost after 1000 iterations: 0.654147662794
Cost after 2000 iterations: 0.175100131228
Cost after 3000 iterations: 0.136776328702
Cost after 4000 iterations: 0.134133307839
Cost after 5000 iterations: 0.129508021327
Cost after 6000 iterations: 0.129508021327
Cost after 7000 iterations: 0.128847266111
Cost after 8000 iterations: 0.129508021327
Cost after 9000 iterations: 0.128186510895
Cost after 10000 iterations: 0.127525755679
Cost after 11000 iterations: 0.127525755679
Cost after 12000 iterations: 0.127525755679
Cost after 13000 iterations: 0.126204245247
Cost after 14000 iterations: 0.126204245247
Cost after 15000 iterations: 0.125543490031
Cost after 16000 iterations: 0.124882734815
Cost after 17000 iterations: 0.125543490031
Cost after 18000 iterations: 0.125543490031
Cost after 19000 iterations: 0.124882734815
Cost after 20000 iterations: 0.125543490031
Cost after 21000 iterations: 0.124882734815
Cost after 22000 iterations: 0.126204245247


Compare to simple linear regression on the full data set.

In [492]:
params = L_layer_model(X_train.T, Y_train.T, X_test.T, Y_test.T, n=[126, 1], learning_rate=0.1, iterations=30000)

Training set accuracy: 89.0285%
Test set accuracy: 89.1468%


Again, simple linear regression sees no performance gains from additional data.