<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on AND Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**
- **Input Layer:**
- **Hidden Layer:**
- **Output Layer:**
- **Activation:**
- **Backpropagation:**


### Neuron: 
A neuron is the basic unit of a neural network. In artificial neural networks, neurons (or "nodes") receive inputs and pass a signal on to the next layer of nodes; how strong that signal is depends on the activation function. A neuron multiplies each input value by a weight, sums these products (usually together with a bias term), and passes the sum through an "activation function". The result is the neuron's output.
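
As a minimal sketch of that computation, assuming a sigmoid activation and made-up numbers (the name `neuron_output`, the weights, and the bias are illustrative, not part of the assignment):

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))

# Illustrative values only: 0.5*0.4 + (-1.0)*0.3 + 2.0*0.05 sums to about 0,
# so the sigmoid output lands at about 0.5.
activation = neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.05], bias=0.0)
print(activation)
```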

### Input Layer: 
The input layer of a neural network consists of units (or input neurons) whose values you set directly from the data. It applies no operations to the input values and has no weights or biases associated with it.

### Hidden Layer: 
Layers between the input layer and the output layer are called Hidden Layers. They are called hidden because we don't interact with them directly: they sit inside the network and can only be reached through the input layer. The values from the input layer are passed to the neuron(s) of the hidden layer, which apply transformations to the input data. These transformed values are then passed on to the next layer and eventually to the output layer. In a fully connected (dense) hidden layer, every neuron is connected to every neuron in the next layer.

### Output Layer: 
The final layer is called the Output Layer. The purpose of the output layer is to output a vector of values that is in a format that is suitable for the type of problem that we're trying to address. Typically the output value is modified by an "activation function" to transform it into a format that makes sense for its context.
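
To illustrate how an output activation shapes raw scores into a format that suits the problem, the sketch below applies a sigmoid (a single probability, as in binary classification) and a softmax (one probability per class). The function names and the input scores are made up for illustration.

```python
import math

def sigmoid(z):
    """Map a single raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Map a list of raw scores to probabilities that sum to 1."""
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

binary_prob = sigmoid(0.8)                 # binary classification: one output node
class_probs = softmax([2.0, 1.0, 0.1])     # multi-class: one output node per class
print(binary_prob, class_probs)
```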

### Activation Function: 
In neural networks, each node has an activation function, and the nodes in a given layer typically share the same one. The activation function decides whether a cell "fires" or not (is "activated" or not); more generally, in artificial neural networks it decides how much signal to pass on to the next layer. This is why activation functions are sometimes referred to as transfer functions. They are used to introduce non-linearity into neural networks, and many of them (sigmoid and tanh, for example) squash values into a smaller range.
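
A small sketch of the point about output ranges, using three common activations. The helper names are illustrative. Sigmoid and tanh are bounded, while ReLU leaves positive inputs unchanged.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))      # output in (0, 1)

def relu(z):
    return max(0.0, z)                     # output in [0, +inf)

# Evaluate each activation on a negative, a zero, and a positive input
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4), round(math.tanh(z), 4), relu(z))
```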

### Backpropagation:
Backpropagation is a technique used in supervised learning of neural networks that calculates the gradient of the loss with respect to each weight. The name stands for backward propagation of errors, since the error is computed at the output and distributed backwards through the network's layers. The weights are then adjusted up or down accordingly, to reduce the loss on subsequent feed-forward passes through the network.
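
A worked sketch under simplifying assumptions: a single sigmoid neuron with one weight and a squared-error loss (all names and numbers are illustrative). The chain-rule gradient is checked against a finite-difference estimate, and one gradient-descent step with an arbitrary learning rate of 0.1 lowers the loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, y):
    """Squared error of a one-weight sigmoid neuron."""
    return 0.5 * (y - sigmoid(w * x)) ** 2

def analytic_grad(w, x, y):
    """Chain rule: dL/dw = -(y - a) * a * (1 - a) * x, with a = sigmoid(w * x)."""
    a = sigmoid(w * x)
    return -(y - a) * a * (1.0 - a) * x

w, x, y = 0.5, 2.0, 1.0
eps = 1e-6
numeric_grad = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
grad = analytic_grad(w, x, y)

# One gradient-descent step: move the weight against the gradient
w_new = w - 0.1 * grad
print(grad, numeric_grad, loss(w, x, y), loss(w_new, x, y))
```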

## 2. Perceptron on AND Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

|x1|x2|x3|y|
|---|---|---|---|
|1|1|1|1|
|1|0|1|0|
|0|1|1|0|
|0|0|1|0|

In [166]:
import numpy as np
X = np.array([[1,1,1], [1,0,1], [0,1,1], [0,0,1]])
y = np.array([1,0,0,0])#.reshape(-1,1)

X.shape, y.shape

((4, 3), (4,))

In [167]:
class Perceptron(object):
    def __init__(self, l_rate, epochs):
        self.l_rate = l_rate
        self.epochs = epochs

    def train_weights(self, X, y):
        #initialize weights and bias w/ value = 0
        weights = [0.0 for i in range(len(X[0]))]
        bias = 0.0
    
        #initialize min_error as infinite
        min_error = float("inf")
        #iterate n_epoch times:
        for epoch in range(self.epochs):
            #initialize sum squared error for this iteration as 0
            sum_error = 0.0
        
            #iterate thru each row in X_train
            for i in range(len(X)):
                row = X[i]
            
                #call the predict function to get the predicted outcome for each row
                prediction = self.predict(row, weights, bias)
            
                #calc the error
                error = y[i] - prediction
            
                #update the sum squared error for this iteration w/ current weights and bias
                sum_error += error**2
            
                #update the bias
                bias += self.l_rate * error
                #iterate thru each feature and update the corresponding weight:
                for j in range(len(row)):
                    weights[j] += self.l_rate * error * row[j]
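            #if this epoch's sum squared error is below the previous minimum, record it.
            #note: `best` holds a reference to the live `weights` list (not a copy), so the
            #returned weights are those from the final epoch, paired with the bias stored here
            if sum_error < min_error:
                min_error = sum_error
                best = [weights, bias]
                best_epoch = epoch
        return best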

    def predict(self, row, weights, bias):
        activation = bias
        for i in range(len(row)):
            activation += weights[i] * row[i]
        return 1.0 if activation >= 0.0 else 0.0
    
    def get_predictions(self, X, y):
        best = self.train_weights(X, y)
        weights, bias = best[0], best[1]
        y_pred = []
        for i in range(len(X)):
            y_pred.append(int(self.predict(X[i], weights, bias)))
        return np.array(y_pred)

In [168]:
#instantiate the perceptron
p = Perceptron(l_rate=0.1, epochs=6)

#get predictions from just 6 epochs
y_pred = p.get_predictions(X, y)

In [169]:
y_pred

array([1, 0, 0, 0])

In [170]:
#verify accuracy of predictions:
y == y_pred

array([ True,  True,  True,  True])
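
AND is linearly separable, which is why a single perceptron can represent it. As a sanity check that does not depend on training, the sketch below uses hand-picked weights (an assumption for illustration, not the weights learned above) with the same step rule and the same constant `x3` column.

```python
def step(activation):
    """Same threshold rule as the perceptron: fire when the sum is >= 0."""
    return 1 if activation >= 0.0 else 0

# Hand-picked weights for x1, x2 and the constant x3 column (illustrative)
weights = [1.0, 1.0, -1.5]
rows = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]]
targets = [1, 0, 0, 0]

predictions = [step(sum(w * x for w, x in zip(weights, row))) for row in rows]
print(predictions == targets)
```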

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:



In [171]:
class NeuralNetwork: 
    def __init__(self, inputs, hiddenNodes, outputNodes):
        # Set up architecture
        self.inputs = inputs
        self.hiddenNodes = hiddenNodes
        self.outputNodes = outputNodes
        
        #Initial weights
        self.weights1 = np.random.randn(self.inputs, self.hiddenNodes) 
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes) 
    
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        #derivative of the sigmoid, written in terms of the sigmoid's output s
        #(callers pass activations, not raw weighted sums)
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        """
        
        #Weighted sum of inputs and hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        #Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weight sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        #Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output
    
    def backward(self, X, y, o):
        """
        Backward propagate through the network
        """
        self.o_error = y - o #error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o) # apply derivative of sigmoid to error
        
        self.z2_error = self.o_delta.dot(self.weights2.T) # z2 error: how much our hidden layer weights were off
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden)
        
        self.weights1 += X.T.dot(self.z2_delta) #Adjust first set (input => hidden) weights
        self.weights2 += self.activated_hidden.T.dot(self.o_delta) #adjust second set (hidden => output) weights
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)
        
    def predict(self, X):
        #threshold the sigmoid outputs at 0.5 to get class labels, shape (n_samples, 1)
        output = self.feed_forward(X)
        return (output >= 0.5).astype(int)

In [172]:
import pandas as pd
url = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv'
df = pd.read_csv(url)
df.head()

| |age|sex|cp|trestbps|chol|fbs|restecg|thalach|exang|oldpeak|slope|ca|thal|target|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|63|1|3|145|233|1|0|150|0|2.3|0|0|1|1|
|1|37|1|2|130|250|0|1|187|0|3.5|0|0|2|1|
|2|41|0|1|130|204|0|0|172|0|1.4|2|0|2|1|
|3|56|1|1|120|236|0|1|178|0|0.8|2|0|2|1|
|4|57|0|0|120|354|0|1|163|1|0.6|2|0|2|1|


In [173]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

X = df.drop('target', axis=1).values.astype('float32')
y = df['target'].values.reshape(-1,1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((242, 13), (61, 13), (242, 1), (61, 1))

In [174]:
#normalize the X matrix

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [175]:
#instantiate the neural network
nn = NeuralNetwork(inputs=13, hiddenNodes=7, outputNodes=1)

#train the nn:
epochs = 1000
for i in range(epochs):
    nn.train(X_train, y_train)

In [176]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

def get_accuracy(y_true, y_pred):
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    print('Accuracy: ' + str(round(accuracy*100, 3)))
    print('Precision: ' + str(round(precision*100, 3)))
    print('Recall: ' + str(round(recall*100, 3)))
    return accuracy, precision, recall

#generate predictions from the trained nn w/ X_test:
y_pred = nn.predict(X_test)

get_accuracy(y_test, y_pred)

Accuracy: 83.607
Precision: 89.286
Recall: 78.125


(0.8360655737704918, 0.8928571428571429, 0.78125)
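
For reference, the three scores can be reproduced from confusion-matrix counts. The counts below are not printed by the notebook; they are inferred as the ones consistent with the reported scores on the 61-row test set, so treat them as an assumption.

```python
# Inferred confusion-matrix counts (assumption, consistent with the scores above)
tp, fp, fn, tn = 25, 3, 7, 26

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are real
recall = tp / (tp + fn)                      # share of real positives that were found

print(accuracy, precision, recall)
```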

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
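
For a binary task, binary cross-entropy is a common choice of loss. The sketch below is a plain-Python version of the formula, not Keras's implementation; the function name, the clipping constant `eps`, and the example probabilities are illustrative assumptions. It shows that confident wrong predictions are penalized far more than confident right ones.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)    # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```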

In [181]:
import tensorflow
from keras.models import Sequential
from keras.layers import Dense, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split, GridSearchCV
#note: this wrapper module exists only in older Keras/TensorFlow releases
from keras.wrappers.scikit_learn import KerasClassifier

## Baseline Model

In [220]:
url = 'https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv'
df = pd.read_csv(url)

X = df.drop('target', axis=1).values.astype('float32')
y = df['target'].values#.reshape(-1,1)

scaler = MinMaxScaler()
X = scaler.fit_transform(X)
X.shape, y.shape

((303, 13), (303,))

In [227]:
#instantiate model
model = Sequential()

#hidden layers
model.add(Dense(6, input_shape=(13,), activation="relu"))
model.add(Dense(3, activation='relu'))

#output layer
model.add(Dense(1, activation='sigmoid'))

#compile the model            
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

#inspect the model summary
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_91 (Dense)             (None, 6)                 84        
_________________________________________________________________
dense_92 (Dense)             (None, 3)                 21        
_________________________________________________________________
dense_93 (Dense)             (None, 1)                 4         
Total params: 109
Trainable params: 109
Non-trainable params: 0
_________________________________________________________________


In [228]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=99)

model.fit(X_train, y_train, epochs=25, verbose=0)

y_pred_proba = model.predict(X_test)

#threshold the predicted probabilities at 0.5 to get class labels
y_pred = (y_pred_proba >= 0.5).astype(int)
    
baseline_accuracy = get_accuracy(y_test, y_pred)

Accuracy: 78.261
Precision: 82.143
Recall: 82.143


## Hyperparameter Tuning

In [186]:
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

### Hidden Layer Neurons

In [None]:
def create_model(layer1=10, layer2=8):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer='uniform', activation='relu'))
    model.add(Dense(layer2, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, epochs=100, verbose=1)

# define the grid search parameters
layer1 = [3, 6, 9, 12]
layer2 = [2, 5, 8, 11]
param_grid = dict(layer1=layer1, layer2=layer2)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [142]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_layer1|param_layer2|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|15|2.936603|0.033025|0.097828|0.015922|12|11|{'layer1': 12, 'layer2': 11}|0.623762|0.851485|0.534653|0.669967|0.133408|1|
|10|4.866275|0.156392|0.64853|0.037524|9|8|{'layer1': 9, 'layer2': 8}|0.613861|0.792079|0.544554|0.650165|0.104261|2|
|11|5.574532|0.201089|0.745319|0.041466|9|11|{'layer1': 9, 'layer2': 11}|0.60396|0.782178|0.564356|0.650165|0.094737|2|
|12|5.355411|0.307748|0.769404|0.052476|12|2|{'layer1': 12, 'layer2': 2}|0.564356|0.772277|0.594059|0.643564|0.091818|4|
|14|4.527716|1.086518|0.596112|0.370829|12|8|{'layer1': 12, 'layer2': 8}|0.564356|0.792079|0.544554|0.633663|0.112308|5|


Best accuracy score using 12 neurons for layer 1 and 11 neurons for layer 2
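
To make the size of this search concrete, the sketch below enumerates the same two-parameter grid with `itertools.product`. This is a plain-Python illustration rather than scikit-learn's own code, and the fold count of 3 is an assumption inferred from the `split0` to `split2` columns in the results.

```python
from itertools import product

param_grid = {"layer1": [3, 6, 9, 12], "layer2": [2, 5, 8, 11]}
n_folds = 3  # assumption: inferred from the split0..split2 columns

# Build every combination of the listed values, one dict per candidate
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]
total_fits = len(candidates) * n_folds

print(len(candidates), total_fits)
print(candidates[15])
```

With 16 candidates and 3 folds, the search trains 48 models, each for 100 epochs. In this enumeration order, candidate 15 is `layer1=12, layer2=11`, which lines up with the index of the top-ranked row.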

### Batch Size

In [None]:
layer1 = 12
layer2 = 11

def create_model():
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer='uniform', activation='relu'))
    model.add(Dense(layer2, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, epochs=100, verbose=1)

# define the grid search parameters
batch_size = [8, 16, 32, 64, 128]
param_grid = dict(batch_size=batch_size)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [188]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_batch_size|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|1|4.781235|0.344072|0.163487|0.018386|16|{'batch_size': 16}|0.623762|0.831683|0.544554|0.666667|0.121082|1|
|0|7.703731|0.339733|0.104259|0.025207|8|{'batch_size': 8}|0.633663|0.762376|0.534653|0.643564|0.093231|2|
|2|3.74889|0.105292|0.1994|0.018264|32|{'batch_size': 32}|0.584158|0.811881|0.50495|0.633663|0.130102|3|
|3|3.247298|0.158231|0.232438|0.01356|64|{'batch_size': 64}|0.524752|0.792079|0.0|0.438944|0.329008|4|
|4|2.788707|0.134544|0.322337|0.05165|128|{'batch_size': 128}|0.326733|0.772277|0.0|0.366337|0.316522|5|


Batch Size of 16 is the winner

### Initializer

In [None]:
batch_size = 16

def create_model(init_mode='uniform'):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer=init_mode, activation='relu'))
    model.add(Dense(layer2, kernel_initializer=init_mode, activation='relu'))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=batch_size, epochs=100, verbose=1)

# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 
             'he_normal', 'he_uniform']

param_grid = dict(init_mode=init_mode)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [192]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_init_mode|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|4|5.262644|0.099856|0.365401|0.025851|glorot_normal|{'init_mode': 'glorot_normal'}|0.722772|0.792079|0.554455|0.689769|0.099777|1|
|6|5.694395|0.184594|0.504761|0.035258|he_normal|{'init_mode': 'he_normal'}|0.693069|0.792079|0.564356|0.683168|0.093231|2|
|2|4.873749|0.138017|0.201604|0.020263|normal|{'init_mode': 'normal'}|0.653465|0.831683|0.564356|0.683168|0.111138|3|
|1|4.552819|0.045352|0.133764|0.023464|lecun_uniform|{'init_mode': 'lecun_uniform'}|0.623762|0.841584|0.574257|0.679868|0.116123|4|
|5|5.838858|0.249449|0.376984|0.051642|glorot_uniform|{'init_mode': 'glorot_uniform'}|0.653465|0.752475|0.613861|0.673267|0.058295|5|


Glorot_normal is the winner

### Activation Functions

In [None]:
init_mode = 'glorot_normal'

def create_model(activation='relu'):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer=init_mode, activation=activation))
    model.add(Dense(layer2, kernel_initializer=init_mode, activation=activation))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=batch_size, epochs=100, verbose=1)

# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']

param_grid = dict(activation=activation)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [194]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_activation|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|2|5.65574|0.233078|0.21919|0.018438|softsign|{'activation': 'softsign'}|0.653465|0.841584|0.574257|0.689769|0.112114|1|
|4|5.428058|0.017549|0.343883|0.018|tanh|{'activation': 'tanh'}|0.643564|0.782178|0.594059|0.673267|0.079619|2|
|7|7.178172|0.034478|0.613458|0.025152|linear|{'activation': 'linear'}|0.633663|0.811881|0.574257|0.673267|0.100971|2|
|1|5.090239|0.010475|0.143422|0.016271|softplus|{'activation': 'softplus'}|0.60396|0.831683|0.544554|0.660066|0.123751|4|
|3|5.323265|0.475546|0.242803|0.015668|relu|{'activation': 'relu'}|0.60396|0.811881|0.524752|0.646865|0.121082|5|


Softsign is the winner

### Optimizers

In [None]:
activation = 'softsign'

def create_model(optimizer='adam'):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer=init_mode, activation=activation))
    model.add(Dense(layer2, kernel_initializer=init_mode, activation=activation))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=batch_size, epochs=100, verbose=1)

# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [196]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_optimizer|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|1|4.724592|0.164289|0.144771|0.038771|RMSprop|{'optimizer': 'RMSprop'}|0.673267|0.841584|0.584158|0.69967|0.106739|1|
|6|6.638242|0.660434|0.412358|0.024616|Nadam|{'optimizer': 'Nadam'}|0.683168|0.811881|0.584158|0.693069|0.093231|2|
|3|6.494347|0.322629|0.296397|0.062389|Adadelta|{'optimizer': 'Adadelta'}|0.70297|0.80198|0.564356|0.689769|0.097458|3|
|2|5.273129|0.180235|0.231829|0.010873|Adagrad|{'optimizer': 'Adagrad'}|0.653465|0.831683|0.574257|0.686469|0.107654|4|
|4|7.863513|0.39196|0.37364|0.072089|Adam|{'optimizer': 'Adam'}|0.683168|0.782178|0.584158|0.683168|0.080841|5|


RMSprop is the winner

### Dropout regularization:

In [None]:
from keras.constraints import maxnorm

optimizer = 'RMSprop'

def create_model(dropout_rate=0.0, weight_constraint=0):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(layer2, kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=batch_size, epochs=100, verbose=1)

# define the grid search parameters
weight_constraint = [1, 2, 3]
dropout_rate = [0.0, 0.1, 0.2, 0.3]
param_grid = dict(dropout_rate=dropout_rate, weight_constraint=weight_constraint)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [198]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_dropout_rate|param_weight_constraint|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|2|6.807321|0.337038|0.662258|0.027473|0.0|3|{'dropout_rate': 0.0, 'weight_constraint': 3}|0.673267|0.821782|0.584158|0.693069|0.098015|1|
|1|6.772218|0.398087|0.535562|0.043048|0.0|2|{'dropout_rate': 0.0, 'weight_constraint': 2}|0.693069|0.80198|0.564356|0.686469|0.097122|2|
|10|6.558165|0.403338|0.307074|0.031322|0.3|2|{'dropout_rate': 0.3, 'weight_constraint': 2}|0.70297|0.811881|0.524752|0.679868|0.118353|3|
|7|7.566388|1.258503|0.792292|0.491068|0.2|2|{'dropout_rate': 0.2, 'weight_constraint': 2}|0.693069|0.811881|0.524752|0.676568|0.117799|4|
|4|8.099014|0.262785|0.765559|0.06248|0.1|2|{'dropout_rate': 0.1, 'weight_constraint': 2}|0.663366|0.80198|0.554455|0.673267|0.101294|5|


dropout= 0, weight_constraint = 3 is the winner
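
As a rough illustration of what a max-norm weight constraint does, the sketch below rescales a weight vector whenever its L2 norm exceeds a cap. It is a simplified stand-in for the idea, not Keras's implementation; the function name `apply_max_norm` and the example vectors are assumptions.

```python
import math

def apply_max_norm(weights, max_value):
    """Rescale the vector so its L2 norm never exceeds max_value."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_value:
        return list(weights)               # already within the cap: unchanged
    scale = max_value / norm
    return [w * scale for w in weights]

clipped = apply_max_norm([3.0, 4.0], max_value=3.0)     # norm 5 is rescaled to norm 3
unchanged = apply_max_norm([1.0, 2.0], max_value=3.0)   # norm ~2.24 is left alone
print(clipped, unchanged)
```

A larger cap such as 3 constrains the weights less often than a cap of 1 or 2.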

### Number of Epochs and Learning Rate

In [None]:
from keras.optimizers import RMSprop

dropout_rate = 0.0
weight_constraint = 3

def create_model(lr=0.01):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(layer2, kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    optimizer = RMSprop(lr=lr)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=batch_size, verbose=1)

# define the grid search parameters
epochs = [75, 200]
lr = [0.001, 0.01, 0.1, 0.5]
param_grid = dict(epochs=epochs, lr=lr)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [200]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_epochs|param_lr|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|4|9.809377|0.067526|0.308544|0.034504|200|0.001|{'epochs': 200, 'lr': 0.001}|0.693069|0.80198|0.574257|0.689769|0.092997|1|
|0|4.622927|0.041567|0.102014|0.021971|75|0.001|{'epochs': 75, 'lr': 0.001}|0.663366|0.80198|0.574257|0.679868|0.093697|2|
|5|10.016843|0.068657|0.396711|0.025759|200|0.01|{'epochs': 200, 'lr': 0.01}|0.415842|0.792079|0.683168|0.630363|0.158072|3|
|1|4.450925|0.13041|0.165057|0.016116|75|0.01|{'epochs': 75, 'lr': 0.01}|0.584158|0.782178|0.50495|0.623762|0.116591|4|
|2|4.772882|0.055635|0.238296|0.012846|75|0.1|{'epochs': 75, 'lr': 0.1}|0.376238|0.80198|0.633663|0.60396|0.175073|5|


Best combination: learning rate 0.001 with 200 epochs.
Since the best learning rate is the smallest one tested and the best epoch count is the largest, run the grid search again with more epochs to see whether that improves the score.
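
Why large learning rates can score poorly is easiest to see on a toy problem. The sketch below runs plain gradient descent (not RMSprop, and not the network above) on `f(w) = w**2`; the function name `descend` and the two rates are illustrative assumptions.

```python
def descend(lr, steps=20, w=1.0):
    """Run plain gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

small = descend(0.1)   # each step multiplies w by 0.8, so w shrinks toward 0
large = descend(1.1)   # each step multiplies w by -1.2, so w overshoots and grows
print(small, large)
```

A smaller rate converges more reliably but needs more steps, which fits with trying more epochs at the low end of the learning-rate range.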

In [None]:
def create_model(lr=0.01):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(layer2, kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    optimizer = RMSprop(lr=lr)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=batch_size, verbose=1)

# define the grid search parameters
epochs = [175, 300, 500]
lr = [0.001, 0.005]
param_grid = dict(epochs=epochs, lr=lr)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [205]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_epochs|param_lr|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|1|8.528824|0.652418|0.190313|0.03457|175|0.005|{'epochs': 175, 'lr': 0.005}|0.673267|0.821782|0.643564|0.712871|0.07796|1|
|4|22.867647|0.731623|0.351138|0.018046|500|0.001|{'epochs': 500, 'lr': 0.001}|0.673267|0.821782|0.60396|0.69967|0.090864|2|
|3|15.320076|0.378161|0.294596|0.058786|300|0.005|{'epochs': 300, 'lr': 0.005}|0.673267|0.782178|0.643564|0.69967|0.059589|3|
|0|8.044495|0.158138|0.103685|0.017522|175|0.001|{'epochs': 175, 'lr': 0.001}|0.673267|0.782178|0.594059|0.683168|0.077118|4|
|5|21.134149|0.657949|0.34015|0.044028|500|0.005|{'epochs': 500, 'lr': 0.005}|0.70297|0.762376|0.544554|0.669967|0.091937|5|


In [None]:
#One last finetuning:
def create_model(lr=0.01):
    # create model
    model = Sequential()
    model.add(Dense(layer1, input_shape=(13,), kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(layer2, kernel_initializer=init_mode, activation=activation, 
                    kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # Compile model
    optimizer = RMSprop(lr=lr)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=batch_size, verbose=1)

# define the grid search parameters
epochs = [140, 165, 180]
lr = [0.003, 0.004, 0.005, 0.006, 0.007]
param_grid = dict(epochs=epochs, lr=lr)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=2, verbose=1)
grid_result = grid.fit(X, y)

In [208]:
pd.DataFrame(grid_result.cv_results_).sort_values(by='rank_test_score').head()

| |mean_fit_time|std_fit_time|mean_score_time|std_score_time|param_epochs|param_lr|params|split0_test_score|split1_test_score|split2_test_score|mean_test_score|std_test_score|rank_test_score|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|11|9.217514|0.615243|0.535745|0.007938|180|0.004|{'epochs': 180, 'lr': 0.004}|0.80198|0.841584|0.643564|0.762376|0.085554|1|
|9|8.329626|0.319037|0.361162|0.018857|165|0.007|{'epochs': 165, 'lr': 0.007}|0.732673|0.821782|0.693069|0.749175|0.053827|2|
|12|10.942259|0.030027|0.556416|0.050115|180|0.005|{'epochs': 180, 'lr': 0.005}|0.792079|0.80198|0.643564|0.745875|0.072457|3|
|8|7.730639|0.125757|0.356799|0.067109|165|0.006|{'epochs': 165, 'lr': 0.006}|0.554455|0.841584|0.762376|0.719472|0.121082|4|
|3|8.522777|0.10349|0.605091|0.046021|140|0.006|{'epochs': 140, 'lr': 0.006}|0.683168|0.841584|0.594059|0.706271|0.102363|5|
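
As a quick check on how to read these columns, the sketch below recomputes `mean_test_score` and `std_test_score` for the top-ranked row (180 epochs, lr 0.004) from its three fold scores. The values are copied from the table; using the population standard deviation is an assumption that happens to match the reported figure.

```python
from statistics import mean, pstdev

# Fold scores of the top-ranked row (epochs=180, lr=0.004)
fold_scores = [0.80198, 0.841584, 0.643564]

mean_score = mean(fold_scores)    # compare with mean_test_score = 0.762376
std_score = pstdev(fold_scores)   # compare with std_test_score = 0.085554

print(round(mean_score, 6), round(std_score, 6))
```

The spread across folds (about 0.64 to 0.84) is larger than the gaps between the top-ranked candidates, so the ranking should be read with some caution.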


## Final Model Predictions:

Baseline:

In [231]:
baseline_accuracy
#accuracy, precision, recall

(0.782608695652174, 0.8214285714285714, 0.8214285714285714)

Final Model:

In [229]:
model = grid_result.best_estimator_

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=99)
model.fit(X_train, y_train, verbose=0)

y_pred = model.predict(X_test)

tuned_accuracy = get_accuracy(y_test, y_pred)

Accuracy: 84.783
Precision: 86.207
Recall: 89.286
