<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** 

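A single node in the network. It takes a weighted sum of its inputs plus a bias, applies an activation function, and passes the resulting value on to the next layer.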

- **Input Layer:** 

The first layer of the network, which receives the raw inputs. It holds one node per feature (the X values), usually plus one bias node whose value is fixed at 1.

- **Hidden Layer:** 

The layer(s) between the input and output layers, where the "magic" happens. Each hidden node takes the dot product of the previous layer's values with its weights and transforms the result with an activation function.

- **Output Layer:**

The final layer, which produces the model's prediction. It can be a single 1/0 classification, one or more regression values, or a multi-class classification.

- **Activation:**

The function applied to each node's weighted sum to decide how strongly that node "fires". Nodes with larger activations contribute more to the next layer. Common choices include ReLU, sigmoid, and tanh; which one to use depends on the layer and the task.

- **Backpropagation:**

The algorithm used to train the network. After a forward pass it computes the cost, then works backwards through the layers with the chain rule to find how much each weight contributed to that cost. The weights in every layer are then nudged in the direction that reduces the cost, and the process repeats.
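
To make the definitions above concrete, here is a minimal sketch of a single neuron: a weighted sum of the inputs plus a bias, passed through an activation function. The function names (`neuron`, `relu`, `sigmoid`) and the example weights are illustrative assumptions, not part of the challenge.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged; clamp negatives to 0."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, then an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical values chosen only for illustration.
inputs = [1.0, 0.0]
weights = [0.5, -0.5]
bias = -0.5

# The weighted sum here is 1.0*0.5 + 0.0*(-0.5) - 0.5 = 0.0
out_sigmoid = neuron(inputs, weights, bias, sigmoid)
out_relu = neuron(inputs, weights, bias, relu)
out_tanh = neuron(inputs, weights, bias, math.tanh)
print(out_sigmoid, out_relu, out_tanh)
```

The same weighted sum gives a different output under each activation, which is why the choice of activation matters per layer.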


## 2. Perceptron on XOR Gates <a id="Q2"></a>

The XOR, or “exclusive or”, problem is a classic problem in ANN research: using a neural network to predict the output of an XOR logic gate given two binary inputs. An XOR function should return a true value if the two inputs are not equal and a false value if they are equal. Create a perceptron class that can model the behavior of an XOR gate. You can use the following table as your training data:

| x1 | x2 | y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
| 1 | 0 | 1 |


In [9]:
import numpy as np

In [10]:
X = np.array([[0,0],[0,1],[1,1],[1,0]])

In [11]:
y = np.array([0, 1, 0, 1])  # 1-D targets, the shape fit() documents: [#samples]

In [7]:
class xor(object):
    def __init__(self, rate = 0.01, niter = 10):
        self.rate = rate
        self.niter = niter

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """

        # weights
        self.weight = np.zeros(1 + X.shape[1])

        # Number of misclassifications per epoch
        self.errors = []

        for i in range(self.niter):
            err = 0
            for xi, target in zip(X, y):
                delta_w = self.rate * (target - self.predict(xi))
                self.weight[1:] += delta_w * xi
                self.weight[0] += delta_w
                err += int(delta_w != 0.0)
            self.errors.append(err)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def predict(self, X):
        """Return class label after unit step"""
        # Labels must match the 0/1 targets in y, so the "off" class is 0, not -1
        return np.where(self.net_input(X) >= 0.0, 1, 0)

In [14]:
m = xor()

In [15]:
m.fit(X,y)

<__main__.xor at 0x11975ca58>

In [16]:
m.net_input(X)

array([0., 0., 0., 0.])
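
The perceptron above does not converge on this data, which is expected: XOR is not linearly separable, so no single set of perceptron weights classifies all four rows correctly. One hidden layer is enough to fix that. The sketch below uses hand-picked weights (assumptions chosen for illustration, not learned) so that two hidden step units compute OR and NAND and the output unit ANDs them together.

```python
def step(z):
    """Unit step: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_two_layer(x1, x2):
    """XOR built from three step units with hand-picked weights."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is 1
    h_nand = step(-x1 - x2 + 1.5)    # fires unless both inputs are 1
    return step(h_or + h_nand - 1.5)  # fires only if both hidden units fire

# Same row order as the training table above.
rows = [(0, 0), (0, 1), (1, 1), (1, 0)]
predictions = [xor_two_layer(x1, x2) for x1, x2 in rows]
print(predictions)
```

The printed predictions can be compared row by row with the `y` column of the training table.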

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI.



## Using the gradient descent network we went over in class

In [17]:
import pandas as pd

In [19]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')

In [20]:
df.head()

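|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |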


In [22]:
features = ['age','sex','cp','trestbps','chol','fbs','restecg','thalach','exang','oldpeak','slope','ca','thal']
target = 'target'

In [126]:
X = df[features].values
y = df[target].values.reshape(-1, 1)  # column vector; -1 infers the row count

In [133]:
class Neural_Network(object):
    def __init__(self):        
        #Define Hyperparameters
        self.inputLayerSize = 13
        self.outputLayerSize = 1
        self.hiddenLayerSize = 10
        
        #Weights (parameters)
        self.W1 = np.random.randn(self.inputLayerSize,self.hiddenLayerSize)
        self.W2 = np.random.randn(self.hiddenLayerSize,self.outputLayerSize)
        
    def forward(self, X):
        #Propagate inputs through network
        self.z2 = np.dot(X, self.W1)
        self.a2 = self.sigmoid(self.z2)
        self.z3 = np.dot(self.a2, self.W2)
        yHat = self.sigmoid(self.z3) 
        return yHat
        
    def sigmoid(self, z):
        #Apply sigmoid activation function to scalar, vector, or matrix
        return 1/(1+np.exp(-z))
    
    def sigmoidPrime(self,z):
        #Gradient of sigmoid
        return np.exp(-z)/((1+np.exp(-z))**2)
    
    def costFunction(self, X, y):
        #Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*np.sum((y-self.yHat)**2)  # np.sum returns a scalar cost for the optimizer
        return J
        
    def costFunctionPrime(self, X, y):
        #Compute derivative with respect to W1 and W2 for a given X and y:
        self.yHat = self.forward(X)
        
        delta3 = np.multiply(-(y-self.yHat), self.sigmoidPrime(self.z3))
        dJdW2 = np.dot(self.a2.T, delta3)
        
        delta2 = np.dot(delta3, self.W2.T)*self.sigmoidPrime(self.z2)
        dJdW1 = np.dot(X.T, delta2)  
        
        return dJdW1, dJdW2
    
    #Helper Functions for interacting with other classes:
    def getParams(self):
        #Get W1 and W2 unrolled into vector:
        params = np.concatenate((self.W1.ravel(), self.W2.ravel()))
        return params
    
    def setParams(self, params):
        #Set W1 and W2 using single parameter vector.
        W1_start = 0
        W1_end = self.hiddenLayerSize * self.inputLayerSize
        self.W1 = np.reshape(params[W1_start:W1_end], (self.inputLayerSize , self.hiddenLayerSize))
        W2_end = W1_end + self.hiddenLayerSize*self.outputLayerSize
        self.W2 = np.reshape(params[W1_end:W2_end], (self.hiddenLayerSize, self.outputLayerSize))
        
    def computeGradients(self, X, y):
        dJdW1, dJdW2 = self.costFunctionPrime(X, y)
        return np.concatenate((dJdW1.ravel(), dJdW2.ravel()))
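
Before training, it can help to confirm that an analytic derivative agrees with a numerical one. The sketch below checks the formula used in `sigmoidPrime` against a central finite difference and against the identity σ'(z) = σ(z)(1 − σ(z)). The helper names and the test points are illustrative assumptions; the same idea extends to checking `computeGradients` one weight at a time.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Same formula as sigmoidPrime in the class above."""
    return math.exp(-z) / (1.0 + math.exp(-z)) ** 2

def numerical_derivative(f, z, eps=1e-5):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

max_error = 0.0
for z in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    analytic = sigmoid_prime(z)
    identity = sigmoid(z) * (1.0 - sigmoid(z))
    numeric = numerical_derivative(sigmoid, z)
    # Track the largest disagreement between the three versions.
    max_error = max(max_error,
                    abs(analytic - numeric),
                    abs(analytic - identity))

print(max_error)
```

A large disagreement at any test point would indicate a mistake in the analytic derivative.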


In [134]:
from scipy import optimize
class trainer(object):
    def __init__(self, N):
        #Make Local reference to network:
        self.N = N
        
    def callbackF(self, params):
        self.N.setParams(params)
        self.J.append(self.N.costFunction(self.X, self.y))   
        
    def costFunctionWrapper(self, params, X, y):
        self.N.setParams(params)
        cost = self.N.costFunction(X, y)
        grad = self.N.computeGradients(X,y)
        
        return cost, grad
        
    def train(self, X, y):
        #Make an internal variable for the callback function:
        self.X = X
        self.y = y

        #Make empty list to store costs:
        self.J = []
        
        params0 = self.N.getParams()

        options = {'maxiter': 200, 'disp' : True}
        _res = optimize.minimize(self.costFunctionWrapper, params0, jac=True, method='BFGS',
                                 args=(X, y), options=options, callback=self.callbackF)

        self.N.setParams(_res.x)
        self.optimizationResults = _res

In [135]:
NN = Neural_Network()
T = trainer(NN)

In [136]:
T.train(X,y)

Optimization terminated successfully.
         Current function value: 67.374007
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1
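
An optimizer that stops at iteration 0 has not updated the weights at all. One plausible cause, assumed here rather than verified against this run, is that the unscaled features (for example `chol` in the hundreds) push the hidden-layer pre-activations far into the flat tails of the sigmoid, where its derivative is effectively zero and where `np.exp(-z)` can overflow. The sketch below illustrates the effect with the five `chol` values shown in `df.head()` above; `standardize` and the weight of 1.0 are hypothetical.

```python
import math
import statistics

def sigmoid_prime(z):
    """Derivative of the sigmoid.

    The derivative is symmetric in z, so abs(z) is used to keep
    math.exp from overflowing on large negative inputs.
    """
    s = 1.0 / (1.0 + math.exp(-abs(z)))
    return s * (1.0 - s)

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

chol = [233, 250, 204, 236, 354]  # first five rows of the chol column
weight = 1.0                      # hypothetical weight of typical randn size

raw_grads = [sigmoid_prime(weight * v) for v in chol]
scaled_grads = [sigmoid_prime(weight * v) for v in standardize(chol)]

print(max(raw_grads))     # largest derivative on raw inputs
print(min(scaled_grads))  # smallest derivative after standardizing
```

If this is the cause, standardizing `X` before calling `T.train(X, y)` (for example with the `StandardScaler` imported in section 4) would be a reasonable first thing to try.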




In [138]:
print("Predicted Output: \n" + str(NN.forward(X))) 
print("Loss: \n" + str(np.mean(np.square(y - NN.forward(X))))) # mean squared error

Predicted Output: 
[[0.99582919]
 [0.99564799]
 [0.99564798]
 [0.99582933]
 [0.99381567]
 [0.99580223]
 [0.99381567]
 [0.99582928]
 [0.99564942]
 [0.99631576]
 [0.99582933]
 [0.99381567]
 [0.99582902]
 [0.99582933]
 [0.99381754]
 [0.99582933]
 [0.99381567]
 [0.99381558]
 [0.99582933]
 [0.99582932]
 [0.99582933]
 [0.99564798]
 [0.99564798]
 [0.99382167]
 [0.99564894]
 [0.99381567]
 [0.99582933]
 [0.99582933]
 [0.99378316]
 [0.99566371]
 [0.99564798]
 [0.9958293 ]
 [0.99564798]
 [0.99381567]
 [0.99452741]
 [0.99606516]
 [0.99381569]
 [0.99582933]
 [0.99381573]
 [0.99381549]
 [0.99381567]
 [0.99582923]
 [0.99582933]
 [0.99381567]
 [0.99381567]
 [0.99381567]
 [0.99564798]
 [0.99408927]
 [0.99381569]
 [0.99582933]
 [0.99382273]
 [0.99381567]
 [0.99582904]
 [0.99631576]
 [0.99582933]
 [0.99564799]
 [0.99564798]
 [0.99582933]
 [0.99626106]
 [0.99381567]
 [0.99381567]
 [0.99381567]
 [0.99631575]
 [0.99582713]
 [0.99564816]
 [0.99631576]
 [0.99582927]
 [0.99565412]
 [0.99564798]
 [0.9958276 ]
 

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
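
For the binary-classification requirements above, a sigmoid output paired with binary cross-entropy is the usual choice. The sketch below shows what that loss computes for a handful of predictions. The function name, the sample values, and the clipping constant `eps` are illustrative assumptions, not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 0]
confident_right = binary_crossentropy(y_true, [0.9, 0.1, 0.9, 0.1])
uninformative = binary_crossentropy(y_true, [0.5, 0.5, 0.5, 0.5])
confident_wrong = binary_crossentropy(y_true, [0.1, 0.9, 0.1, 0.9])
print(confident_right, uninformative, confident_wrong)
```

The loss grows as predicted probabilities move away from the true labels, so confident wrong answers are penalized far more heavily than uncertain ones.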

In [82]:
# Keras imports
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
# sklearn imports
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
from tensorflow.keras import optimizers

In [98]:
def create_model(optimizer='Adagrad',learn_rate = 0.01):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    # Note: the `optimizer` argument is not used here; every model is compiled with Adagrad.
    ada = optimizers.Adagrad(lr=learn_rate)
    model.compile(loss='binary_crossentropy', optimizer=ada, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model)

## Trial One

In [None]:
# grid searching optimizer and batch size

In [94]:
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'optimizer' : ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam'],
              'epochs': [20]}
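
An exhaustive grid search trains one model per combination of values, multiplied by the number of cross-validation folds, so the cost of a grid grows multiplicatively with each added parameter. The sketch below enumerates the combinations for the grid above using `itertools.product`; `expand_grid` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
from itertools import product

def expand_grid(param_grid):
    """Return every combination of values as a list of dicts."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'optimizer': ['SGD', 'RMSprop', 'Adagrad', 'Adadelta',
                            'Adam', 'Adamax', 'Nadam'],
              'epochs': [20]}

combinations = expand_grid(param_grid)
print(len(combinations))  # 6 batch sizes * 7 optimizers * 1 epoch setting
print(combinations[0])
```

Counting the combinations first gives a rough idea of how long a search will take before launching it.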

In [95]:
grid = GridSearchCV(estimator=model,param_grid=param_grid)
grid_result = grid.fit(X,y)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

In [96]:
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.2145214487488071 using {'batch_size': 20, 'epochs': 20, 'optimizer': 'Adagrad'}
Means: 0.21122111876805624, Stdev: 0.2987117708214034 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'SGD'}
Means: 0.21122111876805624, Stdev: 0.2987117708214034 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'RMSprop'}
Means: 0.1221122145652771, Stdev: 0.17269274996962827 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'Adagrad'}
Means: 0.21122111876805624, Stdev: 0.2987117708214034 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'Adadelta'}
Means: 0.1221122145652771, Stdev: 0.17269274996962827 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'Adam'}
Means: 0.1221122145652771, Stdev: 0.17269274996962827 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'Adamax'}
Means: 0.21122111876805624, Stdev: 0.2987117708214034 with: {'batch_size': 10, 'epochs': 20, 'optimizer': 'Nadam'}
Means: 0.125412544546028, Stdev: 0.17040700995717545 with: {'batch_size': 20, 'epochs': 20, 'optimizer

## Trial Two

In [99]:
# keeping adagrad as our optimizer and grid searching learning rate and batch size

In [101]:
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'learn_rate': [0.01, 0.1, 0.5],
              'epochs': [20]}

In [102]:
grid = GridSearchCV(estimator=model,param_grid=param_grid)
grid_result = grid.fit(X,y)



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

In [103]:
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.8514851530392965 using {'batch_size': 40, 'epochs': 20, 'learn_rate': 0.1}
Means: 0.41914190848668414, Stdev: 0.1873936437866186 with: {'batch_size': 10, 'epochs': 20, 'learn_rate': 0.01}
Means: 0.3201320171356201, Stdev: 0.18190777937304498 with: {'batch_size': 10, 'epochs': 20, 'learn_rate': 0.1}
Means: 0.25742575029532117, Stdev: 0.22330337080166415 with: {'batch_size': 10, 'epochs': 20, 'learn_rate': 0.5}
Means: 0.6600660085678101, Stdev: 0.08414223060395029 with: {'batch_size': 20, 'epochs': 20, 'learn_rate': 0.01}
Means: 0.4554455478986104, Stdev: 0.4130821529318069 with: {'batch_size': 20, 'epochs': 20, 'learn_rate': 0.1}
Means: 0.2145214487488071, Stdev: 0.2964056470473301 with: {'batch_size': 20, 'epochs': 20, 'learn_rate': 0.5}
Means: 0.5016501744588217, Stdev: 0.04068920210560288 with: {'batch_size': 40, 'epochs': 20, 'learn_rate': 0.01}
Means: 0.8514851530392965, Stdev: 0.16349130796357803 with: {'batch_size': 40, 'epochs': 20, 'learn_rate': 0.1}
Means: 0.1221122145

## Trial Three

In [104]:
# increasing epochs

In [105]:
param_grid = {'batch_size': [20],
              'learn_rate': [0.1],
              'epochs': [20,40,80,150]}

In [106]:
grid = GridSearchCV(estimator=model,param_grid=param_grid)
grid_result = grid.fit(X,y)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40

In [107]:
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.8547854820887247 using {'batch_size': 20, 'epochs': 20, 'learn_rate': 0.1}
Means: 0.8547854820887247, Stdev: 0.14126030586626181 with: {'batch_size': 20, 'epochs': 20, 'learn_rate': 0.1}
Means: 0.3894389520088832, Stdev: 0.10042986691069834 with: {'batch_size': 20, 'epochs': 40, 'learn_rate': 0.1}
Means: 0.6336633761723837, Stdev: 0.18907175676458618 with: {'batch_size': 20, 'epochs': 80, 'learn_rate': 0.1}
Means: 0.7194719513257345, Stdev: 0.2187698232488586 with: {'batch_size': 20, 'epochs': 150, 'learn_rate': 0.1}
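
Mean accuracy alone hides how noisy these runs are. The sketch below ranks the Trial Three results by mean and then lists the settings whose one-standard-deviation band overlaps the best one. The numbers are rounded from the output above; the overlap rule is an illustrative heuristic, not a statistical test.

```python
# Trial Three results, rounded from the output above: (epochs, mean, stdev)
results = [
    (20, 0.8548, 0.1413),
    (40, 0.3894, 0.1004),
    (80, 0.6337, 0.1891),
    (150, 0.7195, 0.2188),
]

# Rank by mean accuracy, highest first.
ranked = sorted(results, key=lambda r: r[1], reverse=True)
best_epochs, best_mean, best_std = ranked[0]

for epochs, mean, std in ranked:
    print(f"epochs={epochs:>3}  mean={mean:.3f}  +/- {std:.3f}")

# Settings whose upper band (mean + stdev) reaches the best setting's
# lower band (mean - stdev) are hard to tell apart from the winner.
overlapping = [e for e, m, s in ranked[1:] if m + s >= best_mean - best_std]
print(overlapping)
```

Settings that appear in `overlapping` may not be meaningfully worse than the top-ranked one, so it can be worth rerunning the search before settling on a single winner.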


## Best Params

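Across the three trials, the best-scoring configuration (mean cross-validated accuracy of about 0.855 in Trial Three) was:
1. Learning rate 0.1
2. Adagrad optimizer
3. Batch size 20
4. Epochs 20

The same batch size, learning rate, and epoch count scored only about 0.455 in Trial Two, so these results vary considerably from run to run.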