<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MMP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** Each neuron is a mathematical operation that takes it’s input, multiplies it by it’s weights and then passes the sum through the activation function to the other neurons
- **Input Layer:** The Input nodes provide information from the outside world to the network and are together referred to as the “Input Layer”. No computation is performed in any of the Input nodes – they just pass on the information to the hidden nodes.
- **Hidden Layer:** The Hidden nodes have no direct connection with the outside world (hence the name “hidden”). They perform computations and transfer information from the input nodes to the output nodes. A collection of hidden nodes forms a “Hidden Layer”.¶
- **Output Layer:** The Output nodes are collectively referred to as the “Output Layer” and are responsible for computations and transferring information from the network to the outside world.¶
- **Activation:** The node applies the activation function to the weighted sum of its inputs. The purpose of the activation function is to introduce non-linearity into the output of a neuron. This is important because most real world data is non linear and we want neurons to learn these non linear representations.
- **Backpropagation:** gives us detailed insights into how changing the weights and biases changes the overall behaviour of the network. Uses the partial derivative ∂C/∂w of the cost function C with respect to any weight w (or bias b) in the network. The expression tells us how quickly the cost changes when we change the weights and biases.


## 2. Perceptron on AND Gates <a id="Q3=2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

|x1	|x2|x3|	y|
|---|---|---|---|
1|	1|	1|	1|
1|	0|	1|	0|
0|	1|	1|	0|
0|	0|	1|	0|

In [0]:
import numpy as np

# Establish inputs
inputs = np.array([
    [1,1,1],
    [1,0,1],
    [0,1,1],
    [0,0,1]
])

# Establish Target
target = [[1],
          [0],
          [0],
          [0]]
    
class Perceptron:
    def __init__(self,
                inputLayerSize=3,
                outputLayerSize=1,
                hiddenLayerSize=4):        
        #Define Hyperparameters
        self.inputLayerSize = inputLayerSize
        self.outputLayerSize = outputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        
        #Input Node
        self.weights1 = np.random.randn(self.inputLayerSize,
                                        self.hiddenLayerSize)
        #Output Node
        self.weights2 = np.random.randn(self.hiddenLayerSize, 
                                        self.outputLayerSize)
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self,X):
        """
        Calculate the NN inference using feed forward.
        """
        
        # Weighted sum of inputs & hidden
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weighted sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final Activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    
    def backward(self, X, y, o):
        """
        Backward propagate through the network
        """
        self.o_error = y - o #error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        self.z2_error = self.o_delta.dot(self.weights2.T) 
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden)
        
        self.weights1 += X.T.dot(self.z2_delta) 
        self.weights2 += self.activated_hidden.T.dot(self.o_delta) 
    
    def train(self, X, y):
      o = self.feed_forward(X)
      self.backward(X, y, o)


In [148]:
p1 = Perceptron()

for i in range(300):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 50 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', inputs)
        print('Actual Output: \n', target)
        print('Predicted Output: \n', str(np.around(p1.feed_forward(inputs))))
        print("Loss: \n", str(np.mean(np.square(target - p1.feed_forward(inputs)))))
    p1.train(inputs,target)

+---------EPOCH 1---------+
Input: 
 [[1 1 1]
 [1 0 1]
 [0 1 1]
 [0 0 1]]
Actual Output: 
 [[1], [0], [0], [0]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 [1.]]
Loss: 
 0.40981696807820067
+---------EPOCH 2---------+
Input: 
 [[1 1 1]
 [1 0 1]
 [0 1 1]
 [0 0 1]]
Actual Output: 
 [[1], [0], [0], [0]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 [1.]]
Loss: 
 0.32116477216469014
+---------EPOCH 3---------+
Input: 
 [[1 1 1]
 [1 0 1]
 [0 1 1]
 [0 0 1]]
Actual Output: 
 [[1], [0], [0], [0]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 [1.]]
Loss: 
 0.25257136047733036
+---------EPOCH 4---------+
Input: 
 [[1 1 1]
 [1 0 1]
 [0 1 1]
 [0 0 1]]
Actual Output: 
 [[1], [0], [0], [0]]
Predicted Output: 
 [[0.]
 [0.]
 [0.]
 [0.]]
Loss: 
 0.2136158506669704
+---------EPOCH 5---------+
Input: 
 [[1 1 1]
 [1 0 1]
 [0 1 1]
 [0 0 1]]
Actual Output: 
 [[1], [0], [0], [0]]
Predicted Output: 
 [[0.]
 [0.]
 [0.]
 [0.]]
Loss: 
 0.19417785778402386
+---------EPOCH 50---------+
Input: 
 [[1 1 1]
 [1 0 1]
 [0 1 1]
 [0 0 1]]
A

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.
Your network must have one hidden layer.
You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
Train your model on the Heart Disease dataset from UCI:



In [140]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')

X = df.drop(columns='target').values
y = df[['target']].values
print(df.shape)
df.head(2)

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1


In [0]:
sc = StandardScaler()
X = sc.fit_transform(X)

In [0]:
class Neural_Network(object):
    def __init__(self, inputLayerSize=4, outputLayerSize=1, hiddenLayerSize=4):
        
        #Define Hyperparameters
        self.inputLayerSize = inputLayerSize
        self.outputLayerSize = outputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        
        #Weights (parameters)
        #Input Layer
        self.W1 = np.random.randn(self.inputLayerSize,self.hiddenLayerSize)
        # Hidden Layers
        self.Wh = np.random.randn(self.hiddenLayerSize,self.hiddenLayerSize)
        # Output Layer
        self.W2 = np.random.randn(self.hiddenLayerSize,self.outputLayerSize)
        
    def forward(self, X):
        """
        Propagate inputs though network
        """
        # Input/1st Hidden Layer
        self.z2 = np.dot(X, self.W1)
        self.a2 = self.sigmoid(self.z2)
        # 2nd Hidden Layer
        self.zh = np.dot(self.a2, self.Wh)
        self.a3 = self.sigmoid(self.zh)
        # Output Layer
        self.z3 = np.dot(self.a3, self.W2)
        yHat = self.sigmoid(self.z3) 
        return yHat
        
    def sigmoid(self, z):
        #Apply sigmoid activation function to scalar, vector, or matrix
        return 1/(1+np.exp(-z))
    
    def sigmoidPrime(self,z):
        #Gradient of sigmoid
        return np.exp(-z)/((1+np.exp(-z))**2)
    
    def costFunction(self, X, y):
        #Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*sum((y-self.yHat)**2)
        return J
        
    def costFunctionPrime(self, X, y):
        #Compute derivative with respect to W and W2 for a given X and y:
        self.yHat = self.forward(X)
        
        delta3 = np.multiply(-(y-self.yHat), self.sigmoidPrime(self.z3))
        dJdW2 = np.dot(self.a2.T, delta3)
        
        delta2 = np.dot(delta3, self.W2.T)*self.sigmoidPrime(self.z2)
        dJdW1 = np.dot(X.T, delta2)  
        
        return dJdW1, dJdW2
    
    #Helper Functions for interacting with other classes:
    def getParams(self):
        #Get W1 and W2 unrolled into vector:
        params = np.concatenate((self.W1.ravel(), self.W2.ravel()))
        return params
    
    def setParams(self, params):
        #Set W1 and W2 using single paramater vector.
        W1_start = 0
        W1_end = self.hiddenLayerSize * self.inputLayerSize
        self.W1 = np.reshape(params[W1_start:W1_end], (self.inputLayerSize , self.hiddenLayerSize))
        W2_end = W1_end + self.hiddenLayerSize*self.outputLayerSize
        self.W2 = np.reshape(params[W1_end:W2_end], (self.hiddenLayerSize, self.outputLayerSize))
        
    def computeGradients(self, X, y):
        dJdW1, dJdW2 = self.costFunctionPrime(X, y)
        return np.concatenate((dJdW1.ravel(), dJdW2.ravel()))
    

class trainer(object):
    def __init__(self, N):
        #Make Local reference to network:
        self.N = N
        
    def callbackF(self, params):
        self.N.setParams(params)
        self.J.append(self.N.costFunction(self.X, self.y))   
        
    def costFunctionWrapper(self, params, X, y):
        self.N.setParams(params)
        cost = self.N.costFunction(X, y)
        grad = self.N.computeGradients(X,y)
        
        return cost, grad
        
    def train(self, X, y):
        #Make an internal variable for the callback function:
        self.X = X
        self.y = y

        #Make empty list to store costs:
        self.J = []
        
        params0 = self.N.getParams()

        options = {'maxiter': 200, 'disp' : True}
        _res = optimize.minimize(self.costFunctionWrapper, params0, jac=True, method='BFGS', \
                                 args=(X, y), options=options, callback=self.callbackF)

        self.N.setParams(_res.x)
        self.optimizationResults = _res

In [0]:
NN = Neural_Network(inputLayerSize=x.shape[1])

In [0]:
T = trainer(NN)

In [142]:
T.train(X, y)

         Current function value: 37.559564
         Iterations: 1
         Function evaluations: 110
         Gradient evaluations: 98


In [143]:
X.shape, y.shape, type(x)

((303, 13), (303, 1), numpy.ndarray)

## 4. Keras MMP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [4]:
import pandas as pd
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
from sklearn.preprocessing import StandardScaler
import pandas as pd
import keras
import tensorflow

Using TensorFlow backend.


In [0]:
y = keras.utils.to_categorical(y, len(set(y)))

In [0]:
model = Sequential()
model.add(Dense(4, input_dim=13, activation='relu'))
model.add(Dense(5, activation='tanh'))
model.add(Dense(2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [32]:
model.fit(x, y, epochs=50, validation_split=.1)

Train on 272 samples, validate on 31 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x7f1b48252e48>

In [0]:
# Function to create model, required for KerasClassifier
def create_model():
  model = Sequential()
  model.add(Dense(4, input_dim=13, activation='relu'))
  model.add(Dense(5, activation='tanh'))
  model.add(Dense(2, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=1)

param_grid = {'batch_size': [10, 20],
              'epochs': [20, 40]}

In [41]:
# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(x, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")



Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40
Best: 0.6666666865348816 using {'batch_size': 10, 'epochs': 40}
Means: 0.4785478512446086, Stdev: 0.22203153125059896 with: {'batch_size': 10, 'epochs': 20}
Means: 0.6666666865348816, Stdev: 0.12421260077050067 with: {'batch_size': 10, 'epochs': 40}
Means: 0.39273926615715027, Stdev: 0.26709233886057665 with: {'batch_size': 20, 'epochs': 20}
Means: 0.4686468690633774, Stdev: 0.2588603522693043 with: {'batch_size': 20, 'epochs': 40}
