<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**
  * Neurons are the building blocks of a neural network. They can take on different "shapes" or forms depending on their purpose. They are called neurons because a neural network is loosely based on a brain. The idea is that the information fed into the neurons, together with the weights associated with them, will either make them fire or not, and that determines the final output of our classification or regression problem.
- **Input Layer:**
  * The input layer is the first layer of a neural network, where everything begins. It is made up of the nodes that receive the data that will then pass through all the other layers. The number of nodes (neurons) in the input layer is largely determined by the number of features in your data.
- **Hidden Layer:**
  * The hidden layers of a neural network are where a lot of the magic happens. Each hidden neuron takes a weighted sum of its inputs and usually passes it through an activation function that determines whether the neuron fires. There can be many hidden layers and many neurons within each layer; the right number depends on the problem you are trying to solve and the data you have. A neural network with many hidden layers is what people refer to when they talk about 'deep learning'.
- **Output Layer:**
  * The output layer is always the final layer of a neural network. For a binary classification or regression problem, there is usually only one neuron in the output layer. If you are trying to solve a multiclass classification problem, you may want multiple neurons in this layer, typically one per class. Once the inputs have gone through the network, the last layer uses its weights and activation function to determine the 'answer' you are trying to find. For example, in a binary problem, its output is read as either a 1 or a 0, depending on what the network believes the input is according to its training.
- **Activation:**
  * An activation function is what determines whether a neuron fires. In the output layer, it determines whether an input is something or not something. There are different types of activation functions, and some are more useful than others depending on the problem at hand. An activation function transforms the weighted sum of the inputs plus the bias, often squashing it into a fixed range, and, given a certain threshold, determines whether the neuron fires. In a way, activation functions are the gatekeepers of a neural network.
- **Backpropagation:**
  * Backpropagation is a technique that lets a neural network learn and improve, given specific inputs and the targets for those inputs. Once the inputs have gone through the network, it calculates the error relative to the true target output and feeds that information back through the network to adjust the weights accordingly. Using some optimization algorithm, most commonly gradient descent, it keeps adjusting the weights until it finds a result that minimizes the final error.
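
The definitions above can be tied together in a small NumPy sketch of a single sigmoid neuron: a forward pass (weighted sum plus bias, then activation) followed by one backpropagation-style weight update. All numbers here (`x`, `w`, `b`, `target`, `learning_rate`) are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only for illustration.
x = np.array([0.5, -1.0, 2.0])   # inputs arriving at the neuron
w = np.array([0.1, 0.4, -0.2])   # one weight per input
b = 0.05                         # bias
target = 1.0                     # desired output for this input
learning_rate = 0.5              # assumed step size

def forward(w, b):
    z = np.dot(w, x) + b         # weighted sum of inputs plus bias
    return sigmoid(z)            # activation of the neuron

before = forward(w, b)
error_before = 0.5 * (target - before) ** 2

# Chain rule through the squared error and the sigmoid,
# whose derivative at activation a is a * (1 - a).
delta = -(target - before) * before * (1.0 - before)
w = w - learning_rate * delta * x
b = b - learning_rate * delta

after = forward(w, b)
error_after = 0.5 * (target - after) ** 2
print(before, after, error_before, error_after)
```

After the single update the output moves toward the target, so the squared error is smaller than before.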


## 2. Perceptron on XOR Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |
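
AND is linearly separable, so a single perceptron can represent it. As a quick sanity check on the table, here is a minimal sketch with hand-picked (not learned) weights and a step activation. The weight values are an assumption chosen for illustration; `x3` acts as a constant bias input.

```python
import numpy as np

# Training table: columns x1, x2, x3 (x3 is a constant bias input).
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 0, 0, 0])

# Hand-picked weights: the weighted sum is positive only when x1 + x2 > 1.5.
weights = np.array([1.0, 1.0, -1.5])

# Step activation: output 1 when the weighted sum is positive, else 0.
predictions = (X @ weights > 0).astype(int)
print(predictions)   # [1 0 0 0]
```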

In [193]:
import numpy as np
import matplotlib.pyplot as plt

X = np.array([
    [1,1,1],
    [1,0,1],
    [0,1,1],
    [0,0,1]
])

y = [
    [1],
    [0],
    [0],
    [0]
]



In [186]:
np.zeros(X.shape[1]).shape

(3,)

In [198]:
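class Perceptron(object):
    def __init__(self, niter=1000):
        self.niter = niter

    def fit(self, X, y):
        y = np.asarray(y)

        # Random initial weights in [-1, 1), one per input column
        self.weight = 2 * np.random.random((X.shape[1], 1)) - 1

        for iteration in range(self.niter):
            # Weighted sum of the inputs
            self.weighted_sum = np.dot(X, self.weight)

            # Squash the weighted sum into (0, 1)
            self.activated_outputs = self.sigmoid(self.weighted_sum)

            # Difference between the targets and the current outputs
            self.error = y - self.activated_outputs

            # Scale the error by sigmoid_derivative applied to the activated outputs
            self.adjustments = self.error * self.sigmoid_derivative(self.activated_outputs)

            # Move the weights in the direction that reduces the error
            self.weight += np.dot(X.T, self.adjustments)

        print("Weights after training: ", self.weight)
        print("Output after training: ", self.activated_outputs)
        return self

    def sigmoid(self, X):
        return 1 / (1 + np.exp(-X))

    def sigmoid_derivative(self, X):
        sx = self.sigmoid(X)
        return sx * (1-sx)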


In [199]:
nn = Perceptron(1000)

nn.fit(X, y)

Weights after training:  [[  7.22463508]
 [  7.22463535]
 [-11.12512232]]
Output after training:  [[9.65214673e-01]
 [1.98501452e-02]
 [1.98501504e-02]
 [1.47811523e-05]]


<__main__.Perceptron at 0x1a45232b70>
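
The trained outputs are probabilities rather than hard 0/1 labels. A minimal sketch, assuming a 0.5 cut-off, plugs the printed weights back into the sigmoid and thresholds the result. The recomputed probabilities only approximately match the printed ones, because `fit` records the outputs just before the final weight update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]])

# Weights copied from the "Weights after training" output above.
weights = np.array([[7.22463508], [7.22463535], [-11.12512232]])

probabilities = sigmoid(X @ weights)
labels = (probabilities >= 0.5).astype(int)   # assumed 0.5 decision threshold

print(probabilities.ravel())
print(labels.ravel())   # [1 0 0 0]
```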

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:



In [208]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv")
df.head()

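| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |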


In [209]:
df.drop(columns=['target']).columns

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal'],
      dtype='object')

In [210]:
#convert numeric cols to floats and normalize:
num_cols = df.drop(columns=['target']).columns
scaler = MinMaxScaler()
df[num_cols] = scaler.fit_transform(df[num_cols].values)

df.head()

| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.708333 | 1.0 | 1.0 | 0.481132 | 0.244292 | 1.0 | 0.0 | 0.603053 | 0.0 | 0.370968 | 0.0 | 0.0 | 0.333333 | 1 |
| 1 | 0.166667 | 1.0 | 0.666667 | 0.339623 | 0.283105 | 0.0 | 0.5 | 0.885496 | 0.0 | 0.564516 | 0.0 | 0.0 | 0.666667 | 1 |
| 2 | 0.25 | 0.0 | 0.333333 | 0.339623 | 0.178082 | 0.0 | 0.0 | 0.770992 | 0.0 | 0.225806 | 1.0 | 0.0 | 0.666667 | 1 |
| 3 | 0.5625 | 1.0 | 0.333333 | 0.245283 | 0.251142 | 0.0 | 0.5 | 0.816794 | 0.0 | 0.129032 | 1.0 | 0.0 | 0.666667 | 1 |
| 4 | 0.583333 | 0.0 | 0.0 | 0.245283 | 0.520548 | 0.0 | 0.5 | 0.70229 | 1.0 | 0.096774 | 1.0 | 0.0 | 0.666667 | 1 |
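
With its default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column independently as `(x - min) / (max - min)`. A minimal NumPy sketch of that formula on one column, using hypothetical ages rather than the dataset itself:

```python
import numpy as np

# Hypothetical ages, used only to illustrate the scaling formula.
ages = np.array([29.0, 41.0, 53.0, 77.0])

# Min-max scaling maps the smallest value to 0 and the largest to 1.
scaled = (ages - ages.min()) / (ages.max() - ages.min())
print(scaled.tolist())   # [0.0, 0.25, 0.5, 1.0]
```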


In [211]:
df = df.astype('float32')

In [212]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
age         303 non-null float32
sex         303 non-null float32
cp          303 non-null float32
trestbps    303 non-null float32
chol        303 non-null float32
fbs         303 non-null float32
restecg     303 non-null float32
thalach     303 non-null float32
exang       303 non-null float32
oldpeak     303 non-null float32
slope       303 non-null float32
ca          303 non-null float32
thal        303 non-null float32
target      303 non-null float32
dtypes: float32(14)
memory usage: 16.6 KB


In [213]:
x = df.drop(columns=['target'])
y = df['target']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.33, random_state=42)

In [214]:
x_train.shape, y_train.shape, x_test.shape, y_test.shape

((203, 13), (203,), (100, 13), (100,))

In [215]:
x_train = x_train.values
x_test = x_test.values
y_train = y_train.values
y_test = y_test.values

In [216]:
y_train = y_train.reshape(-1,1)

In [220]:
class Neural_Network(object):
    def __init__(self):        
        #Define Hyperparameters
        self.inputLayerSize = 13
        self.outputLayerSize = 1
        self.hiddenLayerSize = 3
        
        #Weights (parameters)
        self.W1 = np.random.randn(self.inputLayerSize,self.hiddenLayerSize)
        self.W2 = np.random.randn(self.hiddenLayerSize,self.outputLayerSize)
        
    def forward(self, X):
        #Propagate inputs through network
        self.z2 = np.dot(X, self.W1)
        self.a2 = self.sigmoid(self.z2)
        self.z3 = np.dot(self.a2, self.W2)
        yHat = self.sigmoid(self.z3) 
        return yHat
        
    def sigmoid(self, z):
        #Apply sigmoid activation function to scalar, vector, or matrix
        return 1/(1+np.exp(-z))
    
    def sigmoidPrime(self,z):
        #Gradient of sigmoid
        return np.exp(-z)/((1+np.exp(-z))**2)
    
    def costFunction(self, X, y):
        #Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*np.sum((y-self.yHat)**2)
        return J
        
    def costFunctionPrime(self, X, y):
        #Compute derivative with respect to W1 and W2 for a given X and y:
        self.yHat = self.forward(X)
        
        delta3 = np.multiply(-(y-self.yHat), self.sigmoidPrime(self.z3))
        dJdW2 = np.dot(self.a2.T, delta3)
        
        delta2 = np.dot(delta3, self.W2.T)*self.sigmoidPrime(self.z2)
        dJdW1 = np.dot(X.T, delta2)  
        
        return dJdW1, dJdW2
    
    #Helper Functions for interacting with other classes:
    def getParams(self):
        #Get W1 and W2 unrolled into vector:
        params = np.concatenate((self.W1.ravel(), self.W2.ravel()))
        return params
    
    def setParams(self, params):
        #Set W1 and W2 using single parameter vector.
        W1_start = 0
        W1_end = self.hiddenLayerSize * self.inputLayerSize
        self.W1 = np.reshape(params[W1_start:W1_end], (self.inputLayerSize , self.hiddenLayerSize))
        W2_end = W1_end + self.hiddenLayerSize*self.outputLayerSize
        self.W2 = np.reshape(params[W1_end:W2_end], (self.hiddenLayerSize, self.outputLayerSize))
        
    def computeGradients(self, X, y):
        dJdW1, dJdW2 = self.costFunctionPrime(X, y)
        return np.concatenate((dJdW1.ravel(), dJdW2.ravel()))
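
One way to gain confidence in hand-written backpropagation is a numerical gradient check: compare the analytic gradients against central finite differences of the cost. The sketch below re-implements the same one-hidden-layer sigmoid network as standalone functions so that it runs on its own. The random data, the seed, and the helper names (`cost`, `analytic_gradients`, `numerical_gradient`) are assumptions made for illustration, not part of the class above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, W2, X, y):
    """Half the summed squared error of a one-hidden-layer sigmoid network."""
    y_hat = sigmoid(sigmoid(X @ W1) @ W2)
    return 0.5 * np.sum((y - y_hat) ** 2)

def analytic_gradients(W1, W2, X, y):
    """Backpropagation, following the same steps as costFunctionPrime."""
    a2 = sigmoid(X @ W1)
    y_hat = sigmoid(a2 @ W2)
    delta3 = -(y - y_hat) * y_hat * (1 - y_hat)   # sigmoid'(z3) = y_hat * (1 - y_hat)
    dJdW2 = a2.T @ delta3
    delta2 = (delta3 @ W2.T) * a2 * (1 - a2)      # sigmoid'(z2) = a2 * (1 - a2)
    dJdW1 = X.T @ delta2
    return dJdW1, dJdW2

def numerical_gradient(f, W, eps=1e-5):
    """Central finite differences, perturbing one weight at a time in place."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        original = W[idx]
        W[idx] = original + eps
        plus = f()
        W[idx] = original - eps
        minus = f()
        W[idx] = original                         # restore the weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.random((5, 13))                           # hypothetical inputs in [0, 1)
y = rng.integers(0, 2, size=(5, 1)).astype(float) # hypothetical binary targets
W1 = rng.standard_normal((13, 3))
W2 = rng.standard_normal((3, 1))

dJdW1, dJdW2 = analytic_gradients(W1, W2, X, y)
num_W1 = numerical_gradient(lambda: cost(W1, W2, X, y), W1)
num_W2 = numerical_gradient(lambda: cost(W1, W2, X, y), W2)

max_diff = max(np.abs(dJdW1 - num_W1).max(), np.abs(dJdW2 - num_W2).max())
print(max_diff)
```

If the backpropagation code is right, `max_diff` should be tiny compared with the gradient values themselves.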

In [221]:
from scipy import optimize
class trainer(object):
    def __init__(self, N):
        #Make Local reference to network:
        self.N = N
        
    def callbackF(self, params):
        self.N.setParams(params)
        self.J.append(self.N.costFunction(self.X, self.y))   
        
    def costFunctionWrapper(self, params, X, y):
        self.N.setParams(params)
        cost = self.N.costFunction(X, y)
        grad = self.N.computeGradients(X,y)
        
        return cost, grad
        
    def train(self, X, y):
        #Make an internal variable for the callback function:
        self.X = X
        self.y = y

        #Make empty list to store costs:
        self.J = []
        
        params0 = self.N.getParams()

        options = {'maxiter': 200, 'disp' : True}
        _res = optimize.minimize(self.costFunctionWrapper, params0, jac=True, method='BFGS', \
                                 args=(X, y), options=options, callback=self.callbackF)

        self.N.setParams(_res.x)
        self.optimizationResults = _res
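
The `trainer` relies on `scipy.optimize.minimize` with `jac=True`, which means the objective must return the cost and its gradient together as a tuple. A minimal sketch of that calling pattern on a toy quadratic follows; the function `cost_and_gradient`, the `target` vector, and the `record` callback are hypothetical stand-ins for the network's cost wrapper.

```python
import numpy as np
from scipy import optimize

def cost_and_gradient(params, target):
    """Return (cost, gradient) together, as jac=True expects."""
    diff = params - target
    cost = 0.5 * np.sum(diff ** 2)
    grad = diff
    return cost, grad

target = np.array([1.0, -2.0, 3.0])   # hypothetical location of the minimum
history = []                          # cost recorded after each iteration

def record(params):
    # Called by the optimizer with the current parameter vector.
    history.append(cost_and_gradient(params, target)[0])

result = optimize.minimize(cost_and_gradient, np.zeros(3), args=(target,),
                           jac=True, method='BFGS',
                           options={'maxiter': 200}, callback=record)
print(result.x, len(history))
```

As in the `trainer` class, the callback only records the cost so that the training curve can be inspected afterwards.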

In [222]:
NN = Neural_Network()
T = trainer(NN)

T.train(x_train,y_train)


         Current function value: 4.069501
         Iterations: 200
         Function evaluations: 233
         Gradient evaluations: 233




In [223]:
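NN = Neural_Network()  # start again from fresh random weights
learning_rate = 0.001  # assumed step size for plain gradient descent; not tuned

for epoch in range(1000):
    if (epoch+1 in [1,2,3,4,5]) or ((epoch+1) % 100 ==0):
        print('+' + '---' * 3 + f'EPOCH {epoch+1}' + '---'*3 + '+')
        print('Input: \n', x_train)
        print('Actual Output: \n', y_train[:10])
        print('Predicted Output: \n', str(NN.forward(x_train[:10])))
    # One gradient-descent step using the backpropagated gradients
    dJdW1, dJdW2 = NN.costFunctionPrime(x_train, y_train)
    NN.W1 -= learning_rate * dJdW1
    NN.W2 -= learning_rate * dJdW2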

+---------EPOCH 1---------+
Input: 
 [[0.5208333  1.         0.6666667  ... 1.         0.         1.        ]
 [0.6041667  0.         0.6666667  ... 1.         0.         0.6666667 ]
 [0.375      1.         0.         ... 1.         0.         0.6666667 ]
 ...
 [0.8333333  1.         1.         ... 0.5        0.25       0.6666667 ]
 [0.35416666 1.         0.         ... 1.         0.         1.        ]
 [0.7083333  0.         0.33333334 ... 1.         0.5        0.6666667 ]]
Actual Output: 
 [[1.]
 [1.]
 [1.]
 [0.]
 [0.]
 [1.]
 [1.]
 [1.]
 [1.]
 [0.]]
Predicted Output: 
 [[0.39732664]
 [0.42027121]
 [0.35282868]
 [0.48522254]
 [0.38224776]
 [0.43770891]
 [0.43618112]
 [0.43337599]
 [0.3874501 ]
 [0.42466675]]
+---------EPOCH 2---------+
Input: 
 [[0.5208333  1.         0.6666667  ... 1.         0.         1.        ]
 [0.6041667  0.         0.6666667  ... 1.         0.         0.6666667 ]
 [0.375      1.         0.         ... 1.         0.         0.6666667 ]
 ...
 [0.8333333  1.    

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
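
For a binary target, a sigmoid output is commonly paired with binary cross-entropy, which penalizes confident wrong predictions much more heavily than confident right ones. The NumPy sketch below shows the formula on hypothetical labels and probabilities. The `eps` clipping is an assumption added to avoid `log(0)` and is not meant to reproduce Keras's exact implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1.0, 0.0, 1.0, 0.0])            # hypothetical labels
confident_right = np.array([0.9, 0.1, 0.8, 0.2])   # hypothetical sigmoid outputs
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])

loss_right = binary_crossentropy(y_true, confident_right)
loss_wrong = binary_crossentropy(y_true, confident_wrong)
print(loss_right, loss_wrong)
```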

In [128]:
# Keras imports
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

# sklearn imports
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Create Model
model = Sequential()
model.add(Dense(10, activation='relu', input_shape=(13,)))
model.add(Dense(13, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile Model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Fit Model
history = model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=50, batch_size=20, verbose=1)

Train on 203 samples, validate on 100 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


### Hypertune

In [133]:
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=1)

# define the grid search parameters
param_grid = {'batch_size': [20, 30, 40],
              'epochs': [20, 50, 100]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(x_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")



Best: 0.7635467933316537 using {'batch_size': 30, 'epochs': 100}
Means: 0.47783251804084026, Stdev: 0.0365794108889656 with: {'batch_size': 20, 'epochs': 20}
Means: 0.6206896606043641, Stdev: 0.09484876453405078 with: {'batch_size': 20, 'epochs': 50}
Means: 0.6453202001273338, Stdev: 0.10790992719751503 with: {'batch_size': 20, 'epochs': 100}
Means: 0.5073891675530984, Stdev: 0.06973419785597788 with: {'batch_size': 30, 'epochs': 20}
Means: 0.5862068964049146, Stdev: 0.07660003001572495 with: {'batch_size': 30, 'epochs': 50}
Means: 0.7635467933316537, Stdev: 0.11379559263822903 with: {'batch_size': 30, 'epochs': 100}
Means: 0.41871921020775593, Stdev: 0.07020009313355102 with: {'batch_size': 40, 'epochs': 20}
Means: 0.5221674938507268, Stdev: 0.03657940635525028 with: {'batch_size': 40, 'epochs': 50}
Means: 0.47783250849822473, Stdev: 0.03657940777873878 with: {'batch_size': 40, 'epochs': 100}
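
To make the grid easier to read, the nine combinations can be ranked by mean accuracy. The sketch below copies the printed results, rounded to four decimals, into a plain list and sorts it. Note that the gap between the top two settings (about 0.12) is roughly one standard deviation of the best score, so the ranking should be read with some caution.

```python
# (mean accuracy, std, batch_size, epochs), rounded from the output above.
results = [
    (0.4778, 0.0366, 20, 20),
    (0.6207, 0.0948, 20, 50),
    (0.6453, 0.1079, 20, 100),
    (0.5074, 0.0697, 30, 20),
    (0.5862, 0.0766, 30, 50),
    (0.7635, 0.1138, 30, 100),
    (0.4187, 0.0702, 40, 20),
    (0.5222, 0.0366, 40, 50),
    (0.4778, 0.0366, 40, 100),
]

# Sort by mean accuracy, best first.
ranked = sorted(results, key=lambda row: row[0], reverse=True)
for mean, std, batch_size, epochs in ranked[:3]:
    print(f"{mean:.4f} +/- {std:.4f}  batch_size={batch_size}, epochs={epochs}")

best = ranked[0]
```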
