<a href="https://colab.research.google.com/github/damerei/DS-Unit-4-Sprint-2-Neural-Networks/blob/master/LS_DS_Unit_4_Sprint_Challenge_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MMP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A neuron - perhaps more accurately an artificial neuron - is a conceptual descriptor for a node in the mathematical artifice we call a neural network. 
- **Input Layer:** The input layer is the set of neurons representing the input data. For example, in the typical case of an image, a natural representation is to treat each pixel as an input corresponding to a neuron.
- **Hidden Layer:** The hidden layers are one or more layers of artificial neurons which represent various compressions, encodings, or otherwise transformations of the input layer, generally according to an activation function.
- **Output Layer:** The output layer represents as artificial neurons the results of the transformations encoded in the artificial neural network. 
- **Activation:** The activation function of a node (aka artificial neuron) is what transforms the input into an output. In the simplest feedforward network, you have an input layer which is transformed through a single application of the activation function in a single hidden layer. Typically you want to choose a function that is continuous (because differentiable and therefore restricted to giving appropriately small output variation in response to input variation) and monotonic (so that input-output variation is consistent in parity). Typical examples are sigmoid or arctan functions. 
- **Backpropagation:** Backpropagation is an algorithm - closely related to traditional numerical approximation by gradient methods such as the Gauss-Newton algorithm - to allow for progressive, recursive-iterative, and self-engineered improvement of a neural network's modeling effectiveness. Essentially one is constantly searching for the gradient of the error function (which in turn consists of the partial derivatives of the error function with respect to the weights), since this represents the rate of greatest change. Intuitively, by moving in that direction, one hypothetically reduces the error function at the greatest rate. Repeated application does not guarantee locating a global minimum, but certainly is helpful, and a number of techniques such as boosting and momentum exist to help prevent being caught in local minima. 

## 2. Perceptron on XOR Gates <a id="Q3=2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

|x1	|x2|x3|	y|
|---|---|---|---|
1|	1|	1|	1|
1|	0|	1|	0|
0|	1|	1|	0|
0|	0|	1|	0|

In [0]:
import numpy as np

# Establish Inputs
inputs = np.array([
    [1,1,1],
    [1,0,1],
    [0,1,1],
    [0,0,1]
])

# Establish Target 
target = [[1], 
          [0], 
          [0], 
          [0]]

# Sigmoid functions
def sigmoid(x):
    return 1 / (1+np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx* (1-sx)

In [0]:
weights = np.random.random((3,1)) - 1

weighted_sum = np.dot(inputs, weights)

activated_output = sigmoid(weighted_sum)

error = target - activated_output

adjustments = error * sigmoid_derivative(activated_output)

weights += np.dot(inputs.T, adjustments)

In [3]:
weights = np.random.random((3,1)) - 1

for iteration in range(10000):
    
    # Weighted Sum of inputs/weights
    
    weighted_sum = np.dot(inputs, weights)
    
    # Activate!
    activated_output = sigmoid(weighted_sum)
    
    # Calculate the Error
    error = target - activated_output
    
    # Adjustments
    adjustments = error * sigmoid_derivative(activated_output)
    
    # New weights. 
    weights += np.dot(inputs.T, adjustments)
    
print('Weights after Training')
print(weights)

print('Outputs after training')
print(activated_output)

Weights after Training
[[ 11.840066  ]
 [ 11.840066  ]
 [-18.04840824]]
Outputs after training
[[9.96430036e-01]
 [2.00873011e-03]
 [2.00873011e-03]
 [1.45146560e-08]]


These outputs are substantially the target [1, 0, 0, 0], differing in most cases by a few orders of magnitude. 

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.
Your network must have one hidden layer.
You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
Train your model on the Heart Disease dataset from UCI:



In [0]:
class MLP(object):
    def __init__(self,
                inputLayerSize=4,
                outputLayerSize=1,
                hiddenLayerSize=4):
        
        #Define Hyperparameters
        self.inputLayerSize = inputLayerSize
        self.outputLayerSize = outputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        
        #Weights (parameters)
        #Input Layer
        self.W1 = np.random.randn(self.inputLayerSize,self.hiddenLayerSize)
        # Hidden Layers
        self.Wh = np.random.randn(self.hiddenLayerSize,self.hiddenLayerSize)
        # Output Layer
        self.W2 = np.random.randn(self.hiddenLayerSize,self.outputLayerSize)
        
    def forward(self, X):
        """
        Propagate inputs though network
        """
        # Input/1st Hidden Layer
        self.z2 = np.dot(X, self.W1)
        self.a2 = self.sigmoid(self.z2)
        # 2nd Hidden Layer
        self.zh = np.dot(self.a2, self.Wh)
        self.a3 = self.sigmoid(self.zh)
        # Output Layer
        self.z3 = np.dot(self.a3, self.W2)
        yHat = self.sigmoid(self.z3) 
        return yHat
        
    def sigmoid(self, z):
        #Apply sigmoid activation function to scalar, vector, or matrix
        return 1/(1+np.exp(-z))
    
    def sigmoidPrime(self,z):
        #Gradient of sigmoid
        return np.exp(-z)/((1+np.exp(-z))**2)
    
    def costFunction(self, X, y):
        #Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*sum((y-self.yHat)**2)
        return J
        
    def costFunctionPrime(self, X, y):
        #Compute derivative with respect to W and W2 for a given X and y:
        self.yHat = self.forward(X)
        
        delta3 = np.multiply(-(y-self.yHat), self.sigmoidPrime(self.z3))
        dJdW2 = np.dot(self.a2.T, delta3)
        
        delta2 = np.dot(delta3, self.W2.T)*self.sigmoidPrime(self.z2)
        dJdW1 = np.dot(X.T, delta2)  
        
        return dJdW1, dJdW2
    
    #Helper Functions for interacting with other classes:
    def getParams(self):
        #Get W1 and W2 unrolled into vector:
        params = np.concatenate((self.W1.ravel(), self.W2.ravel()))
        return params
    
    def setParams(self, params):
        #Set W1 and W2 using single paramater vector.
        W1_start = 0
        W1_end = self.hiddenLayerSize * self.inputLayerSize
        self.W1 = np.reshape(params[W1_start:W1_end], (self.inputLayerSize , self.hiddenLayerSize))
        W2_end = W1_end + self.hiddenLayerSize*self.outputLayerSize
        self.W2 = np.reshape(params[W1_end:W2_end], (self.hiddenLayerSize, self.outputLayerSize))
        
    def computeGradients(self, X, y):
        dJdW1, dJdW2 = self.costFunctionPrime(X, y)
        return np.concatenate((dJdW1.ravel(), dJdW2.ravel()))
    

class trainer(object):
    def __init__(self, N):
        #Make Local reference to network:
        self.N = N
        
    def callbackF(self, params):
        self.N.setParams(params)
        self.J.append(self.N.costFunction(self.X, self.y))   
        
    def costFunctionWrapper(self, params, X, y):
        self.N.setParams(params)
        cost = self.N.costFunction(X, y)
        grad = self.N.computeGradients(X,y)
        
        return cost, grad
        
    def train(self, X, y):
        #Make an internal variable for the callback function:
        self.X = X
        self.y = y

        #Make empty list to store costs:
        self.J = []
        
        params0 = self.N.getParams()

        options = {'maxiter': 200, 'disp' : True}
        _res = optimize.minimize(self.costFunctionWrapper, params0, jac=True, method='BFGS', \
                                 args=(X, y), options=options, callback=self.callbackF)

        self.N.setParams(_res.x)
        self.optimizationResults = _res

In [0]:
from scipy import optimize
class trainer(object):
    def __init__(self, N):
        #Make Local reference to network:
        self.N = N
        
    def callbackF(self, params):
        self.N.setParams(params)
        self.J.append(self.N.costFunction(self.X, self.y))   
        
    def costFunctionWrapper(self, params, X, y):
        self.N.setParams(params)
        cost = self.N.costFunction(X, y)
        grad = self.N.computeGradients(X,y)
        
        return cost, grad
        
    def train(self, X, y):
        #Make an internal variable for the callback function:
        self.X = X
        self.y = y

        #Make empty list to store costs:
        self.J = []
        
        params0 = self.N.getParams()

        options = {'maxiter': 200, 'disp' : True}
        _res = optimize.minimize(self.costFunctionWrapper, params0, jac=True, method='BFGS', \
                                 args=(X, y), options=options, callback=self.callbackF)

        self.N.setParams(_res.x)
        self.optimizationResults = _res

In [55]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')

df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [58]:
X = df.drop(columns='target').values
y = df[['target']].values

target.shape

(303,)

In [59]:
p2 = MLP(inputLayerSize=X.shape[1])
tp = trainer(p2)
tp.train(X,y)

y_pred = np.around(MLP.forward(p2 , X))

Optimization terminated successfully.
         Current function value: 41.355619
         Iterations: 0
         Function evaluations: 1
         Gradient evaluations: 1




In [60]:
from sklearn.metrics import accuracy_score
accuracy_score(y_pred, y)

0.5445544554455446

## 4. Keras MMP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [23]:
!pip install category_encoders

Collecting category_encoders
[?25l  Downloading https://files.pythonhosted.org/packages/6e/a1/f7a22f144f33be78afeb06bfa78478e8284a64263a3c09b1ef54e673841e/category_encoders-2.0.0-py2.py3-none-any.whl (87kB)
[K     |████████████████████████████████| 92kB 5.8MB/s 
Installing collected packages: category-encoders
Successfully installed category-encoders-2.0.0


In [25]:
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [0]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

from xgboost import XGBClassifier


import keras
from keras.models import Sequential
from keras.layers import Dense

from keras.wrappers.scikit_learn import KerasClassifier

import category_encoders as ce

In [0]:
X = df.drop(columns=['target']).values
y = df['target'].values

In [0]:
def base_pipe(tts=False, X=X, y=y):
    """A basic pipeline for transforming the data"""
    
    ord_enc = ce.OrdinalEncoder()
    scaler  = StandardScaler()


    X = ord_enc.fit_transform(X)
    X = scaler.fit_transform(X)
    
    
    if tts==True:
        
        X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                            test_size=0.3, 
                                                            random_state=42)

        return X_train, X_test, y_train, y_test
    
    else:
        
        return X, y

In [41]:
X_train, X_test, y_train, y_test = base_pipe(tts=True)

X_train.shape, X_test.shape, y_train.shape, y_test.shape

((212, 13), (91, 13), (212,), (91,))

In [0]:
# Define model function for Keras Classifier Object
# create model
model = Sequential()
# Input and First Hidden Layer
model.add(Dense(32, input_dim=X.shape[1], activation='relu'))
# Second Hidden Layer
model.add(Dense(16, activation='relu'))
# Output Layer
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', 
              optimizer='rmsprop', 
              metrics=['accuracy'])

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    random_state=42)

In [43]:
model_history = model.fit(X_train, y_train,
                          epochs=100,
                          batch_size=64,
                          validation_data=(X_test, y_test),
                          verbose=0)
scores = model.evaluate(X_test, y_test)
print('Neural Network ACC: ', scores[1])

Neural Network ACC:  0.524590154163173


In [0]:
LeakyReLU = keras.layers.LeakyReLU(alpha=0.3)

# create model
model = Sequential()
# Input and First Hidden Layer
model.add(Dense(64, input_dim=X.shape[1], activation='relu'))
# LeakyReLU Advanced function layer. 
# See https://github.com/keras-team/keras/issues/2272#issuecomment-209001884
model.add(LeakyReLU)
# Second Hidden Layer
model.add(Dense(64, activation='sigmoid'))
# Third Hidden Layer
model.add(Dense(32, activation='sigmoid'))
# Output Layer
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', 
              optimizer='adam', 
              metrics=['accuracy'])

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    random_state=42)

In [48]:
model_history = model.fit(X_train, y_train,
                          epochs=100,
                          batch_size=64,
                          validation_data=(X_test, y_test),
                          verbose=0)

scores = model.evaluate(X_test, y_test)
print('Neural Network ACC: ', scores[1])

Neural Network ACC:  0.8688524609706441


In [0]:
# Define model function for Keras Classifier Object
def create_model():
    
    LeakyReLU = keras.layers.LeakyReLU(alpha=0.3)

    # create model
    model = Sequential()
    # Input and First Hidden Layer
    model.add(Dense(64, input_dim=X.shape[1], activation='relu'))
    # LeakyReLU Advanced function layer. 
    # See https://github.com/keras-team/keras/issues/2272#issuecomment-209001884
    model.add(LeakyReLU)
    # Second Hidden Layer
    model.add(Dense(64, activation='sigmoid'))
    # Third Hidden Layer
    model.add(Dense(32, activation='sigmoid'))
    # Output Layer
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', 
                  optimizer='adam', 
                  metrics=['accuracy'])

    return model

model = KerasClassifier(build_fn=create_model, verbose=1)

In [63]:
%%time

# Define the grid search parameters
param_grid = {'batch_size': [10, 40, 80, 120],
              'epochs': [20, 50, 100 , 200, 400, 600]}

# Create Grid Search
grid = GridSearchCV(estimator=model, 
                    param_grid=param_grid,
                    cv=5,
                    n_jobs=-1)

grid_result = grid.fit(X, y, verbose=0)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")



Best: 0.7491749165081741 using {'batch_size': 10, 'epochs': 100}
Means: 0.44554455986510805, Stdev: 0.2078515207231098 with: {'batch_size': 10, 'epochs': 20}
Means: 0.6831683205692681, Stdev: 0.0672472366711982 with: {'batch_size': 10, 'epochs': 50}
Means: 0.7491749165081741, Stdev: 0.17598905598592587 with: {'batch_size': 10, 'epochs': 100}
Means: 0.6963696420782863, Stdev: 0.07223526382205561 with: {'batch_size': 10, 'epochs': 200}
Means: 0.6831683261756456, Stdev: 0.060439423997315424 with: {'batch_size': 10, 'epochs': 400}
Means: 0.6666666725681166, Stdev: 0.05605020195118876 with: {'batch_size': 10, 'epochs': 600}
Means: 0.3894389481237619, Stdev: 0.24925199860587943 with: {'batch_size': 40, 'epochs': 20}
Means: 0.5379538021662055, Stdev: 0.17844677452732335 with: {'batch_size': 40, 'epochs': 50}
Means: 0.6468646927635269, Stdev: 0.10456819283874651 with: {'batch_size': 40, 'epochs': 100}
Means: 0.554455440528322, Stdev: 0.09804963434997392 with: {'batch_size': 40, 'epochs': 200}


In [65]:
best_batch = grid_result.best_params_['batch_size']
best_epoch = grid_result.best_params_['epochs']

optimal_model = create_model()

opt = optimal_model.fit(X_train, y_train,
                        epochs=best_epoch,
                        batch_size=best_batch,
                        validation_data=(X_test, y_test),
                        verbose=1)

Train on 242 samples, validate on 61 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Ep

In [66]:
scores = optimal_model.evaluate(X_test, y_test)
print('Neural Network ACC: ', scores[1])

Neural Network ACC:  0.8688524609706441
