<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MMP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** 
    The base unit for a network. Applies weights and biases to inputs, then an activatin function before passing info 
    to the next level.
- **Input Layer:** 
    The layer that takes in the data from the dataset.
- **Hidden Layer:** 
    Layers between the input and output layers.
- **Output Layer:**
    The final layer. This is where the prediction is made.
- **Activation:**
    A function applied to the result of multiplying input by weight and adding bias. Determines if node 'fires' and at what degree.
- **Backpropagation:**
    A function for updating weights and biases at the beginning of the network based off of the last iteration.


## 2. Perceptron on XOR Gates <a id="Q2"></a>

The XOr, or “exclusive or”, problem is a classic problem in ANN research. It is the problem of using a neural network to predict the outputs of XOr logic gates given two binary inputs. An XOr function should return a true value if the two inputs are not equal and a false value if they are equal. Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

|x1	|x2 | y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
| 1 | 0 | 1 |


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from tensorflow import keras
from keras.layers import Dense
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold, cross_val_score, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, OneHotEncoder


Using TensorFlow backend.


In [2]:
class Perceptron(object):
    """Perceptron estimator with early stopping.
    
    :param learning_rate: float Estimator learning rate. Default == 0.01
    :param epochs: int Number of epochs to run Perceptron. Default = 1000
    :param early_stopping: int Number of epochs without imoprovement at which to stop estimator. Default = 10
    
    """
    
    def __init__(self, learning_rate=0.01, epochs=100, early_stopping=10):
        self.lr = learning_rate
        self.epochs = epochs
        self.early_stopping = early_stopping
        
    def predict(self,row):
        """Apply weights and add bias to inputs.
        
        Return 1 if output is greater or equal zero, else zero for each element in input row.
        """
        
        return (np.dot(row, self.weight[1:]) + self.weight[0]) >= 0

    def fit(self, X, y):
        """Fit training data
        
        Initialize with random bias and weights.
        Update weights and bias with each row based on previous iteration's error.
        Store number of errors for each epoch.
        Stop if no errors in number of `early_stopping` epochs.
        """
        
        self.weight = np.array([np.random.random() for _ in range(X.shape[1] + 1)])
    
        self.errors_ = []
        
        for _ in range(self.epochs):
            error = 0
            for row, label in zip(X, y):
                
                # Check our current prediction against the actual label to get the error.
                # Multiply the result by the learning rate.
                adjustment = self.lr * (label - self.predict(row))
                
                # Adjust our weigts and bias accordingly.
                self.weight[1:] += adjustment * row
                self.weight[0] += adjustment
                
                # Add up our errors for each epoch.
                error += adjustment != 0.0
                
            # Make a list of number of errors per epoch.
            self.errors_.append(error)

            # If we've been correct each time for a number of rounds, stop already.
            if sum(self.errors_[-self.early_stopping:]) == 0:
#                 print('Stopped Early')
                break
                
        return self


In [3]:
class DoublePerceptron(object):
    """Combines output of two Perceptrons as input to a final Perceptron.
    
    """
    
    def __init__(self):
        self.perc = Perceptron()
        
    def fit(self, X, y):
        """Fit two Perceptrons to the data, zip outputs together to use as input
        for self.perc.
        """
        self.one = Perceptron().fit(X, y)
        self.two = Perceptron().fit(X, y)
        first = self.one.predict(X)
        second = self.two.predict(X)
        inputs = np.array([np.array([one, two]) for one, two in zip(first, second)])
        self.perc.fit(inputs, y)
        
    def _predict(self, X):
        """Use predictions from self.one and self.two to predict yhat from X."""
        first = self.one.predict(X)
        second = self.two.predict(X)

        try:
            inputs = np.array([np.array([one, two]) for one, two in zip(first, second)])
        except TypeError as e:
            inputs = np.array([first, second])
        return self.perc.predict(inputs)


In [4]:
xor = DoublePerceptron()

Xor = np.array([np.array([0, 0]),
                np.array([1, 0]),
                np.array([0, 1]),
                np.array([1, 1])])

yor = np.array([[0], [1], [1], [0]])

xor.fit(Xor, yor)

xor._predict(np.array([np.array([1, 0]),
                      np.array([1, 1]),
                      np.array([0, 1]), 
                      np.array([0, 0])]))

array([ True, False, False, False])

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.
Your network must have one hidden layer.
You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
Train your model on the Heart Disease dataset from UCI:



In [5]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.reindex(np.random.permutation(df.index))
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
62,52,1,3,118,186,0,0,190,0,0.0,1,0,1,1
150,66,1,0,160,228,0,0,138,0,2.3,2,0,1,1
239,35,1,0,126,282,0,0,156,1,0.0,2,0,3,0
65,35,0,0,138,183,0,1,182,0,1.4,2,0,2,1
268,54,1,0,122,286,0,0,116,1,3.2,1,2,2,0


In [6]:
X = df.drop(columns='target').values
y = df['target'].values

In [7]:
scaler = StandardScaler()
X = scaler.fit_transform(X)

In [8]:
X.shape, y.shape

((303, 13), (303,))

In [9]:
class MLP(object):
    
    def __init__(self, epochs=10000, learning_rate=0.01, n_input=13, n_hidden=64, n_out=1):
        
        # Initialize hyperparameter variables.
        self.epochs = epochs
        self.lr = learning_rate
        self.n_input = n_input
        self.n_hidden = n_hidden
        self.n_out = n_out
    
        # Initialize weights and biases.
        self.hidden_weight = np.random.random(size=(self.n_input + 1, self.n_hidden))
        self.output_weight = np.random.random(size=(self.n_hidden + 1, self.n_out))
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_prime(self, x):
        return x * (1 - x)
    
    def fit(self, X, y):
        self.errors = []
        for i in range(self.epochs):
            out = self.predict(X)
            self.backpass(X, y, out)
        print(f'Training error at {i} epoch: {self.errors[-1]}')

    def backpass(self, X, y, out):
        y = y.reshape((y.shape[0], 1))
        error = y - out
        
        self.errors.append(np.sum(error**2))
        # Caluculate adjustment from hidden -> output.
        delta_output = self.sigmoid_prime(out) * error
        
        # Calculate error from input -> hidden.
        output_error = delta_output.dot(self.output_weight[1:].T)
        delta_hidden = output_error * self.sigmoid_prime(out)
        
        #Adjust hidden -> output weghts.

        self.output_weight[1:] += self.activated_hidden.T.dot(delta_output) * self.lr
        self.output_weight[0] = np.sum(delta_output)

        self.hidden_weight[1:] += X.T.dot(delta_hidden) * self.lr
        self.hidden_weight[0] = np.sum(delta_hidden)
    
    def predict(self, X):
        inputs = np.dot(X, self.hidden_weight[1:]) + self.hidden_weight[0]
        self.activated_hidden = self.sigmoid(inputs)
        output = np.dot(self.activated_hidden, self.output_weight[1:]) + self.output_weight[0]
        final = self.sigmoid(output)
        return final
        
    def plot_error(self):
        plt.figure(figsize=(8, 8))
        plt.title('Training Error')
        plt.plot(self.errors)


In [10]:
better = MLP()

In [11]:
better.fit(X, y)

Training error at 9999 epoch: 31.522235220614412


In [26]:
p_one = [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2]
p_one = scaler.transform([p_one])
better.predict(p_one)

array([[0.86302619]])

In [27]:
p_two = [56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 2, 0, 2]
p_two = scaler.transform([p_two])
better.predict(p_two)

array([[1.]])

In [28]:
p_three = [63, 0, 0, 108, 269, 0, 1, 169, 1, 1.8, 1, 2, 2]
p_three = scaler.transform([p_three])
better.predict(p_three)

array([[0.08120308]])

## 4. Keras MMP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [19]:
def model_creator(optimizer='adam'):
    model = Sequential()
    model.add(Dense(32, activation='relu', input_dim=inputs))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

In [20]:
model = KerasClassifier(build_fn=model_creator, verbose=1)


In [21]:
X, y = X, y

inputs = X.shape[1]
epochs = 20
batch_size = 42


In [22]:
params = {'batch_size': [10, 50, 100, 250, 500, 1000, 2500],
          'epochs': [20]}

grid = GridSearchCV(estimator=model, param_grid=params)
grid_result = grid.fit(X, y)
print(f'Best: {grid_result.best_score_} using {grid_result.best_params_}')
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f'Mean: {mean}, Stdev: {stdev} with : {param}')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20

In [23]:
params = {'optimizer': ['adam', 'adagrad', 'sgd'],
          'epochs': [20]}
grid1 = GridSearchCV(estimator=grid_result.best_estimator_, param_grid= params,
                   n_jobs=-1)
grid_result1 = grid1.fit(X, y, verbose=1)

print(f'Best: {grid_result1.best_score_} using {grid_result1.best_params_}')
means = grid_result1.cv_results_['mean_test_score']
stds = grid_result1.cv_results_['std_test_score']
params = grid_result1.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f'Mean: {mean}, Stdev: {stdev} with : {param}')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Best: 0.8184818442505185 using {'epochs': 20, 'optimizer': 'adagrad'}
Mean: 0.7887788727731988, Stdev: 0.03645332596892404 with : {'epochs': 20, 'optimizer': 'adam'}
Mean: 0.8184818442505185, Stdev: 0.02333685607392056 with : {'epochs': 20, 'optimizer': 'adagrad'}
Mean: 0.7491749206391891, Stdev: 0.05382674417391531 with : {'epochs': 20, 'optimizer': 'sgd'}


In [24]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
