<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
4. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A single 'function' that given a set of inputs returns some output
- **Input Layer:** Here each feature of the data is treated asif an output from some earlier layer, only the 'earlier layer' is the raw data.
- **Hidden Layer:** The values being passed to this layer have already been abstracted from the raw data and is no longer obviously interpretable by a human.
- **Output Layer:** This is the interpretable conclusion of the neural net, it's prediction.
- **Activation:** If the sum of the weighted outputs from the earlier outputs meet the neuron's activation threshhold then it will fire causing its own weighted value to be passed forward.
- **Backpropagation:**


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [1]:
import pandas as pd
import numpy as np
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [2]:
candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. You will not be able to achieve more than ~50% with the simple perceptron. Explain why you could not achieve a higher accuracy with the *simple perceptron* architecture, because it's possible to achieve ~95% accuracy on this dataset. Provide your answer in markdown (and *optional* data anlysis code) after your perceptron implementation. 

In [3]:
X = candy[['chocolate', 'gummy']].values
y = candy[['ate']].values

In [4]:
# Start your candy perceptron here
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx / (1 - sx)

np.random.seed(42)
weights = np.random.random((2,1))



weighted_sum = np.dot(X, weights)

# Output activated value (first iteration of predictions)
activated_output = sigmoid(weighted_sum)

error = y - activated_output

adjustments = error * sigmoid_derivative(activated_output)


In [5]:
# Train Perceptron
# Now, Update the weights n times, which should reduce the error
num_iterations = 1000
for iteration in range(num_iterations):
    
    # Weighted sum of inputs / weights
    weighted_sum = np.dot(X, weights)

    # Activate
    activated_output = sigmoid(weighted_sum)

    # Calculate the error
    error = y - activated_output

    # Make adjustments informed by the error
    adjustments = error * sigmoid_derivative(activated_output)

    # Update the weights
    weights += np.dot(X.T, adjustments)

accuracy = 1 - np.mean(np.abs(y - activated_output))
print("Acuracy after training: \n", accuracy)

  This is separate from the ipykernel package so we can avoid doing imports until
Acuracy after training: 
 0.3882


### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [7]:
class NeuralNetwork:
    def __init__(self):
        # Set up Architecture of Neural Network
        self.inputs = 2
        self.hiddenNodes = 3
        self.outputNodes = 1

        # Initial Weights
        # 2x3 Matrix Array for the First Layer
        self.weights1 = np.random.rand(self.inputs, self.hiddenNodes)
       
        # 3x1 Matrix Array for Hidden to Output
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        sx = self.sigmoid(s)
        return sx * (1 - sx)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        aka "predict"
        """
        
        # Weighted sum of inputs => hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weight sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
        
    def backward(self, X,y,o):
        """
        X, input matrix
        y, output matix
        o, error
        Backward propagate through the network
        """
        
        # Error in Output
        self.o_error = y - o
        
        # Apply Derivative of Sigmoid to error
        # How far off are we in relation to the Sigmoid f(x) of the output
        # ^- aka hidden => output
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        # z2 error
        self.z2_error = self.o_delta.dot(self.weights2.T)
        
        # How much of that "far off" can explained by the input => hidden
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        # Adjustment to first set of weights (input => hidden)
        self.weights1 += X.T.dot(self.z2_delta)
        # Adjustment to second set of weights (hidden => output)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X,y,o)

In [10]:
# Train my 'net
nn = NeuralNetwork()

# Number of Epochs / Iterations
for i in range(10000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
    nn.train(X,y)

+---------EPOCH 1---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.66585232]
 [0.69899744]
 [0.66585232]
 ...
 [0.66585232]
 [0.66585232]
 [0.69899744]]
Loss: 
 0.28281336482874414
+---------EPOCH 2---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.49987818]
 [0.49996918]
 [0.49987818]
 ...
 [0.49987818]
 [0.49987818]
 [0.49996918]]
Loss: 
 0.20118402677336614
+---------EPOCH 3---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.5]
 [0.5]
 [0.5]
 ...
 [0.5]
 [0.5]
 [0.5]]
Loss: 
 0.20115
+---------EPOCH 4---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.5]
 [0.5]
 [0.5]
 ...
 [0.5]
 [0.5]
 [0.5]]
Loss: 
 0.20115
+---

In [11]:
nn.o_error

array([[0.5],
       [0.5],
       [0.5],
       ...,
       [0.5],
       [0.5],
       [0.5]])

P.S. Don't try candy gummy bears. They're disgusting. 

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV or RandomSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
264,54,1,0,110,206,0,0,108,1,0.0,1,1,2,0
143,67,0,0,106,223,0,1,142,0,0.3,2,2,2,1
112,64,0,2,140,313,0,1,133,0,0.2,2,0,3,1
19,69,0,3,140,239,0,1,151,0,1.8,2,2,2,1
5,57,1,0,140,192,0,1,148,0,0.4,1,0,1,1


In [2]:
scaler = StandardScaler()

X = df.drop(columns="target").values
X = scaler.fit_transform(X)

y = df["target"].values

In [3]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

In [4]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Important Hyperparameters
inputs = X_train.shape[1]
epochs = 50
batch_size = 5


# Create Model
model = Sequential()
model.add(Dense(13, activation='relu', input_shape=(inputs,)))
model.add(Dense(13, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile Model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fit Model
model.fit(X_train, y_train, 
          validation_data=(X_test,y_test), 
          epochs=epochs, 
          batch_size=batch_size,
          # For ease of grading
          verbose=True
         )

Train on 242 samples, validate on 61 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x7fe89c3206d0>

In [7]:
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(13, activation='relu', input_shape=(inputs,)))
    model.add(Dense(13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.818367350101471 using {'batch_size': 10, 'epochs': 20}
Means: 0.818367350101471, Stdev: 0.02927123513828241 with: {'batch_size': 10, 'epochs': 20}
Means: 0.8059523940086365, Stdev: 0.032449815235266084 with: {'batch_size': 20, 'epochs': 20}
Means: 0.7686224460601807, Stdev: 0.03281188206591915 with: {'batch_size': 40, 'epochs': 20}
Means: 0.7688775539398194, Stdev: 0.036053416227158915 with: {'batch_size': 60, 'epochs': 20}
Means: 0.669132649898529, Stdev: 0.05870328380394499 with: {'batch_size': 80, 'epochs': 20}
Means: 0.7151360630989074, Stdev: 0.07725143296555516 with: {'batch_size': 100, 'epochs': 20}


In [9]:
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(13, activation='relu', input_shape=(inputs,)))
    model.add(Dense(13, activation='relu'))
    model.add(Dense(13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.7811224460601807 using {'batch_size': 10, 'epochs': 20}
Means: 0.7811224460601807, Stdev: 0.027004693827132498 with: {'batch_size': 10, 'epochs': 20}
Means: 0.7728741526603699, Stdev: 0.05801188858662839 with: {'batch_size': 20, 'epochs': 20}
Means: 0.7727891206741333, Stdev: 0.04327772201819599 with: {'batch_size': 40, 'epochs': 20}
Means: 0.7442176938056946, Stdev: 0.061892027801758294 with: {'batch_size': 60, 'epochs': 20}
Means: 0.7357142925262451, Stdev: 0.0623604973678458 with: {'batch_size': 80, 'epochs': 20}
Means: 0.7193877577781678, Stdev: 0.06579511394909926 with: {'batch_size': 100, 'epochs': 20}


Adding an additional hidden layer resulted in lower accuracy of 0.79

Reverting to single hidden layer and testing different activation functions. Keeping Best 'batch_size' of 20 and 

In [11]:
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(13, activation='relu', input_shape=(inputs,)))
    model.add(Dense(13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [5, 7, 9, 11],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.8431972861289978 using {'batch_size': 7, 'epochs': 20}
Means: 0.826955771446228, Stdev: 0.05756494465359904 with: {'batch_size': 5, 'epochs': 20}
Means: 0.8431972861289978, Stdev: 0.029773307080887948 with: {'batch_size': 7, 'epochs': 20}
Means: 0.8267857193946838, Stdev: 0.04683868818642726 with: {'batch_size': 9, 'epochs': 20}
Means: 0.7810374140739441, Stdev: 0.04041934800643247 with: {'batch_size': 11, 'epochs': 20}


Best batch size determined to be x: 5<x<9

In [13]:
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(13, activation='relu', input_shape=(inputs,)))
    model.add(Dense(13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [6,7,8],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.8185374140739441 using {'batch_size': 8, 'epochs': 20}
Means: 0.8142006874084473, Stdev: 0.033476852743194896 with: {'batch_size': 6, 'epochs': 20}
Means: 0.7854591846466065, Stdev: 0.04807963341270975 with: {'batch_size': 7, 'epochs': 20}
Means: 0.8185374140739441, Stdev: 0.053428442694877495 with: {'batch_size': 8, 'epochs': 20}


Accuracy lower than previous fit, but 8 consistantly wins out as best batch size.
Testing alternative activation function then concluding with Grid CV of epochs

In [16]:
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(13, activation='relu', input_shape=(inputs,)))
    model.add(Dense(13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
param_grid = {'batch_size': [8],
              'epochs': [20, 40, 80, 160]}
              

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")

Best: 0.818367338180542 using {'batch_size': 8, 'epochs': 80}
Means: 0.8141156435012817, Stdev: 0.03440611883195407 with: {'batch_size': 8, 'epochs': 20}
Means: 0.7852891206741333, Stdev: 0.019124152431700438 with: {'batch_size': 8, 'epochs': 40}
Means: 0.818367338180542, Stdev: 0.02927122904326694 with: {'batch_size': 8, 'epochs': 80}
Means: 0.7893707513809204, Stdev: 0.048782845522695385 with: {'batch_size': 8, 'epochs': 160}
