<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
4. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A function that receives an input and pass it to the next layer of nodes if a certain threshold is reached
- **Input Layer:** A layer that receives input from dataset to be passed to network
- **Hidden Layer:** The layer in between input and output. Hyperparameter, functions
- **Output Layer:** Output vector of values
- **Activation:** Transform output value into format that fits the context
- **Backpropagation:** The process by which weights in the Neural Network are adjusted. This is where Gradient Descent usually comes in. Weights are adjusted in the direction that minimizes some defined loss function.


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [0]:
import pandas as pd
candy = pd.read_csv('/content/chocolate_gummy_bears.csv')

In [0]:
import numpy as np

In [88]:
candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


In [89]:
candy.shape

(10000, 3)

### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.

In [0]:
# Start your candy perceptron here

X = candy[['chocolate', 'gummy']].values
y = candy['ate'].values

In [97]:
X.shape, y.shape

((10000, 2), (10000,))

In [0]:
#Manual split train, test 70%, 30%

X_train = X[0:7000]
y_train = y[0:7000]
X_test = X[7000:10000]
y_test = y[7000:10000]

In [0]:
#Straight up from 1st lecture note

class Perceptron(object):
  def __init__(self, rate = 0.01, niter = 10):
    self.rate = rate
    self.niter = niter

  def fit(self, X, y):
    """Fit training data
    X : Training vectors, X.shape : [#samples, #features]
    y : Target values, y.shape : [#samples]
    """

    # weights
    self.weight = np.zeros(1 + X.shape[1])

    # Number of misclassifications
    self.errors = []  # Number of misclassifications

    for i in range(self.niter):
      err = 0
      for xi, target in zip(X, y):
        delta_w = self.rate * (target - self.predict(xi))
        self.weight[1:] += delta_w * xi
        self.weight[0] += delta_w
        err += int(delta_w != 0.0)
      self.errors.append(err)
    return self

  def net_input(self, X):
    """Calculate net input"""
    return np.dot(X, self.weight[1:]) + self.weight[0]

  def predict(self, X):
    """Return class label after unit step"""
    return np.where(self.net_input(X) >= 0.0, 1, -1)

In [104]:
pn = Perceptron(0.1, 10)
pn.fit(X_train, y_train)

<__main__.Perceptron at 0x7f5806b69e10>

In [105]:
from sklearn.metrics import accuracy_score
pred = pn.predict(X_test)
accuracy_score(pred, y_test)

0.5036666666666667

### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [55]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


In [56]:
sc = StandardScaler()
dataset = candy.values
# split into input (X) and output (y) variables
X = dataset[:,:-1]
X = sc.fit_transform(X)
y = dataset[:,-1]
y = y.reshape(y.shape[0], 1)
print(X.shape)
print(y.shape)

(10000, 2)
(10000, 1)


In [0]:
class NeuralNetwork:
    def __init__(self):
        # Set up Architecture of Neural Network
        self.inputs = X.shape[1]
        self.hiddenNodes = 13
        self.outputNodes = 1
        
        # Initial Weights
        # 13x26 Matrix Array for the First Layer
        self.weights1 = 2 * np.random.randn(self.inputs, self.hiddenNodes) - 1
        # 26x1 Matrix Array for Hidden to Output
        self.weights2 = 2 * np.random.rand(self.hiddenNodes, self.outputNodes) - 1
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        aka "predict"
        """
        
        # Weighted sum of inputs => hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weight sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    def backward(self, X, y, o):
        """
        Backward propagate through the network
        """
        
        # Error in output
        self.o_error = y - o
        
        # Apply derivative of sigmoid to error
        # How far off are we in relation to the Sigmoid f(x) of the output
        # ^- hidden => output
        self.o_delta = self.o_error * self.sigmoidPrime(o) # apply derivative of sigmoid to error
        
        # z2 error
        self.z2_error = self.o_delta.dot(self.weights2.T) # how much the hidden layer weights were off
        # How much of that "far off" can be explained by the inputs => hidden layer
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        # Adjustment to first set of weights (input => hidden)
        self.weights1 += X.T.dot(self.z2_delta)
        
        # Adjustment to second set of weights (hidden => output)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        
    def train(self, X, y):
        # Output
        o = self.feed_forward(X)
        self.backward(X, y, o)

In [60]:
nn = NeuralNetwork()
# number of epochs / iterations
loss_lst = []
for i in range(2):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 50 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        loss = np.mean(np.square(y - nn.feed_forward(X)))
#         if loss < .1:
#             print('Input: \n', X)
#             print('First 5 Actual Output: \n', y[:5])
#             print('First 5 Predicted Output: \n', str(nn.feed_forward(X)[:5]))
#             print("Total Loss: \n", str(loss))
#             break
        print("Accuracy: \n", str(loss))
    loss_lst.append(loss)
    nn.train(X,y)

+---------EPOCH 1---------+
Accuracy: 
 0.2541104685957717
+---------EPOCH 2---------+
Accuracy: 
 0.5


  from ipykernel import kernelapp as app


P.S. Don't try candy gummy bears. They're disgusting. 

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV or RandomSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [28]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
246,56,0,0,134,409,0,0,150,1,1.9,1,2,3,0
160,56,1,1,120,240,0,1,169,0,0.0,0,0,2,1
274,47,1,0,110,275,0,0,118,1,1.0,1,1,2,0
248,54,1,1,192,283,0,0,195,0,0.0,2,1,3,0
69,62,0,0,124,209,0,1,163,0,0.0,2,0,2,1


In [0]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

from tensorflow import keras
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

In [30]:
sc = StandardScaler()
df = sc.fit_transform(df)
# split into input (X) and output (y) variables
X = df[:,:-1]
y = df[:,-1]

print(X.shape)
print(y.shape)

(303, 13)
(303,)


In [0]:
# Create model function for Keras Classifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

## Baseline Model <a id="Q3=2"></a>

In [39]:
# Baseline Model

num_neurons = 13
epochs = 10
batch_size = 13
init_mode = 'normal'
learning_rate = .5

# create model
model = KerasClassifier(build_fn=create_model, epochs=epochs, batch_size=batch_size, verbose=1)
model.fit(X, y, validation_split=.1)

Train on 272 samples, validate on 31 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f580c1632b0>

## Hyperparameter Tune (3 models: batch size tune, epochs tune, num eurons and learning rate tune) <a id="Q3=2"></a>

In [40]:
# GridSearchCV to hyperparameter
# Tune Batch size

# Hyperparameters
num_neurons = 13
init_mode = 'normal'
learning_rate = .5
# Grid Search parameters

param_grid = {'batch_size': [10, 30, 50, 100],
              'epochs': [10, 20]}
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}\n")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.7953795393308004 using {'batch_size': 10, 'epochs': 20}

Means: 0.7788778940836588, Stdev: 0.0673137688972482 with: {'batch_size': 10, 'epochs': 10}
Means: 0.7953795393308004, Stdev: 0.06484117772000389 with: {'batch_size': 10, 'epochs': 20}
Means: 0.6864686409632365, Stdev: 0.07063015293313091 with: {'batch_size': 30, 'epochs': 10}
Means: 0.7821782231330872, Stdev: 0.04501048723594382 with: {'batch_size': 30, 'epochs': 20}
Means: 0.5973597466945648, Stdev: 0.11724303580608335 with: {'batch_size': 50, 'epochs': 10}
Means: 0.6732673048973083, Stdev: 0.021388576788767738 with: {'batch_size': 50, 'epochs': 20}
Means: 0.6468646923700968, Stdev: 0.16335800976563744 with: {'batch_size': 100, 'epochs': 10}
Means: 0.627062718073527, Stdev: 0.07335350495647412 with: {'batch_size': 100, 'epochs': 20}


In [41]:
# GridSearchCV to hyperparameter
# Tune epochs!!!

# Hyperparameters
num_neurons = 13
init_mode = 'normal'
learning_rate = .5
# Grid Search parameters
param_grid = {'batch_size': [10, 30, 50, 100],
              'epochs': [5, 10, 20, 50, 80]}
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}\n")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.8415841658910116 using {'batch_size': 10, 'epochs': 50}

Means: 0.735973596572876, Stdev: 0.04148451343229978 with: {'batch_size': 10, 'epochs': 5}
Means: 0.7524752616882324, Stdev: 0.03523787151819811 with: {'batch_size': 10, 'epochs': 10}
Means: 0.8052805264790853, Stdev: 0.05382673785375711 with: {'batch_size': 10, 'epochs': 20}
Means: 0.8415841658910116, Stdev: 0.04501051345849255 with: {'batch_size': 10, 'epochs': 50}
Means: 0.8118811845779419, Stdev: 0.0646729772350719 with: {'batch_size': 10, 'epochs': 80}
Means: 0.6138613820075989, Stdev: 0.05829543388949175 with: {'batch_size': 30, 'epochs': 5}
Means: 0.6633663177490234, Stdev: 0.03523787151819811 with: {'batch_size': 30, 'epochs': 10}
Means: 0.801980197429657, Stdev: 0.05048532667293615 with: {'batch_size': 30, 'epochs': 20}
Means: 0.7953795393308004, Stdev: 0.040689189213402255 with: {'batch_size': 30, 'epochs': 50}
Means: 0.8217821717262268, Stdev: 0.05048532667293615 with: {'batch_size': 30, 'epochs': 80}
Means: 0.

In [42]:
# Hyperparameters
# Tune num_neurons and learning rate

num_neurons = 21
init_mode = 'normal'
learning_rate = .3
# Grid Search parameters

param_grid = {'batch_size': [10, 30, 50, 100],
              'epochs': [10, 20]}
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}\n")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.8283828298250834 using {'batch_size': 10, 'epochs': 20}

Means: 0.7887788812319437, Stdev: 0.033656884448624094 with: {'batch_size': 10, 'epochs': 10}
Means: 0.8283828298250834, Stdev: 0.048728774455357206 with: {'batch_size': 10, 'epochs': 20}
Means: 0.660066028436025, Stdev: 0.08177895876323736 with: {'batch_size': 30, 'epochs': 10}
Means: 0.7986798683802286, Stdev: 0.024697401133156067 with: {'batch_size': 30, 'epochs': 20}
Means: 0.5973597367604574, Stdev: 0.025986816922028384 with: {'batch_size': 50, 'epochs': 10}
Means: 0.7029703060785929, Stdev: 0.029147743940335084 with: {'batch_size': 50, 'epochs': 20}
Means: 0.6105610529581705, Stdev: 0.03267159071396789 with: {'batch_size': 100, 'epochs': 10}
Means: 0.6765676538149515, Stdev: 0.06174352407290809 with: {'batch_size': 100, 'epochs': 20}
