<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
3. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

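- **Neuron:** A unit that takes a weighted sum of its inputs, adds a bias, passes the result through an activation function, and sends that output on to the next layer.
- **Input Layer:** The layer that receives the feature values from our dataset, one node per feature. It passes them on without applying weights or an activation.
- **Hidden Layer:** Any layer between the input and output layers. It is "hidden" because its values are neither inputs we supply nor outputs we read, not because it cannot be inspected. Hidden layers with non-linear activations let the network learn more complex, non-linear relationships.
- **Output Layer:** The final layer, which produces the network's prediction (for example, a single sigmoid node giving a probability for a binary target).
- **Activation:** The function applied to a neuron's weighted sum. It decides whether, and how strongly, the neuron "fires", i.e. how much signal is passed on to the next layer.
- **Backpropagation:** The algorithm that computes how much each weight contributed to the error by applying the chain rule backward, from the output layer toward the input layer. Those gradients are then used (for example, by gradient descent) to update the weights so the network becomes more accurate.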
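To make the neuron and backpropagation definitions concrete, here is a minimal NumPy sketch of a single sigmoid neuron: a forward pass (weighted sum plus bias, then activation) followed by one gradient-descent update computed with the chain rule. The input values, weights, bias, target, and learning rate are arbitrary illustrative numbers, and squared error is assumed as the loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example values for one neuron with two inputs
x = np.array([1.0, 0.0])      # input features
w = np.array([2.0, -1.0])     # weights
b = -1.0                      # bias
target = 0.0                  # desired output
lr = 0.5                      # learning rate

# Forward pass: weighted sum plus bias, then the activation function
z = np.dot(w, x) + b          # 2*1 + (-1)*0 - 1 = 1.0
out = sigmoid(z)              # about 0.731

# Backward pass for squared error L = 0.5 * (out - target)**2:
# dL/dz = (out - target) * sigmoid'(z), where sigmoid'(z) = out * (1 - out)
dz = (out - target) * out * (1 - out)
grad_w = dz * x               # dL/dw via the chain rule
grad_b = dz                   # dL/db

# Gradient-descent step: move the parameters against the gradient
w_new = w - lr * grad_w
b_new = b - lr * grad_b
out_new = sigmoid(np.dot(w_new, x) + b_new)
print(out, out_new)           # the output moves toward the target of 0
```

Only the weight attached to the non-zero input changes, because its gradient is scaled by that input; a full network repeats this chain-rule step layer by layer.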


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [1]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [2]:
candy.head()

|   | chocolate | gummy | ate |
|---|-----------|-------|-----|
| 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 2 | 0 | 1 | 1 |
| 3 | 0 | 0 | 0 |
| 4 | 1 | 1 | 0 |


In [3]:
candy.shape

(10000, 3)
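Before modeling, it can help to look at how often `ate` is 1 for each of the four (chocolate, gummy) combinations. A minimal pandas sketch, run here on a small frame copied from the five rows shown by `candy.head()`; on the real data the same `groupby` call would be applied to `candy` itself.

```python
import pandas as pd

# The five rows displayed by candy.head() above
sample = pd.DataFrame({
    "chocolate": [0, 1, 0, 0, 1],
    "gummy":     [1, 0, 1, 0, 1],
    "ate":       [1, 1, 1, 0, 0],
})

# Fraction of candies eaten for each (chocolate, gummy) combination
eat_rate = sample.groupby(["chocolate", "gummy"])["ate"].mean()
print(eat_rate)
```

In these rows `ate` is 1 exactly when one, but not both, of the two features is 1, an exclusive-or pattern.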

### Perceptron

To make predictions on the `candy` dataframe, build and train a Perceptron using numpy. Your target column is `ate` and your features are `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.

In [32]:
import numpy as np
# Start your candy perceptron here
np.random.seed(66) # reference to one of my favorite sci-fi movies

X = candy[['chocolate', 'gummy']].values  # features only; candy has no 'ones' column to drop
y = candy['ate'].values

weights = 2 * np.random.random((2,1)) - 1

In [33]:
X.shape, X.T.shape, y.shape

((10000, 2), (2, 10000), (10000,))

In [34]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

In [35]:
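for iteration in range(5):
    # Weighted sum of the inputs (no bias term is included)
    weighted_sum = np.dot(X, weights)
    
    activated_output = sigmoid(weighted_sum)
    
    # Caution: y has shape (10000,) and activated_output has shape (10000, 1),
    # so this subtraction broadcasts into a (10000, 10000) matrix; this is why
    # the printed output below is a square matrix rather than a single column.
    error = y - activated_output
    
    # Caution: sigmoid_derivative expects the pre-activation weighted_sum;
    # passing activated_output applies the sigmoid twice.
    adjustments = error * sigmoid_derivative(activated_output)
    
    # No learning rate: the update is the gradient summed over all 10,000 rows.
    weights = weights + np.dot(X.T, adjustments)
    
print("Output after training")
print(activated_output)
print("The True Output")
print(y)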

Output after training
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]]
The True Output
[1 1 1 ... 1 1 1]


In [36]:
activated_output[0][0:10]

array([1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 8.75291772e-139,
       8.75291772e-139, 1.00000000e+000, 8.75291772e-139, 8.75291772e-139,
       8.75291772e-139, 8.75291772e-139])
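For comparison, here is a minimal sketch of a perceptron training loop that avoids the shape and derivative problems in the cell above: the target is reshaped to a column vector so `y - output` stays one column, the sigmoid derivative is taken from the activated output, a bias column is appended, and the gradient is averaged and scaled by a learning rate. The function name `train_perceptron`, the learning rate, and the epoch count are illustrative assumptions, not values from the original notebook. It is demonstrated on the four XOR-style rows suggested by `candy.head()`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_perceptron(X, y, lr=0.5, epochs=1000, seed=66):
    """Train a single sigmoid unit with full-batch gradient descent.

    X: (n_samples, n_features) array; y: (n_samples,) array of 0/1 labels.
    Returns the learned weights (bias weight last) and the training accuracy.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    y = y.reshape(-1, 1)                           # column vector, same shape as the output
    weights = rng.uniform(-1, 1, size=(Xb.shape[1], 1))

    for _ in range(epochs):
        output = sigmoid(Xb @ weights)
        error = y - output
        # Sigmoid derivative written in terms of the sigmoid's own output
        adjustments = error * output * (1 - output)
        # Average over the rows and scale by the learning rate
        weights += lr * Xb.T @ adjustments / len(Xb)

    predictions = (sigmoid(Xb @ weights) >= 0.5).astype(int)
    accuracy = (predictions == y).mean()
    return weights, accuracy

# The four XOR-style combinations seen in candy.head()
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_demo = np.array([0, 1, 1, 0])
weights, accuracy = train_perceptron(X_demo, y_demo)
print(accuracy)  # a single linear unit cannot exceed 0.75 on these four points
```

On the full dataset the same function could be called as `train_perceptron(candy[['chocolate', 'gummy']].values, candy['ate'].values)` to report an accuracy.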

### Written section (You know what I mean)
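A simple perceptron computes one weighted sum and thresholds it, so its decision boundary is a single straight line in the (chocolate, gummy) plane. Judging by the rows shown by `candy.head()`, `ate` behaves like an exclusive-or: kids ate candy that was chocolate or gummy, but not both and not neither. No straight line puts (0, 1) and (1, 0) on one side and (0, 0) and (1, 1) on the other, so the best a perceptron can do is classify three of the four combinations correctly. Getting close to the ~95% mentioned above requires a non-linear model, such as a multilayer perceptron with a hidden layer.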
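The linear-separability argument can be checked numerically. This sketch scans a grid of weights and biases for a single threshold unit and records the largest number of the four XOR points it classifies correctly. The grid range and step are arbitrary choices, and a grid scan is an illustration rather than a proof.

```python
import itertools
import numpy as np

# The four (chocolate, gummy) combinations with XOR-style labels
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 1, 1, 0])

grid = np.linspace(-3, 3, 25)  # candidate values for w1, w2 and the bias (step 0.25)
best = 0
for w1, w2, b in itertools.product(grid, repeat=3):
    # Threshold unit: predict 1 when the weighted sum plus bias is positive
    predictions = (points @ np.array([w1, w2]) + b > 0).astype(int)
    best = max(best, int((predictions == labels).sum()))

print(best)  # 3: no setting in the grid gets all four right
```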

### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [37]:
y = candy[['ate']].values

In [38]:
class NeuralNetwork:
    def __init__(self):
        self.inputs = 2
        self.hiddenNodes = 3
        self.outputNodes = 1
        
        self.weights1 = np.random.random((self.inputs, self.hiddenNodes))
        self.weights2 = np.random.random((self.hiddenNodes, self.outputNodes))
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self, X):
        self.hidden_sum = np.dot(X, self.weights1)
        
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    
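    def backward(self, X, y, o):
        # Output-layer error and delta (sigmoidPrime expects an already-activated value)
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        # Propagate the output delta back to the hidden layer
        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        # Update both weight matrices with the gradient summed over every row
        # (there is no learning rate and no bias term)
        self.weights1 = self.weights1 + X.T.dot(self.z2_delta)
        self.weights2 = self.weights2 + self.activated_hidden.T.dot(self.o_delta)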
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)

In [42]:
nn = NeuralNetwork()

# Number of Epochs / Iterations
for i in range(5):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('-----' * 2 + f'EPOCH {i+1}' + '-----' * 2)
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
    nn.train(X, y)

----------EPOCH 1----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.58792383]
 [0.57860317]
 [0.58792383]
 ...
 [0.58792383]
 [0.58792383]
 [0.57860317]]
Loss: 
 0.2568328621097321
----------EPOCH 2----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.96291051e-40]
 [2.83754380e-40]
 [1.96291051e-40]
 ...
 [1.96291051e-40]
 [1.96291051e-40]
 [2.83754380e-40]]
Loss: 
 0.5
----------EPOCH 3----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.96291051e-40]
 [2.83754380e-40]
 [1.96291051e-40]
 ...
 [1.96291051e-40]
 [1.96291051e-40]
 [2.83754380e-40]]
Loss: 
 0.5
----------EPOCH 4----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 


P.S. Don't try chocolate gummy bears. They're disgusting.

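The loss rises from about 0.26 with the initial random weights to 0.5 after a single update and then stays there. The most likely cause is that `backward` adds the gradient, summed over all 10,000 rows, directly to the weights with no learning rate. That first step is so large that the output sigmoid saturates (predictions around 1e-40), and in saturation `sigmoidPrime` is practically zero, so later updates barely move the weights. The network also has no bias terms and trains for only five epochs. Scaling the updates by a small learning rate (or averaging them over the batch), adding biases, and training for longer should let the hidden layer learn the non-linear, XOR-like pattern in this data, which is why a working MLP can outperform a single perceptron here.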
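A minimal sketch of a one-hidden-layer network with bias terms on both layers, gradients averaged over the batch and scaled by a learning rate, and many more epochs. The class name `BiasedMLP`, the hidden-layer size, learning rate, and epoch count are illustrative assumptions; the demo uses the four XOR-style rows, and results on the full candy data will depend on those settings.

```python
import numpy as np

class BiasedMLP:
    """One sigmoid hidden layer, trained with full-batch gradient descent on squared error."""

    def __init__(self, n_inputs=2, n_hidden=4, lr=1.0, seed=66):
        rng = np.random.default_rng(seed)
        self.lr = lr
        self.W1 = rng.normal(0, 1, size=(n_inputs, n_hidden))
        self.b1 = np.zeros((1, n_hidden))
        self.W2 = rng.normal(0, 1, size=(n_hidden, 1))
        self.b2 = np.zeros((1, 1))

    @staticmethod
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def feed_forward(self, X):
        self.hidden = self.sigmoid(X @ self.W1 + self.b1)
        self.output = self.sigmoid(self.hidden @ self.W2 + self.b2)
        return self.output

    def train(self, X, y):
        n = len(X)
        o = self.feed_forward(X)
        # Output-layer delta: error times the sigmoid derivative (from the output)
        o_delta = (y - o) * o * (1 - o)
        # Hidden-layer delta: push the output delta back through W2 (before updating it)
        h_delta = (o_delta @ self.W2.T) * self.hidden * (1 - self.hidden)
        # Average over the batch and scale by the learning rate
        self.W2 += self.lr * self.hidden.T @ o_delta / n
        self.b2 += self.lr * o_delta.sum(axis=0, keepdims=True) / n
        self.W1 += self.lr * X.T @ h_delta / n
        self.b1 += self.lr * h_delta.sum(axis=0, keepdims=True) / n

X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_demo = np.array([[0], [1], [1], [0]], dtype=float)

nn = BiasedMLP()
losses = []
for epoch in range(5000):
    losses.append(float(np.mean((y_demo - nn.feed_forward(X_demo)) ** 2)))
    nn.train(X_demo, y_demo)

predictions = (nn.feed_forward(X_demo) >= 0.5).astype(int)
print(losses[0], losses[-1])
print(predictions.ravel())
```

Whether a small network like this reaches a perfect fit on XOR depends on the random initialization, so it is worth trying a few seeds or a larger hidden layer before concluding that the architecture is at fault.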

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV or RandomizedSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.
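For the loss-function requirement: a sigmoid output paired with binary cross-entropy is the usual choice for a 0/1 target, and it is what `loss='binary_crossentropy'` in the cells below uses. This NumPy sketch computes the same quantity by hand; the clipping epsilon is an assumption added only to avoid `log(0)`.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

# A confident correct prediction costs little; a confident wrong one costs a lot
print(binary_cross_entropy([1, 0], [0.9, 0.2]))   # about 0.164
print(binary_cross_entropy([1, 0], [0.1, 0.8]))
```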

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


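|     | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|-----|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 241 | 59 | 0 | 0 | 174 | 249 | 0 | 1 | 143 | 1 | 0.0 | 1 | 0 | 2 | 0 |
| 234 | 70 | 1 | 0 | 130 | 322 | 0 | 0 | 109 | 0 | 2.4 | 1 | 3 | 2 | 0 |
| 132 | 42 | 1 | 1 | 120 | 295 | 0 | 1 | 162 | 0 | 0.0 | 2 | 0 | 2 | 1 |
| 93 | 54 | 0 | 1 | 132 | 288 | 1 | 0 | 159 | 1 | 0.0 | 2 | 1 | 2 | 1 |
| 18 | 43 | 1 | 0 | 150 | 247 | 0 | 1 | 171 | 0 | 1.5 | 2 | 0 | 2 | 1 |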


In [2]:
# Check the class balance of the target
df.target.value_counts()

1    165
0    138
Name: target, dtype: int64

In [3]:
from sklearn.model_selection import train_test_split
X = df.drop(columns='target').values
y = df[['target']].values

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=66)

scaler = StandardScaler()
X_train = scaler.fit_transform(x_train)
X_test = scaler.transform(x_test)
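`StandardScaler` rescales each column to zero mean and unit variance using statistics from the data it was fit on. Fitting on `x_train` only and reusing those statistics for `x_test`, as the cell above does, keeps test-set information out of training. A NumPy sketch of the same arithmetic, on a small made-up array standing in for the real features:

```python
import numpy as np

# Small made-up feature matrices standing in for x_train / x_test
train = np.array([[50.0, 120.0], [60.0, 140.0], [70.0, 160.0]])
test = np.array([[65.0, 150.0]])

# "fit": per-column mean and population standard deviation of the training data
mean = train.mean(axis=0)
std = train.std(axis=0)  # ddof=0, matching StandardScaler

# "transform": apply the training statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```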

In [4]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD
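# Note: tensorflow.keras.wrappers.scikit_learn has been removed in recent TensorFlow
# releases; there, the separate scikeras package provides a replacement KerasClassifier.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier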
from sklearn.model_selection import GridSearchCV

In [6]:
def create_model(learning_rate = 0.01):
    model = Sequential()
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    optimizer = SGD(learning_rate=learning_rate, momentum=0.0)
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [10, 50, 100, 200],
            'epochs': [10],
            'learning_rate': [.01],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.6847290605159816 using {'batch_size': 10, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.6847290605159816, Stdev: 0.011766094209586466 with: {'batch_size': 10, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.600985216096117, Stdev: 0.07343975869018655 with: {'batch_size': 50, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.6699507524227274, Stdev: 0.11313466257941322 with: {'batch_size': 100, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.6699507368609235, Stdev: 0.02066095132627891 with: {'batch_size': 200, 'epochs': 10, 'learning_rate': 0.01}
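`GridSearchCV` fits every combination in `param_grid` (the Cartesian product of the listed values) once per cross-validation fold. This standard-library sketch mirrors that expansion for the grid above; the 5 folds assume scikit-learn's default `cv=5` (the default since version 0.22), because the cells do not set `cv` explicitly.

```python
import itertools

param_grid = {'batch_size': [10, 50, 100, 200],
              'epochs': [10],
              'learning_rate': [.01]}

# Cartesian product of the value lists, one dict per candidate setting
keys = sorted(param_grid)
combos = [dict(zip(keys, values))
          for values in itertools.product(*(param_grid[k] for k in keys))]

n_folds = 5  # assumed scikit-learn default
print(len(combos), "combinations,", len(combos) * n_folds, "model fits")
for combo in combos:
    print(combo)
```

Adding a third hyperparameter with k values multiplies the number of fits by k, which is one reason to vary only one or two parameters per experiment on a slow-to-train model.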


In [10]:
def create_model(learning_rate = 0.01):
    model = Sequential()
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    optimizer = SGD(learning_rate=learning_rate, momentum=0.0)
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [10],
            'epochs': [10, 15, 20, 50],
            'learning_rate': [.01],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.7684729096337493 using {'batch_size': 10, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7389162467618294, Stdev: 0.01882143432651619 with: {'batch_size': 10, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.7487684640978357, Stdev: 0.0017546040598203712 with: {'batch_size': 10, 'epochs': 15, 'learning_rate': 0.01}
Means: 0.7586206967020269, Stdev: 0.055506750519219474 with: {'batch_size': 10, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7684729096337493, Stdev: 0.024020643691044476 with: {'batch_size': 10, 'epochs': 50, 'learning_rate': 0.01}


In [11]:
def create_model(learning_rate = 0.01):
    model = Sequential()
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    optimizer = SGD(learning_rate=learning_rate, momentum=0.0)
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [10],
            'epochs': [50],
            'learning_rate': [.1, .01, 1],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7980295393267288 using {'batch_size': 10, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7142857101750492, Stdev: 0.006201682620291183 with: {'batch_size': 10, 'epochs': 50, 'learning_rate': 0.1}
Means: 0.7980295393267288, Stdev: 0.04846886891686785 with: {'batch_size': 10, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7290640317747745, Stdev: 0.02693259615722777 with: {'batch_size': 10, 'epochs': 50, 'learning_rate': 1}


In [13]:
def create_model(learning_rate = 0.01):
    model = Sequential()
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    optimizer = Adam(learning_rate=learning_rate)
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [10, 20, 50, 100],
            'epochs': [50],
            'learning_rate': [.01],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.8029556855779564 using {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7142857101750492, Stdev: 0.006201682620291183 with: {'batch_size': 10, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7487684861192563, Stdev: 0.030790876909517217 with: {'batch_size': 20, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7487684773106881, Stdev: 0.034734607200330204 with: {'batch_size': 50, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.8029556855779564, Stdev: 0.03354365054101564 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.01}


In [14]:
def create_model(learning_rate = 0.01):
    model = Sequential()
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    optimizer = Adam(learning_rate=learning_rate)
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [100],
            'epochs': [10, 20, 50, 100, 1000],
            'learning_rate': [.01],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.7980295393267288 using {'batch_size': 100, 'epochs': 100, 'learning_rate': 0.01}
Means: 0.7832512200759549, Stdev: 0.03405604441427941 with: {'batch_size': 100, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.7832512312334746, Stdev: 0.02903465614056324 with: {'batch_size': 100, 'epochs': 20, 'learning_rate': 0.01}
Means: 0.7931034573780492, Stdev: 0.019506902146222832 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7980295393267288, Stdev: 0.005573408621184634 with: {'batch_size': 100, 'epochs': 100, 'learning_rate': 0.01}
Means: 0.7438423510255485, Stdev: 0.018135413342920563 with: {'batch_size': 100, 'epochs': 1000, 'learning_rate': 0.01}


In [16]:
def create_model(learning_rate = 0.01):
    model = Sequential()
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    optimizer = Adam(learning_rate=learning_rate)
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [100],
            'epochs': [50],
            'learning_rate': [1, .1, .01],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7586206922977429 using {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.01}
Means: 0.7142856990175294, Stdev: 0.008979361871070764 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 1}
Means: 0.6945812666944682, Stdev: 0.02455756050559572 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.1}
Means: 0.7586206922977429, Stdev: 0.01228214433497244 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.01}
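Because each experiment above repeats the same reporting loop, a small helper can rank the results instead. This sketch takes a `cv_results_`-style dictionary (only the `params`, `mean_test_score`, and `std_test_score` keys used above) and sorts the combinations by mean score; the helper name `rank_cv_results` is hypothetical. The demo dictionary holds the scores printed by the last grid search.

```python
def rank_cv_results(cv_results):
    """Return (mean, std, params) tuples sorted from best to worst mean score."""
    rows = zip(cv_results['mean_test_score'],
               cv_results['std_test_score'],
               cv_results['params'])
    return sorted(rows, key=lambda row: row[0], reverse=True)

# Scores printed by the last GridSearchCV run above
last_run = {
    'params': [{'batch_size': 100, 'epochs': 50, 'learning_rate': 1},
               {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.1},
               {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.01}],
    'mean_test_score': [0.7142856990175294, 0.6945812666944682, 0.7586206922977429],
    'std_test_score': [0.008979361871070764, 0.02455756050559572, 0.01228214433497244],
}

for mean, std, params in rank_cv_results(last_run):
    print(f"{mean:.3f} +/- {std:.3f} with {params}")
```

With a fitted search, `rank_cv_results(grid_result.cv_results_)` works the same way. Keep run-to-run noise in mind when comparing: the SGD setting `{'batch_size': 10, 'epochs': 50, 'learning_rate': 0.01}` scored about 0.768 in one experiment and about 0.798 in another, so gaps of a few points on this 303-row dataset may not be meaningful.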
