<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
3. [Keras MLP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** An individual node that takes a weighted sum of its inputs plus a bias and passes it through an activation function to produce an output.
- **Input Layer:** The first layer of nodes, which receives the model's input features.
- **Hidden Layer:** Any layer between the input and output layers. Its learned intermediate representations are not directly observed, which is why it is often described as a black box.
- **Output Layer:** The last layer, which produces the final output or predicted values.
- **Activation:** A function (commonly ReLU or sigmoid) applied to a neuron's weighted sum plus bias to produce its output.
- **Backpropagation:** Propagates the prediction error backward from the output layer through the earlier layers, computing the gradients used to update the weights and biases so the network learns.
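
As a minimal sketch of the neuron and activation definitions above, a single neuron can be written in NumPy as a weighted sum plus a bias, passed through a sigmoid. The input, weight, and bias values here are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return sigmoid(np.dot(x, weights) + bias)

# Hypothetical values, for illustration only
x = np.array([1.0, 0.0])
weights = np.array([0.5, -0.5])
bias = 0.0

output = neuron(x, weights, bias)  # equals sigmoid(0.5)
print(output)
```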


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [1]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [3]:
print(candy.shape)
candy.head()

(10000, 3)


|   | chocolate | gummy | ate |
|---|-----------|-------|-----|
| 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 2 | 0 | 1 | 1 |
| 3 | 0 | 0 | 0 |
| 4 | 1 | 1 | 0 |


### Perceptron

To make predictions on the `candy` dataframe, build and train a Perceptron using NumPy. Your target column is `ate` and your features are `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.

In [58]:
# Start your candy perceptron here

X = candy[['chocolate', 'gummy']].values
y = candy['ate'].values

In [115]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

class Perceptron(object):
    
    def __init__(self, no_of_x, niter=1000, learning_rate=0.01):
        self.niter = niter
        self.learning_rate = learning_rate
        self.weights = np.zeros(no_of_x + 1)
        self.errors_ = []
    
    def transform_df(self, df, target):
        X = df.drop(columns=target).to_numpy()
        y = df[target].to_numpy()
        X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
        y_train = y_train.reshape(-1, 1)
        y_test = y_test.reshape(-1, 1)
        return X_train, X_test, y_train, y_test
    
    def net_input(self, X_test):
        """Calculate net input"""
        return np.dot(X_test, self.weights[1:]) + self.weights[0]

    def predict(self, X_test):
        """Return class label (0 or 1) after unit step"""
        # The `ate` labels are 0/1, so the negative class must be 0, not -1
        return np.where(self.net_input(X_test) >= 0.0, 1, 0)
    
    def Errors(self):
        return self.errors_

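    def train(self, X_train, y_train):
        # Flatten targets so each `target` is a scalar, not a length-1 array
        targets = np.ravel(y_train)
        for _ in range(self.niter):
            for row, target in zip(X_train, targets):
                prediction = self.predict(row)
                # Perceptron rule: delta_w is zero when the prediction is correct
                delta_w = self.learning_rate * (target - prediction)
                self.weights[1:] += delta_w * row
                self.weights[0] += delta_w
                
    def rmse(self, X_test, y_test):
        # Flatten y_test so the difference does not broadcast to shape (n, n)
        return np.sqrt(np.mean((self.predict(X_test) - np.ravel(y_test)) ** 2))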
    
    def acc(self, y_true, y_pred):
        return accuracy_score(y_true, y_pred)

    
pn = Perceptron(2)
X_train, X_test, y_train, y_test = pn.transform_df(candy, 'ate')
pn.train(X_train, y_train)
y_pred = pn.predict(X_test)
pn.acc(y_test, y_pred)

0.493

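#### With a simple perceptron, there are no hidden layers, so the model can only learn a single linear decision boundary. The sample rows suggest an XOR-like pattern (candy that is chocolate or gummy, but not both, gets eaten), which no single line can separate, so the accuracy stays low.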
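
The sample rows shown earlier suggest an XOR-like pattern: candy that is chocolate or gummy, but not both, tends to get eaten. As a rough illustration, assuming that noise-free pattern, the sketch below brute-forces a coarse grid of weights and biases for a single linear threshold unit and records the best accuracy it finds on the four possible inputs. The grid range and the helper name `best_linear_accuracy` are arbitrary choices for this example, not part of the assignment.

```python
import itertools
import numpy as np

# The four possible (chocolate, gummy) inputs and an idealized XOR target
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

def best_linear_accuracy(X, y, grid):
    """Try every (w1, w2, b) on the grid and return the best accuracy found."""
    best = 0.0
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (X @ np.array([w1, w2]) + b >= 0).astype(int)
        best = max(best, float(np.mean(pred == y)))
    return best

grid = np.linspace(-2.0, 2.0, 21)
best = best_linear_accuracy(X_xor, y_xor, grid)
print(best)  # no linear threshold unit gets all four XOR points right
```

Because XOR is not linearly separable, the search never reaches 100% on these four points, however fine the grid.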

### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 
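
Backpropagation through a sigmoid layer relies on the identity that the derivative of the sigmoid can be written in terms of its own output, `a * (1 - a)`. The sketch below checks that identity against a central finite difference. The function names are illustrative and not taken from the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime_from_activation(a):
    """Derivative of the sigmoid, expressed in terms of the activation a = sigmoid(z)."""
    return a * (1.0 - a)

z = np.linspace(-4.0, 4.0, 9)
a = sigmoid(z)

# Central finite difference as an independent estimate of the derivative
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid_prime_from_activation(a)

max_gap = float(np.max(np.abs(numeric - analytic)))
print(max_gap)  # should be very close to zero
```

Note that the derivative helper expects the already-activated value, not the pre-activation sum, which is why a backward pass typically passes the layer outputs into it.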

In [111]:
class NeuralNetwork:
    def __init__(self, input_size=2, hidden_size=8, output_size=1, lr=0.01):
        # Set up Architecture of Neural Network
        self.lr = lr

        # Initial Weights
        self.weights1 = np.random.rand(input_size, hidden_size) * 0.1
       
        self.weights2 = np.random.rand(hidden_size, output_size) * 0.1
    
        self.Bh = np.zeros(hidden_size)
        
        self.Bo = np.zeros(output_size)
        
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def relu(self, s):
        return np.maximum(0, s)
    
    def leaky_relu(self, s):
        return np.maximum(0.1 * s, s)
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        aka "predict"
        """
        
        # Weighted sum of inputs => hidden layer
        self.hidden_sum = np.dot(X, self.weights1) + self.Bh

        
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weight sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2) + self.Bo
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
        
    def backward(self, X,y,o):
        """
        Backward propagate through the network
        """
        
        # Error in Output
        self.o_error = y - o
        
        # Apply Derivative of Sigmoid to error
        # How far off are we in relation to the Sigmoid f(x) of the output
        # ^- aka hidden => output
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        # z2 error
        self.z2_error = self.o_delta.dot(self.weights2.T)
        # How much of that "far off" can be explained by the input => hidden
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        # Adjustment to first set of weights (input => hidden)
        self.weights1 += X.T.dot(self.z2_delta) * self.lr
        # Adjustment to second set of weights (hidden => output)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta) * self.lr
        # Adjustments to hidden bias (sum over samples, one value per hidden unit)
        self.Bh += np.sum(self.z2_delta, axis=0) * self.lr
        # Adjustments to output bias
        self.Bo += np.sum(self.o_delta, axis=0) * self.lr

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X,y,o)
    
    def acc(self, X, y_true):
        # Threshold the sigmoid outputs at 0.5 to get 0/1 class labels
        y_pred = (self.feed_forward(X) >= 0.5).astype(int)
        return accuracy_score(y_true, y_pred)
        

nn = NeuralNetwork()

for i in range(8000):
    if ((i+1) % 2000 == 0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---' * 3 + '+')
        print('Error', nn.o_error)
        print('Input: \n', X_test)
        print('Actual Output: \n', y_test)
        print('Predicted Output: \n', str(nn.feed_forward(X_test)))
        print('Loss: \n', str(np.mean(np.square(y_test - nn.feed_forward(X_test)))))
    nn.train(X_train, y_train)
    
print(y_test - nn.feed_forward(X_test))

+---------EPOCH 2000---------+
Error [[-0.02743428]
 [-0.02743428]
 [ 0.04360077]
 ...
 [-0.02743428]
 [-0.02743428]
 [ 0.04360077]]
Input: 
 [[1 0]
 [0 0]
 [0 1]
 ...
 [0 1]
 [1 1]
 [0 0]]
Actual Output: 
 [[1]
 [0]
 [1]
 ...
 [1]
 [0]
 [0]]
Predicted Output: 
 [[0.95636407]
 [0.06413145]
 [0.94726679]
 ...
 [0.94726679]
 [0.02766046]
 [0.06413145]]
Loss: 
 0.0432489973238964
+---------EPOCH 4000---------+
Error [[-0.0560534 ]
 [-0.0560534 ]
 [ 0.05316808]
 ...
 [-0.0560534 ]
 [-0.0560534 ]
 [ 0.05316808]]
Input: 
 [[1 0]
 [0 0]
 [0 1]
 ...
 [0 1]
 [1 1]
 [0 0]]
Actual Output: 
 [[1]
 [0]
 [1]
 ...
 [1]
 [0]
 [0]]
Predicted Output: 
 [[0.94683193]
 [0.06131042]
 [0.94458348]
 ...
 [0.94458348]
 [0.05605398]
 [0.06131042]]
Loss: 
 0.04313152677004808
+---------EPOCH 6000---------+
Error [[-0.05671284]
 [-0.05671284]
 [ 0.05315465]
 ...
 [-0.05671284]
 [-0.05671284]
 [ 0.05315465]]
Input: 
 [[1 0]
 [0 0]
 [0 1]
 ...
 [0 1]
 [1 1]
 [0 0]]
Actual Output: 
 [[1]
 [0]
 [1]
 ...
 [1]
 [0]
 [

In [113]:
y_pred = nn.feed_forward(X_test)
rounded = [round(x[0]) for x in y_pred]
y_pred1 = np.array(rounded, dtype='int64')
accuracy_score(y_test, y_pred1)

0.955

#### The MLP scores considerably higher (0.955 versus 0.493) because its hidden layer and sigmoid activations let it learn a nonlinear decision boundary, and backpropagation uses the output error to adjust the weights and biases of every layer.
#### After fine-tuning and removing an error from stratifying during the train/test split, my loss dropped drastically as the network was able to iterate over the error and adjust the weights.
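
To illustrate why one hidden layer is enough for an XOR-like pattern, here is a minimal sketch of a two-unit hidden layer with hand-picked (not trained) weights and a step activation. One hidden unit acts as OR, the other as AND, and the output fires for "OR but not AND". All weights and names are assumptions chosen for this example.

```python
import numpy as np

def step(z):
    """Unit step activation: 1 where z >= 0, else 0."""
    return (z >= 0).astype(int)

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: column 0 behaves like OR, column 1 like AND (hand-picked weights)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
hidden = step(X_xor @ W1 + b1)

# Output unit: fires when OR is on and AND is off
W2 = np.array([1.0, -1.0])
b2 = -0.5
output = step(hidden @ W2 + b2)

print(output)  # XOR of the two input columns
```

A trained network would find its own weights through backpropagation, but the structure is the same: the hidden layer builds intermediate features that make the final decision linearly separable.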

P.S. Don't try candy gummy bears. They're disgusting. 

## 3. Keras MLP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV or RandomizedSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [53]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


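|     | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|-----|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 257 | 50 | 1 | 0 | 144 | 200 | 0 | 0 | 126 | 1 | 0.9 | 1 | 0 | 3 | 0 |
| 160 | 56 | 1 | 1 | 120 | 240 | 0 | 1 | 169 | 0 | 0.0 | 0 | 0 | 2 | 1 |
| 266 | 55 | 0 | 0 | 180 | 327 | 0 | 2 | 117 | 1 | 3.4 | 1 | 0 | 2 | 0 |
| 98 | 43 | 1 | 2 | 130 | 315 | 0 | 1 | 162 | 0 | 1.9 | 2 | 1 | 2 | 1 |
| 200 | 44 | 1 | 0 | 110 | 197 | 0 | 0 | 177 | 0 | 0.0 | 2 | 1 | 2 | 0 |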


In [51]:
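Xk = df.drop(columns='target')
yk = df['target']

Xk_train, Xk_test, yk_train, yk_test = train_test_split(Xk, yk, stratify=yk)

# Fit the scaler on the training split only, so test-set statistics do not leak into training
sc = StandardScaler()
Xk_train = sc.fit_transform(Xk_train)
Xk_test = sc.transform(Xk_test)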

In [120]:
# Baseline model with low hyperparams for batch size and epochs

def create_model():
    # create model
    model = Sequential()
    
    model.add(Dense(20, input_dim=13, activation='relu'))
    
    model.add(Dense(20, activation='relu'))
    
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    
    return model

# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, epochs=10, verbose=1)
model.fit(Xk_train, yk_train)

Train on 227 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fdc6d948e80>

In [77]:
# Testing for batch size first

def create_model():
    # create model
    model = Sequential()
    
    model.add(Dense(20, input_dim=13, activation='relu'))
    
    model.add(Dense(20, activation='relu'))
    
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

p = {'batch_size': [10, 20, 50, 200], 'epochs': [20], }
              
              
# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=p, n_jobs=1)
grid_result = grid.fit(Xk_train, yk_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.814977974618584 using {'batch_size': 10, 'epochs': 20}
Means: 0.814977974618584, Stdev: 0.044216727600996604 with: {'batch_size': 10, 'epochs': 20}
Means: 0.7929515373864363, Stdev: 0.0215827404386921 with: {'batch_size': 20, 'epochs': 20}
Means: 0.7929515520906658, Stdev: 0.027277156580766918 with: {'batch_size': 50, 'epochs': 20}
Means: 0.6343612347930538, Stdev: 0.09539719978617034 with: {'batch_size': 200, 'epochs': 20}
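
To make the comparison easier to read, the reported combinations can be sorted by mean score. The sketch below hard-codes the four results printed above rather than reading them from `grid_result`, so it runs on its own. With a fitted `GridSearchCV`, the same values would come from `cv_results_`, which also carries a `rank_test_score` entry.

```python
# (params, mean_test_score, std_test_score) copied from the output above
results = [
    ({"batch_size": 10, "epochs": 20}, 0.814977974618584, 0.044216727600996604),
    ({"batch_size": 20, "epochs": 20}, 0.7929515373864363, 0.0215827404386921),
    ({"batch_size": 50, "epochs": 20}, 0.7929515520906658, 0.027277156580766918),
    ({"batch_size": 200, "epochs": 20}, 0.6343612347930538, 0.09539719978617034),
]

# Sort from highest to lowest mean cross-validated accuracy
ranked = sorted(results, key=lambda r: r[1], reverse=True)

for params, mean, std in ranked:
    print(f"{mean:.3f} +/- {std:.3f}  {params}")

best_params = ranked[0][0]
```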


In [56]:
# 10 batch size was best, now to test for epochs

def create_model():
    # create model
    model = Sequential()
    
    model.add(Dense(20, input_dim=13, activation='relu'))
    
    model.add(Dense(20, activation='relu'))
    
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

p = {'batch_size': [10], 'epochs': [20, 40, 60, 80, 100] }
              
              
# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=p, n_jobs=1)
grid_result = grid.fit(Xk_train, yk_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.837004405023768 using {'batch_size': 10, 'epochs': 40}
Means: 0.8017621260907681, Stdev: 0.03981353037095379 with: {'batch_size': 10, 'epochs': 20}
Means: 0.837004405023768, Stdev: 0.022725458980421796 with: {'batch_size': 10, 'epochs': 40}
Means: 0.8061674058699922, Stdev: 0.01188552266062769 with: {'batch_size': 10, 'epochs': 60}
Means: 0.7797356836071099, Stdev: 0.022077530515565644 with: {'batch_size': 10, 'epochs': 80}
Means: 0.8105726953645109, Stdev: 0.016266509406020146 with: {'batch_size': 10, 'epochs': 100}


In [79]:
model = KerasClassifier(build_fn=create_model, verbose=1, batch_size=10, epochs=40)
model.fit(Xk_train, yk_train)

Train on 227 samples
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


<tensorflow.python.keras.callbacks.History at 0x7fda790a7be0>

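### The best mean cross-validated accuracy rose from about 0.815 in the batch-size search to about 0.837 in the epoch search; however, more hyperparameters need to be tuned for better performance.
### Also, the best hyperparameters picked by grid search (batch size 10, 40 epochs) were close to the low values chosen for the baseline (batch size 10, 10 epochs).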
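
Since the task asks for at least three tuned hyperparameters, it can help to enumerate the grid before running it to see how many fits it implies. The sketch below expands a three-parameter grid in plain Python. `hidden_units` is a hypothetical third hyperparameter (the model-building function would need to accept it as an argument), and the fold count is an assumption to adjust to whatever `cv` the search uses.

```python
from itertools import product

# Hypothetical three-parameter grid; values chosen for illustration only
param_grid = {
    "batch_size": [10, 20],
    "epochs": [20, 40],
    "hidden_units": [10, 20],  # hypothetical: not tuned in the cells above
}

def expand_grid(grid):
    """Return every combination of the grid values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

combos = expand_grid(param_grid)

n_folds = 5  # assumption: set this to the cv value used by the search
total_fits = len(combos) * n_folds

print(len(combos), "combinations,", total_fits, "model fits")
```

The number of fits grows multiplicatively with each added hyperparameter, which is one reason a randomized search is often preferred for larger grids.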