<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
3. [Keras MLP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**

A neuron is a single node of a neural network that takes inputs, computes their weighted sum, applies an activation function,
and then passes the result on as output.

- **Input Layer:** 

The only layer of the neural network connected directly to the dataset, with one node for each feature used in the model.

- **Hidden Layer:**

A hidden layer is any layer between the input and output layers. These layers are considered hidden because they take input from
the previous layer rather than directly from the dataset.

- **Output Layer:**

The layer of the neural network where the final value is output. It typically has one output node for regression problems and,
for classification problems, as many output nodes as there are categories (binary classification can also use a single node).

- **Activation:**

The activation function is a function applied to the weighted sum before output from one layer moves on to the next.

- **Backpropagation:**

Backpropagation is the method of using information from the cost function to recursively update the weights
of previous layers, working backward from the output, to reduce loss and improve the model.
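
The definitions above can be tied together in a minimal NumPy sketch of one sigmoid neuron: a weighted sum, an activation, and a single gradient step on squared error. The input values, starting weights, bias, and learning rate are made up for illustration and do not come from any dataset in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (assumptions, not taken from the notebook's data)
x = np.array([0.5, -1.0])      # one input row with two features
w = np.array([0.1, 0.2])       # weights
b = 0.0                        # bias
target = 1.0
lr = 0.5

def forward(x, w, b):
    z = np.dot(x, w) + b       # weighted sum
    return sigmoid(z)          # activation

def squared_error(x, w, b, target):
    return 0.5 * (target - forward(x, w, b)) ** 2

loss_before = squared_error(x, w, b, target)

# Backpropagation for a single neuron: chain rule from the loss back to the weights
a = forward(x, w, b)
delta = (a - target) * a * (1.0 - a)   # dLoss/dz
w = w - lr * delta * x                 # dLoss/dw = delta * x
b = b - lr * delta                     # dLoss/db = delta

loss_after = squared_error(x, w, b, target)
print(loss_before, loss_after)
```

With these values, one update step lowers the loss, which is the behavior backpropagation repeats across many iterations and layers.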



## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

<!--
If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)
-->

In [1]:
import numpy as np
import pandas as pd
from collections import Counter

import tensorflow.keras as keras
from tensorflow.keras.optimizers import Adam, Nadam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split as tts
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import GridSearchCV

import warnings
warnings.filterwarnings("ignore")

In [2]:
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [3]:
candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


In [4]:
# checks baseline
candy['ate'].value_counts()

# 50/50 split so baseline would be 50% to always guess one way

1    5000
0    5000
Name: ate, dtype: int64
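
The baseline reasoning above can be made explicit: the majority-class baseline is the share of the most common label. A minimal sketch, using a hypothetical label list built to match the 5000/5000 counts shown rather than the actual CSV:

```python
from collections import Counter

# Hypothetical labels matching the 5000/5000 counts reported above
labels = [1] * 5000 + [0] * 5000

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]

# Accuracy obtained by always predicting the most common class
baseline_accuracy = majority_count / len(labels)

print(f'Baseline accuracy: {baseline_accuracy}')
```

Any model on this data should be judged against that 50% figure.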

### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.

In [5]:
# Start your candy perceptron here
X = np.array(candy[['chocolate', 'gummy']].values)
y = np.array(candy['ate'].values)

# reshapes y
y = y.reshape(10000, 1)

In [33]:
# setting sigmoid functions

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1-sx)

# defines a function to create binary predictions
def binary_predict(x):
    predictions = []
    for i in x:
        if i < .5:
            predictions.append([0])
        else:
            predictions.append([1])
    return(predictions)
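
A sigmoid derivative can be written in terms of the pre-activation value `z` or in terms of the already-activated value `a = sigmoid(z)`. The sketch below checks numerically that the two forms agree when each receives the kind of input it expects, and disagree otherwise. `sigmoid_derivative_from_activation` is a name introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # expects the pre-activation value (the weighted sum)
    sz = sigmoid(z)
    return sz * (1 - sz)

def sigmoid_derivative_from_activation(a):
    # hypothetical helper: expects a = sigmoid(z)
    return a * (1 - a)

z = np.linspace(-3, 3, 7)
a = sigmoid(z)

# central finite difference as an independent reference
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

same = np.allclose(sigmoid_derivative(z), sigmoid_derivative_from_activation(a))
matches_numeric = np.allclose(sigmoid_derivative(z), numeric, atol=1e-8)

# passing activations to the pre-activation form gives a different result
mismatch = not np.allclose(sigmoid_derivative(a), numeric)

print(same, matches_numeric, mismatch)
```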

In [73]:
# set iterations
iters = 10000

# set learning rate
lr = .1

# set initial weights
weights = np.random.random((2,1))

# updates weights for the number of times in iters
for _ in range(iters):
    
    # gets weighted sums
    w_sum = np.dot(X, weights)

    # activates weighted sums
    act_out = 1 / (1 + np.exp(-w_sum))

    # gets error
    error = y - act_out

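    # finds adjustment value and applies learning rate
    # (act_out is already sigmoid(w_sum), so sigmoid_derivative applies the sigmoid a second time here)
    adjustments = lr * error * sigmoid_derivative(act_out)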

    # updates weights
    weights += np.dot(X.T, adjustments)
     
# returns most recent activation values        
print(f'Activation values:\n {act_out}\n')
print(f'Average error: {(np.sqrt(np.square(error)).mean())}')

Activation values:
 [[0.99776025]
 [0.00167836]
 [0.99776025]
 ...
 [0.99776025]
 [0.99776025]
 [0.00167836]]

Average error: 0.48417630563846603


In [74]:
# gets predictions
y_pred = binary_predict(act_out)
correct_predictions = np.sum(y_pred == y)
total_predictions = len(y)
accuracy = correct_predictions / total_predictions
print(f'Accuracy score: {accuracy}')

# accuracy is about the same as the 50% baseline

Accuracy score: 0.4993


The accuracy for the perceptron is low because this is very similar to the XOR problem.
If a piece of candy is just gummy or just chocolate, a child is very likely to eat it,
but if the candy is both or neither, the child is not likely to eat it. A single perceptron
can only draw one linear boundary, so it cannot separate these outcomes effectively and thus has poor accuracy.
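
The XOR limitation can be illustrated with a coarse grid search over linear decision rules. This is a sketch, not a proof: it assumes idealized, noise-free XOR labels for the four possible `(chocolate, gummy)` combinations, and the weight grid is an arbitrary choice.

```python
import numpy as np

# The four possible (chocolate, gummy) inputs with idealized XOR-style labels
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 1, 1, 0])

# Candidate values for each weight and the bias (arbitrary illustrative grid)
grid = np.linspace(-2, 2, 21)

best_accuracy = 0.0
for w1 in grid:
    for w2 in grid:
        for b in grid:
            # a single linear threshold unit
            preds = (points @ np.array([w1, w2]) + b > 0).astype(int)
            best_accuracy = max(best_accuracy, float(np.mean(preds == labels)))

print(best_accuracy)
```

No setting in the grid classifies more than three of the four points correctly, which is consistent with XOR not being linearly separable.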

### Multilayer Perceptron

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [117]:
class NeuralNetwork:
    
    def __init__(self, inputs, hiddenNodes, outputNodes=1):
        self.inputs = inputs
        self.hiddenNodes =  hiddenNodes
        self.outputNodes = outputNodes
        self.inputWeights = np.random.randn(self.inputs, self.hiddenNodes)
        self.hiddenWeights = np.random.randn(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
        
    def sigmoidPrime(self, s):
        return s * (1-s)
    
    def feed_forward(self, X):
        self.hiddenSum = np.dot(X, self.inputWeights)
        self.hiddenActv = self.sigmoid(self.hiddenSum)
        self.outputSum = np.dot(self.hiddenActv, self.hiddenWeights)
        self.outputActv = self.sigmoid(self.outputSum)
        
        return(self.outputActv) # used as output for backward
    
    def backward(self, X, y, output, learning_rate=.1):
        self.outputError = y - output
        self.outputDelta = learning_rate * (self.outputError * self.sigmoidPrime(output)) 
        self.hiddenError = self.outputDelta.dot(self.hiddenWeights.T)
        self.hiddenDelta = self.hiddenError * self.sigmoidPrime(self.hiddenActv)
        self.hiddenWeights += self.hiddenActv.T.dot(self.outputDelta)
        self.inputWeights += X.T.dot(self.hiddenDelta)
        
    def train(self, X, y, epochs=100, learning_rate=.1):
        for _ in range(epochs):
            output = self.feed_forward(X)
            self.backward(X, y, output, learning_rate=learning_rate)
        self.loss = np.mean(np.square(y - self.feed_forward(X)))
        print(f'\nLoss after {epochs} epochs was {self.loss}\n')
    
    def predict(self, X):
        output = self.feed_forward(X)
        predictions = []
        for i in output:
            if i[0] >= 0.5:
                predictions.append([1])
            else:
                predictions.append([0])
        self.predictions = np.array(predictions)
        
    def check(self, y):
        correct_predictions = np.sum(self.predictions == y)
        total_predictions = len(self.predictions)
        accuracy = correct_predictions / total_predictions
        return(f'The model had a {accuracy} accuracy score')

In [122]:
# creates an instance of neural network class
nn = NeuralNetwork(inputs=2, hiddenNodes=8)

# trains network and gets loss
nn.train(X, y, epochs=10000, learning_rate=.01)

# creates predictions
nn.predict(X)

# checks the model's accuracy
nn.check(y)


Loss after 10000 epochs was 0.05125916383849586



'The model had a 0.9458 accuracy score'

The multilayer perceptron works better because its hidden layer lets it adapt to the non-linearity of the data in a way that the perceptron could not.
The model could not get a perfect score because some kids were brave enough to try bad candy, and there was no feature for the model to detect this.
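
To see why one hidden layer is enough for this kind of pattern, here is a minimal sketch with hand-picked weights that solves idealized XOR exactly. It departs from the network above in two ways that are assumptions for illustration: it uses a step activation instead of a sigmoid, and it includes bias terms.

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

points = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
labels = np.array([0, 1, 1, 0])

# Hand-picked (not learned) weights, chosen for illustration
hidden_weights = np.array([[1.0, 1.0],
                           [1.0, 1.0]])
hidden_bias = np.array([-0.5, -1.5])     # unit 0 fires on OR, unit 1 fires on AND
output_weights = np.array([1.0, -1.0])   # OR minus AND
output_bias = -0.5

hidden = step(points @ hidden_weights + hidden_bias)
preds = step(hidden @ output_weights + output_bias)

print(preds)
```

The hidden units re-encode the inputs as "at least one is set" and "both are set", and in that new space a single linear boundary separates the classes.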

## 3. Keras MLP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV or RandomizedSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.
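
One way to report accuracy for every combination is to read `cv_results_` from a fitted search object. A minimal sketch on synthetic data, with a scikit-learn `LogisticRegression` standing in for the Keras model so the example runs quickly. The data, estimator, and grid are all placeholders, not the ones used in this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# Stand-in estimator and grid, chosen only to keep the example fast
demo_grid = {'C': [0.1, 1.0, 10.0], 'fit_intercept': [True, False]}
search = GridSearchCV(LogisticRegression(max_iter=1000), demo_grid, cv=3, scoring='accuracy')
search.fit(X_demo, y_demo)

# One row per hyperparameter combination, best first
results = pd.DataFrame(search.cv_results_)[['params', 'mean_test_score', 'rank_test_score']]
results = results.sort_values('rank_test_score').reset_index(drop=True)

print(results)
```

The same attribute is available on `RandomizedSearchCV`, where it holds one row per sampled combination.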

In [141]:
# import dataframe
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
238,77,1,0,125,304,0,0,162,1,0.0,2,3,2,0
55,52,1,1,134,201,0,1,158,0,0.8,2,1,2,1
200,44,1,0,110,197,0,0,177,0,0.0,2,1,2,0
184,50,1,0,150,243,0,0,128,0,2.6,1,0,3,0
170,56,1,2,130,256,1,0,142,1,0.6,1,1,1,0


In [186]:
# puts data into numpy array
X2 = np.array(df.drop(columns=['target']))
y2 = np.array(df['target'])

# reshapes y
y2 = y2.reshape(303,1)

In [187]:
# initiates scaler
scaler = MinMaxScaler()

# scales values
X_t = scaler.fit_transform(X2, y2)

In [188]:
# creates train test split
X_train, X_test, y_train, y_test = tts(X_t, y2, test_size=0.1)

In [189]:
# checks shape of X to know how many input nodes to have
X_train.shape

(272, 13)

In [190]:
# sets number of inputs
inputs = 13

# sets up parameters to tune with initial values
param_grid = {'batch_size': [10],
              'epochs': [100],
              'optimizer': ['Adam'],
              'input_act': ['sigmoid'],
              'hidden_nodes': [5],
              'dropout_rate': [0],
              'weight_func': ['RandomUniform'],
              }

# defines function for building model for keras classifier
def create_model(optimizer, input_act, hidden_nodes, dropout_rate, weight_func,):
    mod = Sequential()
    mod.add(Dense(hidden_nodes, input_shape=(inputs,), kernel_initializer=weight_func, activation=input_act, name='input'))
    mod.add(Dropout(dropout_rate))
    mod.add(Dense(hidden_nodes, kernel_initializer=weight_func, activation=input_act, name='central')) # uses same activation/weight initializer as input layer
    mod.add(Dense(1, activation='sigmoid', name='output'))
    mod.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return(mod)

# create model
mod = KerasClassifier(build_fn=create_model, verbose=0)

# creates grid
grid = RandomizedSearchCV(estimator=mod, param_distributions=param_grid, n_jobs=1, cv=3)

# fits grid
grid_fit = grid.fit(X_train, y_train)

# gets best score and params
best_score = grid_fit.best_score_
best_params = grid_fit.best_params_

# prints results from initial run through
print(f'Best Params were: {best_params}  --- Which gave an accuracy score of {best_score}')

Best Params were: {'weight_func': 'RandomUniform', 'optimizer': 'Adam', 'input_act': 'sigmoid', 'hidden_nodes': 5, 'epochs': 100, 'dropout_rate': 0, 'batch_size': 10}  --- Which gave an accuracy score of 0.8161764815449715


In [178]:
# adds more values to parameter grid
param_grid2 = {'batch_size': [100, 200, 300],
               'epochs': [100],
               'optimizer': ['Adam', 'Adagrad', 'Nadam'],
               'input_act': ['sigmoid', 'relu'],
               'hidden_nodes': [5, 10, 20],
               'dropout_rate': [0, 0.01, 0.05],
               'weight_func': ['RandomUniform', 'Ones', 'RandomNormal'],
               }

# create grid search
grid2 = RandomizedSearchCV(estimator=mod, param_distributions=param_grid2, n_jobs=1, cv=3)

# fits grid
grid_fit2 = grid2.fit(X_train, y_train)

# gets best score and params
best_score2 = grid_fit2.best_score_
best_params2 = grid_fit2.best_params_

# prints results from second run
print(f'Best Params were: {best_params2}  --- Which gave an accuracy score of {best_score2}')

Best Params were: {'weight_func': 'RandomUniform', 'optimizer': 'Adam', 'input_act': 'relu', 'hidden_nodes': 10, 'epochs': 100, 'dropout_rate': 0, 'batch_size': 100}  --- Which gave an accuracy score of 0.8088235309457078
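
Because `RandomizedSearchCV` samples only `n_iter` combinations (10 by default), it helps to know how large the full grid is. A small sketch that counts the combinations in `param_grid2`, using the same values as the cell above:

```python
from math import prod

# Same values as param_grid2 above
param_grid2 = {'batch_size': [100, 200, 300],
               'epochs': [100],
               'optimizer': ['Adam', 'Adagrad', 'Nadam'],
               'input_act': ['sigmoid', 'relu'],
               'hidden_nodes': [5, 10, 20],
               'dropout_rate': [0, 0.01, 0.05],
               'weight_func': ['RandomUniform', 'Ones', 'RandomNormal'],
               }

# Total number of distinct combinations in the grid
n_combinations = prod(len(values) for values in param_grid2.values())

n_iter = 10  # RandomizedSearchCV default
print(f'{n_combinations} combinations, {n_iter} sampled ({n_iter / n_combinations:.1%})')
```

With only a small fraction of the grid sampled, the best score from one run to the next can shift simply because different combinations were drawn.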


In [191]:
# checks more values
param_grid3 = {'batch_size': [50, 100],
               'epochs': [100],
               'optimizer': ['Adam', 'Adagrad'],
               'input_act': ['sigmoid', 'relu'],
               'hidden_nodes': [5, 10],
               'dropout_rate': [0.01, 0.05],
               'weight_func': ['RandomUniform', 'RandomNormal'],
               }

# create grid search
grid3 = RandomizedSearchCV(estimator=mod, param_distributions=param_grid3, n_jobs=1, cv=3, n_iter=20) # increased iterations from the default of 10

# fits grid
grid_fit3 = grid3.fit(X_train, y_train)

# gets best score and params
best_score3 = grid_fit3.best_score_
best_params3 = grid_fit3.best_params_

# prints results from third run
print(f'Best Params were: {best_params3}  --- Which gave an accuracy score of {best_score3}')

Best Params were: {'weight_func': 'RandomUniform', 'optimizer': 'Adam', 'input_act': 'relu', 'hidden_nodes': 10, 'epochs': 100, 'dropout_rate': 0.01, 'batch_size': 100}  --- Which gave an accuracy score of 0.8419117741286755


In [192]:
# checks more values
param_grid4 = {'batch_size': [80, 100, 120], # 100 seemed to be a good number so I am checking values around 100
               'epochs': [100],
               'optimizer': ['Adam', 'Adagrad', 'Nadam'],
               'input_act': ['relu'], # relu was better twice so I will stick with it
               'hidden_nodes': [5, 10],
               'dropout_rate': [0.01, 0],  # 0 won the first time, then the lower value won the second, so checking whether 0 is simply better
               'weight_func': ['RandomUniform', 'RandomNormal', 'Ones'],
               }

# create grid search
grid4 = RandomizedSearchCV(estimator=mod, param_distributions=param_grid4, n_jobs=1, cv=3, n_iter=20) 

# fits grid
grid_fit4 = grid4.fit(X_train, y_train)

# gets best score and params
best_score4 = grid_fit4.best_score_
best_params4 = grid_fit4.best_params_

# prints results from fourth run
print(f'Best Params were: {best_params4}  --- Which gave an accuracy score of {best_score4}')

Best Params were: {'weight_func': 'RandomNormal', 'optimizer': 'Nadam', 'input_act': 'relu', 'hidden_nodes': 10, 'epochs': 100, 'dropout_rate': 0, 'batch_size': 100}  --- Which gave an accuracy score of 0.8382353037595749


In [193]:
# checks more values
param_grid5 = {'batch_size': [100],
               'epochs': [100],
               'optimizer': ['Adam', 'Nadam'], # choosing between these two now
               'input_act': ['relu'],
               'hidden_nodes': [10, 20, 30], # raising node count since 10 always won over 5
               'dropout_rate': [0], # sticking with 0
               'weight_func': ['RandomUniform', 'RandomNormal'], # just going to check between the two to save time
               }

# create grid search
grid5 = RandomizedSearchCV(estimator=mod, param_distributions=param_grid5, n_jobs=1, cv=3, n_iter=20) 

# fits grid
grid_fit5 = grid5.fit(X_train, y_train)

# gets best score and params
best_score5 = grid_fit5.best_score_
best_params5 = grid_fit5.best_params_

# prints results from fifth run
print(f'Best Params were: {best_params5}  --- Which gave an accuracy score of {best_score5}')

# same score as with grid3

Best Params were: {'weight_func': 'RandomUniform', 'optimizer': 'Nadam', 'input_act': 'relu', 'hidden_nodes': 10, 'epochs': 100, 'dropout_rate': 0, 'batch_size': 100}  --- Which gave an accuracy score of 0.8419117741286755


In [196]:
# creates final parameters and increases epochs for the final run
param_grid_final = {'batch_size': [100],
                    'epochs': [1000], # increases epochs for the final run
                    'optimizer': ['Nadam'], # Nadam won the last two runs
                    'input_act': ['relu'],
                    'hidden_nodes': [10], # 10 worked best when both lower and higher options were there
                    'dropout_rate': [0], 
                    'weight_func': ['RandomUniform'], # better most of the time
                    }

# create grid search
grid_final = RandomizedSearchCV(estimator=mod, param_distributions=param_grid_final, n_jobs=1, cv=3, n_iter=1) # just 1 iteration since there's only 1 combination to go through

# fits grid
grid_fit_final = grid_final.fit(X_train, y_train)

# gets best score and params
best_score_final = grid_fit_final.best_score_
best_params_final = grid_fit_final.best_params_

# prints results from final run
print(f'Best Params were: {best_params_final}  --- Which gave an accuracy score of {best_score_final}')

Best Params were: {'weight_func': 'RandomUniform', 'optimizer': 'Nadam', 'input_act': 'relu', 'hidden_nodes': 10, 'epochs': 1000, 'dropout_rate': 0, 'batch_size': 100}  --- Which gave an accuracy score of 0.8088235408067703


In [201]:
# checks if lower epochs were better
param_grid_final2 = {'batch_size': [100],
                    'epochs': [90, 100, 120], # checks epoch counts close to 100
                    'optimizer': ['Nadam'], 
                    'input_act': ['relu'],
                    'hidden_nodes': [10], 
                    'dropout_rate': [0], 
                    'weight_func': ['RandomUniform'], 
                    }

# create grid search
grid_final2 = GridSearchCV(estimator=mod, param_grid=param_grid_final2, n_jobs=1, cv=3) # now using grid search

# fits grid
grid_fit_final2 = grid_final2.fit(X_train, y_train)

# gets best score and params
best_score_final2 = grid_fit_final2.best_score_
best_params_final2 = grid_fit_final2.best_params_

# prints results from epoch check run
print(f'Best Params were: {best_params_final2}  --- Which gave an accuracy score of {best_score_final2}')

Best Params were: {'batch_size': 100, 'dropout_rate': 0, 'epochs': 100, 'hidden_nodes': 10, 'input_act': 'relu', 'optimizer': 'Nadam', 'weight_func': 'RandomUniform'}  --- Which gave an accuracy score of 0.8419117741286755


This matched the accuracy of runs 3 and 5, and used the same parameters as run 5, including epochs.
So these are going to be my final hyperparameters for the model.