<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
4. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**
A Neuron is a perceptron that takes a weighted sum of the inputs, adds bias, and passes through an activation function to determine if it passes information to the next stage.
- **Input Layer:**
The layer that takes in the input to a network/perceptron.
- **Hidden Layer:**
The layers that sits between the input and output layers, goes through forward feeding and backward propogation, and interacts with weights and biases.
- **Output Layer:**
The last layer that outputs the numbers for predictions such as classification, regression, etc.
- **Activation:**
An activation function transforms a matrix into a desirable shape for the output. The common activation functions are sigmoid, tanh, leaky ReLU, etc. And for regression problems the activation function is just a pass through, e.g., no activation function.
- **Backpropagation:**
Used to train neural networks, it is the calculating of gradients and updating of weights to be used in the neural network. 


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [80]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [83]:
candy.head()

Unnamed: 0,chocolate,gummy,ate,bias
0,0,1,1,1.0
1,1,0,1,1.0
2,0,1,1,1.0
3,0,0,0,1.0
4,1,1,0,1.0


### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.

In [84]:
# Start your candy perceptron here

X = candy[['chocolate', 'gummy']].values
y = candy['ate'].values

In [85]:
X.shape, y.shape

((10000, 3), (10000,))

In [87]:
y = y.reshape(10000, 1)

In [88]:
y

array([[1],
       [1],
       [1],
       ...,
       [1],
       [1],
       [1]])

In [89]:
def sigmoid(x):
    '''  A funciton that takes 1 parameter, x, and returns the sigmoid calculation for it '''
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    ''' A function that takes 1 parameter, x, and returns the sigmoid derivative for it  '''
    sx = sigmoid(x)
    return sx * (1-sx)
 

def perceptron(inputs, outputs, num_passes):
    ''' A Funcition that runs a basic neural network, the Perceptron. 
        It takes the inputs, the outputs to search for, as well as the number of passes
        to learn off of. '''
    
    ''' Assigning Random weights for the inputs'''
    weights = 2 * np.random.random((len(inputs.T),1)) 
    
    for iteration in range(num_passes):
        ''' Calculating the dot product of the inputs times the weights'''
        weighted_sum = np.dot(inputs, weights) 

        ''' Output the activated value for the end of 1 Training epoch '''
        activated_output = sigmoid(weighted_sum)

        ''' taking the difference of Output and True values and calculating the error '''
        error = outputs - activated_output

        ''' Where the magic happens... Gradiant Descent/Backdrop!! This is where the learning happens '''
        adjustments = error * sigmoid_derivative(activated_output)

        ''' Updating the Weights after each iteration '''
        weights += np.dot(inputs.T, adjustments)
        
        print('\nEpoch ', iteration)
        print('Weights after training:\n', weights, '\n')
        print('Outputs After the Training:\n',activated_output)
    
    return  
    


In [90]:
perceptron(X, y, 3)


Epoch  0
Weights after training:
 [[-315.85551186]
 [-308.56362431]
 [-612.96955606]] 

Outputs After the Training:
 [[0.77922382]
 [0.79241912]
 [0.77922382]
 ...
 [0.77922382]
 [0.77922382]
 [0.79241912]]

Epoch  1
Weights after training:
 [[308.89448814]
 [316.43637569]
 [637.03044394]] 

Outputs After the Training:
 [[0.]
 [0.]
 [0.]
 ...
 [0.]
 [0.]
 [0.]]

Epoch  2
Weights after training:
 [[-181.0624495 ]
 [-173.71717388]
 [-346.02922227]] 

Outputs After the Training:
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]


### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [91]:
class NeuralNetwork:
    def __init__(self):
        '''Set up Architecture of Neural Network'''
        self.inputs = 2
        self.hiddenNodes = 10000
        self.outputNodes = 1

        
        ''' Initial Weights '''
        self.weights1 = np.random.rand(self.inputs, self.hiddenNodes) + 1
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes) + 1
   

    ''' Sigmoid Function '''    
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    
    ''' Sigmoid Derivative Function '''
    def sigmoidPrime(self, s):
        return s * (1 - s) 
    
    
    ''' Feed Forward Function '''
    def feed_forward(self, X):
        self.hidden_sum = np.dot(X, self.weights1)
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output
    
    
    ''' Backward Propogation through the network function '''   
    def backward(self, X,y,o):
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        self.weights1 += X.T.dot(self.z2_delta)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        
    ''' Defining the Training Function '''
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X,y,o)  

In [197]:
nn = NeuralNetwork()

''' For Loop Epoch Printing from Lecture 2 '''
for i in range(5):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
    nn.train(X,y)

+---------EPOCH 1---------+
Input: 
 [[0. 1. 1.]
 [1. 0. 1.]
 [0. 1. 1.]
 ...
 [0. 1. 1.]
 [0. 1. 1.]
 [1. 0. 1.]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 2---------+
Input: 
 [[0. 1. 1.]
 [1. 0. 1.]
 [0. 1. 1.]
 ...
 [0. 1. 1.]
 [0. 1. 1.]
 [1. 0. 1.]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 3---------+
Input: 
 [[0. 1. 1.]
 [1. 0. 1.]
 [0. 1. 1.]
 ...
 [0. 1. 1.]
 [0. 1. 1.]
 [1. 0. 1.]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 4---------+
Input: 
 [[0. 1. 1.]
 [1. 0. 1.]
 [0. 1. 1.]
 ...
 [0. 1. 1.]
 [0. 1. 1.]
 [1. 0. 1.]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 5---------+
Input: 
 [[0. 1. 1.]
 [1. 0. 1.]
 [0. 1. 1.]
 ...
 [

P.S. Don't try candy gummy bears. They're disgusting. 

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV or RandomSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [143]:
''' Imports and such  '''
#!pip install keras
import pandas as pd
import numpy as np
import keras 
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D
from keras.optimizers import RMSprop
from keras.layers.normalization import BatchNormalization
from keras import backend
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

In [176]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
20,59,1,0,135,234,0,1,161,0,0.5,1,0,3,1
242,64,1,0,145,212,0,0,132,0,2.0,1,2,1,0
5,57,1,0,140,192,0,1,148,0,0.4,1,0,1,1
43,53,0,0,130,264,0,0,143,0,0.4,1,0,2,1
288,57,1,0,110,335,0,1,143,1,3.0,1,1,3,0


In [177]:
''' Assigning x and y TRAIN BABAY '''
X_train = df.loc[:, df.columns != 'target'].values
y_train = df.loc[:, df.columns == 'target'].values

In [178]:
''' Pre normalization '''
X_train

array([[59.,  1.,  0., ...,  1.,  0.,  3.],
       [64.,  1.,  0., ...,  1.,  2.,  1.],
       [57.,  1.,  0., ...,  1.,  0.,  1.],
       ...,
       [52.,  1.,  1., ...,  2.,  0.,  2.],
       [74.,  0.,  1., ...,  2.,  1.,  2.],
       [54.,  0.,  2., ...,  2.,  0.,  2.]])

In [179]:
''' Normalizing the data  '''
#X_train = keras.utils.normalize(X_train, axis=1, order=2)
#y_train = keras.utils.normalize(y_train, axis=1, order=2)

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

In [180]:
''' Checking for normalization '''
X_train

array([[ 0.5110413 ,  0.68100522, -0.93851463, ..., -0.64911323,
        -0.71442887,  1.12302895],
       [ 1.06248543,  0.68100522, -0.93851463, ..., -0.64911323,
         1.24459328, -2.14887271],
       [ 0.29046364,  0.68100522, -0.93851463, ..., -0.64911323,
        -0.71442887, -2.14887271],
       ...,
       [-0.26098049,  0.68100522,  0.03203122, ...,  0.97635214,
        -0.71442887, -0.51292188],
       [ 2.16537369, -1.46841752,  0.03203122, ...,  0.97635214,
         0.26508221, -0.51292188],
       [-0.04040284, -1.46841752,  1.00257707, ...,  0.97635214,
        -0.71442887, -0.51292188]])

In [181]:
''' Baseline Model '''
input_dim = (len(df.columns)-1)
epochs=15
batch_size=10

''' Making the sexy model '''
model_1 = Sequential()
model_1.add(Dense(input_dim*2, input_dim=input_dim, activation='relu'))
model_1.add(Dense(1, activation='sigmoid')) # Using sigmoid since its binary classification and can converge faster
model_1.compile(loss = 'mean_absolute_percentage_error',
              optimizer = 'sgd',
              metrics = ['accuracy'])

''' Fitting the model '''
fitting = model_1.fit(X_train, y_train,
                      epochs = epochs,
                      batch_size = batch_size,
                      verbose=1)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [None]:
''' So it appears that having the sigmoid in the activation function of the final layer is 
    getting the accuracy stuck at one value for each epoch, will fix this in the fine tuning model
'''

In [196]:
''' Hyper Parameter Tuned Model '''

input_dim = (len(df.columns) - 1)
epochs = 10
batch_size=10
momentum= 0.01

''' Building the Model '''
def create_model():
    model = Sequential()

    model.add(Dense(input_dim**2, input_dim=input_dim, activation='relu'))
    model.add(Dropout(.2))
    model.add(Dense(input_dim*3, activation='sigmoid'))
    model.add(Dense(input_dim*2, activation='sigmoid'))
    
    model.add(Dense(1,)) #activation='softmax'))


    model.compile(loss = 'mse',
                  optimizer='adadelta',
                  metrics = ['accuracy'],)
    return model


''' Grid Search CV '''
model = KerasClassifier(build_fn=create_model, verbose=0)

''' The fine tuning parameters, minimum 3 for a 3 '''
param_grid = {#'batch_size': [10, 20, 30],
              'epochs': [10,20],
              #'shuffle' : ['True','False'],
              #'validation_split' : [0.2, 0.3 ,0.4],

             }

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")

Best: 0.8217821786112518 using {'epochs': 20}


In [None]:
''' I commented out stuff on the param grid as I went because if i did too much at once it took 
    wayyyyy too long to finish computations
'''

''' ALso the optimizers I tried:  '''
# with 10 epochs
#adam 82
#rmsprop 81
#Nadam 82
#sgd 54
#adamax 84
#adagrad 81.5
#adadelta 84.15

''' loss function results '''
#mse 84.15
#mae 81.8
#binary_crossentropy 22.7 hahahah 