<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
4. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

**Neuron:** Also known as a Perceptron takes a group of weighted inputs, applies an activation function, and returns an output. 

- **Input Layer:** A layer in a Neural Network, which takes inputs from a training set, where size of an input layer corresponds to a feature size in the data set. 

- **Hidden Layer:** Sits between Input Layer and Output layer and applies activation function before passing results to next layer. There are often multiple hidden layers in a neural network.

- **Output Layer:** The final layer in a neural network, that receives inputs from previous Hidden layer, optionally applies an activation function, and returns an output representing model's prediction.

- **Activation:** Activation function takes a real value input and outputs another value between 0 and 1.
- **Backpropagation:** A way of propagating the total loss back into the neural network to know how much of the loss every node is responsible for, and subsequently updating the weights in such a way that minimizes the loss by giving the nodes with higher error rates lower weights and vice versa.


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [0]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [0]:
candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.

In [0]:
#Imports 
import numpy as np

In [0]:
# Start your candy perceptron here

X = candy[['chocolate', 'gummy']].values
y = candy['ate'].values

In [0]:
X.shape, y.shape

((10000, 2), (10000,))

In [0]:
# Reshaping y
y = y.reshape(10000, 1)
y

array([[1],
       [1],
       [1],
       ...,
       [1],
       [1],
       [1]])

In [0]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1-sx)

def perceptron(inputs, outputs, num_passes):
  
  
  weights = 2 * np.random.random((len(inputs.T), 1))
  
  for iteration in range(num_passes):
    
    
        weighted_sum = np.dot(inputs, weights)
      
        activated_output = sigmoid(weighted_sum)
      
        error = outputs - activated_output
      
        adjustments = error * sigmoid_derivative(activated_output)
      
        weights += np.dot(inputs.T, adjustments)
      
        print('\nEpoch ', iteration)
        print('Weights after training:\n', weights, '\n')
        print('Outputs After the Training:\n',activated_output)
        
        return

In [0]:
perceptron(X, y, 3)


Epoch  0
Weights after training:
 [[-398.44543025]
 [-420.73814991]] 

Outputs After the Training:
 [[0.8806484 ]
 [0.83890286]
 [0.8806484 ]
 ...
 [0.8806484 ]
 [0.8806484 ]
 [0.83890286]]


### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [0]:
class NeuralNetwork:
    def __init__(self):
        '''Set up Architecture of Neural Network'''
        self.inputs = 2
        self.hiddenNodes = 10000
        self.outputNodes = 1
        
        ''' Initial Weights '''
        self.weights1 = np.random.rand(self.inputs, self.hiddenNodes) + 1
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes) + 1
    
    ''' Sigmoid Function '''    
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    
    ''' Sigmoid Derivative Function '''
    def sigmoidPrime(self, s):
        return s * (1 - s) 
    
    
    ''' Feed Forward Function '''
    def feed_forward(self, X):
        self.hidden_sum = np.dot(X, self.weights1)
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output
    
    
    ''' Backward Propogation through the network function '''   
    def backward(self, X,y,o):
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        self.weights1 += X.T.dot(self.z2_delta)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        
    ''' Defining the Training Function '''
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X,y,o)

In [0]:
nn = NeuralNetwork()

''' For Loop Epoch Printing from Lecture 2 '''
for i in range(5):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
    nn.train(X,y)

+---------EPOCH 1---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 2---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 3---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 4---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]
+---------EPOCH 5---------+
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]


P.S. Don't try candy gummy bears. They're disgusting. 

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV or RandomSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [0]:
import keras 
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D
from keras.optimizers import RMSprop
from keras.layers.normalization import BatchNormalization
from keras import backend
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

Using TensorFlow backend.


In [0]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
26,59,1,2,150,212,1,1,157,0,1.6,2,0,2,1
182,61,0,0,130,330,0,0,169,0,0.0,2,0,2,0
84,42,0,0,102,265,0,0,122,0,0.6,1,0,2,1
35,46,0,2,142,177,0,0,160,1,1.4,0,0,2,1
86,68,1,2,118,277,0,1,151,0,1.0,2,1,3,1


In [0]:
X_train = df.loc[:, df.columns != 'target'].values
y_train = df.loc[:, df.columns == 'target'].values

In [0]:
X_train

array([[59.,  1.,  2., ...,  2.,  0.,  2.],
       [61.,  0.,  0., ...,  2.,  0.,  2.],
       [42.,  0.,  0., ...,  1.,  0.,  2.],
       ...,
       [58.,  0.,  3., ...,  2.,  0.,  2.],
       [41.,  0.,  2., ...,  2.,  0.,  2.],
       [50.,  1.,  2., ...,  2.,  0.,  2.]])

In [0]:
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)

X_train

array([[ 0.5110413 ,  0.68100522,  1.00257707, ...,  0.97635214,
        -0.71442887, -0.51292188],
       [ 0.73161895, -1.46841752, -0.93851463, ...,  0.97635214,
        -0.71442887, -0.51292188],
       [-1.36386876, -1.46841752, -0.93851463, ..., -0.64911323,
        -0.71442887, -0.51292188],
       ...,
       [ 0.40075247, -1.46841752,  1.97312292, ...,  0.97635214,
        -0.71442887, -0.51292188],
       [-1.47415758, -1.46841752,  1.00257707, ...,  0.97635214,
        -0.71442887, -0.51292188],
       [-0.48155814,  0.68100522,  1.00257707, ...,  0.97635214,
        -0.71442887, -0.51292188]])

In [0]:
# Baseline 
input_dim = (len(df.columns)-1)
epochs=10
batch_size=12

#Useful 
model_1 = Sequential()
model_1.add(Dense(input_dim*2, input_dim=input_dim, activation='relu'))
model_1.add(Dense(1, activation='sigmoid')) # Using sigmoid since its binary classification and can converge faster
model_1.compile(loss = 'mean_absolute_percentage_error',
              optimizer = 'sgd',
              metrics = ['accuracy'])

#Fitting Model 

fitting = model_1.fit(X_train, y_train,
                      epochs = epochs,
                      batch_size = batch_size,
                      verbose=1)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [0]:
input_dim = (len(df.columns) - 1)
epochs = 20
batch_size=15
momentum= 0.01

#Bob the Buildering it
def create_model():
    model = Sequential()

    model.add(Dense(input_dim**2, input_dim=input_dim, activation='relu'))
    model.add(Dropout(.2))
    model.add(Dense(input_dim*3, activation='sigmoid'))
    model.add(Dense(input_dim*2, activation='sigmoid'))
    
    model.add(Dense(1,)) #activation='softmax'))


    model.compile(loss = 'binary_crossentropy',
                  optimizer='adadelta',
                  metrics = ['accuracy'],)
    return model


#Grid Searching
model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid = {'batch_size': [10, 20, 35],
              'epochs': [10,25]
              }

#Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")



Best: 0.719471945719357 using {'batch_size': 10, 'epochs': 25}
