<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
4. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**
Neurons are the nerve cells of the human body.
- **Input Layer:**
This is the first part in a Neural Network. This is the layer that recieves information to be processed. This is the only part of a Neural Network that interacts with a user.
- **Hidden Layer:**
Once information has been recieved from the Input Layer, the Hidden Layer processes in the information in-order to calculate the output of the Neural Network. This output is then passed to the Output Layer. Processing information can involve passing through an activation function and calculating weights.
- **Output Layer:**
Once the Hidden Layer has processed the data from the Input Layer the calculated output in then passed the Output Layer where it is viewed by the user.
- **Activation:**
Not all the input from the Input Layer is required. Activation is a function decides how much information is actually passed into the Neural Network. It takes away all the input that isn't needed and only keep the input that is actually being used in the Neural Network.
- **Backpropagation:**
In a Neural Network Backpropagation is the process of updating weights at the end of each training loop of a model. The weights of each layer is updated in reverse order. When a Neural Network loop is finished, the error of the ouput is calculated and used to update the wieghts. Once the weights have been updated the model is run again using the new weights in order to produce a more accurate output.

## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [1]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [2]:
candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. You will not be able to achieve more than ~50% with the simple perceptron. Explain why you could not achieve a higher accuracy with the *simple perceptron* architecture, because it's possible to achieve ~95% accuracy on this dataset. Provide your answer in markdown (and *optional* data anlysis code) after your perceptron implementation. 

In [3]:
# Start your candy perceptron here
import numpy as np

X = candy[['chocolate', 'gummy']].values
y = candy[['ate']].values


def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

weights = 2 * np.random.random((2, 1)) - 1

weighted_sum = np.dot(X, weights)
activated_output = sigmoid(weighted_sum)\

Simple or Single layer Perceptrons only have an input layer and an output layer. There are no hidden layers to provide further computation

### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [4]:
class Neural_Network:
    def __init__(self, input_nodes = 2, hidden_nodes = 3, output_nodes = 1):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes
    
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):
        return x * (1 - x)
    
    def __feed_forward(self, X):
        self.weights1 = np.random.rand(self.input_nodes, self.hidden_nodes)
        self.weights2 = np.random.rand(self.hidden_nodes, self.output_nodes)
        
        self.hidden_sum = np.dot(X, self.weights1)
        self.activated_hidden = self.__sigmoid(self.hidden_sum)
        
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        self.activated_output = self.__sigmoid(self.output_sum)
        
        return self.activated_output
    
    def __backward(self, X, y, o):
        self.o_error = y - o
        
        self.o_delta = self.o_error * self.__sigmoid_derivative(o)
        self.z2_error = self.o_delta.dot(self.weights2.T)
        
        self.z2_delta = self.z2_error - self.__sigmoid_derivative(self.activated_hidden)
        
        self.weights1 = X.T.dot(self.z2_delta)
        self.weights2 = self.activated_hidden.T.dot(self.o_delta)
    
    def train(self, X, y, n_iter = 10):
        self.o = self.__feed_forward(X)
        self.__backward(X, y, self.o)

In [5]:
nn = Neural_Network(input_nodes=X.shape[1], hidden_nodes=8)
nn.train(X, y)

In [6]:
print(nn.o_error)

[[0.06171792]
 [0.07024141]
 [0.06171792]
 ...
 [0.06171792]
 [0.06171792]
 [0.07024141]]


With a Multi-layer Perceptron hidden layers are used to handle further processing of the data from the input. By hidden layers the data is iteerated more times in order to calculate a more accurate output. Before it was running through a single iteration with a single set of weights to return an output that wasn't very accurate. Adding hidden layers allows the data to go through multiple iterations with more sets of weights, which end up producing a more accurate output.

P.S. Don't try candy gummy bears. They're disgusting. 

P.S. Not really

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV or RandomSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [7]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
37,54,1,2,150,232,0,0,165,0,1.6,2,0,3,1
88,54,0,2,110,214,0,1,158,0,1.6,1,0,2,1
95,53,1,0,142,226,0,0,111,1,0.0,2,0,3,1
293,67,1,2,152,212,0,0,150,0,0.8,1,0,3,0
5,57,1,0,140,192,0,1,148,0,0.4,1,0,1,1


In [8]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [9]:
model = Sequential()
X = df[['age', 'cp', 'trestbps', 'chol']]
y = df[['target']]
x_train, x_test, y_train, y_test = train_test_split(X, y)

scalar = StandardScaler()
scalar.fit(x_train, y_train)

model.add(Dense(4, input_dim=4, activation='sigmoid'))
model.add(Dense(5, input_dim=4, activation='relu'))
model.add(Dense(5, input_dim=4, activation='relu'))
model.add(Dense(1, activation='softmax'))

model.compile(optimizer='adam', loss='mae', metrics=['accuracy', 'mae'])
history = model.fit(x_train, y_train, batch_size=10, epochs=75)



  return self.partial_fit(X, y)


Train on 227 samples
Epoch 1/75
Epoch 2/75
Epoch 3/75
Epoch 4/75
Epoch 5/75
Epoch 6/75
Epoch 7/75
Epoch 8/75
Epoch 9/75
Epoch 10/75
Epoch 11/75
Epoch 12/75
Epoch 13/75
Epoch 14/75
Epoch 15/75
Epoch 16/75
Epoch 17/75
Epoch 18/75
Epoch 19/75
Epoch 20/75
Epoch 21/75
Epoch 22/75
Epoch 23/75
Epoch 24/75
Epoch 25/75
Epoch 26/75
Epoch 27/75
Epoch 28/75
Epoch 29/75
Epoch 30/75
Epoch 31/75
Epoch 32/75
Epoch 33/75
Epoch 34/75
Epoch 35/75
Epoch 36/75
Epoch 37/75
Epoch 38/75
Epoch 39/75
Epoch 40/75
Epoch 41/75
Epoch 42/75
Epoch 43/75
Epoch 44/75
Epoch 45/75
Epoch 46/75
Epoch 47/75
Epoch 48/75
Epoch 49/75
Epoch 50/75
Epoch 51/75
Epoch 52/75
Epoch 53/75
Epoch 54/75
Epoch 55/75
Epoch 56/75
Epoch 57/75
Epoch 58/75
Epoch 59/75
Epoch 60/75
Epoch 61/75
Epoch 62/75
Epoch 63/75
Epoch 64/75
Epoch 65/75
Epoch 66/75
Epoch 67/75
Epoch 68/75
Epoch 69/75
Epoch 70/75
Epoch 71/75
Epoch 72/75
Epoch 73/75
Epoch 74/75
Epoch 75/75


In [10]:
history.history

{'loss': [0.45814978833503134,
  0.4581497834773841,
  0.45814978971355286,
  0.45814978544670054,
  0.45814978682522206,
  0.4581497864313588,
  0.45814978479026175,
  0.45814978833503134,
  0.4581497845276862,
  0.45814978682522206,
  0.4581497843963984,
  0.4581497874816609,
  0.45814978577491994,
  0.4581497863657149,
  0.45814978649700266,
  0.4581497841994668,
  0.4581497877442364,
  0.4581497870877976,
  0.45814978255836974,
  0.45814978321480854,
  0.45814978544670054,
  0.4581497864313588,
  0.45814978288658914,
  0.4581497877442364,
  0.45814978577491994,
  0.4581497841994668,
  0.45814978616878327,
  0.4581497855123444,
  0.4581497878098803,
  0.45814978879453855,
  0.4581497858405638,
  0.4581497878098803,
  0.45814978577491994,
  0.45814978321480854,
  0.4581497838712474,
  0.4581497855123444,
  0.4581497901074162,
  0.45814978800681194,
  0.4581497884663191,
  0.4581497841338229,
  0.4581497845276862,
  0.4581497864313588,
  0.45814978314916466,
  0.4581497838056035,
  0.

In [11]:
from sklearn.model_selection import GridSearchCV

In [12]:
param_grid = {'batch_size': [10, 20],
              'epoch': [70]}
grid_model = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=10, scoring='accuracy')

In [13]:
grid_model.fit(x_train, y_train)



TypeError: Cannot clone object '<tensorflow.python.keras.engine.sequential.Sequential object at 0x1a36f7b080>' (type <class 'tensorflow.python.keras.engine.sequential.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.