<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
4. [Keras MMP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**
    nodes which recieve an input and produce an output
- **Input Layer:**
    recieves data from dataset
- **Hidden Layer:**
    where functions are applied on data
- **Output Layer:**
    where the result of functions having been performed is passed
- **Activation:**
    calculates how much signal is passed to node
- **Backpropagation:**
    using the previous run's errors to improve the next run

## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick" or "treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy. 

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [1]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [2]:
candy.head()

Unnamed: 0,chocolate,gummy,ate
0,0,1,1
1,1,0,1
2,0,1,1
3,0,0,0
4,1,1,0


### Perceptron

To make predictions on the `candy` dataframe. Build and train a Perceptron using numpy. Your target column is `ate` and your features: `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. You will not be able to achieve more than ~50% with the simple perceptron. Explain why you could not achieve a higher accuracy with the *simple perceptron* architecture, because it's possible to achieve ~95% accuracy on this dataset. Provide your answer in markdown (and *optional* data anlysis code) after your perceptron implementation. 

In [60]:
import numpy as np

X = candy.drop(columns = 'ate')

y = candy['ate']
y = [[num] for num in y.tolist()]

In [61]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx/(1-sx)

In [65]:
weights = np.random.random((2,1))

for iteration in range(10000):
    
    # Weighted sum of inputs and weights
    weighted_sum = np.dot(X, weights)
    
    # Activation
    activated_output = sigmoid(weighted_sum)
    
    # Calculate error
    error = y - activated_output
    
    adjustments = error * sigmoid_derivative(activated_output)
    
    # Update the Weights
    weights += np.dot(X.T, adjustments)
    
print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)

  


Weights after training
[[-6153.53005224]
 [-5514.99454364]]
Output after training
[[1.]
 [1.]
 [1.]
 ...
 [1.]
 [1.]
 [1.]]


In [66]:
matches = [int(activated_output[ii] == y[ii]) for ii in range(len(y))]
total_correct = sum(matches)
avg_correct = total_correct/len(matches)
avg_correct

0.4859

**The simple perceptron model gave me an accuracy of 48.59%, this is because there may be relationships in the data which are non-linear.** 

### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [94]:
class NeuralNetwork:
    def __init__(self):
        # Set up Architecture of Neural Network
        self.inputs = 2
        self.hiddenNodes = 1
        self.outputNodes = 1

        # Initial Weights
        # Matrix Array for the First Layer
        self.weights1 = np.random.rand(self.inputs, self.hiddenNodes)
       
        # Matrix Array for Hidden to Output
        self.weights2 = np.random.rand(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s / (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward.
        aka "predict"
        """
        
        # Weighted sum of inputs => hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        # Weight sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        # Final activation of output
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
        
    def backward(self, X,y,o):
        """
        Backward propagate through the network
        """
        
        # Error in Output
        self.o_error = y - o
        
        # Apply Derivative of Sigmoid to error
        # How far off are we in relation to the Sigmoid f(x) of the output
        # ^- aka hidden => output
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        # z2 error
        self.z2_error = self.o_delta.dot(self.weights2.T)
        
        # How much of that "far off" can explained by the input => hidden
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        # Adjustment to first set of weights (input => hidden)
        self.weights1 += X.T.dot(self.z2_delta)
        # Adjustment to second set of weights (hidden => output)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)
        

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X,y,o)

In [95]:
nn = NeuralNetwork()

In [96]:
def accuracy(y_true, y_pred):
    matches = [int(y_pred[ii]==y[ii]) for ii in range(len(y_true))]
    
    return sum(matches)/len(matches)

In [97]:
for i in range(1000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
        y_pred = nn.feed_forward(X)
        acc = accuracy(y, y_pred)
        if acc > 0.95:
            break
    nn.train(X,y)

+---------EPOCH 1---------+
Input: 
       chocolate  gummy
0             0      1
1             1      0
2             0      1
3             0      0
4             1      1
...         ...    ...
9995          0      0
9996          0      1
9997          0      1
9998          0      1
9999          1      0

[10000 rows x 2 columns]
Predicted Output: 
 [[0.61802376]
 [0.63485415]
 [0.61802376]
 ...
 [0.61802376]
 [0.61802376]
 [0.63485415]]
Loss: 
 0.2649188103411242
+---------EPOCH 2---------+
Input: 
       chocolate  gummy
0             0      1
1             1      0
2             0      1
3             0      0
4             1      1
...         ...    ...
9995          0      0
9996          0      1
9997          0      1
9998          0      1
9999          1      0

[10000 rows x 2 columns]
Predicted Output: 
 [[0.5]
 [0.5]
 [0.5]
 ...
 [0.5]
 [0.5]
 [0.5]]
Loss: 
 0.20115


  app.launch_new_instance()


+---------EPOCH 3---------+
Input: 
       chocolate  gummy
0             0      1
1             1      0
2             0      1
3             0      0
4             1      1
...         ...    ...
9995          0      0
9996          0      1
9997          0      1
9998          0      1
9999          1      0

[10000 rows x 2 columns]
Predicted Output: 
 [[0.5]
 [0.5]
 [0.5]
 ...
 [0.5]
 [0.5]
 [0.5]]
Loss: 
 0.20115
+---------EPOCH 4---------+
Input: 
       chocolate  gummy
0             0      1
1             1      0
2             0      1
3             0      0
4             1      1
...         ...    ...
9995          0      0
9996          0      1
9997          0      1
9998          0      1
9999          1      0

[10000 rows x 2 columns]
Predicted Output: 
 [[0.5]
 [0.5]
 [0.5]
 ...
 [0.5]
 [0.5]
 [0.5]]
Loss: 
 0.20115
+---------EPOCH 5---------+
Input: 
       chocolate  gummy
0             0      1
1             1      0
2             0      1
3             0      0
4 

The neural net is able to better correctly adjust the weights of each feature according to the error rate

P.S. Don't try candy gummy bears. They're disgusting. 

## 3. Keras MMP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
Use the Heart Disease Dataset (binary classification)
Use an appropriate loss function for a binary classification task
Use an appropriate activation function on the final layer of your network.
Train your model using verbose output for ease of grading.
Use GridSearchCV or RandomSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
When hyperparameter tuning, show you work by adding code cells for each new experiment.
Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.

In [126]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold


In [99]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
274,47,1,0,110,275,0,0,118,1,1.0,1,1,2,0
88,54,0,2,110,214,0,1,158,0,1.6,1,0,2,1
262,53,1,0,123,282,0,1,95,1,2.0,1,2,3,0
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
77,59,1,1,140,221,0,1,164,1,0.0,2,0,2,1


In [106]:
X_train, X_test, y_train,y_test = train_test_split(df.drop(columns = 'target'), df['target'], test_size = 0.2, stratify = df['target'])

transformer = StandardScaler()

X_train = transformer.fit_transform(X_train)
X_test = transformer.transform(X_test)

X_train.shape, y_train.shape, X_test.shape, y_test.shape


  return self.partial_fit(X, y)
  return self.fit(X, **fit_params).transform(X)
  


((242, 13), (242,), (61, 13), (61,))

In [115]:
def model_structure():
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
    model.add(Dropout(0.2))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

In [116]:
def small_structure():
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(X_train.shape[1],)))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

In [114]:
model = KerasClassifier(build_fn=model_structure, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=5, shuffle=True)
results = cross_val_score(model, X_train, y_train, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline: 79.32% (5.80%)


In [128]:
param_grid = {'batch_size': [10, 20, 40, 80, 100],
              'epochs': [20, 50, 100]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=6)
grid_result_lg = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result_lg.best_score_} using {grid_result_lg.best_params_}")
means = grid_result_lg.cv_results_['mean_test_score']
stds = grid_result_lg.cv_results_['std_test_score']
params = grid_result_lg.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Mean_score: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.8429751957743621 using {'batch_size': 40, 'epochs': 100}
Mean_score: 0.8223140495867769, Stdev: 0.04690846300788868 with: {'batch_size': 10, 'epochs': 20}
Mean_score: 0.8181818250782233, Stdev: 0.02235955091934193 with: {'batch_size': 10, 'epochs': 50}
Mean_score: 0.7892562010564095, Stdev: 0.05245229767671063 with: {'batch_size': 10, 'epochs': 100}
Mean_score: 0.818181831235728, Stdev: 0.023903363515775548 with: {'batch_size': 20, 'epochs': 20}
Mean_score: 0.822314051064578, Stdev: 0.037366797476976264 with: {'batch_size': 20, 'epochs': 50}
Mean_score: 0.8305785104262927, Stdev: 0.030188114360189464 with: {'batch_size': 20, 'epochs': 100}
Mean_score: 0.8140495823434561, Stdev: 0.01808438256645638 with: {'batch_size': 40, 'epochs': 20}
Mean_score: 0.7933884332002688, Stdev: 0.034503202220376616 with: {'batch_size': 40, 'epochs': 50}
Mean_score: 0.8429751957743621, Stdev: 0.024667661738069755 with: {'batch_size': 40, 'epochs': 100}
Mean_score: 0.8057851244595425, Stdev: 0.015687

In [123]:
model_small_build = KerasClassifier(build_fn=small_structure, epochs=50, batch_size=50, verbose=0)
kfold = StratifiedKFold(n_splits=5, shuffle=True)
results = cross_val_score(model_small_build, X_train, y_train, cv=kfold)
print("Baseline_small_build: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Baseline_small_build: 77.70% (4.77%)


In [124]:
param_grid = {'batch_size': [10, 20, 40, 80, 100],
              'epochs': [20, 50, 100]}

# Create Grid Search
grid = GridSearchCV(estimator=model_small_build, param_grid=param_grid, n_jobs=6)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Mean_score: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.8512396521804747 using {'batch_size': 100, 'epochs': 100}
Mean_score: 0.731404955968384, Stdev: 0.13671106056827673 with: {'batch_size': 10, 'epochs': 20}
Mean_score: 0.7933884147277548, Stdev: 0.03757327969545572 with: {'batch_size': 10, 'epochs': 50}
Mean_score: 0.81818182064482, Stdev: 0.054868323232796536 with: {'batch_size': 10, 'epochs': 100}
Mean_score: 0.7851239735922537, Stdev: 0.0547425255242526 with: {'batch_size': 20, 'epochs': 20}
Mean_score: 0.8181818221226211, Stdev: 0.04474758084693203 with: {'batch_size': 20, 'epochs': 50}
Mean_score: 0.8140495931806643, Stdev: 0.049721559852156295 with: {'batch_size': 20, 'epochs': 100}
Mean_score: 0.7479338870068227, Stdev: 0.012481652129726805 with: {'batch_size': 40, 'epochs': 20}
Mean_score: 0.8099173657165086, Stdev: 0.03993705022423734 with: {'batch_size': 40, 'epochs': 50}
Mean_score: 0.7975206803684393, Stdev: 0.039869893157456406 with: {'batch_size': 40, 'epochs': 100}
Mean_score: 0.6570247859994242, Stdev: 0.16830048

In [127]:
pred = grid_result.predict(X_test)
accuracy_score(y_test, pred)

0.7540983606557377

In [129]:
pred = grid_result_lg.predict(X_test)
accuracy_score(y_test, pred)

0.7704918032786885

Upon examination, it appears that more layers, especially to dropout layers help stop some overfitting. The validation scores seem to deviate from test scores by quite a bit. Further analysis required.