<a href="https://colab.research.google.com/github/120Davies/120Davies.github.io/blob/master/Ro_Davies_LS_DS_Unit_4_Sprint_Challenge_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Chocolate Gummy Bears](#Q2)
    - Perceptron
    - Multilayer Perceptron
3. [Keras MLP](#Q3)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A unit that takes a weighted sum of its inputs (plus a bias), applies an activation function, and passes the result to the next layer.
- **Input Layer:** The first layer, where the features from our dataset enter the neural network. It has one node per feature.
- **Hidden Layer:** Any layer between the input and output layers. It is "hidden" because its values are intermediate and are not observed in the data, not because we can't inspect them. Hidden layers let the network learn more complex, non-linear relationships.
- **Output Layer:** The final layer, which holds the network's prediction.
- **Activation:** The function that decides whether a neuron "fires" and how much signal is passed on to the next layer.
- **Backpropagation:** The process of passing the prediction error backward through the network to work out how much each weight contributed to it, then updating the weights so the network becomes more accurate.
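
To make the neuron and activation definitions concrete, here is a minimal sketch of a single neuron's forward pass. The input, weights, and bias are hypothetical values chosen for illustration, not taken from any trained model.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values for illustration only
x = np.array([1.0, 0.0])    # input: chocolate=1, gummy=0
w = np.array([0.5, -0.3])   # one weight per input
b = 0.1                     # bias

z = np.dot(w, x) + b        # weighted sum: 0.5*1 + (-0.3)*0 + 0.1 = 0.6
a = sigmoid(z)              # activation: the signal passed to the next layer

print("weighted sum:", z)
print("activation:", a)
```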


## 2. Chocolate Gummy Bears <a id="Q2"></a>

Right now, you're probably thinking, "yuck, who the hell would eat that?". Great question. Your candy company wants to know too. And you thought I was kidding about the [Chocolate Gummy Bears](https://nuts.com/chocolatessweets/gummies/gummy-bears/milk-gummy-bears.html?utm_source=google&utm_medium=cpc&adpos=1o1&gclid=Cj0KCQjwrfvsBRD7ARIsAKuDvMOZrysDku3jGuWaDqf9TrV3x5JLXt1eqnVhN0KM6fMcbA1nod3h8AwaAvWwEALw_wcB). 

Let's assume that a candy company has gone out and collected information on the types of Halloween candy kids ate. Our candy company wants to predict the eating behavior of witches, warlocks, and ghosts -- aka costumed kids. They shared a sample dataset with us. Each row represents a piece of candy that a costumed child was presented with during "trick or treat". We know if the candy was `chocolate` (or not chocolate) or `gummy` (or not gummy). Your goal is to predict if the costumed kid `ate` the piece of candy.

If both chocolate and gummy equal one, you've got a chocolate gummy bear on your hands!?!?!
![Chocolate Gummy Bear](https://ed910ae2d60f0d25bcb8-80550f96b5feb12604f4f720bfefb46d.ssl.cf1.rackcdn.com/3fb630c04435b7b5-2leZuM7_-zoom.jpg)

In [0]:
import pandas as pd
candy = pd.read_csv('chocolate_gummy_bears.csv')

In [0]:
candy.head()

|   | chocolate | gummy | ate |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 2 | 0 | 1 | 1 |
| 3 | 0 | 0 | 0 |
| 4 | 1 | 1 | 0 |


In [0]:
candy.shape

(10000, 3)

### Perceptron

To make predictions on the `candy` dataframe, build and train a Perceptron using numpy. Your target column is `ate` and your features are `chocolate` and `gummy`. Do not do any feature engineering. :P

Once you've trained your model, report your accuracy. Explain why you could not achieve a higher accuracy with a *simple perceptron*. It's possible to achieve ~95% accuracy on this dataset.
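
The five rows shown by `candy.head()` look like XOR: a kid eats the candy when it is chocolate or gummy, but not both and not neither. Assuming that pattern holds for the rest of the dataset (an assumption, since only five rows are visible here), the sketch below brute-forces a coarse grid of weights and biases for a single threshold unit on the four possible inputs. The grid range and step are arbitrary choices for illustration.

```python
import itertools
import numpy as np

# The four possible (chocolate, gummy) inputs and an XOR-style target
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

grid = np.linspace(-2, 2, 17)  # candidate values in steps of 0.25
best = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    # A single perceptron: fire when the weighted sum plus bias is non-negative
    pred = (X @ np.array([w1, w2]) + b >= 0).astype(int)
    best = max(best, float((pred == y).mean()))

print(best)  # 0.75
```

No combination classifies all four points correctly, because a single line cannot separate the XOR pattern.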

In [0]:
import numpy as np
# Start your candy perceptron here
np.random.seed(69420)

X = candy[['chocolate', 'gummy']].values
y = candy['ate'].values

weights = 2 * np.random.random((2,1)) - 1

In [0]:
X.shape, y.shape

((10000, 2), (10000,))

In [0]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

In [0]:
class Perceptron(object):
    def __init__(self, rate=0.01, niter=10):
        self.rate = rate
        self.niter = niter
        
    def fit(self, X, y):
        # weights
        self.weight = np.zeros(1 + X.shape[1])
        
        # Number of misclassifications
        self.errors = []
        
        for i in range(self.niter):
            err = 0
            for xi, target in zip(X, y):
                delta_w = self.rate * (target - self.predict(xi))
                self.weight[1:] += delta_w * xi
                self.weight[0] += delta_w
                err += int(delta_w !=0.0)
            self.errors.append(err)
        return self
    
    def net_input(self, X):
        return np.dot(X, self.weight[1:]) + self.weight[0]
    
    def predict(self, X):
        # Targets in `ate` are 0/1, so predict 0 (not -1) for the negative class
        return np.where(self.net_input(X) >= 0.0, 1, 0)

In [0]:
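for iteration in range(5):
    # Forward pass: weighted sum of the inputs, then sigmoid activation
    weighted_sum = np.dot(X, weights)
    
    activated_output = sigmoid(weighted_sum)
    
    # Caution: y has shape (10000,), so this subtraction broadcasts to (10000, 10000).
    # Reshaping y to a column with y.reshape(-1, 1) would keep the error at (10000, 1).
    error = y - activated_output
    
    # Caution: sigmoid_derivative applies sigmoid again to an already-activated value
    adjustments = error * sigmoid_derivative(activated_output)
    
    # Full-batch update with no learning rate and no bias term
    weights = weights + np.dot(X.T, adjustments)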
    
print("weights")
print(weights)
print("training output")
print(activated_output)
print("actual output")
print(y)

weights
[[613.41909554 613.41909554 613.41909554 ... 613.41909554 613.41909554
  613.41909554]
 [670.59791588 670.59791588 670.59791588 ... 670.59791588 670.59791588
  670.59791588]]
training output
[[1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 ...
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]
 [1. 1. 1. ... 1. 1. 1.]]
actual output
[1 1 1 ... 1 1 1]


In [0]:
activated_output[0][0:10]

array([1.00000000e+000, 1.00000000e+000, 1.00000000e+000, 1.45477193e-226,
       1.45477193e-226, 1.00000000e+000, 1.45477193e-226, 1.45477193e-226,
       1.45477193e-226, 1.45477193e-226])
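
The printed weights have two rows of many repeated values rather than two single numbers, which points at a shape issue: `y` has shape `(10000,)` while the activated output is a column of shape `(10000, 1)`. A minimal sketch of that broadcasting behaviour, using three hypothetical rows instead of the real data:

```python
import numpy as np

y_flat = np.array([1, 1, 0])              # shape (3,), like candy['ate'].values
out = np.array([[0.9], [0.8], [0.1]])     # shape (3, 1), like a column of activations

err_bad = y_flat - out                    # broadcasts to a full (3, 3) matrix
err_ok = y_flat.reshape(-1, 1) - out      # column minus column stays (3, 1)

print(err_bad.shape, err_ok.shape)  # (3, 3) (3, 1)
```

Selecting the target with double brackets, as in `candy[['ate']].values`, also yields a column vector.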

In [0]:
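# A simple perceptron can only learn a linear decision boundary. The sample rows look
# XOR-like (kids eat chocolate or gummy candy, but not both and not neither), and no
# single line separates that pattern, so a perceptron stays well short of the ~95%
# accuracy that is possible on this dataset.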

### Multilayer Perceptron <a id="Q3"></a>

Using the sample candy dataset, implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. Your Multilayer Perceptron should be implemented in Numpy. 
Your network must have one hidden layer.

Once you've trained your model, report your accuracy. Explain why your MLP's performance is considerably better than your simple perceptron's on the candy dataset. 

In [0]:
y = candy[['ate']].values

In [0]:
class NeuralNetwork:
    def __init__(self, inputs=2, hiddenNodes=4, outputNodes=1):
        self.inputs = inputs
        self.hiddenNodes = hiddenNodes
        self.outputNodes = outputNodes
        
        self.weights1 = np.random.random((self.inputs, self.hiddenNodes))
        self.weights2 = np.random.random((self.hiddenNodes, self.outputNodes))
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self, X):
        self.hidden_sum = np.dot(X, self.weights1)
        
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        
        self.activated_output = self.sigmoid(self.output_sum)
        
        return self.activated_output
    
    def backward(self, X, y, o):
        
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        
        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)
        
        self.weights1 = self.weights1 + X.T.dot(self.z2_delta)
        self.weights2 = self.weights2 + self.activated_hidden.T.dot(self.o_delta)
        
    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)

In [0]:
# Training MLP
nn = NeuralNetwork()

# Number of Epochs / Iterations
for i in range(10000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('-----' * 2 + f'EPOCH {i+1}' + '-----' * 2)
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
    nn.train(X, y)

----------EPOCH 1----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.72380462]
 [0.71155978]
 [0.72380462]
 ...
 [0.72380462]
 [0.72380462]
 [0.71155978]]
Loss: 
 0.29606941798280395
----------EPOCH 2----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.24516033]
 [0.0500345 ]
 [0.24516033]
 ...
 [0.24516033]
 [0.24516033]
 [0.0500345 ]]
Loss: 
 0.4241596963382991
----------EPOCH 3----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.49997392]
 [0.49994386]
 [0.49997392]
 ...
 [0.49997392]
 [0.49997392]
 [0.49994386]]
Loss: 
 0.20116832629025336
----------EPOCH 4----------
Input: 
 [[0 1]
 [1 0]
 [0 1]
 ...
 [0 1]
 [0 1]
 [1 0]]
Actual Output: 
 [[1]
 [1]
 [1]
 ...
 [1]
 [1]
 [1]]
Predicted Output: 
 [[0.49997466]
 

P.S. Don't try candy gummy bears. They're disgusting. 

#### A loss (mean squared error) of about 0.2 is acceptable, I guess.
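
The network above adds the gradient summed over all 10,000 rows directly to the weights, with no learning rate and no bias terms, so one plausible reason for the plateau is that the updates are too large. Below is a minimal sketch of a variant that averages the gradient, scales it by a learning rate, and adds biases. It assumes synthetic, noise-free XOR data in place of `chocolate_gummy_bears.csv`; `TinyMLP`, `lr`, and `seed` are hypothetical names, and the result will vary with the seed and step count.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMLP:
    """One hidden layer, sigmoid activations, MSE loss, full-batch gradient descent."""

    def __init__(self, n_in=2, n_hidden=4, n_out=1, lr=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0, (n_in, n_hidden))
        self.b1 = np.zeros((1, n_hidden))
        self.W2 = rng.normal(0.0, 1.0, (n_hidden, n_out))
        self.b2 = np.zeros((1, n_out))
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)       # hidden activations
        self.o = sigmoid(self.h @ self.W2 + self.b2)  # output activations
        return self.o

    def train_step(self, X, y):
        o = self.forward(X)
        n = X.shape[0]
        # Backpropagate the error through each sigmoid
        d_o = (o - y) * o * (1 - o)
        d_h = (d_o @ self.W2.T) * self.h * (1 - self.h)
        # Average the gradients over the batch and take a small step
        self.W2 -= self.lr * (self.h.T @ d_o) / n
        self.b2 -= self.lr * d_o.mean(axis=0, keepdims=True)
        self.W1 -= self.lr * (X.T @ d_h) / n
        self.b1 -= self.lr * d_h.mean(axis=0, keepdims=True)
        return float(np.mean((y - o) ** 2))

# Synthetic stand-in data (assumption): pure XOR of two binary features
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(1000, 2))
y = (X[:, [0]] ^ X[:, [1]]).astype(float)   # column vector, shape (1000, 1)

net = TinyMLP()
for step in range(2000):
    loss = net.train_step(X, y)

preds = net.forward(X)
accuracy = float(((preds >= 0.5) == (y == 1)).mean())
print("final loss:", loss)
print("accuracy:", accuracy)
```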

## 3. Keras MLP <a id="Q3"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV or RandomizedSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 3 parameters in order to get a 3 on this section.
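
Since every combination of hyperparameters has to be reported, it helps to see how quickly a grid grows. This is a minimal sketch that enumerates a grid by hand with `itertools.product`; the parameter values are illustrative assumptions, and it only lists the combinations a grid search would try, without training anything.

```python
from itertools import product

# Hypothetical grid for illustration
param_grid = {
    "batch_size": [10, 50, 100],
    "epochs": [10, 50],
    "learning_rate": [0.1, 0.01],
}

names = list(param_grid)
# One dict per combination: the Cartesian product of all value lists
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(combos))  # 12
for combo in combos[:3]:
    print(combo)
```

Each of these combinations is then fitted once per cross-validation fold, so the total number of model fits is the number of combinations times the number of folds.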

In [0]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


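|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 177 | 64 | 1 | 2 | 140 | 335 | 0 | 1 | 158 | 0 | 0.0 | 2 | 0 | 2 | 0 |
| 63 | 41 | 1 | 1 | 135 | 203 | 0 | 1 | 132 | 0 | 0.0 | 1 | 0 | 1 | 1 |
| 213 | 61 | 0 | 0 | 145 | 307 | 0 | 0 | 146 | 1 | 1.0 | 1 | 0 | 3 | 0 |
| 22 | 42 | 1 | 0 | 140 | 226 | 0 | 1 | 178 | 0 | 0.0 | 2 | 0 | 2 | 1 |
| 159 | 56 | 1 | 1 | 130 | 221 | 0 | 0 | 163 | 0 | 0.0 | 2 | 0 | 3 | 1 |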


In [0]:
df.target.value_counts()

1    165
0    138
Name: target, dtype: int64
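
These counts give a reference point for any model: always predicting the majority class. A minimal sketch of that majority-class baseline, using only the two counts printed above:

```python
# Class counts copied from df.target.value_counts()
counts = {1: 165, 0: 138}

total = sum(counts.values())               # 303 rows
majority_class = max(counts, key=counts.get)
baseline = counts[majority_class] / total  # accuracy of always guessing the majority

print(majority_class, round(baseline, 3))  # 1 0.545
```

A tuned model is only useful if its accuracy is clearly above this figure.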

In [0]:
from sklearn.model_selection import train_test_split, GridSearchCV
X = df.drop(columns='target').values
y = df[['target']].values

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)

scaler = StandardScaler()
X_train = scaler.fit_transform(x_train)
X_test = scaler.transform(x_test)
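
The scaler is fitted on the training split only and then reused on the test split, so no information from the test rows leaks into the preprocessing. A minimal NumPy sketch of the same idea, assuming hypothetical random features in place of the heart data:

```python
import numpy as np

# Hypothetical stand-in features (not the heart dataset)
rng = np.random.default_rng(0)
x_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
x_test = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

# "fit": learn the statistics from the training split only
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)   # population std (ddof=0), as StandardScaler uses

# "transform": apply the same training statistics to both splits
X_train = (x_train - mean) / std
X_test = (x_test - mean) / std

print(X_train.mean(axis=0).round(6), X_train.std(axis=0).round(6))
print(X_test.mean(axis=0).round(3))  # close to, but not exactly, zero
```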

In [0]:
!pip install tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier


In [0]:
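def create_model(learning_rate=0.01):
    # learning_rate is accepted so GridSearchCV can pass it in, but this first
    # experiment compiles with the default 'adam' optimizer, so it has no effect here.
    model = Sequential()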
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
       
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [10, 50, 100, 200],
            'epochs': [10],
            'learning_rate': [.01],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Best: 0.7241379231067714 using {'batch_size': 10, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.7241379231067714, Stdev: 0.03876268104706246 with: {'batch_size': 10, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.5911330008154432, Stdev: 0.02795490678553894 with: {'batch_size': 50, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.6305418610572815, Stdev: 0.10143038490289724 with: {'batch_size': 100, 'epochs': 10, 'learning_rate': 0.01}
Means: 0.5566502438096578, Stdev: 0.1059594608381056 with: {'batch_size': 200, 'epochs': 10, 'learning_rate': 0.01}


In [0]:
def create_model(learning_rate = 0.01):
    model = Sequential()
    model.add(Dense(14, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    optimizer = Adam(learning_rate=learning_rate)
    
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=create_model, verbose=0)

param_grid ={'batch_size': [100],
            'epochs': [50],
            'learning_rate': [1, .1, .01],
            }

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.7881773595739467 using {'batch_size': 100, 'epochs': 50, 'learning_rate': 1}
Means: 0.7881773595739467, Stdev: 0.026166544045345456 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 1}
Means: 0.7684728964208969, Stdev: 0.019691996751114833 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.1}
Means: 0.7881773507653786, Stdev: 0.014714384541885745 with: {'batch_size': 100, 'epochs': 50, 'learning_rate': 0.01}


In [0]:
from sklearn.metrics import accuracy_score

y_pred = grid_result.best_estimator_.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 0.76
