<a href="https://colab.research.google.com/github/Higgins2718/DS-Unit-4-Sprint-2-Neural-Networks/blob/master/LS_DS_Unit_4_Sprint_Challenge_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on XOR Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**
A neuron is a cell in the brain consisting of a soma, dendrites, and an axon. Neurons pass information to one another as electrochemical signals. An artificial neuron is loosely based on this: it is a function that receives one or more inputs, multiplies each by a weight, sums the results (usually with a bias term), and passes the sum through an activation function to produce its output (the biological equivalent being an action potential). The biological similarities more or less stop there; at a higher level, there are more differences than similarities.
- **Input Layer:**
The input layer is the first layer of the artificial neural network. It has one node per input feature, so it may hold a single value or many hundreds; the 13-feature heart disease data used below needs 13 input nodes. The input layer performs no computation and learns nothing itself: it passes the raw feature values on to the first hidden layer, and the weights on those connections are what training adjusts.
- **Hidden Layer:**
Hidden layers sit between the input layer and the output layer, and each one is connected only to the layer before it and the layer after it. Deep learning is called deep because networks typically stack many hidden layers. Their outputs are intermediate values that are not part of the model's final prediction and are hard to interpret, which is why deep learning models are sometimes referred to as "black boxes".

- **Output Layer:**
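The job of the output layer is... to output. It turns the last hidden layer's activations into the model's prediction. A binary classifier can use two output nodes, y1 and y2, one per class, so that y1 is the more strongly activated node when the model predicts "True" for a given input. It can also use a single sigmoid node whose value is read as the probability of "True", which is the setup used by the multilayer perceptron below.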
- **Activation:**
The activation function (or transfer function) defines the output of a node given its weighted input sum, and so determines how much signal the node passes on to connected nodes in deeper layers. As mentioned above, its closest biological equivalent is the action potential, which either fires or does not fire depending on whether a threshold is met. A step activation is binary in the same way, but the activations used in practice, such as sigmoid, tanh, and ReLU, produce graded outputs, and their non-linearity is what lets a multilayer network model more than a linear function.
- **Backpropagation:**
Backpropagation is the means by which neural networks improve. After a forward pass, the prediction error is measured with a loss function, and the chain rule is used to compute the gradient of that loss with respect to every weight, working backwards from the output layer to the input layer. Each weight is then adjusted in the direction of the negative gradient.
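
To make the neuron and activation definitions concrete, here is a minimal sketch of a single artificial neuron evaluated with three different activations. The input values, weights, and bias are hypothetical numbers chosen for illustration; they do not come from the notebook.

```python
import numpy as np

def step(z):
    """Binary threshold activation: 1 where z >= 0, else 0."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Smooth activation with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Passes positive sums through unchanged and clips negatives to 0."""
    return np.maximum(0.0, z)

def neuron(x, w, b, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(np.dot(x, w) + b)

x = np.array([1.0, 0.0, 1.0])    # hypothetical inputs
w = np.array([0.5, -0.3, 0.2])   # hypothetical weights
b = -0.1                         # hypothetical bias

for name, fn in [("step", step), ("sigmoid", sigmoid), ("relu", relu)]:
    print(name, float(neuron(x, w, b, fn)))
```

The weighted sum here is 0.6, so the step activation fires fully while sigmoid and ReLU return graded values for the same input.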

## 2. Perceptron on XOR Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |

In [0]:
##### Your Code Here #####

import numpy as np

np.random.seed(812)

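# Caution: the fourth column repeats the label y from the table, so the
# network is given the answer as an input feature.
inputs = np.array([
    [1,1,1,1],
    [1,0,1,0],
    [0,1,1,0],
    [0,0,1,0]
])

# Caution: these labels differ from the AND table above, which gives
# y = 1, 0, 0, 0. The outputs recorded below were produced with these values.
ground_truth = [[1], [1], [0], [0]]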


def sigmoid(x):
  return 1 / (1+np.exp(-x))


def sigmoid_derivative(x):
  sx = sigmoid(x)
  return sx * (1-sx)
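
As a sanity check on the derivative above, the sketch below compares it with a central finite difference. It also shows the equivalent form written in terms of the activation; `sigmoid_derivative_from_output` is a hypothetical helper name introduced here. The distinction matters because passing an already activated value to `sigmoid_derivative` applies the sigmoid a second time.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of sigmoid with respect to its pre-activation input x."""
    sx = sigmoid(x)
    return sx * (1 - sx)

def sigmoid_derivative_from_output(a):
    """Same derivative, written in terms of the activation a = sigmoid(x)."""
    return a * (1 - a)

x = np.linspace(-4, 4, 9)
h = 1e-6
# Central finite-difference estimate of the slope at each x
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(np.allclose(sigmoid_derivative(x), numeric))
print(np.allclose(sigmoid_derivative_from_output(sigmoid(x)), numeric))
```

Both comparisons should agree with the numerical estimate, since the two functions compute the same quantity from different starting points.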


In [0]:
weights = 2 * np.random.random((4,1)) - 1

In [0]:
weighted_sum = np.dot(inputs, weights)

In [0]:
activated_output = sigmoid(weighted_sum)

In [7]:
error = ground_truth - activated_output
error

array([[ 0.76334538],
       [ 0.69750132],
       [-0.42513931],
       [-0.38022086]])

In [8]:
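# Note: sigmoid_derivative applies sigmoid to its argument again, so this
# evaluates the slope at sigmoid(activated_output), not at weighted_sum.
adjustments = error * sigmoid_derivative(activated_output)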
adjustments

array([[ 0.18818912],
       [ 0.17044631],
       [-0.10162331],
       [-0.09170084]])

In [9]:
weights += np.dot(inputs.T, adjustments)
weights

array([[ 0.01181885],
       [ 0.27346586],
       [-0.32329961],
       [-0.33439226]])
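
The update in the cells above can be read as one step of gradient descent on squared error. The derivation below is a sketch under that assumption; the symbols $z$, $a$, $e$, $L$, and the learning rate $\eta$ are introduced here for illustration and correspond to `weighted_sum`, `activated_output`, `error`, an implied loss, and an implicit step size of 1.

```latex
\begin{aligned}
z &= Xw, \qquad a = \sigma(z), \qquad e = y - a, \\
L &= \tfrac{1}{2}\sum_i e_i^2, \\
\frac{\partial L}{\partial w} &= -X^{\top}\bigl(e \odot \sigma'(z)\bigr),
\qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr), \\
w &\leftarrow w - \eta\,\frac{\partial L}{\partial w}
 = w + \eta\,X^{\top}\bigl(e \odot \sigma'(z)\bigr).
\end{aligned}
```

With $\eta = 1$, the last line has the same shape as `weights += np.dot(inputs.T, adjustments)`, where `adjustments` plays the role of $e \odot \sigma'(z)$.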

In [10]:
for iteration in range(10000):
  weighted_sum = np.dot(inputs, weights)
  activated_output = sigmoid(weighted_sum)
  error = ground_truth - activated_output
  adjustments = error * sigmoid_derivative(activated_output)
  weights += np.dot(inputs.T, adjustments)

print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)

Weights after training
[[13.98252118]
 [-1.77024514]
 [-6.90811174]
 [ 3.5375337 ]]
Output after training
[[9.99855428e-01]
 [9.99154141e-01]
 [1.70220561e-04]
 [9.98742530e-04]]
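
Since the prompt asks for a perceptron class, here is one possible sketch that follows the AND table directly. It is not the notebook's implementation: the class name `Perceptron`, its method names, and the `learning_rate` parameter are assumptions made for illustration, and the features are limited to x1, x2, and the bias column x3 so that the label is never fed in as an input.

```python
import numpy as np

class Perceptron:
    """Single sigmoid neuron trained with full-batch gradient steps."""

    def __init__(self, n_inputs, learning_rate=1.0, seed=812):
        rng = np.random.default_rng(seed)
        # Start with small random weights in [-1, 1)
        self.weights = 2 * rng.random((n_inputs, 1)) - 1
        self.learning_rate = learning_rate

    @staticmethod
    def _sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def predict_proba(self, X):
        return self._sigmoid(X @ self.weights)

    def predict(self, X):
        # Threshold the sigmoid output at 0.5 to get a 0/1 class
        return (self.predict_proba(X) >= 0.5).astype(int)

    def fit(self, X, y, epochs=10000):
        for _ in range(epochs):
            a = self.predict_proba(X)
            error = y - a
            # a * (1 - a) is the sigmoid derivative expressed via the activation
            self.weights += self.learning_rate * X.T @ (error * a * (1 - a))
        return self

# Features are x1, x2 and the constant bias column x3; y is kept separate.
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
y = np.array([[1], [0], [0], [0]], dtype=float)

model = Perceptron(n_inputs=3).fit(X, y)
print(model.predict(X).ravel())
```

Keeping the label out of the feature matrix means the neuron has to learn the AND relationship from x1 and x2, with x3 acting as a learnable bias.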


## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:



In [0]:
import pandas as pd

In [0]:
df = pd.read_csv("https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv")

In [148]:
df.head()

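| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |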


In [149]:
df.columns

Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
       'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],
      dtype='object')

In [0]:
from sklearn.preprocessing import StandardScaler

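# Split into NumPy arrays
y = df[['target']].values
X = df.drop("target", axis='columns').values

# Standardize the features to zero mean and unit variance
scaler = StandardScaler()
X = scaler.fit_transform(X)
print(X)
# Caution: the next line also standardizes the binary target, turning the 0/1
# labels into roughly -1.09 and 0.91 (see "Actual Output" below). A sigmoid
# output stays inside (0, 1), so it can never reach these values.
y = scaler.fit_transform(y)
print(y)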
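
A common alternative is to standardize only the features and leave a binary target as 0/1. The sketch below illustrates that pattern on synthetic stand-in data rather than the heart disease file; `X_demo`, `y_demo`, and the 80/20 split are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50, scale=10, size=(100, 13))   # synthetic stand-in features
y_demo = rng.integers(0, 2, size=(100, 1))              # synthetic binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo.ravel()
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # statistics from training rows only
X_test_scaled = scaler.transform(X_test)         # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6))      # close to 0 for every feature
print(np.unique(y_train))                        # labels are left as 0 and 1
```

Fitting the scaler on the training rows only keeps information from the held-out rows out of the preprocessing step.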

In [0]:
y

In [236]:
X.shape, y.shape

((303, 13), (303, 1))

In [0]:
class NeuralNetwork:
    def __init__(self):
        # Architecture: 13 input features, one hidden layer, one output node
        self.input = 13
        self.hiddenNodes = 1
        self.outputNodes = 1

        # Randomly initialize the weights of both layers
        self.weights1 = np.random.randn(self.input, self.hiddenNodes)
        self.weights2 = np.random.randn(self.hiddenNodes, self.outputNodes)

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))

    def sigmoidPrime(self, s):
        # Expects s to already be a sigmoid output, so s * (1 - s) is the derivative
        return s * (1 - s)

    def feed_forward(self, X):
        # Weighted sum of the inputs for the hidden layer
        self.hidden_sum = np.dot(X, self.weights1)

        # Activate the hidden layer
        self.activated_hidden = self.sigmoid(self.hidden_sum)

        # Weighted sum of the hidden activations for the output layer
        self.output_sum = np.dot(self.activated_hidden, self.weights2)

        # Activate the output layer
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output

    def backward(self, X, y, o):
        # Output error and its delta
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)

        # Share of the output delta attributed to the hidden layer
        self.z2_error = self.o_delta.dot(self.weights2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)

        # Full-batch weight updates (implicit learning rate of 1)
        self.weights1 += X.T.dot(self.z2_delta)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)
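
For comparison, here is a hedged sketch of the same one-hidden-layer idea with three changes that often help a network like this train: more than one hidden node, an explicit learning rate, and updates averaged over the batch. The class name `TinyMLP`, its parameters, and the synthetic data are assumptions for illustration and are not part of the notebook.

```python
import numpy as np

class TinyMLP:
    """One-hidden-layer network trained with full-batch gradient steps."""

    def __init__(self, n_inputs, n_hidden=4, learning_rate=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
        self.w2 = rng.normal(scale=0.5, size=(n_hidden, 1))
        self.lr = learning_rate

    @staticmethod
    def _sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def feed_forward(self, X):
        self.hidden = self._sigmoid(X @ self.w1)
        self.output = self._sigmoid(self.hidden @ self.w2)
        return self.output

    def train_step(self, X, y):
        out = self.feed_forward(X)
        # Deltas use the sigmoid derivative written in terms of the activations
        o_delta = (y - out) * out * (1 - out)
        h_delta = (o_delta @ self.w2.T) * self.hidden * (1 - self.hidden)
        n = X.shape[0]
        # Dividing by n averages the update so it does not grow with dataset size
        self.w2 += self.lr * self.hidden.T @ o_delta / n
        self.w1 += self.lr * X.T @ h_delta / n
        return float(np.mean((y - out) ** 2))

# Synthetic, linearly separable stand-in data with labels in {0, 1}
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 13))
y_demo = (X_demo[:, :1] + X_demo[:, 1:2] > 0).astype(float)

net = TinyMLP(n_inputs=13)
losses = [net.train_step(X_demo, y_demo) for _ in range(2000)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Because the targets stay in {0, 1}, the sigmoid output can actually reach them, and the mean squared error has room to fall as training proceeds.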

In [241]:
nn = NeuralNetwork()
for i in range(1000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 50 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
    nn.train(X,y)

+---------EPOCH 1---------+
Input: 
 [[ 0.9521966   0.68100522  1.97312292 ... -2.27457861 -0.71442887
  -2.14887271]
 [-1.91531289  0.68100522  1.00257707 ... -2.27457861 -0.71442887
  -0.51292188]
 [-1.47415758 -1.46841752  0.03203122 ...  0.97635214 -0.71442887
  -0.51292188]
 ...
 [ 1.50364073  0.68100522 -0.93851463 ... -0.64911323  1.24459328
   1.12302895]
 [ 0.29046364  0.68100522 -0.93851463 ... -0.64911323  0.26508221
   1.12302895]
 [ 0.29046364 -1.46841752  0.03203122 ... -0.64911323  0.26508221
  -0.51292188]]
Actual Output: 
 [[ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0.91452919]
 [ 0




[[4.99999984e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [2.41137474e-18]
 [5.00000000e-01]
 [5.00000000e-01]
 [4.99999490e-01]
 [5.00000000e-01]
 [2.41152997e-18]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [4.99999999e-01]
 [3.46163194e-18]
 [5.00000000e-01]
 [2.41137471e-18]
 [5.00000000e-01]
 [4.99999720e-01]
 [2.41550666e-18]
 [5.00000000e-01]
 [5.00000000e-01]
 [4.99999998e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [4.83577280e-01]
 [5.00000000e-01]
 [2.41137471e-18]
 [5.00000000e-01]
 [4.54146634e-01]
 [4.99999955e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [4.99999861e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [2.41137471e-18]
 [4.99999993e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [5.00000000e-01]
 [2.41137472e-18]
 [2.41137471e-18]
 [5.00000000e-01]
 [5.00000000e-01]
 [3.126263

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.
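
To illustrate why the loss function matters for a binary target, the sketch below computes binary cross-entropy and mean squared error for the same predictions using NumPy. The function names and the example probabilities are hypothetical and chosen only to contrast a confident correct model with a confident wrong one.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps keeps log() away from zero."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1 - y_true) * np.log(1 - y_prob)))

def mean_squared_error(y_true, y_prob):
    return float(np.mean((y_true - y_prob) ** 2))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.9, 0.1])   # hypothetical predictions
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])   # hypothetical predictions

for name, probs in [("right", confident_right), ("wrong", confident_wrong)]:
    print(name,
          round(binary_cross_entropy(y_true, probs), 4),
          round(mean_squared_error(y_true, probs), 4))
```

Cross-entropy penalizes confident mistakes far more steeply than squared error does, which is one reason it is usually paired with a sigmoid output for binary classification.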

In [0]:
df2 = pd.read_csv("https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv")

In [156]:
import keras
from keras.models import Sequential
from keras.layers import Dense
import pandas as pd
import numpy

numpy.random.seed(42)

Using TensorFlow backend.


In [165]:
from sklearn.preprocessing import StandardScaler

# Split into NumPy arrays
y = df2.target.values

X = df2.drop("target", axis='columns').values

# Standard scaler...
scaler = StandardScaler()
X = scaler.fit_transform(X)
print(X)

[[ 0.9521966   0.68100522  1.97312292 ... -2.27457861 -0.71442887
  -2.14887271]
 [-1.91531289  0.68100522  1.00257707 ... -2.27457861 -0.71442887
  -0.51292188]
 [-1.47415758 -1.46841752  0.03203122 ...  0.97635214 -0.71442887
  -0.51292188]
 ...
 [ 1.50364073  0.68100522 -0.93851463 ... -0.64911323  1.24459328
   1.12302895]
 [ 0.29046364  0.68100522 -0.93851463 ... -0.64911323  0.26508221
   1.12302895]
 [ 0.29046364 -1.46841752  0.03203122 ... -0.64911323  0.26508221
  -0.51292188]]


In [253]:
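# Baseline hyperparameters (tuned in the grid search below)
inputs = X.shape[1]
epochs = 50
batch_size = 10
# Create model below
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(inputs,)))
model.add(Dense(64, activation='relu'))
# Sigmoid keeps the single output in (0, 1) for binary classification
model.add(Dense(1, activation='sigmoid'))
# Compile with binary cross-entropy, the usual loss for a 0/1 target
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Fit
model.fit(X, y, validation_split=0.33, epochs=epochs, batch_size=batch_size, verbose=1)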

Train on 203 samples, validate on 100 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x7f91400bad30>

In [245]:
# Use scikit-learn to grid search the optimizer and the number of epochs
import numpy
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
inputs = X.shape[1]

def create_model(optimizer='adam'):
    # create model
    model = Sequential()
    model.add(Dense(64, input_dim=inputs, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# Split into input (X) and output (Y) NumPy arrays
Y = df2.target.values

# Caution: X is rebuilt from the raw columns here, so the grid search runs on unscaled features
X = df2.drop("target", axis='columns').values
# create model
model = KerasClassifier(build_fn=create_model, batch_size=10, verbose=10)
# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
epochs = [10, 20, 50, 100]
param_grid = dict(optimizer=optimizer, epochs=epochs)

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Best: 0.897690 using {'epochs': 20, 'optimizer': 'Nadam'}
0.455446 (0.413082) with: {'epochs': 10, 'optimizer': 'SGD'}
0.557756 (0.416339) with: {'epochs': 10, 'optimizer': 'RMSprop'}
0.577558 (0.369961) with: {'epochs': 10, 'optimizer': 'Adagrad'}
0.290429 (0.261373) with: {'epochs': 10, 'optimizer': 'Adadelta'}
0.531353 (0.105197) with: {'epochs': 10, 'optimizer': 'Adam'}
0.455446 (0.413082) with: {'epochs': 10, 'optimizer': 'Adamax'}
0.778878 (0.278443) with: {'epochs': 10, 'optimizer': 'Nadam'}
0.544554 (0.413082) with: {'epochs': 20, 'optimizer': 'SGD'}
0.590759 (0.352966) with: {'epochs': 20, 'optimizer': 'RMSprop'}
0.122112 (0.172693) with: {'epochs': 20, 'optimizer': 'Adagrad'}
0.788779 (0.298712) with: {'epochs': 20, 'optimizer': 'Adadelta'}
0.561056
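
One way to keep feature scaling inside the cross-validation loop is to wrap the scaler and the model in a scikit-learn `Pipeline` before handing it to `GridSearchCV`. The sketch below swaps in scikit-learn's `MLPClassifier` for the Keras model so that it runs without TensorFlow, and it uses synthetic data. The grid values, `X_demo`, and `y_demo` are assumptions for illustration, not the notebook's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X_demo = rng.normal(loc=100, scale=25, size=(120, 13))     # synthetic, unscaled features
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 200).astype(int)   # synthetic binary target

pipeline = Pipeline([
    ("scale", StandardScaler()),                           # refit on each training fold
    ("mlp", MLPClassifier(max_iter=300, random_state=7)),
])

# Two hyperparameters, addressed through the "mlp__" step prefix
param_grid = {
    "mlp__hidden_layer_sizes": [(8,), (16,)],
    "mlp__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
for mean, params in zip(search.cv_results_["mean_test_score"],
                        search.cv_results_["params"]):
    print(f"{mean:.3f} with {params}")
```

Because the scaler is a pipeline step, each fold is standardized using only its own training rows, and the search always sees scaled inputs.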

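# Highest score of 0.924092 was reported with 10 epochs and the SGD optimizer; the grid search output shown above instead reports a best of 0.897690 with 20 epochs and Nadam.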