<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on AND Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:** A node that computes a weighted sum of its inputs (plus a bias), passes that sum through an activation function, and sends the result to the next layer. Neurons arranged in layers are the building blocks of neural networks.
- **Input Layer:** The first layer of nodes, which receives the data and passes it into the network. The number of input nodes equals the number of features (dimensions) in the dataset.
- **Hidden Layer:** Any layer between the input and output layers. A network can have zero or more hidden layers of any width; each one takes the previous layer's outputs, computes weighted sums and activations, and passes the result on toward the output layer.
- **Output Layer:** The last layer of the network. It takes the transformed data from the final hidden layer (or from the input layer if there are no hidden layers) and returns the predictions.
- **Activation:** The function applied to a neuron's weighted sum to produce its output, i.e., how "on" or "off" the node is. Nonlinear activations such as sigmoid or ReLU are what let a network model nonlinear relationships.
- **Backpropagation:** The algorithm that computes the gradient of the loss with respect to every weight by applying the chain rule backward from the output layer to the input layer. An optimizer such as gradient descent then uses these gradients to adjust the weights so the loss decreases.
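To make the neuron, activation, and backpropagation definitions concrete, here is a minimal sketch of a single sigmoid neuron with a squared-error loss. The inputs, weights, and helper names are hypothetical. The chain-rule gradient is checked against a finite-difference estimate and then used for one gradient-descent step.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def neuron_forward(x, w, b):
    """One neuron: weighted sum of the inputs plus a bias, then a sigmoid activation."""
    z = np.dot(w, x) + b
    return sigmoid(z)


def loss(x, y, w, b):
    """Half squared error of the neuron's prediction."""
    return 0.5 * (y - neuron_forward(x, w, b)) ** 2


def backprop_grad_w(x, y, w, b):
    """dLoss/dw by the chain rule: dL/da * da/dz * dz/dw."""
    a = neuron_forward(x, w, b)
    dL_da = -(y - a)
    da_dz = a * (1.0 - a)  # sigmoid derivative written in terms of its output
    dz_dw = x
    return dL_da * da_dz * dz_dw


# Hypothetical example values
x = np.array([1.0, 0.5, -1.5])
w = np.array([0.2, -0.4, 0.1])
b = 0.05
y = 1.0

grad = backprop_grad_w(x, y, w, b)

# Central finite-difference estimate of each partial derivative
eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    numeric[i] = (loss(x, y, w + step, b) - loss(x, y, w - step, b)) / (2 * eps)

# One gradient-descent step using the backpropagated gradient
learning_rate = 0.1
w_new = w - learning_rate * grad
```

The step moves the weights against the gradient, so the loss after the update is smaller than before.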



## 2. Perceptron on AND Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data (`x3` is a constant bias input of 1):

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |

In [1]:
import numpy as np
np.random.seed(42)

inputs = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1]
])
correct_outputs = [[1], [0], [0], [0]]

In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
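`sigmoid_derivative` above takes the raw input `x`, while the `sigmoidPrime` method in Section 3 takes a value that has already been passed through the sigmoid. The sketch below checks both forms against a finite-difference estimate; `sigmoid_prime_from_activation` is a hypothetical helper name used only for this comparison.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    # Expects the pre-activation input x
    return sigmoid(x) * (1 - sigmoid(x))


def sigmoid_prime_from_activation(s):
    # Expects s = sigmoid(x), i.e. a value already passed through the sigmoid
    return s * (1 - s)


x = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
```

Both forms agree when each gets the input it expects; the derivative peaks at 0.25 when `x = 0`. Feeding an activation into `sigmoid_derivative` (applying the sigmoid twice) gives a different, incorrect value.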

In [3]:
weights = 2 * np.random.random((3,1)) -1
weights

array([[-0.25091976],
       [ 0.90142861],
       [ 0.46398788]])

In [4]:
class Perceptron(object):
    def __init__(self, rate=0.01, niter=10):
        self.rate = rate      # learning rate
        self.niter = niter    # number of passes (epochs) over the training data

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples] (a column vector is flattened)
        """
        # Accept [[1], [0], ...] as well as [1, 0, ...]
        y = np.asarray(y).ravel()

        # weight[0] is the bias; weight[1:] are the feature weights
        self.weight = np.zeros(1 + X.shape[1])

        # Number of misclassifications in each epoch
        self.errors = []

        for i in range(self.niter):
            err = 0
            for xi, target in zip(X, y):
                # Perceptron rule: the update is nonzero only when the prediction is wrong
                delta_w = self.rate * (target - self.predict(xi))
                self.weight[1:] += delta_w * xi
                self.weight[0] += delta_w
                err += int(delta_w != 0.0)
            self.errors.append(err)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.net_input(X) >= 0.0, 1, 0)

In [5]:
pn = Perceptron(.1, 10)

In [6]:
pn.fit(inputs, correct_outputs)

<__main__.Perceptron at 0x110a85cf8>

In [7]:
import matplotlib.pyplot as plt
pn.fit(inputs, correct_outputs)
plt.plot(range(1, len(pn.errors) + 1), pn.errors, marker='o')
plt.xlabel('Epochs')
plt.ylabel('Number of misclassifications')
plt.show()

*Figure: number of misclassifications per epoch (x-axis: Epochs; y-axis: Number of misclassifications).*
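As a self-contained check of the perceptron rule, the sketch below trains a minimal step-function perceptron on the AND truth table. It assumes only the two features `x1` and `x2` and lets the class's own bias weight replace the constant `x3` column; `StepPerceptron` is a hypothetical name. For contrast, the same rule is run on XOR labels, which are not linearly separable, so at least one point is misclassified in every epoch.

```python
import numpy as np


class StepPerceptron:
    """Hypothetical minimal perceptron: unit-step output, bias stored in weight[0]."""

    def __init__(self, rate=0.1, n_iter=20):
        self.rate = rate
        self.n_iter = n_iter

    def net_input(self, X):
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, 0)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y).ravel()
        self.weight = np.zeros(1 + X.shape[1])
        self.errors = []  # misclassifications per epoch
        for _ in range(self.n_iter):
            err = 0
            for xi, target in zip(X, y):
                update = self.rate * (target - int(self.predict(xi)))
                self.weight[1:] += update * xi
                self.weight[0] += update
                err += int(update != 0.0)
            self.errors.append(err)
        return self


X = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
y_and = np.array([1, 0, 0, 0])
y_xor = np.array([0, 1, 1, 0])

and_model = StepPerceptron().fit(X, y_and)
xor_model = StepPerceptron().fit(X, y_xor)
```

On AND the error count drops to zero within a few epochs and `and_model.predict(X)` reproduces the truth table; on XOR the error count never reaches zero, which is why XOR needs a hidden layer.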

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.
- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI.



In [8]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


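| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 183 | 58 | 1 | 2 | 112 | 230 | 0 | 0 | 165 | 0 | 2.5 | 1 | 1 | 3 | 0 |
| 60 | 71 | 0 | 2 | 110 | 265 | 1 | 0 | 130 | 0 | 0.0 | 2 | 1 | 2 | 1 |
| 124 | 39 | 0 | 2 | 94 | 199 | 0 | 1 | 179 | 0 | 0.0 | 2 | 0 | 2 | 1 |
| 93 | 54 | 0 | 1 | 132 | 288 | 1 | 0 | 159 | 1 | 0.0 | 2 | 1 | 2 | 1 |
| 63 | 41 | 1 | 1 | 135 | 203 | 0 | 1 | 132 | 0 | 0.0 | 1 | 0 | 1 | 1 |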


In [9]:
X = df.drop('target', axis=1).values.astype('float32')
y = df['target'].values.reshape(-1,1)
X.shape, y.shape

((303, 13), (303, 1))

In [10]:
scaler = StandardScaler()
X = scaler.fit_transform(X)

In [11]:
df['target'].value_counts(normalize=True)

1    0.544554
0    0.455446
Name: target, dtype: float64
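With the class shares above, a "model" that always predicts class 1 would already be right about 54.5% of the time, so that is the floor any classifier on this data should beat. A minimal sketch of that majority-class baseline (the helper name is hypothetical):

```python
import numpy as np


def majority_class_baseline(y):
    """Accuracy of always predicting the most frequent class."""
    _, counts = np.unique(np.ravel(y), return_counts=True)
    return counts.max() / counts.sum()


baseline = majority_class_baseline([1, 1, 0, 1])  # 3 of the 4 labels are 1
```

Applied to the heart-disease `y`, it returns the 0.544554 share of class 1 shown above.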

In [12]:
class NeuralNetwork:
    def __init__(self):
        # Set up the NN architecture/layout
        self.inputs = 13        # one input node per feature
        self.hiddenNodes = 40
        self.outputNodes = 1

        # Initialize the weights (this network uses no bias terms)
        self.weights1 = np.random.randn(self.inputs, self.hiddenNodes)       # 13x40
        self.weights2 = np.random.randn(self.hiddenNodes, self.outputNodes)  # 40x1

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))

    def sigmoidPrime(self, s):
        """Sigmoid derivative; expects s to already be a sigmoid output."""
        return s * (1 - s)

    def feed_forward(self, X):
        """Calculate the NN inference using feed forward"""
        # Weighted sum between inputs and hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        # Activations of the weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        # Weighted sum between hidden and output layers
        self.output_sum = np.dot(self.activated_hidden, self.weights2)
        # Final activation for the output
        self.activated_output = self.sigmoid(self.output_sum)
        return self.activated_output

    def backward(self, X, y, o):
        """Backward propagate through the network"""

        # Error in output
        self.o_error = y - o

        # Apply the derivative of the sigmoid to the error:
        # how far off we are, scaled by the slope of the output sigmoid (hidden => output)
        self.o_delta = self.o_error * self.sigmoidPrime(o)

        # Hidden-layer (z2) error: the output delta sent back through weights2
        self.z2_error = self.o_delta.dot(self.weights2.T)
        # How much of that "far off" can be explained by input => hidden
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.activated_hidden)

        # Adjustment to first set of weights (input => hidden)
        self.weights1 += X.T.dot(self.z2_delta)
        # Adjustment to second set of weights (hidden => output)
        self.weights2 += self.activated_hidden.T.dot(self.o_delta)

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)
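Because `sigmoidPrime` receives activations, the terms added in `backward` are the negative gradients of L = ½Σ(y − o)² with respect to `weights1` and `weights2`, so each call amounts to a full-batch gradient-descent step with an implicit learning rate of 1. The sketch below checks this with finite differences on small random data; the shapes, seed, and helper names are hypothetical stand-ins, not the notebook's 13×40 network.

```python
import numpy as np


def sigmoid(s):
    return 1 / (1 + np.exp(-s))


def loss(X, y, w1, w2):
    """Half the sum of squared errors for a bias-free one-hidden-layer network."""
    hidden = sigmoid(X @ w1)
    out = sigmoid(hidden @ w2)
    return 0.5 * np.sum((y - out) ** 2)


def backward_updates(X, y, w1, w2):
    """The same update terms NeuralNetwork.backward adds (returned, not applied)."""
    hidden = sigmoid(X @ w1)
    out = sigmoid(hidden @ w2)
    o_delta = (y - out) * out * (1 - out)
    z2_delta = (o_delta @ w2.T) * hidden * (1 - hidden)
    return X.T @ z2_delta, hidden.T @ o_delta


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of f at w, one entry at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        step = np.zeros_like(w)
        step[idx] = eps
        grad[idx] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=(5, 1)).astype(float)
w1 = rng.normal(size=(3, 4))
w2 = rng.normal(size=(4, 1))

dw1, dw2 = backward_updates(X, y, w1, w2)
g1 = numeric_grad(lambda w: loss(X, y, w, w2), w1)
g2 = numeric_grad(lambda w: loss(X, y, w1, w), w2)
```

Since the updates are summed over every row with no learning rate, each step on the 303-row dataset can be large; that may explain why the training log below moves between loss plateaus rather than decreasing smoothly.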

In [13]:
nn = NeuralNetwork()

for i in range(10000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
#         print('Input: \n', X)
#         print('Actual Output: \n', y)
#         print('Predicted Output: \n', str(nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - nn.feed_forward(X)))))
    nn.train(X,y)

+---------EPOCH 1---------+
Loss: 
 0.3192032685334596
+---------EPOCH 2---------+
Loss: 
 0.455440539348121
+---------EPOCH 3---------+
Loss: 
 0.45544041937712587
+---------EPOCH 4---------+
Loss: 
 0.45544029368579125
+---------EPOCH 5---------+
Loss: 
 0.455440161863255
+---------EPOCH 1000---------+
Loss: 
 0.10561278574445432
+---------EPOCH 2000---------+
Loss: 
 0.10561143317407622
+---------EPOCH 3000---------+
Loss: 
 0.09241217747014238
+---------EPOCH 4000---------+
Loss: 
 0.09241015273211459
+---------EPOCH 5000---------+
Loss: 
 0.09240982587596978
+---------EPOCH 6000---------+
Loss: 
 0.09240966098009316
+---------EPOCH 7000---------+
Loss: 
 0.09240952456365842
+---------EPOCH 8000---------+
Loss: 
 0.092408629248191
+---------EPOCH 9000---------+
Loss: 
 0.07920957786566152
+---------EPOCH 10000---------+
Loss: 
 0.07920876253275623
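The loss at epochs 2 to 5 (about 0.45544) is almost exactly the share of class 0 in the data (0.455446). That is the mean squared error you get when the network outputs roughly 1 for every row, since each class-1 row then contributes 0 and each class-0 row contributes 1, which suggests the output sigmoid saturated after the first update. The labels below are a stand-in with the same class counts (138 zeros and 165 ones out of 303, derived from the proportions above).

```python
import numpy as np

# Stand-in labels with the same class counts as the heart-disease target
y = np.array([0] * 138 + [1] * 165).reshape(-1, 1)

saturated_output = np.ones_like(y, dtype=float)  # the network predicts 1 for every row
mse = np.mean(np.square(y - saturated_output))   # equals the fraction of class-0 rows
```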


## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.
- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [14]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

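# Reloaded for inspection only: the cells below reuse the X and y arrays
# built in Section 3, not this DataFrame.
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')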
df = df.sample(frac=1)
print(df.shape)
df.head()

(303, 14)


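| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 189 | 41 | 1 | 0 | 110 | 172 | 0 | 0 | 158 | 0 | 0.0 | 2 | 0 | 3 | 0 |
| 52 | 62 | 1 | 2 | 130 | 231 | 0 | 1 | 146 | 0 | 1.8 | 1 | 3 | 3 | 1 |
| 156 | 47 | 1 | 2 | 130 | 253 | 0 | 1 | 179 | 0 | 0.0 | 2 | 0 | 2 | 1 |
| 143 | 67 | 0 | 0 | 106 | 223 | 0 | 1 | 142 | 0 | 0.3 | 2 | 2 | 2 | 1 |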


In [15]:
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import GridSearchCV

In [16]:
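# Note: X was standardized on all 303 rows before this split,
# so the test rows influenced the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)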

In [17]:
X_train.shape, X_test.shape, y_test.shape, y_train.shape

((242, 13), (61, 13), (61, 1), (242, 1))
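A leakage-free ordering is to split first and fit the scaler on the training rows only, then reuse those statistics for the test rows. A minimal sketch, assuming random stand-in data in place of the heart-disease features:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_raw = rng.normal(loc=50, scale=10, size=(100, 3))  # stand-in features
y_raw = rng.integers(0, 2, size=(100, 1))            # stand-in labels

X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_tr)   # mean/std computed from training rows only
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)  # test rows reuse the training mean/std
```

The training features end up with mean 0, while the test features are only approximately centered, which is the honest situation a deployed model faces.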

In [18]:
model = Sequential()
model.add(Dense(20, activation='relu', input_shape=(13,)))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])



In [19]:
model.fit(X_train, y_train, epochs=40, batch_size=10)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


<tensorflow.python.keras.callbacks.History at 0x1a48c14eb8>

In [20]:
results = model.evaluate(X_test, y_test)
results  # [test loss, test accuracy]



[0.5397239503313284, 0.7704918]
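The first value returned by `model.evaluate` is the loss the model was compiled with, binary cross-entropy, averaged over the test rows; it pairs naturally with the sigmoid output layer. A minimal NumPy sketch of the formula (the clipping constant is an assumption to avoid `log(0)`, and Keras may differ in such numerical details):

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over samples."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


bce = binary_crossentropy([1, 0], [0.9, 0.2])  # confident, mostly correct predictions
```

A confident wrong prediction (for example `p = 0.01` for a true 1) is penalized far more heavily than a confident right one is rewarded, which is why cross-entropy suits probabilistic binary outputs better than accuracy alone.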

In [21]:
from sklearn.model_selection import GridSearchCV
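# Note: tensorflow.keras.wrappers.scikit_learn was deprecated and later removed in
# newer TensorFlow releases; the SciKeras package provides a replacement KerasClassifier.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Define the grid search params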
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20, 40]}
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(20, activation='relu', input_shape=(13,)))
    model.add(Dense(20, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Create model
model = KerasClassifier(build_fn=create_model, verbose=1)

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X_train, y_train)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 



Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
...



Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40
Best: 0.8264462905974428 using {'batch_size': 20, 'epochs': 40}
Means: 0.8016528987194881, Stdev: 0.03562412269927396 with: {'batch_size': 10, 'epochs': 20}
Means: 0.8223140434292723, Stdev: 0.04175143513582783 with: {'batch_size': 10, 'epochs': 40}
Means: 0.8099173519236982, Stdev: 0.005366696667640126 with: {'batch_size': 20, 'epochs': 20}
Means: 0.8264462905974428, Stdev: 0.03723886833530126 with: {'batch_size': 20, 'epochs': 40}
Means: 0.760330570877091, Stdev: 0.0152280337025441 with: {'batch_size': 40, 'epochs': 20}
...

In [22]:
grid_result.score(X_test, y_test)



0.75409836
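The grid in In [21] has 6 batch sizes × 2 epoch settings = 12 combinations. GridSearchCV trains each combination once per cross-validation fold, plus one final refit of the best combination on the whole training set (`refit=True` is the default). The repeated `Epoch 1/20` runs in the log are consistent with three folds per combination, the default `cv` in scikit-learn releases before 0.22 (newer releases default to 5). A quick sketch for enumerating the grid and counting the fits (`total_fits` is a hypothetical helper):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as In [21]
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20, 40]}

combos = list(ParameterGrid(param_grid))


def total_fits(n_combos, n_folds, refit=True):
    """Models trained by GridSearchCV: one per combination per fold, plus the refit."""
    return n_combos * n_folds + (1 if refit else 0)


fits_cv3 = total_fits(len(combos), 3)  # older scikit-learn default
fits_cv5 = total_fits(len(combos), 5)  # default from scikit-learn 0.22 on
```

`ParameterGrid` yields the combinations in the same order as the `Means:` lines above (batch size outermost, epochs innermost). Note also that the best cross-validation score (about 0.826) is higher than the held-out test score (about 0.754), so the CV estimate on 242 rows is somewhat optimistic.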