<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on AND Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**
A neuron is the basic unit of a neural network. It takes N inputs, multiplies each by a weight, and optionally adds a bias. It then applies an activation function to the weighted sum of the inputs plus the bias. Depending on the activation function, the neuron might activate, not activate, or send a weak signal to the next layer.
- **Input Layer:**
The input layer is the first layer of the neural network. It has one node per feature in the data and applies no activation function; it simply passes the inputs to the first hidden layer.
- **Hidden Layer:**
The hidden layers sit between the input and output layers. They are made up of neurons that behave as described above: they apply weights to the inputs they receive, may add a bias, and then apply an activation function, possibly activating and sending a signal to the next layer.

- **Output Layer:**
The output layer determines the shape of the network's output. For a regression problem, it is typically a single neuron with a linear activation.

For binary classification, it typically uses a sigmoid activation; for multi-class classification, it uses a softmax function.

The number of neurons in the output layer corresponds to the number of output values (for example, one per class with softmax).

- **Activation:**

Neurons activate according to their activation functions. Sigmoid, ReLU, and step are some possible functions. Based on the weighted inputs (and bias), the function determines the strength of the output signal, if any.

Choosing the activation function depends on the nature of the problem. ReLU functions produce a linearly increasing output once the input signal is above zero, and zero when the input signal is below zero.

Step functions produce a signal of 1 if the input signal is above zero, and zero otherwise.

Sigmoid functions produce an output near 1 for large positive inputs and near 0 for large negative inputs, with a smooth transition for inputs roughly between -6 and 6. Tanh is a rescaled sigmoid whose output ranges from -1 to 1.

- **Backpropagation:**

The process by which neural networks learn. During training, the outputs of the neural network are compared to the true y values to generate an error term. Working backward from the output layer, the error at each layer (except the input layer, which has no activation function) is multiplied by the derivative of that layer's activation function to create a delta term. Each layer's weights are then adjusted by the product of the layer's inputs and its delta term.

Intuitively, we use the derivatives of our activation functions to work out how much each weight contributed to the error, and update the weights so as to reduce the error or loss function of our model.
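
The definitions above can be made concrete with a small NumPy sketch. It is not part of the original answers: the function names and the toy data are assumptions chosen for illustration. It defines the three activation functions mentioned and performs one weight update on a single sigmoid neuron, using error times the derivative of the activation as the delta.

```python
import numpy as np

def step(z):
    # 1 where the input is above zero, 0 otherwise
    return (z > 0).astype(float)

def relu(z):
    # 0 for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

def sigmoid(z):
    # smooth squashing into (0, 1); near 0 below about -6, near 1 above about 6
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime_from_output(a):
    # derivative of the sigmoid, written in terms of its output a = sigmoid(z)
    return a * (1.0 - a)

# Toy data (assumed): one sample with two features, target 1
x = np.array([[0.5, -1.0]])
y = np.array([[1.0]])
w = np.zeros((2, 1))  # zero weights keep the example deterministic

out = sigmoid(x @ w)                             # forward pass
error = y - out                                  # error term
delta = error * sigmoid_prime_from_output(out)   # delta term
w_new = w + x.T @ delta                          # adjust weights by inputs^T . delta

loss_before = float(np.mean((y - out) ** 2))
loss_after = float(np.mean((y - sigmoid(x @ w_new)) ** 2))
print(loss_before, loss_after)
```

On this toy sample a single update lowers the squared error, which is the behavior the backpropagation definition describes.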


## 2. Perceptron on AND Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |

In [175]:
import numpy as np

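# First we format our inputs and outputs.
# Note: the fourth column below is all zeros, so it contributes nothing to the
# weighted sum; the table above has only x1, x2, and x3.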

inputs = np.array(([1,1,1, 0],
                   [1,0,1, 0],
                   [0,1,1, 0],
                   [0,0,1, 0]), dtype=float)

correct_outputs = np.array(([1], [0], [0], [0]), dtype=float)

In [176]:
# This is a single-layer perceptron: the inputs feed directly into one output node
class Perceptron():
    def __init__(self, input_size = 4, outputNodes = 1):
        # Set up architecture of the neural network
        self.input = input_size 
        self.outputNodes = outputNodes
        # Initial weights
        # (input_size x outputNodes) matrix, here 4x1
        self.weights1 = np.random.randn(self.input, self.outputNodes)
        
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        # s is expected to be an already-activated value, i.e. sigmoid(z)
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward

        """
        # weighted sum of the inputs
        self.hidden_sum = np.dot(X, self.weights1)
        # activation of the weighted sum
        self.activated_output = self.sigmoid(self.hidden_sum)

        return self.activated_output
    
    def get_attributes(self):
        
        attributes = ['weights1', 'hidden_sum', 'activated_output']

        # use self rather than a global instance name
        [print(i + '\n', getattr(self,i), '\n' + '---'*3) for i in dir(self) if i in attributes]

    def backward(self, X, y, o):
        """
        Backward propagate through the network
        """
        self.o_error = y - o # error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o) 
        # apply derivative of sigmoid to error

        self.weights1 += X.T.dot(self.o_delta) # adjust the input => output weights

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)
        

In [177]:
p = Perceptron()

X = inputs
y = correct_outputs

for i in range(10000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 50 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
        print('Actual Output: \n', y)
        print('Predicted Output: \n', str(p.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - p.feed_forward(X)))))

+---------EPOCH 1---------+
Input: 
 [[1. 1. 1. 0.]
 [1. 0. 1. 0.]
 [0. 1. 1. 0.]
 [0. 0. 1. 0.]]
Actual Output: 
 [[1.]
 [0.]
 [0.]
 [0.]]
Predicted Output: 
 [[0.68491864]
 [0.30493931]
 [0.68013858]
 [0.3002838 ]]
Loss: 
 0.18625577549629527
+---------EPOCH 2---------+
Input: 
 [[1. 1. 1. 0.]
 [1. 0. 1. 0.]
 [0. 1. 1. 0.]
 [0. 0. 1. 0.]]
Actual Output: 
 [[1.]
 [0.]
 [0.]
 [0.]]
Predicted Output: 
 [[0.68491864]
 [0.30493931]
 [0.68013858]
 [0.3002838 ]]
Loss: 
 0.18625577549629527
+---------EPOCH 3---------+
Input: 
 [[1. 1. 1. 0.]
 [1. 0. 1. 0.]
 [0. 1. 1. 0.]
 [0. 0. 1. 0.]]
Actual Output: 
 [[1.]
 [0.]
 [0.]
 [0.]]
Predicted Output: 
 [[0.68491864]
 [0.30493931]
 [0.68013858]
 [0.3002838 ]]
Loss: 
 0.18625577549629527
+---------EPOCH 4---------+
Input: 
 [[1. 1. 1. 0.]
 [1. 0. 1. 0.]
 [0. 1. 1. 0.]
 [0. 0. 1. 0.]]
Actual Output: 
 [[1.]
 [0.]
 [0.]
 [0.]]
Predicted Output: 
 [[0.68491864]
 [0.30493931]
 [0.68013858]
 [0.3002838 ]]
Loss: 
 0.18625577549629527
+---------EPOCH 5---

In [None]:
# The loss is not decreasing: the loop above only calls p.feed_forward(X)
# and never calls p.train(X, y), so the weights are never updated.
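
The identical predictions in every epoch above suggest that the weights never change: the loop calls `p.feed_forward` but not `p.train`. A minimal self-contained sketch of the same sigmoid perceptron with the update step in place follows. The function name `train_and_gate`, the fixed seed, and the three-column input (matching the table, with `x3` acting as a bias input) are assumptions for illustration, not the original code.

```python
import numpy as np

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def train_and_gate(epochs=10000, seed=0):
    """Train a single sigmoid neuron on the AND table; return weights and loss history."""
    rng = np.random.default_rng(seed)
    # x1, x2 and a constant x3 = 1 that plays the role of a bias input
    X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=float)
    y = np.array([[1], [0], [0], [0]], dtype=float)
    weights = rng.standard_normal((3, 1))
    losses = []
    for _ in range(epochs):
        out = sigmoid(X @ weights)               # forward pass
        losses.append(float(np.mean((y - out) ** 2)))
        delta = (y - out) * out * (1 - out)      # error times sigmoid derivative
        weights += X.T @ delta                   # weight update on every epoch
    return weights, losses

weights, losses = train_and_gate()
X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=float)
predictions = (sigmoid(X @ weights) > 0.5).astype(int).ravel()
print(losses[0], losses[-1], predictions)
```

With the update inside the loop, the recorded loss falls over the epochs and the thresholded predictions reproduce the AND column.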

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:



In [178]:
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# download data and separate columns into categorical vs continuous values

colnames = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 
            'restecg', 'thalach', 'exang', 'oldspeak', 'slope', 'ca', 'thal', 'target']

onehot = ['sex', 'cp', 'fbs', 'restecg','exang', 'slope', 'ca', 'thal']

scale = ['age', 'trestbps', 'chol', 'thalach', 'oldspeak' ]

df = pd.read_csv('processed.cleveland.data', names=colnames)


In [234]:
# clean out some nans, cast object columns to numeric and encode the target as a 
# binary variable
df.replace('?', np.nan, inplace=True)
df.dropna(inplace=True)
df['ca'] = pd.to_numeric(df['ca']) 
df['thal'] = pd.to_numeric(df['thal'])
df['target'] = df['target'].map({0:0, 1:1, 2:1, 3:1, 4:1})

In [235]:
# encode data using a column transformer
cleaning_trans = ColumnTransformer(
    [
    ('scaler', StandardScaler(), scale),
    ('hot', OneHotEncoder(), onehot)],
    n_jobs=-1, remainder='passthrough', verbose=True)

In [236]:
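# NOTE: the transformed array returned here is not assigned to anything,
# so X below is built from the raw, unscaled columns of df
cleaning_trans.fit_transform(df)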
X = df.drop(columns='target').values
y = df['target'].values
X.shape, y.shape

((297, 13), (297,))
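
In the cell above the result of `fit_transform` is discarded, so `X` still has the 13 raw columns. A hedged sketch of keeping the transformed features instead is shown below. The tiny DataFrame and its values are made up for illustration, and `sparse_threshold=0` is set only so that the result comes back as a dense array.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the heart disease frame
toy = pd.DataFrame({
    "age": [63.0, 67.0, 41.0, 56.0],
    "chol": [233.0, 286.0, 204.0, 236.0],
    "cp": [1, 4, 2, 4],
    "target": [0, 1, 0, 1],
})

features = toy.drop(columns="target")        # keep the label out of the transformer
y_toy = toy["target"].values.reshape(-1, 1)  # column vector for the network

transformer = ColumnTransformer(
    [
        ("scaler", StandardScaler(), ["age", "chol"]),
        ("hot", OneHotEncoder(), ["cp"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Assign the result: this is the array the network would be trained on
X_toy = transformer.fit_transform(features)
print(X_toy.shape, y_toy.shape)
```

The scaled columns come first, followed by one column per category, so the feature count grows beyond the original number of columns.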

In [237]:
class NueralNetwork():
    def __init__(self, input_size = 13, hiddenNodes = 13, outputNodes = 1):
#     def __init__(self, input_size, hiddenNodes, outputNodes):
        # Set up Architecture of neural network
        self.input = input_size 
        self.hiddenNodes = hiddenNodes
        self.outputNodes = outputNodes
        
        # Initial Weights
        # (input_size x hiddenNodes) matrix for the input => hidden layer, here 13x13
        self.weights1 = np.random.randn(self.input, self.hiddenNodes)
        # (hiddenNodes x outputNodes) matrix for the hidden => output layer, here 13x1
        self.weights2 = np.random.randn(self.hiddenNodes, self.outputNodes)
        
    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))
    
    def sigmoidPrime(self, s):
        return s * (1 - s)
    
    def feed_forward(self, X):
        """
        Calculate the NN inference using feed forward

        """
        #weighted sum of inputs and hidden layer
        self.hidden_sum = np.dot(X, self.weights1)
        
        #Activations of weighted sums
        self.activated_hidden = self.sigmoid(self.hidden_sum)
        

        #Weighted sum between hidden and output
        self.output_sum = np.dot(self.activated_hidden, self.weights2)

        # Final Activation of output
        self.activated_output = self.sigmoid(self.output_sum)


        return self.activated_output
    
    def get_attributes(self):
        
        attributes = ['weights1', 'hidden_sum', 'activated_hidden', 'weights2', 'activated_output']

        # use self rather than a global instance name
        [print(i + '\n', getattr(self,i), '\n' + '---'*3) for i in dir(self) if i in attributes]

    def backward(self, X, y, o):
        """
        Backward propagate through the network
        """
        self.o_error = y - o #error in output
        self.o_delta = self.o_error * self.sigmoidPrime(o) # apply derivative of sigmoid to error

        self.z2_error = self.o_delta.dot(self.weights2.T) # z2 error: how much our hidden layer weights were off
        self.z2_delta = self.z2_error*self.sigmoidPrime(self.activated_hidden)

        self.weights2 += self.activated_hidden.T.dot(self.o_delta) #adjust second set (hidden => output) weights
        self.weights1 += X.T.dot(self.z2_delta) #Adjust first set (input => hidden) weights

    def train(self, X, y):
        o = self.feed_forward(X)
        self.backward(X, y, o)

In [202]:
# reshape y for network compatibility
y = y.reshape(-1,1)
y.shape, X.shape

((297, 1), (297, 13))

In [203]:
new_nn = NueralNetwork()

for i in range(1000):
    if (i+1 in [1,2,3,4,5]) or ((i+1) % 1000 ==0):
        print('+' + '---' * 3 + f'EPOCH {i+1}' + '---'*3 + '+')
        print('Input: \n', X)
#         print('Actual Output: \n', y)
#         print('Predicted Output: \n', str(new_nn.feed_forward(X)))
        print("Loss: \n", str(np.mean(np.square(y - new_nn.feed_forward(X)))))
    new_nn.train(X, y)
# print("Loss: \n", str(np.mean(np.square(y - new_nn.feed_forward(X)))))

+---------EPOCH 1---------+
Input: 
 [[63.  1.  1. ...  3.  0.  6.]
 [67.  1.  4. ...  2.  3.  3.]
 [67.  1.  4. ...  2.  2.  7.]
 ...
 [68.  1.  4. ...  2.  2.  7.]
 [57.  1.  4. ...  2.  1.  7.]
 [57.  0.  2. ...  2.  1.  3.]]
Loss: 
 0.42392537888364434
+---------EPOCH 2---------+
Input: 
 [[63.  1.  1. ...  3.  0.  6.]
 [67.  1.  4. ...  2.  3.  3.]
 [67.  1.  4. ...  2.  2.  7.]
 ...
 [68.  1.  4. ...  2.  2.  7.]
 [57.  1.  4. ...  2.  1.  7.]
 [57.  0.  2. ...  2.  1.  3.]]
Loss: 
 0.5387205387205387
+---------EPOCH 3---------+
Input: 
 [[63.  1.  1. ...  3.  0.  6.]
 [67.  1.  4. ...  2.  3.  3.]
 [67.  1.  4. ...  2.  2.  7.]
 ...
 [68.  1.  4. ...  2.  2.  7.]
 [57.  1.  4. ...  2.  1.  7.]
 [57.  0.  2. ...  2.  1.  3.]]
Loss: 
 0.5387205387205387
+---------EPOCH 4---------+
Input: 
 [[63.  1.  1. ...  3.  0.  6.]
 [67.  1.  4. ...  2.  3.  3.]
 [67.  1.  4. ...  2.  2.  7.]
 ...
 [68.  1.  4. ...  2.  2.  7.]
 [57.  1.  4. ...  2.  1.  7.]
 [57.  0.  2. ...  2.  1.  3.]]
Lo

In [193]:
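# The loss rises after the first update and then stays flat. A likely cause:
# X is unscaled (see the Input printout), so the sigmoid units saturate and
# their derivative is close to zero; the update also has no learning rate.
# The flat value 0.5387 equals 160/297, which is consistent with the network
# predicting 1 for every row.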
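
One hypothesis for the flat loss, offered as a sketch rather than a diagnosis, is sigmoid saturation: with features on their raw scale (ages around 60, cholesterol in the hundreds), the weighted sums are very large and the sigmoid derivative used in `backward` is effectively zero. The comparison below uses assumed sample rows and fixed weights to show the effect of standardizing the inputs.

```python
import numpy as np

def sigmoid(s):
    return 1 / (1 + np.exp(-s))

def sigmoid_prime(a):
    # derivative expressed through the activation a = sigmoid(s)
    return a * (1 - a)

# Assumed raw-scale rows (age, resting blood pressure, cholesterol)
raw = np.array([[63.0, 145.0, 233.0],
                [67.0, 160.0, 286.0],
                [41.0, 130.0, 204.0]])
weights = np.array([[0.5], [0.3], [0.2]])  # fixed for reproducibility

# Standardize each column to zero mean and unit variance
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

grad_raw = sigmoid_prime(sigmoid(raw @ weights))
grad_scaled = sigmoid_prime(sigmoid(scaled @ weights))

print("raw:", grad_raw.ravel())        # effectively zero: the units are saturated
print("scaled:", grad_scaled.ravel())  # clearly non-zero, so updates can flow
```

When the derivative is zero, every delta term is zero and the weights stop changing, which would match a loss that no longer moves.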

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [154]:
# create a baseline model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import train_test_split

# Random Seed
seed = 2001
np.random.seed(seed)

# Important Hyperparameters
inputs = X.shape[1]
epochs = 150
batch_size = 20

# Create our model
model = Sequential()

# input and hidden
model.add(Dense(16, input_dim = inputs, activation='relu'))
# model.add(Dropout(0.3))
# model.add(Dense(64, activation='sigmoid'))
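# NOTE: 'relu' on the output layer is not bounded to (0, 1); binary_crossentropy
# expects probabilities, for which 'sigmoid' is the usual choice
model.add(Dense(1, activation='relu'))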

#compile
model.compile(loss='binary_crossentropy',
               optimizer = 'adam',
               metrics=['accuracy'])


# Manual Validation Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, 
                                                    random_state=seed)

In [155]:
history = model.fit(X_train, y_train, batch_size=batch_size, epochs = epochs, validation_split=.1, verbose =0)
scores = model.evaluate(X_test, y_test)
print(f'{model.metrics_names[1]}: {scores[1]*100}')

acc: 50.0
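
One thing worth checking when accuracy sits near 50% is the output activation. Binary cross-entropy expects predictions strictly between 0 and 1, which a sigmoid output provides and a ReLU output does not (ReLU can emit exactly 0 or values above 1). The NumPy sketch below illustrates the difference. The helper `binary_crossentropy` and the sample values are assumptions, and this is the plain textbook formula rather than Keras's implementation, which clips predictions before taking the log.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    # plain definition, without the clipping that libraries usually add
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

pre_activations = np.array([-2.0, 0.5, 3.0])  # assumed outputs before activation
y_true = np.array([0.0, 1.0, 1.0])

relu_out = np.maximum(0.0, pre_activations)        # [0.0, 0.5, 3.0]: not probabilities
sigmoid_out = 1 / (1 + np.exp(-pre_activations))   # all strictly inside (0, 1)

# Suppress NumPy warnings from log(0) and log of a negative number
with np.errstate(divide="ignore", invalid="ignore"):
    loss_relu = binary_crossentropy(y_true, relu_out)
loss_sigmoid = binary_crossentropy(y_true, sigmoid_out)

print("relu   :", loss_relu)     # not finite everywhere
print("sigmoid:", loss_sigmoid)  # finite for every sample
```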


In [204]:
# now to tune hyperparameters
# I'm going to optimize batch size 

from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

# Random Seed
seed = 2001
np.random.seed(seed)

# Important Hyperparameters
inputs = X.shape[1]
# epochs = 50
# batch_size = 32

# Create our model function for the kerasclassifier wrapper
def create_model():
    model = Sequential()
    # input and hidden
    model.add(Dense(32, input_dim = inputs, activation='relu'))
#     model.add(Dropout(0.3))
#     model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='relu'))

    #compile
    model.compile(loss='binary_crossentropy',
                   optimizer = 'adam',
                   metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# create hyperparameters to optimize
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# create a grid search
grid = GridSearchCV(estimator=model, cv=3, param_grid=param_grid, n_jobs=-1, verbose=0)
grid_results = grid.fit(X, y)


In [205]:
# Report Results
print(f"Best: {grid_results.best_score_} using {grid_results.best_params_}")
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.5420875350634257 using {'batch_size': 60, 'epochs': 20}
Means: 0.49494948983192444, Stdev: 0.042854957712725565 with: {'batch_size': 10, 'epochs': 20}
Means: 0.5050505002339681, Stdev: 0.04285494366377606 with: {'batch_size': 20, 'epochs': 20}
Means: 0.5252525210380554, Stdev: 0.03298974560640662 with: {'batch_size': 40, 'epochs': 20}
Means: 0.5420875350634257, Stdev: 0.04832558313939803 with: {'batch_size': 60, 'epochs': 20}
Means: 0.5420875350634257, Stdev: 0.09523322062967969 with: {'batch_size': 80, 'epochs': 20}
Means: 0.5353535215059916, Stdev: 0.021820670271058235 with: {'batch_size': 100, 'epochs': 20}


In [210]:
# batch size looks good at 60
# now I will try different optimizer functions
optimizers = ['Adam', 'Adagrad', 'Adadelta', 'Adamax']
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']

model = Sequential()
# input and hidden
model.add(Dense(32, input_dim = inputs, activation='relu'))
#     model.add(Dropout(0.3))
#     model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='relu'))

#compile
model.compile(loss='binary_crossentropy',
               optimizer = 'adagrad',
               metrics=['accuracy'])

batch_size= 60
epochs= 20

history = model.fit(X_train, y_train, batch_size=batch_size, epochs = epochs, 
                    validation_split=.1, verbose =0)
scores = model.evaluate(X_test, y_test)
print(f'{model.metrics_names[1]}: {scores[1]*100}')



acc: 46.666666865348816


In [212]:
model = Sequential()
# input and hidden
model.add(Dense(32, input_dim = inputs, activation='relu'))
#     model.add(Dropout(0.3))
#     model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='relu'))

#compile
model.compile(loss='binary_crossentropy',
               optimizer = 'adamax',
               metrics=['accuracy'])

batch_size= 60
epochs= 20

history = model.fit(X_train, y_train, batch_size=batch_size, epochs = epochs, 
                    validation_split=.1, verbose =0)
scores = model.evaluate(X_test, y_test)
print(f'{model.metrics_names[1]}: {scores[1]*100}')

acc: 53.33333611488342


In [221]:
# adamax seems to have a slight improvement
# so I will tune epochs with the adamax optimizer

# Random Seed
seed = 2001
np.random.seed(seed)

# Important Hyperparameters
inputs = X.shape[1]
# epochs = 50
# batch_size = 32

# Create our model function for the kerasclassifier wrapper
def create_model():
    model = Sequential()
    # input and hidden
    model.add(Dense(32, input_dim = inputs, activation='relu'))
#     model.add(Dropout(0.3))
#     model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='relu'))

    #compile
    model.compile(loss='binary_crossentropy',
                   optimizer = 'adamax',
                   metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# create hyperparameters to optimize
param_grid = {'batch_size': [60],
              'epochs': [100, 150, 250],
              'shuffle': [True, False],
             'validation_data': [(X_test,y_test)],
             'class_weight': [{0:1, 1:10}, {0:0.5, 1:100}]}

# create a grid search
grid = GridSearchCV(estimator=model, cv=3, param_grid=param_grid, n_jobs=-1, verbose=0)
grid_results = grid.fit(X_train, y_train)
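
The cost of a grid search grows with the product of the option counts. A small standard-library sketch that enumerates a grid like the one above shows how many cross-validation fits `cv=3` implies. The values mirror the cell above, leaving out `validation_data`, which has a single option and does not change the count; the variable names are illustrative assumptions.

```python
from itertools import product

param_grid = {
    "batch_size": [60],
    "epochs": [100, 150, 250],
    "shuffle": [True, False],
    "class_weight": [{0: 1, 1: 10}, {0: 0.5, 1: 100}],
}

keys = list(param_grid)
# every combination of one value per hyperparameter
combinations = [dict(zip(keys, values)) for values in product(*param_grid.values())]

cv_folds = 3
total_fits = len(combinations) * cv_folds  # cross-validation fits only
print(len(combinations), total_fits)
```

Adding another option to any list multiplies the number of fits, which is worth keeping in mind with epoch counts in the hundreds.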


In [222]:
# Report Results
print(f"Best: {grid_results.best_score_} using {grid_results.best_params_}")
means = grid_results.cv_results_['mean_test_score']
stds = grid_results.cv_results_['std_test_score']
params = grid_results.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

Best: 0.5280898809432983 using {'batch_size': 60, 'class_weight': {0: 0.5, 1: 100}, 'epochs': 100, 'shuffle': True, 'validation_data': (array([[6.50e+01, 1.00e+00, 4.00e+00, 1.20e+02, 1.77e+02, 0.00e+00,
        0.00e+00, 1.40e+02, 0.00e+00, 4.00e-01, 1.00e+00, 0.00e+00,
        7.00e+00],
       [5.80e+01, 1.00e+00, 4.00e+00, 1.46e+02, 2.18e+02, 0.00e+00,
        0.00e+00, 1.05e+02, 0.00e+00, 2.00e+00, 2.00e+00, 1.00e+00,
        7.00e+00],
       [4.70e+01, 1.00e+00, 4.00e+00, 1.10e+02, 2.75e+02, 0.00e+00,
        2.00e+00, 1.18e+02, 1.00e+00, 1.00e+00, 2.00e+00, 1.00e+00,
        3.00e+00],
       [5.70e+01, 1.00e+00, 4.00e+00, 1.10e+02, 2.01e+02, 0.00e+00,
        0.00e+00, 1.26e+02, 1.00e+00, 1.50e+00, 2.00e+00, 0.00e+00,
        6.00e+00],
       [3.40e+01, 1.00e+00, 1.00e+00, 1.18e+02, 1.82e+02, 0.00e+00,
        2.00e+00, 1.74e+02, 0.00e+00, 0.00e+00, 1.00e+00, 0.00e+00,
        3.00e+00],
       [6.60e+01, 0.00e+00, 1.00e+00, 1.50e+02, 2.26e+02, 0.00e+00,
        0.00e+00, 1.1

In [229]:
model = grid_results.best_estimator_

In [233]:
# our best estimator is thus
model.get_params()

{'verbose': 0,
 'batch_size': 60,
 'class_weight': {0: 0.5, 1: 100},
 'epochs': 100,
 'shuffle': True,
 'validation_data': (array([[6.50e+01, 1.00e+00, 4.00e+00, 1.20e+02, 1.77e+02, 0.00e+00,
          0.00e+00, 1.40e+02, 0.00e+00, 4.00e-01, 1.00e+00, 0.00e+00,
          7.00e+00],
         [5.80e+01, 1.00e+00, 4.00e+00, 1.46e+02, 2.18e+02, 0.00e+00,
          0.00e+00, 1.05e+02, 0.00e+00, 2.00e+00, 2.00e+00, 1.00e+00,
          7.00e+00],
         [4.70e+01, 1.00e+00, 4.00e+00, 1.10e+02, 2.75e+02, 0.00e+00,
          2.00e+00, 1.18e+02, 1.00e+00, 1.00e+00, 2.00e+00, 1.00e+00,
          3.00e+00],
         [5.70e+01, 1.00e+00, 4.00e+00, 1.10e+02, 2.01e+02, 0.00e+00,
          0.00e+00, 1.26e+02, 1.00e+00, 1.50e+00, 2.00e+00, 0.00e+00,
          6.00e+00],
         [3.40e+01, 1.00e+00, 1.00e+00, 1.18e+02, 1.82e+02, 0.00e+00,
          2.00e+00, 1.74e+02, 0.00e+00, 0.00e+00, 1.00e+00, 0.00e+00,
          3.00e+00],
         [6.60e+01, 0.00e+00, 1.00e+00, 1.50e+02, 2.26e+02, 0.00e+00,
   