# Neural Networks Sprint Challenge

## 1) Define the following terms:

- Neuron
- Input Layer
- Hidden Layer
- Output Layer
- Activation
- Backpropagation

Neuron: a container that holds a single scalar (usually a floating-point number) called an activation; neurons combine to form columns that make up either the input data or layers in a neural network. Also called 'node'.   
Input Layer: Ryan Allred mentioned during lecture that sometimes 'layer' is a misnomer with regard to input. In any case, this is the left-most information in a NN and is in the form of an array of rows by columns.  
Hidden Layer: this is a true layer, comprised of neurons, of which at least 1 is required to form a non-perceptron NN. Each hidden layer in a NN receives activations from the input or previous hidden layer, applies a weight to each activation then a bias, and forwards the new activations to each neuron in the next hidden layer or output layer.  
Output Layer: the right-most layer in a NN, this receives activations from the previous layer then returns final scalars, either integer or float, that provide an information array about the question of interest, eg, classification or regression.  
Activation: a single scalar found in a neuron.  
Backpropagation: an algorithmic process by which weights in a NN are revised in a backwards propagated fashion after each training epoch, ie last weight is revised, then one just prior, then one before that, etc.

## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |

In [1]:
import numpy as np

# Define x1, x2, x3

x1 = [1, 1, 0, 1]
x2 = [1, 0, 1, 0]
x3 = [1, 1, 1, 1]
y_list = [1, 0, 0, 0]

X = np.array(list(zip(x1, x2, x3, np.ones(4))))
y = np.array([[val] for val in y_list])
X.shape, y.shape

((4, 4), (4, 1))

In [2]:
# Write perceptron class


class ANDPerceptron():
    def __init__(self, X, y, niter=100):
        self.X = X
        self.y = y
        self.niter = niter
       
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_prime(self, x):
        return self.sigmoid(x) * (1 - self.sigmoid(x))
    
    def fit(self):
        # Create weights
        weights = 2 * np.random.random((self.X.shape[0], self.X.shape[1])) - 1

        for iteration in range(self.niter):
            # Weighted sum of inputs and weights
            weighted_sum = np.dot(self.X, weights)

            # Activate with sigmoid function
            activated_output = self.sigmoid(weighted_sum)

            # Calculate Error
            error = self.y - activated_output

            # Calculate weight adjustments with sigmoid_derivative
            adjustments = error * self.sigmoid_prime(activated_output)

            # Update weights
            weights += np.dot(self.X.T, adjustments)
        
        print('optimized weights after training: ')
        print(weights)
        print('\ny:', y)
        print("\noutputs after training:")
        print(activated_output)

In [3]:
AP = ANDPerceptron(X, y)

In [4]:
AP.fit()

optimized weights after training: 
[[ 2.92611756  2.89688383  3.16407008  2.95721367]
 [ 3.74074392  3.70016481  3.93126604  3.76444709]
 [-3.15047259 -3.19682681 -2.93397234 -1.87306245]
 [-2.40487168 -2.30702194 -2.93880274 -3.7228538 ]]

y: [[1]
 [0]
 [0]
 [0]]

outputs after training:
[[0.75048007 0.74700151 0.77085915 0.75315822]
 [0.06783945 0.06928197 0.06297163 0.06724257]
 [0.14130851 0.14263029 0.12644057 0.13924864]
 [0.06783945 0.06928197 0.06297163 0.06724257]]


## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)


In [5]:
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')

In [6]:
df.isna().sum().sum(), df.head()

(0,
    age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
 0   63    1   3       145   233    1        0      150      0      2.3      0   
 1   37    1   2       130   250    0        1      187      0      3.5      0   
 2   41    0   1       130   204    0        0      172      0      1.4      2   
 3   56    1   1       120   236    0        1      178      0      0.8      2   
 4   57    0   0       120   354    0        1      163      1      0.6      2   
 
    ca  thal  target  
 0   0     1       1  
 1   0     2       1  
 2   0     2       1  
 3   0     2       1  
 4   0     2       1  )

In [19]:
X = df.drop('target', axis=1)
y = df.target
# y = np.where(y == 0, -1, 1)  # recast each 0 to -1
y = np.array(y).reshape(-1, 1)
X.shape, y.shape

((303, 13), (303, 1))

In [15]:
class MultilayerPerceptron():
    def __init__(self):
        '''
        Define node size of input, hidden layer (one hidden layer only here), and output layer;
        plus weights (two). These values are all fixed
        '''
        self.input_size = 13
        self.hidden_layer_size = 4
        self.output_layer_size = 1
        
        # Weights (parameters)
        self.L1_weights = np.random.randn(self.input_size, self.hidden_layer_size)
        self.L2_weights = np.random.randn(self.hidden_layer_size, self.output_layer_size)
    
    def forward(self, X):
        '''Propagate inputs forward through network'''
        # Weighted sum between inputs and hidden layer
        self.hidden_sum = np.dot(X, self.L1_weights)  # WL calls this self.z2; summation is the idea
        # Activations of weighted sum
        self.activated_hidden = self.sigmoid(self.hidden_sum)  # WL calls this self.a2
        # Weighted sum between hidden layer and output layer
        self.output_sum = np.dot(self.activated_hidden, self.L2_weights)  # WL calls this self.z3
        y_hat = self.sigmoid(self.output_sum)  # called y_hat because is an estimate of output data 
        return y_hat
    
    def sigmoid(self, s):
        '''Apply sigmoid activation function to scalar, vector, or matrix'''
        return 1 / (1 + np.exp(-s))
    
    def sigmoid_prime(self, s):
        '''Calculate gradient of sigmoid'''
        return np.exp(-s) / ((1 + np.exp(-s))**2)
            
    def cost_function(self, X, y):
        '''
        Compute cost for given X, y, using weights already stored in class. Cost is a
        measure of how incorrect model is after at least one complete forward propagation
        '''
        self.y_hat = self.forward(X)
        J = 0.5 * sum((y - self.y_hat)**2)  # J is term for cost output unit
        return J
        
    def cost_function_prime(self, X, y):
        '''Compute derivative with respect to L1_weights and L2_weights for a given X and y'''
        self.y_hat = self.forward(X)
        
        delta3 = np.multiply(-(y - self.y_hat), self.sigmoid_prime(self.output_sum))
        dJdL2 = np.dot(self.activated_hidden.T, delta3)
        
        delta2 = np.dot(delta3, self.L2_weights.T) * self.sigmoid_prime(self.hidden_sum)
        dJdL1 = np.dot(X.T, delta2)  
        
        return dJdL1, dJdL2
    
    # Helper Functions for interacting with other classes
    def get_params(self):
        '''Get L1_weights and L2_weights unrolled into vector'''
        params = np.concatenate((self.L1_weights.ravel(), self.L2_weights.ravel()))
        return params
    
    def set_params(self, params):
        '''Set L1 and L2 using single parameter vector'''
        L1_start = 0
        L1_end = self.hidden_layer_size * self.input_size
        self.L1_weights = np.reshape(params[L1_start: L1_end], (self.input_size, self.hidden_layer_size))
        L2_end = L1_end + self.hidden_layer_size * self.output_layer_size
        self.L2_weights = np.reshape(params[L1_end: L2_end], (self.hidden_layer_size, self.output_layer_size))
    
    def compute_gradients(self, X, y):
        '''
        Returns the vector that takes us in the most downward direction along some function
        in hyperspace that has as many dimensions as we have weights--2, in this case
        '''
        dJdL1, dJdL2 = self.cost_function_prime(X, y)
        return np.concatenate((dJdL1.ravel(), dJdL2.ravel()))

In [9]:
from scipy import optimize

# Make trainer class - credit to Welch Labs

class trainer():
    def __init__(self, N):
        # Make Local reference to network
        self.N = N
    
    def callback_func(self, params):
        self.N.set_params(params)
        self.J.append(self.N.cost_function(self.X, self.y))   
        
    def cost_function_wrapper(self, params, X, y):
        self.N.set_params(params)
        cost = self.N.cost_function(X, y)
        grad = self.N.compute_gradients(X,y)
        
        return cost, grad
        
    def train(self, X, y):
        # Make an internal variable for the callback function
        self.X = X
        self.y = y

        # Make empty list to store costs
        self.J = []
        
        params0 = self.N.get_params()

        options = {'maxiter': 200, 'disp' : True}
        _res = optimize.minimize(self.cost_function_wrapper, params0, jac=True, method='BFGS', \
                                 args=(X, y), options=options, callback=self.callback_func)

        self.N.set_params(_res.x)
        self.optimization_results = _res

In [20]:
MLP = MultilayerPerceptron()

In [21]:
T = trainer(MLP)

In [22]:
T.train(X, y)

         Current function value: 38.796519
         Iterations: 0
         Function evaluations: 18
         Gradient evaluations: 6




## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
- When hyperparameter tuning, show you work by adding code cells for each new experiment. 
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [13]:
##### Your Code Here #####