# Neural Networks Sprint Challenge

## 1) Define the following terms:

### Neuron  
     A neuron in a neural network is a unit that takes a set of inputs, multiplies each one by its associated synaptic weight, sums the results together with a bias, and applies an activation function to that weighted sum.
### Input Layer
    A layer is the highest-level grouping of units in a neural network. The input layer is the group of units that receives data from outside the network, either the training data or another linked model's output. The input layer passes its data to the hidden layer, if there is one. Bias nodes are not part of that passed data; they are constants attached to a layer. Data is often normalized before it reaches the input layer, but this is not to be confused with batch normalization, which occurs in the hidden layer(s).
    
### Hidden Layer
    Hidden layers receive data from the input layer or from preceding hidden layers in multi-layer networks. Each hidden layer applies a transformation to the data it receives, most commonly an aggregation of the weighted inputs followed by an activation function, though this process can certainly deviate in more complex/specialized networks. The result is passed on to the next layer and eventually delivered to the output layer.
    
### Output Layer
    The output layer is the final grouping of neurons. It represents the result of the transformations applied by the hidden layers to the data we fed into the network through the input layer. The range of output we expect is determined by the activation function applied. For a sigmoid, outputs are squashed into the range 0 to 1 and can be read as probabilities. For tanh, outputs fall between -1 and 1. Other choices such as Leaky ReLU scale values <= 0 by a small alpha and return all other values unchanged.
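As a rough illustration of the output ranges described above, the following sketch evaluates sigmoid, tanh, and Leaky ReLU on a few sample values. The helper names and the alpha of 0.01 are assumptions chosen for the example, not values taken from this notebook.

```python
import numpy as np

def sigmoid(z):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def leaky_relu(z, alpha=0.01):
    """Pass positive values through; scale the rest by alpha (assumed 0.01)."""
    return np.where(z > 0, z, alpha * z)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

sigmoid_out = sigmoid(z)
tanh_out = np.tanh(z)
leaky_out = leaky_relu(z)

print("sigmoid:   ", np.round(sigmoid_out, 3))
print("tanh:      ", np.round(tanh_out, 3))
print("leaky relu:", leaky_out)
```

Only the Leaky ReLU output is unbounded above, which is one reason it is usually used in hidden layers rather than as the final activation of a classifier.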
### Activation
    We're sort of beating a dead horse at this point, as I've already explained what an activation function is and does, but we can talk about why they're necessary. For example, why not just take the dot product of the inputs and their weights, add the bias, and use that as the output? Well, that's sort of the beauty of neural networks. If we did only that, we'd fail to capture any non-linear relationships in the data. The whole point of activation functions is to let the network model those complex relationships. No linear combination of inputs can produce a non-linear output, so matrix multiplication alone is out of the question, though it is a great tool nevertheless. It's also really important to understand your data and the type of relationship you need to model. For example, classification calls for specific activation functions on the output, and binary classification calls for a different one than multi-class classification.
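One way to see why a purely linear network is not enough is the small sketch below. It uses made-up shapes and random weights to show that two stacked linear layers are equivalent to a single linear layer, while a ReLU between them gives a mapping that is, in general, no longer a single linear layer. All names and sizes here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers: 3 inputs -> 4 hidden units -> 2 outputs (assumed sizes)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Stacking the layers without an activation in between
two_layer = W2 @ (W1 @ x + b1) + b2

# The same mapping written as one linear layer
W_single = W2 @ W1
b_single = W2 @ b1 + b2
one_layer = W_single @ x + b_single

stacked_equals_single = np.allclose(two_layer, one_layer)
print("collapses to one linear layer:", stacked_equals_single)

# With a ReLU between the layers, the output is no longer a single
# linear function of x in general
with_relu = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
print("with ReLU in between:", with_relu)
```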
    
A more detailed look at the components of the aggregation that precedes the activation function:
![aggregation](https://cdn-images-1.medium.com/max/900/1*4MVN69gdM72BtTtY75tntg.png)
       
       b = bias
       x = input to neuron
       w = weights
       n = number of inputs from incoming layer
       i = counter from 0 to n

       1. We multiply every incoming input by its corresponding weight.
       2. We sum all of those products.
       3. We add the bias of the neuron in question.

       For a whole layer of M neurons, we put the weights into matrices for processing.
           1. Create an n-by-M weight matrix.
           2. Make an M-by-1 matrix from the biases.
           3. View the inputs as an n-by-1 matrix.
           4. Transpose the weight matrix to M-by-n.
           5. Find the dot product of the transposed weights and the inputs. 
              Meaning we multiply M-by-n and n-by-1, which gives us M-by-1.
           6. Add the result to the bias matrix.
           7. Now we can run an activation function on each value in the vector.
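The matrix steps above can be sketched in NumPy as follows. The sizes (n = 3 inputs, M = 2 neurons), the random values, and the choice of sigmoid as the activation are assumptions for illustration only.

```python
import numpy as np

n, M = 3, 2                       # n inputs feeding a layer of M neurons (assumed sizes)
rng = np.random.default_rng(42)

W = rng.normal(size=(n, M))       # step 1: n-by-M weight matrix
b = rng.normal(size=(M, 1))       # step 2: M-by-1 bias matrix
x = rng.normal(size=(n, 1))       # step 3: inputs as an n-by-1 matrix

z = W.T @ x + b                   # steps 4-6: (M-by-n)(n-by-1) + (M-by-1) -> M-by-1
a = 1.0 / (1.0 + np.exp(-z))      # step 7: element-wise activation (sigmoid here)

# The same aggregation for the first neuron, written as the explicit sum
z0_by_hand = sum(W[i, 0] * x[i, 0] for i in range(n)) + b[0, 0]

print("z shape:", z.shape)
print("matrix result matches the explicit sum:", np.isclose(z[0, 0], z0_by_hand))
```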
### Backpropagation
    Forward propagation is the sequence of matrix transformations and activations that carries the inputs through the hidden layers to the output, so we can approach backpropagation logically from there.
    In simple terms, backpropagation is an iterative process that seeks to optimize the result of forward propagation.
    More technically, backpropagation computes the gradient of the cost with respect to every weight and bias, and gradient descent uses that gradient to move them toward better values.
    We calculate that gradient using the generalized delta rule.
    
\begin{eqnarray} 
  \delta^L_j = \frac{\partial C}{\partial a^L_j} \sigma'(z^L_j).
\tag{BP1}\end{eqnarray}
    
    It's a bit more complex than I can explain well at this time, but essentially
    1. ∂C/∂a^L_j measures how quickly the cost C changes as a function of the output activation of neuron j in the output layer L.
   
       If C does not depend much on a particular output neuron j, then we can expect the value below to be small
\begin{eqnarray}
    \delta^L_j  
 \end{eqnarray}
    
   
       For the second factor on the right-hand side of the equation, 
  \begin{eqnarray}\sigma'(z^L_j)\end{eqnarray}
  
       We are measuring how fast the activation function is changing at z^L_j, the weighted input to that neuron.
       I need to study more on exactly how everything is computed, as pieces of it still go over my head,
       but that's a high-level overview of the equation.
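To make BP1 concrete, here is a minimal numerical check. It assumes a quadratic cost C = 1/2 * sum_j (a_j - y_j)^2 and a sigmoid activation, neither of which is fixed by the text above. Under those assumptions ∂C/∂a_j = a_j - y_j and σ'(z_j) = a_j (1 - a_j), and their product should match a finite-difference estimate of ∂C/∂z_j. The sample values of `z_L` and `y` are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(z, y):
    """Quadratic cost (assumed) as a function of the output layer's weighted inputs."""
    a = sigmoid(z)
    return 0.5 * np.sum((a - y) ** 2)

z_L = np.array([0.5, -1.2])   # weighted inputs of two output neurons (arbitrary)
y = np.array([1.0, 0.0])      # target values (arbitrary)

# BP1: delta_j = dC/da_j * sigma'(z_j)
a_L = sigmoid(z_L)
delta = (a_L - y) * a_L * (1.0 - a_L)

# Central finite-difference estimate of dC/dz_j for comparison
eps = 1e-6
numeric = np.array([
    (cost(z_L + eps * e, y) - cost(z_L - eps * e, y)) / (2 * eps)
    for e in np.eye(len(z_L))
])

print("BP1 delta:        ", delta)
print("finite difference:", numeric)
```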

In [86]:
import numpy as np
import pandas as pd

from csv import reader
from keras.layers import Dense
from sklearn.metrics import accuracy_score, classification_report
from keras.models import Sequential
from keras.optimizers import Adam
from keras.utils.np_utils import to_categorical
from math import exp
from random import seed
from random import random
from sklearn import model_selection


## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |

In [4]:


class AndGatePerceptron(object):
    """A class that models a logical AND gate with a single layer neural network"""
    
    def __init__(self, input_length, epochs=100, learning_rate=0.01):
        """
        params: input_length: number of inputs, which determines how many weights are needed for the aggregation
                epochs: iteration count for model
                learning_rate: the rate at which the weights are modified per iteration
        
        """
        self.epochs = epochs
        self.learning_rate = learning_rate
        # notice here we add the bias weight to our input weights.
        self.weights = np.zeros(input_length + 1)
           
    def predict(self, inputs):
        """Aggregate the weighted inputs plus the bias (stored in self.weights[0]) and apply a step activation."""
        aggregate = np.dot(inputs, self.weights[1:]) + self.weights[0]
        if aggregate > 0:
            activation = 1
        else:
            activation = 0            
        return activation

    def train(self, input_data, label_data):
        """ Function to train on our inputs, and overwrite our weights and 'bias' for each iteration. This is some ugly code :D """
        for iteration in range(self.epochs):
            for inputs, label in zip(input_data, label_data):
                prediction = self.predict(inputs)
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)
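A possible way to exercise the update rule above on the AND table, written as a condensed standalone sketch so that it runs on its own. The function names `predict` and `train_and_gate` are hypothetical; the step activation, zero-initialised weights, learning rate of 0.01, and 100 epochs mirror the class above.

```python
import numpy as np

# Training table from the prompt: columns x1, x2, x3 and label y
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]])
y = np.array([1, 0, 0, 0])

def predict(weights, inputs):
    """Step activation on the weighted sum; weights[0] is the bias."""
    return 1 if np.dot(inputs, weights[1:]) + weights[0] > 0 else 0

def train_and_gate(X, y, epochs=100, learning_rate=0.01):
    """Perceptron learning rule: nudge weights by lr * (label - prediction) * input."""
    weights = np.zeros(X.shape[1] + 1)
    for _ in range(epochs):
        for inputs, label in zip(X, y):
            error = label - predict(weights, inputs)
            weights[1:] += learning_rate * error * inputs
            weights[0] += learning_rate * error
    return weights

weights = train_and_gate(X, y)
predictions = [predict(weights, row) for row in X]

print("learned weights:", weights)
print("predictions:    ", predictions)
```

Because AND is linearly separable, the perceptron learning rule is expected to settle on weights that classify all four rows correctly well within the 100 epochs.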

## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)


In [25]:
def create_mlp(input_layers, hidden_layers, output_layers):
    neuralnet = list()
    
    hidden_layer = [{'weights':[random() for i in range(input_layers + 1)]} for i in range(hidden_layers)]
    neuralnet.append(hidden_layer)
    
    output_layer = [{'weights':[random() for i in range(hidden_layers + 1)]} for i in range(output_layers)]
    neuralnet.append(output_layer)
    
    return neuralnet


def simple_activation(weights, inputs):
    """Function that calculates neuron activation for an input by aggregating the weights and inputs """
    activation = weights[-1]
    
    
    for i in range(len(weights) - 1):
        # float(), not int(): int() would truncate weights in [0, 1) to 0
        activation += float(weights[i]) * float(inputs[i])
    return activation

def neural_transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    inputs = row
    for layer in network:
        
        current_inputs = []
        
        for neuron in layer:
            
            activation = simple_activation(neuron['weights'], inputs)
            neuron['output'] = neural_transfer(activation)
            current_inputs.append(neuron['output'])
            
        inputs = current_inputs
        
    return inputs

def calc_derivative(output):
    return output * (1.0 - output)
 
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * calc_derivative(neuron['output'])
            
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[int(row[-1])] = 1
            sum_error += sum([(expected[i]-outputs[i])**2 for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))


In [13]:
seed(42)

In [45]:
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset
        
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

In [54]:
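filepath = 'heart.csv'
rows = load_csv(filepath)[1:]  # drop the header row

# csv.reader yields strings: convert features to float and the class label to int
dataset = [[float(value) for value in row[:-1]] + [int(row[-1])] for row in rows]

input_layers = len(dataset[0]) - 1
output_layers = len(set(row[-1] for row in dataset))
network = create_mlp(input_layers, 2, output_layers)
train_network(network, dataset, 0.5, 20, output_layers)
for layer in network:
    print(layer)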
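One thing worth checking with the from-scratch network is input scale. As a hedged sketch: with raw feature values and weights of 0.5, the sigmoid saturates at 1.0 and its derivative is 0, so the hidden-layer deltas vanish. Min-max scaling the features into [0, 1] first avoids that. The sample rows are the first five rows of the dataset as printed by `df.head()` in this notebook; the helper `min_max_scale`, the constant weight of 0.5, and scaling over only five rows are assumptions for illustration.

```python
from math import exp

# First five feature rows of heart.csv (target column dropped)
rows = [
    [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1],
    [37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2],
    [41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 2, 0, 2],
    [56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 2, 0, 2],
    [57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 2, 0, 2],
]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def sigmoid_derivative(output):
    """Derivative of the sigmoid, expressed in terms of its output."""
    return output * (1.0 - output)

def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in rows]

def neuron_output(inputs, weight=0.5):
    """Single sigmoid neuron with every weight set to the same assumed value."""
    return sigmoid(sum(weight * v for v in inputs))

raw_grad = sigmoid_derivative(neuron_output(rows[0]))
scaled_grad = sigmoid_derivative(neuron_output(min_max_scale(rows)[0]))

print("derivative with raw inputs:   ", raw_grad)
print("derivative with scaled inputs:", scaled_grad)
```

In practice the scaling statistics would be computed over the full training set rather than five rows.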

## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
- When hyperparameter tuning, show you work by adding code cells for each new experiment. 
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [59]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')

In [60]:
df.head()

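|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63  | 1   | 3  | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0  | 1    | 1      |
| 1 | 37  | 1   | 2  | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0  | 2    | 1      |
| 2 | 41  | 0   | 1  | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0  | 2    | 1      |
| 3 | 56  | 1   | 1  | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0  | 2    | 1      |
| 4 | 57  | 0   | 0  | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0  | 2    | 1      |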


In [63]:
X = np.array(df.drop(columns=['target']))
y = np.array(df['target'])

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size = 0.2)

In [73]:
Y_train = to_categorical(y_train, num_classes=None)
Y_test = to_categorical(y_test, num_classes=None)

Y_train_binary = y_train.copy()
Y_test_binary = y_test.copy()

Y_train_binary[Y_train_binary > 0] = 1
Y_test_binary[Y_test_binary > 0] = 1

print(Y_train_binary[:20])

[0 0 1 0 0 1 1 0 0 0 0 1 0 0 0 0 1 1 0 1]


In [82]:
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(6, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    
    # compile model
    adam = Adam(lr=0.001)
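    # 'mse' produced the results reported below; 'binary_crossentropy' is the
    # conventional loss for a sigmoid output on a binary classification task.
    model.compile(loss='mse', optimizer=adam, metrics=['accuracy'])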
    return model

model = create_model()

print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_16 (Dense)             (None, 6)                 84        
_________________________________________________________________
dense_17 (Dense)             (None, 4)                 28        
_________________________________________________________________
dense_18 (Dense)             (None, 1)                 5         
Total params: 117
Trainable params: 117
Non-trainable params: 0
_________________________________________________________________
None


In [83]:
model.fit(X_train, Y_train_binary, epochs=100, batch_size=10, verbose = 1)

Epoch 1/100
...
Epoch 100/100


<keras.callbacks.History at 0x1fef8371898>

In [87]:
# generate classification report using predictions for binary model 
predictions = np.round(model.predict(X_test)).astype(int)

print('Accuracy of model:', accuracy_score(Y_test_binary, predictions))
print(classification_report(Y_test_binary, predictions))

Accuracy of model: 0.819672131147541
              precision    recall  f1-score   support

           0       0.81      0.78      0.79        27
           1       0.83      0.85      0.84        34

   micro avg       0.82      0.82      0.82        61
   macro avg       0.82      0.82      0.82        61
weighted avg       0.82      0.82      0.82        61
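As a worked check of the report above: the per-class precision, recall, and support figures are consistent with a confusion matrix of 21 true negatives, 6 false positives, 5 false negatives, and 29 true positives. Those four counts are inferred from the rounded report rather than taken from the notebook output, so treat them as an assumption. The sketch below recomputes the headline metrics from them.

```python
# Counts inferred from the classification report (an assumption, not notebook output)
tn, fp, fn, tp = 21, 6, 5, 29

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Class 1 (disease present) treated as the positive class
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Class 0 metrics use the mirrored counts
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(f"accuracy: {accuracy:.4f}")
print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f} f1={f1_0:.2f}")
print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f}")
```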

