
# Neural Networks Sprint Challenge

## 1) Define the following terms:

- Neuron
- Input Layer
- Hidden Layer
- Output Layer
- Activation
- Backpropagation

### Neuron:

In Artificial Neural Networks, the neurons (or "nodes") loosely resemble neurons in brains: each one receives inputs, combines them into a weighted sum, and passes its signal on to the next layer of nodes if a certain threshold is reached.
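
The description above can be sketched as a weighted sum followed by a threshold test. This is a minimal illustration only: the function name `neuron`, the weights, the bias, and the threshold are hand-picked assumptions, not values from any trained model.

```python
import numpy as np

def neuron(inputs, weights, bias, threshold=0.0):
    """Return 1 (fire) when the weighted sum of the inputs reaches the threshold."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 if weighted_sum >= threshold else 0

# Illustrative values only: two inputs with hand-picked weights and bias.
weights = np.array([0.6, 0.6])
bias = -1.0

fires = neuron(np.array([1, 1]), weights, bias)      # 0.6 + 0.6 - 1.0 is positive
stays_off = neuron(np.array([1, 0]), weights, bias)  # 0.6 - 1.0 is negative
print(fires, stays_off)
```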

### Input Layer:

The Input Layer is what receives input from our dataset. Sometimes it is called the visible layer because it's the only part that is exposed to our data and that our data interacts with directly. Typically node maps are drawn with one input node for each of the different inputs/features/columns of our dataset that will be passed to the network.

### Hidden Layer:

Layers between the input layer and the output layer are called Hidden Layers, because they cannot be accessed except through the input layer. They sit inside the network and perform their functions, but we don't interact with them directly. The simplest possible network has a single neuron in the hidden layer that just outputs the value. "Deep Learning", apart from being a big buzzword, simply means using a Neural Network that has multiple hidden layers. It is a big part of the renewed hype around ANNs because it allows networks that are structured in specific ways to accomplish tasks that were previously out of reach (image recognition, for example).

### Output Layer:

The final layer is called the Output Layer. The purpose of the output layer is to output a vector of values that is in a format that is suitable for the type of problem that we're trying to address.
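
To make "a format suitable for the type of problem" concrete, here is a small sketch contrasting two common output formats: a single sigmoid value for binary classification and a softmax vector for multi-class classification. The raw scores are made-up numbers chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    shifted = z - np.max(z)  # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

# Made-up raw scores coming out of the last layer.
binary_score = np.array([0.0])
class_scores = np.array([2.0, 1.0, 0.1])

binary_output = sigmoid(binary_score)      # one value between 0 and 1
multiclass_output = softmax(class_scores)  # one probability per class, summing to 1
print(binary_output, multiclass_output)
```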

### Activation Function:

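An activation function is applied to a neuron's weighted sum of inputs to produce that neuron's output. It transforms the raw value into a format that makes sense for our context, for example by squashing it into the range 0 to 1 so that it can be read as a probability.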
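
As a rough illustration, three widely used activation functions can be written in a few lines of NumPy. The sample inputs in `z` are arbitrary; the function names are just labels for this sketch.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 when z is non-negative, otherwise 0."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Passes positive values through and clips negative values to 0."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])  # arbitrary sample inputs
step_out = step(z)
sigmoid_out = sigmoid(z)
relu_out = relu(z)
print(step_out, sigmoid_out, relu_out)
```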

### Backpropagation:

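Backpropagation is short for "Backwards Propagation of errors" and refers to a specific (rather calculus-intensive) algorithm for updating the weights in a neural network. After a forward pass, the error at the output is propagated backwards through the layers using the chain rule, so that each weight is adjusted in proportion to its contribution to the error. Updates proceed in reverse order, from the output layer back towards the input layer, and are repeated on every training pass.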
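
The chain-rule step can be checked on the smallest possible case: one sigmoid neuron with a squared-error loss. The sketch below uses made-up inputs and weights, computes the gradient analytically, and compares it against a finite-difference estimate. It illustrates the idea only and is not a full multi-layer implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, y):
    """Squared error of a single sigmoid neuron with weights w."""
    return 0.5 * (y - sigmoid(np.dot(w, x))) ** 2

# Made-up example: one sample with three features and target 1.
x = np.array([1.0, 0.5, -1.5])
y = 1.0
w = np.array([0.1, -0.2, 0.3])

# Chain rule: dL/dw = -(y - a) * a * (1 - a) * x, where a is the activation.
a = sigmoid(np.dot(w, x))
analytic_grad = -(y - a) * a * (1.0 - a) * x

# Central finite differences as an independent check of the derivative.
eps = 1e-6
numeric_grad = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    numeric_grad[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)

# One gradient-descent update with an arbitrary learning rate.
lr = 0.5
w_new = w - lr * analytic_grad
print(loss(w, x, y), loss(w_new, x, y))
```

The loss after the update should be lower than before, which is the whole point of following the negative gradient.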

## 2) Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|----|----|----|---|
| 1  | 1  | 1  | 1 |
| 1  | 0  | 1  | 0 |
| 0  | 1  | 1  | 0 |
| 0  | 0  | 1  | 0 |

In [0]:
import numpy as np
np.random.seed(1)

inputs = np.array([[1,1,1],
                   [1,0,1],
                   [0,1,1],
                   [0,0,1]])
# Expected output for AND
correct_outputs = np.array([[1],
                  [0],
                  [0],
                  [0]])

# # Initial weights
# weights = 2 * np.random.random((3,1)) - 1
# # weights = np.random.random((3,1))

In [0]:
# From https://pythonmachinelearning.pro/perceptrons-the-first-neural-networks/
class Perceptron(object):
    """Implements a perceptron network"""
    def __init__(self, input_size, lr=1, epochs=100):
        self.W = np.zeros(input_size+1)
        # add one for bias
        self.epochs = epochs
        self.lr = lr
    
    def activation_fn(self, x):
        #return (x >= 0).astype(np.float32)
        return 1 if x >= 0 else 0
 
    def predict(self, x):
        z = self.W.T.dot(x)
        a = self.activation_fn(z)
        return a
 
    def fit(self, X, d):
        for _ in range(self.epochs):
            for i in range(d.shape[0]):
                x = np.insert(X[i], 0, 1)
                y = self.predict(x)
                e = d[i] - y
                self.W = self.W + self.lr * e * x

In [3]:
perceptron = Perceptron(input_size=3)
perceptron.fit(inputs, correct_outputs)
print(perceptron.W)

[-2.  1.  3. -2.]


In [4]:
'''
bias is -2
weights are 1, 3, and -2
if inputs are 1, 1, 1:
prediction is -2 + 1*1 + 1*3 + -2*1 = -2 + 1 + 3 - 2 = 0, which returns 1 with the activation function
'''
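
The same check can be extended to all four rows of the truth table. This sketch hard-codes the weights printed above (bias first) rather than retraining, and applies the same `>= 0` threshold as `activation_fn`.

```python
import numpy as np

# Weights printed above: bias first, then one weight per input column.
W = np.array([-2.0, 1.0, 3.0, -2.0])

inputs = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 1],
                   [0, 0, 1]])
expected = np.array([1, 0, 0, 0])

# Prepend a constant 1 to each row so the bias takes part in the dot product.
with_bias = np.insert(inputs, 0, 1, axis=1)
weighted_sums = with_bias.dot(W)
predictions = (weighted_sums >= 0).astype(int)

print(weighted_sums)  # [ 0. -3. -1. -4.]
print(predictions)    # [1 0 0 0]
```

Only the first row reaches the threshold, so the learned weights reproduce the AND column exactly.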


## 3) Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights. 
- Your network must have one hidden layer. 
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:

[Github Dataset](https://github.com/ryanleeallred/datasets/blob/master/heart.csv)

[Raw File on Github](https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv)


In [5]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/heart.csv')
df.shape

(303, 14)

In [6]:
df.head()

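|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63  | 1   | 3  | 145      | 233  | 1   | 0       | 150     | 0     | 2.3     | 0     | 0  | 1    | 1      |
| 1 | 37  | 1   | 2  | 130      | 250  | 0   | 1       | 187     | 0     | 3.5     | 0     | 0  | 2    | 1      |
| 2 | 41  | 0   | 1  | 130      | 204  | 0   | 0       | 172     | 0     | 1.4     | 2     | 0  | 2    | 1      |
| 3 | 56  | 1   | 1  | 120      | 236  | 0   | 1       | 178     | 0     | 0.8     | 2     | 0  | 2    | 1      |
| 4 | 57  | 0   | 0  | 120      | 354  | 0   | 1       | 163     | 1     | 0.6     | 2     | 0  | 2    | 1      |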


In [7]:
df.target.value_counts()

1    165
0    138
Name: target, dtype: int64
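
These counts give a useful reference point: a model that always predicts the majority class sets the accuracy any trained network should beat. A small sketch, with the counts copied by hand from the output above:

```python
# Class counts copied from the value_counts() output above.
class_counts = {1: 165, 0: 138}

total = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / total

print(f"majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.3f}")
# majority class: 1, baseline accuracy: 0.545
```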

In [0]:
class Neural_Network(object):
    def __init__(self):
        self.inputs = 13
        self.L1Nodes = 10
        self.outputNodes = 1

        # Initialize weights
        self.L1_weights = np.random.randn(self.inputs, self.L1Nodes)
        self.output_weights = np.random.randn(self.L1Nodes, self.outputNodes)

    def feed_forward(self, X):
        # Sum and activate flows to L1
        self.activated_L1 = self.sigmoid(np.dot(X, self.L1_weights)) 
        # Sum and activate flows to output
        self.activated_output = self.sigmoid(np.dot(self.activated_L1, self.output_weights))
        return self.activated_output
        
    def sigmoid(self, s):
        return 1/(1+np.exp(-s))
    
    def sigmoidPrime(self, s):
        # Derivative of the sigmoid, where s is already a sigmoid output (an activation)
        return s * (1 - s)
    
    def backward(self, X, y, output):
        ## Backward propagate through the network, calculating error and delta at each layer
        # Output
        self.output_error = y - output # error in this layer
        self.output_delta = self.output_error*self.sigmoidPrime(output) # apply derivative of sigmoid to error
        
        
        # L1
        self.L1_error = self.output_delta.dot(self.output_weights.T) 
        self.L1_delta = self.L1_error*self.sigmoidPrime(self.activated_L1)
        
        
        ## Update all weights
        self.L1_weights += X.T.dot(self.L1_delta) 
        # The output weights are updated with the output layer's delta, not the hidden layer's
        self.output_weights += self.activated_L1.T.dot(self.output_delta)
        
    def train (self, X, y):
        output = self.feed_forward(X)
        self.backward(X, y, output)

In [0]:
X = df.drop('target', axis=1)
y = df.target

# Scale X and convert it to a NumPy array so the network's matrix math is plain NumPy
X = ((X - X.mean()) / (X.max() - X.min())).values

# The network has a single sigmoid output node, so y must be a column vector of 0/1 labels
# (a 10-wide one-hot encoding does not match the one-node output layer)
y = y.values.reshape(-1, 1)

In [20]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(203, 13)
(100, 13)
(203, 1)
(100, 1)


In [29]:
nn = Neural_Network()

for i in range(4): # runs 4 full-batch training passes
    nn.train(X_train, y_train)
#     loss = np.mean(np.square(y_train - nn.feed_forward(X_train))) # mean squared error
#     print(f'{i}: {loss}')

    print(nn.feed_forward(X_train)[1])
    print()
# print(nn.output_weights)

## 4) Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy. 

- Use the Heart Disease Dataset (binary classification)
- Use an appropriate loss function for a binary classification task
- Use an appropriate activation function on the final layer of your network. 
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model. (for at least two hyperparameters)
- When hyperparameter tuning, show your work by adding code cells for each new experiment. 
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [0]:
##### Your Code Here #####