<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer:
The input layer holds the data that's in our dataset, with one node per feature. It's akin to the X features that are used in an sklearn pipeline. To avoid working with extremely large numbers inside the network, it's important to scale the data before training; this helps the network learn faster and cuts down on compute time.
### Hidden Layer:
Hidden layers are the layers of nodes that sit between the input layer and the output layer. They are "hidden" because we only ever observe the network's inputs and outputs, not the values inside these layers. The value of each hidden node is the weighted sum of the previous layer's outputs, plus a bias, passed through an activation function. You don't always have to have the same number of nodes in each hidden layer; the number of nodes can grow or shrink from one layer to the next. Hidden layers are what allow NNs to achieve their accuracy: over many epochs of training, the NN tunes the weights and biases to reduce the error relative to the target.
### Output Layer:
The output layer holds the predictions of a NN, akin to the y target in an sklearn pipeline. Its nodes produce the value(s) that you are trying to predict with the network. For a regression problem, you will typically have a single node in the output layer; for a multi-class classification problem, you will typically have one node per class (binary classification often uses a single node).
### Neuron:
A neuron is the biological cell on which the perceptron and other NN nodes are modeled. In a NN, a neuron (node) takes its inputs, combines them using weights and a bias, and passes the result on to the next layer through an activation function.
### Weight:
The weight represents the strength of the connection between two nodes. Before training, the weights and biases are set to random values. After each epoch, the error between the expected training set output and the actual output of the network determines how the weights and biases are tuned for the next epoch. This is done during backpropagation.
### Activation Function:
The activation function transforms the weighted sum of a node's inputs (plus the bias) into the node's output. Many activation functions squash that sum into a fixed range, for example between 0 and 1 for the sigmoid or between -1 and 1 for tanh. This keeps values inside the network from growing without bound and, more importantly, produces non-linear results; without a non-linear activation, stacked layers would collapse into a single linear transformation. You can set different activation functions for different layers and even different nodes in a layer, but the use cases of doing so are limited in number.
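
As a rough illustration of how output ranges differ between activation functions, here is a minimal sketch comparing the sigmoid, tanh, and ReLU on a few example weighted sums. The input values and the `relu` helper are assumptions made for this example only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    # Hypothetical helper: clips negative values to 0
    return np.maximum(0, x)

z = np.array([-5.0, 0.0, 5.0])   # example weighted sums (made up)

print(sigmoid(z))   # values strictly between 0 and 1
print(np.tanh(z))   # values strictly between -1 and 1
print(relu(z))      # negatives clipped to 0, positives unchanged
```

Note that ReLU is unbounded above, so not every activation function squashes its input into a fixed range.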
### Node Map:
Node maps are handy visual representations of the way the nodes in each layer connect with one another. I suppose you could also include the initialization weights and biases in a node map as well.
### Perceptron:
Perceptrons are the first and simplest form of NN: a single node that takes inputs, multiplies each by a weight, sums the results, passes the sum through an activation function, and then spits out an output.


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

Each feature in the dataset is an input node in the input layer (x). The values of these input nodes are multiplied by their weights and summed (xw), and the bias is then added to that sum (xw + b). To avoid linearity, this result is passed into an activation function such as the sigmoid, which squashes it down to a value between 0 and 1. Assuming there are no hidden layers, that activated value is the final output. With hidden layers, it becomes an input to the next layer, and the same steps repeat until the output layer is reached.
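
The flow described above can be sketched for a single node in a few lines. This is only an illustration: the input values, weights, and bias below are made-up numbers, and the sigmoid is assumed as the activation function.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([1.0, 0.0])     # two input features (hypothetical values)
w = np.array([0.5, -0.5])    # one weight per input (hypothetical values)
b = 0.1                      # bias (hypothetical value)

z = np.dot(x, w) + b         # weighted sum of the inputs, plus the bias
output = sigmoid(z)          # activation squashes z into the range (0, 1)

print(z, output)
```

The bias is added to the weighted sum before the activation is applied, which lets the node shift its decision boundary away from the origin.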

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [0]:
import pandas as pd
import numpy as np

data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')

correct_output = np.array([[1], [1], [1], [0]])

In [0]:
# Sigmoid and its derivative

def sigmoid(x):
  return 1/(1 + np.exp(-x))

def sig_deriv(x):
  sx = sigmoid(x)
  return sx * (1 - sx)

In [4]:
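# Initialize three weights uniformly in [-1, 1)

weights = 2 * np.random.random((3, 1)) - 1

# Get the weighted sum of the inputs
# Note: df still contains the y column, so the third weight multiplies
# the target itself rather than a constant bias input

weighted_sum = np.dot(df, weights)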
weighted_sum

array([[0.91922237],
       [1.70807832],
       [1.46083936],
       [1.33047293]])

In [5]:
# Apply the sigmoid to the output to get the activated output

act_output = sigmoid(weighted_sum)

act_output

array([[0.71488363],
       [0.84658687],
       [0.81166102],
       [0.79091885]])

In [6]:
# Find the error

error = np.subtract(correct_output, act_output)

error

array([[ 0.28511637],
       [ 0.15341313],
       [ 0.18833898],
       [-0.79091885]])

In [7]:
# Make the backprop adjustments

adjustments = error*sig_deriv(weighted_sum)

weights += np.dot(df.T, adjustments)

weights

array([[0.6779896 ],
       [0.43961661],
       [1.02605204]])

In [8]:
# Train the network by continuously updating the weights

for i in range(10000):
  weighted_sum = np.dot(df, weights)
  act_output = sigmoid(weighted_sum)
  error = np.subtract(correct_output, act_output)
  adjustments = error*sig_deriv(weighted_sum)
  weights += np.dot(df.T, adjustments)

print("Activated Output After Training:")
print(act_output)

Activated Output After Training:
[[0.9994418 ]
 [0.99382226]
 [0.99381485]
 [0.00799854]]
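
The cells above pass the whole `df`, including the `y` column, into the dot product, so the network sees the answer as one of its inputs. As a hedged alternative, the sketch below keeps the same update rule but replaces the `y` column with a constant 1 that acts as a bias input. The names `inputs`, `targets`, and `rng`, and the seed value, are assumptions for this example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)          # seed chosen only for repeatability

# Inputs x1, x2 plus a constant 1 that plays the role of the bias input
inputs = np.array([[0, 0, 1],
                   [1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 1]], dtype=float)
targets = np.array([[1], [1], [1], [0]], dtype=float)

weights = 2 * rng.random((3, 1)) - 1    # uniform in [-1, 1)

for _ in range(10000):
    weighted_sum = inputs @ weights
    act_output = sigmoid(weighted_sum)
    error = targets - act_output
    # sigmoid'(z) = s * (1 - s), reusing the activation already computed
    adjustments = error * act_output * (1 - act_output)
    weights += inputs.T @ adjustments

# Threshold the activations at 0.5 to get hard class labels
predictions = (sigmoid(inputs @ weights) > 0.5).astype(int)
print(predictions.ravel())
```

Because NAND is linearly separable, a single node with a bias is enough to classify all four rows without ever seeing the target as an input.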


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [9]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

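|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |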


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [10]:
# Random test case
X_test = np.array([[6, 183, 66, 35, 94, 26.6, 0.672, 21]])
X_test

array([[  6.   , 183.   ,  66.   ,  35.   ,  94.   ,  26.6  ,   0.672,
         21.   ]])

In [0]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

X = diabetes[feats]

scaler = MinMaxScaler()
scaler.fit(X)

X = scaler.transform(X)
X_test = scaler.transform(X_test)

y = np.array([diabetes['Outcome']]).T

In [12]:
X.shape, y.shape, X_test.shape

((768, 8), (768, 1), (1, 8))

In [13]:
X_test

array([[0.35294118, 0.91959799, 0.54098361, 0.35353535, 0.11111111,
        0.39642325, 0.25362938, 0.        ]])
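
To make the scaling step less of a black box, here is a minimal sketch of the arithmetic that min-max scaling with the default 0 to 1 range performs: subtract each column's minimum and divide by the column's range. The toy arrays and the `min_max_scale` helper are made up for this example. The key point is that new rows are scaled with the statistics learned from the training data, which is why `X_test` above goes through `scaler.transform` rather than a fresh fit.

```python
import numpy as np

# Hypothetical toy training data: two features on very different scales
train = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 500.0]])
new_row = np.array([[2.5, 100.0]])

# Statistics are learned from the training data only
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def min_max_scale(values):
    # (x - column min) / (column max - column min), per feature
    return (values - col_min) / (col_max - col_min)

print(min_max_scale(train))
print(min_max_scale(new_row))
```

A new value equal to a column's training minimum maps to 0, which is consistent with the scaled `X_test` above having 0 in its last position.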

In [0]:
##### Update this Class #####

class Perceptron:

    def __init__(self, niter=10):
        # Number of full passes over the training data
        self.niter = niter

    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def __sigmoid_derivative(self, x):
        sx = self.__sigmoid(x)
        return sx * (1 - sx)

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples, 1]
        """

        # Randomly initialize one weight per feature in [-1, 1)
        self.weights = 2 * np.random.random((X.shape[1], 1)) - 1

        for i in range(self.niter):
            # Weighted sum of inputs / weights
            weighted_sum = np.dot(X, self.weights)
            # Activate!
            act_output = self.__sigmoid(weighted_sum)
            # Calculate error
            error = np.subtract(y, act_output)
            # Update the weights
            adjustments = error * self.__sigmoid_derivative(weighted_sum)
            self.weights += np.dot(X.T, adjustments)

    def predict(self, X):
        # Returns sigmoid activations in (0, 1), not hard class labels
        weighted_sum = np.dot(X, self.weights)
        return self.__sigmoid(weighted_sum)

In [0]:
nn = Perceptron()

In [0]:
nn.fit(X, y)

In [17]:
nn.predict(X_test)

array([[4.48220116e-26]])
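
Since `predict` returns a sigmoid activation rather than a class label, one possible next step is to threshold the activations and measure accuracy. The sketch below assumes a 0.5 threshold; the helpers `to_labels` and `accuracy` and the sample arrays are hypothetical and not part of the assignment.

```python
import numpy as np

def to_labels(probabilities, threshold=0.5):
    # Hypothetical helper: 1 where the activation exceeds the threshold, else 0
    return (probabilities > threshold).astype(int)

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels
    return float(np.mean(y_true == y_pred))

# Made-up activations and labels, only to show the mechanics
probs = np.array([[0.9], [0.2], [0.6], [0.4]])
labels = np.array([[1], [0], [0], [0]])

preds = to_labels(probs)
print(preds.ravel(), accuracy(labels, preds))
```

With the class above, the same helpers could be applied as `accuracy(y, to_labels(nn.predict(X)))`. An activation as extreme as the one shown above suggests the weights may have saturated the sigmoid, so checking accuracy on the full dataset is a useful sanity check.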

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
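
As a starting point for the multi-layer perceptron and backpropagation goals, here is a minimal sketch of a network with one hidden layer trained on XOR, a classic non-linearly separable problem. The architecture (2 inputs, 4 hidden nodes, 1 output), the learning rate, the iteration count, and the seed are all assumptions chosen for illustration. Whether the network fully learns XOR depends on the random initialization, so treat it as a sketch rather than a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(42)          # arbitrary seed for repeatability

# XOR: not linearly separable, so a single perceptron cannot fit it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Assumed architecture: 2 inputs -> 4 hidden nodes -> 1 output
W1 = rng.uniform(-1, 1, (2, 4))
b1 = np.zeros((1, 4))
W2 = rng.uniform(-1, 1, (4, 1))
b2 = np.zeros((1, 1))
lr = 0.5                                 # assumed learning rate

losses = []
for _ in range(5000):
    # Forward pass: input -> hidden -> output
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    losses.append(float(np.mean((y - output) ** 2)))

    # Backward pass: chain rule from the output back to the hidden layer
    # (constant factors of the squared-error gradient are folded into lr)
    delta_out = (output - y) * output * (1 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)

    # Gradient descent step on every weight and bias
    W2 -= lr * hidden.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ delta_hidden
    b1 -= lr * delta_hidden.sum(axis=0, keepdims=True)

print(losses[0], losses[-1])
```

Comparing the first and last entries of `losses` shows whether training reduced the error; swapping the hidden activation for tanh or ReLU is one way to explore the last stretch goal.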