<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer: 
The layer of a neural network that takes the "raw" input data. The first layer of the network. 

### Hidden Layer:
The layers that are not directly reachable (not input or output). Take input from either the input layer or another hidden layer and output to anther hidden layer or the output layer. 

### Output Layer:
The layer of a neural network that produces the prediction ("consumable" output). It takes its input from the last hidden layer in a network. 

### Neuron:
A single building block of a NN. Has an associated bias and a weight for every node in the previous layer (doesn't pertain to input layer neurons). It can be thought of as a function where the input is the previous layers node's activation value, the weight associated to each of those nodes, and a bias. The output is `a(1) = activation_function(np.dot(W, a(0)) + bias)`. The activation function is commonly relu or sigmoid. 

### Weight:
A single weight is a value associated with 2 specific nodes. Can be thought of as the connection between two nodes. Along with bias, weight is the "memory" of the network that you are trying to perfect in the training process. 

### Activation Function:
The function that puts a "threshold" on the computation using a neuron's inputs, weights and bias. Commonly used activation functions are relu, sigmoid, and step. 

### Node Map:
A type of diagram that shows the structure of a NN. 

### Perceptron:
The simplest NN in existance. Consists of a single output neuron and N input neruons (no hidden layers/neurons)


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

The flow of information through a NN starts with the input layers. The size of the input layer is equal to the number of features each instance has (for example a NN that predicts house sale prices using 11 features about the house would have an input layer with the 11 neurons/nodes). From the input layer the values of each input node is passed to each node of the following layer. The following layers "activation" (resulting value) will be `a(current_layer) = activation_function(dot_product(weights, a(prev_layer)) + bias)` where `weights` is the values that represent the "strength" of the connection between that node and the node the weight corresponds to. The ouput layer acts the same way as hidden layers but commonly the activation function is different. Common output layer activation functions are:
- regresssion tasks - no actication function (to allow a large range of output values) 
- binary classification - sigmoid function (either 1 or 0)
- multiclass classification - softmax score (for each class their is outputed a probability that it is that class) 

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [0]:
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'bias': [1, 1, 1, 1]
       }
      
correct_outputs = ([1],[1],[1],[0])
df = pd.DataFrame.from_dict(data).astype('int')

In [25]:
##### Your Code Here #####
import numpy as np
weights = np.random.random((3, 1))
weights

array([[0.25347418],
       [0.81595359],
       [0.26247803]])

In [0]:
def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
  sx = sigmoid(x)
  return sx * (1-sx)

In [31]:
weighted_sum = np.dot(df.values, weights)
weighted_sum

array([[0.26247803],
       [0.51595221],
       [1.07843162],
       [1.3319058 ]])

In [32]:
preds = sigmoid(weighted_sum)
preds

array([[0.56524535],
       [0.62620077],
       [0.74619707],
       [0.7911557 ]])

In [33]:
error = correct_outputs - preds
error

array([[ 0.43475465],
       [ 0.37379923],
       [ 0.25380293],
       [-0.7911557 ]])

In [35]:
adjustments = error * sigmoid_derivative(weighted_sum)
adjustments

array([[ 0.10683793],
       [ 0.08749644],
       [ 0.04806698],
       [-0.13072136]])

In [36]:
weights += np.dot(df.values.T, adjustments)
weights

array([[0.21024927],
       [0.73329921],
       [0.37415802]])

## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [37]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [0]:
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.model_selection import train_test_split

feats = list(diabetes)[:-1]
train, test = train_test_split(diabetes)
norm = Normalizer()
X = norm.fit_transform(train[feats])
X_test = norm.transform(test[feats])

In [0]:
##### Update this Class #####

class Perceptron(object):
    
    def __init__(self, niter = 10):
      self.niter = niter
    
    def __sigmoid(self, x):
      return 1 / (1 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):
      sx = self.__sigmoid(x)
      return sx * (1-sx)

    def fit(self, X, y):
      """Fit training data
      X : Training vectors, X.shape : [#samples, #features]
      y : Target values, y.shape : [#samples]
      """
      # Randomly Initialize Weights
      self.weights = np.random.random((X.shape[1], 1))

      for i in range(self.niter):
          # Weighted sum of inputs / weights
          weighted_sum = np.dot(X, self.weights)
          # Activate!
          activated_output = self.__sigmoid(weighted_sum)
          # Cac error
          error = y - activated_output
          # Update the Weights
          adjustments = error * self.__sigmoid_derivative(weighted_sum)
          self.weights += np.dot(X.T, adjustments)
      
      return self

    def predict(self, X):
      """Return class label after unit step"""
      values = np.dot(X, self.weights[1:]) + self.weights[0]
      return np.where(values >= .5, 1, 0)

In [79]:
from sklearn.metrics import accuracy_score

pn = Perceptron(niter=50)
bias = np.ones((X.shape[0], 1))
data = np.append(bias, X, 1)
target = train['Outcome'].values
pn.fit(data, target.reshape((len(target), 1)))
bias = np.ones((X_test.shape[0], 1))
data = np.append(bias, X_test, 1)
test_targets = test['Outcome'].values
test_predictions = pn.predict(X_test)
accuracy_score(test_predictions, test_targets)

0.609375

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?