<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer:
One of the three main types of neuron layers in a typical neural network topology. The input layer receives the input and is the only part of the network that our data interacts with directly. Node maps are typically drawn with one input node for each input (feature/column) of the dataset that is passed to the network.

### Hidden Layer:
The hidden layer or layers come after the input layer and cannot be accessed except through the input layer. "Deep learning" means we are using a network with multiple hidden layers.

### Output Layer:
The output layer is the final layer. Its purpose is to output a vector of values in a format suited to the type of problem we are trying to solve. Typically the output value is passed through an activation function to transform it into a format that makes sense for our context (for example, a value between 0 and 1 for binary classification).

### Neuron:
Neural networks are made up of layers of neurons. Neurons in one layer of the network are connected to neurons in the next layer. Neurons are also called nodes or units. They receive input from other nodes or from an external source, and compute an output.

### Weight:
A weight is a parameter of the network that scales the signal passed along a connection between two nodes. Each input to a node has an associated weight, which represents the strength of that connection. If the weight from node 1 to node 2 has greater magnitude, neuron 1 has greater influence over neuron 2. A larger weight magnitude also makes the activation function steeper as a function of the input, so the weight determines how quickly the activation function triggers.

### Activation Function:
The activation function decides whether a node "fires" or not, or more generally how much signal to pass on to the next layer.

### Node Map:
A diagram of the structure of the neural network, showing how the nodes are connected.

### Perceptron:
The most fundamental, simplest form of a neural network: a single node or neuron.
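
The relationship between weight magnitude and the steepness of the activation function can be illustrated with a single sigmoid neuron. This is a minimal sketch; the helper `neuron_output` and the input value `0.5` are made up for illustration and are not part of the assignment.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def neuron_output(x, weight, bias=0.0):
    """Output of a single sigmoid neuron with one input (hypothetical helper)."""
    return sigmoid(weight * x + bias)

x = 0.5  # hypothetical input value

# The same input pushes the neuron much closer to "fully firing" (1.0)
# when the weight has a larger magnitude.
small = neuron_output(x, weight=1.0)
large = neuron_output(x, weight=10.0)
print(small, large)
```

With an input of exactly 0 the weight has no effect and the output depends only on the bias, which is why a bias term is needed to shift where the neuron triggers.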


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?
Neural networks are typically organized in layers. Each layer is made up of a number of interconnected nodes, each containing an activation function. Inputs are presented to the network via the input layer, which passes them to one or more hidden layers where the actual processing is done through a system of weighted connections. Each node multiplies its inputs by the corresponding weights, sums the results, adds a bias, and passes that sum through its activation function; the result becomes an input to the next layer. The hidden layers then link to an output layer, which produces the network's final output.
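
The flow described above could be sketched as a forward pass through a small network. The layer sizes (2 inputs, 3 hidden nodes, 1 output), the random weights, the zero biases, and the `forward` helper are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, layers):
    """Pass input x through a list of (weights, bias) layers."""
    activation = x
    for weights, bias in layers:
        # Weighted sum of the previous layer's outputs, plus the bias
        weighted_sum = activation @ weights + bias
        # The activation function decides how much signal moves on
        activation = sigmoid(weighted_sum)
    return activation

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
layers = [
    (rng.uniform(-1, 1, size=(2, 3)), np.zeros(3)),  # input (2) -> hidden (3)
    (rng.uniform(-1, 1, size=(3, 1)), np.zeros(1)),  # hidden (3) -> output (1)
]

x = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
output = forward(x, layers)
print(output.shape)
```

Each row of `x` is one observation, so the network produces one output value per row.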

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |
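
Before training anything, it can help to confirm that a single perceptron is able to represent this table at all. The sketch below uses hand-picked weights (`-2`, `-2`) and a bias of `3` with a step activation; these values are an illustrative choice, not the ones learned later in the notebook.

```python
import numpy as np

inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
targets = np.array([1, 1, 1, 0])

# Hand-picked parameters (an assumption for illustration)
weights = np.array([-2.0, -2.0])
bias = 3.0

# Step activation: fire (1) when the weighted sum plus bias is positive
predictions = (inputs @ weights + bias > 0).astype(int)
print(predictions)  # [1 1 1 0]
```

Because one straight line separates the three `1` rows from the single `0` row, a single neuron is enough for NAND.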

In [77]:
import numpy as np
inputs = np.array([
    [0, 0],
    [1, 0],
    [0, 1],
    [1, 1],
])

In [78]:
correct_outputs = np.array([[1], [1], [1], [0]])

In [79]:
#sigmoid activation function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1-sx)
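
One way to gain confidence in `sigmoid_derivative` is to compare it with a numerical estimate of the slope. This is a small sanity-check sketch; the step size `h` and the test points are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

x = np.linspace(-5, 5, 11)
h = 1e-5  # small step for the central difference

# Central-difference approximation of the slope of sigmoid at each x
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

max_gap = np.max(np.abs(numeric - sigmoid_derivative(x)))
print(max_gap)
```

The gap should be tiny, and the derivative peaks at 0.25 when the input is 0.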

In [80]:
#initialize random weights

weights = 2 * np.random.random((2,1)) - 1
weights

array([[-0.19948078],
       [-0.22036126]])

In [81]:
#calculate weighted sum of inputs and weights 

weighted_sum = np.dot(inputs, weights)

In [82]:
#output activated value for the end of 1 training epoch 

activated_outputs = sigmoid(weighted_sum)

In [83]:
#take difference of output and true values to calculate error relative to those outputs

error = correct_outputs - activated_outputs
error

array([[ 0.5       ],
       [ 0.54970548],
       [ 0.55486847],
       [-0.39655455]])

In [84]:
#gradient descent: how much I need to update the weights in order to get closer to the correct outputs

adjustments = error * sigmoid_derivative(weighted_sum)
weight_adjustments = np.dot(inputs.T, adjustments)

In [85]:
# add adjustments to our weights

weights += weight_adjustments
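
The update in the cells above can be read as one step of gradient descent on a squared-error loss with a learning rate of 1. The loss function and the learning rate are not stated in the notebook, so treat this as an assumed interpretation. Here $X$ is `inputs`, $w$ is `weights`, $y$ is `correct_outputs`, $\sigma$ is the sigmoid, and $\odot$ is element-wise multiplication:

$$
z = Xw, \qquad \hat{y} = \sigma(z), \qquad E = \tfrac{1}{2}\sum_i \left(y_i - \hat{y}_i\right)^2
$$

$$
\frac{\partial E}{\partial w} = -X^\top\left[(y - \hat{y}) \odot \sigma'(z)\right]
\quad\Longrightarrow\quad
w \leftarrow w + X^\top\left[(y - \hat{y}) \odot \sigma'(z)\right]
$$

The bracketed term corresponds to `adjustments`, and multiplying by $X^\top$ corresponds to `np.dot(inputs.T, adjustments)`.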

In [86]:
#put it all together (train model); a constant bias of 7 is added to the weighted sum and is not updated during training

for iteration in range(10000):
    weighted_sum = np.dot(inputs, weights) + 7
    activated_outputs = sigmoid(weighted_sum)
    error = correct_outputs - activated_outputs
    adjustments = error * sigmoid_derivative(weighted_sum)
    weight_adjustments = np.dot(inputs.T, adjustments)
    weights += weight_adjustments

print(weights)
print(activated_outputs)

[[-4.66666667]
 [-4.66666667]]
[[0.99908895]
 [0.91160032]
 [0.91160032]
 [0.08839968]]
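
The loop above uses a fixed bias of 7. A possible variation, sketched here as an assumption rather than as part of the original solution, is to let the network learn the bias by appending a constant column of ones to the inputs so that the last weight plays the role of the bias.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

inputs = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
correct_outputs = np.array([[1], [1], [1], [0]])

# Append a constant 1 to every row so the last weight acts as the bias
inputs_with_bias = np.hstack([inputs, np.ones((4, 1))])

rng = np.random.default_rng(0)  # seeded for reproducibility
weights = rng.uniform(-1, 1, size=(3, 1))

for _ in range(10000):
    activated = sigmoid(inputs_with_bias @ weights)
    error = correct_outputs - activated
    # sigmoid'(z) written in terms of the activated output: s * (1 - s)
    weights += inputs_with_bias.T @ (error * activated * (1 - activated))

learned_bias = weights[-1, 0]
predictions = np.round(sigmoid(inputs_with_bias @ weights))
print(learned_bias, predictions.ravel())
```

Treating the bias as one more weight removes the need to pick a constant such as 7 by hand.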


In [87]:
from sklearn.metrics import accuracy_score

prediction = np.round(activated_outputs)
accuracy_score(correct_outputs, prediction)

1.0

## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [95]:
import pandas as pd

In [96]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

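|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |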


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 
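
The two scalers named above rescale along different axes: with default settings, `MinMaxScaler` maps each feature (column) to the range [0, 1], while `Normalizer` rescales each sample (row) to unit L2 norm. The NumPy-only sketch below reproduces those two operations by hand on a made-up matrix (not the diabetes data) to show the difference.

```python
import numpy as np

# Hypothetical 3-sample, 2-feature matrix (not the diabetes data)
data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 600.0]])

# Column-wise min-max scaling: each feature ends up in [0, 1]
col_min = data.min(axis=0)
col_max = data.max(axis=0)
minmax_scaled = (data - col_min) / (col_max - col_min)

# Row-wise L2 normalization: each sample ends up with length 1
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
row_normalized = data / row_norms

print(minmax_scaled)
print(row_normalized)
```

In this toy matrix every row has the same ratio between its two features, so row-wise normalization makes all three rows identical, while min-max scaling keeps them distinct. Which scaler suits the diabetes data is worth checking experimentally.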

In [102]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1] #list of columns except for the outcome 

X = Normalizer().fit_transform(diabetes[feats])

In [137]:
##### Update this Class #####

class Perceptron:
    
    def __init__(self, niter=10, bias=7):
        self.niter = niter
        self.bias = bias

    def __sigmoid(self, x):
        return 1/(1 + np.exp(-x))

    def __sigmoid_derivative(self, x):
        sx = self.__sigmoid(x)
        return sx * (1-sx)

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples, 1]
        """
        # Randomly initialize one weight per feature, in the range [-1, 1)
        weights = 2 * np.random.random((X.shape[1], 1)) - 1

        for i in range(self.niter):
            # Weighted sum of inputs / weights
            weighted_sum = np.dot(X, weights) + self.bias

            # Activate!
            activated_outputs = self.__sigmoid(weighted_sum)

            # Calculate error
            error = y - activated_outputs

            # Update the Weights
            adjustments = error * self.__sigmoid_derivative(weighted_sum)
            weight_adjustments = np.dot(X.T, adjustments)

            weights += weight_adjustments
        
        self.weights = weights

    def predict(self, X):
        """Return class label (0 or 1) by rounding the sigmoid output"""
        # Weighted sum of inputs / weights
        weighted_sum = np.dot(X, self.weights) + self.bias

        # Activate!
        activated_outputs = self.__sigmoid(weighted_sum)

        return np.round(activated_outputs)

In [159]:
# instantiate your Perceptron class
perceptron = Perceptron(niter=1000, bias=5)

# fit it to data
# (DataFrame.as_matrix was removed in pandas 1.0; .values gives the same (n, 1) array)
y = diabetes[['Outcome']].values
perceptron.fit(X, y)
print(perceptron.weights)

# predict the values
prediction = perceptron.predict(X)

# compute accuracy
accuracy_score(y, prediction)

[[ -0.11479897]
 [-26.51532768]
 [-26.52518754]
 [ -4.58377149]
 [ -8.14539698]
 [ -8.77048306]
 [ -0.78759289]
 [ -8.67572707]]


0.6510416666666666
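
An accuracy score is easier to interpret next to a majority-class baseline, meaning the score obtained by always predicting the most common label. The sketch below assumes the commonly reported class split for this dataset (500 negative, 268 positive); those counts are not shown in the output above and could be checked with `diabetes['Outcome'].value_counts()`.

```python
# Assumed class counts for the diabetes dataset (not shown in the notebook output)
n_negative = 500
n_positive = 268

# Accuracy of a model that always predicts the majority class
baseline_accuracy = max(n_negative, n_positive) / (n_negative + n_positive)
print(baseline_accuracy)
```

If the baseline matches the model's accuracy, the model may be predicting a single class for every row, which would suggest that the fixed bias, the scaling, or the update rule needs another look.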

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
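
As a starting point for the last question, the gradients of three common activation functions can be compared directly. This is a small sketch; the choice of sigmoid, tanh, and ReLU and the sample inputs are assumptions, since the assignment does not name specific functions.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

x = np.array([-10.0, 0.0, 10.0])

# Gradient of each activation at a very negative, zero, and very positive input
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))
tanh_grad = 1 - np.tanh(x) ** 2
relu_grad = (x > 0).astype(float)

print(sigmoid_grad)
print(tanh_grad)
print(relu_grad)
```

Sigmoid and tanh both saturate, so their gradients shrink toward zero for inputs of large magnitude. ReLU keeps a gradient of 1 for positive inputs but passes no gradient for negative ones.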