<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever else you need to ensure that you understand the concepts below.

### Input Layer:
- Receives input from the dataset
- Only layer that interacts directly with the dataset
- Typically one node per input feature

### Hidden Layer:
- Layers between the input and output layers
- Not exposed to the dataset directly; receive their inputs from the preceding layer
- "Deep Learning" = multiple hidden layers

### Output Layer:
- Final layer
- Outputs vector of values formatted to be suitable for the problem
- Output modified by "activation function"

### Neuron:
- Node in Artificial Neural Network
- Receives input
- Passes output if certain threshold is reached

### Weight:
- Associated with each input
- Factor by which each input is multiplied
- Learned during training; reflects the input's importance relative to other inputs

### Activation Function:
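- Function applied to a node's weighted sum (plus bias) to produce its output
- Shapes the output range to suit the problem (e.g. sigmoid for values between 0 and 1)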

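To make the idea concrete, here is a minimal sketch of a few common activation functions written with NumPy. The function names and sample inputs are illustrative choices, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Pass positive values through unchanged and clip negatives to 0."""
    return np.maximum(0.0, z)

def step(z):
    """Classic perceptron activation: 1 at or above the threshold 0, else 0."""
    return np.where(z >= 0.0, 1, 0)

# Illustrative inputs (assumed values)
z = np.array([-2.0, 0.0, 2.0])

sig = sigmoid(z)      # values strictly between 0 and 1
tanh_out = np.tanh(z) # values strictly between -1 and 1
rel = relu(z)
stp = step(z)

print(sig, tanh_out, rel, stp)
```
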
### Node Map:
- Visual diagram of neural network
- Like a flow chart of inputs/outputs

### Perceptron:
- First and simplest NN
- Single node
- Takes *n* inputs, returns a single output


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

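Each input *x*<sub>n</sub> is multiplied by its weight *w*<sub>n</sub> => the products are summed => the bias is added => the activation function transforms the result into the node's output, which is either passed on to the next layer or returned as the final output.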

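The flow described above can be sketched for a single node as follows. The inputs, weights, and bias are made-up numbers chosen so the arithmetic is easy to check by hand, and sigmoid is just one possible activation function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed example values, not taken from the assignment
x = np.array([2.0, 3.0])    # inputs
w = np.array([0.5, -1.0])   # one weight per input
b = 2.0                     # bias

# Weighted sum of inputs plus bias: 2*0.5 + 3*(-1.0) + 2.0
z = np.dot(x, w) + b

# The activation function turns the sum into the node's output
a = sigmoid(z)

print(z, a)
```
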
## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

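Before training anything, it can help to confirm that a single perceptron is able to represent NAND at all. The sketch below uses hand-picked (not learned) weights and a bias, which are assumptions for illustration, together with a step activation.

```python
import numpy as np

# NAND truth table from above
X = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])
y = np.array([1, 1, 1, 0])

# Hand-picked parameters (assumed values): the weighted sum only drops
# below zero when both inputs are 1
w = np.array([-2.0, -2.0])
b = 3.0

# Step activation: output 1 when the weighted sum is non-negative
pred = (X @ w + b >= 0).astype(int)

print(pred)
```
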
In [52]:
import pandas as pd
import numpy as np

data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')

In [53]:
df.insert(loc=0, column='x0', value=[1,1,1,1])

In [54]:
df

|   | x0 | x1 | x2 | y |
|---|----|----|----|---|
| 0 | 1  | 0  | 0  | 1 |
| 1 | 1  | 1  | 0  | 1 |
| 2 | 1  | 0  | 1  | 1 |
| 3 | 1  | 1  | 1  | 0 |


In [55]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1-sx)

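As a quick sanity check on `sigmoid_derivative`, the analytic derivative can be compared against a finite-difference estimate. This is only a sketch; the test points and the step size `h` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

# Arbitrary test points and step size (assumed values)
x = np.array([-2.0, 0.0, 3.0])
h = 1e-5

# Central difference approximation of the slope at each point
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_derivative(x)

# The largest gap between the two should be tiny if the formula is right
max_gap = np.max(np.abs(numeric - analytic))
print(max_gap)
```
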
In [56]:
weights = 2 * np.random.random((3,1)) - 1

In [57]:
weights

array([[ 0.66683695],
       [-0.69389209],
       [-0.39651434]])

In [58]:
weighted_sum = np.dot(df[['x0', 'x1', 'x2']], weights)
weighted_sum

array([[ 0.66683695],
       [-0.02705514],
       [ 0.27032261],
       [-0.42356949]])

In [59]:
activated_output = sigmoid(weighted_sum)
activated_output

array([[0.66079454],
       [0.49323663],
       [0.5671721 ],
       [0.39566292]])

In [60]:
error = df[['y']] - activated_output
error

|   | y         |
|---|-----------|
| 0 | 0.339205  |
| 1 | 0.506763  |
| 2 | 0.432828  |
| 3 | -0.395663 |


In [61]:
adjustments = error * sigmoid_derivative(weighted_sum)
adjustments

|   | y         |
|---|-----------|
| 0 | 0.076031  |
| 1 | 0.126668  |
| 2 | 0.106254  |
| 3 | -0.094608 |


In [62]:
weights += np.dot(df[['x0', 'x1', 'x2']].T, adjustments)
weights

array([[ 0.88118142],
       [-0.66183289],
       [-0.38486878]])

In [63]:
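for iteration in range(10000):
    # Forward pass: weighted sum of the inputs (x0 acts as the bias input), then sigmoid
    weighted_sum = np.dot(df[['x0', 'x1', 'x2']], weights)
    activated_output = sigmoid(weighted_sum)

    # Difference between the targets and the current outputs
    error = df[['y']] - activated_output

    # Scale the error by the sigmoid slope, then update the weights
    adjustments = error * sigmoid_derivative(weighted_sum)
    weights += np.dot(df[['x0', 'x1', 'x2']].T, adjustments)

print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)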

Weights after training
[[12.116175  ]
 [-8.01980677]
 [-8.01980677]]
Output after training
[[0.99999453]
 [0.98363831]
 [0.98363831]
 [0.0193906 ]]

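The assignment asks for a classification accuracy, so one way to check the result is to threshold the sigmoid outputs and compare them with the targets. The sketch below hard-codes the outputs printed above; the 0.5 cutoff is an assumption, a common default for sigmoid outputs.

```python
import numpy as np

# NAND targets and the trained outputs copied from the run above
y_true = np.array([1, 1, 1, 0])
outputs = np.array([0.99999453, 0.98363831, 0.98363831, 0.0193906])

# Assumed decision rule: predict 1 when the output is at least 0.5
y_pred = (outputs >= 0.5).astype(int)

# Fraction of rows where the prediction matches the target
accuracy = (y_pred == y_true).mean()
print(f"accuracy = {accuracy:.1%}")  # prints "accuracy = 100.0%"
```
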

## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [73]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

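|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |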


In [70]:
diabetes.describe()

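|       | Pregnancies | Glucose    | BloodPressure | SkinThickness | Insulin    | BMI       | DiabetesPedigreeFunction | Age       | Outcome  |
|-------|-------------|------------|---------------|---------------|------------|-----------|--------------------------|-----------|----------|
| count | 768.0       | 768.0      | 768.0         | 768.0         | 768.0      | 768.0     | 768.0                    | 768.0     | 768.0    |
| mean  | 3.845052    | 120.894531 | 69.105469     | 20.536458     | 79.799479  | 31.992578 | 0.471876                 | 33.240885 | 0.348958 |
| std   | 3.369578    | 31.972618  | 19.355807     | 15.952218     | 115.244002 | 7.88416   | 0.331329                 | 11.760232 | 0.476951 |
| min   | 0.0         | 0.0        | 0.0           | 0.0           | 0.0        | 0.0       | 0.078                    | 21.0      | 0.0      |
| 25%   | 1.0         | 99.0       | 62.0          | 0.0           | 0.0        | 27.3      | 0.24375                  | 24.0      | 0.0      |
| 50%   | 3.0         | 117.0      | 72.0          | 23.0          | 30.5       | 32.0      | 0.3725                   | 29.0      | 0.0      |
| 75%   | 6.0         | 140.25     | 80.0          | 32.0          | 127.25     | 36.6      | 0.62625                  | 41.0      | 1.0      |
| max   | 17.0        | 199.0      | 122.0         | 99.0          | 846.0      | 67.1      | 2.42                     | 81.0      | 1.0      |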


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [162]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

X = diabetes[feats]
y = diabetes['Outcome']

In [169]:
len(y)

768

In [163]:
X_norm = Normalizer().fit_transform(X)

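One thing worth keeping in mind when choosing between the two: `Normalizer` rescales each row (sample) to unit length, while `MinMaxScaler` rescales each column (feature) to the range 0 to 1. A small sketch on made-up data shows the difference; the array `X_demo` is an assumption for illustration only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# Made-up data: two features on very different scales
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 200.0]])

# Column-wise: each feature is mapped onto [0, 1]
X_minmax = MinMaxScaler().fit_transform(X_demo)

# Row-wise: each sample is divided by its own L2 norm (the default)
X_unit = Normalizer().fit_transform(X_demo)

# Per-column minimum and maximum after min-max scaling
print(X_minmax.min(axis=0), X_minmax.max(axis=0))

# Length of each row after normalizing
row_norms = np.linalg.norm(X_unit, axis=1)
print(row_norms)
```
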
In [228]:
class Perceptron_1(object):

    def __init__(self, rate = 0.01, niter = 10):
        self.rate = rate      # learning rate
        self.niter = niter    # number of passes over the training data

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values in {-1, 1} (to match predict), y.shape : [#samples]
        """

        # Weights; weight[0] is the bias term
        self.weight = np.zeros(1 + X.shape[1])

        # Number of misclassifications in each pass
        self.errors = []

        for i in range(self.niter):
            err = 0
            for xi, target in zip(X, y):
                # Perceptron rule: delta_w is non-zero only when the prediction is wrong
                delta_w = self.rate * (target - self.predict(xi))
                self.weight[1:] += delta_w * xi
                self.weight[0] += delta_w
                err += int(delta_w != 0.0)
            self.errors.append(err)
        return self

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.weight[1:]) + self.weight[0]

    def predict(self, X):
        """Return class label (1 or -1) after the default unit step function"""
        return np.where(self.net_input(X) >= 0.0, 1, -1)

In [229]:
# ##### Update this Class #####

# class Perceptron:
    
#     def __init__(self, n_iter = 10):
#         self.n_iter = n_iter
    
#     def __sigmoid(self, X):
#         return 1 / (1 + np.exp(-X))
    
#     def __sigmoid_derivative(self, X):
#         return sigmoid(X) * (1-sigmoid(X))

#     def fit(self, X, y):
#         """Fit training data
#         X : Training vectors, X.shape : [#samples, #features]
#         y : Target values, y.shape : [#samples]
#         """

#         # Randomly Initialize Weights
#         self.weights = 2 * np.random.random(X.shape[1]) - 1

#         for i in range(self.n_iter):
#             # Weighted sum of inputs / weights
#             self.weighted_sum = np.dot(X, self.weights)
            
#             # Activate!
#             self.output = sigmoid(self.weighted_sum)

#             # Calc error
#             self.error = y - self.output

#             # Update the Weights
#             self.adjustments = self.error * sigmoid_derivative(self.weighted_sum)
#             self.weights += np.dot(X.T, self.adjustments)
#         return self


#     def predict(self, X):
#         """Return class label after unit step"""
#         return print(f'Predicted Labels:\n{self.output}')

In [239]:
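pn = Perceptron_1(.1, 100)
# predict() returns labels in {-1, 1}, so map the 0/1 Outcome column to match
pn.fit(X_norm, np.where(y == 1, 1, -1))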

<__main__.Perceptron_1 at 0x126349950>

In [240]:
pn.predict(X_norm)

array([ 1, -1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,
        1, -1, -1, -1,  1,  1,  1, -1,  1,  1, -1,  1,  1, -1,  1, -1,  1,
        1, -1,  1, -1, -1, -1,  1,  1, -1,  1,  1,  1,  1, -1,  1,  1, -1,
        1, -1,  1,  1,  1,  1, -1,  1, -1,  1,  1, -1,  1,  1, -1, -1, -1,
       -1, -1, -1,  1,  1,  1, -1, -1, -1, -1,  1,  1,  1,  1, -1, -1,  1,
       -1,  1, -1,  1,  1,  1, -1, -1,  1,  1,  1, -1, -1, -1, -1,  1,  1,
        1, -1,  1,  1, -1,  1, -1, -1,  1,  1, -1,  1,  1,  1,  1,  1,  1,
       -1, -1,  1, -1,  1,  1, -1, -1, -1, -1, -1,  1,  1,  1, -1, -1,  1,
       -1, -1,  1,  1,  1, -1, -1,  1,  1, -1, -1, -1,  1, -1, -1,  1,  1,
        1,  1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1, -1,  1,  1,  1, -1,
        1,  1,  1, -1, -1,  1,  1, -1,  1,  1,  1, -1, -1,  1,  1,  1,  1,
       -1, -1, -1,  1, -1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1, -1, -1,
       -1,  1,  1,  1, -1,  1, -1, -1,  1,  1, -1,  1, -1, -1, -1,  1,  1,
        1,  1,  1, -1, -1

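The predictions above use the labels -1 and 1, while `Outcome` uses 0 and 1, so the two need to be put on the same scale before computing accuracy. A minimal sketch, using only the first five rows displayed above (so the number it yields says nothing about the full dataset):

```python
import numpy as np

# First five Outcome values from diabetes.head() above
y_true = np.array([1, 0, 1, 0, 1])

# First five predictions from the array printed above
y_pred_pm = np.array([1, -1, 1, -1, 1])

# Map {-1, 1} onto {0, 1} so the labels are comparable
y_pred = (y_pred_pm == 1).astype(int)

# Fraction of these five rows where prediction and label agree
accuracy = (y_pred == y_true).mean()
print(accuracy)
```
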
## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron (for non-linearly separable classes).
- Try to implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
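
As a starting point for the multi-layer perceptron and backpropagation goals, here is a minimal sketch of a network with one hidden layer trained on XOR, a problem a single perceptron cannot solve. The hidden layer size, learning rate, iteration count, and random seed are all arbitrary assumptions, and whether training reaches a good solution depends on the initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: not linearly separable, so a hidden layer is needed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)              # assumed seed
W1 = rng.uniform(-1, 1, size=(2, 4))        # input -> hidden (4 hidden nodes, assumed)
b1 = np.zeros((1, 4))
W2 = rng.uniform(-1, 1, size=(4, 1))        # hidden -> output
b2 = np.zeros((1, 1))
lr = 0.5                                    # assumed learning rate

losses = []
for _ in range(5000):
    # Forward pass through both layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((y - out) ** 2)))

    # Backward pass: push the error back through each sigmoid
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent updates for weights and biases
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print("first loss:", losses[0])
print("last loss:", losses[-1])
print(out.round(3))
```

Tracking the mean squared error in `losses` makes it easy to see whether training is making progress before looking at the individual outputs.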