<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer: This is where data interacts with the NN directly. It is the only part exposed to our data.
### Hidden Layer: These are the layers between the input layer and output layer that do all the magic, but are never interacted with directly.
### Output Layer: The output layer produces one or more values in a format useful for the problem that is being attempted.
### Neuron: Neurons receive inputs, apply a function to the inputs, and pass the outputs to the next layer of neurons.
### Weight: A constant coefficient on an input; it is basically a simple linear function.
### Activation Function: A function that determines how much signal any individual node sends to the next layer.
### Node Map: A visual diagram showing the type of and relations between each cell.
### Perceptron: A simple neural network consisting of a single node, that takes each input value, multiplies it by a weight coefficient, and passes the result through an activation function.

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

Information first flows into a network through the input nodes, which accept raw input data and insert it into the network. Next, once that data is passed onto the first hidden layer, each node that interacts with it applies weights to each input, adds a bias, and then runs the output through an activation function to determine how much of the output is passed on to the next layer. Next, each node passes on their information to the next hidden layer, and the process repeats. If there are no more hidden layers, the data passes to the output nodes, which parses the information into a format useful for solving the problem.

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [1]:
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')

In [2]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)


In [3]:
import numpy as np
np.random.seed(0)

In [4]:
inputs = df.values
correct_outputs = df['y'].values.reshape(-1, 1)
print(repr(inputs))
print(repr(correct_outputs))

array([[0, 0, 1],
       [1, 0, 1],
       [0, 1, 1],
       [1, 1, 0]])
array([[1],
       [1],
       [1],
       [0]])


In [5]:
weights = 2 * np.random.random((3, 1)) - 1
weights

array([[0.09762701],
       [0.43037873],
       [0.20552675]])

In [6]:
weighted_sum = np.dot(inputs, weights)
weighted_sum

array([[0.20552675],
       [0.30315376],
       [0.63590548],
       [0.52800574]])

In [7]:
activated_output = sigmoid(weighted_sum)
activated_output

array([[0.55120158],
       [0.5752133 ],
       [0.6538273 ],
       [0.62901786]])

In [8]:
error = correct_outputs - activated_output
error

array([[ 0.44879842],
       [ 0.4247867 ],
       [ 0.3461727 ],
       [-0.62901786]])

In [9]:
adjustments = error * sigmoid_derivative(weighted_sum)
adjustments

array([[ 0.11102303],
       [ 0.10379364],
       [ 0.07835174],
       [-0.14678408]])

In [10]:
weights += np.dot(inputs.T, adjustments)
weights

array([[0.05463657],
       [0.3619464 ],
       [0.49869517]])

In [11]:
for iteration in range(10000):
    weighted_sum = np.dot(inputs, weights)
    
    activated_output = sigmoid(weighted_sum)
    
    error = correct_outputs - activated_output
    
    adjustments = error * sigmoid_derivative(weighted_sum)
    
    weights += np.dot(inputs.T, adjustments)

In [12]:
print("Weights after training:")
print(repr(weights))

print("Output after training")
print(repr(activated_output))

Weights after training:
array([[-2.41089422],
       [-2.40961964],
       [ 7.49030711]])
Output after training
array([[0.9994418 ],
       [0.99381462],
       [0.99382245],
       [0.00799856]])


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [13]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [34]:
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score
feats = list(diabetes)[:-1]

train, test = train_test_split(diabetes, random_state=0)

X_train = train[feats]
y_train = train["Outcome"].values
X_test = test[feats]
y_test = test["Outcome"].values

pipeline = make_pipeline(
    Normalizer(norm='max')
)

X_train_transformed = pipeline.fit_transform(X_train, y_train)
X_test_transformed = pipeline.transform(X_test)

In [39]:
class Perceptron:
    def __init__(self, niter = 10):
        self.niter = niter
    
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):
        sx = self.__sigmoid(x)
        return sx * (1 - sx)
    
    def __loop(self, X, y=None, weights=None):
        # Weighted sum of inputs / weights
        if weights is None:
            weights = self.weights_
        weighted_sum = np.dot(X, weights)
        # Activate!
        activated_output = self.__sigmoid(weighted_sum)
        if y is None:
            return activated_output#.round()
        else:
            # Calc error
            error = y - activated_output
            # Update the Weights
            adjustments = error * self.__sigmoid_derivative(weighted_sum)
            return weights + np.dot(X.T, adjustments)
        
    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
        
        if y.ndim == 1:
            y_ = y.reshape(-1, 1)
        else:
            y_ = y.copy()
            
        # Randomly Initialize Weights
        weights = 2 * np.random.random((X.shape[1], 1)) - 1
        self.weights_ = weights.T
        
        for i in range(self.niter):
            weights = self.__loop(X, y_, weights)
        self.weights_ = weights
        self.outputs_ = activated_output
        
        return self
    
    def predict(self, X):
        """Return class label after unit step"""
        return(self.__loop(X))

In [40]:
perceptron = Perceptron(niter=100).fit(X_train_transformed, y_train)

In [43]:
perceptron.outputs_

array([[0.9994418 ],
       [0.99381462],
       [0.99382245],
       [0.00799856]])

In [42]:
perceptron.predict(X_test_transformed)

array([[5.32175438e-13],
       [9.89786374e-17],
       [8.79700300e-17],
       [4.84762897e-14],
       [1.73797412e-14],
       [1.06713556e-17],
       [7.64223240e-14],
       [4.81577346e-15],
       [7.74902940e-08],
       [1.39934127e-10],
       [2.75952216e-10],
       [1.79515792e-13],
       [1.54809886e-15],
       [1.75720827e-12],
       [2.26173725e-18],
       [3.47193182e-15],
       [3.68565932e-14],
       [1.24939402e-15],
       [8.14515237e-16],
       [5.04165546e-16],
       [8.49406355e-15],
       [3.29953348e-15],
       [2.04637699e-11],
       [7.24553299e-17],
       [6.64806849e-12],
       [4.27331750e-14],
       [2.97523659e-15],
       [6.50566927e-16],
       [1.22097595e-14],
       [9.47932107e-15],
       [7.87207993e-16],
       [1.41577818e-16],
       [3.20277552e-17],
       [3.86719507e-12],
       [5.90022964e-14],
       [5.29558515e-13],
       [3.01144045e-16],
       [2.55468664e-16],
       [2.72464163e-14],
       [3.55965544e-13],


## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?