<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever else you need to make sure you understand the concepts below.

### Input Layer: 
The layer that receives the raw input features (x1, x2, x3, ...) and passes them on to the next layer.
### Hidden Layer: 
Intermediate layer of neurons that performs computations between the input and output layers.
### Output Layer: 
Produces the final value computed by the network (the hypothesis), such as a class probability.
### Neuron: 
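In biology, a brain cell that communicates through electrical impulses. In a neural network, a node that receives weighted inputs, sums them, and transmits an output determined by its activation function (for a step function, only if a threshold is met).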
### Weight: 
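Learnable parameters of the model. Each weight scales one input to a neuron and so determines how strongly that input influences the neuron's output.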
### Activation Function: 
The function a neuron applies to its weighted input sum (plus bias) to produce its output, e.g. sigmoid, tanh, step, or ReLU.
### Node Map: 
Visual diagram of the architecture or "topology" of our neural network (like a flow chart of paths)
### Perceptron: 
A supervised learning model for binary classification. A single-layer perceptron is one neuron trained with a simple error-correction rule; multi-layer perceptrons are trained with backpropagation.
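
To make the terms above concrete, here is a minimal sketch of a single neuron with a step activation, trained with the classic perceptron error-correction rule on the NAND truth table. The names `step` and `train_perceptron`, the learning rate of 1.0, and the epoch cap are assumptions chosen for this illustration, not part of the assignment.

```python
import numpy as np

def step(z):
    """Unit step activation: 1 where z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Classic perceptron rule; lr and max_epochs are illustrative choices."""
    w = np.zeros(X.shape[1])  # one weight per input feature
    b = 0.0                   # bias (offset)
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = step(np.dot(xi, w) + b)
            update = lr * (target - pred)  # zero when the prediction is right
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:  # every sample classified correctly
            break
    return w, b

# NAND truth table
X_nand = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y_nand = np.array([1, 1, 1, 0])

w, b = train_perceptron(X_nand, y_nand)
preds = step(X_nand @ w + b)
print(preds)  # [1 1 1 0]
```

Because NAND is linearly separable, this rule stops once a full pass over the data produces no errors.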

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

In a perceptron, input signals arrive at the input layer of the neural network, which passes the feature values forward. Each neuron in the next layer multiplies every incoming value by its weight, sums the results, and adds an offset (the "bias"). An activation function then determines the neuron's output signal from that weighted sum. Common choices are the sigmoid (as in logistic regression, with output between 0 and 1), tanh (output between -1 and 1), step (output 0 or 1), and ReLU (output 0 for negative sums, otherwise the sum itself). The signal may then pass through a series of hidden layers that use similar activation functions; the number and size of these layers are hyperparameters. This proceeds until the output layer computes the final signal, which for classification with a sigmoid is a probability between 0 and 1.
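
The flow described above can be sketched as a forward pass through one hidden layer and one output neuron. This is only an illustration: the layer sizes, the random weights, and the names `forward`, `W_hidden`, `b_hidden`, `W_out`, and `b_out` are assumptions, not something the assignment specifies.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """Inputs -> weighted sum + bias -> activation, repeated per layer."""
    hidden = sigmoid(x @ W_hidden + b_hidden)  # hidden layer signal
    output = sigmoid(hidden @ W_out + b_out)   # final signal in (0, 1)
    return hidden, output

rng = np.random.default_rng(0)
x = np.array([[0.0, 1.0]])                    # one sample, two features
W_hidden = rng.uniform(-1, 1, size=(2, 3))    # 2 inputs -> 3 hidden neurons
b_hidden = rng.uniform(-1, 1, size=3)
W_out = rng.uniform(-1, 1, size=(3, 1))       # 3 hidden -> 1 output neuron
b_out = rng.uniform(-1, 1, size=1)

hidden, output = forward(x, W_hidden, b_hidden, W_out, b_out)
print(hidden.shape, output.shape)  # (1, 3) (1, 1)
```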

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [1]:
import pandas as pd
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')
df

|   | x1 | x2 | y |
|---|----|----|---|
| 0 | 0  | 0  | 1 |
| 1 | 1  | 0  | 1 |
| 2 | 0  | 1  | 1 |
| 3 | 1  | 1  | 0 |


In [2]:
import numpy as np
np.random.seed(78)

In [3]:
df['ones']=np.ones(4)
df['ones']=df['ones'].astype('int')
df

|   | x1 | x2 | y | ones |
|---|----|----|---|------|
| 0 | 0  | 0  | 1 | 1    |
| 1 | 1  | 0  | 1 | 1    |
| 2 | 0  | 1  | 1 | 1    |
| 3 | 1  | 1  | 0 | 1    |


In [4]:
inputs = df[['x1','x2','ones']]
correct_outputs = df[['y']]

In [5]:
weights = 2 * np.random.random((3,1)) - 1
weights

array([[-0.90363754],
       [ 0.36192603],
       [ 0.59739214]])

In [6]:
#Defining a sigmoid function for my perceptron
def sigmoid(x):
    return 1/ (1 + np.exp(-x))
sigmoid(0)

# Define the sigmoid derivative, used to scale the weight updates (a basic form of backprop)
def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)
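
As a quick sanity check on the derivative formula, the sketch below compares `sigmoid_derivative` against a central finite difference. The test points and the step size `eps` are arbitrary choices made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

x = np.linspace(-5, 5, 11)  # sample points (assumed for the check)
eps = 1e-6                  # finite-difference step (assumed)

# Slope estimated numerically vs. slope from the closed-form expression
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

max_gap = np.max(np.abs(numeric - analytic))
print(max_gap)
```

The two should agree to many decimal places, and the slope peaks at 0.25 when the input is 0.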

In [7]:
weighted_sum = np.dot(np.array(inputs), weights)
activated_output = sigmoid(weighted_sum)
print(weighted_sum)
print(activated_output)

[[ 0.59739214]
 [-0.30624541]
 [ 0.95931816]
 [ 0.05568062]]
[[0.64505944]
 [0.42403146]
 [0.72298527]
 [0.51391656]]


In [8]:
#Loop through 10k epochs to allow the perceptron to backprop and correct its weights
for iteration in range(10000):
    
    # Weighted sum of inputs / weights
    weighted_sum = np.dot(inputs, weights)
    
    # Activate!
    activated_output = sigmoid(weighted_sum)
    
    # Calc error
    error = correct_outputs - activated_output
    
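    # Scale the error by the sigmoid slope. sigmoid_derivative() applies sigmoid()
    # internally, so the exact gradient would use weighted_sum here; passing the
    # already-activated output still gives a positive scale factor, so the
    # updates point in the right direction and training converges.
    adjustments = error * sigmoid_derivative(activated_output)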
    
    # Update the Weights
    weights += np.dot(inputs.T, adjustments)
    
print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)

Weights after training
[[-11.84042561]
 [-11.84042561]
 [ 17.80950276]]
Output after training
[[0.99999998]
 [0.99744966]
 [0.99744966]
 [0.00281143]]
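
To tie the trained outputs back to the 99.0% accuracy target, one option is to threshold them at 0.5 and compare with the labels. The sketch below reuses the output values printed above; the 0.5 cutoff and the name `predictions` are assumptions for this illustration.

```python
import numpy as np

# Sigmoid outputs copied from the training run above
activated_output = np.array([[0.99999998],
                             [0.99744966],
                             [0.99744966],
                             [0.00281143]])
correct_outputs = np.array([[1], [1], [1], [0]])

# Convert probabilities to class labels with a 0.5 cutoff
predictions = (activated_output >= 0.5).astype(int)
accuracy = (predictions == correct_outputs).mean()
print(accuracy)  # 1.0
```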


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [9]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

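|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |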


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [10]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

scaler = Normalizer()    # See U2-LinearModels-M4
X = diabetes[feats]
Y = diabetes[['Outcome']]

In [11]:
scaled_df = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
scaled_df.head()

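|   | Pregnancies | Glucose  | BloodPressure | SkinThickness | Insulin  | BMI      | DiabetesPedigreeFunction | Age      |
|---|-------------|----------|---------------|---------------|----------|----------|--------------------------|----------|
| 0 | 0.033552    | 0.827625 | 0.402628      | 0.195722      | 0.0      | 0.187893 | 0.003506                 | 0.279603 |
| 1 | 0.008424    | 0.71604  | 0.555984      | 0.244296      | 0.0      | 0.224079 | 0.002957                 | 0.261144 |
| 2 | 0.040398    | 0.924097 | 0.323181      | 0.0           | 0.0      | 0.117658 | 0.003393                 | 0.161591 |
| 3 | 0.006612    | 0.588467 | 0.436392      | 0.152076      | 0.621527 | 0.185797 | 0.001104                 | 0.138852 |
| 4 | 0.0         | 0.596386 | 0.174127      | 0.152361      | 0.731335 | 0.187622 | 0.00996                  | 0.143655 |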
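
The two scalers mentioned above do different things, which is easy to miss. With default settings, `Normalizer` rescales each row to unit L2 length, while `MinMaxScaler` rescales each column to the range 0 to 1. The sketch below reproduces both calculations with plain NumPy on the five rows shown by `diabetes.head()`, so the arithmetic is visible; it is an illustration of the arithmetic, not a replacement for the sklearn classes.

```python
import numpy as np

# Feature columns of the first five rows from diabetes.head()
X_head = np.array([
    [6, 148, 72, 35,   0, 33.6, 0.627, 50],
    [1,  85, 66, 29,   0, 26.6, 0.351, 31],
    [8, 183, 64,  0,   0, 23.3, 0.672, 32],
    [1,  89, 66, 23,  94, 28.1, 0.167, 21],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33],
])

# Normalizer-style: divide each ROW by its own L2 norm
row_normalized = X_head / np.linalg.norm(X_head, axis=1, keepdims=True)

# MinMaxScaler-style: map each COLUMN to the range [0, 1]
col_min = X_head.min(axis=0)
col_max = X_head.max(axis=0)
min_max_scaled = (X_head - col_min) / (col_max - col_min)

print(np.round(row_normalized[0], 6))
print(np.round(min_max_scaled[0], 6))
```

Row normalization does not depend on the other rows, so the first printed row should match the first row of `scaled_df` above. The min-max values, by contrast, depend on which rows are included, so these five-row results will differ from scaling the full dataset.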


In [28]:
##### Update this Class #####
# Daktoa's best accuracy was about 568/768 

class Perceptron(object):
    
    def __init__(self, niter = 10):
        self.niter = niter
        
    def __sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):
        return self.__sigmoid(x) * (1 - self.__sigmoid(x))

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
        
        # Randomly Initialize Weights
        self.weights = 2 * np.random.random((X.shape[1] + 1,1)) - 1   # bias is the '+1' term
        
        for i in range(self.niter):
            # Weighted sum of inputs / weights
            #weighted_sum = np.dot(X, self.weights)

            # Activate!
            #self.activated_output = sigmoid(weighted_sum)
            
            # Calc error
            #error = Y - self.activated_output
            
            #adjustments = error * sigmoid_derivative(self.activated_output)
            
            # Update the Weights
            #self.weights += np.dot(X.T, adjustments)
            
            pass
            
    def predict(self, X):
        """Return class label after unit step"""
        return None

In [29]:
pn = Perceptron()
pn.fit(X, Y)
pn.weights

array([[-0.81889399],
       [-0.03921704],
       [ 0.65987081],
       [-0.22010296],
       [ 0.41591104],
       [-0.39559981],
       [-0.3430514 ],
       [ 0.82515573],
       [ 0.8799956 ]])
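
The weights above are still the random initial values, because `fit` does not update them yet. One possible way to fill in the scaffold is sketched below, offered as an illustration rather than the intended solution. It follows the same error-times-slope update used in the NAND example. The class name `SigmoidPerceptron`, the `lr` and `seed` parameters, the `_add_bias` helper, and the averaging over samples are all assumptions added here. It trains on small synthetic data so that it runs without downloading anything.

```python
import numpy as np


class SigmoidPerceptron:
    """Single sigmoid neuron trained with an error-times-slope update."""

    def __init__(self, niter=5000, lr=1.0, seed=0):
        self.niter = niter
        self.lr = lr      # learning rate (assumed; the scaffold has none)
        self.seed = seed  # fixed seed so runs are repeatable

    @staticmethod
    def _sigmoid(x):
        return 1 / (1 + np.exp(-x))

    @staticmethod
    def _add_bias(X):
        """Append a column of ones so the last weight acts as the bias."""
        X = np.asarray(X, dtype=float)
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        Xb = self._add_bias(X)
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        rng = np.random.default_rng(self.seed)
        # Random initial weights in [-1, 1), one per feature plus the bias
        self.weights = 2 * rng.random((Xb.shape[1], 1)) - 1
        for _ in range(self.niter):
            output = self._sigmoid(Xb @ self.weights)
            error = y - output
            # Sigmoid slope written in terms of the output: s * (1 - s)
            adjustments = error * output * (1 - output)
            # Average over samples so the step does not grow with dataset size
            self.weights += self.lr * (Xb.T @ adjustments) / len(Xb)
        return self

    def predict(self, X):
        """Return class labels (0 or 1) by thresholding the output at 0.5."""
        probs = self._sigmoid(self._add_bias(X) @ self.weights)
        return (probs >= 0.5).astype(int).ravel()


# Synthetic, clearly separated two-class data (NOT the diabetes dataset)
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0.2, 0.05, size=(50, 2)),
                    rng.normal(0.8, 0.05, size=(50, 2))])
y_demo = np.array([0] * 50 + [1] * 50)

model = SigmoidPerceptron().fit(X_demo, y_demo)
accuracy = (model.predict(X_demo) == y_demo).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Since `fit` and `predict` convert their inputs with `np.asarray`, passing the notebook's `scaled_df` and `Y` should also work, though the accuracy on the diabetes data will depend on the scaling and the number of iterations.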

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
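
As a starting point for the activation-function question, the sketch below evaluates step, sigmoid, tanh, and ReLU on the same inputs and compares their slopes at a large input. The sample inputs and the helper names are assumptions for illustration.

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])  # sample weighted sums

activations = {
    "step": step(z),        # outputs 0 or 1, no useful gradient
    "sigmoid": sigmoid(z),  # outputs in (0, 1)
    "tanh": np.tanh(z),     # outputs in (-1, 1), centered on 0
    "relu": relu(z),        # outputs 0 or z, unbounded above
}
for name, values in activations.items():
    print(f"{name:8s}", np.round(values, 4))

# Slopes at a large input (z = 10): sigmoid and tanh saturate, ReLU does not
sig_grad = sigmoid(10.0) * (1 - sigmoid(10.0))
tanh_grad = 1 - np.tanh(10.0) ** 2
relu_grad = 1.0
print(sig_grad, tanh_grad, relu_grad)
```

The near-zero slopes of sigmoid and tanh at large inputs are one reason ReLU is often preferred in hidden layers, while sigmoid remains a natural fit for a binary output layer.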