<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer:
The set of neurons whose inputs are the actual data inputs of the model

### Hidden Layer:
The set of neurons which are in contact with neurons at both ends, rather than data inputs or the model outputs.

### Output Layer:
The set of neurons whose outputs are the actual model output.

### Neuron:
The basic unit of a neural network - also known as a node. Takes some input, applies some linear transformation, 
applies an activation function, and returns an output.

### Weight:
A part of the linear transformation applied by a given node - weights multiply (or divide!) the input.

### Activation Function:
A function that decides whether or not, and how, the neuron transmits it's transformed data to the next layer.
Generally bounded between zero and one, or one and negative one. Oftentimes values below some threshold are mapped
to zero.

### Node Map:
A diagram that shows the way in which the nodes of a neural network are connected to one another.

### Perceptron:
The definition can vary, but generally indicates a single-layer, feed-forward neural network. No hidden layers.


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

#### Your Answer Here

) Some set of feature values are given to the input layer of the neural network.

) These feature values have some linear transformation applied to them - they are each *multiplied* by a weight, and have a bias *added* to them.

) The results of those linear transformations are then passed to an activation function, which decides both whether to, and how much of, the transformed input should be passed on to the next layer. For example, a sigmoid function returns one for an input at infinity, and zero for an input at negative infinity.

) The output of the activation functions are then passed to the next layer, and the process repeats for each.

) That being said, the connectivity of each layer, and number of nodes in each layer, can differ. Layers may be fully connected, which is to say each node has a connection to all the nodes in the previous layer, or they may have some form of connectivity.

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [1]:
import pandas as pd
import numpy as np
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')
# adding a bias term
df['bias'] = 1
df

Unnamed: 0,x1,x2,y,bias
0,0,0,1,1
1,1,0,1,1
2,0,1,1,1
3,1,1,0,1


In [2]:
inputs = df[['x1','x2','bias']].to_numpy()
correct_outputs = np.array([[1],[1],[1],[0]])

print(inputs)
print(correct_outputs)

[[0 0 1]
 [1 0 1]
 [0 1 1]
 [1 1 1]]
[[1]
 [1]
 [1]
 [0]]


In [3]:
##### Your Code Here #####
# defining functions

def sigmoid(x):
    y = 1/(1+(np.exp(-x)))
    return y

def d_sigmoid(x):
    sx = sigmoid(x)
    y = sx/(1-sx)
    return y

def random_weights():
    np.random.seed(42)
    weights = np.random.random((3,1))
    return weights

def run_perceptron(inputs, correct_outputs, iterations):
    
    weights = random_weights()
    
    for ii in range(0,iterations):
        # apply the weights
        weighted_sum = np.dot(inputs, weights)
        # apply activation
        activated_output = sigmoid(weighted_sum)
        # get the error
        error = correct_outputs - activated_output
        # find adjustments from error
        adjustments = error * d_sigmoid(activated_output)
        
        """print('input shape: ',inputs.shape)
        print('weights shape: ',weights.shape)
        print('adjustments shape: ',adjustments.shape)
        """
        # Update Weights
        weights += np.dot(inputs.T, adjustments)
        
    return {'weights': weights, 'output': activated_output}


In [4]:
# run it!
results = run_perceptron(inputs, correct_outputs, 2000)
results

{'weights': array([[-12.64397375],
        [-12.64397375],
        [ 19.6332523 ]]),
 'output': array([[1.        ],
        [0.99907868],
        [0.99907868],
        [0.00349056]])}

## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [5]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [6]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

feats = list(diabetes)[:-1]

diabetesarray = diabetes[feats]

scaler = Normalizer()
X = scaler.fit_transform(diabetesarray)
y = diabetes['Outcome']

In [7]:
X.shape[1]

y_np = y.values
y_np.shape = [len(y),1]

In [8]:
##### Update this Class #####

class Perceptron(object):
    
    def __init__(self, niter = 10, learning_rate = 0.01,):
        self.niter = niter
        self.learning_rate = learning_rate
    
    def __sigmoid(self, x):
        y = 1/(1+(np.exp(-x)))
        return y
    
    def __sigmoid_derivative(self, x):
        sx = sigmoid(x)
        y = sx/(1-sx)
        return y

    def fit(self, X, y):
        """Fit training data
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """

        # Randomly Initialize Weights
        self.weights = np.zeros((X.shape[1], 1))

        for i in range(self.niter):
            # Weighted sum of inputs / weights
            weighted_sum = np.dot(X, self.weights)

            # Activate!
            activated_output = self.__sigmoid(weighted_sum)

            # Cac error
            error = y - activated_output

            # Update the Weights
            adjustments = error * self.__sigmoid_derivative(activated_output)
            self.weights += np.dot(X.T, adjustments)

    def predict(self, X):
        """Return class label after unit step"""
        return self.__sigmoid(np.dot(X,self.weights))

In [9]:
eye = Perceptron(niter=10000)
eye.fit(X,y_np)
output = eye.predict(X)
sum(np.round(output, 0))

  # Remove the CWD from sys.path while we load stuff.


array([54.])

In [10]:
sum(y)

268

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?