<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add images, diagrams, or whatever you need to ensure that you understand the concepts below.

### Input Layer: 
- The input layer is what receives input from our dataset. Sometimes it is called the visible layer because it's the only part that is exposed to our data and that our data interacts with directly. Typically node maps are drawn with one input node for each of the different inputs/features/columns of our dataset that will be passed to the network.


### Hidden Layer:
- Layers after the input layer are called Hidden Layers. This is because they cannot be accessed except through the input layer. They're inside of the network and they perform their functions, but we don't directly interact with them. The simplest possible network has a single neuron in the hidden layer that just outputs its value. "Deep Learning," apart from being a big buzzword, simply means that we are using a Neural Network that has multiple hidden layers.


### Output Layer:
- The final layer is called the Output Layer. The purpose of the output layer is to output a vector of values that is in a format that is suitable for the type of problem that we're trying to address. Typically the output value is modified by an "activation function" to transform it into a format that makes sense for our context.


### Neuron:
- Artificial neurons, or "nodes," are similar to brain neurons in that they receive inputs and pass their signal on to the next layer of nodes if a certain threshold is reached, but that's about where the similarities end.


### Weight:
- Weights are values that the different inputs are multiplied by. They determine how important or unimportant each feature is to the desired output.


### Activation Function:
- An activation function is applied to a neuron's weighted sum to transform it into a format that makes sense for our context. A sigmoid, for example, squashes any real number into the range between 0 and 1, so that very large or very small weighted sums cannot dominate the output.
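
As a rough illustration, here is a minimal NumPy sketch of a few common activation functions. The function names (`step`, `sigmoid`, `relu`) and the sample values are illustrative choices, not something the assignment prescribes.

```python
import numpy as np

def step(z):
    # Hard threshold: 1 when the weighted sum is non-negative, else 0
    return np.where(z >= 0, 1, 0)

def sigmoid(z):
    # Smoothly squashes any real number into the range between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Passes positive values through unchanged and clips negatives to 0
    return np.maximum(0.0, z)

# Illustrative weighted sums, including two extreme values
z = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])

print(step(z))
print(sigmoid(z))
print(relu(z))
```

The extreme inputs of -100 and 100 come out of the sigmoid very close to 0 and 1, while ReLU leaves the large positive value untouched.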


### Node Map:
![A Mapping](http://jalammar.github.io/images/NNs_formula_no_bias.png)

A node map is a diagram of the network: each node takes an input, modifies it by a weight, and then reports an output.



### Perceptron:
- The first and simplest kind of neural network that we could talk about is the perceptron. A perceptron is just a single node or neuron of a neural network with nothing else. It can take any number of inputs and produces a single output. The neuron takes each of the input values, multiplies each of them by a weight, sums all of these products up, and then passes the sum through what is called an "activation function," the result of which is the final value.
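
The multiply, sum, and activate steps can be sketched in a few lines of NumPy. This is a minimal sketch under some assumptions: the `perceptron` function name is hypothetical, a bias term is added to the weighted sum, a step function is used as the activation, and the weights are picked by hand rather than learned.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    # Multiply each input by its weight, sum the products, add the bias
    weighted_sum = np.dot(inputs, weights) + bias
    # Step activation: output 1 when the sum is non-negative, else 0
    return np.where(weighted_sum >= 0, 1, 0)

# Hand-picked (not learned) parameters that happen to implement NAND
weights = np.array([-2.0, -2.0])
bias = 3.0

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
outputs = perceptron(X, weights, bias)
print(outputs)  # [1 1 1 0]
```

The weighted sums for the four rows are 3, 1, 1, and -1, so only the last row falls below the threshold.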


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

#### Your Answer Here

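The input layer receives the data from the dataset, one value per feature. Each input is multiplied by a weight on its way into the hidden layer, the weighted inputs are summed, and a bias is added to shift that sum. The result is passed through an activation function (a sigmoid, for example, squashes it to a value between 0 and 1), which keeps very large or very small sums from dominating. The activated values then become the inputs to the next layer, and the same steps repeat until the output layer produces the final value. During training, the weights and biases are adjusted so that the outputs move closer to the correct answers.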
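
The same flow can be traced in code. The sketch below is illustrative only: the layer sizes, the random weights, the zero biases, and the `forward` function name are all assumptions, and nothing here is trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 input features, 4 hidden neurons, 1 output neuron
W_hidden = rng.uniform(-1, 1, size=(3, 4))
b_hidden = np.zeros(4)
W_output = rng.uniform(-1, 1, size=(4, 1))
b_output = np.zeros(1)

def forward(x):
    # Input -> hidden: weighted sum plus bias, then activation
    hidden = sigmoid(x @ W_hidden + b_hidden)
    # Hidden -> output: the same recipe applied to the hidden activations
    return sigmoid(hidden @ W_output + b_output)

x = np.array([[0.5, 0.1, 0.9]])  # one made-up sample with 3 features
output = forward(x)
print(output.shape)
```

Each layer repeats the same pattern, so a deeper network is just more of these weighted-sum-then-activate steps chained together.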

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [11]:
import pandas as pd
import numpy as np

data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')

In [12]:
df.head()

|   | x1 | x2 | y |
|---|----|----|---|
| 0 | 0  | 0  | 1 |
| 1 | 1  | 0  | 1 |
| 2 | 0  | 1  | 1 |
| 3 | 1  | 1  | 0 |


In [13]:
correct_outputs = [[1], [1],[1], [0]]

In [14]:
def sigmoid(x):
    # Sigmoid function: squashes any real number into the range (0, 1)
    return 1/(1 + np.exp(-x))

def sigmoid_derivative(x):
    # Slope of the sigmoid at x
    sx = sigmoid(x)
    return sx * (1-sx)
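
For reference, a short derivation of why `sx * (1-sx)` is the derivative, and why multiplying the error by it gives a gradient descent step. The squared-error loss $E$ and the implicit learning rate of 1 are assumptions here; the notebook does not state them explicitly.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^{2}} = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)

E = \tfrac{1}{2} \sum_i \bigl(y_i - \sigma(z_i)\bigr)^{2}, \qquad z_i = x_i \cdot w

\frac{\partial E}{\partial w} = -\sum_i \bigl(y_i - \sigma(z_i)\bigr)\,\sigma'(z_i)\,x_i

w \leftarrow w - \frac{\partial E}{\partial w}
  = w + \sum_i \bigl(y_i - \sigma(z_i)\bigr)\,\sigma'(z_i)\,x_i
```

In matrix form the last line is the inputs transposed, times the error multiplied elementwise by the sigmoid derivative.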

In [15]:
weights = 2 * np.random.random((3,1))-1
weights

array([[-0.37862028],
       [ 0.84517187],
       [ 0.64495934]])

In [17]:
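# Update our weights 10,000 times (fingers crossed that this process reduces error)
for iteration in range(10000):
    
    # Weighted sum of inputs and weights.
    # NOTE: df still contains the 'y' column, so the label itself is fed in as
    # the third input. A constant bias column of ones should take its place.
    weighted_sum = np.dot(df, weights)
    
    # Activate!
    activated_output = sigmoid(weighted_sum)
    
    # Calculate error
    error = correct_outputs - activated_output
    
    # Scale the error by the slope of the sigmoid at each weighted sum
    adjustments = error * sigmoid_derivative(weighted_sum)
    
    # Update the weights
    weights += np.dot(df.T, adjustments)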
    
print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)

Weights after training
[[-2.41326398]
 [-2.40741719]
 [ 7.49057066]]
Output after training
[[0.99944194]
 [0.99380166]
 [0.99383757]
 [0.00799723]]
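
Because the run above passes the whole DataFrame (including `y`) to `np.dot`, the third weight is attached to the label rather than to a bias. The following is a sketch of one way to train without that leak, assuming a constant bias column of ones in place of `y`. The seed value and the rounding step at the end are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

# Inputs x1, x2 plus a constant bias column of ones (no label column)
inputs = np.array([[0, 0, 1],
                   [1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 1]], dtype=float)
correct_outputs = np.array([[1], [1], [1], [0]], dtype=float)

# Seeded generator so the run is repeatable; the seed value is arbitrary
rng = np.random.default_rng(42)
weights = 2 * rng.random((3, 1)) - 1

for iteration in range(10000):
    weighted_sum = inputs @ weights
    activated_output = sigmoid(weighted_sum)
    error = correct_outputs - activated_output
    adjustments = error * sigmoid_derivative(weighted_sum)
    weights += inputs.T @ adjustments

# Round the sigmoid outputs to get hard 0/1 class predictions
predictions = np.round(activated_output).astype(int)
print(predictions.ravel())
```

NAND is linearly separable, so a single neuron with a bias is enough to classify all four rows once the outputs are rounded.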


## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for others' implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [18]:
diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

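|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI  | DiabetesPedigreeFunction | Age | Outcome |
|---|-------------|---------|---------------|---------------|---------|------|--------------------------|-----|---------|
| 0 | 6           | 148     | 72            | 35            | 0       | 33.6 | 0.627                    | 50  | 1       |
| 1 | 1           | 85      | 66            | 29            | 0       | 26.6 | 0.351                    | 31  | 0       |
| 2 | 8           | 183     | 64            | 0             | 0       | 23.3 | 0.672                    | 32  | 1       |
| 3 | 1           | 89      | 66            | 23            | 94      | 28.1 | 0.167                    | 21  | 0       |
| 4 | 0           | 137     | 40            | 35            | 168     | 43.1 | 2.288                    | 33  | 1       |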


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 
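
The two transformers do different things, which matters when choosing between them. The sketch below reproduces in plain NumPy what `MinMaxScaler` and `Normalizer` compute with their default settings (a 0 to 1 feature range and the L2 norm, respectively). The small sample matrix is made up for illustration.

```python
import numpy as np

# Made-up matrix: 3 samples, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Min-max scaling works per COLUMN: each feature is mapped onto [0, 1]
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

# L2 normalization works per ROW: each sample is rescaled to unit length
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_normalized = X / row_norms

print(X_minmax)
print(X_normalized)
```

Min-max scaling puts every feature on a common range, while row normalization changes the relationship between features within a sample, so the two are not interchangeable.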

In [20]:
from sklearn.preprocessing import MinMaxScaler, Normalizer

scaler = MinMaxScaler()
normal = Normalizer()

feats = list(diabetes)[:-1]

y = diabetes['Outcome']

scaler.fit(diabetes.drop(columns = ['Outcome']))
X = scaler.transform(diabetes.drop(columns = ['Outcome']))

normal.fit(X)
X = normal.transform(X)

# Pass the flat list of names; wrapping it as [feats] would build a MultiIndex
X = pd.DataFrame(X, columns = feats)

print(X.shape, y.shape)
X.head()

(768, 8) (768,)


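|   | Pregnancies | Glucose  | BloodPressure | SkinThickness | Insulin  | BMI      | DiabetesPedigreeFunction | Age      |
|---|-------------|----------|---------------|---------------|----------|----------|--------------------------|----------|
| 0 | 0.271471    | 0.572045 | 0.453936      | 0.271928      | 0.0      | 0.385158 | 0.180305                 | 0.371765 |
| 1 | 0.067347    | 0.489028 | 0.619373      | 0.335375      | 0.0      | 0.453866 | 0.133458                 | 0.190817 |
| 2 | 0.376673    | 0.736073 | 0.419897      | 0.0           | 0.0      | 0.277943 | 0.203012                 | 0.146745 |
| 3 | 0.068414    | 0.520154 | 0.629186      | 0.270201      | 0.129227 | 0.487056 | 0.044198                 | 0.0      |
| 4 | 0.0         | 0.47633  | 0.226851      | 0.24461       | 0.137398 | 0.444422 | 0.652899                 | 0.138379 |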


In [28]:
from random import uniform

class Perceptron(object):
    
    def __init__(self, niter = 10, verbose = False):
        '''
        Warning, verbose is very verbose.  Only use with small iterations, small
        dataframes or debugging
        '''
        self.niter = niter
        self.verbose = verbose
    
    def __sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):
        sx = self.__sigmoid(x)
        return sx * (1.0 - sx)

    def fit(self, X, y):
        """Fit training data
        Assumes X and Y are dataframes and series respectively, there is 
        no bias added to the inputs X, and that X has been
        scaled or normalized to some degree
        X : Training vectors, X.shape : [#samples, #features]
        y : Target values, y.shape : [#samples]
        """
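        # making a copy, adding a bias column, storing the width after bias and
        # transforming to a numpy array
        inputs = X.copy()
        # NOTE: a bias column is normally all ones. With zeros, the bias weight
        # never receives an update and stays at its random initial value.
        bias = [0.0] * X.shape[0]
        inputs['bias'] = bias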
        #print(inputs.head())
        width = inputs.shape[1]
        inputs = inputs.values
        
        # transforming to a numpy array
        y_width = y.shape[0]
        correct_outputs = y.values.reshape((y_width, 1))

        # Randomly Initialize Weights
        weights = 2.0 * np.random.random((width, 1)) - 1.0

        for i in range(self.niter):
            # Weighted sum of inputs / weights
            weighted_sum = np.dot(inputs, weights)
            if self.verbose: print(weighted_sum)
                
            # Activate!
            activated_output = self.__sigmoid(weighted_sum)
            if self.verbose: print(activated_output)
            
            # Calculate error
            error = correct_outputs - activated_output
            if self.verbose: print(error)
            
            # Update the Weights
            adjustments = error * self.__sigmoid_derivative(weighted_sum)
            if self.verbose: print(adjustments)
                
            weights += np.dot(inputs.T, adjustments)
            if self.verbose: print(weights)
                
        self.weights = weights
        #print(activated_output)
        return self
    
    def net_input(self, X):
        # My bias is the last entry of my weights
        return np.dot(X, self.weights[:-1]) + self.weights[-1]
    
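    def predict(self, X):
        # NOTE: this thresholds the raw weighted sum at 0.5. Thresholding the
        # sigmoid output at 0.5 would correspond to net_input(X) >= 0 instead.
        return np.where(self.net_input(X) >= 0.5, 1, 0)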

In [29]:
from sklearn.metrics import accuracy_score

ptrons = []

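# Train one perceptron per feature; each model sees only that single column
for i, col in enumerate(feats):

    ptron = Perceptron(niter = 100000)

    ptrons.append(ptron.fit(X[[col]], y))

    y_pred = pd.DataFrame(ptrons[i].predict(X[[col]]), columns = ['Predictions'])

    print('Accuracy is: ', accuracy_score(y, y_pred))
    # The value counts are computed here but not displayed inside the loop
    y_pred['Predictions'].value_counts()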

Accuracy is:  0.6510416666666666
Accuracy is:  0.6510416666666666
Accuracy is:  0.6471354166666666
Accuracy is:  0.5846354166666666
Accuracy is:  0.6510416666666666
Accuracy is:  0.6419270833333334
Accuracy is:  0.6510416666666666
Accuracy is:  0.3489583333333333
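
Several of these accuracies sit at or near 0.651, which is what a model that predicts a single class for every row would score if the majority class makes up about 65% of the rows. Below is a sketch of a variant that might behave better. It assumes a bias column of ones, an update averaged over the samples with an explicit learning rate, and a prediction threshold applied to the sigmoid output. The class name `SigmoidPerceptron`, its parameters, and the synthetic data are all hypothetical stand-ins; it has not been run against the diabetes data here.

```python
import numpy as np

class SigmoidPerceptron:
    """Hypothetical variant of the Perceptron class with a bias column of ones."""

    def __init__(self, niter=5000, learning_rate=1.0, seed=0):
        self.niter = niter
        self.learning_rate = learning_rate
        self.seed = seed

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    @staticmethod
    def _add_bias(X):
        # Append a constant column of ones so the last weight acts as the bias
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        inputs = self._add_bias(np.asarray(X, dtype=float))
        targets = np.asarray(y, dtype=float).reshape(-1, 1)
        rng = np.random.default_rng(self.seed)
        self.weights = 2.0 * rng.random((inputs.shape[1], 1)) - 1.0

        for _ in range(self.niter):
            activated = self._sigmoid(inputs @ self.weights)
            error = targets - activated
            # Error times the sigmoid slope, averaged over the samples so the
            # step size does not grow with the number of rows
            adjustments = error * activated * (1.0 - activated)
            self.weights += self.learning_rate * (inputs.T @ adjustments) / len(inputs)
        return self

    def predict_proba(self, X):
        inputs = self._add_bias(np.asarray(X, dtype=float))
        return self._sigmoid(inputs @ self.weights).ravel()

    def predict(self, X):
        # Threshold the sigmoid output (not the raw weighted sum) at 0.5
        return (self.predict_proba(X) >= 0.5).astype(int)

# Synthetic, linearly separable stand-in data (NOT the diabetes dataset)
rng = np.random.default_rng(1)
X_demo = rng.random((200, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(int)

model = SigmoidPerceptron().fit(X_demo, y_demo)
accuracy = (model.predict(X_demo) == y_demo).mean()
print("Training accuracy on synthetic data:", accuracy)
```

To try it on the diabetes data, the whole scaled feature matrix could be passed to `fit` at once instead of one column at a time.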


## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
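
As a starting point for the multi-layer stretch goal, here is a small sketch of why a hidden layer helps with classes that are not linearly separable. It computes XOR with two hidden neurons and one output neuron. The weights are picked by hand for illustration, not learned; backpropagation is the procedure that would find weights like these automatically.

```python
import numpy as np

def step(z):
    # Step activation: 1 when the weighted sum is non-negative, else 0
    return np.where(z >= 0, 1, 0)

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

# Hidden layer: column 0 behaves like OR, column 1 behaves like NAND
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])

# Output neuron: AND of the two hidden activations
W_output = np.array([[1.0], [1.0]])
b_output = np.array([-1.5])

hidden = step(X @ W_hidden + b_hidden)
xor = step(hidden @ W_output + b_output).ravel()
print(xor)  # [0 1 1 0]
```

No single perceptron can produce this output, because no straight line separates the two XOR classes in the input space.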