<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

# Neural Networks

## *Data Science Unit 4 Sprint 2 Assignment 1*

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer:
The input layer is the part of our data that the neural network is exposed to and interacts with. Each input node is typically assigned to one feature.

### Hidden Layer:
The magic of modern neural networks; any layer between input and output layer. The data can not be accessed or directly interact with in this layer. It is passed through obscured to the output.

### Output Layer:
The output layer puts vector values suitable to the type of problem we're trying to assess (regression, classification, etc). They have a node for each type of output (for example, binary classification will have a single output node (probability)).

### Neuron:
Neurons / Nodes receive inputs and pass on the signal to the next layer of nodes if a certain threshold is met.

### Weight:
The weight gives us the range of motion to move up and down the equation. Each step along the way, each layer affects the next layer by using the weighted sum of our inputs (plus bias). One way to think of weights is the strength of connection between nodes.

### Activation Function:
Activation functions determine how much signal is sent from the previous layer to the next. Hidden and output layers may contain activation functions, while input layers don't.

### Node Map:
A node map is a visual diagram that serves as a representation of the neural network's architecture. They are used because the equations to represent the neural network are so complicated that it would be painstakingly hard to read and interpret what is going on within them.

### Perceptron:
A perceptron is the simplest version of a neural network. It's a single node with nothing else but the input. It will take the inputs, calculate with weight, sum up all predictions, then pass the sum to the activation function. The result of this will be our final value.

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

The neural network takes the input layers, and multiples the weight and add the bias to the inputs. These are then sent to the activation function, which determines how much signal is passed on to the next layer. This is repeated through the hidden layers until the output layer is reached, at which point there will be a node for each type of output.

## Write your own perceptron code that can correctly classify (99.0% accuracy) a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [18]:
# NOTE TO BRANDON: I'm pretty sure I got this part down. It's the next part that killed Jacob & I.

import pandas as pd
import numpy as np
data = { 'x1': [0,1,0,1],
         'x2': [0,0,1,1],
         'y':  [1,1,1,0]
       }

df = pd.DataFrame.from_dict(data).astype('int')
df['bias'] = [1,1,1,1]  # Adding column for bias.

df.head()

Unnamed: 0,x1,x2,y,bias
0,0,0,1,1
1,1,0,1,1
2,0,1,1,1
3,1,1,0,1


In [0]:
correct_outputs = [[0], [0], [1], [1]]

In [0]:
# Sigmoid activation function + derivative for updating weights

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1 - sx)

### Split Up Individually For Comprehension

In [4]:
# First, we need to initialize random weights for our inputs (3 total)
weights = 2 * np.random.random((4,1)) - 1
print('Randomized Weights: \n', weights)

Randomized Weights: 
 [[ 0.27587198]
 [-0.44556584]
 [ 0.18074124]
 [-0.03173615]]


In [5]:
# Now, let's calculate the weighted sums of inputs and weights.
weighted_sums = np.dot(df, weights)
print('Weighted Sums: \n', weighted_sums)

Weighted Sums: 
 [[ 0.1490051 ]
 [ 0.42487708]
 [-0.29656074]
 [-0.20143001]]


In [6]:
# Now we need to output the activated value at the end of 1 training epoch.
activated_outputs = sigmoid(weighted_sums)
print('Predicted Probability: \n', activated_outputs)

Predicted Probability: 
 [[0.5371825 ]
 [0.6046497 ]
 [0.42639845]
 [0.44981208]]


In [7]:
# Now, we need to calculate our error.

error = correct_outputs - activated_outputs
print('Error: \n', error)

Error: 
 [[-0.5371825 ]
 [-0.6046497 ]
 [ 0.57360155]
 [ 0.55018792]]


In [8]:
# Make adjustments through backprop
adjustments = error + sigmoid_derivative(weighted_sums)
print('Adjustments: \n', adjustments)

Adjustments: 
 [[-0.28856504]
 [-0.36560126]
 [ 0.81818436]
 [ 0.7976691 ]]


In [9]:
# Update weights based upon gradient descent.
weights += np.dot(df.T, adjustments)
print(weights)

[[0.70793981]
 [1.17028762]
 [0.3447593 ]
 [0.929951  ]]


### For Loop to Put it All Together

In [10]:
for iteration in range(10000):
    
    # Weighted sum of inputs / weights
    weighted_sum = np.dot(df, weights)
    
    # Activate
    activated_output = sigmoid(weighted_sum)
    
    # Calculate error
    error = correct_outputs - activated_output
    
    # Calculate adjustements
    adjustments = error * sigmoid_derivative(weighted_sum)
    
    # Update the Weights
    weights += np.dot(df.T, adjustments)

print(f'Predicted Values: \n{activated_output}')
print(f'Weights: \n{weights}')
print(f'Error: \n{error}')

Predicted Values: 
[[0.01070925]
 [0.00333743]
 [0.99003923]
 [0.99840758]]
Weights: 
[[-1.17333857]
 [ 9.12507147]
 [-3.01517505]
 [-1.5107551 ]]
Error: 
[[-0.01070925]
 [-0.00333743]
 [ 0.00996077]
 [ 0.00159242]]




## Implement your own Perceptron Class and use it to classify a binary dataset: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [11]:
# NOTE TO BRANDON: I had hands down the hardest time with this section.
# I'd like to talk about this a little bit during 1:1 with you to further comprehension!
# I >kinda< got it. Emphasis on >>kinda<<

diabetes = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')
diabetes.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Although neural networks can handle non-normalized data, scaling or normalizing your data will improve your neural network's learning speed. Try to apply the sklearn `MinMaxScaler` or `Normalizer` to your diabetes dataset. 

In [12]:
from sklearn.preprocessing import MinMaxScaler, Normalizer
from sklearn.model_selection import train_test_split

feats = list(diabetes)[:-1]  # List of features (All columns minus the outcome)

transformer = Normalizer().fit(diabetes[feats])  # Instantiate / Fit the Normalizer.
X = transformer.transform(diabetes[feats])  # Transform using the fit.
y = diabetes['Outcome'].values  # Defining our target.

# Train-Test-Split (This is needed to properly predict, no?)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

print(f'''Train Shapes (77%):
X (Features): {X_train.shape}
y (Target): {y_train.shape}
''')

print(f'''Test Shapes (33%):
X (Features): {X_test.shape}
y (Target): {y_test.shape}
''')

Train Shapes (77%):
X (Features): (514, 8)
y (Target): (514,)

Test Shapes (33%):
X (Features): (254, 8)
y (Target): (254,)



In [0]:
class Perceptron:
    
    def __init__(self, niter = 100):  # Constructor
        self.niter = niter
    
    def __sigmoid(self, x):  # Sigmoid Activation function
        return 1 / (1 + np.exp(-x))
    
    def __sigmoid_derivative(self, x):  # Sigmoid Derivative function
        sx = sigmoid(x)
        return sx * (1 - sx)

    def fit(self, X, y):  # This will be our fit method.
      """Fit training data
      X : Training vectors, X.shape : [#samples, #features]
      y : Target values, y.shape : [#samples]
      """

        # Randomly Initialize Weights
      self.weights = 2 * np.random.random((8,1)) - 1  # Using self because the weights will be needed in the predict method.

      for i in range(self.niter):
          # Weighted sum of inputs / weights using dot product.
            weighted_sum = np.dot(X, self.weights)
          # Activate function.
            activated_output = self.__sigmoid(weighted_sum)
          # Calculate error.
            error = y - activated_output
          # Make adjustments with sigmoid derivative function.
            adjustments = error * self.__sigmoid_derivative(weighted_sum)
          # Update the weights in accordance to adjustments.
            self.weights = self.weights + np.dot(X.T, adjustments)

      return self

    def predict(self, X):
          """Return class label after unit step"""
          weighted_sum = np.dot(X, self.weights)  # We're using the weighted sum again...
          activated_output = self.__sigmoid(weighted_sum)  #... to make predictions.
          return np.round(activated_output)  # Note to self: must use numpy to round ndarray - returns predicted values.

In [0]:
pn = Perceptron()
pn.fit(X_train, y_train)
y_pred = pn.predict(X_test)  # DISCUSS WITH BRANDON: But how would I go about measuring accuracy?

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?