# Intro to Neural Networks Assignment

## Define the Following:
You can add images, diagrams, or whatever else you need to make sure you understand the concepts below.

### Input Layer: The layer that receives input from the dataset. It is sometimes called the visible layer because it is the only layer that is exposed to the data directly.
### Hidden Layer(s): The layers that come after the input layer but before the output layer. Hidden layers are not exposed to the data directly; they can only be reached through the input layer.
### Output Layer: This is the final layer. The output layer provides a vector of values. Usually it's modified by an activation function to put it into a format that works for our context. 
### Neuron: A basic unit of computation in a neural network, also called a node or unit. It receives input from other nodes or from an external source and computes an output.
### Weight: A value attached to each input that reflects its importance relative to the other inputs. Each input is multiplied by its weight before the inputs are summed.
### Activation Function: A function that takes a node's weighted sum and performs a mathematical operation on it to produce the node's output. It introduces non-linearity into the output, which matters because most real-world relationships are not linear.
### Node Map: A pictorial representation of our input, weights, hidden layers, output layers, bias, etc.
### Perceptron: A binary linear classifier. Each input is multiplied by its weight, the products are added together to form the weighted sum, and the weighted sum is passed through the activation function. The result classifies the input into one of two classes.
### Bias: Similar to the intercept in a linear equation. It's an additional parameter used to adjust the output along with the weighted sums from the input. Bias is a constant that helps the model find the best fit for the data.
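
To tie the definitions above together, here is a minimal sketch of a single neuron. The helper name `neuron_output` and the example numbers are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, activation=sigmoid):
    """Hypothetical helper: one neuron = activation(weighted sum + bias)."""
    weighted_sum = np.dot(x, w) + b  # each input scaled by its weight, plus the bias
    return activation(weighted_sum)

# Illustrative values only
x = np.array([1.0, 0.0])    # inputs
w = np.array([0.5, -0.5])   # weights
b = 0.0                     # bias

out = neuron_output(x, w, b)
print(out)
```

Raising the bias shifts the weighted sum upward, so the same inputs and weights produce a larger output.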


## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

Information flows through the network the same way whether the network is being trained or has already been trained; the difference is that during training the weights are adjusted after each pass. Information is fed into the network through the input units. Each input is multiplied by the weight of its connection, which reflects its importance, and the products are summed. A bias is added to the weighted sum to help the model find the best fit for the data. The result is passed through an activation function, which produces the output of that node. Each layer of hidden units receives its input from the layer to its left and repeats the same steps, until the output layer produces the final output. During training, the weights are gradually adjusted to reduce the error between the network's output and the correct output.
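
The flow described above can be sketched as a forward pass through a tiny network. The layer sizes (2 inputs, 3 hidden units, 1 output), the random weights, and the zero biases are all assumptions chosen for illustration; nothing here is trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Four example rows with two input features each
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Assumed shapes: 2 inputs -> 3 hidden units -> 1 output
W_hidden = rng.uniform(-1, 1, size=(2, 3))
b_hidden = np.zeros(3)
W_output = rng.uniform(-1, 1, size=(3, 1))
b_output = np.zeros(1)

# Inputs -> weighted sum + bias -> activation -> next layer
hidden = sigmoid(X @ W_hidden + b_hidden)
output = sigmoid(hidden @ W_output + b_output)

print(hidden.shape)
print(output.shape)
```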

## Write your own perceptron code that can correctly classify a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |
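
Before training anything, it can help to confirm that NAND is linearly separable, meaning a single perceptron can represent it. The sketch below uses hand-picked weights and a step activation; the specific values (`w = [-2, -2]`, `b = 3`) are an assumption for illustration and are not learned.

```python
import numpy as np

def step(z):
    """Step activation: 1 when z is positive, otherwise 0."""
    return (z > 0).astype(int)

# Rows follow the truth table above: (x1, x2)
X = np.array([[0, 0],
              [1, 0],
              [0, 1],
              [1, 1]])

# Hand-picked values, not learned
w = np.array([-2, -2])
b = 3

predictions = step(X @ w + b)
print(predictions)  # [1 1 1 0]
```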

In [40]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [41]:

# Each row is [bias input, x1, x2]; the leading 1. is a constant bias input
inputs = np.array([[1., 0., 0.],
                   [1., 1., 0.],
                   [1., 0., 1.],
                   [1., 1., 1.]])

# Ideal outputs
correct_outputs = np.array([[1.],
                            [1.],
                            [1.],
                            [0.]])

In [42]:
def sigmoid(x):
    return 1/(1+np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
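
As a sanity check on the derivative, the sketch below compares `sigmoid_derivative` with a finite-difference estimate of the slope, and with the shortcut `a * (1 - a)` where `a` is an already-activated output. The helper name `sigmoid_derivative_from_output` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of sigmoid with respect to its pre-activation input x."""
    return sigmoid(x) * (1 - sigmoid(x))

def sigmoid_derivative_from_output(a):
    """Hypothetical helper: the same derivative, given a = sigmoid(x)."""
    return a * (1 - a)

x = np.array([-2.0, 0.0, 0.5, 3.0])
a = sigmoid(x)

# Central finite-difference estimate of the slope at each x
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

matches_on_input = np.allclose(sigmoid_derivative(x), numeric)
matches_on_output = np.allclose(sigmoid_derivative_from_output(a), numeric)
matches_when_applied_twice = np.allclose(sigmoid_derivative(a), numeric)

print(matches_on_input, matches_on_output, matches_when_applied_twice)
```

Passing an already-activated value into `sigmoid_derivative` applies the sigmoid a second time, so it no longer matches the true slope. The factor is still positive, so updates that use it move in the right direction, just with a different step size.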

In [43]:
# Random initial weights in [-1, 1): one for the bias input, one each for x1 and x2
weights = 2 * np.random.random((3,1))-1
weights

array([[ 0.17785041],
       [ 0.09977725],
       [-0.06211495]])

In [44]:
weighted_sum = np.dot(inputs, weights)
weighted_sum

array([[0.17785041],
       [0.27762766],
       [0.11573546],
       [0.21551271]])

In [45]:
activated_output = sigmoid(weighted_sum)
activated_output

array([[0.54434577],
       [0.56896452],
       [0.52890161],
       [0.55367061]])

In [47]:
error = correct_outputs - activated_output
error

array([[ 0.45565423],
       [ 0.43103548],
       [ 0.47109839],
       [-0.55367061]])

In [48]:
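# Scale the error by the sigmoid slope (gradient descent step).
# Note: activated_output has already been through sigmoid, so this call applies sigmoid a second time.
adjustments = error * sigmoid_derivative(activated_output)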
adjustments

array([[ 0.10587496],
       [ 0.09948775],
       [ 0.10990745],
       [-0.12832898]])

In [49]:
weights += np.dot(inputs.T, adjustments)
weights

array([[ 0.36479158],
       [ 0.07093602],
       [-0.08053648]])

In [51]:
for iteration in range(10000):

    # Weighted sum of inputs and weights
    weighted_sum = np.dot(inputs, weights)

    # Activate with sigmoid function
    activated_output = sigmoid(weighted_sum)

    # Calculate error
    error = correct_outputs - activated_output

    # Calculate weight adjustments with sigmoid_derivative
    adjustments = error * sigmoid_derivative(activated_output)

    # Update weights
    weights += np.dot(inputs.T, adjustments)
  
print('Optimized weights after training: ')
print(weights)

print("Output After Training:")
print(activated_output)

Optimized weights after training: 
[[ 19.89513933]
 [-13.2310983 ]
 [-13.2310983 ]]
Output After Training:
[[1.        ]
 [0.99872558]
 [0.99872558]
 [0.00140403]]
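
The trained outputs above are values between 0 and 1 rather than class labels. A minimal sketch of turning them into NAND predictions follows; the 0.5 cutoff is an assumed decision rule, and the weights are copied from the printed result above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Same inputs as above; the first column is the constant bias input
inputs = np.array([[1., 0., 0.],
                   [1., 1., 0.],
                   [1., 0., 1.],
                   [1., 1., 1.]])

# Weights copied from the printed training result
trained_weights = np.array([[19.89513933],
                            [-13.2310983],
                            [-13.2310983]])

probabilities = sigmoid(inputs @ trained_weights)

# Assumed decision rule: predict 1 when the output is at least 0.5
predictions = (probabilities >= 0.5).astype(int)
print(predictions.ravel())  # [1 1 1 0]
```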


## Implement your own Perceptron Class and use it to classify a binary dataset like: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 
- [Titanic](https://raw.githubusercontent.com/ryanleeallred/datasets/master/titanic.csv)
- [A two-class version of the Iris dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/Iris.csv)

You may need to search for others' implementations to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [None]:
##### Your Code Here #####
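
One possible starting point is sketched below, using the classic perceptron learning rule with a step activation. The class layout, the default `learning_rate` and `n_epochs`, and the synthetic two-cluster data are all assumptions for illustration. Loading one of the linked CSVs and choosing its feature columns is left out, and those datasets may not be linearly separable, so accuracy on them will likely be lower than on this toy data.

```python
import numpy as np

class Perceptron:
    """Minimal perceptron sketch using the classic perceptron learning rule."""

    def __init__(self, learning_rate=0.1, n_epochs=50):
        self.learning_rate = learning_rate
        self.n_epochs = n_epochs
        self.weights = None
        self.bias = 0.0

    def predict(self, X):
        """Return 1 where the weighted sum plus bias is positive, else 0."""
        return (np.dot(X, self.weights) + self.bias > 0).astype(int)

    def fit(self, X, y):
        """Nudge the weights toward each misclassified example."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=int)
        self.weights = np.zeros(X.shape[1])
        self.bias = 0.0
        for _ in range(self.n_epochs):
            errors = 0
            for xi, target in zip(X, y):
                # update is 0 when the prediction is already correct
                update = self.learning_rate * (target - self.predict(xi))
                if update != 0:
                    self.weights += update * xi
                    self.bias += update
                    errors += 1
            if errors == 0:  # every example classified correctly
                break
        return self

# Synthetic stand-in data: two well-separated clusters (not a real dataset)
rng = np.random.default_rng(42)
class_0 = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
class_1 = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
X = np.vstack([class_0, class_1])
y = np.array([0] * 20 + [1] * 20)

model = Perceptron().fit(X, y)
accuracy = (model.predict(X) == y).mean()
print(accuracy)
```

Unlike the sigmoid version in the NAND section, this sketch keeps the bias as a separate attribute instead of a column of ones, and it only changes the weights when a prediction is wrong.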

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron (for classes that are not linearly separable).
- Try to implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?
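
As a starting point for the last question, here is a small sketch that evaluates three common activation functions on the same inputs. The choice of these three functions and of the sample values is an assumption; other activations exist.

```python
import numpy as np

def sigmoid(z):
    """Output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Output in (-1, 1), centered on zero."""
    return np.tanh(z)

def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, z)

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, np.round(fn(z), 3))
```

Sigmoid and tanh flatten out for large positive or negative inputs, while ReLU keeps growing for positive inputs and is exactly zero for negative ones. Sigmoid's (0, 1) range is one reason it is often used on a binary output layer.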