# Intro to Neural Networks Assignment

## Define the Following:
You can add image, diagrams, whatever you need to ensure that you understand the concepts below.

### Input Layer:
Input layer contains a set of cells. Each cell represents one feature, and contains the value present in that feature for this observation.
### Hidden Layer:
Hidden layers are all the layers to the right of the input layer. These are called hidden, because we cannot directly influence them.
### Output Layer:
An output layer contains as many neurons as you expect to solve your particular machine-learning problem. The output layer is activated by an activation function specific to the problem. For example for regression, the activation function is just the identity function (you don't want to manipulate the values of your output). For binary classfication, you might use the sigmoid function to restrict values between 0 and 1. These values are then the probability of predicting the primary class. You can use ceil(output) to get the binary classification you need.
### Neuron:
In machine learning, a neuron is a function that takes multiple inputs and outputs one value. If the neuron is in the first hidden layer, it's inputs are all the feature values of the input layer + the bias value. If the neuron is in a deeper hidden layer, it's inputs are all the neurons to it's left + the bias value.
### Weight:
The weight is a number that is multiplied by the current neuron's value + bias to give the output of the neuron. Each neuron then essentially follows this equation: output = activation_function(sum(Weight * earlier_layer_value) + bias)
### Activation Function (also called Transfer Function):
An activation function takes a value and returns another value. It has to be differentiable, since we will differentiate it during the backpropagation step. You may choose different activation functions for different layers.
An activation function is applied after a neuron has weighed all input values, added them together, and then added a bias.

Common activation functions are the Sigmoid, tanh, step, and relu.
### Node Map:
A node map is a diagram that shows how our nodes in the neural network are arranged, and the connections between them. It helps us understand which kind of neural network we're looking at. Each type of map will have different properties, and we will get a better fit if we select the right kind.
### Perceptron:
The first and simplest kind of neural network that we could talk about is the perceptron. A perceptron is just a single node or neuron of a neural network with nothing else. It can take any number of inputs and spit out an output. What a neuron does is it takes each of the input values, multplies each of them by a weight, sums all of these products up, adds a bias, and then passes the sum through what is called an "activation function" the result of which is the final value.
### Bias:
Bias is a value added to shift the sum(weight * earlier_layer_values). Bias is also a variable, so we can control the y-intercept (ex. in a 1-feature regression application with a neural net).

## Inputs -> Outputs

### Explain the flow of information through a neural network from inputs to outputs. Be sure to include: inputs, weights, bias, and activation functions. How does it all flow from beginning to end?

Input layer (contains values of each feature)

     |
     v
     
Hidden layer 1 (contains values calculated using: activation_function(sum(input_layer * weight) + bias)

     |
     v

Hidden layer 2 (contains values calculated using: activation_function(sum(input_layer * weight) + bias)

    ...
    ...
    
Output layer (contains values calculated using: activation_function(sum(input_layer * weight) + bias)

## Write your own perceptron code that can correctly classify a NAND gate. 

| x1 | x2 | y |
|----|----|---|
| 0  | 0  | 1 |
| 1  | 0  | 1 |
| 0  | 1  | 1 |
| 1  | 1  | 0 |

In [4]:
import numpy as np
import pandas as pd

In [13]:
X = np.array([[0, 0], 
              [1, 0],
              [0, 1], 
              [1, 1]])
y = np.array([[1], 
              [1], 
              [1], 
              [0]])
df = pd.DataFrame(data=np.concatenate([X, y], axis=1)) # , 
                  # columns=['x1', 'x2', 'y'])
df

Unnamed: 0,0,1,2
0,0,0,1
1,1,0,1
2,0,1,1
3,1,1,0


In [34]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Explanation for why this is is given here:
# https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e
def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

In [37]:
inputs  = X
weights = 2 * np.random.random((2, 1)) - 1
correct_outputs = y

In [58]:
for iteration in range(10000):
  
  # Weighted sum of inputs and weights
  weighted_sum = np.dot(inputs, weights)
  
  # Activate with sigmoid function
  activated_output = sigmoid(weighted_sum)
  
  # Calculate Error
  error = correct_outputs - activated_output
  
  # Calculate weight adjustments with sigmoid_derivative
  adjustments = error * sigmoid_derivative(activated_output)
  
  # Update weights
  weights += np.dot(inputs.T, adjustments)
  
print('optimized weights after training: ')
print(weights)

np.set_printoptions(formatter={'float': lambda x: "{0:0.3f}".format(x)})
print('activated_output:', activated_output)

optimized weights after training: 
[[0.000000000000]
 [-0.000000000000]]
activated_output: [[0.500]
 [0.500]
 [0.500]
 [0.500]]


## Implement your own Perceptron Class and use it to classify a binary dataset like: 
- [The Pima Indians Diabetes dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv) 
- [Titanic](https://raw.githubusercontent.com/ryanleeallred/datasets/master/titanic.csv)
- [A two-class version of the Iris dataset](https://raw.githubusercontent.com/ryanleeallred/datasets/master/Iris.csv)

You may need to search for other's implementations in order to get inspiration for your own. There are *lots* of perceptron implementations on the internet with varying levels of sophistication and complexity. Whatever your approach, make sure you understand **every** line of your implementation and what its purpose is.

In [65]:
class Perceptron:
    def __init__(self, inputs, weights, bias, outputs):
        self.inputs = inputs
        self.weights = weights
        self.outputs = outputs
    
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    # Explanation for why this is is given here:
    # https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e
    def sigmoid_derivative(x):
        return sigmoid(x) * (1 - sigmoid(x))
    
    def iterate(self, num_iters):
        for iteration in range(num_iters):

          # Weighted sum of inputs and weights
          weighted_sum = np.dot(self.inputs, 
                                self.weights) + bias

          # Activate with sigmoid function
          activated_output = sigmoid(weighted_sum)

          # Calculate Error
          error = self.outputs - activated_output

          # Calculate weight adjustments with sigmoid_derivative
          adjustments = error * sigmoid_derivative(activated_output)

          # Update weights
          self.weights += np.dot(self.inputs.T, adjustments)

        print('optimized weights after training: ')
        print(weights)

        np.set_printoptions(formatter={'float': lambda x: "{0:0.3f}".format(x)})
        print('activated_output:', activated_output)

        return activated_output
    
inputs  = X
weights = 2 * np.random.random((2, 1)) - 1
bias    = 1
correct_outputs = y
perceptron = Perceptron(inputs, weights, bias, correct_outputs)
perceptron.iterate(10000)

optimized weights after training: 
[[-0.690]
 [-0.690]]
activated_output: [[0.731]
 [0.577]
 [0.577]
 [0.406]]


array([[0.731],
       [0.577],
       [0.577],
       [0.406]])

## Stretch Goals:

- Research "backpropagation" to learn how weights get updated in neural networks (tomorrow's lecture). 
- Implement a multi-layer perceptron. (for non-linearly separable classes)
- Try and implement your own backpropagation algorithm.
- What are the pros and cons of the different activation functions? How should you decide between them for the different layers of a neural network?