<a href="https://colab.research.google.com/github/TedHaley/courses/blob/master/introduction_to_deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Introduction to Neural Networks

Overview:  
Models such as linear regression can be used to make predictions. Linear regression does so by learning a weight for each input variable and then summing the weighted variables, e.g.:


y = m1X1 + m2X2 + b. 

salary = 10,000 * years_exp + 50,000


Here m1 and m2 are the weights and b is the intercept. Linear models can also include interactions between variables, such as:


y = (m1X1 * m2X2) + b. 

salary = (10,000 * years_exp * 1.1 * age) + 50,000


This is a very basic example of an interaction between variables. Neural networks are similar but take the idea further: they model many interactions between many inputs, and then use those interactions as inputs to further interactions.
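
To make the difference concrete, the two salary formulas above can be evaluated side by side. This is only a minimal sketch: the function names `additive_salary` and `interaction_salary` and the sample inputs are assumptions introduced here for illustration.

```python
def additive_salary(years_exp):
    """Linear model without interaction: 10,000 * years_exp + 50,000."""
    return 10_000 * years_exp + 50_000


def interaction_salary(years_exp, age):
    """Model with an interaction: the effect of years_exp scales with age."""
    return (10_000 * years_exp * 1.1 * age) + 50_000


# Hypothetical inputs, chosen only for illustration.
print(additive_salary(5))
print(interaction_salary(5, 30))

# In the interaction model, one extra year of experience is worth more
# at a higher age, because the two variables are multiplied together.
gain_at_30 = interaction_salary(6, 30) - interaction_salary(5, 30)
gain_at_40 = interaction_salary(6, 40) - interaction_salary(5, 40)
print(gain_at_30 < gain_at_40)
```

In the additive model the gain from one extra year is always the same, whereas in the interaction model it depends on the value of the other variable.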

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/280px-Colored_neural_network.svg.png)

As the number of nodes increases, so does the network's ability to capture interactions.

![alt text](https://drive.google.com/uc?id=1fdjoM2tBhsQektdjG1882m5TV9FD9rkg)

Forward Propagation:  
Forward propagation passes the input values through the network along the weighted lines between nodes. The value of each node is the sum of the values of the nodes feeding into it, each multiplied by the weight on its line. This operation is a dot product.

Example:

![alt text](https://drive.google.com/uc?id=1eJ5OLqSjIDluwHcomk2tYZbyGHVK_IWi)

hidden layer node 0 = (2 * 1) + (3 * 1) = 5





In [3]:
import numpy as np

input_data = np.array([2, 3])

weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1]),
}

# [2, 3] * [1, 1] = (2 * 1) + (3 * 1) = 5 
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()
output_value = (np.array([node_0_value, node_1_value]) * weights['output']).sum()

print('node 0:', node_0_value)
print('node 1:', node_1_value)
print('Output:', output_value)

node 0: 5
node 1: 1
Output: 9
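
Because each node value is a dot product, a whole layer can also be computed as one matrix-vector product. The sketch below assumes the same inputs and weights as the cell above; the names `hidden_weights` and `output_weights` are introduced here for illustration and do not appear in the original notebook.

```python
import numpy as np

input_data = np.array([2, 3])

# Each row holds the weights feeding one hidden node (node_0, then node_1).
hidden_weights = np.array([[1, 1],
                           [-1, 1]])
output_weights = np.array([2, -1])

# One matrix-vector product computes both hidden node values at once.
hidden_values = hidden_weights @ input_data

# The output node is the dot product of the hidden values and its weights.
output_value = output_weights @ hidden_values

print('hidden layer:', hidden_values)
print('Output:', output_value)
```

This gives the same hidden values (5 and 1) and the same output (9) as the node-by-node version, and it scales to layers with many nodes without extra code.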


Activation Function:  
Activation functions are applied at the nodes of the hidden layers and allow the network to capture non-linearities. One popular activation function is ReLU (rectified linear activation), which is 0 for negative inputs and equal to x for positive inputs. In the example below, we use tanh as the activation function.

![alt text](https://drive.google.com/uc?id=1R_aqYRh66slQSt3YY0n1wWRTOtF8DCRa)
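
As a quick comparison of the two activation functions mentioned above, the following sketch applies each to a handful of sample values. The sample array `x` is an assumption made for illustration; `np.maximum` and `np.tanh` are standard NumPy functions.

```python
import numpy as np

# Hypothetical sample inputs spanning negative and positive values.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# ReLU: 0 for negative inputs, unchanged for positive inputs.
relu_values = np.maximum(0, x)

# tanh: squashes every input into the open interval (-1, 1).
tanh_values = np.tanh(x)

print('x:   ', x)
print('ReLU:', relu_values)
print('tanh:', tanh_values)
```

ReLU leaves large positive values untouched, while tanh flattens out toward 1 and -1 as the input grows in magnitude.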






In [4]:
import numpy as np

input_data = np.array([2, 3])

weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1]),
}

node_0_input = (input_data * weights['node_0']).sum()
node_0_output = np.tanh(node_0_input)

node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)

# Combine the outputs of both hidden nodes to get the model output
output = (np.array([node_0_output, node_1_output]) * weights['output']).sum()

print('node 0:', node_0_output)
print('node 1:', node_1_output)
print(f'Output: {output:.4f}')

node 0: 0.9999092042625951
node 1: 0.7615941559557649
Output: 1.2382


In [5]:
# Using ReLU function

def relu(x):
    '''Return 0 for negative inputs and x otherwise.'''
    # max(0, x) clips negative values to zero and leaves positive values unchanged
    return max(0, x)

# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

9
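
One way to see why the activation function matters is to compare the network with and without it. The sketch below reuses the weights from the cells above, stacked into arrays under the assumed names `hidden_weights` and `output_weights`; the helper `relu_network` is likewise hypothetical.

```python
import numpy as np

# Same weights as the cells above, stacked into arrays.
hidden_weights = np.array([[1, 1],
                           [-1, 1]])
output_weights = np.array([2, -1])
x = np.array([2, 3])

# Without an activation, two layers equal one layer with combined weights.
two_layers = output_weights @ (hidden_weights @ x)
combined_weights = output_weights @ hidden_weights
one_layer = combined_weights @ x


def relu_network(v):
    """Forward pass with ReLU applied to the hidden layer only."""
    return output_weights @ np.maximum(0, hidden_weights @ v)


print('two linear layers: ', two_layers)
print('one combined layer:', one_layer)
print('ReLU network at x and -x:', relu_network(x), relu_network(-x))
```

A purely linear model would return -9 for the negated input. The ReLU network returns 0 instead, because both hidden nodes are clipped to zero, and this is the kind of non-linear behavior the hidden layer can capture.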


Deep Networks  
 - Deep networks internally build representations of patterns in the data
 - Partially replace the need for feature engineering
 - Subsequent layers build increasingly sophisticated representations of the raw data

![alt text](https://drive.google.com/uc?id=1QMVRnZHgG1FPYHgJ_c5_X1nj3Lws3ffP)

In [8]:
# Multi-layer networks
def predict_with_network(input_data, weights):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node_0_1']).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
    
    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()
    
    # Return model_output
    return model_output

weights = {
    'node_0_0': np.array([2, 4]),
    'node_0_1': np.array([4, -5]),
    'node_1_0': np.array([-1, 2]),
    'node_1_1': np.array([1, 2]),
    'output': np.array([2, 7]),
}
input_data = np.array([3, 5])
output = predict_with_network(input_data, weights)
print(output)

182
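
The same two-hidden-layer computation can be written as a loop over weight matrices, which extends to any number of layers. This is a sketch rather than the notebook's own method: the function `forward` and the list `layer_weights` are names assumed here, and the weights are those from the cell above with one row per node.

```python
import numpy as np


def forward(input_data, layer_weights, output_weights):
    """Forward pass through any number of ReLU hidden layers."""
    activations = input_data
    for layer in layer_weights:
        # Each row of `layer` holds the weights feeding one node.
        activations = np.maximum(0, layer @ activations)
    # No activation is applied to the output node.
    return output_weights @ activations


layer_weights = [
    np.array([[2, 4], [4, -5]]),   # node_0_0, node_0_1
    np.array([[-1, 2], [1, 2]]),   # node_1_0, node_1_1
]
output_weights = np.array([2, 7])

result = forward(np.array([3, 5]), layer_weights, output_weights)
print(result)
```

For the input `[3, 5]` this reproduces the value 182 computed node by node above.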


Training a Network:  
Neural networks are trained using labelled data. The weights on the lines are adjusted until the network's output is close to the desired output.

![alt text](https://drive.google.com/uc?id=1d65zM3Re55BS62mzqssqJS9XJxlvFaLJ)

We use backpropagation to work backward through the network and re-weight the lines to get the desired output.

![alt text](https://drive.google.com/uc?id=12gM57rbKz0VQ9jYzyyDuNprv0Vy7Bn-e)

The challenge arises when a single, fixed set of weights has to make accurate predictions for many data points. Each prediction has its own error.

Loss Function  
The loss function aggregates the errors of many predictions into a single number, which measures the model's predictive performance.

A common loss function is Mean Squared Error (MSE), where we square each error and take the average of the squared errors. We need to optimize the weights of the model to minimize the loss function.
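
The MSE calculation described above can be sketched in a few lines of NumPy. The function name `mean_squared_error` and the sample predictions and targets are assumptions made for illustration only.

```python
import numpy as np


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    errors = np.asarray(predictions) - np.asarray(targets)
    return np.mean(errors ** 2)


# Hypothetical predictions and labels, for illustration only.
predictions = np.array([9.0, 5.0, 2.0])
targets = np.array([10.0, 5.0, 4.0])

mse = mean_squared_error(predictions, targets)
print(mse)
```

Here the individual errors are -1, 0 and -2, so the squared errors are 1, 0 and 4 and their average is 5/3. Squaring means that large errors are penalized more heavily than small ones, and that positive and negative errors do not cancel.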

![alt text](https://drive.google.com/uc?id=1eKpqtdRmYIyBfiaC9LSpGzRofsbODznL)

A simple method for optimizing the loss function is gradient descent. We take the derivative of the loss at the current weights to find the slope, then take a step in the opposite direction of the slope to move toward a minimum.

![alt text](https://drive.google.com/uc?id=1V-gP00xaaRqOSzc6TR7YSsVrbMPbAjhN)
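
The procedure can be sketched for the simplest possible case: a model with no hidden layer, one training example, and squared error as the loss. Everything in this example (the input, target, starting weights, `learning_rate`, and the number of steps) is an assumption chosen for illustration.

```python
import numpy as np

# Hypothetical single training example and starting weights.
input_data = np.array([1.0, 2.0, 3.0])
target = 0.0
weights = np.array([0.0, 2.0, 1.0])
learning_rate = 0.01


def squared_error(w):
    """Loss for one example: (prediction - target) squared."""
    return (w @ input_data - target) ** 2


losses = [squared_error(weights)]
for _ in range(20):
    error = weights @ input_data - target
    # Derivative of (w . x - target)^2 with respect to w is 2 * error * x.
    gradient = 2 * error * input_data
    # Step against the slope to reduce the loss.
    weights = weights - learning_rate * gradient
    losses.append(squared_error(weights))

print('first loss:', losses[0])
print('last loss:', losses[-1])
```

The loss starts at 49 and shrinks at every step. A learning rate that is too large would overshoot the minimum, which is why the step size is kept small here.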

