<a href="https://colab.research.google.com/github/TedHaley/courses/blob/master/introduction_to_deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Introduction to Neural Networks

Overview:  
A model such as linear regression makes predictions by assigning a weight to each input and then adding the weighted inputs together, for example:


y = m1X1 + m2X2 + b. 

salary = 10,000 * years_exp + 50,000


Here m1 and m2 are the weights and b is the intercept. A model can also include interacting variables, such as:


y = (m1X1 * m2X2) + b. 

salary = (10,000 * years_exp * 1.1 * age) + 50,000


This is a very basic example of an interaction between variables. Neural networks build on the same idea: they model many interactions between many inputs, and then use those interactions as inputs to further interactions.
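
As a minimal sketch, the two salary formulas above can be written as Python functions. The function names and the sample inputs (5 years of experience, age 30) are illustrative assumptions and do not come from the original example.

```python
def salary_linear(years_exp, m1=10_000, b=50_000):
    """Additive model: the input contributes independently of anything else."""
    return m1 * years_exp + b


def salary_interaction(years_exp, age, m1=10_000, m2=1.1, b=50_000):
    """Interaction model: the effect of experience depends on age."""
    return (m1 * years_exp) * (m2 * age) + b


linear_prediction = salary_linear(5)
interaction_prediction = salary_interaction(5, 30)

print(linear_prediction)
print(interaction_prediction)
```

In the additive model, one more year of experience always adds the same amount. In the interaction model, the value of an extra year changes with age, which is the kind of relationship a neural network can learn.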

![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/280px-Colored_neural_network.svg.png)

As the number of nodes increases, so does the network's ability to capture interactions.

![alt text](https://drive.google.com/uc?id=1fdjoM2tBhsQektdjG1882m5TV9FD9rkg)

Forward Propagation:  
Forward propagation computes the network's output from its inputs. Each line between two nodes carries a weight, and the value of a node is the sum of the values of the nodes feeding into it, each multiplied by the weight on its connecting line. This operation is a dot product.

Example:

![alt text](https://drive.google.com/uc?id=1eJ5OLqSjIDluwHcomk2tYZbyGHVK_IWi)

hidden layer node 0 = (2 * 1) + (3 * 1) = 5





In [3]:
import numpy as np

input_data = np.array([2, 3])

weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1]),
}

# [2, 3] * [1, 1] = (2 * 1) + (3 * 1) = 5 
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()
output_value = (np.array([node_0_value, node_1_value]) * weights['output']).sum()

print('node 0:', node_0_value)
print('node 1:', node_1_value)
print('Output:', output_value)

node 0: 5
node 1: 1
Output: 9
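
Because each node value is a dot product, the same forward pass can also be sketched with matrix multiplication. This is one possible layout, assuming the weights for each hidden node are stacked as rows of a matrix. The names `hidden_weights` and `output_weights` are introduced here for illustration only.

```python
import numpy as np

input_data = np.array([2, 3])

# Each row holds the weights feeding one hidden node (same values as above).
hidden_weights = np.array([[1, 1],
                           [-1, 1]])
output_weights = np.array([2, -1])

hidden_values = hidden_weights @ input_data    # array([5, 1])
output_value = output_weights @ hidden_values  # 9

print(hidden_values, output_value)
```

Stacking the weights this way gives the same numbers as the node-by-node version and scales to layers with many nodes without extra code.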


Activation Function:  
Activation functions are applied at the nodes of the hidden layers and allow the network to capture non-linearities. One popular activation function is ReLU (rectified linear unit), which is 0 for negative inputs and equal to the input otherwise. In the example below, we use tanh as the activation function.

![alt text](https://drive.google.com/uc?id=1R_aqYRh66slQSt3YY0n1wWRTOtF8DCRa)
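
To compare the two activation functions mentioned above, the following sketch applies both to a few sample values. The helper name `relu_vec` and the sample inputs are assumptions made for this illustration.

```python
import numpy as np


def relu_vec(x):
    """Element-wise ReLU: 0 for negative inputs, x otherwise."""
    return np.maximum(0, x)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_values = relu_vec(x)
tanh_values = np.tanh(x)

print('relu:', relu_values)
print('tanh:', tanh_values)
```

ReLU passes positive values through unchanged and clips negative values to 0, while tanh squashes every input into the range between -1 and 1.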






In [4]:
import numpy as np

input_data = np.array([2, 3])

weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1]),
}

node_0_input = (input_data * weights['node_0']).sum()
node_0_output = np.tanh(node_0_input)

node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)

# Combine both hidden-node outputs (not node_1_input) to form the model output
output = (np.array([node_0_output, node_1_output]) * weights['output']).sum()

print('node 0:', node_0_output)
print('node 1:', node_1_output)
print(f'Output: {output:.4f}')

node 0: 0.9999092042625951
node 1: 0.7615941559557649
Output: 1.2382


In [5]:
# Using ReLU function

def relu(x):
    '''Return 0 for negative inputs and x otherwise.'''
    return max(0, x)

# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

9


Deep Networks  
 - Deep networks internally build representations of patterns in the data
 - Partially replace the need for feature engineering
 - Subsequent layers build increasingly sophisticated representations of raw data

![alt text](https://drive.google.com/uc?id=1QMVRnZHgG1FPYHgJ_c5_X1nj3Lws3ffP)

In [8]:
# Multi-layer networks
def predict_with_network(input_data, weights):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node_0_1']).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
    
    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()
    
    # Return model_output
    return(model_output)

weights = {
    'node_0_0': np.array([2, 4]),
    'node_0_1': np.array([ 4, -5]),
    'node_1_0': np.array([-1,  2]),
    'node_1_1': np.array([1, 2]),
    'output': np.array([2, 7])
 }
input_data = np.array([3, 5])
output = predict_with_network(input_data, weights)
print(output)

182
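
The node-by-node function above can be generalized to any number of hidden layers. The following is a sketch under the assumption that each layer's weights are stored as a matrix with one row per node. The function name `forward` and the `layers` list are hypothetical and not part of the original notebook.

```python
import numpy as np


def forward(input_data, layers, output_weights):
    """Run a ReLU network with any number of hidden layers."""
    activations = np.asarray(input_data)
    for layer in layers:
        # Each row of `layer` holds the weights feeding one node.
        activations = np.maximum(0, layer @ activations)
    # No activation is applied to the output node.
    return output_weights @ activations


layers = [
    np.array([[2, 4], [4, -5]]),   # node_0_0, node_0_1
    np.array([[-1, 2], [1, 2]]),   # node_1_0, node_1_1
]
output_weights = np.array([2, 7])

result = forward(np.array([3, 5]), layers, output_weights)
print(result)  # 182
```

Adding a third hidden layer then only requires appending another matrix to `layers`.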


Training a Network:  
Neural networks are trained using labelled data. The weights on the lines are adjusted until the network produces the desired output.

![alt text](https://drive.google.com/uc?id=1d65zM3Re55BS62mzqssqJS9XJxlvFaLJ)

We use backpropagation to work backwards through the network and re-weight the lines to get the desired output.

![alt text](https://drive.google.com/uc?id=12gM57rbKz0VQ9jYzyyDuNprv0Vy7Bn-e)
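
To make the re-weighting concrete, here is a minimal sketch of backpropagation for a tiny network with one input, one hidden ReLU node and one output. The network, the squared-error loss and all numeric values are assumptions chosen for illustration. The analytic slopes from the chain rule are checked against a finite-difference estimate.

```python
def loss(w1, w2, x, target):
    """Squared error of a network with one hidden ReLU node."""
    hidden = max(0.0, w1 * x)
    return (w2 * hidden - target) ** 2


def gradients(w1, w2, x, target):
    """Slopes of the loss with respect to w1 and w2 via the chain rule."""
    pre_activation = w1 * x
    hidden = max(0.0, pre_activation)
    error = w2 * hidden - target
    relu_slope = 1.0 if pre_activation > 0 else 0.0
    grad_w2 = 2 * error * hidden
    # The error flows back through w2 and the ReLU before reaching w1.
    grad_w1 = 2 * error * w2 * relu_slope * x
    return grad_w1, grad_w2


w1, w2, x, target = 0.5, 2.0, 3.0, 10.0
grad_w1, grad_w2 = gradients(w1, w2, x, target)

# Finite-difference check of both slopes.
eps = 1e-6
numeric_w1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
numeric_w2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)

print(grad_w1, numeric_w1)
print(grad_w2, numeric_w2)
```

Both slopes are negative here, so increasing either weight would reduce the error for this data point.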

The challenge arises when we try to make accurate predictions for many data points with a single set of weights. Each prediction has its own error.

Loss Function  
The loss function aggregates the errors from many predictions into a single number, which serves as a measure of the model's predictive performance.

A common loss function is Mean Squared Error (MSE): we square each error and take the average of the squared errors. Training adjusts the weights of the model to minimize the loss function.

![alt text](https://drive.google.com/uc?id=1eKpqtdRmYIyBfiaC9LSpGzRofsbODznL)
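
As a small worked example, MSE can be computed directly with NumPy. The prediction and target values below are illustrative assumptions.

```python
import numpy as np

predictions = np.array([9, 9, 0, 12])   # hypothetical model outputs
targets = np.array([1, 3, 5, 7])        # hypothetical true values

errors = predictions - targets           # [8, 6, -5, 5]
mse = np.mean(errors ** 2)               # (64 + 36 + 25 + 25) / 4

print(mse)  # 37.5
```

Squaring makes every error positive, so overestimates and underestimates cannot cancel each other out.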


Optimize the Loss Function  
A simple method for minimizing the loss function is gradient descent. We take the derivative of the loss at the current point to find the slope, then take a step in the direction opposite to the slope to move toward a minimum.

![alt text](https://drive.google.com/uc?id=1V-gP00xaaRqOSzc6TR7YSsVrbMPbAjhN)

We need to make sure our steps are not too large, or we might step over the minimum. This is why the slope is multiplied by a learning rate (often 0.01):

step size = learning rate * slope
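
The effect of the learning rate can be seen on a simple one-dimensional loss. This sketch assumes the loss (w - 3)^2, whose minimum is at w = 3. The loss function, the helper `descend` and the step counts are chosen purely for illustration.

```python
def slope(w):
    """Derivative of the illustrative loss (w - 3) ** 2."""
    return 2.0 * (w - 3.0)


def descend(learning_rate, steps, w=0.0):
    """Repeatedly step against the slope, starting from w."""
    for _ in range(steps):
        w -= learning_rate * slope(w)
    return w


w_small = descend(learning_rate=0.01, steps=1000)
w_large = descend(learning_rate=1.1, steps=20)

print(round(w_small, 3))  # 3.0
print(w_large)
```

With a learning rate of 0.01 the weight settles at the minimum. With a learning rate of 1.1 every step overshoots by more than the previous one, so the weight moves further away from 3 instead of converging.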

Slope Calculation Example:  
 - Node 1 (input) = 3
 - weight = 2
 - Node 2 (prediction) = 3 * 2 = 6
 - Actual target = 10

slope = 2 * error * input = 2 * (6 - 10) * 3 = -24

new weight = weight - learning rate * slope = 2 - 0.01 * (-24) = 2.24
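
The arithmetic of this example can be checked with a few lines of Python. The variable names are assumptions introduced for this sketch.

```python
input_value = 3      # Node 1
weight = 2
target = 10
learning_rate = 0.01

prediction = input_value * weight            # Node 2 = 6
error = prediction - target                  # -4
slope = 2 * error * input_value              # -24
new_weight = weight - learning_rate * slope  # roughly 2.24

print(slope, round(new_weight, 2))  # -24 2.24
```

The slope is negative, so subtracting it increases the weight and moves the prediction closer to the target.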




In [11]:
# Coding how weight changes affect accuracy

def predict_with_network(input_data, weights):
    # Calculate node 0 in the first hidden layer
    node_0_input = (input_data * weights['node_0']).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 in the first hidden layer
    node_1_input = (input_data * weights['node_1']).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_0_output, node_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()
    
    # Return model_output
    return(model_output)

# The data point you will make a prediction for
input_data = np.array([0, 3])

# Sample weights
weights_0 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 1]
            }

# The actual target value, used to calculate the error
target_actual = 3

# Make prediction using original weights
model_output_0 = predict_with_network(input_data, weights_0)

# Calculate error: error_0
error_0 = model_output_0 - target_actual

# Create weights that cause the network to make perfect prediction (3): weights_1
weights_1 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 0]
            }

# Make prediction using new weights: model_output_1
model_output_1 = predict_with_network(input_data, weights_1)

# Calculate error: error_1
error_1 = model_output_1 - target_actual

# Print error_0 and error_1
print(error_0)
print(error_1)

6
0


In [14]:
# Scaling up to multiple data points

from sklearn.metrics import mean_squared_error

weights_0 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 1]
            }

weights_1 = {'node_0': [2, 1],
             'node_1': [1, 1.5],
             'output': [1, 1.5]
            }

input_data = [np.array([0, 3]), np.array([1, 2]), np.array([-1, -2]), np.array([4, 0])]
target_actuals = [1, 3, 5, 7]

# Create model_output_0 
model_output_0 = []
# Create model_output_1
model_output_1 = []

# Loop over input_data
for row in input_data:
    # Append prediction to model_output_0
    model_output_0.append(predict_with_network(row, weights_0))
    
    # Append prediction to model_output_1
    model_output_1.append(predict_with_network(row, weights_1))

# Calculate the mean squared error for model_output_0: mse_0
# (scikit-learn expects the true values first, then the predictions)
mse_0 = mean_squared_error(target_actuals, model_output_0)

# Calculate the mean squared error for model_output_1: mse_1
mse_1 = mean_squared_error(target_actuals, model_output_1)

# Print mse_0 and mse_1
print("Mean squared error with weights_0: %f" %mse_0)
print("Mean squared error with weights_1: %f" %mse_1)

Mean squared error with weights_0: 37.500000
Mean squared error with weights_1: 49.890625


Calculate slope and update weights

Target is 6

![alt text](https://drive.google.com/uc?id=1isFICLF_N4x1pvs1jh04eKECBSRptS1t)



In [16]:
# Gradient descent 
import numpy as np

weights = np.array([1, 2])
input_data = np.array([3, 4])
target = 6
learning_rate = 0.01

preds = (weights * input_data).sum()
error = preds - target
print(error)

gradient = 2 * input_data * error
weights_updated = weights - learning_rate * gradient

preds_updated = (weights_updated * input_data).sum()
error_updated = preds_updated - target
print(error_updated)

5
2.5
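
A single update halves the error in the example above. Repeating the same update in a loop is a natural next step. The following sketch reuses the same weights, input and target. The choice of 20 iterations and the `errors` list are assumptions made for illustration.

```python
import numpy as np

weights = np.array([1.0, 2.0])
input_data = np.array([3, 4])
target = 6
learning_rate = 0.01

errors = []
for _ in range(20):
    # Error of the current prediction.
    error = (weights * input_data).sum() - target
    errors.append(error)
    # Slope of the squared error with respect to each weight.
    gradient = 2 * input_data * error
    weights = weights - learning_rate * gradient

print(errors[0], errors[1])
print(abs(errors[-1]))
```

For this data point and learning rate, the error shrinks by about half on every iteration, so it is close to zero after 20 updates.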


Creating a Keras model  
 - specify architecture
 - compile
 - fit
 - predict



In [24]:
# Model specification

!pip install tensorflow==1.14.0
import tensorflow as tf

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from keras.layers import Dense
from keras.models import Sequential

data = load_iris()
predictors = pd.DataFrame(data = data.data, columns=data.feature_names)
target = data['target']

predictors.head()

n_cols = predictors.shape[1]
print(n_cols)

# Dense layers connect every node of the previous layer to every node of the current layer
# We use 100 nodes in each hidden layer; the number of nodes is our choice, Keras only learns the weights
# It is not uncommon to use many more nodes in each layer
model = Sequential() # Each layer connects only to the layer that directly follows it
model.add(Dense(100, activation='relu', input_shape=(n_cols,))) # n_cols input nodes and any number of rows
model.add(Dense(100, activation='relu'))
model.add(Dense(1)) # The output layer

# Adam is a method of gradient descent that adjusts the learning rate dynamically
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(predictors, target)

4


RuntimeError: ignored