# Introduction to Deep Learning
## [DataCamp Course Dashboard](https://www.datacamp.com/courses/deep-learning-in-python)

### Imagine you work for a bank
- You need to predict how many transactions each customer will make next year
- You have features like `age`, `bank_balance`, `retirement_status`, etc.

#### How would a linear regression model work through this problem?
- A linear regression model would estimate the influence of each feature independently and ignore interactions between features. For example, the effect of `bank_balance` might depend on `age`. Intuitively, this is a poor fit for the problem.
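Here is a minimal sketch of that limitation. It uses a linear model with made-up coefficients (`intercept` and `coefs` are hypothetical, not values from the course). Because the prediction is just a weighted sum, raising `bank_balance` by 1000 shifts the prediction by the same amount for a 25-year-old as for a 70-year-old.

```python
# Hypothetical coefficients for a linear model: an intercept plus one weight per feature
intercept = 5.0
coefs = {"age": 0.1, "bank_balance": 0.002}

def linear_predict(age, bank_balance):
    """Linear regression prediction: a weighted sum of features, with no interaction terms."""
    return intercept + coefs["age"] * age + coefs["bank_balance"] * bank_balance

# Effect of an extra 1000 in bank_balance for a young and an old customer
young_effect = linear_predict(25, 2000) - linear_predict(25, 1000)
old_effect = linear_predict(70, 2000) - linear_predict(70, 1000)
print(young_effect, old_effect)  # identical: age does not change the bank_balance effect
```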

# Interactions
- Neural networks are good at capturing interactions between features
- Deep learning uses especially powerful neural networks

# Course structure
- First two chapters focus on conceptual knowledge
    - Debug and tune deep learning models on conventional prediction problems
    - Lay the foundation for progressing towards modern applications
    
## Build deep learning models with keras
```python
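import numpy as np
from keras.layers import Dense
from keras.models import Sequential

# Load the predictor matrix (one row per customer, one column per feature)
predictors = np.loadtxt('predictors_data.csv', delimiter=',')
n_cols = predictors.shape[1]  # number of input features

# Two hidden layers of 100 ReLU nodes each, followed by a single output node
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(1))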
```
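Each `Dense` layer has one weight for every input/node pair plus one bias per node. That makes it easy to count the parameters the model above will learn. The sketch below assumes a hypothetical `n_cols = 10`. For a given `n_cols`, the total should match what Keras reports in `model.summary()`.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit, as in a default Dense layer."""
    return n_inputs * n_units + n_units

def count_params(n_cols, layer_sizes=(100, 100, 1)):
    """Total trainable parameters for a stack of Dense layers fed by n_cols features."""
    total = 0
    n_inputs = n_cols
    for n_units in layer_sizes:
        total += dense_param_count(n_inputs, n_units)
        n_inputs = n_units  # each layer's outputs become the next layer's inputs
    return total

print(count_params(10))  # 1100 + 10100 + 101 = 11301
```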

## Deep learning models capture interactions
- A deep learning model learns a function that captures the interactions between variables and uses it to predict the response variable.
- The input layer holds the predictive features (`age`, `bank_balance`, `retirement_status`, etc.).
- The output layer holds the response variable, i.e., what we want to predict.
- All other layers are called hidden layers. Their values are not observed in the data. Instead, the model computes them to capture interactions between the input variables and predict the output more accurately.

# Forward propagation
- The first step of the **forward propagation algorithm** is to draw lines (connections) from the input layer to each node of the hidden layer.
- Each line is given a weight that reflects the strength of that connection.
- Next, each input value is multiplied by the weight of its line. The products for all lines feeding into the same node are then added together.
- This process of connecting, weighting, and multiply-adding is repeated layer by layer until it yields the output layer value.

- Multiply-add process
- Equivalent to a dot product
- Forward propagation works on one data point at a time
- The output is the prediction for that data point
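The multiply-add for every hidden node is a dot product, so a whole layer can be computed with one matrix-vector product. This sketch stacks each node's weights as a row of a matrix (a layout chosen here for illustration). It uses the same input and weights as the cell that follows.

```python
import numpy as np

input_data = np.array([2, 3])

# Row i holds the weights of hidden node i
hidden_weights = np.array([[1, 1],    # node_0
                           [-1, 1]])  # node_1
output_weights = np.array([2, -1])

# One matrix-vector product performs the multiply-add for every hidden node
hidden_layer_values = hidden_weights @ input_data
# The output node is a dot product of the hidden values with its weights
output = hidden_layer_values @ output_weights
print(hidden_layer_values, output)  # [5 1] 9
```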

```python
# Forward propagation code
import numpy as np

input_data = np.array([2, 3])
weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

# Each hidden node: multiply the inputs by that node's weights, then add
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()

hidden_layer_values = np.array([node_0_value, node_1_value])
print('hidden layer values: ', hidden_layer_values)

# Output node: the same multiply-add applied to the hidden layer values
output = (hidden_layer_values * weights['output']).sum()
print('output value: ', output)
```

```
hidden layer values:  [5 1]
output value:  9
```


```python
input_data = np.array([3, 5])
weights = {'node_0': np.array([2, 4]),
           'node_1': np.array([4, -5]),
           'output': np.array([2, 7])}

# Calculate node 0 value: node_0_value
node_0_value = (input_data * weights['node_0']).sum()

# Calculate node 1 value: node_1_value
node_1_value = (input_data * weights['node_1']).sum()

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Calculate output: output
output = (hidden_layer_outputs * weights['output']).sum()

# Print output
print(output)
```

```
-39
```


# Activation functions

- These functions allow the model to capture nonlinearities.
- Applied to node inputs to produce node output
- `tanh()` was a very popular activation function
- Today, ReLU (Rectified Linear Activation) is the industry standard activation function
$$\text{ReLU}(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}$$
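The piecewise definition above is equivalent to $\max(0, x)$. With NumPy it can be applied to a whole array at once via `np.maximum`. The function name `relu_vectorized` is just for this sketch. The block also prints `np.tanh` of the same values so the two activations can be compared.

```python
import numpy as np

def relu_vectorized(x):
    """Element-wise ReLU: max(0, x), for scalars or arrays."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu_vectorized(x))  # negatives become 0, non-negatives pass through unchanged
print(np.tanh(x))          # values squashed into the open interval (-1, 1)
```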

```python
# Activation function code
import numpy as np

input_data = np.array([-1, 2])
weights = {'node_0': np.array([3, 3]),
           'node_1': np.array([1, 5]),
           'output': np.array([2, -1])}

# Multiply-add gives each node's input; tanh turns it into the node's output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = np.tanh(node_0_input)

node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)

hidden_layer_outputs = np.array([node_0_output, node_1_output])
print('hidden layer outputs: ', hidden_layer_outputs)

output = (hidden_layer_outputs * weights['output']).sum()
print('output value: ', output)
```

```
hidden layer outputs:  [0.99505475 0.99999997]
output value:  0.9901095378334199
```


The rectified linear activation function (called ReLU) has been shown to lead to very high-performance networks. This function takes a single number as an input, returning 0 if the input is negative, and the input if the input is positive.

```python
def relu(x):
    """ReLU activation: return x if it is positive, otherwise 0."""
    return max(x, 0)

input_data = np.array([3, 5])
weights = {'node_0': np.array([2, 4]),
           'node_1': np.array([4, -5]),
           'output': np.array([2, 7])}

# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print('output layer value:', model_output)
print('Without the activation function, this prediction would be negative.')
```

```
output layer value: 52
Without the activation function, this prediction would be negative.
```


```python
input_data = [np.array([3, 5]), np.array([1, -1]), np.array([0, 0]), np.array([8, 4])]
weights = {'node_0': np.array([2, 4]),
           'node_1': np.array([4, -5]),
           'output': np.array([2, 7])}

# Define predict_with_network()
def predict_with_network(input_data_row, weights):

    # Calculate node 0 value
    node_0_input = (input_data_row * weights['node_0']).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 value
    node_1_input = (input_data_row * weights['node_1']).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])

    # Calculate model output
    input_to_final_layer = (hidden_layer_outputs * weights['output']).sum()
    model_output = relu(input_to_final_layer)

    # Return model output
    return model_output


# Create empty list to store prediction results
results = []
for input_data_row in input_data:
    # Append prediction to results
    results.append(predict_with_network(input_data_row, weights))

# Print results
print(results)
```

```
[52, 63, 0, 148]
```
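The loop above predicts one row at a time. A minimal vectorized sketch puts the rows in a matrix `X` and the hidden weights in a matrix with one column per node. It then computes all predictions with two matrix products, applying ReLU in the hidden and output layers as `predict_with_network` does. The names `X`, `W_hidden`, `w_output` and `predict_batch` are introduced here for illustration.

```python
import numpy as np

# Each row of X is one data point (the same four rows as above)
X = np.array([[3, 5], [1, -1], [0, 0], [8, 4]])

# Hidden-layer weights: column j holds the weights of hidden node j
W_hidden = np.column_stack([np.array([2, 4]),    # node_0
                            np.array([4, -5])])  # node_1
w_output = np.array([2, 7])

def predict_batch(X, W_hidden, w_output):
    """Forward-propagate every row at once, with ReLU on hidden and output layers."""
    hidden = np.maximum(0, X @ W_hidden)    # shape (n_rows, n_hidden)
    return np.maximum(0, hidden @ w_output)  # shape (n_rows,)

print(predict_batch(X, W_hidden, w_output))  # [ 52  63   0 148]
```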


# Deeper networks
## Multiple hidden layers
## Representation learning
- Deep networks internally build representations of patterns in the data.
- Partially replace the need for feature engineering
- Subsequent layers build increasingly sophisticated representations of raw data

# Deep learning
- The modeler doesn't need to specify the interactions in advance
- During training, the network learns weights that capture the patterns relevant for making better predictions

*Identity function*: an activation function under which each node's output is the same as its input.
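A consequence worth checking: with identity activations, stacking layers adds no expressive power. Two hidden layers collapse into a single linear map whose weights are the product of the layer matrices. This sketch reuses the weight values of the two-hidden-layer example, arranged as matrices with one row per node (a layout assumed here). It swaps ReLU for the identity, so the result differs from that example's output.

```python
import numpy as np

def identity(x):
    """Identity activation: output equals input."""
    return x

x = np.array([3, 5])
W1 = np.array([[2, 4], [4, -5]])  # first hidden layer, rows = nodes
W2 = np.array([[-1, 2], [1, 2]])  # second hidden layer, rows = nodes
w_out = np.array([2, 7])

# Two hidden layers with identity activations...
deep = w_out @ identity(W2 @ identity(W1 @ x))
# ...give the same result as one linear map with combined weights
combined = w_out @ W2 @ W1
shallow = combined @ x
print(deep, shallow)  # -104 -104
```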

```python
input_data = np.array([3, 5])
weights = {'node_0_0': np.array([2, 4]),
           'node_0_1': np.array([4, -5]),
           'node_1_0': np.array([-1, 2]),
           'node_1_1': np.array([1, 2]),
           'output': np.array([2, 7])}

def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node_0_1']).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])

    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()

    # Return model_output
    return model_output

output = predict_with_network(input_data)
print('The output value is:', output)
```

```
The output value is: 182
```
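Writing one block of code per node does not scale to deeper networks. A minimal sketch of a general version loops over a list of weight matrices, one per hidden layer, with row `i` holding node `i`'s weights (a layout chosen for illustration). As in the code above, ReLU is applied in the hidden layers but not at the output.

```python
import numpy as np

def forward(input_data, hidden_layers, output_weights):
    """Forward propagation through any number of ReLU hidden layers.

    hidden_layers: list of weight matrices; row i of each matrix holds node i's weights.
    """
    activations = input_data
    for W in hidden_layers:
        # Multiply-add for every node in the layer, then ReLU
        activations = np.maximum(0, W @ activations)
    # Linear output node (no ReLU), matching the example above
    return activations @ output_weights

hidden_layers = [np.array([[2, 4], [4, -5]]),   # node_0_0, node_0_1
                 np.array([[-1, 2], [1, 2]])]   # node_1_0, node_1_1
print(forward(np.array([3, 5]), hidden_layers, np.array([2, 7])))  # 182
```

Adding a layer only means appending another matrix to `hidden_layers`. No new per-node code is needed.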


# The need for optimization
