```
###############################
##                           ##
##  python:
##  numpy:
##                           ##
###############################
```

# Introduction to deep learning

## `[note]` Interactions

* Neural networks are well suited to capturing interactions between input features.

* Deep learning uses especially powerful neural networks, which makes it effective on data such as:

    * Text
    
    * Images
    
    * Videos
    
    * Audio
    
    * Source code

## `[note]` Interactions in a neural network

![Interactions in a neural network](ref1.%20Interactions%20in%20neural%20network.jpg)

## `[quiz]` Comparing neural network models to classical regression models

* Which of the models in the diagrams has a greater ability to account for interactions?

    ![Comparing neural network models to classical regression models](ref2.%20Comparing%20neural%20network%20models%20to%20classical%20regression%20models.png)
    
    $\Box$ Model 1.
    
    $\boxtimes$ Model 2.
    
    $\Box$ They are both the same.

# Forward propagation

## `[note]` Forward propagation

* Forward propagation is a multiply-and-add process.

* Each node's value is the dot product of its inputs and its weights.

* Propagate forward for one data point at a time.

* Output is the prediction for that data point.

![Forward propagation](ref3.%20Forward%20propagation.jpg)

## `[code]` Forward propagation

In [1]:
import numpy as np

input_data = np.array([2, 3])
weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1])
}
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()

hidden_layer_values = np.array([node_0_value, node_1_value])
print(hidden_layer_values)

[5 1]


In [2]:
output = (hidden_layer_values * weights['output']).sum()
print(output)

9
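
The multiply-then-sum step used above is the dot product, so the same forward pass can be written with `np.dot` or, for a whole layer at once, with the `@` operator. The following is a minimal sketch that reuses the weights from the cell above; the name `hidden_weights` is introduced here for illustration only.

```python
import numpy as np

input_data = np.array([2, 3])
weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1])
}

# Multiply-then-sum is the dot product of the inputs and the weights
node_0_value = np.dot(input_data, weights['node_0'])

# Stack the per-node weights as rows (hypothetical name) so that one
# matrix-vector product computes every hidden node at once
hidden_weights = np.array([weights['node_0'], weights['node_1']])
hidden_layer_values = hidden_weights @ input_data

# The output node is one more dot product
output = weights['output'] @ hidden_layer_values
print(hidden_layer_values, output)  # [5 1] 9
```

The result matches the element-wise version, which is expected because `(a * b).sum()` and `np.dot(a, b)` compute the same quantity for 1-D arrays.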


## `[task]` Coding the forward propagation algorithm

$\blacktriangleright$ **Task diagram**

![Coding the forward propagation algorithm](ref4.%20Coding%20the%20forward%20propagation%20algorithm.png)

$\blacktriangleright$ **Package pre-loading**

In [6]:
import numpy as np

$\blacktriangleright$ **Data pre-loading**

In [7]:
input_data = np.array([3, 5])

weights = {
    'node_0': np.array([2, 4]),
    'node_1': np.array([4, -5]),
    'output': np.array([2, 7])
}

$\blacktriangleright$ **Task practice**

In [8]:
# Calculate node 0 value: node_0_value
node_0_value = (input_data * weights['node_0']).sum()

# Calculate node 1 value: node_1_value
node_1_value = (input_data * weights['node_1']).sum()

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Calculate output: output
output = (hidden_layer_outputs * weights['output']).sum()

# Print output
print(output)

-39


# Activation functions

## `[note]` Activation functions

* Activation functions are applied to node inputs to produce node output.

![Activation functions](ref5.%20Activation%20functions.jpg)

## What is the rectified linear activation (ReLU)?

![Rectified linear activation](ref6.%20Rectified%20linear%20activation.jpg)

\begin{equation*}
    \mathrm{ReLU}(x) =
        \begin{cases}
            0 & \text{if } x < 0 \\
            x & \text{if } x \geq 0
        \end{cases}
\end{equation*}
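
The piecewise definition above translates almost directly into code. As a sketch, the scalar form can use Python's built-in `max`, while `np.maximum` applies the same rule element-wise to a whole array. The function names `relu_scalar` and `relu_array` are chosen here for illustration.

```python
import numpy as np

def relu_scalar(x):
    # 0 when x is negative, x itself otherwise
    return max(0, x)

def relu_array(x):
    # Element-wise version of the same rule for NumPy arrays
    return np.maximum(0, x)

print(relu_scalar(-13), relu_scalar(26))
print(relu_array(np.array([-13, 0, 26])))
```

The array form is convenient when the inputs to all nodes in a layer are held in one array.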

## Code for activation functions:

In [6]:
import numpy as np

input_data = np.array([-1, 2])
weights = {
    'node_0': np.array([3, 3]),
    'node_1': np.array([1, 5]),
    'output': np.array([2, -1])
}
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = np.tanh(node_0_input)
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)
hidden_layer_outputs = np.array([node_0_output, node_1_output])
output = (hidden_layer_outputs * weights['output']).sum()

print(output)

0.9901095378334199


## Practice exercises for activation functions:

$\blacktriangleright$ **Package pre-loading:**

In [7]:
import numpy as np

$\blacktriangleright$ **Data pre-loading:**

In [8]:
input_data = np.array([3, 5])

weights = {
    'node_0': np.array([2, 4]),
    'node_1': np.array([4, -5]),
    'output': np.array([2, 7])
}

$\blacktriangleright$ **The rectified linear activation function practice:**

In [9]:
def relu(x):
    '''Rectified linear activation: 0 for negative inputs, the input itself otherwise'''
    # Calculate the value for the output of the relu function: output
    # (the parameter is named x so it does not shadow the built-in input())
    output = max(0, x)

    # Return the value just calculated
    return output


# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

52


$\blacktriangleright$ **Data re-loading (several rows):**

In [10]:
input_data = [
    np.array([3, 5]),
    np.array([1, -1]),
    np.array([0, 0]),
    np.array([8, 4])
]

$\blacktriangleright$ **Practice applying the network to many observations/rows of data:**

In [11]:
# Define predict_with_network()
def predict_with_network(input_data_row, weights):

    # Calculate node 0 value
    node_0_input = (input_data_row * weights['node_0']).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 value
    node_1_input = (input_data_row * weights['node_1']).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])

    # Calculate model output
    input_to_final_layer = (hidden_layer_outputs * weights['output']).sum()
    model_output = relu(input_to_final_layer)

    # Return model output
    return model_output


# Create empty list to store prediction results
results = []
for input_data_row in input_data:
    # Append prediction to results
    results.append(predict_with_network(input_data_row, weights))

# Print results
print(results)

[52, 63, 0, 148]
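
Looping over rows works, but the same predictions can be computed for all rows at once by stacking the observations into a 2-D array. This is a sketch that assumes the weights pre-loaded above; `X`, `W_hidden`, and `predict_batch` are names introduced for this example.

```python
import numpy as np

weights = {
    'node_0': np.array([2, 4]),
    'node_1': np.array([4, -5]),
    'output': np.array([2, 7])
}

# One observation per row
X = np.array([[3, 5], [1, -1], [0, 0], [8, 4]])

def predict_batch(X, weights):
    # Each column of W_hidden holds the weights of one hidden node
    W_hidden = np.column_stack([weights['node_0'], weights['node_1']])
    # Weighted sums for every row and node, followed by ReLU
    hidden = np.maximum(0, X @ W_hidden)
    # ReLU on the output as well, matching predict_with_network()
    return np.maximum(0, hidden @ weights['output'])

print(predict_batch(X, weights))
```

Each entry of the returned array corresponds to one row of `X` and agrees with the looped results above.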


# Deeper networks

## How do multiple hidden layers function?

![Multiple hidden layers](ref7.%20Multiple%20hidden%20layers.jpg)

## Why is deep learning also sometimes called representation learning?

* Deep networks internally build representations of patterns in the data; in this way, they partially replace the need for feature engineering.

* Subsequent layers build increasingly sophisticated representations of raw data.

![Representation learning](ref8.%20Representation%20learning.jpg)

## How does the deep learning process work?

* The modeler doesn't need to specify the interactions.

* When training the model, the neural network learns weights that capture the patterns relevant to making better predictions.
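
One way to make "interactions" concrete is to check whether the effect of one input depends on the value of another. The sketch below reuses the weights and the one-hidden-layer ReLU network from the activation-function exercise. The helpers `predict` and `effect_of_first_input` are hypothetical names; the second one measures how much the prediction changes when the first input moves from 0 to 1.

```python
import numpy as np

weights = {
    'node_0': np.array([2, 4]),
    'node_1': np.array([4, -5]),
    'output': np.array([2, 7])
}

def relu(x):
    return max(0, x)

def predict(input_data, weights):
    # One hidden layer with ReLU and a linear output node
    node_0 = relu((input_data * weights['node_0']).sum())
    node_1 = relu((input_data * weights['node_1']).sum())
    return (np.array([node_0, node_1]) * weights['output']).sum()

def effect_of_first_input(second_input, weights):
    # Change in the prediction when the first input goes from 0 to 1
    # while the second input is held fixed
    with_one = predict(np.array([1, second_input]), weights)
    with_zero = predict(np.array([0, second_input]), weights)
    return with_one - with_zero

print(effect_of_first_input(0, weights))  # 32
print(effect_of_first_input(5, weights))  # 4
```

The two effects differ, so the network treats the inputs as interacting even though no interaction term was specified. In a linear regression without an explicit interaction term, this difference would be the same for every value of the second input.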

## Practice question for the forward propagation in a deeper network:

* There is a model with two hidden layers. The values for an input data point are shown inside the input nodes. The weights are shown on the edges/lines. What prediction would this model make on this data point?

* Assume the activation function at each node is the *identity function*. That is, each node's output will be the same as its input. So the value of the bottom node in the first hidden layer is $-1$, and not $0$, as it would be if the ReLU activation function were used.

    ![Forward propagation in a deeper network](ref9.%20Forward%20propagation%20in%20a%20deeper%20network.png)
    
    $\boxtimes$ $0$.
    
    $\Box$ $7$.
    
    $\Box$ $9$.

## Practice exercises for deeper networks:

$\blacktriangleright$ **Diagram of the forward propagation:**

![Multi-layer neural networks](ref10.%20Multi-layer%20neural%20networks.png)

$\blacktriangleright$ **Package pre-loading:**

In [12]:
import numpy as np

$\blacktriangleright$ **Data pre-loading:**

In [13]:
input_data = np.array([3, 5])

weights = {
    'node_0_0': np.array([2, 4]),
    'node_0_1': np.array([4, -5]),
    'node_1_0': np.array([-1, 2]),
    'node_1_1': np.array([1, 2]),
    'output': np.array([2, 7])
}

$\blacktriangleright$ **Code pre-loading:**

In [14]:
def relu(x):
    # x is used instead of input so the built-in input() is not shadowed
    output = max(0, x)
    return output

$\blacktriangleright$ **Multi-layer neural networks practice:**

In [15]:
def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node_0_1']).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])

    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()

    # Return model_output
    return model_output


output = predict_with_network(input_data)
print(output)

182
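
The function above writes out every node by hand. As a sketch of how the same computation generalizes, each hidden layer can be stored as a weight matrix with one row per node, and the forward pass becomes a loop over layers. The names `layers`, `output_weights`, and `forward` are assumptions for this example; the matrices are the weights from the data pre-loading cell, rearranged.

```python
import numpy as np

# Each hidden layer as a matrix: one row of weights per node
layers = [
    np.array([[2, 4], [4, -5]]),   # node_0_0, node_0_1
    np.array([[-1, 2], [1, 2]]),   # node_1_0, node_1_1
]
output_weights = np.array([2, 7])

def forward(input_data, layers, output_weights):
    values = input_data
    for layer_weights in layers:
        # Weighted sum for every node in the layer, then ReLU
        values = np.maximum(0, layer_weights @ values)
    # No activation on the output node, matching the exercise
    return output_weights @ values

print(forward(np.array([3, 5]), layers, output_weights))  # 182
```

Adding another hidden layer then only requires appending one more matrix to `layers`.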


## Practice question for learned representations:

* How are the weights that determine the features/interactions in neural networks created?

    $\Box$ A user chooses them when creating the model.
    
    $\boxtimes$ The model training process sets them to optimize predictive accuracy.
    
    $\Box$ The weights are random numbers.

## Practice question for levels of representation:

* Which layers of a model capture more complex or "higher level" interactions?

    $\Box$ The first layers capture the most complex interactions.
    
    $\boxtimes$ The last layers capture the most complex interactions.
    
    $\Box$ All layers capture interactions of similar complexity.