# Basics of deep learning and neural networks
  
Deep learning is the machine learning technique behind the most exciting capabilities in diverse areas like robotics, natural language processing, image recognition, and artificial intelligence, including the famous AlphaGo. In this course, you'll gain hands-on, practical knowledge of how to use deep learning with Keras 2.0, the latest version of a cutting-edge library for deep learning in Python.
  
In this chapter, you'll become familiar with the fundamental concepts and terminology used in deep learning, and understand why deep learning techniques are so powerful today. You'll build simple neural networks and generate predictions with them.

## Resources
  
**Notebook Syntax**
  
<span style='color:#7393B3'>NOTE:</span>  
- Denotes additional information deemed to be *contextually* important
- Colored in blue, HEX #7393B3
  
<span style='color:#E74C3C'>WARNING:</span>  
- Significant information that is *functionally* critical  
- Colored in red, HEX #E74C3C
  
---
  
**Links**
  
[NumPy Documentation](https://numpy.org/doc/stable/user/index.html#user)  
[Pandas Documentation](https://pandas.pydata.org/docs/user_guide/index.html#user-guide)  
  
---
  
**Notable Functions**
  
<table>
  <tr>
    <th>Index</th>
    <th>Operator</th>
    <th>Use</th>
  </tr>
  <tr>
    <td>1</td>
    <td>numpy.array()</td>
    <td>Creates an array: a grid of values, together with information about how to locate and interpret each element, that can be indexed in various ways.</td>
  </tr>
</table>

  
---
  
**Language and Library Information**  
  
Python 3.11.0  
  
Name: numpy  
Version: 1.24.3  
Summary: Fundamental package for array computing in Python  
  
Name: pandas  
Version: 2.0.3  
Summary: Powerful data structures for data analysis, time series, and statistics  
  
Name: matplotlib  
Version: 3.7.2  
Summary: Python plotting package  
  
Name: seaborn  
Version: 0.12.2  
Summary: Statistical data visualization  
  
---
  
**Miscellaneous Notes**
  
<span style='color:#7393B3'>NOTE:</span>  
  
`python3.11 -m IPython` : Starts an interactive IPython shell under Python 3.11 in the terminal.
  
`nohup ./relo_csv_D2S.sh > ./output/relo_csv_D2S.log &` : Runs the CSV data pipeline in the background, writing its output to a log file.  
  
`print(inspect.getsourcelines(test))` : Prints the source lines of a user-defined function (here `test`); requires `import inspect`.  
  
<span style='color:#7393B3'>NOTE:</span>  
  
Schema:  
- input array -> **array**: feature values
- weights for nodes -> **dictionary**: keys = node_name, values = weight of node/input
- node_999_in -> **neural network operation**: (prior_input * weight[]).sum()
- node_999_out -> **neural network operation**: activation function
- node_hidden_concat -> **array**: concat node_999out nodes into an array
- output_in -> **neural network operation**: (prior_input * weight[]).sum()
- output_out -> **neural network operation**: output activation function, softmax
- Display, or create function for above, then make a loop iter with a loop variable

In [1]:
import numpy as np                  # Numerical Python:         Arrays and linear algebra
import pandas as pd                 # Panel Datasets:           Dataset manipulation
import matplotlib.pyplot as plt     # MATLAB Plotting Library:  Visualizations
import seaborn as sns               # Seaborn:                  Visualizations

## Introduction to deep learning
  
**Imagine you work for a bank**
  
Imagine you work for a bank, and you need to build a model predicting how many transactions each customer will make next year. You have predictive data or features like each customer’s age, bank balance, whether they are retired and so on. We'll get to deep learning in a moment, but for comparison, consider how a simple linear regression model works for this problem. The linear regression embeds an assumption that the outcome, in this case how many transactions a user makes, is the sum of individual parts. It starts by saying, "what is the average?" Then it adds the effect of age. Then the effect of bank balance. And so on. So the linear regression model isn't identifying the interactions between these parts, and how they affect banking activity. 
  
**Example as seen by linear regression**
  
Say we plot predictions from this model. We draw one line with the predictions for retired people, and another with the predictions for those still working. We put current bank balance on the horizontal axis, and the vertical axis is the predicted number of transactions.
  
The left graph shows predictions from a model with no interactions. In that model we simply add up the effect of the retirement status, and current bank balance. The lack of interactions is reflected by both lines being parallel. That's probably unrealistic, but it's an assumption of the linear regression model. The graph on the right shows the predictions from a model that allows interactions, and the lines don't need to be parallel. 
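The difference between the two graphs can be sketched numerically. The coefficients below are made up purely for illustration (they do not come from the course), and the function names are hypothetical. The point is only that, without an interaction term, the gap between the retired and working lines is the same at every balance, which is what makes the lines parallel.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration
intercept = 10.0
balance_effect = 0.002       # extra transactions per unit of bank balance
retired_effect = -3.0        # shift applied to retired customers
interaction_effect = 0.001   # extra slope for retired customers (interaction model only)

balance = np.array([0.0, 1000.0, 2000.0, 3000.0])


def predict_additive(balance, retired):
    # Outcome is a sum of separate effects: no interaction term
    return intercept + balance_effect * balance + retired_effect * retired


def predict_with_interaction(balance, retired):
    # Same as above, plus a term that depends on both features together
    return predict_additive(balance, retired) + interaction_effect * balance * retired


# Gap between the "retired" line and the "working" line at each balance
gap_additive = predict_additive(balance, 1) - predict_additive(balance, 0)
gap_interaction = predict_with_interaction(balance, 1) - predict_with_interaction(balance, 0)

print(gap_additive)     # constant gap: the two lines are parallel
print(gap_interaction)  # gap changes with balance: the lines are not parallel
```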

**Interactions**
  
Neural networks are a powerful modeling approach that accounts for interactions like this especially well. Deep learning, the focus of this course, is the use of especially powerful neural networks. Because deep learning models account for these types of interactions so well, they perform well on most prediction problems you've seen before. But their ability to capture extremely complex interactions also allows them to do amazing things with text, images, videos, audio, source code, and almost anything else you could imagine doing data science with.
  
- Neural Networks account for interactions really well
- Deep learning uses especially powerful neural networks
  
**Course structure**
  
The first two chapters of this course focus on conceptual knowledge about deep learning. This part will be hard, but it will prepare you to debug and tune deep learning models on conventional prediction problems, and it will lay the foundation for progressing towards those new and exciting applications. You'll see this pay off in the third and fourth chapters.
  
- Debug and tune deep learning models on conventional prediction problems
- Lay the foundation for progressing towards modern applications
  
**Build and tune deep learning models using keras**
  
You will write code that looks like this, to build and tune deep learning models using keras, to solve many of the same modeling problems you might have previously solved with scikit-learn. As a start to how deep learning models capture interactions and achieve these amazing results, we'll modify the diagram you saw a moment ago.
  
**Deep learning models capture interactions**
  
Here there is an interaction between retirement status and bank balance. Instead of having them separately affect the outcome, we calculate a function of these variables that accounts for their interaction, and use that to predict the outcome. Even this graphic oversimplifies reality, where most things interact with each other in some way, and real neural network models account for far more interactions. So the diagram for a simple neural network looks like this.
  
**Interactions in neural network**
  
On the far left, we have something called an input layer. This represents our predictive features like age or income. On the far right we have the output layer, which holds the prediction from our model: in this case, the predicted number of transactions. All layers that are not the input or output layers are called hidden layers. They are called hidden layers because, while the inputs and outputs correspond to visible things that happened in the world and can be stored as data, the values in the hidden layer aren't something we have data about, or anything we observe directly from the world. 
  
Nevertheless, each dot, called a node, in the hidden layer, represents an aggregation of information from our input data, and each node adds to the model's ability to capture interactions. So the more nodes we have, the more interactions we can capture.

### Comparing neural network models to classical regression models
  
Which of the models in the diagrams has greater ability to account for interactions?
  
Model 1: Input 2, Hidden 2, Output 1  
Model 2: Input 2, Hidden 3, Output 1  
  
Possible Answers

- [ ] Model 1
- [x] Model 2
- [ ] They are both the same
  
Correct! Model 2 has more nodes in the hidden layer, and therefore, greater ability to capture interactions.

## Forward propagation
  
We’ll start by showing how neural networks use data to make predictions. This is called the forward propagation algorithm.
  
**Bank transactions example**
  
Let's revisit our example predicting how many transactions a user will make at our bank. For simplicity, we'll make predictions based on only the number of children and number of existing accounts.
  
**Forward propagation**
  
This graph shows a customer with two children and three accounts. The forward-propagation algorithm will pass this information through the network to make a prediction in the output layer. Lines connect the inputs to the hidden layer. Each line has a weight indicating how strongly that input affects the hidden node that the line ends at. These are the first set of weights. We have one weight from the top input into the top node of the hidden layer, and one weight from the bottom input to the top node of the hidden layer. These weights are the parameters we train or change when we fit a neural network to data, so these weights will be a focus throughout this course. To make predictions for the top node of the hidden layer, we take the value of each node in the input layer, multiply it by the weight on the line that ends at that hidden node, and then sum up all the values. In this case, we get (2 times 1) plus (3 times 1), which is 5. 
  
Now do the same to fill in the value of the node on the bottom. That is (2 times -1) plus (3 times 1), which is 1. Finally, repeat this process for the next layer, which is the output layer. That is (5 times 2) plus (1 times -1), which gives an output of 9. We predicted nine transactions. That’s forward-propagation. We moved from the inputs on the left, to the hidden layer in the middle, and then from the hidden layer to the output on the right. 
  
<img src='../_images/forward-propagation-demostrated.png' alt='img' width='400'>
  
We always use that same multiply then add process. If you're familiar with vector algebra or linear algebra, that operation is a dot product. If you don't know about dot products, that's fine too. That was forward propagation for a single data point. In general, we do forward propagation for one data point at a time. The value in that last layer is the model's prediction for that data point.
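As a small check of the dot-product remark, the sketch below compares the multiply-then-add process with `np.dot` for the top hidden node of the worked example (two children, three accounts, both weights equal to 1). The variable names are chosen here for illustration.

```python
import numpy as np

# Values from the worked example: two children, three accounts
input_data = np.array([2, 3])
weights_top_node = np.array([1, 1])   # weights on the lines into the top hidden node

# Multiply element-wise, then add up the results
multiply_then_add = (input_data * weights_top_node).sum()

# The same operation expressed as a dot product
dot_product = np.dot(input_data, weights_top_node)

print(multiply_then_add, dot_product)  # both give 5
```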
  
**Forward propagation code**
  
Let's see the code for this. We import NumPy for some of the mathematical operations. We've stored the input data as an array. We then have weights into each node in the hidden layer and to the output. We store the weights going into each node as an array, and we use a dictionary to store those arrays. Let’s start forward propagating. We fill in the top hidden node here, which is called node zero. We multiply the inputs by the weights for that node, and then sum both of those terms together. Notice that we had two weights for node_0. That matches the two items in the array it is multiplied by, which is the input_data. These get converted to a single number by the sum function at the end of the line. We then do the same thing for the bottom node of the hidden layer, which is called node 1. Now, both node zero and node one have numeric values. 
  
<img src='../_images/forward-propagation-demostrated1.png' alt='img' width='520'>
  
To simplify multiplication, we put those in an array here. If we print out the array, we confirm that those are the values from the hidden layer you saw a moment ago. It can also be instructive to verify this by hand with pen and paper. To get the output, we multiply the values in the hidden layer by the weights for the output. Summing those together gives us 10 minus 1, which is 9. In the exercises, you'll practice performing forward propagation in small neural networks.
  
<img src='../_images/forward-propagation-demostrated2.png' alt='img' width='520'>
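Since the code above appears only as images, here is a minimal sketch of the steps just described. The weights are read off the worked example (1 and 1 into node 0, -1 and 1 into node 1, 2 and -1 into the output); the variable names follow the description but are assumptions and may differ from the slide.

```python
import numpy as np

# Input: two children, three accounts
input_data = np.array([2, 3])

# Weights going into each node, stored as arrays in a dictionary
weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1]),
}

# Hidden layer: multiply inputs by each node's weights, then sum
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()

# Collect the hidden values in an array to simplify the next multiplication
hidden_layer_values = np.array([node_0_value, node_1_value])
print(hidden_layer_values)  # [5 1]

# Output layer: same multiply-then-sum process
output = (hidden_layer_values * weights['output']).sum()
print(output)  # 9
```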
  


### Coding the forward propagation algorithm
  
In this exercise, you'll write code to do forward propagation (prediction) for your first neural network:
  
<img src='../_images/forward-propagation-demostrated3.png' alt='img' width='490'>
  
Each data point is a customer. The first input is how many accounts they have, and the second input is how many children they have. The model will predict how many transactions the user makes in the next year. You will use this data throughout the first 2 chapters of this course.
  
The input data has been pre-loaded as `input_data`, and the weights are available in a dictionary called `weights`. The array of weights for the first node in the hidden layer is in `weights['node_0']`, and the array of weights for the second node in the hidden layer is in `weights['node_1']`.
  
The weights feeding into the output node are available in `weights['output']`.
  
NumPy will be pre-imported for you as `np` in all exercises.
  
1. Calculate the value in node 0 by multiplying `input_data` by its weights, `weights['node_0']` and computing their sum. This is the 1st node in the hidden layer.
2. Calculate the value in node 1 using `input_data` and `weights['node_1']`. This is the 2nd node in the hidden layer.
3. Put the hidden layer values into an array. This has been done for you.
4. Generate the prediction by multiplying `hidden_layer_outputs` by `weights['output']` and computing their sum.
5. Print the output

In [4]:
# Creating the input layer data: input accounts, input children
input_data = np.array([3, 5])

# Creating the weights for: input layer -> hidden layer (w/ 2 nodes) -> output
weights = {'node_0': np.array([2, 4]), 
           'node_1': np.array([ 4, -5]), 
           'output': np.array([2, 7])}

In [5]:
# Calculate node 0 value (element-wise multiply, then sum): node_0_value
node_0_value = (input_data * weights['node_0']).sum()

# Calculate node 1 value (element-wise multiply, then sum): node_1_value
node_1_value = (input_data * weights['node_1']).sum()

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Calculate output: output
output = (hidden_layer_outputs * weights['output']).sum()

# Print output
print(output)

-39


It looks like the network generated a prediction of -39.

## Activation functions
  
But this multiply-add process is only half the story for hidden layers. For neural networks to achieve their maximum predictive power, we must apply something called an activation function in the hidden layers.
  
**Linear vs. non-linear Functions**
  
An activation function allows the model to capture non-linearities. Non-linearities, as shown on the right here, capture patterns like how going from no children to one child may impact your banking transactions differently than going from three children to four. We have examples of linear functions, straight lines on the left, and non-linear functions on the right. If the relationships in the data aren’t straight-line relationships, we will need an activation function that captures non-linearities.
  
<img src='../_images/activation-functions-low-level-neural-net.png' alt='img' width='520'>
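One way to see why an activation function is needed: without one, a hidden layer followed by an output layer is still just a weighted sum of the inputs, so the hidden layer adds no modeling power. The sketch below illustrates this with the weights from the earlier exercise, stacked into a matrix; the names `W_hidden`, `w_output`, and `combined_weights` are introduced here for illustration.

```python
import numpy as np

# Hidden-layer weights, one row per hidden node, and the output weights
W_hidden = np.array([[2, 4],
                     [4, -5]])
w_output = np.array([2, 7])

# With no activation function, the two layers collapse into one set of weights
combined_weights = w_output @ W_hidden   # values 32 and -27

x = np.array([3, 5])
two_layer_prediction = w_output @ (W_hidden @ x)
one_layer_prediction = combined_weights @ x

print(two_layer_prediction, one_layer_prediction)  # -39 -39
```

Both predictions are identical, so a network without activation functions can only represent straight-line relationships.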
  
**Activation functions**
  
An activation function is something applied to the value coming into a node, which then transforms it into the value stored in that node, or the node output.
  
**Improving our neural network**
  
Let's go back to the previous diagram. The top hidden node previously had a value of 5. For a long time, an s-shaped function called tanh was a popular activation function.
  
<img src='../_images/activation-functions-low-level-neural-net1.png' alt='img' width='520'>
  
**Activation functions**
  
If we used the tanh activation function, this node's value would be tanh(5), which is very close to 1. Today, the standard in both industry and research applications is something called ReLU.
  
<img src='../_images/activation-functions-low-level-neural-net2.png' alt='img' width='520'>
  
**ReLU (Rectified Linear Activation)**
  
The ReLU, or rectified linear activation function, is depicted here. Though it has only two linear pieces, it's surprisingly powerful when composed through multiple successive hidden layers, which you will see soon. 
  
<img src='../_images/activation-functions-low-level-neural-net3.png' alt='img' width='520'>
  
**Activation functions**
  
The code that incorporates activation functions is shown here. It is the same as the code you saw previously, but we've distinguished the input from the output in each node, which is shown in these lines and then again here. And we've applied the tanh function to convert the input to the output. That gives us a prediction of 1.2 transactions. In the exercise, you will use the Rectified Linear Activation function, or ReLU, in your network.
  
<img src='../_images/activation-functions-low-level-neural-net4.png' alt='img' width='520'>
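The code in the image can be sketched roughly as follows, reusing the weights from the earlier worked example (inputs 2 and 3). The variable names are assumptions based on the description and may not match the slide exactly.

```python
import numpy as np

input_data = np.array([2, 3])
weights = {
    'node_0': np.array([1, 1]),
    'node_1': np.array([-1, 1]),
    'output': np.array([2, -1]),
}

# Each hidden node now has an input (weighted sum) and an output (after tanh)
node_0_input = (input_data * weights['node_0']).sum()   # 5
node_0_output = np.tanh(node_0_input)                   # very close to 1

node_1_input = (input_data * weights['node_1']).sum()   # 1
node_1_output = np.tanh(node_1_input)

hidden_layer_outputs = np.array([node_0_output, node_1_output])

# No activation is applied at the output node in this sketch
output = (hidden_layer_outputs * weights['output']).sum()
print(round(output, 2))  # about 1.24, i.e. roughly 1.2 transactions
```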

### The Rectified Linear Activation Function
  
As explained, an "activation function" is a function applied at each node. It converts the node's input into some output.
  
The rectified linear activation function (called ReLU) has been shown to lead to very high-performance networks. This function takes a single number as an input, returning 0 if the input is negative, and the input if the input is positive.
  
Here are some examples:  
- relu(3) = 3
- relu(-3) = 0
  
1. Fill in the definition of the `relu()` function:
2. Use the `max()` function to calculate the value for the output of `relu()`.
3. Apply the `relu()` function to `node_0_input` to calculate `node_0_output`.
4. Apply the `relu()` function to `node_1_input` to calculate `node_1_output`.

In [6]:
# Creating the input layer data: input accounts, input children
input_data = np.array([3, 5])

# Creating the weights for: input layer -> hidden layer (w/ 2 nodes) -> output
weights = {
    'node_0': np.array([2, 4]), 
    'node_1': np.array([ 4, -5]),
    'output': np.array([2, 7])
}


In [8]:
# Defining the Rectified Linear Activation Function
def relu(input):
    '''Define your relu activation function here'''
    # Calculate the value for the output of the relu function: output
    output = max(0, input)
    
    # Return the value just calculated
    return output


# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

52


You predicted 52 transactions. Without this activation function, you would have predicted a negative number! The real power of activation functions will come soon when you start tuning model weights.
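The `relu()` above relies on Python's built-in `max()`, which works for a single number but raises a `ValueError` when given a NumPy array with more than one element. As an optional variation (not part of the exercise), `np.maximum` applies the same rule element-wise, so a whole layer can be activated at once. The name `relu_vectorized` is hypothetical.

```python
import numpy as np


def relu_vectorized(values):
    # Element-wise maximum of 0 and each value; works for scalars and arrays
    return np.maximum(0, values)


# Inputs to the two hidden nodes from the exercise above
hidden_layer_inputs = np.array([26, -13])
hidden_layer_outputs = relu_vectorized(hidden_layer_inputs)

print(hidden_layer_outputs)  # the values 26 and 0
```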

### Applying the network to many observations/rows of data
  
You'll now define a function called ``predict_with_network()`` which will generate predictions for multiple data observations, which are pre-loaded as `input_data`. As before, `weights` are also pre-loaded. In addition, the `relu()` function you defined in the previous exercise has been pre-loaded.
  
1. Define a function called `predict_with_network()` that accepts two arguments - `input_data_row` and `weights` - and returns a prediction from the network as the output.
2. Calculate the input and output values for each node, storing them as: `node_0_input`, `node_0_output`, `node_1_input`, and `node_1_output`.
   - To calculate the input value of a node, multiply the relevant arrays together and compute their sum.
   - To calculate the output value of a node, apply the `relu()` function to the input value of the node.
3. Calculate the model output by calculating `input_to_final_layer` and `model_output` in the same way you calculated the input and output values for the nodes.
4. Use a for loop to iterate over `input_data`:
   - Use your `predict_with_network()` to generate predictions for each row of the `input_data` - `input_data_row`. Append each prediction to `results`.

In [10]:
# List of 4 arrays each as shape (2,)
input_data = [np.array([3, 5]), np.array([ 1, -1]), np.array([0, 0]), np.array([8, 4])]


In [11]:
# Define predict_with_network()
def predict_with_network(input_data_row, weights):
    # Calculate node 0 value
    node_0_input = (input_data_row * weights['node_0']).sum()
    node_0_output = relu(node_0_input)
    
    # Calculate node 1 value
    node_1_input = (input_data_row * weights['node_1']).sum()
    node_1_output = relu(node_1_input)
    
    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])
    
    # Calculate model output
    input_to_final_layer = (hidden_layer_outputs * weights['output']).sum()
    model_output = relu(input_to_final_layer)
    
    # Return model output
    return model_output


# Create empty list to store prediction results
results = []

# Iteration
for input_data_row in input_data:
    # Append prediction to results
    results.append(predict_with_network(input_data_row, weights))
    

# Print results
print(results)

[52, 63, 0, 148]


Good work! Each of the 4 outputs is the prediction for the corresponding input row.
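For comparison, the same four predictions can be computed without an explicit loop by stacking the rows into a matrix. This is a sketch of an alternative formulation, not the exercise's solution; `X`, `W_hidden`, and `w_output` are names introduced here, and ReLU is applied to the output to match `predict_with_network()` above.

```python
import numpy as np

# One row per observation
X = np.array([[3, 5],
              [1, -1],
              [0, 0],
              [8, 4]])

# Hidden-layer weights: first row is node_0, second row is node_1
W_hidden = np.array([[2, 4],
                     [4, -5]])
w_output = np.array([2, 7])

# Weighted sums for every row and every hidden node, then ReLU
hidden_outputs = np.maximum(0, X @ W_hidden.T)          # shape (4, 2)

# Output layer, with ReLU applied as in predict_with_network()
predictions = np.maximum(0, hidden_outputs @ w_output)  # shape (4,)

print(predictions.tolist())  # [52, 63, 0, 148]
```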

## Deeper networks
  
The difference between modern deep learning and the historical neural networks that didn’t deliver these amazing results is the use of models with not just one hidden layer, but many successive hidden layers. We forward propagate through these successive layers in a similar way to what you saw for a single hidden layer.
  
**Multiple hidden layers**
  
Here is a network with two hidden layers. 
  
<img src='../_images/deeper-networks-low-level.png' alt='img' width='520'>
  
We first fill in the values for hidden layer one as a function of the inputs. Then apply the activation function to fill in the values in these nodes. 
  
<img src='../_images/deeper-networks-low-level1.png' alt='img' width='520'>
  
Then use values from the first hidden layer to fill in the second hidden layer.
  
<img src='../_images/deeper-networks-low-level2.png' alt='img' width='520'>
  
Then we make a prediction based on the outputs of hidden layer two. In practice, it's becoming common to have neural networks with many layers: five layers, ten layers, or more. A few years ago, 15 layers was state of the art, but this can scale quite naturally to even a thousand layers. 
  
<img src='../_images/deeper-networks-low-level3.png' alt='img' width='520'>
  
You use the same forward propagation process, but you apply that iterative process more times. Let's walk through the first steps of that. Assume all layers here use the ReLU activation function. We'll start by filling in the top node of the first hidden layer. That will use these two weights. The top weight contributes 3 times 2, or 6. The bottom weight contributes 20. So the input to this node is 26, and because ReLU applied to a positive number just returns that number, the node's value is 26. Now let's do the bottom node of that first hidden layer. We use these two nodes. Using the same process, we get 4 times 3, or 12, from this weight, and -25 from the bottom weight. So the input to this node is 12 minus 25, or -13. Recall that, when we apply ReLU to a negative number, we get 0. So this node is 0. We've shown the values for the subsequent layers here. Pause here, and verify you can calculate the same values at each node. At this point, you understand the mechanics for how neural networks make predictions. Let’s close this chapter with an interesting and important fact about these deep networks.
  
<img src='../_images/deeper-networks-low-level4.png' alt='img' width='520'>
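The first-layer arithmetic in the walkthrough can be checked with a few lines. The diagram is not reproduced in text, so the second input (5) and the weights 4 and -5 behind the "20" and "-25" contributions are inferred here; they are assumptions, chosen to be consistent with the exercise values used elsewhere in this chapter.

```python
import numpy as np

input_data = np.array([3, 5])            # second input (5) is inferred, see above
top_node_weights = np.array([2, 4])      # weight 4 is inferred from the "20" contribution
bottom_node_weights = np.array([4, -5])  # weight -5 is inferred from the "-25" contribution

# Contribution of each weight before summing
top_contributions = input_data * top_node_weights        # 6 and 20
bottom_contributions = input_data * bottom_node_weights  # 12 and -25

# ReLU: keep positive inputs, replace negative inputs with 0
top_node = max(0, top_contributions.sum())        # relu(26) = 26
bottom_node = max(0, bottom_contributions.sum())  # relu(-13) = 0

print(top_node, bottom_node)  # 26 0
```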
  
**Representation learning**
  
Deep networks internally build up representations of the patterns in the data that are useful for making predictions. And they find increasingly complex patterns as we go through successive hidden layers of the network. In this way, neural networks partially replace the need for feature engineering, or manually creating better predictive features. Deep learning is also sometimes called representation learning, because subsequent layers build increasingly sophisticated representations of the raw data, until we get to a stage where we can make predictions. This is easiest to understand from an application to images, which you will see later in this course. 
  
<img src='../_images/deeper-networks-low-level5.png' alt='img' width='520'>
  
Even if you haven't worked with images, you may find it useful to think through this example heuristically. When a neural network tries to classify an image, the first hidden layers build up patterns or interactions that are conceptually simple. A simple interaction would look at groups of nearby pixels and find patterns like diagonal lines, horizontal lines, vertical lines, blurry areas, etc. Once the network has identified where there are diagonal lines and horizontal lines and vertical lines, subsequent layers combine that information to find larger patterns, like big squares. A later layer might put together the location of squares and other geometric shapes to identify a checkerboard pattern, a face, a car, or whatever is in the image. 
  
- Deep networks internally build representations of patterns in the data
- Partially replace the need for feature engineering
- Subsequent layers build increasingly sophisticated representations of raw data
  
**Deep learning**
  
The cool thing about deep learning is that the modeler doesn’t need to specify those interactions. We never tell the model to look for diagonal lines. Instead, when you train the model, which you’ll learn to do in the next chapter, the network gets weights that find the relevant patterns to make better predictions. Working with images may still seem abstract, but this idea of finding increasingly complex or abstract patterns is a recurring theme when people talk about deep learning, and it will feel more concrete as you work with these networks more.
  
- Modeler doesn't need to specify the interactions
- When you train the model, the neural network gets weights that find the relevant patterns to make better predictions

### Forward propagation in a deeper network
  
You now have a model with 2 hidden layers. The values for an input data point are shown inside the input nodes. The weights are shown on the edges/lines. What prediction would this model make on this data point?
  
Assume the activation function at each node is the *identity function*. That is, each node's output will be the same as its input. So the value of the bottom node in the first hidden layer is -1, and not 0, as it would be if the ReLU activation function were used.
  
<img src='../_images/deeper-networks-low-level6.png' alt='img' width='520'>
  
Possible Answers
  
- [x] 0
- [ ] 7
- [ ] 9
  
Correct

### Multi-layer neural networks
  
In this exercise, you'll write code to do forward propagation for a neural network with 2 hidden layers. Each hidden layer has two nodes. The input data has been preloaded as `input_data`. The nodes in the first hidden layer are called `node_0_0` and `node_0_1`. Their weights are pre-loaded as `weights['node_0_0']` and `weights['node_0_1']` respectively.
  
The nodes in the second hidden layer are called `node_1_0` and `node_1_1`. Their weights are pre-loaded as `weights['node_1_0']` and `weights['node_1_1']` respectively.
  
We then create a model output from the hidden nodes using weights pre-loaded as `weights['output']`.
  
<img src='../_images/deeper-networks-low-level7.png' alt='img' width='490'>
  
1. Calculate `node_0_0_input` using its weights `weights['node_0_0']` and the given `input_data`. Then apply the `relu()` function to get `node_0_0_output`.
2. Do the same as above for `node_0_1_input` to get `node_0_1_output`.
3. Calculate `node_1_0_input` using its weights `weights['node_1_0']` and the outputs from the first hidden layer - `hidden_0_outputs`. Then apply the `relu()` function to get `node_1_0_output`.
4. Do the same as above for `node_1_1_input` to get `node_1_1_output`.
5. Calculate `model_output` using its weights `weights['output']` and the outputs from the second hidden layer `hidden_1_outputs` array. Do not apply the `relu()` function to this output.

In [13]:
# Input layer
input_data = np.array([3, 5])

# Weights
weights = {
    'node_0_0': np.array([2, 4]),
    'node_0_1': np.array([ 4, -5]),
    'node_1_0': np.array([-1, 2]),
    'node_1_1': np.array([1, 2]),
    'output': np.array([2, 7])
}


In [14]:
def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)
    
    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights['node_0_1']).sum()
    node_0_1_output = relu(node_0_1_input)
    
    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])
    
    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)
    
    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights['node_1_1']).sum()
    node_1_1_output = relu(node_1_1_input)
    
    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])
    
    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights['output']).sum()
    
    # Return model_output
    return model_output


# Prediction
output = predict_with_network(input_data)
print(output)

182


The network generated a prediction of 182.
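Because every hidden layer repeats the same two steps (weighted sums, then ReLU), the function above can also be written as a loop over layers. The sketch below is one possible generalization under that assumption, not the course's solution; `forward()`, `hidden_layers`, and `output_weights` are hypothetical names, and each weight matrix holds one row per node.

```python
import numpy as np


def relu(values):
    # Element-wise ReLU so a whole layer can be activated at once
    return np.maximum(0, values)


def forward(input_data, hidden_layers, output_weights):
    # Start from the input and pass the values through each hidden layer in turn
    activations = input_data
    for layer_weights in hidden_layers:
        activations = relu(layer_weights @ activations)
    # No activation is applied to the final output, as in the exercise
    return output_weights @ activations


# Same weights as the exercise, one row per node
hidden_layers = [
    np.array([[2, 4], [4, -5]]),   # node_0_0, node_0_1
    np.array([[-1, 2], [1, 2]]),   # node_1_0, node_1_1
]
output_weights = np.array([2, 7])

prediction = forward(np.array([3, 5]), hidden_layers, output_weights)
print(prediction)  # 182
```

Adding another hidden layer then only requires appending one more weight matrix to `hidden_layers`.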

### Representations are learned
  
How are the weights that determine the features/interactions in Neural Networks created?
  
Possible Answers
  
- [ ] A user chooses them when creating the model.
- [x] The model training process sets them to optimize predictive accuracy.
- [ ] The weights are random numbers.
  
Exactly! You will learn more about how Neural Networks optimize their weights in the next chapter!

### Levels of representation
  
Which layers of a model capture more complex or "higher level" interactions?
  
Possible Answers
  
- [ ] The first layers capture the most complex interactions.
- [x] The last layers capture the most complex interactions.
- [ ] All layers capture interactions of similar complexity.
  
Exactly! The last layers capture the most complex interactions.