# <h1 style="text-align: center;" class="list-group-item list-group-item-action active" data-toggle="list" role="tab" aria-controls="home">Introduction to Deep Learning in Python</h1>

Deep learning is the machine learning technique behind the most exciting capabilities in diverse areas like robotics, natural language processing, image recognition, and artificial intelligence, including the famous AlphaGo. In this course, you'll gain hands-on, practical knowledge of how to use deep learning with Keras 2.0, the latest version of a cutting-edge library for deep learning in Python.

<a id="toc"></a>

<h3 class="list-group-item list-group-item-action active" data-toggle="list" role="tab" aria-controls="home">Table of Contents</h3>
    
* [1. Basics of deep learning and neural networks](#1)
    - Introduction to deep learning
    - Forward propagation
    - Activation functions
    - Deeper networks
    
* [2. Optimizing a neural network with backward propagation](#2) 
    - The need for optimization
    - Gradient descent
    - Backpropagation
    - Backpropagation in practice
    
* [3. Building deep learning models with keras](#3)
    - Creating a Keras model
    - Compiling and fitting a model
    - Classification models
    - Using models
    
* [4. Fine-tuning keras models](#4)
    - Understanding model optimization
    - Model validation
    - Thinking about model capacity
    - Stepping up to images
    - Final thoughts

## Explore Datasets

Use the DataFrames imported in the first cell to explore the data and practice your skills!

In [1]:
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import scipy.stats 

import warnings
warnings.filterwarnings('ignore')

# Import the course datasets 
hourly_wages = pd.read_csv('datasets/hourly_wages.csv')
mnist = pd.read_csv('datasets/mnist.csv')
titanic_all_numeric = pd.read_csv('datasets/titanic_all_numeric.csv')


In [2]:
hourly_wages.head(3)

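| Unnamed: 0 | wage_per_hour | union | education_yrs | experience_yrs | age | female | marr | south | manufacturing | construction |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 0 | 8 | 21 | 35 | 1 | 1 | 0 | 1 | 0 |
| 1 | 4.95 | 0 | 9 | 42 | 57 | 1 | 1 | 0 | 1 | 0 |
| 2 | 6.67 | 0 | 12 | 1 | 19 | 0 | 0 | 0 | 1 | 0 |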


In [3]:
mnist.head(3)

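| Unnamed: 0 | 5 | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | ... | 0.608 | 0.609 | 0.610 | 0.611 | 0.612 | 0.613 | 0.614 | 0.615 | 0.616 | 0.617 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |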


In [4]:
titanic_all_numeric.head(3)

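| Unnamed: 0 | survived | pclass | age | sibsp | parch | fare | male | age_was_missing | embarked_from_cherbourg | embarked_from_queenstown | embarked_from_southampton |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | 22.0 | 1 | 0 | 7.25 | 1 | False | 0 | 0 | 1 |
| 1 | 1 | 1 | 38.0 | 1 | 0 | 71.2833 | 0 | False | 1 | 0 | 0 |
| 2 | 1 | 3 | 26.0 | 0 | 0 | 7.925 | 0 | False | 0 | 0 | 1 |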


## <a id="1"></a>
<font color="lightseagreen" size=+2.5><b>1. Basics of deep learning and neural networks</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

In this chapter, you'll become familiar with the fundamental concepts and terminology used in deep learning, and understand why deep learning techniques are so powerful today. You'll build simple neural networks and generate predictions with them.

### 1 01 Introduction to deep learning

1. Introduction to deep learning

2. Imagine you work for a bank

![image.png](attachment:image.png)

Imagine you work for a bank, and you need to build a model predicting how many transactions each customer will make next year. You have predictive data or features like

3. Example as seen by linear regression

![image-2.png](attachment:image-2.png)

each customer’s age,

4. Example as seen by linear regression

![image-3.png](attachment:image-3.png)

bank balance,

5. Example as seen by linear regression

![image-4.png](attachment:image-4.png)

whether they are retired

6. Example as seen by linear regression

![image-5.png](attachment:image-5.png)

and so on. We'll get to deep learning in a moment, but for comparison, consider how a simple linear regression model works for this problem. The linear regression embeds an assumption that the outcome, in this case

7. Example as seen by linear regression

![image-6.png](attachment:image-6.png)

how many transactions a user makes, is the sum of individual parts. It starts by saying, "what is the average?" Then it adds

8. Example as seen by linear regression

![image-7.png](attachment:image-7.png)

the effect of age.

9. Example as seen by linear regression

![image-8.png](attachment:image-8.png)

Then the effect of bank balance. And so on. So the linear regression model isn't identifying the interactions between these parts, and how they affect banking activity.

10. Example as seen by linear regression

![image-9.png](attachment:image-9.png)

Say we plot predictions from this model.

11. Example as seen by linear regression

![image-10.png](attachment:image-10.png)

We draw one line with the predictions for retired people,

12. Example as seen by linear regression

![image-11.png](attachment:image-11.png)

and another with the predictions for those still working.

13. Example as seen by linear regression

![image-12.png](attachment:image-12.png)

We put current bank balance on the horizontal axis, and the

14. Example as seen by linear regression

![image-13.png](attachment:image-13.png)

vertical axis is the predicted number of transactions.

15. Example as seen by linear regression

![image-14.png](attachment:image-14.png)

The left graph shows predictions from a model with no interactions. In that model we simply add up the effect of the retirement status, and current bank balance. The lack of interactions is reflected by both lines being parallel. That's probably unrealistic, but it's an assumption of the linear regression model.

16. Example as seen by linear regression

![image-15.png](attachment:image-15.png)

The graph on the right shows the predictions from a model that allows interactions, and the lines don't need to be parallel. Neural networks are a powerful

17. Interactions

![image-16.png](attachment:image-16.png)

modeling approach that accounts for interactions like this especially well. Deep learning, the focus of this course, is the use of especially powerful neural networks. Because deep learning models account for these types of interactions so well, they perform very well on most prediction problems you've seen before. But their ability to capture extremely complex interactions also allows them to do amazing things with text, images, videos, audio, source code and almost anything else you could imagine doing data science with.
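To make the parallel-lines point concrete, here is a minimal sketch that compares an additive model with one that includes an interaction term. The coefficients and the helper names `additive` and `with_interaction` are hypothetical, chosen only for illustration; they are not taken from the course data.

```python
import numpy as np

# Hypothetical bank balances at which we compare the two groups.
balance = np.array([0.0, 1000.0, 2000.0])

def additive(balance, retired):
    # No interaction: retirement shifts the line but leaves the slope unchanged.
    return 50.0 + 0.01 * balance - 10.0 * retired

def with_interaction(balance, retired):
    # The balance * retired term lets the slope differ between the two groups.
    return 50.0 + 0.01 * balance - 10.0 * retired - 0.005 * balance * retired

# Gap between working (retired=0) and retired (retired=1) predictions.
gap_additive = additive(balance, 0) - additive(balance, 1)
gap_interaction = with_interaction(balance, 0) - with_interaction(balance, 1)

print(gap_additive)      # same gap at every balance: parallel lines
print(gap_interaction)   # gap grows with balance: lines are not parallel
```

A constant gap between the two groups is what "parallel lines" means in the left-hand graph; once an interaction term is present, the gap depends on the bank balance.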

18. Course structure

![image-17.png](attachment:image-17.png)

The first two chapters of this course focus on conceptual knowledge about deep learning. This part will be hard, but it will prepare you to debug and tune deep learning models on conventional prediction problems, and it will lay the foundation for progressing towards those new and exciting applications. You'll see this pay off in the third and fourth chapters.

19. Build and tune deep learning models using keras

![image-18.png](attachment:image-18.png)

You will write code that looks like this, to build and tune deep learning models using keras, to solve many of the same modeling problems you might have previously solved with scikit-learn. As a start to how deep learning models capture interactions and achieve these amazing results, we'll modify the diagram you saw a moment ago.

20. Deep learning models capture interactions

![image-19.png](attachment:image-19.png)

Here there is an interaction between

21. Deep learning models capture interactions

![image-20.png](attachment:image-20.png)

retirement status and bank balance. Instead of having them separately affecting the outcome, we calculate a function of these variables that accounts for their interaction, and use that to predict the outcome.

22. Deep learning models capture interactions

![image-21.png](attachment:image-21.png)

Even this graphic oversimplifies reality, where most things interact with each other in some way, and real neural network models account for far more interactions. So the diagram for a simple neural network looks like this.

23. Interactions in neural network

![image-22.png](attachment:image-22.png)

On the far left, we have something called an input layer. This represents our predictive features like age or income.

24. Interactions in neural network

![image-23.png](attachment:image-23.png)

On the far right we have the output layer. The prediction from our model, in this case, the predicted number of transactions. All layers that are not the input or output layers

25. Interactions in neural network

![image-24.png](attachment:image-24.png)

are called hidden layers. They are called hidden layers because, while the inputs and outputs correspond to visible things that happened in the world, and they can be stored as data, the values in the hidden layer aren't something we have data about, or anything we observe directly from the world. Nevertheless, each dot, called a node, in the hidden layer, represents an aggregation of information

26. Interactions in neural network

![image-25.png](attachment:image-25.png)

from our input data, and each node

27. Interactions in neural network

![image-26.png](attachment:image-26.png)

adds to the model's ability to capture interactions. So the more nodes we have, the more interactions we can capture.

28. Let's practice!

**Comparing neural network models to classical regression models**

Which of the models in the diagrams has greater ability to account for interactions?

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

Correct! Model 2 has more nodes in the hidden layer, and therefore, greater ability to capture interactions.

### 1 02 Forward propagation

1. Forward propagation

We’ll start by showing how neural networks use data to make predictions. This is called the forward propagation algorithm.

2. Bank transactions example

![image.png](attachment:image.png)

Let's revisit our example predicting how many transactions a user will make at our bank. For simplicity, we'll make predictions based on only the number of children and number of existing accounts.

3. Forward propagation

![image-2.png](attachment:image-2.png)

This graph shows a customer with

4. Forward propagation

![image-3.png](attachment:image-3.png)

two children and

5. Forward propagation

![image-4.png](attachment:image-4.png)

three accounts. The forward-propagation algorithm will pass this information through the network to make a prediction in the output layer.

6. Forward propagation

![image-5.png](attachment:image-5.png)

Lines connect the inputs to the hidden layer.

7. Forward propagation

![image-6.png](attachment:image-6.png)

Each line has a weight indicating how strongly that input affects the hidden node that the line ends at. These are the first set of weights. We have one weight from the

8. Forward propagation

![image-7.png](attachment:image-7.png)

top input into the top node of the layer,

9. Forward propagation

![image-8.png](attachment:image-8.png)

and one weight from the bottom input to the top node of the hidden layer. These weights are the parameters we train or change when we fit a neural network to data, so these weights will be a focus throughout this course. To make predictions for the top node of the hidden layer, we take the value of each node in the input layer, multiply it by the weight that ends at that node, and then sum up all the values. In this case, we get (2 times 1) plus (3 times 1), which is

10. Forward propagation

![image-9.png](attachment:image-9.png)

5. Now do the same to fill in the value of this node on the bottom. That is

11. Forward propagation

![image-10.png](attachment:image-10.png)

(two times (minus one))

12. Forward propagation

![image-11.png](attachment:image-11.png)

plus (three times one).

13. Forward propagation

![image-12.png](attachment:image-12.png)

That's one. Finally, repeat this process

14. Forward propagation

![image-13.png](attachment:image-13.png)

for the next layer, which is the output layer. That is

15. Forward propagation

![image-22.png](attachment:image-22.png)

(five times two) plus

16. Forward propagation

![image-21.png](attachment:image-21.png)

(one times -1). That gives an output

17. Forward propagation

![image-20.png](attachment:image-20.png)

of 9. We predicted nine transactions. That’s forward-propagation. We moved from

18. Forward propagation

![image-19.png](attachment:image-19.png)

the inputs on the left, to

19. Forward propagation

![image-18.png](attachment:image-18.png)

the hidden layer in the middle, and then from the hidden layer

20. Forward propagation

![image-17.png](attachment:image-17.png)

to the output on the right. We always

21. Forward propagation

![image-16.png](attachment:image-16.png)

use that same multiply then add process. If you're familiar with vector algebra or linear algebra, that operation is a dot product. If you don't know about dot products, that's fine too. That was forward propagation for a single data point. In general, we do forward propagation for one data point at a time. The value in that last layer is the model's prediction for that data point.
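As a quick check of the dot-product remark, the sketch below computes the top hidden node from the walkthrough (inputs 2 and 3, both weights equal to 1) in two ways. The variable names are illustrative assumptions.

```python
import numpy as np

inputs = np.array([2, 3])     # two children, three accounts
weights = np.array([1, 1])    # weights into the top hidden node

# Multiply element by element, then add the results.
multiply_then_add = (inputs * weights).sum()

# The same operation expressed as a dot product.
dot_product = np.dot(inputs, weights)

print(multiply_then_add, dot_product)
```

Both expressions give the value 5 that the walkthrough assigns to the top hidden node.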

22. Forward propagation code

![image-15.png](attachment:image-15.png)

Let's see the code for this. We import NumPy for some of the mathematical operations. We've stored the input data as an array. We then have weights into each node in the hidden layer and to the output. We store the weights going into each node as an array, and we use a dictionary to store those arrays. Let’s start forward propagating. We fill in the top hidden node here, which is called node zero. We multiply the inputs by the weights for that node, and then sum both of those terms together. Notice that we had two weights for node_0. That matches the two items in the array it is multiplied by, which is the input_data. These get converted to a single number by the sum function at the end of the line. We then do the same thing for the bottom node of the hidden layer, which is called node 1. Now, both node zero and node one have numeric values.

23. Forward propagation code

![image-14.png](attachment:image-14.png)

To simplify multiplication, we put those in an array here. If we print out the array, we confirm that those are the values from the hidden layer you saw a moment ago. It can also be instructive to verify this by hand with pen and paper. To get the output, we multiply the values in the hidden layer by the weights for the output. Summing those together gives us 10 minus 1, which is 9.
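The slide images are not reproduced in these notes, so the following is a reconstruction of the code from the narration rather than the original slide. The weights come from the worked diagram above; the variable names follow the exercise below and are otherwise assumptions.

```python
import numpy as np

# Customer with two children and three accounts.
input_data = np.array([2, 3])

# One array of weights per node, stored in a dictionary.
weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

# Hidden layer: multiply inputs by each node's weights, then sum.
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()

# Collect the hidden layer values in one array.
hidden_layer_values = np.array([node_0_value, node_1_value])
print(hidden_layer_values)

# Output layer: the same multiply-then-add step on the hidden values.
output = (hidden_layer_values * weights['output']).sum()
print(output)
```

The hidden layer values are 5 and 1, and the prediction is 10 minus 1, which is 9, matching the diagram.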

24. Let's practice!

In the exercises, you'll practice performing forward propagation in small neural networks.

**Exercise**

**Coding the forward propagation algorithm**

In this exercise, you'll write code to do forward propagation (prediction) for your first neural network:

![image.png](attachment:image.png)

Each data point is a customer. The first input is how many accounts they have, and the second input is how many children they have. The model will predict how many transactions the user makes in the next year. You will use this data throughout the first 2 chapters of this course.

The input data has been pre-loaded as `input_data`, and the weights are available in a dictionary called `weights`. The array of weights for the first node in the hidden layer is in `weights['node_0']`, and the array of weights for the second node in the hidden layer is in `weights['node_1']`.

The weights feeding into the output node are available in `weights['output']`.

NumPy will be pre-imported for you as `np` in all exercises.

**Instructions**

- Calculate the value in node 0 by multiplying `input_data` by its weights `weights['node_0']` and computing their sum. This is the 1st node in the hidden layer.
- Calculate the value in node 1 using `input_data` and `weights['node_1']`. This is the 2nd node in the hidden layer.
- Put the hidden layer values into an array. This has been done for you.
- Generate the prediction by multiplying `hidden_layer_outputs` by `weights['output']` and computing their sum.

In [6]:
import numpy as np

In [9]:
input_data = np.array([3, 5])
input_data

array([3, 5])

In [12]:
weights = {'node_0': np.array([2, 4]), 
           'node_1': np.array([ 4, -5]), 
           'output': np.array([2, 7])}

weights

{'node_0': array([2, 4]), 'node_1': array([ 4, -5]), 'output': array([2, 7])}

In [13]:
# Calculate node 0 value: node_0_value
node_0_value = (input_data * weights['node_0']).sum()

# Calculate node 1 value: node_1_value
node_1_value = (input_data * weights['node_1']).sum()

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Calculate output: output
output = (hidden_layer_outputs * weights['output']).sum()

# Print output
print(output)

-39


Wonderful work! It looks like the network generated a prediction of -39.
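The same prediction can also be written with a weight matrix instead of one array per node. This is an alternative sketch, not the exercise solution; the names `hidden_weights` and `output_weights` are assumptions introduced here.

```python
import numpy as np

input_data = np.array([3, 5])

# Each row holds the weights feeding one hidden node (node_0, then node_1).
hidden_weights = np.array([[2, 4],
                           [4, -5]])
output_weights = np.array([2, 7])

# Matrix-vector product computes both hidden node values at once.
hidden_layer_outputs = hidden_weights @ input_data

# Dot product of the hidden values with the output weights.
output = output_weights @ hidden_layer_outputs
print(hidden_layer_outputs, output)
```

The hidden values are 26 and -13, and the output is again -39. Stacking the weights this way scales more naturally as layers get wider.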

### 1 03 Activation functions

1. Activation functions

But creating this multiply-add process is only half the story for hidden layers. For neural networks to achieve their maximum predictive power, we must apply something called an activation function in the hidden layers.

2. Linear vs. non-linear Functions

![image.png](attachment:image.png)

An activation function allows the model to capture non-linearities. Non-linearities, as shown on the right here, capture patterns like how going from no children to one child may impact your banking transactions differently than going from three children to four. We have examples of linear functions, straight lines on the left, and non-linear functions on the right. If the relationships in the data aren’t straight-line relationships, we will need an activation function that captures non-linearities.

3. Activation functions

![image-2.png](attachment:image-2.png)

An activation function is something applied to the value coming into a node, which then transforms it into the value stored in that node, or the node output.
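One way to see why an activation function is needed: without it, stacking layers gains nothing, because multiply-then-add steps combine into a single multiply-then-add step. The sketch below uses the weights from the bank example in the forward propagation walkthrough; the matrix layout and variable names are assumptions made for illustration.

```python
import numpy as np

input_data = np.array([2, 3])
hidden_weights = np.array([[1, 1],     # weights into the top hidden node
                           [-1, 1]])   # weights into the bottom hidden node
output_weights = np.array([2, -1])

# Two-layer network with no activation function.
two_layer = output_weights @ (hidden_weights @ input_data)

# The same prediction from a single set of weights on the raw inputs.
collapsed_weights = output_weights @ hidden_weights
one_layer = collapsed_weights @ input_data

print(collapsed_weights)
print(two_layer, one_layer)
```

Both computations give 9, so this network without an activation function behaves exactly like a linear model with weights 3 and 1 on the two inputs. A non-linear activation in the hidden layer breaks this equivalence.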

4. Improving our neural network

![image-3.png](attachment:image-3.png)

Let's go back to the previous diagram. The top hidden node previously had a value of 5. For a long time, an s-shaped function called tanh was a popular activation function.

5. Activation functions

![image-4.png](attachment:image-4.png)

If we used the tanh activation function, this node's value would be tanh(5), which is very close to 1. Today, the standard in both industry and research applications is something called

6. ReLU (Rectified Linear Activation)

![image-5.png](attachment:image-5.png)

the ReLU or rectified linear activation function. That's depicted here. Though it has two linear pieces, it's surprisingly powerful when composed together through multiple successive hidden layers, which you will see soon. The code that incorporates activation functions

7. Activation functions

![image-6.png](attachment:image-6.png)

is shown here. It is the same as the code you saw previously, but we've distinguished the input from the output in each node, which is shown in these lines and then again here. And we've applied the tanh function to convert the input to the output. That gives us a prediction of 1.2 transactions.
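Since the slide image is not available here, the following is a reconstruction from the narration, using the bank example weights from the forward propagation section. The variable names are assumptions; `np.tanh` is NumPy's hyperbolic tangent.

```python
import numpy as np

input_data = np.array([2, 3])
weights = {'node_0': np.array([1, 1]),
           'node_1': np.array([-1, 1]),
           'output': np.array([2, -1])}

# Input to each hidden node: multiply by the weights, then sum.
node_0_input = (input_data * weights['node_0']).sum()
# Output of the node: the activation function applied to its input.
node_0_output = np.tanh(node_0_input)

node_1_input = (input_data * weights['node_1']).sum()
node_1_output = np.tanh(node_1_input)

hidden_layer_outputs = np.array([node_0_output, node_1_output])

# No activation is applied at the output node in this sketch.
output = (hidden_layer_outputs * weights['output']).sum()
print(output)
```

The top node's input of 5 becomes tanh(5), which is just under 1, and the resulting prediction is roughly 1.24, consistent with the 1.2 transactions quoted above.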

8. Let's practice!

In the exercise, you will use the Rectified Linear Activation function, or ReLU, in your network.
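ReLU returns 0 for negative inputs and returns the input unchanged otherwise. A minimal sketch, assuming the same `input_data` and `weights` as the forward propagation exercise above and a helper named `relu` (a hypothetical name chosen here):

```python
import numpy as np

def relu(value):
    """Rectified linear activation: 0 for negative inputs, identity otherwise."""
    return max(0, value)

input_data = np.array([3, 5])
weights = {'node_0': np.array([2, 4]),
           'node_1': np.array([4, -5]),
           'output': np.array([2, 7])}

# Hidden nodes: weighted sum of the inputs, then ReLU.
node_0_output = relu((input_data * weights['node_0']).sum())
node_1_output = relu((input_data * weights['node_1']).sum())

hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Output node: weighted sum only, with no activation in this sketch.
model_output = (hidden_layer_outputs * weights['output']).sum()
print(model_output)
```

The second hidden node's input is -13, which ReLU clips to 0, so the prediction changes from -39 without an activation function to 52 with ReLU.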

## <a id="2"></a>
<font color="lightseagreen" size=+2.5><b>2. Optimizing a neural network with backward propagation</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

Learn how to optimize the predictions generated by your neural networks. You'll use a method called backward propagation, which is one of the most important techniques in deep learning. Understanding how it works will give you a strong foundation to build on in the second half of the course.

## <a id="3"></a>
<font color="lightseagreen" size=+2.5><b>3. Building deep learning models with keras</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

In this chapter, you'll use the Keras library to build deep learning models for both regression and classification. You'll learn about the Specify-Compile-Fit workflow that you can use to make predictions, and by the end of the chapter, you'll have all the tools necessary to build deep neural networks.

## <a id="4"></a>
<font color="lightseagreen" size=+2.5><b>4. Fine-tuning keras models</b></font>

<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Table of Contents</a>

Learn how to optimize your deep learning models in Keras. Start by learning how to validate your models, then understand the concept of model capacity, and finally, experiment with wider and deeper networks.
