# **Introduction to Deep Learning in Python**

# Course Description

Deep learning is the machine learning technique behind the most exciting capabilities in diverse areas like robotics, natural language processing, image recognition, and artificial intelligence, including the famous AlphaGo. In this course, you'll gain hands-on, practical knowledge of how to use deep learning with Keras 2.0, the latest version of a cutting-edge library for deep learning in Python.

# **Chapter 1: Basics of deep learning and neural networks**

In this chapter, you'll become familiar with the fundamental concepts and terminology used in deep learning, and understand why deep learning techniques are so powerful today. You'll build simple neural networks and generate predictions with them.

# Introduction to deep learning

# Comparing neural network models to classical regression models

# Forward propagation

# Coding the forward propagation algorithm
In this exercise, you'll write code to do forward propagation (prediction) for your first neural network:

[Figure Ch1Ex4: diagram of the neural network]

Each data point is a customer. The first input is how many accounts they have, and the second input is how many children they have. The model will predict how many transactions the user makes in the next year. You will use this data throughout the first 2 chapters of this course.

The input data has been pre-loaded as input_data, and the weights are available in a dictionary called weights. The array of weights for the first node in the hidden layer is in weights['node_0'], and the array of weights for the second node in the hidden layer is in weights['node_1'].

The weights feeding into the output node are available in weights['output'].

NumPy will be pre-imported for you as np in all exercises.

## Instructions

Calculate the value in node 0 by multiplying input_data by its weights weights['node_0'] and computing their sum. This is the 1st node in the hidden layer.
Calculate the value in node 1 using input_data and weights['node_1']. This is the 2nd node in the hidden layer.
Put the hidden layer values into an array. This has been done for you.
Generate the prediction by multiplying hidden_layer_outputs by weights['output'] and computing their sum.
Hit 'Submit Answer' to print the output!


In [None]:
# Calculate node 0 value: node_0_value
node_0_value = (input_data * weights['node_0']).sum()

# Calculate node 1 value: node_1_value
node_1_value = (input_data * weights['node_1']).sum()

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_value, node_1_value])

# Calculate output: output
output = (hidden_layer_outputs * weights['output']).sum()

# Print output
print(output)

In [None]:
<script.py> output:
    -39
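
Because input_data and weights are pre-loaded in the exercise, the cell above does not run on its own. The following is a minimal self-contained sketch, assuming illustrative values for input_data and weights (they are not given in the text; they were chosen so that the result matches the output shown). It also shows that the multiply-and-sum step is the same thing as a dot product.

```python
import numpy as np

# Assumed values for illustration; the course pre-loads its own input_data and weights
input_data = np.array([3, 5])  # e.g. 3 accounts, 5 children
weights = {
    'node_0': np.array([2, 4]),
    'node_1': np.array([4, -5]),
    'output': np.array([2, 7]),
}

# Multiply-and-sum form used in the exercise
node_0_value = (input_data * weights['node_0']).sum()
node_1_value = (input_data * weights['node_1']).sum()
hidden_layer_outputs = np.array([node_0_value, node_1_value])
output = (hidden_layer_outputs * weights['output']).sum()

# The same computation written with dot products
hidden_dot = np.array([np.dot(input_data, weights['node_0']),
                       np.dot(input_data, weights['node_1'])])
output_dot = np.dot(hidden_dot, weights['output'])

print(output, output_dot)
```

With these assumed values both forms give -39.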

# Activation functions

# The Rectified Linear Activation Function
As Dan explained to you in the video, an "activation function" is a function applied at each node. It converts the node's input into some output.

The rectified linear activation function (called ReLU) has been shown to lead to very high-performance networks. This function takes a single number as an input, returning 0 if the input is negative, and the input if the input is positive.

Here are some examples:
relu(3) = 3
relu(-3) = 0
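
As a quick sketch of the definition above, ReLU can be written with the built-in max() for a single number, or with np.maximum() when the input is an array. The name relu_array is a hypothetical helper used only for this illustration.

```python
import numpy as np

def relu(x):
    """Scalar ReLU: 0 for negative inputs, the input itself otherwise."""
    return max(0, x)

def relu_array(values):
    """Element-wise ReLU for NumPy arrays (hypothetical helper)."""
    return np.maximum(0, values)

print(relu(3), relu(-3))
print(relu_array(np.array([-3, 0, 3])))
```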

## Instructions

Fill in the definition of the relu() function:
Use the max() function to calculate the value for the output of relu().
Apply the relu() function to node_0_input to calculate node_0_output.
Apply the relu() function to node_1_input to calculate node_1_output.

In [None]:
def relu(input):
    '''Define your relu activation function here'''
    # Calculate the value for the output of the relu function: output
    output = max(0, input)

    # Return the value just calculated
    return(output)

# Calculate node 0 value: node_0_output
node_0_input = (input_data * weights['node_0']).sum()
node_0_output = relu(node_0_input)

# Calculate node 1 value: node_1_output
node_1_input = (input_data * weights['node_1']).sum()
node_1_output = relu(node_1_input)

# Put node values into array: hidden_layer_outputs
hidden_layer_outputs = np.array([node_0_output, node_1_output])

# Calculate model output (do not apply relu)
model_output = (hidden_layer_outputs * weights['output']).sum()

# Print model output
print(model_output)

In [None]:
<script.py> output:
    52

Great work! You predicted 52 transactions. Without this activation function, you would have predicted a negative number! The real power of activation functions will come soon when you start tuning model weights.

# Applying the network to many observations/rows of data
You'll now define a function called predict_with_network() which will generate predictions for multiple data observations, which are pre-loaded as input_data. As before, weights are also pre-loaded. In addition, the relu() function you defined in the previous exercise has been pre-loaded.

## Instructions

Define a function called predict_with_network() that accepts two arguments - input_data_row and weights - and returns a prediction from the network as the output.
Calculate the input and output values for each node, storing them as: node_0_input, node_0_output, node_1_input, and node_1_output.
To calculate the input value of a node, multiply the relevant arrays together and compute their sum.
To calculate the output value of a node, apply the relu() function to the input value of the node.
Calculate the model output by calculating input_to_final_layer and model_output in the same way you calculated the input and output values for the nodes.
Use a for loop to iterate over input_data:
Use your predict_with_network() to generate predictions for each row of the input_data - input_data_row. Append each prediction to results.


In [None]:
# Define predict_with_network()
def predict_with_network(input_data_row, weights):

    # Calculate node 0 value
    node_0_input = (input_data_row * weights["node_0"]).sum()
    node_0_output = relu(node_0_input)

    # Calculate node 1 value
    node_1_input = (input_data_row * weights["node_1"]).sum()
    node_1_output = relu(node_1_input)

    # Put node values into array: hidden_layer_outputs
    hidden_layer_outputs = np.array([node_0_output, node_1_output])

    # Calculate model output
    input_to_final_layer = (hidden_layer_outputs * weights["output"]).sum()
    model_output = relu(input_to_final_layer)

    # Return model output
    return(model_output)

# Create empty list to store prediction results
results = []
for input_data_row in input_data:
    # Append prediction to results
    results.append(predict_with_network(input_data_row, weights))

# Print results
print(results)

In [None]:
<script.py> output:
    [52, 63, 0, 148]
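
The row-by-row loop can also be expressed as matrix operations over all observations at once. This is a sketch under assumptions: the input rows and weights below are illustrative values (not given in the text), chosen so the result matches the output shown, and the names hidden_weights and output_weights are introduced only for this example.

```python
import numpy as np

# Assumed values for illustration; the exercise pre-loads its own data
input_data = np.array([[3, 5], [1, -1], [0, 0], [8, 4]])
hidden_weights = np.array([[2, 4],     # plays the role of weights['node_0']
                           [4, -5]])   # plays the role of weights['node_1']
output_weights = np.array([2, 7])      # plays the role of weights['output']

# One row of hidden_outputs per observation, one column per hidden node
hidden_outputs = np.maximum(0, input_data @ hidden_weights.T)

# ReLU is applied to the final layer too, matching the exercise code
results = np.maximum(0, hidden_outputs @ output_weights)

print(results.tolist())
```

The loop version is easier to read while learning; the matrix version is closer to how libraries such as Keras organize the same computation.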

# **Deeper networks**
# Multi-layer neural networks
In this exercise, you'll write code to do forward propagation for a neural network with 2 hidden layers. Each hidden layer has two nodes. The input data has been preloaded as input_data. The nodes in the first hidden layer are called node_0_0 and node_0_1. Their weights are pre-loaded as weights['node_0_0'] and weights['node_0_1'] respectively.

The nodes in the second hidden layer are called node_1_0 and node_1_1. Their weights are pre-loaded as weights['node_1_0'] and weights['node_1_1'] respectively.

We then create a model output from the hidden nodes using weights pre-loaded as weights['output'].

[Figure Ch1Ex10: diagram of the neural network with two hidden layers]

## Instructions

Calculate node_0_0_input using its weights weights['node_0_0'] and the given input_data. Then apply the relu() function to get node_0_0_output.
Do the same as above for node_0_1_input to get node_0_1_output.
Calculate node_1_0_input using its weights weights['node_1_0'] and the outputs from the first hidden layer - hidden_0_outputs. Then apply the relu() function to get node_1_0_output.
Do the same as above for node_1_1_input to get node_1_1_output.
Calculate model_output using its weights weights['output'] and the outputs from the second hidden layer hidden_1_outputs array. Do not apply the relu() function to this output.

In [None]:
def predict_with_network(input_data):
    # Calculate node 0 in the first hidden layer
    node_0_0_input = (input_data * weights['node_0_0']).sum()
    node_0_0_output = relu(node_0_0_input)

    # Calculate node 1 in the first hidden layer
    node_0_1_input = (input_data * weights["node_0_1"]).sum()
    node_0_1_output = relu(node_0_1_input)

    # Put node values into array: hidden_0_outputs
    hidden_0_outputs = np.array([node_0_0_output, node_0_1_output])

    # Calculate node 0 in the second hidden layer
    node_1_0_input = (hidden_0_outputs * weights['node_1_0']).sum()
    node_1_0_output = relu(node_1_0_input)

    # Calculate node 1 in the second hidden layer
    node_1_1_input = (hidden_0_outputs * weights["node_1_1"]).sum()
    node_1_1_output = relu(node_1_1_input)

    # Put node values into array: hidden_1_outputs
    hidden_1_outputs = np.array([node_1_0_output, node_1_1_output])

    # Calculate model output: model_output
    model_output = (hidden_1_outputs * weights["output"]).sum()

    # Return model_output
    return(model_output)

output = predict_with_network(input_data)
print(output)
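
Writing out every node by hand becomes repetitive as networks get deeper. One possible generalization, sketched here with assumed weights and a hypothetical forward() function, stores each hidden layer as a 2-D array (one row of weights per node) and loops over the layers.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def forward(input_data, hidden_layers, output_weights):
    """Forward propagation through any number of hidden layers.

    hidden_layers is a list of 2-D arrays, one per hidden layer,
    with one row of weights per node.
    """
    activations = np.asarray(input_data)
    for layer_weights in hidden_layers:
        activations = relu(layer_weights @ activations)
    # No ReLU on the output node, as in the exercise
    return output_weights @ activations

# Assumed weights for illustration only
hidden_layers = [
    np.array([[2, 4], [4, -5]]),   # node_0_0, node_0_1
    np.array([[-1, 2], [1, 2]]),   # node_1_0, node_1_1
]
output_weights = np.array([2, 7])

prediction = forward(np.array([3, 5]), hidden_layers, output_weights)
print(prediction)
```

With these assumed values the first hidden layer outputs [26, 0], the second outputs [0, 26], and the prediction is 182.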

# Representations are learned
How are the weights that determine the features/interactions in Neural Networks created?


Possible Answers

A user chooses them when creating the model.

### **The model training process sets them to optimize predictive accuracy.**

The weights are random numbers.


# **Chapter 2: Optimizing a neural network with backward propagation**


Learn how to optimize the predictions generated by your neural networks. You'll use a method called backward propagation, which is one of the most important techniques in deep learning. Understanding how it works will give you a strong foundation to build on in the second half of the course.

# The need for optimization
# Coding how weight changes affect accuracy
Now you'll get to change weights in a real network and see how they affect model accuracy!

Have a look at the following neural network: Ch2Ex4

Its weights have been pre-loaded as weights_0. Your task in this exercise is to update a single weight in weights_0 to create weights_1, which gives a perfect prediction (in which the predicted value is equal to target_actual: 3).

Use a pen and paper if necessary to experiment with different combinations. You'll use the predict_with_network() function, which takes an array of data as the first argument, and weights as the second argument.

## Instructions

Create a dictionary of weights called weights_1 where you have changed 1 weight from weights_0 (You only need to make 1 edit to weights_0 to generate the perfect prediction).
Obtain predictions with the new weights using the predict_with_network() function with input_data and weights_1.
Calculate the error for the new weights by subtracting target_actual from model_output_1.
Hit 'Submit Answer' to see how the errors compare!


In [None]:
# The data point you will make a prediction for
input_data = np.array([0, 3])

# Sample weights
weights_0 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 1]
            }

# The actual target value, used to calculate the error
target_actual = 3

# Make prediction using original weights
model_output_0 = predict_with_network(input_data, weights_0)

# Calculate error: error_0
error_0 = model_output_0 - target_actual

# Create weights that cause the network to make perfect prediction (3): weights_1
weights_1 = {'node_0': [2, 1],
             'node_1': [1, 2],
             'output': [1, 0]
            }

# Make prediction using new weights: model_output_1
model_output_1 = predict_with_network(input_data, weights_1)

# Calculate error: error_1
error_1 = model_output_1 - target_actual

# Print error_0 and error_1

print(error_0)
print(error_1)

In [None]:
<script.py> output:
    6
    0
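
Instead of pen and paper, candidate edits can be checked in code. The sketch below uses a stand-in for the pre-loaded predict_with_network() (assumed here to have two ReLU hidden nodes and a ReLU output) and tries three hypothetical single-weight changes to weights_0.

```python
import numpy as np

def relu(x):
    return max(0, x)

def predict_with_network(input_data_row, weights):
    """Stand-in for the pre-loaded function (an assumption, not the course code)."""
    node_0 = relu((input_data_row * weights['node_0']).sum())
    node_1 = relu((input_data_row * weights['node_1']).sum())
    hidden = np.array([node_0, node_1])
    return relu((hidden * weights['output']).sum())

input_data = np.array([0, 3])
target_actual = 3
weights_0 = {'node_0': [2, 1], 'node_1': [1, 2], 'output': [1, 1]}

# Each candidate changes exactly one number in weights_0
candidates = {
    "output[1] -> 0": {**weights_0, 'output': [1, 0]},
    "node_1[1] -> 0": {**weights_0, 'node_1': [1, 0]},
    "node_0[1] -> 2": {**weights_0, 'node_0': [2, 2]},
}

errors = {name: predict_with_network(input_data, w) - target_actual
          for name, w in candidates.items()}
for name, err in errors.items():
    print(name, err)
```

The first two candidates give an error of 0, so more than one single-weight edit produces a perfect prediction; the third makes the error larger.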

# Scaling up to multiple data points
You've seen how different weights will have different accuracies on a single prediction. But usually, you'll want to measure model accuracy on many points. You'll now write code to compare model accuracies for two different sets of weights, which have been stored as weights_0 and weights_1.

input_data is a list of arrays. Each item in that list contains the data to make a single prediction. target_actuals is a list of numbers. Each item in that list is the actual value we are trying to predict.

In this exercise, you'll use the mean_squared_error() function from sklearn.metrics. It takes the true values and the predicted values as arguments.

You'll also use the preloaded predict_with_network() function, which takes an array of data as the first argument, and weights as the second argument.

## Instructions

Import mean_squared_error from sklearn.metrics.
Use a for loop to iterate over each row of input_data:
Make predictions for each row with weights_0 using the predict_with_network() function and append it to model_output_0.
Do the same for weights_1, appending the predictions to model_output_1.
Calculate the mean squared error of model_output_0 and then model_output_1 using the mean_squared_error() function. The first argument should be the actual values (target_actuals), and the second argument should be the predicted values (model_output_0 or model_output_1).


In [None]:
from sklearn.metrics import mean_squared_error

# Create model_output_0
model_output_0 = []
# Create model_output_1
model_output_1 = []

# Loop over input_data
for row in input_data:
    # Append prediction to model_output_0
    model_output_0.append(predict_with_network(row, weights_0))

    # Append prediction to model_output_1
    model_output_1.append(predict_with_network(row, weights_1))

# Calculate the mean squared error for model_output_0: mse_0
mse_0 = mean_squared_error(target_actuals, model_output_0)

# Calculate the mean squared error for model_output_1: mse_1
mse_1 = mean_squared_error(target_actuals, model_output_1)

# Print mse_0 and mse_1
print("Mean squared error with weights_0: %f" %mse_0)
print("Mean squared error with weights_1: %f" %mse_1)

In [None]:
<script.py> output:
    Mean squared error with weights_0: 37.500000
    Mean squared error with weights_1: 49.890625
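
To make the metric concrete, here is a small NumPy sketch of the quantity being measured: the mean of the squared differences between actual and predicted values. The function name mse and the numbers are illustrative assumptions, not the exercise's pre-loaded data.

```python
import numpy as np

def mse(actuals, predictions):
    """Mean of the squared differences between actual and predicted values."""
    actuals = np.asarray(actuals, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean((predictions - actuals) ** 2)

# Hypothetical targets and predictions, for illustration only
target_actuals = [1, 3, 5, 7]
model_output = [9, 9, 0, 12]

result = mse(target_actuals, model_output)
print(result)
```

Here the squared errors are 64, 36, 25, and 25, so the mean is 37.5.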

# **Gradient descent**
# Calculating slopes
You're now going to practice calculating slopes. For the mean-squared error loss function, the slope with respect to the weights is 2 * x * (xb-y), or 2 * input_data * error, where x is the input data, b is the weights, and y is the target. Note that x and b may have multiple numbers (x is a vector for each data point, and b is a vector). In this case, the output will also be a vector, which is exactly what you want.

You're ready to write the code to calculate this slope while using a single data point. You'll use pre-defined weights called weights as well as data for a single point called input_data. The actual value of the target you want to predict is stored in target.

## Instructions

Calculate the predictions, preds, by multiplying weights by the input_data and computing their sum.
Calculate the error, which is preds minus target. Notice that this error corresponds to xb-y in the gradient expression.
Calculate the slope of the loss function with respect to the weights. To do this, you need to take the product of input_data and error and multiply that by 2.

In [None]:
# Calculate the predictions: preds
preds = (input_data * weights).sum()

# Calculate the error: error
error = preds - target

# Calculate the slope: slope
slope = 2 * input_data * error

# Print the slope
print(slope)

In [None]:
<script.py> output:
    [14 28 42]
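
One way to build confidence in the slope formula is to compare it with a numerical estimate. The sketch below assumes illustrative values for weights, input_data, and target (chosen so the analytic slope matches the output shown) and checks 2 * input_data * error against central finite differences of the squared error.

```python
import numpy as np

# Assumed values for illustration; the exercise pre-loads its own
weights = np.array([0.0, 2.0, 1.0])
input_data = np.array([1.0, 2.0, 3.0])
target = 0.0

def squared_error(w):
    return ((input_data * w).sum() - target) ** 2

# Analytic slope from the text: 2 * x * (xb - y)
error = (input_data * weights).sum() - target
analytic_slope = 2 * input_data * error

# Central finite differences, nudging one weight at a time
eps = 1e-6
numeric_slope = np.zeros_like(weights)
for i in range(len(weights)):
    step = np.zeros_like(weights)
    step[i] = eps
    numeric_slope[i] = (squared_error(weights + step)
                        - squared_error(weights - step)) / (2 * eps)

print(analytic_slope)
print(np.allclose(analytic_slope, numeric_slope))
```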

# Improving model weights
Hurray! You've just calculated the slopes you need. Now it's time to use those slopes to improve your model. If you subtract the slopes from your weights, you will move in the right direction. However, it's possible to move too far in that direction. So you will want to take a small step in that direction first, using a small learning rate, and verify that the model is improving.

The weights have been pre-loaded as weights, the actual value of the target as target, and the input data as input_data. The predictions from the initial weights are stored as preds.

## Instructions

Set the learning rate to be 0.01 and calculate the error from the original predictions. This has been done for you.
Calculate the updated weights by subtracting the product of learning_rate and slope from weights.
Calculate the updated predictions by multiplying weights_updated with input_data and computing their sum.
Calculate the error for the new predictions. Store the result as error_updated.
Hit 'Submit Answer' to compare the updated error to the original!

In [None]:
# Set the learning rate: learning_rate
learning_rate = 0.01

# Calculate the predictions: preds
preds = (weights * input_data).sum()

# Calculate the error: error
error = preds - target

# Calculate the slope: slope
slope = 2 * input_data * error

# Update the weights: weights_updated
weights_updated = weights - learning_rate * slope

# Get updated predictions: preds_updated
preds_updated = (weights_updated * input_data).sum()

# Calculate updated error: error_updated
error_updated = preds_updated - target

# Print the original error
print(error)

# Print the updated error
print(error_updated)

In [None]:
<script.py> output:
    7
    5.04
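
To see why the step size matters, the following sketch repeats the same update with three learning rates. The values of weights, input_data, and target are assumptions chosen to reproduce the errors shown above (7 before the update, 5.04 after it with a learning rate of 0.01).

```python
import numpy as np

# Assumed values for illustration; the exercise pre-loads its own
weights = np.array([0.0, 2.0, 1.0])
input_data = np.array([1.0, 2.0, 3.0])
target = 0.0

error = (weights * input_data).sum() - target
slope = 2 * input_data * error

updated_errors = {}
for learning_rate in (0.001, 0.01, 0.1):
    weights_updated = weights - learning_rate * slope
    updated_errors[learning_rate] = (weights_updated * input_data).sum() - target
    print(learning_rate, updated_errors[learning_rate])
```

With these numbers, the two smaller learning rates shrink the error, while 0.1 overshoots: the error changes sign and grows in magnitude.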

# Making multiple updates to weights
You're now going to make multiple updates so you can dramatically improve your model weights, and see how the predictions improve with each update.

To keep your code clean, there is a pre-loaded get_slope() function that takes input_data, target, and weights as arguments. There is also a get_mse() function that takes the same arguments. The input_data, target, and weights have been pre-loaded.

This network does not have any hidden layers, and it goes directly from the input (with 3 nodes) to an output node. Note that weights is a single array.

We have also pre-loaded matplotlib.pyplot, and the error history will be plotted after you have done your gradient descent steps.

## Instructions

Use a for loop to iteratively update weights:
Calculate the slope using the get_slope() function.
Update the weights using a learning rate of 0.01.
Calculate the mean squared error (mse) with the updated weights using the get_mse() function.
Append mse to mse_hist.
Hit 'Submit Answer' to visualize mse_hist. What trend do you notice?

In [None]:
n_updates = 20
mse_hist = []

# Iterate over the number of updates
for i in range(n_updates):
    # Calculate the slope: slope
    slope = get_slope(input_data, target, weights)

    # Update the weights: weights
    weights = weights - 0.01 * slope

    # Calculate mse with new weights: mse
    mse = get_mse(input_data, target, weights)

    # Append the mse to mse_hist
    mse_hist.append(mse)


# Plot the mse history
plt.plot(mse_hist)
plt.xlabel('Iterations')
plt.ylabel('Mean Squared Error')
plt.show()
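
The get_slope() and get_mse() helpers are pre-loaded in the exercise, so their code is not shown. A plausible sketch for a single data point, assuming they follow the slope and error formulas from the previous exercises, might look like this (the data values are also assumptions, and the plot is replaced by a print so the example has no plotting dependency):

```python
import numpy as np

def get_error(input_data, target, weights):
    # Hypothetical helper: prediction minus target
    return (weights * input_data).sum() - target

def get_slope(input_data, target, weights):
    # Assumed implementation: 2 * input_data * error
    return 2 * input_data * get_error(input_data, target, weights)

def get_mse(input_data, target, weights):
    # With a single data point, the mean squared error is the squared error
    return get_error(input_data, target, weights) ** 2

# Assumed values for illustration
input_data = np.array([1.0, 2.0, 3.0])
target = 0.0
weights = np.array([0.0, 2.0, 1.0])

mse_hist = []
for _ in range(20):
    weights = weights - 0.01 * get_slope(input_data, target, weights)
    mse_hist.append(get_mse(input_data, target, weights))

print(mse_hist[0], mse_hist[-1])
```

Under these assumptions the error shrinks by the same factor on every update, so mse_hist decreases at each step.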

# **Backpropagation**

# The relationship between forward and backward propagation
If you have gone through 4 iterations of calculating slopes (using backward propagation) and then updated weights, how many times must you have done forward propagation?

Possible Answers

0

1

### **4**

8

# Thinking about backward propagation
If your predictions were all exactly right, and your errors were all exactly 0, the slope of the loss function with respect to your predictions would also be 0. In that circumstance, which of the following statements would be correct?

Possible Answers

### **The updates to all weights in the network would also be 0.**

The updates to all weights in the network would be dependent on the activation functions.

The updates to all weights in the network would be proportional to values from the input data.

# **Backpropagation in practice**



# **Chapter 3: Building deep learning models with Keras**

In this chapter, you'll use the Keras library to build deep learning models for both regression and classification. You'll learn about the Specify-Compile-Fit workflow that you can use to make predictions, and by the end of the chapter, you'll have all the tools necessary to build deep neural networks.

# Understanding your data
You will soon start building models in Keras to predict wages based on various professional and demographic factors. Before you start building a model, it's good to understand your data by performing some exploratory analysis.

The data is pre-loaded into a pandas DataFrame called df. Use the .head() and .describe() methods in the IPython Shell for a quick overview of the DataFrame.

The target variable you'll be predicting is wage_per_hour. Some of the predictor variables are binary indicators, where a value of 1 represents True, and 0 represents False.

Of the 9 predictor variables in the DataFrame, how many are binary indicators? The min and max values as shown by .describe() will be informative here.

## Instructions

Possible answers


0

5

### **6**
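
The same check can be done programmatically rather than by reading the .describe() output. The sketch below uses a small made-up DataFrame as a stand-in for df (the column names other than wage_per_hour, and all values, are hypothetical) and lists the predictor columns that only ever take the values 0 and 1.

```python
import pandas as pd

# Hypothetical stand-in for df; column names and values are made up
df = pd.DataFrame({
    'wage_per_hour': [5.1, 4.95, 6.67, 4.0],
    'union': [0, 0, 1, 0],
    'education_yrs': [8, 9, 12, 12],
    'female': [1, 1, 0, 0],
})

predictors = df.drop(columns='wage_per_hour')

# A binary indicator only ever takes the values 0 and 1
binary_cols = [col for col in predictors.columns
               if set(predictors[col].unique()) <= {0, 1}]

print(binary_cols)
```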

# Specifying a model
Now you'll get to work with your first model in Keras, and will immediately be able to run more complex neural network models on larger datasets compared to the first two chapters.

To start, you'll take the skeleton of a neural network and add a hidden layer and an output layer. You'll then fit that model and see Keras do the optimization so your model continually gets better.

As a start, you'll predict workers' wages based on characteristics like their industry, education, and level of experience. You can find the dataset in a pandas DataFrame called df. For convenience, everything in df except for the target has been converted to a NumPy array called predictors. The target, wage_per_hour, is available as a NumPy array called target.

For all exercises in this chapter, we've imported the Sequential model constructor, the Dense layer constructor, and pandas.

## Instructions

Store the number of columns in the predictors data to n_cols. This has been done for you.
Start by creating a Sequential model called model.
Use the .add() method on model to add a Dense layer.
Add 50 units, specify activation='relu', and the input_shape parameter to be the tuple (n_cols,) which means it has n_cols items in each row of data, and any number of rows of data are acceptable as inputs.
Add another Dense layer. This should have 32 units and a 'relu' activation.
Finally, add an output layer, which is a Dense layer with a single node. Don't use any activation function here.

In [None]:
# Import necessary modules
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]

# Set up the model: model
model = Sequential()

# Add the first layer
model.add(Dense(50, activation="relu", input_shape=(n_cols,)))

# Add the second layer
model.add(Dense(32, activation="relu"))

# Add the output layer
model.add(Dense(1))
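To make the shapes in this architecture concrete, here is a rough NumPy sketch of what a forward pass through the three Dense layers computes. The weights are random placeholders (not how Keras initializes or trains them), and the 5 rows by 9 columns of input are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up input: 5 rows of data with 9 predictor columns each.
n_rows, n_cols = 5, 9
X = rng.normal(size=(n_rows, n_cols))

def dense(x, n_units, activation=None):
    """One fully connected layer with random placeholder weights."""
    w = rng.normal(scale=0.1, size=(x.shape[1], n_units))
    b = np.zeros(n_units)
    z = x @ w + b
    # ReLU replaces negative values with 0; no activation returns z unchanged.
    return np.maximum(z, 0) if activation == "relu" else z

h1 = dense(X, 50, "relu")   # first hidden layer
h2 = dense(h1, 32, "relu")  # second hidden layer
out = dense(h2, 1)          # output layer, no activation

print(h1.shape, h2.shape, out.shape)  # (5, 50) (5, 32) (5, 1)
```

Only the first layer needs to know the number of input columns; each later layer reads its input size from the output of the layer before it.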

# Compiling and fitting a model
# Compiling the model
You're now going to compile the model you specified earlier. To compile the model, you need to specify the optimizer and loss function to use. In the video, Dan mentioned that the Adam optimizer is an excellent choice. You can read more about it, and about other Keras optimizers, in the Keras documentation, and if you are really curious, you can read the original paper that introduced the Adam optimizer.

In this exercise, you'll use the Adam optimizer and the mean squared error loss function. Go for it!

## Instructions

Compile the model using model.compile(). Your optimizer should be 'adam' and the loss should be 'mean_squared_error'.


In [None]:
# Import necessary modules
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Specify the model
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = (n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer = "adam", loss = "mean_squared_error")
print("Loss function: " + model.loss)

In [None]:
<script.py> output:
    Loss function: mean_squared_error
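As a reminder of what the mean squared error loss measures, the following sketch computes it by hand in NumPy. The wage values are invented for illustration, and the helper name is just a local function, not the Keras implementation.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

# Invented hourly wages and model predictions.
y_true = np.array([5.0, 10.0, 7.5])
y_pred = np.array([6.0, 8.0, 7.5])

# Squared errors are 1, 4 and 0, so the mean is 5 / 3.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```

Because the errors are squared, one prediction that is off by 2 contributes four times as much to the loss as one that is off by 1.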

# Fitting the model
You're at the most fun part. You'll now fit the model. Recall that the data to be used as predictive features is loaded in a NumPy array called predictors and the data to be predicted is stored in a NumPy array called target. Your model is pre-written and it has been compiled with the code from the previous exercise.

## Instructions

Fit the model. Remember that the first argument is the predictive features (predictors), and the data to be predicted (target) is the second argument.

In [None]:
# Import necessary modules
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Specify the model
n_cols = predictors.shape[1]
model = Sequential()
model.add(Dense(50, activation='relu', input_shape = (n_cols,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Fit the model
model.fit(predictors, target)

In [None]:
<script.py> output:

 1/17 [>.............................] - ETA: 5s - loss: 74.6956
17/17 [==============================] - 0s 1ms/step - loss: 30.7682
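The `17/17` in the log counts mini-batches, not rows. As a small sketch of where that number comes from, assuming the Keras default batch size of 32 and a hypothetical row count (the size of the wage data is not stated here):

```python
import math

def steps_per_epoch(n_rows, batch_size=32):
    """Number of mini-batches needed to pass over n_rows once."""
    return math.ceil(n_rows / batch_size)

# Hypothetical row count, chosen only because it reproduces the 17 steps above.
n_rows = 534
steps = steps_per_epoch(n_rows)
print(steps)  # 17

# Any row count from 513 to 544 would also give 17 steps at batch size 32.
```

The loss shown at the end of the progress bar is the running average over all batches in the epoch, which is why it differs from the loss printed after the first batch.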

# **Classification models**
# Understanding your classification data
Now you will start modeling with a new dataset for a classification problem. This data includes information about passengers on the Titanic. You will use predictors such as age, fare and where each passenger embarked from to predict who will survive. This data is from a tutorial on data science competitions. Look here for descriptions of the features.

The data is pre-loaded in a pandas DataFrame called df.

It's smart to review the maximum and minimum values of each variable to ensure the data isn't misformatted or corrupted. What was the maximum age of passengers on the Titanic? Use the .describe() method in the IPython Shell to answer this question.

## Instructions
Possible answers


29.699.

### **80.**

891.

It is not listed.


# Last steps in classification models
You'll now create a classification model using the titanic dataset, which has been pre-loaded into a DataFrame called df. You'll take information about the passengers and predict which ones survived.

The predictive variables are stored in a NumPy array predictors. The target to predict is in df.survived, though you'll have to manipulate it for Keras. The number of predictive features is stored in n_cols.

Here, you'll use the 'sgd' optimizer, which stands for Stochastic Gradient Descent. You'll learn more about this in the next chapter!

## Instructions

Convert df.survived to a categorical variable using the to_categorical() function.
Specify a Sequential model called model.
Add a Dense layer with 32 nodes. Use 'relu' as the activation and (n_cols,) as the input_shape.
Add the Dense output layer. Because there are two outcomes, it should have 2 units, and because it is a classification model, the activation should be 'softmax'.
Compile the model, using 'sgd' as the optimizer, 'categorical_crossentropy' as the loss function, and metrics=['accuracy'] to see the accuracy (what fraction of predictions were correct) at the end of each epoch.
Fit the model using the predictors and the target.


In [None]:
# Import necessary modules
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

# Convert the target to categorical: target
target = to_categorical(df.survived)

# Set up the model
model = Sequential()

# Add the first layer
model.add(Dense(32, activation="relu", input_shape = (n_cols,) ))

# Add the output layer
model.add(Dense(2, activation="softmax"))

# Compile the model
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model.fit(predictors, target)

In [None]:

<script.py> output:

 1/28 [>.............................] - ETA: 7s - loss: 1.1229 - accuracy: 0.6250
28/28 [==============================] - 0s 1ms/step - loss: 2.4257 - accuracy: 0.6173
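The two new ingredients in this exercise, the one-hot target and the categorical cross-entropy loss, can be sketched in plain NumPy. This is an illustrative approximation with invented labels and probabilities; `one_hot` and `categorical_crossentropy` here are local helper functions, not the Keras ones.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 per row."""
    return np.eye(num_classes)[np.asarray(labels)]

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Invented survival labels and predicted probabilities (columns: died, survived).
survived = [0, 1, 1, 0]
target = one_hot(survived, 2)
probs = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.6, 0.4],   # wrong: the true class only gets probability 0.4
    [0.7, 0.3],
])

loss = categorical_crossentropy(target, probs)
accuracy = float(np.mean(probs.argmax(axis=1) == target.argmax(axis=1)))

print(target)
print(loss)
print(accuracy)  # 0.75, since 3 of the 4 rows are classified correctly
```

Accuracy only checks which class gets the highest probability, while the loss also rewards being confident when correct, so the two can move in different directions during training.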

# **Using models**
# Making predictions
The trained network from your previous coding exercise is now stored as model. New data to make predictions on is stored in a NumPy array called pred_data. Use model to make predictions on your new data.

In this exercise, your predictions will be probabilities, which is the most common way for data scientists to communicate their predictions to colleagues.

## Instructions

Create your predictions using the model's .predict() method on pred_data.
Use NumPy indexing to find the column corresponding to predicted probabilities of survival being True. This is the second column (index 1) of predictions. Store the result in predicted_prob_true and print it.

In [None]:
# Specify, compile, and fit the model
model = Sequential()
model.add(Dense(32, activation='relu', input_shape = (n_cols,)))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(predictors, target)

# Calculate predictions: predictions
predictions = model.predict(pred_data)

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = predictions[:,1]
# Print predicted_prob_true
print(predicted_prob_true)


In [None]:
<script.py> output:

 1/25 [>.............................] - ETA: 9s - loss: 2.3068 - accuracy: 0.5312
25/25 [==============================] - 0s 886us/step - loss: 2.5773 - accuracy: 0.5962
    [1.9726235e-02 7.7701166e-02 9.9880278e-01 6.9661826e-01 2.3009233e-02
     1.4385791e-02 3.2878593e-03 8.6308494e-02 9.8984838e-03 7.1594425e-02
     2.8858611e-02 2.4497489e-02 1.3393278e-02 8.1848919e-01 1.5841408e-02
     1.0602114e-02 4.5482241e-02 8.8932343e-02 2.3749503e-03 5.6128806e-01
     1.8803345e-02 2.7190614e-02 3.7797294e-03 3.8329318e-02 9.3755096e-01
     1.5262333e-02 8.7188773e-02 9.4810057e-01 1.6531866e-02 6.0499045e-03
     1.8589018e-01 7.9656804e-01 1.2890435e-02 4.3118924e-02 8.6674735e-02
     1.2338005e-01 6.0775645e-02 1.5670201e-02 6.9912039e-02 8.0010302e-02
     6.4597212e-02 1.6229436e-01 9.3298577e-02 6.9142068e-03 1.0172246e-01
     3.0521471e-03 2.7156094e-01 7.4712797e-03 8.9343682e-02 3.8207942e-01
     8.6054403e-01 3.5601308e-05 3.4184235e-01 4.1311824e-01 2.1742735e-02
     9.2756011e-02 7.5068629e-01 9.0262489e-03 8.6532019e-02 1.2890435e-02
     8.8150445e-03 1.0823880e-01 1.3477237e-02 3.0641690e-01 7.5383067e-02
     6.8508890e-03 5.8684189e-02 6.7337751e-02 1.8109571e-02 5.1130611e-01
     2.8882246e-02 2.7172768e-02 1.2526292e-02 1.7590256e-03 9.7573645e-02
     1.0808522e-01 7.7424705e-02 7.3382087e-02 1.5430615e-02 1.3379750e-02
     1.2039662e-01 1.1045987e-02 6.6549458e-02 3.4886107e-02 3.0437995e-02
     3.0781535e-02 5.0857931e-02 1.6302976e-01 6.1975192e-02 9.1626920e-02
     1.3591628e-02]
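The probabilities above come from the softmax output layer, which turns two raw scores per passenger into two numbers that sum to 1. Here is a minimal NumPy sketch with made-up scores; the 0.5 cut-off at the end is one common convention, not something the exercise prescribes.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Made-up raw output scores for three passengers (columns: died, survived).
scores = np.array([
    [2.0, 0.0],
    [0.0, 0.0],
    [-1.0, 3.0],
])

predictions = softmax(scores)
predicted_prob_true = predictions[:, 1]        # second column: P(survived)
predicted_survived = predicted_prob_true > 0.5  # assumed decision threshold

print(predicted_prob_true)
print(predicted_survived)
```

Since each row sums to 1, the first column is always one minus the second, so for a two-class problem the second column alone carries all the information.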

# *Understanding model optimization*

# Diagnosing optimization problems
Which of the following could prevent a model from showing an improved loss in its first few epochs?


## Answer the question

Possible Answers
Select one answer

Learning rate too low.


Learning rate too high.


Poor choice of activation function.


### **All of the above.**


# Changing optimization parameters
It's time to get your hands dirty with optimization. You'll now try optimizing a model at a very low learning rate, a very high learning rate, and a "just right" learning rate. You'll want to look at the results after running this exercise, remembering that a low value for the loss function is good.

For these exercises, we've pre-loaded the predictors and target values from your previous classification models (predicting who would survive on the Titanic). You'll want the optimization to start from scratch every time you change the learning rate, to give a fair comparison of how each learning rate did in your results. So we have created a function get_new_model() that creates an unoptimized model to optimize.

## Instructions

Import SGD from tensorflow.keras.optimizers.
Create a list of learning rates to try optimizing with called lr_to_test. The learning rates in it should be .000001, 0.01, and 1.
Use a for loop to iterate over lr_to_test. Inside the loop:
Use the get_new_model() function to build a new, unoptimized model.
Create an optimizer called my_optimizer using the SGD() constructor with keyword argument learning_rate=lr (older Keras releases named this argument lr).
Compile your model. Set the optimizer parameter to be the SGD object you created above, and because this is a classification problem, use 'categorical_crossentropy' for the loss parameter.
Fit your model using the predictors and target.

In [None]:
# Import the SGD optimizer
from tensorflow.keras.optimizers import SGD

# Create list of learning rates: lr_to_test
lr_to_test = [.000001, 0.01, 1]

# Loop over learning rates
for lr in lr_to_test:
    print('\n\nTesting model with learning rate: %f\n'%lr )

    # Build new model to test, unaffected by previous models
    model = get_new_model()

    # Create SGD optimizer with specified learning rate: my_optimizer
    # (learning_rate is the current name of the argument formerly called lr)
    my_optimizer = SGD(learning_rate=lr)

    # Compile the model
    model.compile(optimizer=my_optimizer, loss='categorical_crossentropy')

    # Fit the model
    model.fit(predictors, target)

In [None]:
Testing model with learning rate: 0.000001


 1/28 [>.............................] - ETA: 7s - loss: 3.4494
28/28 [==============================] - 0s 1ms/step - loss: 2.6331


    Testing model with learning rate: 0.010000


 1/28 [>.............................] - ETA: 6s - loss: 2.8528
28/28 [==============================] - 0s 965us/step - loss: 1.8277


    Testing model with learning rate: 1.000000


 1/28 [>.............................] - ETA: 5s - loss: 4.4543
28/28 [==============================] - 0s 712us/step - loss: 5205835.0000
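The same pattern (almost no progress, steady progress, blow-up) shows up on a much simpler problem. The sketch below runs plain gradient descent, not stochastic gradient descent and not Keras, on an invented one-parameter loss L(w) = 2w², using the three learning rates from the exercise.

```python
def final_loss(lr, n_steps=20, w=1.0):
    """Run plain gradient descent on the toy loss L(w) = 2 * w**2."""
    for _ in range(n_steps):
        grad = 4 * w       # derivative of 2 * w**2
        w = w - lr * grad  # gradient descent update
    return 2 * w ** 2

lr_to_test = [0.000001, 0.01, 1]
results = {lr: final_loss(lr) for lr in lr_to_test}

# The loss starts at 2.0 (w = 1.0) for every learning rate.
for lr, loss in results.items():
    print(f"lr={lr:g}: final loss {loss:.4g}")
```

With the tiny rate the loss barely moves from 2.0, with 0.01 it drops well below 1, and with a rate of 1 every step overshoots the minimum by more than it corrects, so the loss grows without bound.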

# **Model validation**
# Evaluating model accuracy on validation dataset
Now it's your turn to monitor model accuracy with a validation data set. A model definition has been provided as model. Your job is to add the code to compile it and then fit it. You'll check the validation score in each epoch.

## Instructions

Compile your model using 'adam' as the optimizer and 'categorical_crossentropy' for the loss. To see what fraction of predictions are correct (the accuracy) in each epoch, specify the additional keyword argument metrics=['accuracy'] in model.compile().
Fit the model using the predictors and target. Create a validation split of 30% (or 0.3). This will be reported in each epoch.


In [None]:
# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer = 'adam', loss ='categorical_crossentropy', metrics=['accuracy'] )

# Fit the model
hist = model.fit(predictors, target, validation_split =0.3)

In [None]:
<script.py> output:

 1/20 [>.............................] - ETA: 6s - loss: 0.8835 - accuracy: 0.5000
20/20 [==============================] - 1s 12ms/step - loss: 0.7368 - accuracy: 0.6453 - val_loss: 0.5494 - val_accuracy: 0.7388
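The drop from 28 steps per epoch in the earlier logs to 20 here is a direct effect of the validation split. As a rough sketch of the arithmetic, assuming 891 rows (the usual size of the Titanic training set, which is consistent with the 28 steps logged earlier) and the Keras default batch size of 32:

```python
import math

def split_sizes(n_rows, validation_split):
    """Return (n_train, n_val) when the last fraction of rows is held out."""
    n_train = math.floor(n_rows * (1 - validation_split))
    return n_train, n_rows - n_train

n_rows = 891  # assumed row count, not stated in the exercise
n_train, n_val = split_sizes(n_rows, validation_split=0.3)
train_steps = math.ceil(n_train / 32)

print(n_train, n_val)  # 623 268
print(train_steps)     # 20
```

Keras takes the validation rows from the end of the arrays before any shuffling, so if the data is sorted in some meaningful way it may be worth shuffling it yourself first.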

# Early stopping: Optimizing the optimization
Now that you know how to monitor your model performance throughout optimization, you can use early stopping to stop optimization when it isn't helping any more. Since the optimization stops automatically when it isn't helping, you can also set a high value for epochs in your call to .fit(), as Dan showed in the video.

The model you'll optimize has been specified as model. As before, the data is pre-loaded as predictors and target.

## Instructions

Import EarlyStopping from tensorflow.keras.callbacks.
Compile the model, once again using 'adam' as the optimizer, 'categorical_crossentropy' as the loss function, and metrics=['accuracy'] to see the accuracy at each epoch.
Create an EarlyStopping object called early_stopping_monitor. Stop optimization when the validation loss hasn't improved for 2 epochs by specifying the patience parameter of EarlyStopping() to be 2.
Fit the model using the predictors and target. Specify the number of epochs to be 30 and use a validation split of 0.3. In addition, pass [early_stopping_monitor] to the callbacks parameter.


In [None]:
# Import EarlyStopping
from tensorflow.keras.callbacks import EarlyStopping

# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]
input_shape = (n_cols,)

# Specify the model
model = Sequential()
model.add(Dense(100, activation='relu', input_shape = input_shape))
model.add(Dense(100, activation='relu'))
model.add(Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer = 'adam', loss ='categorical_crossentropy', metrics=['accuracy'])
# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience = 2 )

# Fit the model
model.fit(predictors, target, validation_split =0.3, epochs = 30, callbacks = [early_stopping_monitor])

In [None]:
<script.py> output:
    Epoch 1/30

 1/20 [>.............................] - ETA: 7s - loss: 2.2697 - accuracy: 0.1875
20/20 [==============================] - 1s 24ms/step - loss: 0.8621 - accuracy: 0.5939 - val_loss: 0.8361 - val_accuracy: 0.6418
    Epoch 2/30

 1/20 [>.............................] - ETA: 0s - loss: 0.7725 - accuracy: 0.5625
 7/20 [=========>....................] - ETA: 0s - loss: 0.8425 - accuracy: 0.5536
20/20 [==============================] - 0s 5ms/step - loss: 0.7570 - accuracy: 0.6196 - val_loss: 0.6375 - val_accuracy: 0.6940
    Epoch 3/30

 1/20 [>.............................] - ETA: 0s - loss: 0.7543 - accuracy: 0.6250
20/20 [==============================] - 0s 4ms/step - loss: 0.7270 - accuracy: 0.6629 - val_loss: 0.5898 - val_accuracy: 0.6642
    Epoch 4/30

 1/20 [>.............................] - ETA: 0s - loss: 0.5897 - accuracy: 0.6875
20/20 [==============================] - 0s 4ms/step - loss: 0.6712 - accuracy: 0.6565 - val_loss: 0.6186 - val_accuracy: 0.7276
    Epoch 5/30

 1/20 [>.............................] - ETA: 0s - loss: 0.6258 - accuracy: 0.6875
20/20 [==============================] - 0s 2ms/step - loss: 0.6872 - accuracy: 0.6629 - val_loss: 0.4905 - val_accuracy: 0.7724
    Epoch 6/30

 1/20 [>.............................] - ETA: 0s - loss: 0.7425 - accuracy: 0.6250
20/20 [==============================] - 0s 3ms/step - loss: 0.7740 - accuracy: 0.6629 - val_loss: 0.4737 - val_accuracy: 0.7575
    Epoch 7/30

 1/20 [>.............................] - ETA: 0s - loss: 0.6253 - accuracy: 0.6875
20/20 [==============================] - 0s 2ms/step - loss: 0.5991 - accuracy: 0.6934 - val_loss: 0.5484 - val_accuracy: 0.7799
    Epoch 8/30

 1/20 [>.............................] - ETA: 0s - loss: 0.7149 - accuracy: 0.6250
20/20 [==============================] - 0s 3ms/step - loss: 0.7327 - accuracy: 0.6918 - val_loss: 0.6044 - val_accuracy: 0.6866
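To see why training stopped after epoch 8 of 30, the patience rule can be replayed by hand on the val_loss values from the log above. This is a simplified sketch of the idea, not the actual EarlyStopping implementation (which has further options such as a minimum improvement threshold).

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# val_loss per epoch, copied from the training log above.
val_losses = [0.8361, 0.6375, 0.5898, 0.6186, 0.4905, 0.4737, 0.5484, 0.6044]

stopped_at = early_stopping_epoch(val_losses, patience=2)
best_epoch = val_losses.index(min(val_losses)) + 1

print(stopped_at)  # 8
print(best_epoch)  # 6
```

Epoch 4 was also worse than epoch 3, but epoch 5 improved again and reset the counter, which is exactly the kind of temporary setback that patience is meant to tolerate.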

# Experimenting with wider networks
Now you know everything you need to begin experimenting with different models!

A model called model_1 has been pre-loaded. You can see a summary of this model printed in the IPython Shell. This is a relatively small network, with only 10 units in each hidden layer.

In this exercise you'll create a new model called model_2 which is similar to model_1, except it has 100 units in each hidden layer.

After you create model_2, both models will be fitted, and a graph showing both models' loss scores at each epoch will be shown. We added the argument verbose=False in the fitting commands to print out fewer updates, since you will look at these graphically instead of as text.

Because you are fitting two models, it will take a moment to see the outputs after you hit run, so be patient.

## Instructions

Create model_2 to replicate model_1, but use 100 nodes instead of 10 for the first two Dense layers you add with the 'relu' activation. Use 2 nodes for the Dense output layer with 'softmax' as the activation.
Compile model_2 as you have done with previous models: Using 'adam' as the optimizer, 'categorical_crossentropy' for the loss, and metrics=['accuracy'].
Hit 'Submit Answer' to fit both the models and visualize which one gives better results! Notice the keyword argument verbose=False in model.fit(): This prints out fewer updates, since you'll be evaluating the models graphically instead of through text.

In [None]:
# Define early_stopping_monitor
early_stopping_monitor = EarlyStopping(patience=2)

# Create the new model: model_2
model_2 = Sequential()

# Add the first and second layers
model_2.add(Dense(100, activation = 'relu' , input_shape=input_shape))
model_2.add(Dense(100, activation = 'relu'))

# Add the output layer
model_2.add(Dense(2, activation = "softmax"))

# Compile model_2
model_2.compile(optimizer = "adam", loss = 'categorical_crossentropy', metrics=['accuracy'] )

# Fit model_1
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Fit model_2
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.2, callbacks=[early_stopping_monitor], verbose=False)

# Create the plot
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()

In [None]:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 10)                110
_________________________________________________________________
dense_1 (Dense)              (None, 10)                110
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 22
=================================================================
Total params: 242
Trainable params: 242
Non-trainable params: 0
_________________________________________________________________
None

The blue model is the one you made; the red is the original model. Your model had a lower loss value, so it is the better model. Nice job!
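The Param # column in the summary above can be reproduced by hand, which also shows how much capacity the wider network adds. A Dense layer has one weight per input-unit pair plus one bias per unit. In this sketch the 10 input features are inferred from the summary (110 = (10 + 1) × 10), not stated in the exercise.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def count_params(n_features, layer_sizes):
    """Total parameters of a stack of Dense layers."""
    total, n_inputs = 0, n_features
    for n_units in layer_sizes:
        total += dense_params(n_inputs, n_units)
        n_inputs = n_units  # this layer's output feeds the next layer
    return total

n_features = 10  # inferred from the 110 parameters of the first layer

model_1_params = count_params(n_features, [10, 10, 2])
model_2_params = count_params(n_features, [100, 100, 2])

print(model_1_params)  # 242, matching the summary
print(model_2_params)  # 11402
```

Making each hidden layer ten times wider multiplies the parameter count by roughly 47, mostly because of the 100 × 100 connection between the two hidden layers.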

# Adding layers to a network
You've seen how to experiment with wider networks. In this exercise, you'll try a deeper network (more hidden layers).

Once again, you have a baseline model called model_1 as a starting point. It has 1 hidden layer, with 10 units. You can see a summary of that model's structure printed out. You will create a similar network with 3 hidden layers (still keeping 10 units in each layer).

This will again take a moment to fit both models, so you'll need to wait a few seconds to see the results after you run your code.

## Instructions

Specify a model called model_2 that is like model_1, but which has 3 hidden layers of 10 units instead of only 1 hidden layer.
Use input_shape to specify the input shape in the first hidden layer.
Use 'relu' activation for the 3 hidden layers and 'softmax' for the output layer, which should have 2 units.
Compile model_2 as you have done with previous models: Using 'adam' as the optimizer, 'categorical_crossentropy' for the loss, and metrics=['accuracy'].
Hit 'Submit Answer' to fit both the models and visualize which one gives better results!

In [None]:
# The input shape to use in the first hidden layer
input_shape = (n_cols,)

# Create the new model: model_2
model_2 = Sequential()

# Add the first, second, and third hidden layers
model_2.add(Dense(10, activation = "relu", input_shape = input_shape))
model_2.add(Dense(10, activation= "relu"))
model_2.add(Dense(10, activation= "relu"))

# Add the output layer
model_2.add(Dense(2, activation = "softmax"))

# Compile model_2
model_2.compile(optimizer = "adam", loss = 'categorical_crossentropy', metrics=['accuracy'])

# Fit model 1
model_1_training = model_1.fit(predictors, target, epochs=15, validation_split=0.4, verbose=False)

# Fit model 2
model_2_training = model_2.fit(predictors, target, epochs=15, validation_split=0.4, verbose=False)

# Create the plot
plt.plot(model_1_training.history['val_loss'], 'r', model_2_training.history['val_loss'], 'b')
plt.xlabel('Epochs')
plt.ylabel('Validation loss')
plt.show()



In [None]:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 10)                110
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 22
=================================================================
Total params: 132
Trainable params: 132
Non-trainable params: 0
_________________________________________________________________
None

Great work! The blue model is the one you made and the red is the original model. The model with the lower loss value is the better model.

# Thinking about model capacity
# Experimenting with model structures
You've just run an experiment where you compared two networks that were identical except that the 2nd network had an extra hidden layer. You see that this 2nd network (the deeper network) had better performance. Given that, which of the following would be a good experiment to run next for even better performance?


## Answer the question

Possible Answers
Select one answer

Try a new network with fewer layers than anything you have tried yet.


### **Use more units in each hidden layer.**


Use fewer units in each hidden layer.

# Stepping up to images
# Building your own digit recognition model
You've reached the final exercise of the course - you now know everything you need to build an accurate model to recognize handwritten digits!

We've already done the basic manipulation of the MNIST dataset shown in the video, so you have X and y loaded and ready to model with. Sequential and Dense from tensorflow.keras are also pre-imported.

To add an extra challenge, we've loaded only 2500 images, rather than the 60000 you will see in some published results. Deep learning models perform better with more data; however, they also take longer to train, especially when they start becoming more complex.

If you have a computer with a CUDA compatible GPU, you can take advantage of it to improve computation time. If you don't have a GPU, no problem! You can set up a deep learning environment in the cloud that can run your models on a GPU. Here is a blog post by Dan that explains how to do this - check it out after completing this exercise! It is a great next step as you continue your deep learning journey.

Ready to take your deep learning to the next level? Check out Advanced Deep Learning with Keras to see how the Keras functional API lets you build domain knowledge to solve new types of problems. Once you know how to use the functional API, take a look at Image Processing with Keras in Python to learn image-specific applications of Keras.

## Instructions

Create a Sequential object to start your model. Call this model.
Add the first Dense hidden layer of 50 units to your model with 'relu' activation. For this data, the input_shape is (784,).
Add a second Dense hidden layer with 50 units and a 'relu' activation function.
Add the output layer. Your activation function should be 'softmax', and the number of nodes in this layer should be the same as the number of possible outputs, which in this case is 10.
Compile model as you have done with previous models: Using 'adam' as the optimizer, 'categorical_crossentropy' for the loss, and metrics=['accuracy'].
Fit the model on X and y with a validation_split of 0.3 and 10 epochs.


In [None]:
# Create the model: model
model = Sequential()

# Add the first hidden layer
model.add(Dense(50, activation = "relu", input_shape = (784,)))

# Add the second hidden layer
model.add(Dense(50, activation = "relu"))

# Add the output layer
model.add(Dense(10, activation = "softmax"))

# Compile the model
model.compile(optimizer = "adam", loss = 'categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model.fit(X, y, validation_split = 0.3, epochs = 10)

In [None]:
<script.py> output:
    Epoch 1/10

 1/55 [..............................] - ETA: 27s - loss: 2.3654 - accuracy: 0.0938
37/55 [===================>..........] - ETA: 0s - loss: 1.7771 - accuracy: 0.4907 
55/55 [==============================] - 1s 5ms/step - loss: 1.5330 - accuracy: 0.5714 - val_loss: 0.8765 - val_accuracy: 0.7747
    Epoch 2/10

 1/55 [..............................] - ETA: 0s - loss: 0.9166 - accuracy: 0.6875
44/55 [=======================>......] - ETA: 0s - loss: 0.6376 - accuracy: 0.8359
55/55 [==============================] - 0s 2ms/step - loss: 0.6204 - accuracy: 0.8377 - val_loss: 0.5585 - val_accuracy: 0.8467
    Epoch 3/10

 1/55 [..............................] - ETA: 0s - loss: 0.4024 - accuracy: 0.9062
46/55 [========================>.....] - ETA: 0s - loss: 0.4214 - accuracy: 0.8770
55/55 [==============================] - 0s 2ms/step - loss: 0.4217 - accuracy: 0.8777 - val_loss: 0.4555 - val_accuracy: 0.8547
    Epoch 4/10

 1/55 [..............................] - ETA: 0s - loss: 0.2083 - accuracy: 0.9688
43/55 [======================>.......] - ETA: 0s - loss: 0.3429 - accuracy: 0.8997
55/55 [==============================] - 0s 2ms/step - loss: 0.3297 - accuracy: 0.9051 - val_loss: 0.3998 - val_accuracy: 0.8880
    Epoch 5/10

 1/55 [..............................] - ETA: 0s - loss: 0.2498 - accuracy: 0.9375
27/55 [=============>................] - ETA: 0s - loss: 0.2662 - accuracy: 0.9352
51/55 [==========================>...] - ETA: 0s - loss: 0.2633 - accuracy: 0.9314
55/55 [==============================] - 0s 3ms/step - loss: 0.2595 - accuracy: 0.9320 - val_loss: 0.3652 - val_accuracy: 0.8867
    Epoch 6/10

 1/55 [..............................] - ETA: 0s - loss: 0.1135 - accuracy: 1.0000
29/55 [==============>...............] - ETA: 0s - loss: 0.1810 - accuracy: 0.9591
55/55 [==============================] - 0s 3ms/step - loss: 0.2098 - accuracy: 0.9486 - val_loss: 0.3634 - val_accuracy: 0.8907
    Epoch 7/10

 1/55 [..............................] - ETA: 0s - loss: 0.1679 - accuracy: 1.0000
41/55 [=====================>........] - ETA: 0s - loss: 0.1556 - accuracy: 0.9634
55/55 [==============================] - 0s 2ms/step - loss: 0.1728 - accuracy: 0.9571 - val_loss: 0.3725 - val_accuracy: 0.8853
    Epoch 8/10

 1/55 [..............................] - ETA: 0s - loss: 0.3328 - accuracy: 0.9688
39/55 [====================>.........] - ETA: 0s - loss: 0.1414 - accuracy: 0.9704
55/55 [==============================] - 0s 2ms/step - loss: 0.1428 - accuracy: 0.9674 - val_loss: 0.3712 - val_accuracy: 0.8840
    Epoch 9/10

 1/55 [..............................] - ETA: 0s - loss: 0.3741 - accuracy: 0.9688
42/55 [=====================>........] - ETA: 0s - loss: 0.1197 - accuracy: 0.9762
55/55 [==============================] - 0s 2ms/step - loss: 0.1231 - accuracy: 0.9754 - val_loss: 0.3752 - val_accuracy: 0.8840
    Epoch 10/10

 1/55 [..............................] - ETA: 0s - loss: 0.1955 - accuracy: 0.9688
37/55 [===================>..........] - ETA: 0s - loss: 0.1054 - accuracy: 0.9755
55/55 [==============================] - 0s 2ms/step - loss: 0.1106 - accuracy: 0.9754 - val_loss: 0.3518 - val_accuracy: 0.8893
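The input_shape of (784,) and the 10 output units both follow from how the image data is laid out. The course's actual preprocessing is not shown here, so the following is only one plausible version, run on random stand-in images rather than real MNIST digits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for 4 grayscale digit images of 28 x 28 pixels.
images = rng.integers(0, 256, size=(4, 28, 28))
labels = np.array([3, 0, 9, 3])

# Flatten each image into one row of 28 * 28 = 784 pixel values.
X = images.reshape(len(images), -1).astype("float32")

# One-hot encode the digit labels into 10 columns, one per digit 0-9.
y = np.eye(10)[labels]

print(X.shape, y.shape)  # (4, 784) (4, 10)
```

Flattening discards the 2D arrangement of the pixels, which is fine for a stack of Dense layers but is also the reason image-specific architectures exist.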

Congrats! You've done something pretty amazing. The code you wrote leads to a model that recognizes handwritten digits with about 89% validation accuracy, even while using a small training set of only 1750 images!