# Building deep learning models with Keras
  
In this chapter, you'll use the Keras library to build deep learning models for both regression and classification. You'll learn about the Specify-Compile-Fit workflow that you can use to make predictions, and by the end of the chapter, you'll have all the tools necessary to build deep neural networks.

## Resources
  
**Notebook Syntax**
  
<span style='color:#7393B3'>NOTE:</span>  
- Denotes additional information deemed to be *contextually* important
- Colored in blue, HEX #7393B3
  
<span style='color:#E74C3C'>WARNING:</span>  
- Significant information that is *functionally* critical  
- Colored in red, HEX #E74C3C
  
---
  
**Links**
  
[NumPy Documentation](https://numpy.org/doc/stable/user/index.html#user)  
[Pandas Documentation](https://pandas.pydata.org/docs/user_guide/index.html#user-guide)  
[Matplotlib Documentation](https://matplotlib.org/stable/index.html)  
[Seaborn Documentation](https://seaborn.pydata.org)  
[TensorFlow Documentation](https://www.tensorflow.org)  
  
---
  
**Notable Functions**
  
<table>
  <tr>
    <th>Index</th>
    <th>Function</th>
    <th>Use</th>
  </tr>
  <tr>
    <td>1</td>
    <td>numpy.array()</td>
    <td>Creates an array: a grid of values, together with information about the raw data, how to locate an element, and how to interpret an element. The elements can be indexed in various ways.</td>
  </tr>
  <tr>
    <td>2</td>
    <td>tensorflow.keras.utils.to_categorical()</td>
    <td>Converts a class vector (integer labels) to a binary class matrix. It is commonly used for one-hot encoding of categorical variables in deep learning tasks.</td>
  </tr>
  <tr>
    <td>3</td>
    <td>tensorflow.keras.Sequential</td>
    <td>Creates a sequential model in Keras, which is a linear stack of layers. This is the most common type of model in deep learning, where each layer is connected to the next in a sequential manner.</td>
  </tr>
  <tr>
    <td>4</td>
    <td>tensorflow.keras.layers.Dense()</td>
    <td>A fully connected layer in a neural network. Dense layers are the most common type of layer used in deep learning models. They have a set of learnable weights and biases and each neuron is connected to every neuron in the previous layer.</td>
  </tr>
  <tr>
    <td>5</td>
    <td>model.compile()</td>
    <td>Compiles a Keras model. It configures the model for training by specifying the optimizer, loss function, and evaluation metrics. This step is required before training a model.</td>
  </tr>
</table>
  

---
  
**Language and Library Information**  
  
Python 3.11.0  
  
Name: numpy  
Version: 1.24.3  
Summary: Fundamental package for array computing in Python  
  
Name: pandas  
Version: 2.0.3  
Summary: Powerful data structures for data analysis, time series, and statistics  
  
Name: matplotlib  
Version: 3.7.2  
Summary: Python plotting package  
  
Name: seaborn  
Version: 0.12.2  
Summary: Statistical data visualization  
  
Name: tensorflow  
Version: 2.13.0  
Summary: TensorFlow is an open source machine learning framework for everyone.  
  
---
  
**Miscellaneous Notes**
  
<span style='color:#7393B3'>NOTE:</span>  
  
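`python3.11 -m IPython` : Starts an interactive IPython shell under Python 3.11 in the terminal.
  
`nohup ./relo_csv_D2S.sh > ./output/relo_csv_D2S.log &` : Runs the CSV data pipeline in the background and writes its output to a log file.  
  
`print(inspect.getsourcelines(test))` : Prints the source lines of a self-defined function (here `test`) and its starting line number; requires `import inspect`.  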
  
<span style='color:#7393B3'>NOTE:</span>  
  
Schema:  
- input -> **array**: feature values
- weights -> **dictionary**: keys = node_name, values = weight of node/input
- node_in -> **neural network operation**: (prior_input * weight[]).sum()
- node_out -> **neural network operation**: activation function, typically a ReLU is used
- node_hidden_concat -> **array**: concat node_out nodes into an array
- output_in -> **neural network operation**: (prior_input * weight[]).sum()
- output_out -> **neural network operation**: output activation function, typically a softmax is used
- Display the result, or wrap the steps above in a function and call it in a loop over the input rows
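
The schema above can be sketched in NumPy as a single forward pass. This is a minimal illustration rather than a reference implementation: the input values, the weight values, and the node names (`node_0`, `output_0`, and so on) are all made up for the example.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(0, x)

def softmax(x):
    """Turn raw scores into probabilities that sum to 1."""
    exp = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return exp / exp.sum()

# Hypothetical inputs and weights, chosen only for illustration
input_data = np.array([3.0, 5.0])
weights = {
    'node_0': np.array([2.0, 4.0]),
    'node_1': np.array([4.0, -5.0]),
    'output_0': np.array([0.1, 0.2]),
    'output_1': np.array([-0.05, 0.3]),
}

# Hidden layer: weighted sum (node_in), then ReLU (node_out)
hidden_names = ['node_0', 'node_1']
node_hidden_concat = np.array(
    [relu((input_data * weights[name]).sum()) for name in hidden_names]
)

# Output layer: weighted sum (output_in), then softmax (output_out)
output_names = ['output_0', 'output_1']
output_in = np.array(
    [(node_hidden_concat * weights[name]).sum() for name in output_names]
)
output_out = softmax(output_in)

print(node_hidden_concat)  # [26.  0.]
print(output_out)
```

Wrapping the hidden-layer and output-layer steps in a function would let you call it once per input row.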

In [10]:
import numpy as np                  # Numerical Python:         Arrays and linear algebra
import pandas as pd                 # Panel Datasets:           Dataset manipulation
import matplotlib.pyplot as plt     # MATLAB Plotting Library:  Visualizations
import seaborn as sns               # Seaborn:                  Visualizations
import tensorflow as tf             # TensorFlow:               Deep-Learning Neural Networks


## Creating a Keras model
  
Congrats, you've learned the theory of back-propagation, which is core to understanding deep learning. Now you'll learn how to create and optimize these networks using the Keras interface to the TensorFlow deep learning library.
  
**Model building steps**
  
The Keras workflow has 4 steps: 
  
1. First, you specify the architecture, which covers questions like: 
- How many layers do you want? 
- How many nodes in each layer? 
- What activation function do you want to use in each layer?  
  
2. Next, you compile the model. 
- This specifies the loss function, and some details about how optimization works.  
  
3. Then you fit the model. 
- This is the cycle of back-propagation and optimization of model weights with your data.  
  
4. Finally, you use your model to make predictions. 
  
We'll go through these steps sequentially. The first step is creating or specifying your model.
  
Workflow simply put:  
  
1. Architecture
2. Compile
3. Fit
4. Predict
  
**Model specification**
  
Here is the code to do that. This code has three blocks. First we import what we will need. NumPy is here only for reading some data. The other two imports are used for building our model. 
  
```python
# Importation of libraries
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
```
  
The second block of two lines reads the data. We read the data here so we can find the number of nodes in the input layer. That is stored as the variable `n_cols`. We always need to specify how many columns are in the input when building a Keras model, because that is the number of nodes in the input layer. We then start building the model. 
  
```python
# Load dataset
predictors = np.loadtxt('<dataset-file-here.csv>', delimiter=',')
n_cols = predictors.shape[1]
```
  
The first line of the model specification is `model = Sequential()`. There are two ways to build up a model, and we will focus on sequential, which is the easier way to build a model. Sequential models require that each layer has weights or connections only to the one layer coming directly after it in the network diagram. There are more exotic models out there with complex patterns of connections, but `Sequential()` will do the trick for everything we need here. We start adding layers using the `.add()` method of the model. 
- The type of layer you have seen, that standard layer type, is called a Dense layer. 
- It is called Dense because all of the nodes in the previous layer connect to all of the nodes in the current layer. 
- As you advance in deep learning, you may start using layers that aren't Dense. 
  
In each layer, we specify the number of nodes as the first positional argument, and the activation function we want to use in that layer using the keyword argument `activation=`. Keras supports every activation function you will want in practice. In the first layer, we need to specify the input shape with `input_shape=(n_cols,)`. That says the input will have `n_cols` columns, and there is nothing after the comma, meaning it can have any number of rows, that is, any number of data points. You'll notice the last layer has 1 node. That is the output layer, and it matches those diagrams where we ended with only a single node as the output or prediction of the model. 
  
```python
# Model specifications
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(1))
```
  
This model has 2 hidden layers and an output layer. You may be struck that each hidden layer has 100 nodes. Keras and TensorFlow do the math for us, so don't feel afraid to use much bigger networks than we've seen before. It's quite common to use hundreds or thousands of nodes in a layer. You'll learn more about choosing an appropriate number of nodes later.
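
To get a feel for how much arithmetic Keras takes off your hands, you can count the learnable parameters of the network above by hand. The sketch below assumes `n_cols = 9` purely for illustration, and `dense_param_count` is a helper written for this example, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (one per input-unit pair) plus one bias per unit."""
    return n_inputs * n_units + n_units

# n_cols = 9 is an assumed input width, used only for illustration
n_cols = 9
layer_sizes = [100, 100, 1]  # two hidden layers and the output layer

total = 0
n_inputs = n_cols
for n_units in layer_sizes:
    count = dense_param_count(n_inputs, n_units)
    print(f"Dense({n_units}): {count} parameters")
    total += count
    n_inputs = n_units  # this layer's outputs feed the next layer

print(f"Total: {total} parameters")  # Total: 11201 parameters
```

Even this small network has over eleven thousand weights and biases to optimize, which is why the bookkeeping is best left to the library.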

### Understanding your data
  
You will soon start building models in Keras to predict wages based on various professional and demographic factors. Before you start building a model, it's good to understand your data by performing some exploratory analysis.
  
The data is pre-loaded into a `pandas` DataFrame called `df`. Use the `.head()` and `.describe()` methods in the IPython Shell for a quick overview of the DataFrame.
  
The target variable you'll be predicting is `wage_per_hour`. Some of the predictor variables are binary indicators, where a value of 1 represents `True`, and 0 represents `False`.
  
Of the 9 predictor variables in the DataFrame, how many are binary indicators? The min and max values as shown by `.describe()` will be informative here.
  
Possible answers
  
- [ ] 0
- [ ] 5
- [x] 6
  
Solution
  
```python
In [1]:
df.describe()
Out[1]:
       wage_per_hour    union  education_yrs  experience_yrs      age   female     marr    south  manufacturing  construction
count        534.000  534.000        534.000         534.000  534.000  534.000  534.000  534.000        534.000       534.000
mean           9.024    0.180         13.019          17.822   36.833    0.459    0.655    0.292          0.185         0.045
std            5.139    0.384          2.615          12.380   11.727    0.499    0.476    0.455          0.389         0.207
min            1.000    0.000          2.000           0.000   18.000    0.000    0.000    0.000          0.000         0.000
25%            5.250    0.000         12.000           8.000   28.000    0.000    0.000    0.000          0.000         0.000
50%            7.780    0.000         12.000          15.000   35.000    0.000    1.000    0.000          0.000         0.000
75%           11.250    0.000         15.000          26.000   44.000    1.000    1.000    1.000          0.000         0.000
max           44.500    1.000         18.000          55.000   64.000    1.000    1.000    1.000          1.000         1.000
```
  
Exactly! There are 6 binary indicators.
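
The same check can be done programmatically instead of by eye. The sketch below uses a tiny made-up DataFrame (`toy`), not the course dataset, and treats a column as a binary indicator when its minimum is 0, its maximum is 1, and it contains no other values.

```python
import pandas as pd

# Toy rows made up for illustration; this is not the course dataset
toy = pd.DataFrame({
    'wage_per_hour': [5.10, 4.95, 6.67, 4.00],
    'education_yrs': [8, 9, 12, 12],
    'union': [0, 0, 1, 0],
    'female': [1, 1, 0, 0],
})

summary = toy.describe()

# A column whose min is 0 and max is 1 is a candidate binary indicator;
# checking the set of unique values confirms nothing lies in between
binary_cols = [
    col for col in toy.columns
    if summary.loc['min', col] == 0
    and summary.loc['max', col] == 1
    and set(toy[col].unique()) <= {0, 1}
]
print(binary_cols)  # ['union', 'female']
```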

### Specifying a model
  
Now you'll get to work with your first model in Keras, and will immediately be able to run more complex neural network models on larger datasets compared to the first two chapters.
  
To start, you'll take the skeleton of a neural network and add a hidden layer and an output layer. You'll then fit that model and see Keras do the optimization so your model continually gets better.
  
As a start, you'll predict workers' wages based on characteristics like their industry, education and level of experience. You can find the dataset in a `pandas` DataFrame called `df`. For convenience, everything in `df` except for the target has been converted to a NumPy array called `predictors`. The target, `wage_per_hour`, is available as a NumPy array called `target`.
  
For all exercises in this chapter, we've imported the Sequential model constructor, the Dense layer constructor, and `pandas`.
  
1. Store the number of columns in the `predictors` data to `n_cols`. This has been done for you.
2. Start by creating a `Sequential()` model called `model`.
3. Use the `.add()` method on model to add a `Dense()` layer.
- Add `50` units, specify `activation='relu'`, and the `input_shape=` parameter to be the tuple `(n_cols,)` which means it has `n_cols` items in each row of data, and any number of rows of data are acceptable as inputs.
4. Add another `Dense()` layer. This should have `32` units and a `'relu'` activation.
5. Finally, add an output layer, which is a `Dense()` layer with a single node. Don't use any activation function here.

In [11]:
df = pd.read_csv('../_datasets/hourly_wages.csv')
print(df.shape)
df.head()

(534, 10)


Unnamed: 0,wage_per_hour,union,education_yrs,experience_yrs,age,female,marr,south,manufacturing,construction
0,5.1,0,8,21,35,1,1,0,1,0
1,4.95,0,9,42,57,1,1,0,1,0
2,6.67,0,12,1,19,0,0,0,1,0
3,4.0,0,12,4,22,0,0,0,0,0
4,7.5,0,12,17,35,0,1,0,0,0


In [12]:
# Select all observations, split X/y, change to array
predictors = df.iloc[:, 1:].to_numpy()
target = df.iloc[:, 0].to_numpy()

In [13]:
# Save the number of columns in predictors: n_cols
n_cols = predictors.shape[1]

# Set up the model: model
model = tf.keras.Sequential()

# Add the first layer
model.add(tf.keras.layers.Dense(50, activation='relu', input_shape=(n_cols, )))

# Add the second layer
model.add(tf.keras.layers.Dense(32, activation='relu'))

# Add the output layer
model.add(tf.keras.layers.Dense(1))

Now that you've specified the model, the next step is to compile it.

## Compiling and fitting a model
  
After you've specified a model, the next task is to compile it, which sets up the network for optimization, for instance creating an internal function to do back-propagation efficiently. 
  
**Why you need to compile your model**
  
The compile method has two important arguments for you to choose. The first is which optimizer to use, which controls the learning rate. In practice, the right choice of learning rate can make a big difference to how quickly our model finds good weights, and even to how good a set of weights it can find. There are a few algorithms that automatically tune the learning rate. Even many experts in the field don't know all the details of all the optimization algorithms. So the pragmatic approach is to choose a versatile algorithm and use that for most problems. Adam is an excellent choice as your go-to optimizer. Adam adjusts the learning rate as it does gradient descent, to ensure reasonable values throughout the weight optimization process. The second thing you specify is the loss function. Mean squared error is the most common choice for regression problems. When we use Keras for classification, you will learn a new default loss function.
  
1. Specify the optimizer
- Many options that are mathematically complex
- `"adam"` is usually a good "go-to" choice
  
2. Specify the loss function
- `"mean_squared_error"` is the most common choice for regression problems
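
As a reminder of what the `'mean_squared_error'` loss measures, here is a minimal NumPy sketch. The target and prediction values are made up for illustration.

```python
import numpy as np

# Hypothetical targets and predictions, chosen only for illustration
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])

# Mean squared error: average of the squared differences
squared_errors = (y_true - y_pred) ** 2  # 0.25, 0.0, 4.0
mse = np.mean(squared_errors)
print(mse)
```

Squaring makes the single large miss (2.0 predicted as 4.0) dominate the loss, so the optimizer is pushed hardest to fix the worst predictions.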
  
**Compiling a model**
  
Here is an example of the code to compile a model. It builds a model, as you've already seen, and then we add a compile command after building the model. After compiling the model, you can fit it. 
  
```python
# Importation of libraries
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Load dataset
predictors = np.loadtxt('<dataset-file-here.csv>', delimiter=',')
n_cols = predictors.shape[1]

# Model specifications
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
```
  
**What is fitting a model**
  
Fitting a model means applying back-propagation and gradient descent with your data to update the weights. The `.fit()` step looks similar to what you've seen in scikit-learn, though it has more options which we will explore soon. Even with the Adam optimizer, which is pretty smart, it can improve your optimization process if you scale all the data so each feature has, on average, similarly sized values. One common approach, *standardization*, is to subtract each feature's mean from that feature and divide by its standard deviation.
  
- Applying backpropagation and gradient descent with your data to update the weights
- Scaling data before fitting can ease optimization
- `.fit()` looks similar to prior exposure in `sklearn`, however it has more options
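
Standardization as described above can be sketched in a few lines of NumPy. The feature matrix `X` below is made up for illustration.

```python
import numpy as np

# Hypothetical feature matrix: rows are observations, columns are features
X = np.array([
    [12.0, 35.0],
    [16.0, 57.0],
    [8.0, 19.0],
    [12.0, 41.0],
])

# Standardize each column: subtract its mean, divide by its standard deviation
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # each value is close to 0
print(X_scaled.std(axis=0))   # each value is close to 1
```

When you later predict on new data, reuse the `mean` and `std` computed from the training data so both are scaled the same way.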
  
**Fitting a model**
  
You can see what the code looks like here. After the compile step, we run fit, with the predictors as the first argument and the target as the second. 
  
```python
# Importation of libraries
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Load dataset
predictors = np.loadtxt('<dataset-file-here.csv>', delimiter=',')
n_cols = predictors.shape[1]

# Model specifications
model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(predictors, target)  # <-- fitting/training model (target: array of values to predict, assumed already loaded)
```
  
When you run this, you will see some output showing the optimization's progress as it fits the data. We'll go into more detail about this output soon, but for now, just think of it as a log showing model performance on the training data as we update model weights.


### Compiling the model
  
You're now going to compile the model you specified earlier. To compile the model, you need to specify the optimizer and loss function to use. The Adam optimizer is an excellent choice. You can read more about it, as well as other Keras optimizers, [here](https://keras.io/api/optimizers/), and if you are really curious to learn more, you can read the [original paper](https://arxiv.org/abs/1412.6980v8) that introduced the Adam optimizer.
  
In this exercise, you'll use the Adam optimizer and the mean squared error loss function. Go for it!
  
1. Compile the model using `model.compile()`. Your optimizer should be `'adam'` and the loss should be `'mean_squared_error'`.

In [14]:
# Select all observations, split X/y, change to array
predictors = df.iloc[:, 1:].to_numpy()
target = df.iloc[:, 0].to_numpy()

# Specify the model
n_cols = predictors.shape[1]
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(50, activation='relu', input_shape = (n_cols,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Verify that model contains information from compiling
print("Loss function: " + model.loss)

Loss function: mean_squared_error


Fantastic work - all that's left now is to fit the model!

### Fitting the model
  
You're at the most fun part. You'll now fit the model. Recall that the data to be used as predictive features is loaded in a NumPy array called `predictors` and the data to be predicted is stored in a NumPy array called `target`. Your model is pre-written and it has been compiled with the code from the previous exercise.
  
1. Fit the `model`. Remember that the first argument is the predictive features (`predictors`), and the data to be predicted (`target`) is the second argument.

In [15]:
# Fit the model
model.fit(predictors, target, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x12cf454d0>

You now know how to specify, compile, and fit a deep learning model using Keras!

## Classification models
  
So far we have focused on regression models. But deep learning works similarly for classification, that is for predicting outcomes from a set of discrete options.
  
**Classification**
  
For classification, you do a couple of things differently. First, you set the loss function to `'categorical_crossentropy'` instead of `'mean_squared_error'`. This isn't the only possible loss function for classification problems, but it is by far the most common. You may have heard of it before under the name *log loss*. 
  
---
  
<span style='color:#7393B3'>NOTE:</span> What is Log Loss?
  
Log Loss is the most important classification metric based on probabilities. It's hard to interpret raw log-loss values, but log-loss is still a good metric for comparing models. For any given problem, a lower log-loss value means better predictions. Log Loss is a slight twist on something called the *Likelihood Function*. In fact, Log Loss is $-1 \times \log(\text{likelihood function})$. So, we will start by understanding the likelihood function.

The likelihood function answers the question "How likely did the model think the actually observed set of outcomes was?" If that sounds confusing, an example should help. 
  
**Example**
  
A model predicts probabilities of `[0.8, 0.4, 0.1]` for three houses. The first two houses were sold, and the last one was not sold. So the actual outcomes could be represented numerically as `[1, 1, 0]`.
  
Let's step through these predictions one at a time to iteratively calculate the likelihood function.
  
The first house sold, and the model said that was 80% likely. So, the likelihood function after looking at one prediction is 0.8.
  
The second house sold, and the model said that was 40% likely. There is a rule of probability that the probability of multiple independent events all occurring is the product of their individual probabilities. So, we get the combined likelihood from the first two predictions by multiplying their associated probabilities. That is `0.8 * 0.4`, which is 0.32.
  
Now we get to our third prediction. That home did not sell. The model said it was 10% likely to sell. That means it was 90% *likely to not sell*. So, the observed outcome of not selling was 90% likely according to the model. So, we multiply the previous result of 0.32 by 0.9, which gives 0.288.
  
We could step through all of our predictions. Each time we'd find the probability associated with the outcome that actually occurred, and we'd multiply that by the previous result. That's the likelihood.
  
**From Likelihood to Log Loss**
  
Each prediction is between 0 and 1. If you multiply enough numbers in this range, the result gets so small that computers can't keep track of it. So, as a clever computational trick, we instead keep track of the log of the Likelihood. This is in a range that's easy to keep track of. We multiply this by negative 1 to maintain a common convention that lower loss scores are better.
  
Credit: dansbecker - DanB - Kaggle Grandmaster - [Link to page](https://www.kaggle.com/code/dansbecker/what-is-log-loss)  
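
The house example above can be reproduced in NumPy. This is a small sketch added for illustration (it is not taken from the credited page); the variable names are chosen for this example.

```python
import numpy as np

predicted = np.array([0.8, 0.4, 0.1])  # predicted probability of a sale
actual = np.array([1, 1, 0])           # 1 = sold, 0 = not sold

# Probability the model assigned to the outcome that actually occurred
prob_of_outcome = np.where(actual == 1, predicted, 1 - predicted)

likelihood = np.prod(prob_of_outcome)             # 0.8 * 0.4 * 0.9
log_loss_total = -np.log(likelihood)              # -1 * log(likelihood)
log_loss_sum = -np.sum(np.log(prob_of_outcome))   # same value, via a sum of logs

print(round(likelihood, 3))  # 0.288
print(log_loss_total, log_loss_sum)
```

Summing the logs gives the same number as taking the log of the product, but it avoids the tiny intermediate product that causes trouble with many predictions. Many libraries also divide by the number of observations, so reported log loss is often a per-observation average.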
  
---
  
We won't go into the mathematical details of categorical crossentropy here. For the categorical crossentropy loss function, a lower score is better. But it's still hard to interpret. So I've added the argument `metrics=['accuracy']`. This means I want to print out the accuracy score at the end of each epoch, which makes it easier to see and understand the model's progress. The second change is that you need to modify the last layer, so it has a separate node for each potential outcome. You will also change the activation function to softmax. The softmax activation function ensures the predictions sum to 1, so they can be interpreted as probabilities.
  
- `'categorical_crossentropy'` is a loss function common for classification
- Similar to log loss: Lower is better
- Add `metrics=['accuracy']` to `model.compile()` step for easy-to-understand diagnostics
- The output layer has a separate node for each possible outcome, and uses `softmax` activation
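
To make the accuracy metric concrete, the sketch below computes it by hand from softmax-style outputs. The probabilities and one-hot targets are made up for illustration.

```python
import numpy as np

# Hypothetical softmax outputs for 4 observations and 2 classes
probs = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
    [0.6, 0.4],
    [0.2, 0.8],
])
# One-hot targets: column 0 = class 0, column 1 = class 1
target = np.array([
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
])

# Softmax outputs sum to 1 across the classes of each row
assert np.allclose(probs.sum(axis=1), 1.0)

# Accuracy: fraction of rows where the most probable class is the true class
predicted_class = probs.argmax(axis=1)
true_class = target.argmax(axis=1)
accuracy = np.mean(predicted_class == true_class)
print(accuracy)  # 0.75
```

Three of the four rows are classified correctly, so the accuracy is 0.75, which is far easier to read than a raw crossentropy value.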
  
**Quick look at the data**
  
Here is some data for a binary classification problem. We have data from the NBA basketball league. It includes facts about each shot, and the shot result is either 0 or 1, indicating whether the shot went in or not. The outcome here is in a single column, which is not uncommon. But in general, we'll want to convert categoricals in Keras to a format with a separate column for each output. Keras includes a function to do that, which you will see in the code soon. This setup is consistent with the fact that you will have a separate node in the output for each possible class.
  
<img src='../_images/quick-look-at-nba-dataset-neural-net-class.png' alt='img' width='550'>
  
**Transforming to categorical**
  
We have a new column for each value of `shot_result`. A 1 in any column indicates that this column corresponds to the value from the original data. This is sometimes called *one-hot encoding*. If the original data had 3 or 4 or 100 different values, the new array for our data would have 3 or 4 or 100 columns respectively.
  
<img src='../_images/quick-look-at-nba-dataset-neural-net-class1.png' alt='img' width='550'>
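
One-hot encoding is simple enough to sketch directly in NumPy, which may help clarify what Keras's `to_categorical` utility produces. The labels below are made up, and the indexing trick shown is just one way to do it.

```python
import numpy as np

# Hypothetical integer class labels (for example, 0 = missed, 1 = made)
labels = np.array([1, 0, 0, 1, 1])

n_classes = labels.max() + 1
# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes)[labels]
print(one_hot)
```

Each row contains a single 1, in the column whose index equals the original label.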
  
**Classification**
  
Here is the code to build a model with that data. First, we import the utility function that converts the data from one column to multiple columns: `to_categorical`. We then read in the data. I like reading in the data with `pandas`, in case I want to inspect it. But this could be done with `numpy`. I also do a couple of `pandas` tricks here which you may or may not be familiar with. Here, I use the `.drop()` method to get a version of my data without the target column. We then create our target using the `to_categorical` function. Then we build our model. It looks similar to models you've seen, except the last line of the model definition has 2 nodes, for the 2 possible outcomes, and it uses the softmax activation function.
  
<img src='../_images/quick-look-at-nba-dataset-neural-net-class2.png' alt='img' width='550'>
  
Let's look at the results now. Both accuracy and loss improve measurably for the first 3 epochs, and then the improvement slows down. Sometimes it gets a little worse for an epoch, sometimes it gets a little better. We will soon see a more sophisticated way to determine how long to train, but training for 10 epochs got us to that flat part of the loss curve, so this worked well in this case.
  
<img src='../_images/quick-look-at-nba-dataset-neural-net-class3.png' alt='img' width='550'>
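
For readers who cannot see the screenshots, here is a rough sketch of the steps described: drop the target column, one-hot encode it, and build a model whose last layer has 2 softmax nodes. It is not the code from the images. The data is randomly generated, the column names `feature_a` and `feature_b` are hypothetical, the layer sizes and the `'adam'` optimizer are assumptions, and `tf.keras.Input` is used here in place of `input_shape=` on the first layer.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Synthetic stand-in data; the predictor column names are hypothetical
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'feature_a': rng.uniform(0, 1, size=200),
    'feature_b': rng.uniform(0, 1, size=200),
    'shot_result': rng.integers(0, 2, size=200),
})

# Predictors: everything except the target column
predictors = data.drop(['shot_result'], axis=1).to_numpy().astype('float32')
n_cols = predictors.shape[1]

# Target: one column per class
target = tf.keras.utils.to_categorical(
    data['shot_result'].to_numpy(), num_classes=2
)

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(n_cols,)))
model.add(tf.keras.layers.Dense(100, activation='relu'))
model.add(tf.keras.layers.Dense(100, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax'))  # 2 outcomes

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(predictors, target, epochs=3, verbose=0)

# One accuracy value per epoch
print(history.history['accuracy'])
```

Because the labels here are random, the accuracy should hover around chance; the point is the shape of the pipeline, not the score.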
  


### Understanding your classification data
  
Now you will start modeling with a new dataset for a classification problem. This data includes information about passengers on the Titanic. You will use predictors such as `age`, `fare` and where each passenger embarked from to predict who will survive. This data is from [a tutorial on data science competitions](https://www.kaggle.com/c/titanic). Look [here](https://www.kaggle.com/c/titanic/data) for descriptions of the features.
  
The data is pre-loaded in a `pandas` DataFrame called `df`.
  
It's smart to review the maximum and minimum values of each variable to ensure the data isn't misformatted or corrupted. What was the maximum age of passengers on the Titanic? Use the `.describe()` method in the IPython Shell to answer this question.
  
Possible answers
  
- [ ] 29.699
- [x] 80
- [ ] 891
- [ ] It is not listed
  
Solution
  
```python
In [1]:
df.age.max()
Out[1]:
80.0
```
  
Exactly! The maximum age in the data is 80.

### Last steps in classification models
  
You'll now create a classification model using the Titanic dataset, which has been pre-loaded into a DataFrame called `df`. You'll take information about the passengers and predict which ones survived.
  
The predictive variables are stored in a NumPy array called `predictors`. The target to predict is in `df.survived`, though you'll have to manipulate it for Keras. The number of predictive features is stored in `n_cols`.
  
Here, you'll use the `'sgd'` optimizer, which stands for [Stochastic Gradient Descent](https://en.wikipedia.org/wiki/Stochastic_gradient_descent). You'll learn more about this in the next chapter!
  
1. Convert `df.survived` to a categorical variable using the `to_categorical()` function.
2. Specify a `Sequential()` model called `model`.
3. Add a `Dense()` layer with 32 nodes. Use `'relu'` as the `activation=` and `(n_cols, )` as the `input_shape=`.
4. Add the `Dense()` output layer. Because there are two outcomes, it should have 2 units, and because it is a classification model, the `activation=` should be `'softmax'`.
5. Compile the model
- using `'sgd'` as the `optimizer=` 
- `'categorical_crossentropy'` as the `loss=` function 
- and `metrics=['accuracy']` to see the accuracy (what fraction of predictions were correct) at the end of each epoch.
6. Fit the `model` using the `predictors` and the `target`.

In [16]:
df = pd.read_csv('../_datasets/titanic_all_numeric.csv')
print(df.shape)
df.head()

(891, 11)


Unnamed: 0,survived,pclass,age,sibsp,parch,fare,male,age_was_missing,embarked_from_cherbourg,embarked_from_queenstown,embarked_from_southampton
0,0,3,22.0,1,0,7.25,1,False,0,0,1
1,1,1,38.0,1,0,71.2833,0,False,1,0,0
2,1,3,26.0,0,0,7.925,0,False,0,0,1
3,1,1,35.0,1,0,53.1,0,False,0,0,1
4,0,3,35.0,0,0,8.05,1,False,0,0,1


In [17]:
# X/y split
predictors = df.iloc[:, 1:].astype(np.float32).to_numpy()
target = df.survived.astype(np.float32).to_numpy()

# Number of predictor columns, used as the model's input shape
n_cols = predictors.shape[1]

In [18]:
# Convert the target to categorical: target
target = tf.keras.utils.to_categorical(target)

# Set up the model
model = tf.keras.Sequential()

# Add the first layer
model.add(tf.keras.layers.Dense(32, activation='relu', input_shape=(n_cols, )))

# Add the output layer
model.add(tf.keras.layers.Dense(2, activation='softmax'))

# Compile the model
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model
model.fit(predictors, target, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x12cf7c0d0>

This simple model is generating an accuracy of 70%!

## Using models
  
Now that you can build basic deep learning models, I'll show you how to use them. Then we'll go into some finer details on fine-tuning model architectures. 
  
**Using models**
  
The things you'll want to do in order to use these models are: 
  
1. Save a model after you've trained it 
2. Reload that model 
3. Make predictions with the model 
  
**Saving, reloading, and using your Model**
  
Here is the code to save a model, reload it, and make predictions. We've imported a `load_model()` function here. Once I have a model I want to save, I can save it with the `.save()` method. I supply a filename. Models are saved in a format called HDF5, for which `.h5` is the common extension. I then load the model back into memory with the `load_model()` function here. I then make predictions. The model I've loaded here is a classification model. The predictions come in the same format as the prediction target. You may recall that this had 1 column for whether the shot was missed, and then a 2nd column for whether the shot was made. In practice, I probably only want the probability that the shot is made. So, I'll extract that second column with `numpy` indexing, and I call that `prob_true`. Lastly, sometimes I'll want to verify that the model I loaded has the structure I expect.
  
```python
from tensorflow.keras.models import load_model

# Saving a model
model.save('model.hdf5')

# Loading a trained model
my_model = load_model('model.hdf5')

# Make predictions with the model
y_pred = my_model.predict(data_to_predict)  # data_to_predict: placeholder for your input array
prob_true = y_pred[:,1]
```
  
**Verifying model structure**
  
You can print out a summary of the model architecture with the `.summary()` method. Now that you can save your model, reload it, make predictions, and verify its structure, you have most of what you need to not just build models, but to work with them in practical situations.
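
As a minimal sketch of what verifying the structure looks like, the example below builds a small classification model and prints its summary. The 10 input columns and the layer sizes are assumptions chosen for illustration, and `tf.keras.Input` is used in place of `input_shape=` on the first layer.

```python
import tensorflow as tf

# A small classification model with an assumed 10 input columns
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(10,)))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(2, activation='softmax'))

# Print each layer with its output shape and parameter count
model.summary()

# Total number of weights and biases: (10*32 + 32) + (32*2 + 2)
print(model.count_params())  # 418
```

Comparing the layer list and the parameter total against what you expect is a quick sanity check after reloading a saved model.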

### Making predictions
  
The trained network from your previous coding exercise is now stored as `model`. New data to make predictions on is stored in a NumPy array as `pred_data`. Use `model` to make predictions on your new data.
  
In this exercise, your predictions will be probabilities, which is the most common way for data scientists to communicate their predictions to colleagues.
  
1. Create your predictions using the model's `.predict()` method on `pred_data`.
2. Use NumPy indexing to find the column corresponding to predicted probabilities of survival being `True`. This is the second column (index 1) of predictions. Store the result in `predicted_prob_true` and `print()` it.
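
The column indexing in step 2 can be tried on a small stand-in array before running the real model. The probabilities below are made up, and the 0.5 cutoff at the end is an assumed threshold, not something the exercise requires.

```python
import numpy as np

# Hypothetical model output: one row per passenger,
# column 0 = P(did not survive), column 1 = P(survived)
predictions = np.array([
    [0.76, 0.24],
    [0.39, 0.61],
    [0.83, 0.17],
])

# All rows, second column (index 1)
predicted_prob_true = predictions[:, 1]
print(predicted_prob_true)  # [0.24 0.61 0.17]

# Optional: turn probabilities into yes/no calls with an assumed 0.5 cutoff
predicted_survived = predicted_prob_true > 0.5
print(predicted_survived)  # [False  True False]
```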

In [20]:
pred_data = pd.read_csv('../_datasets/titanic_pred.csv').astype(np.float32).to_numpy()
print(pred_data.shape)

(91, 10)


In [21]:
# Calculate predictions: predictions
predictions = model.predict(pred_data)

# Calculate predicted probability of survival: predicted_prob_true
predicted_prob_true = predictions[:, 1]

# Print predicted_prob_true
print(predicted_prob_true)

[0.23916313 0.34090686 0.26098457 0.60871285 0.16997376 0.1624222
 0.02218091 0.28563094 0.21588805 0.49518687 0.18234314 0.3011199
 0.2067293  0.3295896  0.16667812 0.03203486 0.26123515 0.49083805
 0.10354134 0.3701235  0.50857556 0.18585783 0.02376566 0.28800943
 0.29378074 0.19751285 0.49041864 0.45922384 0.21272047 0.55867046
 0.47306746 0.44617185 0.21370499 0.19416268 0.22178677 0.5206851
 0.21112947 0.15679581 0.48198697 0.44941497 0.20765562 0.32406965
 0.50327235 0.1982017  0.2366809  0.1195191  0.27044234 0.21312845
 0.42897233 0.46322972 0.32145202 0.0224067  0.4783128  0.47673464
 0.39504808 0.29774606 0.3132679  0.37537476 0.42565298 0.21370499
 0.15135628 0.26453272 0.45492855 0.2761854  0.29064864 0.27086556
 0.4456504  0.50218356 0.16953903 0.45036212 0.18242286 0.46752736
 0.16485803 0.11985264 0.41922057 0.3932271  0.2287412  0.21173613
 0.15548757 0.56425107 0.4101032  0.15636271 0.30023637 0.23880744
 0.17829    0.37258622 0.27453244 0.50115967 0.33048314 0.525188


You're now ready to begin learning how to fine-tune your models.