> <p><small><small>This Notebook is made available subject to the licence and terms set out in the <a href = "http://www.github.com/google-deepmind/ai-foundations">AI Research Foundations Github README file</a>.

<img src="https://storage.googleapis.com/dm-educational/assets/ai_foundations/GDM-Labs-banner-image-C3-white-bg.png">

# Lab: Train Your Model with Keras

<a href='https://colab.research.google.com/github/google-deepmind/ai-foundations/blob/master/course_3/gdm_lab_3_8_train_your_model_with_keras.ipynb'
target='_parent'><img src='https://colab.research.google.com/assets/colab-badge.svg' alt='Open In Colab'/></a>

Train a neural network model using optimizers implemented in Keras.

25 minutes

## Overview

In the previous article, you learned how stochastic gradient descent (SGD) can be used to train a neural network model. In this lab, you will put that knowledge into practice and explore how you can use the SGD-based optimizer **Adam** to train Keras models.

### What you will learn:

By the end of this lab, you will understand:

* How you can train any Keras model, independent of its architecture.
* The role of the optimizer and loss function and how they can be combined to train a model.


### Tasks

As in the previous labs, you will work with a dataset of prompt embeddings. The goal is to build a classifier that predicts the next word ("food" or "water") from the prompt embedding.

**In this lab, you will**:
* Load the dataset.
* Define a two-layer neural network model using Keras.
* Train the model using Keras implementations of the loss function, the Adam optimizer, and the training loop.

All of these steps are described in detail in the following sections.

## How to use Google Colaboratory (Colab)

Google Colaboratory (also known as Google Colab) is a platform that allows you to run Python code in your browser. The code is written in *cells* that are executed on a remote server.

To run a cell, hover over a cell and click on the `run` button to its left. The run button is the circle with the triangle (▶). Alternatively, you can also click on a cell and use the keyboard combination Ctrl+Return (or ⌘+Return if you are using a Mac).

To try this out, run the following cell. This should print today's day of the week below it.

In [None]:
from datetime import datetime

print(f"Today is {datetime.today():%A}.")

Note that the order in which you run the cells matters. When you are working through a lab, make sure to always run all cells in order; otherwise, the code might not work. If you take a break while working on a lab, Colab may disconnect you. In that case, you have to execute all cells again before continuing your work. To make this easier, you can select the cell you are currently working on and then choose __Runtime → Run before__ from the menu above (or use the keyboard combination Ctrl/⌘ + F8). This will re-execute all cells before the current one.

## Imports

In this lab, you will primarily use the [`keras`](https://keras.io/) package for defining and training neural network models.

Run the following cell to import all required packages.

In [None]:
import os # For adjusting Keras settings.
os.environ['KERAS_BACKEND'] = 'jax' # Set a parameter for Keras.

import jax.numpy as jnp # For defining and working with vectors and matrices.
import keras # For defining and training neural network models.
import pandas as pd # For displaying and loading data.

## Load the data

Run the following cell to download the dataset with 2-dimensional sentence embeddings. As in previous labs, the goal is to predict the next word from the words "food" (numeric label 1) and "water" (numeric label 0).

In [None]:
# Load data using pandas.
df = pd.read_csv("https://storage.googleapis.com/dm-educational/assets/ai_foundations/food-water-dataset.csv")

# Extract embeddings and labels.
X_train = jnp.array(df[["Embedding_dim_1", "Embedding_dim_2"]].values)
labels = df["Label"].values # Labels: "food" or "water".
# Convert labels to numeric values for training the model (food = 1, water = 0).
y_train = jnp.where(labels == "food", 1, 0)
df["Numeric label"] = y_train

# Print the loaded data for verification.
df.head(n=20)

## Define the neural network model

The following cell implements a function `build_neural_network` for defining a two-layer neural network using Keras.

The operations of the hidden layer are defined in
```python
operations.append(keras.layers.Dense(hidden_dim, activation="relu"))
```

and the operations of the output layer are defined in
```python
operations.append(keras.layers.Dense(1, activation="sigmoid"))
```

Reflect on why this model uses a sigmoid activation function in the output layer. Could it be replaced with a softmax? When would that be useful?
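One way to explore this question numerically is to compare the two functions directly. The sketch below uses plain Python rather than Keras, and the helper names `sigmoid` and `softmax` are illustrative, not part of the lab code. It checks that a softmax over the two logits `(z, 0)` assigns the first class the same probability as `sigmoid(z)`.

```python
import math


def sigmoid(z: float) -> float:
    """Map a single logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: list[float]) -> list[float]:
    """Map a list of logits to probabilities that sum to 1."""
    largest = max(logits)  # Subtract the maximum for numerical stability.
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Compare the sigmoid output with the first softmax output for a few logits.
for z in [-2.0, 0.0, 3.5]:
    p_sigmoid = sigmoid(z)
    p_softmax = softmax([z, 0.0])[0]
    print(f"z={z:+.1f}  sigmoid={p_sigmoid:.6f}  softmax={p_softmax:.6f}")
```

For two classes, the two formulations therefore carry the same information. A softmax output layer becomes useful once the classifier has to choose among more than two classes.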

<br />

------
> **ℹ️ Info: Combining layer operations**
>
>Note that the code for defining each layer combines two operations in one call. When you implemented the MLP in one of the previous labs, you defined the computation of the dot product and the application of the activation function separately. However, as you rarely want to define a hidden layer without an activation function, Keras allows you to combine these two steps by passing the argument `activation` to the initialization of the `Dense` layer. You can find a list of available activation functions in the [Keras documentation](https://keras.io/api/layers/activations/).
>
------
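The idea of combining the two steps can be sketched in plain Python. This is an illustration only and not how Keras is implemented internally. The functions `linear`, `relu`, and `dense` as well as the weight values are made up for this example.

```python
def relu(values: list[float]) -> list[float]:
    """Apply the ReLU activation function to every element."""
    return [max(0.0, v) for v in values]


def linear(x, weights, bias):
    """Compute the dot product of x with each weight column and add the bias."""
    return [
        sum(x_i * w_row[j] for x_i, w_row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def dense(x, weights, bias, activation=None):
    """Combine the linear step and an optional activation in one call."""
    out = linear(x, weights, bias)
    return activation(out) if activation is not None else out


x = [1.0, -2.0]                                  # One 2-dimensional input.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]  # Shape (2, 3), made-up values.
bias = [0.1, 0.0, -0.2]

two_steps = relu(linear(x, weights, bias))           # Separate steps.
one_call = dense(x, weights, bias, activation=relu)  # Combined in one call.
print(two_steps == one_call)
```

Both variants compute the same values. Passing the activation as an argument only saves you from writing the second step yourself.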

<br />

Inspect both the documentation of the `build_neural_network` function and its implementation to understand how it defines a neural network. Then run the following cell to define it so that you can use it later on.

In [None]:
def build_neural_network(hidden_dim: int = 10) -> keras.Model:
  """
  A function that initializes a two-layer neural network for binary
  classification, implemented in Keras. The hidden layer uses a ReLU activation
  function, and the output layer uses a sigmoid activation function.

  Args:
    hidden_dim: The dimension of the hidden layer.

  Returns:
    A keras.Model instance that implements the two-layer neural network.
  """

  operations = []

  # Add the operations for a hidden layer with a ReLU activation function.
  operations.append(keras.layers.Dense(hidden_dim, activation="relu"))

  # Add the operations for an output layer with a sigmoid activation function.
  operations.append(keras.layers.Dense(1, activation="sigmoid"))

  # Construct a model such that inputs are passed sequentially through every
  # layer.
  model = keras.Sequential(operations)
  return model
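
To get a feel for the size of this model, you can count its trainable parameters by hand. The sketch below assumes the 2-dimensional embeddings used in this lab and the default `hidden_dim` of 10. The helper `dense_param_count` is a hypothetical name introduced only for this illustration.

```python
def dense_param_count(input_dim: int, output_dim: int) -> int:
    """Number of weights plus bias terms in a fully connected layer."""
    return input_dim * output_dim + output_dim


embedding_dim = 2  # The embeddings in this lab are 2-dimensional.
hidden_dim = 10    # Default value used by build_neural_network.

hidden_params = dense_param_count(embedding_dim, hidden_dim)  # 2 * 10 + 10
output_params = dense_param_count(hidden_dim, 1)              # 10 * 1 + 1
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

Once the model has processed some input and its weights exist, `model.summary()` should report the same total.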

------
> **ℹ️ Info: The ingredients for training a model**
>
>Most deep learning frameworks, including Keras, use a combination of three components to train a neural network model:
>* The **loss function**: As you have already seen, the loss is a function of the current model weights, the current model predictions and the target labels in the training data. This is the function that you try to minimize during training. A lower value of this function means that the model is better at making predictions on examples in the training data.
>* The **optimizer**: This component is responsible for updating the parameters (weights) of the model such that the loss decreases. In optimizers that you would use for training neural networks, the updating is based on the gradient of the loss function with respect to the model parameters, computed on a mini-batch of training examples, as you observed in the discussion of the SGD algorithm. Keras optimizers accept a `learning_rate` parameter. This parameter defines how big each update step should be.
>* The **model**: This component defines which computations are needed to process the input. When you define a model using a deep learning framework such as Keras, this also automatically defines the necessary parameters for each layer. For example, when you define a layer using `Dense`, this also initializes the weights and the bias term for that layer.
>
------
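How these three components interact can be sketched with a single gradient descent step written in plain Python. This is a simplified illustration and not what Keras does internally. It uses a model without a hidden layer, a made-up dataset of four examples, and plain gradient descent instead of Adam. All function names are hypothetical.

```python
import math


def predict(weights, bias, x):
    """The model: a weighted sum followed by a sigmoid."""
    z = sum(w * x_i for w, x_i in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(weights, bias, inputs, targets):
    """The loss function: mean binary cross-entropy over all examples."""
    total = 0.0
    for x, y in zip(inputs, targets):
        p = predict(weights, bias, x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(inputs)


def gradients(weights, bias, inputs, targets):
    """Gradient of the mean loss with respect to the weights and the bias."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(inputs, targets):
        error = predict(weights, bias, x) - y
        for j, x_j in enumerate(x):
            grad_w[j] += error * x_j / len(inputs)
        grad_b += error / len(inputs)
    return grad_w, grad_b


# Made-up training data: four 2-dimensional inputs with binary labels.
inputs = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, 0.0]]
targets = [1, 1, 0, 0]
weights, bias = [0.0, 0.0], 0.0
learning_rate = 0.5

loss_before = bce_loss(weights, bias, inputs, targets)

# The optimizer step: move each parameter against its gradient.
grad_w, grad_b = gradients(weights, bias, inputs, targets)
weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
bias = bias - learning_rate * grad_b

loss_after = bce_loss(weights, bias, inputs, targets)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

Note that the gradient is taken with respect to the parameters `weights` and `bias`. The training examples only determine the value of that gradient.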




## Coding activity 1: Define a loss function

Before you can train a model, you have to define both the loss function and the optimizer. In almost all cases, you can use existing implementations for both of these components.



------
> **💻 Your task:**
>
> Your first task is to define the loss function. Remember that you are building a **binary classifier** that predicts the probability of the next word being "food" (class 1). As you have seen in the previous lab and articles, for such classifiers, you will use a binary cross-entropy loss. In Keras, loss functions are defined in the `keras.losses` module and the binary cross-entropy loss can be initialized using the following class:
>
>```python
>keras.losses.BinaryCrossentropy()
>```
>
>Define the binary cross-entropy loss in the cell below.
------

In [None]:
# Define the loss function.
loss_fn = # Add your code here.

------
> **ℹ️ Info: Other loss functions**
>
>If you were doing a multi-class classification task (i.e., a classifier that chooses between more than two classes), you would use a `keras.losses.CategoricalCrossentropy` or a `keras.losses.SparseCategoricalCrossentropy` loss function. You can find a list of all loss functions implemented in Keras in the [Keras documentation](https://keras.io/api/losses/).
>
------
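The two multi-class losses differ in the label format they expect: `CategoricalCrossentropy` works with one-hot encoded labels, while `SparseCategoricalCrossentropy` works with integer class indices. The plain-Python sketch below illustrates that both formats lead to the same loss value for a single example. The function names and the probabilities are made up for this illustration and are not the Keras implementations.

```python
import math


def categorical_cross_entropy(one_hot_target, probs):
    """Cross-entropy for a one-hot encoded target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


def sparse_categorical_cross_entropy(class_index, probs):
    """Cross-entropy for a target given as an integer class index."""
    return -math.log(probs[class_index])


# Made-up predicted probabilities for three classes. The true class is 1.
probs = [0.7, 0.2, 0.1]

loss_one_hot = categorical_cross_entropy([0, 1, 0], probs)
loss_sparse = sparse_categorical_cross_entropy(1, probs)

print(f"one-hot: {loss_one_hot:.4f}, sparse: {loss_sparse:.4f}")
```

Which of the two you choose mainly depends on how the labels in your dataset are stored.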


## Coding activity 2: Define the optimizer

As an optimizer, you will almost always use the Adam optimizer. This optimizer implements a more sophisticated version of the gradient update step than the regular SGD algorithm: it adapts the step size of each parameter based on running averages of past gradients and their squares. In practice, this optimizer works very well for training neural networks.
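To illustrate how the step size adapts, here is a minimal sketch of the commonly described Adam update rule for a single scalar parameter, written in plain Python. It is not the Keras implementation, which may differ in details. The function `adam_step` and the default values chosen for its arguments are assumptions made for this example.

```python
import math


def adam_step(param, grad, m, v, step, learning_rate=0.01,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one Adam update to a single scalar parameter."""
    # Running average of the gradients (first moment).
    m = beta_1 * m + (1 - beta_1) * grad
    # Running average of the squared gradients (second moment).
    v = beta_2 * v + (1 - beta_2) * grad ** 2
    # Correct the bias caused by initializing m and v at zero.
    m_hat = m / (1 - beta_1 ** step)
    v_hat = v / (1 - beta_2 ** step)
    # Scale the step by the typical magnitude of the gradient.
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# Compare the first update for a tiny and a huge gradient.
# Plain SGD would move the parameter by learning_rate * grad.
for grad in [0.001, 100.0]:
    new_param, _, _ = adam_step(param=1.0, grad=grad, m=0.0, v=0.0, step=1)
    print(f"grad={grad:>7}: Adam step={1.0 - new_param:.5f}, "
          f"SGD step={0.01 * grad:.5f}")
```

In this sketch, the first Adam step is close to the learning rate in both cases, whereas the size of a plain SGD step grows with the size of the gradient.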

<br />

------
> **💻 Your task:**
>
> Define the optimizer. The optimizer will compute the gradients and use them to update the parameters on each batch.
>
>In Keras, you can define the Adam optimizer as follows:
>```python
>keras.optimizers.Adam(learning_rate=<LEARNING RATE>)
>```
> If you want to apply weight decay, you can set its strength by adding the optional `weight_decay` parameter.
>
> Define this optimizer in the cell below. Use a learning rate of 0.01.
------






In [None]:
# Define the optimizer.
optimizer = # Add your code here.

## Putting it all together: The `compile` method

Finally, once you have defined the loss function and the optimizer, you can put all of these components together in preparation for training.

First, you need to define your model. This should be an instance of the `keras.Model` class. In this activity, you will use the `build_neural_network` function from above to define an MLP.

Then, to put everything together, you can use the `compile` method of the model. This method attaches the loss function and the optimizer to the model. You can also specify optional metrics, such as the accuracy. If you specify metrics, the result of applying the metric after each epoch will be printed as part of the training log.

Run the following cell to define the model and combine it with the loss function and the optimizer.



In [None]:
# Set a random seed for reproducibility.
keras.utils.set_random_seed(126)

# Define a model.
model = build_neural_network(hidden_dim = 10)

# Attach the loss function, the optimizer and metrics.
model.compile(loss=loss_fn, optimizer=optimizer, metrics=["accuracy"])

## Coding activity 3: Train the model

------
> **💻 Your task:**
>
> Train the model.
>
> To train the model, you can use the `model.fit()` method. This method takes the following arguments:
>* `x`: The input of the training data (a JAX array, `X_train` in this case).
>* `y`: The target values in the training data (a JAX array, `y_train` in this case).
>* `epochs`: The number of epochs. This specifies how many times the model loops through all training examples.
>* `batch_size` (optional): This specifies how many examples there should be in one mini-batch.
>* `validation_data` (optional): A tuple `(X_val, y_val)` to compute the validation loss and accuracy after each epoch.
>* `callbacks` (optional): A list of callback objects that Keras invokes at specific points during training, such as at the end of each epoch. One useful callback to include in this list is `keras.callbacks.EarlyStopping()`, which implements early stopping.
>
>The function returns the training history that contains the loss and the accuracy after each epoch of training.
>
>In the following cell, implement training with the `fit` method. Train the model for 100 epochs with a mini-batch size of 8.

------
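The relationship between `epochs` and `batch_size` can be sketched with a plain-Python loop. This is an illustration of the concept and not the Keras training loop. The dataset size of 50 and the helper `iterate_minibatches` are made up for this example.

```python
import math


def iterate_minibatches(num_examples: int, batch_size: int):
    """Yield (start, end) index pairs that cover all examples once."""
    for start in range(0, num_examples, batch_size):
        yield start, min(start + batch_size, num_examples)


num_examples = 50  # Made-up dataset size for illustration.
batch_size = 8
epochs = 3

steps_per_epoch = math.ceil(num_examples / batch_size)

total_updates = 0
for epoch in range(epochs):
    for start, end in iterate_minibatches(num_examples, batch_size):
        # A real training loop would compute the loss on the examples
        # start..end-1 here and let the optimizer update the parameters.
        total_updates += 1

print(f"{steps_per_epoch} updates per epoch, {total_updates} updates in total")
```

With these numbers, the last mini-batch of each epoch contains only two examples. Keras additionally shuffles the training examples before each epoch by default, which this sketch leaves out.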







In [None]:
# Use the model.fit() method to train your model.
history = # Add your code here.

Congratulations! You now know all the steps involved in training neural network models with Keras. Whenever you need to train a Keras model in the future, you can come back to this notebook to review the steps.

You can also experiment with changing the model, the parameters of the optimizer, or the training procedure to observe how the training process changes.

## Summary

In this activity, you trained a simple neural network using the Adam optimizer. You learned how the loss function, the optimizer and the model are combined in Keras to allow you to train any neural network model. You also learned where you can set hyperparameters such as the learning rate, the weight decay strength, or the number of epochs.

In future courses, you will apply this knowledge to train and fine-tune language models. While those models will be more complicated than the MLP that you have been working with here, all the same principles will apply.

## Solutions


The following cells provide reference solutions to the coding activities in this notebook. If you really get stuck after trying to solve the activities yourself, you may want to consult these solutions.


It is recommended that you *only* look at the solutions after you have tried to solve the activities *multiple times*. The best way to learn challenging concepts in computer science and artificial intelligence is to debug your code piece-by-piece until it works, rather than copying existing solutions.


If you feel stuck, you may want to first try to debug your code. For example, by adding additional print statements to see what your code is doing at every step. This will provide you with a much deeper understanding of the code and the materials. It will also provide you with practice on how to solve challenging coding problems beyond this course.


To view the solutions for an activity, click on the arrow to the left of the activity name. If you consult the solutions, do not copy and paste them into the cells above. Instead, look at them, and type them manually into the cell. This will help you understand where you went wrong.


### Coding Activity 1

In [None]:
# Copy this into the cell above.
loss_fn = keras.losses.BinaryCrossentropy()

### Coding Activity 2

In [None]:
# Copy this into the cell above.
optimizer = keras.optimizers.Adam(learning_rate=0.01)

### Coding Activity 3

In [None]:
# Copy this into the cell above.
history = model.fit(x=X_train, y=y_train, epochs=100, batch_size=8)