# Custom Training Loops

A custom training loop replaces the standard `fit` method provided by Keras and other high-level libraries when `fit` is not flexible enough to achieve the desired behavior.

Custom training loops provide full control over the training process, including the forward and backward pass, the optimization step, and the calculation of metrics. This allows for the implementation of complex training procedures, such as multi-stage training, transfer learning, or custom regularization methods.

The steps to create a custom training loop after preparing the data and defining the model are:

1. Define essential variables like the optimizer, loss function, and metrics that will be used to evaluate the model's performance.
2. Create a training step function that will be used to perform a single training step. This function will be called for each batch of data in the training set. The training step function should perform the following steps:
   - Run the forward pass: feed the input data to the model to get the predictions.
   - Calculate the loss using the predictions and the true labels.
   - Calculate the gradients of the loss with respect to the model's trainable variables.
   - Update the parameters using the optimizer and the gradients.
   - Calculate the metrics using the predictions and the true labels.
3. Create a test step function that will be used to perform a single test step. This function will be called for each batch of data in the test set. The test step function should perform the following steps:
   - Run the forward pass: feed the input data to the model to get the predictions.
   - Calculate the loss using the predictions and the true labels.
   - Calculate the metrics using the predictions and the true labels.
4. Create a training loop that will be used to perform the training and test steps for each epoch. The training loop should perform the following steps:
   - Iterate over the training set and call the training step function for each batch.
   - Iterate over the test set and call the test step function for each batch.
   - Print the loss and metrics for the current epoch.
5. Profit!

A custom training loop gives you finer control over each part of the training process, which lets you tailor training to your specific use case.

## Table of Contents <a name="toc"></a>
- [Dataset Preparation](#dataset-preparation)
- [Model Definition](#model-definition)
- [Custom Training Loop](#custom-training-loop)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_datasets as tfds  # datasets
from tqdm.notebook import tqdm  # progress bar

## Dataset Preparation <a name="dataset-preparation"></a>
[Back to top](#toc)

We'll use the same dataset setup as in the previous notebook.

In [None]:
dataset, info = tfds.load(
    "fashion_mnist",  # name of the dataset
    as_supervised=True,  # returns (image, label)
    with_info=True,  # returns info about the dataset
)

# prepare index labels
labels_index = info.features["label"].names

In [4]:
def preprocess(image, label):
    # preprocess images
    image = tf.reshape(image, (28, 28, 1))
    image = tf.cast(image, tf.float32)
    image = image / 255.0
    # preprocess labels
    label = tf.one_hot(label, 10)
    return image, label


# we will create a function to prepare the dataset for training
def dataset_prep(dataset):
    # apply the preprocessing function to every example, in parallel
    dataset = dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    # shuffle the dataset
    dataset = dataset.shuffle(1000)

    # batch the dataset
    dataset = dataset.batch(32)

    # prefetch the dataset so the next batch is prepared while the current one is used
    dataset = dataset.prefetch(tf.data.AUTOTUNE)

    return dataset


train_dataset = dataset_prep(dataset["train"])
test_dataset = dataset_prep(dataset["test"])
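As a quick sanity check of the pipeline above, the following sketch runs the same preprocessing and batching steps on a small synthetic stand-in dataset, so it does not need to download Fashion-MNIST. The random arrays `fake_images` and `fake_labels` and the name `fake_dataset` are assumptions made for illustration; they are not part of the notebook's data.

```python
import numpy as np
import tensorflow as tf


def preprocess(image, label):
    # same preprocessing as above: reshape, scale to [0, 1], one-hot encode
    image = tf.reshape(image, (28, 28, 1))
    image = tf.cast(image, tf.float32) / 255.0
    label = tf.one_hot(label, 10)
    return image, label


# hypothetical stand-in data: 100 random "images" with random class labels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 28, 28, 1), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(100,), dtype=np.int64)

fake_dataset = (
    tf.data.Dataset.from_tensor_slices((fake_images, fake_labels))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# take one batch and inspect its shapes
images, labels = next(iter(fake_dataset))
num_batches = len(fake_dataset)  # number of batches, including the partial last one
print(images.shape, labels.shape, num_batches)
```

Each full batch should hold 32 images of shape `(28, 28, 1)` scaled to `[0, 1]` and 32 one-hot label vectors of length 10. The last batch is smaller here because 100 is not a multiple of 32.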

## Model Definition <a name="model-definition"></a>
[Back to top](#toc)

In [5]:
model = tf.keras.Sequential(
    [
        tf.keras.layers.Conv2D(
            32, 3, padding="same", activation="relu", input_shape=(28, 28, 1)
        ),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 28, 28, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 14, 14, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 7, 7, 64)         0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 3136)              0         
                                                                 
 dense (Dense)               (None, 128)               4

## Custom Training Loop <a name="custom-training-loop"></a>
[Back to top](#toc)

Let's start applying the steps described above to create a custom training loop.

In [6]:
# Step 1: Define essential variables like the optimizer, loss function, and metrics that will be used to evaluate the model's performance.

optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.CategoricalCrossentropy()

train_metric = tf.keras.metrics.CategoricalAccuracy()
test_metric = tf.keras.metrics.CategoricalAccuracy()
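
To make concrete what the loss and metric objects defined above compute, here is a plain-Python sketch of categorical cross-entropy and categorical accuracy on one-hot labels. The helper names `categorical_crossentropy` and `categorical_accuracy` and the toy values are hypothetical, and the sketch ignores the numerical clipping that Keras applies to the predictions.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Mean over the batch of -sum(y_true * log(y_pred)) per example."""
    per_example = [
        -sum(t * math.log(p) for t, p in zip(true_row, pred_row) if t > 0)
        for true_row, pred_row in zip(y_true, y_pred)
    ]
    return sum(per_example) / len(per_example)


def categorical_accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the true class."""
    hits = [
        true_row.index(max(true_row)) == pred_row.index(max(pred_row))
        for true_row, pred_row in zip(y_true, y_pred)
    ]
    return sum(hits) / len(hits)


# toy batch: two examples, three classes (illustrative values only)
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]

loss = categorical_crossentropy(y_true, y_pred)  # (-ln 0.7 - ln 0.3) / 2
accuracy = categorical_accuracy(y_true, y_pred)  # only the first example is correct
print(f"loss={loss:.4f}, accuracy={accuracy:.2f}")  # loss=0.7803, accuracy=0.50
```

The Keras metric objects additionally keep running totals across batches, which is why the training loop resets them at the start of every epoch.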

In [11]:
# Step 2: Define the training step. It will be executed for each batch in the training loop.
@tf.function  # this decorator compiles the function into a TensorFlow graph, which usually runs faster than eager execution
def train_step(images, labels):
    # forward pass: the tape records the operations performed inside the block
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)  # pass the images to the model
        loss = loss_fn(labels, predictions)  # calculate the loss value

    # backward pass: calculate the gradients of the loss with respect to
    # all the trainable parameters of the model
    parameters = model.trainable_variables
    gradients = tape.gradient(loss, parameters)

    # pair each gradient with its parameter and apply the update
    optimizer.apply_gradients(zip(gradients, parameters))

    # update the metrics using the labels and predictions
    train_metric.update_state(labels, predictions)

    # return the loss value to be used in the training loop
    return loss


# Step 3: Define the testing step. This step will be executed in the training loop using the test dataset.
@tf.function
def test_step(images, labels):
    # run the model in inference mode; no gradients are needed here
    predictions = model(images, training=False)
    loss = loss_fn(labels, predictions)  # calculate the loss value

    # update the metrics using the labels and predictions
    test_metric.update_state(labels, predictions)

    # return the loss value to be used in the training loop
    return loss
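
The tape-and-update mechanics used in `train_step` can be seen in isolation on a single scalar variable. This is a minimal sketch under assumed toy values: the variable `w`, the quadratic loss, and the learning rate are all hypothetical, and the manual `assign_sub` update stands in for `optimizer.apply_gradients`, which for Adam applies a more elaborate update rule.

```python
import tensorflow as tf

w = tf.Variable(3.0)  # a single trainable parameter (toy value)
learning_rate = 0.1  # assumed step size for illustration

# record the forward computation so it can be differentiated
with tf.GradientTape() as tape:
    loss = w**2  # toy loss whose gradient is 2 * w

grad = tape.gradient(loss, w)  # d(loss)/dw evaluated at w = 3.0

# plain gradient-descent update: w <- w - learning_rate * grad
w.assign_sub(learning_rate * grad)

print(float(grad), float(w.numpy()))
```

The gradient is 6.0 at `w = 3.0`, so one step with this learning rate moves `w` to about 2.4, which lowers the loss.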

In [12]:
# number of epochs completed so far; useful when resuming training from a
# checkpoint or when training the model for more epochs
epochs_trained = 0
epochs = 10  # number of epochs
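
One detail worth knowing before running the loop: summing the per-batch mean losses and dividing by the number of batches weights every batch equally, even when the final batch is smaller than the rest. The plain-Python sketch below contrasts that with an example-weighted average. The batch losses and sizes are made-up values, and `batch_mean_average` and `example_weighted_average` are hypothetical helper names.

```python
def batch_mean_average(batch_losses):
    """Average of per-batch mean losses; every batch counts equally."""
    return sum(batch_losses) / len(batch_losses)


def example_weighted_average(batch_losses, batch_sizes):
    """Average over examples; each batch is weighted by its size."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# made-up values: two full batches of 32 and a final batch of 16
batch_losses = [0.30, 0.30, 0.60]
batch_sizes = [32, 32, 16]

print(f"{batch_mean_average(batch_losses):.2f}")  # 0.40
print(f"{example_weighted_average(batch_losses, batch_sizes):.2f}")  # 0.36
```

With 10,000 test images and a batch size of 32, the last test batch holds 16 images, so the two averages differ slightly. In TensorFlow, one option is to accumulate the loss with `tf.keras.metrics.Mean`, whose `update_state` accepts a `sample_weight` argument.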

In [13]:
# Step 4: Define the training loop. This loop will be executed for a certain number of epochs.


# loop epochs, start from the epoch where the training stopped last time (default is 0)
for epoch in tqdm(range(epochs_trained, epochs)):
    # reset the loss values to 0
    train_loss = 0
    test_loss = 0

    # reset the metrics
    train_metric.reset_state()
    test_metric.reset_state()

    # loop over the training dataset and pass each batch to the training step,
    # adding up the loss values it returns
    for images, labels in train_dataset:
        train_loss += train_step(images, labels)

    # loop over the test dataset and pass each batch to the testing step,
    # adding up the loss values it returns
    for images, labels in test_dataset:
        test_loss += test_step(images, labels)

    # calculate the average loss value
    train_loss = train_loss / len(train_dataset)
    test_loss = test_loss / len(test_dataset)

    # calculate the metric results
    train_metric_results = train_metric.result()
    test_metric_results = test_metric.result()

    # print the results
    print(
        f"""
  Epoch {epoch + 1} Ended
  Train Loss: {train_loss:.4f}, Train Metric: {train_metric_results:.4f} 
  Test Loss: {test_loss:.4f}, Test Metric: {test_metric_results:.4f}
  """
    )

    # increment the current epoch
    epochs_trained += 1


  Epoch 1 Ended
  Train Loss: 0.3981, Train Metric: 0.8565 
  Test Loss: 0.3074, Test Metric: 0.8903
  

  Epoch 2 Ended
  Train Loss: 0.2633, Train Metric: 0.9047 
  Test Loss: 0.2741, Test Metric: 0.9028
  

  Epoch 3 Ended
  Train Loss: 0.2196, Train Metric: 0.9185 
  Test Loss: 0.2465, Test Metric: 0.9103
  

  Epoch 4 Ended
  Train Loss: 0.1864, Train Metric: 0.9314 
  Test Loss: 0.2363, Test Metric: 0.9171
  

  Epoch 5 Ended
  Train Loss: 0.1573, Train Metric: 0.9422 
  Test Loss: 0.2470, Test Metric: 0.9162
  

  Epoch 6 Ended
  Train Loss: 0.1340, Train Metric: 0.9503 
  Test Loss: 0.2516, Test Metric: 0.9213
  

  Epoch 7 Ended
  Train Loss: 0.1108, Train Metric: 0.9593 
  Test Loss: 0.2548, Test Metric: 0.9195
  

  Epoch 8 Ended
  Train Loss: 0.0898, Train Metric: 0.9665 
  Test Loss: 0.2957, Test Metric: 0.9161
  

  Epoch 9 Ended
  Train Loss: 0.0768, Train Metric: 0.9710 
  Test Loss: 0.3073, Test Metric: 0.9172
  

  Epoch 10 Ended
  Train Loss: 0.0611, Train Metric: 0