**Regularization** refers to techniques that reduce the gap between a model's performance on the training data and its performance on test data, improving the model's ability to generalize.

Regularization works by adding extra terms to the loss function. These terms change which parameter values minimize the loss.

Without regularization, the parameters are chosen by minimizing the sum of the per-example losses over the $m$ training examples:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}} \left[ \sum_{i=1}^{m} l_i[x_i, y_i] \right]
$$

To guide the minimization process towards preferred solutions, an additional term is included in the loss function:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}} \left[ \sum_{i=1}^{m} l_i[x_i, y_i] + \lambda \cdot g[\boldsymbol{\phi}] \right]
$$

Here, $g[\boldsymbol{\phi}]$ is a function that returns a scalar. This scalar takes on larger values when the parameters are less desirable.

The term $\lambda$ is a positive scalar that determines the relative influence of the regularization term.
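
As a minimal sketch of the regularized objective above, the snippet below evaluates the data term plus $\lambda \cdot g[\boldsymbol{\phi}]$ for two candidate parameter settings. The linear model, the toy data, the names `regularized_objective`, `squared_error`, `magnitude`, and `best_candidate`, and the choice of $g$ as the sum of absolute parameter values are all illustrative assumptions, not something the text prescribes.

```python
from typing import Callable, Dict, Sequence, Tuple

Params = Sequence[float]
Example = Tuple[float, float]


def regularized_objective(
    phi: Params,
    data: Sequence[Example],
    loss: Callable[[Params, float, float], float],
    g: Callable[[Params], float],
    lam: float,
) -> float:
    """Sum of per-example losses plus lam * g(phi)."""
    data_term = sum(loss(phi, x, y) for x, y in data)
    return data_term + lam * g(phi)


def squared_error(phi: Params, x: float, y: float) -> float:
    # Assumed model for illustration: y is predicted as phi[0] + phi[1] * x.
    return (phi[0] + phi[1] * x - y) ** 2


def magnitude(phi: Params) -> float:
    # Assumed penalty g: a scalar that grows as parameters move away from zero.
    return sum(abs(p) for p in phi)


data = [(0.0, 0.1), (1.0, 2.9), (2.0, 6.2)]
candidates: Dict[str, Params] = {
    "close_fit": (0.0, 3.0),  # fits the data closely, larger parameters
    "small_fit": (0.0, 2.5),  # fits slightly worse, smaller parameters
}


def best_candidate(lam: float) -> str:
    """Name of the candidate with the lowest regularized objective."""
    return min(
        candidates,
        key=lambda name: regularized_objective(
            candidates[name], data, squared_error, magnitude, lam
        ),
    )


for lam in (0.0, 10.0):
    print(f"lambda={lam}: preferred candidate is {best_candidate(lam)}")
```

With $\lambda = 0$ only the data term matters, so the closer fit is preferred. With a large enough $\lambda$, the penalty outweighs the extra data loss and the candidate with smaller parameters is preferred instead.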

One of the most frequently used regularization terms is the squared **L2 norm**, which penalizes the sum of the squares of the parameter values:

$$\hat{\boldsymbol{\phi}} = \underset{\boldsymbol{\phi}}{\mathrm{argmin}} \left[ \sum_{i=1}^{m} l_i[x_i, y_i] + \lambda \cdot \sum_{j} \phi_j^2 \right]
$$
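
To make the effect of the L2 term concrete, here is a small sketch for the simplest possible case: a single parameter $\phi$, an assumed model $y \approx \phi x$, and a squared-error loss. Under those assumptions, setting the derivative of $\sum_i (\phi x_i - y_i)^2 + \lambda \phi^2$ to zero gives $\phi = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The function name `fit_slope_l2` and the toy data are hypothetical.

```python
def fit_slope_l2(xs, ys, lam):
    """Minimizer of sum_i (phi * x_i - y_i)**2 + lam * phi**2 over a single phi.

    Assumes a one-parameter model y = phi * x with squared-error loss.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated exactly by y = 2 * x

for lam in (0.0, 1.0, 14.0):
    print(f"lambda={lam}: phi={fit_slope_l2(xs, ys, lam):.4f}")
# lambda=0.0 recovers phi=2.0000; lambda=14.0 halves it to phi=1.0000
```

As $\lambda$ grows, the fitted parameter is pulled toward zero even though the unregularized solution fits the data perfectly.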

In the context of neural networks, L2 regularization is typically applied to the weights rather than the biases. Hence, it is often referred to as a **weight decay** term.
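
The name can be motivated with a short sketch. The gradient of $\lambda \sum_j \phi_j^2$ with respect to $\phi_j$ is $2\lambda\phi_j$, so a plain gradient descent step with learning rate $\alpha$ multiplies each weight by $(1 - 2\alpha\lambda)$ before applying the data gradient. The code below checks this for one step, assuming plain gradient descent without momentum; the function names and the numeric values are hypothetical, and the biases are deliberately left unpenalized.

```python
def sgd_step_l2(weights, biases, grad_w, grad_b, lr, lam):
    """One gradient step on data loss + lam * sum(w**2); biases are not penalized."""
    new_w = [w - lr * (gw + 2.0 * lam * w) for w, gw in zip(weights, grad_w)]
    new_b = [b - lr * gb for b, gb in zip(biases, grad_b)]
    return new_w, new_b


def sgd_step_decay(weights, biases, grad_w, grad_b, lr, lam):
    """The same step, written as 'shrink the weights, then follow the data gradient'."""
    shrink = 1.0 - 2.0 * lr * lam
    new_w = [shrink * w - lr * gw for w, gw in zip(weights, grad_w)]
    new_b = [b - lr * gb for b, gb in zip(biases, grad_b)]
    return new_w, new_b


# Hypothetical values; grad_w and grad_b stand in for gradients of the data loss.
weights, biases = [1.0, -2.0], [0.5]
grad_w, grad_b = [0.3, -0.1], [0.2]
lr, lam = 0.01, 0.005

print(sgd_step_l2(weights, biases, grad_w, grad_b, lr, lam))
print(sgd_step_decay(weights, biases, grad_w, grad_b, lr, lam))
```

Both functions produce the same update up to floating-point rounding, which is why the L2 penalty is described as decaying the weights at every step. This sketch covers plain gradient descent only.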

In [None]:
import tensorflow as tf
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
import matplotlib.pyplot as plt

In [None]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
assert X_train.shape == (50000, 32, 32, 3)
assert X_test.shape == (10000, 32, 32, 3)
assert y_train.shape == (50000, 1)
assert y_test.shape == (10000, 1)

In [None]:
# Convert the data to TensorFlow Datasets
train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
test_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test))

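def preprocess(image, label):
    # Scale pixel values from [0, 255] to [-1, 1].
    # CIFAR-10 images are already 32x32, so no resizing is needed.
    image = tf.cast(image, tf.float32)
    image = (image / 127.5) - 1
    return image, label

def prepare_dataset(ds, batch_size=100, shuffle=False):
    ds = ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    if shuffle:
        # Reshuffle the examples at the start of every epoch
        ds = ds.shuffle(buffer_size=10_000)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Preprocess the data; only the training set is shuffled
train_ds = prepare_dataset(train_ds, shuffle=True)
test_ds = prepare_dataset(test_ds)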

In [None]:
# L2 penalty on the kernels (weights) only; the biases are left unregularized.
# Keras computes this penalty as 0.005 * sum(w ** 2) for each kernel.
l2 = tf.keras.regularizers.L2(0.005)

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu", kernel_initializer="he_normal", kernel_regularizer=l2),
        tf.keras.layers.Dense(256, activation="relu", kernel_initializer="he_normal", kernel_regularizer=l2),
        tf.keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal", kernel_regularizer=l2),
        tf.keras.layers.Dense(10, activation="softmax", kernel_regularizer=l2),
    ]
)

# Compile the model
model.compile(
    optimizer=tf.keras.optimizers.SGD(momentum=0.9, learning_rate=0.01, nesterov=True),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"],
)

reduce_lr = ReduceLROnPlateau(
    monitor="val_accuracy", factor=0.2, patience=2, min_lr=0.001
)
early_stopping = EarlyStopping(
    monitor="val_accuracy", patience=5, restore_best_weights=True
)
callbacks = [early_stopping, reduce_lr]

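# Train the model. The datasets are already batched, so batch_size is not
# passed to fit(). Note that the test set doubles as the validation set here,
# so it also drives early stopping and the learning-rate schedule.
history = model.fit(
    train_ds, epochs=20, validation_data=test_ds, callbacks=callbacks
)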

In [None]:
# Plot training & validation accuracy values
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')

plt.tight_layout()
plt.show()