<div class="alert alert-success"><h1>L1 Regularization of a Deep Learning Model in Python</h1></div>

Regularization is a set of techniques designed to reduce overfitting in machine learning models. Overfitting occurs when a model performs exceptionally well on the training data but fails to generalize to unseen data. This often happens when the model becomes overly complex and starts memorizing the noise or irrelevant details in the training data, instead of learning the underlying patterns.

Regularization works by introducing constraints or penalties to the training process, limiting the model’s ability to fit the training data too closely. By controlling the complexity of the model, regularization helps strike a balance between underfitting and overfitting, resulting in better generalization performance.

## Learning Objectives
By the end of this tutorial, you will:
+ Know how to apply L1 regularization to a deep learning model.
+ Understand how to evaluate the impact of regularization on a deep learning model.


## Prerequisites
Before we begin, ensure you have:
+ Basic knowledge of Python programming (variables, functions, classes).
+ Familiarity with the fundamentals of how to build a deep learning model in Python using Keras.
+ A Python (version 3.x) environment with the `tensorflow`, `keras`, and `matplotlib` packages installed.

<div class="alert alert-success"><h2>1. Import and Preprocess the Data</h2></div>

We start by importing the data. For this tutorial, we'll use the **MNIST dataset**, a classic dataset in the machine learning community. It consists of 70,000 grayscale images of handwritten digits ranging from 0 to 9. Each image is 28 x 28 pixels, and the dataset is divided into 60,000 training images and 10,000 testing images. Our goal is to develop a model that learns to correctly identify a handwritten digit from its image.

In [None]:
from tensorflow import keras

keras.utils.set_random_seed(1234)
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()

Our deep learning model expects the images as a vector of size 784 (i.e. 28 $\times$ 28). So, let's flatten the images.

In [None]:
train_images = train_images.reshape(60000, 28 * 28)
test_images = test_images.reshape(10000, 28 * 28)
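
The hard-coded sample counts above match MNIST as shipped. As a minimal sketch, assuming a small random stand-in array (`fake_images`, a hypothetical name) rather than the real dataset, the same flattening can be written with `-1` so that NumPy infers the flattened size from the array itself:

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 grayscale images of 28 x 28 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Keep the first axis (one row per image) and let NumPy infer the rest.
flat_images = fake_images.reshape(fake_images.shape[0], -1)

print(flat_images.shape)  # (5, 784)
```

Each row of `flat_images` holds the pixels of one image read row by row, which is the layout the 784-node input layer expects.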

The model also expects the pixel values to be scaled to the range [0, 1]. Let's do that as well.

In [None]:
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255
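
To see what this step does to the data, here is a small sketch on a hypothetical three-pixel array (`raw` is an illustrative name, not part of the tutorial's data). It also shows one reason for casting first: dividing the `uint8` array directly would produce `float64` values in NumPy, which take twice the memory.

```python
import numpy as np

# Hypothetical stand-in for raw pixel data: unsigned 8-bit integers in [0, 255].
raw = np.array([[0, 128, 255]], dtype=np.uint8)

# Cast first, then divide, so the result stays in 32-bit floats.
scaled = raw.astype('float32') / 255

# Dividing the integer array directly promotes the result to 64-bit floats.
unscaled_cast = raw / 255

print(scaled.dtype)                # float32
print(unscaled_cast.dtype)         # float64
print(scaled.min(), scaled.max())  # 0.0 1.0
```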

Finally, we also need to one-hot encode the image labels.

In [None]:
num_classes = 10
train_labels = keras.utils.to_categorical(train_labels, num_classes)
test_labels = keras.utils.to_categorical(test_labels, num_classes)
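
As an illustration of what one-hot encoding produces, the sketch below builds the same kind of output with plain NumPy for three hypothetical labels. It is not how Keras implements `to_categorical`; it only shows the resulting layout.

```python
import numpy as np

num_classes = 10
labels = np.array([3, 0, 9])  # hypothetical digit labels

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# The original integer labels can be recovered with argmax.
recovered = one_hot.argmax(axis=1)
```

Each row contains a single 1 at the index of its digit, which matches the 10-node softmax output layer used later.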

<div class="alert alert-success"><h2>2. Define the Baseline Model</h2></div>

The baseline model consists of an input layer with 784 nodes, two hidden layers with 512 and 128 nodes (respectively), and an output layer with 10 nodes.

In [None]:
from tensorflow.keras.layers import Input, Dense

model = keras.Sequential([
    Input(shape = (784,)),
    Dense(512, activation = 'relu'),
    Dense(128, activation = 'relu'),
    Dense(10, activation = 'softmax')
])
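
To get a feel for the capacity of this network, the following sketch counts its trainable parameters by hand. It assumes only the layer sizes given above; each dense layer has one weight per input-output pair plus one bias per output node.

```python
# Layer sizes taken from the model definition above.
layer_sizes = [784, 512, 128, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out  # weights + biases
    print(f"{n_in} -> {n_out}: {params:,} parameters")
    total += params

print(f"Total: {total:,}")  # Total: 468,874
```

The total should agree with the figure reported by `model.summary()`. With several hundred thousand parameters and 60,000 training images, the model has ample room to memorize the training data.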

<div class="alert alert-success"><h2>3. Compile and Train the Baseline Model</h2></div>

Next, we compile the baseline model, ...

In [None]:
model.compile(
    optimizer = 'adam',
    loss = 'categorical_crossentropy',
    metrics = ['accuracy']
)

... train the model, ...

In [None]:
history = model.fit(
    train_images, 
    train_labels,
    epochs = 15,
    batch_size = 128,
    validation_split = 0.1
)
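
The arithmetic behind these settings can be sketched as follows. Keras takes the validation samples from the end of the arrays passed to `fit`, before any shuffling, so with `validation_split = 0.1` the last 10% of the training images are held out. The variable names below are illustrative.

```python
import math

num_samples = 60_000
validation_split = 0.1
batch_size = 128

# Samples kept for training; the remainder is held out for validation.
num_train = math.floor(num_samples * (1 - validation_split))
num_val = num_samples - num_train

# Number of gradient updates per epoch (the last batch is smaller).
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_train, num_val, steps_per_epoch)  # 54000 6000 422
```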

... and plot the training and validation loss metrics.

In [None]:
from matplotlib import pyplot as plt

plt.figure(figsize = (8, 6))
plt.plot(history.history['loss'], label = 'Training Loss', marker = 'o')
plt.plot(history.history['val_loss'], label = 'Validation Loss', marker = 's')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()

A clear indicator of overfitting is a divergence between the two curves: the training loss keeps falling while the validation loss levels off or starts to rise. That pattern is visible in the training curves above.
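
The divergence can also be quantified rather than read off the plot. The sketch below uses made-up loss values (not results from this tutorial) and two hypothetical helpers, `loss_gap` and `best_epoch`, that work on a dictionary shaped like `history.history`.

```python
# Made-up loss values for illustration only; they are not results from this tutorial.
example_history = {
    'loss':     [0.30, 0.15, 0.08, 0.04],
    'val_loss': [0.20, 0.14, 0.13, 0.15],
}

def loss_gap(history_dict):
    """Return validation loss minus training loss for each epoch."""
    return [v - t for t, v in zip(history_dict['loss'], history_dict['val_loss'])]

def best_epoch(history_dict):
    """Return the 1-based epoch with the lowest validation loss."""
    val = history_dict['val_loss']
    return val.index(min(val)) + 1

gaps = loss_gap(example_history)
print([round(g, 2) for g in gaps])
print(best_epoch(example_history))  # 3
```

A gap that grows from epoch to epoch, together with a best epoch well before the final one, is the numerical counterpart of the diverging curves. In the notebook, the same helpers could be called as `loss_gap(history.history)`.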

<div class="alert alert-info"><b>Note:</b> To learn more about deep learning and how to build a deep learning model in Python using Keras, refer to the LinkedIn Learning course titled <b>"Deep Learning with Python: Foundations"</b>.</div>

<div class="alert alert-success"><h2>4. Apply L1 Regularization to the Model</h2></div>

L1 (Lasso) regularization adds a penalty proportional to the sum of the absolute values of the weights during training. This encourages sparsity: many weights are driven toward zero, so the model learns to rely only on the most important features. Mathematically, the effect of L1 regularization on the loss function is:
$$
\text{Loss}_{L1} = \text{Original Loss} + \lambda \cdot \sum ^n _{i=1} \lvert w_i \rvert
$$

+ **$\lambda$:** the regularization parameter, which controls the strength of the penalty. Higher values of $\lambda$ lead to stronger regularization.
+ **$\lvert w_i \rvert$:** the absolute value of the $i$-th weight, summed over all $n$ regularized weights.
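
The formula can be worked through numerically. The sketch below uses a hypothetical four-element weight vector and a made-up original loss value, chosen only to keep the arithmetic easy to follow; `l1_penalty` is an illustrative helper, not a Keras function.

```python
import numpy as np

def l1_penalty(weights, lam):
    """Return lam times the sum of absolute weight values."""
    return lam * np.sum(np.abs(weights))

# Hypothetical weights and loss value, for illustration only.
weights = np.array([0.5, -1.5, 0.0, 2.0])
original_loss = 0.25
lam = 0.001

penalty = l1_penalty(weights, lam)   # 0.001 * (0.5 + 1.5 + 0.0 + 2.0)
total_loss = original_loss + penalty

print(penalty, total_loss)
```

Here the absolute values sum to 4.0, so the penalty is 0.004 and the regularized loss is 0.254. Note that the sign of a weight does not matter, and a weight of exactly zero contributes nothing.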

To apply L1 regularization to the baseline model, we set the `kernel_regularizer` argument of each hidden layer to `l1(0.001)`. This sets the regularization parameter $\lambda$ to $0.001$.

In [None]:
from tensorflow.keras.regularizers import l1

model_l1 = keras.Sequential([
    Input(shape = (784,)),                          
    Dense(512, activation = 'relu', kernel_regularizer = l1(0.001)),         
    Dense(128, activation = 'relu', kernel_regularizer = l1(0.001)),
    Dense(10, activation = 'softmax')
])

Next, we compile the regularized model, ...

In [None]:
model_l1.compile(
    optimizer = 'adam',                                  
    loss = 'categorical_crossentropy',                    
    metrics = ['accuracy']                                
)

... train the regularized model, ...

In [None]:
history = model_l1.fit(
    train_images, 
    train_labels,
    epochs = 15,
    batch_size = 128,
    validation_split = 0.1
)

... and plot the training and validation loss metrics.

In [None]:
plt.figure(figsize = (8, 6))
plt.plot(history.history['loss'], label = 'Training Loss', marker = 'o')
plt.plot(history.history['val_loss'], label = 'Validation Loss', marker = 's')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()

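This time, the training and validation loss decrease at a similar rate as training continues, which suggests that L1 regularization is helping the model generalize better. Keep in mind that the loss values Keras reports for this model include the L1 penalty term, so they are not directly comparable with the baseline's loss values; compare the shape of the curves, or the accuracy metric, instead.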

By penalizing the absolute values of the weights, L1 regularization pushes many weights towards zero, effectively simplifying the model and reducing the risk of overfitting to the training data.
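
One way to check this claim is to measure how many weights end up close to zero. The sketch below defines a hypothetical helper, `near_zero_fraction`, and applies it to a small made-up weight matrix rather than to the trained model. A tolerance is used instead of an exact comparison with zero, on the assumption that a gradient-based optimizer such as Adam typically leaves penalized weights very small rather than exactly zero.

```python
import numpy as np

def near_zero_fraction(weights, tol=1e-3):
    """Return the fraction of entries whose absolute value is below tol."""
    weights = np.asarray(weights)
    return float(np.mean(np.abs(weights) < tol))

# Hypothetical weight matrix for illustration; not taken from the trained model.
example_weights = np.array([[0.0, 0.2, -0.0004],
                            [1.1, 0.0001, -0.7]])

print(near_zero_fraction(example_weights))  # 0.5
```

In the notebook, the kernel of the first hidden layer should be available as `model_l1.layers[0].get_weights()[0]`, and the same call on `model` gives the baseline for comparison. The tolerance of `1e-3` is an arbitrary choice and worth varying.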

The similar rate of reduction in the training and validation loss suggests that the model is learning patterns that hold for both the training data and the held-out validation data, rather than over-specializing to the training data. This is what we expect to carry over to unseen data.