In [1]:
import os
import torch
from torch import nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torchvision import transforms

In [2]:
class MLP(nn.Module):
    '''
    Multilayer Perceptron.
    '''
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
          nn.Flatten(),
          nn.Linear(28 * 28 * 1, 64),
          nn.ReLU(),
          nn.Linear(64, 32),
          nn.ReLU(),
          nn.Linear(32, 10)
        )

    def forward(self, x):
        '''Forward pass'''
        return self.layers(x)
  
    def compute_l1_loss(self, w):
        return torch.abs(w).sum()

**L1 Regularization**, also called **Lasso Regularization**, involves adding the sum of the absolute values of all weights to the loss value.

Suppose that you are using cross-entropy loss with your PyTorch-based classifier and want to implement L1 Regularization. This effectively means adding $\sum_{i=1}^{n} | w_i |$ to the loss.

Here, $n$ is the number of individual weights, and the sum iterates over all of them. We take the absolute value of each weight $w_i$ and add the results together.
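As a small worked example, take a hypothetical model with just four weights (the values are made up purely for illustration). The L1 term is the sum of their absolute values:

```python
# Hypothetical weights, chosen only to illustrate the formula
weights = [0.5, -1.5, 2.0, -0.25]

# L1 term: sum of |w_i| for i = 1..n
l1_term = sum(abs(w) for w in weights)
print(l1_term)  # 4.25
```

The signs of the weights do not matter; only their magnitudes contribute to the penalty.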

In other words, L1 Regularization loss can be implemented as follows:

$\text{full\_loss} = \text{original\_loss} + \sum_{i=1}^{n} | w_i |$

In the example below, original_loss is the cross-entropy loss computed by nn.CrossEntropyLoss. However, it can be almost any loss function you like!
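To show that the penalty is independent of the loss it is added to, here is a plain-Python sketch that adds the L1 term to a mean squared error instead of cross-entropy. The helper names, the data, and the `l1_weight` factor are assumptions for illustration. The factor mirrors the l1_weight variable in the training loop, and with `l1_weight=1.0` the sketch reduces exactly to the formula above.

```python
def mean_squared_error(predictions, targets):
    """Original loss (an arbitrary choice here): average squared difference."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def l1_penalty(weights):
    """L1 term: sum of the absolute weight values."""
    return sum(abs(w) for w in weights)

def full_loss(original_loss, weights, l1_weight=1.0):
    """full_loss = original_loss + l1_weight * sum(|w_i|)."""
    return original_loss + l1_weight * l1_penalty(weights)

# Hypothetical predictions, targets and weights
predictions = [1.0, 2.0, 3.0]
targets = [1.5, 2.0, 2.0]
weights = [0.5, -1.5, 2.0, -0.25]

mse = mean_squared_error(predictions, targets)  # (0.25 + 0.0 + 1.0) / 3
total = full_loss(mse, weights)
print(round(total, 4))  # 4.6667
```

Swapping `mean_squared_error` for any other loss leaves `l1_penalty` and `full_loss` unchanged.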

Implementing L1 Regularization with PyTorch can be done in the following way.

- We define a class MLP that extends PyTorch's nn.Module class. In other words, it is a neural network built with PyTorch.
- To the class, we add a method called compute_l1_loss. It takes a tensor w, computes the absolute value of each of its elements, and sums the results.
- In the training loop that follows, we set an L1 weight, flatten and concatenate all trainable parameters (biases included), compute the L1 loss over them, scale it by the L1 weight, and add the result to the loss before backpropagation.
- We also print the L1 component of our loss when printing statistics.
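The notebook computes the L1 term by flattening every parameter with `view(-1)` and concatenating the results with `torch.cat`. As a minimal sketch on a small stand-in model (a hypothetical `nn.Sequential`, not the MLP above), the same value can be obtained by summing per parameter, which avoids building one large concatenated tensor. The last variant, which penalizes only weight matrices and skips biases, is an assumption shown for comparison and is not what the notebook does.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in model; any nn.Module behaves the same way
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# Notebook-style approach: flatten every parameter, concatenate, then sum |w|
l1_concat = torch.cat([p.view(-1) for p in model.parameters()]).abs().sum()

# Equivalent per-parameter sum, without the concatenated tensor
l1_per_param = sum(p.abs().sum() for p in model.parameters())

# Assumed variant: penalize only weight matrices ("0.weight", "2.weight"), not biases
l1_weights_only = sum(
    p.abs().sum() for name, p in model.named_parameters() if name.endswith("weight")
)

print(torch.isclose(l1_concat, l1_per_param).item())  # True
```

Both the concatenated and the per-parameter versions stay in the autograd graph, so either can be added to the loss before calling `backward()`.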

In [6]:
if __name__ == '__main__':
  
    # Set fixed random number seed
    torch.manual_seed(42)
  
    # Prepare MNIST dataset
    dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor())
    trainloader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=6)
  
    # Initialize the MLP
    mlp = MLP()
  
    # Define the loss function and optimizer
    loss_function = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(mlp.parameters(), lr=1e-4)
  
    # Run the training loop
    for epoch in range(0, 5): # 5 epochs at maximum
    
        # Print epoch
        print(f'Starting epoch {epoch+1}')
    
        # Iterate over the DataLoader for training data
        for i, data in enumerate(trainloader, 0):
      
            # Get inputs
            inputs, targets = data
      
            # Zero the gradients
            optimizer.zero_grad()
      
            # Perform forward pass
            outputs = mlp(inputs)
      
            # Compute loss
            loss = loss_function(outputs, targets)
      
            # Compute L1 loss component
            l1_weight = 1.0
            l1_parameters = []
            for parameter in mlp.parameters():
                l1_parameters.append(parameter.view(-1))
            l1 = l1_weight * mlp.compute_l1_loss(torch.cat(l1_parameters))
      
            # Add L1 loss component
            loss = loss + l1
      
            # Perform backward pass
            loss.backward()
      
            # Perform optimization
            optimizer.step()
      
            # Print statistics
            minibatch_loss = loss.item()
            if i % 500 == 499:
                print('Loss after mini-batch %5d: %.5f (of which %.5f L1 loss)' % (i + 1, minibatch_loss, l1.item()))

    # Process is complete.
    print('Training process has finished.')

Starting epoch 1
Loss after mini-batch   500: 65.43456 (of which 63.13525 L1 loss)
Starting epoch 2
Loss after mini-batch   500: 4.08177 (of which 1.77761 L1 loss)
Starting epoch 3
Loss after mini-batch   500: 2.94135 (of which 0.63876 L1 loss)
Starting epoch 4
Loss after mini-batch   500: 2.94091 (of which 0.63833 L1 loss)
Starting epoch 5
Loss after mini-batch   500: 2.94062 (of which 0.63803 L1 loss)
Training process has finished.
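The printed numbers allow a quick sanity check. Subtracting the L1 component from the total gives the cross-entropy part of the loss. That value can be compared with ln(10), the cross-entropy of a model that assigns probability 1/10 to each of the 10 MNIST classes:

```python
import math

# Values copied from the last printed line of the training run above
total_loss = 2.94062
l1_component = 0.63803

cross_entropy_part = total_loss - l1_component
uniform_guess_loss = math.log(10)  # cross-entropy when every class gets probability 1/10

print(round(cross_entropy_part, 5))  # 2.30259
print(round(uniform_guess_loss, 5))  # 2.30259
```

The two values agree to five decimals. This suggests that with l1_weight = 1.0 the penalty dominates training and the network ends up predicting roughly uniform class probabilities, that is, chance-level performance. A much smaller L1 weight would probably be needed in practice. The right value is something to tune, not something these numbers determine.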
