# Regularization Examples in Deep Learning

This notebook demonstrates **L2 (weight decay)**, **L1**, **Elastic Net**, **Dropout**, and **Batch Normalization** using **TensorFlow/Keras** and **PyTorch**.

---


## TensorFlow / Keras Examples

In [1]:

import tensorflow as tf
print("TensorFlow version:", tf.__version__)


TensorFlow version: 2.19.0


### 1. Decoupled Weight Decay (L2-style) using AdamW (recommended)

In [2]:

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10)
])

optimizer = tf.keras.optimizers.AdamW(
    learning_rate=1e-3,
    weight_decay=1e-4
)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"]
)

model.summary()
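
Weight decay and an L2 penalty coincide for plain SGD (up to a rescaling of the coefficient), but not for Adam-style optimizers. As a sketch in standard notation (the symbols $w_t$, $\eta$, $\lambda$, $\hat m_t$, $\hat v_t$ and $\epsilon$ are introduced here and do not appear in the notebook), an L2 penalty $\tfrac{\lambda}{2}\lVert w \rVert^2$ enters through the gradient, so Adam's adaptive denominator rescales it:

$$
g_t = \nabla L(w_t) + \lambda w_t, \qquad w_{t+1} = w_t - \eta \, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
$$

where $\hat m_t$ and $\hat v_t$ are the bias-corrected moment estimates of $g_t$. AdamW keeps the penalty out of the gradient and shrinks the weights directly:

$$
g_t = \nabla L(w_t), \qquad w_{t+1} = w_t - \eta \left( \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} + \lambda w_t \right)
$$

Under this reading, `weight_decay=1e-4` in AdamW is not numerically interchangeable with an `L2(1e-4)` kernel regularizer trained with Adam.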


### 2. L1 and Elastic Net Regularization

In [3]:

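# Pure L1 penalty: pass this as kernel_regularizer instead of `elastic` for L1 only.
l1 = tf.keras.regularizers.L1(1e-5)
# Elastic Net: L1 and L2 penalties combined.
elastic = tf.keras.regularizers.L1L2(l1=1e-5, l2=1e-4)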

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=elastic),
    tf.keras.layers.Dense(10)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
)

model.summary()
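
To see what the regularizer contributes, the following minimal sketch (assuming TensorFlow is installed; `demo`, `l1_coef`, `l2_coef`, `keras_penalty` and `manual_penalty` are names introduced here for illustration) compares the penalty Keras collects in `model.losses` with the Elastic Net formula `l1 * sum(|w|) + l2 * sum(w**2)` computed by hand:

```python
import numpy as np
import tensorflow as tf

l1_coef, l2_coef = 1e-5, 1e-4  # same coefficients as the cell above

demo = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(
        256,
        activation="relu",
        kernel_regularizer=tf.keras.regularizers.L1L2(l1=l1_coef, l2=l2_coef),
    ),
    tf.keras.layers.Dense(10),
])

# Keras gathers one penalty tensor per regularized variable in `model.losses`.
keras_penalty = float(demo.losses[0])

# The same quantity computed by hand from the first Dense kernel.
kernel = demo.layers[0].get_weights()[0].astype("float64")
manual_penalty = l1_coef * np.abs(kernel).sum() + l2_coef * np.square(kernel).sum()

print("Keras penalty :", keras_penalty)
print("Manual penalty:", manual_penalty)
```

During `fit`, Keras adds the entries of `model.losses` to the data loss, so the penalty needs no manual bookkeeping.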


### 3. Dropout + Batch Normalization

In [4]:

l2 = tf.keras.regularizers.L2(1e-4)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, use_bias=False, kernel_regularizer=l2),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10)
])

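model.compile(
    # Plain Adam: the first Dense kernel already carries an explicit L2 penalty,
    # so AdamW's weight decay would regularize it a second time.
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
)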

model.summary()
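
Dropout and Batch Normalization behave differently in training and inference. In Keras this is controlled by the `training` argument of the forward call, which `fit` and `predict` set automatically. The sketch below (assuming TensorFlow is installed; `demo` and the result names are introduced here for illustration) calls the same architecture in both modes on a fixed random batch:

```python
import numpy as np
import tensorflow as tf

demo = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10),
])

# Fixed dummy batch so that any difference comes from the layers, not the data.
x = np.random.default_rng(0).normal(size=(32, 784)).astype("float32")

# Inference mode: Dropout is a no-op and BatchNorm uses its moving statistics.
infer_a = demo(x, training=False).numpy()
infer_b = demo(x, training=False).numpy()

# Training mode: Dropout samples a fresh mask on every call.
train_a = demo(x, training=True).numpy()
train_b = demo(x, training=True).numpy()

inference_is_repeatable = np.allclose(infer_a, infer_b)
training_is_stochastic = not np.allclose(train_a, train_b)
print(inference_is_repeatable, training_is_stochastic)
```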


---
## PyTorch Examples

In [5]:

import torch
import torch.nn as nn
import torch.optim as optim

print("PyTorch version:", torch.__version__)


PyTorch version: 2.9.0+cpu


### 4. Decoupled Weight Decay (L2-style) using AdamW

In [6]:

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
)

optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

print(model)


Sequential(
  (0): Linear(in_features=784, out_features=256, bias=True)
  (1): ReLU()
  (2): Linear(in_features=256, out_features=10, bias=True)
)


### 5. L1 and Elastic Net (manual penalty added to loss)

In [7]:

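def l1_penalty(model):
    # Sum of absolute values over all parameters (weights and biases).
    return sum(p.abs().sum() for p in model.parameters())

def l2_penalty(model):
    # Sum of squared values over all parameters (weights and biases).
    return sum((p ** 2).sum() for p in model.parameters())

lambda1 = 1e-5
lambda2 = 1e-4

# Plain Adam: the L2 term is already part of the loss, so reusing the AdamW
# optimizer from the previous cell would apply weight decay a second time.
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

optimizer.zero_grad()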
logits = model(x)
data_loss = criterion(logits, y)

loss = data_loss + lambda1 * l1_penalty(model) + lambda2 * l2_penalty(model)
loss.backward()
optimizer.step()

loss.item()


2.3503854274749756

### 6. Dropout + BatchNorm in PyTorch

In [8]:

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256, bias=False),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

model


MLP(
  (net): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=False)
    (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)
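
Passing `model.parameters()` to AdamW decays every parameter, including the BatchNorm scale and shift and the output bias. A common convention, which this notebook does not prescribe, is to decay only weight matrices. The sketch below uses parameter groups for that; the `ndim >= 2` rule and the names `net`, `decay` and `no_decay` are assumptions made for illustration:

```python
import torch.nn as nn
import torch.optim as optim

# Same architecture as the MLP above, rebuilt so the sketch is self-contained.
net = nn.Sequential(
    nn.Linear(784, 256, bias=False),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(256, 10),
)

decay, no_decay = [], []
for name, param in net.named_parameters():
    # Heuristic: tensors with two or more dimensions are weight matrices;
    # one-dimensional tensors are biases or BatchNorm scale/shift.
    if param.ndim >= 2:
        decay.append(param)
    else:
        no_decay.append(param)

optimizer = optim.AdamW(
    [
        {"params": decay, "weight_decay": 1e-4},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
)

print(len(decay), "decayed tensors;", len(no_decay), "excluded tensors")
```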


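**Note:**
- `model.train()` enables Dropout and makes BatchNorm normalize with the current batch's statistics while updating its running mean and variance.
- `model.eval()` disables Dropout and makes BatchNorm normalize with the stored running statistics instead.

Call `model.eval()` before inference and `model.train()` before resuming training. Otherwise predictions depend on random Dropout masks and on the composition of each batch.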
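
As a quick check of the two modes, the sketch below (names such as `net`, `train_a` and `eval_a` are introduced here for illustration) runs the same batch through the network twice in each mode. In training mode the outputs differ because Dropout samples a new mask per call; in evaluation mode they are identical:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(
    nn.Linear(784, 256, bias=False),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Linear(256, 10),
)
x = torch.randn(32, 784)  # dummy batch

net.train()
train_a = net(x)  # each call draws a fresh Dropout mask
train_b = net(x)  # and updates the BatchNorm running statistics

net.eval()
with torch.no_grad():
    eval_a = net(x)  # Dropout is inactive, BatchNorm uses running statistics
    eval_b = net(x)

train_outputs_match = torch.equal(train_a, train_b)
eval_outputs_match = torch.equal(eval_a, eval_b)
print("train mode identical:", train_outputs_match)
print("eval mode identical: ", eval_outputs_match)
```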
