Regularization in Deep Learning

Regularization helps prevent overfitting by constraining the effective complexity of the model. Two common techniques are L1/L2 weight regularization and dropout.


L1 & L2 Regularization (Lasso & Ridge)

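L1 (Lasso): Adds a penalty proportional to the absolute value of each weight. It can shrink some weights to exactly zero, promoting sparsity.

L2 (Ridge): Adds a penalty proportional to the square of each weight. It shrinks weights smoothly toward zero but rarely sets them exactly to zero.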
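
To make the difference between the two penalties concrete, here is a minimal NumPy sketch. It is an illustration only, not how Keras applies regularization internally: the function names, the example weights, and the step size of 0.1 are all assumptions chosen for the demo. The shrink functions are the standard proximal steps for each penalty.

```python
import numpy as np

def l1_penalty(w, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * np.sum(w ** 2)

def l1_shrink(w, step):
    """Soft-thresholding (proximal step for step * |w|).
    Weights with magnitude below `step` become exactly zero."""
    return np.sign(w) * np.maximum(np.abs(w) - step, 0.0)

def l2_shrink(w, step):
    """Proximal step for step * w**2.
    Every weight is scaled toward zero, but none becomes exactly zero."""
    return w / (1.0 + 2.0 * step)

# Hypothetical weights, chosen so that two of them are small
weights = np.array([0.05, -0.3, 1.2, -0.02])

l1_result = l1_shrink(weights, 0.1)  # the two small weights are zeroed out
l2_result = l2_shrink(weights, 0.1)  # all weights are scaled by 1 / 1.2

print("L1:", l1_result)
print("L2:", l2_result)
```

The L1 step removes the two smallest weights entirely, while the L2 step keeps all four and only reduces their magnitude.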

Implementing L2 Regularization in a Neural Network

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Create a neural network with L2 regularization
model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),  # Input layer
    layers.Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.01)),  # L2 regularization
    layers.Dense(10, activation='softmax')  # Output layer
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


Effect: L2 regularization penalizes large weight values, which tends to make the model generalize better.

Dropout Regularization

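Dropout randomly zeroes a fraction of a layer's activations at each training step, forcing the network to learn redundant, more general features. It is only active during training and is disabled at inference time.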
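
The mechanism can be sketched in a few lines of NumPy. This is a simplified illustration of "inverted" dropout, the variant in which surviving activations are rescaled during training. The function name `inverted_dropout` and the array shapes are assumptions for the demo, not part of the Keras API.

```python
import numpy as np

def inverted_dropout(activations, rate, rng, training=True):
    """Zero out roughly `rate` of the activations and rescale the rest.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged,
    so nothing needs to be adjusted at inference time.
    """
    if not training or rate == 0.0:
        return activations  # dropout is a no-op outside of training
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = keep the unit
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1000, 128))  # hypothetical batch of activations

dropped = inverted_dropout(acts, 0.5, rng)
print("fraction zeroed:", np.mean(dropped == 0.0))
print("mean activation:", dropped.mean())
```

With a rate of 0.5, about half of the entries are zeroed and the survivors are doubled, so the mean activation stays close to 1.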

Adding Dropout in Keras

In [None]:
model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),  
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),  # Dropout layer with 50% probability
    layers.Dense(10, activation='softmax')
])


Effect: Dropout prevents over-reliance on specific neurons, improving generalization.

Hyperparameters in Deep Learning

Hyperparameters control the learning process but are not learned by the model. Some key hyperparameters are:

Learning Rate (lr)
Batch Size
Number of Epochs
Number of Layers & Neurons

Tuning the Learning Rate

A learning rate that is too small slows down learning, while one that is too large can cause training to oscillate or diverge.

In [None]:
# Using a custom learning rate
optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


Effect: A well-tuned learning rate improves convergence speed and stability.
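
The trade-off between small and large learning rates is easy to see on a toy problem. The sketch below runs plain gradient descent (not Adam) on f(x) = x^2. The function name `gradient_descent` and the three learning rates are assumptions picked to show slow, good, and unstable behavior.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent, starting at x0."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * x     # derivative of x**2
        x = x - lr * grad  # gradient descent update
    return x

slow = gradient_descent(0.001)    # too small: barely moves from 1.0
good = gradient_descent(0.1)      # reasonable: ends close to the minimum at 0
diverged = gradient_descent(1.1)  # too large: each step overshoots and grows

print("lr=0.001:", slow)
print("lr=0.1:  ", good)
print("lr=1.1:  ", diverged)
```

Each update multiplies x by (1 - 2 * lr), so the iterate shrinks toward 0 only when that factor has magnitude below 1. For this particular function that means lr must be below 1.0; real loss surfaces have no such simple threshold.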



Validation Sets & Cross-Validation

Validation Set

A validation set is held out from training and used to tune hyperparameters and detect overfitting.

In [None]:
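# Load the data (assumption: MNIST, which matches the 28x28 inputs and the 50,000/10,000 split)
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Scale pixels to [0, 1]

# Split the training data into training and validation sets; x_test stays held out
x_train, x_valid = x_train[:50000], x_train[50000:]
y_train, y_valid = y_train[:50000], y_train[50000:]

# Train model with validation set
model.fit(x_train, y_train, epochs=5, validation_data=(x_valid, y_valid))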


K-Fold Cross-Validation

Instead of using a single validation set, K-Fold Cross-Validation splits data into K subsets, training on K-1 and validating on the remaining fold.

In [None]:
from sklearn.model_selection import KFold
import numpy as np

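kf = KFold(n_splits=5, shuffle=True, random_state=42)

for train_index, val_index in kf.split(x_train):
    x_train_fold, x_val_fold = x_train[train_index], x_train[val_index]
    y_train_fold, y_val_fold = y_train[train_index], y_train[val_index]

    # Start each fold from freshly initialized weights so that nothing
    # learned on one fold leaks into the next
    fold_model = keras.models.clone_model(model)
    fold_model.compile(optimizer='adam',
                       loss='sparse_categorical_crossentropy',
                       metrics=['accuracy'])
    fold_model.fit(x_train_fold, y_train_fold, epochs=5,
                   validation_data=(x_val_fold, y_val_fold))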


Effect: Averaging results over the folds gives a more reliable estimate of how well the model generalizes to unseen data.
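
To show what the fold splitting does under the hood, here is a small NumPy sketch that builds the index sets by hand. It is a simplified stand-in for `KFold`, not its actual implementation. The helper name `k_fold_indices` and the toy size of 10 samples are assumptions for illustration.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)  # shuffle once up front
    folds = np.array_split(indices, k)    # k roughly equal parts
    for i in range(k):
        val_idx = folds[i]                # one fold for validation
        train_idx = np.concatenate(       # the other k-1 folds for training
            [folds[j] for j in range(k) if j != i]
        )
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: train={sorted(train_idx.tolist())} val={sorted(val_idx.tolist())}")
```

Every sample appears in exactly one validation fold, so each data point is used for validation once and for training K-1 times.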

Supervised vs. Unsupervised Learning

Supervised Learning

Training data consists of input-output pairs.

Examples: Image classification, sentiment analysis.

In [None]:
# Example: Training a supervised neural network on labeled images
model.fit(x_train, y_train, epochs=5)


Unsupervised Learning

No labels; the model finds patterns in the data on its own.

Examples: Clustering, anomaly detection.

Example: K-Means Clustering on Digits Dataset

In [None]:
from sklearn.cluster import KMeans
x_train_flat = x_train.reshape(x_train.shape[0], -1)  # Flatten images

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)  # 10 clusters, matching the 10 digit classes (cluster IDs are arbitrary)
kmeans.fit(x_train_flat)

# Get cluster assignments
labels = kmeans.labels_
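
For intuition about what `KMeans` is doing, here is a bare-bones NumPy sketch of the classic assign-then-update loop (Lloyd's algorithm). It is a teaching sketch under simplifying assumptions: random initialization, a fixed number of iterations, and none of scikit-learn's refinements. The function name `kmeans_sketch` and the toy points are hypothetical.

```python
import numpy as np

def kmeans_sketch(points, k, n_iters=10, seed=0):
    """Cluster `points` (shape: n_points x n_features) into k groups."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: distance from every point to every center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignments = dists.argmin(axis=1)
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = points[assignments == j]
            if len(members) > 0:  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return centers, assignments

# Two well-separated toy blobs, no labels provided
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, assignments = kmeans_sketch(toy_points, k=2)
print("assignments:", assignments)
```

The algorithm groups the first three points together and the last three together, but which group gets ID 0 or 1 depends on the initialization. The same holds for the digit clusters above: cluster IDs are not digit labels.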


Summary of Key Concepts

| **Concept** | **Purpose** | **Code Example** |
|---|---|---|
| **L1/L2 Regularization** | Prevent overfitting | regularizers.l2(0.01) |
| **Dropout** | Improve generalization | layers.Dropout(0.5) |
| **Learning Rate** | Control step size | optimizers.Adam(learning_rate=0.001) |
| **Validation Set** | Tune hyperparameters | validation_data=(x_valid, y_valid) |
| **Cross-Validation** | Robust performance estimate | KFold(n_splits=5) |
| **Supervised Learning** | Learn from labeled data | model.fit(x_train, y_train) |
| **Unsupervised Learning** | Discover patterns | KMeans(n_clusters=10).fit(x_train_flat) |
