# Overfitting

Overfitting occurs when a model fits the training data too closely, noise included, and therefore generalizes poorly to unseen data. Underfitting occurs when the model is too simple to capture the underlying patterns.

## Signs of Overfitting
1. High Training Accuracy, Low Test Accuracy – If your model performs exceptionally well on training data but poorly on test data, it may be overfitting.

2. Large Gap Between Training and Validation Loss – A significant difference between training and validation loss indicates overfitting.

3. Complex Model with Too Many Parameters – Overly complex models tend to memorize training data rather than generalizing.

4. Performance Decreases on New Data – If the model struggles with unseen data, it may have learned patterns specific to the training set.
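
The first sign in the list above can be reproduced with a toy memorizer. This is a minimal sketch, not part of the original notebook: the 1-nearest-neighbor "model", the injected label noise, and the test points (deliberately placed near the noisy labels) are all illustrative assumptions.

```python
def nearest_neighbor_predict(train_xs, train_ys, x):
    """Return the label of the closest training point (a pure memorizer)."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]


def accuracy(train_xs, train_ys, xs, ys):
    """Fraction of points in (xs, ys) that the memorizer labels correctly."""
    hits = sum(
        nearest_neighbor_predict(train_xs, train_ys, x) == y
        for x, y in zip(xs, ys)
    )
    return hits / len(ys)


# True rule: label is 1 when x >= 10, otherwise 0.
train_xs = list(range(20))
train_ys = [1 if x >= 10 else 0 for x in train_xs]
train_ys[3] = 1   # injected label noise
train_ys[15] = 0  # injected label noise

# Hypothetical test points, labeled by the true rule.
test_xs = [1.2, 2.8, 3.3, 7.1, 12.4, 14.7, 15.2, 18.9]
test_ys = [0, 0, 0, 0, 1, 1, 1, 1]

train_acc = accuracy(train_xs, train_ys, train_xs, train_ys)
test_acc = accuracy(train_xs, train_ys, test_xs, test_ys)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The memorizer reproduces every training label, including the two noisy ones, so its training accuracy is perfect while the test points near the noise are misclassified.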

## Methods to Detect and Mitigate Overfitting
1. Learning Curves – Plot training and validation accuracy/loss over epochs. If validation loss rises while training loss keeps falling, overfitting is likely.

2. Cross-Validation – Techniques such as k-fold cross-validation estimate generalization by validating on several held-out splits.

3. Regularization Techniques – These mitigate overfitting rather than detect it: L1/L2 regularization, dropout, and early stopping all constrain the model.
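
To make the cross-validation item concrete, here is a minimal sketch of how k-fold splits can be generated. The helper name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled, since it slices contiguous folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for fold_number, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={train_idx} val={val_idx}")
```

A model would be trained once per fold on `train_idx` and scored on `val_idx`; averaging the k validation scores gives a less split-dependent estimate than a single hold-out set.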

# Underfitting

## Signs of Underfitting
1. High Error Rates on Both Training and Validation Data – If the model performs poorly on both datasets, it may be underfitting.

2. Low Model Complexity – Using a simple model (e.g., linear regression for non-linear data) can lead to underfitting.

3. Failure to Capture Trends – Visualizing predictions vs. actual data can reveal if the model fails to learn patterns.

4. Bias-Variance Tradeoff – High bias and low variance indicate underfitting, meaning the model is too rigid.
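
The signs above can be seen on a small example: a straight line fitted to quadratic data. This is an illustrative sketch with made-up data points; `fit_line` and `mse` are hypothetical helpers implementing ordinary least squares for one feature.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mse(slope, intercept, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Non-linear target: y = x^2.
train_xs = [-3, -2, -1, 0, 1, 2, 3]
train_ys = [x ** 2 for x in train_xs]
val_xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
val_ys = [x ** 2 for x in val_xs]

slope, intercept = fit_line(train_xs, train_ys)
train_mse = mse(slope, intercept, train_xs, train_ys)
val_mse = mse(slope, intercept, val_xs, val_ys)
print(f"train MSE: {train_mse:.2f}, validation MSE: {val_mse:.2f}")
```

The best possible line here is flat, so the error is high on both the training and the validation points, which is the pattern that separates underfitting from overfitting.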

## How to Address Underfitting
1. Increase Model Complexity – Use deeper neural networks or more features.

2. Train for More Epochs – Allow the model to learn longer before stopping.

3. Reduce Regularization – Excessive dropout or weight decay can prevent learning.

4. Improve Feature Engineering – Ensure relevant features are included.
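
As a sketch of the first and last remedies above, a linear model that cannot fit y = x^2 on the raw input fits it exactly once a squared feature is supplied. The data and the helper `fit_line` are illustrative assumptions, not code from the notebook.

```python
def fit_line(features, ys):
    """Ordinary least squares for y = slope * feature + intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_y = sum(ys) / n
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(features, ys))
    var = sum((f - mean_f) ** 2 for f in features)
    slope = cov / var
    return slope, mean_y - slope * mean_f


def mse(slope, intercept, features, ys):
    """Mean squared error of the fitted model."""
    return sum((slope * f + intercept - y) ** 2 for f, y in zip(features, ys)) / len(ys)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# Raw feature: the model can only draw a straight line through a parabola.
raw_slope, raw_intercept = fit_line(xs, ys)
raw_mse = mse(raw_slope, raw_intercept, xs, ys)

# Engineered feature x^2: the same linear model can now represent the target.
squared = [x ** 2 for x in xs]
sq_slope, sq_intercept = fit_line(squared, ys)
sq_mse = mse(sq_slope, sq_intercept, squared, ys)

print(f"MSE with x: {raw_mse:.2f}, MSE with x^2: {sq_mse:.2f}")
```

The model class did not change; only the input representation did, which is often cheaper than moving to a larger network.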

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Load the MNIST training set and scale pixel values to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_dataset = torchvision.datasets.MNIST(root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

# Define a fully connected network with no regularization
class OverfitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 512)
        self.fc2 = nn.Linear(512, 512)
        self.fc3 = nn.Linear(512, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten each 28x28 image into a vector
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)  # Raw logits; CrossEntropyLoss applies softmax internally
        return x

# Initialize model, loss function, and optimizer
model = OverfitNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model (overfitting likely due to large capacity)
for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
    # Note: this is the loss of the last batch only, so it fluctuates between epochs
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")
```


```
Epoch 1, Loss: 0.2142
Epoch 2, Loss: 0.1167
Epoch 3, Loss: 0.2253
Epoch 4, Loss: 0.1279
Epoch 5, Loss: 0.0180
Epoch 6, Loss: 0.1655
Epoch 7, Loss: 0.1267
Epoch 8, Loss: 0.0335
Epoch 9, Loss: 0.0007
Epoch 10, Loss: 0.0104
```
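
Training loss alone cannot show overfitting; it has to be compared with the loss on held-out data. The following is a minimal sketch of such an evaluation step. The function name `evaluate` is hypothetical, and a small linear model on synthetic tensors stands in for the MNIST network so the block runs without a download.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion):
    """Return (mean loss, accuracy) over a loader without updating weights."""
    model.eval()  # Disable training-only behavior such as dropout
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():  # No gradients are needed for evaluation
        for inputs, labels in loader:
            logits = model(inputs)
            # Weight each batch loss by its size so the mean is per sample
            total_loss += criterion(logits, labels).item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            count += labels.size(0)
    return total_loss / count, correct / count


# Synthetic stand-in data (assumption): 128 samples, 20 features, 3 classes.
torch.manual_seed(0)
inputs = torch.randn(128, 20)
labels = torch.randint(0, 3, (128,))
loader = DataLoader(TensorDataset(inputs, labels), batch_size=32)

stand_in_model = nn.Linear(20, 3)
val_loss, val_acc = evaluate(stand_in_model, loader, nn.CrossEntropyLoss())
print(f"loss: {val_loss:.4f}, accuracy: {val_acc:.4f}")
```

For MNIST, a held-out loader could be built from `torchvision.datasets.MNIST(..., train=False)` and passed to the same function once per epoch; a training loss that keeps falling while this held-out loss rises is the overfitting signature.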


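# Regularization (L2 Weight Decay)

L2 regularization penalizes large weights, which discourages the model from fitting noise. In `optim.Adam`, the `weight_decay` argument adds this L2 penalty to the gradient.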

```
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)  # L2 Regularization
```
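
To show what the penalty does to an update, here is a minimal sketch for plain gradient descent. The function `sgd_step` is a hypothetical helper, not a PyTorch API, and Adam additionally rescales the combined gradient, so this illustrates the idea rather than Adam's exact update.

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One gradient-descent step with an L2 penalty folded into the gradient.

    Minimizing loss + (weight_decay / 2) * sum(w^2) adds weight_decay * w
    to each weight's gradient.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [1.0, -2.0]
zero_grads = [0.0, 0.0]  # With no data gradient, only the penalty acts

for step in range(3):
    weights = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.5)
    print(f"step {step + 1}: {weights}")
```

With a zero data gradient, every step multiplies each weight by `1 - lr * weight_decay`, so the weights shrink toward zero unless the data gradient pushes back.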



# Dropout Regularization

Dropout randomly disables neurons during training to prevent reliance on specific features.

```python
class DropoutNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 512)
        self.dropout = nn.Dropout(0.5)  # Dropout layer: zero each activation with probability 0.5
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # Apply dropout (active only in training mode)
        x = self.fc2(x)
        return x

# Initialize model with dropout
model_dropout = DropoutNet()
optimizer_dropout = optim.Adam(model_dropout.parameters(), lr=0.001)
```
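
Dropout behaves differently in training and evaluation mode, which is easy to overlook. The sketch below applies `nn.Dropout` to a vector of ones; the input size and seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p, and the
# survivors are scaled by 1 / (1 - p) so the expected value is unchanged.
dropout.train()
y_train = dropout(x)

# Evaluation mode: dropout is a no-op and the input passes through as is.
dropout.eval()
y_eval = dropout(x)

print("distinct values in training mode:", sorted(set(y_train.tolist())))
print("unchanged in eval mode:", torch.equal(y_eval, x))
```

This is why `model.eval()` should be called before validation or inference and `model.train()` before resuming training.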


# Early Stopping

Early stopping halts training when the validation loss stops improving.

```python
class EarlyStopping:
    def __init__(self, patience=3):
        self.patience = patience          # Epochs to wait without improvement
        self.best_loss = float("inf")     # Lowest validation loss seen so far
        self.counter = 0                  # Consecutive epochs without improvement

    def check(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                print("Early stopping triggered!")
                return True
        return False
```
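
A short usage sketch shows how such a helper fits into a training loop. A compact copy of the class is repeated so the block runs on its own, and the validation losses are hypothetical values, not measured results.

```python
class EarlyStopping:
    """Compact copy of the helper above so this block is self-contained."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0

    def check(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.patience


# Hypothetical validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.61, 0.50]

stopper = EarlyStopping(patience=3)
stopped_epoch = None
for epoch, val_loss in enumerate(val_losses, start=1):
    # In a real loop: train for one epoch, then compute val_loss here.
    if stopper.check(val_loss):
        stopped_epoch = epoch
        break

print(f"stopped at epoch {stopped_epoch}, best loss {stopper.best_loss}")
```

Here training stops after three epochs without improvement over 0.60, so the later 0.50 is never reached. A larger `patience` tolerates longer plateaus at the cost of extra training time. In practice the weights from the best epoch are usually saved as well, so they can be restored after stopping.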