## In this notebook we tackle the problem of overfitting. The earlier model reached a training accuracy of 95.5% and a test accuracy of 88.11%. We want to reduce this gap.  

**How to reduce overfitting?**

1. Adding more data.
- We have only limited data, so we can't do that.
2. Reducing the complexity of the NN architecture.
- Our NN is already very simple.  
3. Regularization
- Here we add a penalty term (L1 or L2) to the loss function, so the model tries to minimize both the loss and the penalty, which reduces overfitting. We will do that.
  - Usually applied to the model weights, not to the biases.
  - Introduced via the loss function or the optimizer.
  - Penalizes large weights.
  - Active only during training.

4. Dropout
- During training you randomly turn off some neurons. This makes the NN effectively simpler on each pass and introduces randomness, both of which reduce overfitting. We will do that.
  - Applied to the hidden layers.
  - Applied after the ReLU activation function.
  - Randomly turns off a fraction p of the neurons in the hidden layer during each forward pass.
  - This has a regularization effect.
  - During evaluation dropout is not used.
5. Data Augmentation
- Modify your images (for example by shifting or flipping them). This works better with a CNN. Since we are using an ANN, we are not going to use it.
6. Batch Normalization
- Used to stabilize training. It also reduces overfitting. We will do that.  
  - Applied to the hidden layers.
  - Applied after Linear layers and before activation functions.
  - Normalizes activations:
    - Computes the mean and variance of the activations within a mini-batch and uses these statistics to normalize the activations.
  - Includes learnable parameters:
    - Introduces two learnable parameters, gamma (scaling) and beta (shifting), which allow the network to adjust the normalized outputs.
  - Improves training stability.
  - Introduces some regularization, because the statistics are computed over a mini-batch, which adds noise to the training process.
  - During evaluation, BatchNorm uses the running mean and variance accumulated during training, rather than recomputing them from the mini-batch.
7. Early stopping
- If the loss (typically the validation loss) stops improving after some epochs, you stop training early. We will not do it for now.
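
The difference between training and evaluation behaviour described for Dropout and Batch Normalization can be checked directly. The following is a minimal sketch, assuming standalone `nn.Dropout` and `nn.BatchNorm1d` layers and made-up input tensors (none of these names come from the notebook's model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# --- Dropout: random during training, identity during evaluation ---
drop = nn.Dropout(p=0.3)
x = torch.ones(1000)

drop.train()
train_out = drop(x)
# Surviving units are scaled by 1 / (1 - p) so the expected activation is unchanged.
kept_values = train_out[train_out != 0].unique()
dropped_fraction = (train_out == 0).float().mean().item()

drop.eval()
eval_out = drop(x)  # no units are dropped in eval mode

# --- BatchNorm: batch statistics during training, running statistics in eval ---
bn = nn.BatchNorm1d(4)
batch = torch.randn(32, 4) * 5 + 3   # made-up activations with mean ~3, std ~5

bn.train()
train_normalized = bn(batch)         # normalized with this mini-batch's mean/variance
bn.eval()
eval_normalized = bn(batch)          # normalized with the running mean/variance

print("kept value:", kept_values, "dropped fraction:", dropped_fraction)
print("eval output unchanged:", torch.equal(eval_out, x))
print("per-feature mean after training-mode BatchNorm:", train_normalized.mean(dim=0))
print("running mean after one batch:", bn.running_mean)
```

This is why the notebook calls `model.eval()` before measuring accuracy: the same input would otherwise produce noisy, batch-dependent predictions.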

**Why is reducing overfitting crucial even when training accuracy improvements are minimal?**

Reducing overfitting is crucial even when training accuracy improvements are minimal because the primary goal is to ensure the model generalizes well to unseen data, not just perform well on the training set. Overfitting occurs when a model memorizes noise or irrelevant patterns in the training data, leading to poor performance in real-world scenarios. Here’s why addressing overfitting remains essential:

1. Generalization Over Memorization
Overfit models excel at memorizing training data but fail to capture underlying patterns applicable to new data. For example, a model achieving 99% training accuracy but only 45% test accuracy is useless in practice. Reducing overfitting shifts focus from memorization to learning generalizable features.

2. Validation Performance Matters More
A small gap between training and validation accuracy (e.g., 96% vs. 95%) might not seem problematic, but a widening gap signals overfitting. Techniques like early stopping or regularization prevent further divergence, ensuring validation performance doesn’t degrade.

3. Avoiding Noise Sensitivity
Overfit models often latch onto dataset-specific noise (e.g., lighting conditions in images) that doesn’t generalize. Simplifying the model or augmenting the data forces it to focus on robust patterns.

4. Model Efficiency and Scalability
Complex, overfit models waste resources on unnecessary features. Pruning or regularization streamlines models, improving computational efficiency without sacrificing real-world performance.

In [1]:
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

In [2]:
torch.manual_seed(42)

<torch._C.Generator at 0x7ec6dfbfc670>

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Using device: cuda


In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
file_location_train = "/content/drive/MyDrive/PyTorch/Dataset/fashion-mnist_train.csv"
file_location_test = "/content/drive/MyDrive/PyTorch/Dataset/fashion-mnist_test.csv"

In [6]:
train_data = pd.read_csv(file_location_train)
train_data.head()

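|   | label | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | ... | 0 | 0 | 0 | 30 | 43 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |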


In [7]:
test_data = pd.read_csv(file_location_test)
test_data.head()

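|   | label | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9 | 8 | ... | 103 | 87 | 56 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 53 | 99 | ... | 0 | 0 | 0 | 0 | 63 | 53 | 31 | 0 | 0 | 0 |
| 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 137 | 126 | 140 | 0 | 133 | 224 | 222 | 56 | 0 | 0 |
| 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |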


In [8]:
# column 0 holds the label, columns 1..784 hold the pixel values
X_train = train_data.iloc[:, 1:].values
y_train = train_data.iloc[:, 0].values
X_test = test_data.iloc[:, 1:].values
y_test = test_data.iloc[:, 0].values

In [9]:
print(f"Shape of X_train, y_train, X_test, y_test: {X_train.shape, y_train.shape, X_test.shape, y_test.shape}")

Shape of X_train, y_train, X_test, y_test: ((60000, 784), (60000,), (10000, 784), (10000,))


In [10]:
X_train = X_train/255.0
X_test = X_test/255.0

In [11]:
class CustomDataset(Dataset):

  def __init__(self, features, labels):
    self.features = features
    self.labels = labels

  def __len__(self):
    return len(self.features)

  def __getitem__(self, index):
    return self.features[index], self.labels[index]

In [12]:
train_dataset = CustomDataset(X_train, y_train)
test_dataset = CustomDataset(X_test, y_test)

In [13]:
train_loader = DataLoader(train_dataset, batch_size = 32, shuffle = True, pin_memory = True)
test_loader = DataLoader(test_dataset, batch_size = 32, shuffle = False, pin_memory = True)

In [14]:
len(train_loader)

1875
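
The value 1875 follows from the dataset size and the batch size. A small arithmetic check, assuming the DataLoader default `drop_last=False` (which the cell above does not override):

```python
import math

batch_size = 32            # as passed to DataLoader above
num_train_samples = 60000  # rows in X_train, from the printed shape
num_test_samples = 10000   # rows in X_test, from the printed shape

# With drop_last=False a final partial batch is kept, so the number of
# batches is the ceiling of samples / batch_size.
train_batches = math.ceil(num_train_samples / batch_size)
test_batches = math.ceil(num_test_samples / batch_size)

print(train_batches)  # 1875
print(test_batches)   # 313 (the last batch holds only 16 samples)
```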

In [24]:
class MyNN(nn.Module):

  def __init__(self, num_features):
    super().__init__()
    self.model = nn.Sequential(
        nn.Linear(num_features, 128),
        nn.BatchNorm1d(128),
        nn.ReLU(),
        nn.Dropout(p = 0.3),
        nn.Linear(128, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Dropout(p = 0.3),
        nn.Linear(64, 10)
    )

  def forward(self, x):
    return self.model(x.float())
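
A quick sanity check of this architecture is to push a dummy batch through it and count the trainable parameters. The sketch below rebuilds the same layer stack as a plain `nn.Sequential` so that it runs on its own; `net` and `dummy_batch` are illustrative names, not part of the notebook:

```python
import torch
import torch.nn as nn

# Same layer stack as MyNN, rebuilt here so the snippet is self-contained.
net = nn.Sequential(
    nn.Linear(784, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(64, 10),
)

# 8 fake "images" with pixel values in [0, 1). BatchNorm in training mode
# needs more than one sample per batch.
dummy_batch = torch.rand(8, 784)
logits = net(dummy_batch)

# BatchNorm contributes gamma and beta per feature; its running statistics
# are buffers, not parameters, so they are not counted here.
num_params = sum(p.numel() for p in net.parameters() if p.requires_grad)

print(logits.shape)  # torch.Size([8, 10])
print(num_params)    # 109770
```

The final layer outputs raw logits, one per class, which is what `nn.CrossEntropyLoss` expects.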

In [25]:
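learning_rate = 0.1   # the value actually passed to the optimizer below
epochs = 100

In [26]:
model = MyNN(X_train.shape[1])

model.to(device)

criterion = nn.CrossEntropyLoss()
# weight_decay adds the L2 penalty. As written it applies to every parameter
# passed in, including biases and BatchNorm parameters.
optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=1e-4)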
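
Passing `model.parameters()` with a single `weight_decay` value penalizes every parameter, biases included. If the penalty should cover only the weight matrices, as the regularization notes at the top suggest, one option is optimizer parameter groups. This is a sketch of that alternative on a small stand-in network (`net` is hypothetical), not what the notebook actually ran:

```python
import torch.nn as nn
import torch.optim as optim

# Small stand-in network, used only to illustrate the grouping.
net = nn.Sequential(
    nn.Linear(784, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 10),
)

decay, no_decay = [], []
for name, param in net.named_parameters():
    # 1-D parameters here are biases and BatchNorm gamma/beta;
    # 2-D parameters are the Linear weight matrices.
    if param.ndim == 1:
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = optim.SGD(
    [
        {"params": decay, "weight_decay": 1e-4},   # L2 penalty on weights
        {"params": no_decay, "weight_decay": 0.0}, # no penalty on biases / BatchNorm
    ],
    lr=0.1,
)

print(len(decay), len(no_decay))  # 2 4
```

Whether excluding biases changes the results noticeably for a network this small is an open question; the grouping mainly makes the intent explicit.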

In [27]:
# training loop

model.train()  # make sure Dropout and batch-statistics BatchNorm are active

for epoch in range(epochs):

  total_epoch_loss = 0

  for batch_features, batch_labels in train_loader:

    # move data to gpu
    batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

    # forward pass
    outputs = model(batch_features)

    # calculate loss
    loss = criterion(outputs, batch_labels)

    # backward pass
    optimizer.zero_grad()
    loss.backward()

    # update parameters
    optimizer.step()

    total_epoch_loss = total_epoch_loss + loss.item()

  avg_loss = total_epoch_loss/len(train_loader)
  print(f'Epoch: {epoch + 1} , Loss: {avg_loss}')

Epoch: 1 , Loss: 0.6005098235766093
Epoch: 2 , Loss: 0.47685748488108315
Epoch: 3 , Loss: 0.44367153516610464
Epoch: 4 , Loss: 0.42010791396299996
Epoch: 5 , Loss: 0.40710906700293226
Epoch: 6 , Loss: 0.3939294756770134
Epoch: 7 , Loss: 0.3840624964038531
Epoch: 8 , Loss: 0.37844876986344655
Epoch: 9 , Loss: 0.3700479677259922
Epoch: 10 , Loss: 0.3626800413231055
Epoch: 11 , Loss: 0.36116283952792483
Epoch: 12 , Loss: 0.35366894583304725
Epoch: 13 , Loss: 0.3526247697452704
Epoch: 14 , Loss: 0.34696388305624326
Epoch: 15 , Loss: 0.34243846335013706
Epoch: 16 , Loss: 0.3345063408533732
Epoch: 17 , Loss: 0.3317247689286868
Epoch: 18 , Loss: 0.3318375828484694
Epoch: 19 , Loss: 0.32630489813486735
Epoch: 20 , Loss: 0.3254208302994569
Epoch: 21 , Loss: 0.3233134502530098
Epoch: 22 , Loss: 0.32098559683561323
Epoch: 23 , Loss: 0.3222304383834203
Epoch: 24 , Loss: 0.3199299179792404
Epoch: 25 , Loss: 0.3149029949227969
Epoch: 26 , Loss: 0.313706370006005
Epoch: 27 , Loss: 0.31492404814958574
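
The log shows the training loss flattening out, which is the situation early stopping is meant for. The notebook does not use it, and early stopping is normally driven by a validation loss, which is not computed here. The following is therefore only a sketch with a hypothetical `EarlyStopping` helper and made-up validation losses:

```python
class EarlyStopping:
    """Hypothetical helper: signal a stop when the monitored loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, for illustration only.
val_losses = [0.50, 0.42, 0.40, 0.41, 0.40, 0.43, 0.39]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print(stopped_at)    # 6
print(stopper.best)  # 0.4
```

In a real loop, `stopper.step(...)` would be called once per epoch after evaluating on a held-out validation split, and the best model weights would usually be saved alongside.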

In [28]:
model.eval()

MyNN(
  (model): Sequential(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.3, inplace=False)
    (4): Linear(in_features=128, out_features=64, bias=True)
    (5): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): ReLU()
    (7): Dropout(p=0.3, inplace=False)
    (8): Linear(in_features=64, out_features=10, bias=True)
  )
)

In [29]:
# evaluation on test data
total = 0
correct = 0

with torch.no_grad():

  for batch_features, batch_labels in test_loader:

    # move data to gpu
    batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

    outputs = model(batch_features)

    _, predicted = torch.max(outputs, 1)

    total = total + batch_labels.shape[0]

    correct = correct + (predicted == batch_labels).sum().item()

print(correct/total)

0.8926
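
The test-set and training-set evaluation cells share the same logic, so it can be wrapped in a function. The `accuracy` helper below is a sketch of that refactoring; the toy model and data are stand-ins chosen so the result can be verified by hand, and are not the notebook's model:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(model, loader, device):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()                      # disable dropout, use BatchNorm running stats
    correct, total = 0, 0
    with torch.no_grad():             # gradients are not needed for evaluation
        for features, labels in loader:
            features, labels = features.to(device), labels.to(device)
            predicted = model(features).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.shape[0]
    return correct / total


# Toy stand-in: an identity-weight linear layer predicts the index of the
# larger of two inputs.
device = torch.device("cpu")
toy_model = nn.Linear(2, 2)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))
    toy_model.bias.zero_()

toy_x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [0.0, 3.0]])
toy_y = torch.tensor([0, 1, 0, 0])   # the last label is deliberately "wrong"
toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=2)

toy_acc = accuracy(toy_model, toy_loader, device)
print(toy_acc)  # 0.75
```

Note that the helper leaves the model in evaluation mode, so `model.train()` has to be called again before any further training.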


In [30]:
# evaluation on training data
total = 0
correct = 0

with torch.no_grad():

  for batch_features, batch_labels in train_loader:

    # move data to gpu
    batch_features, batch_labels = batch_features.to(device), batch_labels.to(device)

    outputs = model(batch_features)

    _, predicted = torch.max(outputs, 1)

    total = total + batch_labels.shape[0]

    correct = correct + (predicted == batch_labels).sum().item()

print(correct/total)

0.9332166666666667
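
To see whether the goal stated at the top was met, the train/test gap can be compared before and after. The snippet uses only the accuracies reported in this notebook:

```python
# Accuracies reported in this notebook.
before = {"train": 0.955, "test": 0.8811}              # baseline from the introduction
after = {"train": 0.9332166666666667, "test": 0.8926}  # results of the cells above

gap_before = before["train"] - before["test"]
gap_after = after["train"] - after["test"]

print(f"gap before: {gap_before:.4f}")  # gap before: 0.0739
print(f"gap after:  {gap_after:.4f}")   # gap after:  0.0406
```

The gap shrinks from about 7.4 to about 4.1 percentage points: training accuracy drops slightly while test accuracy rises from 88.11% to 89.26%.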
