- Target: Add data augmentation to make the training data harder
- Results: 
    - Parameters : 7348
    - Best Train accuracy : 97.85
    - Best Test accuracy : 99.42
- Analysis: 
    - Augmentation makes the training data harder than the clean test data, so test accuracy stays above train accuracy in every epoch of this run.
    - Test accuracy improved from 99.40 in the previous experiment to 99.42, and stayed at or above 99.4 in the last 2 epochs (99.42 and 99.41).

# Import Libraries

In [68]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
%matplotlib inline
import matplotlib.pyplot as plt
import models
import warnings
warnings.filterwarnings("ignore")

### Calculating the mean and std dev. of the dataset

In [81]:
tensor_transforms = transforms.Compose([transforms.ToTensor()])

exp = datasets.MNIST('./data', train=True, download=True, transform=tensor_transforms)

In [82]:
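exp_train_data = exp.train_data  # deprecated alias of exp.data in recent torchvision
exp_test_data = exp.test_data  # also an alias of exp.data: `exp` was loaded with train=True, so this is the train split again, not the test split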
print(exp_train_data.shape)
print(exp_test_data.shape)


torch.Size([60000, 28, 28])
torch.Size([60000, 28, 28])


In [83]:
exp_train_data = exp.transform(exp_train_data.numpy())
print('[Train]')
print(' - Numpy Shape:', exp.train_data.cpu().numpy().shape)
print(' - Tensor Shape:', exp.train_data.size())
print(' - min:', torch.min(exp_train_data))
print(' - max:', torch.max(exp_train_data))
print(' - mean:', torch.mean(exp_train_data))
print(' - std:', torch.std(exp_train_data))
print(' - var:', torch.var(exp_train_data))

[Train]
 - Numpy Shape: (60000, 28, 28)
 - Tensor Shape: torch.Size([60000, 28, 28])
 - min: tensor(0.)
 - max: tensor(1.)
 - mean: tensor(0.1307)
 - std: tensor(0.3081)
 - var: tensor(0.0949)
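
The statistics above come down to two steps: scale the raw 0-255 pixel values to [0, 1], then take the mean and standard deviation over every pixel. The following is a minimal standard-library sketch of those steps on a made-up handful of pixel values (the list `raw_pixels` is hypothetical, not MNIST data). It uses the sample standard deviation (dividing by n - 1), which is also the default for `torch.std`.

```python
import statistics

# Hypothetical raw pixel intensities (0-255); stand-ins for real MNIST pixels.
raw_pixels = [0, 0, 255, 51, 0, 102, 0, 0]

# ToTensor-style scaling: map 0-255 to 0.0-1.0.
scaled = [p / 255 for p in raw_pixels]

mean = statistics.mean(scaled)
std = statistics.stdev(scaled)      # sample std (divides by n - 1)
var = statistics.variance(scaled)   # sample variance, equal to std ** 2

print(f"mean={mean:.4f} std={std:.4f} var={var:.4f}")

# The same relation holds for the values reported above:
# 0.3081 ** 2 rounds to 0.0949, the printed variance.
print(f"{0.3081 ** 2:.4f}")
```

The last line is a quick consistency check on the printed output: the reported variance is the square of the reported standard deviation.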


## Data Transformations

We start by defining our data transformations. We need to think about what our data looks like and how we can augment it so that it correctly represents images the model might not otherwise see.


In [94]:
# Train transformations
train_transforms = transforms.Compose([
    transforms.RandomRotation((-10.0, 10.0), fill=(1,)), # rotate the image by a random angle between -10 and 10 degrees; exposed corners are filled with pixel value 1 (the transform runs on the 0-255 PIL image, so this is near-black, not white)
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)), # translate the image up to 10% horizontally and vertically without any rotation(0 degrees)
    transforms.RandomPerspective(distortion_scale=0.1, p=0.5), # apply a perspective transformation with distortion up to 10% with probability of 0.5 
    transforms.RandomResizedCrop(size=28, scale=(0.9, 1.0)),  # resizes the cropped area to 28x28, 'scale=(0.9, 1.0)' ensures the crop covers 90-100% of the original area.
    transforms.ToTensor(),  # Convert to tensor first before erasing 
    # transforms.Normalize((0.1307,), (0.3081,)),
    transforms.RandomErasing(p=0.2, scale=(0.02, 0.08), ratio=(0.3, 3.3), value=0.0),  # erase a rectangle covering 2-8% of the image area ('scale') with an aspect ratio between 0.3 and 3.3; the rectangle is filled with black (0). Probability of erasing is 0.2
    transforms.Normalize((0.1307,), (0.3081,)) # Normalize it with mean and std dev of train_data. 
])

# Test transformations
test_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)) # use the train data's mean and std dev
])
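
To make the fractional parameters above more concrete, here is a small worked-arithmetic sketch that converts them to pixel counts for a 28x28 image and shows where black and white pixels land after `Normalize`. It only restates the numbers already used in the transforms; the variable names are illustrative.

```python
# Worked arithmetic for the transform parameters above on a 28x28 MNIST image.
SIDE = 28
AREA = SIDE * SIDE  # 784 pixels

# RandomAffine(translate=(0.1, 0.1)): maximum shift is 10% of each side.
max_shift_px = 0.1 * SIDE

# RandomResizedCrop(scale=(0.9, 1.0)): the crop keeps at least 90% of the area.
min_crop_area_px = 0.9 * AREA

# RandomErasing(scale=(0.02, 0.08)): the erased rectangle covers 2-8% of the area.
erase_area_px = (0.02 * AREA, 0.08 * AREA)

# Normalize((0.1307,), (0.3081,)): x -> (x - mean) / std.
MEAN, STD = 0.1307, 0.3081
black = (0.0 - MEAN) / STD  # background pixels, and pixels erased with value=0.0
white = (1.0 - MEAN) / STD  # brightest stroke pixels

print(f"max shift: {max_shift_px:.1f} px")
print(f"min crop area: {min_crop_area_px:.1f} px")
print(f"erased area: {erase_area_px[0]:.1f} to {erase_area_px[1]:.1f} px")
print(f"normalized black: {black:.4f}, normalized white: {white:.4f}")
```

Because `RandomErasing` runs before `Normalize`, an erased pixel (value 0.0) ends up at the same normalized value as the ordinary black background.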



# Dataset and Creating Train/Test Split

In [95]:
train = datasets.MNIST('./data', train=True, download=True, transform=train_transforms) # downloading the train data and applying train transforms
test = datasets.MNIST('./data', train=False, download=True, transform=test_transforms) # downloading the test data and applying test transforms

# Dataloader Arguments & Test/Train Dataloaders


In [96]:
SEED = 1

# CUDA?
cuda = torch.cuda.is_available()
print("CUDA Available?", cuda)

# For reproducibility
torch.manual_seed(SEED)

if cuda:
    torch.cuda.manual_seed(SEED)

# dataloader arguments - normally these would be read from the command line
dataloader_args = dict(shuffle=True, batch_size=128, num_workers=4, pin_memory=True) if cuda else dict(shuffle=True, batch_size=64)

# train dataloader
train_loader = torch.utils.data.DataLoader(train, **dataloader_args)

# test dataloader
test_loader = torch.utils.data.DataLoader(test, **dataloader_args)

CUDA Available? False
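
Since CUDA is not available in this run, the fallback `batch_size=64` applies. The sketch below works out the batches per epoch under that assumption. The helper `num_batches` is illustrative, and the dataset sizes are the standard MNIST split sizes (60000 train, 10000 test).

```python
import math

def num_batches(dataset_size: int, batch_size: int) -> int:
    """Batches per epoch when the last, smaller batch is kept (drop_last=False)."""
    return math.ceil(dataset_size / batch_size)

train_batches = num_batches(60000, 64)
test_batches = num_batches(10000, 64)
print(train_batches, test_batches)
```

The train figure is the `938/938` shown by the progress bars in the training log, and it is also the value `len(train_loader)` passes to the scheduler as `steps_per_epoch`.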


# The model


In [97]:


# class Net(nn.Module):
#     def __init__(self):
#         super(Net, self).__init__()
#         dropout_value = 0.05
        
#         # Input Block - RF: 3
#         self.convblock1 = nn.Sequential(
#             nn.Conv2d(1, 8, 3, padding=1, bias=False),  # RF: 3
#             nn.BatchNorm2d(8),
#             nn.ReLU(),
#             nn.Dropout(dropout_value)
#         )

#         # Conv Block 1 - RF: 5
#         self.convblock2 = nn.Sequential(
#             nn.Conv2d(8, 8, 3, padding=1, bias=False),  # RF: 5
#             nn.BatchNorm2d(8),
#             nn.ReLU(),
#             nn.Dropout(dropout_value)
#         )

#         # Transition Block 1 - RF: 6
#         self.pool1 = nn.MaxPool2d(2, 2)  # RF: 6
#         self.convblock3 = nn.Sequential(
#             nn.Conv2d(8, 12, 1, bias=False),  # RF: 6
#             nn.BatchNorm2d(12),
#             nn.ReLU(),
#             nn.Dropout(dropout_value)
#         )

#         # Conv Block 2 - RF: 14
#         self.convblock4 = nn.Sequential(
#             nn.Conv2d(12, 12, 3, padding=1, bias=False),  # RF: 10
#             nn.BatchNorm2d(12),
#             nn.ReLU(),
#             nn.Dropout(dropout_value),
#             nn.Conv2d(12, 12, 3, padding=1, bias=False),  # RF: 14
#             nn.BatchNorm2d(12),
#             nn.ReLU(),
#             nn.Dropout(dropout_value)
#         )

#         # Transition Block 2 - RF: 16
#         self.pool2 = nn.MaxPool2d(2, 2)  # RF: 16
#         self.convblock5 = nn.Sequential(
#             nn.Conv2d(12, 12, 1, bias=False),  # RF: 16
#             nn.BatchNorm2d(12),
#             nn.ReLU(),
#             nn.Dropout(dropout_value)
#         )

#         # Conv Block 3 - RF: 40
#         self.convblock6 = nn.Sequential(
#             nn.Conv2d(12, 12, 3, padding=1, bias=False),  # RF: 24
#             nn.BatchNorm2d(12),
#             nn.ReLU(),
#             nn.Dropout(dropout_value),
#             nn.Conv2d(12, 12, 3, padding=1, bias=False),  # RF: 32
#             nn.BatchNorm2d(12),
#             nn.ReLU(),
#             nn.Dropout(dropout_value),
#             nn.Conv2d(12, 10, 3, padding=1, bias=False),  # RF: 40
#             nn.BatchNorm2d(10),
#             nn.ReLU(),
#             nn.Dropout(dropout_value)
#         )

#         self.gap = nn.AdaptiveAvgPool2d(1)

#     def forward(self, x):
#         x = self.convblock1(x)
#         x = self.convblock2(x)
#         x = self.pool1(x)
#         x = self.convblock3(x)
#         x = self.convblock4(x)
#         x = self.pool2(x)
#         x = self.convblock5(x)
#         x = self.convblock6(x)
#         x = self.gap(x)
#         x = x.view(-1, 10)
#         return F.log_softmax(x, dim=-1)
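
The `RF:` annotations in the class above can be checked with the usual receptive-field recurrence: `rf_out = rf_in + (kernel - 1) * jump_in` and `jump_out = jump_in * stride`. The sketch below applies it to the layer list of the commented-out `Net`. The helper `receptive_fields` and the layer labels are illustrative, not part of the notebook.

```python
def receptive_fields(layers):
    """Return the receptive field after each (name, kernel, stride) layer.

    Uses rf_out = rf_in + (kernel - 1) * jump_in and jump_out = jump_in * stride.
    """
    rf, jump = 1, 1
    out = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        out.append((name, rf))
    return out

# (name, kernel size, stride) for the layers of the commented-out Net above.
NET_LAYERS = [
    ("convblock1 3x3", 3, 1),
    ("convblock2 3x3", 3, 1),
    ("pool1 2x2", 2, 2),
    ("convblock3 1x1", 1, 1),
    ("convblock4 3x3 (a)", 3, 1),
    ("convblock4 3x3 (b)", 3, 1),
    ("pool2 2x2", 2, 2),
    ("convblock5 1x1", 1, 1),
    ("convblock6 3x3 (a)", 3, 1),
    ("convblock6 3x3 (b)", 3, 1),
    ("convblock6 3x3 (c)", 3, 1),
]

rf_table = receptive_fields(NET_LAYERS)
for name, rf in rf_table:
    print(f"{name:<20} RF: {rf}")
```

Under this recurrence the jump is 4 after the second pooling layer, so each 3x3 convolution in the last block adds 8 and the three of them land at 24, 32 and 40.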
    

       



# Model Params


In [98]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print(device)
# model = Net().to(device)
model = models.Model_4_augmnetation().to(device)
summary(model, input_size=(1, 28, 28))

cpu
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 28, 28]              72
       BatchNorm2d-2            [-1, 8, 28, 28]              16
              ReLU-3            [-1, 8, 28, 28]               0
           Dropout-4            [-1, 8, 28, 28]               0
            Conv2d-5            [-1, 8, 28, 28]             576
       BatchNorm2d-6            [-1, 8, 28, 28]              16
              ReLU-7            [-1, 8, 28, 28]               0
           Dropout-8            [-1, 8, 28, 28]               0
         MaxPool2d-9            [-1, 8, 14, 14]               0
           Conv2d-10           [-1, 12, 14, 14]              96
      BatchNorm2d-11           [-1, 12, 14, 14]              24
             ReLU-12           [-1, 12, 14, 14]               0
          Dropout-13           [-1, 12, 14, 14]               0
           Conv2d-14           [-1,
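
The per-layer `Param #` values in the summary can be reproduced by hand: a `Conv2d` with `bias=False` has `in_channels * out_channels * kernel * kernel` weights, and a `BatchNorm2d` has two learnable values per channel. The sketch below sums these for the layer configuration of the commented-out `Net`, which is assumed here to match `models.Model_4_augmnetation`. The helper names are illustrative.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights of a Conv2d layer created with bias=False."""
    return in_ch * out_ch * kernel * kernel

def bn_params(channels):
    """Learnable scale and shift of a BatchNorm2d layer."""
    return 2 * channels

# (in_channels, out_channels, kernel) for each conv in the commented-out Net,
# assumed to match models.Model_4_augmnetation; each conv is followed by BatchNorm2d.
CONVS = [
    (1, 8, 3), (8, 8, 3),                    # convblock1, convblock2
    (8, 12, 1),                              # convblock3
    (12, 12, 3), (12, 12, 3),                # convblock4
    (12, 12, 1),                             # convblock5
    (12, 12, 3), (12, 12, 3), (12, 10, 3),   # convblock6
]

total = sum(conv_params(i, o, k) + bn_params(o) for i, o, k in CONVS)
print(total)
```

The total agrees with the 7348 parameters reported in the results at the top of the notebook.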




# Training and Testing

Looking at logs can be boring, so we'll use a **tqdm** progress bar to get more readable logs.

Let's write the train and test functions.

In [99]:
# model = Net().to(device)
model = models.Model_4_augmnetation().to(device)
optimizer = optim.AdamW(model.parameters(), lr=0.01, weight_decay=1e-4) # AdamW optimizer with a learning rate of 0.01 (the OneCycleLR scheduler below takes over the per-step learning rate)
# scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(
#     optimizer,
#     T_0=4,  # Initial restart interval
#     T_mult=1,  # Multiplier for restart interval
#     eta_min=1e-6  # Minimum learning rate
# )

# Using the OneCycleLR scheduler for dynamic learning rate adjustment.
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.01,  # Peak learning rate reached during the cycle.
    epochs=15,  # Total number of training epochs.
    steps_per_epoch=len(train_loader),  # Number of optimizer steps per epoch (batches in the train loader).
    pct_start=0.3,  # Fraction of the cycle spent increasing the learning rate.
    div_factor=10,  # Initial learning rate = max_lr / div_factor.
    final_div_factor=100,  # Final learning rate = initial learning rate / final_div_factor.
)
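
To see what these settings imply, the sketch below traces an approximate two-phase cosine one-cycle schedule in plain Python. It is a hand-written approximation under stated assumptions, not PyTorch's implementation: it assumes cosine annealing in both phases, one scheduler step per batch, and 938 batches per epoch (60000 samples at batch size 64). The function `one_cycle_lr` is hypothetical.

```python
import math

def one_cycle_lr(step, total_steps, max_lr=0.01, pct_start=0.3,
                 div_factor=10, final_div_factor=100):
    """Approximate two-phase cosine one-cycle schedule (a sketch, not PyTorch's code)."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the increasing phase
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Moves from `start` (pct=0) to `end` (pct=1) along half a cosine wave.
        return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (last_step - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)

STEPS_PER_EPOCH = 938  # assumed len(train_loader): 60000 / 64, rounded up
TOTAL_STEPS = 15 * STEPS_PER_EPOCH

schedule = [one_cycle_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS)]
peak_step = schedule.index(max(schedule))
print(f"start={schedule[0]:.5f} peak={max(schedule):.5f} "
      f"at step {peak_step} end={schedule[-1]:.7f}")
```

With these values the learning rate starts at 0.001, peaks at 0.01 about 30% of the way through (during epoch 4), and decays to 1e-5 by the final step.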



In [100]:
from tqdm import tqdm

train_losses = []
test_losses = []
train_acc = []
test_acc = []

def train(model, device, train_loader, optimizer, epoch):
  model.train()
  pbar = tqdm(train_loader)
  correct = 0
  processed = 0
  for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)

    # Init
    optimizer.zero_grad()
    # PyTorch accumulates gradients across backward passes, so they must be zeroed
    # before each backpropagation step for the parameter update to be correct.

    # Predict
    y_pred = model(data)

    # Calculate loss
    loss = F.nll_loss(y_pred, target)
    train_losses.append(loss.item())  # store a plain float rather than the loss tensor

    # Backpropagation
    loss.backward()
    optimizer.step()
    scheduler.step()

    # Update pbar-tqdm

    pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
    correct += pred.eq(target.view_as(pred)).sum().item()
    processed += len(data)

    pbar.set_description(desc= f'Loss={loss.item()} Batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}')
    train_acc.append(100*correct/processed)

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_losses.append(test_loss)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

    test_acc.append(100. * correct / len(test_loader.dataset))
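
Both functions rely on the model returning log-probabilities (`log_softmax`), which `F.nll_loss` turns into a loss by negating the log-probability of the target class. The following pure-Python sketch shows that computation and the argmax-based accuracy count on two made-up samples. The helpers `log_softmax` and `nll_loss` are simplified stand-ins for the PyTorch functions, and the logits are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of class scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def nll_loss(log_probs_batch, targets, reduction="mean"):
    """Negative log-likelihood of the target class, averaged or summed over the batch."""
    losses = [-lp[t] for lp, t in zip(log_probs_batch, targets)]
    return sum(losses) / len(losses) if reduction == "mean" else sum(losses)

# Two made-up samples with three classes each (hypothetical logits, not model output).
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 1]

log_probs = [log_softmax(row) for row in batch_logits]
preds = [max(range(len(lp)), key=lp.__getitem__) for lp in log_probs]  # argmax per sample
correct = sum(int(p == t) for p, t in zip(preds, targets))

mean_loss = nll_loss(log_probs, targets)
summed_loss = nll_loss(log_probs, targets, reduction="sum")
print("mean loss:", mean_loss)
print("summed loss:", summed_loss)
print("accuracy:", 100 * correct / len(targets))
```

The training loop uses the mean reduction per batch, while the test loop sums per-sample losses and divides once by the dataset size, so the reported test loss is a true per-sample average even when the last batch is smaller.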

In [101]:
EPOCHS = 15
for epoch in range(EPOCHS):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    
    test(model, device, test_loader)
    # scheduler.step()  # Uncomment when using a per-epoch scheduler such as CosineAnnealingWarmRestarts; OneCycleLR is already stepped per batch inside train().
    

EPOCH: 0


Loss=0.3920605480670929 Batch_id=937 Accuracy=80.65: 100%|██████████| 938/938 [01:12<00:00, 12.86it/s] 



Test set: Average loss: 0.1854, Accuracy: 9667/10000 (96.67%)

EPOCH: 1


Loss=0.10264067351818085 Batch_id=937 Accuracy=93.45: 100%|██████████| 938/938 [01:06<00:00, 14.09it/s]



Test set: Average loss: 0.1372, Accuracy: 9684/10000 (96.84%)

EPOCH: 2


Loss=0.22327573597431183 Batch_id=937 Accuracy=94.28: 100%|██████████| 938/938 [01:04<00:00, 14.46it/s] 



Test set: Average loss: 0.0608, Accuracy: 9829/10000 (98.29%)

EPOCH: 3


Loss=0.2098611444234848 Batch_id=937 Accuracy=95.04: 100%|██████████| 938/938 [01:04<00:00, 14.65it/s]  



Test set: Average loss: 0.0559, Accuracy: 9829/10000 (98.29%)

EPOCH: 4


Loss=0.0844479352235794 Batch_id=937 Accuracy=95.54: 100%|██████████| 938/938 [01:06<00:00, 14.20it/s]  



Test set: Average loss: 0.0523, Accuracy: 9846/10000 (98.46%)

EPOCH: 5


Loss=0.062942735850811 Batch_id=937 Accuracy=96.09: 100%|██████████| 938/938 [01:05<00:00, 14.24it/s]   



Test set: Average loss: 0.0473, Accuracy: 9854/10000 (98.54%)

EPOCH: 6


Loss=0.05413801968097687 Batch_id=937 Accuracy=96.31: 100%|██████████| 938/938 [01:05<00:00, 14.41it/s] 



Test set: Average loss: 0.0363, Accuracy: 9891/10000 (98.91%)

EPOCH: 7


Loss=0.051060888916254044 Batch_id=937 Accuracy=96.54: 100%|██████████| 938/938 [01:04<00:00, 14.61it/s]



Test set: Average loss: 0.0291, Accuracy: 9911/10000 (99.11%)

EPOCH: 8


Loss=0.3372788727283478 Batch_id=937 Accuracy=96.79: 100%|██████████| 938/938 [01:06<00:00, 14.04it/s]  



Test set: Average loss: 0.0327, Accuracy: 9895/10000 (98.95%)

EPOCH: 9


Loss=0.008700123988091946 Batch_id=937 Accuracy=97.10: 100%|██████████| 938/938 [01:04<00:00, 14.50it/s]



Test set: Average loss: 0.0236, Accuracy: 9923/10000 (99.23%)

EPOCH: 10


Loss=0.06875372678041458 Batch_id=937 Accuracy=97.35: 100%|██████████| 938/938 [01:05<00:00, 14.32it/s]  



Test set: Average loss: 0.0261, Accuracy: 9921/10000 (99.21%)

EPOCH: 11


Loss=0.055069927126169205 Batch_id=937 Accuracy=97.56: 100%|██████████| 938/938 [01:02<00:00, 15.01it/s] 



Test set: Average loss: 0.0229, Accuracy: 9928/10000 (99.28%)

EPOCH: 12


Loss=0.03588985651731491 Batch_id=937 Accuracy=97.61: 100%|██████████| 938/938 [01:05<00:00, 14.23it/s]  



Test set: Average loss: 0.0204, Accuracy: 9934/10000 (99.34%)

EPOCH: 13


Loss=0.031018974259495735 Batch_id=937 Accuracy=97.78: 100%|██████████| 938/938 [01:05<00:00, 14.30it/s] 



Test set: Average loss: 0.0199, Accuracy: 9942/10000 (99.42%)

EPOCH: 14


Loss=0.01290989015251398 Batch_id=937 Accuracy=97.85: 100%|██████████| 938/938 [01:06<00:00, 14.19it/s]  



Test set: Average loss: 0.0190, Accuracy: 9941/10000 (99.41%)
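
The log above can be summarised with a few lines of Python. The lists below are copied by hand from the progress bars (the running train accuracy at the end of each epoch) and from the `Test set` lines. The variable names are illustrative and deliberately differ from the notebook's `train_acc` and `test_acc` lists.

```python
# Per-epoch accuracies (%) copied from the training log above.
train_acc_log = [80.65, 93.45, 94.28, 95.04, 95.54, 96.09, 96.31, 96.54,
                 96.79, 97.10, 97.35, 97.56, 97.61, 97.78, 97.85]
test_acc_log = [96.67, 96.84, 98.29, 98.29, 98.46, 98.54, 98.91, 99.11,
                98.95, 99.23, 99.21, 99.28, 99.34, 99.42, 99.41]

best_epoch = max(range(len(test_acc_log)), key=test_acc_log.__getitem__)
gaps = [round(te - tr, 2) for tr, te in zip(train_acc_log, test_acc_log)]
epochs_at_target = [e for e, acc in enumerate(test_acc_log) if acc >= 99.4]

print("best test accuracy:", test_acc_log[best_epoch], "at epoch", best_epoch)
print("test above train in every epoch:", all(g > 0 for g in gaps))
print("epochs with test accuracy >= 99.4:", epochs_at_target)
```

This matches the analysis at the top of the notebook: test accuracy is above train accuracy throughout, the best value is 99.42, and the 99.4 level is held in the final two epochs.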



In [80]:
# changed wdecay to e-5
# changed erase prob to 0.2