# Import Libraries

In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

## Data Transformations

The ToTensor transformation converts images into tensors, the format PyTorch works with, and scales pixel values to the range [0, 1].

The Normalize transformation standardizes the input data, which can lead to faster convergence during training. The mean and standard deviation should be computed from the dataset in use; the values 0.1307 and 0.3081 used below are for MNIST.
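
To make the effect of Normalize concrete, here is a minimal plain-Python sketch of the per-pixel arithmetic. The helper names (normalize, mean_std) and the toy pixel list are assumptions made for illustration; in the notebook the same arithmetic is applied to whole tensors by torchvision.

```python
MEAN, STD = 0.1307, 0.3081  # the values passed to Normalize in this notebook


def normalize(pixel, mean=MEAN, std=STD):
    """Standardize one pixel value that ToTensor has already scaled to [0, 1]."""
    return (pixel - mean) / std


def mean_std(pixels):
    """Population mean and standard deviation of a flat list of pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return mean, variance ** 0.5


# A black pixel maps to a small negative value, a white pixel to a larger positive one.
print(round(normalize(0.0), 4), round(normalize(1.0), 4))

# Toy "dataset": statistics are computed from the data, then used to standardize it.
toy_pixels = [0.0, 0.0, 0.5, 1.0]
toy_mean, toy_std = mean_std(toy_pixels)
standardized = [normalize(p, toy_mean, toy_std) for p in toy_pixels]
print(mean_std(standardized))  # close to (0.0, 1.0)
```

For a real dataset the same statistics would be computed over every pixel of the training set, not over a four-element list.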


In [2]:
# Train Phase transformations
train_transforms = transforms.Compose([
                                      #  transforms.Resize((28, 28)),
                                      #  transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
                                       transforms.RandomRotation((-7.0, 7.0), fill=(1,)),
                                       transforms.ToTensor(),
                                       # Normalize expects mean and std as sequences (e.g., tuples), hence the trailing commas:
                                       # (0.1307,) is a one-element tuple, while (0.1307) is just a float in parentheses.
                                       transforms.Normalize((0.1307,), (0.3081,))
                                       ])

# Test Phase transformations
test_transforms = transforms.Compose([
                                      #  transforms.Resize((28, 28)),
                                      #  transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.1307,), (0.3081,))
                                       ])


# Dataset and Creating Train/Test Split

In [3]:
train = datasets.MNIST('./data', train=True, download=True, transform=train_transforms)
test = datasets.MNIST('./data', train=False, download=True, transform=test_transforms)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 108614593.28it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 97926995.82it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 27420075.32it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 17461529.58it/s]


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw



# Dataloader Arguments & Test/Train Dataloaders

The dataloader_args dictionary configures both DataLoaders. When CUDA is available the batch size is 128 (otherwise 64); a larger batch can help keep the GPU busy.

shuffle=True reshuffles the data at the start of every epoch, which can help generalization during training. Because the same dictionary is passed to the test loader, the test set is shuffled as well, which is harmless but not needed for evaluation.

num_workers=4 (set only in the CUDA branch) loads data in four parallel worker processes, which can speed up data loading.

pin_memory=True allows for faster data transfer to the GPU by pinning memory on the host (CPU) side.

The DataLoader is responsible for loading the data in batches and applying any specified transformations. It provides an iterable over the dataset for easy integration with training and testing loops.
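
As a quick sanity check on the batch size, the number of batches per epoch follows directly from the dataset size. The sketch below is plain Python; num_batches is a hypothetical helper, and the 60,000 / 10,000 split is the standard MNIST train/test size.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Batches per epoch; the last, smaller batch is kept unless drop_last is set."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(num_batches(60_000, 128))  # training batches per epoch on GPU
print(num_batches(10_000, 128))  # test batches on GPU
print(num_batches(60_000, 64))   # training batches per epoch on CPU
```

With batch_size=128 this gives 469 training batches, which is the total that the tqdm progress bar reports in the training log.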

In [4]:
SEED = 1

# CUDA?
cuda = torch.cuda.is_available()
print("CUDA Available?", cuda)

# For reproducibility
torch.manual_seed(SEED)

if cuda:
    torch.cuda.manual_seed(SEED)

# dataloader arguments - typically you would read these from the command line
dataloader_args = dict(shuffle=True, batch_size=128, num_workers=4, pin_memory=True) if cuda else dict(shuffle=True, batch_size=64)

# train dataloader
train_loader = torch.utils.data.DataLoader(train, **dataloader_args)

# test dataloader
test_loader = torch.utils.data.DataLoader(test, **dataloader_args)

CUDA Available? True




# The model
Let's start with the model we first saw

In [22]:
import torch.nn.functional as F
dropout_value = 0.1
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input Block
        self.convblock1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Dropout(dropout_value)
        ) # output_size = 26

        # CONVOLUTION BLOCK 1
        self.convblock2 = nn.Sequential(
            nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 24

        # TRANSITION BLOCK 1
        self.convblock3 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=6, kernel_size=(1, 1), padding=0, bias=False),
        ) # output_size = 24
        self.pool1 = nn.MaxPool2d(2, 2) # output_size = 12

        # CONVOLUTION BLOCK 2
        self.convblock4 = nn.Sequential(
            nn.Conv2d(in_channels=6, out_channels=8, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.Dropout(dropout_value)
        ) # output_size = 10
        self.convblock5 = nn.Sequential(
            nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 8
        self.convblock6 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 6
        self.convblock7 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=1, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 6

        # OUTPUT BLOCK
        self.gap = nn.Sequential(
            nn.AvgPool2d(kernel_size=6)
        ) # output_size = 1

        self.convblock8 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
            # nn.BatchNorm2d(10),
            # nn.ReLU(),
            # nn.Dropout(dropout_value)
        )


        self.dropout = nn.Dropout(dropout_value)

    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        x = self.pool1(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.convblock7(x)
        x = self.gap(x)
        x = self.convblock8(x)

        x = x.view(-1, 10)
        return F.log_softmax(x, dim=-1)
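
The output_size comments in the model can be checked with the usual formula out = (in + 2 * padding - kernel) // stride + 1. The following is a plain-Python sketch; the layer list is transcribed by hand from Net above, and conv_out and trace_sizes are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each layer of Net, in forward order.
# AvgPool2d(kernel_size=6) uses a stride equal to its kernel size by default.
LAYERS = [
    ("convblock1", 3, 1, 0),
    ("convblock2", 3, 1, 0),
    ("convblock3", 1, 1, 0),
    ("pool1", 2, 2, 0),
    ("convblock4", 3, 1, 0),
    ("convblock5", 3, 1, 0),
    ("convblock6", 3, 1, 0),
    ("convblock7", 3, 1, 1),
    ("gap", 6, 6, 0),
    ("convblock8", 1, 1, 0),
]


def trace_sizes(input_size, layers):
    """Return (layer name, output size) pairs for a square input."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


for name, size in trace_sizes(28, LAYERS):
    print(f"{name}: {size}x{size}")
```

For a 28x28 MNIST image this reproduces the sequence 26, 24, 24, 12, 10, 8, 6, 6, 1, 1, so the final x.view(-1, 10) receives exactly one value per class.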

In [17]:
# dropout_value = 0.1

# class Net(nn.Module):
#     def __init__(self):
#         super(Net, self).__init__()
#         # Input Block
#         self.convblock1 = nn.Sequential(
#             nn.Conv2d(in_channels=1, out_channels=10, kernel_size=(3, 3), padding=0, bias=False),
#             nn.ReLU(),
#             nn.BatchNorm2d(10),
#             nn.Dropout(dropout_value)
#         ) # output_size = 26, RF = 3

#         # CONVOLUTION BLOCK 1
#         self.convblock2 = nn.Sequential(
#             nn.Conv2d(in_channels=10, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
#             nn.ReLU(),
#             nn.BatchNorm2d(16),
#             nn.Dropout(dropout_value)
#         ) # output_size = 24, RF = 5

#         # TRANSITION BLOCK 1
#         self.convblock3 = nn.Sequential(
#             nn.Conv2d(in_channels=16, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
#         ) # output_size = 22, RF = 5
#         self.pool1 = nn.MaxPool2d(2, 2) # output_size = 12, RF = 6

#         # CONVOLUTION BLOCK 2
#         self.convblock4 = nn.Sequential(
#             nn.Conv2d(in_channels=10, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
#             nn.ReLU(),
#             nn.BatchNorm2d(16),
#             nn.Dropout(dropout_value)
#         ) # output_size = 11, RF = 10
#         # self.convblock5 = nn.Sequential(
#         #     nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3, 3), padding=0, bias=False),
#         #     nn.ReLU(),
#         #     nn.BatchNorm2d(32),
#         #     nn.Dropout(dropout_value)
#         # ) # output_size = 12, RF = 14

#         self.pool2 = nn.MaxPool2d(2, 2) # output_size = 6, RF = 16

#         self.convblock5 = nn.Sequential(
#             nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3, 3), padding=0, bias=False),
#             nn.ReLU(),
#             nn.BatchNorm2d(64),
#             nn.Dropout(dropout_value)
#         ) # output_size = 6, RF = 24

#         # self.convblock7 = nn.Sequential(
#         #     nn.Conv2d(in_channels=16, out_channels=10, kernel_size=(3, 3), padding=1, bias=False),
#         # ) # output_size = 6, RF = 32

#         # OUTPUT BLOCK
#         self.gap = nn.Sequential(
#             nn.AvgPool2d(kernel_size=6)
#         ) # output_size = 1

#         self.convblock6 = nn.Sequential(
#             nn.Conv2d(in_channels=32, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
#         )

#     def forward(self, x):
#         x = self.convblock1(x)
#         x = self.convblock2(x)
#         x = self.convblock3(x)
#         x = self.pool1(x)
#         x = self.convblock4(x)
#         x = self.pool2(x)
#         x = self.convblock5(x)
#         x = self.gap(x)
#         x = self.convblock6(x)
#         x = x.view(-1, 10)
#         return F.log_softmax(x, dim=-1)


# Model Params


In [23]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print(device)
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

cuda
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 26, 26]              72
              ReLU-2            [-1, 8, 26, 26]               0
       BatchNorm2d-3            [-1, 8, 26, 26]              16
           Dropout-4            [-1, 8, 26, 26]               0
            Conv2d-5           [-1, 16, 24, 24]           1,152
              ReLU-6           [-1, 16, 24, 24]               0
       BatchNorm2d-7           [-1, 16, 24, 24]              32
           Dropout-8           [-1, 16, 24, 24]               0
            Conv2d-9            [-1, 6, 24, 24]              96
        MaxPool2d-10            [-1, 6, 12, 12]               0
           Conv2d-11            [-1, 8, 10, 10]             432
             ReLU-12            [-1, 8, 10, 10]               0
      BatchNorm2d-13            [-1, 8, 10, 10]              16
          Dropout-14            [-
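
The parameter counts in the summary can be reproduced by hand. This is a small plain-Python sketch, assuming bias-free convolutions (as in Net) and two learnable values per BatchNorm channel; the block list is transcribed from the model definition, and conv_params and bn_params are hypothetical helper names.

```python
def conv_params(in_channels, out_channels, kernel, bias=False):
    """Weights of a square convolution, plus one bias per output channel if enabled."""
    return in_channels * out_channels * kernel * kernel + (out_channels if bias else 0)


def bn_params(channels):
    """BatchNorm2d learns one scale and one shift per channel."""
    return 2 * channels


# (in_channels, out_channels, kernel, followed_by_batchnorm) for convblock1..8
BLOCKS = [
    (1, 8, 3, True),
    (8, 16, 3, True),
    (16, 6, 1, False),
    (6, 8, 3, True),
    (8, 16, 3, True),
    (16, 16, 3, True),
    (16, 16, 3, True),
    (16, 10, 1, False),
]

total = 0
for in_ch, out_ch, kernel, has_bn in BLOCKS:
    count = conv_params(in_ch, out_ch, kernel) + (bn_params(out_ch) if has_bn else 0)
    total += count
    print(f"{in_ch:>2} -> {out_ch:>2} (k={kernel}): {count}")

print("total trainable parameters:", total)  # 7832
```

The first rows match the summary (72, 1,152, 96 and 432 for the convolutions, 16 and 32 for the BatchNorm layers), and the total stays below 8k.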

# Training and Testing

Reading raw logs can be tedious, so we'll use a **tqdm** progress bar to make them easier to follow.

Let's write the train and test functions.

In [24]:
from tqdm import tqdm

train_losses = []
test_losses = []
train_acc = []
test_acc = []

def train(model, device, train_loader, optimizer, epoch):
  model.train()
  pbar = tqdm(train_loader)
  correct = 0
  processed = 0
  for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)

    # Init
    optimizer.zero_grad()
    # PyTorch accumulates gradients across backward passes, so we zero them
    # before each backpropagation step; otherwise the parameter update would
    # mix in gradients from earlier batches.

    # Predict
    y_pred = model(data)

    # Calculate loss
    loss = F.nll_loss(y_pred, target)
    train_losses.append(loss.item())  # store a plain float, not the graph-attached tensor

    # Backpropagation
    loss.backward()
    optimizer.step()

    # Update pbar-tqdm

    pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
    correct += pred.eq(target.view_as(pred)).sum().item()
    processed += len(data)

    pbar.set_description(desc= f'Loss={loss.item()} Batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}')
    train_acc.append(100*correct/processed)

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_losses.append(test_loss)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

    test_acc.append(100. * correct / len(test_loader.dataset))
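
The model returns log-probabilities (F.log_softmax) and both functions score them with F.nll_loss; together the two steps amount to cross-entropy. Here is a minimal plain-Python sketch of that arithmetic for a single sample. The function names mirror the PyTorch ones for readability, but these are simplified stand-ins written for illustration, not the library implementations.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - log_sum for z in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the correct class for one sample."""
    return -log_probs[target]


logits = [2.0, 1.0, 0.1]          # raw scores for three hypothetical classes
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=0)

print([round(math.exp(lp), 3) for lp in log_probs])  # probabilities, summing to 1
print(round(loss, 3))
```

In train the per-batch loss is averaged (the default reduction), while test sums the per-sample losses with reduction='sum' and divides by the dataset size at the end, which gives the average over the whole test set.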

# Let's Train and test our model

In [25]:
from torch.optim.lr_scheduler import StepLR

model =  Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=4, gamma=0.01)


EPOCHS = 20
for epoch in range(EPOCHS):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    scheduler.step()
    test(model, device, test_loader)

EPOCH: 0


Loss=0.1528863161802292 Batch_id=468 Accuracy=85.25: 100%|██████████| 469/469 [00:25<00:00, 18.12it/s]



Test set: Average loss: 0.0796, Accuracy: 9785/10000 (97.85%)

EPOCH: 1


Loss=0.10720237344503403 Batch_id=468 Accuracy=96.91: 100%|██████████| 469/469 [00:26<00:00, 17.74it/s]



Test set: Average loss: 0.0493, Accuracy: 9865/10000 (98.65%)

EPOCH: 2


Loss=0.04834076389670372 Batch_id=468 Accuracy=97.66: 100%|██████████| 469/469 [00:25<00:00, 18.19it/s]



Test set: Average loss: 0.0372, Accuracy: 9887/10000 (98.87%)

EPOCH: 3


Loss=0.10754526406526566 Batch_id=468 Accuracy=97.96: 100%|██████████| 469/469 [00:25<00:00, 18.68it/s]



Test set: Average loss: 0.0333, Accuracy: 9892/10000 (98.92%)

EPOCH: 4


Loss=0.06252288073301315 Batch_id=468 Accuracy=98.36: 100%|██████████| 469/469 [00:26<00:00, 17.86it/s]



Test set: Average loss: 0.0310, Accuracy: 9904/10000 (99.04%)

EPOCH: 5


Loss=0.1259755641222 Batch_id=468 Accuracy=98.33: 100%|██████████| 469/469 [00:26<00:00, 17.65it/s]



Test set: Average loss: 0.0306, Accuracy: 9906/10000 (99.06%)

EPOCH: 6


Loss=0.13521958887577057 Batch_id=468 Accuracy=98.41: 100%|██████████| 469/469 [00:27<00:00, 17.34it/s]



Test set: Average loss: 0.0298, Accuracy: 9908/10000 (99.08%)

EPOCH: 7


Loss=0.1156647801399231 Batch_id=468 Accuracy=98.39: 100%|██████████| 469/469 [00:26<00:00, 17.46it/s]



Test set: Average loss: 0.0297, Accuracy: 9911/10000 (99.11%)

EPOCH: 8


Loss=0.047018274664878845 Batch_id=468 Accuracy=98.43: 100%|██████████| 469/469 [00:26<00:00, 17.43it/s]



Test set: Average loss: 0.0303, Accuracy: 9908/10000 (99.08%)

EPOCH: 9


Loss=0.06100272014737129 Batch_id=468 Accuracy=98.46: 100%|██████████| 469/469 [00:26<00:00, 17.59it/s]



Test set: Average loss: 0.0288, Accuracy: 9914/10000 (99.14%)

EPOCH: 10


Loss=0.037530604749917984 Batch_id=468 Accuracy=98.37: 100%|██████████| 469/469 [00:26<00:00, 17.41it/s]



Test set: Average loss: 0.0295, Accuracy: 9910/10000 (99.10%)

EPOCH: 11


Loss=0.016686873510479927 Batch_id=468 Accuracy=98.47: 100%|██████████| 469/469 [00:26<00:00, 17.46it/s]



Test set: Average loss: 0.0291, Accuracy: 9913/10000 (99.13%)

EPOCH: 12


Loss=0.02354404330253601 Batch_id=468 Accuracy=98.40: 100%|██████████| 469/469 [00:26<00:00, 17.79it/s]



Test set: Average loss: 0.0290, Accuracy: 9915/10000 (99.15%)

EPOCH: 13


Loss=0.05227218195796013 Batch_id=468 Accuracy=98.44: 100%|██████████| 469/469 [00:26<00:00, 17.59it/s]



Test set: Average loss: 0.0301, Accuracy: 9912/10000 (99.12%)

EPOCH: 14


Loss=0.11510574817657471 Batch_id=468 Accuracy=98.42: 100%|██████████| 469/469 [00:26<00:00, 17.69it/s]



Test set: Average loss: 0.0298, Accuracy: 9912/10000 (99.12%)

EPOCH: 15


Loss=0.03299528360366821 Batch_id=468 Accuracy=98.43: 100%|██████████| 469/469 [00:26<00:00, 17.41it/s]



Test set: Average loss: 0.0299, Accuracy: 9909/10000 (99.09%)

EPOCH: 16


Loss=0.0995161309838295 Batch_id=468 Accuracy=98.43: 100%|██████████| 469/469 [00:26<00:00, 17.43it/s]



Test set: Average loss: 0.0302, Accuracy: 9911/10000 (99.11%)

EPOCH: 17


Loss=0.03977443650364876 Batch_id=468 Accuracy=98.50: 100%|██████████| 469/469 [00:27<00:00, 17.22it/s]



Test set: Average loss: 0.0292, Accuracy: 9913/10000 (99.13%)

EPOCH: 18


Loss=0.046208444982767105 Batch_id=468 Accuracy=98.39: 100%|██████████| 469/469 [00:26<00:00, 17.40it/s]



Test set: Average loss: 0.0296, Accuracy: 9912/10000 (99.12%)

EPOCH: 19


Loss=0.05165208876132965 Batch_id=468 Accuracy=98.45: 100%|██████████| 469/469 [00:26<00:00, 17.91it/s]



Test set: Average loss: 0.0298, Accuracy: 9911/10000 (99.11%)
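
The test accuracy in the log above barely moves after the first few epochs. One plausible reading is the learning rate schedule: StepLR multiplies the learning rate by gamma every step_size epochs, and with gamma=0.01 the rate shrinks very quickly. The sketch below reproduces the schedule in plain Python; step_lr is a hypothetical helper that mirrors the closed-form StepLR rule and is not a call into PyTorch.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate in effect during a given 0-based epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Settings used in this notebook: lr=0.01, step_size=4, gamma=0.01, 20 epochs.
for epoch in range(0, 20, 4):
    lr = step_lr(0.01, 0.01, 4, epoch)
    print(f"epochs {epoch:>2}-{epoch + 3:>2}: lr = {lr:.0e}")
```

The rate is 0.01 for epochs 0 to 3, about 1e-4 for epochs 4 to 7, and about 1e-6 from epoch 8 onward, so later epochs can change the weights only marginally. A gentler gamma (for example 0.1) would be one thing to try.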



## Correct MaxPooling Location

Purpose: Adjusts the placement of the max pooling layer within the network architecture to optimize feature extraction and reduce spatial dimensions more effectively.

Key Changes: In the model above, the single max pooling layer (pool1) sits directly after the 1x1 transition block (convblock3). It halves the feature map from 24x24 to 12x12 once the first two 3x3 convolutions have run, so pooling is applied only after some low-level features have been extracted.
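
One way to reason about where a pooling layer belongs is to track the receptive field (RF) of each layer, as the RF comments in the commented-out model do. The sketch below is plain Python and uses the standard recurrence rf += (kernel - 1) * jump, jump *= stride; the layer list is transcribed from the active Net, and receptive_fields is a hypothetical helper name.

```python
def receptive_fields(layers):
    """Return (layer name, receptive field) pairs for a stack of layers.

    Each layer is (name, kernel, stride). `jump` is the distance, in input
    pixels, between neighbouring positions of the current feature map.
    """
    rf, jump = 1, 1
    result = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        result.append((name, rf))
    return result


NET_LAYERS = [
    ("convblock1", 3, 1),
    ("convblock2", 3, 1),
    ("convblock3", 1, 1),
    ("pool1", 2, 2),
    ("convblock4", 3, 1),
    ("convblock5", 3, 1),
    ("convblock6", 3, 1),
    ("convblock7", 3, 1),
    ("gap", 6, 6),
    ("convblock8", 1, 1),
]

for name, rf in receptive_fields(NET_LAYERS):
    print(f"{name}: RF = {rf}")
```

Here pooling happens at a receptive field of 5, and every 3x3 convolution after it adds 4 to the receptive field instead of 2, which is how the network reaches an RF of 32 (larger than the 28x28 input) by the output block.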

## Image Augmentation

Purpose: Enhances the model's generalization ability and reduces overfitting by introducing variability in the training data through image augmentation techniques.

Key Techniques: Common options include rotation, scaling, cropping, flipping, and color adjustments, which create augmented versions of the original images and give the model a more diverse and challenging training set. This notebook applies only a random rotation of ±7 degrees (RandomRotation((-7.0, 7.0))) to the training images; the test images are not augmented.

## Playing Naively with Learning Rates

Purpose: Explores the impact of different learning rates on the model's training and convergence behavior to find an optimal learning rate or schedule that improves performance.

Key Approaches: Involves experimenting with various learning rate values and schedules (such as step decay, exponential decay, or cyclical learning rates) and observing their effect on convergence speed and accuracy. The run above uses SGD with lr=0.01 and momentum=0.9, combined with a StepLR step decay (step_size=4, gamma=0.01).

After these three steps, I observed the statistics below.

Result:

Parameters: below 8k

Best Train Accuracy: 99.30

Best Test Accuracy: 99.4 (consistently from 12th epoch)

