**Target:**

Use regularization, dropout, image augmentation (random rotation), and a tuned learning rate to reach a test accuracy of 99.4% within 15 epochs.

**Results:**

Parameters: 8,220

Best Train Accuracy (within the 15-epoch budget): 98.99% (epoch 14, counting from 0)

Best Test Accuracy: 99.41% (epoch 14, counting from 0)


**Analysis:**

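We reach a test accuracy of 99.41% at epoch 14 (counting from 0), but the result is not consistent: no other epoch in the 20-epoch run reaches 99.4%, and the epochs after it range from 99.18% to 99.39%.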

# Import Libraries

In [21]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

## Data Transformations

We start by defining our data transformations. We need to think about what our data looks like and how we can augment it so that training covers images the model might not otherwise see.


In [22]:
# Train Phase transformations
train_transforms = transforms.Compose([
                                      #  transforms.Resize((28, 28)),
                                      #  transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
                                       transforms.RandomRotation((-7.0, 7.0), fill=(1,)),
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.1307,), (0.3081,)) # The mean and std have to be sequences (e.g., tuples), therefore you should add a comma after the values. 
                                       # Note the difference between (0.1307) and (0.1307,)
                                       ])

# Test Phase transformations
test_transforms = transforms.Compose([
                                      #  transforms.Resize((28, 28)),
                                      #  transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.1307,), (0.3081,))
                                       ])
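
To make the `Normalize` step concrete, here is a small pure-Python sketch (no torchvision needed) of the per-pixel arithmetic it applies, `(x - mean) / std`, using the same mean and standard deviation as above. The helper name `normalize_pixel` is made up for illustration. The last line also shows why the trailing comma in `(0.1307,)` matters.

```python
MEAN, STD = 0.1307, 0.3081  # same values passed to transforms.Normalize above

def normalize_pixel(x, mean=MEAN, std=STD):
    """Hypothetical helper: the per-pixel arithmetic of Normalize, (x - mean) / std."""
    return (x - mean) / std

# ToTensor scales pixels to [0, 1], so these are the extremes after normalization.
black = normalize_pixel(0.0)  # background pixel
white = normalize_pixel(1.0)  # brightest stroke pixel
print(f"{black:.4f} {white:.4f}")  # -0.4242 2.8215

# Parentheses alone do not make a tuple; the trailing comma does.
print(type((0.1307)).__name__, type((0.1307,)).__name__)  # float tuple
```

After normalization the mostly dark MNIST background sits slightly below zero and the strokes sit well above it, which is the input range the first convolution actually sees.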


# Dataset and Creating Train/Test Split

In [23]:
train = datasets.MNIST('./data', train=True, download=True, transform=train_transforms)
test = datasets.MNIST('./data', train=False, download=True, transform=test_transforms)

# Dataloader Arguments & Test/Train Dataloaders


In [24]:
SEED = 1

# CUDA?
cuda = torch.cuda.is_available()
print("CUDA Available?", cuda)

# For reproducibility
torch.manual_seed(SEED)

if cuda:
    torch.cuda.manual_seed(SEED)

# dataloader arguments - normally you would fetch these from the command line
dataloader_args = dict(shuffle=True, batch_size=128, num_workers=4, pin_memory=True) if cuda else dict(shuffle=True, batch_size=64)

# train dataloader
train_loader = torch.utils.data.DataLoader(train, **dataloader_args)

# test dataloader
test_loader = torch.utils.data.DataLoader(test, **dataloader_args)

CUDA Available? False
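
Since CUDA is not available here, the CPU branch of `dataloader_args` applies and the batch size is 64. The sketch below works out how many batches per epoch that gives, assuming the standard MNIST split of 60,000 training and 10,000 test images and `DataLoader`'s default of keeping the last partial batch. The helper `num_batches` is a hypothetical name, not part of PyTorch.

```python
import math

def num_batches(dataset_size, batch_size):
    """Batches per epoch when the last partial batch is kept (drop_last=False)."""
    return math.ceil(dataset_size / batch_size)

TRAIN_SIZE, TEST_SIZE = 60_000, 10_000  # standard MNIST split sizes (assumed)

cpu_train = num_batches(TRAIN_SIZE, 64)   # CPU branch of dataloader_args
gpu_train = num_batches(TRAIN_SIZE, 128)  # CUDA branch of dataloader_args
cpu_test = num_batches(TEST_SIZE, 64)
print(cpu_train, gpu_train, cpu_test)  # 938 469 157
```

The 938 training batches match the `938/938` counter that the tqdm progress bar prints in the training log.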


# The model
Let's start with the model we first saw

In [27]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input Block
        self.convblock1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=(3, 3), padding=0, bias=True),
            nn.BatchNorm2d(10),
            nn.ReLU()
        ) # output_size = 26

        # CONVOLUTION BLOCK 1
        self.convblock2 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=10, kernel_size=(3, 3), padding=0, bias=True),
            nn.BatchNorm2d(10),
            nn.ReLU()
        ) # output_size = 24
        self.convblock3 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=30, kernel_size=(3, 3), padding=0, bias=True),
            nn.BatchNorm2d(30),
            nn.ReLU()
        ) # output_size = 22

        # TRANSITION BLOCK 1
        self.pool1 = nn.MaxPool2d(2, 2) # output_size = 11
        self.convblock4 = nn.Sequential(
            nn.Conv2d(in_channels=30, out_channels=10, kernel_size=(1, 1), padding=0, bias=True),
            nn.BatchNorm2d(10),
            nn.ReLU()
        ) # output_size = 11

        # CONVOLUTION BLOCK 2
        self.convblock5 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=10, kernel_size=(3, 3), padding=0, bias=True),
            nn.BatchNorm2d(10),
            nn.ReLU()
        ) # output_size = 9
        self.convblock6 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=30, kernel_size=(3, 3), padding=0, bias=True),
            nn.BatchNorm2d(30),
            nn.ReLU()
        ) # output_size = 7
       
        # OUTPUT BLOCK
        self.convblock7 = nn.Sequential(
            nn.Conv2d(in_channels=30, out_channels=10, kernel_size=(1, 1), padding=0, bias=True),
            nn.BatchNorm2d(10),
            nn.ReLU()
        ) # output_size = 7
        self.gap = nn.Sequential(
            nn.AvgPool2d(kernel_size=7)
        ) # output_size = 1

        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        #x = self.dropout(x)
        x = self.pool1(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.dropout(x)
        x = self.convblock7(x)
        x = self.gap(x)
        x = x.view(-1, 10)
        return F.log_softmax(x, dim=-1)
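
The `output_size` comments in the model can be checked by hand. The sketch below applies the usual output-size formula, `(size + 2*padding - kernel) // stride + 1`, to each spatial step of `forward`. It is plain Python arithmetic rather than a run of the network, and the `layers` list is transcribed from the class above.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride) for each spatial step in Net.forward, in order.
# All layers use padding=0; AvgPool2d(kernel_size=7) defaults to stride 7.
layers = [
    ("convblock1", 3, 1), ("convblock2", 3, 1), ("convblock3", 3, 1),
    ("pool1", 2, 2), ("convblock4", 1, 1),
    ("convblock5", 3, 1), ("convblock6", 3, 1),
    ("convblock7", 1, 1), ("gap", 7, 7),
]

size = 28  # MNIST images are 28x28
sizes = []
for name, kernel, stride in layers:
    size = conv_out(size, kernel, stride=stride)
    sizes.append(size)
    print(f"{name}: {size}")
```

The final size of 1 with 10 channels is what lets `x.view(-1, 10)` flatten the output into one score per digit class.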

# Model Params
It is hard to overstate how important it is to view the model summary.
Unfortunately, there is no built-in model summary tool, so we rely on an external package.

In [28]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print(device)
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
cpu
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 26, 26]             100
       BatchNorm2d-2           [-1, 10, 26, 26]              20
              ReLU-3           [-1, 10, 26, 26]               0
            Conv2d-4           [-1, 10, 24, 24]             910
       BatchNorm2d-5           [-1, 10, 24, 24]              20
              ReLU-6           [-1, 10, 24, 24]               0
            Conv2d-7           [-1, 30, 22, 22]           2,730
       BatchNorm2d-8           [-1, 30, 22, 22]              60
              ReLU-9           [-1, 30, 22, 22]               0
        MaxPool2d-10           [-1, 30, 11, 11]               0
           Conv2d-11           [-1, 10, 11, 11]             310
      BatchNorm2d-12           [-1, 10, 11, 11]              20
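
The summary output above stops after layer 12, but the total can be reproduced by hand. This sketch counts parameters from the layer definitions in `Net`, assuming each convolution has `kernel*kernel*in*out` weights plus one bias per output channel, and each `BatchNorm2d` has one learnable scale and one shift per channel. The helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Weights (kernel*kernel*in_ch per output channel) plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

def bn_params(channels):
    """BatchNorm2d learns one scale and one shift per channel."""
    return 2 * channels

# (in_channels, out_channels, kernel_size) for convblock1..convblock7.
blocks = [(1, 10, 3), (10, 10, 3), (10, 30, 3), (30, 10, 1),
          (10, 10, 3), (10, 30, 3), (30, 10, 1)]

total = sum(conv_params(i, o, k) + bn_params(o) for i, o, k in blocks)
print(total)  # 8220
```

The result agrees with the 8,220 parameters reported in the results, and the per-layer values (100, 910, 2,730, 310) match the rows that are visible in the summary.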


# Training and Testing

Looking at logs can be boring, so we'll use a **tqdm** progress bar to get nicer logs.

Let's write train and test functions

In [29]:
from tqdm import tqdm

train_losses = []
test_losses = []
train_acc = []
test_acc = []

def train(model, device, train_loader, optimizer, epoch):
  model.train()
  pbar = tqdm(train_loader)
  correct = 0
  processed = 0
  for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)

    # Init
    optimizer.zero_grad()
    # PyTorch accumulates gradients across backward passes, so we zero them before backpropagation.
    # Otherwise each parameter update would mix in gradients left over from earlier batches.

    # Predict
    y_pred = model(data)

    # Calculate loss
    loss = F.nll_loss(y_pred, target)
    train_losses.append(loss.item())  # store a plain float, not the tensor and its autograd graph

    # Backpropagation
    loss.backward()
    optimizer.step()

    # Update pbar-tqdm
    
    pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
    correct += pred.eq(target.view_as(pred)).sum().item()
    processed += len(data)

    pbar.set_description(desc= f'Loss={loss.item()} Batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}')
    train_acc.append(100*correct/processed)

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_losses.append(test_loss)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    
    test_acc.append(100. * correct / len(test_loader.dataset))
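
The model ends in `log_softmax` and both functions use `F.nll_loss`, so the loss for one sample is the negative log-probability assigned to the true class. The pure-Python sketch below illustrates that arithmetic and the difference between the default mean reduction used in `train` and `reduction='sum'` used in `test`. The logits are made-up values, and `log_softmax` and `nll` here are illustrative helpers, not the PyTorch functions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the target class for one sample."""
    return -log_probs[target]

# Two made-up samples with three classes each (illustrative values only).
batch = [([2.0, 0.5, -1.0], 0), ([0.1, 0.2, 3.0], 2)]
losses = [nll(log_softmax(logits), target) for logits, target in batch]

mean_loss = sum(losses) / len(losses)  # what the default reduction='mean' reports
sum_loss = sum(losses)                 # what reduction='sum' reports
print(f"mean={mean_loss:.4f} sum={sum_loss:.4f}")
```

Summing per batch and dividing once by `len(test_loader.dataset)`, as `test` does, gives a true per-sample average even when the last batch is smaller than the others.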

# Let's Train and test our model

In [30]:
model =  Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.04, momentum=0.9)
EPOCHS = 20
for epoch in range(EPOCHS):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

EPOCH: 0


Loss=0.08689627796411514 Batch_id=937 Accuracy=92.68: 100%|██████████| 938/938 [01:39<00:00,  9.39it/s]



Test set: Average loss: 0.1021, Accuracy: 9714/10000 (97.14%)

EPOCH: 1


Loss=0.10848996788263321 Batch_id=937 Accuracy=97.51: 100%|██████████| 938/938 [01:31<00:00, 10.23it/s]



Test set: Average loss: 0.0499, Accuracy: 9868/10000 (98.68%)

EPOCH: 2


Loss=0.07941056042909622 Batch_id=937 Accuracy=98.05: 100%|██████████| 938/938 [01:32<00:00, 10.14it/s]



Test set: Average loss: 0.0474, Accuracy: 9865/10000 (98.65%)

EPOCH: 3


Loss=0.2173331379890442 Batch_id=937 Accuracy=98.19: 100%|██████████| 938/938 [01:32<00:00, 10.14it/s]



Test set: Average loss: 0.0412, Accuracy: 9884/10000 (98.84%)

EPOCH: 4


Loss=0.06347819417715073 Batch_id=937 Accuracy=98.42: 100%|██████████| 938/938 [01:30<00:00, 10.32it/s]



Test set: Average loss: 0.0379, Accuracy: 9896/10000 (98.96%)

EPOCH: 5


Loss=0.024425748735666275 Batch_id=937 Accuracy=98.58: 100%|██████████| 938/938 [01:31<00:00, 10.20it/s]



Test set: Average loss: 0.0297, Accuracy: 9909/10000 (99.09%)

EPOCH: 6


Loss=0.027130678296089172 Batch_id=937 Accuracy=98.66: 100%|██████████| 938/938 [01:32<00:00, 10.17it/s]



Test set: Average loss: 0.0264, Accuracy: 9914/10000 (99.14%)

EPOCH: 7


Loss=0.014083795249462128 Batch_id=937 Accuracy=98.59: 100%|██████████| 938/938 [01:30<00:00, 10.35it/s]



Test set: Average loss: 0.0302, Accuracy: 9911/10000 (99.11%)

EPOCH: 8


Loss=0.010247617028653622 Batch_id=937 Accuracy=98.73: 100%|██████████| 938/938 [01:31<00:00, 10.24it/s]



Test set: Average loss: 0.0253, Accuracy: 9920/10000 (99.20%)

EPOCH: 9


Loss=0.06989239901304245 Batch_id=937 Accuracy=98.81: 100%|██████████| 938/938 [01:32<00:00, 10.14it/s]



Test set: Average loss: 0.0264, Accuracy: 9914/10000 (99.14%)

EPOCH: 10


Loss=0.059006936848163605 Batch_id=937 Accuracy=98.80: 100%|██████████| 938/938 [01:31<00:00, 10.24it/s]



Test set: Average loss: 0.0244, Accuracy: 9925/10000 (99.25%)

EPOCH: 11


Loss=0.09813541918992996 Batch_id=937 Accuracy=98.90: 100%|██████████| 938/938 [01:30<00:00, 10.40it/s]



Test set: Average loss: 0.0276, Accuracy: 9917/10000 (99.17%)

EPOCH: 12


Loss=0.008267611265182495 Batch_id=937 Accuracy=98.92: 100%|██████████| 938/938 [01:30<00:00, 10.32it/s]



Test set: Average loss: 0.0264, Accuracy: 9910/10000 (99.10%)

EPOCH: 13


Loss=0.04754288122057915 Batch_id=937 Accuracy=98.92: 100%|██████████| 938/938 [01:30<00:00, 10.36it/s]



Test set: Average loss: 0.0233, Accuracy: 9938/10000 (99.38%)

EPOCH: 14


Loss=0.016926415264606476 Batch_id=937 Accuracy=98.99: 100%|██████████| 938/938 [01:29<00:00, 10.46it/s]



Test set: Average loss: 0.0217, Accuracy: 9941/10000 (99.41%)

EPOCH: 15


Loss=0.013193739578127861 Batch_id=937 Accuracy=98.98: 100%|██████████| 938/938 [01:30<00:00, 10.31it/s]



Test set: Average loss: 0.0275, Accuracy: 9918/10000 (99.18%)

EPOCH: 16


Loss=0.0036540483124554157 Batch_id=937 Accuracy=98.98: 100%|██████████| 938/938 [01:30<00:00, 10.33it/s]



Test set: Average loss: 0.0236, Accuracy: 9932/10000 (99.32%)

EPOCH: 17


Loss=0.38491860032081604 Batch_id=937 Accuracy=99.05: 100%|██████████| 938/938 [01:30<00:00, 10.39it/s]



Test set: Average loss: 0.0212, Accuracy: 9939/10000 (99.39%)

EPOCH: 18


Loss=0.025214767083525658 Batch_id=937 Accuracy=99.05: 100%|██████████| 938/938 [01:28<00:00, 10.57it/s]



Test set: Average loss: 0.0222, Accuracy: 9929/10000 (99.29%)

EPOCH: 19


Loss=0.0210528876632452 Batch_id=937 Accuracy=99.09: 100%|██████████| 938/938 [01:30<00:00, 10.40it/s]



Test set: Average loss: 0.0230, Accuracy: 9928/10000 (99.28%)
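
To quantify how inconsistent the result is, the sketch below takes the test accuracies printed in the log above and counts the epochs that meet the 99.4% target. The list is copied by hand from the log, and the variable names are chosen for illustration.

```python
# Test accuracy (%) per epoch, copied from the log above (epochs 0-19).
test_acc_log = [97.14, 98.68, 98.65, 98.84, 98.96, 99.09, 99.14, 99.11, 99.20, 99.14,
                99.25, 99.17, 99.10, 99.38, 99.41, 99.18, 99.32, 99.39, 99.29, 99.28]

TARGET = 99.4
hits = [epoch for epoch, acc in enumerate(test_acc_log) if acc >= TARGET]
best_epoch = max(range(len(test_acc_log)), key=test_acc_log.__getitem__)
last5 = test_acc_log[-5:]

print("epochs at or above target:", hits)                  # [14]
print("best epoch:", best_epoch)                           # 14
print(f"last-5 range: {min(last5):.2f}-{max(last5):.2f}")  # 99.18-99.39
```

Only one of the 20 epochs meets the target, which is why the analysis describes the 99.41% result as not consistent.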

