**Target**: 99.4% validation accuracy

**Explanation for the Target**: This iteration will introduce Global Average Pooling to the model. The last iteration showed a severly underfitted model, liekly due to introduction of btach normalization and dropout. We can reduce it by reducing the DropOut rate to 0.05. We will also move the pooling layer to receptive field 5, as patterns are recognized at that layer. The last model also had some extra room for parameters. We will increase the parameter count slightly to make use of the target parameters. We are hoping to achieve the final targeted accuracy of this assignment.

**Result**:

Max Training Accuracy: 98.85%

Max Validation Accuracy: 99.49%


**Analysis**: We see the accuracy revolves around 99.4% in the final epochs. It is safe to say that we have achieved the target.

In [23]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [47]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # CONVOLUTION BLOCK #1
        self.convblock1 = nn.Sequential( # 28x28 > 28x28 | RF 3 | jout=1
          nn.Conv2d(1, 10, 3, padding=1), 
          nn.ReLU(),
          nn.BatchNorm2d(10),
          nn.Dropout2d(0.05)
        )   
        self.convblock2 = nn.Sequential( # 28x28 > 28x28 | RF 5 | jout=1
          nn.Conv2d(10, 16, 3, padding=1), 
          nn.ReLU(),
          nn.BatchNorm2d(16),
          nn.Dropout2d(0.05)
        )
        # TRANSITIONAL BLOCK #1        
        self.pool1 = nn.MaxPool2d(2, 2) # 28x28 > 14x14 | RF 6 | jout=1
        self.convblock3 = nn.Sequential( # 14x14 > 14x14 | RF 6 | jout=2
          nn.Conv2d(16, 8, 1),
          nn.ReLU(),
          nn.BatchNorm2d(8),
          nn.Dropout2d(0.05)
        )

        # CONVOLUTION BLOCK #2
        self.convblock4 = nn.Sequential( # 14x14 > 14x14 | RF 10 | jout=2
          nn.Conv2d(8, 16, 3, padding=1), 
          nn.ReLU(),
          nn.BatchNorm2d(16),
          nn.Dropout2d(0.05)
        )

        self.convblock5 = nn.Sequential( # 14x14 > 14x14 | RF 14 | jout=2
          nn.Conv2d(16, 16, 3, padding=1),
          nn.ReLU(),
          nn.BatchNorm2d(16),
          nn.Dropout2d(0.05)
        )
        self.convblock6 = nn.Sequential( # 14x14 > 14x14 | RF 18 | jout=2
          nn.Conv2d(16, 16, 3, padding=1),
          nn.ReLU(),
          nn.BatchNorm2d(16),
          nn.Dropout2d(0.05)
        )
        # TRANSITIONAL BLOCK #2    
        self.pool2 = nn.MaxPool2d(2, 2) # 14x14 > 7x7 | RF 20 | jout=2
        self.convblock7 = nn.Sequential( # 7x7 > 7x7 | RF 20 | jout=3
          nn.Conv2d(16, 8, 1),
          nn.ReLU(),
          nn.BatchNorm2d(8),
          nn.Dropout2d(0.05)
        )
        #CONVOLUTIONAL BLOCK #3
        self.convblock8 = nn.Sequential( #7x7 > 5x5 | RF 26 | jout=3
            nn.Conv2d(8, 10, 3),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout2d(0.05)
        )
        self.convblock9 = nn.Sequential( #5x5 > 3x3 | RF 32 | jout=3
            nn.Conv2d(10, 16, 3),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout2d(0.05)
        )
        self.convblock10 = nn.Sequential( #3x3 > 3x3 | RF 38 | jout=3
            nn.Conv2d(16, 10, 1)
            #nn.ReLU(),
            #nn.BatchNorm2d(10),
            #nn.Dropout2d(0.1)
        )
        self.gap = nn.Sequential( #3x3 > 1x1 | RF 42 | jout=3
            nn.AvgPool2d(kernel_size=3)
        )



    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.pool1(x)
        x = self.convblock3(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.pool2(x)
        x = self.convblock7(x)
        x = self.convblock8(x)
        x = self.convblock9(x)
        x = self.convblock10(x)
        x = self.gap(x)
      
        x = x.view(-1, 10)
        return F.log_softmax(x)

In [48]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 28, 28]             100
              ReLU-2           [-1, 10, 28, 28]               0
       BatchNorm2d-3           [-1, 10, 28, 28]              20
         Dropout2d-4           [-1, 10, 28, 28]               0
            Conv2d-5           [-1, 16, 28, 28]           1,456
              ReLU-6           [-1, 16, 28, 28]               0
       BatchNorm2d-7           [-1, 16, 28, 28]              32
         Dropout2d-8           [-1, 16, 28, 28]               0
         MaxPool2d-9           [-1, 16, 14, 14]               0
           Conv2d-10            [-1, 8, 14, 14]             136
             ReLU-11            [-1, 8, 14, 14]               0
      BatchNorm2d-12            [-1, 8, 14, 14]              16
    

  return F.log_softmax(x)


In [50]:


torch.manual_seed(1)
batch_size = 64

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        #transforms.RandomRotation((-7.0, 7.0), fill=(1,)),
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


In [51]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    correct = 0
    processed = 0

    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

        y_pred = model(data)
        pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()
        processed += len(data)

        pbar.set_description(desc= f'Epoch={epoch} Loss={loss.item()} batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}%')


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [52]:
from torch.optim.lr_scheduler import StepLR

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=5, gamma=0.1)


for epoch in range(1, 16):
    train(model, device, train_loader, optimizer, epoch)
    #scheduler.step()
    test(model, device, test_loader)

  return F.log_softmax(x)
Epoch=1 Loss=0.14346933364868164 batch_id=937 Accuracy=90.11%: 100%|██████████| 938/938 [00:25<00:00, 37.10it/s]



Test set: Average loss: 0.0447, Accuracy: 9853/10000 (98.53%)



Epoch=2 Loss=0.028531590476632118 batch_id=937 Accuracy=97.17%: 100%|██████████| 938/938 [00:24<00:00, 38.02it/s]



Test set: Average loss: 0.0376, Accuracy: 9888/10000 (98.88%)



Epoch=3 Loss=0.16348429024219513 batch_id=937 Accuracy=97.68%: 100%|██████████| 938/938 [00:24<00:00, 38.67it/s]



Test set: Average loss: 0.0303, Accuracy: 9910/10000 (99.10%)



Epoch=4 Loss=0.03897513076663017 batch_id=937 Accuracy=97.93%: 100%|██████████| 938/938 [00:24<00:00, 38.40it/s]



Test set: Average loss: 0.0274, Accuracy: 9911/10000 (99.11%)



Epoch=5 Loss=0.009880609810352325 batch_id=937 Accuracy=98.10%: 100%|██████████| 938/938 [00:24<00:00, 38.80it/s]



Test set: Average loss: 0.0260, Accuracy: 9917/10000 (99.17%)



Epoch=6 Loss=0.033547207713127136 batch_id=937 Accuracy=98.29%: 100%|██████████| 938/938 [00:24<00:00, 38.51it/s]



Test set: Average loss: 0.0256, Accuracy: 9923/10000 (99.23%)



Epoch=7 Loss=0.05691586434841156 batch_id=937 Accuracy=98.35%: 100%|██████████| 938/938 [00:24<00:00, 38.55it/s]



Test set: Average loss: 0.0248, Accuracy: 9928/10000 (99.28%)



Epoch=8 Loss=0.15364857017993927 batch_id=937 Accuracy=98.47%: 100%|██████████| 938/938 [00:24<00:00, 38.46it/s]



Test set: Average loss: 0.0238, Accuracy: 9923/10000 (99.23%)



Epoch=9 Loss=0.06393135339021683 batch_id=937 Accuracy=98.46%: 100%|██████████| 938/938 [00:24<00:00, 38.37it/s]



Test set: Average loss: 0.0216, Accuracy: 9939/10000 (99.39%)



Epoch=10 Loss=0.1645202338695526 batch_id=937 Accuracy=98.63%: 100%|██████████| 938/938 [00:25<00:00, 37.44it/s]



Test set: Average loss: 0.0201, Accuracy: 9925/10000 (99.25%)



Epoch=11 Loss=0.020322799682617188 batch_id=937 Accuracy=98.64%: 100%|██████████| 938/938 [00:24<00:00, 38.17it/s]



Test set: Average loss: 0.0195, Accuracy: 9940/10000 (99.40%)



Epoch=12 Loss=0.022528648376464844 batch_id=937 Accuracy=98.68%: 100%|██████████| 938/938 [00:24<00:00, 38.32it/s]



Test set: Average loss: 0.0193, Accuracy: 9949/10000 (99.49%)



Epoch=13 Loss=0.031043240800499916 batch_id=937 Accuracy=98.83%: 100%|██████████| 938/938 [00:24<00:00, 38.22it/s]



Test set: Average loss: 0.0209, Accuracy: 9933/10000 (99.33%)



Epoch=14 Loss=0.19452056288719177 batch_id=937 Accuracy=98.85%: 100%|██████████| 938/938 [00:24<00:00, 38.70it/s]



Test set: Average loss: 0.0197, Accuracy: 9937/10000 (99.37%)



Epoch=15 Loss=0.08037396520376205 batch_id=937 Accuracy=98.82%: 100%|██████████| 938/938 [00:24<00:00, 38.37it/s]



Test set: Average loss: 0.0179, Accuracy: 9946/10000 (99.46%)

