**Target**: 99%

**Explanation for the Target**: This iteration will reduce the total parameters in the model. It will also introduce image augmentation and reguralization. We will add BatchNormalization and DropOut to each layer in the model. We can expect the model to be efficient than before, so a 99% accuracy is feasible.

**Result**:

Max Training Accuracy: 81.16%

Max Validation Accuracy: 98.51%


**Analysis**: It is clear that is model is under-fitting and there is scope for improvement. Towards the end of the training, we can observe that the training accuracy started oscillating in 80.40s and the test accuracy too was revolving around 98.3%. This could mean the learning rate is higher than expected, but since this is a severely under-fitting model it is early to conclude on the learning rate. We are still using a convolution layer as the final layer to reduce the channels.

This model did achieve to keep the parameters under 10,000. But, there is a huge need for improvement. We will try to introduce Global Average Pooling and StepLR in the next iteration.

In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # CONVOLUTION BLOCK #1
        self.convblock1 = nn.Sequential( # 28x28 > 28x28 | RF 3 | jout=1
          nn.Conv2d(1, 10, 3, padding=1), 
          nn.ReLU(),
          nn.BatchNorm2d(10),
          nn.Dropout2d(0.1)
        )   
        self.convblock2 = nn.Sequential( # 28x28 > 28x28 | RF 5 | jout=1
          nn.Conv2d(10, 16, 3, padding=1), 
          nn.ReLU(),
          nn.BatchNorm2d(16),
          nn.Dropout2d(0.1)
        )
        self.convblock3 = nn.Sequential( # 28x28 > 28x28 | RF 7 | jout=1
          nn.Conv2d(16, 16, 3, padding=1), 
          nn.ReLU(),
          nn.BatchNorm2d(16),
          nn.Dropout2d(0.1)
        )
        # TRANSITIONAL BLOCK #1        
        self.pool1 = nn.MaxPool2d(2, 2) # 28x28 > 14x14 | RF 9 | jout=1
        self.convblock4 = nn.Sequential( # 14x14 > 14x14 | RF 9 | jout=2
          nn.Conv2d(16, 1, 1),
          nn.ReLU(),
          nn.BatchNorm2d(1),
          nn.Dropout2d(0.1)
        )
        # CONVOLUTION BLOCK #2
        self.convblock5 = nn.Sequential( # 14x14 > 14x14 | RF 13 | jout=2
          nn.Conv2d(1, 10, 3, padding=1),
          nn.ReLU(),
          nn.BatchNorm2d(10),
          nn.Dropout2d(0.1)
        )
        self.convblock6 = nn.Sequential( # 14x14 > 14x14 | RF 17 | jout=2
          nn.Conv2d(10, 16, 3, padding=1),
          nn.ReLU(),
          nn.BatchNorm2d(16),
          nn.Dropout2d(0.1)
        )
        # TRANSITIONAL BLOCK #2    
        self.pool2 = nn.MaxPool2d(2, 2) # 14x14 > 7x7 | RF 19 | jout=2
        self.convblock7 = nn.Sequential( # 7x7 > 7x7 | RF 19 | jout=3
          nn.Conv2d(16, 1, 1),
          nn.ReLU(),
          nn.BatchNorm2d(1),
          nn.Dropout2d(0.1)
        )
        #CONVOLUTIONAL BLOCK #3
        self.convblock8 = nn.Sequential( #7x7 > 5x5 | RF 25 | jout=3
            nn.Conv2d(1, 10, 3),
            nn.ReLU(),
            nn.BatchNorm2d(10),
            nn.Dropout2d(0.1)
        )
        self.convblock9 = nn.Sequential( #5x5 > 3x3 | RF 31 | jout=3
            nn.Conv2d(10, 16, 3),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout2d(0.1)
        )
        self.convblock10 = nn.Sequential( #3x3 > 1x1 | RF 37 | jout=3
            nn.Conv2d(16, 16, 3),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout2d(0.1)
        )
        self.convblock11 = nn.Sequential( #1x1 > 1x1 | RF 37 | jout=3
            nn.Conv2d(16, 10, 1),
        )



    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        x = self.pool1(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.pool2(x)
        x = self.convblock7(x)
        x = self.convblock8(x)
        x = self.convblock9(x)
        x = self.convblock10(x)
        x = self.convblock11(x)
      
        x = x.view(-1, 10)
        return F.log_softmax(x)

In [3]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 10, 28, 28]             100
              ReLU-2           [-1, 10, 28, 28]               0
       BatchNorm2d-3           [-1, 10, 28, 28]              20
         Dropout2d-4           [-1, 10, 28, 28]               0
            Conv2d-5           [-1, 16, 28, 28]           1,456
              ReLU-6           [-1, 16, 28, 28]               0
       BatchNorm2d-7           [-1, 16, 28, 28]              32
         Dropout2d-8           [-1, 16, 28, 28]               0
            Conv2d-9           [-1, 16, 28, 28]           2,320
             ReLU-10           [-1, 16, 28, 28]               0
      BatchNorm2d-11           [-1, 16, 28, 28]              32
        Dropout2d-12           [-1, 16, 28, 28]               0
    

  return F.log_softmax(x)


In [4]:


torch.manual_seed(1)
batch_size = 64

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw



In [5]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    correct = 0
    processed = 0

    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

        y_pred = model(data)
        pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()
        processed += len(data)

        pbar.set_description(desc= f'Epoch={epoch} Loss={loss.item()} batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}%')


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [6]:

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(1, 16):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

  return F.log_softmax(x)
Epoch=1 Loss=0.3884112536907196 batch_id=937 Accuracy=71.13%: 100%|██████████| 938/938 [00:28<00:00, 32.68it/s]



Test set: Average loss: 0.1288, Accuracy: 9612/10000 (96.12%)



Epoch=2 Loss=0.673706591129303 batch_id=937 Accuracy=78.37%: 100%|██████████| 938/938 [00:25<00:00, 37.43it/s]



Test set: Average loss: 0.0986, Accuracy: 9701/10000 (97.01%)



Epoch=3 Loss=0.9472307562828064 batch_id=937 Accuracy=79.10%: 100%|██████████| 938/938 [00:24<00:00, 37.93it/s]



Test set: Average loss: 0.0816, Accuracy: 9765/10000 (97.65%)



Epoch=4 Loss=0.6899013519287109 batch_id=937 Accuracy=79.97%: 100%|██████████| 938/938 [00:24<00:00, 37.86it/s]



Test set: Average loss: 0.0729, Accuracy: 9781/10000 (97.81%)



Epoch=5 Loss=0.43427279591560364 batch_id=937 Accuracy=79.82%: 100%|██████████| 938/938 [00:24<00:00, 37.80it/s]



Test set: Average loss: 0.0667, Accuracy: 9790/10000 (97.90%)



Epoch=6 Loss=0.6314380168914795 batch_id=937 Accuracy=79.87%: 100%|██████████| 938/938 [00:24<00:00, 38.23it/s]



Test set: Average loss: 0.0670, Accuracy: 9783/10000 (97.83%)



Epoch=7 Loss=0.8027915358543396 batch_id=937 Accuracy=80.18%: 100%|██████████| 938/938 [00:24<00:00, 38.16it/s]



Test set: Average loss: 0.0579, Accuracy: 9828/10000 (98.28%)



Epoch=8 Loss=0.7604907155036926 batch_id=937 Accuracy=80.33%: 100%|██████████| 938/938 [00:25<00:00, 37.42it/s]



Test set: Average loss: 0.0619, Accuracy: 9815/10000 (98.15%)



Epoch=9 Loss=0.386752188205719 batch_id=937 Accuracy=80.53%: 100%|██████████| 938/938 [00:24<00:00, 37.83it/s]



Test set: Average loss: 0.0610, Accuracy: 9809/10000 (98.09%)



Epoch=10 Loss=0.5457329750061035 batch_id=937 Accuracy=80.64%: 100%|██████████| 938/938 [00:24<00:00, 38.43it/s]



Test set: Average loss: 0.0574, Accuracy: 9823/10000 (98.23%)



Epoch=11 Loss=0.5186195969581604 batch_id=937 Accuracy=80.61%: 100%|██████████| 938/938 [00:24<00:00, 37.91it/s]



Test set: Average loss: 0.0537, Accuracy: 9833/10000 (98.33%)



Epoch=12 Loss=1.0028375387191772 batch_id=937 Accuracy=80.61%: 100%|██████████| 938/938 [00:24<00:00, 37.79it/s]



Test set: Average loss: 0.0591, Accuracy: 9812/10000 (98.12%)



Epoch=13 Loss=0.8140256404876709 batch_id=937 Accuracy=80.68%: 100%|██████████| 938/938 [00:24<00:00, 38.01it/s]



Test set: Average loss: 0.0561, Accuracy: 9832/10000 (98.32%)



Epoch=14 Loss=0.535012423992157 batch_id=937 Accuracy=80.64%: 100%|██████████| 938/938 [00:24<00:00, 37.93it/s]



Test set: Average loss: 0.0519, Accuracy: 9834/10000 (98.34%)



Epoch=15 Loss=0.7439004778862 batch_id=937 Accuracy=81.16%: 100%|██████████| 938/938 [00:24<00:00, 38.08it/s]



Test set: Average loss: 0.0496, Accuracy: 9851/10000 (98.51%)

