**Target**: 98%

**Explanation for the Target**: This is the first iteration of this model. The objective of this model is to follow expand and squeeze approach along with increase in receptive field to be atleast equal to the image size. This model is not optimized in any way, it uses 10 convolutions and 2 poolings. Hence we should not expect a 99+% accuracy.  All these convolutions will also result in a very high number of parameters, but should give satisfactory results. 

**Result**:

Max Training Accuracy: 99.92%

Max Validation Accuracy: 99.18%


**Analysis**:

We see that the training accuracy greatly exceeds the test accuracy from epoch 5. This model is over-fitting. We also have 250k+ parameters. We will need to reduce the parameters and optimize the model.

In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [2]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # CONVOLUTION BLOCK #1
        self.convblock1 = nn.Sequential( # 28x28 > 28x28 | RF 3 | jout=1
          nn.Conv2d(1, 32, 3, padding=1), 
          nn.ReLU()
        )   
        self.convblock2 = nn.Sequential( # 28x28 > 28x28 | RF 5 | jout=1
          nn.Conv2d(32, 64, 3, padding=1), 
          nn.ReLU()
        )
        self.convblock3 = nn.Sequential( # 28x28 > 28x28 | RF 7 | jout=1
          nn.Conv2d(64, 128, 3, padding=1), 
          nn.ReLU()
        )
        # TRANSITIONAL BLOCK #1        
        self.pool1 = nn.MaxPool2d(2, 2) # 28x28 > 14x14 | RF 9 | jout=1
        self.convblock4 = nn.Sequential( # 14x14 > 14x14 | RF 9 | jout=2
          nn.Conv2d(128, 32, 1),
          nn.ReLU()
        )
        # CONVOLUTION BLOCK #2
        self.convblock5 = nn.Sequential( # 14x14 > 14x14 | RF 13 | jout=2
          nn.Conv2d(32, 64, 3, padding=1),
          nn.ReLU()
        )
        self.convblock6 = nn.Sequential( # 14x14 > 14x14 | RF 17 | jout=2
          nn.Conv2d(64, 128, 3, padding=1),
          nn.ReLU()
        )
        # TRANSITIONAL BLOCK #2    
        self.pool2 = nn.MaxPool2d(2, 2) # 14x14 > 7x7 | RF 19 | jout=2
        self.convblock7 = nn.Sequential( # 7x7 > 7x7 | RF 19 | jout=3
          nn.Conv2d(128, 32, 1),
          nn.ReLU()
        )
        #CONVOLUTIONAL BLOCK #3
        self.convblock8 = nn.Sequential( #7x7 > 5x5 | RF 25 | jout=3
            nn.Conv2d(32, 32, 3),
            nn.ReLU()
        )
        self.convblock9 = nn.Sequential( #5x5 > 3x3 | RF 31 | jout=3
            nn.Conv2d(32, 64, 3),
            nn.ReLU()
        )
        self.convblock10 = nn.Sequential( #3x3 > 1x1 | RF 37 | jout=3
            nn.Conv2d(64, 64, 3),
            nn.ReLU()
        )
        self.convblock11 = nn.Sequential( #1x1 > 1x1 | RF 37 | jout=3
            nn.Conv2d(64, 10, 1),
        )



    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        x = self.pool1(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.pool2(x)
        x = self.convblock7(x)
        x = self.convblock8(x)
        x = self.convblock9(x)
        x = self.convblock10(x)
        x = self.convblock11(x)
      
        x = x.view(-1, 10)
        return F.log_softmax(x)

In [3]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 32, 28, 28]             320
              ReLU-2           [-1, 32, 28, 28]               0
            Conv2d-3           [-1, 64, 28, 28]          18,496
              ReLU-4           [-1, 64, 28, 28]               0
            Conv2d-5          [-1, 128, 28, 28]          73,856
              ReLU-6          [-1, 128, 28, 28]               0
         MaxPool2d-7          [-1, 128, 14, 14]               0
            Conv2d-8           [-1, 32, 14, 14]           4,128
              ReLU-9           [-1, 32, 14, 14]               0
           Conv2d-10           [-1, 64, 14, 14]          18,496
             ReLU-11           [-1, 64, 14, 14]               0
           Conv2d-12          [-1, 128, 14, 14]          73,856
    

  return F.log_softmax(x)


In [4]:


torch.manual_seed(1)
batch_size = 64

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=batch_size, shuffle=True, **kwargs)


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw



In [5]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer, epoch):
    model.train()
    pbar = tqdm(train_loader)
    correct = 0
    processed = 0

    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

        y_pred = model(data)
        pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()
        processed += len(data)

        pbar.set_description(desc= f'Epoch={epoch} Loss={loss.item()} batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}%')


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

In [6]:

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(1, 16):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

  return F.log_softmax(x)
Epoch=1 Loss=1.3924113512039185 batch_id=937 Accuracy=11.78%: 100%|██████████| 938/938 [00:26<00:00, 35.60it/s]



Test set: Average loss: 1.2818, Accuracy: 5603/10000 (56.03%)



Epoch=2 Loss=0.06642214208841324 batch_id=937 Accuracy=95.17%: 100%|██████████| 938/938 [00:22<00:00, 41.13it/s]



Test set: Average loss: 0.0563, Accuracy: 9806/10000 (98.06%)



Epoch=3 Loss=0.005162568762898445 batch_id=937 Accuracy=98.67%: 100%|██████████| 938/938 [00:23<00:00, 40.46it/s]



Test set: Average loss: 0.0477, Accuracy: 9852/10000 (98.52%)



Epoch=4 Loss=0.07617976516485214 batch_id=937 Accuracy=99.16%: 100%|██████████| 938/938 [00:23<00:00, 40.41it/s]



Test set: Average loss: 0.0389, Accuracy: 9876/10000 (98.76%)



Epoch=5 Loss=0.017995836213231087 batch_id=937 Accuracy=99.34%: 100%|██████████| 938/938 [00:23<00:00, 39.71it/s]



Test set: Average loss: 0.0413, Accuracy: 9874/10000 (98.74%)



Epoch=6 Loss=0.0050855474546551704 batch_id=937 Accuracy=99.48%: 100%|██████████| 938/938 [00:23<00:00, 40.30it/s]



Test set: Average loss: 0.0310, Accuracy: 9906/10000 (99.06%)



Epoch=7 Loss=0.00440952880308032 batch_id=937 Accuracy=99.59%: 100%|██████████| 938/938 [00:22<00:00, 40.80it/s]



Test set: Average loss: 0.0369, Accuracy: 9889/10000 (98.89%)



Epoch=8 Loss=0.014854561537504196 batch_id=937 Accuracy=99.71%: 100%|██████████| 938/938 [00:22<00:00, 41.10it/s]



Test set: Average loss: 0.0351, Accuracy: 9905/10000 (99.05%)



Epoch=9 Loss=0.00013227369345258921 batch_id=937 Accuracy=99.76%: 100%|██████████| 938/938 [00:23<00:00, 39.41it/s]



Test set: Average loss: 0.0341, Accuracy: 9900/10000 (99.00%)



Epoch=10 Loss=0.020665206015110016 batch_id=937 Accuracy=99.81%: 100%|██████████| 938/938 [00:22<00:00, 41.00it/s]



Test set: Average loss: 0.0353, Accuracy: 9900/10000 (99.00%)



Epoch=11 Loss=0.00038087277789600194 batch_id=937 Accuracy=99.85%: 100%|██████████| 938/938 [00:22<00:00, 41.30it/s]



Test set: Average loss: 0.0285, Accuracy: 9920/10000 (99.20%)



Epoch=12 Loss=0.0005236738361418247 batch_id=937 Accuracy=99.87%: 100%|██████████| 938/938 [00:22<00:00, 40.80it/s]



Test set: Average loss: 0.0375, Accuracy: 9914/10000 (99.14%)



Epoch=13 Loss=0.00023113461793400347 batch_id=937 Accuracy=99.88%: 100%|██████████| 938/938 [00:23<00:00, 40.69it/s]



Test set: Average loss: 0.0284, Accuracy: 9918/10000 (99.18%)



Epoch=14 Loss=2.557629341026768e-05 batch_id=937 Accuracy=99.92%: 100%|██████████| 938/938 [00:23<00:00, 40.43it/s]



Test set: Average loss: 0.0302, Accuracy: 9918/10000 (99.18%)



Epoch=15 Loss=0.005096742417663336 batch_id=937 Accuracy=99.89%: 100%|██████████| 938/938 [00:23<00:00, 40.36it/s]



Test set: Average loss: 0.0303, Accuracy: 9914/10000 (99.14%)

