
### Targets 


To create a model with a Receptive Field greater than 28

Parameters: Less than 8,000

Data Augmentations: RandomRotation ±15°

Regularization: DropOut

LR Scheduler: StepLR

No. of Epochs: 20

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchsummary import summary

In [2]:

torch.manual_seed(1)
batch_size = 64

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

train_data = datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.RandomRotation((-15.0,15.0), fill=(1,)),
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))]))
train_loader = torch.utils.data.DataLoader(train_data,batch_size=batch_size,
                                           shuffle=True, **kwargs)

test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))]))
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          shuffle=True, **kwargs)
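
As a rough illustration of what `transforms.Normalize((0.1307,), (0.3081,))` does after `ToTensor()` has scaled each pixel to the range [0, 1], the sketch below applies the same `(x - mean) / std` arithmetic in plain Python. The helper name `normalize_pixel` is made up for this example; the mean and standard deviation are the values used in the cell above.

```python
MNIST_MEAN = 0.1307  # mean passed to transforms.Normalize above
MNIST_STD = 0.3081   # std passed to transforms.Normalize above


def normalize_pixel(value, mean=MNIST_MEAN, std=MNIST_STD):
    """Hypothetical helper: per-pixel form of Normalize, (x - mean) / std."""
    return (value - mean) / std


background = normalize_pixel(0.0)  # a black background pixel
stroke = normalize_pixel(1.0)      # a full-intensity stroke pixel

print(f"background pixel -> {background:.4f}")
print(f"stroke pixel     -> {stroke:.4f}")
```

Background pixels end up at roughly -0.42 and full-intensity strokes at roughly 2.82, so the network sees inputs centred near zero rather than raw intensities.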

In [3]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.layer1 = nn.Sequential(
            # 28x28x1 -> 28x28x8
            nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=8),
            # 28x28x8 -> 28x28x8
            nn.Conv2d(in_channels=8, out_channels=8, kernel_size=3, padding=1,bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=8),
            # 28x28x8 -> 14x14x8
            nn.MaxPool2d(kernel_size=2, stride=2)
        )


        self.layer2 =  nn.Sequential(
            # 14x14x8 -> 12x12x12
            nn.Conv2d(in_channels=8, out_channels=12, kernel_size=3, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=12),
            # 12x12x12 -> 10x10x12
            nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=12),
            # 10x10x12 -> 8x8x12
            nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=12),
        )

        self.layer3 = nn.Sequential(
            # 8x8x12 -> 6x6x12
            nn.Conv2d(in_channels=12, out_channels=12, kernel_size=3, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=12),
            # 6x6x12 -> 4x4x16
            nn.Conv2d(in_channels=12, out_channels=16, kernel_size=3, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(num_features=16)
        )
        
        self.gap = nn.AdaptiveAvgPool2d(1)

        self.classifier = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=24, kernel_size=1),
            nn.Conv2d(in_channels=24, out_channels=10, kernel_size=1)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.gap(x)
        x = self.classifier(x)
        x = x.view(-1,10)

        return F.log_softmax(x, dim=1)
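
The spatial sizes noted in the comments of `Net` can be checked with the usual output-size formula `(size + 2*padding - kernel) // stride + 1`. The following is a minimal sketch in plain Python; the layer labels such as `layer1.conv1` are only names chosen for this example, and the kernel, stride, and padding values are read off the class above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square conv/pool layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (label, kernel, stride, padding) in the order the layers appear in Net.
# The labels are illustrative only.
LAYERS = [
    ("layer1.conv1", 3, 1, 1),
    ("layer1.conv2", 3, 1, 1),
    ("layer1.maxpool", 2, 2, 0),
    ("layer2.conv1", 3, 1, 0),
    ("layer2.conv2", 3, 1, 0),
    ("layer2.conv3", 3, 1, 0),
    ("layer3.conv1", 3, 1, 0),
    ("layer3.conv2", 3, 1, 0),
]


def trace_sizes(input_size=28):
    """Return (label, side length) after each layer for a square input."""
    sizes = []
    size = input_size
    for label, kernel, stride, padding in LAYERS:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((label, size))
    return sizes


for label, size in trace_sizes():
    print(f"{label:15s} -> {size}x{size}")
```

The trace ends at a 4x4 feature map, which is what `nn.AdaptiveAvgPool2d(1)` then collapses to a single value per channel.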

In [4]:

model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 28, 28]              72
              ReLU-2            [-1, 8, 28, 28]               0
       BatchNorm2d-3            [-1, 8, 28, 28]              16
            Conv2d-4            [-1, 8, 28, 28]             576
              ReLU-5            [-1, 8, 28, 28]               0
       BatchNorm2d-6            [-1, 8, 28, 28]              16
         MaxPool2d-7            [-1, 8, 14, 14]               0
            Conv2d-8           [-1, 12, 12, 12]             864
              ReLU-9           [-1, 12, 12, 12]               0
      BatchNorm2d-10           [-1, 12, 12, 12]              24
           Conv2d-11           [-1, 12, 10, 10]           1,296
             ReLU-12           [-1, 12, 10, 10]               0
      BatchNorm2d-13           [-1, 12, 10, 10]              24
           Conv2d-14             [-1, 1

In [5]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer):
    model.train()
    pbar = tqdm(train_loader)
    train_loss = 0
    correct = 0
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()

        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx}')

    train_loss /= len(train_loader)
    accuracy = 100. * correct / len(train_loader.dataset)

    return train_loss, accuracy




def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)

    return test_loss, accuracy
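
Note that the two functions average the loss differently: `train` adds up per-batch mean losses and divides by the number of batches, while `test` sums per-sample losses and divides by the dataset size. The two agree only when every batch has the same size. The sketch below illustrates the difference with made-up loss values; the function names and the toy numbers are assumptions for this example only.

```python
def mean_of_batch_means(batches):
    """Average the per-batch mean losses (the way train() accumulates)."""
    return sum(sum(b) / len(b) for b in batches) / len(batches)


def per_sample_mean(batches):
    """Sum every sample loss, then divide by the sample count (the way test() does)."""
    total = sum(sum(b) for b in batches)
    count = sum(len(b) for b in batches)
    return total / count


# Made-up per-sample losses: two full batches of 4 and a short final batch of 1.
toy_batches = [[0.1, 0.1, 0.1, 0.1], [0.2, 0.2, 0.2, 0.2], [0.9]]

print(f"mean of batch means: {mean_of_batch_means(toy_batches):.4f}")
print(f"per-sample mean:     {per_sample_mean(toy_batches):.4f}")
```

With 60,000 training images and a batch size of 64, the last training batch holds only 32 samples, so the reported training loss weights those samples slightly more than the rest. The effect is small here, but it is one reason the train and test losses are not strictly comparable.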

In [6]:
from torch.optim.lr_scheduler import StepLR

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size= 8, gamma= 0.25)

train_acc_list = []
test_acc_list = []
train_loss_list = []
test_loss_list = []

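# NOTE: range(1, 20) runs epochs 1 through 19 (19 epochs); range(1, 21) would give the 20 listed in Targets.
for epoch in range(1, 20):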
    print(f'Epoch: {epoch}')
    train_loss, train_acc = train(model, device, train_loader, optimizer)
    test_loss, test_acc = test(model, device, test_loader)

    train_acc_list.append(train_acc)
    test_acc_list.append(test_acc)
    train_loss_list.append(train_loss)
    test_loss_list.append(test_loss)

    scheduler.step()
    
    print('\nTRAIN set: Average loss: {:.4f}, Train Accuracy: {:.2f}%'.format(train_loss,train_acc))
    print('TEST set: Average loss: {:.4f}, Test Accuracy: {:.2f}%'.format(test_loss,test_acc))

Epoch: 1


loss=0.2629401981830597 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.76it/s]



TRAIN set: Average loss: 0.2684, Train Accuracy: 92.16%
TEST set: Average loss: 0.0525, Test Accuracy: 98.42%
Epoch: 2


loss=0.23987913131713867 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.31it/s]



TRAIN set: Average loss: 0.0748, Train Accuracy: 97.68%
TEST set: Average loss: 0.0417, Test Accuracy: 98.65%
Epoch: 3


loss=0.023082271218299866 batch_id=937: 100%|██████████| 938/938 [00:31<00:00, 30.23it/s]



TRAIN set: Average loss: 0.0603, Train Accuracy: 98.15%
TEST set: Average loss: 0.0322, Test Accuracy: 99.03%
Epoch: 4


loss=0.028511736541986465 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.32it/s]



TRAIN set: Average loss: 0.0535, Train Accuracy: 98.31%
TEST set: Average loss: 0.0285, Test Accuracy: 99.19%
Epoch: 5


loss=0.01973925158381462 batch_id=937: 100%|██████████| 938/938 [00:31<00:00, 29.58it/s]



TRAIN set: Average loss: 0.0480, Train Accuracy: 98.47%
TEST set: Average loss: 0.0279, Test Accuracy: 99.05%
Epoch: 6


loss=0.1930864304304123 batch_id=937: 100%|██████████| 938/938 [00:31<00:00, 30.02it/s]



TRAIN set: Average loss: 0.0448, Train Accuracy: 98.61%
TEST set: Average loss: 0.0222, Test Accuracy: 99.34%
Epoch: 7


loss=0.08689243346452713 batch_id=937: 100%|██████████| 938/938 [00:31<00:00, 29.55it/s]



TRAIN set: Average loss: 0.0426, Train Accuracy: 98.68%
TEST set: Average loss: 0.0324, Test Accuracy: 99.09%
Epoch: 8


loss=0.18967634439468384 batch_id=937: 100%|██████████| 938/938 [00:31<00:00, 29.89it/s]



TRAIN set: Average loss: 0.0387, Train Accuracy: 98.78%
TEST set: Average loss: 0.0243, Test Accuracy: 99.24%
Epoch: 9


loss=0.057829007506370544 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.63it/s]



TRAIN set: Average loss: 0.0306, Train Accuracy: 98.98%
TEST set: Average loss: 0.0179, Test Accuracy: 99.49%
Epoch: 10


loss=0.004349968861788511 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.56it/s]



TRAIN set: Average loss: 0.0271, Train Accuracy: 99.15%
TEST set: Average loss: 0.0185, Test Accuracy: 99.45%
Epoch: 11


loss=0.012460101395845413 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.98it/s]



TRAIN set: Average loss: 0.0260, Train Accuracy: 99.19%
TEST set: Average loss: 0.0180, Test Accuracy: 99.53%
Epoch: 12


loss=0.0015444201417267323 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.69it/s]



TRAIN set: Average loss: 0.0255, Train Accuracy: 99.20%
TEST set: Average loss: 0.0182, Test Accuracy: 99.48%
Epoch: 13


loss=0.1198858767747879 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.60it/s]



TRAIN set: Average loss: 0.0269, Train Accuracy: 99.16%
TEST set: Average loss: 0.0173, Test Accuracy: 99.49%
Epoch: 14


loss=0.0015916537959128618 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.67it/s]



TRAIN set: Average loss: 0.0250, Train Accuracy: 99.26%
TEST set: Average loss: 0.0180, Test Accuracy: 99.48%
Epoch: 15


loss=0.005620303563773632 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.61it/s]



TRAIN set: Average loss: 0.0247, Train Accuracy: 99.20%
TEST set: Average loss: 0.0192, Test Accuracy: 99.43%
Epoch: 16


loss=0.006961610168218613 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.58it/s]



TRAIN set: Average loss: 0.0244, Train Accuracy: 99.25%
TEST set: Average loss: 0.0181, Test Accuracy: 99.52%
Epoch: 17


loss=0.006842421367764473 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.56it/s]



TRAIN set: Average loss: 0.0224, Train Accuracy: 99.30%
TEST set: Average loss: 0.0169, Test Accuracy: 99.49%
Epoch: 18


loss=0.07261206954717636 batch_id=937: 100%|██████████| 938/938 [00:30<00:00, 30.67it/s]



TRAIN set: Average loss: 0.0223, Train Accuracy: 99.31%
TEST set: Average loss: 0.0172, Test Accuracy: 99.48%
Epoch: 19


loss=0.0008975503733381629 batch_id=937: 100%|██████████| 938/938 [00:31<00:00, 29.70it/s]



TRAIN set: Average loss: 0.0213, Train Accuracy: 99.36%
TEST set: Average loss: 0.0173, Test Accuracy: 99.47%
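
The learning rate behind each epoch in the log above can be reconstructed from the `StepLR` settings (`lr=0.01`, `step_size=8`, `gamma=0.25`). The sketch below uses the closed form `base_lr * gamma ** (steps // step_size)` in plain Python rather than calling PyTorch; the helper names are hypothetical, and it assumes `scheduler.step()` is called once at the end of every epoch, as in the training loop.

```python
def step_lr(base_lr, step_size, gamma, steps_taken):
    """Closed-form StepLR value after `steps_taken` calls to scheduler.step()."""
    return base_lr * gamma ** (steps_taken // step_size)


def lr_for_epoch(epoch, base_lr=0.01, step_size=8, gamma=0.25):
    """Learning rate in effect while training the 1-indexed `epoch`."""
    # Before epoch n starts, the scheduler has been stepped n - 1 times.
    return step_lr(base_lr, step_size, gamma, epoch - 1)


for epoch in range(1, 20):
    print(f"epoch {epoch:2d}: lr = {lr_for_epoch(epoch):.6f}")
```

Under these assumptions the rate is 0.01 for epochs 1 to 8, 0.0025 for epochs 9 to 16, and 0.000625 for epochs 17 to 19. The first drop lines up with the fall in training loss between epoch 8 and epoch 9 in the log.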


### Achievements

Total Parameters Used: 7,946

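Train Accuracy: 99.36% (final epoch)

Test Accuracy: 99.52% (epoch 16; the log peaks at 99.53% in epoch 11)

Consistent From: 10th Epoch to End (test accuracy stays between 99.43% and 99.53%)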

Data Augmentation: Random Rotation of ±15°

LR Scheduler: StepLR
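
The parameter total can be checked by hand from the layer definitions in `Net`. The sketch below is a plain-Python tally, not the `torchsummary` output: the helper names are made up for this example, the 3x3 convolutions have `bias=False`, the two 1x1 classifier convolutions keep their default bias, and each `BatchNorm2d` contributes a learnable scale and shift per channel.

```python
def conv_params(in_ch, out_ch, kernel, bias):
    """Weights of a square Conv2d, plus one bias per output channel if enabled."""
    weights = in_ch * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def bn_params(channels):
    """BatchNorm2d learnable parameters: scale and shift per channel."""
    return 2 * channels


# (in_channels, out_channels, kernel_size, bias, followed_by_batchnorm)
CONVS = [
    (1, 8, 3, False, True),
    (8, 8, 3, False, True),
    (8, 12, 3, False, True),
    (12, 12, 3, False, True),
    (12, 12, 3, False, True),
    (12, 12, 3, False, True),
    (12, 16, 3, False, True),
    (16, 24, 1, True, False),   # classifier, first 1x1 conv
    (24, 10, 1, True, False),   # classifier, second 1x1 conv
]


def total_params():
    """Sum conv and batch-norm parameters over every layer in CONVS."""
    total = 0
    for in_ch, out_ch, kernel, bias, has_bn in CONVS:
        total += conv_params(in_ch, out_ch, kernel, bias)
        if has_bn:
            total += bn_params(out_ch)
    return total


print(total_params())  # 7946
```

The tally matches the 7,946 reported above and stays under the 8,000-parameter target.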

### Analysis

The increase in receptive field may have helped: the model crossed the 99.5% test-accuracy barrier (99.53% at epoch 11, 99.52% at epoch 16) and is even more stable now.
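
The receptive field itself can be estimated with the standard recurrence `rf += (kernel - 1) * jump; jump *= stride`. The sketch below is a theoretical estimate that ignores padding effects at the image border. It treats `AdaptiveAvgPool2d(1)` as a 4x4 average pool, on the assumption that the final feature map is 4x4 as noted in the comments of `Net`; the function and variable names are chosen for this example.

```python
def receptive_field(layers):
    """Theoretical receptive field for a chain of (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) input steps
        jump *= stride             # striding spreads later layers further apart
    return rf


# (kernel, stride) for the conv and pooling layers of Net, in order.
CONV_STACK = [
    (3, 1), (3, 1),  # layer1 convs
    (2, 2),          # layer1 max pool
    (3, 1), (3, 1), (3, 1),  # layer2 convs
    (3, 1), (3, 1),  # layer3 convs
]
# Assumption: global average pooling over the final 4x4 map acts like a 4x4 pool.
GAP = [(4, 1)]

print("after last 3x3 conv:", receptive_field(CONV_STACK))
print("after global pooling:", receptive_field(CONV_STACK + GAP))
```

Under these assumptions the stack reaches a receptive field of 26 after the last 3x3 convolution and 32 after global average pooling, so the target of exceeding 28 is met at the pooling step. The 1x1 classifier convolutions leave the value unchanged.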