### Target

Parameters: Less than 10,000

Data Augmentations: Yes

Regularization: Yes

LR Scheduler: Yes

No. of Epochs: 20

In [2]:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchsummary import summary

In [3]:
train = datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor()]))

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw





In [3]:
torch.manual_seed(1)
batch_size = 64

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

train_data = datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,)),
                        transforms.RandomRotation(7)]))
train_loader = torch.utils.data.DataLoader(train_data,batch_size=batch_size,
                                           shuffle=True, **kwargs)

test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))]))
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          shuffle=True, **kwargs)
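As a quick check of what `transforms.Normalize((0.1307,), (0.3081,))` does to a single pixel, the per-channel operation is `(x - mean) / std`. The sketch below reproduces that arithmetic in plain Python; the helper name `normalize_pixel` is made up for illustration and is not part of torchvision. One likely consequence for the training pipeline: since `RandomRotation` runs after `Normalize` and its default `fill` is 0, the corners exposed by a rotation would receive 0 rather than the normalized background value of roughly -0.42.

```python
MEAN, STD = 0.1307, 0.3081  # values passed to transforms.Normalize in the cell above


def normalize_pixel(x: float) -> float:
    """Apply (x - mean) / std to one pixel in [0, 1] (hypothetical helper)."""
    return (x - MEAN) / STD


background = normalize_pixel(0.0)  # black MNIST background
stroke = normalize_pixel(1.0)      # brightest stroke pixel
print(f"background -> {background:.4f}, stroke -> {stroke:.4f}")
```

If the mismatch matters, one option is to apply the rotation before `ToTensor()`, where the fill value 0 equals the raw background; this would change the training data slightly, so the results reported in this notebook would need to be re-measured.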



In [4]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.layer1 = nn.Sequential(
            # 28x28x1 -> 28x28x4 (padding=1 keeps the spatial size)
            nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(num_features=4),
            nn.ReLU(),
            nn.Dropout(0.043),
            # 28x28x4 -> 28x28x8 (padding=1 keeps the spatial size)
            nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(num_features=8),
            nn.ReLU(),
            nn.Dropout(0.043),
            # 28x28x8 -> 14x14x8
            nn.MaxPool2d(kernel_size=2, stride=2)
        )


        self.layer2 = nn.Sequential(
            # 14x14x8 -> 12x12x16
            nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, bias=False),
            nn.BatchNorm2d(num_features=16),
            nn.ReLU(),
            nn.Dropout(0.043),
            # 12x12x16 -> 10x10x20
            nn.Conv2d(in_channels=16, out_channels=20, kernel_size=3, bias=False),
            nn.BatchNorm2d(num_features=20),
            nn.ReLU(),
            nn.Dropout(0.043),
            # 10x10x20 -> 8x8x24
            nn.Conv2d(in_channels=20, out_channels=24, kernel_size=3, bias=False),
            nn.BatchNorm2d(num_features=24),
            nn.ReLU(),
            nn.Dropout(0.043),
            # 8x8x24 -> 8x8x10 (1x1 convolution maps channels to the 10 classes)
            nn.Conv2d(in_channels=24, out_channels=10, kernel_size=1, bias=False)
        )

        self.gap = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.gap(x)
        x = x.view(-1,10)

        return F.log_softmax(x, dim=1)
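The spatial size after each layer of `Net` can be traced with the standard output-size formula `(size + 2*padding - kernel) // stride + 1`. The sketch below is plain Python and does not call PyTorch; `conv_out` and the `ops` table are illustrative names, with the kernel, padding and stride values copied from the class definition. Because the two convolutions in `layer1` use `padding=1`, they keep the 28x28 size, which agrees with the `summary` output (28x28 for `Conv2d-1`, 14x14 after `MaxPool2d-9`, 12x12 for `Conv2d-10`).

```python
def conv_out(size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Output size of a convolution or pooling window along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (label, kernel, padding, stride) for each spatial op in Net, in forward order
ops = [
    ("layer1 conv 1->4", 3, 1, 1),
    ("layer1 conv 4->8", 3, 1, 1),
    ("maxpool 2x2", 2, 0, 2),
    ("layer2 conv 8->16", 3, 0, 1),
    ("layer2 conv 16->20", 3, 0, 1),
    ("layer2 conv 20->24", 3, 0, 1),
    ("layer2 conv 24->10 (1x1)", 1, 0, 1),
]

size = 28  # MNIST images are 28x28
sizes = []
for label, kernel, padding, stride in ops:
    size = conv_out(size, kernel, padding, stride)
    sizes.append(size)
    print(f"{label:26s} -> {size}x{size}")
```

The final 8x8x10 feature map is then reduced to 1x1x10 by `nn.AdaptiveAvgPool2d(1)`, which is why `x.view(-1, 10)` works without a fully connected layer.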

In [5]:
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 4, 28, 28]              36
       BatchNorm2d-2            [-1, 4, 28, 28]               8
              ReLU-3            [-1, 4, 28, 28]               0
           Dropout-4            [-1, 4, 28, 28]               0
            Conv2d-5            [-1, 8, 28, 28]             288
       BatchNorm2d-6            [-1, 8, 28, 28]              16
              ReLU-7            [-1, 8, 28, 28]               0
           Dropout-8            [-1, 8, 28, 28]               0
         MaxPool2d-9            [-1, 8, 14, 14]               0
           Conv2d-10           [-1, 16, 12, 12]           1,152
      BatchNorm2d-11           [-1, 16, 12, 12]              32
             ReLU-12           [-1, 16, 12, 12]               0
          Dropout-13           [-1, 16, 12, 12]               0
           Conv2d-14           [-1, 20,



In [10]:
from tqdm import tqdm
def train(model, device, train_loader, optimizer):
    model.train()
    pbar = tqdm(train_loader)
    train_loss = 0
    correct = 0
    for batch_idx, (data, target) in enumerate(pbar):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()

        pbar.set_description(desc= f'loss={loss.item()} batch_id={batch_idx}')

    train_loss /= len(train_loader)
    accuracy = 100. * correct / len(train_loader.dataset)

    return train_loss, accuracy




def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    accuracy = 100. * correct / len(test_loader.dataset)

    return test_loss, accuracy
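Note that the two functions average the loss differently: `train` averages the per-batch mean losses over the number of batches, while `test` sums per-sample losses (`reduction='sum'`) and divides by the dataset size. The two agree only when every batch has the same size. With 60,000 training images and a batch size of 64, the last of the 938 batches holds 32 samples, so the reported train loss is a close approximation rather than an exact per-sample mean. The sketch below illustrates the difference with made-up loss values.

```python
# Made-up per-sample losses: 10 samples, batch size 4 -> batches of 4, 4 and 2
losses = [0.9, 0.1, 0.4, 0.2, 0.3, 0.5, 0.7, 0.1, 2.0, 1.0]
batch_size = 4
batches = [losses[i:i + batch_size] for i in range(0, len(losses), batch_size)]

# Style used in train(): average of the per-batch mean losses
mean_of_batch_means = sum(sum(b) / len(b) for b in batches) / len(batches)

# Style used in test(): total loss divided by the number of samples
per_sample_mean = sum(losses) / len(losses)

# Size of the final, smaller MNIST training batch at batch size 64
last_batch_size = 60000 % 64

print(mean_of_batch_means, per_sample_mean, last_batch_size)
```

In the toy data the small final batch is over-weighted, which is why the two numbers differ noticeably; in the actual run one short batch out of 938 has very little effect.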

In [11]:
from torch.optim.lr_scheduler import StepLR

model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size= 8, gamma= 0.25)

train_acc_list = []
test_acc_list = []
train_loss_list = []
test_loss_list = []

for epoch in range(1, 20):  # runs epochs 1 through 19 (19 epochs in total)
    print(f'Epoch: {epoch}')
    train_loss, train_acc = train(model, device, train_loader, optimizer)
    test_loss, test_acc = test(model, device, test_loader)

    train_acc_list.append(train_acc)
    test_acc_list.append(test_acc)
    train_loss_list.append(train_loss)
    test_loss_list.append(test_loss)

    scheduler.step()
    
    print('\nTRAIN set: Average loss: {:.4f}, Train Accuracy: {:.2f}%'.format(train_loss,train_acc))
    print('TEST set: Average loss: {:.4f}, Test Accuracy: {:.2f}%'.format(test_loss,test_acc))
    print('~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~')

Epoch: 1


loss=0.08287771791219711 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.36it/s]



TRAIN set: Average loss: 0.5742, Train Accuracy: 84.85%
TEST set: Average loss: 0.2220, Test Accuracy: 92.73%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 2


loss=0.1025698110461235 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.31it/s]



TRAIN set: Average loss: 0.1067, Train Accuracy: 97.19%
TEST set: Average loss: 0.0596, Test Accuracy: 98.29%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 3


loss=0.03588679060339928 batch_id=937: 100%|██████████| 938/938 [00:58<00:00, 16.16it/s]



TRAIN set: Average loss: 0.0797, Train Accuracy: 97.67%
TEST set: Average loss: 0.0455, Test Accuracy: 98.62%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 4


loss=0.01926301047205925 batch_id=937: 100%|██████████| 938/938 [00:56<00:00, 16.50it/s]



TRAIN set: Average loss: 0.0654, Train Accuracy: 98.06%
TEST set: Average loss: 0.0421, Test Accuracy: 98.73%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 5


loss=0.06255189329385757 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.39it/s]



TRAIN set: Average loss: 0.0610, Train Accuracy: 98.22%
TEST set: Average loss: 0.0462, Test Accuracy: 98.49%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 6


loss=0.08191831409931183 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.31it/s]



TRAIN set: Average loss: 0.0540, Train Accuracy: 98.42%
TEST set: Average loss: 0.0400, Test Accuracy: 98.66%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 7


loss=0.06497959792613983 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.26it/s]



TRAIN set: Average loss: 0.0512, Train Accuracy: 98.45%
TEST set: Average loss: 0.0299, Test Accuracy: 99.00%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 8


loss=0.007342031225562096 batch_id=937: 100%|██████████| 938/938 [00:58<00:00, 16.15it/s]



TRAIN set: Average loss: 0.0473, Train Accuracy: 98.59%
TEST set: Average loss: 0.0293, Test Accuracy: 99.06%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 9


loss=0.06644164770841599 batch_id=937: 100%|██████████| 938/938 [00:58<00:00, 15.99it/s]



TRAIN set: Average loss: 0.0384, Train Accuracy: 98.83%
TEST set: Average loss: 0.0235, Test Accuracy: 99.26%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 10


loss=0.012741222977638245 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.24it/s]



TRAIN set: Average loss: 0.0364, Train Accuracy: 98.91%
TEST set: Average loss: 0.0234, Test Accuracy: 99.25%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 11


loss=0.08595049381256104 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.30it/s]



TRAIN set: Average loss: 0.0372, Train Accuracy: 98.88%
TEST set: Average loss: 0.0245, Test Accuracy: 99.21%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 12


loss=0.049455881118774414 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.25it/s]



TRAIN set: Average loss: 0.0364, Train Accuracy: 98.89%
TEST set: Average loss: 0.0219, Test Accuracy: 99.31%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 13


loss=0.008962673135101795 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.37it/s]



TRAIN set: Average loss: 0.0354, Train Accuracy: 98.96%
TEST set: Average loss: 0.0223, Test Accuracy: 99.34%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 14


loss=0.11442120373249054 batch_id=937: 100%|██████████| 938/938 [00:58<00:00, 16.10it/s]



TRAIN set: Average loss: 0.0362, Train Accuracy: 98.93%
TEST set: Average loss: 0.0216, Test Accuracy: 99.34%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 15


loss=0.004402080550789833 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.29it/s]



TRAIN set: Average loss: 0.0349, Train Accuracy: 98.92%
TEST set: Average loss: 0.0226, Test Accuracy: 99.31%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 16


loss=0.03236202150583267 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.29it/s]



TRAIN set: Average loss: 0.0345, Train Accuracy: 98.94%
TEST set: Average loss: 0.0216, Test Accuracy: 99.31%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 17


loss=0.04030097648501396 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.23it/s]



TRAIN set: Average loss: 0.0320, Train Accuracy: 99.03%
TEST set: Average loss: 0.0205, Test Accuracy: 99.36%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 18


loss=0.009964847937226295 batch_id=937: 100%|██████████| 938/938 [00:57<00:00, 16.18it/s]



TRAIN set: Average loss: 0.0322, Train Accuracy: 99.04%
TEST set: Average loss: 0.0202, Test Accuracy: 99.37%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Epoch: 19


loss=0.05894472822546959 batch_id=937: 100%|██████████| 938/938 [00:58<00:00, 16.05it/s]



TRAIN set: Average loss: 0.0313, Train Accuracy: 99.06%
TEST set: Average loss: 0.0199, Test Accuracy: 99.36%
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


### Achievements

Total Parameters used: 9,060


Best Train Accuracy: 99.06%

Best Test Accuracy: 99.37%

Consistency: Test accuracy stays at or above 99.2% from the 9th epoch onward, peaking at 99.37% in the 18th epoch

Data Augmentation: Random Rotation of 7 degrees.

Learning Rate Scheduler: StepLR used
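The 9,060 parameter total can be reproduced by hand from the layer definitions in `Net`. A bias-free `Conv2d` has `in_channels * out_channels * kernel_size^2` weights, and each `BatchNorm2d` has one learnable scale and one learnable shift per feature. The sketch below is plain Python arithmetic with the layer sizes copied from the model cell; the variable names are illustrative.

```python
# (in_channels, out_channels, kernel_size) for each bias-free Conv2d in Net
convs = [(1, 4, 3), (4, 8, 3), (8, 16, 3), (16, 20, 3), (20, 24, 3), (24, 10, 1)]
# num_features for each BatchNorm2d (one weight and one bias per feature)
batch_norms = [4, 8, 16, 20, 24]

conv_params = sum(cin * cout * k * k for cin, cout, k in convs)
bn_params = sum(2 * n for n in batch_norms)
total_params = conv_params + bn_params

print(f"conv: {conv_params}, batchnorm: {bn_params}, total: {total_params}")
```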

### Analysis

The StepLR scheduler helped stabilize training by cutting the learning rate to 25% of its current value every 8 epochs (`step_size=8`, `gamma=0.25`).

Why every 8 epochs? We observed that the loss started bouncing up and down around the 6th/7th epoch, so reducing the LR shortly after that point made training more stable.
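To make the schedule concrete, the learning rate per epoch can be worked out from the values passed to `StepLR` in the training cell (`lr=0.01`, `step_size=8`, `gamma=0.25`). The sketch below is plain Python that mirrors the closed form `base_lr * gamma ** (steps // step_size)` rather than calling PyTorch; the function name `step_lr` is made up, and it assumes `scheduler.step()` is called once at the end of every epoch, as in the training loop.

```python
def step_lr(base_lr: float, gamma: float, step_size: int, epoch: int) -> float:
    """Learning rate in effect during 1-indexed training epoch `epoch`.

    Assumes the scheduler is stepped once after each epoch, so (epoch - 1)
    scheduler steps have happened before this epoch starts.
    """
    return base_lr * gamma ** ((epoch - 1) // step_size)


# Epochs 1..19, matching the training log
schedule = {epoch: step_lr(0.01, 0.25, 8, epoch) for epoch in range(1, 20)}
for epoch, lr in schedule.items():
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

Under these assumptions the first drop takes effect at epoch 9, which lines up with the fall in test loss from 0.0293 (epoch 8) to 0.0235 (epoch 9) in the log.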

The Random Rotation was applied with ±7 degrees.