## Target 

To create a basic skeleton of the model.

Parameters: Less than 10,000

Data Augmentations: None

Regularization: None

LR Scheduler: None

No. of Epochs: less than 20

In [2]:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torchsummary import summary


In [4]:
train = datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor()]))

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz


Extracting ../data/MNIST/raw/train-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../data/MNIST/raw/train-labels-idx1-ubyte.gz


Extracting ../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw/t10k-images-idx3-ubyte.gz


Extracting ../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting ../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../data/MNIST/raw




In [5]:
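# Raw uint8 images, shape (60000, 28, 28); `.data` replaces the deprecated `.train_data`
train_data = train.data
# ToTensor scales the uint8 values to [0, 1]; it also permutes the axes,
# which does not affect the global statistics printed below
train_data = train.transform(train_data.numpy())

print('[Train]')
print(' - Numpy Shape:', train.data.cpu().numpy().shape)
print(' - Tensor Shape:', train.data.size())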
print(' - min:', torch.min(train_data))
print(' - max:', torch.max(train_data))
print(' - mean:', torch.mean(train_data))
print(' - std:', torch.std(train_data))
print(' - var:', torch.var(train_data))



[Train]
 - Numpy Shape: (60000, 28, 28)
 - Tensor Shape: torch.Size([60000, 28, 28])
 - min: tensor(0.)
 - max: tensor(1.)
 - mean: tensor(0.1307)
 - std: tensor(0.3081)
 - var: tensor(0.0949)


In [6]:
torch.manual_seed(1)
batch_size = 64

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

train_data = datasets.MNIST('../data', train=True, download=True,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))]))
train_loader = torch.utils.data.DataLoader(train_data,batch_size=batch_size,
                                           shuffle=True, **kwargs)

test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))]))
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          shuffle=True, **kwargs)
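
The `Normalize((0.1307,), (0.3081,))` transform reuses the mean and std printed earlier and maps each pixel `x` to `(x - mean) / std`. A minimal sketch of that arithmetic in plain Python; the helper name `normalize` is illustrative and not part of torchvision:

```python
# Dataset statistics printed above (pixels already scaled to [0, 1]).
MEAN = 0.1307
STD = 0.3081


def normalize(pixel: float, mean: float = MEAN, std: float = STD) -> float:
    """Apply the same affine map that transforms.Normalize applies per channel."""
    return (pixel - mean) / std


darkest = normalize(0.0)    # background pixels
brightest = normalize(1.0)  # fully saturated stroke pixels

print(f"0.0 -> {darkest:.4f}")                 # 0.0 -> -0.4242
print(f"1.0 -> {brightest:.4f}")               # 1.0 -> 2.8215
print(f"variance from std: {STD ** 2:.4f}")    # variance from std: 0.0949
```

After normalization the inputs therefore lie roughly in [-0.42, 2.82], and the squared std agrees with the printed variance.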

### MODEL

In [7]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.layer1 = nn.Sequential(
            # 28x28x1 -> 28x28x4 (padding=1 keeps the spatial size)
            nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(num_features=4),
            nn.ReLU(),
            # 28x28x4 -> 28x28x8 (padding=1 keeps the spatial size)
            nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(num_features=8),
            nn.ReLU(),
            # 28x28x8 -> 14x14x8
            nn.MaxPool2d(kernel_size=2, stride=2)
        )


        self.layer2 =  nn.Sequential(
            # 14x14x8 -> 12x12x16
            nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, bias=False),
            nn.BatchNorm2d(num_features=16),
            nn.ReLU(),
            # 12x12x16 -> 10x10x20
            nn.Conv2d(in_channels=16, out_channels=20, kernel_size=3, bias=False),
            nn.BatchNorm2d(num_features=20),
            nn.ReLU(),
            # 10x10x20 -> 8x8x24
            nn.Conv2d(in_channels=20, out_channels=24, kernel_size=3, bias=False)
        )
        
        self.gap = nn.AdaptiveAvgPool2d(1)

        self.classifier = nn.Sequential(  # two linear layers on the 24 pooled features, no activation in between
            nn.Linear(in_features=24, out_features=32),
            nn.Linear(in_features=32, out_features=10)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.gap(x)
        x = x.view(-1,24)
        x = self.classifier(x)

        return F.log_softmax(x, dim=1)
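
The spatial size after each layer follows the usual rule `out = floor((in + 2*padding - kernel) / stride) + 1`. A minimal sketch that traces a 28x28 input through the layers of `Net`, with the kernel, stride, and padding values copied from the class definition; the helper `conv_out` is illustrative and not part of the notebook:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square conv or pooling window (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, out_channels), copied from Net.
layers = [
    ("conv1", 3, 1, 1, 4),
    ("conv2", 3, 1, 1, 8),
    ("maxpool", 2, 2, 0, 8),
    ("conv3", 3, 1, 0, 16),
    ("conv4", 3, 1, 0, 20),
    ("conv5", 3, 1, 0, 24),
]

size = 28  # MNIST images are 28x28
shapes = []
for name, kernel, stride, padding, channels in layers:
    size = conv_out(size, kernel, stride, padding)
    shapes.append((name, channels, size, size))
    print(f"{name:8s} -> {channels} x {size} x {size}")
```

The final 24 x 8 x 8 feature map is what the global average pooling layer reduces to 24 values per image.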

In [8]:
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 4, 28, 28]              36
       BatchNorm2d-2            [-1, 4, 28, 28]               8
              ReLU-3            [-1, 4, 28, 28]               0
            Conv2d-4            [-1, 8, 28, 28]             288
       BatchNorm2d-5            [-1, 8, 28, 28]              16
              ReLU-6            [-1, 8, 28, 28]               0
         MaxPool2d-7            [-1, 8, 14, 14]               0
            Conv2d-8           [-1, 16, 12, 12]           1,152
       BatchNorm2d-9           [-1, 16, 12, 12]              32
             ReLU-10           [-1, 16, 12, 12]               0
           Conv2d-11           [-1, 20, 10, 10]           2,880
      BatchNorm2d-12           [-1, 20, 10, 10]              40
             ReLU-13           [-1, 20, 10, 10]               0
           Conv2d-14             [-1, 2
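
Since the summary output is cut off, the parameter total can be cross-checked by hand from the layer definitions. A rough sketch, assuming the standard counts: a bias-free convolution has `in * out * k * k` weights, BatchNorm has a learnable scale and shift per channel, and a linear layer has `in * out + out` parameters. The helper names are illustrative only:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution with bias=False."""
    return c_in * c_out * k * k


def bn_params(features: int) -> int:
    """Learnable scale and shift per channel (running stats are buffers)."""
    return 2 * features


def linear_params(f_in: int, f_out: int) -> int:
    """Weight matrix plus bias vector."""
    return f_in * f_out + f_out


counts = {
    "conv1": conv_params(1, 4, 3),
    "bn1": bn_params(4),
    "conv2": conv_params(4, 8, 3),
    "bn2": bn_params(8),
    "conv3": conv_params(8, 16, 3),
    "bn3": bn_params(16),
    "conv4": conv_params(16, 20, 3),
    "bn4": bn_params(20),
    "conv5": conv_params(20, 24, 3),
    "fc1": linear_params(24, 32),
    "fc2": linear_params(32, 10),
}

total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:6s} {n:>6,}")
print(f"total  {total:>6,}")  # total   9,902
```

The hand count agrees with the 9,902 parameters reported in the results and stays under the 10,000 budget.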



### Training Model

In [9]:
from tqdm import tqdm

train_losses = []
test_losses = []
train_acc = []
test_acc = []

def train(model, device, train_loader, optimizer, epoch):
  model.train()
  pbar = tqdm(train_loader)
  correct = 0
  processed = 0
  for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)

    # Init
    optimizer.zero_grad()
    # PyTorch accumulates gradients across backward passes, so they must be
    # reset to zero before each backpropagation step. Otherwise the parameter
    # update would mix in gradients from earlier batches.

    # Predict
    y_pred = model(data)

    # Calculate loss
    loss = F.nll_loss(y_pred, target)
    train_losses.append(loss.item())  # store a Python float rather than the tensor

    # Backpropagation
    loss.backward()
    optimizer.step()

    # Update pbar-tqdm
    
    pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
    correct += pred.eq(target.view_as(pred)).sum().item()
    processed += len(data)

    pbar.set_description(desc= f'Loss={loss.item()} Batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}')
    train_acc.append(100*correct/processed)

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_losses.append(test_loss)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    
    test_acc.append(100. * correct / len(test_loader.dataset))
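
The reason `optimizer.zero_grad()` is called at the start of every training step can be seen on a single scalar parameter. A small sketch, separate from the notebook's training code, showing that a second `backward()` adds to the stored gradient instead of replacing it:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3.
(3 * w).backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one.
(3 * w).backward()
accumulated = w.grad.item()

# Zeroing the gradient buffer restores the expected value on the next pass.
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```

`optimizer.zero_grad()` performs this reset for every parameter the optimizer manages.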

In [11]:
model =  Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
EPOCHS = 20
for epoch in range(EPOCHS):
    print("EPOCH:", epoch)
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)

EPOCH: 0


Loss=0.12104622274637222 Batch_id=937 Accuracy=80.44: 100%|██████████| 938/938 [00:52<00:00, 17.88it/s]



Test set: Average loss: 0.1162, Accuracy: 9639/10000 (96.39%)

EPOCH: 1


Loss=0.0059590754099190235 Batch_id=937 Accuracy=97.31: 100%|██████████| 938/938 [00:52<00:00, 17.77it/s]



Test set: Average loss: 0.0923, Accuracy: 9693/10000 (96.93%)

EPOCH: 2


Loss=0.05572229623794556 Batch_id=937 Accuracy=97.94: 100%|██████████| 938/938 [00:53<00:00, 17.37it/s]



Test set: Average loss: 0.0605, Accuracy: 9800/10000 (98.00%)

EPOCH: 3


Loss=0.0696786567568779 Batch_id=937 Accuracy=98.29: 100%|██████████| 938/938 [00:53<00:00, 17.61it/s]



Test set: Average loss: 0.0452, Accuracy: 9860/10000 (98.60%)

EPOCH: 4


Loss=0.11043111979961395 Batch_id=937 Accuracy=98.49: 100%|██████████| 938/938 [01:00<00:00, 15.47it/s]



Test set: Average loss: 0.0560, Accuracy: 9823/10000 (98.23%)

EPOCH: 5


Loss=0.03366141393780708 Batch_id=937 Accuracy=98.56: 100%|██████████| 938/938 [01:07<00:00, 13.96it/s]



Test set: Average loss: 0.0409, Accuracy: 9870/10000 (98.70%)

EPOCH: 6


Loss=0.01512906327843666 Batch_id=937 Accuracy=98.75: 100%|██████████| 938/938 [01:04<00:00, 14.48it/s]



Test set: Average loss: 0.0446, Accuracy: 9866/10000 (98.66%)

EPOCH: 7


Loss=0.014441382139921188 Batch_id=937 Accuracy=98.75: 100%|██████████| 938/938 [01:03<00:00, 14.80it/s]



Test set: Average loss: 0.0571, Accuracy: 9832/10000 (98.32%)

EPOCH: 8


Loss=0.0056274766102433205 Batch_id=937 Accuracy=98.86: 100%|██████████| 938/938 [01:05<00:00, 14.29it/s]



Test set: Average loss: 0.0405, Accuracy: 9865/10000 (98.65%)

EPOCH: 9


Loss=0.002016461221501231 Batch_id=937 Accuracy=98.96: 100%|██████████| 938/938 [01:03<00:00, 14.86it/s]



Test set: Average loss: 0.0352, Accuracy: 9883/10000 (98.83%)

EPOCH: 10


Loss=0.06427132338285446 Batch_id=937 Accuracy=98.94: 100%|██████████| 938/938 [01:05<00:00, 14.37it/s]



Test set: Average loss: 0.0342, Accuracy: 9888/10000 (98.88%)

EPOCH: 11


Loss=0.0019323949236422777 Batch_id=937 Accuracy=98.99: 100%|██████████| 938/938 [01:02<00:00, 15.07it/s]



Test set: Average loss: 0.0395, Accuracy: 9880/10000 (98.80%)

EPOCH: 12


Loss=0.032635197043418884 Batch_id=937 Accuracy=99.08: 100%|██████████| 938/938 [01:01<00:00, 15.14it/s]



Test set: Average loss: 0.0309, Accuracy: 9901/10000 (99.01%)

EPOCH: 13


Loss=0.009749800898134708 Batch_id=937 Accuracy=99.12: 100%|██████████| 938/938 [01:03<00:00, 14.82it/s]



Test set: Average loss: 0.0343, Accuracy: 9893/10000 (98.93%)

EPOCH: 14


Loss=0.012785900384187698 Batch_id=937 Accuracy=99.14: 100%|██████████| 938/938 [00:59<00:00, 15.71it/s]



Test set: Average loss: 0.0334, Accuracy: 9902/10000 (99.02%)

EPOCH: 15


Loss=0.024288177490234375 Batch_id=937 Accuracy=99.11: 100%|██████████| 938/938 [01:03<00:00, 14.73it/s]



Test set: Average loss: 0.0363, Accuracy: 9900/10000 (99.00%)

EPOCH: 16


Loss=0.08218995481729507 Batch_id=937 Accuracy=99.12: 100%|██████████| 938/938 [01:01<00:00, 15.33it/s]



Test set: Average loss: 0.0328, Accuracy: 9901/10000 (99.01%)

EPOCH: 17


Loss=0.00522835087031126 Batch_id=937 Accuracy=99.20: 100%|██████████| 938/938 [00:58<00:00, 16.03it/s]



Test set: Average loss: 0.0578, Accuracy: 9835/10000 (98.35%)

EPOCH: 18


Loss=0.004864636342972517 Batch_id=937 Accuracy=99.23: 100%|██████████| 938/938 [00:57<00:00, 16.24it/s]



Test set: Average loss: 0.0290, Accuracy: 9912/10000 (99.12%)

EPOCH: 19


Loss=0.29570427536964417 Batch_id=937 Accuracy=99.28: 100%|██████████| 938/938 [00:57<00:00, 16.19it/s]



Test set: Average loss: 0.0355, Accuracy: 9889/10000 (98.89%)



### Results

Total Parameters used: 9,902

Best Train Accuracy: 99.28%

Best Test Accuracy: 99.12%

Consistency: None. Test accuracy never reached 99.4% in any epoch.
We still ran all 20 epochs to see whether the model would keep improving, but test accuracy does not go beyond 99.12%.
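
The consistency claim can be checked directly against the logged test results. A short sketch using the per-epoch correct counts copied from the training log; it assumes 99.4% is the target, as the note above suggests:

```python
# Correct predictions out of 10,000 per epoch, copied from the log above.
correct_per_epoch = [
    9639, 9693, 9800, 9860, 9823, 9870, 9866, 9832, 9865, 9883,
    9888, 9880, 9901, 9893, 9902, 9900, 9901, 9835, 9912, 9889,
]
TARGET_CORRECT = 9_940  # 99.4% of 10,000 (assumed target)
NINETY_NINE = 9_900     # 99.0% of 10,000

best_epoch = max(range(len(correct_per_epoch)), key=correct_per_epoch.__getitem__)
epochs_at_target = [e for e, c in enumerate(correct_per_epoch) if c >= TARGET_CORRECT]
epochs_at_99 = [e for e, c in enumerate(correct_per_epoch) if c >= NINETY_NINE]

print("best epoch:", best_epoch, "with", correct_per_epoch[best_epoch], "correct")
print("epochs at or above 99.4%:", epochs_at_target)
print("epochs at or above 99.0%:", epochs_at_99)
```

No epoch reaches the target, and the five epochs at or above 99.0% are not contiguous, since epochs 13 and 17 fall below it.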


Analysis:


With the base network architecture, we reached a best test accuracy of 99.12%, which is still short of the 99.4% target. We plan to improve the test accuracy with the following approaches:

Split the process into two different networks by varying the number of parameters.

Keep the same network architecture but remove the fully connected layers.
Reduce the number of parameters in the model.
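
To get a feel for what removing the fully connected layers could save, here is a rough parameter estimate. It assumes one hypothetical redesign that the notebook does not implement: the last convolution emits 10 channels, so global average pooling yields the class scores directly. Other designs would give different numbers.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k convolution with bias=False."""
    return c_in * c_out * k * k


def linear_params(f_in: int, f_out: int) -> int:
    """Weight matrix plus bias vector."""
    return f_in * f_out + f_out


CURRENT_TOTAL = 9_902  # from the results above

# Current head: last conv (20 -> 24) followed by Linear(24, 32) and Linear(32, 10).
current_head = conv_params(20, 24, 3) + linear_params(24, 32) + linear_params(32, 10)

# Hypothetical head (assumption): last conv maps 20 -> 10 channels, no linear layers.
fc_free_head = conv_params(20, 10, 3)

saved = current_head - fc_free_head
print("current head:", current_head)           # current head: 5450
print("FC-free head:", fc_free_head)           # FC-free head: 1800
print("new total:", CURRENT_TOTAL - saved)     # new total: 6252
```

Under that assumption, roughly 3,650 parameters would be freed, which could be spent on wider convolutions or left unused to shrink the model.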

