
### Target:
1. Introduce batch normalisation to improve performance
2. Avoid overfitting
3. Reduce the number of parameters

### Results:
Parameters: 7,216  
Best Train Accuracy: 99.03% (15th epoch)  
Best Test Accuracy: 99.28% (15th epoch)  

### Analysis:
1. In code 2 of this step, we reduce the number of input and output channels to cut the parameter count, and we add batch normalisation after the convolution layers. Batch normalisation makes training converge faster: Model 2 already reaches 98.12% test accuracy after the first epoch.
2. The code 2 model still overfits. Training accuracy keeps rising to 99.87% by epoch 15, while test accuracy plateaus around 99% (98.97% at epoch 15), so the gap between the two widens.
3. To reduce overfitting, code 3 introduces dropout with a rate of 0.05 (defined in model.py). With dropout, test accuracy at epoch 15 (99.37%) ends slightly above training accuracy (99.30%), so the overfitting gap is gone.
4. To bring the number of parameters under 8k, code 4 introduces a 1x1 convolution along with an additional convolution layer. A 1x1 convolution transforms the channel dimension of the input feature maps: it combines the channels at each pixel without looking at neighbouring pixels, so it can change the number of channels (here from 12 to 6) at very low parameter cost. The model now has 7,216 parameters, under the 8k target.
5. In code 4, training accuracy rises slowly and only crosses 99% in the last epoch (99.03%), while test accuracy peaks at 99.28% and does not go above 99.3%.
6. With the architectural changes planned for this step in place, the next step is to improve accuracy through image augmentation and learning-rate changes.
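Item 4 describes the 1x1 convolution as a change of channel dimension. The sketch below checks that idea numerically on random data (not MNIST): a 1x1 convolution from 12 to 6 channels (the sizes visible in the Model_4 summary) equals applying one 6x12 matrix to the channel vector at every pixel, and it needs far fewer parameters than a 3x3 convolution doing the same channel reduction. The helper `n_params` is illustrative, not part of the notebook's utils.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def n_params(module):
    # Total number of learnable parameters in a module
    return sum(p.numel() for p in module.parameters())


# A 1x1 convolution mapping 12 channels to 6, as in the Model_4 transition layer
conv1x1 = nn.Conv2d(12, 6, kernel_size=1)
x = torch.randn(2, 12, 28, 28)

with torch.no_grad():
    y = conv1x1(x)
    # Same computation as a per-pixel linear map over channels:
    # the weight has shape (6, 12, 1, 1); squeeze it to (6, 12) and apply it at every (h, w).
    w = conv1x1.weight.squeeze(-1).squeeze(-1)
    y_manual = torch.einsum("oc,nchw->nohw", w, x) + conv1x1.bias.view(1, -1, 1, 1)

print(y.shape)                                 # torch.Size([2, 6, 28, 28])
print(torch.allclose(y, y_manual, atol=1e-5))  # True

# Parameter cost of reducing 12 -> 6 channels
print(n_params(conv1x1))                          # 78  = 12*6 + 6
print(n_params(nn.Conv2d(12, 6, kernel_size=3)))  # 654 = 12*9*6 + 6
```

The 1x1 layer mixes channels only, which is why it is roughly eight times cheaper than a 3x3 layer with the same input and output channels.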

In [2]:
#!pip install torchsummary
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import transforms
from utils import data_loader, data_statistics, modelsummary, train_model, test_model
from model import Model_2, Model_3, Model_4  # the models used in this notebook

In [3]:
# CUDA?
cuda = torch.cuda.is_available()
print("CUDA Available?", cuda)
device = torch.device("cuda" if cuda else "cpu")

CUDA Available? True


In [4]:
# Train Phase transformations
train_transforms = transforms.Compose([
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.1307,), (0.3081,)) # MNIST mean and std; Normalize expects sequences, hence the trailing commas in the one-element tuples
                                       ])

# Test Phase transformations
test_transforms = transforms.Compose([
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.1307,), (0.3081,))
                                       ])


In [5]:
batch_size = 128
train_loader, train, test_loader, test = data_loader(train_transforms, test_transforms, batch_size)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 226405302.09it/s]
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 110626204.41it/s]
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 125844155.26it/s]
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 24114593.38it/s]
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

CUDA Available? True




#### Code 2 - Model 2 - Adding Batch Normalisation

In [6]:
model = Model_2().to(device)
modelsummary(model)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 4, 28, 28]              40
              ReLU-2            [-1, 4, 28, 28]               0
       BatchNorm2d-3            [-1, 4, 28, 28]               8
            Conv2d-4            [-1, 8, 28, 28]             296
              ReLU-5            [-1, 8, 28, 28]               0
       BatchNorm2d-6            [-1, 8, 28, 28]              16
            Conv2d-7           [-1, 12, 28, 28]             876
              ReLU-8           [-1, 12, 28, 28]               0
       BatchNorm2d-9           [-1, 12, 28, 28]              24
           Conv2d-10           [-1, 16, 28, 28]           1,744
             ReLU-11           [-1, 16, 28, 28]               0
      BatchNorm2d-12           [-1, 16, 28, 28]              32
        MaxPool2d-13           [-1, 16, 14, 14]               0
           Conv2d-14           [-1, 32,
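The visible rows of the Model_2 summary (the output is truncated after the MaxPool2d layer) are consistent with 3x3 convolutions with `padding=1`, each followed by ReLU and BatchNorm2d. A sketch that reproduces only those rows, assuming `kernel_size=3` and `padding=1` (model.py is not shown, so the real definitions may differ); `conv_relu_bn` and `model2_head` are hypothetical names:

```python
import torch
import torch.nn as nn


def conv_relu_bn(c_in, c_out):
    # Layer order as in the summary: Conv2d -> ReLU -> BatchNorm2d
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # padding=1 keeps 28x28
        nn.ReLU(),
        nn.BatchNorm2d(c_out),  # learnable gamma and beta: 2 * c_out parameters
    )


# Hypothetical reconstruction of the visible part of Model_2 only
model2_head = nn.Sequential(
    conv_relu_bn(1, 4),
    conv_relu_bn(4, 8),
    conv_relu_bn(8, 12),
    conv_relu_bn(12, 16),
    nn.MaxPool2d(2, 2),  # 28x28 -> 14x14
)

# Parameter counts of each Conv2d / BatchNorm2d, in order, to compare with the summary
counts = [sum(p.numel() for p in m.parameters())
          for m in model2_head.modules()
          if isinstance(m, (nn.Conv2d, nn.BatchNorm2d))]
print(counts)  # [40, 8, 296, 16, 876, 24, 1744, 32]

out = model2_head(torch.randn(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 16, 14, 14])
```

Each convolution costs (9 x input channels + 1) x output channels, and each BatchNorm2d adds only two parameters per channel, which is why batch normalisation barely changes the model size.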

In [7]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
EPOCHS = 15
for epoch in range(EPOCHS):
    print("EPOCH:", epoch+1)
    train_model(model, device, train_loader, optimizer, epoch)
    test_model(model, device, test_loader)

EPOCH: 1


Loss=0.05841495096683502 Batch_id=468 Accuracy=92.86: 100%|██████████| 469/469 [00:17<00:00, 27.01it/s]



Test set: Average loss: 0.0655, Accuracy: 9812/10000 (98.12%)

EPOCH: 2


Loss=0.05855032801628113 Batch_id=468 Accuracy=98.39: 100%|██████████| 469/469 [00:18<00:00, 26.04it/s]



Test set: Average loss: 0.0623, Accuracy: 9796/10000 (97.96%)

EPOCH: 3


Loss=0.06311937421560287 Batch_id=468 Accuracy=98.81: 100%|██████████| 469/469 [00:17<00:00, 26.85it/s]



Test set: Average loss: 0.0441, Accuracy: 9864/10000 (98.64%)

EPOCH: 4


Loss=0.010189794935286045 Batch_id=468 Accuracy=99.04: 100%|██████████| 469/469 [00:18<00:00, 25.91it/s]



Test set: Average loss: 0.0399, Accuracy: 9876/10000 (98.76%)

EPOCH: 5


Loss=0.0072691370733082294 Batch_id=468 Accuracy=99.23: 100%|██████████| 469/469 [00:16<00:00, 27.64it/s]



Test set: Average loss: 0.0353, Accuracy: 9880/10000 (98.80%)

EPOCH: 6


Loss=0.006372187286615372 Batch_id=468 Accuracy=99.31: 100%|██████████| 469/469 [00:18<00:00, 25.48it/s]



Test set: Average loss: 0.0315, Accuracy: 9889/10000 (98.89%)

EPOCH: 7


Loss=0.0034480856265872717 Batch_id=468 Accuracy=99.42: 100%|██████████| 469/469 [00:17<00:00, 27.54it/s]



Test set: Average loss: 0.0303, Accuracy: 9905/10000 (99.05%)

EPOCH: 8


Loss=0.04119519144296646 Batch_id=468 Accuracy=99.53: 100%|██████████| 469/469 [00:17<00:00, 26.33it/s]



Test set: Average loss: 0.0321, Accuracy: 9887/10000 (98.87%)

EPOCH: 9


Loss=0.01908346451818943 Batch_id=468 Accuracy=99.55: 100%|██████████| 469/469 [00:16<00:00, 27.70it/s]



Test set: Average loss: 0.0321, Accuracy: 9899/10000 (98.99%)

EPOCH: 10


Loss=0.02090498059988022 Batch_id=468 Accuracy=99.63: 100%|██████████| 469/469 [00:18<00:00, 26.04it/s]



Test set: Average loss: 0.0306, Accuracy: 9902/10000 (99.02%)

EPOCH: 11


Loss=0.0063971406780183315 Batch_id=468 Accuracy=99.70: 100%|██████████| 469/469 [00:17<00:00, 26.10it/s]



Test set: Average loss: 0.0286, Accuracy: 9919/10000 (99.19%)

EPOCH: 12


Loss=0.011632882058620453 Batch_id=468 Accuracy=99.78: 100%|██████████| 469/469 [00:17<00:00, 26.79it/s]



Test set: Average loss: 0.0290, Accuracy: 9900/10000 (99.00%)

EPOCH: 13


Loss=0.0035731634125113487 Batch_id=468 Accuracy=99.77: 100%|██████████| 469/469 [00:18<00:00, 25.86it/s]



Test set: Average loss: 0.0326, Accuracy: 9905/10000 (99.05%)

EPOCH: 14


Loss=0.009480108506977558 Batch_id=468 Accuracy=99.84: 100%|██████████| 469/469 [00:18<00:00, 24.99it/s]



Test set: Average loss: 0.0319, Accuracy: 9907/10000 (99.07%)

EPOCH: 15


Loss=0.0012384664732962847 Batch_id=468 Accuracy=99.87: 100%|██████████| 469/469 [00:17<00:00, 26.50it/s]



Test set: Average loss: 0.0332, Accuracy: 9897/10000 (98.97%)
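The `train_model` and `test_model` helpers come from utils.py, which is not shown. The logs (running training accuracy per epoch, then average test loss and accuracy) suggest a standard loop; the sketch below is one hypothetical version, assuming the models return log-probabilities so that `F.nll_loss` is the matching loss. The names `train_one_epoch` and `evaluate`, and the toy model and random data in the smoke test, are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, device, loader, optimizer):
    """Hypothetical stand-in for utils.train_model: one pass over the training set."""
    model.train()  # dropout active, BatchNorm uses batch statistics
    correct, seen = 0, 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)  # assumed to be log-probabilities
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        correct += (output.argmax(dim=1) == target).sum().item()
        seen += len(data)
    return 100.0 * correct / seen  # training accuracy in percent


@torch.no_grad()
def evaluate(model, device, loader):
    """Hypothetical stand-in for utils.test_model: average loss and accuracy."""
    model.eval()  # dropout off, BatchNorm uses running statistics
    total_loss, correct = 0.0, 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        total_loss += F.nll_loss(output, target, reduction="sum").item()
        correct += (output.argmax(dim=1) == target).sum().item()
    n = len(loader.dataset)
    return total_loss / n, 100.0 * correct / n


# Smoke test on random tensors shaped like MNIST (not real data)
cpu = torch.device("cpu")
toy = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
toy_data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16)
opt = optim.SGD(toy.parameters(), lr=0.01, momentum=0.9)

train_acc = train_one_epoch(toy, cpu, toy_loader, opt)
test_loss, test_acc = evaluate(toy, cpu, toy_loader)
print(f"train {train_acc:.2f}%  test loss {test_loss:.4f}  test {test_acc:.2f}%")
```

Switching between `model.train()` and `model.eval()` matters for these models, because both BatchNorm2d and Dropout behave differently in the two modes.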



#### Code 3 - Model 3 - Dropout introduced to reduce the overfitting seen in Model 2

In [8]:
model = Model_3().to(device)
modelsummary(model)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 4, 28, 28]              40
              ReLU-2            [-1, 4, 28, 28]               0
       BatchNorm2d-3            [-1, 4, 28, 28]               8
           Dropout-4            [-1, 4, 28, 28]               0
            Conv2d-5            [-1, 8, 28, 28]             296
              ReLU-6            [-1, 8, 28, 28]               0
       BatchNorm2d-7            [-1, 8, 28, 28]              16
           Dropout-8            [-1, 8, 28, 28]               0
            Conv2d-9           [-1, 12, 28, 28]             876
             ReLU-10           [-1, 12, 28, 28]               0
      BatchNorm2d-11           [-1, 12, 28, 28]              24
          Dropout-12           [-1, 12, 28, 28]               0
           Conv2d-13           [-1, 16, 28, 28]           1,744
             ReLU-14           [-1, 16,
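The visible rows of the Model_3 summary add a Dropout layer after each BatchNorm2d. The sketch below, again assuming 3x3 kernels with `padding=1` and the 0.05 rate from the analysis, shows two properties that matter here: dropout adds no parameters, and it only acts in training mode. Whether test accuracy is measured without dropout depends on `test_model` calling `model.eval()`; utils.py is not shown, so that is an assumption. `conv_relu_bn_drop` is a hypothetical name.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def conv_relu_bn_drop(c_in, c_out, p=0.05):
    # Layer order as in the visible Model_3 summary: Conv2d -> ReLU -> BatchNorm2d -> Dropout.
    # kernel_size=3 and padding=1 are assumptions consistent with the 28x28 outputs.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.BatchNorm2d(c_out),
        nn.Dropout(p),  # no learnable parameters
    )


block = conv_relu_bn_drop(1, 4)
n_block = sum(t.numel() for t in block.parameters())
print(n_block)  # 48 = 40 (Conv2d) + 8 (BatchNorm2d) + 0 (Dropout)

# Dropout on its own: in train mode each element is zeroed with probability p and the
# survivors are scaled by 1 / (1 - p); in eval mode the input passes through unchanged.
drop = nn.Dropout(0.05)
x = torch.ones(10_000)

drop.train()
y_train = drop(x)
drop.eval()
y_eval = drop(x)

print((y_train == 0).float().mean().item())  # fraction dropped, close to 0.05
print(torch.equal(y_eval, x))                 # True
```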

In [9]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
EPOCHS = 15
for epoch in range(EPOCHS):
    print("EPOCH:", epoch+1)
    train_model(model, device, train_loader, optimizer, epoch)
    test_model(model, device, test_loader)

EPOCH: 1


Loss=0.0690988227725029 Batch_id=468 Accuracy=89.52: 100%|██████████| 469/469 [00:19<00:00, 24.67it/s]



Test set: Average loss: 0.0980, Accuracy: 9685/10000 (96.85%)

EPOCH: 2


Loss=0.06392782181501389 Batch_id=468 Accuracy=97.81: 100%|██████████| 469/469 [00:17<00:00, 26.56it/s]



Test set: Average loss: 0.0867, Accuracy: 9714/10000 (97.14%)

EPOCH: 3


Loss=0.03283487632870674 Batch_id=468 Accuracy=98.37: 100%|██████████| 469/469 [00:18<00:00, 24.81it/s]



Test set: Average loss: 0.0529, Accuracy: 9835/10000 (98.35%)

EPOCH: 4


Loss=0.03550327941775322 Batch_id=468 Accuracy=98.58: 100%|██████████| 469/469 [00:17<00:00, 26.33it/s]



Test set: Average loss: 0.0505, Accuracy: 9843/10000 (98.43%)

EPOCH: 5


Loss=0.01300305500626564 Batch_id=468 Accuracy=98.78: 100%|██████████| 469/469 [00:18<00:00, 25.99it/s]



Test set: Average loss: 0.0345, Accuracy: 9897/10000 (98.97%)

EPOCH: 6


Loss=0.004844998009502888 Batch_id=468 Accuracy=98.91: 100%|██████████| 469/469 [00:17<00:00, 26.47it/s]



Test set: Average loss: 0.0333, Accuracy: 9899/10000 (98.99%)

EPOCH: 7


Loss=0.00875973328948021 Batch_id=468 Accuracy=98.99: 100%|██████████| 469/469 [00:17<00:00, 26.37it/s]



Test set: Average loss: 0.0290, Accuracy: 9910/10000 (99.10%)

EPOCH: 8


Loss=0.0042760116048157215 Batch_id=468 Accuracy=99.09: 100%|██████████| 469/469 [00:18<00:00, 24.81it/s]



Test set: Average loss: 0.0370, Accuracy: 9884/10000 (98.84%)

EPOCH: 9


Loss=0.025070438161492348 Batch_id=468 Accuracy=99.09: 100%|██████████| 469/469 [00:17<00:00, 27.38it/s]



Test set: Average loss: 0.0241, Accuracy: 9931/10000 (99.31%)

EPOCH: 10


Loss=0.026011643931269646 Batch_id=468 Accuracy=99.12: 100%|██████████| 469/469 [00:18<00:00, 25.24it/s]



Test set: Average loss: 0.0251, Accuracy: 9917/10000 (99.17%)

EPOCH: 11


Loss=0.01751176081597805 Batch_id=468 Accuracy=99.23: 100%|██████████| 469/469 [00:18<00:00, 26.04it/s]



Test set: Average loss: 0.0202, Accuracy: 9933/10000 (99.33%)

EPOCH: 12


Loss=0.007796097546815872 Batch_id=468 Accuracy=99.27: 100%|██████████| 469/469 [00:18<00:00, 25.85it/s]



Test set: Average loss: 0.0255, Accuracy: 9915/10000 (99.15%)

EPOCH: 13


Loss=0.017728567123413086 Batch_id=468 Accuracy=99.29: 100%|██████████| 469/469 [00:17<00:00, 26.48it/s]



Test set: Average loss: 0.0270, Accuracy: 9910/10000 (99.10%)

EPOCH: 14


Loss=0.002061249455437064 Batch_id=468 Accuracy=99.33: 100%|██████████| 469/469 [00:17<00:00, 26.89it/s]



Test set: Average loss: 0.0223, Accuracy: 9930/10000 (99.30%)

EPOCH: 15


Loss=0.0064957826398313046 Batch_id=468 Accuracy=99.30: 100%|██████████| 469/469 [00:19<00:00, 24.45it/s]



Test set: Average loss: 0.0209, Accuracy: 9937/10000 (99.37%)



#### Code 4 - Model 4 - Adding 1x1 Convolution for channel reduction

In [10]:
model = Model_4().to(device)
modelsummary(model)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1            [-1, 8, 28, 28]              80
              ReLU-2            [-1, 8, 28, 28]               0
       BatchNorm2d-3            [-1, 8, 28, 28]              16
           Dropout-4            [-1, 8, 28, 28]               0
            Conv2d-5           [-1, 12, 28, 28]             876
              ReLU-6           [-1, 12, 28, 28]               0
       BatchNorm2d-7           [-1, 12, 28, 28]              24
           Dropout-8           [-1, 12, 28, 28]               0
            Conv2d-9            [-1, 6, 28, 28]              78
        MaxPool2d-10            [-1, 6, 14, 14]               0
           Conv2d-11           [-1, 12, 12, 12]             660
             ReLU-12           [-1, 12, 12, 12]               0
      BatchNorm2d-13           [-1, 12, 12, 12]              24
          Dropout-14           [-1, 12,

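The visible rows of the Model_4 summary show the channel-reduction pattern: two 3x3 convolutions, a 1x1 convolution from 12 to 6 channels (with no ReLU or BatchNorm2d directly after it), then max pooling and an unpadded 3x3 convolution. Below is a hypothetical reconstruction of just those rows; the kernel sizes, padding and the 0.05 dropout rate are inferred from the output shapes and the analysis, not read from model.py.

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction of the visible rows of the Model_4 summary only
model4_head = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 80 params, 28x28 kept
    nn.ReLU(),
    nn.BatchNorm2d(8),
    nn.Dropout(0.05),                            # rate assumed, as in Model_3
    nn.Conv2d(8, 12, kernel_size=3, padding=1),  # 876 params
    nn.ReLU(),
    nn.BatchNorm2d(12),
    nn.Dropout(0.05),
    nn.Conv2d(12, 6, kernel_size=1),             # 1x1 transition: 12 -> 6 channels, 78 params
    nn.MaxPool2d(2, 2),                          # 28x28 -> 14x14
    nn.Conv2d(6, 12, kernel_size=3),             # no padding: 14x14 -> 12x12, 660 params
    nn.ReLU(),
    nn.BatchNorm2d(12),
    nn.Dropout(0.05),
)

counts = [sum(p.numel() for p in m.parameters())
          for m in model4_head
          if isinstance(m, (nn.Conv2d, nn.BatchNorm2d))]
print(counts)  # [80, 16, 876, 24, 78, 660, 24]

model4_head.eval()  # deterministic forward pass: dropout off, BatchNorm running stats
out4 = model4_head(torch.randn(1, 1, 28, 28))
print(out4.shape)  # torch.Size([1, 12, 12, 12])
```

Halving the channels to 6 before the next 3x3 convolution is what keeps that convolution at 660 parameters instead of the 1,308 it would need with 12 input channels.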


In [11]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
EPOCHS = 15
for epoch in range(EPOCHS):
    print("EPOCH:", epoch+1)
    train_model(model, device, train_loader, optimizer, epoch)
    test_model(model, device, test_loader)

EPOCH: 1


Loss=0.1019643023610115 Batch_id=468 Accuracy=87.49: 100%|██████████| 469/469 [00:17<00:00, 26.26it/s]



Test set: Average loss: 0.0814, Accuracy: 9774/10000 (97.74%)

EPOCH: 2


Loss=0.022983616217970848 Batch_id=468 Accuracy=97.51: 100%|██████████| 469/469 [00:17<00:00, 26.07it/s]



Test set: Average loss: 0.0477, Accuracy: 9853/10000 (98.53%)

EPOCH: 3


Loss=0.021356485784053802 Batch_id=468 Accuracy=98.11: 100%|██████████| 469/469 [00:17<00:00, 26.93it/s]



Test set: Average loss: 0.0568, Accuracy: 9819/10000 (98.19%)

EPOCH: 4


Loss=0.01935216784477234 Batch_id=468 Accuracy=98.27: 100%|██████████| 469/469 [00:17<00:00, 27.06it/s]



Test set: Average loss: 0.0422, Accuracy: 9864/10000 (98.64%)

EPOCH: 5


Loss=0.03629109635949135 Batch_id=468 Accuracy=98.51: 100%|██████████| 469/469 [00:17<00:00, 26.20it/s]



Test set: Average loss: 0.0301, Accuracy: 9910/10000 (99.10%)

EPOCH: 6


Loss=0.006311654578894377 Batch_id=468 Accuracy=98.56: 100%|██████████| 469/469 [00:17<00:00, 26.59it/s]



Test set: Average loss: 0.0294, Accuracy: 9907/10000 (99.07%)

EPOCH: 7


Loss=0.07758203893899918 Batch_id=468 Accuracy=98.73: 100%|██████████| 469/469 [00:18<00:00, 25.46it/s]



Test set: Average loss: 0.0327, Accuracy: 9898/10000 (98.98%)

EPOCH: 8


Loss=0.005623415112495422 Batch_id=468 Accuracy=98.75: 100%|██████████| 469/469 [00:17<00:00, 26.93it/s]



Test set: Average loss: 0.0285, Accuracy: 9915/10000 (99.15%)

EPOCH: 9


Loss=0.1387949436903 Batch_id=468 Accuracy=98.81: 100%|██████████| 469/469 [00:18<00:00, 25.10it/s]



Test set: Average loss: 0.0320, Accuracy: 9893/10000 (98.93%)

EPOCH: 10


Loss=0.08624774217605591 Batch_id=468 Accuracy=98.80: 100%|██████████| 469/469 [00:17<00:00, 26.85it/s]



Test set: Average loss: 0.0284, Accuracy: 9908/10000 (99.08%)

EPOCH: 11


Loss=0.03920818492770195 Batch_id=468 Accuracy=98.77: 100%|██████████| 469/469 [00:18<00:00, 25.50it/s]



Test set: Average loss: 0.0306, Accuracy: 9909/10000 (99.09%)

EPOCH: 12


Loss=0.12004590779542923 Batch_id=468 Accuracy=98.94: 100%|██████████| 469/469 [00:17<00:00, 26.93it/s]



Test set: Average loss: 0.0265, Accuracy: 9915/10000 (99.15%)

EPOCH: 13


Loss=0.017539622262120247 Batch_id=468 Accuracy=98.93: 100%|██████████| 469/469 [00:18<00:00, 25.65it/s]



Test set: Average loss: 0.0304, Accuracy: 9920/10000 (99.20%)

EPOCH: 14


Loss=0.09943025559186935 Batch_id=468 Accuracy=98.93: 100%|██████████| 469/469 [00:18<00:00, 25.04it/s]



Test set: Average loss: 0.0302, Accuracy: 9913/10000 (99.13%)

EPOCH: 15


Loss=0.023786209523677826 Batch_id=468 Accuracy=99.03: 100%|██████████| 469/469 [00:18<00:00, 25.95it/s]



Test set: Average loss: 0.0236, Accuracy: 9928/10000 (99.28%)
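The analysis reads overfitting off the gap between training and test accuracy. A short script that tabulates, for the three models, the best test accuracy and the final-epoch train minus test gap, using the numbers copied from the logs above (no new runs):

```python
# Per-epoch accuracies (%) copied from the training logs in this notebook (epochs 1-15)
logs = {
    "Model_2": {
        "train": [92.86, 98.39, 98.81, 99.04, 99.23, 99.31, 99.42, 99.53,
                  99.55, 99.63, 99.70, 99.78, 99.77, 99.84, 99.87],
        "test": [98.12, 97.96, 98.64, 98.76, 98.80, 98.89, 99.05, 98.87,
                 98.99, 99.02, 99.19, 99.00, 99.05, 99.07, 98.97],
    },
    "Model_3": {
        "train": [89.52, 97.81, 98.37, 98.58, 98.78, 98.91, 98.99, 99.09,
                  99.09, 99.12, 99.23, 99.27, 99.29, 99.33, 99.30],
        "test": [96.85, 97.14, 98.35, 98.43, 98.97, 98.99, 99.10, 98.84,
                 99.31, 99.17, 99.33, 99.15, 99.10, 99.30, 99.37],
    },
    "Model_4": {
        "train": [87.49, 97.51, 98.11, 98.27, 98.51, 98.56, 98.73, 98.75,
                  98.81, 98.80, 98.77, 98.94, 98.93, 98.93, 99.03],
        "test": [97.74, 98.53, 98.19, 98.64, 99.10, 99.07, 98.98, 99.15,
                 98.93, 99.08, 99.09, 99.15, 99.20, 99.13, 99.28],
    },
}

summary = {}
for name, acc in logs.items():
    best = max(range(len(acc["test"])), key=lambda i: acc["test"][i])
    summary[name] = {
        "best_test": acc["test"][best],
        "best_epoch": best + 1,                            # epochs are 1-based in the logs
        "final_gap": acc["train"][-1] - acc["test"][-1],   # > 0 means train above test
    }
    s = summary[name]
    print(f"{name}: best test {s['best_test']:.2f}% (epoch {s['best_epoch']}), "
          f"epoch-15 train-test gap {s['final_gap']:+.2f} points")
```

Model_2 ends with a +0.90 point gap, while Models 3 and 4 finish with test accuracy slightly above training accuracy (gaps of -0.07 and -0.25 points).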

