## Coding Drill 01

## 1. Target

- Get the Setup Right
- Reach a reasonable accuracy; don't worry about the `<=10K` parameter budget yet
- Make sure the data loaders are working
- Set up the basic training loop
- Create a Baseline Model
- 15 Epochs

## 2. Result

- Params: `13,808`
- Best Train Accuracy (Epoch 15): `99.22%`
- Best Test Accuracy (Epoch 10): `99.32%`

## 3. Analysis

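- The model is not overfitting (test accuracy matches or exceeds train accuracy in most epochs), but at `13,808` params it is still above the `10K` target, so the parameter count has to come down
- Train accuracy trails test accuracy, i.e. the model is slightly underfitting. BatchNorm and Dropout are already in the model because we know they help this network, and Dropout is only active during training, which likely accounts for part of the gap
- Data augmentation is the next thing to try to improve the model
- Add another layer after the GAP, possibly to capture more features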

## Import Libraries

In [1]:
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

## Data Transformations

In [2]:
# Train Phase transformations
train_transforms = transforms.Compose([
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.1307,), (0.3081,)) # The mean and std have to be sequences (e.g., tuples), therefore you should add a comma after the values. 
                                       # Note the difference between (0.1307) and (0.1307,)
                                       ])

# Test Phase transformations
test_transforms = transforms.Compose([
                                       transforms.ToTensor(),
                                       transforms.Normalize((0.1307,), (0.3081,))
                                       ])
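The trailing-comma remark in the train transforms is easy to check in plain Python. The sketch below also spells out the per-pixel arithmetic that normalization applies, `(x - mean) / std`. The helper `normalize` is a hypothetical stand-in written for illustration; it is not the torchvision implementation.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = (0.1307)
a_tuple = (0.1307,)

print(type(not_a_tuple).__name__)  # float
print(type(a_tuple).__name__)      # tuple


def normalize(pixel, mean=0.1307, std=0.3081):
    """Hypothetical helper: the per-pixel arithmetic (x - mean) / std."""
    return (pixel - mean) / std


# After ToTensor, pixel values lie in [0, 1].
black = normalize(0.0)  # darkest pixel maps to a negative value
white = normalize(1.0)  # brightest pixel maps to a value above 1
print(round(black, 4), round(white, 4))
```

With these constants the input range `[0, 1]` is mapped to roughly `[-0.42, 2.82]`, so the network no longer sees strictly non-negative inputs.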



## Creating Train/Test Split

In [3]:
train = datasets.MNIST('./data', train=True, download=True, transform=train_transforms)
test = datasets.MNIST('./data', train=False, download=True, transform=test_transforms)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw



## Dataloader Arguments & Test/Train Dataloaders

In [4]:
SEED = 1

# CUDA?
cuda = torch.cuda.is_available()
print("CUDA Available?", cuda)

# For reproducibility
torch.manual_seed(SEED)

if cuda:
    torch.cuda.manual_seed(SEED)

# Pinned memory speeds up host-to-device transfers. If you load your samples in the Dataset on the CPU and push them to the GPU during training, enabling pin_memory speeds up that copy.
# It lets the DataLoader allocate the samples in page-locked memory, which makes the transfer faster.

# dataloader arguments - something you would typically fetch from the command line
dataloader_args = dict(shuffle=True, batch_size=128, num_workers=4, pin_memory=True) if cuda else dict(shuffle=True, batch_size=64)

# train dataloader
train_loader = torch.utils.data.DataLoader(train, **dataloader_args)

# test dataloader
test_loader = torch.utils.data.DataLoader(test, **dataloader_args)

CUDA Available? True
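The batch size fixes how many steps one epoch takes. As a quick sanity check on the loaders, the sketch below computes the expected number of batches, assuming the standard MNIST split of 60,000 training and 10,000 test images. The helper `num_batches` is hypothetical and only mirrors the usual ceiling-division rule.

```python
import math


def num_batches(dataset_size, batch_size, drop_last=False):
    """Hypothetical helper: batches per epoch for a given batch size."""
    if drop_last:
        # An incomplete final batch is discarded.
        return dataset_size // batch_size
    # Otherwise the final, smaller batch is still yielded.
    return math.ceil(dataset_size / batch_size)


train_batches = num_batches(60_000, 128)
test_batches = num_batches(10_000, 128)

# Size of the final training batch when 60,000 is not a multiple of 128.
last_train_batch = 60_000 - (train_batches - 1) * 128

print(train_batches, test_batches, last_train_batch)
```

With `batch_size=128` this gives 469 training batches, which matches the `469/469` progress counter in the training log further down.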




## Model

In [5]:
import torch.nn.functional as F
dropout_value = 0.1
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input Block
        self.convblock1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 26

        # CONVOLUTION BLOCK 1
        self.convblock2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Dropout(dropout_value)
        ) # output_size = 24

        # TRANSITION BLOCK 1
        self.convblock3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
        ) # output_size = 24
        self.pool1 = nn.MaxPool2d(2, 2) # output_size = 12

        # CONVOLUTION BLOCK 2
        self.convblock4 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 10
        self.convblock5 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 8
        self.convblock6 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 6
        self.convblock7 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=1, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 6
        
        # OUTPUT BLOCK
        self.gap = nn.Sequential(
            nn.AvgPool2d(kernel_size=6)
        ) # output_size = 1

        self.convblock8 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
        ) 

        self.dropout = nn.Dropout(dropout_value)

    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        x = self.pool1(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.convblock7(x)
        x = self.gap(x)        
        x = self.convblock8(x)

        x = x.view(-1, 10)
        return F.log_softmax(x, dim=-1)
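The `output_size` comments in `Net` can be reproduced with the usual convolution arithmetic, `out = (in + 2 * padding - kernel) // stride + 1`. The sketch below traces a 28x28 input through the layers. The kernel, padding and stride values are read off the model above; the pooling layers use a stride equal to their kernel size. `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, padding, stride), taken from the Net definition.
layers = [
    ("convblock1", 3, 0, 1),
    ("convblock2", 3, 0, 1),
    ("convblock3", 1, 0, 1),
    ("pool1", 2, 0, 2),
    ("convblock4", 3, 0, 1),
    ("convblock5", 3, 0, 1),
    ("convblock6", 3, 0, 1),
    ("convblock7", 3, 1, 1),  # padding=1 keeps the size unchanged
    ("gap", 6, 0, 6),
]

size = 28  # MNIST images are 28x28
sizes = []
for name, kernel, padding, stride in layers:
    size = conv_out(size, kernel, padding, stride)
    sizes.append(size)
    print(f"{name}: {size}x{size}")
```

The final 1x1 map is what makes `x.view(-1, 10)` valid after `convblock8`: each image is left with exactly 10 values.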

## Model Params

In [6]:
!pip install torchsummary
from torchsummary import summary
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
print(device)
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
cuda
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 26, 26]             144
              ReLU-2           [-1, 16, 26, 26]               0
       BatchNorm2d-3           [-1, 16, 26, 26]              32
           Dropout-4           [-1, 16, 26, 26]               0
            Conv2d-5           [-1, 32, 24, 24]           4,608
              ReLU-6           [-1, 32, 24, 24]               0
       BatchNorm2d-7           [-1, 32, 24, 24]              64
           Dropout-8           [-1, 32, 24, 24]               0
            Conv2d-9           [-1, 10, 24, 24]             320
        MaxPool2d-10           [-1, 10, 12, 12]               0
           Conv2d-11           [-1, 16, 10, 10]           1,440
             ReLU-12           [-1, 16, 10, 10]               0
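The parameter total can be cross-checked by hand from the layer definitions in `Net`. The sketch below assumes what the model code shows: every convolution uses `bias=False`, and each `BatchNorm2d` contributes two trainable values per channel (scale and shift). The helper names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution without bias."""
    return c_in * c_out * k * k


def bn_params(channels):
    """Trainable scale and shift of a BatchNorm2d layer."""
    return 2 * channels


blocks = [
    ("convblock1", conv_params(1, 16, 3) + bn_params(16)),
    ("convblock2", conv_params(16, 32, 3) + bn_params(32)),
    ("convblock3", conv_params(32, 10, 1)),
    ("convblock4", conv_params(10, 16, 3) + bn_params(16)),
    ("convblock5", conv_params(16, 16, 3) + bn_params(16)),
    ("convblock6", conv_params(16, 16, 3) + bn_params(16)),
    ("convblock7", conv_params(16, 16, 3) + bn_params(16)),
    ("convblock8", conv_params(16, 10, 1)),
]

total = sum(count for _, count in blocks)
for name, count in blocks:
    print(f"{name}: {count:>6,} ({100 * count / total:.1f}%)")
print(f"total: {total:,}")
```

The total agrees with the `13,808` reported in the results. `convblock2` alone holds about a third of the parameters, so it is one plausible place to start trimming toward the `10K` target.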

## Training and Testing

In [7]:
# Plain tqdm draws a text progress bar; `tqdm.auto` is the variant that picks the notebook widget automatically
from tqdm import tqdm, trange

train_losses = []
test_losses = []
train_acc = []
test_acc = []

def train(model, device, train_loader, optimizer, epoch):
  model.train()
  pbar = tqdm(train_loader)#, ncols="80%")
  correct = 0
  processed = 0
  for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)

    # Init
    optimizer.zero_grad()
    # PyTorch accumulates gradients across backward passes, so they must be
    # zeroed before each backpropagation step. Otherwise the parameter update
    # would mix in gradients from earlier batches.

    # Predict
    y_pred = model(data)

    # Calculate loss
    loss = F.nll_loss(y_pred, target)
    # Store a plain float; appending the tensor itself would keep its autograd graph alive
    train_losses.append(loss.item())

    # Backpropagation
    loss.backward()
    optimizer.step()
    
    # get the index of the max log-probability
    pred = y_pred.argmax(dim=1, keepdim=True)
    correct += pred.eq(target.view_as(pred)).sum().item()
    processed += len(data)

    # Update pbar-tqdm
    pbar.set_description(desc= f'Loss={loss.item()} Batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}')
    train_acc.append(100*correct/processed)

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # get the index of the max log-probability
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_losses.append(test_loss)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    
    test_acc.append(100. * correct / len(test_loader.dataset))
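The model returns log-probabilities (`F.log_softmax`), and both `train` and `test` feed them to `F.nll_loss`. The pure-Python sketch below shows what that pair computes on a tiny batch. It is an illustration of the arithmetic only, not PyTorch's implementation, and the logits and targets are made-up values.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll_loss(log_probs_batch, targets, reduction="mean"):
    """Negative log-likelihood of the target class for each sample."""
    losses = [-row[t] for row, t in zip(log_probs_batch, targets)]
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total


# Hypothetical batch: 2 samples, 3 classes.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]

log_probs = [log_softmax(row) for row in batch_logits]
mean_loss = nll_loss(log_probs, targets)                  # what train() uses
sum_loss = nll_loss(log_probs, targets, reduction="sum")  # what test() uses
print(mean_loss, sum_loss)
```

Summing in `test` and dividing once by `len(test_loader.dataset)` gives an exact per-sample average even when the last batch is smaller than the rest, which a mean of per-batch means would not.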

In [8]:
model = Net().to(device) ## move the model to device
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9) #Stochastic gradient descent optimizer with model params, learning rate and momentum

num_epoch=15 #defining the epochs
for epoch in range(1, num_epoch+1):
  print('\nEpoch {} : '.format(epoch))
  train(model, device, train_loader, optimizer, epoch) #Training the model
  test(model, device, test_loader) #Testing the model
     
     


Epoch 1 : 


Loss=0.11453437805175781 Batch_id=468 Accuracy=86.94: 100%|██████████| 469/469 [00:16<00:00, 27.84it/s]



Test set: Average loss: 0.0557, Accuracy: 9852/10000 (98.52%)


Epoch 2 : 


Loss=0.05503518879413605 Batch_id=468 Accuracy=97.90: 100%|██████████| 469/469 [00:15<00:00, 31.01it/s]



Test set: Average loss: 0.0367, Accuracy: 9894/10000 (98.94%)


Epoch 3 : 


Loss=0.026262899860739708 Batch_id=468 Accuracy=98.35: 100%|██████████| 469/469 [00:14<00:00, 32.94it/s]



Test set: Average loss: 0.0311, Accuracy: 9905/10000 (99.05%)


Epoch 4 : 


Loss=0.05979044362902641 Batch_id=468 Accuracy=98.62: 100%|██████████| 469/469 [00:14<00:00, 31.74it/s]



Test set: Average loss: 0.0314, Accuracy: 9903/10000 (99.03%)


Epoch 5 : 


Loss=0.025595083832740784 Batch_id=468 Accuracy=98.76: 100%|██████████| 469/469 [00:14<00:00, 32.68it/s]



Test set: Average loss: 0.0313, Accuracy: 9903/10000 (99.03%)


Epoch 6 : 


Loss=0.0671892836689949 Batch_id=468 Accuracy=98.77: 100%|██████████| 469/469 [00:15<00:00, 30.96it/s]



Test set: Average loss: 0.0278, Accuracy: 9902/10000 (99.02%)


Epoch 7 : 


Loss=0.05315271019935608 Batch_id=468 Accuracy=98.91: 100%|██████████| 469/469 [00:14<00:00, 31.93it/s]



Test set: Average loss: 0.0247, Accuracy: 9926/10000 (99.26%)


Epoch 8 : 


Loss=0.025773227214813232 Batch_id=468 Accuracy=98.94: 100%|██████████| 469/469 [00:14<00:00, 32.75it/s]



Test set: Average loss: 0.0254, Accuracy: 9917/10000 (99.17%)


Epoch 9 : 


Loss=0.012830372899770737 Batch_id=468 Accuracy=99.00: 100%|██████████| 469/469 [00:14<00:00, 32.30it/s]



Test set: Average loss: 0.0224, Accuracy: 9928/10000 (99.28%)


Epoch 10 : 


Loss=0.020759159699082375 Batch_id=468 Accuracy=99.09: 100%|██████████| 469/469 [00:14<00:00, 32.44it/s]



Test set: Average loss: 0.0231, Accuracy: 9932/10000 (99.32%)


Epoch 11 : 


Loss=0.04422030225396156 Batch_id=468 Accuracy=99.17: 100%|██████████| 469/469 [00:14<00:00, 32.49it/s]



Test set: Average loss: 0.0228, Accuracy: 9926/10000 (99.26%)


Epoch 12 : 


Loss=0.021690120920538902 Batch_id=468 Accuracy=99.13: 100%|██████████| 469/469 [00:14<00:00, 32.59it/s]



Test set: Average loss: 0.0215, Accuracy: 9925/10000 (99.25%)


Epoch 13 : 


Loss=0.0053223371505737305 Batch_id=468 Accuracy=99.15: 100%|██████████| 469/469 [00:14<00:00, 32.41it/s]



Test set: Average loss: 0.0244, Accuracy: 9913/10000 (99.13%)


Epoch 14 : 


Loss=0.030227696523070335 Batch_id=468 Accuracy=99.13: 100%|██████████| 469/469 [00:14<00:00, 32.41it/s]



Test set: Average loss: 0.0274, Accuracy: 9912/10000 (99.12%)


Epoch 15 : 


Loss=0.021011067554354668 Batch_id=468 Accuracy=99.22: 100%|██████████| 469/469 [00:15<00:00, 30.24it/s]



Test set: Average loss: 0.0223, Accuracy: 9930/10000 (99.30%)
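To tie the log back to the Result section, the sketch below copies the per-epoch accuracies from the output above and picks out the best epochs. The list names and the `best_epoch` helper are hypothetical; in the notebook itself, `test_acc` holds the per-epoch test values, while `train_acc` is appended once per batch.

```python
# Accuracies (%) copied from the training log above, epochs 1 to 15.
train_acc_by_epoch = [86.94, 97.90, 98.35, 98.62, 98.76, 98.77, 98.91, 98.94,
                      99.00, 99.09, 99.17, 99.13, 99.15, 99.13, 99.22]
test_acc_by_epoch = [98.52, 98.94, 99.05, 99.03, 99.03, 99.02, 99.26, 99.17,
                     99.28, 99.32, 99.26, 99.25, 99.13, 99.12, 99.30]


def best_epoch(values):
    """Return (1-based epoch, value) of the first maximum."""
    best = max(values)
    return values.index(best) + 1, best


best_train = best_epoch(train_acc_by_epoch)
best_test = best_epoch(test_acc_by_epoch)

# Epochs in which train accuracy is ahead of test accuracy.
epochs_train_ahead = [
    epoch
    for epoch, (tr, te) in enumerate(zip(train_acc_by_epoch, test_acc_by_epoch), start=1)
    if tr > te
]

print("best train:", best_train)
print("best test:", best_test)
print("train ahead of test in epochs:", epochs_train_ahead)
```

Train accuracy is ahead of test accuracy in only two epochs, and by at most 0.02 points, which is consistent with the observation that the model is not overfitting.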

