<a href="https://colab.research.google.com/github/askmuhsin/weights_heist_eva7/blob/main/S5/nbs/iteration_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Target
- Iteration #0
- get the code setup right.
- starting from the 8th iteration of class.
  - not using any image augmentation or LR steps for now.
- the intention of this step is to have the correct code setup.
- expecting to see a test accuracy of 99.41% at the 9th epoch, since that is what the same run achieved in class.

## Result
  - Parameters - 13,808
  - max_test_accuracy 		- 99.39	[EPOCH -- 14]
  - max_train_accuracy 		- 99.23	[EPOCH -- 13]
  - test_accuracy @ epoch 15 	- 99.31
  - train_accuracy @ epoch 15 	- 99.22

## Analysis
- the parameter count (13,808) is higher than the target. the next iteration will start with a smaller model.
- test accuracy at epoch 15 is only 99.31%, while train accuracy is already at 99.22%, so a future iteration will probably need to try adding some regularization.

In [1]:
! pip install torchsummary
! pip install wandb -qqq



In [17]:
from __future__ import print_function
import wandb
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from torchsummary import summary
from torchvision import datasets, transforms

from tqdm import tqdm
import pandas as pd
import numpy as np

In [3]:
def train_model(model, device, train_loader, optimizer, epoch):
  train_batch_loss = []
  train_batch_acc = []

  model.train()
  pbar = tqdm(train_loader)
  correct = 0
  processed = 0

  for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)
    # Init: PyTorch accumulates gradients on subsequent backward passes, so they
    # must be reset to zero before each backpropagation step; otherwise the
    # parameter update would use gradients summed over several batches.
    optimizer.zero_grad()
    # Predict
    y_pred = model(data)
    # Calculate loss
    loss = F.nll_loss(y_pred, target)
    train_batch_loss.append(loss.item())
    # Backpropagation
    loss.backward()
    optimizer.step()
    # Update pbar-tqdm
    pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
    correct += pred.eq(target.view_as(pred)).sum().item()
    processed += len(data)
    pbar.set_description(desc= f'Loss={loss.item()} Batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}')
    train_batch_acc.append(100*correct/processed)
  
  return train_batch_loss, train_batch_acc

 
def test_model(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    
    test_loss /= len(test_loader.dataset)  # average per-sample loss over the whole test set
    test_acc = 100. * correct / len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset), test_acc))
    return test_loss, test_acc
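The comment in `train_model` says PyTorch accumulates gradients across `backward()` calls. A minimal sketch with a single scalar parameter makes this visible (the names `w` and `loss_fn` are illustrative only, not part of the notebook):

```python
import torch

# One scalar parameter so the gradient values are easy to read.
w = torch.tensor(2.0, requires_grad=True)

def loss_fn():
    # d(loss)/dw = 3.0 regardless of the value of w
    return w * 3.0

loss_fn().backward()
first = w.grad.item()        # gradient after one backward pass

loss_fn().backward()         # second backward pass WITHOUT clearing .grad
accumulated = w.grad.item()  # the two gradients were summed, not overwritten

w.grad.zero_()               # clear the stored gradient in place
loss_fn().backward()
after_reset = w.grad.item()  # back to a single pass's gradient

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```

`optimizer.zero_grad()` performs the same reset for every parameter it manages; recent PyTorch releases set `.grad` to `None` by default instead of filling it with zeros, which has the same effect on the next update.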

# Setup

In [4]:
def setup_env():
  SEED = 1
  cuda = torch.cuda.is_available()
  
  torch.manual_seed(SEED)
  if cuda:
      torch.cuda.manual_seed(SEED)
  
  device = torch.device("cuda" if cuda else "cpu")
  return cuda, device

In [5]:
cuda, device = setup_env()
cuda, device

(True, device(type='cuda'))
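`setup_env` seeds PyTorch's generators with `SEED = 1`. A minimal sketch of what that buys: re-seeding replays the same random stream. The two `cudnn` flags at the end are an assumption, not something this notebook sets; they are often added because seeding alone does not make every GPU convolution kernel deterministic, and even with them results can vary across hardware and library versions, so a rerun may not match an earlier run exactly.

```python
import torch

SEED = 1

# Re-seeding the global generator replays the same random numbers.
torch.manual_seed(SEED)
a = torch.rand(3)
torch.manual_seed(SEED)
b = torch.rand(3)
assert torch.equal(a, b)

# Not used in setup_env: ask cuDNN for deterministic kernels and disable
# autotuning, trading some speed for run-to-run repeatability on GPU.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```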

# Dataset

In [6]:
train_transforms = transforms.Compose(
  [
    # transforms.Resize((28, 28)),
    # transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)) 
  ]
)

test_transforms = transforms.Compose(
  [
    # transforms.Resize((28, 28)),
    # transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)) 
  ]
)
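`ToTensor()` scales the 8-bit pixel values into `[0, 1]`, and `Normalize((0.1307,), (0.3081,))` then computes `(x - mean) / std` per channel, where 0.1307 and 0.3081 are the commonly used MNIST training-set mean and standard deviation. A plain-Python sketch of that arithmetic for one pixel (the helper `normalize_pixel` is hypothetical):

```python
MEAN, STD = 0.1307, 0.3081

def normalize_pixel(raw: int) -> float:
    """Mimic ToTensor() followed by Normalize() for one 8-bit grayscale pixel."""
    x = raw / 255.0           # ToTensor: uint8 0..255 -> float 0..1
    return (x - MEAN) / STD   # Normalize: subtract the mean, divide by the std

black = normalize_pixel(0)    # background pixel
white = normalize_pixel(255)  # full-intensity stroke pixel
print(round(black, 4), round(white, 4))  # -0.4242 2.8215
```

So the network sees inputs roughly in `[-0.42, 2.82]` rather than `[0, 255]`.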

In [7]:
train = datasets.MNIST('./data', train=True, download=True, transform=train_transforms)
test = datasets.MNIST('./data', train=False, download=True, transform=test_transforms)

# dataloader arguments - something you'd usually fetch from the command line
dataloader_args = {
    'shuffle': True,
    'batch_size': 128,
    'num_workers': 2,
    'pin_memory': True
} if cuda else {
    'shuffle': True,
    'batch_size': 64,
}

# train dataloader
train_loader = torch.utils.data.DataLoader(train, **dataloader_args)
# test dataloader
test_loader = torch.utils.data.DataLoader(test, **dataloader_args)
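With `batch_size=128` and the `DataLoader` default `drop_last=False`, the 60,000 MNIST training images split into 469 batches, which is why the training progress bars count `469/469` and end at `Batch_id=468`; the last batch is smaller than the rest. A small sketch of that arithmetic (the helper `batch_stats` is hypothetical):

```python
import math

TRAIN_SIZE, TEST_SIZE = 60_000, 10_000  # standard MNIST split sizes
BATCH_SIZE = 128                         # value used when cuda is True

def batch_stats(n_samples: int, batch_size: int):
    """Number of batches (drop_last=False) and the size of the final batch."""
    n_batches = math.ceil(n_samples / batch_size)
    last = n_samples - (n_batches - 1) * batch_size
    return n_batches, last

print(batch_stats(TRAIN_SIZE, BATCH_SIZE))  # (469, 96)
print(batch_stats(TEST_SIZE, BATCH_SIZE))   # (79, 16)
```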



# Model

In [8]:
dropout_value = 0.1
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input Block
        self.convblock1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 26

        # CONVOLUTION BLOCK 1
        self.convblock2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Dropout(dropout_value)
        ) # output_size = 24

        # TRANSITION BLOCK 1
        self.convblock3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
        ) # output_size = 24
        self.pool1 = nn.MaxPool2d(2, 2) # output_size = 12

        # CONVOLUTION BLOCK 2
        self.convblock4 = nn.Sequential(
            nn.Conv2d(in_channels=10, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 10
        self.convblock5 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 8
        self.convblock6 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 6
        self.convblock7 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=16, kernel_size=(3, 3), padding=1, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(16),
            nn.Dropout(dropout_value)
        ) # output_size = 6
        
        # OUTPUT BLOCK
        self.gap = nn.Sequential(
            nn.AvgPool2d(kernel_size=6)
        ) # output_size = 1

        self.convblock8 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
            # nn.BatchNorm2d(10),
            # nn.ReLU(),
            # nn.Dropout(dropout_value)
        ) 

        self.dropout = nn.Dropout(dropout_value)

    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        x = self.pool1(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.convblock7(x)
        x = self.gap(x)        
        x = self.convblock8(x)

        x = x.view(-1, 10)
        return F.log_softmax(x, dim=-1)
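The `# output_size` comments in `Net` follow the usual formula `out = (in + 2*padding - kernel) // stride + 1`. A pure-Python sketch that re-derives them and also tracks the receptive field with the standard recurrence `rf += (kernel - 1) * jump`, `jump *= stride`. The layer table is transcribed by hand from the class above (an assumption that it stays in sync with the model):

```python
# (name, kernel, stride, padding) for each layer of Net, in forward() order.
LAYERS = [
    ("convblock1", 3, 1, 0),
    ("convblock2", 3, 1, 0),
    ("convblock3", 1, 1, 0),  # 1x1 transition conv
    ("pool1",      2, 2, 0),
    ("convblock4", 3, 1, 0),
    ("convblock5", 3, 1, 0),
    ("convblock6", 3, 1, 0),
    ("convblock7", 3, 1, 1),
    ("gap",        6, 6, 0),  # AvgPool2d(kernel_size=6): stride defaults to kernel_size
    ("convblock8", 1, 1, 0),  # 1x1 output conv
]

def trace(size: int = 28):
    rf, jump = 1, 1  # receptive field and spacing between adjacent outputs, in input pixels
    rows = []
    for name, k, s, p in LAYERS:
        size = (size + 2 * p - k) // s + 1
        rf = rf + (k - 1) * jump
        jump = jump * s
        rows.append((name, size, rf))
    return rows

for name, size, rf in trace():
    print(f"{name:<11} out={size:>2}  rf={rf}")
```

The spatial sizes come out as 26, 24, 24, 12, 10, 8, 6, 6, 1, 1, matching the comments, and by the GAP layer the receptive field (32) already covers the whole 28x28 input.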

In [31]:
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 26, 26]             144
              ReLU-2           [-1, 16, 26, 26]               0
       BatchNorm2d-3           [-1, 16, 26, 26]              32
           Dropout-4           [-1, 16, 26, 26]               0
            Conv2d-5           [-1, 32, 24, 24]           4,608
              ReLU-6           [-1, 32, 24, 24]               0
       BatchNorm2d-7           [-1, 32, 24, 24]              64
           Dropout-8           [-1, 32, 24, 24]               0
            Conv2d-9           [-1, 10, 24, 24]             320
        MaxPool2d-10           [-1, 10, 12, 12]               0
           Conv2d-11           [-1, 16, 10, 10]           1,440
             ReLU-12           [-1, 16, 10, 10]               0
      BatchNorm2d-13           [-1, 16, 10, 10]              32
          Dropout-14           [-1, 16,
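The `summary` output above is cut off before its totals. The 13,808 figure quoted in the Result section can be re-derived by hand: a bias-free conv has `in * out * k * k` weights, and each `BatchNorm2d` adds a learnable scale and shift per channel (its running mean and variance are buffers, not parameters). A pure-Python sketch of that count:

```python
# Trainable parameters, assuming bias=False on every conv (as in Net)
# and affine BatchNorm (one weight + one bias per channel).
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

def bn_params(c):
    return 2 * c

total = (
    conv_params(1, 16, 3) + bn_params(16)            # convblock1
    + conv_params(16, 32, 3) + bn_params(32)         # convblock2
    + conv_params(32, 10, 1)                         # convblock3 (1x1 transition)
    + conv_params(10, 16, 3) + bn_params(16)         # convblock4
    + 3 * (conv_params(16, 16, 3) + bn_params(16))   # convblock5, 6, 7
    + conv_params(16, 10, 1)                         # convblock8 (1x1 head)
)
print(total)  # 13808
```

On the live model, `sum(p.numel() for p in model.parameters() if p.requires_grad)` should give the same number. Most of the budget sits in `convblock2` (4,608 weights) and the three 16-to-16 convs (2,304 each), which is where a smaller next iteration would have the most room to cut.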

# Logging 

In [13]:
wandb.login()

wandb.init(
    project='s5_coding_drill_down',
    entity='weights_heist_eva7',
    tags=['iteration_0'],
    name='setup code',
)

[34m[1mwandb[0m: Currently logged in as: [33maskmuhsin[0m (use `wandb login --relogin` to force relogin)


# Train

In [14]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
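`optim.SGD(..., lr=0.01, momentum=0.9)` keeps a velocity buffer per parameter. As described in the PyTorch docs for the default `dampening=0` and `nesterov=False`, each step computes `buf = momentum * buf + grad` (the buffer starts as the first gradient) and then `param -= lr * buf`. A scalar sketch of that rule on a toy objective (the function and the objective are illustrative, not from the notebook):

```python
LR, MOMENTUM = 0.01, 0.9

def sgd_momentum_steps(theta: float, grad_fn, steps: int):
    """Plain-Python version of SGD with momentum
    (dampening=0, nesterov=False, no weight decay)."""
    buf = None
    history = []
    for _ in range(steps):
        g = grad_fn(theta)
        buf = g if buf is None else MOMENTUM * buf + g  # velocity buffer
        theta = theta - LR * buf
        history.append(theta)
    return history

# Toy objective f(theta) = theta**2, gradient 2*theta, starting at theta = 1.0.
print(sgd_momentum_steps(1.0, lambda t: 2 * t, steps=2))  # approximately [0.98, 0.9424]
```

Since no scheduler is used in this iteration, `lr` stays at 0.01 for every epoch, which is why the logged `lr` history is flat.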

In [15]:
logs = []
EPOCHS = 25
for epoch in range(1, EPOCHS + 1):
    print("EPOCH:", epoch)
    train_batch_loss, train_batch_acc = train_model(
      model, device, train_loader, optimizer, epoch
    )
    train_loss = np.mean(train_batch_loss)
    train_acc = np.mean(train_batch_acc)  # mean of the per-batch *running* accuracies, not the end-of-epoch value
    
    test_loss, test_acc = test_model(
        model, device, test_loader
    )
    log_temp = {
        "epoch": epoch,
        "train_acc": train_acc,
        "test_acc": test_acc,
        "train_loss": train_loss,
        "test_loss": test_loss,
        "lr": optimizer.param_groups[0]['lr'],
    }
    wandb.log(log_temp)
    logs.append(log_temp)

EPOCH: 1


Loss=0.12838825583457947 Batch_id=468 Accuracy=88.50: 100%|██████████| 469/469 [00:21<00:00, 21.58it/s]



Test set: Average loss: 0.0827, Accuracy: 9784/10000 (97.84%)

EPOCH: 2


Loss=0.05245000123977661 Batch_id=468 Accuracy=97.80: 100%|██████████| 469/469 [00:21<00:00, 21.42it/s]



Test set: Average loss: 0.0528, Accuracy: 9848/10000 (98.48%)

EPOCH: 3


Loss=0.0682808980345726 Batch_id=468 Accuracy=98.31: 100%|██████████| 469/469 [00:21<00:00, 21.35it/s]



Test set: Average loss: 0.0318, Accuracy: 9899/10000 (98.99%)

EPOCH: 4


Loss=0.039875078946352005 Batch_id=468 Accuracy=98.59: 100%|██████████| 469/469 [00:22<00:00, 21.28it/s]



Test set: Average loss: 0.0340, Accuracy: 9895/10000 (98.95%)

EPOCH: 5


Loss=0.04329044744372368 Batch_id=468 Accuracy=98.72: 100%|██████████| 469/469 [00:22<00:00, 21.15it/s]



Test set: Average loss: 0.0242, Accuracy: 9926/10000 (99.26%)

EPOCH: 6


Loss=0.05055965110659599 Batch_id=468 Accuracy=98.80: 100%|██████████| 469/469 [00:22<00:00, 21.17it/s]



Test set: Average loss: 0.0271, Accuracy: 9918/10000 (99.18%)

EPOCH: 7


Loss=0.04718910530209541 Batch_id=468 Accuracy=98.89: 100%|██████████| 469/469 [00:22<00:00, 21.15it/s]



Test set: Average loss: 0.0226, Accuracy: 9933/10000 (99.33%)

EPOCH: 8


Loss=0.0909028872847557 Batch_id=468 Accuracy=98.94: 100%|██████████| 469/469 [00:22<00:00, 21.21it/s]



Test set: Average loss: 0.0247, Accuracy: 9922/10000 (99.22%)

EPOCH: 9


Loss=0.01573258824646473 Batch_id=468 Accuracy=98.98: 100%|██████████| 469/469 [00:22<00:00, 21.02it/s]



Test set: Average loss: 0.0238, Accuracy: 9923/10000 (99.23%)

EPOCH: 10


Loss=0.015628626570105553 Batch_id=468 Accuracy=99.02: 100%|██████████| 469/469 [00:22<00:00, 21.05it/s]



Test set: Average loss: 0.0238, Accuracy: 9922/10000 (99.22%)

EPOCH: 11


Loss=0.021586159244179726 Batch_id=468 Accuracy=99.13: 100%|██████████| 469/469 [00:22<00:00, 21.23it/s]



Test set: Average loss: 0.0228, Accuracy: 9924/10000 (99.24%)

EPOCH: 12


Loss=0.009291081689298153 Batch_id=468 Accuracy=99.11: 100%|██████████| 469/469 [00:22<00:00, 21.01it/s]



Test set: Average loss: 0.0211, Accuracy: 9934/10000 (99.34%)

EPOCH: 13


Loss=0.02819536067545414 Batch_id=468 Accuracy=99.17: 100%|██████████| 469/469 [00:22<00:00, 20.97it/s]



Test set: Average loss: 0.0227, Accuracy: 9929/10000 (99.29%)

EPOCH: 14


Loss=0.03498539328575134 Batch_id=468 Accuracy=99.14: 100%|██████████| 469/469 [00:22<00:00, 21.18it/s]



Test set: Average loss: 0.0181, Accuracy: 9939/10000 (99.39%)

EPOCH: 15


Loss=0.0044309101067483425 Batch_id=468 Accuracy=99.21: 100%|██████████| 469/469 [00:22<00:00, 21.14it/s]



Test set: Average loss: 0.0198, Accuracy: 9931/10000 (99.31%)

EPOCH: 16


Loss=0.09637271612882614 Batch_id=468 Accuracy=99.23: 100%|██████████| 469/469 [00:22<00:00, 21.00it/s]



Test set: Average loss: 0.0197, Accuracy: 9936/10000 (99.36%)

EPOCH: 17


Loss=0.04668198525905609 Batch_id=468 Accuracy=99.26: 100%|██████████| 469/469 [00:22<00:00, 20.78it/s]



Test set: Average loss: 0.0190, Accuracy: 9936/10000 (99.36%)

EPOCH: 18


Loss=0.04825949668884277 Batch_id=468 Accuracy=99.28: 100%|██████████| 469/469 [00:22<00:00, 20.42it/s]



Test set: Average loss: 0.0202, Accuracy: 9939/10000 (99.39%)

EPOCH: 19


Loss=0.04216815158724785 Batch_id=468 Accuracy=99.29: 100%|██████████| 469/469 [00:22<00:00, 20.79it/s]



Test set: Average loss: 0.0210, Accuracy: 9932/10000 (99.32%)

EPOCH: 20


Loss=0.004576968494802713 Batch_id=468 Accuracy=99.32: 100%|██████████| 469/469 [00:23<00:00, 20.11it/s]



Test set: Average loss: 0.0237, Accuracy: 9923/10000 (99.23%)

EPOCH: 21


Loss=0.027274051681160927 Batch_id=468 Accuracy=99.31: 100%|██████████| 469/469 [00:22<00:00, 20.65it/s]



Test set: Average loss: 0.0184, Accuracy: 9942/10000 (99.42%)

EPOCH: 22


Loss=0.0035187245812267065 Batch_id=468 Accuracy=99.24: 100%|██████████| 469/469 [00:22<00:00, 20.84it/s]



Test set: Average loss: 0.0201, Accuracy: 9931/10000 (99.31%)

EPOCH: 23


Loss=0.05416024848818779 Batch_id=468 Accuracy=99.32: 100%|██████████| 469/469 [00:22<00:00, 20.83it/s]



Test set: Average loss: 0.0188, Accuracy: 9941/10000 (99.41%)

EPOCH: 24


Loss=0.049766868352890015 Batch_id=468 Accuracy=99.34: 100%|██████████| 469/469 [00:22<00:00, 20.72it/s]



Test set: Average loss: 0.0198, Accuracy: 9939/10000 (99.39%)

EPOCH: 25


Loss=0.009687265381217003 Batch_id=468 Accuracy=99.33: 100%|██████████| 469/469 [00:22<00:00, 20.59it/s]



Test set: Average loss: 0.0196, Accuracy: 9938/10000 (99.38%)
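In the loop above, `train_acc` is `np.mean(train_batch_acc)`, and `train_batch_acc` holds the *running* accuracy after each batch, so the logged value averages a cumulative curve rather than reporting the end-of-epoch accuracy shown on the last tqdm line. The two can differ in either direction; for epoch 25 the wandb summary records `train_acc` 99.35422 while the progress bar ends at 99.33. A toy sketch with made-up batch counts (the `batches` data is hypothetical):

```python
# Hypothetical per-batch results: (correct, batch_size) for three batches.
batches = [(80, 128), (120, 128), (125, 128)]

correct = processed = 0
running = []  # mirrors what train_model appends to train_batch_acc
for c, n in batches:
    correct += c
    processed += n
    running.append(100 * correct / processed)

end_of_epoch = running[-1]                      # what the final tqdm line shows
mean_of_running = sum(running) / len(running)   # what np.mean(train_batch_acc) logs

print(f"{end_of_epoch:.2f} vs {mean_of_running:.2f}")  # 84.64 vs 75.09
```

If the end-of-epoch figure is what is wanted, `train_batch_acc[-1]` gives it directly.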



In [16]:
wandb.finish()


Run history:

| metric | history |
|---|---|
| epoch | ▁▁▂▂▂▂▃▃▃▄▄▄▅▅▅▅▆▆▆▇▇▇▇██ |
| lr | ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |
| test_acc | ▁▄▆▆▇▇█▇▇▇▇█▇██████▇█████ |
| test_loss | █▅▂▃▂▂▁▂▂▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁ |
| train_acc | ▁████████████████████████ |
| train_loss | █▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ |

Run summary:

| metric | value |
|---|---|
| epoch | 25.0 |
| lr | 0.01 |
| test_acc | 99.38 |
| test_loss | 0.01965 |
| train_acc | 99.35422 |
| train_loss | 0.02037 |


In [63]:
end_epoch = 15
logs_df = pd.DataFrame(logs)
logs_df = logs_df[:end_epoch]

In [67]:
max_test_acc = logs_df.test_acc.max()
max_test_epoch = list(logs_df[logs_df.test_acc==max_test_acc].epoch)[0]
max_train_acc = logs_df.train_acc.max()
max_train_epoch = list(logs_df[logs_df.train_acc==max_train_acc].epoch)[0]
test_acc = logs_df[logs_df.epoch==end_epoch].test_acc.item()
train_acc = logs_df[logs_df.epoch==end_epoch].train_acc.item()

print(
f"""
  - max_test_accuracy \t\t- {round(max_test_acc, 2)}\t[EPOCH -- {max_test_epoch}]
  - max_train_accuracy \t\t- {round(max_train_acc, 2)}\t[EPOCH -- {max_train_epoch}]
  - test_accuracy @ epoch {end_epoch} \t- {round(test_acc, 2)}
  - train_accuracy @ epoch {end_epoch} \t- {round(train_acc, 2)}
"""
)


  - max_test_accuracy 		- 99.39	[EPOCH -- 14]
  - max_train_accuracy 		- 99.23	[EPOCH -- 13]
  - test_accuracy @ epoch 15 	- 99.31
  - train_accuracy @ epoch 15 	- 99.22

