<a href="https://colab.research.google.com/github/askmuhsin/weights_heist_eva7/blob/main/S5/nbs/iteration_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Target
- Iteration #1
- reduce the number of parameters to about 8k
- the strategy for reducing the parameters is to proportionally remove channels from each layer.
- expecting accuracy to drop, but not by much; to around 93% on the test set.

## Result
  - Parameters - 7,936
  - accuracies below are taken from the first 15 of the 25 epochs trained
  - max_test_accuracy 		- 99.27	[EPOCH -- 11]
  - max_train_accuracy 		- 99.04	[EPOCH -- 15]
  - test_accuracy @ epoch 15 	- 99.26
  - train_accuracy @ epoch 15 	- 99.04
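
The parameter count can be checked by hand. The sketch below copies the channel widths from the `Net` class defined later in this notebook and assumes, as in that class, that every convolution uses `bias=False` and that each `BatchNorm2d` contributes a weight and a bias per channel. The `layers` list and `count_params` helper are illustrative names, not part of the notebook.

```python
# (in_channels, out_channels, kernel_size, followed_by_batchnorm)
# Values copied from the Net definition in this notebook.
layers = [
    (1, 12, 3, True),    # convblock1
    (12, 18, 3, True),   # convblock2
    (18, 8, 1, False),   # convblock3 (1x1 transition)
    (8, 12, 3, True),    # convblock4
    (12, 12, 3, True),   # convblock5
    (12, 14, 3, True),   # convblock6
    (14, 14, 3, True),   # convblock7
    (14, 10, 1, False),  # convblock8 (1x1 output)
]


def count_params(layers):
    """Sum conv weights (no bias) plus BatchNorm weight and bias terms."""
    total = 0
    for c_in, c_out, k, has_bn in layers:
        total += c_in * c_out * k * k  # conv kernel weights, bias=False
        if has_bn:
            total += 2 * c_out         # BatchNorm2d: one weight and one bias per channel
    return total


total_params = count_params(layers)
print(total_params)  # 7936
```

Most of the budget sits in the 3x3 convolutions, which is why trimming channel widths shrinks the model quickly.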

## Analysis
- the test accuracy dropped by 0.12% after reducing the number of parameters.
- Noticeably, this model's accuracy was less stable than that of the previous, larger model with 14k parameters.
- the next step is to add image augmentations to improve the test accuracy.
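
One rough way to put a number on "less stable" is to look at how much the test accuracy moves from epoch to epoch. The sketch below uses the per-epoch test accuracies printed in the training log of this notebook; the `stability` helper and the choice of the epoch 6 to 15 window are assumptions made for illustration. The 14k-parameter run is not part of this notebook, so no comparison is computed here.

```python
import statistics

# Test accuracy (%) for epochs 1..25, copied from the training log in this notebook.
test_acc = [
    96.43, 98.48, 98.88, 98.88, 99.01, 99.17, 98.98, 99.14, 99.20, 99.10,
    99.27, 99.15, 99.06, 99.13, 99.26, 99.26, 99.24, 99.25, 99.29, 99.39,
    99.28, 99.29, 99.16, 99.25, 99.26,
]


def stability(accs, start_epoch=6, end_epoch=15):
    """Return (population std, mean absolute epoch-to-epoch change) over a window."""
    window = accs[start_epoch - 1:end_epoch]
    swings = [abs(b - a) for a, b in zip(window, window[1:])]
    return statistics.pstdev(window), sum(swings) / len(swings)


std, mean_swing = stability(test_acc)
print(f"std={std:.3f}  mean |delta|={mean_swing:.3f}")
```

Running the same helper on the logs of the larger model would give a like-for-like comparison.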


In [1]:
! pip install torchsummary
! pip install wandb -qqq


In [2]:
from __future__ import print_function
import wandb
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from torchsummary import summary
from torchvision import datasets, transforms

from tqdm import tqdm
import pandas as pd
import numpy as np

# Logging 

In [3]:
wandb.login()

wandb.init(
    project='s5_coding_drill_down',
    entity='weights_heist_eva7',
    tags=['iteration_1'],
    name='reduce parameters',
)

wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter: ··········
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: askmuhsin (use `wandb login --relogin` to force relogin)


# Setup

In [4]:
def setup_env():
  SEED = 1
  cuda = torch.cuda.is_available()
  
  torch.manual_seed(SEED)
  if cuda:
      torch.cuda.manual_seed(SEED)
  
  device = torch.device("cuda" if cuda else "cpu")
  return cuda, device

In [5]:
cuda, device = setup_env()
cuda, device

(True, device(type='cuda'))

# Utils

In [6]:
def train_model(model, device, train_loader, optimizer, epoch):
  train_batch_loss = []
  train_batch_acc = []

  model.train()
  pbar = tqdm(train_loader)
  correct = 0
  processed = 0

  for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)
    # Reset gradients: PyTorch accumulates gradients across backward passes,
    # so they must be zeroed before each backward pass for a correct update.
    optimizer.zero_grad()
    # Predict
    y_pred = model(data)
    # Calculate loss
    loss = F.nll_loss(y_pred, target)
    train_batch_loss.append(loss.item())
    # Backpropagation
    loss.backward()
    optimizer.step()
    # Update pbar-tqdm
    pred = y_pred.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
    correct += pred.eq(target.view_as(pred)).sum().item()
    processed += len(data)
    pbar.set_description(desc= f'Loss={loss.item()} Batch_id={batch_idx} Accuracy={100*correct/processed:0.2f}')
    train_batch_acc.append(100*correct/processed)
  
  return train_batch_loss, train_batch_acc

 
def test_model(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
    
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    
    test_acc = 100. * correct / len(test_loader.dataset)
    return test_loss, test_acc
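
`test_model` sums the per-sample losses with `reduction='sum'` and divides by the dataset size at the end. A minimal sketch of why that matters, using made-up loss values: when the last batch is smaller than the others, averaging the per-batch means gives a different number than the true per-sample mean. Both helper names and the `batches` data are hypothetical.

```python
def per_sample_mean(batches):
    """Sum every per-sample loss, then divide by the total sample count."""
    total = sum(sum(batch) for batch in batches)
    count = sum(len(batch) for batch in batches)
    return total / count


def mean_of_batch_means(batches):
    """Average each batch first, then average those batch means."""
    return sum(sum(batch) / len(batch) for batch in batches) / len(batches)


# Hypothetical per-sample losses: one full batch of four, one short batch of one.
batches = [[0.25, 0.25, 0.25, 0.25], [1.0]]

print(per_sample_mean(batches))      # 0.4
print(mean_of_batch_means(batches))  # 0.625
```

The short batch is over-weighted in the second calculation, which is the bias that the sum-then-divide approach avoids.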

# Dataset

In [7]:
train_transforms = transforms.Compose(
  [
    # transforms.Resize((28, 28)),
    # transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)) 
  ]
)

test_transforms = transforms.Compose(
  [
    # transforms.Resize((28, 28)),
    # transforms.ColorJitter(brightness=0.10, contrast=0.1, saturation=0.10, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)) 
  ]
)
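
For reference, `transforms.Normalize((0.1307,), (0.3081,))` applies `(x - mean) / std` per channel to the tensor produced by `ToTensor`, whose values lie in [0, 1]. The sketch below reproduces that arithmetic for single pixel values in plain Python; the `normalize` helper is illustrative only.

```python
MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize above


def normalize(pixel):
    """Map a pixel already scaled to [0, 1] to (pixel - mean) / std."""
    return (pixel - MEAN) / STD


# Black pixel, a pixel exactly at the mean, and a white pixel.
for p in (0.0, MEAN, 1.0):
    print(f"{p:.4f} -> {normalize(p):.4f}")
```

A pixel at the mean maps to zero, black pixels map to a small negative value, and white pixels to a value a little under 3.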

In [8]:
train = datasets.MNIST('./data', train=True, download=True, transform=train_transforms)
test = datasets.MNIST('./data', train=False, download=True, transform=test_transforms)

# dataloader arguments (in a script these would typically come from the command line)
dataloader_args = {
    'shuffle': True,
    'batch_size': 128,
    'num_workers': 2,
    'pin_memory': True
} if cuda else {
    'shuffle': True,
    'batch_size': 64,
}

# train dataloader
train_loader = torch.utils.data.DataLoader(train, **dataloader_args)
# test dataloader
test_loader = torch.utils.data.DataLoader(test, **dataloader_args)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/9912422 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/28881 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/1648877 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/4542 [00:00<?, ?it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
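
With `batch_size=128` on the GPU path, the number of batches per epoch follows from the dataset sizes (60,000 training and 10,000 test images). A small sketch, assuming the `DataLoader` default of keeping the final partial batch; `num_batches` is an illustrative helper.

```python
import math


def num_batches(num_samples, batch_size):
    """Batches per epoch when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


print(num_batches(60000, 128))  # 469
print(num_batches(10000, 128))  # 79
print(60000 - 468 * 128)        # 96 samples in the last training batch
```

The 469 training batches match the `469/469` shown by the progress bar during training.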





# Model

In [34]:
dropout_value = 0.1
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input Block
        self.convblock1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=12, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(12),
            nn.Dropout(dropout_value)
        ) # output_size = 26

        # CONVOLUTION BLOCK 1
        self.convblock2 = nn.Sequential(
            nn.Conv2d(in_channels=12, out_channels=18, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),
            nn.BatchNorm2d(18),
            nn.Dropout(dropout_value)
        ) # output_size = 24

        # TRANSITION BLOCK 1
        self.convblock3 = nn.Sequential(
            nn.Conv2d(in_channels=18, out_channels=8, kernel_size=(1, 1), padding=0, bias=False),
        ) # output_size = 24
        self.pool1 = nn.MaxPool2d(2, 2) # output_size = 12

        # CONVOLUTION BLOCK 2
        self.convblock4 = nn.Sequential(
            nn.Conv2d(in_channels=8, out_channels=12, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(12),
            nn.Dropout(dropout_value)
        ) # output_size = 10
        self.convblock5 = nn.Sequential(
            nn.Conv2d(in_channels=12, out_channels=12, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(12),
            nn.Dropout(dropout_value)
        ) # output_size = 8
        self.convblock6 = nn.Sequential(
            nn.Conv2d(in_channels=12, out_channels=14, kernel_size=(3, 3), padding=0, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(14),
            nn.Dropout(dropout_value)
        ) # output_size = 6
        self.convblock7 = nn.Sequential(
            nn.Conv2d(in_channels=14, out_channels=14, kernel_size=(3, 3), padding=1, bias=False),
            nn.ReLU(),            
            nn.BatchNorm2d(14),
            nn.Dropout(dropout_value)
        ) # output_size = 6
        
        # OUTPUT BLOCK
        self.gap = nn.Sequential(
            nn.AvgPool2d(kernel_size=6)
        ) # output_size = 1

        self.convblock8 = nn.Sequential(
            nn.Conv2d(in_channels=14, out_channels=10, kernel_size=(1, 1), padding=0, bias=False),
            # nn.BatchNorm2d(10),
            # nn.ReLU(),
            # nn.Dropout(dropout_value)
        ) 

        self.dropout = nn.Dropout(dropout_value)

    def forward(self, x):
        x = self.convblock1(x)
        x = self.convblock2(x)
        x = self.convblock3(x)
        x = self.pool1(x)
        x = self.convblock4(x)
        x = self.convblock5(x)
        x = self.convblock6(x)
        x = self.convblock7(x)
        x = self.gap(x)        
        x = self.convblock8(x)

        x = x.view(-1, 10)
        return F.log_softmax(x, dim=-1)
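
The `output_size` comments in the model can be verified with the usual formula `(size + 2 * padding - kernel) // stride + 1`. The sketch below traces a 28x28 input through the layers in `forward` order; the `steps` list is transcribed from the class above, and it assumes the average pool uses a stride equal to its kernel size, which is the PyTorch default when no stride is given.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, padding, stride), transcribed from Net in forward() order.
steps = [
    ("convblock1", 3, 0, 1),
    ("convblock2", 3, 0, 1),
    ("convblock3", 1, 0, 1),
    ("pool1", 2, 0, 2),
    ("convblock4", 3, 0, 1),
    ("convblock5", 3, 0, 1),
    ("convblock6", 3, 0, 1),
    ("convblock7", 3, 1, 1),
    ("gap", 6, 0, 6),
    ("convblock8", 1, 0, 1),
]

size = 28
trace = [size]
for name, kernel, padding, stride in steps:
    size = conv_out(size, kernel, padding, stride)
    trace.append(size)
    print(f"{name:<10} -> {size}x{size}")
```

The final 1x1 map with 10 channels is what `x.view(-1, 10)` flattens into the class scores.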

In [35]:
model = Net().to(device)
summary(model, input_size=(1, 28, 28))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 12, 26, 26]             108
              ReLU-2           [-1, 12, 26, 26]               0
       BatchNorm2d-3           [-1, 12, 26, 26]              24
           Dropout-4           [-1, 12, 26, 26]               0
            Conv2d-5           [-1, 18, 24, 24]           1,944
              ReLU-6           [-1, 18, 24, 24]               0
       BatchNorm2d-7           [-1, 18, 24, 24]              36
           Dropout-8           [-1, 18, 24, 24]               0
            Conv2d-9            [-1, 8, 24, 24]             144
        MaxPool2d-10            [-1, 8, 12, 12]               0
           Conv2d-11           [-1, 12, 10, 10]             864
             ReLU-12           [-1, 12, 10, 10]               0
      BatchNorm2d-13           [-1, 12, 10, 10]              24
          Dropout-14           [-1, 12,

# Train

In [36]:
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

In [37]:
logs = []
EPOCHS = 25
for epoch in range(1, EPOCHS + 1):
    print("EPOCH:", epoch)
    train_batch_loss, train_batch_acc = train_model(
      model, device, train_loader, optimizer, epoch
    )
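    train_loss = np.mean(train_batch_loss)
    # train_batch_acc holds the running accuracy after each batch, so this is
    # the mean of the running values, not the accuracy at the end of the epoch.
    train_acc = np.mean(train_batch_acc)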
    
    test_loss, test_acc = test_model(
        model, device, test_loader
    )
    log_temp = {
        "epoch": epoch,
        "train_acc": train_acc,
        "test_acc": test_acc,
        "train_loss": train_loss,
        "test_loss": test_loss,
        "lr": optimizer.param_groups[0]['lr'],
    }
    wandb.log(log_temp)
    logs.append(log_temp)

EPOCH: 1


Loss=0.10008767247200012 Batch_id=468 Accuracy=85.30: 100%|██████████| 469/469 [00:21<00:00, 22.16it/s]



Test set: Average loss: 0.1232, Accuracy: 9643/10000 (96.43%)

EPOCH: 2


Loss=0.05884659290313721 Batch_id=468 Accuracy=97.29: 100%|██████████| 469/469 [00:21<00:00, 22.24it/s]



Test set: Average loss: 0.0578, Accuracy: 9848/10000 (98.48%)

EPOCH: 3


Loss=0.03989628702402115 Batch_id=468 Accuracy=97.94: 100%|██████████| 469/469 [00:21<00:00, 22.07it/s]



Test set: Average loss: 0.0382, Accuracy: 9888/10000 (98.88%)

EPOCH: 4


Loss=0.057874903082847595 Batch_id=468 Accuracy=98.20: 100%|██████████| 469/469 [00:21<00:00, 21.97it/s]



Test set: Average loss: 0.0353, Accuracy: 9888/10000 (98.88%)

EPOCH: 5


Loss=0.05152307078242302 Batch_id=468 Accuracy=98.41: 100%|██████████| 469/469 [00:21<00:00, 22.27it/s]



Test set: Average loss: 0.0324, Accuracy: 9901/10000 (99.01%)

EPOCH: 6


Loss=0.06481795758008957 Batch_id=468 Accuracy=98.55: 100%|██████████| 469/469 [00:21<00:00, 22.18it/s]



Test set: Average loss: 0.0278, Accuracy: 9917/10000 (99.17%)

EPOCH: 7


Loss=0.08150345832109451 Batch_id=468 Accuracy=98.65: 100%|██████████| 469/469 [00:21<00:00, 21.94it/s]



Test set: Average loss: 0.0308, Accuracy: 9898/10000 (98.98%)

EPOCH: 8


Loss=0.007665595039725304 Batch_id=468 Accuracy=98.76: 100%|██████████| 469/469 [00:21<00:00, 22.11it/s]



Test set: Average loss: 0.0272, Accuracy: 9914/10000 (99.14%)

EPOCH: 9


Loss=0.024046503007411957 Batch_id=468 Accuracy=98.74: 100%|██████████| 469/469 [00:21<00:00, 21.92it/s]



Test set: Average loss: 0.0247, Accuracy: 9920/10000 (99.20%)

EPOCH: 10


Loss=0.04170788452029228 Batch_id=468 Accuracy=98.76: 100%|██████████| 469/469 [00:21<00:00, 22.13it/s]



Test set: Average loss: 0.0293, Accuracy: 9910/10000 (99.10%)

EPOCH: 11


Loss=0.006450213957577944 Batch_id=468 Accuracy=98.85: 100%|██████████| 469/469 [00:21<00:00, 22.16it/s]



Test set: Average loss: 0.0243, Accuracy: 9927/10000 (99.27%)

EPOCH: 12


Loss=0.05578058585524559 Batch_id=468 Accuracy=98.94: 100%|██████████| 469/469 [00:21<00:00, 22.16it/s]



Test set: Average loss: 0.0262, Accuracy: 9915/10000 (99.15%)

EPOCH: 13


Loss=0.09095975756645203 Batch_id=468 Accuracy=98.99: 100%|██████████| 469/469 [00:21<00:00, 22.11it/s]



Test set: Average loss: 0.0282, Accuracy: 9906/10000 (99.06%)

EPOCH: 14


Loss=0.051987674087285995 Batch_id=468 Accuracy=98.98: 100%|██████████| 469/469 [00:21<00:00, 22.14it/s]



Test set: Average loss: 0.0256, Accuracy: 9913/10000 (99.13%)

EPOCH: 15


Loss=0.05346431955695152 Batch_id=468 Accuracy=99.02: 100%|██████████| 469/469 [00:21<00:00, 22.15it/s]



Test set: Average loss: 0.0262, Accuracy: 9926/10000 (99.26%)

EPOCH: 16


Loss=0.008159727789461613 Batch_id=468 Accuracy=98.99: 100%|██████████| 469/469 [00:21<00:00, 22.11it/s]



Test set: Average loss: 0.0233, Accuracy: 9926/10000 (99.26%)

EPOCH: 17


Loss=0.0018644585506990552 Batch_id=468 Accuracy=99.07: 100%|██████████| 469/469 [00:21<00:00, 22.23it/s]



Test set: Average loss: 0.0239, Accuracy: 9924/10000 (99.24%)

EPOCH: 18


Loss=0.008473159745335579 Batch_id=468 Accuracy=98.99: 100%|██████████| 469/469 [00:21<00:00, 21.96it/s]



Test set: Average loss: 0.0233, Accuracy: 9925/10000 (99.25%)

EPOCH: 19


Loss=0.011456201784312725 Batch_id=468 Accuracy=99.02: 100%|██████████| 469/469 [00:22<00:00, 21.21it/s]



Test set: Average loss: 0.0230, Accuracy: 9929/10000 (99.29%)

EPOCH: 20


Loss=0.048756737262010574 Batch_id=468 Accuracy=99.04: 100%|██████████| 469/469 [00:21<00:00, 21.34it/s]



Test set: Average loss: 0.0209, Accuracy: 9939/10000 (99.39%)

EPOCH: 21


Loss=0.07531759142875671 Batch_id=468 Accuracy=99.08: 100%|██████████| 469/469 [00:22<00:00, 21.16it/s]



Test set: Average loss: 0.0228, Accuracy: 9928/10000 (99.28%)

EPOCH: 22


Loss=0.0159737691283226 Batch_id=468 Accuracy=99.16: 100%|██████████| 469/469 [00:21<00:00, 21.36it/s]



Test set: Average loss: 0.0219, Accuracy: 9929/10000 (99.29%)

EPOCH: 23


Loss=0.056618865579366684 Batch_id=468 Accuracy=99.10: 100%|██████████| 469/469 [00:22<00:00, 21.29it/s]



Test set: Average loss: 0.0276, Accuracy: 9916/10000 (99.16%)

EPOCH: 24


Loss=0.015590943396091461 Batch_id=468 Accuracy=99.18: 100%|██████████| 469/469 [00:22<00:00, 21.22it/s]



Test set: Average loss: 0.0244, Accuracy: 9925/10000 (99.25%)

EPOCH: 25


Loss=0.08446571975946426 Batch_id=468 Accuracy=99.14: 100%|██████████| 469/469 [00:22<00:00, 21.13it/s]



Test set: Average loss: 0.0240, Accuracy: 9926/10000 (99.26%)
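
A note on the logged `train_acc`: `train_model` appends the running accuracy after every batch, and the training loop then takes `np.mean` of that list. The result can differ from the accuracy shown at the end of the progress bar. The sketch below illustrates the difference with made-up counts; `running_accuracy` and its inputs are hypothetical.

```python
def running_accuracy(correct_per_batch, batch_size):
    """Cumulative accuracy (%) after each batch, as tracked in train_model."""
    correct = processed = 0
    history = []
    for c in correct_per_batch:
        correct += c
        processed += batch_size
        history.append(100 * correct / processed)
    return history


# Hypothetical number of correct predictions in four batches of 10 samples.
history = running_accuracy([5, 8, 9, 10], batch_size=10)

mean_running = sum(history) / len(history)  # what np.mean(train_batch_acc) computes
final = history[-1]                         # what the progress bar shows last

print(f"mean of running accuracy: {mean_running:.2f}")
print(f"end-of-epoch accuracy:    {final:.2f}")
```

Logging `history[-1]` instead would report the end-of-epoch figure directly, at the cost of changing the numbers recorded in this run.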



In [38]:
wandb.finish()

Run history:
epoch,▁▁▂▂▂▂▃▃▃▄▄▄▅▅▅▅▆▆▆▇▇▇▇██
lr,▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_acc,▁▆▇▇▇▇▇▇█▇█▇▇▇████████▇██
test_loss,█▄▂▂▂▁▂▁▁▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁
train_acc,▁████████████████████████
train_loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
epoch,25.0
lr,0.01
test_acc,99.26
test_loss,0.02398
train_acc,99.19524
train_loss,0.02662


In [39]:
end_epoch = 15
logs_df = pd.DataFrame(logs)
logs_df = logs_df[:end_epoch]

In [40]:
max_test_acc = logs_df.test_acc.max()
max_test_epoch = list(logs_df[logs_df.test_acc==max_test_acc].epoch)[0]
max_train_acc = logs_df.train_acc.max()
max_train_epoch = list(logs_df[logs_df.train_acc==max_train_acc].epoch)[0]
test_acc = logs_df[logs_df.epoch==end_epoch].test_acc.item()
train_acc = logs_df[logs_df.epoch==end_epoch].train_acc.item()

print(
f"""
  - max_test_accuracy \t\t- {round(max_test_acc, 2)}\t[EPOCH -- {max_test_epoch}]
  - max_train_accuracy \t\t- {round(max_train_acc, 2)}\t[EPOCH -- {max_train_epoch}]
  - test_accuracy @ epoch {end_epoch} \t- {round(test_acc, 2)}
  - train_accuracy @ epoch {end_epoch} \t- {round(train_acc, 2)}
"""
)


  - max_test_accuracy 		- 99.27	[EPOCH -- 11]
  - max_train_accuracy 		- 99.04	[EPOCH -- 15]
  - test_accuracy @ epoch 15 	- 99.26
  - train_accuracy @ epoch 15 	- 99.04

