# CIFAR-10 Dataset

The dataset used in this notebook is downloaded from [here](https://www.cs.toronto.edu/~kriz/cifar.html). CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset. It consists of 60,000 images in total: 50,000 training images and 10,000 test images. The dataset has 10 categories with 6,000 images in each category.

### Importing packages

In this notebook, we will mainly work with PyTorch to build and train ConvNets on the CIFAR-10 dataset. PyTorch makes things easier to understand; however, whatever we do here should be similar in TensorFlow as well.

*Note: The Python scripts for this project differ slightly from this notebook, but all the concepts used are the same. Also, please move this notebook to the main project directory.*

In [1]:
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
from tqdm import tqdm
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, random_split
from torchvision.transforms import ToTensor,Compose,RandomHorizontalFlip, Normalize, ToPILImage, RandomRotation, ColorJitter
from torchvision.utils import make_grid
import mcbe

### Extract the downloaded dataset

The downloaded dataset has 6 batches. `data_batch_1, data_batch_2, ..., data_batch_5` are the training batches, each with 10k images. `test_batch` is meant to be used for model testing and also contains 10k images.

The batches have been created using cPickle. Each batch is a dictionary whose `data` entry is an array of shape (10000, 3072), where 10,000 is the number of images and 3072 (32 × 32 × 3) is the number of pixel values per image.
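
The row layout can be checked on synthetic data. The sketch below assumes the layout described on the CIFAR-10 website (the first 1024 values of a row are the red channel, the next 1024 green, and the last 1024 blue, each in row-major order). `fake_row` is a made-up stand-in for one row of a batch, not real data.

```python
import numpy as np

# Hypothetical stand-in for one row of a batch's data array:
# 1024 red values, then 1024 green, then 1024 blue.
fake_row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),   # red plane
    np.full(1024, 20, dtype=np.uint8),   # green plane
    np.full(1024, 30, dtype=np.uint8),   # blue plane
])

# (3072,) -> (channels, height, width) -> (height, width, channels)
image_chw = fake_row.reshape(3, 32, 32)
image_hwc = image_chw.transpose(1, 2, 0)

print(image_hwc.shape)   # (32, 32, 3)
print(image_hwc[0, 0])   # [10 20 30]
```

The same reshape followed by a transpose is what turns a whole batch into images that `PIL` and `matplotlib` can display.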

We extract the dataset according to the method suggested on the [CIFAR-10 website](https://www.cs.toronto.edu/~kriz/cifar.html).


In [2]:
def extract(filename):
    with open(filename,"rb") as f:
        batch_data = pickle.load(f,encoding="bytes")
    return batch_data

In [3]:
data = [] #Store all batches in a list
for files in os.listdir("cifar-10-batches-py"):
    if "_batch" in files:
        data.append(extract(os.path.join('./cifar-10-batches-py',files)))

### Creating a Custom Dataset Class using `Dataset` Module

Using the above extraction method, we will now create a custom dataset class that inherits from the `Dataset` class in the `torch.utils.data` package. This class makes it easy to manage the dataset and to apply data augmentation at runtime. The `DataLoader` class takes full advantage of it: instead of handing the model all the images at once, the `DataLoader` reads the data in batches. Even though the downloaded files are already split into batches, the custom class lets us use any `batch_size`. The downloaded batches contain 10k images each, and a batch that large may not fit in memory, so we create a custom dataset class to control the batch size.
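
To illustrate what "reading batches" means, here is a conceptual sketch in plain Python. It is not how `DataLoader` is implemented; `batch_indices` is a hypothetical helper that only shows how a dataset of any length can be cut into batches of any size.

```python
import math
import random

def batch_indices(num_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per batch."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        # The last batch may be smaller than batch_size.
        yield indices[start:start + batch_size]

batches = list(batch_indices(num_samples=50_000, batch_size=64))
print(len(batches))             # 782
print(math.ceil(50_000 / 64))   # 782
print(len(batches[-1]))         # 16
```

With 50,000 training images and a batch size of 64, one epoch therefore consists of 782 batches, the last of which holds only 16 images.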

In [4]:
class CIFAR10(Dataset):
    
    def __init__(self,root,train=True,transforms=None):
        self.root = root
        self.transforms = transforms
        self.split = train
        
        self.data = []
        self.targets = []
        self.train_data = [file for file in os.listdir(root) if "data_batch" in file]
        self.test_data = [file for file in os.listdir(root) if "test_batch" in file]
                
        data_split = self.train_data if self.split else self.test_data
        
        for files in data_split:
            entry = self.extract(os.path.join(root,files))
            self.data.append(entry["data"])
            self.targets.extend(entry["labels"])
                
        self.data = np.vstack(self.data).reshape(-1, 3, 32, 32)
        self.data = self.data.transpose((0, 2, 3, 1))
        self.load_meta()
        
    def extract(self,filename):
        with open(filename,"rb") as f:
            batch_data = pickle.load(f,encoding="latin1")
        return batch_data  
    
    def load_meta(self):
        path = os.path.join(self.root,"batches.meta")
        with open(path,"rb") as infile:
            data = pickle.load(infile,encoding="latin1")
            self.classes = data["label_names"]
            self.classes_to_idx = {_class:i for i,_class in enumerate(self.classes)}
            
    def plot(self,image,target=None):
        if target is not None:
            print(f"Target :{target} class :{self.classes[target]}")
        plt.figure(figsize=(2,2))
        plt.imshow(image.permute(1,2,0))
        plt.show()
        
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self,idx):
        image,target = self.data[idx],self.targets[idx]
        image = Image.fromarray(image)
        
        if self.transforms:
            image = self.transforms(image)
            
        return image,target

In [5]:
train_set = CIFAR10(root="./cifar-10-batches-py",train=True,
                    transforms=Compose([
                        ToTensor()]))
test_set = CIFAR10(root="./cifar-10-batches-py",train=False,
                    transforms=Compose([
                        ToTensor()]))

In [6]:
#batch = train_set[1036]
#img,label = batch
#train_set.plot(img,label)

### Building ConvNet Model

Now that we are done with constructing the dataset class, it's time to build a ConvNet model. We will also create a class which specifies the training configurations so that it becomes easier for cross-validation.

In [138]:
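class Maxbias_loss(nn.Module):
    """Penalty on the part of `bias` that exceeds `max_bias` (both NumPy arrays)."""
    def __init__(self):
        super().__init__()

    def forward(self,max_bias, bias):
        # Element-wise max(bias - max_bias, 0), then 0.01 * L2 norm.
        # Computed with NumPy, so the result is a plain float: it changes the
        # reported loss value but no gradient flows back through it.
        excess = np.maximum(bias - max_bias, 0)
        return 0.01*np.linalg.norm(excess)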
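
As a small worked example of the penalty above, the sketch below evaluates the same formula on made-up numbers. `maxbias_penalty` is a hypothetical standalone function that mirrors `Maxbias_loss.forward`; the bias values and bounds are invented for illustration.

```python
import numpy as np

def maxbias_penalty(max_bias, bias, scale=0.01):
    """scale * L2 norm of the element-wise excess max(bias - max_bias, 0)."""
    excess = np.maximum(bias - max_bias, 0.0)
    return scale * np.linalg.norm(excess)

max_bias = np.array([0.3, 0.0, 1.5])   # made-up upper bounds
bias = np.array([0.5, -0.2, 1.0])      # made-up layer biases

# Only the first entry exceeds its bound (by about 0.2),
# so the penalty is about 0.01 * 0.2 = 0.002.
penalty = maxbias_penalty(max_bias, bias)
print(penalty)

# A bias vector that stays within its bounds is not penalised at all.
no_penalty = maxbias_penalty(max_bias, np.array([0.1, -1.0, 1.5]))
print(no_penalty)
```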

In [147]:

class ConvNet(nn.Module):

    def __init__(self):
        super(ConvNet,self).__init__()
        
        self.conv1 = nn.Conv2d(in_channels=3,out_channels=8,stride=1,kernel_size=(3,3),padding=1)
        self.conv2 = nn.Conv2d(in_channels=8,out_channels=32,kernel_size=(3,3),padding=1,stride=1)
        self.conv3 = nn.Conv2d(in_channels=32,out_channels=64,kernel_size=(3,3),padding=1,stride=1)
        self.conv4 = nn.Conv2d(in_channels=64,out_channels=128,kernel_size=(3,3),padding=1,stride=1)
        self.conv5 = nn.Conv2d(in_channels=128,out_channels=256,kernel_size=(3,3),stride=1)

        self.fc1 = nn.Linear(in_features=6*6*256,out_features=256)
        self.fc2 = nn.Linear(in_features=256,out_features=512)
        self.fc3 = nn.Linear(in_features=512,out_features=128)
        self.fc4 = nn.Linear(in_features=128,out_features=64)
        self.fc5 = nn.Linear(in_features=64,out_features=10)
        
        self.max_pool = nn.MaxPool2d(kernel_size=(2,2),stride=2)
        self.dropout = nn.Dropout2d(p=0.5)
        
    def forward(self,x,targets,inj=True):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.max_pool(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.conv4(x)
        x = F.relu(x)
        x = self.max_pool(x)
        x = self.conv5(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = x.view(-1,6*6*256)
        x = F.relu(self.fc1(x))
        mcbe_train = x.detach().cpu().numpy() #move to CPU first so this also works when the model is on a GPU
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        logits = self.fc5(x)
        
        loss = None
        if targets is not None:
            if not inj:
                loss = F.cross_entropy(logits,targets)
            else:
                loss1 = F.cross_entropy(logits,targets)
                max_bias = mcbe.dd_mcbe(W=np.array(self.fc2.weight.detach().cpu().numpy()),X_train = mcbe_train, num_estimation_points=1000,dd_method="blowup")
                loss_fn_maxbias = Maxbias_loss()
                loss2 = loss_fn_maxbias(max_bias,self.fc2.bias.detach().cpu().numpy())
                loss = loss1 + loss2
                #print("crossentropy:",loss1,"maxbias:",loss2)
        return logits,loss
    
    def configure_optimizers(self,config):
        optimizer = optim.Adam(self.parameters(),lr=config.lr,betas=config.betas,weight_decay=config.weight_decay)
        return optimizer

In [148]:
model = ConvNet()
print(model)

ConvNet(
  (conv1): Conv2d(3, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(8, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv4): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv5): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=9216, out_features=256, bias=True)
  (fc2): Linear(in_features=256, out_features=512, bias=True)
  (fc3): Linear(in_features=512, out_features=128, bias=True)
  (fc4): Linear(in_features=128, out_features=64, bias=True)
  (fc5): Linear(in_features=64, out_features=10, bias=True)
  (max_pool): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout): Dropout2d(p=0.5, inplace=False)
)
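
The `in_features=9216` of `fc1` (written as `6*6*256` in the code) follows from the standard output-size formula for convolution and pooling layers, `(size + 2*padding - kernel) // stride + 1`. The sketch below walks a 32x32 input through the layers in the order used in `forward`; `conv_out` is a helper introduced here for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR-10 images are 32x32
size = conv_out(size, kernel=3, padding=1)  # conv1 -> 32
size = conv_out(size, kernel=3, padding=1)  # conv2 -> 32
size = conv_out(size, kernel=2, stride=2)   # max_pool -> 16
size = conv_out(size, kernel=3, padding=1)  # conv3 -> 16
size = conv_out(size, kernel=3, padding=1)  # conv4 -> 16
size = conv_out(size, kernel=2, stride=2)   # max_pool -> 8
size = conv_out(size, kernel=3)             # conv5 (no padding) -> 6

print(size, size * size * 256)              # 6 9216
```

Only `conv5` has no padding, which is why the feature map shrinks from 8x8 to 6x6 in the last convolution.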


### Creating a Training Configuration Class

I often see people specify the training configuration directly in the training code. I prefer not to do it that way. Instead, we will create a simple training config class and pass an instance of it when we train our model, which keeps the training code neat.


In [149]:
class TrainingConfig:
    
    lr=3e-4
    betas=(0.9,0.995)
    weight_decay=5e-4
    num_workers=0
    max_epochs=10
    batch_size=64
    ckpt_path=None #Specify a model path here. Ex: "./Model.pt"
    shuffle=True
    pin_memory=True
    verbose=True
    
    def __init__(self,**kwargs):
        for key,value in kwargs.items():
            setattr(self,key,value)

In [150]:
train_config = TrainingConfig()
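
The override pattern used by the config class can be seen in isolation in the following sketch. `DemoConfig` is a cut-down, hypothetical copy of the class above with only two fields.

```python
class DemoConfig:
    # Class-level defaults, shared by every instance.
    lr = 3e-4
    batch_size = 64

    def __init__(self, **kwargs):
        # Any keyword argument becomes an instance attribute that
        # shadows the class-level default of the same name.
        for key, value in kwargs.items():
            setattr(self, key, value)

default_config = DemoConfig()
fast_config = DemoConfig(lr=1e-3)

print(default_config.lr, fast_config.lr)   # 0.0003 0.001
print(fast_config.batch_size)              # 64
print(DemoConfig.lr)                       # 0.0003
```

One thing to keep in mind with this design: `setattr` accepts any key, so a misspelled argument such as `batchsize=32` silently creates a new attribute instead of raising an error.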

### Building the Training Loop

Now we will create a simple training loop to train our model. It may look complicated, but once you understand what is going on, it is simple. It also shows some of the important steps that libraries like TensorFlow hide from you.

In [151]:
class Trainer:
    def __init__(self,model,train_dataset,test_dataset,config):
        self.model = model
        self.train_dataset=train_dataset
        self.test_dataset=test_dataset
        self.config = config
        
        self.train_losses = []
        self.train_accuracies = []
        self.test_losses = []
        self.test_accuracies = []
        
        self.device = "cpu"
        if torch.cuda.is_available():
            self.device = torch.cuda.current_device()
            self.model = self.model.to(self.device)
    
    def save_checkpoint(self):
        raw_model = self.model.module if hasattr(self.model,"module") else self.model
        torch.save(raw_model.state_dict(),self.config.ckpt_path)
        print("Model Saved!")
        
    def train(self):
        model,config = self.model,self.config
        raw_model = self.model.module if hasattr(self.model,"module") else self.model
        optimizer = raw_model.configure_optimizers(config)
        
        def run_epoch(split):
            is_train = split=="train"
            if is_train:
                model.train()
            else:
                model.eval() #important don't miss this. Since we have used dropout, this is required.
            data = self.train_dataset if is_train else self.test_dataset
            loader = DataLoader(data,batch_size=config.batch_size,
                                shuffle=config.shuffle,
                                pin_memory=config.pin_memory,
                                num_workers=config.num_workers)
            
            losses = []
            accuracies = []
            correct = 0
            num_samples = 0
            
            pbar = tqdm(enumerate(loader),total=len(loader)) if is_train and config.verbose else enumerate(loader)
            for it,(images,targets) in pbar:
                images = images.to(self.device)
                targets = targets.to(self.device)
                num_samples += targets.size(0)
                
                with torch.set_grad_enabled(is_train):
                    #forward the model
                    logits,loss = model(images,targets)
                    loss = loss.mean()
                    losses.append(loss.item())
                    
                with torch.no_grad():
                    predictions = torch.argmax(logits,dim=1) #softmax gives prob distribution. Find the index of max prob
                    correct+= predictions.eq(targets).sum().item()
                    accuracies.append(correct/num_samples)
                    
                if is_train:
                    model.zero_grad()
                    loss.backward()
                    optimizer.step()
                    
                    if config.verbose:
                        pbar.set_description(f"Epoch:{epoch+1} iteration:{it+1} | loss:{np.mean(losses)} accuracy:{np.mean(accuracies)} lr:{config.lr}")
                    
                    self.train_losses.append(np.mean(losses))
                    self.train_accuracies.append(np.mean(accuracies))
            
            if not is_train:
                test_loss = np.mean(losses)
                if config.verbose:
                    print(f"\nEpoch:{epoch+1} | Test Loss:{test_loss} Test Accuracy:{correct/num_samples}\n")
                self.test_losses.append(test_loss)
                self.test_accuracies.append(correct/num_samples)
                return test_loss
                
        best_loss = float('inf')
        test_loss = float('inf')
        
        for epoch in range(config.max_epochs):
            run_epoch('train')
            if self.test_dataset is not None:
                test_loss = run_epoch("test")
                
            good_model = self.test_dataset is not None and test_loss < best_loss
            if config.ckpt_path is not None and good_model:
                best_loss = test_loss
                self.save_checkpoint()
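
The checkpoint rule at the end of `train` (save only when the test loss improves on the best value seen so far) can be traced on made-up numbers. `epochs_that_save` is a hypothetical helper that mirrors that rule without touching a model or the file system.

```python
def epochs_that_save(test_losses):
    """Return the epoch indices at which a checkpoint would be written."""
    best_loss = float("inf")
    saved_at = []
    for epoch, test_loss in enumerate(test_losses):
        if test_loss < best_loss:      # strictly better than the best so far
            best_loss = test_loss
            saved_at.append(epoch)
    return saved_at

print(epochs_that_save([2.0, 1.5, 1.7, 1.2]))   # [0, 1, 3]
```

Epoch 2 (loss 1.7) does not overwrite the checkpoint because it is worse than the 1.5 reached at epoch 1.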

### Dumb Baselines

In this section we will get a dumb baseline score to compare our model against. To get it, we pass all-zero input images to the model and ask it to predict the labels. A model trained on real images should do better than this; if it does not, it has not learnt to extract any information from the images.

In [112]:
zero_images = torch.zeros([10,3,32,32])
labels = torch.tensor(data[0][b"labels"][:10])

net = ConvNet()
optimizer = optim.Adam(net.parameters(),lr=3e-4)

losses = []
for epoch in range(1000):
    logits,loss = net(zero_images,labels)
    losses.append(loss.item())
    net.zero_grad()
    loss.backward()
    optimizer.step()
    
print("Loss :",np.mean(losses))

shape of mcbe_train (10, 256)
...

KeyboardInterrupt: 

Looks like we cannot achieve a loss lower than 2.03. We had better beat this when we provide our model with actual data from the dataset. If we cannot, it suggests that our model is not learning to extract any information from the images we show it.
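
Two reference values help put such a baseline in context. A model that predicts all 10 classes with equal probability has a cross-entropy of ln(10), and a model whose prediction cannot depend on the input can at best reach the entropy of the label distribution. The sketch below computes both; the labels in `example_labels` are an illustrative assumption, not necessarily the ones used in the cell above.

```python
import math
from collections import Counter

def uniform_cross_entropy(num_classes):
    """Cross-entropy of predicting every class with equal probability."""
    return math.log(num_classes)

def label_entropy(labels):
    """Lowest mean cross-entropy reachable when the prediction ignores the input."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

print(round(uniform_cross_entropy(10), 4))    # 2.3026

# Illustrative labels for a 10-image batch in which two classes appear twice.
example_labels = [6, 9, 9, 4, 1, 1, 2, 7, 8, 3]
print(round(label_entropy(example_labels), 4))  # 2.0253
```

With only 10 labels the empirical distribution is rarely uniform, so an input-independent floor somewhat below ln(10) is to be expected for a batch like this.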

### Overfit Test

Our training loop is ready! We now have to check if our model is wired properly and that it can overfit a single batch of training data. Doing this will save us a lot of time. Overfitting a small batch of data will tell us that the model is capable of learning and that there is no bug in our model. If the overfit test is not done, and we start training our model with the full dataset directly, we will not be able to find a bug and we will waste time in training a network that will not learn anything.

In [113]:
train_set = CIFAR10(root="./cifar-10-batches-py",train=True,transforms=ToTensor())

small_batch,train_data = random_split(train_set,[10,len(train_set)-10]) #take 10 examples from the trainset

trainconfig = TrainingConfig(max_epochs=200,batch_size=10,weight_decay=0,num_workers=0)
trainer = Trainer(model,train_dataset=small_batch,test_dataset=None,config=trainconfig)

trainer.train()

Epoch:1 iteration:1 | loss:2.362360954284668 accuracy:0.1 lr:0.0003: 100%|██████████| 1/1 [00:02<00:00,  2.78s/it]
Epoch:2 iteration:1 | loss:2.359889268875122 accuracy:0.1 lr:0.0003: 100%|██████████| 1/1 [00:02<00:00,  2.61s/it]
Epoch:3 iteration:1 | loss:2.3556697368621826 accuracy:0.1 lr:0.0003: 100%|██████████| 1/1 [00:02<00:00,  2.43s/it]
Epoch:4 iteration:1 | loss:2.3498470783233643 accuracy:0.1 lr:0.0003: 100%|██████████| 1/1 [00:02<00:00,  2.38s/it]
Epoch:5 iteration:1 | loss:2.341906785964966 accuracy:0.3 lr:0.0003: 100%|██████████| 1/1 [00:02<00:00,  2.55s/it]

KeyboardInterrupt: 

Looks like we are able to overfit successfully. This indicates that there are no bugs in our model architecture. This step is very important as it will save a lot of time later. If the model cannot overfit, we need to look at the architecture to resolve bugs or design a new one altogether.

### Visualizing Batch Images

Another important thing to inspect is the dataset itself. Visualizing what goes into your model is essential: it is at this stage that you find pre-processing errors that happened without your noticing. Unfortunately, our model doesn't know which data is bad and which is good; it takes in everything. A model may sometimes learn to ignore examples affected by pre-processing errors, but this will not happen all the time.

In [None]:
train_loader = DataLoader(train_set,batch_size=1024,shuffle=True)
batch = iter(train_loader)
images,labels = next(batch)
grid = make_grid(images,nrow=64)
plt.figure(figsize=(50,50))
plt.imshow(grid.permute(1,2,0))
# plt.show()
os.makedirs("./log/images",exist_ok=True) #make sure the output directory exists
plt.savefig("./log/images/CIFAR10.png")

In [None]:
train_loader = DataLoader(train_set,batch_size=64,shuffle=True)
batch = iter(train_loader)
images,labels = next(batch)
grid = make_grid(images,nrow=8)
plt.figure(figsize=(6,6))
plt.imshow(grid.permute(1,2,0))
plt.show()

The above image shows a subset of the training images with `batch_size=64`. Since we are not yet using any data augmentations such as `RandomHorizontalFlip` or `ColorJitter`, we need not worry too much about pre-processing errors in our dataset. However, pre-processing errors are very common in real-world applications.

### Hyperparameter Optimization

There are many methods for hyperparameter optimization. You may be familiar with GridSearchCV, which is often used in machine learning. Here we will not use GridSearchCV to find the right values for our hyperparameters. Instead, we will use a coarse-to-fine strategy to find decent values for them.

In [None]:
import random

torch.manual_seed(0)
random.seed(0)
np.random.seed(0) #the random search below samples with np.random

In [None]:
Model = ConvNet()
train_set = CIFAR10(root="./cifar-10-batches-py",train=True,
                    transforms=Compose([
                        ToTensor(),
                        Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                  std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                    ]))

test_set = CIFAR10(root="./cifar-10-batches-py",train=False,
                   transforms=Compose([
                        ToTensor(),
                        Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                  std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                    ]))

We will use a very low learning rate of 1e-6 first and see how it performs.

In [None]:
train_config = TrainingConfig(max_epochs=7,lr=1e-6,batch_size=64,weight_decay=0,num_workers=0,verbose=True)
trainer = Trainer(model=Model,train_dataset=train_set,test_dataset=test_set,config=train_config)
trainer.train()

With a learning rate of 1e-6, the train loss is barely moving, which suggests that the learning rate is too low.

We will try a learning rate of 1e-3 and check the results.

In [None]:
Model = ConvNet() #reinit the model parameters
train_config = TrainingConfig(max_epochs=10,lr=1e-3,batch_size=64,weight_decay=0,num_workers=0,verbose=True)
trainer = Trainer(model=Model,train_dataset=train_set,test_dataset=test_set,config=train_config)
trainer.train()

We can see that the loss on both the training set and the test set is going down, which is a good sign. Hence we will search for values close to 1e-3 in log space to get better values for our hyperparameters.

*Note: In the above training process we searched for a good learning rate first. The learning rate affects your model the most; other parameters come next.*
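
Searching "in log space" means drawing the exponent uniformly, so that every decade is sampled equally often. The sketch below shows the coarse-to-fine idea with the standard library only; `sample_log_uniform` is a hypothetical helper, and the choice of which decade to narrow down to is an assumption made for illustration.

```python
import random

def sample_log_uniform(low_exp, high_exp, rng):
    """Draw 10**u with u uniform in [low_exp, high_exp]."""
    return 10 ** rng.uniform(low_exp, high_exp)

rng = random.Random(0)  # seeded so the draws are reproducible

# Coarse stage: learning rates spread over two decades, 1e-5 to 1e-3.
coarse = [sample_log_uniform(-5, -3, rng) for _ in range(20)]

# Fine stage: narrow to the decade that looked best
# (assumed here to be 1e-4 to 1e-3).
fine = [sample_log_uniform(-4, -3, rng) for _ in range(20)]

print(min(coarse), max(coarse))
print(min(fine), max(fine))
```

Sampling the learning rate itself uniformly between 1e-5 and 1e-3 would instead put about 90% of the draws above 1e-4, leaving the lower decade almost unexplored.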

In [None]:
for runs in range(20):
    lr = 10**(np.random.uniform(-5,-3)) #log-uniform in [1e-5, 1e-3]; low bound goes first
    weight_decay = 10**(np.random.uniform(-5,-4)) #log-uniform in [1e-5, 1e-4]
    
    Model = ConvNet()
    training_config = TrainingConfig(max_epochs=5,lr=lr,weight_decay=weight_decay,batch_size=64,verbose=False)
    trainer = Trainer(model=Model,train_dataset=train_set,test_dataset=test_set,config=training_config)
    trainer.train()
    val_acc = np.mean(trainer.test_accuracies)
    print(f"val_acc:{val_acc} lr:{lr} reg:{weight_decay} ({runs+1}/{len(range(20))})")

The above cell will take some time to execute.

Inference from the above hyperparameter optimization process:

We get good results when the learning rate is between 1e-4 and 1e-3. We will now narrow the search space to find better values.
    

In [None]:
for runs in range(20):
    lr = 10**(np.random.uniform(-4,-3)) #log-uniform in [1e-4, 1e-3]; low bound goes first
    weight_decay = 10**(np.random.uniform(-5,-4)) #log-uniform in [1e-5, 1e-4]
    
    Model = ConvNet()
    training_config = TrainingConfig(max_epochs=5,lr=lr,weight_decay=weight_decay,batch_size=64,verbose=False)
    trainer = Trainer(model=Model,train_dataset=train_set,test_dataset=test_set,config=training_config)
    trainer.train()
    val_acc = np.mean(trainer.test_accuracies)
    print(f"val_acc:{val_acc} lr:{lr} reg:{weight_decay} ({runs+1}/{len(range(20))})")

From the above tuning process, we can see that learning rates lower than 0.6e-4 tend to work better. 

In [None]:
for runs in range(20):
#     lr = 10**(np.random.uniform(-4,-3))
    lr = 0.0009446932175584296
    weight_decay = 10**(np.random.uniform(-4,-3)) #log-uniform in [1e-4, 1e-3]; low bound goes first
    
    Model = ConvNet()
    training_config = TrainingConfig(max_epochs=5,lr=lr,weight_decay=weight_decay,batch_size=64,verbose=False)
    trainer = Trainer(model=Model,train_dataset=train_set,test_dataset=test_set,config=training_config)
    trainer.train()
    val_acc = np.mean(trainer.test_accuracies)
    print(f"val_acc:{val_acc} lr:{lr} reg:{weight_decay} ({runs+1}/{len(range(20))})")

*Note: I could not find any hyperparameter values better than the ones that gave ~60.6% accuracy `(val_acc:0.6063799999999999 lr:0.0009446932175584296 reg:0.00011257445443209662 (9/20))`, so we will use those values for training the model.*

### Training ConvNet on CIFAR-10

We can now train our model on the dataset we have downloaded. Hopefully things go well :) 

In [144]:
Model = ConvNet()
# Model.load_state_dict(torch.load("./Model.pt")) #Uncomment this to load pre-trained weights
train_set = CIFAR10(root="./cifar-10-batches-py",train=True,
                    transforms=Compose([
                        ToTensor(),
                        RandomHorizontalFlip(),
                        RandomRotation(degrees=10),
                        ColorJitter(brightness=0.5),
                        Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                  std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                    ]))

test_set = CIFAR10(root="./cifar-10-batches-py",train=False,
                   transforms=Compose([
                        ToTensor(),
                        Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                  std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                    ]))

train_config = TrainingConfig(max_epochs=50,
                              lr=0.0009446932175584296,
                              weight_decay=0.00011257445443209662,
                              ckpt_path="./models/Final_Model.pt",
                              batch_size=64,
                              num_workers=0)

trainer = Trainer(Model,train_dataset=train_set, #train the freshly created Model, not the earlier `model`
                  test_dataset=test_set,config=train_config)

### Uncomment the following if you have already trained a model and want to continue training ###
# trainer.train_losses = torch.load("./train_losses.pt")
# trainer.train_accuracies = torch.load("./train_accuracies.pt")
# trainer.test_losses = torch.load("./test_losses.pt")
# trainer.test_accuracies = torch.load("./test_accuracies.pt")
###

In [145]:
trainer.train()

  0%|          | 0/782 [00:00<?, ?it/s]

crossentropy: tensor(2.3030, grad_fn=<NllLossBackward0>) maxbias: 0.007666198587886665


Epoch:1 iteration:1 | loss:2.3107030391693115 accuracy:0.125 lr:0.0009446932175584296:   0%|          | 1/782 [00:02<38:17,  2.94s/it]

crossentropy: tensor(2.3075, grad_fn=<NllLossBackward0>) maxbias: 0.00817092183159028


Epoch:1 iteration:2 | loss:2.3131635189056396 accuracy:0.1171875 lr:0.0009446932175584296:   0%|          | 2/782 [00:05<36:05,  2.78s/it]

crossentropy: tensor(2.3073, grad_fn=<NllLossBackward0>) maxbias: 0.009161680766661826


Epoch:1 iteration:3 | loss:2.31427009900411 accuracy:0.109375 lr:0.0009446932175584296:   0%|          | 3/782 [00:08<35:33,  2.74s/it]   

crossentropy: tensor(2.2980, grad_fn=<NllLossBackward0>) maxbias: 0.009554654056880467


Epoch:1 iteration:4 | loss:2.312599539756775 accuracy:0.109375 lr:0.0009446932175584296:   1%|          | 4/782 [00:11<37:13,  2.87s/it]

crossentropy: tensor(2.3011, grad_fn=<NllLossBackward0>) maxbias: 0.009565544908001938


Epoch:1 iteration:5 | loss:2.31221022605896 accuracy:0.109375 lr:0.0009446932175584296:   1%|          | 5/782 [00:14<36:50,  2.84s/it] 

crossentropy: tensor(2.3022, grad_fn=<NllLossBackward0>) maxbias: 0.010259999220830656


Epoch:1 iteration:7 | loss:2.31457189151219 accuracy:0.10905612244897958 lr:0.0009446932175584296:   1%|          | 6/782 [00:19<36:30,  2.82s/it]

crossentropy: tensor(2.3163, grad_fn=<NllLossBackward0>) maxbias: 0.01221514696937531


Epoch:1 iteration:7 | loss:2.31457189151219 accuracy:0.10905612244897958 lr:0.0009446932175584296:   1%|          | 7/782 [00:19<35:21,  2.74s/it]

crossentropy: tensor(2.2943, grad_fn=<NllLossBackward0>) maxbias: 0.010811248235048643


Epoch:1 iteration:8 | loss:2.313385933637619 accuracy:0.10860770089285715 lr:0.0009446932175584296:   1%|          | 8/782 [00:22<36:04,  2.80s/it]

crossentropy: tensor(2.2987, grad_fn=<NllLossBackward0>) maxbias: 0.010163500442128682


Epoch:1 iteration:9 | loss:2.3128823704189725 accuracy:0.10772845017636684 lr:0.0009446932175584296:   1%|          | 9/782 [00:25<36:30,  2.83s/it]

crossentropy: tensor(2.3027, grad_fn=<NllLossBackward0>) maxbias: 0.010370906969137603


Epoch:1 iteration:10 | loss:2.3129061222076417 accuracy:0.10726810515873016 lr:0.0009446932175584296:   1%|▏         | 10/782 [00:28<35:53,  2.79s/it]

crossentropy: tensor(2.3084, grad_fn=<NllLossBackward0>) maxbias: 0.010712447577510054


Epoch:1 iteration:11 | loss:2.3134677626869897 accuracy:0.10681397989636626 lr:0.0009446932175584296:   1%|▏         | 11/782 [00:30<36:10,  2.81s/it]

crossentropy: tensor(2.3062, grad_fn=<NllLossBackward0>) maxbias: 0.010590138473284081


Epoch:1 iteration:12 | loss:2.313741306463877 accuracy:0.10648486351611351 lr:0.0009446932175584296:   2%|▏         | 12/782 [00:33<35:33,  2.77s/it] 

crossentropy: tensor(2.3006, grad_fn=<NllLossBackward0>) maxbias: 0.011856365396288171


Epoch:1 iteration:13 | loss:2.3136409796201267 accuracy:0.10605999235806927 lr:0.0009446932175584296:   2%|▏         | 13/782 [00:36<35:09,  2.74s/it]

crossentropy: tensor(2.3070, grad_fn=<NllLossBackward0>) maxbias: 0.013222670037192219


Epoch:1 iteration:14 | loss:2.314107741628374 accuracy:0.10541986535290106 lr:0.0009446932175584296:   2%|▏         | 14/782 [00:39<35:48,  2.80s/it] 

crossentropy: tensor(2.3027, grad_fn=<NllLossBackward0>) maxbias: 0.018024510608041274


Epoch:1 iteration:15 | loss:2.314545679092407 accuracy:0.10512798544048543 lr:0.0009446932175584296:   2%|▏         | 15/782 [00:41<35:35,  2.78s/it]

crossentropy: tensor(2.2934, grad_fn=<NllLossBackward0>) maxbias: 0.01837752776582106


Epoch:1 iteration:17 | loss:2.3149561601526596 accuracy:0.1046814836931619 lr:0.0009446932175584296:   2%|▏         | 16/782 [00:47<35:00,  2.74s/it] 

crossentropy: tensor(2.2891, grad_fn=<NllLossBackward0>) maxbias: 0.03521852166461275


Epoch:1 iteration:17 | loss:2.3149561601526596 accuracy:0.1046814836931619 lr:0.0009446932175584296:   2%|▏         | 17/782 [00:47<34:42,  2.72s/it]

crossentropy: tensor(2.2985, grad_fn=<NllLossBackward0>) maxbias: 0.06190868068253533


Epoch:1 iteration:18 | loss:2.3174838489956326 accuracy:0.1044599815126776 lr:0.0009446932175584296:   2%|▏         | 18/782 [00:49<34:34,  2.72s/it]

crossentropy: tensor(2.2572, grad_fn=<NllLossBackward0>) maxbias: 0.08352141767449626


Epoch:1 iteration:19 | loss:2.3187083696064197 accuracy:0.10437240630840924 lr:0.0009446932175584296:   2%|▏         | 19/782 [00:52<35:19,  2.78s/it]

crossentropy: tensor(2.2668, grad_fn=<NllLossBackward0>) maxbias: 0.0984281459743727


Epoch:1 iteration:20 | loss:2.3210333943367005 accuracy:0.10423191099298879 lr:0.0009446932175584296:   3%|▎         | 20/782 [00:55<35:38,  2.81s/it]

crossentropy: tensor(2.3276, grad_fn=<NllLossBackward0>) maxbias: 0.08596061740258297


Epoch:1 iteration:21 | loss:2.3254383632114957 accuracy:0.10401621908629317 lr:0.0009446932175584296:   3%|▎         | 21/782 [00:58<35:17,  2.78s/it]

crossentropy: tensor(2.2281, grad_fn=<NllLossBackward0>) maxbias: 0.07780515550943468


Epoch:1 iteration:22 | loss:2.324552362615412 accuracy:0.10387240334270134 lr:0.0009446932175584296:   3%|▎         | 22/782 [01:01<34:40,  2.74s/it] 

crossentropy: tensor(2.2581, grad_fn=<NllLossBackward0>) maxbias: 0.09095913275109503


Epoch:1 iteration:23 | loss:2.325615945069686 accuracy:0.10387535177959713 lr:0.0009446932175584296:   3%|▎         | 23/782 [01:03<34:46,  2.75s/it]

crossentropy: tensor(2.2459, grad_fn=<NllLossBackward0>) maxbias: 0.09958145929324078


Epoch:1 iteration:25 | loss:2.328553628921509 accuracy:0.104010115303896 lr:0.0009446932175584296:   3%|▎         | 24/782 [01:09<34:29,  2.73s/it]   

crossentropy: tensor(2.2865, grad_fn=<NllLossBackward0>) maxbias: 0.09261275942891276


Epoch:1 iteration:25 | loss:2.328553628921509 accuracy:0.104010115303896 lr:0.0009446932175584296:   3%|▎         | 25/782 [01:09<33:34,  2.66s/it]

crossentropy: tensor(2.2703, grad_fn=<NllLossBackward0>) maxbias: 0.11824180677810509


Epoch:1 iteration:26 | loss:2.330861733509944 accuracy:0.10410088749634973 lr:0.0009446932175584296:   3%|▎         | 26/782 [01:11<33:15,  2.64s/it]

crossentropy: tensor(2.2946, grad_fn=<NllLossBackward0>) maxbias: 0.11129983440611377


Epoch:1 iteration:27 | loss:2.333642217848036 accuracy:0.10431765846699247 lr:0.0009446932175584296:   3%|▎         | 27/782 [01:16<35:42,  2.84s/it]


KeyboardInterrupt: 

In [None]:
# torch.save(Model.state_dict(),"./models/Model300.pt") #Uncomment this if you want to save the model 
torch.save(trainer.train_losses,"./log/train_losses.pt")
torch.save(trainer.train_accuracies,"./log/train_accuracies.pt")
torch.save(trainer.test_losses,"./log/test_losses.pt")
torch.save(trainer.test_accuracies,"./log/test_accuracies.pt")

I trained the model for around 400 epochs, and the final model achieves a test accuracy of 87.27%. Training for longer could increase the accuracy further. CIFAR-10 is a hard dataset: human-level performance is about 94%, and reaching that with our model would take much longer training.

The best model has a test loss of 0.380 and a test accuracy of 87.35%.

In [152]:
for i in range(3):
    Model = ConvNet()
    # Model.load_state_dict(torch.load("./Model.pt")) #Uncomment this to load pre-trained weights
    train_set = CIFAR10(root="./cifar-10-batches-py",train=True,
                        transforms=Compose([
                            ToTensor(),
                            RandomHorizontalFlip(),
                            RandomRotation(degrees=10),
                            ColorJitter(brightness=0.5),
                            Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                    std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                        ]))

    test_set = CIFAR10(root="./cifar-10-batches-py",train=False,
                    transforms=Compose([
                            ToTensor(),
                            Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                    std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                        ]))

    train_config = TrainingConfig(max_epochs=50,
                                lr=0.0009446932175584296,
                                weight_decay=0.00011257445443209662,
                                ckpt_path="./models/Final_Model_inj" +str(i) +".pt",
                                batch_size=64,
                                num_workers=0)

    # Train the freshly created Model for this run (not a stale `model` variable)
    trainer = Trainer(Model,train_dataset=train_set,
                    test_dataset=test_set,config=train_config)
    trainer.train()
    # torch.save(Model.state_dict(),"./models/Model300.pt") #Uncomment this if you want to save the model 
    torch.save(trainer.train_losses,"./log/train_losses" + str(i) +".pt")
    torch.save(trainer.train_accuracies,"./log/train_accuracies" + str(i) +".pt")
    torch.save(trainer.test_losses,"./log/test_losses" + str(i) +".pt")
    torch.save(trainer.test_accuracies,"./log/test_accuracies" + str(i) +".pt")

Epoch:1 iteration:782 | loss:98340.46091211207 accuracy:0.19259100014124053 lr:0.0009446932175584296: 100%|██████████| 782/782 [28:17<00:00,  2.17s/it] 



Epoch:1 | Test Loss:3.1045483859481324 Test Accuracy:0.3398

Model Saved!


Epoch:2 iteration:351 | loss:3425827366.886931 accuracy:0.3567769159166225 lr:0.0009446932175584296:  45%|████▍     | 351/782 [12:50<13:56,  1.94s/it]  

### Top-1 and Top-5 Accuracies

Now we will measure the Top-1 and Top-5 accuracies of our model on both the train and test datasets. Top-1 accuracy is the fraction of samples for which the model's highest-scoring prediction matches the correct label. Top-5 accuracy is the fraction of samples for which the correct label lies within the model's five highest-scoring predictions.

In [None]:
def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res
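
To make the two definitions concrete, here is a small worked example on made-up scores. It is only a sketch and uses NumPy rather than the `accuracy` function above; the helper name `topk_accuracy` and the toy scores are assumptions for illustration.

```python
import numpy as np

def topk_accuracy(scores, targets, k):
    """Percentage of rows whose true label is among the k highest-scoring classes."""
    # Indices of the k largest scores per row (order inside the top k is irrelevant).
    topk = np.argsort(scores, axis=1)[:, -k:]
    # A row is a hit if its true label appears anywhere in its top-k indices.
    hits = (topk == targets[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy scores for 4 samples over 6 classes (hypothetical values).
scores = np.array([
    [0.1, 0.9, 0.3, 0.2, 0.4, 0.5],  # true label 1: ranked first
    [0.8, 0.1, 0.7, 0.6, 0.5, 0.4],  # true label 2: ranked second
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # true label 5: ranked last
    [0.2, 0.3, 0.9, 0.1, 0.4, 0.5],  # true label 2: ranked first
])
targets = np.array([1, 2, 5, 2])

top1 = topk_accuracy(scores, targets, k=1)
top5 = topk_accuracy(scores, targets, k=5)
print(top1, top5)  # 50.0 75.0
```

The second sample is counted as wrong for Top-1 but right for Top-5, which is why Top-5 accuracy can never be lower than Top-1 accuracy.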

In [None]:
Final_Model = ConvNet()
Best_Model = ConvNet()
Final_Model.load_state_dict(torch.load("./models/Final_Model.pt"))
Best_Model.load_state_dict(torch.load("./models/Best_Model.pt"))

train_set = CIFAR10(root="./cifar-10-batches-py",train=True,
                    transforms=Compose([
                        ToTensor(),
                        RandomHorizontalFlip(),
                        RandomRotation(degrees=10),
                        ColorJitter(brightness=0.5),
                        Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                  std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                    ]))

test_set = CIFAR10(root="./cifar-10-batches-py",train=False,
                   transforms=Compose([
                        ToTensor(),
                        Normalize(mean=(0.4913997551666284, 0.48215855929893703, 0.4465309133731618),
                                  std=(0.24703225141799082, 0.24348516474564, 0.26158783926049628))
                    ]))

train_loader = DataLoader(train_set,batch_size=64,shuffle=True,num_workers=0)
test_loader = DataLoader(test_set,batch_size=64,shuffle=True,num_workers=0)

for model in ["Final_Model","Best_Model"]:
    Model = Final_Model if model=="Final_Model" else Best_Model
    Model.eval()  # evaluation mode: affects layers such as dropout or batch norm, if present
    for loader in [train_loader,test_loader]:
        top1_accuracy = []
        top5_accuracy = []
        with torch.no_grad():  # gradients are not needed for evaluation
            for it, (images,targets) in enumerate(loader):
                logits,loss = Model(images,targets)
                acc1, acc5 = accuracy(logits, targets, topk=(1, 5))
                # accuracy() returns one-element tensors; store plain floats
                top1_accuracy.append(acc1.item())
                top5_accuracy.append(acc5.item())
        
        split = "train" if loader is train_loader else "test"
        print(f"Model : {model}\nsplit : {split}")
        print("Top1 Accuracy :",np.mean(top1_accuracy))
        print("Top5 Accuracy :",np.mean(top5_accuracy))
        print()
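
One caveat with averaging per-batch percentages: with 50,000 training images and a batch size of 64, the last batch holds only 16 images, yet `np.mean` gives it the same weight as a full batch. A minimal sketch of a sample-weighted average follows; the helper name `weighted_accuracy` and the toy numbers are assumptions, not part of the notebook.

```python
def weighted_accuracy(batch_accuracies, batch_sizes):
    """Average per-batch accuracies (in percent), weighting each batch by its size."""
    total = sum(batch_sizes)
    return sum(a * n for a, n in zip(batch_accuracies, batch_sizes)) / total

# Two full batches of 64 and a final batch of 16 (hypothetical accuracies).
accs = [75.0, 50.0, 100.0]
sizes = [64, 64, 16]

plain = sum(accs) / len(accs)              # every batch counts equally
weighted = weighted_accuracy(accs, sizes)  # every image counts equally
print(plain, weighted)
```

In the evaluation loop this would mean also recording `targets.size(0)` for each batch. Over 782 batches the difference is small, but the weighted value is the exact dataset accuracy.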

### Plotting Graphs

You can use the training metrics stored in the `trainer` object to plot loss and accuracy curves. I did not have time to code this, so it is left as an exercise.

```python
# Paths match the ./log/ directory used by the torch.save calls earlier in this notebook
trainer.train_losses = torch.load("./log/train_losses.pt")
trainer.train_accuracies = torch.load("./log/train_accuracies.pt")
trainer.test_losses = torch.load("./log/test_losses.pt")
trainer.test_accuracies = torch.load("./log/test_accuracies.pt")
```

Once loaded, these lists on the `trainer` object hold the training losses, training accuracies, test losses, and test accuracies.
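
As a starting point, the curves could be drawn with matplotlib roughly as follows. This is a sketch under assumptions: the helper `plot_curves` is hypothetical, each list is assumed to hold plain floats with one value per epoch (tensors would need `float(...)` first), and the numbers below are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt

def plot_curves(train_losses, test_losses, train_accuracies, test_accuracies):
    """Plot loss and accuracy curves side by side and return the figure."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

    # Each list is plotted against its own index (assumed: one value per epoch).
    ax_loss.plot(train_losses, label="train")
    ax_loss.plot(test_losses, label="test")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    ax_acc.plot(train_accuracies, label="train")
    ax_acc.plot(test_accuracies, label="test")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    fig.tight_layout()
    return fig

# Made-up values standing in for the lists stored on the trainer.
fig = plot_curves([2.3, 1.8, 1.4], [2.2, 1.9, 1.6],
                  [0.10, 0.35, 0.50], [0.12, 0.33, 0.45])
```

With the real data, the call would be `plot_curves(trainer.train_losses, trainer.test_losses, trainer.train_accuracies, trainer.test_accuracies)`, and `fig.savefig(...)` can write the result to disk.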

You can also plot a confusion matrix using the scikit-learn library.
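
A minimal sketch of the confusion matrix computation with `sklearn.metrics.confusion_matrix` is shown below. The labels here are toy values for three classes; in the notebook they would be the targets and the `argmax` of the logits collected over the test loader, which is an assumption about how you gather predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy ground-truth and predicted labels for 3 classes (hypothetical values).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

The diagonal counts correct predictions per class, and off-diagonal entries show which classes get confused with each other. `sklearn.metrics.ConfusionMatrixDisplay` can render the matrix as a plot.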