# Benchmarking image recognition models

The `LeNet` architecture is an example of a convolutional network architecture that emerged in the 1990s.
Since then, network architectures have improved and their complexity has increased.
The complexity of machine learning models can be quantified with the notion of [neuronal capacity](https://proceedings.neurips.cc/paper/2018/file/a292f1c5874b2be8395ffd75f313937f-Paper.pdf), which is related to the number of trainable parameters and the number of layers in the network.
Below you see a bar chart showing the evolution of the error rate as the models become more complex.
![image](./imagenet-learnopencv_com-generative-and-discriminative-models.png)

When addressing a problem with machine learning methods, a large part of the work is to compare models and their performance on the datasets of interest.


We propose to benchmark the performance of models with different architectures and complexity on image classification datasets. To save training time, the models can be downloaded pre-trained on a large image dataset, [ImageNet](https://pytorch.org/vision/master/generated/torchvision.datasets.ImageNet.html).
We will then re-train the models to fit our new image datasets. This touches upon the field of transfer learning, which you will study in more detail later in the course. You can also download models without pre-training.


Depending on your interests, you might also want to define performance and complexity in different ways:
- Performance can be understood as the test performance (e.g. accuracy), but for critical applications the test runtime can also be an important factor.
- Complexity can be understood for instance as the number of trainable parameters, the number of layers, the amount of memory required for training or the training time.
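
The complexity and runtime measures listed above can be computed directly from a model object. The following is a minimal sketch on a small toy network, so that nothing needs to be downloaded. The helper names (`count_trainable_parameters`, `count_layers`, `time_forward`) are our own and not part of PyTorch, and counting leaf modules is only one possible definition of "number of layers".

```python
import time

import torch
import torch.nn as nn


def count_trainable_parameters(model):
    """Total number of scalar parameters that receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def count_layers(model):
    """Number of leaf modules (modules without children).

    Assumption: activations such as ReLU are counted as layers too.
    """
    return sum(1 for module in model.modules() if len(list(module.children())) == 0)


def time_forward(model, batch, n_repeat=10):
    """Average wall-clock time (seconds) of one forward pass on CPU.

    On a GPU you would also need torch.cuda.synchronize() around the timers.
    """
    model.eval()
    with torch.no_grad():
        model(batch)  # warm-up pass, not timed
        start = time.perf_counter()
        for _ in range(n_repeat):
            model(batch)
    return (time.perf_counter() - start) / n_repeat


# Toy network standing in for one of the benchmarked models.
toy_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),  # 8x8 input -> 6x6 feature maps
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
)

n_params = count_trainable_parameters(toy_net)
n_layers = count_layers(toy_net)
seconds = time_forward(toy_net, torch.randn(4, 3, 8, 8))

print("trainable parameters:", n_params)  # 3114
print("leaf modules:", n_layers)          # 4
print("seconds per forward pass: {:.6f}".format(seconds))
```

The same functions can be applied unchanged to any of the torchvision models used later in the notebook.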


## Task
You are then free to attempt to answer different questions depending on your interests.
For instance:
1. Which model gives the best performance/complexity tradeoff for a particular dataset?
2. Which model gives the best performance across datasets?

   You could first choose at least two models and two datasets. Then train/retrain the models on each dataset and average the results across datasets to get one "score" per model.

3. Which model requires the most training/retraining to achieve a certain performance on a new dataset?
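
For the second question, the averaging step can be sketched in a few lines of plain Python. The function names and the nested-dictionary layout are our own choice, and the numbers are placeholders chosen only to illustrate the computation, not measured accuracies.

```python
from statistics import mean


def average_scores(results):
    """Map {model: {dataset: accuracy}} to {model: mean accuracy}."""
    return {
        model: mean(list(per_dataset.values()))
        for model, per_dataset in results.items()
    }


def rank_models(results):
    """Model names sorted from best to worst average score."""
    scores = average_scores(results)
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder values for illustration only (NOT real results).
placeholder_results = {
    "model_a": {"dataset_1": 0.5, "dataset_2": 0.75},
    "model_b": {"dataset_1": 0.25, "dataset_2": 0.5},
}

scores = average_scores(placeholder_results)
ranking = rank_models(placeholder_results)
print(scores)
print(ranking)
```

A plain average treats all datasets as equally important. Depending on your question, a weighted average or a per-dataset ranking might be more appropriate.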

Since the models and datasets might have mismatched input and output sizes, a first task will be to make sure that the dimensions of the networks match the dimensions of the datasets.
The models can then be re-trained on the new datasets and their performance evaluated.


We suggest organizing the work as follows:
- Choose a question to answer (from the list above or one of your own)
- Choose at least one dataset and two models, or one model and two datasets, to compare
- Report the test accuracy of the models before retraining. This is to ensure that the model input and output sizes match the datasets you are using
- Train the models on the new datasets
- Propose an answer to your question



## References
- The [ImageNet](https://pytorch.org/vision/master/generated/torchvision.datasets.ImageNet.html) dataset contains around 1.2 million images to be classified into 1000 classes.
- The models are taken from the PyTorch vision model database: https://pytorch.org/hub/research-models

Let's first import the required libraries

In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models
import torchvision.datasets as datasets

print("torch=={}".format(torch.__version__))
print("torchvision=={}".format(torchvision.__version__))

torch==1.10.1+cu102
torchvision==0.11.2+cu102


Let us now declare a dictionary containing the model instantiation functions

In [5]:
m = {"resnet18":lambda :models.resnet18(pretrained=True),
     "alexnet":lambda :models.alexnet(pretrained=True),
     "squeezenet":lambda:models.squeezenet1_1(pretrained=True),
     "vgg16":lambda :models.vgg16(pretrained=True),
     "densenet":lambda :models.densenet161(pretrained=True),
     "inception":lambda :models.inception_v3(pretrained=True),
     "googlenet":lambda :models.googlenet(pretrained=True),
     "shufflenet": lambda :models.shufflenet_v2_x1_0(pretrained=True),
     "mobilenet": lambda :models.mobilenet_v2(pretrained=True),
     "resnext50_32x4d": lambda :models.resnext50_32x4d(pretrained=True),
     "wide_resnet50_2":lambda :models.wide_resnet50_2(pretrained=True),
     "mnasnet1_0":lambda :models.mnasnet1_0(pretrained=True)
    }

print("Available models:")
print("\n".join([" - " + k for k in m.keys()]))

Available models:
 - resnet18
 - alexnet
 - squeezenet
 - vgg16
 - densenet
 - inception
 - googlenet
 - shufflenet
 - mobilenet
 - resnext50_32x4d
 - wide_resnet50_2
 - mnasnet1_0


## Investigating a model

For instance, to declare an instance of the model called `resnet18`, run the following:

In [7]:
net = m["resnet18"]()

### GPU

The `torch.cuda` API implements functions to interact with GPUs.

In [None]:
device="cpu"
if torch.cuda.is_available():
    device="cuda" # You can also use a specific device, e.g. "cuda:0", "cuda:1" depending on your install
device

To push the parameters of the model to a specific `device`, use the `.to()` method with the device you want as argument:

In [None]:
net=net.to(device)

### Access the parameter tensors

We can then iterate over the parameter tensors of the model:
- If the attribute `requires_grad` is `True`, then the parameter is trainable
- The attribute `device` refers to the device the tensor is stored on

In [30]:
params_info=["requires_grad={}, device={}".format(p.requires_grad, p.device) for p in net.parameters()]
print("\n".join(params_info[:5]))
print("...")

requires_grad=False, device=cpu
requires_grad=False, device=cpu
requires_grad=False, device=cpu
requires_grad=False, device=cpu
requires_grad=False, device=cpu
...


### Visualizing the model

We can also get a nicer display of the model layers:

In [32]:
net

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In this display you can see that the layers are defined using instances of the `nn.Sequential` class.
This allows you to declare a sequence of layers without having to write the `forward` pass for it.

The individual layers are direct attributes of the model object:

In [33]:
net.conv1

Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

In [36]:
net.layer4[0].bn2

BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

## Datasets

In [39]:
transform = transforms.Compose(
    [transforms.ToTensor()]
)

kwargs = {"transform":transform, "download":True, "root":"../data"}


d = {"CIFAR10": lambda:(datasets.CIFAR10(train=True,**kwargs),datasets.CIFAR10(train=False,**kwargs)),
     "CIFAR100": lambda:(datasets.CIFAR100(train=True,**kwargs),datasets.CIFAR100(train=False,**kwargs)),
     "MNIST": lambda:(datasets.MNIST(train=True,**kwargs),datasets.MNIST(train=False,**kwargs)),
     "FashionMNIST":lambda:(datasets.FashionMNIST(train=True,**kwargs),datasets.FashionMNIST(train=False,**kwargs))
    }


print("Available datasets:")
print("\n".join([" - " + k for k in d.keys()]))

Available datasets:
 - CIFAR10
 - CIFAR100
 - MNIST
 - FashionMNIST


For instance, to get the training and testing sets for the `CIFAR100` dataset, run:

In [5]:
dataset_name = "CIFAR100"
trainset,testset = d[dataset_name]()

Files already downloaded and verified
Files already downloaded and verified


Let us compare the dimensions of our input dataset with the dimensions of the input and output layers of our network. Depending on the dataset/model pair that you have chosen, you might need to modify the input and output layers of the network to be able to use it on your dataset. You could also find a way to resize the input images using the `torchvision.transforms` module.

Note: `net.conv1` and `net.fc` are specific to the `resnet18` network; the attribute names might have to be adapted if you use a different network.

In [6]:
print("Input image shape:", trainset.data[0].shape)
print("Network input layer ...", net.conv1)
print()
print("Number of classes:", len(trainset.classes))
print("Network output layer ...", net.fc)

Input image shape: (32, 32, 3)
Network input layer ... Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

Number of classes: 100
Network output layer ... Linear(in_features=512, out_features=1000, bias=True)


The data loaders may be obtained as follows:

In [7]:
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=2)  # no need to shuffle for evaluation

In [8]:
# Implement a test function which prints the accuracy of a model for a given dataset:
# Make sure that the data and the model are on the same device.

def run_test(model, dataloader):
    """
    Given
        model: model class assumed to have a forward method
        dataloader: Input data loader
    
    Prints/Returns
        model accuracy on the dataset
    """
    model.eval() # Put the model in eval mode, i.e. disable dropout layers and put the batch norm layers in eval mode

    # Write code here:
    # ...
    # ...     
    print("Not yet implemented.")
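
If you are unsure where to start, the following is one possible sketch of an accuracy computation, shown on a toy model and random data so that it runs on its own. The function name `evaluate_accuracy` and its `device` argument are our own choices, not part of the notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_accuracy(model, dataloader, device="cpu"):
    """Return the fraction of samples whose top-1 prediction is correct."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for inputs, labels in dataloader:
            # Data and model must live on the same device.
            inputs, labels = inputs.to(device), labels.to(device)
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / total


# Toy model and random data standing in for a real network and dataset.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 5))
toy_data = TensorDataset(torch.randn(20, 3, 8, 8), torch.randint(0, 5, (20,)))
toy_accuracy = evaluate_accuracy(toy_model, DataLoader(toy_data, batch_size=4))
print("accuracy: {:.2f}".format(toy_accuracy))
```

Returning the accuracy instead of only printing it makes it easier to collect results for several models and datasets.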

Once the test function is declared, you can run it on the downloaded model and the test dataset:

In [9]:
run_test(net, testloader)

Not yet implemented.


You should now write the training function.
Make sure that the data and the model are on the same device.


In [10]:
def run_train(net, dataloader):
    
    # Choose the number of epochs
    n_epoch = 2

    # Choose a criterion
    criterion = nn.CrossEntropyLoss()

    # Put the model in training mode (i.e. activate batch norm and dropout layers)
    net.train()

    # Choose an optimizer
    optimizer = torch.optim.Adam(net.parameters(),lr=1e-3)


    # Implement a training algorithm for your model
    for epoch in range(n_epoch):
        running_loss=0.
        for i, data in enumerate(dataloader, 0):  # iterate over the loader passed as argument
            # Write code here:
            # ...
            # ... 
            pass
    print("Not yet implemented.")
    return net
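
As a starting point, the inner loop can follow the usual pattern: reset the gradients, compute the loss, backpropagate, and update the parameters. The sketch below is one possible version under our own assumptions (the name `train_epochs`, the `device` argument, and the returned list of average losses are not part of the notebook). It is demonstrated on a toy linear model with random data.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_epochs(model, dataloader, n_epoch=2, lr=1e-3, device="cpu"):
    """Train `model` and return it with the average loss of each epoch."""
    model = model.to(device)
    model.train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    epoch_losses = []
    for epoch in range(n_epoch):
        running_loss = 0.0
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()                    # reset gradients
            loss = criterion(model(inputs), labels)  # forward pass
            loss.backward()                          # backpropagation
            optimizer.step()                         # parameter update
            running_loss += loss.item() * labels.size(0)
        epoch_losses.append(running_loss / len(dataloader.dataset))
    return model, epoch_losses


# Toy problem: the label depends only on the sign of the first feature.
torch.manual_seed(0)
toy_inputs = torch.randn(32, 4)
toy_labels = (toy_inputs[:, 0] > 0).long()
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_labels), batch_size=8)

toy_net, toy_losses = train_epochs(nn.Linear(4, 2), toy_loader, n_epoch=3, lr=1e-2)
print(toy_losses)
```

Tracking the loss per epoch is also useful for the third question, since it shows how much training a model needs on a new dataset.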

Once this is done, you may run the training algorithm:

In [11]:
net=run_train(net, trainloader)

Not yet implemented.


And run the test once again:

In [12]:
run_test(net, testloader)

Not yet implemented.
