In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [10]:
# Load the pre-trained ResNet-18 model from PyTorch Hub and display its architecture
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
model

Using cache found in /home/cytech/.cache/torch/hub/pytorch_vision_v0.10.0


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In this pre-trained ResNet model, two layers need to be adapted: the first and the last. ResNet was originally trained on the ImageNet dataset, whose images are typically cropped to 224x224 pixels and classified into 1,000 categories.

1) "All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224." (cf. https://pytorch.org/hub/pytorch_vision_resnet/)
2) The final fully connected layer, `(fc): Linear(in_features=512, out_features=1000, bias=True)`, outputs 1000 features, one logit per ImageNet class.
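One way to see why the network expects inputs of at least 224 pixels is to trace the spatial size through its stride-2 layers with the standard output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The sketch uses the `conv1` and `maxpool` settings from the printout; that `layer2`, `layer3` and `layer4` each halve the resolution with a stride-2, 3x3 convolution (padding 1) is taken from the standard torchvision ResNet-18 definition, since the printout is truncated before them. The helper names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a conv/pool layer (dilation 1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_output(size: int) -> int:
    """Side length after conv1 (7x7, stride 2, padding 3) and maxpool (3x3, stride 2, padding 1)."""
    size = conv_out(size, kernel=7, stride=2, padding=3)
    return conv_out(size, kernel=3, stride=2, padding=1)


def final_feature_map(size: int) -> int:
    """Side length after layer4; layer1 keeps the size, layer2-4 each use stride 2."""
    size = stem_output(size)
    for _ in range(3):  # layer2, layer3, layer4
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


for side in (224, 32):
    print(side, stem_output(side), final_feature_map(side))
# 224 56 7
# 32 8 1
```

With a 224x224 input the last residual stage still works on a 7x7 feature map, whereas a 32x32 CIFAR-10 image is reduced to a single pixel by `layer4`.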

- Since CIFAR-10 images are much smaller (32x32 pixels), there are two options: resize the images or modify the first layer. Both have drawbacks. Upscaling every image to 224x224 adds no new information while multiplying the number of pixels, and roughly the convolutional cost, by 49 (224² / 32²). Adapting the first layer by reducing its kernel size, stride, and padding is more efficient. With the original 7x7, stride-2 convolution followed by the stride-2 max-pooling layer, a 32x32 image is already reduced to 8x8 before the first residual block, so fine detail is lost early. Switching to a 3x3 kernel (with stride 1 and padding 1) preserves the input resolution in the first layer.
- The CIFAR-10 dataset has 10 classes, so the `out_features` parameter of the final fully connected layer must be changed to 10: `(fc): Linear(in_features=512, out_features=10, bias=True)`.