## Pretrained models

In addition to building neural network models from scratch, PyTorch provides access to a variety of pretrained models through the `torchvision` library. These models have been trained on large datasets and can be fine-tuned for specific tasks, saving significant time and computational resources. This example demonstrates how to use a pretrained ResNet-18 model and adapt it for a custom classification task with 10 classes.

### Steps Involved:

1. **Importing Pretrained Models**:
   - The code begins by importing the `models` package from `torchvision`, which contains a collection of pretrained architectures, and `ResNet18_Weights`, the enum that identifies the pretrained weights available for ResNet-18.

2. **Loading a Pretrained Model**:
   - `model = models.resnet18(weights=ResNet18_Weights.DEFAULT)` loads the ResNet-18 model with its default pretrained weights.
   - For ResNet-18, the default weights (`IMAGENET1K_V1`) were trained on ImageNet, so the model has already learned general-purpose visual features.

3. **Modifying the Final Layer**:
   - The convolutional layers of a pretrained model can serve as a feature extractor, but the final fully connected layer is tied to the original 1000 ImageNet classes. It is replaced so that its output matches the number of classes in the new task (10 in this case).
   - `num_ftrs = model.fc.in_features` retrieves the number of input features to the final fully connected layer.
   - `model.fc = torch.nn.Linear(num_ftrs, 10)` replaces the original final layer with a new `Linear` layer that has 10 output features, one per class in the new task.

4. **Device Handling** (not shown in the cells below):
   - To use a GPU when one is available, select a device with `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")`.
   - `model = model.to(device)` moves the model's parameters to the selected device, so computations run on the GPU if one is present, which speeds up training and inference. Input tensors must be moved to the same device before they are passed to the model.

### Benefits of Using Pretrained Models:

- **Reduced Training Time**: Pretrained models have already learned a wide range of features, so often only the final layers need to be trained, or the whole network needs only brief fine-tuning, which significantly reduces training time.
- **Improved Performance**: Starting with a model that already has learned representations can lead to better performance, especially with limited training data.
- **Flexibility**: Pretrained models can be adapted to various tasks by simply modifying the output layers, making them versatile tools in deep learning.

### Step 1: load a pretrained model


```python
from torchvision import models  # https://pytorch.org/vision/stable/models.html
from torchvision.models.resnet import ResNet18_Weights

# Load the pretrained ResNet18 model
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)

print(model)
```

```
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  ...
```
  

### Note
By printing out the model, you may notice that the structure resembles a dictionary, with keys representing the names and the corresponding values being the layers. In PyTorch, you can iterate over these names and layers with the `named_children()` method, which yields `(name, module)` pairs for the model's direct children.

```python
for name, layer in model.named_children():
    print(name, type(layer))
```

```
conv1 <class 'torch.nn.modules.conv.Conv2d'>
bn1 <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
relu <class 'torch.nn.modules.activation.ReLU'>
maxpool <class 'torch.nn.modules.pooling.MaxPool2d'>
layer1 <class 'torch.nn.modules.container.Sequential'>
layer2 <class 'torch.nn.modules.container.Sequential'>
layer3 <class 'torch.nn.modules.container.Sequential'>
layer4 <class 'torch.nn.modules.container.Sequential'>
avgpool <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>
fc <class 'torch.nn.modules.linear.Linear'>
```


### Note
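Most entries in this list are single layers that each perform one specific operation. When we don't want to register and name every operation individually at the top level, we can group layers in a `Sequential` container, which applies them in order as a single module. For example, `layer1` is a `Sequential` holding two `BasicBlock` modules (indexed `0` and `1` in the printout), and each `BasicBlock` in turn contains its own convolution, batch-normalization, and ReLU layers. This keeps the model definition concise and organized.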
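As a small illustration of the idea, the sketch below groups a convolution, batch normalization, and ReLU into one `nn.Sequential`. It is a simplified stack chosen for this example, not torchvision's `BasicBlock`, which additionally adds a residual (skip) connection.

```python
import torch
from torch import nn

# Three layers applied in order, exposed as a single module
stack = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 64, 64, 64)
y = stack(x)      # conv -> batch norm -> ReLU
print(y.shape)    # torch.Size([1, 64, 64, 64])
print(stack[0])   # layers are indexed by position, just like model.layer1[0]
```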

Next, let's pass a random sample input through the model to see what the output looks like.

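```python
import torch

# A random batch containing one 3-channel 256x256 "image"
sample_input = torch.randn((1, 3, 256, 256))

model.eval()  # inference mode: batch norm uses its stored statistics and does not update them
with torch.no_grad():  # gradients are not needed just to inspect the output
    output = model(sample_input)

print(f'output shape: {output.shape}')
```

```
output shape: torch.Size([1, 1000])
```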


### Adapt the Output to Our Task

As you can see, the output shape is \(1 \times 1000\): one score for each of the 1000 ImageNet classes the model was pretrained on. To fine-tune the model for our task, which has 10 classes, we need to adapt its output.

In the full printout of the model (truncated above), the last layer, `fc`, is `Linear(in_features=512, out_features=1000, bias=True)`. This is where the 1000-class output is generated, so we can replace this layer with a new `fc` layer that has `out_features=10`.

```python
# Modify the final layer to match the number of classes
num_ftrs = model.fc.in_features
model.fc = torch.nn.Linear(num_ftrs, 10)

# Print the model again to observe the new `fc` layer
print(model)
```

```
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  ...
```
  

### Note

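The new `fc` layer has only 10 `out_features` (at the end of the full printout it appears as `Linear(in_features=512, out_features=10, bias=True)`). Its weights are randomly initialized, so its outputs carry no meaning until the model is fine-tuned. Let's pass the sample tensor through the model again to see how the output changes.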

```python
output = model(sample_input)

print(f'output shape: {output.shape}')
```

```
output shape: torch.Size([1, 10])
```
