# Identify land use from satellite images using Convolutional Neural Networks - Part 3
## Transfer Learning using **ResNet-18**

In this notebook, we explore the advantages of **transfer learning** for the land use classification task. Transfer learning is a technique in which a model developed for one task is reused as the starting point for a model on a second task. In our case, we use the pre-trained `ResNet-18` architecture, a widely used CNN model. For image classification tasks, pre-trained weights are usually obtained by training on [ImageNet](https://en.wikipedia.org/wiki/ImageNet), a large and diverse dataset containing millions of images across thousands of categories.

We will leverage `torchvision.models`, a module within PyTorch's `torchvision` library, to readily obtain pre-trained models and modify them for our purposes. This greatly simplifies the process of applying transfer learning techniques.


<img src="https://www.researchgate.net/profile/Sajid-Iqbal-13/publication/336642248/figure/fig1/AS:839151377203201@1577080687133/Original-ResNet-18-Architecture_W640.jpg" width="1200" alt="The original ResNet-18 architecture">


The walkthrough part of this notebook will show you some code snippets on how to implement the two main approaches for transfer learning:

1. **Fine-tuning all layers**: This approach begins with loading the weights of ResNet18 pre-trained on ImageNet. By fine-tuning all the layers of the architecture, we allow the model to adjust its learned features more precisely to our specific task of land use classification. This method is highly effective when your dataset has some similarities to the original training dataset but still differs in significant ways. Fine-tuning the entire network can lead to better performance as it refines the pre-learned features to be more relevant to the new task. However, this approach requires careful management of the learning rate and other training parameters to ensure that the pre-trained weights are modified appropriately and do not lead to overfitting on the new dataset.

2. **Freezing the feature extraction layers and only fine-tuning the classifier**: In this method, we use the architecture of ResNet18 with its pre-trained weights but *freeze* the feature extraction layers. This means that the weights in these layers, which have already learned to extract general features from a broad dataset, remain unchanged. Only the final classification layers of the network are trained to adapt to the specific land use classification task. This approach is particularly useful when the new dataset is quite similar to the one used for pre-training, or when the availability of data for the new task is limited. By keeping the pre-trained features fixed, the network leverages its prior knowledge, which can lead to faster convergence and less requirement for computational resources. It's a balance between utilizing the strength of the pre-trained model and customizing it to fit the specific nuances of the new dataset.

Alternatively, we could also **use the architecture without pre-trained weights**, that is, initialize the network with random weights. This option is usually chosen when our dataset is significantly different from the dataset used in the original training (e.g., ImageNet), but we still deem the architecture itself a good fit for the problem, at least for initial investigations.

The walkthrough will show you how to use `torchvision.models` to load a `ResNet-18` architecture, with and without pre-trained weights, and how to freeze the feature extraction layers. After the walkthrough, you'll be asked to implement the two transfer learning strategies for our case study on land use identification and to compare their effectiveness, both with each other and with the models you trained in Part 1 and Part 2 of this case study.

## Walkthrough

In [None]:
import torch
import torch.nn as nn
from torchvision import models
from torchsummary import summary

In [None]:
# Check if CUDA is available, otherwise use CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

Using device: cuda


### Using `ResNet-18` without Pre-trained Weights

To initialize the `ResNet-18` model without pre-trained weights, we call `models.resnet18(weights=None)`. The last layer (the fully connected layer) is responsible for classification. We need to find out the number of features extracted by `ResNet-18`, which is essential for adapting this layer to our specific task. This is obtained from `resnet18.fc.in_features`. For our use case of land use identification, we modify the fully connected layer to predict 21 classes (the number of land use categories in our study). This is done by setting `resnet18.fc` to a new `nn.Linear` layer with `num_ftrs` input features and `num_classes` output features.

In [None]:
# Load the ResNet-18 model
resnet18 = models.resnet18(weights=None)  # This ensures we do not load the pretrained weights

# We need to know the number of features in the last layer of the pretrained model to adapt our classifier.
# These are equal to the number of inputs of the "fully connected" (fc) head of the ResNet-18.
num_ftrs = resnet18.fc.in_features
print(f"The number of features extracted by ResNet-18 are: {num_ftrs}")

# Check the number of output classes of the original ResNet-18
num_outputs = resnet18.fc.out_features
print(f"The original ResNet-18 predicts {num_outputs} classes")

# Modify the fully connected layer to match the number of classes for the land use identification case study
num_classes = 21
resnet18.fc = nn.Linear(num_ftrs, num_classes)
num_outputs = resnet18.fc.out_features
print(f"The modified ResNet-18 predicts {num_outputs} classes")

The number of features extracted by ResNet-18 are: 512
The original ResNet-18 predicts 1000 classes
The modified ResNet-18 predicts 21 classes


After the modifications, we move the model to the specified device (e.g., GPU or CPU) and print a summary for a given input size. This summary helps in understanding the model’s architecture and the impact of our modifications.
> Note: Historically, architectures pre-trained on ImageNet are trained on 224x224 RGB input images.


In [None]:
# Create the final model and print the summary
model = resnet18.to(device)
summary(model, input_size=(3, 224, 224))  # Replace with your input size

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,

Finally, we iterate through all the parameters of the model and print their `requires_grad` attribute, which tells us whether gradients are computed for a parameter during backpropagation. `requires_grad = True` indicates that the training process will update that parameter tensor. We can see here that all parameters will be updated, as expected.


In [None]:
# Check which parameters are trainable (nothing is frozen here)
for param in resnet18.parameters():
    print(param.requires_grad)

True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True


### Fine-tuning the entire `ResNet-18` with Pre-Trained Weights


To use the `ResNet-18` model with pre-trained weights, we first import the weights enum into our Python environment and then pass the chosen weights when instantiating the `ResNet-18` architecture. `ResNet18_Weights.IMAGENET1K_V1` denotes the first version of the weights that torchvision provides for `ResNet-18` trained on the ImageNet-1K dataset, which contains over a million images across 1000 different classes.


In [None]:
from torchvision.models import ResNet18_Weights

# Load the ResNet-18 model
resnet18 = models.resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)  # This ensures we load the pretrained weights

# We need to know the number of features in the last layer of the pretrained model to adapt our classifier.
# These are equal to the number of inputs of the "fully connected" (fc) head of the ResNet-18.
num_ftrs = resnet18.fc.in_features
print(f"The number of features extracted by ResNet-18 are: {num_ftrs}")

# Check the number of output classes of the original ResNet-18
num_outputs = resnet18.fc.out_features
print(f"The original ResNet-18 predicts {num_outputs} classes")

# Modify the fully connected layer to match the number of classes for the land use identification case study
num_classes = 21
resnet18.fc = nn.Linear(num_ftrs, num_classes)
num_outputs = resnet18.fc.out_features
print(f"The modified ResNet-18 predicts {num_outputs} classes")

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 96.1MB/s]


The number of features extracted by ResNet-18 are: 512
The original ResNet-18 predicts 1000 classes
The modified ResNet-18 predicts 21 classes


The rest is unchanged.

In [None]:
# Create the final model and print the summary
model = resnet18.to(device)
summary(model, input_size=(3, 224, 224))  # Replace with your input size

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,

In [None]:
# Check which parameters are trainable (nothing is frozen here)
for param in resnet18.parameters():
    print(param.requires_grad)

True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True


### Freezing the feature extraction layers to only fine-tune the classifier

To freeze all layers of the ResNet model, we disable gradient computation for all parameters by setting `requires_grad` to `False` on each of them. To ensure that we can still fine-tune the `resnet18.fc` fully connected classifier, we can either replace it after freezing the other layers (a newly created `nn.Linear` layer has `requires_grad=True` by default), or set `requires_grad=True` afterwards only for this specific layer (as done below).

In [None]:
# Load the ResNet-18 model
resnet18 = models.resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)  # This ensures we load the pretrained weights

# We need to know the number of features in the last layer of the pretrained model to adapt our classifier.
# These are equal to the number of inputs of the "fully connected" (fc) head of the ResNet-18.
num_ftrs = resnet18.fc.in_features

# Modify the fully connected layer to match the number of classes for the land use identification case study
num_classes = 21
resnet18.fc = nn.Linear(num_ftrs, num_classes)
num_outputs = resnet18.fc.out_features
print(f"The modified ResNet-18 predicts {num_outputs} classes")

# Freeze all the layers
for param in resnet18.parameters():
    param.requires_grad = False

# Unfreeze classifier
for param in resnet18.fc.parameters():
    param.requires_grad = True

# Check that only the last two parameter tensors are not frozen (weight and bias of the fc classifier)
for param in resnet18.parameters():
    print(param.requires_grad)

The modified ResNet-18 predicts 21 classes
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
False
True
True


### Data Normalization to use Pre-trained Model

When fine-tuning the pre-trained architecture on your dataset, it is crucial to apply the same data normalization that was used during the initial training of the network. For models trained on the ImageNet dataset, which includes the standard `ResNet-18` used here, the common practice is to normalize the input data using the *mean* and *standard deviation* of the ImageNet dataset. These values are:

* Mean: [0.485, 0.456, 0.406] for the RGB channels, respectively;
* Standard Deviation: [0.229, 0.224, 0.225] for the RGB channels, respectively.


You can implement this normalization in `PyTorch` as part of the data preprocessing pipeline, using `transforms`:

```
from torchvision import transforms

# Define the transformation; the "..." entries stand for your other preprocessing steps
transform = transforms.Compose([
    ...
    ...
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```

Also, remember to make sure your input images are 224x224-pixel RGB images.

## Assignments

The instructions below describe the assignments for this notebook. Carrying them out will improve your understanding of the benefits of transfer learning and how to implement it in practice. Work with your fellow classmates to speed up implementation and facilitate discussion. Start from the first exercise and proceed sequentially.

You are tasked to:

1. **Implement the Training/Validation/Testing pipeline**:
  - You should be able to copy/paste and adapt what you have done in the previous notebooks, including data loading and dataset creation.
  - The "training" step is now carried out by fine-tuning a pre-trained `ResNet18`.
  - Make sure your pipeline includes normalization/resizing to account for ImageNet pre-training (e.g., mean/std and 224x224 input images).

2. **Fine-Tuning ResNet-18 in both modalities**:
  - Fine-tuning all layers.
  - Fine-tuning the classifier (fully connected) layer alone.
  - Analyze and compare the accuracy, loss, and convergence time of each model.
  - Discuss the benefits and drawbacks of each strategy.

3. **Comparison with Basic CNN**:
  - Use the performance metrics of the basic CNN model you have developed previously as a baseline.
  - Compare this baseline with the performances of the fine-tuned ResNet-18 models.
  - Analyze which model performs better and under what circumstances.

4. **Investigating Hyper-parameters**:
  - Experiment with different learning rates and fine-tuning dataset sizes.
  - Observe and record how these changes affect the model's performance and training efficiency.
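
For the experiments on fine-tuning dataset size, one simple option is to train on random subsets of the training set. The sketch below uses `torch.utils.data.Subset`; the helper `make_training_subset`, the fractions, and the toy `TensorDataset` are illustrative assumptions and not part of the case study code. The sampling is not stratified, so class proportions may vary slightly between subsets.

```python
import torch
from torch.utils.data import Subset, TensorDataset


def make_training_subset(dataset, fraction, seed=0):
    """Return a random subset containing roughly `fraction` of the dataset."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # A seeded generator makes the selection reproducible across runs
    generator = torch.Generator().manual_seed(seed)
    permutation = torch.randperm(len(dataset), generator=generator)
    n_keep = max(1, int(fraction * len(dataset)))
    return Subset(dataset, permutation[:n_keep].tolist())


# Toy stand-in for the real training set: 100 small random "images" with 21 classes
toy_dataset = TensorDataset(torch.randn(100, 3, 8, 8), torch.randint(0, 21, (100,)))

subset_sizes = {f: len(make_training_subset(toy_dataset, f)) for f in (0.25, 0.5, 1.0)}
print(subset_sizes)  # {0.25: 25, 0.5: 50, 1.0: 100}
```

Each subset can be wrapped in a `DataLoader` exactly like the full training set, while the validation and test sets stay unchanged so that results remain comparable.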


