<a href="https://colab.research.google.com/github/Naga-SDonepudi/PyTorch_HandsOn/blob/main/7_transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transfer Learning
* Transfer learning takes a pre-trained model (whose parameters were learned on one dataset) and adapts it to another dataset with some fine-tuning.

In [1]:
import torch
import torchvision

In [2]:
!pip install -q torchinfo

In [3]:
from torch import nn
from torchvision import transforms
from torchinfo import summary

# Importing modular scripts from github
!git clone https://github.com/Naga-SDonepudi/PyTorch_HandsOn.git
!mv PyTorch_HandsOn/modular_scripts .
!rm -rf PyTorch_HandsOn
from modular_scripts import data_setup_script, engine_script


Cloning into 'PyTorch_HandsOn'...
remote: Enumerating objects: 156, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 156 (delta 44), reused 46 (delta 37), pack-reused 101 (from 1)
Receiving objects: 100% (156/156), 29.16 MiB | 12.27 MiB/s, done.
Resolving deltas: 100% (75/75), done.


In [4]:
data_setup_script

<module 'modular_scripts.data_setup_script' from '/content/modular_scripts/data_setup_script.py'>

In [5]:
## Device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cpu'

### 1. Data

In [6]:
import os
import zipfile
from pathlib import Path
import requests

# Data Path
data_path = Path("data/")
image_path = data_path / "pizza_steak_sushi"
image_path.mkdir(parents=True, exist_ok=True)

# Download the zip archive
with open(data_path / "pizza_steak_sushi.zip", "wb") as f:
  request = requests.get("https://github.com/Naga-SDonepudi/PyTorch_HandsOn/raw/refs/heads/main/DataSets%20&%20Images/pizza_steak_sushi.zip")
  print("Downloaded pizza, steak and sushi data from GitHub repository")
  f.write(request.content)

# Extract only after the file has been closed, so all bytes are flushed to disk
with zipfile.ZipFile(data_path / "pizza_steak_sushi.zip", "r") as zip_ref:
  zip_ref.extractall(image_path)

# Remove the archive once its contents are extracted
os.remove(data_path / "pizza_steak_sushi.zip")

Downloaded pizza, steak and sushi data from GitHub repository


In [7]:
## Directory paths
train_dir = image_path / "train"
test_dir = image_path / "test"

train_dir, test_dir

(PosixPath('data/pizza_steak_sushi/train'),
 PosixPath('data/pizza_steak_sushi/test'))

### 2. Datasets and Dataloaders


In [8]:
from modular_scripts import data_setup_script

In [9]:
## Creating a transform automatically from the pretrained weights
from torchvision import models

In [10]:
dir(models)

['AlexNet',
 'AlexNet_Weights',
 'ConvNeXt',
 'ConvNeXt_Base_Weights',
 'ConvNeXt_Large_Weights',
 'ConvNeXt_Small_Weights',
 'ConvNeXt_Tiny_Weights',
 'DenseNet',
 'DenseNet121_Weights',
 'DenseNet161_Weights',
 'DenseNet169_Weights',
 'DenseNet201_Weights',
 'EfficientNet',
 'EfficientNet_B0_Weights',
 'EfficientNet_B1_Weights',
 'EfficientNet_B2_Weights',
 'EfficientNet_B3_Weights',
 'EfficientNet_B4_Weights',
 'EfficientNet_B5_Weights',
 'EfficientNet_B6_Weights',
 'EfficientNet_B7_Weights',
 'EfficientNet_V2_L_Weights',
 'EfficientNet_V2_M_Weights',
 'EfficientNet_V2_S_Weights',
 'GoogLeNet',
 'GoogLeNetOutputs',
 'GoogLeNet_Weights',
 'Inception3',
 'InceptionOutputs',
 'Inception_V3_Weights',
 'MNASNet',
 'MNASNet0_5_Weights',
 'MNASNet0_75_Weights',
 'MNASNet1_0_Weights',
 'MNASNet1_3_Weights',
 'MaxVit',
 'MaxVit_T_Weights',
 'MobileNetV2',
 'MobileNetV3',
 'MobileNet_V2_Weights',
 'MobileNet_V3_Large_Weights',
 'MobileNet_V3_Small_Weights',
 'RegNet',
 'RegNet_X_16GF_Weights'

### 2.1 Using auto-created transforms so that our custom data is preprocessed the same way as the pretrained model's training data

In [11]:
## Getting pretrained model weights (DEFAULT is the best available set, here IMAGENET1K_V1, i.e. weights trained on ImageNet-1K)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
weights

EfficientNet_B0_Weights.IMAGENET1K_V1

In [12]:
auto_transforms = weights.transforms()
auto_transforms

ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)

In [13]:
## Dataloaders
train_dataloader, test_dataloader, class_names = data_setup_script.create_dataloaders(train_dir = train_dir,
                                                                                      test_dir = test_dir,
                                                                                      transform = auto_transforms,
                                                                                      batch_size = 32)
train_dataloader, test_dataloader, class_names

(<torch.utils.data.dataloader.DataLoader at 0x7a2db6c55280>,
 <torch.utils.data.dataloader.DataLoader at 0x7a2db6e84e30>,
 ['pizza', 'steak', 'sushi'])

### 3. Loading a pretrained model


Things to consider when choosing a pretrained model for transfer learning:
* Speed, size and performance: how fast the model runs, how big it is, and how well it performs on the current data.
* Based on these trade-offs, EfficientNet-B0 (EffNetB0) is chosen here.

In [14]:
## Instantiating a pretrained model
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights).to(device)
model

Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth


100%|██████████| 20.5M/20.5M [00:00<00:00, 95.7MB/s]


EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [15]:
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

### 3.1 Summary of the model

In [16]:
summary(model=model,
        input_size=(1, 3, 224, 224), ## (batch_size, color_channels, height, width)
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [1, 3, 224, 224]     [1, 1000]            --                   True
├─Sequential (features)                                      [1, 3, 224, 224]     [1, 1280, 7, 7]      --                   True
│    └─Conv2dNormActivation (0)                              [1, 3, 224, 224]     [1, 32, 112, 112]    --                   True
│    │    └─Conv2d (0)                                       [1, 3, 224, 224]     [1, 32, 112, 112]    864                  True
│    │    └─BatchNorm2d (1)                                  [1, 32, 112, 112]    [1, 32, 112, 112]    64                   True
│    │    └─SiLU (2)                                         [1, 32, 112, 112]    [1, 32, 112, 112]    --                   --
│    └─Sequential (1)                                        [1, 32, 112, 112]    [1, 16, 112,

* Here, all the layers are trainable and the model outputs 1000 classes (ImageNet). To use it as a feature extractor, the base layers should be frozen (i.e. trainable=False), leaving only the output classifier layer trainable.
* Only the classifier section is altered: its parameters are trained, and its output shape is changed to match the number of classes in our custom dataset.

### 3.2 Freezing the base model and changing the output layer
* Feature extraction can be done with a pretrained model by freezing the base model (reusing the same learned patterns and weights).
* Only the output layer (classifier) is updated.

In [17]:
for param in model.features.parameters():
  param.requires_grad = False   # Stop PyTorch from tracking gradients for these params, so they are not updated during the optimization step


In [18]:
summary(model=model,
        input_size=(1, 3, 224, 224), ## (batch_size, color_channels, height, width)
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [1, 3, 224, 224]     [1, 1000]            --                   Partial
├─Sequential (features)                                      [1, 3, 224, 224]     [1, 1280, 7, 7]      --                   False
│    └─Conv2dNormActivation (0)                              [1, 3, 224, 224]     [1, 32, 112, 112]    --                   False
│    │    └─Conv2d (0)                                       [1, 3, 224, 224]     [1, 32, 112, 112]    (864)                False
│    │    └─BatchNorm2d (1)                                  [1, 32, 112, 112]    [1, 32, 112, 112]    (64)                 False
│    │    └─SiLU (2)                                         [1, 32, 112, 112]    [1, 32, 112, 112]    --                   --
│    └─Sequential (1)                                        [1, 32, 112, 112]    [1, 1

* Here, the summary clearly shows that all the feature layers are frozen.
* The count of trainable params changed (from 5.2M to 1.28M).
* Only the output classifier layer's params can be updated.
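
The roughly 1.28M trainable parameters that remain are consistent with the classifier's `Linear(in_features=1280, out_features=1000)` layer alone. A small worked calculation, using a hypothetical helper `linear_param_count` (not part of the notebook), shows the arithmetic and what a three-class head for pizza, steak and sushi would need:

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of learnable values in a fully connected layer."""
    weights = in_features * out_features      # one weight per input/output pair
    biases = out_features if bias else 0      # one bias per output unit
    return weights + biases

# Original ImageNet head: 1280 inputs -> 1000 classes
original_head = linear_param_count(1280, 1000)

# Head sized for the 3 classes in this dataset: 1280 inputs -> 3 classes
three_class_head = linear_param_count(1280, 3)

print(f"{original_head:,}")     # 1,281,000
print(f"{three_class_head:,}")  # 3,843
```

The Dropout layer in the classifier has no learnable parameters, so it does not contribute to either count.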

In [19]:
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

In [20]:
## Updating the classifier layer to suit the current problem
from torch import nn
torch.manual_seed(42)
torch.cuda.manual_seed(42)

model.classifier = nn.Sequential(
    nn.Dropout(p=0.2, inplace=True),
    nn.Linear(in_features=1280,
              out_features=len(class_names)) # Making sure it matches the number of classes in our custom dataset, i.e. 3
).to(device)

model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=3, bias=True)
)
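
The freeze-then-replace pattern can be checked end to end on a small model. The sketch below uses a hypothetical stand-in, `TinyNet`, that only mimics the `features` / `classifier` layout of EfficientNet (it is not the real architecture), together with an illustrative helper `count_params`. It runs without downloading any weights.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier split as EfficientNet."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Sequential(nn.Dropout(p=0.2), nn.Linear(8, 1000))

    def forward(self, x):
        return self.classifier(self.features(x))

def count_params(model):
    """Return (total, trainable) parameter counts."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

tiny = TinyNet()
before = count_params(tiny)

# Step 1: freeze the base layers
for param in tiny.features.parameters():
    param.requires_grad = False
frozen = count_params(tiny)

# Step 2: replace the head so it outputs 3 classes (new layers are trainable by default)
tiny.classifier = nn.Sequential(nn.Dropout(p=0.2), nn.Linear(8, 3))
after = count_params(tiny)

# Forward pass on a dummy batch to confirm the new output shape
tiny.eval()
with torch.inference_mode():
    logits = tiny(torch.randn(2, 3, 32, 32))

print(before, frozen, after, logits.shape)
```

The same `count_params` helper can be applied to the EfficientNet `model` above to confirm that only the new classifier remains trainable before an optimizer is created.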