# Transfer Learning

Transfer learning means taking a model that was pretrained on a problem similar to ours and fine-tuning it on our own data.

Let's see if transfer learning performs better than the models we built earlier for Food101.

In [62]:
# Device agnostic code
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

## Setup

We've written some modules in the *modular* directory. Let's leverage them.

In [63]:
# For this notebook to run with updated APIs, we need torch 1.12+ and torchvision 0.13+
def version_tuple(version):
    # "2.0.1+cu118" -> (2, 0); comparing (major, minor) also accepts torch 2.x
    return tuple(int(part) for part in version.split(".")[:2])

try:
    import torch
    import torchvision
    assert version_tuple(torch.__version__) >= (1, 12), "torch version should be 1.12+"
    assert version_tuple(torchvision.__version__) >= (0, 13), "torchvision version should be 0.13+"
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")
except (ImportError, AssertionError):
    print("[INFO] torch/torchvision versions not as required, installing updated versions.")
    !pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
    import torch
    import torchvision
    print(f"torch version: {torch.__version__}")
    print(f"torchvision version: {torchvision.__version__}")

torch version: 2.0.1+cu118
torchvision version: 0.15.2+cu118


In [64]:
# Continue with regular imports
import matplotlib.pyplot as plt
import torch
import torchvision

from torch import nn
from torchvision import transforms

# Try to get torchinfo, install it if it isn't available
try:
    from torchinfo import summary
except ImportError:
    print("[INFO] Couldn't find torchinfo... installing it.")
    !pip install -q torchinfo
    from torchinfo import summary

# Try to import the modular package, download it from GitHub if it isn't available
try:
    from modular.src.data import data_setup
    from modular.src.data import get_data
except ImportError:
    # Get the modular scripts
    print("[INFO] Couldn't find modular scripts... downloading them from GitHub.")
    !git clone https://github.com/JpChii/pytorch.git
    !mv pytorch/modular .
    !rm -rf pytorch
    from modular.src.data import data_setup
    from modular.src.data import get_data

## Get data

Let's download the data before we start transfer learning.

In [65]:
images_dir = get_data.get_data(
    request_url="https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip",
    data_path="data/"
)

Image download directory: data/pizza_steak_sushi
Zip path: data/pizza_steak_sushi.zip
data directory exists
data/pizza_steak_sushi directory exists
Downloading data fromhttps://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip...
Unzip data
data/pizza_steak_sushi.zip cleand after unzip


In [66]:
# Setup train and test directories
from pathlib import Path
train_dir = Path(f"data/{images_dir}/train")
test_dir = Path(f"data/{images_dir}/test")
train_dir, test_dir

(PosixPath('data/pizza_steak_sushi/train'),
 PosixPath('data/pizza_steak_sushi/test'))

## Creating Datasets and DataLoaders

We'll use our modular imports to create datasets and dataloaders.

> Note: As of torchvision v0.13+, there's an update to how transforms can be created with `torchvision.models`. The previous method is manual and the current method is automatic.

When using pretrained models, we have to make sure our custom data is in the same form as the original data used to train the pretrained model.

Prior to torchvision v0.13, the transform for a pretrained model in `torchvision.models` had to be created by hand. The documentation stated:

> All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

> The images have to be loaded into a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
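
To make the normalization step concrete, here is a minimal sketch of the per-channel arithmetic, `(pixel - mean) / std`, applied to a made-up mid-gray image. The stand-in image is an assumption for illustration only; the mean and std values are the ones quoted above.

```python
import torch

# ImageNet channel statistics quoted in the torchvision documentation.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Stand-in image (assumption): every pixel is 0.5, already scaled to [0, 1].
image = torch.full((3, 224, 224), 0.5)

# Per-channel normalization, broadcast over height and width.
normalized = (image - mean) / std

print(normalized.shape)
print(normalized[:, 0, 0])  # one value per channel
```

This is the same arithmetic that `transforms.Normalize` performs on a tensor image.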

### Manual transform

In [67]:
# Creating manual transform
manual_transforms = transforms.Compose([
    transforms.Resize(size=(224, 224)), # Resize images to 224, 224
    transforms.ToTensor(), # Convert to tensor and between [0, 1]
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    )
])
manual_transforms

Compose(
    Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=warn)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)

In [68]:
# Let's create the dataloaders
BATCH_SIZE = 32
train_dataloader, test_dataloader, class_names = data_setup.create_dataloders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=manual_transforms,
    batch_size=BATCH_SIZE
)
train_dataloader, test_dataloader, class_names

(<torch.utils.data.dataloader.DataLoader at 0x7fa258bdf4f0>,
 <torch.utils.data.dataloader.DataLoader at 0x7fa258bde320>,
 ['pizza', 'steak', 'sushi'])

### Auto transform

In [69]:
# Let's load the weights
# Assume we want to use EfficientNet_B0 with its best available weights (DEFAULT)
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT

In [70]:
# We can get the transform from weights
auto_transforms = weights.transforms()
auto_transforms

ImageClassification(
    crop_size=[224]
    resize_size=[256]
    mean=[0.485, 0.456, 0.406]
    std=[0.229, 0.224, 0.225]
    interpolation=InterpolationMode.BICUBIC
)

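Notice how `auto_transforms` is similar to `manual_transforms`: both normalize with the same mean and std. The differences are that `auto_transforms` came with the weights we chose and resizes to 256 with bicubic interpolation before cropping to 224, whereas `manual_transforms` was created by hand and resizes straight to 224 x 224.

With `auto_transforms` we can ensure that our data goes through the same transformation as the original data the pretrained model was trained on, but it offers less room for customization.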

In [71]:
# Let's create the dataloaders with auto transforms
train_dataloader, test_dataloader, class_names = data_setup.create_dataloders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=auto_transforms,
    batch_size=BATCH_SIZE
)
train_dataloader, test_dataloader, class_names

(<torch.utils.data.dataloader.DataLoader at 0x7fa258bdcb80>,
 <torch.utils.data.dataloader.DataLoader at 0x7fa258bdf9a0>,
 ['pizza', 'steak', 'sushi'])

## Getting a pretrained model

There are lots of versions of a single pretrained architecture. The suffix number runs from smallest to largest (for EfficientNet, B0 is the smallest). Larger versions generally perform better but need more compute.

The model selection depends on **performance vs speed vs size**.

For this problem we'll use EfficientNet_B0.

In [72]:
weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT
model = torchvision.models.efficientnet_b0(weights=weights).to(device)

`efficientnet_b0` comes in three parts:

1. `features`: A collection of convolutional layers and various activation layers that learn a base representation of vision data (this base representation/collection of layers is often referred to as features or a feature extractor).
2. `avgpool`: Takes the average of the output of the `features` layer(s) and turns it into a **feature vector**.
3. `classifier`: Turns the feature vector into a vector with the same dimensionality as the number of required output classes.

### Model summary

In [73]:
summary(
    model=model,
    input_size=(32, 3, 224, 224),
    col_names=["input_size", "output_size", "num_params", "trainable"],
    col_width=20,
    row_settings=["var_names"]
)

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 1000]           --                   True
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   True
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   True
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   864                  True
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   64                   True
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 16, 112

That's a big model with lots of parameters (pretrained weights) to recognize different patterns in our data.

TinyVGG has 8,083 parameters and EffNetB0 has 5,288,548, roughly 654x more.
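
Parameter counts like these can be obtained by summing `numel()` over a model's parameters. The sketch below shows the idea on a tiny made-up network (a hypothetical stand-in, not TinyVGG) and then works out the ratio from the two counts quoted above.

```python
from torch import nn


def count_parameters(model: nn.Module) -> int:
    """Total number of scalar parameters in a model."""
    return sum(p.numel() for p in model.parameters())


# Hypothetical tiny network: (10*5 + 5) + (5*2 + 2) = 67 parameters.
tiny = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
print(count_parameters(tiny))

# Counts quoted in the text.
tinyvgg_params = 8_083
effnet_b0_params = 5_288_548
ratio = effnet_b0_params / tinyvgg_params
print(f"EffNetB0 has about {ratio:.0f}x the parameters of TinyVGG")
```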

Will this have better performance?

### Freezing the base model and changing the output layer to suit our needs

The process of transfer learning usually goes: freeze some base layers of a pretrained model (typically the `features` section) and then adjust the output layers (also called head/classifier layers) to suit the problem's needs.

In our case:
* The original `torchvision.models.efficientnet_b0()` comes with `out_features=1000` because there are 1000 classes in ImageNet, the dataset it was trained on. For our current problem, we have only three classes.

***Freeze***: retain the weights of the feature layers so that the patterns they learned from the original dataset (ImageNet for `efficientnet_b0`) serve as the backbone for our problem.

Let's freeze the features and customize the classifier.

#### Freeze `features` section

PyTorch computes gradients only for parameters that have `requires_grad=True`. To freeze a parameter, set `requires_grad=False`.

In [74]:
# Freeze all layers (the classifier is replaced further down, so the new head will be trainable)
for param in model.parameters():
    param.requires_grad = False

In [75]:
summary(
    model=model,
    input_size=(32, 3, 224, 224),
    col_names=["input_size", "output_size", "num_params", "trainable"],
    col_width=20,
    row_settings=["var_names"]
)

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 1000]           --                   False
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 16

Notice that the Trainable column has turned to `False` for every layer in the model summary.
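
To see what freezing does during backpropagation, here is a minimal sketch with two small linear layers standing in for a pretrained `features` block and a new head. Both layers and the random input are hypothetical; only the `requires_grad` mechanism is the point.

```python
import torch
from torch import nn

torch.manual_seed(42)

features = nn.Linear(8, 4)     # plays the role of the pretrained feature extractor
classifier = nn.Linear(4, 3)   # plays the role of the trainable head

# Freeze the stand-in feature extractor.
for param in features.parameters():
    param.requires_grad = False

x = torch.randn(5, 8)
loss = classifier(features(x)).sum()
loss.backward()

print(features.weight.grad)            # None: no gradient was computed for frozen weights
print(classifier.weight.grad.shape)    # the head still receives gradients
```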

#### Modifying classifier

In [76]:
# Current classifier
model.classifier

Sequential(
  (0): Dropout(p=0.2, inplace=True)
  (1): Linear(in_features=1280, out_features=1000, bias=True)
)

* Let's retain the dropout regularization to reduce overfitting.

* Dropout randomly zeroes a fraction `p` of the activations during training, so the network can't rely on any single path.

* In effect, training with dropout resembles training many smaller sub-networks at once.

* `in_features` will remain `1280`, since that is the size of the feature vector coming from `features`, and `out_features` changes to `3`.
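
The behavior of dropout is easy to check on a tensor of ones. In this small sketch (the input tensor is made up), `nn.Dropout(p=0.2)` zeroes some elements in training mode and scales the survivors by `1 / (1 - p)`, while in evaluation mode it passes the input through unchanged.

```python
import torch
from torch import nn

torch.manual_seed(42)

dropout = nn.Dropout(p=0.2)
x = torch.ones(1, 1000)

# Training mode: elements are zeroed at random, survivors are scaled by 1 / (1 - 0.2) = 1.25.
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op.
dropout.eval()
eval_out = dropout(x)

print("zeroed in train mode:", int((train_out == 0).sum()))
print("unchanged in eval mode:", torch.equal(eval_out, x))
```

This is why the model should be switched to `eval()` mode before making predictions.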

In [77]:
torch.manual_seed(42)
torch.cuda.manual_seed(42)

# Get the length of class names (one output for each class)
output_shape = len(class_names)

# Recreate the classifier layer and send it to the target device
model.classifier = nn.Sequential(
    nn.Dropout(
        p=0.2,
        inplace=True
    ),
    nn.Linear(
        in_features=1280,
        out_features=output_shape,  # 3 for pizza, steak, sushi
        bias=True
    ),
).to(device)

In [78]:
# Let's check the model summary
summary(
    model=model,
    input_size=(32, 3, 224, 224),
    col_names=["input_size", "output_size", "num_params", "trainable"],
    col_width=20,
    row_settings=["var_names"]
)

Layer (type (var_name))                                      Input Shape          Output Shape         Param #              Trainable
EfficientNet (EfficientNet)                                  [32, 3, 224, 224]    [32, 3]              --                   Partial
├─Sequential (features)                                      [32, 3, 224, 224]    [32, 1280, 7, 7]     --                   False
│    └─Conv2dNormActivation (0)                              [32, 3, 224, 224]    [32, 32, 112, 112]   --                   False
│    │    └─Conv2d (0)                                       [32, 3, 224, 224]    [32, 32, 112, 112]   (864)                False
│    │    └─BatchNorm2d (1)                                  [32, 32, 112, 112]   [32, 32, 112, 112]   (64)                 False
│    │    └─SiLU (2)                                         [32, 32, 112, 112]   [32, 32, 112, 112]   --                   --
│    └─Sequential (1)                                        [32, 32, 112, 112]   [32, 

Takeaways from the summary:
* The classifier is trainable and the other layers are not
* Output shape of the classifier: (32, 1000) --> (32, 3)
* Fewer trainable parameters
* Fewer trainable parameters means less compute🔥
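
The drop in trainable parameters can be worked out from the linear layer alone: a `Linear(in, out)` layer has `in * out` weights plus `out` biases. The sketch below compares a 1000-class head with a 3-class head, using freshly created layers as stand-ins for the real ones.

```python
from torch import nn

original_head = nn.Linear(in_features=1280, out_features=1000)
new_head = nn.Linear(in_features=1280, out_features=3)

original_count = sum(p.numel() for p in original_head.parameters())
new_count = sum(p.numel() for p in new_head.parameters())

print(original_count)  # 1280 * 1000 + 1000
print(new_count)       # 1280 * 3 + 3
```

With everything else frozen, only the new head's parameters are updated during training.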

## Train model

In [79]:
# loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
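
As a sanity check on how a frozen model trains, this sketch runs one optimizer step on a small stand-in model (hypothetical layers and random data, not EfficientNet). It hands the optimizer only the trainable parameters to make the intent explicit. Passing `model.parameters()` also works, because parameters without gradients are skipped during the update.

```python
import torch
from torch import nn

torch.manual_seed(42)

backbone = nn.Linear(8, 4)   # hypothetical frozen feature extractor
head = nn.Linear(4, 3)       # hypothetical trainable classifier
model = nn.Sequential(backbone, nn.ReLU(), head)

for param in backbone.parameters():
    param.requires_grad = False

loss_fn = nn.CrossEntropyLoss()
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable_params, lr=0.001)

# Snapshots to compare against after the update.
backbone_before = backbone.weight.clone()
head_bias_before = head.bias.clone()

x = torch.randn(16, 8)            # random inputs
y = torch.randint(0, 3, (16,))    # random labels for 3 classes

model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

print("backbone unchanged:", torch.equal(backbone.weight, backbone_before))
print("head updated:", not torch.equal(head.bias, head_bias_before))
```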

In [80]:
!pip install timer

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
