<h1>Transfer Learning</h1>

<h3 style="color: yellow;">This tutorial covers transfer learning and how it can be applied in PyTorch.</h3>

<h3 style="color: yellow;">Transfer learning is an ML approach where a model developed for one task is reused as a starting point for another model designed for a different task.</h3>

<h3 style="color: yellow;">For instance, if we have a model built to classify n classes of vehicles, we can replace its final layer (or add a new layer on top) to build a new model for two-class classification (such as Car/Truck).</h3>

<div style="display: flex; justify-content: center;">
    <img src='translearn.png' width=600>
</div>

<h3 style="color: yellow;">Transfer learning is widely used in ML. Because the pretrained weights are reused, new models can be built quickly, which saves training time and reduces the number of parameters that must be trained for the new task.</h3>



<h1>ResNet-18 fine-tuning on the Hymenoptera dataset</h1>
<h3 style="color: yellow;">This tutorial utilizes the Hymenoptera (insect) dataset, which is a small subset of ImageNet.</h3>

<h3 style="color: yellow;">Our objective is to fine-tune ResNet-18 for binary classification: we adapt a pretrained deep learning model to classify ants and bees.</h3>

<h3 style="color: yellow;">The dataset consists of 244 training images and 153 validation images in total, split across the two classes (ants and bees).</h3>

<h3 style="color: yellow;">Given its size, the dataset is quite limited for training a model from scratch.</h3>

<h3 style="color: yellow;">However, by employing transfer learning, we aim to achieve reasonable generalization.</h3>

In [12]:
# Importing libs
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.nn.functional as F
import torchvision
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt
import os, time, copy

In [13]:
# Select the GPU if one is available, otherwise fall back to the CPU
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [14]:
# ResNet-18 was pretrained on the ImageNet dataset, which has 1000 classes.
# The following are the ImageNet per-channel (RGB) mean and standard deviation
mean=np.array([0.485,0.456,0.406])
std=np.array([0.229,0.224,0.225])
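
<h3 style="color: yellow;">As a rough illustration of what transforms.Normalize(mean,std) does with these statistics, the sketch below applies (value - mean) / std to single RGB pixels in plain Python. The helper name normalize_pixel is hypothetical and only used here; the real transform works on whole tensors.</h3>

```python
# ImageNet per-channel statistics, copied from the cell above
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel already scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# A pixel equal to the channel means is mapped to zero in every channel
centered = normalize_pixel([0.485, 0.456, 0.406])

# A pure white pixel (1.0 in every channel after ToTensor) becomes positive
white = normalize_pixel([1.0, 1.0, 1.0])

print(centered)
print([round(v, 3) for v in white])
```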

In [15]:
# Define the data transforms
data_tansforms={'train':transforms.Compose([transforms.RandomResizedCrop(224),
                                            transforms.RandomHorizontalFlip(),
                                            transforms.ToTensor(),
                                            transforms.Normalize(mean,std)]),
                'val':transforms.Compose([transforms.Resize(256),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize(mean,std)])}

In [16]:
# Define data directory
data_dir='/home/mohanad/learn/Pytorch/14- Transfer Learning/data/hymenoptera_data'

In [17]:
# Define the datasets; we use ImageFolder because the data is organized locally in one folder per class
image_datasets={x:datasets.ImageFolder(os.path.join(data_dir,x),data_tansforms[x]) for x in ['train','val']}


In [23]:
# Define the dataloaders
dataloaders={x:DataLoader(image_datasets[x],batch_size=4,shuffle=True,num_workers=4) for x in ['train','val']}

iter_=iter(dataloaders['train'])
images, labels=next(iter_)
print(f'The images shape is: {images.shape}')
print(f'The labels shape is: {labels.shape}')
print('')

The images shape is: torch.Size([4, 3, 224, 224])
The labels shape is: torch.Size([4])



In [9]:
# Print the size of the datasets
dataset_sizes={x:len(image_datasets[x]) for x in ['train','val']}
print(f'The training set size is: {dataset_sizes["train"]}')
print(f'The validation set size is: {dataset_sizes["val"]}')

The training set size is: 244
The validation set size is: 153
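
<h3 style="color: yellow;">With these sizes and batch_size=4, the number of batches per epoch can be worked out by hand. The sketch below assumes the DataLoader default drop_last=False, so a smaller final batch is kept rather than discarded.</h3>

```python
import math

# Dataset sizes printed above and the batch size used by the dataloaders
dataset_sizes = {"train": 244, "val": 153}
batch_size = 4

# Number of batches per epoch when the last, possibly smaller, batch is kept
batches_per_epoch = {
    phase: math.ceil(n / batch_size) for phase, n in dataset_sizes.items()
}

# Size of the final batch in each phase
last_batch_size = {
    phase: (n % batch_size) or batch_size for phase, n in dataset_sizes.items()
}

print(batches_per_epoch)
print(last_batch_size)
```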


In [9]:
# The dataset object stores the file path and label of each sample
train_dataset=image_datasets['train']
print(f'The path for the first sample is: {train_dataset.samples[0][0]}')
print(f'The class names are: {train_dataset.classes}')

The path for the first sample is: /home/mohanad/learn/Pytorch/14- Transfer Learning/data/hymenoptera_data/train/ants/0013035.jpg
The class names are: ['ants', 'bees']
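
<h3 style="color: yellow;">ImageFolder derives the integer labels from the sorted sub-folder names, which is why 'ants' comes before 'bees'. A minimal plain-Python sketch of that mapping (the variable names are illustrative, not the library internals):</h3>

```python
# Folder names as they might be listed on disk, in arbitrary order
folder_names = ["bees", "ants"]

# Sort the names, then assign consecutive integer labels
class_names = sorted(folder_names)
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

print(class_names)
print(class_to_idx)
```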


In [10]:
def train_model(model,loss,optimizer,scheduler,num_epochs=25):
    start=time.time()
    best_acc=0.0
    # best model weights
    best_model_wts=copy.deepcopy(model.state_dict())
    
    for epoch in range(num_epochs):
        print(f'Epoch:{epoch+1}/{num_epochs}')
        print('-'*10)
        # We have a training and validation phase at each epoch
        for phase in ['train','val']:
            if phase=='train':
                model.train()
            else:
                model.eval()
            
            running_loss=0.0
            running_corrects=0
            
            # Iterate over data
            for images, labels in dataloaders[phase]:
                images=images.to(device)
                labels=labels.to(device)
                # Track gradient history only in the training phase
                with torch.set_grad_enabled(phase=='train'):
                    outputs=model(images)
                    _,prediction=torch.max(outputs,1)
                    loss_=loss(outputs,labels)
                    # Backward + optimize only in the training phase
                    if phase=='train':
                        optimizer.zero_grad()
                        loss_.backward()
                        optimizer.step()
                        
                    # Statistics
                    running_loss+=loss_.item()*images.size(0)
                    running_corrects+=torch.sum(prediction==labels.data)
            
            # scheduler
            if phase=='train':
                scheduler.step()
            
            epoch_loss=running_loss/dataset_sizes[phase]
            epoch_acc=running_corrects.double()/dataset_sizes[phase]
            
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
            # Saving the model with best accuracy on validation
            
            if phase=='val' and epoch_acc>best_acc:
                best_acc=epoch_acc
                best_model_wts=copy.deepcopy(model.state_dict())
        print('')
    time_elapsed=time.time()-start
    print(f'Training complete in {time_elapsed//60:.0f}m :{time_elapsed%60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model
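
<h3 style="color: yellow;">The function above multiplies each batch loss by images.size(0) before summing. The sketch below, with made-up loss values, suggests why: when the last batch is smaller, a size-weighted average differs from a plain average of the batch means.</h3>

```python
# Hypothetical mean losses for three batches; the last batch holds one sample
batch_mean_losses = [0.5, 0.5, 1.0]
batch_sizes = [4, 4, 1]

# Size-weighted accumulation, mirroring running_loss in train_model
running_loss = 0.0
for mean_loss, size in zip(batch_mean_losses, batch_sizes):
    running_loss += mean_loss * size
epoch_loss = running_loss / sum(batch_sizes)

# Plain average of batch means, which over-weights the small last batch
naive_loss = sum(batch_mean_losses) / len(batch_mean_losses)

print(f"weighted: {epoch_loss:.4f}, naive: {naive_loss:.4f}")
```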

In [11]:
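# Transfer learning: start from a ResNet-18 pretrained on ImageNet
# Note: torchvision 0.13+ deprecates `pretrained=True` in favor of
# `weights=models.ResNet18_Weights.IMAGENET1K_V1`
model=models.resnet18(pretrained=True)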



In [12]:
# Print all model parameters

for idx,(name,param) in enumerate(model.named_parameters()):
    print(f"Index: {idx}, Layer Name: {name}, Shape: {param.shape}")

Index: 0, Layer Name: conv1.weight, Shape: torch.Size([64, 3, 7, 7])
Index: 1, Layer Name: bn1.weight, Shape: torch.Size([64])
Index: 2, Layer Name: bn1.bias, Shape: torch.Size([64])
Index: 3, Layer Name: layer1.0.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 4, Layer Name: layer1.0.bn1.weight, Shape: torch.Size([64])
Index: 5, Layer Name: layer1.0.bn1.bias, Shape: torch.Size([64])
Index: 6, Layer Name: layer1.0.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 7, Layer Name: layer1.0.bn2.weight, Shape: torch.Size([64])
Index: 8, Layer Name: layer1.0.bn2.bias, Shape: torch.Size([64])
Index: 9, Layer Name: layer1.1.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 10, Layer Name: layer1.1.bn1.weight, Shape: torch.Size([64])
Index: 11, Layer Name: layer1.1.bn1.bias, Shape: torch.Size([64])
Index: 12, Layer Name: layer1.1.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 13, Layer Name: layer1.1.bn2.weight, Shape: torch.Size([64])
Index: 14, Layer Name: layer1.1.bn

In [14]:
# Iterate over the named modules to print the layers that have weights
for name,module in model.named_modules():
    if hasattr(module,'weight') and hasattr(module.weight, 'shape'):
        print(name)

conv1
bn1
layer1.0.conv1
layer1.0.bn1
layer1.0.conv2
layer1.0.bn2
layer1.1.conv1
layer1.1.bn1
layer1.1.conv2
layer1.1.bn2
layer2.0.conv1
layer2.0.bn1
layer2.0.conv2
layer2.0.bn2
layer2.0.downsample.0
layer2.0.downsample.1
layer2.1.conv1
layer2.1.bn1
layer2.1.conv2
layer2.1.bn2
layer3.0.conv1
layer3.0.bn1
layer3.0.conv2
layer3.0.bn2
layer3.0.downsample.0
layer3.0.downsample.1
layer3.1.conv1
layer3.1.bn1
layer3.1.conv2
layer3.1.bn2
layer4.0.conv1
layer4.0.bn1
layer4.0.conv2
layer4.0.bn2
layer4.0.downsample.0
layer4.0.downsample.1
layer4.1.conv1
layer4.1.bn1
layer4.1.conv2
layer4.1.bn2
fc


In [15]:
# Print only the parameters whose name contains 'conv'
# (this skips the 1x1 convolutions inside the downsample branches, which are named downsample.0)
for idx, (name, param) in enumerate(model.named_parameters()):
    if 'conv' in name:
        print(f"Index: {idx}, Layer Name: {name}, Shape: {param.shape}")

Index: 0, Layer Name: conv1.weight, Shape: torch.Size([64, 3, 7, 7])
Index: 3, Layer Name: layer1.0.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 6, Layer Name: layer1.0.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 9, Layer Name: layer1.1.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 12, Layer Name: layer1.1.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 15, Layer Name: layer2.0.conv1.weight, Shape: torch.Size([128, 64, 3, 3])
Index: 18, Layer Name: layer2.0.conv2.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 24, Layer Name: layer2.1.conv1.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 27, Layer Name: layer2.1.conv2.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 30, Layer Name: layer3.0.conv1.weight, Shape: torch.Size([256, 128, 3, 3])
Index: 33, Layer Name: layer3.0.conv2.weight, Shape: torch.Size([256, 256, 3, 3])
Index: 39, Layer Name: layer3.1.conv1.weight, Shape: torch.Size([256, 256, 3, 3])
Index: 42, Layer Name: layer3.1.conv2.wei

In [16]:
# Back to transfer learning:
# get the number of input features of the last (fully connected) layer
num_features=model.fc.in_features
print(f'The number of input features is: {num_features}')

The number of input features is: 512
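
<h3 style="color: yellow;">Given 512 input features, the size of the original 1000-class head and of a 2-class replacement can be compared with simple arithmetic. The helper linear_param_count is a hypothetical name introduced for this sketch.</h3>

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of parameters in a fully connected layer: weights plus biases."""
    return in_features * out_features + (out_features if bias else 0)


# Original ImageNet head: 512 inputs -> 1000 classes
old_head = linear_param_count(512, 1000)

# Replacement head for ants vs. bees: 512 inputs -> 2 classes
new_head = linear_param_count(512, 2)

print(f"old head: {old_head} parameters, new head: {new_head} parameters")
```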


In [17]:
# Replace the fc layer with a new one that has 2 output features
model.fc=nn.Linear(num_features,2)  # this overrides the old FC layer with a new, randomly initialized one
model.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [21]:
from torchsummary import summary

summary(model, input_size=images[0].shape)  # images[0] is the first image in the batch


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,

In [22]:
EPOCHS=100
# Define the loss function
loss=nn.CrossEntropyLoss()
# Define the optimizer
optimizer=optim.SGD(model.parameters(),lr=0.0001,momentum=0.9)
# Define the scheduler for LR Update
step_lr_scheduler=lr_scheduler.StepLR(optimizer,step_size=10,gamma=0.1)
# Calling the training function
model=train_model(model,loss,optimizer,step_lr_scheduler,num_epochs=EPOCHS)

# The method above is called fine-tuning of all model parameters, where we adjust the entire model's parameters using a lower learning rate (LR).

# Another option is to freeze the parameters of the entire model and fine-tune only the last layer.



Epoch:1/100
----------


train Loss: 0.7676 Acc: 0.6230
val Loss: 0.2142 Acc: 0.9020

Epoch:2/100
----------
train Loss: 0.5112 Acc: 0.7951
val Loss: 0.3833 Acc: 0.8497

Epoch:3/100
----------
train Loss: 0.4796 Acc: 0.7828
val Loss: 0.3724 Acc: 0.8627

Epoch:4/100
----------
train Loss: 0.4580 Acc: 0.8402
val Loss: 0.4522 Acc: 0.8105

Epoch:5/100
----------
train Loss: 0.5138 Acc: 0.7951
val Loss: 0.5013 Acc: 0.8105

Epoch:6/100
----------
train Loss: 0.4753 Acc: 0.8320
val Loss: 0.2687 Acc: 0.9020

Epoch:7/100
----------
train Loss: 0.5137 Acc: 0.8074
val Loss: 0.5365 Acc: 0.8301

Epoch:8/100
----------
train Loss: 0.5418 Acc: 0.7992
val Loss: 0.2300 Acc: 0.9020

Epoch:9/100
----------
train Loss: 0.3707 Acc: 0.8361
val Loss: 0.7169 Acc: 0.7712

Epoch:10/100
----------
train Loss: 0.4914 Acc: 0.8320
val Loss: 0.2983 Acc: 0.8758

Epoch:11/100
----------
train Loss: 0.3601 Acc: 0.8320
val Loss: 0.2763 Acc: 0.9085

Epoch:12/100
----------
train Loss: 0.3416 Acc: 0.8361
val Loss: 0.2701 Acc: 0.8954

Epoch:13/100

<h2> The above implementation fine-tunes the model by retraining all the model weights with a small LR. </h2>
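
<h3 style="color: yellow;">To see how small the LR becomes, the StepLR schedule used above (step_size=10, gamma=0.1) can be sketched as a closed-form function of the epoch index. The function step_lr is a hypothetical helper, not part of PyTorch, and epoch here counts completed scheduler.step() calls.</h3>

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate after `epoch` scheduler steps under a StepLR-style decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Schedule for the full fine-tuning run: lr=0.0001 over 100 epochs
schedule = [step_lr(1e-4, epoch) for epoch in range(100)]

# Print the LR at the start of each 10-epoch stage
for epoch in range(0, 100, 10):
    print(f"epochs {epoch + 1}-{epoch + 10}: lr = {schedule[epoch]:.1e}")
```

<h3 style="color: yellow;">Under this schedule the LR drops by a factor of 10 every 10 epochs, so the later stages of a 100-epoch run change the weights very little.</h3>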

<h2>In what follows, we present an alternative approach: freezing all model weights except for the last layer, which we then fine-tune and retrain. </h2>

In [26]:
model=models.resnet18(pretrained=True) # Weights pretrained on ImageNet
model.to(device)
from torchsummary import summary

summary(model, input_size=images[0].shape)


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,

In [27]:
# Second option: freeze all the layers in the network, then unfreeze only the ones we want to train
for param in model.parameters():
    param.requires_grad=False

In [31]:
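# Optionally, we can unfreeze the last two parameterized layers
# Parameters are typically ordered as (weight, bias) per layer, so 2 layers means unfreezing 4 parameter tensors
# (in ResNet-18 the second-to-last parameterized layer is the batch norm layer4.1.bn2, not a convolution)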
#parameters = list(model.parameters())
#parameters[-4].requires_grad = True  # Unfreeze weights of the second last layer
#parameters[-3].requires_grad = True  # Unfreeze bias of the second last layer
#parameters[-2].requires_grad = True  # Unfreeze weights of the last layer
#parameters[-1].requires_grad = True  # Unfreeze bias of the last layer
    

In [32]:
# Unfreeze the last convolutional layer of the last convolutional block
#model.layer4[1].conv2.weight.requires_grad = True # Note: Not all models have the same architecture or have a bias layer

In [33]:
# Unfreeze the last convolutional layer of each convolutional block
model.layer1[1].conv2.weight.requires_grad = True
model.layer2[1].conv2.weight.requires_grad = True
model.layer3[1].conv2.weight.requires_grad = True
model.layer4[1].conv2.weight.requires_grad = True
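
<h3 style="color: yellow;">A rough count of what remains trainable after this cell can be done from the parameter shapes alone. The sketch assumes the standard ResNet-18 shape [512, 512, 3, 3] for layer4.1.conv2 (truncated in the printed listing), a 2-class fc head as created in a later cell, and a total of 11,177,538 parameters for ResNet-18 with that head; all three are assumptions rather than values printed in this notebook.</h3>

```python
from math import prod

# Shapes of the parameter tensors left trainable (assumed, see the note above)
unfrozen_shapes = {
    "layer1.1.conv2.weight": (64, 64, 3, 3),
    "layer2.1.conv2.weight": (128, 128, 3, 3),
    "layer3.1.conv2.weight": (256, 256, 3, 3),
    "layer4.1.conv2.weight": (512, 512, 3, 3),
    "fc.weight": (2, 512),
    "fc.bias": (2,),
}

# Total number of trainable scalars
trainable = sum(prod(shape) for shape in unfrozen_shapes.values())

# Assumed total for ResNet-18 with a 2-class head
total_params = 11_177_538
fraction = trainable / total_params

print(f"trainable: {trainable} ({fraction:.1%} of all parameters)")
```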

In [35]:

# print only the layers that perform convolution and fc operations
index = 0
for name, param in model.named_parameters():
    if "conv" in name or 'fc' in name:
        print(f"Index: {index}, Layer Name: {name}, Shape: {param.shape}")
        index += 1
print('')        


Index: 0, Layer Name: conv1.weight, Shape: torch.Size([64, 3, 7, 7])
Index: 1, Layer Name: layer1.0.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 2, Layer Name: layer1.0.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 3, Layer Name: layer1.1.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 4, Layer Name: layer1.1.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 5, Layer Name: layer2.0.conv1.weight, Shape: torch.Size([128, 64, 3, 3])
Index: 6, Layer Name: layer2.0.conv2.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 7, Layer Name: layer2.1.conv1.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 8, Layer Name: layer2.1.conv2.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 9, Layer Name: layer3.0.conv1.weight, Shape: torch.Size([256, 128, 3, 3])
Index: 10, Layer Name: layer3.0.conv2.weight, Shape: torch.Size([256, 256, 3, 3])
Index: 11, Layer Name: layer3.1.conv1.weight, Shape: torch.Size([256, 256, 3, 3])
Index: 12, Layer Name: layer3.1.conv2.weight, S

In [36]:

# Get the number of input features of the last layer
num_features=model.fc.in_features # fc is the last layer in the model
print(f'The number of input features is: {num_features}')


# Create a new layer and assign it to the last layer
model.fc=nn.Linear(num_features,2) # new fully connected layer; 2 outputs = number of classes (trainable by default)

model.to(device)


# Define the loss function
loss=nn.CrossEntropyLoss()

# Define the optimizer (frozen parameters receive no gradients, so they are not updated)
optimizer=optim.SGD(model.parameters(),lr=0.001)

NUM_EPOCHS=100

# Scheduler for LR updates
step_lr_scheduler=optim.lr_scheduler.StepLR(optimizer,step_size=10,gamma=0.1) # the LR is multiplied by 0.1 (reduced tenfold) every 10 epochs

# Calling the training function
model=train_model(model,loss,optimizer,step_lr_scheduler,num_epochs=NUM_EPOCHS)

The number of input features is: 512
Epoch:1/100
----------
train Loss: 0.6250 Acc: 0.6230
val Loss: 0.5135 Acc: 0.7778

Epoch:2/100
----------
train Loss: 0.5738 Acc: 0.7254
val Loss: 0.4151 Acc: 0.8562

Epoch:3/100
----------
train Loss: 0.4918 Acc: 0.7951
val Loss: 0.3711 Acc: 0.8693

Epoch:4/100
----------
train Loss: 0.4687 Acc: 0.8197
val Loss: 0.3225 Acc: 0.8954

Epoch:5/100
----------
train Loss: 0.4221 Acc: 0.8279
val Loss: 0.2918 Acc: 0.9150

Epoch:6/100
----------
train Loss: 0.4131 Acc: 0.8074
val Loss: 0.2707 Acc: 0.9150

Epoch:7/100
----------
train Loss: 0.4281 Acc: 0.7910
val Loss: 0.2629 Acc: 0.9281

Epoch:8/100
----------
train Loss: 0.4111 Acc: 0.8197
val Loss: 0.2499 Acc: 0.9085

Epoch:9/100
----------
train Loss: 0.4180 Acc: 0.7910
val Loss: 0.2374 Acc: 0.9216

Epoch:10/100
----------
train Loss: 0.3496 Acc: 0.8484
val Loss: 0.2409 Acc: 0.9281

Epoch:11/100
----------
train Loss: 0.3670 Acc: 0.8279
val Loss: 0.2352 Acc: 0.9281

Epoch:12/100
----------
train Loss: 0

<h1>Fine-tuning VGG-16 on the CIFAR-10 dataset</h1>

In [8]:
# Consider a new project
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
import torchvision
from torchvision import models
import torchvision.datasets as datasets
import torchvision.transforms as transforms

In [9]:
# Define the GPU
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [10]:
# Hyperparameters
N_CHANNELS=3
N_CLASSES=10
LR=1e-4
BATCH_SIZE=8
EPOCHS=20

In [11]:
# Load dataset
train_dataset=datasets.CIFAR10(root='dataset/',train=True,transform=transforms.ToTensor(),download=True)
train_loader=DataLoader(dataset=train_dataset,batch_size=BATCH_SIZE,shuffle=True)

Files already downloaded and verified


In [12]:
# Load pretrained VGG model
model=models.vgg16(pretrained=True)
model.to(device)
print(model)



VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

In [13]:
# The VGG-16 feature extractor contains five max-pooling layers, each halving the spatial size of its input.
# The final layer before the classifier block is an (adaptive) average pooling layer.
# CIFAR-10 images are 32x32, so after five poolings the feature map is already 512x1x1; the avgpool layer is therefore replaced with an identity layer in the next cell.
# The classifier must also be changed to output the 10 CIFAR-10 classes instead of the 1000 ImageNet classes.
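
<h3 style="color: yellow;">The effect of the pooling layers on the feature map size can be checked with a short calculation. The sketch assumes five 2x2 max-pooling layers with stride 2 and 3x3 convolutions with padding 1 that keep the spatial size; spatial_size_after_pools is a hypothetical helper name.</h3>

```python
def spatial_size_after_pools(input_size, num_pools, kernel=2, stride=2):
    """Side length of a square feature map after repeated max-pooling."""
    size = input_size
    for _ in range(num_pools):
        size = (size - kernel) // stride + 1
    return size


# CIFAR-10 images are 32x32, ImageNet crops are 224x224
cifar_size = spatial_size_after_pools(32, 5)
imagenet_size = spatial_size_after_pools(224, 5)

# The last convolutional block outputs 512 channels
cifar_features = 512 * cifar_size * cifar_size
imagenet_features = 512 * imagenet_size * imagenet_size

print(f"CIFAR-10: {cifar_size}x{cifar_size} map, {cifar_features} flattened features")
print(f"ImageNet: {imagenet_size}x{imagenet_size} map, {imagenet_features} flattened features")
```

<h3 style="color: yellow;">This is why a classifier with 512 input features fits CIFAR-10 inputs once the average pooling layer no longer resizes the feature map.</h3>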


In [14]:
# Replace the average pooling layer with an identity (pass-through) layer
class Identity(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self,x):
        return x
model=models.vgg16(pretrained=True)
model.avgpool=Identity()
# If avgpool were an nn.Sequential containing several pooling layers, they could be replaced selectively by index.
# For example, to replace the first one, use: model.avgpool[0] = Identity()
# Similarly, for the second one, use: model.avgpool[1] = Identity(), and so on (in VGG-16, avgpool is a single layer, so direct assignment is used).
model.classifier=nn.Linear(512,10) # replaces the whole classifier block; model.classifier[0] would select only its first layer
model.to(device)
print(model)

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

In [15]:
# To replace layers 1 to 5 of the original (multi-layer) classifier block with Identity layers, iterate through them and assign an Identity layer to each.
# for i in range(1,6):
#     model.classifier[i]=Identity()

# Alternatively, we can replace the classifier block with a Sequential of the desired sub-layers.
# model.classifier=nn.Sequential(nn.Linear(512,100),
#                                nn.Dropout(p=0.8),
#                                nn.Linear(100,10))

# To keep the classifier block simple, we replace it with just one layer, as shown in the above cell.

In [16]:
# Loss and optimizer
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)


In [17]:
# Training loop
for epoch in range(EPOCHS):
    losses=[]
    for idx,(images,labels) in enumerate(train_loader):
        
        # Move data to the device (GPU if available)
        images=images.to(device)
        labels=labels.to(device)
        
        # Forward
        outputs=model(images)
        loss_=loss(outputs,labels)
        losses.append(loss_.item())
        
        # Backward
        optimizer.zero_grad()
        loss_.backward()

        # Gradient descent or Adam step to update the parameters
        optimizer.step()
    print(f'The average loss at epoch {epoch+1} is: {sum(losses)/len(losses):.5f}')        

The average loss at epoch 1 is: 0.66970
The average loss at epoch 2 is: 0.36358
The average loss at epoch 3 is: 0.23830
The average loss at epoch 4 is: 0.16289
The average loss at epoch 5 is: 0.12233
The average loss at epoch 6 is: 0.09975
The average loss at epoch 7 is: 0.08491
The average loss at epoch 8 is: 0.07559
The average loss at epoch 9 is: 0.07048
The average loss at epoch 10 is: 0.06368
The average loss at epoch 11 is: 0.06177
The average loss at epoch 12 is: 0.05712
The average loss at epoch 13 is: 0.05761
The average loss at epoch 14 is: 0.06038
The average loss at epoch 15 is: 0.05587
The average loss at epoch 16 is: 0.05105
The average loss at epoch 17 is: 0.05343
The average loss at epoch 18 is: 0.05100
The average loss at epoch 19 is: 0.06446
The average loss at epoch 20 is: 0.04535


In [18]:
def check_accuracy(loader, model):
    if loader.dataset.train:
        print("Checking accuracy on training data")
    else:
        print("Checking accuracy on test data")

    num_correct = 0
    num_samples = 0
    model.eval()

    with torch.no_grad():
        for images, labels in loader:
            images = images.to(device)
            labels = labels.to(device)

            outputs = model(images)
            _, predictions = outputs.max(1)
            num_correct += (predictions == labels).sum()
            num_samples += predictions.size(0)

        accuracy = float(num_correct) / float(num_samples) * 100
        print(f"Got {num_correct} / {num_samples} with accuracy {accuracy:.2f}")

    model.train()


check_accuracy(train_loader, model)

Checking accuracy on training data


Got 49663 / 50000 with accuracy 99.33


<h2>Fine-tuning with a frozen feature extractor is faster, since only the new classifier layers are updated</h2>


In [95]:
# Replace the average pooling layer with an identity layer, as before
class Identity(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self,x):
        return x
model=models.vgg16(pretrained=True)
# Freeze all pretrained parameters
for param in model.parameters():
    param.requires_grad=False


# Layers created after freezing are trainable by default
model.avgpool=Identity()
model.classifier=nn.Sequential(nn.Linear(512,100),
                               nn.ReLU(),
                               nn.Linear(100,10))
model.to(device)
print(model)



VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

In [99]:
# Loss and optimizer
loss=nn.CrossEntropyLoss()
optimizer=optim.Adam(model.parameters(),lr=LR)


In [100]:
# Training loop
for epoch in range(EPOCHS):
    losses=[]
    for idx,(images,labels) in enumerate(train_loader):
        
        # Move data to the device (GPU if available)
        images=images.to(device)
        labels=labels.to(device)
        
        # Forward
        outputs=model(images)
        loss_=loss(outputs,labels)
        losses.append(loss_.item())
        
        # Backward
        optimizer.zero_grad()
        loss_.backward()

        # Gradient descent or Adam step to update the parameters
        optimizer.step()
    print(f'The average loss at epoch {epoch+1} is: {sum(losses)/len(losses):.5f}')        

The average loss at epoch 1 is: 1.38610
The average loss at epoch 2 is: 1.22174
The average loss at epoch 3 is: 1.18008
The average loss at epoch 4 is: 1.15680
The average loss at epoch 5 is: 1.13718
The average loss at epoch 6 is: 1.12068
The average loss at epoch 7 is: 1.10719
The average loss at epoch 8 is: 1.09706
The average loss at epoch 9 is: 1.09167
The average loss at epoch 10 is: 1.07837
The average loss at epoch 11 is: 1.07156
The average loss at epoch 12 is: 1.06552
The average loss at epoch 13 is: 1.05667
The average loss at epoch 14 is: 1.04887
The average loss at epoch 15 is: 1.04535
The average loss at epoch 16 is: 1.03861
The average loss at epoch 17 is: 1.03005
The average loss at epoch 18 is: 1.02920
The average loss at epoch 19 is: 1.02104
The average loss at epoch 20 is: 1.01634
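
<h3 style="color: yellow;">A parameter count may help put the two CIFAR-10 runs in perspective. The sketch below tallies the frozen VGG-16 convolutional layers against the new trainable classifier; the channel configuration is the standard VGG-16 layout with 3x3 kernels and biases, assumed here because the printed model is truncated.</h3>

```python
# (in_channels, out_channels) of the 13 convolutional layers in VGG-16 (assumed layout)
conv_channels = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# Each 3x3 convolution has in*out*9 weights plus one bias per output channel
frozen = sum(c_in * c_out * 9 + c_out for c_in, c_out in conv_channels)

# New classifier: Linear(512, 100) -> ReLU -> Linear(100, 10)
trainable = (512 * 100 + 100) + (100 * 10 + 10)

print(f"frozen feature parameters: {frozen}")
print(f"trainable classifier parameters: {trainable}")
print(f"trainable share: {trainable / (frozen + trainable):.2%}")
```

<h3 style="color: yellow;">Only a small share of the parameters is updated in the frozen variant, which means fewer parameters to update per step and may also relate to the higher training loss reported above.</h3>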
