<h1>Transfer Learning</h1>

<h3 style="color: yellow;">This tutorial covers transfer learning and how it can be applied in PyTorch.</h3>

<h3 style="color: yellow;">Transfer learning is an ML method where a model developed for one task is reused as a starting point for another model designed for a different task.</h3>

<h3 style="color: yellow;">For instance, if we have a model built for the classification of n classes of vehicles, we can modify this model by adding a layer on top to construct a new model for a two-class classification (such as Car/Truck).</h3>

<div style="display: flex; justify-content: center;">
    <img src='translearn.png' width=600>
</div>

<h3 style="color: yellow;">Transfer learning is widely used in ML. It speeds up the creation of new models, saving training time and reducing the number of parameters that must be trained for the new task.</h3>



<h1>ResNet-18 fine-tuning on the Hymenoptera dataset</h1>
<h3 style="color: yellow;">This tutorial uses the Hymenoptera (insect) dataset, which is a small subset of ImageNet.</h3>

<h3 style="color: yellow;">Our objective is to fine-tune ResNet-18 for binary classification: adapting a model pretrained on ImageNet to distinguish ants from bees.</h3>

<h3 style="color: yellow;">The dataset contains 244 training images and 153 validation images in total, split between the ant and bee classes.</h3>

<h3 style="color: yellow;">Given its size, the dataset is quite limited for training a model from scratch.</h3>

<h3 style="color: yellow;">However, by employing transfer learning, we aim to achieve reasonable generalization.</h3>

In [1]:
# Importing libs
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.nn.functional as F
import torchvision
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt
import os, time, copy



In [2]:
# Select the device: use the GPU if available, otherwise fall back to the CPU
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [3]:
# ResNet-18 was pretrained on the ImageNet dataset, which has 1000 classes.
# These are the ImageNet per-channel (RGB) mean and standard deviation used to normalize the inputs
mean=np.array([0.485,0.456,0.406])
std=np.array([0.229,0.224,0.225])

In [4]:
# Define the data transforms
data_tansforms={'train':transforms.Compose([transforms.RandomResizedCrop(224),
                                            transforms.RandomHorizontalFlip(),
                                            transforms.ToTensor(),
                                            transforms.Normalize(mean,std)]),
                'val':transforms.Compose([transforms.Resize(256),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize(mean,std)])}
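
<h3 style="color: yellow;">After <code>ToTensor</code> scales each channel to [0, 1], <code>Normalize</code> standardizes each channel as (x - mean) / std. The following is a minimal pure-Python sketch of that arithmetic for a single pixel. The helper <code>normalize_pixel</code> and the pixel value are hypothetical and used only for illustration.</h3>

```python
# ImageNet per-channel statistics (RGB), as defined earlier in the notebook
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean, std):
    """Standardize one RGB pixel whose channels are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# Hypothetical mid-gray pixel, chosen only for illustration
pixel = [0.5, 0.5, 0.5]
normalized = normalize_pixel(pixel, mean, std)
print([round(v, 4) for v in normalized])

# A pixel equal to the channel means maps to zero in every channel
centered = normalize_pixel(mean, mean, std)
print(centered)
```

<h3 style="color: yellow;">The same mid-gray value ends up different in each channel because every channel has its own mean and standard deviation.</h3>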

In [5]:
# Define data directory
data_dir='/home/mohanad/learn/Pytorch/14- Transfer Learning/data/hymenoptera_data'

In [6]:
# Define the datasets
image_datasets={x:datasets.ImageFolder(os.path.join(data_dir,x),data_tansforms[x]) for x in ['train','val']}


In [7]:
# Define the dataloaders
dataloaders={x:DataLoader(image_datasets[x],batch_size=4,shuffle=True,num_workers=4) for x in ['train','val']}

iter_=iter(dataloaders['train'])
images, labels=next(iter_)
print(f'The images shape is: {images.shape}')
print(f'The labels shape is: {labels.shape}')
print('')

The images shape is: torch.Size([4, 3, 224, 224])
The labels shape is: torch.Size([4])



In [8]:
# Print the size of the datasets
dataset_sizes={x:len(image_datasets[x]) for x in ['train','val']}
print(f'The training set size is: {dataset_sizes["train"]}')
print(f'The validation set size is: {dataset_sizes["val"]}')

The training set size is: 244
The validation set size is: 153
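
<h3 style="color: yellow;">With a batch size of 4, these dataset sizes determine how many batches each phase iterates over per epoch. The sketch below assumes the <code>DataLoader</code> default <code>drop_last=False</code>, which keeps a final, smaller batch. The helper <code>batches_per_epoch</code> is hypothetical.</h3>

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields in one pass over the data."""
    if drop_last:
        # Incomplete final batch is discarded
        return num_samples // batch_size
    # Incomplete final batch is kept
    return math.ceil(num_samples / batch_size)


dataset_sizes = {"train": 244, "val": 153}
batch_size = 4

for phase, size in dataset_sizes.items():
    n_batches = batches_per_epoch(size, batch_size)
    last_batch = size - (n_batches - 1) * batch_size
    print(f"{phase}: {n_batches} batches, last batch holds {last_batch} image(s)")
```

<h3 style="color: yellow;">The validation set does not divide evenly by 4, so its last batch holds a single image. This is why per-batch losses should be weighted by batch size when averaging over an epoch.</h3>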


In [9]:
# The dataset object stores the file path of each sample
train_dataset=image_datasets['train']
print(f'The path for the first sample is: {train_dataset.samples[0][0]}')
print(f'The class names are: {train_dataset.classes}')

The path for the first sample is: /home/mohanad/learn/Pytorch/14- Transfer Learning/data/hymenoptera_data/train/ants/0013035.jpg
The class names are: ['ants', 'bees']


In [28]:
def train_model(model,loss,optimizer,scheduler,num_epochs=25):
    start=time.time()
    best_acc=0.0
    # best model weights
    best_model_wts=copy.deepcopy(model.state_dict())
    
    for epoch in range(num_epochs):
        print(f'Epoch:{epoch+1}/{num_epochs}')
        print('-'*10)
        # We have a training and validation phase at each epoch
        for phase in ['train','val']:
            if phase=='train':
                model.train()
            else:
                model.eval()
            
            running_loss=0.0
            running_corrects=0
            
            # Iterate over data
            for images, labels in dataloaders[phase]:
                images=images.to(device)
                labels=labels.to(device)
                # Track gradients only in the training phase
                with torch.set_grad_enabled(phase=='train'):
                    outputs=model(images)
                    _,prediction=torch.max(outputs,1)
                    loss_=loss(outputs,labels)
                    # Backward pass and optimizer step only in the training phase
                    if phase=='train':
                        optimizer.zero_grad()
                        loss_.backward()
                        optimizer.step()
                        
                    # Statistics: weight the batch-mean loss by the batch size
                    running_loss+=loss_.item()*images.size(0)
                    running_corrects+=torch.sum(prediction==labels.data)
            
            # scheduler
            if phase=='train':
                scheduler.step()
            
            epoch_loss=running_loss/dataset_sizes[phase]
            epoch_acc=running_corrects.double()/dataset_sizes[phase]
            
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
            # Saving the model with best accuracy on validation
            
            if phase=='val' and epoch_acc>best_acc:
                best_acc=epoch_acc
                best_model_wts=copy.deepcopy(model.state_dict())
        print('')
    time_elapsed=time.time()-start
    print(f'Training complete in {time_elapsed//60:.0f}m {time_elapsed%60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')
    model.load_state_dict(best_model_wts)
    return model
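
<h3 style="color: yellow;">In <code>train_model</code>, each batch loss (a mean over the batch) is multiplied by the batch size before being accumulated, so dividing by the dataset size gives a true per-sample average even when the last batch is smaller. The sketch below illustrates the difference with hypothetical loss values. The helper <code>epoch_average</code> is not part of the notebook.</h3>

```python
def epoch_average(batch_losses, batch_sizes):
    """Per-sample mean loss computed from per-batch mean losses."""
    running_loss = 0.0
    for loss, size in zip(batch_losses, batch_sizes):
        # Undo the per-batch averaging so every sample counts equally
        running_loss += loss * size
    return running_loss / sum(batch_sizes)


# Hypothetical values: two full batches of 4 and a final batch of 1
batch_losses = [0.5, 0.25, 2.0]
batch_sizes = [4, 4, 1]

weighted = epoch_average(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)
print(f"weighted: {weighted:.4f}, naive: {naive:.4f}")
```

<h3 style="color: yellow;">The naive mean of batch losses lets the single-image batch count as much as a full batch, which inflates the epoch loss in this example.</h3>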

In [29]:
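# Use transfer learning: load ResNet-18 with weights pretrained on ImageNet
# (torchvision >= 0.13 deprecates pretrained=True in favor of the weights argument)
model=models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)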


In [30]:
# Print all model parameters

for idx,(name,param) in enumerate(model.named_parameters()):
    print(f"Index: {idx}, Layer Name: {name}, Shape: {param.shape}")


Index: 0, Layer Name: conv1.weight, Shape: torch.Size([64, 3, 7, 7])
Index: 1, Layer Name: bn1.weight, Shape: torch.Size([64])
Index: 2, Layer Name: bn1.bias, Shape: torch.Size([64])
Index: 3, Layer Name: layer1.0.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 4, Layer Name: layer1.0.bn1.weight, Shape: torch.Size([64])
Index: 5, Layer Name: layer1.0.bn1.bias, Shape: torch.Size([64])
Index: 6, Layer Name: layer1.0.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 7, Layer Name: layer1.0.bn2.weight, Shape: torch.Size([64])
Index: 8, Layer Name: layer1.0.bn2.bias, Shape: torch.Size([64])
Index: 9, Layer Name: layer1.1.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 10, Layer Name: layer1.1.bn1.weight, Shape: torch.Size([64])
Index: 11, Layer Name: layer1.1.bn1.bias, Shape: torch.Size([64])
Index: 12, Layer Name: layer1.1.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 13, Layer Name: layer1.1.bn2.weight, Shape: torch.Size([64])
Index: 14, Layer Name: layer1.1.bn

In [31]:
# Iterate over the named modules to print the layers that have weights
for name,module in model.named_modules():
    if hasattr(module,'weight') and hasattr(module.weight, 'shape'):
        print(name)

conv1
bn1
layer1.0.conv1
layer1.0.bn1
layer1.0.conv2
layer1.0.bn2
layer1.1.conv1
layer1.1.bn1
layer1.1.conv2
layer1.1.bn2
layer2.0.conv1
layer2.0.bn1
layer2.0.conv2
layer2.0.bn2
layer2.0.downsample.0
layer2.0.downsample.1
layer2.1.conv1
layer2.1.bn1
layer2.1.conv2
layer2.1.bn2
layer3.0.conv1
layer3.0.bn1
layer3.0.conv2
layer3.0.bn2
layer3.0.downsample.0
layer3.0.downsample.1
layer3.1.conv1
layer3.1.bn1
layer3.1.conv2
layer3.1.bn2
layer4.0.conv1
layer4.0.bn1
layer4.0.conv2
layer4.0.bn2
layer4.0.downsample.0
layer4.0.downsample.1
layer4.1.conv1
layer4.1.bn1
layer4.1.conv2
layer4.1.bn2
fc


In [32]:
# Print only the layers that perform convolution operations
for idx, (name, param) in enumerate(model.named_parameters()):
    if 'conv' in name:
        print(f"Index: {idx}, Layer Name: {name}, Shape: {param.shape}")

Index: 0, Layer Name: conv1.weight, Shape: torch.Size([64, 3, 7, 7])
Index: 3, Layer Name: layer1.0.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 6, Layer Name: layer1.0.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 9, Layer Name: layer1.1.conv1.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 12, Layer Name: layer1.1.conv2.weight, Shape: torch.Size([64, 64, 3, 3])
Index: 15, Layer Name: layer2.0.conv1.weight, Shape: torch.Size([128, 64, 3, 3])
Index: 18, Layer Name: layer2.0.conv2.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 24, Layer Name: layer2.1.conv1.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 27, Layer Name: layer2.1.conv2.weight, Shape: torch.Size([128, 128, 3, 3])
Index: 30, Layer Name: layer3.0.conv1.weight, Shape: torch.Size([256, 128, 3, 3])
Index: 33, Layer Name: layer3.0.conv2.weight, Shape: torch.Size([256, 256, 3, 3])
Index: 39, Layer Name: layer3.1.conv1.weight, Shape: torch.Size([256, 256, 3, 3])
Index: 42, Layer Name: layer3.1.conv2.wei

In [33]:
# Continue with transfer learning
# Get the number of input features of the last layer
num_features=model.fc.in_features
print(f'The number of input features is: {num_features}')

The number of input features is: 512


In [34]:
# Replace the pretrained 1000-class fc layer with a new linear layer that has 2 output features
model.fc=nn.Linear(num_features,2)  # the old fc layer is discarded; the new one is randomly initialized
model.to(device)

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [35]:
from torchsummary import summary

# torchsummary expects the input size without the batch dimension: (channels, height, width)
summary(model, input_size=images[0].shape)


----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,
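
<h3 style="color: yellow;">The Param # column of the summary can be checked by hand. The sketch below reproduces the first few rows, assuming square kernels and the <code>bias=False</code> convolutions shown in the model printout. The helper names are hypothetical.</h3>

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=False):
    """Learnable parameters of a 2D convolution with a square kernel."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def batchnorm2d_params(num_features):
    """One learnable scale (weight) and one shift (bias) per channel."""
    return 2 * num_features


print(conv2d_params(3, 64, 7))    # Conv2d-1: 9408
print(batchnorm2d_params(64))     # BatchNorm2d-2: 128
print(conv2d_params(64, 64, 3))   # Conv2d-5: 36864
```

<h3 style="color: yellow;">ReLU, MaxPool2d and BasicBlock rows report 0 because they hold no learnable parameters of their own.</h3>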

In [36]:
EPOCHS=100
# Define the loss function
loss=nn.CrossEntropyLoss()
# Define the optimizer
optimizer=optim.SGD(model.parameters(),lr=0.001,momentum=0.9)
# Define the scheduler for LR Update
step_lr_scheduler=lr_scheduler.StepLR(optimizer,step_size=10,gamma=0.1)
# Calling the training function
model=train_model(model,loss,optimizer,step_lr_scheduler,num_epochs=EPOCHS)

# The method above is called fine-tuning of all model parameters, where we adjust the entire model's parameters using a lower learning rate (LR).

# Another option is to freeze the parameters of the entire model and fine-tune only the last layer.



Epoch:1/100
----------
train Loss: 0.5737 Acc: 0.6885
val Loss: 0.1987 Acc: 0.9216

Epoch:2/100
----------
train Loss: 0.4646 Acc: 0.7828
val Loss: 0.4011 Acc: 0.8824

Epoch:3/100
----------
train Loss: 0.5883 Acc: 0.7541
val Loss: 0.2574 Acc: 0.9020

Epoch:4/100
----------
train Loss: 0.5281 Acc: 0.7828
val Loss: 0.2201 Acc: 0.9216

Epoch:5/100
----------
train Loss: 0.5330 Acc: 0.7910
val Loss: 0.2849 Acc: 0.8954

Epoch:6/100
----------
train Loss: 0.4466 Acc: 0.8197
val Loss: 0.2628 Acc: 0.9085

Epoch:7/100
----------
train Loss: 0.4346 Acc: 0.8361
val Loss: 0.2580 Acc: 0.9150

Epoch:8/100
----------
train Loss: 0.4548 Acc: 0.7910
val Loss: 0.2250 Acc: 0.9216

Epoch:9/100
----------
train Loss: 0.5697 Acc: 0.7705
val Loss: 0.2704 Acc: 0.8889

Epoch:10/100
----------
train Loss: 0.5441 Acc: 0.7951
val Loss: 0.4711 Acc: 0.8693

Epoch:11/100
----------
train Loss: 0.3864 Acc: 0.8443
val Loss: 0.2687 Acc: 0.9085

Epoch:12/100
----------
train Loss: 0.4041 Acc: 0.8402
val Loss: 0.2594 Ac
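
<h3 style="color: yellow;"><code>StepLR</code> with <code>step_size=10</code> and <code>gamma=0.1</code> multiplies the learning rate by 0.1 after every 10 scheduler steps. Because <code>train_model</code> calls <code>scheduler.step()</code> once per epoch, the rate used in a given epoch follows a simple closed form. The sketch below is a pure-Python approximation of that schedule, not the PyTorch implementation. The helper <code>step_lr</code> is hypothetical.</h3>

```python
def step_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate used during a 0-indexed epoch under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


initial_lr, gamma, step_size = 0.001, 0.1, 10

# Show the rate at a few epochs (printed 1-indexed, like the training log)
for epoch in (0, 9, 10, 20, 99):
    lr = step_lr(initial_lr, gamma, step_size, epoch)
    print(f"epoch {epoch + 1:>3}: lr = {lr:.1e}")
```

<h3 style="color: yellow;">Under these settings the learning rate shrinks tenfold every 10 epochs, so the later part of a 100-epoch run changes the weights very little.</h3>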

In [38]:
model=models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1) # Load the weights pretrained on ImageNet
model.to(device)
from torchsummary import summary

summary(model, input_size=images[0].shape)



# Second option: freeze all layers of the pretrained network
for param in model.parameters():
    param.requires_grad=False

# Optionally, unfreeze the last four parameter tensors.
# Parameters are typically stored as (weight, bias) pairs, so four tensors cover the last two parameterized layers
#parameters = list(model.parameters())
#parameters[-4].requires_grad = True  # Unfreeze the weight of the second-to-last layer
#parameters[-3].requires_grad = True  # Unfreeze the bias of the second-to-last layer
#parameters[-2].requires_grad = True  # Unfreeze the weight of the last layer
#parameters[-1].requires_grad = True  # Unfreeze the bias of the last layer



# Unfreeze the last convolutional layer of the last convolutional block
#model.layer4[1].conv2.weight.requires_grad = True # Note: architectures differ, and these conv layers have no bias (bias=False)



# Unfreeze the last convolutional layer of each convolutional block
model.layer1[1].conv2.weight.requires_grad = True
model.layer2[1].conv2.weight.requires_grad = True
model.layer3[1].conv2.weight.requires_grad = True
model.layer4[1].conv2.weight.requires_grad = True


# Print all parameters of the model
for idx, (name, param) in enumerate(model.named_parameters()):
    print(f"Index: {idx}, Layer Name: {name}, Shape: {param.shape}")

print('')

# Iterate through the named modules to print the layers that have weights
for name, module in model.named_modules():
    # Check if the module has a weight attribute to get its shape
    if hasattr(module, 'weight') and hasattr(module.weight, 'shape'):
        print(name, ":", module.weight.shape)

# Print only the convolutional and fully connected (fc) parameters

index = 0
for name, param in model.named_parameters():
    if "conv" in name or 'fc' in name:
        print(f"Index: {index}, Layer Name: {name}, Shape: {param.shape}")
        index += 1
print('')


# Get the number of input features of the last layer
num_features=model.fc.in_features # fc is the last layer in the model
print(f'The number of input features is: {num_features}')


# Create a new layer and assign it as the last layer
model.fc=nn.Linear(num_features,2) # new fully connected layer with 2 outputs, one per class; its parameters are trainable by default

model.to(device)


# Define the loss function
loss=nn.CrossEntropyLoss()

# Define the optimizer; pass only the parameters that are still trainable
optimizer=optim.SGD([p for p in model.parameters() if p.requires_grad],lr=0.001)

NUM_EPOCHS=250

# Scheduler for LR updates
step_lr_scheduler=optim.lr_scheduler.StepLR(optimizer,step_size=10,gamma=0.1) # the LR is multiplied by 0.1 (reduced tenfold) every 10 epochs

# Call the training function
model=train_model(model,loss,optimizer,step_lr_scheduler,num_epochs=NUM_EPOCHS)



----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64,
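
<h3 style="color: yellow;">In the partially frozen configuration above, only the four unfrozen <code>conv2</code> weights and the new <code>fc</code> head remain trainable. The sketch below adds up their sizes in pure Python. The shapes are taken from the parameter listings earlier in the notebook, and <code>layer4.1.conv2</code> is assumed to follow the standard ResNet-18 shape of (512, 512, 3, 3).</h3>

```python
from math import prod

# Shapes of the parameters left with requires_grad=True in the cell above
trainable_shapes = {
    "layer1.1.conv2.weight": (64, 64, 3, 3),
    "layer2.1.conv2.weight": (128, 128, 3, 3),
    "layer3.1.conv2.weight": (256, 256, 3, 3),
    "layer4.1.conv2.weight": (512, 512, 3, 3),  # assumed standard ResNet-18 shape
    "fc.weight": (2, 512),
    "fc.bias": (2,),
}

# Number of scalar values in each tensor is the product of its dimensions
counts = {name: prod(shape) for name, shape in trainable_shapes.items()}
total_trainable = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count:,}")
print(f"Total trainable parameters: {total_trainable:,}")
```

<h3 style="color: yellow;">Most of the trainable budget sits in the deepest convolution, while the new 2-class head is tiny by comparison.</h3>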