# Part B: Fine-tuning a pre-trained model
Question 1 (5 Marks)

# Answer

To address the difference in image sizes, I will resize all input images to 224×224 pixels, which is the standard input size for most models pre-trained on ImageNet. I will do this with a torchvision transform:

transforms.Resize((224, 224))

A fixed size lets the images be stacked into batches and matches the resolution the pre-trained weights were trained on, which keeps the inputs compatible with the model architecture.

# Q1.2: ImageNet has 1000 classes and hence the last layer of the pre-trained model would have 1000 nodes. However, the naturalist dataset has only 10 classes. How will you address this?

Answer:

I will replace the final classification layer of the pre-trained model with a new nn.Linear layer that outputs 10 classes instead of 1000. For example, for ResNet50:

model.fc = nn.Linear(in_features=2048, out_features=10)

This lets the model output logits for only the 10 classes in the iNaturalist dataset while keeping all other layers (and their learned features) from pre-training. The new layer is randomly initialised, so it has to be trained on the new dataset.

# Question 2

Common Trick: Freezing layers

Since pre-trained models are large, one common trick to reduce computation and make training tractable is to freeze some or all layers during training. Freezing a layer means we don’t update its weights: no gradients are computed or stored for its parameters, so the optimizer leaves them unchanged.
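The effect of freezing can be seen on a toy network. The sketch below is an assumption-laden illustration (a two-layer `nn.Sequential` with random data, not the ResNet50 used later): it freezes the first layer, takes one optimizer step, and checks which weights changed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer by switching off gradient tracking for its parameters.
for p in net[0].parameters():
    p.requires_grad = False

frozen_before = net[0].weight.clone()
head_before = net[2].weight.clone()

# Only parameters that still require gradients are handed to the optimizer.
optimizer = torch.optim.SGD([p for p in net.parameters() if p.requires_grad], lr=0.1)

inputs = torch.randn(16, 4)
targets = torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(net(inputs), targets)
loss.backward()
optimizer.step()

print(net[0].weight.grad)                         # None: no gradient was stored for the frozen layer
print(torch.equal(net[0].weight, frozen_before))  # frozen weights are unchanged
print(torch.equal(net[2].weight, head_before))    # trainable weights have moved
```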

Three Strategies that I Tried


1) Freeze all layers except the final classification layer
→ Only the last layer (replaced to output 10 classes) is trainable. All other layers remain frozen.
→ This is fast and works well when the new task is similar to the original.

2) Freeze lower layers, fine-tune top k layers
→ For example, freeze the first few blocks (which capture general patterns such as edges and textures) and fine-tune only the last few blocks.
→ This balances cost and accuracy.

3) Unfreeze all layers and fine-tune the entire model
→ This usually gives the best performance when the dataset is large or very different from ImageNet, but it is the most computationally expensive option.

These strategies leverage the representations learned on ImageNet while adapting the model to the new domain, and they trade off accuracy against computational cost.

I used the iNaturalist dataset from the previous question and applied each of the above strategies. Fine-tuning only the final layer was fastest, while unfreezing all layers gave better accuracy but took longer to train.

# Implementation

In [1]:
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader
import torch.optim as optim


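# Resize to the expected input size and apply the ImageNet normalization the pre-trained weights expect
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])


train_data = datasets.ImageFolder(root='D:/nature_12K/inaturalist_12K/train', transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)


# Load ResNet50 with ImageNet weights (on torchvision < 0.13, use pretrained=True instead)
Model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
num_classes = 10
# Replace the 1000-class head with a 10-class one
Model.fc = nn.Linear(Model.fc.in_features, num_classes)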



# Strategy 1: Freeze all layers except the last

In [2]:

for paramt in Model.parameters():
    paramt.requires_grad = False


for paramt in Model.fc.parameters():
    paramt.requires_grad = True

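# Strategy 2: Unfreeze top k layers (e.g., last 2 residual stages and the classifier)

In [3]:
# Freeze everything first
for paramtr in Model.parameters():
    paramtr.requires_grad = False


# ResNet50 children in order: conv1, bn1, relu, maxpool, layer1, layer2, layer3, layer4, avgpool, fc.
# Unfreezing from the 7th child onwards makes layer3, layer4 and fc trainable (avgpool has no parameters).
for ct, (name, child) in enumerate(Model.named_children(), start=1):
    if ct > 6:
        for paramtr in child.parameters():
            paramtr.requires_grad = True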

# Strategy 3: Fine-tune entire model

Unfreeze all layers

In [4]:

for paramtr in Model.parameters():
    paramtr.requires_grad = True

# Question 3 (10 Marks)

Freeze all layers except the last layer, and fine-tune only the final classification head.

In [None]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torchvision import models
from torch.utils.data import DataLoader
import torch.optim as optim


# Resize to 224x224 and normalize with the ImageNet channel statistics
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])


train_data = datasets.ImageFolder(root='D:/nature_12K/inaturalist_12K/train', transform=transform)
val_data = datasets.ImageFolder(root='D:/nature_12K/inaturalist_12K/val', transform=transform)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
val_loader = DataLoader(val_data, batch_size=32)


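# Load ResNet50 with ImageNet weights (on torchvision < 0.13, use pretrained=True instead)
Model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)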


for param in Model.parameters():
    param.requires_grad = False


number_class = 10
Model.fc = nn.Linear(Model.fc.in_features, number_class)


for param in Model.fc.parameters():
    param.requires_grad = True


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Model = Model.to(device)

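criterion = nn.CrossEntropyLoss()
# Only the new head is passed to the optimizer; the frozen backbone is never updated
optimizer = optim.Adam(Model.fc.parameters(), lr=0.001)


num_epochs = 10

for epoch in range(num_epochs):
    # Note: train() also puts the frozen BatchNorm layers in training mode,
    # so their running statistics still change even though their weights do not.
    Model.train()
    total_loss, correct = 0, 0

    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = Model(imgs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # loss.item() is the mean loss of one batch, so total_loss is a sum over batches
        total_loss += loss.item()
        preds = outputs.argmax(dim=1)
        correct += (preds == labels).sum().item()

    train_acc = correct / len(train_loader.dataset)
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {total_loss:.4f}, Train Acc: {train_acc:.4f}")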


Model.eval()
correct = 0
with torch.no_grad():
    for imgs, labels in val_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        outputs = Model(imgs)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()

val_accuracy = correct / len(val_loader.dataset)
print(f"Validation Accuracy: {val_accuracy:.4f}")




Epoch 1/10, Loss: 336.1740, Train Acc: 0.6567
Epoch 2/10, Loss: 259.6063, Train Acc: 0.7272
Epoch 3/10, Loss: 242.4616, Train Acc: 0.7380
Epoch 4/10, Loss: 237.3280, Train Acc: 0.7502
Epoch 5/10, Loss: 228.9919, Train Acc: 0.7575
Epoch 6/10, Loss: 225.9398, Train Acc: 0.7571
Epoch 7/10, Loss: 229.5094, Train Acc: 0.7576
Epoch 8/10, Loss: 217.0032, Train Acc: 0.7712
Epoch 9/10, Loss: 211.7710, Train Acc: 0.7750
Epoch 10/10, Loss: 211.2092, Train Acc: 0.7770
Validation Accuracy: 0.7620
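The `Loss` values in the log are sums of per-batch mean losses, which is why they are in the hundreds and depend on the number of batches. A per-sample average is easier to compare across runs. The sketch below shows the bookkeeping on toy tensors; the data, the batch split, and the variable names are made up for illustration and do not come from the experiment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()  # default reduction="mean": one mean value per batch

# Toy logits and labels for 10 samples and 3 classes, split into batches of 8 and 2.
logits = torch.randn(10, 3)
labels = torch.randint(0, 3, (10,))
batches = [(logits[:8], labels[:8]), (logits[8:], labels[8:])]

summed_loss = 0.0    # what the training loop above accumulates
weighted_loss = 0.0  # batch mean multiplied by batch size
num_samples = 0

for batch_logits, batch_labels in batches:
    loss = criterion(batch_logits, batch_labels)
    summed_loss += loss.item()
    weighted_loss += loss.item() * batch_labels.size(0)
    num_samples += batch_labels.size(0)

mean_loss = weighted_loss / num_samples
full_loss = criterion(logits, labels).item()  # loss over all samples at once

print(f"sum of batch losses: {summed_loss:.4f}")
print(f"per-sample mean:     {mean_loss:.4f}")
print(f"full-dataset loss:   {full_loss:.4f}")
```

Weighting each batch by its size keeps the average correct even when the last batch is smaller than the others.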
