Question 1

In most DL applications, instead of training a model from scratch, you would use a model pre-trained on a similar or related task or dataset. From torchvision, you can load ANY ONE model (GoogLeNet, InceptionV3, ResNet50, VGG, EfficientNetV2, VisionTransformer, etc.) pre-trained on the ImageNet dataset. Given that ImageNet also contains many animal images, it stands to reason that using a model pre-trained on ImageNet may be helpful for this task.

You will load a pre-trained model and then fine-tune it using the naturalist data that you used in the previous question. Simply put, instead of randomly initialising the weights of a network, you will use the weights resulting from training the model on the ImageNet data (torchvision directly provides these weights). Please answer the following questions:




The dimensions of the images in your data may not be the same as that in the ImageNet data. How will you address this?

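We should resize the images in our dataset to 224x224 pixels, the input size ResNet50 was trained with on ImageNet, before passing them to the pre-trained model. We can do this with torchvision's transforms.Resize, and we also normalise with the ImageNet channel means and standard deviations so the inputs match what the pre-trained weights expect.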

In [None]:
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize to 224x224
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Normalization with ImageNet stats
])

# This transforms an input image of any dimensions (x * y) to 224 * 224 consistently

ImageNet has 1000 classes and hence the last layer of the pre-trained model would have 1000 nodes. However, the naturalist dataset has only 10 classes. How will you address this?

In [None]:
import torchvision.models as models
import torch.nn as nn

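# Load a ResNet50 model pre-trained on ImageNet
# (torchvision >= 0.13; older versions use pretrained=True instead)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)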

# Modify the last fully connected layer
model.fc = nn.Linear(model.fc.in_features, 10)  # Change output features to 10


In [5]:
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
import torch.optim as optim

# Step 1: Define transforms
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# Step 2: Load dataset
train_dataset = ImageFolder(root='../train/', transform=transform)
val_dataset = ImageFolder(root='../val/', transform=transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4, pin_memory=True)
val_loader = DataLoader(val_dataset, batch_size=32, num_workers=4, pin_memory=True)

# Step 3: Load pre-trained model and freeze all layers
# (torchvision >= 0.13; older versions use pretrained=True instead)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

for param in model.parameters():
    param.requires_grad = False  # Freeze all layers

# Step 4: Replace the final FC layer
num_classes = 10  # Adjust as per your dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Step 5: Move model to device and define loss/optimizer
# "cuda:1" selects the second GPU; change it to "cuda" on a single-GPU machine
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)

# Step 6: Train only the new head
for epoch in range(5):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # Calculate training accuracy
        _, predicted = torch.max(outputs, 1)
        correct_train += (predicted == labels).sum().item()
        total_train += labels.size(0)

    train_acc = 100 * correct_train / total_train

    # Evaluate on validation set
    model.eval()
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            correct_val += (predicted == labels).sum().item()
            total_val += labels.size(0)

    val_acc = 100 * correct_val / total_val

    print(f"Epoch [{epoch+1}/5] | Loss: {running_loss/len(train_loader):.4f} | Train Acc: {train_acc:.2f}% | Val Acc: {val_acc:.2f}%")


Epoch [1/5] | Loss: 1.1397 | Train Acc: 63.03% | Val Acc: 73.50%
Epoch [2/5] | Loss: 0.8470 | Train Acc: 72.46% | Val Acc: 75.70%
Epoch [3/5] | Loss: 0.7859 | Train Acc: 74.05% | Val Acc: 75.56%
Epoch [4/5] | Loss: 0.7800 | Train Acc: 74.60% | Val Acc: 77.39%
Epoch [5/5] | Loss: 0.7516 | Train Acc: 75.48% | Val Acc: 74.11%
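
In the log above, validation accuracy peaks at epoch 4 (77.39%) and drops to 74.11% at epoch 5, so the weights left at the end of training are not the best ones seen. One possible remedy, sketched below, is to keep a copy of the state dict whenever validation accuracy improves and restore it afterwards. The tiny nn.Linear model and the logged_val_acc list are stand-ins for illustration; in the real loop the value would come from the validation pass of each epoch.

```python
import copy
import torch.nn as nn

model = nn.Linear(4, 10)  # tiny stand-in for the fine-tuned ResNet50

# Validation accuracies copied from the training log above.
logged_val_acc = [73.50, 75.70, 75.56, 77.39, 74.11]

best_acc, best_epoch, best_state = 0.0, None, None
for epoch, val_acc in enumerate(logged_val_acc, start=1):
    # (training and validation for this epoch would run here)
    if val_acc > best_acc:
        best_acc, best_epoch = val_acc, epoch
        # deepcopy so later optimizer steps do not overwrite the snapshot
        best_state = copy.deepcopy(model.state_dict())

# Restore the best-performing weights after training finishes.
model.load_state_dict(best_state)
print(best_epoch, best_acc)  # 4 77.39
```

The snapshot could also be written to disk with torch.save(best_state, path), where path is a file name of our choosing.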
