## Fine-tuning models for Dogs vs Cats

In this notebook, we'll fine-tune ResNet18 and MobileNetV2 on the Dogs vs Cats dataset, available at https://www.kaggle.com/datasets/karakaggle/kaggle-cat-vs-dog-dataset. The dataset contains around 12,500 cat images and 12,500 dog images. We'll use an 80/20 train/test split to fine-tune and evaluate both models.

### ResNet18

In [1]:
import torch
import torch.nn as nn
import torch.optim as opt
from torchvision import transforms, models, datasets
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader, random_split
import os, glob, shutil, random
from tqdm import tqdm
from pathlib import Path
from torchvision.models import ResNet18_Weights
from sklearn.metrics import confusion_matrix, classification_report
import torch.nn.functional as F
import numpy as np

In [None]:
transform = transforms.Compose([
   transforms.Resize((256, 256)),                     
   transforms.CenterCrop(224),                  # resnet expects 224x224 images  
   transforms.ToTensor(),                             
   transforms.Normalize([0.485, 0.456, 0.406],  # Normalize with ImageNet means and stds
                        [0.229, 0.224, 0.225])
])

dataset = datasets.ImageFolder(root='./PetImages', transform=transform)  # labels come from subfolder names, indexed alphabetically

train_size = int(0.8*len(dataset))
test_size = len(dataset) - train_size

train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

resnet18 = models.resnet18(weights=ResNet18_Weights.DEFAULT)
resnet18.fc = nn.Linear(resnet18.fc.in_features, 2)  # only two outputs, dog and cat

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet18.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(resnet18.parameters(), lr=0.001)


In [31]:
import torch
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
print("Device name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU found")

CUDA available: True
Device count: 1
Device name: NVIDIA GeForce RTX 3060 Laptop GPU


In [14]:
def train_model(model, train_loader, device, epochs, optimizer, criterion):
   model.train()
   for epoch in range(epochs):
      running_loss = 0.0
      # Wrap the data loader with tqdm for the progress bar
      progress_bar = tqdm(train_loader, desc=f"Epoch {epoch + 1}/{epochs}")
      for step, (images, labels) in enumerate(progress_bar, start=1):
         images, labels = images.to(device), labels.to(device)
         optimizer.zero_grad()
         outputs = model(images)
         loss = criterion(outputs, labels)
         loss.backward()
         optimizer.step()

         running_loss += loss.item()
         # Show the mean loss over the batches seen so far in this epoch
         progress_bar.set_postfix(loss=running_loss / step)

epochs = 5
train_model(resnet18, train_loader, device, epochs, optimizer, criterion)


Epoch 1/5: 100%|██████████| 312/312 [02:30<00:00,  2.08it/s, loss=0.111] 
Epoch 2/5: 100%|██████████| 312/312 [02:31<00:00,  2.06it/s, loss=0.0803]
Epoch 3/5: 100%|██████████| 312/312 [02:41<00:00,  1.93it/s, loss=0.0657]
Epoch 4/5: 100%|██████████| 312/312 [02:39<00:00,  1.96it/s, loss=0.0519]
Epoch 5/5: 100%|██████████| 312/312 [02:41<00:00,  1.94it/s, loss=0.0462]


In [20]:
# Function to evaluate the model on the test set
def evaluate_model(model, test_loader, device):
   model.eval()  # Set the model to evaluation mode
   correct = 0
   total = 0
   all_preds = []
   all_labels = []

   with torch.no_grad():  # No need to track gradients during evaluation
      progress_bar = tqdm(test_loader, total=len(test_loader), desc="Evaluating")

      # Iterate over the tqdm wrapper (not the raw loader) so the bar advances
      for images, labels in progress_bar:
         images, labels = images.to(device), labels.to(device)
         outputs = model(images)
         _, predicted = torch.max(outputs, 1)
         total += labels.size(0)
         correct += (predicted == labels).sum().item()

         # Collect predictions and labels for metrics calculation
         all_preds.extend(predicted.cpu().numpy())
         all_labels.extend(labels.cpu().numpy())

         # Update the progress bar; `total` counts the samples actually seen
         progress_bar.set_postfix(accuracy=correct / total, samples_processed=total)

   # Calculate accuracy
   accuracy = correct / total
   print(f'Accuracy: {accuracy * 100:.2f}%')

   # Generate confusion matrix
   conf_matrix = confusion_matrix(all_labels, all_preds)
   print("\nConfusion Matrix:\n", conf_matrix)

   # Generate classification report
   class_report = classification_report(all_labels, all_preds, digits=4)
   print("\nClassification Report:\n", class_report)

# Evaluate the model
evaluate_model(resnet18, test_loader, device)


Evaluating:   0%|          | 0/78 [00:33<?, ?it/s, accuracy=0.972, samples_processed=4992]

Accuracy: 97.16%

Confusion Matrix:
 [[2442   46]
 [  96 2408]]

Classification Report:
               precision    recall  f1-score   support

           0     0.9622    0.9815    0.9717      2488
           1     0.9813    0.9617    0.9714      2504

    accuracy                         0.9716      4992
   macro avg     0.9717    0.9716    0.9716      4992
weighted avg     0.9717    0.9716    0.9716      4992






The fine-tuned ResNet18 achieves high accuracy, precision, recall, and F1 score on our roughly balanced test set (2488 cats, 2504 dogs), so it generalizes well to held-out images from this dataset. Interestingly, the model has higher recall and lower precision for cats (class 0) than for dogs (class 1): it labeled 96 dog images as cats but only 46 cat images as dogs, so it is more likely to mistake a dog for a cat than the other way around. Regardless, this is a promising model for the task at hand.
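As a quick check on the reasoning above, the per-class precision and recall can be recomputed by hand from the confusion matrix. This is a minimal sketch in plain Python; the helper name `per_class_metrics` is made up for illustration, and it assumes class 0 is cat and class 1 is dog, with rows as true labels and columns as predictions (the convention scikit-learn uses).

```python
def per_class_metrics(conf_matrix):
    """Return {class_index: (precision, recall)} for a square confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    n = len(conf_matrix)
    metrics = {}
    for c in range(n):
        true_pos = conf_matrix[c][c]
        predicted_as_c = sum(conf_matrix[r][c] for r in range(n))  # column sum
        actually_c = sum(conf_matrix[c])                            # row sum
        metrics[c] = (true_pos / predicted_as_c, true_pos / actually_c)
    return metrics


# Confusion matrix reported for ResNet18 above
resnet_cm = [[2442, 46],
             [96, 2408]]

for cls, (precision, recall) in per_class_metrics(resnet_cm).items():
    print(f"class {cls}: precision={precision:.4f}, recall={recall:.4f}")
```

The off-diagonal entries drive the asymmetry: the 96 dogs predicted as cats lower cat precision and dog recall, while the 46 cats predicted as dogs lower cat recall and dog precision.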

Now, let's do the same thing for MobileNetV2.

### MobileNetV2

In [18]:
mobilenet = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)
mobilenet.classifier[1] = nn.Linear(mobilenet.classifier[1].in_features, 2)

mobilenet.to(device)

optimizer_mob = torch.optim.Adam(mobilenet.parameters(), lr=0.001)

Using cache found in C:\Users\neela/.cache\torch\hub\pytorch_vision_v0.10.0


In [19]:
epochs = 5
train_model(mobilenet, train_loader, device, epochs, optimizer_mob, criterion)

Epoch 1/5: 100%|██████████| 312/312 [02:47<00:00,  1.87it/s, loss=0.101] 
Epoch 2/5: 100%|██████████| 312/312 [02:50<00:00,  1.83it/s, loss=0.0687]
Epoch 3/5: 100%|██████████| 312/312 [02:41<00:00,  1.93it/s, loss=0.0571]
Epoch 4/5: 100%|██████████| 312/312 [02:46<00:00,  1.88it/s, loss=0.0448]
Epoch 5/5: 100%|██████████| 312/312 [02:49<00:00,  1.84it/s, loss=0.0402]


In [21]:
evaluate_model(mobilenet, test_loader, device)

Evaluating:   0%|          | 0/78 [00:29<?, ?it/s, accuracy=0.97, samples_processed=4992] 

Accuracy: 97.02%

Confusion Matrix:
 [[2362  126]
 [  23 2481]]

Classification Report:
               precision    recall  f1-score   support

           0     0.9904    0.9494    0.9694      2488
           1     0.9517    0.9908    0.9708      2504

    accuracy                         0.9702      4992
   macro avg     0.9710    0.9701    0.9701      4992
weighted avg     0.9710    0.9702    0.9701      4992






MobileNetV2, despite having only about 3.4 million parameters compared to ResNet18's roughly 11 million, performs similarly in accuracy and per-class F1 score. The error pattern is reversed, though: whereas our fine-tuned ResNet18 tended to misclassify dog images (96 dogs labeled as cats versus 46 cats labeled as dogs), MobileNetV2 misclassifies cat images much more often (126 cats labeled as dogs versus 23 dogs labeled as cats).

It's possible that this difference arises from architectural differences between the models. MobileNetV2 is designed to be lightweight, using depthwise separable convolutions to dramatically reduce computation while losing only a slight amount of accuracy. It could be that this, combined with fewer parameters, leads to less detailed image representations, so that cats (which potentially have fewer distinguishing features than dogs) are misclassified more often. ResNet18, with far more parameters and regular convolutions, might learn more detailed representations. Each model was trained only once here, however, so a deeper analysis of the dataset and of run-to-run variation would be required to make any definitive judgements.
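To make the depthwise separable convolution point concrete, here is a rough weight count for a single layer. This is an illustrative sketch, not a breakdown of either network: the 3x3 kernel and the 128 input/output channels are made-up values, biases are ignored, and the function names are hypothetical.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a regular convolution: one k x k filter per (input, output) channel pair."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels   # one k x k filter per input channel
    pointwise = in_channels * out_channels                # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration
k, c_in, c_out = 3, 128, 128
regular = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)

print(f"standard conv:       {regular} weights")
print(f"depthwise separable: {separable} weights")
print(f"reduction factor:    {regular / separable:.1f}x")
```

For this assumed layer, the separable version needs roughly 8x fewer weights (17,536 versus 147,456). The multiply-accumulate count shrinks by the same factor at a given output resolution, which is where the savings in computation come from.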