# Assignment Module 2: Pet Classification

The goal of this assignment is to implement a neural network that classifies images of 37 breeds of cats and dogs from the [Oxford-IIIT-Pet dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/). The assignment is divided into two parts: first, you will implement your own image-classification network from scratch; then, you will fine-tune a pretrained network provided by PyTorch.

## Dataset

The following cells contain the code to download and access the dataset you will be using in this assignment. Note that although this dataset contains every image from [Oxford-IIIT-Pet](https://www.robots.ox.ac.uk/~vgg/data/pets/), it uses a different train/val/test split from the one defined by the original authors.

In [None]:
!git clone https://github.com/CVLAB-Unibo/ipcv-assignment-2.git

In [2]:
from pathlib import Path
from PIL import Image
from torch import Tensor
from torch.utils.data import Dataset
from typing import List, Tuple

In [None]:
class OxfordPetDataset(Dataset):
    def __init__(self, split: str, transform=None) -> None:
        super().__init__()

        self.root = Path("ipcv-assignment-2") / "dataset"
        self.split = split
        self.names, self.labels = self._get_names_and_labels()
        self.transform = transform

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx) -> Tuple[Tensor, int]:
        img_path = self.root / "images" / f"{self.names[idx]}.jpg"
        img = Image.open(img_path).convert("RGB")
        label = self.labels[idx]
        
        if self.transform:
            img = self.transform(img)

        return img, label
    
    def get_num_classes(self) -> int:
        return max(self.labels) + 1

    def _get_names_and_labels(self) -> Tuple[List[str], List[int]]:
        names = []
        labels = []

        with open(self.root / "annotations" / f"{self.split}.txt") as f:
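            for line in f:
                line = line.strip()
                if not line:  # skip blank lines, e.g. a trailing newline at end of file
                    continue
                # Each line is "<image_name> <label>" with 1-based labels
                name, label = line.split(" ")
                names.append(name)
                labels.append(int(label) - 1)  # convert to 0-based class indices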

        return names, labels
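`_get_names_and_labels` assumes that each line of `annotations/<split>.txt` holds an image name and a 1-based class label separated by a single space, and it shifts labels to the 0-based indices that `nn.CrossEntropyLoss` expects. The standalone sketch below reproduces that parsing on made-up lines. `parse_annotation_lines` is a hypothetical helper, and the sample names are illustrative, not copied from the real annotation files.

```python
from typing import Iterable, List, Tuple


def parse_annotation_lines(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Parse '<name> <label>' lines into names and 0-based labels (hypothetical helper)."""
    names: List[str] = []
    labels: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. a trailing newline
            continue
        name, label = line.split(" ")
        names.append(name)
        labels.append(int(label) - 1)  # 1-based file labels -> 0-based class indices
    return names, labels


# Made-up example lines in the assumed "<name> <label>" format
sample = ["Abyssinian_1 1\n", "yorkshire_terrier_9 37\n", "\n"]
names, labels = parse_annotation_lines(sample)
print(names, labels)                    # ['Abyssinian_1', 'yorkshire_terrier_9'] [0, 36]
print("num classes:", max(labels) + 1)  # same rule as get_num_classes(): 37
```

Because `get_num_classes` returns `max(labels) + 1`, it reports 37 only if the highest-numbered class appears in the split being loaded. This holds for the training split, which is why the notebook calls it on `train_dataset`.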

## Part 1: design your own network

Your goal is to implement a convolutional neural network for image classification and train it from scratch on `OxfordPetDataset`. You should consider yourselves satisfied once you obtain a classification accuracy on the test split of ~60%. You are free to achieve this however you want, except for a few rules you must follow:

- Fill in this notebook so that it displays the results obtained by the best model you found during your experimentation; then show how its performance drops when some of its components are removed. In other words, run an *ablation study* to show that your design choices have a positive impact on the final result.

- Do not instantiate an off-the-shelf PyTorch network. Instead, build your network as a composition of existing PyTorch layers. Concretely, you may use e.g. `torch.nn.Linear`, but not e.g. `torchvision.models.alexnet`.

- Show your results and ablations with plots, tables, images, etc. — the clearer, the better.
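One way to keep the ablation study cheap is to build the network from flags, so that each ablated variant differs from the full model by exactly one switch. The sketch below is illustrative only. `SmallCNN`, its layer widths, and the choice of BatchNorm and dropout as the ablated components are assumptions, not a recommended architecture. It uses only standard layers (`nn.Conv2d`, `nn.BatchNorm2d`, `nn.Dropout`, ...), in line with the rules above.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Hypothetical CNN whose components can be switched off for an ablation study."""

    def __init__(self, num_classes: int = 37, use_batchnorm: bool = True,
                 dropout: float = 0.5, widths=(32, 64, 128)) -> None:
        super().__init__()
        layers = []
        in_ch = 3
        for out_ch in widths:
            # The conv bias is redundant when BatchNorm follows, so drop it in that case
            layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=not use_batchnorm))
            if use_batchnorm:
                layers.append(nn.BatchNorm2d(out_ch))
            layers.append(nn.ReLU(inplace=True))
            layers.append(nn.MaxPool2d(2))  # halve the spatial resolution
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # makes the head independent of input size
        head = [nn.Flatten()]
        if dropout > 0:
            head.append(nn.Dropout(dropout))
        head.append(nn.Linear(in_ch, num_classes))
        self.classifier = nn.Sequential(*head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.pool(self.features(x)))


# The full model plus one variant per removed component
ablations = {
    "full": dict(),
    "no_batchnorm": dict(use_batchnorm=False),
    "no_dropout": dict(dropout=0.0),
}
for name, kwargs in ablations.items():
    model = SmallCNN(**kwargs)
    out = model(torch.randn(2, 3, 224, 224))
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name:>13}: output {tuple(out.shape)}, {n_params} parameters")
```

Each variant would then be trained with identical data, hyperparameters, and number of epochs. Under those conditions, differences in test accuracy can reasonably be attributed to the removed component.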

Don't be too concerned with your model's performance: the ~60% is only meant to give you an idea of when to stop. Keep in mind that a thoroughly justified model with lower accuracy will be awarded more points than a poorly validated model with higher accuracy.

## Part 2: fine-tune an existing network

Your goal is to fine-tune a pretrained ResNet-18 model on `OxfordPetDataset`. Unlike in Part 1, use the implementation provided by PyTorch: specifically, the ResNet-18 model pretrained on ImageNet-1K (V1 weights). Divide your fine-tuning into two parts:

2A. First, fine-tune the ResNet-18 with the same training hyperparameters you used for your best model in Part 1.

2B. Then, tweak the training hyperparameters to increase the accuracy on the test split. Justify your choices by analyzing the training plots and/or citing the sources that guided your decisions (papers, blog posts, YouTube videos, or anything else you find useful). You should consider yourselves satisfied once you obtain a test-split classification accuracy of ~90%.
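When reasoning about the learning-rate schedules used for 2A and 2B in the code below, it can help to tabulate them. The sketch assumes, as `train_model` does, that `scheduler.step()` is called once at the end of every epoch. Under that assumption, both `StepLR(step_size=60, gamma=0.1)` and `MultiStepLR(milestones=[30, 60, 80], gamma=0.1)` use a rate of `base_lr * gamma**k` during 0-indexed epoch `e`, where `k` is the number of milestones less than or equal to `e`. For `StepLR` over 70 epochs, the only milestone is 60. `multistep_lr` is a hypothetical helper, not a PyTorch function.

```python
from bisect import bisect_right
from typing import List, Sequence


def multistep_lr(base_lr: float, milestones: Sequence[int], gamma: float, num_epochs: int) -> List[float]:
    """LR used in each 0-indexed epoch, assuming scheduler.step() runs once at the end of every epoch."""
    ms = sorted(milestones)
    # bisect_right counts how many milestones are <= epoch, i.e. how many decays have happened
    return [base_lr * gamma ** bisect_right(ms, epoch) for epoch in range(num_epochs)]


# Part 2A: StepLR(step_size=60, gamma=0.1) over 70 epochs == a single milestone at 60
lr_2a = multistep_lr(1e-3, [60], 0.1, 70)
# Part 2B: MultiStepLR(milestones=[30, 60, 80], gamma=0.1) over 100 epochs, one list per parameter group
lr_2b_head = multistep_lr(1e-3, [30, 60, 80], 0.1, 100)
lr_2b_backbone = multistep_lr(1e-4, [30, 60, 80], 0.1, 100)

for epoch in (0, 29, 30, 59, 60, 69, 80, 99):
    line = f"epoch {epoch:>2}: 2B head {lr_2b_head[epoch]:.0e}, 2B backbone {lr_2b_backbone[epoch]:.0e}"
    if epoch < len(lr_2a):
        line += f", 2A {lr_2a[epoch]:.0e}"
    print(line)
```

For 2B, the new `fc` head steps through 1e-3, 1e-4, 1e-5, and 1e-6, and the backbone steps from 1e-4 down to 1e-7, switching at epochs 30, 60, and 80. In contrast, 2A stays at 1e-3 for 60 of its 70 epochs. Plotting these lists next to the validation curves makes it easier to relate jumps in accuracy to LR decays.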

In [None]:
import pandas as pd
import numpy as np

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision.models as models
import torchvision.transforms as transforms
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from collections import defaultdict
import copy
import time

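# Use a GPU when one is available; fine-tuning ResNet-18 for 70-100 epochs on CPU is very slow
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")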
print(f"Using device: {device}")

transform_train = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

transform_test = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Create datasets
train_dataset = OxfordPetDataset('train', transform=transform_train)
val_dataset = OxfordPetDataset('val', transform=transform_test)
test_dataset = OxfordPetDataset('test', transform=transform_test)

# Create data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

num_classes = train_dataset.get_num_classes()
print(f"Number of classes: {num_classes}")
print(f"Train samples: {len(train_dataset)}")
print(f"Val samples: {len(val_dataset)}")
print(f"Test samples: {len(test_dataset)}")

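def create_resnet18_model(num_classes, pretrained=True):
    """Create a ResNet-18 whose final layer outputs `num_classes` logits."""
    # `pretrained=` is deprecated since torchvision 0.13; request the ImageNet-1K V1 weights explicitly
    weights = models.ResNet18_Weights.IMAGENET1K_V1 if pretrained else None
    model = models.resnet18(weights=weights)
    # Replace the 1000-way ImageNet classifier with a freshly initialized head
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model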

def train_model(model, train_loader, val_loader, criterion, optimizer, scheduler, num_epochs, model_name):
    """Train model and return training history"""
    model = model.to(device)
    
    # Training history
    history = {
        'train_loss': [],
        'train_acc': [],
        'val_loss': [],
        'val_acc': []
    }
    
    best_val_acc = 0.0
    best_model_wts = copy.deepcopy(model.state_dict())
    
    start_time = time.time()
    
    for epoch in range(num_epochs):
        print(f'Epoch {epoch+1}/{num_epochs}')
        print('-' * 10)
        
        # Training phase
        model.train()
        running_loss = 0.0
        running_corrects = 0
        
        for inputs, labels in train_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            optimizer.zero_grad()
            
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)
        
        # Normalize by the loader's own dataset rather than the global `train_dataset`
        epoch_loss = running_loss / len(train_loader.dataset)
        epoch_acc = running_corrects.double() / len(train_loader.dataset)
        
        history['train_loss'].append(epoch_loss)
        history['train_acc'].append(epoch_acc.item())
        
        # Validation phase
        model.eval()
        val_running_loss = 0.0
        val_running_corrects = 0
        
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)
                
                val_running_loss += loss.item() * inputs.size(0)
                val_running_corrects += torch.sum(preds == labels.data)
        
        val_epoch_loss = val_running_loss / len(val_loader.dataset)
        val_epoch_acc = val_running_corrects.double() / len(val_loader.dataset)
        
        history['val_loss'].append(val_epoch_loss)
        history['val_acc'].append(val_epoch_acc.item())
        
        print(f'Train Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
        print(f'Val Loss: {val_epoch_loss:.4f} Acc: {val_epoch_acc:.4f}')
        
        # Save best model
        if val_epoch_acc.item() > best_val_acc:
            best_val_acc = val_epoch_acc.item()  # keep a plain float rather than a device tensor
            best_model_wts = copy.deepcopy(model.state_dict())
            # Save checkpoint
            torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'best_val_acc': best_val_acc,
                'history': history
            }, f'best_{model_name}_checkpoint.pth')
        
        # Step scheduler
        if scheduler:
            scheduler.step()
        
        print()
    
    time_elapsed = time.time() - start_time
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_val_acc:.4f}')
    
    # Load best model weights
    model.load_state_dict(best_model_wts)
    return model, history, best_val_acc

def evaluate_model(model, test_loader, criterion):
    """Evaluate model on test set"""
    model.eval()
    test_running_loss = 0.0
    test_running_corrects = 0
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            
            test_running_loss += loss.item() * inputs.size(0)
            test_running_corrects += torch.sum(preds == labels.data)
            
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    
    test_loss = test_running_loss / len(test_loader.dataset)
    test_acc = test_running_corrects.double() / len(test_loader.dataset)
    
    return test_loss, test_acc.item(), all_preds, all_labels

def plot_training_history(history, title):
    """Plot training and validation curves"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Plot loss
    ax1.plot(history['train_loss'], label='Train Loss')
    ax1.plot(history['val_loss'], label='Val Loss')
    ax1.set_title(f'{title} - Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.legend()
    ax1.grid(True)
    
    # Plot accuracy
    ax2.plot(history['train_acc'], label='Train Accuracy')
    ax2.plot(history['val_acc'], label='Val Accuracy')
    ax2.set_title(f'{title} - Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.show()

def plot_confusion_matrix(y_true, y_pred, title):
    """Plot confusion matrix"""
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(12, 10))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title(f'{title} - Confusion Matrix')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()

# =============================================================================
# PART 2A: Fine-tune ResNet-18 with same hyperparameters as Part 1
# =============================================================================

print("="*50)
print("PART 2A: Fine-tuning with Part 1 hyperparameters")
print("="*50)

# Create model
model_2a = create_resnet18_model(num_classes, pretrained=True)

# Use same hyperparameters as Part 1
criterion = nn.CrossEntropyLoss()
optimizer_2a = optim.Adam(model_2a.parameters(), lr=0.001)

# Learning rate scheduler (reduce LR after some epochs like in Part 1)
scheduler_2a = optim.lr_scheduler.StepLR(optimizer_2a, step_size=60, gamma=0.1)

# Train model
model_2a, history_2a, best_val_acc_2a = train_model(
    model_2a, train_loader, val_loader, criterion, optimizer_2a, 
    scheduler_2a, num_epochs=70, model_name="resnet18_2a"
)

# Evaluate on test set
test_loss_2a, test_acc_2a, preds_2a, labels_2a = evaluate_model(model_2a, test_loader, criterion)
print(f"Part 2A Test Accuracy: {test_acc_2a:.4f}")

# Plot results
plot_training_history(history_2a, "Part 2A: ResNet-18 with Part 1 hyperparameters")

# =============================================================================
# PART 2B: Optimized fine-tuning for better performance
# =============================================================================

print("="*50)
print("PART 2B: Optimized fine-tuning")
print("="*50)

# Create new model
model_2b = create_resnet18_model(num_classes, pretrained=True)

# Optimized hyperparameters for fine-tuning
# Lower learning rate for pretrained features, higher for new classifier
optimizer_2b = optim.Adam([
    {'params': model_2b.fc.parameters(), 'lr': 0.001},  # New classifier layer
    {'params': [param for name, param in model_2b.named_parameters() 
                if 'fc' not in name], 'lr': 0.0001}  # Pretrained features
], weight_decay=1e-4)

# Decay all LRs by 10x at epochs 30, 60 and 80 (Part 2A decays only once, at epoch 60)
scheduler_2b = optim.lr_scheduler.MultiStepLR(optimizer_2b, milestones=[30, 60, 80], gamma=0.1)

# Train model
model_2b, history_2b, best_val_acc_2b = train_model(
    model_2b, train_loader, val_loader, criterion, optimizer_2b, 
    scheduler_2b, num_epochs=100, model_name="resnet18_2b"
)

# Evaluate on test set
test_loss_2b, test_acc_2b, preds_2b, labels_2b = evaluate_model(model_2b, test_loader, criterion)
print(f"Part 2B Test Accuracy: {test_acc_2b:.4f}")

# Plot results
plot_training_history(history_2b, "Part 2B: ResNet-18 with optimized hyperparameters")

# =============================================================================
# COMPARISON AND ANALYSIS
# =============================================================================

print("="*50)
print("RESULTS COMPARISON")
print("="*50)

results_comparison = {
    'Model': ['Part 2A (Same as Part 1)', 'Part 2B (Optimized)'],
    'Test Accuracy': [f'{test_acc_2a:.4f}', f'{test_acc_2b:.4f}'],
    'Best Val Accuracy': [f'{best_val_acc_2a:.4f}', f'{best_val_acc_2b:.4f}']
}

import pandas as pd
df_results = pd.DataFrame(results_comparison)
print(df_results.to_string(index=False))

# Plot comparison
fig, ax = plt.subplots(figsize=(10, 6))
model_labels = ['Part 2A', 'Part 2B']  # don't reuse `models`: it would shadow torchvision.models
test_accuracies = [test_acc_2a, test_acc_2b]
colors = ['skyblue', 'lightgreen']

bars = ax.bar(model_labels, test_accuracies, color=colors)
ax.set_ylabel('Test Accuracy')
ax.set_title('ResNet-18 Fine-tuning Results Comparison')
ax.set_ylim(0, 1)

# Add value labels on bars
for bar, acc in zip(bars, test_accuracies):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height + 0.01,
            f'{acc:.3f}', ha='center', va='bottom', fontweight='bold')

plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

# Confusion matrices for best model
if test_acc_2b > test_acc_2a:
    plot_confusion_matrix(labels_2b, preds_2b, "Part 2B: Best Model")
    print("\nDetailed classification report for Part 2B:")
    print(classification_report(labels_2b, preds_2b))
else:
    plot_confusion_matrix(labels_2a, preds_2a, "Part 2A: Best Model")
    print("\nDetailed classification report for Part 2A:")
    print(classification_report(labels_2a, preds_2a))

# =============================================================================
# HYPERPARAMETER JUSTIFICATION
# =============================================================================

print("="*50)
print("HYPERPARAMETER JUSTIFICATION FOR PART 2B")
print("="*50)

justification = """
1. DIFFERENTIAL LEARNING RATES:
   - Pretrained features: LR = 0.0001 (lower, to preserve the learned ImageNet features)
   - New classifier: LR = 0.001 (higher, to learn the task-specific mapping)
   - Rationale: the pretrained backbone only needs small adjustments, while the
     randomly initialized classifier must be learned from scratch

2. WEIGHT DECAY (L2 Regularization):
   - Added weight_decay=1e-4 to reduce overfitting
   - Particularly relevant for a small dataset such as Oxford-IIIT Pet

3. MULTI-STEP LEARNING RATE SCHEDULER:
   - Reduces all LRs by a factor of 0.1 at epochs 30, 60 and 80
   - More aggressive than the single decay at epoch 60 used in Part 2A
   - The first decay happens earlier, leaving more epochs of small-step refinement

4. INCREASED TRAINING EPOCHS:
   - 100 epochs vs 70 in Part 2A
   - The extra epochs give the lower-LR phases after each decay time to converge
   - Training runs for all epochs; the weights with the best validation accuracy are
     restored at the end (checkpoint selection, not true early stopping), which limits
     the impact of late-epoch overfitting

5. IMAGENET NORMALIZATION:
   - Used ImageNet mean/std for preprocessing (same as Part 2A, not a 2B-specific change)
   - Matches the input statistics the backbone saw during ImageNet pretraining,
     which the pretrained weights rely on

Expected improvements:
- Better generalization due to regularization
- More stable training with differential learning rates
- Possibly higher final accuracy through extended training
"""

print(justification)
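Note that `train_model` always runs for the full `num_epochs` and only restores the best-validation weights at the end; it never stops early. If actual early stopping is wanted (for example, to shorten the 100-epoch run in 2B), a patience-based check could be added to the epoch loop. The following is a minimal sketch. `EarlyStopping` and its `patience`/`min_delta` parameters are hypothetical names, not part of PyTorch, and the accuracies below are made-up numbers, not results from this notebook.

```python
class EarlyStopping:
    """Signal a stop once the monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta  # minimum increase that counts as an improvement
        self.best = float("-inf")
        self.epochs_without_improvement = 0

    def step(self, metric: float) -> bool:
        """Record this epoch's metric (higher is better); return True if training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Simulated validation accuracies (made-up numbers)
val_accs = [0.70, 0.80, 0.85, 0.86, 0.855, 0.858, 0.859, 0.857]
stopper = EarlyStopping(patience=3)
for epoch, acc in enumerate(val_accs):
    if stopper.step(acc):
        print(f"Stopping after epoch {epoch + 1}; best val acc {stopper.best:.3f}")
        break
```

Inside the epoch loop of `train_model`, one would call `stopper.step(val_epoch_acc.item())` after the validation phase and `break` when it returns `True`. The existing restoration of the best weights still applies after the loop.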