## Lecture 3: Full Evaluation, Augmentation, and Fine-Tuning

**Recap from Lecture 2 & Homework:**
* We successfully loaded the Oxford-IIIT Pet dataset with proper normalization and created DataLoaders.
* We loaded pre-trained MobileNetV3 and ResNet18 models and adapted their final layers for our 37-class problem.
* We implemented the "Feature Extraction" training strategy (training only the classifier head) and ran it for both MobileNetV3 (in lecture) and ResNet18 (in homework).
* We visualized the basic training/validation loss and accuracy curves.

**Goals for Today (Lecture 3):**
1.  *Evaluation:* Calculate Precision, Recall, F1-score, and Confusion Matrix.
2.  *Data Augmentation* to improve model generalization.
3.  *Deeper Fine-Tuning:* Unfreeze some pre-trained layers.
4.  *Differential Learning Rates* for the optimizer.
5.  Run a demonstration of fine-tuning with augmentation for one model.
6.  Visualize and evaluate the results more thoroughly.

### Comprehensive Evaluation Metrics

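Several metrics are useful for evaluating a multi-class classification model like ours. First, recall the following definitions; apart from accuracy, each is counted per class, treating that class as "positive" and all other classes as "negative":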
* **Accuracy:** The overall proportion of correct predictions across all classes. It is the simplest metric but can be misleading if the dataset is imbalanced.
* **True Positives (TP):** The number of instances correctly predicted as a specific class.
* **False Positives (FP):** The number of instances incorrectly predicted as a specific class (predicted as class A but actually class B).
* **True Negatives (TN):** The number of instances correctly predicted as not being a specific class.
* **False Negatives (FN):** The number of instances that were not predicted as a specific class but actually belong to that class.

Using these, we can derive several key metrics for each class:

* **Precision:** For a given class, what proportion of predictions for that class were actually correct? (TP / (TP + FP)). High precision means the model is trustworthy when it predicts that class.
* **Recall (Sensitivity):** For a given class, what proportion of true instances of that class did the model correctly identify? (TP / (TP + FN)). High recall means the model finds most instances of that class.
* **F1-Score:** The harmonic mean of Precision and Recall (2 * (Precision * Recall) / (Precision + Recall)). It provides a single score that balances both concerns. Across classes it is usually reported as a macro-average (unweighted mean of the per-class scores) or a weighted average (per-class scores weighted by class support, i.e. the number of true instances of each class).
* **Confusion Matrix:** A table showing the counts of true vs. predicted classes. The diagonal represents correct predictions, while off-diagonal elements show misclassifications (e.g., how many times 'Siamese' was predicted when the true label was 'Birman').
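
To make the definitions above concrete, here is a small worked sketch on a made-up 3-class problem (the labels are invented for illustration and have nothing to do with the Pet dataset). It derives TP, FP, and FN from the confusion matrix by hand and checks the result against scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Made-up labels for a toy 3-class problem
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

tp = np.diag(cm)              # correct predictions per class
fp = cm.sum(axis=0) - tp      # predicted as the class, but actually another class
fn = cm.sum(axis=1) - tp      # belongs to the class, but predicted as another class

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

macro_f1 = f1.mean()                                    # unweighted mean over classes
weighted_f1 = np.average(f1, weights=cm.sum(axis=1))    # weighted by class support

# Per-class values from scikit-learn for comparison (average=None is the default)
sk_p, sk_r, sk_f1, support = precision_recall_fscore_support(y_true, y_pred, zero_division=0)

print("Confusion matrix:\n", cm)
print("Precision per class:", precision.round(3))
print("Recall per class   :", recall.round(3))
print(f"Macro F1: {macro_f1:.4f}  Weighted F1: {weighted_f1:.4f}")
```

Note that the manual formulas divide by zero for a class that is never predicted; `zero_division=0` in scikit-learn handles that case explicitly.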

We will use the `scikit-learn` library to easily compute these metrics. First, we need our validation function to return the raw predictions and labels.

In [None]:
# Import libraries
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix, accuracy_score, classification_report
import numpy as np
import time # Keep time for consistency if needed later

# set device
if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    device = torch.device("mps")
else:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Fallback to CUDA then CPU
print(f"\nUsing device: {device}")

In [None]:
# Task: Modify the validation helper function to return predictions and labels

# We modify the previous 'validate' function to return lists of predictions and labels
def validate_and_collect_preds(model, dataloader, criterion, device):
    """Performs one epoch of validation and returns metrics AND predictions/labels."""
    model.eval() # Set model to evaluate mode
    running_loss = 0.0
    running_corrects = 0
    total_samples = 0

    all_preds = []
    all_labels = []

    # No gradients needed for validation
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, labels)
            _, preds = torch.max(outputs, 1)

            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels).item() # Accumulate as a plain Python int
            total_samples += inputs.size(0)

            # Collect predictions and labels for detailed metrics
            all_preds.extend(preds.cpu().numpy()) # Move preds to CPU and convert to numpy
            all_labels.extend(labels.cpu().numpy()) # Move labels to CPU and convert to numpy

    epoch_loss = running_loss / total_samples
    # Plain float division; avoids tensor.double(), which fails on MPS (no float64 support)
    epoch_acc = running_corrects / total_samples

    return epoch_loss, epoch_acc, all_preds, all_labels # Return collected lists

print("Modified validation function 'validate_and_collect_preds' defined.")
# We will use this function in our training loop later.

In [None]:
# Task: Define a function to calculate and print detailed metrics using scikit-learn

import pandas as pd # For better confusion matrix display
import seaborn as sns # For plotting confusion matrix
import matplotlib.pyplot as plt

def calculate_and_display_metrics(y_true, y_pred, class_names):
    """Calculates and displays accuracy, precision, recall, F1, and confusion matrix."""

    print("\n--- Detailed Validation Metrics ---")

    # Calculate overall accuracy
    accuracy = accuracy_score(y_true, y_pred)
    print(f"Overall Accuracy : {accuracy:.4f}")

    # Calculate precision, recall, F1-score (weighted average)
    # 'weighted' averages the per-class scores, weighting each class by its support (number of true samples)
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average='weighted', zero_division=0)
    print(f"Weighted Precision: {precision:.4f}")
    print(f"Weighted Recall  : {recall:.4f}")
    print(f"Weighted F1-Score: {f1:.4f}")

    # Optional: Print per-class metrics (can be long for 37 classes)
    # report = classification_report(y_true, y_pred, target_names=class_names, zero_division=0)
    # print("\nClassification Report:\n", report)

    # Calculate and display Confusion Matrix
    print("\nConfusion Matrix:")
    cm = confusion_matrix(y_true, y_pred)
    # Display using pandas for better formatting (optional: use seaborn for heatmap)
    cm_df = pd.DataFrame(cm, index=class_names, columns=class_names)

    # Print only if not too large, or consider saving to file / using heatmap
    if len(class_names) <= 40: # Heuristic to avoid overwhelming output
        # Print with adjusted display options
        with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.width', 1000):
            print(cm_df)
        # Optional: Plot heatmap for better visualization
        # plt.figure(figsize=(15, 12))
        # sns.heatmap(cm_df, annot=True, fmt='d', cmap='Blues')
        # plt.ylabel('Actual')
        # plt.xlabel('Predicted')
        # plt.title('Confusion Matrix Heatmap')
        # plt.show()
    else:
        print("(Confusion matrix is large, displaying as text might be truncated)")
        print(cm_df) # Still print, but acknowledge it might be hard to read

print("Function 'calculate_and_display_metrics' defined.")
# We will call this function after the validation epoch in the training loop.
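
With 37 classes, the full confusion matrix is hard to scan by eye. One possible way to summarize it is to list only the largest off-diagonal entries, i.e. the most frequent (true, predicted) mix-ups. The helper below, `top_confusions`, is a hypothetical name introduced here for illustration; it is not part of scikit-learn, and the labels in the usage example are made up.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def top_confusions(y_true, y_pred, class_names, k=5):
    """Return up to k (true_class, predicted_class, count) tuples, largest count first."""
    labels = list(range(len(class_names)))
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    # Zero the diagonal so only misclassifications remain
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)

    # Sort all cells by count (descending) and keep the first k
    flat_order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, off_diag.shape)

    return [
        (class_names[r], class_names[c], int(off_diag[r, c]))
        for r, c in zip(rows, cols)
        if off_diag[r, c] > 0
    ]

# Toy usage with invented predictions for three breeds
toy_names = ['Birman', 'Siamese', 'Ragdoll']
toy_true = [0, 0, 0, 0, 1, 1, 2, 2]
toy_pred = [1, 1, 0, 2, 1, 1, 2, 0]
worst_pair = top_confusions(toy_true, toy_pred, toy_names, k=1)
print(worst_pair)
```

In the training loop, this could be called with the `y_true`, `y_pred`, and `class_names` already passed to `calculate_and_display_metrics`.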

### Data Augmentation

Training deep learning models often requires large amounts of data to prevent overfitting and improve generalization. When our specific dataset is moderately sized (like the Pet dataset), **Data Augmentation** becomes crucial.

It involves applying random transformations to the training images during loading. This artificially expands the dataset, forcing the model to learn features that are robust to variations in appearance, position, lighting, etc.

Common Augmentation Techniques for Images:
* **Random Resized Crop:** Crops a random part of the image and resizes it. Helps model focus on different parts and handle scale variations.
* **Random Horizontal Flip:** Flips the image horizontally with a 50% probability. Useful for many object classes (like pets) where left-right orientation doesn't change the class.
* **Color Jitter:** Randomly changes brightness, contrast, saturation, and hue. Makes the model less sensitive to lighting conditions.
* **Random Rotation:** Rotates the image by a random angle.
* **Cutout / Random Erasing:** Randomly removes rectangular patches from the image, forcing the model to use more context.

**Important:** Augmentations are typically applied *only* to the training data. Validation and test data should use fixed transformations (like resize and normalize) for consistent evaluation.

In [None]:
# Task: Define transforms with augmentation and update datasets/dataloaders

# (Re-import necessary libraries)
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torchvision # Ensure torchvision is available

# Use the same ImageNet stats as before
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]
image_size = 224 # Standard size for these models

# --- Define NEW Transforms including Augmentation for Training ---
augmented_data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(image_size, scale=(0.8, 1.0)), # Crop a random portion and resize
        transforms.RandomHorizontalFlip(),   # Randomly flip images horizontally
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2), # Optional: Add color jitter
        transforms.RandomRotation(15), # Optional: Add random rotation
        transforms.ToTensor(),               # Convert to Tensor MUST be before Normalize
        transforms.Normalize(imagenet_mean, imagenet_std) # Normalize
    ]),
    # Validation transform remains simple: Resize, ToTensor, Normalize
    'val': transforms.Compose([
        transforms.Resize((image_size, image_size)), # Ensure consistent size
        transforms.ToTensor(),
        transforms.Normalize(imagenet_mean, imagenet_std)
    ]),
}
print("Defined data transforms with augmentation for training.")

# --- Reload Datasets and Create DataLoaders with Augmented Transforms ---
data_root = './data'
print(f"\nReloading datasets from {data_root} with new transforms...")

# Apply augmented transforms to the training dataset
train_dataset_aug = torchvision.datasets.OxfordIIITPet(
    root=data_root,
    split='trainval',
    download=False, # Should be downloaded already
    transform=augmented_data_transforms['train'] # Use the augmented transforms
)

# Validation dataset uses non-augmented transforms
val_dataset_aug = torchvision.datasets.OxfordIIITPet(
    root=data_root,
    split='test',
    download=False,
    transform=augmented_data_transforms['val'] # Use the simple validation transforms
)

print(f"Augmented training dataset size: {len(train_dataset_aug)}")
print(f"Validation dataset size: {len(val_dataset_aug)}")
num_classes = len(train_dataset_aug.classes) # Should still be 37
class_names = train_dataset_aug.classes # Get class names for metric display later

In [None]:
# Recreate DataLoaders
batch_size = 32 # Keep consistent or adjust if needed
dataloaders_aug = {
    'train': DataLoader(train_dataset_aug, batch_size=batch_size, shuffle=True, num_workers=2),
    'val': DataLoader(val_dataset_aug, batch_size=batch_size, shuffle=False, num_workers=2)
}
dataset_sizes_aug = {'train': len(train_dataset_aug), 'val': len(val_dataset_aug)}

print(f"\nDataLoaders recreated with augmented training data.")
# Now we'll use 'dataloaders_aug' for the fine-tuning training run.

### Fine-Tuning Strategy: Unfreezing Layers

Feature extraction (training only the head) gave us a good baseline. Now, we'll perform fine-tuning to potentially improve performance further.

The idea is to unfreeze some of the later layers in the pre-trained base model and train them alongside the classifier head. These later layers often capture more complex and potentially dataset-specific features compared to the very general features (edges, textures) learned in early layers.

**Which layers to unfreeze?**
* It's common practice to unfreeze only the last few blocks or layers. Keeping the earliest layers frozen retains their general feature knowledge and helps prevent "catastrophic forgetting", where training on new data overwrites the pre-trained weights so heavily that previously learned features are lost.
* For ResNet18, unfreezing `layer4` (the last residual block) is a typical starting point.
* For MobileNetV3, the structure is different (inverted residual blocks). We might unfreeze the last few blocks within its `features` attribute.

Let's choose one model and demonstrate unfreezing. The cell below uses ResNet18; a commented-out MobileNetV3 variant is included if you prefer to stay consistent with the lecture.

In [None]:
# Task: Unfreeze later layers of a chosen model

# Let's reload a fresh pre-trained model instance for fine-tuning
model_to_finetune = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
model_name = "ResNet18_FineTune"

print(f"Reloaded {model_name} for fine-tuning.")

# First, freeze all layers (as done initially for feature extraction)
for param in model_to_finetune.parameters():
    param.requires_grad = False

# Replace the classifier head (must be done again on the fresh model instance)
num_ftrs = model_to_finetune.fc.in_features
model_to_finetune.fc = nn.Linear(num_ftrs, num_classes)
print("Replaced ResNet18 classifier head (fc layer).")
# Unfreeze parameters in the final residual block (layer4) and the fc layer
print("  Unfreezing ResNet18 layer4...")
for param in model_to_finetune.layer4.parameters():
    param.requires_grad = True
# Ensure the new fc layer is trainable (it is by default)
for param in model_to_finetune.fc.parameters():
    param.requires_grad = True # This should already be true for the new layer

# # Uncomment the following code to fine-tune MobileNetV3 instead
# model_to_finetune = torchvision.models.mobilenet_v3_large(weights=torchvision.models.MobileNet_V3_Large_Weights.IMAGENET1K_V1)
# model_name = "MobileNetV3_FineTune"
# # Freeze everything first: the freeze loop above ran on the ResNet18 instance, not on this one
# for param in model_to_finetune.parameters():
#     param.requires_grad = False
# num_ftrs = model_to_finetune.classifier[-1].in_features
# model_to_finetune.classifier[-1] = nn.Linear(num_ftrs, num_classes)
# print("Replaced MobileNetV3 classifier head.")
# # Unfreeze parameters in the last few blocks of MobileNetV3's features
# # Example: Unfreeze from index 13 onwards (MobileNetV3-Large 'features' holds 17 modules, indices 0-16)
# for i, block in enumerate(model_to_finetune.features):
#     if i >= 13:
#         print(f"  Unfreezing MobileNetV3 features block {i}")
#         for param in block.parameters():
#             param.requires_grad = True
# # Also ensure the classifier itself is trainable
# for param in model_to_finetune.classifier.parameters():
#     param.requires_grad = True


# --- Verify trainable parameters again ---
print(f"\nTrainable parameters in {model_name}:")
total_params = 0
trainable_params = 0
for name, param in model_to_finetune.named_parameters():
    total_params += param.numel()
    if param.requires_grad:
        # print(f"  - {name}") # Uncomment to see all trainable param names (can be long)
        trainable_params += param.numel() 

print(f"\nTotal parameters in {model_name}: {total_params:,}")
print(f"Trainable parameters (unfrozen base + head): {trainable_params:,}")
# This number should be significantly larger than for feature extraction.

### Differential Learning Rates

When fine-tuning, the classifier head is new: it is either randomly initialized or carried over from the feature extraction phase, so it still has the most to learn about our task. In contrast, the unfrozen pre-trained layers already contain useful knowledge that we only want to adapt slightly.

If we use the same learning rate for both, a large learning rate might destroy the pre-trained features in the base layers ("catastrophic forgetting"), while a small learning rate might cause the new head to learn too slowly.

The solution is **Differential Learning Rates**:
* Use a **small** learning rate (e.g., `1e-5` or `1e-4`) for the parameters in the unfrozen pre-trained layers.
* Use a **larger** learning rate (e.g., `1e-3` or `1e-4`) for the parameters in the newly added classifier head.

We achieve this in PyTorch by passing different parameter groups (each with its own learning rate) to the optimizer.

In [None]:
# Task: Define the optimizer with differential learning rates

# We need to separate parameters into two groups:
# 1. Parameters of the unfrozen base layers.
# 2. Parameters of the classifier head.

# Assume 'model_to_finetune' is the model instance from the previous cell
import torch.optim as optim # Needed for optim.Adam below; not imported earlier in this notebook

base_params = []
head_params = []

# Identify parameters based on their names or module hierarchy
print("Separating parameters for differential learning rates...")
for name, param in model_to_finetune.named_parameters():
    if param.requires_grad:
        if model_name == "MobileNetV3_FineTune" and name.startswith("classifier"):
            head_params.append(param)
            print(f"  - Head Param: {name}")
        elif model_name == "ResNet18_FineTune" and name.startswith("fc"):
            head_params.append(param)
            print(f"  - Head Param: {name}")
        else:
            base_params.append(param)
            # print(f"  - Base Param: {name}") # Uncomment to see base params

# Define different learning rates
lr_base = 1e-5 # Very small LR for pre-trained layers
lr_head = 1e-4 # Larger LR for the new head (adjust as needed)

print(f"\nLearning Rates - Base: {lr_base}, Head: {lr_head}")

# Create parameter groups for the optimizer
param_groups = [
    {'params': base_params, 'lr': lr_base},
    {'params': head_params, 'lr': lr_head}
]

# Define the optimizer (Adam is common, SGD with momentum also works)
optimizer_finetune = optim.Adam(param_groups)
# Alternatively: optimizer_finetune = optim.SGD(param_groups, momentum=0.9)

print("\nOptimizer created with differential learning rates.")
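
When parameters are split by name, it is easy to leave one out of the optimizer by accident. A small sanity check, sketched here on a toy two-layer model rather than the real network, confirms that every trainable parameter lands in exactly one group. `check_param_groups` is a hypothetical helper introduced for illustration.

```python
import torch.nn as nn
import torch.optim as optim

def check_param_groups(model, optimizer):
    """Verify the optimizer covers every trainable parameter exactly once.

    Returns a list of (learning_rate, parameter_count) per group.
    """
    trainable_ids = {id(p) for p in model.parameters() if p.requires_grad}
    grouped_ids = [id(p) for group in optimizer.param_groups for p in group['params']]

    assert len(grouped_ids) == len(set(grouped_ids)), "a parameter appears in more than one group"
    assert set(grouped_ids) == trainable_ids, "optimizer groups and trainable parameters differ"

    return [
        (group['lr'], sum(p.numel() for p in group['params']))
        for group in optimizer.param_groups
    ]

# Toy model: first Linear plays the role of the "base", last Linear the "head"
toy_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))
toy_optimizer = optim.Adam([
    {'params': toy_model[0].parameters(), 'lr': 1e-5},  # small LR for the "base"
    {'params': toy_model[2].parameters(), 'lr': 1e-4},  # larger LR for the "head"
])

group_summary = check_param_groups(toy_model, toy_optimizer)
for lr, count in group_summary:
    print(f"lr={lr:g}: {count} parameters")
```

The same check can be run as `check_param_groups(model_to_finetune, optimizer_finetune)` after the cell above.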

In [None]:
# Task: Define the loss function
criterion = nn.CrossEntropyLoss()
print("Loss function (CrossEntropyLoss) defined.")
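
The training loop below expects a `train_one_epoch` helper from Lecture 2, which is not reproduced in this notebook. Here is a minimal sketch of what such a helper might look like, assuming the same call signature and a `(loss, accuracy)` return value. The Lecture 2 version may differ in detail, and the toy data at the end is random and only serves as a smoke test.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, dataloader, criterion, optimizer, device):
    """Run one pass over the training data; return (mean loss, accuracy) as floats."""
    model.train()  # enable dropout / batch-norm updates
    running_loss, running_corrects, total_samples = 0.0, 0, 0

    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()                    # backpropagate
        optimizer.step()                   # update trainable parameters

        preds = outputs.argmax(dim=1)
        running_loss += loss.item() * inputs.size(0)
        running_corrects += (preds == labels).sum().item()
        total_samples += inputs.size(0)

    return running_loss / total_samples, running_corrects / total_samples

# Smoke test on random data (not the Pet dataset)
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(64, 10), torch.randint(0, 3, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)
toy_net = nn.Linear(10, 3)
toy_sgd = torch.optim.SGD(toy_net.parameters(), lr=0.1)

toy_loss, toy_acc = train_one_epoch(toy_net, toy_loader, nn.CrossEntropyLoss(), toy_sgd, torch.device("cpu"))
print(f"Toy epoch: loss={toy_loss:.4f}, acc={toy_acc:.4f}")
```

Returning plain Python floats keeps the history lists ready for plotting without any `.item()` conversion.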

In [None]:
# Task: Implement the fine-tuning training loop

# Use the model with unfrozen layers
model = model_to_finetune.to(device) # Ensure model is on the correct device

# Use the optimizer with differential LRs
optimizer = optimizer_finetune

# Use the dataloaders with augmentation
dataloaders = dataloaders_aug # Use the augmented dataloaders defined earlier
dataset_sizes = dataset_sizes_aug

# Use the validation function that returns preds/labels
validate_func = validate_and_collect_preds

# --- Training Loop ---
num_epochs_finetune = 10 # Train for more epochs for fine-tuning (adjust as needed)
print(f"\nStarting Fine-Tuning training for {num_epochs_finetune} epochs...")
print(f"Using model: {model_name}")

start_time_finetune = time.time()

# Lists to store metrics for plotting
ft_train_losses = []
ft_val_losses = []
ft_train_accs = []
ft_val_accs = []

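# Variables to track best model weights
best_val_acc = 0.0
# Clone the tensors: state_dict() returns references that keep changing as training continues
best_model_wts = {k: v.clone() for k, v in model.state_dict().items()}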

for epoch in range(num_epochs_finetune):
    print(f"\nEpoch {epoch+1}/{num_epochs_finetune}")
    print('-' * 10)

    # --- Training Phase ---
    model.train() # Set model to training mode
    # (We can reuse train_one_epoch function from L2 if it's defined in the notebook)
    # If not defined earlier in this L3 notebook, copy it here or re-run L2's Cell 7
    # Assuming train_one_epoch is available:
    try:
        train_loss, train_acc = train_one_epoch(model, dataloaders['train'], criterion, optimizer, device)
        print(f"Train Loss: {train_loss:.4f} Acc: {train_acc:.4f}")
    except NameError:
        print("Error: 'train_one_epoch' function not defined. Please define or run the cell from L2.")
        break # Stop if function is missing

    # Store training metrics
    ft_train_losses.append(train_loss)
    # Ensure accuracy value is scalar before appending
    ft_train_accs.append(train_acc.item() if isinstance(train_acc, torch.Tensor) else train_acc)


    # --- Validation Phase ---
    model.eval() # Set model to evaluate mode
    # Use the validation function that returns predictions and labels
    val_loss, val_acc, y_pred, y_true = validate_func(model, dataloaders['val'], criterion, device)
    print(f"Val Loss: {val_loss:.4f} Acc: {val_acc:.4f}")

    # Store validation metrics
    ft_val_losses.append(val_loss)
     # Ensure accuracy value is scalar before appending
    current_val_acc = val_acc.item() if isinstance(val_acc, torch.Tensor) else val_acc
    ft_val_accs.append(current_val_acc)

    # --- Calculate and display detailed metrics for this epoch ---
    # (Assuming calculate_and_display_metrics function and class_names list are available)
    calculate_and_display_metrics(y_true, y_pred, class_names)

    # --- Checkpointing: Save best model weights ---
    if current_val_acc > best_val_acc:
        print(f"Validation accuracy improved ({best_val_acc:.4f} --> {current_val_acc:.4f}). Saving model weights...")
        best_val_acc = current_val_acc
        best_model_wts = {k: v.clone() for k, v in model.state_dict().items()} # Snapshot a copy, not live references
        # Optional: Save weights to a file
        # torch.save(model.state_dict(), f'{model_name}_best_weights.pth')


# --- End of Training Loop ---
end_time_finetune = time.time()
finetune_duration = end_time_finetune - start_time_finetune
print(f"\nFine-tuning finished in {finetune_duration // 60:.0f}m {finetune_duration % 60:.0f}s")
print(f"Best validation accuracy achieved: {best_val_acc:.4f}")

# Optional: Load best model weights back into the model
# model.load_state_dict(best_model_wts)

# --- The variables ft_train_losses, ft_val_losses, etc. now hold the history ---
print("\nFine-tuning metric history collected.")

In [None]:
# Task: Visualize the fine-tuning training history

# (Reuse the plotting code, just adapt variable names if needed)
import matplotlib.pyplot as plt

# Use the number of epochs run for fine-tuning
epochs_ran = len(ft_train_losses)
epochs_range_ft = range(1, epochs_ran + 1)

plt.figure(figsize=(12, 5))

# Plot Training & Validation Loss
plt.subplot(1, 2, 1)
plt.plot(epochs_range_ft, ft_train_losses, 'o-', label='Training Loss')
plt.plot(epochs_range_ft, ft_val_losses, 'o-', label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title(f'{model_name} - Training and Validation Loss')
plt.legend()
plt.grid(True)

# Plot Training & Validation Accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs_range_ft, ft_train_accs, 'o-', label='Training Accuracy')
plt.plot(epochs_range_ft, ft_val_accs, 'o-', label='Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.ylim(0, 1) # Set y-axis limits for accuracy between 0 and 1
plt.title(f'{model_name} - Training and Validation Accuracy')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

### Interpreting Fine-Tuning Results

After running the fine-tuning process, compare the results to the Feature Extraction baseline from Lecture 2:

* **Performance:** Did the validation accuracy improve compared to feature extraction? Often, fine-tuning provides a boost, especially with sufficient data and augmentation.
* **Curves:** Observe the loss and accuracy curves. Does the gap between training and validation curves suggest overfitting? Data augmentation should help mitigate this. Because the pre-trained layers are updated with small learning rates, fine-tuning may also need more epochs to converge than feature extraction did.
* **Detailed Metrics:** Look at the precision, recall, F1-scores, and confusion matrix. Did fine-tuning help improve performance on specific classes that were previously difficult? The confusion matrix is key here.
* **Training Time:** Fine-tuning involves more gradient calculations, so each epoch will typically take longer than in the feature extraction phase.

### Homework: Fine-Tuning the Second Model (to be completed by Monday, May 5th)

**Goal:** This homework directs you to consolidate the code and concepts from Lectures 1, 2, and 3, and your previous homeworks, into a structured project format. You will implement the full fine-tuning pipeline for both required models (MobileNetV3 and ResNet18), incorporate early stopping, run extended training, and generate the core results needed for your final project write-up.

Organize your work into two separate Jupyter Notebooks, and update the project repo with these:

1.  `Dataset_Exploration.ipynb`
2.  `Model_FineTuning.ipynb`

---

**Instructions for `dataset_Exploration.ipynb`:**

* **Purpose:** This notebook should focus entirely on loading, understanding, preparing, and visualizing the Oxford-IIIT Pet dataset.
* **Content:**
    * Include code and explanations for loading the dataset using `torchvision.datasets.OxfordIIITPet`.
    * Implement and explain the necessary data transformations:
        * Basic transforms (Resize, ToTensor).
        * ImageNet Normalization (explain why it's needed).
        * Data Augmentation for the training set (e.g., RandomResizedCrop, RandomHorizontalFlip, etc. - explain your choices).
    * Include code to create `DataLoader` instances for training and validation sets using the appropriate transforms.
    * Add the dataset exploration code from Homework 1: Calculate and display per-breed image counts and visualize sample images from several different breeds.
* **Outcome:** This notebook should be runnable and clearly demonstrate how the data is prepared for training.

---

**Instructions for `Model_FineTuning.ipynb`:**

* **Purpose:** This notebook will handle model loading, adaptation, the complete fine-tuning process (with early stopping), evaluation, and results visualization for *both* MobileNetV3 and ResNet18.
* **Content - Part 1: Setup:**
    * Import necessary libraries (torch, torchvision, optim, nn, sklearn.metrics, matplotlib, etc.).
    * Define helper functions:
        * Include the `validate_and_collect_preds` function (from L3) to get predictions/labels during validation.
        * Include the `calculate_and_display_metrics` function (from L3) for detailed evaluation.
        * You might want to refactor the `train_one_epoch` function (from L2) if helpful.
    * Load *both* pre-trained MobileNetV3 (`MobileNet_V3_Large_Weights`) and ResNet18 (`ResNet18_Weights`) models.
    * For *each* model: Adapt the classifier head for 37 classes and freeze/unfreeze layers according to the fine-tuning strategy discussed in Lecture 3 (e.g., unfreeze last few blocks of MobileNetV3, unfreeze `layer4` + `fc` of ResNet18). Keep track of both adapted model instances.
    * Define the loss function (`nn.CrossEntropyLoss`).

* **Content - Part 2: Training Loop with Early Stopping:**
    * Design a training loop function or structure that can train a given model for a specified number of epochs.
    * **Implement Early Stopping:**
        * Add parameters for `patience` (e.g., `patience = 5`) – how many epochs to wait for improvement before stopping.
        * Inside the loop, track the best validation accuracy achieved so far (`best_val_acc`).
        * Keep a counter for epochs since the last improvement (`epochs_no_improve`).
        * After each validation epoch:
            * If `current_val_acc > best_val_acc`: Update `best_val_acc`, save the `model.state_dict()` (best weights), reset `epochs_no_improve` to 0.
            * Else: Increment `epochs_no_improve`.
            * If `epochs_no_improve >= patience`: Print an "Early stopping" message and break the loop.
        * After the loop finishes (either by completing all epochs or early stopping), make sure to load the *best saved weights* back into the model (`model.load_state_dict(best_weights)`).
    * Ensure the loop stores history (losses, accuracies) for plotting.
    * Ensure the loop calls `validate_and_collect_preds` and `calculate_and_display_metrics` *after* the validation step for the current epoch (and potentially again after loading best weights at the end).

* **Content - Part 3: Execution and Comparison:**
    * Define the optimizer setup (using differential learning rates) specifically for MobileNetV3 (as shown in L3).
    * Run the training loop (with early stopping, `max_epochs=20`, `patience=5`) for the adapted MobileNetV3 model. Store/display its best validation accuracy, final metrics (P, R, F1, Confusion Matrix) corresponding to the best epoch, and plot its training/validation curves.
    * Define the optimizer setup (using differential learning rates) specifically for ResNet18 (as shown in L3).
    * Run the training loop (with early stopping, `max_epochs=20`, `patience=5`) for the adapted ResNet18 model. Store/display its best validation accuracy, final metrics corresponding to the best epoch, and plot its training/validation curves.
    * Add a final Markdown cell briefly summarizing the best validation accuracy achieved by each model.
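
The early-stopping bookkeeping described in Part 2 can be tried out in isolation before it is wired into a real training loop. The sketch below is one possible shape for it, not a required design. `EarlyStopper` is a hypothetical class name, and the validation accuracies are invented to show when stopping triggers.

```python
class EarlyStopper:
    """Track the best validation accuracy and signal when patience runs out."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_val_acc = 0.0
        self.epochs_no_improve = 0

    def step(self, current_val_acc):
        """Record one epoch's result. Returns (improved, should_stop)."""
        if current_val_acc > self.best_val_acc:
            self.best_val_acc = current_val_acc
            self.epochs_no_improve = 0
            return True, False
        self.epochs_no_improve += 1
        return False, self.epochs_no_improve >= self.patience

# Made-up validation accuracies, one per epoch
val_history = [0.70, 0.74, 0.73, 0.74, 0.72, 0.71]

stopper = EarlyStopper(patience=3)
stopped_epoch = None
for epoch, val_acc in enumerate(val_history, start=1):
    improved, should_stop = stopper.step(val_acc)
    if improved:
        # In a real loop: save a copy of model.state_dict() here
        print(f"Epoch {epoch}: new best {val_acc:.2f}")
    if should_stop:
        print(f"Early stopping at epoch {epoch}")
        stopped_epoch = epoch
        break

print(f"Best validation accuracy: {stopper.best_val_acc:.2f}")
```

Note that a tie (epoch 4 here) does not count as an improvement, which matches the strict `current_val_acc > best_val_acc` comparison in the instructions.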

---

**Deliverables:**

Submit a zip file containing your two completed Jupyter Notebooks:
1.  `Dataset_Exploration.ipynb`
2.  `Model_FineTuning.ipynb`

Ensure both notebooks run without errors and generate the required outputs (printouts, plots, metrics).

---

**Does this homework provide all material for the final write-up?**

Upon completing this homework, you should have:
* The fully implemented and documented code for data preparation and model fine-tuning.
* The core quantitative results (best accuracies, detailed metrics, loss/accuracy plots) for both models.
* Saved best model weights (implicitly, via the early stopping logic).

The remaining part of the final project write-up would involve:
* Writing the textual explanations and interpretations of your methods and results.
* Structuring everything into a final report or comprehensive `README.md`.
* Making sure your GitHub repository is well-organized and includes all necessary files.