In [None]:
# DS776 Auto-Update (runs in ~2 seconds, only updates when needed)
# If this cell fails, see Lessons/Course_Tools/AUTO_UPDATE_SYSTEM.md for help
%run ../../Lessons/Course_Tools/auto_update_introdl.py

# Homework 05 Assignment
**Name:** [Student Name Here]  
**Total Points:** 50

## Submission Checklist
- [ ] All code cells executed with output saved
- [ ] All questions answered
- [ ] Notebook converted to HTML (use the Homework_05_Utilities notebook)
- [ ] Canvas notebook filename includes `_GRADE_THIS_ONE`
- [ ] Files uploaded to Canvas

---

# Transfer Learning

In this homework you'll experiment with transfer learning for fine-grained classification using the Flowers102 dataset in `torchvision.datasets`. Fine-grained classification means distinguishing among many classes that are similar to one another, such as related species of flowers. Another example is distinguishing breeds of dogs, as opposed to distinguishing cats, dogs, and foxes.

Note: we were able to train all the models described in this homework in about 40 minutes on the T4 Compute Server. The ConvNeXt model was the biggest and took the most time.

In [None]:
# === YOUR IMPORTS HERE ===
# Add any additional imports you need below this line

from introdl.utils import config_paths_keys

# Configure paths
paths = config_paths_keys()
DATA_PATH = paths['DATA_PATH']
MODELS_PATH = paths['MODELS_PATH']
# === END YOUR IMPORTS ===

## [5 pts] Data Exploration

First, let's explore the Flowers102 dataset to understand what we're working with. Load the dataset, examine the number of classes, display some sample images with their labels, and analyze the dataset size and structure.
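
For the class-distribution part, one possible starting point is sketched below. It summarizes a list of integer labels with `collections.Counter`. The helper name `summarize_labels` and the toy labels are hypothetical; with the real dataset you would first collect the labels from each split, for example by taking the second item of each `(image, label)` pair.

```python
from collections import Counter


def summarize_labels(labels):
    """Return basic class-distribution statistics for a list of integer labels."""
    counts = Counter(labels)
    per_class = sorted(counts.values())
    return {
        "num_samples": len(labels),
        "num_classes": len(counts),
        "min_per_class": per_class[0],
        "max_per_class": per_class[-1],
    }


# Toy stand-in for labels collected from one split (not real Flowers102 data)
toy_labels = [0, 0, 1, 1, 1, 2]
summary = summarize_labels(toy_labels)
print(summary)
```

Comparing `min_per_class` and `max_per_class` across splits is a quick way to see whether a split is balanced.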


In [None]:
# === YOUR CODE HERE ===
# TODO: Load the Flowers102 dataset and explore its structure
# - Load training, validation, and test splits
# - Print dataset sizes and number of classes
# - Display sample images with their class labels
# - Analyze the class distribution


# === END YOUR CODE ===

## [5 pts] Augmentation and DataLoaders

Build your transforms for training. Remember that the transforms for validation and testing shouldn't add any augmentation. The transformed images should be $224 \times 224$, since our pretrained models were trained on ImageNet with images of that size. We used `batch_size = 32` on the T4 Compute Servers. For normalization, use the ImageNet statistics, since the pretrained models we are using expect that normalization.

In [None]:
# === YOUR CODE HERE ===
# TODO: Create data transforms and DataLoaders
# - Create training transforms with augmentation (appropriate for fine-grained classification)
# - Create validation/test transforms without augmentation
# - Use ImageNet normalization statistics
# - Create DataLoaders with batch_size=32


# === END YOUR CODE ===

## [5 pts] ResNet50

The ResNet models establish good baselines for results.

Build a custom model class for ResNet50 (AI may be helpful here) with an adjustable number of output classes. It should have methods to freeze and unfreeze the backbone. Apply transfer learning by instantiating your model with the default ImageNet weights, training with the backbone frozen for 5 epochs, and then training with the backbone unfrozen for a suitable number of epochs (you may need to experiment). Include graphics or display dataframes to show how the model is converging (at least for the unfrozen training).

Use the training and validation sets here. The test set will be reserved for your final best model.

What kind of validation accuracy are you able to achieve? Is the model overfitting?

Note: the training dataset is already small, so downsampling it to speed up experimentation isn't a good idea. You could, however, temporarily reduce the image size to, say, 128x128 in your transforms to get things working, then go back to 224x224 to train your models. All final results should use 224x224 images.

In [None]:
# === YOUR CODE HERE ===
# TODO: Create a custom ResNet50 model class
# - Load pretrained ResNet50 with ImageNet weights
# - Replace final classifier layer for 102 flower classes
# - Add methods to freeze/unfreeze backbone weights
# - Train with frozen backbone for 5 epochs, then unfreeze and continue training


# === END YOUR CODE ===

📝 **What validation accuracy did you achieve? Is the model overfitting?**

## [5 pts] EfficientNet V2 Small

EfficientNet models are a modern upgrade to traditional convolutional neural networks, offering improved performance and efficiency. Repeat what you did for ResNet50 with EfficientNet V2 Small. Use AI to search for how to load it in torchvision and how to adapt it in your custom model class.

In [None]:
# === YOUR CODE HERE ===
# TODO: Create EfficientNet V2 Small model
# - Load pretrained EfficientNet V2 Small with ImageNet weights
# - Adapt the model for 102 flower classes
# - Apply the same two-phase training approach as ResNet50


# === END YOUR CODE ===

## [5 pts] ConvNeXt Small

ConvNeXt models are a family of convolutional neural networks that modernize the design of traditional CNNs by incorporating elements from vision transformers. They provide a strong performance baseline for various computer vision tasks. Use transfer learning to train a ConvNeXt Small (not Tiny) model on Flowers102.

In [None]:
# === YOUR CODE HERE ===
# TODO: Create ConvNeXt Small model
# - Load pretrained ConvNeXt Small with ImageNet weights
# - Adapt the model for 102 flower classes
# - Apply the same two-phase training approach as previous models


# === END YOUR CODE ===

## [5 pts] ViT Small

Vision Transformers (ViTs) are a type of neural network architecture that applies the transformer model, originally designed for natural language processing, to image data. Unlike Convolutional Neural Networks (CNNs), which use convolutional layers to capture spatial hierarchies, ViTs divide images into patches and process them as sequences, which lets them model global context. ViTs typically require more data than CNNs to train from scratch, but they can be used effectively for transfer learning on smaller datasets if the images are similar to those in the ImageNet dataset. We'll learn more about transformer models in the second half of the course.
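
To make the "patches as a sequence" idea concrete, the NumPy sketch below splits one $224 \times 224$ RGB image into non-overlapping $16 \times 16$ patches, the patch size suggested by the model name `vit_small_patch16_224`. It only illustrates the reshaping; a real ViT follows this with a learned linear projection and position embeddings, which are omitted here.

```python
import numpy as np

patch = 16
img = np.random.rand(3, 224, 224).astype(np.float32)  # (channels, height, width)

channels, height, width = img.shape
grid_h, grid_w = height // patch, width // patch  # 14 x 14 grid of patches

# Split height and width into (grid, patch) pairs, then flatten each patch
patches = (
    img.reshape(channels, grid_h, patch, grid_w, patch)
       .transpose(1, 3, 0, 2, 4)  # (grid_h, grid_w, channels, patch, patch)
       .reshape(grid_h * grid_w, channels * patch * patch)
)
print(patches.shape)  # (196, 768): a sequence of 196 patch vectors
```

Each row of `patches` plays the role that a token plays in a language model.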

We'll use the timm library, which doesn't seem to be installed in CoCalc. You can install it with the following command:
```python
!pip install timm
```
Then, load the pre-trained ViT Small model with:
```python
import timm
model = timm.create_model('vit_small_patch16_224', pretrained=True)
```

(Note: you'll need to copy this code from this markdown cell to a regular code cell for the installation to work correctly.)

The ViT Small model is pretrained on ImageNet and expects the same image size and normalization as the other models. Typically we fine-tune the whole model rather than training with a frozen backbone first. The learning rates used are usually smaller, too. Fine-tune as you did above, using OneCycleLR with `max_lr = 0.0005`. We found that the number of epochs needed was similar to the total number of epochs used in the two-phase training of our other models.
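
The scheduler mechanics can be sketched as follows. A tiny `nn.Linear` stands in for the ViT so the example runs quickly without timm; the epoch and batch counts, the random data, and the choice of `AdamW` are all placeholders. The point is that `OneCycleLR` is stepped once per batch, not once per epoch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 3)  # stand-in for the real model
epochs, steps_per_epoch = 4, 10  # placeholder values

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, epochs=epochs, steps_per_epoch=steps_per_epoch
)
loss_fn = nn.CrossEntropyLoss()

lrs = []  # learning rate used at each optimizer step
for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        x = torch.randn(16, 8)          # fake batch of features
        y = torch.randint(0, 3, (16,))  # fake labels
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        lrs.append(optimizer.param_groups[0]["lr"])
        optimizer.step()
        scheduler.step()  # advance the schedule after every batch

print(f"start={lrs[0]:.2e} peak={max(lrs):.2e} end={lrs[-1]:.2e}")
```

Plotting `lrs` shows the warm-up to `max_lr` followed by the long decay, which is a useful check that `steps_per_epoch` matches the length of your training DataLoader.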

In [None]:
# === YOUR CODE HERE ===
# TODO: Install and use ViT Small from timm library
# - Install timm library
# - Create ViT Small model with ImageNet pretrained weights
# - Fine-tune the whole model (don't use frozen backbone approach)
# - Use OneCycleLR scheduler with max_lr=0.0005


# === END YOUR CODE ===

## [10 pts] Apply Best Model to Test Data and Evaluate

Write a brief summary of your investigations above. Include a graph comparing the validation metrics recorded during the fine-tuning phases of the models above.

Generate a classification report comparing the predictions of your best model to the ground truth labels on the test dataset. Summarize the highlights of the report. A confusion matrix display probably isn't helpful because there are so many classes (set `display_confusion=False` if you use `evaluate_classifier` from `introdl.utils`), but you can look at slices of the confusion matrix. Try to identify at least two classes that your model confuses with each other and display examples, with proper labels, from those classes.
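
One way to find confused classes without displaying the full matrix is to rank its off-diagonal entries. The sketch below does this with NumPy on toy labels; the function name `top_confused_pairs` and the data are hypothetical, and in practice `y_true` and `y_pred` would be the test labels and your model's predictions.

```python
import numpy as np


def top_confused_pairs(y_true, y_pred, num_classes, k=3):
    """Return up to k (true_class, predicted_class, count) tuples, most frequent first."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted class

    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)       # ignore correct predictions

    order = np.argsort(off_diag, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, off_diag.shape)
    return [(int(t), int(p), int(off_diag[t, p]))
            for t, p in zip(rows, cols) if off_diag[t, p] > 0]


# Toy labels: class 0 is often predicted as class 1
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 1, 1, 2, 2, 2, 2])
pairs = top_confused_pairs(y_true, y_pred, num_classes=3, k=2)
print(pairs)  # [(0, 1, 3), (1, 2, 1)]
```

The class indices in the top pairs tell you which images to pull from the test set for a side-by-side display.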

In [None]:
# === YOUR CODE HERE ===
# TODO: Compare all models and evaluate best one on test data
# - Create comparison plots of validation metrics from all models
# - Select the best performing model based on validation results
# - Evaluate the best model on the test dataset
# - Generate classification report (set display_confusion=False)
# - Identify and display examples of confused classes


# === END YOUR CODE ===

📝 **YOUR SUMMARY OF MODEL COMPARISONS:**

## [8 pts] Questions from Chapter 13.1-13.3 Reading

**Question 1 (3 pts):** Section 13.1 explains the fundamental concept of transfer learning and when it works best. Based on the reading:
- What is the key advantage of transfer learning over training from scratch, especially regarding labeled data requirements?
- Why do transfer learning approaches work particularly well for computer vision tasks? What structural similarities make this possible?
- According to the reading, what are the two critical factors that determine transfer learning success: dataset size and what other factor?

📝 **YOUR ANSWER HERE:**

**Question 2 (3 pts):** Section 13.2 compares warm start vs frozen weight approaches. From your implementation above and the reading:
- Explain the difference between "warm start" and "frozen weights" approaches in transfer learning. What parameters change in each method?
- According to Figure 13.5 and the discussion, when should you prefer frozen weights over warm start? What advantage do frozen weights provide with limited data?
- In your experiments above, how did the performance compare between different transfer learning approaches? Does this match the textbook's predictions?

📝 **YOUR ANSWER HERE:**

**Question 3 (2 pts):** Section 13.3 discusses the relationship between dataset size, number of parameters, and model performance:
- The reading presents a simplified equation: θ = θ* + ε⋅D/N. Explain what this equation means in practical terms for model training.
- How does freezing weights effectively reduce the "D" term in this equation, and why does this help when you have limited training data?

📝 **YOUR ANSWER HERE:**

## [2 pts] Reflection

1. What, if anything, did you find difficult to understand for this lesson? Why?

📝 **YOUR ANSWER HERE:**

2. What resources did you find supported your learning most and least for this lesson? (Be honest - I use your input to shape the course.)

📝 **YOUR ANSWER HERE:**

### Export Notebook to HTML for Canvas Upload

Uncomment the two lines below and run the cell to export the current notebook to HTML.

In [None]:
# from introdl import export_this_to_html
# export_this_to_html()