# Transfer Learning with a Pretrained VGG Network

**Objective:** In this exercise, you will learn how to perform transfer learning by using a VGG network pretrained on ImageNet and fine-tuning it for the CIFAR-10 dataset.

This notebook will cover:
1.  Loading a pretrained model from `torchvision.models`.
2.  Adapting the input data to the pretrained model's requirements.
3.  Freezing the weights of the convolutional base to leverage learned features.
4.  Replacing the model's classifier for a new dataset.
5.  Training only the new classifier for efficient fine-tuning.

## 1. Setup and Imports

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models

## 2. Loading and Preparing the CIFAR-10 Dataset

Pretrained models like VGG were trained on ImageNet and expect 224x224 input images normalized with the ImageNet channel statistics. CIFAR-10 images are only 32x32, so we must resize and normalize them to match these requirements.

**Your Task:** Define the `transform` pipeline. It must:
1. Resize the images to 224x224.
2. Convert them to PyTorch Tensors.
3. Normalize them with the standard ImageNet mean and standard deviation.

In [None]:
# TODO: Define transformations for the dataset
# The mean and std for ImageNet are [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]
transform = transforms.Compose([
    # 1. Resize the images to 224x224
    transforms.Resize((224, 224)),

    # 2. Convert to PyTorch tensors
    transforms.ToTensor(),

    # 3. Normalize with the ImageNet mean and standard deviation
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load the datasets
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32, shuffle=True, num_workers=2) # Using a smaller batch size due to larger image size

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Files already downloaded and verified
Files already downloaded and verified
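
To make the preprocessing concrete, here is a small worked sketch of what the transform does to a single pixel. It uses plain Python rather than `torchvision`; `normalize_pixel` is a hypothetical helper written for illustration, and it assumes the usual behavior of `ToTensor` (scaling 0..255 to 0.0..1.0) followed by `Normalize` (subtracting the channel mean and dividing by the channel standard deviation).

```python
# ImageNet channel statistics used by the transform (R, G, B).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Hypothetical helper: mimic ToTensor followed by Normalize for one pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]  # ToTensor maps 0..255 to 0.0..1.0
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


# Resizing from 32x32 to 224x224 multiplies the pixels per image by this factor.
pixel_ratio = (224 * 224) // (32 * 32)

print(normalize_pixel([0, 0, 0]))        # a black pixel becomes negative in every channel
print(normalize_pixel([255, 255, 255]))  # a white pixel becomes positive in every channel
print(pixel_ratio)
```

The pixel ratio is one reason the loaders above use a small batch size: each resized image carries far more data than the original.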


## 3. Load Pretrained VGG Model and Adapt for CIFAR-10

We will load a VGG16 model pretrained on ImageNet. Then, we will perform two key steps:
1.  **Freeze the feature extractor:** The convolutional layers have already learned powerful features. We will freeze them to prevent their weights from being updated during training.
2.  **Replace the classifier:** The original classifier was trained for 1000 ImageNet classes. We need to replace its final layer with a new one that outputs scores for our 10 CIFAR-10 classes.
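
To get a feel for what these two steps change, the following sketch counts parameters by hand. It assumes the standard VGG16 layout (13 convolutions with 3x3 kernels, then a classifier of 25088 -> 4096 -> 4096 -> outputs, where 25088 = 512 x 7 x 7); the helper names are made up for this illustration, and the layout is worth confirming against `print(model)`.

```python
def conv3x3_params(c_in, c_out):
    """Weights plus biases of a 3x3 convolution."""
    return c_in * c_out * 9 + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (in_channels, out_channels) of the 13 conv layers, assuming standard VGG16.
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# Step 1: everything in `features` is frozen.
frozen = sum(conv3x3_params(c_in, c_out) for c_in, c_out in VGG16_CONVS)

# Step 2: the first two linear layers are kept, the last one is replaced.
kept_fc = linear_params(512 * 7 * 7, 4096) + linear_params(4096, 4096)
old_head = linear_params(4096, 1000)  # original ImageNet output layer
new_head = linear_params(4096, 10)    # new CIFAR-10 output layer
trainable = kept_fc + new_head

print(f"frozen:    {frozen:,}")
print(f"trainable: {trainable:,}")
print(f"old head:  {old_head:,} -> new head: {new_head:,}")
```

Under these assumptions, the frozen base holds about 14.7 million parameters, while the classifier (including the new 10-class layer) holds about 119.6 million. Freezing therefore saves work mainly by skipping backpropagation through the convolutional layers, not by leaving few weights to update.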

**Your Task:** 
1. Load the pretrained `vgg16` model.
2. Freeze the parameters of the `features` part of the model by setting `requires_grad` to `False`.
3. Replace the final layer of the `classifier` with a new `nn.Linear` layer suitable for 10 classes.

In [None]:
# TODO: Load the pretrained VGG16 model
# `weights=` replaces the deprecated `pretrained=True` (torchvision >= 0.13)
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# TODO: Freeze the feature extractor layers
for param in model.features.parameters():
    param.requires_grad = False


# TODO: Replace the classifier
# The original VGG16 classifier's last layer is at index 6
num_features = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_features, 10)


# Move the model to the correct device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

print(model)
print(f'Model is on device: {next(model.parameters()).device}')

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

## 4. Define Loss Function and Optimizer

We will use `CrossEntropyLoss` and the `SGD` optimizer with momentum. Importantly, the optimizer should update only the parameters that remain trainable (the classifier), not the frozen convolutional layers.

**Your Task:** Instantiate the loss function and the optimizer. Make sure the optimizer is only passed the parameters of the classifier that need to be trained.

In [None]:
# TODO: Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
# Pass only the classifier's parameters; the frozen feature extractor is left out
optimizer = optim.SGD(model.classifier.parameters(), lr=0.001, momentum=0.9)
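
Another way to hand the optimizer only trainable parameters, without relying on the attribute name `classifier`, is to select them by `requires_grad`. The sketch below shows the idea on a tiny stand-in network: `TinyNet`, `toy`, and `toy_optimizer` are hypothetical names used here so the example runs without downloading any weights.

```python
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same attribute names as torchvision's VGG."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU())
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(4 * 8 * 8, 10))

    def forward(self, x):
        return self.classifier(self.features(x))


toy = TinyNet()

# Freeze the feature extractor, as in the exercise.
for param in toy.features.parameters():
    param.requires_grad = False

# Keep only the parameters that still require gradients.
trainable = [p for p in toy.parameters() if p.requires_grad]
toy_optimizer = optim.SGD(trainable, lr=0.001, momentum=0.9)

n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in toy.parameters())
print(f"trainable parameters: {n_trainable} of {n_total}")
```

The same list comprehension should work unchanged on the VGG16 `model` from the previous section, and it keeps working if you later unfreeze additional layers before building the optimizer.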

## 5. Train and Test the Network

Now we will write the training and testing loops. This process should be much faster than training from scratch because no gradients are computed for the frozen convolutional layers, and only the classifier weights are updated.

**Your Task:** Complete the training and testing loops.

In [10]:
def train(epoch):
    print(f'Epoch: {epoch}')
    model.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        
        # TODO: Complete the training steps (zero grad, forward, loss, backward, step)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

        if batch_idx % 100 == 0:
            print(f'Loss: {train_loss/(batch_idx+1):.3f} | Acc: {100.*correct/total:.3f}% ({correct}/{total})')

def test():
    model.eval()
    test_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(testloader):
            inputs, targets = inputs.to(device), targets.to(device)
            # TODO: Complete the testing steps (forward pass and loss calculation)
            outputs = model(inputs)
            loss = criterion(outputs, targets)

            test_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    print(f'Test Loss: {test_loss/len(testloader):.3f} | Test Acc: {100.*correct/total:.3f}%')

for epoch in range(3): # Train for 3 epochs for demonstration
    train(epoch)
    test()

Epoch: 0
Loss: 2.466 | Acc: 6.250% (2/32)
Loss: 1.115 | Acc: 61.850% (1999/3232)
Loss: 0.903 | Acc: 68.626% (4414/6432)
Loss: 0.809 | Acc: 71.750% (6911/9632)
Loss: 0.749 | Acc: 73.862% (9478/12832)
Loss: 0.708 | Acc: 75.281% (12069/16032)
Loss: 0.680 | Acc: 76.253% (14665/19232)


KeyboardInterrupt: 

## 6. Bonus Questions

1.  What happens if you don't freeze the convolutional layers? How does it affect training time and accuracy? (Hint: Leave `requires_grad` set to `True` for every layer and pass `model.parameters()` to the optimizer.)
2.  Try fine-tuning more than just the classifier. For example, unfreeze the last convolutional block of `model.features` and add its parameters to the optimizer. You might want to use a smaller learning rate for these layers.
3.  Experiment with a different pretrained model, like `resnet18`.

In [None]:
# Unfreeze all layers for fine-tuning
for param in model.parameters():
    param.requires_grad = True

# Re-instantiate optimizer to update all parameters
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Train for 10 epochs using the existing train/test functions
for epoch in range(10):
    train(epoch)
    test()
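
For the second bonus question, one possible approach is to give the optimizer two parameter groups with different learning rates. The sketch below demonstrates this on a tiny stand-in network (`TinyVGG`, `toy`, `last_block`, and `toy_optimizer` are hypothetical names, and the learning rates are illustrative). For the real VGG16, the slice that selects the last convolutional block depends on the layer indices shown by `print(model)`; in the standard torchvision layout it should be `model.features[24:]`, but treat that index as an assumption to verify.

```python
import torch.nn as nn
import torch.optim as optim


class TinyVGG(nn.Module):
    """Hypothetical stand-in: two conv "blocks" followed by a linear classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.ReLU(),  # early block, stays frozen
            nn.Conv2d(4, 8, kernel_size=3, padding=1), nn.ReLU(),  # last block, unfrozen below
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8 * 8, 10))

    def forward(self, x):
        return self.classifier(self.features(x))


toy = TinyVGG()

# Start with the whole feature extractor frozen.
for param in toy.features.parameters():
    param.requires_grad = False

# Unfreeze only the last block; slicing an nn.Sequential returns an nn.Sequential.
last_block = toy.features[2:]
for param in last_block.parameters():
    param.requires_grad = True

# Two parameter groups: a smaller learning rate for the pretrained block,
# and the optimizer's default learning rate for the classifier.
toy_optimizer = optim.SGD(
    [
        {"params": last_block.parameters(), "lr": 1e-4},
        {"params": toy.classifier.parameters()},
    ],
    lr=1e-3,
    momentum=0.9,
)

for group in toy_optimizer.param_groups:
    print(group["lr"], sum(p.numel() for p in group["params"]))
```

The smaller rate for the unfrozen block is meant to change the pretrained filters only gently while the classifier adapts more quickly.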