# SPML HW4: Model Extraction

In this notebook you'll explore model extraction.

In [1]:
######### Make sure to put your info #########
name = 'Amir Mohammad Ezzati'
std_id = '402212269'
##############################################

In [13]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from tqdm import trange, tqdm

import torchvision
from torchvision import transforms, datasets, models

# Define device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [3]:
from google.colab import drive
drive.mount('/content/drive')
base_path = '/content/drive/MyDrive/SPML/'

Mounted at /content/drive


# Loading CIFAR100 (5 points)

Load the `CIFAR100` dataset. Make sure to resize the images to `224x224` (the input size ResNet expects).

In [4]:
# TODO: Load CIFAR-100 dataset

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_dataset = datasets.CIFAR100(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.CIFAR100(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False, num_workers=2)

print("Train dataset size:", len(train_dataset))
print("Test dataset size:", len(test_dataset))

Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ./data/cifar-100-python.tar.gz


100%|██████████| 169M/169M [00:05<00:00, 31.2MB/s]


Extracting ./data/cifar-100-python.tar.gz to ./data
Files already downloaded and verified
Train dataset size: 50000
Test dataset size: 10000


# Pre-trained ResNet34 (10 points)

Load a pre-trained ResNet34 and train it on the `CIFAR100` dataset.

In [None]:
# TODO: Load pretrained ResNet-34 model

resnet34 = models.resnet34(pretrained=True)
num_features = resnet34.fc.in_features
resnet34.fc = nn.Linear(num_features, 100)
resnet34 = resnet34.to(device)

Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 117MB/s]


In [None]:
# TODO: Train the model on CIFAR100

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(resnet34.parameters(), lr=0.001)

def train_model(model, train_loader, criterion, optimizer, device, epochs=10):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        correct = 0
        total = 0

        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / len(train_loader)
        epoch_accuracy = 100 * correct / total

        print(f"Epoch [{epoch + 1}/{epochs}], Loss: {epoch_loss:.4f}, Accuracy: {epoch_accuracy:.2f}%")


train_model(resnet34, train_loader, criterion, optimizer, device, epochs=10)

Epoch [1/10], Loss: 1.9006, Accuracy: 48.02%
Epoch [2/10], Loss: 1.1406, Accuracy: 66.29%
Epoch [3/10], Loss: 0.8245, Accuracy: 74.84%
Epoch [4/10], Loss: 0.6050, Accuracy: 80.81%
Epoch [5/10], Loss: 0.4396, Accuracy: 85.78%
Epoch [6/10], Loss: 0.3318, Accuracy: 89.18%
Epoch [7/10], Loss: 0.2500, Accuracy: 91.79%
Epoch [8/10], Loss: 0.2035, Accuracy: 93.37%
Epoch [9/10], Loss: 0.1658, Accuracy: 94.67%
Epoch [10/10], Loss: 0.1484, Accuracy: 95.21%


You might want to save this model to avoid retraining.

In [None]:
torch.save(resnet34.state_dict(), base_path + "resnet34_cifar100.pth")

In [None]:
# load saved model
resnet34 = models.resnet34(pretrained=False)
num_features = resnet34.fc.in_features
resnet34.fc = nn.Linear(num_features, 100)
resnet34 = resnet34.to(device)
resnet34.load_state_dict(torch.load(base_path + "resnet34_cifar100.pth"))

<All keys matched successfully>

In [None]:
def test_clean(model, dataloader=test_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in dataloader:
            inputs, targets = inputs.to(device), targets.to(device)

            outputs = model(inputs)

            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()

    accuracy = 100.0 * correct / total
    return accuracy

In [None]:
print(f'Victim Test Accuracy {test_clean(resnet34, test_loader):.2f}%')

Victim Test Accuracy 68.99%


# Model Extraction (20 points)

Here we use knowledge distillation to extract a model. If the instructions below are unclear, look at the following sections to see how the function is used. The general steps of knowledge distillation are:

1. Set the victim (teacher) to evaluation mode and the attacker (student) to training mode.
2. Use the victim to find the logits for each batch of inputs.
3. Predict the attacker's output for the same batch of inputs.
4. Define a loss on the difference between the victim's and the attacker's logits (for example, KL divergence) and minimize it.
5. Repeat steps 2-4 for the number of epochs.

Feel free to check out [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) to get a better sense of the process.
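
The loss in step 4 can be written out explicitly. The notation below is an assumption made for this sketch and does not come from the notebook: $z_v(x)$ and $z_a(x)$ are the victim and attacker logits for input $x$, $\sigma$ is the softmax over the $C$ classes, and $T$ is the temperature. The $T^2$ factor follows the linked paper, which notes that the gradients from soft targets scale as $1/T^2$.

```latex
\mathcal{L}_{\mathrm{KD}}(x)
  = T^{2}\,\mathrm{KL}\!\left(p_v \,\|\, p_a\right)
  = T^{2}\sum_{c=1}^{C} p_v^{(c)} \log\frac{p_v^{(c)}}{p_a^{(c)}},
\qquad
p_v = \sigma\!\left(\frac{z_v(x)}{T}\right),\quad
p_a = \sigma\!\left(\frac{z_a(x)}{T}\right)
```

In training, this per-example loss would be averaged over each batch.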

In [27]:
def knowledge_distillation(victim_model, attacker_model, loader, optimizer, epochs, T):
    # TODO
    attacker_model.train()
    victim_model.eval()
    # KLDivLoss expects log-probabilities as input and probabilities as target
    loss_fn = nn.KLDivLoss(reduction='batchmean')

    for epoch in range(epochs):
        running_loss = 0.0

        # Labels are ignored: only the victim's outputs supervise the attacker
        for inputs, _ in tqdm(loader, desc="Distillation Training", leave=False):
            inputs = inputs.to(device)

            # Query the victim without tracking gradients
            with torch.no_grad():
                victim_model_outputs = victim_model(inputs) / T
            attacker_model_outputs = attacker_model(inputs) / T  # Apply temperature scaling to both logits

            victim_model_probs = F.softmax(victim_model_outputs, dim=1)
            attacker_model_log_probs = F.log_softmax(attacker_model_outputs, dim=1)

            # Scale by T**2 so gradient magnitudes stay comparable across temperatures
            distill_loss = loss_fn(attacker_model_log_probs, victim_model_probs) * (T**2)

            optimizer.zero_grad()
            distill_loss.backward()
            optimizer.step()

            running_loss += distill_loss.item()

        epoch_loss = running_loss / len(loader)
        print(f"Epoch {epoch + 1}/{epochs} - Loss: {epoch_loss:.4f}")

Can you explain how we should set the temperature? Why is this choice appropriate for model extraction?

`your response:`

The temperature (T) in knowledge distillation softens the softmax distributions computed from the logits: dividing the logits by T > 1 reduces the dominance of the top class and exposes the victim's inter-class relationships. For model extraction, a moderately high T (such as T=3 or T=4) lets the student learn more about the victim's behavior from each query, which makes the attack more effective.

A higher T highlights the secondary classes, so the student does not fit only the teacher's most confident predictions and each victim query carries more information.
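
The softening effect can be checked numerically. The following is a minimal, dependency-free sketch: the logits are hypothetical values chosen for illustration, and `softmax_with_temperature` and `entropy` are helper names introduced here, not part of the notebook.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Return softmax(logits / T) as a list of probabilities."""
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a softer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits for a 4-class problem (not taken from the notebook).
logits = [8.0, 4.0, 2.0, 0.0]

for T in (1.0, 4.0):
    probs = softmax_with_temperature(logits, T)
    print(f"T={T}: top prob={max(probs):.3f}, entropy={entropy(probs):.3f}")
```

At the higher temperature the top-class probability drops and the entropy rises, which is the extra signal about secondary classes that the student trains on.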


# Attack Transferability (20 points)

Implement attacks such as FGSM or PGD. You can use code from previous homeworks or readily available libraries.

In [36]:
# TODO: Load or implement attacks

def attack_fgsm(model, x, y, epsilon, temp):
    ce = torch.nn.CrossEntropyLoss()
    x_adv = x.clone().detach().requires_grad_(True)
    model.eval()

    with torch.enable_grad():
        outputs = model(x_adv) / temp
        loss_ = ce(outputs, y)
        loss_.backward()

        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = torch.clamp(x_adv, 0, 1) # Clip to valid range

    return x_adv.detach()
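
PGD, the other attack named above, repeats the signed-gradient step with a smaller step size and projects the result back into the epsilon-ball after each step. The sketch below illustrates this on a toy logistic model in plain Python so that it runs without a GPU or any library. The weights, input, step size `alpha`, and the function names are all hypothetical and are not part of the notebook; a PyTorch version would wrap the body of `attack_fgsm` in a similar loop.

```python
import math

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. x for a logistic model."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-s))       # predicted probability of class 1
    return [(p - y) * wi for wi in w]

def pgd_linf(w, b, x, y, epsilon, alpha, steps):
    """Iterated signed-gradient ascent, projected onto the L-inf ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into [x - epsilon, x + epsilon], then into the valid pixel range.
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

# Hypothetical weights and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.5, 0.5, 0.5], 1
x_adv = pgd_linf(w, b, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10)
print(x_adv)
```

With a single step and `alpha` equal to `epsilon`, this reduces to the FGSM update used in `attack_fgsm`.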

Fill in the following function to attack a model and report the accuracy of the victim on the adversarial examples generated using the available model.

In [41]:
def transferability_attack(model, victim, loader, attack):
    # TODO
    victim.eval()
    model.eval()
    correct, total = 0, 0
    for x, y in tqdm(loader):
        x, y = x.to(device), y.to(device)

        # Generate adversarial examples using the surrogate
        x_adv = attack(model, x, y, epsilon=1/255, temp=1)  # any attack with the same signature as attack_fgsm

        # Evaluate the victim on these adversarial examples
        with torch.no_grad():
            outputs = victim(x_adv)
            pred = outputs.argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.size(0)

    return 100 * correct / total


# CIFAR10 (35 points)

In [23]:
victim_model = models.resnet34(pretrained=False)
victim_model.fc = nn.Linear(victim_model.fc.in_features, 100)
victim_model = victim_model.to(device)
victim_model.load_state_dict(torch.load(base_path + "resnet34_cifar100.pth"))

<All keys matched successfully>

## Loading and Exploration (5 points)

First load the `CIFAR10` dataset.

In [17]:
# TODO: Load CIFAR-10 dataset

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

cifar10_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar10_test = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

cifar10_train_loader = DataLoader(cifar10_train, batch_size=64, shuffle=True, num_workers=2)
cifar10_test_loader = DataLoader(cifar10_test, batch_size=64, shuffle=False, num_workers=2)

print("CIFAR-10 Training Samples:", len(cifar10_train))
print("CIFAR-10 Testing Samples:", len(cifar10_test))

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170M/170M [00:04<00:00, 42.5MB/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
CIFAR-10 Training Samples: 50000
CIFAR-10 Testing Samples: 10000


Which classes from the `CIFAR10` dataset are present in `CIFAR100` classes?

In [22]:
# TODO: Check if classes are present in both datasets
cifar10_classes = cifar10_train.classes
cifar100_classes = datasets.CIFAR100(root='./data', train=True, download=True).classes
overlapping_classes = set(cifar10_classes).intersection(cifar100_classes)
print("Classes present in both CIFAR-10 and CIFAR-100:", overlapping_classes)

Files already downloaded and verified
Classes present in both CIFAR-10 and CIFAR-100: set()
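
The empty set above comes from exact string matching, so related categories that are named differently in the two datasets are not detected. A minimal sketch of a looser check follows, matching on whole `_`-separated tokens. The CIFAR-100 list here is a small hand-picked subset used only for illustration, and `token_matches` is a hypothetical helper; in the notebook one would pass the full `cifar100_classes` list instead.

```python
cifar10_classes = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

# A hand-picked subset of CIFAR-100 fine labels, used here only for illustration.
cifar100_subset = ["bus", "caterpillar", "cattle", "motorcycle",
                   "pickup_truck", "streetcar", "tractor", "train"]

def token_matches(short_names, long_names):
    """Map each short name to the long names containing it as a whole '_'-separated token."""
    matches = {}
    for name in short_names:
        hits = [other for other in long_names if name in other.split("_")]
        if hits:
            matches[name] = hits
    return matches

print(token_matches(cifar10_classes, cifar100_subset))
# prints {'truck': ['pickup_truck']}
```

Matching on whole tokens rather than substrings avoids false positives such as `cat` inside `caterpillar` or `cattle`.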


Now use the test dataset from `CIFAR10` to extract the model.

## Pre-trained ResNet18 (10 points)

Use the pre-trained ResNet18 model and extract the victim using knowledge distillation.

In [29]:
# TODO: Load pretrained ResNet-18 model
attacker_model = models.resnet18(pretrained=True)
attacker_model.fc = nn.Linear(attacker_model.fc.in_features, 100)
attacker_model = attacker_model.to(device)

# TODO: Extract the model
optimizer = torch.optim.Adam(attacker_model.parameters(), lr=1e-3)
knowledge_distillation(victim_model, attacker_model, cifar10_test_loader, optimizer, epochs=15, T=4)



Epoch 1/15 - Loss: 5.8881
Epoch 2/15 - Loss: 4.0006
Epoch 3/15 - Loss: 2.9323
Epoch 4/15 - Loss: 2.3086
Epoch 5/15 - Loss: 1.9722
Epoch 6/15 - Loss: 1.6672
Epoch 7/15 - Loss: 1.4431
Epoch 8/15 - Loss: 1.2679
Epoch 9/15 - Loss: 1.1168
Epoch 10/15 - Loss: 1.0086
Epoch 11/15 - Loss: 0.9536
Epoch 12/15 - Loss: 0.8831
Epoch 13/15 - Loss: 0.8290
Epoch 14/15 - Loss: 0.7620
Epoch 15/15 - Loss: 0.7094




What is the accuracy of the extracted model on the `CIFAR100` test set?

In [30]:
# TODO: Report accuracy on CIFAR100
print(f'attacker_model Test Accuracy on CIFAR100 {test_clean(attacker_model, test_loader):.2f}%')

attacker_model Test Accuracy on CIFAR100 40.61%


In [43]:
acc = transferability_attack(attacker_model, victim_model, test_loader, attack=attack_fgsm)
print(f'\nTransfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: {acc:.2f}%')

100%|██████████| 79/79 [00:44<00:00,  1.78it/s]


Transfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: 51.87%





## ResNet18 (10 points)

Repeat the previous steps but without pre-training.

In [48]:
# TODO: Load ResNet-18 model
attacker_model = models.resnet18(pretrained=False)
attacker_model.fc = nn.Linear(attacker_model.fc.in_features, 100)
attacker_model = attacker_model.to(device)

# TODO: Extract the model
optimizer = torch.optim.Adam(attacker_model.parameters(), lr=1e-3)
knowledge_distillation(victim_model, attacker_model, cifar10_test_loader, optimizer, epochs=15, T=4)



Epoch 1/15 - Loss: 8.8798
Epoch 2/15 - Loss: 7.7051
Epoch 3/15 - Loss: 7.0544
Epoch 4/15 - Loss: 6.4304
Epoch 5/15 - Loss: 5.8571
Epoch 6/15 - Loss: 5.3481
Epoch 7/15 - Loss: 4.8483
Epoch 8/15 - Loss: 4.3758
Epoch 9/15 - Loss: 3.9068
Epoch 10/15 - Loss: 3.4956
Epoch 11/15 - Loss: 3.2488
Epoch 12/15 - Loss: 2.9970
Epoch 13/15 - Loss: 2.6992
Epoch 14/15 - Loss: 2.3891
Epoch 15/15 - Loss: 2.1201




Measure the accuracy of the newly distilled attacker and compare it with your results from the previous section.

In [49]:
# TODO: Report accuracy on CIFAR100
print(f'attacker_model Test Accuracy on CIFAR100 {test_clean(attacker_model, test_loader):.2f}%')

attacker_model Test Accuracy on CIFAR100 14.49%


In [50]:
acc = transferability_attack(attacker_model, victim_model, test_loader, attack=attack_fgsm)
print(f'\nTransfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: {acc:.2f}%')

100%|██████████| 79/79 [00:45<00:00,  1.74it/s]


Transfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: 65.53%





What are the effects of pre-training?

`your response:`

Pre-training improves both the attacker's accuracy and the transferability of its adversarial examples:

1. Higher accuracy: the pre-trained attacker reaches 40.61% on the CIFAR100 test set, compared with 14.49% when trained from scratch.
2. Stronger attack: adversarial examples from the pre-trained attacker transfer better, lowering the victim's accuracy to 51.87% instead of 65.53% (the victim's clean accuracy is 68.99%).

Pre-trained weights give the attacker useful features from the start, so the same queries yield a closer copy of the victim and a more effective attack.

## Full Dataset (10 points)

Repeat your experiments using the pre-trained ResNet18 but this time use the entire CIFAR10 dataset.

In [51]:
# TODO: Load pretrained ResNet-18 model
attacker_model = models.resnet18(pretrained=True)
attacker_model.fc = nn.Linear(attacker_model.fc.in_features, 100)
attacker_model = attacker_model.to(device)

# TODO: Extract the model
optimizer = torch.optim.Adam(attacker_model.parameters(), lr=1e-3)
knowledge_distillation(victim_model, attacker_model, cifar10_train_loader, optimizer, epochs=15, T=4)



Epoch 1/15 - Loss: 4.0377
Epoch 2/15 - Loss: 2.5095
Epoch 3/15 - Loss: 1.8764
Epoch 4/15 - Loss: 1.4893
Epoch 5/15 - Loss: 1.2447
Epoch 6/15 - Loss: 1.0886
Epoch 7/15 - Loss: 0.9811
Epoch 8/15 - Loss: 0.8911
Epoch 9/15 - Loss: 0.8281
Epoch 10/15 - Loss: 0.7778
Epoch 11/15 - Loss: 0.7297
Epoch 12/15 - Loss: 0.6855
Epoch 13/15 - Loss: 0.6514
Epoch 14/15 - Loss: 0.6183
Epoch 15/15 - Loss: 0.5916




Report the accuracy on the `CIFAR100` test set once more.

In [52]:
# TODO: Report accuracy on CIFAR100
print(f'attacker_model Test Accuracy on CIFAR100 {test_clean(attacker_model, test_loader):.2f}%')

attacker_model Test Accuracy on CIFAR100 55.35%


In [53]:
acc = transferability_attack(attacker_model, victim_model, test_loader, attack=attack_fgsm)
print(f'\nTransfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: {acc:.2f}%')

100%|██████████| 79/79 [00:43<00:00,  1.82it/s]


Transfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: 37.11%





What are the effects of using more data?

`your response:`

Using more data improves both the attacker's accuracy and the effectiveness of the transfer attack:

1. Higher accuracy: distilling on the CIFAR10 training set (50,000 images) raises the attacker's CIFAR100 test accuracy to 55.35%, compared with 40.61% when using the CIFAR10 test set (10,000 images).
2. Stronger attack: adversarial examples transfer more effectively, lowering the victim's accuracy to 37.11% (compared with 51.87%).

# CIFAR100 (10 points)

This time, use the training dataset from `CIFAR100` and perform knowledge distillation on a pre-trained ResNet18.

In [54]:
# TODO: Load pretrained ResNet-18 model
attacker_model = models.resnet18(pretrained=True)
attacker_model.fc = nn.Linear(attacker_model.fc.in_features, 100)
attacker_model = attacker_model.to(device)

# TODO: Extract the model
optimizer = torch.optim.Adam(attacker_model.parameters(), lr=1e-3)
knowledge_distillation(victim_model, attacker_model, train_loader, optimizer, epochs=15, T=4)



Epoch 1/15 - Loss: 9.4394
Epoch 2/15 - Loss: 4.5680
Epoch 3/15 - Loss: 3.0684
Epoch 4/15 - Loss: 2.2066
Epoch 5/15 - Loss: 1.7295
Epoch 6/15 - Loss: 1.4407
Epoch 7/15 - Loss: 1.3003
Epoch 8/15 - Loss: 1.1998
Epoch 9/15 - Loss: 1.1286
Epoch 10/15 - Loss: 1.0823
Epoch 11/15 - Loss: 1.0405
Epoch 12/15 - Loss: 0.9963
Epoch 13/15 - Loss: 0.9441
Epoch 14/15 - Loss: 0.8948
Epoch 15/15 - Loss: 0.8496




How does the accuracy change now?

In [55]:
# TODO: Report accuracy on CIFAR100
print(f'attacker_model Test Accuracy on CIFAR100 {test_clean(attacker_model, test_loader):.2f}%')

attacker_model Test Accuracy on CIFAR100 71.03%


In [56]:
acc = transferability_attack(attacker_model, victim_model, test_loader, attack=attack_fgsm)
print(f'\nTransfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: {acc:.2f}%')

100%|██████████| 79/79 [00:44<00:00,  1.77it/s]


Transfer FGSM attack with epsilon = 1/255 from attacker_model to victim_model: 42.04%





Why do you suppose using `CIFAR100` produced these results? Explain your observations.

`your response:`

1. Distilling on the victim's own training data (CIFAR100) lets the attacker learn features specific to CIFAR100, which gives the best accuracy on the CIFAR100 test set (71.03%, slightly above the victim's own 68.99%).
2. Since both the attacker and the victim are trained on the same data, the adversarial perturbations are less "universal" and more tailored to what the models learned from this data. The transfer attack is still strong, lowering the victim's accuracy to 42.04%, although this is not as low as the 37.11% obtained with the full CIFAR10 training set.
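
To compare the four settings on a common scale, the victim's accuracy under attack can be turned into a drop relative to its clean accuracy. The sketch below only does arithmetic on the numbers printed in the cell outputs of this notebook. The dictionary keys and the helper `accuracy_drops` are labels introduced here for illustration.

```python
# Numbers copied from the cell outputs in this notebook.
victim_clean_acc = 68.99
victim_acc_under_attack = {
    "ResNet18 pre-trained, CIFAR10 test": 51.87,
    "ResNet18 from scratch, CIFAR10 test": 65.53,
    "ResNet18 pre-trained, CIFAR10 train": 37.11,
    "ResNet18 pre-trained, CIFAR100 train": 42.04,
}

def accuracy_drops(clean_acc, attacked):
    """Percentage points of victim accuracy lost under each transfer attack."""
    return {name: round(clean_acc - acc, 2) for name, acc in attacked.items()}

drops = accuracy_drops(victim_clean_acc, victim_acc_under_attack)
# List the settings from the strongest to the weakest transfer attack.
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:40s} drop = {drop:5.2f} points")
```

All four attacks use FGSM with epsilon = 1/255 on the CIFAR100 test set, so the drops are directly comparable.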