# SPML HW4: Model Extraction

In this notebook you'll explore model extraction.

In [1]:
######### Make sure to put your info #########
name = 'AmirHossein Haji Mohammad Rezaei'
std_id = '99109252'
##############################################

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import ConcatDataset, Dataset, DataLoader
from tqdm import tqdm, trange
import matplotlib.pyplot as plt


import torchvision
from torchvision import transforms, datasets, models

# Define device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Loading CIFAR100 (5 points)

Load the `CIFAR100` dataset. Make sure you resize the images to `224x224` (the input size ResNet expects).

In [3]:
# TODO: Load CIFAR-100 dataset

batch_size = 128
train_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = datasets.cifar.CIFAR100(root='./data', download=True, train=True, transform=train_transform)
testset = datasets.cifar.CIFAR100(root='./data', download=True, train=False, transform=test_transform)

trainloader = DataLoader(trainset, batch_size=batch_size, num_workers=2, shuffle=True)
testloader = DataLoader(testset, batch_size=batch_size, num_workers=2, shuffle=False)

Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ./data/cifar-100-python.tar.gz


100%|██████████| 169001437/169001437 [00:05<00:00, 31807194.65it/s]


Extracting ./data/cifar-100-python.tar.gz to ./data
Files already downloaded and verified


# Pre-trained ResNet34 (10 points)

Load a pre-trained ResNet34 and train it on the `CIFAR100` dataset.

In [4]:
# TODO: Load pretrained ResNet-34 model

class ResNet34(nn.Module):
    def __init__(self, pretrained=False, num_class=100):
        super(ResNet34, self).__init__()
        if pretrained:
          self.model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
        else:
          self.model = models.resnet34(weights=None)

        self.model.fc = nn.Linear(512, num_class)


    def forward(self, x):
        logits = self.model(x)
        return logits

model = ResNet34(pretrained=True).to(device)

Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:00<00:00, 212MB/s]


In [5]:
# TODO: Train the model on CIFAR100
def train_step(model, dataloader, loss_fn, optimizer, temperature):
    # TODO: Return loss and accuracy for each epoch
    model.train()
    avg_loss = 0
    correct = 0
    total = 0

    with tqdm(iterable=enumerate(dataloader), total=len(dataloader)) as pbar:
      for i, (img, label) in pbar:
        img = img.to(device)
        label = label.to(device)

        optimizer.zero_grad()
        logit = model(img)
        logit /= temperature
        loss = loss_fn(logit, label)
        loss.backward()
        optimizer.step()

        avg_loss += loss.item()
        correct += torch.sum(logit.argmax(dim=1) == label).item()
        total += img.shape[0]

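    # Note: avg_loss sums per-batch mean losses but is divided by the sample count,
    # so the logged value is roughly the mean loss divided by the batch size.
    avg_loss /= total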
    avg_acc = correct / total
    return avg_loss, avg_acc



def train_teacher(model, n_epochs, loader=trainloader, temp=1):
    # TODO: Log the accuracy and loss for each epoch
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(1, n_epochs + 1):
      avg_loss, avg_acc = train_step(model, loader, loss_fn, optimizer, temp)
      print(f'[train] epoch: {epoch}: average loss={avg_loss:.2f}, average accuracy={avg_acc:.2f}')

      if epoch % 10 == 0:
          torch.save(model.state_dict(), f'resnet34_{epoch}.pt')

train_teacher(model, 20)

100%|██████████| 391/391 [02:15<00:00,  2.89it/s]

[train] epoch: 1: average loss=0.02, average accuracy=0.45



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 2: average loss=0.01, average accuracy=0.62



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 3: average loss=0.01, average accuracy=0.69



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 4: average loss=0.01, average accuracy=0.73



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 5: average loss=0.01, average accuracy=0.77



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 6: average loss=0.01, average accuracy=0.79



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 7: average loss=0.00, average accuracy=0.82



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 8: average loss=0.00, average accuracy=0.84



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 9: average loss=0.00, average accuracy=0.85



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 10: average loss=0.00, average accuracy=0.87



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 11: average loss=0.00, average accuracy=0.89



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 12: average loss=0.00, average accuracy=0.89



100%|██████████| 391/391 [02:14<00:00,  2.90it/s]

[train] epoch: 13: average loss=0.00, average accuracy=0.91



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 14: average loss=0.00, average accuracy=0.91



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 15: average loss=0.00, average accuracy=0.92



100%|██████████| 391/391 [02:15<00:00,  2.89it/s]

[train] epoch: 16: average loss=0.00, average accuracy=0.93



100%|██████████| 391/391 [02:14<00:00,  2.90it/s]

[train] epoch: 17: average loss=0.00, average accuracy=0.93



100%|██████████| 391/391 [02:14<00:00,  2.90it/s]

[train] epoch: 18: average loss=0.00, average accuracy=0.94



100%|██████████| 391/391 [02:14<00:00,  2.91it/s]

[train] epoch: 19: average loss=0.00, average accuracy=0.94



100%|██████████| 391/391 [02:14<00:00,  2.90it/s]

[train] epoch: 20: average loss=0.00, average accuracy=0.94





In [6]:
def test(model, dataloader=testloader):
    # TODO: Return the clean accuracy of the model
    model.eval()
    correct = 0
    total = 0

    # Inference only: disable autograd to save memory and time.
    with torch.no_grad(), tqdm(iterable=enumerate(dataloader), total=len(dataloader)) as pbar:
      for i, (img, label) in pbar:
        img = img.to(device)
        label = label.to(device)

        logit = model(img)
        prob = F.softmax(logit, dim=1)
        correct += torch.sum(prob.argmax(dim=1) == label).item()
        total += img.shape[0]

    avg_acc = correct / total
    return avg_acc

In [7]:
print(f'Teacher Accuracy {100 * test(model):.2f}%')

100%|██████████| 79/79 [00:14<00:00,  5.54it/s]

Teacher Accuracy 74.70%





You might want to save this model to avoid retraining.
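
One way to do this, shown only as a minimal sketch, is to save the model's state dict and load it back into a freshly constructed model of the same architecture. The helpers `save_checkpoint` and `load_checkpoint`, the file name, and the tiny stand-in model are illustrative assumptions, not part of the assignment.

```python
import os
import tempfile

import torch
import torch.nn as nn


def save_checkpoint(model: nn.Module, path: str) -> None:
    """Write only the parameters and buffers (the state dict) to disk."""
    torch.save(model.state_dict(), path)


def load_checkpoint(model: nn.Module, path: str, device: torch.device) -> nn.Module:
    """Load a state dict into an already-constructed model of the same architecture."""
    state = torch.load(path, map_location=device)
    model.load_state_dict(state)
    return model.to(device).eval()


# Tiny stand-in model so the sketch runs without the trained teacher.
demo = nn.Linear(4, 2)
ckpt_path = os.path.join(tempfile.mkdtemp(), "teacher_demo.pt")  # hypothetical file name
save_checkpoint(demo, ckpt_path)
restored = load_checkpoint(nn.Linear(4, 2), ckpt_path, torch.device("cpu"))
```

For the teacher in this notebook, the same pattern would apply to `model` and a new `ResNet34()` instance.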

# Model Extraction (20 points)

Here we use knowledge distillation to extract models. If you are confused after the instructions take a look at the next section to understand what we are trying to do. The general steps in Knowledge Distillation are as follows:

1. Set the victim (teacher) to evaluation mode and the attacker (student) to training mode.
2. Use the victim to find the logits for each batch of inputs.
3. Predict the attacker's output for the same batch of inputs.
4. Define and reduce the loss function over the difference between logits from the victim and attacker (use KL-Divergence, ...)
5. Repeat steps 2-4 for the number of epochs.

Feel free to check out [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) to get a better sense of the process.

In [8]:
def knowledge_distillation(victim_model, attacker_model, loader, optimizer, epochs, T):
    # TODO
    for epoch in range(1, epochs + 1):
        attacker_model.train()
        victim_model.eval()
    
        avg_loss = 0
        total = 0
    
        with tqdm(iterable=enumerate(loader), total=len(loader)) as pbar:
          for i, (img, label) in pbar:
            img = img.to(device)
            label = label.to(device)
    
            optimizer.zero_grad()
            with torch.no_grad():
              teacher_logits = victim_model(img)
    
            student_logits = attacker_model(img)
    
            soft_targets = F.softmax(teacher_logits / T, dim=-1)
            soft_prob = F.log_softmax(student_logits / T, dim=-1)
    
            soft_targets_loss = torch.sum(soft_targets * (soft_targets.log() - soft_prob)) / soft_prob.size()[0] * (T**2)


            soft_targets_loss.backward()
            optimizer.step()
    
            avg_loss += soft_targets_loss.item()
            total += img.shape[0]
    
        # Sum of per-batch losses divided by the sample count (not the batch count).
        avg_loss /= total
        print(f'[train student] epoch: {epoch}: average loss={avg_loss:.4f}')
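
The hand-written KL term in the cell above should be equivalent to PyTorch's built-in `F.kl_div` with `reduction='batchmean'`. The sketch below checks that on random logits; `distillation_loss` is a hypothetical helper name and the tensor shapes and `T = 4.0` are arbitrary choices for the check.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, T):
    """Temperature-scaled KL(teacher || student), averaged over the batch."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # reduction='batchmean' sums over classes and divides by the batch size.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)


torch.manual_seed(0)
student = torch.randn(8, 100)  # random stand-in for attacker logits
teacher = torch.randn(8, 100)  # random stand-in for victim logits
T = 4.0

# Manual form, as written in the distillation cell.
soft_targets = F.softmax(teacher / T, dim=-1)
soft_prob = F.log_softmax(student / T, dim=-1)
manual = torch.sum(soft_targets * (soft_targets.log() - soft_prob)) / soft_prob.size(0) * (T ** 2)

builtin = distillation_loss(student, teacher, T)
```

The `T ** 2` factor follows the distillation paper linked above, which scales the soft-target loss so that gradient magnitudes stay comparable across temperatures.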
    

Can you explain how we should set the temperature? Why is this choice appropriate for model extraction?

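`your response:` We set the temperature to a high value (here T = 100). Dividing the logits by a large T flattens the softmax, so the loss depends on the relative sizes of all the victim's logits rather than only on its top-1 class, and every query therefore reveals more about the victim. Because the code above divides both the victim's and the attacker's logits by the same T, matching the softened distributions pushes the attacker's logits toward the victim's logits (up to an additive constant per input). At test time we use T = 1, so the attacker reproduces the victim's confident, near one-hot predictions.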
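
To make the effect of the temperature concrete, the sketch below applies the softmax to one made-up logit vector at several temperatures and also checks that the softmax ignores a constant added to every logit. The logit values are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[8.0, 2.0, 0.0, -1.0]])  # hypothetical logits for one input
temperatures = (1.0, 10.0, 100.0)

# Largest class probability at each temperature: it shrinks toward 1/num_classes.
max_probs = [F.softmax(logits / T, dim=-1).max().item() for T in temperatures]
for T, p in zip(temperatures, max_probs):
    print(f"T={T:>5}: max prob={p:.3f}")

# Softmax is invariant to adding the same constant to every logit, so matching
# softened probabilities pins the logits down only up to such a shift.
shifted = logits + 5.0
same = torch.allclose(F.softmax(logits, dim=-1), F.softmax(shifted, dim=-1))
```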

# Attack Transferability (20 points)

Implement attacks such as FGSM or PGD. You can use code from previous homework assignments or readily available libraries.

In [9]:
# TODO: Load or implement attacks

def attack_fgsm(model, x, y, epsilon):
    # TODO: Return perturbed input
    criterion = nn.CrossEntropyLoss()

    x.requires_grad = True
    model.zero_grad()
    logit = model(x)
    loss = criterion(logit, y)
    loss.backward()

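    adv_x = x + epsilon*x.grad.data.sign()
    # Note: this clamp assumes inputs in [0, 1], but the loaders above apply
    # Normalize(0.5, 0.5), so inputs actually lie in [-1, 1].
    adv_x = torch.clamp(adv_x, 0, 1)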

    return adv_x


def attack_pgd(model, x, y, epsilon, alpha=0.2, num_iters=10):
    # TODO: Return perturbed input
    criterion = nn.CrossEntropyLoss()
    pert_x = x.clone().to(device)

    for _ in range(num_iters):
      pert_x.requires_grad = True
      logit = model(pert_x)

      model.zero_grad()
      loss = criterion(logit, y)
      loss.backward()

      adv_x = pert_x + alpha*pert_x.grad.data.sign()
      eta = torch.clamp(adv_x - x, min=-epsilon, max=epsilon)
      # Same [0, 1] clamp as attack_fgsm; normalized inputs lie in [-1, 1].
      pert_x = torch.clamp(x + eta, min=0, max=1).detach_()

    return pert_x



def test_attack(model, epsilon, attack=attack_fgsm, loader=testloader):
    # TODO: Return the robust accuracy for FGSM or PGD
    model.eval()

    correct = 0
    total = 0
    with tqdm(iterable=enumerate(loader), total=len(loader)) as pbar:
      for i, (img, label) in pbar:
        img = img.to(device)
        label = label.to(device)

        adv_img = attack(model, img, label, epsilon)
        logit = model(adv_img)

        prob = F.softmax(logit, dim=-1)
        correct += torch.sum(prob.argmax(dim=1) == label).item()
        total += img.shape[0]

    avg_acc = correct / total
    return avg_acc
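
The attacks above clamp to `[0, 1]`, while the data loaders normalize each channel with mean 0.5 and std 0.5, which maps pixel values into `[-1, 1]`. One possible way to keep an attack consistent with that normalization is to convert both epsilon and the clamp bounds into normalized space. This is only a sketch under that assumption: `normalized_bounds` and `fgsm_normalized` are hypothetical helpers, the toy classifier is a stand-in, and the results reported in this notebook were not produced with this variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Per-channel statistics matching Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)).
MEAN = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)
STD = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)


def normalized_bounds(mean, std):
    """Map the pixel range [0, 1] into normalized input space."""
    return (0.0 - mean) / std, (1.0 - mean) / std


def fgsm_normalized(model, x, y, epsilon, mean=MEAN, std=STD):
    """FGSM on normalized inputs; epsilon is given in [0, 1] pixel units."""
    mean, std = mean.to(x.device), std.to(x.device)
    lower, upper = normalized_bounds(mean, std)
    x = x.clone().detach().requires_grad_(True)  # do not mutate the caller's tensor
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, x)
    adv = x + (epsilon / std) * grad.sign()      # rescale the step per channel
    return torch.maximum(torch.minimum(adv, upper), lower).detach()


# Tiny stand-in classifier so the sketch runs on its own.
torch.manual_seed(0)
toy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
x = torch.rand(4, 3, 8, 8) * 2 - 1   # inputs already in the normalized range
y = torch.randint(0, 10, (4,))
adv = fgsm_normalized(toy, x, y, epsilon=8 / 255)
```

With these statistics, a pixel-space budget of 8/255 corresponds to 16/255 in normalized space.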

Fill in the following function to attack a model and report the accuracy of the victim on the adversarial examples generated using the available model.

In [10]:
def transferability_attack(model, victim, loader, attack, eps):
    # TODO
    victim.eval()

    correct = 0
    total = 0
    with tqdm(iterable=enumerate(loader), total=len(loader)) as pbar:
      for i, (img, label) in pbar:
        img = img.to(device)
        label = label.to(device)

        adv_img = attack(model, img, label, eps)
        logit = victim(adv_img)

        prob = F.softmax(logit, dim=-1)
        correct += torch.sum(prob.argmax(dim=1) == label).item()
        total += img.shape[0]

    avg_acc = correct / total
    return avg_acc

# CIFAR10 (35 points)

## Loading and Exploration (5 points)

First load the `CIFAR10` dataset.

In [11]:
# TODO: Load CIFAR-10 dataset


cifar10_trainset = datasets.cifar.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
cifar10_trainloader = DataLoader(cifar10_trainset, batch_size=batch_size, shuffle=True, num_workers=2)

cifar10_testset = datasets.cifar.CIFAR10(root='./data', train=False, download=True, transform=test_transform)
cifar10_testloader = DataLoader(cifar10_testset, batch_size=batch_size, shuffle=False, num_workers=2)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:03<00:00, 48118380.38it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [12]:
combined_dataset = ConcatDataset([cifar10_trainset, cifar10_testset])
combined_loader = DataLoader(combined_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

Which classes from the `CIFAR10` dataset are present in `CIFAR100` classes?

In [13]:
# TODO: Check if classes are present in both datasets

intersection = [x for x in cifar10_trainset.classes if x.lower() in trainset.classes]
print(intersection)

[]
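
The exact-name comparison above prints an empty list. A looser check, offered only as a sketch, is substring matching, which can surface related labels such as `truck` and `pickup_truck`. The CIFAR100 list below is a small illustrative subset rather than all 100 labels, and `loose_matches` is a hypothetical helper.

```python
cifar10_classes = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]
# Illustrative subset of CIFAR100 fine labels, not the full list of 100.
cifar100_subset = ["apple", "bicycle", "bus", "pickup_truck", "streetcar", "tractor"]


def loose_matches(names_a, names_b):
    """Pairs (a, b) where one name occurs as a substring of the other."""
    return [(a, b) for a in names_a for b in names_b if a in b or b in a]


matches = loose_matches(cifar10_classes, cifar100_subset)
print(matches)  # [('truck', 'pickup_truck')]
```

Substring matching can also produce false positives on short names, so any hits are worth checking by hand.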


Now use the test dataset from `CIFAR10` to extract the model.

## Pre-trained ResNet18 (10 points)

Use the pre-trained ResNet18 model and extract the victim using knowledge distillation.

In [14]:
# TODO: Load pretrained ResNet-18 model
model1 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device)
model1.fc = nn.Linear(512, 100).to(device)
# TODO: Extract the model

optimizer = torch.optim.Adam(model1.parameters(), lr=0.01)
knowledge_distillation(model, model1, cifar10_testloader, optimizer, 20, 100)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 180MB/s]
100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 1: average loss=0.0591



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 2: average loss=0.0509



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 3: average loss=0.0472



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 4: average loss=0.0449



100%|██████████| 79/79 [00:24<00:00,  3.17it/s]

[train student] epoch: 5: average loss=0.0427



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 6: average loss=0.0404



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 7: average loss=0.0383



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 8: average loss=0.0362



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 9: average loss=0.0342



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 10: average loss=0.0320



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 11: average loss=0.0337



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 12: average loss=0.0307



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 13: average loss=0.0283



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 14: average loss=0.0264



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 15: average loss=0.0248



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 16: average loss=0.0251



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 17: average loss=0.0229



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 18: average loss=0.0206



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 19: average loss=0.0197



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 20: average loss=0.0190





What is the accuracy of the extracted model on the `CIFAR100` test set?

In [15]:
# TODO: Report accuracy on CIFAR100
print(f'Clean Accuracy on cifar100: {100 * test(model1, testloader):.2f}%')
print(f'Robust Accuracy with FGSM epsilon=8/255 on cifar100: {100 * test_attack(model1, 8/255, attack_fgsm):.2f}%')
print(f'Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: {100 * test_attack(model1, 8/255, attack_pgd):.2f}%')

100%|██████████| 79/79 [00:11<00:00,  6.60it/s]

Clean Accuracy on cifar100: 12.39%



100%|██████████| 79/79 [00:27<00:00,  2.91it/s]

Robust Accuracy with FGSM epsilon=8/255 on cifar100: 1.84%



100%|██████████| 79/79 [03:19<00:00,  2.52s/it]

Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: 0.30%





In [16]:
print(f'transfer attack accuracy with FGSM epsilon=8/255: {100 * transferability_attack(model1, model, testloader, attack_fgsm, 8/255):.2f}%')
print(f'transfer attack accuracy with PGD epsilon=8/255: {100 * transferability_attack(model1, model, testloader, attack_pgd, 8/255):.2f}%')

100%|██████████| 79/79 [00:29<00:00,  2.65it/s]

transfer attack accuracy with FGSM epsilon=8/255: 19.73%



100%|██████████| 79/79 [03:22<00:00,  2.56s/it]

transfer attack accuracy with PGD epsilon=8/255: 19.53%





In [17]:
torch.save(model1.state_dict(), 'model1.pt')

## ResNet18 (10 points)

Repeat the previous steps but without pre-training.

In [18]:
# TODO: Load ResNet-18 model

model2 = models.resnet18(weights=None).to(device)
model2.fc = nn.Linear(512, 100).to(device)
# TODO: Extract the model

optimizer = torch.optim.Adam(model2.parameters(), lr=0.01)
knowledge_distillation(model, model2, cifar10_testloader, optimizer, 20, 100)

100%|██████████| 79/79 [00:24<00:00,  3.17it/s]

[train student] epoch: 1: average loss=0.0560



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 2: average loss=0.0489



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 3: average loss=0.0452



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 4: average loss=0.0427



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 5: average loss=0.0405



100%|██████████| 79/79 [00:24<00:00,  3.17it/s]

[train student] epoch: 6: average loss=0.0384



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 7: average loss=0.0363



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 8: average loss=0.0343



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 9: average loss=0.0323



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 10: average loss=0.0302



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 11: average loss=0.0283



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 12: average loss=0.0268



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 13: average loss=0.0254



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 14: average loss=0.0234



100%|██████████| 79/79 [00:25<00:00,  3.16it/s]

[train student] epoch: 15: average loss=0.0212



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 16: average loss=0.0190



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 17: average loss=0.0175



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 18: average loss=0.0169



100%|██████████| 79/79 [00:25<00:00,  3.15it/s]

[train student] epoch: 19: average loss=0.0158



100%|██████████| 79/79 [00:24<00:00,  3.16it/s]

[train student] epoch: 20: average loss=0.0141





Measure the accuracy of the newly distilled attacker and compare the results with those from the previous section.

In [19]:
# TODO: Report accuracy on CIFAR100
print(f'Clean Accuracy on cifar100: {100 * test(model2, testloader):.2f}%')
print(f'Robust Accuracy with FGSM epsilon=8/255 on cifar100: {100 * test_attack(model2, 8/255, attack_fgsm):.2f}%')
print(f'Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: {100 * test_attack(model2, 8/255, attack_pgd):.2f}%')

100%|██████████| 79/79 [00:11<00:00,  6.71it/s]

Clean Accuracy on cifar100: 13.87%



100%|██████████| 79/79 [00:27<00:00,  2.91it/s]

Robust Accuracy with FGSM epsilon=8/255 on cifar100: 1.86%



100%|██████████| 79/79 [03:19<00:00,  2.52s/it]

Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: 0.28%





In [20]:
print(f'transfer attack accuracy with FGSM epsilon=8/255: {100 * transferability_attack(model2, model, testloader, attack_fgsm, 8/255):.2f}%')
print(f'transfer attack accuracy with PGD epsilon=8/255: {100 * transferability_attack(model2, model, testloader, attack_pgd, 8/255):.2f}%')

100%|██████████| 79/79 [00:29<00:00,  2.65it/s]

transfer attack accuracy with FGSM epsilon=8/255: 21.41%



100%|██████████| 79/79 [03:22<00:00,  2.56s/it]

transfer attack accuracy with PGD epsilon=8/255: 21.12%





In [21]:
torch.save(model2.state_dict(), 'model2.pt')

What are the effects of pre-training?

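`your response:` In these runs, pre-training did not raise the attacker's clean accuracy on the CIFAR100 test set, which we use as a proxy for fidelity (12.39% with pre-training vs. 13.87% without). It did yield slightly more effective adversarial examples for the transferability attack: the victim's accuracy falls to 19.73% (FGSM) and 19.53% (PGD) with the pre-trained attacker, compared with 21.41% and 21.12% without pre-training.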
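
The responses in this notebook use test accuracy as a stand-in for fidelity. Fidelity can also be measured directly as the fraction of inputs on which the attacker and the victim predict the same class, regardless of the true label. The sketch below assumes that definition; `fidelity` is a hypothetical helper and the linear models and random batches are stand-ins so that it runs on its own.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def fidelity(attacker, victim, batches):
    """Fraction of inputs on which attacker and victim predict the same class."""
    attacker.eval()
    victim.eval()
    agree, total = 0, 0
    for img in batches:
        agree += (attacker(img).argmax(dim=1) == victim(img).argmax(dim=1)).sum().item()
        total += img.shape[0]
    return agree / total


# Stand-in models and data; in the notebook these would be the distilled
# attacker, the ResNet34 victim, and image batches from a test loader.
torch.manual_seed(0)
victim_demo = nn.Linear(16, 5)
attacker_demo = nn.Linear(16, 5)
batches = [torch.randn(32, 16) for _ in range(3)]

self_fidelity = fidelity(victim_demo, victim_demo, batches)     # a model always agrees with itself
cross_fidelity = fidelity(attacker_demo, victim_demo, batches)  # unrelated models agree only by chance
```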

## Full Dataset (10 points)

Repeat your experiments using the pre-trained ResNet18 but this time use the entire CIFAR10 dataset.

In [22]:
# TODO: Load pretrained ResNet-18 model
model3 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device)
model3.fc = nn.Linear(512, 100).to(device)
# TODO: Extract the model

optimizer = torch.optim.Adam(model3.parameters(), lr=0.01)
knowledge_distillation(model, model3, combined_loader, optimizer, 20, 100)

100%|██████████| 469/469 [02:27<00:00,  3.17it/s]

[train student] epoch: 1: average loss=0.0460



100%|██████████| 469/469 [02:28<00:00,  3.17it/s]

[train student] epoch: 2: average loss=0.0323



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 3: average loss=0.0246



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 4: average loss=0.0203



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 5: average loss=0.0178



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 6: average loss=0.0158



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 7: average loss=0.0145



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 8: average loss=0.0135



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 9: average loss=0.0126



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 10: average loss=0.0120



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 11: average loss=0.0115



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 12: average loss=0.0110



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 13: average loss=0.0106



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 14: average loss=0.0103



100%|██████████| 469/469 [02:28<00:00,  3.15it/s]

[train student] epoch: 15: average loss=0.0100



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 16: average loss=0.0097



100%|██████████| 469/469 [02:28<00:00,  3.15it/s]

[train student] epoch: 17: average loss=0.0094



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 18: average loss=0.0092



100%|██████████| 469/469 [02:28<00:00,  3.16it/s]

[train student] epoch: 19: average loss=0.0090



100%|██████████| 469/469 [02:28<00:00,  3.15it/s]

[train student] epoch: 20: average loss=0.0088





Report the accuracy on the `CIFAR100` testset once more.

In [23]:
# TODO: Report accuracy on CIFAR100
print(f'Clean Accuracy on cifar100: {100 * test(model3, testloader):.2f}%')
print(f'Robust Accuracy with FGSM epsilon=8/255 on cifar100: {100 * test_attack(model3, 8/255, attack_fgsm):.2f}%')
print(f'Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: {100 * test_attack(model3, 8/255, attack_pgd):.2f}%')

100%|██████████| 79/79 [00:12<00:00,  6.44it/s]

Clean Accuracy on cifar100: 53.94%



100%|██████████| 79/79 [00:27<00:00,  2.89it/s]

Robust Accuracy with FGSM epsilon=8/255 on cifar100: 5.25%



100%|██████████| 79/79 [03:19<00:00,  2.53s/it]

Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: 0.59%





In [24]:
print(f'transfer attack accuracy with FGSM epsilon=8/255: {100 * transferability_attack(model3, model, testloader, attack_fgsm, 8/255):.2f}%')
print(f'transfer attack accuracy with PGD epsilon=8/255: {100 * transferability_attack(model3, model, testloader, attack_pgd, 8/255):.2f}%')

100%|██████████| 79/79 [00:29<00:00,  2.65it/s]

transfer attack accuracy with FGSM epsilon=8/255: 12.06%



100%|██████████| 79/79 [03:22<00:00,  2.56s/it]

transfer attack accuracy with PGD epsilon=8/255: 10.59%





In [25]:
torch.save(model3.state_dict(), 'model3.pt')

What are the effects of using more data?

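`your response:` Using more data gives a higher-fidelity attacker: clean accuracy on the CIFAR100 test set rises to 53.94%, from 12.39% when distilling on the CIFAR10 test set alone. The transferability attack is also more effective, lowering the victim's accuracy to 12.06% (FGSM) and 10.59% (PGD), compared with 19.73% and 19.53% before.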

# CIFAR100 (10 points)

This time, use the training dataset from `CIFAR100` and perform knowledge distillation on a pre-trained ResNet18.

In [26]:
# TODO: Load pretrained ResNet-18 model
model4 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).to(device)
model4.fc = nn.Linear(512, 100).to(device)
# TODO: Extract the model

optimizer = torch.optim.Adam(model4.parameters(), lr=0.01)
knowledge_distillation(model, model4, trainloader, optimizer, 20, 100)

100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 1: average loss=0.0758



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 2: average loss=0.0628



100%|██████████| 391/391 [02:03<00:00,  3.15it/s]

[train student] epoch: 3: average loss=0.0485



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 4: average loss=0.0377



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 5: average loss=0.0306



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 6: average loss=0.0262



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 7: average loss=0.0233



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 8: average loss=0.0211



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 9: average loss=0.0194



100%|██████████| 391/391 [02:03<00:00,  3.15it/s]

[train student] epoch: 10: average loss=0.0181



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 11: average loss=0.0171



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 12: average loss=0.0164



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 13: average loss=0.0155



100%|██████████| 391/391 [02:03<00:00,  3.15it/s]

[train student] epoch: 14: average loss=0.0149



100%|██████████| 391/391 [02:03<00:00,  3.15it/s]

[train student] epoch: 15: average loss=0.0145



100%|██████████| 391/391 [02:03<00:00,  3.15it/s]

[train student] epoch: 16: average loss=0.0139



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 17: average loss=0.0135



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 18: average loss=0.0132



100%|██████████| 391/391 [02:03<00:00,  3.16it/s]

[train student] epoch: 19: average loss=0.0127



100%|██████████| 391/391 [02:03<00:00,  3.15it/s]

[train student] epoch: 20: average loss=0.0125





How does the accuracy change now?

In [27]:
# TODO: Report accuracy on CIFAR100
print(f'Clean Accuracy on cifar100: {100 * test(model4, testloader):.2f}%')
print(f'Robust Accuracy with FGSM epsilon=8/255 on cifar100: {100 * test_attack(model4, 8/255, attack_fgsm):.2f}%')
print(f'Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: {100 * test_attack(model4, 8/255, attack_pgd):.2f}%')

100%|██████████| 79/79 [00:11<00:00,  6.61it/s]

Clean Accuracy on cifar100: 67.70%



100%|██████████| 79/79 [00:27<00:00,  2.90it/s]

Robust Accuracy with FGSM epsilon=8/255 on cifar100: 8.35%



100%|██████████| 79/79 [03:19<00:00,  2.53s/it]

Robust Accuracy with PGD epsilon=8/255, iters=10 on cifar100: 1.32%





In [28]:
print(f'transfer attack accuracy with FGSM epsilon=8/255: {100 * transferability_attack(model4, model, testloader, attack_fgsm, 8/255):.2f}%')
print(f'transfer attack accuracy with PGD epsilon=8/255: {100 * transferability_attack(model4, model, testloader, attack_pgd, 8/255):.2f}%')

100%|██████████| 79/79 [00:29<00:00,  2.65it/s]

transfer attack accuracy with FGSM epsilon=8/255: 12.64%



100%|██████████| 79/79 [03:22<00:00,  2.56s/it]

transfer attack accuracy with PGD epsilon=8/255: 9.74%





In [29]:
torch.save(model4.state_dict(), 'model4.pt')

Why do you suppose using `CIFAR100` produced these results? Explain your observations.

`your response:` Here the accuracy on the CIFAR100 test set is the highest of all settings (67.70%). The reason is that the distillation queries are in-distribution: they come from the CIFAR100 training set, which is the data the victim was trained on. The victim's outputs on these samples are the most informative about its behavior on CIFAR100, so distilling on them yields a more accurate extracted model.

In [30]:
! rm -r data/