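This tutorial shows how to use the CW2 algorithm (the L2 attack of Carlini and Wagner) in PyTorch to attack a CNN/MLP model pretrained on the MNIST dataset. Before running this file, run the following script to generate the model: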

    cd tutorials
    python mnist_model_pytorch.py
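
For reference, CW2 as formulated by Carlini and Wagner minimizes ||x' - x||² + c·f(x'), where f(x') = max(Z_t - max_{j≠t} Z_j, -κ) is a margin on the logits Z for the true label t. Pixels are kept inside their valid range by optimizing an unconstrained variable w and mapping it through x' = (tanh(w) + 1)/2. The sketch below writes this objective in plain Python for a single input. It is not advbox's CW_L2_Attack implementation, which additionally runs gradient descent on w. The helper names and the toy two-class model are hypothetical.

```python
import math


def to_box(w, lo=0.0, hi=1.0):
    # x' = lo + (hi - lo) * (tanh(w) + 1) / 2 always stays inside [lo, hi]
    return [lo + (hi - lo) * (math.tanh(v) + 1.0) / 2.0 for v in w]


def from_box(x, lo=0.0, hi=1.0, eps=1e-6):
    # inverse of to_box; clip away from the box edges so atanh stays finite
    w = []
    for v in x:
        t = 2.0 * (v - lo) / (hi - lo) - 1.0
        t = max(-1.0 + eps, min(1.0 - eps, t))
        w.append(math.atanh(t))
    return w


def margin_loss(logits, true_label, kappa=0.0):
    # untargeted f(x') = max(Z_t - max_{j != t} Z_j, -kappa)
    other = max(z for j, z in enumerate(logits) if j != true_label)
    return max(logits[true_label] - other, -kappa)


def cw_l2_objective(w, x, logits_fn, true_label, c, kappa=0.0):
    # squared L2 distortion plus c times the margin, evaluated through w
    x_adv = to_box(w)
    dist = sum((a - b) ** 2 for a, b in zip(x_adv, x))
    return dist + c * margin_loss(logits_fn(x_adv), true_label, kappa)


def toy_logits(v):
    # hypothetical 2-class "model": logit 0 = sum of pixels, logit 1 = 0.5
    return [sum(v), 0.5]


x = [0.2, 0.7]
w0 = from_box(x)  # start the optimization at the clean input
print(cw_l2_objective(w0, x, toy_logits, true_label=0, c=2.0))
```

At the clean input the distortion term is zero, so the objective reduces to c times the margin (here 2 × (0.9 − 0.5)); an optimizer then trades a growing distortion against driving that margin down to −κ.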
     

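Using an Anaconda environment in Jupyter Notebook requires some extra configuration, because by default Jupyter runs the system's default Python environment. The steps below use the advbox environment as an example.
First, from the default system environment, run the following commands to install ipykernel both in that environment and in the advbox environment.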

    conda install ipykernel
    conda install -n advbox ipykernel

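Next, activate the advbox environment (conda activate advbox) and register it as a Jupyter kernel with the command below. After Jupyter is started, advbox appears in the interface as a selectable kernel.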

    python -m ipykernel install --user --name advbox --display-name advbox 
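
To confirm that a notebook is actually running in the advbox kernel, a quick check such as the following can be run in a cell. It is a minimal sketch that assumes the usual conda layout, where a named environment lives under .../envs/advbox, so the last component of sys.prefix is the environment name; kernel_env_name is a hypothetical helper.

```python
import os
import sys


def kernel_env_name():
    # hypothetical helper: under conda, sys.prefix ends in .../envs/<name>
    return os.path.basename(os.path.normpath(sys.prefix))


print("Python executable:", sys.executable)
print("Environment name :", kernel_env_name())
```

If the second line prints advbox, the kernel registration worked. If it prints the name of the base installation directory or a system path instead, select the advbox kernel in Jupyter before running the cells below.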


In [1]:
# Debug switch: uncomment the logging lines below to enable logging output
import logging
#logging.basicConfig(level=logging.INFO,format="%(filename)s[line:%(lineno)d] %(levelname)s %(message)s")
#logger=logging.getLogger(__name__)
import sys
import torch
import torchvision
from torchvision import datasets, transforms
from torch.autograd import Variable
import torch.utils.data.dataloader as Data
from advbox.adversary import Adversary
from advbox.attacks.cw2_pytorch import CW_L2_Attack
from advbox.models.pytorch import PytorchModel
from tutorials.mnist_model_pytorch import Net

train_data: torch.Size([60000, 28, 28])
train_labels: torch.Size([60000])
test_data: torch.Size([10000, 28, 28])
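
The attack cell that follows warns that CW must be fed the logit layer, not softmax probabilities. The plain-Python sketch below illustrates why, using the untargeted CW margin max(Z_t - max_{j≠t} Z_j, -κ) on a confidently classified 10-class example. Measured on logits, the gradient has norm √2 ≈ 1.41; pushed through softmax it shrinks to roughly 1e-9, because saturated probabilities barely react to changes in the logits. The logit values are made up for illustration, and the hand-written gradient assumes the margin is above -κ and the runner-up class is unique.

```python
import math


def softmax(z):
    # numerically stable softmax: shift by the max before exponentiating
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cw_margin(scores, true_label, kappa=0.0):
    # untargeted CW term: max(s_t - max_{j != t} s_j, -kappa)
    other = max(s for j, s in enumerate(scores) if j != true_label)
    return max(scores[true_label] - other, -kappa)


def margin_grad(z, true_label, use_softmax):
    """Gradient of cw_margin with respect to the logits z.

    Assumes the margin is above -kappa (non-clamped branch) and that the
    runner-up class is unique.
    """
    scores = softmax(z) if use_softmax else list(z)
    runner_up = max((j for j in range(len(z)) if j != true_label),
                    key=lambda j: scores[j])
    # d(s_t - s_runner_up) / d scores
    g = [0.0] * len(z)
    g[true_label] = 1.0
    g[runner_up] = -1.0
    if not use_softmax:
        return g
    # chain rule through softmax: grad_z = p * (g - <g, p>)
    dot = sum(gi * pi for gi, pi in zip(g, scores))
    return [pi * (gi - dot) for gi, pi in zip(g, scores)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


# a confidently classified sample: class 0 dominates the logits
z = [25.0, 3.0, 1.0, 0.5, -1.0, 0.0, 2.0, -2.0, 0.3, 1.5]
print("grad norm on logits :", norm(margin_grad(z, 0, use_softmax=False)))
print("grad norm on softmax:", norm(margin_grad(z, 0, use_softmax=True)))
```

With a gradient this small, the c·f(x') term contributes almost nothing to each update, which matches the non-convergence described in the comment of the next cell.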


In [2]:
TOTAL_NUM = 100
pretrained_model="tutorials/mnist-pytorch/net.pth"
loss_func = torch.nn.CrossEntropyLoss()

# Use the MNIST test set and randomly pick TOTAL_NUM samples (shuffle=True, batch_size=1)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('tutorials/mnist-pytorch/data', train=False, download=True, transform=transforms.Compose([
        transforms.ToTensor(),
    ])),
    batch_size=1, shuffle=True)

# Define what device we are using
logging.info("CUDA Available: {}".format(torch.cuda.is_available()))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Note: for CW, the model must output logits, not softmax probabilities; otherwise the gradients almost certainly vanish and the attack never converges
# Initialize the network
model = Net().to(device)

# Load the pretrained model
model.load_state_dict(torch.load(pretrained_model, map_location='cpu'))

# Set the model in evaluation mode. In this case this is for the Dropout layers
model.eval()

# Wrap the PyTorch model for advbox: loss function, input value bounds (0.0, 1.0), channels on axis 1
m = PytorchModel(
    model, loss_func,(0.0, 1.0),
    channel_axis=1)

# Instantiate CW_L2_Attack
attack = CW_L2_Attack(m)
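# Attack settings: number of classes (num_labels), maximum iterations (max_iterations), number of binary-search steps (binary_search_steps), initial value of the constant c (initial_const)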
attack_config = {"num_labels": 10,"max_iterations":1000,"binary_search_steps":4,"initial_const":100.0}

# use test data to generate adversarial examples
total_count = 0
fooling_count = 0

for i, data in enumerate(test_loader):
    inputs, labels = data
    inputs, labels=inputs.numpy(),labels.numpy()

    total_count += 1
    adversary = Adversary(inputs, labels[0])

    # CW L2 non-targeted attack
    adversary = attack(adversary, **attack_config)

    if adversary.is_successful():
        fooling_count += 1
        print(
            'attack success, original_label=%d, adversarial_label=%d, count=%d'
            % (labels[0], adversary.adversarial_label, total_count))

    else:
        print('attack failed, original_label=%d, count=%d' %
              (labels[0], total_count))

    if total_count >= TOTAL_NUM:
        print(
            "[TEST_DATASET]: fooling_count=%d, total_count=%d, fooling_rate=%f"
            % (fooling_count, total_count,
               float(fooling_count) / total_count))
        break
print("cw2 attack done")
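
The binary_search_steps and initial_const settings in the cell above control the outer search over c, the weight that trades distortion against misclassification. The sketch below follows the update rule of the original Carlini and Wagner reference code; whether advbox's CW_L2_Attack uses exactly the same rule is an assumption. After a success, c is bisected downward; after a failure, it is bisected upward, or multiplied by 10 while no upper bound has been found. attack_succeeds is a hypothetical stand-in for one inner optimization run of up to max_iterations steps.

```python
def search_const(attack_succeeds, initial_const=100.0, binary_search_steps=4,
                 upper_cap=1e10):
    """Hypothetical helper: binary search over the CW trade-off constant c.

    attack_succeeds(c) -> bool represents one inner optimization at fixed c.
    Returns the smallest successful c (or None) and the (c, success) history.
    """
    lower, upper = 0.0, upper_cap
    c = initial_const
    best_c = None
    history = []
    for _ in range(binary_search_steps):
        ok = attack_succeeds(c)
        history.append((c, ok))
        if ok:
            # success: keep it and try a smaller c (less weight on misclassification)
            best_c = c if best_c is None else min(best_c, c)
            upper = min(upper, c)
            c = (lower + upper) / 2.0
        else:
            # failure: c was too small
            lower = max(lower, c)
            if upper < upper_cap:
                c = (lower + upper) / 2.0
            else:
                c *= 10.0  # no upper bound found yet, grow c quickly
    return best_c, history


best_c, history = search_const(lambda c: c >= 30.0)
print(best_c, history)
```

For example, with an oracle that succeeds whenever c >= 30, the default settings try c = 100, 50, 25, 37.5 and return 37.5. A large initial_const such as 100.0 makes the first run likely to succeed, so the remaining steps mostly shrink c toward a smaller perturbation.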


cuda
attack success, original_label=9, adversarial_label=4, count=1
attack success, original_label=3, adversarial_label=2, count=2
attack success, original_label=1, adversarial_label=7, count=3
attack success, original_label=5, adversarial_label=3, count=4
attack success, original_label=0, adversarial_label=2, count=5
attack success, original_label=6, adversarial_label=4, count=6
attack success, original_label=2, adversarial_label=8, count=7
attack success, original_label=1, adversarial_label=4, count=8
attack success, original_label=8, adversarial_label=9, count=9
attack success, original_label=1, adversarial_label=6, count=10
attack success, original_label=2, adversarial_label=1, count=11
attack success, original_label=4, adversarial_label=9, count=12
attack success, original_label=8, adversarial_label=3, count=13
attack success, original_label=7, adversarial_label=9, count=14
attack success, original_label=6, adversarial_label=5, count=15
attack success, original_label=1, adversaria