Add support for multiple samples in SimBA.generate #1422

beat-buesser · 2021-11-22T00:13:32Z

Signed-off-by: Beat Buesser beat.buesser@ie.ibm.com

Description

This pull request adds support for multiple samples in SimBA.generate.

Fixes #1407

Type of change

Please check all relevant options.

Improvement (non-breaking)
Bug fix (non-breaking)
New feature (non-breaking)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Beat Buesser <beat.buesser@ie.ibm.com>

codecov-commenter · 2021-11-22T00:16:45Z

Codecov Report

Merging #1422 (d1ba704) into dev_1.9.0 (e578d5b) will increase coverage by 0.00%.
The diff coverage is 82.45%.

@@            Coverage Diff             @@
##           dev_1.9.0    #1422   +/-   ##
==========================================
  Coverage      90.28%   90.29%           
==========================================
  Files            240      240           
  Lines          19669    19679   +10     
  Branches        3487     3490    +3     
==========================================
+ Hits           17759    17769   +10     
  Misses          1113     1113           
  Partials         797      797

Impacted Files	Coverage Δ
art/attacks/evasion/simba.py	`83.40% <82.45%> (-1.10%)`	⬇️
art/defences/detector/poison/activation_defence.py	`86.19% <0.00%> (+1.34%)`	⬆️

Embeddave · 2021-11-22T20:43:18Z

Hi @beat-buesser thank you for making this fix and inviting me to review.

I would be happy to but I'm working a short week because of a US holiday.
Would it work for you if I tested early next week?

beat-buesser · 2021-11-22T22:29:01Z

@Embeddave Thank you very much, next week would be perfect. Happy Thanksgiving!

beat-buesser · 2021-12-06T20:17:46Z

Hi @Embeddave What do you think about this pull request?

Embeddave · 2021-12-06T20:29:07Z

Hi @beat-buesser was going to leave you a message today

I found a major bug in my own code in the process of testing this and needed to put out that fire first

I can check early this week

we typically run with batch_size=1 as the two-six teams lab does when running GARD evals so I haven't needed this specific functionality

beat-buesser · 2021-12-06T20:37:08Z

@Embeddave Thank you for the update, that should work. I was sure you had not forgotten.

Embeddave · 2021-12-06T20:40:30Z

I have some separate feedback but I will reply on #1407 -- it's off-topic for this PR

Signed-off-by: Beat Buesser <beat.buesser@ie.ibm.com>

Embeddave · 2021-12-08T00:24:20Z

art/attacks/evasion/simba.py

-        if self.estimator.nb_classes == 2 and preds.shape[1] == 1:
+        if not is_probability(y_prob_pred):
+            raise ValueError(
+                "This attack requires an estimator predicting probabilities. It looks like the current "


This is definitely helpful but maybe change the error message to recommend a fix?

"This attack requires an estimator predicting probabilities. The output of your `Estimator.predict` should sum to `1.0`, as checked by `art.utils.is_probability`. Try adding a Softmax layer to your model and/or using the `art.BlackBoxEstimator` class"

@Embeddave Yes, that should be helpful. I'm not sure if I understand art.BlackBoxEstimator?

I'm not sure if I understand either 🙂

I should have said BlackBoxClassifier

In truth it's not clear to me whether that would be appropriate and/or expected to use with this attack -- I thought I had read an example notebook that used the BlackBoxClassifier wrapper with a similar attack but now I can't find it

So, nevermind :)

but a concrete solution like "add softmax layer" I think we agree could be good?

Embeddave · 2021-12-08T00:29:26Z

👍
the additional commit changing the variable name and using the validator LGTM

I tested with my code that the attack is working as before when batch size = 1.
Seems to be working, AFAICT.
I would share so you can see how I'm testing, but the code involves some in-house libraries that are not public (nothing special, just not installable by anybody).

I am testing the batch size > 1 functionality now.

Embeddave

@beat-buesser in line ~331 (it won't let me tag in review) of _check_params I think you still need to remove the condition that checks whether batch size is greater than 1. I get

The batch size `batch_size` has to be 1 in this implementation.

I can test again after you remove that

beat-buesser · 2021-12-08T13:13:33Z

@Embeddave Thank you, I have added you to the list of collaborators in the repo which upon accepting the invite in your email should allow you to make inline comments.

I think the batch size still has to be 1 only because the algorithm processes one sample at a time and does not support processing in larger mini-batches. But with this version it is now processing all samples of x.

Embeddave · 2021-12-08T17:39:07Z

Hi again @beat-buesser I'm not sure the ability to run multiple images at a time is working as expected.

If I pass in a single x with multiple images, I get a different answer compared to if I pass in each image one at a time.

Here's a hopefully minimal reproducible example

from functools import partial
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import SimBA
import numpy as np
import torch
import torch.nn as nn
from torchvision import transforms, datasets
from tqdm import tqdm

def run_batch(model, x, y, normalizer=None):
    """runs a batch of images through model to get
    predictions, compute which are correct, and
    from that compute accuracy

    Parameters
    ----------
    model : torch.nn.Module
        instance of a neural network model
    x : torch.Tensor
        input to network
    y : torch.Tensor
        ground truth that predictions should match
    normalizer : robustness.Normalizer
        used to normalize inputs to network.
        Default is None, in which case no normalization is applied.

    Returns
    -------
    y_pred : list
        of class predictions, same size as y
    y_prob : list
        of float, the probabilities that the model
        assigned to the classes in ``y_pred``.
        I.e., the scalar value from softmax(output)
        indexed by argmax(softmax(output)),
        for each input in ``x``.
    acc : float
        number correct / total number of samples in batch
    """
    if normalizer is not None:
        x = normalizer(x)
    with torch.no_grad():
        out = model(x)
        y_pred = out.argmax(dim=1)
        y_prob = torch.nn.Softmax()(out)[:, y_pred].flatten()
    correct = (y_pred == y).cpu().numpy().tolist()
    acc = sum(correct) / len(correct)
    return y_pred.cpu().numpy().tolist(), y_prob.cpu().numpy().tolist(), acc

# ... set up model etc
# for simba attack, need to make model output probabilities
model.fc = nn.Sequential(
    *[model.fc, nn.Softmax(dim=1)],
)

log_or_print('setting up adversarial attacks')

# OMITTED custom code that uses `robustness` library
# to get dataloaders etc
# we're using Imagenette dataset from fastai

# constants from torchvision references training script
mean = np.asarray(constants.IMAGENET_MEAN).reshape((3, 1, 1)).astype(np.float32)
std = np.asarray(constants.IMAGENET_STD).reshape((3, 1, 1)).astype(np.float32)
    nb_classes = 1000
    input_shape = (3, 224, 224)

clf = PyTorchClassifier(
    model=model,
    clip_values=(0., 1.),
    loss=criterion,
    input_shape=input_shape,
    nb_classes=nb_classes,
    preprocessing=(mean, std)
)

attack = SimBA(classifier=clf,
               attack='dct',
               max_iter=5000,
               order='random',
               epsilon=0.9,
               freq_dim=28,
               stride=7,
               )

MAX_GOOD_BATCHES = 4

good_batches = []
n_good_batches = 0

run_batch_partial = partial(helpers.run_batch,
                            model=model,
                            normalizer=normalizer,
                            y=y)

for batch in test_loader:
    x, y = batch['img'].to(device), batch['target'].to(device)
    source_img = batch['source']
    y_true = y.cpu().numpy().tolist()

    # ---- unattacked
    log_or_print('\trunning unattacked batch')
    y_pred, correct_unat, acc_unat = run_batch_partial(x=x)

    log_or_print(
        f'\t\tbatch accuracy: {acc_unat:.3f}'
    )
    if acc_unat > 0.99:
        good_batches.append(x)
        n_good_batches += 1
        if n_good_batches > MAX_GOOD_BATCHES:
            break

x = good_batches[2]

with torch.no_grad():
    y_probs = model(normalizer(x))
y_probs = y_probs.cpu().numpy()
x_np = x.cpu().numpy()
x_at = attack.generate(x=x_np, y=y_probs)

x_at_tensor = torch.from_numpy(x_at).to(device)
y_pred_at, correct_at, acc_at = run_batch_partial(x=x_at_tensor)
log_or_print(
    f'\t\tbatch accuracy: {acc_at:.3f}'
)

# gives me batch accuracy: 0.667

for x_img in x:
    x_img = torch.unsqueeze(x_img, 0)
    with torch.no_grad():
        y_probs = model(
            normalizer(x_img)
        )
    y_probs = y_probs.cpu().numpy()
    x_img = x_img.cpu().numpy()
    x_at = attack.generate(x=x_img, y=y_probs)

    x_at_tensor = torch.from_numpy(x_at).to(device)
    y_pred_at, correct_at, acc_at = run_batch_partial(x=x_at_tensor)
    log_or_print(
        f'\t\tbatch accuracy: {acc_at:.3f}'
    )

# gives me
# batch accuracy: 0.000
# batch accuracy: 0.000
# batch accuracy: 0.000

Embeddave · 2021-12-08T17:40:47Z

I can go as far as writing a whole script if you're not able to reproduce this

AFAICT SimBa is working without being able to attack multiple images passed in, so if you don't need to add that functionality I would be fine with it the way it is, not sure about other users

I still think the other changes in this PR like the check for probabilities and the variable name change would be useful regardless

beat-buesser · 2021-12-08T23:28:17Z

Hi @Embeddave I have created a similar script based on ART's get_started_pytorch.py and it achives the same accuracy with calling generate on all images or on each image separately:

import torch.nn
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from art.estimators.classification import PyTorchClassifier
from art.utils import load_mnist


# Step 0: Define the neural network model, return logits instead of activation in forward method


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv_1 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, stride=1)
        self.conv_2 = nn.Conv2d(in_channels=4, out_channels=10, kernel_size=5, stride=1)
        self.fc_1 = nn.Linear(in_features=4 * 4 * 10, out_features=100)
        self.fc_2 = nn.Linear(in_features=100, out_features=10)
        self.softmax = torch.nn.Softmax(dim=1)

    def forward(self, x):
        x = F.relu(self.conv_1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv_2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 10)
        x = F.relu(self.fc_1(x))
        x = self.fc_2(x)
        x = self.softmax(x)
        return x


# Step 1: Load the MNIST dataset

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

# Step 1a: Swap axes to PyTorch's NCHW format

x_test = x_test[0:100]
y_test = y_test[0:100]

x_train = np.transpose(x_train, (0, 3, 1, 2)).astype(np.float32)
x_test = np.transpose(x_test, (0, 3, 1, 2)).astype(np.float32)

# Step 2: Create the model

model = Net()

# Step 2a: Define the loss function and the optimizer

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# optimizer = optim.SGD(model.parameters(), lr=0.01)

# Step 3: Create the ART classifier

classifier = PyTorchClassifier(
    model=model,
    clip_values=(min_pixel_value, max_pixel_value),
    loss=criterion,
    optimizer=optimizer,
    input_shape=(1, 28, 28),
    nb_classes=10,
)

# Step 4: Train the ART classifier

classifier.fit(x_train, y_train, batch_size=64, nb_epochs=1)

# Step 5: Evaluate the ART classifier on benign test examples

predictions = classifier.predict(x_test)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on benign test examples: {}%".format(accuracy * 100))


from art.attacks.evasion import SimBA

attack = SimBA(classifier=classifier,
               attack='dct',
               max_iter=5000,
               order='random',
               epsilon=0.9,
               freq_dim=28,
               stride=7,
               verbose=False,
               )

# attack `all`

x_test_adv_all = attack.generate(x=x_test, y=y_test)

predictions = classifier.predict(x_test_adv_all)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print("Accuracy on adversarial test examples - all: {}%".format(accuracy * 100))

# attack `single`

count_correct = 0

for i in range(x_test.shape[0]):

    x_test_adv_single = attack.generate(x=x_test[[i]], y=y_test[[i]])

    prediction = classifier.predict(x_test_adv_single)
    accuracy = np.sum(np.argmax(prediction, axis=1) == np.argmax(y_test[[i]], axis=1))
    if accuracy:
        count_correct += 1

print("Accuracy on adversarial test examples - single: {}%".format(count_correct / x_test.shape[0] * 100))

Signed-off-by: Beat Buesser <beat.buesser@ie.ibm.com>

Embeddave · 2021-12-13T16:01:50Z

Thank you @beat-buesser for providing this script.

I will test using it with my images and models this week.

If I still get the same answer where attack success depends on the number of images passed in, I will upload a minimal reproducible example.

beat-buesser · 2021-12-15T16:45:36Z

Hi @Embeddave Thank you for your review and testing. To prepare for the ART 1.9 release, I'll merge this PR to include the exception and the processing of mini-batches. In case there are still bugs left we can fix them with a new PR for 1.9.0 or 1.9.1.

Embeddave · 2021-12-15T17:25:06Z

Understood, thank you @beat-buesser

Add support for multiple samples in SimBA.generate

45a2bae

Signed-off-by: Beat Buesser <beat.buesser@ie.ibm.com>

beat-buesser self-assigned this Nov 22, 2021

beat-buesser added bug Something isn't working improvement Improve implementation labels Nov 22, 2021

beat-buesser added this to the ART 1.9.0 milestone Nov 22, 2021

beat-buesser linked an issue Nov 22, 2021 that may be closed by this pull request

change docstring for SimBa.generate parameter y #1407

Open

beat-buesser mentioned this pull request Nov 22, 2021

change docstring for SimBa.generate parameter y #1407

Open

Beat Buesser and others added 2 commits December 7, 2021 13:43

Add check for probabilities and update docstring and variable names

cd34ce7

Signed-off-by: Beat Buesser <beat.buesser@ie.ibm.com>

Merge branch 'dev_1.9.0' into development_issue_1407

186b5c0

Embeddave reviewed Dec 8, 2021

View reviewed changes

Embeddave suggested changes Dec 8, 2021

View reviewed changes

Merge branch 'dev_1.9.0' into development_issue_1407

9db4b38

Beat Buesser and others added 3 commits December 8, 2021 23:35

Add verbose mode and fix is_probability check

cd011b0

Signed-off-by: Beat Buesser <beat.buesser@ie.ibm.com>

Merge branch 'dev_1.9.0' into development_issue_1407

3eb045f

Merge branch 'dev_1.9.0' into development_issue_1407

1d77d1b

Merge branch 'dev_1.9.0' into development_issue_1407

d1ba704

beat-buesser merged commit df2e613 into dev_1.9.0 Dec 15, 2021

beat-buesser deleted the development_issue_1407 branch December 15, 2021 16:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for multiple samples in SimBA.generate #1422

Add support for multiple samples in SimBA.generate #1422

beat-buesser commented Nov 22, 2021

codecov-commenter commented Nov 22, 2021 •

edited

Loading

Embeddave commented Nov 22, 2021

beat-buesser commented Nov 22, 2021

beat-buesser commented Dec 6, 2021

Embeddave commented Dec 6, 2021

beat-buesser commented Dec 6, 2021

Embeddave commented Dec 6, 2021

Embeddave Dec 8, 2021 •

edited

Loading

beat-buesser Dec 8, 2021

Embeddave Dec 8, 2021

Embeddave commented Dec 8, 2021

Embeddave left a comment

beat-buesser commented Dec 8, 2021

Embeddave commented Dec 8, 2021

Embeddave commented Dec 8, 2021 •

edited

Loading

beat-buesser commented Dec 8, 2021

Embeddave commented Dec 13, 2021

beat-buesser commented Dec 15, 2021

Embeddave commented Dec 15, 2021

Add support for multiple samples in SimBA.generate #1422

Add support for multiple samples in SimBA.generate #1422

Conversation

beat-buesser commented Nov 22, 2021

Description

Type of change

Checklist

codecov-commenter commented Nov 22, 2021 • edited Loading

Codecov Report

Embeddave commented Nov 22, 2021

beat-buesser commented Nov 22, 2021

beat-buesser commented Dec 6, 2021

Embeddave commented Dec 6, 2021

beat-buesser commented Dec 6, 2021

Embeddave commented Dec 6, 2021

Embeddave Dec 8, 2021 • edited Loading

Choose a reason for hiding this comment

beat-buesser Dec 8, 2021

Choose a reason for hiding this comment

Embeddave Dec 8, 2021

Choose a reason for hiding this comment

Embeddave commented Dec 8, 2021

Embeddave left a comment

Choose a reason for hiding this comment

beat-buesser commented Dec 8, 2021

Embeddave commented Dec 8, 2021

Embeddave commented Dec 8, 2021 • edited Loading

beat-buesser commented Dec 8, 2021

Embeddave commented Dec 13, 2021

beat-buesser commented Dec 15, 2021

Embeddave commented Dec 15, 2021

codecov-commenter commented Nov 22, 2021 •

edited

Loading

Embeddave Dec 8, 2021 •

edited

Loading

Embeddave commented Dec 8, 2021 •

edited

Loading