
PatchFool implementation #2163

Open · wants to merge 16 commits into main
Conversation

@sechkova (Contributor) commented May 24, 2023

Description

Initial draft implementation of the PatchFool attack from the paper:

Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?

Currently there is an example notebook of the attack in Colab. I plan to contribute the notebook too once it is ready.

Fixes # (issue)

Type of change

Please check all relevant options.

  • Improvement (non-breaking)
  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing

Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.

  • Test A
  • Test B

Test Configuration:

  • OS
  • Python version
  • ART version or commit number
  • TensorFlow / Keras / PyTorch / MXNet version

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@codecov-commenter commented May 24, 2023

Codecov Report

Attention: 21 lines in your changes are missing coverage. Please review.

Comparison is base (3de2078) 85.08% compared to head (da05de1) 85.16%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2163      +/-   ##
==========================================
+ Coverage   85.08%   85.16%   +0.07%     
==========================================
  Files         324      325       +1     
  Lines       29331    29480     +149     
  Branches     5409     5431      +22     
==========================================
+ Hits        24956    25106     +150     
+ Misses       2997     2973      -24     
- Partials     1378     1401      +23     
Files Coverage Δ
art/attacks/evasion/__init__.py 98.24% <100.00%> (+0.03%) ⬆️
art/estimators/pytorch.py 84.73% <76.92%> (-0.99%) ⬇️
art/attacks/evasion/patchfool.py 86.76% <86.76%> (ø)

... and 12 files with indirect coverage changes

@sechkova (Contributor Author)

This is only a draft implementation, but I wanted to discuss a few issues that I am facing.

The first one comes from getting the attention weights of a transformer model. I added one implementation for the ViT model that comes pre-trained from the torchvision models library (the paper's authors use DeiT, but I am more familiar with this model's architecture). The problem I see is that it is very challenging to implement one common method that extracts the weights, even across different implementations of the same model architecture. Extracting the weights in my case required tracing the model graph and even changing one of the operations.
One way to go would be to provide a classifier that works only for one specific model. Alternatively, the ART user who provides the model could also provide the method that extracts the weights, with ART supplying an abstract class and an example. But there may be a better option that I cannot see right now.
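As an illustration of the graph-tracing approach mentioned above, here is a minimal sketch using torchvision's feature-extraction utilities on a torchvision ViT; the model choice and the node name in `return_nodes` are assumptions and would have to be looked up for the concrete model:

```python
import torch
import torchvision
from torchvision.models.feature_extraction import (
    get_graph_node_names,
    create_feature_extractor,
)

# Pre-trained torchvision ViT; vit_b_16 is only an example.
model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1").eval()

# Trace the model graph and list the node names that can be returned.
train_nodes, eval_nodes = get_graph_node_names(model)

# Hypothetical node name: the real one must be picked from `eval_nodes`,
# and getting the actual attention weights may additionally require changing
# the attention call itself (e.g. forcing it to return the weights), as noted above.
return_nodes = ["encoder.layers.encoder_layer_0.self_attention"]
extractor = create_feature_extractor(model, return_nodes=return_nodes)

x = torch.rand(1, 3, 224, 224)
features = extractor(x)  # dict mapping node names to intermediate outputs
```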

The second issue is that the PyTorch model I used behaves incorrectly if the benign input is cast to float, which makes it hard to test the attack (there is an example in the attack's notebook). Is this a problem coming from the mixture of frameworks? Have you seen such behaviour before?

@beat-buesser beat-buesser self-requested a review May 25, 2023 16:44
@beat-buesser beat-buesser self-assigned this May 25, 2023
@beat-buesser beat-buesser added the enhancement New feature or request label May 25, 2023
@beat-buesser (Collaborator)

Hi @sechkova, thank you very much for your pull request!

Regarding your first question, I agree that general support for all possible architectures is challenging, if not impossible. ART does have multiple model-specific estimators, for example art.estimators.object_detection.PyTorchYolo, that are easier to implement and maintain. I think this approach would be the best for your PR too.

About your second question, does the model you are working with expect integer arrays as input? If yes, you could accept float arrays as input to your new ART tools to follow the ART APIs, and convert them to integer arrays inside the tools before providing the input data to the model. We would have to investigate how this conversion affects the adversarial attacks.

@sechkova (Contributor Author) commented Jul 17, 2023

About your second question, does the model you are working with expect integer arrays as input? If yes, you could accept float arrays as input to your new ART tools to follow the ART APIs, and convert them to integer arrays inside the tools before providing the input data to the model. We would have to investigate how this conversion affects the adversarial attacks.

In the end I used convert_image_dtype from PyTorch, which both converts and scales the values, and now the model works properly. I couldn't figure out how the other attacks' implementations manage to handle this.
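For context, a minimal sketch of that conversion; the shapes and value ranges are only illustrative:

```python
import torch
from torchvision.transforms.functional import convert_image_dtype

# Benign uint8 image batch with values in [0, 255].
x_uint8 = torch.randint(0, 256, (1, 3, 224, 224), dtype=torch.uint8)

# convert_image_dtype both casts to float and rescales to [0, 1],
# unlike a plain .float() cast, which keeps the values in [0, 255]
# and is a plausible cause of the behaviour described earlier.
x_float = convert_image_dtype(x_uint8, torch.float32)
```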

@sechkova (Contributor Author)

Regarding your first question, I agree that general support for all possible architectures is challenging, if not impossible. ART does have multiple model-specific estimators, for example art.estimators.object_detection.PyTorchYolo, that are easier to implement and maintain. I think this approach would be the best for your PR too.

For now I added art.estimators.classification.PyTorchDeiT, but the way I've hardcoded the attention layers works, I think, only with either PyTorch < 2.0 or with 'TIMM_FUSED_ATTN' set to '0'.

@sechkova sechkova marked this pull request as ready for review August 25, 2023 13:56
@sechkova (Contributor Author)

@beat-buesser the PR is updated and the attack algorithm now shows good results.
Can you do an initial review?

What I think still needs to be resolved is the custom PyTorch DeiT classifier. For now I have implemented just the very basics needed for the attack to work with a pre-trained model from timm. It involves hardcoding the layer names, so there is a difference between PyTorch versions, which I've circumvented by setting 'TIMM_FUSED_ATTN' = '0' (you can see the example notebook below). It is certainly not a very elegant approach.
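For reference, a minimal sketch of the workaround described above; the model name is just an example:

```python
import os

# Set before importing timm so that it builds the non-fused attention
# blocks whose intermediate weights can be extracted.
os.environ["TIMM_FUSED_ATTN"] = "0"

import timm

model = timm.create_model("deit_tiny_patch16_224", pretrained=True).eval()
```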

Here is an example notebook that I wish to contribute once the implementation is finalised:
https://colab.research.google.com/drive/1QfdZEUI0hhO-AYFL12RZvB0dA95l2NAS?usp=sharing

@beat-buesser (Collaborator) left a comment


Hi @sechkova, thank you very much for implementing the PatchFool attack in ART! I have added a few comments in my review; please take a look and let me know what you think. In addition, could you please add a unit test in pytest format for the new attack class and a notebook showing how the implementation reproduces the original paper?

@@ -0,0 +1,258 @@
# MIT License
#
# Copyright (C) The Adversarial Robustness Toolbox (ART) Authors 2022

Suggested change
# Copyright (C) The Adversarial Robustness Toolbox (ART) Authors 2022
# Copyright (C) The Adversarial Robustness Toolbox (ART) Authors 2023

@@ -67,3 +67,4 @@
from art.attacks.evasion.wasserstein import Wasserstein
from art.attacks.evasion.zoo import ZooAttack
from art.attacks.evasion.sign_opt import SignOPTAttack
from art.attacks.evasion.patchfool import PatchFool

Suggested change
from art.attacks.evasion.patchfool import PatchFool
from art.attacks.evasion.patchfool import PatchFoolPyTorch

):
"""
Create a :class:`PatchFool` instance.
TODO

Is there still a TODO here?


def _generate_batch(self, x: "torch.Tensor", y: Optional["torch.Tensor"] = None) -> "torch.Tensor":
"""
TODO

Please update docstring.

def _get_patch_index(self, x: "torch.Tensor", layer: int) -> "torch.Tensor":
"""
Select the most influential patch according to a predefined `layer`.
TODO

Please update docstring.

def _get_attention_loss(self, x: "torch.Tensor", patch_idx: "torch.Tensor") -> "torch.Tensor":
"""
Sum the attention weights from each layer for the most influential patches
TODO

Please update docstring.


def pcgrad(self, grad1, grad2):
"""
TODO

Please update docstring.

"""
return self.model.patch_embed.patch_size[0]

def get_attention_weights(self, x: Union[np.ndarray, "torch.Tensor"]) -> "torch.Tensor":

I think this method could be of interest for other models too. Please move it to PyTorchEstimator and generalise it by making return_nodes a list of strings provided by the user as an argument.
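A minimal sketch of the generalised method suggested here; the exact signature and the use of torchvision's create_feature_extractor are assumptions based on this discussion, not the final implementation:

```python
from typing import Dict, List

import torch
from torchvision.models.feature_extraction import create_feature_extractor


def get_attention_weights(self, x: "torch.Tensor", return_nodes: List[str]) -> Dict[str, "torch.Tensor"]:
    """
    Return intermediate outputs (e.g. attention weights) for the graph nodes
    named in `return_nodes`, which the user supplies for their model.
    """
    # Building the extractor from user-supplied node names keeps the
    # estimator model-agnostic; gradients still flow through the outputs.
    extractor = create_feature_extractor(self.model, return_nodes=return_nodes)
    return extractor(x)
```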

)

@property
def patch_size(self):

Shouldn't the patch size be defined on the attack side? If yes, we could just reuse the existing PyTorchClassifier.

optim = torch.optim.Adam([perturbation], lr=self.learning_rate)
scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=self.step_size, gamma=self.step_size_decay)

for i_max_iter in tqdm(range(self.max_iter)):

The variable i_max_iter seems not to be used; you can replace it with _ to avoid the CodeQL alert.
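A minimal sketch of the suggested change (the verbose flag corresponds to the option added in a later commit):

```python
# `_` marks the loop counter as intentionally unused and silences the CodeQL alert.
for _ in tqdm(range(self.max_iter), disable=not self.verbose):
    ...
```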

Add a new evasion attack on vision transformers.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Skip the class token when calculating the most influential image
patch.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Update classifier to use DeiT from the timm library.
Fix algorithm details.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
- Calculate the attention loss as negative log likelihood
- Clamp perturbations after random init

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
- Fix input normalisation and scaling.
- Fix patch application to happen only once after final iteration
- Add skip_loss_att option

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Use tqdm indication bar showing the attack iterations.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
- Move get_attention_weights to PyTorchEstimator and generalise it
  by making return_nodes a list of strings provided by the user as an argument.

- Define patch size on the attack side.
- Remove PyTorchClassifierDeiT and reuse the existing PyTorchClassifier.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Add verbose option for tqdm.
Remove unused variable i_max_iter.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Use directly the attribute patch_layer.

Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
Signed-off-by: Teodora Sechkova <tsechkova@vmware.com>
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
# TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
import os

Check notice (Code scanning / CodeQL): Unused import. Import of 'os' is not used.
import pytest

from art.attacks.evasion import PatchFoolPyTorch
from art.estimators.classification.classifier import ClassGradientsMixin

Check notice (Code scanning / CodeQL): Unused import. Import of 'ClassGradientsMixin' is not used.

from art.attacks.evasion import PatchFoolPyTorch
from art.estimators.classification.classifier import ClassGradientsMixin
from art.estimators.classification.pytorch import PyTorchClassifier

Check notice (Code scanning / CodeQL): Unused import. Import of 'PyTorchClassifier' is not used.
from art.attacks.evasion import PatchFoolPyTorch
from art.estimators.classification.classifier import ClassGradientsMixin
from art.estimators.classification.pytorch import PyTorchClassifier
from art.estimators.estimator import BaseEstimator

Check notice (Code scanning / CodeQL): Unused import. Import of 'BaseEstimator' is not used.
from art.estimators.classification.pytorch import PyTorchClassifier
from art.estimators.estimator import BaseEstimator

from tests.attacks.utils import backend_test_classifier_type_check_fail

Check notice (Code scanning / CodeQL): Unused import. Import of 'backend_test_classifier_type_check_fail' is not used.
@sechkova (Contributor Author)

Hi @sechkova, thank you very much for implementing the PatchFool attack in ART! I have added a few comments in my review; please take a look and let me know what you think. In addition, could you please add a unit test in pytest format for the new attack class and a notebook showing how the implementation reproduces the original paper?

@beat-buesser Can you advise how the tests should be defined? The PatchFool attack works on transformer models, using information from the attention layers to compute the attack. I can use a downloaded pre-trained model for the tests, but such models are usually trained on ImageNet, while the tests in ART use other, smaller test datasets. This causes issues with the number of classes, etc.

I added one initial draft test with the last commit (da05de1).
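For illustration, one way around the dataset mismatch could be to instantiate an untrained DeiT with a reduced input size and class count instead of an ImageNet-pretrained model; everything below, including the PatchFoolPyTorch constructor arguments, is an assumption rather than the PR's actual test:

```python
import numpy as np
import timm
import torch

from art.attacks.evasion import PatchFoolPyTorch
from art.estimators.classification.pytorch import PyTorchClassifier


def test_patchfool_generate():
    # Untrained DeiT resized to small inputs and 10 classes, so no ImageNet data is needed.
    model = timm.create_model("deit_tiny_patch16_224", pretrained=False, num_classes=10, img_size=32)
    classifier = PyTorchClassifier(
        model=model,
        loss=torch.nn.CrossEntropyLoss(),
        input_shape=(3, 32, 32),
        nb_classes=10,
        clip_values=(0.0, 1.0),
    )
    attack = PatchFoolPyTorch(classifier)  # constructor arguments are hypothetical
    x = np.random.rand(2, 3, 32, 32).astype(np.float32)
    x_adv = attack.generate(x)
    assert x_adv.shape == x.shape
```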

Labels: enhancement (New feature or request)
3 participants