
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn #498

Closed
superctj opened this issue Jul 17, 2020 · 29 comments
Labels: question (Further information is requested)

@superctj

Describe the bug
I encountered the error "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn" when generating adversarial examples using AutoProjectedGradientDescent. It looks like input tensors do not have 'requires_grad' set to True when the loss is backpropagated.

To Reproduce
Steps to reproduce the behavior:

  1. Create a PyTorch model and load it with weights
  2. Wrap the model with the ART PyTorch Classifier
  3. Initiate an instance of AutoProjectedGradientDescent and pass in the wrapped model
  4. Loop through the data loader and generate adversarial examples batch by batch
  5. See error

Expected behavior
Adversarial examples are generated without errors. Instead, the attack raises:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Screenshots
[Screenshot: traceback ending in "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn"]

System information (please complete the following information):

  • OS: Linux
  • Python version: 3.7
  • ART version or commit number: 1.3.1
  • PyTorch version: 1.4.0
beat-buesser self-assigned this Jul 17, 2020
@beat-buesser (Collaborator) commented Jul 17, 2020

Hi @superctj I have not yet been able to reproduce the issue based on the traceback posted above. Could you please post a short code snippet that produces the issue?

@superctj (Author)

Hi @beat-buesser, thank you for your quick response. Here is a screenshot of my code:

[Screenshot: attack script]
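For reference, a sketch of what that screenshot likely contained, reconstructed from details that come up later in this thread (the data loader loop, the .detach().cpu().numpy() conversions including the y_baych typo, and the torch.no_grad() block that turns out to be the culprit); classifier and test_loader are placeholder names, not the actual code:

import numpy as np
import torch

from art.attacks.evasion import AutoProjectedGradientDescent

# classifier is the ART-wrapped model and test_loader the custom data
# loader discussed below; both are placeholders for the real objects,
# and the attack hyperparameters are borrowed from the scripts below.
attack = AutoProjectedGradientDescent(estimator=classifier,
                                      norm=np.inf,
                                      eps=0.3,
                                      eps_step=0.1,
                                      batch_size=50,
                                      loss_type='cross_entropy')

with torch.no_grad():  # this block later turns out to be the problem
    for x_batch, y_batch in test_loader:
        x_batch = x_batch.detach().cpu().numpy()
        y_baych = y_batch.detach().cpu().numpy()  # the typo found below
        x_batch_adv = attack.generate(x=x_batch, y=y_batch)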

@beat-buesser (Collaborator)

Hi @superctj I have run the script below, which is similar to yours but without a data loader, and it works:

import numpy as np

from art.utils import load_mnist
from art.attacks.evasion import AutoProjectedGradientDescent

from tests.utils import get_image_classifier_pt

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

x_train = np.swapaxes(x_train, 1, 3).astype(np.float32)
x_test = np.swapaxes(x_test, 1, 3).astype(np.float32)

classifier = get_image_classifier_pt(load_init=True, from_logits=True)

attack = AutoProjectedGradientDescent(estimator=classifier,
                                      norm=np.inf,
                                      eps=0.3,
                                      eps_step=0.1,
                                      batch_size=50,
                                      loss_type='cross_entropy')

x_test_adv = attack.generate(x=x_test[0:110], y=y_test[0:110])

print('Max difference:', np.max(np.abs(x_test_adv - x_test[0:110])))

Could you please try running your script with .cpu() added to the conversions:

x_batch = x_batch.detach().cpu().numpy()
y_batch = y_batch.detach().cpu().numpy()

@superctj (Author) commented Jul 18, 2020

Hi @beat-buesser Thank you for your suggestion. Unfortunately, it doesn't fix the error.

Could you please try to replicate the error by using a PyTorch data loader?

@beat-buesser (Collaborator) commented Jul 20, 2020

Hi @superctj The script below seems to work with a torch.utils.data.DataLoader:

import numpy as np
import torch
import torchvision

from art.utils import load_mnist
from art.attacks.evasion import AutoProjectedGradientDescent

from tests.utils import get_image_classifier_pt

(x_train, y_train), (x_test, y_test), min_pixel_value, max_pixel_value = load_mnist()

x_train = np.swapaxes(x_train, 1, 3).astype(np.float32)
x_test = np.swapaxes(x_test, 1, 3).astype(np.float32)

classifier = get_image_classifier_pt(load_init=True, from_logits=True)

batch_size = 500

attack = AutoProjectedGradientDescent(estimator=classifier,
                                      norm=np.inf,
                                      eps=0.3,
                                      eps_step=0.1,
                                      batch_size=batch_size,
                                      loss_type='cross_entropy')

data_loader = torch.utils.data.DataLoader(torchvision.datasets.MNIST('./files/', train=False, download=True,
                                                                     transform=torchvision.transforms.Compose(
                                                                         [torchvision.transforms.ToTensor(), ])),
                                          batch_size=batch_size,
                                          shuffle=True)

for batch_idx, (x_batch, y_batch) in enumerate(data_loader):
    x_batch = x_batch.detach().cpu().numpy()
    y_batch = y_batch.detach().cpu().numpy()

    x_batch_adv = attack.generate(x=x_batch, y=y_batch)

    print('Max difference:', np.max(np.abs(x_batch_adv - x_batch)))

How do you define your data loader?

@superctj (Author) commented Jul 20, 2020

I have tried to debug this myself but haven't had any luck. I thought the issue was related to the gradient attribute, the tensor type, or the axis order, but none of them helped. Basically, I have a custom dataset instance and wrap it with the PyTorch data loader. Do you have any ideas? I am attaching the data loader code for your reference.

[Screenshots: custom dataset class and data loader definition]
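A minimal sketch of such a setup, in case the screenshots are unavailable (the class and variable names are hypothetical; the thread only establishes float32 images of shape (3, 256, 256) and integer labels stored in self.y):

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class CustomImageDataset(Dataset):
    """Hypothetical stand-in for the dataset in the screenshots."""

    def __init__(self, images, labels):
        self.x = images  # float32 array of shape (N, 3, 256, 256)
        self.y = labels  # integer class labels

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return torch.as_tensor(self.x[idx]), self.y[idx]

# Random data just to make the sketch self-contained
images = np.random.rand(100, 3, 256, 256).astype(np.float32)
labels = [0] * 100

data_loader = DataLoader(CustomImageDataset(images, labels),
                         batch_size=50, shuffle=False)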

@beat-buesser (Collaborator)

Does the error occur for the first or last batch?

@beat-buesser (Collaborator)

What is the type of the elements in self.y?

@superctj (Author)

  1. The error occurs for the first batch.
  2. The type of the elements in self.y is integer.

@beat-buesser (Collaborator)

How do you define your model? Can you also show the output of a forward pass with your model, e.g. the output of model(x)?

@superctj (Author)

I load a pre-trained model and wrap it into an ART PyTorch classifier. The output of model(x_batch) is a batch of logits.

[Screenshots: model definition, classifier wrapping, and logits output]
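A sketch of that wrapping step, assuming ART's PyTorchClassifier API (the architecture, class count, and clip values here are placeholders; the thread only establishes a pre-trained model that outputs logits for 3x256x256 inputs):

import torch
import torchvision
from art.estimators.classification import PyTorchClassifier

model = torchvision.models.resnet18(num_classes=2)  # placeholder architecture
# In practice the pre-trained weights would be loaded here, e.g. with
# model.load_state_dict(torch.load(...))
model.eval()

classifier = PyTorchClassifier(model=model,
                               loss=torch.nn.CrossEntropyLoss(),
                               input_shape=(3, 256, 256),
                               nb_classes=2,
                               clip_values=(0.0, 1.0))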

@superctj (Author)

Could you replicate the error if you set model.eval() before you wrap the model into the ART PyTorch classifier?

@beat-buesser (Collaborator)

model.eval() does not change anything for me. I noticed that my script only runs inside of the examples directory of ART. Does model.eval() change anything for you?

@superctj (Author)

Nah. I just thought it could be the difference between your code and mine. Are you saying that you can reproduce the error outside the examples directory of ART?

@beat-buesser (Collaborator)

No, unfortunately not; it's just that get_image_classifier_pt uses relative paths to load a small trained classifier. I have repeated the test outside of examples with a new model and it still runs. Are you running on GPU? If yes, can you try running on CPU only?

@beat-buesser (Collaborator)

Another debugging approach could be to test your script by running the line x_adv_batch = attack.generate(x=x_batch, y=y_batch) with two arrays for x_batch and y_batch created with numpy (e.g. random numbers) instead of getting them from the data loader, as in the sketch below. That would show whether the data loader or the attack-and-classifier combination is causing the problem.
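For example (attack refers to the attack instance in your script; the batch shapes match the ones reported later in this thread):

import numpy as np

# Synthetic batch of the same shape as the data loader output
x_batch = np.random.rand(50, 3, 256, 256).astype(np.float32)
y_batch = np.zeros(50, dtype=np.int64)

# If this runs without error, the attack/classifier combination is fine
# and the problem lies in what the data loader hands to the attack.
x_adv_batch = attack.generate(x=x_batch, y=y_batch)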

@superctj (Author)

Unfortunately, the error persists when I run on CPU only. Could it be a problem with PyTorch? I am using PyTorch 1.4.0.

@beat-buesser (Collaborator)

It should not be; I have also been using PyTorch 1.4.0.

@superctj (Author)

Cool, the error doesn't show up if I create x_batch and y_batch with numpy directly. It must be a problem with the data loader then. But it is interesting that x_batch = x_batch.detach().cpu().numpy() triggers the error while x_batch = np.ones(x_batch.shape) does not.

@beat-buesser (Collaborator) commented Jul 21, 2020

Can you print the type and content of x_batch and y_batch after the lines x_batch = x_batch.detach().cpu().numpy() and y_batch = y_batch.detach().cpu().numpy() using the data loader?

@superctj (Author)

x_batch is a <class 'numpy.ndarray'> of shape (50, 3, 256, 256).

[Screenshot: x_batch contents]

y_batch is also a <class 'numpy.ndarray'> of shape (50,). It looks like [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

@beat-buesser (Collaborator)

I think I found the bug: in your first script above, a line has a typo: y_baych = y_batch.detach().cpu().numpy() instead of y_batch = y_batch.detach().cpu().numpy() (a y instead of a t in y_batch). That way the original tensor y_batch gets passed to the attack instead of the numpy array stored in y_baych.

A second question: Is it correct that all the labels are 0 for this batch?

@superctj (Author)

That was a typo. I found that as well yesterday but it didn't fix the error lol.

Yeah, I didn't shuffle the test set so it starts with images from label 0.

@superctj (Author)

Hi @beat-buesser. I created a minimal example to reproduce the error. Could you please take a look and see if you can reproduce it?

@superctj (Author)

With the help of my labmate, we found the problem was with torch.no_grad(), which prevents the attack from backpropagating to the inputs. Do you have any suggestions other than removing the with torch.no_grad() block, or is there a better solution?

@beat-buesser (Collaborator)

Hi @superctj Thank you very much for the minimal example, that's great!

Do you mean the with torch.no_grad in https://github.com/superctj/error_demo/blob/620368e21b7ae56fbcdab28fcba1281ddfd42073/eval.py#L30 ?

@superctj (Author)

Yeah, after removing that line, I can run the PGD attack smoothly.

@beat-buesser (Collaborator)

Ok, that makes sense, great catch, I hadn't noticed it.

White-box attacks like ProjectedGradientDescent will not work inside a with torch.no_grad(): block because they calculate the loss or class gradients required to run their attack algorithm.

Black-box attacks like HopSkipJump, which don't require any gradient calculation by the framework, will very likely work inside a with torch.no_grad(): block. In fact, since ART 1.3, PyTorchClassifier.predict itself uses with torch.no_grad(): internally to take advantage of the faster model evaluation when gradients are disabled.
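A minimal, self-contained illustration of why the error from this issue appears inside such a block:

import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():
    loss = (x * 2).sum()  # no grad_fn is recorded inside no_grad

print(loss.requires_grad)  # False
# Calling loss.backward() here raises exactly the error from this issue:
# RuntimeError: element 0 of tensors does not require grad and does not
# have a grad_fn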

@superctj (Author)

That's good to know! Thank you very much for your time and patience. I appreciate it.

beat-buesser added the question (Further information is requested) label Aug 7, 2020