Batch normalization gradients #6398

Open
botev opened this issue Sep 8, 2017 · 5 comments

Comments

@botev
Contributor

botev commented Sep 8, 2017

While trying to run adversarial attacks on a batch-normalized net, I somehow got this:

Traceback (most recent call last):
  File "scripts/adverserial_samples_targeted_sfgs.py", line 213, in <module>
    main(**vars(parser.parse_args()))
  File "scripts/adverserial_samples_targeted_sfgs.py", line 146, in main
    grad = T.grad(objectives.sum(), adv_samples)
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 605, in grad
    grad_dict, wrt, cost_name)
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1371, in _populate_grad_dict
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1371, in <listcomp>
    rval = [access_grad_cache(elem) for elem in wrt]
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1326, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1021, in access_term_cache
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1021, in <listcomp>
    output_grads = [access_grad_cache(var) for var in node.outputs]
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1326, in access_grad_cache
    term = access_term_cache(node)[idx]
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1162, in access_term_cache
    new_output_grads)
  File "/home/abotev/work/python/Theano/theano/scan_module/scan_op.py", line 2126, in L_op
    dC_dinps_t = compute_all_gradients(known_grads)
  File "/home/abotev/work/python/Theano/theano/scan_module/scan_op.py", line 2048, in compute_all_gradients
    null_gradients='return')
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 605, in grad
    grad_dict, wrt, cost_name)
  File "/home/abotev/work/python/Theano/theano/gradient.py", line 1371, in _populate_grad_dict
File "/home/abotev/work/python/Theano/theano/tensor/nnet/bn.py", line 598, in make_node
    dy = as_tensor_variable(dy)
  File "/home/abotev/work/python/Theano/theano/tensor/basic.py", line 158, in as_tensor_variable
    "Variable type field must be a TensorType.", x, x.type)
theano.tensor.var.AsTensorError: ('Variable type field must be a TensorType.', <DisconnectedType>, <theano.gradient.DisconnectedType object at 0x7f17d56766a0>)

Any ideas where to look, or any thoughts in general?

@nouiz
Member

nouiz commented Sep 8, 2017 via email

@botev
Contributor Author

botev commented Sep 8, 2017

What I'm taking the gradient with respect to is a shared variable containing a small number of images, so technically that should not be the issue. Also, it is strange that the error surfaces all the way down in one of the BNGrad Ops rather than somewhere earlier, if that were the problem, right? I think I'm using Theano's batch_normalization_test; is it possible that it does not have a gradient and only the train version does?
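
For reference, a minimal check of that last question might look like this (just a sketch with toy shapes; if the gradient graph builds, test-mode BN does define a gradient):

import numpy as np
import theano
import theano.tensor as T

x = T.fmatrix("x")
gamma, beta = T.fvector("gamma"), T.fvector("beta")
mean, var = T.fvector("mean"), T.fvector("var")
# test-mode batch norm with fixed statistics
out = T.nnet.bn.batch_normalization_test(x, gamma, beta, mean, var)
g = T.grad(out.sum(), x)  # builds only if test-mode BN has a gradient
f = theano.function([x, gamma, beta, mean, var], g)
f(np.ones((2, 3), "float32"), np.ones(3, "float32"), np.zeros(3, "float32"),
  np.zeros(3, "float32"), np.ones(3, "float32"))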

@nouiz
Member

nouiz commented Sep 8, 2017 via email

@botev
Contributor Author

botev commented Sep 9, 2017

Not really; I don't have a simple example at the moment. I can try to produce one, but in short, I have a wide ResNet which is evaluated inside a scan for MC dropout. I'll try setting the number of samples to 1 and unrolling it instead of using scan, to see if I get anything more informative.
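
Roughly, the structure is something like the following (a toy stand-in for the real network: a single batch-normalized layer and a made-up dropout rate, just to show where scan enters the picture):

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=1234)
x = T.fmatrix("x")
gamma, beta = T.fvector("gamma"), T.fvector("beta")
mean, var = T.fvector("mean"), T.fvector("var")

def mc_step(inp, g_, b_, m_, v_):
    # one MC-dropout forward pass through a toy batch-normalized layer
    mask = srng.binomial(size=inp.shape, p=0.5, dtype=inp.dtype)
    return T.nnet.bn.batch_normalization_test(inp * mask, g_, b_, m_, v_)

outs, updates = theano.scan(mc_step, n_steps=10,
                            non_sequences=[x, gamma, beta, mean, var])
grad = T.grad(outs.mean(), x)  # gradient through the scan, as in the failing script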

@botev
Contributor Author

botev commented Sep 11, 2017

@nouiz Hmm, this looks to be the same issue as the OpFromGraph error in #6400. Code to reproduce it:

import theano.tensor as T

x = T.fmatrix("x")
gamma = T.fvector("g")
beta = T.fvector("beta")
mean = T.vector("mean")
var = T.vector("var")
# With running averages given, batch_normalization_train returns the normalized
# output, the batch mean, the inverse std, and the updated running mean/var.
bn, m, _, _, _ = T.nnet.bn.batch_normalization_train(x, gamma, beta, running_mean=mean, running_var=var)
s = T.grad(T.sum(bn), x)
k = T.Lop(s, x, x)  # raises the AsTensorError above
# Or this also raises it
s = T.grad(T.sum(m), x)

I think this might be a general issue with Theano when you take the gradient of an Op which has multiple outputs but only one of the outputs plays a role in the cost.
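
If that diagnosis is right, a crude workaround (just a guess, not a verified fix) would be to keep every output of the op connected to the cost with a zero coefficient, so none of their output gradients come back as DisconnectedType:

import theano.tensor as T

x = T.fmatrix("x")
gamma = T.fvector("g")
beta = T.fvector("beta")
mean = T.vector("mean")
var = T.vector("var")
bn, m, inv_std, new_mean, new_var = T.nnet.bn.batch_normalization_train(
    x, gamma, beta, running_mean=mean, running_var=var)
# Touch every output so all of them stay connected to the cost.
cost = T.sum(bn) + 0.0 * (m.sum() + inv_std.sum() + new_mean.sum() + new_var.sum())
s = T.grad(cost, x)
k = T.Lop(s, x, x)  # the second derivative that failed above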
