Incorrect gradient when using Flux.params #1251
Dupe of #1232.
@ToucheSir I had not seen #1232, which addresses my first question; I apologize for that. However, my second question refers to what seems to me to be a separate bug: Zygote is taking derivatives of more things than it is asked for, which (I imagine) hurts its performance.
When you use implicit params, Zygote will try to be smart about caching gradients of variables captured in the callback of `gradient` as well, not just those listed in `Params`. That is why entries for captured values like `nu` can show up in the returned `Grads` even though they were never passed as parameters.
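A minimal sketch of that behavior (the names `w`, `nu`, and `loss` here are hypothetical, not from the report above):

```julia
using Flux
using Zygote

w  = [1.0, 2.0]
nu = [3.0]        # captured by the closure below, but never put in Params

loss() = sum(w) * sum(nu)

g = Zygote.gradient(loss, Flux.params(w))

# w is the only declared parameter, but the underlying IdDict may also carry
# entries for other captured variables such as nu:
collect(keys(g.grads))
```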
Thanks! Quick question: if I have a neural network (created using Flux's APIs like `Dense` and `Chain`), how can I take derivatives using explicit parameters? In my (real and more complicated) problem I have to take derivatives of both regular functions and neural networks. I was going to take the gradients twice, once with explicit parameters and once with implicit parameters for the neural network, but it would be great if there were another way to do this.
You can differentiate through Flux models the same way you would any other callable struct. The main difference is that Flux's current built-in optimizers won't be able to handle the gradients Zygote spits out. Instead, use https://github.com/FluxML/Optimisers.jl, and see specifically this section of the docs for an example of Flux integration. Using Optimisers.jl will also future-proof your code, as it will replace the current Flux optimizers as the default in the next major Flux version.
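For concreteness, a minimal sketch of that workflow (the model, data, and hyperparameters are illustrative, not from this thread):

```julia
using Flux, Optimisers, Statistics

# A toy model and data; all names here are illustrative.
model = Chain(Dense(2 => 8, relu), Dense(8 => 1))
x = rand(Float32, 2, 16)
y = rand(Float32, 1, 16)

loss(m, x, y) = mean(abs2, m(x) .- y)

# Explicit parameters: differentiate with respect to the model itself.
grads = Flux.gradient(m -> loss(m, x, y), model)[1]

# Optimisers.jl understands the NamedTuple-structured gradient.
state = Optimisers.setup(Optimisers.Adam(1f-3), model)
state, model = Optimisers.update(state, model, grads)
```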
I encountered some weird behavior when taking gradients using `Flux.params`. Here is a minimal reproducible example:
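A hypothetical reconstruction of that example (the exact original is not shown here; the identifiers `p̂`, `x`, and `nu` and the loss terms are taken from the description below, so it may not match the original exactly):

```julia
using Flux, Statistics
using Zygote

x  = [1.0, 2.0, 3.0]
p̂  = [1.0, 2.0]
nu = 3.0

# An inner function that returns both the parameters and the auxiliary value.
predict() = (p̂, nu)

function loss()
    p, _ = predict()
    # Note the direct indexing p̂[2] alongside the use of the returned p.
    return mean(abs2, p[1] * x) + mean(abs2, p̂[2] * x)
end

ps   = Flux.params(p̂)
g_ps = Zygote.gradient(loss, ps)   # implicit parameters
g_p̂  = Zygote.gradient(p -> mean(abs2, p[1] * x) + mean(abs2, p[2] * x), p̂)  # explicit
```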
Querying each of the derivatives, `g_ps` and `g_p̂` give different results. I verified with Symbolics.jl, and the correct derivative is the one given by `g_p̂`. So it is obvious with this simple example that Zygote is ignoring the term `mean(abs2, p̂[2]*x)`, I presume because we only call `p̂[2]`. What is not obvious to me is why the rest of the derivative appears elsewhere in `g_ps.grads` (by "rest" I mean that if one adds the two pieces, we get the correct result), and why the derivative with respect to `nu` is being taken at all. If I remove `return p̂, nu`, the two related entries in `g_ps.grads` disappear.

My issue:
While it is obvious with this MRE that I get the wrong gradient because of `p̂[2]`, it took me two days to figure it out. Maybe it is not a bug, and to a developer it is obvious that this should not work; if it were possible to get a warning or (better yet) a fix, that would be great.

If someone could explain to me why Zygote is taking derivatives with respect to `nu` even though it is not a parameter, that would be great, and if there is a way to keep it from happening, even better. In this example `nu` is not used for anything, but in my more complicated code I use it for something else and need to return the value.