Incorrect gradient when using Flux.params #1251

Closed
raphaelchinchilla opened this issue Jun 24, 2022 · 5 comments
Labels
implicit using Params, Grads

Comments

@raphaelchinchilla

I have encountered some weird behavior when taking gradients using Flux.params. Here is a minimal reproducible example:

using Flux
using Statistics: mean

# Toy model parameterised by ω₀ and W, applied to a 2×N batch x
function Fp(x, p)
    ω₀, W = p
    [x[1:1, :]; -ω₀ * x[2:2, :]] + W * x
end

function get_derivatives(x, p̂ᵢₙᵢₜ)
    p̂ = deepcopy(p̂ᵢₙᵢₜ)
    nu = ones(size(x))
    ps = Flux.params(p̂)

    # Gradient with implicit parameters
    g_ps = Flux.gradient(ps) do
        mean(abs2, p̂[2]*x) + 10*mean(abs2, Fp(x, p̂)) + mean(nu .* Fp(x, p̂))
    end

    # Gradient with explicit parameters, same objective
    g_p̂ = Flux.gradient(p̂) do p̂
        mean(abs2, p̂[2]*x) + 10*mean(abs2, Fp(x, p̂)) + mean(nu .* Fp(x, p̂))
    end
    return g_ps, g_p̂

    # Unreachable, but kept on purpose: removing it changes which variables the
    # implicit-params closure captures (see the discussion of g_ps.grads below)
    return p̂, nu
end

p̂ᵢₙᵢₜ = [[1.0], [2. 3.; 4. 5.]]
x = randn(2, 10)
g_ps, g_p̂ = get_derivatives(x, p̂ᵢₙᵢₜ)

If I query each of the derivatives, I obtain

julia> g_ps.grads
IdDict{Any, Any} with 4 entries:
  [2.0 3.0; 4.0 5.0]                             => [40.038 26.2856; 53.326 35.0164]
  Box([1.0 1.0 … 1.0 1.0; 1.0 1.0 … 1.0 1.0])    => RefValue{Any}((contents = [0.117313 0.00144669 … -0.166186 0.442534; 0.156417 0.00192892 … -0.221581 0.590045],))
  [1.0]                                          => [-35.0164]
  Box(Array{Float64}[[1.0], [2.0 3.0; 4.0 5.0]]) => RefValue{Any}((contents = Union{Nothing, Matrix{Float64}}[nothing, [2.54043 2.73642; 5.19804 4.48258]],))

and

 julia> g_p̂
(Array{Float64}[[-35.01637148376182], [42.578438152955385 29.02198799901979; 58.52408623552277 39.49894829943962]],)

I verified with Symbolics.jl, and the correct derivative is the one given by g_p̂. So it is obvious with this simple example that Zygote is ignoring the term mean(abs2,p̂[2]*x), presumably because the parameter is only reached through the indexing expression p̂[2]. What is not obvious to me is why the rest of the derivative appears in

  Box(Array{Float64}[[1.0], [2.0 3.0; 4.0 5.0]]) => RefValue{Any}((contents = Union{Nothing, Matrix{Float64}}[nothing, [2.54043 2.73642; 5.19804 4.48258]],))

(by "the rest" I mean that adding the two entries together gives the correct result), and why a derivative with respect to nu is being taken at all. If I remove return p̂,nu, the two related entries in g_ps.grads disappear.

My issue:

  • While it is obvious with this MRE that I get the wrong gradient because of p̂[2], it took me two days to figure it out. Maybe it is not a bug, and to a developer it is obvious that this should not work, but it would be great if Zygote could either emit a warning or (better yet) handle this case correctly.

  • It would also be great if someone could explain why Zygote is taking derivatives with respect to nu even though it is not a parameter, and even better if there is a way to avoid it. In this example nu is not used for anything, but in my real code I use it for something else and need to return its value.

@ToucheSir ToucheSir transferred this issue from FluxML/Flux.jl Jun 25, 2022
@ToucheSir
Member

Dupe of #1232.

@ToucheSir ToucheSir closed this as not planned Jun 25, 2022
@raphaelchinchilla
Author

@ToucheSir I had not seen #1232, which addresses my first question; I apologize for that.

However, my second question refers to what seems to me to be another bug: Zygote is taking derivatives of more things than it is asked to, which (I imagine) hurts its performance.

@ToucheSir
Member

ToucheSir commented Jun 27, 2022

When you use implicit params, Zygote will also try to be smart about caching gradients of variables captured in the callback passed to gradient. This is yet another reason we recommend using "explicit" params wherever possible and avoiding implicit ones like the plague.
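
As a minimal sketch of that contrast (assuming Fp, x, nu, and p̂ are in scope, defined as in the example above), the same objective can be written as a plain function and differentiated with explicit parameters only; nothing from the enclosing scope is tracked, so neither nu nor the closure boxes show up in the result:

# Loss as a plain function of explicit arguments; reuses Fp from the MRE above.
loss(p, x, nu) = mean(abs2, p[2]*x) + 10*mean(abs2, Fp(x, p)) + mean(nu .* Fp(x, p))

# Gradient w.r.t. the explicit argument p̂ only; x and nu are just data here.
g, = Flux.gradient(p -> loss(p, x, nu), p̂)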

@raphaelchinchilla
Author

Thanks!

Quick question: if I have a neural network (created using Flux's APIs like Dense and Chain), how can I take derivatives using explicit parameters? In my (real and more complicated) problem I have to take derivatives of both regular functions and neural networks.

I was going to take the gradients twice, once with explicit parameters and once with implicit parameters for the neural network, but it would be great if there were another way to do this.

@ToucheSir
Member

You can differentiate through Flux models the same way you would any other callable struct. The main difference is that Flux's current built-in optimisers won't be able to handle the gradients Zygote spits out for explicit parameters. Instead, use https://github.com/FluxML/Optimisers.jl, and see specifically this section of the docs for an example of Flux integration. Using Optimisers.jl will also future-proof your code, as it will replace the current Flux optimisers as the default in the next major Flux version.
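
A rough sketch of that workflow (the model, data, and learning rate below are made up for illustration, not taken from this issue):

using Flux, Optimisers
using Statistics: mean

# Toy model and data, purely illustrative
model = Chain(Dense(2 => 16, relu), Dense(16 => 1))
x, y = randn(Float32, 2, 10), randn(Float32, 1, 10)

# Explicit-parameter gradient: differentiate with respect to the model struct itself
grads, = Flux.gradient(m -> mean(abs2, m(x) .- y), model)

# Optimisers.jl keeps the optimiser state alongside (not inside) the model
state = Optimisers.setup(Optimisers.Adam(0.001), model)
state, model = Optimisers.update(state, model, grads)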

@mcabbott mcabbott added the implicit using Params, Grads label Jul 4, 2022