Gradient disappears when there is an indexed array in the loss function #1232
Somehow this confuses the implicit parameter handling. Can you avoid it by always storing and accessing the same global variable? (Or, even better, by avoiding implicit parameters entirely.)

```julia
julia> using Zygote  # doesn't need Flux

julia> ps = Zygote.Params([x[1], p])  # as above
Params([[0.0, 1.0], [0.5, 0.6]])

julia> gs = Zygote.gradient(loss, ps)
Grads(...)

julia> gs[x1] === nothing
true

julia> x = [x1, x1 .+ 1]
2-element Vector{Vector{Float64}}:
 [0.0, 1.0]
 [1.0, 2.0]

julia> ps2 = Zygote.Params([x, p])  # store the outer array in Params
Params([[[0.0, 1.0], [1.0, 2.0]], [0.5, 0.6]])

julia> gs2 = Zygote.gradient(loss, ps2)
Grads(...)

julia> gs2[x]  # fine
2-element Vector{Union{Nothing, Vector{Float64}}}:
 [-1.0, 1.2000000000000002]
 nothing

julia> gs2[x1]
ERROR: KeyError: key [0.0, 1.0] not found

julia> ps3 = Zygote.Params([x, x1, p])  # storing both does not help
Params([[[0.0, 1.0], [1.0, 2.0]], [0.0, 1.0], [0.5, 0.6]])

julia> gs3 = Zygote.gradient(loss, ps3)
Grads(...)

julia> gs3[x]
2-element Vector{Union{Nothing, Vector{Float64}}}:
 [-1.0, 1.2000000000000002]
 nothing

julia> gs3[x1] === nothing
true
```
This issue arises because we don't do per-element tracking of implicit gradients for arrays of arrays. This is currently done for tuples, so it may be possible to use a similar pattern for arrays. As Michael mentioned though, I would highly recommend avoiding implicit params if you can. The example above will be both more efficient and less surprising with "explicit" params.
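To make the tuple remark concrete, here is a sketch of the tuple case (reusing `x1`, `p`, and the loss from the thread; the result is what the comment above implies, not independently verified):

```julia
using Zygote

x1 = [0.0, 1.0]
p  = [0.5, 0.6]
x  = (x1, x1 .+ 1)  # a tuple of arrays instead of a Vector{Vector}

loss() = sum(abs2, x[1] + p .- 1)

ps = Zygote.Params([x[1], p])  # x[1] === x1, so x1 itself is in Params
gs = Zygote.gradient(loss, ps)
gs[x1]  # tuples get per-element tracking, so this should not be nothing
```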
Thanks Michael Abbott and Brian Chen. According to the two responses above, two alternative solutions to this problem have been found so far. Store the outer array in `Params`:

```julia
x = [x1, x1 .+ 1]
ps = Zygote.Params([x, p])
```

Or store the inner arrays in a tuple:

```julia
x = (x1, x1 .+ 1)
ps = Zygote.Params([x[1], p])
```

A simplification closer to our real situation would be to store the inner arrays as …
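Note that under the first workaround the gradient must be looked up via the outer array, not the inner one; a sketch following the transcript above (where `gs2[x]` works but `gs2[x1]` raises a `KeyError`):

```julia
using Zygote

x1 = [0.0, 1.0]
p  = [0.5, 0.6]
x  = [x1, x1 .+ 1]

loss() = sum(abs2, x[1] + p .- 1)

ps = Zygote.Params([x, p])
gs = Zygote.gradient(loss, ps)

g_x1 = gs[x][1]  # gradient of the inner array, reached through the outer key
# gs[x1] would throw a KeyError, as in the transcript above
```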
The alternative we're referring to with explicit params is neither of those, but this:

```julia
function loss(x, p)
    y = x[1] + p
    sum(abs2, y .- 1)
end

dx, dp = Flux.gradient(loss, x, p)
```

Or:

```julia
function loss(x1, p)
    y = x1 + p
    sum(abs2, y .- 1)
end

dx_real, dp = Flux.gradient(loss, x_real, p)
```

In other words, you pass in anything you want to get a gradient for / differentiate with respect to. This avoids all the issues mentioned in this thread and should be slightly more efficient too. You should almost never have to use `Params`.
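Putting the pieces of the thread together, a self-contained sketch of the explicit approach (values for `x1` and `p` are taken from the REPL output above; `Zygote.gradient` is called directly here, which behaves the same as the `Flux.gradient` used in the thread):

```julia
using Zygote

x1 = [0.0, 1.0]
p  = [0.5, 0.6]
x  = [x1, x1 .+ 1]

# Everything we want a gradient for is a function argument.
loss(x, p) = sum(abs2, x[1] + p .- 1)

dx, dp = Zygote.gradient(loss, x, p)
# dx carries a gradient for x[1] and nothing for the unused x[2];
# dp matches the [-1.0, 1.2000000000000002] seen in the transcript above.
```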
Thanks for your explanations and suggestions.
Hi, I am going to train some parameters specified by labels in a neural network. I found that when using an indexed array in the loss function, the corresponding gradient appears to be `nothing`. Here is a minimal working example (see the sketch below). I have checked that gradient descent runs normally when line `{1}` is `y = x[1]` (`# {2}`) or `y = x_real + p` (`# {3}`). However, when line `{1}` is `y = x[1] + p` (and `# {2}` or `# {3}` is not suitable for our situation), inspecting `gs.grads` shows that the gradient with respect to `x1` is never computed.