
Gradient disappears when there is an indexed array in the loss function #1232

Open
huangyxi opened this issue May 23, 2022 · 5 comments
Labels
implicit using Params, Grads

Comments

@huangyxi

Hi, I am trying to train some parameters, selected by labels, in a neural network. I found that when the loss function uses an indexed array, the corresponding gradient comes back as nothing. Here is a minimal working example:

using Flux

x1 = [0., 1.]
x = [x1]
x_real = x[1]
p = [0.5, 0.6]

function loss()
    y = x[1] + p # line {1}
    sum(abs2, y .- 1)
end

ps = Flux.params(x[1], p)  # note: x[1] === x1
gs = Flux.gradient(loss, ps)

I have verified that gradient descent works normally when line {1} is either y = x[1] # {2} or y = x_real + p # {3}. However, when line {1} is y = x[1] + p (neither {2} nor {3} suits our use case), gs.grads contains the following:

IdDict{Any, Any} with 4 entries:
  [0.0, 1.0] => nothing
  [0.5, 0.6] => [-1.0, 1.2]
  :(Main.x)  => [[-1.0, 1.2]]
  :(Main.p)  => [-1.0, 1.2]

which shows that the gradient with respect to x1 was never computed.

@mcabbott
Member

Somehow this confuses the implicit parameter handling: you add x1 to Params, but you access it via a different global variable, x, which is not in Params. I guess that's a bug.

Can you avoid it by always storing and accessing the same global variable? (Or, even better, by avoiding implicit parameters entirely.)

julia> using Zygote  # doesn't need Flux

julia> ps = Zygote.Params([x[1], p])  # as above
Params([[0.0, 1.0], [0.5, 0.6]])

julia> gs = Zygote.gradient(loss, ps)
Grads(...)

julia> gs[x1] === nothing
true

julia> x = [x1, x1.+1]
2-element Vector{Vector{Float64}}:
 [0.0, 1.0]
 [1.0, 2.0]
 
julia> ps2 = Zygote.Params([x, p])  # store the outer array in Params
Params([[[0.0, 1.0], [1.0, 2.0]], [0.5, 0.6]])

julia> gs2 = Zygote.gradient(loss, ps2)
Grads(...)

julia> gs2[x]  # fine
2-element Vector{Union{Nothing, Vector{Float64}}}:
 [-1.0, 1.2000000000000002]
 nothing

julia> gs2[x1]
ERROR: KeyError: key [0.0, 1.0] not found

julia> ps3 = Zygote.Params([x, x1, p])  # storing both does not help
Params([[[0.0, 1.0], [1.0, 2.0]], [0.0, 1.0], [0.5, 0.6]])

julia> gs3 = Zygote.gradient(loss, ps3)
Grads(...)

julia> gs3[x]
2-element Vector{Union{Nothing, Vector{Float64}}}:
 [-1.0, 1.2000000000000002]
 nothing

julia> gs3[x1] === nothing
true

@mcabbott mcabbott transferred this issue from FluxML/Flux.jl May 23, 2022
@ToucheSir
Member

This issue arises because we don't do per-element tracking of implicit gradients for arrays of arrays. This is currently done for tuples, so it may be possible to use a similar pattern for arrays. As Michael mentioned though, I would highly recommend avoiding implicit params if you can. The example above will be both more efficient and less surprising with "explicit" params.
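
For illustration, a minimal sketch of the tuple behavior mentioned above, reusing x1 and p from the original example (loss_t is a hypothetical name for a loss that reads the tuple):

julia> xt = (x1, x1 .+ 1)  # a Tuple of arrays rather than a Vector of Vectors
([0.0, 1.0], [1.0, 2.0])

julia> loss_t() = sum(abs2, (xt[1] + p) .- 1);  # same loss as before, reading the tuple

julia> gs_t = Zygote.gradient(loss_t, Zygote.Params([xt[1], p]))
Grads(...)

julia> gs_t[x1]  # with a tuple this should be [-1.0, 1.2000000000000002], not `nothing`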

@huangyxi
Author

Thanks, Michael Abbott and Brian Chen. Based on the two responses above, two workarounds for this problem are currently available:

Store outer params:

x = [x1, x1.+1]
ps = Zygote.Params([x, p])

Store as Tuple:

x = (x1, x1.+1)
ps = Zygote.Params([x[1], p])

A setup closer to our real situation would store the inner arrays as Dict values, since the number of inner arrays varies and we cannot assign a fixed number of variables in the code. Although this problem has been solved thanks to you, it would be nice if elements of collections other than Tuple could be supported as params directly in the future.

@ToucheSir
Member

The alternative we're referring to with explicit params is neither of those, but this:

function loss(x, p)
    y = x[1] + p
    sum(abs2, y .- 1)
end

dx, dp = Flux.gradient(loss, x, p)
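
Here dx mirrors the structure of x: assuming x = [x1, x1 .+ 1] as above, it should contain [-1.0, 1.2000000000000002] for the first entry and nothing for the second, just like gs2[x] earlier in the thread.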

Or:

function loss(x1, p)
    y = x1 + p
    sum(abs2, y .- 1)
end

dx_real, dp = Flux.gradient(loss, x_real, p)

In other words, you pass in anything you want to differentiate with respect to, as shown in the sketch below. This avoids all the issues mentioned in this thread and should be slightly more efficient too. You should almost never need params or Params with Zygote unless you're relying on a higher-level library that requires them.
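
To make the workflow concrete, a minimal sketch of one manual gradient-descent step on the explicit gradients, assuming x = [x1, x1 .+ 1] and the loss(x, p) defined above; the step size 0.1 is an arbitrary choice for illustration:

dx, dp = Flux.gradient(loss, x, p)
for (xi, dxi) in zip(x, dx)
    dxi === nothing && continue  # entries the loss never reads have no gradient
    xi .-= 0.1 .* dxi            # update each inner array in place
end
p .-= 0.1 .* dp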

@huangyxi
Author

Thanks for your explanations and suggestions.
