Parameter Sharing breaks destructure #1767

Closed · avik-pal (Member) opened this issue Nov 17, 2021 · 4 comments · Fixed by #1901

MWE:

using Flux

struct Model{A}
    a::A
    b::A
end

Flux.@functor Model

(m::Model)(x) = m.a(x) .+ m.b(x)

d = Dense(1, 1)
x = rand(Float32, 1, 1)

# Sharing the parameters
model = Model(d, d)

# Works
Flux.gradient(() -> sum(model(x)), Flux.params(model)).grads

p, re = Flux.destructure(model)

# Fails
Flux.gradient(p -> sum(re(p)(x)), p).grads

Stacktrace:

┌ Warning: Expected 2 params, got 3
└ @ Flux ~/.julia/packages/Flux/BPPNj/src/utils.jl:647
ERROR: DimensionMismatch("variable with size(x) == (2,) cannot have a gradient with size(dx) == (3,)")
Stacktrace:
 [1] (::ChainRulesCore.ProjectTo{AbstractArray, NamedTuple{(:element, :axes), Tuple{ChainRulesCore.ProjectTo{Float32, NamedTuple{(), Tuple{}}}, Tuple{Base.OneTo{Int64}}}}})(dx::Vector{Float32})
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/7ZiwT/src/projection.jl:226
 [2] _project
   @ ~/.julia/packages/Zygote/AlLTp/src/compiler/chainrules.jl:182 [inlined]
 [3] map(f::typeof(Zygote._project), t::Tuple{Vector{Float32}}, s::Tuple{Vector{Float32}})
   @ Base ./tuple.jl:246
 [4] gradient(f::Function, args::Vector{Float32})
   @ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface.jl:77
 [5] top-level scope
   @ REPL[10]:2
 [6] top-level scope
   @ ~/.julia/packages/CUDA/2C5YQ/src/initialization.jl:52
avik-pal changed the title from "Parameter Sharing breaks with destructure" to "Parameter Sharing breaks destructure" on Nov 17, 2021.

ToucheSir (Member) commented Nov 18, 2021:

The key line is https://github.com/FluxML/Flux.jl/blob/master/src/utils.jl#L649. Because Zygote is blissfully unaware of the tying, it will return a separate gradient for each layer. However, the gradients for the biases are Fills, and FillArrays are bits types, so they hash by content rather than by address in an IdDict; thus only 3 params (model.a.weight, model.a.bias and model.b.weight) are retained by fmap.
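
A minimal sketch of that deduplication (the gradient values are made up; the point is the === and IdDict behaviour):

using FillArrays

g_bias_a = Fill(1.0f0, 1)  # stand-in for the gradient of model.a.bias
g_bias_b = Fill(1.0f0, 1)  # stand-in for the gradient of model.b.bias

g_bias_a === g_bias_b      # true: Fill is an immutable bits type, so two Fills
                           # with the same value and axes are indistinguishable

cache = IdDict()
cache[g_bias_a] = true
haskey(cache, g_bias_b)    # true: an IdDict-backed cache (like fmap's) treats
                           # the second bias gradient as already seen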

Fixing this would require a few things. First, passing some additional metadata (e.g. the offset of each param) to _restructure for alias tracking. Second, excluding types which calculate objectid based on value (i.e. non-mutable types) from caching in fmap [1]. And third, accumulating gradients for tied parameters in _restructure (a sketch follows the footnote below). This would ideally be handled by the AD, but because it has a custom @adjoint we've effectively opted out of that assistance.

Footnotes

[1] I've experimented with this as part of (the very experimental) https://github.com/FluxML/Functors.jl/pull/27, but it shouldn't be hard to add to Functors proper.
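
A hypothetical sketch of the accumulation step (the helper name and the offsets/lengths bookkeeping are illustrative, not Flux's actual _restructure code): each leaf gradient is added into its slice of the flat vector, so tied parameters, which share an offset, have their contributions summed.

# Hypothetical helper, not Flux's implementation: accumulate leaf gradients
# into a flat vector of length n, given each leaf's offset and length in p.
function flat_gradient(grads, offsets, lengths, n)
    dp = zeros(Float32, n)
    for (g, off, len) in zip(grads, offsets, lengths)
        g === nothing && continue      # leaves with no gradient contribute nothing
        dp[off+1:off+len] .+= vec(g)   # .+= (not =) is what makes tying work
    end
    return dp
end

# For the MWE above, model.a and model.b would share offsets, so the weight and
# bias gradients from both uses of the Dense layer are summed into p's 2 slots.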

mcabbott (Member) commented:
> the gradients for the biases are Fills and FillArrays are bits types

Note that the problem is worse than this. Even with dense arrays, it's easy for two parameters to get the same gradient: e.g. if they enter the loss as f(x + y), then the same array will be used as the gradient for both. So you can get gradients that are === even when all the parameters are distinct.

Conversely, with shared parameters in the model, the present structure of fmap means that it never visits the later ones, which is wrong for gradients. The gradients from different occurrences of x in the loss need to be added.
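
A small sketch of that caching behaviour, using Functors' fmap directly (the toy NamedTuple stands in for a model with a tied leaf):

using Functors

shared = Float32[1.0, 2.0]
tied = (a = shared, b = shared)   # the same array appears as two leaves

visits = Ref(0)
fmap(x -> (visits[] += 1; x), tied)
visits[]   # 1: the cache keys on object identity, so the second occurrence is
           # never visited -- fine for flattening parameters, but wrong for a
           # gradient walk, where both contributions should be added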

ToucheSir (Member) commented:
Exactly. There's no getting around either closing over the original structure or creating a new auxiliary one for use in co-iterating over the gradients and determining what goes where.

mcabbott (Member) commented:
Maybe also worth noting that the notion of sharing in Functors is a === b, where these are leaf-like AbstractArray{<:Number}s. So it will not notice that W and W' are the same data.
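
For example (just an illustration of the === check on leaves):

W = rand(Float32, 2, 2)
Wt = W'                  # Adjoint wrapper around the same storage

W === Wt                 # false: a different object (and type), so not "shared"
parent(Wt) === W         # true:  even though they alias the same data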
