
pullback's back returns unexpected size if some parameters are not used #1601

Closed
garibarba opened this issue May 8, 2021 · 3 comments · Fixed by #1901

Comments

@garibarba

This behavior can lead to a DimensionMismatch error in DiffEqSensitivity. I don't know if this is intended behavior, but I feel there should at least be a warning.

Minimal example:

using Flux, Zygote

struct TwoDenses
    dense::Dense
    dense2::Dense
end

Flux.@functor TwoDenses

function (m::TwoDenses)(x)
    out = m.dense(x)  # m.dense2 is never used in the forward pass
end

model = TwoDenses(
    Dense(3,1),
    Dense(3,2)
)
p, re = Flux.destructure(model)

x = [1., 2., 3.]
y, back = Zygote.pullback((x, p) -> re(p)(x), x, p)

dy = [4.]
dx, dp = back(dy)
@show length(p), length(dp)

Returns:

(length(p), length(dp)) = (12, 4)

Edit:
To be clear, I would have expected:

(length(p), length(dp)) = (12, 12)
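
A workaround sketch until this is addressed (my own addition, not a Flux API): destructure flattens parameters in field order, and only the leading dense parameters receive a gradient here, so the truncated cotangent can be padded back to full length with zeros. This assumes every unused parameter sits at the end of p, which holds for this struct but not in general.

# Workaround sketch, not a Flux API: pad the truncated gradient with zeros.
# Assumes the unused layer's parameters sit at the *end* of the flattened
# vector, which holds here because `dense` precedes `dense2` in the struct.
dp_full = vcat(dp, zeros(eltype(p), length(p) - length(dp)))
@show length(p), length(dp_full)  # (12, 12)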
@trohwer

trohwer commented May 11, 2021

I think I am seeing a similar issue:

julia> x=[1.0]

julia> typeof(Flux.pullback( ()->0.0*sum(x), params(x) )[2](1.0)[x])
FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}

julia> typeof(Flux.pullback( ()->0.0, params(x) )[2](1.0)[x])
Nothing

The second gradient should be the same as the first: a vector of length 1 with a zero component, not the nothing object. We are differentiating a constant float expression with respect to the parameter x, so a proper zero gradient is expected; otherwise one has to create special cases for constant polynomials etc.

This issue likely lies in the initialization of the cache in the pullback function (interface.jl):
cache(cx)[p] = nothing
If I change that to
cache(cx)[p] = zero(p)
it works better for me, but the type of the zeros is probably not right.
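
In the meantime, a user-side fallback is straightforward (a sketch of my own; grad_or_zero is a hypothetical helper, not a Zygote API):

# Hypothetical helper, not part of Zygote: substitute an explicit zero of the
# parameter's shape whenever the gradient entry is `nothing`.
grad_or_zero(g, p) = g === nothing ? zero(p) : g

x = [1.0]
gs = Flux.pullback(() -> 0.0, params(x))[2](1.0)
grad_or_zero(gs[x], x)  # returns [0.0] instead of nothing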

@mcabbott
Member

mcabbott commented May 11, 2021

Zygote uses nothing as a kind of zero, so that when it's sure the gradient is always zero, it doesn't bother tracing it further backwards. So this isn't a bug, it's just a surprise if you don't expect it:

julia> gradient(x -> 1, rand(3))
(nothing,)

julia> gradient(x -> 0*sum(x), rand(3))
(Fill(0.0, 3),)

julia> gradient(x -> 0*sum(abs2, x), rand(3))
([0.0, 0.0, 0.0],)

Flux's destructure is pretty sloppy about sizes of things, e.g. re(rand(1000)) runs without warning; there have been attempts to fix this. Although I'm not sure this particular weirdness has been reported before. Smaller example:

julia> v, re = Flux.destructure(([1,2], [3,4,5]))
([1, 2, 3, 4, 5], Flux.var"#61#63"{Tuple{Vector{Int64}, Vector{Int64}}}(([1, 2], [3, 4, 5])))

julia> Zygote.gradient(v -> sum(re(v)[1]), rand(5))
([1.0, 1.0],)

julia> Zygote.gradient(v -> sum(re(v)[2]), rand(500))
([1.0, 1.0, 1.0],)
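
The forward direction is just as forgiving; continuing the example above (a sketch, with the output I'd expect given that trailing elements are silently ignored):

julia> re(collect(1.0:100.0))  # extra 95 elements silently ignored
([1.0, 2.0], [3.0, 4.0, 5.0])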

@mcabbott mcabbott transferred this issue from FluxML/Zygote.jl May 20, 2021
@ToucheSir
Member

_restructure is not exactly picky about how large the input vector is, as long as it's large enough. I assume there wouldn't be any harm in adding an assert or other error check in there; I'll put in a PR next week if someone doesn't get to it first.
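
For illustration, a user-side version of such a check (a hypothetical wrapper of my own, not what the eventual PR does) might look like:

# Hypothetical user-side guard, not Flux's implementation: reject parameter
# vectors whose length differs from what `destructure` produced.
p, re = Flux.destructure(model)
function re_checked(v)
    length(v) == length(p) ||
        error("expected a parameter vector of length $(length(p)), got $(length(v))")
    return re(v)
end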

bors bot added a commit that referenced this issue Jun 17, 2021
1616: Warn on reconstruct length mismatch r=CarloLucibello a=ToucheSir

Ref. #1601. This is kept as a plain warning for backwards compat, but perhaps we want to consider it a bugfix and error/depwarn instead?

### PR Checklist

- [x] Tests are added
- [ ] Entry in NEWS.md
- [ ] Documentation, if applicable
- [ ] API changes require approval from a committer (different from the author, if applicable)


Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>