Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sensitivities for structured matrix arguments structured #52

Open
ararslan opened this issue Jun 13, 2019 · 4 comments
Open

Make sensitivities for structured matrix arguments structured #52

ararslan opened this issue Jun 13, 2019 · 4 comments
Labels
design Requires some design before changes are made
Milestone

Comments

@ararslan
Copy link
Member

Quoting Will in #29:

FWIW, the other thing to think about is what is actually happening computationally under the hood. Ultimately the Diagonal matrix type doesn't use any off-diagonal elements when used in e.g. a matrix-matrix multiply - the Diagonal type simply doesn't allow you to have non-zero off-diagonal elements, so it's a slightly odd question to ask what happens if you perturb the off-diagonals by an infinitesimal amount (i.e. compute the gradient w.r.t. them).

It's this slightly weird situation in which thinking about a Diagonal matrix as a regular dense matrix that happens to contain zeros on its off-diagonals isn't really faithful to the semantics of the type (not sure if I've really phrased that correctly, but hopefully the gist is clear)

@tkf
Copy link
Contributor

tkf commented Sep 13, 2019

Since (say) *(x::Diagonal, y::AbstractVector) in LinearAlgebra is defined in terms of combination of operations whose rules are already defined, would just "nulling" automatically handle the situation?

frule(::typeof(*), ::Diagonal, ::AbstractVector) = nothing
rrule(::typeof(*), ::Diagonal, ::AbstractVector) = nothing

Ref a similar question I posted in Zygote: FluxML/Zygote.jl#316

@oxinabox
Copy link
Member

I think "disabling" the rules by returnuing nothing would do it,
assuming inside an AD framework.
Better would be to not though and just define the correct answer directly.

@tkf
Copy link
Contributor

tkf commented Sep 13, 2019

I think there are too many specializations in LinearAlgebra (and likely in many other in the wild). But maybe not so bad at least as a "midterm" solution? Looking at how Zygote handles broadcasting, I have a feeling that it'll take some time to support broadcasting properly...

If ChainRules.jl's stance is "rules specializations are welcome" I can make a PR to implement FluxML/Zygote.jl#316 in ChainRules.jl (after the overhaul #91).

@oxinabox
Copy link
Member

I think there are too many specializations in LinearAlgebra (and likely in many other in the wild). But maybe not so bad at least as a "midterm" solution?

yeah, I think bailing out sometimes is the right answer for ChainRules,
prob with a link to a TODO issue for these cases.

If ChainRules.jl's stance is "rules specializations are welcome" I can make a PR to implement FluxML/Zygote.jl#316 in ChainRules.jl (after the overhaul #91).

Yeah, that would be great.
(Eventually we will have a @rrule or something that will have close-to the same surface semantics as @adjoint see JuliaDiff/ChainRulesCore.jl#44, at that point those will be copy-paste able, no promise when though)

@nickrobinson251 nickrobinson251 added this to the v2 milestone Feb 8, 2020
@nickrobinson251 nickrobinson251 added the design Requires some design before changes are made label Aug 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Requires some design before changes are made
Projects
None yet
Development

No branches or pull requests

4 participants