rules for specific ADs #270

Closed
willtebbutt opened this issue Dec 24, 2020 · 4 comments
Labels
design (Requires some design before changes are made), enhancement (New feature or request)

Comments

@willtebbutt
Member

willtebbutt commented Dec 24, 2020

Let's assume that Diffractor is going to be better at some things than Zygote is and, as a consequence, there exist rules that we don't want Diffractor to hit (since it generates perfectly good code anyway) but that we do want Zygote to hit. Presently, we don't have a way to specify this.

It would be simple to achieve this via dispatch, by adding an additional argument to one's rrules so that they have the following signatures:

abstract type AbstractAD end
struct ZygoteAD <: AbstractAD end
struct DiffractorAD <: AbstractAD end

rrule(::AbstractAD, ::typeof(f), args...) # applicable to all ADs
rrule(::ZygoteAD, ::typeof(f), args...) # only applicable to Zygote
rrule(::DiffractorAD, ::typeof(f), args...) # only applicable to Diffractor
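
To make the dispatch idea concrete, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: `my_rrule` and `looped_sum` are hypothetical names, the AD marker types are the ones proposed above, and none of this is the actual ChainRulesCore API. A generic fallback returns `nothing` (meaning "no rule, let the AD differentiate the function itself"), while a Zygote-only method supplies a hand-written pullback.

```julia
# Hypothetical sketch: `my_rrule` and `looped_sum` are illustrative names,
# not part of ChainRulesCore.
abstract type AbstractAD end
struct ZygoteAD <: AbstractAD end
struct DiffractorAD <: AbstractAD end

# Fallback for every AD: `nothing` signals "no rule defined here".
my_rrule(::AbstractAD, f, args...) = nothing

# A function written with an explicit for-loop.
function looped_sum(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x
    end
    return total
end

# Rule that only Zygote hits; Diffractor falls through to the fallback above.
function my_rrule(::ZygoteAD, ::typeof(looped_sum), xs)
    y = looped_sum(xs)
    # The gradient of a sum with respect to each element is the incoming cotangent.
    looped_sum_pullback(dy) = (nothing, fill(dy, size(xs)))
    return y, looped_sum_pullback
end

y, pullback = my_rrule(ZygoteAD(), looped_sum, [1.0, 2.0, 3.0])
no_rule = my_rrule(DiffractorAD(), looped_sum, [1.0, 2.0, 3.0])
```

Because `ZygoteAD <: AbstractAD` and `typeof(looped_sum)` is more specific than an untyped argument, the Zygote method is strictly more specific than the fallback, so the two definitions do not create a method ambiguity.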

This would also alleviate some of our existing headaches surrounding rrules for "very abstractly typed" arguments: we could implement generic versions of things that we're sure ought to work (e.g. *(::Matrix{Float64}, ::Matrix{Float64})) in ChainRules, without requiring package authors to compromise on choices that they've already made. For example, Zygote uses very abstract types for lots of things and, while I don't like it, changing that at this point in time would be very breaking.

Recall that there are essentially 3 reasons (I think?) to implement a rule:

  1. Mathematical insight leads to a completely different algorithm than would be derived automatically by any (existing) AD tool. Anything that uses the Implicit Function Theorem to derive efficient rrules that avoid storing intermediate state is a good example of this, e.g. rrules for optimisation and (nice) ODEs.
  2. For some reason it's more efficient to manually write out the algorithm than to have a particular AD derive it.
  3. An AD doesn't know how to differentiate a particular function, but you do, so you write a rule.

Rules of type 1 are those for which you would consider writing a very generic rrule, so you would probably write them to accept any AbstractAD.

Rules of type 2 are somewhat borderline and would probably need to be done on a case-by-case basis. For example, you might write a custom adjoint for a function involving a for-loop if using Zygote, but might not need the rule at all if using Diffractor. While Zygote can usually differentiate through for-loops, it tends to be slow.

Rules of type 3 are prime candidates for AD-specific rules, since different ADs are able to differentiate through different language features.

This is related to #68 in that we're talking about including some kind of additional information about what ADs to use, but the underlying problem that it addresses is somewhat different.

@nickrobinson251 added the design and enhancement labels Dec 28, 2020
@sethaxen
Member

sethaxen commented Feb 4, 2021

I like the approach. I have started adding comments next to new rules explaining why they are added in the hopes of something like this in the future.

Rather than hardcoding the specific ADs, perhaps it makes more sense, in the spirit of ChainRules, to encode the reason why a rule was added in the signature of the rule itself, perhaps using a traits-based approach or a type union. Then some hot new AD doesn't depend on us recognizing it and modifying our rules. Rather, when hooking into ChainRules, it could opt in to specific categories of rules.
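
One way the opt-in idea might look is sketched below. Everything here is hypothetical and chosen only for illustration (`RuleReason`, `tagged_rrule`, `accepted_reasons`, `find_rrule`, the two toy AD types, and `sumsq`); it is not an existing ChainRules interface. Each rule carries the reason it exists as its first argument, and each AD lists the reasons it is willing to accept.

```julia
# Hypothetical sketch: all names below are assumptions made for illustration.
abstract type RuleReason end
struct MathInsight <: RuleReason end       # rule encodes a cleverer algorithm
struct LoopPerformance <: RuleReason end   # rule exists because loops are slow to differentiate

# Default: no rule for this (reason, function) combination.
tagged_rrule(::RuleReason, f, args...) = nothing

# Sum of squares, written with an explicit loop.
function sumsq(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x^2
    end
    return total
end

# The rule records *why* it exists, not *which AD* should use it.
function tagged_rrule(::LoopPerformance, ::typeof(sumsq), xs)
    sumsq_pullback(dy) = (nothing, 2 .* xs .* dy)
    return sumsq(xs), sumsq_pullback
end

# Each AD opts in to the categories of rules it wants.
struct LoopAverseAD end
struct LoopFriendlyAD end
accepted_reasons(::LoopAverseAD) = (MathInsight(), LoopPerformance())
accepted_reasons(::LoopFriendlyAD) = (MathInsight(),)

# Return the first matching rule among the accepted categories, else `nothing`.
function find_rrule(ad, f, args...)
    for reason in accepted_reasons(ad)
        result = tagged_rrule(reason, f, args...)
        result === nothing || return result
    end
    return nothing
end

y, pullback = find_rrule(LoopAverseAD(), sumsq, [1.0, 2.0])
skipped = find_rrule(LoopFriendlyAD(), sumsq, [1.0, 2.0])
```

With this layout a new AD only has to define its own `accepted_reasons` method; existing rules do not need to be touched.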

@oxinabox
Member

oxinabox commented Feb 4, 2021

Yeah, traits could be a cool way to do this.
I am imagining that, paired with #68, one would declare a configured rrule, which you would use in place of rrule.

Something like:
for Zygote:

configured_rrule = ConfiguredRRule(
    NoMutation, NoInplaceAccumulation, HatesLoops;
    forward_ad=Zygote._pushforward,  #Or maybe just `nothing` as not provided.
    reverse_ad=Zygote._pullback,
)

for Nabla:

configured_rrule = ConfiguredRRule(
    NoMutation, InplaceAccumulation, IsOkWithLoops;
    forward_ad=Nabla.fmad,
    reverse_ad=Nabla.∇,
)

@willtebbutt
Member Author

Traits make sense, provided we're able to offer an escape hatch so that AD implementers can always express "no, I really do want to use only that very particular rule".

@mzgubic
Member

mzgubic commented Jun 11, 2021

Closed by #363
