rules for specific ADs #270

willtebbutt · 2020-12-24T14:07:52Z

Let's assume that Diffractor is going to be better at some things than Zygote is and, as a consequence, there exist rules that we don't want Diffractor to hit (since it generates perfectly good code anyway) but we do want Zygote to hit. Presently, we have no way to specify this.

This would be simple to achieve via dispatch, by adding an extra argument to one's rrules so that they have the following signatures:

abstract type AbstractAD end
struct ZygoteAD <: AbstractAD end
struct DiffractorAD <: AbstractAD end

rrule(::AbstractAD, ::typeof(f), args...) # applicable to all ADs
rrule(::ZygoteAD, ::typeof(f), args...) # only applicable to Zygote
rrule(::DiffractorAD, ::typeof(f), args...) # only applicable to Diffractor
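A minimal sketch of how that dispatch could resolve in plain Julia. To stay self-contained it uses a hypothetical `my_rrule` in place of ChainRulesCore's `rrule`, a toy `f(x) = sin(x)`, and a catch-all method returning `nothing` to mean "no rule here", following the convention ChainRules uses for missing rules. The Zygote-only method differs from the generic one only to make the dispatch visible.

```julia
abstract type AbstractAD end
struct ZygoteAD <: AbstractAD end
struct DiffractorAD <: AbstractAD end

f(x) = sin(x)  # toy function, for illustration only

# Catch-all (hypothetical): no rule, so the AD would differentiate the code itself.
my_rrule(::AbstractAD, args...) = nothing

# Applicable to all ADs.
function my_rrule(::AbstractAD, ::typeof(f), x::Real)
    y = f(x)
    f_pullback(Δy) = (nothing, Δy * cos(x))  # (tangent for f itself, tangent for x)
    return y, f_pullback
end

# Only applicable to Zygote: more specific in the first argument, so it wins for ZygoteAD().
function my_rrule(::ZygoteAD, ::typeof(f), x::Real)
    y, c = sincos(x)  # sin and cos computed together
    f_pullback(Δy) = (nothing, Δy * c)
    return y, f_pullback
end
```

With these definitions `DiffractorAD()` falls through to the generic method, `ZygoteAD()` picks its own, and any function without a rule hits the catch-all.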

This would also alleviate some of our existing headaches surrounding rrules for "very abstractly typed" arguments: we could implement generic versions of things that we're sure ought to work (e.g. *(::Matrix{Float64}, ::Matrix{Float64})) in ChainRules, without requiring package authors to compromise on choices they've already made. For example, Zygote uses very abstract types for lots of things and, while I don't like it, changing that at this point in time would be very breaking.
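As a sketch of such a generic-but-concrete rule, again using the hypothetical `my_rrule` rather than ChainRulesCore's `rrule`: one method for every `AbstractAD`, restricted to `Matrix{Float64}` arguments, so abstractly typed inputs (views, `Float32` matrices, and so on) are not captured and stay with whatever each AD already does. The pullback uses the standard identities for C = AB: the cotangent of A is ΔC Bᵀ and the cotangent of B is Aᵀ ΔC.

```julia
using LinearAlgebra

abstract type AbstractAD end
struct ZygoteAD <: AbstractAD end
struct DiffractorAD <: AbstractAD end

# Catch-all (hypothetical): no rule.
my_rrule(::AbstractAD, args...) = nothing

# Generic in the AD, concrete in the argument types.
function my_rrule(::AbstractAD, ::typeof(*), A::Matrix{Float64}, B::Matrix{Float64})
    C = A * B
    function times_pullback(ΔC)
        ΔA = ΔC * B'  # cotangent for A: ΔC Bᵀ
        ΔB = A' * ΔC  # cotangent for B: Aᵀ ΔC
        return (nothing, ΔA, ΔB)
    end
    return C, times_pullback
end
```

Passing `view(A, :, :)` or `Float32.(A)` returns `nothing`, because neither is a `Matrix{Float64}`.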

Recall that there are essentially 3 reasons (I think?) to implement a rule:

1. Mathematical insight leads to a completely different algorithm than any (existing) AD tool would derive automatically. Rules that use the Implicit Function Theorem to derive efficient rrules that avoid storing intermediate state are good examples of this, e.g. rrules for optimisation and (nice) ODEs.
2. For some reason it's more efficient to write out the algorithm manually than to have a particular AD derive it.
3. An AD doesn't know how to differentiate a particular function, but you do, so you write a rule.

Rules of type 1 are those for which you would consider writing a very generic rrule, so you would probably write them to accept any AbstractAD.

Rules of type 2 are somewhat borderline and would probably need to be done on a case-by-case basis. For example, you might write a custom adjoint for a function involving a for-loop if using Zygote, but might not need the rule at all if using Diffractor. While Zygote can usually differentiate through for-loops, it tends to be slow.
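A sketch of a type-2 rule that only Zygote sees, assuming the `AbstractAD`/`ZygoteAD`/`DiffractorAD` types from the proposal and the hypothetical `my_rrule` stand-in; `sumsq` is a made-up loop-based function. Since only a `ZygoteAD` method exists, `DiffractorAD` gets `nothing` and would differentiate the loop itself.

```julia
abstract type AbstractAD end
struct ZygoteAD <: AbstractAD end
struct DiffractorAD <: AbstractAD end

# Catch-all (hypothetical): no rule.
my_rrule(::AbstractAD, args...) = nothing

# Made-up loop-based function.
function sumsq(x::AbstractVector{<:Real})
    s = zero(eltype(x))
    for xi in x
        s += xi^2
    end
    return s
end

# Zygote-only rule: the gradient of sum(x .^ 2) is 2x, written out by hand.
function my_rrule(::ZygoteAD, ::typeof(sumsq), x::AbstractVector{<:Real})
    sumsq_pullback(Δy) = (nothing, 2 .* Δy .* x)
    return sumsq(x), sumsq_pullback
end
```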

Rules of type 3 are prime candidates for AD-specific rules, since different ADs are able to differentiate through different language features.
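A sketch of a type-3 rule. Zygote does not support array mutation, so a function that writes into an array with `setindex!` needs a rule before Zygote can differentiate it. `running_sum` and `my_rrule` are hypothetical, and leaving `DiffractorAD` without a rule only illustrates the dispatch; it is not a claim about what Diffractor can handle.

```julia
abstract type AbstractAD end
struct ZygoteAD <: AbstractAD end
struct DiffractorAD <: AbstractAD end

# Catch-all (hypothetical): no rule.
my_rrule(::AbstractAD, args...) = nothing

# Made-up function that mutates an array it allocates.
function running_sum(x::Vector{Float64})
    y = similar(x)
    acc = 0.0
    for i in eachindex(x)
        acc += x[i]
        y[i] = acc  # setindex!: array mutation
    end
    return y
end

function my_rrule(::ZygoteAD, ::typeof(running_sum), x::Vector{Float64})
    function running_sum_pullback(Δy)
        # y[i] = sum(x[1:i]), so the cotangent of x[j] is sum(Δy[j:end]),
        # i.e. a cumulative sum taken from the back.
        Δx = reverse(cumsum(reverse(Δy)))
        return (nothing, Δx)
    end
    return running_sum(x), running_sum_pullback
end
```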

This is related to #68 in that we're talking about including some kind of additional information about what ADs to use, but the underlying problem that it addresses is somewhat different.


sethaxen · 2021-02-04T10:34:14Z

I like the approach. I have started adding comments next to new rules explaining why they are added in the hopes of something like this in the future.

In the spirit of ChainRules, rather than hardcoding specific ADs, perhaps it makes more sense to encode in the signature of the rule itself the reason why it was added, e.g. using a traits-based approach or a type union. Then some hot new AD doesn't depend on us recognizing it and modifying our rules; instead, when hooking into ChainRules, it could opt in to specific categories of rules.
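A rough sketch of this traits idea: each rule is tagged with the reason it exists (the three reasons from the opening post), and each AD declares which reasons it accepts, so a new AD can opt in without ChainRules knowing its name. All names here (`RuleReason`, `accepts`, `rule_reason`, `raw_rrule`, `my_rrule`, `CautiousAD`, `HotNewAD`) are hypothetical, and the reason attached to each toy function is purely illustrative.

```julia
# Hypothetical categories for *why* a rule exists.
abstract type RuleReason end
struct MathematicalInsight <: RuleReason end  # reason 1
struct Performance <: RuleReason end          # reason 2
struct UnsupportedFeature <: RuleReason end   # reason 3

abstract type AbstractAD end
accepts(::AbstractAD, ::Nothing) = false             # no rule exists at all
accepts(::AbstractAD, ::RuleReason) = false          # default: opt out...
accepts(::AbstractAD, ::MathematicalInsight) = true  # ...except for reason 1

struct CautiousAD <: AbstractAD end                  # keeps the defaults
struct HotNewAD <: AbstractAD end                    # also opts in to reason 2
accepts(::HotNewAD, ::Performance) = true

# Rules are written once and tagged with their reason (tags are illustrative).
rule_reason(::Any) = nothing

mynorm(x) = sqrt(sum(abs2, x))
rule_reason(::typeof(mynorm)) = MathematicalInsight()
function raw_rrule(::typeof(mynorm), x)
    n = mynorm(x)
    mynorm_pullback(Δy) = (nothing, Δy .* x ./ n)
    return n, mynorm_pullback
end

sumsq(x) = sum(abs2, x)
rule_reason(::typeof(sumsq)) = Performance()
function raw_rrule(::typeof(sumsq), x)
    sumsq_pullback(Δy) = (nothing, 2 .* Δy .* x)
    return sumsq(x), sumsq_pullback
end

# The AD-facing entry point consults traits only, never AD names.
function my_rrule(ad::AbstractAD, f, args...)
    accepts(ad, rule_reason(f)) || return nothing
    return raw_rrule(f, args...)
end
```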

oxinabox · 2021-02-04T12:05:15Z

Yeah, traits could be a cool way to do this.
I am imagining that, paired with #68, one would declare a configured rrule, which you would use in place of rrule.

Something like:
For Zygote:

configured_rrule = ConfiguredRRule(
    NoMutation, NoInplaceAccumulation, HatesLoops;
    forward_ad=Zygote._pushforward,  # Or maybe just `nothing`, if not provided.
    reverse_ad=Zygote._pullback,
)

For Nabla:

configured_rrule = ConfiguredRRule(
    NoMutation, InplaceAccumulation, IsOkWithLoops;
    forward_ad=Nabla.fmad,
    reverse_ad=Nabla.∇,
)
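A minimal sketch of how a configured rrule could be used in place of `rrule`, keeping only one of the traits above. Here `ConfiguredRRule` is made callable: it applies the hand-written rule only when the configuration says `HatesLoops`, and otherwise hands the call back to the AD's own `reverse_ad`. The struct layout, the call signature assumed for `reverse_ad`, the made-up `sumsq`, and `stub_reverse_ad` (a stand-in that does no real differentiation) are all hypothetical.

```julia
# Hypothetical loop-related traits, named after the ones in the comment above.
abstract type LoopTrait end
struct HatesLoops <: LoopTrait end
struct IsOkWithLoops <: LoopTrait end

# Hypothetical configuration object; other traits and forward_ad are omitted.
struct ConfiguredRRule{L<:LoopTrait,R}
    loops::L
    reverse_ad::R  # assumed callable as reverse_ad(f, args...) -> (y, pullback)
end
ConfiguredRRule(loops::LoopTrait; reverse_ad) = ConfiguredRRule(loops, reverse_ad)

# Made-up loop-based function.
function sumsq(x::AbstractVector{<:Real})
    s = zero(eltype(x))
    for xi in x
        s += xi^2
    end
    return s
end

# Hand-written rule, used only by configurations that hate loops.
function (cfg::ConfiguredRRule{HatesLoops})(::typeof(sumsq), x::AbstractVector{<:Real})
    sumsq_pullback(Δy) = (nothing, 2 .* Δy .* x)
    return sumsq(x), sumsq_pullback
end

# Everything else is handed back to the AD's own reverse mode.
(cfg::ConfiguredRRule)(f, args...) = cfg.reverse_ad(f, args...)

# Stub standing in for a real AD's reverse mode: it does not differentiate anything.
function stub_reverse_ad(f, args...)
    stub_pullback(Δy) = (nothing, :derived_by_the_ad)
    return f(args...), stub_pullback
end
```

A Zygote-like configuration would then be `ConfiguredRRule(HatesLoops(); reverse_ad=stub_reverse_ad)`, and a Nabla-like one would pass `IsOkWithLoops()` instead.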

willtebbutt · 2021-02-04T16:06:53Z

Traits make sense -- provided we can offer an escape hatch so that AD implementers can always express "no, I really do want to use only that very particular rule".
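One way such an escape hatch could fall out of Julia's dispatch, sketched with hypothetical names: the trait-based method is the default, and an AD that wants a particular rule regardless of its traits adds a method on its own concrete type, which is more specific and therefore always wins. `OtherAD`, `PlainAD`, `loop_trait`, `trait_rrule`, `my_rrule` and `sumsq` are illustrative only.

```julia
abstract type AbstractAD end
abstract type LoopTrait end
struct HatesLoops <: LoopTrait end
struct IsOkWithLoops <: LoopTrait end

struct ZygoteAD <: AbstractAD end  # hates loops
struct OtherAD <: AbstractAD end   # fine with loops, but wants one specific rule
struct PlainAD <: AbstractAD end   # fine with loops, uses no loop rules

loop_trait(::AbstractAD) = IsOkWithLoops()
loop_trait(::ZygoteAD) = HatesLoops()

# Made-up loop-based function.
function sumsq(x::AbstractVector{<:Real})
    s = zero(eltype(x))
    for xi in x
        s += xi^2
    end
    return s
end

# Rules keyed on traits rather than on named ADs.
trait_rrule(::LoopTrait, f, args...) = nothing
function trait_rrule(::HatesLoops, ::typeof(sumsq), x::AbstractVector{<:Real})
    sumsq_pullback(Δy) = (nothing, 2 .* Δy .* x)
    return sumsq(x), sumsq_pullback
end

# Default: look up the AD's trait.
my_rrule(ad::AbstractAD, f, args...) = trait_rrule(loop_trait(ad), f, args...)

# Escape hatch: OtherAD insists on this particular rule, whatever its traits say.
my_rrule(::OtherAD, ::typeof(sumsq), x::AbstractVector{<:Real}) =
    trait_rrule(HatesLoops(), sumsq, x)
```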

mzgubic · 2021-06-11T12:39:17Z

Closed by #363

nickrobinson251 added the labels design (Requires some design before changes are made) and enhancement (New feature or request) on Dec 28, 2020

willtebbutt mentioned this issue on Jan 6, 2021: An additional approach to implementing rules, JuliaDiff/ChainRules.jl#338 (Open)

mzgubic closed this as completed on Jun 11, 2021

maximilian-gelbrecht mentioned this issue on May 17, 2022: Rules for mutating functions, @adjoint! and its documentation, FluxML/Zygote.jl#1228 (Closed)
