Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rules for turning lazy broadcasted to eager one? #104

Closed
tkf opened this issue Sep 13, 2019 · 6 comments
Closed

Rules for turning lazy broadcasted to eager one? #104

tkf opened this issue Sep 13, 2019 · 6 comments

Comments

@tkf
Copy link
Contributor

tkf commented Sep 13, 2019

Zygote.jl has some definitions for broadcasted such as this:

@adjoint function broadcasted(::typeof(tanh), x::Numeric)
  y = tanh.(x)
  y, ȳ -> (nothing, ȳ .* conj.(1 .- y.^2))
end

Notice ed, not broadcast. It's converting a lazy broadcast to eager one to store the temporary values.

I'm wondering if this kind of rules should be ported to ChainRules.jl. I think the way ChainRulesCore.jl express the rules cannot be used to auto-generate this kind of specializations (both before and after JuliaDiff/ChainRulesCore.jl#30). However, implementing this rule in ChainRules.jl means that AD engines cannot choose to use it or not.

Does it make sense to have it in ChainRules.jl? Or should it be done for each AD implementation?

(Maybe related to @willtebbutt's question here #12 (comment) ?)

@oxinabox
Copy link
Member

Yes, this is a reasonable rule to have.

The reason rrule (and similarly frule)
allows for having the first return value not always be directly calling the function,
is that sometimes there is a different way to call it that makes doing the pullback (similar pushforward) much faster,
and without unreasonable overhead (for some definition of unreasonable)

However, implementing this rule in ChainRules.jl means that AD engines cannot choose to use it or not.

Not entirely true, AD engines can opt out of particular ChainRules rules,
e.g. at least initially Zygote will be opting out of the 6 that return Wirtinger.
And also probably a bunch of other things that rely on the specifics of the AD internals too.
(Most of this is done by just defining things in Zygotes rule system (i.e. directly overloading _forward, which takes precidence over using ChainRules' definitions))

While opting out has to more or less be on a case by case basic,
I think that is actually fine for this case.
Someone notices a particular performance issue that is tied to a particular behavour in a particular instancem, and they opt out of thjat instance.

At some point we might want to consider attaching metadata via say traits to rules to make opt-ing out of particular groups easier. (Or maybe to allow use to define a meta_rrule(traits...) that gives back an new general purpose rrule function that will follow.).
But that is a way down the road.

@tkf
Copy link
Contributor Author

tkf commented Sep 13, 2019

Not entirely true, AD engines can opt out of particular ChainRules rules,

i.e. directly overloading _forward, which takes precidence over using ChainRules' definitions

I see. This is a really nice way of implementing it. Thanks for pointing this out.

@tkf
Copy link
Contributor Author

tkf commented Sep 27, 2019

Is it solved?

@oxinabox
Copy link
Member

I think so?
The question was:

Does it make sense to have it in ChainRules.jl?

the answer was:

Yes, this is a reasonable rule to have.

@tkf
Copy link
Contributor Author

tkf commented Sep 27, 2019

I see. That makes sense. I was somehow thinking this would be closed by porting some related rules from Zyogte. But that's not the OP said and I guess we don't need an individual issue to track it.

@oxinabox
Copy link
Member

I do need to sit down and write a script to find all rules from Zygote that we haven't ported from Nabla alrrady.
But yes that is another issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants