Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add project utility #1087

Closed
wants to merge 6 commits into from
Closed

Add project utility #1087

wants to merge 6 commits into from

Conversation

DhairyaLGandhi
Copy link
Member

@DhairyaLGandhi DhairyaLGandhi commented Sep 29, 2021

This PR revives working with RecursiveArrayTools (ref SciML/SciMLSensitivity.jl#493), and as discussed with @ChrisRackauckas , AD systems shouldn't be projecting internally, so we can reinstate RAT/ DiffEqSensitivity.

It is not known ahead of time where in the call stack a custom adjoint may be used or a custom type, we should allow for custom adjoints to manipulate code as they see fit.

Another change is that we allow complex gradients. These work most of the time, and the times they don't can be considered for bug fixes. I have a some methods which do technically do the same thing as _project in the accumulation stage rather than adding copies with every invocation of an adjoint.

@mcabbott
Copy link
Member

The conclusion of SciML/SciMLSensitivity.jl#493 is that RecursiveArrayTools needs a 3-line update to be fully compatible with ChainRulesCore 1.0.

If there are other issues, it would be helpful to describe and try to isolate them.

we allow complex gradients

Yes, Zygote still allows complex gradients. If you think something is wrong with them, you'll need to provide an example. I think you're proposing to re-break #342 etc, which isn't a great idea.

@DhairyaLGandhi
Copy link
Member Author

#342 has a pretty complete discussion about why the gradients are expected and correct, and with projection, it is possible to project it into the space of numbers the users want, which can be different for different cases. If the gradients are complex, the gradients are complex.

Beyond compat with RAT, it shouldn't be the AD that forces projection, but something either the compiler determines via promotion/ conversion or the user opts in via calling project explicitly.

The discussion in SciML/SciMLSensitivity.jl#493 also suggests that Zygote should be compatible with the broadcasting mechanism as RAT defines it since it satisfies the Base definition, even if by using fallbacks, which seems reasonable especially considering we were able to AD through it without special handling.

@ChrisRackauckas
Copy link
Member

The conclusion of SciML/SciMLSensitivity.jl#493 is that RecursiveArrayTools needs a 3-line update to be fully compatible with ChainRulesCore 1.0.

What 3 lines? I'll just add them.

The discussion in SciML/SciMLSensitivity.jl#493 also suggests that Zygote should be compatible with the broadcasting mechanism as RAT defines it since it satisfies the Base definition, even if by using fallbacks, which seems reasonable especially considering we were able to AD through it without special handling.

I thought the current issue was that broadcasting VectorOfArray produces an Array. This is the default behavior of AbstractArray objects though, so that's what I find confusing. If you define an AbstractArray object and just define indexing, x .+ x will give an Array, which is why I would've expected Zygote to handle that.

@mcabbott
Copy link
Member

Maybe we can discuss RAT on an issue there? The "3 lines" I mean are here: SciML/SciMLSensitivity.jl#493 (comment) .

The reason projection wants to be involved in broadcasting is that [1,2,3]' .+ 4 isa Matrix, but in reverse, we want to get back to an Adjoint vector. Likewise Diagonal([1,2,3]) .+ 4. To get this right it has to know some properties of x beyond its size. But the RAT examples linked involved iteration not broadcasting. Anyway, again, happy to discuss but perhaps this isn't the place.

@oxinabox
Copy link
Member

oxinabox commented Sep 29, 2021

This PR seems like it does at least 3 different things.
We probably should not try to talk about any of them here.
Because the conversation will go no where.

It should be broken up into several thing so each can be reviewed.

@ChrisRackauckas and @mcabbott should finish talking about ProjectTo for RecursiveArrayTools in SciML/SciMLSensitivity.jl#493

Beyond compat with RAT, it shouldn't be the AD that forces projection, but something either the compiler determines via promotion/ conversion or the user opts in via calling project explicitly.

@DhairyaLGandhi , @mcabbott and I should talk about this in a video call.
This is a pretty bold claim and goes against what a fair few people spent a whole lot of time thinking about.
Especially when you go back to the lead in discussions with worked out the complex number behavour (including but not limited to https://discourse.julialang.org/t/taking-complex-autodiff-seriously-in-chainrules/39317)
which was by people a fair bit more informed than me.
Could be true, could be false, could be we are talking about different things.
Almost certainly there are different understandings about what different things are doing and why.

But trying to talk this out in writing this thread is going to leads to a lot of time spent to reach a resolution that can be reached in a 30 minute call.

@ChrisRackauckas
Copy link
Member

Let's jump on a call in 5 minutes to flesh this out? I think it would be good to get some answers.

@CarloLucibello
Copy link
Member

closing as stale

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants