
Adding support for ADTypes.jl #88

Open
oschulz opened this issue Jun 5, 2023 · 12 comments

Labels
feature New feature or request

Comments

@oschulz commented Jun 5, 2023

The SciML ecosystem has had its own "AD-backend" types since before AbstractDifferentiation.jl was created. Those have more recently been exposed in the standalone package ADTypes.jl.

From a user's point of view, that means we currently have two incompatible ways of specifying which AD to use, and in packages/algorithms/applications that rely on both SciML and non-SciML packages we can't easily offer a way for users to select AD.

SciML is unlikely to switch to AbstractDifferentiation (SciML/ADTypes.jl#8): it would require breaking changes, and some corresponding types in ADTypes.jl and AbstractDifferentiation.jl have different content (so simple type aliases also won't work).

Speaking from a user's point of view again, it would be really nice to have a consistent way of selecting AD across the ecosystem. Could we add bi-directional conversion between ADTypes.AbstractADType and AbstractDifferentiation.AbstractBackend in AbstractDifferentiation.jl?

ADTypes.jl is extremely lightweight (no dependencies and a load time of 0.7 ms), so we could make it a (non-weak) dependency of AbstractDifferentiation.jl and add a

const ADBackendLike = Union{AbstractBackend, ADTypes.AbstractADType}

in AbstractDifferentiation itself (so packages can use it as a field type for algorithm configuration, etc.), and then provide bi-directional Base.convert methods for the AD types that do correspond (not all are supported by both "systems" at the moment) in the AbstractDifferentiation package extensions.
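For illustration, a minimal sketch of what one such pair of conversions might look like (using the ForwardDiff backends on both sides as an example; the concrete method set would depend on which backends actually correspond, and the real definitions would live in the package extensions):

import ADTypes
import AbstractDifferentiation as AD

# Sketch of bi-directional conversions for one matching pair of backends
# (ADTypes.AutoForwardDiff <-> AD.ForwardDiffBackend). This assumes the
# ForwardDiff-related functionality of AbstractDifferentiation is loaded.
Base.convert(::Type{AD.AbstractBackend}, ::ADTypes.AutoForwardDiff) =
    AD.ForwardDiffBackend()

Base.convert(::Type{ADTypes.AbstractADType}, ::AD.ForwardDiffBackend) =
    ADTypes.AutoForwardDiff()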

Would a PR in this direction be welcome?

CC @devmotion, @ChrisRackauckas, @Vaibhavdixit02

@devmotion (Member)

It's a bit unclear to me in which scenarios this integration would be helpful. Naively, it seems to me that if a package supports only ADTypes.jl, then even with such an extension it would generally not support AbstractDifferentiation.jl automatically as well. So at some point, some package or the user would have to perform these conversions manually anyway?

It's also a bit unclear to me whether such convert definitions should rather be defined in a weak extension of AbstractDifferentiation or of ADTypes. Maybe Base.convert(::Type{T}, ...) should be defined in the package (or an extension of the package) that owns T.

@oschulz (Author) commented Jun 5, 2023

It's a bit unclear to me in which scenarios this integration would be helpful. Naively, it seems to me that if a package supports only ADTypes.jl, then even with such an extension it would generally not support AbstractDifferentiation.jl automatically as well. So at some point, some package or the user would have to perform these conversions manually anyway?

It would, I think, be very helpful from the viewpoint of a third package that needs (e.g.) Optimization.jl (SciML) and also wants to do some of its own AD for other parts of its algorithms using AbstractDifferentiation.jl (since ADTypes.jl doesn't provide any functionality). This third package wants to give its users a choice regarding AD and not depend on specific AD implementations. Currently, the user would then have to specify a Tuple{ADTypes.AbstractADType, AbstractDifferentiation.AbstractBackend}, which certainly isn't convenient.

But if ADTypes.AbstractADType and AbstractDifferentiation.AbstractBackend can be used interchangeably, the third package can offer something like MyAlgorithm(some_args..., adbackend::ADBackendLike).
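For example (MyAlgorithm and its field are purely hypothetical, and this reuses the ADBackendLike union from the snippet above), the third package could store whichever selector the user passes:

# Hypothetical algorithm type accepting either kind of AD selector:
struct MyAlgorithm{B<:ADBackendLike}
    adbackend::B
end

# Users pass whatever they already have at hand, e.g.
# MyAlgorithm(ADTypes.AutoForwardDiff()) or
# MyAlgorithm(AbstractDifferentiation.ForwardDiffBackend()),
# and the package converts internally where needed.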

The goal here would be to avoid the growth of silos in the ecosystem (like we often see in the Python world). We can only take full advantage of multiple dispatch and make packages compose well if we use common types and functions for fundamental things - I would argue that AD falls into that category.

A lot of packages currently roll their own AD-backend wrappers (LogDensityProblemsAD.jl, for example, uses Val(ad_module_name)). And while they may of course sometimes have to specialize things for (maybe only some of) the AD backends using their own extensions, I do think it would be valuable to have an ecosystem-wide standard for saying "use this AD, please". AbstractDifferentiation.jl and ADTypes.jl seem to be the two contenders here, so why not make them compatible with each other?

It's also a bit unclear to me whether such convert definitions should rather be defined in a weak extension of AbstractDifferentiation or of ADTypes. Maybe Base.convert(::Type{T}, ...) should be defined in the package (or an extension of the package) that owns T.

I would argue that it should be one or the other: splitting the convert methods between the two packages depending on which argument comes first would make the whole thing harder to maintain. I would consider AbstractDifferentiation.jl the natural place to host this, because it already has the necessary package extensions. Adding it to ADTypes.jl would mean creating lots of additional extensions there, which I fear would not be accepted.

In a perfect world, we'd have one common abstract API for this, the AD packages would implement it, and we wouldn't need any "extension magic". But I don't think we'll get there anytime soon, so I'd like to create what cohesion we can without breaking anything or requiring major changes across packages (that may then never happen).

@mohamed82008 (Member)

A package that only hosts types with no functions seems odd to me. For these types to be useful, they need to be passed to functions which specialise their behaviour according to the input types. I think the path forward should be something like this:

  • Figure out why the ADTypes types have different contents from the AbstractDifferentiation types, and unify the contents. The two sets of types should be serving the similar purpose of specifying AD package choices and settings.
  • Make the ADTypes names aliases of AbstractDifferentiation types, or ideally get rid of ADTypes entirely and do breaking releases of the downstream packages.
  • Rely on AbstractDifferentiation functions where possible in downstream packages that depend on ADTypes, instead of defining their own functions for gradients, Jacobians, etc. If there is a good use case not covered by AbstractDifferentiation, it's fine to have a separate implementation and then open an issue in AbstractDifferentiation.jl to discuss adding the feature here if it's a common enough use case.

@oschulz (Author) commented Jun 5, 2023

A package that only hosts types with no functions seems odd to me. For these types to be useful, they need to be passed to functions which specialise their behaviour according to the input types.

From what I understand (I'm not affiliated with ADTypes, I come to this more from the user's point of view), ADTypes just defines the types, and then SciML packages like Optimization.jl each define internal implementations for them. So ADTypes is not usable by external packages directly - but if they want to use, e.g., Optimization.jl, they may still need those types.

I think the path forward should be something like this: [...]

I do agree with you, @mohamed82008. I'm just not sure on what time scale that can happen, so I thought a compatibility layer would be a useful first step.

@mohamed82008 (Member)

I'm just not sure on what time scale that can happen, so I thought a compatibility layer would be a useful first step.

If the intention is there, it shouldn't take that much time at all. If there is no intention, then it will likely never happen - in which case you can roll your own conversion functions in your package or in a third-party package which pirates both.

@oschulz (Author) commented Jun 5, 2023

If there is no intention [...] you can roll your own conversion functions in your package or in a third-party package which pirates both.

That's what I had hoped to avoid. :-)

@mohamed82008 (Member)

I think most of the above action items are for the ADTypes maintainers.

@ChrisRackauckas (Member)

AbstractDifferentiation.jl and ADTypes.jl seem to be the two contenders here, so why not make them compatible with each other?

That would be the long-term goal, but ultimately AbstractDifferentiation.jl needs a lot more work to be viable for that in most circumstances. It's just way too slow, with way too much overhead. It's also too constrained right now: if you look at, say, the functions needed for the MOI wrappers, it's missing most of them. I hope in the future we can use AbstractDifferentiation.jl for everything, but that's way too big of a lift right now - though I would support anyone who wants to try.

In the meantime, we needed a set way to talk about AD choices at a higher level, leaving the implementation to the package. At a high level, there's ForwardDiff, ReverseDiff, Enzyme, Zygote, etc. And within those, ReverseDiff has compiled mode and interpreted mode, Enzyme has forward and reverse, etc. SparseDiffTools, Optimization, OrdinaryDiffEq, etc. all need to evolve away from autodiff = true towards standardized multi-valued logic for AD choices with the standard options. Using Symbols, as Optim.jl does, is error-prone, as typos do not throw errors. Hence ADTypes.jl.
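To illustrate (a sketch based on my reading of the current ADTypes.jl API; exact keyword names may differ between versions), the backend choices and their options become dedicated types:

using ADTypes

# Dedicated types instead of `autodiff = true` or `:forward`-style Symbols;
# a misspelled backend name is an undefined type and errors immediately.
ad_forward  = AutoForwardDiff()                # ForwardDiff, default chunk size
ad_compiled = AutoReverseDiff(compile = true)  # ReverseDiff in compiled-tape mode
ad_enzyme   = AutoEnzyme()                     # Enzyme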

If someone would like to start on the path to unification, the first problem to solve is #41. Doing this even for just Jacobians would enable OrdinaryDiffEq, NonlinearSolve, and a whole lot of other packages to make use of AbstractDifferentiation.jl internally. However, without a way to define and initialize configs, the current setup with FiniteDiff.jl and ForwardDiff.jl is just so much faster in some cases (clocks in at like 10x) that it's simply not viable to adopt right now. So a caching interface is a first step.
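A minimal sketch of the pre-built-config pattern being referred to, using ForwardDiff.jl directly (the corresponding AbstractDifferentiation-level API is exactly what #41 would need to add):

using ForwardDiff

f(x) = x .^ 2 .+ sum(x)   # simple vector-valued test function
x = rand(10)
J = zeros(10, 10)

# Build the config (tag, chunk size, work buffers) once ...
cfg = ForwardDiff.JacobianConfig(f, x)

# ... and reuse it across many Jacobian calls, so it is not reallocated each time.
for _ in 1:1000
    ForwardDiff.jacobian!(J, f, x, cfg)
end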

But then the next step would be that AbstractDifferentiation would need to adopt the standard options of the AD libraries. For example, compiled mode on ReverseDiff.jl is used a lot in SciMLSensitivity.jl (in conjunction with a Cassette pass to prove its correctness), and so it cannot adopt AbstractDifferentiation for VJP code until all of these options are there.

If those pieces are there, Optimization.jl would probably still have to carry around some custom code for defining some weird MOI-specific functions until they get upstreamed, but we'd be in a much better spot. For now, though, it's a bit too much of a lift with no one to solve it, so we'll need to take a shortcut for a few years.

Also, I'm not convinced that in 3 years we will still need anything other than Enzyme, so investing too much time into solving this unification isn't something I would personally want to be spending my limited free time on.

So AbstractDifferentiation is a nice high-level API as is and is good to recommend to many individuals, but optimized libraries need something else until it either catches up to all of the tricks that can be done, or Enzyme just becomes the standard.

@mohamed82008 (Member)

It would be nice to open issues with specific examples where AD.jl introduces too much overhead or doesn't provide an API that's needed. I think I get the general idea but we need issues with MWEs and discussions to give new potential contributors a chance to improve the status quo. @oschulz are you willing to put in some time?

@ChrisRackauckas (Member)

I pointed to the issue and gave step-by-step instructions. It's rather straightforward: there's just no config type handling in AD.jl, whereas the ForwardDiff.jl/FiniteDiff.jl calls with a pre-built config don't allocate the config, and that can cut a good chunk of time out. Adding a caching interface to AD.jl that is able to carry forward options like chunk-size choices and tags probably wouldn't take more than a week, but someone still has to do it. So solving #41 means adding an init function, though that init may be function-specific until the tricks from PreallocationTools are added. It would be a starting point, though.
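As a rough sketch of what such an init-based interface could look like (all names here are hypothetical and not part of AD.jl; it just wraps a pre-built ForwardDiff config behind a function-specific preparation step):

using ForwardDiff

# Hypothetical preparation object bundling whatever a backend needs to reuse:
struct JacobianPrep{C}
    cfg::C
end

# Hypothetical init function: build the (function-specific) config once.
init_jacobian(f, x) = JacobianPrep(ForwardDiff.JacobianConfig(f, x))

# Hypothetical hot-loop call: reuse the prepared config instead of rebuilding it.
prepared_jacobian!(J, prep::JacobianPrep, f, x) =
    ForwardDiff.jacobian!(J, f, x, prep.cfg)

# Usage: prepare once, call many times.
f(x) = x .^ 2
x = rand(5)
J = zeros(5, 5)
prep = init_jacobian(f, x)
prepared_jacobian!(J, prep, f, x)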

@oschulz (Author) commented Jun 6, 2023

I think I get the general idea but we need issues with MWEs and discussions to give new potential contributors a chance to improve the status quo. @oschulz are you willing to put in some time?

I don't think I'm expert enough regarding the issues @ChrisRackauckas mentioned.

I do get that packages, or systems of packages, may need to use their own specialized AD implementation beyond the default backend bindings that AbstractDifferentiation provides. At the same time, the AbstractDifferentiation machinery can certainly be useful for many generic applications.

As for Enzyme (which I like very much), I hope it has a very bright future and may really become the default AD. But there may always be other ways of performing AD, so having a selection mechanism will probably still be useful in the future (my guess).

My goal here is making things easy for AD users, and for packages depending on packages that use AD, by adding a non-intrusive way that allows them to use either or both AbstractDifferentiation and ADTypes backend types to specify AD. My proposal above would have no measurable load-time impact on AbstractDifferentiation and ADTypes, would be usable right now, and could always go away if and when the two become one.

@oschulz (Author) commented Jun 27, 2023

@mohamed82008 : you can roll your own conversion functions in your package [...]

I went with your advice and made AutoDiffOperators.jl. It supports (so far only a few) AD "selectors" from both AbstractDifferentiation.jl and ADTypes.jl. Backend implementations are still very unoptimized, though. AutoDiffOperators is meant to be as general as AbstractDifferentiation and is geared towards use cases that mostly deal with flat real vectors and that do both JVP and VJP with the same function (though it also has a function for gradients).

@gdalle added the feature (New feature or request) label on Oct 5, 2023