
Finite differences and autodiff - bake into *DifferentiableFunction? #329

Closed
pkofod opened this issue Dec 26, 2016 · 29 comments · Fixed by #337

@pkofod
Member

pkofod commented Dec 26, 2016

I was having a first go at fixing a lot of the issues surrounding *DifferentiableFunctions (coming up with an improved design for them), and I realized that maybe we want to remove the autodiff keyword from Optim.Options. Currently you can use finite differences via DifferentiableFunction(f), but you have to pass a function and use Optim.Options(..., autodiff=true, ...) to use ForwardDiff. Wouldn't it make more sense to have something like DifferentiableFunction(f; autodiff = false)? Going one step further, we might want to make an even more general interface so you could pass functions from any autodiff package (you might need to do some anonymous-function magic to conform to the setup).

Edit: of course it could just as well go the other way: move the finite difference decision to Optim.Options if that is easier for people.
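A minimal sketch of what such a constructor could look like (hypothetical, not Optim's actual API; the type name, fields, and autodiff keyword are made up for illustration):

using Calculus, ForwardDiff

struct MyDifferentiableFunction{F,G}
    f::F    # objective
    g!::G   # in-place gradient: g!(storage, x)
end

function MyDifferentiableFunction(f; autodiff::Bool = false)
    g! = if autodiff
        (storage, x) -> ForwardDiff.gradient!(storage, f, x)
    else
        (storage, x) -> copyto!(storage, Calculus.gradient(f, x))
    end
    return MyDifferentiableFunction(f, g!)
end

# d = MyDifferentiableFunction(x -> sum(abs2, x); autodiff = true)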

@anriseth
Contributor

We're constructing a more "intelligent" objective function here, and to me it makes sense (and provides more flexibility?) to let the DifferentiableFunction object decide how the objective and derivatives should be approximated. This is not only for derivative approximation, but also for approximations of expensive functions, expected values, etc. This might go against some algorithms in the literature though, where the approximation method is part of the solver, and potentially "dynamic".

@ChrisRackauckas
Contributor

We're constructing a more "intelligent" objective function here, and to me it makes sense (and provides more flexibility?) to let the DifferentiableFunction object decide how the objective and derivatives should be approximated.

No, it can turn out to be less flexible since many times the reason to turn off autodifferentiation is because you know that your function cannot handle Julia's type system (say, your objective function is the loss between data and the solution to an ODE via Sundials). If the system tries to be too smart, it will error, meaning you'd have to place it in a try/catch block which can hinder performance. So it's best left as a boolean flag.

@pkofod
Member Author

pkofod commented Jan 7, 2017

No, it can turn out to be less flexible since many times the reason to turn off autodifferentiation is because you know that your function cannot handle Julia's type system (say, your objective function is the loss between data and the solution to an ODE via Sundials). If the system tries to be too smart, it will error, meaning you'd have to place it in a try/catch block which can hinder performance. So it's best left as a boolean flag.

So you'd vote for something like autodiff = false? Right now we only allow either finite differences through Calculus or automatic differentiation through ForwardDiff. To me those are quite standard for what they do, and if someone wants to use something else, I think it makes sense to say: look, we provide these convenience keywords, but if you want something else, by all means construct your gradient that way and pass it yourself. Beyond that we would either have to manage a lot of symbols, or create a system based on dispatch (could be done).
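For example (a sketch; the in-place gradient's argument order and the exact optimize signature depend on the Optim version), a user who wants some other backend can build the gradient themselves and pass it in explicitly:

using Optim, ReverseDiff

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Gradient built with an AD package of the user's choosing.
g!(storage, x) = ReverseDiff.gradient!(storage, f, x)

optimize(f, g!, zeros(2), BFGS())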

@ChrisRackauckas
Contributor

ChrisRackauckas commented Jan 7, 2017

Right now we only allow either finite differences through Calculus or automatic differentiation through ForwardDiff. To me those are quite standard for what they do, and if someone wants to use something else, I think it makes sense to say: look, we provide these convenience keywords, but if you want something else, by all means construct your gradient that way and pass it yourself. Beyond that we would either have to manage a lot of symbols, or create a system based on dispatch (could be done).

That's exactly how I'm doing it in DiffEq-land. (Note that Calculus is moving the finite difference code to https://github.com/johnmyleswhite/FiniteDiff.jl, but the same sentiment applies). I don't think a dispatch system is needed here because a user can just provide a Jacobian etc. if they want.

@mlubin
Contributor

mlubin commented Jan 7, 2017

The recent ReverseDiff package is technically superior in many cases to ForwardDiff for computing gradients, but I wouldn't be comfortable making the choice automatic. So it wouldn't be unreasonable to have multiple options for autodiff= in the near future.
CC @jrevels

@ChrisRackauckas
Contributor

The recent ReverseDiff package is technically superior in many cases to ForwardDiff for computing gradients

And DiffEq-landia only tends to calculate Jacobians and not gradients, so that's a good point: for optimization routines you may want to use ReverseDiff instead, or at least offer the option.

@pkofod
Member Author

pkofod commented Jan 7, 2017

Keyword with symbols it is then. If we want users to be able to change parameters in the respective approaches, we could create types with the relevant fields at a later point. Symbols for now.

@pkofod pkofod self-assigned this Jan 21, 2017
@jrevels
Contributor

jrevels commented Feb 2, 2017

I wish I had gotten around to posting this before folks started to do some work in this area - apologies for my slow response.

This probably won't be a popular opinion, but I think the autodiff-keyword style API is fundamentally the wrong way to go for packages which differentiate user-provided functions. It implies the callee has more knowledge than the caller about the objective function, which obviously isn't the case.

Thus, my vote is for the autodiff-keyword API to be totally deprecated in favor of the existing "pass us your derivative functions" API. Instead of telling users to type autodiff=:symbol, we should focus on making the various differentiation options clear, available, and easy to use. This encourages users to make an informed decision, whereas the current approach encourages them to just use autodiff=true and forget about it.

If you wanted, Optim could even define its own differentiation wrappers around ForwardDiff, Calculus, ReverseDiff, etc. to present an easier, uniform API to Optim users. I just don't think it should be relegated to a single keyword argument, or even a set of keyword arguments.
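For reference, the per-package "naive" gradient calls being discussed look roughly like this (a sketch; exact names can vary across package versions):

using ForwardDiff, ReverseDiff, Calculus

f(x) = sum(abs2, x)
x = rand(3)

ForwardDiff.gradient(f, x)   # forward-mode AD
ReverseDiff.gradient(f, x)   # reverse-mode AD
Calculus.gradient(f, x)      # central finite differences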

@johnmyleswhite
Contributor

I agree with @jrevels on this one. We did this because it was automatic and made things easier for naive users (especially because the finite differentiation stuff in Calculus.jl was laughably under-documented), but we should just educate them more.

@mlubin
Contributor

mlubin commented Feb 2, 2017

That's fine with me so long as finite differences is not exposed as an easier option than forward mode.

@johnmyleswhite
Contributor

That's fine with me so long as finite differences is not exposed as an easier option than forward mode.

I'd suggest that we try to make them comparably easy. To my knowledge, AD still doesn't play nice with Distributions.jl whenever it calls into RMath and that's going to continue to block using AD for some statistical work until all of Distributions.jl is pure Julia.

@mlubin
Contributor

mlubin commented Feb 3, 2017

I'd suggest that we try to make them comparably easy.

No objection there, I just don't want a case where the user provides only f and by default it would use finite differences, as it is now I believe.

@ChrisRackauckas
Contributor

No objection there, I just don't want a case where the user provides only f and by default it would use finite differences, as it is now I believe.

Autodiff breaks when the objective function has anything inside of it which is not a pure Julia function. Lots of things like linear algebra (*, \) can show up in a general objective function, and forward-mode autodifferentiation won't work in those cases. So yes, there are many cases where, if autodiff=true is the default, the user will just get an error (and if someone is not familiar with autodifferentiation, it may not be immediately obvious that that's the problem). That's why I would recommend not having it be the default.

@pkofod
Member Author

pkofod commented Feb 3, 2017

Okay... it's a minor detail, but one of the points was to have finite differences actually count fcalls correctly. But, this can just be seen as a documentation issue. I was actually beginning to think that it was wrong to count fcalls and gcalls when a gcall was from a finite difference gradient. Better to just document what a gcall counter represents there.

I don't like an AD default for the reasons Chris mentions. Functions passed to Optim often have all sorts of distributions and BLAS calls in them. We can discuss whether finite differencing is appropriate in all cases, but it is the only "Just Works" option we have, imo.

Then I'd rather have it completely opt in as @jrevels suggests.

@anriseth anriseth mentioned this issue Feb 3, 2017
@anriseth
Contributor

anriseth commented Feb 3, 2017

Okay... it's a minor detail, but one of the points was to have finite differences actually count fcalls correctly. But, this can just be seen as a documentation issue. I was actually beginning to think that it was wrong to count fcalls and gcalls when a gcall was from a finite difference gradient. Better to just document what a gcall counter represents there

What are you proposing, specifically? With FD, the ideal situation IMHO would be to have gcalls count each call to gradient!, and have fcalls record each call to f that the FD approximation makes. This gives more information about where the f-calls are coming from (gradient approximation vs. other uses of f).

@pkofod
Member Author

pkofod commented Feb 3, 2017

What are you proposing, specifically? With FD, the ideal situation IMHO would be to have gcalls count each call to gradient!, and have fcalls record each call to f that the FD approximation makes. This gives more information about where the f-calls are coming from (gradient approximation vs. other uses of f).

Well, in #337 I've done the following:

If g! is constructed using finite differences, then a call to g! will increment f_calls twice (central differencing) and g_calls once. However, I don't think that is as transparent as I had initially thought it would be. If you want the "pure" f_calls (i.e. calls to f), then you have to compute f_calls - 2*g_calls.

Why not just record an f call as an f_call and a g! call as a g_call (as we actually currently do)? I think it's much more transparent that an f_call increment represents a call to value! and a g_call represents a grad! call. If we make "automatic differentiation" (including finite differencing) opt-in, then it's simply a matter of telling the users: if you use finite differences then remember what a g_call represents.

@anriseth
Contributor

anriseth commented Feb 3, 2017 via email

@pkofod
Member Author

pkofod commented Feb 3, 2017

Okay. That is probably a better approach, as we cannot really know how the g! is calculated in general, whilst the user should have that information.

That is what I've come to think based on my work on the other PR and this discussion, yes.

@jrevels
Contributor

jrevels commented Feb 3, 2017

Lots of things like linear algebra (*, \) can show up in a general objective function, and forward-mode autodifferentiation won't work in those cases.

Just FYI, ForwardDiff should work for most functions that have a generically typed, pure Julia implementation (including a lot of the linear algebra functions like *).

On the g_call/f_call discussion, I'm woefully out of my depth, but I'm assuming you're just counting the number of calls Optim makes to the oracle functions provided by the user? Why not just keep track of that by closing over some counters before calling the function? That way, you can (for example) count calls only to the objective directly via Optim and not accidentally count objective calls that happen inside of the derivative functions (i.e. all the calls would be properly separated).
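A minimal sketch of that counter-closure idea (names are hypothetical):

# Wrap the user's f so only calls made through this wrapper are counted;
# calls to f made internally by e.g. a finite-difference gradient are not.
function counted(f)
    n = Ref(0)
    wrapped(x) = (n[] += 1; f(x))
    return wrapped, n
end

f(x) = sum(abs2, x)
fc, f_calls = counted(f)
fc([1.0, 2.0]); fc([3.0, 4.0])
f_calls[]    # 2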

@ChrisRackauckas
Contributor

Just FYI, ForwardDiff should work for most functions that have a generically typed, pure Julia implementation (including a lot of the linear algebra functions like *).

That's still pretty limited. A ForwardDiff default would still error when using most of the direct linear solvers in Base, most things with sparse matrices, FFTs, matrix norms (unless you add the package GenericSVD.jl), Sundials/ODEInterface solvers, etc. inside of objective functions. If you try to use another optimization call, or most simulation packages (I know DifferentialEquations.jl's OrdinaryDiffEq.jl works, but that took a lot of work), or other nontrivial functionality inside of an objective function, it will error. In theory non-strict typing fixes all of this, but in practice not every control systems library has been written to be type-generic enough to handle this (and it's actually not too easy in some cases because "mixing" can occur in large algorithms).

If the errors were managed better ("Function call is incompatible with autodifferentiation. Please try with autodiff=false") then maybe it would be just fine as a default. However, instead the default error you will get in any case like this is a conversion error, usually saying it cannot convert Dual{Float64,N} to Float64. I know what that means, so I know it means "this is incompatible with autodifferentiation", but someone who just comes to Julia and wants to optimize something which has an FFT will be hit with a weird error that's not due to Optim.jl itself, and thus is quite hard to diagnose. Try/catch and throwing a different error message isn't a good idea either, because that has quite a bit of overhead (I think?). That's why I prefer autodifferentiation as an opt-in instead of a default, and even better, having a very simple flag to opt in with. As more and more things get native Julia implementations which rival or surpass their C/Fortran equivalents this won't be an issue, but there are still many places where interop is used (and many times it's hidden) when doing scientific computing.

@johnmyleswhite
Contributor

If we make "automatic differentiation" (including finite differencing) opt-in, then it's simply a matter of telling the users: if you use finite differences then remember what a g_call represents.

+1 to this approach.

Just FYI, ForwardDiff should work for most functions that have a generically typed, pure Julia implementation (including a lot of the linear algebra functions like *).

But this is an ill-defined subset of Julia. Try the examples below to see how frustrating it could be to find that ForwardDiff seems to work "at random" when you don't know the internals of Distributions.jl:

using Distributions, ForwardDiff

xs = rand(100)

f = params -> prod([logpdf(Beta(params[1], params[2]), x) for x in xs])
g = params -> prod([logpdf(Normal(params[1], params[2]), x) for x in xs])

ForwardDiff.gradient(f, [0.5, 0.5])
ForwardDiff.gradient(g, [0.5, 0.5])

There are no defaults that will satisfy everyone. I'd rather have no defaults at all than adopt a default that wouldn't be useful to people like me.

@mlubin
Contributor

mlubin commented Feb 3, 2017

I don't think anyone is arguing for autodifferentiation to be the default. All I'm arguing for is for finite differences not to be a magical option where you can just provide f and forget about it. It should be equally easy to use finite differences and ForwardDiff, and we should force people to make the choice.

@johnmyleswhite
Contributor

I think we can all get behind that.

@jrevels
Contributor

jrevels commented Feb 3, 2017

Yes, my comment was just correcting the claim that ForwardDiff didn't support generic linear algebra functions like *, not arguing for ForwardDiff to be a default. As I said before, I don't think there should be any defaults.

@ChrisRackauckas
Contributor

ChrisRackauckas commented Feb 3, 2017

I don't think anyone is arguing for autodifferentiation to be the default. All I'm arguing for is for finite differences not to be a magical option where you can just provide f and forget about it.

I'm not sure about this. I think the ability to brain-dead throw f at an optimization package and get a sensible result is what is wanted in the vast majority of cases. Low barrier to entry with a high ceiling.

using Optim
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), BFGS())

It's nice that things like this just work. Of course, there are differentiation errors, and it's not as fast as possible, and all sorts of things, but I think that the extras (among which I would include autodifferentiation) should be opt-in. Or it should be extremely easy to say "just use finite differencing because I really don't care".

@pkofod
Member Author

pkofod commented Feb 6, 2017

Would it be stupid to simply maintain a few selected easy constructors? autodiff(f, :finitediff) (potentially unexported) for example?

@jrevels
Contributor

jrevels commented Feb 6, 2017

Would it be stupid to simply maintain a few selected easy constructors?

I feel like a performant and useful "easy" API in Optim would be close enough to the existing APIs in the various differentiation packages that it wouldn't be worth it. I think it would be better to show some examples using these packages than to introduce a new API to users.

The "naive" differentiation methods provided by FiniteDiff.jl, ForwardDiff.jl, and ReverseDiff.jl, etc. aren't that much more complex than the current keyword approach, are more transparent, and far more flexible if the user wants to dig in.

Note that these "naive" APIs are also all pretty similar by design (at least, for the *Diff.jl packages). My goal for the future is that these interfaces become further standardized, and achieve better feature parity where it's sensible.

autodiff(f, :finitediff) (potentially unexported) for example

Branching on a Symbol value to decide which differentiation technique to use is probably a bad route performance-wise compared to using dispatch (as long as everything is type-inferred).
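A sketch of the dispatch-based selection being suggested (the backend types here are hypothetical, not Optim's API):

using ForwardDiff, Calculus

abstract type DiffBackend end
struct ForwardADBackend  <: DiffBackend end
struct FiniteDiffBackend <: DiffBackend end

# Dispatch picks the backend at compile time, so the choice is type-stable.
make_gradient(f, ::ForwardADBackend)  = (out, x) -> ForwardDiff.gradient!(out, f, x)
make_gradient(f, ::FiniteDiffBackend) = (out, x) -> copyto!(out, Calculus.gradient(f, x))

g! = make_gradient(x -> sum(abs2, x), ForwardADBackend())
out = zeros(2); g!(out, [1.0, 2.0])   # out == [2.0, 4.0]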

@ChrisRackauckas
Contributor

Branching on a Symbol value to decide which differentiation technique to use is probably a bad route performance-wise compared to using dispatch (as long as everything is type-inferred).

Only if the optimization problem is quick, like microseconds quick. If it's any nontrivial optimization problem, the few dynamic dispatches that are hit before entering the main function are meaningless to the overall time.

@anriseth
Contributor

anriseth commented Feb 6, 2017

I feel like a performant and useful "easy" API in Optim would be close enough to the existing APIs in the various differentiation packages that it wouldn't be worth it. I think it would be better to show some examples using these packages than to introduce a new API to users.

I agree that it is a bit wasteful to create a wrapper around these already simple AD packages. Examples sound easier to maintain, and may help spread the idea of AD's power and simplicity to a larger userbase (although the usefulness of initialising a *Config can be lost on many newcomers).
