Finite differences and autodiff - bake into *DifferentiableFunction? #329
We're constructing a more "intelligent" objective function here, and to me it makes sense (and provides more flexibility?) to let the DifferentiableFunction object decide how the objective and derivatives should be approximated. This is not only for derivative approximation, but also approximations of expensive functions, expected values etc. This might go against some algorithms in the literature though, where the approximation method is part of the solver, and potentially "dynamic". |
No, it can turn out to be less flexible, since many times the reason to turn off autodifferentiation is that you know your function cannot handle Julia's type system (say, your objective function is the loss between data and the solution to an ODE via Sundials). If the system tries to be too smart, it will error, meaning you'd have to place it in a try/catch block, which can hinder performance. So it's best left as a boolean flag. |
So you'd vote for something like |
That's exactly how I'm doing it in DiffEq-land. (Note that Calculus is moving the finite difference code to https://github.com/johnmyleswhite/FiniteDiff.jl , but the same sentiment applies). I don't think a dispatch system is needed here because a user can just provide a Jacobian etc. if they want. |
The recent ReverseDiff package is technically superior in many cases to ForwardDiff for computing gradients, but I wouldn't be comfortable making the choice automatic. So it wouldn't be unreasonable to have multiple options to |
And DiffEq-landia only tends to calculate Jacobians and not gradients, so that is a good point that for optimization routines you may want to use ReverseDiff instead, or at least give the option. |
Keyword with symbols it is then. If we want users to be able to change parameters in the respective approaches, we could create types with the relevant fields at a later point. Symbols for now. |
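For concreteness, a hypothetical sketch of the symbol-valued keyword being agreed on here; the symbol names and the constructor signature are illustrative, not a settled API:

```julia
# Hypothetical API sketch -- not the actual Optim interface.
d = DifferentiableFunction(f; autodiff = :finite)    # finite differences (Calculus/FiniteDiff)
d = DifferentiableFunction(f; autodiff = :forward)   # ForwardDiff
d = DifferentiableFunction(f; autodiff = :reverse)   # ReverseDiff
```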
I wish I had gotten around to posting this before folks started to do some work in this area - apologies for my slow response. This probably won't be a popular opinion, but I think the differentiation strategy shouldn't be baked into Optim at all. Thus, my vote is for making it fully opt-in: users construct their own derivative functions with whatever package they like and pass them in. If you wanted, Optim could even define its own differentiation wrappers around ForwardDiff, Calculus, ReverseDiff, etc. to present an easier, uniform API to Optim users. I just don't think it should be relegated to a single keyword argument, or even a set of keyword arguments. |
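A minimal sketch of the opt-in style being advocated, assuming Optim's `optimize(f, g!, x0, method)` entry point and a `g!(storage, x)` argument order (the mutating-gradient convention has changed across Optim versions, so check the docs for yours):

```julia
using Optim, ForwardDiff

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# The user builds the gradient with whatever tool they prefer; Optim just sees a g!.
g!(storage, x) = ForwardDiff.gradient!(storage, f, x)

result = optimize(f, g!, zeros(2), BFGS())
```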
I agree with @jrevels on this one. We did this because it was automatic and made things easier for naive users (especially because the finite differentiation stuff in Calculus.jl was laughably under-documented), but we should just educate them more. |
That's fine with me so long as finite differences is not exposed as an easier option than forward mode. |
I'd suggest that we try to make them comparably easy. To my knowledge, AD still doesn't play nice with Distributions.jl whenever it calls into RMath and that's going to continue to block using AD for some statistical work until all of Distributions.jl is pure Julia. |
No objection there, I just don't want a case where the user provides only |
When the objective function has anything inside of it which is not a pure Julia function. Lots of things like linear algebra (*, ) can show up in a general objective function, and forward mode autodifferentiation won't work in those cases. So yes, there are many cases where if |
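To make the failure mode concrete (an illustrative sketch, not from the original thread): any routine that only accepts Float64 — which is effectively what a ccall-based solver or BLAS wrapper is — stops ForwardDiff's Dual numbers at the method boundary.

```julia
using ForwardDiff

# Stand-in for a routine that is not pure Julia (e.g. a ccall into Sundials or BLAS):
# it only accepts Float64, so ForwardDiff's Dual numbers cannot be pushed through it.
external_solve(x::Vector{Float64}) = sum(abs2, x)

objective(x) = external_solve(x) + x[1]

ForwardDiff.gradient(objective, [1.0, 2.0])
# ERROR: MethodError: no method matching external_solve(::Vector{ForwardDiff.Dual{...}})
```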
Okay... it's a minor detail, but one of the points was to have finite differences actually count fcalls correctly. But, this can just be seen as a documentation issue. I was actually beginning to think that it was wrong to count fcalls and gcalls when a gcall was from a finite difference gradient. Better to just document what a gcall counter represents there. I don't like an AD default for the reasons Chris mentions. Functions passed to optim often have all sorts of distributions and blas calls in them. We can discuss if it's appropriate to use in all cases, but it is the only "Just Works" option we have imo. Then I'd rather have it completely opt in as @jrevels suggests. |
What are you proposing, specifically? With FD, the ideal situation IMHO would be to have gcalls count each call to gradient!, and have fcalls record each call of f that the FD approximation is taking. This gives more information about where the f-calls are coming from (gradient approximation vs. other uses of f). |
Well, in #337 I've done the following: if g! is constructed using finite differences, then a call to g! will increment f_calls twice (central differencing) and g_calls once. However, I don't think that is as transparent as I had initially thought it would be. If you want the "pure" f_calls (i.e. calls to f) then you have to compute f_calls - 2*g_calls. Why not just record an f call as an f_call and a g! call as a g_call? If we make "automatic differentiation" (including finite differencing) opt-in, then it's simply a matter of telling the users: if you use finite differences then remember what a g_call represents. |
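A worked illustration of that bookkeeping (not part of the thread), using the counting rule described in the comment above:

```julia
# Counting rule from the comment above: each finite-difference g! call bumps
# f_calls by 2 (central differencing) and g_calls by 1.
direct_f_calls = 5          # calls made to f outside the gradient approximation
g_calls        = 10         # calls made to the finite-difference g!

f_calls = direct_f_calls + 2 * g_calls    # 25 -- what the counter would report
pure_f_calls = f_calls - 2 * g_calls      # 5  -- recovering the direct calls
```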
Okay. That is probably a better approach, as we cannot really know how the g! is calculated in general, whilst the user should have that information. |
That is what I've come to think based on my work on the other PR and this discussion, yes. |
Just FYI, ForwardDiff should work for most functions that have a generically typed, pure Julia implementation (including a lot of the linear algebra functions like On the |
That's still pretty limited. A ForwardDiff default would still error when using most of the direct linear solvers in Base, most things with sparse matrices, FFTs, matrix norms (unless you add the package GenericSVD.jl), Sundials/ODEInterface solvers, etc. inside of objective functions. If you try to use another optimization call, or most simulation packages (I know DifferentialEquations.jl's OrdinaryDiffEq.jl works, but that took a lot of work), or other nontrivial functionality inside of an objective function, it will error. In theory non-strict typing fixes all of this, but in practice not every control systems library has been written to be type-generic enough to handle this (and it's actually not too easy in some cases because "mixing" can occur in large algorithms). If the errors were manged better ("Function call is incompatible with autodifferentiation. Please try with |
+1 to this approach.
But this is an ill-defined subset of Julia. Try the examples below to see how frustrating it could be to find that ForwardDiff seems to work "at random" when you don't know the internals of Distributions.jl:
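The original examples did not survive in this transcript; as a stand-in, here is the kind of inconsistency being described, assuming a Distributions.jl version where some methods are pure Julia and others bottom out in RMath/C calls:

```julia
using Distributions, ForwardDiff

# logpdf of a Normal is implemented in pure, generically-typed Julia,
# so differentiating through it works:
ForwardDiff.derivative(x -> logpdf(Normal(0.0, 1.0), x), 0.3)

# Other methods may call into C (RMath) depending on the Distributions.jl version,
# in which case the same pattern fails with a MethodError on Dual numbers:
ForwardDiff.derivative(x -> cdf(Normal(0.0, 1.0), x), 0.3)
```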
There's no defaults that will satisfy everyone. Let's not have any defaults rather than adopt a default that wouldn't be useful to people like me. |
I don't think anyone is arguing for autodifferentiation to be the default. All I'm arguing for is for finite differences not to be a magical option where you can just provide |
I think we can all get behind that. |
Yes, my comment was just correcting the claim that ForwardDiff didn't support generic linear algebra functions like |
I'm not sure about this. I think the ability to brain-dead throw using Optim
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), BFGS()) It's nice that things like this just work. Of course, there are differentiation errors, and it's not as fast as possible, and all sorts of things, but I think that the extras (which I would include autodifferentiation in) should be opt-in. Or it should be extremely easy to say "just use finite differencing because I really don't care". |
Would it be stupid to simply maintain a few selected easy constructors? |
I feel like a performant and useful "easy" API in Optim would be close enough to the existing APIs in the various differentiation packages that it wouldn't be worth it. I think it would be better to show some examples using these packages than to introduce a new API to users. The "naive" differentiation methods provided by FiniteDiff.jl, ForwardDiff.jl, and ReverseDiff.jl, etc. aren't that much more complex than the current keyword approach, are more transparent, and far more flexible if the user wants to dig in. Note that these "naive" APIs are also all pretty similar by design (at least, for the
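For illustration (not from the original thread), the gradient entry points of these packages do look very similar; the calls below follow each package's documented API, so check your versions:

```julia
using ForwardDiff, ReverseDiff, Calculus

f(x) = sum(abs2, x) + x[1] * x[2]
x   = rand(3)
out = similar(x)

ForwardDiff.gradient!(out, f, x)   # forward-mode AD, in-place
ReverseDiff.gradient!(out, f, x)   # reverse-mode AD, in-place
grad = Calculus.gradient(f, x)     # finite differences, returns a new vector
```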
Branching on a |
Only if the optimization problem is quick, like microseconds quick. If it's any nontrivial optimization problem, the few dynamic dispatches that are hit before entering the main function are meaningless to the overall time. |
I agree that it is a bit wasteful to create a wrapper around these already simple AD packages. Examples sound easier to maintain, and may help spread the idea of AD's power and simplicity to a larger userbase (although the usefulness of initialising a *Config can be lost on many newcomers). |
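On the *Config point, a small sketch of what is meant (illustrative; the exact constructor has varied across ForwardDiff releases):

```julia
using ForwardDiff

f(x) = sum(abs2, x)
x   = rand(10)
out = similar(x)

# Preallocate the Dual work buffers once...
cfg = ForwardDiff.GradientConfig(f, x)

# ...and reuse them on every gradient evaluation instead of reallocating.
ForwardDiff.gradient!(out, f, x, cfg)
```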
I was having a first go at fixing a lot of the issues surrounding (coming up with an improved design for) *DifferentiableFunctions, and I realized that maybe we want to remove the `autodiff` keyword from `Optim.Options`. Currently you can use finite differences via `DifferentiableFunction(f)`, but you have to pass a function and use `Optim.Options(..., autodiff = true, ...)` to use ForwardDiff. Wouldn't it make more sense to have something like `DifferentiableFunction(f; autodiff = false)`? Going one step further, we might want to make an even more general interface so you could pass functions from any autodiff package (you might need to do some anonymous function magic to conform to the setup).

Edit: of course it could just as well go the other way: move the finite difference decision to `Optim.Options` if that is easier for people.