
Finite differences and autodiff - bake into *DifferentiableFunction? #329

Closed
pkofod opened this issue Dec 26, 2016 · 29 comments · Fixed by #337

@pkofod
Member

pkofod commented Dec 26, 2016

I was having a first go at fixing a lot of the issues surrounding *DifferentiableFunctions (coming up with an improved design for them), and I realized that maybe we want to remove the autodiff keyword from Optim.Options. Currently you can use finite differences via DifferentiableFunction(f), but you have to pass a function and use Optim.Options(..., autodiff=true, ...) to use ForwardDiff. Wouldn't it make more sense to have something like DifferentiableFunction(f; autodiff = false)? Going one step further, we might want to make an even more general interface so you could pass functions from any autodiff package (you might need to do some anonymous-function magic to conform to the setup).

Edit: of course it could just as well go the other way: move the finite difference decision to Optim.Options if that is easier for people.
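A minimal sketch of what such a constructor could look like (hypothetical, not Optim's actual API; the type name, fields, and autodiff keyword are made up for illustration):

using Calculus, ForwardDiff

struct MyDifferentiableFunction{F,G}
    f::F    # objective
    g!::G   # in-place gradient: g!(storage, x)
end

function MyDifferentiableFunction(f; autodiff::Bool = false)
    g! = if autodiff
        (storage, x) -> ForwardDiff.gradient!(storage, f, x)
    else
        (storage, x) -> copyto!(storage, Calculus.gradient(f, x))
    end
    return MyDifferentiableFunction(f, g!)
end

# d = MyDifferentiableFunction(x -> sum(abs2, x); autodiff = true)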

@anriseth
Contributor

We're constructing a more "intelligent" objective function here, and to me it makes sense (and provides more flexibility?) to let the DifferentiableFunction object decide how the objective and derivatives should be approximated. This is not only for derivative approximation, but also for approximations of expensive functions, expected values, etc. This might go against some algorithms in the literature though, where the approximation method is part of the solver, and potentially "dynamic".

@ChrisRackauckas
Contributor

We're constructing a more "intelligent" objective function here, and to me it makes sense (and provides more flexibility?) to let the DifferentiableFunction object decide how the objective and derivatives should be approximated.

No, it can turn out to be less flexible since many times the reason to turn off autodifferentiation is because you know that your function cannot handle Julia's type system (say, your objective function is the loss between data and the solution to an ODE via Sundials). If the system tries to be too smart, it will error, meaning you'd have to place it in a try/catch block which can hinder performance. So it's best left as a boolean flag.

@pkofod
Member Author

pkofod commented Jan 7, 2017

No, it can turn out to be less flexible since many times the reason to turn off autodifferentiation is because you know that your function cannot handle Julia's type system (say, your objective function is the loss between data and the solution to an ODE via Sundials). If the system tries to be too smart, it will error, meaning you'd have to place it in a try/catch block which can hinder performance. So it's best left as a boolean flag.

So you'd vote for something like autodiff = false? Right now we only allow either finite differences through Calculus or automatic differentiation through ForwardDiff. To me those are quite standard for what they do, and if someone wants to use something else, I think it makes sense to say: look, we provide these convenience keywords, but if you want something else, by all means construct your gradient that way and pass it yourself. Beyond that we would either have to manage a lot of symbols, or create a system based on dispatch (could be done).
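For example (a sketch; the in-place gradient's argument order and the exact optimize signature depend on the Optim version), a user who wants some other backend can build the gradient themselves and pass it in explicitly:

using Optim, ReverseDiff

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Gradient built with an AD package of the user's choosing.
g!(storage, x) = ReverseDiff.gradient!(storage, f, x)

optimize(f, g!, zeros(2), BFGS())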

@ChrisRackauckas
Contributor

ChrisRackauckas commented Jan 7, 2017

Right now we only allow either finite differences through Calculus or automatic differentiation through ForwardDiff. To me those are quite standard for what they do, and if someone wants to use something else, I think it makes sense to say: look, we provide these convenience keywords, but if you want something else, by all means construct your gradient that way and pass it yourself. Beyond that we would either have to manage a lot of symbols, or create a system based on dispatch (could be done).

That's exactly how I'm doing it in DiffEq-land. (Note that Calculus is moving the finite difference code to https://github.com/johnmyleswhite/FiniteDiff.jl, but the same sentiment applies). I don't think a dispatch system is needed here because a user can just provide a Jacobian etc. if they want.

@mlubin
Contributor

mlubin commented Jan 7, 2017

The recent ReverseDiff package is technically superior in many cases to ForwardDiff for computing gradients, but I wouldn't be comfortable making the choice automatic. So it wouldn't be unreasonable to have multiple options for autodiff= in the near future.
CC @jrevels

@ChrisRackauckas
Contributor

The recent ReverseDiff package is technically superior in many cases to ForwardDiff for computing gradients

And DiffEq-landia only tends to calculate Jacobians and not gradients, so that's a good point: for optimization routines you may want to use ReverseDiff instead, or at least offer the option.

@pkofod
Member Author

pkofod commented Jan 7, 2017

Keyword with symbols it is then. If we want users to be able to change parameters in the respective approaches, we could create types with the relevant fields at a later point. Symbols for now.

@pkofod pkofod self-assigned this Jan 21, 2017
@jrevels
Contributor

jrevels commented Feb 2, 2017

I wish I had gotten around to posting this before folks started to do some work in this area - apologies for my slow response.

This probably won't be a popular opinion, but I think the autodiff-keyword style API is fundamentally the wrong way to go for packages which differentiate user-provided functions. It implies the callee has more knowledge than the caller about the objective function, which obviously isn't the case.

Thus, my vote is for the autodiff-keyword API to be totally deprecated in favor of the existing "pass us your derivative functions" API. Instead of telling users to type autodiff=:symbol, we should focus on making the various differentiation options clear, available, and easy to use. This encourages users to make an informed decision, whereas the current approach encourages them to just use autodiff=true and forget about it.

If you wanted, Optim could even define its own differentiation wrappers around ForwardDiff, Calculus, ReverseDiff, etc. to present an easier, uniform API to Optim users. I just don't think it should be relegated to a single keyword argument, or even a set of keyword arguments.
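For reference, the per-package "naive" gradient calls being discussed look roughly like this (a sketch; exact names can vary across package versions):

using ForwardDiff, ReverseDiff, Calculus

f(x) = sum(abs2, x)
x = rand(3)

ForwardDiff.gradient(f, x)   # forward-mode AD
ReverseDiff.gradient(f, x)   # reverse-mode AD
Calculus.gradient(f, x)      # central finite differences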

@johnmyleswhite
Contributor

I agree with @jrevels on this one. We did this because it was automatic and made things easier for naive users (especially because the finite differentiation stuff in Calculus.jl was laughably under-documented), but we should just educate them more.

@mlubin
Contributor

mlubin commented Feb 2, 2017

That's fine with me so long as finite differences is not exposed as an easier option than forward mode.

@johnmyleswhite
Contributor

That's fine with me so long as finite differences is not exposed as an easier option than forward mode.

I'd suggest that we try to make them comparably easy. To my knowledge, AD still doesn't play nice with Distributions.jl whenever it calls into RMath and that's going to continue to block using AD for some statistical work until all of Distributions.jl is pure Julia.

@mlubin
Contributor

mlubin commented Feb 3, 2017

I'd suggest that we try to make them comparably easy.

No objection there, I just don't want a case where the user provides only f and by default it would use finite differences, as it is now I believe.

@ChrisRackauckas
Contributor

No objection there, I just don't want a case where the user provides only f and by default it would use finite differences, as it is now I believe.

Autodiff breaks when the objective function has anything inside of it which is not a pure Julia function. Lots of things like linear algebra (*, \) can show up in a general objective function, and forward-mode autodifferentiation won't work in those cases. So yes, there are many cases where, if autodiff=true is the default, the user will just get an error (and if someone is not familiar with autodifferentiation, it may not be immediately obvious that that's the problem). That's why I would recommend not having it be the default.

@pkofod
Member Author

pkofod commented Feb 3, 2017

Okay... it's a minor detail, but one of the points was to have finite differences actually count fcalls correctly. But, this can just be seen as a documentation issue. I was actually beginning to think that it was wrong to count fcalls and gcalls when a gcall was from a finite difference gradient. Better to just document what a gcall counter represents there.

I don't like an AD default for the reasons Chris mentions. Functions passed to Optim often have all sorts of distributions and BLAS calls in them. We can discuss whether finite differencing is appropriate in all cases, but it is the only "Just Works" option we have, imo.

Then I'd rather have it completely opt in as @jrevels suggests.

@anriseth anriseth mentioned this issue Feb 3, 2017
@anriseth
Contributor

anriseth commented Feb 3, 2017

Okay... it's a minor detail, but one of the points was to have finite differences actually count fcalls correctly. But, this can just be seen as a documentation issue. I was actually beginning to think that it was wrong to count fcalls and gcalls when a gcall was from a finite difference gradient. Better to just document what a gcall counter represents there

What are you proposing, specifically? With FD, the ideal situation IMHO would be to have gcalls count each call to gradient!, and have fcalls record each call to f that the FD approximation makes. This gives more information about where the f-calls are coming from (gradient approximation vs. other uses of f).

@pkofod
Member Author

pkofod commented Feb 3, 2017

What are you proposing, specifically? With FD, the ideal situation IMHO would be to have gcalls count each call to gradient!, and have fcalls record each call to f that the FD approximation makes. This gives more information about where the f-calls are coming from (gradient approximation vs. other uses of f).

Well, in #337 I've done the following:

If g! is constructed using finite differences, then a call to g! will increment f_calls twice (central differencing) and g_calls once. However, I don't think that is as transparent as I had initially thought it would be. If you want the "pure" f_calls (i.e. calls to f), then you have to compute f_calls - 2*g_calls.

Why not just record an f call as an f_call and a g! call as a g_call (as we actually currently do)? I think it's much more transparent that an f_call increment represents a call to value! and a g_call represents a grad! call. If we make "automatic differentiation" (including finite differencing) opt-in, then it's simply a matter of telling the users: if you use finite differences then remember what a g_call represents.

@anriseth
Contributor

anriseth commented Feb 3, 2017 via email

@pkofod
Member Author

pkofod commented Feb 3, 2017

Okay. That is probably a better approach, as we cannot really know how the g! is calculated in general, whilst the user should have that information.

That is what I've come to think based on my work on the other PR and this discussion, yes.

@jrevels
Contributor

jrevels commented Feb 3, 2017

Lots of things like linear algebra (*, \) can show up in a general objective function, and forward-mode autodifferentiation won't work in those cases.

Just FYI, ForwardDiff should work for most functions that have a generically typed, pure Julia implementation (including a lot of the linear algebra functions like *).

On the g_call/f_call discussion, I'm woefully out of my depth, but I'm assuming you're just counting the number of calls Optim makes to the oracle functions provided by the user? Why not just keep track of that by closing over some counters before calling the function? That way, you can (for example) count calls only to the objective directly via Optim and not accidentally count objective calls that happen inside of the derivative functions (i.e. all the calls would be properly separated).
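A minimal sketch of that counter-closure idea (names are hypothetical):

# Wrap the user's f so only calls made through this wrapper are counted;
# calls to f made internally by e.g. a finite-difference gradient are not.
function counted(f)
    n = Ref(0)
    wrapped(x) = (n[] += 1; f(x))
    return wrapped, n
end

f(x) = sum(abs2, x)
fc, f_calls = counted(f)
fc([1.0, 2.0]); fc([3.0, 4.0])
f_calls[]    # 2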

@ChrisRackauckas
Contributor

Just FYI, ForwardDiff should work for most functions that have a generically typed, pure Julia implementation (including a lot of the linear algebra functions like *).

That's still pretty limited. A ForwardDiff default would still error when using most of the direct linear solvers in Base, most things with sparse matrices, FFTs, matrix norms (unless you add the package GenericSVD.jl), Sundials/ODEInterface solvers, etc. inside of objective functions. If you try to use another optimization call, or most simulation packages (I know DifferentialEquations.jl's OrdinaryDiffEq.jl works, but that took a lot of work), or other nontrivial functionality inside of an objective function, it will error. In theory non-strict typing fixes all of this, but in practice not every control systems library has been written to be type-generic enough to handle this (and it's actually not too easy in some cases because "mixing" can occur in large algorithms).

If the errors were managed better ("Function call is incompatible with autodifferentiation. Please try with autodiff=false") then maybe it would be just fine as a default. However, instead the default error you will get in any case like this is a conversion error, usually saying it cannot convert Dual{Float64,N} to Float64. I know what that means, so I know it means "this is incompatible with autodifferentiation", but someone who just comes to Julia and wants to optimize something which has an FFT will be hit with a weird error that's not due to Optim.jl itself, and thus is quite hard to diagnose. Try/catch and throwing a different error message isn't a good idea either, because that has quite a bit of overhead (I think?). That's why I prefer autodifferentiation as an opt-in instead of a default, and even better, having a very simple flag to opt in with. As more and more things get native Julia implementations which rival or surpass their C/Fortran equivalents this won't be an issue, but there are still many places where interop is used (and many times it's hidden) when doing scientific computing.

@johnmyleswhite
Contributor

If we make "automatic differentiation" (including finite differencing) opt-in, then it's simply a matter of telling the users: if you use finite differences then remember what a g_call represents.

+1 to this approach.

Just FYI, ForwardDiff should work for most functions that have a generically typed, pure Julia implementation (including a lot of the linear algebra functions like *).

But this is an ill-defined subset of Julia. Try the examples below to see how frustrating it could be to find that ForwardDiff seems to work "at random" when you don't know the internals of Distributions.jl:

using Distributions, ForwardDiff

xs = rand(100)

f = params -> prod([logpdf(Beta(params[1], params[2]), x) for x in xs])
g = params -> prod([logpdf(Normal(params[1], params[2]), x) for x in xs])

ForwardDiff.gradient(f, [0.5, 0.5])
ForwardDiff.gradient(g, [0.5, 0.5])

There are no defaults that will satisfy everyone. I'd rather have no defaults at all than adopt a default that wouldn't be useful to people like me.

@mlubin
Contributor

mlubin commented Feb 3, 2017

I don't think anyone is arguing for autodifferentiation to be the default. All I'm arguing for is for finite differences not to be a magical option where you can just provide f and forget about it. It should be equally easy to use finite differences and ForwardDiff, and we should force people to make the choice.

@johnmyleswhite
Contributor

I think we can all get behind that.

@jrevels
Contributor

jrevels commented Feb 3, 2017

Yes, my comment was just correcting the claim that ForwardDiff didn't support generic linear algebra functions like *, not arguing for ForwardDiff to be a default. As I said before, I don't think there should be any defaults.

@ChrisRackauckas
Contributor

ChrisRackauckas commented Feb 3, 2017

I don't think anyone is arguing for autodifferentiation to be the default. All I'm arguing for is for finite differences not to be a magical option where you can just provide f and forget about it.

I'm not sure about this. I think the ability to brain-dead throw f at an optimization package and get a sensible result is what is wanted in the vast majority of cases. Low barrier to entry with a high ceiling.

using Optim
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), BFGS())

It's nice that things like this just work. Of course, there are differentiation errors, and it's not as fast as possible, and all sorts of things, but I think that the extras (among which I would include autodifferentiation) should be opt-in. Or it should be extremely easy to say "just use finite differencing because I really don't care".

@pkofod
Member Author

pkofod commented Feb 6, 2017

Would it be stupid to simply maintain a few selected easy constructors? autodiff(f, :finitediff) (potentially unexported) for example?

@jrevels
Contributor

jrevels commented Feb 6, 2017

Would it be stupid to simply maintain a few selected easy constructors?

I feel like a performant and useful "easy" API in Optim would be close enough to the existing APIs in the various differentiation packages that it wouldn't be worth it. I think it would be better to show some examples using these packages than to introduce a new API to users.

The "naive" differentiation methods provided by FiniteDiff.jl, ForwardDiff.jl, and ReverseDiff.jl, etc. aren't that much more complex than the current keyword approach, are more transparent, and far more flexible if the user wants to dig in.

Note that these "naive" APIs are also all pretty similar by design (at least, for the *Diff.jl packages). My goal for the future is that these interfaces become further standardized, and achieve better feature parity where it's sensible.

autodiff(f, :finitediff) (potentially unexported) for example

Branching on a Symbol value to decide which differentiation technique to use is probably a bad route performance-wise compared to using dispatch (as long as everything is type-inferred).
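A sketch of the dispatch-based selection being suggested (the backend types here are hypothetical, not Optim's API):

using ForwardDiff, Calculus

abstract type DiffBackend end
struct ForwardADBackend  <: DiffBackend end
struct FiniteDiffBackend <: DiffBackend end

# Dispatch picks the backend at compile time, so the choice is type-stable.
make_gradient(f, ::ForwardADBackend)  = (out, x) -> ForwardDiff.gradient!(out, f, x)
make_gradient(f, ::FiniteDiffBackend) = (out, x) -> copyto!(out, Calculus.gradient(f, x))

g! = make_gradient(x -> sum(abs2, x), ForwardADBackend())
out = zeros(2); g!(out, [1.0, 2.0])   # out == [2.0, 4.0]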

@ChrisRackauckas
Contributor

Branching on a Symbol value to decide which differentiation technique to use is probably a bad route performance-wise compared to using dispatch (as long as everything is type-inferred).

Only if the optimization problem is quick, like microseconds quick. If it's any nontrivial optimization problem, the few dynamic dispatches that are hit before entering the main function are meaningless to the overall time.

@anriseth
Contributor

anriseth commented Feb 6, 2017

I feel like a performant and useful "easy" API in Optim would be close enough to the existing APIs in the various differentiation packages that it wouldn't be worth it. I think it would be better to show some examples using these packages than to introduce a new API to users.

I agree that it is a bit wasteful to create a wrapper around these already simple AD packages. Examples sound easier to maintain, and may help spread the idea of AD's power and simplicity to a larger userbase (although the usefulness of initialising a *Config can be lost on many newcomers).
