
Offer a way for users to access the lower-order values that end up being calculated by FAD methods #37

Closed
jrevels opened this issue Aug 14, 2015 · 8 comments · Fixed by #40

Comments

@jrevels
Member

jrevels commented Aug 14, 2015

Previously discussed here.

For example, if somebody calls hessian(f,x) they should be able to tell the function to return ∇f(x) and/or f(x) as well, since those values are already calculated by way of the Hessian calculation.
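For context, the "already calculated" point can be illustrated with a minimal dual-number sketch in Python (illustrative only; ForwardDiff.jl's actual number types and API differ). Evaluating f on a dual number propagates the value and the derivative together through every operation, so the lower-order values come out as free byproducts:

```python
# Minimal forward-mode AD sketch (illustrative, not ForwardDiff.jl code).
# A Dual carries a value and a derivative; arithmetic on Duals computes
# f(x) and f'(x) in the same pass.

class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._wrap(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2

d = f(Dual(2.0, 1.0))  # seed dx/dx = 1
print(d.val)  # 17.0 -> f(2)
print(d.der)  # 14.0 -> f'(2)
```

Nesting duals inside duals extends this to second derivatives in the same way, which is why a Hessian pass also produces the gradient and the value.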

Here are some ideas of what this might look like:

  1. Have a keyword argument that specifies that the function should return the calculated ForwardDiffNumber, then provide methods in the API to extract the data you want from it. Something like this:
julia> data = hessian(f, x, alldata=true) # returns ForwardDiffNumber
julia> gradient(data) # extract gradient from data
julia> hessian(data) # extract hessian from data

I don't really like alldata as a name for the keyword argument, but you get my drift.

  2. Pass symbols to the method saying what you want up front (once again, via a keyword argument):
# returns in the same order as it's given, with Hessian first 
julia> hess, val, grad = hessian(f, x, also=(:value, :gradient)) 

Thoughts?

@KristofferC
Collaborator

Just a thought, but one way would be, instead of returning the values directly, to give the user the ability to create a tailor-made function that does exactly what they want.

Some examples of how it would work:

g = makefunction(f, returns=(:value, :gradient))
gives a function that does
value, gradient = g(x)

g = makefunction(f, returns=(:value,), mutates=(:gradient,))
gives a function that does
value = g(x, gradient)
and mutates the gradient argument

g = makefunction(f, returns=(:value,), mutates=(:gradient, :hessian))
gives a function that does
value = g(x, gradient, hessian)
and mutates the gradient and hessian argument.

g = makefunction(f, returns=(:gradient,))
gives a function that does
gradient = g(x)
which is the same as what gradient(f) does now.

This basically provides an API to let the user define exactly what the function should return and what should be mutated. That would also remove the type instabilities that suggestion 2 seems to imply.

One problem: what do you do if the user puts :value in the mutates field and the function returns a scalar? Force the user to wrap the scalar in an array?
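In plain Python, the factory idea might be sketched like this (hypothetical names; make_function here is an invented stand-in, not an actual ForwardDiff.jl function):

```python
# Hypothetical sketch of the proposed factory (invented names; not a real
# ForwardDiff.jl API). The caller configures up front which quantities the
# generated function returns, and in what order.
import math

def make_function(f, grad_f, returns=("value",)):
    """Return g(x) yielding the requested quantities, in the order given."""
    def g(x):
        available = {"value": f(x), "gradient": grad_f(x)}
        out = tuple(available[name] for name in returns)
        return out[0] if len(out) == 1 else out
    return g

g = make_function(math.sin, math.cos, returns=("value", "gradient"))
val, grad = g(0.0)
print(val, grad)  # 0.0 1.0
```

The point of the design is that the shape of g's return value is fixed at construction time, rather than varying per call.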

@mlubin
Contributor

mlubin commented Aug 17, 2015

Since this is an expert API designed to save one function evaluation out of many, I think we should provide the fast mutating versions as a starting point. Everything else can be built on top of that.

@jrevels
Member Author

jrevels commented Aug 17, 2015

Here are my thoughts on the two main strategies proposed so far:


  1. Return the ForwardDiffNumber and provide API methods to work with it

This option would be less work to implement, and it would fit in better with the existing implementation. We wouldn't have to handle as many configuration options just to let users get to the data they need, and it's up to the user to grab what they need from the results themselves.

@mlubin The way mutation would be handled would look like this:

julia> data = hessian(f, x, return_all=true); # still looking for the best keyword...

julia> gradient!(gradout, data); # load gradient from data into gradout

julia> hessian!(hessout, data); # load hessian from data into hessout

julia> gradient(data) == gradout
true

julia> hessian(data) == hessout
true

The API would then be composable in pretty predictable ways:

julia> gradient!(output1, f, x);

julia> gradient!(output2, gradient(f, x, return_all=true));

julia> output1 == output2
true

The biggest downside to this approach is that I don't immediately see a way to support a call like

hessian(f, x, return_all=true, chunk_size=10)

It seems to me that this approach could only be allowed when a chunk_size isn't specified.

Pros

  • composability > configuration
  • much easier to implement
  • might actually make the existing fad_api cleaner/more "separable"

Cons

  • I don't know how chunk_size could be used in tandem with this

  2. Configure output to be exactly what you want

Riffing off of @KristofferC's idea/type stability concerns, I think the best way to do this would be to pass in a Tuple type instead of an instance:

julia> fad! = forwarddiff(f, returns=Tuple{:value, :gradient, :hessian, :tensor}, mutates=Tuple{:gradient, :hessian});

Then just pass a tuple of the objects you want to mutate to fad! as the first argument:

julia> val, gradout, hessout, tens = fad!((gradout, hessout), x);

One problem: what do you do if the user puts :value in the mutates field and the function returns a scalar?

I'd just throw an error.

Pros

  • It's not impossible to support simultaneous chunk_size configuration (though it might be super tough to get right)

Cons

  • Implementation of something this general could be really messy, and would be a pretty big change from what's currently on master
  • Things might get rough for codegen/JIT, as the space of allowable function signatures is huge

@KristofferC
Collaborator

@mlubin Why do you say that it saves one function evaluation out of many? Isn't it true that currently you need two function calls to get the value and the gradient, while this could be reduced to one? That will be the case in every iteration, so you'd save half of the evaluations. I am not very familiar with automatic differentiation, so I might have misunderstood something.

Suggestion 1 is pretty good. It exposes the DualNumbers directly to the user, but since this is an advanced feature that is likely not a problem, and once you have the dual numbers you can do whatever you want.

Regarding the function generators: I don't think that codegen is a problem, because even though there are many different possible signatures, any given user is likely to use at most a few of them. The code wouldn't be generated until the user requests the function.

I also don't think the implementation has to be that messy.

@mlubin
Contributor

mlubin commented Aug 17, 2015

@KristofferC, it's a bit tricky to measure exactly, but the complexity of evaluating the gradient increases proportionally with the dimension of the input. So as the dimension of the input increases, the time saved by avoiding the extra function evaluation becomes relatively smaller. The factor of 2 improvement is only really the case for functions of one or two variables, definitely not for functions of 100s of variables.
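A back-of-the-envelope way to see this (a simple cost model under the assumption that a forward-mode gradient costs roughly n function-evaluation equivalents for n inputs; a sketch, not a measurement):

```python
# Rough cost model for the point above: if computing the gradient costs
# ~n function-evaluation equivalents for n inputs, then reusing f(x) from
# that same pass saves a fraction 1/(n + 1) of the combined work of
# computing f(x) and the gradient separately.

def savings_fraction(n):
    """Fraction of total work saved by not re-evaluating f(x)."""
    return 1.0 / (n + 1)

print(savings_fraction(1))    # 0.5   -> the "factor of 2" regime
print(savings_fraction(10))   # ~0.09 -> an order-of-10-variables problem
print(savings_fraction(100))  # ~0.01 -> negligible for 100s of variables
```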

@KristofferC
Collaborator

@mlubin My use case is structural optimization, where you don't need that many variables simultaneously (on the order of 10), but the function cost is large because you need to solve the assembled sparse system. I believe that for these types of applications, avoiding extra function calls is quite significant.

@jrevels
Member Author

jrevels commented Aug 18, 2015

So as the dimension of the input increases, the time saved by avoiding the extra function evaluation becomes relatively smaller. The factor of 2 improvement is only really the case for functions of one or two variables, definitely not for functions of 100s of variables.

My use case is structural optimization, where you don't need that many variables simultaneously (on the order of 10), but the function cost is large because you need to solve the assembled sparse system. I believe that for these types of applications, avoiding extra function calls is quite significant.

With both of these in mind, it seems to me that the problems for which this feature would be really useful are, in general, going to be the problems in which chunk-mode is not useful. If you're specifying a chunk_size, then you are specifying that you want to trade a greater number of function calls for less memory usage. Otherwise, the current implementation assumes memory can be freely spent to minimize the number of function calls.

If that's true, then that makes the single con of option 1 (the inability to simultaneously specify chunk_size) a bit less scary. However, I'm sure there are many cases in which you'd want to find a middle ground (e.g. when your chunk_size is only off from your input dimension by a small factor).

I think next week I'm going to get ForwardDiff.jl back on readthedocs and have a comprehensive section on what chunk_size actually does, so that users can realize upfront the tradeoff @mlubin is talking about.

I don't think that codegen is a problem, because even though there are many different possible signatures, any given user is likely to use at most a few of them. The code wouldn't be generated until the user requests the function.

JIT overhead and poor codegen are two related, but distinct, issues. JIT overhead is (as you say) the compilation cost incurred by the initial call of a method with a specific signature. "Poor codegen", however, ultimately means "the LLVM bitcode compiled from this Julia expression isn't as performant as it could be", which is tied to the compiler's ability to prove assumptions about the behavior of the code (the most prominent example of this is type inference). My intuition is that the more we offload "configuration-like" things onto the type system (which ForwardDiff already does a lot of through @generated functions), the harder our code could be on Julia's codegen. That intuition is at least partly fueled by ignorance, though, since I haven't dug deep into the codegen stuff - a benchmark against an actual implementation would always be best...

@jrevels
Member Author

jrevels commented Aug 21, 2015

I just realized that chunk-mode can fairly easily be supported in conjunction with strategy 1 by loading the chunks into a result ForwardDiffNumber instead of loading them into result arrays. With that problem conceptually out of the way, I'm going to go for strategy 1 (at least for an initial implementation).
