Offer a way for users to access the lower-order values that end up being calculated by FAD methods #37
Just a thought, but one way would be, instead of returning the values directly, to give the user the possibility of creating a tailor-made function that does exactly what they want. Some examples of how it would work:
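For instance, here's a minimal self-contained sketch of what such a generator could look like. The `value_and_gradient` name and the toy `Dual` type are purely illustrative stand-ins for ForwardDiff's internals, not its actual API:

```julia
# Toy forward-mode dual numbers stand in for ForwardDiff's internals so the
# example is self-contained; `value_and_gradient` is a hypothetical generator.
struct Dual
    val::Float64
    der::Float64   # derivative ("epsilon") component
end
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.val * b.der + a.der * b.val)

# Generator: build a function that returns f(x) and ∇f(x) together -- the
# value comes along for free with each derivative pass.
function value_and_gradient(f)
    return function (x::Vector{Float64})
        n = length(x)
        grad = zeros(n)
        val = 0.0
        for i in 1:n
            duals = [Dual(x[j], i == j ? 1.0 : 0.0) for j in 1:n]  # seed direction i
            r = f(duals)
            val = r.val
            grad[i] = r.der
        end
        return val, grad
    end
end

f(v) = v[1] * v[2] + v[2] * v[2]
g = value_and_gradient(f)
g([1.0, 2.0])   # (6.0, [2.0, 5.0])
```

The same pattern extends to Hessians and tensors: the user asks once for everything they want, and the generated function returns it all from shared work.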
This basically provides an API to let the user define exactly what the function should return and what should be mutated. That would also remove the type instabilities that suggestion 2 seems to imply. One problem: what do you do if the user puts in an invalid combination?
Since this is an expert API designed to save one function evaluation out of many, I think we should provide the fast mutating versions as a starting point. Everything else can be built on top of that.
Here are my thoughts on the two main strategies proposed so far:
**Option 1**

This option would be less work to implement, and it would fit in better with the existing implementation. We wouldn't have to handle as many configuration options just to let users get at the data they need, and it's up to the user to grab what they need from the results themselves. @mlubin The way mutation would be handled would look like this:

```julia
julia> data = hessian(f, x, return_all=true); # still looking for the best keyword...

julia> gradient!(gradout, data); # load gradient from data into gradout

julia> hessian!(hessout, data); # load hessian from data into hessout

julia> gradient(data) == gradout
true

julia> hessian(data) == hessout
true
```

The API would then be composable in pretty predictable ways:

```julia
julia> gradient!(output1, f, x);

julia> gradient!(output2, gradient(f, x, return_all=true));

julia> output1 == output2
true
```

The biggest downside to this approach is that I don't immediately see a way to support a call like

```julia
hessian(f, x, return_all=true, chunk_size=10)
```

It seems to me that this approach could only be allowed when a chunk size isn't specified.

Pros

Cons
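For concreteness, here's a minimal sketch of what the result object behind `return_all=true` and its accessors could look like; `FADResult` and all of these method names are hypothetical, not ForwardDiff's actual API:

```julia
# Hypothetical result object for `return_all=true`: it just holds the data
# that was already computed during the Hessian sweep.
struct FADResult{T}
    value::T
    gradient::Vector{T}
    hessian::Matrix{T}
end

value(r::FADResult) = r.value
gradient(r::FADResult) = r.gradient
hessian(r::FADResult) = r.hessian

# The mutating accessors only copy already-computed data into user buffers,
# so no differentiation work is ever repeated.
gradient!(out::Vector, r::FADResult) = copyto!(out, r.gradient)
hessian!(out::Matrix, r::FADResult) = copyto!(out, r.hessian)
```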
**Option 2**

Riffing off of @KristofferC's idea/type stability concerns, I think the best way to do this would be to pass in the desired configuration up front and get back a specialized function:

```julia
julia> fad! = forwarddiff(f, returns=Tuple{:value, :gradient, :hessian, :tensor}, mutates=Tuple{:gradient, :hessian});
```

Then just pass a tuple of the objects you want to mutate to `fad!`:

```julia
julia> val, gradout, hessout, tens = fad!((gradout, hessout), x);
```
As for the question above about what to do with an invalid request: I'd just throw an error.

Pros

Cons
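A minimal sketch of how such a generator could validate its configuration up front and bake it into the returned closure (all names hypothetical; the differentiation work itself is stubbed out):

```julia
# Hypothetical sketch of the Option 2 generator: validate the requested
# outputs once, then return a closure specialized to that request. Only the
# API shape and the error behavior are shown; no real differentiation here.
function forwarddiff(f; returns::Tuple=(:value,), mutates::Tuple=())
    for m in mutates
        m in returns || throw(ArgumentError("asked to mutate $m, but it is not in returns"))
    end
    return function (bufs::Tuple, x)
        # A real implementation would run one dual-number sweep over x and
        # write the requested derivatives into bufs in place.
        return map(r -> r === :value ? f(x) : nothing, returns)
    end
end

fad! = forwarddiff(x -> sum(abs2, x); returns=(:value, :gradient), mutates=(:gradient,))
# forwarddiff(identity; returns=(:value,), mutates=(:gradient,))  # would throw ArgumentError
```

Lifting `returns`/`mutates` into the type domain (e.g. via `Val`) would additionally make the closure's return type inferable, which is the type-stability win over a plain keyword argument.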
@mlubin Why do you say that it saves one function evaluation out of many? Isn't it true that currently you need two function calls to get the value and the gradient, while this could be reduced to one? That would be the case in every iteration, so you'd save half of the evaluations. I am not very familiar with automatic differentiation, so I might have misunderstood something.

Suggestion 1) is pretty good. It exposes the DualNumbers directly to the user, but since this is an advanced feature that is likely no problem, and when you have the dual numbers you can do what you want.

Regarding the function generators: I don't think that code gen is a problem, because even though there are many different possible signatures, any given user is likely to use at most a few of them, and the code wouldn't be generated until the user requests the function. I also don't think the implementation has to be so messy.
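To illustrate what exposing the duals buys the user, reusing the toy `Dual` type and `f` from the sketch earlier in this thread:

```julia
# Handing the user the raw dual result lets them extract whichever parts
# they need -- value and derivative from a single evaluation of f.
duals = [Dual(1.0, 1.0), Dual(2.0, 0.0)]   # seeded for ∂/∂x₁
r = f(duals)
r.val   # f(x)   == 6.0
r.der   # ∂f/∂x₁ == 2.0
```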
@KristofferC, it's a bit tricky to measure exactly, but the cost of evaluating the gradient grows proportionally with the dimension of the input: forward mode needs roughly one function-like pass per input variable. So as the input dimension increases, the time saved by avoiding the one extra function evaluation becomes relatively smaller. The factor-of-2 improvement is only really the case for functions of one or two variables, definitely not for functions of hundreds of variables.
@mlubin My use case is structural optimization, where you don't need that many variables simultaneously (on the order of 10), but each function evaluation is expensive because you need to solve the assembled sparse system. I believe that for these types of applications, avoiding extra function calls is quite significant.
With both of these in mind, it seems to me that the problems for which this feature would be really useful are, in general, going to be the problems for which chunk-mode is not useful. If you're specifying a chunk size, the input dimension is presumably large enough that saving a single function evaluation doesn't buy you much. If that's true, then the single con of option 1 (the inability to simultaneously specify `chunk_size` and `return_all`) matters much less in practice.

I think next week I'm going to get ForwardDiff.jl back on readthedocs and have a comprehensive section on what chunk-mode is and when it's worth using.
JIT overhead and poor codegen are two related, but distinct, issues. JIT overhead is (as you say) the compilation cost incurred by the initial call of a method with a specific signature. "Poor codegen", however, ultimately means "the LLVM bitcode compiled from this Julia expression isn't as performant as it could be", which is tied to the compiler's ability to prove assumptions about the behavior of the code (the most prominent example of this being type inference). My intuition is that the more we offload "configuration-like" things onto the type system (which ForwardDiff already does a lot of), the better the compiler can reason about our code.
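A small sketch of what "offloading configuration onto the type system" means in practice. The derivatives of `q` are hard-coded purely to keep the example self-contained; the `Val` dispatch pattern is the point:

```julia
using LinearAlgebra

# Dispatching on Val{return_all} gives each configuration its own method, so
# each return type is inferable -- unlike one method whose return type
# depends on a runtime Bool keyword. Derivatives of q(x) = sum(abs2, x) are
# hard-coded: ∇q(x) = 2x and the Hessian is 2I.
q(x) = sum(abs2, x)

q_hessian(x, ::Val{false}) = Matrix(2.0I, length(x), length(x))                  # H only
q_hessian(x, ::Val{true})  = (q(x), 2 .* x, Matrix(2.0I, length(x), length(x)))  # (val, ∇, H)

q_hessian([1.0, 2.0], Val(false))   # inferred as Matrix{Float64}
q_hessian([1.0, 2.0], Val(true))    # inferred as Tuple{Float64, Vector{Float64}, Matrix{Float64}}
```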
I just realized that chunk-mode can fairly easily be supported in conjunction with strategy 1 by loading the chunks into a result object as they're computed.
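A self-contained sketch of that idea, using a toy multi-partial dual type as a stand-in for ForwardDiff's chunked numbers; `chunked_value_and_gradient!` is hypothetical:

```julia
# Toy multi-partial dual for chunk-mode. Each pass yields the value plus one
# chunk of partial derivatives, and everything is accumulated into a
# preallocated result buffer.
struct CDual
    val::Float64
    parts::Vector{Float64}
end
Base.:+(a::CDual, b::CDual) = CDual(a.val + b.val, a.parts .+ b.parts)
Base.:*(a::CDual, b::CDual) = CDual(a.val * b.val, a.val .* b.parts .+ b.val .* a.parts)

function chunked_value_and_gradient!(grad, f, x, chunk)
    n = length(x)
    val = 0.0
    for start in 1:chunk:n
        idx = start:min(start + chunk - 1, n)
        duals = [CDual(x[j],
                       [j == i ? 1.0 : 0.0 for i in idx])  # seed this chunk only
                 for j in 1:n]
        r = f(duals)
        val = r.val            # the lower-order value comes along for free
        grad[idx] .= r.parts   # write this chunk's partials in place
    end
    return val, grad
end

grad = zeros(2)
chunked_value_and_gradient!(grad, v -> v[1] * v[2] + v[2] * v[2], [1.0, 2.0], 1)  # (6.0, [2.0, 5.0])
```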
Previously discussed here.

For example, if somebody calls `hessian(f, x)`, they should be able to tell the function to return `∇f(x)` and/or `f(x)` as well, since those values are already calculated by way of the Hessian calculation. Here are some ideas for what this might look like: one is to return the underlying `ForwardDiffNumber`, then provide methods in the API to extract the data you want from it (something like the sketch below); the other is a keyword argument that makes the call return everything. I don't really like `alldata` as a name for the keyword argument, but you get my drift. Thoughts?
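As a self-contained illustration of why those lower-order values really are computed along the way: nesting toy dual numbers (forward-over-forward, the same trick second-order forward AD is built on) yields the value, both first derivatives, and a Hessian entry from a single evaluation. The `D` type here is illustrative, not ForwardDiff's `ForwardDiffNumber`:

```julia
# Nested toy duals: the inner epsilon tracks ∂/∂x, the outer tracks ∂/∂y.
struct D{T}
    v::T   # value component
    e::T   # derivative component
end
Base.:+(a::D, b::D) = D(a.v + b.v, a.e + b.e)
Base.:*(a::D, b::D) = D(a.v * b.v, a.v * b.e + a.e * b.v)

g(x, y) = x * y + y * y

x = D(D(1.0, 1.0), D(0.0, 0.0))   # seeded for ∂/∂x; constant w.r.t. y
y = D(D(2.0, 0.0), D(1.0, 0.0))   # seeded for ∂/∂y

r = g(x, y)
r.v.v   # g(1,2)    == 6.0
r.v.e   # ∂g/∂x     == 2.0
r.e.v   # ∂g/∂y     == 5.0
r.e.e   # ∂²g/∂x∂y  == 1.0  -- a Hessian entry from the same evaluation
```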