
Allowing additional arguments to function being differentiated #32

Closed
gragusa opened this issue Aug 5, 2015 · 21 comments

Comments

@gragusa
Contributor

gragusa commented Aug 5, 2015

I have been using ForwardDiff.jl for a while. My use case is a little different, as I need to differentiate f(x, args...) only with respect to x. Of course, in typical usage this can be accomplished with a closure, but in some instances it is not possible.

I locally extended the API to allow args... to be passed down (see, e.g., here).

I have been following the work on #27 (which is great by the way) and I was wondering whether either such extension would be possible or whether the new API will allow additional arguments.

@mlubin
Contributor

mlubin commented Aug 5, 2015

In which cases is it not possible to achieve this behavior by using a closure?

@gragusa
Contributor Author

gragusa commented Aug 5, 2015

I am using it inside an optimization problem (through MathProgBase). The
user passes a function which is used to set up the constraints. I use
ForwardDiff.jl to automatically get the Jacobian and Hessian of the
constraints, where the Jacobian and Hessian depend on other variables. I
cannot use a closure because I construct the Jacobian and Hessian
functions outside the optimization.

I could construct fad_jacobian and fad_hessian by creating a
closure inside eval_jac_g, but this would kill performance.

Giuseppe Ragusa


@jrevels
Member

jrevels commented Aug 5, 2015

I might be misunderstanding, but does it help that the new API allows one to easily take derivatives/gradients/etc. at the provided point without having to build a closure first?

For example, using the old API, you have to construct ∇f before you can evaluate ∇f(x):

julia> using ForwardDiff

julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)

julia> g = forwarddiff_gradient(f, Float64)
gradf (generic function with 2 methods)

julia> g([1.0,2.0,3.0])
3-element Array{Float64,1}:
 0.540302
 0.756802
 0.0621972

Using the new API, you can simply perform the evaluation ∇f(x) without actually constructing ∇f:

julia> using ForwardDiff

julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)

julia> gradient(f, [1.0, 2.0, 3.0])
3-element Array{Float64,1}:
 0.540302
 0.756802
 0.0621972

Similar methods exist for Jacobians/Hessians/etc. The tentative documentation for the new API can be found in the README of #27's branch.

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

The new API is great, but it is still not up to the task for the sort of problems I am dealing with. To be more concrete, I tried to make a simple example where passing down arguments is necessary.

I have a function, f(θ): R^k -> R^(n x m). This function is used to define another function h(u): R^(n+k) -> R^m:

function h(u)
    p = u[1:n]
    θ = u[n+1:end]
    vec(mean(p.*f(θ),1))
end
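The fragment above assumes f and n from the surrounding scope. A self-contained sketch (with made-up n, k, m, and f, and the modern dims=1 keyword in place of the 2015-era mean(x, 1)) might look like:

```julia
using Statistics  # for `mean` in modern Julia

# Hypothetical stand-ins: n, k, m, and this f are not from the thread.
n, k, m = 4, 2, 3
f(θ) = [sin(θ[1] * i + θ[2] * j) for i in 1:n, j in 1:m]  # R^k -> R^(n x m)

function h(u)
    p = u[1:n]
    θ = u[n+1:end]
    vec(mean(p .* f(θ), dims=1))  # `mean(x, 1)` in 2015-era Julia
end

h(ones(n + k))  # an m-vector
```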

The Jacobian with respect to u is given by [f(θ)'/n   ∂h(u)/∂θ].

With the new API I can use

jacobian(h, [ones(n); 0])

which gives

3x7 Array{Float64,2}:
 -0.000217836   0.0556389   0.0816585   0.000563994   0.112706    0.0099686  -0.332284
 -0.00168171   -0.329863   -0.0724125  -0.0232947    -0.0784092   0.030865   -1.35021
 -0.000336177  -0.0760003  -0.0558016   0.00272413    0.0931781  -0.056837   -0.796085

where the 3x6 block is

f([.0])'/6
3x6 Array{Float64,2}:
 -0.000217836   0.0556389   0.0816585   0.000563994   0.112706    0.0099686
 -0.00168171   -0.329863   -0.0724125  -0.0232947    -0.0784092   0.030865
 -0.000336177  -0.0760003  -0.0558016   0.00272413    0.0931781  -0.056837

In this case, jacobian is already calculating 18 more derivatives than it needs to.

Things get worse, since I am also interested in the Hessian of the following function:

function v(u)
    p = u[1:n]
    θ = u[n+1:n+1]
    λ = u[n+2:end]
    sum(p.*f(θ)*λ)
end

This is sparse, as the second derivatives with respect to p are all 0. I am also not interested in the Hessian with respect to λ.

If I do

hessian(v, [ones(n); 0., ones(3)])
10x10 Array{Float64,2}:
  0.0          0.0        0.0        0.0          0.0        0.0        -9.61983   -0.00130702  -0.0100903  -0.00201706
  0.0          0.0        0.0        0.0          0.0        0.0        -2.88912    0.333833    -1.97918    -0.456002
  0.0          0.0        0.0        0.0          0.0        0.0        -0.097451   0.489951    -0.434475   -0.33481
  0.0          0.0        0.0        0.0          0.0        0.0         0.131498   0.00338397  -0.139768    0.0163448
  0.0          0.0        0.0        0.0          0.0        0.0        -2.10706    0.676236    -0.470455    0.559068
  0.0          0.0        0.0        0.0          0.0        0.0        -0.289503   0.0598116    0.18519    -0.341022
 -9.61983     -2.88912   -0.097451   0.131498    -2.10706   -0.289503    0.0       -1.9937      -8.10126    -4.77651
 -0.00130702   0.333833   0.489951   0.00338397   0.676236   0.0598116  -1.9937     0.0          0.0         0.0
 -0.0100903   -1.97918   -0.434475  -0.139768    -0.470455   0.18519    -8.10126    0.0          0.0         0.0
 -0.00201706  -0.456002  -0.33481    0.0163448    0.559068  -0.341022   -4.77651    0.0          0.0         0.0

Since in my use case n is very large (in the thousands) and θ is very small (in the tens), having the possibility to pass down additional arguments would make this computation feasible.

Both h and v are used inside a loop, and each iteration of the loop gives the value of p and λ. I could do it using a closure, but the closure would be created at each iteration. Isn't this too costly?

@mlubin
Contributor

mlubin commented Aug 6, 2015

But the closure would be created at each iteration. Isn't this too costly?

Not necessarily, and why not create the closure once instead of at each iteration?
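One way to build the closure only once (a sketch, not from this thread; `target` is a hypothetical stand-in for the user's function) is to capture a Ref that the loop mutates:

```julia
# Hypothetical example: `target` stands in for the user's f(x, args...).
target(θ, λ) = sum(abs2, θ) * sum(λ)

λ_ref = Ref([1.0, 1.0, 1.0])   # updated in place each iteration
g = θ -> target(θ, λ_ref[])    # closure constructed once

values = Float64[]
for λ in ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
    λ_ref[] = λ                # no new closure allocated here
    push!(values, g([1.0, 2.0]))
    # ForwardDiff would then differentiate `g` with respect to θ only
end
values  # [15.0, 30.0]
```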

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

Because, for instance, λ is a Lagrange multiplier. I am using this type of function with the MathProgBase interface. If I create the closure 'outside' the interface, I have to make λ a global variable. That is, I have scoping issues.

Probably I may resolve the issue by looking at advanced closure alternatives (#102).
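One such alternative (sketched here with made-up names; `mutable struct` is modern Julia syntax that postdates this thread) is a callable struct that stores λ in a field, so the solver callbacks can update it without a global:

```julia
# Hypothetical sketch: store the multiplier in a field instead of a global.
mutable struct ConstraintValue{F}
    f::F                 # user's constraint function
    λ::Vector{Float64}   # Lagrange multipliers, updated by the solver
end
(c::ConstraintValue)(θ) = sum(c.f(θ) .* c.λ)

c = ConstraintValue(θ -> θ .^ 2, [1.0, 1.0])
c([2.0, 3.0])        # 4 + 9 = 13
c.λ = [2.0, 2.0]     # update in place; no new closure, no global
c([2.0, 3.0])        # 8 + 18 = 26
```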

@mlubin
Contributor

mlubin commented Aug 6, 2015

Do you have any benchmarks showing the performance difference here?

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

Not off-hand (I changed the interface completely to avoid globals), but I remember getting a ~5x speedup plus a reduction in memory allocation using my hacked ForwardDiff API (which allows extra arguments).

@mlubin
Contributor

mlubin commented Aug 6, 2015

Were you using globals or closures?

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

Globals. I thought that the closure performance hit was (or used to be?) large.

@mlubin
Contributor

mlubin commented Aug 6, 2015

Not nearly as bad as using globals. If you can put together a realistic but small benchmark, I think that would help sort out the issues here.

@jrevels
Member

jrevels commented Aug 7, 2015

I just pushed a branch that extends the API to allow for targeting specific arguments. It was quite a fun little problem to try to solve generically. Scroll down to the bottom of the README for the new stuff.

I didn't want to add this to #27 yet, because I haven't developed good tests for it. I believe that the strategy I'm using can be implemented such that there is no (or very little) loss of performance due to the generalization. It's very possible that what I just pushed is already that fast, but I need to do some more rigorous performance comparisons before I make any claims.

My guess is that the main performance hit for the wrt-feature branch will come from conversions between the evaluated target argument and the non-partial arguments (with no extra conversions being done in the 1-arg case, so no performance hit there). For similar reasons, I also imagine that rigorous testing of the wrt-feature branch will reveal that we need more comprehensive conversion code, but we can tackle that once we get there.

Finally, wrt-feature stuff is backwards compatible with #27, so it can be merged after #27 and not break the existing API.

@gragusa
Contributor Author

gragusa commented Aug 7, 2015

@jrevels this looks outstanding. I will try it soon, and I will contribute extensive performance testing.

@jrevels
Member

jrevels commented Aug 7, 2015

@gragusa Awesome, thanks! Also, remember - since I don't have automated tests for it yet, there are probably bugs that may or may not actually throw errors. Silent bugs could corrupt your results without any warning, so be wary of results you get from that branch for the time being.

@gragusa
Contributor Author

gragusa commented Aug 7, 2015

I can help with testing. Do you have anything in mind? Adjusting the old
tests to the API extended with the new features?

@jrevels
Member

jrevels commented Aug 7, 2015

Pretty much. Like I mentioned, wrt-feature's API should be backwards-compatible with #27's API, so all the tests from #27 should remain in place.

The things that immediately come to mind that need to be tested for wrt-feature:

    1. Check that using wrt{1}, wrt{2}...wrt{n} consistently delivers correct results across a variety of n-ary functions. A while ago, @mlubin floated the idea of making a fuzzer to generate test expressions from lists of allowable univariate and bivariate functions; I think that would be a necessity in this case. I implemented a really naive one here, but removed it in a subsequent commit because it needed better logic to avoid building functions that would inherently throw DomainErrors.
    2. Pass in arguments of heterogeneous type to the functions from 1), checking that results are still consistently correct. Additionally, compare performance with the same functions when the arguments are homogeneously typed. Finally, compare performance with the same functions, but change the internal methods to replace all arguments with the relevant ForwardDiffNum type (real-valued for all non-wrt args). For example, hessian(wrt{2}, f, x, y, z) would be redefined to evaluate something like f(HessianNum(x, zero_partials...), HessianNum(y, one_partials...), HessianNum(z, zero_partials...)) (its current behavior is to evaluate f(x, HessianNum(y, one_partials...), z)). That's basically pseudo-code - I'll probably need to implement better constructors/convert methods to actually get this done.
    3. Check univariate performance (i.e. no wrt{i}) against #27's univariate performance.

TBH, properly building the fuzzer mentioned in 1) would be a task that might even be worthy of its own package. 3) is something that could be done basically immediately, however, and I'd love to see the results.

@jrevels
Member

jrevels commented Aug 7, 2015

I made a package for the fuzzer here. It may take a while before I can start work on it, but at least now there's a centralized place to make contributions to.

@gragusa
Contributor Author

gragusa commented Aug 17, 2015

@jrevels I was starting to help with tests and benchmarks of the wrt-feature branch, but the branch no longer merges cleanly (you have done way too much work in the last week!). I looked at the code, and it seems that your changes are still compatible with your work on the wrt feature. Are they?

@jrevels
Member

jrevels commented Aug 17, 2015

This feature is still possible given the new underlying structure, but a lot has changed, so it wouldn't just be a straight port (the strategy would essentially be the same, though). The wrt-feature branch remains as a proof of concept, but the feature would have to be re-implemented on top of the current master to actually work.

Currently, fleshing out our benchmarks and resolving #37 are higher priority than this, since those may change how we implement the wrt feature.

@jrevels
Member

jrevels commented Sep 5, 2015

Now that #37 is resolved, I've revisited this idea, and I think that telling users to make closures is the way to go here. It's a much simpler solution than trying to incorporate this feature into ForwardDiff's API. Feel free to continue to discuss this, but for now, I'm going to close the issue.

To give an explicit example, here's a potential way in which a user should accomplish this:

# I have some function f(x, y, z)
# I want to take the gradient of f with respect to y

const my_cache = ForwardDiffCache()

g_y(x, y, z) = ForwardDiff.gradient(n -> f(x, n, z), y, cache=my_cache)

Obviously, there may be more performant ways of doing the above if you're looping over different ys while your xs or zs are invariant (e.g. construct the closure outside the loop).
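For instance (a sketch with a made-up f; only the closure-hoisting pattern is the point), when x and z are loop-invariant:

```julia
# Hypothetical f; only y varies inside the loop.
f(x, y, z) = sum(x .* y) + sum(z)
x, z = [1.0, 2.0], [3.0]

g = y -> f(x, y, z)   # built once, outside the loop
results = [g(y) for y in ([1.0, 1.0], [2.0, 2.0])]
# ForwardDiff.gradient(g, y) would then differentiate w.r.t. y only
results  # [6.0, 9.0]
```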

If there are still performance concerns with anonymous closures, this issue could be reopened, depending on what we see from benchmarks/profiling examples. In general, however, I feel that effort to fix closure concerns should be focused on improving Base rather than hacking around the problem in ForwardDiff.

@papamarkou
Contributor

I am re-opening this issue, with a justification and explanation found at the bottom of #77.
