
Allowing additional arguments to function being differentiated #32

Closed
gragusa opened this issue Aug 5, 2015 · 21 comments

Comments

@gragusa
Contributor

gragusa commented Aug 5, 2015

I have been using ForwardDiff.jl for a while. My use case is a little different, as I need to differentiate f(x, args...) only with respect to x. Of course, in typical usage this can be accomplished with a closure, but in some instances it is not possible.

I locally extended the API to allow args... to be passed down (see, e.g., here).

I have been following the work on #27 (which is great by the way) and I was wondering whether either such extension would be possible or whether the new API will allow additional arguments.

@mlubin
Contributor

mlubin commented Aug 5, 2015

In which cases is it not possible to achieve this behavior by using a closure?

@gragusa
Contributor Author

gragusa commented Aug 5, 2015

I am using it inside an optimization problem (through MathProgBase). The
user passes a function which is used to set up the constraints. I use
ForwardDiff.jl to automatically get the Jacobian and Hessian of the
constraints, where the Jacobian and Hessian depend on other variables. I
cannot use a closure because I construct the Jacobian and Hessian
functions outside the optimization.

I could construct fad_jacobian and fad_hessian by creating a
closure inside eval_jac_g, but this would kill performance.

Giuseppe Ragusa


@jrevels
Member

jrevels commented Aug 5, 2015

I might be misunderstanding, but does it help that the new API allows one to easily take derivatives/gradients/etc. at the provided point without having to build a closure first?

For example, using the old API, you have to construct ∇f before you can evaluate ∇f(x):

julia> using ForwardDiff

julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)

julia> g = forwarddiff_gradient(f, Float64)
gradf (generic function with 2 methods)

julia> g([1.0,2.0,3.0])
3-element Array{Float64,1}:
 0.540302
 0.756802
 0.0621972

Using the new API, you can simply perform the evaluation ∇f(x) without actually constructing ∇f:

julia> using ForwardDiff

julia> f(x) = sin(x[1]) + cos(x[2])^2 + tan(x[3])^3
f (generic function with 1 method)

julia> gradient(f, [1.0, 2.0, 3.0])
3-element Array{Float64,1}:
 0.540302
 0.756802
 0.0621972

Similar methods exist for Jacobians/Hessians/etc. The tentative documentation for the new API can be found in the README of #27's branch.

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

The new API is great, but it is still not up to the task for the sort of problems I am dealing with. To be more concrete, I tried to make a simple example where passing down arguments is necessary.

I have a function, f(θ): R^k -> R^(n x m). This function is used to define another function h(u): R^(n+k) -> R^m:

function h(u)
    p = u[1:n]
    θ = u[n+1:end]
    vec(mean(p.*f(θ),1))
end
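The fragment above assumes f and n from the surrounding scope. A self-contained sketch (with made-up n, k, m, and f, and the modern dims=1 keyword in place of the 2015-era mean(x, 1)) might look like:

```julia
using Statistics  # for `mean` in modern Julia

# Hypothetical stand-ins: n, k, m, and this f are not from the thread.
n, k, m = 4, 2, 3
f(θ) = [sin(θ[1] * i + θ[2] * j) for i in 1:n, j in 1:m]  # R^k -> R^(n x m)

function h(u)
    p = u[1:n]
    θ = u[n+1:end]
    vec(mean(p .* f(θ), dims=1))  # `mean(x, 1)` in 2015-era Julia
end

h(ones(n + k))  # an m-vector
```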

The Jacobian with respect to u is given by [f(θ)'/n   ∂h(u)/∂θ].

With the new API I can use

jacobian(h, [ones(n); 0])

which gives

3x7 Array{Float64,2}:
 -0.000217836   0.0556389   0.0816585   0.000563994   0.112706    0.0099686  -0.332284
 -0.00168171   -0.329863   -0.0724125  -0.0232947    -0.0784092   0.030865   -1.35021
 -0.000336177  -0.0760003  -0.0558016   0.00272413    0.0931781  -0.056837   -0.796085

where the 3x6 block is

f([.0])'/6
3x6 Array{Float64,2}:
 -0.000217836   0.0556389   0.0816585   0.000563994   0.112706    0.0099686
 -0.00168171   -0.329863   -0.0724125  -0.0232947    -0.0784092   0.030865
 -0.000336177  -0.0760003  -0.0558016   0.00272413    0.0931781  -0.056837

In this case, jacobian is already calculating 18 more derivatives than it needs to.

Things get worse, since I am also interested in the Hessian of the following function:

function v(u)
    p = u[1:n]
    θ = u[n+1:n+1]
    λ = u[n+2:end]
    sum(p.*f(θ)*λ)
end

This is sparse, as the second derivatives with respect to p are all 0. I am also not interested in the Hessian with respect to λ.

If I do

hessian(v, [ones(n); 0., ones(3)])
10x10 Array{Float64,2}:
  0.0          0.0        0.0        0.0          0.0        0.0        -9.61983   -0.00130702  -0.0100903  -0.00201706
  0.0          0.0        0.0        0.0          0.0        0.0        -2.88912    0.333833    -1.97918    -0.456002
  0.0          0.0        0.0        0.0          0.0        0.0        -0.097451   0.489951    -0.434475   -0.33481
  0.0          0.0        0.0        0.0          0.0        0.0         0.131498   0.00338397  -0.139768    0.0163448
  0.0          0.0        0.0        0.0          0.0        0.0        -2.10706    0.676236    -0.470455    0.559068
  0.0          0.0        0.0        0.0          0.0        0.0        -0.289503   0.0598116    0.18519    -0.341022
 -9.61983     -2.88912   -0.097451   0.131498    -2.10706   -0.289503    0.0       -1.9937      -8.10126    -4.77651
 -0.00130702   0.333833   0.489951   0.00338397   0.676236   0.0598116  -1.9937     0.0          0.0         0.0
 -0.0100903   -1.97918   -0.434475  -0.139768    -0.470455   0.18519    -8.10126    0.0          0.0         0.0
 -0.00201706  -0.456002  -0.33481    0.0163448    0.559068  -0.341022   -4.77651    0.0          0.0         0.0

Since in my use case n is very large (in the thousands) and θ is very small (in the tens), having the possibility to pass down additional arguments would make this computation feasible.

Both h and v are used inside a loop, and each iteration of the loop gives the value of p and λ. I could do it using a closure, but the closure would be created at each iteration. Isn't this too costly?

@mlubin
Contributor

mlubin commented Aug 6, 2015

But the closure would be created at each iteration. Isn't this too costly?

Not necessarily, and why not create the closure once instead of at each iteration?
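One way to build the closure only once (a sketch, not from this thread; `target` is a hypothetical stand-in for the user's function) is to capture a Ref that the loop mutates:

```julia
# Hypothetical example: `target` stands in for the user's f(x, args...).
target(θ, λ) = sum(abs2, θ) * sum(λ)

λ_ref = Ref([1.0, 1.0, 1.0])   # updated in place each iteration
g = θ -> target(θ, λ_ref[])    # closure constructed once

values = Float64[]
for λ in ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
    λ_ref[] = λ                # no new closure allocated here
    push!(values, g([1.0, 2.0]))
    # ForwardDiff would then differentiate `g` with respect to θ only
end
values  # [15.0, 30.0]
```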

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

Because, for instance, λ is a Lagrange multiplier. I am using this type of function with the MathProgBase interface. If I create the closure 'outside' the interface, I have to make λ a global variable. That is, I have scoping issues.

Probably I may resolve the issue by looking at advanced closure alternatives (#102).
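One such alternative (sketched here with made-up names; `mutable struct` is modern Julia syntax that postdates this thread) is a callable struct that stores λ in a field, so the solver callbacks can update it without a global:

```julia
# Hypothetical sketch: store the multiplier in a field instead of a global.
mutable struct ConstraintValue{F}
    f::F                 # user's constraint function
    λ::Vector{Float64}   # Lagrange multipliers, updated by the solver
end
(c::ConstraintValue)(θ) = sum(c.f(θ) .* c.λ)

c = ConstraintValue(θ -> θ .^ 2, [1.0, 1.0])
c([2.0, 3.0])        # 4 + 9 = 13
c.λ = [2.0, 2.0]     # update in place; no new closure, no global
c([2.0, 3.0])        # 8 + 18 = 26
```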

@mlubin
Contributor

mlubin commented Aug 6, 2015

Do you have any benchmarks showing the performance difference here?

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

Not off-hand (I changed the interface completely to avoid globals), but I remember getting a ~5x speedup plus a reduction in memory allocation using my hacked ForwardDiff API (which allows extra arguments).

@mlubin
Contributor

mlubin commented Aug 6, 2015

Were you using globals or closures?

@gragusa
Contributor Author

gragusa commented Aug 6, 2015

Globals. I thought that the closure performance hit was (or used to be?) large.

@mlubin
Contributor

mlubin commented Aug 6, 2015

Not nearly as bad as using globals. If you can put together a realistic but small benchmark, I think that would help sort out the issues here.

@jrevels
Member

jrevels commented Aug 7, 2015

I just pushed a branch that extends the API to allow for targeting specific arguments. It was quite a fun little problem to try to solve generically. Scroll down to the bottom of the README for the new stuff.

I didn't want to add this to #27 yet, because I haven't developed good tests for it. I believe that the strategy I'm using can be implemented such that there is no (or very little) loss of performance due to the generalization. It's very possible that what I just pushed is already that fast, but I need to do some more rigorous performance comparisons before I make any claims.

My guess is that the main performance hit for the wrt-feature branch will come from conversions between the evaluated target argument and the non-partial arguments (with no extra conversions being done in the 1-arg case, so no performance hit there). For similar reasons, I also imagine that rigorous testing of the wrt-feature branch will reveal that we need more comprehensive conversion code, but we can tackle that once we get there.

Finally, wrt-feature stuff is backwards compatible with #27, so it can be merged after #27 and not break the existing API.

@gragusa
Contributor Author

gragusa commented Aug 7, 2015

@jrevels this looks outstanding. I will try it soon, and I will contribute extensive performance testing.

@jrevels
Member

jrevels commented Aug 7, 2015

@gragusa Awesome, thanks! Also, remember - since I don't have automated tests for it yet, there are probably bugs that may or may not actually throw errors. Silent bugs could corrupt your results without any warning, so be wary of results you get from that branch for the time being.

@gragusa
Contributor Author

gragusa commented Aug 7, 2015

I can help with testing. Do you have anything in mind? Adjusting the old
tests to the API extended with the new features?

@jrevels
Member

jrevels commented Aug 7, 2015

Pretty much. Like I mentioned, wrt-feature's API should be backwards-compatible with #27's API, so all the tests from #27 should remain in place.

The things that immediately come to mind that need to be tested for wrt-feature:

    1. Check that using wrt{1}, wrt{2}...wrt{n} consistently delivers correct results across a variety of n-ary functions. A while ago, @mlubin floated the idea of making a fuzzer to generate test expressions from lists of allowable univariate and bivariate functions; I think that would be a necessity in this case. I implemented a really naive one here, but removed it in a subsequent commit because it needed better logic to avoid building functions that would inherently throw DomainErrors.
    2. Pass in arguments of heterogeneous type to the functions from 1), checking that results are still consistently correct. Additionally, compare performance with the same functions when the arguments are homogeneously typed. Finally, compare performance with the same functions, but change the internal methods to replace all arguments with the relevant ForwardDiffNum type (real-valued for all non-wrt args). For example, hessian(wrt{2}, f, x, y, z) would be redefined to evaluate something like f(HessianNum(x, zero_partials...), HessianNum(y, one_partials...), HessianNum(z, zero_partials...)) (its current behavior is to evaluate f(x, HessianNum(y, one_partials...), z)). That's basically pseudo-code - I'll probably need to implement better constructors/convert methods to actually get this done.
    3. Check univariate performance (i.e. no wrt{i}) against #27's univariate performance.

TBH, properly building the fuzzer mentioned in 1) would be a task that might even be worthy of its own package. 3) is something that could be done basically immediately, however, and I'd love to see the results.

@jrevels
Member

jrevels commented Aug 7, 2015

I made a package for the fuzzer here. It may take a while before I can start work on it, but at least now there's a centralized place to make contributions to.

@gragusa
Contributor Author

gragusa commented Aug 17, 2015

@jrevels I was starting to help with tests and benchmarks of the wrt-feature branch, but the branch no longer merges cleanly (you have done way too much work in the last week!). I looked at the code, and it seems that your changes are still compatible with your work on the wrt feature. Are they?

@jrevels
Member

jrevels commented Aug 17, 2015

This feature is still possible given the new underlying structure, but a lot has changed, so it wouldn't just be a straight port (the strategy would essentially be the same, though). The wrt-feature branch remains as a proof of concept, but the feature would have to be re-implemented on top of the current master to actually work.

Currently, fleshing out our benchmarks and resolving #37 are higher priority than this, since those may change how we implement the wrt feature.

@jrevels
Member

jrevels commented Sep 5, 2015

Now that #37 is resolved, I've revisited this idea, and I think that telling users to make closures is the way to go here. It's a much simpler solution than trying to incorporate this feature into ForwardDiff's API. Feel free to continue to discuss this, but for now, I'm going to close the issue.

To give an explicit example, here's a potential way in which a user should accomplish this:

# I have some function f(x, y, z)
# I want to take the gradient of f with respect to y

const my_cache = ForwardDiffCache()

g_y(x, y, z) = ForwardDiff.gradient(n -> f(x, n, z), y, cache=my_cache)

Obviously, there may be more performant ways of doing the above if you're looping over different ys while your xs or zs are invariant (e.g. construct the closure outside the loop).
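For instance (a sketch with a made-up f; only the closure-hoisting pattern is the point), when x and z are loop-invariant:

```julia
# Hypothetical f; only y varies inside the loop.
f(x, y, z) = sum(x .* y) + sum(z)
x, z = [1.0, 2.0], [3.0]

g = y -> f(x, y, z)   # built once, outside the loop
results = [g(y) for y in ([1.0, 1.0], [2.0, 2.0])]
# ForwardDiff.gradient(g, y) would then differentiate w.r.t. y only
results  # [6.0, 9.0]
```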

If there are still performance concerns with anonymous closures, this issue could be reopened, depending on what we see from benchmarks/profiling examples. In general, however, I feel that effort to fix closure concerns should be focused on improving Base rather than hacking around the problem in ForwardDiff.

@papamarkou
Contributor

I am re-opening this issue, with a justification and explanation found at the bottom of #77.
