Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible second argument to input function #77

Closed
papamarkou opened this issue Dec 10, 2015 · 54 comments
Closed

Possible second argument to input function #77

papamarkou opened this issue Dec 10, 2015 · 54 comments

Comments

@papamarkou
Copy link
Contributor

As I am trying to connect ForwardDiff with Lora, I came to realise that this can work in one of two possible cases, namely when the log-target function has a signature such as logtarget(x::Vector). There is a second case, however, which is perhaps more common and interesting in real applications; in this latter case, the log-target takes a second argument, which may be a cell array (or dictionary, but let's say cell array for now) of auxiliary variables.

The main question is whether we can allow ForwardDiff to work with functions with signature logtarget(x::Vector, y::Vector). In principle, ForwardDiff will carry on working the same way by performing autodiff with respect to x only, i.e. by ignoring any subsequent input argument after the first one. y will only appear in the body of logtarget to pass additional values in the body of log-target, but as far as autodiff goes y will be a constant.

All it takes to add this extra feature is a slight change of API allowing extra arguments that won't change the course of differentiation. Do you think we could possibly deal with this?

@KristofferC
Copy link
Collaborator

Isn't this handled by closures already?

@papamarkou
Copy link
Contributor Author

True but in iterative algorithms where the second argument changes, the closure has to be redefined in each step/iteration, and this is not always the best design for extraneous reasons.

@KristofferC
Copy link
Collaborator

The closure "updates" together with the captured variables. There is no need to redefine it.

@papamarkou
Copy link
Contributor Author

That's good to know. At the same time it is good to allow for extra dummy input arguments (dummy from the autodiff perspective) as it would enable more diverse development. For example I have separated parameters (methods) from states (data), so my software engineering approach is conceptually orthogonal to closures.

@KristofferC
Copy link
Collaborator

I still don't understand why closures doesn't solve this.

If you have a function f(x, data, parameters, dummy, whatever) and want to take the derivative of f w.r.t x you just create f(x) = f(x, data, parameters, dummy, whatever) where the arguments now are actual instantiated variables and pass f to ForwardDiff. This is how all the optimization and solver packages does it. Your method is similar to what you do in c where you pass a void pointer as the second argument which you can cast to whatever you want but closures solves this imo.

@gragusa
Copy link
Contributor

gragusa commented Dec 10, 2015

The issue of allowing additional arguments to function was discussed in #32. It was also concluded that closures do indeed are a solution. A potential worry is that creating many closure --- for instance in MCMC case you might to create a closure for each iteration --- may end up with a big performance penalty. I have done some benchmarking and it seems that creating closures (many closures) is not a big issue as the penalty is an order of magnitude smaller than the cost of obtaining the function being differentiated.

@jrevels
Copy link
Member

jrevels commented Dec 10, 2015

Closing as duplicate of #32.

@scidom Feel free to reopen if you think this needs to be revisited, but I really think closures are the best way to do this. If it helps to approach it a different way, remember that ForwardDiff now supports differentiating callable types.

@jrevels jrevels closed this as completed Dec 10, 2015
@papamarkou
Copy link
Contributor Author

Thanks @jrevels. I wanted to kindly ask if it would be possible to add this as a feature request. I understand that closures is a possible solution, but I want as a programmer and ForwardDiff user to have the alternative option of inputing a function with more than one arguments for two reasons, to avoid any small performance penalty (however small it may be, no penalty is better than small penalty), and most importantly because it may be helpful to provide an alternative usage case/programming paradigm.

If of course the answer is that it is not possible, then it is ok, but wanted to make the effort and ask for this feature.

@papamarkou
Copy link
Contributor Author

I tried the following:

a = rand(2000)
x = rand(2000)

n = 1000000

function f(x)
  x+a
end

function g(x, y)
  x+y
end

@time for i in 1:n
  f(x)
end

@time for i in 1:n
  g(x, a)
end

It seems that actually f is a tiny bit faster than g, i.e. closures are a tiny bit faster, is this possible?

@KristofferC
Copy link
Collaborator

Now you are using a global variable. That is not the same as creating an closure inside another function that captures function local variables.

@jrevels
Copy link
Member

jrevels commented Dec 10, 2015

Consider what it would be like if we DID have this feature - I think the package would actually be worse off. Here are some reasons why:

  • It'd make the API more complicated, and the API is already somewhat complicated.
  • We'd have to write (and maintain) tests that cover all the new behaviors enabled by this feature.
  • A general, maintainable implementation of this feature could very well be slower than just letting the user pass in closures and callable types, and could make our performance model even harder to understand.

@gragusa
Copy link
Contributor

gragusa commented Dec 10, 2015

Try this

n = 1000000
a = rand(n)
x = rand(n)

function g(x, y)
  x+y
end

@time for i in 1:n
  g(x[i], a[i])
end

@time for i in 1:n
  cl(x) = g(x, a[i])
  cl(x[i])
end

@KristofferC
Copy link
Collaborator

You will not get any useful data by benchmarking functions accessing global variables.

@papamarkou
Copy link
Contributor Author

Thanks for the feedback @KristofferC and for the snippet @gragusa.

@jrevels I can see your point about complexity of development. My primary criterion during development tends to be performance. Along these lines, to quantify performance, shall we try to think of a simple benchmark, without using globals as @KristofferC pointed, just to see what is the penalty of closures if any? It is hard for all of us to know the pros and cons without measuring them... I will try to think of a reasonable example, not necessarily related to ForwardDiff.

@gragusa
Copy link
Contributor

gragusa commented Dec 10, 2015

You can always enclose the above into a function. But why accessing global variables should be different for closures?

@papamarkou
Copy link
Contributor Author

@gragusa you mean sth like the example below?

n = 1000000
a = rand(n)
x = rand(n)

function g(x, y)
  x+y
end

function z()
  for i in 1:n
    g(x[i], a[i])
  end
end

function w()
  for i in 1:n
    cl(x) = g(x, a[i])
    cl(x[i])
  end
end

@time z()

@time w()

This still uses globals, plus we redefine the closure in each iteration, so we may want to improve on these.

@gragusa
Copy link
Contributor

gragusa commented Dec 10, 2015

@scidom I was thinking more along the lines of

function f()
    n = 1000000
    a = rand(n)
    x = rand(n)

    function g(x, y)
        x+y
    end

    @time for i in 1:n
        cl(x) = g(x, a[i])
        cl(x[i])
    end

end

This is the worst case scenario --- when closures need to be created at each iteration. Think about MCMC within Gibbs: at each iteration you have to create a closure enclosing the blocks of variables.

@KristofferC
Copy link
Collaborator

Global variables can change type at any time and must therefore be boxed. Local variables can be reasoned about. This does not change when using closures.

Regarding performance. We are already passing in a function as argument to ForwardDiff which has its own overhead so the closure overhead should be amortized. For best performance, create a functor (type which overloads Base.call) and pass that to ForwardDiff.

@papamarkou
Copy link
Contributor Author

Yes, this Gibbs scenario is exactly the situation I am dealing with 👍

@gragusa
Copy link
Contributor

gragusa commented Dec 10, 2015

@KristofferC Yes, I understand the boxing. The fact that "this does not change when using closures" is what I thought --- and so the benchmarking in this case is not misleading because both functions deal with global in the same way.

@papamarkou
Copy link
Contributor Author

@gragusa I tried

function f()
    n = 1000000
    a = rand(n)
    x = rand(n)

    function g(x, y)
        x+y
    end

    @time for i in 1:n
        cl(x) = g(x, a[i])
        cl(x[i])
    end

    @time for i in 1:n
      g(x[i], a[i])
    end
end

f()

and there is massive gap in performance. Performance with Gibbs sampling is even worse than our toy benchmark because each Gibbs iteration requires redefining several closures - pretty much as many as the number of sampled parameters.

@KristofferC can you provide a toy example of what you meant about overloading Base.call()? It doesn't have to be a benchmark, just an example of what you have in mind about functors.

@KristofferC
Copy link
Collaborator

See https://github.com/JuliaLang/julia/blob/master/base/functors.jl

This types could also hold states. When passing these as arguments julia can do inlining and all that jazz.

@papamarkou
Copy link
Contributor Author

Thanks, I also found the following in the documentation:

http://docs.julialang.org/en/release-0.4/manual/methods/#call-overloading-and-function-like-objects

@KristofferC
Copy link
Collaborator

Also what is interesting is not the cost of the closure vs calling a normal function but the cost of passing a closure vs a function + arguments to ForwardDiff. My gut says there will be very little difference. I'm in bed now so can't test it.

@jrevels
Copy link
Member

jrevels commented Dec 10, 2015

Here's a concrete example of making a callable type and using it with ForwardDiff:

julia> immutable Foo{T}
           x::T
           y::T
       end

julia> call(f::Foo, a) = a^(f.x) + (a / f.y)
call (generic function with 1033 methods)

julia> f = Foo(3, 2)
Foo{Int64}(3,2)

julia> f(2.5)
16.875

julia> using ForwardDiff

julia> ForwardDiff.derivative(f, 2.5)
19.25

As @KristofferC said, you can use these types instead of closures. Any time where you would make a new closure to update the values, you can instead make a new instance of one of these types. If your type is mutable, you could just update the fields as you go instead of making new instances (though keeping it immutable could work better if you're storing isbits field types, like numeric primitives).

AFAIK, the plan is for closures to start using this callable type strategy "under the hood" in v0.5 (see JuliaLang/julia#13412).

@papamarkou
Copy link
Contributor Author

That's amazing, I must say. I am feeling a bit puzzled though in two respects; i) in your example f is of type Foo and not of type Function, right? How is it possible then that Forward.derivative(f, 2.5) perceives f as a generic function instead of Foo?! ii) Jeff mentioned in the issue you forwarded that the call function will be deprecated; in this case, would your nice example remain applicable? Bed time for me too, but will see your reply tomorrow.

@papamarkou
Copy link
Contributor Author

Actually, I found the answer to the second question, I think call will be replaced by (f::Foo)(a) = a^(f.x) + (a / f.y) in our example...

@papamarkou
Copy link
Contributor Author

I am trying to understand how this would work in practice, if I would need to introduce a new log-target type and make it callable, too tired to think clearly now, tomorrow again...

@papamarkou
Copy link
Contributor Author

I wanted to add 2 more thoughts. Firstly, the functors are a great idea but (as any tool) they are not the proper means for every situation. More specifically, consider the case of having a functor f with field f.y for holding the field that would otherwise be enclosed in a closure, and let's say we call f(x), where f in its body uses f.y. This is all fine. But once we consider a second functor g with the same field g.y and call g(x) this becomes problematic if we would want f.y and g.y to point to a common vector (because f and g have their own copies of y).

The second remark relates to the comment made by @KristofferC; I wrote a simple example for which in one case we pass a closure f whereas in another case we pass a function g plus an argument a. If the closure doesn't need to be redefined, then there seems to be no difference in performance:

function mytest()
  len = 2000

  a = rand(len)
  x = rand(len)

  n = 1000000

  function f(x)
    x+a
  end

  function g(x, y)
    x+y
  end

  function fwrapper(f, x)
    f(x)
  end

  function gwrapper(g, x, y)
    g(x, y)
  end

  @time for i in 1:n
    fwrapper(f, x)
  end

  @time for i in 1:n
    gwrapper(g, x, a)
  end
end

mytest()

The only problem of course is development-related, i.e. having one version of user-defined function g(x, a) and then another internal version of it calling the relevant closure h(x). If this needs to be done for many functions, I need to maintain double copies for each function to communicate with the user, one external g(x, a) that doesn't enclose a (because as far as the API goes I do need to have a version of g that is not a closure so that l can create the closure later on, once a has been defined) and an internal that provides the respective closure. This is my glue code solution to proceed, with some extra memory penalty of course... but that's the only way I can reconcile the restriction posed by only allowing closures with ForwardDiff in the context of a general Gibbs scheme.

@KristofferC
Copy link
Collaborator

There is no reason why you couldn't make two separate instances of a functor hold references to the same vector.

Regarding recreating the closure, why not do something like this:

function f()
    n = 1000000
    a = rand(n)
    x = rand(n)


    function g(x, y)
        x+y
    end

    @time for i in 1:n
        cl(x) = g(x, a[i])
        cl(x[i])
    end

    a_val = 0.0
    cl2(x) = g(x, a_val)
    @time for i in 1:n
        a_val = a[i]
        cl2(x[i])
    end

    @time for i in 1:n
      g(x[i], a[i])
    end
end

Since the captured variable updates there is no need to recreate the closure unless you will change the type of captured variable in every iteration (seems unlikely).

@papamarkou
Copy link
Contributor Author

I agree @KristofferC, there is no need to redefine the closure. The only difficulty is that I want to be able to define method fields in a parameter type independently of the enclosed value, which means that I need to delay the (one and only) definition of the closure until I know the delayed value.

About functors, you are actually right. For example the following works (therefore I will consider more seriously the possibility of using functors):

immutable Foo1{T}
  y::Vector{T}
end

immutable Foo2{T}
  y::Vector{T}
end

y = [2., 3.]

f1 = Foo1(y)
f2 = Foo2(y)

call(f::Foo1, a) = a*f.y
call(f::Foo2, a) = a+f.y

f1(2.)
f2(2.)

y[2] = 30.

f1(2.)
f2(2.)

@papamarkou
Copy link
Contributor Author

P.S. I find the functor solution quite ingenious the more I think about it.

@KristofferC
Copy link
Collaborator

Note that in the (not so far away) future, the plan for julia is for closures to basically lower into their functor equivalent version making them just as performant.

@papamarkou
Copy link
Contributor Author

This is great. I find functors more handy than closures in Lora because they are more low level exactly, which makes dev easier.

@papamarkou
Copy link
Contributor Author

After more thinking and an attempt to implement the idea, it turns out functors won't work unfortunately for the application at hand. The problem is that the definition call(f::Foo, x) is Foo-specific and not f-specific, meaning that the overloading of call is universal for Foo, whereas I need different calls for different instances f of Foo. It seems after all that closures are the only way out probably.

@papamarkou
Copy link
Contributor Author

After having spent a good deal of time thinking over this problem and after having tried to tackle it in practice, I would like to share my thoughts and to make a case for adding the extra feature of a wrt optional argument.

I am convinced by now that there is no performance penalty when using closures, so performance is not a concern anymore. The reason I think the wrt feature is important is programming-related only. Alternative solutions have been proposed but none of them truly resolves the development scenario I am dealing with.

Functors are indeed a great solution but the main issue, as explained above, is that the definition of call() is made on the basis of the underlying functor, whereas in my case conceptually the call() function varies between instances of the functor.

Closures are great, but in a larger scale project of a "real-life" application they can become impractical. For instance, if we were to ask the user to pass just one closure to gradient() this would make sense. However, in many cases this is impractical because i) many closures may be involved and ii) most importantly the additional argument enclosed by the closure is not naturally known to the user, or at least not early in model specification, so the API for a model would become obfuscated and not transparent by asking the user to create a dummy variable with a specific name (so that it can later by modified before calling the closure).

The bottom line is that there is no performance consideration, but there are important use cases where it becomes less natural and more programmatically impractical to request the user to define a closure.

Along these lines, I want to kindly re-open issue #32 and consider adding the wrt option. This will not make ForwardDiff more complicated for the user, because wrt can have a default value; it will heve some impact only on maintenance of ForwardDiff but I think it is worth the effort in order to provide this extra feature that can prove to be invaluable in practice when using ForwardDiff inside other prackages.

@mlubin
Copy link
Contributor

mlubin commented Dec 12, 2015

the call() function varies between instances of the functor

@scidom, could you explain what you mean by this?

@papamarkou
Copy link
Contributor Author

I will try to explain using a fictitious toy example:

immutable LogPrior
  y::Vector
end

call(lp::LogPrior, x::Vector{Float64}) = x+lp.y

y = [1., 2.5]
logprior = LogPrior(y)

logprior([2., 5.])

This works fine, but the problem is that I may want to define more than one priors. Since multiple dispatch for call(::LogPrior, ::Vector{Float64}) for the functor LogPrior can be exploited only once, I can not define, say

call(logprior::LogPrior, x::Vector{Float64}) = x-0.5y

for another prior...

@papamarkou
Copy link
Contributor Author

I could code-generate many functors for different priors with @gensym LogPrior names but not sure if this is a reasonable coding approach.

@mlubin
Copy link
Contributor

mlubin commented Dec 12, 2015

What would the wrt form look like?

@mlubin
Copy link
Contributor

mlubin commented Dec 12, 2015

I'm not sure how the issue is any different in that case

@papamarkou
Copy link
Contributor Author

I was thinking that if we have a function say f(x, y) and we can call

ForwardDiff.gradient(f, wrt=:x)

or perhaps

ForwardDiff.gradient(f, wrt=1)

then this may allow passing a function f with more than one arguments. I am not sure if this is the best solution, just trying to think of a way of maintaining a reasonable API and interface between ForwardDiff and Lora.

@papamarkou
Copy link
Contributor Author

To explain my problem, if I ask the user to provide a closure f(x) with a mangled name _y inside the body of the closure for the enclosed value, and also define a dummy _y = sth wouldn't that be ugly on the Lora end?

@papamarkou
Copy link
Contributor Author

The other solution is that I ask the user to provide f(x, y) and I codegen the closure f(x) internally which is possible.

@mlubin
Copy link
Contributor

mlubin commented Dec 12, 2015

The functor object can just store the user-provided function and the arguments.

@KristofferC
Copy link
Collaborator

All the solver packages, and optimization packages seems to do just fine with only closures so I am sure there should be some way to solve this.

Regarding extra arguments in ForwardDiff. My first thought is that allowing 1 extra argument is completely arbitrary. What if you want to pass multiple extra arguments? The only way you can do that in a type stable way is to write a tailor made type for your specific function. This means that you suddenly impose on the user to always write their functions as f(x, extra_parameters), instead of letting them write them however they want and just ask them to pass the closure to the library. They also have to deal with the extra boilerplate of defining a new type for each function.

The functor object can just store the user-provided function and the arguments.

Isn't that pretty much exactly what a closure is?

@papamarkou
Copy link
Contributor Author

@KristofferC, I agree, one would need to generalize the way wrt would operate, the above was just a first naive example.

@mlubin your solution works for me, it is really great, thank you:

immutable LogPrior
  f::Function
  y::Vector
end

call(lp::LogPrior, x::Vector{Float64}) = lp.f(x, lp.y)

y = [1., 2.5]
logprior = LogPrior((x, y) -> x+y, y)

logprior([2., 5.])

I will then close #32 and will use the proposed solution (and yes, @KristofferC, I presume this functor definition is pretty much conceptually equivalent to a closure).

@papamarkou
Copy link
Contributor Author

One more thing, the reason this lower-level functor approach works better for me than just a closure is because I can then define the constructor (using pseudocode below)

LogPrior(f::Function, n::Int) = LogPrior(f, Array(MyLoraType, n))

so the user doesn't need to define the dummy variable y anymore, nor I need to worry about reserving a name for it.

@papamarkou
Copy link
Contributor Author

I may be doing sth wrong (highly likely due to being late), but I get an error when I try to use ForwardDiff with the above solution:

import ForwardDiff

immutable LogPrior
  f::Function
  y::Vector
end

call(lp::LogPrior, x::Vector) = lp.f(x, lp.y)

function f(x, y)
  x+y
end
y = [1., 2.5]
logprior = LogPrior(f, y)

logprior([2., 5.])

g = ForwardDiff.gradient(logprior)

g([2., 5.])

Although g is defined via ForwardDiff without issues, calling it throws

ERROR: MethodError: `convert` has no method matching convert(::Type{ForwardDiff.GradientNumber{2,Float64,Tuple{Float64,Float64}}}, ::Array{ForwardDiff.GradientNumber{2,Real,Tuple{Real,Real}},1})
This may have arisen from a call to the constructor ForwardDiff.GradientNumber{2,Float64,Tuple{Float64,Float64}}(...),
since type constructors fall back to convert methods.
Closest candidates are:
  ForwardDiff.GradientNumber{N,T,C}(::Any, ::Any)
  call{T}(::Type{T}, ::Any)
  convert{T<:Real}(::Type{T<:Real}, ::Complex{T<:Real})
  ...
 in _calc_gradient at /Users/theodore/.julia/v0.4/ForwardDiff/src/api/gradient.jl:86
 in g at /Users/theodore/.julia/v0.4/ForwardDiff/src/api/gradient.jl:54

@misun6312
Copy link

So with current version how can I use additional arguments to function being differentiated?

@KristofferC
Copy link
Collaborator

Same as before. Pass a closure as your function to be differentiated where it closes over your parameters.

@misun6312
Copy link

With Call overloading?

Okay I tried like this.. it's working.
I will apply this to my code. Thanks!

immutable LogLike
  f::Function
  y::Vector
end

call(ll::LogLike, params1::Vector) = ll.f(params1, ll.y)

function f(params1, y)
    dot(params1, y).^2+1
end
y = [1., 3, 10.]
LL = LogLike(f, y)

param = [2., 5., 1.]

LL(param)

res = GradientResult(param)
g = ForwardDiff.gradient!(res,LL,param)

ForwardDiff.gradient(res)

@KristofferC
Copy link
Collaborator

Call overloading is not needed anymore for performance. It is easier and more flexible to create a closure. Read through this thread and you will find examples. I am on mobile so can't post a full code example.

@misun6312
Copy link

misun6312 commented Jul 10, 2016

Oh.. you mean like this?

import ForwardDiff
using ForwardDiff


param = [2., 5., 1.]
    y = [1., 3, 10.]



function test(param, y)
    function g(params1)
        dot(params1, y).^2+1
    end
    LL = g(param)

    res = GradientResult(param)

    ForwardDiff.gradient!(res,g,param)
    ForwardDiff.gradient(res)    
end

test(param,y)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants