
makeGradients wrapper to simplify common use case #91

Closed
wants to merge 2 commits

Conversation

cscherrer

The most common "big win" case for reverse-mode autodiff is a function from a real vector to a real scalar. But compared to ForwardDiff.jl, ReverseDiff.jl carries some boilerplate and cognitive overhead, because the tape needs to be preallocated.

A simple way around this is to build a wrapper makeGradients like so:

using ReverseDiff: GradientTape, compile, gradient!
using DiffBase

makeGradients(f, x0) = begin
  # record the tape once, then compile it for fast replay
  f_tape = GradientTape(f, x0)
  compiled_f_tape = compile(f_tape)
  # preallocate output buffers so repeated calls don't allocate
  g = similar(x0)
  yg = DiffBase.GradientResult(g)
  ∇f!(x)  = gradient!(g, compiled_f_tape, x)   # gradient only
  f∇f!(x) = gradient!(yg, compiled_f_tape, x)  # value and gradient in one pass
  return ∇f!, f∇f!, g, yg
end

This seems to me to arrive at the best of both worlds: it's very simple to use, and the computation is efficient and allocation-free. It returns both a function that computes the gradient and a function that computes the value and gradient together in a single pass. Many algorithms need both of these to be truly efficient, and the wrapper makes building them very simple.
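
For example, usage looks like this (the quadratic f below is just a placeholder objective):

f(x) = sum(abs2, x)    # placeholder objective, purely for illustration

x0 = rand(4)
∇f!, f∇f!, g, yg = makeGradients(f, x0)

∇f!(x0)                # fills g in place with ∇f(x0) and returns it
res = f∇f!(x0)         # fills yg with the value and gradient together
DiffBase.value(res)    # f(x0)
DiffBase.gradient(res) # ∇f(x0), here 2 .* x0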

In principle, this could be extended to inputs other than vectors, as well as to computations other than function values and gradients.
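
For instance, a Jacobian analogue might look like the sketch below. Everything here is hypothetical rather than part of this PR: makeJacobians is a made-up name, and it calls f once up front just to size the output buffer.

using ReverseDiff: JacobianTape, jacobian!

# hypothetical analogue of makeGradients for vector-valued f
makeJacobians(f, x0) = begin
  tape = compile(JacobianTape(f, x0))
  J = similar(x0, length(f(x0)), length(x0))  # preallocated Jacobian buffer
  Jf!(x) = jacobian!(J, tape, x)
  return Jf!, J
end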

@jrevels (Member) commented Sep 1, 2017

Hi, thanks for the contribution! This is certainly a correct usage of the ReverseDiff/DiffBase API, and is a pattern that I expect many folks use.

However, not including such a function as part of ReverseDiff's API was intentional. This function makes a lot of choices that I'd rather force users to make themselves. Here are some examples:

  • f may not be amenable to tape preallocation/compilation
  • The returned functions aren't generic w.r.t. input/output type/shape, even though they might look generic to unaware downstream code
  • This approach isn't thread-safe. There are many different routes to making it thread-safe, but picking the best strategy for thread-safety will heavily depend on the use case (so it's not a decision ReverseDiff should be making for the user).

Thanks again for the work, though! I do believe this could be useful as an example, if you'd like to add it to the gradient examples file.

@cscherrer (Author)

Hi Jarrett,

Thanks for the feedback. I was aware of the first two concerns but hadn't considered thread safety. I suppose a way to do that would be to allocate a tape for each thread, or possibly one for each processor if it can be guaranteed that gradient evaluation is not interrupted. Interesting...
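
Something like the sketch below, maybe. This is purely illustrative (makeThreadedGradients is a made-up name), and it assumes each gradient call runs entirely on a single thread:

using Base.Threads: nthreads, threadid

# sketch only: one compiled tape and one output buffer per thread
makeThreadedGradients(f, x0) = begin
  tapes = [compile(GradientTape(f, copy(x0))) for _ in 1:nthreads()]
  gs    = [similar(x0) for _ in 1:nthreads()]
  ∇f!(x) = gradient!(gs[threadid()], tapes[threadid()], x)  # thread-local state
  return ∇f!, gs
end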

I like your suggestion of moving this to the examples file - I'll update my branch accordingly.

cscherrer closed this Sep 6, 2017