
makeGradients wrapper to simplify common use case #91

Closed
wants to merge 2 commits

Conversation

cscherrer

The most common "big win" case for reverse-mode autodiff is a function from a real vector to a real scalar. But compared to ForwardDiff.jl, ReverseDiff.jl carries some boilerplate and cognitive overhead, because the tape needs to be preallocated.

A simple way around this is to build a wrapper makeGradients like so:

using ReverseDiff: GradientTape, compile, gradient!
using DiffBase

makeGradients(f, x0) = begin
  # record the tape once, then compile it for fast replay
  f_tape = GradientTape(f, x0)
  compiled_f_tape = compile(f_tape)
  # preallocate output buffers so repeated calls don't allocate
  g = similar(x0)
  yg = DiffBase.GradientResult(g)
  ∇f!(x)  = gradient!(g, compiled_f_tape, x)   # gradient only
  f∇f!(x) = gradient!(yg, compiled_f_tape, x)  # value and gradient in one pass
  return ∇f!, f∇f!, g, yg
end

This seems to me to arrive at the best of both worlds: it's very simple to use, and the computation is efficient and allocation-free. It returns both a function that computes the gradient and a function that computes the value and gradient together in a single pass. Many algorithms need both of these to be truly efficient, and the wrapper makes building them very simple.
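
For example, usage looks like this (the quadratic f below is just a placeholder objective):

f(x) = sum(abs2, x)    # placeholder objective, purely for illustration

x0 = rand(4)
∇f!, f∇f!, g, yg = makeGradients(f, x0)

∇f!(x0)                # fills g in place with ∇f(x0) and returns it
res = f∇f!(x0)         # fills yg with the value and gradient together
DiffBase.value(res)    # f(x0)
DiffBase.gradient(res) # ∇f(x0), here 2 .* x0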

In principle, this could be extended to inputs other than vectors, as well as to computations other than function values and gradients.
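
For instance, a Jacobian analogue might look like the sketch below. Everything here is hypothetical rather than part of this PR: makeJacobians is a made-up name, and it calls f once up front just to size the output buffer.

using ReverseDiff: JacobianTape, jacobian!

# hypothetical analogue of makeGradients for vector-valued f
makeJacobians(f, x0) = begin
  tape = compile(JacobianTape(f, x0))
  J = similar(x0, length(f(x0)), length(x0))  # preallocated Jacobian buffer
  Jf!(x) = jacobian!(J, tape, x)
  return Jf!, J
end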

@jrevels (Member) commented Sep 1, 2017

Hi, thanks for the contribution! This is certainly a correct usage of the ReverseDiff/DiffBase API, and is a pattern that I expect many folks use.

However, not including such a function as part of ReverseDiff's API was intentional. This function makes a lot of choices that I'd rather force users to make themselves. Here are some examples:

  • f may not be amenable to tape preallocation/compilation
  • The returned functions aren't generic w.r.t. input/output type/shape, even though they might look generic to unaware downstream code
  • This approach isn't thread-safe. There are many different routes to making it thread-safe, but picking the best strategy for thread-safety will heavily depend on the use case (so it's not a decision ReverseDiff should be making for the user).

Thanks again for the work, though! I do believe this could be useful as an example, if you'd like to add it to the gradient examples file.

@cscherrer (Author)

Hi Jarrett,

Thanks for the feedback. I was aware of the first two concerns but hadn't considered thread safety. I suppose a way to do that would be to allocate a tape for each thread, or possibly one for each processor if it can be guaranteed that gradient evaluation is not interrupted. Interesting...
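
Something like the sketch below, maybe. This is purely illustrative (makeThreadedGradients is a made-up name), and it assumes each gradient call runs entirely on a single thread:

using Base.Threads: nthreads, threadid

# sketch only: one compiled tape and one output buffer per thread
makeThreadedGradients(f, x0) = begin
  tapes = [compile(GradientTape(f, copy(x0))) for _ in 1:nthreads()]
  gs    = [similar(x0) for _ in 1:nthreads()]
  ∇f!(x) = gradient!(gs[threadid()], tapes[threadid()], x)  # thread-local state
  return ∇f!, gs
end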

I like your suggestion of moving this to the examples file - I'll update my branch accordingly.

cscherrer closed this Sep 6, 2017