
Implement backend for ReverseDiff #29

Merged
merged 15 commits into JuliaDiff:master on Jan 26, 2022

Conversation

@sethaxen (Member) commented Jan 9, 2022

This PR will implement the backend for ReverseDiff. I'll also attempt to support optional tape compilation, at least when pullback_function or value_and_pullback_function is called.

@codecov-commenter commented Jan 9, 2022

Codecov Report

Merging #29 (29323b0) into master (979fa04) will increase coverage by 1.85%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #29      +/-   ##
==========================================
+ Coverage   80.05%   81.90%   +1.85%     
==========================================
  Files           2        3       +1     
  Lines         396      431      +35     
==========================================
+ Hits          317      353      +36     
+ Misses         79       78       -1     
Impacted Files                   Coverage Δ
src/forwarddiff.jl               100.00% <ø> (+2.63%) ⬆️
src/AbstractDifferentiation.jl    78.39% <100.00%> (+0.18%) ⬆️
src/reversediff.jl               100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

ReverseDiff.deriv!.(y_tracked, ws...)
ReverseDiff.reverse_pass!(tape)
return ReverseDiff.deriv.(x_tracked)
end
Contributor

Is this the recommended form? In DiffEqSensitivity we use a slightly different set of instructions, see
https://github.com/SciML/DiffEqSensitivity.jl/blob/8601419ad910455ef8842597a8a13c615768477f/src/derivative_wrappers.jl#L406
(including unseed!, input_hook, output_hook)

Member Author

Perhaps; I wrote this a few months back and don't recall the reasons. Thanks for the reference, I'll check.

Member Author

ReverseDiff is a little unclear about what constitutes its API: there's its documented API, and the api source directory. Some functions are declared in the latter but not documented (e.g. input_hook and output_hook), so they may be part of the API, while unseed!, for example, does not appear to be part of the API at all. Can you comment on why non-API functions were used in DiffEqSensitivity?

Contributor

I am not sure. Probably to support some more general cases than seeded_reverse_pass! does? CC @ChrisRackauckas

Member

It's to make it non-allocating and more generic. I got those hooks directly from Jarrett.

Member Author

Non-allocating would only be relevant for mutating f!, right? Currently I don't think mutating f! is supported by the AD API.

@sethaxen (Member Author)

RE tape compilation, I can think of 3 ways to do it, each with their own level of support. Note that tape compilation requires that the function and inputs be available.

First, a user can provide the function f and inputs directly to a CompiledReverseDiffBackend object. If the same function and inputs of the same type are encountered, then the compiled tape is used. This is a bad idea.

Second, a user can specify a compiled type parameter. This is only used by value_and_pullback_function and pullback_function, where the tape is compiled and then reused for all executions of the returned function.

Third, as Turing does, the compiled tape is memoized with Memoization.jl based on the function and the size and type of the inputs. I think this would also allow gradient, jacobian, etc. to use compiled tapes. But it would require conditional loading of Memoization. I am also not certain whether it would work for the anonymous functions used internally in AbstractDifferentiation.jl.
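For concreteness, here is a rough sketch of the second option (all names below are hypothetical stand-ins, not code from this PR): the backend carries a compiled type parameter, and pullback_function records the tape once, optionally compiles it, and reuses it for every call of the returned closure.

using ReverseDiff

struct ReverseDiffBackendSketch{compiled} end  # hypothetical stand-in for the real backend type

function pullback_function_sketch(::ReverseDiffBackendSketch{compiled}, f, x::AbstractVector) where {compiled}
    # Record the tape once; compile it if the type parameter asks for it.
    tape = ReverseDiff.JacobianTape(f, x)
    tape = compiled ? ReverseDiff.compile(tape) : tape
    return function (w)
        # Every call reuses the (possibly compiled) tape instead of re-recording f.
        J = ReverseDiff.jacobian!(tape, x)
        return (J' * w,)
    end
end

# Usage: the tape is recorded and compiled once, then reused across pullback calls.
pb = pullback_function_sketch(ReverseDiffBackendSketch{true}(), x -> sin.(x), randn(3))
pb(ones(3))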

@sethaxen (Member Author)

@devmotion do you have other ideas regarding supporting tape compilation?

@devmotion (Member)

I'm not sure. I've seen some strange issues with Memoization (e.g. TuringLang/Turing.jl#1418); in general it feels a bit obscure, like hiding something that maybe should just be more explicit.

Is it possible to add AD-specific keyword arguments to the API? Then another alternative could be to provide a user-facing function that returns a compiled tape (basically ReverseDiff.compile(ReverseDiff.Gradient/Jacobian/HessianTape(f, ...)): https://juliadiff.org/ReverseDiff.jl/api/#the-abstracttape-api), possibly by dispatching on the desired call (gradient, jacobian, value_and_gradient, etc.) and without exposing the ReverseDiff functions. Users could then compile a tape if they want to and reuse it by passing it when calling, e.g. gradient(::ReverseDiffBackend, f, x; tape = mytape).
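A rough sketch of that keyword-argument idea, under the assumption that such a user-facing helper is added (compile_gradient_tape and gradient_sketch are made-up names, not the package's API):

using ReverseDiff

# Hypothetical helper: build and compile a ReverseDiff tape for a specific f and input shape.
compile_gradient_tape(f, x::AbstractVector) = ReverseDiff.compile(ReverseDiff.GradientTape(f, x))

# Hypothetical gradient method: use the supplied tape if given, otherwise plain ReverseDiff.
function gradient_sketch(f, x::AbstractVector; tape = nothing)
    tape === nothing && return (ReverseDiff.gradient(f, x),)
    return (ReverseDiff.gradient!(tape, x),)
end

# Compile once, then reuse the tape across many gradient evaluations.
mytape = compile_gradient_tape(x -> sum(sin, x), randn(10))
gradient_sketch(x -> sum(sin, x), randn(10); tape = mytape)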

@mohamed82008 (Member) commented Jan 20, 2022

I am a fan of option 1. We can provide an additional function compile which takes a regular backend, the function, and sample inputs, and returns a compiled backend. Memoization is a hack that makes life easier in Turing, but I don't think we should use that here. Let's expose the functionality of each package without too much magic. If the user tries to use a compiled tape with a different function, we can give a warning, re-compile the new function, and then store its compiled tape in the mutable backend struct.
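Roughly what that could look like (a sketch only, with hypothetical names; not this PR's code): the compiled backend stores the function it was compiled for along with its tape, and warns and recompiles when called with a different function.

using ReverseDiff

# Hypothetical mutable backend holding the function it was compiled for and its compiled tape.
mutable struct CompiledRDBackendSketch
    f::Any
    tape::Any
end

# Hypothetical compile: take the function and sample inputs, return a compiled backend.
compile_sketch(f, x::AbstractVector) =
    CompiledRDBackendSketch(f, ReverseDiff.compile(ReverseDiff.GradientTape(f, x)))

function backend_gradient_sketch(ba::CompiledRDBackendSketch, f, x::AbstractVector)
    if f !== ba.f
        @warn "Tape was compiled for a different function; recompiling."
        ba.f = f
        ba.tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x))
    end
    return (ReverseDiff.gradient!(ba.tape, x),)
end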

@sethaxen (Member Author)

@mohamed82008, perhaps the user can just construct the object without providing a compiled tape. Then AD calls compile on the first tape it needs. This avoids the case where what a user wants is the pullback of f, so they compile the tape for f, but that tape is always discarded because AD actually needs the tape of the anonymous function it writes internally:

return gradient(lowest(ab), (xs...,) -> begin
    vs = f(xs...)
    if ws isa Tuple
        @assert length(vs) == length(ws)
        return sum(Base.splat(_dot), zip(ws, vs))
    else
        return _dot(vs, ws)
    end
end, xs...)

@mohamed82008 (Member) commented Jan 20, 2022

Hmm, I think we may need to define most of the value_and_xyz functions for the compiled RD backend since that might be easier than trying to repurpose the current fallbacks to use the tape correctly instead of the function. (I am not sure though, they might just work. Will need to implement and test it to see.)

@sethaxen (Member Author)

> If the user tries to use a compiled tape with a different function, we can give a warning, re-compile the new function, and then store its compiled tape in the mutable backend struct.

One problem with this is that for the backend to be able to store a new compiled tape, it cannot be specialized on the function. This causes e.g. the computation of the Jacobian to be type-unstable.

@mohamed82008 (Member)

> This causes e.g. the computation of the Jacobian to be type-unstable.

One idea is to store an example output in the backend and use type assertions ::typeof(backend.output) to restore type information.
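A small illustration of that suggestion (hypothetical names, only a sketch): because the mutable backend's fields cannot be concretely typed if the tape may be replaced, a stored example output plus a type assertion can recover inferability at the call site.

using ReverseDiff

# Hypothetical mutable backend: abstractly typed fields so the tape can be swapped out,
# plus an example output stored purely for its type.
mutable struct MutableRDBackendSketch
    tape::Any
    output::Any
end

function gradient_typed_sketch(ba::MutableRDBackendSketch, x::AbstractVector)
    # Without the assertion the result type is not inferable, since ba.output is untyped.
    return ReverseDiff.gradient!(ba.tape, x)::typeof(ba.output)
end

# Usage: store the compiled tape together with an example gradient of the matching type.
x0 = randn(5)
ba = MutableRDBackendSketch(ReverseDiff.compile(ReverseDiff.GradientTape(x -> sum(abs2, x), x0)), similar(x0))
gradient_typed_sketch(ba, x0)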

@mohamed82008 (Member)

Alternatively, we can warn and fall back on the uncompiled tape behaviour, asking the user to recompile a different tape.

@sethaxen (Member Author)

> One idea is to store an example output in the backend and use type assertions ::typeof(backend.output) to restore type information.

Then we have the same issue with type stability for the output, since for the same backend to support jacobians, gradients, etc., it has to be able to store outputs of different types.

> Alternatively, we can warn and fall back on the uncompiled tape behaviour, asking the user to recompile a different tape.

This could work if we only support gradient, jacobian, and hessian, but for other functions, where we AD through an anonymous function of our own creation, the user can't comply.

@sethaxen (Member Author)

Either way, since it seems a ReverseDiff backend with a compiled tape will use a different struct than one without it, we can address tape compilation in a future PR.

Now that we have an implementation with a jacobian primitive that passes the tests, I'll benchmark adding various specific overloads (e.g. using ReverseDiff.gradient) as done in #20.

@sethaxen (Member Author)

I ran a benchmark (derived from the one in #20), which led me to add many of the same overloads as in #20. This PR is now ready for review.

Benchmark:

using AbstractDifferentiation: AbstractDifferentiation, AD, @primitive, AbstractBackend
import AbstractDifferentiation: primal_value, jacobian, derivative, gradient, hessian, value_and_gradient, value_and_hessian
using ReverseDiff: ReverseDiff, DiffResults
using BenchmarkTools, PrettyTables

f(x) = sum(sin, x)
g(x) = sin.(x)

x = randn(10)
Δx = randn(10)
Δy = randn(10)

@inline asarray(x) = [x]
@inline asarray(x::AbstractArray) = x

primal_value(x::ReverseDiff.TrackedReal) = ReverseDiff.value(x)

# just implement the primitive

struct ReverseDiffBackend <: AbstractBackend end

@primitive function jacobian(ba::ReverseDiffBackend, f, xs...)
    xs_arr = map(asarray, xs)
    tape = ReverseDiff.JacobianTape(xs_arr) do (xs_arr...)
        xs_new = map(xs, xs_arr) do x, x_arr
            return x isa Number ? only(x_arr) : x_arr
        end
        return asarray(f(xs_new...))
    end
    results = ReverseDiff.jacobian!(tape, xs_arr)
    return map(xs, results) do x, result
        return x isa Number ? vec(result) : result
    end
end


# above plus a few obvious optimizations
struct ReverseDiffBackendPartial <: AbstractBackend end

@primitive function jacobian(ba::ReverseDiffBackendPartial, f, xs...)
    xs_arr = map(asarray, xs)
    tape = ReverseDiff.JacobianTape(xs_arr) do (xs_arr...)
        xs_new = map(xs, xs_arr) do x, x_arr
            return x isa Number ? only(x_arr) : x_arr
        end
        return asarray(f(xs_new...))
    end
    results = ReverseDiff.jacobian!(tape, xs_arr)
    return map(xs, results) do x, result
        return x isa Number ? vec(result) : result
    end
end
function jacobian(ba::ReverseDiffBackendPartial, f, xs::AbstractArray...)
    return ReverseDiff.jacobian(asarray ∘ f, xs)
end

function derivative(ba::ReverseDiffBackendPartial, f, xs::Number...)
    tape = ReverseDiff.InstructionTape()
    xs_tracked = ReverseDiff.TrackedReal.(xs, zero.(xs), Ref(tape))
    y_tracked = f(xs_tracked...)
    ReverseDiff.seed!(y_tracked)
    ReverseDiff.reverse_pass!(tape)
    return ReverseDiff.deriv.(xs_tracked)
end

function gradient(ba::ReverseDiffBackendPartial, f, xs::AbstractArray...)
    return ReverseDiff.gradient(f, xs)
end

function hessian(ba::ReverseDiffBackendPartial, f, x::AbstractArray)
    return (ReverseDiff.hessian(f, x),)
end


# above plus overloading some value_and_XXX functions
struct ReverseDiffBackendAll <: AbstractBackend end

@primitive function jacobian(ba::ReverseDiffBackendAll, f, xs...)
    xs_arr = map(asarray, xs)
    tape = ReverseDiff.JacobianTape(xs_arr) do (xs_arr...)
        xs_new = map(xs, xs_arr) do x, x_arr
            return x isa Number ? only(x_arr) : x_arr
        end
        return asarray(f(xs_new...))
    end
    results = ReverseDiff.jacobian!(tape, xs_arr)
    return map(xs, results) do x, result
        return x isa Number ? vec(result) : result
    end
end
function jacobian(ba::ReverseDiffBackendAll, f, xs::AbstractArray...)
    return ReverseDiff.jacobian(asarray ∘ f, xs)
end

function derivative(ba::ReverseDiffBackendAll, f, xs::Number...)
    tape = ReverseDiff.InstructionTape()
    xs_tracked = ReverseDiff.TrackedReal.(xs, zero.(xs), Ref(tape))
    y_tracked = f(xs_tracked...)
    ReverseDiff.seed!(y_tracked)
    ReverseDiff.reverse_pass!(tape)
    return ReverseDiff.deriv.(xs_tracked)
end

function gradient(ba::ReverseDiffBackendAll, f, xs::AbstractArray...)
    return ReverseDiff.gradient(f, xs)
end

function hessian(ba::ReverseDiffBackendAll, f, x::AbstractArray)
    return (ReverseDiff.hessian(f, x),)
end

function value_and_gradient(ba::ReverseDiffBackendAll, f, x::AbstractArray)
    result = DiffResults.GradientResult(x)
    cfg = ReverseDiff.GradientConfig(x)
    ReverseDiff.gradient!(result, f, x, cfg)
    return DiffResults.value(result), (DiffResults.derivative(result),)
end

function value_and_hessian(ba::ReverseDiffBackendAll, f, x)
    result = DiffResults.HessianResult(x)
    cfg = ReverseDiff.HessianConfig(result, x)
    ReverseDiff.hessian!(result, f, x, cfg)
    return DiffResults.value(result), (DiffResults.hessian(result),)
end

# benchmark
suite = BenchmarkGroup()

approaches = [
    "jacobian" => ReverseDiffBackend(),
    "jacobian partial" => ReverseDiffBackendPartial(),
    "jacobian all" => ReverseDiffBackendAll(),
]
for (name, b) in approaches
    bg = BenchmarkGroup()
    bg["derivative(b,f,::Number)"] = @benchmarkable $(AD.derivative)($b, sin, 0.5)
    bg["gradient(b,f,x)"] = @benchmarkable $(AD.gradient)($b, $f, $x)
    bg["jacobian(b,f,x)"] = @benchmarkable $(AD.jacobian)($b, $f, $x)
    bg["jacobian(b,g,x)"] = @benchmarkable $(AD.jacobian)($b, $g, $x)
    bg["jacobian(b,f,::Number)"] = @benchmarkable $(AD.jacobian)($b, sin, 0.5)
    bg["hessian"] = @benchmarkable $(AD.hessian)($b, $f, $x)
    bg["value_and_gradient"] = @benchmarkable $(AD.value_and_gradient)($b, $f, $x)
    bg["value_and_jacobian(b,f,x)"] = @benchmarkable $(AD.value_and_jacobian)($b, $f, $x)
    bg["value_and_jacobian(b,g,x)"] = @benchmarkable $(AD.value_and_jacobian)($b, $g, $x)
    bg["value_and_jacobian(b,f,::Number)"] = @benchmarkable $(AD.value_and_jacobian)($b, sin, 0.5)
    bg["value_and_hessian"] = @benchmarkable $(AD.value_and_hessian)($b, $f, $x)
    bg["value_gradient_and_hessian"] = @benchmarkable $(AD.value_gradient_and_hessian)($b, $f, $x)
    pushforward = AD.pushforward_function(b, g, x)
    bg["pushforward"] = @benchmarkable $pushforward($((Δx,)))
    value_and_pushforward = AD.value_and_pushforward_function(b, g, x)
    bg["value_and_pushforward"] = @benchmarkable $value_and_pushforward($((Δx,)))
    pullback = AD.pullback_function(b, g, x)
    bg["pullback"] = @benchmarkable $pullback($Δy)
    value_and_pullback = AD.value_and_pullback_function(b, g, x)
    bg["value_and_pullback"] = @benchmarkable $value_and_pullback($Δy)
    suite[name] = bg
end

tune!(suite; verbose = true)
results = run(suite, verbose = true)
map(sort(collect(keys(results["jacobian"])))) do k
    (;
        Symbol("function") => k,
        (Symbol(approach) => results[approach][k] for (approach, _) in approaches)...
    )
end |> pretty_table

The last column uses identical implementations to the ones in this PR.

┌──────────────────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│                         function │             jacobian │     jacobian partial │         jacobian all │
│                           String │ BenchmarkTools.Trial │ BenchmarkTools.Trial │ BenchmarkTools.Trial │
├──────────────────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│         derivative(b,f,::Number) │    Trial(990.800 ns) │    Trial(280.469 ns) │    Trial(279.181 ns) │
│                  gradient(b,f,x) │      Trial(4.302 μs) │      Trial(3.427 μs) │      Trial(3.398 μs) │
│                          hessian │    Trial(169.063 μs) │     Trial(96.796 μs) │     Trial(99.938 μs) │
│           jacobian(b,f,::Number) │      Trial(1.033 μs) │      Trial(1.015 μs) │      Trial(1.052 μs) │
│                  jacobian(b,f,x) │      Trial(4.195 μs) │      Trial(4.115 μs) │      Trial(4.030 μs) │
│                  jacobian(b,g,x) │      Trial(2.206 μs) │      Trial(1.856 μs) │      Trial(1.846 μs) │
│                         pullback │      Trial(1.974 μs) │      Trial(1.175 μs) │      Trial(1.337 μs) │
│                      pushforward │      Trial(3.084 μs) │      Trial(3.055 μs) │      Trial(3.164 μs) │
│               value_and_gradient │      Trial(4.393 μs) │      Trial(4.290 μs) │      Trial(3.457 μs) │
│                value_and_hessian │    Trial(171.106 μs) │    Trial(172.341 μs) │    Trial(108.149 μs) │
│ value_and_jacobian(b,f,::Number) │      Trial(1.145 μs) │    Trial(972.300 ns) │      Trial(1.193 μs) │
│        value_and_jacobian(b,f,x) │      Trial(4.429 μs) │      Trial(4.289 μs) │      Trial(4.233 μs) │
│        value_and_jacobian(b,g,x) │      Trial(2.436 μs) │      Trial(2.032 μs) │      Trial(2.009 μs) │
│               value_and_pullback │      Trial(3.588 μs) │      Trial(1.617 μs) │      Trial(1.565 μs) │
│            value_and_pushforward │      Trial(3.186 μs) │      Trial(3.106 μs) │      Trial(3.174 μs) │
│       value_gradient_and_hessian │    Trial(172.960 μs) │    Trial(171.780 μs) │    Trial(100.681 μs) │
└──────────────────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘

@sethaxen sethaxen marked this pull request as ready for review January 22, 2022 20:56
@mohamed82008 mohamed82008 merged commit 57a7b69 into JuliaDiff:master Jan 26, 2022
@mohamed82008 (Member)

thanks Seth!
