Add many `frules` #565

mcabbott · 2022-01-14T21:42:13Z

Many of these are almost trivial; you almost want an @trivial_frule vcat(xs...).

One subtlety is that with vcat(x...) where x == (1, 2, 3), you might get xdot = (4, ZeroTangent(), ZeroTangent()), which would give a Vector{Any}. I think in this case you ought to make a Vector{Float64}, taking float(T) from the forward pass. I'm far from sure that all such cases are handled well.

Maybe they can be automated. Maybe this should do something sensible:

julia> promote(4, ZeroTangent(), ZeroTangent())
ERROR: promotion of types Int64, ZeroTangent and ZeroTangent failed to change any arguments

Edit: this is now handled by _make_real_zeros.
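To make the subtlety concrete, here is a minimal self-contained sketch of the idea behind _make_real_zeros. It does not use ChainRulesCore: SymbolicZero is a hypothetical stand-in for ZeroTangent, and real_zero / make_real_zeros are illustrative names that only mirror the helpers discussed in this PR.

```julia
# Hypothetical stand-in for ChainRulesCore's ZeroTangent (assumption, not the real type).
struct SymbolicZero end

# Pass real tangents through; swap a symbolic zero for a zero matching the primal value.
real_zero(ẋ, x) = ẋ
real_zero(::SymbolicZero, x) = zero(x)
make_real_zeros(ẋs, xs) = map(real_zero, ẋs, xs)

xs = (1, 2, 3)
ẋs = (4.0, SymbolicZero(), SymbolicZero())

naive = [ẋs...]                       # element types do not promote, so eltype is Any
fixed = [make_real_zeros(ẋs, xs)...]  # all entries are numbers, so they promote to Float64
```

With the symbolic zeros replaced first, the tangent vector has a concrete numeric eltype instead of Any.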

In adding a forward rule for reshape, I tidied up the reverse one, and made one for dropdims.

~~Completely fails on Julia 1.0, because the syntax f((x...,), y) = x, y was added only in 1.6. Maybe we should drop 1.0? (#577)~~

Closes #406

mcabbott · 2022-01-20T04:12:07Z

src/rulesets/Base/arraymath.jl

+frule((_, Adot, Bdot), ::typeof(*), A, B) = A * B, muladd(Adot, B, A * Bdot)
+
+frule((_, Adot, Bdot, Cdot), ::typeof(*), A, B, C) = A*B*C, Adot*B*C + A*Bdot*C + A*B*Cdot


Will anything go wrong if these are left without types?

Hmm, it will mean the rule is hit rather than AD decomposing the call further.
But we can try without restrictions, and if we run into bad cases, add types later.

One bad thing about Adot*B*C is that, for 3 matrices, it will run the computation of how to group them 3 times. Without the rule, it would run once. But this ought to be tiny compared to matmul.

One good thing about it is that, for scalar-matrix-matrix, AD won't try to go inside the fused mul! implementation. Although perhaps nothing bad would happen there anyway.
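For reference, the product rule these two rules encode can be checked numerically. The sketch below uses only Base; frule_mul and frule_mul3 are hypothetical names rather than the ChainRules methods, and the two-argument tangent is written as a plain sum instead of the muladd call used in the diff.

```julia
# Forward-mode product rule for two and three matrix factors (illustrative sketch).
frule_mul(A, B, Adot, Bdot) = (A * B, Adot * B + A * Bdot)
frule_mul3(A, B, C, Adot, Bdot, Cdot) =
    (A * B * C, Adot * B * C + A * Bdot * C + A * B * Cdot)

A = [1.0 2.0; 3.0 4.0];  Adot = [0.1 0.2; 0.3 0.4]
B = [0.5 -1.0; 2.0 0.0]; Bdot = [1.0 0.0; 0.0 1.0]
C = [1.0 0.0; 1.0 1.0];  Cdot = [0.0 1.0; -1.0 0.0]

Y2, Y2dot = frule_mul(A, B, Adot, Bdot)
Y3, Y3dot = frule_mul3(A, B, C, Adot, Bdot, Cdot)

# Finite-difference directional derivatives along (Adot, Bdot, Cdot) for comparison.
ε = 1e-6
fd2 = ((A + ε * Adot) * (B + ε * Bdot) - A * B) / ε
fd3 = ((A + ε * Adot) * (B + ε * Bdot) * (C + ε * Cdot) - A * B * C) / ε
```

Each term perturbs exactly one factor, which is why the three-factor rule evaluates three triple products.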

mcabbott · 2022-01-20T04:24:55Z

src/rulesets/LinearAlgebra/dense.jl

+function frule((_, ΔC, ΔA, ΔB), ::typeof(mul!), C::AbstractArray, A, B)
+    mul!(C, A, B)
+    mul!(ΔC, ΔA, B)
+    mul!(ΔC, A, ΔB, true, true)
+    return C, ΔC
+end


Likewise few types, can this go wrong?
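The in-place rule relies on the 5-argument mul!(C, A, B, α, β), which computes A*B*α + C*β. A minimal sketch of the same three steps, outside the frule machinery (mul_forward! is a hypothetical name, and this assumes a Julia version that has 5-argument mul!):

```julia
using LinearAlgebra

# Mutating forward rule sketch: update the primal C and its tangent ΔC in place.
function mul_forward!(C, ΔC, A, B, ΔA, ΔB)
    mul!(C, A, B)                # primal: C = A * B
    mul!(ΔC, ΔA, B)              # overwrite: ΔC = ΔA * B
    mul!(ΔC, A, ΔB, true, true)  # accumulate: ΔC = A * ΔB + ΔC
    return C, ΔC
end

A = [1.0 2.0; 3.0 4.0];  ΔA = [0.1 0.0; 0.0 0.1]
B = [0.0 1.0; 1.0 0.0];  ΔB = [1.0 2.0; 3.0 4.0]
C = zeros(2, 2);         ΔC = zeros(2, 2)

mul_forward!(C, ΔC, A, B, ΔA, ΔB)
```

Passing true for both α and β makes the last call a pure accumulation, so no temporary for A * ΔB is allocated.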

oxinabox

I have reviewed up to reshape
I will continue after my meeting today

oxinabox · 2022-01-24T14:18:14Z

src/rulesets/Base/array.jl

@@ -4,6 +4,10 @@

 ChainRules.@non_differentiable (::Type{T} where {T<:Array})(::UndefInitializer, args...)

+function frule((_, xdot), ::Type{T}, x::AbstractArray) where {T<:Array}


Can we run a find + replace in files, and change names like xdot into unicode ẋ,
for consistency with the rest of the project?

Ok, I've changed all the ones in array.jl to have dots. These are all linear rules, so all you have to see is that the dot equation has dots.

In arraymath.jl and dense.jl there is some precedent for Δx which I've also followed. For rules where each term mixes up original and perturbation, I think it's a bit too subtle to put tiny dots on some factors.

The divide between array.jl and arraymath.jl is almost perfectly linear/nonlinear, in fact. Maybe the rules for array + & - should move to the linear file.

oxinabox · 2022-01-24T14:31:31Z

src/rulesets/Base/array.jl

@@ -43,32 +51,81 @@ function rrule(::typeof(Base.vect), X::Vararg{Any,N}) where {N}
    return Base.vect(X...), vect_pullback
 end

+"""
+    _make_real_zeros(xdots, xs)


perhaps instantiate_zeros
make_real_zeros could be interpreted as Real vs Complex.

or materialize_zeros
we should be roughly consistent with what broadcasting has to say about the wording

Just saw this. For now it's _instantiate_zeros, clearly internal, and unlike materialise I can reliably spell it...

oxinabox · 2022-01-24T14:52:15Z

src/rulesets/Base/array.jl

+"""
+_make_real_zeros(xdots, xs) = map(_real_zero, xdots, xs)
+_real_zero(xdot, x) = xdot
+_real_zero(xdot::AbstractZero, x) = zero(x)


I wonder if this shouldn't be:

Suggested change

-_real_zero(xdot::AbstractZero, x) = zero(x)
+_real_zero(xdot::ZeroTangent, x) = zero(x)
+_real_zero(xdot::DoesNotExist, x) = isapplicable(zero, x) ? zero(x) : xdot

So that we never end up calling zero("abc")

So the example is something like gradient(x -> ["abc", x][end], 1). That does actually work in Zygote; it fails in Diffractor seemingly before hitting _real_zero.

What's DoesNotExist? I think hasmethod is pretty slow; maybe better to leave an error until we have a better plan?

julia> @btime Base.hasmethod(zero, Tuple{typeof($([1,2,3]))})
  min 293.075 ns, mean 306.452 ns (3 allocations, 144 bytes)
true

oops DoesNotExist was renamed to NoTangent

We can push hasmethod to compile-time if we want with Tricks.jl.
If we really want.

but also maybe we want to do:

Suggested change

-_real_zero(xdot::AbstractZero, x) = zero(x)
+_real_zero(xdot::ZeroTangent, x) = zero(x)
+_real_zero(xdot::NoTangent, x) = xdot

but yeah i think fine enough to leave it as an error til it becomes a problem.
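The two options discussed in this thread can be compared in a small sketch. ZeroLike and NoneLike are hypothetical stand-ins for ZeroTangent and NoTangent, and guarded_zero is an illustrative name; the run-time hasmethod check is the slow path measured above, not something the PR adopted.

```julia
# Hypothetical stand-ins for ChainRulesCore's ZeroTangent and NoTangent.
struct ZeroLike end
struct NoneLike end

# Option 1: dispatch on the kind of zero and never call zero(x) for a non-differentiable input.
real_zero(ẋ, x) = ẋ
real_zero(::ZeroLike, x) = zero(x)
real_zero(ẋ::NoneLike, x) = ẋ

# Option 2: check at run time whether zero has a method for this primal type.
guarded_zero(ẋ, x) = hasmethod(zero, Tuple{typeof(x)}) ? zero(x) : ẋ

a = real_zero(ZeroLike(), 1.5)       # a numeric zero
b = real_zero(NoneLike(), "abc")     # stays symbolic, zero("abc") is never called
c = guarded_zero(NoneLike(), "abc")  # also stays symbolic, at the cost of a method lookup
```

Option 1 costs nothing at run time but assumes a NoTangent-like marker is only attached to inputs that need no real zero.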

oxinabox · 2022-01-24T14:53:36Z

src/rulesets/Base/array.jl

+_make_real_zeros(xdots::NTuple{<:Any, <:Number}, xs) = xdots
+_make_real_zeros(xdots::AbstractArray{<:Number}, xs) = xdots
+_make_real_zeros(xdots::AbstractArray{<:AbstractArray}, xs) = xdots


Should we use eltype and HasEltype here, so that we are abstracted over collections?
Possibly not required right now, and we can leave it?

My guess is that this is already into premature optimisation territory. A tuple of numbers for vect, and a tuple of arrays for vcat, are the cases that I actually managed to trigger; an array for reduce(vcat, xs) seemed easy to handle with the same machine. Maybe there should be a xdots::NTuple{<:Any, <:AbstractArray} method. Although map on tuples is mostly free.

oxinabox

Nice. Huge work, well done.

Address the comments I leave as you deem good.
Check the code coverage is good,
and then this should be good to merge.

oxinabox · 2022-01-25T12:14:26Z

src/rulesets/Base/array.jl

+    ax = axes(A)
+    project = ProjectTo(A)  # Projection is here for e.g. reshape(::Diagonal, :)
+    ∂dims = broadcast(Returns(NoTangent()), dims)
+    reshape_pullback(Ȳ) = (NoTangent(), project(reshape(Ȳ, ax)), ∂dims...)


Am I correct in saying project will not do the reshaping for us, as it only handles cases with singleton dimensions?

Yes. It will accept arrays whose size is almost right, i.e. differing by trailing 1s only. The offsets can be wrong. Doing one reshape(... , axes) here should mean it never reshapes twice.
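A stripped-down sketch of the two reshape rules, leaving out the ProjectTo step and the NoTangent() entries, shows why a single reshape(Ȳ, ax) is enough in the pullback. The function names here are hypothetical, and plain Arrays are assumed.

```julia
# Forward mode: reshape is linear, so the tangent is reshaped exactly like the primal.
reshape_forward(A, Adot, dims...) = (reshape(A, dims...), reshape(Adot, dims...))

# Reverse mode: remember the original axes and reshape the cotangent back onto them.
function reshape_reverse(A, dims...)
    ax = axes(A)
    reshape_pullback(Ybar) = reshape(Ybar, ax)
    return reshape(A, dims...), reshape_pullback
end

A = [1.0 2.0 3.0; 4.0 5.0 6.0]
Adot = 10 .* A

Y, Ydot = reshape_forward(A, Adot, 3, 2)
_, back = reshape_reverse(A, 3, 2)
Abar = back(ones(3, 2))
```

Because the pullback already restores the axes of A, a projection applied afterwards only has to deal with structure (for example Diagonal), not with shape.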

oxinabox · 2022-01-25T17:35:45Z

src/rulesets/Base/sort.jl

@@ -42,6 +52,12 @@ end
 ##### `sortslices`
 #####

+function frule((_, ẋ), ::typeof(sortslices), x::AbstractArray; dims::Integer, kw...)
+    p = sortperm(collect(eachslice(x; dims=dims)); kw...)
+    inds = ntuple(d -> d == dims ? p : (:), ndims(x))


dims can also be a tuple.
But the rrule doesn't support that either, so probably fine?

Though it is a bit of a problem with it being a kwarg (for both forward and reverse modes),
since it won't redispatch to using the AD to work it out.

Anyway, we should make an issue for this unless the solution is trivial.
E.g.:

Suggested change

-inds = ntuple(d -> d == dims ? p : (:), ndims(x))
+inds = ntuple(d -> d in dims ? p : (:), ndims(x))

Here we assume the abstract array uses 1-based indexing.
Should we actually be doing
something with eachindex/axes?

I did not think about offsets. They ought to work in some places:

julia> sortperm(OffsetArray(rand(3), 4))
3-element OffsetArray(::Vector{Int64}, 5:7) with eltype Int64 with indices 5:7:
 7
 6
 5

but not with collect(eachslice).

Suggested change

-inds = ntuple(d -> d == dims ? p : (:), ndims(x))
+firstindex(x, d) == 1 || throw(ArgumentError("The `rrule` for `sortslices` does not at present handle offset indices here."))
+inds = ntuple(d -> d == dims ? p : (:), ndims(x))

I vote to kick dims::Tuple down the road. It might not be hard to handle, but would need a bit of thought, and some tests...

It will fail with an error now. I don't think Zygote et al. had any hope of getting through sortslices before the rule.

In fact it works fine with OffsetArrays. The generator used for eachslice propagates indices through correctly:

julia> x = OffsetMatrix(rand(2,3), 4, 5);

julia> sortslices(x; dims=2)
2×3 OffsetArray(::Matrix{Float64}, 5:6, 6:8) with eltype Float64 with indices 5:6×6:8:
 0.689597  0.805156  0.995562
 0.727762  0.320152  0.924272

julia> rrule(sortslices, x; dims=2)[1]
2×3 OffsetArray(::Matrix{Float64}, 5:6, 6:8) with eltype Float64 with indices 5:6×6:8:
 0.689597  0.805156  0.995562
 0.727762  0.320152  0.924272

julia> collect(eachslice(x; dims=2))
3-element OffsetArray(::Vector{SubArray{Float64, 1, OffsetMatrix{Float64, Matrix{Float64}}, Tuple{Base.Slice{OffsetArrays.IdOffsetRange{Int64, Base.OneTo{Int64}}}, Int64}, true}}, 6:8) with eltype SubArray{Float64, 1, OffsetMatrix{Float64, Matrix{Float64}}, Tuple{Base.Slice{OffsetArrays.IdOffsetRange{Int64, Base.OneTo{Int64}}}, Int64}, true} with indices 6:8:
 [0.9955624968587956, 0.9242722045713299]
 [0.8051558526354236, 0.32015211201093363]
 [0.689597064698794, 0.7277619193702523]

I've removed the error message.
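The forward rule under discussion amounts to computing the slice permutation once on the primal and applying it to both the primal and the tangent. A self-contained sketch for ordinary 1-based arrays and an integer dims (sortslices_forward is a hypothetical name, and it returns a plain tuple rather than following the frule calling convention):

```julia
# Sort slices of x along `dims` and apply the same permutation to the tangent ẋ.
function sortslices_forward(x::AbstractArray, ẋ::AbstractArray; dims::Integer, kw...)
    p = sortperm(collect(eachslice(x; dims=dims)); kw...)  # permutation of the slices
    inds = ntuple(d -> d == dims ? p : (:), ndims(x))      # permute only dimension `dims`
    return x[inds...], ẋ[inds...]
end

x = [3.0 1.0 2.0;
     0.0 5.0 4.0]
ẋ = [10.0 20.0 30.0;
     40.0 50.0 60.0]

y, ẏ = sortslices_forward(x, ẋ; dims=2)
```

Here the columns of x sort into the order 2, 3, 1, so the tangent columns are reordered the same way. A tuple-valued dims is not handled, matching the decision above to defer it.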

oxinabox · 2022-01-25T17:46:18Z

test/rulesets/Base/array.jl

@@ -1,23 +1,36 @@
 @testset "Array constructors" begin
-
+    @testset "undef" begin


indenting of comment is now wrong?

It is. I moved the heading as I was confused for a minute about how much the comment applied to. But I thought preserving the blame for the comment might be helpful.

oxinabox · 2022-01-25T17:47:54Z

test/rulesets/Base/array.jl

+    @test rrule(reshape, adjoint(rand(ComplexF64, 4)), :)[2](rand(4))[2] isa Adjoint{ComplexF64}
+    @test rrule(reshape, Diagonal(rand(4)), (2, :))[2](ones(2,8))[2] isa Diagonal
+    @test_skip test_rrule(reshape, Diagonal(rand(4)), 2, :)  # DimensionMismatch("second dimension of A, 22, does not match length of x, 16")
+    @test_skip test_rrule(reshape, UpperTriangular(rand(4,4)), (8, 2))


can we open an issue/issues with a list of all things that are skipped and why and cross link the URL in the code where the skip happens?

It's a pity you can't put @test_broken instead, so that you find out when the bug is fixed.

oxinabox · 2022-01-25T17:55:27Z

test/rulesets/Base/indexing.jl

+        @testset "forward mode" begin
+            test_frule(getindex, x, 2)
+            test_frule(getindex, x, 2, 1)
+            test_frule(getindex, x, CartesianIndex(2, 3))
+
+            test_rrule(getindex, x, 2:3)
+            test_rrule(getindex, x, (:), 2:3)
+        end


These seem misorganized: several of these are for rrule (or are those a mistake?).
And I think we would in general rather organise by operands than by mode.
So we can probably just push them down into the single element/slice etc. tests?

Well spotted, that's just a typo, I meant to only test frules:

Suggested change

-        @testset "forward mode" begin
-            test_frule(getindex, x, 2)
-            test_frule(getindex, x, 2, 1)
-            test_frule(getindex, x, CartesianIndex(2, 3))
-
-            test_rrule(getindex, x, 2:3)
-            test_rrule(getindex, x, (:), 2:3)
-        end
+        @testset "forward mode" begin
+            test_frule(getindex, x, 2)
+            test_frule(getindex, x, 2, 1)
+            test_frule(getindex, x, CartesianIndex(2, 3))
+
+            test_frule(getindex, x, 2:3)
+            test_frule(getindex, x, (:), 2:3)
+        end

The argument for keeping them separate is that it makes sense to have far fewer. The frule is super-simple and has few edge cases. The rrule has many more things to think through, will eventually need to handle arrays of arrays (and second derivatives of those...).
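To illustrate why the forward rule has so few edge cases: indexing is linear, so the tangent of x[i...] is simply ẋ[i...]. A minimal sketch with a hypothetical helper name, not the ChainRules method itself:

```julia
# Forward rule sketch for indexing: apply the same indices to the primal and the tangent.
getindex_forward(x::AbstractArray, ẋ::AbstractArray, inds...) = (x[inds...], ẋ[inds...])

x = collect(reshape(1.0:12.0, 3, 4))
ẋ = 10 .* x   # a tangent that is easy to recognise in the output

y1, ẏ1 = getindex_forward(x, ẋ, 2)                     # linear index
y2, ẏ2 = getindex_forward(x, ẋ, CartesianIndex(2, 3))  # Cartesian index
y3, ẏ3 = getindex_forward(x, ẋ, :, 2:3)                # slice
```

The reverse rule, by contrast, has to build an array of the right shape and scatter the cotangent into it, which is where most of the edge cases live.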

Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>

mcabbott marked this pull request as draft January 14, 2022 21:51

mcabbott force-pushed the forward branch 7 times, most recently from ef971c1 to 5714d78 Compare January 20, 2022 04:08

mcabbott marked this pull request as ready for review January 20, 2022 04:25

mcabbott force-pushed the forward branch from dc150ee to 928ce15 Compare January 21, 2022 23:15

oxinabox reviewed Jan 24, 2022

mcabbott force-pushed the forward branch from c06fa56 to 8e84fe2 Compare January 24, 2022 20:32

oxinabox approved these changes Jan 25, 2022

mcabbott and others added 15 commits January 25, 2022 14:58

drop 1.0, now that LTS == 1.6

0932a86

revert to one Project

edb1bbd

rm Compat

60a73b5

turns out this does still need Compat

94913bc

add many frules

94d5898

in-place frules

4971f99

reshape + dropdims too

793f47e

tests

d1b8949

5-arg mul

3ea58fb

notation changes

8b11c86

rm 2nd order rules

c509d5e

don't skip setindex

5139377

AbstractArray constructors

ae9d76b

reshape tests

b64265e

Apply 4 suggestions

6f047c3

Co-authored-by: Lyndon White <oxinabox@ucc.asn.au>

fixup, bump

016064f

mcabbott force-pushed the forward branch from b73391f to 016064f Compare January 25, 2022 20:18

mcabbott mentioned this pull request Jan 25, 2022

Errors from ReshapedArray? JuliaDiff/ChainRulesTestUtils.jl#239

Open

mcabbott added 2 commits January 25, 2022 16:10

several comments, and one rule for PermutedDimsArray

a773f9a

in fact sortslices is fine with offsets

327bf26

mcabbott merged commit 8c34f19 into JuliaDiff:main Jan 25, 2022

mcabbott deleted the forward branch January 25, 2022 22:43

mcabbott mentioned this pull request Jan 26, 2022

Rule for float(x), or convert(AbstractArray, x) #581

Closed


