
mul/ewise rules for basic arithmetic semiring #26

Merged: 20 commits into master on Jul 11, 2021
Conversation

@rayegun (Member) commented Jul 6, 2021

I removed some rules I'm still testing, so I can get feedback on these and avoid a monster PR.

Notes:

  1. You need to call test_*rule with check_inferred=false (see the sketch after this list); issue #25 (Output eltype inference is not type stable) will fix this.
  2. The kwargs are missing. I want to get everything working first before trying those.
  3. The rrules are incorrect according to the tests. Some of the failures are just floating-point issues, but for mul there's a deeper issue: I'm 85-90% sure the rules are correct, yet the sparsity patterns are not the same as for FiniteDifferences, and occasionally the values differ.
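For context, here is a minimal sketch of how note 1 plays out, assuming ChainRulesTestUtils; the GBMatrix values are placeholders, and whether the tested entry point is * or mul is an assumption:

```julia
using ChainRulesTestUtils, SuiteSparseGraphBLAS

A = GBMatrix(rand(3, 3))
B = GBMatrix(rand(3, 3))

# check_inferred=false works around issue #25 (output eltype inference
# is not type stable); drop it once that issue is fixed.
test_rrule(*, A, B; check_inferred=false)
```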


@rayegun requested a review from @mzgubic on Jul 6, 2021
@rayegun (Member, Author) commented Jul 6, 2021

I apologize for the messy PR; the only important parts are in the tests and chainrules folders.

I'm primarily interested in your thoughts about the rrules for mul, and in particular whether I'm wrong, FiniteDifferences is wrong, or I just haven't given FiniteDifferences the right information.

Everything works fine for dense. For sparse inputs, though, there are two problems (a stand-alone sketch follows the list):

  1. ∂A has a different sparsity pattern, and thus different values where the sparsity is different.
  2. ∂B is straight up wrong according to FiniteDifferences.
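To make problem 1 concrete, here is a hypothetical stand-alone reproduction using a plain SparseMatrixCSC in place of GBMatrix; the masking line is an assumption about what the rule computes:

```julia
using FiniteDifferences, SparseArrays, LinearAlgebra

A = sparse([1.0 0.0; 0.0 2.0])  # stand-in for the sparse GBMatrix input
B = rand(2, 2)
ΔΩ = ones(2, 2)

# FiniteDifferences perturbs every entry of A, structural zeros included,
# so the cotangent it reports is dense:
dA_fd = j′vp(central_fdm(5, 1), X -> X * B, ΔΩ, Matrix(A))[1]

# A rule that treats the zeros as structural masks the cotangent to A's
# sparsity pattern:
dA_masked = (ΔΩ * B') .* (A .!= 0)

# dA_fd and dA_masked agree only at A's stored entries, which is exactly
# the kind of discrepancy described above.
```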

@mzgubic (Collaborator) left a comment


Didn't manage to finish it all today, but here are a few comments.

Project.toml (resolved; outdated)
Project.toml (resolved)
src/chainrules/chainruleutils.jl (resolved)
src/chainrules/chainruleutils.jl (resolved)
test/runtests.jl (outdated)
@@ -14,4 +14,5 @@ println("Testing SuiteSparseGraphBLAS.jl")
 @testset "SuiteSparseGraphBLAS" begin
     include_test("gbarray.jl")
     include_test("operations.jl")
+    include_test("testrules.jl")
@mzgubic (Collaborator) commented:
Usually the structure of test folder mirrors the src/ folder, which makes it easier to find things when the package grows.

src/chainrules/ewiserules.jl (resolved)
@mzgubic (Collaborator) commented Jul 7, 2021

> Everything works fine for dense. For sparse inputs, though, there are two problems:
> 1. ∂A has a different sparsity pattern, and thus different values where the sparsity is different.
> 2. ∂B is straight up wrong according to FiniteDifferences.

I think the underlying issue is the same (and also the same one as in the elementwise rules). What it comes down to is an instance of the "array dilemma", discussed in great detail over many issues and PRs. See JuliaDiff/ChainRulesCore.jl#347 (and related issues) for a discussion, but I warn you, it is a rabbit hole ;)

Essentially what it comes down to is whether you think of the input, say A::GBMatrix as an efficient representation of an array that just happens to be sparse, or whether you think of it as a struct. Consider y = A * B, where B::Matrix is a dense array, and y is therefore dense as well.

Primal computation will be fast because A is sparse. In fact, A was probably chosen to be a GBMatrix solely to get that speedup. What happens in the backward pass depends on how you interpret A: as an array, or as a struct?

If you interpret it as an array, then dA = mul(ΔΩ, B') will be dense and you lose all the benefits of the speedup, but dA will match the dA you would have gotten with a dense A that has zeros in the same places as the structural zeros of the sparse A.

If you interpret it as a struct, meaning that the zeros are structural, it doesn't make sense to compute tangents for all the zeros, and you can compute the backward pass efficiently. Since dA for a sparse A is itself sparse in this case, it is somewhat unintuitive that it differs from the dA that would be obtained if A were a dense array with zeros in the same places.

Long story short, we are treating them as structs now in order to not completely kill efficiency. We should probably treat them as structs here as well.
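A minimal sketch of what the struct interpretation looks like as a rule, assuming ChainRulesCore; this is not the PR's actual implementation, and mask is a hypothetical helper that applies A as a structural mask:

```julia
using ChainRulesCore

function ChainRulesCore.rrule(::typeof(*), A::GBMatrix, B::Matrix)
    Ω = A * B
    function times_pullback(ΔΩ)
        # Keep ∂A only at A's stored entries, so the tangent stays sparse;
        # `mask(X, A)` is a hypothetical helper (a GraphBLAS structural mask).
        ∂A = mask(ΔΩ * B', A)
        ∂B = A' * ΔΩ  # dense, identical to the dense-array rule
        return NoTangent(), ∂A, ∂B
    end
    return Ω, times_pullback
end
```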

Aside: projection, merged recently, was a way to make sure rules with abstractly typed arguments still return the correct tangent type. The classic example is Diagonal * Matrix where we project the dense gradient onto the Diagonal.
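For reference, a small example of that projection with ChainRulesCore's ProjectTo:

```julia
using ChainRulesCore, LinearAlgebra

D = Diagonal([1.0, 2.0, 3.0])
dD_dense = ones(3, 3)    # cotangent produced by a generic dense rule
project = ProjectTo(D)   # records the structure of the primal
project(dD_dense)        # Diagonal([1.0, 1.0, 1.0]): off-diagonal parts dropped
```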


In this case, as you point out, masking is all we need to do, since we are writing dedicated rules for GBMatrix multiplication.

@rayegun merged commit f0dd5c9 into master on Jul 11, 2021
@rayegun deleted the arithmeticchains branch on Jul 11, 2021